Found 2 Documents
Journal : Knowledge Engineering and Data Science

The Effect of Resampling on Classifier Performance: an Empirical Study
Utomo Pujianto; Muhammad Iqbal Akbar; Niendhitta Tamia Lassela; Deni Sutaji
Knowledge Engineering and Data Science Vol 5, No 1 (2022)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v5i12022p87-100

Abstract

An imbalanced class distribution in a dataset is a common classification problem: training on imbalanced data can degrade classifier performance. Resampling is one solution to this problem. This study used 100 datasets from three sources: UCI Machine Learning, Kaggle, and OpenML. Each dataset went through three stages: resampling, classification, and paired t-tests on the performance evaluation values of each combination of classifier and resampling technique. The resampling techniques were Random Undersampling, Random Oversampling, and SMOTE. The classifiers were Naïve Bayes, Decision Tree, and Neural Network. The resulting accuracy, precision, recall, and f-measure values were tested with paired t-tests to determine whether classifier performance differed significantly between datasets without resampling and datasets to which resampling had been applied. The paired t-test was also used to find combinations of classifier and resampling technique that give significant results. This study obtained two results. First, resampling an imbalanced class dataset can affect the classifier's performance significantly, compared with the performance of the same classifier on the dataset without resampling. Second, the Neural Network without resampling gives significant results in terms of accuracy, while combining the Neural Network with SMOTE gives significant performance in terms of precision, recall, and f-measure.
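The abstract above outlines a resample-then-classify-then-test pipeline. The sketch below, assuming scikit-learn, imbalanced-learn, and SciPy, shows how one such comparison could be wired up for a single dataset; the synthetic make_classification data and the choice of f1_macro scoring are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the pipeline described in the abstract, assuming
# scikit-learn, imbalanced-learn, and SciPy. The synthetic dataset stands in
# for the study's 100 datasets and is not taken from the paper itself.
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder for one imbalanced dataset (roughly 90% / 10% class ratio).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

samplers = {
    "none": None,
    "undersampling": RandomUnderSampler(random_state=0),
    "oversampling": RandomOverSampler(random_state=0),
    "smote": SMOTE(random_state=0),
}
classifiers = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "neural_network": MLPClassifier(max_iter=500, random_state=0),
}

# Cross-validated f-measure for every sampler/classifier combination;
# repeat with scoring="accuracy", "precision_macro", "recall_macro" as needed.
scores = {}
for s_name, sampler in samplers.items():
    for c_name, clf in classifiers.items():
        steps = ([("resample", sampler)] if sampler is not None else []) + [("clf", clf)]
        pipe = Pipeline(steps)  # resampling is applied to training folds only
        scores[(s_name, c_name)] = cross_val_score(pipe, X, y, cv=10, scoring="f1_macro")

# Paired t-test: does SMOTE + Neural Network differ significantly from the
# same classifier without resampling on the matching folds?
t_stat, p_value = ttest_rel(scores[("smote", "neural_network")],
                            scores[("none", "neural_network")])
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

In the study the paired t-tests compare performance across many datasets; here the test is applied across cross-validation folds purely to keep the example self-contained.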
Can Multinomial Logistic Regression Predicts Research Group using Text Input?
Harits Ar Rosyid; Aulia Yahya Harindra Putra; Muhammad Iqbal Akbar; Felix Andika Dwiyanto
Knowledge Engineering and Data Science Vol 5, No 2 (2022)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v5i22022p150-159

Abstract

When submitting proposals in SISINTA, students are often unsure which research group is relevant and submit to the wrong one; there are 13 research groups to choose from. We proposed a text classification method to help students find the most suitable research group based on the proposal title and/or abstract. The stages of this study were data collection, data preprocessing, classification using Logistic Regression, and evaluation of the results. Three classification scenarios were compared: 1) title only, 2) abstract only, and 3) title and abstract. Based on the experiments, classification from the title alone performed best overall, with accuracy, precision, recall, and f1-score of 63.68%, 64.91%, 63.68%, and 63.46%, respectively. This result is sufficient to help students find the most suitable research group from the title text. In addition, lecturers can give more detailed feedback because the proposals they receive are relevant to their research group's scope.
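For a concrete picture of the classification stage, the following sketch, assuming scikit-learn and pandas, trains a multinomial Logistic Regression on TF-IDF features of proposal titles (the title-only scenario). The file name, column names, and train/test split are hypothetical placeholders; the study's actual preprocessing is not specified in this listing.

```python
# A minimal sketch of the title-only scenario, assuming scikit-learn and
# pandas. The CSV file and column names (title, research_group) are
# hypothetical placeholders, not artifacts from the SISINTA study.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("sisinta_proposals.csv")  # columns: title, abstract, research_group

X_train, X_test, y_train, y_test = train_test_split(
    df["title"], df["research_group"],
    test_size=0.2, stratify=df["research_group"], random_state=0,
)

model = Pipeline([
    # Tokenisation and lowercasing as lightweight preprocessing; swap in
    # stop-word removal or stemming as the study's preprocessing dictates.
    ("tfidf", TfidfVectorizer(lowercase=True)),
    # With the default lbfgs solver, LogisticRegression fits a multinomial
    # (softmax) model over the 13 research-group labels.
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Per-class and averaged accuracy, precision, recall, and f1-score,
# mirroring the metrics reported in the abstract.
print(classification_report(y_test, model.predict(X_test)))
```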