Garuda - Garba Rujukan Digital

Jurnal Teknologi Terpadu

Vol 9 No 2 (2023): December 2023

Widyanto, Agung
Kusrini
Kusnawi

Publish Date
12 Dec 2023

In classification, imbalanced data is common. An imbalanced data set has an unequal ratio between the majority and minority classes, and models trained on such data tend to predict minority-class samples as the majority class. This study aims to determine the effect of data balance on the accuracy of the Support Vector Machine (SVM) classification model. The data set used is the blood donor data set downloaded from the repository of the University of California, Irvine (UCI). The Waikato Environment for Knowledge Analysis (WEKA) tool was chosen to present the results of model training and testing. The research framework scheme is used as a reference for the knowledge flow. In scenario 1, data pre-processing includes handling missing values using mean imputation and normalizing with min-max scaling. With a data set that has an imbalance ratio of 1:3, the SVM classifier achieves an accuracy of 76.7%. In scenario 2, the pre-processed data is additionally balanced using the Synthetic Minority Oversampling Technique (SMOTE), and the SVM classifier achieves an accuracy of 69.8%. Model performance is evaluated using the confusion matrix. The gap between the per-class recall values is very large in scenario 1 (2.8% and 99.8%) and much smaller in scenario 2 (75.6% and 64%). Testing on 748 samples gave an accuracy of 76.7% for the scenario-1 model and 93.2% for the scenario-2 model. This shows that data balance influences the accuracy of the SVM classification model.
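
The steps named in the abstract (mean imputation, min-max scaling, SMOTE-style oversampling, and per-class recall from the confusion matrix) can be sketched in plain Python. This is an illustrative sketch only, not the authors' WEKA workflow: all function names are hypothetical, the toy labels are invented for the example, and `smote_like` is a simplified version of SMOTE's core interpolation idea rather than a full implementation.

```python
import random

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (assumes >= 2 minority points).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples / samples of that class."""
    recall = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        recall[label] = hits / total
    return recall

# Toy 1:3 data set (invented): 3 minority samples (1), 9 majority samples (0).
y_true = [1] * 3 + [0] * 9
# A degenerate model that always predicts the majority class.
y_pred = [0] * 12

print(accuracy(y_true, y_pred))          # 0.75
print(per_class_recall(y_true, y_pred))  # {0: 1.0, 1: 0.0}

# Oversample the minority class from 3 to 9 points to match the majority.
minority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
balanced_minority = minority + smote_like(minority, n_new=6)
print(len(balanced_minority))            # 9
```

The toy example mirrors the pattern reported for scenario 1: with a 1:3 ratio, a model that almost always predicts the majority class can still reach roughly 75% accuracy while its minority-class recall stays near zero, which is why the per-class recall values are reported alongside accuracy.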
