TEKNOKOM : Jurnal Teknologi dan Rekayasa Sistem Komputer
Vol. 4 No. 2 (2021): TEKNOKOM

HOW MACHINE LEARNING METHOD PERFORMANCE FOR IMBALANCED DATA : Case Study: Classification of Working Status of Banten Province

Sihombing, Pardomuan Robinson



Article Info

Publish Date
13 Jul 2021

Abstract

This study examines how several machine learning classification methods perform on imbalanced data. The case study is the classification of working status in Banten Province in 2020. The data come from the National Labor Force Survey conducted by Statistics Indonesia. The machine learning methods used are Classification and Regression Tree (CART), Naïve Bayes, Random Forest, Rotation Forest, Support Vector Machine (SVM), Neural Network Analysis, One Rule (OneR), and Boosting. For large, imbalanced data sets, classification modeling with resampling techniques is shown to improve classification accuracy, especially for the minority class, as seen in sensitivity and specificity values that are more balanced than those obtained on the original, untreated data. Furthermore, among the eight classification models tested, the Boosting model provides the best performance, with the highest sensitivity, specificity, G-mean, and kappa coefficient values. The most influential variables in the classification of working status are marital status, education, and age.
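The evaluation measures named in the abstract (sensitivity, specificity, G-mean, and the kappa coefficient) can all be derived from a 2x2 confusion matrix. The following is a minimal sketch using their standard definitions, not the author's code. The function name `binary_metrics`, the choice of "working" as the positive class, and all counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positive cases predicted correctly / incorrectly.
    tn/fp: negative cases predicted correctly / incorrectly.
    Assumes both classes are present and chance agreement is below 1.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    g_mean = math.sqrt(sensitivity * specificity)
    accuracy = (tp + tn) / n              # observed agreement
    # Agreement expected by chance, from row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts: 900 positive ("working") and 100 negative cases.
# A model that predicts the majority class for every case:
majority_only = binary_metrics(tp=900, fn=0, fp=100, tn=0)

# A hypothetical model trained after resampling:
resampled = binary_metrics(tp=720, fn=180, fp=20, tn=80)

for name, result in [("majority only", majority_only), ("resampled", resampled)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

With these made-up counts, the majority-only model reaches 0.9 accuracy while its specificity, G-mean, and kappa are all 0. This illustrates why accuracy alone can be misleading on imbalanced data, and why the abstract reports sensitivity and specificity together.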

Copyrights © 2021






Journal Info

Abbrev

teknokom

Subject

Computer Science & IT

Description

Jurnal Teknologi dan Rekayasa Sistem Komputer (TEKNOKOM) is published twice a year, in March and September. The editors accept scientific writing from lecturers, teachers, and educational observers on research results, scientific studies, and analysis and problem solving ...