Journal : Variance : Journal of Statistics and Its Applications

PERFORMANCE ANALYSIS OF RANDOM FOREST CLASSIFICATION ON UNEMPLOYMENT RATE IN MALUKU PROVINCE BASED ON DATA BALANCING METHOD
Yunizar, Mahdayani Putri; Sinay, Lexy Janzen; Yudistira, Yudistira
VARIANCE: Journal of Statistics and Its Applications Vol 7 No 1 (2025): VARIANCE: Journal of Statistics and Its Applications
Publisher : Statistics Study Programme, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Pattimura

DOI: 10.30598/variancevol7iss1page31-38

Abstract

In 2023, the number of unemployed people in Maluku reached 59,800, or 6.08% of the total population. To reduce unemployment in Maluku, it is essential to understand how the unemployment status of the Moluccan population relates to socioeconomic factors. Classification methods such as random forests are well suited to this task, but the classes should be balanced first to obtain accurate results: the number of unemployed residents in Maluku is much smaller than the number of employed residents, and this class imbalance degrades classification accuracy. A data balancing step is therefore needed, using methods such as Random Over-Sampling Examples (ROSE), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). This study uses data from the 2023 National Labor Force Survey (SAKERNAS) conducted in February by the Central Statistics Agency (BPS) of Maluku. The results show that the random forest classification model with SMOTE performs best, using a split of 90% training data and 10% testing data, with a higher AUC value than the other methods, and that age is the most important variable in the model.
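
To make the balancing step concrete, here is a minimal sketch of the core SMOTE idea: each synthetic minority point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. This is an illustration under stated assumptions, not the authors' implementation. The function name `smote`, its parameters, and the toy feature values are hypothetical and are not taken from the SAKERNAS data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [age, years_of_schooling] for the minority (unemployed) class.
unemployed = [[19.0, 12.0], [22.0, 12.0], [24.0, 16.0], [21.0, 9.0]]
employed_count = 20  # assumed size of the majority (employed) class

# Generate enough synthetic points to match the majority class size.
new_points = smote(unemployed, n_synthetic=employed_count - len(unemployed), k=3)
balanced_minority = unemployed + new_points
```

In practice, oversampling of this kind is usually applied only to the training portion of the split (here, the 90%), so that the held-out test data still reflects the original class proportions when AUC is computed.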