PERFORMANCE ANALYSIS OF MODIFIED-ODBOT AND SMOTE FOR TREE-BASED CLASSIFICATION OF IMBALANCED HUMAN DEVELOPMENT INDEX DATA
Yunna Mentari Indah; Anwar Fitrianto; Indahwati Indahwati
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher: Pattimura University

DOI: 10.30598/barekengvol20iss3pp2311-2326

Abstract

Classification of Human Development Index (HDI) data is difficult because of severe class imbalance: low-development regions are substantially underrepresented. This imbalance reduces classification performance because machine learning models tend to be biased toward the majority classes, so minority classes are hard to identify accurately. This study proposes a modified ODBOT that replaces Euclidean distance with Mahalanobis distance within the oversampling mechanism (Mahalanobis-based ODBOT) and compares its performance with Euclidean-based ODBOT, with and without Principal Component Analysis (PCA), as well as with the conventional SMOTE technique. Four tree-based classifiers were used: Random Forest, Double Random Forest, XGBoost, and LightGBM. The HDI data set from the Central Statistics Agency, consisting of 514 observations and four features with an imbalance ratio (IR) of 19.0, was divided into training and testing sets (ratio 80:20) over 30 repetitions and evaluated using F1-Measure (F1-M), Geometric Mean (G-M), Area Under the Curve (AUC), and computation time. The results show that Mahalanobis-based ODBOT achieved the highest AUC across all four classification models and the highest G-M in three of the four, but required significantly longer computation time (2545.66 seconds). In contrast, the Euclidean-based ODBOT with PCA improved F1-M while reducing computation time (7.21 seconds) compared to the original ODBOT (68.23 seconds), and SMOTE consistently improved G-M and AUC across all experiments. These findings suggest that oversampling techniques should be selected based on practical application needs. Specifically, the Mahalanobis-based ODBOT can be recommended when improving prediction performance is the priority, while the Euclidean-based ODBOT with PCA or SMOTE is preferable for real-world implementations that require faster execution and lower computational cost.
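
The abstract's central change is swapping Euclidean distance for Mahalanobis distance inside the oversampler. As a rough illustration of what that swap means, and not the authors' ODBOT implementation, the sketch below compares the two distances on a toy example. The function names and the covariance matrix are hypothetical; in practice the covariance would presumably be estimated from the training features.

```python
import numpy as np


def euclidean_distance(x, y):
    """Straight-line distance; every feature is weighted equally."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ diff))


def mahalanobis_distance(x, y, cov):
    """Distance scaled by the feature covariance matrix `cov`.

    A pseudo-inverse is used so the sketch tolerates a singular or
    near-singular covariance matrix (an assumption of this example).
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    inv_cov = np.linalg.pinv(np.asarray(cov, dtype=float))
    return float(np.sqrt(diff @ inv_cov @ diff))


# Hypothetical covariance: feature 0 has variance 4, feature 1 has variance 1.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
origin = [0.0, 0.0]
along_wide_axis = [2.0, 0.0]    # displaced along the high-variance feature
along_narrow_axis = [0.0, 2.0]  # displaced along the low-variance feature

for point in (along_wide_axis, along_narrow_axis):
    print(point,
          "euclidean:", round(euclidean_distance(point, origin), 6),
          "mahalanobis:", round(mahalanobis_distance(point, origin, cov), 6))
```

Both points are equally far from the origin in Euclidean terms, but the Mahalanobis distance treats the point displaced along the high-variance feature as closer, so a neighbor search built on it accounts for feature scale and correlation. Inverting the covariance matrix and applying it to every pair of points also adds work per distance evaluation, which is in line with the longer computation time reported for the Mahalanobis-based variant.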