BAREKENG: Jurnal Ilmu Matematika dan Terapan
Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application

PERFORMANCE ANALYSIS OF MODIFIED-ODBOT AND SMOTE FOR TREE-BASED CLASSIFICATION OF IMBALANCED HUMAN DEVELOPMENT INDEX DATA

Yunna Mentari Indah (Department of Statistics and Data Science, IPB University, Indonesia)
Anwar Fitrianto (Department of Statistics and Data Science, IPB University, Indonesia)
Indahwati Indahwati (Department of Statistics and Data Science, IPB University, Indonesia)



Article Info

Publish Date
08 Apr 2026

Abstract

Classification of Human Development Index (HDI) data is challenging because of severe class imbalance: low-development regions are substantially underrepresented. This imbalance degrades classification performance because machine learning models tend to be biased toward the majority classes, which makes minority classes hard to identify accurately. This study proposes a modified Outlier Detection-Based Oversampling Technique (ODBOT) that replaces the Euclidean distance with the Mahalanobis distance within the oversampling mechanism (Mahalanobis-based ODBOT), and compares its performance with the Euclidean-based ODBOT, with and without Principal Component Analysis (PCA), and with the conventional Synthetic Minority Oversampling Technique (SMOTE). Four tree-based classifiers were used: Random Forest, Double Random Forest, XGBoost, and LightGBM. The HDI dataset from the Central Statistics Agency, consisting of 514 observations and four features with an imbalance ratio (IR) of 19.0, was split 80:20 into training and testing sets over 30 repetitions and evaluated using the F1-Measure (F1-M), Geometric Mean (G-M), Area Under the Curve (AUC), and computation time. The results show that the Mahalanobis-based ODBOT achieved the highest AUC for all four classifiers and the highest G-M for three of the four, but required a much longer computation time (2545.66 seconds). In contrast, the Euclidean-based ODBOT with PCA improved F1-M while reducing computation time to 7.21 seconds, compared with 68.23 seconds for the original Euclidean-based ODBOT without PCA, and SMOTE consistently improved G-M and AUC across all experiments. These findings suggest that the oversampling technique should be selected according to the needs of the application. Specifically, the Mahalanobis-based ODBOT is recommended when predictive performance is the priority, while the Euclidean-based ODBOT with PCA, or SMOTE, is preferable for real-world implementations that require faster execution and lower computational cost.
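The core modification described above is a change of metric inside the oversampling step: the Mahalanobis distance d_M(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is a covariance matrix, takes the place of the Euclidean distance. The sketch below is not the authors' ODBOT implementation, whose outlier-detection and generation steps are not described here. As a stand-in, it uses plain SMOTE-style interpolation to show where the metric enters the nearest-neighbour search, and it adds the imbalance ratio and G-M quantities the abstract reports. All function names are hypothetical. The covariance is assumed to be estimated from the minority samples alone, and G-M is taken as the geometric mean of per-class recalls, which is one common multiclass definition and may differ from the one used in the study.

```python
import numpy as np


def imbalance_ratio(y):
    """IR = size of the largest class / size of the smallest class."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (one common multiclass G-mean)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


def pairwise_dist(X, metric="euclidean", VI=None):
    """Pairwise distances between the rows of X.

    For "mahalanobis", VI is the inverse covariance matrix; if omitted it is
    estimated from X itself (pinv tolerates a singular covariance).
    """
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d)
    if metric == "euclidean":
        return np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))
    if metric == "mahalanobis":
        if VI is None:
            VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
        # d(x, y) = sqrt((x - y)^T VI (x - y)) for every pair of rows
        sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
        return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    raise ValueError(f"unknown metric: {metric!r}")


def smote_like(X_min, n_new, k=5, metric="euclidean", seed=0):
    """SMOTE-style oversampling with a pluggable neighbour metric.

    Each synthetic point is base + gap * (neighbour - base), where base is a
    random minority sample, neighbour is one of its k nearest minority
    samples under `metric`, and gap ~ Uniform[0, 1).
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    D = pairwise_dist(X_min, metric)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(D, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_min = rng.normal(size=(20, 4))  # toy minority class with 4 features
    synth = smote_like(X_min, n_new=50, k=5, metric="mahalanobis")
    print(synth.shape)  # (50, 4)
```

Switching metric="euclidean" changes only which minority samples are paired for interpolation. The Mahalanobis variant accounts for correlations between features and is invariant to rescaling individual features, at the cost of estimating and inverting a covariance matrix. Its extra cost in this toy version is small, so the runtime gap reported above presumably comes from how the full ODBOT procedure uses the metric, which this sketch does not reproduce.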

Copyright © 2026





