Journal of Applied Data Sciences
Vol 6, No 3: September 2025

Enhancing SMOTE Using Euclidean Weighting for Imbalanced Classification Dataset

Ramadhan, Nur Ghaniaviyanto
Maharani, Warih
Gozali, Alfian Akbar
Adiwijaya, Adiwijaya



Article Info

Publish Date
20 Jul 2025

Abstract

Class imbalance is a significant challenge in machine learning classification tasks because it often biases models toward the majority class, resulting in poor detection of minority classes. This study proposes a novel enhancement to the Synthetic Minority Over-sampling Technique (SMOTE), called Weighted SMOTE, that incorporates Euclidean distance-based feature weighting. The key idea is to improve the quality of synthetic minority samples by calculating feature importance using a Random Forest model and assigning higher weights to the most relevant features. The objective of this research is to generate more representative synthetic data, reduce model bias, and increase predictive accuracy on highly imbalanced datasets. Experiments were conducted on four benchmark datasets from the KEEL Repository with imbalance ratios ranging from 0.013 to 0.081. The proposed Weighted SMOTE, combined with an ensemble voting classifier (Random Forest, AdaBoost, and XGBoost), demonstrated significant improvements compared to standard SMOTE and to models without resampling. For example, on the Zoo-3 dataset, the Balanced Accuracy Score (BAS) increased from 75% to 90%, while the F1-score improved from 48% to 94%. On the Cleveland-0_vs_4 dataset, precision improved from 83% to 91% and recall remained high at 99%. Statistical testing using the Wilcoxon signed-rank test confirmed these improvements with p-values < 0.05 for key metrics. The findings show that the proposed method effectively balances sensitivity and precision, generates more meaningful synthetic samples, and reduces the risk of overfitting compared to conventional oversampling. The novelty of this work lies in integrating Euclidean-based feature weighting into the SMOTE process and validating its performance on multiple domains with varying feature types and imbalance ratios. These results indicate that the proposed Weighted SMOTE approach offers a practical solution for improving classification performance and model stability on severely imbalanced data.
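The abstract does not spell out the exact weighting formula, so the following is only a minimal sketch of one plausible reading: neighbours are ranked with a Euclidean distance in which each squared feature difference is scaled by a feature weight, and synthetic points are then interpolated as in standard SMOTE. The function names `weighted_distance` and `weighted_smote`, the parameter `k`, and the hard-coded weights are assumptions made for illustration; in the study the weights are derived from Random Forest feature importance.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_smote(minority, weights, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Neighbours are selected with the weighted distance above, so features
    with larger weights have more influence on which points count as "near".
    This is an illustrative assumption, not the authors' exact procedure.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by weighted distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: weighted_distance(base, p, weights))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features; the weights are hypothetical stand-ins
# for normalized Random Forest feature importances.
minority = [[1.0, 10.0], [1.2, 50.0], [0.9, 90.0], [5.0, 12.0]]
weights = [0.9, 0.1]
new_points = weighted_smote(minority, weights, n_new=5, k=2)
```

Because each synthetic point lies on the segment between two existing minority samples, it stays inside the range of the observed minority data; the weights only change which neighbours are eligible for interpolation.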

Copyrights © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...