
Enhancing SMOTE Using Euclidean Weighting for Imbalanced Classification Dataset
Ramadhan, Nur Ghaniaviyanto; Maharani, Warih; Gozali, Alfian Akbar; Adiwijaya, Adiwijaya
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.798

Abstract

Class imbalance is a significant challenge in machine learning classification tasks because it often biases models toward the majority class, resulting in poor detection of minority classes. This study proposes a novel enhancement to the Synthetic Minority Over-sampling Technique (SMOTE) that incorporates Euclidean distance-based feature weighting, called Weighted SMOTE. The key idea is to improve the quality of synthetic minority samples by calculating feature importance with a Random Forest model and assigning higher weights to the most relevant features. The objective of this research is to generate more representative synthetic data, reduce model bias, and increase predictive accuracy on highly imbalanced datasets. Experiments were conducted on four benchmark datasets from the KEEL Repository with imbalance ratios ranging from 0.013 to 0.081. The proposed Weighted SMOTE, combined with an ensemble voting classifier (Random Forest, AdaBoost, and XGBoost), showed significant improvements over standard SMOTE and over models without resampling. For example, on the Zoo-3 dataset, the Balanced Accuracy Score (BAS) increased from 75% to 90%, while the F1-score improved from 48% to 94%. On the Cleveland-0_vs_4 dataset, precision improved from 83% to 91% and recall remained high at 99%. Statistical testing with the Wilcoxon signed-rank test confirmed these improvements, with p-values < 0.05 for key metrics. The findings show that the proposed method effectively balances sensitivity and precision, generates more meaningful synthetic samples, and reduces the risk of overfitting compared to conventional oversampling. The novelty of this work lies in integrating Euclidean-based feature weighting into the SMOTE process and validating its performance across multiple domains with varying feature types and imbalance ratios.
These results indicate that the proposed Weighted SMOTE approach offers a practical solution for improving classification performance and model stability on severely imbalanced data.
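The core idea described in the abstract, SMOTE whose synthetic samples are shaped by feature weights, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the weights enter SMOTE through a weighted Euclidean distance d(a, b) = sqrt(sum_j w_j (a_j - b_j)^2) used for the k-nearest-neighbour search, and that the weights are supplied by the caller (for example, from scikit-learn's `RandomForestClassifier.feature_importances_`). The function name `weighted_smote`, its parameters, the default `k=5`, and the toy data are hypothetical; the paper may apply the weights differently.

```python
import numpy as np


def weighted_smote(X_min, weights, n_synthetic, k=5, rng=None):
    """Hypothetical sketch: SMOTE with a feature-weighted Euclidean k-NN search.

    X_min       : (n, d) array of minority-class samples
    weights     : (d,) non-negative feature weights with a positive sum
                  (assumed to come from e.g. Random Forest importances);
                  normalised to sum to 1 here
    n_synthetic : number of synthetic samples to generate
    k           : neighbours considered per sample (assumed default)
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = X_min.shape[0]
    k = min(k, n - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    # Pairwise weighted Euclidean distances between minority samples:
    # d(a, b) = sqrt(sum_j w_j * (a_j - b_j)^2)
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((w * diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Standard SMOTE step: pick a base sample, pick one of its k weighted
    # neighbours, and interpolate at a random point on the segment between them.
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Toy minority class: 6 samples, 3 features (illustrative data, not from the paper)
X_min = np.array([[1.0, 2.0, 0.5],
                  [1.2, 1.9, 0.4],
                  [0.9, 2.2, 0.6],
                  [1.1, 2.1, 0.3],
                  [1.3, 1.8, 0.7],
                  [1.0, 2.0, 0.2]])
importances = np.array([0.6, 0.3, 0.1])  # hypothetical feature importances
X_syn = weighted_smote(X_min, importances, n_synthetic=10, k=3, rng=0)
print(X_syn.shape)  # (10, 3)
```

Because each synthetic point lies on the segment between a minority sample and one of its neighbours, it always stays within the range of the minority data. The weighting only changes which neighbours are selected, so features with higher importance dominate what counts as "nearby" when new samples are generated.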