Journal : Journal of Applied Data Sciences

Enhancing SMOTE Using Euclidean Weighting for Imbalanced Classification Dataset
Ramadhan, Nur Ghaniaviyanto; Maharani, Warih; Gozali, Alfian Akbar; Adiwijaya, Adiwijaya
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.798

Abstract

Class imbalance is a significant challenge in machine learning classification because it often biases models toward the majority class, resulting in poor detection of minority classes. This study proposes a novel enhancement to the Synthetic Minority Over-sampling Technique (SMOTE), called Weighted SMOTE, that incorporates Euclidean distance-based feature weighting. The key idea is to improve the quality of synthetic minority samples by estimating feature importance with a Random Forest model and assigning higher weights to the most relevant features. The objective is to generate more representative synthetic data, reduce model bias, and increase predictive accuracy on highly imbalanced datasets. Experiments were conducted on four benchmark datasets from the KEEL Repository with minority-to-majority imbalance ratios ranging from 0.013 to 0.081. Weighted SMOTE combined with an ensemble voting classifier (Random Forest, AdaBoost, and XGBoost) showed significant improvements over standard SMOTE and over models trained without resampling. For example, on the Zoo-3 dataset the Balanced Accuracy Score (BAS) increased from 75% to 90% and the F1-score from 48% to 94%. On the Cleveland-0_vs_4 dataset, precision improved from 83% to 91% while recall remained high at 99%. The Wilcoxon signed-rank test confirmed these improvements, with p-values below 0.05 for the key metrics. The findings show that the proposed method effectively balances sensitivity and precision, generates more meaningful synthetic samples, and reduces the risk of overfitting compared to conventional oversampling. The novelty of this work lies in integrating Euclidean-based feature weighting into the SMOTE process and validating its performance across multiple domains with varying feature types and imbalance ratios.
These results indicate that the proposed Weighted SMOTE approach offers a practical solution for improving classification performance and model stability on severely imbalanced data.
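The abstract does not spell out exactly how the feature weights enter SMOTE, so the following is only a minimal sketch under one plausible reading, not the authors' implementation. It assumes the weights (here a hand-picked list standing in for Random Forest importances) scale each squared feature difference in the Euclidean distance used to find a minority sample's k nearest minority neighbours, and that new samples are then produced by ordinary SMOTE interpolation. The names `weighted_euclidean`, `nearest_neighbours`, and `weighted_smote` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def weighted_euclidean(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance where feature j's squared difference is scaled by w[j]."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))


def nearest_neighbours(i: int, data: List[Vector], w: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples closest to data[i] under the weighted distance (i excluded)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: weighted_euclidean(data[i], data[j], w))
    return others[:k]


def weighted_smote(
    minority: List[Vector],
    weights: Sequence[float],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical SMOTE variant: weights affect only the neighbour search.

    Each synthetic point is base + gap * (neighbour - base) with gap drawn
    uniformly from [0, 1), so it lies on the segment between two real
    minority samples, exactly as in standard SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))                          # random base sample
        j = rng.choice(nearest_neighbours(i, minority, weights, k))  # one of its k neighbours
        gap = rng.random()
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class; the weights are assumed stand-ins for RF importances.
    minority = [[0.0, 0.0], [1.0, 10.0], [5.0, 0.0], [4.0, 9.0]]
    weights = [0.9, 0.1]
    for row in weighted_smote(minority, weights, n_synthetic=3, k=2):
        print([round(v, 3) for v in row])
```

Raising a feature's weight makes differences along that feature count more when choosing neighbours, so synthetic points tend to be interpolated between samples that agree on the important features. With equal weights, sample [0, 0] is closest to [5, 0]; weighting only the first feature makes [1, 10] its nearest neighbour instead. The interpolation step itself is unweighted in this sketch; the paper may apply the weights differently.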

Hybrid Multi-Objective Metaheuristic Machine Learning for Optimizing Pandemic Growth Prediction
Adiwijaya, Adiwijaya; Pane, Syafrial Fachri; Sulistiyo, Mahmud Dwi; Gozali, Alfian Akbar
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.981

Abstract

Pandemic and epidemic events underscore the challenge of balancing health protection, economic resilience, and mobility sustainability. Addressing these multidimensional trade-offs requires adaptive, data-driven decision-support tools. This study proposes a hybrid framework that integrates machine learning with multi-objective optimization to support evidence-based policymaking in outbreak scenarios. Six key indicators (confirmed cases, disease-related mortality, recovery count, exchange rate, stock index, and workplace mobility) were predicted using eight regression models. Among these, the XGBoost Regressor consistently achieved the highest predictive accuracy, outperforming the other approaches in capturing complex temporal and socioeconomic dynamics. To enhance interpretability, we developed SHAPPI, a novel method that combines Shapley Additive Explanations (SHAP) with Permutation Importance (PI). SHAPPI produces stable and meaningful feature rankings, identifying immunization coverage and transit station activity as the most influential factors across all domains. These importance scores were then embedded into the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to construct Pareto-optimal solutions. The optimization results expose transparent trade-offs among health outcomes, economic fluctuations, and mobility changes, allowing policymakers to evaluate competing priorities systematically and design balanced intervention strategies. The findings confirm that the framework balances predictive performance, interpretability, and optimization while providing a practical decision-support tool for epidemic management. Its generalizable design allows adaptation to diverse geographic and epidemiological contexts. Overall, this research highlights the potential of hybrid machine learning and metaheuristic approaches to improve preparedness and policymaking in future health and socioeconomic crises.
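NSGA-II ranks candidate solutions by Pareto dominance before applying crowding distance, selection, and genetic operators. The sketch below illustrates only that ranking step, using simple repeated peeling rather than the bookkeeping of the fast non-dominated sort, and assumes every objective is minimised. The three objective values per candidate (read as hypothetical health, economic, and mobility costs) are invented for illustration, and because the abstract does not say how the SHAPPI scores enter the objectives, that part is not modelled. The names `dominates` and `non_dominated_sort` are hypothetical.

```python
from typing import List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Objectives]) -> List[List[int]]:
    """Split point indices into successive Pareto fronts (front 0 = Pareto-optimal).

    Repeatedly removes the points that no remaining point dominates. This always
    terminates because dominance is a strict partial order, so every non-empty
    set has at least one non-dominated member.
    """
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


if __name__ == "__main__":
    # Hypothetical (health cost, economic cost, mobility loss) per policy candidate.
    candidates = [[1, 5, 3], [2, 2, 2], [3, 1, 4], [2, 3, 3], [4, 4, 4]]
    print(non_dominated_sort(candidates))  # [[0, 1, 2], [3], [4]]
```

Candidates 0, 1, and 2 each win on at least one objective, so none dominates another and together they form the Pareto front; candidate 3 is dominated by candidate 1, and candidate 4 by everything else. A policymaker would then choose among the front-0 candidates according to which trade-off is acceptable.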