Solving Simulated Imbalanced Body Performance Data using A-SUWO and Tomek Link Algorithm
Febryan Grady; Joel Rizky Wahidiyat; Abba Suganda Girsang
Journal of Applied Engineering and Technological Science (JAETS), Vol. 6 No. 2 (2025)
Publisher: Yayasan Riset dan Pengembangan Intelektual (YRPI)

DOI: 10.37385/jaets.v6i2.4738

Abstract

This research examines the impact of various sampling techniques on the performance of classification models on imbalanced datasets, using the body performance dataset as a case study. Many studies in this field analyze the effect of sampling techniques on model performance; however, they often begin with an already imbalanced dataset and so lack a balanced baseline for comparison. This research addresses that gap by simulating an imbalanced dataset from an originally balanced one, which provides a reference point for evaluating the effectiveness of the sampling methods. The dataset is prepared in three versions: (1) the original balanced distribution, (2) a simulated imbalanced distribution, and (3) a resampled dataset produced by various data sampling techniques, including oversampling with Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO), undersampling with Tomek Link, and hybrid sampling combining both techniques. The primary objective is to identify which sampling techniques yield model performance that most closely matches the performance observed on the original balanced dataset. Across all experiments using Decision Tree, Random Forest, and K-Nearest Neighbors (KNN) as classifiers, both A-SUWO and Tomek Link led to overfitting, as indicated by a discernible gap between training and testing accuracy, averaging 0.21304. Despite the overfitting and general performance issues, undersampling with Tomek Link obtained the highest average test accuracy (0.65023), outperforming A-SUWO (0.62883) and the hybrid approach (0.63568). These findings highlight the importance of choosing appropriate sampling techniques and of optimizing model performance on imbalanced datasets.
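
To make the data-preparation steps concrete, the following is a minimal, dependency-free Python sketch of two of them: simulating an imbalanced dataset from a balanced one, and undersampling with Tomek links. It is not the authors' code. The function names, the `keep_fraction` values, and the rule of dropping only the member of a link that belongs to the larger class are assumptions made for illustration. A-SUWO is omitted because it is considerably more involved.

```python
import math
import random
from collections import Counter


def simulate_imbalance(X, y, keep_fraction, seed=0):
    """Subsample each class to a given fraction of its rows.

    keep_fraction maps a class label to the share of rows to keep;
    classes not listed are kept in full. The fractions are hypothetical.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    kept = []
    for label, indices in by_class.items():
        n_keep = max(1, round(len(indices) * keep_fraction.get(label, 1.0)))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def nearest_neighbor(X, i):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_link_undersample(X, y):
    """Drop the larger-class member of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different
    labels. If both classes are equally large, neither point is dropped
    (an assumption of this sketch).
    """
    counts = Counter(y)
    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and y[i] != y[j]:
            if counts[y[i]] > counts[y[j]]:
                drop.add(i)
            elif counts[y[j]] > counts[y[i]]:
                drop.add(j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy one-dimensional data: the points at 2.0 ("a") and 2.4 ("b") are
# mutual nearest neighbours with different labels, so they form a Tomek link.
X_toy = [(0.0,), (1.5,), (2.0,), (2.4,), (6.0,)]
y_toy = ["a", "a", "a", "b", "b"]
X_clean, y_clean = tomek_link_undersample(X_toy, y_toy)
# Class "a" is larger, so its member of the link (2.0) is removed.
```

The brute-force neighbour search is quadratic in the number of rows, which is acceptable for a sketch but not for a full dataset; in practice a library implementation such as `TomekLinks` from imbalanced-learn would be the more usual choice.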