Claim Missing Document
Check
Articles

Found 1 Documents
Search
Journal : Jurnal Teknik Informatika (JUTIF)

Comparative Analysis of Data Balancing Techniques for Machine Learning Classification on Imbalanced Student Perception Datasets Saekhu, Ahmad; Berlilana, Berlilana; Saputra, Dhanar Intan Surya
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 2 (2025): JUTIF Volume 6, Number 2, April 2025
Publisher : Informatika, Universitas Jenderal Soedirman

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.52436/1.jutif.2025.6.2.4286

Abstract

Class imbalance is a common challenge in machine learning classification tasks, often leading to biased predictions toward the majority class. This study evaluates the effectiveness of various machine learning algorithms combined with advanced data balancing techniques in addressing class imbalance in a dataset collected from Class XI students of SMK Ma'arif 1 Kebumen. The dataset, comprising 300 instances and 36 features, includes textual attributes, demographic information, and sentiment labels categorized as Positive, Neutral, and Negative. Preprocessing steps included text cleaning, target encoding, handling missing data, and vectorization. Four sampling techniques—SMOTE, SMOTE + Tomek Links, ADASYN, and SMOTE + ENN—were applied to the training data to create balanced datasets. Nine machine learning algorithms, including CatBoost, Extra Trees, Random Forest, Gradient Boosting, and others, were evaluated using four train-test splits (60:40, 70:30, 80:20, and 90:10). Model performance was assessed using metrics such as accuracy, precision, recall, F1-score, and AUC- ROC. The results demonstrate that SMOTE + Tomek Links is the most effective balancing technique, achieving the highest accuracy when paired with ensemble algorithms like Extra Trees and Random Forest. CatBoost also delivered competitive performance, showcasing its adaptability in imbalanced scenarios. The 90:10 train-test split consistently yielded the best results, emphasizing the importance of adequate training data for model generalization. This study highlights the critical role of data balancing techniques and robust algorithms in optimizing classification performance for imbalanced datasets and provides a framework for future research in similar contexts.