Garuda - Garba Rujukan Digital

P-Index (2021-2026): 0.817

Journals in which this author has published:
Building of Informatics, Technology and Science
IJIIS: International Journal of Informatics and Information Systems
Jurnal Teknik Informatika (JUTIF)
International Journal for Applied Information Management

Saekhu, Ahmad

Unknown Affiliation

Author-ID : 8287641

Subject areas: Humanities; Computer Science & IT; Decision Sciences, Operations Research & Management; Economics, Econometrics & Finance; Environmental Science; Social Sciences

Published: 4 documents

Articles

Comparative Analysis of Data Balancing Techniques for Machine Learning Classification on Imbalanced Student Perception Datasets
Saekhu, Ahmad; Berlilana, Berlilana; Saputra, Dhanar Intan Surya
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 2 (2025): JUTIF Volume 6, Number 2, April 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.2.4286

Class imbalance is a common challenge in machine learning classification tasks, often leading to predictions biased toward the majority class. This study evaluates how well various machine learning algorithms, combined with data balancing techniques, address class imbalance in a dataset collected from Class XI students of SMK Ma'arif 1 Kebumen. The dataset, comprising 300 instances and 36 features, includes textual attributes, demographic information, and sentiment labels categorized as Positive, Neutral, and Negative. Preprocessing steps included text cleaning, target encoding, missing-data handling, and vectorization. Four sampling techniques (SMOTE, SMOTE + Tomek Links, ADASYN, and SMOTE + ENN) were applied to the training data to create balanced datasets. Nine machine learning algorithms, including CatBoost, Extra Trees, Random Forest, and Gradient Boosting, were evaluated using four train-test splits (60:40, 70:30, 80:20, and 90:10). Model performance was assessed using accuracy, precision, recall, F1-score, and AUC-ROC. The results show that SMOTE + Tomek Links is the most effective balancing technique, achieving the highest accuracy when paired with ensemble algorithms such as Extra Trees and Random Forest. CatBoost also delivered competitive performance, showing its adaptability in imbalanced scenarios. The 90:10 train-test split consistently yielded the best results, emphasizing the importance of adequate training data for model generalization. This study highlights the critical role of data balancing techniques and robust algorithms in optimizing classification performance on imbalanced datasets and provides a framework for future research in similar contexts.
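The abstract names SMOTE + Tomek Links as the best-performing balancing technique and notes that resampling was applied to the training data only. The following is a minimal pure-Python sketch of that idea, assuming numeric feature vectors. It is not the authors' code and not the imbalanced-learn implementation. The function names (`smote`, `tomek_links`, `smote_tomek`), the neighbour count `k=3`, the choice to drop both samples of each Tomek link, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smote(X, y, k=3, seed=0):
    """Hypothetical SMOTE sketch: oversample every smaller class up to the
    majority-class count by interpolating between a random class member and
    one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)  # originals are kept first
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        if n >= target or n < 2:
            continue  # already majority-sized, or no neighbour to interpolate with
        for _ in range(target - n):
            i = rng.choice(idx)
            # k nearest neighbours of sample i within its own class
            neighbours = sorted((j for j in idx if j != i),
                                key=lambda m: euclidean(X[i], X[m]))[:k]
            j = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment X[i] -> X[j]
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(cls)
    return X_out, y_out


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    but carry different labels (Tomek links)."""
    nearest = [min((j for j in range(len(X)) if j != i),
                   key=lambda m: euclidean(X[i], X[m]))
               for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and y[i] != y[j]}


def smote_tomek(X, y, k=3, seed=0):
    """Oversample with SMOTE, then drop both samples of every Tomek link
    (an assumption of this sketch; other variants remove only one side)."""
    X_res, y_res = smote(X, y, k=k, seed=seed)
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical toy *training* split (2 features, 3 sentiment labels); not the study's data.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.15, 0.25], [0.05, 0.3],
           [1.0, 1.0], [1.1, 0.9], [0.35, 0.35],
           [2.0, 2.0], [2.1, 2.2], [1.9, 2.1]]
y_train = ["Positive"] * 6 + ["Neutral"] * 3 + ["Negative"] * 3

X_bal, y_bal = smote(X_train, y_train)
X_clean, y_clean = smote_tomek(X_train, y_train)
print(Counter(y_bal))    # every class now has 6 samples
print(Counter(y_clean))  # may be smaller after Tomek-link removal
```

Only the training split is passed to the resampler, so no synthetic points leak into the held-out evaluation data. In practice, a maintained library such as imbalanced-learn (`imblearn.combine.SMOTETomek`) would normally be used instead of a hand-written sketch like this one.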

Co-Authors: Berlilana Berlilana; Dhanar Intan Surya Saputra; Eko Priyanto; Prasetyo, Priyo Agung