Budi Warsito
Department of Software Engineering Technology, Indonusa Polytechnic

Dropout Prediction Using KNN, Decision Tree, Naive Bayes, and Ensemble Learning: A Comparative Performance Analysis with Synthetic Data Validation
Norma Puspitasari; Mochammad Agung Wibowo; Budi Warsito
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol. 15 No. 02 (2026): MAY
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v15i02.2591

Abstract

Student dropout is a critical issue in higher education because it affects institutional performance, resource allocation, and student success. Early identification of students with a high risk of dropout enables institutions to design timely academic and non-academic interventions. However, predicting dropout is challenging due to the complexity of influencing factors and class imbalance in educational data. This study presents a comparative performance analysis of four machine learning algorithms—K-Nearest Neighbor (KNN), Decision Tree (DT), Naive Bayes (NB), and an Ensemble Weighted Voting classifier—to support the development of an effective dropout prediction model. Due to restricted access to complete non-dropout student records, this study integrates real institutional withdrawal data from 2023–2024 to calibrate dropout characteristics and employs a transparently generated synthetic dataset for methodological validation. The dataset consists of 300 instances and is processed using the SMOTE technique to address class imbalance. Model performance is evaluated using accuracy, precision, recall, F1-score, and AUC. The experimental results obtained from synthetic validation indicate that the ensemble model outperforms individual classifiers, achieving an accuracy of 0.97, precision of 1.00, recall of 0.86, F1-score of 0.92, and AUC of 0.93. These findings highlight the potential of ensemble learning as a robust approach for early-warning systems in higher education while providing a transparent framework for predictive modeling under data-access constraints.
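
The abstract does not say how the Ensemble Weighted Voting classifier combines its base models, so the following is only a minimal sketch of one common variant, weighted soft voting, in which each model's predicted dropout probability is averaged using per-model weights. The probabilities, the weights, the 0.5 threshold, and the function names are all hypothetical and are not taken from the study. The last lines check that the reported F1-score is consistent with the reported precision and recall.

```python
from typing import List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Return the weighted average dropout probability for each student.

    probas[m][i] is model m's probability that student i drops out.
    """
    total = sum(weights)
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-model dropout probabilities for four students.
knn_proba = [0.8, 0.4, 0.2, 0.6]
tree_proba = [1.0, 0.0, 0.0, 1.0]
nb_proba = [0.7, 0.6, 0.1, 0.3]

# Hypothetical weights (assumption: the decision tree is trusted twice as much).
weights = [1.0, 2.0, 1.0]

scores = weighted_soft_vote([knn_proba, tree_proba, nb_proba], weights)

# Assumed decision threshold of 0.5: 1 = predicted dropout, 0 = predicted retained.
labels = [int(s >= 0.5) for s in scores]

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_score(precision=1.00, recall=0.86), 2)
```

With these illustrative inputs, the second student is flagged by Naive Bayes alone (0.6) but is not flagged by the ensemble, because the other two models outvote it. In a real pipeline the weights would usually be derived from validation performance rather than set by hand.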