The high failure rate in Python programming courses has become a serious issue for educational institutions. This study evaluates four machine learning algorithms as the basis of an Early Warning System for predicting student graduation: Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN). The dataset consists of 3,000 records with 15 features covering demographic data, programming experience, and students' learning activities. Each model's hyperparameters were tuned with GridSearchCV using 5-fold cross-validation, and performance was then measured with accuracy, precision, recall, F1-score, and ROC-AUC. Random Forest achieved the best performance, with an accuracy of 89.33%, precision of 87.50%, recall of 46.23%, F1-score of 60.49%, and ROC-AUC of 94.40%, outperforming SVM (accuracy 86.33%, F1-score 55.43%), Logistic Regression (accuracy 86.50%, F1-score 53.71%), and KNN (accuracy 84.83%, F1-score 44.17%). Feature importance analysis identified experience_encoded, hours_spent_learning_per_week, and projects_completed as the three strongest predictors of student graduation. These findings provide empirical evidence that Random Forest is the most effective of the four algorithms evaluated for implementing an Early Warning System in Python programming courses, enabling instructors to identify at-risk students early and provide timely interventions to improve learning success rates.
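The evaluation procedure described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's actual code: the data is a synthetic stand-in generated with `make_classification` (matching only the 3,000 × 15 shape), and the 80/20 split, the class imbalance, the parameter grids, the feature scaling, and the use of F1 as the tuning criterion are all hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in data with the same shape as the study's
# dataset (3,000 records, 15 features); the real data is not reproduced here.
X, y = make_classification(n_samples=3000, n_features=15, n_informative=6,
                           weights=[0.8, 0.2], random_state=42)

# ASSUMPTION: a stratified 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# ASSUMPTION: small illustrative grids; the study's grids are not given.
# Distance- and margin-based models are wrapped in a scaling pipeline.
candidates = {
    "RF": (RandomForestClassifier(random_state=42),
           {"n_estimators": [100, 200], "max_depth": [None, 10]}),
    "SVM": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1, 10]}),
    "LR": (make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
           {"logisticregression__C": [0.1, 1, 10]}),
    "KNN": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [3, 5, 11]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)

    y_pred = search.predict(X_test)
    # ROC-AUC needs a continuous score: use class probabilities when the
    # tuned model exposes them, otherwise the decision function (SVC).
    if hasattr(search, "predict_proba"):
        y_score = search.predict_proba(X_test)[:, 1]
    else:
        y_score = search.decision_function(X_test)

    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Because the data is synthetic, the printed scores will not match the figures reported above; the sketch only shows how the five metrics can be computed for each tuned model on a held-out split.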
Copyright © 2026