This study analyzes factors influencing students’ final project completion status in a higher education context using six classification models: C4.5, Random Forest (RF), C4.5 with SMOTE, RF with SMOTE, Cost-Sensitive Random Forest (RF-CS), and Cost-Sensitive C4.5 (C4.5-CS). The dataset consists of 1,017 student records labeled either Ideal or Tidak Ideal (Not Ideal), with a severe class imbalance: the minority Tidak Ideal class represents only 16.49% of the data. The results indicate that the baseline models achieved high overall accuracy but were of limited use in identifying the minority Tidak Ideal class. SMOTE-based models improved minority-class recall but produced more false positives, highlighting a trade-off between recall and precision. In contrast, cost-sensitive learning produced the most substantial improvement in minority-class detection. Among all evaluated models, Cost-Sensitive Random Forest demonstrated the most balanced performance, substantially reducing false-negative errors while maintaining reasonable overall accuracy. These findings indicate that algorithm-level cost-sensitive approaches are more effective than oversampling techniques for handling severe class imbalance in educational datasets. The proposed model provides a reliable basis for early identification of students at risk of delayed final project completion and supports data-driven academic decision-making.
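
The recall and precision trade-off described above can be illustrated with a minimal sketch. It assumes Tidak Ideal is treated as the positive class; the `Confusion` type, the `minority_metrics` helper, and all counts are hypothetical and chosen only for illustration. They are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts with the minority class as positive."""
    tp: int  # minority records correctly flagged
    fp: int  # majority records wrongly flagged as minority
    fn: int  # minority records missed
    tn: int  # majority records correctly left unflagged


def minority_metrics(c: Confusion) -> dict:
    """Return accuracy plus minority-class recall and precision."""
    total = c.tp + c.fp + c.fn + c.tn
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Hypothetical counts (NOT from the study): a baseline that misses most
# minority cases, and a rebalanced model that catches more of them at the
# cost of extra false positives.
baseline = Confusion(tp=10, fp=5, fn=30, tn=155)
rebalanced = Confusion(tp=30, fp=40, fn=10, tn=120)

for name, counts in [("baseline", baseline), ("rebalanced", rebalanced)]:
    metrics = minority_metrics(counts)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these illustrative counts, the baseline has the higher accuracy and precision but misses most minority cases, while the rebalanced model recovers more of them and raises more false alarms. This is why accuracy alone can be misleading on imbalanced data.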
Copyrights © 2025