Government regulations that heavily influence graduation decisions often produce imbalanced data, which obscures how well machine learning models actually perform in the education sector. This study evaluates the Naïve Bayes algorithm against Decision Tree and K-NN on a dataset of 385 students from SD Negeri 067053 Medan Deli with extreme label imbalance (the “Pass” class accounts for 88% of records). Models were evaluated using Stratified 10-Fold Cross Validation. Naïve Bayes achieved an accuracy of 94.04% and was the most robust at identifying the minority class, with a recall of 91.11%, outperforming the comparison algorithms, which suffered from overfitting. However, this high accuracy masked an administrative bias: the precision of Naïve Bayes on the “Fail” class fell to 68.33%, meaning that roughly one in three “Fail” predictions was wrong. The study confirms that accuracy alone can be misleading on imbalanced data, and that resampling during data preprocessing is necessary to address bias in educational data mining.
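The gap between accuracy and minority-class precision can be illustrated with a small worked calculation. The sketch below is not the study's confusion matrix: the four counts are hypothetical, chosen only so that they sum to 385 and give ratios close to the figures reported above, and the function name `binary_metrics` is likewise an assumption made for illustration. "Fail" is treated as the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall for the positive ("Fail") class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positive predictions are made.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts (NOT from the study): 45 actual "Fail", 340 actual "Pass".
metrics = binary_metrics(tp=41, fp=19, fn=4, tn=321)

# A trivial baseline that predicts "Pass" for every student.
baseline = binary_metrics(tp=0, fp=0, fn=45, tn=340)

for name, result in (("classifier", metrics), ("always-Pass", baseline)):
    print(name, {k: round(v * 100, 2) for k, v in result.items()})
```

Under these assumed counts the classifier's accuracy is about 94% while its "Fail" precision is about 68%, and the always-"Pass" baseline still reaches about 88% accuracy with zero recall on the minority class. This is why accuracy by itself says little on data this imbalanced.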
Copyrights © 2026