Predicting academic performance is an important part of data-driven decision making in education, particularly in primary schools, where early identification of learning difficulties is crucial. This study compares Linear Regression and Random Forest Regression for predicting students’ academic performance using an Educational Data Mining approach. The experiment uses the Students Performance Dataset from Kaggle, which consists of 1000 student records with eight predictor variables, including demographic and learning-related attributes. The target variable is the average of the mathematics, reading, and writing scores. Models are developed and evaluated in Python on Google Colaboratory. Performance is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²), and Random Forest is additionally tuned using GridSearchCV with 5-fold cross-validation. Linear Regression achieves the best performance (R² = 0.162, RMSE = 13.40, MAE = 10.49), outperforming both the default Random Forest (R² ≈ 0.000) and the tuned Random Forest (R² ≈ 0.112). The explained variance is low for all three models, which indicates that simple demographic features provide limited predictive power for academic performance. A case study using a local dataset of 132 sixth-grade students from a private primary school likewise finds that Linear Regression outperforms Random Forest on small and simple educational datasets. These results highlight the importance of matching model selection to dataset characteristics in educational data mining.
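The comparison described above can be sketched in scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data are synthetic stand-ins generated in place of the Kaggle file, and the column names, the 80/20 split, the hyperparameter grid, and the helper names (`make_synthetic_students`, `evaluate`, `compare_models`) are all hypothetical. The metric values it produces are therefore unrelated to the figures reported in the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def make_synthetic_students(n=1000, seed=0):
    """Generate stand-in data (NOT the Kaggle dataset) with a noisy target."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "gender": rng.choice(["female", "male"], n),
        "lunch": rng.choice(["standard", "free/reduced"], n),
        "test_prep": rng.choice(["none", "completed"], n),
    })
    # Assumed weak categorical effects plus large noise.
    score = (
        60
        + 8 * (df["lunch"] == "standard")
        + 6 * (df["test_prep"] == "completed")
        + rng.normal(0, 13, n)
    )
    df["average_score"] = score.clip(0, 100)
    return df


def evaluate(model, X_test, y_test):
    """Return MAE, RMSE and R² for a fitted model on held-out data."""
    pred = model.predict(X_test)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }


def compare_models(df, seed=42):
    """Fit Linear Regression, default RF and grid-searched RF; report metrics."""
    # One-hot encode the categorical predictors.
    X = pd.get_dummies(df.drop(columns="average_score"), drop_first=True).astype(float)
    y = df["average_score"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed  # split ratio is an assumption
    )

    linear = LinearRegression().fit(X_train, y_train)
    forest = RandomForestRegressor(random_state=seed).fit(X_train, y_train)

    # Hypothetical grid; the grid used in the study is not given in the abstract.
    grid = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [2, 4]},
        cv=5,
        scoring="neg_root_mean_squared_error",
    )
    grid.fit(X_train, y_train)

    return {
        "linear": evaluate(linear, X_test, y_test),
        "forest_default": evaluate(forest, X_test, y_test),
        "forest_tuned": evaluate(grid.best_estimator_, X_test, y_test),
    }
```

Calling `compare_models(make_synthetic_students())` returns one dictionary of MAE, RMSE, and R² per model. To run the sketch on real data, replace the synthetic frame with a DataFrame that contains the predictor columns and an `average_score` column.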
Copyrights © 2026