Moi, Sim Hiew
Unknown Affiliation

Published: 1 document
Articles

Journal: JOIV: International Journal on Informatics Visualization

The Effects of Imbalanced Datasets on Machine Learning Algorithms in Predicting Student Performance
Sujon, Khaled Mahmud; Hassan, Rohayanti; Khairudin, Alif Ridzuan; Moi, Sim Hiew; Mohd Shafie, Muhammad Luqman; Saringat, Zainuri; Erianda, Aldo
JOIV: International Journal on Informatics Visualization, Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher: Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.2449

Abstract

Predictive analytics technologies are becoming increasingly popular in higher education institutions. Students' grades are one of the most critical performance indicators that educators can use to gauge and predict academic achievement. Over the last several decades, academics have developed numerous techniques and machine learning approaches for predicting student grades. Despite this work, a practical model is still lacking, particularly when the data are imbalanced. This study examines the impact of imbalanced datasets on the accuracy and reliability of machine learning models in predicting student performance. First, it compares the performance of two popular machine learning algorithms, Logistic Regression (LR) and Random Forest (RF), in predicting student grades. Second, it examines how imbalanced datasets affect these algorithms' performance metrics and generalization capabilities. Results indicate that RF, with an accuracy of 98%, outperforms LR, which achieved 91% accuracy. Furthermore, the performance of both models is significantly affected by class imbalance: LR struggles to predict minority classes accurately, and RF also has difficulty, though to a lesser extent. Addressing class imbalance is therefore crucial, because it directly affects model bias and prediction accuracy. This is especially important for higher education institutions aiming to improve the accuracy of student grade predictions, and it underscores the need for balanced datasets to build robust predictive models.
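The abstract's point that a high overall accuracy can hide poor performance on minority classes can be illustrated with a small, standard-library-only sketch. This is not the authors' code or data: the grade labels, the "always predict the majority class" classifier, and the helper names (`per_class_recall`, `macro_recall`, `random_oversample`) are hypothetical, and random oversampling is shown only as one common rebalancing technique, not necessarily the one used in the study.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted members / all true members of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recalls (balanced accuracy): every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(members))
            y_out.append(cls)
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical grade labels (toy data, NOT from the study): 90 pass, 8 merit, 2 fail.
    y_true = ["pass"] * 90 + ["merit"] * 8 + ["fail"] * 2
    # A degenerate "model" that always predicts the majority class.
    y_pred = ["pass"] * len(y_true)

    print(accuracy(y_true, y_pred))          # 0.9, which looks good
    print(per_class_recall(y_true, y_pred))  # {'fail': 0.0, 'merit': 0.0, 'pass': 1.0}
    print(macro_recall(y_true, y_pred))      # about 0.333, exposing the minority-class failure

    # Placeholder feature rows (row indices) just to demonstrate rebalancing.
    X = list(range(len(y_true)))
    X_bal, y_bal = random_oversample(X, y_true)
    print(Counter(y_bal))  # each class now has 90 rows
```

In a real pipeline, oversampling should be applied only to the training split, so that duplicated rows do not leak into the evaluation set. As an alternative to resampling, scikit-learn's `LogisticRegression` and `RandomForestClassifier` both accept `class_weight="balanced"`, which reweights classes inversely to their frequency.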