Academic data plays a central role in supporting decision-making in educational institutions. However, successfully applying machine learning to analyze academic data and make predictions from it depends heavily on the quality and readability of that data. To fully harness the potential of machine learning, careful preprocessing of academic data is essential. This research aims to design and implement three preprocessing techniques for academic data: imputation, winsorizing, and dropping duplicate records. To handle missing values, the Multivariate Imputation by Chained Equations (MICE) method is used with three different algorithms: linear regression, random forest, and k-nearest neighbors (KNN). The accuracy of these three algorithms in predicting missing values is then compared. Additionally, winsorizing is applied to outliers, and data duplication is addressed by dropping the duplicate records. Based on testing with evaluation metrics, these preprocessing techniques can improve model performance by 0.037 in MAE, 0.11 in RMSE, and 0.006 in MSE. The processed data allows the model to function more optimally and produce more reliable results.
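The three preprocessing steps named above can be illustrated with a minimal, standard-library-only Python sketch. It is not the study's implementation: the chained-equations step here is a simplified stand-in that uses only linear regression on two numeric columns (the random forest and KNN variants are omitted), and the function names `chained_impute`, `winsorize`, `drop_duplicates`, and `error_metrics`, as well as the `limit` and `n_iter` parameters, are hypothetical names chosen for illustration.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Simplified chained-equations imputation for two numeric columns.

    rows is a list of [x, y] pairs with None marking a missing value.
    Assumption: linear regression only, deterministic predictions.
    """
    missing = [[v is None for v in row] for row in rows]
    # Step 1: fill every gap with its column mean as a starting guess.
    means = [statistics.fmean([r[j] for r in rows if r[j] is not None])
             for j in range(2)]
    data = [[means[j] if r[j] is None else float(r[j]) for j in range(2)]
            for r in rows]
    # Step 2: cycle over the columns, regressing each on the other and
    # overwriting only the originally missing cells with predictions.
    for _ in range(n_iter):
        for target in range(2):
            other = 1 - target
            observed = [i for i in range(len(data)) if not missing[i][target]]
            a, b = fit_line([data[i][other] for i in observed],
                            [data[i][target] for i in observed])
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = a + b * data[i][other]
    return data


def winsorize(values, limit=0.1):
    """Clip the k lowest and k highest values, where k = int(limit * n)."""
    ordered = sorted(values)
    k = int(limit * len(ordered))
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(v, low), high) for v in values]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def error_metrics(actual, predicted):
    """Return MAE, MSE, and RMSE for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = statistics.fmean([e * e for e in errors])
    return {"MAE": statistics.fmean([abs(e) for e in errors]),
            "MSE": mse,
            "RMSE": math.sqrt(mse)}
```

In this sketch, comparing imputation algorithms would amount to hiding some known values, imputing them with each candidate, and passing the true and imputed values to `error_metrics`; lower MAE, MSE, and RMSE indicate a better imputer.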
Copyrights © 2026