This study aims to optimize a machine learning model that predicts the corrosion inhibition effectiveness of N-heterocyclic compounds. The main challenge in this modelling is the small dataset, because collecting experimental data is costly and time-consuming. To overcome this problem, this research uses Kernel Density Estimation (KDE) as a data augmentation technique, generating virtual samples that increase dataset diversity and improve predictive performance. The developed dataset includes 11 relevant chemical features, such as HOMO energy, LUMO energy, and the energy gap. Linear models (multiple linear regression (MLR), Ridge, Lasso, and ElasticNet) and non-linear models (KNR, Random Forest, Gradient Boosting, AdaBoost, and XGBoost) were evaluated using the root mean squared error (RMSE) and the coefficient of determination (R²). The results show that KDE-based data augmentation improves prediction accuracy and stability, especially for non-linear models such as Random Forest and XGBoost. KDE can therefore be recommended as an augmentation method for similar studies in which additional data are needed to improve prediction accuracy.

Keywords: Machine Learning, Kernel Density Estimator (KDE), Corrosion Inhibitor, Dataset
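The KDE augmentation step summarized above can be illustrated with a minimal Python sketch. The abstract does not state the kernel, bandwidth, number of virtual samples, or whether the target is sampled together with the features, so this sketch assumes a Gaussian kernel fitted to the joint standardized (features, target) distribution with an illustrative bandwidth of 0.5, and it uses scikit-learn's `KernelDensity`. The helper name `kde_augment` and the random stand-in data are hypothetical and are not the study's dataset or exact procedure. Only the training split is augmented, so the model is still scored on real held-out samples with RMSE and R².

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler


def kde_augment(X, y, n_virtual, bandwidth=0.5, random_state=0):
    """Hypothetical helper: draw virtual samples from a Gaussian KDE fitted
    to the joint (features, target) distribution. Bandwidth is an assumption."""
    data = np.column_stack([X, y])
    scaler = StandardScaler().fit(data)  # one bandwidth only makes sense on a common scale
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(scaler.transform(data))
    sampled = kde.sample(n_virtual, random_state=random_state)
    virtual = scaler.inverse_transform(sampled)  # back to original units
    return virtual[:, :-1], virtual[:, -1]


# Hypothetical stand-in data: 11 descriptors (e.g. HOMO, LUMO, energy gap)
# and an inhibition-efficiency target. Replace with the real experimental data.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 11))
y = 80 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=40)

# Hold out real samples for testing; augment only the training split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_virt, y_virt = kde_augment(X_train, y_train, n_virtual=300)
X_aug = np.vstack([X_train, X_virt])
y_aug = np.concatenate([y_train, y_virt])

# Compare a non-linear model trained with and without virtual samples.
results = {}
for name, (Xt, yt) in {"original": (X_train, y_train), "KDE-augmented": (X_aug, y_aug)}.items():
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xt, yt)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    results[name] = (rmse, r2)
    print(f"{name}: RMSE={rmse:.3f}, R2={r2:.3f}")
```

Fitting the density jointly keeps each virtual sample's target consistent with its features. Sampling features alone would leave the virtual samples without labels. The numbers printed for the random stand-in data say nothing about the study's reported results.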
Copyright © 2025