JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 6 (2025): December 2025

Enhancing the Predictive Accuracy of Corrosion Inhibition Efficiency Using Gradient Boosting with Feature Engineering and Gaussian Mixture Model

Amri, Sahrul
Akrom, Muhamad
Trisnapradika, Gustina Alfa



Article Info

Publish Date
15 Dec 2025

Abstract

The development of quantitative structure-property relationship (QSPR) models for predicting corrosion inhibition efficiency (IE) is often hampered by small datasets, which raise the risk of overfitting and make performance estimates less reliable. This research develops a leakage-free modeling framework that combines per-fold preprocessing, augmentation of the training data only, and rigorous Leave-One-Out Cross-Validation (LOOCV). A set of 20 pyridazine derivatives was characterized by 12 quantum-chemical descriptors, including HOMO, LUMO, ΔE, dipole moment, electronegativity, hardness, softness, and the electron-transfer fraction. An initial assessment showed that all baseline models trained without augmentation (Gradient Boosting, Random Forest, SVR, and XGBoost) had limited predictive power (R² < 0.20), reflecting the limited information content of the dataset. To enrich the representation of the feature space, a multi-scale Gaussian Mixture Model (GMM) was used to generate chemically valid synthetic samples, with all components fitted solely on the training subset of each LOOCV fold. This strategy consistently improved model performance. The two best configurations, XGBoost + GMM v2 and Random Forest + GMM v3, reached R² values of 0.4457 and 0.4108, respectively, together with marked decreases in RMSE, MAE, and MAPE. These findings indicate that GMM-based generative augmentation captures the multi-cluster structure of the descriptor space while expanding the chemical variability domain in a controlled way. Although the resulting R² values remain inadequate for high-precision quantitative prediction, the proposed methodology provides a solid basis for early-stage evaluation of corrosion inhibitors when data are limited.
Future research will aim to integrate advanced DFT-derived descriptors and molecular graph representations, and to validate the models on larger external datasets to improve generalizability.
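The abstract does not give implementation details, so the following is only a minimal sketch, under stated assumptions, of the leakage-free protocol it describes: inside each LOOCV fold the scaler and a GMM are fitted on the training molecules only, synthetic samples drawn from the GMM are added to the training set, and the held-out molecule is predicted by a model that never saw it. Assumptions not taken from the source: scikit-learn as the library; a single two-component GMM fitted on the joint space of standardized descriptors and standardized IE (the paper's "multi-scale" GMM v2/v3 variants are not reproduced); clipping samples to the training range as a crude stand-in for the chemical-validity check; Gradient Boosting as the example regressor; and random placeholder data instead of the pyridazine dataset. The helper names `loocv_with_gmm_augmentation` and `report` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler


def loocv_with_gmm_augmentation(X, y, n_components=2, n_synthetic=40, seed=0):
    """Hypothetical helper: LOOCV where scaler and GMM see the training fold only."""
    preds = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # 1) Per-fold preprocessing: statistics come from the training fold only.
        scaler = StandardScaler().fit(X_tr)
        Xs_tr = scaler.transform(X_tr)
        y_mu, y_sd = y_tr.mean(), y_tr.std()
        Z_tr = np.column_stack([Xs_tr, (y_tr - y_mu) / y_sd])
        # 2) Training-only augmentation (assumption): a GMM on the joint
        #    (descriptors, IE) space, so each synthetic row also gets a label.
        gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                              reg_covar=1e-3, random_state=seed).fit(Z_tr)
        Z_syn, _ = gmm.sample(n_synthetic)
        # Crude validity filter (assumption): clip samples to the training range.
        Z_syn = np.clip(Z_syn, Z_tr.min(axis=0), Z_tr.max(axis=0))
        X_aug = np.vstack([Xs_tr, Z_syn[:, :-1]])
        y_aug = np.concatenate([y_tr, Z_syn[:, -1] * y_sd + y_mu])
        # 3) Fit on real + synthetic training rows, predict the held-out molecule.
        model = GradientBoostingRegressor(random_state=seed).fit(X_aug, y_aug)
        preds[test_idx] = model.predict(scaler.transform(X[test_idx]))
    return preds


def report(y, preds):
    """R2, RMSE, MAE and MAPE (sklearn returns MAPE as a fraction, not a percent)."""
    return {
        "R2": float(r2_score(y, preds)),
        "RMSE": float(np.sqrt(mean_squared_error(y, preds))),
        "MAE": float(mean_absolute_error(y, preds)),
        "MAPE": float(mean_absolute_percentage_error(y, preds)),
    }


# Random placeholder data (NOT the paper's dataset): 20 molecules x 12 descriptors,
# with a made-up IE target in percent.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 12))
y = 80 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2, size=20)

preds = loocv_with_gmm_augmentation(X, y)
metrics = report(y, preds)
print(metrics)
```

Fitting the mixture on descriptors and target together is one simple way to obtain labels for synthetic points, and standardizing IE first keeps it on the same scale as the descriptors. Because sampling happens only after the test molecule has been removed, its descriptors never influence the scaler, the mixture, or the synthetic rows, which is the property that keeps the LOOCV estimate free of leakage.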

Copyright © 2025






Journal Info

Abbrev

JAIC

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Number 1, July 2018. It contains papers drawn from research in Informatics Technology and Applied Computing, e-ISSN: 2548-9828. It includes 3 articles that have been substantively reviewed by the editorial team and ...