JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 5 (2025): October 2025

Gaussian Mixture-Based Data Augmentation Improves QSAR Prediction of Corrosion Inhibition Efficiency

Ignasius, Darnell
Akrom, Muhamad
Budi, Setyo



Article Info

Publish Date
04 Oct 2025

Abstract

Prediction of corrosion inhibition efficiency (IE, %) is often hindered by small, heterogeneous datasets. This study proposes a Gaussian mixture–based data augmentation pipeline to strengthen QSAR generalization under data scarcity. A curated set of 70 drug-like compounds, each described by 14 physicochemical and quantum descriptors, was cleaned, split 90/10 into training and test sets, and transformed with a Quantile Transformer followed by a Robust Scaler. A Gaussian mixture model (GMM) with 2–5 components, selected by the variational lower bound, was fitted to the transformed training features and used to generate up to 2,500 synthetic samples. Eight regressors (Gaussian Process, Decision Tree, Random Forest, Bagging, Gradient Boosting, Extra Trees, SVR, and KNN) were evaluated on the held-out test set using R² and RMSE. Augmentation improved performance across several model families. For example, Gaussian Process R² improved from −1.54 to 0.54 (RMSE 11.71 to 5.01) and Decision Tree R² from −0.33 to 0.63 (RMSE 8.48 to 4.44), while Bagging and Random Forest showed R² increases of 0.67 and 0.40, respectively. The optimal synthetic sample size varied by model.
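
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The data are random placeholders, and several choices are assumptions that the abstract does not state: the normal output distribution for the quantile transform, fitting the mixture jointly on features and target so that synthetic samples carry a label, the full covariance type with a small regularization term, and the use of a Random Forest as the example regressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, RobustScaler

rng = np.random.default_rng(0)

# Placeholder data (hypothetical): 70 compounds x 14 descriptors, IE in percent.
X = rng.normal(size=(70, 14))
y = np.clip(80 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2, size=70), 0, 100)

# 90/10 train/test split, as stated in the abstract (63 train, 7 test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Quantile Transformer followed by Robust Scaler, fitted on training data only.
# output_distribution="normal" is an assumption.
scaler = make_pipeline(
    QuantileTransformer(
        n_quantiles=len(X_train), output_distribution="normal", random_state=0
    ),
    RobustScaler(),
)
Xt_train = scaler.fit_transform(X_train)
Xt_test = scaler.transform(X_test)

# Assumption: model features and target jointly so each sample has a label.
joint = np.column_stack([Xt_train, y_train])

# Fit mixtures with 2 to 5 components and keep the one with the highest
# lower bound on the log-likelihood reported by scikit-learn.
candidates = [
    GaussianMixture(
        n_components=k, covariance_type="full", reg_covar=1e-3, random_state=0
    ).fit(joint)
    for k in range(2, 6)
]
best_gmm = max(candidates, key=lambda g: g.lower_bound_)

# Draw synthetic samples (2,500 is the upper limit mentioned in the abstract).
synthetic, _ = best_gmm.sample(2500)
X_syn, y_syn = synthetic[:, :-1], synthetic[:, -1]

# Train on real plus synthetic data; evaluate on the untouched real test set.
X_aug = np.vstack([Xt_train, X_syn])
y_aug = np.concatenate([y_train, y_syn])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_aug, y_aug)

pred = model.predict(Xt_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"components={best_gmm.n_components}  R2={r2:.2f}  RMSE={rmse:.2f}")
```

Because the test set holds only 7 compounds, R² computed this way is highly sensitive to individual predictions, which is one reason it can fall well below zero for a poorly fitting model.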

Copyrights © 2025






Journal Info

Abbrev

JAIC

Publisher

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Number 1, July 2018. Contains articles drawn from research in the field of Applied Informatics and Computer Technology, with e-ISSN: 2548-9828. There are 3 articles that have been substantially reviewed by the editorial team and ...