Thrombotic disorders remain a major cause of global morbidity and mortality, with dysregulation of blood coagulation pathways playing a central role in disease progression. Thrombin in particular is a key therapeutic target for anticoagulant drug development, so accurate prediction of inhibitory activity can accelerate discovery efforts. Despite advances in computational drug discovery, machine learning approaches for QSAR-based prediction of anticoagulant activity have not been evaluated systematically. Many existing studies focus on a single model or lack a consistent comparison framework, which limits insight into the relative performance of different ensemble techniques. To address this gap, this study applies four ensemble machine learning methods (Random Forest, XGBoost, Gradient Boosting, and Extra Trees), each tuned by random-search hyperparameter optimization. The main objective is a comparative analysis of these ensemble models for predicting pIC50 values of thrombin inhibitors from molecular descriptors derived from chemical structures. The Extra Trees model achieved the best overall performance after tuning, with an R² of 0.697, an RMSE of 0.851, and an MAE of 0.615. Gradient Boosting and XGBoost also improved clearly after hyperparameter optimization, highlighting the importance of model tuning in QSAR tasks. Overall, the study indicates that ensemble learning methods can predict anticoagulant activity with reasonable accuracy, with Extra Trees emerging as the most effective approach for this dataset.
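For readers less familiar with the quantities reported above, the following is a minimal sketch of how pIC50 and the three evaluation metrics are conventionally computed. It assumes the standard definitions (pIC50 as the negative base-10 logarithm of IC50 in mol/L, and the usual formulas for R², RMSE, and MAE); the abstract does not state the IC50 units or the evaluation code used, so the function names `pic50_from_ic50_nm` and `regression_metrics` and the nanomolar input unit are illustrative assumptions, not the authors' implementation.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 mol/L, hence the offset of 9.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    observed = [5.0, 6.0, 7.0, 8.0]
    predicted = [5.5, 6.0, 6.5, 8.0]
    print(pic50_from_ic50_nm(1000.0))
    print(regression_metrics(observed, predicted))
```

Because pIC50 is on a logarithmic scale, RMSE and MAE are expressed in log units: an RMSE of 0.851 corresponds to a typical error of somewhat less than one order of magnitude in IC50.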
Copyright © 2026