Drug-induced autoimmunity (DIA) poses significant challenges in pharmaceutical development because of its complex immunological mechanisms and delayed clinical manifestations. This study presents a comparative evaluation of three ensemble machine learning models (CatBoost, XGBoost, and Gradient Boosting) for predicting DIA from molecular descriptors. A curated dataset of drug compounds with known autoimmune outcomes was analyzed through a systematic workflow comprising preprocessing, stratified sampling, and model evaluation using accuracy, F1 score, and ROC AUC. CatBoost achieved the highest ROC AUC, while XGBoost showed the best balance between precision and recall, as reflected by its F1 score. Feature importance analysis using SHAP identified molecular properties such as SlogP_VSA10 and fr_NH2 as major contributors to the predictions. The study provides a reproducible and interpretable framework for early toxicity screening and supports data-driven decision-making in drug safety assessment.
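One model can lead on ROC AUC while another leads on F1 because ROC AUC measures how well scores rank positives above negatives across all thresholds, whereas F1 depends on the labels produced at one chosen threshold. The sketch below illustrates this with the three metrics written in plain Python. The labels, the scores for the hypothetical `model_a` and `model_b`, and the 0.5 decision threshold are all made-up assumptions for illustration; they are not the study's data or results.

```python
# Toy illustration only: labels and scores are invented, not taken from the study.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold(scores, cutoff=0.5):
    """Convert scores to hard labels; the 0.5 cutoff is an assumption."""
    return [1 if s >= cutoff else 0 for s in scores]

# Imbalanced toy labels: 3 positives (e.g. DIA-positive), 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical model_a ranks perfectly but all its scores fall below 0.5.
model_a = [0.45, 0.40, 0.35, 0.30, 0.20, 0.10, 0.10, 0.05]
# Hypothetical model_b makes one ranking error but is better placed around 0.5.
model_b = [0.90, 0.80, 0.30, 0.60, 0.20, 0.10, 0.10, 0.10]

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    y_pred = threshold(scores)
    print(name,
          "accuracy=%.3f" % accuracy(y_true, y_pred),
          "f1=%.3f" % f1_score(y_true, y_pred),
          "roc_auc=%.3f" % roc_auc(y_true, scores))
```

In this toy case `model_a` has the higher ROC AUC and `model_b` the higher F1, which is why reporting both metrics alongside accuracy gives a fuller picture on imbalanced data.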
Copyrights © 2025