Corporate bankruptcy prediction is a critical task in financial risk management, particularly under economic uncertainty and with highly imbalanced datasets. This study presents a benchmarking framework that evaluates several supervised learning models and a voting ensemble for corporate bankruptcy prediction. Using a publicly available dataset of 78,682 financial records from US companies listed on the NYSE and NASDAQ (1999-2018), we compare Random Forest, XGBoost, Gradient Boosting, Support Vector Machine, Decision Tree, and a Voting Classifier. Extensive preprocessing (outlier removal, normalization, and feature selection) was conducted to ensure data quality, and cost-sensitive learning was applied to mitigate the severe class imbalance. Because of this imbalance, model performance was assessed with multiple evaluation metrics, including accuracy, F1-score, and ROC AUC, rather than accuracy alone. The Voting Classifier, which combines Random Forest, XGBoost, and Gradient Boosting via hard voting, achieves the best overall performance, with an accuracy of 93.6%, an F1-score of 96.5%, and a ROC AUC of 82.6%, outperforming the individual models. The findings underscore the value of ensemble approaches for improving prediction robustness while addressing class imbalance in financial distress forecasting. This study contributes a reproducible experimental design that can guide future research and the practical implementation of supervised learning models in corporate bankruptcy risk assessment.
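
To make the hard-voting step and the reported metrics concrete, the following is a minimal sketch in plain Python, not the study's implementation. The three prediction lists are invented toy data standing in for the outputs of Random Forest, XGBoost, and Gradient Boosting, and the helper names (`hard_vote`, `accuracy`, `f1_score`) and the label coding (0 = healthy, 1 = bankrupt) are assumptions made for illustration only.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the majority label per sample across several models.

    `predictions` is a list of equal-length label lists, one per model.
    With an odd number of models and binary labels, ties cannot occur.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive):
    """F1 for the class treated as positive (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy, imbalanced labels (assumed coding: 0 = healthy, 1 = bankrupt).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical per-model predictions; not taken from the study.
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
model_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
model_c = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

ensemble = hard_vote([model_a, model_b, model_c])

print("ensemble:", ensemble)
print("accuracy:", accuracy(y_true, ensemble))
print("F1 (bankrupt as positive):", round(f1_score(y_true, ensemble, positive=1), 3))
print("F1 (healthy as positive):", round(f1_score(y_true, ensemble, positive=0), 3))
```

On this toy data the vote cancels the isolated errors of individual models, yet the F1-score differs sharply depending on which class is treated as positive. The abstract does not state which class its F1-score refers to, so that choice is left open here. With scikit-learn, the same voting scheme is available as `VotingClassifier(voting="hard")`.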
Copyrights © 2025