This study compares Random Forest and XGBoost for predicting the diagnosis and severity of respiratory diseases on a simulated dataset of 2,000 patient records. The models were evaluated on two classification tasks: identifying the disease type (e.g., pneumonia, influenza) and classifying the severity level (mild, moderate, severe). In severity classification, both models achieved perfect accuracy, with cross-validation scores of 1.0000 ± 0.0000, indicating stable performance under balanced class distributions. In the diagnosis task, however, Random Forest underperformed on minority classes, particularly pneumonia, where it reached a recall of 0.18 and an F1-score of 0.31. XGBoost, by contrast, performed well across all classes, including the minority ones, with a cross-validation accuracy of 0.9825 ± 0.0170 and perfect test-set performance. These findings suggest that XGBoost is robust to imbalanced, multiclass medical data and is a promising candidate for clinical decision support. Future work should address class imbalance and explore explainability techniques to improve trust and transparency in real-world applications.
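The gap between overall accuracy and minority-class recall described above can be illustrated with a small worked example. The sketch below is not the study's code or data. The label counts (450 influenza, 50 pneumonia, 9 pneumonia cases detected) and the helper `per_class_metrics` are hypothetical, chosen only so that the pneumonia recall comes out at 0.18. Precision is not reported in the abstract; the value here follows from the assumed counts.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired true/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical imbalanced test split (assumed counts, not the study's data).
y_true = ["influenza"] * 450 + ["pneumonia"] * 50
# Hypothetical classifier: finds 9 of the 50 pneumonia cases and labels
# the remaining 41 as influenza.
y_pred = ["influenza"] * 450 + ["pneumonia"] * 9 + ["influenza"] * 41

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}")
for label, (precision, recall, f1) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these assumed counts, overall accuracy stays above 0.9 while pneumonia recall is only 0.18, which is why per-class recall and F1-score are more informative than accuracy alone on imbalanced diagnosis data.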
Copyrights © 2025