Heart disease, diabetes, and breast cancer are major global health challenges, and addressing these chronic diseases requires a coordinated international effort. Machine learning and predictive analytics offer promising tools for this task. This study presents a unified model that uses the random forest (RF) algorithm and Spark MLlib to predict all three diseases. We test the model on three distinct datasets and evaluate it with the receiver operating characteristic (ROC) curve, accuracy, precision, recall, and F1-score. We also investigate whether variations in medical data and contextual factors affect the results. The findings indicate that although the model performs strongly overall, its effectiveness may differ from disease to disease because of data characteristics, disease-specific features, model behavior, and various biological and medical considerations. Understanding these factors is essential for improving model performance and ensuring its appropriate use in clinical environments.
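
As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, precision, recall, F1-score, and the area under the ROC curve for a single binary disease label. It is plain Python written for clarity, not the study's Spark MLlib pipeline; the function names `binary_metrics` and `roc_auc` are hypothetical, and the label convention (1 = disease present, 0 = absent) is an assumption.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels.

    Assumed convention (not from the source): 1 = disease present, 0 = absent.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Applying the same two functions to each disease's held-out predictions gives per-disease figures that can be compared directly, which is one simple way to see whether a single model behaves differently across datasets.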
Copyright © 2025