Cardiovascular disease (CVD) remains one of the leading causes of mortality worldwide, which underscores the need for early detection and effective risk stratification. With the increasing availability of clinical and lifestyle-related health data, machine learning (ML) has become a powerful tool for supporting data-driven diagnosis and decision-making in healthcare. This study develops and evaluates multiple supervised ML models that predict the presence of cardiovascular disease from non-invasive features obtained during routine medical checkups. The dataset comprises 69,301 individual records and includes variables such as age, gender, blood pressure, cholesterol, glucose levels, body measurements, and lifestyle habits. Following data cleaning and feature engineering, including the derivation of body mass index (BMI), mean arterial pressure (MAP), and pulse pressure, four classifiers were applied: Logistic Regression, Random Forest, Gradient Boosting, and Support Vector Machine (SVM). Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Among the models tested, the Gradient Boosting classifier achieved the highest performance, with an ROC-AUC of 0.8060 and a balanced precision-recall tradeoff, indicating strong discriminatory power. ROC curves and confusion matrices confirmed that Gradient Boosting separated patients with and without CVD better than the other models. These findings demonstrate the viability of ML-driven risk assessment models as decision-support tools in clinical settings, potentially aiding earlier diagnosis and more personalized intervention strategies.
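The derived features named above can be illustrated with a minimal sketch. It uses the standard definitions (BMI as weight in kilograms divided by height in metres squared, pulse pressure as systolic minus diastolic pressure, and the common approximation of MAP as diastolic pressure plus one third of the pulse pressure). The function name, parameter names, and units are assumptions made for illustration; the abstract does not state the exact formulas or column names used in the study.

```python
def derive_features(height_cm, weight_kg, systolic, diastolic):
    """Compute BMI, mean arterial pressure, and pulse pressure for one record.

    All names and units here are hypothetical; the study's own column
    names and formulas are not given in the abstract.
    """
    height_m = height_cm / 100.0                  # convert centimetres to metres
    bmi = weight_kg / height_m ** 2               # body mass index, kg/m^2
    pulse_pressure = systolic - diastolic         # mmHg
    mean_arterial = diastolic + pulse_pressure / 3.0  # common MAP approximation, mmHg
    return {
        "bmi": bmi,
        "map": mean_arterial,
        "pulse_pressure": pulse_pressure,
    }


# Example record: 170 cm, 70 kg, blood pressure 120/80 mmHg
features = derive_features(height_cm=170, weight_kg=70, systolic=120, diastolic=80)
print(features)
```

In practice the same arithmetic would be applied column-wise to the full dataset before model training.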
Copyrights © 2025