Accurate and well-calibrated heart disease risk prediction is essential for supporting medical decision-making. This study evaluates Logistic Regression as an applied statistical model for heart disease prediction using the UCI Heart Disease dataset. Beyond discrimination metrics, we focus explicitly on the reliability of predicted probabilities: calibration is assessed through the Brier score, calibration slope, and calibration intercept, and the effect of post-hoc calibration (isotonic regression and Platt scaling) on both calibration and discrimination is quantified. Models were validated using stratified 5-fold cross-validation, with AUROC, AUPRC, accuracy, and F1-score as evaluation metrics. Logistic Regression achieved competitive performance (AUROC 0.903; AUPRC 0.911; accuracy 0.822; F1-score 0.835), and its probability estimates were well calibrated relative to those of Random Forest and Gradient Boosting under the evaluated setting. Permutation-based feature importance analysis identified chest pain type, number of major vessels (ca), ST depression (oldpeak), and exercise-induced angina (exang) as key predictors, consistent with the clinical literature. These findings indicate that simple applied statistical modeling, when paired with rigorous calibration assessment, can provide interpretable risk estimates that are well suited to threshold-based decision support in early heart disease screening.
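The calibration metrics named above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the study's pipeline: the data are simulated rather than taken from the UCI Heart Disease dataset, and the helper names (`brier_score`, `calibration_slope_intercept`, `simulate`) are hypothetical. The slope and intercept are estimated jointly by regressing the observed outcome on the logit of the predicted probability, which is one common convention; the abstract does not state which estimator the authors used.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _logit(p, eps=1e-12):
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def calibration_slope_intercept(y_true, p_pred, n_iter=100, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton-Raphson.

    Returns (slope b, intercept a). Perfect calibration gives b = 1, a = 0;
    b < 1 indicates predictions that are too extreme (overconfident).
    """
    x = [_logit(p) for p in p_pred]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            r = yi - mu
            g_a += r            # gradient w.r.t. intercept
            g_b += r * xi       # gradient w.r.t. slope
            h_aa += w           # entries of the 2x2 information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0.0:
            break  # singular information matrix; stop with current estimate
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return b, a


def simulate(n=20000, seed=0, sharpen=1.0):
    """Simulated (assumed) data: true risk p ~ U(0.05, 0.95), y ~ Bernoulli(p).

    The reported prediction is sigmoid(sharpen * logit(p)); sharpen > 1
    pushes predictions toward 0 and 1, mimicking an overconfident model.
    """
    rng = random.Random(seed)
    y, q = [], []
    for _ in range(n):
        p = rng.uniform(0.05, 0.95)
        y.append(1 if rng.random() < p else 0)
        q.append(_sigmoid(sharpen * _logit(p)))
    return y, q


if __name__ == "__main__":
    for s in (1.0, 2.0):
        y, q = simulate(sharpen=s)
        slope, intercept = calibration_slope_intercept(y, q)
        print(f"sharpen={s}: Brier={brier_score(y, q):.4f}, "
              f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In this simulation the calibrated predictions should yield a slope near 1 and an intercept near 0, while the sharpened predictions should yield a slope near 0.5 and a higher Brier score. In a real analysis, the same functions would be applied to out-of-fold predicted probabilities from the cross-validation procedure.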
Copyrights © 2026