This study examines the use of the Synthetic Minority Over-sampling Technique (SMOTE) for heart disease classification with four machine learning algorithms: Logistic Regression, Random Forest, LightGBM, and XGBoost. The experiments use the Heart Disease UCI dataset, which consists of 920 medical records with 16 clinical features. The original severity labels (0–4) are converted into two classes, not sick (0) and sick (1–4), to match the binary decisions required in clinical screening. Two scenarios are compared: (1) training the models on the original data without handling class imbalance, and (2) training the models with SMOTE applied only to the training data inside a pipeline, together with hyperparameter tuning using k-fold cross-validation. Model performance is evaluated using accuracy, precision, recall, F1-score, and AUC-ROC, and confusion matrices are analyzed to examine misclassifications, particularly false negatives in the sick class. Without SMOTE, the best model, Logistic Regression, achieves an accuracy of 84.78%, recall of 84.31%, F1-score of 86.00%, and AUC-ROC of 91.95%, although the number of false negatives remains relatively high. With SMOTE, recall and F1-score for the positive class increase across all models; the best performance is obtained by Random Forest, which achieves an accuracy of 86.96%, recall of 87.25%, F1-score of 88.12%, and AUC-ROC of 93.34%. These findings indicate that combining SMOTE with hyperparameter optimization can produce a more balanced and reliable heart disease classification model, one that is potentially useful as part of a clinical decision support system in healthcare services.
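The label conversion and the "SMOTE on training data only" step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version of the SMOTE interpolation idea, and the function names (`binarize`, `smote`, `oversample_training_fold`), the toy data, and the choice of `k` are all hypothetical. In practice a maintained library implementation would normally be used inside the cross-validation pipeline.

```python
import math
import random


def binarize(severity):
    """Map a severity label 0-4 to 0 (not sick) or 1 (sick)."""
    return 0 if severity == 0 else 1


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_fold(X_train, y_train, k=5, seed=0):
    """Balance the two classes of a training fold.
    Validation and test data are never passed through this function."""
    by_class = {0: [], 1: []}
    for x, y in zip(X_train, y_train):
        by_class[y].append(x)
    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    majority_label = 1 - minority_label
    n_new = len(by_class[majority_label]) - len(by_class[minority_label])
    new_points = smote(by_class[minority_label], n_new, k=k, seed=seed)
    return X_train + new_points, y_train + [minority_label] * n_new


# Hypothetical toy training fold: severity labels 0-4 and two made-up features.
severity = [0, 0, 0, 1, 2, 3, 4, 1, 2]
X = [[float(i), float(i)] for i in range(len(severity))]
y = [binarize(s) for s in severity]

X_bal, y_bal = oversample_training_fold(X, y, k=2)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

The point of the design is that oversampling is called on each training fold separately, so synthetic samples never leak into the data used to score a model during k-fold cross-validation.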
Copyright © 2026