Heart disease remains a significant health concern, and early prediction plays a crucial role in improving patient outcomes. This study examines data mining techniques for heart disease classification, focusing on the Naïve Bayes algorithm. A common challenge in such classification tasks is class imbalance, which can degrade the classifier's performance and distort its evaluation metrics. To address this, we used the Synthetic Minority Over-sampling Technique (SMOTE). Following the Knowledge Discovery in Databases (KDD) framework, the study proceeded through the data selection, pre-processing, transformation, data mining, and evaluation stages. We evaluated Naïve Bayes with and without SMOTE at three data split ratios (70:30, 60:40, and 50:50) and compared the performance metrics before and after applying SMOTE. For the first dataset, the 50:50 split showed the largest improvement: precision increased from 30.74% to 78.15%, recall from 42.88% to 63.89%, and the Area Under the Curve (AUC) from 0.819 to 0.831, although accuracy decreased from 86.82% to 73.01%. For the second dataset, the 70:30 split yielded the largest improvements: accuracy rose from 95.22% to 97.72%, precision from 96.33% to 99.88%, recall from 51.11% to 95.57%, and AUC from 0.969 to 0.996. These results indicate that SMOTE can substantially improve heart disease classification, particularly in precision, recall, and AUC, while its effect on accuracy varies by dataset.
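The before/after comparison described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the study's implementation. It assumes a Gaussian Naïve Bayes variant, a stratified split, SMOTE applied only to the training portion of each split, oversampling to a 1:1 class ratio, and k = 5 neighbors. It uses synthetic toy data in place of the two heart disease datasets. The helper names (`smote`, `stratified_split`, `GaussianNB`, `accuracy_precision_recall`) are hypothetical, and AUC is omitted for brevity.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority-class neighbors."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (x itself excluded by identity)
        others = sorted((m for m in minority if m is not x), key=lambda m: math.dist(x, m))
        nn = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


class GaussianNB:
    """Gaussian Naive Bayes (assumption: the study's NB variant is not stated)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # per-feature variance plus a small epsilon to avoid division by zero
            variances = [sum((v - mu) ** 2 for v in col) / n + 1e-9
                         for col, mu in zip(zip(*rows), means)]
            self.stats[c] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # pick the class with the highest log prior + summed log Gaussian likelihoods
            best, best_score = None, -math.inf
            for c in self.classes:
                log_prior, means, variances = self.stats[c]
                score = log_prior + sum(
                    -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                    for v, mu, var in zip(x, means, variances))
                if score > best_score:
                    best, best_score = c, score
            preds.append(best)
        return preds


def accuracy_precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def stratified_split(X, y, train_ratio, rng):
    """Split each class separately so both halves keep the original class ratio."""
    X_tr, y_tr, X_te, y_te = [], [], [], []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        for j, i in enumerate(idx):
            (X_tr if j < cut else X_te).append(X[i])
            (y_tr if j < cut else y_te).append(c)
    return X_tr, y_tr, X_te, y_te


def fmt(metrics):
    return "/".join(f"{v:.3f}" for v in metrics)


# Hypothetical toy data (NOT the study's datasets): 180 negatives, 20 positives.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(180)]
X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(20)]
y = [0] * 180 + [1] * 20

for pct in (70, 60, 50):
    X_tr, y_tr, X_te, y_te = stratified_split(X, y, pct / 100, rng)
    minority = [x for x, label in zip(X_tr, y_tr) if label == 1]
    n_new = y_tr.count(0) - len(minority)  # assumption: oversample to a 1:1 ratio
    X_bal = X_tr + smote(minority, n_new, k=5, rng=rng)
    y_bal = y_tr + [1] * n_new
    before = accuracy_precision_recall(y_te, GaussianNB().fit(X_tr, y_tr).predict(X_te))
    after = accuracy_precision_recall(y_te, GaussianNB().fit(X_bal, y_bal).predict(X_te))
    print(f"{pct}:{100 - pct}  acc/prec/rec before {fmt(before)}  after {fmt(after)}")
```

Oversampling only the training portion keeps synthetic records out of the test set, so the before and after metrics are computed on the same real test records. The abstract does not state whether the study oversampled before or after splitting, so this ordering is itself an assumption.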
Copyright © 2025