Wibowo, Jonathan Juliano
Unknown Affiliation

Published: 1 document

Enhancing Heart Disease Classification: A Comparative Analysis of SMOTE and Naïve Bayes on Imbalanced Data
Wibowo, Jonathan Juliano; Kristiyanti, Dinar Ajeng; Wiratama, Jansen
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.3248

Abstract

Heart disease remains a significant health concern, and early prediction plays a crucial role in improving patient outcomes. This study examines data mining techniques for heart disease classification, with a focus on the Naïve Bayes algorithm. A common challenge in such classification tasks is class imbalance, which can degrade the algorithm's performance and distort its evaluation metrics. To address this, we used the Synthetic Minority Over-sampling Technique (SMOTE). Following the Knowledge Discovery in Databases (KDD) framework, the research proceeded through data selection, pre-processing, transformation, mining, and evaluation. We combined SMOTE with the Naïve Bayes algorithm across three data split ratios (70:30, 60:40, and 50:50) and compared performance metrics before and after applying SMOTE. For the first dataset, the 50:50 split showed the largest improvement: precision increased from 30.74% to 78.15%, recall from 42.88% to 63.89%, and the Area Under the Curve (AUC) from 0.819 to 0.831, although accuracy decreased from 86.82% to 73.01%. For the second dataset, the 70:30 split yielded the largest improvement: accuracy rose from 95.22% to 97.72%, precision from 96.33% to 99.88%, recall from 51.11% to 95.57%, and AUC from 0.969 to 0.996. These results indicate that SMOTE can substantially improve heart disease classification, particularly in precision, recall, and AUC, while its effect on accuracy varies with the dataset.
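
The abstract does not give implementation details, so the following is only a minimal standard-library sketch of the interpolation step at the core of SMOTE: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function names (`smote`, `balance`), the toy data, and the choice `k=5` are illustrative assumptions and not the authors' code. Oversampling only the training portion of a split is likewise an assumption here (a common practice), not something the abstract states.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(X), list(y)
    new = smote(minority, n_new, k, seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


# Hypothetical toy training split: six majority rows (0), two minority rows (1).
X_train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.4, 0.2), (0.3, 0.5), (0.5, 0.4),
           (1.0, 2.0), (2.0, 4.0)]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))  # both classes now have 6 rows
```

In a full pipeline, the balanced training rows would be passed to a Naïve Bayes classifier and the untouched test rows used for precision, recall, accuracy, and AUC. Keeping synthetic rows out of the test set avoids inflating those metrics.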