Heart disease remains one of the leading causes of mortality worldwide, including in Indonesia, where delayed detection is still a serious challenge. In medical data mining, class imbalance often degrades classification performance by reducing sensitivity to minority-class cases. This study compares the performance of the K-Nearest Neighbors (KNN) and Decision Tree algorithms for heart disease classification and evaluates the effectiveness of random oversampling in handling imbalanced data. The study uses a heart disease dataset of 10,000 medical records obtained from Kaggle. Preprocessing consists of categorical encoding, missing-value imputation with KNN Imputer, and Min–Max normalization. Random oversampling is then applied to increase minority-class representation. Models are evaluated with stratified 10-fold cross-validation using accuracy, precision, recall, F1-score, and the area under the Receiver Operating Characteristic curve (ROC–AUC). After random oversampling, the KNN model performs best, with an accuracy of 94%, precision of 96%, recall of 90%, F1-score of 92%, and ROC–AUC of 90.2%. By comparison, the Decision Tree model achieves an accuracy of 80%, precision of 78%, recall of 81%, F1-score of 79%, and ROC–AUC of 81.5%. These findings indicate that random oversampling improves minority-class detection, particularly for KNN. The study contributes to Informatics by providing empirical evidence that simple and efficient data mining strategies can effectively address class imbalance in large-scale medical datasets, supporting the development of accurate, interpretable, and accessible AI-based diagnostic systems for early heart disease detection.
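The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn, substitutes synthetic data from `make_classification` for the Kaggle dataset, picks illustrative hyperparameters (k = 5 for both the imputer and the KNN classifier), and oversamples only the training portion of each fold, which may differ from the protocol used in the study. The helper names `random_oversample` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows so every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)
        parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]


def evaluate(model, X, y, n_splits=10, seed=0):
    """Stratified k-fold CV; preprocessing and oversampling are fit on training folds only."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": [], "roc_auc": []}
    for train_idx, test_idx in cv.split(X, y):
        # Impute missing values, then scale each feature to [0, 1].
        prep = make_pipeline(KNNImputer(n_neighbors=5), MinMaxScaler())
        X_train = prep.fit_transform(X[train_idx])
        X_test = prep.transform(X[test_idx])
        # Balance the training fold; the test fold keeps its original class ratio.
        X_bal, y_bal = random_oversample(X_train, y[train_idx], rng)
        clf = clone(model).fit(X_bal, y_bal)
        pred = clf.predict(X_test)
        prob = clf.predict_proba(X_test)[:, 1]
        y_test = y[test_idx]
        scores["accuracy"].append(accuracy_score(y_test, pred))
        scores["precision"].append(precision_score(y_test, pred, zero_division=0))
        scores["recall"].append(recall_score(y_test, pred, zero_division=0))
        scores["f1"].append(f1_score(y_test, pred, zero_division=0))
        scores["roc_auc"].append(roc_auc_score(y_test, prob))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Synthetic stand-in for the real dataset: imbalanced labels, about 5% missing values.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
mask_rng = np.random.default_rng(0)
X[mask_rng.random(X.shape) < 0.05] = np.nan

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
results = {name: evaluate(model, X, y) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Fitting the imputer, scaler, and oversampler inside each training fold keeps duplicated minority rows out of the corresponding test fold. Oversampling before splitting would let copies of the same record appear on both sides of the split and can inflate the reported metrics, especially for KNN.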
Copyrights © 2026