Shifa Aldila, Amalia
Unknown Affiliation

Published: 1 document

Evaluating Logistic Regression, SVM, KNN, and Ensemble Models for Accurate Heart Disease Risk Prediction
Shifa Aldila, Amalia; Supriyono, Lawrence
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 11 No. 3 (2026): JITK Issue February 2026
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v11i3.6738

Abstract

Cardiovascular disease remains the leading cause of death worldwide, which makes early and precise risk assessment essential within preventive healthcare frameworks. With the rapid growth in available clinical data, machine learning approaches have increasingly been adopted to support medical decision-making, particularly for interpreting complex, high-dimensional health information. This study evaluates six supervised machine learning models for predicting the likelihood of cardiovascular disease: Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosting. The study uses the Cleveland Heart Disease dataset from the UCI Machine Learning Repository, which contains 303 patient records with 76 recorded attributes in total. Of these, 14 clinically significant variables commonly used in previous studies were selected for analysis. Because the dataset is relatively small and may contain redundant or low-impact features, feature selection was applied to improve model robustness, reduce overfitting, and enhance interpretability. Data preparation comprised cleaning, normalization, feature selection, and splitting into training and test sets. Model performance was evaluated using accuracy, precision, recall, and F1-score. The experimental results show that Random Forest and Logistic Regression achieved the highest predictive performance, followed by k-Nearest Neighbors and Support Vector Machine. These results indicate that supervised machine learning techniques, supported by appropriate feature selection, can serve as effective decision-support tools for the early detection of cardiovascular disease.
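The evaluation metrics named in the abstract can be illustrated with a short plain-Python sketch. The abstract does not state how the target was encoded or which class counts as positive. The code below assumes, as is common for this dataset, that the original Cleveland `num` diagnosis (0 to 4) is collapsed into absence (0) versus presence (1 to 4) of heart disease, with presence treated as the positive class. The helper names `binarize_target` and `classification_metrics` are hypothetical, and the toy labels are illustrative only. They are not data or results from the study.

```python
from typing import Sequence


def binarize_target(num_values: Sequence[int]) -> list[int]:
    """Assumed encoding: Cleveland 'num' 0 -> 0 (absence), 1..4 -> 1 (presence)."""
    return [1 if v > 0 else 0 for v in num_values]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1-score for a binary task (positive class = 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from the study.
    y_true = binarize_target([0, 2, 1, 0, 3, 0])  # -> [0, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 0, 0, 1, 1]
    print(classification_metrics(y_true, y_pred))
```

With these toy labels there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, so all four metrics come out to about 0.667. In practice, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same computations. The plain-Python version only makes the confusion-matrix arithmetic explicit.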