Alif Zayyin Kamandani
Universitas Dian Nuswantoro, Semarang

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Evaluasi KNN dan Logistic Regression untuk Klasifikasi Diabetes dengan Preprocessing Terstandarisasi: Trade-off Kinerja dan Interpretabilitas Alif Zayyin Kamandani; Egia Rosi Subhiyakto
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.47065/bits.v7i4.9534

Abstract

Although K-Nearest Neighbors (KNN) and Logistic Regression have been widely used in diabetes classification, studies that systematically combine a standardized preprocessing pipeline—including median imputation, feature standardization, and stratified data splitting—and evaluate the trade-off between predictive performance and model interpretability remain limited. This study aims to compare the performance of both algorithms in classifying diabetes status using the Pima Indians Diabetes dataset, which consists of 768 samples with eight numerical attributes. The research stages include data exploration, handling missing values using median imputation, feature standardization using StandardScaler, and stratified data splitting with a ratio of 80:20. Model evaluation is conducted using accuracy, precision, recall, F1-score, confusion matrix, and ROC-AUC metrics. The experimental results show that KNN with an optimal parameter of K=21 achieves an accuracy of 75.97%, an F1-score of 61.86%, and a ROC-AUC of 0.8120, while Logistic Regression achieves an accuracy of 70.78%, an F1-score of 54.55%, and a ROC-AUC of 0.8130. Although KNN demonstrates higher predictive performance, Logistic Regression provides advantages in interpretability through model coefficients, where the variables Glucose (β=1.1825) and BMI (β=0.6887) are identified as the main predictors of diabetes risk. These findings indicate a clear trade-off between accuracy and interpretability, suggesting that KNN is more suitable for high-accuracy prediction tasks, while Logistic Regression is more appropriate in clinical contexts requiring transparency and model accountability.