JOURNAL OF SCIENCE AND SOCIAL RESEARCH
Vol 8, No 3 (2025): August 2025

DIABETES PREDICTION BASED ON MEDICAL RECORDS (PIMA INDIANS DIABETES DATASET) USING K-NN

Ruziq, Fahmi
Wayahdi, M. Rhifky
Ginting, Subhan Hafiz Nanda



Article Info

Publish Date
28 Aug 2025

Abstract

The development of predictive technologies, especially artificial intelligence (AI) and machine learning, has opened up significant opportunities in the health sector, including the early detection of chronic diseases such as diabetes. This study implements the K-Nearest Neighbors (KNN) algorithm to predict the likelihood that a person has diabetes, based on medical record data from the Pima Indians Diabetes Dataset. The dataset consists of 768 samples with eight key health features. The analysis process includes data cleaning, exploration of the data distribution, and data preparation for modelling. The distance between data points is calculated using the Euclidean formula, and normalization is performed so that all features carry equal weight. The data is then divided into training and test sets at a ratio of 80:20. The analysis shows an unbalanced class distribution, with more non-diabetic patients than diabetic patients, and the 21-30 age group dominates the dataset. The implementation of KNN in this study indicates that the method is effective for medical classification based on numerical data. This research demonstrates the potential of KNN as a practical, easy-to-implement tool for early diagnosis in data-driven health systems.

Keywords: K-Nearest Neighbors, diabetes prediction, machine learning, medical data, classification.
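The pipeline described in the abstract (normalization, Euclidean distance, an 80:20 split, and KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of min-max scaling as the normalization method, the value k=5, and the small synthetic two-feature data set are all assumptions made for illustration, and the synthetic data does not come from the Pima Indians Diabetes Dataset.

```python
import math
import random
from collections import Counter


def min_max_fit(rows):
    """Return per-feature (min, max) pairs computed from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(row, bounds):
    """Scale one row with the given bounds; constant features map to 0.0."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, bounds)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def train_test_split(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle indices and split them, 80:20 by default."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(indices) * (1 - test_ratio)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def demo():
    # Synthetic stand-in data (NOT the Pima dataset): two made-up features
    # on different scales, with class 1 shifted upward on both.
    rng = random.Random(42)
    rows, labels = [], []
    for _ in range(60):
        rows.append([rng.gauss(100, 15), rng.gauss(25, 4)])
        labels.append(0)
    for _ in range(40):
        rows.append([rng.gauss(160, 15), rng.gauss(35, 4)])
        labels.append(1)

    train_x, train_y, test_x, test_y = train_test_split(rows, labels)

    # Fit the scaler on the training rows only, then apply it to both sets.
    bounds = min_max_fit(train_x)
    train_scaled = [min_max_apply(r, bounds) for r in train_x]
    test_scaled = [min_max_apply(r, bounds) for r in test_x]

    predictions = [knn_predict(train_scaled, train_y, q, k=5) for q in test_scaled]
    correct = sum(p == y for p, y in zip(predictions, test_y))
    print(f"test accuracy on synthetic data: {correct / len(test_y):.2f}")


if __name__ == "__main__":
    demo()
```

Fitting the scaling bounds on the training rows only, as in this sketch, keeps test-set information out of the model. Because the abstract reports an unbalanced class distribution, accuracy alone may overstate performance on the real data; per-class measures such as precision and recall would be a reasonable complement.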

Copyrights © 2025






Journal Info

Abbrev

JSSR

Publisher

Subject

Computer Science & IT; Economics, Econometrics & Finance; Education; Social Sciences

Description

Journal of Science and Social Research accepts research works from academicians in their respective fields of expertise. Journal of Science and Social Research is a platform to disclose the research abilities and promote quality and excellence of young researchers and experienced thoughts towards ...