JOURNAL OF SCIENCE AND SOCIAL RESEARCH
Vol 8, No 4 (2025): November 2025

OPTIMIZING THE VALIDITY OF HDI CLUSTERING BY APPLYING DISTANCE MEASURE VARIATIONS IN THE K-MEANS++ ALGORITHM

Sipayung, Sardo Pardingotan
Efendi, Syahril



Article Info

Publish Date
30 Nov 2025

Abstract

Abstract: The Human Development Index (HDI) is a key indicator of the quality of regional development across the dimensions of health, education, and decent living standards. In North Sumatra Province, HDI achievement still shows significant disparities between districts/cities, so a data-driven analytical approach is needed to map development patterns objectively. This study aims to optimize the validity of regional HDI clustering by applying the K-Means++ algorithm with different distance measures. It takes a quantitative approach using unsupervised learning. The data analyzed comprise HDI, Average Length of Schooling (ALS), and Adjusted Per Capita Expenditure, sourced from the Central Statistics Agency (Badan Pusat Statistik). The research stages are data preprocessing and standardization, determination of the optimal number of clusters using the Elbow method, application of the K-Means++ algorithm, and evaluation of cluster quality using the Davies–Bouldin Index (DBI) and the Purity Index. Clustering performance is also compared across Euclidean, Manhattan, and Cosine distances. The results show that the optimal number of clusters is three, representing high, medium, and low levels of human development. A DBI of 0.60 and a Purity Index of 0.61 indicate good clustering quality. Euclidean and Manhattan distances performed better than Cosine distance.

Keywords: Human Development Index; K-Means++; Clustering; Distance Measure; Davies–Bouldin Index; Purity Index.
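The stages named in the abstract (standardization, Elbow inspection, K-Means++ under several distance measures, DBI and Purity evaluation) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. Several points are assumptions made only for illustration: the toy rows and their reference categories are invented and are not Central Statistics Agency figures; the seeding squares whichever distance is selected; and centroids are always updated as arithmetic means, which matches the K-Means objective strictly only for Euclidean distance.

```python
import math
import random
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    # Cosine distance = 1 - cosine similarity; set to 1.0 if either vector is zero.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def standardize(rows):
    # Z-score every column using the population standard deviation.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]

def kmeans_pp(points, k, dist, seed=0, max_iter=100):
    rng = random.Random(seed)
    # K-Means++ seeding: draw each new centroid with probability proportional
    # to the squared distance to the nearest centroid chosen so far.
    # Assumption: the squared value of the selected distance is used.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(dist(p, c) for c in centroids) ** 2 for p in points]
        if sum(weights) == 0:
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(c) for c in zip(*members))
    return labels, centroids

def davies_bouldin(points, labels, centroids, dist):
    # Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij.
    ks = sorted(set(labels))
    if len(ks) < 2:
        return float("nan")  # undefined for a single cluster
    scatter = {}
    for j in ks:
        members = [p for p, l in zip(points, labels) if l == j]
        scatter[j] = sum(dist(p, centroids[j]) for p in members) / len(members)
    def ratio(i, j):
        sep = dist(centroids[i], centroids[j])
        return float("inf") if sep == 0 else (scatter[i] + scatter[j]) / sep
    return sum(max(ratio(i, j) for j in ks if j != i) for i in ks) / len(ks)

def purity(labels, truth):
    # Share of points that belong to the majority reference class of their cluster.
    hits = 0
    for j in set(labels):
        classes = Counter(t for l, t in zip(labels, truth) if l == j)
        hits += classes.most_common(1)[0][1]
    return hits / len(labels)

def inertia(points, labels, centroids):
    # Within-cluster sum of squared Euclidean distances, as read off an Elbow plot.
    return sum(euclidean(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical toy rows: (HDI, ALS in years, adjusted expenditure in thousand rupiah).
# Invented for illustration only; these are NOT Central Statistics Agency data.
raw = [(80.1, 11.2, 14500), (79.3, 10.9, 14000), (78.6, 10.7, 13600),
       (72.4, 9.1, 11000), (71.8, 8.9, 10700), (72.9, 9.3, 11200),
       (64.2, 6.8, 8000), (65.0, 7.1, 8300), (63.5, 6.5, 7800)]
truth = ["high"] * 3 + ["medium"] * 3 + ["low"] * 3  # hypothetical reference classes
X = standardize(raw)

for k in range(1, 6):  # Elbow inspection: look for the bend in inertia versus k
    labels, cents = kmeans_pp(X, k, euclidean, seed=0)
    print(f"k={k} inertia={inertia(X, labels, cents):.3f}")

for name, dist in [("euclidean", euclidean), ("manhattan", manhattan), ("cosine", cosine)]:
    labels, cents = kmeans_pp(X, 3, dist, seed=0)
    dbi = davies_bouldin(X, labels, cents, dist)
    print(f"{name}: DBI={dbi:.3f} purity={purity(labels, truth):.3f}")
```

A lower DBI and a higher Purity indicate better-separated clusters that agree more closely with the reference classes. Purity needs such reference classes; the abstract does not state which ones the study used, so the `truth` list here is purely a placeholder. A stricter treatment would pair Manhattan distance with median updates (k-medians) and Cosine distance with normalized centroids (spherical k-means).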

Copyrights © 2025






Journal Info

Abbrev

JSSR

Publisher

Subject

Computer Science & IT; Economics, Econometrics & Finance; Education; Social Sciences

Description

Journal of Science and Social Research accepts research works from academicians in their respective fields of expertise. Journal of Science and Social Research is a platform to disclose the research abilities and promote quality and excellence of young researchers and experienced thoughts towards ...