Abstract: The Human Development Index (HDI) is a key indicator of the quality of regional development, measured across three dimensions: health, education, and a decent standard of living. In North Sumatra Province, HDI attainment still varies considerably across regencies/cities, which calls for a data-driven analytical approach to map development patterns objectively. This study aims to improve the validity of regional HDI clustering by applying the K-Means++ algorithm with different distance measures. It takes a quantitative approach based on unsupervised learning. The variables analyzed are the HDI, Average Length of Schooling (ALS), and Adjusted Per Capita Expenditure, sourced from Statistics Indonesia (Badan Pusat Statistik, BPS). The research stages comprise data preprocessing and standardization, selection of the optimal number of clusters with the Elbow method, application of the K-Means++ algorithm, and evaluation of cluster quality with the Davies–Bouldin Index (DBI) and the Purity Index. Clustering performance was also compared across Euclidean, Manhattan, and Cosine distances. The results show that the optimal number of clusters is three, representing high, medium, and low levels of human development. A DBI of 0.60 (lower values indicate more compact, better-separated clusters) and a Purity Index of 0.61 indicate good clustering quality. Euclidean and Manhattan distances outperformed Cosine distance.
Keywords: Human Development Index; K-Means++; Clustering; Distance Measure; Davies–Bouldin Index; Purity Index.
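The pipeline summarized above (z-score standardization, an Elbow search over k, K-Means++ seeding, and interchangeable Euclidean, Manhattan, and Cosine distances) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, centroids are updated as the coordinate-wise mean for every distance measure (a common simplification), and the data rows are invented values, not BPS figures.

```python
import math
import random

def standardize(rows):
    """Z-score each column: (x - mean) / population std; a constant column is centred to 0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    """Cosine distance = 1 - cosine similarity."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # assumption: a zero vector is treated as orthogonal to everything
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def kmeans_pp(points, k, dist, seed=0, max_iter=100):
    """Hypothetical helper: K-Means++ seeding, then Lloyd iterations under `dist`."""
    rng = random.Random(seed)
    # Seeding: first centre chosen uniformly; each further centre is drawn with
    # probability proportional to D(x)^2, the squared distance to the nearest centre.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(dist(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:
            centers.append(list(rng.choice(points)))  # every point already coincides with a centre
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centre under the chosen distance measure.
        new = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new == labels:
            break  # converged: assignments no longer change
        labels = new
        # Update step: coordinate-wise mean (assumption, see lead-in).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squared distances, the quantity plotted for the Elbow method.
    wcss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wcss

# Hypothetical rows [HDI, ALS in years, adjusted expenditure in thousand rupiah];
# invented for illustration, not BPS data.
data = [[78.5, 11.0, 14500], [77.9, 10.8, 14200], [79.1, 11.3, 14900],
        [71.2, 9.2, 10800], [70.8, 9.0, 10500], [71.5, 9.4, 11000],
        [63.0, 6.5, 7600], [62.4, 6.2, 7300], [63.8, 6.8, 7900]]
X = standardize(data)

# Elbow method: look for the k where the WCSS curve stops dropping sharply.
for k in range(1, 6):
    print("k =", k, "WCSS =", round(kmeans_pp(X, k, euclidean)[2], 3))

# Compare distance measures at k = 3.
for name, d in [("euclidean", euclidean), ("manhattan", manhattan), ("cosine", cosine)]:
    print(name, kmeans_pp(X, 3, d)[0])
```

Because a mean update does not exactly minimize the Manhattan or Cosine objectives, results under those measures are approximate in this sketch; a stricter variant would use a median update (k-medians) for Manhattan distance or unit-normalized means (spherical k-means) for Cosine distance.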
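The two evaluation measures named above can be computed as in the sketch below. It follows the common Davies–Bouldin formulation (the same one used by scikit-learn's `davies_bouldin_score`): for each cluster i, S_i is the mean Euclidean distance of its members to the centroid, M_ij is the distance between centroids i and j, and the DBI averages over clusters the worst ratio (S_i + S_j) / M_ij, so lower is better. Purity needs reference class labels; the abstract does not say which were used, so the class names here (for example an official HDI status category per region) are a hypothetical assumption, as are the function names.

```python
import math
from collections import Counter

def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels):
    """DBI = mean over clusters i of max_{j != i} (S_i + S_j) / M_ij.

    Assumes at least two clusters with distinct centroids (otherwise M_ij = 0).
    """
    clusters = sorted(set(labels))
    cent, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        cent[c] = [sum(col) / len(members) for col in zip(*members)]
        # S_i: average distance of the cluster's members to its centroid
        scatter[c] = sum(_euclid(p, cent[c]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / _euclid(cent[i], cent[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

def purity(labels, classes):
    """Fraction of points belonging to the majority reference class of their cluster."""
    total = 0
    for c in set(labels):
        counts = Counter(cls for lab, cls in zip(labels, classes) if lab == c)
        total += max(counts.values())
    return total / len(labels)

pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
labs = [0, 0, 1, 1]
print(davies_bouldin(pts, labs))                          # → 0.2
print(purity(labs, ["high", "high", "high", "low"]))      # → 0.75
```

In the toy example each cluster has scatter 1 and the centroids are 10 apart, giving (1 + 1) / 10 = 0.2; for purity, the first cluster is entirely "high" and the second is split 1:1, so (2 + 1) / 4 = 0.75.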
Copyright © 2025