Claim Missing Document
Check
Articles

Found 1 Documents
Search
Journal : JOIN (Jurnal Online Informatika)

K-Means-Based Pseudo-Labeling Technique in Supervised Learning Models for Regional Classification Based on Types of Non-Communicable Diseases Surbakti, Herison; Munandar, Tb Ai
JOIN (Jurnal Online Informatika) Vol 10 No 2 (2025)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.15575/join.v10i2.1609

Abstract

Non-Communicable Diseases (NCDs) pose a critical threat to global public health, with Indonesia experiencing significant challenges due to high mortality rates and uneven regional distribution. In Banten Province, limited access to labeled health data hampers effective, data-driven intervention strategies. This study proposes a semi-supervised learning approach to develop a regional classification model for NCDs. The methodology begins with K-Means clustering applied to data from 254 community health centers (Puskesmas) to generate pseudo-labels. Various cluster configurations (k=2 to 8) were evaluated, with the optimal result being two clusters based on a silhouette score of 0.735. These clusters were then used to create a semi-labeled dataset for supervised learning. Eight classification algorithms—CN2 Rule Inducer, k-Nearest Neighbor (kNN), Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine (SVM), and Decision Tree—were trained and compared. Among them, the Neural Network model achieved the highest performance, with an AUC of 0.999 and an MCC of 0.976, indicating excellent stability and predictive accuracy. The findings validate the effectiveness of semi-supervised learning for health classification tasks when labeled data is scarce. This approach can serve as a valuable decision-support tool for regional health planning and targeted interventions, enhancing the precision and efficiency of public health responses.