Ramadhan, Adrian Putra
Unknown Affiliation

Published: 1 document


Mixed-Data K-Means Clustering with Hyperparameter-Tuned Random Forest for OSS-Based MSME Investment Profiling and Policy Targeting
Sari, Laura; Maharrani, Ratih Hafsarah; Hastuti, Hety Dwi; Ramadhan, Adrian Putra; Windasari, Wahyuni
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.2.5545

Abstract

Administrative data on Micro, Small, and Medium Enterprises (MSMEs) collected through the Online Single Submission (OSS) system are highly heterogeneous, combining numerical and categorical attributes that hinder conventional investment segmentation and early-stage policy mapping. This study aims to develop a predictive clustering framework for enterprise investment profiling using mixed-type administrative data. The proposed methodology applies robust preprocessing, including RobustScaler for numerical variables and one-hot encoding with singular value decomposition for categorical features. Mixed-type similarity is computed using Gower distance, followed by a hybrid Gower–K-Means clustering approach, where the optimal number of clusters (k = 3) is determined using the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices. A comparative evaluation of clustering algorithms is conducted, with K-Prototypes performing best in the initial assessment and K-Means achieving superior performance after optimization. Cluster membership is subsequently predicted using a Random Forest classifier with hyperparameters optimized through randomized search. Experiments on 20,857 enterprise records identify three distinct clusters representing low-capital micro enterprises, transitional firms, and asset-intensive corporate entities. The optimized K-Means model achieves a Silhouette score of 0.97 and a Davies–Bouldin Index of 0.54. Compared with the untuned baseline, the tuned Random Forest model improves recall from 0.25 to 0.75 (200% increase) and increases the F1-score from 0.40 to 0.86 (114% improvement), while achieving 99.89% accuracy. These gains correspond to an estimated 20–30% improvement in MSME investment mapping effectiveness compared with traditional profiling approaches, providing a scalable AI-based blueprint for targeted regional economic governance.
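
The pipeline described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic, the column names (investment, workforce, sector, legal_form) are hypothetical, and the Gower-distance step is omitted, so K-Means runs directly on the scaled and SVD-reduced features. The number of clusters is chosen here by Silhouette score alone, with the other two indices printed for comparison, and the Random Forest is trained on the transformed features, which is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data; every column name and value is hypothetical.
group = rng.integers(0, 3, n)  # latent size group used only to generate data
df = pd.DataFrame({
    "investment": rng.normal(np.array([1.0, 5.0, 9.0])[group], 0.5),
    "workforce": rng.normal(np.array([2.0, 6.0, 10.0])[group], 0.5),
    "sector": rng.choice(["trade", "food", "services", "manufacturing"], n),
    "legal_form": rng.choice(["individual", "cv", "pt"], n),
})

# RobustScaler for numeric columns; one-hot encoding followed by SVD for
# categorical columns, as named in the abstract.
preprocess = ColumnTransformer([
    ("num", RobustScaler(), ["investment", "workforce"]),
    ("cat", Pipeline([
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ("svd", TruncatedSVD(n_components=3, random_state=0)),
    ]), ["sector", "legal_form"]),
])
X = preprocess.fit_transform(df)

# Fit K-Means for several k and report the three validity indices.
results = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = {
        "labels": labels_k,
        "silhouette": silhouette_score(X, labels_k),        # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels_k),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels_k),  # lower is better
    }
    print(k, {m: round(v, 3) for m, v in results[k].items() if m != "labels"})

# Assumption: pick k by Silhouette score only.
best_k = max(results, key=lambda k: results[k]["silhouette"])
labels = results[best_k]["labels"]

# Predict cluster membership with a Random Forest tuned by randomized search.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,
    cv=3,
    scoring="f1_macro",
    random_state=0,
)
search.fit(X_train, y_train)
acc = accuracy_score(y_test, search.predict(X_test))
print("best k:", best_k, "best params:", search.best_params_, "accuracy:", round(acc, 3))
```

To approximate the Gower-based variant the abstract mentions, one option would be to compute a pairwise Gower distance matrix with a third-party package and pass it to a clustering method that accepts precomputed distances; standard K-Means in scikit-learn does not accept a distance matrix, so how the authors combine the two is not something this sketch can reproduce.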