Jurnal Teknik Informatika (JUTIF)
Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026

Mixed-Data K-Means Clustering with Hyperparameter-Tuned Random Forest for OSS-Based MSME Investment Profiling and Policy Targeting

Sari, Laura
Maharrani, Ratih Hafsarah
Hastuti, Hety Dwi
Ramadhan, Adrian Putra
Windasari, Wahyuni



Article Info

Publish Date
15 Apr 2026

Abstract

Administrative data on Micro, Small, and Medium Enterprises (MSMEs) collected through the Online Single Submission (OSS) system are highly heterogeneous, combining numerical and categorical attributes that hinder conventional investment segmentation and early-stage policy mapping. This study develops a predictive clustering framework for enterprise investment profiling from mixed-type administrative data. Preprocessing applies RobustScaler to numerical variables and one-hot encoding followed by singular value decomposition (SVD) to categorical features. Mixed-type similarity is computed with the Gower distance and used in a hybrid Gower–K-Means clustering approach, in which the optimal number of clusters (k = 3) is selected using the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices. In a comparative evaluation of clustering algorithms, K-Prototypes performs best in the initial assessment, whereas K-Means achieves superior performance after optimization. Cluster membership is then predicted with a Random Forest classifier whose hyperparameters are optimized through randomized search. Experiments on 20,857 enterprise records identify three distinct clusters: low-capital micro enterprises, transitional firms, and asset-intensive corporate entities. The optimized K-Means model achieves a Silhouette score of 0.97 and a Davies–Bouldin Index of 0.54. Compared with the untuned baseline, the tuned Random Forest improves recall from 0.25 to 0.75 (a 200% increase) and the F1-score from 0.40 to 0.86 (a 114% improvement), while reaching 99.89% accuracy. These gains correspond to an estimated 20–30% improvement in MSME investment mapping effectiveness over traditional profiling approaches, providing a scalable AI-based blueprint for targeted regional economic governance.
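As a rough sketch of the distance step, the standard (unweighted, range-normalized) Gower coefficient and a silhouette score computed on the resulting distance matrix can be written in plain Python as below. The abstract does not state the Gower weighting, how the distance matrix is combined with K-Means in the "hybrid Gower–K-Means" step, or which OSS attributes are used, so the field names and toy records here are hypothetical and this is not the authors' implementation. The silhouette function follows the usual definition s = (b - a) / max(a, b), the same quantity `sklearn.metrics.silhouette_score` computes with `metric="precomputed"`.

```python
from __future__ import annotations

from typing import Sequence, Union

Value = Union[float, int, str]


def gower_distance_matrix(
    rows: Sequence[Sequence[Value]], is_numeric: Sequence[bool]
) -> list[list[float]]:
    """Pairwise Gower distance for records that mix numeric and categorical fields.

    Numeric field:     |x - y| / (max - min of that field), or 0 if the field is constant.
    Categorical field: 0 if the two values match, 1 otherwise.
    The distance is the unweighted mean of the per-field dissimilarities.
    """
    n_fields = len(is_numeric)
    # Range of each numeric field over the whole dataset (0.0 placeholder for categorical fields).
    ranges = [
        max(float(r[f]) for r in rows) - min(float(r[f]) for r in rows) if is_numeric[f] else 0.0
        for f in range(n_fields)
    ]
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for f in range(n_fields):
                if is_numeric[f]:
                    if ranges[f] > 0:
                        total += abs(float(rows[i][f]) - float(rows[j][f])) / ranges[f]
                else:
                    total += 0.0 if rows[i][f] == rows[j][f] else 1.0
            dist[i][j] = dist[j][i] = total / n_fields
    return dist


def silhouette_precomputed(dist: list[list[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient from a precomputed distance matrix (needs >= 2 clusters)."""
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # a singleton cluster gets silhouette 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical toy records: (capital, workers, scale category, sector).
records = [
    (10.0, 2, "micro", "trade"),
    (12.0, 3, "micro", "trade"),
    (500.0, 40, "medium", "manufacturing"),
    (520.0, 45, "medium", "manufacturing"),
]
numeric_mask = [True, True, False, False]

D = gower_distance_matrix(records, numeric_mask)
good = silhouette_precomputed(D, [0, 0, 1, 1])  # groups similar records together
bad = silhouette_precomputed(D, [0, 1, 0, 1])   # pairs dissimilar records
print(f"silhouette, sensible split: {good:.3f}")
print(f"silhouette, mixed split:    {bad:.3f}")
```

A split that keeps similar records together scores close to 1, while one that pairs dissimilar records scores below 0, which is why a Silhouette of 0.97 signals compact, well-separated clusters. These nested loops are O(n²); a full pairwise matrix for 20,857 records has about 435 million entries (roughly 3.5 GB as 64-bit floats), so a vectorized or chunked implementation would be needed at that scale.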

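The reported improvement percentages can be checked with a few lines of arithmetic. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), so precision can be recovered as P = F1 · R / (2R - F1). The abstract does not report precision; the sketch below assumes P = 1.0 for both models, which is exactly what recall 0.25 and F1 0.40 imply and is consistent, up to rounding, with recall 0.75 and F1 0.86 (the same formula gives about 1.008 there, above 1 only because 0.86 is rounded).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(recall: float, f1: float) -> float:
    """Invert F1 = 2PR / (P + R) for P, given R and F1."""
    return f1 * recall / (2 * recall - f1)


def relative_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100


print(f"implied baseline precision: {implied_precision(0.25, 0.40):.3f}")
print(f"implied tuned precision:    {implied_precision(0.75, 0.86):.3f}")

# ASSUMPTION: precision = 1.0 for both models (not reported in the abstract).
baseline_f1 = f1_score(1.0, 0.25)
tuned_f1 = f1_score(1.0, 0.75)

print(f"recall: 0.25 -> 0.75, {relative_change(0.25, 0.75):.0f}% increase")
print(f"F1 (unrounded): {baseline_f1:.3f} -> {tuned_f1:.3f}, "
      f"{relative_change(baseline_f1, tuned_f1):.0f}% increase")
print(f"F1 (rounded):   0.40 -> 0.86, {relative_change(0.40, 0.86):.0f}% increase")
```

Under this assumption the unrounded tuned F1 is 6/7 (about 0.857), giving a 114% gain, whereas dividing the already-rounded values, (0.86 - 0.40) / 0.40, gives 115%; the abstract's 114% is therefore consistent with computing the change before rounding.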
Copyright © 2026






Journal Info

Abbrev

jurnal

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika (JUTIF) is an Indonesian national journal that publishes high-quality research papers in the broad field of Informatics, Information Systems and Computer Science, which encompasses software engineering, information system development, computer systems, computer networks, ...