Farit Mochamad Afendi
Department of Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia

Articles


COMPARATIVE STUDY OF LIGHTGBM, CATBOOST, AND RANDOM FOREST IN MODELING PUBLIC COMPLAINTS CLASSIFICATION
Oktaviyani Daswati; Hari Wijayanto; Farit Mochamad Afendi
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss3pp2535-2548

Abstract

Public complaint data on maladministration in Indonesia contain high-cardinality categorical variables and imbalanced category distributions, which pose significant challenges for conventional machine learning algorithms. This study evaluates and compares the performance of three widely used classification algorithms (LightGBM, CatBoost, and Random Forest) on actual public complaint data that had not previously been analysed using machine learning methods. Hyperparameter tuning was applied to obtain optimal configurations and ensure robust performance. The analysis used 30 repeated simulations, with accuracy and sensitivity as the primary metrics. ANOVA followed by the Tukey HSD test was used to determine whether the models' performance differed at a 95% confidence level. The results show that LightGBM performed best, with an accuracy of 74.50% and a sensitivity of 76.70%, followed by CatBoost with an accuracy of 74.12% and a sensitivity of 75.54%, while Random Forest lagged far behind. Statistical tests confirmed significant performance differences among the three models. This study is not without limitations: only three classification algorithms were evaluated, encoding strategies were not systematically compared, and the hyperparameter search space was restricted, so broader model exploration may yield improved performance. Nonetheless, the study offers originality and value as the first empirical application of machine learning to Indonesian public complaint data on maladministration, demonstrating how algorithm selection directly affects predictive outcomes when handling complex categorical structures. The findings offer practical insights for government agencies, highlighting how data-driven models can support policy design, strengthen transparency, and improve the quality of public services.
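
The comparison step described in the abstract (30 repeated runs per model, then a one-way ANOVA) can be illustrated with a minimal sketch. It is not the authors' code: the accuracy values are synthetic placeholders drawn around assumed means, the helper `one_way_anova` and the names `assumed_means` and `F_CRIT_APPROX` are hypothetical, and the critical value of about 3.10 for F(2, 87) at the 5% level is taken from standard F tables. The Tukey HSD follow-up is omitted because it needs the studentized range distribution, which the Python standard library does not provide.

```python
import random
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual runs around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# SYNTHETIC placeholder accuracies, NOT the study's measurements.
# The means and the 0.01 spread are assumptions chosen for illustration.
rng = random.Random(0)
N_RUNS = 30  # matches the 30 repeated simulations named in the abstract
assumed_means = {"LightGBM": 0.75, "CatBoost": 0.74, "Random Forest": 0.65}
scores = {
    name: [rng.gauss(mu, 0.01) for _ in range(N_RUNS)]
    for name, mu in assumed_means.items()
}

f_stat, df_b, df_w = one_way_anova(list(scores.values()))

# Approximate 5% critical value for F(2, 87) from standard tables.
F_CRIT_APPROX = 3.10
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print("Reject equal mean accuracy:", f_stat > F_CRIT_APPROX)
```

A significant F statistic only indicates that at least one model's mean differs. Identifying which pairs differ is the role of the Tukey HSD step, for which a statistics library such as SciPy (`scipy.stats.tukey_hsd`) would be a natural choice.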