I Gusti Ngurah Sentana Putra
IPB University

Published: 2 Documents
Articles


Evaluation of Tree-Based Models for Predicting Social Assistance Recipient Status Based on National Socio-Economic Survey (SUSENAS) 2024
Yani Prihantini Hiola; Zulhijrah; I Gusti Ngurah Sentana Putra; Syella Zignora Limba; Bagus Sartono; Aulia Rizki Firdawanti; Budi Susetyo; Gerry Alfa Dito
Journal of Mathematics, Computations and Statistics, Vol. 9 No. 1 (March 2026)
Publisher: Jurusan Matematika FMIPA UNM

DOI: 10.35580/xyyv0f37

Abstract

Poverty is a major socioeconomic challenge in Indonesia and affects the effectiveness of social protection programs. In response, the government has created social assistance programs to improve public welfare. However, the distribution of social assistance is often considered inaccurate: some households that are eligible for assistance are not identified as recipients. One way to improve the accuracy of distribution is to apply machine learning to classify recipient status. Three tree-based models, LightGBM, Random Forest, and XGBoost, were selected because of their superior capabilities compared with classical models such as logistic regression, especially in handling complex data and in relaxing the assumptions that such models require. This study compares the performance of these three models in predicting social assistance recipient status using data from the 2024 National Socioeconomic Survey (SUSENAS) for West Java Province. The models were evaluated under several data preprocessing scenarios involving outlier handling, class balancing, and feature engineering. The results show that LightGBM consistently outperforms the other models on six of the eight evaluation metrics used: Accuracy, Balanced Accuracy, F1-Score, ROC-AUC, PR-AUC, and Brier Score. SHAP analysis identifies Social Assistance History and Asset Score as the most influential features for model prediction. The nonparametric Friedman and Nemenyi tests confirmed significant performance differences between LightGBM and the other models on the F1-Score, PR-AUC, and Brier Score metrics. These findings indicate that tree-based models, particularly LightGBM, can support the development of a more accurate, data-driven system for targeting social assistance.
Keywords: Social Assistance; Tree-Based; SHAP; SUSENAS; Hybrid Bayesian Optimization
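Several of the evaluation metrics named above are simple functions of predicted labels and probabilities, and they behave differently when recipients are a minority class. A minimal sketch of three of them (Balanced Accuracy, F1-Score, and Brier Score) in plain Python, assuming binary labels where 1 marks a social assistance recipient and a 0.5 decision threshold; the function names and toy data are hypothetical and are not taken from the study's code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 marks a recipient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared gap between predicted P(class 1) and the 0/1 outcome; lower is better."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    y_true = [1, 0, 0, 0, 1, 0]
    p_pred = [0.8, 0.3, 0.6, 0.1, 0.4, 0.2]
    y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 threshold
    print(balanced_accuracy(y_true, y_pred))  # 0.625
    print(f1_score(y_true, y_pred))  # 0.5
    print(round(brier_score(y_true, p_pred), 4))  # 0.15
```

Balanced Accuracy and F1-Score depend only on the thresholded labels, whereas the Brier Score uses the probabilities directly, which is why it can separate models that make the same hard predictions.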
Classification of Cardiovascular and Chronic Respiratory Diseases Utilizing Ensemble Models with Data Exploration Techniques
I Gusti Ngurah Sentana Putra; Amri Luthfi Najih; Unique DA Resiloy; Rachmat Bintang Yudhianto; Erfiani Erfiani; Anwar Fitrianto
JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika), Vol. 10 No. 4 (2025)
Publisher: STKIP PGRI Tulungagung

DOI: 10.29100/jipi.v10i4.9311

Abstract

Non-communicable diseases, especially cardiovascular and chronic respiratory conditions, contribute significantly to Indonesia’s healthcare burden and to the expenditure of BPJS, Indonesia's national health insurance agency. Health claim data often suffer from class imbalance, multicollinearity, and outliers that impair model accuracy. This study evaluates the impact of essential data exploration techniques, such as winsorizing, correlation and variance inflation factor (VIF) analysis, variable selection, and SMOTE, on the performance of ensemble classifiers. The dataset comprises 497,439 BPJS health insurance claims from 2022, with 27 predictors (14 numerical and 13 categorical). Two data pipelines were compared: one without preprocessing and one incorporating systematic data exploration. Five tree-based models were tested: Decision Tree, Extra Trees, Random Forest, XGBoost, and LightGBM. Model performance was assessed using the F1-score, balanced accuracy, and G-mean over 20 stratified cross-validation iterations. The results show that preprocessing substantially improves classification fairness and accuracy. Bagging models, particularly Random Forest, showed the largest improvement, with balanced accuracy and G-mean rising from about 0.93 to 0.99, while boosting models showed modest gains. These findings indicate that rigorous data exploration enhances ensemble classifier performance, enabling more reliable disease classification and supporting fairer, data-driven decision-making in BPJS health management.
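Two of the techniques named in this abstract can be illustrated briefly: winsorizing a numeric predictor by clipping it to chosen percentiles, and the G-mean, the geometric mean of sensitivity and specificity. A minimal sketch in plain Python; the nearest-rank percentile rule and the 5th/95th default limits are assumptions, since the abstract does not state which limits or percentile convention the study used, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_rank_percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100 (one of several percentile conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def winsorize(values: Sequence[float], lower_q: float = 5.0, upper_q: float = 95.0) -> List[float]:
    """Clip values below the lower percentile and above the upper percentile to those limits.

    The 5th/95th defaults are an assumption; the study's limits are not stated.
    """
    lo = nearest_rank_percentile(values, lower_q)
    hi = nearest_rank_percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity and specificity for binary labels (1 = positive).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Invented numeric values with one extreme value at each end.
    values = [-100] + list(range(2, 20)) + [500]
    clipped = winsorize(values, lower_q=10, upper_q=90)
    print(clipped[0], clipped[-1])  # 2 18
    print(round(g_mean([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]), 4))  # 0.6124
```

Because the G-mean multiplies the two recalls, a model that ignores the minority class scores near zero even when its plain accuracy is high, which makes it a natural companion to balanced accuracy on imbalanced claim data.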