Classification of Drinking Water Source Suitability in West Java Using XGBoost and Cluster Analysis Based on SHAP Values: Klasifikasi Kelayakan Sumber Air Minum di Jawa Barat Menggunakan XGBoost dan Analisis Klasterisasi Berdasarkan Nilai SHAP
Sari, Annisa Permata; Billy; Tsaqif, Denanda Aufadlan; Sartono, Bagus; Firdawanti, Aulia Rizki
Indonesian Journal of Statistics and Applications Vol 8 No 2 (2024)
Publisher : Statistics and Data Science Program Study, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v8i2p202-214

Abstract

Water is essential for meeting the basic needs of living organisms. In Indonesia, ensuring safe, good-quality drinking water is crucial for public health. However, in some regions, particularly in West Java Province, people still rely on unsuitable water sources, which can harm their health. Water source suitability can be classified using machine learning models such as Extreme Gradient Boosting (XGBoost). XGBoost with feature selection is effective in improving prediction accuracy and minimizing overfitting. This study evaluates the performance of the XGBoost model in classifying household drinking water sources in West Java and uses the K-Means algorithm to cluster SHAP values in order to identify key characteristics of households with safe drinking water. The results show that the XGBoost model, with an accuracy of 77.43% and an F1-score of 80.17%, successfully classified 4187 households, with 2349 having safe drinking water and 1838 having unsuitable sources. SHAP value analysis identified water source location, water collection time, and monthly per capita expenditure as significant factors influencing water source suitability. Households with water sources inside the house's fence, a short water collection time, and high monthly per capita expenditure tend to have safe drinking water sources. Four clusters were formed: clusters 1 and 3 need immediate improvement in the quality of their drinking water sources, while cluster 2 serves as an indicator of success. Cluster 4 consists of households with high expenditure, marking them as potential targets for government water quality improvements.
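
The clustering step described in this abstract, grouping households by their per-feature SHAP values with K-Means, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the SHAP rows are invented (real values would come from a fitted XGBoost model and a SHAP explainer), the example uses two clusters rather than the four reported, and the farthest-first initialisation is a convenience to keep the result deterministic, not the authors' procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Farthest-first initialisation (an assumption, chosen for determinism):
    # start from the first row, then repeatedly add the row that is farthest
    # from all centroids picked so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical SHAP values per household for three features:
# (water source location, water collection time, per capita expenditure).
shap_rows = [
    [0.30, 0.20, 0.10],
    [0.28, 0.22, 0.12],
    [0.33, 0.18, 0.09],
    [-0.25, -0.20, -0.05],
    [-0.30, -0.15, -0.08],
    [-0.27, -0.22, -0.02],
]

labels, centroids = kmeans(shap_rows, k=2)
print(labels)
```

Clustering SHAP values rather than raw features groups households by how each feature pushes the prediction, which is why the resulting clusters can be read as profiles of "what drives suitability" for each group.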

Hedging Strategy Analysis of GOTO Stock Using Collar, Bear Put Spread, and Long Strangle
Agustiani, Nur; Wahyu, Sri; Firdawanti, Aulia Rizki; Ahmad, Hafidlotul Fatimah
Euler : Jurnal Ilmiah Matematika, Sains dan Teknologi Volume 13 Issue 3 December 2025
Publisher : Universitas Negeri Gorontalo

DOI: 10.37905/euler.v13i3.34092

Abstract

This study compares the performance of three hedging strategies, Collar, Bear Put Spread, and Long Strangle, in a case study of PT GoTo Gojek Tokopedia Tbk (GOTO) stock. The analysis focuses on the risk management effectiveness and profit potential of these strategies within an emerging market context. The research utilizes weekly stock price data from July 2023 to June 2024 (54 observations). The methodological procedures include calculating returns and volatility, testing return normality using the Shapiro-Wilk test, determining European option prices using the Black-Scholes model with a 6% risk-free interest rate, and conducting profit simulations. The findings indicate that the Collar strategy provides maximum protection against stock price declines, albeit with limited profit potential. The Bear Put Spread strategy proves effective in generating returns during moderate price decreases while offering lower risk and cost. Conversely, the Long Strangle strategy possesses high profit potential during significant price volatility but carries the risk of total loss if stock prices remain stagnant. As a comprehensive comparison of these three option strategies applied to GOTO stock, this study recommends the Collar strategy as the optimal choice for risk-averse investors during bearish trends.
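
The pricing and profit-simulation steps named in this abstract can be sketched as follows. Only the 6% risk-free rate comes from the abstract; the spot price, strikes, volatility, and maturity are hypothetical placeholders, and the profit functions are the textbook definitions of the three strategies rather than the authors' exact setup.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_price(s, k, t, r, sigma, kind):
    """Black-Scholes price of a European call or put (no dividends)."""
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    if kind == "call":
        return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)
    return k * exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)


# R is taken from the abstract; all other inputs are hypothetical.
S0, R, T, SIGMA = 100.0, 0.06, 0.25, 0.50
K_LOW, K_HIGH = 90.0, 110.0

put_low = bs_price(S0, K_LOW, T, R, SIGMA, "put")
put_high = bs_price(S0, K_HIGH, T, R, SIGMA, "put")
call_high = bs_price(S0, K_HIGH, T, R, SIGMA, "call")


def collar_profit(s_t):
    """Long stock + long put (K_LOW) + short call (K_HIGH)."""
    return ((s_t - S0) + max(K_LOW - s_t, 0.0) - max(s_t - K_HIGH, 0.0)
            - put_low + call_high)


def bear_put_spread_profit(s_t):
    """Long put (K_HIGH) + short put (K_LOW)."""
    return (max(K_HIGH - s_t, 0.0) - max(K_LOW - s_t, 0.0)
            - (put_high - put_low))


def long_strangle_profit(s_t):
    """Long put (K_LOW) + long call (K_HIGH)."""
    return (max(K_LOW - s_t, 0.0) + max(s_t - K_HIGH, 0.0)
            - (put_low + call_high))


for s_t in (60.0, 100.0, 140.0):
    print(s_t, round(collar_profit(s_t), 2),
          round(bear_put_spread_profit(s_t), 2),
          round(long_strangle_profit(s_t), 2))
```

Under these definitions the collar's loss is floored once the price falls below the put strike, and the strangle loses both premiums whenever the final price ends between the two strikes, which matches the qualitative findings stated above.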

Studi Komparatif Metode Boosting Dalam Pengklasifikasian Penerima Bantuan Program Keluarga Harapan (PKH) [A Comparative Study of Boosting Methods for Classifying Recipients of Program Keluarga Harapan (PKH) Assistance]
Amatullah, Fida Fariha; MY, Hadyanti Utami; Rizqi, Tasya Anisah; Wahyuni, Silvia Tri; Sartono, Bagus; Firdawanti, Aulia Rizki
TELKA - Telekomunikasi Elektronika Komputasi dan Kontrol Vol 11, No 3 (2025): TELKA
Publisher : Jurusan Teknik Elektro UIN Sunan Gunung Djati Bandung

DOI: 10.15575/telka.v11n3.315-326

Abstract

Ensemble learning is a machine learning paradigm in which multiple models (commonly referred to as "weak learners") are trained to solve the same problem and combined to achieve better results. Boosting is one such ensemble approach. The boosting methods used in this study are Gradient Boosting Machines (GBM), Extreme Gradient Boosting Machine (XGBM), Light Gradient Boosting Machine (LGBM), and CatBoost. This study classifies households (rumah tangga, RT) that receive assistance from the Program Keluarga Harapan (PKH). Classifying PKH recipients is crucial because the distribution of PKH aid has not been optimal, with many cases of misallocation. The results indicate that LGBM performs best when the training set is large (90% of the data), achieving an accuracy of 67.97%. However, when the training set is small (a 60:40 split), LGBM performs poorly, recording the lowest balanced accuracy among the boosting methods, at 54.43%. The advantage of LGBM is attributed to its ability to handle large and complex data, which matches the characteristics of the socio-economic data on PKH recipient households. The two most important features for PKH classification in the best-performing model, LGBM, are economic factors and the number of household members.
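
Because this abstract reports both accuracy and balanced accuracy, a small sketch may help show how the two can diverge on imbalanced data. The labels below are invented for illustration and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = PKH recipient (minority), 0 = non-recipient.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a model that never predicts the minority class

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A balanced accuracy close to 50% in a two-class problem therefore signals a model that is doing little better than always predicting the majority class, even when plain accuracy looks acceptable.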

Evaluation of Tree-Based Models for Predicting Social Assistance Recipient Status Based on National Socio-Economic Survey (SUSENAS) 2024
Hiola, Yani Prihantini; Zulhijrah; Putra, I Gusti Ngurah Sentana; Limba, Syella Zignora; Sartono, Bagus; Firdawanti, Aulia Rizki; Susetyo, Budi; Dito, Gerry Alfa
Journal of Mathematics, Computations and Statistics Vol. 9 No. 1 (2026): Volume 09 Issue 01 (March 2026)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/xyyv0f37

Abstract

Poverty is a major socioeconomic challenge in Indonesia that affects the effectiveness of social protection programs. In response, the government has created social assistance programs to improve people's welfare. However, the distribution of social assistance is often considered inaccurate, with households that are deemed eligible not being identified as recipients. One way to improve distribution accuracy is to apply machine learning classification. Several tree-based models, namely LightGBM, Random Forest, and XGBoost, were selected because of their superior capabilities compared to classical models such as logistic regression, especially in handling complex data and fulfilling model assumptions. This study compares the performance of these three models in predicting social assistance recipient status using data from the 2024 National Socioeconomic Survey (SUSENAS) for West Java Province. Model evaluation was conducted on several data pre-processing scenarios involving outlier handling, class balancing, and feature engineering. The results show that LightGBM consistently outperforms the other models on six of the eight evaluation metrics used, namely Accuracy, Balanced Accuracy, F1-Score, ROC-AUC, PR-AUC, and Brier Score. SHAP analysis identifies Social Assistance History and Asset Score as the most influential features for model prediction. Friedman and Nemenyi nonparametric tests confirmed significant performance differences between LightGBM and the other models on the F1-Score, PR-AUC, and Brier Score metrics. These findings indicate that tree-based models, particularly LightGBM, can support the development of a more accurate, data-driven social assistance targeting system.
Keywords: Social Assistance; Tree-Based; SHAP; SUSENAS; Hybrid Bayesian Optimization
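
Of the metrics listed in this abstract, the Brier Score is the only one where lower is better, so a short sketch of its definition may be useful. The probabilities below are invented for illustration and do not come from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = receives social assistance) and two sets of
# predicted probabilities from two imaginary models.
y_true = [1, 0, 1, 0]
confident_model = [0.9, 0.1, 0.8, 0.2]
uninformative_model = [0.5, 0.5, 0.5, 0.5]

print(brier_score(y_true, confident_model))
print(brier_score(y_true, uninformative_model))  # 0.25
```

A model that always outputs 0.5 scores exactly 0.25, which is a convenient reference point when reading reported Brier Scores for a two-class problem.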

Analysis of Household Risk Factors Associated with Food Anxiety Using Boosting-Based Machine Learning Methods
Aisyah, Nisa Nur; Butar, Rupmana Br; Putri, Mega Ramatika; Amelia, Lisa; Sartono, Bagus; Firdawanti, Aulia Rizki
Journal of Mathematics, Computations and Statistics Vol. 9 No. 1 (2026): Volume 09 Issue 01 (March 2026)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/nz9epj83

Abstract

Food anxiety represents an early psychological indicator of household food insecurity and is influenced by economic vulnerability, household characteristics, and unstable access to food. West Java, as Indonesia’s most populous province, faces substantial socio-economic disparities that heighten the risk of food insecurity. Using SUSENAS 2024 data, this study aims to classify household food anxiety and evaluate the predictive performance of three boosting algorithms: XGBoost, LightGBM, and CatBoost. The dataset exhibits a strong class imbalance, with only 19.1% of households categorized as food anxious, prompting the application of SMOTE and Winsorization during preprocessing. SMOTE considerably improved model performance, particularly balanced accuracy. For XGBoost, balanced accuracy increased sharply from 0.5199 to 0.8738, while LightGBM showed a similar improvement from 0.5261 to 0.8736. Winsorization produced only marginal additional effects. Across all scenarios, XGBoost demonstrated the highest overall performance, followed closely by LightGBM, whereas CatBoost showed limited ability to detect minority-class households. These findings underscore the effectiveness of boosting algorithms, especially XGBoost enhanced by SMOTE, in identifying food-anxious households and supporting data-driven, targeted food security interventions in West Java.
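
The idea behind SMOTE, which this abstract credits for most of the gain in balanced accuracy, can be sketched in a few lines: each synthetic minority row is an interpolation between a real minority row and one of its nearest minority neighbours. This is a simplified illustration with invented feature values, not the authors' pipeline; the function name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    minority: list of feature vectors (needs at least two rows).
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority rows by squared Euclidean distance.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical minority-class (food-anxious) households with two features.
minority_rows = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.3], [1.1, 5.1]]
new_rows = smote_like(minority_rows, n_new=6)
print(len(new_rows))  # 6
```

Since every synthetic row lies on a segment between two real minority rows, the oversampled class stays inside the region the original minority data already occupies, unlike simple duplication, which only repeats existing points.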