Indonesian Journal of Statistics and Its Applications
Vol 8 No 2 (2024)

Classification of Drinking Water Source Suitability in West Java Using XGBoost and Cluster Analysis Based on SHAP Values

Sari, Annisa Permata
Billy
Tsaqif, Denanda Aufadlan
Sartono, Bagus
Firdawanti, Aulia Rizki



Article Info

Publish Date
31 Dec 2024

Abstract

Water is essential for meeting the basic needs of living organisms. In Indonesia, ensuring safe, good-quality drinking water is crucial for public health. However, in some regions, particularly West Java Province, people still rely on unsuitable water sources, which can harm their health. Water source suitability can be classified using machine learning models such as Extreme Gradient Boosting (XGBoost). XGBoost with feature selection is effective in improving prediction accuracy and reducing overfitting. This study evaluates the performance of the XGBoost model in classifying household drinking water sources in West Java and applies the K-Means algorithm to cluster SHAP values in order to identify key characteristics of households with safe drinking water. The results show that the XGBoost model, with an accuracy of 77.43% and an F1-score of 80.17%, classified 4187 households, with 2349 having safe drinking water and 1838 having unsuitable sources. SHAP value analysis identified water source location, water collection time, and monthly per capita expenditure as significant factors influencing water source suitability. Households with a water source inside the house's fence, a short water collection time, and high monthly per capita expenditure tend to have safe drinking water sources. Four clusters were formed: clusters 1 and 3 need immediate improvement in the quality of their drinking water sources, while cluster 2 serves as an indicator of success. Cluster 4 consists of households with high expenditure, making it a potential target group for government water quality improvements.
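
The clustering step described in the abstract, grouping households by their per-feature SHAP values with K-Means, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain Lloyd's-algorithm K-Means written with the Python standard library only, the `kmeans` helper is hypothetical, the SHAP-like numbers are invented for illustration and do not come from the study, and two clusters are used instead of the study's four to keep the toy example small.

```python
import math

def kmeans(rows, k, iters=100):
    """Plain Lloyd's algorithm.

    Assumption for determinism: the first k rows seed the centroids.
    Returns (labels, centroids).
    """
    centroids = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned rows.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical SHAP-like values, one row per household. Columns (assumed):
# water source location, water collection time, monthly per capita expenditure.
# Positive values push the prediction toward "safe", negative toward "unsuitable".
shap_rows = [
    [0.9, 0.5, 0.4],
    [-0.8, -0.6, -0.3],
    [0.8, 0.6, 0.5],
    [-0.9, -0.5, -0.4],
    [1.0, 0.4, 0.3],
    [-0.7, -0.7, -0.2],
]

labels, centroids = kmeans(shap_rows, k=2)
print(labels)
print(centroids)
```

Each centroid is the mean SHAP profile of its cluster, so reading the centroids column by column shows which features drive a group of households toward or away from the "safe" class. In practice the input rows would come from a SHAP explainer applied to the fitted classifier rather than from hand-written values.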

Copyrights © 2024






Journal Info

Abbrev

ijsa

Publisher

Subject

Computer Science & IT; Mathematics; Other

Description

Indonesian Journal of Statistics and Its Applications (eISSN: 2599-0802) (formerly named Forum Statistika dan Komputasi), established in 2017, publishes scientific papers in the area of statistical science and its applications. The published papers should be research papers with, but not limited ...