Groundwater quality is fundamental to meeting clean water needs, particularly in urban areas such as Jakarta, where supply is severely limited by contamination from domestic waste, chemical pollutants, industrial activities, and septic tank leakage. This study compares the performance of nine machine learning algorithms for building a groundwater feasibility classification model based on physical parameters. Real-time data were collected from three administrative regions of Jakarta using Internet of Things (IoT) sensors that monitored pH, temperature, total dissolved solids (TDS), and turbidity. Model evaluation involved hyperparameter tuning, cross-validation, feature importance analysis, Local Interpretable Model-agnostic Explanations (LIME), and performance metrics including AUC, accuracy, precision, recall, and F1-score. CatBoost achieved the highest overall performance (AUC: 0.9448, accuracy: 0.9318, F1-score: 0.9209). LightGBM was competitive, with an F1-score of 0.9211 and an AUC of 0.9431, while XGBoost recorded the highest recall (0.9359). Random Forest and AdaBoost also performed consistently, with a precision of 0.9094 and a recall of 0.9327, respectively. In contrast, the Support Vector Machine (SVM) yielded the lowest performance (AUC: 0.8860, accuracy: 0.8499). Based on this comprehensive evaluation, CatBoost is recommended as the most suitable model for IoT-based groundwater quality classification systems.
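The evaluation metrics named above (accuracy, precision, recall, F1-score, and AUC) can be made concrete with a short sketch. The pure-Python functions below, `binary_metrics` and `roc_auc`, are hypothetical illustrations, not the authors' evaluation code. They assume a binary label where 1 means "feasible" and 0 means "not feasible", and that the classifier outputs a continuous score for computing AUC. The abstract does not state the study's label encoding or whether averaged (e.g., macro) metrics were reported, so the numbers produced here are not directly comparable to the reported results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a single positive class.

    Assumption (hypothetical): `positive` marks groundwater classified as feasible.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used; the explicit pairwise loop is shown only to make the definition of AUC visible.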
Copyright © 2025