Jakarta Bay is under persistent anthropogenic pressure, which produces spatially heterogeneous water-quality conditions. This study develops a regulation-aligned, explainable classification framework using a 2024 in-situ dataset collected at 53 stations across two sampling periods (March and August). After preprocessing (unit harmonization, outlier screening, missing-value imputation, and treatment of below-detection-limit measurements), the dataset yielded 104 complete samples, classified as Good (n=46), Lightly Polluted (n=28), or Moderately Polluted (n=34) under KEPMEN LH No. 51/2004. Three ensemble algorithms (LightGBM, CatBoost, and Random Forest) were evaluated using stratified cross-validation to preserve class proportions in each fold and prevent spatial leakage. CatBoost achieved the best overall performance (Accuracy = 0.8338; F1 = 0.8257), followed by Random Forest, while LightGBM showed the highest variability across folds. Class-level metrics indicate that CatBoost produced the most balanced predictions, particularly for the borderline Lightly Polluted class. SHAP analysis identified turbidity/TSS, nutrients, dissolved oxygen, salinity, and spatial gradients as the dominant predictors, making the model's decisions transparent to interpret. The resulting framework provides a reproducible and operationally deployable approach for rapid screening, hotspot detection, and decision support in Jakarta Bay’s water-quality management.
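The stratified splitting mentioned above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function name `stratified_folds`, the choice of five folds, and the random seed are all assumptions made for illustration, and only the per-class counts are taken from the abstract. The sketch shows how class proportions are preserved in each fold; it does not group samples by station, so it does not by itself address spatial leakage.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each class is spread evenly.

    Hypothetical helper for illustration; k and seed are assumed values.
    """
    rng = random.Random(seed)

    # Collect the sample indices belonging to each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold receives a
        # near-equal share of this class (counts differ by at most 1).
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Per-class counts as reported in the abstract (labels only, no features).
labels = (
    ["Good"] * 46
    + ["Lightly Polluted"] * 28
    + ["Moderately Polluted"] * 34
)

folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]

for n, counts in enumerate(fold_counts):
    print(f"fold {n}: {dict(counts)}")
```

In each iteration of cross-validation, one fold would serve as the held-out set and the remaining folds as training data. Because every class is dealt out separately, the minority Lightly Polluted class appears in every fold rather than clustering in a few of them.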
Copyright © 2026