Accurate prediction of dissolved oxygen (DO), a critical indicator of aquatic ecosystem health, is a central challenge in river water quality monitoring. Environmental datasets often contain anomalies that skew predictive models and undermine the reliability of DO forecasts. This study applies and critically evaluates three anomaly detection methods (One-Class SVM, Isolation Forest, and Autoencoders) to improve predictive accuracy and to address gaps in existing research methodologies. The methodology covers data collection, preprocessing, anomaly detection, and evaluation, using a dataset of five indicators recorded at eight monitoring stations. The data were first prepared to ensure integrity and uniformity. The three methods differed in sensitivity: One-Class SVM identified 23 outliers, Isolation Forest 38, and Autoencoders 88. Measured by root mean square error (RMSE), Isolation Forest performed best, achieving the lowest RMSE of 0.9668, which indicates that it mitigated anomalies most effectively and produced a cleaner dataset. Autoencoders, although they flagged the most anomalies, yielded the highest RMSE, which suggests a tendency to overfit and to misclassify ordinary data variation as anomalous. These results show that the anomaly detection method should be matched to the characteristics of the dataset, because the choice strongly influences predictive model performance. In this setting, Isolation Forest offered the best balance between outlier detection and predictive accuracy, which underscores its potential as a robust method for environmental data analysis.
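The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. Everything in it is hypothetical rather than taken from the study: the data are synthetic (four predictor indicators plus a DO-like target, five columns in total), the hyperparameters `nu` and `contamination` are arbitrary, a plain linear regression stands in for the unspecified predictive model, and the autoencoder is omitted because it would need a deep learning library. The printed numbers will not match the reported results.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the monitoring data (assumption, not the study's dataset):
# four predictor indicators and one DO-like target.
n = 400
X = rng.normal(size=(n, 4))
y = 8.0 + X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=0.5, size=n)

# Inject a handful of gross anomalies so the detectors have something to find.
bad = rng.choice(n, size=20, replace=False)
X[bad] += rng.normal(scale=6.0, size=(20, 4))
y[bad] += rng.normal(scale=6.0, size=20)

# Detectors see all five columns (predictors and target) together.
data = np.column_stack([X, y])

# Hyperparameters are illustrative assumptions.
detectors = {
    "one_class_svm": OneClassSVM(nu=0.05, gamma="scale"),
    "isolation_forest": IsolationForest(contamination=0.05, random_state=0),
}


def rmse_after_filtering(detector):
    """Drop flagged rows, fit a regressor on the rest, return (n_flagged, rmse)."""
    labels = detector.fit_predict(data)  # +1 = inlier, -1 = outlier
    keep = labels == 1
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[keep], y[keep], test_size=0.25, random_state=0
    )
    model = LinearRegression().fit(X_tr, y_tr)
    rmse = float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))
    return int((~keep).sum()), rmse


results = {name: rmse_after_filtering(det) for name, det in detectors.items()}
for name, (n_flagged, rmse) in results.items():
    print(f"{name}: {n_flagged} flagged, RMSE = {rmse:.4f}")
```

In this sketch each detector is scored by the RMSE of a model trained and tested on the rows it kept, which is one plausible reading of "impact on model accuracy"; the study's actual evaluation protocol may differ.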
Copyrights © 2024