Air quality is critical to human health, environmental integrity, and ecosystem sustainability, and accurate assessment of the Air Quality Index (AQI) is essential for effective air quality monitoring and management. However, prior studies often relied on a single machine learning method, which may limit classification performance, especially when AQI categories are imbalanced. This study compares the performance of multiple machine learning algorithms for AQI classification and applies random oversampling to address the imbalance among AQI categories. The dataset comprises secondary data on pollutant concentrations (PM10, SO₂, CO, O₃, NO₂) and AQI categories collected from five monitoring stations between 2010 and 2023. Four classification algorithms were evaluated using accuracy, precision, recall, and F1-score. Before random oversampling, the Random Forest model achieved 97.68% accuracy. After oversampling, its accuracy rose to 99.60%, with consistently high precision, recall, and F1-score. Feature importance analysis showed that ozone (O₃) was the most influential predictor, accounting for 67.14% of the model's feature importance. These findings indicate that combining random oversampling with ensemble-based machine learning can yield highly accurate AQI classification, offering a robust framework for future environmental monitoring applications.
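To make the random oversampling step concrete, the following is a minimal sketch in plain Python, not the study's actual implementation. It duplicates randomly chosen rows from the smaller classes until every class is as large as the biggest one. The function name `random_oversample`, the category labels, and the pollutant values are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original row, then draw the extras with replacement.
        extras = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extras:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up readings in the order (PM10, SO2, CO, O3, NO2); not the study's data.
rows = [
    (40, 10, 5, 30, 8),
    (45, 12, 6, 35, 9),
    (50, 11, 7, 33, 10),
    (55, 14, 6, 40, 11),
    (120, 30, 15, 110, 25),
    (300, 60, 30, 250, 50),
]
# Assumed category names, used only for this example.
labels = ["Good", "Good", "Good", "Good", "Moderate", "Unhealthy"]

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
print("before:", Counter(labels))
print("after: ", Counter(balanced_labels))
```

Because oversampling only repeats existing rows, it adds no new information. A common precaution, which the abstract does not confirm was used here, is to oversample only the training split so that duplicated rows cannot appear in both training and test data.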
Copyright © 2025