Karto Iskandar
Program Studi Teknik Informatika Universitas Bina Nusantara Jakarta

Published : 14 Documents

Journal : Journal of Applied Data Sciences

Global Air Quality Index Prediction Using Machine Learning on Major Pollutants
Santoso, Richard; Iskandar, Karto
Journal of Applied Data Sciences Vol 7, No 1: January 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i1.1112

Abstract

Air pollution remains a major global concern due to its significant impact on public health and environmental sustainability. This study aims to develop a reliable global Air Quality Index (AQI) prediction model by evaluating five regression-based machine learning algorithms: Linear Regression, Support Vector Regression, Random Forest, XGBoost, and LightGBM. The dataset contains over twenty thousand pollutant concentration records from multiple countries. Because the dataset consists of independent pollutant observations without timestamps or temporal sequences, this research employs supervised regression techniques rather than time-series forecasting methods, keeping the methodology consistent with the non-temporal structure of the data. The methodology includes data preprocessing, validation of geocoded country information for missing values, transformations to address skewed pollutant distributions, and feature selection based on established environmental standards. Sample weights were applied to account for uneven regional representation, and systematic hyperparameter tuning with cross-validation was conducted to optimize model parameters and reduce potential overfitting. Evaluation metrics are supported by correlation analysis to quantify relationships between pollutants and AQI. The results show that XGBoost delivers the highest and most stable performance, with an MAE of 0.0216, MSE of 0.0010, RMSE of 0.0318, R² of 0.9971, and MAPE of 0.5664. Feature importance analysis highlights PM2.5 as the most influential pollutant, followed by ozone, nitrogen dioxide, and carbon monoxide. The predicted AQI values closely align with observed measurements, demonstrating strong generalizability across regions. An interactive dashboard was developed to visualize AQI predictions and pollutant contributions across countries, improving practical usability for environmental monitoring. Overall, this study provides a comprehensive framework for global AQI prediction and demonstrates the potential of machine learning to support decision-making in environmental management and public health planning.
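
The abstract reports five evaluation metrics (MAE, MSE, RMSE, R², and MAPE). As a minimal sketch of how these are conventionally computed for a regression model, the following uses only the Python standard library. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper. MAPE is expressed here as a percentage, which is an assumption, since the abstract does not state the unit or the scale on which the metrics were measured.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R^2 and MAPE for paired observations.

    Hypothetical helper for illustration; MAPE is returned in percent
    (an assumption, the abstract does not state the unit).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - (mse * n) / ss_tot

    # Mean absolute percentage error, in percent
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "mape": mape}


# Illustrative values only, not data from the study
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
metrics = regression_metrics(observed, predicted)
```

Because MAE, MSE, and RMSE depend on the scale of the target, values such as those quoted in the abstract can only be compared across models evaluated on the same (possibly transformed) AQI scale.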