Journal of Applied Data Sciences
Vol 7, No 1: January 2026

Global Air Quality Index Prediction Using Machine Learning on Major Pollutants

Santoso, Richard
Iskandar, Karto



Article Info

Publish Date
22 Feb 2026

Abstract

Air pollution remains a major global concern because of its significant impact on public health and environmental sustainability. This study aims to develop a reliable global Air Quality Index (AQI) prediction model by evaluating five regression-based machine learning algorithms: Linear Regression, Support Vector Regression, Random Forest, XGBoost, and LightGBM. The dataset contains over twenty thousand pollutant concentration records from multiple countries. Because the dataset consists of independent pollutant observations without timestamps or temporal sequences, this research employs supervised regression rather than time-series forecasting, which keeps the method consistent with the non-temporal structure of the data. The methodology includes data preprocessing, validation of geocoded country information for missing values, transformations to address skewed pollutant distributions, and feature selection based on established environmental standards. Sample weights were applied to account for uneven regional representation, and systematic hyperparameter tuning with cross-validation was conducted to optimize model parameters and reduce potential overfitting. Evaluation metrics are supported by correlation analysis to quantify relationships between pollutants and AQI. The results show that XGBoost delivers the highest and most stable performance, with an MAE of 0.0216, MSE of 0.0010, RMSE of 0.0318, R² of 0.9971, and MAPE of 0.5664. Feature importance analysis highlights PM2.5 as the most influential pollutant, followed by ozone, nitrogen dioxide, and carbon monoxide. The predicted AQI values closely align with observed measurements, indicating strong generalizability across regions. An interactive dashboard was developed to visualize AQI predictions and pollutant contributions across countries, improving practical usability for environmental monitoring. Overall, this study provides a comprehensive framework for global AQI prediction and demonstrates the potential of machine learning to support decision-making in environmental management and public health planning.
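For readers unfamiliar with the five evaluation metrics reported in the abstract, the following is a minimal sketch of how they are conventionally computed. It is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and the abstract does not state the scale of the target on which the metrics were computed or whether MAPE is reported as a percentage or a fraction (the sketch assumes a percentage).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R^2 and MAPE for two equal-length sequences.

    Hypothetical helper for illustration; MAPE is returned in percent
    and assumes no true value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))                      # mean absolute error
    mse = np.mean(err ** 2)                         # mean squared error
    rmse = np.sqrt(mse)                             # root mean squared error
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    mape = np.mean(np.abs(err / y_true)) * 100.0    # mean absolute percentage error

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}

# Made-up observed and predicted values, for illustration only
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

RMSE is the square root of MSE, so the two always move together; the reported values are consistent with this (0.0318² ≈ 0.0010).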

Copyrights © 2026






Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...