Journal of Dinda : Data Science, Information Technology, and Data Analytics
Vol 4 No 2 (2024): August

Determining Air Quality Influential Parameters Using Machine Learning Techniques

Evita Fitri (Universitas Nusa Mandiri)
Andi Saryoko (Universitas Nusa Mandiri)



Article Info

Publish Date
06 Aug 2024

Abstract

Air quality is an important public health and environmental issue. This research aims to develop an air quality prediction model based on the PM10 and PM2.5 parameters using several regression and machine learning approaches. The dataset consists of air pollutant standard index (ISPU) data from a number of stations in the Jakarta area, observed from January to April 2024. The research method comprises dataset collection, a literature review, and testing of several machine learning models. Outliers were handled using the numeric outliers node, and the data were normalized before being divided into training and testing sets. The models evaluated are Linear Regression, Random Forest Regression, Gradient Boosted Trees, and Multilayer Perceptron (MLP), validated using 10-fold cross-validation. The results show that the Random Forest Regression and Gradient Boosted Trees models provided good prediction performance for both PM10 and PM2.5. Random Forest Regression had the lowest RMSE on the testing data for PM10 (0.048) and PM2.5 (0.037), while Gradient Boosted Trees had the lowest RMSE on the training data for PM2.5 (0.032). Handling outliers and normalizing the data improved the prediction accuracy of the models. Suggestions for future research include exploring new models, adding meteorological and socio-economic variables, and applying the models in real-time air quality monitoring systems.
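The preprocessing and validation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: it assumes outliers are dropped with the 1.5 × IQR rule, normalization is min-max scaling, and only a single-predictor linear regression is fitted (Random Forest, Gradient Boosted Trees, and MLP are not implemented here). The data are synthetic, and all function names, as well as the choice of PM10 as the predictor for PM2.5, are hypothetical.

```python
import math
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from the interquartile range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def k_fold_rmse(xs, ys, k=10, seed=0):
    """Mean held-out RMSE of the linear fit over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in fold]
        scores.append(rmse([ys[i] for i in fold], preds))
    return sum(scores) / k


def run_sketch(seed=42):
    """Synthetic end-to-end run; returns (rows kept, mean 10-fold RMSE)."""
    rng = random.Random(seed)
    # Synthetic stand-in for station readings (NOT the ISPU data).
    pm10 = [rng.uniform(20, 120) for _ in range(200)]
    pm25 = [0.6 * x + rng.gauss(0, 5) for x in pm10]
    pm10[0], pm25[0] = 900.0, 700.0  # one injected outlier row
    lo10, hi10 = iqr_bounds(pm10)
    lo25, hi25 = iqr_bounds(pm25)
    kept = [
        (x, y) for x, y in zip(pm10, pm25)
        if lo10 <= x <= hi10 and lo25 <= y <= hi25
    ]
    # Normalize before splitting, following the order stated in the abstract.
    xs = min_max_scale([x for x, _ in kept])
    ys = min_max_scale([y for _, y in kept])
    return len(kept), k_fold_rmse(xs, ys, k=10)


if __name__ == "__main__":
    n_rows, score = run_sketch()
    print(f"rows kept: {n_rows}, mean 10-fold RMSE: {score:.3f}")
```

Because the targets are scaled to [0, 1], the RMSE from this sketch is on the normalized scale, which is presumably also why the values reported in the abstract are well below 1.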

Copyrights © 2024






Journal Info

Abbrev

dinda

Publisher

Subject

Computer Science & IT

Description

Journal of Dinda : Data Science, Information Technology, and Data Analytics is a publication medium for research results in the fields of Data Science, Information Technology, and Data Analytics, but is not limited to these fields. It is published twice a year, in February and August. The journal is managed by ...