RAGAM: Journal of Statistics and Its Application
Vol 5, No 1 (2026): RAGAM: Journal of Statistics & Its Application

JAKARTA AIR QUALITY MODELING BASED ON DATA MINING WITH THE RANDOM FOREST, KNN, AND NAIVE BAYES ALGORITHMS

Ramadha Meisa Putra, Naufalarizqa



Article Info

Publish Date
09 Mar 2026

Abstract

Air quality prediction plays an important role in supporting public health monitoring in highly urbanized regions such as DKI Jakarta. This study predicts the Air Pollutant Standard Index (ISPU) category using three supervised learning algorithms, namely Random Forest, k-Nearest Neighbors (kNN), and Naive Bayes, based on five pollutant parameters: PM10, SO2, CO, O3, and NO2. The dataset consists of validated daily air-quality records that were preprocessed by handling missing values and applying min-max normalization. Models are evaluated with the Test and Score feature of the Orange Data Mining software, which provides a visual programming environment for machine learning analysis. Random Forest achieves the highest accuracy at 97 percent, followed by kNN at 94 percent and Naive Bayes at 88 percent. Feature ranking using the chi-square test indicates that PM10 is the most dominant factor influencing the ISPU category, with a score of 870.174, followed by O3 and NO2. These findings suggest that ensemble-based models are well suited to multiclass air quality classification and indicate that particulate matter remains a key determinant of air quality conditions in Jakarta.
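The preprocessing and classification steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' Orange workflow: it is a plain-Python stand-in that applies min-max normalization to the five pollutant features and then classifies a new record with a simple kNN majority vote. The readings, the English category labels, the choice of k = 3, and the helper names (`fit_min_max`, `apply_min_max`, `knn_predict`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical daily readings in the order PM10, SO2, CO, O3, NO2.
# These values are invented for illustration, not taken from the study.
train_X = [
    [30.0, 10.0, 8.0, 20.0, 5.0],
    [35.0, 12.0, 9.0, 25.0, 6.0],
    [70.0, 25.0, 20.0, 60.0, 15.0],
    [75.0, 28.0, 22.0, 65.0, 18.0],
    [120.0, 40.0, 35.0, 110.0, 30.0],
    [130.0, 45.0, 38.0, 120.0, 33.0],
]
# Hypothetical category labels standing in for ISPU categories.
train_y = ["GOOD", "GOOD", "MODERATE", "MODERATE", "UNHEALTHY", "UNHEALTHY"]


def fit_min_max(rows):
    """Return per-feature minimums and maximums learned from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row, mins, maxs):
    """Rescale one record to [0, 1] per feature: (x - min) / (max - min)."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, lo, hi in zip(row, mins, maxs)
    ]


def knn_predict(query, X, y, k=3):
    """Predict a label by majority vote among the k nearest training records."""
    nearest = sorted(zip(X, y), key=lambda pair: dist(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Learn the scaling from the training data only, then reuse it for new records.
mins, maxs = fit_min_max(train_X)
scaled_X = [apply_min_max(row, mins, maxs) for row in train_X]

new_day = [72.0, 26.0, 21.0, 62.0, 16.0]  # hypothetical unseen record
predicted = knn_predict(apply_min_max(new_day, mins, maxs), scaled_X, train_y, k=3)
print(predicted)
```

Normalizing before kNN matters because the method relies on distances: without rescaling, a pollutant measured on a larger numeric range would dominate the vote regardless of its relevance.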

Copyrights © 2026






Journal Info

Abbrev

ragam

Subject

Humanities; Computer Science & IT; Economics, Econometrics & Finance; Mathematics; Public Health

Description

RAGAM Journal publishes scientific articles in the field of statistics and its applications, including:
* Biostatistics
* Parametric and nonparametric statistics
* Quality control
* Econometrics and business
* Industrial statistics
* Time series analysis
* Spatial statistics
* Data mining
* ...