TECHNOMEDIA : Informatics and Computer Science
Vol 2 No 1 (2025): January

IMPLEMENTATION OF BIG DATA ANALYTICS IN AIR QUALITY CLASSIFICATION USING THE GRADIENT-BOOSTED TREE CLASSIFIER ALGORITHM ON PYSPARK

Muhamad Fuat Asnawi
Nur Fitriyanto
M. Agoeng Pamoengkas

Article Info

Publish Date
31 Jan 2025

Abstract

This study classifies air quality from PM1.0, PM2.5, and PM10 measurements using a Big Data Analytics approach, with the Gradient-Boosted Tree (GBT) Classifier algorithm implemented in the PySpark framework. The dataset was downloaded from OpenAQ and covers the period from April 14, 2021, to April 16, 2023, for a total of 1,048,154 entries. The research process comprises data preprocessing to address imbalance in the data, splitting the dataset into training and test sets, and hyperparameter tuning with grid search and cross-validation to optimize model performance. Leveraging PySpark’s parallel processing of large datasets, the GBT model achieved an accuracy of 98.87%, precision of 99.00%, recall of 98.87%, and an F1-score of 98.90%. The results show how Big Data Analytics can improve the efficiency and accuracy of air quality classification, and they support the development of real-time monitoring systems for air pollution mitigation and data-driven policy-making.
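The four scores reported in the abstract can be related to one another with a small worked example. The abstract does not say how precision, recall, and F1 were averaged across classes, so the sketch below assumes support-weighted averaging, in which each class contributes in proportion to its number of true instances. Under that assumption weighted recall always equals accuracy, which is consistent with the two identical 98.87% figures. The function name `weighted_metrics`, the class labels, and the sample data are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # assumption: weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; not data from the study.
y_true = ["good", "good", "good", "moderate", "moderate", "unhealthy"]
y_pred = ["good", "good", "moderate", "moderate", "moderate", "unhealthy"]
metrics = weighted_metrics(y_true, y_pred)
```

In this toy case one of six samples is misclassified, so accuracy and weighted recall are both 5/6, while weighted precision is 8/9. The same pattern, precision slightly above a recall that matches accuracy, appears in the reported results, although the study's actual averaging method remains an assumption here.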

Copyrights © 2025

Journal Info

Abbrev

technomedia

Publisher

Subject

Computer Science & IT

Description

TECHNOMEDIA : Informatics and Computer Science is a scientific journal of informatics and computer science published twice a year, in January and July, by CV Nature Creative Innovation. The journal is open access for researchers, ...