Jurnal Informatika dan Rekayasa Perangkat Lunak
Vol. 7 No. 2 (2025): September

Comparison of Apache Airflow and Apache Spark in the ETL Process for Predicting Student Dropout and Academic Success

Laksono, Triyan Agung
Andriyani, Widyastuti



Article Info

Publish Date
30 Sep 2025

Abstract

Predicting dropout in higher education is important because dropout affects students' academic success and the overall effectiveness of educational institutions. This research aims to build an automated ETL pipeline using Apache Airflow and Apache Spark to process academic data and predict student graduation status. The dataset consists of 4,424 samples with 36 features covering demographic, academic, and socio-economic attributes. The data is processed through the stages of extraction, transformation (including SMOTE and normalization), and loading into a Random Forest model. The evaluation showed an accuracy of 62.93%, with the highest ROC-AUC value, 0.81, obtained for the dropout class. The Airflow pipeline excels in task scheduling efficiency, while Spark is effective for large-scale data processing. This approach shows practical potential for supporting early warning systems in academic policy decision-making. This research contributes to the integration of big data and machine learning technologies for efficient and automated higher education data processing.
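The extraction, transformation, and loading stages named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the function names, the feature names used with it, and the binary label setup are all assumptions made for illustration. The oversampling step is a simplified SMOTE-style interpolation toward the single nearest minority neighbour, not the full SMOTE algorithm, and the Random Forest training step is left out, so the load stage only returns the prepared data.

```python
import random


def extract(rows):
    """Extract: a real pipeline would read a CSV file or database table.
    Here the rows are passed in directly (assumption for illustration)."""
    return [dict(r) for r in rows]


def _nearest(a, points):
    """Return the point in `points` closest to `a` (squared Euclidean), excluding `a`."""
    others = [p for p in points if p is not a]
    return min(others, key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))


def transform(records, feature_names, label_name, minority_label, seed=0):
    """Transform: min-max normalize each feature, then oversample the minority
    class with a simplified SMOTE-style interpolation (binary labels assumed)."""
    lows = {f: min(r[f] for r in records) for f in feature_names}
    highs = {f: max(r[f] for r in records) for f in feature_names}

    X, y = [], []
    for r in records:
        X.append([
            (r[f] - lows[f]) / (highs[f] - lows[f]) if highs[f] > lows[f] else 0.0
            for f in feature_names
        ])
        y.append(r[label_name])

    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    deficit = (len(y) - len(minority)) - len(minority)

    # Interpolation needs at least two minority points to work with.
    if len(minority) >= 2:
        rng = random.Random(seed)
        for _ in range(max(deficit, 0)):
            a = rng.choice(minority)
            b = _nearest(a, minority)
            t = rng.random()
            # Synthetic point on the segment between a and its nearest neighbour.
            X.append([u + t * (v - u) for u, v in zip(a, b)])
            y.append(minority_label)
    return X, y


def load(X, y):
    """Load: hand the prepared data to the next step. A real pipeline would
    train a classifier (for example a Random Forest) here; that is omitted."""
    return X, y


def run_pipeline(rows, feature_names, label_name, minority_label):
    """Run the three stages in order, as a scheduler task chain would."""
    records = extract(rows)
    X, y = transform(records, feature_names, label_name, minority_label)
    return load(X, y)
```

In an orchestrated setting, each of the three functions would typically become its own scheduled task, so that a failure in one stage can be retried without rerunning the others.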

Copyrights © 2025






Journal Info

Abbrev

JINRPL

Publisher

Subject

Computer Science & IT

Description

Journal of Informatics and Software Engineering accepts scientific articles focused on Informatics. The scope includes: Software Engineering, Information Systems, Artificial Intelligence, Computer Based Learning, Computer Networking and Data Communication, and ...