Laksono, Triyan Agung
Unknown Affiliation

Published : 2 Documents
Articles

Journal : Jurnal Informatika dan Rekayasa Perangkat Lunak

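Perbandingan Apache Airflow dan Apache Spark dalam Proses ETL untuk Memprediksi DropOut dan Keberhasilan Akademik Mahasiswa (Comparison of Apache Airflow and Apache Spark in the ETL Process for Predicting Student Dropout and Academic Success)
Laksono, Triyan Agung; Andriyani, Widyastuti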
Jurnal Informatika dan Rekayasa Perangkat Lunak Vol. 7 No. 2 (2025): September
Publisher : Universitas Wahid Hasyim


Abstract

Dropout prediction in higher education matters because dropout affects both students' academic success and the overall effectiveness of educational institutions. This research builds an automated ETL pipeline using Apache Airflow and Apache Spark to process academic data and predict student graduation status. The dataset consists of 4,424 samples with 36 features covering demographic, academic, and socio-economic attributes. The data is processed through extraction, transformation (including SMOTE oversampling and normalization), and loading into a Random Forest model. The evaluation showed an accuracy of 62.93%, with the highest ROC-AUC value, 0.81, obtained for the dropout class. The Airflow pipeline excels in task scheduling efficiency, while Spark is effective for large-scale data processing. This approach shows practical potential for supporting early warning systems in academic policy decision-making. The research contributes to the integration of big data and machine learning technologies for efficient, automated processing of higher education data.
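
The transformation stage named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library rather than Airflow or Spark, and the function names (`min_max_normalize`, `smote_like_oversample`, `transform`), the toy rows, the labels, and the choice to normalize before oversampling are all assumptions made for illustration. The oversampling step follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours by squared Euclidean distance,
        # excluding the base row itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def transform(rows, labels, minority_label):
    """Normalize features, then top up the minority class until it matches
    the size of the largest class."""
    normalized = min_max_normalize(rows)
    minority = [r for r, y in zip(normalized, labels) if y == minority_label]
    largest = max(labels.count(y) for y in set(labels))
    n_new = largest - len(minority)
    new_rows = smote_like_oversample(minority, n_new)
    return normalized + new_rows, labels + [minority_label] * n_new


# Hypothetical toy data: two features per student, imbalanced labels.
rows = [[0, 10], [1, 20], [2, 30], [3, 40], [10, 50], [9, 45]]
labels = ["graduate"] * 4 + ["dropout"] * 2
balanced_rows, balanced_labels = transform(rows, labels, "dropout")
```

In a pipeline like the one described, the balanced output would then be handed to the model-training step; in practice, oversampling is normally applied to the training split only, so that synthetic rows do not leak into evaluation.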