Dropout prediction in higher education matters because dropout affects both students' academic success and the overall effectiveness of educational institutions. This research builds an automated ETL pipeline with Apache Airflow and Apache Spark to process academic data and predict student graduation status. The dataset consists of 4,424 samples with 36 features covering demographic, academic, and socio-economic attributes. The data pass through extraction, transformation (including normalization and SMOTE oversampling), and loading stages, after which they are fed to a Random Forest model. The model achieved an accuracy of 62.93%, with the highest per-class ROC-AUC, 0.81, obtained for the dropout class. Airflow provided efficient task scheduling, while Spark proved effective for large-scale data processing. The approach shows practical potential for supporting early warning systems in academic policy decision-making. This research contributes to the integration of big data and machine learning technologies for efficient, automated processing of higher education data.
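
As an illustration of the SMOTE step named in the transformation stage, the following is a minimal pure-Python sketch of the general technique, not the implementation used in this study. SMOTE (Synthetic Minority Over-sampling Technique) creates each synthetic sample by interpolating between a minority-class sample and one of its k nearest minority-class neighbours. The function names `smote` and `oversample`, the parameter defaults, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(features: List[Vector], labels: List[str],
               k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[str]]:
    """Oversample every class up to the size of the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_x = [list(x) for x in features]
    out_y = list(labels)
    for label, count in counts.items():
        if count < target:
            minority = [list(x) for x, y in zip(features, labels) if y == label]
            new_points = smote(minority, target - count, k=k, seed=seed)
            out_x.extend(new_points)
            out_y.extend([label] * len(new_points))
    return out_x, out_y


# Toy data (illustrative only): four "graduate" rows and two "dropout" rows.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["graduate"] * 4 + ["dropout"] * 2
balanced_X, balanced_y = oversample(X, y)
print(Counter(balanced_y))
```

In a pipeline of the kind described here, oversampling would normally be applied only to the training split, so that synthetic samples do not leak into the evaluation data.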