The unemployment rate among bachelor's graduates in Indonesia reaches 11.4% in the first six months after graduation, which indicates the need for an early prediction system that identifies the factors influencing student employability. This study analyzes and compares the performance of three machine learning algorithms (Random Forest, Logistic Regression, and XGBoost) in predicting employment status six months after graduation from academic and socioeconomic data. The dataset consists of 3,945 graduate records from universities in Padangsidimpuan, with the variables study program, study duration, GPA, gender, and parental income. The operational target is employment status six months after graduation (binary: employed = 1, not yet employed = 0), with class proportions of 48.2% employed and 51.8% not yet employed. Evaluation uses stratified 5-fold cross-validation with accuracy, balanced accuracy, F1-macro, ROC-AUC, and PR-AUC as metrics. Model interpretability is analyzed using permutation importance and SHAP values. Random Forest achieved the best performance, with F1-macro 0.524±0.015 and ROC-AUC 0.567±0.012, followed by Logistic Regression (F1-macro: 0.511±0.018) and XGBoost (F1-macro: 0.506±0.020). The majority-class baseline achieved an accuracy of 51.8%. Permutation importance analysis identified GPA as the most influential factor (importance: 0.082), followed by parental income (0.067) and study duration (0.041). The machine learning models provided a moderate improvement over the majority baseline. GPA and socioeconomic factors were found to significantly influence graduates' employment status. These findings can support the development of a data-driven early warning system for student mentoring.
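To make the majority-baseline comparison concrete, the following minimal sketch computes accuracy, balanced accuracy, and F1-macro for a classifier that always predicts the majority class. It is an illustration only, not the study's code: the class counts (1,902 employed, 2,043 not yet employed) are an assumption reconstructed from the reported 3,945 records and the 48.2% / 51.8% split, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn) when `positive` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f1_for_class(y_true, y_pred, positive):
    """Per-class F1; defined as 0.0 when the class is never predicted or present."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall_for_class(y_true, y_pred, positive):
    """Per-class recall; defined as 0.0 when the class has no true samples."""
    tp, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def baseline_metrics(y_true):
    """Score a classifier that always predicts the most frequent label."""
    majority = Counter(y_true).most_common(1)[0][0]
    y_pred = [majority] * len(y_true)
    classes = sorted(set(y_true))
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # Balanced accuracy is the unweighted mean of per-class recall.
    balanced = sum(recall_for_class(y_true, y_pred, c) for c in classes) / len(classes)
    # F1-macro is the unweighted mean of per-class F1.
    macro_f1 = sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
    return {"accuracy": accuracy, "balanced_accuracy": balanced, "f1_macro": macro_f1}


# ASSUMPTION: counts reconstructed from the reported percentages, not taken from the data.
N_EMPLOYED = 1902   # label 1, about 48.2% of 3,945
N_NOT_YET = 2043    # label 0, about 51.8% of 3,945

y_true = [1] * N_EMPLOYED + [0] * N_NOT_YET
metrics = baseline_metrics(y_true)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumed counts, the always-majority predictor reproduces the reported baseline accuracy of 51.8%, while its balanced accuracy is exactly 0.5 and its F1-macro is roughly 0.34, because the minority class is never predicted. This is one reason to report the baseline on the same metrics as the models rather than on accuracy alone.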
Copyrights © 2025