Vanisa Amalia Putri
Universitas Sriwijaya, Palembang

Published: 1 document

Leakage-Aware Random Forest Regression for Predicting Job Automation Risk Using Structured Labor Market Data
Alya Zalfa Chairunnisa; Nawirah Athqiyah; Vanisa Amalia Putri; Ken Dhita Tania; Allsela Meiriza
Building of Informatics, Technology and Science (BITS) Vol 8 No 1 (2026): June 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v8i1.9706

Abstract

This study predicts job automation risk in the era of artificial intelligence (AI) using a leakage-aware Random Forest Regression approach. The target variable is the automation risk score, a composite index derived from task exposure to AI, occupational routine intensity, and technological susceptibility indicators sourced from the AI Impact Jobs Dataset. The dataset comprises 5,000 job vacancy records from 44 countries across 9 industries, spanning 2010 to 2025. Potential data leakage features are systematically identified and eliminated, including ai_intensity_score, reskilling_required, and ai_mentioned, which were found to share mathematical or conceptual derivation paths with the target variable. The model is evaluated using R², RMSE, MAE, and MAPE with 5-fold cross-validation. It achieves an R² of 0.8087 on the test data, with an RMSE of 0.1129 and an MAE of 0.0893. Feature importance analysis shows that salary_change_vs_prev_year_percent is the most influential predictor (55.85%); although this dominance is indicative of the bias typical of synthetic datasets, it aligns with economic theories linking wage dynamics to automation incentives. The findings show that leakage control removes inflated performance estimates (R² falls from 0.8857 to 0.8087 once the leaking features are excluded), and that Random Forest Regression provides a robust predictive framework for tabular socio-economic data when combined with rigorous preprocessing. This study contributes a methodological template for preventing data leakage in labor market prediction tasks.
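
The evaluation metrics and the idea of screening for leaking features can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The correlation-based screen, the 0.95 threshold, the function names, the years_experience column, and all toy values are assumptions made for illustration only; the abstract says only that leaking features share mathematical or conceptual derivation paths with the target, which a correlation check can hint at but cannot establish.

```python
import math
from statistics import mean


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined where the true value is zero; those rows are skipped.
        "mape": 100.0 * mean(abs(e / t) for e, t in zip(errors, y_true) if t != 0),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return cov / math.sqrt(sxx * syy)


def flag_leakage_candidates(features, target, threshold=0.95):
    """Flag columns whose |correlation| with the target meets the threshold.

    ASSUMPTION: this heuristic is illustrative, not the paper's procedure.
    Flagged columns still need manual review of how they were derived.
    """
    return sorted(
        name for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    )


# Toy values, invented for illustration (not taken from the dataset).
toy_target = [0.2, 0.4, 0.6, 0.8]
toy_features = {
    "ai_intensity_score": [0.21, 0.39, 0.61, 0.79],  # near-copy of the target
    "years_experience": [5, 1, 4, 2],                # hypothetical column
}
flagged = flag_leakage_candidates(toy_features, toy_target)
toy_metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In a workflow like the one the abstract describes, the flagged columns would be dropped before fitting the model, and the metrics would be computed on held-out folds both with and without those columns to see how much the leaking features inflate the score.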