Poverty in South Sumatra remains a complex challenge shaped by interacting socioeconomic factors. Traditional methods often fail to capture the nonlinear relationships that accurate prediction requires. This study improves poverty prediction by optimizing feature engineering on a dataset of 32 socioeconomic variables for South Sumatra covering 2019 to 2023. Data preprocessing comprised cleaning, imputation, normalization, and outlier handling. Feature aggregation produced four composite indices: an Education Index (P1, P2, P3), a Health Index (AH1–AH4), an Economic Index (IE, GR, AI, EG), and a Healthcare Workforce Index (HW1–HW9). Feature interaction derived ratios that capture interdependencies between variables: Income vs. Economy (AN/Education Index), Infrastructure vs. Health (road length/Healthcare Workforce Index), and Unemployment vs. Workforce (HI/AT). Dimensionality reduction with principal component analysis (PCA) and Lasso regression then selected eight key predictors, including Year and Poverty Level. Among the models tested, Random Forest performed best (R² = 0.7244, MAE = 0.2489). SHAP analysis identified the Education and Economic Indices as the strongest predictors. The optimized feature engineering improved both model accuracy and interpretability, supporting targeted poverty-reduction strategies in South Sumatra.
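The aggregation and interaction steps can be sketched in plain Python as follows. The abstract does not state how indicators are weighted or scaled, so this sketch assumes each composite index is the unweighted mean of its min-max scaled indicators. It uses only two healthcare-workforce columns (HW1, HW2) for brevity, and the column name `ROAD_LEN` for road length is a hypothetical placeholder. The sample values are made up for illustration and are not study data. The PCA, Lasso, Random Forest, and SHAP stages are not shown.

```python
from statistics import mean
from typing import Optional

Row = dict[str, Optional[float]]


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def composite_index(rows: list[Row], cols: list[str], name: str) -> None:
    """Add `name` to each row as the unweighted mean of min-max scaled `cols` (assumed weighting)."""
    scaled = {c: min_max([r[c] for r in rows]) for c in cols}
    for i, r in enumerate(rows):
        r[name] = mean(scaled[c][i] for c in cols)


def safe_ratio(num: float, den: float) -> Optional[float]:
    """Interaction ratio; returns None instead of dividing by zero."""
    return None if den == 0 else num / den


def engineer(rows: list[Row]) -> list[Row]:
    """Return copies of `rows` with composite indices and interaction ratios added."""
    rows = [dict(r) for r in rows]  # leave the caller's data untouched
    composite_index(rows, ["P1", "P2", "P3"], "EducationIndex")
    composite_index(rows, ["HW1", "HW2"], "HealthWorkforceIndex")  # toy subset of HW1-HW9
    for r in rows:
        r["Income_vs_Economy"] = safe_ratio(r["AN"], r["EducationIndex"])
        r["Infra_vs_Health"] = safe_ratio(r["ROAD_LEN"], r["HealthWorkforceIndex"])  # hypothetical column
        r["Unemp_vs_Workforce"] = safe_ratio(r["HI"], r["AT"])
    return rows


if __name__ == "__main__":
    # Made-up values for three hypothetical regions.
    regions = [
        {"P1": 60, "P2": 70, "P3": 80, "HW1": 10, "HW2": 4, "AN": 5.0, "ROAD_LEN": 120.0, "HI": 3.0, "AT": 60.0},
        {"P1": 80, "P2": 90, "P3": 100, "HW1": 20, "HW2": 8, "AN": 6.0, "ROAD_LEN": 300.0, "HI": 2.0, "AT": 80.0},
        {"P1": 70, "P2": 80, "P3": 90, "HW1": 15, "HW2": 6, "AN": 4.0, "ROAD_LEN": 200.0, "HI": 4.0, "AT": 50.0},
    ]
    for r in engineer(regions):
        print(r["EducationIndex"], r["Income_vs_Economy"], r["Unemp_vs_Workforce"])
```

With min-max scaling, the lowest-scoring region can receive an index of exactly 0, which would make a ratio such as AN/Education Index undefined. The helper therefore returns None for that row rather than raising; a real pipeline might instead add a small offset or use a different scaling.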
Copyright © 2025