This study compares four regression algorithms, namely Support Vector Regression (SVR), Gradient Boosting Regressor (GBR), Random Forest Regressor (RFR), and Extreme Gradient Boosting (XGBoost), for predicting annual aggregate sales from socioeconomic indicators in Cirebon Regency over the period 2010 to 2023. The study uses secondary data obtained from the Central Bureau of Statistics (Badan Pusat Statistik) of Cirebon Regency. Five predictor variables are employed: life expectancy, expected years of schooling, mean years of schooling, per capita expenditure, and the Human Development Index (HDI). Model performance is evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination (R²). The experimental results indicate that, after parameter tuning, the GBR model achieved the best predictive performance, with the lowest error values (MAE = 127.98 and RMSE = 185.63) and the highest R² value (0.94), outperforming RFR, XGBoost, and SVR. Feature importance analysis consistently identified life expectancy as the most influential variable across models. These findings suggest that ensemble-based regression methods, particularly boosting algorithms, are effective for modeling complex socioeconomic patterns and can support data-driven economic forecasting and regional policy planning.
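For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how MAE, RMSE, and R² are conventionally computed from observed and predicted values. It is not the study's code: the function names and the sample values are hypothetical and chosen only for illustration, and the study's own implementation may differ.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values (NOT data from the study).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2  :", round(r_squared(y_true, y_pred), 3))
```

Lower MAE and RMSE indicate smaller prediction errors, while an R² closer to 1 indicates that the model explains more of the variance in the target, which is the sense in which the GBR results reported above are described as best.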
Copyright © 2026