IJIIS: International Journal of Informatics and Information Systems
Vol 9, No 1: Regular Issue: January 2026

An Empirical Study on the Impact of Feature Scaling and Encoding Strategies on Machine Learning Regression Pipelines

Toer, Guevara Ananta
Kim, Gwanpil



Article Info

Publish Date
25 Jan 2026

Abstract

Data preprocessing is a critical yet often underestimated component of Machine Learning (ML) regression pipelines. While prior studies have largely focused on algorithm selection and model architecture, the combined impact of feature scaling and categorical encoding strategies within end-to-end regression pipelines remains insufficiently explored. This study presents an empirical evaluation of how different preprocessing configurations influence regression model performance. Three regression algorithms (Linear Regression, Random Forest Regression, and Gradient Boosting Regression) are evaluated in combination with three feature scaling methods (Min–Max, Standard, and Robust scaling) and two categorical encoding techniques (One-Hot and Ordinal encoding). Experiments are conducted on a real-world car sales dataset comprising 50,000 records, using a k-fold cross-validation framework to obtain robust performance estimates. Model performance is assessed primarily using mean R², supported by RMSE and MAE as error-based metrics. The results demonstrate that ensemble-based models, particularly Gradient Boosting and Random Forest, consistently outperform Linear Regression across all preprocessing configurations. Feature scaling has limited influence on ensemble model performance, whereas categorical encoding plays a more significant role, with One-Hot Encoding yielding higher predictive accuracy and lower error dispersion than Ordinal Encoding. Overall, the findings indicate that model choice is the dominant determinant of regression performance, followed by encoding strategy, while scaling has a comparatively minor effect. This study provides empirical guidance for designing robust and effective ML regression pipelines and underscores the importance of evaluating preprocessing techniques in conjunction with model selection.
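To make the preprocessing options named in the abstract concrete, the following is a minimal sketch of the three scaling methods and two encoding techniques, written with the Python standard library only. It is not the authors' implementation: the function names, the sample prices, and the fuel-type labels are hypothetical, and the abstract does not say which library the study used. The quartiles in the robust scaler use the "inclusive" method of `statistics.quantiles`, which is one of several possible quartile conventions.

```python
from statistics import mean, median, pstdev, quantiles


def min_max_scale(values):
    """Min-Max scaling: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Standard scaling: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def robust_scale(values):
    """Robust scaling: subtract the median, divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med, iqr = median(values), q3 - q1
    return [(v - med) / iqr if iqr else 0.0 for v in values]


def ordinal_encode(labels):
    """Ordinal encoding: one integer per category (alphabetical order here,
    which is an arbitrary choice made for this sketch)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]


def one_hot_encode(labels):
    """One-Hot encoding: one 0/1 indicator column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


if __name__ == "__main__":
    # Hypothetical values; the last price is a deliberate outlier.
    prices = [10, 20, 30, 40, 1000]
    fuel = ["petrol", "diesel", "petrol", "hybrid"]

    print(min_max_scale(prices))
    print(standard_scale(prices))
    print(robust_scale(prices))
    print(ordinal_encode(fuel))
    print(one_hot_encode(fuel))
```

Ordinal encoding imposes a numeric order on categories that may have none, whereas One-Hot encoding does not. This is one plausible reason an encoding choice could matter more than a scaling choice, although the abstract itself does not state the mechanism. Tree-based models split on thresholds, so they are generally insensitive to monotonic rescaling of a feature, which is consistent with the limited scaling effect reported for the ensemble models.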

Copyrights © 2026






Journal Info

Abbrev

IJIIS

Subject

Computer Science & IT

Description

The IJIIS is an international journal that aims to encourage comprehensive, multi-specialty informatics and information systems. The Journal publishes original research articles and review articles. It is an open access journal, with free access for each visitor (ijiis.org/index.php/IJIIS/); ...