Toer, Guevara Ananta
Unknown Affiliation

Published: 2 Documents

Found 2 Documents

An Empirical Study on the Impact of Feature Scaling and Encoding Strategies on Machine Learning Regression Pipelines
Toer, Guevara Ananta; Kim, Gwanpil
International Journal of Informatics and Information Systems Vol 9, No 1: Regular Issue: January 2026
Publisher : International Journal of Informatics and Information Systems

DOI: 10.47738/ijiis.v9i1.293

Abstract

Data preprocessing is a critical yet often underestimated component of Machine Learning (ML) regression pipelines. While prior studies have largely focused on algorithm selection and model architecture, the combined impact of feature scaling and categorical encoding strategies within end-to-end regression pipelines remains insufficiently explored. This study presents an empirical evaluation of how different preprocessing configurations influence regression model performance. Three regression algorithms (Linear Regression, Random Forest Regression, and Gradient Boosting Regression) are evaluated in combination with multiple feature scaling methods (Min–Max, Standard, and Robust scaling) and categorical encoding techniques (One-Hot and Ordinal encoding). Experiments are conducted on a real-world car sales dataset comprising 50,000 records, using a k-fold cross-validation framework to ensure robust performance estimation. Model performance is assessed primarily using mean R², supported by RMSE and MAE as error-based metrics. The results demonstrate that ensemble-based models, particularly Gradient Boosting and Random Forest, consistently outperform Linear Regression across all preprocessing configurations. Feature scaling shows limited influence on ensemble model performance, whereas categorical encoding plays a more significant role, with One-Hot Encoding yielding higher predictive accuracy and lower error dispersion than Ordinal Encoding. Overall, the findings highlight that model choice is the dominant determinant of regression performance, followed by encoding strategy, while scaling has a comparatively minor effect. This study provides empirical guidance for designing robust and effective ML regression pipelines and underscores the importance of evaluating preprocessing techniques in conjunction with model selection.
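
The experimental design described in this abstract (three scalers, two encoders, three regressors, k-fold cross-validation scored by R², RMSE, and MAE) can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' code: the synthetic data, the column names (`mileage`, `engine_size`, `fuel_type`), the fold count, and all hyperparameters are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (MinMaxScaler, OneHotEncoder, OrdinalEncoder,
                                   RobustScaler, StandardScaler)

# Hypothetical synthetic data standing in for the car sales dataset (assumption).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "mileage": rng.uniform(0, 200_000, n),
    "engine_size": rng.uniform(1.0, 4.0, n),
    "fuel_type": rng.choice(["petrol", "diesel", "hybrid"], n),
})
fuel_effect = X["fuel_type"].map({"petrol": 0.0, "diesel": 1500.0, "hybrid": 4000.0})
y = (20_000 - 0.05 * X["mileage"] + 3_000 * X["engine_size"]
     + fuel_effect + rng.normal(0, 500, n))

NUMERIC = ["mileage", "engine_size"]
CATEGORICAL = ["fuel_type"]

# Factories so that every configuration gets a fresh, unfitted estimator.
scalers = {"minmax": MinMaxScaler, "standard": StandardScaler, "robust": RobustScaler}
encoders = {
    "onehot": lambda: OneHotEncoder(handle_unknown="ignore"),
    "ordinal": lambda: OrdinalEncoder(),
}
models = {
    "linear": lambda: LinearRegression(),
    "random_forest": lambda: RandomForestRegressor(n_estimators=30, random_state=0),
    "gradient_boosting": lambda: GradientBoostingRegressor(n_estimators=50, random_state=0),
}
SCORING = ("r2", "neg_root_mean_squared_error", "neg_mean_absolute_error")


def evaluate_grid(X, y, n_splits=5):
    """Cross-validate every scaler x encoder x model combination."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    rows = []
    for s_name, make_scaler in scalers.items():
        for e_name, make_encoder in encoders.items():
            for m_name, make_model in models.items():
                # Preprocessing lives inside the pipeline, so it is refit on
                # each training fold and never sees the held-out fold.
                pre = ColumnTransformer([
                    ("num", make_scaler(), NUMERIC),
                    ("cat", make_encoder(), CATEGORICAL),
                ])
                pipe = Pipeline([("pre", pre), ("model", make_model())])
                cv_out = cross_validate(pipe, X, y, cv=cv, scoring=SCORING)
                rows.append({
                    "scaler": s_name,
                    "encoder": e_name,
                    "model": m_name,
                    "mean_r2": cv_out["test_r2"].mean(),
                    "std_r2": cv_out["test_r2"].std(),
                    "mean_rmse": -cv_out["test_neg_root_mean_squared_error"].mean(),
                    "mean_mae": -cv_out["test_neg_mean_absolute_error"].mean(),
                })
    return pd.DataFrame(rows)


results = evaluate_grid(X, y)
print(results.sort_values("mean_r2", ascending=False).to_string(index=False))
```

Wrapping the scaler and encoder in a `Pipeline` is one reasonable way to keep preprocessing statistics from leaking across folds; whether the study took this approach is not stated in the abstract.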

Clustering Digital Governance Adoption Patterns in the Metaverse Using K-Means and DBSCAN Algorithms
Widjaja, Andree Emmanuel; Hery; Toer, Guevara Ananta
International Journal Research on Metaverse Vol. 3 No. 1 (2026): Regular Issue March 2026
Publisher : Bright Publisher

DOI: 10.47738/ijrm.v3i1.42

Abstract

The rapid advancement of immersive digital environments has accelerated global interest in leveraging metaverse technologies as extensions of public governance systems. This study analyses citizen readiness and perception toward metaverse-based digital governance in The Gambia using two unsupervised machine learning algorithms: K-Means and DBSCAN, applied to a dataset of 115 survey responses. After preprocessing and feature standardization, the K-Means algorithm identified two distinct adoption clusters, consisting of Cluster 0 with 76 respondents and Cluster 1 with 39 respondents. The centroid projections in PCA space revealed a clear behavioural separation, with Cluster 1 exhibiting a substantially higher mean PC1 score (2.5270) compared to Cluster 0 (−1.2968), indicating stronger readiness, optimism, and trust among respondents in the former group. In contrast, DBSCAN produced a single dominant cluster of 107 respondents and identified 8 outliers, suggesting a generally cohesive perception landscape with a small number of respondents expressing atypical attitudes toward metaverse-enabled governance. Collectively, these findings demonstrate that while public sentiment toward metaverse governance is broadly aligned, significant intra-group differences exist, making behavioural segmentation crucial for informing policy strategies. The results underscore the need for tailored approaches that address both enthusiastic adopters and more cautious individuals to support equitable and inclusive metaverse governance adoption.
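
The clustering workflow summarized in this abstract (standardize features, fit K-Means with two clusters, compare clusters along the first principal component, then run DBSCAN to flag outliers) might look roughly like the following scikit-learn sketch. The survey responses here are randomly generated placeholders, and the number of survey items, `eps`, and `min_samples` are assumptions, so the cluster sizes and PC1 values it prints will not match the figures reported in the abstract.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical Likert-style answers (1-5) on 8 survey items (assumption).
# Only the total of 115 rows mirrors the sample size in the abstract.
cautious = rng.integers(1, 4, size=(76, 8))  # answers in 1..3
ready = rng.integers(3, 6, size=(39, 8))     # answers in 3..5
responses = np.vstack([cautious, ready]).astype(float)

# Standardize so every item contributes on a comparable scale.
X = StandardScaler().fit_transform(responses)

# K-Means with two clusters, as described in the abstract.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
kmeans_sizes = {int(k): int((kmeans.labels_ == k).sum())
                for k in np.unique(kmeans.labels_)}

# Mean PC1 score per cluster. Because PCA is linear, this equals the
# projection of each cluster's mean onto the first component.
pcs = PCA(n_components=2).fit_transform(X)
pc1_means = {int(k): float(pcs[kmeans.labels_ == k, 0].mean())
             for k in np.unique(kmeans.labels_)}

# DBSCAN labels low-density points as -1 (outliers).
# eps and min_samples are illustrative guesses, not values from the study.
dbscan = DBSCAN(eps=2.5, min_samples=5).fit(X)
n_outliers = int((dbscan.labels_ == -1).sum())
n_dbscan_clusters = len(set(dbscan.labels_) - {-1})

print("K-Means cluster sizes:", kmeans_sizes)
print("Mean PC1 per K-Means cluster:", pc1_means)
print("DBSCAN clusters:", n_dbscan_clusters, "outliers:", n_outliers)
```

K-Means always assigns every respondent to a cluster, while DBSCAN can leave points unassigned, which is why the two methods can give complementary views of the same data, as the abstract describes.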