Regression is a data science method for modelling the relationship between independent and dependent variables. This study compares the performance of several regression algorithms on the Boston Housing Dataset, whose 506 samples were split into 80% for training and 20% for testing. Performance was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). All algorithms were run with the default hyperparameter settings of the Scikit-learn library so that no algorithm benefited from additional tuning. Versatile algorithms, particularly Gradient Boosting Machines (GBM) and Random Forest, achieved the best performance, with R² values of 0.92 and 0.89, respectively, and lower errors than the other algorithms. Conversely, regression-specific algorithms such as Linear Regression and Ridge Regression recorded R² values of approximately 0.67, while the k-Nearest Neighbors algorithm performed worst, with an R² of 0.65. These results suggest that versatile algorithms are more effective for datasets with complex non-linear patterns, while regression-specific algorithms are better suited to linear patterns. The findings offer guidance for practitioners in selecting algorithms based on data characteristics and analysis objectives.
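For readers unfamiliar with the four evaluation metrics named above, the following minimal sketch computes them from their standard definitions in plain Python. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not taken from the study, which relied on Scikit-learn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # same units as the target

    # R^2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Illustrative values only, not data from the study.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
```

MAE, MSE, and RMSE are error measures, so lower is better, whereas R² is the share of target variance explained by the model, so values closer to 1 are better.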
Copyright © 2025