Predicting car prices is challenging because prices depend on many factors, including technical specifications, fuel type, and transmission type. This study evaluates and compares linear regression against two ensemble learning methods, Random Forest and Gradient Boosting, for car price prediction. The dataset, obtained from Kaggle, contains 11,914 rows and 16 features. The workflow covers data understanding, data preparation, modeling, and evaluation using Mean Squared Error (MSE) and the coefficient of determination (R²). Gradient Boosting performed best, with an R² of 0.963868 and the lowest MSE of the three models, followed by Random Forest with an R² of 0.899657. Linear regression performed worst, with an R² of 0.417905, which suggests that it cannot capture the non-linear relationships in the data. The best model's predictions are fairly close to actual prices, although hyperparameter optimization could improve them further. These results indicate that ensemble learning methods, especially Gradient Boosting, predict car prices more effectively than linear regression. The model could be applied in the automotive industry to improve the accuracy of vehicle price estimates for manufacturers, dealers, and consumers.
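The two evaluation metrics named above can be illustrated with a minimal sketch in plain Python. The function names and the toy price values are assumptions made for illustration only; they are not the study's code or data, and the study itself may well have used a library implementation.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy prices (not taken from the Kaggle dataset).
actual_prices = [20000.0, 30000.0, 40000.0, 50000.0]
close_predictions = [21000.0, 29000.0, 41000.0, 49000.0]
# A baseline that always predicts the mean of the actual prices.
mean_baseline = [35000.0] * 4

print(mean_squared_error(actual_prices, close_predictions))
print(r_squared(actual_prices, close_predictions))
print(r_squared(actual_prices, mean_baseline))
```

A model that always predicts the mean price scores an R² of 0, so an R² near 1 with a low MSE means the model explains most of the variance in price.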
Copyright © 2025