This study applies the Random Forest algorithm within the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework to model residential building energy consumption and identify the key factors influencing electricity demand. The dataset, obtained from Kaggle, includes environmental, operational, and temporal variables: temperature, humidity, occupancy, HVAC usage, lighting activity, renewable energy utilization, and hour of day (hour_of_day). The data were preprocessed and categorical variables encoded before model training, and hyperparameters were then optimized using RandomizedSearchCV. The optimized Random Forest model achieved a Mean Absolute Error (MAE) of 4.349, a Root Mean Square Error (RMSE) of 5.499, and a Coefficient of Determination (R²) of 0.510, indicating moderate predictive performance: the model explains roughly half of the variance in consumption. While the model captures general consumption patterns, the remaining variability is unexplained, which is likely attributable to the absence of explicit temporal dependencies and detailed behavioral factors among the features. Model interpretability was examined using Feature Importance, SHAP (SHapley Additive exPlanations), and Partial Dependence Plots (PDP), which indicated that temperature and HVAC usage are the most influential predictors of residential energy consumption. Overall, the proposed approach provides interpretable insights into residential energy use patterns and supports data-driven strategies to improve energy efficiency in residential buildings.
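For readers less familiar with the three reported metrics, the following minimal sketch shows how MAE, RMSE, and R² are computed from a set of predictions. It is an illustration under stated assumptions, not the study's evaluation code: the function name `regression_metrics` and the sample values are hypothetical and are not taken from the Kaggle dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average magnitude of the errors, in the units of the target.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalizes large errors more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: one minus the ratio of residual to total sum of squares,
    # i.e. the share of variance in y_true that the predictions explain.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Hypothetical consumption values, for illustration only.
y_true = [70.0, 75.0, 80.0, 85.0, 90.0]
y_pred = [72.0, 74.0, 83.0, 84.0, 86.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Read this way, an R² of 0.510 means the residual sum of squares is about 49% of the total sum of squares, which is consistent with describing the model's performance as moderate.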
Copyright © 2026