Accurate daily calorie estimates are essential for personalized nutrition and the prevention of diet-related conditions, yet lifestyle variability can reduce the effectiveness of one-size-fits-all recommendations. This study aims to develop an accurate lifestyle-based calorie estimation model by comparing an interpretable linear approach with ensemble machine learning methods. A publicly available lifestyle dataset from Kaggle was used, containing demographic variables, anthropometric measurements, food intake, dietary patterns, and physical activity attributes. A preprocessing pipeline was applied, comprising outlier handling by interquartile range (IQR) capping, categorical encoding, normalization, and feature selection via Recursive Feature Elimination (RFE) to identify the most relevant predictors. Four models (Linear Regression, Random Forest, XGBoost, and LightGBM) were trained and evaluated, and the ensemble models were then tuned using GridSearchCV. Performance was assessed using R², Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and training time. Linear Regression achieved the best overall performance (R² = 0.9650, MAE = 80.95, RMSE = 101.71, training time = 8.95 seconds). Among the ensembles, the tuned XGBoost performed best (R² = 0.9646, MAE = 81.34, RMSE = 102.35, training time = 10.55 seconds). Compared with the tuned XGBoost, Linear Regression had a lower MAE (by 0.39), a lower RMSE (by 0.64), a higher R² (by 0.0004), and a shorter training time (8.95 vs. 10.55 seconds), indicating that added complexity did not yield meaningful gains on this structured dataset. These findings suggest that, for structured lifestyle data, interpretable linear models can match or outperform complex ensembles while remaining computationally efficient for real-time or edge-deployed health applications.
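As an illustration of two steps named in the abstract, the following is a minimal sketch of IQR-based outlier capping and of the three error metrics (R², MAE, RMSE). It is not the authors' code: the function names `iqr_cap` and `regression_metrics`, the conventional whisker factor `k = 1.5`, and the "inclusive" quartile method are assumptions made here for illustration, and the abstract does not state which settings were actually used. The study itself most likely relied on library implementations; this version uses only the Python standard library so that the definitions are explicit.

```python
import math
import statistics


def iqr_cap(values, k=1.5):
    """Cap values to [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional choice and an assumption here;
    the abstract does not report the factor used.
    """
    # Quartiles via linear interpolation over the sample ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values outside the fences are pulled back to the nearest fence.
    return [min(max(v, lower), upper) for v in values]


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Toy numbers only; these are not values from the study's dataset.
    print(iqr_cap([1, 2, 3, 4, 100]))
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does, which is why the two metrics are reported together.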
Copyrights © 2026