Sleep is vital for maintaining a person's physical and psychological balance. Poor sleep quality can reduce physical and cognitive performance and increase the risk of various health problems. This study develops a predictive model of sleep quality based on factors such as lifestyle, stress, daily activities, and caffeine consumption, using XGBoost combined with Recursive Feature Elimination (RFE). XGBoost was chosen for its ability to handle imbalanced datasets and heterogeneous features, while RFE simplifies the model by removing less informative features without losing important information. During data pre-processing, a class imbalance was found, so the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the proportion of the minority class. The dataset was split into 80% training data and 20% testing data, and the model was validated using cross-validation to assess its generalization. The model performed very well, with an accuracy of 99.79% on the training data, 99.63% in cross-validation, and 99.10% on the testing data. The model was then deployed as a web application for practical sleep-quality prediction. This study emphasizes the methodological contribution of a SMOTE-based hybrid machine learning model and its ready-to-use application, while also opening opportunities for further testing on more diverse datasets and for evaluating biases introduced by synthetic data.
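The abstract names SMOTE as the balancing step but gives no implementation details. Purely as an illustration, the following stdlib-only Python sketch shows the core SMOTE idea: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. It also includes a stratified 80/20 split. This is not the authors' code. The function names `smote`, `balance_with_smote` and `stratified_split`, the default `k=5`, and the toy data are hypothetical. The sketch assumes numeric features and applies oversampling to the training portion only, so that no synthetic points are derived from test samples. XGBoost, RFE and cross-validation are omitted.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch (hypothetical helper, not the study's code).

    For each new point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        others = [j for j in range(len(minority)) if j != base_idx]
        neighbours = sorted(others, key=lambda j: math.dist(base, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Per-class shuffle and split (80/20 by default) so each class appears in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


if __name__ == "__main__":
    # Made-up toy data (two numeric features), for illustration only
    rng = random.Random(42)
    X = ([[rng.gauss(7, 1), rng.gauss(3, 1)] for _ in range(18)]
         + [[rng.gauss(5, 1), rng.gauss(7, 1)] for _ in range(6)])
    y = ["good"] * 18 + ["poor"] * 6
    X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_fraction=0.2)
    # Oversample the training split only; the test split keeps its real distribution
    X_bal, y_bal = balance_with_smote(X_tr, y_tr)
    print(Counter(y_tr), Counter(y_bal), Counter(y_te))
```

In practice, a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used. It exposes the neighbour count as `k_neighbors` and resamples with `fit_resample`.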
Copyright © 2025