Chronic diseases influenced by lifestyle factors are a major public health issue, yet predictive models for them are often limited by class imbalance and a lack of clinical interpretability. This research aims to build an accurate and transparent disease risk prediction model based on lifestyle factors. The method is a hybrid classification approach that combines the Random Forest algorithm with SMOTE (Synthetic Minority Oversampling Technique) to address the initial class imbalance (3:1 ratio) in the Health Lifestyle Dataset. The balanced data were then split 80:20 into training and test sets. On the test set, the model achieved an overall accuracy of 74.43%, with a precision of 79% for the risk class, indicating that its risk predictions are reasonably reliable. Feature importance analysis offers clinically relevant insight, identifying Daily Water Intake (water_intake_l) and Sleep Duration (sleep_hours) as the most dominant predictive factors, ranking even above physiological factors. These results indicate that the hybrid approach is effective as an early screening instrument, with its main advantage being the transparent interpretation of lifestyle variables, which directly supports data-driven prevention strategies.
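To make the oversampling step concrete, the following is a minimal, dependency-free sketch of the core SMOTE idea: each synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority neighbours. It is an illustration only, not the study's implementation. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy rows are all assumptions introduced here; the toy values are made up and are not taken from the Health Lifestyle Dataset. Only the 3:1 starting ratio and the two feature names come from the text above.

```python
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea).
    Hypothetical helper for illustration; requires at least 2 minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Made-up toy rows (water_intake_l, sleep_hours); NOT values from the dataset.
majority = [(2.5, 7.5), (3.0, 8.0), (2.8, 7.0), (2.2, 7.8), (3.1, 6.9),
            (2.6, 8.2), (2.9, 7.4), (2.4, 7.1), (3.2, 7.7)]  # 9 rows, label 0
minority = [(1.2, 5.0), (1.5, 5.5), (1.0, 4.8)]               # 3 rows, label 1

# Generate enough synthetic minority rows to turn the 3:1 ratio into 1:1.
new_rows = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + new_rows
y = [0] * len(majority) + [1] * (len(minority) + len(new_rows))

counts = Counter(y)
print(counts)  # both classes now have 9 rows
```

In a full pipeline, the balanced `X` and `y` would then be split 80:20 and passed to a Random Forest classifier. Library implementations such as `SMOTE` in imbalanced-learn and `RandomForestClassifier` in scikit-learn (with its `feature_importances_` attribute) are common choices for these steps, though the text does not state which tools were used.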
Copyright © 2025