Depression Risk Prediction Among Teenagers Using Explainable Machine Learning and Imbalanced Behavioral Data
Rudi Setiawan; Effan Najwaini; Rezania Agramanisti Azdy; Rasmiati Rasyid
International Journal of Artificial Intelligence in Medical Issues Vol. 4 No. 1 (2026): International Journal of Artificial Intelligence in Medical Issues
Publisher: Yocto Brain

DOI: 10.56705/0w9q4238

Abstract

Adolescent depression has become an important public health concern, particularly in relation to increasing digital media exposure, lifestyle changes, and psychosocial pressure. This study proposes an explainable machine learning framework for predicting depression risk among teenagers using social media usage, lifestyle behavior, and psychosocial indicators. The dataset consisted of 1,200 records with 13 variables, including age, gender, daily social media hours, platform usage, sleep hours, screen time before sleep, academic performance, physical activity, social interaction level, stress level, anxiety level, addiction level, and depression label. The target variable was highly imbalanced, with 1,169 samples categorized as non-depression and only 31 samples categorized as depression risk. Several machine learning models were evaluated, including Logistic Regression, Random Forest, Support Vector Machine, and Gradient Boosting. The experiments compared two feature settings, namely behavioral-only features and full features, combined with three imbalance handling strategies: no imbalance treatment, class weighting, and SMOTE. Model performance was evaluated using accuracy, precision, recall, F1-score, balanced accuracy, ROC-AUC, PR-AUC, Cohen’s Kappa, MAE, and RMSE. The results showed that the full-feature setting substantially outperformed the behavioral-only setting. The best performance was achieved by Random Forest using full features without imbalance handling, producing perfect classification results with accuracy, precision, recall, F1-score, ROC-AUC, and PR-AUC of 1.0000. Permutation importance analysis identified sleep hours, stress level, anxiety level, and daily social media hours as the most influential predictors. These findings indicate that teenage depression risk in this dataset is strongly associated with sleep behavior and psychosocial conditions, in addition to social media exposure. 
Although the model achieved excellent performance, the result should be interpreted cautiously due to the small number of positive depression-risk samples and the possibility of highly separable label patterns. Therefore, the proposed approach should be positioned as an early risk screening framework rather than a clinical diagnostic tool.
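
The class counts reported in the abstract (1,169 non-depression, 31 depression risk) help explain why the study reports balanced accuracy and Cohen's Kappa alongside plain accuracy. The following is a minimal sketch, not the authors' code: it scores a trivial classifier that always predicts "non-depression" and derives per-class weights for the same counts. The function names `confusion_metrics` and `balanced_class_weights` are hypothetical, and the weighting formula n / (k * n_c) is the common "balanced" heuristic, assumed here because the abstract does not say which class weighting scheme was used.

```python
# Class counts as reported in the abstract.
N_NEG = 1169  # non-depression
N_POS = 31    # depression risk


def confusion_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, specificity, balanced accuracy, and Cohen's kappa
    from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (recall + specificity) / 2
    # Agreement expected by chance, given predicted and actual class rates.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


def balanced_class_weights(counts):
    """Assumed 'balanced' heuristic: weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Trivial baseline: every record is predicted as non-depression,
# so all 31 positives are missed and there are no false positives.
baseline = confusion_metrics(tp=0, fp=0, tn=N_NEG, fn=N_POS)
weights = balanced_class_weights({"non_depression": N_NEG, "depression_risk": N_POS})

for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
for label, weight in weights.items():
    print(f"weight[{label}]: {weight:.4f}")
```

Under these counts the always-negative baseline reaches an accuracy of about 0.974 while detecting none of the 31 depression-risk cases, so its recall and Kappa are 0 and its balanced accuracy is 0.5. Any accuracy figure on this dataset is best read against that baseline.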