Salim, Edwin Ibrahim
Unknown Affiliation

Published: 1 document
Articles

Machine Learning Based Prediction of Osteoporosis Risk Using the Gradient Boosting Algorithm and Lifestyle Data
Salim, Edwin Ibrahim; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.10483

Abstract

Osteoporosis is a degenerative disease characterized by decreased bone mass and an increased risk of fractures, particularly among the elderly population. Early detection is essential; however, standard diagnostic methods such as Dual-Energy X-ray Absorptiometry (DEXA) remain limited in terms of availability and cost. This study aims to develop a machine learning-based risk prediction model for osteoporosis by utilizing lifestyle data with the Gradient Boosting algorithm. The secondary dataset was obtained from the Kaggle platform, consisting of 1,958 samples covering lifestyle and clinical attributes such as age, gender, physical activity, smoking habits, calcium intake, vitamin D consumption, and family history. Preprocessing involved normalization and categorical feature encoding, along with a balance check of class distribution, which indicated that the dataset was relatively balanced. The data were then divided using stratified sampling with an 80% training set and 20% testing set. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). The results showed that the Gradient Boosting algorithm achieved an accuracy of 91%, precision of 90.8%, recall of 90.2%, F1-score of 90.5%, and an AUC of 0.92, outperforming baseline methods such as Logistic Regression and Random Forest. These findings demonstrate that Gradient Boosting is effective as a decision-support tool for early osteoporosis screening based on lifestyle data and has the potential to be integrated into clinical decision-making systems to enhance early detection in healthcare services. Nevertheless, since this study relied on a secondary dataset from Kaggle, the results require further validation using real clinical data from Indonesia to ensure representativeness for the local population.
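
The workflow described in the abstract (stratified 80/20 split, Gradient Boosting, then accuracy, precision, recall, F1-score, and AUC) can be illustrated with a minimal sketch. The abstract does not name a library or any hyperparameters, so the use of scikit-learn, the default `GradientBoostingClassifier` settings, the `random_state` values, and the synthetic data from `make_classification` (standing in for the 1,958-sample Kaggle dataset) are all assumptions made for illustration. The sketch will not reproduce the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic, roughly balanced data standing in for the Kaggle
# dataset (1,958 samples; 7 numeric features chosen only for illustration).
X, y = make_classification(
    n_samples=1958,
    n_features=7,
    n_informative=5,
    n_redundant=0,
    weights=[0.5, 0.5],
    random_state=42,
)

# Stratified 80/20 split, so both sets keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# ASSUMPTION: default hyperparameters; the abstract does not report any.
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability for AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With real data, the normalization and categorical encoding steps mentioned in the abstract would be fitted on the training split only and then applied to the test split, so that no information from the test set leaks into training.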