Journal of Computing Theories and Applications
Vol. 2 No. 4 (2025): JCTA 2(4) 2025

Big Data-Driven Health Risk Stratification: A Health Index-Based Approach Using Feature Importance and PySpark

Abioye, Oluwasegun Abiodun
Irhebhude, Martins Ekata



Article Info

Publish Date
24 Mar 2025

Abstract

Health risk stratification is crucial for preventive healthcare, yet existing models often rely on binary classification or generalized disease prediction, neglecting personalized health indicators and graded risk levels. Many studies apply feature selection techniques such as Relief and Univariate Selection without quantifying the weighted impact of each feature. To address these gaps, this study introduces a Big Data-driven Health Index (HI) framework that uses PySpark for scalable health risk stratification. The HI is computed as a weighted sum of health-related features, with the feature weights obtained from SHAP Analysis, XGBoost, Random Forest, and Correlation Analysis. PySpark enables efficient processing of large-scale health data, and individuals are classified as Low Risk or High Risk. The optimal classification threshold is determined from the ROC curve using the Youden Index, which balances sensitivity and specificity. Personalized health recommendations are then generated for each risk category to guide preventive interventions. Performance evaluation shows that Correlation Analysis achieves 100% precision and 98.90% recall, outperforming the other methods. SHAP favors recall at the cost of low precision, while XGBoost and Random Forest improve precision but struggle with recall. By leveraging Big Data techniques with PySpark, this study improves computational efficiency, scalability, and classification accuracy, addressing limitations of prior research and providing a robust, data-driven approach to personalized health monitoring.
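
The two computations named in the abstract, a weighted-sum Health Index and a Youden Index threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python rather than PySpark, and it assumes that feature values are already scaled to [0, 1], that the weights are positive and are normalized by their sum, and that a higher HI means higher risk. The function names, feature names, and weight values are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def health_index(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values, normalized by the total weight.

    Assumption: features are pre-scaled to [0, 1] and weights are positive
    importance scores (for example from a correlation analysis).
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total_weight


def youden_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A record is predicted positive (High Risk) when score >= threshold.
    Labels are 1 for positive and 0 for negative; both classes must occur.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold: Optional[float] = None
    best_j = -1.0
    # Every distinct score is a candidate cut point on the ROC curve.
    for candidate in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= candidate and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < candidate and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


def risk_category(hi: float, threshold: float) -> str:
    """Map a Health Index to the two risk classes used in the abstract."""
    return "High Risk" if hi >= threshold else "Low Risk"


# Hypothetical weights and one hypothetical record, for illustration only.
example_weights = {"bmi": 0.5, "glucose": 0.3, "blood_pressure": 0.2}
example_record = {"bmi": 0.4, "glucose": 0.9, "blood_pressure": 0.5}
example_hi = health_index(example_record, example_weights)

# Hypothetical HI scores with known outcomes, used to pick the threshold.
example_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
example_labels = [0, 0, 0, 1, 1, 1]
example_threshold, example_j = youden_threshold(example_scores, example_labels)
example_category = risk_category(example_hi, example_threshold)
```

In a PySpark setting the weighted sum would typically be expressed as a column expression over a DataFrame, while the threshold search could still be run on a sampled or aggregated set of scores; that split is a design assumption here, not something the abstract specifies.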

Copyrights © 2025

Journal Info

Abbrev

jcta

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management
