This study examines the potential of big data analytics for health insurance risk management. Using data from Highmark Health covering 2015 to 2018, it evaluates how far data manipulation techniques and machine learning models improve predictive accuracy. The analysis covers Health Maintenance Organization (HMO) and Preferred Provider Organization (PPO) plans, and the data were prepared through cleaning, aggregation, feature engineering, and outlier handling to make them suitable for modeling. Four models were developed: an initial model on raw data without outlier treatment, a model built after outlier treatment that covers both HMO and PPO members, and two models restricted to HMO members and PPO members, respectively. Predictive accuracy improved significantly after outlier treatment, with Random Forest and Multivariate Adaptive Regression Splines (MARS) performing best. The Random Forest model achieved a Root Mean Square Error (RMSE) of 630.04 and an R-squared value of 0.757. The MARS model also showed a strong fit. The HMO-only model achieved an RMSE of 675.85 and an R-squared value of 0.68. The PPO-only model, by contrast, performed poorly, which points to potential data quality issues and dataset limitations. These findings underscore the value of integrating machine learning techniques into health insurance analytics to support proactive risk management and decision-making and to improve efficiency and effectiveness within the industry.
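For readers less familiar with the two reported metrics, the following is a minimal sketch of how RMSE and R-squared are conventionally computed from actual and predicted values. The function names and the sample figures are hypothetical and chosen only for illustration; they are not drawn from the Highmark Health data or from the study's own code.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-member cost values (illustrative only, not study data).
actual_costs = [1200.0, 800.0, 1500.0, 950.0]
predicted_costs = [1100.0, 900.0, 1400.0, 1050.0]

model_rmse = rmse(actual_costs, predicted_costs)
model_r2 = r_squared(actual_costs, predicted_costs)
print(f"RMSE = {model_rmse:.2f}, R-squared = {model_r2:.3f}")
```

RMSE is expressed in the same units as the predicted quantity, so a lower value means smaller typical errors, while R-squared gives the share of variance in the actual values that the predictions account for.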
Copyright © 2025