The underwriting process in life insurance is a critical step in determining the risk classification of prospective policyholders, which in turn affects premium setting and the company’s long-term sustainability. This study analyzes underwriting risk classification using Ordinal Logistic Regression and XGBoost. The data come from the Prudential Life Insurance Assessment dataset, which contains 59,381 training observations and 19,765 test observations described by more than 120 variables. The methodology comprises data preprocessing, variable selection with XGBoost, and modeling with both Ordinal Logistic Regression and XGBoost. The models were evaluated using accuracy and the Quadratic Weighted Kappa (QWK). The results indicate that variables related to health conditions and medical history, such as Medical_History, Medical_Keyword, and BMI, have a significant influence on risk classification. Ordinal Logistic Regression has the advantage of making the relationships between variables easier to interpret, while XGBoost achieves reasonably good classification performance, with an accuracy of 0.568 and a QWK of 0.540. Overall, the study shows that combining statistical and machine learning approaches can support a more effective underwriting process in the life insurance industry.
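Because QWK is less familiar than accuracy, a minimal sketch of how it is computed for ordinal risk classes may help. This is a generic NumPy implementation, not the study's own code. The function name `quadratic_weighted_kappa` and its `min_rating`/`max_rating` parameters are illustrative assumptions, and the labels are assumed to be integer class codes. Unlike accuracy, QWK penalizes a prediction more heavily the further it falls from the true class, which suits ordered underwriting categories.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Cohen's kappa with quadratic weights for integer labels in [min_rating, max_rating].

    Hypothetical helper for illustration; assumes the labels are not all identical,
    otherwise the expected weighted disagreement is zero and the ratio is undefined.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1

    # Observed matrix O[i, j]: number of cases with true class i predicted as class j
    observed = np.zeros((n, n), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1

    # Quadratic disagreement weights W[i, j] = (i - j)^2 / (n - 1)^2
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    # Expected matrix under independence: outer product of the two marginal
    # histograms, scaled so it has the same total count as the observed matrix
    hist_true = observed.sum(axis=1)
    hist_pred = observed.sum(axis=0)
    expected = np.outer(hist_true, hist_pred) / observed.sum()

    # QWK = 1 - (weighted observed disagreement) / (weighted expected disagreement)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


if __name__ == "__main__":
    print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # perfect agreement
    print(quadratic_weighted_kappa([1, 2, 3], [1, 2, 2], 1, 3))        # one near miss
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. For comparison, scikit-learn's `cohen_kappa_score(y_true, y_pred, labels=..., weights="quadratic")` should give the same result when `labels` lists the full range of classes.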
Copyright © 2026