Stroke is a serious disease with global impact, and its early detection demands high accuracy. Designing machine learning-based predictive models for it is difficult because the available data are imbalanced: stroke cases (the minority class) are far fewer than non-stroke cases. This imbalance tends to bias models toward the majority class and can produce high false negative rates, which is very risky in a clinical setting. This study focuses on improving the sensitivity of the Gaussian Naive Bayes (GNB) model through hyperparameter optimization and classification threshold adjustment. The research process included data preprocessing, a stratified dataset split (70% training and 30% testing), feature scaling, optimization of the var_smoothing parameter using GridSearchCV, and threshold adjustment to maximize Recall. The results showed that the standard GNB model achieved a Recall of only 0.4400. After var_smoothing optimization (1.00×10⁻¹⁰) and lowering the classification threshold to 0.0100, Recall increased to 0.8000. This improvement was accompanied by a drop in Accuracy (to 0.5988) and Precision (to 0.0909). The high Recall indicates that the model is better suited to mass screening (the early detection phase), although its low Precision means that positive predictions must be followed by further diagnostic processes. The high Recall also confirms the model's success in minimizing False Negatives, which is the top priority in stroke risk prediction.
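The trade-off behind threshold adjustment can be illustrated with a minimal sketch in plain Python. The labels and predicted probabilities below are invented toy values, not the study's data, and `metrics_at` is a hypothetical helper name; the sketch only shows why lowering the threshold tends to raise Recall while lowering Precision and Accuracy.

```python
def metrics_at(y_true, y_prob, threshold):
    """Return (recall, precision, accuracy) when predicting 1 if prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return recall, precision, accuracy


# Toy data (assumption, for illustration only): 4 positive and 12 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities of the positive class.
y_prob = [0.90, 0.60, 0.20, 0.005,
          0.70, 0.30, 0.15, 0.05, 0.02, 0.02,
          0.008, 0.004, 0.003, 0.002, 0.001, 0.001]

for threshold in (0.5, 0.01):
    recall, precision, accuracy = metrics_at(y_true, y_prob, threshold)
    print(f"threshold={threshold}: recall={recall:.4f}, "
          f"precision={precision:.4f}, accuracy={accuracy:.4f}")
```

On this toy data, moving the threshold from 0.5 to 0.01 recovers one additional positive case at the cost of five more false positives, which mirrors the direction (not the magnitude) of the change reported above.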
Copyrights © 2025