This study examines whether the training–testing split ratio affects the performance of the Naive Bayes algorithm in predicting diabetes among women. Using a quantitative experimental design, it applies Gaussian Naive Bayes to the Pima Indians Diabetes dataset (768 records, preprocessed before modeling) under three partition scenarios: 70:30, 60:40, and 50:50. Model performance is assessed with accuracy, precision, recall, and F1-score to capture both overall correctness and sensitivity to each class. The findings show that the partition ratio has no statistically significant effect on overall model performance: accuracy stays between 76% and 79% across all scenarios. Models trained on as little as 50% of the dataset achieve comparable predictive capability, indicating that the algorithm generalizes stably. The study argues that once a minimum threshold of training data is reached, increasing the training proportion does not substantially improve performance, and that class imbalance is a more decisive factor in the effectiveness of diabetes prediction.
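The comparison described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a synthetic stand-in for the dataset (768 rows, 8 features, roughly 35% positives, with an arbitrary mean shift between classes), a hand-written Gaussian Naive Bayes, and a simple shuffled hold-out split. All function names (`make_synthetic`, `fit_gnb`, `predict_gnb`, `evaluate`, `run_split_experiment`) are hypothetical, and the numbers it prints say nothing about the study's reported results.

```python
import math
import random


def make_synthetic(n=768, n_features=8, pos_rate=0.35, seed=0):
    """Synthetic stand-in data (assumption): two imbalanced Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        # Positive rows get their feature means shifted by 0.6 (arbitrary).
        X.append([rng.gauss(0.6 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def fit_gnb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(y), means, variances)
    return model


def predict_gnb(model, X):
    """Pick the class with the highest log prior plus Gaussian log-likelihood."""
    preds = []
    for x in X:
        scores = {}
        for c, (prior, means, variances) in model.items():
            score = math.log(prior)
            for v, m, s2 in zip(x, means, variances):
                score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            scores[c] = score
        preds.append(max(scores, key=scores.get))
    return preds


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def run_split_experiment(test_fractions=(0.3, 0.4, 0.5), seed=0):
    """Train and score one model per test fraction (70:30, 60:40, 50:50)."""
    X, y = make_synthetic(seed=seed)
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    results = {}
    for frac in test_fractions:
        n_test = int(round(len(idx) * frac))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit_gnb([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict_gnb(model, [X[i] for i in test_idx])
        results[frac] = evaluate([y[i] for i in test_idx], y_pred)
    return results


results = run_split_experiment()
for frac, metrics in results.items():
    print(f"test fraction {frac:.0%}:",
          ", ".join(f"{name}={value:.3f}" for name, value in metrics.items()))
```

The sketch uses only the standard library so that it runs without extra dependencies. With the real dataset, scikit-learn's `GaussianNB`, `train_test_split`, and `classification_report` would be the more usual choice, and a stratified split would keep the class ratio equal across the three scenarios.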
Copyrights © 2026