Diabetes mellitus is a chronic disease whose prevalence continues to rise, which calls for accurate early-detection methods that adapt to the diversity of patient data. This study applies a stacking ensemble to diabetes risk classification using a cross-domain approach that integrates two widely used datasets, the PIMA Indians Diabetes dataset and NHANES. The experimental pipeline comprises feature and label harmonization, median imputation of missing values, standardization, and class balancing through oversampling. The base models are Random Forest, Support Vector Machine, Decision Tree, and Multi-Layer Perceptron, with Logistic Regression as the meta-learner in the stacking scheme. Evaluation used stratified k-fold cross-validation and a held-out test split, together with cross-domain scenarios that measure how well the model adapts from one dataset to the other. In the domain-adaptive scenario, the stacking ensemble achieved an accuracy of approximately 0.987, a recall of 1.000, and an ROC-AUC of approximately 0.987, whereas the single base learner reached an accuracy of 0.976, a recall of 1.000, and an ROC-AUC of approximately 0.977. The domain-adaptive stacking approach therefore outperformed the base model in accuracy and ROC-AUC at equal recall. These findings support the advantage of domain-adaptive stacking in handling heterogeneous medical data and class imbalance, and reinforce its potential as a decision support system for early detection of diabetes in a wider population.
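The pipeline described above (median imputation, standardization, oversampling, and a stacking ensemble with a Logistic Regression meta-learner) can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's implementation: the data are synthetic stand-ins for the harmonized PIMA and NHANES features, the `oversample_minority` helper is hypothetical, simple random oversampling is assumed because the oversampling method is not specified, and all hyperparameters are placeholder defaults.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

SEED = 0  # assumption: fixed seed for reproducibility


def oversample_minority(X, y, seed=SEED):
    """Hypothetical helper: randomly duplicate minority rows until classes match."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        Xc = X[y == c]
        if n < n_max:
            extra = resample(Xc, replace=True, n_samples=n_max - n, random_state=seed)
            Xc = np.vstack([Xc, extra])
        X_parts.append(Xc)
        y_parts.append(np.full(n_max, c))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in for the harmonized feature table (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=SEED)
rng = np.random.default_rng(SEED)
X[rng.random(X.shape) < 0.05] = np.nan  # inject missing values to impute

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED)

# Balance the training split only, so the test split keeps its natural class ratio.
X_bal, y_bal = oversample_minority(X_train, y_train)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=SEED)),
    ("svm", SVC(random_state=SEED)),
    ("dt", DecisionTreeClassifier(random_state=SEED)),
    ("mlp", MLPClassifier(max_iter=500, random_state=SEED)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
)
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), stack)
model.fit(X_bal, y_bal)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]
acc = accuracy_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
print(f"accuracy={acc:.3f} recall={rec:.3f} roc_auc={auc:.3f}")
```

Under the same assumptions, a cross-domain run would fit `model` on one harmonized dataset and score it on the other instead of using a single train/test split. Note that oversampling before fitting lets duplicated minority rows fall on both sides of the stacking procedure's internal folds, so a stricter setup would resample inside each fold.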
Copyrights © 2026