Diabetes mellitus is a metabolic disease and a global health challenge with steadily rising prevalence. Early detection through automated prediction systems can help reduce complications and treatment costs. This study develops a diabetes mellitus prediction system using an ensemble gradient boosting approach combined with extensive feature engineering. The research dataset combines 768 samples from the Pima Indians dataset with 5,000 samples from a diabetes prediction dataset, giving 5,768 data points in total, which were then class-balanced with the ADASYN oversampling technique. The feature engineering process expands the 8 original features into 25 predictive features, including diabetes risk scores, BMI categories, age groups, and glucose categories. Three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) and an ensemble voting classifier were tuned with the Optuna framework using the Tree-structured Parzen Estimator. Models were evaluated on accuracy, precision, recall, F1-score, and ROC-AUC using 5-fold cross-validation. LightGBM achieved the best performance with 97.14% accuracy and 0.9976 ROC-AUC, followed by CatBoost (97.14%, 0.9973) and XGBoost (96.45%, 0.9971). Feature importance analysis identified DiabetesPedigreeFunction, Pregnancies, and SmokingHistory as key predictors. The developed model can be implemented as a diabetes screening system in primary healthcare facilities.
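
As an illustration of the categorical features mentioned above (BMI categories, age groups, and glucose categories), the following is a minimal sketch, not the study's actual implementation. The BMI cut-offs follow the standard WHO classes; the glucose cut-offs assume a 2-hour oral glucose tolerance test value in mg/dL; the age bins, the function names, and the output column names (`BMICategory`, `GlucoseCategory`, `AgeGroup`) are hypothetical choices made for this example only.

```python
def bmi_category(bmi: float) -> str:
    """Map BMI (kg/m^2) to the standard WHO weight classes."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def glucose_category(glucose: float) -> str:
    """Map plasma glucose (mg/dL) to a category.

    Assumption: the value is a 2-hour oral glucose tolerance test result,
    so the 140 and 200 mg/dL thresholds apply.
    """
    if glucose < 140:
        return "normal"
    if glucose < 200:
        return "prediabetes"
    return "diabetes"


def age_group(age: int) -> str:
    """Map age in years to a coarse group (bin edges are hypothetical)."""
    if age < 30:
        return "under-30"
    if age < 45:
        return "30-44"
    if age < 60:
        return "45-59"
    return "60-plus"


def engineer_features(record: dict) -> dict:
    """Return a copy of the record with three derived categorical features."""
    out = dict(record)  # leave the caller's record unmodified
    out["BMICategory"] = bmi_category(record["BMI"])
    out["GlucoseCategory"] = glucose_category(record["Glucose"])
    out["AgeGroup"] = age_group(record["Age"])
    return out


if __name__ == "__main__":
    sample = {"Glucose": 148, "BMI": 33.6, "Age": 50}
    print(engineer_features(sample))
```

In a full pipeline, derived categories like these would typically be encoded numerically (for example, one-hot or ordinal encoding) before being passed to the gradient boosting models.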

Copyrights © 2025