Enhancing diabetes classification performance using XGBoost integrated with SMOTE and Bayesian hyperparameter optimization
Ulum, Muhammad Nurul Ihyaul; Unjung, Jumanto
Journal of Soft Computing Exploration Vol. 7 No. 1 (2026): March 2026
Publisher : SHM Publisher

DOI: 10.52465/joscex.v7i1.3

Abstract

Diabetes mellitus is a long-term metabolic disorder that is becoming more common around the world. Finding people at risk early can help prevent serious health problems and improve patient outcomes. Machine learning is often used to predict diabetes, but imbalanced medical data can make it harder for models to spot positive cases. In this study, we created a diabetes classification model by combining the Extreme Gradient Boosting (XGBoost) algorithm with the Synthetic Minority Over-sampling Technique (SMOTE), and we used Bayesian Optimization to fine-tune the model’s settings. We worked with the Pima Indians Diabetes Dataset, which has 768 patient records and eight clinical features. Our steps included preprocessing the data, splitting it into training and testing sets, using SMOTE to balance the training data classes, training the XGBoost model, and performing hyperparameter tuning using Bayesian Optimization with Stratified 5-Fold Cross-Validation to determine the optimal parameter configuration. The final model reached an accuracy of 0.88, a precision of 0.79, a recall of 0.91, an F1-score of 0.84, and a ROC-AUC of 0.955. These results show that our approach can identify diabetes cases more effectively while keeping strong overall performance.
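
The class-balancing step described in the abstract above can be illustrated with a minimal, standard-library sketch of the SMOTE interpolation idea. This is not the authors' implementation: the function names `smote` and `balance`, the default `k=5`, and the choice to oversample until the classes are equal in size are all assumptions made for illustration. As in the abstract, it is meant to be applied to the training split only.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X_train, y_train, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a training split until both
    classes have the same number of samples (hypothetical helper)."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k, seed)
    return (
        list(X_train) + new_points,
        list(y_train) + [minority_label] * len(new_points),
    )
```

Keeping the test split untouched matters here: synthetic points are derived from real minority samples, so oversampling before the split would leak information into the evaluation.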

Hybrid Chi-Square and Binary Particle Swarm Optimization Feature Selection with Prior-Corrected Multinomial Naive Bayes for SMS Spam Detection
Chrisandito Sebastian Erlangga Bia; Jumanto Unjung
Journal of Vocational, Informatics and Computer Education Vol 4, No 2 (2026): June 2026
Publisher : Academic Bright Collaboration

DOI: 10.66053/voice.v4i2.730

Abstract

Purpose – SMS spam remains a persistent cybersecurity threat, with 68% of mobile users exposed to unsolicited messages. Existing lightweight classifiers suffer from two compounding problems: feature representations that fail to capture semantic spam patterns, and class imbalance that biases probabilistic classifiers toward the majority class. This study proposes a unified pipeline that addresses both problems. Methods – A dual feature extraction scheme combining TF-IDF with 12 empirically validated semantic features feeds a two-stage Chi-Square and Binary Particle Swarm Optimization (BPSO) feature selection pipeline. A Prior-Corrected Multinomial Naive Bayes (PC-MNB) classifier recalibrates class priors at inference time to counteract Random Oversampling bias. Experiments were conducted on the UCI SMS Spam Collection. Findings – The proposed model achieved 98.07% accuracy, 95.45% macro F1, and 96.64% spam precision with only 4 false positives across 903 legitimate messages, reducing false alarms by 89.5% over the strongest baseline. Research implications – Evaluation is limited to English-language SMS; generalization to multilingual corpora remains unvalidated. The rule-based semantic features are brittle against adversarial obfuscation, and BPSO incurs a one-time offline training cost of 10–25 minutes. Originality – This study is the first to integrate dual semantic-statistical feature extraction, filter-wrapper hybrid selection, and inference-time prior correction into a single CPU-deployable pipeline for SMS spam detection, distinguished from prior CS-BPSO work by domain, feature architecture, and probabilistic calibration mechanism. Future work will explore multilingual validation and SHAP-based explainability.
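
The abstract above does not give the exact PC-MNB formula, so the following is only a sketch of one common form of inference-time prior correction: word likelihoods are learned from the oversampled training set, but the class prior used for scoring is taken from the original, imbalanced label distribution. The function names (`train_mnb`, `log_prior_from`, `predict`), the Laplace smoothing value, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_prior_from(labels):
    """Log class frequencies of a label list."""
    n = len(labels)
    return {c: math.log(k / n) for c, k in Counter(labels).items()}


def train_mnb(docs, labels, alpha=1.0):
    """Fit a Laplace-smoothed multinomial Naive Bayes model.
    Returns (per-class word log-likelihoods, log priors of the training labels)."""
    vocab = {w for d in docs for w in d}
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_lik, log_prior_from(labels)


def predict(doc, log_lik, log_prior):
    """Pick the class maximising log prior + summed word log-likelihoods.
    Words outside the training vocabulary are ignored."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data, made up for illustration only.
ham = [["hi", "there"], ["see", "you"], ["call", "me"], ["hi", "you"]]
spam = [["free", "call"]]
orig_labels = ["ham"] * 4 + ["spam"]

# Random oversampling, simplified here to duplicating the single spam message.
over_docs = ham + spam * 4
over_labels = ["ham"] * 4 + ["spam"] * 4

log_lik, balanced_prior = train_mnb(over_docs, over_labels)
corrected_prior = log_prior_from(orig_labels)  # priors before oversampling
```

Calling `predict(["call"], log_lik, balanced_prior)` and `predict(["call"], log_lik, corrected_prior)` on this toy data shows the intended effect: an ambiguous message is scored against the realistic spam rate rather than the artificial 50/50 split introduced by oversampling.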