A Predictive Model for Type 2 Diabetes Using A Wrapper-Based Feature Selection Method
Khairunisa Hilyati; Nuciko Abdul Halim; Wendi Usino
PIKSEL : Penelitian Ilmu Komputer Sistem Embedded and Logic Vol. 14 No. 1 (2026): March 2026
Publisher : LPPM Universitas Islam 45 Bekasi

DOI: 10.33558/piksel.v14i1.12293

Abstract

Diabetes mellitus continues to rise in global prevalence, making early detection of diabetes risk essential for preventing serious complications. This research evaluates the effectiveness of a wrapper-based feature selection technique in improving the performance of classification models for early-stage diabetes risk prediction. The feature selection method is Recursive Feature Elimination (RFE), combined with three classification algorithms: Random Forest, Support Vector Machine (SVM), and Logistic Regression. The dataset was obtained from RSUD Pemangkat, Sambas Regency, West Kalimantan. RFE is expected to identify and eliminate less relevant features, thereby simplifying the model, enhancing interpretability, and improving efficiency without compromising accuracy. This approach is particularly important in medical data analysis, where datasets are often complex and contain numerous clinical variables. Model performance is evaluated using accuracy, F1-score, and Area Under the Curve (AUC) to ensure a comprehensive assessment of classification capability. A comparative analysis is conducted to determine which combination of feature selection method and classification algorithm performs best. In the baseline scenario, where the models use all features, Random Forest outperformed the other algorithms with an accuracy of 0.9909, an F1-score of 0.9927, an AUC of 0.9995, and a sensitivity (recall) of 1.0000, indicating that every diabetes case in the test data was detected with no false negatives. SVM and Logistic Regression achieved accuracies of 0.9545 and 0.9273, respectively. Although SVM classifies well, it tends to produce more false positives, while Logistic Regression excels in model interpretability. With an optimized model, the system has the potential to help healthcare professionals carry out screening and clinical decision-making more quickly and effectively.
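
The comparison described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' implementation: the hospital dataset is not available here, so the code uses synthetic data from make_classification, and the feature count (16), the number of features kept by RFE (N_KEEP = 8), the linear SVM kernel, the 80/20 split, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_KEEP = 8  # assumed number of features RFE should retain

# Synthetic stand-in for the clinical data (assumption: 16 features, binary label).
X, y = make_classification(
    n_samples=500, n_features=16, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def make_models():
    """Return fresh, unfitted instances of the three classifiers."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # RFE needs coef_ or feature_importances_, so the SVM uses a linear kernel.
        "SVM": SVC(kernel="linear", probability=True, random_state=0),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]  # positive-class probability for AUC
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, prob),
        "recall": recall_score(y_test, pred),  # sensitivity
    }


results = {}  # (model name, scenario) -> metric dict
kept = {}     # model name -> number of features RFE retained

# Baseline scenario: every feature is used.
for name, clf in make_models().items():
    results[name, "baseline"] = evaluate(make_pipeline(StandardScaler(), clf))

# RFE scenario: the same classifier ranks features and drops the weakest one per round.
for name, clf in make_models().items():
    pipeline = make_pipeline(StandardScaler(), RFE(clf, n_features_to_select=N_KEEP))
    results[name, "rfe"] = evaluate(pipeline)
    kept[name] = pipeline.named_steps["rfe"].n_features_

for (name, scenario), metrics in results.items():
    summary = ", ".join(f"{k}={v:.4f}" for k, v in metrics.items())
    print(f"{name:20s} {scenario:9s} {summary}")
```

Because RFE is a wrapper method, each classifier selects its own feature subset, so the retained features can differ between Random Forest, SVM, and Logistic Regression. The scores printed for the synthetic data say nothing about the values reported in the abstract.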