Scientific Journal of Informatics
Vol. 12 No. 3: August 2025

Optimization of Random Forest Algorithm with SMOTE Method to Improve the Accuracy of Early Diabetes Prediction

Nisa, Siti Khoirun (Unknown)
Barata, Mula Agung (Unknown)
Yuwita, Pelangi Eka (Unknown)



Article Info

Publish Date
04 Aug 2025

Abstract

Purpose: This research examines the performance of the Random Forest algorithm in diabetes risk classification when the data are balanced with the Synthetic Minority Oversampling Technique (SMOTE), with the aim of improving the representation of minority classes and increasing prediction accuracy. Methods: The study used the Behavioral Risk Factor Surveillance System (BRFSS) dataset, obtained from Kaggle, which contains health-related survey data used to identify individuals at risk of diabetes. The Random Forest algorithm was applied to classify diabetes, and SMOTE was used to balance the data. Model performance was evaluated using 10-fold cross-validation, comparing results before and after SMOTE. Result: Applying SMOTE improved the performance of the Random Forest classification model, especially on minority classes. Without SMOTE, minority-class performance was poor, with precision of 49%, recall of 18%, and F1-score of 26%. After applying SMOTE, these values increased to precision of 96%, recall of 88%, and F1-score of 92%, representing improvements of 47 percentage points in precision, 70 points in recall, and 66 points in F1-score. The overall accuracy of the Random Forest model also increased from 86% to 92%, a 6 percentage point improvement. Novelty: This study integrates the Random Forest algorithm with the SMOTE technique and validates the results using 10-fold cross-validation. The combination significantly improves minority class prediction performance in early diabetes detection, addressing a common limitation of previous studies in handling imbalanced datasets effectively.
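
The core idea behind SMOTE, and the relationship between the reported precision, recall, and F1 figures, can be illustrated with a short standard-library Python sketch. This is not the authors' implementation: the function names `smote_oversample` and `f1_score`, the toy minority points, and the parameter values (`k`, `n_new`, `seed`) are assumptions chosen for illustration only. The sketch generates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours, then checks that the F1-scores quoted in the abstract are consistent with the quoted precision and recall.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Assumes the minority set holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy minority class (hypothetical values, not BRFSS data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6, k=2)

# Consistency check on the minority-class figures quoted in the abstract.
f1_before = f1_score(0.49, 0.18)  # without SMOTE
f1_after = f1_score(0.96, 0.88)   # with SMOTE
print(round(f1_before * 100), round(f1_after * 100))  # prints: 26 92
```

Because each synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.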

Copyrights © 2025






Journal Info

Abbrev

sji

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Scientific Journal of Informatics (p-ISSN 2407-7658 | e-ISSN 2460-0040) published by the Department of Computer Science, Universitas Negeri Semarang, a scientific journal of Information Systems and Information Technology which includes scholarly writings on pure research and applied research in the ...