Journal : Building of Informatics, Technology and Science

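Penerapan Penyeimbangan Data Pada Analisis Sentimen Ulasan Game Magic Chess Go Go di Play Store dengan Naive Bayes (Applying Data Balancing to Sentiment Analysis of Magic Chess Go Go Game Reviews on the Play Store with Naive Bayes)
Authors : Mustaqim, Muhammad Hafizd; Santoso, Angga Bayu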
Building of Informatics, Technology and Science (BITS) Vol 7 No 2 (2025): September 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i2.7845

Abstract

This study performs sentiment analysis on Google Play Store reviews of the game Magic Chess Go Go. The review data are imbalanced, with 2,949 negative entries and 1,537 positive ones. To address this, a sentiment classification model was built with the Naïve Bayes algorithm and used to compare four data balancing methods: SMOTE, ADASYN, Random Oversampling (ROS), and Random Undersampling (RUS). Evaluation used a confusion matrix under two data splitting schemes, of which the 80:20 split performed best. Under that split, SMOTE achieved the highest accuracy at 84.2%, followed by ADASYN (83.8%), ROS (82.9%), and RUS (77.9%). These results indicate that SMOTE is the most effective of the four methods for handling data imbalance in this context, and that generating synthetic samples with SMOTE balances the dataset without significant information loss. Future work may explore alternative algorithms and parameter tuning to improve sentiment classification performance.
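As a rough illustration of the synthetic-sample idea behind SMOTE, the sketch below interpolates between a minority-class point and one of its nearest minority-class neighbours. It is a simplified, standard-library version written for this note, not the authors' implementation; the function name `smote_like_oversample`, the choice of `k`, and the toy 2-D points are assumptions. Only the class counts come from the abstract.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Class counts reported in the abstract.
n_negative, n_positive = 2949, 1537
# Synthetic positives needed for a 1:1 ratio if the whole dataset were balanced.
n_needed = n_negative - n_positive

# Toy 2-D minority points (illustrative only, not the review features).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(toy_minority, n_new=5)
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies, whereas random oversampling only duplicates existing rows and random undersampling discards majority rows.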

Optimizing Ensemble Learning Models with SMOTE-ENN for Early Stroke Detection in Imbalanced Clinical Datasets
Authors : Nurmala, Dina; Santoso, Angga Bayu
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.9347

Abstract

Stroke remains a leading cause of mortality and long-term disability worldwide, including in Indonesia, highlighting the urgent need for early risk identification. Machine learning models for stroke prediction often suffer from severe class imbalance, where stroke cases constitute only 4.9% of the clinical records, leading to biased predictions that favor the majority class. This study evaluates three ensemble and kernel-based algorithms (Random Forest, XGBoost, and Support Vector Machine) combined with two resampling strategies (SMOTE and SMOTE-ENN) on the Healthcare Stroke Dataset (5,110 records, 11 clinical attributes). To prevent data leakage, resampling was applied strictly within each training fold of 5-fold stratified cross-validation, and all evaluations were conducted on the original imbalanced test set. The results show that XGBoost with SMOTE-ENN achieved the highest minority-class sensitivity, raising PR-AUC by a relative 23.5% (0.1537 vs. 0.1244 with SMOTE alone) and detecting 24% of stroke cases (12 out of 50) in the test set. Although cross-validation results indicate strong class discrimination, with AUC-ROC values above 0.98, the low PR-AUC reflects the operational challenge of extreme class imbalance and the unavoidable trade-off between recall and precision, which results in more false positives. Consequently, the proposed model is best positioned as a first-tier population screening tool that flags high-risk individuals for confirmatory clinical diagnostics, rather than as a standalone diagnostic system. The approach remains computationally efficient (training time under 0.12 seconds) and substantially improves model stability, as shown by a 73% reduction in cross-validation variance. These findings support combining hybrid resampling techniques with ensemble learning as a practical and scalable framework for early stroke risk screening in resource-constrained primary healthcare settings.
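The leakage-prevention step described above (resampling only inside each training fold, evaluating on untouched imbalanced data) can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: plain random oversampling stands in for SMOTE and SMOTE-ENN, the labels are a toy list with about 5% positives rather than the Healthcare Stroke Dataset, and the helper names `stratified_folds` and `oversample_minority` are hypothetical.

```python
import random

def stratified_folds(labels, n_folds=5, seed=0):
    """Split indices into n_folds folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class's indices round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def oversample_minority(indices, labels, seed=0):
    """Duplicate random minority indices until every class matches the largest.
    Stand-in for SMOTE / SMOTE-ENN, which would create or clean samples instead."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced

# Toy labels: 50 positives among 1,000 records (assumed, for illustration).
labels = [1] * 50 + [0] * 950

folds = stratified_folds(labels)
summary = []
for k, test_idx in enumerate(folds):
    # Training indices are every fold except the held-out one.
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # Resampling sees only training indices; the test fold stays imbalanced.
    train_balanced = oversample_minority(train_idx, labels)
    leaked = not set(train_balanced).isdisjoint(test_idx)
    summary.append((len(train_balanced), sum(labels[i] for i in test_idx), leaked))
```

Resampling before the split would let copies or interpolations of test records enter the training data and inflate the scores, which is why the balancing step sits inside the loop and only ever receives `train_idx`.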