Journal : Scientific Journal of Informatics

Classification Modeling with RNN-based, Random Forest, and XGBoost for Imbalanced Data: A Case of Early Crash Detection in ASEAN-5 Stock Markets
Siswara, Deri; M. Soleh, Agus; Hamim Wigena, Aji
Scientific Journal of Informatics Vol. 11 No. 3: August 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i3.4067

Abstract

Purpose: This research evaluates the performance of several Recurrent Neural Network (RNN) architectures, namely Simple RNN, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM), against classic algorithms such as Random Forest and XGBoost in building classification models for early crash detection in the ASEAN-5 stock markets. Methods: The study examines imbalanced data, which is expected given the rarity of market crashes. It analyzes daily data from 2010 to 2023 across the major stock markets of the ASEAN-5 countries: Indonesia, Malaysia, Singapore, Thailand, and the Philippines. The target variable flags a market crash when the primary stock price index falls below the Value at Risk (VaR) threshold at the 5%, 2.5%, or 1% level. Predictors include technical indicators from major local and global markets and from commodity markets. The study incorporates 213 predictors with their respective lags (5, 10, 15, 22, 50, 200) and uses a time step of 7, which expands the total number of predictors to 1,491 (213 × 7). Data imbalance is addressed with SMOTE-ENN. Model performance is evaluated using the false alarm rate, hit rate, balanced accuracy, and the precision-recall curve (PRC) score. Results: All RNN-based architectures outperform Random Forest and XGBoost. Among the RNN architectures, Simple RNN performs best, primarily owing to the simple characteristics of the data and the model's focus on short-term information. Novelty: This study extends the range of phenomena observed in previous studies by covering different geographical zones and periods and by introducing methodological adjustments.
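The crash labeling and the time-step expansion described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how VaR is estimated, so an empirical (historical) quantile is assumed here, and the function names `empirical_var_threshold`, `label_crashes`, and `make_windows` are hypothetical.

```python
import math


def empirical_var_threshold(returns, alpha):
    """Empirical alpha-quantile of the returns (lower tail).

    Assumption: historical-simulation VaR; the abstract does not name the estimator.
    """
    ordered = sorted(returns)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


def label_crashes(returns, alpha):
    """Label a day 1 (crash) when its return is at or below the VaR threshold."""
    threshold = empirical_var_threshold(returns, alpha)
    return [1 if r <= threshold else 0 for r in returns]


def make_windows(rows, time_step):
    """Flatten each run of `time_step` consecutive rows into one feature vector."""
    return [
        [value for row in rows[i - time_step + 1 : i + 1] for value in row]
        for i in range(time_step - 1, len(rows))
    ]


# 213 predictors observed over a time step of 7 give the count in the abstract.
print(213 * 7)  # 1491
```

Lower values of `alpha` (for example 0.01 instead of 0.05) mark fewer days as crashes, which makes the classes more imbalanced.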

Performance of Ensemble Learning in Diabetic Retinopathy Disease Classification
Nurizki, Anisa; Fitrianto, Anwar; Mohamad Soleh, Agus
Scientific Journal of Informatics Vol. 11 No. 2: May 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i2.4725

Abstract

Purpose: This study explores diabetic retinopathy (DR), a complication of diabetes that can lead to blindness, with an emphasis on early diagnostic intervention. Using Macular OCT scan data, it aims to improve prevention strategies through tree-based ensemble learning. Methods: Data from RSKM Eye Center Padang (October-December 2022) were organized into four scenarios based on physician certificates: negative and non-diagnostic DR versus positive DR; negative versus positive DR; non-diagnostic versus positive DR; and negative versus non-diagnostic versus positive DR. The suitability of each scenario for ensemble learning was assessed. Class imbalance was addressed with SMOTE, and potential underfitting in random forest models was investigated. Models (RF, ET, XGBoost, DRF) were compared on accuracy, precision, recall, and speed. Results: Tree-based ensemble learning classifies DR effectively, with RF performing best (80% recall, 78.15% precision) and ET being the fastest. Scenario III (non-diagnostic versus positive DR) emerges as optimal, with the highest recall and precision values. These findings underscore the practical utility of tree-based ensemble learning in DR classification, notably in Scenario III. Novelty: This research validates tree-based ensemble learning for DR classification using Macular OCT data and physician certificates, with ETDRS scores demonstrating promising classification capabilities.

Evaluating Ensemble Learning Techniques for Class Imbalance in Machine Learning: A Comparative Analysis of Balanced Random Forest, SMOTE-RF, SMOTEBoost, and RUSBoost
Fulazzaky, Tahira; Saefuddin, Asep; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.15937

Abstract

Purpose: This research aims to identify the optimal ensemble learning method for mitigating class imbalance among several advanced techniques: balanced random forest (BRF), SMOTE-random forest (SMOTE-RF), RUSBoost, and SMOTEBoost. The methods were systematically evaluated against conventional algorithms, namely random forest and AdaBoost, across heterogeneous datasets with varying class imbalance ratios. Methods: This study used 13 secondary datasets from diverse sources, each with a binary class output. The datasets exhibited varying degrees of class imbalance, providing scenarios for assessing how well ensemble learning techniques and traditional machine learning approaches manage class imbalance. The data were split into training (80%) and testing (20%) sets, with stratified sampling applied to keep class proportions consistent across both sets. Each method underwent hyperparameter optimization with its own settings, repeated over 10 iterations. The optimal method was selected based on balanced accuracy, recall, and computation time. Results: BRF achieved the highest balanced accuracy and recall compared to SMOTE-RF, RUSBoost, SMOTEBoost, random forest, and AdaBoost. Conversely, the classical random forest outperformed the other techniques in computational efficiency. Novelty: This study presents an analysis of advanced ensemble learning techniques, including BRF, SMOTE-random forest, SMOTEBoost, and RUSBoost, which demonstrate significant effectiveness in addressing class imbalance across various datasets. By systematically optimizing hyperparameters and applying stratified sampling, this research produces findings that redefine the benchmarks of balanced accuracy, recall, and computational efficiency in machine learning.
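The stratified 80/20 split and the balanced accuracy metric mentioned in the abstract above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the study's code: the function names, the `seed` parameter, and the rounding rule for the per-class test size are all hypothetical choices.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: the per-class test size is rounded to the nearest integer.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)
```

A classifier that always predicts the majority class scores 0.5 on balanced accuracy for a binary problem, however skewed the classes are, which is why the metric suits imbalanced data better than plain accuracy.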

A Hybrid Sampling Approach for Handling Data Imbalance in Ensemble Learning Algorithms
Astari, Reka Agustia; Sumertajaya, I Made; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.19163

Abstract

Purpose: This research addresses the methodological challenges posed by imbalanced data in classification tasks, where minority classes are severely underrepresented and model performance is often biased as a result. It evaluates the effectiveness of two hybrid sampling techniques, the Synthetic Minority Oversampling Technique combined with the Neighborhood Cleaning Rule (SMOTE-NCL) and with Edited Nearest Neighbors (SMOTE-ENN), in improving the predictive performance of ensemble classifiers, namely Double Random Forest (DRF) and Extremely Randomized Trees (ET), with a focus on enhancing minority class detection. Methods: Eighteen simulated scenarios were developed by varying class imbalance ratios, sample sizes, and feature correlation levels. In addition, empirical data from the 2023 National Socioeconomic Survey (SUSENAS) in Riau Province were used. The data were partitioned using stratified random sampling (80% training, 20% testing). Models were trained with and without hybrid sampling and optimized through grid search. Their performance was evaluated over 100 iterations using balanced accuracy, sensitivity, and G-mean. Feature importance was interpreted using Shapley Additive Explanations (SHAP). Results: DRF combined with SMOTE-NCL consistently outperformed all other models, achieving 87.56% balanced accuracy, 82.17% sensitivity, and 86.75% G-mean in the most extreme simulation scenario. On the empirical dataset, the model achieved 76.37% balanced accuracy and 75.49% G-mean. Novelty: This study integrates hybrid sampling techniques and ensemble learning within an interpretable machine learning framework, providing a robust solution for poverty classification in imbalanced datasets.
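Sensitivity and G-mean, two of the metrics named in the abstract above, can be computed from binary predictions as in the sketch below. It uses the standard binary definition of G-mean as the geometric mean of sensitivity and specificity; the function names and the `positive` parameter are hypothetical, and the abstract does not confirm that the study used this exact formulation.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem.

    Both classes must appear in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)
```

Because the two rates are multiplied, G-mean drops to zero whenever a model ignores one class entirely, which makes it a strict check on minority class detection.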

Comparison of Ensemble Forest-Based Methods Performance for Imbalanced Data Classification
Hasnataeni, Yunia; Saefuddin, Asep; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.24269

Abstract

Purpose: Classification of imbalanced data is a major challenge in meteorological studies, particularly in rainfall classification, where extreme events occur infrequently. This research addresses the issue by evaluating how ensemble learning models handle imbalanced rainfall data in Bogor Regency, aiming to improve classification performance and model reliability for hydrometeorological risk mitigation. Methods: Four ensemble methods (RF, RoF, DRF, and RoDRF) were applied to rainfall classification using three resampling techniques: SMOTE, RUS, and SMOTE-RUS-NC. The data underwent preprocessing, stratified splitting, resampling, and 5-fold cross-validation. Performance was evaluated over 100 iterations using accuracy, precision, recall, and F1-score. Results: The combination of DRF with SMOTE-RUS-NC yielded the best balance between accuracy (0.989) and computation time (107.28 seconds), while RoDRF with SMOTE achieved the highest overall performance, with an accuracy of 0.991, but required a longer computation time (149.30 seconds). Feature importance analysis identified average humidity, maximum temperature, and minimum temperature as the most influential predictors of extreme rainfall. Novelty: This research contributes a comprehensive comparison of ensemble forest-based methods for imbalanced rainfall data, revealing DRF-SMOTE as an optimal trade-off between performance and efficiency. The findings contribute to improved rainfall classification models and offer practical insight for disaster mitigation planning and resource management in tropical regions.
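Of the resampling techniques named in the abstract above, random under-sampling (RUS) is the simplest to illustrate. The sketch below shows plain RUS only, not the SMOTE-RUS-NC combination used in the study; the function name `random_under_sample` and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop samples until every class matches the smallest class size."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for label in set(y):
        indices = [i for i, value in enumerate(y) if value == label]
        # Sample without replacement so that no row is duplicated.
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]
```

Resampling of this kind is normally applied to the training portion only, so that the test set keeps the original class distribution.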