Articles

Performance of Ensemble Learning in Diabetic Retinopathy Disease Classification
Nurizki, Anisa; Fitrianto, Anwar; Mohamad Soleh, Agus
Scientific Journal of Informatics Vol. 11 No. 2: May 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i2.4725

Abstract

Purpose: This study explores diabetic retinopathy (DR), a complication of diabetes leading to blindness, emphasizing early diagnostic interventions. Leveraging Macular OCT scan data, it aims to optimize prevention strategies through tree-based ensemble learning. Methods: Data from RSKM Eye Center Padang (October-December 2022) were categorized into four scenarios based on physician certificates: Negative & non-diagnostic DR versus Positive DR, Negative versus Positive DR, Non-Diagnosis versus Positive DR, and Negative DR versus non-Diagnosis versus Positive DR. The suitability of each scenario for ensemble learning was assessed. Class imbalance was addressed with SMOTE, while potential underfitting in random forest models was investigated. Models (RF, ET, XGBoost, DRF) were compared based on accuracy, precision, recall, and speed. Results: Tree-based ensemble learning effectively classifies DR, with RF performing exceptionally well (80% recall, 78.15% precision). ET demonstrates superior speed. Scenario III, encompassing positive and undiagnosed DR, emerges as optimal, with the highest recall and precision values. These findings underscore the practical utility of tree-based ensemble learning in DR classification, notably in Scenario III. Novelty: This research distinguishes itself with its unique approach to validating tree-based ensemble learning for DR classification. This validation was accomplished using Macular OCT data and physician certificates, with ETDRS scores demonstrating promising classification capabilities.
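The abstract above mentions SMOTE for class imbalance but does not describe how it was configured. As a rough illustration only, the core SMOTE idea (creating a synthetic minority point on the line segment between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote_like`, the neighbour count `k`, and the toy points are assumptions made for this sketch, not details from the study.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; real studies typically use a library
    implementation with its own defaults.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class with two features (made-up values)
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority_points, n_new=6, k=2, seed=1)
```

Because each synthetic point is an interpolation, it always lies between two existing minority samples; oversampling of this kind is normally applied to the training split only.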
Evaluating Ensemble Learning Techniques for Class Imbalance in Machine Learning: A Comparative Analysis of Balanced Random Forest, SMOTE-RF, SMOTEBoost, and RUSBoost
Fulazzaky, Tahira; Saefuddin, Asep; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.15937

Abstract

Purpose: This research aims to identify the optimal ensemble learning method for mitigating class imbalance in datasets utilizing various advanced techniques which include balanced random forest (BRF), SMOTE-random forest (SMOTE-RF), RUSBoost, and SMOTEBoost. The methods were systematically evaluated against conventional algorithms, including random forest and AdaBoost, across heterogeneous datasets with varying class imbalance ratios. Methods: This study utilized 13 secondary datasets from diverse sources, each with binary class outputs. The datasets exhibited varying degrees of class imbalance, offering scenarios to assess the effectiveness of ensemble learning techniques and traditional machine learning approaches in managing class imbalance issues. Study data were split into training (80%) and testing (20%), with stratified sampling applied to maintain consistent class proportions across both sets. Each method underwent hyperparameter optimization with distinct settings with repetition over 10 iterations. The optimal method was evaluated based on balanced accuracy, recall, and computation time. Result: Based on the evaluation, the BRF method exhibited the highest performance in balanced accuracy and recall when compared to SMOTE-RF, RUSBoost, SMOTEBoost, random forest, and AdaBoost. Conversely, the classical random forest method outperformed other techniques in terms of computational efficiency. Novelty: This study presents an innovative analysis of advanced ensemble learning techniques, including BRF, SMOTE-random forest, SMOTEBoost, and RUSBoost, which demonstrate significant effectiveness in addressing class imbalance across various datasets. By systematically optimizing hyperparameters and applying stratified sampling, this research produces findings that redefine the benchmarks of balanced accuracy, recall and computational efficiency in machine learning.
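To make the evaluation setup in this abstract concrete, the following is a minimal sketch of a stratified 80/20 split and of balanced accuracy (the mean of per-class recall). The helper names, the toy labels, and the always-majority baseline are assumptions for illustration; the study's actual tooling is not stated in the abstract.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def recall(y_true, y_pred, positive):
    """Fraction of true `positive` cases that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)

# Toy imbalanced labels: 90 majority (0) and 10 minority (1)
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]

# A baseline that always predicts the majority class
always_majority = [0] * len(test_labels)
plain_accuracy = sum(t == p for t, p in zip(test_labels, always_majority)) / len(test_labels)
bal_accuracy = balanced_accuracy(test_labels, always_majority)
```

On imbalanced data the always-majority baseline scores high plain accuracy but low balanced accuracy, which is one reason balanced accuracy and recall are preferred over plain accuracy in comparisons of this kind.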
A Hybrid Sampling Approach for Handling Data Imbalance in Ensemble Learning Algorithms
Astari, Reka Agustia; Sumertajaya, I Made; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.19163

Abstract

Purpose: This research aims to address the methodological challenges posed by imbalanced data in classification tasks, where minority classes are severely underrepresented, often leading to biased model performance. It evaluates the effectiveness of hybrid sampling techniques, specifically the Synthetic Minority Oversampling Technique combined with the Neighborhood Cleaning Rule (SMOTE-NCL) and with Edited Nearest Neighbors (SMOTE-ENN), in improving the predictive performance of ensemble classifiers, namely Double Random Forest (DRF) and Extremely Randomized Trees (ET), with a focus on enhancing minority class detection. Methods: A total of eighteen simulated scenarios were developed by varying class imbalance ratios, sample sizes, and feature correlation levels. In addition, empirical data from the 2023 National Socioeconomic Survey (SUSENAS) in Riau Province were employed. The data were partitioned using stratified random sampling (80% training, 20% testing). Models were trained with and without hybrid sampling and optimized through grid search. Their performance was evaluated over 100 iterations using balanced accuracy, sensitivity, and G-mean. Feature importance was interpreted using Shapley Additive Explanations (SHAP). Results: DRF combined with SMOTE-NCL consistently outperformed all other models, achieving 87.56% balanced accuracy, 82.17% sensitivity, and 86.75% G-mean in the most extreme simulation scenario. On the empirical dataset, the model achieved 76.37% balanced accuracy and 75.49% G-mean. Novelty: This study introduces a novel integration of hybrid sampling techniques and ensemble learning within an interpretable machine learning framework, providing a robust solution for poverty classification in imbalanced datasets.
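For readers unfamiliar with the three metrics named in this abstract, a small sketch under the usual binary-classification definitions may help: sensitivity is the recall of the positive (minority) class, balanced accuracy is the mean of sensitivity and specificity, and G-mean is their geometric mean. The function name and the confusion-matrix counts below are invented for illustration and are unrelated to the reported results.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, balanced accuracy and G-mean from counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive (minority) class
    specificity = tn / (tn + fp)   # recall on the negative (majority) class
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Made-up counts: 10 minority cases and 90 majority cases
metrics = binary_metrics(tp=8, fn=2, tn=81, fp=9)
```

G-mean drops to zero whenever either class is never detected, which makes it stricter than balanced accuracy for judging minority-class detection.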
Comparison of Ensemble Forest-Based Methods Performance for Imbalanced Data Classification
Hasnataeni, Yunia; Saefuddin, Asep; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.24269

Abstract

Purpose: Classification of imbalanced data presents a major challenge in meteorological studies, particularly in rainfall classification where extreme events occur infrequently. This research addresses the issue by evaluating ensemble learning models in handling imbalanced rainfall data in Bogor Regency, aiming to improve classification performance and model reliability for hydrometeorological risk mitigation. Methods: Four ensemble methods: RF, RoF, DRF, and RoDRF were applied to rainfall classification using three resampling techniques: SMOTE, RUS, and SMOTE-RUS-NC. The data underwent preprocessing, stratified splitting, resampling, and 5-fold cross-validation. Performance was evaluated over 100 iterations using accuracy, precision, recall, and F1-score. Result: The combination of DRF with SMOTE-RUS-NC yielded the most balanced results between accuracy (0.989) and computation time (107.28 seconds), while RoDRF with SMOTE achieved the highest overall performance with an accuracy of 0.991 but required a longer computation time (149.30 seconds). Feature importance analysis identified average humidity, maximum temperature, and minimum temperature as the most influential predictors of extreme rainfall. Novelty: This research contributes a comprehensive comparison of ensemble forest-based methods for imbalanced rainfall data, revealing DRF-SMOTE as an optimal trade-off between performance and efficiency. The findings contribute to improved rainfall classification models and offer practical insight for disaster mitigation planning and resource management in tropical regions.
Land Use Change Modelling Using Logistic Regression, Random Forest and Additive Logistic Regression in Kubu Raya Regency, West Kalimantan
Pradana, Alfa Nugraha; Djuraidah, Anik; Soleh, Agus Mohamad
Forum Geografi Vol 37, No 2 (2023): December 2023
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/forgeo.v37i2.23270

Abstract

Kubu Raya Regency is a regency in the province of West Kalimantan which has a wetland ecosystem including a high-density swamp or peatland ecosystem along with an extensive area of mangroves. The function of wetland ecosystems is essential for fauna, as a source of livelihood for the surrounding community and as storage reservoir for carbon stocks. Most of the land in Kubu Raya Regency is peatland. As a consequence, peat has long been used for agriculture and as a source of livelihood for the community. Along with the vast area of peat, the regency also has a potential high risk of peat fires. This study aims to predict land use changes in Kubu Raya Regency using three statistical machine learning models, specifically Logistic Regression (LR), Random Forest (RF) and Additive Logistic Regression (ALR). Land cover map data were acquired from the Ministry of Environment and Forestry and subsequently reclassified into six types of land cover at a resolution of 100 m. The land cover data were employed to classify land use or land cover class for the Kubu Raya regency, for the years 2009, 2015 and 2020. Based on model performance, RF provides greater accuracy and F1 score as opposed to LR and ALR. The outcome of this study is expected to provide knowledge and recommendations that may aid in developing future sustainable development planning and management for Kubu Raya Regency.
Handling Multicollinearity Problems in Indonesia's Economic Growth Regression Modeling Based on Endogenous Economic Growth Theory
Yanke, Aldino; Zendrato, Nofrida Elly; Soleh, Agus M
Indonesian Journal of Statistics and Applications Vol 6 No 2 (2022)
Publisher : Statistics and Data Science Program Study, IPB University, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v6i2p214-230

Abstract

One application of multiple linear regression in economics is modeling Indonesia's economic growth based on endogenous economic growth theory. Endogenous growth theory developed from classical theory, which cannot explain how the economy grows in the long run. The regression model based on endogenous growth theory uses many independent variables, which causes multicollinearity. In this study, a multiple linear regression model was fitted by least squares, together with several methods for handling multicollinearity: variable selection (backward, forward, and stepwise), principal component regression (PCR), partial least squares (PLS), and regularization (Ridge, Lasso, and Elastic Net). Backward, forward, and stepwise variable selection did not overcome the multicollinearity problem. In contrast, PCR, PLS regression, and the regularization methods did. Leave-one-out cross-validation (LOOCV) was used to identify the best method, defined as the one with the smallest mean squared error (MSE). Based on the MSE, the best method for handling multicollinearity in the economic growth model based on endogenous growth theory was Lasso regression.
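The model-selection step in this abstract (choosing the method with the smallest LOOCV mean squared error) can be illustrated with a small sketch. It uses ridge regression with two predictors and no intercept, only because that has a short closed form in plain Python; this is a stand-in and not the study's procedure, which compared several methods and found Lasso best. The function names, penalty grid, and toy data are assumptions.

```python
def fit_ridge_2d(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for two predictors, without an intercept."""
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    g0 = sum(x[0] * t for x, t in zip(X, y))
    g1 = sum(x[1] * t for x, t in zip(X, y))
    det = a * d - b * b
    return (d * g0 - b * g1) / det, (a * g1 - b * g0) / det

def loocv_mse(X, y, lam):
    """Leave each observation out once, refit, and average the squared errors."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = fit_ridge_2d(X_train, y_train, lam)
        prediction = b0 * X[i][0] + b1 * X[i][1]
        errors.append((y[i] - prediction) ** 2)
    return sum(errors) / len(errors)

# Toy data with two nearly collinear predictors (made-up values)
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical penalty grid; the candidate with the smallest LOOCV MSE is kept
candidates = [0.0, 0.1, 1.0, 10.0]
scores = {lam: loocv_mse(X, y, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)
```

The same loop applies to any set of candidate models: replace the fitting function, keep the held-out prediction step, and compare the resulting MSE values.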
Study of Spatial Autoregressive Regression With Heteroskedasticity Using the Generalized Method of Moments and Bayesian Approach
Koesnandy H, Abialam; Agus Mohamad Soleh; Farit Mochamad Afendi
Indonesian Journal of Statistics and Applications Vol 8 No 1 (2024)
Publisher : Statistics and Data Science Program Study, IPB University, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v8i1p58-69

Abstract

Spatial dependence and spatial heteroskedasticity are problems in spatial regression. The spatial autoregressive (SAR) model accounts only for dependence in the spatial lag. When heteroskedasticity is present, estimating SAR parameters by maximum likelihood estimation (MLE) yields biased and inconsistent estimators. Alternative methods are the generalized method of moments (GMM) and the Bayesian method. GMM uses a combination of linear and quadratic moment functions simultaneously, so its computation is easier than that of MLE. The Bayesian method handles heteroskedasticity by modeling the structure of the variance-covariance matrix. Bias is used to evaluate GMM and Bayes in estimating the parameters of the SAR model with heteroskedastic disturbances on simulated data. The results show that, for both GMM and Bayes, the bias of the parameter estimates is relatively consistent and becomes smaller as the number of observations increases. The GMM and Bayes methods are then applied to district/city GRDP data in Indonesia. The results show that the GMM method with an Exponential Distance Weights (EDW) matrix produces the minimum variance and the largest pseudo-R².
Support vector machine performance: simulation and rice phenology application
Muradi, Hengki; Saefuddin, Asep; Sumertajaya, I Made; Soleh, Agus Mohamad; Domiri, Dede Dirgahayu
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 6: December 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i6.pp4878-4890

Abstract

In classification, an accurate model is expected to produce correct predictions. This study analyzes the performance of two support vector machine (SVM) methods: one-versus-one SVM (SVM OvO) and the generalized multiclass support vector machine (GenSVM). These methods are compared with a generalized linear model, namely multinomial logistic regression (MLR). Simulations were conducted with SVM OvO and GenSVM to get an overview of the parameters affecting both methods' performance. The three classification methods were then applied to modelling rice phenology and tested for performance. The simulation results show that SVM OvO and GenSVM are sensitive to the choice of model parameters. The empirical results show that SVM OvO and GenSVM produce satisfactory model accuracy, comparable to that of MLR. The best rice phenology model was the SVM OvO model, with an overall accuracy of 79.20 ± 0.21 and a kappa of 70.69 ± 0.29. Future work could address the handling of samples, especially when a class is in the minority, and could add random effects to the SVM model.
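Overall accuracy and kappa, the two figures reported in this abstract, are both derived from a confusion matrix. The sketch below assumes the standard Cohen's kappa definition (observed agreement corrected for chance agreement); the three-class matrix is made up and does not represent the rice phenology data, and the function name is hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, given the row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Made-up three-class confusion matrix
confusion = [
    [40, 5, 5],
    [4, 30, 6],
    [6, 5, 39],
]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is consistent with the abstract reporting a kappa below the overall accuracy.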