Articles

Comparison of Ensemble Forest-Based Methods Performance for Imbalanced Data Classification
Hasnataeni, Yunia; Saefuddin, Asep; Soleh, Agus Mohamad
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.24269

Abstract

Purpose: Classifying imbalanced data is a major challenge in meteorological studies, particularly in rainfall classification, where extreme events occur infrequently. This research evaluates how well ensemble learning models handle imbalanced rainfall data in Bogor Regency, with the aim of improving classification performance and model reliability for hydrometeorological risk mitigation.
Methods: Four ensemble methods (RF, RoF, DRF, and RoDRF) were applied to rainfall classification with three resampling techniques (SMOTE, RUS, and SMOTE-RUS-NC). The data underwent preprocessing, stratified splitting, resampling, and 5-fold cross-validation. Performance was evaluated over 100 iterations using accuracy, precision, recall, and F1-score.
Result: DRF with SMOTE-RUS-NC gave the best balance between accuracy (0.989) and computation time (107.28 seconds), while RoDRF with SMOTE achieved the highest accuracy (0.991) at the cost of a longer computation time (149.30 seconds). Feature importance analysis identified average humidity, maximum temperature, and minimum temperature as the most influential predictors of extreme rainfall.
Novelty: This research contributes a comprehensive comparison of ensemble forest-based methods for imbalanced rainfall data, revealing DRF-SMOTE as an optimal trade-off between performance and efficiency. The findings contribute to improved rainfall classification models and offer practical insight for disaster mitigation planning and resource management in tropical regions.
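As an illustration of the random undersampling (RUS) technique named in the abstract above, the following is a minimal sketch in plain Python. The helper name `random_undersample`, the seed, and the toy labels are assumptions made for this example; they are not taken from the paper, and the study's own pipeline may differ.

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the smallest class sets the target for all classes.
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 8 "normal" days and 2 "extreme" days (made-up values).
rows = [[i] for i in range(10)]
labels = ["normal"] * 8 + ["extreme"] * 2
balanced_rows, balanced_labels = random_undersample(rows, labels)
counts = Counter(balanced_labels)
```

In a cross-validation setting, resampling of this kind is normally applied to the training portion of each fold only, so that the held-out fold keeps the original class distribution.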
Land Use Change Modelling Using Logistic Regression, Random Forest and Additive Logistic Regression in Kubu Raya Regency, West Kalimantan
Pradana, Alfa Nugraha; Djuraidah, Anik; Soleh, Agus Mohamad
Forum Geografi Vol 37, No 2 (2023): December 2023
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/forgeo.v37i2.23270

Abstract

Kubu Raya Regency, in the province of West Kalimantan, has a wetland ecosystem that includes high-density swamp and peatland along with an extensive area of mangroves. Wetland ecosystems are essential for fauna, as a source of livelihood for the surrounding community, and as a storage reservoir for carbon stocks. Most of the land in Kubu Raya Regency is peatland, so peat has long been used for agriculture and as a source of livelihood for the community. Because of its vast peat area, the regency also faces a potentially high risk of peat fires. This study aims to predict land use changes in Kubu Raya Regency using three statistical machine learning models: Logistic Regression (LR), Random Forest (RF), and Additive Logistic Regression (ALR). Land cover map data were acquired from the Ministry of Environment and Forestry and reclassified into six types of land cover at a resolution of 100 m. The land cover data were used to classify land use or land cover classes in Kubu Raya Regency for the years 2009, 2015, and 2020. Based on model performance, RF provides higher accuracy and F1 score than LR and ALR. The outcome of this study is expected to provide knowledge and recommendations that may aid sustainable development planning and management for Kubu Raya Regency.
Handling Multicollinearity Problems in Indonesia's Economic Growth Regression Modeling Based on Endogenous Economic Growth Theory: Penanganan Masalah Multikolinieritas pada Pemodelan Pertumbuhan Ekonomi Indonesia Berdasarkan Teori Pertumbuhan Ekonomi Endogenous
Yanke, Aldino; Zendrato, Nofrida Elly; Soleh, Agus M
Indonesian Journal of Statistics and Applications Vol 6 No 2 (2022)
Publisher : Statistics and Data Science Program Study, SSMI, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v6i2p214-230

Abstract

One application of multiple linear regression in economics is Indonesia's economic growth model based on endogenous economic growth theory. Endogenous growth theory developed out of classical theory, which cannot explain how the economy grows in the long run. The regression model based on this theory uses many independent variables, which causes multicollinearity problems. In this study, we fitted the multiple linear regression model by least squares and applied several methods for handling multicollinearity: variable selection (backward, forward, and stepwise), principal component regression (PCR), partial least squares (PLS), and regularization (Ridge, Lasso, and Elastic Net). The variable selection methods did not overcome the multicollinearity problem, whereas PCR, PLS regression, and the regularization methods did. We used leave-one-out cross-validation (LOOCV) to identify the best method, namely the one with the smallest mean squared error (MSE). Based on the MSE, the best method for overcoming multicollinearity in the economic growth model based on endogenous economic growth theory was Lasso regression.
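The model-selection step described in the abstract above, leave-one-out cross-validation scored by mean squared error, can be sketched as follows. This is a plain-Python illustration under stated assumptions: the two candidate models (a mean-only predictor and a one-predictor least-squares line) are toy stand-ins for the methods compared in the paper, and all function names and data values are made up for the example.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean of y, ignoring x."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

def fit_line(xs, ys):
    """Candidate 2: least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict_line(model, x):
    a, b = model
    return a + b * x

def loocv_mse(xs, ys, fit, predict):
    """Hold out each observation once, fit on the rest, and average
    the squared prediction errors on the held-out points."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append((ys[i] - predict(model, xs[i])) ** 2)
    return sum(errors) / len(errors)

# Made-up, roughly linear data (not from the study).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

candidates = {"mean": (fit_mean, predict_mean), "line": (fit_line, predict_line)}
scores = {name: loocv_mse(xs, ys, f, p) for name, (f, p) in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the smallest LOOCV MSE
```

The same loop applies to any fit/predict pair, so penalized regressions could be compared in the same way by swapping in their fitting routines.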
Study of Spatial Autoregressive Regression With Heteroskedasticity Using the Generalized Method of Moments and Bayesian Approach: Kajian Regresi Spasial Autoregresif dengan Heteroskedastik Menggunakan Generalized Method of Moments dan Pendekatan Bayes
Koesnandy H, Abialam; Agus Mohamad Soleh; Farit Mochamad Afendi
Indonesian Journal of Statistics and Applications Vol 8 No 1 (2024)
Publisher : Statistics and Data Science Program Study, SSMI, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v8i1p58-69

Abstract

Spatial dependence and spatial heteroskedasticity are problems in spatial regression. Spatial autoregressive regression (SAR) accounts only for dependence on the spatial lag. When heteroskedasticity is present, estimating SAR parameters by maximum likelihood estimation (MLE) gives biased and inconsistent estimators. Alternative methods are the generalized method of moments (GMM) and the Bayesian method. GMM uses a combination of linear and quadratic moment functions simultaneously, so its computation is easier than MLE. The Bayesian method handles heteroskedasticity by modeling the structure of the variance-covariance matrix. Bias is used to evaluate GMM and Bayes in estimating the parameters of a SAR model with heteroskedastic disturbances on simulated data. The results show that, for both GMM and Bayes, the bias of the parameter estimates is relatively consistent and becomes smaller as the number of observations grows. The GMM and Bayes methods are then applied to district/city GRDP data in Indonesia. The results show that the GMM method with the Exponential Distance Weights (EDW) matrix produces the minimum variance and the largest pseudo-R².
Support vector machine performance: simulation and rice phenology application
Muradi, Hengki; Saefuddin, Asep; Sumertajaya, I Made; Soleh, Agus Mohamad; Domiri, Dede Dirgahayu
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 6: December 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i6.pp4878-4890

Abstract

In classification, an accurate model is expected to produce correct predictions. This study analyzes the performance of two support vector machine (SVM) methods: the one-versus-one support vector machine (SVM OvO) and the generalized multiclass support vector machine (GenSVM). These methods are compared with a generalized linear model, namely multinomial logistic regression (MLR). Simulations were conducted with the SVM OvO and GenSVM methods to get an overview of the parameters affecting their performance. The three classification methods were then applied to modelling rice phenology and tested for performance. Simulation results show, however, that the SVM OvO and GenSVM machine learning methods are sensitive to the choice of model parameters. The empirical results show that the SVM OvO and GenSVM methods can produce satisfactory model accuracy that is comparable to the MLR method. The best rice phenology model accuracy was obtained from the SVM OvO model, with 79.20 ± 0.21 overall accuracy and 70.69 ± 0.29 kappa. This research can be extended by addressing sampling, especially when some classes are in the minority, and by adding random effects to the SVM model.
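The two figures reported in the abstract above, overall accuracy and kappa, are both computed from a confusion matrix. A minimal sketch of that calculation follows, assuming Cohen's kappa is the statistic meant; the function name and the 2x2 matrix are illustrative values, not results from the study.

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix,
    where cm[i][j] counts samples of true class i predicted as class j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Made-up confusion matrix for two classes.
cm = [[40, 10],
      [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(cm)
```

Kappa is lower than accuracy here because it discounts the agreement a classifier would reach by chance given the class proportions.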
OPEC Crude Oil Price Forecasting Using ARIMA with Ensemble Empirical Mode Decomposition
Lutfiah Adisti, Tiara; Soleh, Agus M; Alamudi, Aam; Rahardiantoro, Septian; Rizki, Akbar
Indonesian Journal of Statistics and Applications Vol 9 No 2 (2025)
Publisher : Statistics and Data Science Program Study, SSMI, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v9i2p230-239

Abstract

World crude oil prices fluctuate every day. One source of traded crude oil is the exporting countries that are members of the Organization of the Petroleum Exporting Countries (OPEC). In total, 40% of the world's crude oil is produced by OPEC. This makes forecasting the price of OPEC crude oil essential for policy aimed at maintaining world oil market stability. Fluctuating oil price data can be made simpler and easier to interpret by applying the Ensemble Empirical Mode Decomposition (EEMD) method, which decomposes the data into a number of Intrinsic Mode Functions (IMFs) and a residual. In this study, ARIMA forecasting on the original data is compared with ARIMA forecasting on the decomposition results, that is, the IMF components and the residual. The two methods are compared on the overall and average MAPE of the forecasts over five time ranges. The EEMD-ARIMA method has an average MAPE of 9.09% and a MAPE standard deviation of 7.39%. The OPEC crude oil price forecast for January-August 2021 ranges from $42.22 to $60.6 per barrel. The analysis shows that ARIMA with decomposed data (EEMD-ARIMA) performs better than ARIMA on the original data.
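The evaluation described in the abstract above summarises forecast error as MAPE over several time ranges, then reports the average and standard deviation of those values. A minimal sketch of that bookkeeping follows; the prices, forecasts, and number of ranges are made up, and the use of the sample standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev

def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))

# Made-up (actual, forecast) price pairs for three hypothetical time ranges.
ranges = [
    ([50.0, 52.0, 55.0], [48.0, 53.0, 56.0]),
    ([60.0, 58.0], [57.0, 59.0]),
    ([40.0, 42.0, 45.0, 47.0], [44.0, 41.0, 46.0, 45.0]),
]

mape_per_range = [mape(actual, forecast) for actual, forecast in ranges]
average_mape = mean(mape_per_range)   # average MAPE across ranges
spread_mape = stdev(mape_per_range)   # sample standard deviation of MAPE
```

MAPE is undefined when an actual value is zero, which is not a concern for positive price series such as these.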
Optimizing Landslide Susceptibility Mapping in Central Sulawesi with Recursive Feature Elimination and Random Forest Algorithm
Siregar, Indra Rivaldi; Djuraidah, Anik; Soleh, Agus Mohamad
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 2 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss2pp1019-1034

Abstract

Landslides are among the most destructive natural hazards, causing severe casualties, economic losses, and environmental degradation. Central Sulawesi, characterized by active tectonics such as the Palu-Koro fault, is highly susceptible to landslides, as tragically demonstrated in 2018. Developing accurate landslide susceptibility maps is therefore essential to support comprehensive landslide mitigation efforts in this region. While machine learning, particularly Random Forest (RF), has proven highly effective for landslide modeling, previous studies around Palu have often overlooked model simplification through feature selection and hyperparameter optimization. This study proposes an integrated approach combining RF with Recursive Feature Elimination (RFE) to reduce model complexity and enhance predictive accuracy. The research uses 498 landslide events and fifteen conditioning factors covering topography, environment, geology, and anthropogenic influences. The RFE-RF model achieves superior classification performance, with accuracy, balanced accuracy, and F1-scores exceeding 0.81, outperforming both RF without RFE and the Logistic Regression baseline. These findings underscore the need to integrate feature selection methods such as RFE into landslide modeling frameworks to improve predictive accuracy. High accuracy enables government authorities and stakeholders to set more targeted and effective mitigation priorities. Spatial analysis indicates that Donggala, Palu, and Sigi are the most critical areas requiring prioritized mitigation, with over 9% of their territories classified as highly susceptible. Feature importance analysis reveals that elevation, slope, and land cover are the most influential factors. This study suggests that mitigation efforts should focus on the hills and mountainous areas on both sides of the Palu Valley, with recommended strategies emphasizing land cover management practices, such as reforestation, to enhance slope stability and reduce landslide risk.
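The abstract above reports accuracy, balanced accuracy, and F1-score. For a binary landslide / non-landslide classifier, all three follow from the four confusion counts, as in this minimal sketch. The function name and the counts are illustrative assumptions and do not reproduce the study's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)          # sensitivity: true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }

# Made-up counts: 50 landslide cells and 50 non-landslide cells.
metrics = binary_metrics(tp=40, fp=5, fn=10, tn=45)
```

Balanced accuracy averages the per-class recall, so it stays informative when landslide and non-landslide samples are not equally frequent.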
Mixed-Effects Models with Generalized Random Forest: Improved Food Insecurity Analysis
Fransiska, Herlin; Soleh, Agus Mohamad; Notodiputro, Khairil Anwar; Erfiani, Erfiani
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 2 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss2pp1111-1124

Abstract

Food insecurity is a complex issue that requires a deep understanding of its influencing factors. Accurate predictions are crucial for effective interventions. Machine learning is well suited to the large and complex data available in the big data era. However, machine learning methods generally do not accommodate hierarchical or clustered data structures, which makes such data challenging to model. One model that does accommodate hierarchical data structures is the mixed-effects model. This study introduces a novel approach to predicting food insecurity that integrates mixed-effects models with a generalized random forest. The mixed-effects component captures variation in hierarchical or clustered data, such as differences between regions, while the generalized random forest, an extension of the traditional random forest, models the fixed effects and improves prediction accuracy. The empirical data were the 2021 food insecurity data for West Java, Indonesia. The results show that mixed-effects models with a generalized random forest significantly improve prediction accuracy compared to other models. On average performance measures, GMEGRF is a good model, with a balanced accuracy of 0.6789709, the highest among the methods compared. This methodological advance offers a new robust model for understanding and potentially mitigating food insecurity, ultimately informing efforts towards SDG 2 (Zero Hunger).
Siamese Model-Based Face Verification Using CNN and MobileNetV2
Abd Rahman; Agus Mohamad Soleh; Erfiani
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 2 (2026): April 2026
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i2.6996

Abstract

Face verification plays an important role in computer vision, especially in mobile and embedded systems with limited computational capacity. This study proposes a face verification system based on the Siamese Neural Network (SNN) architecture and evaluates six embedding models: a standard CNN, an L2-normalized CNN, a baseline MobileNetV2, a structurally adjusted MobileNetV2, a pre-trained MobileNetV2, and a fine-tuned MobileNetV2. The dataset includes facial images captured from three webcams and additional samples obtained from the Labeled Faces in the Wild and ImageNet datasets. The experimental procedure includes image preprocessing, construction of balanced positive and negative image pairs, model training, and evaluation using accuracy, precision, recall, F1-score, and AUC. The results show that the pre-trained MobileNetV2 and the standard CNN achieve the highest verification accuracy, reaching 100 percent and 99.998 percent, respectively. Among all models, the structurally adjusted MobileNetV2 offers the best trade-off, combining high accuracy, computational efficiency, and training stability while avoiding overfitting. The real-time implementation uses only the structurally adjusted MobileNetV2 model because of its lightweight structure and consistent performance. This model produces low embedding distances, low latency, and high throughput during CPU-based inference, which outperforms GPU execution in one-by-one image processing. The proposed system offers a practical and efficient face verification solution for identity authentication applications on resource-constrained platforms. These findings support the development of scalable and adaptive biometric security systems that rely on deep learning.
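In a Siamese setup like the one described in the abstract above, the two images are mapped to embeddings and the verification decision comes from the distance between them. The sketch below shows only that final decision step, assuming L2-normalized embeddings, Euclidean distance, and a fixed threshold. The threshold of 0.5, the function names, and the tiny 2-D vectors are hypothetical; the paper's embedding networks and decision rule are not reproduced here.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def same_identity(emb_a, emb_b, threshold=0.5):
    """Return (decision, distance): the pair is accepted as the same
    person when the distance between normalized embeddings is below
    the threshold (hypothetical value chosen for illustration)."""
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    distance = math.dist(a, b)
    return distance < threshold, distance

# Made-up embeddings standing in for the outputs of an embedding network.
match, match_distance = same_identity([1.0, 0.0], [2.0, 0.0])
mismatch, mismatch_distance = same_identity([1.0, 0.0], [0.0, 1.0])
```

In practice the threshold would be chosen on validation pairs, for example to balance false accepts against false rejects.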
Detection of Adulteration in Coconut Milk Using Cuckoo Search-Optimized XGBoost on High-Dimensional FTIR Spectral Data
Sentana Putra, I Gusti Ngurah; Sadik, Kusman; Soleh, Agus Mohamad; Suhaeni, Cici
JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika) Vol 10, No 3 (2025)
Publisher : STKIP PGRI Tulungagung

DOI: 10.29100/jipi.v10i3.8376

Abstract

Coconut milk adulteration is an important issue because it can reduce food quality and endanger consumers. This study aims to develop a rapid and accurate method for detecting coconut milk adulteration by combining FTIR spectroscopy with the XGBoost machine learning algorithm optimized with the Cuckoo Search Algorithm (CSA). FTIR spectral data from traditional and instant coconut milk samples were preprocessed with Standard Normal Variate (SNV) and Savitzky-Golay (SG) filtering to reduce noise and clarify spectral features. The XGBoost hyperparameters were then tuned with CSA. The results showed that the combination of SNV+SG preprocessing increased the model accuracy by 84.44%, with a precision of 92.73% and an F1-score of 79.94%. In addition, CSA optimization provided a 19.7% increase in accuracy compared to the model without tuning. These findings demonstrate the effectiveness of the CSA-XGBoost approach in analyzing high-dimensional spectral data and indicate that it is a potential solution for efficiently verifying the authenticity of coconut milk. In conclusion, this approach has the potential to be widely applied to test the authenticity of other food products quickly, non-destructively, and accurately.
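Of the two preprocessing steps named in the abstract above, Standard Normal Variate (SNV) is simple enough to sketch directly: each spectrum is centered on its own mean and scaled by its own standard deviation. The example below is a plain-Python illustration with made-up absorbance values; the use of the sample standard deviation is an assumption, and the Savitzky-Golay step is not shown.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]

# Made-up absorbance values; the second spectrum is the first one
# shifted by a constant baseline offset of 0.2.
spectra = [
    [0.10, 0.40, 0.90, 0.40, 0.10],
    [0.30, 0.60, 1.10, 0.60, 0.30],
]
corrected = [snv(spectrum) for spectrum in spectra]
```

After SNV the two toy spectra coincide up to rounding, which shows how the transform removes a constant baseline offset between samples.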
Co-Authors Aam Alamudi Abd Rahman Afendi, Farit M Aji Hamim Wigena Alfa Nugraha Pradana Alfa Nugraha Pradana Alfiryal, Naufalia Anadra, Rahmi Anang Kurnia Andespa, Reyuli Andriansyah Andriansyah Andriansyah, . Anik Djuraidah Annisarahmi Nur Aini Aldania Ardhani, Rizky Arif Handoyo Marsuhandi Aris Yaman ASEP SAEFUDDIN Astari, Reka Agustia Baehera, Seta Bagus Sartono Belinda, Nadira Sri Budi Susetyo Butar-butar, Victor Pandapotan Cici Suhaeni Dalimunthe, Amir Abduljabbar Daulay, Nurmai Syaroh Dede Dirgahayu Domiri Dede Dirgahayu Domiri Dede Dirgahayu Domiri, Dede Dirgahayu Deri Siswara Devi Andrian Dini Ramadhani Erfiani Erfiani Erfiani Etis Sunandi Farit Mochamad Afendi Fauzi, Asep Andri Fitrianto, Anwar Fulazzaky, Tahira Hamim Wigena, Aji Hari Wijayanto Hari Wijayanto Hasnataeni, Yunia Hengki Muradi Herlin Fransiska I Gusti Ngurah, Sentana Putra I Made Sumertajaya Indahwati Iqbal Hanif, Iqbal Jumansyah, L. M. Risman Dwi Kamila, Sabrina Adnin Karel Fauzan Hakim Karimah, Yumna Khairil Anwar Notodiputro Koesnandy H, Abialam Kusman Sadik Kusnaeni Kusnaeni, Kusnaeni Latifah K. Darusman Leni Anggraini Susanti Lutfiah Adisti, Tiara M. Yunus Mohamad Rafi Mubarak, Fadhlul Muchisha, Nadya Dwi Muhammad Nur Aidi Muhammad Nuruddin Prathama Muhammad Yusran Muradi, Hengki Nisrina Az-Zahra, Putri Nofrida Elly Zendrato NURADILLA, SITI Nurhambali, M Rizky Nurizki, Anisa Pika Silvianti Pusparani, Windyana Rahardiantoro, Septian Rais Rakhmalia, Riza Indriani Rizki Manaf, Silmi Anisa Rizki, Akbar Rochman, Nur Sentana Putra, I Gusti Ngurah Seran, Karlina Setyono Siregar, Indra Rivaldi Siti Arni Wulandya, Siti Arni Siti Hafsah Suhaeni, Cici Tamara, Novian Tarida, Arna Ristiyanti Trianjaya, Beny Tyas, Maulida Fajrining Ulfa, Yopi Ariesia Uswatun Hasanah Utami Dyah Syafitri Wigena, Aji H Yanke, Aldino Yudistira Yudistira Yudistira Yudistira Yumna Karimah _ Aunuddin