p-Index (2021 - 2026): 6.566
This author has published in the following journals:
FORUM STATISTIKA DAN KOMPUTASI; Media Statistika; Statistika; JURNAL MATEMATIKA STATISTIKA DAN KOMPUTASI; IPTEK The Journal for Technology and Science; CAUCHY: Jurnal Matematika Murni dan Aplikasi; Sosioinforma; Jurnal Pengelolaan Sumberdaya Alam dan Lingkungan (Journal of Natural Resources and Environmental Management); International Journal of Advances in Intelligent Informatics; Scientific Journal of Informatics; JOIN (Jurnal Online Informatika); Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); Jurnal Penelitian Pertanian Tanaman Pangan; BAREKENG: Jurnal Ilmu Matematika dan Terapan; JOURNAL OF APPLIED INFORMATICS AND COMPUTING; SINTECH (Science and Information Technology) Journal; MIND (Multimedia Artificial Intelligent Networking Database) Journal; Jurnal Aplikasi Statistika & Komputasi Statistik; FIBONACCI: Jurnal Pendidikan Matematika dan Matematika; Inferensi; International Journal of Advances in Data and Information Systems; InPrime: Indonesian Journal Of Pure And Applied Mathematics; Majalah Ilmiah Matematika dan Statistika (MIMS); Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika; Enthusiastic : International Journal of Applied Statistics and Data Science; Prosiding Seminar Nasional Official Statistics; Jurnal Natural; Eduvest - Journal of Universal Studies; Xplore: Journal of Statistics; PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS; Parameter: Jurnal Matematika, Statistika dan Terapannya; Journal of Mathematics, Computation and Statistics (JMATHCOS); Advance Sustainable Science, Engineering and Technology (ASSET); Indonesian Journal of Statistics and Its Applications; Journal on Mathematics Education
Articles

Performance Comparison of Random Forest and XGBoost Optimized with Cuckoo Search Algorithm for Coconut Milk Adulteration Detection Using FTIR Spectroscopy
I Gusti Ngurah, Sentana Putra; Kusman Sadik; Agus Mohamad Soleh; Cici Suhaeni
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i2.7817

Abstract

Coconut milk has emerged as a strategic food commodity in the global tropical region, with market demand growing at 7.2% per annum since 2021. This increasing demand has led to sophisticated adulteration practices, including dilution with water. Such adulteration not only reduces the nutritional value but also poses serious health risks, including food poisoning and allergic reactions. This study developed an innovative detection method combining Fourier Transform Infrared (FTIR) spectroscopy with a sophisticated machine learning algorithm. We analyzed 719 coconut milk samples (wavelength range 2500-4000 nm) consisting of traditional market products and instant commercial products. This study aims to develop an FTIR-based coconut milk adulteration detection model by optimizing RF and XGBoost parameters using CSA and evaluating the comparative performance of the two models in identifying different types of adulterants. The spectral data underwent rigorous preprocessing using a combination of Standard Normal Variate (SNV) and Savitzky-Golay (SG) techniques to overcome the effects of noise and light scattering, which significantly improved feature extraction. The results show that CSA-optimized XGBoost achieves superior performance with 92% accuracy and 91% F1 score, outperforming Random Forest in all evaluation metrics. The model shows particular strength in precision (98%), indicating its outstanding ability to minimize false positives in adulteration detection. Stability tests through 30 experimental repetitions reveal that the combination of XGBoost+CSA maintains consistent performance with minimal variance, confirming its reliability for industrial applications. Comparative analysis shows that the combination of SNV+SG preprocessing improves the accuracy of the baseline model by 9-12%, while CSA optimization provides an additional performance improvement of 10-15%. This research makes significant contributions to food science and safety. 
This study demonstrates the effectiveness of CSA in optimizing spectroscopic models, achieving 19.5% higher precision, and provides a rapid, non-destructive adulteration detection solution. These findings have important implications for strengthening food safety regulations and developing real-time quality control systems in the coconut milk industry.
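The abstract names Standard Normal Variate (SNV) as one of its two preprocessing steps but does not show it. The following is a minimal sketch of SNV alone, assuming each spectrum is a plain list of absorbance values; the numbers are hypothetical and the Savitzky-Golay step is not covered here.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own mean and std."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("flat spectrum: standard deviation is zero")
    return [(x - mu) / sd for x in spectrum]

# Hypothetical absorbance values, not taken from the study.
spectra = [
    [0.10, 0.20, 0.40, 0.30],
    [0.20, 0.40, 0.80, 0.60],  # same shape, doubled intensity (a scatter-like effect)
]
corrected = [snv(s) for s in spectra]
```

Because SNV works per spectrum, the two hypothetical spectra above, which differ only by a multiplicative factor, map to the same corrected curve.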

Effect of Feature Normalization and Distance Metrics on K-Nearest Neighbors Performance for Diabetes Disease Classification
Yusran, Muhammad; Sadik, Kusman; Soleh, Agus M; Suhaeni, Cici
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i2.8012

Abstract

Diabetes is a global health issue with a steadily increasing prevalence each year. Early detection of the disease is an important step in preventing severe complications. The K-Nearest Neighbors (KNN) algorithm is often used in disease classification, but its performance is highly influenced by the choice of normalization method and distance metric. This study aims to evaluate the effect of various normalization methods and distance metrics on the performance of the KNN algorithm in diabetes classification. Three normalization methods were employed: z-score normalization, min-max scaling, and median absolute deviation (MAD). In addition, seven distance metrics were assessed: Euclidean, Manhattan, Chebyshev, Canberra, Hassanat, Lorentzian, and Clark. The dataset used is Pima Indians Diabetes, which consists of 768 observations and 8 features. The data were split into 80% training data and 20% test data, and 5-fold cross-validation was used to determine the optimal k value. The results show that the MAD-Canberra combination produces the highest overall accuracy, recall, and F1-score of 87.32%, 82.33%, and 81.94%, respectively. The highest precision was obtained from the Baseline-Hassanat combination at 86.96%, while the lowest performance was observed for the Z-Score-Chebyshev combination, with an F1-score of 58.02%. These results highlight that no single combination universally outperforms others, underscoring the need for empirical evaluation. Nonetheless, combining MAD normalization with metrics such as Canberra or Hassanat can serve as a strong starting point for developing KNN-based classification systems, especially in medical contexts that are sensitive to misclassification.
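To make the MAD-Canberra combination concrete, here is a minimal sketch of MAD scaling, the Canberra distance, and a majority-vote KNN prediction. The function names and the choice to leave the MAD unscaled (no 1.4826 consistency factor) are assumptions for illustration, not details from the paper.

```python
from statistics import median

def mad_scale(column):
    """Scale one feature as (x - median) / MAD, where MAD = median(|x - median|)."""
    med = median(column)
    mad = median(abs(x - med) for x in column)
    if mad == 0:
        raise ValueError("MAD is zero; this feature cannot be scaled this way")
    return [(x - med) / mad for x in column]

def canberra(a, b):
    """Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|), skipping 0/0 terms."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to the query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: canberra(pair[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical toy data, not the Pima Indians Diabetes dataset.
train_X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.5, 4.8]]
train_y = [0, 0, 1, 1]
prediction = knn_predict(train_X, train_y, [1.1, 1.0], k=3)
```

In a fuller pipeline, each feature column would be passed through `mad_scale` before distances are computed, using the median and MAD from the training split only.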

Analysis and Optimization of Rainfall Prediction in Makassar City Using Artificial Neural Networks Based on Data Augmentation, Regularization, and Bayesian Optimization
Abdullah, Adib Roisilmi; Sadik, Kusman; Suhaeni, Cici; Saleh, Agus Muhammad
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i2.8304

Abstract

This study develops a robust and efficient rainfall prediction model using an Artificial Neural Network (ANN), significantly enhanced through integrated data augmentation, regularization, and Bayesian optimization techniques. We utilized a dataset of 118 monthly rainfall records from Makassar City, spanning 2014–2022, sourced from the Meteorological, Climatological, and Geophysical Agency (BMKG). To effectively capture inherent temporal patterns, lag features (specifically lag-1, lag-3, and lag-6 rainfall values) were meticulously constructed as input variables. Subsequently, Min-Max normalization was applied across all features, ensuring input consistency and optimizing the ANN's learning process. An initial manual grid search identified the most effective baseline ANN architecture, featuring four hidden layers ([128, 32, 16, 64] neurons), a tanh activation function, and a learning rate of 0.01. While the baseline ANN model achieved a commendable initial performance with an RMSE of 0.1608, comprehensive experiments revealed the superior benefits of a fully integrated approach. This advanced model, which synergistically combined data augmentation (to address data limitations and enhance generalization), regularization (to mitigate overfitting), and Bayesian optimization (for efficient hyperparameter tuning), demonstrated significantly improved generalization capabilities and enhanced model stability. This integrated model yielded an RMSE of 0.1861, an MSE of 0.0346, and an MAE of 0.1359. These compelling findings unequivocally underscore that integrated optimization strategies are crucial for developing more robust and reliable ANN-based rainfall prediction models, particularly for critical applications in climate-based time series forecasting.
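The lag-feature construction and Min-Max normalization described above can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the rainfall values are made up, and the series is scaled as a whole for brevity.

```python
def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]

def make_lag_features(series, lags=(1, 3, 6)):
    """Return (features, target) pairs where features are the values at t - lag."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows

# Hypothetical monthly rainfall values (mm), not the BMKG data.
rainfall = [120.0, 80.0, 60.0, 30.0, 10.0, 5.0, 40.0, 150.0]
rows = make_lag_features(min_max(rainfall))
```

The first six observations are consumed by the longest lag, so a series of length n yields n - 6 usable rows. In practice the minimum and maximum would usually be taken from the training portion only, to avoid leaking test information into the inputs.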

Bayesian Spatial BYM CAR Model for Estimating the Relative Risk of Dengue Hemmorhagic Fever in Bandung
Ananda Shafira; Asep Saefuddin; Kusman Sadik
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i2.9272

Abstract

Dengue Hemorrhagic Fever (DHF) is an endemic disease whose transmission is influenced by spatial and environmental factors, including population density, altitude, household sanitation, and clean and healthy living behaviors. In 2022, the city of Bandung reported a high incidence of DHF cases, highlighting the need for spatial modeling to capture interdependencies among geographic regions. This study aims to examine the impact of different parameter settings in hyperprior distributions on the Besag-York-Mollié conditional autoregressive (BYM CAR) model, estimate the relative risk (RR) of DHF, and map district-level risk to support the identification of priority areas for targeted prevention. The BYM CAR model was employed within a Bayesian framework, and the posterior distributions were obtained using Markov Chain Monte Carlo (MCMC) with the Gibbs sampling algorithm. Five hyperprior scenarios based on the Inverse-Gamma distribution were compared to evaluate their influence on model performance. The results show that hyperprior selection substantially affects model outcomes, with the best model obtained when the prior for the structured spatial component was specified as Inverse-Gamma(0.1, 0.1) and the prior for the unstructured spatial component as Inverse-Gamma(1, 0.01). Gedebage, Arcamanik, and Rancasari districts were identified as high-risk areas, while Babakan Ciparay and Bandung Kulon exhibited the lowest RR estimates. This spatial risk mapping offers insights for policymakers in formulating more targeted and efficient DHF prevention strategies.
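The abstract does not write out the model, so the block below gives the commonly used form of a BYM CAR specification as a reference sketch. The notation is assumed: $O_i$ and $E_i$ are observed and expected case counts in district $i$, $\theta_i$ is the relative risk, $n_i$ is the number of neighbors of district $i$, and $j \sim i$ denotes neighboring districts. Covariate terms are omitted, and the paper's exact parameterization may differ.

```latex
\begin{aligned}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i), \\
\log \theta_i &= \beta_0 + u_i + v_i, \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^2}{n_i}\right), \\
v_i &\sim \mathcal{N}\!\left(0,\ \sigma_v^2\right), \\
\sigma_u^2 \sim \text{Inverse-Gamma}(a_u, b_u), &\qquad \sigma_v^2 \sim \text{Inverse-Gamma}(a_v, b_v).
\end{aligned}
```

Here $u_i$ is the structured spatial component and $v_i$ the unstructured one. The hyperprior scenarios compared in the study correspond to different choices of $(a_u, b_u)$ and $(a_v, b_v)$.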

Comparison of LASSO, Ridge, and Elastic Net Regularization with Balanced Bagging Classifier
Nisrina Az-Zahra, Putri; Sadik, Kusman; Suhaeni, Cici; Mohamad Soleh, Agus
Parameter: Jurnal Matematika, Statistika dan Terapannya Vol 4 No 2 (2025): Parameter: Jurnal Matematika, Statistika dan Terapannya
Publisher : Jurusan Matematika FMIPA Universitas Pattimura

DOI: 10.30598/parameterv4i2pp287-296

Abstract

Predicting Drug-Induced Autoimmunity (DIA) is crucial in pharmaceutical safety assessment, as early identification of compounds with autoimmune risk can prevent adverse drug reactions and improve patient outcomes. Classification analysis often faces challenges when the number of predictor variables exceeds the number of observations or when high correlations among predictors lead to multicollinearity and overfitting. Regularization methods, such as Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic-Net, help stabilize parameter estimation and improve model interpretability. This study focuses on building a binary classification model to predict the risk of DIA using 196 molecular descriptors derived from chemical compound structures. To address class imbalance in the response variable, the Balanced Bagging Classifier (BBC) is combined with regularized logistic regression models. Elastic Net + BBC outperforms other models with the highest accuracy (0.825), followed closely by LASSO + BBC and Ridge + BBC (both 0.816). This integration not only improves classification accuracy but also enhances generalization and the reliable detection of minority class instances, supporting the early identification of autoimmune risks in drug discovery.

EVALUATING RANDOM FOREST AND XGBOOST FOR BANK CUSTOMER CHURN PREDICTION ON IMBALANCED DATA USING SMOTE AND SMOTE-ENN
Andespa, Reyuli; Sadik, Kusman; Suhaeni, Cici; Soleh, Agus M
MEDIA STATISTIKA Vol 18, No 1 (2025): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.18.1.25-36

Abstract

The banking industry faces significant challenges in retaining customers, as churn can critically affect both revenue and reputation. This study introduces a robust churn prediction framework by comparing the performance of XGBoost and Random Forest algorithms under imbalanced data conditions. The novelty of this research lies in integrating the SMOTE and SMOTE-ENN techniques with machine learning algorithms to enhance model performance and reliability on highly imbalanced datasets. Unlike conventional approaches that rely solely on oversampling or undersampling, this study demonstrates that the hybrid combination of XGBoost and SMOTE provides superior predictive accuracy, stability, and efficiency. Hyperparameter optimization using GridSearchCV was conducted to identify the most effective parameter configurations for both algorithms. Model performance was evaluated using the F1-Score and Area Under the Curve (AUC). The results indicate that XGBoost with SMOTE achieved the best performance, with an F1-Score of 0.8730 and an AUC of 0.9828, showing an optimal balance between precision and recall. Feature importance analysis identified Months_Inactive_12_mon, Total_Trans_Amt, and Total_Relationship_Count as the most influential predictors. Overall, this approach outperforms traditional resampling and modeling techniques, providing practical insights for data-driven customer retention strategies in the banking industry.

Image Classification of Rice Leaf Diseases with KNN Based Model using Stratified-KCV
Rizqi, Tasya Anisah; Anwar Fitrianto; Kusman Sadik
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 9 No 5 (2025): October 2025
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v9i5.6590

Abstract

Rice is a staple food worldwide, especially in Indonesia. The rice harvest decreased in 2023, reducing productivity and causing losses for farmers. Rice cultivation is often affected by diseases that hinder harvests. Stratified k-fold cross-validation (SKCV) is a resampling method that performs more accurately because it ensures that class frequencies are maintained. RGB and VGG16 are image processing methods that extract numeric features from images. RGB extraction takes the average value of the red, green, and blue layers, while VGG16 extraction takes the values of visual pattern features such as edges, textures, and object shapes. In this study, rice leaf diseases were classified using KNN-based models, including KNN, WKNN, CDNN, and ECDNN. The classification was performed to determine which method performs better under SKCV and to compare the results of RGB and VGG16 image extraction. It also compares SKCV and KCV results to determine the better resampling method. The results show that ECDNN produces the highest accuracy, 81.20%, in classifying rice leaf diseases using SKCV with VGG16 extraction, followed by CDNN and WKNN at 68.80% each and KNN at 56.20%, while RGB extraction produces an accuracy of only 43.80% using ECDNN and CDNN, 56.20% using WKNN, and 50% using KNN. The results of this analysis are expected to help farmers increase rice production in Indonesia.
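The idea behind stratified k-fold splitting, keeping class frequencies roughly equal across folds, can be illustrated with a small sketch. It assumes a deterministic round-robin assignment without shuffling; the function name and labels are hypothetical and this is not the authors' implementation.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds so each class is spread evenly across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class to the folds in turn, like cards.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical labels: 6 images of one disease class, 3 of another.
labels = ["blight"] * 6 + ["blast"] * 3
folds = stratified_folds(labels, k=3)
```

With these labels, each of the three folds receives two "blight" samples and one "blast" sample, whereas a plain k-fold split over the index order would put all "blast" samples in the last fold.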

Evaluating Ordinal Multivariate Models under Multicollinearity via Pairwise Likelihood: A Simulation Perspective
Achmad Fauzan; Kusman Sadik; Anang Kurnia
Advance Sustainable Science Engineering and Technology Vol. 7 No. 4 (2025): August-October
Publisher : Science and Technology Research Centre Universitas PGRI Semarang

DOI: 10.26877/asset.v7i4.2282

Abstract

This study examines the effect of multicollinearity on ordinal regression through a two-stage Monte Carlo simulation. A synthetic population of 2,000,000 observations was generated with predictors drawn from a normal distribution, and responses simulated using an ordinal probit model. A Monte Carlo procedure was employed with 10 repetitions, each consisting of 100 random samples of 1,000 observations. Parameter estimation employed Maximum Likelihood Estimation (MLE) for univariate models and Pairwise Likelihood (PL) for multivariate models, with performance assessed using mean squared error (MSE), bias, and computation time. Results show that multicollinearity had negligible impact on estimator bias and MSE, confirming the robustness of both MLE and PL to correlated predictors. However, severe multicollinearity substantially increased computation time, indicating a trade-off between estimator stability and efficiency. These findings highlight PL as a viable approach for analyzing complex ordinal data, particularly in applications such as socio-economic surveys and health metrics where predictor correlation is unavoidable.

Analyzing Household Expenditures with Generalized Random Forests
Isnanda, Eriski; Notodiputro, Khairil Anwar; Sadik, Kusman
CAUCHY: Jurnal Matematika Murni dan Aplikasi Vol 10, No 1 (2025): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI
Publisher : Mathematics Department, Universitas Islam Negeri Maulana Malik Ibrahim Malang

DOI: 10.18860/cauchy.v10i1.30104

Abstract

This study investigates the performance of Generalized Random Forest (GRF), which is known to be useful for understanding heterogeneous treatment effects (HTE) and non-linear relationships in high-dimensional data. In this paper, the performance of GRF was compared with Random Forest (RF) and the Generalized Linear Mixed Model (GLMM), as a continuation of the previous study conducted by Athey (2019). The data used in this study come from the National Socioeconomic Survey (SUSENAS) and are used to predict household per capita expenditure in West Java, Indonesia. The models are evaluated based on their ability to handle outliers using Winsorization. The results show that RF performed the best, yielding the smallest MSE values, followed by GRF with reasonably good performance, and GLMM with the highest MSE, indicating its limitations in handling non-linear data patterns. These findings indicate that RF is the most accurate method for modeling per capita expenditure in West Java, with recommendations for further research to develop hybrid methods or use more specific random effects in GLMM.
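Winsorization, mentioned above as the outlier treatment, replaces extreme values with less extreme cut-off values instead of dropping them. The sketch below assumes 5th and 95th percentile cut-offs and a simple floor-index percentile rule; both are illustrative choices, since the abstract does not state the thresholds used.

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the lower_pct and upper_pct percentiles of the sample."""
    ordered = sorted(values)
    n = len(ordered)
    # Floor-index percentile using integer arithmetic (one of several common conventions).
    lo = ordered[(lower_pct * (n - 1)) // 100]
    hi = ordered[(upper_pct * (n - 1)) // 100]
    return [min(max(v, lo), hi) for v in values]

# Hypothetical expenditure-like values with one extreme outlier.
expenditure = list(range(1, 21)) + [1000]
clipped = winsorize(expenditure)
```

The sample size is unchanged after Winsorization. Only the tails are pulled in, which limits the influence of extreme households on squared-error metrics such as MSE.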

N-Level Structural Equation Models (nSEM): The Effect of Sample Size on the Parameter Estimation in Latent Random-Intercept Model
Eminita, Viarti; Saefuddin, Asep; Sadik, Kusman; Syafitri, Utami Dyah
InPrime: Indonesian Journal of Pure and Applied Mathematics Vol. 6 No. 1 (2024)
Publisher : Department of Mathematics, Faculty of Sciences and Technology, UIN Syarif Hidayatullah

DOI: 10.15408/inprime.v6i1.38914

Abstract

Multilevel Structural Equation Modeling (MSEM) is claimed to address hierarchical data structures and latent response variables, but it becomes unstable with an increasing number of levels. N-Level SEM (nSEM) is an SEM framework designed to handle a growing number of levels in the model. The nSEM framework uses the Maximum Likelihood Estimation (MLE) method for parameter estimation, which requires a large sample size and correct model specification. Therefore, it is essential to consider the necessary minimal sample size to ensure accurate and efficient parameter estimation in the nSEM model. This study examined how sample size affects the performance of parameter estimators in nSEM models. We propose a method to evaluate the effect of many environments to estimate the results of factor loadings and environmental variance produced by the model. In addition, we also assess the impact of environment size on the estimation results of factor loadings and individual variance. The results were then applied to actual data on student mathematics learning motivation in Depok. The findings show that neither the number of environments nor the size of the environment affects the performance of fixed parameter estimation in the nSEM model. nSEM indicates excellent performance in estimating environmental variance at level 2 when the number of environments increases. Conversely, increasing the size of the environment worsens the performance of estimating individual variance parameters. Overall, the nSEM framework for the latent random-intercept (LatenRI) model performs well with increasing sample sizes. 
The application data on LatenRI models show almost similar estimation results.
Keywords: hierarchical data; latent random intercept model; multilevel structural equation modeling; n-level structural equation modeling.
2020MSC: 62D99
Co-Authors . Erfiani . Indahwati A.Tuti Rumiati Aam Alamudi Abdullah, Adib Roisilmi Achmad Fauzan Agus Mohamad Soleh Ahmad Rifai Nasution Aji Hamim Wigena Akbar Rizki Akbar Rizki Akbar Rizki Akmala Firdausi Amalia, Rahmatin Nur Anadra, Rahmi Ananda Shafira Anang Kurnia Andespa, Reyuli Andi Okta Fengki ASEP SAEFUDDIN Astari, Reka Agustia Astari, Reka Agustia Aulya Permatasari Azka Ubaidillah Bagus Sartono Budi Susetyo Cici Suhaeni Cici Suhaeni Dito, Gerry Alfa Dwi Agustin Nuriani Sirodj Efriwati Efriwati Embay Rohaeti Eminita, Viarti EVITA PURNANINGRUM Fahira, Fani FARDILLA RAHMAWATI Farit Mochamad Afendi Fitrianto, Anwar Haikal, Husnul Aris Hari Wijayanto Hasnataeni, Yunia Hazan Azhari Zainuddin Hermawati, Neni I Gusti Ngurah, Sentana Putra I Made Sumertajaya I Wayan Mangku Indahwati Indahwati Indahwati Intan Arassah, Fradha Iqbal, Teuku Achmad Isnanda, Eriski Khairi A N Khairil Anwar Notodiputro Khikmah, Khusnia Nurul Khusnul Khotimah Kusni Rohani Rumahorbo Latifah, Leli Lili Puspita Rahayu Logananta Puja Kusuma M Soleh, Agus Mochamad Ridwan Mochamad Ridwan, Mochamad Mohammad Masjkur Muh Nur Fiqri Adham Muhammad Yusran Mulianto Raharjo Naima Rakhsyanda Nisrina Az-Zahra, Putri Nur Khamidah NURADILLA, SITI Nusar Hajarisman Pangestika, Dhita Elsha Parwati Sofan, Parwati Purnama Sari Rifqi Aulya Rahman Rizaldi Boer Rizki, Akbar Rizqi, Tasya Anisah ROCHYATI ROCHYATI Sahamony, Nur Fitriyani Saleh, Agus Muhammad Satriyo Wibowo Siregar, Jodi jhouranda Siti Raudlah Sitti Nurhaliza Soleh, Agus M Suhaeni, Cici Sundari, Marta Supriatin, Febriyani Eka Tendi Ferdian Diputra Titin Suhartini Titin Suhartini, Titin Tri Wahyuni Uswatun Hasanah Utami Dyah Syafitri Viarti Eminita Widhiyanti Nugraheni Yenni Angraini Yenni Kurniawati Yuli Eka Putri