Claim Missing Document
Check
Articles

Found 2 Documents
Search
Journal : Journal of Mathematics, Computation and Statistics (JMATHCOS)

Performance Comparison of Random Forest and XGBoost Optimized with Cuckoo Search Algorithm for Coconut Milk Adulteration Detection Using FTIR Spectroscopy I Gusti Ngurah, Sentana Putra; Kusman Sadik; Agus Mohamad Soleh; Cici Suhaeni
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.35580/jmathcos.v8i2.7817

Abstract

Coconut milk has emerged as a strategic food commodity in the global tropical region, with market demand growing at 7.2% per annum since 2021. This increasing demand has led to sophisticated adulteration practices, including dilution with water. Such adulteration not only reduces the nutritional value but also poses serious health risks, including food poisoning and allergic reactions. This study developed an innovative detection method combining Fourier Transform Infrared (FTIR) spectroscopy with a sophisticated machine learning algorithm. We analyzed 719 coconut milk samples (wavelength range 2500-4000 nm) consisting of traditional market products and instant commercial products. This study aims to develop an FTIR-based coconut milk adulteration detection model by optimizing RF and XGBoost parameters using CSA and evaluating the comparative performance of the two models in identifying different types of adulterants. The spectral data underwent rigorous preprocessing using a combination of Standard Normal Variate (SNV) and Savitzky-Golay (SG) techniques to overcome the effects of noise and light scattering, which significantly improved feature extraction. The results show that CSA-optimized XGBoost achieves superior performance with 92% accuracy and 91% F1 score, outperforming Random Forest in all evaluation metrics. The model shows particular strength in precision (98%), indicating its outstanding ability to minimize false positives in adulteration detection. Stability tests through 30 experimental repetitions reveal that the combination of XGBoost+CSA maintains consistent performance with minimal variance, confirming its reliability for industrial applications. Comparative analysis shows that the combination of SNV+SG preprocessing improves the accuracy of the baseline model by 9-12%, while CSA optimization provides an additional performance improvement of 10-15%. This research makes significant contributions to food science and safety. This study demonstrates the effectiveness of CSA in optimizing spectroscopic models, achieving 19.5% higher precision. The combination of SNV+SG preprocessing improves the baseline accuracy by 9-12%, while CSA optimization provides an additional performance improvement of 10-15%. This study not only provides a rapid and non-destructive adulteration detection solution but also proves the effectiveness of the CSA approach in optimizing the spectroscopic model. These findings have important implications for strengthening food safety regulations and developing real-time quality control systems in the coconut milk industry.
Effect of Feature Normalization and Distance Metrics on K-Nearest Neighbors Performance for Diabetes Disease Classification Yusran, Muhammad; Sadik, Kusman; Soleh, Agus M; Suhaeni, Cici
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Nomor 02 (Oktober 2025)
Publisher : Jurusan Matematika FMIPA UNM

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.35580/jmathcos.v8i2.8012

Abstract

Diabetes is a global health issue with a steadily increasing prevalence each year. Early detection of the disease is an important step in preventing severe complications. The K-Nearest Neighbors (KNN) algorithm is often used in disease classification, but its performance is highly influenced by the choice of normalization method and distance metric used. This study aims to evaluate the effect of various normalization methods and distance metrics on the performance of the KNN algorithm in diabetes disease classification. The three normalization methods were employed: z-score normalization, min-max scaling, and median absolute deviation (MAD). In addition, the seven distance metrics were assessed: Euclidean, Manhattan, Chebyshev, Canberra, Hassanat, Lorentzian, and Clark. The dataset used is Pima Indians Diabetes which consists of 768 observations and 8 features. The data were split into 80% training data and 20% test data, and using 5-fold cross-validation to determine the optimal k value. The results show that the MAD-Canberra combination produces the highest overall accuracy, recall, and F1-score of 87.32%, 82.33%, and 81.94%, respectively. The highest precision was obtained from the Baseline-Hassanat combination at 86.96%, while the lowest performance was observed for the Z-Score-Chebyshev combination with F1-Score 58.02%. These results highlight that no single combination universally outperforms others, underscoring the need for empirical evaluation. Nonetheless, combining MAD normalization with metrics such as Canberra or Hassanat can serve as a strong starting point for developing KNN-based classification systems, especially in medical contexts that are sensitive to misclassification.