Miptahudin, Rd. Apip
Unknown Affiliation

Articles


Bayesian-Optimized XGBoost Model for Predicting Mushroom Toxicity
Sastradinata, Aria Kusumah; Sunarta, Sunarta; Miptahudin, Rd. Apip; Abdurrahman, M. Daffa; Taqwa, Rangga
Jurnal Mandiri IT Vol. 14 No. 2 (2025): Computer Science and Field
Publisher: Institute of Computer Science (IOCS)

DOI: 10.35335/mandiri.v14i2.465

Abstract

Mushroom poisoning remains a significant public health concern because edible and poisonous species are morphologically similar, which makes traditional identification unreliable. This study aims to develop an accurate and interpretable machine learning framework for mushroom toxicity prediction using a Bayesian-optimized Extreme Gradient Boosting (XGBoost) model. The dataset consists of morphological and ecological features derived from the secondary mushroom dataset and was preprocessed through imputation, standardization, and one-hot encoding. Bayesian optimization, implemented via the Hyperopt Tree-structured Parzen Estimator (TPE) algorithm, was used to tune the XGBoost hyperparameters automatically, thereby improving convergence and reducing manual experimentation. The model’s performance was evaluated using 10-fold cross-validation and standard metrics: accuracy, precision, recall, F1-score, and the Area Under the ROC Curve (AUC). The proposed framework achieved an accuracy of 99.99% and an AUC of 1.0000, indicating near-perfect discrimination between edible and poisonous mushrooms. Feature importance analysis further revealed that habitat, veil color, and stem root were the most influential predictors of toxicity. The findings highlight the effectiveness of Bayesian-optimized ensemble learning in handling high-dimensional biological data, offering a reliable, transparent, and computationally efficient approach for biosafety assessment and ecological data analysis.
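The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the encoding of "poisonous" as class 1 are all assumptions made for illustration. AUC is computed here by its pairwise definition, the fraction of (poisonous, edible) pairs in which the poisonous sample receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (assumed: poisonous) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

An AUC of 1.0000, as reported in the abstract, corresponds to every poisonous sample scoring above every edible one. Accuracy can still fall short of 100% in that case if the chosen threshold does not sit between the two groups of scores.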