

Machine Learning-Based Allergen Risk Detection in Food Recipes Using K-Means Clustering and Support Vector Machine
Zakaria, Adil; Wibawa, Aji Prasetya; Musyaffa', Ahmad 'Ammar; Alamsyah, David Satria; Yulianto, Aldy Rahmat; Utama, Agung Bella Putra
JMMR (Jurnal Medicoeticolegal dan Manajemen Rumah Sakit) Vol. 15 No. 1 (2026): April 2026
Publisher : Universitas Muhammadiyah Yogyakarta

DOI: 10.18196/jmmr.v15i1.705

Abstract

Errors in identifying food allergens in hospital menus may pose serious risks to patient safety. This study proposes a machine learning approach for automated allergen risk classification using food recipe data. A dataset of 9,986 Indonesian recipes was collected from an online recipe platform via web scraping and mapped to 14 major allergen attributes in accordance with international food safety standards. To represent ingredient variability, a rule-based data augmentation strategy generated recipe variations from optional ingredients, yielding 15,031 additional records after unrealistic combinations were filtered out. Because ground-truth clinical labels were unavailable, K-Means clustering was used to generate pseudo-labels that capture similarity patterns in allergen composition. These cluster assignments then served as target classes for Support Vector Machine (SVM) classifiers with Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid kernels. Model performance was evaluated using 10-fold cross-validation with accuracy, precision, recall, and F1-score, and hyperparameter tuning was additionally performed. The results show that the Linear, Polynomial, and RBF kernels consistently achieve scores of 0.99–1.00, whereas the Sigmoid kernel yields lower and less stable performance. However, these findings should be interpreted cautiously, as the dataset originates from a recipe platform and the labels are derived from clustering rather than direct clinical annotation.
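The pipeline described in the abstract (binary allergen features, K-Means pseudo-labels, then SVMs with four kernels scored by 10-fold cross-validation) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the allergen names are assumed to be the 14 allergens of EU Regulation 1169/2011 (Annex II), since the abstract does not list them; the recipe data is synthetic (make_synthetic_features is a hypothetical helper); the number of clusters (4) is a placeholder because the abstract does not report k; and the SVMs use scikit-learn defaults rather than the tuned hyperparameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC

# Assumed allergen attributes: the 14 allergens of EU Regulation 1169/2011, Annex II.
# The abstract says "14 major allergen attributes" but does not list them.
ALLERGENS = [
    "gluten", "crustaceans", "eggs", "fish", "peanuts", "soybeans", "milk",
    "tree_nuts", "celery", "mustard", "sesame", "sulphites", "lupin", "molluscs",
]
KERNELS = ("linear", "poly", "rbf", "sigmoid")
SCORING = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]


def encode(recipe_allergens):
    """Map a recipe's set of allergen names to a 0/1 vector over ALLERGENS."""
    return [1 if name in recipe_allergens else 0 for name in ALLERGENS]


def make_synthetic_features(n_recipes=500, n_profiles=4, noise=0.1, seed=0):
    """Hypothetical stand-in for the scraped recipes: noisy copies of a few
    random allergen profiles (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    profiles = rng.integers(0, 2, size=(n_profiles, len(ALLERGENS)))
    X = profiles[rng.integers(0, n_profiles, size=n_recipes)]
    flip = rng.random(X.shape) < noise  # flip each bit with probability `noise`
    return np.where(flip, 1 - X, X)


def pseudo_label(X, n_clusters=4, seed=0):
    """Use K-Means cluster assignments as pseudo-labels (n_clusters is assumed)."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)


def evaluate_kernels(X, y, seed=0):
    """10-fold stratified CV of a default SVC per kernel; mean score per metric."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for kernel in KERNELS:
        scores = cross_validate(SVC(kernel=kernel), X, y, cv=cv, scoring=SCORING)
        results[kernel] = {m: float(np.mean(scores[f"test_{m}"])) for m in SCORING}
    return results


if __name__ == "__main__":
    print(encode({"milk", "eggs"}))
    X = make_synthetic_features()
    y = pseudo_label(X)
    for kernel, metrics in evaluate_kernels(X, y).items():
        print(kernel, {m: round(v, 3) for m, v in metrics.items()})
```

One design point worth noting: each K-Means cluster consists of the points nearest to its centroid, so any two clusters are separated by a hyperplane, and the pseudo-labels are therefore close to linearly separable by construction. This is consistent with the abstract's caution: near-perfect SVM scores largely indicate that the classifier reproduces the clustering, not that it matches clinical allergen risk. The hyperparameter tuning mentioned in the abstract could be layered on with scikit-learn's GridSearchCV, but the search grid is not given in the source.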