Identifying the most appropriate food dish based on available kitchen ingredients remains a practical yet challenging task in everyday life. To address this, this study develops an intelligent food classification system using a multimodal approach: we propose a method that performs early fusion, combining visual and textual features extracted with the Contrastive Language-Image Pre-training (CLIP) model. Features from food images and ingredient lists are fused and classified through a two-layer multilayer perceptron. The model is evaluated on the Recipes5k dataset with 4,826 samples across 101 food categories. Results show that the proposed multimodal model achieves 91.32% accuracy, outperforming the text-only (85.65%) and image-only (57.26%) baselines. The main contribution of this work lies in demonstrating the effectiveness of early fusion for combining cross-modal representations in food classification. Unlike prior methods, our model supports flexible inference with either text or image input, enabling practical real-world applications. These findings highlight the potential of multimodal learning for food recommendation systems, offering both accuracy and contextual relevance beyond unimodal approaches.
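For illustration, a minimal sketch of how such an early-fusion classifier might be assembled in PyTorch. The fusion operator (concatenation), embedding dimension (512, as in CLIP ViT-B/32), and hidden width are assumptions not stated in the abstract, and the mechanism that allows single-modality inference is not shown.

```python
import torch
import torch.nn as nn


class EarlyFusionFoodClassifier(nn.Module):
    """Fuses CLIP image and text embeddings and classifies with a two-layer MLP.

    Concatenation as the fusion operation and the hidden width of 512 are
    illustrative assumptions; the abstract does not specify these details.
    """

    def __init__(self, clip_dim: int = 512, hidden_dim: int = 512, num_classes: int = 101):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * clip_dim, hidden_dim),  # fused (image + text) feature vector
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),   # 101 food categories, as in Recipes5k
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Early fusion: concatenate the two modality embeddings before classification.
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.mlp(fused)


if __name__ == "__main__":
    # Dummy CLIP-sized embeddings (batch of 4) to show the expected shapes.
    model = EarlyFusionFoodClassifier()
    image_emb = torch.randn(4, 512)
    text_emb = torch.randn(4, 512)
    logits = model(image_emb, text_emb)
    print(logits.shape)  # torch.Size([4, 101])
```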