International Journal of Advances in Artificial Intelligence and Machine Learning
Vol. 2 No. 2 (2025): International Journal of Advances in Artificial Intelligence and Machine Learni

Comparative Study of CNN and Vision Transformers on Indonesian Traditional Cakes Classification

Trisnawarman, Dedi (Unknown)
Supriyanton, Adolf Asih (Unknown)
Mawardi, Viny Christanti (Unknown)
Okengwu, Ugochi A (Unknown)



Article Info

Publish Date
13 Jul 2025

Abstract

Background of study: Food image classification is a challenging task in computer vision, particularly when dealing with traditional food items that exhibit subtle visual variations. While Convolutional Neural Networks (CNNs) have long been the standard for image recognition, their limitation in capturing long-range dependencies has led to the emergence of Vision Transformers (ViTs). In this context, the classification of Indonesian traditional cakes offers a culturally rich yet complex problem for automated image recognition systems.

Aims and scope of paper: This study aims to conduct a comparative analysis between EfficientNet-B0 (CNN-based) and ViT-B/16 (Transformer-based) architectures in classifying eight categories of Indonesian traditional cakes. The research evaluates not only classification accuracy but also the strengths and limitations of each model in handling fine-grained visual distinctions.

Methods: Both models were fine-tuned using the “Kue Indonesia” dataset from Kaggle. The methodology includes image preprocessing, model training with consistent parameters, and evaluation using accuracy, precision, recall, and F1-score. A confusion matrix was also used to visualize misclassifications and analyze per-class performance.

Result: ViT-B/16 achieved slightly higher accuracy (96.25%) compared to EfficientNet-B0 (95.62%). ViT performed better in classes with subtle variations, such as kue lapis and kue dadar gulung, while EfficientNet-B0 showed superior efficiency and high accuracy on visually distinct cakes.

Conclusion: Both CNN and ViT models demonstrate strong performance in traditional food classification. ViT is more robust in fine-grained visual analysis, whereas EfficientNet-B0 is preferable for resource-constrained environments. This study highlights the role of AI in supporting digital preservation of culinary heritage.
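As an illustration of the evaluation step named in the abstract, the sketch below shows one common way to derive accuracy, precision, recall, and F1-score from a confusion matrix. It is not the authors' code. The function names, the three-class toy labels, and the use of macro averaging are assumptions made for the example; the abstract does not state which averaging scheme was used, and the study itself covers eight classes.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Build a matrix whose rows are true classes and columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def classification_metrics(matrix: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy plus per-class and macro-averaged precision/recall/F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))

    per_class = []
    for i in range(n):
        tp = matrix[i][i]
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        precision = tp / predicted_i if predicted_i else 0.0
        recall = tp / actual_i if actual_i else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    # Macro average: every class counts equally, regardless of its size.
    macro = {key: sum(c[key] for c in per_class) / n
             for key in ("precision", "recall", "f1")}
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Hypothetical toy labels for three classes (not data from the study).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
metrics = classification_metrics(matrix)

print("accuracy:", round(metrics["accuracy"], 4))
for idx, scores in enumerate(metrics["per_class"]):
    print(idx, {k: round(v, 4) for k, v in scores.items()})
```

Reading the matrix row by row shows which true class is mistaken for which prediction, which is the kind of per-class analysis the abstract describes for visually similar cakes.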

Copyrights © 2025






Journal Info

Abbrev

ijaaiml

Subject

Computer Science & IT

Description

The International Journal of Advances in Artificial Intelligence and Machine Learning (IJAAIML) is a prominent academic journal dedicated to publishing cutting-edge research and developments in the fields of Artificial Intelligence (AI) and Machine Learning (ML). It serves as an essential platform ...