Journal : International Journal of Advances in Artificial Intelligence and Machine Learning

A Comparative Study of Convolutional Neural Networks and Vision Transformers for Fruit Classification
Authors: Jawarneh, Malik; Marwanto, Arief; Syamsuar, Dedy; Kusnandar, Maivi
International Journal of Advances in Artificial Intelligence and Machine Learning, Vol. 2 No. 2 (2025)
Publisher : CV Media Inti Teknologi

DOI: 10.58723/ijaaiml.v2i2.435

Abstract

Background of study: Accurate fruit classification is vital for agricultural automation, yet traditional methods are often subjective and inefficient. Convolutional Neural Networks (CNNs) are effective but struggle with global context in fine-grained tasks. Vision Transformers (ViTs), inspired by NLP models, offer global attention mechanisms that may improve classification in complex scenarios.

Aims and scope of paper: This study compares the performance of EfficientNet-B0 (a CNN model) and ViT-B/16 (a Transformer model) on a fruit classification task involving five fruit types. The goal is to evaluate their strengths and weaknesses under controlled experimental conditions using a moderately sized dataset.

Methods: A dataset of 10,000 fruit images was preprocessed with standard augmentation techniques and split into training and validation sets. Both models were fine-tuned using pretrained weights. Performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices.

Result: EfficientNet-B0 achieved higher overall accuracy (94%) than ViT-B/16 (92%). The CNN model performed consistently across all classes, particularly excelling in bananas and strawberries. ViT-B/16 showed superior results for strawberries but struggled with apples. Confusion matrices revealed class-specific strengths and weaknesses.

Conclusion: EfficientNet-B0 is better suited for general fruit classification due to its balanced performance, while ViT-B/16 excels in capturing fine-grained visual features. A hybrid approach may leverage both models’ strengths for enhanced performance in real-world applications.
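
The metrics listed under Methods (accuracy, precision, recall, F1-score, confusion matrix) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names and the toy labels below are hypothetical, the example uses three classes rather than the study's five, and the numbers are unrelated to the reported results.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall, and F1-score for each class (one-vs-rest)."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # true k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Hypothetical toy labels (illustrative only, not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
metrics = per_class_metrics(cm)
```

Reading the per-class values next to the overall accuracy is what exposes class-specific behaviour: a model can have high accuracy overall while one class has noticeably lower recall.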