Fine-Grained Plant Classification using Vision Transformers with Optimized MLP Heads
Yuardi, Koko; Alfarisy, Gusti Ahmad Fanshuri; Ramadhan Paninggalih
Jurnal ELTIKOM : Jurnal Teknik Elektro, Teknologi Informasi dan Komputer Vol. 9 No. 2 (2025)
Publisher : P3M Politeknik Negeri Banjarmasin

DOI: 10.31961/eltikom.v9i2.1500

Abstract

Automatic plant species classification is crucial for advancing education and biodiversity conservation. Deep learning models, such as the Vision Transformer (ViT), have demonstrated strong performance in plant species classification tasks. However, limited research has explored the impact of hyperparameters in the Multi-Layer Perceptron (MLP) head of ViT models for plant species classification. This study investigated the influence of learning rates, numbers of neurons, and activation functions on model performance. It also evaluated efficiency in both CPU and GPU environments. The objective was to determine the optimal configuration by analyzing accuracy, F1-score, and computation time. Two ViT models, ViT-B/16 and ViT-L/16, were tested on the VNPlant-200 dataset, which contains 200 plant species. Thirteen activation functions, multiple learning rates, and several neuron configurations were examined. The results showed that the Tanh activation function, combined with a learning rate of 10⁻⁴ and 1024 neurons, yielded the best performance on the ViT-B/16 model, achieving an accuracy of 0.9692 and an F1-score of 0.9684. Meanwhile, the Hard Tanh activation function, with a learning rate of 10⁻⁴ and 256 neurons, delivered the best results on the ViT-L/16 model, achieving an accuracy of 0.9855 and an F1-score of 0.9854. Computational analysis showed that ViT-B/16 achieved an average inference time of 0.0159 seconds on a GPU and 0.8902 seconds on a CPU, while ViT-L/16 took 0.0492 seconds on a GPU and 2.8335 seconds on a CPU. These findings highlight the importance of selecting suitable activation functions, learning rates, and neuron configurations to optimize model performance while maintaining computational efficiency.
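To illustrate the shape of the MLP head described above, here is a minimal NumPy sketch of the best-performing ViT-B/16 configuration (Linear → Tanh → Linear, 1024 hidden neurons, 200 output classes). The 768-dimensional input matches ViT-B/16's standard embedding size; the random weights and the `mlp_head` function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mlp_head(cls_embedding, w1, b1, w2, b2):
    # Linear -> Tanh -> Linear, the head configuration the abstract
    # reports as best for ViT-B/16 (illustrative, untrained weights).
    hidden = np.tanh(cls_embedding @ w1 + b1)  # (batch, 1024)
    return hidden @ w2 + b2                    # (batch, 200) class logits

rng = np.random.default_rng(0)
d_model, n_hidden, n_classes = 768, 1024, 200  # ViT-B/16 embedding dim; best head config

w1 = rng.standard_normal((d_model, n_hidden)) * 0.02
b1 = np.zeros(n_hidden)
w2 = rng.standard_normal((n_hidden, n_classes)) * 0.02
b2 = np.zeros(n_classes)

x = rng.standard_normal((4, d_model))  # a batch of 4 hypothetical [CLS] embeddings
logits = mlp_head(x, w1, b1, w2, b2)
print(logits.shape)  # (4, 200): one logit per VNPlant-200 species
```

Swapping `np.tanh` for a Hard Tanh (clipping to [-1, 1]) and reducing `n_hidden` to 256 would correspond to the configuration the abstract reports as best for ViT-L/16 (with a 1024-dimensional input embedding).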