Sinergi
Vol. 30 No. 2 (2026)

Benchmarking YOLOv8 and vision transformers for intelligent fish monitoring in aquaponics and controlled aquarium environments

Tresna Dewi (Department of Electrical Engineering, Politeknik Negeri Sriwijaya)
Yurni Oktarina (Department of Electrical Engineering, Politeknik Negeri Sriwijaya)
Sri Rezki Artini (Department of Civil Engineering, Politeknik Negeri Sriwijaya)
Gita Ayu Julianka (Department of Electrical Engineering, Politeknik Negeri Sriwijaya)
Jhoni Satria (Koi Agro Farm)



Article Info

Publish Date
08 Jun 2026

Abstract

Sustainable aquaculture requires reliable and accurate fish monitoring systems capable of operating across heterogeneous environmental conditions. Conventional monitoring approaches are labor-intensive and prone to human error, while recent advances in deep learning have enabled vision-based automation for aquatic environments. Convolutional object detectors such as YOLO and emerging Vision Transformer (ViT) models have demonstrated promising performance; however, most existing studies are limited to single-environment evaluations and rarely address energy-constrained, real-world deployment. To bridge this gap, this study presents a systematic benchmark of YOLOv8 and ViT across two complementary settings: a controlled aquarium environment and a solar-powered, off-grid aquaponics system. The proposed framework integrates 1080p CCTV video acquisition, dataset annotation and augmentation, and standardized training and evaluation using COCO metrics. Experimental results show that ViT consistently outperforms YOLOv8 in detection accuracy and prediction stability across both environments. ViT achieves 99.73% accuracy in the controlled aquarium and 99.68–99.73% accuracy in aquaponics, while YOLOv8 records 87.90% accuracy in the aquarium and 93.92–97.92% across aquaponics fish classes, exhibiting higher sensitivity to background clutter. McNemar’s test (p < 0.001) confirms that these differences are statistically significant. Beyond accuracy, the results reveal a trade-off between robustness and computational efficiency: ViT provides superior resilience under occlusion and glare, whereas YOLOv8 offers faster inference suitable for real-time operation on resource-limited edge devices. End-to-end deployment on a solar-powered NVIDIA Jetson Xavier NX demonstrates the feasibility of continuous, off-grid aquaculture monitoring and provides practical guidance for context-aware model selection in intelligent aquaculture systems.
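
The abstract reports McNemar’s test as the statistical check on the paired YOLOv8 and ViT predictions. As a minimal sketch of how such a comparison is typically computed, the following uses the chi-square form with continuity correction; the abstract does not state which variant the authors used, and the function name `mcnemar_test` and the discordant counts below are hypothetical, chosen only for illustration.

```python
import math


def mcnemar_test(b: int, c: int) -> tuple[float, float]:
    """McNemar's chi-square test with continuity correction.

    b: samples the first model classified correctly and the second did not
    c: samples the second model classified correctly and the first did not
    Returns (statistic, p_value).
    """
    if b + c == 0:
        # No discordant pairs: the data give no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts, NOT taken from the paper.
stat, p = mcnemar_test(b=60, c=10)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

Only the discordant pairs (images where exactly one model is correct) enter the statistic, which is why the test suits two models evaluated on the same test images.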

Copyrights © 2026






Journal Info

Abbrev

sinergi

Publisher

Subject

Civil Engineering, Building, Construction & Architecture; Control & Systems Engineering; Electrical & Electronics Engineering; Engineering; Industrial & Manufacturing Engineering

Description

SINERGI is a peer-reviewed international journal published three times a year, in February, June, and October. The journal is published by the Faculty of Engineering, Universitas Mercu Buana. Each publication contains articles comprising high-quality theoretical and empirical original research papers, ...