Deploying object detection models on resource-constrained devices requires balancing accuracy against computational efficiency. This research analyzes the accuracy-speed trade-off of hybrid architectures that combine MobileNet backbones (V2, V3-Small, and V3-Large) with YOLO detectors. Experiments are conducted on the Pascal VOC dataset at four input resolutions ranging from 320 to 640 pixels. Each configuration is evaluated by mean Average Precision at an IoU threshold of 0.5 (mAP@50) and by inference throughput in frames per second (FPS). The results reveal a clear trade-off between accuracy and speed. MobileNetV3-Large at 640px achieves the highest accuracy, 58.5% mAP@50, which makes it suitable for precision-critical tasks. Conversely, MobileNetV3-Small at 320px is the most efficient configuration: it reaches 165.2 FPS on a GPU and sustains real-time performance on a standard CPU at 30.4 FPS. These findings provide empirical guidance for selecting hybrid architectures under specific hardware constraints and indicate that MobileNetV3-Small is the preferred choice for low-power edge deployment.
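For readers unfamiliar with the "@50" in mAP@50, the sketch below shows the matching criterion it refers to. A predicted box counts as a true positive for a ground-truth box of the same class when their Intersection over Union (IoU) reaches 0.5. This is a minimal illustration and not the evaluation code used in this study. The function names `iou` and `is_true_positive` are hypothetical. Evaluation toolkits differ in details such as a strict `>` versus `>=` comparison and pixel-coordinate conventions. The full mAP computation, which ranks detections by confidence, builds per-class precision-recall curves, and averages AP over the Pascal VOC classes, is omitted.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Assumed mAP@50 matching rule: same-class match when IoU >= 0.5."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    gt = (0.0, 0.0, 100.0, 100.0)
    print(iou((0.0, 0.0, 100.0, 50.0), gt))                  # 0.5
    print(is_true_positive((0.0, 0.0, 100.0, 50.0), gt))     # True
    print(is_true_positive((50.0, 50.0, 150.0, 150.0), gt))  # False
```

A box covering half of the ground-truth region sits exactly at the 0.5 threshold. A box offset diagonally by half its width overlaps far less (IoU of about 0.14), so it would be counted as a false positive.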
Copyright © 2026