International Journal of Engineering, Science and Information Technology
Vol 2, No 3 (2022)

Comparison of CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0 Backbones on YOLO V4 as Object Detector

Marsa Mahasin (National Institute of Technology, Bandung)
Irma Amelia Dewi (National Institute of Technology, Bandung)



Article Info

Publish Date
14 Sep 2022

Abstract

YOLO v4 has a structure consisting of three parts: backbone, neck, and head. The backbone serves as the feature extractor for the input image; it is a convolutional neural network that can be replaced with another convolutional neural network. Previous research recommends several backbones, such as CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0. Research is therefore needed to determine the effect of different backbones on the YOLO v4 model. One research object that can be used for this purpose is the microfossil. Detecting microfossils helps paleontologists identify microfossil species, which serve as determinants of rock age, and distinguish between similar microfossils. In this research, three backbones (CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0) were used to train on and detect image sets of 5 species of foraminiferal microfossils. The results were evaluated to determine the advantages of each backbone. The evaluation metrics were precision, recall, F1-score, average precision (AP), mean average precision (mAP), frames per second (FPS), and model size. The CSPDarkNet53 model reached an mAP of 83.41%, the highest of the three, compared with 81.00% for CSPResNeXt-50 and 81.76% for EfficientNet-B0. The CSPResNeXt-50 model has a precision of 75.60%, a recall of 81.10%, and an F1-score of 78%. The CSPDarkNet53 model also achieved the highest speed, at 33.4 FPS. However, the YOLO v4 model with the EfficientNet-B0 backbone is the smallest model, at only 156.8 MB.
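
As a brief illustration of how the reported metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below is not the authors' evaluation code; the function name `f1_score` is a hypothetical helper, and the only inputs are the precision and recall figures quoted in the abstract for the CSPResNeXt-50 model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Convention assumed here: F1 is defined as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for CSPResNeXt-50.
precision = 0.7560
recall = 0.8110

f1 = f1_score(precision, recall)
print(f"F1-score: {f1:.2f}")
```

Rounded to two decimal places, this agrees with the F1-score of 78% stated in the abstract.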

Copyrights © 2022