This study aimed to develop an automatic text extraction system for ingredient labels by integrating YOLOv8 for object detection with TrOCR, a Transformer-based Optical Character Recognition (OCR) model, for text recognition. YOLOv8 was trained to detect and crop the label area in the image, while TrOCR extracted text from the cropped bounding box. The evaluation used 16 sample images covering various conditions: background color (monochrome and RGB), language (Bahasa Indonesia and English), and text format (single-line and multi-line). The results indicated that TrOCR performed well on single-line text but struggled with multi-line and longer text, in some cases omitting words entirely. For these multi-line layouts, character and word error rates reached up to 100%.
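To make the reported error rates concrete, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are commonly computed from the Levenshtein edit distance between a reference transcription and the OCR output. The function names (`edit_distance`, `cer`, `wer`) and the sample strings are illustrative assumptions; the study's own evaluation code and any text normalization it applied are not described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical label text, not taken from the study's data.
    print(wer("sugar salt water", "sugar water"))  # one omitted word out of three
    print(cer("sugar salt water", ""))             # empty output gives 1.0
```

Under this definition, an empty OCR output or one that shares nothing with the reference yields an error rate of 100%, and omitted words count as deletions. Because insertions are also counted, the rate can exceed 100% when the output is longer than the reference.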
Copyrights © 2025