Media of Computer Science
Vol. 1 No. 1 (2024): June 2024

Comparative Analysis of Google Vision OCR with Tesseract on Newspaper Text Recognition

Prakisya, Nurcahya Pradana Taufik (Unknown)
Kusmanto, Bintang Timur (Unknown)
Hatta, Puspanda (Unknown)



Article Info

Publish Date
19 Jul 2024

Abstract

Optical Character Recognition (OCR) is a technique for converting images of text into machine-readable text. Two OCR algorithms are currently well known and widely used: Google Vision's OCR and Tesseract. This study compares the two so that developers can more easily decide which algorithm is the right one for the system they intend to build. The research follows a Research and Development (R&D) method with these stages: literature study, needs analysis, dataset collection and expansion, architectural design and application modeling, system implementation, testing and evaluation, and drawing conclusions. The accuracy, precision, and sensitivity of each algorithm are computed from a confusion matrix. The study concludes that Google Vision's OCR algorithm is superior to Tesseract because it achieves higher accuracy, sensitivity, and precision.
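The three measures named in the abstract follow the standard confusion-matrix definitions. The sketch below is illustrative only: the abstract does not say how true and false positives were counted for OCR output, so the counting unit (for example, words or characters), the `ConfusionCounts` type, and the sample numbers are all assumptions, not the authors' implementation or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int      # items recognized correctly
    fp: int      # items output by the OCR engine that are wrong
    fn: int      # items in the ground truth that the engine missed
    tn: int = 0  # true negatives; often not defined for OCR, so default to 0


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    # (TP + TN) / (TP + TN + FP + FN)
    return _safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c: ConfusionCounts) -> float:
    # TP / (TP + FP)
    return _safe_div(c.tp, c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # TP / (TP + FN), also known as recall
    return _safe_div(c.tp, c.tp + c.fn)


# Made-up counts for illustration; these are NOT results from the study.
example = ConfusionCounts(tp=90, fp=10, fn=5)
print(f"accuracy={accuracy(example):.3f}")
print(f"precision={precision(example):.3f}")
print(f"sensitivity={sensitivity(example):.3f}")
```

Running the same three functions on the counts obtained from each OCR engine gives directly comparable figures, which is the kind of comparison the abstract describes.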

Copyrights © 2024






Journal Info

Abbrev

mcs

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Language, Linguistic, Communication & Media; Library & Information Science

Description

Media of Computer Science (MCS), published twice a year, provides a forum for the full range of scholarly study. MCS focuses on advanced computational intelligence, including the synergetic integration of neural networks, fuzzy logic and evolutionary computation, so that more intelligent systems can ...