In the digital era, identifying book titles on covers has become a key requirement for digital library management, archiving systems, and book e-commerce platforms. Two challenges stand out: manual methods and traditional pattern-matching techniques are inefficient, and Indonesian is complex to process because of its diverse morphological variations and syntactic structures. To address these issues, this study proposes integrating Optical Character Recognition (OCR) with Natural Language Processing (NLP). OCR extracts textual information from book cover images, while NLP recognizes and classifies the extracted text to identify the main book title. The implementation results demonstrate that this approach significantly improves title identification accuracy compared to traditional methods, particularly through Named Entity Recognition (NER) techniques and NLP models such as BERT and LSTM. The developed system proves effective in accelerating book digitization, improving information management efficiency, and contributing to the advancement of Indonesian language processing technology.
Copyright © 2025
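The two-stage idea in the abstract (OCR output first, then a text-classification step that picks the main title) can be illustrated with a minimal sketch. This is not the study's implementation: the NER/BERT/LSTM classifier is replaced here by a simple hand-written heuristic, and every name in it (`OcrLine`, `NON_TITLE_CUES`, `title_score`, `pick_title`) is hypothetical. The sketch assumes an OCR engine has already returned each text line together with the pixel height of its bounding box, and that the title tends to be the largest line that does not contain publisher or edition cue words.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrLine:
    """One line of OCR output (assumed shape, not tied to any OCR library)."""
    text: str
    height: float  # pixel height of the line's bounding box


# Hypothetical Indonesian cue words that usually mark non-title lines
# (publisher, edition, printing, "by", author).
NON_TITLE_CUES = {"penerbit", "edisi", "cetakan", "oleh", "penulis"}


def normalize(text: str) -> str:
    """Collapse runs of whitespace that OCR tends to introduce."""
    return re.sub(r"\s+", " ", text).strip()


def title_score(line: OcrLine, max_height: float) -> float:
    """Heuristic stand-in for a trained classifier: larger text scores
    higher, and lines containing a cue word are penalized."""
    tokens = re.findall(r"[^\W\d_]+", normalize(line.text).lower())
    if not tokens:
        return 0.0  # no alphabetic content (e.g. a bare year or ISBN)
    relative_size = line.height / max_height
    penalty = 0.5 if any(t in NON_TITLE_CUES for t in tokens) else 0.0
    return max(relative_size - penalty, 0.0)


def pick_title(lines: List[OcrLine]) -> Optional[str]:
    """Return the normalized text of the best-scoring line, or None."""
    candidates = [ln for ln in lines if normalize(ln.text)]
    if not candidates:
        return None
    max_height = max(ln.height for ln in candidates)
    if max_height <= 0:
        return None
    best = max(candidates, key=lambda ln: title_score(ln, max_height))
    return normalize(best.text)


if __name__ == "__main__":
    # Invented cover text, for illustration only.
    cover = [
        OcrLine("Penerbit Contoh", 18),
        OcrLine("BELAJAR  PEMROGRAMAN PYTHON", 64),
        OcrLine("Budi Santoso", 30),
    ]
    print(pick_title(cover))
```

In a system like the one described, `title_score` would be the natural place to swap in a trained model, with the OCR stage feeding it the same per-line text and layout features.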