Martina Marcelline Taslim
Informatics Study Program (Program Studi Informatika)

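Deteksi Rumus Matematika pada Halaman Dokumen Digital dengan Metode Convolutional Neural Network (Mathematical Formula Detection in Digital Document Pages Using the Convolutional Neural Network Method)
Martina Marcelline Taslim; Kartika Gunadi; Alvin Nathaniel Tjondrowiguno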
Jurnal Infra Vol 7, No 2 (2019)
Publisher : Jurnal Infra

Abstract

Mathematical formulae are an important part of academic papers and scientific journals, yet Optical Character Recognition (OCR) processes often fail to recognize them properly. One cause of this failure is that mathematical formulae differ from ordinary text. Detecting formulae in document pages can therefore help with this problem. Detection is performed by converting digital document pages into images, applying text line segmentation and word segmentation, and classifying the resulting segments with a Convolutional Neural Network (CNN). The aim is to help OCR processes by identifying which parts of a document page contain formulae and which do not. The CNN architectures used for classification have 64 kernels in each convolutional layer. For displayed formulae (formulae that do not share their space with regular text), the model uses 10 groups of Convolutional-ReLU-Max Pooling layers. For inline formulae (formulae that share a text line with regular text), 12 groups of Convolutional-ReLU-Max Pooling layers are used. These architectures achieve an F1 score of 0.980 for displayed formula classification in 1-column documents, 0.940 in 2-column documents, and 0.916 for inline formulae.
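
The abstract does not say how text line and word segmentation are carried out. As an illustration only, the sketch below uses projection profiles, a common technique that is assumed here and not taken from the paper. The page is a small binarized image given as a list of rows (1 = ink, 0 = background), and the names `segment_runs`, `segment_lines`, `segment_words`, and `min_gap` are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start, end) with end exclusive


def segment_runs(profile: List[int], min_gap: int = 1) -> List[Run]:
    """Find stretches of non-zero profile values.

    Stretches separated by fewer than `min_gap` empty positions are merged,
    so small gaps (e.g. between characters) do not split a segment.
    """
    runs: List[Run] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at an empty position
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Run] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_lines(page: List[List[int]]) -> List[Run]:
    """Text lines as row ranges, from the horizontal projection profile."""
    row_profile = [sum(row) for row in page]
    return segment_runs(row_profile)


def segment_words(page: List[List[int]], line: Run, min_gap: int = 2) -> List[Run]:
    """Words within one text line as column ranges, from the vertical profile."""
    top, bottom = line
    width = len(page[0])
    col_profile = [sum(page[r][c] for r in range(top, bottom)) for c in range(width)]
    return segment_runs(col_profile, min_gap)


# Toy page (assumed data): two text lines, the first containing two words.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
lines = segment_lines(page)
words_in_first_line = segment_words(page, lines[0], min_gap=2)
```

In a pipeline like the one described, each line or word region found this way would be cropped from the page image and passed to the classifier. The gap threshold is a tuning choice: a larger `min_gap` merges characters into words, while a smaller one splits them.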