Had, Iqbaluddin Syam
Unknown Affiliation

Published: 2 Documents


Improving Tesseract OCR Accuracy Using SymSpell Algorithm on Passport Data
Had, Iqbaluddin Syam; Maulana Baihaqi, Wiga; Putriana Nuramanah Kinding, Dwi
Sinkron : jurnal dan penelitian teknik informatika Vol. 9 No. 1 (2025): Research Article, January 2025
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v9i1.14395

Abstract

Optical Character Recognition (OCR) is a technology for recognizing text in images or digital documents, such as passports. Tesseract is a popular OCR tool because it offers high accuracy. However, OCR accuracy is often degraded by factors such as image noise and non-text elements. This article discusses the application of the SymSpell algorithm as a post-processing step to improve OCR accuracy on standard Indonesian passports. The OCR focuses on the Visual Inspection Zone, specifically the Place of Birth and Issuing Office values. Unlike the Machine Readable Zone, which is composed of individual codes on a clear background, the Visual Inspection Zone often suffers OCR errors because holograms obscure the text and the layout is widely spaced. SymSpell is an edit-distance-based spelling correction algorithm designed to process data quickly and efficiently, even on very large datasets. In this study, SymSpell is used to detect and correct errors in OCR results by comparing them against a corpus word list. Experimental results on 10 passport scans and photos showed that integrating SymSpell, developed under the Research and Development methodology, improved the OCR accuracy rate by 21.43% for certain Place of Birth and Issuing Office data from the Visual Inspection Zone. With this approach, OCR systems can provide more reliable results for practical applications.
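
The correction step described in the abstract can be illustrated with a minimal pure-Python sketch of the symmetric-delete idea behind SymSpell. This is not the authors' implementation: the function names, the four-entry corpus of place names, and the sample OCR strings are all hypothetical, and plain Levenshtein distance is used for ranking as a simplification (the reference SymSpell implementation uses a Damerau-Levenshtein variant).

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance (a simplification, see the note above)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_index(corpus, max_distance=2):
    """Map every delete-variant of every corpus term back to the terms that produce it."""
    index = {}
    for term in corpus:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def correct(word, index, max_distance=2):
    """Return the closest corpus term within max_distance, or the word unchanged."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(word, c), c) for c in candidates]
    scored = [s for s in scored if s[0] <= max_distance]
    return min(scored)[1] if scored else word


# Hypothetical corpus of Place of Birth / Issuing Office values.
CORPUS = ["JAKARTA", "BANDUNG", "SURABAYA", "PURWOKERTO"]
INDEX = build_index(CORPUS)

for ocr_text in ["JAKAR7A", "BANDUNC", "SURABYA"]:
    print(ocr_text, "->", correct(ocr_text, INDEX))
```

Only deletions are generated at lookup time, so an OCR string and a corpus term meet at a shared delete-variant instead of the lookup enumerating every possible insertion and substitution. That is where the speed on large word lists comes from.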
Jaro-Winkler Distance Algorithm: Autocorrect and Spelling Suggestion Features for Indonesian-Language Script Writing at BMS TV
Prasetyo, Agung; Baihaqi, Wiga Maulana; Had, Iqbaluddin Syam
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 5 No 4: August 2018
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.201854780

Abstract

Autocorrect is a system that automatically checks and corrects misspelled words. Autocorrect features are now common across devices and applications, for example on smartphone keyboards and in Microsoft Word. Such systems replace a word they consider wrong immediately and without notifying the user, so users often do not notice that their text has changed, and the replacement is not always the word the user intended. The autocorrect knowledge base in Microsoft Word is in English, so it cannot be applied to news script writing at BMS TV. Every day the News Director of BMS TV reviews the scripts to be broadcast, and this review includes a spelling check. An Indonesian-language autocorrect and spelling suggestion feature is expected to help the News Director check and correct misspelled words automatically and to suggest the correct Indonesian spelling. The software was developed using the Extreme Programming method and the Jaro-Winkler Distance algorithm. Jaro-Winkler is an algorithm for computing a closeness score between two texts. The result of this study is a system that helps the News Director of BMS TV check for spelling errors in Indonesian-language scripts and makes it easier for the central News Director to collect scripts from BMS TV's various contributors. In a test of 60 words covering various misspelling scenarios, the feature corrected ten words automatically and correctly, and produced appropriate spelling suggestions for 39 words.
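
The closeness score mentioned in the abstract can be sketched in pure Python as follows. This is a minimal illustration of the standard Jaro-Winkler similarity, not the system built in the study: the `suggest` helper, its 0.85 threshold, and the small Indonesian word list are hypothetical.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def suggest(word, dictionary, threshold=0.85):
    """Return (candidate, score) pairs at or above threshold, best first (threshold is an assumption)."""
    scored = [(w, jaro_winkler(word, w)) for w in dictionary]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)


# Hypothetical Indonesian word list.
WORDS = ["berita", "naskah", "penulisan", "kata"]
print(suggest("nasakh", WORDS))
```

One plausible design, stated here only as an assumption, is to replace a word automatically when a single candidate scores very high and to show a ranked suggestion list otherwise, which would match the abstract's distinction between autocorrect and spelling suggestion.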