Journal: International Journal of Computer, Network Security and Information System (IJCONSIST)

Comparing Structured Prompts for Denoising Noisy Certificate Text
Dimas Saputra; I Gede Susrama Mas Diyasa; Eva Yulia Puspaningrum; Wan Suryani Wan Awang
IJCONSIST JOURNALS Vol 6 No 2 (2025): March
Publisher : International Journal of Computer, Network Security and Information System

DOI: 10.33005/ijconsist.v6i2.133

Abstract

This study addresses noisy text produced by Optical Character Recognition (OCR) on certificates, which hinders classification in Recognition of Prior Learning (RPL) contexts. To mitigate this, the authors propose prompt-based denoising with a Large Language Model (LLM), specifically the Gemini model, to refine the extracted text before classification. The pipeline combines OCR via PyTesseract, LLM-driven denoising using structured prompts (CSIR, CLEAR, and CO-STAR), and a BERT-base-uncased model for classification. Synonym replacement is also applied for data augmentation. Performance is evaluated using accuracy, validation accuracy, confusion matrices, and classification reports. The results show a substantial improvement in classification performance: the baseline scenario achieved an accuracy of 82.14%, whereas the best-performing prompt structure, CO-STAR, reached 98.81%, an increase of 16.67 percentage points. Similar trends were observed across the other evaluation metrics, with CO-STAR delivering the highest precision, recall, and F1-score. In conclusion, LLM-driven denoising with effective prompt strategies improves the quality of OCR-extracted text and substantially boosts classification outcomes in certificate-based applications.
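
To make the idea of a structured denoising prompt concrete, the following is a minimal sketch of how a CO-STAR-style prompt (Context, Objective, Style, Tone, Audience, Response) might be assembled around raw OCR output. The abstract does not give the prompts used in the study, so the function name `build_costar_prompt`, the header layout, and the wording of every section are assumptions made purely for illustration.

```python
# CO-STAR section names in their usual order.
CO_STAR_SECTIONS = ("Context", "Objective", "Style", "Tone", "Audience", "Response")


def build_costar_prompt(ocr_text: str) -> str:
    """Assemble a CO-STAR-style denoising prompt around raw OCR text.

    Hypothetical helper: the section wording below is illustrative and is
    not taken from the study.
    """
    sections = {
        "Context": "The text below was extracted from a scanned certificate "
                   "by OCR and contains recognition errors.",
        "Objective": "Correct misrecognized characters and broken words "
                     "without adding information that is not in the text.",
        "Style": "Plain, literal transcription.",
        "Tone": "Neutral.",
        "Audience": "A downstream text classifier.",
        "Response": "Return only the corrected certificate text.",
    }
    # One labelled block per CO-STAR section, in the fixed order above.
    parts = [f"# {name.upper()} #\n{sections[name]}" for name in CO_STAR_SECTIONS]
    # The noisy OCR text goes last so the instructions precede the input.
    parts.append(f"# OCR TEXT #\n{ocr_text.strip()}")
    return "\n\n".join(parts)
```

In a pipeline like the one described, the returned string would be sent to the LLM and the model's reply, rather than the raw OCR text, would be passed on to the classifier. Keeping prompt construction in a single function makes it straightforward to swap in a different prompt structure for comparison.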