Garuda - Garba Rujukan Digital

International Journal Of Computer, Network Security and Information System (IJCONSIST)

Vol 6 No 2 (2025): March

Dimas Saputra
I Gede Susrama Mas Diyasa
Eva Yulia Puspaningrum
Wan Suryani Wan Awang

Publish Date
28 Mar 2025

This study addresses the problem of noisy text produced by Optical Character Recognition (OCR) on certificates, which hinders effective classification in Recognition of Prior Learning (RPL) contexts. To mitigate this, the researchers propose prompt-based denoising with a Large Language Model (LLM), specifically the Gemini model, to refine the extracted text before classification. The methodology combines OCR via PyTesseract, LLM-driven denoising using structured prompts (CSIR, CLEAR, and CO-STAR), and a BERT-base-uncased model for classification. Synonym replacement is also applied for data augmentation. Performance is evaluated using accuracy, validation accuracy, a confusion matrix, and classification reports. The results show a substantial improvement in classification performance: the baseline scenario achieved an accuracy of 82.14%, whereas the best-performing prompt structure, CO-STAR, reached 98.81%, an increase of 16.67 percentage points. Similar trends were observed across all evaluation metrics, with CO-STAR delivering the highest precision, recall, and F1-score values. In conclusion, LLM-driven denoising with effective prompt strategies improves the quality of OCR-extracted text and significantly boosts classification outcomes in certificate-based applications.
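
The abstract does not reproduce the prompt wording. As a rough illustration only, assuming the commonly cited CO-STAR layout (Context, Objective, Style, Tone, Audience, Response), a denoising prompt might be assembled along the lines below. The function name `build_costar_prompt` and all section texts are hypothetical and are not taken from the paper; sending the prompt to the LLM is omitted.

```python
def build_costar_prompt(noisy_text: str) -> str:
    """Assemble a CO-STAR style prompt asking an LLM to clean OCR output.

    The section wording is an assumption made for illustration; the paper's
    actual prompts are not given in the abstract.
    """
    sections = {
        "CONTEXT": (
            "The text below was extracted from a scanned certificate by OCR "
            "and may contain misread characters, broken words, and stray symbols."
        ),
        "OBJECTIVE": (
            "Correct the OCR errors without adding, removing, or inventing "
            "information."
        ),
        "STYLE": "Plain text that preserves the original wording and line order.",
        "TONE": "Neutral.",
        "AUDIENCE": "A downstream text classifier.",
        "RESPONSE": "Return only the corrected text.",
    }
    # Dicts preserve insertion order, so the sections appear in CO-STAR order.
    header = "\n\n".join(f"# {name} #\n{body}" for name, body in sections.items())
    return f"{header}\n\n# OCR TEXT #\n{noisy_text.strip()}"


if __name__ == "__main__":
    print(build_costar_prompt("CERT1FICATE 0F C0MPLETI0N"))
```

In a pipeline like the one described, the returned string would be passed to the LLM and the model's reply, rather than the raw OCR text, would be fed to the classifier.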

Journal Info

International Journal Of Computer, Network Security and Information System (IJCONSIST)

Abbrev

ijconsist

Publisher

Universitas Pembangunan Nasional Veteran Jawa Timur

Subject

Computer Science & IT

Description

Focus and Scope: The journal covers the whole spectrum of intelligent informatics, which includes, but is not limited to:
• Artificial Immune Systems, Ant Colonies, and Swarm Intelligence
• Autonomous Agents and Multi-Agent Systems
• Bayesian Networks and Probabilistic Reasoning
• ...

Comparing Structured Prompts for Denoising Noisy Certificate Text