Annotation Error Detection and Correction for Indonesian POS Tagging Corpus
Muhammad Alfian; Umi Laili Yuhana; Daniel Siahaan; Harum Munazharoh
Lontar Komputer: Jurnal Ilmiah Teknologi Informasi, Vol. 16, No. 01 (April 2025)
Publisher : Institute for Research and Community Services, Udayana University

DOI: 10.24843/LKJITI.2025.v16.i01.p04

Abstract

A linguistic corpus is the primary material for training and evaluating machine learning models, especially for POS tagging. However, human-annotated corpora are not free from annotation errors, and such errors degrade model performance. We therefore propose a procedure for annotation error detection and correction. We detect annotation errors in the Indonesian POS tagging corpus using the n-gram variation method, then correct the corpus using an expert-voting approach. Detection yielded 6,536 annotation error candidates, each of which is either (i) an ambiguous word or (ii) an incorrect annotation. In the correction step, a group of experts validated and corrected the candidates by majority vote, identifying and correcting 503 words from 1,918 sentences. Finally, we compared the performance of the POS tagging model using the corpus before and after correction. The corrected corpus gave a significant improvement in F1-score (+9.69%) over the uncorrected corpus.
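
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the n-gram length, the tagset, or the voting rules, so the window size `n=3`, the UD-style tags, the toy sentences, and the function names `variation_ngrams` and `majority_vote` are all assumptions made for illustration. The idea shown is that a word appearing in the same surrounding context with different tags is flagged as a candidate, and a candidate is resolved only when one tag wins a strict majority of expert votes.

```python
from collections import Counter, defaultdict


def variation_ngrams(corpus, n=3):
    """Flag word contexts that occur with more than one tag.

    corpus: list of sentences, each a list of (word, tag) pairs.
    n: odd window size; the focus word sits in the middle (assumed value).
    Returns {context_tuple: {tag: count}} for contexts with 2+ distinct tags.
    """
    pad = n // 2
    seen = defaultdict(Counter)
    for sent in corpus:
        # Pad so that words at sentence edges still get a full window.
        words = ["<s>"] * pad + [w.lower() for w, _ in sent] + ["</s>"] * pad
        for i, (_, tag) in enumerate(sent):
            context = tuple(words[i:i + n])  # window centred on word i
            seen[context][tag] += 1
    return {ctx: dict(tags) for ctx, tags in seen.items() if len(tags) > 1}


def majority_vote(votes):
    """Return the tag with a strict plurality of votes, or None on a tie."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the candidate unresolved
    return ranked[0][0]


# Toy data (hypothetical): "bisa" is tagged inconsistently in the same context.
toy_corpus = [
    [("dia", "PRON"), ("bisa", "AUX"), ("makan", "VERB")],
    [("dia", "PRON"), ("bisa", "NOUN"), ("makan", "VERB")],
    [("bisa", "NOUN"), ("ular", "NOUN")],
]

candidates = variation_ngrams(toy_corpus)
for context, tags in candidates.items():
    print(context, tags)

# Three hypothetical experts vote on the flagged occurrence.
resolved = majority_vote(["AUX", "AUX", "NOUN"])
print(resolved)
```

A flagged context is only a candidate. As the abstract notes, it may reflect a truly ambiguous word rather than an error, which is why a human voting step follows detection in this sketch.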