Lontar Komputer: Jurnal Ilmiah Teknologi Informasi
Vol. 16 No. 01 (2025): April 2025

Annotation Error Detection and Correction for Indonesian POS Tagging Corpus

Muhammad Alfian (Department of Informatics, Institut Teknologi Sepuluh Nopember)
Umi Laili Yuhana (Department of Informatics, Institut Teknologi Sepuluh Nopember)
Daniel Siahaan (Department of Informatics, Institut Teknologi Sepuluh Nopember)
Harum Munazharoh (Department of Indonesian Language and Literature, Universitas Airlangga)



Article Info

Publish Date
12 Oct 2025

Abstract

A linguistic corpus is the primary material for training and evaluating machine learning models, especially for POS tagging. However, human-annotated corpora are not free from annotation errors, and these errors degrade model performance. Therefore, we propose a method for annotation error detection and correction. We detect annotation errors in the Indonesian POS tagging corpus using the n-gram variation method and then correct the corpus using an expert-voting approach. Annotation error detection collected 6,536 annotation error candidates. Each candidate is either (i) an ambiguous word or (ii) an incorrect annotation. Annotation error correction validated and corrected the candidates by majority voting within a group of experts, identifying and correcting 503 words across 1,918 sentences. Finally, we compared the performance of the POS tagging model using the corpus before and after correction. The results showed a significant improvement in F1-score (+9.69%) over the uncorrected corpus.
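The two steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the corpus is a list of sentences made of (word, tag) pairs, uses hypothetical function names (`variation_ngrams`, `majority_vote`), and uses a toy tagset (PRP, VB, NN) that is not necessarily the one used in the Indonesian corpus. Detection flags every position inside a repeated word n-gram whose tag differs across occurrences; such a position may be a genuinely ambiguous word or an annotation error, which is why a separate expert validation step follows. The voting function, as an assumed interpretation of "majority voting", accepts a tag only when more than half of the experts choose it.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # assumed format: (word, tag) pairs
Candidates = Dict[Tuple[Tuple[str, ...], int], Dict[str, int]]


def variation_ngrams(corpus: Sequence[Sentence], n: int) -> Candidates:
    """Map (word n-gram, position) to its tag counts whenever the tag at that
    position varies across occurrences of the same word n-gram."""
    # gram -> position -> Counter of tags observed at that position
    tags_at = defaultdict(lambda: defaultdict(Counter))
    for sent in corpus:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for i in range(len(sent) - n + 1):
            gram = tuple(words[i:i + n])
            for j in range(n):
                tags_at[gram][j][tags[i + j]] += 1
    # Keep only positions annotated with more than one distinct tag.
    return {
        (gram, j): dict(counter)
        for gram, positions in tags_at.items()
        for j, counter in positions.items()
        if len(counter) > 1
    }


def majority_vote(votes: Sequence[str]) -> Optional[str]:
    """Return the tag chosen by a strict majority of experts, else None
    (hypothetical policy: ties or pluralities are left unresolved)."""
    if not votes:
        return None
    tag, count = Counter(votes).most_common(1)[0]
    return tag if count * 2 > len(votes) else None


if __name__ == "__main__":
    toy_corpus = [
        [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
        [("saya", "PRP"), ("makan", "NN"), ("nasi", "NN")],  # suspicious tag
        [("dia", "PRP"), ("makan", "VB"), ("roti", "NN")],
    ]
    print(variation_ngrams(toy_corpus, 3))
    print(majority_vote(["VB", "VB", "NN"]))
```

Longer n-grams give each flagged word more shared context, so a tag variation there is more likely to be an error than an ambiguity; with `n = 1` the sketch simply lists every word that ever received more than one tag.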

Copyright © 2025






Journal Info

Abbrev

lontar

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Lontar Komputer: Jurnal Ilmiah Teknologi Informasi focuses on the theory, practice, and methodology of all aspects of technology in the field of computer science and engineering. It provides an international publication platform to boost the scientific and academic publication of research in the ...