Jurnal Arbitrer
Vol. 12 No. 3 (2025)

DICO-JALF v.1.0: Diponegoro Corpus of Japanese Learners as a Foreign Language in Indonesia with AI Error Annotation and Human Supervision

Prihantoro, Prihantoro
Ishikawa, Shin'Ichiro
Liu, Tanjun
Fadli, Zaki Ainul
Rini, Elizabeth Ika Hesti Aprilia Nindia
Kepirianto, Catur



Article Info

Publish Date
30 Sep 2025

Abstract

A growing body of research examines the use of AI for corrective feedback in foreign language teaching. However, few studies have specifically addressed the accuracy of AI analysis in learner corpus research. This study creates an AI-annotated corpus, built under human supervision, from data obtained from learners of Japanese as a Foreign Language (JFL) in Indonesia; the corpus is named DICO-JALF v.1.0. The aim is to measure the extent to which ChatGPT annotates errors accurately. A task was first administered to collect the corpus data and metadata. The corpus was then error-annotated using ChatGPT 4.0, and human annotators manually checked the accuracy of the AI-generated annotations. Regarding the errors committed by learners, incorrect lexical choices and incorrect forms are the dominant causes, while underuse and overuse are minimal. Regarding error rate, verbs are the category in which errors are most frequent, which may be driven by conjugation, a feature absent in Indonesian, the learners' L1. This suggests that Indonesian learners' acquisition of Japanese verbs needs greater emphasis. Regarding annotation accuracy, ChatGPT achieved an average accuracy of 70% in identifying errors. Compared with similar studies, this accuracy is relatively low. However, it can be argued that one factor determining the accuracy of annotations produced by ChatGPT, or by any other LLM-based tool, is the complexity of the annotation scheme it must follow. The corpus has been made available for download. The annotations are intended to be readable by any corpus query system that reads XML tags. This corpus serves as a foundational resource for future research on AI-assisted error analysis in JFL learning contexts in Indonesia.
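As a rough illustration of how XML-tagged error annotations with a human verdict might be read and summarized, the following minimal Python sketch may help. It is not the authors' pipeline. The element name `err`, the attributes `type`, `pos`, and `human`, and the placeholder tokens `w1` to `w9` are all hypothetical and are not taken from the DICO-JALF annotation scheme; the resulting figures are illustrative only and are not results from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample. The <err> element and its attributes are assumptions
# made for this sketch, not the actual DICO-JALF tag set.
# human="accept" means a human annotator confirmed the AI-proposed error tag;
# human="reject" means the human annotator judged the tag to be wrong.
SAMPLE = """<text id="demo">
  <s>w1 <err type="lexical_choice" pos="verb" human="accept">w2</err> w3</s>
  <s><err type="form" pos="verb" human="accept">w4</err> w5</s>
  <s>w6 <err type="form" pos="noun" human="reject">w7</err></s>
  <s>w8 <err type="overuse" pos="particle" human="accept">w9</err></s>
</text>"""


def summarize(xml_text):
    """Return (share of AI tags accepted by humans, accepted errors per POS)."""
    root = ET.fromstring(xml_text)
    errors = list(root.iter("err"))
    accepted = [e for e in errors if e.get("human") == "accept"]
    # Accuracy here is simply accepted AI tags divided by all AI tags.
    accuracy = len(accepted) / len(errors) if errors else 0.0
    # Count only human-confirmed errors when ranking part-of-speech categories.
    by_pos = Counter(e.get("pos") for e in accepted)
    return accuracy, by_pos


accuracy, by_pos = summarize(SAMPLE)
print(f"AI annotation accuracy: {accuracy:.0%}")
print(by_pos.most_common())
```

In this sketch, accuracy is computed only over the tags the AI proposed, so it does not capture errors the AI missed entirely; measuring those would require a separate human-annotated reference.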

Copyrights © 2025






Journal Info

Abbrev

ARBITRER

Publisher

Subject

Language, Linguistics, Communication & Media

Description

ARBITRER, Jurnal Masyarakat Linguistik Indonesia, is a scholarly journal that publishes original articles on current knowledge and information from research, applied research, and development in the field of language science (linguistics). The journal serves as a venue for publication and a forum for sharing ...