Prihantoro Prihantoro
English Department, Faculty Of Humanities, Diponegoro University, Indonesia

Published : 31 Documents
Articles

Journal : Jurnal Arbitrer

DICO-JALF v.1.0: Diponegoro Corpus of Japanese Learners as a Foreign Language in Indonesia with AI Error Annotation and Human Supervision
Authors : Prihantoro, Prihantoro; Ishikawa, Shin'Ichiro; Liu, Tanjun; Fadli, Zaki Ainul; Rini, Elizabeth Ika Hesti Aprilia Nindia; Kepirianto, Catur
Jurnal Arbitrer Vol. 12 No. 3 (2025)
Publisher : Masyarakat Linguistik Indonesia Universitas Andalas

DOI: 10.25077/ar.12.3.274-288.2025

Abstract

A growing body of research examines the use of AI for corrective feedback in foreign language teaching. However, few studies have specifically addressed the accuracy of AI analysis in learner corpus research. This study creates an AI-annotated corpus, with human supervision, from data obtained from learners of Japanese as a Foreign Language (JFL) in Indonesia; the corpus is named DICO-JALF v.1.0. The aim is to measure the extent to which ChatGPT annotates errors accurately. A task was first administered to collect the corpus data and metadata. The corpus was then error-annotated using ChatGPT 4.0, and human annotators manually checked the accuracy of the AI-generated annotations. Regarding the errors committed by learners, incorrect lexical choices and forms are the dominant causes, while underuse and overuse are minimal. Regarding error rate, verbs are the category in which errors are most frequent, which may be driven by conjugation, a feature absent in Indonesian, the L1 of the students. This suggests that Indonesian learners' acquisition of Japanese verbs needs greater emphasis. ChatGPT identified errors correctly with an average accuracy of 70%. Compared with similar studies, this accuracy is relatively low. However, it can be argued that one factor determining the accuracy of annotations produced by ChatGPT, or by any other LLM-based tool, is the complexity of the annotation scheme it must follow. The corpus has been made available for download. The annotations are intended to be readable by any corpus query system that reads XML tags. This corpus serves as a foundational resource for future research on AI-assisted error analysis in JFL learning contexts in Indonesia.
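
The abstract describes XML-tagged error annotations whose accuracy was checked by human annotators. As a rough illustration only, the Python sketch below shows how such an accuracy figure could be computed from XML of that general kind. The element name `err`, the attributes `cat` and `verdict`, and the placeholder tokens are all assumptions made for this example; the abstract does not describe the actual DICO-JALF tag set, and the sample data is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data. The <err> element and its "cat" and "verdict"
# attributes are hypothetical, not the real DICO-JALF annotation scheme.
SAMPLE_XML = """
<corpus>
  <text learner="L001">
    <s>w1 <err cat="verb" verdict="correct">w2</err> w3</s>
    <s><err cat="verb" verdict="incorrect">w4</err> w5</s>
    <s>w6 <err cat="particle" verdict="correct">w7</err></s>
    <s>w8 <err cat="noun" verdict="correct">w9</err></s>
  </text>
</corpus>
"""


def annotation_accuracy(xml_text):
    """Return (overall, per_category) shares of AI annotations confirmed by a human."""
    root = ET.fromstring(xml_text.strip())
    total = Counter()      # AI-generated annotations per category
    confirmed = Counter()  # annotations a human marked as correct
    for err in root.iter("err"):
        cat = err.get("cat", "unknown")
        total[cat] += 1
        if err.get("verdict") == "correct":
            confirmed[cat] += 1
    per_category = {cat: confirmed[cat] / total[cat] for cat in total}
    n = sum(total.values())
    overall = sum(confirmed.values()) / n if n else 0.0
    return overall, per_category


overall, per_category = annotation_accuracy(SAMPLE_XML)
print(f"overall: {overall:.2f}")
for cat, acc in sorted(per_category.items()):
    print(f"{cat}: {acc:.2f}")
```

Note that this measures only the share of AI-flagged errors that a human confirmed. It does not capture errors the AI failed to flag, and it may differ from the accuracy measure used in the study.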