Garuda - Garba Rujukan Digital

Lingua : Journal of Linguistics and Language

Vol. 3 No. 3 (2025): September 2025

Aribowo, Eric Kunto (Unknown)
Prima, Anggra (Unknown)

Publish Date
30 Sep 2025

This study introduces a genre-annotated academic corpus for Indonesian and evaluates IndoSciBERT, a domain-specific NLP model trained on this resource. To address the scarcity of rhetorical datasets in low-resource languages, we compiled a 52,300-document corpus from DOAJ- and SINTA-indexed journals (2015–2025) and annotated 5,200 paragraphs using the CARS and Argumentative Zoning frameworks. We used GROBID for PDF-to-TEI conversion, TEITOK for annotation, and SIPEBI/KBBI for spelling normalization. IndoSciBERT was then fine-tuned and benchmarked against IndoBERT on rhetorical classification tasks. It achieved an F1 score of 0.82 and an accuracy of 84.2%, outperforming the baseline model and showing strong reliability in distinguishing rhetorical moves. These results affirm the value of domain-specific modeling for educational applications. The annotated corpus supports genre analysis, pedagogy, and automated writing feedback, and establishes a foundation for inclusive NLP. In particular, this work offers a sustainable path to enhancing academic literacy in Bahasa Indonesia through intelligent, genre-aware tools.
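As a reading aid for the reported metrics, the sketch below shows how accuracy and an F1 score can be computed for a paragraph-level rhetorical classifier. It is a minimal illustration under stated assumptions: the abstract does not say whether the reported F1 is macro- or micro-averaged, so macro-averaging is assumed here; the label names `M1`, `M2`, `M3` are hypothetical stand-ins for CARS moves; and the toy `gold` and `pred` lists are invented for the example and are not data from the study.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of paragraphs whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (macro-averaging is an assumption)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, invented labels (hypothetical CARS moves), not results from the paper.
gold = ["M1", "M1", "M2", "M2", "M3", "M3"]
pred = ["M1", "M2", "M2", "M2", "M3", "M1"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Accuracy and F1 can differ noticeably when rhetorical moves are unevenly distributed, which is one reason evaluations of this kind commonly report both.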

Journal Info

Lingua : Journal of Linguistics and Language

Abbrev

lingua

Publisher

PT Penerbit Ilmiah Indonesia

Subject

Language, Linguistic, Communication & Media

Description

Lingua : Journal of Linguistics and Language (ISSN 3032-3304, online), published by Indonesian Scientific Publication, is a leading scholarly journal that has undergone a rigorous peer-review process and is committed to open access publication. Established to advance the field of ...

Genre Aware Language Modeling for Indonesian Academic Writing: Building and Evaluating IndoSciBERT
