This author has published in the following journals
Media Penelitian Pendidikan : Jurnal Penelitian dalam Bidang Pendidikan dan Pengajaran
Sistemasi: Jurnal Sistem Informasi
Informatika Mulawarman: Jurnal Ilmiah Ilmu Komputer
Jurnal CoreIT
Jurnal Pengabdian Pada Masyarakat
IT JOURNAL RESEARCH AND DEVELOPMENT
Dinamisia: Jurnal Pengabdian Kepada Masyarakat
Jurnal Pertanian Agros
MATRIK : Jurnal Manajemen, Teknik Informatika, dan Rekayasa Komputer
International Journal of Community Service Learning
Digital Zone: Jurnal Teknologi Informasi dan Komunikasi
Ensiklopedia of Journal
JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika)
Al-Khidmat : Jurnal Ilmiah Pengabdian Kepada Masyarakat
Jurnal Agrium
Jurnal Agroplasma
Zonasi: Jurnal Sistem Informasi
Best Journal (Biology Education, Sains and Technology)
Jurnal Teknik Informatika (JUTIF)
Community Empowerment
Jurnal Cahaya Mandalika
JUSTIN (Jurnal Sistem dan Teknologi Informasi)
Margin Eco : Jurnal Ekonomi dan Perkembangan Bisnis
Jurnal Pengabdian Masyarakat : Pemberdayaan, Inovasi dan Perubahan
J-COSCIS : Journal of Computer Science Community Service
Jurnal Ilmiah Fokus Ekonomi, Manajemen, Bisnis & Akuntansi (EMBA)
CONSEN: Indonesian Journal of Community Services and Engagement
International Journal of Educational Research Excellence (IJERE)
Jurnal Pemberdayaan Sosial dan Teknologi Masyarakat
JURNAL KARYA ILMIAH MULTIDISIPLIN
Jurnal Agro Fabrica
Blantika : Multidisciplinary Journal
JIPITI: Jurnal Pengabdian kepada Masyarakat
INOVTEK Polbeng - Seri Informatika
Jurnal Komtika (Komputasi dan Informatika)
Articles

Complex Word Identification in Indonesian Children’s Texts: An IndoBERT Baseline and Error Analysis
Lisnawita, Lisnawita; Bakar, Juhaida Abu; Rasli, Ruziana Mohamad; Costaner, Loneli; Guntoro, Guntoro
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 6 (2025): JUTIF Volume 6, Number 6, Desember 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.6.5501

Abstract

Complex Word Identification (CWI) is a crucial step in building text simplification systems, especially for Indonesian children’s reading materials, where unfamiliar vocabulary can hinder comprehension. This study formulates token-level CWI for Indonesian children’s texts and establishes two baselines: an interpretable rule-based model using linguistic features (e.g., length, syllable heuristics, and affix patterns) and an IndoBERT model fine-tuned for token classification. The study constructs and annotates a children’s text corpus and evaluates both approaches using standard classification metrics. On the test set (22,584 tokens), IndoBERT achieves an F1-score of 0.9972 for the CWI class, substantially outperforming the rule-based baseline (F1 = 0.8607). The IndoBERT system makes only 39 errors (23 false positives and 16 false negatives), indicating near-perfect performance under the evaluated setting. Furthermore, the study provides an error analysis that highlights the remaining failure patterns and borderline cases that are difficult even for contextual models. The resulting benchmark and findings contribute to Informatics/Computer Science by providing a strong baseline and analysis for educational NLP in a low-resource language setting, supporting the development of Indonesian child-oriented NLP resources and downstream text simplification tools.
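
The abstract names the rule-based baseline's feature families (length, syllable heuristics, affix patterns) but not its actual rules. The following is a minimal sketch of how such a token-level baseline might look, not the authors' implementation: the affix lists, the thresholds, the majority-vote rule, and the function names `count_syllables`, `affix_count`, `is_complex`, and `label_tokens` are all assumptions made for illustration.

```python
import re

# Hypothetical affix lists and thresholds, chosen for illustration only;
# the paper's actual rules are not given in the abstract.
PREFIXES = ("meng", "meny", "mem", "men", "me", "ber", "ter", "per", "pe", "di", "ke", "se")
SUFFIXES = ("kan", "nya", "an", "i")


def count_syllables(word: str) -> int:
    """Rough syllable heuristic: count vowel letters (ignores diphthongs)."""
    return len(re.findall(r"[aeiou]", word.lower()))


def affix_count(word: str) -> int:
    """Count at most one prefix and one suffix, keeping a stem of 3+ letters."""
    w = word.lower()
    n = 0
    for p in PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= 3:
            n += 1
            w = w[len(p):]  # strip the prefix before checking suffixes
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= 3:
            n += 1
            break
    return n


def is_complex(word: str, min_len: int = 9, min_syll: int = 4, min_affixes: int = 2) -> bool:
    """Flag a token as complex when at least two of the three features fire."""
    votes = (
        (len(word) >= min_len)
        + (count_syllables(word) >= min_syll)
        + (affix_count(word) >= min_affixes)
    )
    return votes >= 2


def label_tokens(tokens: list[str]) -> list[int]:
    """Token-level labels: 1 = complex, 0 = not complex."""
    return [1 if is_complex(t) else 0 for t in tokens]


if __name__ == "__main__":
    print(label_tokens(["ibu", "mempertanggungjawabkan", "buku"]))
```

A rule set of this kind is easy to inspect, since every decision traces back to a few thresholds, but it ignores context, which is where a fine-tuned token classifier such as IndoBERT has the advantage.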
Co-Authors Abdullah Abdullah Ahmad Zamsuri, Ahmad Alfarasy, Febrizal Ali, Helmiyati Abdullah Antonius Fernando Aulia, Mhd. Iqbal AYU RAHMAWATI Bakar, Juhaida Abu Bayu Febriadi, Bayu Bimby, Novia Putri Budia Misri Budianto Hamuddin Budiastuti, Susanti Costaner, Loneli Costaner David Setiawan David Setiawan, David Djunaedi Djunaedi Elfrida Ratnawati Fadhillah, Resty Fenty Widya Fitria, Poppy Hamzah Eteruddin Hamzah Hamzah Handoko, Habib Hari Gunawan Herni Utami Rahmawati Hutabarat, Charles Parmonangan idel waldelmi, idel Ikhwan, Ferdy Ingrid Ovie Yosephine Iqbal Bukhori Istiatin, Istiatin Jeni Wardi Johar, Olivia Anggie Lasri Nijal Latifa Siswati Lisnawita Lisnawita Loneli Costaner Lubis, Abdul Rahman Lubis, Ahmad Fahmi Alhafiz Maisarah Maisarah Maisarah, Maisarah Makhrani Sari Ginting Mariza Devega Marzuti Isra Maulina, Viny Meilano, Dimas Mhd. Arief Hasan, Mhd. Arief Monika, Winda Monika Muhamad Sadar, Muhamad Muhammad Fikri Muhammad Iqbal Muhammad Yusuf Dibisono Mulyara, Budi Musfawati Nurhamin Nurhamin Nurholidan Siregar Nurholidan Siregar Nurul Hasanah Ovie, Ingrid Pandu Pratama Putra, Pandu Pratama Rahmad Dian Rahmad Syah Putra Rasli, Ruziana Mohamad Ratu Mutiara Siregar Rina Maharany Ririn Sari Wati Rizky Octa Putri Charin Roosmawati, Febriana SANTOSO SANTOSO Sapiri, Muhtar Saputra, Septian Tri Sari, Makhrani Sasi Utami Simorangkir, Jansihar Sinaga, Anisyah Sri Utaminingsih Sudarwati Sudarwati Suhardi Suhardi Sunaryanto, Hadi Sutejo Sutejo Syafitri (Scopus ID: 57200085316), Wenni Taufik, Kemal Tohir, Kurnainy Wagino Wenni Syafitri, Wenni Wenny Syafitri Wibisono, Moh Arief Aryo Yuhelmi Yuhelmi Yusuf Dibisono, Mhd zamzami Zamzami, Zamzami Zulham Effendi