Domain Adaptation of BERT Models for Biomedical Entity Extraction from Indonesian Health News
Norman, Maria Bernadette Chayeenee; Dewi, Ika Novita; Ignasius, Darnell
Jurnal ELTIKOM : Jurnal Teknik Elektro, Teknologi Informasi dan Komputer Vol. 10 No. 1 (2026)
Publisher : P3M Politeknik Negeri Banjarmasin

DOI: 10.31961/eltikom.v10i1.2116

Abstract

Health-related news articles play an increasingly important role in public health monitoring. However, their unstructured linguistic style complicates the automatic extraction of biomedical information. Indonesian health news shows high lexical variation, mixing medical terminology, colloquial expressions, English loanwords, and culturally specific descriptions of symptoms, which makes Named Entity Recognition (NER) challenging. Given the limited availability of domain-specific resources, this study compares four Transformer-based models (BERT, IndoBERT, RoBERTa, and BioBERT) for biomedical NER in Indonesian health news. A new BIO-annotated dataset of 272 manually labeled articles was constructed and validated, with strong inter-annotator agreement (Cohen's Kappa = 0.88). To mitigate data scarcity, an additional 103 articles were automatically annotated with the best-performing model, RoBERTa, in a semi-supervised manner. All models were fine-tuned under identical settings and evaluated at both the BIO-tag level and the entity level. RoBERTa achieves the highest weighted F1-score (0.9543). However, its macro F1-score (0.3873) reveals uneven performance across entity classes caused by severe label imbalance, as non-entity (O) tokens dominate the dataset. This finding underscores the importance of macro-level evaluation for reflecting entity recognition performance more faithfully. RoBERTa consistently outperforms the other models, which may be explained by its robust architecture and its adaptability to diverse linguistic patterns. In contrast, BioBERT underperforms because of cross-lingual and domain mismatch: it is pretrained on English biomedical corpora and optimized for scientific text rather than journalistic language. Error analysis further reveals boundary inconsistencies and under-detection of low-frequency entities, especially in the drug and symptom categories.
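
Cohen's Kappa corrects the observed agreement p_o between two annotators for the agreement p_e expected by chance, as kappa = (p_o - p_e) / (1 - p_e). The minimal sketch below assumes agreement is computed token by token over BIO tags; the abstract does not state the unit of comparison, and the tag names (B-DRUG, I-DRUG, B-SYMPTOM) are hypothetical examples rather than the dataset's actual label set.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labelled the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token sequence")
    n = len(labels_a)
    # Observed agreement p_o: share of tokens given the same tag by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of each annotator's marginal tag frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[tag] * count_b[tag] for tag in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical tag everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tags for one six-token sentence; annotator B misses the I-DRUG token.
annotator_a = ["B-DRUG", "I-DRUG", "O", "O", "B-SYMPTOM", "O"]
annotator_b = ["B-DRUG", "O", "O", "O", "B-SYMPTOM", "O"]
print(f"{cohens_kappa(annotator_a, annotator_b):.3f}")  # 0.727
```

Here raw agreement is 5/6 (about 0.83), but because both annotators use O heavily, part of that agreement is expected by chance, and kappa is lower (8/11).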
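
The gap between a high weighted F1-score and a low macro F1-score follows from how the two averages are formed: weighted F1 scales each class's F1 by its number of gold tokens, so a dominant O class drives the result, while macro F1 gives every class equal weight. The minimal sketch below uses made-up labels and counts, not the study's data, to reproduce that pattern with a degenerate model that tags every token as O.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Token-level F1 for every label seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    # Macro: unweighted mean, so every entity class counts as much as O.
    macro = sum(scores.values()) / len(scores)
    # Weighted: each class's F1 scaled by its share of gold tokens.
    weighted = sum(scores[label] * support[label] for label in scores) / len(y_true)
    return macro, weighted


# Synthetic, heavily imbalanced example: 96 O tokens, 4 entity tokens,
# and a model that predicts O for everything.
y_true = ["O"] * 96 + ["B-DRUG"] * 2 + ["B-SYMPTOM"] * 2
y_pred = ["O"] * 100
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro={macro:.4f} weighted={weighted:.4f}")  # macro=0.3265 weighted=0.9404
```

Although this model finds no entities at all, its weighted F1 stays near 0.94, while macro F1 falls to about 0.33. In practice, scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` computes these same two averages.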
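
Entity-level evaluation, as opposed to per-tag evaluation, first groups BIO tags into spans and then counts a prediction as correct only when its type and both boundaries match a gold span exactly. The abstract does not describe the study's evaluation script. The sketch below is one assumed strict-matching scheme: micro-averaged, with a stray I- tag treated as the start of a new entity, similar to conlleval. The tag names are hypothetical.

```python
def bio_to_spans(tags):
    """Group a BIO tag sequence into (entity_type, start, end) spans; end is exclusive."""
    spans = set()
    start, ent_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if ent_type is not None and not continues:
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            # A stray I- tag (no matching open entity) starts a new entity, as in conlleval.
            start, ent_type = i, label
    return spans


def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1 where a prediction counts only if type and both boundaries match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical sentence: the model truncates a two-token drug name (a boundary error).
gold = ["B-DRUG", "I-DRUG", "O", "B-SYMPTOM", "O"]
pred = ["B-DRUG", "O", "O", "B-SYMPTOM", "O"]
print(sorted(bio_to_spans(gold)))  # [('DRUG', 0, 2), ('SYMPTOM', 3, 4)]
print(entity_f1([gold], [pred]))   # 0.5
```

Four of the five tags agree, yet the truncated drug span counts as both a false positive and a false negative, so entity-level F1 drops to 0.5. This shows why boundary inconsistencies cost more under entity-level scoring than under per-tag scoring.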