Garuda - Garba Rujukan Digital

Jurnal ELTIKOM : Jurnal Teknik Elektro, Teknologi Informasi dan Komputer

Vol. 10 No. 1 (2026)

Norman, Maria Bernadette Chayeenee
Dewi, Ika Novita
Ignasius, Darnell

Publish Date
21 May 2026

Health-related news articles play an increasingly important role in public health monitoring. However, their unstructured linguistic style complicates the automatic extraction of biomedical information. Indonesian health news shows high lexical variation by combining medical terms, colloquial expressions, borrowed English words, and culturally specific symptom descriptions. This condition creates challenges for Named Entity Recognition (NER). To address the limited availability of domain-specific resources, this study compares four Transformer-based models, namely BERT, IndoBERT, RoBERTa, and BioBERT, for biomedical NER in Indonesian health news. A new BIO-annotated dataset consisting of 272 manually labeled articles was constructed and validated, achieving strong inter-annotator agreement (Cohen’s Kappa = 0.88). To reduce data limitations, an additional 103 articles were automatically annotated using the best-performing model, RoBERTa, through a semi-supervised approach. All models were fine-tuned under identical settings and evaluated at both BIO and entity levels. The results show that RoBERTa achieves the highest weighted F1-score (0.9543). However, its macro F1-score (0.3873) indicates uneven performance across entity classes because of severe label imbalance, with non-entity tokens dominating the dataset. This finding highlights the importance of emphasizing macro-level evaluation to better reflect entity recognition performance. RoBERTa consistently outperforms the other models, which may be explained by its robust architecture and adaptability to diverse linguistic patterns. In contrast, BioBERT underperforms because of cross-lingual and domain mismatch, as it is pretrained on English biomedical corpora and optimized for scientific text rather than journalistic language. The error analysis further identifies boundary inconsistencies and under-detection of low-frequency entities, especially in the drug and symptom categories.
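
The gap between the weighted and macro F1-scores reported in the abstract can be illustrated with a small sketch. The code below is not the authors' evaluation pipeline; the tag names (`B-DRUG`, `B-SYMPTOM`) and the toy token counts are assumptions chosen only to mimic a dataset dominated by non-entity `O` tokens. It computes per-class F1 and then averages it two ways: weighted by class support, and unweighted (macro).

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    support = Counter(y_true)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by the number of true tokens per class."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical toy data: 96 non-entity tokens and 4 entity tokens.
y_true = ["O"] * 96 + ["B-DRUG"] * 2 + ["B-SYMPTOM"] * 2
# The model gets every "O" right, finds one drug, and misses both symptoms.
y_pred = ["O"] * 96 + ["B-DRUG", "O"] + ["O", "O"]

scores = per_class_f1(y_true, y_pred)
print("weighted F1:", round(weighted_f1(scores), 4))
print("macro F1:   ", round(macro_f1(scores), 4))
```

In this toy setting the weighted score stays high because the `O` class supplies almost all of the weight, while the macro score drops sharply once a rare entity class is missed entirely. This is the same pattern the abstract describes for RoBERTa.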


Journal Info

Abbrev

eltikom

Publisher

Politeknik Negeri Banjarmasin

Subject

Aerospace Engineering; Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Engineering

Description

The editors of Jurnal ELTIKOM invite lecturers, researchers, and practitioners to publish papers on topics covering Electrical Engineering, Electronics Engineering, Telecommunications Engineering, Computer Engineering, Information ...

Domain Adaptation of BERT Models for Biomedical Entity Extraction from Indonesian Health News
