Health-related news articles play an increasingly important role in public health monitoring. However, their unstructured linguistic style complicates the automatic extraction of biomedical information. Indonesian health news shows high lexical variation, combining medical terms, colloquial expressions, borrowed English words, and culturally specific symptom descriptions. This variation creates challenges for Named Entity Recognition (NER). To address the limited availability of domain-specific resources, this study compares four Transformer-based models, namely BERT, IndoBERT, RoBERTa, and BioBERT, for biomedical NER in Indonesian health news. A new BIO-annotated dataset consisting of 272 manually labeled articles was constructed and validated, achieving strong inter-annotator agreement (Cohen’s Kappa = 0.88). To reduce data limitations, an additional 103 articles were automatically annotated using the best-performing model, RoBERTa, through a semi-supervised approach. All models were fine-tuned under identical settings and evaluated at both the BIO and entity levels. The results show that RoBERTa achieves the highest weighted F1-score (0.9543). However, its macro F1-score (0.3873) indicates uneven performance across entity classes because of severe label imbalance, with non-entity tokens dominating the dataset. This finding highlights the importance of emphasizing macro-level evaluation to better reflect entity recognition performance. RoBERTa consistently outperforms the other models, which may be explained by its robust architecture and adaptability to diverse linguistic patterns. In contrast, BioBERT underperforms because of cross-lingual and domain mismatch, as it is pretrained on English biomedical corpora and optimized for scientific text rather than journalistic language. The error analysis further identifies boundary inconsistencies and under-detection of low-frequency entities, especially in the drug and symptom categories.
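The gap between the weighted and macro F1-scores reported above can be illustrated with a minimal sketch. The toy token sequence, the tag names (`B-DRUG`, `B-SYMPTOM`), and the helper functions below are illustrative assumptions, not the study's data or evaluation code. The sketch shows only how a dominant non-entity class can keep the weighted score high while the macro score drops.

```python
from collections import Counter


def f1_per_label(y_true, y_pred):
    """Return the per-label F1 score for two equal-length tag sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1: every class counts equally."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-label F1 weighted by each label's frequency in y_true."""
    scores = f1_per_label(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * n for label, n in support.items()) / len(y_true)


# Hypothetical toy data: 96 non-entity tokens and only 4 entity tokens.
y_true = ["O"] * 96 + ["B-DRUG"] * 2 + ["B-SYMPTOM"] * 2
# A model that finds one drug mention and misses every other entity.
y_pred = ["O"] * 96 + ["B-DRUG", "O"] + ["O", "O"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {weighted:.4f}, macro F1 = {macro:.4f}")
```

In this toy case three of the four entity tokens are missed, yet the weighted score stays high because the `O` class supplies almost all of the weight. The macro score gives the rare entity classes the same influence as `O`, so it reflects the misses.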
Copyrights © 2026