Jurnal Informatika dan Rekayasa Perangkat Lunak
Vol. 8 No. 1 (2026): Maret

Comparison of BioBERT and DistilBERT for Named Entity Recognition on Indonesian Radiology Clinical Data

Aprilia, Nadia Eka (Unknown)
Utomo, Danang Wahyu (Unknown)



Article Info

Publish Date
30 Mar 2026

Abstract

Named Entity Recognition (NER) in Indonesian-language radiology reports is hampered by the limited availability of labeled training data, which is a major obstacle to building accurate medical information extraction systems. Pseudo-labeling is a potential solution: it uses abundant unlabeled data to expand the training set without time-consuming manual annotation. This study compares the performance of two transformer models, BioBERT and DistilBERT, fine-tuned on pseudo-labeled data to extract medical entities from Indonesian radiology reports. The methodology comprises three main stages: text preprocessing and normalization, text alignment using regular expressions with BIO labeling, and model fine-tuning with a pseudo-labeling strategy. Model performance was evaluated using Precision, Recall, and F1-score on an adapted radiology dataset. The results indicate that pseudo-labeling improved the performance of both models. DistilBERT achieved the higher accuracy, 96.4%, while BioBERT reached 92.78%. DistilBERT also demonstrated superior computational efficiency, with a shorter training time. This study offers guidance for selecting a model architecture for NER on Indonesian medical text, weighing accuracy against computational efficiency.
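The second stage named in the abstract, regular-expression alignment with BIO labeling, can be illustrated with a minimal sketch. The abstract does not give the study's entity types or expressions, so the label names (ANATOMY, FINDING), the patterns, the tokenizer, and the helper names tokenize and bio_label below are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical entity patterns: the study's real label set and expressions
# are not stated in the abstract. Longer alternatives come first so that
# "paru kanan" is matched before the shorter "paru".
PATTERNS = {
    "ANATOMY": re.compile(r"\b(paru kanan|paru kiri|paru|jantung)\b", re.IGNORECASE),
    "FINDING": re.compile(r"\b(efusi pleura|infiltrat|kardiomegali)\b", re.IGNORECASE),
}


def tokenize(text):
    """Split text into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_label(text):
    """Assign a BIO tag to each token from the regex matches (assumed scheme)."""
    tokens = tokenize(text)
    labels = ["O"] * len(tokens)
    for entity, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            inside = False
            for i, (_, start, end) in enumerate(tokens):
                # A token belongs to the entity if it lies fully inside the match
                # and has not already been labeled by an earlier pattern.
                if start >= match.start() and end <= match.end() and labels[i] == "O":
                    labels[i] = ("I-" if inside else "B-") + entity
                    inside = True
    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]


if __name__ == "__main__":
    # Example sentence: "Infiltrate visible in the right lung, heart normal."
    for token, label in bio_label("Tampak infiltrat di paru kanan, jantung normal."):
        print(f"{token}\t{label}")
```

In this sketch the first token of each match receives a B- tag and the remaining tokens an I- tag, so "paru kanan" becomes B-ANATOMY followed by I-ANATOMY. Token/label pairs of this kind are the usual input for fine-tuning a token-classification model, although the study's exact pipeline may differ.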

Copyrights © 2026






Journal Info

Abbrev

JINRPL

Publisher

Subject

Computer Science & IT

Description

Journal of Informatics and Software Engineering accepts scientific articles in the field of Informatics. Its scope includes Software Engineering, Information Systems, Artificial Intelligence, Computer Based Learning, Computer Networking and Data Communication, and ...