Aprilia, Nadia Eka
Unknown Affiliation


Comparison of BioBERT and DistilBERT for Named Entity Recognition on Indonesian Radiology Clinical Data
Aprilia, Nadia Eka; Utomo, Danang Wahyu
Jurnal Informatika dan Rekayasa Perangkat Lunak Vol. 8 No. 1 (2026): Maret
Publisher : Universitas Wahid Hasyim


Abstract

Named Entity Recognition (NER) on Indonesian-language radiology reports is hampered by the limited availability of labeled training data, which is a major obstacle to building an accurate medical information extraction system. Pseudo-labeling is a potential solution: it uses abundant unlabeled data to expand the training set without time-consuming manual annotation. This study compares two transformer models, BioBERT and DistilBERT, fine-tuned on pseudo-labeled data to extract medical entities from Indonesian radiology reports. The methodology has three main stages: text preprocessing and normalization, text alignment using regular expressions with BIO labeling, and model fine-tuning with a pseudo-labeling strategy. Model performance was evaluated with Precision, Recall, and F1-score on an adapted radiology dataset. The results indicate that pseudo-labeling improved the performance of both models. DistilBERT achieved the higher accuracy at 96.4%, while BioBERT reached 92.78%. DistilBERT also demonstrated superior computational efficiency, with faster training time. The study offers guidance for selecting a model architecture for NER on Indonesian medical text when accuracy must be balanced against computational cost.
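
The abstract does not give the regular expressions or the label set used in the alignment stage, so the following is only a minimal sketch of how regex matches can be turned into BIO labels. The entity types (ANATOMY, FINDING), the Indonesian term lists, and the function name bio_tag are all hypothetical and chosen purely for illustration; they are not taken from the study.

```python
import re

# Hypothetical patterns: the study's actual lexicon and entity types are not
# stated in the abstract, so these terms are illustrative assumptions.
PATTERNS = {
    "ANATOMY": re.compile(r"\b(paru kanan|paru kiri|jantung)\b", re.IGNORECASE),
    "FINDING": re.compile(r"\b(infiltrat|efusi pleura|kardiomegali)\b", re.IGNORECASE),
}

# Split into word tokens and single punctuation marks, keeping character offsets.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def bio_tag(text):
    """Return (token, label) pairs, labelling regex matches with the BIO scheme."""
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    labels = ["O"] * len(tokens)
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            inside = False  # becomes True once the first token of the span is tagged
            for i, (_, start, end) in enumerate(tokens):
                # A token belongs to the entity if it lies fully inside the match
                # and has not already been claimed by another pattern.
                if start >= match.start() and end <= match.end() and labels[i] == "O":
                    labels[i] = ("I-" if inside else "B-") + entity_type
                    inside = True
    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]


if __name__ == "__main__":
    for token, label in bio_tag("Tampak infiltrat di paru kanan."):
        print(token, label)
```

Under these assumptions, a model trained on a small set of such labels could then predict labels for unlabeled reports, with confident predictions added back to the training set as pseudo-labels; that step is not shown here.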