JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 6 (2025): December 2025

Medical Named Entity Recognition from Indonesian Health-News using BiLSTM-CRF with Static and Contextual Embeddings

Ignasius, Darnell (Unknown)
Novita Dewi , Ika (Unknown)
Bernadette Chayeenee Norman , Maria (Unknown)
Rakhmat Sani, Ramadhan (Unknown)



Article Info

Publish Date
05 Dec 2025

Abstract

Named Entity Recognition (NER) is vital for structuring medical texts by identifying entities such as diseases, symptoms, and drugs. However, research on Indonesian medical NER remain limited due to the lack of annotated corpora and linguistic resources. This scarcity often leads to difficulties in learning meaningful word representations, which are crucial for accurate entity identification. This research aims to compare the effectiveness of static and contextual embeddings in enhancing entity recognition on Indonesian biomedical text. The experimental setup involved utilizing both static (Word2Vec) and contextual (IndoBERT) embeddings in conjunction with neural architectures (BiLSTM) along with Conditional Random Fields (CRF). The BiLSTM architecture was selected for its ability to capture bidirectional dependencies in language sequences. Specifically, four models: Word2Vec-BiLSTM, Word2Vec-BiLSTM-CRF, IndoBERT-BiLSTM, and IndoBERT-BiLSTM-CRF were evaluated to assess the impact of contextual representations and structured decoding. The models were trained on a manually annotated DetikHealth corpus, where specific medical entities such as diseases, symptoms, and drugs were labeled with the BIO-tagging scheme. Performance was subsequently evaluated based on standard metrics: precision, recall, and F1-score. Results indicate that IndoBERT’s contextual embeddings significantly outperform static Word2Vec features. The IndoBERT-BiLSTM-CRF model achieved the highest performance micro-F1 0.4330, macro-F1 0.3297, with the Disease entity reaching an F1-score of 0.5882. Combining contextual embeddings with CRF-based decoding enhances semantic understanding and boundary consistency, demonstrating superior performance for Indonesian biomedical NER. Future work should explore domain-adaptive pretraining and larger biomedical corpora to further improve contextual accuracy.

Copyrights © 2025






Journal Info

Abbrev

JAIC

Publisher

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Nomor 1, Juli 2018. Berisi tulisan yang diangkat dari hasil penelitian di bidang Teknologi Informatika dan Komputer Terapan dengan e-ISSN: 2548-9828. Terdapat 3 artikel yang telah ditelaah secara substansial oleh tim editorial dan ...