Bernadette Chayeenee Norman, Maria
Unknown Affiliation

Published: 1 document

Medical Named Entity Recognition from Indonesian Health-News using BiLSTM-CRF with Static and Contextual Embeddings
Ignasius, Darnell; Novita Dewi, Ika; Bernadette Chayeenee Norman, Maria; Rakhmat Sani, Ramadhan
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11574

Abstract

Named Entity Recognition (NER) is vital for structuring medical texts by identifying entities such as diseases, symptoms, and drugs. However, research on Indonesian medical NER remains limited due to the lack of annotated corpora and linguistic resources. This scarcity makes it difficult to learn meaningful word representations, which are crucial for accurate entity identification. This research compares the effectiveness of static and contextual embeddings for entity recognition in Indonesian biomedical text. The experiments combined either static (Word2Vec) or contextual (IndoBERT) embeddings with a Bidirectional Long Short-Term Memory (BiLSTM) network, with or without a Conditional Random Field (CRF) decoding layer. The BiLSTM architecture was selected for its ability to capture bidirectional dependencies in language sequences. Four models were evaluated to assess the impact of contextual representations and structured decoding: Word2Vec-BiLSTM, Word2Vec-BiLSTM-CRF, IndoBERT-BiLSTM, and IndoBERT-BiLSTM-CRF. The models were trained on a manually annotated DetikHealth corpus in which medical entities (diseases, symptoms, and drugs) were labeled with the BIO tagging scheme. Performance was evaluated using standard metrics: precision, recall, and F1-score. Results indicate that IndoBERT’s contextual embeddings significantly outperform static Word2Vec features. The IndoBERT-BiLSTM-CRF model achieved the highest performance, with a micro-F1 of 0.4330 and a macro-F1 of 0.3297, and the Disease entity reached an F1-score of 0.5882. Combining contextual embeddings with CRF-based decoding improves semantic understanding and boundary consistency, giving the best results among the evaluated models for Indonesian biomedical NER. Future work should explore domain-adaptive pretraining and larger biomedical corpora to further improve contextual accuracy.
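The abstract reports precision, recall, and micro- and macro-averaged F1 over BIO-tagged entities. The following is a minimal sketch of how such scores are commonly computed at the entity level (a prediction counts only if both span and type match exactly, in the style of seqeval). It is not the authors' evaluation script: the tag names `DISEASE`, `SYMPTOM`, and `DRUG`, the helper names, and the lenient handling of stray `I-` tags are all assumptions for illustration.

```python
from collections import defaultdict


def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (type, start, end) spans, end exclusive.

    Assumption: an I- tag that does not continue an entity of the same type
    starts a new entity (a common lenient convention).
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if etype is not None:
                spans.append((etype, start, i))  # close the currently open entity
            start, etype = (None, None) if tag == "O" else (i, tag[2:])
    return set(spans)


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def entity_scores(gold_seqs, pred_seqs):
    """Return per-type (P, R, F1), micro-F1, and macro-F1 over exact-match entities."""
    counts = defaultdict(lambda: [0, 0, 0])  # entity type -> [tp, fp, fn]
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        for etype, _, _ in p & g:
            counts[etype][0] += 1  # correct span and type
        for etype, _, _ in p - g:
            counts[etype][1] += 1  # predicted entity not in gold
        for etype, _, _ in g - p:
            counts[etype][2] += 1  # gold entity that was missed
    per_type = {t: prf(*c) for t, c in counts.items()}
    # Micro: pool counts across all types before computing F1.
    micro_f1 = prf(*(sum(c[k] for c in counts.values()) for k in range(3)))[2]
    # Macro: unweighted mean of per-type F1, so rare types weigh as much as frequent ones.
    macro_f1 = sum(f for _, _, f in per_type.values()) / len(per_type) if per_type else 0.0
    return per_type, micro_f1, macro_f1


if __name__ == "__main__":
    gold = [["B-DISEASE", "I-DISEASE", "O", "B-SYMPTOM", "O", "B-DRUG"]]
    pred = [["B-DISEASE", "I-DISEASE", "O", "B-SYMPTOM", "I-SYMPTOM", "O"]]
    per_type, micro, macro = entity_scores(gold, pred)
    print(per_type, round(micro, 4), round(macro, 4))
```

In this toy example the symptom span is predicted one token too long, so it counts as both a false positive and a false negative. This is the kind of boundary error that CRF decoding is meant to reduce. Because macro-F1 averages types equally, a type with few correct predictions (here `DRUG`) pulls macro-F1 below micro-F1, which is consistent with the gap between the two scores reported above.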