This research evaluates the performance of several models on Named Entity Recognition (NER) of medical entities, with a focus on imbalanced datasets. Six BioBERT model configurations were tested, incorporating optimization techniques such as class weighting, Conditional Random Fields (CRF), and hyperparameter tuning. Evaluation used Precision, Recall, and F1-Score, metrics that are especially informative for NER under class imbalance. The dataset used is BC5CDR, which annotates chemical and disease entities in unstructured medical texts from PubMed. The data was divided into three parts: a training set for model training, a validation set for model tuning, and a test set for performance evaluation; the split was balanced to ensure unbiased testing, yielding results that can serve as a reference for developing more efficient medical NER systems. The evaluation results indicate that BioBERT + CRF achieves an F1-Score reflecting the best balance between Precision (ranked 3rd: 0.6067 for B-Chemical, 0.5594 for B-Disease, 0.4600 for I-Disease, and 0.5083 for I-Chemical) and Recall (ranked 3rd: 0.5580 for B-Chemical, 0.4491 for B-Disease, 0.5718 for I-Disease, and 0.3840 for I-Chemical) among the models compared. This model detected medical entities more reliably without sacrificing prediction precision, and its smaller gap between Precision and Recall indicates greater stability, making it the strongest choice for NER on medical texts. The application of early stopping effectively prevented overfitting, allowing the model to learn optimally without losing generalization. With a better balance in recognizing medical entities from unstructured text, this model offers the most effective approach for NER systems in the medical domain.
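To make the reported metrics concrete, the following is a minimal sketch of how per-label Precision, Recall, and F1 can be computed for BIO tags such as B-Chemical and I-Disease. This is an illustrative token-level scorer written for this summary, not the authors' evaluation code; published NER results typically use an entity-level scorer, and the function name `per_label_prf` is our own. Scoring only the entity labels (excluding the dominant "O" tag) is what makes these metrics sensitive to class imbalance.

```python
from collections import Counter

def per_label_prf(gold, pred, labels):
    """Token-level Precision/Recall/F1 for each BIO entity label.

    gold, pred: flat, aligned lists of BIO tags (e.g. "B-Chemical", "O").
    labels: entity labels to score; the majority "O" tag is excluded,
    which is why these metrics expose performance on rare classes.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p == g and p in labels:
            tp[p] += 1          # correct entity-tag prediction
        else:
            if p in labels:
                fp[p] += 1      # predicted this label, gold disagrees
            if g in labels:
                fn[g] += 1      # missed a gold entity tag
    scores = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if (tp[lab] + fp[lab]) else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if (tp[lab] + fn[lab]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        scores[lab] = {"precision": prec, "recall": rec, "f1": f1}
    return scores
```

For example, a prediction that spuriously tags an extra B-Disease token lowers B-Disease Precision while leaving its Recall intact; the per-label breakdown above is what allows Precision/Recall gaps like those reported for BioBERT + CRF to be compared class by class.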