JURIKOM (Jurnal Riset Komputer)
Vol. 13 No. 1 (2026): February 2026

Optimizing IndoBERT for Indonesian Named Entity Recognition on Social Media Data with Optuna Hyperparameter Tuning

Siswanto, Bambang
M. Hanafi



Article Info

Publish Date
28 Feb 2026

Abstract

Named Entity Recognition (NER) is a fundamental task in natural language processing for extracting structured information from unstructured text. In Indonesian, particularly for informal and diverse social media text, the performance of NER models based on Bidirectional Encoder Representations from Transformers (BERT) is strongly influenced by the hyperparameter configuration used during fine-tuning. However, many studies still rely on default settings or limited adjustments, so the potential improvements in performance and training stability have not been fully exploited. This study evaluates the impact of Bayesian hyperparameter tuning using Optuna with a Tree-structured Parzen Estimator (TPE) on the performance and training stability of IndoBERT (indobenchmark/indobert-base-p1) on Twitter/X data.
The main contribution of this work is an empirical evaluation of how hyperparameter tuning improves IndoBERT’s performance and training stability, together with recommended configurations for reproducible experiments and practical deployment of Indonesian NER. The dataset is annotated using the Begin–Inside–Outside (BIO) labeling scheme for three entity types: person (PER), organization (ORG), and location (LOC). The optimization objective is the F1-score on the validation set. On the test set, the Optuna-tuned configuration achieves a precision of 0.9338, recall of 0.9312, F1-score of 0.9325, and accuracy of 0.9854, outperforming the baseline, which reaches an F1-score of 0.9253 and accuracy of 0.9837. Multi-seed evaluation indicates consistent improvements, with an average F1 of 0.9302 ± 0.0016 compared to 0.9238 ± 0.0009 for the baseline. These findings confirm that Optuna-based hyperparameter tuning improves both the performance and reliability of IndoBERT for Indonesian NER on social media text.
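
As an illustration of the metrics reported in the abstract, the following minimal sketch shows one common way to score BIO-tagged output at the entity level and to summarize F1 across random seeds as mean ± standard deviation. It is not the authors' evaluation code. The abstract does not say whether scores are computed per entity or per token, or whether the reported deviation is a sample or population standard deviation, so both choices here are assumptions. The function names and all example values are hypothetical placeholders, not data from the study.

```python
from statistics import mean, stdev


def bio_to_spans(tags):
    """Convert one BIO tag sequence into a set of (type, start, end) spans.

    `end` is exclusive. Assumption: an I- tag that does not continue an open
    span of the same type is treated leniently as the start of a new span.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((etype, start, i))  # close the span that just ended
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]     # open a new span
        else:
            start, etype = None, None     # outside any entity
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 using exact span matching."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def summarize_seeds(f1_by_seed):
    """Mean and sample standard deviation of F1 over several seeds."""
    return mean(f1_by_seed), stdev(f1_by_seed)


# Hypothetical toy sentence with one PER, one ORG and one LOC entity.
gold = [["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]  # the ORG entity is missed
precision, recall, f1 = entity_prf(gold, pred)

# Placeholder per-seed F1 values, made up purely for illustration.
avg_f1, std_f1 = summarize_seeds([0.90, 0.92, 0.91])
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
print(f"multi-seed F1 = {avg_f1:.4f} ± {std_f1:.4f}")
```

In the toy example the prediction misses one of three gold entities, so precision stays at 1.0 while recall and F1 drop. Exact span matching is stricter than token-level accuracy, which is one reason accuracy figures for NER are typically higher than F1.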

Copyrights © 2026






Journal Info

Abbrev

jurikom

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

JURIKOM (Jurnal Riset Komputer) covers the fields of Informatics, Information Systems, Informatics Management, DSS, AI, ES, and Networking, serving as a venue for publishing research results, both conceptual and technical, related to Information and Computer Technology. The main topics ...