International Journal of Advances in Intelligent Informatics
Vol 12, No 2 (2026): May 2026

IndoBERTSkill: pretrained domain-specific language model for recognition Indonesian skill

Meilany Nonsi Tentua (Universitas PGRI Yogyakarta)
Suprapto Suprapto (Universitas Gadjah Mada)
Afiahayati Afiahayati (Universitas Gadjah Mada)



Article Info

Publish Date
31 May 2026

Abstract

Pre-trained language models for Indonesian are already available for natural language processing tasks. However, these models were trained on general Indonesian text, whose structure differs from that of job descriptions. As a result, their effectiveness for skill recognition is limited. IndoBERTSkill is a novel pre-trained, domain-specific language model for recognizing skills in Indonesian-language text. It is built on the Bidirectional Encoder Representations from Transformers (BERT) architecture. IndoBERTSkill was trained on an extensive text collection drawn from the Indonesian Wikipedia, the English Wikipedia, and Indonesian job descriptions from a job portal. Its performance was evaluated through two main approaches: (1) language modeling via Masked Language Model (MLM) prediction, and (2) fine-tuning on a custom annotated dataset (NERSkill) for Named Entity Recognition (NER) tasks. The fine-tuning process involved training a classification layer on top of the IndoBERTSkill model using BIO tagging to identify hard skill, soft skill, and technology entities. The skill recognition model derived from IndoBERTSkill achieves the highest F1-score among the pre-trained language models compared, at 87%, demonstrating robustness and strong generalizability for skill entity recognition in Indonesian job descriptions. IndoBERTSkill provides a valuable resource for developing Indonesian natural language processing applications that require skill recognition. This could increase the accuracy and efficiency of skill recognition across various domains, including job matching, education, and training.
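
The BIO tagging scheme mentioned in the abstract can be illustrated with a minimal Python sketch that groups token-level tags back into entity spans. The label names (HSKILL, SSKILL, TECH), the helper `bio_to_spans`, and the sample sentence are assumptions made for illustration only; the abstract does not state the tag set or the decoding code used for NERSkill.

```python
from typing import List, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def close_span() -> None:
        # Emit the entity collected so far, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))
        current_tokens = []
        current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # "B-" always opens a new entity, closing any open one first.
            close_span()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # "I-" continues the open entity only if the label matches.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity;
            # the stray token itself is not kept.
            close_span()
    close_span()
    return spans

# Hypothetical job-description fragment with assumed label names.
tokens = ["Menguasai", "Python", "dan", "analisis", "data",
          "serta", "komunikasi", "efektif"]
tags = ["O", "B-TECH", "O", "B-HSKILL", "I-HSKILL",
        "O", "B-SSKILL", "I-SSKILL"]
spans = bio_to_spans(tokens, tags)
```

In a fine-tuned model, the tag sequence would come from the classification layer's per-token predictions rather than being written by hand as it is here.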

Copyrights © 2026






Journal Info

Abbrev

IJAIN

Subject

Computer Science & IT

Description

International Journal of Advances in Intelligent Informatics (IJAIN), e-ISSN 2442-6571, is a peer-reviewed, open-access journal published three times a year in English. It gives scientists and engineers throughout the world a venue for the exchange and dissemination of theoretical and ...