Journal of Applied Data Sciences
Vol 7, No 2: May 2026

A Hybrid Method for Low-Resource Named Entity Recognition

Do Minh Duc (Vietnam National University, Hanoi)
Quan Xuan Truong (Vietnam National University, Hanoi)
Viet Tran Hong (Vietnam National University, Hanoi)
Le Hoang Anh (Center for Biodiversity Monitoring and Investigation, Ha Noi)
Mac Thi Minh Tra (Center for Biodiversity Monitoring and Investigation, Ha Noi)
Nguyen Van Thuy (Center for Biodiversity Monitoring and Investigation, Ha Noi)
Le Hai Ha (Hanoi University of Science and Technology)
Vinh Nguyen Van (Vietnam National University, Hanoi)



Article Info

Publish Date
05 Apr 2026

Abstract

Named Entity Recognition (NER) is a core component of Natural Language Processing, with applications in information extraction and conversational AI. However, domain-specific NER for low-resource languages is hindered by limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core of the framework is a two-stage pipeline: first, a rule-based component reduces label complexity by grouping relational and special categories; second, pre-trained language models are fine-tuned for high-precision extraction. A post-processing module then restores the fine-grained labels, preserving the expressiveness needed for application-level usability. To mitigate data scarcity, a scalable data augmentation strategy based on Large Language Models (LLMs) is introduced to expand the label set without full re-annotation, a key novelty of this work. The method was evaluated on five domain-specific datasets spanning areas that include logistics, wildlife, and healthcare. Experimental results show substantial improvements over strong RoBERTa-based baselines. Specifically, the proposed system achieved F1 scores of 90% on Customer Service (up from 83%), 84% on GAM (up from 73%), 83% on AI Fluent (up from 80%), 94% on PhoNER_Covid19 (up from 91%), and 60% on Rare Wildlife (up from 36%). These findings indicate that the hybrid approach captures both the linguistic complexity of Vietnamese and the contextual nuances of specialized domains, making it a useful contribution to low-resource NER research.
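
To make the label-grouping and label-restoration steps concrete, the following is a minimal Python sketch of how such a pipeline could be wired together. It is not the authors' implementation. The label names (SENDER_PHONE, RECEIVER_PHONE, and so on), the English cue words, the context window, and the functions coarsen and restore are all hypothetical and chosen only for illustration. The fine-tuned language model of the second stage is not shown; its output is assumed to arrive as coarse-labelled spans.

```python
import re

# Hypothetical fine-to-coarse grouping; these label names are illustrative
# assumptions, not the label set used in the paper.
FINE_TO_COARSE = {
    "SENDER_PHONE": "PHONE",
    "RECEIVER_PHONE": "PHONE",
    "SENDER_NAME": "PERSON",
    "RECEIVER_NAME": "PERSON",
}


def coarsen(tags):
    """Rule-based step: map fine-grained BIO tags onto a reduced label set."""
    out = []
    for tag in tags:
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Labels without a grouping rule are kept unchanged.
        out.append(f"{prefix}-{FINE_TO_COARSE.get(label, label)}")
    return out


# Hypothetical restoration rules: (coarse label, cue pattern, fine label).
# English cue words are used here only to keep the example readable.
RESTORE_RULES = [
    ("PHONE", re.compile(r"\b(sender|from)\b", re.I), "SENDER_PHONE"),
    ("PHONE", re.compile(r"\b(receiver|to)\b", re.I), "RECEIVER_PHONE"),
]


def restore(tokens, spans, window=3):
    """Post-processing step: recover fine-grained labels from left context.

    `spans` holds (start, end, coarse_label) triples with `end` exclusive,
    as a stand-in for the predictions of a fine-tuned model.
    """
    restored = []
    for start, end, label in spans:
        context = " ".join(tokens[max(0, start - window):start])
        fine = label  # fall back to the coarse label if no rule fires
        for coarse, pattern, candidate in RESTORE_RULES:
            if coarse == label and pattern.search(context):
                fine = candidate
                break
        restored.append((start, end, fine))
    return restored


if __name__ == "__main__":
    tokens = "the sender number is 0901 and receiver number is 0902".split()
    predicted = [(4, 5, "PHONE"), (9, 10, "PHONE")]  # assumed model output
    print(coarsen(["B-SENDER_PHONE", "I-SENDER_PHONE", "O", "B-LOC"]))
    print(restore(tokens, predicted))
```

In this sketch the model only has to distinguish the smaller coarse label set, while the distinction between sender and receiver is deferred to simple context rules. Whether the paper's rules operate on context words, span order, or other signals is not stated in the abstract.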

Copyright © 2026

Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
