Digital Zone: Jurnal Teknologi Informasi dan Komunikasi
Vol. 16 No. 2 (2025): Digital Zone: Jurnal Teknologi Informasi dan Komunikasi

Integrated Named Entity Recognition and Identical-Entity Detection for Extracting Unique Information Sources in News Articles

Ansyah, Adi Surya Suwardi (Unknown)
Oranova Siahaan, Daniel (Unknown)
Rizqi Paradisiaca, Brian (Unknown)



Article Info

Publish Date
14 Oct 2025

Abstract

Native advertising is often difficult to detect because it resembles regular news articles. One indicator is a lack of diverse information sources, that is, reliance on a single perspective. Measuring source diversity requires an extraction technique that can consolidate the different surface forms under which the same entity is mentioned. This study integrates a named entity recognition (NER) model based on XLNet+BiLSTM+CRF with identical-entity classification that uses Levenshtein distance features together with static and contextual vector representations. The approach achieves an F1-score of 93.71% at the entity level and 92.84% for identical-entity identification, and yields a list of unique citation sources. These findings indicate that the unique-source list can serve as an additional feature for detecting native advertising, which often relies on a single source. With an average unique entity coverage of 97.40%, the proposed architecture can extract the unique entities within news articles.
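
To illustrate the kind of consolidation the abstract describes, the following is a minimal sketch of how a Levenshtein-based feature could be used to merge near-identical source mentions into a unique list. It is not the authors' implementation: the study uses a trained classifier that also takes static and contextual vector representations as input, which this sketch omits. The function names, the greedy grouping strategy, and the 0.8 threshold are all assumptions made for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], computed on lowercased, stripped text."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def unique_sources(mentions: List[str], threshold: float = 0.8) -> List[str]:
    """Greedy grouping (hypothetical): keep a mention only if no kept mention
    is at least `threshold` similar to it. The threshold is an assumed value."""
    unique: List[str] = []
    for mention in mentions:
        if not any(similarity(mention, kept) >= threshold for kept in unique):
            unique.append(mention)
    return unique


# Example mentions are invented for illustration only.
mentions = ["Joko Widodo", "joko widodo", "Joko Widoddo", "Sri Mulyani"]
sources = unique_sources(mentions)
```

A surface-level distance like this only catches spelling and casing variants. Aliases and abbreviations that share few characters with the full name would need additional signals, such as the vector representations mentioned in the abstract.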

Copyrights © 2025






Journal Info

Abbrev

dz

Publisher

Subject

Computer Science & IT Engineering

Description

Digital Zone is published by Fakultas Ilmu Komputer, Universitas Lancang Kuning (Online ISSN 2477-3255, Print ISSN 2086-4884). The journal is published twice a year, in May and ...