Native advertising is often difficult to detect because it resembles regular news articles. One indicator is a lack of diverse information sources, or reliance on a single perspective. Measuring source diversity requires an extraction technique that can consolidate the different surface forms under which the same entity is mentioned. This study combines a named entity recognition (NER) model based on XLNet+BiLSTM+CRF with an identical-entity classifier that uses Levenshtein distance features together with static and contextual vector representations. The approach achieves an F1-score of 93.71% at the entity level and 92.84% for identical entity identification, and produces a list of unique citation sources. These results indicate that the unique source list can serve as an additional feature for detecting native advertising, which often relies on a single source. With an average unique entity coverage of 97.40%, the proposed architecture can extract the unique entities within news articles.
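To illustrate the role of the Levenshtein distance feature, the following is a minimal sketch, not the study's implementation. It computes a length-normalized edit-distance similarity between two entity mentions and uses it to collapse near-identical mentions into a list of unique sources. The function names, the greedy grouping, and the 0.8 threshold are assumptions made for illustration only; in the study this feature is fed to a classifier alongside static and contextual vector representations rather than thresholded directly.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after case/space cleanup."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def unique_sources(mentions: List[str], threshold: float = 0.8) -> List[str]:
    """Greedy grouping (hypothetical stand-in for the trained classifier):
    keep a mention only if it is not similar to one already kept."""
    representatives: List[str] = []
    for mention in mentions:
        if not any(similarity(mention, r) >= threshold for r in representatives):
            representatives.append(mention)
    return representatives
```

An article whose extracted mentions reduce to a single representative would be a candidate for the single-source signal described above. String distance alone does not link aliases that differ substantially in spelling, which is one reason to combine it with vector-based features.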
Copyrights © 2025