Information Technology Education Journal
Vol. 5, No. 2, May (2026)

Automated Abstractive Summarization and Entity Extraction of Online News Information Using mT5 and BERT

Marwan Aldi Pratama (Universitas Islam Negeri Walisongo)
Siti Nur'aini (Universitas Islam Negeri Walisongo)
Maya Rini Handayani (Universitas Islam Negeri Walisongo)
Khothibul Umam (Universitas Islam Negeri Walisongo)



Article Info

Publish Date
30 May 2026

Abstract

Purpose – This research addresses the problem of information overload on online news portals by developing an automated text summarization system that generates abstractive summaries while preserving essential entities. It also aims to improve the coherence and quality of summaries relative to conventional extractive methods.

Methods/approach – This research takes a quantitative, experimental approach using 100 CNN news articles about the Israel–Iran conflict, collected via RSS. The proposed system integrates the mT5 model for abstractive summarization with the multilingual BERT model for Named Entity Recognition (NER). The stages comprise data acquisition, preprocessing, preparation of reference summaries, automated summarization, entity extraction, and evaluation using reduction rates and ROUGE metrics.

Findings – The system produces summaries with an average reduction rate of 89.83%, so a summary is on average only about 10.17% of the length of the original text. The evaluation yields a ROUGE-1 score of 0.4095, a ROUGE-2 score of 0.2356, and a ROUGE-L score of 0.3442. The mT5 pipeline model achieved marginally higher ROUGE-1 and ROUGE-L scores, whereas the baseline mT5 model held a slight advantage on ROUGE-2. The extractive TextRank method lagged substantially behind both transformer-based models, particularly in producing fluent and contextually coherent summaries.

Research limitations – The data cover only a single conflict domain, and entity classification errors arise from lexical ambiguity and the model's limited contextual understanding; both may affect the system's generalization and accuracy.

Originality – This research integrates abstractive summarization and entity extraction within a structured pipeline, thereby producing summaries that are not only concise but also more informative and better organized.
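The two evaluation measures named in the abstract, reduction rate and ROUGE, can be made concrete with a small sketch. It assumes lowercase whitespace tokenization, length measured in words, and ROUGE reported as an F1 score; the study's actual tokenizer, length unit, and ROUGE variant are not stated here, and all function names below are illustrative. Because the reduction rate is defined as 100 minus the summary-to-source length percentage, an average reduction of 89.83% corresponds to summaries averaging 100 - 89.83 = 10.17% of the source length.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (the study's tokenizer is not stated).
    return text.lower().split()


def reduction_rate(original: str, summary: str) -> float:
    """Percentage of the original length removed by the summary (length in words, an assumption)."""
    n_orig = len(tokenize(original))
    if n_orig == 0:
        raise ValueError("original text is empty")
    return (1 - len(tokenize(summary)) / n_orig) * 100


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams; empty when the text is shorter than n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Counter & Counter keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence via two-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Usage on toy sentences (not data from the study).
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n(candidate, reference, 1), 4))  # 0.8333
print(round(rouge_n(candidate, reference, 2), 4))  # 0.6
print(round(rouge_l(candidate, reference), 4))     # 0.8333
print(round(reduction_rate("a b c d e f g h i j", "a b"), 2))  # 80.0
```

Published ROUGE figures usually come from an established package that may apply stemming or different tokenization, so scores from this sketch would not necessarily match the values reported above; per-article scores would then be averaged over the 100 articles.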

Copyright © 2026






Journal Info

Abbrev

INTEC

Subject

Computer Science & IT Education

Description

INTEC Journal is published by the Informatics and Computer Engineering Education Study Program at Makassar State University. INTEC Journal is published periodically three times a year, containing articles on research results and/or critical studies in the field of Informatics and Computer ...