International Journal of Advances in Intelligent Informatics
Vol 12, No 2 (2026): May 2026

Enhancing multi-document summarization through topic–pattern-based sentence selection

Shaufiah Shaufiah (Queensland University of Technology, Brisbane)
Yuefeng Li (Queensland University of Technology, Brisbane)
Richi Nayak (Queensland University of Technology, Brisbane)
Yutong Wu (Australian e-Health Research Centre, CSIRO, Brisbane)



Article Info

Publish Date
31 May 2026

Abstract

The growing volume of digital text makes automated summarization an essential tool for fast access to information. Automatic Multi-Document Summarization (MDS) aims to produce a single summary that combines the essential information from multiple documents. Extractive methods that rely only on lexical signals and sentence-level rules tend to produce repetitive summaries because they cannot identify complex thematic relationships. This study develops an improved extractive MDS model that combines topic modeling, pattern-based semantic indicators, and diversity-aware sentence selection. The model uses Latent Dirichlet Allocation (LDA) to identify latent thematic structure, extracts representative word patterns to enrich the topic models, and selects sentences with a greedy algorithm that reduces redundancy while balancing salience and coverage. In experiments on the DUC 2006 and DUC 2007 datasets, the proposed system outperforms all baseline methods (Lead, CLASSY04, KL-SUM, LexRank, TextRank, and PETMSUM) on the ROUGE-1, ROUGE-2, and ROUGE-SU4 metrics. The results indicate that extractive summarization performs best when topic–pattern representations are combined with diversity-aware scoring.
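
The greedy, redundancy-reducing selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an MMR-style trade-off between salience and redundancy, and the names `greedy_select`, `cosine`, and `trade_off`, the bag-of-words similarity, and the example salience scores are all hypothetical stand-ins for the topic–pattern scores the paper describes.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_select(sentences, salience, k=2, trade_off=0.7):
    """Greedily pick k sentence indices (assumed MMR-style scoring).

    salience: one hypothetical relevance score per sentence, standing in
    for whatever topic-pattern score a real system would compute.
    trade_off: weight on salience; (1 - trade_off) penalizes redundancy.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = set(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = highest similarity to any already chosen sentence.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return trade_off * salience[i] - (1 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


sentences = [
    "the flood damaged the city bridge",
    "the flood damaged the old city bridge",
    "rescue teams arrived on friday",
]
salience = [0.9, 0.85, 0.6]  # illustrative values only
print(greedy_select(sentences, salience, k=2))  # → [0, 2]
```

In this toy input the second sentence is nearly identical to the first, so the redundancy penalty pushes the selector toward the less salient but novel third sentence. Setting `trade_off=1.0` disables the penalty and reduces the procedure to picking the top-k sentences by salience.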

Copyrights © 2026






Journal Info

Abbrev

IJAIN

Subject

Computer Science & IT

Description

International Journal of Advances in Intelligent Informatics (IJAIN), e-ISSN 2442-6571, is a peer-reviewed, open-access journal published three times a year in English. It provides scientists and engineers throughout the world with a venue for the exchange and dissemination of theoretical and ...