Multi-document text summarization requires selecting important sentences, a task that is more complex than in single-document summarization because the documents contain differing information, which leads to contradictions and redundancy. Important sentences can be selected by scoring each sentence according to how well it captures the main information. Several features are combined in the scoring process so that high-scoring sentences become candidates for the summary. The centroid approach is advantageous for capturing key information, but it is limited to information close to the center point. Adding positional features provides more information about the importance of a sentence, but positional features focus only on the main position. Therefore, as its research contribution, this study uses a keyword feature that provides additional information about important words, in the form of N-grams, in a document. The centroid, position, and keyword features are combined in the scoring process, which improves performance on multi-document news and review data. The test results show that adding the keyword feature produces the highest scores: on the DUC2004 news data, ROUGE-1 of 35.44, ROUGE-2 of 7.64, ROUGE-L of 37.02, and BERTScore of 84.22; on the Amazon review data, ROUGE-1 of 32.24, ROUGE-2 of 6.14, ROUGE-L of 34.77, and BERTScore of 85.75. These ROUGE and BERTScore values outperform those of the other unsupervised models.
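The combined scoring idea described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the function names (`score_sentences`, `tokenize`, `ngrams`, `cosine`), the equal feature weights, the reciprocal position score, the term-frequency centroid, and the choice of the most frequent N-grams as keywords. It should not be read as the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple, assumed tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # Contiguous n-grams joined by a space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a, b):
    # Cosine similarity between two sparse term-weight mappings.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_sentences(documents, weights=(1.0, 1.0, 1.0), max_n=2, top_k=10):
    """Rank sentences from several documents by a weighted feature sum.

    documents: list of documents, each a list of sentence strings.
    Returns (score, sentence) pairs sorted from highest to lowest score.
    """
    sentences = [(i, s) for doc in documents for i, s in enumerate(doc)]
    token_lists = [tokenize(s) for _, s in sentences]

    # Centroid feature (assumed form): similarity of each sentence to the
    # summed term-frequency vector of all sentences. Cosine similarity is
    # scale-invariant, so the sum behaves like the mean here.
    vectors = [Counter(toks) for toks in token_lists]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)

    # Keyword feature (assumed form): the top_k most frequent n-grams
    # (n = 1..max_n) across all documents are treated as keywords.
    gram_lists = [
        [g for n in range(1, max_n + 1) for g in ngrams(toks, n)]
        for toks in token_lists
    ]
    gram_counts = Counter(g for grams in gram_lists for g in grams)
    keywords = {g for g, _ in gram_counts.most_common(top_k)}

    w_centroid, w_position, w_keyword = weights
    scored = []
    for (index, sentence), vec, grams in zip(sentences, vectors, gram_lists):
        centroid_score = cosine(vec, centroid)
        # Position feature (assumed form): earlier sentences score higher.
        position_score = 1.0 / (index + 1)
        keyword_score = (
            sum(g in keywords for g in grams) / len(grams) if grams else 0.0
        )
        total = (
            w_centroid * centroid_score
            + w_position * position_score
            + w_keyword * keyword_score
        )
        scored.append((total, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `score_sentences(documents)` on a list of tokenized-by-sentence documents returns the sentences ordered by combined score; the top entries would then serve as summary candidates. In practice the weights and the keyword extraction method would need to be tuned or replaced to match the method evaluated in the study.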
Copyrights © 2023