Automatic text summarization helps readers cope with the vast amount of information produced in the digital age. This study develops an extractive text summarization system for Indonesian news articles that uses sentence embeddings generated by IndoBERT and mBERT, combined with the TextRank and LexRank algorithms for sentence ranking. The dataset is Indonesian Text Summarization (IndoSum), which contains thousands of manually summarized articles. The research pipeline comprises data collection, cleaning, preprocessing, embedding extraction, sentence similarity calculation, and graph-based sentence ranking. Model performance was evaluated using ROUGE and BERTScore. The combination of IndoBERT and LexRank achieved the highest performance, with a ROUGE-1 score of 0.7018 and a BERTScore of 0.8696. The model was then implemented in a web prototype built with Streamlit that lets users summarize texts interactively. This study contributes to the advancement of automatic summarization technology for the Indonesian language.
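The similarity and ranking stages described above can be illustrated with a minimal sketch. It assumes each sentence already has a fixed-size embedding vector (for example, one produced by IndoBERT or mBERT; how the vectors are obtained is outside this sketch) and applies a TextRank-style PageRank iteration over the cosine similarity graph. The function names, the damping factor of 0.85, and the summary length `k` are illustrative assumptions, not the study's actual implementation, and LexRank as commonly described differs in how it builds and weights the graph.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def rank_sentences(embeddings, damping=0.85, iterations=100, tol=1e-8):
    """TextRank-style scores via power iteration on the similarity graph.

    Assumption: damping, iterations and tol are illustrative defaults.
    """
    sim = cosine_similarity_matrix(np.asarray(embeddings, dtype=float))
    sim = np.clip(sim, 0.0, None)   # keep edge weights non-negative
    np.fill_diagonal(sim, 0.0)      # no self-loops
    n = sim.shape[0]

    # Row-normalize into a transition matrix; isolated rows fall back to uniform.
    row_sums = sim.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)

    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        updated = (1.0 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


def summarize(sentences, embeddings, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(embeddings)
    top = sorted(np.argsort(scores)[::-1][:k].tolist())
    return [sentences[i] for i in top]
```

Calling `summarize(sentences, embeddings, k=3)` with a list of sentences and a matching array of embeddings returns the three most central sentences in document order, which is the usual shape of an extractive summary.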
Copyright © 2025