Journal : ILKOMNIKA: Journal of Computer Science and Applied Informatics

Fine-Tuned BART Transfer Learning for Abstractive Summarization of Indonesian YouTube Transcripts with ROUGE Evaluation
Setiawan, Diyan Nova; Faisal, Muhammad; Imamudin, Mochamad
ILKOMNIKA Vol 8 No 1 (2026): Volume 8, Number 1, April 2026
Publisher : Lembaga Penelitian dan Pengabdian Masyarakat

DOI: 10.28926/ilkomnika.v8i1.862

Abstract

Indonesian YouTube transcripts present substantial challenges for abstractive summarization because they contain informal expressions, filler words, fragmented utterances, and automatically generated caption errors. Existing Indonesian summarization studies have mostly focused on formal written texts such as news articles, while duration-aware abstractive summarization of noisy Indonesian educational video transcripts remains underexplored. This study fine-tunes a BART-based sequence-to-sequence model using a curated corpus of Indonesian YouTube transcripts from the “Kok Bisa?” educational channel. From 1,000 collected videos, 957 transcripts were successfully retrieved, and 730 transcripts passed the final filtering criteria for experimental analysis. The dataset was divided into short, medium, and long transcript categories to evaluate the effect of input duration on summarization quality. The proposed pipeline includes transcript retrieval, metadata extraction, text normalization, filler-word removal, repetition filtering, BART fine-tuning, summary generation, and ROUGE-based evaluation. The model achieved the best performance on short transcripts, with ROUGE-1 F1 = 0.621, ROUGE-2 F1 = 0.438, and ROUGE-L F1 = 0.587. Performance decreased on long transcripts, with ROUGE-1 F1 = 0.552, ROUGE-2 F1 = 0.384, and ROUGE-L F1 = 0.509, indicating that longer narratives reduce lexical and structural alignment. These findings show that fine-tuned BART is effective for short Indonesian educational transcripts but requires segmentation, semantic evaluation, and stronger baseline comparison for long-form video summarization.
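
The ROUGE-1, ROUGE-2, and ROUGE-L F1 scores reported above measure n-gram and longest-common-subsequence overlap between a generated summary and a reference summary. The abstract does not say which ROUGE implementation or tokenizer was used, so the following is only a minimal sketch under stated assumptions: lowercased whitespace tokens, no stemming, and a single reference per summary. The function names and the example sentences are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N F1 from clipped n-gram overlap (assumption: whitespace tokens)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum count
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(cand), len(ref))


# Hypothetical reference and generated summaries, for illustration only.
reference = "air laut asin karena mineral dari batuan"
candidate = "air laut asin karena garam"

print(round(rouge_n_f1(reference, candidate, 1), 3))  # 0.667
print(round(rouge_n_f1(reference, candidate, 2), 3))  # 0.6
print(round(rouge_l_f1(reference, candidate), 3))     # 0.667
```

Because all three scores count surface-level token overlap, a summary that paraphrases the reference with different wording can score low even when the meaning is preserved, which is consistent with the abstract's call for semantic evaluation on long transcripts.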