The growing volume of virtual meetings has increased the need for effective long-document summarization systems that capture the essential discussion points of lengthy transcripts. However, existing transformer-based models often struggle with long-context inputs and require substantial computational resources for fine-tuning. Moreover, prior work offers limited comparative analysis of full fine-tuning versus parameter-efficient fine-tuning (PEFT) specifically for meeting summarization. This study systematically evaluates three long-sequence transformer architectures (LongT5, BigBird, and LED) on the MeetingBank dataset using both full fine-tuning and PEFT strategies. Models are assessed with ROUGE scores, BERTScore, parameter efficiency, and qualitative error analysis. Experimental results show that fully fine-tuned LongT5 achieves the best performance (ROUGE-1 = 0.675, BERTScore F1 = 0.921), outperforming the next-best model, BigBird, by 31.6% in ROUGE-1. PEFT reduces trainable parameters by over 90% but remains competitive only for LongT5 (ROUGE-1 = 0.543, BERTScore F1 = 0.872), while BigBird and LED degrade severely, producing semantically weak and incoherent summaries despite low validation loss. These findings indicate that PEFT effectiveness is highly model-dependent and that validation loss alone is an unreliable indicator of generative quality. The study contributes a comprehensive benchmarking analysis and practical guidance for optimizing long-document meeting summarization under computational constraints.
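To make the PEFT setting concrete, the following is a minimal, illustrative sketch of LoRA-style parameter-efficient fine-tuning of LongT5 on MeetingBank with generation-based ROUGE evaluation. The checkpoint name, dataset identifier, column names, and hyperparameters are assumptions for demonstration only, not the exact configuration used in this study.

# Illustrative sketch: LoRA-style PEFT fine-tuning of LongT5 on MeetingBank,
# followed by generation-based ROUGE evaluation. Checkpoint, dataset id,
# column names, and hyperparameters are assumed for demonstration.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model
import evaluate

model_name = "google/long-t5-tglobal-base"              # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA adapters: only the injected low-rank matrices are trainable,
# which is how PEFT reduces trainable parameters by over 90%.
lora_cfg = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=16, lora_alpha=32,
                      lora_dropout=0.05, target_modules=["q", "v"])
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

dataset = load_dataset("huuuyeah/meetingbank")          # assumed dataset id / splits

def preprocess(batch):
    # Column names "transcript" and "summary" are assumed.
    enc = tokenizer(batch["transcript"], max_length=4096, truncation=True)
    enc["labels"] = tokenizer(text_target=batch["summary"],
                              max_length=512, truncation=True)["input_ids"]
    return enc

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="longt5-meetingbank-lora",
    per_device_train_batch_size=1, gradient_accumulation_steps=8,
    learning_rate=1e-4, num_train_epochs=3,
    predict_with_generate=True, generation_max_length=512)

trainer = Seq2SeqTrainer(
    model=model, args=args,
    train_dataset=tokenized["train"], eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()

# Judge quality on decoded summaries (ROUGE here; BERTScore is analogous),
# not on validation loss alone, since low loss can hide incoherent outputs.
rouge = evaluate.load("rouge")
preds = trainer.predict(tokenized["test"])
pred_ids = np.where(preds.predictions != -100, preds.predictions, tokenizer.pad_token_id)
decoded = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
print(rouge.compute(predictions=decoded, references=dataset["test"]["summary"]))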