Retrieval-augmented generation (RAG) systems promise grounded answers from large language models (LLMs), yet their performance depends critically on how source documents are segmented before indexing. This study investigates how pre-index chunking strategies affect both retrieval accuracy and answer quality in a domain-specific setting. We curated a corpus of software-as-a-service (SaaS) editorial content and constructed a high-quality evaluation dataset of 2,419 question-answer (QA) pairs generated through automated prompting and quality control. We compared four chunking approaches: fixed-size, structure-aware recursive, semantic, and LLM-based. Our evaluation protocol assessed retrieval through document localization, semantic similarity, and context relevance, and assessed generation quality with chain-of-thought (CoT) criteria scored by LLM judges. Results show that recursive chunking consistently outperforms the other approaches across all metrics: smaller chunks improve document localization, while moderately larger chunks enhance semantic alignment and generation scores. LLM-based chunking variants are competitive but do not exceed the best recursive configurations on this dataset. These findings indicate that preserving document structure through recursive chunking benefits practical RAG implementations, providing actionable guidance for chunk-size selection while highlighting the token-budget constraints of current long-context models.
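To make the structure-aware recursive approach concrete, the following is a minimal, self-contained sketch of recursive chunking: it splits on the coarsest structural separator first and recurses into oversized pieces with progressively finer separators. The function name `recursive_chunk`, the `chunk_size` of 512 characters, and the separator hierarchy are illustrative assumptions, not the configuration evaluated in this study.

```python
# Minimal sketch of structure-aware recursive chunking.
# The separator hierarchy and chunk_size are illustrative assumptions,
# not the configuration evaluated in the paper.

def recursive_chunk(text: str, chunk_size: int = 512,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing into any piece
    that still exceeds chunk_size (measured here in characters)."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No document structure left to exploit: fall back to a hard split.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep) if sep in text else [text]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # A single piece is still too large: descend to finer separators.
            chunks.extend(recursive_chunk(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = ("Intro paragraph about SaaS pricing.\n\n"
           "A much longer section body follows. " * 40)
    for i, chunk in enumerate(recursive_chunk(doc, chunk_size=200)):
        print(i, len(chunk), repr(chunk[:40]))
```

Because paragraph and sentence boundaries are tried before word boundaries, chunks tend to align with the document's own structure, which is the property the recursive strategy exploits.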