Retrieval-Augmented Generation (RAG) is a hybrid method that combines information retrieval with large language models to generate context-aware, factually grounded responses. However, the RAG system relies heavily on well-structured input data to generate accurate and contextually relevant responses. Documents with complex table layouts pose significant challenges, as most chunking strategies are text-centric and often overlook table-rich documents containing multi-column and multi-row structures. Hence, this study proposes a customized structure-aware chunking framework specifically designed for university course documents containing multi-column, multi-row tables with nested headers. The framework employs Camelot for high-fidelity table extraction, followed by customized logic that constructs semantically coherent chunks by preserving academic term, subject name, credit hour, and category. This prevents semantic fragmentation during retrieval. The proposed method is evaluated using the RAGAS framework and compared against several baselines using standard parsing and chunking techniques. Results show that the proposed approach achieves the highest answer accuracy of 0.73 and substantially improves retrieval relevance and contextual precision. These findings demonstrate the frameworkâs effectiveness in handling structure-dependent academic queries. This study highlights that ensuring both parsing quality and chunking strategy is essential to retain semantic relationships in table-rich documents, offering a practical improvement for RAG systems in structurally complex scenarios.
Copyrights © 2026