Access to accurate, relevant, and timely information is crucial for prospective university students; however, conventional information services often struggle with high query volumes, and automated alternatives carry the risk of generative hallucinations. This study investigates whether reasoning-oriented large language models yield measurable improvements in response quality within a Retrieval-Augmented Generation (RAG) architecture for university admission services. The study hypothesizes that internal chain-of-thought reasoning improves factual grounding relative to non-reasoning models under identical retrieval conditions. A vector-based institutional knowledge base was constructed from 30 official admission sources using VoyageAI embeddings, and the system was evaluated on a multilingual dataset of 353 real-world inquiries in Indonesian, English, and Javanese dialects. To isolate the effect of reasoning capability, retrieval outputs and prompt configurations were held constant across all models. Performance was measured with the RAGAS framework across six models, categorized as reasoning (DeepSeek-R1, Gemini-2.5-Flash, o4-mini) and non-reasoning (DeepSeek-V3, Gemini-2.0-Flash, GPT-4o-mini). Reasoning models achieved a higher average RAGAS score (0.7772) than non-reasoning models (0.7289), a relative improvement of 6.63%, with the largest gain observed in factual correctness (+15.95%). Additional multilingual benchmarking confirmed that reasoning models maintain more stable performance across languages. Gemini-2.5-Flash achieved the highest composite score (0.8207) while maintaining favorable cost efficiency. These findings indicate that reasoning-enabled models substantially improve factual reliability in domain-specific RAG systems, although overall system performance remains strongly dependent on retrieval quality.
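To illustrate the shape of the controlled comparison described above, the following is a minimal sketch, assuming the ragas 0.1-style `evaluate()` API over a HuggingFace `Dataset`; the model identifiers, `records` layout, and `generate_answer()` helper are illustrative placeholders, not the study's actual code, and the RAGAS judge defaults to OpenAI (so `OPENAI_API_KEY` must be set).

```python
# Sketch of the controlled evaluation: retrieval is frozen so every
# model answers from the SAME contexts, isolating reasoning capability.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# The six models compared in the study.
MODELS = [
    "deepseek-r1", "gemini-2.5-flash", "o4-mini",      # reasoning
    "deepseek-v3", "gemini-2.0-flash", "gpt-4o-mini",  # non-reasoning
]

def generate_answer(model: str, question: str, contexts: list[str]) -> str:
    """Placeholder: query `model` with a fixed prompt template and the
    pre-retrieved contexts, so prompt and retrieval are held constant."""
    raise NotImplementedError

def score_model(model: str, records: list[dict]):
    """Run one model over the inquiry set and score it with RAGAS."""
    rows = {
        "question": [r["question"] for r in records],
        "contexts": [r["contexts"] for r in records],  # frozen retrieval
        "ground_truth": [r["ground_truth"] for r in records],
        "answer": [
            generate_answer(model, r["question"], r["contexts"])
            for r in records
        ],
    }
    return evaluate(
        Dataset.from_dict(rows),
        metrics=[faithfulness, answer_relevancy,
                 context_precision, context_recall],
    )
```

Because the context columns are identical for every model, any difference in the resulting per-model scores can be attributed to generation rather than retrieval, which is the design the abstract describes.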