Indonesian Journal of Innovation Studies
Vol. 27 No. 1 (2026): January

Comparative Performance of Retrieval Augmented Generation Tourism Chatbots

Farizi, Amar Al
Arsi, Primandani
Subarkah, Pungkas



Article Info

Publish Date
07 Jan 2026

Abstract

General Background: The rapid adoption of artificial intelligence in smart tourism has increased the use of contextual chatbots to deliver destination information efficiently.

Specific Background: However, tourism chatbots based on Large Language Models frequently hallucinate information, which reduces their reliability when handling dynamic, local tourism data.

Knowledge Gap: Existing studies mainly focus on rule-based or single-model chatbot implementations and offer limited comparative evaluation of Retrieval Augmented Generation configurations that combine embedding models with Large Language Models.

Aims: This study comparatively evaluates multiple Retrieval Augmented Generation configurations to identify the most suitable combination for contextual tourism chatbots, and analyzes the differences between large multilingual and small monolingual embedding models on a local tourism dataset.

Results: An experimental evaluation using data from 49 tourist destinations in Banyumas Regency shows that the Multilingual-E5-Large embedding model consistently achieves perfect Precision, Recall, and F1-Score across all tested Large Language Models. The combination of Multilingual-E5-Large and GPT-4.1-Mini shows the most balanced performance, achieving a BERTScore F1 of 0.7515 with an average response time of 1.555 seconds.

Novelty: This research provides a systematic comparative assessment of embedding capacity and Large Language Model selection within a unified Retrieval Augmented Generation framework for tourism chatbots.

Implications: The findings offer practical guidance for selecting model configurations that ensure accurate retrieval, high-quality responses, and efficient system performance in contextual tourism information services.
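
The abstract reports retrieval Precision, Recall, and F1-Score together with a BERTScore F1 for generated answers, but does not show how these are computed. The Python sketch below is one plausible reading, not the authors' implementation. It assumes that retrieval ranks destination documents by cosine similarity and that retrieval quality is scored as the set overlap between retrieved and relevant destination IDs, and it implements only the unweighted greedy-matching form of BERTScore (no idf weighting, no baseline rescaling). The helper names (`cosine`, `retrieve`, `retrieval_prf`, `greedy_match_prf`), the toy vectors, and the IDs `dest_a` to `dest_c` are hypothetical stand-ins for real Multilingual-E5-Large sentence embeddings and contextual token embeddings.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    """Hypothetical helper: IDs of the k indexed documents most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def retrieval_prf(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based Precision, Recall, and F1 of retrieved IDs against the relevant IDs."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, f1_from(precision, recall)


def greedy_match_prf(cand: List[Vector], ref: List[Vector]) -> Tuple[float, float, float]:
    """Unweighted BERTScore-style P, R, F1: each token is greedily matched to its
    most similar token on the other side by cosine similarity."""
    if not cand or not ref:
        raise ValueError("candidate and reference must each contain at least one token")
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Toy 3-d document embeddings standing in for Multilingual-E5-Large vectors.
    index = {
        "dest_a": [0.9, 0.1, 0.0],
        "dest_b": [0.1, 0.9, 0.1],
        "dest_c": [0.2, 0.2, 0.9],
    }
    top = retrieve([0.85, 0.15, 0.05], index, k=1)
    print(top, retrieval_prf(top, {"dest_a"}))

    # Toy 2-d token embeddings: a two-token answer scored against a one-token reference.
    print(greedy_match_prf([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

For the toy query, `retrieve` returns `['dest_a']` and `retrieval_prf` gives `(1.0, 1.0, 1.0)`. Under this reading, a perfect retrieval score like the one reported for Multilingual-E5-Large means every retrieved destination was relevant and every relevant destination was retrieved. The BERTScore-style call yields P = 0.5, R = 1.0, and F1 of about 0.667, because the extra answer token has no close match in the reference. Published BERTScore values usually come from a library that uses contextual token embeddings and may apply idf weighting or baseline rescaling, so they are not directly comparable to this unweighted sketch.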
Highlights
• Multilingual embedding models deliver consistently higher retrieval accuracy across all tested configurations
• GPT-4.1-Mini offers the best balance between generative quality and response latency
• Embedding model selection plays a more decisive role than language model variation

Keywords: Retrieval Augmented Generation; Tourism Chatbot; Large Language Model; Embedding Model; Comparative Evaluation

Copyright © 2026






Journal Info

Abbrev

ijins

Subject

Computer Science & IT; Education; Engineering; Law, Crime, Criminology & Criminal Justice

Description

Indonesian Journal of Innovation Studies (IJINS) is a peer-reviewed journal published by Universitas Muhammadiyah Sidoarjo four times a year. This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global ...