Journal : Jurnal Sisfokom (Sistem Informasi dan Komputer)

Comparative Analysis of RAG-Based Open-Source LLMs for Indonesian Banking Customer Service Optimization Using Simulated Data
Authors : Lijaya, Hendra; Ho, Patricia; Santoso, Handri
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol. 14 No. 3 (2025): JULY
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v14i3.2383

Abstract

In the digital era, banks face challenges in delivering fast, accurate, and efficient customer service, especially for simple, frequently asked questions. This study evaluates the effectiveness of three open-source Large Language Models (LLMs), namely Gemma2-9B-Sahabat-AI, Qwen2.5-14B-Instruct, and Mistral-Nemo-Instruct, in supporting a Retrieval-Augmented Generation (RAG) question-answering system for the banking sector. The system indexed 12,000 synthetic billing documents with intfloat/multilingual-e5-large-instruct embeddings (1024 dimensions), and model performance was assessed using semantic similarity metrics, LLM-as-a-Judge scores (GPT-4o-mini and Gemini 2.0 Flash), and human validation. Gemma2-9B-Sahabat-AI achieved the highest semantic similarity score (0.9627), followed by Mistral (0.9614) and Qwen2.5 (0.9284). In the LLM-as-a-Judge evaluations, Qwen2.5 ranked highest under GPT-4o-mini (92.2), while Gemma2 led under Gemini 2.0 Flash (88.4). Human evaluators gave perfect scores on the factual questions (questions 1 to 10), but all models struggled with the arithmetic in question 13. Gemma2’s average response time was 41 seconds, faster than Mistral’s 48 seconds and Qwen2.5’s 72 seconds, indicating that Gemma2 offers a balance of accuracy, speed, and computational efficiency. These findings underscore the potential of locally operated open-source LLMs for banking applications, where local deployment supports privacy and regulatory compliance. Limitations include reliance on synthetic data, a narrow question set, and a lack of user diversity. Future research should involve broader queries, testing with real users, and numeric reasoning modules to support robust and scalable deployment in real-world banking customer service environments.
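
As a rough illustration of the embedding-based retrieval and semantic similarity scoring mentioned in the abstract, the sketch below ranks documents by cosine similarity to a query vector. It assumes cosine similarity is the measure used, which the abstract does not state. The three-dimensional vectors, the document ids, and the helper names `cosine_similarity` and `top_k` are hypothetical stand-ins; the study itself uses 1024-dimensional embeddings produced by intfloat/multilingual-e5-large-instruct.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings (assumption:
# in practice these would come from an embedding model, not be hand-written).
docs = {
    "bill_001": [1.0, 0.0, 0.0],
    "bill_002": [0.0, 1.0, 0.0],
    "bill_003": [1.0, 1.0, 0.0],
}
query = [1.0, 0.2, 0.0]

for doc_id, score in top_k(query, docs, k=2):
    print(doc_id, round(score, 4))
```

In a RAG pipeline of the kind described, the top-ranked documents would be passed to the LLM as context. The same similarity function could also compare the embedding of a generated answer with that of a reference answer, which is one common way to compute a semantic similarity score.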