Evaluating IndoGPT for Legal Queries: A Benchmark Against GPT-4 Models
Palupi, Ade Cahyaning; Irawan, Ade
JITCE (Journal of Information Technology and Computer Engineering), Vol. 9 No. 2 (2025)
Publisher: Universitas Andalas


Abstract

This study evaluates a chatbot built on the large language model (LLM) IndoGPT that uses Retrieval-Augmented Generation (RAG) to answer questions about university regulations drawn from Indonesian-language legal PDF documents. IndoGPT's performance is benchmarked against two more advanced models, GPT-4 and GPT-4o. The chatbot implements RAG with the LangChain framework, and its effectiveness is assessed with the Retrieval-Augmented Generation Assessment (RAGAS) framework. The dataset consists of publicly available legal documents from Universitas Pertamina, with test data created by the authors. IndoGPT underperforms GPT-4 and GPT-4o on every metric. GPT-4 achieves Context Precision of 0.9027, Context Recall of 0.8693, Faithfulness of 0.8486, and Answer Relevancy of 0.8142. GPT-4o delivers similarly strong results, with Context Precision of 0.8896, Context Recall of 0.8594, Faithfulness of 0.8804, and Answer Relevancy of 0.8773. IndoGPT scores much lower: Context Precision of 0.6687, Context Recall of 0.5711, Faithfulness of 0.0738, and Answer Relevancy of 0.1628. These findings highlight IndoGPT's substantial limitations relative to GPT-4 and GPT-4o, both of which provide accurate, contextually relevant answers. The GPT-4-based chatbot demonstrates a strong ability to understand user queries and deliver accurate responses while effectively reducing hallucinations through the RAG technique.
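
The scores reported in the abstract are easier to compare side by side. The following is a minimal Python sketch, not the authors' evaluation code: the numbers are copied from the abstract, while the names `SCORES`, `mean_score`, `best_model`, and `gap` are hypothetical helpers introduced here for illustration. The unweighted mean is only a convenience summary and is not a metric defined by RAGAS.

```python
# RAGAS scores exactly as reported in the abstract (no other data is assumed).
SCORES = {
    "GPT-4": {
        "Context Precision": 0.9027,
        "Context Recall": 0.8693,
        "Faithfulness": 0.8486,
        "Answer Relevancy": 0.8142,
    },
    "GPT-4o": {
        "Context Precision": 0.8896,
        "Context Recall": 0.8594,
        "Faithfulness": 0.8804,
        "Answer Relevancy": 0.8773,
    },
    "IndoGPT": {
        "Context Precision": 0.6687,
        "Context Recall": 0.5711,
        "Faithfulness": 0.0738,
        "Answer Relevancy": 0.1628,
    },
}


def mean_score(model: str) -> float:
    """Unweighted mean of the four reported metrics (illustrative summary only)."""
    values = SCORES[model].values()
    return sum(values) / len(values)


def best_model(metric: str) -> str:
    """Name of the model with the highest reported score on one metric."""
    return max(SCORES, key=lambda model: SCORES[model][metric])


def gap(metric: str, reference: str, model: str) -> float:
    """How far `model` trails `reference` on one metric (positive = trails)."""
    return SCORES[reference][metric] - SCORES[model][metric]


if __name__ == "__main__":
    for metric in SCORES["GPT-4"]:
        print(
            f"{metric}: best = {best_model(metric)}, "
            f"IndoGPT trails GPT-4 by {gap(metric, 'GPT-4', 'IndoGPT'):.4f}"
        )
    for model in SCORES:
        print(f"{model}: mean of reported metrics = {mean_score(model):.4f}")
```

Read this way, the largest gaps for IndoGPT are in Faithfulness and Answer Relevancy rather than in the two context metrics, and GPT-4 and GPT-4o each lead on two of the four metrics.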