Journal : Journal of Information Systems Engineering and Business Intelligence

Academic Guidebook Chatbot: Performance Comparison of Fine-Tuned Mistral 7B and LlaMA-2 7B
Rachman, Davied Indra; Akbar, Agus Subhan; Sabilla, Alzena Dona
Journal of Information Systems Engineering and Business Intelligence Vol. 11 No. 3 (2025): October
Publisher : Universitas Airlangga

DOI: 10.20473/jisebi.11.3.383-392

Abstract

Background: Chatbots have recently become a leading technological solution, driven by high demand for fast and efficient information retrieval. This study therefore developed a local document-based chatbot that answers questions about the contents of PDF documents using open-source AI models, namely Mistral 7B and LLaMA-2 7B. Although these models are effective at processing natural language, a major challenge is their tendency to generate hallucinated answers, that is, answers that are inaccurate or out of context.

Objective: This study aims to reduce hallucinated responses from the chatbot models by fine-tuning them to give more precise and accurate answers. It also compares the performance of the fine-tuned Mistral 7B and LLaMA-2 7B models.

Methods: Both models were fine-tuned on a domain-specific dataset taken from the Academic Guidebook, to improve the models' ability to understand and answer questions in the Academic Guidebook context. Performance was evaluated using the METEOR score to measure literal agreement and BERTScore to assess agreement in meaning. In addition, response time was measured to assess efficiency. The chatbot system was built with Streamlit and LangChain for real-time interaction.

Results: The fine-tuned Mistral 7B model achieved the highest scores, with a METEOR value of 0.40 and a BERTScore F1 of 0.78. Regarding efficiency, the fine-tuned Mistral 7B responded faster than LLaMA-2. The non-fine-tuned Mistral 7B and LLaMA-2 7B models both showed longer response times than their fine-tuned counterparts.

Conclusion: The results showed that the enhancements significantly improved the performance of large language models on specific tasks, reduced hallucinations, and enhanced response quality.

Keywords: Chatbot, Large Language Model, Mistral 7B, LLaMA-2 7B, METEOR Score
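
To illustrate the kind of evaluation described under Methods, here is a minimal Python sketch of a literal-agreement score and a response-time measurement. It is not the authors' code. The function `simple_meteor` is a hypothetical, simplified METEOR that uses exact word matches only (no stemming or synonym matching) and a greedy alignment, so its values will differ from those of a full METEOR implementation. The helper `timed` and the stand-in answer function in the usage note are likewise assumptions made for illustration.

```python
import time
from typing import Callable, List, Tuple


def simple_meteor(reference: str, hypothesis: str) -> float:
    """Simplified METEOR: exact matches only, greedy alignment (an assumption)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref or not hyp:
        return 0.0

    # Greedy alignment: match each hypothesis token to the first
    # unused identical reference token.
    used = [False] * len(ref)
    alignment: List[Tuple[int, int]] = []  # (hypothesis index, reference index)
    for i, tok in enumerate(hyp):
        for j, ref_tok in enumerate(ref):
            if not used[j] and tok == ref_tok:
                used[j] = True
                alignment.append((i, j))
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # Recall-weighted harmonic mean from the original METEOR formulation.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that are adjacent in both texts.
    chunks = 1
    for (h0, r0), (h1, r1) in zip(alignment, alignment[1:]):
        if h1 != h0 + 1 or r1 != r0 + 1:
            chunks += 1

    # Fragmentation penalty: more chunks means worse word order.
    penalty = 0.5 * (chunks / m) ** 3
    return f_mean * (1 - penalty)


def timed(answer_fn: Callable[[str], str], question: str) -> Tuple[str, float]:
    """Return the answer and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    answer = answer_fn(question)
    return answer, time.perf_counter() - start
```

A call such as `timed(my_chatbot, "When does registration open?")`, where `my_chatbot` is any function that maps a question to an answer string, returns the answer together with its latency. The answer can then be passed to `simple_meteor` along with a reference answer. A semantic metric such as BERTScore would need a pretrained model and is not sketched here.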