The rapid advancement of language-based artificial intelligence, particularly large language models (LLMs), has enabled the development of adaptive, context-aware virtual assistants. This research develops an interactive chatbot for Politeknik Negeri Padang using the Sahabat-AI model (Gemma2 9B CPT), a large-scale model specialized in Bahasa Indonesia and local languages (Javanese, Sundanese), combined with Retrieval-Augmented Generation (RAG) to improve the accuracy of document-based answers. The system architecture integrates a Streamlit-based user interface supporting text and voice input with multilingual output, an automated web-scraping module built on Scrapy to keep institutional data up to date, a structured knowledge base in Supabase, and semantic vector search with FAISS. Development followed a systematic design and implementation approach, with the RAG pipeline using all-indo-e5-small-v4 embeddings to ensure semantic relevance. Evaluation with LangSmith showed that Sahabat-AI outperformed Llama 3, achieving an average score of 0.84 (correctness: 0.89, relevance: 0.90, groundedness: 0.80, retrieval quality: 0.77) in Indonesian-language testing. The chatbot also showed strong local-language understanding, scoring 0.74 for Javanese and 0.71 for Sundanese, while RAG integration reduced hallucinations. Black-box testing confirmed the reliability of multimodal features such as speech-to-text and text-to-speech. The findings contribute to the development of the first Sahabat-AI-based multilingual chatbot for Politeknik Negeri Padang, integrating automated document retrieval and embedding pipelines for efficient information services.
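The retrieval step of the RAG pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for the all-indo-e5-small-v4 embeddings, brute-force cosine similarity stands in for the FAISS index, and the documents in `DOCS`, the function names, and the prompt wording are all made-up placeholders rather than actual institutional data.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Assemble a grounded prompt from the retrieved context (hypothetical wording)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Invented sample documents, for illustration only.
DOCS = [
    "Pendaftaran mahasiswa baru dibuka pada bulan Mei.",
    "Perpustakaan kampus buka setiap hari kerja.",
    "Biaya kuliah dibayar setiap awal semester.",
]

if __name__ == "__main__":
    print(build_prompt("Kapan pendaftaran mahasiswa baru dibuka?", DOCS, k=1))
```

In a full system, the prompt produced by `build_prompt` would be passed to the language model, so that the answer is grounded in retrieved text rather than in the model's parameters alone.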
Copyrights © 2026