Michelle Hiu
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

An Indonesian Chatbot for Disease Diagnosis Using Retrieval-Augmented Generation Muhammad Adrinta Abdurrazzaq; Edwin Lesmana Tjiong; Aulia Fasya; Michelle Hiu; Joses Tanuwidjaya
INOVTEK Polbeng - Seri Informatika Vol. 10 No. 3 (2025): November
Publisher : P3M Politeknik Negeri Bengkalis

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.35314/9nnkn955

Abstract

The rapid advancement of Large Language Models (LLMs) has enabled their use in medical information systems, although challenges such as hallucinations, domain mismatches, and the lack of a verified knowledge base remain significant, particularly in low-source languages ​​like Indonesian. This study introduces an Indonesian-language medical chatbot based on the open-source GPT-OSS-20B model enhanced through a Retrieval-Augmented Generation (RAG) pipeline. The system combines semantic retrieval using jina-embeddings-v3, lexical re-ranking with the BM25 algorithm, and a lightweight Logistic Regression-based domain filter as an initial filter to prevent out-of-domain LLM usage. Evaluation using Indonesian medical articles and annotated patient-doctor conversations shows that the domain filter works well on synthetic data but results in misclassification of natural queries. A hybrid weighted reranker (FAISS L2 + BM25) performed the best with a Top-30 accuracy of 0.699. Black-box testing indicates that the system flow functions as designed, although the response quality has not been validated by clinical experts. These findings suggest that RAG-based open-source LLMs can improve access to Indonesian-language medical information, but still have important limitations such as the lack of clinical validation, potential errors in scraped data, and suboptimal robustness of domain filters.