This study designed and evaluated a hybrid chatbot for a domain-specific application, addressing two main issues: limited NLU coverage and the variable latency and cost incurred when every query is routed directly to an LLM. The proposed solution integrates a deterministic Rasa-based pipeline with a DeepSeek fallback mechanism. In this architecture, Rasa handles NLU processing, rules, stories, and context storage for mk and jk, while the LLM is invoked only when the NLU confidence score falls below a defined threshold. The methodology comprises an end-to-end implementation through a Node.js bridge connected to Rasa, functional testing to validate the intent–entity–action flow, and performance testing using load (stress) tests across two access paths: the Rasa REST endpoint and the Node-to-Rasa bridge. The LLM pipeline was profiled separately through instrumented action calls. The results indicate that domain-specific conversations were answered successfully using curated knowledge, and that both deterministic access paths met the service-level objective (SLO), achieving a median latency of approximately 32 milliseconds with no observed errors. The study's contribution is to demonstrate that a hybrid chatbot architecture separating deterministic and generative pipelines can maintain SLO compliance in domain-specific settings. It also highlights the limitations of LLMs in understanding domain ontologies, reinforcing the need for semantic guardrails.
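The confidence-threshold routing described above can be sketched in Node.js as follows. This is a minimal illustration, not the paper's implementation: the function names (`parseWithRasa`, `callLLMFallback`, `routeMessage`) and the threshold value of 0.7 are assumptions, since the abstract does not specify them; the Rasa parse call and the DeepSeek call are replaced with stubs whose response shapes mirror typical API output.

```javascript
// Illustrative sketch of the hybrid routing: deterministic Rasa pipeline
// first, with an LLM fallback only on low NLU confidence.
// All names and the 0.7 cutoff are assumptions, not the paper's values.

const CONFIDENCE_THRESHOLD = 0.7; // assumed cutoff; the paper's value is not given

// Stub for the Rasa NLU parse call (in practice an HTTP POST to the
// Rasa REST endpoint); the returned shape mirrors Rasa's NLU output.
async function parseWithRasa(text) {
  return { intent: { name: "greet", confidence: 0.92 }, entities: [], text };
}

// Stub for the DeepSeek fallback (in practice an API call from an action).
async function callLLMFallback(text) {
  return { source: "llm", answer: `LLM answer for: ${text}` };
}

// Route a user message: answer deterministically when confidence is high,
// otherwise invoke the generative fallback.
async function routeMessage(text) {
  const parsed = await parseWithRasa(text);
  if (parsed.intent.confidence >= CONFIDENCE_THRESHOLD) {
    return { source: "rasa", intent: parsed.intent.name };
  }
  return callLLMFallback(text);
}
```

Keeping the threshold check in the bridge (rather than inside Rasa) is one way to realize the separation of deterministic and generative pipelines that the study measures against the SLO.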
Copyright © 2026