Large Language Models (LLMs) have achieved remarkable success across natural language tasks, but their enormous computational requirements pose challenges for practical deployment. This paper proposes a hybrid cloud–edge architecture for deploying LLMs cost-effectively and efficiently. The proposed system employs a lightweight on-premise LLM to handle the bulk of user requests and dynamically offloads complex queries to a powerful cloud-hosted LLM only when necessary. We implement a confidence-based routing mechanism to decide when to invoke the cloud model. Experiments on a question-answering use case demonstrate that our hybrid approach can match the accuracy of a state-of-the-art LLM while reducing cloud API usage by over 60%, yielding significant cost savings and an approximately 40% reduction in average latency. We also discuss how the hybrid strategy enhances data privacy by keeping sensitive queries on-premise. These results highlight a promising direction for organizations to leverage advanced LLM capabilities, without prohibitive expense or risk, by intelligently combining local and cloud resources.
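To make the confidence-based routing idea concrete, the following minimal sketch shows one way such a router could be structured; the threshold value, the confidence score, and the model placeholders are illustrative assumptions, not the configuration evaluated in this paper.

```python
# Minimal sketch of a confidence-based cloud/edge router (illustrative only).
# The threshold, the confidence measure, and the model stubs are assumptions,
# not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # e.g., a calibrated score or mean token probability in [0, 1]


def local_llm(query: str) -> Answer:
    """Placeholder for the lightweight on-premise model."""
    raise NotImplementedError


def cloud_llm(query: str) -> str:
    """Placeholder for the cloud-hosted model's API call."""
    raise NotImplementedError


def route(query: str, threshold: float = 0.7) -> str:
    """Answer locally when the on-premise model is confident; otherwise offload."""
    local = local_llm(query)
    if local.confidence >= threshold:
        return local.text   # query stays on-premise
    return cloud_llm(query)  # only hard cases reach the cloud API
```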