Large language models (LLMs) must perform reliably and efficiently in today's applications, including AI chatbots, copilots, and search systems. Traditional benchmarking focuses mainly on linguistic understanding and task accuracy, while critical deployment factors such as latency, memory consumption, and optimisation are ignored. This study proposes a benchmarking framework that evaluates LLMs along four critical dimensions: tokens processed per second, accuracy, peak memory usage, and efficiency. Using the Open LLM Performance dataset, 350 open-source models spanning multiple families and parameter sizes were examined with standardised tools and methods. The results show that mid-scale models such as TinyStories-33M and OPT-19M are well suited for practical use, delivering high token throughput with a small memory footprint. ONNX Runtime consumes less memory than PyTorch, and LLM.fp4 quantisation substantially increases throughput without a significant loss in accuracy. Visualisations and rankings are provided to guide production model selection. With this framework, AI engineers, MLOps teams, and system architects can identify models that can be built, deployed, scaled, and operated within budget. The framework improves LLM assessment by relating technical metrics to the practical constraints of real systems, enabling better-informed decisions for production deployments.
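As a minimal illustration of the throughput and peak-memory measurements described above, the sketch below times generation for a single model using Hugging Face transformers and PyTorch. The model repository path, prompt, and generation length are illustrative assumptions only; they are not the paper's actual harness, dataset, or model set.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice (assumed repository path); the study evaluates
# 350 models drawn from the Open LLM Performance dataset, not a single one.
MODEL_NAME = "roneneldan/TinyStories-33M"


def benchmark(model_name: str, prompt: str, max_new_tokens: int = 128) -> dict:
    """Measure token throughput and peak GPU memory for one generation run."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    if device == "cuda":
        # Reset the allocator statistics so peak memory reflects this run only.
        torch.cuda.reset_peak_memory_stats()

    start = time.perf_counter()
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start

    # Tokens generated beyond the prompt, divided by wall-clock time.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    tokens_per_sec = new_tokens / elapsed
    peak_memory_mb = (
        torch.cuda.max_memory_allocated() / 2**20 if device == "cuda" else float("nan")
    )
    return {"tokens_per_sec": tokens_per_sec, "peak_memory_mb": peak_memory_mb}


if __name__ == "__main__":
    print(benchmark(MODEL_NAME, "Once upon a time"))
```

In a full benchmark, measurements of this kind would be repeated across models, runtimes (e.g. PyTorch versus ONNX Runtime), and quantisation settings, then combined with accuracy scores to produce the rankings described above.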