Articles

Found 2 Documents

Benchmarking Techniques for Real-Time Evaluation of LLMs in Production Systems
Chandra, Reena; Bansal, Rishab; Lulla, Karan
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher : Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.955

Abstract

Large language models (LLMs) must perform reliably and efficiently in today's AI-driven applications such as chatbots, copilots, and search systems. Traditional benchmarking focuses mainly on linguistic understanding and task accuracy, while practical factors such as latency, memory consumption, and optimisation are ignored. This study proposes a benchmarking framework that evaluates LLMs along four critical dimensions: throughput (tokens processed per second), accuracy, peak memory usage, and efficiency. Using the Open LLM Performance dataset, 350 open-source models across various families and parameter sizes were examined with standardised tools and methods. The results indicate that mid-scale models such as TinyStories-33M and OPT-19M are well suited for practical use because they deliver high token throughput with a small memory footprint. ONNX Runtime uses less memory than PyTorch, and applying LLM.fp4 quantisation greatly increases throughput without a significant loss in accuracy. Visualisations and rankings are presented to guide production model selection. By following the framework, AI engineers, MLOps teams, and system architects can identify models that can be built, deployed, scaled, and operated within budget. The framework improves LLM assessment by relating technical measures to the practical constraints of real systems, enabling smarter choices for production deployments.
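
A minimal sketch of the kind of throughput and peak-memory measurement the abstract describes, using PyTorch and Hugging Face Transformers. The model id, prompt, and generation settings below are illustrative assumptions, not the paper's actual benchmarking harness:

```python
# Sketch: measure tokens/second and peak GPU memory for one model.
# Model id and prompt are hypothetical choices for illustration.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "roneneldan/TinyStories-33M"  # one of the mid-scale models named above

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")

if torch.cuda.is_available():
    model.cuda()
    inputs = {k: v.cuda() for k, v in inputs.items()}
    torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
elapsed = time.perf_counter() - start

# Throughput: newly generated tokens divided by wall-clock time.
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"throughput: {new_tokens / elapsed:.1f} tokens/s")

if torch.cuda.is_available():
    peak_mib = torch.cuda.max_memory_allocated() / 2**20
    print(f"peak GPU memory: {peak_mib:.0f} MiB")
```

Repeating this loop over many models and runtimes (e.g. PyTorch vs. ONNX Runtime) yields the kind of throughput and memory rankings the abstract reports.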
Operationalizing No-Code AI: Cross-Functional Implementation and Organizational Impact
Mukesh Shah, Binita; Bansal, Rishab
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher : Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.1190

Abstract

This paper explores how non-technical teams drive organizational adoption of so-called no-code AI platforms and the quantifiable results they achieve. Using a sequential mixed-method design, data from 32 organizations across six industries were complemented by large-scale datasets such as the Stack Overflow Developer Survey (n = 73,268) and the Kaggle Data Science Skills dataset (n = 25,973). Hierarchical clustering produced three adopter profiles: early adopters in marketing and operations, pragmatic adopters in customer service and HR, and conservative adopters in finance and legal, with large differences in adoption (37.82-fold, asymptotic p = 0.001). Regression analysis identified function-specific success predictors: MarTech integration for marketing systems (β = 0.43, p = 0.001), systems integration for operations (β = 0.52, p = 0.001), and privacy protection for HR systems (β = 0.56, p = 0.001). Productivity analysis showed that initial implementation costs reduced output by 7 percentage points in the first month, a loss recovered within 2-3 months for marketing/operations and within 4-6 months for other functions. Over twelve months, long-term returns amounted to 37 per cent for marketing, 31 per cent for operations, and 26 per cent for customer service. ROI calculations verified the three clusters: high ROI in marketing/operations (143%-217%), moderate ROI in customer service (87%-112%), and delayed ROI in HR, finance, and legal (31%-64%). A tested implementation model was constructed that combines function-specific approaches, tiered governance, capability building, and integration methods, with good predictive validity (R² = 0.71, error rate = 12%). The evidence shows that AI democratization can be achieved through strategic alignment, risk-sensitive governance, and role-specific training that optimize AI use and its long-term organizational value.
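
A minimal sketch of the analysis pipeline the abstract outlines: hierarchical clustering into adopter profiles, followed by OLS regression for function-specific predictors. The synthetic data, column names, and predictor are hypothetical stand-ins, not the study's dataset:

```python
# Sketch: Ward hierarchical clustering + OLS regression, mirroring the
# study's two-step analysis. All data below is synthetic for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Hypothetical per-organization features (32 organizations, as in the study).
orgs = pd.DataFrame({
    "adoption_rate": rng.uniform(0.05, 0.9, size=32),
    "months_to_roi": rng.uniform(1, 12, size=32),
})

# Hierarchical (Ward) clustering into three adopter profiles.
tree = linkage(orgs.values, method="ward")
orgs["cluster"] = fcluster(tree, t=3, criterion="maxclust")

# OLS: does a (hypothetical) integration score predict a success measure?
orgs["integration_score"] = rng.uniform(0, 1, size=32)
orgs["success"] = 0.5 * orgs["integration_score"] + rng.normal(0, 0.1, 32)

X = sm.add_constant(orgs[["integration_score"]])
fit = sm.OLS(orgs["success"], X).fit()
print(fit.params, fit.pvalues, sep="\n")  # beta coefficients and p-values
```

The fitted coefficients and p-values correspond in form to the β and p statistics the abstract reports for each business function.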