Benchmarking Techniques for Real-Time Evaluation of LLMs in Production Systems
Chandra, Reena; Bansal, Rishab; Lulla, Karan
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher: Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.955

Abstract

Large language models (LLMs) must run reliably and efficiently in today's AI-powered chatbots, copilots, and search systems. Traditional benchmarking focuses mainly on linguistic understanding and task accuracy, while critical deployment factors such as latency, memory consumption, and optimisation are ignored. This study proposes a benchmarking framework that evaluates LLMs along four critical dimensions: throughput (tokens processed per second), accuracy, peak memory usage, and efficiency. Using the Open LLM Performance dataset, 350 open-source models spanning various families and parameter sizes were examined with standardized tools and methods. The results indicate that mid-scale models such as TinyStories-33M and OPT-19M are well suited to practical use, delivering high token throughput with a small memory footprint. ONNX Runtime uses less memory than PyTorch, and applying LLM.fp4 quantisation substantially increases throughput without a significant loss in accuracy. Visualisations and rankings are provided to guide production model selection. The framework helps AI engineers, MLOps teams, and system architects identify models that can be built, deployed, scaled, and operated within budget. It improves LLM assessment by relating technical metrics to the practical constraints of real systems, enabling smarter choices for production deployments.
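Two of the four dimensions the abstract names, throughput (tokens per second) and peak memory, can be measured with a small harness. The sketch below is illustrative only, not the paper's actual framework: the `benchmark_generation` helper and the toy `generate` callable are hypothetical stand-ins for a real model's generation call.

```python
import time
import tracemalloc

def benchmark_generation(generate, prompt, n_runs=3):
    """Measure throughput (tokens/sec) and peak memory for a generation callable.

    `generate` is any function prompt -> list of tokens; a real harness
    would pass in a model's generate method. Returns
    (tokens_per_sec, peak_mem_bytes).
    """
    tracemalloc.start()
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(n_runs):
        tokens = generate(prompt)
        total_tokens += len(tokens)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total_tokens / elapsed, peak

# Toy stand-in for a model: splits the prompt into word "tokens".
tps, peak = benchmark_generation(lambda p: p.split(), "a b c d", n_runs=10)
```

Averaging over several runs smooths timer jitter; a production harness would additionally fix seeds, warm up the model, and record accuracy against a task suite to cover the remaining two dimensions.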
Factory-Grade Diagnostic Automation for GeForce and Data Centre GPUs
Lulla, Karan; Chandra, Reena; Ranjan, Kishore
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher: Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.1089

Abstract

The growing deployment of Graphics Processing Units (GPUs) across data centers, AI workloads, and cryptocurrency mining operations has elevated the importance of scalable, accurate, and real-time diagnostic mechanisms for hardware quality assurance (QA). Traditional factory QA processes are manual, time-consuming, and lack adaptability to subtle performance degradation. This study proposes an automated diagnostic pipeline that leverages publicly available GPU telemetry-like data, including hashrate, power draw, and efficiency metrics, to simulate factory-grade fault detection. Using the Kaggle “GPU Performance and Hashrate” dataset, we implement a machine learning-based framework combining XGBoost for anomaly classification and Long Short-Term Memory (LSTM) neural networks for temporal efficiency forecasting. Anomalies are heuristically labeled by identifying GPUs in the bottom 10% of the efficiency distribution, simulating fault flags. The XGBoost model achieves perfect accuracy on the test set with full interpretability via SHAP values, while the LSTM model captures degradation trends with low training loss and forecast visualizations. The framework is implemented in Google Colab to ensure accessibility and reproducibility. Diagnostic outputs include efficiency analysis, prediction overlays, and automated GPU health reports. Comparative results show higher efficiency variance in GeForce GPUs versus the more stable performance of data center models, highlighting hardware class differences. While limitations exist, such as reliance on simulated labels and static time windows, the study demonstrates the feasibility of ML-driven, scalable diagnostics using real-world data. This approach has direct applications in early fault detection, GPU fleet management, and embedded QA systems in both production and deployment environments.
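The heuristic labeling step, flagging GPUs in the bottom 10% of the efficiency distribution as simulated faults, can be illustrated with a short sketch. The 10% threshold is the one stated in the abstract; the function name and sample efficiency values below are hypothetical.

```python
def label_anomalies(efficiencies, quantile=0.10):
    """Flag units in the bottom `quantile` of the efficiency distribution.

    Returns a list of booleans aligned with the input
    (True = heuristically labeled anomalous).
    """
    ranked = sorted(efficiencies)
    # Index of the value at the requested quantile (at least one unit flagged).
    cutoff_idx = max(int(len(ranked) * quantile) - 1, 0)
    threshold = ranked[cutoff_idx]
    return [e <= threshold for e in efficiencies]

# Hypothetical per-GPU efficiency scores (e.g. hashrate per watt, normalised).
effs = [0.50, 0.62, 0.58, 0.12, 0.55, 0.60, 0.57, 0.59, 0.61, 0.54]
flags = label_anomalies(effs)  # only the 0.12 outlier is flagged
```

These binary flags would then serve as training labels for the XGBoost classifier, while the time-ordered efficiency series feeds the LSTM forecaster.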