G-Tech : Jurnal Teknologi Terapan
Vol. 9 No. 2 (April 2025)

Compressing Large Language Models (LLMs) using Knowledge Distillation for Optimizing Inference Time and Model Size

Tarecha, Rachmad Imam
Choirina, Priska
Septarina, Amalia Agung



Article Info

Publish Date
26 Apr 2025

Abstract

Large Language Models (LLMs) contain vast numbers of parameters and are correspondingly large on disk. The DeepSeek-V3 model, for instance, comprises approximately 671 billion parameters and occupies up to 720 GB. This parameter count reflects the high complexity of LLMs, which can be both an advantage and a drawback, particularly when such models are deployed in environments with limited computational resources. This study compresses a custom-built lightweight model by applying knowledge distillation to an LLM. The results show that the model's parameter count can be reduced by up to 94.18%, its file size by up to 71.00%, and its inference time by up to 1.13%. Despite these reductions, the model remains capable of performing specialized tasks with satisfactory accuracy. This finding underscores the potential of knowledge distillation as an effective method for reducing model size while maintaining operational efficiency, particularly when a model's computational demands are mismatched with the resources available. Efficiency in knowledge distillation is achieved through a combination of model size reduction and the alignment of computational capacity with task-specific requirements.
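The abstract does not spell out the training objective used for distillation; for orientation, the sketch below shows the standard soft-target distillation loss of Hinton et al. (2015), which knowledge-distillation pipelines of this kind commonly build on. The PyTorch code, the temperature of 2.0, and the alpha weighting are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation loss (Hinton et al., 2015).

    Combines cross-entropy on hard labels with a KL-divergence term
    that pushes the student's softened output distribution toward the
    teacher's. Hyperparameters here are illustrative assumptions.
    """
    # Hard-label loss: standard cross-entropy against ground truth.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft-label loss: KL divergence between temperature-softened
    # distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    return alpha * hard_loss + (1 - alpha) * soft_loss

# Toy usage with random logits over a 10-token vocabulary.
if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.randn(4, 10, requires_grad=True)  # small student
    teacher = torch.randn(4, 10)                      # frozen teacher
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student, teacher, labels)
    print(f"distillation loss: {loss.item():.4f}")

In a setup like the one the abstract describes, the teacher would be the large LLM and the student the custom-built lightweight model; only the student's parameters are updated during training.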

Copyrights © 2025






Journal Info

Abbrev

g-tech

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Energy Engineering

Description

Jurnal G-Tech aims to publish original research and reviews of research on applied technology within the scope of engineering, including mechanical engineering, electrical engineering, informatics, information systems, agrotechnology, ...