Large Language Models (LLMs) contain a vast number of parameters and are correspondingly large on disk. For instance, the DeepSeek-V3 model comprises approximately 671 billion parameters and has a file size of roughly 720 GB. This scale reflects the high complexity of LLMs, which can be both an advantage and a drawback, particularly when such models are deployed in environments with limited computational resources. This study focuses on compressing a custom-built lightweight model by applying knowledge distillation to an LLM. The results indicate that the model’s parameter count can be reduced by up to 94.18%, its file size by up to 71.00%, and its inference time by up to 1.13%. Notably, despite these reductions, the model remains capable of performing specialized tasks with satisfactory accuracy. This finding underscores the potential of knowledge distillation as an effective method for reducing model size while maintaining operational efficiency, particularly when a model’s capacity exceeds what the deployment environment and the target task actually require. Efficiency in knowledge distillation is achieved through a combination of model size reduction and the alignment of computational capacity with task-specific requirements.
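For orientation only, and not the authors’ exact training setup, knowledge distillation is commonly implemented by training the small student model to match the teacher’s softened output distribution alongside the ordinary supervised objective. The sketch below shows this standard combined loss; the temperature T and weighting alpha are illustrative hyperparameters, not values taken from this study.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation objective: a weighted sum of a
    soft-target term (student matches the temperature-softened teacher
    distribution) and the usual cross-entropy against hard labels."""
    # Soft-target term: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients stay comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary supervised cross-entropy.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```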