Rakandhiya Daanii Rachmanto
Universitas AMIKOM, Yogyakarta, Indonesia

Published: 2 Documents
Articles


Deep Learning Model Compression Techniques Performance on Edge Devices
Rakandhiya Daanii Rachmanto; Ahmad Naufal Labiib Nabhaan; Arief Setyanto
MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, Vol. 23 No. 3 (2024)
Publisher : LPPM Universitas Bumigora

DOI: 10.30812/matrik.v23i3.3961

Abstract

Artificial intelligence at the edge can help solve complex tasks faced by sectors such as automotive, healthcare, and surveillance. However, constrained by the limited computational power of edge devices, artificial intelligence models are forced to adapt. Many model compression approaches have been developed and quantified over the years to tackle this problem. However, few have considered the overhead of on-device model compression, even though compression can take a considerable amount of time. With this added metric, we provide a more complete view of the efficiency of model compression on the edge. The objective of this research is to identify the benefits of compression methods and their trade-off between size and latency reduction on the one hand and accuracy loss and compression time on the other, on edge devices. In this work, a quantitative method is used to analyze and rank three common model compression techniques: post-training quantization, unstructured pruning, and knowledge distillation, on the basis of accuracy, latency, model size, and time-to-compress overhead. We concluded that knowledge distillation performs best, with the potential for up to 11.4x model size reduction and a 78.67% latency speedup, at a moderate cost in accuracy and compression time.
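
The listing itself gives no implementation details, but as an illustrative sketch of one of the three techniques the paper compares, the snippet below applies post-training dynamic-range quantization with TensorFlow Lite, a common toolchain for edge deployment; the saved-model path and output file name are placeholders, not taken from the paper.

    # Illustrative sketch only: post-training (dynamic-range) quantization
    # with TensorFlow Lite. "saved_model_dir" and the output file name are
    # hypothetical placeholders, not details from the paper.
    import os
    import tensorflow as tf

    # Load a trained model exported in TensorFlow's SavedModel format.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

    # Optimize.DEFAULT enables dynamic-range quantization: weights are
    # stored as 8-bit integers, typically shrinking the model about 4x.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)

    # Model size is one of the metrics the paper ranks techniques on.
    print(f"Quantized model: {os.path.getsize('model_quantized.tflite')} bytes")

Dynamic-range quantization needs no calibration data; full-integer quantization (supplying a representative dataset to the converter) can reduce latency further on integer-only edge accelerators, at some additional risk to accuracy.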