Nabhaan, Ahmad Naufal Labiib
Unknown Affiliation

Published : 3 Documents
Articles

Found 3 Documents

Characterizing Hardware Utilization on Edge Devices when Inferring Compressed Deep Learning Models
Nabhaan, Ahmad Naufal Labiib; Rachmanto, Rakandhiya Daanii; Setyanto, Arief
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 24 No. 1 (2024)
Publisher : LPPM Universitas Bumigora

DOI: 10.30812/matrik.v24i1.3938

Abstract

Implementing edge AI involves running AI algorithms near the sensors. Deep Learning (DL) models have successfully tackled image classification tasks with remarkable performance; however, their demand for large computing resources hinders deployment on edge devices. Compressing a model is therefore essential to make DL models feasible on edge hardware. Post-training quantization (PTQ) is a compression technique that reduces the bit representation of a model's weight parameters. This study examines the impact of memory allocation on the latency of compressed DL models on the Raspberry Pi 4 Model B (RPi4B) and NVIDIA Jetson Nano (J. Nano), aiming to understand utilization of the central processing unit (CPU), graphics processing unit (GPU), and memory. A quantitative method was used: memory allocation was controlled while warm-up time, latency, and CPU and GPU utilization were measured, and inference speed was compared between DL models on the RPi4B and J. Nano. The paper observes the correlation between hardware utilization and the resulting DL inference latencies. According to the experiments, smaller memory allocation led to higher latency on both the RPi4B and J. Nano. CPU utilization on the RPi4B increases along with memory allocation; the opposite holds on the J. Nano, since the GPU carries out the main computation on that device. Regarding computation, a smaller DL model size and smaller bit representation lead to faster inference (lower latency), while a larger bit representation of the same DL model leads to higher latency.
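
The abstract does not name the deployment toolchain; the sketch below assumes TensorFlow Lite, a common runtime for CPU inference on the Raspberry Pi 4 (the Jetson Nano's GPU path, e.g. TensorRT, is not covered here). It illustrates PTQ to 8-bit weights followed by the warm-up and steady-state latency measurements described above. The model path, input shape, and run counts are illustrative assumptions, not values from the paper.

import time
import numpy as np
import psutil            # CPU utilization sampling
import tensorflow as tf

# --- Post-training quantization (PTQ) to 8-bit integer weights ---
# "model.h5" and the calibration input shape are hypothetical placeholders.
model = tf.keras.models.load_model("model.h5")

def representative_data_gen():
    # Calibration samples let the converter choose quantization ranges.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)

# --- Warm-up time and steady-state latency ---
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

x = np.random.rand(*inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], x)

t0 = time.perf_counter()
interpreter.invoke()                      # first call pays one-time setup cost
warmup_s = time.perf_counter() - t0

latencies = []
for _ in range(50):
    t0 = time.perf_counter()
    interpreter.invoke()
    latencies.append(time.perf_counter() - t0)

print(f"warm-up: {warmup_s * 1e3:.1f} ms")
print(f"mean latency: {np.mean(latencies) * 1e3:.1f} ms")
print(f"CPU utilization: {psutil.cpu_percent(interval=1):.1f}%")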
Deep Learning Model Compression Techniques Performance on Edge Devices
Rachmanto, Rakandhiya Daanii; Nabhaan, Ahmad Naufal Labiib; Setyanto, Arief
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 23 No. 3 (2024)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v23i3.3961

Abstract

Artificial intelligence at the edge can help solve complex tasks faced by sectors such as automotive, healthcare, and surveillance. However, constrained by the limited computational power of edge devices, artificial intelligence models are forced to adapt. Many model compression approaches have been developed and quantified over the years to tackle this problem, yet few have considered the overhead of on-device model compression, even though compression can take a considerable amount of time. With this added metric, we provide a more complete view of the efficiency of model compression on the edge. The objective of this research is to identify the benefit of compression methods and the trade-off between size and latency reduction versus accuracy loss and compression time on edge devices. In this work, a quantitative method is used to analyze and rank three common model compression techniques: post-training quantization, unstructured pruning, and knowledge distillation, on the basis of accuracy, latency, model size, and time-to-compress overhead. We concluded that knowledge distillation is the best, with the potential of up to 11.4x model size reduction and a 78.67% latency speed-up, with moderate accuracy loss and compression time.
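
The abstract does not specify a framework; the sketch below assumes PyTorch and shows a standard knowledge-distillation loss (Hinton-style soft targets), the technique the paper ranks best. The temperature, weighting, and function names are illustrative assumptions rather than the authors' setup. Unstructured pruning, another technique compared, can be sketched similarly with torch.nn.utils.prune.l1_unstructured.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: student mimics the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # rescale to keep gradient magnitude
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# One training step (teacher frozen, student updated); models and data assumed.
def distill_step(teacher, student, optimizer, x, y):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()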