International Journal Science and Technology (IJST)
Vol. 3 No. 3 (2024): November: International Journal Science and Technology

Efficient TinyML Architectures for On-Device Small Language Models: Privacy-Preserving Inference at the Edge

Mangesh Pujari (Unknown)
Anshul Goel (Unknown)
Anil Kumar Pakina (Unknown)



Article Info

Publish Date
28 Nov 2024

Abstract

Deploying small language models (SLMs) on ultra-low-power edge devices requires careful optimization to meet strict memory, latency, and energy constraints while preserving privacy. This paper presents a systematic approach to adapting SLMs for TinyML, focusing on model compression, hardware-aware quantization, and lightweight privacy mechanisms. We introduce a sparse ternary quantization technique that reduces model size by 5.8× with minimal accuracy loss, and an efficient federated fine-tuning method for edge deployment. To address privacy concerns, we implement on-device differential noise injection during text preprocessing, which adds negligible computational overhead. Evaluations on constrained devices (Cortex-M7 and ESP32) show that our optimized models achieve 92% of the accuracy of full-precision baselines while operating within 256 KB of RAM and reducing inference latency by 4.3×. The proposed techniques enable new applications for SLMs in always-on edge scenarios where both efficiency and data protection are critical.
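
The abstract does not describe how the sparse ternary quantization works, so the following is only a minimal sketch of a generic threshold-based ternarization (in the style of Ternary Weight Networks), not the authors' method. The function names, the 0.7 threshold ratio, and the 2-bit packing layout are all assumptions made for illustration.

```python
from typing import List, Tuple


def ternarize(weights: List[float], threshold_ratio: float = 0.7) -> Tuple[List[int], float]:
    """Map weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the threshold is threshold_ratio * mean(|w|), and the scale
    is the mean magnitude of the weights that survive the threshold.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    # Small weights become 0 (this is where the sparsity comes from).
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def pack_2bit(codes: List[int]) -> bytes:
    """Pack ternary codes four per byte (hypothetical layout: 00=0, 01=+1, 10=-1)."""
    encoding = {0: 0b00, 1: 0b01, -1: 0b10}
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i // 4] |= encoding[code] << (2 * (i % 4))
    return bytes(out)


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from ternary codes and the shared scale."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.02, -1.1, 0.4, 0.0, -0.6, 0.03]
codes, scale = ternarize(weights)
packed = pack_2bit(codes)
print(codes, round(scale, 3), len(packed))
```

In this toy example, eight weights are stored in two bytes plus a single scale value. The size reduction of a real model depends on which layers are quantized and how the codes are stored, so this sketch should not be read as reproducing the 5.8× figure reported above.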

Copyrights © 2024






Journal Info

Abbrev

IJST

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

International Journal Science and Technology (IJST) is a scientific journal that publishes original articles on research findings and on the latest research and development applications in the field of technology. The scope of IJST covers the fields of Informatics, ...