Voice assistants are increasingly used for smart building control, yet cloud-based architectures raise privacy risks and become unavailable during internet outages. This study designs and evaluates a fully offline AIoT voice assistant for smart building management using local speech and language models. The system employs an edge audio node (Raspberry Pi Zero 2 W with a ReSpeaker 2-Mics Pi HAT) and a local GPU server running containerized microservices for speech-to-text (Whisper), intent understanding and action planning (Ollama-hosted LLMs), and text-to-speech (Piper). Building devices and sensors are integrated through Home Assistant, enabling voice-driven control and monitoring without sending audio or interaction logs to external services. Experiments in a laboratory smart-building testbed evaluate speech recognition robustness under varying noise levels, LLM command-understanding accuracy and memory footprint, and end-to-end IoT task execution. The speech subsystem achieves a Word Error Rate of 5–20%, depending on background noise. Across 33 IoT entities, the assistant reaches a 96.67% execution success rate with an average response time of 5.5 s. Among the evaluated local models, Qwen3 8B achieves the highest intent-to-action accuracy (Acc_I2A = 100% on an oracle-text command test set with N = 43) while using 6.8 GB of memory. These results demonstrate that privacy-preserving and resilient voice interaction for smart building management is feasible with current local LLM stacks.
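The intent-to-action step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the local LLM is prompted to reply with a single JSON object naming a Home Assistant service call (the field names `domain`, `service`, and `entity_id` follow Home Assistant's REST API, which exposes `POST /api/services/<domain>/<service>`; the prompt and parsing logic here are hypothetical).

```python
import json

# Hypothetical system prompt constraining the local LLM's output format.
SYSTEM_PROMPT = (
    "You control a smart building. Reply ONLY with one JSON object: "
    '{"domain": ..., "service": ..., "entity_id": ...}'
)

def parse_intent(llm_reply: str) -> dict:
    """Extract the service-call JSON object from the model's reply text."""
    start, end = llm_reply.find("{"), llm_reply.rfind("}") + 1
    action = json.loads(llm_reply[start:end])
    for key in ("domain", "service", "entity_id"):
        if key not in action:
            raise ValueError(f"missing field: {key}")
    return action

def to_ha_request(action: dict) -> tuple[str, dict]:
    """Map a parsed intent to a Home Assistant REST call (path, JSON body)."""
    path = f"/api/services/{action['domain']}/{action['service']}"
    return path, {"entity_id": action["entity_id"]}

# Example with a canned reply such as a local model might produce;
# the entity id is illustrative.
reply = 'Sure. {"domain": "light", "service": "turn_on", "entity_id": "light.lab_1"}'
path, body = to_ha_request(parse_intent(reply))
print(path)   # /api/services/light/turn_on
print(body)   # {'entity_id': 'light.lab_1'}
```

In a deployment, `path` and `body` would be sent to the Home Assistant instance with an authorization token, closing the loop from transcribed speech to device actuation.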
Copyright © 2026