This paper presents the development of a low-cost learning assistant embedded in an NVIDIA Jetson Xavier board that combines speech and gesture recognition with a large language model for offline operation. Using the Phi-3 Mini (3.8B) large language model (LLM) and the Whisper base model for automatic speech recognition, the assistant achieves a compact and efficient design capable of generating a general set of answers on a given topic. The system attained an average processing time of 0.108 seconds per character, a speech transcription efficiency of 94.75%, average scores of 9.5/10 for the accuracy and 8.5/10 for the consistency of the generated responses, and full recognition of the hand-raising gesture when held for at least 2 seconds, even without fully extended fingers. The prototype features a graphical interface that responds to voice commands and generates dynamic interactions based on detected user gestures, representing a significant advance toward comprehensive and accessible human-machine interface solutions.