This research develops a technology-driven solution that enhances Over-The-Top (OTT) services on Smart TVs by combining speech recognition, video analysis, and natural language processing. The system uses TransNetV2 for AI-based scene boundary detection, Porcupine for hotword (wake-word) detection, and the Automatic Speech Recognition (ASR) engines Vosk, Whisper, and DeepSpeech for real-time speech-to-text conversion. For Natural Language Processing (NLP), BERT and spaCy interpret user intent and temporal commands, such as requests to skip ahead or jump to a timestamp, in spoken instructions. Video content is processed with FFmpeg and OpenCV for frame manipulation and visualization, while YOLO and ResNet provide content classification and scene understanding. The platform pairs a Flutter front end, for cross-platform deployment across smart devices, with a Python Flask backend that integrates the individual modules. Test results show that the system supports real-time, hands-free media control and offers an intuitive, accessible user experience for contemporary OTT applications.
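To make the temporal-command step concrete, the sketch below turns an ASR transcript into a playback action and a new playhead position. It is a deliberately simplified, rule-based stand-in that uses regular expressions rather than BERT or spaCy. The supported phrasings, the `Command` type, and the `parse_command`/`apply_command` helpers are hypothetical and are not taken from the described system.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Seconds per unit for relative seeks such as "go back 2 minutes".
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

# Hypothetical phrase patterns; the described system uses NLP models instead.
ABSOLUTE_RE = re.compile(r"\b(?:jump|go|skip) to (\d+(?::\d{2}){1,2})\b")
RELATIVE_RE = re.compile(
    r"\b(forward|ahead|backward|back|rewind)\b.*?(\d+)\s*(second|minute|hour)s?\b"
)
BACKWARD_WORDS = {"backward", "back", "rewind"}


@dataclass
class Command:
    action: str                      # "seek_absolute", "seek_relative", "pause" or "play"
    seconds: Optional[int] = None    # absolute target or signed offset, in seconds


def parse_command(text: str) -> Optional[Command]:
    """Map one ASR transcript to a playback command, or None if unrecognized."""
    t = text.lower().strip()
    m = ABSOLUTE_RE.search(t)
    if m:
        total = 0
        for part in m.group(1).split(":"):   # accepts "h:mm:ss" or "m:ss"
            total = total * 60 + int(part)
        return Command("seek_absolute", total)
    m = RELATIVE_RE.search(t)
    if m:
        sign = -1 if m.group(1) in BACKWARD_WORDS else 1
        return Command("seek_relative", sign * int(m.group(2)) * UNIT_SECONDS[m.group(3)])
    if re.search(r"\b(?:pause|stop)\b", t):
        return Command("pause")
    if re.search(r"\b(?:play|resume)\b", t):
        return Command("play")
    return None


def apply_command(cmd: Command, position: float, duration: float) -> float:
    """Return the new playhead position in seconds, clamped to [0, duration]."""
    if cmd.action == "seek_absolute":
        target = float(cmd.seconds)
    elif cmd.action == "seek_relative":
        target = position + cmd.seconds
    else:
        target = position  # pause/play leave the playhead where it is
    return max(0.0, min(target, float(duration)))


if __name__ == "__main__":
    for utterance in ["skip forward 30 seconds", "go back 2 minutes", "jump to 1:05:30", "pause"]:
        cmd = parse_command(utterance)
        print(utterance, "->", cmd, "->", apply_command(cmd, 100.0, 3600.0))
```

Keeping parsing separate from seeking means the rule-based parser could be replaced by a model-based one without touching the playhead logic; in a Flask backend, a function like `parse_command` would sit behind whichever endpoint receives the ASR transcript.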
Copyright © 2025