Blind and partially sighted people's limited access to visual content and real-life situations reduces their quality of life, especially in a world largely tailored to sighted individuals. Although assistive devices relying on touch, sound, or other senses have made progress, these solutions often fall short of bridging the comprehension gap. Our work proposes an intuitive, user-friendly mobile framework named "SeeAround" that automatically provides real-time audio descriptions of the user's immediate visual surroundings. It addresses this challenge by combining key-point detection, image captioning, text-to-speech (TTS), optical character recognition (OCR), and translation algorithms to offer comprehensive support to visually impaired individuals. The system architecture relies on convolutional neural networks (CNNs) such as Inception-V3, Inception-V4, and ResNet152-V2 to extract detailed image features and employs a multi-gated recurrent unit (GRU) decoder to generate natural-language descriptions word by word. The framework was integrated into mobile applications and optimized with TensorFlow Lite pre-trained models for straightforward deployment on the Android platform.
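As an illustrative sketch only (the exact SeeAround configuration is not specified here), the encoder-decoder pairing described above, in which a pre-trained CNN backbone feeds a GRU decoder that emits the caption word by word, can be expressed in TensorFlow/Keras roughly as follows. The choice of Inception-V3 as the backbone, the layer sizes, and the vocabulary size are assumptions made for the example:

```python
# Minimal sketch of a CNN-encoder / GRU-decoder captioning model in TensorFlow/Keras.
# Layer sizes, vocabulary size, and the Inception-V3 backbone are illustrative
# assumptions, not the authors' exact SeeAround configuration.
import tensorflow as tf

VOCAB_SIZE = 5000      # assumed caption vocabulary size
MAX_LEN = 20           # assumed maximum caption length
EMBED_DIM = 256
UNITS = 512

# Encoder: a pre-trained Inception-V3 backbone produces a global image feature vector.
backbone = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")
backbone.trainable = False

image_in = tf.keras.Input(shape=(299, 299, 3), name="image")
features = backbone(image_in)                                    # (batch, 2048)
features = tf.keras.layers.Dense(EMBED_DIM, activation="relu")(features)

# Decoder: a GRU generates the caption word by word, conditioned on the image features
# through its initial hidden state.
caption_in = tf.keras.Input(shape=(MAX_LEN,), name="caption_tokens")
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
gru_out = tf.keras.layers.GRU(UNITS, return_sequences=True)(
    embedded, initial_state=tf.keras.layers.Dense(UNITS)(features)
)
logits = tf.keras.layers.Dense(VOCAB_SIZE)(gru_out)              # next-word logits per step

model = tf.keras.Model([image_in, caption_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# After training, the model can be converted to TensorFlow Lite for on-device use,
# which is the kind of mobile deployment the framework targets.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
```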