See For Me: Real-Time Object Detection and Captioning with Audio Feedback for Visually Impaired Users Using YOLOv5 and BLIP
Abstract
Individuals with visual impairments face significant obstacles when navigating new or changing environments, often putting their safety and autonomy at risk. This paper presents SeeForMe, a mobile app that delivers real-time support by incorporating object detection, contextual scene analysis, and auditory feedback. The application utilizes YOLOv5 for quick and precise object identification, paired with the BLIP framework to create descriptive captions that are communicated to users via a Text-to-Speech (TTS) interface. In contrast to existing alternatives that depend on cloud services, volunteer assistance, or costly hardware, SeeForMe is designed for on-device operation, minimizing latency and ensuring affordability. Tests conducted in both indoor and outdoor settings reveal a detection accuracy of 92%, a caption relevance score of 88%, and an average response time of 450 ms. These findings underscore the system’s potential as a viable and scalable assistive resource that improves mobility, safety, and independence for users with visual impairments.
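The detect-then-caption-then-speak flow described above can be sketched as follows. This is a minimal illustration of the control flow only, not the paper's implementation: model inference is stubbed out so the pipeline structure is visible without loading YOLOv5 or BLIP weights, and the `Detection` type, confidence threshold, and message format are illustrative assumptions.

```python
"""Sketch of a detect -> caption -> speak pipeline (SeeForMe-style).

YOLOv5 detection, BLIP captioning, and Text-to-Speech are passed in as
callables so they can be stubbed here; in a real app they would wrap the
actual model and TTS engine calls.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    label: str         # object class, e.g. "person" (illustrative)
    confidence: float  # detector score in [0, 1]


def run_pipeline(
    frame,
    detect: Callable[[object], List[Detection]],  # YOLOv5 stand-in
    caption: Callable[[object], str],             # BLIP stand-in
    speak: Callable[[str], None],                 # TTS stand-in
    conf_threshold: float = 0.5,                  # assumed cutoff
) -> str:
    """Detect objects in a frame, build a spoken message, send it to TTS."""
    # Keep only confident detections, mirroring a typical detector threshold.
    detections = [d for d in detect(frame) if d.confidence >= conf_threshold]
    scene = caption(frame)  # contextual scene description
    if detections:
        objects = ", ".join(d.label for d in detections)
        message = f"{scene}. Detected: {objects}."
    else:
        message = scene
    speak(message)  # audio feedback to the user
    return message


if __name__ == "__main__":
    # Stubbed models; real code would call YOLOv5 / BLIP here.
    run_pipeline(
        frame=None,
        detect=lambda f: [Detection("person", 0.91), Detection("chair", 0.34)],
        caption=lambda f: "A person standing in a hallway",
        speak=print,  # prints "A person standing in a hallway. Detected: person."
    )
```

On-device operation, as the abstract emphasizes, amounts to keeping all three callables local (e.g. quantized YOLOv5 and BLIP models plus the platform TTS engine) rather than routing frames to a cloud service.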
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.