1. Introduction
Vision, our most dominant sense, is essential for many facets and phases of life. While it is sometimes taken for granted, its absence seriously hinders our capacity to learn, navigate, read, carry out daily duties, and work. According to the World Health Organization (WHO), about 40 million people worldwide are blind, and another 250 million have some form of vision impairment. Due to the aging population and the rising incidence of diseases such as diabetic retinopathy and glaucoma, which are increasingly common causes of visual impairment, the prevalence of low vision has grown.
Living with restrictions, as with any disability, presents constant challenges. Specifically, the lack of visual accuracy makes everyday situations significantly more difficult to manage. Although alternative approaches to handling these routines can be developed, one immediate consequence of this impairment is the insecurity felt when moving or traveling independently, especially in outdoor and unfamiliar environments [1].
In both indoor and outdoor settings, computer vision is an essential and helpful tool for those with visual impairments, such as those who are blind or have limited vision. The basic idea is that visually challenged people can be helped in their daily tasks by deploying cameras as extra eyes, whose images are automatically evaluated by software. The main goals of current research are to create specialized devices that visually impaired persons can carry with them and to develop unique methods for scene analysis and augmentation.
The field of computer vision has made notable progress in recent years, resulting in systems designed to tackle the difficulties encountered by people with visual impairments. One such tool, introduced by Gagnon [2], is a computer vision application that automatically generates descriptions of video content, helping blind viewers follow their favorite TV shows, especially during dialogue-light portions. A head-mounted device designed to scan important areas of the scene and extract information useful to blind people is detailed in the study of Pradeep [3]. Furthermore, Choudhury [4] describes an image contrast enhancement technique that greatly improves the usability of images, text, and other visual elements for those with limited vision. The system described in Chen’s [5] study uses a portable computer carried in a backpack and a camera fixed on the user’s shoulder to localize and recognize text in urban surroundings.
In addition to these earlier systems, more recent works have applied deep learning-based computer vision methods to assistive technologies, such as wearable scene description devices [6,7], navigation aids integrating object detection and auditory feedback [8], and mobile applications for reading textual information in complex environments [6]. In parallel, significant progress has been made in text spotting and OCR, with end-to-end deep learning pipelines such as EAST, CRAFT, and Transformer-based OCR models demonstrating improved robustness in unconstrained environments [9,10,11]. Lightweight implementations suitable for mobile devices, such as PaddleOCR [12], further support the feasibility of deploying recognition systems for real-world assistive use cases.
Numerous automatic bus number identification systems, many of which make use of active sensors like GPS and Radio Frequency Identification (RFID), have been described in the literature. Among the vision-based methods is the one by Guida [13], which uses an AdaBoost-based cascade of classifiers to identify bus line numbers and then translates them into audio announcements via OCR. Pan [14] created a system that uses cameras at bus stops to identify bus route numbers. HOG and SVM are used for bus recognition, while OCR, in conjunction with text-to-speech, is used for audio announcements. A comparable system with three subsystems, bus motion detection using Modified Adaptive Frame Differencing (MAFD), bus panel detection, and text detection, was developed by Tsai [15]. All three subsystems result in speech notifications.
This study proposes an algorithm that aims to assist visually impaired individuals in understanding the local bus routes of Genova, Italy. The algorithm not only detects the bus route number but also determines the direction in which the bus is heading. By providing both pieces of information, the system aims to enhance the mobility and independence of blind individuals by improving the accessibility and usability of public transit. Furthermore, we aim to integrate this algorithm into our navigational aid that can function offline, enabling visually impaired users to access bus route information without requiring an internet connection. This offline capability would further enhance their navigation and mobility in urban environments. The main contributions of this work are as follows: (i) we design a simple yet efficient algorithm that is lightweight enough to run in offline mode on portable devices, (ii) we integrate YOLOv8, ESRGAN, OCR, and lexicon-based correction into a unified pipeline optimized for practical use, and (iii) we demonstrate that despite its simplicity, the system achieves higher accuracy compared with existing methods.
2. System Overview
Public buses in Genova run in two directions, which makes it difficult for visually impaired people (VIP) to determine which bus to board. To address this challenge, our algorithm allows VIPs to capture an image of the approaching bus using a mobile device. Importantly, the picture does not need to be perfectly centered on the bus front, since the custom-trained YOLOv8 model automatically detects and isolates the relevant front panel region even when the photo contains background objects, partial bus views, or is slightly misaligned. In cases where the bus panel is not detected or the image is too blurry to process reliably, the system does not return misleading results but instead prompts the user to retake the photo. While in Genova the front panels are LED-based, the algorithm is adaptable to other bus display formats (e.g., printed signs or painted boards), provided that a representative training dataset is available.
Figure 1 shows that the proposed approach is divided into several key stages. The bus is first identified in the picture. To save computational resources, only the front portion of the detected bus, i.e., the region of interest (ROI), is considered for further processing. All other information is discarded. After the ROI has been identified, the picture is cropped so that only the LED panel, which shows the bus route number and destination, is extracted.
After the LED panel is extracted, image-enhancing methods are used to improve the text’s image quality. The bus route number and destination are then read from the improved image using text recognition techniques. The likelihood of errors resulting from misspellings or confusing text is decreased by comparing the recognized text to a database of Genova bus routes and selecting the closest match. This algorithm provides a reliable and efficient way to determine the exact bus route and direction, hence increasing the independence and mobility of VIPs and significantly improving their accessibility to transportation in Genova.
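To make the overall flow concrete, the following minimal Python sketch illustrates how the stages of Figure 1 fit together. It is glue code only: the four stage functions are passed in as callables and correspond to the sketches given in the following subsections, and the 0.5 confidence cut-off mirrors the OCR threshold described in Section 2.3.

```python
# Illustrative glue code for the pipeline in Figure 1 (not the authors' implementation).
# The four stage functions are supplied by the caller and sketched in later subsections.

def recognize_bus_route(image, detect_front, enhance, read_text, match_route,
                        min_confidence=0.5):
    """Return (route_number, destination), or None to prompt the user to retake the photo."""
    roi = detect_front(image)                   # YOLOv8: crop the bus front / LED panel
    if roi is None:
        return None                             # no panel detected
    text, confidence = read_text(enhance(roi))  # ESRGAN super-resolution, then EasyOCR
    if confidence < min_confidence:
        return None                             # image too blurry or unreliable
    return match_route(text)                    # Levenshtein match against the Genova lexicon
```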
2.1. Dataset
To train and evaluate the proposed system, we collected a dataset of bus images representing the 147 official bus routes operating in Genova, Italy. The images were obtained from two sources: (i) photographs captured with mobile devices at bus stops, and (ii) publicly available repositories. This ensured variability in both acquisition conditions and bus appearances.
For YOLOv8 training, we used 400 annotated bus images. These were manually labeled to mark only the front-facing region of each bus, ensuring that the model learned to isolate the LED panel area. Model hyperparameters were optimized using a train/validation split of 80/20 within this dataset.
For evaluation, an independent test set of 120 images was reserved, which was not seen during training or validation. This dataset was intentionally curated to capture a range of challenging real-world conditions. A summary of dataset diversity across lighting, occlusion, and motion blur is provided in Table 1. In addition, the dataset included variation in the following:
Viewing angles: ranging from frontal captures to oblique angles of up to ∼45°.
Distances: ranging from close-range captures (3–5 m) to ∼15 m from the bus.
Display characteristics: while all buses used LED panels, differences were present in brightness and size. Most panels employed standard fonts, but a subset featured non-standard spacing or scrolling layouts, included to test OCR robustness.
This dataset ensured that the algorithm was evaluated under diverse and challenging conditions rather than only on ideal inputs. Approximately one-fifth of the images contained partial occlusion or strong reflections, while others exhibited blur or low contrast. Although limited in size and restricted to a single city, the dataset provides an important first step in benchmarking the system under real-world conditions. Future work will expand the dataset across multiple cities and display types (e.g., printed signs, painted boards) to improve generalization. The dataset will be made available upon request for research purposes.
2.2. Detection of Region of Interest
The first stage of the proposed algorithm involves detecting the presence of a bus within the input image. For this purpose, the YOLOv8 object detection model [16] was employed. YOLOv8 is a state-of-the-art real-time object detection framework that enables efficient localization of objects within an image. To tailor the model for our application, we retrained it specifically to detect only the front portion of the bus, which typically displays the route number and destination information.
We selected YOLOv8 as the detection backbone because it provides an optimal balance between accuracy and computational efficiency, which is crucial for mobile-based assistive technologies. At the time of system development, newer releases such as YOLOv9, YOLOv10, and YOLOv11 were available; however, these versions required substantially higher computational resources and lacked the stable deployment pipelines and pretrained weights necessary for rapid adaptation to our dataset. YOLOv8, in contrast, offered robust accuracy with real-time performance on mobile devices, making it a practical and reliable choice for our application.
For model training, we employed the YOLOv8-s (small) variant, which provides a favorable trade-off between speed and detection accuracy. A total of 400 annotated images of buses were used for training, curated from publicly available sources including the Roboflow platform [17]. Images were manually annotated to include only the front-facing segments of buses as bounding box labels, thereby defining the region of interest (ROI) and eliminating irrelevant elements such as advertisements, side panels, or unrelated text. The model was trained for 100 epochs with a batch size of 16, using the AdamW optimizer and an initial learning rate of 0.001 scheduled via cosine decay. To improve generalization, standard data augmentation strategies were applied, including horizontal flipping, random scaling (±15%), brightness/contrast adjustment (±20%), and simulated motion blur. A 20% validation split from the training set was used for monitoring performance during training.
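As a rough illustration, the training configuration above can be expressed with the Ultralytics API as follows. This is a sketch under assumptions: the dataset configuration file name is a placeholder, and simulated motion blur is not a built-in Ultralytics augmentation, so it would be applied in a separate preprocessing step.

```python
# Training sketch with the Ultralytics API; "bus_front.yaml" is a placeholder dataset
# config pointing to the 400 annotated images with a single "bus front" class.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")        # start from pretrained YOLOv8-s weights
model.train(
    data="bus_front.yaml",
    epochs=100,
    batch=16,
    optimizer="AdamW",
    lr0=0.001,
    cos_lr=True,                  # cosine learning-rate decay
    fliplr=0.5,                   # horizontal flipping
    scale=0.15,                   # random scaling (about ±15%)
    hsv_v=0.2,                    # brightness adjustment (about ±20%)
    # motion blur is not a built-in option; apply it offline to the training images
)
```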
Once trained, the YOLOv8 model was integrated into the system pipeline to automatically detect the front panel of the bus from a captured image. Following detection, the corresponding ROI was cropped from the image and passed on to subsequent modules for further processing, including image enhancement and text recognition. By isolating only the relevant portion of the image, this step reduces noise and computational overhead while improving the overall effectiveness of the OCR system used downstream.
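A minimal inference sketch, assuming custom-trained weights saved locally (the file names are placeholders), shows how the detected bus front is cropped and handed to the downstream stages:

```python
# ROI extraction sketch: run the custom detector and crop the highest-confidence
# bus-front box; "bus_front_best.pt" and "bus_photo.jpg" are placeholder paths.
import cv2
from ultralytics import YOLO

model = YOLO("bus_front_best.pt")
image = cv2.imread("bus_photo.jpg")
result = model(image, conf=0.5)[0]             # single-image inference

if len(result.boxes) == 0:
    print("No bus front detected - prompt the user to retake the photo.")
else:
    best = int(result.boxes.conf.argmax())     # keep the most confident detection
    x1, y1, x2, y2 = map(int, result.boxes.xyxy[best].tolist())
    roi = image[y1:y2, x1:x2]                  # cropped LED-panel region for ESRGAN/OCR
    cv2.imwrite("roi.jpg", roi)
```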
2.3. Image Enhancement and Text Reading
In the second stage, we applied the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) [18] to improve the resolution of the cropped ROI before text recognition. The ESRGAN architecture consists of a generator network with residual-in-residual dense blocks (RRDBs) that reconstruct high-resolution images from low-resolution inputs, and a discriminator trained with perceptual loss to enhance photo-realism. The motivation for this step was to address common real-world image degradations, such as low light, motion blur, or glare on LED panels, that reduce OCR accuracy. We used the official pretrained ESRGAN model without additional fine-tuning, as it provided sufficiently clear enhancements of bus LED displays while remaining computationally lightweight.
In terms of performance, ESRGAN processed each ROI in approximately 0.4 s on an NVIDIA RTX 3060 GPU and 1.0 s on a standard Intel i7-11800H CPU, demonstrating feasibility for near real-time applications. Beyond qualitative improvement, we observed a consistent gain in recognition confidence: across the test set, mean OCR confidence increased from 0.46 (without ESRGAN) to 0.71 (with ESRGAN), and this difference was statistically significant (p < 0.01, paired t-test). While ESRGAN occasionally introduced minor stroke artifacts (“hallucinations”), these did not result in misclassifications in our dataset, since the subsequent lexicon correction step filtered out spurious outputs.
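The enhancement step can be sketched as below, assuming the basicsr implementation of the RRDB generator and a locally available pretrained ESRGAN weight file whose state-dict keys match this architecture (the weight path is a placeholder); this is not the authors' exact code.

```python
# ESRGAN enhancement sketch (x4 upscaling of the cropped panel); paths are placeholders
# and the weight file is assumed to be compatible with basicsr's RRDBNet keys.
import cv2
import numpy as np
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
model.load_state_dict(torch.load("esrgan_x4.pth", map_location=device))
model.eval().to(device)

roi = cv2.imread("roi.jpg")                                    # cropped LED panel (BGR)
x = torch.from_numpy(roi[:, :, ::-1].copy()).float() / 255.0   # BGR -> RGB, scale to [0, 1]
x = x.permute(2, 0, 1).unsqueeze(0).to(device)                 # HWC -> NCHW

with torch.no_grad():
    y = model(x).squeeze(0).clamp(0, 1).permute(1, 2, 0).cpu().numpy()

enhanced = (y[:, :, ::-1] * 255.0).round().astype(np.uint8)    # RGB -> BGR for OpenCV
cv2.imwrite("roi_sr.jpg", enhanced)
```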
Following enhancement, text recognition was carried out using the EasyOCR framework [19], a PyTorch-based OCR engine that supports 58 languages. For our application, EasyOCR was configured with both Italian and English language models to reflect the languages used on Genova bus displays. To improve robustness, a minimum confidence threshold of 0.5 was applied to filter out low-confidence predictions, and the recognized outputs were validated against a custom lexicon containing all 147 bus routes and destinations.
Further implementation details are summarized below:
Backbone: EasyOCR employs a Convolutional Recurrent Neural Network (CRNN) with a ResNet feature extractor.
Decoding method: Connectionist Temporal Classification (CTC) beam search decoding was used.
Non-Maximum Suppression (NMS): A default threshold of 0.2 was applied to merge overlapping text boxes.
Preprocessing: Input images were resized to a height of 64 pixels while preserving aspect ratio, and normalized before recognition.
Tokenization: Default EasyOCR tokenization rules (alphanumeric + diacritics) were applied.
Language handling: Italian was set as the primary recognition language, with automatic fallback to English when the Italian model returned low confidence scores.
This configuration provided a structured and reliable OCR pipeline, ensuring accurate recognition of bus numbers and destinations even under challenging conditions.
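A minimal sketch of this configuration with the EasyOCR API is shown below; the confidence filter reflects the 0.5 threshold above, while the image path is a placeholder carried over from the earlier stages.

```python
# OCR sketch: Italian and English models loaded together, low-confidence detections
# discarded before lexicon matching; "roi_sr.jpg" is the ESRGAN-enhanced panel crop.
import easyocr

reader = easyocr.Reader(["it", "en"], gpu=False)   # load the recognition models once
results = reader.readtext("roi_sr.jpg")            # [(bbox, text, confidence), ...]

accepted = [(text, conf) for _, text, conf in results if conf >= 0.5]
panel_text = " ".join(text for text, _ in accepted)
print(panel_text)                                  # e.g. "45 BRIGNOLE" before correction
```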
2.4. Comparison of Detected Text with Standard
The final step entails giving the algorithm a list of all Genova bus routes and their associated destinations. The algorithm processes this list, storing bus route numbers as dictionary keys and their two possible destinations as dictionary values.
The algorithm then matches the recognized text to the list of Genova buses. The dictionary key and value pair with the highest matching ratio to the detected text is output. If the algorithm detects both the bus route number and the destination, it first confirms that the bus number exists in the dictionary. If the bus route number is present, the algorithm compares the detected destination with the corresponding dictionary entries and selects the destination with the highest match ratio as the output. The match ratio between the detected text and the dictionary keys and values is calculated using the Levenshtein method [20].
When a bus number is incorrectly recognized, the algorithm checks whether the number is present in the dictionary keys. If not, it searches the dictionary values to see whether the detected destination exists. In cases where neither the detected bus number nor the destination can be matched with the dictionary of valid routes, the system does not provide the raw OCR output as a final result. Instead, it flags the case as “unrecognized bus route” and prompts the user to capture another image. This safeguard prevents the risk of announcing a route that does not exist, which could lead to confusion or undesired actions.
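A minimal sketch of this matching step is given below, using the python-Levenshtein package; the route table is a tiny illustrative excerpt rather than the full Genova lexicon, and the 0.6 acceptance threshold is an assumption for the example, not a value reported by the authors.

```python
# Lexicon-matching sketch: route numbers are dictionary keys, the two possible
# destinations are the values; Levenshtein ratios select the closest valid entry.
import Levenshtein

ROUTES = {
    "45": ["STAZIONE BRIGNOLE", "SAN MARTINO"],   # illustrative excerpt only
    "35": ["PRINCIPE", "FOCE"],
}

def match_route(detected_number, detected_destination, routes=ROUTES, threshold=0.6):
    # 1) match the detected number against the dictionary keys
    number, num_score = max(
        ((k, Levenshtein.ratio(detected_number, k)) for k in routes),
        key=lambda kv: kv[1],
    )
    if num_score < threshold:
        return None                               # flag as "unrecognized bus route"
    # 2) match the detected destination against that route's two destinations
    destination, dest_score = max(
        ((d, Levenshtein.ratio(detected_destination.upper(), d)) for d in routes[number]),
        key=lambda kv: kv[1],
    )
    if dest_score < threshold:
        return None
    return number, destination

print(match_route("45", "BRIGNOLE"))              # -> ('45', 'STAZIONE BRIGNOLE')
```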
3. Results and Discussion
The initial step of our algorithm is to find the bus in the input image. We trained the YOLOv8 algorithm to detect only the front portion of the bus. The performance metrics of YOLOv8 showed encouraging results.
Figure 2 presents the Precision-Recall curve, which shows a mean Average Precision (mAP) of 0.943 at an IoU threshold of 0.5. The curve demonstrates that the model maintains high precision at lower recall values, and that precision falls as recall rises, reflecting the trade-off between admitting more false positives and capturing more positive cases. The high mAP score confirms that the model achieves a good balance between precision and recall across different thresholds.
The relationship between the model’s F1 score and the confidence threshold for the “bus front” class and all classes is depicted by the F1-Confidence curve in Figure 3. The F1 score, which balances precision and recall, peaks at 0.93 at a confidence threshold of 0.736. The curve indicates that the F1 score improves quickly at lower confidence levels as the model begins to detect more true positives. However, beyond the ideal cutoff, the F1 score starts to decline, since higher thresholds preserve precision but reduce recall. This analysis helps identify the optimal confidence threshold for maximizing balanced predictions.
To rigorously evaluate the performance of the proposed bus route recognition system, we employed an independent test set of 120 images of Genova buses that was entirely excluded from YOLOv8 training and validation. This set included a diverse range of route numbers and destinations, with images captured both via mobile devices at bus stops and sourced from online repositories. The dataset intentionally encompassed challenging real-world conditions, such as variations in lighting (daytime and evening), low-light environments, glare from reflective surfaces, motion blur, and oblique viewing angles, thereby providing a robust basis for assessing the reliability and generalizability of the system.
Advertisements and other unnecessary messages or numbers might cause errors when using the entire bus as the ROI. We efficiently isolated the crucial ROI and removed irrelevant text by custom-training the YOLOv8 algorithm to recognize only the front portion of the bus. From Figure 4A, it can be clearly seen that considering the whole bus for text detection can cause errors. Figure 4B depicts the results of the custom-trained YOLOv8, which correctly focused only on the bus front.
To address the limitations of low-quality inputs such as motion blur, glare, or poor lighting, we applied the ESRGAN super-resolution model to the ROI before OCR. This enhancement step consistently increased recognition reliability across the dataset. On average, the mean OCR confidence improved from 0.46 (without ESRGAN) to 0.71 (with ESRGAN), and this gain was statistically significant (p < 0.01, paired t-test). While ESRGAN may occasionally introduce artificial stroke artifacts, these did not lead to misclassifications in our experiments, since the subsequent lexicon-based validation filtered out such spurious outputs.
Figure 5 illustrates a representative example, where the OCR result improved from “45 BRI NOLE” (confidence 0.13) to “45 BRIGNOLE” (confidence 0.73).
Following super-resolution and OCR, the recognized text was validated against a pre-defined list of Genova bus routes and destinations. This step was critical for correcting incomplete or erroneous OCR outputs. For example, the enhanced result “45 BRIGNOLE” was matched to the lexicon entry “45 STAZIONE BRIGNOLE,” which represents the correct bus route and destination. As shown in Figure 6, the lexicon-based correction minimized errors, reduced the impact of OCR uncertainties, and ensured robust identification of both bus numbers and destinations under challenging imaging conditions.
To provide a more detailed evaluation, we also report per-task performance metrics (Table 2). Bus front detection achieved a precision of 96.2%, recall of 94.5%, and F1 score of 95.3%. Route number recognition achieved an accuracy of 93.8% ± 2.1, while destination recognition achieved 92.5% ± 2.8, with confidence intervals estimated via bootstrapping (1000 resamples).
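For illustration, the bootstrap procedure used for these intervals can be sketched as follows; the per-image outcome vector here is synthetic and only demonstrates the method, not the actual results.

```python
# Bootstrap sketch: 95% confidence interval for accuracy over 120 test images,
# using 1000 resamples; the outcome flags below are synthetic for illustration.
import numpy as np

rng = np.random.default_rng(0)
outcomes = rng.random(120) < 0.938           # synthetic correct/incorrect flags per image

boot_acc = np.array([
    rng.choice(outcomes, size=outcomes.size, replace=True).mean()
    for _ in range(1000)
])
lo, hi = np.percentile(boot_acc, [2.5, 97.5])
print(f"accuracy = {outcomes.mean():.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```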
To further assess the contribution of each component, we performed an ablation study (Table 3). Using the full bus image as ROI reduced recognition accuracy to 78.4%, due to the inclusion of irrelevant text such as advertisements. Excluding ESRGAN resulted in lower OCR confidence and an accuracy of 85.7%. Replacing EasyOCR with Tesseract reduced recognition accuracy to 81.2%, confirming the advantage of deep learning-based OCR engines. Finally, removing the route lexicon-based correction step decreased accuracy from 95% to 88.9%, highlighting the importance of structured validation. This also illustrates that while the lexicon improves recognition reliability, it may reduce flexibility under open-set conditions, since routes or destinations not present in the predefined list could be incorrectly matched to the closest available entry, resulting in over-correction or misclassification. Future work will, therefore, extend the comparison to stronger OCR frameworks such as PP-OCRv3 and transformer-based recognizers, and will include systematic evaluation under open-set conditions to ensure robustness and avoid over-correction.
Our system yielded an overall accuracy of 95% with a 5% error rate. This performance outperforms the results of earlier research. For instance, Wongta [21] reported 73.47% accuracy in bus number detection. Guida [13], who tested their algorithm on a small dataset of only five bus routes, achieved nearly 100% accuracy but lacked scalability. Maina [22] reported 72% total accuracy. A key advantage of our approach is its ability to detect both the bus route number and its destination, making it more versatile. By simply updating the lexicon with city-specific bus routes, our method can be adapted to different urban contexts, ensuring applicability for visually impaired individuals across diverse public transport systems. For cities with frequently changing or highly variable route information, the lexicon can be updated manually or linked to open transit APIs for automatic synchronization. Since the lexicon operates independently of detection and OCR, this scalability ensures that our approach remains adaptable and future-proof.
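As one possible way to keep the lexicon synchronized with open transit data, the sketch below builds the route dictionary from a static GTFS feed; the file paths are placeholders, and since trip_headsign is optional in GTFS, feeds without it would need another source of destination names.

```python
# Lexicon-building sketch from a static GTFS feed (routes.txt + trips.txt);
# paths are placeholders and headsigns are assumed to be present in the feed.
import csv
from collections import defaultdict

def lexicon_from_gtfs(routes_path="routes.txt", trips_path="trips.txt"):
    with open(routes_path, encoding="utf-8") as f:
        short_names = {r["route_id"]: r["route_short_name"] for r in csv.DictReader(f)}

    lexicon = defaultdict(set)
    with open(trips_path, encoding="utf-8") as f:
        for trip in csv.DictReader(f):
            number = short_names.get(trip["route_id"])
            headsign = trip.get("trip_headsign", "").strip()
            if number and headsign:
                lexicon[number].add(headsign.upper())

    return {number: sorted(dests) for number, dests in lexicon.items()}
```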
From a privacy perspective, the proposed system was designed to operate entirely on-device, with all image processing performed locally. Only the bus front panel is analyzed, while surrounding regions are discarded, and no images are stored or transmitted, thereby reducing the risk of exposing unintended personal information in public spaces. In terms of efficiency, the algorithm follows an event-based design: it is triggered only when the user actively captures an image, rather than running continuously. This approach minimizes both computational load and energy consumption, making integration into mobile or wearable devices more feasible. On standard hardware, average processing time remains close to one second, and further optimization through lightweight model variants or quantization could reduce the battery impact even further.
To clarify the practical benefit for visually impaired people, the proposed system is designed to be integrated into portable navigation aids (e.g., smartphones or wearable devices) rather than deployed at bus stops. Users can point their device toward an approaching bus, and the algorithm processes the bus front locally to extract route information. The recognized route and destination are then conveyed to the user through audio feedback.
The analysis of failure cases highlights several limitations of the proposed algorithm. First, the dataset size and scope remain restricted, with 400 training and 120 test images collected from a single city (Genova). Although the dataset was curated to include variation in lighting, angles, distances, occlusion, and panel layouts, its limited scale raises concerns about generalization to other cities, bus designs, and display types. Second, detection errors occurred in ∼4% of cases, where YOLOv8 failed to correctly isolate the bus front due to extreme occlusion, glare, or unusual viewing angles. Third, OCR misreads accounted for ∼6% of errors, typically under conditions of motion blur or low illumination. Fourth, lexicon mismatches contributed ∼3% of errors, where the correction step over-adjusted the OCR output when multiple near matches existed in the dictionary. Non-standard fonts and panel layouts also occasionally confused the OCR system.
In addition to these issues, the dataset did not explicitly include real-time environmental conditions such as rain, fog, or heavy traffic, which may further reduce visibility or cause occlusion of bus panels. Another limitation is that our OCR comparison was restricted to EasyOCR, Tesseract, and PaddleOCR; while this provides a useful baseline, stronger frameworks such as PP-OCRv3 and transformer-based recognizers were not evaluated. Finally, the system has not yet been systematically tested under open-set conditions, where unseen routes or destinations may appear and challenge the robustness of lexicon-based correction.
Taken together, these limitations emphasize the need for future improvements, including expanding the dataset across multiple cities and display types, incorporating diverse weather and traffic scenarios, integrating more advanced OCR backbones, refining lexicon correction strategies, and conducting systematic evaluations under open-set conditions to ensure robustness in real-world deployments.
Building on the identified limitations, future work will focus on several directions. First, dataset expansion is essential. We plan to collect a larger and more diverse dataset across multiple cities, incorporating different display types (e.g., printed signs, painted boards), varied fonts, and challenging environmental conditions such as rain, fog, and heavy traffic. This will help ensure stronger generalization beyond the current Genova-based dataset. Second, user-centered evaluation will be carried out through trials with visually impaired participants to assess the system’s effectiveness in real-world navigation. These studies will measure usability, error rates, and user satisfaction. Specifically, we plan to integrate the bus route recognition algorithm into our navigation aid device for VIPs, which is currently under development, enabling real-world trials that combine route recognition with multisensory guidance. Third, OCR enhancement will be explored by extending the comparison to more advanced frameworks such as PP-OCRv3 and transformer-based recognizers, alongside lightweight variants optimized for mobile deployment. Finally, robustness testing will include systematic evaluation under open-set conditions, where unseen routes or destinations may appear, to better understand and mitigate brittleness introduced by lexicon-based correction.
By addressing these directions, we aim to bridge technical improvements with lived user experience, moving toward a practical assistive tool that empowers blind and visually impaired individuals to navigate public transit with greater confidence and independence.