Article

Low-Cost Smart Cane for Visually Impaired People with Pathway Surface Detection and Distance Estimation Using Weighted Bounding Boxes and Depth Mapping

Faculty of Informatics, Burapha University, Chonburi 20131, Thailand
*
Author to whom correspondence should be addressed.
Information 2025, 16(8), 707; https://doi.org/10.3390/info16080707
Submission received: 21 June 2025 / Revised: 6 August 2025 / Accepted: 8 August 2025 / Published: 19 August 2025
(This article belongs to the Special Issue AI and Data Analysis in Smart Cities)

Abstract

Visually impaired individuals are at a high risk of accidents due to sudden changes in walking surfaces and surrounding obstacles. Existing smart cane systems lack the capability to detect pathway surface transition points with accurate distance estimation and danger-level assessment. This study proposes a low-cost smart cane that integrates a novel Pathway Surface Transition Point Detection (PSTPD) method with enhanced obstacle detection. The system employs dual RGB cameras, an ultrasonic sensor, and YOLO-based models to deliver real-time alerts based on object type, surface class, distance, and severity. It comprises three modules: (1) obstacle detection and classification into mild, moderate, or severe levels; (2) pathway surface detection across eight surface types with distance estimation using weighted bounding boxes and depth mapping; and (3) auditory notifications. Experimental results show a mean Average Precision (mAP@50) of 0.70 for obstacle detection and 0.92 for surface classification. The average distance estimation error was 0.3 cm for obstacles and 4.22 cm for pathway surface transition points. Additionally, the PSTPD method demonstrated efficient processing with an average runtime of 0.6 s per instance.

1. Introduction

According to statistics published by the World Health Organization (WHO) in 2023, at least 2.2 billion people worldwide are affected by visual impairments, with over 1 billion of these cases remaining untreated. Vision impairment also imposes a significant financial burden, with the annual loss in productivity estimated at approximately USD 411 billion globally. It is estimated that only 36% of individuals with distance vision impairment caused by refractive error and just 17% of those with vision impairment due to cataracts have access to proper treatment. While vision loss can occur at any age, the majority of individuals affected by vision impairment and blindness are over 50 years old [1]. Falls represent one of the most prevalent causes of unintentional injuries and premature mortality, particularly among individuals with visual impairments [2]. A fall is defined as an incident in which a person unintentionally comes to rest on the ground, floor, or another lower surface, with or without sustaining an injury [3]. Across all regions, mortality rates due to falls are highest among visually impaired individuals and adults over the age of 50 [4].
The development of a smart cane system with pathway transition monitoring is essential for enhancing the safety of individuals with visual impairments by preventing potential falls [5,6,7,8]. Most studies [9,10,11,12,13,14,15,16,17,18,19,20,21,22] have proposed smart cane systems with obstacle detection; others [23,24] have focused on pathway detection. Furthermore, some studies [25,26,27,28,29,30] have integrated both obstacle and pathway detection to provide comprehensive navigation assistance. However, a significant gap remains, as most existing studies fail to detect pathway surface transition points, which are essential for estimating the distance to transitions and assessing their danger levels. As summarized in Table 1, although several systems support obstacle distance detection, they typically lack the ability to detect or measure the distance to pathway surface transition points, and none provide a combined distance-based danger-level alert for both types of hazards. To overcome these limitations, this paper proposes a low-cost smart cane system that not only improves obstacle detection but also introduces a novel method to detect and estimate the distance of pathway surface transition points. The system provides distance-based alerts indicating the severity and proximity of both obstacles and surface transitions. The key contributions of this study are as follows:
  • Proposal of a novel Pathway Surface Transition Point Detection (PSTPD) method, a new approach designed to detect transition points between different walking surfaces and estimate their distances using a weighted center calculation derived from bounding boxes and a calibrated depth mapping technique.
  • Integration of the PSTPD method and enhanced obstacle detection, in which the proposed system introduces a smart cane that delivers distance-based alerts indicating the severity and proximity of both obstacles and pathway surface transition points.
  • A cost-effective smart cane solution that uses a Raspberry Pi 4, camera modules, and an ultrasonic sensor to create a low-cost but reliable assistive tool that works well in real-life situations.
The structure of this article is as follows: Section 1 introduces the study, while Section 2 reviews related works. Section 3 analyzes the problem domain, followed by Section 4, which provides an overview of the system. The proposed method is detailed in Section 5, and Section 6 presents the experimental setup and results. Section 7 discusses the findings and implementation aspects, and, finally, Section 8 concludes the study.

2. Related Works

In related research, existing studies on assistive technologies for visually impaired individuals can be categorized into three main approaches for environmental perception: obstacle detection, pathway surface detection, and integrated obstacle–pathway surface detection. Obstacle detection is a critical component of assistive navigation systems, leveraging AI and IoT to provide real-time recognition of surrounding objects. Several researchers [9,10,11,12] have proposed innovative solutions that integrate ultrasonic sensors for real-time obstacle distance detection, aiming to improve environmental perception and assist users in safely navigating their surroundings. In a more advanced approach, Cardillo et al. [13] developed a millimeter-wave radar cane (122 GHz FMCW) capable of distinguishing humans from objects through breathing detection, enhancing real-time user feedback. In addition to conventional sensor-based systems, several studies have focused on integrating smart technologies for enhanced navigation. In addition, these systems utilize image processing to accurately detect obstacles. Sibu et al. [14] developed a smart cane system for visually impaired individuals that incorporates a Raspberry Pi 4, Pi Camera, GPS module, and MobileNet-based CNN for real-time object detection. The system offers voice feedback to assist users in identifying nearby objects and includes a web-based application that enables caregivers to monitor the location and receive visual updates. Li et al. [15] developed an AIoT-based assistance system integrating wearable smart glasses and an intelligent walking stick. The system employs a binocular camera with YOLOv5, achieving an accuracy of 92.16% in real-time object detection. Furthermore, it includes heart rate monitoring, fall detection, and environmental tracking, ensuring user safety and independence. Several studies have developed systems that integrate distance sensors and image processing to enhance obstacle detection while optimizing energy efficiency. For instance, Chen et al. [16] introduced iDog, an AI-powered guide dog harness that integrates an RGB-D image sensor with the YOLOv5 model for detecting moving obstacles such as pedestrians, motorcycles, and cars with high precision. Users receive real-time voice feedback through a Bluetooth-enabled headset, ensuring enhanced safety and mobility. Rahman et al. [17] designed a smart blind assistant system incorporating IoT and deep learning technologies. The system includes a smart cap equipped with a Raspberry Pi and a camera for object detection, a blind stick with multiple sensors for obstacle recognition, and a virtual assistant for real-time navigation support. Patankar et al. [18] proposed a smart stick integrating Raspberry Pi, GPS, GSM, and multiple sensors to enhance mobility and safety for visually impaired users. It enables real-time obstacle detection, water hazard alerts, and emergency location sharing. The device offers significant improvements over traditional white canes by providing haptic and audio feedback. Ma et al. [19] developed an intelligent assistive cane that integrates Raspberry Pi, Arduino, ultrasonic sensors, and cloud-based computer vision to support visually impaired users. The system utilizes an edge-cloud collaboration model, enabling real-time aerial obstacle detection, fall detection, and traffic light recognition. With its high accuracy, low power consumption, and affordability, the cane offers a practical and adaptable solution for daily mobility assistance. 
Leong et al. [20] developed a wearable device using Raspberry Pi and a camera module for object detection and distance estimation. Their system integrates a pretrained convolutional neural network with ultrasonic sensors, providing real-time auditory or haptic feedback. Nataraj et al. [21] and Raj et al. [22] proposed a smart cane with an object recognition system, which represents a transformative advancement in assistive technology by integrating ultrasonic sensors, an ESP32-CAM for real-time object detection, and GPS-GSM modules for location tracking to enhance the safety, mobility, and independence of visually impaired users. By combining image processing and audio feedback, the system provides real-time alerts about nearby obstacles.
Pathway detection focuses on identifying terrain variations and potential hazards, improving spatial awareness and navigation accuracy. Dang et al. [23] introduced a virtual blind cane that integrates a line laser, RGB camera, and an inertial measurement unit (IMU) for enhanced obstacle classification and distance estimation. By leveraging laser stripe scanning and IMU-based motion tracking, the system accurately differentiates between walls, stairs, and blocks, improving indoor navigation. Chang et al. [24] developed an AI edge computing-based assistive system that integrates smart sunglasses, an intelligent walking cane, and a waist-mounted device. The system employs deep learning-based zebra crossing recognition, achieving an accuracy of 90%.
Some researchers have developed systems that combine obstacle and pathway detection to provide comprehensive navigation assistance. Bai et al. [25] proposed a wearable assistive device that combines an RGB-D camera and an IMU for navigation and object recognition. The system employs ground segmentation algorithms using depth continuity, ensuring accurate path detection. A lightweight convolutional neural network (CNN) facilitates real-time object recognition, while a human–machine interface with an audio guidance system provides users with semantic scene descriptions, improving navigation efficiency. Joshi et al. [26] introduced an AI-based assistive system that integrates deep learning-based object detection using YOLOv3, achieving an object detection accuracy of 95.19% and a recognition accuracy of 99.69%. The system incorporates distance sensors and optimized auditory feedback, reducing processing time and enhancing real-time user interaction. Farooq et al. [27] developed an IoT-enabled intelligent stick equipped with ultrasonic sensors, a water sensor, and a high-definition video camera. The system operates in two modes: ultrasonic-based obstacle detection with vibration feedback and AI-powered object recognition with voice alerts. Additionally, GPS and GSM modules enable real-time location tracking and emergency assistance, ensuring user safety. Veena et al. [28] developed a smart navigation aid for visually impaired individuals using ultrasonic sensors to detect nearby obstacles with distance-based vibration alerts. The system integrates a Pi Camera and processes data via Raspberry Pi using the lightweight MobileNet-SSD model to detect moving objects, potholes, and pedestrian signals. Users receive alerts through audio, vibration, or buzzer based on the type of object detected. The device is portable and designed for ease of use in unfamiliar environments without hindering mobility. Mai et al. [29] integrated 2D LiDAR and an RGB-D camera into a smart cane, leveraging laser SLAM (Simultaneous Localization and Mapping) and an improved YOLOv5 model for obstacle recognition and real-time navigation. This approach enhances spatial mapping and ensures safer movement in both indoor and outdoor environments. Scalvini et al. [30] introduced an urban navigation system that employs real-time visual–auditory substitution, combining GPS, inertial sensors, and an RGB-D camera for accurate trajectory estimation and deep learning-based obstacle segmentation. The system includes a helmet connected to a processing unit via USB, enabling effective urban navigation.

3. Problem Analysis

Many visually impaired individuals face significant challenges when walking in real environments, as shown in Figure 1. One major problem is sudden changes in the walking surface. These changes can include going from flat ground to stairs, or from paved paths to grassy, rough, or uneven areas. People who can see may not think these changes are a serious issue, but for the visually impaired, they can be very dangerous. Without the ability to see these surface changes, it is hard for them to react in time. This increases the chance of tripping, losing balance, or falling. These kinds of accidents happen often and are one of the main causes of injury and early death in this group. Figure 1 shows common places around Burapha University, Thailand, both indoors and outdoors. Examples include walkways, roundabouts, and building entrances. In the images, the red boxes mark the areas where the ground changes. These are the spots that can be dangerous for people with vision problems because the changes often happen without clear sound or touch clues.
To help solve this problem, this study presents a low-cost support system. It uses a smart cane with built-in RGB cameras, as shown in Figure 2. The system uses a method called Pathway Surface Transition Point Detection (PSTPD). This method can find pathway surface changes in real time. It looks at images from a camera pointing at the ground. The system can tell what type of surface it is and measure how far away the change is using a depth map. The system then sends audio alerts to the user. These alerts tell them what surface is coming, how far away it is, and how dangerous it might be. By giving this information in advance, the system helps users avoid falls and stay safe while walking.

4. System Overview

The proposed system is divided into two main components. The first component, the hardware unit, describes the hardware elements utilized in the system. The second component, the software unit, presents the operational flow of the system using a flowchart to illustrate the overall software processes.

4.1. Hardware Unit

The hardware unit of the proposed smart cane system comprises four main components, as illustrated in Figure 3. The first component is a battery, which serves as the power supply for all hardware devices. The second is a Raspberry Pi 4 Model B (Raspberry Pi Ltd., Cambridge, UK), functioning as the central processing unit responsible for managing data input, processing, and output. The third component is an ultrasonic sensor (HC-SR04, NINGBO FBELE ELECTRONICS CO., LTD., Ningbo, China), used to measure the distance to obstacles in front of the user, thereby enhancing real-time spatial awareness. Lastly, the system includes two RGB cameras, each mounted at specific angles to maximize environmental perception. The first camera is tilted downward at an angle of 25 degrees to detect changes in pathway surfaces and estimate the distance to pathway surface transition points. Meanwhile, the second camera is aligned at a 0-degree horizontal angle to detect potentially hazardous obstacles. Both cameras are calibrated relative to the cane, which is held at a 45-degree tilt from the vertical axis. All components are integrated into a unified assistive device designed to enhance the mobility, safety, and confidence of visually impaired individuals.

4.2. Software Unit

The software unit in this system is designed to assist users in navigating environments by detecting changes in pathway surfaces and identifying potential obstacles, as illustrated in Figure 4. Initially, the system captures images continuously and performs a Pathway Surface Transition Point Detection process. If a transition is detected, the system alerts the user with surface type, distance information, and danger level. Simultaneously, an ultrasonic sensor measures the distance to nearby objects. If an object is detected within 2 m, the system provides a warning alert. Additionally, if the user presses a button, the camera captures an image of the obstacle, which is then classified using an obstacle recognition model. The system informs the user about the object type and continues monitoring, ensuring ongoing assistance and safety.
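For illustration, a minimal sketch of this control flow is given below. The helper callables (the capture functions, the ultrasonic reader, the button check, the two model wrappers, and the audio announcer) are hypothetical placeholders for the components described in the following sections, not code from the actual system.

```python
import time

ALERT_DISTANCE_CM = 200  # 2 m obstacle alert threshold from Section 4.2

def main_loop(capture_down, capture_front, read_distance_cm, button_pressed,
              detect_transition, classify_obstacle, announce):
    """Illustrative control flow; all callables are injected wrappers around
    the cameras, HC-SR04 sensor, push button, YOLO models, and audio output."""
    while True:
        # 1) Continuous pathway surface monitoring (downward camera, YOLOv8n)
        transition = detect_transition(capture_down())
        if transition:
            surface, distance_cm, severity = transition
            announce(f"{severity} surface change to {surface} in {distance_cm:.0f} cm")

        # 2) Ultrasonic pre-check; obstacle recognition runs only on button press
        d = read_distance_cm()
        if d is not None and d <= ALERT_DISTANCE_CM:
            announce(f"object ahead at {d:.0f} cm, press button to identify")
            if button_pressed():
                label, severity = classify_obstacle(capture_front())  # YOLOv5x
                announce(f"{severity} obstacle: {label} at {d:.0f} cm")

        time.sleep(0.1)  # ~10 Hz loop, matching the ultrasonic sampling rate
```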

5. Proposed Method

The proposed method consists of three main components, Input, Processing Unit, and Notification, as indicated in Figure 5. The system uses an ultrasonic sensor to detect the distance to obstacles and a camera to capture RGB images of the surroundings. The processing unit analyzes the data through models that detect obstacles and pathway surface transition points, and then classifies the severity of both. Finally, audio notifications are generated to alert users about detected obstacles and specific walking surface transitions.

5.1. Input

The sensor and input subsystem comprises a Raspberry Pi 4 Model B, two RGB cameras, a 10,000 mAh lithium polymer battery (5V/3A), an HC-SR04 ultrasonic sensor, and a push button switch, as summarized in Table 2. The Raspberry Pi, equipped with a Broadcom BCM2711 Quad-core Cortex-A72 (1.5 GHz) and 8 GB LPDDR4-3200 RAM, serves as the central processing unit for real-time tasks. Camera 1 (Hoco GM101 (Shenzhen Hoco Technology Development Co., Ltd., Shenzhen, China), 2560 × 1440 @ 30 FPS) operates continuously to capture downward-facing images for pathway surface detection using the YOLOv8n model. Camera 2 (OE-B35 (OKER (Thailand) Co., Ltd., Krathum Baen, Thailand); 640 × 480 pixels @ 30 FPS) is oriented forward and dedicated to obstacle detection via the YOLOv5x model, which is activated only upon user confirmation through a push button switch. When the HC-SR04 ultrasonic sensor detects an object within a 2 m range, the system notifies the user, who may then manually trigger the obstacle detection process. Operating at a frequency of 10 Hz, the ultrasonic sensor provides a detection range of 2–400 cm with a beam angle of less than 15°. In particular, the following describes how the ultrasonic sensor operates and interfaces with the Raspberry Pi. The HC-SR04 ultrasonic sensor is employed in the system to determine the distance between the user and potential obstacles. It emits ultrasonic pulses and measures the time it takes for the echo to return after bouncing off an object. The sensor connects to the Raspberry Pi 4 via its GPIO pins, with one pin designated for pulse transmission (Trigger) and another for receiving the echo signal (Echo). A short high signal, typically lasting 10 microseconds, is sent to the Trigger pin. When the reflected signal is received at the Echo pin, the Raspberry Pi measures the time interval, which is then used to calculate the distance using Equation (1), as follows:
$D = \frac{T \times v}{2}$
where T is the measured time interval and v is the speed of sound in air (approximately 343 m/s). This process is handled by the Raspberry Pi using Python 3.11.4 libraries (RPi.GPIO, time) at a sampling rate of 10 Hz. To enhance accuracy, the system averages multiple readings and triggers an alert when obstacles are detected within 2 m. The sensor’s narrow beam angle (<15°) ensures precise directionality.
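For illustration, a sketch of this measurement routine using the RPi.GPIO and time libraries is shown below. The GPIO pin numbers and the number of averaged readings are assumptions; only the 10 µs trigger pulse, the 343 m/s speed of sound, the 10 Hz sampling rate, and the 2 m alert threshold follow the description above.

```python
import time
import RPi.GPIO as GPIO

TRIG_PIN, ECHO_PIN = 23, 24     # assumed BCM pin numbers, not specified in the paper
SPEED_OF_SOUND = 343.0          # m/s, as in Equation (1)

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG_PIN, GPIO.OUT)
GPIO.setup(ECHO_PIN, GPIO.IN)

def read_distance_cm(timeout_s=0.04):
    """One HC-SR04 reading: 10 us trigger pulse, echo timing, then D = T * v / 2."""
    GPIO.output(TRIG_PIN, True)
    time.sleep(10e-6)                       # 10 microsecond trigger pulse
    GPIO.output(TRIG_PIN, False)

    deadline = time.time() + timeout_s
    start = time.time()
    while GPIO.input(ECHO_PIN) == 0:        # wait for the echo to go high
        start = time.time()
        if start > deadline:
            return None                     # no echo received
    stop = start
    while GPIO.input(ECHO_PIN) == 1:        # measure how long the echo stays high
        stop = time.time()
        if stop - start > timeout_s:
            return None                     # out of range
    elapsed = stop - start                  # T in Equation (1)
    return (elapsed * SPEED_OF_SOUND / 2.0) * 100.0   # metres -> centimetres

def averaged_distance_cm(samples=5):
    """Average several readings to smooth noise (sample count is an assumption)."""
    readings = [d for d in (read_distance_cm() for _ in range(samples)) if d is not None]
    return sum(readings) / len(readings) if readings else None

if __name__ == "__main__":
    try:
        while True:
            distance = averaged_distance_cm()
            if distance is not None and distance <= 200:    # 2 m alert threshold
                print(f"Obstacle ahead at {distance:.1f} cm")
            time.sleep(0.1)                                 # 10 Hz sampling rate
    finally:
        GPIO.cleanup()
```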
Empirical testing under standard operating conditions (power consumption of 8–10 W) indicates that the system can sustain continuous operation for approximately 3.5 to 4 h on a full battery charge.

5.2. Processing Unit

The processing unit in this smart cane system for visually impaired individuals is responsible for analyzing data from cameras and sensors to accurately detect obstacles and identify changes in the walking surface. It operates through two main processes: the obstacle detection process and the pathway surface detection process.

5.2.1. Obstacle Detection Process

The obstacle detection process consists of two steps: (1) object detection using the YOLOv5 model [37] to identify 29 obstacle classes from images and (2) severity assessment that categorizes obstacles into mild, moderate, or severe levels based on potential risk to visually impaired users.
  1. Obstacle detection model
The object detection model is constructed using the YOLOv5 architecture, which consists of a convolutional backbone for feature extraction, a neck for feature aggregation, and multi-scale detection heads to identify objects of varying sizes (small, medium, and large). This design enables the system to effectively recognize 29 obstacle types that are critical to safe navigation for the visually impaired. To optimize processing efficiency and extend battery life, the obstacle detection model is not continuously active. Instead, the system first monitors the environment using an HC-SR04 ultrasonic sensor, which measures the distance to nearby objects. If an obstacle is detected within a range of 2 m, the system generates a tactile or auditory alert to the user. The user is then prompted to press a button to activate the object detection process. Once the user initiates the detection, the system captures an image from the forward-facing camera and feeds it to the YOLOv5 model for analysis. The results are subsequently used in the severity assessment step to determine whether the detected object poses a potential danger. This conditional activation strategy significantly reduces unnecessary computation and ensures that the system remains energy-efficient while still providing situational awareness when it matters most. Figure 6 illustrates the YOLOv5 architecture applied for obstacle detection in this study. The pipeline begins with an input image that is processed through the backbone, which includes multiple convolutional (CONV) layers and C3 modules, responsible for extracting hierarchical features at different levels of abstraction. At the end of the backbone, a Spatial Pyramid Pooling (SPP) block aggregates spatial information across multiple receptive fields to enhance contextual understanding. The extracted features are then passed to the neck, where feature maps from different stages are fused using a combination of upsampling, concatenation (CONCAT), and additional C3 layers. This process enables the model to preserve fine-grained details and improve robustness to scale variation. The processed features are forwarded to the head, which consists of three detection branches targeting small, medium, and large object sizes. This multi-scale detection approach enhances the model’s ability to identify objects at various distances and dimensions within the scene. The final output comprises bounding boxes and class predictions for 29 predefined object categories. As demonstrated in the figure, the system successfully detects and labels a “motorcycle” and a “bench,” confirming its capability to handle multiple object types within a single frame.
  2. Obstacle severity assessment process
The obstacle severity assessment categorizes 29 object classes detected by the system into three levels of risk, mild, moderate, and severe, based on their potential threat to visually impaired individuals using a smart cane. This classification supports real-time decision making and enhances navigation safety. Objects in the mild category, such as furniture or small personal items, are typically static and easily detectable, posing minimal interference. Moderate obstacles include partially obstructive items or objects positioned at varying elevations, which may reduce detection accuracy and introduce moderate risk. Severe obstacles comprise dynamic or large entities commonly found in high-risk environments, such as vehicles and pedestrians, and require immediate user awareness. The classification framework is detailed in Table 3.
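As a minimal sketch of this two-step process, the snippet below maps a few representative labels to severity levels and runs the detector only after the ultrasonic alert and button press. The label-to-level assignments shown are an illustrative subset of the 29 classes rather than the complete Table 3, and run_yolov5 and announce are hypothetical wrappers around the model and audio output.

```python
# Severity levels for detected obstacles (representative subset of Table 3)
MILD, MODERATE, SEVERE = 1, 2, 3

OBSTACLE_SEVERITY = {
    # mild: static, easily detectable items
    "backpack": MILD, "bottle": MILD, "cat": MILD,
    # moderate: partially obstructive or variable-height objects
    "bicycle": MODERATE, "chair": MODERATE, "table": MODERATE,
    # severe: large or dynamic entities
    "person": SEVERE, "car": SEVERE, "bus": SEVERE, "motorcycle": SEVERE,
}
SEVERITY_NAME = {MILD: "mild", MODERATE: "moderate", SEVERE: "severe"}

def assess_obstacles(detections):
    """Map YOLOv5 detections [(label, confidence), ...] to severity names."""
    results = []
    for label, _conf in detections:
        level = OBSTACLE_SEVERITY.get(label, MODERATE)   # default level is an assumption
        results.append((label, SEVERITY_NAME[level]))
    return results

def on_button_press(frame, distance_cm, run_yolov5, announce):
    """Conditional activation: YOLOv5x runs only after the ultrasonic alert
    and an explicit button press, saving computation and battery."""
    detections = run_yolov5(frame)                       # hypothetical model wrapper
    for label, severity in assess_obstacles(detections):
        announce(f"{severity} obstacle: {label} at about {distance_cm:.0f} cm")
```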

5.2.2. Pathway Surface Transition Point Detection (PSTPD) Process

The PSTPD process consists of three core components designed to enhance mobility safety for visually impaired users. First, the pathway surface detection model employs a YOLOv8n-based architecture [38] to classify eight distinct walking surface types, such as grass, holes, and crosswalk, that commonly occur in urban environments. Second, the pathway surface transition point distance estimation process calculates the distance from the user to the transition point using a weighted center approach derived from bounding box areas and a pre-calibrated depth map, ensuring real-world distance accuracy. Finally, the pathway surface transition severity assessment process evaluates the risk level of each detected transition by comparing the severity scores of the involved surfaces, assigning them as mild, moderate, or severe. This multi-stage process enables real-time identification and classification of surface changes, coupled with accurate distance and risk-level alerts, to proactively inform users and reduce fall risk.
  1. Pathway surface detection model
The model operates continuously to detect and classify various walking surface conditions that may impact the mobility and safety of visually impaired individuals using a smart cane. Built upon the YOLOv8n architecture, the model is trained to recognize eight distinct surface classes: Braille block, crosswalk, grass, hole, normal, puddle, rough, and uneven floor. Each class represents a unique walking surface characteristic. Braille blocks and crosswalks indicate structured and safe navigation paths, whereas grass and puddles may increase the risk of slipping. Meanwhile, holes, rough surfaces, and uneven floors present potential tripping or falling hazards. This real-time pathway surface detection process ensures that the smart cane provides timely alerts to the user, enhancing safe and independent mobility. The pathway surface detection pipeline is based on the YOLOv8n architecture, as shown in Figure 7. The process begins with an input image captured by a downward-facing camera, which is then fed into the model’s backbone. The backbone consists of multiple convolutional (CONV) layers and Cross Stage Partial with Focus (C2F) modules that extract hierarchical features from the input. The Spatial Pyramid Pooling Fast (SPPF) block further enhances the receptive field and captures contextual information at multiple scales. These features are passed to the neck, where feature maps are fused through upsampling, concatenation (CONCAT), and additional C2F layers. The fused multi-scale features are then forwarded to the head, which includes three detection branches designed for small-, medium-, and large-scale object detection. The final output consists of bounding boxes and classification results for eight pathway surface classes, such as normal, Braille block, puddle, and uneven floor. The system accurately detects and labels two regions in the sample output, “Normal” and “Uneven_floor,” demonstrating its effectiveness in recognizing diverse surface transitions to support the mobility of visually impaired individuals.
  2. Pathway surface transition point distance estimation process
The process is designed to calculate the distance between the user and the point where a change in walking surface occurs, which is an essential factor in ensuring safe mobility for visually impaired individuals. The process begins by detecting two bounding boxes representing different surface types, as identified by the YOLOv8-based surface classification model, as illustrated in Figure 8. The center of each bounding box (Box 1 and Box 2) is calculated using Equation (2), which determines the midpoint based on the top-left and bottom-right coordinates of each box.
$c_x^1 = \frac{x_1^1 + x_2^1}{2}, \quad c_y^1 = \frac{y_1^1 + y_2^1}{2}$
$c_x^2 = \frac{x_1^2 + x_2^2}{2}, \quad c_y^2 = \frac{y_1^2 + y_2^2}{2}$
where $(x_1^1, y_1^1)$ and $(x_2^1, y_2^1)$ denote the top-left and bottom-right coordinates of Box 1, respectively, and $(x_1^2, y_1^2)$ and $(x_2^2, y_2^2)$ denote the top-left and bottom-right coordinates of Box 2, respectively. The center points of Box 1 and Box 2 are represented by $(c_x^1, c_y^1)$ and $(c_x^2, c_y^2)$, respectively.
Subsequently, the area of each box is computed using Equation (3), which calculates the width and height of each bounding box using absolute values to ensure non-negative results regardless of coordinate order. These areas are then used to derive the relative weight (w1 and w2) for each box, as formulated in Equation (4), allowing the estimation process to consider the proportionate significance of each bounding box.
$A_1 = |x_2^1 - x_1^1| \cdot |y_2^1 - y_1^1|$
$A_2 = |x_2^2 - x_1^2| \cdot |y_2^2 - y_1^2|$
where $A_1$ and $A_2$ denote the areas of Box 1 and Box 2, respectively. The use of absolute values ensures that the computed area remains non-negative regardless of the coordinate order.
$w_1 = \frac{A_2}{A_1 + A_2}, \quad w_2 = \frac{A_1}{A_1 + A_2}$
where $w_1$ denotes the weight assigned to the center of Box 1, while $w_2$ denotes the weight assigned to the center of Box 2. These weights satisfy the condition $w_1 + w_2 = 1$ at all times.
Using these weights, the final weighted center point, representing the most accurate estimate of the surface transition location between the two boxes, is calculated using Equation (5). This point serves as the estimated position of the transition. The depth value at this location is then determined using Equation (6), which employs a linear interpolation method [30] based on a pre-calibrated depth reference map. The depth map is constructed from known coordinates and corresponding real-world depth values, enabling the system to estimate the distance from Camera 1 to the surface transition point, as illustrated in Figure 9. To determine the actual distance from the user to the surface transition point, Equation (7) is used. This equation incorporates the constant value L, which represents the fixed physical distance between Camera 1 and the user. By adding this known offset to the interpolated depth $E(C_x, C_y)$, the system provides a complete estimation of the total distance from the user to the transition point in the real-world environment. A consolidated code sketch of Equations (2)–(10) is provided at the end of this subsection.
$C_x = c_x^1 \cdot w_1 + c_x^2 \cdot w_2$
$C_y = c_y^1 \cdot w_1 + c_y^2 \cdot w_2$
where $C_x$ and $C_y$ represent the final weighted center point in the x- and y-dimensions, respectively, computed from the two bounding boxes.
$E(C_x, C_y) = \mathrm{griddata}\left(\{(x_i, y_i, D_i)\}_{i=1}^{9},\ (C_x, C_y),\ \text{method} = \text{linear}\right)$
where $(x_i, y_i)$ denote the pixel coordinates of the i-th reference point, $D_i$ denotes the actual measured depth at the i-th reference point, and $E(C_x, C_y)$ denotes the estimated depth value at location $(C_x, C_y)$.
$E'(C_x, C_y) = E(C_x, C_y) + L$
where $E'(C_x, C_y)$ represents the estimated distance from the user to the surface transition point, and $L$ is a known constant that represents the fixed distance between the camera and the user.
  3. Pathway surface transition point severity assessment process
The process evaluates the severity level of transitions between different surface types, as demonstrated in Figure 10. Surface classes are categorized based on their severity into three levels: mild, moderate, and severe. The set of surface classes provides a comprehensive framework of distinct surface types encountered in urban environments, as defined in Equation (8). Each element represents a specific surface class, including Braille block, crosswalk, grass, hole, normal, puddle, rough, and uneven floor. These surfaces are carefully selected due to their potential impact on pedestrian mobility, especially for individuals with visual impairments.
$C = \{G_1, G_2, G_3, G_4, G_5, G_6, G_7, G_8\}$
where $G_1, G_2, G_3, G_4, G_5, G_6, G_7,$ and $G_8$ represent Braille block, crosswalk, grass, hole, normal, puddle, rough, and uneven floor, respectively.
The set with assigned severity defines the mapping between each surface class and its corresponding severity level based on how challenging or hazardous it may be, particularly for individuals with visual impairments. As shown in Equation (9), the function Level(c) assigns a severity score of 1 (mild), 2 (moderate), or 3 (severe) to each surface type $c \in C$. Surfaces such as normal, crosswalk, and Braille block are considered mild, while grass, puddle, and rough surfaces are categorized as moderate due to their potential to cause confusion or imbalance. More dangerous surfaces like holes and uneven floors are classified as severe. This severity assignment plays a critical role in quantifying and understanding the risks associated with different surfaces.
$\mathrm{Level}(c) = \begin{cases} 1 & \text{if } c \in \{\text{Normal, Crosswalk, Braille block}\} \\ 2 & \text{if } c \in \{\text{Grass, Puddle, Rough}\} \\ 3 & \text{if } c \in \{\text{Hole, Uneven floor}\} \end{cases} \quad \text{for } c \in C$
where Level(c) ∈ {1, 2, 3} is the severity score of surface c, corresponding to mild = 1, moderate = 2, and severe = 3.
The concept of surface transition severity is introduced to quantify the level of difficulty or risk involved when transitioning between two different types of surfaces. This is especially critical for individuals with visual impairments, as sudden or severe changes in surface type can lead to confusion, instability, or accidents. Each surface type is previously assigned a severity level based on its physical characteristics, such as smoothness or irregularity, ranging from mild (1) to severe (3). When moving between two surfaces, the transition can be challenging depending on the combination of their individual severity levels, as defined in Equation (10); this equation ensures that the most challenging surface in a transition dictates the severity of that transition. For example, if the transition is from a crosswalk (Level 1) to an uneven floor (Level 3), the severity level of the transition is considered 3, or severe. The severity of a transition from surface $c_i$ to surface $c_j$, denoted by $S_{i,j}$, is calculated using the maximum of their respective severity levels.
$\mathrm{Level}(S_{i,j}) = \max\big(\mathrm{Level}(c_i), \mathrm{Level}(c_j)\big), \quad \text{for } c_i, c_j \in C,\ i \neq j$
where $c_i, c_j \in C$ denote the source and target surface classes, Level(c) ∈ {1, 2, 3} represents the severity level of surface c corresponding to mild, moderate, and severe, respectively, and $S_{i,j}$ denotes the transition from $c_i$ to $c_j$, excluding the case where $c_i = c_j$.
This set defines specific transition cases between different walking surface types, such as from normal to crosswalk $(G_5, G_2)$ or from grass to puddle $(G_3, G_6)$, as defined in Equation (11). These transitions are selected based on common real-world scenarios that visually impaired individuals may encounter while navigating urban environments. By analyzing these transitions, the system can assess the severity level associated with each pair and provide timely auditory alerts to users.
$\text{Transition cases} = \{(G_5, G_2), (G_3, G_6), (G_3, G_5), (G_3, G_7), (G_7, G_3), (G_5, G_3), (G_7, G_6), (G_7, G_8), (G_5, G_8)\}$
where $G_1, G_2, G_3, G_4, G_5, G_6, G_7,$ and $G_8$ represent Braille block, crosswalk, grass, hole, normal, puddle, rough, and uneven floor, respectively.
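Putting this together, the following sketch implements Equations (2)–(10): the weighted transition center, depth interpolation over the nine calibrated reference points with scipy.interpolate.griddata, the camera-to-user offset L, and the maximum-severity rule. The reference coordinates, depths, bounding boxes, and offset in the example are placeholder values, not the calibration used in the paper.

```python
import numpy as np
from scipy.interpolate import griddata

# Severity levels per surface class, Equation (9)
LEVEL = {"normal": 1, "crosswalk": 1, "braille_block": 1,
         "grass": 2, "puddle": 2, "rough": 2,
         "hole": 3, "uneven_floor": 3}

def weighted_transition_center(box1, box2):
    """Boxes are (x1, y1, x2, y2). Implements Equations (2)-(5)."""
    (x11, y11, x21, y21), (x12, y12, x22, y22) = box1, box2
    c1 = ((x11 + x21) / 2, (y11 + y21) / 2)
    c2 = ((x12 + x22) / 2, (y12 + y22) / 2)
    a1 = abs(x21 - x11) * abs(y21 - y11)
    a2 = abs(x22 - x12) * abs(y22 - y12)
    w1, w2 = a2 / (a1 + a2), a1 / (a1 + a2)        # cross-weighted by area, Eq. (4)
    return (c1[0] * w1 + c2[0] * w2, c1[1] * w1 + c2[1] * w2)

def transition_distance_cm(center, ref_points, ref_depths_cm, camera_offset_cm):
    """Equations (6)-(7): linear interpolation over the calibrated depth map."""
    depth = griddata(np.asarray(ref_points), np.asarray(ref_depths_cm),
                     [center], method="linear")[0]
    return float(depth) + camera_offset_cm

def transition_severity(surface_a, surface_b):
    """Equation (10): the riskier surface dictates the transition severity."""
    return max(LEVEL[surface_a], LEVEL[surface_b])

# Example with placeholder calibration values (illustrative only)
if __name__ == "__main__":
    refs = [(80, 100), (320, 100), (560, 100),
            (80, 240), (320, 240), (560, 240),
            (80, 400), (320, 400), (560, 400)]           # 9 pixel reference points
    depths = [210, 205, 212, 150, 148, 152, 95, 92, 97]  # measured depths in cm
    center = weighted_transition_center((100, 120, 600, 260), (120, 260, 580, 420))
    dist = transition_distance_cm(center, refs, depths, camera_offset_cm=30)
    print(f"transition at {dist:.1f} cm, "
          f"severity {transition_severity('normal', 'uneven_floor')}")
```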

5.3. Notification

The developed smart cane system focuses on delivering audio notifications through headphones, allowing visually impaired users to be aware of obstacles and surface changes in a timely manner. The system provides alerts that include both the distance and the level of potential danger detected, covering obstacles ahead, as well as transitions in walking surfaces that could compromise user safety.

5.3.1. Obstacle Detection Alerts

When the ultrasonic sensor detects an obstacle within a detection range of 2 m, the user can activate an object detection module via a dedicated button. This module, built upon the YOLOv5 architecture, supports the recognition of 29 distinct object categories. Detected objects are classified into three levels of severity: mild, referring to stationary and easily recognizable items such as bags, cats, or bottles; moderate, indicating objects with variable positions or heights, including bicycles, tables, or chairs; and severe, which encompasses large or dynamic entities such as people, cars, or buses. Upon detection, the system delivers real-time audio alerts specifying the severity level, object type, and distance, thereby assisting the user in making timely navigational adjustments to avoid potential hazards.

5.3.2. Pathway Surface Transition Point Detection Alerts

The system is capable of accurately detecting surface transitions between two types of terrain that may impact user safety. Examples include transitions from a normal surface to an uneven surface, or from a crosswalk to a wet area. This is achieved through the use of a Surface Transition Model, which integrates camera input with deep learning algorithms (YOLOv8n) to classify surfaces into eight categories: normal, Braille block, crosswalk, grass, puddle, rough surface, hole, and uneven surface. The system assesses the severity level of a surface transition based on the danger level associated with each surface type. It adopts the higher risk level between the starting and ending surfaces. For example, a transition from a normal surface (mild) to an uneven surface (severe) is classified as severe, while a transition from a crosswalk (mild) to a puddle surface (moderate) is classified as moderate, and a transition from a normal surface (mild) to a Braille block (mild) remains classified as mild. The system calculates the distance from the user to the point where the surface transition occurs and issues an audio alert indicating the hazard level, surface type, and distance. This enables the user to anticipate the transition in advance and reduce the risk associated with changes in walking conditions.

6. Experiments and Results

This study implemented a comprehensive testing framework to evaluate the smart cane’s performance in real-world scenarios. The experimental setup was designed to assess multiple integrated components of the system, including object detection, surface classification, and distance estimation. The framework included both controlled and uncontrolled scenarios to ensure robust performance under varying lighting, obstacle density, and surface transitions. Furthermore, the evaluation focused not only on the system’s accuracy but also on its responsiveness and real-time capability when deployed on embedded hardware, such as a Raspberry Pi. This approach ensured a holistic assessment of the system’s reliability and practical applicability for visually impaired users.

6.1. Experiments

This section comprises three components: dataset description, system configuration, and evaluation methodology. The datasets include both newly collected and publicly available sources for performance benchmarking. The configuration covers hardware setup and model parameters. The evaluation focuses on performance metrics used to assess system effectiveness. As summarized in Table 4, eight experiments were conducted. Experiments 1 and 3 evaluate object detection for obstacle identification and surface classification, respectively. Experiment 2 assesses obstacle distance measurement using an ultrasonic sensor. Experiments 4–6 examine distance estimation for surface transition points with mild, moderate, and severe severity levels. Experiment 7 compares detection models and transition estimation methods, while Experiment 8 presents ablation studies on system components.

6.1.1. Dataset

The dataset used in this study comprises two primary sources: the COCO2017 dataset [39] and the Pathway Surface Dataset [40,41,42,43,44,45,46,47,48]. The COCO2017 dataset, widely used for general object detection tasks, contains 29 object classes with 98,057 training images and 4127 testing images, encompassing categories such as people, vehicles, animals, and everyday objects. In contrast, the Pathway Surface Dataset is specifically developed for pathway surface classification, focusing on eight distinct surface types: normal [40,41], rough [42], grass [43], Braille block [44], crosswalk [45], puddle [46], hole [47], and uneven floor [48]. For each class, 150 original images were manually selected from open-access datasets available on the Roboflow Universe platform, ensuring a broad range of environmental diversity. These images were then augmented using standard image transformation techniques (e.g., rotation, flipping, contrast adjustment), resulting in 4800 training images and 1600 testing images. Notably, the images for the Pathway Surface Transition Prediction (PSTP) dataset were collected in real environments across the campus of Burapha University, Thailand, to reflect realistic and diverse walking conditions. To further assess the system’s performance in detecting surface irregularities, the testing set of the PSTP dataset was categorized into three severity levels, mild (50 images), moderate (300 images), and severe (100 images), as illustrated in Figure 10. A comprehensive summary of the datasets is provided in Table 5.
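As an illustrative sketch of the augmentation step (rotation, flipping, and contrast adjustment), the snippet below uses the Albumentations library with YOLO-format bounding boxes; the parameter values and file paths are placeholders rather than the settings used to build the dataset.

```python
import albumentations as A
import cv2

# Illustrative augmentation pipeline (rotation, flipping, contrast adjustment);
# parameter values are placeholders, not the settings used for the dataset.
transform = A.Compose(
    [
        A.Rotate(limit=15, p=0.7),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.7),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("surface_sample.jpg")          # placeholder path
bboxes = [[0.5, 0.6, 0.4, 0.3]]                   # one YOLO-format box (x, y, w, h)
augmented = transform(image=image, bboxes=bboxes, class_labels=["uneven_floor"])
cv2.imwrite("surface_sample_aug.jpg", augmented["image"])
```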

6.1.2. Configuration Parameter

The training process, as detailed in Table 6, was configured to optimize the system’s performance while maintaining efficiency for deployment on resource-constrained hardware such as the Raspberry Pi 4. All models were trained with an input image size of 640 × 640 pixels, ensuring a balance between detection resolution and processing speed. A learning rate of 0.0001 was adopted to provide gradual convergence and stable training dynamics. The AdamW optimizer was selected due to its adaptive learning rate capability combined with weight decay regularization, which helps prevent overfitting and supports generalization across diverse environments. To accommodate the hardware memory constraints while preserving batch-level learning efficiency, the batch size was set to 27. The training spanned 200 epochs, which provided sufficient iterations for the model to converge effectively across all classes in both object and surface detection tasks. The system utilized two sets of pre-trained weights tailored for specific subtasks: yolov8n.pt for pathway surface detection and yolov5x.pt for obstacle detection. The YOLOv8n model was chosen for surface classification due to its lightweight architecture and fast inference, which are suitable for continuous real-time operation from the downward-facing camera. In contrast, YOLOv5x, though slightly heavier, was selected for obstacle detection because of its proven robustness in detecting small, static, and dynamic objects with high precision upon user activation.
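Under these settings, training the pathway surface detector with the Ultralytics API could look roughly as follows; the dataset YAML path is a placeholder, and the export format is an assumption rather than the deployment route used in this work. The YOLOv5x obstacle detector would be trained analogously with the corresponding YOLOv5 toolchain.

```python
from ultralytics import YOLO

# Train the pathway surface detector with the settings from Table 6.
# "pathway_surface.yaml" is a placeholder dataset definition (8 classes).
model = YOLO("yolov8n.pt")
model.train(
    data="pathway_surface.yaml",
    imgsz=640,          # input image size
    epochs=200,
    batch=27,
    lr0=0.0001,         # initial learning rate
    optimizer="AdamW",
)

# Validate and export for deployment on the Raspberry Pi
metrics = model.val()
print(metrics.box.map50)          # mAP@50 on the validation split
model.export(format="onnx")       # one possible deployment format (assumption)
```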

6.1.3. Evaluation

In object detection tasks, selecting appropriate evaluation metrics is essential for accurately assessing model performance. YOLO utilizes several well-established metrics to evaluate different aspects of detection accuracy, including object localization, classification, and the balance between false positives and false negatives. The mean Average Precision (mAP) [49], as defined in Equation (12), measures the model’s ability to correctly detect and classify objects across different classes by computing the Average Precision (AP) for each class and taking the mean across all classes. The Intersection over Union (IoU) [49] quantifies the overlap between predicted and ground truth bounding boxes, as presented in Equation (13). Additionally, precision, recall, and F1-score [49] are employed to further evaluate the model’s effectiveness.
  1. Mean Average Precision (mAP)
Mean Average Precision (mAP) is a widely adopted metric for evaluating the performance of object detection models, as it provides a balanced measure of both precision and recall across all object classes. It reflects the model’s ability to accurately localize and classify objects. The YOLO architecture typically provides two variants of this metric: mAP@50, which computes the Average Precision (AP) using a fixed Intersection over Union (IoU) threshold of 0.5, and mAP@50:95, which averages the AP values across multiple IoU thresholds ranging from 0.5 to 0.95 in increments of 0.05, thereby offering a more comprehensive assessment of model robustness. The overall mAP is defined as the mean of the AP values across all classes, as formulated in Equation (12) below:
$\mathrm{mAP} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{AP}_i$
where C is the total number of object classes, and $\mathrm{AP}_i$ denotes the Average Precision for class i. $\mathrm{AP}_i$ is calculated as the area under the precision–recall (P–R) curve specific to class i, which is constructed by plotting precision against recall values obtained at different confidence thresholds. The curve is typically interpolated to ensure monotonicity, and the area under this curve is computed using numerical integration. A larger area corresponds to better detection performance for that class. Therefore, $\mathrm{AP}_i$ effectively captures the trade-off between false positives and false negatives for class i, and averaging these values across all classes yields the overall mAP score.
  2. Intersection over Union (IoU)
The Intersection over Union (IoU) is a key metric for evaluating how well the predicted bounding boxes align with the ground truth boxes. It is calculated by dividing the overlapping area between the predicted and actual boxes by the total area covered by both boxes. A higher IoU indicates a more accurate localization.
$\mathrm{IoU} = \frac{R_p \cap R_g}{R_p \cup R_g}$
where $R_p$ denotes the predicted bounding box, and $R_g$ represents the ground truth bounding box. The term $R_p \cap R_g$ refers to the area of overlap, which is the intersection between the predicted and ground truth bounding boxes. Meanwhile, $R_p \cup R_g$ represents the area of union, which is the total area covered by both bounding boxes.
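As a worked illustration of these metrics, the snippet below computes the IoU of two axis-aligned boxes (Equation (13)) and a per-class AP as the area under a monotonically interpolated precision–recall curve; it is a simplified sketch, not the evaluation code of the YOLO toolchain.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2), Eq. (13)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def average_precision(precision, recall):
    """AP as the area under a monotonically interpolated P-R curve."""
    p = np.asarray(precision, dtype=float)
    r = np.asarray(recall, dtype=float)
    order = np.argsort(r)
    p, r = p[order], r[order]
    p = np.maximum.accumulate(p[::-1])[::-1]      # enforce monotonic precision
    return float(np.trapz(p, r))                  # numerical integration

# Example: one predicted box against its ground truth
print(iou((50, 50, 150, 150), (60, 40, 160, 140)))                  # ~0.68
print(average_precision([1.0, 0.9, 0.75, 0.6], [0.2, 0.4, 0.6, 0.8]))
```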

6.2. Results

In the preliminary experimental results, the developed smart cane system demonstrated effective object detection performance. Based on evaluations using the YOLOv5 model, the system achieved a mean Average Precision (mAP@50) of 0.70 and mAP@50:95 of 0.50. The Average Precision and recall were 0.76 and 0.63, respectively. These results indicate the system’s capability to accurately classify a wide range of objects, particularly those relevant to the safety of visually impaired individuals, such as people, vehicles, and various types of obstacles, as shown in Table 7.
A comparative analysis of the mean Average Precision at 50% Intersection over Union (mAP@50) was conducted to evaluate the object detection performance of the proposed system, as presented in Table 8, alongside the approaches developed by Mai et al. [29] and Scalvini et al. [30]. While both the proposed system and the method by Mai et al. employ the YOLOv5 architecture, the results demonstrate that the proposed system achieves higher detection accuracy across several object classes. For instance, detection performance for the “person” and “bus” categories reached 0.84 and 0.86, respectively, compared to 0.67 for both categories in Mai et al.’s implementation. These improvements highlight the enhanced practical effectiveness of the proposed method, even when utilizing the same model architecture. In contrast, Scalvini et al. [30] employed the more advanced YOLOv8 model, which exhibited superior accuracy across a wider range of object classes, reflecting its enhanced detection capabilities. However, YOLOv8 requires significantly greater computational resources than YOLOv5, rendering it less suitable for real-time applications on resource-constrained platforms such as the Raspberry Pi, which serves as the primary processing unit in the smart cane system developed in this study.
Table 9 and Table 10 present the results of the accuracy evaluation for distance measurement conducted using the ultrasonic sensor integrated into the developed smart cane system. In Table 9, the measured distances were compared with the ground truth at 25, 50, 100, 150, 200, and 300 cm, with five measurements performed at each distance. The results revealed that high measurement accuracy was achieved, with the mean error ranging from 0.0 to 0.6 cm and an average accuracy of 99.6%. In Table 10, the performance of the proposed system was compared with that of a traditional distance measurement method [11], based on both the mean error and accuracy at each distance. It was observed that the proposed system yielded lower mean errors in most cases and achieved a higher average accuracy of 99.6%, compared to 98.0% obtained from the traditional method. Notably, at the 25 cm distance, the traditional method had a slightly lower mean error (0.0 cm) than the proposed system (0.3 cm); however, the proposed method demonstrated superior performance across longer distances, contributing to its higher overall accuracy. These results indicate that the developed system provides more stable and accurate distance estimation for obstacle detection.
According to the performance evaluation presented in Table 11, the developed model demonstrated the highest accuracy in detecting “hole” and “grass” surfaces, achieving mean Average Precision at 50% Intersection over Union (mAP@50) scores of 0.99 and 0.96, respectively. Other surface types such as “puddle” and “crosswalk” also yielded strong results, with mAP@50 values of 0.94 and 0.93, respectively. The overall average mAP@50 across all surface classes was 0.92, indicating the system’s robust capability to accurately detect and classify a wide range of walking surface conditions. Furthermore, Figure 11 illustrates the normalized confusion matrix for the pathway surface detection model, showing the classification performance across all surface classes. The diagonal values represent correct predictions, with the highest accuracy observed in “Hole” (1.00), followed by “Crosswalk” (0.96), “Grass” (0.96), and “Rough” (0.93). Lower performance is seen in “Braille block” (0.68), which is frequently misclassified as “background” (0.28), and in “Uneven floor” (0.90), with some confusion with similar classes like “Normal” and “Rough.” Additionally, some background regions are misclassified as actual surface types, particularly “Braille block” (0.06) and “Grass” (0.11), indicating the challenge of distinguishing between background and foreground in certain cases. Overall, the matrix confirms the model’s high classification accuracy with minor confusion in visually similar surfaces. Lastly, Figure 12 illustrates the learning curves of the pathway surface detection model over 200 training epochs. The top row presents training losses—box loss, classification loss, and distribution focal loss (DFL), all of which exhibit a consistent downward trend, indicating effective model learning. The bottom row shows validation losses, which decrease sharply during the initial epochs and then stabilize, reflecting good generalization capability. The rightmost plots display performance metrics including precision, recall, mAP@50, and mAP@50:95, which increase rapidly within the first 20 epochs and remain consistently high throughout training. These trends confirm that the model converges efficiently and maintains strong performance in detecting various pathway surface types.
Table 12 compares object detection models for pathway surface classification using mAP@50. While previous studies by Mai et al. [29] and Scalvini et al. [30] considered only two surface classes, Braille block and crosswalk, with average mAP@50 values of 0.82 and 0.86, respectively, and did not specify the YOLO versions used, our study expands the scope to eight classes including grass, hole, normal, puddle, rough, and uneven floor. The proposed method based on YOLOv8n achieved an average mAP of 0.92, comparable to YOLOv5s (0.93) and YOLOv5x (0.93), while maintaining a lightweight architecture suitable for embedded systems. Although its accuracy on Braille block (0.69) was slightly lower than previous works, it outperformed both in crosswalk detection (0.93) and delivered excellent results on critical classes such as hole (0.99) and rough (0.95), confirming its robustness and efficiency for real-world navigation tasks. Importantly, YOLOv8n was selected as the final deployment model due to its favorable balance between accuracy and efficiency, with significantly fewer parameters (3.2 M vs. 9.1 M in YOLOv5s and 97.2 M in YOLOv5x), making it highly suitable for real-time applications on resource-constrained embedded platforms [50].
Table 13, Table 14, Table 15, Table 16, Table 17, Table 18, Table 19, Table 20 and Table 21 present the results of distance estimation for various surface transitions, categorized by severity—mild, moderate, and severe. In the mild case (Table 13), the transition from a normal to a crosswalk surface yielded a mean error of 7.8 cm, indicating the system’s ability to detect low-risk, subtle changes. For moderate transitions (Table 14, Table 15, Table 16, Table 17, Table 18 and Table 19), including surface changes such as grass to puddle, grass to normal, grass to rough, rough to grass, normal to grass, and rough to puddle, the system achieved consistent accuracy with mean errors ranging from 2.8 cm to 5.4 cm. In the severe category (Table 20 and Table 21), which involved rough to uneven floor and normal to uneven floor transitions, the system maintained stable performance with errors of 1.9 cm and 4.1 cm, respectively. Overall, the average error across all tested transitions was 4.22 cm. Additionally, the proposed PSTPD method demonstrated efficient processing with an average runtime of 0.6 s per instance.
To compare the performance of object detection models and surface transition estimation methods on embedded systems, three strategies including simple midpoint [51], monocular depth networks [52], and the proposed weighted center method were applied to YOLOv5s, YOLOv5x, and YOLOv8-nano models, as summarized in Table 22. Among all configurations, YOLOv8-nano with the weighted center approach achieved the best results, yielding the lowest mean error of 4.22 cm, fastest processing speed (1.72 FPS), lowest average CPU usage (67.32%), and minimal peak memory consumption (483.38 MB). Although YOLOv5x provided similar accuracy (4.25 cm), it required significantly more resources, making it less suitable for real-time use. YOLOv5s offered comparable accuracy (4.26 cm) with moderate resource usage but remained less efficient than YOLOv8-nano. These findings highlight the YOLOv8-nano with weighted center estimation as the most effective and efficient option for real-time pathway surface detection on resource-constrained embedded platforms.
To evaluate how each hardware component contributes to system performance, an ablation study was conducted using various combinations of Camera 1, Camera 2, and the ultrasonic sensor, as shown in Table 23. The full setup with both cameras and the ultrasonic sensor provided the best overall results, achieving high obstacle detection accuracy (mAP@50 = 0.70), excellent surface classification (mAP@50 = 0.92), low obstacle distance error (0.30 cm), and accurate surface transition estimation (mean error = 4.22 cm). When the ultrasonic sensor was removed, the system could still classify surfaces but lost the ability to estimate obstacle distance. In contrast, combining one camera with the ultrasonic sensor allowed for distance estimation and obstacle detection but not surface classification. Using individual components in isolation showed limited functionality—either surface classification from a single camera or distance estimation from the ultrasonic sensor. These results confirm that all three components are necessary to ensure the system operates reliably and comprehensively in real-world conditions.

7. Discussion

The proposed system detects pathway surface transition points and estimates both their severity level and their distance from the user. The experimental results confirm the system's effectiveness in assisting visually impaired individuals with both obstacle detection and surface transition awareness. By integrating the YOLOv8n-based surface detection model, calibrated depth mapping, and real-time auditory alerts, the smart cane identifies eight distinct walking surfaces and provides users with contextual hazard information.
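The calibrated depth mapping can be illustrated with a simple lookup: the image row of the detected transition point is interpolated against calibration points recorded at the fixed cane tilt. The calibration values below are hypothetical and serve only to show the mechanism, not the calibration used in this study.

```python
import numpy as np

# Hypothetical calibration table: image row (pixels, recorded at the fixed
# 45-degree cane tilt) paired with the ground distance (cm) observed at that row.
calib_rows_px = np.array([470, 430, 380, 320, 250, 170])  # near (bottom of frame) to far
calib_dist_cm = np.array([ 60,  80, 100, 120, 140, 160])

def row_to_distance_cm(row_px: float) -> float:
    # np.interp expects increasing x, so reverse the row axis (rows shrink with distance).
    return float(np.interp(row_px, calib_rows_px[::-1], calib_dist_cm[::-1]))

# Example: the weighted center of a surface transition falls at image row 352.
print(f"Estimated transition distance: {row_to_distance_cm(352):.1f} cm")
```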
Despite the overall strong performance, some limitations remain. For instance, errors in pathway surface detection occasionally affected the accuracy of transition point estimation, as illustrated in Figure 13. In some cases, the bounding boxes generated by the detection model did not align accurately with the true surface boundaries, shifting the estimated transition center and, consequently, the measured distance. To address this limitation, future studies may expand and diversify the training dataset to cover a broader range of surface conditions, lighting scenarios, and uncommon cases. In addition, data augmentation and adjustments to the model architecture or loss function may yield more balanced detection performance across surface classes.
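As one possible direction, heavier augmentation can be requested directly when retraining the surface model with the Ultralytics training API; the dataset file and augmentation values below are illustrative, not the configuration used in this study.

```python
from ultralytics import YOLO

# Illustrative retraining run with stronger augmentation than the defaults.
model = YOLO("yolov8n.pt")
model.train(
    data="pathway.yaml",                      # hypothetical dataset config for the eight surface classes
    epochs=200,
    imgsz=640,
    optimizer="AdamW",
    lr0=0.0001,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,        # colour jitter for varied lighting
    degrees=10.0, translate=0.1, scale=0.5,   # geometric variation for cane motion
    fliplr=0.5, mosaic=1.0,                   # composition-level augmentation
)
```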
Another source of error, shown in Figure 14, stems from the fixed 45-degree tilt angle of the cane-mounted camera. While this setup generally ensures a consistent viewpoint, it does not account for real-world variations in how users hold the cane. Slight deviations from the calibrated angle can result in incorrect depth estimation due to changes in the image perspective and camera-to-surface geometry. To address this issue, future versions of the system may benefit from integrating a tilt-detection mechanism, such as an onboard accelerometer or inertial measurement unit (IMU). This approach could allow for real-time adjustment of depth estimation according to the cane’s tilt angle, potentially enhancing system robustness across diverse user postures and walking styles.
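A minimal sketch of such a correction, under a simplified flat-ground assumption, is shown below; the calibrated angle, scaling rule, and example values are illustrative and would need validation against the actual camera geometry.

```python
import math

CALIBRATED_TILT_DEG = 45.0  # tilt assumed by the depth calibration (measured from horizontal)

def tilt_corrected_distance(est_distance_cm: float, measured_tilt_deg: float) -> float:
    """First-order correction of a calibrated distance estimate for cane tilt.

    Simplified flat-ground model for illustration only: for a camera at a fixed
    height, the ground distance along the optical axis scales with 1 / tan(tilt),
    so an estimate calibrated at 45 degrees is rescaled by
    tan(45 deg) / tan(measured tilt).
    """
    scale = math.tan(math.radians(CALIBRATED_TILT_DEG)) / math.tan(
        math.radians(measured_tilt_deg)
    )
    return est_distance_cm * scale

# Example: the IMU reports the cane held 5 degrees steeper than calibrated (50 degrees),
# so the uncorrected estimate overstates the true distance and is scaled down.
print(f"{tilt_corrected_distance(120.0, 50.0):.1f} cm")
```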

8. Conclusions

This study proposed a low-cost smart cane system that enhances the mobility and safety of visually impaired individuals by detecting both obstacles and walking surface transitions in real time. The system integrates dual RGB cameras, an ultrasonic sensor, and YOLO-based models (YOLOv5x for obstacle detection and YOLOv8n for pathway surface detection), along with a novel Pathway Surface Transition Point Detection (PSTPD) method that uses weighted bounding boxes and calibrated depth mapping for distance estimation and severity classification. Experimental results validated the effectiveness of the system. The obstacle detection module, triggered via ultrasonic sensing and push-button input, achieved a mean Average Precision (mAP@50) of 0.70 and a distance estimation error of only 0.3 cm. The pathway surface detection model, based on YOLOv8n, achieved a mAP@50 of 0.92 across eight surface types. The PSTPD module further demonstrated reliable distance estimation performance, with an average error of 4.22 cm and a processing time of 0.6 s per instance. Performance remained stable across varying transition severities, with mean errors of 7.8 cm (mild), 2.8–5.4 cm (moderate), and 1.9–4.1 cm (severe). These findings confirm the system’s practical applicability as a robust, real-time assistive solution. Future work will focus on integrating an inertial measurement unit (IMU) to compensate for cane tilt, expanding training datasets to improve model robustness, and optimizing power consumption for extended operation.

Author Contributions

Conceptualization, P.C.; methodology, P.C., T.M., P.R., K.K., and P.K.; software, P.C., T.M., P.R., T.N., K.K., and P.K.; validation, P.C., T.M., P.R., T.N., K.K., and P.K.; formal analysis, P.C., T.M., P.R., T.N., K.K., and P.K.; investigation, P.C., T.M., P.R., T.N., K.K., and P.K.; resources, P.C., T.M., P.R., T.N., K.K., and P.K.; data curation, P.C., T.M., P.R., T.N., K.K., and P.K.; writing—original draft preparation, P.C., T.M., P.R., T.N., K.K., and P.K.; writing—review and editing, P.C., T.M., P.R., K.K., and P.K.; visualization, P.C., T.M., P.R., T.N., K.K., and P.K.; supervision, P.C.; project administration, P.C.; funding acquisition, P.C., T.M., P.R., T.N., K.K., and P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Faculty of Informatics, Burapha University, Chonburi, Thailand.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Burapha University (BUU) (protocol code HS008/2568, approval date: 26 February 2025).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are derived from both public domain resources and author-generated datasets, as follows:
COCO2017 dataset: the data used for obstacle detection are openly available on Kaggle at https://www.kaggle.com/datasets/awsaf49/coco-2017-dataset (accessed on 10 January 2024) and are cited in the manuscript [39].
Pathway surface datasets: the data used for pathway surface classification were derived from public domain resources on the Roboflow Universe platform and are openly available at the following URLs (all accessed on 10 January 2024): Normal sidewalk, https://universe.roboflow.com/testyolo5-ro4zc/test_yolo8s and https://universe.roboflow.com/pothole-yzqlw/potholekerta; Rough sidewalk, https://universe.roboflow.com/damagedsidewalks/damaged-sidewalks; Grass, https://universe.roboflow.com/aut/grass-mrude; Braille block, https://universe.roboflow.com/braille-block/block-wawkg; Crosswalk, https://universe.roboflow.com/edgar1019-naver-com/crosswalk-ognxu; Puddle, https://universe.roboflow.com/hanyang-university-bd2kb/puddle-detection; Hole, https://universe.roboflow.com/perception-hmwbz/pothole-2-7kwss; Uneven floor, https://universe.roboflow.com/data-ksqzc/dataadd-bkykc. These datasets were used for training and evaluating the pathway surface classification models.
PSTP-Mild, PSTP-Moderate, and PSTP-Severe datasets: the raw datasets supporting the analysis of surface transition severity were generated by the authors. They are not publicly available due to privacy and institutional restrictions but are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported by the Faculty of Informatics, Burapha University, Chonburi, Thailand.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. Blindness and Visual Impairment; WHO: Geneva, Switzerland, 2024; Available online: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment (accessed on 15 March 2025).
  2. Daga, F.B.; Diniz-Filho, A.; Boer, E.R.; Gracitelli, C.P.; Abe, R.Y.; Fajpo, M. Fear of Falling and Postural Reactivity in Patients with Glaucoma. PLoS ONE 2017, 12, e0187220. [Google Scholar] [CrossRef]
  3. Williams, J.S.; Kowal, P.; Hestekin, H.; O’Driscoll, T.; Peltzer, K.; Yawson, A.E.; Biritwum, R.; Maximova, T.; Salinas Rodríguez, A.; Manrique Espinoza, B.; et al. Prevalence, Risk Factors, and Disability Associated with Fall-Related Injury in Older Adults in Low- and Middle-Income Countries: Results from the WHO Study on Global AGEing and Adult Health (SAGE). BMC Med. 2015, 13, 147. [Google Scholar] [CrossRef] [PubMed]
  4. Patino, M.; McKean-Cowdin, R.; Azen, S.P.; Allison, J.C.; Choudhury, F.; Varma, R. Central and Peripheral Visual Impairment and the Risk of Falls and Falls with Injury. Ophthalmology 2010, 117, 199–206. [Google Scholar] [CrossRef] [PubMed]
  5. Khan, S.; Nazir, S.; Khan, H.U. Analysis of Navigation Assistants for Blind and Visually Impaired People: A Systematic Review. IEEE Access 2021, 9, 26712–26729. [Google Scholar] [CrossRef]
  6. Beingolea, J.R.; Zea-Vargas, M.A.; Huallpa, R.; Vilca, X.; Bolivar, R.; Rendulich, J. Assistive Devices: Technology Development for the Visually Impaired. Designs 2021, 5, 75. [Google Scholar] [CrossRef]
  7. Mai, C.; Xie, D.; Zeng, L.; Li, Z.; Li, Z.; Qiao, Z.; Qu, Y.; Liu, G.; Li, L. Laser Sensing and Vision Sensing Smart Blind Cane: A Review. Sensors 2023, 23, 869. [Google Scholar] [CrossRef]
  8. Hersh, M. Wearable Travel Aids for Blind and Partially Sighted People: A Review with a Focus on Design Issues. Sensors 2022, 22, 5454. [Google Scholar] [CrossRef]
  9. Buckley, J.G.; Panesar, G.K.; MacLellan, M.J.; Pacey, I.E.; Barrett, B.T. Changes to Control of Adaptive Gait in Individuals with Long-Standing Reduced Stereoacuity. Investig. Ophthalmol. Vis. Sci. 2010, 51, 2487–2495. [Google Scholar] [CrossRef]
  10. Zafar, S.; Maqbool, H.F.; Ahmad, N.; Ali, A.; Moeizz, A.; Ali, F.; Taborri, J.; Rossi, S. Advancement in Smart Cane Technology: Enhancing Mobility for the Visually Impaired Using ROS and LiDAR. In Proceedings of the 2024 International Conference on Robotics and Automation in Industry (ICRAI), Lahore, Pakistan, 18–19 December 2024; pp. 1–8. [Google Scholar] [CrossRef]
  11. Panazan, C.-E.; Dulf, E.-H. Intelligent Cane for Assisting the Visually Impaired. Technologies 2024, 12, 75. [Google Scholar] [CrossRef]
  12. Sipos, E.; Ciuciu, C.; Ivanciu, L. Sensor-Based Prototype of a Smart Assistant for Visually Impaired People—Preliminary Results. Sensors 2022, 22, 4271. [Google Scholar] [CrossRef]
  13. Cardillo, E.; Li, C.; Caddemi, A. Empowering Blind People Mobility: A Millimeter-Wave Radar Cane. In Proceedings of the 2020 IEEE International Workshop on Metrology for Industry 4.0 and IoT (MetroInd4.0&IoT), Rome, Italy, 3–5 June 2020; pp. 390–395. [Google Scholar] [CrossRef]
  14. Sibu, S.F.; Raina, K.J.; Kumar, B.S.; Joseph, V.P.; Joseph, A.T.; Thomas, T. CNN-Based Smart Cane: A Tool for Visually Impaired People. In Proceedings of the 2023 9th International Conference on Smart Computing and Communications (ICSCC), Palai, India, 17–19 August 2023; pp. 126–131. [Google Scholar] [CrossRef]
  15. Li, J.; Xie, L.; Chen, Z.; Shi, L.; Chen, R.; Ren, Y.; Wang, L.; Lu, X. An AIoT-Based Assistance System for Visually Impaired People. Electronics 2023, 12, 3760. [Google Scholar] [CrossRef]
  16. Chen, L.B.; Pai, W.Y.; Chen, W.H.; Huang, X.R. iDog: An Intelligent Guide Dog Harness for Visually Impaired Pedestrians Based on Artificial Intelligence and Edge Computing. IEEE Sens. J. 2024, 24, 41997–42008. [Google Scholar] [CrossRef]
  17. Rahman, M.W.; Tashfia, S.S.; Islam, R.; Hasan, M.M.; Sultan, S.I.; Mia, S.; Rahman, M.M. The Architectural Design of Smart Blind Assistant Using IoT with Deep Learning Paradigm. Internet Things 2021, 13, 100344. [Google Scholar] [CrossRef]
  18. Patankar, N.S.; Patil, H.P.; Aware, B.H.; Maind, R.V.; Dhorde, P.S.; Deshmukh, Y.S. An Intelligent IoT-Based Smart Stick for Visually Impaired Person Using Image Sensing. In Proceedings of the 2023 14th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
  19. Ma, Y.; Shi, Y.; Zhang, M.; Li, W.; Ma, C.; Guo, Y. Design and Implementation of an Intelligent Assistive Cane for Visually Impaired People Based on an Edge-Cloud Collaboration Scheme. Electronics 2022, 11, 2266. [Google Scholar] [CrossRef]
  20. Leong, X.; Ramasamy, R.K. Obstacle Detection and Distance Estimation for Visually Impaired People. IEEE Access 2023, 11, 136609–136627. [Google Scholar] [CrossRef]
  21. Nataraj, B.; Rani, D.R.; Prabha, K.R.; Christina, V.S.; Abinaya, R. Smart Cane with Object Recognition System. In Proceedings of the 5th International Conference on Smart Electronics and Communication (ICOSEC 2024), Coimbatore, India, 25–27 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1438–1443. [Google Scholar] [CrossRef]
  22. Raj, S.; Srivastava, K.; Nigam, N.; Kumar, S.; Mishra, N.; Kumar, R. Smart Cane with Object Recognition System. In Proceedings of the 2023 10th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 23–24 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 704–708. [Google Scholar] [CrossRef]
  23. Dang, Q.K.; Chee, Y.; Pham, D.D.; Suh, Y.S. A Virtual Blind Cane Using a Line Laser-Based Vision System and an Inertial Measurement Unit. Sensors 2016, 16, 95. [Google Scholar] [CrossRef]
  24. Chang, W.-J.; Chen, L.-B.; Sie, C.-Y.; Yang, C.-H. An Artificial Intelligence Edge Computing-Based Assistive System for Visually Impaired Pedestrian Safety at Zebra Crossings. IEEE Trans. Consum. Electron. 2021, 67, 3–11. [Google Scholar] [CrossRef]
  25. Bai, J.; Liu, Z.; Lin, Y.; Li, Y.; Lian, S.; Liu, D. Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People. Electronics 2019, 8, 697. [Google Scholar] [CrossRef]
  26. Joshi, R.C.; Yadav, S.; Dutta, M.K.; Travieso-Gonzalez, C.M. Efficient Multi-Object Detection and Smart Navigation Using Artificial Intelligence for Visually Impaired People. Entropy 2020, 22, 941. [Google Scholar] [CrossRef]
  27. Farooq, M.S.; Shafi, I.; Khan, H.; De La Torre Díez, I.; Breñosa, J.; Martínez Espinosa, J.C.; Ashraf, I. IoT Enabled Intelligent Stick for Visually Impaired People for Obstacle Recognition. Sensors 2022, 22, 8914. [Google Scholar] [CrossRef]
  28. Veena, K.N.; Singh, K.; Ullal, B.S.; Biswas, A.; Gogoi, P.; Yash, K. Smart Navigation Aid for Visually Impaired Person Using a Deep Learning Model. In Proceedings of the 2023 3rd International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 23–25 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1049–1053. [Google Scholar] [CrossRef]
  29. Mai, C.; Chen, H.; Zeng, L.; Li, Z.; Liu, G.; Qiao, Z.; Qu, Y.; Li, L.; Li, L. A Smart Cane Based on 2D LiDAR and RGB-D Camera Sensor—Realizing Navigation and Obstacle Recognition. Sensors 2024, 24, 870. [Google Scholar] [CrossRef]
  30. Scalvini, F.; Bordeau, C.; Ambard, M.; Migniot, C.; Dubois, J. Outdoor Navigation Assistive System Based on Robust and Real-Time Visual–Auditory Substitution Approach. Sensors 2024, 24, 166. [Google Scholar] [CrossRef]
  31. Raspberry Pi Ltd. Raspberry Pi 4 Model B. Available online: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/ (accessed on 17 December 2024).
  32. HOCO. Web Camera GM101 2K HD. Available online: https://hocotech.com/product/home-office/pc-accessories/web-camera-gm101-2k-hd/ (accessed on 4 December 2024).
  33. Advice IT Infinite Public Company Limited. WEBCAM OKER (OE-B35). Available online: https://www.advice.co.th/product/webcam/webcam-hd-/webcam-oker-oe-b35- (accessed on 4 December 2024).
  34. Arduitronics Co., Ltd. Ultrasonic Sensor Module (HC-SR04) 5V. Available online: https://www.arduitronics.com/product/20/ultrasonic-sensor-module-hc-sr04-5v (accessed on 4 December 2024).
  35. Nubwo Co., Ltd. NBL06. Available online: https://www.nubwo.co.th/nbl06/ (accessed on 5 December 2024).
  36. ModuleMore. DS-212 Mini Push Button Switch. Available online: https://www.modulemore.com/p/2709 (accessed on 5 December 2024).
  37. Ultralytics. YOLOv5 Models. Available online: https://docs.ultralytics.com/models/yolov5/ (accessed on 5 December 2024).
  38. Ultralytics. YOLOv8 Models. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 5 December 2024).
  39. Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. arXiv 2015, arXiv:1405.0312v3. [Google Scholar] [CrossRef]
  40. Roboflow. Test_YOLO8s Dataset. Available online: https://universe.roboflow.com/testyolo5-ro4zc/test_yolo8s (accessed on 10 December 2024).
  41. Roboflow. PotholeKerta Dataset. Available online: https://universe.roboflow.com/pothole-yzqlw/potholekerta (accessed on 5 December 2024).
  42. Roboflow. Damaged-Sidewalks Dataset. Available online: https://universe.roboflow.com/damagedsidewalks/damaged-sidewalks (accessed on 5 December 2024).
  43. Roboflow. Grass-Mrude Dataset. Available online: https://universe.roboflow.com/aut/grass-mrude (accessed on 7 December 2024).
  44. Roboflow. Block-Wawkg Dataset. Available online: https://universe.roboflow.com/braille-block/block-wawkg (accessed on 3 December 2024).
  45. Roboflow. Crosswalk-Ognxu Dataset. Available online: https://universe.roboflow.com/edgar1019-naver-com/crosswalk-ognxu (accessed on 14 December 2024).
  46. Roboflow. Puddle-Detection Dataset. Available online: https://universe.roboflow.com/hanyang-university-bd2kb/puddle-detection (accessed on 10 December 2024).
  47. Roboflow. Pothole 2-7kwss Dataset. Available online: https://universe.roboflow.com/perception-hmwbz/pothole-2-7kwss (accessed on 9 December 2024).
  48. Roboflow. DataAdd-Bkykc Dataset. Available online: https://universe.roboflow.com/data-ksqzc/dataadd-bkykc (accessed on 5 December 2024).
  49. Ultralytics. Performance Metrics Deep Dive. Available online: https://docs.ultralytics.com/guides/yolo-performance-metrics/ (accessed on 1 January 2025).
  50. Ultralytics. YOLOv5 vs. YOLOv8: A Detailed Comparison. Available online: https://docs.ultralytics.com/compare/yolov5-vs-yolov8/ (accessed on 5 December 2024).
  51. Testbook. Midpoint Formula. Available online: https://testbook.com/maths/midpoint-formula (accessed on 15 July 2025).
  52. Masoumian, A.; Rashwan, H.A.; Cristiano, J.; Asif, M.S.; Puig, D. Monocular Depth Estimation Using Deep Learning: A Review. Sensors 2022, 22, 5353. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Examples of navigation challenges for the visually impaired at Burapha University, Thailand.
Figure 2. Concept of the PSTPD system with distance and danger estimation.
Figure 3. Prototype of the assistive device for the visually impaired.
Figure 4. Flowchart of the proposed method.
Figure 5. System architecture of the proposed method.
Figure 6. Obstacle detection based on YOLOv5 architecture.
Figure 7. Pathway surface detection based on YOLOv8n architecture.
Figure 8. Weighted center estimation for Pathway Surface Transition Point Detection.
Figure 9. Calibration and depth estimation for surface transition: (a) depth reference mapping based on calibration points; (b) estimated transition distance.
Figure 10. PSTP dataset: (a) mild (normal to crosswalk path); (b) moderate (grass to puddle path); (c) moderate (grass to normal path); (d) moderate (grass to rough path); (e) moderate (rough to grass path); (f) moderate (normal to grass path); (g) moderate (rough to puddle path); (h) severe (rough to uneven floor path); (i) severe (normal to uneven floor path).
Figure 11. Confusion matrix normalized for pathway surface detection model.
Figure 12. Learning curves for the pathway surface detection model.
Figure 13. Example of error caused by object detection model: (a) moderate (rough to puddle path); (b) moderate (grass to normal path).
Figure 14. Example of error caused by cane tilt angle: (a) mild (normal to crosswalk path); (b) severe (rough to uneven floor path).
Table 1. Comparison with existing research based on obstacle and pathway detection.
Method | Year | Components | Cost | Weight | Obstacle Distance | Pathway Transition Point Distance | Danger Level (Obstacle/Pathway) | Limitations
[25] | 2019 | RGB-D Camera, IMU Sensor, Earphones, Smartphone | High | Light | No | No | No | Fail to detect or measure pathway transition distances, high computational resources
[26] | 2020 | RGB Camera, Raspberry Pi4, Distance Sensor, Headphones | Low | Light | Yes | No | No | Fail to detect or measure pathway transition distances
[27] | 2022 | RGB Camera, Distance Sensors, GPS, Raspberry Pi4, Water Sensor, Earphones | Low | Light | Yes | No | No | Fail to detect or measure pathway transition distances
[28] | 2023 | RGB Camera, Raspberry Pi, Distance Sensor, Buzzer, Earphone | Low | Light | Yes | No | No | Fail to detect or measure pathway transition distances, limited effectiveness in obstacle detection
[29] | 2024 | LiDAR, RGB-D Camera, IMU, GPS, Jetson nano | High | Light | Yes | No | No | Fail to detect or measure pathway transition distances, high computational resources
[30] | 2024 | RGB-D Camera, GPS, IMU Sensor, Laptop | High | Bulky | No | No | Both | Fail to detect or measure pathway transition distances, high computational resources
Table 2. Hardware specifications.
Systems | Specification
Raspberry Pi [31] | Model: Raspberry Pi 4 Model B; CPU: Broadcom BCM2711, Quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5 GHz; Memory: 8 GB LPDDR4-3200 SDRAM
RGB Camera 1 [32] | Model: Hoco Webcam GM101; Resolution: 2560 × 1440 pixels; Frame Rate: 30 FPS
RGB Camera 2 [33] | Model: OE-B35; Resolution: 640 × 480 pixels; Frame Rate: 30 FPS
Ultrasonic Sensor [34] | Model: HC-SR04; Detection Range: 2 cm–400 cm; Measuring Angle: <15°
Battery [35] | Capacity: 10,000 mAh; Type: Lithium Polymer; USB Output: 5 V/3 A; Dimensions (Width × Depth × Height): 6.7 × 1.5 × 13.4 cm; Weight: 0.21 kg
Push Button Switch [36] | DS-212 Mini No Lock Round Switch; 3.3 V DC (GPIO logic level)
Table 3. Obstacle severity levels.
Severity Level | Object Classes | Description
Mild | Bench, Backpack, Umbrella, Handbag, Suitcase, Cat, Dog, Bird, Bottle, Chair, Potted Plant | Generally static and easy to detect; pose minimal threat to navigation.
Moderate | Bicycle, Motorcycle, Stop Sign, Parking Meter, Fire Hydrant, Couch, Bed, Dining Table, Toilet, Sink, Refrigerator | May partially obstruct the path or exist at elevations not consistently detected; present moderate risk.
Severe | Person, Car, Bus, Train, Truck, Boat, Traffic Light | Dynamic, large, or linked to hazardous environments; high risk, requiring immediate user awareness.
Table 4. Overview of experimental design.
Experiment No. | Experiment Description
1 | Evaluation of object detection performance for obstacle identification
2 | Measurement of obstacle detection distance using ultrasonic sensor
3 | Evaluation of object detection performance for pathway surface classification
4 | Distance estimation for mild-level pathway surface transition points
5 | Distance estimation for moderate-level pathway surface transition points
6 | Distance estimation for severe-level pathway surface transition points
7 | Performance comparison of object detection models and surface transition estimation methods
8 | Evaluation of the effectiveness of each component via ablation experiments
Table 5. Dataset description.
Datasets | Number of Classes | Original Images | Augmented Images | Training Set (Images) | Testing Set (Images)
COCO2017 [39] | 29 | 102,184 | - | 98,057 | 4127
Pathway Surface [40,41,42,43,44,45,46,47,48] | 8 | 1200 (150 images per class) | 6400 | 4800 | 1600
PSTP—Mild Cases [Ours] | - | 50 | - | - | 50
PSTP—Moderate Cases [Ours] | - | 300 | - | - | 300
PSTP—Severe Cases [Ours] | - | 100 | - | - | 100
Table 6. Training parameters.
Parameter | Value
Image size | 640 × 640
Learning rate | 0.0001
Optimizer | AdamW
Batch size | 27
Epochs | 200
Weights | yolov8n.pt (pathway surface detection); yolov5x.pt (obstacle detection)
Table 7. Performance evaluation of object detectors for obstacle detection with YOLOv5x model.
Classes | mAP@50 | mAP@50:95 | Precision | Recall
person | 0.84 | 0.61 | 0.82 | 0.76
bicycle | 0.66 | 0.40 | 0.76 | 0.58
car | 0.74 | 0.50 | 0.76 | 0.68
motorcycle | 0.78 | 0.52 | 0.79 | 0.70
bus | 0.86 | 0.73 | 0.87 | 0.79
train | 0.94 | 0.75 | 0.92 | 0.90
truck | 0.63 | 0.45 | 0.67 | 0.53
boat | 0.59 | 0.33 | 0.71 | 0.49
traffic light | 0.63 | 0.33 | 0.71 | 0.58
stop sign | 0.83 | 0.74 | 0.88 | 0.73
parking meter | 0.68 | 0.53 | 0.79 | 0.63
fire hydrant | 0.91 | 0.74 | 0.93 | 0.83
bench | 0.47 | 0.32 | 0.67 | 0.43
cat | 0.92 | 0.75 | 0.92 | 0.88
backpack | 0.40 | 0.22 | 0.61 | 0.37
umbrella | 0.71 | 0.48 | 0.74 | 0.65
handbag | 0.38 | 0.22 | 0.60 | 0.35
suitcase | 0.72 | 0.49 | 0.70 | 0.65
dog | 0.84 | 0.70 | 0.80 | 0.78
bird | 0.61 | 0.41 | 0.79 | 0.51
bottle | 0.62 | 0.44 | 0.67 | 0.56
chair | 0.80 | 0.67 | 0.81 | 0.76
potted plant | 0.60 | 0.39 | 0.68 | 0.53
couch | 0.67 | 0.44 | 0.74 | 0.63
bed | 0.88 | 0.71 | 0.84 | 0.82
dining table | 0.53 | 0.38 | 0.64 | 0.49
toilet | 0.72 | 0.50 | 0.77 | 0.60
sink | 0.68 | 0.50 | 0.75 | 0.59
refrigerator | 0.58 | 0.36 | 0.65 | 0.54
Average | 0.70 | 0.50 | 0.76 | 0.63
Table 8. Performance comparison of object detectors for obstacle detection based on mAP@50.
Classes | [29] (YOLOv5) | [30] (YOLOv8) | [Ours] (YOLOv8-nano) | [Ours] (YOLOv5s) | Proposed Method (YOLOv5x)
person | 0.67 | 0.88 | 0.77 | 0.75 | 0.84
bicycle | 0.39 | 0.90 | 0.54 | 0.53 | 0.66
car | 0.73 | 0.96 | 0.64 | 0.63 | 0.74
motorcycle | 0.51 | 0.89 | 0.70 | 0.68 | 0.78
bus | 0.67 | 0.89 | 0.80 | 0.76 | 0.86
train | - | - | 0.85 | 0.84 | 0.94
truck | 0.75 | 0.92 | 0.51 | 0.50 | 0.63
boat | - | - | 0.40 | 0.43 | 0.59
traffic light | 0.37 | 0.85 | 0.50 | 0.53 | 0.63
stop sign | - | - | 0.73 | 0.74 | 0.83
parking meter | - | 0.91 | 0.64 | 0.64 | 0.68
fire hydrant | - | - | 0.84 | 0.83 | 0.91
bench | - | 0.72 | 0.32 | 0.31 | 0.47
cat | - | - | 0.85 | 0.82 | 0.92
backpack | - | - | 0.23 | 0.25 | 0.40
umbrella | - | - | 0.56 | 0.56 | 0.71
handbag | - | - | 0.24 | 0.22 | 0.38
suitcase | - | - | 0.56 | 0.53 | 0.72
dog | - | - | 0.70 | 0.67 | 0.84
bird | - | - | 0.43 | 0.42 | 0.61
bottle | - | - | 0.53 | 0.48 | 0.62
chair | - | 0.86 | 0.70 | 0.65 | 0.80
potted plant | - | 0.82 | 0.42 | 0.43 | 0.60
couch | - | - | 0.55 | 0.54 | 0.67
bed | - | - | 0.78 | 0.77 | 0.88
dining table | - | - | 0.47 | 0.42 | 0.53
toilet | - | - | 0.58 | 0.53 | 0.72
sink | - | - | 0.60 | 0.59 | 0.68
refrigerator | - | - | 0.43 | 0.39 | 0.58
Average | 0.58 | 0.87 | 0.58 | 0.57 | 0.70
Table 9. Error distance for obstacle detection from the ultrasonic sensor.
Actual Distance (cm) | Measured Distances (cm), trials 1–5 | Mean Distance (cm) | Mean Error (cm) | Accuracy (%)
25 | 24.8, 24.6, 24.9, 24.5, 24.7 | 24.7 | 0.3 | 98.8
50 | 49.9, 50.1, 49.8, 50.4, 49.8 | 50.0 | 0.0 | 100.0
100 | 100.6, 100.3, 100.7, 100.2, 100.4 | 100.4 | 0.4 | 99.6
150 | 150.1, 150.3, 150.0, 150.4, 150.3 | 150.2 | 0.2 | 99.8
200 | 200.7, 200.5, 200.6, 200.3, 200.9 | 200.6 | 0.6 | 99.7
300 | 299.8, 299.7, 300.3, 300.5, 301.1 | 300.3 | 0.3 | 99.9
Table 10. Performance comparison of our method and conventional method in distance measurement.
Actual Distance (cm) | Mean Error (cm) [11] | Mean Error (cm) (Our Method) | Accuracy (%) [11] | Accuracy (%) (Our Method)
25 | 0 | 0.3 | 100 | 98.8
50 | 1 | 0.0 | 98 | 100.0
100 | 3.2 | 0.4 | 96.8 | 99.6
150 | 3.8 | 0.2 | 97.5 | 99.8
200 | 4.4 | 0.6 | 97.8 | 99.7
300 | 5.8 | 0.3 | 98.1 | 99.9
Average | 3.0 | 0.3 | 98.0 | 99.6
Table 11. Performance evaluation of object detectors for pathway surface detection with YOLOv8n model.
Classes | mAP@50 | mAP@50:95 | Precision | Recall
Braille block | 0.69 | 0.51 | 0.87 | 0.57
Crosswalk | 0.93 | 0.66 | 0.87 | 0.86
Grass | 0.96 | 0.85 | 0.96 | 0.92
Hole | 0.99 | 0.66 | 0.98 | 1.00
Normal | 0.94 | 0.82 | 0.92 | 0.93
Puddle | 0.94 | 0.73 | 0.96 | 0.88
Rough | 0.95 | 0.80 | 0.96 | 0.88
Uneven floor | 0.94 | 0.68 | 0.96 | 0.88
Average | 0.92 | 0.71 | 0.94 | 0.87
Table 12. Performance comparison of object detectors for pathway surface detection based on mAP@50.
Classes | [29] (YOLOv5) | [30] (YOLOv8) | [Ours] (YOLOv5x) | [Ours] (YOLOv5s) | Proposed Method (YOLOv8n)
Braille block | 0.83 | 0.87 | 0.73 | 0.68 | 0.69
Crosswalk | 0.82 | 0.86 | 0.96 | 0.95 | 0.93
Grass | - | - | 0.96 | 0.97 | 0.96
Hole | - | - | 0.99 | 0.99 | 0.99
Normal | - | - | 0.95 | 0.95 | 0.94
Puddle | - | - | 0.94 | 0.96 | 0.94
Rough | - | - | 0.96 | 0.95 | 0.95
Uneven floor | - | - | 0.97 | 0.95 | 0.94
Average | 0.82 | 0.86 | 0.93 | 0.93 | 0.92
Table 13. Mild case: transition from normal to crosswalk path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 84.4, 84.7, 84.8, 84.9, 84.9, 85.0, 87.8, 87.8, 88.1, 88.2 | 86.1 | 6.1 | 0.5
100 | 106.0, 108.2, 105.3, 106.6, 104.4, 105.5, 107.6, 104.7, 106.6, 106.9 | 106.2 | 6.2 | 0.5
120 | 129.1, 129.1, 129.1, 129.1, 129.1, 129.1, 129.4, 129.4, 129.4, 129.5 | 129.2 | 9.2 | 0.6
140 | 145.9, 146.6, 146.8, 146.9, 147.0, 147.1, 147.1, 147.5, 147.8, 148.2 | 147.1 | 7.1 | 0.6
160 | 169.0, 169.1, 170.1, 170.3, 170.3, 170.7, 171.0, 171.0, 171.0, 171.2 | 170.4 | 10.4 | 0.6
Average | - | - | 7.8 | 0.6
Table 14. Moderate case: transition from grass to puddle path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 80.2, 80.2, 80.2, 80.3, 80.3, 80.5, 80.4, 80.5, 80.9, 81.0 | 80.4 | 0.4 | 0.6
100 | 100.7, 99.0, 98.2, 98.1, 97.9, 97.7, 97.3, 96.8, 95.7, 95.7 | 97.7 | 2.3 | 0.6
120 | 135.0, 130.3, 136.3, 129.1, 131.4, 131.7, 129.9, 134.4, 132.6, 127.5 | 131.8 | 11.8 | 0.6
140 | 147.1, 147.4, 147.4, 147.5, 147.8, 150.6, 150.4, 149.7, 149.2, 149.1 | 148.6 | 8.6 | 0.6
160 | 161.5, 161.9, 162.6, 164.2, 164.2, 164.9, 164.9, 165.9, 165.7, 165.0 | 164.1 | 4.1 | 0.6
Average | - | - | 5.4 | 0.6
Table 15. Moderate case: transition from grass to normal path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 81.0, 81.0, 81.1, 81.1, 81.1, 81.2, 81.2, 81.3, 81.3, 81.4 | 81.2 | 1.2 | 0.6
100 | 99.9, 100.4, 99.3, 98.9, 98.8, 98.5, 98.5, 98.2, 98.1, 98.0 | 98.9 | 1.1 | 0.6
120 | 116.5, 116.3, 116.2, 116.0, 115.4, 114.8, 114.3, 114.2, 113.8, 113.8 | 115.1 | 4.9 | 0.6
140 | 130.0, 129.3, 129.2, 129.1, 128.4, 127.6, 127.5, 127.3, 127.2, 127.2 | 128.3 | 11.7 | 0.6
160 | 157.1, 157.0, 156.5, 156.1, 155.6, 155.1, 154.5, 154.0, 154.0, 153.8 | 155.4 | 4.6 | 0.6
Average | - | - | 4.7 | 0.6
Table 16. Moderate case: transition from grass to rough path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 80.4, 80.4, 80.5, 80.5, 80.5, 80.8, 80.8, 80.8, 80.8, 80.9 | 80.6 | 0.6 | 0.6
100 | 100.2, 100.2, 100.3, 99.6, 99.6, 100.4, 100.5, 100.6, 100.8, 100.9 | 100.3 | 0.3 | 0.6
120 | 117.5, 116.6, 116.5, 116.3, 116.0, 115.9, 115.8, 115.8, 115.7, 115.7 | 116.2 | 3.8 | 0.6
140 | 136.1, 136.1, 135.6, 135.2, 135.1, 135.0, 134.9, 134.7, 134.2, 133.6 | 135.1 | 4.9 | 0.6
160 | 157.1, 156.5, 156.4, 156.0, 155.8, 155.6, 155.4, 155.2, 155.1, 154.5 | 155.7 | 4.2 | 0.6
Average | - | - | 2.8 | 0.6
Table 17. Moderate case: transition from rough to grass path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 79.9, 80.2, 80.7, 80.8, 80.8, 80.9, 81.4, 81.5, 81.5, 81.6 | 80.9 | 0.9 | 0.6
100 | 101.0, 98.9, 102.3, 96.7, 96.5, 96.4, 96.4, 96.3, 96.2, 96.0 | 97.7 | 2.3 | 0.6
120 | 117.8, 114.9, 114.0, 113.3, 112.9, 111.9, 111.8, 111.5, 111.2, 111.1 | 113.0 | 7.0 | 0.6
140 | 140.0, 138.6, 136.7, 136.6, 136.6, 136.2, 136.0, 135.7, 135.7, 135.0 | 136.7 | 3.3 | 0.6
160 | 155.5, 155.4, 155.2, 154.7, 154.0, 153.7, 153.3, 153.2, 152.7, 152.6 | 154.0 | 6.0 | 0.6
Average | - | - | 3.9 | 0.6
Table 18. Moderate case: transition from normal to grass path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 80.8, 80.8, 80.9, 81.2, 81.4, 81.5, 81.9, 82.4, 82.5, 82.9 | 81.6 | 1.6 | 0.6
100 | 100.4, 100.5, 101.7, 101.8, 102.0, 102.0, 102.2, 102.4, 102.4, 102.4 | 101.8 | 1.8 | 0.6
120 | 127.3, 127.6, 128.1, 128.2, 128.5, 128.8, 128.8, 128.9, 129.2, 129.2 | 128.5 | 8.5 | 0.6
140 | 136.8, 136.5, 136.5, 136.5, 136.4, 136.3, 136.3, 136.0, 136.0, 135.9 | 136.3 | 3.7 | 0.6
160 | 161.8, 161.9, 161.9, 162.0, 157.6, 162.8, 165.9, 166.9, 169.4, 149.9 | 162.0 | 2.0 | 0.6
Average | - | - | 3.5 | 0.6
Table 19. Moderate case: transition from rough to puddle path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 83.6, 83.9, 84.0, 84.0, 84.1, 84.1, 84.1, 84.3, 84.4, 84.4 | 84.1 | 4.1 | 0.6
100 | 100.0, 100.0, 99.8, 100.6, 99.4, 99.3, 100.7, 99.2, 100.8, 99.0 | 99.9 | 0.1 | 0.6
120 | 126.0, 129.6, 129.8, 129.9, 130.2, 130.3, 130.5, 131.1, 131.2, 131.2 | 130.0 | 10.0 | 0.6
140 | 141.4, 141.5, 142.0, 142.4, 144.5, 144.8, 145.1, 145.4, 145.8, 146.0 | 143.9 | 3.9 | 0.6
160 | 160.0, 160.1, 159.8, 160.3, 161.3, 161.4, 161.6, 161.9, 162.7, 163.7 | 161.3 | 1.3 | 0.6
Average | - | - | 3.9 | 0.6
Table 20. Severe case: transition from rough to uneven floor path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 80.1, 79.8, 79.6, 79.4, 79.4, 80.7, 79.2, 78.9, 78.6, 78.5 | 79.4 | 0.6 | 0.6
100 | 97.8, 103.9, 94.8, 105.6, 92.7, 92.7, 92.5, 92.5, 92.3, 92.2 | 95.7 | 4.3 | 0.6
120 | 127.0, 127.0, 127.0, 127.1, 127.4, 127.7, 128.3, 129.0, 129.1, 129.3 | 127.9 | 7.9 | 0.6
140 | 140.4, 141.1, 141.8, 141.9, 142.0, 142.1, 142.2, 142.2, 142.4, 144.3 | 142.0 | 2.0 | 0.6
160 | 157.8, 157.2, 157.0, 155.3, 154.5, 154.4, 151.9, 151.2, 150.9, 150.3 | 154.0 | 6.0 | 0.6
Average | - | - | 4.1 | 0.6
Table 21. Severe case: transition from normal to uneven floor path.
Actual Distance (cm) | Estimated Distances (cm), trials 1–10 | Mean Estimated Distance (cm) | Error (cm) | Avg. Processing Time (s)
80 | 82.5, 82.6, 82.8, 82.9, 83.0, 83.1, 83.3, 83.4, 83.4, 83.4 | 83.0 | 3.0 | 0.6
100 | 99.5, 100.6, 99.3, 99.1, 100.9, 100.9, 101.3, 98.0, 97.5, 97.5 | 99.5 | 0.5 | 0.6
120 | 120.7, 121.1, 122.0, 122.1, 117.3, 122.9, 123.0, 123.7, 124.1, 124.1 | 122.1 | 2.1 | 0.6
140 | 141.7, 142.0, 142.2, 142.3, 142.4, 142.9, 143.8, 143.9, 144.1, 144.2 | 142.9 | 2.9 | 0.6
160 | 159.9, 160.1, 159.7, 160.6, 160.6, 160.7, 161.4, 161.6, 162.1, 162.4 | 160.9 | 0.9 | 0.6
Average | - | - | 1.9 | 0.6
Table 22. Performance comparison of object detection models and pathway surface transition point estimation methods.
Model | Method | Mean Error (cm) | FPS | Avg CPU (%) | Peak CPU (%) | Peak Memory (MB)
YOLOv5s | simple midpoint | 38.89 | 0.92 | 78.84 | 82.30 | 965.67
YOLOv5s | monocular depth networks | 48.13 | 0.47 | 79.45 | 82.60 | 965.67
YOLOv5s | weighted center | 4.26 | 0.91 | 78.83 | 82.20 | 965.54
YOLOv5x | simple midpoint | 37.66 | 0.15 | 90.74 | 95.10 | 1011.39
YOLOv5x | monocular depth networks | 49.25 | 0.13 | 89.16 | 95.00 | 1116.46
YOLOv5x | weighted center | 4.25 | 0.15 | 90.57 | 95.00 | 1069.09
YOLOv8-nano | simple midpoint | 38.81 | 1.8 | 71.19 | 74.80 | 492.40
YOLOv8-nano | monocular depth networks | 47.75 | 0.62 | 76.83 | 79.30 | 647.01
YOLOv8-nano (Ours) | weighted center | 4.22 | 1.72 | 67.32 | 74.50 | 483.38
Table 23. Evaluation of the effectiveness of each component via ablation experiments.
Experimental Setup | Obstacle Detection (mAP@50) | Pathway Surface Detection (mAP@50) | Mean Obstacle Distance Error (cm) | Mean Transition Distance Error (cm)
Camera 1 + Camera 2 + Ultrasonic | 0.70 | 0.92 | 0.30 | 4.22
Camera 1 + Camera 2 | - | 0.92 | - | 4.22
Camera 1 + Ultrasonic | - | 0.92 | 0.30 | 4.22
Camera 2 + Ultrasonic | 0.70 | - | 0.30 | -
Camera 1 | - | 0.92 | - | 4.22
Camera 2 | - | - | - | -
Ultrasonic | - | - | 0.30 | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
