Detecting Traversable Area and Water Hazards for the Visually Impaired with a pRGB-D Sensor

The use of RGB-Depth (RGB-D) sensors for assisting visually impaired people (VIP) has been widely reported as they offer portability, function-diversity and cost-effectiveness. However, polarization cues to assist traversability awareness without precautions against stepping into water areas are weak. In this paper, a polarized RGB-Depth (pRGB-D) framework is proposed to detect traversable area and water hazards simultaneously with polarization-color-depth-attitude information to enhance safety during navigation. The approach has been tested on a pRGB-D dataset, which is built for tuning parameters and evaluating the performance. Moreover, the approach has been integrated into a wearable prototype which generates a stereo sound feedback to guide visually impaired people (VIP) follow the prioritized direction to avoid obstacles and water hazards. Furthermore, a preliminary study with ten blindfolded participants suggests its effectivity and reliability.

The appearance of RGB-D sensors across a wide range of visual assistance contexts is not only due to their low-power-consumption and cost-efficiency, but also its robustness and good performance as they are able to simultaneously perceive color and depth information at video framerates. Apart from color and depth, polarization and its imaging extend information dimension to be used for material discrimination and target detection because polarization parameters reflect physical properties of materials [30]. Meanwhile, given that light with different polarization state behaves differently at the interface of object surface, polarization has been applied to many surface measurement techniques. However, typical commercial RGB-D sensors including light-coding sensors and stereo cameras simply rely on intensity information whereas their polarization cues are generally unavailable or weak.
Light-coding sensors project near-IR speckles and capture the patterns with a calibrated camera, the structured light pattern being coded with spatial or temporary projecting methods [31]. These sensors correlate observed pixels with projected pixels so as to derive depth information through triangulating algorithms. However, many consumer-grade sensors are coded in intensity rather than polarization regardless of whether they use spatial or temporary coding methods. Thereby, as projected 2 of 20 speckles are easily lost in sunlight, approaches for VIP with light-coding sensors focus mainly on obstacle avoidance [2,3,9,11,13], traversability awareness [5][6][7][8]12,14] and stairs detection [4,10,[15][16][17].
A stereo camera estimates depth maps through stereo matching of images from two lenses. There is a large body of work on stereo matching, some of which are local stereo matching algorithms [32][33][34] while others approximate global optimization [35][36][37]. Local stereo algorithms are generally faster as they calculate the depth of pixels based simply on the correlation of local image patches. Most commercial stereo cameras include local algorithms to achieve speedy refresh rates but return sparse depth information in texture-less scenes, such as a blank wall. Different from conventional stereo cameras, the RGB-D sensor of the RealSense R200 and RS400 [38] devices employed active stereo [39] to combine a pair of cameras with fixed structured light patterns to realize extraordinary environmental adaptability. ZED [40] implemented GPU-accelerated global stereo algorithms to achieve dense and large-scale depth perception, but these commercial stereo cameras comprised IR or RGB lenses and none of them use polarization modulation. As a result, most approaches for VIP with stereo cameras assist navigation in large outdoor spaces [18][19][20][21][22][26][27][28][29] and some schemes have also addressed localization [23][24][25], but none of them detect water hazards to prevent misleading VIP to step into water areas.
In this paper, we proposed to use a polarized a RGB-D sensor for detecting traversable areas and water hazards by adequately considering polarization effects. We equipped a stereo camera with an attitude angle sensor, a pair of horizontal and vertical polarizers to form a pRGB-D sensor. The point cloud in 3D space relative to the sensor is computed with the depth image and intrinsic parameters. The points are adjusted to the world coordinate system with the attitude angles of the sensor. The traversable area is determined by building a stochastic polar occupancy grid, estimating the ground pose by fitting a B-spline surface and using dynamic programming to segment the ground area and obstacles. The water areas are identified by using polarization effects as the primary cue and the polarization difference image is computed with disparity information from the pRGB-D sensor.
This work extends what was presented in [11,12,29], where we addressed the traversable area detection and scene segmentation with a wearable RGB-D sensor. In this paper, we have attached the filters to add polarization information, and generated a stereo sound feedback system to take into account both traversable regions and water areas to guide VIP to follow the prioritized direction in order to avoid obstacles and water hazards, as well as set up a pRGB-D navigation framework which has been proved to be useful and reliable by a field test with ten blindfolded participants. This paper is organized as follows: in Section 2, related works that have addressed traversability awareness or water hazard detection are reviewed; in Section 3, the presented approach is elaborated in detail; in Section 4, the effectiveness and robustness in terms of the detection and the stereo sound interface are demonstrated with experiments and a user study; and in Section 5, relevant conclusions are drawn and future works are expected.

Related Work
Several related works have been dedicated to traversable area detection, which can be divided into two categories: traversable area detection based on scene segmentation and traversable area detection based on surface normal vector estimation. Dakopoulos presented a comparative survey of wearable obstacle avoidance systems to inform the research community and users about the capabilities of the progress in assistive technology for VIP [41]. As for segmenting ground areas from hazardous obstacles, Wang adopted a mean-shift algorithm to regard the local point cloud to be a traversable area if the angle between the fitting plane and horizontal plane in the camera coordinate system is less than a threshold [42]. The approach achieves good robustness under certain environmental conditions with the reliance on thresholds and assumptions. Cheng put forward a seeded region growing algorithm to detect ground areas by adequately considering the depth boundaries instead of attaching importance to setting thresholds [11]. The approach withstands the fluctuations between frames but confuses VIP with the unstable results as the seeded pixels are randomly elected. Rodríguez estimated ground planes based on RANdom SAmple Consensus (RANSAC) and used a polar grid representation to account for the potential obstacles [27]. The approach involved a field test to verify the usability but it yielded a detection error in more than ten percent of the frames. Badino proposed to represent the 3D situation with a set of rectangular sticks for autonomous systems by taking into account the fact that the traversable area in front of vehicles is limited by objects with almost vertical surfaces and estimated the road by fitting a B-spline surface instead of assuming a planar road [43,44]. Elleuch developed a segmentation approach based on possibility modeling theory by imposing a crucial starting point, namely considering reference area placed in the bottom of the image to be traversable [45,46]. As for estimating the surface normal vector to determine traversable areas, Koester detected accessible sections by calculating the gradients and estimating the surface normal vector directions of real-world scene patches. The approach delivers fast and effective detection even in crowded scenes, but it prevents practical applications for user studies due to its overreliance on the quality of 3D reconstruction process and adherence to constraints [47]. For example, it assumes that area directly in front of the user is accessible. Bellone defined a descriptor to measure the unevenness of a local surface based on the estimation of normal vectors [48]. The index gives an enhanced description of the traversable area to perform obstacle avoidance and terrain navigability analysis simultaneously. However, it computes both the inclination and roughness at a low framerate which remains unpromising for assisting at normal walking speed. Ni proposed to extract safe directions by compressing the depth image from a light-coding sensor in the walking assistive robotic system which also comprises ultrasonic sensors to detect the evenness of road surfaces [49]. Cui predicted traversability based on support vector machine (SVM), and planned the optimal path for field robot with consideration of the travelling smoothness [50].
Several related work have been devoted to water hazard detection by taking multi-cue approaches with color, texture, depth and polarization information. Rankin detected water areas by analyzing brightness and saturation values, and searched for the reflections of ground cover with range information, and also discussed the possibility to specifically use the zero disparity to encode water area [51]. Yao extracted the features of brightness and texture from the non-reflective regions with color cameras, and detected the reflected regions where negative height exists and disparity sudden changes with stereo cues [52]. Xie detected water hazards by calculating the polarization degree image and used a self-adaptive segmentation algorithm and a morphology filtering technique to label water regions [53]. The method has good performance in complex natural backgrounds, but it requires mechanically rotating the linear polarizing filter and taking images at different angles, which results in a tradeoff between convenience and effectivity. Yao combined traditional machine learning and mean-shift segmentation, which accurately detects not only common water hazards, but also identifies special water hazards that may have lots of ripples or low brightness [54]. Rankin firstly found the horizon line and constrained the search for water bodies to the region below the horizon. Thereby, this approach decreases the computational cost of water detection as well as reduces the probability of false water detection [55]. Shao used a line-structured light sensor to achieve all-day water puddle detection by measuring the deformation of active light strip [56]. Kim presented a methodology using a stereo camera and treated the detection of wet areas and puddles differently [57]. For the detection of wet areas, the polarization difference, graininess as well as gradient magnitude are combined using hypothesis verification. For the detection of puddles, the depth map obtained by the stereo camera is used to check whether depth changes abruptly around the puddles. Nguyen set up a ZED stereo camera on top of a car with polarizing films on the camera lenses and tracked water hazards based on the polarization and color variation of reflected light with consideration of the effect of polarized light from the sky [58].
Almost all reported works were conducted under ideal scenarios or cause intolerable side effects in navigational assistance for VIP. To the best of our knowledge, no previous work has addressed the detection of water hazards to help VIP to avoid stepping in water areas and no previous work has fully combined the detection of traversable area and the detection of water hazard. Our approach builds on the prior work and makes the breakthrough while the superiority is apparent in the following aspects:

•
The approach is able to detect traversable area and water hazard simultaneously, which prevents colliding into obstacles and stepping in water areas during navigation.

•
The approach takes into account of the attitude angles of the sensor, which enables accurate detection with the continuous movement of the wearable prototype during navigation.

•
The approach combines the detection of water hazard with the detection of traversable area, which decreases the computing cost and reduces water area detection error in scenarios such as sky regions and edges of buildings.

Approach
In this section, the approach to detect traversable area and water hazard is elaborated in detail. As shown in Figure 1, the approach is described in terms of the pRGB-D sensor, the traversable area detection, the water hazard detection and the stereo sound feedback accordingly. • The approach is able to detect traversable area and water hazard simultaneously, which prevents colliding into obstacles and stepping in water areas during navigation.

•
The approach takes into account of the attitude angles of the sensor, which enables accurate detection with the continuous movement of the wearable prototype during navigation.

•
The approach combines the detection of water hazard with the detection of traversable area, which decreases the computing cost and reduces water area detection error in scenarios such as sky regions and edges of buildings.

Approach
In this section, the approach to detect traversable area and water hazard is elaborated in detail. As shown in Figure 1, the approach is described in terms of the pRGB-D sensor, the traversable area detection, the water hazard detection and the stereo sound feedback accordingly.

pRGB-D Sensor
As shown in Figure 1, the pRGB-D sensor is comprised of a ZED stereo camera which is equipped with horizontal and vertical polarization filters on the left and right camera, respectively, as well as a MPU6050 attitude sensor [59]. The stereo camera delivers 720 p left-right color video pairs and a depth image at 60 frames per second with the resolution same as the color video. The wideangle all-glasses dual lenses share a diagonal field of view of 110 degrees. The sensor ranges from 0.5 m to 20 m which is quite suitable to detect traversable area and water hazard.
For the traversability awareness, it not only refers to long-range ground plane detection but also involves the early warning when a VIP approaches obstacles at close-range. For obstacle detection, the minimum range of 0.5 m, equivalent to the length of an arm, is enough for the avoidance of most obstacles given the normal walking speed of a VIP during navigation.

pRGB-D Sensor
As shown in Figure 1, the pRGB-D sensor is comprised of a ZED stereo camera which is equipped with horizontal and vertical polarization filters on the left and right camera, respectively, as well as a MPU6050 attitude sensor [59]. The stereo camera delivers 720 p left-right color video pairs and a depth image at 60 frames per second with the resolution same as the color video. The wide-angle all-glasses dual lenses share a diagonal field of view of 110 degrees. The sensor ranges from 0.5 m to 20 m which is quite suitable to detect traversable area and water hazard. For the traversability awareness, it not only refers to long-range ground plane detection but also involves the early warning when a VIP approaches obstacles at close-range. For obstacle detection, the minimum range of 0.5 m, equivalent to the length of an arm, is enough for the avoidance of most obstacles given the normal walking speed of a VIP during navigation.
For water hazard detection, the primary cue is the polarization effect as specular reflection on water is known to polarize light [58]. The specular reflection from the water surface R re f lect is the sum of two polarization components R re f lect,⊥ and R re f lect,// , perpendicular and parallel respectively to the plane formed by the incident and reflected rays as given in Equations (1) and (2). As shown in Figure 2, n air and n water are the refractive indices of air and water respectively and θ is the reflection angle at the water surface. For water hazard detection, the primary cue is the polarization effect as specular reflection on water is known to polarize light [58]. The specular reflection from the water surface reflect R is the sum of two polarization components However, part of the energy S F is scattered by suspended particles and ground bottom while the rest is absorbed by both particles and the ground as shown in Equation (4)  Suppose the polarized light from the air comprises energy components E S ⊥ (θ) and E S // (θ) for perpendicular and parallel components respectively as functions of reflection angle. The total energy entering the water can be calculated with Equation (3): However, part of the energy F S is scattered by suspended particles and ground bottom while the rest is absorbed by both particles and the ground as shown in Equation (4) where µ particles and µ bottom are the scattering coefficients of particles and the ground bottom respectively, and µ absorption is the absorption coefficient: Light in water can be considered as highly unpolarized light with random scattering and internal reflection. With part of the scattered light coming out of the water through refraction, the total light energy component coming out of the water is the summation of reflection and refraction for each  (5) and (6) where θ is the refraction angle from water to air: ]R re f ract,// (n air , n water , θ = θ) (6) For illustrative purposes, in Figure 2, the water refractive index n water is set to 1.33 and the absorption coefficient µ absorption is set to 60%. Figure 3 shows that the polarization difference between perpendicular and parallel components is large enough to provide a strong cue for water hazards at reflection angles above 70 degrees or at distances above the minimum detection range. Thereby, it is able to detect water hazards by setting the threshold of polarization difference with point correspondence from left color image to right color image. perpendicular and parallel components is large enough to provide a strong cue for water hazards at reflection angles above 70 degrees or at distances above the minimum detection range. Thereby, it is able to detect water hazards by setting the threshold of polarization difference with point correspondence from left color image to right color image. If the wearing height of the pRGB-D sensor is H , the minimum water hazard detection range D is calculated with Equation (7), where ϕ is the vertical field view of the stereo camera, which equals 55.75 degrees: Thereby, given the average value of user height, the wearing height H is set to 1.6 m, and the reflection angle θ is set to 70 degrees. Thus, the minimum range D is around 1.45 m which is If the wearing height of the pRGB-D sensor is H, the minimum water hazard detection range D is calculated with Equation (7), where ϕ is the vertical field view of the stereo camera, which equals 55.75 degrees: Thereby, given the average value of user height, the wearing height H is set to 1.6 m, and the reflection angle θ is set to 70 degrees. Thus, the minimum range D is around 1.45 m which is enough to provide immediate feedback to avoid stepping in water areas and quite matched with the working range of the depth sensor. However, with the continuous movement of the user head during navigation, the threshold to segment potential water hazards should be varying as the polarization direction of reflected light would not be strictly parallel or perpendicular with the polarizer in this circumstances. Given the rolling angle of the pRGB-D sensor equaling to η, which is obtained with an attitude sensor, the dynamic threshold is set to δ = δ 0 cos 2 η, where δ 0 is the threshold if there is no rolling. Additionally, notwithstanding different brightness of water area in left and right image due to the specular reflection, depth information of water areas on the road is available as the stereo camera in the pRGB-D sensor implemented a global stereo matching method to approximate smoothness constraint for compensating local brightness differences and radiometric transformations [35]. As a result, the pRGB-D sensor is quite suitable for navigational assistance thanks to its wearable computing technology and its compatibility to support traversable area and water hazard detection.

Traversable Area Detection
In order to detect traversable area, a simple and effective technique is presented. First, 3D coordinates of the point cloud are calculated. Given the depth Z of pixel (u, v) in the depth image, the calibrated focal length f , the radial distortion coefficients c 1 , c 2 , c 5 , the tangential distortion coefficients c 3 , c 4 , as well as (u 0 , v 0 ) the principal point, the point (X, Y, Z) in the camera coordinate system can be derived with Equations (8)−(13): With the help of the attitude sensor, X, Y and Z coordinates in the camera coordinate system are adjusted to world coordinates. Assume a point in the camera coordinate system is (X, Y, Z) and the attitude angles acquired from the attitude sensor are (a, b, c). This means the point (X, Y, Z) rotates about the x-axis by α = a, then rotates about the y-axis by β = b and rotates about z-axis by γ = c in the end. Multiplying the point (X, Y, Z) by the rotation matrix given by Equation (14), the point (X w , Y w , Z w ) in world coordinate system is obtained: After the point cloud adjustment, occupancy grids are computed with the stereo disparities transferred from the adjusted point cloud coordinates, and the method proposed in [60] is used to propagate the uncertainty of the stereo disparities onto the grid. In the polar occupancy grid, the image column is used to represent the angular coordinate and the adjusted disparity is used to represent the range. 3D points lying above the ground area are registered as obstacles in the occupancy grid and extremely high 3D points are dismissed such as ceilings which will not influence the travelling of VIP. Moreover, the ground pose is fitted by a B-Spline surface with consideration of two facts: real-world ground areas are not always planar surfaces [44] and the lenses share a distortion given a diagonal field of view of 110 degrees. Our system builds on the work of Stixel World [43], which represents the scene using a few upright objects on the ground plane. The computation includes dividing the input image into a set of vertical columns, and searching for regions that could be immediately reached without collision. In this sense, the task of traversable area detection is to find the first visible relevant obstacle in the positive direction of depth. In this work, instead of using a single threshold to segment obstacles and traversable areas independently, dynamic programming is used to seek the optimal path cutting the polar grid from left to right by imposing a spatial smoothness and a temporal smoothness to penalize both jumps in depth and deviations of the current frame from the previous frame. Figure 4 depicts frames of traversable area detection results by producing a green mask to overlay on the left color image acquired with the pRGB-D sensor. instead of using a single threshold to segment obstacles and traversable areas independently, dynamic programming is used to seek the optimal path cutting the polar grid from left to right by imposing a spatial smoothness and a temporal smoothness to penalize both jumps in depth and deviations of the current frame from the previous frame. Figure 4 depicts frames of traversable area detection results by producing a green mask to overlay on the left color image acquired with the pRGB-D sensor.

Water Hazard Detection and Stereo Sound Feedback
As shown with the flowchart in Figure 5, in order to detect water hazards out of the traversable area, we first generate the disparity image from the depth image produced by the pRGB-D sensor. Considering pixel G(u,v) in the depth image which ranges d , the calibrated focal length f , the baseline length B , the disparity value dis of pixel G can be calculated with Equation (15): After that, we warp the right image to the left image to produce a point correspondence for water hazard detection. With the stereo pair, we are able to calculate a brightness difference image which signifies the polarization difference. Considering pixel G whose image coordinate is (u,v) in left image and corresponds to the pixel G' (

Water Hazard Detection and Stereo Sound Feedback
As shown with the flowchart in Figure 5, in order to detect water hazards out of the traversable area, we first generate the disparity image from the depth image produced by the pRGB-D sensor. instead of using a single threshold to segment obstacles and traversable areas independently, dynamic programming is used to seek the optimal path cutting the polar grid from left to right by imposing a spatial smoothness and a temporal smoothness to penalize both jumps in depth and deviations of the current frame from the previous frame. Figure 4 depicts frames of traversable area detection results by producing a green mask to overlay on the left color image acquired with the pRGB-D sensor.

Water Hazard Detection and Stereo Sound Feedback
As shown with the flowchart in Figure 5, in order to detect water hazards out of the traversable area, we first generate the disparity image from the depth image produced by the pRGB-D sensor. Considering pixel G(u,v) in the depth image which ranges d , the calibrated focal length f , the baseline length B , the disparity value dis of pixel G can be calculated with Equation (15): After that, we warp the right image to the left image to produce a point correspondence for water hazard detection. With the stereo pair, we are able to calculate a brightness difference image which signifies the polarization difference. Considering pixel G whose image coordinate is (u,v) in left image and corresponds to the pixel G' (  Considering pixel G(u, v) in the depth image which ranges d, the calibrated focal length f , the baseline length B, the disparity value dis of pixel G can be calculated with Equation (15): After that, we warp the right image to the left image to produce a point correspondence for water hazard detection. With the stereo pair, we are able to calculate a brightness difference image which signifies the polarization difference. Considering pixel G whose image coordinate is (u, v) in left image and corresponds to the pixel G (u + dis, v) in right image. If the brightness at G(u, v) in left image is V le f t and the brightness in the warped right image is V right at G (u + dis, v). The left image is captured by the left color camera with the horizontal polarizer while the right image is captured by the right color camera with the vertical polarizer. Thereby, the brightness difference of pixel G equals V le f t − V right . After extraction of the brightness difference image, it depends on the following conditions whether pixel G can be classified as water hazard: • Pixel G is classified as traversable in the process as presented in Section 3.2, where traversable area is detected with depth and attitude information.
where δ is the dynamic threshold of polarization difference set with the derivation in Section 3.1 and varies with the rolling angle of the pRGB-D sensor. After traversable area detection and water hazard detection, a farthest traversable line to represent the traversable distances in different directions is proposed to generate the stereo sound interface. In each direction, the farthest distance for navigating is determined by both the traversable area and water hazard detection results. In each direction, we first locate the farthest traversable pixel with the biggest depth value t Z and then search for the nearest pixel denoted as water area whose depth value is w Z . The farthest traversable depth value f Z equals to w Z if a water area is detected and equals to t Z if no water areas are detected in this direction. Gaussian filtering is applied to smooth the farthest traversable line to reduce noises before generating feedback to VIP. After the filtering stage, the green farthest traversable lines with or without water hazard detection are obtained, as shown in Figure 7 where the red triangles represent the suggested navigation directions to take to avoid obstacles or water hazards. After traversable area detection and water hazard detection, a farthest traversable line to represent the traversable distances in different directions is proposed to generate the stereo sound interface. In each direction, the farthest distance for navigating is determined by both the traversable area and water hazard detection results. In each direction, we first locate the farthest traversable pixel with the biggest depth value Z t and then search for the nearest pixel denoted as water area whose depth value is Z w . The farthest traversable depth value Z f equals to Z w if a water area is detected and equals to Z t if no water areas are detected in this direction. Gaussian filtering is applied to smooth the farthest traversable line to reduce noises before generating feedback to VIP. After the filtering stage, the green farthest traversable lines with or without water hazard detection are obtained, as shown in Figure 7 where the red triangles represent the suggested navigation directions to take to avoid obstacles or water hazards. This paper uses a variant of the non-semantic audio interface presented in our previous work [29] to transfer the traversable area and water hazard detection results to VIP by synthesizing stereo sound from the farthest traversable line. In this work, the mapping of traversable distances to the different sonification of instruments is elaborated. depth value is w Z . The farthest traversable depth value f Z equals to w Z if a water area is detected and equals to t Z if no water areas are detected in this direction. Gaussian filtering is applied to smooth the farthest traversable line to reduce noises before generating feedback to VIP. After the filtering stage, the green farthest traversable lines with or without water hazard detection are obtained, as shown in Figure 7 where the red triangles represent the suggested navigation directions to take to avoid obstacles or water hazards.  As shown in Figure 8, the directions of traversable areas are differentiated not only by sound source locations in the virtual 3D space and the directions of stereo sound, but also by musical instruments, whose timbre differs from each other. This paper uses a variant of the non-semantic audio interface presented in our previous work [29] to transfer the traversable area and water hazard detection results to VIP by synthesizing stereo sound from the farthest traversable line. In this work, the mapping of traversable distances to the different sonification of instruments is elaborated.
As shown in Figure 8, the directions of traversable areas are differentiated not only by sound source locations in the virtual 3D space and the directions of stereo sound, but also by musical instruments, whose timbre differs from each other. The generation of the stereo sound follows rules below to guide and attract VIP to take the prioritized direction to detour around hazardous regions: • Divide the farthest traversable line into five sections which correspond to the five different musical instruments. We only use five instruments to make sure that it is easy to understand and would not sound confusing after the familiarization period. Five instruments, including trumpet, piano, gong, violin and xylophone, produce sounds simultaneously.

•
The horizontal field view of the pRGB-D sensor is 86.5°, so each musical instrument corresponds to the traversable line with a range of 17.3°.

•
Each direction of traversable area and water hazard is represented by a musical instrument in the virtual 3D space.

•
For each musical instrument, the bigger the sum of height in the corresponding section of the traversable line, the louder the sound from the instrument. As the relationship shown in Figure  9, the instrument loudness increases exponentially with the average traversable distance in the The generation of the stereo sound follows rules below to guide and attract VIP to take the prioritized direction to detour around hazardous regions: • Divide the farthest traversable line into five sections which correspond to the five different musical instruments. We only use five instruments to make sure that it is easy to understand and would not sound confusing after the familiarization period. Five instruments, including trumpet, piano, gong, violin and xylophone, produce sounds simultaneously.

•
The horizontal field view of the pRGB-D sensor is 86.5 • , so each musical instrument corresponds to the traversable line with a range of 17.3 • . • Each direction of traversable area and water hazard is represented by a musical instrument in the virtual 3D space.

•
For each musical instrument, the bigger the sum of height in the corresponding section of the traversable line, the louder the sound from the instrument. As the relationship shown in Figure 9, the instrument loudness increases exponentially with the average traversable distance in the corresponding section of the traversable line so as to guide or attract the user to navigate the traversable direction.

•
For each musical instrument and the corresponding section, the bigger the areas that overlap with the red triangle sections which denote suggested directions to take on navigation, the higher the pitch of the instrument. As the relationship shown in Figure 10, the instrument pitch increases linearly with the proportion of the overlapping areas with the red triangle sections.
The generation of the stereo sound follows rules below to guide and attract VIP to take the prioritized direction to detour around hazardous regions: • Divide the farthest traversable line into five sections which correspond to the five different musical instruments. We only use five instruments to make sure that it is easy to understand and would not sound confusing after the familiarization period. Five instruments, including trumpet, piano, gong, violin and xylophone, produce sounds simultaneously.

•
The horizontal field view of the pRGB-D sensor is 86.5°, so each musical instrument corresponds to the traversable line with a range of 17.3°.

•
Each direction of traversable area and water hazard is represented by a musical instrument in the virtual 3D space.

•
For each musical instrument, the bigger the sum of height in the corresponding section of the traversable line, the louder the sound from the instrument. As the relationship shown in Figure  9, the instrument loudness increases exponentially with the average traversable distance in the corresponding section of the traversable line so as to guide or attract the user to navigate the traversable direction.

•
For each musical instrument and the corresponding section, the bigger the areas that overlap with the red triangle sections which denote suggested directions to take on navigation, the higher the pitch of the instrument. As the relationship shown in Figure 10, the instrument pitch increases linearly with the proportion of the overlapping areas with the red triangle sections.

Experiments
In this section, the proposed approach is verified with a series of experiments including tests on a score of indoor and outdoor scenarios, comparisons with other works in the literature and a field test in the real-world environment involving ten voluntary blindfolded users. Figure 11 shows a number of traversable area detection and water area detection results in indoor and outdoor environments. The approach detects traversable areas and generates farthest traversable lines correctly, whose red triangles suggest the prioritized directions to avoid close obstacles and water hazards.

Experiments
In this section, the proposed approach is verified with a series of experiments including tests on a score of indoor and outdoor scenarios, comparisons with other works in the literature and a field test in the real-world environment involving ten voluntary blindfolded users. Figure 11 shows a number of traversable area detection and water area detection results in indoor and outdoor environments. The approach detects traversable areas and generates farthest traversable lines correctly, whose red triangles suggest the prioritized directions to avoid close obstacles and water hazards. Figure 11 shows a number of traversable area detection and water area detection results in indoor and outdoor environments. The approach detects traversable areas and generates farthest traversable lines correctly, whose red triangles suggest the prioritized directions to avoid close obstacles and water hazards.

Comparison of Detection Results
To compare the performance of traversable area and water hazard detection with respect to other works, the results of several approaches are shown in Figure 12. Figure 12a−c are the results of the proposed approach which detects ground areas and water areas correctly. Comparatively, Figure 12d-f show the indoor and outdoor results of the approach proposed by Rodríguez which estimated the ground plane based on RANSAC plus filtering techniques [27]. Figure 12f is a correct result of detecting ground with some noises and holes, but the ground is partly detected in Figure 12e which is a type of sample error as the inclination angle of the plane is not considered.
We removed this kind of error in our work with adequate use of the attitude angles of the pRGB-D sensor. The approach in [58] set up a stereo camera to track water areas by modeling the sky light effect with reflection and azimuth angles. That approach is able to detect water hazards robustly, but mistakenly detects sky regions or edges of buildings as water areas as shown in Figure 12g−i. As the approach proposed in this paper combines the detection of water hazards and traversable areas, the aerial detection errors which would not hinder the navigation of VIP are reduced in number. Thus, the aerial errors would not be transferred to mislead VIP to avoid the fake water hazards. The total computation time of a single frame is 70.5 ms, while the image acquisition from the pRGB-D sensor takes 3 ms, and the time cost for the segmentation of traversable area and the detection of water hazard are 48 ms and 19.5 ms, respectively. Thereby, the computation cost is saved to maintain a reasonably qualified refresh rate of 14.2 frames per second on a 4.0 GHz Intel Core 7 processor with the resolution empirically set at 640 × 360, which is enough to feed results back in time to assist VIP at normal walking speed.
works, the results of several approaches are shown in Figure 8. Figure 8a−c are the results of the proposed approach which detects ground areas and water areas correctly. Comparatively, Figure 12d-f show the indoor and outdoor results of the approach proposed by Rodríguez which estimated the ground plane based on RANSAC plus filtering techniques [27]. Figure 12f is a correct result of detecting ground with some noises and holes, but the ground is partly detected in Figure 12e which is a type of sample error as the inclination angle of the plane is not considered. We removed this kind of error in our work with adequate use of the attitude angles of the pRGB-D sensor. The approach in [58] set up a stereo camera to track water areas by modeling the sky light effect with reflection and azimuth angles. That approach is able to detect water hazards robustly, but mistakenly detects sky regions or edges of buildings as water areas as shown in Figure 12g−i. As the approach proposed in this paper combines the detection of water hazards and traversable areas, the aerial detection errors which would not hinder the navigation of VIP are reduced in number. Thus, the aerial errors would not be transferred to mislead VIP to avoid the fake water hazards. The total computation time of a single frame is 70.5 ms, while the image acquisition from the pRGB-D sensor takes 3 ms, and the time cost for the segmentation of traversable area and the detection of water hazard are 48 ms and 19.5 ms, respectively. Thereby, the computation cost is saved to maintain a reasonably qualified refresh rate of 14.2 frames per second on a 4.0 GHz Intel Core 7 processor with the resolution empirically set at 640 × 360, which is enough to feed results back in time to assist VIP at normal walking speed.
In order to provide a quantitative evaluation of the approach, the performance of the algorithm is thoroughly analyzed by calculating traversable area detection rate ( TADR ), water area detection rate ( WADR ) and expansion error ( EE ). Given Equations (16)−(18), traversable area detection rate ( TADR ) is defined as the number of frames which ground has been detected correctly ( GD ) divided by the number of frames with ground ( G ). Water area detection rate (WADR ) is defined as the number of frames which water area has been detected correctly (WAD ) divided by the number of frames with water area (WA). Meanwhile, expansion error ( EE ) is defined as the number of frames which traversable area has been expanded to non-ground areas ( ENG ) including water areas divided by the number of frames with ground ( G ): In order to provide a quantitative evaluation of the approach, the performance of the algorithm is thoroughly analyzed by calculating traversable area detection rate (TADR), water area detection rate (WADR) and expansion error (EE). Given Equations (16)−(18), traversable area detection rate (TADR) is defined as the number of frames which ground has been detected correctly (GD) divided by the number of frames with ground (G). Water area detection rate (WADR) is defined as the number of frames which water area has been detected correctly (WAD) divided by the number of frames with water area (WA). Meanwhile, expansion error (EE) is defined as the number of frames which traversable area has been expanded to non-ground areas (ENG) including water areas divided by the number of frames with ground (G): For quantitative analysis we used a sequence of 1285 frames from the pRGB-D dataset [61], which is built for tuning parameters and testing. The dataset is captured by the wearable sensor whose images of a single frame are respectively the depth image, left and right color images acquired with different polarizers whose polarization directions are perpendicular to each other after calibrations using a polarization analyzer. As shown in Table 1, the approach has a high traversable area detection rate of 94.4% compared with respect to other works [27,29]. Moreover, the expansion error has been decreased to 7.3% after the combination of traversable area detection and water area detection, illustrating the reliability of the approach.  [27] 79.6% N/A 41.5% RANSAC plus expanding techniques [29] 93.8% N/A 65.2% 3D tracking of water areas [58] N/A 86.5% N/A

Detection Limitations
The proposed pipeline effectively combines polarization-color-depth-attitude information and improves the detection performance in indoor and outdoor environments, which is able to provide real-time assistances for VIP to avoid colliding into obstacles or stepping into water areas. However, it still has certain potential limitations. Since the proposed traversable area detection is based on the pRGB-D sensor whose depth information is acquired with stereo matching between left and right RGB images, the approach will have a slight problem when the depth information is defective as shown in Figure 13. As the textures of glass door and the overexposed region are poor, the depth information are invalid or filled with wrong values offered by the stereo type of RGB-D sensor in our prototype as shown in Figure 13a−c. Thereupon, the approach wrongly detects the region of glass door as a traversable area, and merely part of the ground area close to the overexposed wall is detected. Besides, as shown in Figure 13d-f, the approach is insensitive to small obstacles on the ground due to the depth filtering and dynamic programming strategy. To solve this problem, in the future time we aim to implement a small obstacle discovery algorithm [62] or provide both physical support and navigational indication for tiny threat awareness [49]. (d-f) Small obstacles would be detected as traversable area.

User Study
To evaluate the ability of our system to help visually impaired users follow a prioritized route without losing their orientation or colliding into obstacles or stepping into water puddles, we conducted a user study in a real-world outdoor space. Before the participants were blindfolded, the stereo sound feedback of the system when wearing a bone-conduction headphone was explained as shown in Figure 14a,b. As we know, VIP or blindfolded users rely on cues from the environment a lot. For example, they use the sounds from their ears to understand the orientations of streets. Thereby, the assisting prototype is not only wearable but also ears-free, because the bone-conducting interface will not block the ears of participants from hearing environmental sounds. This familiarization period lasted for about seven minutes, during which they were able to adapt to hearing the stereo sound to keep away from close hazards including obstacles and water areas.

User Study
To evaluate the ability of our system to help visually impaired users follow a prioritized route without losing their orientation or colliding into obstacles or stepping into water puddles, we conducted a user study in a real-world outdoor space. Before the participants were blindfolded, the stereo sound feedback of the system when wearing a bone-conduction headphone was explained as shown in Figure 14a,b. As we know, VIP or blindfolded users rely on cues from the environment a lot. For example, they use the sounds from their ears to understand the orientations of streets. Thereby, the assisting prototype is not only wearable but also ears-free, because the bone-conducting interface will not block the ears of participants from hearing environmental sounds. This familiarization period lasted for about seven minutes, during which they were able to adapt to hearing the stereo sound to keep away from close hazards including obstacles and water areas.
To evaluate the ability of our system to help visually impaired users follow a prioritized route without losing their orientation or colliding into obstacles or stepping into water puddles, we conducted a user study in a real-world outdoor space. Before the participants were blindfolded, the stereo sound feedback of the system when wearing a bone-conduction headphone was explained as shown in Figure 14a,b. As we know, VIP or blindfolded users rely on cues from the environment a lot. For example, they use the sounds from their ears to understand the orientations of streets. Thereby, the assisting prototype is not only wearable but also ears-free, because the bone-conducting interface will not block the ears of participants from hearing environmental sounds. This familiarization period lasted for about seven minutes, during which they were able to adapt to hearing the stereo sound to keep away from close hazards including obstacles and water areas. After the learning and familiarization stage, ten voluntary participants were blindfolded and were required to travel around a building and an empty field without colliding into obstacles or stepping into water areas. As a comparison task, the ten volunteers were divided into two groups randomly and while one group travelled with only traversable area detection, the other group travelled with both traversable area and water hazard detection. To provide the participants with a sense of orientation, the user would get an extra hint to turn at the bends or the road intersections. For a single test, three measures were recorded including the time needed by a participant to walk the route as shown in Figure 14c, the number of collisions into obstacles and the number of times they stepped into water areas. The route was acquired through recording the real-time locations After the learning and familiarization stage, ten voluntary participants were blindfolded and were required to travel around a building and an empty field without colliding into obstacles or stepping into water areas. As a comparison task, the ten volunteers were divided into two groups randomly and while one group travelled with only traversable area detection, the other group travelled with both traversable area and water hazard detection. To provide the participants with a sense of orientation, the user would get an extra hint to turn at the bends or the road intersections. For a single test, three measures were recorded including the time needed by a participant to walk the route as shown in Figure 14c, the number of collisions into obstacles and the number of times they stepped into water areas. The route was acquired through recording the real-time locations during navigation with the red labels on the map provided by AMAP [63]. The timer starts when a participant was sent to the start region and stops when the participant completed navigating the route. As shown in Table 2, average number of times stepping into water areas with water hazard detection is significantly less than that without water hazard detection. It is the detection of water hazard in advance that endows participants to take the prioritized direction to avoid water hazards compared with the condition with only traversable area detection. Moreover, the average number of collisions into obstacles is few and most of the collisions occurred when the participants swerved and hit a curb which was outside the horizontal field view of the pRGB-D sensor. Comparatively, times of stepping in water areas are very few but not none, which is due to the minimum range of water hazard detection. If the user did not turn when hearing the stereo sound and kept moving forward, the water area would be out of the vertical field view of the pRGB-D sensor or the reflection angles would not be large enough to provide a strong cue for water hazards. Nonetheless, explicit feedback about directions to walk to avoid obstacles and water hazards were found helpful for assisting VIP. The results suggest that participants were aware of obstacles and water areas with our system and could make use of the stereo sound to keep away from or detour around the hazards. In other word, the safety and robustness of navigating VIP has been dramatically enhanced with the traversable area and water hazard detection based on the proposed pRGB-D framework. After the test, the ten participants were asked two simple questions including whether the prototype was easy to wear and whether the stereo sound feedback could be used to find traversable directions. As shown in the questionnaire results (Table 3), all users answered that the system is useful and could help them to find traversable areas and avoid obstacles or water hazards. Most participants were able to understand the changing process of loudness and pitch, and follow the stereo sound to find the traversable directions and detour around the hazards, while a small group of participants only used the direction of the stereo sound to turn left or turn right to avoid collisions. This demonstrated the usefulness and effectivity of the approach in terms of traversable area detection and stereo sound feedback. In addition, some users provided some advice on adding functions, such as the detection of curbs and negative obstacles. Moreover, the users were optimistic about the pRGB-D system and would like to have a more profound experience.

Conclusions
RGB-D sensors are a ubiquitous choice to assist navigation for the visually impaired. However, most solutions are confined to using intensity information whereas the polarization cues are weak. The proposed pRGB-D framework enhances the safety and robustness of navigation by combining traversable area detection and water hazard detection with polarization-color-depth-attitude information. Indoor and outdoor empirical evidences, quantitative analysis of the detection as well as a user study with ten blindfolded participants demonstrated its usefulness and reliability.
In the future, we will improve our navigation assistance approach to reach a higher level of perception and offer more independence to visually impaired individuals. Specifically, we look forward to including polarization for ranging and further investigating discovery schemes for the detection of small obstacles such as curbs and transparent objects. Additionally, it is necessary to run a larger study with visually impaired participants to test this approach, and different audio output settings could be compared. The use of portable processors and sensors with larger field of view would also benefit the system in terms of wearable mobility and detectable performance.