Open Access
Sensors 2017, 17(9), 2101; https://doi.org/10.3390/s17092101
Article
Vision-Based Real-Time Traversable Region Detection for Mobile Robot in the Outdoors
Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China
* Author to whom correspondence should be addressed.
Received: 18 July 2017 / Accepted: 4 September 2017 / Published: 13 September 2017
Abstract
Environment perception is essential for autonomous mobile robots in human–robot coexisting outdoor environments. One of the important tasks for such intelligent robots is to autonomously detect the traversable region in an unstructured 3D real world. The main drawback of most existing methods is their high computational complexity. Hence, this paper proposes a binocular vision-based, real-time solution for detecting traversable regions in the outdoors. In the proposed method, an appearance model based on a multivariate Gaussian is quickly constructed from a sample region in the left image, adaptively determined by the vanishing point and dominant borders. Then, a fast, self-supervised segmentation scheme is proposed to classify the traversable and non-traversable regions. The proposed method is evaluated on public datasets as well as on a real mobile robot. Implementation on the mobile robot has demonstrated its suitability for real-time navigation applications.
Keywords:
binocular sensor; traversable region detection; mobile robot; outdoors
1. Introduction
In recent decades, the robotics community has made great efforts to develop mobile robots with complete autonomy. Traversable region detection is one of the fundamental problems for such autonomous navigation systems. Numerous vision-based approaches have been proposed for structured road detection. In recent years, some researchers have attempted to tackle more challenging unstructured road conditions, where unstructured roads refer to roads with arbitrary surfaces and various shapes, without painted markers or distinguishable borders [1]. However, owing to the need for a good trade-off between time efficiency and accuracy, it is still challenging for a ground robot to autonomously locate a variety of traversable areas and safely navigate itself with respect to human-defined rules (i.e., keeping off non-road areas) in real time [2].
Kong et al. [3] proposed a method that decomposes the detection process into two steps: vanishing point estimation and road segmentation based upon the detected vanishing point. This approach can detect various types of roads, but it is limited by the high computational complexity of the vanishing point estimation. Many algorithms have attempted to speed up vanishing point estimation. Moghadam et al. [4] proposed an optimal local dominant orientation method using the joint activities of only four Gabor filters and an adaptive distance-based voting scheme for estimating the vanishing point. Miksik [5] investigated expanding Gabor wavelets into a linear combination of Haar-like functions to perform fast filtering, and using superpixels in the voting scheme to speed up the process. Besides vanishing-point-based road detection methods, many methods directly differentiate road pixels from the background using appearance models [1,6,7,8,9]. Tan et al. [6] adopted multiple color histograms to capture the variability of the road surface and a single color histogram to model the background in RGB space. Ramstrom and Christensen [7] proposed constructing Gaussian mixture models (GMMs) for the road and background using UV color, normalized red and green, and luminance components. These methods require the color models to be trained offline and thus cannot adapt to appearance variations in real road conditions. Several other methods build the model directly from sample regions in the input image. Álvarez and López [8] employed normalized histograms of the surrounding area of a set of seed pixels at the bottom part of the image to build the road model in an illuminant-invariant space.
Similarly, Lu [9] selected a region from the bottom part of the input image as a road mask to model the road and the background, combining complementary color features such as RGB, Lab and the SILTP (Scale Invariant Local Ternary Patterns) texture feature. The performance of these methods depends on the quality of the constant sample region, which may contain false road samples. Instead of selecting a constant sample area, Miksik et al. [10] used a trapezoid at the bottom center of the image as the initial sample region, and then refined it using the vanishing point. In recent years, different hybrid methods have been proposed to further increase accuracy and robustness [1,11]. In [1], a sample region was first determined using the vanishing point and the geometric and color features of the road. Then a model combining color, edge and shape features was used to detect unmarked roads. Álvarez et al. [11] introduced road priors (geographical information) and contextual cues (horizon lines, vanishing points, road markings, 3D scene layout and road geometry) to detect road areas in varying conditions. All of the methods mentioned above, however, are still computationally expensive.
Most recently, Lee et al. have made efforts to reduce the computational load for online applications [12]. In [12], a self-supervised online learning architecture based on incremental non-parametric Bayesian clustering was developed to estimate traversability in complex and unknown environments for an autonomous vehicle. This method consists of three stages: superpixel labeling with inertial measurement unit (IMU) and wheel encoder data, incremental cluster learning, and traversable region classification with k-nearest neighbors. As the authors claimed, the whole algorithm could be used on a robot platform for traversability estimation in complex and unknown environments. However, it took about 48.3 s for the incremental cluster learning and 0.02 s for the traversability estimation, which is still far from the real-time requirement of most robotics applications (e.g., a frame rate of 30 fps).
It is worth noting that deep learning methods are also popular for traversable region segmentation [13,14,15,16,17]. Xiao et al. [13] presented a road detection algorithm based on structured random forests that makes use of contextual information. The computational time of their algorithm was, on average, 70 ms (Intel Core i5-3230 CPU) for unstructured road detection. Alvarez et al. [14] used a convolutional neural network to learn features for road scene segmentation. Mohan [15] developed a new deep learning architecture, deep deconvolutional neural networks, for road parsing. This method provides one of the best results on the KITTI-Road benchmark; however, its running time was over 2 s (multiple GPUs). Recently, Oliveira et al. [16] proposed an efficient deep model for monocular road segmentation that obtained the best trade-off between segmentation quality and runtime on the KITTI dataset. However, the runtime was still over 80 ms (NVIDIA Titan X GPU). Moreover, these deep-learning-based methods can only be run on high-power GPUs, which are usually unavailable on a mobile robotic platform. Thus, deep-learning-based methods remain very challenging for real-time applications at interactive frame rates on a mobile robot.
In this paper, we propose a novel method of detecting the traversable region for a mobile robot from a single image in real time, without dependency on any specific hardware. A fast and robust multivariate Gaussian model combining RGB, illumination invariant space (IIS) [18] and local binary pattern (LBP) [19] features is built from an adaptive sample region determined by the constraints of the vanishing point and two road borders. The segmentation can be implemented in real time on the standard CPU of an ordinary mobile robot. The main contributions of this paper are as follows:
- A new method is proposed to robustly estimate the vanishing point, which outperforms the state of the art in terms of the trade-off between time efficiency and accuracy. The vanishing point is detected by the voting of a few line segments formed from dominant pixels, rather than by the voting of all pixels as in most existing methods.
- A fast, self-supervised segmentation scheme is proposed for unstructured traversable region detection. An appearance model based on a multivariate Gaussian is constructed from a sample region adaptively determined by the vanishing point and dominant borders in the input image. This scheme allows real-time performance on a mobile robot.
2. Methods
2.1. Traversable Region Detection
Figure 1 depicts the pipeline of the proposed traversable region detection. First, the texture orientation is computed using Gabor filters in eight directions for each pixel of the input image. Secondly, instead of directly using all pixels to vote for the vanishing point based on their orientations, we group only the dominant pixels with the same orientation into line segment candidates. Thirdly, the vanishing point is estimated by the voting of those line segment candidates, and a seed pixel belonging to the road area is located through the constraints of the vanishing point and two road border candidates. Lastly, a sample region surrounding the seed pixel is selected to model the road, combining RGB, IIS and LBP features in a multivariate Gaussian, so that the road can be classified from the background based on this appearance model.
2.2. Vanishing Point Estimation
In this paper, Gabor filters are used to estimate the local dominant orientation of each pixel because of their well-known accuracy. A 2D Gabor filter for an orientation ${\theta}_{n}$ and radial frequency $\omega$ is defined as follows [20],
$${\psi}_{\omega ,{\theta}_{n}}(x,y)=\frac{\omega}{\sqrt{2\pi}c}{e}^{-\frac{{\omega}^{2}}{8{c}^{2}}(4{a}^{2}+{b}^{2})}({e}^{ia\omega}-{e}^{-{c}^{2}/2}),$$
where $a=x\mathrm{cos}{\theta}_{n}+y\mathrm{sin}{\theta}_{n}$, $b=-x\mathrm{sin}{\theta}_{n}+y\mathrm{cos}{\theta}_{n}$, $c=\pi /2$, $\omega =2\pi /\lambda$ and $\lambda$ is set to $4\sqrt{2}$.
Let $I(p)$ be the grayscale value of the input image at $p(x,y)$. The convolution of the input image $I(p)$ with a bank of Gabor filters of orientation ${\theta}_{n}$ and radial frequency $\omega$ is calculated as,
$$\begin{array}{l}{\Gamma}_{{\theta}_{n}}(p)=I(p)\otimes {\psi}_{\omega ,{\theta}_{n}},\\ {\theta}_{n}=(n-1)\pi /N,\text{ }n=1,2,\dots ,N,\end{array}$$
where $N$ is the total number of orientations. The square norm of the complex Gabor filter response is then computed as,
$${E}_{{\theta}_{n}}(p)=\mathrm{Re}{({\Gamma}_{{\theta}_{n}}(p))}^{2}+\mathrm{Im}{({\Gamma}_{{\theta}_{n}}(p))}^{2}.$$
Thus, the local dominant texture orientation ${\theta}_{\mathrm{max}}$ is defined as the orientation with the strongest Gabor response across all orientations. More precise angular resolution can be achieved with a larger number of orientations ($N=36$ in [3]), but at the cost of higher computational complexity. In this paper, only 8 orientations are used, giving a resolution of 22.5°. A confidence-rated technique similar to the work of Kong et al. [3] is used to provide a confidence level for the local texture orientation ${\theta}_{\mathrm{max}}(p)$ at pixel $p$. Supposing ${E}_{1}(p)>\dots >{E}_{8}(p)$ are the ordered Gabor responses for the 8 predefined orientations, the confidence in the orientation ${\theta}_{\mathrm{max}}(p)$ is given by,
$$con{f}_{{\theta}_{\mathrm{max}}}(p)=100(1-\frac{1}{5}{\displaystyle \sum _{i=2}^{6}{E}_{i}(p)/{E}_{1}(p)}).$$
Pixels with a confidence level smaller than a threshold ${T}_{th}$, i.e., $con{f}_{{\theta}_{\mathrm{max}}}<{T}_{th}$, are discarded. In our experiments, the optimal ${T}_{th}$ is set to 35.
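As an illustration, the texture-orientation step can be sketched in Python/NumPy. The kernel follows the Gabor formulation above with $c=\pi/2$ and $\lambda=4\sqrt{2}$; the 17 × 17 kernel size and the use of `scipy.ndimage.convolve` are our own assumptions, not details from the paper:

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(theta, lam=4 * np.sqrt(2), size=17):
    """Complex Gabor kernel following the formulation above (c = pi/2)."""
    w = 2 * np.pi / lam
    c = np.pi / 2
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    a = x * np.cos(theta) + y * np.sin(theta)
    b = -x * np.sin(theta) + y * np.cos(theta)
    env = (w / (np.sqrt(2 * np.pi) * c)) * np.exp(-(w**2 / (8 * c**2)) * (4 * a**2 + b**2))
    return env * (np.exp(1j * a * w) - np.exp(-c**2 / 2))

def dominant_orientation(gray, n_orient=8, t_th=35.0):
    """Per-pixel dominant texture orientation, confidence, and confident-pixel mask."""
    thetas = np.array([n * np.pi / n_orient for n in range(n_orient)])
    E = np.empty((n_orient,) + gray.shape)
    for i, th in enumerate(thetas):
        k = gabor_kernel(th)
        re = ndimage.convolve(gray, k.real, mode='nearest')
        im = ndimage.convolve(gray, k.imag, mode='nearest')
        E[i] = re**2 + im**2               # squared norm of the complex response
    theta_max = thetas[np.argmax(E, axis=0)]
    Es = np.sort(E, axis=0)[::-1]          # E_1 > E_2 > ... per pixel
    conf = 100 * (1 - Es[1:6].mean(axis=0) / np.maximum(Es[0], 1e-12))
    return theta_max, conf, conf >= t_th   # keep pixels with conf >= T_th
```

With 8 orientations the confidence averages the ratios $E_2/E_1,\dots,E_6/E_1$, so it falls in $[0, 100]$, and the returned mask discards pixels below $T_{th}$.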
It is found that, on many paved roads, the pixels contributing most to the voting have similar orientations and can be grouped into line segments. Therefore, we propose to first group these dominant pixels into line segment candidates and then use the resulting line segments to vote for the vanishing point.
Instead of using the gradient angle as in the existing approach [21], the texture orientation is used in this paper under the concept of line-support regions for line segment detection. A region growing algorithm is applied to group connected pixels (8-connected neighborhood in this paper) within a constant orientation tolerance into a line-support region. A small orientation tolerance results in overly narrow line-support regions, while a large one tends to include too many outliers. Hence, in our experiments, the orientation tolerance is empirically set to 22.5°. Small line-support regions are rejected by the following criterion [21],
$${n}_{reg}<-{\mathrm{log}}_{10}({({X}_{m}{Y}_{m})}^{5/2})/{\mathrm{log}}_{10}({\theta}_{th}/180),$$
where ${n}_{reg}$ is the number of pixels in the region, ${X}_{m}$ and ${Y}_{m}$ are the sizes of the input image, and ${\theta}_{th}$ is the resolution of each orientation. Once a line-support region ${R}_{i}$ is found, least-square fitting is applied to obtain a line ${l}_{i}$. A line-support region ${R}_{i}$ (a set of pixels) must be associated with a line segment, in effect a narrow rectangle with a length and a width; thus, a rectangular approximation is constructed to evaluate the fitted line. In this paper, the center of mass is used as the center of the rectangle $({c}_{i,x},{c}_{i,y})$,
$$\begin{array}{l}{c}_{i,x}={\displaystyle \sum _{j\in {R}_{i}}g(j)x(j)/}{\displaystyle \sum _{j\in {R}_{i}}g(j)},\\ {c}_{i,y}={\displaystyle \sum _{j\in {R}_{i}}g(j)y(j)/}{\displaystyle \sum _{j\in {R}_{i}}g(j)},\end{array}$$
where $g(j)$ is the gradient magnitude of pixel $j$. The main direction of the rectangle is set to the direction of the fitted line ${l}_{i}$, and the width ${L}_{i,w}$ and length ${L}_{i,l}$ of the rectangle are set to the smallest values that cover the full line-support region ${R}_{i}$. The fitted line ${l}_{i}$ (and the line-support region ${R}_{i}$) is rejected if the rectangle is not narrow enough according to the ratio of length to width,
$${r}_{i}={L}_{i,l}/{L}_{i,w}\le {r}_{th},$$
where ${r}_{th}$ is a threshold, set to 2 in our experiments.
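The weighted center and narrowness test above can be sketched as follows. Taking the rectangle's main axis from the gradient-weighted scatter (a PCA approximation of the fitted line) is our simplification, not the paper's exact least-square fit:

```python
import numpy as np

def rect_approximation(pts, g, r_th=2.0):
    """Rectangular approximation of a line-support region.
    pts: (n, 2) pixel coordinates; g: (n,) gradient-magnitude weights.
    Returns (keep, length, width), with keep=False for non-narrow regions."""
    w = g / g.sum()
    center = (w[:, None] * pts).sum(axis=0)          # gradient-weighted center of mass
    d = pts - center
    cov = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0)
    evals, evecs = np.linalg.eigh(cov)
    proj_l = d @ evecs[:, -1]                        # along the main direction
    proj_w = d @ evecs[:, 0]                         # across it
    length = proj_l.max() - proj_l.min()
    width = proj_w.max() - proj_w.min()
    keep = width > 0 and (length / width) > r_th     # reject if r_i <= r_th
    return keep, length, width
```

An elongated pixel set passes the test (large length-to-width ratio), whereas a roughly square blob is rejected.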
To further evaluate the fitted lines ${l}_{i}$ and select better line segment candidates for vanishing point estimation, more constraints are considered. As Figure 2 shows, after region growing and line fitting are applied, only a few line segments are extracted. Less important line segments, which can be viewed as noise for the vanishing point estimation, are further excluded. Suppose the set of line segments is ${\left\{{l}_{i}\right\}}_{M}$ and their slopes and centers are ${\left\{{K}_{i}\right\}}_{M}$ and ${\left\{({c}_{i,x},{c}_{i,y})\right\}}_{M}$, respectively. Assuming that the vanishing point is not located on the left/right edges of the image, only those line segments satisfying the following criterion are selected as candidates for vanishing point estimation:
$${\gamma}_{i}=({c}_{i,x}-{c}_{xmean})\cdot {K}_{i}>0,$$
where ${\gamma}_{i}$ is a flag for line ${l}_{i}$, ${c}_{xmean}={\displaystyle \sum _{i=1}^{M}{c}_{i,x}}/M$ is the average of all the line centers in the $x$ direction, and $M$ is the total number of lines.
Finally, the set of line segment candidates ${\left\{{l}_{i}\right\}}_{M}$ obtained above is used to estimate the vanishing point by a weighted voting scheme. A distance parameter ${D}_{i,j}$ is defined for voting,
$${D}_{i,j}={\displaystyle \sum _{k}\mathrm{exp}(-\frac{{L}_{i,l}+{L}_{j,l}}{{L}_{l}}){d}_{i,j,k}},$$
where ${L}_{i,l}$ and ${L}_{j,l}$ are the lengths of lines ${l}_{i}$ and ${l}_{j}$ respectively, ${L}_{l}={\displaystyle \sum _{i=1}^{M}{L}_{i,l}}$ is the total length of all the lines, and ${d}_{i,j,k}$ is the distance from the intersection point of lines ${l}_{i}$ and ${l}_{j}$ to the line ${l}_{k}$ in ${\left\{{l}_{i}\right\}}_{M}$.
Thus, the intersection point ${p}_{vp}({x}_{i,j},{y}_{i,j})$ corresponding to $\underset{p({x}_{i,j},{y}_{i,j})}{\mathrm{arg}\mathrm{min}}{D}_{i,j}$ is selected as the vanishing point. The complete algorithm for vanishing point estimation can be summarized as Algorithm 1.
Algorithm 1. Vanishing Point Estimation 
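The weighted voting at the heart of Algorithm 1 can be sketched as below; the slope-plus-center line representation is our assumption for illustration:

```python
import numpy as np
from itertools import combinations

def intersect(l1, l2):
    """Intersection of two lines, each given as (slope k, center (cx, cy))."""
    k1, (x1, y1) = l1
    k2, (x2, y2) = l2
    if np.isclose(k1, k2):
        return None                       # parallel lines never intersect
    x = (y2 - y1 + k1 * x1 - k2 * x2) / (k1 - k2)
    return np.array([x, y1 + k1 * (x - x1)])

def point_line_dist(p, line):
    """Perpendicular distance d from point p to a (slope, center) line."""
    k, (cx, cy) = line
    return abs(k * (p[0] - cx) - (p[1] - cy)) / np.hypot(k, 1.0)

def vote_vanishing_point(lines, lengths):
    """Pick the pairwise intersection minimising the length-weighted sum of
    distances d_{i,j,k} to all candidate lines (the D_{i,j} defined above)."""
    L = sum(lengths)
    best, best_D = None, np.inf
    for i, j in combinations(range(len(lines)), 2):
        p = intersect(lines[i], lines[j])
        if p is None:
            continue
        weight = np.exp(-(lengths[i] + lengths[j]) / L)
        D = sum(weight * point_line_dist(p, lines[k]) for k in range(len(lines)))
        if D < best_D:
            best, best_D = p, D
    return best
```

When several candidate lines pass through a common point, every pairwise intersection coincides with it and its $D_{i,j}$ is zero, so that point wins the vote.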
2.3. Sample Region Selection
Since the appearance of the traversable region varies significantly, it is more plausible to build the appearance model adaptively from the input image than from offline training images [1]. To this end, the selection of a sample region of the road plays an important role: with non-adaptive methods, the selected sample region tends to include non-traversable regions. In our approach, the detected vanishing point is used to adaptively define the sample region because it provides a strong clue to the true location of the road area.
In this paper, a technique similar to that of Ref. [3] is used to find the two most dominant borders from a set of imaginary rays that originate from the initially estimated vanishing point. The difference is that we only roughly estimate the borders to define the sample region rather than to segment the traversable area. Specifically, we consider only 17 evenly distributed imaginary rays, with an angle of 10° between neighboring rays, as shown in Figure 3. Suppose ${A}_{i,L}$ and ${A}_{i,R}$ are the two neighboring regions on either side of the ray ${\ell}_{i}$. The color difference of ${A}_{i,L}$ and ${A}_{i,R}$ for each channel of the color space (RGB in this paper) is defined as,
$$\Delta {({A}_{i,L},{A}_{i,R})}_{c}=\frac{\left|\mathrm{mean}({A}_{i,L})-\mathrm{mean}({A}_{i,R})\right|}{\sqrt{\mathrm{var}({A}_{i,L})+\mathrm{var}({A}_{i,R})}},$$
where $\mathrm{mean}()$ and $\mathrm{var}()$ are the mean and variance of the pixel values in the region. Let $\Delta ({A}_{i,L},{A}_{i,R})=\mathrm{max}\left\{\Delta {({A}_{i,L},{A}_{i,R})}_{c}|{}_{c=R,G,B}\right\}$; then the right and left borders are simply defined as the rays ${\ell}_{j}$ and ${\ell}_{k}$ satisfying the following expressions,
$$\begin{array}{l}\Delta ({A}_{j,L},{A}_{j,R})=\mathrm{max}(\Delta ({A}_{i,L},{A}_{i,R})|{}_{i=2,3,\dots ,8}),\\ \Delta ({A}_{k,L},{A}_{k,R})=\mathrm{max}(\Delta ({A}_{i,L},{A}_{i,R})|{}_{i=9,10,\dots ,16}).\end{array}$$
Once the right and left borders are obtained, a new imaginary ray ${\ell}_{b}$ is constructed as the bisector of rays ${\ell}_{j}$ and ${\ell}_{k}$. A seed point ${p}_{\mathrm{seed}}$ is placed two-thirds of the way along the bisector. Finally, a $K\times K$ region ${R}_{\mathrm{S}}$ ($K=15$ in our experiments) surrounding the seed is selected as the sample region, as shown in the right image of Figure 3.
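The per-ray color difference used to rank the border candidates can be sketched as follows; the boolean-mask interface and the small epsilon guard against zero variance are our additions:

```python
import numpy as np

def ray_color_diff(img, mask_L, mask_R, eps=1e-12):
    """Normalised colour difference between the two regions flanking a ray,
    maximised over the R, G, B channels (the Delta(A_L, A_R) score above).
    img: (H, W, 3) image; mask_L, mask_R: boolean masks of the two regions."""
    diffs = []
    for c in range(3):
        aL = img[..., c][mask_L]
        aR = img[..., c][mask_R]
        diffs.append(abs(aL.mean() - aR.mean()) /
                     np.sqrt(aL.var() + aR.var() + eps))
    return max(diffs)
```

The ray flanked by the most dissimilar regions (largest score) on each half of the fan is taken as a border candidate.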
2.4. Segmentation Method
In this paper, a computationally efficient model based on a multivariate Gaussian with complementary features (RGB, IIS and LBP) is used for segmentation. Given the sample region, the mean feature vector ${\mathsf{\mu}}_{C}$ and the covariance matrix ${\Sigma}_{C}$ are first obtained using the 7 channels of each pixel,
$$\begin{array}{l}{\mathsf{\mu}}_{C}=\frac{1}{{n}_{s}}{\displaystyle \sum _{i=1}^{{n}_{s}}{p}_{C,i}},\\ {\Sigma}_{C}=\frac{1}{{n}_{s}}{\displaystyle \sum _{i=1}^{{n}_{s}}{p}_{C,i}{p}_{C,i}^{T}}-{\mathsf{\mu}}_{C}{\mathsf{\mu}}_{C}^{T},\\ C=\{r,g,b,{c}_{1},{c}_{2},{c}_{3},{c}_{LBP}\},\end{array}$$
where ${n}_{s}$ is the total number of pixels in the sample region and ${p}_{C,i}$ is the value of the $i$-th pixel for channel $C$ ($r,g,b$ are for RGB space, ${c}_{1},{c}_{2},{c}_{3}$ are for IIS space, and ${c}_{LBP}$ is for LBP).
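Fitting this model and scoring a pixel can be sketched as below; the small diagonal regularisation added for numerical invertibility is our addition, not part of the paper:

```python
import numpy as np

def fit_road_model(features):
    """Multivariate Gaussian road model from the sample region.
    features: (n_s, 7) array of per-pixel [r, g, b, c1, c2, c3, LBP] values."""
    mu = features.mean(axis=0)
    cov = (features.T @ features) / len(features) - np.outer(mu, mu)
    cov += 1e-6 * np.eye(cov.shape[0])   # regularisation (our addition)
    return mu, np.linalg.inv(cov)

def mahalanobis(p, mu, cov_inv):
    """Mahalanobis distance D(p) of a feature vector from the road model."""
    diff = p - mu
    return float(np.sqrt(diff @ cov_inv @ diff))
```

A pixel whose 7-channel feature vector equals the model mean has distance zero; distances grow as the pixel departs from the learned road appearance.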
Then, the likelihood of a pixel $p$ belonging to the road/non-road region is measured as the Mahalanobis distance between the pixel and the learned model,
$$D(p)=\sqrt{{(p-{\mathsf{\mu}}_{C})}^{T}{\Sigma}_{C}^{-1}(p-{\mathsf{\mu}}_{C})}.$$
For all the pixels in the sample region, the initial mean ${\mu}_{{D}_{0}}$ and standard deviation ${\sigma}_{{D}_{0}}$ of the Mahalanobis distance are computed as,
$$\begin{array}{l}{\mu}_{{D}_{0}}=\frac{1}{{n}_{s}}{\displaystyle \sum _{i=1}^{{n}_{s}}D({p}_{i})},\\ {\sigma}_{{D}_{0}}=\sqrt{\frac{1}{{n}_{s}}{\displaystyle \sum _{i=1}^{{n}_{s}}{(D({p}_{i})-{\mu}_{{D}_{0}})}^{2}}}.\end{array}$$
A pixel ${p}_{j}$ is classified into the road region if it satisfies the following condition,
$$\left|D({p}_{j})-{\mu}_{{D}_{k}}\right|<\lambda {\sigma}_{{D}_{k}},$$
where $\lambda$ is a parameter depending on the location of the pixel, and ${\mu}_{{D}_{k}}$ and ${\sigma}_{{D}_{k}}$ are the mean and standard deviation of the Mahalanobis distance over all pixels currently in the road region. ${\mu}_{{D}_{k}}$ and ${\sigma}_{{D}_{k}}$ are adaptively updated as new road pixels outside the sample region are found.
In this paper, the segmentation process starts from the seed pixel using a region growing method (8-connected neighborhood), as shown in Figure 4. The whole image is divided into three parts by the border candidates and the horizontal line on which the vanishing point lies. To make the segmentation accurate and robust to noise, we apply an adaptive threshold by changing the parameters $\lambda$, ${\mu}_{{D}_{k}}$ and ${\sigma}_{{D}_{k}}$. The parameter $\lambda$ changes with the pixel location because the likelihood of a pixel belonging to the road region varies across different areas of the image. In our experiments, $\lambda$ is set as,
$$\lambda =\left\{\begin{array}{l}3,\text{ if }{p}_{j}\in \mathrm{Part}\text{ }\mathrm{I},\\ 1,\text{ if }{p}_{j}\in \mathrm{Part}\text{ }\mathrm{II},\\ 0.5,\text{ if }{p}_{j}\in \mathrm{Part}\text{ }\mathrm{III}.\end{array}\right.$$
Furthermore, once a new road pixel ${p}_{j}$ is found and added to the road region, the parameters ${\mu}_{{D}_{k}}$ and ${\sigma}_{{D}_{k}}$ are updated as,
$$\begin{array}{l}{\mu}_{{D}_{k}}=[{\mu}_{{D}_{k-1}}{n}_{s,k-1}+D({p}_{j})]/({n}_{s,k-1}+1),\\ {\sigma}_{{D}_{k}}=\sqrt{[{\sigma}_{{D}_{k-1}}^{2}{n}_{s,k-1}+{(D({p}_{j})-{\mu}_{{D}_{k}})}^{2}]/({n}_{s,k-1}+1)},\end{array}$$
where ${n}_{s,k-1}$ is the total number of pixels in the current road region; in particular, ${n}_{s,0}$ is the total number of pixels in the sample region.
The segmentation algorithm is summarized as Algorithm 2.
Algorithm 2. Traversable Region Segmentation 
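The growing-and-updating loop of Algorithm 2 can be sketched as below, assuming a precomputed Mahalanobis distance map and a `part_of` helper (our assumption) that maps a pixel to Part I, II or III:

```python
import numpy as np
from collections import deque

def grow_road_region(D, seed, part_of, mu0, sigma0, n0):
    """8-connected region growing from the seed pixel (Algorithm 2 sketch).
    D       : per-pixel Mahalanobis distance map
    part_of : (y, x) -> 1, 2 or 3, the image part the pixel falls in
    mu0, sigma0, n0 : initial statistics from the sample region"""
    lam = {1: 3.0, 2: 1.0, 3: 0.5}          # adaptive threshold per image part
    h, w = D.shape
    road = np.zeros((h, w), dtype=bool)
    mu, sigma, n = float(mu0), float(sigma0), int(n0)
    q = deque([seed])
    road[seed] = True
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not road[ny, nx]:
                    # accept when |D - mu_k| < lambda(part) * sigma_k
                    if abs(D[ny, nx] - mu) < lam[part_of(ny, nx)] * sigma:
                        road[ny, nx] = True
                        q.append((ny, nx))
                        # incremental update of the running statistics
                        mu = (mu * n + D[ny, nx]) / (n + 1)
                        sigma = ((sigma**2 * n + (D[ny, nx] - mu)**2) / (n + 1)) ** 0.5
                        n += 1
    return road
```

With a low-distance block surrounded by high-distance background, the grown region fills the block and stops at its boundary.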
3. Experimental Results and Discussion
Three experiments were conducted to evaluate the proposed method. Firstly, vanishing point detection was tested on an image dataset for unstructured pedestrian lane detection and vanishing point estimation (PLVP) [1]. This dataset consists of 2000 images of unstructured lanes under various environmental conditions. Another more challenging image dataset from Ref. [3] (referred to as the Challenge dataset in the following sections) was also used for more intensive tests. The Challenge dataset contains 1003 images in total, including 430 images taken along a Grand Challenge route in the Southern California desert. All images were resized to 240 × 180, and all algorithms were run on a standard personal laptop (Intel i5-3230 CPU) without optimization or GPU acceleration. Then, the traversable region segmentation was evaluated on the PLVP dataset as well as the KITTI road benchmark [22] to demonstrate the performance of the proposed method in different unstructured scenarios. Lastly, to show the effectiveness of the method for real-time robotics applications, the whole proposed framework was implemented on a Summit XL mobile robot platform with a binocular sensor (7 cm baseline) in an unstructured campus environment.
3.1. Vanishing Point Detection
To evaluate the performance of the vanishing point estimation algorithm, we compare the proposed method with two related methods. The first is the Gabor-based method presented by Kong et al. [3], in which the vanishing point is estimated directly by the voting of all pixels based on their local orientations, computed using Gabor filters in 36 directions. The MATLAB source code provided by the authors of Ref. [3] was used for comparison. The second is the Hough-based method proposed by Wang et al. [23], in which the Hough transform is first used to extract line segments on the edge map, and the vanishing point is then detected by voting among the intersections of line pairs. We implemented this method in C++ since the source code is not publicly available.
To quantitatively assess the vanishing point estimation, the estimation error is defined as follows [4],
$${\delta}_{\mathrm{vp}}=\left\Vert {p}_{d}-{p}_{g}\right\Vert /L,$$
where ${p}_{d}$ and ${p}_{g}$ are the detected vanishing point and the ground truth respectively, and $L$ is the diagonal length of the image.
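This metric is a one-liner; here is a minimal sketch for completeness:

```python
import numpy as np

def vp_error(p_d, p_g, img_shape):
    """Normalised error: Euclidean distance between the detected and
    ground-truth vanishing points divided by the image diagonal L."""
    h, w = img_shape[:2]
    return np.linalg.norm(np.subtract(p_d, p_g)) / np.hypot(w, h)
```

A detection at the opposite image corner from the ground truth gives an error of exactly 1.0.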
Some examples of vanishing point estimation with the different methods are shown in Figure 5. Table 1 reports the vanishing point estimation performance of the different methods on both the PLVP and Challenge datasets, in terms of accuracy and runtime.
According to Figure 5, the Gabor-based method is easily affected by cluttered pixels in images with a complex background (e.g., Figure 5b,e,k,m) because all pixels in a half-disk region are directly used to vote for the vanishing point candidates. In contrast, the proposed method uses only the dominant pixels that contribute most to the voting of vanishing point candidates and is thus more robust to cluttered noisy pixels. Consequently, the proposed method performs better than the Gabor-based method on the PLVP dataset. However, only a few line segment candidates are utilized to estimate the vanishing point with a simple voting scheme in the proposed method, and too few candidates affect the estimation accuracy, especially in very challenging scenarios (e.g., the desert regions in Figure 5q–x). Conversely, the Gabor-based method outperforms the proposed one on the Challenge dataset because plenty of voting points are always available to it. According to Table 1, the average errors of the proposed method on the PLVP and Challenge datasets are 0.0734 ± 0.0858 and 0.1023 ± 0.1085, respectively, while those of the Gabor-based method are 0.0812 ± 0.1042 and 0.0909 ± 0.1010, respectively. We conclude that the two methods have similar accuracy. In addition, the Hough-based method easily fails in natural scenes containing noisy edges or many short line segments (e.g., Figure 5f,g,i,k,m,n,p) because the vanishing point is estimated simply from all the straight lines extracted from the image. Instead, the proposed method forms pixels into line segments based on their texture orientations and employs a strict rejection scheme to keep only a small number of valid line segment candidates.
As shown in Table 1, the average computation time of the proposed method is significantly shorter than that of the Gabor-based method. The average error of the Hough-based method is much higher than that of the proposed method, although it is about two times faster.
In summary, the proposed method can achieve good tradeoff between accuracy and time efficiency for real time implementation.
3.2. Traversable Region Segmentation
Two methods were used for comparison in road segmentation: the boundary-based method presented in Ref. [3] and the pixel-based method presented in Ref. [10].
To quantitatively evaluate the road segmentation accuracy, we employ an approach similar to that of Ref. [3]. Suppose that ${A}_{d}$ is the segmented road area and ${A}_{g}$ is the binarized ground truth. The matching score is calculated as,
$$\eta ({A}_{d},{A}_{g})=\frac{\left|{A}_{d}\cap {A}_{g}\right|}{\left|{A}_{d}\cup {A}_{g}\right|},$$
where traversable areas in ${A}_{d}$ and ${A}_{g}$ are set to 1 while non-road areas are set to 0. The matching score $\eta$ reaches its maximum value of 1 only when the detected road area completely coincides with the ground truth. To show the road segmentation performance on the dataset, we vary the matching score from 0 to 1 and compute the rate of correctly segmented images (Figure 6). Six examples of road segmentation with the different methods are shown in Figure 7.
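The matching score is an intersection-over-union of the two binary masks; a minimal sketch:

```python
import numpy as np

def matching_score(A_d, A_g):
    """IoU-style matching score eta between the detected road mask A_d
    and the binarized ground-truth mask A_g (True = traversable)."""
    inter = np.logical_and(A_d, A_g).sum()
    union = np.logical_or(A_d, A_g).sum()
    return inter / union if union else 1.0
```

Identical masks score 1.0, disjoint masks score 0.0, and partial overlaps fall in between.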
According to Figure 6, the proposed method outperforms the other two methods. In the pixel-based method [10], a training region at the bottom of the image is selected to construct GMMs for the road appearance. However, this training region may contain non-road pixels and cannot always represent the true road area. For instance, the segmentation of the image in the sixth row of Figure 7 is unsatisfactory because the sample region included a small portion of non-road pixels. Moreover, it is difficult to determine an appropriate number of Gaussian models and the segmentation threshold (e.g., the image in the fourth row of Figure 7 is over-segmented). In contrast, the proposed method tends to select an optimal sample region based on the vanishing point and dominant borders. In addition, a multivariate Gaussian model combining complementary features and an adaptive threshold are used in the proposed method to robustly segment the road from the background.
As for the boundary-based method [3], the segmentation depends on the detection of the dominant borders, and inaccurate dominant borders lead to poor road segmentation. For instance, the road segmentation of the image in the first row of Figure 7 includes much of the non-road region because of false borders. Furthermore, this method classifies all pixels located between the two straight-line borders as road pixels and is thus unsuitable for most curved roads (e.g., the segmentations of the curved roads in the second, fourth, fifth and sixth rows of Figure 7 either include some non-road pixels or exclude some road pixels). In comparison, the proposed method also utilizes the dominant borders but does not strongly depend on them. If there is a large tree or building on the left and near the camera, the edges of the tree trunk or building might form a line segment and become a candidate used for voting. Such noisy candidates can be rejected to some degree by the strict criteria (Equations (5), (7) and (8)). Moreover, as can be seen in Figure 8, even when the vanishing point estimation or the dominant border detection is not good enough, a seed pixel of the road can still be correctly found, and the correct road sample region surrounding the seed can then be used to build the appearance model. Thus, the subsequent segmentation is not influenced by such inaccurate vanishing points or borders.
The KITTI road benchmark consists of urban unmarked, marked, and multiple marked lanes. Because this paper mainly focuses on unmarked traversable region detection, the unmarked lane subset of KITTI was used to further evaluate the performance. According to Figure 9, the proposed method has achieved good performance on this public dataset compared with the other two methods.
The average processing time of the proposed method for traversable region detection (including vanishing point estimation and segmentation) was 28.16 ms, which is significantly faster than the pixel-based method [10] (2.93 s) and the boundary-based method [3] (63.56 s). In other words, our algorithm runs in real time at over 30 fps on a standard CPU, and its efficiency could be further improved by parallel computing.
3.3. Real-Time Implementation for Robot Navigation
The proposed framework has been implemented on a real mobile robot in campus environments with unstructured pedestrian lanes (Figure 10). The robot was only allowed to travel on normal pedestrian lanes, as a human would. In our experiments, the lanes were 1.5–2 m wide. The robot must accurately localize itself and simultaneously build a traversability map of the environment.
The depth of the traversable region can be recovered from the binocular images by stereo matching; we adopt the fast stereo matching algorithm suitable for embedded real-time systems described in [24]. Once the disparity $d$ for a pixel $p(x,y)$ of the traversable region is found by stereo matching, the 3D point in the sensor coordinates can be calculated with

$$\left(\begin{array}{c}{x}_{c}\\ {y}_{c}\\ {z}_{c}\end{array}\right)={K}^{-1}\left(\begin{array}{c}x\cdot {z}_{c}\\ y\cdot {z}_{c}\\ {z}_{c}\end{array}\right),$$

where ${z}_{c}=b\cdot f/d$, $b$ is the baseline, $f$ is the focal length, $d$ is the disparity and $K$ is the calibration matrix. To obtain the 3D point $p({x}_{w},{y}_{w},{z}_{w})$ of the traversable region in world coordinates, a transformation needs to be performed according to the current pose of the sensor,

$$\left(\begin{array}{c}{x}_{w}\\ {y}_{w}\\ {z}_{w}\\ 1\end{array}\right)={T}^{-1}\left(\begin{array}{c}{x}_{c}\\ {y}_{c}\\ {z}_{c}\\ 1\end{array}\right),$$

where $T=\left[\begin{array}{cc}R& t\\ 0& 1\end{array}\right]$ is the homogeneous transformation of the current robot pose, with $R$ the rotation matrix and $t$ the translation vector. In this experiment, ORB-SLAM [25] is chosen to estimate the pose and build the map.
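The back-projection above has a simple closed form when $K$ is an ideal pinhole calibration matrix with focal length $f$ and principal point $(c_x, c_y)$. The sketch below illustrates it; the calibration values used in the test are placeholders, not the paper's actual sensor parameters.

```cpp
#include <cassert>
#include <cmath>

// 3D point in camera (sensor) coordinates.
struct Point3 { double x, y, z; };

// Back-project a traversable-region pixel (x, y) with disparity d into
// camera coordinates, assuming K = [f 0 cx; 0 f cy; 0 0 1].
// Applying K^{-1} to (x*z_c, y*z_c, z_c)^T reduces to the closed form below.
Point3 backProject(double x, double y, double d,
                   double f, double cx, double cy, double b) {
    assert(d > 0.0);                 // zero disparity would mean infinite depth
    double zc = b * f / d;           // depth from stereo: z_c = b * f / d
    double xc = (x - cx) * zc / f;   // first row of K^{-1} * (x*z_c, y*z_c, z_c)
    double yc = (y - cy) * zc / f;   // second row
    return {xc, yc, zc};
}
```

A pixel at the principal point maps onto the optical axis ($x_c = y_c = 0$), which is a quick sanity check for the calibration values.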
To avoid producing too many 3D points of the traversable region and to save memory, a voxel filter [26] is used to downsample the point cloud. Figure 11a shows the result of a whole loop trajectory (~240 m) in the experiment. The grey point cloud represents feature points in the map, while the green points represent the resulting traversable region. The robot poses and the built map are overlaid on the Google map in Figure 11b. These experimental results show that the robot can build a traversable map of the unstructured environment in real-time for robot navigation. For instance, the two narrow footbridges (~1.5 m wide) on the real trajectory were accurately mapped, so the robot could correctly locate the bridges and cross them like a human being without falling into the river. Further analysis of the localization and mapping results, together with other related algorithms, is beyond the scope of this paper and will be presented in future work.
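The voxel filter replaces all points falling into the same cubic cell by their centroid, which bounds the density of the traversable-region cloud. The paper uses the PCL implementation [26]; the following is only a minimal stand-alone sketch of the idea, with a hypothetical 21-bit-per-axis key packing for the voxel index.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Pt { double x, y, z; };

// Downsample a cloud with a voxel grid of side `leaf` (in metres):
// all points in one voxel collapse to their centroid.
std::vector<Pt> voxelDownsample(const std::vector<Pt>& cloud, double leaf) {
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
    std::unordered_map<int64_t, Acc> grid;
    for (const Pt& p : cloud) {
        // Pack the 3D voxel index into one 64-bit key (21 bits per axis;
        // sufficient for moderate map extents, as in this sketch).
        int64_t ix = static_cast<int64_t>(std::floor(p.x / leaf));
        int64_t iy = static_cast<int64_t>(std::floor(p.y / leaf));
        int64_t iz = static_cast<int64_t>(std::floor(p.z / leaf));
        int64_t key = ((ix & 0x1FFFFF) << 42) | ((iy & 0x1FFFFF) << 21)
                      | (iz & 0x1FFFFF);
        Acc& a = grid[key];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Pt> out;
    out.reserve(grid.size());
    for (const auto& kv : grid)
        out.push_back({kv.second.x / kv.second.n,
                       kv.second.y / kv.second.n,
                       kv.second.z / kv.second.n});
    return out;
}
```

The leaf size trades map resolution against memory: a larger leaf keeps fewer centroids per square metre of traversable surface.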
4. Conclusions
This paper proposes a novel method for real-time detection of unstructured traversable regions from a single image in complex outdoor environments. The method utilizes the vanishing point, estimated by a new fast voting scheme, to adaptively determine a road sample region of the input image. A self-supervised segmentation approach based on a multivariate Gaussian model built from the sample region is used to classify road and background rapidly and robustly. Experimental results on public datasets have shown that the proposed method is able to detect various unstructured roads in real-time. Furthermore, implementation on a real mobile robot in challenging environments has shown the effectiveness of the proposed framework for robot navigation, where the traversable region detection could be performed at a frame rate of 30 fps. Future work will focus on combining the proposed traversable region detection method with new localization and mapping algorithms to facilitate robot navigation in more challenging large-scale environments.
Supplementary Materials
The following are available online at https://www.mdpi.com/1424-8220/17/9/2101/s1, Video S1: traversable region detection.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant No. 91648012.
Author Contributions
Fucheng Deng proposed the algorithms, analyzed the data and wrote the paper; Xiaorui Zhu conceived and designed the experiments, and partly wrote the paper; Chao He performed the experiments and contributed to the robot localization algorithm.
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
 Phung, S.L.; Le, M.C.; Bouzerdoum, A. Pedestrian road detection in unstructured scenes for assistive navigation. Comput. Vis. Image Underst. 2016, 149, 186–196. [Google Scholar] [CrossRef]
 Cristóforis, P.D.; Nitsche, M.A.; Krajník, T.; Mejail, M. Realtime monocular imagebased path detection. J. Real Time Image Proc. 2016, 11, 335–348. [Google Scholar]
 Kong, H.; Audibert, J.; Ponce, J. General road detection from a single image. IEEE Trans. Image Process. 2010, 19, 2211–2220. [Google Scholar] [CrossRef] [PubMed]
 Moghadam, P.; Starzyk, J.A.; Wijesoma, W.S. Fast Vanishing-Point Detection in Unstructured Environments. IEEE Trans. Image Process. 2012, 21, 425–430. [Google Scholar] [CrossRef] [PubMed]
 Miksik, O. Rapid vanishing point estimation for general road detection. In Proceedings of the IEEE International Conference on Robotics and Automation, St. Paul, MN, USA, 14–18 May 2012. [Google Scholar]
 Tan, C.; Tsai, H.; Chang, T.; Shneier, M. Color modelbased realtime learning for road following. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Toronto, ON, Canada, 17–20 September 2006. [Google Scholar]
 Ramstrom, O.; Christensen, H. A method for following unmarked roads. In Proceedings of the IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA, 6–8 June 2005. [Google Scholar]
 Álvarez, J.M.; López, A.M. Road detection based on illuminant invariance. IEEE Trans. Intell. Transp. Syst. 2011, 12, 184–193. [Google Scholar] [CrossRef]
 Lu, X. Selfsupervised road detection from a single image. In Proceedings of the IEEE International Conference on Image Processing, Quebec City, QC, Canada, 27–30 September 2015. [Google Scholar]
 Miksik, O.; Petyovsky, P.; Zalud, L.; Jura, P. Robust detection of shady and highlighted roads for monocular camera based navigation of UGV. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011. [Google Scholar]
 Álvarez, J.M.; López, A.M.; Gevers, T.; Lumbreras, F. Combining priors, appearance, and context for road detection. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1168–1178. [Google Scholar] [CrossRef]
 Lee, H.; Kwak, K.; Jo, S. An incremental nonparametric Bayesian clusteringbased traversable region detection method. Auton. Robot 2017, 41, 795–810. [Google Scholar] [CrossRef]
 Xiao, L.; Dai, B.; Liu, D.X.; Zhao, D.W.; Wu, T. Monocular road detection using structured random forest. Int. J. Adv. Robot. Syst. 2016, 13, 1–13. [Google Scholar] [CrossRef]
 Alvarez, J.M.; Gevers, T.; LeCun, Y.; Lopez, A.M. Road scene segmentation from a single image. In Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012. [Google Scholar]
 Mohan, R. Deep deconvolutional networks for scene parsing. arXiv 2014, arXiv:1411.4101. [Google Scholar]
 Oliveira, G.L.; Burgard, W.; Brox, T. Efficient deep models for monocular road segmentation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016. [Google Scholar]
 Tai, L.; Liu, M. Deep-learning in Mobile Robotics—From Perception to Control Systems: A Survey on Why and Why not. arXiv 2017, arXiv:1612.07139. [Google Scholar]
 Ghurchian, R.; Hashino, S. Shadow compensation in color images for unstructured road segmentation. In Proceedings of the 9th IAPR Conference on Machine Vision Applications, MVA 2005, Tsukuba, Japan, 16–18 May 2005; pp. 598–601. [Google Scholar]
 Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution grayscale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
 Lee, T. Image representation using 2D Gabor wavelets. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 959–971. [Google Scholar]
 Von Gioi, R.G.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A fast line segment detector with a false detection control. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 722–732. [Google Scholar] [CrossRef] [PubMed]
 KITTI Dataset. Available online: http://www.cvlibs.net/datasets/kitti/eval_road.php (accessed on 31 August 2017).
 Wang, Y.; Teoh, E.K.; Shen, D.G. Road detection and tracking using B-Snake. Image Vis. Comput. 2004, 22, 269–280. [Google Scholar] [CrossRef]
 Humenberger, M.; Zinner, C.; Weber, M.; Kubinger, W.; Vincze, M. A fast stereo matching algorithm suitable for embedded realtime systems. Comput. Vis. Image Underst. 2010, 114, 1180–1202. [Google Scholar] [CrossRef]
 Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
 Point Cloud Library. Available online: http://pointclouds.org/ (accessed on 31 August 2017).
Figure 1.
Pipeline of the traversable region detection: (a) input image, (b) estimating texture orientations (represented by different colors) using Gabor filters, (c) grouping dominant pixels with the same orientation into line segment candidates (in green with a blue point), (d) estimating the vanishing point (red circle) by the voting of line segment candidates and locating the seed pixel (blue circle), (e) selecting a sample region surrounding the seed pixel and constructing an appearance model, (f) classifying the pixels into traversable/non-traversable areas.
Figure 2.
An example of grouping dominant pixels into line segment candidates. Left: image overlaid with texture orientations (22.5°: pink, 45°: red, 67.5°: cyan, 90°: yellow, 112.5°: green, 135°: deep blue, 157.5°: purple). Right: selected line segment candidates (arrowed green lines with blue points in the middle); the leftmost one is discarded.
Figure 3.
Sample region selection for constructing an appearance model. (a) Image with 17 evenly distributed imaginary rays (in white) originating from the vanishing point; (b) image with dominant borders (in red) and the sample region surrounding the seed (blue circle).
Figure 4.
Traversable region segmentation starting from the seed with region growing. The region between the left and right borders is Part I; the region above the horizontal line is Part III; and the rest of the image is Part II.
Figure 6.
Segmentation accuracy with different methods on the pedestrian lane detection and vanishing point estimation (PLVP) dataset.
Figure 8.
Examples of segmentation with the proposed method under conditions of inaccurate vanishing point and dominant borders. Row 1: input images, Row 2: vanishing point and dominant borders, Row 3: segmentation.
Figure 9.
Segmentation accuracy with different methods on the KITTI road benchmark (unmarked lane subset of KITTI).
Figure 10.
Real-time traversable region detection: (a) mobile robot platform with the binocular sensor; (b–d) frames #4624, #4666, #4720 (image frame rate: 30 fps).
Figure 11.
A complete loop trajectory of the robot with traversable regions: (a) blue: robot poses; green: traversable region; grey: feature points; (b) traversable map overlaid on the real map.
Method              | Average Error (PLVP Dataset) | Average Error (Challenge Dataset) | Time (s)
Gabor-based method  | 0.0812 ± 0.1042              | 0.0909 ± 0.1010                   | 11.712 ^{a}
Hough-based method  | 0.1463 ± 0.1353              | 0.2464 ± 0.1464                   | 0.009 ^{b}
Proposed method     | 0.0734 ± 0.0858              | 0.1023 ± 0.1085                   | 0.021 ^{c}
^{a} the code is in MATLAB; ^{b,c} the codes are in C++.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).