A Robust Laser Stripe Extraction Method for Structured-Light Vision Sensing

Environmental sensing is a key technology for the development of unmanned cars, drones and robots. Many vision sensors cannot work normally in an environment with insufficient light, and the cost of using multiline LiDAR is relatively high. In this paper, a novel and inexpensive visual navigation sensor based on structured-light vision is proposed for environment sensing. The main contributions are as follows. First, we propose a laser-stripe-detection neural network (LSDNN) that eliminates the interference of reflective noise and haze noise and extracts the laser stripe region with high robustness. Second, we use a gray-gravity approach to extract the center of the laser stripe and a structured-light model to reconstruct the point cloud of the laser center. Third, we design a single-line structured-light sensor, select its optimal parameters and build a car platform for experimental evaluation. The experimental results show that the method is accurate and robust in complex environments.


Introduction
With the development of computer vision and navigation technologies, UGVs (unmanned ground vehicles) and MAVs (micro aerial vehicles) have come to be widely used in underground inspection, military reconnaissance and device detection [1][2][3]. The global positioning system (GPS) is one of the most popular methods for robot navigation tasks. However, in some special circumstances, such as underground mines and under-lit indoor spaces, there is almost no GPS signal because the environment is enclosed, so satellites cannot be used to locate the robot. LiDAR scanning allows three-dimensional reconstruction of the surrounding environment, but building a multiline LiDAR system is too expensive for the given task.
Considering the above factors, visual sensors are widely used in UGVs and MAVs because of their portability and low cost. Visual sensors can be categorized into two types: active and passive. Passive visual sensors depend on the ambient light and fail if the features in the captured image are sparse. As a typical method of active vision, structured light has shown great advantages over other methods owing to its low cost, fast acquisition, simple system design and large visual field [4][5][6][7]. Over the past 40 years, many researchers have applied structured-light vision to different tasks. Izquierdo et al. presented a sub-pixel method to measure 3D surfaces based on a structured-light projector and a calibrated camera [8]. Xie et al. proposed a new approach to calibrate a structured-light sensor and applied it to measure the geometric size of certain objects [9]. Liu et al. achieved real-time and accurate measurement of rail profiles [10]. Fan et al. A structured-light sensor usually consists of two parts: a camera and a laser projector. The laser projector projects a certain pattern of laser stripes onto the objects, and the camera captures the image of the stripe modulated by the objects in front of it. By calibrating the camera beforehand to obtain its parameters, we can acquire the objects' surface information [12][13][14][15]. In structured-light vision inspection, 3D reconstruction and depth measurement can be categorized by the kind of laser used, such as point laser, line laser and grid laser. This study focuses on the application of single-line structured light.
Locating the laser stripe accurately is a key step in acquiring the object depth. However, as the laser beam usually has a width of several pixels in the image, we need to extract its center first. Many studies have aimed at high precision, practical efficiency and strong robustness in complicated environments [16]. The process can be divided into two steps, namely detection and extraction.
The first step is to detect the location of the laser stripe. To date, none of the proposed methods is perfect, and all are far from ready for complicated environments. Noise arises, and detection becomes difficult, because the intensity of the laser stripe captured by the camera is modulated by several factors: interreflections between different surfaces in the environment, saturation of the laser stripe, highly reflective materials such as polished metal, the incident angles on different surfaces, uneven surfaces, and discontinuities of the line caused by randomly placed objects [17]. In other cases, the laser scatters in haze, resulting in an irregularly shaped laser stripe in the image acquired by the camera, as shown in Figure 1. Some traditional methods detect the laser stripe region by using the RGB color space [18]. However, as white light also has an R component, the stripe cannot be distinguished simply by thresholding the R component. Moreover, interreflection between objects produces other red pixels. Hong Nam Ta proposed a novel method [19] to solve the problem of saturation. It takes advantage of the YCbCr color space and the laser's physical properties to enhance the laser signal and reduce the effects of white ambient light. It also automatically estimates the saturation of the laser light and adjusts the exposure by capturing a sequence of images with different exposures.
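The limitation of R-channel thresholding can be illustrated with a small sketch. It is not taken from the cited works: the full-range BT.601 (JPEG-style) conversion and the threshold value `R_THRESHOLD` are assumptions chosen only for illustration. A white pixel and a saturated red pixel both pass the R threshold, while their Cr values differ clearly.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) RGB -> YCbCr for 8-bit inputs, unrounded."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


R_THRESHOLD = 200  # hypothetical threshold, for illustration only

white = (255, 255, 255)  # white ambient light
red = (255, 0, 0)        # idealized red laser pixel

# Both pixels pass a threshold on the R component alone.
white_passes_r = white[0] > R_THRESHOLD
red_passes_r = red[0] > R_THRESHOLD

# In YCbCr the red-difference channel separates them.
_, _, cr_white = rgb_to_ycbcr(*white)
_, _, cr_red = rgb_to_ycbcr(*red)

print(white_passes_r, red_passes_r)
print(cr_white, cr_red)
```

Real laser pixels are rarely pure red and are often saturated toward white, which is why the cited method also adjusts exposure; the sketch only shows why the R component alone is ambiguous.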
Some works simply use an empirical threshold for binarization, and the results are far from satisfactory. Sun, Q.C. et al. proposed a method that first uses the Sobel operator to detect the edge points of the laser stripe [20]. It works only in ideal environments because the Sobel operator cannot distinguish the laser stripe from noise. Jia Du and Wei Xiong introduced a different approach. They first propose a ridge segment detector (RSD), inspired by LSD, to extract the potential laser regions and then rank these regions to find the most probable one [21]. This method is more robust than relying on color information alone, but still lacks reliability when dealing with specular reflection areas.
Chmelar et al. [22] introduced a novel method of laser line detection using a well-chosen Gaussian mixture model (GMM). GMM is a machine learning method that is trained on a dataset in which pixels are given labels. This method can handle variations in laser intensity across the image and reduce the influence of saturation. However, because GMM is based on per-pixel probability, it ignores the connections between neighboring pixels and uses only the information of each pixel itself.
According to the above discussion, these existing methods of extracting the laser stripe center line have some nonnegligible limitations. In recent years, with the rapid development of deep learning, it is common to use deep learning methods to complete advanced visual tasks [23,24]. Krizhevsky et al. [25] proposed AlexNet which is an eight-layer-deep convolutional neural network to solve the problem of image classification, and won the first place in the ILSVRC 2012 competition. AlexNet proved that deep convolutional networks can extract more advanced and effective semantic features in images than traditional methods. Fully convolutional network (FCN) which is a state-of-the-art framework to the semantic segmentation is proposed by Long et al. [26]. Olaf Ronneberger [27] proposed U-net which is an end-to-end semantic segmentation convolutional network in electron microscopic stacks. They won the ISBI cell tracking challenge 2015 in some categories. Kaiming He [28] introduced Mask-R-CNN for instance segmentation. The network first detects the location of the target and then sorts the pixels in the box of target. Vijay Badrinarayanan [29] proposed SegNet which consists of an encoder network and a decoder network. SegNet achieves semantic pixelwise segmentation and encoder network of SegNet extracts rich features. The decoder network's mission is to map the low-resolution encoder feature maps to full input resolution feature maps for pixelwise classification. Deeplabv3+ is also an encoder-decoder neural network proposed by Liang-Chieh Chen [30]. Deeplabv3+ used ResNet [31] as encoder network to extract features and designed a simple and efficient decoder to restore object boundaries.
Some researchers focus on applying deep learning methods to structured-light vision. Li et al. proposed a novel method combining a convolutional neural network with structured-light measurement [32]. They use deep learning to achieve stereo matching in occluded environments and can calculate depth more accurately than traditional methods. Similarly, Du et al. designed SLNet to extract and match features more effectively [33]. This method can also realize real-time depth acquisition. Tao et al. set up a system to measure box volume based on line structured light and deep learning [34]. They proposed the IHED network to extract edges in the captured image. This method can extract straight lines from an image efficiently but cannot distinguish the laser stripe from other edges.
Although deep learning has achieved important breakthroughs in semantic segmentation of complex images, few studies have attempted to locate the laser stripe, because no large public dataset is available that is adequate to train a deep convolutional neural network well. Moreover, the laser stripe is relatively slender, and the boundary between the noise region and the laser stripe region is not easy to distinguish. Inspired by DeepLab [30], we propose a novel network to realize highly robust laser stripe region positioning and noise filtering.
The 3D measurement coordinates of real scene are obtained from the image coordinates of the laser stripe's center according to the measurement model of the structured-light sensor that are described in Section 2. The measurement accuracy of the sensor is highly dependent on the detection accuracy of the Sensors 2020, 20, 4544 4 of 18 light stripe. Moreover, the various interference in complicated environment, such as the pseudo-light and the haze, will severely influence the location and detection of the real laser stripe. Therefore, in practical applications, it is very important and necessary to extract laser stripe center with high robustness and reconstruct 3D point clouds of the stripe position.
In this study, our contributions can be summarized in three aspects: (1) A laser stripe region segmentation framework based on a semantic segmentation network is proposed, which eliminates the interference of reflective noise and haze noise and realizes highly robust extraction of the laser stripe region for the first time; (2) A dataset representing different kinds of noise in sophisticated environments is set up, and a new strategy for labeling images with laser stripes is proposed; (3) A structured-light vision sensor with a single-line stripe is designed, its optimal parameters are selected and a car platform is built for experimental evaluation.
The rest of this paper is organized as follows: Section 2 introduces the measurement model of the structured-light sensor. We also design a structured-light sensor, optimize its parameters and complete the calibration process. Section 3 presents the details of our laser-stripe-detection neural network and the detection and extraction process in complicated environments. In Section 4, we design and compare different neural network structures, conduct the performance evaluation and demonstrate the robustness and availability of our method based on the experimental results. Section 5 concludes our work.

Measurement Model and Design of Structured-Light Sensor
We build a structured-light sensor for robot navigation in the dark and narrow environment at low cost. The hardware part is composed of a monocular camera and a line structured light projector placed next to it and the software part uses the processor to process the raw image to get the point clouds at the position of the light bar, thereby obtaining the information of the environment.
The measurement model of the structured-light sensor is shown in Figure 2a. $o_c - x_c y_c z_c$ is the 3D camera coordinate system, $o_n - x_n y_n$ is the normalized image coordinate system and $o_u - x_u y_u$ is the undistorted image coordinate system. $\pi_n$ is the normalized image plane, $\pi_u$ is the undistorted image plane and $\pi_s$ is the light plane projected by the laser projector. We set $o_c x_c \parallel o_u x_u \parallel o_n x_n$, $o_c y_c \parallel o_u y_u \parallel o_n y_n$, $o_c z_c \perp \pi_u$ and $\pi_u \parallel \pi_n$. We assume that $P$ is an arbitrary point in 3D space. The intersection of the ray $o_c P$ and the normalized image plane is $P_n$, which is the corresponding perspective projection point in $\pi_n$. Similarly, $P_u$ is the ideal projection point in the undistorted image plane. $P_d$ is the real projection point of $P$ in the normalized plane. The deviation between $P_n$ and $P_d$ is caused by the camera distortion.
We denote the camera coordinates of $P$ as $X_c = [x_c, y_c, z_c]^T$ and its coordinates in the normalized image coordinate system as $X_n = [x_n, y_n]^T$. The ideal coordinates of $P$ in the image plane are denoted as $X_u = [x_u, y_u]^T$. Then the transformation from $o_c - x_c y_c z_c$ to $o_n - x_n y_n$ can be expressed as:

$$X_n = \begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} x_c / z_c \\ y_c / z_c \end{bmatrix} \quad (1)$$

We define the focal lengths in the x and y directions as $f_x$ and $f_y$, respectively. The coordinate of the principal point is $(u_0, v_0)$. Then the intrinsic parameter matrix $A$ of the camera can be expressed as:

$$A = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (2)$$

According to the pinhole model of the camera, the transformation from $o_n - x_n y_n$ to $o_u - x_u y_u$ can be expressed as:

$$\lambda \tilde{X}_u = A \tilde{X}_n \quad (3)$$

where $\lambda$ is the scaling factor and $\tilde{X}_n$ and $\tilde{X}_u$ are the homogeneous coordinates of $X_n$ and $X_u$, respectively.

The camera we use is not as ideal as the pinhole model. Unavoidable distortion exists, which diminishes the quality of the captured image. In this paper, we take the radial distortion and the tangential distortion into account, considering the first three terms of the radial distortion and the first two terms of the tangential distortion. The relationship between $P_d$ and $P_u$ is described by this distortion model, in which $k_1$, $k_2$ and $k_3$ are the coefficients of the lens' radial distortion and $p_1$, $p_2$ are the coefficients of the lens' tangential distortion. In addition, the coordinates of $P$ in the camera system satisfy the laser plane's equation:

$$a x_c + b y_c + c z_c + d = 0 \quad (5)$$

where $a$, $b$, $c$, $d$ are the coefficients of the laser plane's equation. From the above formulas, we can calculate the 3D camera coordinates of the target point independently of the structure parameters of the sensor, such as the base distance and tilt angle. Therefore, the model can achieve higher accuracy and is more applicable in different environments.

Figure 2b shows the structure design of our sensor. $b$ is the base distance of the sensor. The angle between the laser plane $\pi_s$ and the normalized image plane $\pi_n$ is $\alpha$. The coordinate systems are the same as in Figure 2a.
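The measurement model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The distortion function uses the widely used radial/tangential (Brown-Conrady) form with three radial and two tangential coefficients, which is an assumption because the source's distortion equation is not reproduced here. The fixed-point undistortion, all function names and all numeric values are likewise hypothetical.

```python
def distort(x_n, y_n, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to a normalized point."""
    r2 = x_n * x_n + y_n * y_n
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x_n * radial + 2.0 * p1 * x_n * y_n + p2 * (r2 + 2.0 * x_n * x_n)
    y_d = y_n * radial + p1 * (r2 + 2.0 * y_n * y_n) + 2.0 * p2 * x_n * y_n
    return x_d, y_d


def undistort(x_d, y_d, coeffs, iterations=20):
    """Invert distort() by fixed-point iteration (adequate for mild distortion)."""
    x_n, y_n = x_d, y_d
    for _ in range(iterations):
        x_est, y_est = distort(x_n, y_n, *coeffs)
        x_n += x_d - x_est
        y_n += y_d - y_est
    return x_n, y_n


def pixel_to_camera_point(u, v, intrinsics, coeffs, plane):
    """Intersect the camera ray through pixel (u, v) with the plane a*x + b*y + c*z + d = 0."""
    fx, fy, u0, v0 = intrinsics
    a, b, c, d = plane
    x_n, y_n = undistort((u - u0) / fx, (v - v0) / fy, coeffs)
    denom = a * x_n + b * y_n + c
    if abs(denom) < 1e-12:
        raise ValueError("camera ray is parallel to the laser plane")
    z_c = -d / denom
    return x_n * z_c, y_n * z_c, z_c


# Hypothetical calibration values, chosen only to exercise the functions.
intrinsics = (1000.0, 1000.0, 960.0, 540.0)   # fx, fy, u0, v0
coeffs = (-0.1, 0.01, 0.0, 0.001, -0.0005)    # k1, k2, k3, p1, p2
plane = (0.0, 1.0, 1.0, -420.0)               # a, b, c, d
point = (50.0, 20.0, 400.0)                   # a point lying on the plane

# Forward-project the point to a distorted pixel, then recover it.
x_d, y_d = distort(point[0] / point[2], point[1] / point[2], *coeffs)
u = intrinsics[0] * x_d + intrinsics[2]
v = intrinsics[1] * y_d + intrinsics[3]
recovered = pixel_to_camera_point(u, v, intrinsics, coeffs, plane)
print(recovered)
```

The reconstruction needs only the intrinsic parameters, the distortion coefficients and the plane coefficients, which matches the remark that the result does not depend on the base distance or the tilt angle.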
The z coordinate of the line where the light plane intersects the ground is the maximum measurement depth $Z_{max}$, and $x$, $y$ are the x and y coordinates of point $p_n$. We assume that the height from the camera's optical center to the ground is $h_c$ and that the pixel errors of $x$ and $y$ are $\Delta_x$ and $\Delta_y$. $\Delta$ denotes the overall coordinate error of the target point $P$. Taking $Z = Z_{max}$ and $y = y_{max}$, we can calculate the coordinate error of the target point $P$, which is shown in Equation (6).
Through the analysis of the calculation formula of $\Delta$, it can be concluded that the error decreases as the baseline distance increases. According to this conclusion and combined with the actual situation, we finally choose the value of $b$ and optimize the Equation (4) to get the optimal parameters of our sensor.
The result shows that when the baseline distance $b$ is 50 mm and the tilt angle $\alpha$ is 70°, the coordinate error of our sensor is minimized without greatly increasing its volume.
The external interface of the sensor is the USB interface of the camera. The sensor is mounted on the car. The image of the light stripe is captured and processed, and the relative position of the UGV and the surrounding environment is obtained, thereby realizing the UGV obstacle avoidance and navigation. The details are in Section 4.
After designing the sensor, we use a dot target to calibrate it and calculate the intrinsic parameter matrix of the camera and the equation of the structured-light plane in the camera coordinate system. We also obtain the distortion coefficients. The quantitative results are shown in Table 1.

Table 1. Parameters of the sensor. Columns: Hardware, Parameters, Calibration Result, Physical Meaning. First row: Monocular camera.

Architecture of System and Laser-Stripe-Detection Neural Network
The overall working process of our system is as follows. First, the structured-light projector projects the structured light into the environment and the monocular camera captures the image with the light stripe. Second, the region of the laser stripe is detected by the neural network, and the pixels at the center of the light stripe are then extracted by the gray-gravity approach from the output image of the neural network. Finally, we use the mathematical model of structured-light measurement to reconstruct the point cloud at the light bar and thus perceive the three-dimensional environment. Figure 3 shows the schematic diagram of our system and Figure 3a shows the process of image segmentation and 3D point cloud reconstruction. The neural network is described in detail in Section 3.3.
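The gray-gravity step in this pipeline can be sketched as an intensity-weighted centroid computed per image column. This is a minimal illustration under two assumptions: the stripe runs roughly horizontally, so one center is computed per column, and the network output is available as a binary mask. The function name and the toy data are hypothetical.

```python
def gray_gravity_centers(gray, mask):
    """Return {column: sub-pixel row} for every column that contains stripe pixels.

    gray: 2D list of gray levels; mask: 2D list of 0/1 stripe labels, same shape.
    """
    height, width = len(gray), len(gray[0])
    centers = {}
    for u in range(width):
        weight_sum = 0.0
        moment = 0.0
        for v in range(height):
            if mask[v][u]:
                weight_sum += gray[v][u]
                moment += v * gray[v][u]
        if weight_sum > 0:
            centers[u] = moment / weight_sum  # intensity-weighted mean row
    return centers


# Toy 5 x 3 image: column 2 holds a bright reflection that the mask excludes.
gray = [
    [0, 0, 255],
    [50, 100, 0],
    [200, 100, 0],
    [50, 0, 0],
    [0, 0, 0],
]
mask = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
centers = gray_gravity_centers(gray, mask)
print(centers)
```

Restricting the centroid to the segmented region is what keeps reflections such as the one in column 2 from pulling the center away from the real stripe.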

Image Labeling
Our structured-light sensor projects the line laser into the environment to form a light stripe. Because the environment contains smooth surfaces, such as marble floors and highly reflective metals, a large number of "pseudo-light stripes" are formed. These "pseudo-light stripes" have morphologic features similar to those of the real one. Therefore, morphologic modeling cannot be directly applied to extract the stripe. Another kind of noise results from the scattering of light, usually when there is haze in the environment. This kind of noise often floods the stripe, causing some obvious morphologic features of the stripe to disappear, so that traditional methods fail to detect the accurate region of the laser stripe.
In this paper, a convolutional network is applied to classify the pixels in the image. Each pixel is assigned to a category, so that the pixels belonging to the laser stripe area are distinguished from the pixels belonging to the background area. After the segmentation process, the Steger algorithm and the gray-gravity method are each used to further extract the center of the stripe.
Since the pseudo-light stripe formed by reflection joins the real light stripe, labeling only the true light stripe cannot successfully achieve the segmentation task. Therefore, the real light stripe and different forms of noise are marked as different categories. Figure 4 shows the schematic diagram of image labeling. The real laser stripe part is marked red (first type), the reflective part is marked green (second type), the background is black (third type), the ambient light is yellow (fourth type), and the foggy part is marked blue (fifth type). Images are labeled according to the format of the VOC dataset [35].
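The five-category labeling scheme can be sketched as a color-to-class lookup. The exact RGB values and the 0-based class indices below are assumptions; the source names only the colors and their order. The helper names are hypothetical.

```python
# Assumed pure-color palette; the source states only the color names.
CLASS_COLORS = {
    (255, 0, 0): 0,    # real laser stripe (red, first type)
    (0, 255, 0): 1,    # reflection (green, second type)
    (0, 0, 0): 2,      # background (black, third type)
    (255, 255, 0): 3,  # ambient light (yellow, fourth type)
    (0, 0, 255): 4,    # fog / haze (blue, fifth type)
}
STRIPE_CLASS = 0


def label_image_to_classes(label_rgb):
    """Convert a 2D grid of RGB label pixels into a 2D grid of class indices."""
    return [[CLASS_COLORS[tuple(px)] for px in row] for row in label_rgb]


def stripe_mask(class_map):
    """Keep only the real laser stripe; every noise category becomes 0."""
    return [[1 if c == STRIPE_CLASS else 0 for c in row] for row in class_map]


label_rgb = [
    [(0, 0, 0), (255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 0, 0), (255, 255, 0)],
]
class_map = label_image_to_classes(label_rgb)
mask = stripe_mask(class_map)
print(class_map)
print(mask)
```

Giving the noise its own categories during training and then discarding them afterwards is what lets the network separate a reflection from the stripe it touches.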


Structure of Laser-Stripe-Detection Neural Network
The laser-stripe-detection neural network (LSDNN) is a semantic segmentation convolutional neural network that extracts the region of the laser stripe. The specific process is as follows. The image captured by the camera (1920 × 1080 pixels) is used as input. ResNet serves as the encoder to extract rich semantic features, and multiscale dilated convolution serves as the decoder that outputs the segmented result.

To determine whether a pixel is in the target region, a combination of large-scale, small-scale and global features is needed. Some traditional methods use multiscale convolution to refine the features [36]. This improves the accuracy of the network, but it also increases the complexity and the training time. Moreover, when the target object has some specific features, such a structure may not improve network performance.
In this paper, we build a single-line structured-light sensor. Because the laser stripe extends far horizontally in the image while its width is relatively small, after extracting the feature map with the backbone we only need a large-scale convolution and a small-scale convolution to extract the features. To find the best combination of the number and size of the convolution layers, we conduct an experiment testing different parameters. Figure 5a shows a state-of-the-art structure of the pooling module in segmentation [30]. It uses multiscale atrous convolution as the pooling module to extract higher-level features. We design and compare different structures of the pooling module. The quantitative results are shown in Section 4. The best structure we select for laser stripe detection is shown in Figure 5b. It has two dilated convolution layers and one global fusion module for pooling. Pooling-module-layer 1 contains a dilated convolution layer with a dilation size of 3, which extracts detailed information. Pooling-module-layer 2 contains a dilated convolution layer with a dilation size of 18, which extracts large-scale information. The global fusion module employs global average pooling to capture global context and computes an attention vector to guide the feature learning. This module refines the output feature of each stage and provides rich global spatial information that is useful for laser stripe segmentation.
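The difference between the two dilation sizes can be made concrete with the standard formula for the span of a dilated kernel, k_eff = k + (k - 1)(d - 1). The 3 × 3 kernel size used below is an assumption, since the source states only the dilation sizes; the function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in feature-map pixels, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def tap_offsets(kernel_size, dilation):
    """Offsets of the kernel taps relative to the center pixel along one axis."""
    half = kernel_size // 2
    return [dilation * (i - half) for i in range(kernel_size)]


KERNEL_SIZE = 3  # assumed; not stated in the source

small_scale = effective_kernel_size(KERNEL_SIZE, 3)   # pooling-module-layer 1
large_scale = effective_kernel_size(KERNEL_SIZE, 18)  # pooling-module-layer 2

print(small_scale, tap_offsets(KERNEL_SIZE, 3))
print(large_scale, tap_offsets(KERNEL_SIZE, 18))
```

Under this assumption the first layer spans 7 feature-map pixels, which suits the narrow width of the stripe, while the second spans 37 pixels with the same number of weights, which suits its long horizontal extent.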
Then the feature-fusion module fuses low-level features and high-level semantic features. We define the features extracted by ResNet's first stage as low-level features and the features extracted by multiscale pooling as high-level semantic features. The input of the feature-fusion module is the combination of low-level features and high-level semantic features. In this module, we balance the scales of the features by batch normalization, pool the concatenated feature into a feature vector and compute a weight vector. This weight vector re-weights the features, which amounts to feature selection and combination, and the result obtained with this module is much better than the result without it.
Finally, the feature is decoded by a 3 × 3 convolutional layer and upsampling with bilinear interpolation, and a convolution with a 1 × 1 kernel serves as the decoder layer that outputs the segmented result. The detailed architecture of LSDNN is given in Table 2.

Table 2. Columns: Layer Name, Output Size, Architecture. First row: Input, 513 × 513 × 3, /.

The red region in the segmentation result is the region where the light bar is located, so extracting the red region yields the light bar region in the original image and filters out the interference of reflective noise. The next section shows how to extract the center of the stripe from the segmented stripe region.

Training Process
We denote our training dataset as $X = \{x_i \mid i = 1, 2, \ldots, N\}$ and $Y = \{y_i \mid i = 1, 2, \ldots, N\}$. Set X contains all laser stripe images captured in complex environments, and set Y contains the corresponding label images. As the LSDNN we propose is an end-to-end network, we use the images in set X as the input of the network and the ground-truth images in set Y as the target output. This process can be expressed as:
During the training process, the parameters of our laser-stripe-detection neural network are updated continuously. Each layer has its own weight parameters, and the fusion module fuses them together. The ultimate goal of training is to minimize the value of the cost function, which is
Here, $y^{(i)}$ represents the i-th ground-truth image and $\hat{y}^{(i)}$ represents the i-th predicted image based on $x_i$.
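As an illustration of such a cost function, the sketch below computes a pixel-wise cross-entropy, the loss that is finally selected in the experiments. It is a minimal NumPy sketch, not the authors' implementation: the function name, the (N, H, W, C) array layout and the softmax over class scores are assumptions.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels of a batch.

    logits: float array (N, H, W, C) of raw class scores (assumed layout).
    labels: int array (N, H, W) with the ground-truth class of each pixel.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Select the log-probability of the true class at every pixel.
    true_log_probs = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-true_log_probs.mean())
```

With uniform scores over two classes the loss equals ln 2, and it approaches zero as the score of the correct class dominates at every pixel.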

Evaluation Method
IoU (intersection over union) is a general evaluation index for semantic segmentation tasks. It is the ratio of the intersection of two sets to their union.
When the task includes more than one class of object, mIoU carries more information because it is the mean value of IoU over the classes. In our task, as different categories in the image often overlap, we need to consider the overall segmentation precision rather than only the laser stripe region. fwIoU (frequency-weighted intersection over union) is another indicator, which uses the frequency of occurrence of each category as its weight. The mathematical expressions of mIoU and fwIoU are as follows, where k is the number of object categories and $p_{ij}$ is the number of pixels whose ground truth is class i but whose predicted class is j.
mIoU is regarded as one of the most important indicators in segmentation tasks. Besides mIoU and fwIoU, we also use Acc (pixel accuracy) and Acc class (per-class pixel accuracy) as assessment criteria in our experiments. Acc is the percentage of correctly classified pixels, and Acc class is the mean of the per-category accuracies. The mathematical expressions of Acc and Acc class are as follows:
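These four indicators are commonly computed from a confusion matrix whose entry (i, j) is the pixel count $p_{ij}$. The sketch below follows the standard definitions; the function name and the convention that rows are ground truth and columns are predictions are assumptions made for illustration, and it assumes every class occurs at least once so that no denominator is zero.

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute Acc, Acc class, mIoU and fwIoU from a confusion matrix.

    conf[i, j] is the number of pixels with ground truth i predicted as j.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)        # p_ii: correctly classified pixels per class
    gt = conf.sum(axis=1)     # pixels whose ground truth is class i
    pred = conf.sum(axis=0)   # pixels predicted as class i
    iou = tp / (gt + pred - tp)
    freq = gt / conf.sum()    # frequency of each class in the ground truth
    return {
        "Acc": float(tp.sum() / conf.sum()),
        "Acc_class": float(np.mean(tp / gt)),
        "mIoU": float(np.mean(iou)),
        "fwIoU": float(np.sum(freq * iou)),
    }
```

Because fwIoU weights each class by its pixel frequency, a thin laser stripe contributes little to it, whereas mIoU gives the stripe class the same weight as the background.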

Post Processing Algorithm
The output of the convolutional neural network is a three-channel RGB image in which objects with different labels have different colors. When building the training set, the label of the laser stripe is assigned a specific color. We therefore only need to traverse all pixels of the output image and mark those whose R, G and B channel values match that color to extract the position of the light bar accurately. Unrelated noise is filtered out in the same way. Because the size and type of the output image are exactly the same as those of the original image, we can keep the stripe area found by this traversal and remove everything else to obtain an image containing only the laser stripe.
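The traversal described above can be written as a vectorized mask operation. The following is a minimal sketch: the function name is hypothetical, and pure red (255, 0, 0) is only an assumed value for the stripe label color.

```python
import numpy as np

def extract_stripe_region(original, segmented, stripe_color=(255, 0, 0)):
    """Keep only the pixels of `original` labeled with `stripe_color`.

    original:  uint8 array (H, W, 3), the captured image.
    segmented: uint8 array (H, W, 3), the network output of the same size.
    stripe_color: RGB label color of the stripe (assumed to be pure red).
    """
    color = np.asarray(stripe_color, dtype=segmented.dtype)
    # True where all three channels match the label color.
    mask = np.all(segmented == color, axis=-1)
    # Copy the stripe pixels onto a black image of the same size.
    stripe_only = np.zeros_like(original)
    stripe_only[mask] = original[mask]
    return stripe_only, mask
```

The returned mask can be reused to restrict the center extraction to the segmented stripe region.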
The intensity distribution of the cross section of the laser stripe usually approximates a Gaussian [37]:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
where µ is the mathematical expectation and σ is the standard deviation.
For the region to be measured, the normal direction at each point can be obtained from the Hessian matrix (the Steger method). The eigenvalue of the Hessian matrix with the maximum absolute value and its corresponding eigenvector give the second derivative along the normal direction of the laser stripe and the normal direction itself. A Taylor series expansion can then be carried out along the normal direction, since this is the direction in which the gray level changes most strongly, and the stripe center is obtained where the first derivative along this direction is zero.
Another method for extracting the center line is the gray-gravity method (GGM). By analogy with the center of mass, each pixel in the image is treated as a mass block whose mass is its gray value. Each column of pixels can be regarded as a "stick", so the barycenter of each stick gives the center of the laser stripe in that column. Assume the image has n rows and m columns, and denote the gray value of the pixel in the i-th row and j-th column as I(i, j). Then the center of the laser line in the j-th column can be expressed as:
$$c_j = \frac{\sum_{i=1}^{n} i \, I(i, j)}{\sum_{i=1}^{n} I(i, j)}$$
In this paper, single-line structured light is used, so there is only one horizontal laser stripe in Figure 6. With the gray-gravity method, the center position of the light stripe in each column can be calculated easily.
Here, we use the above methods to extract the center line of the laser stripe. The Steger method is robust but time-consuming. By segmenting the stripe first, we eliminate unnecessary time cost, as only the selected region of the image needs to be convolved. The gray-gravity method is fast, but because it takes all pixels into account, it is easily influenced by noise in the image; this noise is filtered out by our segmentation step. Figure 6 compares our method with the Steger method. It can be seen that the Steger method fails to detect some parts of the laser stripe when haze floods the target region.
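The gray-gravity computation described above takes, for each column, the intensity-weighted mean of the row indices. The sketch below is one possible NumPy version; the function name and the optional `mask` argument (the segmented stripe region) are assumptions, and rows are indexed from 0 as is usual in NumPy.

```python
import numpy as np

def gray_gravity_centers(gray, mask=None):
    """Sub-pixel stripe center (row coordinate) for every image column.

    gray: 2-D array with n rows and m columns of gray values I(i, j).
    mask: optional boolean array of the same shape; pixels outside the
          segmented stripe region receive zero weight.
    Returns a float array of length m; columns with no stripe are NaN.
    """
    weights = np.asarray(gray, dtype=np.float64)
    if mask is not None:
        weights = np.where(mask, weights, 0.0)
    rows = np.arange(weights.shape[0], dtype=np.float64)[:, None]
    mass = weights.sum(axis=0)               # total gray value per column
    moment = (rows * weights).sum(axis=0)    # gray-weighted row sum
    centers = np.full(weights.shape[1], np.nan)
    # Divide only where the column contains stripe pixels.
    np.divide(moment, mass, out=centers, where=mass > 0)
    return centers
```

Passing the segmentation mask keeps bright pixels caused by reflections or haze from shifting the barycenter, which is the weakness of the plain gray-gravity method noted above.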

Experimental Results
We built an unmanned car platform ourselves. It is controlled by an Arduino single-chip microcontroller and can be driven remotely via Bluetooth. The structured-light sensor is mounted on the vehicle, and the structured light is projected forward for environmental reconstruction and perception. The platform is shown in Figure 7.
We carried out the experiments in the corridor outside the laboratory and built our own dataset for laser stripe extraction. The deep learning experiments were conducted using four GTX 2080Ti graphics cards, and the other programs were developed in Visual Studio 2017.

Discussion and Comparison about Different Structure of LSDNN
We built a dataset ourselves to train and test LSDNN. It contains 5976 images in total. We collected and annotated part of the images, and the remaining images were produced by data augmentation. Of the dataset, 85% was used for training and 15% for validation.
We use SGD (stochastic gradient descent) as the optimizer and ReLU as the activation function in each layer of LSDNN. Table 3 presents the hyper-parameters used in the training process. LSDNN has two parts: a backbone that extracts rich semantic features, and a multiscale dilated convolution part whose decoder outputs the segmented result. ResNet is widely recognized as a good feature extractor, so we use it as the backbone of LSDNN.
The other part of LSDNN consists of the multiscale dilated convolution, the global fusion module and the feature-fusion module discussed in Section 3.3. To classify pixels into categories and detect the laser stripe region successfully, all levels of information need to be fused. Multiscale analysis is a powerful tool for extracting different levels of information and enhancing image details. In our task, the horizontal extent of the laser stripe in the image is very large, but its width is relatively small. We can therefore combine small-scale and large-scale features to achieve a higher mIoU. The results for different multiscale convolution layers are shown in Table 4. The performance of LSDNN does not always improve as more convolution modules are added. In fact, the dilated convolution only needs a small and a large receptive field; because of the stripe geometry discussed above, a medium receptive field does not improve mIoU. Therefore, we choose the multiscale convolution module with dilations 3 and 18. The global fusion part we designed is also essential, as it combines detailed information with overall information.
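To make the difference between the two dilation settings concrete, the span covered by a single dilated kernel can be computed with the standard formula k + (k - 1)(d - 1). The 3 × 3 kernel size used below is an assumption, since the kernel size of the dilated layers is not stated here.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# A 3 x 3 kernel is assumed for both dilated layers.
for dilation in (3, 18):
    print(dilation, effective_kernel_size(3, dilation))
# prints "3 7" and "18 37"
```

Under this assumption, the small branch sees a 7-pixel window, comparable to the stripe width, while the large branch spans 37 pixels and captures the long horizontal extent of the stripe.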
In addition to ResNet, Xception and MobileNet are also widely used backbones. We tested these three backbones. Although mIoU and Acc class fluctuated as the number of epochs increased, the overall trend of both curves was upward. The results are shown in Figure 8. We found that ResNet performed best among the three backbones. Table 4 is the quantitative comparison result after 300 epochs. Table 5 shows the quantitative results of changing the backbone. According to the above discussion, ResNet worked best, so we selected it as the backbone of LSDNN.
During the training process, we optimized the loss function to acquire the parameters of each layer. We compared two loss functions: the cross-entropy loss and the focal loss. Figure 9 shows the result. We found that the cross-entropy loss achieved a higher mIoU and that the overall trend of its curve was smoother. The cross-entropy loss was therefore more suitable for the target task, and we selected it for further training.
After determining the structure of LSDNN, we conducted an experiment to further evaluate the performance of our method. Here we used the label image as the ground truth. Then, we calculated the average pixel error of our method and of the Steger method. The average pixel error was obtained by averaging the pixel error of the laser stripe center over all columns. The quantitative results are shown in Table 6.
In addition, we compared our laser stripe extraction approach with traditional ones. The "pseudo light", the noise in the environment and the discontinuity of the line made the detection task more difficult. Using thresholding and morphological operations to delete small lines and connect edges was not reliable, as this approach was not adaptive and failed when the image changed. Image a and Image b contained a "pseudo light stripe" that could not be easily distinguished from the real stripe because their shape and intensity were similar. In Image c, part of the laser stripe was not detected and extracted where the light was cut apart by different objects. In addition, noise in the surrounding environment, such as the crack of a door, also had properties similar to those of the light stripe. As shown in Figure 10, our method performed better than the traditional method, and therefore the extraction result can be applied to high-accuracy measurement and navigation tasks.
We ran LSDNN and the postprocessing algorithm on the GPU platform (GTX 2080Ti) and optimized the algorithm to avoid wasting computational resources. Our algorithm processed one image in 82 ms on average, which meets the needs of robot positioning and navigation.
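The average pixel error used for the comparison with the Steger method can be sketched as follows. The function name is hypothetical, and taking the absolute row difference per column and skipping columns without a detected center are assumptions about details the text does not specify.

```python
import numpy as np

def average_pixel_error(centers, reference):
    """Mean absolute row offset between extracted and reference centers.

    centers, reference: 1-D arrays with one stripe center per column.
    Columns where either value is NaN (no stripe found) are skipped.
    """
    centers = np.asarray(centers, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    valid = ~(np.isnan(centers) | np.isnan(reference))
    if not valid.any():
        return float("nan")
    return float(np.abs(centers[valid] - reference[valid]).mean())
```

Skipping undetected columns hides missed detections, so in practice the number of valid columns would be worth reporting alongside the error.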

Detection and Extraction of the Laser Stripe
Using the network introduced above, we tested many images captured in different complicated environments. The results are shown in Figure 11. The noise in the images was filtered out thoroughly. The "pseudo-stripes" caused by reflection between smooth surfaces were distinguished from the real stripe. Discontinuities of the laser stripe, saturation and disturbance caused by haze were also prevented from influencing the detection and extraction of the line center.

Reconstruction of 3D Point Clouds
The process of acquiring the intrinsic and extrinsic parameters of the camera is referred to as calibration [38,39]. The three-dimensional point cloud at the position of the light bar can be obtained by intersecting the camera ray through each stripe pixel with the light plane. After the center line of the laser stripe was accurately extracted from the image, we used the formula mentioned in Section 4.1 to acquire the three-dimensional coordinates of the center line, which were further used for navigation. The results are shown in Figure 12.
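The ray and light-plane intersection described above can be sketched with a pinhole camera model. This is an illustration under stated assumptions rather than the measurement model of the paper: lens distortion is ignored, the light plane is assumed to be given as a*X + b*Y + c*Z + d = 0 in the camera frame, and the function name is hypothetical.

```python
import numpy as np

def pixel_to_point(u, v, K, plane):
    """Intersect the camera ray through pixel (u, v) with the light plane.

    K:     3x3 intrinsic matrix obtained from calibration.
    plane: (a, b, c, d) with a*X + b*Y + c*Z + d = 0 in the camera frame.
    Returns the 3-D point (X, Y, Z) in the camera coordinate system.
    """
    K = np.asarray(K, dtype=np.float64)
    # Direction of the viewing ray, scaled so that its Z component is 1.
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    normal = np.asarray(plane[:3], dtype=np.float64)
    denom = normal @ ray
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the light plane")
    # Scale the ray so that the point satisfies the plane equation.
    t = -plane[3] / denom
    return t * ray
```

Applying this function to the extracted center of every image column yields one 3D point per column, which together form the point cloud of the stripe.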

Accuracy Evaluation of the Structured-Light Vision Sensor
We denote the camera coordinate system as $O_c - X_c Y_c Z_c$. We selected several points on the intersection of the structured-light plane and the target plane as control points. We then used the camera's extrinsic parameters to calculate the three-dimensional coordinates $(X_c^i, Y_c^i, Z_c^i)$, $i \in [1, 7]$, of these points in $O_c - X_c Y_c Z_c$. The results are displayed as the blue dots in Figure 13. Next, we used the measurement model introduced in Section 2 to calculate the 3D coordinates $(X_s^i, Y_s^i, Z_s^i)$, $i \in [1, 7]$, of the corresponding points in the same coordinate system, which are shown as the red dots in Figure 13. Because $(X_c^i, Y_c^i, Z_c^i)$ is closer to the true value than $(X_s^i, Y_s^i, Z_s^i)$ [40], this paper takes $(X_c^i, Y_c^i, Z_c^i)$ as the true value. We used the error $E(X, Y, Z)$ calculated by Equation (16) to evaluate the measurement accuracy of the sensor.
Figure 13. Measurement error evaluation of the sensor.

Twelve images were collected for calibrating the structured-light sensor, and Figure 13 shows the measurement error estimated for each image. The average distance between the measured points and the calibrated points over the 12 images was about 4 mm, which meets the navigation requirements. The measurement accuracy was higher than that of Kinect and close to that of LiDAR. According to the above discussion and evaluation, structured light is a cheap and promising approach to robot navigation in dark environments.
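The average distance between measured and calibrated points can be computed as sketched below. This assumes the Euclidean distance between paired 3D points; it is not a reproduction of Equation (16), and the function name is hypothetical.

```python
import numpy as np

def mean_point_distance(measured, reference):
    """Average Euclidean distance between paired 3-D points.

    measured, reference: arrays of shape (n, 3) in the same units,
    for example millimeters in the camera coordinate system.
    """
    measured = np.asarray(measured, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    return float(np.linalg.norm(measured - reference, axis=1).mean())
```

Evaluating it once per calibration image and averaging the results gives a single figure comparable to the reported value of about 4 mm.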

Conclusions
This paper proposes a robust method for detecting the laser stripe in complex environments using a deep convolutional network that can deal with different kinds of noise. We designed the structure of LSDNN and carefully tested different structures to achieve the best result. The precision of the extraction is improved significantly, and the time cost is also reduced. We also carried out a modeling analysis to design the line structured-light sensor and used it to realize environmental sensing in narrow spaces at low cost and with high robustness. In some experimental scenes, the point clouds of the scene can be reconstructed well to obtain the relative position between the robot and the environment. Our future research will focus on dealing with more diverse noise and optimizing the parameters of our sensor.
Author Contributions: C.Z. and J.Y. were involved in the theoretical performance analysis, designed and optimized the experiment and wrote the study; F.Z. proposed the idea and guided the research direction; other authors revised the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.