Lane Line Detection Based on Object Feature Distillation

To meet the real-time requirements of autonomous driving systems, existing methods directly up-sample the encoder's output feature map to a pixel-wise prediction, thus neglecting the importance of the decoder for predicting detail features. To solve this problem, this paper proposes a general lane detection framework based on object feature distillation. Firstly, a decoder with strong feature prediction ability is added to a network that uses the direct up-sampling method. Then, in the network training stage, the prediction results generated by the decoder are treated as soft targets through knowledge distillation, so that the direct up-sampling branch learns more detailed lane information and acquires feature prediction ability as strong as the decoder's. Finally, in the network inference stage, only the direct up-sampling branch is used and the forward calculation of the decoder is skipped, so compared with existing models, lane detection performance is improved without additional cost. To verify the effectiveness of this framework, it is applied to mainstream lane segmentation methods such as SCNN, DeepLabv1, and ResNet. Experimental results show that, with no additional complexity, the proposed method obtains a higher F1 Measure on the CuLane dataset.


Introduction
With the increasing popularity of automobiles, the frequent occurrence of road traffic accidents and the aggravation of traffic congestion are becoming increasingly prominent. According to relevant statistics, about 1.35 million people died in traffic accidents worldwide in 2018, and tens of millions of people were injured or disabled in traffic accidents [1]. The occurrence of road traffic accidents not only causes huge economic losses to individuals, families, and even the whole country [2], but also seriously threatens people's safety. According to a statistical analysis of traffic accidents by the Ministry of Public Security, about 50% of significant traffic accidents are related to lane departure. In this context, lane-keeping systems are favored by more and more drivers. In a lane-keeping system, lane line detection is closely related to realizing the lane-keeping function and directly affects the robustness and real-time performance of the whole system.
In recent years, advanced driver assistance systems (ADAS) and autonomous driving have become more important in reducing traffic accidents. As a key technology for intelligent vehicles, lane detection has attracted widespread attention from plenty of institutes and automobile technology companies [3]. Generally, ADAS technology includes tasks such as lane changing (LC), object detection system (ODS), collision avoidance system (CAS), adaptive cruise control (ACC), and lane departure warning system (LDWS) [4]. Among this research, vision-based lane detection has always been a hot topic in the field of lane line detection [5]. Lane markings are an essential traffic safety feature with the functions of separating road areas, stipulating the direction of travel, and providing guidance information for pedestrians; they are also one of the core elements of the environment perception required for assisted driving.
The rest of this paper is organized as follows:
• Section 2 discusses the related work.
• Section 3 describes the methodology and multi-task optimization.
• Section 4 discusses the CuLane dataset.
• Section 5 reports the experimental results and analysis.
• Section 6 provides our conclusions.

Related Work
The automatic driving sensing module obtains the environmental information around the vehicle. One of the most important components of the module is lane line detection, which is the prerequisite for controlling an autonomous vehicle so that it drives safely within the lane lines. Therefore, lane line detection has become a research hotspot in the field of autonomous driving perception. Existing lane line detection methods can be roughly divided into three categories: traditional image processing methods, methods combining image processing with convolutional neural networks (CNNs), and lane line detection methods based on deep learning.
Several surveys [13-15] have comprehensively summarized the traditional lane line detection methods [16], which are usually divided into three steps: image pre-processing, local feature extraction, and lane line fitting. Among them, local feature extraction captures local lane line information by using edge [17], texture [18], and color [19] features on the region of interest (ROI) [20] in the image, and is the key step of traditional lane line detection methods. However, it requires analyzing the distribution characteristics of lane lines in the image and manually designing and combining feature extraction algorithms. Traditional lane line detection methods were designed for specific road scenes; they struggle to adapt to the complex scenes found in practical autonomous driving, and the design of the extraction algorithm requires a high level of expertise.
In methods combining traditional image processing and CNNs, the CNN relies on large-scale datasets [21], and its powerful nonlinear fitting ability overcomes the limitations of traditional lane line detection methods and improves the generalization ability of the algorithm. Early lane line detection methods [17,22] based on convolutional neural networks only used a neural network to replace the local feature extraction step of traditional methods, but they still needed complex post-processing. For example, Kim et al. [17] combined a neural network and the RANSAC algorithm to detect lane lines, using a CNN to learn lane line features in edge images. The RANSAC algorithm was used to remove outliers and fit the lane lines, but since the CNN input is a lane line edge detection image, the accuracy of this method is directly affected by the edge detection algorithm. In [23], the top view and front view of the lane are taken as input, and a dual-view convolutional neural network is used to detect the lane lines; however, it requires complex pre-processing of the input image.
Lane line detection methods based on deep learning define lane detection as a dense classification prediction problem. To better classify lane lines from sparse supervision signals, Pan et al. [24] proposed the spatial convolutional neural network (SCNN). SCNN transmits messages between neurons from different directions in space so that it can capture the spatial relationship between pixels, but it cannot recover the lane line boundary pixels well because of direct up-sampling. Therefore, most lane line segmentation networks [25,26] adopt the encoder-decoder network structure. For example, Kim et al. [27] redefined the point detection problem as a region segmentation problem: SegNet [28] was used to segment the lane lines on both sides in a sequential end-to-end transfer learning method without any post-processing. However, it relies heavily on the lane line segmentation map as the supervision signal, and the training process is very complicated. Zhang et al. [29] use a network structure in which two decoders share the same encoder to segment the lane lines and the drivable area, and use a link encoder to transfer lane line information between the two complementary decoder branches, but a decoder with multiple branches brings huge computing overhead. The methods in [30,31] also adopted the encoder-decoder structure and extended lane line detection to instance segmentation: they need to accurately predict pixel categories and assign the correct instance label to each pixel. However, they used clustering or complex label allocation strategies, which increased the computational complexity of the model.
To solve the problems that the direct up-sampling methods in the literature mentioned above cannot recover lane line features well, and that networks using the encoder-decoder method incur a huge computational cost, this paper proposes an object feature distillation (OFD) framework. It adds a decoder branch to a network with a direct up-sampling structure; the decoder fuses the features of each stage of the encoder to supplement the detailed information of the target and better restore the classification predictions. Meanwhile, considering the autonomous driving system's requirements on the inference time of the lane line detection algorithm, this paper is inspired by knowledge distillation algorithms [32-34]. The classification probability map generated by the decoder is used as the soft target of the distillation loss function, and the classification probability map of the direct up-sampling branch is used as the input, so that the up-sampling branch learns the refined classification information of the target from the decoder branch. In addition, only the forward calculation of the direct up-sampling branch is performed in the model inference state; the decoder is not used, so no additional computational cost is incurred.

Methodology
This section first details the SCNN architecture and its network details, and then shows how we fuse object feature distillation into it to get better results.

Spatial Convolution Neural Network (SCNN)
The target feature distillation method proposed in this paper is mainly applied to methods that directly up-sample the feature map to the input image size for pixel-to-pixel classification. The SCNN network is one of the representatives of this kind of lane line segmentation algorithm. As shown in Figure 1, it adds 4 SCNN layers in the spatial direction after the top convolution layer ("FC7 layer") of LargeFOV [35] to introduce spatial message propagation so that it can better capture the spatial relationship between pixels. Its backbone network is a modified VGGNet [36], which better preserves spatial information by removing the max pooling layers in the 3rd and 4th stages, and the standard convolution is replaced by Dilated Convolution (DC) [37] to obtain a larger receptive field. Finally, the direct up-sampling branch up-samples the feature map by a factor of eight through bilinear interpolation so that its resolution equals the size of the input image. Another existing prediction branch applies the softmax operation to the feature map and feeds it to an average pooling layer and a fully connected layer (FC); the sigmoid function is then used to predict whether a lane line exists.
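SCNN's key idea, propagating messages slice by slice across the spatial dimension, can be illustrated with a heavily simplified NumPy sketch (single channel, downward direction only; the kernel and function names are ours, not the paper's):

```python
import numpy as np

def scnn_downward_pass(feat, kernel):
    """Simplified SCNN downward message passing (the SCNN_D direction).

    feat:   (H, W) feature map -- channels are omitted for clarity.
    kernel: 1-D convolution kernel applied along the width, standing in
            for the slice-to-slice convolution in the real network.
    Each row adds the ReLU of the convolved previous row, so information
    propagates from the top of the image downward.
    """
    h, w = feat.shape
    out = feat.copy()
    for i in range(1, h):
        message = np.convolve(out[i - 1], kernel, mode="same")
        out[i] = out[i] + np.maximum(message, 0.0)  # ReLU before adding
    return out

feat = np.zeros((4, 5))
feat[0, 2] = 1.0  # a single activation in the top row
out = scnn_downward_pass(feat, np.array([0.25, 0.5, 0.25]))
```

After the pass, the activation from the top row has propagated to every lower row, which is how SCNN captures long, thin structures such as lane lines that span the image vertically.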



Knowledge Distillation (KD) Method
Knowledge distillation (KD) extracts knowledge from complex networks to guide compact network learning, which can significantly improve the performance of compact models, and it has been extended from image classification tasks to segmentation tasks [38-40]. Usually, in the knowledge distillation method, the compact student network takes the intermediate output of the teacher network as the soft target, which supervises the student network to learn the knowledge extracted from the teacher network. In the literature [39], the geometric information of target depth predicted by the teacher network is used to guide semantic segmentation tasks. However, lane lines do not have similar target depth information in natural images. In lane line segmentation, knowledge distillation has been used to take the more abstract feature maps generated in later stages as the soft target of distillation [41], which enables the network to learn richer context information. However, it ignores the shallow local detail information, which is very important for lane line boundary restoration. The target feature distillation method proposed in this paper does not need to train a complex teacher network separately, and each branch shares the feature encoding network. We only need to use the classification probability map generated by the decoder branch, which contains lane line boundary refinement information, as the soft target of distillation to guide the direct up-sampling branch. In addition, the method in this paper does not require additional annotation of the training dataset.
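For reference, the classic soft-target distillation idea can be sketched as a cross-entropy between temperature-softened teacher and student distributions. This is a minimal illustrative NumPy version of generic KD (the temperature value and function names are ours), not the paper's pixel-level variant:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the temperature-softened teacher and student
    distributions -- the classic soft-target distillation loss."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum())
```

The loss is minimized when the student's softened distribution matches the teacher's, which is how soft targets transfer the teacher's "dark knowledge" about relative class similarities.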

Methodology Follow in This Paper
This paper proposes a method of distilling lane line boundary refinement information from the decoder. It can be applied to various algorithms that directly up-sample the feature map to the input image size for lane line segmentation, so that they learn more accurate lane line boundary information from the decoding branch without increasing the inference time of the network. Next, the design details of the overall framework of the proposed method, the structure of the added decoder branch, and the implementation principle of distilling lane line feature information from the decoder are described in detail.

The Overall Structure Design of the Target Characteristic Distillation Network
In this paper, lane line detection is defined as a pixel-level classification problem; that is, lane line features are extracted from the input image through the backbone network. Then, the resolution of the feature map is gradually restored, and the lane line information in the feature map is refined in the decoder. Finally, the classification results of each pixel predicted by the decoder are distilled to the direct up-sampling branch to enhance its ability to predict lane line features. As shown in Figure 2, the whole network is composed of four parts: one encoder and three output branches. In existing methods, the encoder is usually a modified mainstream convolutional neural network (CNN) [42,43]. For example, to preserve the spatial information of the target, the maximum pooling layers of the last two stages in VGGNet and ResNet are removed or their convolution strides are set to 1, and the standard convolution layers are replaced with dilated convolutions of different dilation rates to obtain a larger receptive field. In addition, the "FC6" and "FC7" fully connected convolution layers shown in Figure 1 are added; their high-dimensional output feature maps enable the network to learn richer lane line features.
The direct up-sampling branch changes the number of channels of the backbone network's output feature map through a 1 × 1 convolution layer. It then uses bilinear interpolation to directly restore the resolution of the feature map to the size of the input image. However, the resolution of the feature map is continuously reduced in the encoder, and the representation of the features becomes more and more abstract, so lane line detection networks using the direct up-sampling method often fail to sufficiently recover lane line boundary information. Therefore, a decoder branch is added in this paper, which restores the resolution of the feature map and refines the lane line features through three stages of 2× up-sampling and convolution operations.
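The direct up-sampling branch described above can be sketched in NumPy: a 1 × 1 convolution is a per-pixel linear map over channels, followed by a ×8 bilinear interpolation. This is an illustrative sketch with toy shapes and random weights, not the paper's implementation:

```python
import numpy as np

def conv1x1(feat, weight):
    """1x1 convolution: a per-pixel linear map over channels.
    feat: (c_in, h, w), weight: (c_out, c_in)."""
    return np.tensordot(weight, feat, axes=([1], [0]))

def bilinear_upsample(plane, scale):
    """Separable bilinear interpolation of a single (h, w) plane."""
    h, w = plane.shape
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    rows = np.array([np.interp(xs, np.arange(w), plane[i]) for i in range(h)])
    cols = np.array([np.interp(ys, np.arange(h), rows[:, j]) for j in range(w * scale)])
    return cols.T

# Toy example: a 5-channel encoder output mapped to 2 classes, up-sampled x8.
feat = np.random.rand(5, 36, 100)   # encoder output (c, h, w)
weight = np.random.rand(2, 5)       # hypothetical 1x1 conv weights
logits = conv1x1(feat, weight)      # (2, 36, 100)
up = np.stack([bilinear_upsample(p, 8) for p in logits])
```

With a 36 × 100 encoder output, the ×8 up-sampling restores the 288 × 800 input resolution used later in the experiments section.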
The lane line existence prediction branch acts on the output of the backbone network. The design details are shown in Figure 1. Firstly, the feature map output by the decoder undergoes the softmax operation; then, it is flattened into a one-dimensional vector and sent to an average pooling layer and a two-layer fully connected layer. Finally, the output of the fully connected layer is passed through a sigmoid operation and fed to the binary cross-entropy loss function. The branch's predictions are used in the lane line evaluation stage: lane line coordinates are extracted only from segmentation maps in which lane lines are predicted to exist, which reduces the network's false detection of non-existent lane lines.

Network Decoder Structure Design of Target Feature Distillation Network
The design of the encoder structure in the lane line detection network determines the ability of the network to extract target features, and the design of the decoder structure affects the ability to restore and predict the details of the target features. In order to explore the influence of different decoder structures on the recovery of lane line feature predictions, this paper added decoders to the SCNN network. The structure of the 3-stage decoder is shown in Figure 3a: the output of the encoder is directly refined by three stages of convolution and 2× bilinear interpolation so that the resolution of the feature map is restored to the original image size. The structure of the 5-stage decoder is shown in Figure 3b. In addition to the above operations, it fully considers the different characteristics of target feature expression at different stages of the encoder: the decoder features of later stages are used to guide the feature learning of earlier stages to better recover the prediction of target details. This decoder design can be applied to various lane line segmentation algorithms using the direct up-sampling method. In addition, when ResNet is used as the backbone network, only four decoder stages are needed. In Figure 3, "×" denotes how many convolutional layers are used, and "*" denotes the convolutional kernel size used in that specific layer.
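The resolution progression of the 3-stage decoder can be traced with a small NumPy sketch; nearest-neighbour repetition stands in here for the bilinear up-sampling + convolution refinement of each real stage:

```python
import numpy as np

def upsample2x(plane):
    """Nearest-neighbour 2x up-sampling (a stand-in for one decoder
    stage of bilinear up-sampling plus convolutional refinement)."""
    return plane.repeat(2, axis=0).repeat(2, axis=1)

plane = np.zeros((36, 100))   # encoder output resolution for a 288x800 input
shapes = [plane.shape]
for _ in range(3):            # three decoder stages, each doubling resolution
    plane = upsample2x(plane)
    shapes.append(plane.shape)
```

The three doublings take the feature map from 36 × 100 back to 288 × 800, matching the ×8 reduction of the modified VGGNet encoder.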


Loss Function of the Target Feature Distillation Network
Inspired by existing research that uses the classification probability map output by the teacher network as the soft target of a pixel-level distillation loss function, this paper distills the classification probability map generated by the decoder branch, so that the direct up-sampling branch learns lane line boundary information and lane line boundary detection is enhanced. The final convolution output of the decoder branch is defined as A ∈ R^(c×h×w), where c represents the number of channels of the convolution output, h represents the height of the output feature map, and w represents its width. Similarly, the convolution output of the direct up-sampling branch is defined as B ∈ R^(c×h×w). In order to enable the direct up-sampling branch to learn the pixel-level classification results of the decoder branch, the mean square error loss function is used as the distillation loss function between the two branches, and its definition is shown in Equation (1):

l_dist(A, B) = (1 / (h · w)) · Σᵢ₌₁ʰ Σⱼ₌₁ʷ ‖φ(A)ᵢⱼ − φ(B)ᵢⱼ‖²    (1)

where φ(.) represents the softmax operation. After the convolution output passes through softmax, the sum over the channel dimension at every pixel is 1, and any pixel value on the feature map represents the classification probability that the current position belongs to one of the lane lines or the background; φ(A)ᵢⱼ represents the target value of the decoder distillation loss function, and φ(B)ᵢⱼ is the input value of the distillation loss function.
The distance between the classification probabilities of the two branch pixels is measured by Equation (1), and the network minimizes the distance between the input value and the target value through continuous iteration.
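This pixel-level distillation loss can be sketched in a few lines of NumPy: softmax over the channel axis of each branch's output, then the mean squared error between the two probability maps (function names are ours; this is an illustrative sketch, not the paper's code):

```python
import numpy as np

def channel_softmax(x):
    """Softmax over the channel axis of a (c, h, w) map, so the class
    probabilities at every pixel sum to 1."""
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def distill_loss(decoder_out, upsample_out):
    """Mean squared error between the two branches' per-pixel class
    probabilities -- the pixel-level distillation loss of Equation (1).
    decoder_out provides the soft target; in training, gradients from
    this loss would not flow back into the decoder branch."""
    target = channel_softmax(decoder_out)   # soft target from the decoder
    pred = channel_softmax(upsample_out)    # direct up-sampling branch
    return float(np.mean((target - pred) ** 2))
```

The loss is zero exactly when the two branches produce identical per-pixel class probabilities, so minimizing it pushes the direct up-sampling branch toward the decoder's refined predictions.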
The definition of the total loss function of the method in this paper is shown in Equation (2):

L = l_up(B, B̂) + l_decoder(A, Â) + α · l_dist(A, B) + β · l_exist(E, Ê)    (2)

The total loss function is composed of four parts, in which l_up(.) and l_decoder(.) represent the cross-entropy loss functions of the direct up-sampling branch and the decoder branch, respectively; Â and B̂ both represent the true annotation of the current input image; l_dist(.) represents the distillation loss function of Equation (1); and l_exist(.) is the lane line existence branch loss function, which uses the binary cross-entropy loss. E is the output of the sigmoid function, and Ê represents the true label of whether each lane line exists in the current input image. Finally, parameters α and β are used to balance the influence of the distillation task and the lane line existence prediction task on the performance of the whole network.
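The weighted combination of Equation (2) is simple arithmetic; a minimal sketch with the four loss terms passed in as already-computed scalars (the default weights follow the values reported in the experiments section):

```python
def total_loss(l_up, l_decoder, l_dist, l_exist, alpha=10.0, beta=0.1):
    """Weighted sum of the four loss terms in Equation (2).
    alpha scales the distillation loss, beta the existence loss."""
    return l_up + l_decoder + alpha * l_dist + beta * l_exist
```

For example, with l_up = 1.0, l_decoder = 1.0, l_dist = 0.1, and l_exist = 0.5, the total is 1.0 + 1.0 + 10 × 0.1 + 0.1 × 0.5 = 3.05.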

Dataset
Despite the importance and complexity of traffic lane line detection, current datasets are either too limited or too easy for comparing various approaches, and a large public annotated benchmark is needed [13]. KITTI [44] and CamVid [45] both have pixel-level annotations for lane markings, but only hundreds of images, too few for deep learning approaches to be effective. The Caltech Lanes Dataset [16] and TuSimple Benchmark Dataset [46] include 1224 and 7000 photos with annotated lane markings, respectively, in restricted scenarios with light traffic and clear lane markings. Furthermore, none of these datasets annotates lane markings that are occluded or unseen due to abrasion. However, such lane markings can be inferred by humans and are extremely useful in real-world applications.
In this paper, we prefer to use the CuLane dataset [24]. Cameras were installed in six different vehicles driven by six different drivers, and videos were captured while driving in Beijing on various days. More than 55 h of video were collected, yielding 133,235 frames, more than 20 times the size of the TuSimple Dataset. The dataset was divided into 88,880 frames for training, 9675 for validation, and 34,680 for testing. The input images have a resolution of 1640 × 590. We remove the sky and very close areas from the image and resize accordingly for the lane line network model. Our initial experiments found that traditional photometric and geometric recipes for data augmentation did not improve driving results, so we do not use augmentation. Figure 4 depicts a variety of scenes, including urban, rural, and highway settings. As one of the world's largest and most congested cities, Beijing presents many daunting traffic scenarios for lane line detection. The dataset was divided into two categories, normal and challenging, which correspond to the nine examples in Figure 4. The proportion of each scenario is shown in Figure 5; the eight challenging scenarios account for the majority of the dataset (72.3%). We manually annotate the traffic lane lines with cubic splines for each frame. As previously mentioned, lane markings are often obscured by automobiles or are not visible. Even in these difficult situations, lane line detection algorithms must estimate lane locations from context in real-world applications. As a result, we annotate the lanes according to context in these situations as well, as shown in Figure 4.

Experimental Result and Analysis
The training, validation, and testing of the experimental models in this paper are all built with the TensorFlow [48] framework, with cuDNN [49] kernels used for computation. The hardware is a high-performance workstation configured with an Intel® Core™ i7-6800K CPU @ 3.40 GHz and an NVIDIA 2080 Ti graphics card.

Evaluation Standard
To verify the effectiveness of the proposed method, the official evaluation protocol is fully adopted when evaluating on the CuLane dataset. In the literature [25], to make the evaluation more reasonable, both the predicted and the real lane lines are rendered as lines with a width of 30 pixels, and the intersection over union (IoU) between them is calculated. True-positive (TP) cases are selected by setting an IoU threshold, which is usually set to 0.3 or 0.5. Then, the F1 Measure is used to quantify the results of lane line detection, whose definition is shown in Equation (3):

F1 = (1 + γ²) · P · R / (γ² · P + R)    (3)

Precision (P) is defined as:

P = TP / (TP + FP)

where FP represents false-positive cases. Recall Rate (R) is defined as:

R = TP / (TP + FN)

where FN denotes false-negative cases, and γ is usually set to 1. In addition, during the test, the coordinates of each lane line need to be determined from the probability map predicted by the network, so the results of the lane line existence prediction branch need to be used. If the existence value of a lane line in the predicted probability map is greater than the threshold value, the pixel coordinate points of the corresponding lane line are obtained from the probability map for the subsequent calculation of the F1 Measure, which effectively reduces the false detection rate.

Experimental Environment & Network Parameter Setting
This paper uses the standard Stochastic Gradient Descent (SGD) optimizer to train the model on the CuLane dataset. The batch size is set to 12, the base learning rate is 0.01, the momentum is 0.9, and the weight decay is 0.0001. The learning rate strategy adopts "Poly", with the power and the number of iterations set to 0.9 and 60 K, respectively.
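The "Poly" schedule decays the learning rate polynomially toward zero over the training run; a one-line sketch with the values quoted above:

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """"Poly" learning-rate decay: lr = base_lr * (1 - iter/max_iter)^power."""
    return base_lr * (1 - iteration / max_iter) ** power
```

With base_lr = 0.01, max_iter = 60,000, and power = 0.9, the learning rate starts at 0.01 and reaches 0 at the final iteration.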
The backbone networks used in the experiments are VGGNet and ResNet, both of which use weights pre-trained on the ImageNet dataset, and the input data are augmented. First, a random rotation of [−2°, 2°] is applied to the batch data; then, random cropping is performed, with the cropped image size equal to 0.05 times the original input image resolution. Finally, the image resolution is adjusted to 288 × 800 pixels. Through a large number of experiments, α in Equation (2) is set to 10 and β to 0.1. All experiments in this paper were carried out under the PyTorch framework with two NVIDIA 2080 Ti GPUs. In addition, when training the network, the gradient of the distillation loss during backpropagation does not update the decoder branch parameters; it only affects the weights of the direct up-sampling branch.

Quantitative Analysis
To verify the effectiveness of the proposed method, its F1 Measure was calculated on the CuLane dataset. Table 2 shows the results of the target feature distillation method applied to SCNN [24]. In the nine complex scenarios of the CuLane dataset, the F1 Measure of the proposed method is ahead of the other algorithms in six scenarios and on the whole dataset, which verifies the effectiveness of the proposed method without any additional computation cost. However, in the crowd and dazzle light scenarios, the F1 Measure of the GCJ [29] method is higher than that of the other methods. This is because the GCJ method uses geometric prior information between the lane line and the driving area, which increases the algorithm's contextual reasoning ability for occluded lane lines; adding such geometric information to improve lane line segmentation is the next step of this work. Columns 6-9 of Table 3 show the F1 Measure of the target feature distillation method applied to DeepLabv1, ResNet50, ResNet101, and SCNN, respectively [25]. The results show that the target feature distillation method proposed in this paper improves lane line detection performance in each sub-scene and on the total dataset, and it brings especially significant improvement in the shadow and dazzle light scenes, which verifies its effectiveness and versatility. In addition, columns 2-5 in Table 3 are the evaluation results of the literature [24] on the corresponding lane line detection algorithms. Since there is no lane line labeling in the crossroad sub-scene data, Table 3 only shows the FP measure for that scene. As shown in Figure 6, the F1 Measure improvements of the four groups of comparative experiments in Table 3 were calculated for each scene, along with their mean and standard deviation.
It can be seen from the figure that the average improvement is highest for the ResNet50 method, and that networks with relatively simple decoder structures improve more markedly. This indicates the importance of the decoder for recovering lane line feature predictions and also proves the effectiveness of the target feature distillation method. Additionally, the ResNet101 method shows the most uniform improvement across sub-scenes, so the standard deviation of its F1 Measure improvement is the smallest; this may be because its encoder's strong feature expression ability enables it to effectively extract lane line features in every traffic scene.


Qualitative Analysis
The target feature distillation method was applied to the DeepLabv1 and SCNN methods, and the lane line detection results before and after its application were compared. As shown in Figure 7, column 1 shows the ground-truth label of the input image, column 2 shows the results without the target feature distillation method, and column 3 shows the results after applying it. Comparing the lane line detection results at the red circle in the column 2 image of the DeepLabv1 method shows that the target feature distillation method recovers lane line pixel predictions more effectively in congestion and night scenes. The red circle in the lower right corner of the SCNN prediction result shows that the proposed method effectively reduces the false detection rate on shadowed roads without clear lane line markings.
As shown in Figure 8, the proposed method can also effectively detect lane lines under harsh conditions, such as night scenes with no obvious illumination.
Different decoders recover and refine the encoder's feature classification predictions to different degrees. Therefore, several decoder networks were designed and added to the SCNN method to explore how the lane boundary refinement information transferred during distillation affects detection performance. As shown in Table 4, row 2 gives the results with a three-stage up-sampling decoder, row 3 gives the results with a five-stage decoder network, and row 4 gives the results with the smooth decoder network proposed in [50]. The smooth decoder has the highest design complexity: its channel attention module learns an attention weight vector and, by adjusting the channel weights, enhances the classification probability of lane line pixels, thereby improving detection performance. When the information of the smooth decoder is distilled, the F1 Measure of the new method reaches 74.1%, which is 2.5% higher than that of the original SCNN and again demonstrates the effectiveness of the proposed method.
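The channel attention idea described above can be sketched as a squeeze-and-excitation style module. This is an illustrative reconstruction, not the exact smooth decoder of [50]; `w1` and `w2` stand in for hypothetical learned bottleneck weights:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    feature_map: list of channels, each a 2-D list of activations.
    w1, w2: hypothetical learned weights of a small bottleneck
            (channels -> hidden and hidden -> channels).
    """
    # Squeeze: global average pooling per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # Excitation: tiny bottleneck, ReLU then sigmoid, one weight per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(ws, pooled))) for ws in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w2]
    # Reweight: scale every activation in a channel by its attention weight.
    return [[[a * wt for a in row] for row in ch]
            for ch, wt in zip(feature_map, weights)]
```

Boosting the weights of channels that respond to lane line pixels raises their classification probability relative to the background, which is the mechanism the smooth decoder exploits.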
Since the final convolutional output of the network passes through softmax, the classification probabilities of each pixel over the lane line classes and background along the channel dimension of the feature map form a probability distribution. Therefore, in designing the distillation loss function, the mean square error (MSE) can directly measure the distance between the classification probabilities of each pixel. Alternatively, the KL divergence loss can measure the distance between the lane line classification probability maps output by the direct up-sampling branch and the decoder branch, i.e., the distance between the two branches' outputs from the perspective of probability distributions. As shown in Table 5, row 2 gives the results with the MSE loss as the distillation loss, and row 3 gives the results with the KL divergence loss, which improves the F1 Measure on the dataset by 0.4% over the MSE results.
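The two distillation losses compared in Table 5 can be sketched per pixel as follows, assuming both branches' class scores have already passed through softmax (function names are illustrative):

```python
import math

def mse_distill(student_probs, teacher_probs):
    """Per-pixel MSE between the two branches' class-probability vectors."""
    return sum((s - t) ** 2
               for s, t in zip(student_probs, teacher_probs)) / len(student_probs)

def kl_distill(student_probs, teacher_probs, eps=1e-8):
    """Per-pixel KL(teacher || student): treats the decoder branch's output
    as the target distribution rather than comparing raw values, so it
    measures the mismatch between the two probability distributions."""
    return sum(t * math.log((t + eps) / (s + eps))
               for s, t in zip(student_probs, teacher_probs))
```

The pixel-wise losses are summed over the probability map; KL divergence is zero only when the direct up-sampling branch exactly matches the decoder branch's distribution, which is consistent with the 0.4% F1 gain reported in Table 5.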

Conclusions and Future Work
In this paper, lane detection is formulated as a pixel-level classification problem, addressing lane line segmentation networks that do not use a decoder to recover lane line boundary refinement information. A decoder branch is added so that lane line targets can be better recovered from the low-resolution feature map, and a pixel-level distillation loss function is applied: the direct up-sampling branch learns the lane line segmentation probability map generated by the decoder branch, making up for its missing lane line boundary refinement information. Therefore, without increasing any computational cost, the network's ability to detect lane lines is improved. The experimental results show that the target feature distillation method proposed in this paper improves the F1 Measure and adapts better to various road scenes. On straight lines, curves, backlit scenes, and vehicle occlusion scenes, the algorithm achieves good detection accuracy, strong robustness, and high detection speed. In addition, the geometric prior information between lane lines can effectively improve lane detection performance.
In future work, the lane detection and classification results will be fused with a forward-collision warning strategy. This will help assisted driving systems avoid collisions caused by accidental lane departure, fatigue, or driving under the influence of controlled substances in complex or structured road environments.

Conflicts of Interest:
The authors declare no conflict of interest.