Foggy Lane Dataset Synthesized from Monocular Images for Lane Detection Algorithms

Accurate lane detection is an essential function of dynamic traffic perception. Though deep learning (DL) based methods have been widely applied to lane detection tasks, such models rarely achieve sufficient accuracy in low-light weather conditions. To improve the model accuracy in foggy conditions, a new approach was proposed based on monocular depth prediction and an atmospheric scattering model to generate fog artificially. We applied our method to the existing CULane dataset collected in clear weather and generated 107,451 labeled foggy lane images under three different fog densities. The original and generated datasets were then used to train state-of-the-art (SOTA) lane detection networks. The experiments demonstrate that the synthetic dataset can significantly increase the lane detection accuracy of DL-based models in both artificially generated foggy lane images and real foggy scenes. Specifically, the lane detection model performance (F1-measure) was increased from 11.09 to 70.41 under the heaviest foggy conditions. Additionally, this data augmentation method was further applied to another dataset, VIL-100, to test the adaptability of this approach. Similarly, it was found that even when the camera position or level of brightness was changed from one dataset to another, the foggy data augmentation approach is still valid to improve model performance under foggy conditions without degrading accuracy on other weather conditions. Finally, this approach also sheds light on practical applications for other complex scenes such as nighttime and rainy days.


Introduction
With the rapid development of autonomous driving and assisted driving technologies, the accurate perception of dynamic traffic elements has become an essential prerequisite for reliable active safety strategies [1,2]. As a vital function of dynamic traffic perception, lane detection has gained increasing attention in recent years. Meanwhile, vision-based lane detection technology has seen significant progress, and deep neural networks, as the mainstream technology for computer vision, have been widely utilized in lane detection [3][4][5][6][7][8][9][10]. However, the application scenarios of most deep learning (DL)-based lane detection methods are still limited to ideal weather conditions, e.g., clear daytime. Little research [11][12][13][14][15][16] focuses on low-light weather conditions, such as foggy and rainy days, which are significant for increasing the adaptability of perception technology for autonomous driving vehicles. In fact, lane detection is more challenging in complicated weather conditions. As the meteorological and lighting conditions deviate from the ideal case, both the clarity and contrast of the images decrease. Hence, problems such as color distortion and loss of fine features occur in the image, which make it more difficult to extract lane features and thus degrade the accuracy of computer-vision-based models [17,18]. Therefore, this paper aims to improve the accuracy of DL-based lane detection models in complex foggy weather conditions.
Over the past years, a great deal of attention has been paid to image dehazing algorithms to increase scene visibility, from traditional image processing methods [19][20][21][22][23][24][25] to more recent deep-learning-based methods. [Figure 1: (a) and (b) show synthetic foggy images from FRIDA [13] and FRIDA2 [14]; (c) is one original clear image from Cityscapes [36], (d) displays its depth, and (e) is the simulated foggy image from Foggy Cityscapes [15]; (f,g,h) show the clear image, depth map, and generated foggy image from our proposed dataset, respectively.]
In this work, we propose a new dataset augmentation method, using a self-supervised monocular depth prediction method to extract the depth information from monocular lane images and then synthesize foggy images based on the atmospheric scattering model. Our proposed framework estimates the depth map only from the original image, which means the method can easily be applied to other datasets that lack depth information collected by extra sensors. We established the FoggyCULane dataset by enlarging the CULane dataset with foggy images synthesized using our method, in order to improve the accuracy of lane detection models in foggy scenes. Networks with outstanding lane detection performance in recent years were trained with our FoggyCULane and the original CULane to verify the feasibility and effectiveness of our method in different complex scenes, especially foggy scenes. The main contributions of this paper are:

•

A new dataset augmentation and synthesis method was proposed for lane detection in foggy conditions, which substantially improves the accuracy of lane detection models under foggy weather without introducing extra computational cost or a more complex framework for the algorithm.

•
We established a new dataset, FoggyCULane, which contains 107,451 frames of labeled foggy lanes. This will help the community and researchers to develop and validate their own data-driven lane detection or dehazing algorithms.


Background
Many studies have analyzed the reasons for the loss of sharpness and contrast in foggy images through modeling. In 1924, Koschmieder introduced the standard optical model [43] for daytime fog, which has been used extensively in image dehazing. McCartney [44] further reported in 1976 that the absorption and scattering of light by suspended particles (including water droplets, dust, and aerosols) in the atmosphere cause the attenuation of light transmitting between the target and the camera. In addition, the scattering by suspended particles also generates background light, which leads to a decrease in the contrast and saturation of images under foggy conditions. In 1999, Nayar et al. [45] established a mathematical model to describe the atmospheric scattering process clearly. The model assumes that under a strong scattering medium (e.g., foggy scenes), firstly, the light reflected by the target is absorbed and scattered by suspended particles in the atmosphere, resulting in the attenuation of the light reflected by the target, thus decreasing the brightness and contrast of the image. Secondly, ambient light, such as sunlight, is scattered by the scattering medium in the atmosphere to form air light. Sometimes the intensity of this scattered light is greater than that of the target light, thus causing the image to blur.
However, the reconstruction of foggy images using the above model requires depth information from the original image. Since most current traffic lane datasets do not provide the required depth information, including the most widely used CULane dataset, the extraction of depth information needs to be performed as the first step of foggy image synthesis.
Depending on the number of viewing angles, the depth estimation of images can be divided into binocular depth estimation and monocular depth estimation. Binocular algorithms, however, require multiple images of the same scene from different angles, from which the corresponding targets are matched and reconstructed in 3D.
Therefore, monocular depth prediction has been extensively studied in the field of computer vision in recent years. Based on the basic assumption that there are mapping relationships between RGB images and depth maps, data-driven deep learning methods have been proposed and applied to monocular depth estimation problems. Among them, deep learning techniques represented by Convolution Neural Network (CNN) have made substantial progress and gradually become the mainstream method for monocular image depth estimation [46][47][48][49][50].
Supervised by images with depth annotations, Eigen et al. [51] first trained a CNN-based depth estimation model and realized monocular depth estimation using deep learning. They proposed to estimate the depth of a monocular image using neural networks at two scales: a coarse-scale network to predict the global depth of the image and a fine-scale network to refine the local details.
In fact, acquiring large amounts of diverse images with exact depth information is challenging, and supervised learning cannot adapt to changing environments. Supervised learning methods require that each RGB image has its corresponding depth label, and the depth label annotation usually requires depth cameras or light detection and ranging (LIDAR), which are limited by detection range and cost, respectively. In addition, depth labels acquired from LIDAR are usually sparse points that are difficult to match with the original pixel-level image. Therefore, unsupervised or self-supervised depth estimation methods that do not require depth labels have been a research trend in recent years. One of the most prominent methods is the Monodepth2 network proposed by Godard et al. [52].

Methods
In this paper, we propose a dataset enhancement method for lane detection in foggy conditions based on the original CULane dataset, using a self-supervised monocular depth prediction method to extract the depth information from monocular lane images and then synthesize foggy images based on the atmospheric scattering model. The main framework of our method is shown in Figure 2.

The Standard Optical Model of Foggy Images
From Koschmieder's law [43], the light intensity of the target observed by the camera mainly contains two parts: the light reflected by the object that reaches the detection system after attenuation by the transmission medium, and the background light formed by particle scattering. The model of imaging in foggy (atmospheric scattering) conditions is defined by Koschmieder's law as follows:

I(x) = J(x)·t(x) + A·(1 − t(x)) (1)

where x indicates one certain pixel of the image, I(x) refers to the observed foggy image of the object at the pixel x, and J(x) refers to the original clear image of the object. A is the global skylight representing the ambient light in the atmosphere and is generally assumed to be a constant. Additionally, t(x) determines the transmission of the light from the object in the atmosphere. Assuming the medium is homogeneous, the transmission t(x) can be further determined as:

t(x) = e^(−β·l(x)) (2)

where l(x) is the distance between the object and observer, β is the extinction coefficient, and a larger β represents thicker fog. It can be seen that image degradation and information loss increase with depth l(x). Therefore, to generate synthetic fog images from the original clear scene, the key is to obtain the value of l(x) in (2), that is, to extract the depth information of the clear images.
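As a numerical illustration of the scattering model, consider a single pixel. The following is a minimal sketch of our own (not the paper's implementation); the distances and the skylight value A are arbitrary example values:

```python
import math

def transmission(distance, beta):
    """Transmission t(x) = exp(-beta * l(x)) for a homogeneous medium."""
    return math.exp(-beta * distance)

def foggy_intensity(J, t, A=1.0):
    """Koschmieder's law: I(x) = J(x)*t(x) + A*(1 - t(x))."""
    return J * t + A * (1.0 - t)

# A larger extinction coefficient or a larger distance means less of the
# object's radiance survives, and more of the pixel is filled by skylight A.
t_near = transmission(0.1, beta=3.0)   # ~0.74: nearby object, mild attenuation
t_far = transmission(1.0, beta=3.0)    # ~0.05: distant object, almost pure fog
```

As the transmission approaches zero, the observed intensity approaches A regardless of the scene radiance J(x), which is exactly the contrast loss described above.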


[Figure 2: Framework of the proposed method. Clear images are fed into the Monodepth2 network for depth estimation; the predicted depth maps are then combined with the atmospheric scattering model to generate foggy images.]
Monocular Depth Estimation
With the rapid development in deep learning, deep neural networks have shown excellent performance in recovering the pixel-level depth map from a single image [46][47][48][49][50]. In this paper, the Monodepth2 network (trained on the KITTI dataset) proposed by Godard et al. [52] is used for depth extraction of clear images in the CULane dataset, which is, in turn, employed to generate foggy images.
Monodepth2 is a self-supervised monocular depth estimation network that combines depth estimation and pose estimation to achieve pixel-level SOTA depth predictions. The recovery of the depth information is achieved based on the photo-consistency assumption, i.e., points in the same space should also have the same luminosity in the projections of different views. Based on the minimum reprojection loss, the full-resolution multi-scale sampling method, and the auto-masking loss, Monodepth2 can achieve outstanding depth estimation when trained with either monocular or binocular images as input data.
The loss functions of Monodepth2 are defined as follows:

I_{t′→t} = I_{t′} ⟨ proj(D_t, T_{t→t′}, K) ⟩
pe(I_a, I_b) = (α/2)·(1 − SSIM(I_a, I_b)) + (1 − α)·‖I_a − I_b‖_1
L_p = min_{t′} pe(I_t, I_{t′→t})
L_s = |∂_x d*_t|·e^(−|∂_x I_t|) + |∂_y d*_t|·e^(−|∂_y I_t|)
L = μ·L_p + λ·L_s

where T_{t→t′} is the relative pose of each source view I_{t′} with respect to the target image I_t's pose; proj() returns the 2D coordinates of the predicted depth map D_t projected into the view of I_{t′}, and ⟨ ⟩ is the sampling operator; K represents the pre-computed intrinsics, assumed identical for all views for simplicity; pe is the photometric error function constructed from the L1-norm and the Structural Similarity (SSIM); d*_t is the mean-normalized inverse depth; L_s is the edge-aware smoothness loss and L_p is the per-pixel photometric loss; μ and λ are coefficients corresponding to auto-masking of stationary pixels and multi-scale estimation, respectively; L is the final loss function.
Considering that our purpose is to generate foggy images whose density varies with distance, it is unnecessary to obtain the real and absolute depth information. Instead, the trend of depth variation or the relative variation of depth meets the requirement. Therefore, Monodepth2 is sufficient for this current study.
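Because only relative depth matters here, the network's disparity output can simply be inverted and min-max normalized to [0, 1]. A sketch of this step (our own illustration; Monodepth2's actual post-processing may differ):

```python
import numpy as np

def relative_depth(disparity):
    """Convert a predicted disparity map to relative depth in [0, 1].
    Larger disparity means closer, so depth is the (clipped) reciprocal."""
    depth = 1.0 / np.clip(disparity, 1e-6, None)
    d_min, d_max = depth.min(), depth.max()
    if d_max == d_min:                  # degenerate case: flat scene
        return np.zeros_like(depth)
    return (depth - d_min) / (d_max - d_min)
```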

Foggy Image Generation
To adapt to the relative depth, the actual transmission value t′(x) is reformed as:

t′(x) = e^(−β·l′(x)) (4)

Here, l′(x) is the depth matrix in the range [0, 1], which indicates the normalized depth estimated by Monodepth2. Substituting (4) for the transmission in (2), we can acquire the observed foggy image of the object I(x). The synthetic monocular depth map and foggy images acquired using the above method are shown in Figure 3. The acquired depth map clearly represents the depth of each part of the original image and the trend of depth change, and the generated foggy images look like real foggy pictures from the driver's perspective.
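Combining the scattering model with the normalized depth map, fog synthesis reduces to a few lines of array arithmetic. A hedged sketch (the β and airlight values here are example settings of ours, not the paper's exact constants):

```python
import numpy as np

def synthesize_fog(clear_rgb, rel_depth, beta=3.0, airlight=0.9):
    """I = J*t + A*(1-t), with t = exp(-beta * l'(x)) and l'(x) in [0, 1]."""
    J = clear_rgb.astype(np.float32) / 255.0
    t = np.exp(-beta * rel_depth)[..., None]   # broadcast over RGB channels
    I = J * t + airlight * (1.0 - t)
    return (np.clip(I, 0.0, 1.0) * 255.0).astype(np.uint8)
```

Pixels with l′(x) = 0 are returned unchanged (t = 1), while the most distant pixels are washed toward the airlight value, mimicking the depth-dependent degradation of real fog.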

The CULane Dataset
The CULane dataset focuses on the current lane and up to 4 adjacent lane markings, which are the lanes of most concern during driving. The traffic lanes are manually annotated with cubic splines for each image, and the labels are recorded as a set of coordinate points. The lanes are still annotated based on semantic information for cases where the lane markings are obscured by vehicles or not visible.
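Because the labels are plain coordinate lists, they can be reused unchanged for the synthetic images. A sketch of a parser for such annotations (the alternating x-y layout is our assumption for illustration, not an official format specification):

```python
def parse_lanes(annotation_text):
    """Parse CULane-style annotation text: one lane per line,
    written as alternating x y coordinates along the marking."""
    lanes = []
    for line in annotation_text.splitlines():
        vals = [float(v) for v in line.split()]
        pts = list(zip(vals[0::2], vals[1::2]))
        if len(pts) >= 2:               # a lane needs at least two points
            lanes.append(pts)
    return lanes
```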

Establishment of FoggyCULane Dataset
Through the foggy image generation method, we expanded the training set, validation set, and test set of CULane, respectively. Considering the diversity and complexity of fog density in real foggy scenes, we attempt to simulate fog more realistically and adapt the dataset to multiple density levels. Through experimentation, we found that the values of β should be limited to a reasonable range, or the artificially generated fog would be almost invisible or too thick. Thus, β is finally set to 2, 3, and 4, respectively, to generate three levels of foggy scenes with increasing fog density, as shown in Figure 5. Meanwhile, because we use the original images in the CULane dataset, the lane annotations are still applicable to the synthetic images. We selected 16,532 images from the CULane train set as the original inputs for foggy image generation and processed them separately with the three fog densities (β = 2, 3, 4). Together with the original lane images, the total number of training images in our FoggyCULane is 138,476. Similarly, while keeping the original test set images, we selected all the images in the normal scene to synthesize foggy images and form the new test set with 63,510 images.
As for the validation set, all 9,675 images were used for foggy image generation, resulting in 38,700 images in the new validation set. Table 1 presents the file structure of our synthesized foggy images in FoggyCULane based on the original CULane dataset, while Table 2 shows the comparison between the test sets of CULane and FoggyCULane.
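The expansion arithmetic can be sanity-checked with a small sketch (our own illustration; `synthesize` is a placeholder standing in for the fog-generation step):

```python
def fog_variants(img, depth, synthesize, betas=(2.0, 3.0, 4.0)):
    """One foggy copy per density level for a single clear frame."""
    return [synthesize(img, depth, b) for b in betas]

# Dataset bookkeeping from the text: 16,532 selected training frames at
# three densities each, and 9,675 validation frames kept alongside their
# three foggy copies.
n_synthetic_train = 16532 * 3    # 49,596 synthetic training frames
n_val = 9675 * (1 + 3)           # 38,700 validation frames in total
```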

Dataset
There were 186,786 images in total in our FoggyCULane dataset. To verify the effectiveness of the synthetic dataset in improving the performance of lane detection networks, we used the Spatial CNN (SCNN) network as the baseline method to investigate the datasets with different densities of fog. Other state-of-the-art methods were also employed to evaluate the improvement from the FoggyCULane dataset. Additionally, 300 hazy images with annotated lane markings in the VIL-100 dataset were employed to evaluate the effectiveness of our proposed framework. Beyond the hazy images in VIL-100, we also collected 182 traffic lane images under real thick fog to test whether the trained SCNN network is adequate for more challenging real foggy scenes. Moreover, the fog synthesis method was also applied to another dataset, i.e., VIL-100, to test its adaptability.

Evaluation Metrics
To quantitatively evaluate the lane detection results, we refer to the methods in [6], where both the real lane annotation and the predicted lane marking are considered a line area with a width of 30 pixels, and the Intersection over Union (IoU) between them is calculated. In this paper, the IoU threshold is set to 0.5. When the IoU between the predicted lane marking and the real lane is not less than 0.5, the prediction is considered a True Positive (TP), and the opposite is considered a False Positive (FP); accordingly, True Negative (TN) and False Negative (FN) can also be defined.
As mentioned above, the expanded test set of FoggyCULane has 12 different complex scenes, including the newly added foggy scenes of three densities. Here, the F1-measure is employed as the index for quantitative evaluation and is defined as follows:

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 × Precision × Recall / (Precision + Recall)
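Once TP, FP, and FN are counted, the metric reduces to simple arithmetic; a minimal sketch:

```python
def f1_measure(tp, fp, fn):
    """F1 = 2*P*R / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```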

Implementation Details
SCNN is a spatial convolutional neural network proposed by Xingang Pan et al. [6], which is reported to have the highest lane detection accuracy on the CULane dataset. Here, the model was trained with Stochastic Gradient Descent (SGD) as the optimizer, with a base learning rate of 0.01 and momentum of 0.9. Weight decay is set to 10^−4. The training and testing were undertaken on 8 NVIDIA GTX 2080Ti GPUs and an Intel Xeon E5-2682 v4 CPU. Before training, the images were resized to 800 × 288 due to the limited memory of the GPU. All the models were downloaded from public source code with default hyper-parameters and pre-trained as in previous works. Table 3 shows the results of five epochs of training of the SCNN network on different datasets, covering 12 different scenes. The last three rows are the newly added foggy scenes with increasing fog densities. The data in each column are acquired from five epochs of training with a different dataset: the original CULane dataset, the FoggyCULane dataset with a single fog density of 2, 3, or 4, and the FoggyCULane dataset with a mixture of the three fog densities. The values in the table (except the Crossroad row) represent the F1-measure in percentage for each scene, while the Crossroad row shows FP values. As shown in Table 3, the models trained with the FoggyCULane dataset practically obtained better lane detection performance than those trained with the original CULane dataset under the three foggy conditions, especially at the highest density (β = 4). Figure 6 presents one of the lane detection results of SCNN trained on different datasets.
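The optimizer settings above (base learning rate 0.01, momentum 0.9, weight decay 10^−4) correspond to the classic SGD-with-momentum update. A pure-Python sketch of a single scalar update, for illustration only (the actual training uses a framework optimizer):

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum; weight decay is folded into the
    gradient as an L2 penalty, matching common framework semantics."""
    g = grad + weight_decay * w
    v = momentum * velocity + g
    return w - lr * v, v
```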
Under the heaviest foggy weather condition, the lane detection performance (F1-measure) increased from 11.09 to 89.21 with FoggyCULane. In particular, the model trained on the FoggyCULane dataset mixing the three densities of foggy images achieved the best performance (86.65, 81.53, and 70.41 in F1-measure) across the foggy scenes. This improvement is due to the addition of foggy images to the training set, allowing the neural network to learn features of lane markings under foggy weather. The synthesized hazy images contain enough fog information to satisfy the needs of deep learning. Therefore, this approach can significantly improve the lane detection performance of neural networks in foggy scenes without negatively affecting lane detection in other complex scenes. By comparing the three single-density FoggyCULane datasets (β = 2, 3, 4), it can be observed that:

•
The dataset with mixed foggy densities has better performance than the dataset with a single foggy density. On the one hand, there are more haze images in the dataset with mixed fog densities, and the neural network can be more exposed to the foggy scene when training. Therefore, it is more sensitive to lane markings under foggy weather. On the other hand, the dataset with mixed fog densities contains fog images of three densities, making the network learn and extract features for lane detection in foggy condition comprehensively during model training.

•
The model trained in the dataset with the corresponding fog density value achieves the best lane detection performance in each foggy scene. This indicates that the features extracted by the network vary with fog densities. Therefore, the dataset with mixed fog densities should be applied to obtain a better performance in practice.

Effect of the FoggyCULane Dataset on Other State-of-the-Art Methods
To verify the effectiveness of the FoggyCULane dataset on other detection networks, four other state-of-the-art lane detection networks (ENet-SAD [7], ERFNet [8], LaneATT [9], and LaneNet [10]) were also trained with the original CULane dataset and the FoggyCULane dataset (including all fog densities), respectively, and then tested under normal and foggy scenes to compare their performance.
As shown in Table 4, all the models trained on the FoggyCULane dataset have improved performance under foggy conditions compared with the models trained on the original CULane. The test results of these networks are similar to those of the SCNN network, which also demonstrates the generality and applicability of our method for different lane detection networks.

Real foggy images were taken from the VIL-100 dataset [42], which includes 10,000 frames in different real traffic scenarios, including normal, crowded, curved road, damaged road, shadows, road markings, dazzle light, haze, night, and crossroad; multiple scenarios may occur simultaneously in a single frame. The VIL-100 dataset contains three hazy scenes (here named real foggy scene 1, real foggy scene 2, and real foggy scene 3) with 100 images in each scene, and representative images of each scene are shown in Figure 7. To further evaluate the effectiveness of our proposed framework in improving the performance of lane detection models in real foggy scenes, these 300 real foggy images with annotated lane markings from the VIL-100 dataset were adopted. The SCNN models trained on CULane and FoggyCULane (β = 2, 3, 4) were tested on these foggy images, respectively. Figure 7 also compares the lane detection results from the two models, and Table 5 presents the F1-measure values to evaluate the performance of the models on these foggy images.
As shown in Table 5, the performance of SCNN trained on FoggyCULane was generally improved in all three scenes, especially in foggy scenes 1 and 3, in which the fog is thicker and is further overlaid with night darkness. The result further indicates that our proposed framework can enhance the ability of lane detection models in real foggy scenes. Since the fog in the selected images of VIL-100 is commonly thin, to further evaluate whether the trained lane detection model is adequate for real foggy scenes, 182 lane images containing more challenging real foggy scenes were downloaded and compiled from the Internet. We used the SCNN models trained on the original CULane dataset and the FoggyCULane (β = 2, 3, 4) dataset to verify the models' performance on real-world foggy images. These challenging real foggy images have several characteristics:

1.
The fog density varies among images, and the fog is also not uniform within a single image.

2.
The angle and orientation of the images vary greatly from one another, including images taken from the perspective of roadside pedestrians, road surveillance cameras, and in-vehicle recorders.

3.
Vehicles and pedestrians in the images occlude the lane marks to varying degrees, and the number of lane marks in each image is not the same.
Since these real foggy lane images collected from the Internet do not have lane labels, the same strategy as for the Crossroad scene in the CULane test set was used here to evaluate the lane detection results using FP values.
Comparing the FP values of the two models, we found that the SCNN model trained with the FoggyCULane dataset performs significantly better on real foggy days. The FP value increases from 33 to 118, as shown in Table 6; since every lane detected in these unlabeled images is formally counted as an FP, this means the percentage of images with recognized lane markings increased from 18.13% to 64.84%. Several representative lane detection results in foggy weather are shown in Figure 8. As seen in Figure 8a, for multiple and obstructed lane markings, the model still detects the lane marking blocked by the vehicle based on semantic information. Figure 8b shows an image taken from the perspective of a pedestrian on the roadside; unlike the other images, the lane marking is not located in the central area of the image and has a different angle and direction. Figure 8 also shows a case of low light and high fog density, in which one of the lanes is still detected despite the complexity of the scene. Overall, the SCNN model trained with the FoggyCULane dataset, which covers three fog densities, can achieve lane detection in more complex, realistic foggy scenes.
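Because these Internet-collected images lack annotations, the evaluation reduces to counting, per image, whether the model outputs any lane at all. A minimal sketch of this bookkeeping, with a hypothetical `detect_lanes` predictor standing in for the trained model:

```python
def detection_rate(images, detect_lanes):
    """Fraction of images in which the model outputs at least one lane.

    `detect_lanes` is a hypothetical predictor that returns a (possibly
    empty) list of lanes for one image. With no ground truth available,
    each detected lane is formally an FP, so a higher FP count here
    simply means more images with recognized lane markings.
    """
    detected = sum(1 for img in images if detect_lanes(img))
    return detected / len(images)
```

With 118 of 182 images yielding at least one detection, the rate is 118/182 ≈ 64.84%, matching Table 6; the baseline's 33 detections correspond to 18.13%.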


Application of Proposed Framework on VIL-100 Dataset
In this section, we applied our data augmentation framework to the VIL-100 dataset and generated FoggyVIL-100 to further evaluate the feasibility of the current method. We synthesized 5400 foggy images from the normal scenes in the VIL-100 dataset. The generated images contain three levels of simulated fog (β = 2, 3, 4), with 1800 images at each fog intensity level. Figure 9 shows our generated foggy images on VIL-100 together with the corresponding clear images and depth maps. The generated foggy images of all three fog densities were mixed with the original images to establish a new dataset named FoggyVIL-100, used to further test the effectiveness and feasibility of our proposed framework. Table 7 presents the file structure of VIL-100 and FoggyVIL-100, while Table 8 compares the scenarios in the test sets of VIL-100 and FoggyVIL-100. Note that the 300 hazy images in the original VIL-100 (real foggy scene 1, real foggy scene 2, and real foggy scene 3) were picked out to evaluate the performance of lane detection models trained on the different datasets.
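The fog rendering behind these synthesized images follows the atmospheric scattering model I(x) = J(x)t(x) + A(1 − t(x)) with transmission t(x) = exp(−βd(x)), where J is the clear image, d the predicted per-pixel depth, A the atmospheric light, and β the fog density (β = 2, 3, 4 throughout this work). A minimal sketch, assuming the depth map is already in the scale that β expects and that A is a single global constant:

```python
import numpy as np

def synthesize_fog(clear, depth, beta=2.0, atmosphere=0.9):
    """Render fog onto a clear image via the atmospheric scattering model.

    clear:      HxWx3 float array in [0, 1] (the fog-free image J)
    depth:      HxW float array of per-pixel scene depth d
    beta:       fog density; larger beta -> thicker fog
    atmosphere: global atmospheric light A (assumed scalar here)
    """
    t = np.exp(-beta * depth)[..., None]       # transmission t(x) = e^(-beta*d)
    return clear * t + atmosphere * (1.0 - t)  # I = J*t + A*(1 - t)
```

As β grows, transmission decays toward zero and distant pixels converge to the atmospheric light, which is what produces the three increasingly dense fog levels in FoggyCULane and FoggyVIL-100.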
The SCNN model was trained on the original VIL-100 dataset and on the FoggyVIL-100 dataset with default hyperparameters, respectively. Table 9 shows the lane detection results (F1-measure) of the SCNN model on the FoggyVIL-100 test set. As shown in Table 9, compared with the SCNN trained on the original VIL-100, the model trained on FoggyVIL-100 improved the F1-measure in all scenes, especially in the generated foggy scenes, where it increased from 72.84 to 81.57, from 54.00 to 79.22, and from 13.57 to 66.19 under the three fog densities, respectively. Meanwhile, the model performance in other weather conditions was not degraded; in fact, it mostly improved, which may result from the increased scale and enriched features of the training set, especially for the crowded, dazzle light, and night scenes, in which light conditions are commonly poor. Table 9 also shows that the performance of the SCNN model trained on FoggyCULane improved on the FoggyVIL-100 test set. This suggests that even when the camera position or level of brightness changes from one dataset to another, the foggy data augmentation approach remains valid for improving model performance.
Moreover, the models trained on VIL-100 and FoggyVIL-100 were also evaluated on the FoggyCULane test set, with the results presented in Table 10. Table 10 shows that the performance of the model was likewise improved by training on the augmented dataset, especially for the heaviest simulated foggy scene (from 11.10 to 36.51). Tables 11 and 12 present the evaluation results in real foggy scenes collected from the original VIL-100 dataset and from the Internet, respectively. As shown in Table 11, the augmented dataset improved the performance of the SCNN model in the three real foggy scenes by 12.41%, 3.35%, and 15.07%, respectively, and the FP value on real foggy images collected from the Internet also increased from 24 to 77, as shown in Table 12. The results in this section further demonstrate the feasibility and effectiveness of our method.

Conclusions
In this paper, we focus on improving the lane detection performance of neural networks in foggy conditions. We proposed a data augmentation method to synthesize foggy images from clear-weather images. A total of 107,451 foggy images were synthesized based on the CULane dataset using our method, expanding the scale of the original CULane dataset by a factor of 1.8 and alleviating the shortage of labeled lane data under foggy weather. A new lane detection dataset, FoggyCULane, containing three different densities of foggy weather scenarios, was thus built.
The results of our work indicate that artificially synthesizing images to expand the dataset can significantly improve the performance of lane detection in the corresponding complex scene. The F1-measure of the SCNN model under the three fog densities improved from 74.65, 51.41, and 11.09 to 86.65, 81.53, and 70.41, respectively. The improvement is especially notable in the higher fog density condition; meanwhile, the expansion of the dataset does not negatively affect performance in other complex scenes. Aside from the SCNN model, we also evaluated our method on other SOTA networks. All the models trained on the FoggyCULane dataset showed improved performance under foggy conditions, demonstrating the generality and applicability of the method.
The performance of our method in real foggy scenes also showed significant improvement, with the F1-measure increasing from 59.13, 57.01, and 40.15 to 66.82, 59.44, and 46.08 in the three real foggy scenes, respectively. In addition, this data augmentation method was further applied to another dataset, VIL-100, to test the adaptability of the approach. Similarly, it was found that even when the camera position or level of brightness changes from one dataset to another, the foggy data augmentation approach remains valid for improving model performance under foggy conditions without degrading the accuracy in other weather conditions. Therefore, the method proposed in this paper can effectively improve lane detection performance in foggy scenes for both highways and more complex urban roads, without introducing extra computation on resource-constrained devices.
In summary, a data augmentation method is proposed in this work to improve the performance of DL-based lane detection algorithms under foggy weather. This work is of practical significance for the following reasons. Firstly, the foggy image synthesis in our method avoids introducing extra sensors to acquire depth information, which means the method can be easily deployed on vision-based lane detection datasets. Additionally, although this work mainly focuses on the lane detection task under foggy weather, the proposed method is expected to be further extended to improve object recognition and semantic segmentation tasks in other complex scenes in future studies.