Intensity Thresholding and Deep Learning Based Lane Marking Extraction and Lane Width Estimation from Mobile Light Detection and Ranging (LiDAR) Point Clouds
Remote Sens. 2020, 12, 1379

Lane markings are one of the essential elements of road information, which is useful for a wide range of transportation applications. Several studies have been conducted to extract lane markings through intensity thresholding of Light Detection and Ranging (LiDAR) point clouds acquired by mobile mapping systems (MMS). This paper proposes an intensity thresholding strategy using unsupervised intensity normalization and a deep learning strategy using automatically labeled training data for lane marking extraction. For comparative evaluation, original intensity thresholding and deep learning using manually established labels are also implemented. A pavement surface-based assessment of lane marking extraction by the four strategies is conducted in asphalt and concrete pavement areas covered by an MMS equipped with multiple LiDAR scanners. Additionally, the extracted lane markings are used for lane width estimation and for reporting lane marking gaps along various highways. Normalized intensity thresholding leads to better lane marking extraction, with an F1-score of 78.9% compared to 72.3% for original intensity thresholding. The deep learning model trained with automatically generated labels achieves a higher F1-score (85.9%) than the one trained on manually established labels (75.1%). In concrete pavement areas, the normalized intensity thresholding and both deep learning strategies obtain better lane marking extraction (i.e., lane markings along longer segments of the highway have been extracted) than the original intensity thresholding approach. For the lane width results, the two deep learning models yield more estimates than the intensity thresholding strategies, especially in areas with poor edge lane markings, owing to their higher recall. The outcome of the proposed strategies is used to develop a framework for reporting lane marking gap regions, which can be subsequently visualized in RGB imagery to identify their cause.


Introduction
Reliable identification of lane markings, including dashed lines, edge lines, arrows, and crosswalk markings, is important for autonomous driving and advanced driver assistance systems (ADAS) applications. Lane markings with high reflectivity on roadways can guide drivers and control traffic activities.
Furthermore, an accurate lane marking inventory is the foundation for various transportation applications, such as the development of detailed high definition (HD) maps, lane guidance, roadway maintenance, and road network optimization. Thus, lane marking extraction has become an essential process for many transportation applications. Table 1 provides a summary of recent lane marking extraction strategies based on different sensing modalities while listing their merits and shortcomings.
Several approaches have been proposed to extract lane markings from imagery acquired by terrestrial and airborne platforms. Hernandez et al. [1] extracted lane markings using vehicle-based imagery. First, lane markings were extracted using edge and color information. The lane marking parameters were then calculated using linear fitting. Jung et al. [2] also detected lane markings from vehicle-based imagery. They first generated spatiotemporal imagery by accumulating the pixels on a horizontal scanline along a time axis for each frame. The lane markings were then detected using the Hough transform. For airborne platforms, Azimi et al. [3] proposed Aerial LaneNet, a fully convolutional neural network (CNN) [4], for detecting lane markings in aerial imagery. However, lane markings in imagery can be occluded by vehicles and other human-made features. Image-based approaches are also affected by weather and lighting conditions. In addition, the size and resolution of available imagery limit the ability to detect all lane markings.
Recently, there has been increasing interest in using LiDAR-based Mobile Mapping Systems (MMS), which can collect three-dimensional (3D) point cloud data for transportation applications [5-15]. This trend is motivated by the fact that LiDAR sensors can operate under different lighting and weather conditions. Moreover, these sensors can deliver 360-degree surround perception that eliminates the occlusion problem. Several researchers, thus, have resorted to LiDAR-based MMS point clouds for lane marking extraction. The generic workflow involves extracting road surface point clouds from the original ones followed by intensity-based differentiation of lane marking points from non-lane marking points.
Lane marking extraction approaches from LiDAR data can be categorized into two groups: (1) two-dimensional (2D) rasterized intensity image-driven detection and (2) 3D point cloud-driven extraction. For detecting lane markings from rasterized images, Guan et al. [5] generated georeferenced intensity images from road surface point clouds using an Inverse Distance Weighting (IDW) strategy. After that, lane markings were extracted from the intensity images through multiple scanning-distance-based thresholds. Finally, Otsu's thresholding and morphological closing were used to refine the extracted lane markings. Kumar et al. [6] first generated two raster images based on intensity and range values. Then, a threshold based on the range and cross-slope values was used for extracting lane markings. Finally, morphological operations were utilized to complete the lane markings and remove false positives. Soilán et al. [7] extracted potential lane markings from rasterized images by modeling the intensity distribution using a Gaussian Mixture Model. They first extracted the road surface from the original point cloud. Two classes were hypothesized: a pavement class with low-intensity values and a greater fraction of points, and a lane marking class with high-intensity values and a smaller number of points. Each point was assigned to the class with maximum posterior probability. The points belonging to the low-intensity class were removed, which ensured that minimal data was processed to generate intensity images. Finally, Otsu's thresholding and area-based filtering were applied to the intensity images for lane marking extraction. Cheng et al. [8] also applied Otsu's thresholding for lane marking extraction. They first corrected the intensity values in the original point cloud using the scan angle rank to eliminate intensity variation caused by varying incidence angles. Based on their assumption of a planar ground surface, the scan angle rank recorded by their LiDAR-based MMS was considered very close to the incidence angle. Next, a road surface point cloud segmented from the corrected point cloud was used to generate intensity images. Then, a large-size, high-pass enhancement was applied to remove gradual variation of intensity in these images. Finally, Otsu's threshold was applied to extract lane markings.

Ghallabi et al. [9] presented another intensity-image-based lane marking detection strategy. They chose a cell size of 15 cm, which is based on the width of lane markings, for generating the intensity images. The lane markings were then detected using the Hough transform, where the lines were parametrized by the polar representation (γ, θ), with γ representing the distance between the vehicle and the lane marking and θ representing the vehicle's heading relative to the lane marking. In their approach, certain constraints were imposed to eliminate false lane marking detections. A detection was considered valid if the parametrized line was approximately parallel to the driving direction. Thereafter, among the detected lines, the line with the maximum number of other lines parallel to it was defined as the reference line, and all lines not parallel to it were removed. Finally, line fusion was performed if the remaining lines lay within a certain distance threshold from each other. Jung et al. [10] proposed an "inpainting" algorithm to fill holes in the intensity image caused by the high speed of the MMS. They used the Laplace equation to fill the center pixel (the hole in an intensity image to be painted) based on a weighted average of neighboring pixels. In the next step, the inpainted intensity image was assumed to have a bimodal intensity distribution with two classes: lane markings and non-lane markings. Then, an iterative Expectation-Maximization algorithm was applied to extract potential lane markings. In order to deal with over-segmentation problems arising from worn-out lane markings, they further proposed a line association strategy. Line parameters such as orientation and distance from the origin were computed for each lane marking, followed by grouping lane markings that show similar topology according to these parameters. Finally, remaining false positives were removed using a filter based on the Dip test statistic [16].
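Several of the raster-based strategies above rely on Otsu's thresholding to separate bright lane marking pixels from darker pavement. As an illustration only, the following minimal sketch computes an Otsu threshold from a list of intensity values using the Python standard library. The function name `otsu_threshold` and the histogram size are assumptions made for this example and are not taken from any of the cited studies.

```python
def otsu_threshold(values, bins=256):
    """Return the intensity that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: no meaningful split exists
    width = (hi - lo) / bins

    # Build a histogram of the intensity values.
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1

    total = len(values)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    count0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 holds bins 0..i, class 1 holds the remaining bins.
        count0 += hist[i]
        sum0 += centers[i] * hist[i]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        mean0 = sum0 / count0
        mean1 = (sum_all - sum0) / count1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = count0 * count1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t
```

Values greater than or equal to the returned threshold would be treated as lane marking candidates. The cited studies combine this step with additional refinement (morphological operations, area filters, or density filters), which is not shown here.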
For directly extracting lane markings from point clouds, Yu et al. [11] first divided the road surface point cloud into multiple blocks across the driving direction. Subsequently, an intensity threshold was determined using Otsu's thresholding strategy for extracting lane markings. Finally, for eliminating false positives, a spatial density filter was applied to remove points with a lower spatial density in comparison to lane marking points. Yan et al. [12] separated the LiDAR point cloud into scan lines because each scan line contains fewer points to process. They then applied an intensity-based filter to remove non-lane marking points while preserving lane marking edge points. Finally, all points falling between the edge points were extracted as lane marking points. Jeong et al. [13] proposed an intensity calibration procedure for lane marking extraction before applying Otsu's thresholding strategy. They assumed that if the incidence angle and the scanning distance for two surfaces were similar, then the ratio of their intensity values would be similar to the ratio of their reflectance. Accordingly, a calibrated intensity value was calculated as the product of a constant reference reflectance and the ratio of uncalibrated intensity to reference intensity.
Recently, there has been growing interest in extracting lane markings from LiDAR-based MMS point clouds using learning-based approaches, such as machine learning and deep learning. He et al. [14] presented a lane marking detection algorithm based on CNN. The intensity values were normalized using their mean and standard deviation. Then, they were re-scaled to the [0, 255] range in order to generate intensity images. They selected 2729 manually labeled intensity images to train the CNN model for detecting lane markings. Wen et al. [15] also developed a deep learning-based lane marking detection strategy. They first rasterized the intensity values of the road surface point cloud into intensity images. Two different U-net models [17] were then trained with 3000 images along a highway and urban areas and 1000 images covering an underground garage (all manually labeled). In spite of their promise, the bottleneck of learning-based approaches is the generation of sufficient training and validation data.
In summary, the majority of existing approaches aim at extracting lane markings using an appropriate intensity threshold combined with intensity calibration and/or outlier removal strategies. However, these strategies require prior knowledge or assumptions regarding the road surface intensity distribution to determine the thresholds and eliminate false positives. Most of the above studies have only been tested or evaluated in small areas, and it has not been investigated whether they can cope with complex road geometry. On the other hand, recently proposed learning-based approaches can more effectively solve the problem of intensity variation, eliminating the need for multiple thresholds. However, such approaches require a large number of manually labeled training images. Further, to the best of the authors' knowledge, no study has been conducted that analyzes lane marking extraction performance in the context of the nature of the pavement surface (asphalt and concrete). This paper addresses these challenges by introducing two strategies for lane marking extraction (intensity thresholding-based and deep learning-based approaches). The main contributions of this research can be summarized as follows:

1. A lane marking extraction strategy is developed by thresholding normalized intensity values from multi-beam spinning LiDAR. The intensity normalization can be applied in any environment without the need for reference targets.

2. For the deep learning strategy, an automated labeling procedure is developed, which can generate a large number of training samples in order to detect lane markings from LiDAR intensity images. In addition, a refinement strategy for the predictions has been developed to deliver corresponding LiDAR points for the extracted lane markings.

3. In order to compare the performance of the proposed lane marking extraction strategies, state of the art approaches based on original intensity thresholding (i.e., without intensity normalization) [18] and deep learning using manually established labels [15] are also implemented.

4. It is hypothesized that the performance of the lane extraction procedure depends to a high degree on the pavement type. Therefore, a pavement surface-based evaluation of the lane marking extraction strategies in asphalt and concrete areas is conducted.

5. Lane markings are extracted using the above four strategies from LiDAR-based MMS point clouds, collected on two-lane highways with a total length of 67 miles, which have different road geometry, such as turning lane, merging lane, and intersection areas. Additionally, this dataset can serve as a benchmark for performance evaluation of lane marking extraction algorithms.

6. As a further evaluation of the performance of different lane marking extraction strategies, lane width estimates are derived for each strategy across the different datasets. These estimates have been compared to manually derived ones.

7. Derived lane marking from the proposed strategies can be utilized to report lane marking gap regions. This reporting mechanism is quite valuable for departments of transportation (DOT) as it can be used to prioritize maintenance operations and gauge their infrastructure readiness for autonomous driving.

The remainder of this paper is organized as follows: Section 2 introduces the LiDAR-based MMS used in this research. Section 3 describes the LiDAR-based MMS point cloud data collected from different test sites. Then, the four lane marking extraction strategies, lane width estimation procedure, and lane marking gap reporting algorithm are described in Section 4, followed by Section 5 that discusses the lane marking extraction results and subsequent lane width estimation. Finally, concluding remarks regarding the different strategies and potential directions for future research are summarized in Section 6.

Mobile LiDAR System Used in This Research
The 3D point cloud datasets used in this research were captured by a wheel-based MMS, the Purdue Wheel-based Mobile Mapping System-High Accuracy (PWMMS-HA). Four 3D LiDAR scanners are mounted on the PWMMS-HA (as shown in Figure 1): three Velodyne HDL-32E and one Velodyne VLP-16 Puck Hi-Res. The HDL-32E scanner has 32 radially oriented laser rangefinders that are aligned vertically from +10.67° to −30.67°, making up a total vertical field of view (FOV) of 41.34°. The HDL-32E can capture around 700,000 points per second with a maximum range of 100 m (at an accuracy of ±2 cm) [19]. The VLP-16 scanner, on the other hand, consists of 16 radially oriented laser rangefinders from −10° to +10° (i.e., 20° vertical FOV). The VLP-16 can capture around 300,000 points per second with a maximum range of 100 m (at an accuracy of ±3 cm) [20]. All four LiDAR scanners can rotate to achieve a 360° horizontal FOV. In addition, three FLIR Grasshopper3 9.1MP GigE cameras (two forward-facing and one rear-facing) are also mounted on the PWMMS-HA. All the cameras are synchronized to capture RGB imagery with a maximum resolution of 9.1 MP at a rate of 1 frame per second per camera. The LiDAR and imaging sensors are georeferenced by an Applanix Position and Orientation System for Land Vehicles (POSLV) 220 Global Navigation Satellite System (GNSS)/Inertial Measurement Unit (IMU) navigation system. The GNSS collection rate is 20 Hz, and the IMU measurement rate is 200 Hz. After GNSS/inertial navigation system (INS) post-processing, the attitude accuracy is ±0.020°, and the positional accuracy is ±2 cm [21]. The expected accuracy of the derived point cloud while considering the LiDAR and navigation system specifications is roughly 2-4 cm at a range of 30 m. This accuracy is estimated using the LiDAR Error Propagation calculator developed by Habib et al. [22].
In order to reconstruct georeferenced and well-registered point clouds from the different LiDAR scanners, a system calibration procedure [23] is used for estimating the mounting parameters between the onboard LiDAR scanners and the GNSS/IMU unit. A simultaneous LiDAR-camera calibration [24] is also conducted to estimate the mounting parameters of the onboard cameras for the registration of LiDAR point clouds with imagery. Thus, forward and backward projection between the reconstructed point cloud and RGB imagery can be established using the estimated camera mounting parameters and trajectory information. This projection facilitates the analysis of the performance of the different lane marking extraction strategies. As an example, Figure 2 illustrates a corresponding image and LiDAR point cloud, where the red dot in the latter is projected onto the corresponding image (displayed as an empty magenta circle). Hereafter, a red dot will be used to represent a location in the LiDAR point cloud, while an empty magenta circle will be used to show the same location in RGB imagery.
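The forward projection described above can be sketched as follows. This is a simplified illustration, not the calibration model of [23,24]: it assumes an ideal pinhole camera without lens distortion, a camera frame whose z-axis is the optical axis, a body-to-map rotation taken from the trajectory, and a camera-to-body rotation and lever arm taken from the mounting parameters. All function and parameter names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]

def project_point(point_map, body_pos, body_rot, mount_offset, mount_rot,
                  focal_px, principal_pt):
    """Project a mapping-frame point into pixel coordinates (u, v).

    body_pos, body_rot : vehicle position and body-to-map rotation (trajectory)
    mount_offset, mount_rot : camera lever arm and camera-to-body rotation
    focal_px, principal_pt : assumed pinhole intrinsics in pixels
    """
    # Camera pose in the mapping frame: trajectory pose composed with the
    # mounting parameters.
    cam_pos = [b + o for b, o in zip(body_pos, mat_vec(body_rot, mount_offset))]
    cam_rot = mat_mul(body_rot, mount_rot)  # camera-to-map rotation

    # Express the point in the camera frame.
    diff = [p - c for p, c in zip(point_map, cam_pos)]
    x, y, z = mat_vec(transpose(cam_rot), diff)
    if z <= 0:
        return None  # the point lies behind the camera

    cx, cy = principal_pt
    return (cx + focal_px * x / z, cy + focal_px * y / z)
```

Backward projection (from a pixel to the point cloud) would require intersecting the corresponding image ray with the point cloud and is not covered by this sketch.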

Datasets
Three datasets are utilized in this research to evaluate the performance of the different lane marking extraction strategies. These datasets were acquired over two highways (the first two on an interstate highway and the third on a rural highway). The collection date, sensors used, length, average local point spacing (LPS) [25], average driving speed, and main pavement type of each dataset are listed in Table 2. The datasets include both concrete and asphalt pavement areas. In dataset 1, as shown in Figure 3a, approximately 2.49 miles of the point cloud were collected along concrete pavement and 15.55 miles along asphalt pavement. For dataset 2, as shown in Figure 3b, around 6.28 miles of the total 33.87-mile-long point cloud cover concrete pavement. Finally, only 2.23 miles of the total 15.29-mile-long point cloud in dataset 3 were collected in asphalt pavement areas, as shown in Figure 3c.
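As a quick consistency check of the lengths quoted above, the short sketch below tallies the concrete and asphalt mileage across the three datasets. It assumes that concrete and asphalt together cover each dataset completely, which the text implies but does not state explicitly.

```python
# Lengths in miles, taken from the dataset description above.
datasets = {
    1: {"total": 2.49 + 15.55, "concrete": 2.49},
    2: {"total": 33.87, "concrete": 6.28},
    3: {"total": 15.29, "concrete": 15.29 - 2.23},
}

total = sum(d["total"] for d in datasets.values())
concrete = sum(d["concrete"] for d in datasets.values())
asphalt = total - concrete

print(f"total:    {total:.2f} miles")
print(f"concrete: {concrete:.2f} miles")
print(f"asphalt:  {asphalt:.2f} miles")
```

Under that assumption, the combined length is about 67.2 miles, in line with the 67 miles quoted in the contributions, with roughly 21.8 miles on concrete and 45.4 miles on asphalt.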


Methodology
The proposed framework for lane marking extraction is illustrated in Figure 4. First, the road surface is identified from the LiDAR point cloud. Lane markings are directly extracted from the road surface point cloud using the original and normalized intensity thresholding strategies. For the two deep learning approaches, the road surface point cloud is rasterized into intensity images. Two U-net models are trained, one on manually established labels and one on automatically generated labels. The automatically generated labels are based on lane markings extracted through the normalized intensity thresholding strategy. For evaluating the performance of the different strategies, the obtained lane markings are compared with manually labeled ones. In addition, the extracted lane markings are utilized to derive lane width estimates using a strategy adapted from the one proposed by Ravi et al. [18]. As a further quantitative evaluation, these lane width estimates are also compared to manually derived values. Finally, the lane markings are also analyzed for reporting lane marking gaps.


Lane Marking Extraction Approaches
In this section, different lane marking extraction strategies are described. We collectively refer to the original and normalized intensity thresholding strategies as "intensity thresholding approaches." The deep learning strategies using manually derived and automatically established labels are denoted as "deep learning approaches." As mentioned earlier, the lane marking extraction procedure starts with the identification of the point cloud pertaining to the road surface. In this research, the road surface identification is based on the GNSS/INS trajectory as well as a rough estimate of the IMU height above the road surface. For more details regarding this procedure, interested readers can refer to Ravi et al. [18].

Intensity Thresholding Approaches
Original Intensity Thresholding Strategy
Using the original intensity values, one can use a single threshold (Th_I), e.g., the one defined by the 5th percentile intensity value [18], to extract hypothesized lane markings from the road surface point cloud, as shown in Figure 5. However, in concrete pavement areas, simple thresholding would result in hypothesized lane markings with significant false positives (hereafter referred to as "noise"). This scenario is shown in Figure 6, where more noise is observed since lane markings and pavement have similarly high intensity values in concrete pavement regions. Such low intensity contrast negatively affects the performance of a simple thresholding strategy.
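The single-threshold extraction can be sketched as below. The sketch interprets the "5th percentile" threshold as retaining roughly the brightest 5% of road surface points; this interpretation, the tuple layout of a point, and the function names are assumptions made for illustration and should be checked against Ravi et al. [18].

```python
def top_percentile_threshold(intensities, top_percent=5.0):
    """Return the lowest intensity among the brightest `top_percent` % of points."""
    ordered = sorted(intensities)
    # Number of points to keep (at least one).
    keep = max(1, int(round(len(ordered) * top_percent / 100.0)))
    return ordered[len(ordered) - keep]

def extract_candidates(points, top_percent=5.0):
    """Extract hypothesized lane marking points from a road surface point cloud.

    points : list of (x, y, z, intensity) tuples (hypothetical layout)
    """
    th_i = top_percentile_threshold([p[3] for p in points], top_percent)
    # Points tied with the threshold are kept, so slightly more than
    # `top_percent` % of the points may be returned.
    return [p for p in points if p[3] >= th_i]
```

Because the threshold depends only on the intensity distribution, bright concrete pavement competes with lane markings for the retained 5% of points, which is one way to see why this strategy produces more noise in concrete areas.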


Normalized Intensity Thresholding Strategy
In order to solve the low contrast issue, which is more pronounced in concrete pavement area, this research adopts an intensity normalization strategy for an MMS with one or more multi-beam LiDAR scanners. The normalization process is based on the assumption that intensity values across laser beams should be similar for the same objects [26,27]. In this strategy, the normalized counterpart of an intensity value observed by a particular beam is the conditional expectation of intensity readings by other beams for the same areas where that beam observed the given intensity value. This normalization is applied to each multi-beam LiDAR scanner mounted on the MMS to obtain corresponding normalized intensity values of all laser beams for that scanner. Figure 7 illustrates an overview of the normalized intensity thresholding strategy for a multi-beam LiDAR-based MMS. First, for a given dataset, a small section is randomly chosen from the road surface point cloud captured by the MMS for the intensity normalization map generation. In case that the small point

Normalized Intensity Thresholding Strategy
In order to solve the low contrast issue, which is more pronounced in concrete pavement area, this research adopts an intensity normalization strategy for an MMS with one or more multi-beam LiDAR scanners. The normalization process is based on the assumption that intensity values across laser beams should be similar for the same objects [26,27]. In this strategy, the normalized counterpart of an intensity value observed by a particular beam is the conditional expectation of intensity readings by other beams for the same areas where that beam observed the given intensity value. This normalization is applied to each multi-beam LiDAR scanner mounted on the MMS to obtain corresponding normalized intensity values of all laser beams for that scanner. Figure 7 illustrates an overview of the normalized intensity thresholding strategy for a multi-beam LiDAR-based MMS. First, for a given dataset, a small section is randomly chosen from the road surface point cloud captured by the MMS for the intensity normalization map generation. In case that the small point

Normalized Intensity Thresholding Strategy
In order to solve the low contrast issue, which is more pronounced in concrete pavement area, this research adopts an intensity normalization strategy for an MMS with one or more multi-beam LiDAR scanners. The normalization process is based on the assumption that intensity values across laser beams should be similar for the same objects [26,27]. In this strategy, the normalized counterpart of an intensity value observed by a particular beam is the conditional expectation of intensity readings by other beams for the same areas where that beam observed the given intensity value. This normalization is applied to each multi-beam LiDAR scanner mounted on the MMS to obtain corresponding normalized intensity values of all laser beams for that scanner. Figure 7 illustrates an overview of the normalized intensity thresholding strategy for a multi-beam LiDAR-based MMS.

Normalized Intensity Thresholding Strategy
In order to solve the low contrast issue, which is more pronounced in concrete pavement areas, this research adopts an intensity normalization strategy for an MMS with one or more multi-beam LiDAR scanners. The normalization process is based on the assumption that intensity values across laser beams should be similar for the same objects [26,27]. In this strategy, the normalized counterpart of an intensity value observed by a particular beam is the conditional expectation of the intensity readings by other beams over the same areas where that beam observed the given intensity value. This normalization is applied to each multi-beam LiDAR scanner mounted on the MMS to obtain the corresponding normalized intensity values of all laser beams for that scanner. Figure 7 illustrates an overview of the normalized intensity thresholding strategy for a multi-beam LiDAR-based MMS.

First, for a given dataset, a small section is randomly chosen from the road surface point cloud captured by the MMS for the intensity normalization map generation. If this small point cloud is captured by more than one scanner, the LiDAR data should be split according to the scanner used. Subsequently, the intensity normalization approach proposed by Levinson and Thrun [26,27] is applied to the small road surface point cloud from each LiDAR scanner. The adopted approach proceeds according to the following steps:

1. The small road surface point cloud is gridded into cells. Each cell stores the list of points that lie within its bounding box. For each point, only the intensity value and laser beam ID are stored.

2. In order to compute the normalized intensity value of a laser beam j that recorded an intensity value a, we seek all cells that contain the pair (j, a) in the raster grid. The average intensity is computed over these cells while excluding intensity values recorded by laser beam j. The normalized intensity of (j, a) is the resulting average. The original and normalized intensity values are stored in a lookup table (LUT) for the scanner/dataset in question.

3. For the intensity values that are not observed in the small road surface point cloud, their normalized counterparts can be calculated by interpolation, using the normalized values associated with the observed intensities.
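The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the input layout (x, y, beam ID, intensity), the pooled per-point averaging over all matching cells, and the use of linear interpolation clamped at the ends of the observed range are all choices made here for illustration.

```python
from collections import defaultdict


def build_normalization_lut(points, cell_size):
    """points: iterable of (x, y, beam_id, intensity).
    Returns {(beam_id, intensity): normalized_intensity}."""
    # Step 1: grid the points; each cell keeps only (beam_id, intensity) pairs.
    cells = defaultdict(list)
    for x, y, beam, inten in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((beam, inten))

    # Step 2: for each pair (j, a), pool the intensities recorded by the OTHER
    # beams in every cell containing (j, a), then average them (assumed pooling).
    sums = defaultdict(float)
    counts = defaultdict(int)
    for members in cells.values():
        for pair in set(members):
            others = [i for b, i in members if b != pair[0]]
            sums[pair] += sum(others)
            counts[pair] += len(others)
    return {p: sums[p] / counts[p] for p in sums if counts[p] > 0}


def interpolate_missing(lut, beam, intensity):
    """Step 3: estimate the normalized value of an unobserved intensity.
    Linear interpolation with clamping is an assumption of this sketch."""
    known = sorted((a, v) for (b, a), v in lut.items() if b == beam)
    if not known:
        return None
    if intensity <= known[0][0]:
        return known[0][1]
    if intensity >= known[-1][0]:
        return known[-1][1]
    for (a0, v0), (a1, v1) in zip(known, known[1:]):
        if a0 <= intensity <= a1:
            t = (intensity - a0) / (a1 - a0)
            return v0 + t * (v1 - v0)


# Tiny hypothetical example: two 1 m cells, three beams.
sample = [(0.1, 0.1, 0, 10), (0.2, 0.2, 1, 20), (0.3, 0.3, 2, 30),
          (1.1, 0.1, 0, 10), (1.2, 0.1, 1, 40)]
lut = build_normalization_lut(sample, cell_size=1.0)
```

In practice, one such table would be built per scanner and per dataset, as described in the text.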
One should note that, in this research, the small road surface point cloud is randomly selected in a concrete pavement area, which exhibits higher minimum and maximum intensity values. Using asphalt pavement regions, where the majority of the intensity values are of lower magnitude, might map the high-intensity values of both lane markings and concrete pavements to a similar value. This defeats the purpose of increasing the intensity contrast between lane markings and pavement surface. In addition, it is assumed that the map generated from the small road surface point cloud of concrete pavements would not negatively affect the intensity contrast between lane markings and asphalt pavement. The performance metrics for lane marking extraction in dataset 2 (asphalt dominant) further validate this assumption, as will be reported in Section 5.1.2.

The choice of the cell size for generating the intensity normalization map was not addressed by Levinson and Thrun [26,27]. The cell size plays a key role: a large cell size might place more than one type of object surface in a single cell, whereas within a small cell the laser beam returns could be too sparse for evaluating a reliable average intensity value. In this research, the cell size is based on the LPS of the point cloud [25]. Prior to intensity normalization, the LPS is evaluated for the small road surface point cloud captured by each scanner. The cell size is then set to the respective LPS scaled by a multiplication factor threshold (Th_MF). Since a change in the driving speed from one dataset to another could lead to differences in LPSs, the intensity normalization maps should be generated for each dataset. The intensity values captured by a given scanner are then normalized using the respective LUT for the dataset in question. Finally, hypothesized lane markings can be extracted from the normalized road surface point cloud using the 5th percentile intensity threshold.

Deep Learning Approaches
In this research, two U-net models [17] are trained, one using manually established labels and one using automatically generated labels. Figure 8 illustrates an overview of the proposed deep learning-based lane marking extraction and U-net model training framework. The first step in this process is to generate intensity images through a 3D-to-2D mapping process. Extracting lane markings from the intensity images is a binary classification task where each pixel is labeled as either belonging to a lane marking or not. This classification task is performed by training a U-net model to identify lane marking pixels in the intensity images. The following subsections describe intensity image generation and labeling, U-net model training, and refinement of U-net predictions.

Intensity Image Generation and Labeling
For generating an intensity image, it is crucial to choose a cell size that maintains the lane marking details in the derived image while limiting computations. The cell size selection should consider both the width of the mapped roads and the LPS of the available data. The width of the surveyed highway roads in this research ranges from 12 to 16 m in different regions of the three datasets (i.e., covering two-lane highways including shoulder width). Therefore, the road surface point cloud is partitioned into blocks of length 12.8 m along the driving direction (Figure 9). Further, the LPS analysis of the datasets suggests a point density equivalent to a cell size of approximately 5 cm. Therefore, an image size fixed at 256 × 256 (for U-net input), with a 5 cm cell size, ensures minimal resizing along the length and width of the block while maintaining the level of detail in the point cloud. After partitioning, an intensity enhancement is applied to each point cloud block by choosing a threshold (Th_EN), e.g., the 5th intensity percentile. Intensity values greater than this threshold are set to 255, while lower intensity values are maintained.

When generating intensity images, the pixel values are derived from the enhanced intensity within the point cloud block. For each cell, the pixel value is defined as the average of the intensity values of the points falling in it. A second level of enhancement is applied to the generated intensity images, e.g., using a 5th intensity percentile threshold. The two-step enhancement (in the point cloud block and in the intensity image) helps in amplifying the pixel values corresponding to lane markings and facilitates easier inference from the intensity image by the U-net model. For the following discussion, we hereafter refer to the enhanced image as an "intensity image." An original road surface point cloud and the corresponding intensity image are shown in Figure 10.
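As a rough illustration of the 3D-to-2D mapping described above, the following Python sketch rasterizes one point cloud block into a 256 × 256 grid with a 5 cm cell size and averages the (enhanced) intensities per cell. The function name, the block-local coordinate convention, the treatment of empty cells as 0, and the omission of any resizing and of the second, image-level enhancement are assumptions made for brevity.

```python
import math


def block_to_intensity_image(points, x0, y0, cell=0.05, size=256, th_en=None):
    """points: iterable of (x, y, intensity) in metres; (x0, y0) is the block's
    lower-left corner. Returns a size x size nested list holding the mean
    intensity per cell (0.0 for empty cells, an assumption of this sketch)."""
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for x, y, inten in points:
        col = math.floor((x - x0) / cell)
        row = math.floor((y - y0) / cell)
        if not (0 <= row < size and 0 <= col < size):
            continue  # point lies outside the 12.8 m x 12.8 m block
        if th_en is not None and inten > th_en:
            inten = 255  # first-level enhancement: saturate values above Th_EN
        sums[row][col] += inten
        counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(size)] for r in range(size)]


# Hypothetical points: two fall in the same cell, one lies outside the block.
pts = [(0.12, 0.03, 100), (0.13, 0.04, 50), (13.0, 0.0, 200)]
plain = block_to_intensity_image(pts, 0.0, 0.0)
enhanced = block_to_intensity_image(pts, 0.0, 0.0, th_en=90)
```

With a 5 cm cell, 256 cells span exactly 12.8 m, which is why the block length and the image size fit together without resampling along the driving direction.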
For U-net training, some intensity images are utilized to establish labels manually for the first U-net model (referred to as "U-net model 1" in Figure 8). For the second U-net model (referred to as "U-net model 2" in Figure 8), labels are generated automatically using lane markings obtained from the normalized intensity thresholding after noise removal according to the following steps:

1. The noise removal strategy proposed by Ravi et al. [18] (the details of which are described in Section 4.2) is applied to the hypothesized lane markings. Figure 11a,b illustrate the outcome from the normalized intensity thresholding strategy before and after noise removal.

2. The point cloud after noise removal, as shown in Figure 11b, is then divided into 12.8 m-long blocks, which are converted into images with a pixel size of 5 cm and an image size of 256 × 256.

3. To ensure better spatial structure for the markings, a bounding box is created around each lane marking in the resulting intensity image, as shown in Figure 11c. Thereafter, all pixels falling within the bounding box are labeled as lane marking pixels. The resultant image, as shown in Figure 11d, serves as a labeled image for the training of U-net model 2.
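Step 3 of the automated labeling can be illustrated with a small Python sketch. It assumes that each lane marking corresponds to one connected group of non-zero pixels (8-connectivity) and that the bounding box is axis-aligned in the image; neither detail is specified in the text, and the function name is hypothetical.

```python
from collections import deque


def bounding_box_labels(image):
    """image: 2D nested list; non-zero pixels are lane marking candidates.
    Returns a 0/1 label image in which the axis-aligned bounding box of every
    connected component is filled with 1."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one component, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                i, j = queue.popleft()
                r0, r1 = min(r0, i), max(r1, i)
                c0, c1 = min(c0, j), max(c1, j)
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and image[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
            # Label every pixel inside the component's bounding box.
            for i in range(r0, r1 + 1):
                for j in range(c0, c1 + 1):
                    labels[i][j] = 1
    return labels


demo = [[0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
demo_labels = bounding_box_labels(demo)
```

Filling the box closes small holes and ragged edges left by thresholding, which is one way to obtain the more regular marking shapes mentioned in the text.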

U-Net Model Training
U-net is a fully convolutional neural network (CNN) proposed by Ronneberger et al. [17] for biomedical image segmentation. The adopted network architecture is shown in Figure 12. In our implementation, batch normalization [28] is incorporated since it helps the U-net model train faster by reducing the internal covariate shift and allowing higher learning rates. Considering the disparity in the number of pixels between the lane marking and non-lane marking classes, this research chose a loss function based on the Dice coefficient, which measures the degree of overlap between two sets (here, the ground truth and predicted lane marking pixels). The Dice coefficient [29] is defined as in Equation (1), where y_true and y_pred represent the ground truth and predicted pixels for the lane markings. Each pixel takes a value of either 0 or 1 depending on whether it belongs to the non-lane marking or lane marking class, respectively. The Dice coefficient ranges from 0 to 1, where perfect overlap gives a value of 1. Minimizing this loss function leads to the maximization of the Dice coefficient and hence the degree of overlap between the ground truth and predicted lane markings.

In order to evaluate the performance of all strategies, precision, recall, and F1-score are used, as represented by Equations (2)-(4), where TP, FP, and FN are the true positives, false positives, and false negatives, respectively. Precision signifies how accurate the positive predictions are, whereas recall indicates how well the true lane markings are detected. The F1-score, which is used to quantify the overall performance, is the harmonic mean of precision and recall.
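For reference, the quantities named above can be written out in a short Python sketch using their standard definitions: Dice = 2|y_true ∩ y_pred| / (|y_true| + |y_pred|), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The exact form of Equations (1)-(4) and of the loss is not reproduced in this section, so the "1 − Dice" loss below is an assumption (it is the common choice and is consistent with the statement that minimizing the loss maximizes the Dice coefficient). Function names are illustrative.

```python
def dice_coefficient(y_true, y_pred):
    """y_true, y_pred: flat sequences of 0/1 pixel labels."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    # Two empty masks overlap perfectly by convention (assumption).
    return 2.0 * intersection / total if total else 1.0


def dice_loss(y_true, y_pred):
    """Assumed loss form: 1 - Dice, so lower loss means higher overlap."""
    return 1.0 - dice_coefficient(y_true, y_pred)


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


truth = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
p, r, f1 = precision_recall_f1(truth, pred)
```

For hard 0/1 masks, the Dice coefficient and the F1-score are the same number, since both reduce to 2TP / (2TP + FP + FN); the Dice form is convenient as a loss because it also accepts soft (probabilistic) predictions.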

Refinement of U-Net Predictions
After detection by the trained U-net model, the predicted 2D lane marking images are projected back to 3D to derive lane marking points. Due to the raster nature of the images, the derived 3D points will be regularly spaced at a 5 cm interval. In order to derive lane markings with a point density similar to that of the original road surface point cloud, the back-projected 3D points are used to generate 2D masks. First, a square buffer cell with a 5 cm side length is created around each projected point along the XY-plane. All neighboring cells are then merged to form mask regions, as shown in Figure 13a. As a refinement of the predicted lane markings, mask regions with areas smaller than a pre-defined threshold (Th_area) are removed. The value of Th_area is based on the cell size of the intensity image and the minimum area of a dash lane marking. Finally, the original points that fall within the remaining mask regions, and whose intensity is within the 5th percentile intensity value, are extracted as the final lane marking predictions, as shown in Figure 13b,c.
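The refinement can be sketched as follows. This is a simplified illustration: the buffer cell around each back-projected point is approximated by the 5 cm grid cell containing it, neighboring cells are merged with 8-connectivity, and `th_intensity` stands for the intensity cutoff implied by the percentile criterion (points at or above it are kept). These choices and all names are assumptions, not details given in the text.

```python
import math
from collections import deque


def refine_predictions(projected, original, cell=0.05, th_area=0.0,
                       th_intensity=0.0):
    """projected: (x, y) of back-projected predicted points.
    original: (x, y, intensity) of the original road surface points.
    Returns the original points kept as final lane marking predictions."""
    occupied = {(math.floor(x / cell), math.floor(y / cell))
                for x, y in projected}

    # Merge neighbouring cells into mask regions and drop small regions.
    kept, seen = set(), set()
    for start in occupied:
        if start in seen:
            continue
        region, queue = [], deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            region.append((i, j))
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    n = (i + di, j + dj)
                    if n in occupied and n not in seen:
                        seen.add(n)
                        queue.append(n)
        if len(region) * cell * cell >= th_area:
            kept.update(region)

    # Keep bright original points that fall inside the remaining masks.
    return [(x, y, inten) for x, y, inten in original
            if inten >= th_intensity
            and (math.floor(x / cell), math.floor(y / cell)) in kept]


# Hypothetical data: a 3-cell region (kept) and an isolated cell (removed).
proj = [(0.025, 0.025), (0.075, 0.025), (0.125, 0.025), (1.025, 1.025)]
orig_pts = [(0.06, 0.01, 200), (0.06, 0.02, 50),
            (1.03, 1.03, 250), (0.5, 0.5, 250)]
final = refine_predictions(proj, orig_pts, th_area=0.005, th_intensity=100)
```

Selecting original points through the mask, rather than using the back-projected pixel centers directly, is what restores the native point density of the road surface point cloud.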


Lane Width Estimation Approach
To evaluate the performance of the various lane marking extraction strategies, the lane width estimation approach proposed by Ravi et al. [18] is used after adapting it to handle lane markings derived from the intensity thresholding and deep learning strategies. Since markings predicted by either the intensity thresholding or the deep learning strategies might include false positives, these lane markings should be processed to produce their centerline points while removing potential outliers. The adopted strategy has four main steps, shown in Figure 14: (1) clustering lane markings through a distance-based region growing, (2) partitioning lane marking clusters, (3) noise removal, and (4) generating centerline points for each lane marking cluster. An example illustrating the different steps is shown in Figure 15.

First, a distance-based region growing is applied to the hypothesized lane marking points. If the distance between two lane marking points is less than a distance threshold (Th_dist), they are grouped into the same cluster. Th_dist is defined based on the LPS analysis of the road surface point cloud. Subsequently, all lane marking clusters are partitioned into 3-m-long segments, which is the length of a dash lane marking segment [30]. This partitioning is necessary to represent curved lane markings as polylines. After the partitioning, Random Sample Consensus (RANSAC) and trajectory-based strategies are applied to the lane marking segments for noise removal.
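The distance-based region growing used for clustering can be illustrated with a brute-force Python sketch. It works on 2D coordinates and compares every pair of points, which is only practical for small inputs; a spatial index such as a k-d tree would normally replace the inner search. The function name and the 2D simplification are assumptions.

```python
import math
from collections import deque


def region_growing(points, th_dist):
    """Group 2D points so that any two points closer than th_dist end up in
    the same cluster (transitively). Returns sorted lists of point indices."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            # Grow the cluster with every unvisited point within th_dist.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) < th_dist]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical points: a chain of three points and a separate pair.
pts2d = [(0.0, 0.0), (0.1, 0.0), (0.25, 0.0), (5.0, 5.0), (5.1, 5.0)]
groups = region_growing(pts2d, th_dist=0.2)
```

Note that points 0 and 2 are 0.25 m apart, farther than the threshold, yet they share a cluster because point 1 links them; this chaining is what lets a continuous edge line form a single cluster.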
First, a best-fitting line for each segment is estimated using the RANSAC algorithm [31]. Based on the fitted line parameters, outlier points within the segment are removed, as shown in Figure 15b. Second, any entire segment that is not parallel to the driving direction is removed, as shown in Figure 15c. Collectively, the RANSAC-based strategy removes outlier points within a hypothesized lane marking segment, while the trajectory-based strategy removes an entire segment that does not represent a lane marking (as indicated by its orientation relative to the system trajectory). Finally, the points in the remaining segments are projected onto the corresponding centerlines.

Once the centerline points of the lane markings are generated, the next step is to cluster them into right-side and left-side groups for a given lane, as shown in Figure 16a. The basic concept of the adopted centerline clustering algorithm is to start with a segment at the beginning of a road surface and use its direction as a reference. The reference segment is augmented with the next centerline segment along its direction. The centerline segment that was last added to a group is then used to define the new reference direction, and the process is repeated. Then, a linear interpolation is conducted to fill the gap between two successive centerline segments in a given group, as shown in Figure 16b. Two thresholds, defining the minimum and maximum bounds for conducting the interpolation, are used. For the minimum bound, we use Th_dist, which was previously used for the distance-based region growing. Therefore, for clustered centerline points that are farther apart than Th_dist, we carry out linear interpolation between them. To avoid linear interpolation on curved road segments, we define a maximum distance threshold, denoted as Th_miss.
For centerline points that are more than Th_miss apart, a region of missing lane marking is assumed and reported. In this research, Th_miss is set to 40 m, which is equivalent to the extent of three missing dash lane markings along the road surface. One should note that Th_miss is determined based on the minimum radius of curvature for designing highways [32]. With a design speed of 70 mph and a chord length of 40 m, the corresponding arc obtained with the minimum radius of curvature (2040 ft) is about 40.01 m. The difference between Th_miss and the corresponding arc is about 1 cm, which is within the noise level of the MMS. However, Th_miss should be revised accordingly when the minimum radius of curvature changes due to a decline in design speed on suburban or urban roads.

Table 3
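The chord-versus-arc argument behind Th_miss can be checked numerically. For a circle of radius R, a chord of length c subtends an arc of length 2R·asin(c / (2R)). The short Python sketch below evaluates this for the values quoted in the text (R = 2040 ft, c = 40 m); the function and variable names are illustrative.

```python
import math

FT_TO_M = 0.3048  # exact conversion factor from feet to metres


def arc_length_from_chord(chord_m, radius_m):
    """Length of the circular arc subtended by a chord of the given length."""
    return 2.0 * radius_m * math.asin(chord_m / (2.0 * radius_m))


radius_m = 2040 * FT_TO_M          # minimum radius of curvature, in metres
arc_m = arc_length_from_chord(40.0, radius_m)
difference_cm = (arc_m - 40.0) * 100.0
print(f"arc = {arc_m:.3f} m, arc - chord = {difference_cm:.2f} cm")
```

The arc exceeds the 40 m chord by a little under 1 cm, in line with the figure quoted above, so replacing the curved marking with a straight interpolated segment of this length introduces a negligible error. Repeating the calculation with a smaller radius shows how quickly the discrepancy grows, which is why the threshold should be revisited for lower design speeds.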

Lane Marking Gap Reporting
As mentioned previously in Section 4.2, for lane width estimation, 3-m-long lane marking segments are generated through the strategy proposed by Ravi et al. [18]. Gaps between these segments correspond to areas with worn-out/missing lane markings and/or road intersections. While an interpolation is conducted to fill the gaps for lane width estimation, the interpolated segments can be analyzed to provide a report as to whether these gaps are caused by worn-out/missing lane markings or intersections. Moreover, these reported regions and the corresponding RGB imagery visualization can be utilized for lane marking inspection, which could replace in-situ inspection. In this research, lane markings are solely defined based on intensity returns from the road surface (i.e., other highway information is not incorporated to check whether a gap is a result of an intersection or of low-intensity returns from lane markings). Thus, the gaps between the lane marking segments are categorized into two classes: long lane marking gap regions and short lane marking gap regions. According to the Federal Highway Administration (FHWA) [30], dash lane markings are separated by 9-m gaps, and the overall width of a two-lane intersection on highways is designed to be 102-120 ft (31.1-36.6 m). Based on this information, an algorithm is proposed for automatically reporting gaps along lane markings. Starting with the previously generated 3-m-long segments corresponding to the dash and edge lane markings, gaps longer than Th_miss (which is used for avoiding interpolation in Section 4.2 and is slightly larger than the overall width of a two-lane intersection) are identified as long lane marking gap regions.
The remaining gaps (less than Th_miss) are reported as short lane marking gap regions based on two cases, as shown in Figure 17: (1) when a gap between consecutive dash lane marking segments is greater than a dash-line gap threshold (Th_dash), as shown in Figure 17a, and (2) when a gap between consecutive edge lane marking segments is greater than the distance threshold (Th_dist), as shown in Figure 17b. Th_dash is defined as 10 m since it is slightly larger than the standard length of a gap between two successive dash lines (9 m), and Th_dist is set to 20 cm, which is the value used for the distance-based region growing, as discussed in Section 4.2. One should note that the lane marking segments derived from the normalized intensity thresholding strategy and U-net model 2 are used to report lane marking gap areas. More specifically, lane markings extracted from the former are utilized to report gaps along edge lane markings, while the results obtained from the latter help in the identification of the gaps along dash lane markings. This framework, based on results that will be illustrated in Section 5.2, ensures that areas with lane marking gaps are not underestimated.
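The gap categorization rules can be summarized in a short Python sketch using the threshold values stated above (Th_miss = 40 m, Th_dash = 10 m, Th_dist = 20 cm). Representing each segment by its start and end position along the road, and the function names, are assumptions made for illustration.

```python
TH_MISS = 40.0   # m, upper bound for interpolation (long-gap threshold)
TH_DASH = 10.0   # m, slightly larger than the standard 9 m dash-line gap
TH_DIST = 0.2    # m, distance threshold from the region growing step


def classify_gap(gap_m, marking_type):
    """marking_type: 'dash' or 'edge'. Returns 'long', 'short', or None."""
    if gap_m > TH_MISS:
        return "long"
    limit = TH_DASH if marking_type == "dash" else TH_DIST
    return "short" if gap_m > limit else None


def report_gaps(segments, marking_type):
    """segments: list of (start_m, end_m) positions along the road, sorted by
    start. Returns (gap_start, gap_end, category) for every reportable gap."""
    report = []
    for (_, end_prev), (start_next, _) in zip(segments, segments[1:]):
        category = classify_gap(start_next - end_prev, marking_type)
        if category:
            report.append((end_prev, start_next, category))
    return report


# Hypothetical dash-line segments: a regular 9 m gap, a 15 m gap, a 47 m gap.
dash_segments = [(0, 3), (12, 15), (30, 33), (80, 83)]
dash_report = report_gaps(dash_segments, "dash")
```

The regular 9 m spacing between dashes is not reported, whereas the same 9 m gap along an edge line would be flagged as a short gap, reflecting the different thresholds for the two marking types.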

Experimental Results and Discussion
In this research, the three datasets were surveyed on two-lane highways (datasets 1 and 2 are on an interstate highway while dataset 3 is on a rural highway). The PWMMS-HA can capture point clouds for both driving and non-driving lanes, hereafter called "lane 1" and "lane 2", respectively. The thresholds are listed in Table 4, and their values are kept the same across all datasets.


Intensity Thresholding Approaches
In this research, small road surface point clouds in ROIs 1, 2, and 3, as shown in Figure 3, were randomly selected in concrete pavement areas to generate the intensity normalization maps for each dataset. The number of sensors, driving speeds, and map generation cell sizes of the small point clouds in these ROIs are listed in Table 5. According to the number of sensors used for the different datasets in Table 5, the total numbers of intensity normalization maps generated for datasets 1, 2, and 3 are 3, 4, and 4, respectively. As mentioned previously, the cell size for generating the map is chosen based on the LPS of the small point cloud, which is affected by the driving speed and the number of beams of a LiDAR sensor. Thus, for the same LiDAR sensor, the cell size is relatively large in ROI 3 because of the faster driving speed, as shown in Table 5. In addition, for the same ROI, the cell size for the VLP16 is slightly larger due to its fewer laser beams. As an example, Figure 18 shows the intensity normalization maps for one of the HDL32E LiDAR units in ROIs 1, 2, and 3. For the same LiDAR unit, samples of road surface point clouds with the original and normalized intensity values, and the corresponding hypothesized lane markings in these ROIs, are illustrated in Figure 19. As can be seen in Figure 18, the intensity normalization map for ROI 3 is significantly different from those for ROIs 1 and 2. This difference is attributed to the fact that datasets 1 and 2 were acquired on the same interstate highway, while dataset 3 was collected on a rural highway. For interstate highways, pavement material more resistant to wear and tear is used when compared to that for rural highways [33]. As expected, the properties of the pavement surface strongly influence the original intensity values of the road point cloud and subsequently the corresponding intensity normalization map.

Further evidence of the impact of the pavement surface on intensity values can be seen in Figure 19, where the original intensity values in ROI 3 are significantly higher than those for ROIs 1 and 2. Once the intensity values of the road surface point clouds were normalized, the hypothesized lane markings were extracted using a 5th percentile intensity threshold (Th_I). For performance comparison, the original point cloud was also utilized to extract hypothesized lane markings using the same threshold. It is apparent that hypothesized lane markings with less noise were extracted from the normalized point cloud, as shown in Figure 19.

Evaluation of Different Lane Marking Extraction Strategies
For training the U-net models, a total of 400 manually labeled intensity images and 1183 automatically labeled intensity images are used. Another 104 manually labeled images and 238 automatically labeled images are used for validation, which is part of the training process. The training and validation images are derived from datasets 1 and 3. The training images are also augmented during each training epoch using: (a) rotation of the image in the range from 0° to 180° in a clockwise direction, (b) zooming in and out by resizing the image to between 80% (zoom in) and 120% (zoom out) of its original size, and (c) horizontal flip. Additionally, a test dataset of 174 images is curated from dataset 2 for performance evaluation of both the intensity thresholding and deep learning approaches. Specifically, for the former, the lane marking point cloud is converted to an intensity image for subsequent performance evaluation. The experimental settings for training the U-net models are listed in Table 6. An Adam optimizer is used to update the network's weights. Finally, the U-net models are trained on the Google Colaboratory platform, which provides K80 GPU access.

In machine and deep learning applications, a loss value quantifies the difference between ground truth and prediction. A high loss value indicates poor prediction and vice versa. Figure 20 shows the training loss (calculated on training data) and validation loss (calculated on validation data) plots for U-net models 1 and 2. The plots show the loss values at each epoch of the training process (i.e., training and validation loss values are evaluated for each training epoch). While training data helps the model optimize its weights for the given classification task, it is the performance on validation data that indicates whether the model performs well on unseen data. The plots indicate that U-net model 2 achieves a lowest validation loss of 0.118, whereas U-net model 1 achieves a lowest validation loss of 0.173.
This can be attributed to the larger number of training samples for U-net model 2, which helps it to learn varied scenarios. Table 7 presents the performance metrics for the state-of-the-art strategies (original intensity thresholding and deep learning with manual labeling) and the proposed approaches (normalized intensity thresholding and deep learning with automated labeling). Comparing the deep learning approaches, U-net model 1 shows high recall but a poor precision rate, resulting in a low F1-score. This means that false-positive detection for model 1 is significant (i.e., the model cannot distinguish well between lane markings and high-intensity outliers). On the other hand, U-net model 2 shows large, comparable precision and recall values, leading to a much higher F1-score than model 1. This better performance can be explained by 2.5 times more training samples in U-net model 2 in comparison to U-net model 1. Larger training data helps U-net model 2 to learn a variety of scenarios and enables it to lower its false-positive rate in comparison to model 1. Samples of original intensity images and corresponding images with predicted lane markings derived from the deep learning strategies are displayed in Figure 21.

Figure 19. Intensity normalization maps associated with an HDL 32E LiDAR unit for (a) ROI 1, (b) ROI 2, and (c) ROI 3.

Table 6. Experimental settings for training the U-net models.
Learning rate: Step size by which gradient of the loss function is scaled to update the network weights

As far as the shape of the detected lane markings is concerned, U-net model 1 tends to obtain irregular detections, especially when the lane marking is surrounded by high-intensity outliers, as shown in Figure 21d. On the other hand, U-net model 2 is capable of extracting the regular structure of lane markings, as shown in Figure 21e. Figures 22 and 23 depict samples of original intensity images and corresponding images with predicted lane markings obtained from the four different strategies in yellow edge lane marking and worn-out dash lane marking areas, respectively (both samples are over asphalt pavement). In these figures, one can observe that the deep learning approaches show much higher recall (i.e., most of the true lane markings have been identified), while the intensity thresholding ones show higher precision (i.e., a lower percentage of false positives). This is expected since the lane markings extracted by the intensity thresholding were processed through the noise removal strategy that removes a significant number of lane marking outliers. However, during the noise removal procedure, some true lane markings could be wrongly eliminated, especially for yellow lane markings where point density is low. This results in a lower recall rate for the intensity thresholding approaches. The deep learning approaches, in contrast, can extract lane markings in such cases. This can be explained by the fact that they detect lane markings based on both content and context (intensity as well as point density and location of points), while the intensity thresholding approaches rely on content alone (intensity and point density), as shown in Figure 22. However, the deep learning approaches miss worn-out dash lane markings in some areas, as shown in Figure 23e,f. This is because of the training data bias where the point density of dash lane markings is usually high because of a small scanner-to-object distance for these markings. 
Since worn-out dash lane markings have low point density, missed detections could be expected from the deep learning approaches. One should note that the argument of context does not hold here (contrary to edge lane markings) since dash lane markings have a smaller length. In contrast, the intensity thresholding approaches can extract these lane markings if their properties satisfy the criteria specified by the Th_pt and Th_dist thresholds during the noise removal strategies, as shown in Figure 23c,d. By utilizing the respective shortcomings of the intensity thresholding and deep learning approaches in areas of low point density, a conservative estimate of lane marking gap regions is reported along with their locations, which can be visually inspected through RGB imagery. Thus, based on the results illustrated in Figures 22 and 23, lane markings from the normalized intensity thresholding strategy are analyzed to identify gaps along edge lines, while lane markings extracted through U-net model 2 are utilized to report gaps along dash lines in Section 5.3.
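The precision, recall, and F1-score discussed above can be illustrated with a minimal sketch. It assumes a pixel-wise comparison between a predicted binary mask and a reference mask, which may differ from the exact evaluation protocol behind Table 7. The function names and the toy masks are hypothetical; only the precision and recall pairs in the final check are taken from the values reported in this paper.

```python
def confusion_counts(predicted, truth):
    """Count true positives, false positives, and false negatives
    for two binary masks given as equally sized nested lists."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Derive the three metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_rates(precision, recall)


# Toy masks (hypothetical): the prediction is wider than the true marking,
# which mimics a high-recall, low-precision detector.
predicted = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [1, 0, 0, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]

tp, fp, fn = confusion_counts(predicted, truth)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Consistency check against the (precision, recall) pairs reported in the
# paper, in percent: U-net model 1, U-net model 2, normalized thresholding.
reported = [(60.5, 98.9), (84.0, 87.9), (83.9, 74.4)]
reported_f1 = [round(f1_from_rates(p, r), 1) for p, r in reported]
```

Because the F1-score is a harmonic mean, it is pulled toward the smaller of the two rates, which is why the very high recall of U-net model 1 does not compensate for its low precision.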

Comparison Between Asphalt and Concrete Pavement Areas
In addition to the condition of the lane marking, the nature of the pavement surface plays a critical role in lane marking extraction. As mentioned previously, while asphalt pavements have low reflectivity, concrete pavements produce high-intensity values, which are close to those for lane markings. Figure 24 illustrates typical intensity images for asphalt and concrete pavement regions in dataset 3. Low intensity contrast between lane markings and their surrounding concrete pavement leads to high noise in the original intensity thresholding strategy. For the same regions in Figure 24, the predicted lane marking images derived from all strategies are presented in Figures 25 and 26. In the asphalt pavement area, all the strategies lead to complete extraction of lane markings, as shown in Figure 25. However, in the concrete pavement area, the original intensity thresholding cannot completely extract edge lane markings, as shown in Figure 26b, but the normalized intensity thresholding strategy and deep learning approaches avoid this problem. These three strategies eliminate high noise in the concrete pavement area while extracting all lane markings. The lengths of the road segments where lane markings have been extracted are also compared in dataset 3 with dominant concrete pavement. Figure 27 shows the results of the length comparison. As mentioned previously, road surface point clouds covering two-lane highways were used for lane marking extraction. Thus, all the datasets contain dash center lines and solid edge lines on either side of the center dash lines, hereafter respectively called "center, left, and right lane markings," as shown in Figure 27a. The lengths of the lane markings (center, left, and right lane markings) obtained from all strategies are evaluated for both asphalt and concrete pavement areas in dataset 3.
One should note that in dataset 3, the driving lane was maintained throughout the data collection campaign and was bounded by the center and right lane markings. Dataset 3 is 15.29 miles (24.61 km) long, and the total lengths of the different lane markings extracted in asphalt and concrete pavement areas are tabulated in Figure 27b,c, respectively. As shown in Figure 27b,c, the percentages of the extracted lane markings indicate gaps, which could be caused by (1) missing/worn-out lane markings and/or road intersections or (2) shortcomings of the strategies themselves. The findings of the analysis are as follows:
1. In asphalt pavement area, the results from the different strategies are comparable except for the right lane markings, where U-net model 1 has poor performance. The results of model 1 are unexpected, and it is hypothesized that this is a result of unintended adversarial noise [34] in intensity images generated for these areas.

Lane Width Estimation Results
In this section, we compare the lane width estimation results for all strategies across the three datasets. As mentioned previously, an estimate of the lane width is automatically derived every 20 cm using the approach proposed by Ravi et al. [18]. Depending on the starting position for lane width estimation, which depends on the extracted lane markings, the locations of lane width estimates might slightly differ from one strategy to another. Thus, for lane width comparison, a difference is calculated if the distance between two estimates is less than 20 cm. Consequently, the number of comparisons is slightly different for each strategy. Throughout the following comparisons, lane width values based on the normalized intensity thresholding serve as the reference for comparison because its lane markings were used to automatically generate the training labels for U-net model 2. While explaining the different results, this section refers to lane markings before noise removal as "hypothesized lane marking points," and the ones after noise removal as "lane marking centerline points." Finally, the section concludes with a quantitative comparison between manually evaluated and automatically derived lane width estimates from different strategies.
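The pairing rule described above (a difference is computed only when two estimates lie less than 20 cm apart) can be sketched as follows. This is an illustrative sketch rather than the implementation used in this study: it assumes each estimate is a hypothetical tuple of planimetric position and lane width in centimeters, uses a brute-force nearest-neighbor search, and does not enforce one-to-one matching.

```python
import math

MAX_PAIR_DIST_CM = 20.0  # pairing distance stated in the text


def paired_differences(reference, candidate, max_dist=MAX_PAIR_DIST_CM):
    """For each reference estimate (x, y, width), find the nearest candidate
    estimate closer than max_dist and return candidate - reference widths.

    Reference estimates without a close enough candidate are skipped, so the
    number of comparisons depends on where the candidate estimates fall.
    """
    diffs = []
    for rx, ry, r_width in reference:
        best = None  # (distance, width) of the nearest admissible candidate
        for cx, cy, c_width in candidate:
            dist = math.hypot(cx - rx, cy - ry)
            if dist < max_dist and (best is None or dist < best[0]):
                best = (dist, c_width)
        if best is not None:
            diffs.append(best[1] - r_width)
    return diffs


# Hypothetical estimates in centimeters: (x, y, lane width).
reference = [(0.0, 0.0, 360.0), (20.0, 0.0, 361.0), (40.0, 0.0, 362.0)]
candidate = [(5.0, 0.0, 362.0), (27.0, 0.0, 361.5), (80.0, 0.0, 370.0)]

diffs = paired_differences(reference, candidate)
number_of_comparisons = len(diffs)
```

In this toy case the candidate at 80 cm is never paired, and the candidate at 27 cm is paired with two reference estimates. A spatial index would be preferable to the double loop for datasets that are several miles long.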

Datasets 1 and 2: Mainly Asphalt Pavement
For dataset 1, Table 8 lists the number of comparisons, estimated length (total distance over which lane width estimates are obtained), and difference statistics for the four strategies. The lane width estimates in dataset 1 using various strategies are illustrated in Figure 28. These results indicate that lane width estimates from the intensity thresholding approaches are similar. Lane width estimates from the normalized intensity thresholding, on the other hand, differ from the ones obtained from the deep learning approaches. This difference is attributed to the fact that the deep learning approaches, with higher recall values, extract most of the actual lane markings including worn-out lane markings, which might be missed by thresholding strategies (i.e., more interpolation is conducted for the intensity thresholding strategies). Additionally

To further evaluate the strategies, we also processed dataset 2, which is mainly asphalt pavement but has more concrete pavements than dataset 1. The lane width estimates in dataset 2 from the different lane marking extraction strategies are presented in Figure 29. The number of comparisons, estimated length, and difference statistics for dataset 2 are summarized in Table 9. The normalized intensity thresholding strategy produces lane width estimates over a larger distance when compared to the strategy using original intensity values in both lanes.
Compared with the intensity thresholding approaches, as shown by the red box in Figure 29, more lane width values were estimated using the deep learning approaches, especially in poor lane marking areas. The hypothesized lane markings, lane marking centerline points, and interpolated points obtained from the different strategies in such an area (red box in Figure 29) are illustrated in Figure 30. From this figure, we can observe that worn-out lane markings were removed by the minimum point threshold (Th_pt) in the intensity thresholding approaches. However, they were kept in the deep learning approaches due to the minimum area threshold (Th_area), which was applied to 3D masks generated from their predictions. It is also observed that U-net model 1 results in almost 1-mile longer lane width estimation than U-net model 2 in both lanes. This longer estimation is due to the higher recall rate of 98.9% in U-net model 1, as reported in Table 7. However, U-net model 1, with its low precision of 60.5%, needs more computation time than model 2 for eliminating many false positives through the noise removal strategies. In this dataset, noise removal took approximately 22 min for the results from U-net model 1 and around 18 min for model 2.

Dataset 3: Mainly Concrete Pavement
For dataset 3, Figure 31 shows the lane width profile derived from the different lane marking extraction strategies, and Table 10 summarizes the number of comparisons, estimated length, and difference statistics among the four strategies. As shown in Figure 31, lane width estimates obtained from the original intensity thresholding strategy and deep learning approaches differ significantly from the normalized intensity thresholding strategy in some areas. The RGB imagery for two such areas is shown in Figure 32a,b; these areas are also indicated as red boxes I and II in Figure 31. Referring to red box I in Figure 31, hypothesized lane markings, lane marking centerline points, and interpolated centerline points for all lane marking extraction strategies are displayed in Figure 32c. This figure shows that five dash lane markings were not extracted using the intensity thresholding strategy due to higher noise in concrete pavement. Figure 32d, which refers to red box II in Figure 31, compares hypothesized lane markings, lane marking centerline points, and interpolated centerline points obtained from the different strategies. Three dash lane markings were not detected in the deep learning strategy using U-net model 2. This misdetection is caused by the training data bias of U-net model 2, as shown in Figure 23.
In summary, the performance of the original intensity thresholding strategy gradually declines with an increase in the area of concrete pavement, but the other three strategies can extract more lane markings in such areas, as validated by a longer distance where lane width estimates are reported across all datasets. The longer lengths for lane width estimation in dataset 2 for both lanes confirm the claim that the deep learning approaches perform better in areas of worn-out edge lane markings when compared to intensity thresholding strategies. The standard deviations of the difference statistics for all datasets, which range from 1.1 to 3.0 cm, indicate that the lane width estimates from the different strategies are compatible within a 1 to 3 cm range.

Comparison With Manual Lane Width Measurements
In order to demonstrate the robustness of the lane marking extraction and lane width estimation strategies, the automatically derived lane width estimates are compared to manually evaluated ones for all datasets. Figure 33
A lane width difference is calculated if the manual and automated estimates are less than 20 cm apart, as per the Th_dist threshold. Although the same manually evaluated lane width estimates are used to examine the lane width estimates in each dataset, the number of comparisons is different for each strategy due to the expected variation in the locations of automatically derived lane width estimates. The quantitative metrics, including the mean, standard deviation, root-mean-square error (RMSE), and maximum difference between the manually and automatically evaluated lane width estimates, are summarized in Table 11. Overall, there is no difference greater than 7 cm for all datasets, and the RMSE values of the differences range from 1.2 to 2.8 cm, indicating good agreement between the manually and automatically evaluated estimates. Moreover, the differences are consistent with the 2-4 cm expected accuracy range of the point cloud for the used system. The slightly larger mean differences in lane 2 reflect the slightly poorer accuracy for points with longer scanning distance.
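The four metrics named above can be computed from a list of lane width differences as in the following sketch. The function name and the sample differences are hypothetical, and the population form of the standard deviation is an assumption, since the text does not state which form was used.

```python
import math
from statistics import fmean, pstdev


def difference_statistics(diffs):
    """Summarize signed lane width differences (automated - manual).

    Returns the mean, standard deviation (population form, an assumption),
    RMSE, and maximum absolute difference, all in the units of the input.
    """
    if not diffs:
        raise ValueError("at least one difference is required")
    values = list(diffs)
    return {
        "mean": fmean(values),
        "std": pstdev(values),
        "rmse": math.sqrt(fmean([d * d for d in values])),
        "max_abs": max(abs(d) for d in values),
    }


# Hypothetical differences in centimeters.
sample_diffs = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0]
stats = difference_statistics(sample_diffs)
```

With the population standard deviation, the three quantities satisfy RMSE^2 = mean^2 + std^2, so an RMSE that is close to the standard deviation signals a small systematic offset between the automated and manual estimates.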

Lane Marking Gap Results
As mentioned previously, all the datasets include center, left, and right lane markings, as shown in Figure 27a. For each dataset, lane markings derived from the normalized intensity thresholding strategy are used to report right and left lane marking gaps (i.e., along edge lane markings), while U-net model 2 results are utilized to report gaps along the center (i.e., along dash lane markings). One should note that in all the datasets, the left lane markings are yellow edge lines, while the center and right lane markings are white dash and edge lines, respectively. For the different datasets, the driving lane, which is bounded by the center and right lane markings, was maintained during the data collection. The long lane marking gap regions along the road surface for the three datasets are reported in Figure 34. The figure also shows an example of a location with a long gap (more than Th_miss) for each of the datasets. It can be seen that the dash lane markings in Figure 34b,c are clearly worn-out or missing (identified through U-net model 2) in the RGB imagery, while the yellow markings in Figure 34a are slightly worn (identified through the normalized thresholding strategy) in the image. The total length and average (total length of the gaps divided by the length of the dataset) of long lane marking gaps in datasets 1, 2, and 3 are summarized in Table 12. On the other hand, short lane marking gap regions along center, left, and right lane markings for the three datasets are reported in Figure 35, which also shows an example of a location with a short gap (less than Th_miss). The RGB imagery in Figure 35 shows the worn-out lane markings at locations i, ii, and iii for datasets 1, 2, and 3, respectively. Overall, this reporting algorithm quickly identifies a large number of regions that require further visual inspection, which can reduce cost and time for on-site inspections.
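The split into long and short gaps, and the average reported in Table 12, can be sketched as follows. This is a simplified illustration under stated assumptions, not the reporting algorithm of this study: lane marking centerline points are reduced to hypothetical along-road stations in meters, a gap is any spacing between consecutive stations larger than a hypothetical `min_gap`, and the values chosen for `min_gap`, `th_miss`, and the dataset length are placeholders.

```python
def find_gaps(stations, min_gap):
    """Return (start, end) intervals between consecutive stations whose
    spacing exceeds min_gap. Stations are along-road distances in meters."""
    ordered = sorted(stations)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]


def classify_gaps(gaps, th_miss):
    """Split gaps into long (length > th_miss) and short (length <= th_miss).
    Assigning the boundary case to the short group is an assumption."""
    long_gaps = [g for g in gaps if g[1] - g[0] > th_miss]
    short_gaps = [g for g in gaps if g[1] - g[0] <= th_miss]
    return long_gaps, short_gaps


def gap_ratio(gaps, dataset_length):
    """Total gap length divided by the length of the dataset."""
    return sum(end - start for start, end in gaps) / dataset_length


# Hypothetical stations (m) of extracted centerline points along one line.
stations = [0, 1, 2, 3, 10, 11, 12, 60, 61, 62]
gaps = find_gaps(stations, min_gap=2)            # placeholder spacing limit
long_gaps, short_gaps = classify_gaps(gaps, th_miss=20)  # placeholder Th_miss
long_gap_ratio = gap_ratio(long_gaps, dataset_length=96.0)  # placeholder length
```

For dash lane markings, `min_gap` would need to exceed the regular spacing between dashes so that the designed gaps are not reported; that choice is left open here.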

Conclusions and Recommendations for Future Research
Lane marking extraction through intensity thresholding of LiDAR-based MMS point clouds has traditionally suffered from a large number of false positives. Hence, prior knowledge is required for noise removal. In contrast, learning-based approaches can detect lane markings from an intensity image without a specific prerequisite, but they are limited by the tedious procedure of manual labeling for training data generation. In this paper, in order to address these challenges, normalized intensity thresholding and deep learning strategies with automatically generated labels are proposed for extracting lane markings from LiDAR-based MMS point clouds. To test the performance of the proposed strategies, an original intensity thresholding strategy and a deep learning strategy using manually established labels are also implemented. In addition, the performance evaluation of all strategies is carried out in asphalt and concrete pavement areas. For the original and normalized intensity thresholding strategies, lane markings were directly extracted from the road surface point cloud. For the deep learning approaches, lane markings were detected from generated intensity images using U-net models trained on manually established labels (model 1) and automatically generated labels (model 2). Additionally, the lane markings extracted through the normalized intensity thresholding strategy and U-net model 2 were used to report lane marking gap regions along edge lines and dash lines, respectively. Lastly, the lane markings derived from all strategies are utilized for lane width estimation.
In this research, three datasets, with a total length of about 67 miles, were surveyed on two-lane highways that covered both concrete and asphalt pavement areas. Compared with the lane markings from thresholding of the original intensity, hypothesized lane markings derived from the normalized intensity thresholding strategy have fewer false positives. On the other hand, U-net model 2 performs better than model 1, as indicated by a higher F1-score. The precision, recall, and F1-score obtained for U-net model 1 are 60.5%, 98.9%, and 75.1%, respectively. Moreover, the derived precision, recall, and F1-score for U-net model 2 are 84%, 87.9%, and 85.9%, respectively. Further, the same metrics for the normalized intensity thresholding strategy are 83.9%, 74.4%, and 78.9%, respectively, indicating a performance better than U-net model 1 but not model 2. The original intensity thresholding strategy performs worse overall than the above strategies, with an F1-score of 72.3%. In concrete pavement areas, high-intensity outliers are successfully eliminated by the normalized intensity thresholding and both deep learning strategies, unlike the thresholding of original intensity values. In addition, the lane width estimation results demonstrate that the deep learning approaches could extract more lane markings than other strategies in areas with poor edge lane markings and in the non-driving lane. Since this research is based on an MMS equipped with accurately calibrated imaging and ranging systems, reported lane marking gaps can be visually inspected in the RGB imagery to evaluate the cause of such gaps (e.g., missing and/or worn-out lane markings).
Future research will focus on developing an intensity normalization algorithm for an MMS equipped with single-beam LiDAR scanners. According to the assumption that the intensity values across laser beams should be similar for the same surface, utilization of an MMS equipped with two or more single-beam LiDAR scanners should also achieve the same intensity normalization effect. Another focus will be to increase the number of training samples for the U-net model trained on automatically generated labels by including samples from other single-beam LiDAR datasets. This will enhance the generalization capability of the U-net model across different types of sensors as well as improve the detection results on problematic cases such as worn-out dash lane markings with low point density. Additionally, the RGB information from imagery will be combined with point cloud data to improve the accuracy of lane marking extraction (especially those that are worn-out).

Funding:
The research was supported in part by the Joint Transportation Research Program administered by the Indiana Department of Transportation and Purdue University. The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of the sponsoring organizations.