A General Spline-Based Method for Centerline Extraction from Different Segmented Road Maps in Remote Sensing Imagery

Abstract: Road centerline extraction is the foundation for integrating a segmented road map from a remote sensing image into a geographic information system (GIS) database. Considering that existing approaches tend to decline in performance for centerline and junction extraction when segmented road structures are irregular, this paper proposes a novel method which models the road network as a sequence of connected spline curves. Based on this motivation, the ratio of cross (RoC) operator is first proposed to detect the direction and width features of roads. Then, road pixels are divided into different clusters by local features using three perceptual grouping principles (i.e., direction grouping, proximity grouping, and continuity grouping). After applying polynomial curve fitting to each cluster using pixel coordinates as observation data, the internal control points are determined according to the adjacency relations between clusters. Finally, road centerlines are generated based on spline fitting with constraints. We test our approach on segmented road maps obtained previously by machine recognition or manual extraction from real optical (WorldView-2) and synthetic aperture radar (TerraSAR-X, Radarsat-2) images. Depending on the accuracy of the input segmented road maps, experimental results on our test data show that the completeness and correctness of the extracted centerlines are over 84% and 68% for optical and radar images, respectively. Furthermore, the experiments also demonstrate the advantages of the proposed method over existing methods in obtaining smooth centerlines and precise junctions.


Introduction
Road databases play an essential role in modern transportation systems. With the development of remote sensing technology, remote sensing images have become one of the main sources for acquiring road information. Starting from a remote sensing image, road centerline extraction is a key technology for practical applications, such as road data updating in geographic information systems (GISs). Although manual marking of road centerlines is more accurate, it is a time-consuming and labor-intensive task; therefore, automatic or semi-automatic road centerline extraction from remote sensing data has been a significant research activity over the past decades [1]. Nevertheless, because of data noise, the diversity of road types, and the complex backgrounds surrounding roads, extracting complete and correct centerlines automatically is still challenging.
Until now, a variety of road centerline extraction methods have been proposed from different viewpoints. In general, there are two kinds of methods for road centerline extraction. One is tracking road centerlines directly from remote sensing images. For example, Zhou et al. [2] introduced a road tracking system based on human-computer interaction and Bayesian filtering. Cheng et al. [3] proposed using a parallel particle filter to track road centerlines. Dal Poz et al. [4] presented a methodology which extracts road seeds first. In contrast, the method proposed in this paper proceeds by (1) using a ratio of cross (RoC) operation to derive road orientation and width; (2) clustering road pixels based on three perceptual grouping principles; and (3) extracting control points and generating the road centerline map by spline fitting. The proposed method is validated visually and numerically by comparing extraction results with ground truth data. Notably, the McNemar test is applied to measure the differences between extraction results obtained with different approaches. For the analysis, the effects of three key parameters on the accuracy of the extraction results are discussed in detail.

Ratio of Cross Detector for Feature Extraction
In this section, a novel detector named ratio of cross (RoC) is proposed for extracting road direction and width features. In high-resolution remote sensing images, roads usually appear as elongated areas with a specific width. Based on these characteristics, RoC computes length ratios in a circular window centered at a road pixel (see Figure 2a). Given the diameter D of the window and the number N of direction divisions, the direction v of the current road pixel can be expressed as

v = arg max_i (c_i / c_j), with j ⊥ i,

where c_i and c_j denote the lengths of the lines covered by road pixels along directions i and j, which are perpendicular to each other. Then, the width w of the current center pixel is estimated by w = c_u, where u ⊥ v. In the case of N = 8 shown in Figure 2a, it can be inferred that c_1/c_8 is the maximum value of c_i/c_j, and thus direction 1 will be the detected direction for road pixel O. In the implementation, the value of c_i can be estimated using image convolution with a linear structuring element oriented toward a specific direction. Figure 2b presents the length ratios at different directions for two test road pixels, which belong to horizontal and vertical road segments, respectively. It can be seen that the maximum length ratio highlights the orientation of the sampled road pixels. Furthermore, Figure 3 shows an example of the application of the RoC detector: the colored areas in (a), (b), and (c) of Figure 3 correspond to road regions, and the different direction and width values derived from the RoC detector are labelled with different colors.
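The per-pixel RoC computation can be sketched as follows. This is a minimal illustration, assuming directions are sampled uniformly over 180° and that each run length c_i is measured by walking along a line until the first non-road pixel (the paper instead estimates c_i by convolution with linear structuring elements); the pairing of perpendicular direction indices is an assumption of this sketch.

```python
import numpy as np

def roc_features(road_mask, row, col, diameter=21, n_dirs=8):
    """Sketch of the ratio-of-cross (RoC) detector: for a road pixel,
    measure the run of road pixels along each of n_dirs sampled
    directions inside a circular window, then pick the direction i
    maximizing c_i / c_j, where j is perpendicular to i."""
    radius = diameter // 2
    lengths = np.zeros(n_dirs)
    for d in range(n_dirs):
        theta = np.pi * d / n_dirs  # directions sampled over 180 degrees
        count = 1                   # the centre pixel itself
        for sign in (+1, -1):       # walk both ways along the line
            for step in range(1, radius + 1):
                r = int(round(row + sign * step * np.sin(theta)))
                c = int(round(col + sign * step * np.cos(theta)))
                if (0 <= r < road_mask.shape[0] and 0 <= c < road_mask.shape[1]
                        and road_mask[r, c]):
                    count += 1
                else:
                    break           # stop at the first non-road pixel
        lengths[d] = count
    perp = (np.arange(n_dirs) + n_dirs // 2) % n_dirs  # assumed perpendicular pairing
    ratios = lengths / lengths[perp]
    v = int(np.argmax(ratios))      # detected road direction
    w = lengths[perp[v]]            # width = run length perpendicular to v
    return v, w
```

For a horizontal road strip, the horizontal direction yields the largest ratio and the perpendicular run length recovers the strip width.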

Considering that a pixel belonging to a direction i is likely to have neighboring pixels belonging to the same direction, a Markov random field (MRF) [18] framework is applied to the original direction map for de-noising (i.e., obtaining a "cleaner" road direction map). Figure 3b presents the de-noising result which uses Figure 3a as an initialization.
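The effect of the MRF smoothing can be sketched with a simple iterated-conditional-modes (ICM) pass under a Potts prior. This is a hedged stand-in for the MRF framework cited above, not the authors' exact model: each road pixel takes the direction label minimizing a unary cost (disagreement with the observed RoC direction) plus beta times the number of 8-neighbours holding a different label.

```python
import numpy as np

def icm_denoise(dir_map, road_mask, beta=1.0, n_dirs=8, n_iter=5):
    """Simplified MRF clean-up of a direction map via ICM with a
    Potts smoothness prior (an assumed concrete realization)."""
    labels = dir_map.copy()
    H, W = dir_map.shape
    for _ in range(n_iter):
        for r in range(H):
            for c in range(W):
                if not road_mask[r, c]:
                    continue
                # gather neighbouring labels on the road (8-connectivity)
                neigh = [labels[rr, cc]
                         for rr in range(max(0, r - 1), min(H, r + 2))
                         for cc in range(max(0, c - 1), min(W, c + 2))
                         if (rr, cc) != (r, c) and road_mask[rr, cc]]
                # unary term: disagreement with observed direction;
                # pairwise term: count of disagreeing neighbours
                costs = [(0 if d == dir_map[r, c] else 1)
                         + beta * sum(1 for n in neigh if n != d)
                         for d in range(n_dirs)]
                labels[r, c] = int(np.argmin(costs))
    return labels
```

An isolated pixel whose observed direction disagrees with all of its neighbours is relabelled to the locally dominant direction, which is exactly the "cleaner" map behaviour described above.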

Object-Based Perceptual Grouping for Clustering
The concept of perceptual grouping is proposed based on the phenomenon that human observers often show a capability to perceptually put parts together into a whole [19]. In this paper, road pixels with a similar direction and width are expected to be grouped into clusters, which will be the base unit for spline fitting to construct the final road centerlines. To this end, three perceptual grouping rules are proposed in detail in the following sections.

Direction Grouping
According to the detected orientation features on the road direction map, it is easy to notice that the road regions have been separated piece by piece. To describe roads from pixels to segments, the connected-component (CC) labeling [20] method is first applied to the road map for each direction feature. In other words, connected road pixels with the same direction feature are combined into one cluster. We call this step direction grouping. For illustrative purposes, let S = {s} denote the set of clusters, where s is a set of pixel points in a CC. According to the adjacency relation, we define a set of neighbors for each cluster and denote it by δ(s). That is, δ(s) = {t | t ∈ S, t ↔ s}, where t ↔ s represents two clusters bordering each other in the image. Moreover, v(s) denotes the direction feature of s, and w(s) is the mean width feature over the pixels in s, computed from the width map derived in Section 3.1. To combine these clusters further, proximity grouping and continuity grouping are successively introduced.
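The direction grouping step above can be sketched directly with per-direction connected-component labelling (a minimal sketch; function and variable names are illustrative):

```python
import numpy as np
from scipy import ndimage

def direction_grouping(dir_map, road_mask, n_dirs=8):
    """Direction grouping as described above: for each direction value,
    run connected-component labelling on the road pixels holding that
    direction; every connected component becomes one cluster s in S."""
    clusters = []  # list of (direction, array of (row, col) pixel coords)
    for d in range(n_dirs):
        comp, n_comp = ndimage.label((dir_map == d) & road_mask)
        for lab in range(1, n_comp + 1):
            pts = np.argwhere(comp == lab)
            clusters.append((d, pts))
    return clusters
```

Two disconnected strips with the same direction value yield two separate clusters, matching the definition of S = {s}.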



Proximity Grouping
Proximity grouping aims to merge two clusters if one is surrounded by the other and they visually belong to the same road segment. To achieve this, the Hausdorff distance is introduced, which is a dissimilarity measure between two sets of points in a metric space [21]. Given two sets of points A and B, the Hausdorff distance from A to B is defined as H(A, B) = max_{a ∈ A} min_{b ∈ B} d(a, b), where d(a, b) is the Euclidean distance between points a and b. Based on the Hausdorff distance, the proposed proximity grouping algorithm is summarized in Algorithm 1. We begin by sorting S and traversing all clusters by area from the largest to the smallest. If one cluster and its neighbor meet the threshold condition on the Hausdorff distance, they are assigned the same label. Figure 4 presents a sketch map of the Hausdorff distances between adjacent road clusters. It is noted that the threshold T_H is set automatically to w(s) (i.e., the mean width over all pixels in s). Moreover, in order to further clean up small clusters, those clusters whose areas are less than T_A are merged by relabeling them with the label of the largest neighboring cluster; T_A is fixed to 20 pixels in this paper. Finally, a new set S_P is obtained by combining clusters which have the same new label.

Algorithm 1: Proximity grouping
1: Input: cluster set S with neighbor sets δ(s)
2: Output: cluster labels L
3: Sort S by area from the largest to the smallest
4: Initialize F(s) = false for all s ∈ S
5: for each s ∈ S do
6: If F(s) is false, assign a new label L(s) and set F(s) = true
7: for each t ∈ δ(s) do
8: If F(t) is false and H(t, s) < T_H, assign L(t) = L(s), F(t) = true
9: end for
10: end for
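The merging test at the heart of the proximity grouping can be sketched with the directed Hausdorff distance (a minimal sketch; `should_merge` is a hypothetical helper, and the threshold T_H = w(s) is passed in as `mean_width_s`):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def should_merge(pts_s, pts_t, mean_width_s):
    """Proximity-grouping test: merge a neighbouring cluster t into s
    when the directed Hausdorff distance
    H(t, s) = max_{a in t} min_{b in s} ||a - b||
    falls below the automatic threshold T_H = w(s)."""
    h_ts = directed_hausdorff(pts_t, pts_s)[0]
    return h_ts < mean_width_s
```

Note that the directed distance from t to s is used, which is small whenever every point of the smaller cluster t lies close to the larger cluster s.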

Continuity Grouping
However, the grouping result is often not optimal if the proximity principle is the only one considered. It is rational that clusters which are collinear and have similar direction and width features should be merged into a whole. In order to measure the continuity between two clusters, the degree of collinearity [22] and the differences in direction and width are examined. More specifically, an eigenvalue analysis is conducted to find collinear point sets of roads. Let P be the matrix that contains all pixel coordinates in a given point set A, where the first column represents pixel rows and the second represents pixel columns. The covariance matrix Σ of P is Σ = cov(P) = QΛQ^T, where Q is the eigenvector matrix and Λ = diag(λ_1, λ_2) is the eigenvalue matrix. Then, the contribution rate of the principal component can be obtained by λ(A) = λ_1/(λ_1 + λ_2), where λ_1 ≥ λ_2. If λ_1 ≫ λ_2, it can be concluded that A has a linear shape. For the case of two adjacent clusters A and B, the difference between the contribution rates before and after combination is given by ∆λ = |λ(A ∪ B) − λ(A)|. If ∆λ is close to zero, the combination of A and B has little effect on the principal component characteristics of A; thus, the smaller the value of ∆λ, the more likely it is that A and B should be grouped together. For the specific implementation of this continuity grouping, Algorithm 1 is still adapted by using S_P as input and replacing the condition H(t, s) < T_H with the following three conditions:

|λ(s ∪ t) − λ(s)| < T_λ, |w(s) − w(t)| < T_w, min(|v(s) − v(t)|, N − |v(s) − v(t)|) ≤ 1, (3)

where s ∪ t denotes the union of the points in s and t (t ∈ δ(s)), T_λ is the threshold of the contribution-rate difference before and after combination, T_w is the threshold of the width difference, which is set to 10 pixels, and N is the number of direction divisions, which is set to 8 in this paper.
A new set S C is finally achieved by repeating Algorithm 1 and merging clusters with the same new label until there are no two adjacent clusters that satisfy conditions in (3). In order to illustrate the proposed coherent process of perceptual grouping, Figure 5 presents an example. Taking the direction map in Figure 5a as an input, Figure 5b shows the direction grouping result where different clusters are divided by black edges. Furthermore, the proposed proximity grouping is applied to Figure 5b, and Figure 5c is the grouping result. In Figure 5c, it is noticeable that the marked cluster B is collinear and adjacent to the marked cluster A, which satisfies the proposed continuity rules. Figure 5d gives a visual result of continuity grouping for Figure 5c. It can be seen that the clusters marked A and B in Figure 5c have been combined into one cluster after continuity grouping.
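The collinearity measure used above can be sketched as follows (a minimal sketch; the helper names are illustrative, but the computation follows the definitions of λ(A) and ∆λ directly):

```python
import numpy as np

def contribution_rate(points):
    """Contribution rate lambda(A) = lambda_1 / (lambda_1 + lambda_2)
    of the principal component, computed from the covariance matrix of
    the pixel coordinates; values near 1 indicate a linear cluster."""
    cov = np.cov(np.asarray(points, dtype=float).T)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]  # lambda_1 >= lambda_2
    return eig[0] / (eig[0] + eig[1])

def delta_lambda(pts_a, pts_b):
    """Difference of contribution rates before/after combining A and B;
    a value near zero supports grouping A and B together."""
    union = np.vstack([pts_a, pts_b])
    return abs(contribution_rate(union) - contribution_rate(pts_a))
```

Appending a collinear continuation to a line-shaped cluster leaves ∆λ near zero, while appending points off the line raises it, which is exactly the grouping cue used in condition (3).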

Road Centerline Extraction
After segmenting and grouping the road regions, centerlines are extracted based on each cluster to form the road network. In this paper, the road centerline network is modeled as a sequence of connected spline curves. This is done in two steps. The first step involves searching internal control points (i.e., nodes of splines). The second step involves applying a spline fitting according to the nodes on each road cluster.


Control Point Searching
Starting from the set S_C, which contains all the clusters obtained after applying the perceptual grouping process in Section 3.2, the polynomial curve fitting [17] method is applied to each cluster s (s ∈ S_C) using the pixel coordinates in s as observation data, and a thinned curve l_s can then be drawn. Let E_s = {e_s^1, e_s^2} be the set of the two endpoints of l_s, and let N_s be the set of nodes related to s. Since endpoints are necessarily nodes of the spline model, N_s is initialized by N_s = E_s. If s has no neighboring cluster, l_s is the final extracted centerline. In the case that neighboring clusters exist, Figure 6 presents a schematic diagram for extracting the nodes. More specifically, the connection nodes for two adjacent clusters s and t are further determined by the following two rules: (1) in the case that l_s and l_t have intersection points, the midpoint o of the intersections is added to the sets N_s and N_t. By checking all the road clusters and their neighbors in the map, a set of nodes can be achieved with N = ∪_{s ∈ S_C} N_s. To merge nodes that are close to each other in N, agglomerative hierarchical clustering [23] is then applied. That method starts by treating each node as a singleton cluster; next, pairs of clusters are successively merged until all their Euclidean distances are smaller than a threshold of T_n pixels.
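The node-merging step above can be sketched with single-linkage hierarchical clustering, replacing each group of nearby nodes by its centroid. This is an assumed concrete realization of the agglomerative clustering described in the text, not necessarily the authors' exact variant:

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

def merge_close_nodes(nodes, t_n):
    """Agglomerative merging of spline nodes: single-linkage
    hierarchical clustering groups nodes whose Euclidean distance is
    below the threshold T_n; each group is replaced by its centroid,
    yielding the final node set."""
    nodes = np.asarray(nodes, dtype=float)
    if len(nodes) == 1:
        return nodes
    groups = fclusterdata(nodes, t=t_n, criterion='distance', method='single')
    return np.array([nodes[groups == g].mean(axis=0)
                     for g in np.unique(groups)])
```

Two nodes a fraction of T_n apart collapse into one node at their midpoint, while distant nodes are left untouched.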


Spline Fitting
Once the final nodes are obtained, road centerlines are extracted based on spline fitting. Let N_s = {n_0, n_1, ..., n_{q−1}} represent the q nodes in a cluster s (s ∈ S_C), sorted according to pixel rows or columns. Supposing that f_m(x) is the polynomial function of order β denoting the curve between nodes n_m and n_{m+1} (m = 0, 1, ..., q − 2), each curve must pass through its nodes and minimize the residual sum of squares (RSS), i.e.,

min Σ_{i=1}^{k} (f_m(x_i) − y_i)^2, subject to f_m passing through n_m and n_{m+1},

where x_i and y_i denote the coordinates of the pixels between nodes n_m and n_{m+1} in s, k is the number of these pixels, and the polynomial order β is set to 3. This is an optimization problem with equality constraints, and thus it can be solved using the general Lagrange multiplier method [24]. In order to illustrate the proposed centerline extraction method, Figure 7 presents a schematic diagram of spline fitting on a road cluster. It is worth mentioning that two nodes will be connected with a straight line if there are no pixels for fitting between them in a cluster, or if the root-mean-square error (RMSE) of the fit is larger than the mean road width w(s) of the cluster.
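The constrained least-squares segment fit can be sketched as the standard Lagrange-multiplier (KKT) linear system. This is a minimal sketch under the stated assumptions (one segment, two interpolation constraints at the nodes); `constrained_polyfit` is an illustrative helper name:

```python
import numpy as np

def constrained_polyfit(x, y, nodes, beta=3):
    """Fit a polynomial of order beta minimizing the residual sum of
    squares over the cluster pixels (x_i, y_i), subject to passing
    exactly through the given nodes, via the KKT linear system."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    V = np.vander(x, beta + 1)                 # design matrix for the pixels
    xn = np.array([p[0] for p in nodes], float)
    yn = np.array([p[1] for p in nodes], float)
    C = np.vander(xn, beta + 1)                # equality constraints f(x_n) = y_n
    n, m = V.shape[1], C.shape[0]
    # KKT system: [2 V^T V  C^T; C  0] [a; mu] = [2 V^T y; y_n]
    K = np.block([[2 * V.T @ V, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([2 * V.T @ y, yn])
    a = np.linalg.solve(K, rhs)[:n]
    return a  # polynomial coefficients, highest order first
```

When the observation data already lie on a cubic through both nodes, the fit recovers that cubic exactly, and in general the resulting curve interpolates the nodes while staying as close as possible to the pixel data.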



Dataset Description and Result Evaluation
In order to assess the effectiveness of the proposed methodology, six different segmented road maps from real satellite images were tested. More specifically, test images 1 and 2 were optical images from WorldView-2, and the method from [16], a road segmentation model that combines adversarial networks with multiscale context aggregation, was used to obtain the segmented road binary maps. Furthermore, test images 3 and 4 were TerraSAR-X and Radarsat-2 images, and their roads were first segmented by the method in [17], which is based on the multiplicative Duda operator and morphological profiles of path openings. Test images 5 and 6 were selected from the DeepGlobe road dataset [25], where roads were labelled manually.
To evaluate the proposed method, the results achieved in this paper are analyzed visually and numerically. For the visual analysis, the centerline extraction results are roughly evaluated by comparing the length, shape, junction, and connectedness of roads with the reference road centerlines. The reference centerlines were ground truth data generated manually from the original satellite images.
For the numerical comparison analysis, completeness (CP), correctness (CR), and quality (QL) are computed against the reference data based on the evaluation method in [26]. According to the principles in [26], a buffer is set to determine which parts of one road network are considered to be matched with the other; in this paper, the buffer is set to 5 pixels. In detail, the aforementioned evaluation indexes are given by CP = TP/(TP + FN), CR = TP/(TP + FP), and QL = TP/(TP + FP + FN), where TP is the length of the extracted centerlines matched with the reference data, FP is the length of the extracted centerlines that do not match the reference data, and FN denotes the length of the reference centerlines that do not match the extracted centerlines. The QL index can be treated as a general index which combines CP and CR. The values of all three indexes range from 0 to 1, and a larger value means that the extraction result is closer to the reference data.
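The three indexes reduce to simple ratios of matched lengths, which can be sketched as:

```python
def centerline_metrics(tp, fp, fn):
    """Completeness, correctness and quality from matched lengths,
    following the buffer-matching evaluation described above:
    CP = TP/(TP+FN), CR = TP/(TP+FP), QL = TP/(TP+FP+FN)."""
    cp = tp / (tp + fn)
    cr = tp / (tp + fp)
    ql = tp / (tp + fp + fn)
    return cp, cr, ql
```

For example, with 80 length units matched, 20 false, and 20 missed, CP = CR = 0.8 while QL drops to about 0.67, reflecting that QL penalizes both error types at once.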
Considering that the input segmented road map may have false and missing parts, its three quantitative indexes, mentioned above, can also be computed using the evaluation method in [26] by matching the ground truth data with the manually labelled centerlines of the input segmented road map. That is to say, a centerline is manually generated for each road segment of the input road map, and then the CP, CR, and QL values are calculated by comparing them with ground truth data. The accuracy for six input segmented road maps can be found in tables in Section 4.2.
In terms of computational efficiency, the total execution time (ET) of centerline extraction using the different methods is also reported in this paper. All the experiments were conducted on a computer with an AMD R7-4800H processor and 16 GB RAM using MATLAB code.

Experimental Results and Comparisons
This section shows the results achieved by the proposed method and four existing methods, together with the parameter settings used for the proposed method. The experimental results for segmented road maps from optical images are presented in Section 4.2.1. Section 4.2.2 presents the results obtained using segmented road maps from SAR images. Next, the results obtained from manually labelled road images are shown in Section 4.2.3. Finally, Section 4.2.4 presents the comparison of the different methods based on a McNemar test.
To verify the performance, the proposed method is compared with four state-of-the-art methods. The first one is the classic morphological thinning (MT) algorithm [5]. The second is based on the Zhang-Suen thinning (ZST) algorithm [27]. Miao et al. [7] introduced a method using tensor voting and multivariate adaptive regression splines (MARS), which is the third one for comparison. The last one is the method presented by Cheng et al. [8] which utilized approaches of tensor voting and non-maximum suppression (NMS).
For the parameter setting, the proposed method mainly depends on three parameters: the diameter D of the window in Section 3.1; the threshold T_λ of the collinearity degree in Section 3.2.3; and the threshold T_n for clustering nodes in Section 3.3.1. The parameters used for the six test images are listed in Table 1. In general, D should be slightly larger than the maximum road width in the image, T_λ is usually a small number near zero (the continuity grouping step can be skipped when T_λ is set to zero), and T_n is determined empirically to merge close nodes. More details on the analysis of these parameters can be found in Section 5.

For the experiments on optical data, two TrueColor (RGB) images with 0.5 m spatial resolution collected from the WorldView-2 sensor are used. The two regions are located in Wuzhen, Zhejiang, China. Test images 1 and 2 are shown in Figures 9a and 10a and contain road segments with large differences in width. The segmented road maps from test images 1 and 2 are presented in Figures 9b and 10b, along with their reference centerline data in Figures 9c and 10c. The centerline extraction results using the proposed method and the four existing methods are also displayed in Figures 9 and 10.
Table 2 presents the quantitative evaluation indexes for the two test images. As can be observed from the results in Figure 9, the proposed method has the best performance in the marked junction area. The other four existing methods fail to extract an accurate junction, which leads to lower quantitative indexes compared with the proposed method. In Figure 10, for test image 2, it can also be seen that the MT, ZST, and MARS approaches produce loops when irregular edges and holes exist in the segmented road region. The NMS method achieves the best result in terms of CR, although it fails to link the centerlines well in some intersection areas. In addition, the execution time of MARS and NMS is much longer than that of the other methods; tensor voting is a time-consuming process, especially when the road width in the image is large and a correspondingly large value must be set for the voting-scale parameter. Overall, our proposed method provides results with better QL and acceptable ET.
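The CP, CR, and QL indexes used throughout these tables follow the standard completeness/correctness/quality scheme for evaluating linear features against reference centerlines. As a minimal sketch, assuming the usual length-based definitions (the function name and inputs are illustrative; the actual evaluation matches centerlines within a buffer, following the rules in [26]):

```python
def evaluation_indexes(matched_extraction: float,
                       total_extraction: float,
                       matched_reference: float,
                       total_reference: float) -> dict:
    """Length-based centerline evaluation indexes.

    matched_*: lengths (e.g., in pixels) of centerline lying within the
    matching buffer of the other data set.
    """
    cp = matched_reference / total_reference    # completeness (CP)
    cr = matched_extraction / total_extraction  # correctness (CR)
    # quality (QL): matched extraction over everything extracted or missed
    ql = matched_extraction / (total_extraction
                               + total_reference - matched_reference)
    return {"CP": cp, "CR": cr, "QL": ql}
```

For example, with 84 of 100 reference units matched and 68 of 100 extracted units matched, CP = 0.84 and CR = 0.68, mirroring the lower bounds reported for the test data.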

Cases for SAR Images
In the case of SAR data, test image 3, with 1 m spatial resolution obtained from the TerraSAR-X sensor, and test image 4, with 2 m resolution obtained from the Radarsat-2 sensor, are used. The original SAR images and the segmented road maps are presented in Figures 11 and 12. As before, the extracted road centerlines of the different methods are marked with black lines in the experimental results. The corresponding evaluation indexes for test images 3 and 4 are shown in Table 3.
Due to the effect of multiplicative speckle noise, segmented road maps from SAR images may contain more false and missing road parts, together with irregular edges, compared with those from optical data, as shown in Figures 11b and 12b. These irregular edges produce burrs when the MT method is applied. This can be explained by the topological equivalence principle underlying the MT operation, which yields good results for regular road segments but not for irregular ones. A similar problem exists with the ZST method, which leads to a low CR value. Although the NMS method shows the best CR among these methods, the experiments still demonstrate the advantages of the proposed method when both QL and ET are considered. Moreover, the proposed method also shows the ability to connect small gaps, as can be seen in the blue box area in Figure 11; this is achieved by adjusting the parameter T n when merging close nodes in Section 3.3.1. For test image 4, although the extraction quality of the input road map is not high, centerline extraction using the proposed method incurs only about 3.3% quality loss.


Cases for Labelled Road Images
For the experiments using manually labelled road maps, two test images selected from the labelled training data of the DeepGlobe road dataset [25] are used. The accuracy of the input maps can be considered 100%. As can be seen in Figures 13b and 14b, the road segments are much more regular and smoother than those in the previous test images. Test image 5 is selected for testing centerline extraction performance in mountainous areas where roads have large curvature. Test image 6 includes a complex road network with many junctions and large changes in road width.
For test image 5, the proposed method and all four existing methods demonstrate good performance, according to the experimental results in Table 4 and Figure 13. Nevertheless, the MT approach has the shortest execution time and the highest QL value, which indicates that MT is the best choice for centerline extraction when the road network is simple and the road width is small.
In the case of test image 6, the MT method still shows the best quantitative indexes and the fastest speed compared with the other methods; however, comparing the experimental results with the reference data shows that our proposed method provides more precise road junctions. Taking intersections that are perpendicular to each other as an example, the extraction results of the proposed method better preserve the vertical relationship between road centerlines, whereas the MT method and the other methods do not. To sum up, the experiments show that the MT method should take priority over other approaches when the segmented roads are smooth and narrow, whereas the proposed method has advantages in constructing accurate road junctions when roads are complex and wide. Combined with the previous four test images, the experiments also emphasize the stability and flexibility of the proposed method. Thus, regardless of whether the segmented road map is disturbed by noise, our proposed method demonstrates good performance in centerline extraction.

Comparisons of Different Methods
In order to statistically test whether there is a significant difference between the extraction results of the proposed method and the four existing methods, a McNemar test [28] is performed. The McNemar test is a nonparametric test which focuses on the binary distinction between correct and incorrect class assignments. The test is conducted using the standardized normal test statistic z = (f12 − f21)/√(f12 + f21), where f12 and f21 denote the numbers of sampled pixels that are correctly classified by one method and incorrectly classified by the other. Based on [28], the square of z follows a chi-squared (χ2) distribution with one degree of freedom. Thus, the difference between the extraction results of two approaches can be concluded to be statistically significant (p < 0.05) if |z| is greater than 1.96. In this paper, the sampled pixels for statistical analysis are composed of road pixels and non-road pixels. The sampled road pixels are selected from road segments in the ground truth data. Since non-road pixels occupy a large proportion of the image, and most of them are correctly classified as non-road, it is only necessary to compare the wrongly extracted parts of the two methods' results against the sampled non-road pixels. More specifically, according to the matching rules in [26], extracted centerlines that do not match the reference data are treated as the wrongly extracted parts of an extraction result. Based on the sampled pixels in the six test images, Table 5 presents the z values of the McNemar tests between every two centerline extraction methods mentioned in this paper.
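The statistic can be computed directly from the two disagreement counts; a minimal sketch (the counts in the usage example are illustrative, not taken from Table 5):

```python
import math

def mcnemar_z(f12: int, f21: int) -> float:
    """Standardized McNemar statistic z = (f12 - f21) / sqrt(f12 + f21).

    f12: pixels correct with method 1 but wrong with method 2
    f21: pixels wrong with method 1 but correct with method 2
    """
    if f12 + f21 == 0:
        return 0.0  # the two methods agree on every sampled pixel
    return (f12 - f21) / math.sqrt(f12 + f21)

# |z| > 1.96 -> the difference is significant at the 0.05 level
z = mcnemar_z(f12=120, f21=75)
significant = abs(z) > 1.96
```

Note that only the disagreement cells of the 2×2 contingency table enter the statistic; pixels on which both methods agree carry no information about which method is more accurate.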
According to the McNemar tests, the results of the proposed method are significantly more accurate than those derived from the other four state-of-the-art methods in test images 1, 3, and 4 (p < 0.05). One exception occurs in test image 2, where the difference in accuracy between the proposed method and the NMS method is not statistically significant (p > 0.05). For the labelled road maps of test images 5 and 6, there are significant differences (|z| > 1.96) at the 95% confidence level when the MT method is compared with the other four methods, except for the case of the proposed method vs. the MT method in test image 6. In conclusion, the proposed method significantly outperforms the other four existing methods for mapping road centerlines from noise-disturbed images, such as SAR images. Although the method in this paper may not achieve the best extraction quality for regular segmented road maps, a visual comparison of junction shapes in the experimental results shows that it has advantages in constructing more precise junctions while maintaining a high extraction quality.

Discussion
This section presents a parameter sensitivity analysis of the proposed method using the test images. The parameters considered are the diameter D in the RoC operation stage, the threshold T λ in the perceptual grouping stage, and the threshold T n in the spline fitting stage.
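The parameter guidance developed below can be summarized as a small helper; this is an illustrative sketch of the suggested defaults, not part of the original method (the function name and the exact factors are assumptions):

```python
def suggest_parameters(max_road_width_px: float) -> dict:
    """Heuristic defaults following the guidance in this section."""
    return {
        # sliding-window diameter: slightly larger than the widest road,
        # but not so large that the window spans parallel road segments
        "D": 1.1 * max_road_width_px,
        # collinearity threshold: zero disables continuity grouping;
        # a very small positive value can slightly improve accuracy
        "T_lambda": 0.0,
        # node-merging distance: tuned empirically, and it should not
        # exceed the maximum road width in the image
        "T_n": 0.5 * max_road_width_px,
    }
```

For example, for an image whose widest road is about 40 pixels, the helper proposes D ≈ 44 and keeps T n well below the maximum road width.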

Analysis for RoC Detector
Concerning the RoC detector, the diameter D of the sliding window is a key parameter. The chosen value of D should preserve the accuracy of the road direction and width feature extraction as much as possible. To present the effects of D, the quantitative indexes of the proposed method with different values of D are computed and compared for test images 2 and 6. As illustrated in Figure 15, all three indexes are low for both test images when D is small. As D increases, the quantitative indexes tend to increase and then reach stable values. Empirically, the turning point of the plateau occurs when D reaches the maximum width of the road in the image. Thus, it is suggested that D be set based on prior knowledge of the maximum road width; however, D should not be too large, since an oversized sliding window may pass through two or more parallel road segments and degrade detection performance.
Additionally, the influence of the MRF method mentioned in Section 3.1 is examined. Using the proposed RoC detector, roads are initially segmented into several classes according to the orientation feature. The application of MRF aims to merge isolated road pixels, whose direction features differ from those of the surrounding pixels, using the local spatial relationship. Table 6 presents comparison results for the proposed method with and without the MRF procedure on test images 2, 4, and 6. It can be seen that MRF slightly improves the centerline extraction results in terms of CP, CR, and QL. Comparing the ET for test images 2 and 6 shows that the time cost of MRF is almost negligible; however, test image 4 is an exception, where the ET without MRF is significantly longer. This can be explained by the fact that the roads in test image 4 are noticeably irregular, and thus the number of road clusters in S after direction grouping is quite large if the MRF step is not included. The computational complexity of Algorithm 1 for perceptual grouping in Section 3.2 is proportional to the number of road clusters in the input S, and applying MRF reduces the number of road clusters in advance.

Analysis for Perceptual Grouping
With regard to the perceptual grouping stage of the proposed method, three parameters are mainly involved: T H for proximity grouping, along with T λ and T w for continuity grouping. Threshold T H can be set adaptively using the corresponding road width detected by the RoC operation. Threshold T w is recommended to be set to a single lane width; therefore, T λ is the only threshold that remains to be determined manually in practical applications. To evaluate the parameter sensitivity, experiments with different values of T λ are conducted on test images 2 and 6, and the results are shown in Figure 16. When T λ is 0, Equation (3) cannot be satisfied, which means the continuity grouping step is not executed. It can be observed that a very small value greater than 0 can slightly improve the centerline extraction accuracy; however, as T λ increases, an inflection point appears and the performance of the method deteriorates sharply. The location of the inflection point is not fixed and may vary from image to image. Consequently, it is suggested in practice to set T λ to zero or to a very small value determined empirically.


Analysis for Spline Fitting
For the spline fitting stage of the proposed method, the threshold T n determines the minimum distance between nodes. If the Euclidean distance between two nodes is smaller than T n, they are replaced by the midpoint of the two nodes. A T n value that is too small may lead to small burrs at the spline junctions, while a T n that is too large may cause nodes to deviate from the centerline position and thus produce inaccurate centerlines. The experimental results in Figure 17 illustrate the effects of T n. Taking Figure 17a as an example, when T n is in the range of 0 to 40, QL stabilizes at around 75%; however, once T n exceeds 40, there is a significant drop in QL. The experimental results show that T n should not be set to a large value; empirically, it should not exceed the maximum road width in the image.
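The node-merging rule can be sketched as a simplified greedy pass: any pair of nodes closer than T n is replaced by its midpoint until no such pair remains. This is an illustrative sketch, not the exact implementation of Section 3.3.1:

```python
import math

def merge_close_nodes(nodes, t_n):
    """Greedily replace node pairs closer than t_n by their midpoint."""
    nodes = [tuple(n) for n in nodes]
    merged = True
    while merged:
        merged = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                (x1, y1), (x2, y2) = nodes[i], nodes[j]
                if math.hypot(x1 - x2, y1 - y2) < t_n:
                    mid = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
                    # drop the pair, keep its midpoint, and rescan
                    nodes = [n for k, n in enumerate(nodes)
                             if k not in (i, j)]
                    nodes.append(mid)
                    merged = True
                    break
            if merged:
                break
    return nodes
```

For example, with T n = 5 the nodes (0, 0) and (2, 0) collapse to (1, 0), while a distant node at (100, 0) is left untouched; this is also the mechanism that closes small gaps between adjacent spline segments.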



Conclusions
In this paper, a general method for road centerline extraction from remote sensing images is proposed based on perceptual grouping and spline fitting. The proposed method is suitable for noisy data and is competitive in computational efficiency. Furthermore, experiments presented the merits of the proposed method in junction construction, gap connection, and the adaptability for road segmentation resulting from different sources.
From the results of the quantitative analysis, the following conclusions can be drawn. Firstly, the completeness and correctness of the centerlines extracted by the proposed method exceed 84% and 68%, respectively, on the test images. Secondly, for the segmented road maps obtained by machine recognition, the proposed method achieves the best QL indexes; moreover, the McNemar tests demonstrate significant differences at the 95% confidence level when the proposed method is compared with the MT, ZST, MARS, and NMS methods, except for the case of the proposed method vs. NMS in test image 2. Thirdly, for the manually labelled road maps, the MT method shows the best performance in terms of both accuracy and execution time; the advantage of our method is that it provides a more precise road junction shape while remaining close to the MT method in accuracy.
However, it can also be noted that some algorithm parameters of the proposed method have to be determined manually according to the available prior knowledge of image resolution, and the maximum road width based on practical applications. Thus, it is desirable to develop an automatic threshold determination approach for parameter setting in the future. Nevertheless, experimental results indicate that the proposed method has excellent application potentials, such as road data update, the fusion of road networks, and the road vectorization and generalization in GISs.