Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss

Abstract: The road extraction task is mainly composed of two subtasks, namely, road detection and road centerline extraction. As the road detection task and the road centerline extraction task are strongly correlated, in this paper we learn the two tasks jointly under a multitask learning scheme, with an ordinal-regression-based centerline branch and a road-topology loss. Experimental results show that our method improves the completeness of the extracted road network and gains 10.9% in the Quality metric on our test dataset.


Introduction
Synthetic Aperture Radar (SAR) can penetrate clouds and collect data in any weather at any time. Therefore, it has been widely used in various fields. With the rapid development of SAR technology in recent decades, high-resolution SAR images have become an essential source of information due to their broader coverage and increased ground detail. Roads are an indispensable part of the modern transportation network, and they hold a pivotal position both geographically and economically. Automatic road extraction from SAR images has therefore become a research hotspot, with a wide range of applications in urban planning, disaster prevention and mitigation, and geographic information system updates.
In recent decades, many methods for road extraction from SAR images have been proposed [1][2][3][4][5][6][7]. The road extraction task can be divided into two subtasks: road detection and road centerline extraction. The road extraction methods in previous works separated these two tasks, which can be roughly summarized into three steps. Firstly, the road in the SAR image is detected according to the extracted image features. Then, the road centerline is obtained by skeletonizing the road detection result. Finally, the road centerline is converted into a graph, which is then topologically optimized to obtain the road centerline network. The methods that separate the two tasks ignore the correlation between them, and the errors of road detection propagate to the road centerline extraction result. The work [8] achieved better performance by using a conditional random field model to perform the three steps simultaneously: road detection, skeletonization, and topology optimization.
Since deep convolutional neural networks (DCNNs) [9] showed outstanding performance in the 2012 ImageNet image classification challenge, they have made great progress in various traditional computer vision tasks such as image classification [10][11][12][13][14] and image semantic segmentation [15][16][17][18][19]. Deep learning has also been widely used in remote sensing [20][21][22][23][24]. However, the works that apply deep learning to SAR images [25,26] are relatively few. One of the reasons is that the unique characteristics of SAR images make labeling time-consuming and labor-intensive. Reference [27] proposed a method to automatically label buildings in SAR images, which significantly reduces the labeling difficulty. Reference [27] also combined an FCN used to extract features with a CRF-RNN that exploits spatial information to extract buildings from SAR images. Although relatively few works have applied deep learning to SAR imagery to date, it is undeniable that deep learning has colossal application potential for SAR image processing. DisDBN, which combines ensemble learning with a deep belief network to learn discriminative features, was proposed in [28] for SAR image classification. Reference [29] fed an extracted six-channel covariance matrix into a DCNN for PolSAR image classification. Reference [30] proposed a deep supervised and contractive neural network for SAR image classification. Reference [31] extended the semantic segmentation network from the real-valued domain to the complex-valued domain, which can exploit the unique information of SAR data. Reference [32] proposed a new fully convolutional neural network that can be trained end-to-end for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystems. Reference [33] achieved smooth classification of four classes with a small training set by deep transfer learning.
Reference [34] extracted multiscale features using a multiscale CNN model to detect built-up areas in SAR images. Reference [35] was the first to apply a deep fully convolutional neural network to segment roads in SAR images.
Two challenges restrict the accuracy of road extraction from SAR images: (1) roads in SAR images are usually modeled as dark elongated areas surrounded by bright edges, which are easily confused with other objects such as rivers, shades of trees, shadows of buildings, etc.; (2) speckle noise in SAR images seriously degrades image quality and interferes with road extraction. A deep convolutional neural network with a large receptive field can effectively extract contextual information, which can be used to distinguish roads from other similar objects. A deep convolutional neural network also reduces the influence of speckle noise by repeatedly performing convolution operations on the input images. Therefore, in this paper, we introduce a deep learning framework to extract roads from SAR imagery.
The road extraction task includes two subtasks: the road detection task and the road centerline extraction task. Most previous works relied on multistage learning methods to extract roads. These methods obtain road centerlines by a post-processing step that thins the road detection map predicted by a CNN. The disadvantage of these methods is that imperfect road detection results lead to road centerline extraction results with low connectivity. There is a symbiotic relationship between the road detection task and the road centerline extraction task: the two tasks can promote each other. The road detection task can provide detection cues for the road centerline extraction task to constrain road centerlines, which can avoid spurious parts. The road centerline extraction task can motivate the road detection task to pay more attention to the key points of roads, which can enhance road connectivity. In order to make full use of this relationship, our proposed network learns the road detection task and the road centerline extraction task simultaneously under a multitask learning scheme.
For road centerline extraction from raw image data, previous works can be roughly divided into two categories: classification-based methods and regression-based methods. References [7,36,37] extract road centerlines with classification-based methods. The features of pixels on the road centerline are similar to the features of pixels adjacent to the centerline, while the features of pixels far from the centerline are completely different from those of pixels on the centerline. However, the errors caused by misclassifying pixels adjacent to the centerline are penalized the same as those caused by misclassifying pixels far from the centerline, which is unreasonable: in practice, the closer a pixel is to the centerline, the more we can tolerate its classification error. This contradiction makes it difficult for classification-based methods to converge to a good result. Reference [38] first proposed a regression-based method, which learns a designed function whose return value decreases with the distance from the pixel to the centerline. However, due to outliers, i.e., annotation errors, a deep network trained for regression is relatively unstable, and a network trained with the MSE loss will not converge to a satisfactory global solution. Reference [39] learned the map of distances from each pixel to the nearest boundary by training a multi-class classification network, which ignores the ranking relation between different distance classes. To avoid the above problems, we exploit ordinal regression to learn discrete normalized distance labels, and we minimize an ordinal loss to learn the network parameters of the road centerline extraction task.
In the real world, roads have unique topological properties. Previous works usually applied topology priors using variational and Markov random field-based methods [36,40,41,42]. Reference [40] imposed a topology constraint via a high-order CRF, in which high-order cliques connect superpixels of the road network. Reference [36] represented the road network as a sequence of graph structures and found an optimal subgraph by integer programming. These previous works generally employed road topology optimization as post-processing, which cannot remove large spurious parts or connect large gaps. Recently, some approaches improve the topology of the extracted road by minimizing a topology-preserving loss function [43,44]. Reference [43] stated that pixel-wise loss functions alone are not enough for curvilinear structure detection and proposed a topology-aware loss defined on selected filter responses of a pretrained VGG19 to penalize topology errors. Reference [44] adopted a continuous loss function based on persistent homology. Neither the VGG19 responses nor Betti numbers are specifically designed for road extraction, so their penalty on topology errors of the extracted road network is limited. To solve the above-mentioned problems, a new road-topology loss is specially designed for the road extraction task, which can reduce topology errors. Our main contributions are as follows:

1. Different from previous methods for road extraction from SAR imagery, we detect roads and extract road centerlines simultaneously. This multitask learning scheme can exploit the correlation between the road detection task and the road centerline extraction task;
2. For the road extraction task, we build our own dataset with TerraSAR-X images, which cover urban, suburban and rural areas. Our experiments are carried out on this dataset, and the results show that our proposed framework achieves a better road extraction performance;
3. For the road centerline extraction task, we convert the road centerline extraction problem into a discrete normalized distance label prediction problem, which can be solved by training an ordinal regressor;
4. Considering the special topology of road networks, we propose a new road-topology loss designed to reduce topology errors of road extraction, including spurious parts and gaps.
The remainder of our paper is organized as follows. We present the proposed method in detail in Section 2. In Section 3, we quantitatively and qualitatively analyze the performance of our method compared with baseline methods. We discuss the stability of different methods under various binarization thresholds in Section 4. Finally, we conclude the paper in Section 5.

Proposed Method
Figure 1 illustrates our road extraction framework. For road extraction, our proposed network learns the road detection task and the road centerline extraction task jointly under a multitask learning scheme. As shown in Figure 1, our framework has two branches: the road detection branch and the road centerline extraction branch. The encoder of the two branches is shared for feature extraction, which establishes a connection between them. In this section, we first separately introduce how our network performs the road detection task and the road centerline extraction task. Next, the definition of our proposed road-topology loss function is given. Finally, we introduce how our multitask learning framework simultaneously learns the road detection task and the ordinal-regression-based road centerline extraction task using the proposed road-topology loss. In the following discussion, we let I ∈ R^{H×W} be the H × W input image, let Y ∈ {0, 1}^{H×W} be the corresponding ground truth, with 1 indicating pixels on the road and 0 indicating background pixels, and let Ŷ ∈ [0, 1]^{H×W} be the predicted probability map of the road. We let B be the mini-batch, let i be a pixel in I, and let y_i be the label of pixel i. The predicted probability that pixel i is on the road is denoted by ŷ_i.

Road Detection
The road detection task aims to detect roads from SAR imagery. The output of the road detection task is a binary image, in which pixels on roads are 1 and the others are 0. In practice, most pixels of SAR imagery belong to non-road regions. As a result, there is a label-imbalance problem in the road detection task. To overcome this problem, we use the weighted cross-entropy loss proposed in [45]. The weighted cross-entropy loss of I is

L_CE(I) = −∑_{i∈I} [ w_1 · y_i · log(ŷ_i) + w_0 · (1 − y_i) · log(1 − ŷ_i) ].

Next, we present the weights in the weighted cross-entropy loss. Let |B| be the number of images in the mini-batch. The loss weight for road pixels is w_1 = (1 / (|B| × H × W)) ∑_{I∈B} ∑_{i∈I} 1(y_i == 0), and the loss weight for non-road pixels is w_0 = (1 / (|B| × H × W)) ∑_{I∈B} ∑_{i∈I} 1(y_i == 1).

Figure 1. An overview of the proposed multitask road extraction framework, which includes two parts: the road detection branch and the road centerline extraction branch. The encoder is shared between the two branches. The architecture is optimized with three terms: the cross-entropy loss for the road detection branch, the ordinal loss for the road centerline extraction branch, and the road-topology loss for both.
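As an illustration, the weighted cross-entropy loss above can be sketched as follows. This is a minimal NumPy sketch (in our experiments the loss is computed on PyTorch tensors, where the same weighting can be passed to the built-in cross-entropy); the function name and array layout are illustrative, not part of the framework.

```python
import numpy as np

def weighted_bce(y_true, y_prob, eps=1e-7):
    """Weighted cross-entropy for a mini-batch of road masks.

    y_true: (B, H, W) binary ground truth (1 = road).
    y_prob: (B, H, W) predicted road probabilities.
    The road-pixel weight w1 is the fraction of non-road pixels in the
    batch and vice versa, so the rarer class receives the larger weight.
    """
    n = y_true.size                      # |B| * H * W
    w1 = np.sum(y_true == 0) / n         # weight for road pixels
    w0 = np.sum(y_true == 1) / n         # weight for non-road pixels
    y_prob = np.clip(y_prob, eps, 1 - eps)
    loss = -(w1 * y_true * np.log(y_prob)
             + w0 * (1 - y_true) * np.log(1 - y_prob))
    return loss.sum()
```

With a mostly-background batch, the rare road pixels dominate the loss, which is exactly the intended rebalancing.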

Road Centerline Extraction
For road centerline extraction, the classification-based approaches learn a function y(·) such that

y(f_i) = 1 if pixel i is on the centerline, and y(f_i) = 0 otherwise,

where f_i is the feature of pixel i. The regression-based methods learn a regressor y(·) whose values decrease monotonically as the distance of i to the centerline increases. In particular, in [38] the regressor y(·) is such that

y(f_i) = e^{α(1 − D_C(i)/d_M)} − 1 if D_C(i) < d_M, and y(f_i) = 0 otherwise,

where α > 0 is a constant, D_C(i) is the metric distance from pixel i to the closest pixel on the centerline, and d_M is s/2, where s is the size of the local neighborhoods used to compute the feature vector f_i. Our proposed method is based on ordinal regression, which is different from both. In the remainder of this subsection, we first model the road centerline extraction problem as a discrete normalized distance label prediction problem. We then describe how to predict discrete normalized distance labels by learning an ordinal regressor.
Roads are surrounded by bright edges in high-resolution SAR images. As a result, we can predict the distance d_i from any pixel i to the nearest road edge. However, the probability that pixel i is on the centerline of the road is not proportional to d_i, because road widths vary, as depicted in Figure 2. Therefore, we instead predict the normalized distance from pixel i to the nearest road edge. The normalized distance dn_i is defined as

dn_i = 2 · d_i / w_i,

where w_i is the road width at pixel i. In particular, dn_i is proportional to the probability that i is on the centerline of the road. Meanwhile, if i is on the road centerline, dn_i is a local maximum along the direction perpendicular to the direction of the road. We further quantize each dn_i using the thresholds {t_0, t_1, ..., t_{K−1}} into one of K + 1 intervals. The reason we quantize the normalized distance dn_i is that directly training a deep network for regression is relatively unstable, because outliers (annotation errors) cause large error terms, making it difficult for the network to converge and leading to unstable predictions [39]. After quantization, each pixel i is given a discrete normalized distance label l_i, such that

l_i = 0 if dn_i ≤ t_0, l_i = k if t_{k−1} < dn_i ≤ t_k (k ∈ {1, ..., K − 1}), and l_i = K if dn_i > t_{K−1}.

We could predict the discrete normalized distance label with a typical multi-class classification method; however, the ordinal information between the discrete normalized distance labels would then be ignored. Reference [46] first combined ordinal regression with DCNNs to address age estimation, transforming the ordinal regression problem into a series of binary classification sub-problems and thus taking into account the fact that the set of ages is well ordered. Therefore, we use the ordinal regression of [46] to solve the discrete normalized distance label prediction problem and modify the ordinal loss to adapt it to the road centerline extraction task.
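The quantization step can be sketched as follows. The function name and example thresholds are illustrative, not the values used in our experiments; the label of a pixel is simply the number of thresholds its normalized distance exceeds.

```python
import numpy as np

def quantize_distance(dn, thresholds):
    """Map normalized distances dn in [0, 1] to discrete labels 0..K.

    thresholds: increasing values (t_0, ..., t_{K-1}); counting how many
    thresholds dn exceeds partitions [0, 1] into K + 1 intervals.
    """
    dn = np.asarray(dn, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    # l_i = #{k : dn_i > t_k}
    return (dn[..., None] > t).astype(int).sum(axis=-1)
```

For example, with thresholds (0.2, 0.5, 0.8), a pixel with dn = 0.4 receives label 1, and a pixel with dn = 0.95 receives label 3.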
Next, we introduce the ordinal regression and the ordinal loss used in this paper in detail. Let ϕ denote the feature extractor of the network used to extract road centerlines, with parameters Φ. I ∈ R^{H×W} is the input image, and i denotes one of its pixels. The feature map of I and the feature vector of pixel i for the road centerline extraction task are F = ϕ(I, Φ) and f_i ∈ F, respectively. ψ is the last layer of the network for the road centerline extraction task, which performs the ordinal regression. Its parameters are given by Θ = {θ_0, θ_1, ..., θ_{2K−1}}, where θ_j (j ∈ {0, 1, ..., 2K − 1}) is a weight vector. The ordinal output of I and the ordinal output vector of pixel i are denoted by O = ψ(F, Θ) and o_i = (o_i^0, o_i^1, ..., o_i^{2K−1}), respectively. With the softmax activation function that is also used in [46], the probability p̂_i^k = P(l_i > k | F, Θ) (k ∈ {0, 1, ..., K − 1}) that the predicted label of i is greater than k is calculated as

p̂_i^k = e^{o_i^{2k+1}} / (e^{o_i^{2k}} + e^{o_i^{2k+1}}).

Following the method of calculating the ordinal loss in [46], the pixel-level ordinal loss of pixel i is then given by

L_ord(i) = −∑_{k=0}^{K−1} [ λ_k · η(l_i > k) · log(p̂_i^k) + μ_k · η(l_i ≤ k) · log(1 − p̂_i^k) ],

where η(·) is the indicator function with η(true) = 1 and η(false) = 0, and λ_k and μ_k are class weights used to address the unbalanced-classes problem.

The ordinal loss of I is defined as the sum of the ordinal losses of all pixels in image I:

L_ord(I) = ∑_{i∈I} L_ord(i).

The advantage of the ordinal loss is that the greater the difference between the predicted label l̂_i and the true label l_i, the greater the loss. We minimize the ordinal loss by iterative optimization and update the parameters of the road centerline extraction network through backpropagation.

In the test phase, we calculate p̂_i, the mean of p̂_i^k over k for each pixel i:

p̂_i = (1/K) ∑_{k=0}^{K−1} p̂_i^k.

We observe that p̂_i is proportional to the normalized distance dn_i, so we can regard p̂_i as the predicted probability that pixel i is on the road centerline. Let P̂ be the centerline probability map of I, where p̂_i ∈ P̂. From the definition of the normalized distance map, the necessary and sufficient condition for pixel i to be on the road centerline is that dn_i is a local maximum along the direction perpendicular to the direction of the road. However, if we only infer centerline membership by checking whether p̂_i is such a local maximum, some non-road regions will be extracted as road centerline. As a result, we first set the values of P̂ that are less than a threshold T to zero, and then apply a Canny-like non-maximum suppression algorithm to P̂ to obtain the road centerline.
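The ordinal mechanics can be sketched for a single pixel as follows. This NumPy sketch assumes the DORN-style pairwise softmax of [46], omits the class weights for brevity, and uses names of our own choosing; the real network applies the same operations per pixel on feature maps.

```python
import numpy as np

def ordinal_probs(o):
    """o: (2K,) ordinal logits for one pixel. Returns the (K,) vector of
    probabilities P(l > k), each obtained by a softmax over the k-th
    logit pair (DORN-style ordinal regression)."""
    K = o.shape[0] // 2
    pairs = o.reshape(K, 2)
    e = np.exp(pairs - pairs.max(axis=1, keepdims=True))  # stable softmax
    return e[:, 1] / e.sum(axis=1)

def ordinal_loss(o, label, eps=1e-7):
    """Unweighted pixel-level ordinal loss for true label `label`."""
    p = np.clip(ordinal_probs(o), eps, 1 - eps)
    k = np.arange(p.shape[0])
    ind = (label > k).astype(float)            # eta(l_i > k)
    return -np.sum(ind * np.log(p) + (1 - ind) * np.log(1 - p))

def predicted_centerline_prob(o):
    """Test-time p_hat_i: the mean of P(l > k) over k, which is treated
    as the probability that the pixel lies on the road centerline."""
    return ordinal_probs(o).mean()
```

Note how a prediction that overshoots the true label by several ranks accumulates one misclassified binary term per rank, which is exactly the rank-aware penalty that plain multi-class classification lacks.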

Road-Topology Loss
In practice, the cross-entropy loss is widely used in various segmentation tasks, such as semantic segmentation and instance segmentation. The cross-entropy loss is a pixel-wise loss: it is completely local and does not take the special and complex topological characteristics of roads into account. Such a loss penalizes the mistake at each pixel equally and independently, regardless of the effect of the error on geometry. However, in practice, we find that the pixels closer to the centerline of the road are more important, because the misclassification of these pixels causes serious topology errors, such as gaps and spurious parts. To penalize the gaps in the road detection prediction and the spurious parts in the road centerline extraction prediction, we propose a new road-topology loss L_T.
Next, we give the details of the road-topology loss. To measure the connectivity of the road detection prediction, we define the connectivity metric as

T_con(Ŷ, L) = |Ŷ ⊙ L| / |L|,

where Ŷ ∈ [0, 1]^{H×W} is the prediction map of road detection, L ∈ {0, 1, ..., K}^{H×W} (l_i ∈ L) is the discrete normalized distance label map of I, ⊙ denotes the element-wise product, and |·| is an operation that calculates the sum of the matrix elements. Similarly, we define the differentiable correctness metric to measure the correctness of the road centerline extraction prediction as

T_cor(P̂, Y) = |P̂ ⊙ Y| / |P̂|,

where Y ∈ {0, 1}^{H×W} is the ground truth of road detection and P̂ ∈ [0, 1]^{H×W} (p̂_i ∈ P̂) is the predicted road centerline probability map of I. We observe that T_con is susceptible to gaps in the road detection prediction, while T_cor is susceptible to spurious parts in the road centerline extraction prediction. Finally, we define the road-topology metric T_Road as the harmonic mean of the connectivity metric T_con and the differentiable correctness metric T_cor:

T_Road(Ŷ, P̂, Y, L) = 2 · T_con · T_cor / (T_con + T_cor).
The road-topology metric measures the connectivity and correctness of the road extraction result at the same time.
In order to maximize the road-topology metric in CNNs in an end-to-end manner, we define our road-topology loss L_T as

L_T = 1 − T_Road(Ŷ, P̂, Y, L).

L_T is calculated directly from the raw predictions Ŷ and P̂ without thresholding. As a result, L_T is differentiable with respect to Ŷ and P̂ and can be integrated into a CNN.
In this paper, we use the AdamW optimizer to minimize the road-topology loss. Since T_con and T_cor are ratios of sums of the network activations, the partial derivatives of L_T with respect to the activations Ŷ and P̂ at each pixel have closed forms and are obtained directly by the chain rule.
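A differentiable sketch of the road-topology loss is given below. It assumes specific forms for the metrics, namely T_con = |Ŷ ⊙ L| / |L|, T_cor = |P̂ ⊙ Y| / |P̂|, and L_T = 1 − T_Road; these forms and the function names are our assumptions for illustration. The NumPy version only computes the value; in an autograd framework such as PyTorch, the same expression is differentiated automatically.

```python
import numpy as np

def road_topology_loss(y_hat, p_hat, y, dist_labels, eps=1e-7):
    """Sketch of the road-topology loss, assuming L_T = 1 - T_Road.

    y_hat:       predicted road probability map (H, W).
    p_hat:       predicted centerline probability map (H, W).
    y:           binary road ground truth (H, W).
    dist_labels: discrete normalized distance label map L (H, W).
    """
    # Connectivity: gaps in y_hat over highly-weighted (near-centerline)
    # pixels pull this term down.
    t_con = (y_hat * dist_labels).sum() / (dist_labels.sum() + eps)
    # Correctness: spurious centerline mass off the true road pulls
    # this term down.
    t_cor = (p_hat * y).sum() / (p_hat.sum() + eps)
    t_road = 2 * t_con * t_cor / (t_con + t_cor + eps)
    return 1.0 - t_road
```

A perfect joint prediction drives the loss toward 0, while a detection map full of gaps (or a centerline map full of spurious parts) drives it toward 1.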

Multitask Learning
In our road extraction framework, the input image I is fed into the shared encoder to extract features. The feature maps are then input into the decoders of the two tasks to obtain the road detection prediction and the road centerline extraction prediction. As shown in Figure 1, the prediction of the road detection task Ŷ and the ground truth Y are used to calculate the weighted cross-entropy loss, which is minimized to update the parameters of the road detection network. The prediction of the road centerline extraction task P̂ and the discrete normalized distance label map L are used to calculate the ordinal loss, which is minimized to update the parameters of the road centerline extraction network. Y, Ŷ, P̂, and L are used to calculate our proposed road-topology loss, which combines the predictions of the two tasks. The road-topology loss makes full use of the correlation between the two tasks and is minimized to make them promote each other. The entire loss function is the sum of the cross-entropy loss, the ordinal loss, and the road-topology loss. By minimizing the entire loss function, the parameters of the road detection network and the road centerline extraction network are updated simultaneously, which means our framework learns the road detection task and the road centerline extraction task jointly. The loss of a mini-batch is calculated as

L(B) = (1/|B|) ∑_{I∈B} [ L_CE(I) + L_ord(I) + L_T(I) ].

Dataset
In this subsection, we present the dataset used in this paper. There is no public dataset applicable to our research, so we create our own dataset using high-resolution TerraSAR-X images obtained in stripmap mode. As shown in Table 1, we label the roads in two SAR images, whose coverage includes urban, suburban, and rural areas. The Google Earth maps of the study area are shown in Appendix A. As the dataset is intended for road extraction, we only label the roads in regions where the road network is dense. Our labeled area was split into a training and a test set as follows: the upper 80% of the area (435 patches of 1024 × 1024 pixels) was used for training, and the lower 20% (104 patches of 1024 × 1024 pixels) for testing. We use the raw SAR intensity images without any preprocessing; speckle noise is present in the images. Our training set contains raw SAR patches, the ground truth of roads, and the ground truth of discrete labels. Algorithm 1 describes how the road centerline and road width ground truth are obtained. Figure 3 shows a sample from the training set. The test set contains only the ground truth of roads and the ground truth of road centerlines.

Algorithm 1 (fragment). For each road pixel i, with d_i its distance to the nearest road edge in the distance map D and o_i its direction:
6: if d_i is a local maximum along the direction o_i then
7:   i is on the road centerline: c_i = 1
8:   the road width of pixel i: w_i = 2 · d_i
9: else
10:  i is not on the road centerline: c_i = 0
11:  the road width of pixel i: w_i = 2 · d_j, where d_j is the local maximum starting from i along the direction o_i in the distance map D.
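A 1-D sketch of the labeling rules in Algorithm 1, applied along a single road cross-section, is shown below. In practice d is computed with a 2-D distance transform (e.g., scipy.ndimage.distance_transform_edt); the helper and its name are ours and only illustrate the c_i and w_i rules.

```python
import numpy as np

def label_cross_section(road_row):
    """Apply the Algorithm 1 rules along one road cross-section.

    road_row: 1-D binary sequence (1 = road). Returns the distance d of
    each road pixel to the nearest road edge, the centerline flag c, and
    the road width w, with w_i = 2*d_i at the centerline and w_i = 2*d_j
    (the local maximum of d along the section) for other road pixels.
    """
    n = len(road_row)
    # Road edges: non-road pixels, plus the section borders as sentinels.
    edges = [-1, n] + [j for j in range(n) if road_row[j] == 0]
    d = np.array([min(abs(i - j) for j in edges) if road_row[i] else 0
                  for i in range(n)], dtype=float)
    d_max = d.max()                      # local maximum along the section
    c = np.array([int(bool(road_row[i]) and d[i] == d_max) for i in range(n)])
    w = np.array([2 * d_max if road_row[i] else 0 for i in range(n)])
    return d, c, w
```

On the profile (0, 1, 1, 1, 0), the middle pixel is flagged as centerline and every road pixel receives the same width 2 · d_max, matching lines 8 and 11 of the algorithm.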

Evaluation Metrics
To assess the quantitative performance on both the road detection task and the road centerline extraction task, we apply the metrics introduced in [47]. Our framework detects roads and extracts road centerlines simultaneously; to reflect this characteristic, we also propose a series of metrics to evaluate the performance of the road extraction task as a whole.

Road Detection
The metrics employed to evaluate the performance of our approach on the road detection task are precision, recall, F1-score, and intersection over union (IoU). Precision (P) measures the ratio of the number of pixels that are labeled as road in the ground truth and predicted as road to the number of pixels predicted as road. Recall (R) is the ratio of the number of pixels that are labeled as road in the ground truth and predicted as road to the number of pixels labeled as road. The F1-score (F1_rd) balances precision and recall as their harmonic mean. IoU is the ratio of the intersection to the union of the true label and the predicted result, which trades off recall and precision. Specifically, the four metrics are defined as

P = TP / (TP + FP), R = TP / (TP + FN), F1_rd = 2 · P · R / (P + R), IoU = TP / (TP + FP + FN),

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives. As there is a deviation between the manually labeled roads and the real roads, we relax the metrics using the buffer method given in [48]. Specifically, predicted regions within ρ pixels of the ground truth are regarded as matching regions. In this paper, we set ρ = 2.
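The relaxed (buffered) matching can be sketched as follows. The sketch uses a simple city-block dilation in place of a library morphology routine, and its names and dilation shape are illustrative only; a predicted pixel counts as correct if it falls inside the ρ-pixel buffer around the ground truth, and symmetrically for recall.

```python
import numpy as np

def dilate(mask, rho):
    """Binary dilation by rho steps of a city-block (4-neighbour)
    structuring element: the rho-pixel buffer around the mask."""
    out = mask.astype(bool)
    for _ in range(rho):
        padded = np.pad(out, 1)
        out = (padded[:-2, 1:-1] | padded[2:, 1:-1]
               | padded[1:-1, :-2] | padded[1:-1, 2:] | padded[1:-1, 1:-1])
    return out

def relaxed_pr(pred, gt, rho=2):
    """Relaxed precision/recall with a rho-pixel matching buffer."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp_p = (pred & dilate(gt, rho)).sum()   # predictions near ground truth
    tp_r = (gt & dilate(pred, rho)).sum()   # ground truth near predictions
    precision = tp_p / max(pred.sum(), 1)
    recall = tp_r / max(gt.sum(), 1)
    return precision, recall
```

For a road predicted one pixel off its annotation, the strict metrics report zero overlap, while the relaxed ones with ρ = 2 count it as fully matched.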

Road Centerline Extraction
Due to the differences between the road detection task and the road centerline extraction task, it is better to use different metrics for the centerline task. We calculate completeness, correctness, quality, and F1-score to assess the performance of our approach on the road centerline extraction task. As it is difficult to directly compare the pixel-wise difference between the extracted centerline and the ground truth, we introduce a buffer-based evaluation for road centerlines. Completeness (COM) is the ratio of the length of the reference centerline that lies within the buffer around the extracted centerline to the total length of the reference centerline. Correctness (COR) is the ratio of the length of the extracted centerline that lies within the buffer around the reference centerline to the total length of the extracted centerline. Quality (Q) is a comprehensive metric that combines completeness and correctness. The F1-score (F1_rce) balances COM and COR as their harmonic mean. The four metrics can be calculated as

COM = L_mr / L_r, COR = L_me / L_e, Q = L_me / (L_e + L_r − L_mr), F1_rce = 2 · COM · COR / (COM + COR),

where L_r and L_e are the lengths of the reference and extracted centerlines, and L_mr and L_me are the lengths of their buffer-matched parts.
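Given buffer-matched lengths, the four centerline metrics can be computed as below. The Q formula assumes the Wiedemann-style quality measure (matched extraction length over extraction length plus unmatched reference length), and the argument names are ours; both are assumptions for illustration.

```python
def centerline_metrics(matched_ref, len_ref, matched_ext, len_ext):
    """COM, COR, Q, F1 from buffer-matched centerline lengths."""
    com = matched_ref / len_ref                          # completeness
    cor = matched_ext / len_ext                          # correctness
    q = matched_ext / (len_ext + len_ref - matched_ref)  # quality
    f1 = 2 * com * cor / (com + cor)                     # harmonic mean
    return com, cor, q, f1
```

For instance, if 80 of 100 reference units and 90 of 100 extracted units are matched, COM = 0.8, COR = 0.9, and Q = 90 / 120 = 0.75.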

Road Extraction
From the definitions, we know that the road detection metrics precision (P), recall (R), IoU, and F1-score (F1_rd) correspond to the road centerline extraction metrics correctness (COR), completeness (COM), quality (Q), and F1-score (F1_rce), respectively. As a result, we design four metrics: precision for road extraction (P_re), recall for road extraction (R_re), quality for road extraction (Q_re), and F1-score for road extraction (F1_re), to evaluate the performance of road extraction as a whole. The metrics for road extraction are given as weighted combinations of the corresponding pairs:

P_re = (α · P + β · COR) / (α + β), R_re = (α · R + β · COM) / (α + β), Q_re = (α · IoU + β · Q) / (α + β), F1_re = (α · F1_rd + β · F1_rce) / (α + β),

where α and β can be set according to the importance of the road detection task and the road centerline extraction task. If the evaluator pays more attention to the road detection task, then α > β; otherwise, we set α < β. If the two tasks are equally important, we set α = β.

Implementation Details
In this subsection, we present the values of all hyperparameters in our experiments. We adopt the PyTorch framework and train the networks on a single NVIDIA Tesla V100 with 16 GB of memory using a batch size of one. We train the networks with the AdamW optimizer with an initial learning rate of 0.001, and we drop the learning rate by a factor of 0.1 every ten epochs. We apply data augmentation to the training set with image rotation and horizontal and vertical flips; the augmented training set consists of 3480 SAR images of size 1024 × 1024. For road centerline extraction, we set the threshold T to 0.25. As T only acts in the thresholding step before non-maximum suppression in the test stage, it does not affect the final road centerline extraction result too much; however, if T is set too large, more discontinuities appear in the extracted centerlines. In practice, we do not know the optimal value of T for a test image without ground truth, so users can set T according to their needs: a larger T removes more spurious parts, while a smaller T better preserves the completeness of the extracted centerlines. We set T = 0.25, which is neither too large nor too small, to remove some spurious parts while guaranteeing the connectivity of the road centerlines.

Results
In this subsection, we first introduce the networks used as baseline methods in this paper. We choose three fully convolutional neural networks (FCNNs), LinkNet34 [48], DLinkNet34 [49], and DeeplabV3plus [50], as baselines to study the performance of our method. These three networks were originally implemented to extract roads from optical remote sensing imagery. To adapt to the size of the images in our dataset, we set the dilation rates of the dilated convolution operations in DeeplabV3plus to [2, 4, 8, 16]. To verify that both the proposed multitask architecture and the loss function are effective, we perform three sets of comparative experiments. Each set contains three methods: the baseline method, method I, and method II. As shown in Figure 4, (a) is the network architecture of the baseline method, while (b) and (c) are the architectures of method I and method II, respectively, which both have two branches: the road detection branch and the road centerline extraction branch. The methods I in the three sets of experiments, abbreviated LinkNet34+, DLinkNet34+, and DeeplabV3plus+, are obtained by modifying the baseline networks into dual identical decoders with a shared encoder. The methods II, abbreviated LinkNet34++, DLinkNet34++, and DeeplabV3plus++, additionally use the road-topology loss to update the network parameters and have the same network architecture as the methods I.

Comparative Evaluation on Road Detection
To compare road detection performance, all methods are evaluated on the test samples of the test set. For qualitative comparison, we show the results produced by all methods on example images in Figure 5. The quantitative comparisons are reported in Table 2. As shown in Figure 5, the methods with the multitask learning architecture and the road-topology loss generally perform better than the baseline methods. The baseline methods miss road regions in many places, i.e., the false negative (green) parts are large, which leads to poor connectivity of the road detection results. By learning the road detection task jointly with the ordinal-regression-based road centerline extraction task, the false negative parts become smaller. By further using the road-topology loss, which penalizes discontinuous parts of the detected road, the connectivity is improved further. Table 2 presents the comparative quantitative evaluation in terms of P, R, IoU, and F1; the best value for each metric is shown in bold. We observe from Table 2 that the proposed methods outperform the baselines in three metrics, i.e., R, IoU, and F1. Although the precision of the proposed methods is lower than that of the baselines, the slight decrease in P is insignificant compared with the large increase in the other three metrics.

Comparison of Road Centerline Extraction
In this subsection, we present the results of a comparative evaluation on the centerline extraction task. For LinkNet34, DLinkNet34, and DeeplabV3plus, we applied the morphological thinning algorithm [51] to the road detection results to extract road centerlines. Figure 6 illustrates the centerlines identified by the different methods. From Figure 6, we can see that more discontinuities appear in the road centerline extraction results of the baseline methods. As shown in Figure 6, by learning the centerline extraction task based on ordinal regression, the false negative parts in the centerline extraction results decrease and the connectivity of the centerline network is enhanced. Figure 6 also shows that the quality of the centerline extraction results is further improved by using the road-topology loss. One reason is that our method learns the ordinal-regression-based centerline extraction task and the road detection task simultaneously under a multitask learning scheme, in which the prediction of one subtask can bootstrap the performance of the other. The other reason is that minimizing our newly proposed road-topology loss helps eliminate topology errors in the road extraction results, namely spurious parts and gaps. Table 3 summarizes the results evaluated by the COM, COR, Q, and F1 metrics. From Table 3, we can see that the methods with centerline extraction based on ordinal regression perform better than the baseline methods, and the methods that combine ordinal-regression-based centerline extraction with the road-topology loss achieve the best performance.
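COM, COR, and Q are usually computed with buffer-based matching: an extracted centerline pixel counts as correct if it lies within a tolerance of the reference centerline, and vice versa. The sketch below uses a Euclidean distance transform for the matching; the function name, the tolerance value, and the use of `scipy` are our assumptions, and the paper's exact buffer width may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def centerline_metrics(pred: np.ndarray, ref: np.ndarray, tol: float = 2.0):
    """Buffer-based completeness (COM), correctness (COR), and quality (Q)
    for thin centerline masks. A pixel is matched if it lies within `tol`
    pixels of the other centerline set."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    # Distance from every pixel to the nearest reference / predicted pixel.
    d_to_ref = distance_transform_edt(~ref)
    d_to_pred = distance_transform_edt(~pred)
    matched_ref = (d_to_pred[ref] <= tol).sum()    # reference pixels explained
    matched_pred = (d_to_ref[pred] <= tol).sum()   # predicted pixels correct
    com = matched_ref / ref.sum() if ref.sum() else 0.0
    cor = matched_pred / pred.sum() if pred.sum() else 0.0
    # Quality combines both error types: Q = COM*COR / (COM - COM*COR + COR).
    q = com * cor / (com - com * cor + cor) if com + cor else 0.0
    return com, cor, q
```

With this matching scheme, a centerline that is shifted by a pixel or two still scores well, while gaps (low COM) and spurious branches (low COR) both pull Q down.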

Comparison of Road Extraction
In this subsection, we present the results of a comparative evaluation on the road extraction task. To compare road extraction performance quantitatively, we adopt our proposed metrics with different values of α and β to evaluate the different methods. For qualitative comparison, we show the results produced by all methods on the example images depicted in Figure 7; the quantitative comparisons are reported in Tables 4-6. From Figure 7, we can see that the poor road detection results of the baseline methods lead to many discontinuities in the road centerline extraction results. As shown in Figure 7, by jointly learning the road detection task and the ordinal-regression-based road centerline extraction task, the performance of both tasks is improved. Figure 7 also shows that, by using our newly proposed road-topology loss, the continuity of the road detection results is enhanced and their quality is increased. This indicates that minimizing the road-topology loss helps eliminate topology errors in the road extraction results, namely spurious parts and gaps. Tables 4-6 summarize the results evaluated by the P_re, R_re, Q_re, and F1_re metrics. We observe that, whether the evaluator pays more attention to the road detection task or to the road centerline extraction task, method II, which learns the two tasks simultaneously and adopts our proposed road-topology loss, achieves the best performance.
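The exact definitions of P_re, R_re, Q_re, and F1_re with the α and β weights are given in the paper's methods section, which is not reproduced here. Purely as an illustration of how α/β weighting lets an evaluator emphasize one subtask over the other, a hypothetical convex combination of a detection-level score and a centerline-level score could look like this:

```python
def combined_metric(det_value: float, cl_value: float,
                    alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical convex combination of a detection score and a
    centerline score; NOT the paper's exact P_re/R_re/Q_re definition."""
    assert abs(alpha + beta - 1.0) < 1e-9  # weights must sum to one
    return alpha * det_value + beta * cl_value
```

Setting α > β weights the road detection subtask more heavily, and β > α weights the centerline subtask; a method that is best at both extremes is best regardless of the evaluator's preference.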

Discussion
In this section, we discuss the robustness of the different methods when the binarization threshold is changed. Different binarization thresholds produce different road detection results, and in practice the optimal threshold is unknown. Figure 8 reflects the statistical characteristics of IoU and F1 across different binarization thresholds. The IoU and F1 scores of the baseline methods (LinkNet34, DLinkNet34, DeeplabV3plus) are unstable as the binarization threshold varies, as evidenced by their large variance in IoU and F1. In contrast, the F1 and IoU of our methods are more stable. Figure 8 also shows that, for both F1 and IoU, the means of the baseline methods are lower than those of our proposed methods. This is due to the multitask learning scheme, which makes the two tasks mutually promote each other, and the road-topology loss, whose minimization eliminates topology errors. In conclusion, our methods achieve high performance over a wider range of binarization thresholds than the baseline methods.
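The robustness check above amounts to sweeping the threshold over the network's probability map and summarizing the resulting score distribution. A minimal sketch (the function name and the threshold grid are illustrative assumptions):

```python
import numpy as np

def threshold_sweep(prob: np.ndarray, gt: np.ndarray,
                    thresholds=np.arange(0.1, 0.9, 0.1)):
    """Binarize a probability map at several thresholds and report the
    mean and standard deviation of IoU as a robustness summary."""
    gt = gt.astype(bool)
    ious = []
    for t in thresholds:
        pred = prob >= t                          # binarize at threshold t
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / union if union else 0.0)
    return float(np.mean(ious)), float(np.std(ious))
```

A method whose probability map is well separated (road pixels near 1, background near 0) yields a high mean and a low standard deviation across the sweep, which is the stability behavior reported for the proposed methods in Figure 8.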

Conclusions
In this paper, we have learned the road detection task and the road centerline extraction task jointly under a multitask learning scheme to solve the problem of road extraction from SAR imagery. To eliminate topology errors in road extraction results, we have designed a differentiable road-topology loss function specifically for road extraction. Unlike centerline extraction methods based on regression or classification, we have adopted ordinal regression to learn discrete distance labels and trained the network by minimizing an ordinal loss. With the ordinal-regression-based centerline extraction method, the network is not sensitive to incorrect labels and can converge to a satisfactory result. Using the multitask learning architecture, we have made full use of the correlation between the road detection task and the road centerline extraction task by learning the two tasks at the same time, so that the performance of each task is improved by bootstrapping from the other. The test results have shown that the networks modified into the multitask architecture perform better than the baseline methods. Considering the unique topological characteristics of roads, we have proposed a new road-topology loss function to penalize spurious parts in centerline extraction results and gaps in road detection results. The results show that the proposed road-topology loss function improves the connectivity and completeness of road networks. Finally, we have discussed the robustness of our method; the results show that it not only greatly improves road extraction performance but is also more stable than the baseline methods.

Future Work
Although our proposed method improves the performance of the road extraction task, some false detections remain, for two main reasons. One is that the speckle noise of SAR images, together with the layover and shadowing effects of SAR imaging, corrupts the appearance of roads in the image, causing many road segments to be missed. The other is that roads in SAR images can often be confused with other targets, such as railway tracks, rivers, or even tree hedges, causing some non-road regions to be detected as roads. In the future, we will further improve our network to reduce the influence of these two factors.