1. Introduction
Synthetic Aperture Radar (SAR) can penetrate clouds and collect data in any weather, day or night, and has therefore been widely used in various fields. With the rapid development of SAR technology in recent decades, high-resolution SAR images have become an essential source of information due to their broad coverage and rich ground detail. Roads are an indispensable part of the modern transportation network and hold a pivotal position both geographically and economically. Automatic road extraction from SAR images has thus become a research hotspot, with a wide range of applications in urban planning, disaster prevention and mitigation, and geographic information system updates.
In recent decades, many methods of road extraction from SAR images have been proposed [1,2,3,4,5,6,7]. The road extraction task can be divided into two subtasks: road detection and road centerline extraction. The road extraction methods in previous works separated these two subtasks and can be roughly summarized in three steps. First, the road in the SAR image is detected according to the extracted image features. Then, the road centerline is obtained by skeletonizing the road detection result. Finally, the road centerline is converted into a graph, which is then topologically optimized to obtain the road centerline network. Methods that separate the two tasks ignore the correlation between them, and the errors of road detection propagate to the road centerline extraction result. The work in [8] achieved better performance by using a conditional random field model to simultaneously perform the three steps: road detection, skeletonization, and topology optimization.
Since the deep convolutional neural network (DCNN) [9] showed outstanding performance in the 2012 ImageNet image classification challenge, DCNNs have made great progress in various traditional computer vision tasks such as image classification [10,11,12,13,14] and image semantic segmentation [15,16,17,18,19]. Deep learning has also been widely used in the field of remote sensing [20,21,22,23,24]. However, works that apply deep learning to SAR images [25,26] are relatively few. One reason is that the unique characteristics of SAR images make labeling time-consuming and labor-intensive. Reference [27] proposed a method to automatically label buildings in SAR images, which significantly reduces the labeling difficulty. Reference [27] also applied an architecture that combines an FCN for feature extraction with a CRF-RNN that exploits spatial information to extract buildings from SAR images. Although there have been few works applying deep learning to SAR imagery to date, it is undeniable that deep learning has enormous application potential for SAR image processing. A DisDBN, which combines ensemble learning with a deep belief network to learn discriminative features, was proposed in [28] for SAR image classification. Reference [29] fed an extracted six-channel covariance matrix into a DCNN for PolSAR image classification. Reference [30] proposed a deep supervised and contractive neural network for SAR image classification. Reference [31] extended the semantic segmentation network from the real-valued domain to the complex-valued domain, which can exploit the unique information of SAR data. Reference [32] proposed a new fully convolutional neural network that can be trained end-to-end for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystems. Reference [33] achieved smooth classification with a small four-class training set by deep transfer learning. Reference [34] extracted multiscale features using a multiscale CNN model to detect built-up areas in SAR images. Reference [35] first applied a deep fully convolutional neural network to segment roads in SAR images.
Two challenges restrict the accuracy of road extraction from SAR images: (1) roads in SAR images usually appear as dark elongated areas surrounded by bright edges, which are easily confused with other objects such as rivers, shadows of trees, shadows of buildings, etc.; (2) speckle noise in SAR images seriously degrades image quality and interferes with road extraction. A deep convolutional neural network with a large receptive field can effectively extract contextual information, which can be used to distinguish roads from similar objects. A deep convolutional neural network also reduces the influence of speckle noise by repeatedly performing convolution operations on the input images. Therefore, in this paper, we introduce a deep learning framework to extract roads from SAR imagery.
The road extraction task includes two subtasks: the road detection task and the road centerline extraction task. Most previous works relied on multistage learning methods to extract roads. These methods obtain road centerlines by a post-processing step that thins the road detection results predicted by a CNN. The disadvantage of these methods is that imperfect road detection results lead to centerline extraction results with low connectivity. There is a symbiotic relationship between the road detection task and the road centerline extraction task, and the two can promote each other. The road detection task can provide detection cues that constrain the road centerlines and avoid spurious parts. The road centerline extraction task can motivate the road detection task to pay more attention to the key points of roads, which enhances road connectivity. To make full use of this relationship, our proposed network learns the road detection task and the road centerline extraction task simultaneously under a multitask learning scheme.
For road centerline extraction from raw image data, previous works can be roughly divided into two categories: classification-based methods and regression-based methods. References [7,36,37] extracted road centerlines with classification-based methods. The features of pixels on the road centerline are similar to the features of pixels adjacent to the centerline, while the features of pixels far from the centerline are completely different from those of pixels on it. However, the errors caused by misclassifying pixels adjacent to the centerline are penalized the same as those caused by misclassifying pixels far from it, which is unreasonable. In practice, the closer a pixel is to the centerline, the more we can tolerate its classification error. This contradiction makes it difficult for classification-based methods to converge to a good result. Reference [38] first proposed a regression-based method, which learns a designed function whose return value decreases with the distance from the pixel to the centerline. However, due to outliers, i.e., annotation errors, a deep network for regression is relatively unstable, and a network trained with the MSE loss will not converge to a satisfactory global solution. Reference [39] learned the map of distances from each pixel to the nearest boundary by training a multi-class classification network, which ignores the ranking relation between different distance classes. To avoid the above problems, we exploit a method based on ordinal regression to learn discrete normalized distance labels. We minimize an ordinal loss to learn the network parameters of the road centerline task.
In the real world, roads have unique topological properties. Previous works usually applied topology priors using variational and Markov random field-based methods [36,40,41,42]. Reference [40] imposed a topology constraint with a high-order CRF, in which high-order cliques connect superpixels of the road network. Reference [36] represented the road network as a sequence of graph structures and found an optimal subgraph by integer programming. These works generally employed road topology optimization as post-processing, which cannot remove large spurious parts or connect large gaps. Recently, some approaches have encouraged a correct topology of the extracted roads by minimizing a topology-preserving loss function [43,44]. Reference [43] stated that pixel-wise loss functions alone are not sufficient for curvilinear structure detection and proposed a topology-aware loss function defined on selected filter responses of a pretrained VGG19 to penalize topology errors. Reference [44] adopted a continuous loss function based on Persistent Homology. Neither the VGG19 responses nor the Betti numbers are specifically designed for road extraction tasks, so their penalty on topology errors of the extracted road network is limited. To solve the above-mentioned problems, a new road-topology loss is specially designed for the road extraction task, which can reduce topology errors. Our main contributions are as follows:
Different from previous methods for road extraction from SAR imagery, we detect the road and extract the road centerline simultaneously. This multitask learning scheme exploits the correlation between the road detection task and the road centerline extraction task;
For the road extraction task, we build our own dataset with TerraSAR-X images, which cover urban, suburban, and rural areas. Our experiments are carried out on this dataset, and the results show that our proposed framework achieves better road extraction performance;
For the road centerline extraction task, we first convert the road centerline extraction problem into the problem of discrete normalized distance label prediction, which can be solved by training an ordinal regressor;
Considering the special topological features of road networks, we propose a new road-topology loss designed to reduce the topology errors of road extraction, including spurious parts and gaps.
The remainder of our paper is organized as follows. We present the proposed method in detail in Section 2. In Section 3, we quantitatively and qualitatively analyze the performance of our method compared with baseline methods. We discuss the stability of different methods under various binarization thresholds in Section 4. Finally, we conclude the paper in Section 5.
2. Materials and Methods
Figure 1 illustrates our road extraction framework. For road extraction, our proposed network learns the road detection task and the road centerline extraction task jointly under a multitask learning scheme. As shown in Figure 1, our framework has two branches: the road detection branch and the road centerline extraction branch. The encoder of the two branches is shared for feature extraction, which establishes a connection between them. In this section, we first separately introduce how our network performs the road detection task and the road centerline extraction task. Next, the definition of our proposed road-topology loss function is given. Finally, we introduce how our multitask learning framework simultaneously learns the road detection task and the ordinal-regression-based road centerline extraction task using the proposed road-topology loss. In the following discussion, we let $I$ be the input image, let $Y$ be the corresponding ground truth, with 1 indicating pixels on the road and 0 indicating background pixels, and let $\hat{Y}$ be the predicted probability map of the road. We let the mini-batch be $B$, let $i$ be a pixel in $I$, and let $y_i$ be the label of pixel $i$. The predicted probability that pixel $i$ is on the road is denoted by $p_i$.
2.1. Road Detection
The road detection task aims to detect roads in SAR imagery. The output of the road detection task is a binary image, in which pixels on roads are 1 and the others are 0. In practice, most pixels of SAR imagery belong to non-road regions. As a result, there is a label-imbalance problem in the road detection task. To overcome this problem, we use the weighted cross-entropy loss proposed in [45]. The weighted cross-entropy loss of $I$ is
$$\mathcal{L}_{ce}(I) = -\sum_{i \in I}\left[w_1\, y_i \log p_i + w_0\,(1-y_i)\log(1-p_i)\right].$$
Next, we present the weights in the weighted cross-entropy loss. Let $N_1$ and $N_0$ be the numbers of road and non-road pixels, and let $N = N_0 + N_1$ be the total number of pixels. Following [45], the loss weight for road pixels is $w_1 = N_0/N$ and the loss weight for non-road pixels is $w_0 = N_1/N$.
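As a concrete illustration, the class-balanced weighted cross-entropy above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the per-image weight computation follows the HED-style balancing of [45], and the function and variable names are our own.

```python
import numpy as np

def weighted_cross_entropy(p, y, eps=1e-7):
    """Class-balanced binary cross-entropy for a road mask.

    p : predicted road probabilities, shape (H, W)
    y : ground-truth labels in {0, 1}, shape (H, W)
    Road pixels are weighted by the fraction of non-road pixels and
    vice versa, so the rare road class is not drowned out.
    """
    n_road = y.sum()
    w_road = (y.size - n_road) / y.size   # large when roads are rare
    w_bg = n_road / y.size
    p = np.clip(p, eps, 1.0 - eps)        # avoid log(0)
    loss = -(w_road * y * np.log(p) + w_bg * (1 - y) * np.log(1 - p))
    return loss.sum()
```

In a deep learning framework the same balancing is typically achieved with a per-pixel weight map passed to the built-in cross-entropy loss.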
2.2. Road Centerline Extraction
For road centerline extraction, the classification-based approach learns a function $f$ such that
$$f(\mathbf{x}_i) = \begin{cases} 1, & i \text{ is on the centerline}, \\ 0, & \text{otherwise}, \end{cases}$$
where $\mathbf{x}_i$ is the feature vector of pixel $i$. The regression-based approach learns a regressor $g$ whose values decrease monotonically as the distance of $i$ to the centerline increases. In particular, in [38], the regressor $g$ is such that
$$g(\mathbf{x}_i) = \begin{cases} e^{a\left(1 - d_i/d_M\right)} - 1, & d_i < d_M, \\ 0, & \text{otherwise}, \end{cases}$$
where $d_i$ is the metric distance from pixel $i$ to the closest pixel on the centerline and $d_M$ is a threshold defined in terms of $s$, where $s$ is the size of the local neighbourhoods used to compute the feature vector $\mathbf{x}_i$. Our proposed method, based on ordinal regression, differs from both. In the remainder of this subsection, we first model the road centerline extraction problem as a discrete normalized distance label prediction problem. We then describe how to predict discrete normalized distance labels by learning an ordinal regressor.
Roads are surrounded by bright edges in high-resolution SAR images. As a result, we can predict the distance $e_i$ from any pixel $i$ to the nearest road edge. However, the probability that pixel $i$ is on the road centerline is not proportional to the distance $e_i$ to the nearest road edge, because road widths vary, as depicted in Figure 2. We therefore predict the normalized distance from pixel $i$ to the nearest road edge. The normalized distance $\tilde{d}_i$ is defined as follows:
$$\tilde{d}_i = \frac{2\, e_i}{w_i},$$
where $w_i$ is the road width at pixel $i$. In particular, $\tilde{d}_i$ is proportional to the probability that $i$ is on the road centerline. Meanwhile, if $i$ is on the road centerline, $\tilde{d}_i$ is a local maximum along the direction perpendicular to the direction of the road. We further quantize each $\tilde{d}_i$ using the thresholds $\{t_0, t_1, \ldots, t_K\}$ into one of $K$ intervals. The reason we quantize the normalized distance $\tilde{d}_i$ is that directly training a deep network for regression is relatively unstable, because outliers (annotation errors) cause large error terms, making it difficult for the network to converge and leading to unstable predictions [39]. After quantization, each $i$ is given a discrete normalized distance label $l_i$ such that
$$l_i = k \quad \text{if } \tilde{d}_i \in (t_k, t_{k+1}].$$
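The label construction above can be sketched as follows. This is an illustrative NumPy fragment under two assumptions that are ours, not the paper's: the local road-width map is available from the annotation, and the quantization thresholds are uniform on $[0, 1]$.

```python
import numpy as np

def discrete_distance_labels(d_edge, width, K=5):
    """Quantize normalized edge distances into K ordinal labels.

    d_edge : distance from each pixel to the nearest road edge
    width  : local road width at each pixel (assumed known from annotation)
    A pixel on the centerline has d_edge == width / 2, so the normalized
    distance 2 * d_edge / width lies in [0, 1]; uniform thresholds split
    it into K intervals, giving labels 0 (edge) .. K-1 (centerline).
    """
    nd = np.clip(2.0 * d_edge / np.maximum(width, 1e-7), 0.0, 1.0)
    thresholds = np.linspace(0.0, 1.0, K + 1)      # t_0 .. t_K
    labels = np.digitize(nd, thresholds[1:-1])     # interior thresholds
    return nd, labels
```

The edge-distance map itself could be obtained with a Euclidean distance transform of the road mask (e.g., `scipy.ndimage.distance_transform_edt`).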
We could predict the discrete normalized distance labels with a typical multi-class classification method. However, the ordinal information between discrete normalized distance labels would then be ignored. Reference [46] first combined ordinal regression with DCNNs to address age estimation, transforming an ordinal regression problem into a series of binary classification sub-problems and thus accounting for the fact that the set of ages is well-ordered. Therefore, we use the ordinal regression of [46] to solve the discrete normalized distance label prediction problem and modify the ordinal loss to adapt it to the road centerline extraction task.
Next, we introduce the ordinal regression and the ordinal loss used in this paper in detail. Let $\Phi$ denote the feature extractor of the network used to extract road centerlines, and let its parameters be denoted by $\theta$. $I$ is the input image, and one of its pixels is denoted by $i$. The feature map of $I$ and the feature vector of pixel $i$ for the road centerline extraction task are $\Phi(I; \theta)$ and $\boldsymbol{\phi}_i$, respectively. $\psi$ is the last layer of the network for the road centerline extraction task, which is used for ordinal regression. Its parameters are given by $\{\mathbf{w}_0, \ldots, \mathbf{w}_{2K-1}\}$, where $\mathbf{w}_k$ is a weight vector. The ordinal output of $I$ and the ordinal output vector of $i$ are denoted by $O$ and $\mathbf{o}_i$ ($\mathbf{o}_i \in \mathbb{R}^{2K}$), respectively. With the softmax activation function that is also used in [46], the probability $p_i^k$ that the predicted label of $i$ is greater than $k$ is calculated as
$$p_i^k = P(\hat{l}_i > k) = \frac{e^{o_i^{2k+1}}}{e^{o_i^{2k}} + e^{o_i^{2k+1}}}.$$
According to the method of calculating the ordinal loss in [46], the pixel-level ordinal loss of pixel $i$ is then given by
$$\mathcal{L}_{ord}(i) = -\lambda_1 \sum_{k=0}^{l_i - 1} \log p_i^k \; - \; \lambda_0 \sum_{k=l_i}^{K-1} \log\left(1 - p_i^k\right),$$
where $\lambda_1$ and $\lambda_0$ are used to solve the class-imbalance problem. The ordinal loss of $I$ is defined as the sum of the ordinal losses of all pixels $i$ in image $I$ and is given by
$$\mathcal{L}_{ord}(I) = \sum_{i \in I} \mathcal{L}_{ord}(i).$$
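The per-pixel ordinal loss can be illustrated with a short NumPy sketch. It follows the rank-decomposition idea of [46]; for simplicity the class-balancing weights are dropped here, and the function name and signature are our own.

```python
import numpy as np

def ordinal_loss(probs_gt_k, label, eps=1e-7):
    """Ordinal loss for one pixel, in the style of [46].

    probs_gt_k : probs_gt_k[k] is the predicted probability that the
                 pixel's distance label is greater than k (length K-1).
    label      : ground-truth discrete distance label in {0, ..., K-1}.
    The loss sums the binary cross-entropies of the K-1 threshold
    sub-problems, so a prediction that is further off in rank
    accumulates more penalized terms.
    """
    p = np.clip(np.asarray(probs_gt_k, dtype=float), eps, 1.0 - eps)
    loss = 0.0
    for k in range(len(p)):
        # sub-problem k asks: "is the true label greater than k?"
        loss += -np.log(p[k]) if label > k else -np.log(1.0 - p[k])
    return loss
```

Because each rank threshold contributes its own term, a prediction two intervals away from the truth is penalized more than one a single interval away, which a plain multi-class cross-entropy would not guarantee.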
The advantage of the ordinal loss is that the greater the difference between the predicted label $\hat{l}_i$ and the true label $l_i$, the greater the ordinal loss. We use an iterative optimization algorithm to minimize $\mathcal{L}_{ord}$. Writing $\eta_i^k = o_i^{2k+1} - o_i^{2k}$ for the pair of ordinal outputs of pixel $i$ associated with threshold $k$, so that $p_i^k = \sigma(\eta_i^k)$, the partial derivative of $\mathcal{L}_{ord}(i)$ with respect to $\theta$ is
$$\frac{\partial \mathcal{L}_{ord}(i)}{\partial \theta} = \sum_{k=0}^{K-1} \left[\lambda_0\, \mathbf{1}(l_i \leq k)\, p_i^k - \lambda_1\, \mathbf{1}(l_i > k)\left(1 - p_i^k\right)\right] \frac{\partial \eta_i^k}{\partial \theta},$$
where $\sigma(\cdot)$ is the sigmoid function and $\mathbf{1}(\cdot)$ is the indicator function, with $\mathbf{1}(\text{true}) = 1$ and $\mathbf{1}(\text{false}) = 0$. We can thus update the parameters of the network for the road centerline extraction task through backward propagation. In the test phase, we calculate $\bar{p}_i$, which is the mean of $p_i^k$ over $k$ for each pixel $i$. $\bar{p}_i$ is given by
$$\bar{p}_i = \frac{1}{K} \sum_{k=0}^{K-1} p_i^k.$$
We observe that $\bar{p}_i$ is proportional to the normalized distance $\tilde{d}_i$. We can therefore regard $\bar{p}_i$ as the predicted probability that pixel $i$ is on the road centerline. Let $\hat{C}$ be the centerline probability map of $I$, where $\hat{c}_i = \bar{p}_i$. According to Formula (4), we know from the normalized distance map that the necessary and sufficient condition for pixel $i$ to be on the road centerline is that $\tilde{d}_i$ is a local maximum along the direction perpendicular to the direction of the road. However, if we infer whether pixel $i$ is on the road centerline only by judging whether $\hat{c}_i$ is such a local maximum, some non-road regions will be extracted as centerline. As a result, we first set the values of $\hat{C}$ that are less than a threshold $T$ to zero, and then apply a Canny-like non-maximum suppression algorithm to $\hat{C}$ to obtain the road centerline.
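The test-phase decoding can be sketched as follows. This NumPy fragment is an illustration, not the paper's exact procedure: the centerline score is taken as the mean of the ordinal probabilities, and the Canny-like suppression is approximated by keeping a pixel only if it is a maximum along the weakest of four quantized neighbour directions (a stand-in for the direction perpendicular to the road).

```python
import numpy as np

def decode_centerline(P_gt, T=0.3):
    """Decode ordinal outputs into a binary centerline mask.

    P_gt : array (K-1, H, W); P_gt[k] is the probability that a pixel's
           distance label is greater than k. Their mean over k serves as
           the centerline score.
    """
    score = P_gt.mean(axis=0)
    score = np.where(score >= T, score, 0.0)   # drop weak responses
    H, W = score.shape
    pad = np.pad(score, 1)
    out = np.zeros((H, W), dtype=bool)
    # opposite-neighbour pairs for the four quantized directions
    dirs = [((0, 1), (0, -1)),    # horizontal
            ((1, 0), (-1, 0)),    # vertical
            ((1, 1), (-1, -1)),   # diagonal
            ((1, -1), (-1, 1))]   # anti-diagonal
    for y in range(H):
        for x in range(W):
            v = score[y, x]
            if v <= 0.0:
                continue
            pairs = [(pad[y+1+a[0], x+1+a[1]], pad[y+1+b[0], x+1+b[1]])
                     for a, b in dirs]
            # suppress along the direction that crosses the ridge,
            # i.e. the one whose neighbours are weakest
            a, b = pairs[int(np.argmin([p + q for p, q in pairs]))]
            if v >= a and v >= b:
                out[y, x] = True
    return out
```

A full implementation would estimate the road direction explicitly (as in Canny edge thinning) rather than picking the weakest neighbour pair.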
2.3. Road-Topology Loss
In practice, the cross-entropy loss is widely used in various segmentation tasks, such as semantic segmentation and instance segmentation. The cross-entropy loss is a pixel-wise loss, which is completely local and does not take the special and complex topological characteristics of roads into account. Such a loss penalizes the mistake at each pixel equally and independently, regardless of the effect of the error on the geometry. However, in practice, we find that pixels closer to the road centerline are more important, because the misclassification of these pixels causes serious topology errors such as gaps and spurious parts. To penalize the gaps in the road detection prediction and the spurious parts in the road centerline extraction prediction, we propose a new road-topology loss.
Next, we give the details of the road-topology loss. To measure the connectivity of the road detection prediction, we define the connectivity metric as
$$C_{con} = \frac{\mathrm{sum}(\hat{Y} \odot L)}{\mathrm{sum}(L)},$$
where $\hat{Y}$ is the prediction map of road detection, $L$ ($L = (l_i)_{i \in I}$) is the discrete normalized distance label map of $I$, and $\mathrm{sum}(\cdot)$ is an operation that calculates the sum of all elements of a matrix. Similarly, we define the differentiable correctness metric to measure the correctness of the road centerline extraction prediction as
$$C_{cor} = \frac{\mathrm{sum}(\hat{C} \odot Y)}{\mathrm{sum}(\hat{C})},$$
where $Y$ is the ground truth of road detection and $\hat{C}$ is the predicted road centerline probability map of $I$. We observe that the measure $C_{con}$ is susceptible to gaps in the road detection prediction, while the measure $C_{cor}$ is susceptible to spurious parts in the road centerline extraction prediction. Finally, we define the road-topology metric $C_{topo}$ as the harmonic average of the connectivity metric $C_{con}$ and the differentiable correctness metric $C_{cor}$:
$$C_{topo} = \frac{2\, C_{con}\, C_{cor}}{C_{con} + C_{cor}}.$$
The road-topology metric measures the connectivity and the correctness of the road extraction result at the same time. In order to maximize the road-topology metric in CNNs in an end-to-end manner, we define our road-topology loss $\mathcal{L}_{topo}$ as
$$\mathcal{L}_{topo} = 1 - C_{topo}.$$
$\mathcal{L}_{topo}$ is calculated directly from the raw predictions $\hat{Y}$ and $\hat{C}$ without thresholding. As a result, $\mathcal{L}_{topo}$ is differentiable with respect to the predictions $\hat{Y}$ and $\hat{C}$ and can be integrated into a CNN. In this paper, we use the AdamW optimizer to minimize the road-topology loss. The partial derivatives of the loss $\mathcal{L}_{topo}$ over the network activations $\hat{y}_i$ and $\hat{c}_i$ at the location of pixel $i$ are
$$\frac{\partial \mathcal{L}_{topo}}{\partial \hat{y}_i} = -\frac{2\, C_{cor}^2}{(C_{con} + C_{cor})^2} \cdot \frac{l_i}{\mathrm{sum}(L)}, \qquad \frac{\partial \mathcal{L}_{topo}}{\partial \hat{c}_i} = -\frac{2\, C_{con}^2}{(C_{con} + C_{cor})^2} \cdot \frac{y_i\, \mathrm{sum}(\hat{C}) - \mathrm{sum}(\hat{C} \odot Y)}{\mathrm{sum}(\hat{C})^2}.$$
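The behaviour of such a loss can be checked with a small NumPy sketch. This is a plausible instantiation under our own assumptions, not the paper's exact formulas: connectivity weights the road prediction by the distance-label map (so a gap that crosses the centerline is punished), correctness measures the centerline probability inside the true road (so spurious centerline is punished), and the loss is one minus their harmonic mean.

```python
import numpy as np

def road_topology_loss(road_pred, cl_pred, road_gt, dist_labels, eps=1e-7):
    """Sketch of a road-topology loss as 1 minus a harmonic mean.

    road_pred   : predicted road probability map
    cl_pred     : predicted centerline probability map
    road_gt     : binary road ground truth
    dist_labels : discrete normalized distance label map (high near
                  the centerline), used to weight the connectivity term
    """
    # recall-style connectivity: road probability near the true centerline
    connectivity = (road_pred * dist_labels).sum() / (dist_labels.sum() + eps)
    # precision-style correctness: centerline probability inside true roads
    correctness = (cl_pred * road_gt).sum() / (cl_pred.sum() + eps)
    topo = 2.0 * connectivity * correctness / (connectivity + correctness + eps)
    return 1.0 - topo
```

Because both terms are plain sums over raw probabilities, the loss stays differentiable and could be dropped into an autograd framework without modification.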
2.4. Multitask Learning
In our road extraction framework, the input image $I$ is fed into the shared encoder to extract features. The feature maps are then input into the decoders of the two tasks to obtain the road detection prediction and the road centerline extraction prediction. As shown in Figure 1, the prediction of the road detection task $\hat{Y}$ and the ground truth $Y$ are used to calculate the weighted cross-entropy loss, which is minimized to update the parameters of the road detection network. The prediction of the road centerline extraction task $\hat{C}$ and the discrete normalized distance label map $L$ are used to calculate the ordinal loss, which is minimized to update the parameters of the road centerline extraction network. $Y$, $\hat{Y}$, $\hat{C}$, and $L$ are used to calculate our proposed road-topology loss, which combines the prediction of road detection and the prediction of road centerline extraction. Our proposed road-topology loss makes full use of the correlation between the two tasks and can be minimized to make them promote each other. The entire loss function is the sum of the weighted cross-entropy loss, the ordinal loss, and the road-topology loss. By minimizing the entire loss function, the parameters of the road detection network and the road centerline extraction network are updated simultaneously, which means our framework learns the road detection task and the road centerline extraction task jointly. The loss of a mini-batch is calculated by
$$\mathcal{L}(B) = \sum_{I \in B} \left[\mathcal{L}_{ce}(I) + \mathcal{L}_{ord}(I) + \mathcal{L}_{topo}(I)\right].$$
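The way the three terms combine over a mini-batch can be sketched generically. The loss callables and the dictionary keys below are placeholders of our own, not the paper's API; the only point illustrated is the summation structure.

```python
def multitask_loss(samples, l_ce, l_ord, l_topo):
    """Total mini-batch loss as the sum of the three per-image terms.

    samples : list of dicts holding one image's predictions and labels
    l_ce, l_ord, l_topo : callables for the weighted cross-entropy,
        ordinal, and road-topology losses (signatures are assumptions)
    """
    total = 0.0
    for s in samples:
        total += l_ce(s["road_pred"], s["road_gt"])
        total += l_ord(s["cl_pred"], s["dist_labels"])
        total += l_topo(s["road_pred"], s["cl_pred"],
                        s["road_gt"], s["dist_labels"])
    return total
```

Since the three terms share the predictions of the two decoders, a single backward pass through this sum updates both branches and the shared encoder at once.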