Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss

Wei, Xiaochen; Lv, Xiaolei; Zhang, Kaiyu

doi:10.3390/rs13112080

Open AccessArticle

Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss

by

Xiaochen Wei

^1,2,3,

Xiaolei Lv

^1,2,3,* and

Kaiyu Zhang

^1,2,3

¹

Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China

²

Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

³

School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(11), 2080; https://doi.org/10.3390/rs13112080

Submission received: 28 April 2021 / Revised: 17 May 2021 / Accepted: 19 May 2021 / Published: 25 May 2021

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

The road extraction task is mainly composed of two subtasks, namely, road detection and road centerline extraction. As the road detection task and road centerline extraction task are strongly correlated, in this paper, we introduce a multitask learning framework to detect roads and extract road centerlines simultaneously. For the road centerline extraction problem, existing works rely either on regression-based methods, or classification-based methods. The regression-based methods suffer from slow convergence and unsatisfactory local solutions. The classification-based methods ignore the fact that the closer the pixel is to the centerline, the higher our tolerance for its misclassification. To overcome these problems, we first convert the road centerline extraction problem into the problem of discrete normalized distance label prediction, which can be resolved by training an ordinal regressor. For the road extraction task, most of the previous studies apply pixel-wise loss function, for example, Cross-Entropy loss, which is not sufficient, as the road has special topology characteristics such as connectivity. Therefore, we propose a road-topology loss function to improve the connectivity and completeness of the extracted road. The road-topology loss function has two key characteristics: (i) The road-topology loss function combines road detection prediction and road centerline extraction prediction to promote the two subtasks to each other by using the correlation between the two subtasks; (ii) The road-topology loss can emphatically penalize gaps that often appear in road detection results and spurious segments that easily appear in centerline extraction results. In this paper, we select the AdamW optimizer to minimize the road-topology loss. Since there is no public dataset, we build a road extraction dataset to evaluate our method. State-of-the-art semantic segmentation networks (LinkNet34, DLinkNet34, DeeplabV3plus) are used as baseline methods to compare with two kinds of method. The first kind of method modifies the baseline method by adding the road centerline extraction task branch based on ordinal regression. The second kind of method uses the road topology loss and has the same network architecture as the first kind of method. For the road detection task, the two kinds of methods improve the baseline methods by up to

3.51 %

and

11.98 %

in

I o U

metric on our test dataset, respectively. For the road centerline extraction task, the two kinds of methods improve the baseline methods by up to

8.22 %

and

10.9 %

in the Quality metric on our test dataset.

Keywords:

road extraction; ordinal regresssion; road-topology loss; multitask learning; deep learning; SAR images

1. Introduction

Synthetic Aperture Radar(SAR) can penetrate clouds and collect data under any weather at any time. Therefore, it has been widely used in various fields. With the rapid development of SAR technology in decades, high-resolution synthetic aperture radar images have become an essential source of information due to their broader coverage and increased ground details. Roads are an indispensable part of the modern transportation network, and they have a pivotal position both geographically and economically. Automatic road extraction from SAR images has become a research hotspot. It has a wide range of urban planning applications, disaster prevention and mitigation, and geographic information system updates.

In recent decades, many methods of road extraction from SAR images have been proposed [1,2,3,4,5,6,7]. The road extraction task can be divided into two subtasks: road detection and road centerline extraction. The road extraction methods in previous works separated these two tasks, which can be roughly summarized into three steps. Firstly, the road in the SAR image is detected according to the extracted image features. Then, the road centerline is obtained by skeletalizing the road detection result. Finally, the road centerline is converted into a graph, which then is topologically optimized to obtain the road centerline network. The methods that separate the two tasks ignore the correlation between the two tasks, and the errors of road detection affect the result of road centerline extraction. The work [8] achieved a better performance using a conditional random field model to simultaneously perform the three steps: road detection, skeletonization, and topology optimization.

Since deep convolutional neural network (DCNN) has shown an outstanding performance in the 2012 ImageNet image classification challenge, DCNN [9] has made great progress in various traditional computer vision tasks such as image classification [10,11,12,13,14] and image semantic segmentation [15,16,17,18,19]. Deep learning has been widely used in the fields of remote sensing [20,21,22,23,24]. However, the works that applied deep learning to SAR images [25,26] are relatively few. One of the reasons for this is that the unique characteristics of SAR images make labeling time-consuming and labor-intensive. Reference [27] proposed a method to automatically label buildings from SAR images, which significantly reduces labeling difficulty. At the same time, reference [27] also applied the architecture combined with FCN used to extract features with CRF-RNN that exploited spatial information to extract buildings from SAR images. Although there have been fewer minimal works that apply deep learning for SAR imagery to date, it is undeniable that deep learning has colossal application potential for SAR image processing. A DisDBN which combined ensemble learning with a deep belief network to learn discriminant features is proposed in [28] for SAR image classification. Reference [29] let extracted six-channel covariance matrix fed into DCNN for PolSAR image classification. Reference [30] proposed a deep supervised and contractive neural network for SAR image classification. Reference [31] extended the semantic segmentation network from the real-value domain to the complex-valued domain, which can exploit the unique information of SAR data. Reference [32] proposed a new fully convolutional neural network that can be trained in an end-to-end for semantic segmentation of polarimetric SAR imagery in the complex land cover ecosystem. Reference [33] realized smooth classification with a small training set with four classes by deep transfer learning. Reference [34] extracted multiscale trained features using a multiscale CNN model to detect built-up areas from SAR images. Reference [35] firstly applied a deep fully convolutional neural network to segment road from SAR images.

There are two challenges that restrict the accuracy of road extraction from SAR images: (1) the road in SAR images is usually modeled as dark elongated areas surrounded by bright edges which are easily confused with other objects such as rivers, shades of trees, shadows of buildings, etc. (2) Speckle noise in SAR images seriously degrades image quality and interferes with road extraction from SAR images. A deep convolutional neural network with a large receptive field can effectively extract contextual information. This characteristic can be used to distinguish the road from other similar objects. A deep convolutional neural network reduces the influence of speckle noise by continuously performing convolution operations on the input images. Therefore, in this paper, we will introduce a deep learning framework to extract roads from SAR imagery.

The road extraction task includes two subtasks: road detection task and road centerline extraction task. Most of the previous works relied on multistage-based learning methods to extract roads. These methods obtain road centerlines by a post-processing step of thin road detection predicted by CNN. The disadvantage of these methods is that imperfect road detection results lead to road centerline extraction results with low connectivity. There is a symbiotic relationship between the road detection task and road centerline extraction task. The road detection task and road centerline extraction task can promote each other. The road detection task can provide detection cues for the road centerline extraction task to constraint road centerlines, which can avoid spurious parts. The road centerline extraction task can motivate the road detection task to pay more attention to the key points of roads, which can enhance the road connectivity. In order to make full use of this relationship, our proposed network learns the road detection task and road centerline extraction task simultaneously under the multitask learning scheme.

For road centerline extraction from raw image data, the previous works can be roughly divided into two categories: one is based on classification methods, and the other is based on regression methods. Reference [7,36,37] extracted road centerline by a classification-based method. The features of pixels on the road centerline are similar to features of pixels adjacent to the centerline, and the features of pixels far from the centerline of the road are completely different from the pixels in the centerline of the road. However, the errors caused by the misclassification of pixels adjacent to the centerline of the road are the same as those caused by the misclassification of pixels far from the centerline of the road, which is unreasonable. In practice, the closer the pixel is to the centerline of the road, the more we can tolerate its classification errors. This contradiction makes it difficult for the classification-based to cause the to network converge to a better result. Reference [38] first proposed method-based regression, which learned a designed function whose returns value decreases with the distance from the pixel to the centerline. However, due to abnormal, i.e., annotation errors, a deep network for regression is relatively unstable, and the network trained by MSE loss will not converge to a satisfactory global solution. Reference [39] learned the map of distance from each pixel to the nearest boundary by training a multi-class classification network, which ignores the ranking relation between different distance classes. To avoid the above problems, we exploit the method based on ordinal regression to learn the discrete normalized distance labels. We use the ordinal loss, minimized to learn network parameters of the road centerline task.

In the real world, the roads have unique topology properties. Previous works usually applied topology prior to using variational and Markov random field-based methods [36,40,41,42]. Reference [40] imposed a topology constraint by high-order CRF, in which high-order cliques connect superpixels of the road network. Reference [36] represented the road network as a sequence of graph structures and found an optimal subgraph by integer programming. These previous works generally employed road topology optimization as post-processing, which cannot remove large spurious parts and connect large gaps. Recently, some approaches guaranteed a perfect topology of the extracted road by minimizing a topology-preserving loss function [43,44]. Reference [43] stated that only using pixel-wise loss functions is not enough for curvilinear detection and proposes the topology-aware loss function defined by the selection of the filter responses of pretrained VGG19 to penalty topology errors. Reference [44] adopted a loss function based on Persistent Homology, which is continuous. Neither VGG19 nor Betti are specifically designed for road extraction tasks, so their penalty for topology errors of extracted road network is limited To solve the above-mentioned problems, a new road-topology loss is specially designed for the road extraction task, which can reduce the topology errors. Our main contributions are as follows:

Different from the previous methods for road extraction from SAR imagery, we detect the road and extract road centerline simultaneously. This multitask learning scheme can exploit the correlation between road detection task and road centerline extraction task;
For the road extraction task, we build our dataset with TerraSAR-X images, which cover urban, suburbs and rural areas. Our experiments are carried out based on this dataset. Our experimental results show that our proposed framework can achieve a better road extraction performance;
For the road centerline extraction task, we first convert the road centerline extraction problem into the problem of discrete normalized distance label prediction, which can be solved by training an ordinal regressor;
Consider the special topology feature of road network, we propose a new road-topology loss which is designed to reduce the topology error of road extraction including spurious parts and gaps.

The remainder of our paper is organized as follows. We present the proposed method in detail in Section 2. In Section 3, we quantitatively and qualitatively analyzed the superior performance of our method compared with baseline methods. We discuss the stability of different methods with various binarization thresholds in Section 4. Finally, we conclude the whole paper in Section 5.

2. Materials and Methods

Figure 1 illustrates our road extraction framework. For road extraction, our proposed network learns the road detection task and the road centerline extraction task jointly under a multitask learning scheme. As shown in Figure 1, our framework has two branches: the road detection branch and the road centerline extraction branch. The encoder of the two branches is shared for feature extraction, which establishes a connection between the two branches. In this section, we first separately introduce how our network performs road detection tasks and road centerline extraction tasks. Next, the definition of our first proposed road-topology loss function is given. Finally, we introduced how our multitask learning framework simultaneously learns the road detection task and road centerline extraction task based on ordinal regression, using the road-topology loss that we first proposed.In the following discussion, we let

I \in R^{H \times W}

be the

H \times W

input image, let

Y \in {0, 1}^{H \times W}

be the corresponding ground truth, with 1 indicating pixels on the road and 0 indicating background pixels, and let

\hat{Y} \in {0, 1}^{H \times W}

be the predicted probability map of the road. We let the mini-batch be B, let i be a pixel in I,

y_{i}

is label of pixel i. The predicted probability that pixel i is on the road is denoted by

\hat{y_{i}}

.

2.1. Road Detection

The road detection task aims to detect the roads from SAR imagery. The output of the road detection task is the binary image, in which the pixels seen on the roads are 1 and the others are 0. In practice, most pixels of the SAR imagery belong to the non-road regions. As a result, there is a label-unbalancing problem in the road detection task. To overcome this problem, we use weighted cross-entropy loss proposed in [45]. The weighted cross-entropy loss of I is

L_{r s} = - \frac{1}{H \times W} \sum_{i \in I} w_{1} y_{i} log \hat{y_{i}} + w_{0} (1 - y_{i}) log (1 - \hat{y_{i}}) .

(1)

Next, we will present the weights in the weighted cross-entropy loss. Let

| B |

be the number of the images. The loss weight for road pixels is

w_{1} = \frac{1}{| B | \times H \times W} \sum_{I \in B} \sum_{i \in I} 1 (y_{i} = = 0)

and the loss weight for non-road pixels is

w_{0} = \frac{1}{| B | \times H \times W} \sum_{I \in B} \sum_{i \in I} 1 (y_{i} = = 1)

.

2.2. Road Centerline Extraction

For road centerline extarction, the classification-based approach learn a function

y (\cdot)

such that

y (f_{i}) = \{\begin{matrix} 1, & i is on centerline, \\ 0, & i is not on centerline, \end{matrix}

(2)

where

f_{i}

is the feature of pixel i. The method based on regression is to learn a regressor

y (\cdot)

whose values decrease monotonically as the distance of i to the centerline increase. Especially in [38], the regressor

y (\cdot)

is such that

y (f_{i}) = \{\begin{matrix} 1 - \frac{D_{C} (i)}{d_{M}}, & if D_{C} (i) \leq d_{M}, \\ 0, & otherwise, \end{matrix}

(3)

where

D_{C} (i)

is the metric distance from pixel i to the closest the pixel on the centerline and

d_{M}

is

s / 2

, where s is the size of local neighbourhoods that are used to compute feature vector

f_{i}

. Our proposed method is based on ordinal regression, which is different from both. In the remainder of this subsection, we first model the road centerline extraction problem as the discrete normalized distance label prediction problem. We then describe how to predict discrete normalized distance labels by learning an ordinal regressor.

Roads are surrounded by bright edges in high-resolution SAR images. As a result, we can predict the distance

d_{i}

from any pixel i to the nearest road edge. However, the probability that the pixel i is on the centerline of the road is not proportional to the distance from this pixel i to the nearest road edge

d_{i}

. This is due to the width of the road is various, which is depicted in Figure 2. As a result, we predict the normalized distance from pixel i to the nearest road edge. The normalized distance

d n_{i}

is defined as follows

d n_{i} = \{\begin{matrix} 2 * \frac{d_{i}}{w_{i}}, & i is on road, \\ 0, & i is not on road, \end{matrix}

(4)

where

w_{i}

is the road width of pixel i. In particular,

d n_{i}

is proportional to the probability of i on the centerline of the road. Meanwhile, if i is on the road centerline,

d n_{i}

is the local maxima value along the direction that is perpendicular to the direction of the road. We further quantize each

d n_{i}

using the thresholds

{t_{0}, t_{1}, \dots, t_{K - 1}}

into one of the

K + 1

intervals. The reason we quantify the normalized distance

d n_{i}

is that the direct training of the deep network for regression is relatively unstable, because outliers( annotation errors) cause large error terms, making it difficult for the network to converge and lead to unstable predictions [39]. After quantization, each i is given a discrete normalized distance label

l_{i}

, such as

l_{i} = \{\begin{matrix} 0 & d n_{i} \leq t_{0} . \\ k + 1 & t_{k} \leq d n_{i} < t_{k + 1} k = (0, 1, 2, \dots, K - 2) . \\ K & t_{K - 1} \leq d n_{i} . \end{matrix}

(5)

We can predict discrete normalized distance label prediction by a typical method based on multi-class classification. However, the ordinal information between discrete normalized distance labels will be ignored. [46] first combines ordinal regression with DCNNs to address the age estimation, which transforms an ordinal regression problem into a series of binary classification sub-problems to achieve sequential age regression tasks, thus taking accounting for the fact that the set of ages is well-ordered. Therefore, we use ordinal regression in [46] to solve the discrete normalized distance label prediction problem and modify the ordered loss to adapt to the road centerline extraction task.

Next, we will introduce the ordinal regression and the ordinal loss used in this paper in detail. Let

φ

denote the feature extractor of network that is used to extract the road centerlines and the parameters of

φ

are denoted by

Φ

.

I \in R^{H \times W}

is input image. One of the pixels in I is denoted as i. The feature map of I and the feature vector of pixel i for the road centerlne extarction task are

F = φ (I, Φ)

and

f_{i} \in F

, respectively.

ψ

is the last layer of the network for the road centerline extarction task, which is used for ordinal regression. Its parameters are given by

Θ = {θ_{0}, θ_{1}, \dots, θ_{2 K - 1}}

where

θ_{j} (j \in {0, 1, \dots, 2 K - 1})

is the weight vector. Ordianl output of I and ordianl output vector of i are presented by

O = ψ (F, Θ)

and

o_{i} \in O

(

o_{i} = (o_{i}^{0}, o_{i}^{1}, \dots, o_{i}^{2 K - 1})

), respectively. With the softmax activation function that is also used in [46], the probability

\hat{p_{i}^{k}} = P (\hat{l_{i}} > k | F, Θ) (k \in {0, 1, \dots, K - 1})

that the predicted label of i is greater than k is calculated as

\begin{matrix} \hat{p_{i}^{k}} = \frac{e^{o_{i}^{2 k + 1}}}{e^{o_{i}^{2 k}} + e^{o_{i}^{2 k + 1}}} . \end{matrix}

(6)

\begin{matrix} o_{i}^{2 k} = θ_{2 k}^{T} f_{i}, \end{matrix}

(7)

\begin{matrix} o_{i}^{2 k + 1} = θ_{2 k + 1}^{T} f_{i} . \end{matrix}

(8)

According to the method of calculating the ordinal loss in [46], the pixel-level ordinal loss of pixel i is then given by

Ψ (f_{i}, Θ) = - (\sum_{k = 0}^{l_{i} - 1} ω_{1}^{k} log (\hat{p_{i}^{k}}) + \sum_{k = l_{i}}^{K - 1} w_{0}^{k} (log (1 - \hat{p_{i}^{k}}))),

(9)

where

ω_{1}^{k} = \frac{1}{| B | \times H \times W} \sum_{I \in B} \sum_{i \in I} 1 (l_{i} \leq k)

and

ω_{0}^{k} = \frac{1}{| B | \times H \times W} \sum_{I \in B} \sum_{i \in I} 1 (l_{i} > k)

are used to solve the unbalanced classes problem.

The ordinal loss of I is defined as the sum of the ordianl loss of each i in image I and given by

L_{c} (F, Θ) = \frac{1}{H \times W} \sum_{i \in I} Ψ (f_{i}, Θ),

(10)

The advantage of the ordinal loss is that the greater the difference between the predicted label

\hat{l_{i}}

and the true label

l_{i}

, the greater the ordinal loss. We use iterative optimization algorithm to minimize

L_{c} (F, Θ)

. The partial derivative of

L_{c} (F, Θ)

with respect to

θ_{j} (j \in {0, 1, \dots, 2 K - 1})

, is

\frac{\partial L_{c} (F, Θ)}{\partial θ_{j}} = \frac{1}{H \times W} \sum_{i \in I} \frac{\partial Ψ (f_{i}, Θ)}{\partial θ_{j}},

(11)

\frac{\partial Ψ (f_{i}, Θ)}{\partial θ_{2 k + 1}} = - \frac{\partial Ψ (f_{i}, Θ)}{\partial θ_{2 k}},

(12)

\begin{matrix} \frac{\partial Ψ (f_{i}, Θ)}{\partial θ_{2 k}} & = - (ω_{1}^{k} f_{i} η (l_{i} > k) (\hat{p_{i}^{k}} - 1) \\ + ω_{0}^{k} f_{i} η (l_{i} \leq k) \hat{p_{i}^{k}}), \end{matrix}

(13)

where

k \in {0, 1, \dots, K - 1}

and

η (\cdot)

is the indicator function, where

η (t r u e) = 1

and

η (f a l s e) = 0

. We can update the parameters of the network for road centerline extraction task through backward propagation.

In the test phase, we calculate

\hat{p_{i}}

, which is the mean of

\hat{p_{i}^{k}}

for each pixel i.

\hat{p_{i}}

is given by

\hat{p_{i}} = (\sum_{k = 0}^{K - 1} \hat{p_{i}^{k}}) / K .

(14)

We observe that

\hat{p_{i}}

is proportional to the normalized distance

d n_{i}

. We can regard

\hat{p_{i}}

as the predicted probability that the pixel i is on the road centerline. Let

\hat{P}

be the centerline probability map of I, where

\hat{p_{i}} \in \hat{P}

. According to Formula (4), we know that, from the normalized distance map

D_{N}

, the necessary and sufficient condition for pixel i on the road centerline is that

d n_{i}

is a local maximum along the direction perpendicular to the direction of the road. However, if only by judging whether

p_{i}

is a local maximum along the direction perpendicular to the direction of the road to infer whether pixel i is on the road centerline, some non-road regions will be extracted as the road centerline. As a result, we first set the value of

\hat{P}

that is less than T to zero, then we apply the canny-like non-maximum suppression algorithm to

\hat{P}

to obtain the road centerline.

2.3. Road-Topology Loss

In practice, the cross-entropy loss is widely used in various segmentation tasks, such as semantic segmentation and instance segmentation. Cross-entropy loss is a pixel-wise loss, which is completely local and does not take the special and complex topological characteristics of the road into account. Such a loss penalizes the mistake of each pixel equally and independently, regardless of the effect of the error on geometry. However, in practice, we find that the pixels closer to the centerline of the road are more important. because the misclassification of these pixels will cause serious topology errors, such as gaps and spurious parts, To penalize the gaps in the prediction of road detection and the spurious parts in the road centerline extraction prediction, we propose a new road-topology loss

L_{T}

.

Next, we will give the details defining the road- topology loss. To measure the connectivity of road detection prediction, we define the connectivity metric as

T^{c o n} = \frac{| \hat{Y} \circ L |}{| L |},

(15)

where

\hat{Y} \in {[0, 1]}^{H \times W}

is a prediction map of road detection,

L \in {0, 1, \dots, K}^{H \times W}

(

l_{i} \in L

) is discete normalized distance label of I and

| |

is an operation to calculate the sum of matric. Similarly, we define the differentiable correctness metric to measure the correctness of road centerline extraction prediction as

T^{c o r} = \frac{| Y \circ \hat{P} |}{| \hat{P} |},

(16)

where

Y \in {0, 1}^{H \times W}

is ground truth of road detection,

\hat{P} \in {[0, 1]}^{H \times W} (\hat{p_{i}} \in \hat{P})

is predicted road centerline probability map of I. We observe that the measure

T^{c o n}

is susceptible to gaps in the road detection prediction while the measure

T^{c o r}

is susceptible to spurious parts in the road centerline extarction prediction. Finally, we define road-topology metric

T^{R o a d}

as the harmonic average between connectivity metric

T^{c o n}

and differentiable correctness metric

T^{c o r}

as

T^{R o a d} (\hat{Y}, \hat{P}, Y, L) = \frac{2 \cdot T^{c o n} \cdot T^{c o r}}{T^{c o n} + T^{c o r}} .

(17)

The road-topology metric measures the connectivity and correctness of the road extraction result at the same time.

In order to maximize the road-topology metric in CNNs in an end-to-end manner, we define our road-topology loss

L_{T}

as

L_{T} = - log (T^{R o a d})

(18)

L_{T}

is calculated directly from the raw prediction

\hat{Y}

and

\hat{P}

without thresholding. As a result,

L_{T}

is differentiable over the prediction

\hat{Y}

and

\hat{P}

which can be integrated into CNN. In this paper, we use AdamW optimizer to minimize the road-topology loss.The partial derivatives of loss

L_{T}

over network activation

\hat{Y}

and

\hat{P}

at the location of pixel i are

\begin{matrix} \frac{\partial L_{T}}{\partial \hat{y_{i}}} & = - \frac{1}{T^{R o a d}} \frac{\partial T^{R o a d}}{\partial \hat{y_{i}}} \\ = - \frac{2}{T^{R o a d}} (\frac{T^{c o n}}{T^{c o r} + T^{c o n}} - \frac{T^{c o n} \cdot T^{c o r}}{{(T^{c o n} + T^{c o r})}^{2}}) \cdot \frac{\partial T^{c o n}}{\partial \hat{y_{i}}} \\ = - \frac{1}{T^{c o n} (T^{c o n} + T^{c o r})} \cdot \frac{\partial T^{c o n}}{\partial \hat{y_{i}}} \\ = - \frac{1}{T^{c o n} (T^{c o n} + T^{c o r})} \cdot \frac{l_{i}}{| L |} . \end{matrix}

(19)

\begin{matrix} \frac{\partial L_{T}}{\partial \hat{p_{i}}} & = - \frac{1}{T^{R o a d}} \frac{\partial T^{R o a d}}{\partial \hat{p_{i}}} \\ = - \frac{2}{T^{R o a d}} (\frac{T^{c o r}}{T^{c o r} + T^{c o n}} - \frac{T^{c o n} \cdot T^{c o r}}{{(T^{c o n} + T^{c o r})}^{2}}) \cdot \frac{\partial T^{c o r}}{\partial \hat{p_{i}}} \\ = - \frac{1}{T^{c o r} (T^{c o n} + T^{c o r})} \cdot \frac{\partial T^{c o r}}{\partial \hat{p_{i}}} \\ = - \frac{1}{T^{c o r} (T^{c o n} + T^{c o r})} \cdot \frac{y_{i} \cdot | \hat{P} | - | Y \circ \hat{P} |}{| \hat{P} |^{2}} . \end{matrix}

(20)

2.4. Multitask Learning

In our road extraction framework, the input image I feeds into the shared encoder to extract features. The feature maps are, respectively, input into the decoders corresponding to the two tasks to obtain road detection prediction and road centerline extraction prediction. As shown in Figure 1, the prediction of the road detection task

\hat{Y}

and the ground truth Y are used to calculate the weighted cross-entropy loss that can be minimized to update the parameters of the road detection network. The prediction of the road centerline extraction task

\hat{Y}

and the discrete normalized distance label map L are used to calculate the ordinal loss that can be minimized to update the parameters of the road centerline extraction network. Y,

\hat{Y}

,

\hat{L}

, and L are used to calculate our proposed road-topology loss that combines the prediction of road detection and the prediction of road centerline extraction. Our proposed road topology loss makes full use of the correlation between the two tasks and can be minimized to make the two tasks promote each other. The entire loss function is the sum of the cross-entropy loss, the ordinal loss, and road-topology loss. By minimizing the entire loss function, the parameters of the road detection network and the parameters of the road centerline extraction network can be updated simultaneously, which means our framework learns teh road detection task and road centerline extraction task. The loss of mini-batch is calculated by

L = \frac{1}{| B |} \sum_{I \in B} (L_{r s} + L_{c} + L_{T}) .

(21)

3. Experiments

3.1. Dataset

In this subsection, we aim to present the dataset used in this paper. There is no public dataset that is applicable to our research. We create our dataset by using high-resolution TerraSAR-X images that are obtained by striped mode. As shown in Table 1, we label the roads in two SAR images, the coverage areas of which include urban, suburb, and rural areas. The Google Earth Maps of study area are shown in Appendix A. As the dataset is applied to extract roads, we only label all roads in the region where the road network is dense. Our labeled area was split into a training and test set as follows: the upper 80 % of the area (

435 \times 1024 \times 1024

pixels) was used for training, and the lower 20% (

104 \times 1024 \times 1024

pixels ) for testing. We use the raw SAR intensity image without any preprocessing. There are speckle noise in SAR images. Our train set contains raw SAR patches, the ground truth of road, and the ground truth of discrete labels. Algorithm 1 describes how road centerline and road width ground truth can be outlined. Figure 3 shows a sample in the train set. There is the only ground truth of road and ground truth of road centerline in the test set.

Algorithm 1 Obtain Road Centerline and Road Width Ground Truth

Input: road ground truth Y

Output: Road centerline ground truth C, Road width ground truth W

1:: Calculate the distance map D
2:: From distance map D, calculate the horizontal gradient map $D_{X} = D ⨂ f$ , where $f = [- 1, 2, - 1]$
3:: From distance map D, calculate the vertical gradient map $D_{Y} = D ⨂ f^{T}$ , where $f = [- 1, 2, - 1]$
4:: Calculate the orientation map $O R = t a n^{- 1} (D Y, D X)$ that is perpendicular to road direction.
5:: for (do $i \in I$ and $y_{i} = = 1$ )
6:: if $d_{i}$ is local maximum along the direction $o r_{i}$ then
7:: i is on the road centerline: $c_{i} = 1$
8:: the road width of pixel i: $w_{i} = 2 * d_{i}$
9:: else
10:: i is not on the road centerline: $c_{i} = 0$
11:: the road width of pixel i: $w_{i} = 2 * d_{j}$ , where $d_{j}$ is the local maximum starting from i along the direction $o_{i}$ in the distance map D.

3.2. Evaluation Metrics

To assess the qualitative performance in both road detection task and road road centerline extraction task, we apply the metrics that are introduced in [47]. In this paper, Our framework can detect road and extract road centerline simultaneously. To highlight the characteristics of our method, we propose a series of metrics to evaluate the performance of road extraction task.

3.2.1. Road Detection

The metrics that are employed to evaluate the performance of our approach for road detection task are precision, recall, F1-score, and intersection over union (IoU) that is the ratio of the intersection of prediction and groundtruth to the union of prediction and groundtruth. Precision (P) measures the ratio of the number of the pixels which are labeled as road pixels in the ground truth and are predicted as road pixels to the number of pixels that are inferred as road pixels. Recall (R) calculates the ratio of the number of the pixels which are labeled as road pixels in the ground truth and are predicted as road pixels to the number of pixels that are labeled as road pixels. F1-score(

F 1_{r d}

) is used to balance precision and recall, which is a harmonic average between precision and recall. IoU is the ratio of the intersection and union of the true label and predicted result, which can trade-off between recall and precision. Specifically, the four metrics are defined as:

\begin{matrix} P = \frac{T P}{T P + F P}, \end{matrix}

(22)

\begin{matrix} R = \frac{T P}{T P + F N}, \end{matrix}

(23)

\begin{matrix} F 1_{r d} = \frac{2 * P r e c i s i o n * R e c a l l}{P r e c i s i o n + R e c a l l}, \end{matrix}

(24)

\begin{matrix} I o U = \frac{T P}{T P + F N + F P}, \end{matrix}

(25)

where

T P

is true positive,

F P

is false positive and

F N

is false negative.

As there is a deviation between the manually labelled roads and the real roads, we relax metrics using the buffer method given in [48]. Specifically, if the regions in the prediction result are within the

ρ

pixels range, they are regarded as matching regions. In this paper, we set

ρ = 2

.

3.2.2. Road Centerline Extraction

Due to the differences between the road detection task and road centerline extarction task, it is better to use different metrics for the road centerline task. Next, we present the metrics that are used for the road centerline extarction task. We calculate completeness, correctness, quality, and F1-score to assess the performance of our approach for the road centerline extraction task. As it was difficult to directly compare the pixel difference between the extracted centerline and the ground truth, we introduced a buffer-based evaluation for the road centerlines. Completeness (COM) is the ratio of the length of reference road centerline that lies within the buffer around the around the extracted centerline to the length of reference centerline. Correctness (COR) is the ratio of the extracted road centerline that lies within the buffer around the reference centerline to the length of the extracted centerline. Quality (Q) is a comprehensive metric that combines the completeness and correctness. F1-score (

F 1_{r c e}

) is used to balance COM and COR, which is a harmonic average between COM and COR. The four metrics can be calculated as

\begin{matrix} C O M = \frac{length of matched reference}{length of reference} \approx \frac{T P}{T P + F N}, \end{matrix}

(26)

\begin{matrix} C O R = \frac{length of matched extraction}{length of extarction} \approx \frac{T P}{T P + F P}, \end{matrix}

(27)

\begin{matrix} Q = \frac{length of matched extarction}{length of extarction + length of unmatched reference} \approx \frac{T P}{T P + F N + F P}, \end{matrix}

(28)

\begin{matrix} F 1_{r c e} = \frac{2 * C O M * C O R}{C O M + C O R} \approx \frac{2 * P * R}{P + R} . \end{matrix}

(29)

3.2.3. Road Extraction

From the definition, we know that the metrics of road detection: precision (P), recall (R), IoU, F1-score (

F 1_{r d}

) correspond to the the metrics of road centerline extraction: correctness (COR), completeness (COM), quality (Q), F1-score (

F 1_{r c e}

), respectively. As a result, we design four metrics: precision for road extraction (

P_{r e}

), recall for road extraction (

R_{r e}

), quality for road extraction (

Q_{r e}

), F1-score for road extraction (

F 1_{r e}

) to evaluate the performance for road extraction. The metrics for road extarction are given as

\begin{matrix} P_{r e} = \frac{α * P + β * C O R}{α + β}, \end{matrix}

(30)

\begin{matrix} R_{r e} = \frac{α * R + β * C O M}{α + β}, \end{matrix}

(31)

\begin{matrix} Q_{r e} = \frac{α * I o U + β * Q}{α + β}, \end{matrix}

(32)

\begin{matrix} F 1_{r e} = \frac{α * F 1_{r d} + β F 1_{r c e}}{α + β}, \end{matrix}

(33)

where

α

and

β

can be set according to the importance of the road detection task and the road centerline extraction task. If the evaluator pays more attention to road detection task, then

α > β

. Otherwise, we set

α < β

. If the road detection task and road centerline extraction task are equally important, then we set

α = β

.

3.3. Implementation Details

In this subsection, we will specifically present the values of all hyperparameters in our experiment. We adopt the Pytorch framework to implement networks trained on a single NVIDIA Tesla V100 with 16G memory using a batch size of one. We train the networks with AdamW optimizer with the initial learning rate of 0.001, and we drop the learning rate by the factor of 0.1 at every ten epochs. We applied data augmentation to the training set with image rotation and horizontal and vertical flips. The augmented training set is composed of 3480 SAR images, the size of which is

1024 \times 1024

. For road centerline extraction, we set the threshold T as 0.25. As T only works in the preprocessing step of our test stage, this will not affect the final road centerline extraction result too much. If the parameter T is set too large, there will be more discontinuities in the road centerline extraction results. In practice, we do not know the optimal value of T for a test image without ground truth, and users can set T according to their needs. If users do not want spurious parts in the results, they can set a larger T. Conversely, if the user prefers to ensure the completeness of the road centerline extraction result, T can be set to a smaller value. In this paper, we set the T to 0.25 that is neither too big nor too small to remove some spurious parts and guarantee the connectivity of the road centerlines.

3.4. Results

In this subsection, we first introduce the networks that are used as baseline methods in this paper. We choose three fully convolutional neural networks (FCNNs), including LinkNet34 [48], DLinkNet34 [49], and DeeplabV3plus [50] as baseline methods to study the performance of our method. These three networks were implemented to extract roads from optical remote sensing imagery. To adapt to the size of images in our dataset, we set the dilation rates of dilated conventional operations in DeeplabV3plus to

[2, 4, 8, 16]

. To verify that both the proposed multitask architecture and loss function are effective, we will perform three sets of comparative experiments. Each set of comparative experiments contains three methods, including the baseline method, the method I, the method II. As shown in Figure 4, (a) is the network architecture of the baseline method, while (b) and (c) are the network architectures of the method I and the method II, respectively, which both have two branches: road detection branch and road centerline extraction branch. The methods I in the three sets of comparative experiments are, respectively, abbreviated as LinkNet34+, DLinkNet34+, and DeeplabV3plus+, which are obtained by modifying the networks of the baseline methods with dual and identical decoders having shared encoders. The methods II in the three sets of comparative experiments are, respectively, abbreviated as LinkNet34++, DLinkNet34++, and DeeplabV3plus++, which use the road topology loss to update the network parameters and have the same network architecture as the methods I.

Comparative Evaluation on Road Detection

To compare the performance of road detection, all the methods are evaluated based on the test samples in the test set for road detection. For qualitative comparison, we show the results produced by all methods based on example images depicted in Figure 5. The quantitative comparisons are reported in Table 2.

As shown in Figure 5, the methods with multitask learning architecture and using road-topology loss function generally perform better than the baseline methods. The baseline methods miss road regions in many places, i.e., the false negative (green) part is large, which leads to the poor connectivity of road detection result. With the learning road detection task and road centerline extraction based on ordinal regression task jointly, the false negative part is smaller. By further using the road-topology loss function that can penalize discontinuous parts of the detected road, the performance of road connectivity is improved. Table 2 presents the comparative quantitative evaluation measured in terms of P, R, IoU, F1. In Table 2, the best value of performance is presented with bold style. We observe from the Table 2 that the proposed methods outperform baselines in terms of three metrics, i.e., R, IoU, F1. Though the precision of proposed methods is less than the baseline, the slight decrease in P is insignificant compared to the large increase in the other three metrics.

3.5. Comparison of Road Centerline Extraction

In this subsection, we present the result of a comparative evaluation based on the centerline extraction task. For LinkNet34, DLinkNet34, DeeplabV3plus, we applied the morphological thinning algorithm [51] to the road detection results so as to extract road centerline. Figure 6 illustrates the centerlines identified by different methods. From Figure 6, we can see that more discontinuities are observed in road centerline extraction results of baseline methods. As shown in Figure 6, by learning the road centerline extraction task based on ordinal regression to extract road centerline, false negative parts in the road centerline extraction results decrease, and the connectivity of the centerline network is enhanced. Figure 6 also shows that By using road topology loss, the quality of road centerline extraction results has been further improved. The one reason is that our method can learn centerline extraction task that is based on ordinal regression and the road detection task simultaneously under a multitask learning scheme, whereas the prediction of one subtask can bootstrap the performance of solving another subtask. The other reason is that minimizing our firstly proposed road-topology loss helps eliminate topology errors of road extraction results, which include spurious parts and gaps. Table 3 summarized the results evaluated by COM, COR, Q, and F1 metrics. From Table 3, we can see that the methods with centerline extraction based on ordinal regression perform better than baseline methods. The methods with centerline extraction based ordinal regression and road-topology loss achieve the best performance.

3.6. Comparison of Road Extraction

In this subsection, we present the result of a comparative evaluation based on the road extraction task. To compare the performance of road extraction qualitatively, we adapt our proposed metrics with different values of

α

and

β

to evaluate different methods. For qualitative comparison, we show the results produced by all methods based on example images depicted in Figure 6. The quantitative comparisons are reported in Table 4, Table 5 and Table 6. From Figure 7, we can see that the poor road detection results of the baseline method lead to many discontinuities in the road centerline extraction results. As shown in Figure 7, by learning the road detection task and the road centerline extraction task based on ordinal regression to extract the road centerline, the performances of both tasks were improved. Figure 7 also shows that, by using our first proposed road-topology loss, the continuity of road detection result is enhanced, and the quality of road detection result is increased. This indicates that minimizing the road-topology loss helps eliminate topology errors in the road extraction results, which include spurious parts and gaps. Table 4, Table 5 and Table 6 summarized the results evaluated by

P_{r e}

,

R_{r e}

,

Q_{r e}

, and

F 1_{r e}

metrics. We can observe that whether the evaluator pays more attention to the road detection task or the road centerline extraction task, the method II that learns road detection task and road centerline extraction task simultaneously and adapt our proposed road-topology loss can achieve the best performance.

4. Discussion

In this section, we will discuss the robustness of different methods when the binarization threshold is changed. When we set different binarization thresholds, the road detection results will be different. However, in practice, we do not know the optimal binarization threshold. Figure 8 reflects the statistical characteristics of IoU and F1 across different binarization thresholds. The IoU and F1-score results of baseline methods(LinkNet34, DLinkNet34, DeeplabV3plus) are unstable with the vary of the binarization thresholds, as evidenced by the large variance in IoU and F1. In contrast, the F1 and IoU of our methods show better stability. Figure 8 also reports that, for F1 and IoU metrics, the means of the baseline methods are less than our proposed methods’ means. This is due to the use of a multitask learning scheme that can make two tasks mutually promote each other, and a road-topology loss that is minimized to eliminate topology errors. In conclusion, our methods can achieve a high performance under a wider range of binarization thresholds than baseline methods.

5. Conclusions

In this paper, we have learned the road detection task and the road centerline extraction task jointly with a multitask learning scheme to solve the problem of the road extraction from SAR imagery. To eliminate topology errors in road extraction results, we have specially designed a road-topology loss function for road extraction, which is differentiable. Different from the centerline extraction method based on regression method or classification method, we have adapted ordinal regression to learn discrete distance labels and trained the network by minimizing ordinal loss. By using the road centerline extraction method based on ordinal regression, the network is not sensitive to incorrect labels and can converge to a satisfactory result. Using multitask learning architecture, we have made full use of the correlation between the road detection task and the road centerline extraction task by learning the two tasks at the same time. The performance of one task has been improved under bootstrapping by the other task. The test result has shown that the networks, modified as multitask architecture, perform better than baseline methods. Considering the unique topological characteristics of the road, we have proposed a new road-topology loss function to penalize spurious parts in centerline extraction results and gaps in road detection results. The results show that the proposed road-topology loss function improved the connectivity and completeness of road networks. Finally, we discussed the robustness of our method, and the result has shown that our method not only greatly improves the performance of road extraction, but is also more stable than the baseline methods.

6. Future Work

Although our proposed method improves the performance of road extarction task, there are still some false detections. There are two main reasons for the false detection. One reason is that the noisy speckle nature of sar images, the layover and the shadowing effects of imaging destroy the roads in the SAR image. This leads many road parts to be misssed. The other reason is that the roads in sar images can often be confused with other targets, such as railway tracks, rivers, or even tree hedges. This leads some non-road regions to be detected as the roads. In the future, we will further improve our network to eliminate the influence of these two factors.

Author Contributions

Conceptualization, X.W.; methodology, X.W.; software, X.W.; validation, X.W.; formal analysis, X.W.; investigation, X.W.; resources, X.L.; data curation, X.W. and X.L.; writing— original draft preparation, X.W.; writing—review and editing, X.W., X.L.; visualization, X.W.; supervision, X.W.; project administration, X.L. and K.Z.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key Research and Development Program of China grant number 2018YFC1505100.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1 and Figure A2 are the Google Earth images of the study area.

Figure A1. The Google Earth image of the study area.

Figure A2. The Google Earth image of the study area.

References

Tupin, F.; Maitre, H.; Mangin, J.; Nicolas, J.; Pechersky, E. Detection of linear features in SAR images: Application to road network extraction. IEEE Trans. Geosci. Remote Sens. 1998, 36, 434–453. [Google Scholar] [CrossRef] [Green Version]
Negri, M.; Gamba, P.; Lisini, G.; Tupin, F. Junction-aware extraction and regularization of urban road networks in high-resolution SAR images. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2962–2971. [Google Scholar] [CrossRef]
Lu, P.; Du, K.; Yu, W.; Wang, R.; Deng, Y.; Balz, T. A New Region Growing-Based Method for Road Network Extraction and Its Application on Different Resolution SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4772–4783. [Google Scholar] [CrossRef]
He, C.; Liao, Z.; Yang, F.; Deng, X.; Liao, M. Road Extraction From SAR Imagery Based on Multiscale Geometric Analysis of Detector Responses. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1373–1382. [Google Scholar]
Jiang, M.; Miao, Z.; Gamba, P.; Yong, B. Application of Multitemporal InSAR Covariance and Information Fusion to Robust Road Extraction. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3611–3622. [Google Scholar] [CrossRef]
Liu, J.; Sui, H.; Tao, M.; Sun, K.; Xin, M. Road extraction from SAR imagery based on an improved particle filtering and snake model. Int. J. Remote Sens. 2013, 34, 8199–8214. [Google Scholar] [CrossRef]
Cheng, J.; Guan, Y.; Ku, X.; Sun, J. Semi-automatic road centerline extraction in high-resolution SAR images based on circular template matching. In Proceedings of the 2011 International Conference on Electric Information and Control Engineering (ICECE), Wuhan, China, 15–17 April 2011; pp. 1688–1691. [Google Scholar] [CrossRef]
Xu, R.; He, C.; Liu, X.; Dong, C.; Qin, Q. Bayesian Fusion of Multi-Scale Detectors for Road Extraction from SAR Images. Int. J. Geo-Inf. 2017, 6, 26. [Google Scholar] [CrossRef] [Green Version]
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
Wu, J.; Yu, Y.; Huang, C.; Yu, K. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3460–3469. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Ciregan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3642–3649. [Google Scholar]
Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6450–6458. [Google Scholar]
Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 3234–3243. [Google Scholar]
Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Learning a Discriminative Feature Network for Semantic Segmentation. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1857–1866. [Google Scholar]
Huang, Z.; Wang, X.; Wang, J.; Liu, W.; Wang, J. Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7014–7023. [Google Scholar]
Shen, F.; Gan, R.; Yan, S.; Zeng, G. Semantic Segmentation via Structured Patch Prediction, Context CRF and Guidance CRF. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5178–5186. [Google Scholar]
Pan, X.; Zhao, J.; Xu, J. Conditional Generative Adversarial Network-Based Training Sample Set Improvement Model for the Semantic Segmentation of High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2020, 1–17. [Google Scholar] [CrossRef]
Li, A.; Jiao, L.; Zhu, H.; Li, L.; Liu, F. Multitask Semantic Boundary Awareness Network for Remote Sensing Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2021, 1–14. [Google Scholar] [CrossRef]
Zheng, C.; Zhang, Y.; Wang, L. Semantic Segmentation of Remote Sensing Imagery Using an Object-Based Markov Random Field Model With Auxiliary Label Fields. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3015–3028. [Google Scholar] [CrossRef]
Ding, L.; Zhang, J.; Bruzzone, L. Semantic Segmentation of Large-Size VHR Remote Sensing Images Using a Two-Stage Multiscale Training Architecture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5367–5376. [Google Scholar] [CrossRef]
Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef] [Green Version]
Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782. [Google Scholar] [CrossRef]
Zhang, Q.; Kong, Q.; Zhang, C.; You, S.; Li, L. A new road extraction method using Sentinel-1 SAR images based on the deep fully convolutional neural network. Eur. J. Remote Sens. 2019, 52, 572–582. [Google Scholar] [CrossRef] [Green Version]
Shahzad, M.; Maurer, M.; Fraundorfer, F.; Wang, Y.; Zhu, X. Buildings Detection in VHR SAR Images Using Fully Convolution Neural Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1100–1116. [Google Scholar] [CrossRef] [Green Version]
Zhao, Z.; Jiao, L.; Zhao, J.; Gu, J.; Zhao, J. Discriminant Deep Belief Network for High-Resolution SAR Image Classification. Pattern Recognit. 2016, 61, 686–701. [Google Scholar] [CrossRef]
Zhou, Y.; Wang, H.; Xu, F.; Jin, Y.Q. Polarimetric SAR Image Classification Using Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2017, 13, 1935–1939. [Google Scholar] [CrossRef]
Geng, J.; Wang, H.; Fan, J.; Ma, X. Deep Supervised and Contractive Neural Network for SAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2442–2459. [Google Scholar] [CrossRef]
Chen, J.; Qiu, X. Equivalent Complex Valued Deep Semantic Segmentation Network For SAR Images. In Proceedings of the International Applied-Computational-Electromagnetics-Society Symposium China (ACES), Nanjing, China, 8–11 August 2019; Volume 1, pp. 1–2. [Google Scholar]
Mohammadimanesh, F.; Salehi, B.; Mandianpari, M.; Gill, E.; Molinier, M. A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem. ISPRS J. Photogramm. Remote Sens. 2019, 151, 223–236. [Google Scholar] [CrossRef]
Wu, W.; Li, H.; Li, X.; Guo, H.; Zhang, L. PolSAR Image Semantic Segmentation Based on Deep Transfer Learning—Realizing Smooth Classification With Small Training Sets. IEEE Geosci. Remote Sens. Lett. 2019, 16, 977–981. [Google Scholar] [CrossRef]
Li, J.; Zhang, R.; Li, Y. Multiscale convolutional neural network for the detection of built-up areas in high-resolution SAR images. In Proceedings of the 36th IEEE International Geoscience and Remote Sensing Symposium(IGARSS), Beijing, China, 10–15 July 2016; pp. 910–913. [Google Scholar] [CrossRef]
Henry, C.; Azimi, S.M.; Merkle, N. Road Segmentation in SAR Satellite Images With Deep Fully Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1867–1871. [Google Scholar] [CrossRef] [Green Version]
Turetken, E.; Benmansour, F.; Andres, B.; Glowacki, P.; Pfister, H.; Fua, P. Reconstructing Curvilinear Networks Using Path Classifiers and Integer Programming. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2515–2530. [Google Scholar] [CrossRef] [Green Version]
Głowacki, P.; Pinheiro, M.A.; Mosinska, A.; Türetken, E.; Lebrecht, D.; Sznitman, R.; Holtmaat, A.; Kybic, J.; Fua, P. Reconstructing Evolving Tree Structures in Time Lapse Sequences by Enforcing Time-Consistency. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 755–761. [Google Scholar] [CrossRef] [Green Version]
Sironi, A.; Türetken, E.; Lepetit, V.; Fua, P. Multiscale Centerline Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1327–1341. [Google Scholar] [CrossRef]
Wang, Y.; Wei, X.; Liu, F.; Chen, J.; Zhou, Y.; Shen, W.; Fishman, E.K.; Yuille, A.L. Deep Distance Transform for Tubular Structure Segmentation in CT Scans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 3832–3841. [Google Scholar]
Wegner, J.D.; Montoya-Zegarra, J.A.; Schindler, K. A Higher-Order CRF Model for Road Network Extraction. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 1698–1705. [Google Scholar] [CrossRef] [Green Version]
Han, X.; Xu, C.; Prince, J.L. A topology preserving level set method for geometric deformable models. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 755–768. [Google Scholar]
Stoica, R.; Descombes, X.; Zerubia, J. A Gibbs Point Process for Road Extraction from Remotely Sensed Images. Int. J. Comput. Vis. 2004, 57, 121–136. [Google Scholar] [CrossRef]
Mosinska, A.; Marquez-Neila, P.; Kozinski, M.; Fua, P. Beyond the Pixel-Wise Loss for Topology-Aware Delineation. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3136–3145. [Google Scholar]
Hu, X.; Fuxin, L.; Samaras, D.; Chen, C. Topology-Preserving Deep Image Segmentation. arXiv 2019, arXiv:1906.05404. [Google Scholar]
Zhao, K.; Gao, S.; Wang, W.; Cheng, M.M. Optimizing the F-measure for Threshold-free Salient Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 8848–8856. [Google Scholar]
Niu, Z.; Zhou, M.; Wang, L.; Gao, X.; Hua, G. Ordinal Regression with Multiple Output CNN for Age Estimation. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 4920–4928. [Google Scholar]
Heipke, C.; Mayer, H.; Wiedemann, C.; Jamet, O. Evaluation of Automatic Road Extraction. Int. Arch. Photogramm. Remote Sens. 1997, 3-4W2, 151–156. [Google Scholar]
Mnih, V.; Hinton, G. Learning to Label Aerial Images from Noisy Data. In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, UK, 26 June–1 July 2012; Volume 1, pp. 567–574. [Google Scholar]
Zhou, L.; Zhang, C.; Ming, W. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 192–196. [Google Scholar]
Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
Zhang, T.Y.; Suen, C.Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Commun. ACM 1984, 27, 236–239. [Google Scholar] [CrossRef]

Figure 1. An overview of the proposed multi-task road extraction framework, which includes two parts: the Road Detection branch, Road Centerline Extraction branch. The encoder is shared between the Road Detection branch and the Road Centerline Extraction branch. The architecture will be optimized with three terms: Cross-Entropy loss for the Road Detection branch, Ordinal loss for the Road Centerline Extraction branch, and Road-Topology loss for both.

Figure 2. (a) SAR image I. (b) ground truth Y. (c) distance map D. (d) normalized distance map

D_{N}

. (e) discrete normalized distance label L. (f) Centerline C.

Figure 2. (a) SAR image I. (b) ground truth Y. (c) distance map D. (d) normalized distance map

D_{N}

. (e) discrete normalized distance label L. (f) Centerline C.

Figure 3. (a): SAR image. (b): ground truth of road. (c): ground truth of discrete label.

Figure 4. (a) baseline methods with only road detection branch; (b) the methods obtained by modifying baseline methods with road centerline extraction branch. (c) the methods obtained by modifying baseline methods with road centerline extraction branch androad-topology loss. ‘RD’ denotes the road detection; ‘RCE’ denotes the road centerline extraction.

Figure 5. Qualitative comparison of road detection results produced by different methods.

Figure 6. Qualitative comparison of road centerline extraction results produced by different methods. For visualization, we performe the morphological dilation operation on the road centerline extraction results.

Figure 7. Qualitative comparison of road extraction results produced by different methods. For visualization, we performe the morphological dilation operation on the road centerline extraction results.

Figure 8. (a): DeeplabV3plus, (b): DeeplabV3plus+, (c): DeeplabV3plus++, (d): DLinkNet34, (e): DLinkNet34+, (f): DLinkNet34++, (g): LinkNet34, (h): LinkNet34+, (i): LinkNet34++.

Table 1. Metadata of the TerraSAR-X Images Used in Our Data Set.

Beijing, China
Size	21,800 × 15,500 px
Range Sample Distance	0.909627 m/px
Azimuth Sample Distance	1.848561 m/px
Spatial Resolution	3 m/px
Center Coordinate	[39.8798466,116.4503446]
Polarization	HH
Date	2013-05-09
Beijing, China
Size	27,600 × 18,700 px
Range Sample Distance	0.908790 m/px
Azimuth Sample Distance	1.888833 m/px
Spatial Resolution	3 m/px
Center Coordinate	[39.9571641,116.6996268]
Polarization	HH
Date	2013-03-04

Table 2. Comparative quantitative Evaluation Among Different Methods for Road Detection on our dataset. It should be noted that the results are the average performance of all images in the test set. With the best results marked in bold.

Methods Names	P	R	IoU	F1
DeeplabV3plus	0.9153	0.6341	0.5629	0.7378
DeeplabV3plus+	0.9039	0.6666	0.5980	0.7534
DeeplabV3plus++	0.8691	0.7673	0.6827	0.8069
DLinkNet34	0.9295	0.6300	0.5666	0.7382
DLinkNet34+	0.9314	0.6478	0.5899	0.7507
DLinkNet34++	0.8922	0.7424	0.6707	0.8009
LinkNet34	0.9252	0.6565	0.5918	0.7554
LinkNet34+	0.9097	0.6897	0.6216	0.7724
LinkNet34++	0.8948	0.7415	0.6693	0.8007

Table 3. Comparative quantitative Evaluation Among Different Methods for Road Centerline Extraction on our dataset. It Should Be Noted That the Results are the Average Performance of all Images in Test Set. With the best results marked in bold.

Methods Names	COR	COM	Q	F1
DeeplabV3plus	0.9532	0.7320	0.6374	0.8192
DeeplabV3plus+	0.9244	0.7860	0.6880	0.8379
DeeplabV3plus++	0.9138	0.8346	0.7360	0.8658
DLinkNet34	0.9670	0.6929	0.6187	0.7955
DLinkNet34+	0.9350	0.7839	0.6976	0.8456
DLinkNet34++	0.9150	0.8276	0.7277	0.8618
LinkNet34	0.9668	0.7102	0.6363	0.8076
LinkNet34+	0.9063	0.8237	0.7185	0.8561
LinkNet34++	0.9171	0.8265	0.7255	0.8623

Table 4. Comparative quantitative Evaluation Among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 1, β = 1

.

Table 4. Comparative quantitative Evaluation Among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 1, β = 1

.

Methods Names	$P_{r e}$	$R_{r e}$	$Q_{r e}$	$F 1_{r e}$
DeeplabV3plus	0.9342	0.6831	0.6002	0.7785
DeeplabV3plus+	0.9141	0.7263	0.6430	0.7957
DeeplabV3plus++	0.8915	0.8010	0.7094	0.8364
DLinkNet34	0.9483	0.6615	0.5926	0.7669
DLinkNet34+	0.9332	0.7159	0.6437	0.7981
DLinkNet34++	0.9036	0.7850	0.6992	0.8313
LinkNet34	0.9460	0.6833	0.6141	0.7815
LinkNet34+	0.9080	0.7567	0.6701	0.8143
LinkNet34++	0.9060	0.7840	0.6974	0.8315

Table 5. Comparative quantitative Evaluation among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 2, β = 1

.

Table 5. Comparative quantitative Evaluation among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 2, β = 1

.

Methods Names	$P_{re}$	$R_{re}$	$Q_{re}$	$F 1_{re}$
DeeplabV3plus	0.9279	0.6668	0.5877	0.7649
DeeplabV3plus+	0.9107	0.7064	0.6280	0.7816
DeeplabV3plus++	0.8840	0.7897	0.7005	0.8265
DLinkNet34	0.9420	0.6510	0.5839	0.7573
DLinkNet34+	0.9326	0.6932	0.6258	0.7823
DLinkNet34++	0.8998	0.7708	0.6897	0.8212
LinkNet34	0.9391	0.6744	0.6067	0.7728
LinkNet34+	0.9086	0.7344	0.6539	0.8003
LinkNet34++	0.9022	0.7699	0.6880	0.8213

Table 6. Comparative quantitative Evaluation Among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 1, β = 2

.

Table 6. Comparative quantitative Evaluation Among Different Methods for Road Extraction on our dataset. With the best results marked in bold.

α = 1, β = 2

.

Methods Names	$P_{re}$	$R_{re}$	$Q_{re}$	$F 1_{re}$
DeeplabV3plus	0.9405	0.6994	0.6126	0.7921
DeeplabV3plus+	0.9175	0.7462	0.6580	0.8098
DeeplabV3plus++	0.8989	0.8122	0.7183	0.8462
DLinkNet34	0.9545	0.6720	0.6013	0.7764
DLinkNet34+	0.9338	0.7386	0.6617	0.8140
DLinkNet34++	0.9074	0.7992	0.7087	0.8415
LinkNet34	0.9529	0.6923	0.6215	0.7902
LinkNet34+	0.9075	0.7790	0.6862	0.8282
LinkNet34++	0.9097	0.7982	0.7068	0.8418

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wei, X.; Lv, X.; Zhang, K. Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss. Remote Sens. 2021, 13, 2080. https://doi.org/10.3390/rs13112080

AMA Style

Wei X, Lv X, Zhang K. Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss. Remote Sensing. 2021; 13(11):2080. https://doi.org/10.3390/rs13112080

Chicago/Turabian Style

Wei, Xiaochen, Xiaolei Lv, and Kaiyu Zhang. 2021. "Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss" Remote Sensing 13, no. 11: 2080. https://doi.org/10.3390/rs13112080

APA Style

Wei, X., Lv, X., & Zhang, K. (2021). Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss. Remote Sensing, 13(11), 2080. https://doi.org/10.3390/rs13112080

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Road Extraction in SAR Images Using Ordinal Regression and Road-Topology Loss

Abstract

1. Introduction

2. Materials and Methods

2.1. Road Detection

2.2. Road Centerline Extraction

2.3. Road-Topology Loss

2.4. Multitask Learning

3. Experiments

3.1. Dataset

3.2. Evaluation Metrics

3.2.1. Road Detection

3.2.2. Road Centerline Extraction

3.2.3. Road Extraction

3.3. Implementation Details

3.4. Results

Comparative Evaluation on Road Detection

3.5. Comparison of Road Centerline Extraction

3.6. Comparison of Road Extraction

4. Discussion

5. Conclusions

6. Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI