Learning a Dilated Residual Network for SAR Image Despeckling

In this letter, to overcome the limitations of traditional linear models for SAR image despeckling, we propose a novel deep learning approach that learns a non-linear end-to-end mapping between noisy and clean SAR images with a dilated residual network (SAR-DRN). SAR-DRN is based on dilated convolutions, which enlarge the receptive field while keeping the filter size and layer depth unchanged, yielding a lightweight structure. In addition, skip connections are added to the despeckling model to alleviate the vanishing gradient problem. Compared with traditional despeckling methods, the proposed method outperforms the state-of-the-art methods in both quantitative and visual assessments, especially for strong speckle noise.


I. INTRODUCTION
Synthetic aperture radar (SAR) is a coherent imaging sensor that can acquire large volumes of high-quality surface data over wide areas. However, SAR images are inherently affected by multiplicative noise, i.e., speckle noise, which is caused by the coherent nature of the scattering phenomena [1]. Speckle severely degrades the quality of SAR images and greatly reduces their usefulness in interpretation, retrieval, and other applications [2]. As a result, SAR image speckle reduction is an essential preprocessing step and has become an active research topic.
To remove the speckle noise of SAR images, researchers first proposed spatial linear filters such as the Lee filter [3]. These methods assume that the filtered values are a linear function of the original image values. However, because they operate only within a local window, spatial linear filters often fail to preserve edges and details.
To address this problem, the nonlocal means (NLM) algorithm [4] provided a breakthrough in detail preservation for SAR image despeckling. The basic idea of NLM-based methods is that natural images are self-similar, with similar patches repeating throughout the whole image. For instance, the SAR-BM3D algorithm [5] uses the local linear minimum mean square error (MMSE) criterion and the undecimated wavelet transform. However, the low computational efficiency of similar-patch searching restricts its application.
In addition, variational methods [6] have gradually been applied to SAR image despeckling because of their stability and flexibility. They cast despeckling as the inverse problem of recovering the original noise-free image, based on reasonable assumptions or prior knowledge about the noise observation model. Although these variational methods [7], [8] achieve good speckle reduction, their results usually depend on the choice of model parameters and prior models.
In general, although many SAR despeckling methods have been proposed, they sometimes fail to preserve sharp features in regions with complicated texture, or even introduce block artifacts into the despeckled image. In this letter, considering that speckle noise can be described more accurately by non-linear models than by linear ones, we propose a novel deep neural network based approach for SAR image despeckling that learns a non-linear end-to-end mapping between speckled and clean SAR images with a dilated residual network (SAR-DRN). Our despeckling model employs dilated convolutions [9] and skip connections with a residual learning strategy. Compared with traditional despeckling methods, the proposed approach achieves state-of-the-art performance in both quantitative and visual assessments, especially for strong speckle noise.
The rest of this letter is organized as follows. The SAR image speckle noise degradation model and the related deep neural network method are introduced in Section II. The network architecture of the proposed method is described in Section III. The results of the despeckling assessment in both simulated and real experiments are presented in Section IV. Finally, the conclusions are summarized in Section V.

A. SAR Image Speckling Noise Degradation Model
For SAR images, the main cause of image quality degradation is multiplicative speckle noise. Unlike additive white Gaussian noise (AWGN), speckle noise is described by the multiplicative noise model:

$$y = x \cdot n \tag{1}$$

where $y$ is the speckled image, $x$ is the clean image, and $n$ represents the speckle noise. It is well known that, for a SAR amplitude image, single-look speckle follows a Rayleigh distribution, whose $L$-look generalization is [8]:

$$p(n) = \frac{2L^{L}}{\Gamma(L)}\, n^{2L-1} e^{-L n^{2}} \tag{2}$$

where $L \ge 1$, $n \ge 0$, $\Gamma(\cdot)$ is the gamma function, and $L$ is the equivalent number of looks (ENL). The ENL, defined in (3), is usually used as the quantitative evaluation index for real SAR image despeckling experiments in homogeneous areas:

$$\mathrm{ENL} = \left(\frac{\mathrm{mean}}{\mathrm{std}}\right)^{2} \tag{3}$$

where mean and std respectively represent the image mean and standard deviation.
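To make the degradation model concrete, the following is a minimal NumPy sketch of how simulated speckled images can be generated. It assumes (our choice, not stated in this letter) that amplitude speckle is drawn as the square root of unit-mean Gamma-distributed intensity speckle, which is Rayleigh-distributed for L = 1. The function name `add_amplitude_speckle` and the flat test image are hypothetical.

```python
import numpy as np

def add_amplitude_speckle(x, looks, rng):
    """Simulate y = x * n for an amplitude image x with L-look speckle.

    Assumption: n = sqrt(g) with g ~ Gamma(shape=L, scale=1/L), i.e. the
    intensity speckle g has unit mean and variance 1/L; n is Rayleigh for L = 1.
    """
    g = rng.gamma(shape=looks, scale=1.0 / looks, size=x.shape)
    n = np.sqrt(g)
    return x * n, n

rng = np.random.default_rng(0)
clean = np.full((256, 256), 100.0)          # hypothetical flat test image
noisy, n = add_amplitude_speckle(clean, looks=4, rng=rng)

intensity = n ** 2
print(intensity.mean())                      # close to 1 (unit-mean speckle)
print((intensity.mean() / intensity.std()) ** 2)  # close to L = 4 on intensity
```

Note that (3) evaluated on amplitude rather than intensity values does not return L: for single-look Rayleigh speckle it gives π/(4 − π) ≈ 3.66, so ENL values are only comparable when computed on the same image type.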
Therefore, because this multiplicative noise is non-linear, choosing a non-linear model for speckle reduction is an important strategy. In the following, we briefly introduce the use of convolutional neural networks (CNNs) for SAR image despeckling, considering both the low-level features at the bottom of the network and the output feature representation at its top.

B. CNNs for SAR Image Despeckling
Recently, benefiting from the powerful non-linear expressive ability of deep convolutional neural networks (DCNNs), CNNs have become an efficient image processing tool and have been successfully applied to many computer vision tasks, such as image classification, segmentation, and object recognition [17]. CNNs can extract internal image features without complex preprocessing. The input is organized as a feature map of size $m_1 \times n_1 \times c_1$, and each unit is connected to local patches of the previous layer through a set of weight parameters $W$ of size $k_1 \times k_2 \times N$ and bias parameters $b$. The output feature map is:

$$f_s = \sum_{i=1}^{c_1} W_{s,i} \ast x_i + b_s \tag{4}$$

where $\ast$ is the two-dimensional discrete convolution operation, $f_s$ is the $s$-th map of the output feature map of size $m_2 \times n_2 \times c_2$, and $x_i$ is the $i$-th input feature map. The network parameters $W$ and $b$ are learned through back-propagation using the chain rule of derivatives.
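Equation (4) can be sketched directly in NumPy. This is an illustrative implementation, not the authors' code: the kernel layout `(c2, c1, k1, k2)`, the "valid" (unpadded) output size, and the function name `conv_layer` are our assumptions. As in most CNN frameworks, the "convolution" is computed as cross-correlation, without flipping the kernel.

```python
import numpy as np

def conv_layer(x, W, b):
    """Compute f_s = sum_i W[s, i] * x_i + b[s] for every output map s.

    x: input feature maps, shape (c1, m1, n1)
    W: kernels, shape (c2, c1, k1, k2); b: biases, shape (c2,)
    Returns maps of shape (c2, m1 - k1 + 1, n1 - k2 + 1) ("valid" mode).
    """
    c2, c1, k1, k2 = W.shape
    _, m1, n1 = x.shape
    m2, n2 = m1 - k1 + 1, n1 - k2 + 1
    f = np.zeros((c2, m2, n2))
    for u in range(k1):
        for v in range(k2):
            window = x[:, u:u + m2, v:v + n2]            # (c1, m2, n2)
            # Weighted sum over input maps i for each output map s.
            f += np.einsum("sc,chw->shw", W[:, :, u, v], window)
    return f + b[:, None, None]

rng = np.random.default_rng(0)
f = conv_layer(rng.standard_normal((2, 5, 6)),
               rng.standard_normal((3, 2, 3, 3)), np.zeros(3))
print(f.shape)   # (3, 3, 4)
```

Looping over the k1 × k2 kernel offsets instead of over output pixels keeps the sketch short while staying exactly equivalent to the per-pixel sum in (4).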
To ensure that the output of the CNN is a non-linear function of the input, a non-linear activation function is introduced, such as the rectified linear unit (ReLU):

$$g(z) = \max(0, z) \tag{5}$$

For speckle noise reduction, the feed-forward SAR despeckling convolutional neural network (SAR-CNN) [10], shown in Fig. 1, has recently achieved excellent performance compared with traditional methods. Together with batch normalization (BN) and ReLU activations as in [11], and a component-wise division residual layer to estimate the despeckled image, SAR-CNN uses a homomorphic approach with coupled logarithm and exponential transforms, in combination with a similarity measure for the speckle noise distribution.
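The homomorphic idea can be illustrated with a short sketch: taking the logarithm turns y = x · n into log y = log x + log n, so an additive denoiser can be applied before exponentiating back. The wrapper `homomorphic_despeckle` and its `additive_denoiser` argument are hypothetical placeholders, not SAR-CNN's actual pipeline. The example also shows, under the square-root-Gamma amplitude speckle assumption, that log-speckle is not zero-mean, so any homomorphic method has to account for this bias.

```python
import numpy as np

def homomorphic_despeckle(y, additive_denoiser, eps=1e-6):
    """Hypothetical homomorphic wrapper: log -> additive denoiser -> exp.

    log(y) = log(x) + log(n), so multiplicative speckle becomes additive.
    `additive_denoiser` is a placeholder for any denoiser (e.g. a CNN).
    """
    log_y = np.log(y + eps)      # eps avoids log(0) on dark pixels
    return np.exp(additive_denoiser(log_y))

rng = np.random.default_rng(0)
looks = 1
n = np.sqrt(rng.gamma(looks, 1.0 / looks, size=200_000))  # amplitude speckle
# Log-speckle is biased: E[log n] = (digamma(L) - ln L) / 2,
# which is about -0.289 for L = 1.
print(np.log(n).mean())
```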

III. METHODOLOGY
In this letter, rather than using the log transform or modifying the loss function, we propose a novel network that is trained end to end using a combination of dilated convolutions and skip connections. Instead of relying on predetermined image priors or a noise description model, the main advantage of using a deep neural network for SAR image despeckling is that the model can learn and update its parameters directly from the training data and the corresponding labels.
The proposed end-to-end model for SAR image despeckling contains seven dilated convolution layers and two skip connections, as illustrated in Fig. 2. In addition, the model uses a residual learning strategy to predict the residual between the speckled and clean images, which fully exploits the non-linear expressive ability of deep learning. The details of the algorithm are described below.

A. Dilated Convolutions
In image restoration problems such as single-image super-resolution (SISR), denoising, and deblurring, contextual information can effectively facilitate the recovery of degraded regions. Deep convolutional networks mainly gather contextual information by enlarging the receptive field. Generally speaking, there are two ways to achieve this: 1) increasing the network depth; and 2) enlarging the filter size. However, as the network depth increases, the accuracy becomes "saturated" and then degrades rapidly. Enlarging the filter size, in turn, introduces more convolution parameters, which greatly increases the computational burden and training time.
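The parameter cost of enlarging the filter size can be checked with simple arithmetic. The sketch below compares a hypothetical layer with 64 input and 64 output channels (an illustrative width, not taken from Table I): a plain 7×7 kernel versus a 3×3 kernel with dilation 3, which spans the same 7×7 region.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def footprint(k, dilation):
    """Side length of the input region touched by one dilated k x k kernel."""
    return dilation * (k - 1) + 1

for k, d in [(3, 1), (7, 1), (3, 3)]:
    print(k, d, footprint(k, d), conv_params(k, 64, 64))
# 3 1 3 36928
# 7 1 7 200768
# 3 3 7 36928
```

The dilated 3×3 layer covers the same region as the 7×7 layer with about 5.4 times fewer parameters.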
To solve this problem effectively, dilated convolutions were first proposed in [9] as a way to enlarge the receptive field while keeping the filter size fixed. Taking a 3×3 kernel as an example, Fig. 3 illustrates the receptive field size of dilated convolutions.
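A single-channel dilated convolution can be sketched in a few lines. The function `dilated_conv2d` below is illustrative (zero padding, same-size output, odd kernel size assumed). Feeding it an impulse shows that the nine taps of a 3×3 kernel with dilation d are spread over a (2d + 1) × (2d + 1) region.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Same-size 2-D dilated cross-correlation of a single-channel image.

    Taps are spaced `dilation` pixels apart, so an odd k x k kernel covers a
    (dilation * (k - 1) + 1)^2 region while keeping only k * k weights.
    """
    k = kernel.shape[0]
    r = dilation * (k // 2)            # padding that keeps the output size
    xp = np.pad(x, r)                  # zero padding on all sides
    h, w = x.shape
    out = np.zeros((h, w))
    for u in range(k):
        for v in range(k):
            out += kernel[u, v] * xp[u * dilation:u * dilation + h,
                                     v * dilation:v * dilation + w]
    return out

impulse = np.zeros((11, 11))
impulse[5, 5] = 1.0
resp = dilated_conv2d(impulse, np.ones((3, 3)), dilation=2)
print(np.argwhere(resp))   # nine nonzero taps at rows/cols 3, 5, 7
```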
The receptive field of a common convolution grows linearly with the layer depth: after $i$ layers of 3×3 kernels, the receptive field size is $F_{\text{depth-}i} = (2i+1) \times (2i+1)$. In contrast, when the dilation factor doubles from layer to layer (1, 2, 4, ...), the receptive field of dilated convolutions grows exponentially with the layer depth, with $F_{\text{depth-}i} = (2^{i+1}-1) \times (2^{i+1}-1)$. In the proposed model, the dilation factors of the 3×3 dilated convolutions from layer 1 to layer 7 are set to 1, 2, 3, 4, 3, 2, and 1, respectively. Compared with other deep networks, the proposed model is lightweight, with only seven dilated convolution layers, as shown in Fig. 2.
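The receptive field of a stack of stride-1 3×3 convolutions grows by dilation × 2 pixels per layer, which the short sketch below uses to check the linear and exponential formulas and to compute the receptive field implied by the stated dilation schedule. The helper name `receptive_field` is ours.

```python
def receptive_field(dilations, k=3):
    """Side length of the receptive field of stacked stride-1 k x k convolutions."""
    size = 1
    for d in dilations:
        size += d * (k - 1)
    return size

print([receptive_field([1] * i) for i in range(1, 5)])            # [3, 5, 7, 9]
print([receptive_field([2 ** j for j in range(i)]) for i in range(1, 5)])  # [3, 7, 15, 31]
print(receptive_field([1, 2, 3, 4, 3, 2, 1]))                     # 33
```

With the dilation factors 1, 2, 3, 4, 3, 2, 1, the seven layers therefore see a 33×33 input region, whereas seven undilated 3×3 layers would see only 15×15.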

B. Skip Connections
Although increasing the network depth can help obtain more expressive feature representations, it often results in the vanishing gradient problem, which makes the model much harder to train. To solve this problem, a structure called the skip connection [12] was introduced for DCNNs. A skip connection passes the feature information of an earlier layer to a later layer, preserving image details and avoiding or reducing the vanishing gradient problem. In the proposed model, two skip connections are employed, connecting layer 1 to layer 3 and layer 4 to layer 7, as shown in Fig. 2.
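The following NumPy forward-pass sketch shows how the seven dilated layers, the two skip connections, and residual prediction could fit together. Several details are our assumptions and are not specified in this section: each skip adds the earlier layer's output to the later layer's input, ReLU follows every layer except the last, no normalization is used, and the channel width passed to the hypothetical `init_params` helper is arbitrary (the actual layer parameters are listed in Table I).

```python
import numpy as np

DILATIONS = [1, 2, 3, 4, 3, 2, 1]   # dilation factors stated in the text

def dconv(x, w, b, d):
    """Zero-padded same-size 3x3 dilated conv; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros((w.shape[0], h, wd))
    for u in range(3):
        for v in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, u, v],
                             xp[:, u * d:u * d + h, v * d:v * d + wd])
    return out + b[:, None, None]

def relu(z):
    return np.maximum(z, 0.0)

def sar_drn_sketch(y, params):
    """Hypothetical forward pass: 7 dilated conv layers and 2 additive skips.

    Assumption: the skip "layer a -> layer b" adds layer a's output to the
    input of layer b. The last layer predicts a residual subtracted from y.
    """
    h = y[None]                          # (1, H, W)
    outs = {}
    for i, ((w, b), d) in enumerate(zip(params, DILATIONS), start=1):
        if i == 3:
            h = h + outs[1]              # skip: layer 1 -> layer 3
        if i == 7:
            h = h + outs[4]              # skip: layer 4 -> layer 7
        h = dconv(h, w, b, d)
        if i < 7:
            h = relu(h)
        outs[i] = h
    return y - h[0]                      # despeckled estimate

def init_params(channels, rng, scale=0.1):
    """Hypothetical random initialization: 1 -> C -> ... -> C -> 1 channels."""
    shapes = [(channels, 1)] + [(channels, channels)] * 5 + [(1, channels)]
    return [(scale * rng.standard_normal((co, ci, 3, 3)), np.zeros(co))
            for co, ci in shapes]

rng = np.random.default_rng(0)
y = rng.random((40, 40))
x_hat = sar_drn_sketch(y, init_params(16, rng))
print(x_hat.shape)   # (40, 40)
```

Because the skips join maps with equal channel counts and every layer keeps the spatial size, the output has the same size as the speckled input, which is what a residual (subtractive) formulation requires.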

C. Residual Learning
Compared with traditional data mapping, He et al. [12] found that residual mapping learns more effectively and reduces the loss rapidly after passing through a multi-layer network. It is reasonable to expect that most pixel values in the residual image are very close to zero and that the spatial distribution of the residual feature maps is very sparse, which moves the gradient descent process onto a much smoother loss hyper-surface with respect to the filtering parameters. Thus, searching for near-optimal network parameters becomes much quicker and easier, allowing us to add more layers to the network and improve its performance.
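Specifically, given a collection of N training image pairs $\{x_i, y_i\}_{i=1}^{N}$, where $y_i$ is a speckled image and $x_i$ is the corresponding clean image, the network is trained to predict the residual between $y_i$ and $x_i$.

In summary, with the dilated convolutions and skip connections, the flowchart of learning a deep network for SAR image despeckling is described in Fig. 4.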
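The exact training objective is not spelled out in this section. A common choice for residual learning, which we assume here purely for illustration (DnCNN-style), is the mean squared error between the predicted residual and the target residual y − x, halved and averaged over the N pairs. The function name `residual_mse_loss` is hypothetical.

```python
import numpy as np

def residual_mse_loss(pred_residuals, noisy, clean):
    """Assumed objective: L = 1/(2N) * sum_i || R(y_i) - (y_i - x_i) ||_F^2.

    pred_residuals, noisy, clean: arrays of shape (N, H, W).
    """
    target = noisy - clean                 # residual the network should predict
    n_pairs = noisy.shape[0]
    return np.sum((pred_residuals - target) ** 2) / (2 * n_pairs)

# At inference time the despeckled image is recovered as y - R(y).
noisy = np.ones((2, 2, 2))
clean = np.zeros((2, 2, 2))
print(residual_mse_loss(np.zeros((2, 2, 2)), noisy, clean))   # 2.0
```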

A. Implementation Details
1) Training and Test Datasets:
In this letter, the comparison methods and the proposed method are applied to amplitude images. For SAR image despeckling with different numbers of looks, we used the UC Merced land-use dataset [13] as the training dataset. To train the proposed SAR-DRN, we chose 400 images of size 256×256 from this dataset and cropped them into 40×40 patches with a stride of 10. To test the performance of the proposed model, one image each from the Airplanes and Buildings classes was used as a simulated test image. For real SAR image despeckling, we used the classic Flevoland SAR image (cropped to 500×600), which is commonly used in real SAR despeckling experiments. Table I lists the parameters of each layer of SAR-DRN. The proposed model was trained with the Adam [14] optimization algorithm, with the learning rate initialized to 0.01 for the whole network. Training took 50 epochs and was carried out with Caffe [15] under Windows 7 on an Intel Xeon E5-2609 v3 CPU at 1.90 GHz and an Nvidia Titan-X (Pascal) GPU.
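The patch settings translate into a simple sliding-window extraction, sketched below with a hypothetical `extract_patches` helper (no padding; any data augmentation used in training is not described here and is not included). With 40×40 windows and stride 10, a 256×256 image yields 22 positions per axis, i.e., 484 patches, or 193,600 patches for 400 images before any augmentation.

```python
import numpy as np

def extract_patches(img, size=40, stride=10):
    """Slide a size x size window over img with the given stride (no padding)."""
    h, w = img.shape
    return np.stack([img[r:r + size, c:c + size]
                     for r in range(0, h - size + 1, stride)
                     for c in range(0, w - size + 1, stride)])

patches = extract_patches(np.zeros((256, 256)))
print(patches.shape)   # (484, 40, 40)
```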

B. Simulated-Data Experiments
To verify the effectiveness of the proposed model, four speckle noise levels, L = 1, 2, 4, and 8, were simulated for the two test images. The PSNR and SSIM results of the simulated experiments are listed in Table II, where the best performance is marked in bold and the second-best performance is underlined.
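For reference, PSNR can be computed as below. The sketch assumes an 8-bit dynamic range (peak = 255), which the letter does not state explicitly; SSIM is more involved and is available, for example, as `skimage.metrics.structural_similarity` in scikit-image. The example also shows how to read the dB gaps reported in Table II: a 0.5 dB PSNR gain corresponds to roughly 11% lower mean squared error.

```python
import numpy as np

def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (assumes an 8-bit dynamic range)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros(100)
p1 = psnr(ref, np.ones(100))                    # MSE = 1
p2 = psnr(ref, np.full(100, 10 ** -0.025))      # MSE = 10^(-0.05), about 11% lower
print(round(p2 - p1, 6))                        # 0.5
```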
As shown in Table II, the proposed SAR-DRN model obtains the best PSNR results at all four noise levels. When L = 1, the proposed method outperforms SAR-BM3D by about 1.1 dB and 0.6 dB for Airplane and Building, respectively. When L = 2 and 4, SAR-DRN outperforms PPB [4], SAR-POTDF [5], SAR-BM3D [8], and SAR-CNN [10] by at least 0.5 dB/0.7 dB and 0.4 dB/0.3 dB for Airplane/Building, respectively. Overall, the proposed method outperforms these state-of-the-art methods, especially for strong speckle noise. Figs. 5 and 6 show the filtered images for the Airplane and Building images contaminated by 2-look and 4-look speckle, respectively. Compared with the other algorithms, SAR-DRN achieves the best speckle reduction while avoiding radiometric and geometric distortion. In addition, the red boxes in the Airplane and Building images in Figs. 5 and 6, respectively, clearly show that SAR-DRN also has the best local detail preservation ability, while the other methods miss some texture details or produce somewhat blurry results.

C. Real-Data Experiments
As shown in Fig. 7, we also compared the proposed method with the four state-of-the-art methods described above on a real SAR image. Visually, SAR-DRN achieves the best performance in both speckle reduction and local detail preservation.
In addition, we evaluated the filtered results with the ENL to measure speckle-reduction ability. The ENL values were estimated from two homogeneous regions (the red boxes in Fig. 7(a)) and are listed in Table III. SAR-DRN clearly has a much better speckle-reduction ability than the other methods, which is consistent with the visual observation.
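The ENL measurement on homogeneous regions can be sketched as follows. The helper `enl_of_boxes` and its `(row0, row1, col0, col1)` box format are hypothetical, and we use the population standard deviation (NumPy's default `ddof=0`); for the large homogeneous boxes typically used, the choice of estimator makes a negligible difference.

```python
import numpy as np

def enl(region):
    """ENL = (mean / std)^2 over a homogeneous region."""
    region = np.asarray(region, dtype=float)
    return (region.mean() / region.std()) ** 2

def enl_of_boxes(image, boxes):
    """Hypothetical helper: ENL for each (row0, row1, col0, col1) box."""
    return [enl(image[r0:r1, c0:c1]) for r0, r1, c0, c1 in boxes]

print(enl([1.0, 3.0]))   # mean 2, std 1 -> 4.0
```

A higher ENL on the same region indicates stronger smoothing of speckle, so it should be read together with visual inspection to make sure details are not being over-smoothed.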

Dilated Convolutions and Skip Connections:
To verify the effectiveness of the dilated convolutions and skip connections, we implemented four sets of experiments in the same environment. As Fig. 8 implies, the dilated convolutions can effectively reduce the training loss and enhance the despeckling performance (PSNR). Meanwhile, the skip connections also accelerate the convergence speed of the network and enhance the model stability.

V. CONCLUSION
In this letter, we have proposed a novel deep learning approach for SAR image despeckling that learns an end-to-end mapping between noisy and clean SAR images. The presented approach is based on dilated convolutions, which enlarge the receptive field while keeping the filter size fixed, yielding a lightweight structure. Furthermore, skip connections are added to the despeckling model to alleviate the vanishing gradient problem. Compared with traditional despeckling methods, the proposed SAR-DRN achieves state-of-the-art performance in both simulated and real experiments, especially for strong speckle noise.
In our future work, we will investigate more powerful learning models to deal with the complex real scenes in SAR images. Furthermore, the proposed approach will be extended to polarimetric SAR image despeckling, whose noise model is much more complicated than that of single-polarization SAR.