A Filter for SAR Image Despeckling Using Pre-Trained Convolutional Neural Network Model

Abstract: Despeckling is a longstanding topic in synthetic aperture radar (SAR) imaging. Recently, many convolutional neural network (CNN) based methods have been proposed and have shown state-of-the-art performance on the SAR despeckling problem. However, these CNN based methods usually need a large amount of training data or can only deal with a specific noise level. To solve these problems, we directly embed an efficient CNN model pre-trained for additive white Gaussian noise (AWGN) into the Multi-channel Logarithm with Gaussian denoising (MuLoG) algorithm to deal with the multiplicative noise in SAR images. This flexible pre-trained CNN model takes the noise level as input, so only a single pre-trained model is needed to deal with different noise levels. We also use a detector to find homogeneous regions automatically in order to estimate the noise level of the image that is fed to the network. Embedded in MuLoG, the proposed filter can despeckle not only single channel but also multi-channel SAR images. Finally, both simulated and real (Pol)SAR images were tested in experiments, and the results show that the proposed method performs better and more robustly than the compared methods.


Introduction
Synthetic aperture radar (SAR) is a widely used technique for earth observation due to its all-weather and day-and-night imaging acquisition, providing large scale and high resolution reflectivity image of the Earth surface and cloud-penetrating capabilities [1]. Thus, SAR images have been widely used in many fields, such as topographic mapping, environmental protection, resource exploration, disaster monitoring, and urban planning [1].
However, the coherent combination of a large number of scatterers within each resolution cell results in an inherent phenomenon of SAR: speckle [2]. The existence of speckle increases the difficulty of SAR image processing and severely degrades the performance of scene interpretation tasks such as classification and object detection [3]. To solve this problem, many despeckling techniques for SAR images have been proposed over the last three decades, but there is still a pressing need for new methods that can efficiently eliminate speckle without sacrificing the spatial resolution [4].
Many SAR image denoising methods have been proposed in the literature. According to the tutorials [3,4], early works on despeckling operate in the spatial domain. The boxcar filter, or multi-looking, which assigns each pixel the mean of its neighbors, is a simple and straightforward way to remove speckle at the cost of resolution loss [4]. The Lee filter [5], the refined Lee filter [6], and the Frost filter [7] were proposed for single or multi-channel SAR despeckling and achieve excellent performance. For both single channel and multi-channel SAR images, the multiplicative speckle can be transformed into additive noise using different strategies, such as the homomorphic transform, and then a Gaussian filter can be embedded to solve the despeckling problem.
Our work aims to solve single channel and multi-channel SAR despeckling using deep learning, which can learn how to denoise images directly from the training data rather than relying on predefined image priors or filters. For natural image denoising, a feed-forward denoising convolutional neural network (DnCNN) [30] based on residual learning was designed. Residual learning directly predicts the noise rather than the noise-free image, often achieving better performance. Its extension, a fast and flexible denoising convolutional neural network (FFDNet) [31], which takes the noise variance as input, was also proposed. In this way, a single model can handle various noise variances and denoise more precisely. More and more deep learning methods have been proposed for data restoration in SAR images [32][33][34][35]. Chierchia et al. [32] followed the paradigm in [30] by proposing a convolutional neural network (SAR-CNN). To deal with the multiplicative noise, it used the homomorphic approach with coupled log and exponential transforms, and redefined the loss function. This method achieves results comparable to non-CNN based methods, but a large number of co-registered multi-temporal real SAR images are needed for training. The image despeckling convolutional neural network (ID-CNN) proposed in [33] also used a network architecture very similar to the one in [30], and directly added multiplicative noise to natural images to form the training data. In addition, its authors proposed to jointly minimize the Euclidean loss and the total variation (TV) loss to optimize the network. However, the training step is time consuming and complicated. The method proposed in [34] works in an end-to-end fashion. Its authors designed a dilated residual network (SAR-DRN) to perform better under strong speckle, in which dilated convolutions enlarge the receptive field while keeping the filter size and the depth of the architecture unchanged.
In [10], a deep encoder-decoder CNN network was proposed, which focused on the texture preservation. The network was an adaptation of U-Net, and allowed for the extraction of features at different scales.
To avoid the preparation of a huge SAR dataset, the careful design of a network or loss function, and complicated training, the work in [36] directly embedded the pre-trained CNN model of [30], which was trained on abundant natural grayscale images with AWGN, into the MuLoG framework [29] for SAR despeckling, and achieved satisfactory performance. However, the problem in [36] is that 14 pre-trained models for specific preset noise levels were utilized. When the real noise level of an image is not one of the preset ones, the image is likely to be over-smoothed (if the preset noise level is larger than the real one) or under-smoothed (if the preset noise level is smaller than the real one). To solve this problem, we adopt a different pre-trained CNN model, FFDNet [31]. In FFDNet, a two-dimensional noise level map is taken as an input during model training, so the model parameters are invariant to the noise level. In this way, only a single pre-trained model is needed to denoise images with different noise levels within the preset range, i.e., [0, 75], and better denoising results can be obtained. Moreover, when the number of looks L of the SAR images is not known, we employ the method in [37] to find homogeneous regions in the SAR images without any user interaction and use them to estimate the noise level, i.e., the number of looks L, as the input to the network. Combining the homogeneous region detector with FFDNet, we are able to denoise single channel or multi-channel SAR images in an end-to-end way in the framework of MuLoG [29].
The contributions of this paper are:
• Following the work of Yang [36], we adopt a new pre-trained CNN model, FFDNet. Taking the noise level as input to network training, FFDNet can obtain a more precise despeckling no matter what the noise level is. Combining MuLoG with the pre-trained FFDNet model, this despeckling filter can handle not only single channel but also multi-channel SAR images.
• We propose to use the homogeneous region detector to calculate the number of looks, L, which makes the despeckling framework work in an end-to-end way.
• Experiments on simulated and real (Pol)SAR images demonstrate the superior despeckling ability of our proposed method.
The remainder of this paper is organized as follows. Sections 2 and 3 introduce the proposed despeckling methods for SAR or PolSAR image, respectively. Next, experimental results are given in Section 4. Finally, the discussion and the conclusions are presented in Sections 5 and 6, respectively.

A Gaussian Denoiser: FFDNet
For additive white Gaussian noise (AWGN), image denoising methods can be divided into two major categories: model based methods and discriminative learning based ones. Although model based methods, such as BM3D [21], are designed to model image priors and deliver state-of-the-art denoising performance at various noise levels, they are time consuming and need to be carefully designed. To overcome the limitations of model based methods, several discriminative learning based methods have been proposed, which learn the underlying image prior during training and allow fast inference. Among them, convolutional neural network (CNN) based denoisers [30,31] have achieved very competitive denoising performance.
FFDNet [31] is a fast and flexible CNN based Gaussian denoiser which learns to recover the "clean" images from noisy ones. Figure 1 illustrates the architecture of FFDNet. This network is modified from the famous VGG network [38], and consists of a series of 3 × 3 convolutional layers without pooling. Each layer is composed of a specific combination of three operations: Convolution (Conv), Batch Normalization (BN) [40], and Rectified Linear Units (ReLU) [39]. Specifically, "Conv + ReLU" is adopted for the first layer, "Conv + BN + ReLU" for the middle layers, and "Conv" for the last convolution layer. Zero-padding is employed to keep the size of the feature maps unchanged after each convolution. Two highlights of FFDNet are:
• Downsampling and Upsampling. As efficiency is a crucial issue for CNN-based denoising methods, a reversible downsampling layer is added to reshape the input image into a set of sub-images, which also effectively expands the receptive field. An upsampling operation is applied in the last layer. In [31], the downsampling factor is set to 2.
• Noise Level Map, M. Unlike other CNN-based denoisers, FFDNet takes a tunable noise map M as an input to the network so that different noise levels can be handled with a single model. The noise map M controls the trade-off between noise reduction and detail preservation.
The denoising problem in FFDNet can be written as [31]:
$$\hat{x} = F(y, M; \Theta),$$
where F is a mapping function between the noisy observation y and $\hat{x}$, $\hat{x}$ is the estimate of the desired "clean" image x, and Θ denotes the underlying model parameters. Before feeding data into the network, the input image of size W × H × C is downsampled into four sub-images, giving a tensor of size W/2 × H/2 × 4C (the downsampling factor is 2 in both height and width), where C is the number of channels. Then, the noise level map M is concatenated to form a tensor of size W/2 × H/2 × (4C + 1) as the input to the CNN. In the training step, about 5544 natural gray images are used, and the patch size is set to 70 × 70 pixels. In each epoch, AWGN with a random noise level σ ∈ [0, 75] is added to N = 128 × 8000 patches to obtain the noisy patches [31]. It is worth noting that the elements in M are all σ. To train the model, the loss function is defined as the mean square error (MSE) between $\hat{x}$ and the ground truth x [31]:
$$\mathcal{L}(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(y_i, M_i; \Theta) - x_i \right\|^2,$$
where $\{(y_i, M_i; x_i)\}_{i=1}^{N}$ represents the N input-output pairs. After 80 epochs of training, we obtain a single pre-trained FFDNet for gray images corrupted by AWGN of different noise levels.
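The reversible downsampling and the concatenation of the noise level map described above can be illustrated with a minimal NumPy sketch. It is not the FFDNet implementation of [31]: the function names `pixel_unshuffle`, `pixel_shuffle`, and `build_network_input` are hypothetical, the array layout (height, width, channels) and the ordering of the four sub-images are assumptions, and scaling σ by 255 assumes intensities in [0, 1].

```python
import numpy as np

def pixel_unshuffle(img):
    """Reversibly rearrange an (H, W, C) image into an (H/2, W/2, 4C) tensor."""
    h, w, c = img.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    x = img.reshape(h // 2, 2, w // 2, 2, c)   # split rows and columns into 2x2 cells
    x = x.transpose(0, 2, 1, 3, 4)             # bring the 2x2 cell offsets next to channels
    return x.reshape(h // 2, w // 2, 4 * c)

def pixel_shuffle(sub):
    """Inverse of pixel_unshuffle: (H/2, W/2, 4C) back to (H, W, C)."""
    h2, w2, c4 = sub.shape
    c = c4 // 4
    x = sub.reshape(h2, w2, 2, 2, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(2 * h2, 2 * w2, c)

def build_network_input(img, sigma):
    """Stack the sub-images with a constant noise level map (assumed layout)."""
    sub = pixel_unshuffle(img)
    noise_map = np.full(sub.shape[:2] + (1,), sigma, dtype=sub.dtype)
    return np.concatenate([sub, noise_map], axis=-1)

# Toy single channel image of height 4 and width 6.
image = np.arange(4 * 6, dtype=np.float32).reshape(4, 6, 1)
net_in = build_network_input(image, sigma=25.0 / 255.0)
```

For a single channel image (C = 1), `net_in` has 4C + 1 = 5 channels at half the spatial resolution, and `pixel_shuffle` restores the original layout, which is what makes the downsampling reversible.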

Homogeneous Region Detector
As mentioned above, the noise level is one of the inputs to the network, which means we need to know the noise level σ of the noisy image. For SAR or PolSAR images, the noise level is related to the number of looks L. To obtain L, one common way is to select several visually homogeneous regions, and then calculate L from their mean and standard deviation. When the number of looks is not known, this procedure turns denoising into a two-stage problem: estimating L and filtering. To make denoising an end-to-end problem, we utilize the idea in [37] to find homogeneous areas in SAR images automatically without any prior information.
In the homogeneous areas, the signal fluctuations can be assumed to be negligible compared to the noise fluctuations, so the noise statistics can be estimated from these areas. In [37], the image is first divided into small square regions, and then a non-parametric homogeneous region detector is designed to answer the following statistical hypothesis test:
$$H_0: B \text{ is homogeneous}, \qquad H_1: B \text{ is inhomogeneous},$$
where B is the fixed size square block. The detector is based on the proposition that, if two disjoint random sequences P and Q of a block are correlated, i.e., Corr(P, Q) ≠ 0, then the block is inhomogeneous. A score $s: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ should therefore be designed to answer the above hypothesis problem. The score s should guarantee that for any $P_{FA} \in [0, 1]$, there exists a threshold α > 0 satisfying [37]: where n is the number of elements in P or Q. To make the test of homogeneity independent of the noise distribution, the rank of the values is considered. The block is homogeneous if the ranking of the pixel values is uniformly distributed. Thus, Kendall's τ coefficient [41,42], which depends only on the relative order of the values of P and Q, is used to assess such correlation between random sequences in the block [37]:
$$\tau(P, Q) = \frac{1}{n(n-1)} \sum_{1 \le i, j \le n} \operatorname{sign}(P_i - P_j) \operatorname{sign}(Q_i - Q_j),$$
where $P \in \mathbb{R}^n$, $Q \in \mathbb{R}^n$, and $\tau: \mathbb{R}^n \times \mathbb{R}^n \to [-1, 1]$. $P_i$ is the ith element of P.
Once the homogeneous regions are found, the value of L can be easily calculated. One of the advantages of this detector is that it has a non-detection rate independent of the unknown distribution of the noise. Thus, it can be applied to SAR images even though the speckle is multiplicative.
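The rank-based test and the estimation of L from a homogeneous region can be sketched as follows. This is only an illustration under stated assumptions, not the detector of [37]: taking P and Q as horizontally adjacent pixels, the z-score threshold of 3, and the helper names `kendall_tau`, `is_homogeneous`, and `estimate_looks` are all hypothetical choices. The variance used for the z-score is the standard large-sample variance of Kendall's τ under independence, and L is estimated as the squared mean over the variance of the intensity.

```python
import numpy as np

def kendall_tau(p, q):
    """Kendall's tau computed from the signs of all pairwise differences."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    n = p.size
    sp = np.sign(p[:, None] - p[None, :])
    sq = np.sign(q[:, None] - q[None, :])
    return float((sp * sq).sum() / (n * (n - 1)))

def is_homogeneous(block, z_threshold=3.0):
    """Declare a block homogeneous if neighboring pixels look uncorrelated."""
    assert block.shape[1] % 2 == 0, "block width must be even"
    p = block[:, 0::2].ravel()   # assumed choice: even columns
    q = block[:, 1::2].ravel()   # assumed choice: their right-hand neighbors
    n = p.size
    tau = kendall_tau(p, q)
    # Standard deviation of tau under independence (large-sample approximation).
    std0 = np.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    return abs(tau) / std0 < z_threshold

def estimate_looks(intensity_region):
    """Number of looks of a homogeneous intensity region: mean^2 / variance."""
    return float(intensity_region.mean() ** 2 / intensity_region.var())

rng = np.random.default_rng(0)
flat = rng.gamma(shape=4.0, scale=1.0 / 4.0, size=(256, 256))  # constant scene, L = 4
looks_est = estimate_looks(flat)
ramp = np.tile(np.arange(16.0), (16, 1))                        # strongly structured block
ramp_is_homogeneous = is_homogeneous(ramp)
```

On the structured ramp block the neighboring sequences are strongly rank-correlated, so the block is rejected, while the estimate on the simulated flat region stays close to the simulated number of looks.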

Homo-FFDNet Filter for SAR Images
According to Goodman's model [2], the intensity (the square of the modulus) of SAR images follows a gamma distribution G(R; L) whose probability density is:
$$p(I \mid R) = \frac{L^{L} I^{L-1}}{\Gamma(L) R^{L}} \exp\left(-\frac{L I}{R}\right),$$
where $I \in \mathbb{R}^+$ is the observed intensity and $R \in \mathbb{R}^+$ is the underlying reflectivity. The number of looks L satisfies L > 0, and Γ is the gamma function. The intensity of SAR is disturbed by signal dependent multiplicative fluctuations S which follow a standard gamma distribution (S ∼ G(1; L)):
$$I = R \times S.$$
There are many ways to estimate the underlying reflectivity R, among which the homomorphic approach is a simple and straightforward one. The logarithm $y = \log I \in \mathbb{R}$ is used to transform the multiplicative fluctuations S into additive ones. Then, a Gaussian denoiser for AWGN comes into play. After the log transform, y = log I follows the Fisher-Tippett distribution, denoted as FT(x; L) [43]:
$$p(y \mid x) = \frac{L^{L}}{\Gamma(L)} e^{L(y - x)} \exp\left(-L e^{y - x}\right),$$
where $x = \log R \in \mathbb{R}$. It should be pointed out that the theoretical variance of y is Ψ(1, L), where Ψ(m, ·) denotes the polygamma function of order m, so that Ψ(1, L) is the trigamma function evaluated at L. If the value of L is not known in advance, it can be calculated with the help of the homogeneous region detector. A classical approach to estimate x is to approximate the Fisher-Tippett distribution by a non-centered Gaussian distribution [29]:
$$\hat{x} = f_{\sigma^2}(y) - \left(\Psi(L) - \log L\right), \qquad \sigma^2 = \Psi(1, L), \tag{11}$$
where $f_{\sigma^2}: \mathbb{R}^n \to \mathbb{R}^n$ is a denoiser for zero-mean additive white Gaussian noise $\mathcal{N}(0; \sigma^2)$, and Ψ(·) is the digamma function. Finally, the estimate $\hat{R}$ can be obtained by an exponential operation, $\hat{R} = \exp(\hat{x})$. It is worth noting that the debiasing term is necessary because the noise, which has a non-zero mean, is treated as zero-mean by the denoiser. Here, we choose the pre-trained FFDNet model as the Gaussian denoiser f in Equation (11). Similar to the preprocessing in the test step of FFDNet, normalization should be applied to the SAR intensity before it is fed to the network: where $y_{\min}$ is the 0.3% quantile of y and $y_{\max}$ is the 99.7% quantile of y. Here, 0.3% and 99.7% are empirical values according to the 3σ criterion.
This eliminates the effects of the heavy left tail of the Fisher-Tippett distribution and removes outliers. The variance of the intensity which is used to form the noise level map M should also be normalized: Finally, the normalized intensity and variance are taken as inputs to the pre-trained FFDNet to get the "clean" image $\hat{x}$. After debiasing and the exp-transformation, the filtered SAR image can be obtained. The framework of the homomorphic approach, named Homo-FFDNet, is displayed in Figure 2.
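The homomorphic chain (log transform, Gaussian denoising, debiasing, exponential) can be sketched as below. This is a minimal illustration under stated assumptions, not the Homo-FFDNet code: the pre-trained network is replaced by a hypothetical stand-in `global_mean_denoiser` that is only meaningful for a constant scene, the quantile normalization is omitted, and `digamma` and `trigamma` are small helper approximations (recurrence plus asymptotic series) written here to keep the example self-contained.

```python
import math
import numpy as np

def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """Trigamma function (polygamma of order 1) via recurrence and series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def global_mean_denoiser(y, noise_var):
    """Hypothetical stand-in for the Gaussian denoiser; valid for flat scenes only."""
    return np.full_like(y, y.mean())

def homomorphic_despeckle(intensity, looks, denoiser):
    y = np.log(intensity)                        # multiplicative -> additive
    noise_var = trigamma(looks)                  # variance of the log-speckle
    bias = digamma(looks) - math.log(looks)      # mean of the log-speckle
    x_hat = denoiser(y, noise_var) - bias        # denoise, then debias
    return np.exp(x_hat)                         # back to the intensity domain

rng = np.random.default_rng(0)
intensity = 1.0 * rng.gamma(shape=1.0, scale=1.0, size=(256, 256))  # R = 1, L = 1
restored = homomorphic_despeckle(intensity, looks=1.0, denoiser=global_mean_denoiser)
```

Without the debiasing term the restored reflectivity of this flat scene would be biased low by a factor of about exp(Ψ(1)) ≈ 0.56 for L = 1, which illustrates why the correction is needed.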

MuLoG-FFDNet Filter for SAR Images
Rather than approximating the distribution of the log-transformed data by a Gaussian distribution as in the homomorphic approach, the MuLoG algorithm [29] adopts a variational approach that directly considers the Fisher-Tippett distribution in the log domain, leading to a MAP optimization problem [44,45]:
$$\hat{x} \in \arg\min_{x} \; -\log p(y \mid x) + R(x), \tag{14}$$
$$-\log p(y \mid x) = L \sum_{i=1}^{n} \left( x_i + e^{y_i - x_i} \right) + \text{Cst.}, \tag{15}$$
where $R(x) = -\log p_x(x)$ is a prior term enforcing some regularity on the solution, n is the number of pixels in the image, and Cst. is a constant. The problem in Equation (14) is a generic unconstrained optimization, and it can be solved by using the plug-and-play Alternating Direction Method of Multipliers (ADMM) algorithm [46], which repeats the updates: where k means the kth (k = 1, 2, ..., K) iteration, and K is set to 6 in all the experiments presented in this paper. $f_{\sigma_k}$ is a Gaussian filter with the noise variance $\sigma_k$. In [29], $\sigma_k$ is chosen as 1/ρ, and the internal parameter is ρ = (1 + 2/L)/Ψ(1, L). The homogeneous region detector can be used here to calculate L. The minimization over x in Equation (16) amounts to solving n separable convex problems, which can be solved efficiently with 10 iterations of Newton's method [46]. In the MuLoG framework, Equation (17) is carried out by a Gaussian filter, and, in our proposed method, $f_{\sigma_k}$ is the pre-trained FFDNet model. Equations (17) and (18) work for non-linear correction. After six iterations, an exp-transformation is applied to $\hat{x}$ to get the final estimate $\hat{R}$. The framework of the MuLoG-FFDNet filter for single channel SAR images is displayed in Figure 3.
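The plug-and-play iteration can be sketched in a generic scaled-form ADMM, assuming the Fisher-Tippett data term L·Σ(x + exp(y − x)) in the log domain. This is an illustration, not the MuLoG code of [29]: the ordering of the three updates follows the textbook scaled form and may differ from the numbered equations of the paper, the 3 × 3 mean filter `box_denoiser` is a hypothetical stand-in for the pre-trained FFDNet, and the function names are invented for the example.

```python
import numpy as np

def newton_x_update(y, v, looks, rho, n_iter=10):
    """Per pixel, minimize looks * (x + exp(y - x)) + rho / 2 * (x - v)^2."""
    x = y.copy()
    for _ in range(n_iter):
        e = np.exp(y - x)
        grad = looks * (1.0 - e) + rho * (x - v)   # first derivative
        hess = looks * e + rho                     # second derivative, always positive
        x = x - grad / hess
    return x

def box_denoiser(img, noise_var):
    """Hypothetical stand-in denoiser: 3x3 mean filter that ignores noise_var."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def pnp_admm(y, looks, rho, denoiser, n_outer=6):
    """Scaled-form plug-and-play ADMM on the log-intensity image y."""
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    for _ in range(n_outer):
        x = newton_x_update(y, z - u, looks, rho)  # data-fidelity step
        z = denoiser(x + u, 1.0 / rho)             # prior step via the Gaussian denoiser
        u = u + x - z                              # scaled dual update
    return x

rng = np.random.default_rng(0)
looks = 1.0
intensity = rng.gamma(shape=looks, scale=1.0 / looks, size=(64, 64))
y = np.log(intensity)
rho = (1.0 + 2.0 / looks) / (np.pi ** 2 / 6.0)     # trigamma(1) = pi^2 / 6, valid for L = 1
restored = np.exp(pnp_admm(y, looks, rho, box_denoiser))

# Small check of the Newton step on two pixels.
y_chk = np.array([0.0, 0.0])
v_chk = np.array([2.0, -2.0])
x_chk = newton_x_update(y_chk, v_chk, looks=1.0, rho=1.0)
grad_chk = (1.0 - np.exp(y_chk - x_chk)) + (x_chk - v_chk)
```

Because each pixel's subproblem is strictly convex with a positive second derivative, a handful of Newton iterations is enough in this sketch, which matches the small fixed iteration count mentioned above.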

Despeckling Filter for PolSAR Images
PolSAR images carry more information than single channel SAR images. For monostatic fully polarimetric SAR images, the lexicographic scattering vector k is defined as:
$$k = \left[ S_{hh}, \sqrt{2} S_{hv}, S_{vv} \right]^{T},$$
where h and v are the polarization states and T is the transposition operation. Suppose there are L independent random scattering vector samples following the D-dimensional complex circular Gaussian distribution, $\{k_1, \cdots, k_L\}$, satisfying L > D (D = 3). Then, the covariance matrix C can be defined as the mean of the outer products of the scattering vectors:
$$C = \frac{1}{L} \sum_{i=1}^{L} k_i k_i^{H},$$
where H means complex conjugate transpose. According to Goodman's model [2], the covariance matrix follows the complex Wishart distribution W(Σ; L): where Σ is the underlying covariance matrix and S is the speckle. Both C and Σ belong to the open cone of complex Hermitian positive definite matrices. To use the MuLoG algorithm [29], the covariance matrix C should be transformed into the log domain. Here, a matrix logarithm is used to convert the multiplicative speckle into an additive one, $\Sigma \mapsto \tilde{\Sigma} = \log \Sigma$, and the matrix exponential is defined similarly, $\tilde{\Sigma} \mapsto \Sigma = e^{\tilde{\Sigma}}$. The log-transformed matrix $\tilde{C}$ follows the Wishart-Fisher-Tippett distribution [29]: After the matrix log transformation, a re-parameterization method [29] which represents the log-transformed covariance matrix $\tilde{C}$ as a real vector y is applied, denoted as $\tilde{C} = \Omega(y)$. Here, the noise in each of the channels $y_i$, $i = 1, ..., D^2$ (D = 3), is assumed to be signal independent, and the noise variance is about the same for all channels. For PolSAR, the MAP optimization problem in Equations (14) and (15) is redefined as:
$$\hat{x} \in \arg\min_{x} \; -\log p(y \mid x) + R(x),$$
$$-\log p(y \mid x) = L \sum_{k=1}^{n} \operatorname{tr}\left( \Omega(x_k) + e^{\Omega(y_k)} e^{-\Omega(x_k)} \right) + \text{Cst.},$$
where $\tilde{\Sigma} = \Omega(x)$, n is the number of pixels, and $x_k$ means the kth pixel. When D = 1, this MAP problem is exactly the same as in the single channel case. Finally, the estimate $\hat{\Sigma}$ can be obtained by $\hat{\Sigma} = \exp(\Omega(\hat{x}))$.
To solve the MAP optimization problem, the ADMM algorithm is used. We also use the pre-trained FFDNet model as Gaussian denoiser. However, as suggested by Deledalle et al. [29], the value of σ k is 1 in the initial iteration and 1 + 2/L in the following iterations. Here, the number of iterations is also set as 6. Figure 4 gives the framework of despeckling PolSAR images with pre-trained model embedded with MuLoG.
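The multilooked covariance matrix and the matrix logarithm and exponential used to move between the original and the log domain can be sketched with an eigendecomposition, which is valid for Hermitian positive definite matrices. This is an illustrative sketch only: the function names are hypothetical, the scattering vectors are random toy data, and the re-parameterization Ω of [29] is not reproduced.

```python
import numpy as np

def sample_covariance(k):
    """k has shape (L, D): L scattering vectors; returns (1/L) * sum of k k^H."""
    return k.T @ k.conj() / k.shape[0]

def herm_logm(c):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.conj().T

def herm_expm(c_log):
    """Matrix exponential of a Hermitian matrix."""
    w, v = np.linalg.eigh(c_log)
    return (v * np.exp(w)) @ v.conj().T

rng = np.random.default_rng(0)
n_looks, dim = 16, 3
# Toy circular complex Gaussian scattering vectors (assumed unit variance).
k = (rng.standard_normal((n_looks, dim))
     + 1j * rng.standard_normal((n_looks, dim))) / np.sqrt(2.0)
C = sample_covariance(k)
C_log = herm_logm(C)
C_back = herm_expm(C_log)
```

The log-domain matrix stays Hermitian, so it can be described by D² real numbers (9 for D = 3), and the exponential maps any such matrix back to a positive definite covariance.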
To simulate SAR images, different levels of speckle (L = 1, 2, 4, 5, 10, 15, 20) were added to the famous Lena image with a size of 512 × 512 pixels. Only the simulated images with L = 1, 4 are shown in Figure 5. To quantitatively evaluate the performance of all filters on simulated SAR images, the peak signal-to-noise ratio (PSNR), still one of the most commonly used indexes, was first chosen to measure the amplitude reconstruction. The structural similarity index measure (SSIM) [47] also provides meaningful information about the closeness of two images in terms of their structural information. The equivalent number of looks (ENL) was also used to measure the degree of speckle suppression when homogeneous areas are provided. For PSNR and ENL, a higher value is better, while, for SSIM, the value closest to 1 is better. We also used the homogeneity index δh of the M-index proposed in [48] to measure the detail preservation. For δh, a smaller value means better detail preservation.
In Figure 6, the homomorphic approach using a pre-trained model (DnCNN or FFDNet) tends to leave residual dark stains and to over-smooth bright targets to some degree, revealing its lack of robustness against the heavy left tail of the Fisher-Tippett distribution. However, Homo-FFDNet performs better than Homo-DnCNN, as fewer residual dark stains are left. Compared with the homomorphic approach, the MuLoG based methods show good speckle suppression and the ability to remove artifacts, and among them MuLoG-FFDNet is the best. The filtered result of MuLoG-BM3D has a "piece by piece" appearance, while those of MuLoG-DnCNN and MuLoG-FFDNet look much smoother. It is worth noting that MuLoG-FFDNet smooths well and preserves details better than MuLoG-DnCNN. For example, MuLoG-DnCNN still leaves a small residual dark stain on Lena's shoulder while MuLoG-FFDNet does not. Looking at the brim of Lena's hat, MuLoG-FFDNet clearly retains more image details, as shown by the zoomed-in image.
The pre-trained FFDNet model introduces a noise level map M, which controls the trade-off between noise reduction and detail preservation. In addition, being trained with abundant clean-noisy image pairs, FFDNet is able to preserve details well. The quantitative results are provided in Table 1, and the ENL was computed using the homogeneous areas A1 and A2 displayed in Figure 5a. When considering a low level of noise with L = 4, the MuLoG based methods are still visually more convincing than the Homo based methods in Figure 7. The quantitative results are provided in Table 2.
Considering the homogeneity index, δh, we can find that, when the noise level of the image is high (L = 1), the Homo based methods seem to preserve details better than the MuLoG based methods, which tend to over-smooth. However, when the noise level of the image is low (L = 4), the MuLoG based methods perform better. Figure 8 gives the trends of PSNR and SSIM with different L. We can see that MuLoG-FFDNet performs robustly.
Two TerraSAR-X images of Guangzhou with a size of 500 × 500 pixels were also tested, as shown in Figure 9. Unlike a simulated SAR image, a real SAR image does not have a clean reference, so visual inspection and ENL are used for the qualitative and quantitative analysis. To verify the edge preservation ability of the proposed filter, the edge preservation degree based on the ratio of average (EPD-ROA) [49] was also computed along the horizontal direction (HD) and the vertical direction (VD). δh was used to evaluate the detail preservation. The denoising results for the two real SAR images are given in Figures 10 and 11, respectively. The values of ENL (computed in areas B(C)1 and B(C)2) and EPD-ROA (computed in areas B(C)3 and B(C)4) are also provided in Tables 3 and 4 for comparison. Because the number of looks L is not clearly known for these two SAR images, we used the homogeneous region detector to calculate the value of L. For Data 1, the computed L = 1.11, while, for Data 2, L = 1.46.
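The speckle simulation and two of the measures used in the single channel experiments, PSNR and ENL, can be sketched as follows. The sketch assumes L-look intensity speckle drawn from a unit-mean gamma distribution and a peak value of 255 for PSNR; the function names and the constant test scene are hypothetical, and the exact conventions used for the reported tables (for example intensity versus amplitude) are not specified here.

```python
import numpy as np

def add_speckle(reflectivity, looks, rng):
    """Multiply a clean image by unit-mean gamma speckle with `looks` looks."""
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=reflectivity.shape)
    return reflectivity * speckle

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for an assumed peak value."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def enl(region):
    """Equivalent number of looks of a homogeneous region: mean^2 / variance."""
    return float(region.mean() ** 2 / region.var())

rng = np.random.default_rng(0)
clean = np.full((512, 512), 100.0)          # constant scene, so every area is homogeneous
noisy = add_speckle(clean, looks=4.0, rng=rng)
enl_value = enl(noisy)                      # close to the simulated number of looks
psnr_value = psnr(clean, clean + 1.0)       # unit error everywhere
```

On unfiltered speckle the ENL is close to the simulated L, and it grows as a filter suppresses speckle in the homogeneous area, which is why a higher ENL is read as stronger smoothing.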

Experiments for PolSAR Images
Our framework, which embeds the pre-trained FFDNet in MuLoG, can handle not only single channel SAR images but also multi-channel SAR images. We chose the refined Lee filter, IDAN, MuLoG-BM3D, and MuLoG-DnCNN for comparison. For the refined Lee filter, the size of the filtering window was set to 7 × 7. The IDAN filter was tested using a maximum adaptive neighborhood size of 15 pixels. For both simulated and real PolSAR images, ENL and EPD-ROA were again computed to evaluate the performance of these filters quantitatively.
We followed the method mentioned in [1] to generate simulated PolSAR images with L = 1 and L = 4 and a size of 300 × 300 pixels, as displayed in Figure 12. The visual filtered results and the nine components of x ($x_1, x_2, ..., x_9$) for the image of the covariance matrix C are displayed in Figures 13 and 14. When the noise level is high, e.g., L = 1, the MuLoG based methods smooth clearly better than the refined Lee filter and IDAN, whose filtering results retain many speckles. To be specific, MuLoG-FFDNet is the best, while MuLoG-DnCNN is almost the same. However, on close inspection, some stains are still left in the result of MuLoG-DnCNN, while MuLoG-FFDNet performs well in terms of both speckle suppression and detail preservation. The quantitative comparisons are also provided in Tables 5 and 6 (ENL was computed in areas D1 and D2, EPD-ROA was computed in areas D3 and D4). Whether the noise level is high or low, the performance of MuLoG-FFDNet is very robust.
We next used two real PolSAR images: one is a fully polarimetric SAR image obtained by the CETC-38th airborne SAR system over Lingshui city, China, with the number of looks L = 1 and a size of 500 × 500 pixels, and the other is a fully polarimetric GF-3 image over San Francisco with L = 1 and a size of 500 × 500 pixels. The visual filtered results are given in Figures 15 and 16. The quantitative results are provided in the corresponding tables.
PolSAR image despeckling is an important preprocessing step before extracting land-object information. A state-of-the-art filter should not only effectively suppress the speckle and preserve the edges, but also enhance the differences between various classes and preserve the polarimetric scattering mechanism.
Thus, the scatter plots of the Cloude polarimetric decomposition parameters (entropy H, anisotropy A, and alpha angle α) [50] for three areas (marked with white squares in Figures 8 and 15) with different land-object types are provided in Figures 17 and 18 for the two PolSAR data sets, respectively. The entropy H describes the degree of statistical disorder of each distinct scatter type within the ensemble; the anisotropy A measures the relative importance of the second and the third eigenvalues of the eigendecomposition; and the angle α is related to the underlying average physical scattering mechanism. First, various kinds of land-objects are much more separable after filtering due to the suppression of the speckle. Second, compared with the refined Lee filter and IDAN, the MuLoG based methods not only preserve the polarimetric scattering mechanism well but also make the various land-objects more separable.
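The three decomposition parameters can be computed from the eigendecomposition of a 3 × 3 Hermitian matrix, as in the minimal sketch below. It follows the commonly used definitions (entropy with a base-3 logarithm, anisotropy from the second and third eigenvalues, mean alpha angle from the first component of each eigenvector) and assumes the input is a coherency matrix in the Pauli basis; how the matrices were formed for the figures is not specified here, and the function name `cloude_pottier` is hypothetical.

```python
import numpy as np

def cloude_pottier(t):
    """Return (entropy H, anisotropy A, mean alpha angle in degrees)."""
    w, v = np.linalg.eigh(t)
    order = np.argsort(w)[::-1]              # eigenvalues in descending order
    w = np.clip(w[order], 0.0, None)
    v = v[:, order]
    p = w / w.sum()                          # pseudo-probabilities
    nz = p > 0
    entropy = float(-(p[nz] * np.log(p[nz])).sum() / np.log(3.0))
    denom = w[1] + w[2]
    anisotropy = float((w[1] - w[2]) / denom) if denom > 0 else 0.0
    # Alpha angle of each eigenvector from the magnitude of its first component.
    alphas = np.arccos(np.clip(np.abs(v[0, :]), 0.0, 1.0))
    alpha = float(np.degrees((p * alphas).sum()))
    return entropy, anisotropy, alpha

h_rand, a_rand, _ = cloude_pottier(np.eye(3))                 # fully random scattering
h_det, a_det, alpha_det = cloude_pottier(np.diag([1.0, 0.0, 0.0]))
_, a_two, _ = cloude_pottier(np.diag([3.0, 2.0, 0.0]))
```

The two extreme cases behave as expected: equal eigenvalues give H = 1, while a single non-zero eigenvalue gives H = 0 with the alpha angle of the dominant mechanism.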

Experiments for Homogeneous Region Detector
We used simulated single channel SAR images to validate the effectiveness of the homogeneous region detector. Several simulated images with different numbers of looks L (L = 1, 2, 4, 5, 10, 15, 20) were generated, and the homogeneous region detector was used to find the homogeneous regions of each SAR image. Once the regions were found, they were used to calculate L. Figure 19 shows the results, where the red "•" is the ground truth of L, the green "×" means L was computed by the homogeneous region detector (HRD), and the blue " " means L was computed from homogeneous regions (HR) found manually. In Figure 19, we can see that the values of L computed by the homogeneous region detector are the closest to the ground truth. Table 9 gives the quantitative comparison on the simulated filtered SAR image.

Computation Time Analysis
We tested the computation time using a single channel SAR image with a size of 512 × 512 pixels. The computation times of the different methods are provided in Table 10. The computer that we used has an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, 16 GB RAM, and an Nvidia GeForce GTX 1050 Ti GPU. Homo+BM3D and MuLoG+BM3D were run in the Matlab (R2017a) environment with the code provided by the authors of MuLoG [29]. Homo+DnCNN, Homo+FFDNet, MuLoG+DnCNN, and MuLoG+FFDNet were run in Python 3.6 and PyTorch with GPU. The codes of these four methods were modified from the source codes provided by the authors of DnCNN [30] and FFDNet [31], respectively. Table 10 shows that the MuLoG based methods take more time than the Homo based methods, as the former involve iterations. The FFDNet based methods are a little faster than the DnCNN based methods, because the downsampling operation in FFDNet shortens the testing time.

Discussion
Speckle reduction is a key issue for SAR images, as it affects the accuracy and effectiveness of SAR images interpretation. In this paper, we propose to use the pre-trained FFDNet model with MuLoG framework to handle single channel or multi-channel SAR images. Our objective is to design a filter which can efficiently eliminate the speckle while having a good preservation of edge and details. With the help of noise level map, M, FFDNet based methods can solve the limitation in the methods proposed in [36], where over-smoothing or under-smoothing occurs. In addition, the homogeneous region detector helps make the proposed methods an end-to-end process. From the experimental results for both single channel SAR images and multi-channel SAR images, we can observe that, in terms of suppressing the speckle, the MuLoG based methods have superiority compared with Homo based methods. Among MuLoG based methods, MuLoG-FFDNet performs more robustly than MuLoG-DnCNN. The MuLoG-DnCNN may leave some spots on the images sometimes, while MuLoG-FFDNet always has an effective speckle reduction at various noise levels. As for detail preservation, MuLoG based methods tend to make some details over-smoothed, but they can remove the artifacts created by Homo based methods. Observing the zoom in parts in the filtered results, MuLoG-FFDNet has a better detail preservation than MuLoG-DnCNN. MuLoG based methods also make different land objects more separable while retaining the polarimetric scattering mechanisms well. When the value of L is unknown, the homogeneous region detector, which computes the number of looks L automatically, also has shown a satisfactory performance at various noise levels.
Using the pre-trained CNN model directly spares us from preparing a dataset, designing a network, and carrying out tedious training in order to obtain state-of-the-art results. Moreover, owing to the lack of PolSAR data, no CNN-based methods have been proposed to despeckle PolSAR images. A filter pre-trained on SAR images could perhaps be embedded in the MuLoG framework to solve this problem.
However, looking carefully at the filtered results of MuLoG-FFDNet, we can see that halo artifacts exist around the edges. Recently, a very simple but efficient idea to improve a filter's edge preservation capacity, called Side Window [51], has been proposed. Side Window changes the traditional center-based window into a side-based window, and can effectively restrain halos around edges. Although the MuLoG framework is a non-patch based method, the idea of Side Window could be used in the CNN in the future to train a filter with superior edge and detail preservation capacity.

Conclusions
While most deep learning methods need large amounts of data or a well designed model, this paper proposes a simple yet efficient method for despeckling SAR and PolSAR images, which embeds a pre-trained CNN model in the MuLoG framework and adds a homogeneous region detector. The pre-trained FFDNet model allows the proposed filter to work very robustly at various noise levels for single channel and multi-channel SAR images. Our work suggests that a high-performing Gaussian filter can be plugged into the MuLoG framework to obtain promising results on SAR images.
Author Contributions: T.P. and D.P. provided the original idea for the study. T.P. drafted the manuscript. W.Y. and H.-C.L. contributed to the discussion of the design, supervised the research and contributed to the article's organization.