Selection of Lee Filter Window Size Based on Despeckling Efficiency Prediction for Sentinel SAR Images

Radar imaging has many advantages. Meanwhile, SAR images suffer from a noise-like phenomenon called speckle. Many despeckling methods have been proposed to date, but there is still no common opinion as to what the best filter is and/or what its parameters should be (window or block size, thresholds, etc.). The local statistic Lee filter is one of the most popular and best-known despeckling techniques in radar image processing. Using this filter and Sentinel-1 images as a case study, we show how a filter parameter, namely the scanning window size, can be selected for a given image based on the prediction of filtering efficiency. Such a prediction can be carried out using a set of input parameters that can be calculated easily and quickly, employing a trained neural network that determines one or several criteria of filtering efficiency with high accuracy. A statistical analysis of the obtained results is carried out; it characterizes the improvements due to the adaptive selection of the filter window size, both potential and prediction-based. We also analyze what happens if, due to prediction errors, erroneous decisions are made. Examples for simulated and real-life images are presented.


Introduction
Radar remote sensing (RS) has found numerous applications in ecological monitoring, agriculture, forestry, hydrology, etc. [1][2][3][4][5]. This can be explained by the following reasons [2,3]. First, radar sensors can be used in all-weather conditions during day and night. Second, modern radars (mostly synthetic aperture radars (SARs)) provide high spatial resolution and data (image) acquisition for large territories, often with high periodicity; many existing systems perform frequent observations (monitoring) of the terrains of interest. Third, modern SARs produce valuable information content, especially if they operate in multichannel (multi-polarization or multi-temporal) mode [2][3][4][6,7]. Fourth, SAR images are often provided after pre-processing that includes co-registration, geometric and radiometric correction, and calibration. This is convenient for their further processing and interpretation.

1. Filter performance depends upon many factors, including the parameter settings used. Parameters can be varied and set depending on the filter type. These parameters include the scanning window size [12,13], thresholds [17,18,25], block size [25], parameters of variance-stabilizing transforms [17,18], and the number of blocks processed jointly within nonlocal despeckling approaches [17,20].

2. Despeckling (denoising) performance considerably depends on image properties. For simpler-structure images (those containing large homogeneous regions), better performance is usually achieved than for complex-structure images (those containing many edges, small-sized objects, and textures) [26][27][28].
3. Speckle properties also influence filter performance. There are filters applicable only to speckle with a probability density function (PDF) close to Gaussian, whereas other filters have no such restriction. The spatial correlation of the speckle (and noise in general) plays a key role in the efficiency of its suppression [29,30]. This means that the spatial correlation of the speckle should be known in advance or pre-estimated [31] and then taken into account in the selection of the filter and/or its parameters.

4. Filter performance can be assessed using different quantitative criteria; for SAR image denoising, it is common to use the peak signal-to-noise ratio (PSNR) and the equivalent number of looks [11,[17][18][19][20]], although other criteria are applicable as well. In particular, it has become popular to use visual quality metrics [32][33][34]. Despeckling methods can also be characterized from the viewpoint of the efficiency of image classification after processing [35][36][37][38]. The SSIM metric [39] has become popular in remote sensing applications, but it is clearly not the best visual quality metric [40][41][42] among those designed to date.
Due to these difficulties, the available tools for SAR image processing usually have a limited set of applicable filters. For example, software packages such as the ESA SNAP toolbox, ENVI, etc., offer a set of filters such as the Frost, Lee, refined Lee, and some others [10,43,44]. A common feature of these filters is that they have a limited number of parameters that must be set. These are the scanning window size, the multiplicative noise variance, and/or some other parameters. Even in this case, a user should either have knowledge (experience) concerning parameter setting or try the available options before obtaining the final results.
Note that the "optimal" parameter setting, even for a given filter type, depends on image and noise properties [21,[45][46][47]]. For images with a more complex structure and/or less intensive noise, more attention at the parameter-setting stage should be paid to edge/detail/texture preservation; vice versa, for images with a simpler structure and/or more intensive noise, noise suppression efficiency is of prime importance. Two approaches to realizing this "strategy" automatically or semi-automatically are possible. One can be treated as a global adaptation, where the properties of the image to be filtered are briefly analyzed and the parameters (e.g., thresholds [47]) of a filter are set accordingly (with respect to some rules or algorithms). The other approach [21,46] relates to a local adaptation, where parameters (e.g., the scanning window size [21] or thresholds [46]) vary according to some algorithm. Note that in both cases, some simple preliminary analysis is carried out, either globally or locally. Decisions are made based on predictions of filtering efficiency, either global or local [27,28,[46][47][48][49]].
In this paper, we consider the local statistic Lee filter and set its scanning window globally (for an entire image or, at least, its large fragments). We have three main hypotheses. First, due to a proper setting of the filter window size, a considerable improvement of despeckling efficiency can often be achieved compared to the case of an "average" setting (for example, 7 × 7 pixels for all cases). Second, filter performance prediction for different scanning window sizes can be carried out easily and accurately enough to make a decision on the optimal size. Third, the approach designed based on the two previous hypotheses has to be partly adapted to the properties of the considered class of SAR images, in this case, Sentinel-1 images with the number of looks approximately equal to five. Meanwhile, we believe that the proposed approach is general and, after modifications, can be applied to other types of SAR images and despeckling filters.
We already discussed why we consider the local statistic Lee filter: it is a well-known filter that can be efficiently computed and is widely used in SAR image processing. Sentinel-1 images are considered in this study since these data are openly available and are acquired with high periodicity.
One more peculiarity of our study is that we rely on the results obtained in our earlier papers [28,47,49]. It was shown in [28] that filter performance (according to several criteria) in the despeckling of Sentinel-1 SAR images can be accurately predicted using a trained neural network (NN). In [47], it was demonstrated that the global optimization of filter parameters based on filtering efficiency prediction is possible. Finally, following from the results given in [49], filtering efficiency prediction for a 5 × 5 local statistic Lee filter is possible with high accuracy. This paper concentrates on the development of our method for perceptual quality-driven image denoising based on the preliminary prediction of visual quality metrics and subsequent analysis. The basic concept and common methodology were developed in our previous works [27,28], and this study is an extension of our recent paper [49], where we demonstrated that the efficiency of the Lee filter can be accurately predicted. The main contributions of this paper are the following. First, we show that filtering efficiency can be predicted for various window sizes of the Lee filter with an appropriate accuracy. Second, we demonstrate that, based on such a prediction, it is possible to make a correct decision on the optimal window size with a high probability.
The methodology of our study includes the analysis of image/noise properties for Sentinel SAR images, the statistical analysis of optimal window sizes depending upon a used metric of filtering efficiency, NN design and its testing for simulated data with an analysis of prediction accuracy, the verification of the proposed approach for real-life SAR images.
The paper is structured as follows. Section 2 describes the image/noise model, the considered filter, and the analyzed quality metrics. Some preliminary examples are given. The NN input features, its structure and images used for learning are considered in Section 3. NN training results are presented in Section 4. The proposed approach and its applicability are discussed in Section 5, also presenting examples for Sentinel-1 data. Finally, the conclusions follow.

Image/Noise Model and Filter Efficiency Criteria
The image/noise model relies on general information about SAR image/speckle properties [2,8,[10][11][12][13]] and on available information concerning speckle characteristics in Sentinel-1 images [50,51]. A common assumption is that speckle is purely multiplicative. Experiments have proven that this assumption is correct for both VV (vertical-vertical) and VH (vertical-horizontal) polarizations of Sentinel-1 radar data [28,50]. The relative variance of speckle σ²_μ is approximately equal to 0.05 for both VV and VH polarizations [51]. The speckle PDF is not "strictly" Gaussian, but it is quite close to it. Thus, we can present an observed image as

I_ij = I^true_ij μ_ij, i = 1, …, I_Im, j = 1, …, J_Im,

where I^true_ij denotes the true (noise-free) image, μ_ij is the speckle in the ij-th pixel (it has a mean value equal to unity and a variance equal to σ²_μ), and I_Im and J_Im define the size of the considered image.
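For illustration, an observed image following this model can be generated as in the minimal Python sketch below. It draws uncorrelated Gaussian-like speckle with unit mean and variance σ²_μ ≈ 0.05; this is a simplification, since the speckle PDF is only close to Gaussian and real Sentinel-1 speckle is spatially correlated, as discussed next. The function name is ours.

```python
import numpy as np

def add_speckle(true_image, sigma_mu2=0.05, rng=None):
    """Simulate I_ij = I^true_ij * mu_ij with multiplicative speckle mu
    of unit mean and variance sigma_mu2 (about 0.05 for Sentinel-1
    VV/VH data, per the text). Uncorrelated Gaussian speckle only;
    spatially correlated speckle needs an extra shaping step."""
    rng = np.random.default_rng(rng)
    mu = rng.normal(loc=1.0, scale=np.sqrt(sigma_mu2), size=true_image.shape)
    return true_image * mu
```

The unit-mean property makes the speckle signal-dependent: brighter regions get proportionally stronger noise, which is exactly what distinguishes the multiplicative model from additive noise.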
Another important property is that speckle in Sentinel-1 images is spatially correlated. This has been proven by experiments performed in [28,50,51] for both polarizations. Examples of image fragments of size 512 × 512 pixels for the same terrain region are given in Figure 1. Spatial correlation can be detected and characterized in different ways: (a) by visualization and analysis of the 2D spatial auto-correlation function or its main cross-sections; (b) by analysis of the Fourier power spectrum; and (c) by analysis of the spatial spectra for other orthogonal transforms; all determined in homogeneous image regions or estimated using robust techniques able to eliminate or minimize the negative influence of image information content on the obtained estimates. Such an analysis was carried out, with results presented in [28,50,51].
It has been proven that speckle is spatially correlated, and the possibility of simulating speckle with the same characteristics as in Sentinel-1 images has been demonstrated [28].
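Methods (a) and (b) above are connected through the Wiener-Khinchin relation: the autocorrelation function is the inverse Fourier transform of the power spectrum. The following Python sketch (with a hypothetical helper name) estimates the normalized 2D autocorrelation of a nominally homogeneous patch in this way:

```python
import numpy as np

def autocorr_2d(patch):
    """Normalized 2D autocorrelation of an image patch, computed as
    the inverse FFT of the Fourier power spectrum of the mean-removed
    patch (Wiener-Khinchin). A clear peak spread beyond zero lag
    indicates spatially correlated speckle."""
    x = patch - patch.mean()
    power = np.abs(np.fft.fft2(x)) ** 2
    acf = np.real(np.fft.ifft2(power))
    acf /= acf[0, 0]                 # normalize so that ACF at zero lag = 1
    return np.fft.fftshift(acf)      # put zero lag in the center
```

For uncorrelated speckle the off-center values stay near zero, whereas for correlated speckle the neighboring lags remain noticeably positive.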
For the quantitative analysis of the filtering efficiency, we used three types of metrics determined for simulated images. First, we determine metrics' values for noisy images that were obtained by artificially introducing speckle with the aforementioned properties to noise-free images. Second, we calculate full-reference metrics' values after filtering. Third, we estimate metrics' "improvements" due to despeckling determined as I M = M f − M inp , where M f and M inp are metric values for denoised and true images, respectively.
For further analysis, we decided to use three metrics: the conventional peak signal-to-noise ratio (PSNR), PSNR-HVS-M [52] (peak signal-to-noise ratio taking into account the human vision system (HVS) and masking (M)), and the grayscale version of the feature similarity index (FSIM) [53]. PSNR is the standard metric often used in analysis. The other two metrics are visual quality metrics that are among the best for characterizing grayscale (single-channel) image quality. As there are currently no universal visual quality metrics, we prefer using and analyzing two visual quality metrics based on different principles simultaneously. Moreover, the properties of these metrics are well studied. For example, PSNR and PSNR-HVS-M are both expressed in dB, and their larger values are supposed to correspond to better quality. Distortion visibility thresholds for these metrics are established in [54]. It is also known that a difference in the quality of processed images of about 0.5 dB or larger can be noticed. Improvements of PSNR greater than 6 dB and of PSNR-HVS-M greater than 4 dB are needed to state with a high probability that SAR image visual quality has been improved due to filtering. The metric FSIM varies within the limits from 0 to 1, and it should be larger than 0.99 to show that noise or distortions are invisible [54]. This very rarely happens for SAR images, both original (noisy) and despeckled. Due to the nonlinearity of FSIM, it is difficult to say how large its improvement due to filtering should be to guarantee that a processed image has a better visual quality than the corresponding original one.
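The improvement I_M = M_f − M_inp is shown below for PSNR in a self-contained Python sketch; PSNR-HVS-M and FSIM would require their own implementations, which are not reproduced here. Both metric values are full-reference, i.e., computed against the true image.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Conventional PSNR in dB between a reference and a test image."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def psnr_improvement(true_img, noisy_img, filtered_img):
    """I_M = M_f - M_inp: metric value for the denoised image minus
    the metric value for the noisy input, both measured against the
    true (noise-free) image."""
    return psnr(true_img, filtered_img) - psnr(true_img, noisy_img)
```

A positive improvement indicates that the filter moved the image closer to the reference in the mean-square sense; as noted above, several dB are needed before the improvement becomes visually evident.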
As is known, the Lee filter output is expressed as

I^Lee_ij = Ī_ij + W_ij (I_ij − Ī_ij), W_ij = 1 − σ²_μ Ī²_ij / σ²_ij,

where I^Lee_ij is the output image, Ī_ij denotes the local mean in the scanning window centered on the ij-th pixel, I_ij denotes the central element in the window, σ²_ij is the variance of the pixel values in the current window, and W_ij is the locally adaptive gain.
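A straightforward (unoptimized) Python sketch of this filter is given below. Clipping the gain W to [0, 1] is a common practical safeguard in implementations rather than part of the formulation above, and the function name is ours.

```python
import numpy as np

def lee_filter(image, win=7, sigma_mu2=0.05):
    """Local-statistic Lee filter for multiplicative speckle:
    out_ij = mean_ij + W_ij * (I_ij - mean_ij),
    W_ij = 1 - sigma_mu2 * mean_ij**2 / var_ij (clipped to [0, 1]),
    with the mean and variance taken over a win x win window."""
    img = image.astype(float)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            block = padded[i:i + win, j:j + win]
            mean, var = block.mean(), block.var()
            # In flat areas (var close to speckle-induced variance),
            # W -> 0 and the output tends to the local mean; near edges
            # (large var), W -> 1 and the pixel is left almost unchanged.
            w = 0.0 if var == 0 else max(0.0, 1.0 - sigma_mu2 * mean ** 2 / var)
            out[i, j] = mean + w * (img[i, j] - mean)
    return out
```

The double loop keeps the logic explicit; a production version would compute the local moments with integral images or uniform filters.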
Below we present some examples of Lee filter outputs. Since there are no commonly accepted noise-free and noisy SAR test images, a common practice is either to create some artificial noise-free images or to use some practically noise-free images acquired by other sensors. In [28], we used component images from channels #5 and #11 of Sentinel-2 multispectral imager. Figures 2-4 present some examples. Note that we consider four scanning window sizes: 5 × 5, 7 × 7, 9 × 9, and 11 × 11 pixels.
Remote Sens. 2021, 13, x FOR PEER REVIEW 6 of 27

Figure 2 presents the first example. According to PSNR, the best results are provided by the 9 × 9 pixel scanning window, although the PSNRs for the 7 × 7 and 11 × 11 pixel windows are very close. The same holds for the metric PSNR-HVS-M, although the results for the 7 × 7 and 5 × 5 windows are very close. For FSIM, the best window size is 7 × 7 pixels. The results for the 5 × 5 window are the worst according to all criteria and, indeed, speckle suppression is insufficient. Concerning the other three output images, opinions on their quality can differ from one expert to another. Figure 3 presents a "marginal" case when the image is almost homogeneous. In this case, the metrics' values steadily grow (improve) as the scanning window size increases. The largest improvements are observed for the 11 × 11 pixel window; for PSNR and PSNR-HVS-M, they reach almost 14 and 12 dB, respectively, clearly showing that the 11 × 11 window is the best choice. This is in good agreement with intuitive expectations, since efficient speckle suppression is the main requirement for image denoising in homogeneous image regions, and such a property is achieved by filters with large scanning windows.
Another "marginal" case is demonstrated in Figure 4. The test image has a complex structure (is textural). Due to this, the best results are provided by the 5 × 5 scanning window according to all analyzed metrics. However, it is difficult to judge whether the visual quality has improved due to filtering or not, although metrics' improvements are positive. If the scanning window size increases, the output image quality decreases. This is because edge/detail/texture preservation is the main property of the filter in the considered case, and as it is known, a larger scanning window usually results in less preservation of image features, and therefore, worse visual quality.
The presented examples confirm that the optimal window size strongly depends on image content and the quality metric used. The general tendencies are the following. First, a smaller window should be applied for complex-structure images. Second, for visual quality metrics, the optimal scanning window size is either the same as the optimal size according to PSNR or slightly smaller. This is explained by two facts: (a) for visual quality metrics, edge/detail/texture preservation is "more important" than noise suppression in homogeneous regions; (b) better edge/detail/texture preservation is usually provided by filters with smaller window sizes (for the same filter type). Third, differences in metric values for filters with different scanning windows can be substantial. For example, FSIM in example 1 varies from 0.87 to 0.84, PSNR in example 2 varies from 27.4 dB to 33.4 dB, and PSNR-HVS-M in example 3 varies from 32.7 dB to 29.1 dB. This shows that it can be reasonable to apply the optimal window size.
As we only considered "marginal" cases for which the necessity to choose (determine) the optimal window size is obvious, we also carried out an additional study. First, we determined the "statistics" of how often each window size is optimal. For this purpose, 8100 test images of size 512 × 512 pixels were employed. After adding speckle, filtering with the four scanning window sizes was applied and the metrics' values were calculated. For each metric, we determined how many times each window size provided the best results. The data obtained for all three considered metrics are given in Figures 5-7. The plot in Figure 5 shows that, according to PSNR, the 5 × 5 and 11 × 11 windows appear more frequently than the 7 × 7 and 9 × 9 pixel windows.
Meanwhile, the analysis of the plots in Figures 6 and 7 demonstrates that, according to PSNR-HVS-M and FSIM, the 5 × 5 and 7 × 7 windows are better more often than the 9 × 9 and 11 × 11 windows. The reasons why this happens have been explained earlier.
Note that, quite probably, the 3 × 3 or 13 × 13 windows can be optimal in some cases. However, our goal here is to prove that different window sizes can be optimal depending on image/noise properties and filtering efficiency criteria.

A global adaptation of window size is worth carrying out if the provided benefit is high. A benefit can be determined in different ways. We calculated the two following parameters:

∆PSNR = Max PSNR − PSNR_7, ∆PSNR-HVS-M = Max PSNR-HVS-M − (PSNR-HVS-M)_7,

where Max PSNR and Max PSNR-HVS-M are the maximal values (among the four available) of the output PSNR and PSNR-HVS-M, respectively, and the subscript relates to the scanning window size (PSNR_5 means PSNR for the 5 × 5 window). The histogram of ∆PSNR is presented in Figure 8. It has its mode at a ∆PSNR of about 0.4 dB. The minimal value is about 0.1 dB and the maximal value is about 3.7 dB. This means that the benefit due to the proper selection of the optimal window size can be quite large.
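Assuming, as in the first hypothesis, that the benefit is measured against the default ("average") 7 × 7 setting, ∆PSNR can be computed as in the following illustrative Python sketch (the function name is ours):

```python
def window_benefit(psnr_by_window, default=7):
    """Benefit of global window-size adaptation: the maximal output
    PSNR over the candidate window sizes minus the PSNR for the
    default 7 x 7 setting. `psnr_by_window` maps window size
    (5, 7, 9, 11) to the output PSNR (dB) for that size. Taking the
    7 x 7 window as the reference is an assumption based on the
    "average" setting discussed in the text."""
    return max(psnr_by_window.values()) - psnr_by_window[default]
```

∆PSNR-HVS-M is computed identically from the PSNR-HVS-M values per window size.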
Similarly, Figure 9 represents the histogram of ∆PSNR-HVS-M. The distribution mode is about 0.5 dB, the minimal benefit is about 0.2 dB and the maximal one reaches almost 5 dB.
Thus, the first hypothesis put forward in the Introduction is proven, and it is worth selecting the optimal window size. To show that this is possible, we study how to predict the filtering efficiency.

Filtering Efficiency Prediction Using Trained Neural Network

Proposed Approach
It was mentioned above that our approach is based on the prediction of filtering efficiency. We assume that there is a method and/or a tool that allows predicting filter efficiency in terms of some criteria (e.g., using a metric or several metrics). Then, filter efficiency can be evaluated (predicted) for a set of filter parameter values (e.g., window size for the local statistic Lee filter). Based on this prediction, it is possible to undertake a decision as to what value of the considered parameter to set in order to obtain an "optimal result" for a given image. The core of this approach is a neural network-based predictor trained off-line for test images that have approximately the same image and noise properties as real-life images to be further processed.
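Once predictions are available for each candidate parameter value, the selection step itself is trivial. The minimal sketch below illustrates it; the trained predictor is assumed to exist and is not shown, and the function name is ours.

```python
def select_window_size(predicted_improvement):
    """Decision rule of the proposed approach: given the predicted
    metric improvements (e.g., IPSNR) for each candidate Lee filter
    window size, choose the size with the maximal predicted
    improvement. `predicted_improvement` maps window size to the
    NN-predicted improvement for that size."""
    return max(predicted_improvement, key=predicted_improvement.get)
```

Note that a prediction error only leads to a wrong decision when it changes the ranking of the candidates, which is why moderate prediction errors can still yield near-optimal choices.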
We briefly explain what this means and the requirements for such a prediction. Our first assumption is that there is at least one parameter able to adequately characterize filtering efficiency. This aspect has already been discussed, and we will further suppose that the improvements in PSNR, PSNR-HVS-M and FSIM (denoted as IPSNR, IPHVSM, IFSIM, respectively) can be considered adequate metrics. Then, we assume that there are one or several parameters able to characterize image and noise properties. In general, these can be different parameters [27,28,48,49], with the main requirements as follows: (a) the parameters have to be informative; (b) they should be calculated easily and quickly. Finally, there should be a connection between the chosen output parameter(s) (predicted metric or metrics) and the input parameter(s). This connection should allow estimating the output parameter(s) using the input one(s). A connection can be realized in different ways: as an analytic expression, as a regression, or as a more complex tool, such as a support vector machine (SVM) or a neural network (NN).
Our previous experience in the design and analysis of filter efficiency prediction [27,28,48,49] has demonstrated the following:

•	Even one input parameter (if it is informative and takes into account noise statistics or spectrum) is able to provide an accurate prediction of filtering efficiency for many different denoising techniques and criteria (metrics) [27,48];
•	A joint use of several input parameters, realized as a multi-parameter regression [48] or a trained NN [28,49], usually leads to a substantial improvement of prediction accuracy at the expense of the extra calculations needed.

Neural Network Input Parameters
It has been shown [49] that the improvements of many metrics can be accurately predicted for the 5 × 5 local statistic Lee filter using a trained NN. Recall here that the performance of any NN depends on many factors, regardless of what functions are carried out by the NN (approximation, classification, recognition, etc.). These factors are the following: (a) the NN structure and parameters (e.g., the number of hidden layers); (b) the input parameters used and their number; (c) the activation function type and parameters; (d) the methodology of NN training.
It is difficult to analyze the influence of all these factors within one paper. Due to this, we incorporated knowledge and experience obtained in our previous works [33,48,49]. In particular, different sets of input parameters were considered in [48]. Originally, four groups of input parameters were proposed.
The first group includes sixteen parameters, all calculated in the discrete cosine transform (DCT) domain. A normalized spectral power was determined in four spectral areas of 8 × 8 pixel blocks, marked by the digits from 1 to 4 in Figure 10. Zero relates to the DC coefficient, which is not used in the calculations. Four energy allocation parameters are expressed as

W_m = Σ_{(k,l)∈A_m} D²(k,l) / Σ_{n=1}^{4} Σ_{(k,l)∈A_n} D²(k,l), m = 1, …, 4,

where k and l denote the indices of the DCT coefficients D(k,l) in a block (k = 1, …, 8, l = 1, …, 8) and m (m = 1, …, 4) is the index of the m-th spectral area A_m (see Figure 10). In fact, the parameters W_m characterize the distribution of energy between the areas, and they all lie within the limits from 0 to 1. Having a set of W_m determined for a certain number of blocks, four statistical parameters were calculated for each area: mean, variance, skewness, and kurtosis. Then, sixteen parameters (denoted as M_S1,…,4, V_S1,…,4, S_S1,…,4, K_S1,…,4) that take into account the spectral characteristics of both the true image and the speckle are obtained.
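A Python sketch of the W_m computation for a single block is given below. The exact shapes of the areas A_m follow Figure 10, which is not reproduced here, so the area masks are left as an input; the use of squared DCT coefficients (spectral power) normalized by the total AC energy follows the description above, and the function names are ours.

```python
import numpy as np

def dct2_block(block):
    """Orthonormal 2D DCT-II of a square block, built from the
    standard DCT-II basis matrix (rows: frequency index k,
    columns: spatial index n)."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)             # DC row scaling for orthonormality
    return c @ block @ c.T

def energy_allocation(block, areas):
    """W_m: squared DCT coefficients summed over spectral area A_m,
    normalized by the total AC energy, so the W_m lie in [0, 1] and
    sum to 1. `areas` maps m -> boolean mask over the block; the DC
    coefficient is excluded, as in Figure 10."""
    d2 = dct2_block(block) ** 2
    d2[0, 0] = 0.0                   # drop the DC coefficient
    total = d2.sum()
    return {m: d2[mask].sum() / total for m, mask in areas.items()}
```

In the test below, the areas are taken as diagonal frequency bands (a plausible but assumed partition of the 8 × 8 spectrum into four zones).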
The second group includes four input parameters. They all relate to image statistics in 8 × 8 pixel blocks. We denote them as M BM , V BM , S BM , K BM (the mean, variance, skewness, and kurtosis of block means, respectively). They partly describe the image histogram.
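These four parameters can be computed as in the following Python sketch. The kurtosis convention (excess or not) is not stated in the text, so the excess form (zero for a Gaussian) is used here as an assumption, and the function name is ours.

```python
import numpy as np

def block_mean_stats(image, block=8):
    """Second-group NN inputs M_BM, V_BM, S_BM, K_BM: the mean,
    variance, skewness, and (excess) kurtosis of the 8 x 8 block
    means, partly describing the image histogram."""
    h, w = image.shape
    means = image[:h - h % block, :w - w % block] \
        .reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    m = means.mean()
    v = means.var()
    s = ((means - m) ** 3).mean() / v ** 1.5
    k = ((means - m) ** 4).mean() / v ** 2 - 3.0
    return m, v, s, k
```

Working with block means rather than raw pixels suppresses the pixel-level speckle fluctuations, so these statistics mostly reflect the large-scale image content.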
The third group of input parameters has come from our previous experience in designing simple predictors [27]. The parameters are also based on data processing in 8 × 8 blocks. Let us estimate the probabilities P_σ(q), q = 1, …, Q (where Q defines the total number of the considered blocks), that the magnitudes of the DCT coefficients in the q-th block are smaller than the corresponding frequency- and signal-dependent thresholds determined by the DCT normalized power spectrum D_pn(k,l), the speckle variance σ²_μ, and the q-th block mean Ī_q. After estimating P_σ(q), q = 1, …, Q, four statistical parameters (mean, variance, skewness and kurtosis) of these probabilities were calculated. We denote them as M_P, V_P, S_P, K_P, respectively. It was also supposed [48,49] that general image statistics can be useful. Due to this, four other parameters have been calculated: the image mean, variance, skewness, and kurtosis (denoted as M_I, V_I, S_I, K_I, respectively).
All 28 parameters that can potentially be employed as NN inputs can be calculated easily and quickly. The DCT in 8 × 8 blocks is a fast operation; the other operations are either simple arithmetic or logic operations. Some of them can be calculated in parallel or in a pipelined manner. An essential acceleration of the calculations also stems from the fact [27] that it is usually enough to process data in only 1000 randomly placed blocks to obtain the required statistics with appropriate accuracy.
Numerous NN structures can be used. The multilayer perceptron (MLP) structure presented in Figure 11 has performed well [28,49]. Studies carried out in [28] have shown that, without losing prediction accuracy, it is enough to use 13 input parameters, thus simplifying the NN-based predictor. These 13 input parameters are the following: M_S1, M_S2, M_S3, M_S4, M_BM, V_BM, S_BM, M_P, V_P, S_P, K_P, M_I and V_I (see the description above).
Examples of images used in NN training are given in Figures 2-4. In preparing 8100 test images sized 512 × 512 pixels, we aimed at "covering" different types of terrain of different complexity to represent a wide range of possible practical situations.
Figure 11. Architecture of the multilayer perceptron.

NN Training Results
The MLP-based predictors were trained separately for each of the three metrics (IPSNR, IPHVSM, IFSIM). As seen in Figure 11, the NN has three hidden layers, all using the hyperbolic tangent (tanh) activation function; the linear activation function is employed in the output layer. The MLP has the 13 inputs introduced above. These input parameters must be calculated for each SAR image for which the filtering efficiency is to be predicted. The NN-based predictor has been trained by means of Bayesian regularization backpropagation.
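A minimal stand-in for such a predictor can be sketched with scikit-learn. The hidden-layer widths below are assumptions (the paper's Figure 11 gives the exact architecture), and scikit-learn's L2-regularized L-BFGS training merely approximates Bayesian regularization backpropagation, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# 13 inputs, three tanh hidden layers, linear output (MLPRegressor uses
# the identity activation on its output layer for regression).
predictor = MLPRegressor(
    hidden_layer_sizes=(20, 15, 10),  # hypothetical widths
    activation="tanh",                # tanh in all hidden layers
    solver="lbfgs",
    alpha=1e-3,                       # L2 penalty as a rough regularization proxy
    max_iter=500,
    random_state=0,
)

# Toy data: 13 input parameters per image, one target metric (e.g. IPSNR).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = X @ rng.normal(size=13) + 0.1 * rng.normal(size=200)
predictor.fit(X, y)
print(round(predictor.score(X, y), 3))  # R^2 on the training data
```

In practice, one such predictor is trained per metric and per scanning window size, with the 13 image parameters as inputs and the measured filtering-efficiency metric as the target.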
The process of NN training and verification consists of four stages. The goal of the first two stages, which concern self-dataset validation, is to determine the final architecture and the number of training epochs. In turn, stages 3 and 4 relate to cross-dataset evaluation, checking the accuracy of the obtained predictors on data not exploited in the training process. One hundred high-quality cloudless images with total sizes of about 5500 × 5500 pixels were obtained from high-SNR components of multispectral RS data acquired by Sentinel-2. They were taken from channel #5 (wavelength of about 700 nm) and channel #11 (wavelength of about 1600 nm). From these large fragments, images of size 512 × 512 pixels were obtained (8100 images for each channel). These 512 × 512 pixel images were used as noise-free (true) images for which speckle-distorted images were simulated.
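The paper does not detail its speckle simulator here, but a standard model for L-look intensity SAR data is fully developed multiplicative speckle following a gamma distribution with unit mean and variance 1/L. A minimal sketch under that assumption:

```python
import numpy as np

def add_speckle(true_image, looks=5, rng=None):
    """Simulate multiplicative speckle for an L-look intensity image:
    noisy = true * n, with n ~ Gamma(shape=L, scale=1/L), so that
    E[n] = 1 and Var[n] = 1/L (fully developed speckle model)."""
    rng = np.random.default_rng(rng)
    n = rng.gamma(shape=looks, scale=1.0 / looks, size=true_image.shape)
    return true_image * n
```

With `looks=5` (as for the five-look Sentinel SAR images considered later), the relative speckle variance is 0.2, independent of the local mean.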
At the preliminary stage, the metric values were determined using the noisy and corresponding true images. Using the noisy images, the input parameters considered in the previous section were determined and saved for all images. The filtered images were obtained and the quality metric values for them were calculated. After obtaining all these data, the following actions to obtain and verify the NN performance were carried out. At the self-dataset validation stage, the optimal number of training epochs for the used architecture was established to be 30. The dataset was divided into two unequal parts: 80% of the test images were employed for training and the remaining 20% for validation. Since the results of any single split are random, the validation was repeated 1000 times using random permutations of the dataset. This allowed the root mean square error (RMSE) and adjusted R 2 [55] to be obtained after averaging and used to decide the NN parameters. A smaller RMSE and a larger adjusted R 2 correspond to better solutions.
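The repeated 80/20 validation with averaged RMSE and adjusted R 2 can be sketched as follows; the fitting routine is passed in as a placeholder rather than the trained MLP.

```python
import numpy as np

def adjusted_r2(y_true, y_pred, n_params):
    """Adjusted R^2 for n samples and n_params predictors."""
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)

def repeated_validation(X, y, fit_predict, n_repeats=1000, rng=None):
    """Repeat random 80/20 splits n_repeats times; return the RMSE and
    adjusted R^2 averaged over the validation parts."""
    rng = np.random.default_rng(rng)
    rmses, r2s = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        cut = int(0.8 * len(y))
        tr, va = idx[:cut], idx[cut:]
        y_pred = fit_predict(X[tr], y[tr], X[va])
        rmses.append(np.sqrt(np.mean((y[va] - y_pred) ** 2)))
        r2s.append(adjusted_r2(y[va], y_pred, X.shape[1]))
    return np.mean(rmses), np.mean(r2s)
```

Averaging over many permuted splits suppresses the randomness of any single 80/20 partition, so the resulting RMSE and adjusted R 2 are stable enough to compare NN configurations.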

Self-dataset validation results are presented in Table 1, where test images for channel #11 are used. As one can see, the prediction results are very good. IPSNR and IPHVSM are predicted with an RMSE of about 0.3 dB; the adjusted R 2 values are practically identical, lying between 0.976 and 0.989, which shows that the fitting carried out by the trained NN is excellent. If an NN is trained on one set of data and then applied to another set, its performance might radically worsen. To check this point, we performed cross-validations. Training was carried out on 6480 test images, and the parameters characterizing accuracy were estimated on another 1620 images. The obtained results are given in Table 2. Cross-dataset evaluation has been done for the same Sentinel-2 data from channel #11. The analysis shows that the RMSE values have slightly increased and the adjusted R 2 has slightly decreased compared to the corresponding data in Table 1. Meanwhile, the prediction accuracy remains very good. Usually, it is even more difficult for an NN to process data if the sets used in training and verification differ substantially. In our case, this might happen if the NN is trained on test images composed of data from one Sentinel-2 channel and then applied to test images composed of data from another channel. To check this case, we trained the NN on the Sentinel-2 dataset from channel #5 and then carried out cross-dataset evaluation on the dataset from Sentinel-2 channel #11. The obtained results are presented in Table 3.
The analysis of the obtained results shows that the RMSE values are almost identical to the corresponding values in Table 2, while the adjusted R 2 values have slightly decreased compared to the corresponding data in Table 2. Nevertheless, the prediction is accurate enough. It is substantially better than that for predictors based on a single input parameter [27] or two input parameters [48]. We associate this benefit with two factors. First, the NN uses more input parameters that exploit information on image statistics. Second, the NN exploits information about speckle spectral properties by means of the input parameters M P , V P , S P , and K P . The main parameters that characterize prediction accuracy for the Lee filter versions with different scanning window sizes are approximately the same as for the DCT-based filter analyzed in [28]. Thus, we can conclude that the approach to filter efficiency prediction considered in this paper possesses a certain generality. Hence, the second and the third hypotheses given in the Introduction are also proven. The high prediction accuracy compared to the simpler approach [27] is provided, in particular, by taking into account the speckle statistical and spectral properties incorporated by the input parameters M P , V P , S P , K P , which are among the most informative. Now, the question is the following: is the attained accuracy of filter efficiency prediction sufficient for the adaptive selection of the Lee filter scanning window size?

Adaptive Selection of Window Size
There are several ways to adapt the filter window to the content of a given image based on prediction:
• To perform prediction for only one metric, e.g., IPSNR, for all possible scanning window sizes, and to choose the window size for which the predicted metric is the largest.
• To jointly analyze two or three metrics (e.g., IPSNR and IPHVSM or IPSNR, IPHVSM, and IFSIM) and undertake a decision (there are probably many algorithms to do this).
• To obtain three decisions based on the separate analysis of IPSNR, IPHVSM, and IFSIM as in the first option and then to apply the majority vote algorithm or some other decision rule.
Below, we concentrate on the first option as the simplest solution, leaving the other options for future work.
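A minimal sketch of this selection rule, assuming one trained predictor per candidate window size; the `predictors` mapping, the candidate sizes, and the 13-element feature vector are illustrative placeholders.

```python
import numpy as np

# Candidate Lee filter scanning window sizes considered in the paper.
WINDOW_SIZES = (5, 7, 9, 11)

def select_window_size(features, predictors, sizes=WINDOW_SIZES):
    """Predict one metric (e.g. IPSNR) for each candidate window size
    and return the size with the largest predicted value, along with
    the per-size predictions. `predictors` maps size -> model with a
    scikit-learn-style .predict() method."""
    preds = {s: float(predictors[s].predict(features[None, :])[0])
             for s in sizes}
    return max(preds, key=preds.get), preds
```

The cost of this rule is one cheap feature-vector computation plus one forward pass per candidate size, which is negligible compared to the filtering itself.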
Decisions can be characterized in various ways. We are mainly interested in two aspects: the probability of a correct decision for our approach, and what happens if the undertaken decision is wrong, i.e., if an incorrect scanning window size is chosen. The probabilities of correct decisions have been determined for both self-dataset validation and cross-dataset evaluation. For the self-dataset validation stage, the probability of a correct decision is approximately 0.9 for IPSNR, 0.918 for IPHVSM, and 0.907 for IFSIM. Thus, a high probability of correct decisions has been reached.
For the cross-dataset evaluation in channel #11, the probability is 0.898 for IPSNR, 0.916 for IPHVSM, and 0.901 for IFSIM. For the cross-dataset evaluation with the other channel (#5), the probability of a correct decision is 0.857 for IPSNR, 0.899 for IPHVSM, and 0.877 for IFSIM. All probabilities are high enough.
Let us now see what happens if a wrong decision is undertaken. Clearly, this must lead to a reduction in filtering efficiency. Hence, we estimated the differences between the optimal (maximally attainable) metric value and the one produced in the case of a wrong decision. The distribution of such differences for IPSNR is presented in Figure 12. Most differences are very small (less than 0.2 dB), so an erroneous decision is not a problem. Meanwhile, there are a few (six) cases where the differences exceed 0.5 dB.
Similarly, Figure 13 presents the distribution of differences between the optimal IPHVSM and the corresponding values produced in the cases of wrong decisions. Again, most differences are very small and do not exceed 0.2 dB. There are only two test images for which the differences are larger than 0.5 dB. Figure 14 shows the differences for IFSIM. Mostly, the differences are very small (less than 0.005). There are only three cases where the differences exceed 0.01.
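The statistics discussed above (the probability of a correct decision and the metric loss caused by wrong decisions) can be computed as follows, given matrices of true and predicted metric values per image and candidate window size; the array layout is our assumption.

```python
import numpy as np

def decision_stats(true_metric, pred_metric):
    """true_metric, pred_metric: arrays of shape (n_images, n_sizes)
    holding the true and predicted metric (e.g. IPSNR) for each
    candidate window size. Returns the fraction of correct choices and
    the per-image loss of the predicted choice vs. the true optimum."""
    chosen = np.argmax(pred_metric, axis=1)   # size picked from predictions
    optimal = np.argmax(true_metric, axis=1)  # truly best size
    p_correct = np.mean(chosen == optimal)
    rows = np.arange(len(chosen))
    # Loss is zero for correct decisions, positive otherwise.
    loss = true_metric[rows, optimal] - true_metric[rows, chosen]
    return p_correct, loss
```

A histogram of `loss` then reproduces the kind of distribution shown in Figures 12-14.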
Let us present some examples of correct decisions (according to any considered metric). Figure 15 shows the true image (a); the speckled image (b); the optimal filter output for the 11 × 11 scanning window (c); and the filter output for the 5 × 5 scanning window (d), which is surely not the best choice. In addition, we give all true and all predicted metric values. In this example, all true and corresponding predicted values are close to each other. Meanwhile, all predicted values are slightly larger than the corresponding true values.
Note that for the examples given in Figures 2-4, the optimal and the recommended window sizes coincide.
Figure 15. The true image (a); the speckled image (b); the optimal filter output for the 11 × 11 scanning window (c); and the filter output for the 5 × 5 scanning window (d).
Figures 16 and 17 show two more examples. For the image in Figure 16, the 7 × 7 window is the best choice according to both the true and predicted values of all three metrics, although the 9 × 9 window produces good outcomes as well. For the image in Figure 17, the 5 × 5 window is the best choice according to all three metrics, both true and predicted. The use of the 9 × 9 window leads to an oversmoothed output. Note that IFSIM can be negative, indicating image quality degradation due to filtering.
Figure 16. The true image (a); the speckled image (b); the optimal filter output for the 7 × 7 scanning window (c); and the filter output for the 9 × 9 scanning window (d).
Figure 17. The true image (a); the speckled image (b); the optimal filter output for the 5 × 5 scanning window (c); and the filter output for the 9 × 9 scanning window (d).
Figure 18 shows an example of a wrong decision. According to the predicted IPSNR, one must use the 9 × 9 window. Meanwhile, according to the true IPSNR, the 7 × 7 window is the best choice (the predicted IPHVSM and IFSIM are in favor of the 7 × 7 window too). However, in this case, the use of the 9 × 9 window does not lead to a considerably negative effect.
Figure 18. The true image (a); the speckled image (b); the optimal filter output for the 7 × 7 scanning window (c); and the filter output for the 9 × 9 scanning window (d).
Figure 19 demonstrates one more interesting case. Obviously, the 5 × 5 window is the best choice. Meanwhile, for the 9 × 9 window, both the true and predicted IPSNR are close to zero while IPHVSM and IFSIM are negative. In this case, when filtering with the 9 × 9 window, the Lee filter is useless.
Figure 19. The true image (a); the speckled image (b); the optimal filter output for the 5 × 5 scanning window (c); and the filter output for the 9 × 9 scanning window (d).
Let us give three real-life examples. Figure 20 shows an example for a real-life Sentinel-1 image of size 512 × 512 pixels. Since we do not have the true image in this case, we can only demonstrate and analyze the results visually. The original (noisy) image was processed by the Lee filter with scanning window sizes of 5 × 5 (b); 7 × 7 (c); and 9 × 9 (d) pixels. The predicted metric values are given under the corresponding outputs. According to these, the 7 × 7 window size is the best and, in our opinion, this correlates with a visual analysis.
Another example is given in Figure 21. In this case, the image contains a lot of small-sized details. According to IPSNR, the despeckling produces a small improvement for the scanning windows of 5 × 5 and 7 × 7 pixels. However, according to the visual quality metrics, the despeckling leads to image degradation for all three considered window sizes and this, in our opinion, is in agreement with visual inspection. Perhaps the use of a 3 × 3 pixel window could be a compromise.
Finally, Figure 22 shows an image with large homogeneous regions. According to IPSNR and IFSIM, the 11 × 11 window is the best, but according to IPHVSM, the 7 × 7 pixel window is better. We prefer to agree with the latter variant.


Conclusions
The local statistic Lee filter is considered as a representative of the known despeckling methods used in SAR image processing. It was shown that the scanning window size substantially influences the quality of the output images; thus, its adaptive setting seems expedient. We show how this size can be determined for a given image using filter efficiency prediction realized by a neural network. Many aspects of the neural network design and training are considered. High prediction accuracy was demonstrated for three quality metrics. It was shown that correct decisions can be undertaken with high probability (exceeding 0.85). The cases of wrong decisions are studied as well; it is shown that, in most situations, their negative outcomes are negligible. Examples for simulated and real-life images are presented to explain the problem and give details concerning the proposed solutions (see data in the Supplementary Materials Section).
We empirically proved all three hypotheses stated in the Introduction. Due to the optimal setting of the filter window size, a considerable improvement in despeckling efficiency can be reached compared to the case of a fixed setting. The optimal window size can be chosen based on efficiency prediction performed by the trained NN. The proposed approach presumes that the training is carried out taking into account the statistical and spectral properties of speckle for the considered class of SAR images.
If so, the question of the universality of the proposed approach arises. The speckle statistical properties influence the input signal-to-noise ratios of the acquired images, which, in turn, impact filtering efficiency [27] and other performance characteristics of RS image processing [56]. We adapted our NN predictor to the statistical and spectral characteristics of the speckle in five-look Sentinel SAR images, and this is one reason why high prediction accuracy was achieved. Therefore, we can currently speak about the universality of our approach in the sense that the same preliminary study and training can be carried out for other types of SAR images, e.g., single-look TerraSAR-X images. Meanwhile, it may also be possible to find input parameters for a trained network that are largely insensitive to possible changes in the speckle statistics and spatial spectrum.