Statistical Properties of an Unassisted Image Quality Index for SAR Imagery

The M estimator is a recently proposed image-quality index used to evaluate the despeckling operation in SAR (Synthetic Aperture Radar) data. It is also used to rank despeckling filters and to improve their design. Unlike traditional image-quality estimators, it operates not on the filtered result but on a derived one, the ratio image. However, a thorough statistical analysis of its properties remains open and, with it, the possibility of using it as a test statistic. In this work, we focus on obtaining insights into its distribution, as well as on exploring other remarkable statistical properties of this unassisted estimator. The study is performed through EDA (Exploratory Data Analysis) and the well-known ANOVA (ANalysis Of VAriance). We test our results on a set of simulated SAR data and provide guidelines to enrich the M estimator and extend its capabilities.


Introduction
The benefits of SAR (Synthetic Aperture Radar) systems for monitoring planetary surfaces are well known [1]: high resolution, day-and-night imaging capability, and little dependence on weather conditions.
It is also well known that SAR data are corrupted by multiplicative noise (speckle), which can be modelled by a Gamma law [2,3]. Speckle contains valuable information (it cannot be regarded as pure noise), but it hinders proper interpretation of the data and complicates post-processing tasks such as image interpretation and classification.
Speckle reduction (despeckling) is an important issue in SAR, and a plethora of excellent despeckling filters based on local statistical and adaptive models have been proposed since the very beginning [4][5][6]. More recently, new approaches include improvements of existing filters [7,8], Nonlocal Means methods [9][10][11], the Curvelet Transform [12,13], the Wavelet Transform [14][15][16], Partial Differential Equations formulations (such as Total Variation and Diffusion-Reaction models) [17,18], and combinations of methods employing Machine Learning techniques [19][20][21]. There are also methods devoted to the multilook case, such as [22], which exploits the sparsity characteristics of speckle to build a non-parametric statistical model for speckle reduction. See an updated tutorial on speckle reduction in [23].
Regarding the staircase effect (commonly related to total variation, TV, based methods), a great effort to alleviate this artifact has been made, either through higher-order TV formulations involving complex boundary conditions [24,25], through splitting techniques for the regularizer [26], or through other techniques (in [27], a combination of splitting and penalty techniques is applied to solve the model).
Assessing the performance of a despeckling filter is a challenging problem, especially for state-of-the-art filters: the improvements are likely to be very small, even incremental. Thus, any strategy to confirm that a filter performs better than others should be both efficient and objective. Traditionally, such evaluation relied only on visual inspection of the denoised images [4].
The SAR community has largely adopted the protocol to evaluate despeckling filters proposed by Lee [28] in 1994. It consists of using simulated data: a phantom (truth) image corrupted by speckle noise following a particular distribution, usually the Gamma law. This simulated image is then filtered to obtain the despeckled result. The assessment of the filter operation is done both visually (inspection by an expert) and numerically, through a set of quality measures. Among these, the PSNR (Peak Signal-to-Noise Ratio), edge preservation (Pratt's figure of merit, FOM), and image similarity measures (SSIM: structural similarity index) between the original and filtered results are commonly applied. Other first-order statistical measures (mean preservation within homogeneous areas) and second-order measures (variance reduction within homogeneous areas) are also applied. After using simulated data, actual cases are evaluated (at least two, with different polarizations, numbers of looks, etc.) and, as no ground truth is available, the criteria used are mean preservation, variance reduction, and visual inspection by an expert. Lee's protocol is valid for both the one-look and the multilook cases.
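As a minimal illustration of the quantitative side of this protocol, the following sketch computes PSNR and the within-region mean/variance checks; the array names and toy values are hypothetical, not code from [28].

```python
import numpy as np

def psnr(reference, filtered, peak=255.0):
    """Peak Signal-to-Noise Ratio between a phantom and its despeckled estimate."""
    mse = np.mean((reference.astype(float) - filtered.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def region_stats(image, mask):
    """Mean preservation and variance reduction are checked inside homogeneous areas."""
    region = image[mask]
    return region.mean(), region.var()

# Toy example: a constant phantom and its 4-look speckled observation.
rng = np.random.default_rng(0)
phantom = np.full((64, 64), 100.0)
noisy = phantom * rng.gamma(shape=4.0, scale=1.0 / 4.0, size=phantom.shape)
print(psnr(phantom, noisy, peak=phantom.max()))
```

A real evaluation would compare these figures before and after filtering, within the same homogeneous regions.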
Most works in this field propose a new filter and then compare its performance with existing, closely related ones. This is usually done by using simulated data, applying the filters, comparing the results visually, and displaying tables or plots with the results of the quantitative assessment [29,30]. A complete review of SAR despeckling assessment can be found in Ref. [1].
Other simulation schemes, apart from the one proposed by Lee, are also used for the same purpose [11,16], for instance images obtained by means of physical models [17].
Moschetti et al. [31] proposed a MAP (Maximum a Posteriori) despeckling filter. The authors state that "[...] to quantitatively evaluate the results obtained using these new filters, with respect to classical ones, a Monte Carlo extension of Lee's protocol is proposed". This extension of the protocol shows that its original version leads to inconsistencies that hamper its use as a general procedure for filter assessment.
As a consequence, Monte Carlo simulation became a common practice for researchers to better compare filter performance [32][33][34], and also for other SAR/PolSAR (Polarimetric SAR) analyses [35,36].
Recently, a new referenceless image-quality index intended for the evaluation of despeckling filters was proposed [37]. This new metric, called M, differently from most image-quality estimators, obtains valuable information not from the filtered result but from a derived one: the ratio image.
Evaluating a filter by inspecting the ratio image is inspired by the inspection of residual images after denoising an image corrupted with additive Gaussian noise. The residual image, R, is obtained as R = Z − X̂, with Z denoting the noisy image and X̂ its filtered version. For an ideal filtering operation, R should contain independent identically distributed Gaussian samples. This kind of complementary evaluation was used in the seminal work by Buades et al. [38] introducing the nonlocal means filter. The use of such an approach for the case of multiplicative noise appears in [39] and, since then, many works include this analysis [15,37,40,41] to assess filter performance.
Assuming the multiplicative model for the speckle within SAR data, the observed image Z is the product of two independent fields: the backscatter X and the speckle Y. The result of any despeckling filter is X̂(Z), an estimator of the backscatter X, which depends only on the observed data Z. After the filtering, the ratio image is obtained as R = Z/X̂. If the filter is "perfect", R contains only speckle, that is, a collection of independent and identically distributed samples from Gamma variates. The new measure comprises two terms. First order: quantifies deviations from the hypothesis of observing identically distributed deviates from Gamma random variables with unitary mean and the same shape parameter. Second order: assesses departures from the independence hypothesis by quantifying the residual geometric content within the ratio image.
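The multiplicative model and the ideal ratio image are easy to simulate; the sketch below (unit-mean Gamma speckle with shape L, illustrative values only) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(42)
L = 4                                    # number of looks (Gamma shape parameter)
X = np.full((128, 128), 50.0)            # backscatter (here, a flat phantom)
Y = rng.gamma(shape=L, scale=1.0 / L, size=X.shape)  # unit-mean Gamma speckle
Z = X * Y                                # observed SAR intensity image

X_hat = X                                # a "perfect" filter recovers X exactly
R = Z / X_hat                            # ratio image: pure speckle under H0
print(R.mean(), R.var())                 # ideally close to 1 and to 1/L
```

Under the null hypothesis, R is exactly the speckle field Y, so its sample mean approaches 1 and its sample variance approaches 1/L.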
Following this idea, Vitale et al. [41] proposed a new second order component: RIS (Ratio Image Structuredness).
Albeit an expressive estimator, the distribution of M is not known and, therefore, it cannot be used as a statistical model [42] for prediction and testing [43,44]. This information would allow us to classify filters, compare the quality of procedures, synthesize information, and evaluate fluctuations of the ratio image. Given the difficulty of finding the exact distribution of M, we opt for a nonparametric approach [45].
The organization of this paper is as follows: Section 2 summarizes the M estimator introduced in [37]. In Section 3 we detail the Monte Carlo experiments on simulated data carried out to obtain the distribution of M, while the mathematical framework is described in Section 4. The results are shown and discussed in Section 5. Final comments and an outline of future research developments are given in Section 6.

The M Estimator
First, and for the sake of completeness, we recall the M estimator and its main properties; see [37] for details about this test statistic.
Figure 1 is a schematic representation of the steps leading to the computation of the M statistic. The required input is identified in the grayed box. The initial step consists of applying a filter to the original image Z, computing the filtered image X̂ and the ratio image R = Z/X̂. The test statistic M comprises two terms: one based on first-order statistics (mean and number-of-looks preservation), and one related to second-order statistics (independence, or lack thereof, of the ratio data).
Computing the first-order term requires identifying n ≥ 1 textureless areas within the ratio image. Detecting such areas after a reasonably good despeckling operation is challenging, since most of the data resemble pure noise (speckle). To overcome this, and to avoid any bias from the filtering operation, an automatic detection is performed on the original unfiltered data, stipulating a number of looks, a tolerance, and a window size.
An automatic search is performed after setting a minimum number of regions to detect, a mask size (e.g., 15 × 15) and a tolerance value (e.g., 5%) for the deviation of the estimated ENL (the number of looks of the original SAR data is known). If no area satisfies these conditions, they are relaxed by either reducing the size of the mask or increasing the tolerance on the ENL value. At least ten homogeneous areas were detected in all the experiments carried out in this work. This step is illustrated in the "Identification" box of the schema. Notice that these regions are only used to compute the first-order term of the M statistic.

We then calculate the first-order residual as the mean over the n regions identified in the previous step:

r_{ENL,μ} = (1/(2n)) ∑_{i=1}^{n} [r_ENL(i) + r_μ(i)],

where, for each textureless area i, r_ENL(i) is the absolute value of the relative residual due to deviations from the ideal ENL, and r_μ(i) is the absolute value of the relative residual due to deviations from the ideal mean (which is 1). An ideal despeckling operation would yield r_{ENL,μ} = 0.

The next step consists of measuring the residual geometric content within the ratio image. We do this with two quantities derived from a Haralick measure of texture, namely:
• h_o: the homogeneity of the original ratio image R, and
• h_g: the mean homogeneity calculated over 10 randomly shuffled versions of R.
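The first-order residual can be sketched as follows, assuming the usual ENL estimator ENL̂ = μ̂²/σ̂² and the residual definitions as we read them from [37]; the region masks and values are illustrative.

```python
import numpy as np

def first_order_residual(ratio, regions, enl_nominal):
    """First-order term of M: mean of the relative ENL and mean residuals
    over n textureless regions of the ratio image (a sketch following [37])."""
    r_enl, r_mu = [], []
    for mask in regions:
        samples = ratio[mask]
        mu_hat = samples.mean()
        enl_hat = mu_hat ** 2 / samples.var()   # standard ENL estimator
        r_enl.append(abs(enl_hat - enl_nominal) / enl_nominal)
        r_mu.append(abs(mu_hat - 1.0))          # the ideal ratio mean is 1
    n = len(regions)
    return (sum(r_enl) + sum(r_mu)) / (2.0 * n)

rng = np.random.default_rng(1)
R = rng.gamma(4.0, 1.0 / 4.0, size=(100, 100))  # ideal ratio image, ENL = 4
regions = [np.zeros((100, 100), bool) for _ in range(2)]
regions[0][:50], regions[1][50:] = True, True   # two toy textureless regions
print(first_order_residual(R, regions, enl_nominal=4))  # close to 0
```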
Then, we define δh as the absolute value of the relative variation of h_o with respect to h_g, in percentage, as a measure of the departure from the hypothesis that all observations are independent. Note that the larger this variation, the greater the amount of structure remaining in the ratio image. We discretized the observations to eight levels to compute the Haralick measure.
As defined, δh provides an objective measure for ranking despeckled results regarding solely the remaining geometrical content within the ratio image.This component increases in the presence of blurred edges, for instance.
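A sketch of this second-order term follows, with a simplified horizontal-offset co-occurrence matrix and eight quantization levels; the exact homogeneity definition and normalization follow our reading of [37] and may differ in detail.

```python
import numpy as np

def glcm_homogeneity(img, levels=8):
    """Haralick homogeneity of an image quantized to `levels` gray levels
    (horizontal co-occurrence only; a simplified sketch)."""
    edges = np.quantile(img, np.linspace(0, 1, levels + 1)[1:-1])
    q = np.digitize(img, edges)
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1.0)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return float((glcm / (1.0 + (i - j) ** 2)).sum())

def delta_h(ratio, shuffles=10, seed=0):
    """Relative variation (in %) between the homogeneity of the ratio image
    and the mean homogeneity of its randomly shuffled versions."""
    rng = np.random.default_rng(seed)
    h_o = glcm_homogeneity(ratio)
    h_g = np.mean([glcm_homogeneity(rng.permutation(ratio.ravel()).reshape(ratio.shape))
                   for _ in range(shuffles)])
    return 100.0 * abs(h_g - h_o) / h_g

rng = np.random.default_rng(2)
pure_speckle = rng.gamma(1.0, 1.0, size=(64, 64))
print(delta_h(pure_speckle))  # small: shuffling i.i.d. data changes little
```

For an image with residual structure (e.g., a smooth gradient), δh grows sharply, since shuffling destroys the spatial correlation that the original co-occurrence matrix captures.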
By combining the first- and second-order measures defined above, the proposed estimator quantifies both the remaining structure and the deviations from the marginal statistical properties of the ratio image. The perfect despeckling filter will produce M = 0, and the larger M is, the further the filter is from the ideal one.
Figure 2 shows observed data, the results of applying two common despeckling filters, the enhanced Lee filter (E-Lee) and the FANS (Fast Adaptive Nonlocal SAR) filter, to a 4-look SAR image (Flevoland, Netherlands), and the corresponding ratio images. The result of the FANS filter (third row) is visually superior to that of the E-Lee filter in terms of edge preservation and global appearance. This is also confirmed visually by inspecting the ratio images: much less geometrical content remains in the ratio image after applying FANS. These subjective results are numerically confirmed by the M estimator (see Table 1). The M value for H0 (the pure speckle data used for comparison) is also shown in this table. The M estimator is also useful for designing despeckling filters. Figure 3 shows two results of applying the E-Lee filter with different mask sizes (7 × 7 and 15 × 15). Notice that the latter yields a ratio image with more geometrical content than the solution obtained with the 7 × 7 mask.
The visual inspection of the ratio images suggests that any remaining structure in the ratio image related to geometric content in the observed image is caused by a non-ideal filter: edges and small features have been modified (not properly preserved). Regarding bright scatterers, which are of great relevance in despeckling SAR images, note that competitive despeckling filters do not significantly alter their signature (the E-Lee filter leaves them unfiltered altogether), so no geometrical content related to them remains in the ratio image. All this information is numerically compressed into the M estimator through the analysis of texture by the δh measure. Although expressive and useful, as seen in the examples above, not knowing the distribution of M limits its applicability; in particular, we would like to turn it into a test statistic. In the following sections we tackle this shortcoming.

The Distribution of M
In this section we perform experiments on simulated data to estimate properties of the statistical distribution of M. Figure 4 outlines the methodology we followed.
In order to achieve generality, we employed a parameter space consisting of a set of equivalent numbers of looks (ranging from ENL_min to ENL_max), tolerances (from Tol_min to Tol_max) and mask sizes (from Mask_min to Mask_max). Then, we applied the same bank of despeckling filters as in [37] to the generated samples. With this we obtain a statistically significant number of samples of M. Then, through the statistical methods described below, we infer about the distribution of M.
• Despeckling filters: As mentioned above, the despeckling filters used are the same employed in [37]. They are well known and of proven efficiency. It is interesting to remark that the four filters are adaptive, although conceptually different.
– E-Lee: The Enhanced Lee filter [7] is an improved version of the classical adaptive Lee filter [46]. Its underlying model takes into account an asymmetric distribution for the data.
– SRAD: The Speckle Reducing Anisotropic Diffusion filter [17] belongs to the category of PDE-based (Partial Differential Equations) filters. Unlike the rest of the filters used in this work, SRAD processes the whole image at once, that is, it is not based on convolution masks.
– PPB: The Probabilistic Patch Based filter [10] is a nonlocal filter. It performs well on images corrupted with both additive and multiplicative noise.
– FANS: The Fast Adaptive Nonlocal SAR filter [16] may be considered state-of-the-art among despeckling filters, since it provides outstanding results in most cases. FANS employs a set of wavelet transforms in a collaborative fashion with relatively low computational cost.
All filters were tuned to the designs recommended by their authors, with only slight modifications (mask size and threshold) for PPB and FANS that yielded improved mean and ENL preservation. This promotes a fair comparison among them.
The next section presents the statistical technique to make inferences about M.

Parametric Distribution Analysis
There is a growing tendency to assess filters in a quantitative, reproducible, and statistically sound manner [31]. This requires going well beyond subjective evaluation. In this sense, the statistical evaluation of quantifiers such as M may support decisions to manage SAR imagery in an efficient and effective manner.
Analysing M as a test statistic requires assessing its behavior in two situations: H0, the null hypothesis, and H1, the alternative. The former states that the observations in the ratio image R are a collection of independent and identically distributed deviates from a Gamma law with unitary mean and ENL as shape parameter. The latter, H1, is any deviation from H0. We are interested in the test size and power, related, respectively, to Type I and Type II errors, and in the distribution of M under H0.
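Test size and power can be estimated by Monte Carlo once critical values are available; the sketch below uses a deliberately simplified stand-in statistic (not M itself) to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)

def toy_statistic(ratio):
    """Stand-in for M: deviation of the ratio-image mean from 1 (illustrative only)."""
    return abs(ratio.mean() - 1.0)

# Critical value under H0: the ratio image is i.i.d. unit-mean Gamma speckle.
null = [toy_statistic(rng.gamma(4.0, 0.25, size=(64, 64))) for _ in range(500)]
crit = np.quantile(null, 0.95)

# Size: rejection rate under H0 (should be near the nominal 5%).
size = np.mean([toy_statistic(rng.gamma(4.0, 0.25, size=(64, 64))) > crit
                for _ in range(500)])
# Power: rejection rate under an alternative (a biased "filter" inflates the mean).
power = np.mean([toy_statistic(1.05 * rng.gamma(4.0, 0.25, size=(64, 64))) > crit
                 for _ in range(500)])
print(size, power)
```

The same loop, with M in place of the toy statistic, yields the empirical size and power studied in the experiments below.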
To this end, we perform a Monte Carlo (MC) study considering three situations:
• Experiment 1. Simulation of M under H0 to evaluate its distribution under the null hypothesis (baseline), i.e., without applying any filter.
• Experiment 2. Simulation of step images, filtering, and then assessing M.
• Experiment 3. Simulation of ramp images, filtering, and then assessing M.
We obtained 100 independent replications at each point of the parameter space. To determine a suitable number of replications, we performed an initial analysis at a few selected points of the parameter space with a much larger value.
To evaluate the experiments we use Analysis of Variance (ANOVA) [47,48]: a methodology to evaluate combinations of several variables, or factors, in order to identify those that have a significant effect on the estimates. The factors under assessment are ENL = {1, 4}, Mask = {15, 25}, Tol = {5, 10} and Filter = {None, FANS, E-Lee, PPB, SRAD}. We used robust linear regression [49] to prevent the effect of outliers. We also obtained kernel density estimates of M using a Gaussian kernel with a fixed bandwidth chosen by the "rule of thumb" [50].
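The kernel density estimates can be reproduced in a few lines; the sketch below uses Silverman's rule-of-thumb bandwidth and stand-in draws of M (the location and scale are hypothetical).

```python
import numpy as np

def gaussian_kde_rot(samples, grid):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth,
    the kind of fixed-bandwidth estimate used for the densities of M."""
    n = samples.size
    h = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)   # rule-of-thumb bandwidth
    u = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
m_values = rng.normal(loc=2.0, scale=0.3, size=100)   # stand-in MC draws of M
grid = np.linspace(0.0, 4.0, 200)
density = gaussian_kde_rot(m_values, grid)
print(grid[np.argmax(density)])   # the mode should fall near 2.0
```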
The robust models used in the analyses were all sequential and with fixed effects. For each, the variability accounted for was evaluated as adjusted R² × 100. The models' fits were compared via ANOVA and robustified F-tests (namely, F_R). We performed post-hoc pairwise comparisons [51] with Tukey's honestly significant difference test (Tukey's HSD test), and their p-values were adjusted via false discovery rate (p_FDR, see [52]).
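The false-discovery-rate adjustment of the pairwise p-values is the standard Benjamini-Hochberg procedure; a minimal implementation (not tied to the package used in [52]) is:

```python
import numpy as np

def fdr_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (the p_FDR used in the post-hoc
    pairwise comparisons)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)      # p_(k) * n / k
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

print(fdr_adjust([0.001, 0.01, 0.04, 0.50]))
```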

Experiment 1
This experiment provides the core result of this paper: an analysis of the distribution of M under the (null) hypothesis that the filter is perfect. Under this assumption, the filtered image is exactly the backscatter, X̂ = X, so the ratio image is pure speckle: R = Z/X̂ = XY/X = Y. In this manner, the ENL residual vanishes and the first-order component depends only on mean preservation, i.e., r_{ENL,μ} = (1/(2n)) ∑_{i=1}^{n} r_μ(i). The second-order component is computed at each iteration. Figure 6 shows the smoothed histograms of the data in all the situations here considered, along with their sample means (dashed vertical lines). Table 2 shows the sample mean, standard deviation, asymmetry and kurtosis of M under the null hypothesis for the eight situations here considered. We notice the following:
• mean and median are very close, if not identical; this suggests there are no one-sided outliers;
• Tolerance and Mask have less influence on the mean than ENL;
• although systematically positive, the skewness is small;
• kurtosis fluctuates around 3, the value for the Normal distribution, and remains relatively close to it.
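The summary statistics of Table 2 are straightforward to compute from the Monte Carlo draws; a sketch with stand-in values follows (the reported near-Normal behaviour motivates the illustrative Normal draws; the location and scale are hypothetical).

```python
import numpy as np

def sample_moments(x):
    """Mean, standard deviation, skewness and (non-excess) kurtosis,
    the quantities reported in Table 2."""
    m, s = x.mean(), x.std(ddof=0)
    z = (x - m) / s
    return m, s, (z ** 3).mean(), (z ** 4).mean()

rng = np.random.default_rng(5)
draws = rng.normal(1.5, 0.2, size=100)   # stand-in: 100 MC replications of M
mean, sd, skew, kurt = sample_moments(draws)
print(round(skew, 2), round(kurt, 2))    # skewness near 0, kurtosis near 3
```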
In spite of their similarity, the Kolmogorov-Smirnov test rejected the hypothesis that any pair belongs to the same distribution.
For this reason, Table 3 presents the sample critical values at the 5%, 1%, and 1‰ levels for each combination of ENL, Tolerance and Mask. These values allow testing the hypothesis that a certain filter is ideal at selected significance levels. As this is the baseline, the kernel density estimates [53,54] of Experiment 1 are shown along with the results of the other situations.
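Sample critical values such as those in Table 3 are simply empirical quantiles of the Monte Carlo draws; a sketch with stand-in values:

```python
import numpy as np

rng = np.random.default_rng(6)
# 100 MC replications of M under H0 for one (ENL, Tol, Mask) cell (stand-in values).
m_h0 = rng.normal(1.5, 0.2, size=100)

# Sample critical values at the 5%, 1% and 1 per-mil levels, as in Table 3.
q95, q99, q999 = np.quantile(m_h0, [0.950, 0.990, 0.999])
print(q95, q99, q999)

# A filter whose observed M exceeds q95 is rejected as "ideal" at the 5% level.
m_observed = 2.5
print(m_observed > q95)   # True: reject the ideal-filter hypothesis
```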
The normality of the distribution was also checked through the Shapiro-Wilk test [55,56] across parameters. In all cases, normality is not rejected at the usual significance levels (the usual thresholds for comparing p-values are 0.001, 0.01 and 0.05 [57]). Although normality would allow us to provide exact critical values, computed from the sample mean and standard deviation in Table 2, we opted for the sample critical values as a more conservative approach.
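The Shapiro-Wilk check can be reproduced per parameter combination; a sketch with stand-in draws:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m_h0 = rng.normal(1.5, 0.2, size=100)   # stand-in MC draws of M for one cell

# Shapiro-Wilk: small p-values reject normality; here p is compared against
# the usual thresholds (0.001, 0.01, 0.05).
w, p = stats.shapiro(m_h0)
print(round(w, 3), round(p, 3))
```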

Experiment 2
Figure 7 shows the kernel density estimates of M when the filters are applied to the step image. There are noticeable differences in the location of the distribution for each combination of parameters when compared to the previous experiment. Also, with all filters, M is more spread out than under H0.
Regarding mean values, E-Lee is the closest to H0, closely followed by SRAD and then by FANS. PPB produces the distribution most shifted to the right, i.e., the one furthest away from the ideal situation.

Experiment 3
Figure 8 shows the kernel density estimates of M when the filters are applied to the ramp image. We note substantial differences in the location of the distribution when compared with the baseline. Normality was confirmed in all cases via the Shapiro-Wilk test. The distribution of M with PPB is the worst, as it is the furthest away from M under H0. A comparison between E-Lee, SRAD and FANS is not conclusive with the single-look ramp but, when the noise is reduced to four looks, SRAD behaves best. This may be due to its global nature, in contrast with the local approach taken by the other filters.
There were significant effects on the probability distribution of M in Experiment 2. The mean of all filters is shifted when compared with the baseline. However, we note that, with the exception of PPB, all filters preserve the shape of the distribution; this suggests stability of the distributional family observed under H0 across the other parameters. Again, with the exception of Tolerance (Tol 10 vs. 5 = −0.004, p_FDR = 0.44), all pairwise comparisons were significant at the usual levels of significance. Assessing the ANOVA results (see Table 4) using M as a proxy variable and grouping the data by ENL, Mask and Filter, we obtain statistically significant differences, with the exception of Tolerance, which is not significant. The null hypothesis is that the location of M is the same when the actual effect is a change in dispersion. Clearly, when ENL = 1, the E-Lee, SRAD and PPB filters have greater dispersion than FANS. This suggests that the bias-variance trade-off achieved by the FANS filter is smaller, making it more sensitive to detect nuances in the step image, although it is still far from an ideal filter.
We note a paradoxical result in Tukey's HSD test: PPB and H0 are not significantly different under H1. This is caused by an interaction effect between ENL and Mask, and indicates that PPB is not robust enough to changes in these parameters.
Finally, in Experiment 3, the null hypothesis of equal means of M when the parameters are included in the modelling is rejected in all situations (see Table 4). In this experiment, Tolerance is apparently significant, indicating that the detection of small nuances is affected by the degree of homogeneity of the image, which was to be expected, i.e., the filters need more specificity to deal with departures from structure detection. However, the post-hoc analysis with Tukey's HSD test shows that Tolerance is not significant (Tol 10 vs. 5 = 0.002, p_FDR = 0.69). This contradictory result implies that the parameter must be adjusted more carefully. The pairwise comparisons for the Filter factor indicate that most differences are statistically significant:
• Filter E-Lee vs. FANS = 0.01, p_FDR = 0.63,
• Filter SRAD vs. PPB = −0.34, p_FDR < 2.3e−8,
• Filter E-Lee vs. PPB = −0.39, p_FDR < 2.3e−8,
• Filter E-Lee vs. SRAD = −0.05, p_FDR < 1.7e−7.
The Shapiro-Wilk test does not reject the null hypothesis of normality of M for any combination of parameters at the usual levels of significance.
Simulations and the computation of Haralick's textural features were performed using Matlab [58]. The data were subsequently analyzed using the R language [59]. Plots and statistical tools were produced with available R packages.
The code and data for reproducing the results reported here are available at https://github.com/Raydonal/Statistical-Behaviour-M-Index.

Conclusions
From this study we conclude that the Normal distribution is a suitable model for M, both for the ideal filter (the null hypothesis) and for the alternatives here considered. We observe that, in general, the mean can be assumed to be a linear function of ENL and Mask. All filters produced values of M with an approximately Normal distribution shape, but with mean values different from those obtained under H0 (Experiment 1).
In this way, we can conduct tests based on normality [60] to evaluate distributional differences between H0 and H1. For example, we can test μ_FANS − μ_H0 = 0 (the difference of the means of the residual structure is zero) with a t-test, where μ_FANS and μ_H0 denote the expected value of M using the FANS filter and under H0, respectively. In practice, we do not know the mean or variance values; however, we can obtain them by resampling [45] for the filters, and by using the simulated values produced as in Section 3 for H0.
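Such a test is direct once both samples are available; a sketch with stand-in draws (Welch's variant, which does not assume equal variances; the means and spreads are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Stand-in samples: M under H0 (from simulation) and M for a hypothetical filter
# (from resampling); values are illustrative only.
m_h0 = rng.normal(1.50, 0.20, size=100)
m_filter = rng.normal(1.90, 0.25, size=100)

# Welch's t-test for mu_filter - mu_H0 = 0 (normality supported by Experiment 1).
t, p = stats.ttest_ind(m_filter, m_h0, equal_var=False)
print(p < 0.05)   # True: the filter's mean residual structure differs from H0
```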

Figure 1 .
Figure 1. Schematic representation of the steps for computing the M statistic.

Figure 2 .
Figure 2. Examples of ratio images after filtering the observed data (top) with two despeckling filters. The ideal ratio image is also shown (top right). The image filtered by the enhanced Lee filter (7 × 7 mask size) is shown in the second row (left), with the corresponding ratio image in the same row (right). The image filtered by the FANS filter is shown in the third row (left), with the corresponding ratio image in the same row (right).

Figure 3 .
Figure 3. Examples of ratio images after filtering the observed data (top) with the same filter but with different designs. The ideal ratio image is also shown (top right). The image filtered by the enhanced Lee filter (7 × 7 mask size) is shown in the first row (left), with the corresponding ratio image in the same row (right). The image filtered by the enhanced Lee filter (15 × 15 mask size) is shown in the second row (left), with the corresponding ratio image in the same row (right).

Figure 4.
Figure 4. Monte Carlo experiments to infer about the distribution of M.
• Simulated SAR data: Figure 5 shows the two cases here considered, ramp and step, and an example of their speckled versions in the worst case: one look. The images are 150 × 150 pixels.
• Parameters:
– ENL: We chose two values for the number of looks: 1 (the worst case) and 4. From the authors' experience, these two cases are enough for a sound Monte Carlo analysis.
– Tolerance: Tolerance (Tol in Figure 4) refers to the relative deviation between the nominal and estimated values of ENL and the mean, as measured within the observed data in a textureless area. That is, an image region with number of looks L = 1 and mean value μ = 100 is considered textureless with a tolerance of 5% if the measured ENL ∈ [0.95, 1.05] and the measured μ ∈ [95, 105]. As above, two values for the tolerance have been set: 5% and 10%.
– Window Mask: The window mask (Mask in Figure 4) is the size of the window used to measure ENL and μ. Two values have been set: 15 × 15 and 25 × 25.

Figure 5 .
Figure 5. Simulated data used in the Monte Carlo experiments: ramp data (top) and step data (bottom). Bitmaps are shown in the left column and simulated samples in the right column.

Figure 6 .
Figure 6. Kernel density estimates of the statistic M when applied to pure speckle without structure. The vertical dashed lines represent the mean values. H0 is M without filtering.

Figure 7 .
Figure 7. Kernel density estimates of the statistic M when the filters are applied to the step image. The vertical dashed lines are the mean values. H0 denotes M under the null hypothesis.

Figure 8 .
Figure 8. Kernel density estimates of the statistic M when the filters are applied to the ramp image. The vertical dashed lines are the mean values. H0 denotes M under the null hypothesis.

Table 1 .
M estimator for the examples shown in Figures 2 and 3.

Table 3 .
Critical values of M.


Table 4 .
Results of the robust linear regression modelling of the data in the four experiments.