Identification of Statistically Homogeneous Pixels Based on One-Sample Test

Statistically homogeneous pixels (SHP) play a crucial role in synthetic aperture radar (SAR) analysis. In past studies, various two-sample tests were applied on multitemporal SAR data stacks under the assumption of having stationary backscattering properties over time. In this letter, we propose the Robust T-test (TR) to improve the effectiveness of test operation. The TR test reduces the impact of temporal variabilities and outliers, thus helping to identify SHP with assurances of similar temporal behaviors. This method includes three steps: (1) signal suppression; (2) outlier removal; and (3) one-sample test. In the experiments, we apply the TR test on both simulated and real data. Different stack sizes, types of distributions, and hypothesis tests are compared. Results of both experiments signify that the TR test outperforms conventional approaches and provides reliable SHP for SAR image analysis.

A generic approach for SHP identification is through hypothesis tests.Pixels showing no significant differences from each other will be incorporated into the same SHP family.In general, the hypothesis tests can be divided into two categories: nonparametric and parametric tests.Nonparametric tests perform acceptably over a wide range of distributions as they do not assume the distribution of the data.However, these tests only utilize part of the information provided by the sample [14], which makes them less effective than the parametric ones.To attain the same power (power is the probability of correctly rejecting the null hypothesis.), nonparametric tests usually require larger sample sizes than the parametric tests.In contrast, parametric tests are usually defined based on certain assumptions.They are expected to perform better when the assumptions hold [2].
Statistically speaking, the effectiveness of the hypothesis tests relies on how the assumptions are in line with the data.Nonparametric tests can provide relatively robust results when no appropriate assumptions can be made, while parametric tests can perform effectively when the data characteristics are in conformity with the tests.Indeed, SAR amplitude was commonly assumed to be Rayleigh distributed (several theoretical models have also been suggested, such as Nakagami, Gamma, K, and G 0 distributions [15][16][17][18].);however, empirically non-Rayleigh behaviors were often observed in past studies [19].To avoid incorrect presumptions of SAR data characteristics, nonparametric two-sample tests (e.g., Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests) were more extensively used than the parametric ones for SHP identification [2,9,10].As these tests rely on the stationarity of the backscattering properties, they may fail in the presence of temporal variations or outliers.
In theory, these hypothesis tests handle only stochastic processes.Any inclusions of meaningful signals (e.g., temporal changes) will be interpreted as the stochastic processes and potentially hamper the effectiveness of the hypothesis testing.Standard tests usually assume that the backscattering properties remain stationary over time.Although reducing the sample size may help the data characteristics meet this assumption, committing type I and II errors becomes an undesired trade-off [14].As there is no clear winner between these choices, we should develop an approach that improves the robustness of the test operation while keeping the large sample size.
In this letter, we propose the Robust T-test (TR) to reduce the impact of temporal variabilities and outliers.The TR test helps identify SHP with assurances of similar temporal behaviors, which improves the reliability of SHP in SAR analysis.

Methodology
In the following section, we introduce the procedure of the TR test based on the multitemporal configuration.As illustrated in Figure 1, this method is comprised of three steps: (1) signal suppression; (2) outlier removal; and (3) one-sample test.

Signal Suppression
Conventionally, two-sample tests were directly applied to pairs of time series to determine whether or not each pair was retrieved from the same distribution.In such a case, the test operation deals with signals and stochastic processes concurrently.The handling of non-random signals potentially impedes the effectiveness of the test procedure.To enhance the performance of the hypothesis testing, one should remove signals from the data before any test operations are applied.In this study, we propose a strategy to eliminate signals.
Assume N SAR images were acquired in the same geometry and have been well coregistered.The amplitude time series of a pixel P can be represented as where I(P) is a sample comprised of temporally ordered N amplitude values from 1st to Nth images.
To detail the theoretical basis of signal suppression, we first analyze the components of a temporal observation I t (P).We will ignore the additive systematic errors in the following discussions and consider the observed amplitude as the product of speckles and scene signals [20] I t (P) = I n t (P)I S t (P), where the superscripts n and S indicate the speckle noise and the scene signal, respectively.
Assume two samples, denoted as I(P i ) and I(P j ), need to be tested.Instead of applying a two-sample test on I(P i ) and I(P j ) directly, we take the logarithm on each of their temporal observations to transform the multiplicative noise into an additive one [21] ψ t (P) = ln (I t (P)) = ln (I n t (P)) + ln (I S t (P)), and then subtract ψ t (P j ) from ψ t (P i ) for each of the temporal observations, acquiring a new sample where ψ(P ij ) is a N-element time series (vector) that represents the difference between ψ(P j ) and ψ(P i ).
If the original two samples I(P i ) and I(P j ) are statistically homogeneous, most signals in the observations will be canceled after Equation ( 4) is applied (so as the original distribution in the data).In other words, only random errors should remain in ψ(P ij ).Since random errors are usually assumed Gaussian distributed with the mean equal to zero, the result leads to (5)

Outlier Removal
Considering that abnormal values will influence the test operation, an outlier removal process is applied to improve the effectiveness of the hypothesis testing.In past studies, a standard approach for outlier removal was through boxplot [22], which was effective when the data were normally distributed.Since the amplitudes are usually considered as skewed distributions [19], ψ(P ij ) might tend to be skewed (especially for small sample size).We therefore apply adjusted boxplot [23] to effectively remove abnormal values in different situations.
The adjusted boxplot utilizes medcouple (MC) [24] as a robust measure to the skewness of a distribution.Consider ψ(P ij ) as a sorted sample of ψ(P ij ), where ψ1 (P ij ) ≤ ψ2 (P ij ) ≤ ... ≤ ψN (P ij ).We first define a kernel function h for all ψa (P ij ) = ψb (P ij ) where Q 2 is the median of ψ(P ij ).For the special case that ψa (P ij ) = ψb (P ij ) = Q 2 , the kernel h is defined in a different way (see details in [23]).In real SAR applications, the special case may not play a role due to the presence of measuring errors.The MC of ψ(P ij ) is defined as the median of all h( ψa (P ij ), ψb (P ij ) values for which ψa (P ij ) ≤ Q 2 ≤ ψb (P ij ).Consequently, observations not within the interval [L, U] will be considered as outliers where Q 1 and Q 3 are the first and third quartiles of ψ(P ij ).
The adjusted boxplot does not require any assumptions on the distributional type.Therefore, it is adaptive to various distributions.The effectiveness of adjusted boxplot on amplitude data can be referred to [9].In this study, we apply this procedure on ψ(P ij ) to remove abnormal values prior to the hypothesis testing.We denote the sample after the outlier removal process as ψ( Pij ).

One-Sample Test
After the signal suppression and outlier removal processes have been performed, a one-sample t-test is applied on ψ( Pij ).One-sample t-test compares the sample mean to a known population mean without knowing the variability/dispersion of the population.It has been considered as a robust procedure among the parametric tests.Slight departures from normality may only result in minor deviations from the ideal.Therefore, the p-value quoted may be slightly in error if there is a moderate violation of the normality assumption [14].Since random errors are considered normally distributed, the t-test becomes a suitable measure to test if ψ( Pij ) mainly contains random errors where μ is the population mean of ψ( Pij ), and the corresponding t-statistic can be computed through where µ, s and n are the sample mean, standard deviation, and sample size of ψ( Pij ).
As a consequence, two pixels P i and P j will be considered as SHP only if ψ( Pij ) contains no meaningful signals.Handling of only the stochastic processes potentially improves the effectiveness of the hypothesis testing.Under the proposed procedure, SHP of a generic pixel P (denoted as Ω(P)) can be identified regardless of meaningful signals or outliers.

Experiments and Discussion
As previously mentioned, we apply the TR test on both simulated and real data.Assuming the existence of abnormal values in the data, we compare the TR test with other standard tests based on different hypothesis tests and stack sizes.Decoupling of outlier rejection from TR-test is not considered in this study.

Monte Carlo Simulation
Monte Carlo simulation is used for validating the proposed procedure.We simulate two samples based on inverse transform sampling under the desired distribution.A unit mean gamma distribution [16] is applied to the samples to serve as the speckle effect.We consider the two samples as the temporal observations of two pixels.For each of the following four scenarios, different stack sizes, distribution types, and hypothesis tests are analyzed: The two samples possess outliers without temporal changes.The distributional parameters are the same as Table 1; however, both samples include 5% outliers.The magnitudes of outliers are set to be µ + 5σ, where µ and σ represent the mean and standard deviation of the sample.(iv) Temporal change: Yes; Outlier: Yes A temporal change is designed in the first sample.The distributional parameters are the same as Table 2; however, both samples include 5% outliers.The magnitudes of outliers are set to be µ + 5σ.

Rayleigh Gamma Nakagami Lognormal Inverse Gaussian Exponential
Sample 1 In this experiment, we compare the proposed method with three types of nonparametric tests, including the KS, AD, and Cramèr-von Mises (CM) tests.Also, two parametric tests, including Bhattacharyya distance (under Gaussian assumption) and generalized likelihood test (GLRT) (under Rayleigh assumption) are compared.The significance level is set to be 1% for all tests, and the number of simulations is 10,000 for all cases.As the distributional parameters are known, the power can be computed accordingly, as shown in Figures 2-5.
In case (i), where no temporal changes or outliers are present, the TR test provides higher power than all other tests over different distributions.Although GLRT performs well under the Rayleigh distribution among the parametric/nonparametric tests, it is less effective than the proposed TR test.The results signify the importance of the signal suppression (even if the signals are simply related to the primitive backscattering energy rather than the changes/outliers), as it helps the test deal only with the stochastic processes.
In case (ii), where outliers exist in both samples, the TR test still performs well thanks to the adjusted boxplot.Conversely, other tests show less effectively than they had in case (i) due to the absence of outlier removal procedure.The results demonstrate the importance of removing abnormal values prior to the test operation.
In case (iii), where we have temporal changes in the first sample, the TR test shows discriminative power compared to the other tests.The signal suppression becomes pivotal especially in scenes with temporal changes.These changes contain meaningful signals that can seriously bias the hypothesis testing.As the TR test applies the signal suppression before the hypothesis testing, it removes the signals from both backscattering energy and temporal changes, thus augmenting its test power.
In case (iv), where samples include both temporal changes and outliers, the TR test provides reliable results.As aforementioned, the TR test suppresses signals and outliers in the measurements and deals with the signal randomness.These procedures make the SHP identification not only adaptive to the temporal changes, but also resistant to the abnormal values in the observations.Overall, the TR test reveals discriminative power in different distributions when compared with other tests.In the parametric tests, GLRT works well under Rayleigh scenario in case (i), which is as expected.Bhattacharyya criterion fails in most cases because the normality does not exist in all of the simulated scenarios.In the nonparametric tests, the AD and CM tests provide similar power in most cases.Both conventional parametric and nonparametric tests fail to provide reliable results compared to the TR test because they are not capable of taking meaningful signals into account.

Experiments with SAR Data Stack
To detail the effectiveness of the proposed TR test in real applications, we select a harbor area in Hong Kong as our test site.The SAR Data stack was acquired at 24 • 19 49.24 N, 114 • 8 0.16 E with approximately 2.5 km × 2.5 km spatial coverage (image size: 1500 × 1250 pixels), as shown in Figure 6.This data stack consists of seventy-five TerraSAR-X images acquired in the ascending orbit with the VV (Vertical-transmit-Vertical-receive) polarization.We select Area 1 and Area 2 for detailed comparisons as these areas involved several human activities (e.g., transport of containers) in the entire time span.Since these temporal changes may alter the backscattering signals received by the antenna, building up a single model for the parametric tests is not a simple task.Therefore, we compare the TR test with the AD test to show fundamental differences between the proposed and the conventional approaches.As previously mentioned, SHP can be utilized in various SAR applications.In this experiment, we use amplitude denoising as an example to verify the proposed method.Figure 7 shows the number of SHP identified at each pixel using different tests and stack sizes.Figure 8 shows the histograms corresponding to Figure 7. Results in Figure 7a-c   Figure 8a signifies that the AD test has a lower rejection rate and a higher number of SHP on the right tail of the histogram.A total of 37,497 pixels (nearly 2%) identifies the SHP with a full window size (window size is 15 × 15), i.e., 225 pixels.Conversely, Figure 8d shows that the TR test has a higher rejection rate and a lower number of SHP on the right tail of the histogram.None of the pixels identify the SHP with a full window size.Although the power cannot be directly obtained from this experiment, we may surmise that the higher rejection rate in the TR test correlates to better delineations between different ground features.On the other hand, it is clear that the stack size has significant influences on the shape of the histograms in the AD test (as demonstrated in Figure 8a-c).The results can be explained by the fact that the stack size affects the buildup of CDF and can potentially improve the power in some areas, which leads to alterations of SHP number.Note, however, that the shape of the histograms is little influenced by the stack size in the TR test (as shown in Figure 8d-f.The subtle impact on the histograms can possibly relate to the robustness of the proposed approach since the TR test does not rely on the information built up from the CDF. Figure 9 shows the denoised amplitude (amplitude values of a single SAR acquisition) and reflectivity (temporally averaged amplitude values of the entire SAR data stack) maps of Area 1.The denoised amplitude and reflectivity maps are acquired by averaging all amplitudes in SHP groups (i.e., Ω(P)) for given amplitude map and reflectivity map, respectively.Figure 9d shows that the AD test (10 images) identifies a limited number of SHP in the container areas, which leads to underfiltered maps as shown in Figure 9h,l.The delineations of the container boundaries are blurred due to the low rejection rate.Indeed, building up a meaningful CDF requires enough observations.With only a limited number of temporal observations, the AD test can easily fail to correctly reject the null hypothesis.As a result, the unchanged areas are overfiltered and the changed areas are underfiltered.The denoised results are not considered ideal in this case.Moreover, Figure 9e indicates that the AD test (75 images) identifies much less SHP in the changed areas than the previous case (i.e., AD test with 10 images).The boundary delineations become clearer at the cost of the high rejection rate.Few pixels are denoised in the changed areas as illustrated in Figure 9i.Contrarily, the proposed TR test has much better delineations between different ground features than the AD test, especially for small data stacks (see Figure 9f).Figure 9j,n also demonstrate that the amplitude and reflectivity maps are better denoised compared with Figure 9h,l.These results show that the statistics of the TR test can provide meaningful SHP groups even with small stack sizes.Increasing the stack sizes does not hamper the SHP identification either.The contrast can thus be enhanced while leaving limited pixels unfiltered, as shown in Figure 9j,k Figure 10 shows the denoised amplitude and reflectivity maps of Area 2. From Figure 10d, one can observe that the AD test (10 images) identifies many uncorrelated pixels as SHP, and the boundaries are not ideal due to the small correct rejection rate limited by the stack size.A significant amount of overfiltered pixels can be found in Figure 10h,l.Furthermore, a high rejection rate occurs in the changed areas as the stack size increases (see Figure 10e).Similar to Area 1, the changed areas are mostly underfiltered in the AD test (see Figure 10i), whereas distinct ground features are well delineated in the proposed TR test (see Figure 10f).Compared with the results in Figure 10h,l, the results in Figure 10j,n are better denoised.One can also observe better denoised results in Figure 10k than in Figure 10j as the stack sizes increase.

Conclusions
In this letter, we present a new testing method for SHP identification.The method considers the impacts of signals and outliers in SHP identification.By suppressing the meaningful signals and removing the outliers, the approach makes the test operation more effective than the conventional ones.Consequently, the test operation outperforms the conventional methods.According to the experimental results, we draw the following conclusions:

•
Removing signals and outliers before conducting any hypothesis tests is advantageous.By reducing the impacts of these measurements, the hypothesis testing can deal only with the stochastic processes.This not only makes the parametric tests applicable, but also augments the power of the test operation.

•
Considering temporal variabilities is pragmatically necessary, especially when dealing with data stacks crossing through a large temporal spacing.The proposed approach helps to identify SHP family even with temporal variations.

•
Since having large sample sizes can lower the probability of conducting type I and II errors, the proposed approach becomes useful in mitigating the impact of temporal variabilities while keeping the large temporal spacing.

•
The difference of two time series (i.e., ψ(P ij )) keeps the temporal sequence that would be lost when building sample CDF in tests like KS or AD.This also helps to improve the effectiveness of hypothesis tests.
For future works, the proposed procedure can be directly applied to different applications, such as adaptive filtering, unbiased coherence estimation, image segmentation, etc.
(i) Temporal change: No; Outlier: No The two samples possess neither temporal changes nor outliers.Three types of distributions are analyzed.The distributional parameters are shown in Table 1.(ii) Temporal change: No; Outlier: Yes

(
iii) Temporal change: Yes; Outlier: No A temporal change is designed in the first sample.The time of change is always at N/2 for different sample sizes, meaning that observations before and after N/2 follow different distributional parameters.The corresponding parameters are shown in Table 2.

Figure 2 .
Figure 2. The power of hypothesis tests in case (i).

Figure 3 .
Figure 3.The power of hypothesis tests in case (ii).

Figure 4 .
Figure 4.The power of hypothesis tests in case (iii).

Figure 5 .
Figure 5.The power of hypothesis tests in case (iv).

Figure 6 .
Figure 6.The area of interest (white box) in Hong Kong retrieved from World Imagery (Esri, Redlands, CA, USA).Two areas were selected for detailed comparisons.
Figure7shows the number of SHP identified at each pixel using different tests and stack sizes.Figure8shows the histograms corresponding to Figure7.Results in Figure7a-cindicate that the cumulative distribution function (CDF) becomes more meaningful as the stack size increases.The boundary delineation of different ground features becomes more significant as well.However, the performances of the two tests are significantly different.With only 10 images (Figure 7a,d), the AD test has limited capabilities to distinguish different ground features, whereas the TR test has good delineations of different backscattering properties.

Figure 7 .
Figure 7. Number of statistically homogeneous pixels (SHP) identified at each pixel based on different tests and stack sizes.The significance level is 5%.(a-c) represent the SHP number of the Anderson-Darling (AD) test using 10, 30, 75 images, respectively; (d-f) correspond to the SHP number of Robust-T (TR) test using 10, 30, 75 images, respectively.

Figure 8 .
Figure 8. Histograms of the SHP number.Horizontal axis is the number of SHP based on a 15 × 15 window.Vertical axis is the pixel counts at the corresponding SHP number; (a-f) correspond to Figure 7a-f, respectively.

Figure 9 .
Figure 9. Amplitude denoising of Area 1 based on different approaches and stack sizes.(a) an optical image retrieved from Google Earth; (b) an unfiltered amplitude map acquired on the 25 October 2008; (c) a reflectivity map generated by the original data stack (75 images);(d-g) SHP number correspond to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively; (h-k) denoised amplitude maps correspond to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively; and (l-o) denoised reflectivity maps correspond to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively.

Figure 10 .
Figure 10.Amplitude denoising of Area 2 based on different approaches and stack sizes.(a) an optical image retrieved from Google Earth; (b) an unfiltered amplitude map acquired on 25 October 2008; (c) a reflectivity map generated by the original data stack (75 images);(d-g) SHP number corresponds to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively; (h-k) denoised amplitude maps correspond to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively; and (l-o) denoised reflectivity maps correspond to AD 10imgs , AD 75imgs , TR 10imgs , and TR 75imgs , respectively.

Table 2 .
Distributional parameters for case (iii) and (iv).λ, k, θ, Ω, m, u, and σ represent the scale, shape, scale, spread, shape parameters, log mean, and log standard deviation, respectively.Subscripts b and a indicate the parameters before and after the temporal change, respectively.