This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/)

Statistical modeling is essential to SAR (Synthetic Aperture Radar) image interpretation. It aims to describe SAR images through statistical methods and to reveal the characteristics of these images. Moreover, statistical modeling provides technical support for a comprehensive understanding of terrain scattering mechanisms, which helps in developing algorithms for effective image interpretation and credible image simulation. Numerous statistical models have been developed to describe SAR image data, and the purpose of this paper is to categorize and evaluate these models. We first summarize the development history and the current state of research in statistical modeling; then the different SAR image models developed from the product model are discussed in detail. Relevant issues are also discussed. Finally, several promising directions for future research are outlined.

Statistical modeling of SAR images is one of the basic problems of SAR image interpretation. It involves several fields such as pattern recognition, image processing, signal analysis, probability theory, and electromagnetic scattering characteristics analysis of targets

Statistical modeling is of great value in SAR image applications. First, it leads to an in-depth understanding of terrain scattering mechanisms. Second, it can guide research on speckle suppression [

The research on statistical modeling of SAR images may be traced back to the 1970s. With the acquisition of the first SAR image in the U.S., the analysis of real SAR data directly promoted the development of statistical modeling techniques. The speckle model of SAR images, proposed by Arsenault [

Since the 1990s, with the emergence of a series of airborne and spaceborne SAR platforms, acquiring SAR data has no longer been a problem. Because of the urgent demand for analyzing and interpreting the acquired image data, statistical modeling has drawn much attention.

In recent years, many famous research organizations have been studying SAR statistical modeling [

According to the modeling process, the statistical models of SAR images can be divided into two categories [

Since nonparametric modeling requires complex computation and large amounts of data, it is usually time-consuming and cannot satisfy the requirements of various applications [

Several strategies have been proposed in the literature to deal with parameter estimation [
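As one hedged illustration of parameter estimation, the sketch below applies the method of moments (MoM) to multi-look intensity samples, assuming they follow a Gamma distribution whose shape parameter is the look number. The function name `estimate_gamma_mom` and the simulated values are illustrative assumptions, not taken from the surveyed literature.

```python
import random
import statistics

def estimate_gamma_mom(samples):
    """Method-of-moments estimates (mean, shape) for Gamma-distributed intensity.

    For a Gamma distribution with mean mu and shape L, the variance is mu**2 / L,
    so matching the first two sample moments gives L = mean**2 / variance.
    """
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean, mean * mean / var

# Simulated 4-look intensity over a homogeneous region (assumed values).
rng = random.Random(0)
true_mean, true_looks = 2.0, 4.0
data = [rng.gammavariate(true_looks, true_mean / true_looks) for _ in range(50_000)]
mean_hat, looks_hat = estimate_gamma_mom(data)
```

With enough samples, `mean_hat` and `looks_hat` should land close to the simulated mean and look number. MoM is attractive because it needs only sample moments, although other estimators may be preferable for heavy-tailed models.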

A number of methods for quantitatively assessing the validity of statistical models in light of sample data have been developed over the last hundred years. Many of these methods place the problem in a statistical hypothesis testing framework, pitting a null hypothesis H_{0}, an assertion that the data were generated according to the model, against an alternative hypothesis H_{1}, an assertion that they were not. The methods are then implemented by computing a statistic of the random observations that has a known distribution if H_{0} is true, from which a significance probability (p-value) follows. Values of this probability close to zero are interpreted as evidence that H_{0} should be rejected in favor of H_{1}. The purpose of these methods is to seek the model that best describes the observed data from a set of specified models, irrespective of whether any model is actually a good fit to the data [

In summary, the major rules for assessing the fitting accuracy include the χ^{2} matching test [
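To make the χ^{2} matching idea concrete, the following minimal sketch computes a Pearson χ^{2} statistic between simulated single-look intensity samples and two candidate exponential models. The helper names, the ten equal-probability bins and the simulated data are assumptions chosen for illustration; a smaller statistic indicates a closer match between the histogram and the model.

```python
import math
import random

def chi_square_statistic(samples, cdf, edges):
    """Pearson chi-square statistic of samples against a model CDF.

    `edges` are increasing interior bin boundaries; the first bin starts at
    -infinity and the last bin ends at +infinity.
    """
    n = len(samples)
    bounds = [-math.inf] + list(edges) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in samples if lo < x <= hi)
        p_lo = 0.0 if lo == -math.inf else cdf(lo)
        p_hi = 1.0 if hi == math.inf else cdf(hi)
        expected = n * (p_hi - p_lo)
        stat += (observed - expected) ** 2 / expected
    return stat

def exponential_cdf(mean):
    """CDF of an exponential distribution with the given mean."""
    return lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0

rng = random.Random(1)
intensity = [rng.expovariate(1.0) for _ in range(5000)]  # unit-mean exponential
# Ten equal-probability bins under the unit-mean exponential model.
edges = [-math.log(1.0 - k / 10) for k in range(1, 10)]
stat_good = chi_square_statistic(intensity, exponential_cdf(1.0), edges)
stat_bad = chi_square_statistic(intensity, exponential_cdf(1.5), edges)
```

Here `stat_good` (correct model) should be far smaller than `stat_bad` (mean deliberately mis-specified), which is the ordering a model-ranking procedure relies on.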

The purpose of statistical modeling of SAR images is to determine a statistical model for single-polarimetric images or multi-polarimetric images. The multi-polarimetric SAR images are a combination of four basic kinds of polarimetric images represented by the scattering matrix. For any one of the polarimetric images, its statistical characteristics are no different from those of a single-polarimetric image. The single-polarimetric statistical model can be extended to describe the multi-polarimetric images [

SAR statistical models have been studied for more than 30 years. Researchers have proposed various statistical models, among which the family of models based on the product model outperforms the others [

Nonparametric models estimate the probability density function (PDF) of SAR image data directly from the data, without assuming a particular distribution. The basic idea is to use a weighted sum of kernel functions to estimate the statistical distribution. Typical methods include: the Parzen window technique [
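The kernel-sum idea can be sketched with a Parzen window estimator. The example below is only an illustration: it assumes a Gaussian kernel with equal weights, a hand-picked bandwidth of 0.1, and simulated Rayleigh-like amplitude samples, none of which come from the surveyed papers.

```python
import math
import random

def parzen_pdf(samples, bandwidth):
    """Return a Gaussian-kernel Parzen window estimate of the PDF."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        # Equal-weight sum of Gaussian kernels centred on the samples.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf

rng = random.Random(2)
amplitude = [math.sqrt(rng.expovariate(1.0)) for _ in range(2000)]  # Rayleigh-like
pdf_hat = parzen_pdf(amplitude, bandwidth=0.1)

# Riemann-sum check that the estimate integrates to about 1 over [-1, 4).
step = 0.01
area = sum(pdf_hat(-1.0 + i * step) for i in range(500)) * step
```

Every evaluation of `pdf_hat` touches all samples, which illustrates why nonparametric estimation becomes expensive on large images.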

The underlying idea of parametric modeling is to use the parameter estimation method to determine the statistical model of SAR image data according to some known distributions. During the past 20 years, the parametric model has been widely and thoroughly studied. With the analysis of data from different sensors and the scattering mechanism of different kinds of terrain, many concrete SAR statistical distributions for different cases have been proposed.

The parametric models can be classified into four categories according to their main ideas (see

The product model is widely used in SAR image analysis, processing and modeling. Most of the widely used statistical models are developed from the product model, which is derived in turn from the speckle model. The process of developing concrete statistical models from the speckle model is shown in

The speckle model, proposed by Arsenault [

Each resolution cell contains sufficient scatterers;

The echoes of these scatterers are independently identically distributed;

The amplitude and phase of the echo of each scatterer are statistically independent random variables;

The phase of the echo of each scatterer is uniformly distributed in [0,2π];

Inside a resolution cell, there are no dominant scatterers;

The size of a resolution cell is large enough, compared with the size of a scatterer.

Secondly, with the six hypotheses mentioned above and the central limit theorem (CLT) [
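The effect of these hypotheses can be checked numerically. The sketch below sums many independent echoes with uniformly distributed phases in each resolution cell and measures the contrast (standard deviation over mean) of the resulting intensity, which is close to 1 for fully developed speckle. The amplitude law `uniform(0.5, 1.5)`, the 50 scatterers per cell and the function name `simulate_cell` are assumptions made purely for illustration.

```python
import cmath
import math
import random

def simulate_cell(rng, n_scatterers):
    """Coherent sum of scatterer echoes in one resolution cell."""
    total = 0j
    for _ in range(n_scatterers):
        amp = rng.uniform(0.5, 1.5)              # assumed amplitude law, independent of phase
        phase = rng.uniform(0.0, 2.0 * math.pi)  # uniformly distributed phase
        total += amp * cmath.exp(1j * phase)
    # Normalise so the mean intensity does not grow with the scatterer count.
    return total / math.sqrt(n_scatterers)

rng = random.Random(3)
cells = [simulate_cell(rng, 50) for _ in range(20_000)]
intensity = [abs(z) ** 2 for z in cells]
mean_i = sum(intensity) / len(intensity)
var_i = sum((x - mean_i) ** 2 for x in intensity) / len(intensity)
contrast = math.sqrt(var_i) / mean_i  # close to 1 for fully developed speckle
```

A contrast near 1 is what an exponentially distributed intensity (Rayleigh amplitude) predicts; lowering the scatterer count or adding one dominant scatterer breaks the hypotheses and moves the contrast away from 1.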

Motivated by the speckle model, Ward [

Since the speckle component has a determinate statistical distribution, only the RCS fluctuation component needs to be considered when developing the statistical models of SAR images (see
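A minimal simulation of the product model may help here: the observed intensity is drawn as the product of an RCS component and an independent unit-mean speckle component. The sketch assumes Gamma-distributed multi-look speckle and, as one common choice, a Gamma-distributed RCS (the combination that leads to K-distributed intensity). The function name and all numeric values are hypothetical.

```python
import random

def simulate_product_model(rng, n, looks, rcs_mean, rcs_shape):
    """Draw n intensities as (Gamma-distributed RCS) x (unit-mean Gamma speckle)."""
    out = []
    for _ in range(n):
        rcs = rng.gammavariate(rcs_shape, rcs_mean / rcs_shape)  # RCS fluctuation
        speckle = rng.gammavariate(looks, 1.0 / looks)           # unit-mean speckle
        out.append(rcs * speckle)
    return out

rng = random.Random(4)
looks, rcs_mean, rcs_shape = 4.0, 1.0, 3.0
intensity = simulate_product_model(rng, 100_000, looks, rcs_mean, rcs_shape)
mean_i = sum(intensity) / len(intensity)
var_i = sum((x - mean_i) ** 2 for x in intensity) / len(intensity)
norm_var = var_i / mean_i ** 2
# For independent factors: E[I^2] / E[I]^2 = (1 + 1/looks) * (1 + 1/rcs_shape).
expected_norm_var = (1 + 1 / looks) * (1 + 1 / rcs_shape) - 1
```

Because the speckle term is fixed by the look number, swapping the RCS distribution is the only change needed to move between members of the product-model family in this kind of simulation.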

The RCS of a homogeneous region (e.g., a grassland region) in either low-resolution or high-resolution SAR images can be expected to be constant. In practice, most scenes contain inhomogeneous regions with RCS fluctuations [

Motivated by the derivation of

The

To solve these problems, Frery deduced a new statistical model, the G^{0} (also called

The former is appropriate for heterogeneous regions and the latter for extremely heterogeneous regions. The G^{0} distribution can be converted into the … distribution, which is a specific example of the G^{0} distribution. The parameters of the G^{0} distribution are sensitive to the degree of homogeneity of a region, which makes the G^{0} model appropriate for modeling either heterogeneous or extremely heterogeneous regions. Moreover, MoM can be easily and successfully applied to parameter estimation of the G^{0} distribution. Frery [^{0} distribution.

A further particular case of the G distribution, the G^{h}

Eltoft [

The above models developed from the product model are all derived under the hypothesis that the speckle component satisfies the central limit theorem. Theoretically, when the resolution becomes high enough, the resolution cell will be so small that the central limit theorem cannot be applied any more. Thus, the above models are not appropriate for modeling of the high-resolution SAR images. Accordingly, Anastassopoulos [

Another thread of statistical modeling is to develop the models based on the generalized central limit theorem [

In order to consider further the statistical modeling problem of narrowband SAR images, Kuruoglu [

The empirical distributions lack a rigorous theoretical derivation; they come from experience in analyzing real data. Several empirical models have been used to characterize the statistics of SAR amplitude or intensity data, such as the Weibull, log-normal, and Fisher PDFs.

The log-normal distribution was proposed by George [

The Weibull distribution [

Recently, the Fisher distribution has also been adopted as an empirical model for the SAR statistics over high resolution urban regions [^{0} PDF [

Goodman [

Blake [

Blacknell [

Besides, some other models, which are mostly the generalization or modification of the models mentioned above, have been proposed in the literature [

The statistical model of a single-look image is a special example of the corresponding multi-look model when the look number _{I}_{A}

Hence, the statistical distribution of single-look data can be deduced from that of multi-look data; and the distribution of the amplitude can be deduced from that of the intensity. Additionally, the log-transformed distributions are also deduced easily according to [
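These deductions follow from a standard change of variables: for amplitude A = sqrt(I) the PDF is p_A(a) = 2a p_I(a^2), and for log-intensity Z = ln(I) it is p_Z(z) = e^z p_I(e^z). The sketch below applies both to a Gamma intensity PDF, assuming the single-look case corresponds to a look number of 1. The function names are illustrative only.

```python
import math

def gamma_intensity_pdf(x, looks, mean):
    """Gamma PDF of multi-look intensity with the given mean and look number."""
    if x <= 0:
        return 0.0
    rate = looks / mean
    log_pdf = (looks * math.log(rate) + (looks - 1) * math.log(x)
               - rate * x - math.lgamma(looks))
    return math.exp(log_pdf)

def amplitude_pdf(a, intensity_pdf):
    """PDF of A = sqrt(I): p_A(a) = 2 a p_I(a^2)."""
    return 2.0 * a * intensity_pdf(a * a) if a > 0 else 0.0

def log_intensity_pdf(z, intensity_pdf):
    """PDF of Z = ln(I): p_Z(z) = exp(z) p_I(exp(z))."""
    return math.exp(z) * intensity_pdf(math.exp(z))

def p_i(x):
    """Single-look, unit-mean intensity PDF (look number set to 1)."""
    return gamma_intensity_pdf(x, looks=1.0, mean=1.0)

# Riemann-sum checks that both transformed densities still integrate to about 1.
step = 0.001
area_amp = sum(amplitude_pdf(i * step, p_i) for i in range(1, 8000)) * step
area_log = sum(log_intensity_pdf(-20.0 + i * step, p_i) for i in range(25_000)) * step
```

With a look number of 1 the intensity PDF reduces to the exponential law, and `amplitude_pdf` then reproduces the Rayleigh form 2a exp(-a^2) for unit mean intensity.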

According to many researchers’ experiences [

Much progress has been made in research on the statistical modeling of SAR images over the past few decades, especially in recent years, and the related literature is vast. To the best of our understanding, the major conclusions and several promising directions for further research can be summarized as follows:

Current statistical models are derived under many assumptions, so in theory they can only approximately describe the electromagnetic scattering characteristics of a scene. This is a shortcoming common to all statistical modeling of scenes. Constructing models that describe the electromagnetic scattering characteristics of a scene exactly will be a major challenge.

Among the existing statistical models, those developed from the product model are the most widely used and the most promising. This can also be seen from the related literature.

The statistical models based on the product model can be divided into two cases according to whether or not the speckle component satisfies the central limit theorem. Correspondingly, there are two typical models: the G^{0} model and the GC model, the latter being difficult to apply. The open question is at what resolution the speckle component stops satisfying the central limit theorem. No conclusion has been reached yet.

It is a novel idea to model a region according to its degree of homogeneity. The parameters of the G^{0} model are sensitive to the degree of homogeneity of the observed images. This characteristic makes it suitable for modeling homogeneous, heterogeneous or extremely heterogeneous regions, single-look or multi-look data, and intensity or amplitude data, which means it can be used universally. On the other hand, many widely used models can be unified under the G^{0} model (see

All the statistical models, even the G^{0} model, can describe only regions with relatively simple contents and a few terrain types. In other words, statistical models have a so-called “regional” characteristic. For a large-scale scene, whose contents are complex and whose terrain types are numerous, it is impractical to describe the whole image with a statistical model that has only a few parameters. However, models with too many parameters also cause difficulties in applications. Therefore, the trend is to build statistical models with the “regional” characteristic. Typically, Billingsley [

According to the related literature, once a model was proposed, it was typically applied to diverse images acquired in several bands and at different view angles, usually with good results. Generally speaking, variations in the band and the view angle of a sensor within a certain range have only a slight influence on the statistical modeling of SAR data.

Considering the correlation among SAR data is also a new idea. In theory, it can expose the statistical characteristics of SAR images more accurately. However, it is hard to model the correlation exactly. Borghys [

Statistical modeling of SAR images is one of the basic research topics of SAR image interpretation. It is of great significance both in theory and in applications. Based on an extensive survey of the related literature, this paper begins with the history and current state of research on the statistical modeling of SAR images. Statistical modeling techniques are then thoroughly reviewed using the product model as a thread, and some major problems are briefly illustrated in order to attract more attention to this field. We believe that the research will broaden and deepen, driven by the demands of SAR image interpretation.

A general flow chart of parametric modeling.

Four major categories of parametric modeling. Note: PM represents the product model; CLT represents the central limit theorem; GCLT represents the generalized central limit theorem.

Process of developing a statistical model from the product model.

Statistical models of constant RCS or RCS fluctuations when the speckle component satisfies the central limit theorem.

Statistical models of RCS fluctuations when the speckle component satisfies the central limit theorem.

Statistical models when the speckle component does not satisfy the central limit theorem.

Relationship among the major statistical models (

Summary of the applications of the major models.

Yes | Complex | High-resolution, amplitude or intensity, single-look | Unsuitable for multi-look images | ||

Yes | Simple | Moderately high-resolution, amplitude | Prone to overfitting the data | ||

Yes | Simple | Homogeneous, heterogeneous or extremely heterogeneous region, multi- or single-look, intensity or amplitude | Equivalent to a G^{0} distribution | ||

Yes | Simple | Homogeneous region, single-look, amplitude | Widely used in interpretation algorithms | ||

Yes | Simple | Homogeneous region, single-look, intensity | Widely used in interpretation algorithms | ||

Yes | Simple | Homogeneous region, multi-look, intensity | The corresponding amplitude distribution is the square-root Gamma | ||

Yes | Complex | Moderately heterogeneous region, multi- or single-look, intensity or amplitude (having corresponding expressions for each case) | Widely used in interpretation algorithms | ||

Yes | Complex | Moderately heterogeneous region, multi- or single-look, intensity or amplitude (having corresponding expressions for each case) | Seldom used in interpretation algorithms | ||

Yes | Complex | Homogeneous, heterogeneous or extremely heterogeneous region, multi- or single-look, intensity or amplitude (having corresponding expressions for each case) | Difficult to apply | ||

G^{0} |
Yes | Simple | Homogeneous, heterogeneous or extremely heterogeneous region, multi- or single-look, intensity or amplitude (having corresponding expressions for each case) | A special example of the G distribution, also called the | |

Yes | Simple | Homogeneous, heterogeneous or extremely heterogeneous region, single-look, intensity | A special example of the G^{0} distribution, widely used | ||

G^{h} |
Yes | Simple | Extremely heterogeneous urban areas and mixed terrain | A special example of the G distribution | |

Yes | Simple | Ultrasound images | Further investigation for SAR images is needed | ||

No | Complex | Various image data with an extremely high resolution level | A general form of many other models, difficult to apply, further validation is needed | ||

No | Complex | Real and imaginary components of SAR data | Used in modeling the woodland regions in UWB SAR data | ||

No | Complex | Long-tailed amplitude image of urban area | Difficult to apply | ||

Yes | Complex | Low-resolution image with targets in weak clutter | Seldom used | ||

Yes | Complex | Heterogeneous | Difficult to apply | ||

Yes | Simple | Considering the correlation between pixels | Correlation is simple, further research is needed |

Note: “1” represents the empirical distributions; “2” represents the models developed from the product model; “3” represents the models developed from the generalized central limit theorem; “4” represents other models.