The Bayesian Inference of Pareto Models Based on Information Geometry

School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
*
Author to whom correspondence should be addressed.
Entropy 2021, 23(1), 45; https://doi.org/10.3390/e23010045
Submission received: 1 December 2020 / Revised: 26 December 2020 / Accepted: 26 December 2020 / Published: 30 December 2020

Abstract

Bayesian methods have developed rapidly due to the important role of explicable causality in practical problems. We develop geometric approaches to the Bayesian inference of Pareto models and give an application to the analysis of sea clutter. For the Pareto two-parameter model, we show that an α-parallel prior does not exist in general, hence we adopt the Jeffreys prior to carry out the Bayesian inference. Taking the geodesic distance as the loss function, we obtain an estimation in the sense of minimal mean geodesic distance. Meanwhile, by introducing Al-Bayyati's loss function we obtain a new class of Bayesian estimations. In the simulation, we adopt the Pareto model for sea clutter to obtain various types of parameter estimations and posterior prediction results. The simulation results show the advantages of the proposed Bayesian estimations and of the posterior prediction.

1. Introduction

Geometric methods play an important role in Bayesian statistics. At present, there are two main ways to study Bayesian inference through geometric methods. One idea is to regard the prior distribution, the probability distribution of the statistical model and the posterior distribution as vectors in the Hilbert space L²(Θ). The analysis is then carried out through the geometric properties of the Hilbert space. M. de Carvalho [1] used the cosine of the angle between vectors to study their relationships, where the cosine between priors represents the coherency of the experts' opinions, the cosine between the prior and the probability density represents prior-data agreement, and the cosine between the prior and the posterior represents the sensitivity of the posterior to the prior specification. Furthermore, M. de Carvalho used Markov chain Monte Carlo to estimate the cosine values for further analysis. R. Kulhavy [2] viewed statistical inference as an approximation of the empirical density rather than an estimation of a true density, and built a model by analyzing the trace of the orthogonal projection of conditional empirical distributions onto the model manifold. He also used the Kerridge inaccuracy as a generalized empirical error. The Kerridge inaccuracy is a generalization of the Shannon entropy. It measures the difference between an observed distribution Q = (q_1, …, q_n) and a true distribution P = (p_1, …, p_n), and is defined by I(P, Q) = −∑_i p_i log(q_i). The advantage of this idea is that it provides a unified treatment of all pieces of the Bayes theorem. However, it requires a finite measure on the parameter space.
Another idea is to endow statistical manifolds with Riemannian metrics. J.A. Hartigan [3,4,5] proposed a reparametrization-invariant prior, the α-parallel prior, and later J. Takeuchi and S. Amari [6,7] clarified an interesting connection between the information-geometric properties of the statistical model and the existence of the α-parallel prior. The α-parallel prior, as a noninformative prior, is invariant under coordinate transformations and reflects the intrinsic properties of the model well. It is worth noting that a general α-parallel prior does not always exist, but the 0-parallel prior, the Jeffreys prior, always exists. After obtaining the corresponding prior, subsequent Bayesian inference and prediction can be carried out. J. Takeuchi and S. Amari studied the asymptotic properties of the minimum description length (MDL) estimation and the projected Bayesian estimation of general exponential families, and extended them to curved exponential families. The differential geometry of curved exponential families is given by S. Amari and M. Kumon [8,9].
In many confidential fields, historical data are difficult to obtain. Hence one advantage of the above idea is that it gives the prior a sound theoretical basis. Besides, this idea can illustrate the geometric meaning of the common Bayesian estimations and provide an estimation in the sense of minimal mean geodesic distance. In the application of target detection, maritime radar performance is often severely interfered with and suppressed by sea clutter, especially for the detection of weak targets on the sea. Therefore, whether sea clutter can be effectively suppressed is the key factor in improving the performance of maritime radar systems, and the study of sea clutter is of vital importance [10]. With the development of radar hardware technology, the statistical histogram of radar sea clutter exhibits a long trailing tail, manifested as frequent sea spikes, and the amplitude distribution of the clutter deviates seriously from the previously proposed Rayleigh distribution. In order to solve this problem, an improved model, the compound Gaussian model, was proposed [11]. The compound Gaussian model successfully explains the formation mechanism of sea clutter and is more successful than single-point probability distribution models in terms of data fitting. Within the compound Gaussian family, the K distribution and the Pareto distribution are two typical representatives: when the structural component follows a Gamma distribution, the compound Gaussian model reduces to the K distribution; when the structural component follows an inverse Gamma distribution, it reduces to the Pareto distribution. In 2010, M. Farshchian and F.L. Posner [12] analyzed sea clutter data in the X-band and fitted them with the Pareto distribution, finding that the Pareto distribution fits the tail better than the K distribution.
The paper is organized as follows. Section 2 introduces the preliminaries, including some geometric structures of statistical manifolds and some concepts of Bayesian inference. In Section 3, we introduce the geometric approaches to Bayesian inference based on the α-parallel connection, and propose a geometric loss function based on the geodesic distance. In Section 4, we prove that the Pareto two-parameter model does not admit a general α-parallel prior. We then adopt the Jeffreys prior to provide explicit expressions of the estimations in the sense of minimal mean geodesic distance. Furthermore, we establish theorems under Al-Bayyati's loss function to obtain a new class of Bayesian estimations; bounds are given for certain expressions without closed forms. Besides, we show the existence of the best parameter in Al-Bayyati's loss function. In Section 5, we show the advantages of the proposed Bayesian estimations and of the posterior prediction distributions.

2. Preliminaries

2.1. α-Parallel Prior

Definition 1
([7,13]). For a statistical manifold M = { p(x;θ) | ∫ p(x;θ) dx = 1, p(x;θ) > 0, θ ∈ Θ ⊂ ℝ^d }, define an affine connection ∇^{(α)} on M with the coefficients
\[
\Gamma^{(\alpha)i}_{jk} := \Gamma^{(1)i}_{jk} + \frac{1-\alpha}{2}\, T_{sjk}\, g^{is}, \qquad \Gamma^{(1)i}_{jk} := \Gamma^{(1)}_{s:jk}\, g^{is},
\]
where α ∈ ℝ is an arbitrary real number, g_{ij} = E[∂_i l ∂_j l], (g^{ij}) is the inverse matrix of (g_{ij}), Γ^{(1)}_{s:jk} = E[∂_s l ∂_j ∂_k l] denotes the coefficients of the exponential (1-)connection, T_{sjk} = E[∂_s l ∂_j l ∂_k l], l := log p(x;θ) denotes the log-likelihood function and E[·] denotes the expectation with respect to the observation x.
Definition 2
([7]). An affine connection ∇ is called locally equiaffine if around each point x of M there is a parallel volume element, that is, a nonvanishing d-form ω such that ∇ω = 0.
An equiaffine connection ∇ on M is a torsion-free affine connection with a parallel volume element ω on M.
If ω is a volume element on M such that ∇ω = 0, then we say that (∇, ω) is an affine structure on M.
For a statistical manifold M, we may represent the α-parallel volume element ω as
\[
\omega = \pi(\theta)\, d\theta^1 \wedge \cdots \wedge d\theta^d
\]
for a coordinate system θ = (θ^1, ⋯, θ^d) ∈ Θ ⊂ ℝ^d, where π determines the volume form on the whole manifold. We take π(θ) as a prior distribution on the parameter space Θ.
Definition 3
([7]). In a statistically equiaffine manifold, for a fixed α ∈ ℝ, we call the above form of π an α-parallel prior.
When α = 1, the 1-parallel prior is called the maximum likelihood estimation (MLE) prior proposed by J.A. Hartigan [5]. Note that there always exists a 0-parallel volume element ω = √(g(θ)) dθ^1 ∧ ⋯ ∧ dθ^d, where g is the determinant of the Fisher metric; this is the invariant volume element of the Riemannian manifold (M, g_{ij}). The prior distribution π(θ) ∝ √(g(θ)) is called the Jeffreys prior.
J. Takeuchi and S. Amari gave a sufficient and necessary condition for the existence of α -parallel prior.
Proposition 1
([6]). For a statistical manifold M with the α-connection ∇^{(α)}, if α ≠ 0, then there exists an α-parallel prior if and only if
\[
\partial_i T_j - \partial_j T_i = 0,
\]
where T_i := T_{ikl} g^{kl}.

2.2. Bayesian Inference

Let the random variable x be subject to the distribution p(x;θ) and let π(θ) be the prior distribution of θ. The posterior distribution π(θ|x) is given by the formula
\[
\pi(\theta \mid x) = \frac{p(x;\theta)\,\pi(\theta)}{\int_{\Theta} p(x;\theta)\,\pi(\theta)\, d\theta}.
\]
Now, we introduce some notations for later use. Let θ̂_MD be the maximum posterior estimation. Let θ̂_Me be the posterior median estimation, which is the median of the posterior distribution. Let θ̂_E be the posterior expectation estimation, which is the expectation of the posterior distribution.
These three estimations are also known as Bayesian estimations of θ. When θ̂ = θ̂_E, the posterior mean squared error reaches its minimum. Hence θ̂_E = E[θ|x] is often taken as the Bayesian estimation.
Let the random variable X ∼ p(x;θ). If no observation data are available, the marginal distribution m(x) is also known as the prior prediction distribution. If observation data x = (x_1, ⋯, x_n) are obtained, the distribution of unobserved values can be derived from the posterior distribution π(θ|x):
  • Predict future observations from the same population p(x̃;θ):
    m(x̃|x) = ∫_Θ p(x̃;θ) π(θ|x) dθ.
  • Predict observations from another population g(z;θ):
    m(z|x) = ∫_Θ g(z;θ) π(θ|x) dθ,
where m(x̃|x) or m(z|x) is called the posterior predictive distribution.

3. The Geometric Approaches for Bayesian Inference

In this section, we introduce the basic methods of Bayesian inference using geometric tools. The idea of geometry is embodied in the selection of priors and loss functions.

3.1. The Geometric Prior

The idea of the geometric method is to extend the uniform distribution naturally and to construct geometric priors suitable for multidimensional parameter spaces, possibly of infinite measure, according to the principle that the probability measure is proportional to the volume element. The probability distribution family under study can be regarded as a statistical manifold with a Riemannian metric.
The Fisher information matrix is the most widely used Riemannian metric on statistical manifolds, and the prior generated by its volume element is the Jeffreys prior. The α-connection is a natural extension of the Levi-Civita connection of the Fisher information metric. Its corresponding volume element is the α-parallel volume element, and the generated prior is called the α-parallel prior. In particular, the 0-parallel prior is the Jeffreys prior.
The α-parallel prior reflects the intrinsic properties of the model and does not depend on the choice of parametrization. Although the Jeffreys prior always exists, a general α-parallel prior does not necessarily exist. Proposition 1 gives the necessary and sufficient condition (1) for the existence of a general α-parallel prior.
Therefore, when one deals with Bayesian inference by geometric methods, the first step is to select the appropriate geometric priors, that is, to verify the existence of α -parallel prior in a specific statistical manifold.
With Riemannian metric, we can acquire geometric information of the statistical manifold, such as connection, curvature, geodesic and geodesic distance [14,15]. Through geometric priors, the joint posterior density of the parameters can be obtained, and then the corresponding Bayesian estimation and Bayesian posterior prediction are carried out [16].

3.2. The Geometric Loss Functions

In this subsection, we show the geometric meaning of the common Bayesian estimations and propose a new geometric approach of choosing loss functions.
Proposition 2.
Suppose that θ = (θ_1, …, θ_d) ∈ Θ ⊂ ℝ^d. Let π(θ|x) be the joint posterior distribution and θ̂ = (θ̂_1, …, θ̂_d) be the estimation of θ. For the loss function l_1(θ, θ̂) = ∑_{i=1}^d |θ̂_i − θ_i|, the corresponding estimation is θ̂_Me = (θ̂_1^{Me}, …, θ̂_d^{Me}). For the loss function l_2(θ, θ̂) = ∑_{i=1}^d (θ̂_i − θ_i)², the corresponding estimation is θ̂_E = (θ̂_1^{E}, …, θ̂_d^{E}).
Proof. 
Denote dθ̌_i = dθ_1 ⋯ dθ_{i−1} dθ_{i+1} ⋯ dθ_d. Let π(θ_i|x) be the marginal posterior density and Pr(θ_i ≤ t|x) be its cumulative distribution function. Assume that Pr(θ_i ≤ a|x) = 0 and Pr(θ_i ≤ b|x) = 1, where a, b ∈ ℝ may be infinite.
For the loss function l_1(θ, θ̂), we define
\[
\begin{aligned}
R_1(\theta,\hat\theta) &= \int_{\Theta} l_1(\theta,\hat\theta)\,\pi(\theta\mid x)\, d\theta \\
&= \left(\int_a^{\hat\theta_i} - \int_{\hat\theta_i}^{b}\right)\!\int (\hat\theta_i - \theta_i)\,\pi(\theta\mid x)\, d\check\theta_i\, d\theta_i + \sum_{j\neq i}\int_{\Theta}|\hat\theta_j - \theta_j|\,\pi(\theta\mid x)\, d\theta \\
&= \hat\theta_i\left(\int_a^{\hat\theta_i} - \int_{\hat\theta_i}^{b}\right)\pi(\theta_i\mid x)\, d\theta_i - \left(\int_a^{\hat\theta_i} - \int_{\hat\theta_i}^{b}\right)\theta_i\,\pi(\theta_i\mid x)\, d\theta_i + \sum_{j\neq i}\int_{\Theta}|\hat\theta_j - \theta_j|\,\pi(\theta\mid x)\, d\theta.
\end{aligned}
\]
Then we get
\[
\frac{\partial R_1}{\partial \hat\theta_i} = \int_a^{\hat\theta_i}\pi(\theta_i\mid x)\, d\theta_i - \int_{\hat\theta_i}^{b}\pi(\theta_i\mid x)\, d\theta_i.
\]
Letting ∂R_1/∂θ̂_i = 0, we have θ̂_i = θ̂_i^{Me}.
For the loss function l_2(θ, θ̂), we define
\[
R_2(\theta,\hat\theta) = \int_{\Theta} l_2(\theta,\hat\theta)\,\pi(\theta\mid x)\, d\theta = \sum_{i=1}^{d}\int_{\Theta}(\hat\theta_i - \theta_i)^2\,\pi(\theta\mid x)\, d\theta.
\]
Then we obtain
\[
\frac{\partial R_2}{\partial \hat\theta_i} = 2\int_a^{b}(\hat\theta_i - \theta_i)\,\pi(\theta_i\mid x)\, d\theta_i.
\]
Hence, ∂R_2/∂θ̂_i = 0 implies that θ̂_i = θ̂_i^{E}.  □
If the loss function is the distance induced by ‖·‖_1, then by Proposition 2 the corresponding risk function represents the average distance between the estimated value and the true value. Moreover, the posterior median estimation of the parameters minimizes the risk function, which means that this estimation has the minimum mean distance with respect to the posterior density.
If the loss function is the distance induced by ‖·‖_2, then the corresponding risk function represents the average squared distance between the estimated value and the true value. The resulting estimation is the posterior expectation of the parameters, which has the minimum mean squared error with respect to the posterior density.
The two kinds of loss functions above are distances, or increasing functions of distances, in ℝ^d. However, in a parameter space endowed with a Riemannian metric, the distance between two points is the geodesic distance instead of the Euclidean distance.
Hence, in order to make the estimation more accurate, we take the geodesic distance, or an increasing function of it, as the loss function; the corresponding risk function then represents the mean (transformed) geodesic distance between the estimated value and the true value. Before that, we need the following definition.
Definition 4.
(Mean Geodesic Estimation) Assume that the statistical manifold M = {p(x;θ)} is equipped with the Fisher Riemannian metric (g_{ij}). Let π(θ|x) be the joint posterior distribution and d(θ, θ̂) be the geodesic distance between θ and θ̂, where θ̂ is the estimation of θ. Let F: ℝ → ℝ be an increasing function. Denote D(θ, θ̂) = F(d(θ, θ̂)). The risk function with the loss function D(θ, θ̂) is
\[
R(\theta,\hat\theta) = \int_{\Theta} D(\theta,\hat\theta)\,\pi(\theta\mid x)\, d\theta.
\]
The estimation minimizing R ( θ , θ ^ ) is called mean geodesic estimation (MGE) and denoted by θ ^ M G E .
The geometric priors, the corresponding geodesic distance and the corresponding Bayesian inference depend on the choice of the Riemannian metric. Hence choosing a proper Riemannian metric is of great importance to Bayesian inference.
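To make the definition concrete, the following is a minimal numerical sketch (not part of the original derivation) of how an MGE could be approximated when the risk integral has no closed form: the posterior expectation is replaced by a Monte Carlo average over posterior draws and minimized numerically. It assumes NumPy/SciPy; the function names, the identity choice of F and the Nelder-Mead optimizer are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def mge_from_posterior_samples(theta_samples, geodesic_dist, theta_init):
    """Approximate the mean geodesic estimation (MGE) of Definition 4.

    theta_samples : (N, d) array of draws from the posterior pi(theta | x)
    geodesic_dist : callable d(theta, theta_hat) on the statistical manifold
    theta_init    : starting point for the numerical minimization
    """
    def risk(theta_hat):
        # Monte Carlo approximation of R(theta_hat) = E_posterior[ d(theta, theta_hat) ]
        return np.mean([geodesic_dist(t, theta_hat) for t in theta_samples])

    res = minimize(risk, x0=np.asarray(theta_init, dtype=float), method="Nelder-Mead")
    return res.x
```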

4. Bayesian Inference on Pareto Model

4.1. The Geometric Structure of Pareto Two-Parameter Model

The probability density function of the Pareto two-parameter distribution is
\[
p(x;\alpha,\beta) = \frac{\beta\,\alpha^{\beta}}{x^{\beta+1}}\, I_{[x \ge \alpha]}, \qquad \alpha > 0,\ \beta > 0,
\]
where α is called the scale parameter and β is called the shape parameter.
Its log-likelihood function is
\[
l(x;\alpha,\beta) = \log p(x;\alpha,\beta) = \log\beta + \beta\log\alpha - (\beta+1)\log x.
\]
Note that the Pareto distribution family does not satisfy the usual regularity conditions (its support depends on α); hence the Fisher-Rao metric on the Pareto distribution family is not equal to the expectation of the negative Hessian matrix of the log-likelihood.
Furthermore, from References [17,18] we can obtain the geometric structure of the Pareto model. On the Pareto two-parameter distribution family, the Fisher-Rao metric takes the tensor form
\[
g = \frac{\beta^2}{\alpha^2}\, d\alpha \otimes d\alpha + \frac{1}{\beta^2}\, d\beta \otimes d\beta,
\]
which is isometric to the upper half Poincaré plane. Hence, the Pareto two-parameter model (P, g) is a Riemannian manifold endowed with the Riemannian metric g. With the orthonormal coframe θ^1 = (β/α) dα, θ^2 = (1/β) dβ, the volume form, the connection form, the curvature form, the nonzero Christoffel symbols and the geodesic distance on (P, g) are given as follows:
\[
dv = \theta^1 \wedge \theta^2 = \frac{1}{\alpha}\, d\alpha \wedge d\beta,
\]
\[
\omega^{1}_{\ 2} = \frac{\beta}{\alpha}\, d\alpha,
\]
\[
\Omega^{1}_{\ 2} = d\omega^{1}_{\ 2} = -\frac{1}{\alpha}\, d\alpha \wedge d\beta = K\, dv, \qquad K = -1,
\]
\[
\Gamma^1_{11} = -\frac{1}{\alpha}, \quad \Gamma^2_{11} = -\frac{\beta^3}{\alpha^2}, \quad \Gamma^1_{12} = \Gamma^1_{21} = \frac{1}{\beta}, \quad \Gamma^2_{22} = -\frac{1}{\beta},
\]
\[
d\big((\alpha_0,\beta_0),(\alpha_1,\beta_1)\big) = \operatorname{arcosh}\left( 1 + \frac{\beta_0\beta_1(\log\alpha_0 - \log\alpha_1)^2}{2} + \frac{(\beta_0-\beta_1)^2}{2\beta_0\beta_1} \right).
\]
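For reference, the closed-form geodesic distance above can be evaluated directly; the following short Python sketch (not part of the original paper, assuming NumPy) implements it.

```python
import numpy as np

def pareto_geodesic_distance(alpha0, beta0, alpha1, beta1):
    """Fisher-Rao geodesic distance between two Pareto models, using the
    closed form inherited from the hyperbolic upper half-plane."""
    arg = (1.0
           + beta0 * beta1 * (np.log(alpha0) - np.log(alpha1)) ** 2 / 2.0
           + (beta0 - beta1) ** 2 / (2.0 * beta0 * beta1))
    return np.arccosh(arg)

# Example: distance between (alpha, beta) = (0.5, 1.0) and (0.6, 1.2)
# print(pareto_geodesic_distance(0.5, 1.0, 0.6, 1.2))
```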

4.2. The Existence of α-Parallel Prior on Pareto Two-Parameter Model

Theorem 1.
When α ≠ 0, the Pareto two-parameter model does not admit any α-parallel prior.
Proof. 
Denote ∂_1 = ∂/∂α, ∂_2 = ∂/∂β. Then, by calculation, we obtain
\[
\begin{aligned}
T_{111} &= E[\partial_1 l\, \partial_1 l\, \partial_1 l] = \frac{\beta^3}{\alpha^3}, \qquad
T_{222} = E[\partial_2 l\, \partial_2 l\, \partial_2 l] = -\frac{2}{\beta^3}, \\
T_{112} &= T_{121} = T_{211} = E[\partial_1 l\, \partial_1 l\, \partial_2 l] = 0, \qquad
T_{122} = T_{221} = T_{212} = E[\partial_1 l\, \partial_2 l\, \partial_2 l] = \frac{1}{\alpha\beta}.
\end{aligned}
\]
Hence, we get
\[
T_1 = T_{1ik}\, g^{ik} = \frac{2\beta}{\alpha}, \qquad T_2 = T_{2ik}\, g^{ik} = -\frac{2}{\beta},
\]
and
\[
\partial_1 T_2 - \partial_2 T_1 = -\frac{2}{\alpha} \neq 0.
\]
Therefore, the condition of Proposition 1 fails, and the Pareto two-parameter model does not admit any α-parallel prior when α ≠ 0 (here α denotes the connection parameter, not the scale parameter).  □
From Theorem 1, we see that the Pareto two-parameter model only admits the 0-parallel prior (the Jeffreys prior). Its Jeffreys prior satisfies π(α, β) ∝ √( (β²/α²)·(1/β²) ) = 1/α, α > 0, β > 0, which is a generalized (improper) prior density whose integral is infinite. We take π(α, β) = 1/α, α > 0, β > 0.

4.3. Bayesian Estimations of Pareto Model

Before we proceed, we state necessary results from Reference [17].
The joint probability density of a simple random sample from the Pareto model is
\[
p(x;\alpha,\beta) = \beta^n \alpha^{n\beta} \left(\prod_{i=1}^n x_i\right)^{-\beta-1} I_{[\min_{1\le i\le n} x_i \ge \alpha]}.
\]
The posterior distribution of the Pareto model under the Jeffreys prior is obtained by the Bayes formula
\[
\pi(\alpha,\beta \mid x) = \frac{n\,(q_2(x) - n\log q_1(x))^n}{\Gamma(n)}\, \beta^n \alpha^{n\beta-1} \exp(-q_2(x)\beta)\, I_{[0 \le \alpha \le q_1(x)]},
\]
where q_1(x) = min_{1≤i≤n} x_i and q_2(x) = ∑_{i=1}^n log x_i. Furthermore, by calculation we can see that the maximum likelihood estimation and the maximum posterior estimation of α, β are given by
\[
\hat\alpha_{MLE} = \hat\alpha_{MD} = q_1(x), \qquad
\hat\beta_{MLE} = \hat\beta_{MD} = \frac{n}{\sum_{i=1}^n \log x_i - n\log \min_{1\le i\le n} x_i} = \frac{n}{q_2(x) - n\log q_1(x)}.
\]
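As a small illustration (assuming NumPy; the function name is ours), the sufficient quantities q_1(x), q_2(x) and the resulting MLE/MAP estimates can be computed from a sample as follows.

```python
import numpy as np

def pareto_statistics(x):
    """Sufficient quantities q1, q2 and the MLE (= MAP under the Jeffreys
    prior) of the Pareto two-parameter model."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q1 = x.min()                 # q1(x) = min_i x_i
    q2 = np.log(x).sum()         # q2(x) = sum_i log x_i
    alpha_mle = q1
    beta_mle = n / (q2 - n * np.log(q1))
    return n, q1, q2, alpha_mle, beta_mle
```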
The marginal posterior density of α determined by the joint posterior density π(α, β|x) is
\[
\pi(\alpha \mid x) = \frac{n^2 (q_2(x) - n\log q_1(x))^n}{\alpha\, (q_2(x) - n\log\alpha)^{n+1}}\, I_{[0 \le \alpha \le q_1(x)]},
\]
and its cumulative distribution function is
\[
\Pr(\alpha \le t \mid x) = \left(1 + \hat\beta_{MLE}\log\frac{\hat\alpha_{MLE}}{t}\right)^{-n}, \qquad 0 \le t \le q_1(x).
\]
The marginal posterior density of β determined by the joint posterior density π(α, β|x) is
\[
\pi(\beta \mid x) = \frac{(q_2(x) - n\log q_1(x))^n}{\Gamma(n)}\, \beta^{n-1} \exp\big(-(q_2(x) - n\log q_1(x))\beta\big).
\]
Under the posterior distribution (13), when β is known, the conditional posterior density of α is
\[
\pi(\alpha \mid x, \beta) = \frac{n\beta\, \alpha^{n\beta-1}}{q_1^{n\beta}(x)}\, I_{[0 \le \alpha \le q_1(x)]},
\]
and its cumulative distribution function is
\[
\Pr(\alpha \le t \mid x, \beta) = \left(\frac{t}{q_1(x)}\right)^{n\beta}, \qquad 0 \le t \le q_1(x).
\]
When α is known, the conditional posterior density of β is
\[
\pi(\beta \mid x, \alpha) = \frac{(q_2(x) - n\log\alpha)^{n+1}}{\Gamma(n+1)}\, \beta^{n} \exp\big(-(q_2(x) - n\log\alpha)\beta\big).
\]

4.3.1. Mean Geodesic Estimation

The geodesic distance between (α, β) and (α̂, β̂) is
\[
d\big((\alpha,\beta),(\hat\alpha,\hat\beta)\big) = \operatorname{arcosh}\left( 1 + \frac{\beta\hat\beta(\log\alpha - \log\hat\alpha)^2}{2} + \frac{(\beta-\hat\beta)^2}{2\beta\hat\beta} \right).
\]
Hence, the distance is a monotone function of
\[
1 + \frac{\beta\hat\beta(\log\alpha - \log\hat\alpha)^2}{2} + \frac{(\beta-\hat\beta)^2}{2\beta\hat\beta}.
\]
We denote (22) as
\[
D\big((\alpha,\beta),(\hat\alpha,\hat\beta)\big) = 1 + \frac{\beta\hat\beta(\log\alpha - \log\hat\alpha)^2}{2} + \frac{(\beta-\hat\beta)^2}{2\beta\hat\beta}.
\]
By the discussion in Section 3, when α and β are unknown, D((α, β), (α̂, β̂)) can be taken as the loss function, and the estimations α̂_MGE and β̂_MGE in the sense of minimum mean geodesic distance can be obtained. When β = β_0 or α = α_0, D_{β_0}((α, β_0), (α̂, β_0)) or D_{α_0}((α_0, β), (α_0, β̂)) can be taken as the loss function, respectively, and we can obtain the conditional mean geodesic estimation α̂_MGE(x|β) or β̂_MGE(x|α).
Theorem 2.
When α and β are unknown, we have
\[
\hat\alpha_{MGE} = \hat\alpha_{MLE}\exp\left(-\frac{1}{n\hat\beta_{MLE}}\right), \qquad
\hat\beta_{MGE} = \sqrt{\frac{2n^2(n-1)}{2n^3+n+1}}\, \hat\beta_{MLE}.
\]
Proof. 
Denote f(α) = E_β[π(α, β|x)] = ∫_0^{+∞} β π(α, β|x) dβ; then we have
\[
f(\alpha) = \frac{n^2(n+1)(q_2(x) - n\log q_1(x))^n}{\alpha\,(q_2(x) - n\log\alpha)^{n+2}}, \qquad 0 \le \alpha \le q_1(x).
\]
Let F(t) = ∫_0^t f(α) dα; then we have
\[
F(t) = \frac{n (q_2(x) - n\log q_1(x))^n}{(q_2(x) - n\log t)^{n+1}}, \qquad 0 \le t \le q_1(x).
\]
Denote g(t) = F(t)/t; then
\[
G(s) = \int_0^s g(t)\, dt = \frac{(q_2(x) - n\log q_1(x))^n}{n\,(q_2(x) - n\log s)^{n}}, \qquad 0 \le s \le q_1(x).
\]
Denote h(t) = G(t)/t; then we have
\[
H(v) = \int_0^v h(t)\, dt = \frac{(q_2(x) - n\log q_1(x))^n}{n^2(n-1)\,(q_2(x) - n\log v)^{n-1}}, \qquad 0 \le v \le q_1(x).
\]
The risk function is
\[
R\big((\alpha,\beta),(\hat\alpha,\hat\beta)\big) = \int_0^{\hat\alpha_{MLE}}\!\!\int_0^{+\infty} D\big((\alpha,\beta),(\hat\alpha,\hat\beta)\big)\, \pi(\alpha,\beta\mid x)\, d\beta\, d\alpha.
\]
Firstly, we show that α̂_MGE = α̂_MLE exp(−1/(nβ̂_MLE)). Since
\[
\frac{\partial R}{\partial \hat\alpha} = -\frac{\hat\beta}{\hat\alpha}\int_0^{\hat\alpha_{MLE}}\!\!\int_0^{+\infty} (\log\alpha - \log\hat\alpha)\, \beta\, \pi(\alpha,\beta\mid x)\, d\beta\, d\alpha,
\]
when ∂R/∂α̂ = 0, we have
\[
\log\hat\alpha_{MGE} = \frac{\int_0^{\hat\alpha_{MLE}} \log\alpha \cdot E_\beta[\pi(\alpha,\beta\mid x)]\, d\alpha}{\int_0^{\hat\alpha_{MLE}} E_\beta[\pi(\alpha,\beta\mid x)]\, d\alpha}.
\]
Combining
\[
\int_0^{\hat\alpha_{MLE}} E_\beta[\pi(\alpha,\beta\mid x)]\, d\alpha = F(\hat\alpha_{MLE}) = \hat\beta_{MLE}
\]
with
\[
\int_0^{\hat\alpha_{MLE}} \log\alpha \cdot E_\beta[\pi(\alpha,\beta\mid x)]\, d\alpha = \int_0^{\hat\alpha_{MLE}} \log t\, dF(t) = \hat\beta_{MLE}\log\hat\alpha_{MLE} - G(\hat\alpha_{MLE}) = \hat\beta_{MLE}\left(\log\hat\alpha_{MLE} - \frac{1}{n\hat\beta_{MLE}}\right),
\]
we get
\[
\hat\alpha_{MGE} = \hat\alpha_{MLE}\exp\left(-\frac{1}{n\hat\beta_{MLE}}\right).
\]
Secondly, we show that β̂_MGE = √(2n²(n−1)/(2n³+n+1)) β̂_MLE. In fact, from
\[
\frac{\partial R}{\partial \hat\beta} = \int_0^{\hat\alpha_{MLE}}\!\!\int_0^{+\infty} \left( \frac{\beta(\log\alpha - \log\hat\alpha)^2}{2} - \frac{\beta}{\hat\beta^2} + \frac{1}{\beta} \right) \pi(\alpha,\beta\mid x)\, d\beta\, d\alpha,
\]
we see that when ∂R/∂β̂ = 0, we have
\[
\frac{\hat\beta_{MLE}}{\hat\beta_{MGE}^2} = \frac{F(\hat\alpha_{MLE})}{\hat\beta_{MGE}^2} = \int_0^{\hat\alpha_{MLE}}\!\!\int_0^{+\infty} \frac{1}{\beta}\, \pi(\alpha,\beta\mid x)\, d\beta\, d\alpha + \frac{1}{2}\int_0^{\hat\alpha_{MLE}} (\log\alpha - \log\hat\alpha)^2 f(\alpha)\, d\alpha.
\]
Noting that
\[
\int_0^{\hat\alpha_{MLE}}\!\!\int_0^{+\infty} \frac{1}{\beta}\, \pi(\alpha,\beta\mid x)\, d\beta\, d\alpha
= \int_0^{+\infty} \frac{1}{\beta}\, \pi(\beta\mid x)\, d\beta
= \frac{(q_2(x)-n\log q_1(x))^n}{\Gamma(n)}\int_0^{+\infty} \beta^{n-2} e^{-(q_2(x)-n\log q_1(x))\beta}\, d\beta
= \frac{q_2(x)-n\log q_1(x)}{n-1}
= \frac{n}{(n-1)\hat\beta_{MLE}}
\]
and
\[
\begin{aligned}
\int_0^{\hat\alpha_{MLE}} (\log\alpha - \log\hat\alpha)^2 f(\alpha)\, d\alpha
&= \int_0^{\hat\alpha_{MLE}} (\log t - \log\hat\alpha)^2\, dF(t)
= \hat\beta_{MLE}(\log\hat\alpha_{MLE} - \log\hat\alpha)^2 - 2\int_0^{\hat\alpha_{MLE}} (\log t - \log\hat\alpha)\, g(t)\, dt \\
&= \hat\beta_{MLE}\left(\frac{1}{n\hat\beta_{MLE}}\right)^2 - \frac{2}{n\hat\beta_{MLE}}\, G(\hat\alpha_{MLE}) + 2H(\hat\alpha_{MLE})
= -\frac{1}{n^2\hat\beta_{MLE}} + \frac{2}{n(n-1)\hat\beta_{MLE}}
= \frac{n+1}{n^2(n-1)\hat\beta_{MLE}},
\end{aligned}
\]
we have
\[
\frac{\hat\beta_{MLE}}{\hat\beta_{MGE}^2} = \frac{n}{(n-1)\hat\beta_{MLE}} + \frac{n+1}{2n^2(n-1)\hat\beta_{MLE}} = \frac{2n^3+n+1}{2n^2(n-1)\hat\beta_{MLE}}.
\]
As a result, we obtain β̂_MGE = √(2n²(n−1)/(2n³+n+1)) β̂_MLE.  □
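A brief Python sketch of the closed forms of Theorem 2 (as reconstructed above; assuming NumPy, with illustrative function names) is given below.

```python
import numpy as np

def pareto_mge(x):
    """Mean geodesic estimation of (alpha, beta) when both parameters are
    unknown, following the closed forms of Theorem 2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    alpha_mle = x.min()
    beta_mle = n / (np.log(x).sum() - n * np.log(alpha_mle))
    alpha_mge = alpha_mle * np.exp(-1.0 / (n * beta_mle))
    beta_mge = np.sqrt(2.0 * n**2 * (n - 1) / (2.0 * n**3 + n + 1)) * beta_mle
    return alpha_mge, beta_mge
```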
Theorem 3.
When β is known, we have α̂_MGE(x|β) = α̂_MLE exp(−1/(nβ)). When α is known, we have β̂_MGE(x|α) = √(n(n+1)) / (q_2(x) − n log α).
Proof. 
When β is known, the risk function is
\[
R\big((\alpha,\beta),(\hat\alpha,\beta)\big) = \int_0^{\hat\alpha_{MLE}} D_\beta\big((\alpha,\beta),(\hat\alpha,\beta)\big)\, \pi(\alpha\mid x,\beta)\, d\alpha.
\]
Since
\[
\frac{dR}{d\hat\alpha} = -\frac{\beta^2}{\hat\alpha}\int_0^{\hat\alpha_{MLE}} (\log\alpha - \log\hat\alpha)\, \pi(\alpha\mid x,\beta)\, d\alpha,
\]
when dR/dα̂ = 0, we have
\[
\log\hat\alpha_{MGE}(x\mid\beta) = \int_0^{\hat\alpha_{MLE}} \log\alpha \cdot \pi(\alpha\mid x,\beta)\, d\alpha
= \int_0^{\hat\alpha_{MLE}} \log t\, d\Pr(\alpha \le t\mid x,\beta)
= \log\hat\alpha_{MLE} - \int_0^{\hat\alpha_{MLE}} \frac{1}{t}\Pr(\alpha \le t\mid x,\beta)\, dt
= \log\hat\alpha_{MLE} - \frac{1}{n\beta}.
\]
Hence
\[
\hat\alpha_{MGE}(x\mid\beta) = \hat\alpha_{MLE}\exp\left(-\frac{1}{n\beta}\right).
\]
When α is known, the risk function is
\[
R\big((\alpha,\beta),(\alpha,\hat\beta)\big) = \int_0^{+\infty} D_\alpha\big((\alpha,\beta),(\alpha,\hat\beta)\big)\, \pi(\beta\mid x,\alpha)\, d\beta.
\]
Since
\[
\frac{dR}{d\hat\beta} = \int_0^{+\infty} \left( -\frac{\beta}{\hat\beta^2} + \frac{1}{\beta} \right) \pi(\beta\mid x,\alpha)\, d\beta,
\]
when dR/dβ̂ = 0, we have
\[
\hat\beta_{MGE}(x\mid\alpha)^2 = \frac{\int_0^{+\infty} \beta\, \pi(\beta\mid x,\alpha)\, d\beta}{\int_0^{+\infty} \frac{1}{\beta}\, \pi(\beta\mid x,\alpha)\, d\beta}.
\]
Noting that
\[
\int_0^{+\infty} \beta\, \pi(\beta\mid x,\alpha)\, d\beta
= \frac{(q_2(x)-n\log\alpha)^{n+1}}{\Gamma(n+1)}\int_0^{+\infty} \beta^{n+1} e^{-(q_2(x)-n\log\alpha)\beta}\, d\beta
= \frac{n(n+1)}{(q_2(x)-n\log\alpha)^2}\cdot\frac{(q_2(x)-n\log\alpha)^{n+1}}{\Gamma(n+1)}\int_0^{+\infty} \beta^{n-1} e^{-(q_2(x)-n\log\alpha)\beta}\, d\beta
= \frac{n(n+1)}{(q_2(x)-n\log\alpha)^2}\int_0^{+\infty} \frac{1}{\beta}\, \pi(\beta\mid x,\alpha)\, d\beta,
\]
hence we have β̂_MGE(x|α) = √(n(n+1)) / (q_2(x) − n log α).  □
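Similarly, the conditional mean geodesic estimations of Theorem 3 admit direct implementations; the following sketch (assuming NumPy, illustrative names) computes α̂_MGE(x|β) or β̂_MGE(x|α).

```python
import numpy as np

def pareto_mge_conditional(x, alpha_known=None, beta_known=None):
    """Mean geodesic estimation when one parameter is known (Theorem 3)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q2 = np.log(x).sum()
    if beta_known is not None:          # beta is known: estimate alpha
        alpha_mle = x.min()
        return alpha_mle * np.exp(-1.0 / (n * beta_known))
    if alpha_known is not None:         # alpha is known: estimate beta
        return np.sqrt(n * (n + 1)) / (q2 - n * np.log(alpha_known))
    raise ValueError("either alpha_known or beta_known must be given")
```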

4.3.2. Bayesian Estimations under Al-Bayyati’s Loss Function

Al-Bayyati's loss function was introduced in Reference [19] as
\[
l(\hat\theta,\theta) = \theta^{c}(\hat\theta - \theta)^2,
\]
where c is a real number. Next, we use Al-Bayyati’s loss function to derive the Bayesian estimation of Pareto model.
Proposition 3.
Assume that θ ∈ Θ ⊂ ℝ. Under Al-Bayyati's loss function, the Bayesian estimation of the parameter θ is given by
\[
\hat\theta_{Bc} = \frac{\int_\Theta \theta^{c+1}\, \pi(\theta\mid x)\, d\theta}{\int_\Theta \theta^{c}\, \pi(\theta\mid x)\, d\theta}.
\]
Proof. 
Since the risk function is
\[
R(\hat\theta,\theta) = \int_\Theta \theta^{c}(\hat\theta - \theta)^2\, \pi(\theta\mid x)\, d\theta,
\]
we have
\[
\frac{dR(\hat\theta,\theta)}{d\hat\theta} = 2\left( \hat\theta\int_\Theta \theta^{c}\, \pi(\theta\mid x)\, d\theta - \int_\Theta \theta^{c+1}\, \pi(\theta\mid x)\, d\theta \right).
\]
Letting dR(θ̂, θ)/dθ̂ = 0, we obtain
\[
\hat\theta_{Bc} = \frac{\int_\Theta \theta^{c+1}\, \pi(\theta\mid x)\, d\theta}{\int_\Theta \theta^{c}\, \pi(\theta\mid x)\, d\theta}.  \qquad \square
\]
Under Al-Bayyati's loss function, α̂_Bc does not have a simple closed-form expression. Thus we give upper and lower bounds for it.
Theorem 4.
Using Al-Bayyati's loss function and assuming c ≥ 0, we find that when β is unknown, α̂_Bc satisfies
\[
\frac{(n\hat\beta_{MLE}+c)\big((n-1)\hat\beta_{MLE}-(c+1)\big)}{(n-1)n\hat\beta_{MLE}^2}\, \hat\alpha_{MLE}
\;\le\; \hat\alpha_{Bc} \;\le\;
\frac{(n-1)n\hat\beta_{MLE}^2}{(n\hat\beta_{MLE}+c+1)\big((n-1)\hat\beta_{MLE}-c\big)}\, \hat\alpha_{MLE}.
\]
And when α is unknown, then
\[
\hat\beta_{Bc} = \left(1 + \frac{c}{n}\right)\hat\beta_{MLE}.
\]
Furthermore, there exists c_0 such that β̂_{Bc_0} = β_0, where β_0 is the real value of the shape parameter β.
Proof. 
When β is unknown, we have
\[
\int_0^{\hat\alpha_{MLE}} \alpha^{c+1}\pi(\alpha\mid x)\, d\alpha
= \int_0^{\hat\alpha_{MLE}} t^{c+1}\, d\Pr(\alpha\le t\mid x)
= \Big[t^{c+1}\Pr(\alpha\le t\mid x)\Big]_0^{\hat\alpha_{MLE}} - (c+1)\int_0^{\hat\alpha_{MLE}} t^{c}\Pr(\alpha\le t\mid x)\, dt
= \hat\alpha_{MLE}^{c+1} - (c+1)\int_0^{\hat\alpha_{MLE}} t^{c}\Pr(\alpha\le t\mid x)\, dt.
\]
Noting that
\[
\int_0^{\hat\alpha_{MLE}} t^{c}\Pr(\alpha\le t\mid x)\, dt
= \int_0^{\hat\alpha_{MLE}} t^{c}\left(1+\hat\beta_{MLE}\log\frac{\hat\alpha_{MLE}}{t}\right)^{-n} dt
= \hat\alpha_{MLE}^{c+1}\int_0^{+\infty} e^{-(c+1)u}\,(1+\hat\beta_{MLE}u)^{-n}\, du
\le \hat\alpha_{MLE}^{c+1}\int_0^{+\infty} (1+\hat\beta_{MLE}u)^{-n}\, du
= \frac{\hat\alpha_{MLE}^{c+1}}{(n-1)\hat\beta_{MLE}}
\]
and that
\[
\int_0^{\hat\alpha_{MLE}} t^{c}\Pr(\alpha\le t\mid x)\, dt
= \hat\alpha_{MLE}^{c+1}\int_0^{+\infty} e^{-(c+1)u}\,(1+\hat\beta_{MLE}u)^{-n}\, du
\ge \hat\alpha_{MLE}^{c+1}\int_0^{+\infty} e^{-(c+1)u}\, e^{-n\hat\beta_{MLE}u}\, du
= \hat\alpha_{MLE}^{c+1}\int_0^{+\infty} e^{-(n\hat\beta_{MLE}+c+1)u}\, du
= \frac{\hat\alpha_{MLE}^{c+1}}{n\hat\beta_{MLE}+c+1},
\]
we have
\[
\frac{(n-1)\hat\beta_{MLE}-(c+1)}{(n-1)\hat\beta_{MLE}}\,\hat\alpha_{MLE}^{c+1}
\le \int_0^{\hat\alpha_{MLE}} \alpha^{c+1}\pi(\alpha\mid x)\, d\alpha
\le \frac{n\hat\beta_{MLE}}{n\hat\beta_{MLE}+c+1}\,\hat\alpha_{MLE}^{c+1}.
\]
Similarly, we can get
\[
\frac{(n-1)\hat\beta_{MLE}-c}{(n-1)\hat\beta_{MLE}}\,\hat\alpha_{MLE}^{c}
\le \int_0^{\hat\alpha_{MLE}} \alpha^{c}\pi(\alpha\mid x)\, d\alpha
\le \frac{n\hat\beta_{MLE}}{n\hat\beta_{MLE}+c}\,\hat\alpha_{MLE}^{c}.
\]
Furthermore, by Proposition 3, we have
\[
\hat\alpha_{Bc} = \frac{1 - (c+1)\int_0^{+\infty} e^{-(c+1)u}(1+\hat\beta_{MLE}u)^{-n}\, du}{1 - c\int_0^{+\infty} e^{-cu}(1+\hat\beta_{MLE}u)^{-n}\, du}\, \hat\alpha_{MLE}
\]
and
\[
\hat\alpha_{Bc} \ge \frac{(n\hat\beta_{MLE}+c)\big((n-1)\hat\beta_{MLE}-(c+1)\big)}{(n-1)n\hat\beta_{MLE}^2}\,\hat\alpha_{MLE},
\qquad
\hat\alpha_{Bc} \le \frac{(n-1)n\hat\beta_{MLE}^2}{(n\hat\beta_{MLE}+c+1)\big((n-1)\hat\beta_{MLE}-c\big)}\,\hat\alpha_{MLE}.
\]
When α is unknown, we have
\[
\int_0^{+\infty} \beta^{c+1}\pi(\beta\mid x)\, d\beta
= \frac{(q_2(x)-n\log q_1(x))^{n}}{\Gamma(n)}\int_0^{+\infty} \beta^{n+c}\, e^{-(q_2(x)-n\log q_1(x))\beta}\, d\beta
= \frac{n+c}{q_2(x)-n\log q_1(x)}\int_0^{+\infty} \beta^{c}\pi(\beta\mid x)\, d\beta.
\]
Thus, by Proposition 3, we have
\[
\hat\beta_{Bc} = \left(1 + \frac{c}{n}\right)\hat\beta_{MLE}.
\]
And when c_0 = n(β_0/β̂_MLE − 1), we have β̂_{Bc_0} = β_0.  □
Remark 1.
When β is unknown, we have
\[
\hat\alpha_{Bc} = \frac{1 - (c+1)\int_0^{+\infty} e^{-(c+1)u}(1+\hat\beta_{MLE}u)^{-n}\, du}{1 - c\int_0^{+\infty} e^{-cu}(1+\hat\beta_{MLE}u)^{-n}\, du}\, \hat\alpha_{MLE}.
\]
In particular, when c = 0, we get
\[
\hat\alpha_{Bc} = \hat\alpha_{E} = \left(1 - \int_0^{+\infty} e^{-u}(1+\hat\beta_{MLE}u)^{-n}\, du\right)\hat\alpha_{MLE}.
\]
From (27), we know that α̂_E < α̂_MLE. By analyzing (26), as c > 0 gradually increases, α̂_Bc first increases and then decreases, and finally α̂_Bc converges to α̂_MLE. Let α_0 be the real value of the scale parameter α. When α̂_E ≤ α_0 ≤ α̂_MLE, there exists c_0 ≥ 0 such that α̂_{Bc_0} = α_0. When α_0 < α̂_E, α̂_E is the closest estimation among all α̂_Bc. When α_0 > α̂_MLE, there exists c_0 ≥ 0 such that α̂_{Bc_0} is the closest estimation among all α̂_Bc.
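Since α̂_Bc involves the two one-dimensional integrals in (26) while β̂_Bc has the closed form of Theorem 4, both can be evaluated as in the following sketch (not from the original paper; it assumes NumPy/SciPy and the function names are illustrative), where the integrals are computed by numerical quadrature.

```python
import numpy as np
from scipy.integrate import quad

def pareto_al_bayyati(x, c):
    """Bayesian estimations of (alpha, beta) under Al-Bayyati's loss with
    parameter c, when both parameters are unknown (Theorem 4 / Remark 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    alpha_mle = x.min()
    beta_mle = n / (np.log(x).sum() - n * np.log(alpha_mle))

    def J(a):
        # J(a) = int_0^inf exp(-a*u) * (1 + beta_mle*u)**(-n) du
        val, _ = quad(lambda u: np.exp(-a * u) * (1.0 + beta_mle * u) ** (-n),
                      0.0, np.inf)
        return val

    alpha_bc = alpha_mle * (1.0 - (c + 1.0) * J(c + 1.0)) / (1.0 - c * J(c))
    beta_bc = (1.0 + c / n) * beta_mle
    return alpha_bc, beta_bc
```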
Theorem 5.
Using Al-Bayyati's loss function, we find that when β is known, α̂_Bc(x|β) = (nβ + c)/(nβ + c + 1) · α̂_MLE. When α is known, β̂_Bc(x|α) = (n + 1 + c)/(q_2(x) − n log α). In both cases, there exist c_1 and c_2 such that α̂_{Bc_1}(x|β) = α_0 and β̂_{Bc_2}(x|α) = β_0, where α_0 and β_0 are the real values of the scale and shape parameters, respectively.
Proof. 
When β is known, we have
\[
\int_0^{\hat\alpha_{MLE}} \alpha^{c+1}\pi(\alpha\mid x,\beta)\, d\alpha
= \int_0^{\hat\alpha_{MLE}} t^{c+1}\, d\Pr(\alpha\le t\mid x,\beta)
= \frac{n\beta}{n\beta+c+1}\,\hat\alpha_{MLE}^{c+1}
\]
and
\[
\int_0^{\hat\alpha_{MLE}} \alpha^{c}\pi(\alpha\mid x,\beta)\, d\alpha = \frac{n\beta}{n\beta+c}\,\hat\alpha_{MLE}^{c}.
\]
Then, by Proposition 3, we have
\[
\hat\alpha_{Bc}(x\mid\beta) = \frac{\int_0^{\hat\alpha_{MLE}} \alpha^{c+1}\pi(\alpha\mid x,\beta)\, d\alpha}{\int_0^{\hat\alpha_{MLE}} \alpha^{c}\pi(\alpha\mid x,\beta)\, d\alpha} = \frac{n\beta+c}{n\beta+c+1}\,\hat\alpha_{MLE}.
\]
When α is known, we get
\[
\int_0^{+\infty} \beta^{c+1}\pi(\beta\mid x,\alpha)\, d\beta
= \frac{(q_2(x)-n\log\alpha)^{n+1}}{\Gamma(n+1)}\int_0^{+\infty} \beta^{n+c+1}\, e^{-(q_2(x)-n\log\alpha)\beta}\, d\beta
= \frac{n+c+1}{q_2(x)-n\log\alpha}\int_0^{+\infty} \beta^{c}\pi(\beta\mid x,\alpha)\, d\beta.
\]
Thus, by Proposition 3, we get
\[
\hat\beta_{Bc}(x\mid\alpha) = \frac{n+1+c}{q_2(x)-n\log\alpha}.
\]
When β = β_0, we have
\[
\hat\alpha_{Bc}(x\mid\beta_0) = \frac{n\beta_0+c}{n\beta_0+c+1}\,\hat\alpha_{MLE}.
\]
Noting that if α̂_MLE ≠ α_0, then α̂_{Bc_0}(x|β_0) = α_0, where
\[
c_0 = \frac{\alpha_0}{\hat\alpha_{MLE}-\alpha_0} - n\beta_0.
\]
Hence either α̂_MLE or α̂_{Bc_0}(x|β_0) is the true value of α.
When α = α_0, we have
\[
\hat\beta_{Bc}(x\mid\alpha_0) = \frac{n+1+c}{q_2(x)-n\log\alpha_0}.
\]
Hence we can take
\[
c_0 = (q_2(x)-n\log\alpha_0)\,\beta_0 - (n+1),
\]
such that β̂_{Bc_0}(x|α_0) is the true value of β.  □

4.4. Bayesian Posterior Prediction

Let X̃ ∼ π(x̃; α, β) be a future value to be observed from the Pareto distribution. In the sense of the posterior distribution (13), given the sample x, we can make the relevant posterior prediction of X̃. The discussion is divided into the following three cases.
  • When both α and β are unknown, we have
\[
\begin{aligned}
m(\tilde x\mid x) &= \int_0^{+\infty}\!\!\int_0^{+\infty} \pi(\tilde x;\alpha,\beta)\,\pi(\alpha,\beta\mid x)\, d\alpha\, d\beta \\
&= \int_0^{+\infty}\!\!\int_0^{+\infty} \frac{n(q_2(x)-n\log q_1(x))^n}{\Gamma(n)}\,\beta^n\alpha^{n\beta-1}e^{-q_2(x)\beta}\, I_{[0\le\alpha\le q_1(x)]}\cdot \frac{\beta\,\alpha^{\beta}}{\tilde x^{\beta+1}}\, I_{[\tilde x\ge\alpha]}\, d\alpha\, d\beta \\
&= \int_0^{+\infty} \frac{n(q_2(x)-n\log q_1(x))^n}{\Gamma(n)}\, e^{-q_2(x)\beta}\, \frac{\beta^{n+1}}{\tilde x^{\beta+1}}\, d\beta \int_0^{\min\{\tilde x,\, q_1(x)\}} \alpha^{(n+1)\beta-1}\, d\alpha \\
&= \int_0^{+\infty} \frac{n(q_2(x)-n\log q_1(x))^n}{\Gamma(n)}\, e^{-q_2(x)\beta}\, \frac{\beta^{n+1}}{\tilde x^{\beta+1}}\cdot \frac{(\min\{\tilde x,\, q_1(x)\})^{(n+1)\beta}}{(n+1)\beta}\, d\beta \\
&= \int_0^{+\infty} \frac{n(q_2(x)-n\log q_1(x))^n}{\Gamma(n)}\, \frac{\beta^{n}}{(n+1)\tilde x}\, \exp\!\Big(-\big[q_2(x)+\log\tilde x-(n+1)\log\min\{\tilde x,\, q_1(x)\}\big]\beta\Big)\, d\beta \\
&= \frac{n^2(q_2(x)-n\log q_1(x))^n}{(n+1)\,\tilde x\,\big(q_2(x)+\log\tilde x-(n+1)\log\min\{\tilde x,\, q_1(x)\}\big)^{n+1}}\, I_{[\tilde x>0]}
= \frac{n^2(q_2(x)-n\log q_1(x))^n}{(n+1)\,\tilde x}\, N(x,\tilde x),
\end{aligned}
\]
    where
\[
N(x,\tilde x) =
\begin{cases}
\big(q_2(x)-n\log\tilde x\big)^{-(n+1)}, & 0 < \tilde x < \hat\alpha_{MLE}(x), \\[4pt]
\big(q_2(x)+\log\tilde x-(n+1)\log q_1(x)\big)^{-(n+1)}, & \tilde x \ge \hat\alpha_{MLE}(x).
\end{cases}
\]
  • When α is known and β is unknown, we have
\[
m(\tilde x\mid x,\alpha) = \int_0^{+\infty} \pi(\tilde x;\alpha,\beta)\,\pi(\beta\mid x,\alpha)\, d\beta
= \frac{(n+1)(q_2(x)-n\log\alpha)^{n+1}}{\tilde x\,\big(q_2(x)+\log\tilde x-(n+1)\log\alpha\big)^{n+2}}\, I_{[\tilde x\ge\alpha]}.
\]
  • When β is known and α is unknown, we have
\[
m(\tilde x\mid x,\beta) = \int_0^{+\infty} \pi(\tilde x;\alpha,\beta)\,\pi(\alpha\mid x,\beta)\, d\alpha
= \frac{n\beta}{n+1}\,\hat\alpha_{MLE}^{-n\beta}(x)\,\tilde x^{-\beta-1}\big(\min\{\tilde x,\hat\alpha_{MLE}(x)\}\big)^{(n+1)\beta}\, I_{[\tilde x>0]}
= \frac{n\beta}{n+1}\cdot
\begin{cases}
\hat\alpha_{MLE}^{-n\beta}(x)\,\tilde x^{\,n\beta-1}, & 0 < \tilde x < \hat\alpha_{MLE}(x), \\[4pt]
\hat\alpha_{MLE}^{\beta}(x)\,\tilde x^{-\beta-1}, & \tilde x \ge \hat\alpha_{MLE}(x).
\end{cases}
\]
For the above posterior prediction distributions, given a prediction credibility k, we can carry out Bayesian prediction inference in practical applications. The specific process is as follows: from ∫_{x̃_L}^{x̃_U} m(x̃|x) dx̃ = k, multiple pairs (x̃_L, x̃_U) can be obtained; by choosing the upper and lower bounds for X̃ so that x̃_U − x̃_L is as small as possible, we obtain higher prediction accuracy.
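For completeness, the posterior predictive density m(x̃|x) in the case where both parameters are unknown can be evaluated as in the sketch below (not from the original paper; it assumes NumPy and the function name is illustrative). It works in log space because the factor (q_2(x) − n log q_1(x))^n overflows in floating point for moderate n.

```python
import numpy as np

def log_posterior_predictive(x_tilde, x):
    """log m(x_tilde | x) for the Pareto model with both parameters unknown,
    following the closed form of Section 4.4 (x_tilde is assumed positive)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q1 = x.min()
    q2 = np.log(x).sum()
    xt = np.atleast_1d(np.asarray(x_tilde, dtype=float))
    # piecewise exponent base A, depending on whether x_tilde < alpha_MLE = q1
    A = np.where(xt < q1,
                 q2 - n * np.log(xt),                       # 0 < x_tilde < alpha_MLE
                 q2 + np.log(xt) - (n + 1) * np.log(q1))    # x_tilde >= alpha_MLE
    return (2 * np.log(n) + n * np.log(q2 - n * np.log(q1))
            - np.log(n + 1) - np.log(xt) - (n + 1) * np.log(A))

# m(x_tilde | x) = np.exp(log_posterior_predictive(x_tilde, x))
```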

5. Simulation

In practice, proposed algorithms for maritime radar target detection need to be tested and verified on sea clutter data. In order to characterize the sea clutter better, it is often necessary to estimate the parameters of the sea clutter model.
Therefore, in this section, we use the conclusions of Section 4 to estimate the parameters of the Pareto model of sea clutter and show the simulation results.

5.1. The Influence of Parameters on Sea Clutter

In this subsection, we show the effect of scale parameter α and shape parameter β on sea clutter.
Figure 1 and Figure 2 show the probability density curves of the Pareto distribution with respect to the two parameters. It can be seen from the figures that when the scale parameter α is larger, the density curve is flatter: the proportion of small clutter amplitudes increases, and the whole curve declines gently. As the shape parameter β becomes larger, the proportion of small clutter amplitudes increases significantly and becomes more concentrated, and the tail descends faster. On the whole, for the Pareto model the energy is concentrated on small clutter amplitudes and the trailing (heavy-tail) phenomenon is apparent. The essential reason is that at grazing incidence the overall backscattered echo of the radar is relatively weak.

5.2. Various Types of Bayesian Estimation on Sea Clutter Models

In this subsection, we show the aforementioned Bayesian estimations of sea clutter.
In order to generate random samples of the Pareto distribution with parameters α_0 and β_0, we use the inverse distribution function and apply the inverse transformation method to draw Pareto samples: X = α_0 U^{−1/β_0}, where U is a uniformly distributed random variable on [0, 1] (a code sketch of this step is given below). We carry out numerical simulations with α_0 = 0.5, 1, 1.5 and β_0 = 0.5, 1, 1.5, respectively. Using the inverse transformation method, we generate 1000 random samples subject to the Pareto distribution. To show the geometry of the Pareto model of sea clutter, we take (α_0, β_0) = (0.5, 0.5) and (0.5, 1.5) as centers, draw the unit geodesic circles with dotted lines, and draw 64 uniformly distributed geodesics with initial directions θ_0 = kπ/32, k = 0, 1, …, 63, with solid lines. See Figure 3 and Figure 4.
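A minimal sketch of this sampling step (assuming NumPy; the function name is illustrative) is as follows.

```python
import numpy as np

def sample_pareto(alpha0, beta0, size, rng=None):
    """Draw Pareto(alpha0, beta0) samples by the inverse transformation method
    X = alpha0 * U**(-1/beta0), with U uniform on [0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=size)
    return alpha0 * u ** (-1.0 / beta0)

# Example: 1000 samples with (alpha0, beta0) = (0.5, 1.5)
# x = sample_pareto(0.5, 1.5, 1000)
```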
To describe the proximity between the estimated values of each group and the predetermined parameter value (α_0, β_0), we calculate the geodesic distance d((α_0, β_0), (α̂(x), β̂(x))). If the distance between the estimated value and the predetermined parameter value is small, we consider the estimation accurate. By (22),
\[
d\big((\alpha_0,\beta_0),(\hat\alpha(x),\hat\beta(x))\big) = \operatorname{arcosh}\left( 1 + \frac{\beta_0\hat\beta(x)\big(\log\alpha_0-\log\hat\alpha(x)\big)^2}{2} + \frac{(\beta_0-\hat\beta(x))^2}{2\beta_0\hat\beta(x)} \right).
\]
Hence the smaller |log α_0 − log α̂| and |β_0 − β̂| are, the more accurate the estimation is.
Next we will make a comparative analysis of various types of Bayesian estimations.

5.2.1. Mean Geodesic Estimation and the Common Bayesian Estimations

Case 1. Both scale parameter α and shape parameter β are unknown.
From Table 1 we see that |α̂_E − α̂_MGE| and |β̂_E − β̂_MGE| are less than 10^{-4}, hence (α̂_E, β̂_E) and (α̂_MGE, β̂_MGE) are almost equal. Since α̂_E does not have an explicit expression, α̂_MGE can take its place, and it also has a more precise geometric interpretation.
In most simulation tests, (α̂_MGE, β̂_MGE) is more accurate than (α̂_MLE, β̂_MLE) and (α̂_Me, β̂_Me). Hence, in general, the MGE that we propose is better than the common Bayesian estimations.
Case 2. Either shape parameter β or scale parameter α is known.
When one parameter is known, the statistical manifold degenerates to a one-dimensional manifold. Hence taking the Euclidean distance or the geodesic distance does not make much difference, as can be seen in Table 2 and Table 3.
Comparing Table 2 and Table 3, when β is known the accuracy of the estimations is of order 10^{-3}, and when α is known the accuracy of all the estimations is of order 10^{-2}. Therefore, the accuracy of the various estimations improves when β is known. This indicates that, in the sea clutter model, the scale parameter α is more easily obtained from the samples and has strong robustness, while the shape parameter β is more sensitive than the scale parameter α.

5.2.2. The Estimations under Al-Bayyati’s Loss Function

In this subsection, we give the simulation of various types of Bayesian estimations when ( α 0 , β 0 ) = ( 1.5 , 1.5 ) .
Case 1. Both scale parameter α and shape parameter β are unknown.
When α and β are unknown, the variation of the Bayesian estimations of the two parameters under Al-Bayyati's loss function with the parameter c is shown in Figure 5 and Figure 6, respectively. When β is unknown, by (26),
\[
\hat\alpha_{Bc} = \frac{1 - (c+1)\int_0^{+\infty} e^{-(c+1)u}(1+\hat\beta_{MLE}u)^{-n}\, du}{1 - c\int_0^{+\infty} e^{-cu}(1+\hat\beta_{MLE}u)^{-n}\, du}\, \hat\alpha_{MLE}.
\]
Figure 5 shows the case where α_0 ≤ α̂_E. Hence, from the discussion in Remark 1, α̂_E is the closest estimate among all α̂_Bc as the positive parameter c increases.
When α is unknown, by Theorem 4
\[
\hat\beta_{Bc} = \left(1 + \frac{c}{n}\right)\hat\beta_{MLE}.
\]
When c = 0, β̂_Bc = β̂_MLE = β̂_E, and when c_0 = n(β_0/β̂_MLE − 1), β̂_{Bc_0} = β_0. As shown in Figure 6, there always exist infinitely many c such that β̂_Bc is closer to the true value than the common Bayesian estimations, and c_0 = n(β_0/β̂_MLE − 1) recovers exactly the real value of the parameter β. These two figures also show that the MGE is better than the common Bayesian estimations when both parameters are unknown. Therefore, when α and β are unknown, to obtain closer estimations we can adjust c_1 and c_2 to make |log α_0 − log α̂_{Bc_1}| and |β_0 − β̂_{Bc_2}| smaller and even attain their minima. From the previous discussion, the choice of the best c_1 depends on the ordering of the real parameter α_0, α̂_MLE and α̂_E, while the best c_2 is c_2 = n(β_0/β̂_MLE − 1).
Case 2. Either shape parameter β or scale parameter α is known.
When β or α is known, the variation of the Bayesian estimations of the parameter α or β under Al-Bayyati's loss function with the parameter c is shown in Figure 7 or Figure 8, respectively. When β = β_0, by Theorem 5, we get
\[
\hat\alpha_{Bc}(x\mid\beta) = \frac{n\beta+c}{n\beta+c+1}\,\hat\alpha_{MLE}.
\]
If α̂_MLE ≠ α_0, we have α̂_Bc(x|β) = α_0 when c = α_0/(α̂_MLE − α_0) − nβ. Hence either α̂_MLE is already the true value of α, or we can take c_0 = α_0/(α̂_MLE − α_0) − nβ such that α̂_{Bc_0}(x|β) is the true value of α. This is shown in Figure 7.
When α = α_0, by Theorem 5, we get
\[
\hat\beta_{Bc}(x\mid\alpha) = \frac{n+1+c}{q_2(x)-n\log\alpha}.
\]
If c = (q_2(x) − n log α_0)β_0 − (n + 1), then β̂_Bc(x|α) = β_0. Hence we can take c_0 = (q_2(x) − n log α_0)β_0 − (n + 1) such that β̂_{Bc_0}(x|α) is the true value of β. This is shown in Figure 8.
In the above two cases, there are infinitely many c such that α̂_Bc(x|β) or β̂_Bc(x|α) is closer to the true value than the common Bayesian estimations.

5.3. Simulation of Posterior Predictive Distribution

In order to observe the simulation effect of the posterior prediction distribution, using the samples generated in Section 5.2, we draw the posterior prediction distribution of sea clutter and the true Pareto distribution π(x|α_0, β_0) of sea clutter, where (α_0, β_0) = (1.5, 1.5), for comparative analysis. See Figure 9, Figure 10 and Figure 11.
Case 1. Scale parameter α and shape parameter β are unknown
The posterior prediction distribution of sea clutter is
\[
m(\tilde x\mid x) = \frac{n^2(q_2(x)-n\log q_1(x))^n}{(n+1)\,\tilde x}\, N(x,\tilde x),
\]
where
\[
N(x,\tilde x) =
\begin{cases}
\big(q_2(x)-n\log\tilde x\big)^{-(n+1)}, & 0 < \tilde x < \hat\alpha_{MLE}(x), \\[4pt]
\big(q_2(x)+\log\tilde x-(n+1)\log q_1(x)\big)^{-(n+1)}, & \tilde x \ge \hat\alpha_{MLE}(x).
\end{cases}
\]
The result is shown in Figure 9. The blue curve represents the probability distribution of sea clutter π(x|α_0, β_0), which takes positive values only to the right of the boundary point x = α_0. The orange curve represents the posterior prediction distribution of sea clutter m(x̃|x), which varies continuously for x̃ > 0 but forms a cusp at x̃ = α̂_MLE. Comparing the two curves, the predicted distribution of sea clutter is a continuous curve and shifts slightly to the left. It is worth noting that although m(x̃|x) tends to infinity as x̃ → 0+, this is not visible in the figure and can be ignored in the actual calculation of probabilities.
Case 2.α is known and β is unknown
The posterior prediction distribution of sea clutter is
\[
m(\tilde x\mid x,\alpha) = \frac{(n+1)(q_2(x)-n\log\alpha)^{n+1}}{\tilde x\,\big(q_2(x)+\log\tilde x-(n+1)\log\alpha\big)^{n+2}}\, I_{[\tilde x\ge\alpha]}.
\]
It can be seen from Figure 10 that the probability distribution π(x|α_0, β_0) (the blue curve) and the posterior prediction distribution m(x̃|x, α) (the orange curve) take positive values only to the right of the boundary point x = α_0. There is a very high degree of overlap, which means that when α is known the prediction is very accurate. We conclude that more effective information can be obtained for the parameter α than for β.
Case 3.β is known and α is unknown
The posterior prediction distribution of sea clutter is
\[
m(\tilde x\mid x,\beta) = \frac{n\beta}{n+1}\cdot
\begin{cases}
\hat\alpha_{MLE}^{-n\beta}(x)\,\tilde x^{\,n\beta-1}, & 0 < \tilde x < \hat\alpha_{MLE}(x), \\[4pt]
\hat\alpha_{MLE}^{\beta}(x)\,\tilde x^{-\beta-1}, & \tilde x \ge \hat\alpha_{MLE}(x).
\end{cases}
\]
The probability distribution π(x|α_0, β_0) (the blue curve) takes positive values only to the right of the boundary point x = α_0. The posterior prediction distribution m(x̃|x, β) (the orange curve) varies continuously for x̃ > 0 and forms a cusp at x̃ = α̂_MLE. Comparing these two curves, the posterior prediction distribution shows a significant right shift, and the simulation effect is not ideal near x̃ = α̂_MLE. However, as x̃ increases, the two curves gradually coincide and the prediction accuracy becomes higher. Therefore, when β is known and α is unknown, the larger the clutter amplitude to be observed, the higher the prediction accuracy.
To sum up, for the sea clutter model, the Bayesian posterior prediction results in the above three cases are satisfactory, and the prediction model can well reflect the trailing (heavy-tail) characteristics of sea clutter.

6. Conclusions and Future Work

In this paper, we presented systematic methods for Bayesian inference from geometric viewpoints and applied them to the Pareto model. We carried out simulations on sea clutter to show their effectiveness.
For the Pareto model, a general α-parallel prior does not exist. Using the Jeffreys prior and employing the geodesic distance and Al-Bayyati's loss function, we obtained two new classes of Bayesian estimations. We call the estimation in the sense of minimal mean geodesic distance the MGE, and it is proved that the MGE has the following advantages: it has an explicit expression, and it is more accurate than the common Bayesian estimations, as shown in our simulations. We also show that the estimations under Al-Bayyati's loss function can be more accurate than the common Bayesian estimations; in fact, there are infinitely many c such that the new estimations are better. These results are important for parameter estimation when studying the sea clutter model. Finally, we show that the Bayesian posterior prediction results reflect the trailing characteristics of sea clutter well in all cases.
In the future, more in-depth research is worth pursuing. From the statistical viewpoint, we can apply the Bayesian inference for the Pareto model to non-linear regression models [20]. From the geometrical viewpoint, we expect to generalize our framework and combine more tools from information geometry. We also plan to carry out more experiments and applications in different fields.

Author Contributions

F.S. participated in raising questions and completing the calculation process; H.S. participated in raising questions and confirming the results; Y.C. and S.Z. participated in part of the calculation process and confirming the results. All authors have read and agree to the published version of the manuscript.

Funding

This research was funded by National Key Research and Development Plan of China, No. 2020YFC2006201.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is synthetically generated and described in the paper.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. de Carvalho, M.; Page, G.L.; Barney, B.J. On the geometry of Bayesian inference. Bayesian Anal. 2017, 14, 1013–1036. [Google Scholar] [CrossRef]
  2. Kulhavy, R. Recursive nonlinear estimation: A geometric approach. Automatica 1990, 26, 545–555. [Google Scholar] [CrossRef]
  3. Hartigan, J.A. Invariant prior distributions. Ann. Math. Stat. 1964, 35, 836–845. [Google Scholar] [CrossRef]
  4. Hartigan, J.A. The asymptotically unbiased density. Ann. Math. Stat. 1965, 36, 1137–1152. [Google Scholar] [CrossRef]
  5. Hartigan, J.A. The maximum likelihood prior. Ann. Math. Stat. 1998, 26, 2083–2103. [Google Scholar] [CrossRef]
  6. Takeuchi, J.; Amari, S. α-parallel prior and its properties. IEEE Trans. Inf. Theory 2005, 51, 1011–1023. [Google Scholar] [CrossRef]
  7. Tanaka, F. Curvature form on statistical model manifolds and its application to Bayesian analysis. J. Stat. Appl. Probab. 2012, 1, 35–43. [Google Scholar] [CrossRef]
  8. Amari, S. Differential geometry of curved exponential families—Curvature and information loss. Ann. Stat. 1982, 10, 357–385. [Google Scholar] [CrossRef]
  9. Amari, S.; Kumon, M. Differential geometry of edgeworth expansions in curved exponential family. Ann. Inst. Stat. Math. 1983, 35, 1–24. [Google Scholar] [CrossRef]
  10. Ward, K.D.; Watts, S. Use of sea clutter models in radar design and development. IET Radar Sonar Navig. 2010, 4, 146–157. [Google Scholar] [CrossRef]
  11. Ollila, E.; Tyler, D.E.; Koivunen, V.; Poor, H.V. Compound-Gaussian clutter modeling with an inverse Gaussian texture distribution. IEEE Signal Process. Lett. 2012, 19, 876–879. [Google Scholar] [CrossRef]
  12. Farshchian, M.; Posner, F.L. The Pareto distribution for low grazing angle and high resolution X-band sea clutter. In Proceedings of the 2010 IEEE Radar Conference, Washington, DC, USA, 10–14 May 2010. [Google Scholar]
  13. Amari, S.; Nagaoka, H. Methods of Information Geometry; AMS: Oxford, UK, 2000. [Google Scholar]
  14. Cao, L.; Sun, H.; Wang, X. The geometric structures of the Weibull distribution manifold and the generalized exponential distribution manifold. Tamkang J. Math. 2008, 39, 45–51. [Google Scholar] [CrossRef] [Green Version]
  15. Zhang, Z.; Sun, H.; Zhong, F. Information geometry of the power inverse Gaussian distribution. Appl. Sci. 2007, 9, 194–203. [Google Scholar]
  16. Al-Kutubi, H.S.; Ibrahim, N.A. Bayes estimator for exponential distribution with extension of Jeffery prior information. Malays. J. Math. Sci. 2009, 3, 297–313. [Google Scholar]
  17. Li, M.; Sun, H.; Peng, L. Fisher-Rao geometry and Jeffreys prior for Pareto distribution. Commun. Stat. Theory Methods 2020, 1–16. [Google Scholar] [CrossRef]
  18. Peng, L.; Sun, H.; Jiu, L. The geometric structure of the Pareto distribution. Bol. Asoc. Mat. Venez. 2007, 14, 5–13. [Google Scholar]
  19. Dibal, N.P.; Mobolaji, A.T.; Musa, Y.A. Bayes’ estimators of an exponentially distributed random variables using Al-Bayyati’s loss function. Int. J. Sci. Res. Publ. (IJSRP) 2019, 9, 674–684. [Google Scholar] [CrossRef]
  20. Contreras-Reyes, J.E.; Quintero, F.O.L.; Wiff, R. Bayesian Modeling of Individual Growth Variability Using Back-calculation: Application to Pink Cusk-eel (Genypterus Blacodes) off Chile. Ecol. Model. 2018, 385, 145–153. [Google Scholar] [CrossRef]
Figure 1. The influence of the change of α on probability distributions.
Figure 2. The influence of the change of β on probability distributions.
Figure 3. Sixty-four uniformly distributed geodesics centered on (0.5, 0.5).
Figure 4. Sixty-four uniformly distributed geodesics centered on (0.5, 1.5).
Figure 5. The variation of various Bayesian estimations of α with Al-Bayyati’s loss function parameter c.
Figure 6. The variation of various Bayesian estimations of β with Al-Bayyati’s loss function parameter c.
Figure 7. The variation of various Bayesian estimations of α with loss function parameter c when β = β_0.
Figure 8. The variation of various Bayesian estimations of β with loss function parameter c when α = α_0.
Figure 9. Posterior predictive distribution and underlying distribution (α and β are unknown).
Figure 10. Posterior predictive distribution and underlying distribution (α = α_0).
Figure 11. Posterior predictive distribution and underlying distribution (β = β_0).
Table 1. Mean geodesic estimations and the common Bayesian estimations (α, β are unknown).

(α_0, β_0) | (α̂_MLE, β̂_MLE) | (α̂_Me, β̂_Me) | (α̂_E, β̂_E) | (α̂_MGE, β̂_MGE)
(0.5, 0.5) | (0.5024, 0.5171) | (0.5017, 0.5169) | (0.5014, 0.5171) | (0.5014, 0.5171)
(0.5, 1.0) | (0.5001, 1.0292) | (0.4998, 1.0288) | (0.4997, 1.0292) | (0.4997, 1.0292)
(0.5, 1.5) | (0.5003, 1.5566) | (0.5001, 1.5561) | (0.5000, 1.5566) | (0.5000, 1.5566)
(1.0, 0.5) | (1.0053, 0.4743) | (1.0038, 0.4742) | (1.0032, 0.4743) | (1.0032, 0.4743)
(1.0, 1.0) | (1.0009, 1.0209) | (1.0002, 1.0205) | (0.9999, 1.0209) | (0.9999, 1.0209)
(1.0, 1.5) | (1.0001, 1.4639) | (0.9996, 1.4634) | (0.9994, 1.4639) | (0.9994, 1.4639)
(1.5, 0.5) | (1.5033, 0.4883) | (1.5012, 0.4881) | (1.5002, 0.4883) | (1.5002, 0.4883)
(1.5, 1.0) | (1.5003, 1.0226) | (1.4993, 1.0223) | (1.4988, 1.0226) | (1.4988, 1.0226)
(1.5, 1.5) | (1.5010, 1.4978) | (1.5003, 1.4973) | (1.5003, 1.4978) | (1.5000, 1.4978)
Table 2. Mean geodesic estimations and the common Bayesian estimations (β is known).

(α_0, β_0) | α̂_MLE(x|β_0) | α̂_Me(x|β_0) | α̂_E(x|β_0) | α̂_MGE(x|β_0)
(0.5, 0.5) | 0.5024 | 0.5017 | 0.5014 | 0.5014
(0.5, 1.0) | 0.5001 | 0.4998 | 0.4996 | 0.4996
(0.5, 1.5) | 0.5003 | 0.5000 | 0.4999 | 0.4999
(1.0, 0.5) | 1.0053 | 1.0039 | 1.0033 | 1.0033
(1.0, 1.0) | 1.0009 | 1.0002 | 0.9999 | 0.9999
(1.0, 1.5) | 1.0001 | 0.9996 | 0.9994 | 0.9994
(1.5, 0.5) | 1.5033 | 1.5012 | 1.5003 | 1.5003
(1.5, 1.0) | 1.5003 | 1.4992 | 1.4988 | 1.4988
(1.5, 1.5) | 1.5010 | 1.5003 | 1.5000 | 1.5000
Table 3. Mean geodesic estimations and the common Bayesian estimations (α is known).

(α_0, β_0) | β̂_MLE(x|α_0) | β̂_Me(x|α_0) | β̂_E(x|α_0) | β̂_MGE(x|α_0)
(0.5, 0.5) | 0.5158 | 0.5162 | 0.5163 | 0.5161
(0.5, 1.0) | 1.0289 | 1.0295 | 1.0299 | 1.0294
(0.5, 1.5) | 1.5553 | 1.5563 | 1.5568 | 1.5560
(1.0, 0.5) | 0.4732 | 0.4735 | 0.4736 | 0.4734
(1.0, 1.0) | 1.0199 | 1.0206 | 1.0210 | 1.0205
(1.0, 1.5) | 1.4638 | 1.4647 | 1.4652 | 1.4645
(1.5, 0.5) | 0.4877 | 0.4881 | 0.4882 | 0.4880
(1.5, 1.0) | 1.0224 | 1.0231 | 1.0234 | 1.0229
(1.5, 1.5) | 1.4921 | 1.4931 | 1.4936 | 1.4928
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
