Article

Improvement of Bobrovsky–Mayor–Wolf–Zakai Bound

by Ken-ichi Koike 1,* and Shintaro Hashimoto 2
1 College of Commerce, Nihon University, Tokyo 157-8570, Japan
2 Department of Mathematics, Hiroshima University, Hiroshima 739-8521, Japan
* Author to whom correspondence should be addressed.
Entropy 2021, 23(2), 161; https://doi.org/10.3390/e23020161
Submission received: 18 November 2020 / Revised: 26 January 2021 / Accepted: 26 January 2021 / Published: 28 January 2021
(This article belongs to the Special Issue Bayesian Inference and Computation)

Abstract

This paper presents a difference-type lower bound for the Bayes risk as a difference-type extension of the Borovkov–Sakhanenko bound. The resulting bound asymptotically improves the Bobrovsky–Mayor–Wolf–Zakai bound, which is a difference-type extension of the Van Trees bound. Some examples are also given.

1. Introduction

The Bayesian Cramér–Rao bound, or Van Trees bound [1], has been extended in a number of directions (e.g., [1,2,3]). For example, multivariate versions of such bounds are discussed in [4]. These bounds are used in many practical fields such as signal processing and nonlinear filtering. However, they are not always sharp. To improve them, Bhattacharyya-type extensions were provided in [5,6]. Bayesian bounds of this kind fall into two families, the Weiss–Weinstein family [7,8,9] and the Ziv–Zakai family [10,11,12]. The work in [13] is an excellent reference on this topic.
Recently, the authors of [14] showed that the Borovkov–Sakhanenko bound is asymptotically better than the Van Trees bound and asymptotically optimal in a certain class of bounds. In [15], some Bayesian bounds were compared from the point of view of asymptotic efficiency. Furthermore, necessary and sufficient conditions for the attainment of the Borovkov–Sakhanenko and Van Trees bounds were given in [16] for an exponential family with conjugate and Jeffreys priors.
On the other hand, the Bobrovsky–Mayor–Wolf–Zakai bound [17] is known as a difference-type (Chapman–Robbins-type) variation of the Van Trees bound. In this paper, we improve the Bobrovsky–Mayor–Wolf–Zakai bound by constructing a Chapman–Robbins-type extension of the Borovkov–Sakhanenko bound. The resulting bound belongs to the Weiss–Weinstein family.
As discussed later, the obtained bound is asymptotically superior to the Bobrovsky–Mayor–Wolf–Zakai bound for a sufficiently small perturbation and a large sample size. We also provide several examples for finite and large sample sizes, including conjugate normal and Bernoulli logit models.

2. Improvement of Bobrovsky–Mayor–Wolf–Zakai Bound

Let $X_1,\ldots,X_n$ be a sequence of independent, identically distributed (iid) random variables with density function $f_1(x|\theta)$ $(\theta\in\Theta=\mathbb{R}^1)$ with respect to a $\sigma$-finite measure $\mu$. Suppose that $f_1(x|\theta)$ is twice partially differentiable with respect to $\theta$ and that the support $\{x\mid f_1(x|\theta)>0\}$ of $f_1(x|\theta)$ does not depend on $\theta$. The joint probability density function of $X:=(X_1,\ldots,X_n)$ is $f(x|\theta):=\prod_{i=1}^n f_1(x_i|\theta)$, where $x=(x_1,\ldots,x_n)$. Let $\lambda(\theta)$ be a prior density of $\theta$ with respect to the Lebesgue measure. Consider the Bayesian estimation problem for a function $\varphi(\theta)$ of $\theta$ under the quadratic loss $L(\theta,a)=(a-\varphi(\theta))^2$. The joint pdf $f(x,\theta)$ of $(X,\theta)$ is given by $f(x,\theta)=f(x|\theta)\lambda(\theta)$. Hereafter, expectations under the probability densities $f(x,\theta)$ and $f(x|\theta)$ are denoted by $E(\cdot)$ and $E_\theta(\cdot)$, respectively. We often use prime notation for partial derivatives with respect to $\theta$ for brevity; for example, $\partial_\theta\varphi(\theta)$ is written $\varphi'(\theta)$.
In this paper, we assume the following regularity conditions (A1)–(A3).
(A1)
φ ( θ ) is twice differentiable.
(A2)
The Fisher information number satisfies
$$0 < I(\theta) = -E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\log f_1(X_1|\theta)\right\} = E_\theta\left[\left\{\frac{\partial}{\partial\theta}\log f_1(X_1|\theta)\right\}^2\right] < \infty$$
for arbitrary $\theta\in\Theta$, and $I(\theta)$ is continuously differentiable in $\Theta$.
(A3)
The prior density $\lambda(\theta)$ of $\theta$ is positive and differentiable for arbitrary $\theta\in\Theta$, and
$$\lim_{\theta\to\pm\infty}\lambda(\theta)=0.$$
Let $G_h = \frac{1}{h}\left\{\frac{f(x,\theta+h)}{f(x,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)} - \frac{\varphi'(\theta)}{I(\theta)}\right\}$. Considering the variance–covariance inequality for $G_h$, we have the following theorem for the Bayes risk.
Theorem 1.
Assume (A1)–(A3). For an estimator $\hat\varphi(X)$ of $\varphi(\theta)$ and a real number $h$, the inequality
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \frac{\{\mathrm{Cov}(G_h,\hat\varphi(X)-\varphi(\theta))\}^2}{E(G_h^2)} = \frac{\left[E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\dfrac{\varphi'(\theta)}{I(\theta)}\right]\right]^2}{E\left[\left\{\dfrac{f(X,\theta+h)}{f(X,\theta)}\dfrac{\varphi'(\theta+h)}{I(\theta+h)}-\dfrac{\varphi'(\theta)}{I(\theta)}\right\}^2\right]}\tag{1}$$
holds for the Bayes risk.
Bound (1) is directly derived as a special case of the Weiss–Weinstein class [7]. However, we prove it in Appendix B for the sake of clarity.
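When no closed form is available, the right-hand side of (1) can also be approximated by plain Monte Carlo over the joint density $f(x,\theta)=f(x|\theta)\lambda(\theta)$. The following sketch is only an illustration of this point and is not part of the derivation above; the model (iid $N(\theta,1)$ data with an $N(m,\tau^2)$ prior and $\varphi(\theta)=\theta^2$, so $I(\theta)=1$), the values of $n$, $m$, $\tau$, $h$, and all identifiers are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting (our choice): X_1,...,X_n iid N(theta, 1), prior theta ~ N(m, tau^2),
# target phi(theta) = theta^2, so I(theta) = 1 and phi'(theta) = 2*theta.
n, m, tau, h = 10, 1.0, 1.0, 0.1
reps = 200_000

theta = rng.normal(m, tau, size=reps)                  # theta drawn from the prior
x = rng.normal(theta[:, None], 1.0, size=(reps, n))    # X given theta

def log_joint(x, th):
    # log f(x, theta) up to additive constants (they cancel in the ratio below)
    return -0.5 * np.sum((x - th[:, None]) ** 2, axis=1) - 0.5 * (th - m) ** 2 / tau ** 2

def phi(th):
    return th ** 2

def dphi_over_I(th):
    return 2.0 * th                                    # phi'(theta)/I(theta), with I == 1 here

ratio = np.exp(log_joint(x, theta + h) - log_joint(x, theta))   # f(x, theta+h) / f(x, theta)
numerator = np.mean((phi(theta) - phi(theta - h)) * dphi_over_I(theta)) ** 2
denominator = np.mean((ratio * dphi_over_I(theta + h) - dphi_over_I(theta)) ** 2)

print("Monte Carlo estimate of the right-hand side of (1):", numerator / denominator)
```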
Note that
$$\lim_{h\to 0}G_h = \frac{1}{f(x,\theta)}\lim_{h\to 0}\frac{1}{h}\left\{f(x,\theta+h)\frac{\varphi'(\theta+h)}{I(\theta+h)}-f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\right\} = \frac{1}{f(x,\theta)}\frac{\partial}{\partial\theta}\left\{f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\right\}\;(=G_0,\text{ say}).\tag{2}$$
The Borovkov–Sakhanenko bound is obtained from the variance–covariance inequality for $G_0$:
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \frac{\{\mathrm{Cov}(G_0,\hat\varphi(X)-\varphi(\theta))\}^2}{E(G_0^2)} = \frac{\left[E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}\right]^2}{n E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}+E\left[\left\{\dfrac{(\varphi'(\theta)\lambda(\theta)/I(\theta))'}{\lambda(\theta)}\right\}^2\right]}\tag{3}$$
([2]). Since Bound (1) converges to Bound (3) as $h\to 0$ under Condition (B1) in Appendix A, Bound (1) for a sufficiently small $h$ is very close to Bound (3).
In a similar way, the Bobrovsky–Mayor–Wolf–Zakai bound is obtained from the variance–covariance inequality
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \frac{\{\mathrm{Cov}(B_h,\hat\varphi(X)-\varphi(\theta))\}^2}{E(B_h^2)} = \frac{[E\{\varphi(\theta)-\varphi(\theta-h)\}]^2}{E\left[\left\{\dfrac{f(X,\theta+h)}{f(X,\theta)}\right\}^2\right]-1},\tag{4}$$
where $B_h = \frac{1}{h}\left\{\frac{f(x,\theta+h)}{f(x,\theta)}-1\right\}$ ([17]). By applying $\lim_{h\to 0}B_h = B_0 = \frac{\partial f(x,\theta)/\partial\theta}{f(x,\theta)}$ to the variance–covariance inequality, we have the Van Trees bound, that is,
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \frac{\{\mathrm{Cov}(B_0,\hat\varphi(X)-\varphi(\theta))\}^2}{E(B_0^2)} = \frac{\{E(\varphi'(\theta))\}^2}{n E\{I(\theta)\}+E\left[\left\{\dfrac{\lambda'(\theta)}{\lambda(\theta)}\right\}^2\right]}.\tag{5}$$
Since $\lim_{h\to 0}B_h = B_0$, the value of Bobrovsky–Mayor–Wolf–Zakai Bound (4) converges to Van Trees Bound (5) as $h\to 0$ under (B2) in Appendix A. Hence, the value of Bound (4) for a sufficiently small $h$ is very close to that of Bound (5) in this case.
On the other hand, we often consider the normalized risk
$$\lim_{n\to\infty} n\,E(\hat\varphi(X)-\varphi(\theta))^2\tag{6}$$
(see [3,14]). For the evaluation of the normalized risk (6), Bayesian Cramér–Rao bounds can be used. For example, from Bound (3),
$$\lim_{n\to\infty} n\,E(\hat\varphi(X)-\varphi(\theta))^2 \ge \lim_{n\to\infty}\frac{n\left[E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}\right]^2}{n E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}+E\left[\left\{\dfrac{(\varphi'(\theta)\lambda(\theta)/I(\theta))'}{\lambda(\theta)}\right\}^2\right]} = E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}.\tag{7}$$
Moreover, the authors of [14,15] showed that the Borovkov–Sakhanenko bound is asymptotically optimal in some class, and asymptotically superior to the Van Trees bound, that is,
$$\lim_{n\to\infty}\frac{n\left[E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}\right]^2}{n E\left\{\dfrac{\varphi'(\theta)^2}{I(\theta)}\right\}+E\left[\left\{\dfrac{(\varphi'(\theta)\lambda(\theta)/I(\theta))'}{\lambda(\theta)}\right\}^2\right]} \ge \lim_{n\to\infty}\frac{n\{E(\varphi'(\theta))\}^2}{n E\{I(\theta)\}+E\left[\left\{\dfrac{\lambda'(\theta)}{\lambda(\theta)}\right\}^2\right]}.\tag{8}$$
Denote Borovkov–Sakhanenko Bound (3), Van Trees Bound (5), Bobrovsky–Mayor–Wolf–Zakai Bound (4), and Bound (1) by $\mathrm{BS}_n$, $\mathrm{VT}_n$, $\mathrm{BMZ}_{n,h}$, and $N_{n,h}$, respectively, when the sample size is $n$ and the perturbation is $h$. Then, (8) means
$$\lim_{n\to\infty}\frac{\mathrm{BS}_n}{\mathrm{VT}_n} = \lim_{n\to\infty}\frac{n\,\mathrm{BS}_n}{n\,\mathrm{VT}_n} = \frac{\lim_{n\to\infty} n\,\mathrm{BS}_n}{\lim_{n\to\infty} n\,\mathrm{VT}_n} \ge 1.\tag{9}$$
Hence, from (9),
$$\mathrm{BS}_n \ge \mathrm{VT}_n\tag{10}$$
holds for a sufficiently large $n$. Moreover, for this large $n\in\mathbb{N}$,
$$\lim_{h\to 0}N_{n,h} = \mathrm{BS}_n,\qquad \lim_{h\to 0}\mathrm{BMZ}_{n,h} = \mathrm{VT}_n\tag{11}$$
under (B1) and (B2). Hence, if Inequality (8) is strict, then $N_{n,h} > \mathrm{BMZ}_{n,h}$ for this large $n\in\mathbb{N}$ and a sufficiently small $h$ by (10) and (11). The equality in (8) holds if and only if $\varphi'(\theta)$ is proportional to $I(\theta)$. Therefore, Bound (1) is asymptotically superior to Bobrovsky–Mayor–Wolf–Zakai Bound (4) for a sufficiently small $h$.
However, the comparison between Bounds (1) and (4) is not easy for a finite $n$. Hence, we now show comparisons of various existing bounds in two simple examples for fixed $n\in\mathbb{N}$ and $h\in\mathbb{R}^1$.
Example 1.
Let $X_1,\ldots,X_n$ be a sequence of iid random variables distributed according to $N(\theta,1)$ $(\theta\in\Theta=\mathbb{R}^1)$. We show that Bound (1) is asymptotically tighter than Bobrovsky–Mayor–Wolf–Zakai Bound (4) for a sufficiently large $n$. Suppose that the prior of $\theta$ is $N(m,\tau^2)$, where $m$ and $\tau>0$ are known constants. Denote $X=(X_1,\ldots,X_n)$ and $x=(x_1,\ldots,x_n)$. In this model, the Fisher information $I(\theta)$ per observation equals 1. We consider the estimation problem for $\varphi(\theta)=\theta^2$, since Bound (1) coincides with Bound (4) for $\varphi(\theta)=\theta$ (see also [5,6]).
First, we calculated Bobrovsky–Mayor–Wolf–Zakai Bound (4). The ratio of $f(x,\theta+h)$ and $f(x,\theta)$ is
$$\frac{f(x,\theta+h)}{f(x,\theta)} = \exp\left\{hT-\frac{n}{2}(2h\theta+h^2)-\frac{h}{2\tau^2}(2\theta-2m+h)\right\},\tag{12}$$
where $T=\sum_{i=1}^n X_i$. Since the conditional distribution of $T$ given $\theta$ is $N(n\theta,n)$ and its moment generating function $g_T(s)$ is
$$g_T(s) = \exp\left(sn\theta+\frac{s^2 n}{2}\right),\tag{13}$$
the conditional expectation $E_{T|\theta}\{\exp(2hT)\}$ is
$$E_{T|\theta}\{\exp(2hT)\} = g_T(2h) = \exp\left(2hn\theta+2h^2 n\right),\tag{14}$$
where $E_{T|\theta}(\cdot)$ denotes the conditional expectation with respect to the conditional distribution of $T$ given $\theta$. Then, from (12) and (14), we have that
$$\begin{aligned}E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\right\}^2\right] &= E\left[\exp\left\{2hT-n(2h\theta+h^2)-\frac{h}{\tau^2}(2\theta-2m+h)\right\}\right]\\ &= E\left[E_{T|\theta}\{\exp(2hT)\}\exp\left\{-nh^2-\frac{h}{\tau^2}(-2m+h)-2h\theta\left(n+\frac{1}{\tau^2}\right)\right\}\right]\\ &= E\left[\exp\left(nh^2+\frac{2hm}{\tau^2}-\frac{h^2}{\tau^2}\right)\exp\left(-\frac{2h\theta}{\tau^2}\right)\right]\\ &= \exp\left(nh^2+\frac{2hm}{\tau^2}-\frac{h^2}{\tau^2}\right)\exp\left(-\frac{2hm}{\tau^2}+\frac{2h^2}{\tau^2}\right) = \exp\left\{h^2\left(n+\frac{1}{\tau^2}\right)\right\}.\end{aligned}\tag{15}$$
We can easily obtain $E\{\varphi(\theta)-\varphi(\theta-h)\} = E\{h(2\theta-h)\} = h(2m-h)$. Hence, from (15), Bobrovsky–Mayor–Wolf–Zakai Bound (4) is equal to
$$\frac{\{h(2m-h)\}^2}{\exp\left\{h^2\left(n+\frac{1}{\tau^2}\right)\right\}-1}\;(=\mathrm{BMZ}_h,\text{ say}).\tag{16}$$
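The closed form (16) is straightforward to evaluate numerically. The following minimal sketch (ours; the parameter values are arbitrary illustrations, not taken from the paper) implements $\mathrm{BMZ}_h$ as a function of $h$, $n$, $m$, and $\tau$.

```python
import numpy as np

def bmz_bound(h, n, m, tau):
    """Bobrovsky-Mayor-Wolf-Zakai bound (16) for the N(theta,1) model with an N(m, tau^2) prior
    and phi(theta) = theta^2."""
    return (h * (2.0 * m - h)) ** 2 / (np.exp(h ** 2 * (n + 1.0 / tau ** 2)) - 1.0)

# Illustrative values only: the bound deteriorates quickly as |h| grows because of the exponential term.
for h in (0.5, 0.1, 0.01):
    print(h, bmz_bound(h, n=20, m=1.0, tau=1.0))
```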
Next, we calculated Bound (1). Since $I(\theta)=1$, $\varphi(\theta)=\theta^2$, and $\varphi'(\theta)=2\theta$,
$$E\left[\frac{\varphi'(\theta)}{I(\theta)}\{\varphi(\theta)-\varphi(\theta-h)\}\right] = E\left(4\theta^2 h-2\theta h^2\right) = 2h\{2(m^2+\tau^2)-mh\}.\tag{17}$$
Since
$$\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)} = 2(\theta+h)\exp\left\{hT-\frac{n}{2}(2h\theta+h^2)-\frac{h}{2\tau^2}(2\theta-2m+h)\right\},\tag{18}$$
we have
$$\begin{aligned}E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}\right\}^2\right] &= E\left[4(\theta+h)^2 E_{T|\theta}\{\exp(2hT)\}\exp\left\{-n(2h\theta+h^2)-\frac{h}{\tau^2}(2\theta-2m+h)\right\}\right]\\ &= 4\exp\left\{nh^2-\frac{h}{\tau^2}(-2m+h)\right\}E\left[(\theta+h)^2\exp\left(-\frac{2h}{\tau^2}\theta\right)\right]\end{aligned}\tag{19}$$
from (18) and (14). Here, since the moment-generating function $g_\theta(s)$ of $\theta$ is $g_\theta(s) = E\{\exp(s\theta)\} = \exp\left(sm+\frac{s^2\tau^2}{2}\right)$,
$$g_\theta'(s) = E\{\theta\exp(s\theta)\} = (m+s\tau^2)\exp\left(sm+\frac{s^2\tau^2}{2}\right),\qquad g_\theta''(s) = E\{\theta^2\exp(s\theta)\} = \left\{\tau^2+(m+s\tau^2)^2\right\}\exp\left(sm+\frac{s^2\tau^2}{2}\right).\tag{20}$$
So, from (20) with $s=-2h/\tau^2$, we obtain
$$E\left\{\exp\left(-\frac{2h}{\tau^2}\theta\right)\right\} = \exp\left(-\frac{2hm}{\tau^2}+\frac{2h^2}{\tau^2}\right),\qquad E\left\{\theta\exp\left(-\frac{2h}{\tau^2}\theta\right)\right\} = (m-2h)\exp\left(-\frac{2hm}{\tau^2}+\frac{2h^2}{\tau^2}\right),\qquad E\left\{\theta^2\exp\left(-\frac{2h}{\tau^2}\theta\right)\right\} = \left\{\tau^2+(m-2h)^2\right\}\exp\left(-\frac{2hm}{\tau^2}+\frac{2h^2}{\tau^2}\right).\tag{21}$$
Hence, from (19) and (21),
$$E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}\right\}^2\right] = 4\left\{\tau^2+(m-h)^2\right\}\exp\left\{h^2\left(n+\frac{1}{\tau^2}\right)\right\}.\tag{22}$$
Moreover, we have
$$\begin{aligned}E\left[\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}\frac{\varphi'(\theta)}{I(\theta)}\right] &= 4E\left[\theta(\theta+h)\frac{f(X,\theta+h)}{f(X,\theta)}\right] = 4\iint\theta(\theta+h)\frac{f(x,\theta+h)}{f(x,\theta)}f(x,\theta)\,d\theta\,d\mu(x)\\ &= 4\iint(t-h)t\,f(x,t)\,dt\,d\mu(x)\quad(\text{substitute } t=\theta+h)\\ &= 4E\{(\theta-h)\theta\} = 4(m^2+\tau^2-hm),\end{aligned}\tag{23}$$
and
$$E\left[\left\{\frac{\varphi'(\theta)}{I(\theta)}\right\}^2\right] = 4E(\theta^2) = 4(\tau^2+m^2).\tag{24}$$
From (22)–(24),
$$E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}-\frac{\varphi'(\theta)}{I(\theta)}\right\}^2\right] = 4\left\{\tau^2+(m-h)^2\right\}\exp\left\{h^2\left(n+\frac{1}{\tau^2}\right)\right\}-4(m^2+\tau^2-2hm).\tag{25}$$
Therefore, Bound (1) is equal to
$$\frac{h^2\left\{2(m^2+\tau^2)-mh\right\}^2}{\left\{\tau^2+(m-h)^2\right\}\exp\left\{h^2\left(n+\frac{1}{\tau^2}\right)\right\}-(m^2+\tau^2-2hm)}\;(=N_h,\text{ say}).\tag{26}$$
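A direct implementation of (26), again only a sketch with arbitrary illustrative parameter values, allows a quick numerical comparison with (16).

```python
import numpy as np

def bmz_bound(h, n, m, tau):
    """Bound (16), repeated here so that the sketch is self-contained."""
    return (h * (2.0 * m - h)) ** 2 / (np.exp(h ** 2 * (n + 1.0 / tau ** 2)) - 1.0)

def n_bound(h, n, m, tau):
    """Bound (26): the Chapman-Robbins-type improvement N_h for the same normal model."""
    num = h ** 2 * (2.0 * (m ** 2 + tau ** 2) - m * h) ** 2
    den = ((tau ** 2 + (m - h) ** 2) * np.exp(h ** 2 * (n + 1.0 / tau ** 2))
           - (m ** 2 + tau ** 2 - 2.0 * h * m))
    return num / den

# For small h and moderately large n, N_h exceeds BMZ_h, in line with the discussion above.
for h in (0.5, 0.1, 0.01):
    print(h, bmz_bound(h, n=20, m=1.0, tau=1.0), n_bound(h, n=20, m=1.0, tau=1.0))
```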
Lastly, we compare (1) and (4). From Bounds (1) and (4), we have
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \mathrm{BMZ}_h,\ N_h\tag{27}$$
for arbitrary $h\in\mathbb{R}^1$. In general, while the Bayes risk is $O(n^{-1})$, the bounds $\mathrm{BMZ}_h$ and $N_h$ are $O(\exp(-nh^2))$; that is, they decrease exponentially for $h\ne 0$ as $n\to\infty$. Thus, we take the limit as $h\to 0$ in order to obtain an asymptotically tighter bound. Define $\lim_{h\to 0}\mathrm{BMZ}_h = \mathrm{BMZ}_0$ and $\lim_{h\to 0}N_h = N_0$. Since
$$\mathrm{BMZ}_0 = \frac{4m^2}{n+\frac{1}{\tau^2}},\qquad N_0 = \frac{4(m^2+\tau^2)}{\frac{1}{m^2+\tau^2}+n+\frac{1}{\tau^2}}\tag{28}$$
from (16) and (26), we may compare their reciprocals, $4/\mathrm{BMZ}_0$ and $4/N_0$, in order to compare $\mathrm{BMZ}_0$ and $N_0$. Here, $\mathrm{BMZ}_0$ and $N_0$ are the Van Trees and Borovkov–Sakhanenko bounds, respectively, and the Borovkov–Sakhanenko bound is asymptotically tighter than the Van Trees bound. In this case, the Borovkov–Sakhanenko bound is also tighter than the Van Trees bound for each fixed $n$. In fact, since the difference is
$$\frac{4}{N_0}-\frac{4}{\mathrm{BMZ}_0} = \frac{1}{m^2(m^2+\tau^2)}\left(\frac{m^2}{m^2+\tau^2}-1-n\tau^2\right) < \frac{1}{m^2(m^2+\tau^2)}\left(1-1-n\tau^2\right) = -\frac{n\tau^2}{m^2(m^2+\tau^2)} < 0\tag{29}$$
from (28), we have $4/\mathrm{BMZ}_0 > 4/N_0$ and hence $\mathrm{BMZ}_0 < N_0$ for all $n\in\mathbb{N}$.
Next, we compare these bounds with the Bayes risk of the Bayes estimator $\hat\varphi_B(X)$ of $\varphi(\theta)=\theta^2$. The Bayes estimator $\hat\varphi_B(X)$ is given by
$$\hat\varphi_B(X) = \frac{1}{n+(1/\tau^2)}+\left\{\frac{T+(m/\tau^2)}{n+(1/\tau^2)}\right\}^2.\tag{30}$$
Then, the Bayes risk of (30) is
$$E(\hat\varphi_B(X)-\varphi(\theta))^2 = \frac{2\tau^2(2n\tau^4+2m^2\tau^2 n+2m^2+\tau^2)}{(n\tau^2+1)^2} = \frac{4(m^2+\tau^2)}{n}-\frac{2(2m^2+3\tau^2)}{\tau^2 n^2}+O\left(n^{-3}\right)\quad(n\to\infty).\tag{31}$$
Then, the normalized risk satisfies
$$\lim_{n\to\infty} n\,E(\hat\varphi_B(X)-\varphi(\theta))^2 = 4(m^2+\tau^2) = \lim_{n\to\infty} n\,N_0 > 4m^2 = \lim_{n\to\infty} n\,\mathrm{BMZ}_0.\tag{32}$$
Thus, the Van Trees bound is not asymptotically tight, while the Borovkov–Sakhanenko bound is asymptotically tight.
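The comparison in (31) and (32) can also be checked by simulation. The sketch below (ours; the hyperparameters are arbitrary) draws $\theta$ from the prior, simulates $T=\sum_i X_i$, applies Bayes estimator (30), and compares the empirical Bayes risk with the exact expression (31) and with the limits in (28).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, tau, reps = 50, 1.0, 1.0, 400_000          # illustrative values only

theta = rng.normal(m, tau, size=reps)            # theta from the prior
T = rng.normal(n * theta, np.sqrt(n))            # T = sum of the X_i given theta is N(n*theta, n)

post_prec = n + 1.0 / tau ** 2
phi_hat = 1.0 / post_prec + ((T + m / tau ** 2) / post_prec) ** 2     # Bayes estimator (30)

emp_risk = np.mean((phi_hat - theta ** 2) ** 2)
exact_risk = (2 * tau**2 * (2*n*tau**4 + 2*m**2*tau**2*n + 2*m**2 + tau**2)
              / (n*tau**2 + 1) ** 2)             # closed form (31)

N0 = 4 * (m**2 + tau**2) / (1.0 / (m**2 + tau**2) + n + 1.0 / tau**2)  # Borovkov-Sakhanenko limit, (28)
BMZ0 = 4 * m**2 / (n + 1.0 / tau**2)                                   # Van Trees limit, (28)

print(emp_risk, exact_risk)   # agree up to Monte Carlo error
print(N0, BMZ0)               # both lie below the Bayes risk, with N0 the tighter of the two
```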
Example 2.
We considered the Bernoulli logit model of Example 2 in [16] when the sample size was 1. In this case, Bound (1) is not always better than Bobrovsky–Mayor–Wolf–Zakai Bound (4). Let $X$ have the Bernoulli distribution $\mathrm{Ber}\left(\frac{e^\theta}{1+e^\theta}\right)$ $(\theta\in\mathbb{R}^1)$. Then, the probability density function of $X$ given $\theta$ is
$$f(x|\theta) = e^{\theta x}\,\frac{1}{1+e^\theta}\quad(x=0,1).\tag{33}$$
It is assumed that the prior density of $\theta$ is the conjugate prior, a version of the Type IV generalized logistic distribution (e.g., [18]); then,
$$\lambda(\theta) = 30\,\frac{e^{3\theta}}{(1+e^\theta)^6}\quad(\theta\in\mathbb{R}^1).\tag{34}$$
We set the hyperparameters to these values for some moment conditions. In this case, Fisher information for Model (33) is given by
$$I(\theta) = \frac{e^{\theta}}{(1+e^\theta)^2},\tag{35}$$
and we considered the estimation problem of φ ( θ ) = θ .
In this example, we first calculated Bound (1). Combining (33)–(35), we have
$$\frac{f(x,\theta+h)}{f(x,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)} = e^{hx}\,\frac{(1+e^\theta)^7}{(1+e^{\theta+h})^5}\,e^{2h-\theta}\tag{36}$$
for $h\in\mathbb{R}^1$. Since $X$ given $\theta$ is distributed as $\mathrm{Ber}\left(\frac{e^\theta}{1+e^\theta}\right)$, it holds that
$$E_{X|\theta}\left(e^{2hX}\right) = \frac{1+e^{\theta+2h}}{1+e^\theta},\tag{37}$$
where $E_{X|\theta}(\cdot)$ means the expectation with respect to the conditional distribution of $X$ given $\theta$. Then, we have
$$\begin{aligned}E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}\right\}^2\right] &= E\left[E_{X|\theta}\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\right\}^2\right]\frac{1}{I(\theta+h)^2}\right]\\ &= E\left[\frac{(1+e^\theta)^{14}}{(1+e^{\theta+h})^{10}}\,e^{4h-2\theta}\,E_{X|\theta}\left(e^{2hX}\right)\right] = E\left[\frac{(1+e^\theta)^{13}(1+e^{\theta+2h})}{(1+e^{\theta+h})^{10}}\,e^{4h-2\theta}\right]\\ &= 30\,e^{4h}\int_{-\infty}^{\infty}\frac{(1+e^\theta)^{7}(1+e^{\theta+2h})\,e^{\theta}}{(1+e^{\theta+h})^{10}}\,d\theta\\ &= \frac{5}{6}\left\{10\cosh(h)+10\cosh(2h)+10\cosh(3h)+\cosh(4h)+5\right\},\end{aligned}\tag{38}$$
by (35) and (37), where $\cosh(x)=\frac{1}{2}(e^x+e^{-x})$ is the hyperbolic cosine. Moreover, we have
$$\begin{aligned}E\left[\frac{f(X,\theta+h)}{f(X,\theta)}\frac{\varphi'(\theta+h)\varphi'(\theta)}{I(\theta+h)I(\theta)}\right] &= \iint f(x,\theta+h)\frac{1}{I(\theta+h)I(\theta)}\,d\theta\,dF(x)\\ &= \iint f(x,t)\frac{1}{I(t)I(t-h)}\,dt\,dF(x)\quad(\text{substitute } t=\theta+h)\\ &= \int_{-\infty}^{\infty}\lambda(t)\frac{1}{I(t)I(t-h)}\,dt = 30\int_{-\infty}^{\infty}\frac{(1+e^{t-h})^2\,e^{t+h}}{(1+e^t)^4}\,dt\\ &= 10\left\{1+2\cosh(h)\right\},\end{aligned}\tag{39}$$
where $F(\cdot)$ is the cumulative distribution function of $\mathrm{Ber}\left(\frac{e^\theta}{1+e^\theta}\right)$. In a similar way, we have
$$E\left[\left\{\frac{\varphi'(\theta)}{I(\theta)}\right\}^2\right] = E\left[\frac{(1+e^\theta)^4}{e^{2\theta}}\right] = 30\int_{-\infty}^{\infty}\frac{e^{\theta}}{(1+e^\theta)^2}\,d\theta = 30\tag{40}$$
and
$$E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\frac{\varphi'(\theta)}{I(\theta)}\right] = E\left\{\frac{h}{I(\theta)}\right\} = 30h\int_{-\infty}^{\infty}\frac{e^{2\theta}}{(1+e^\theta)^4}\,d\theta = 5h.\tag{41}$$
Hence, we can show from (38)–(41) that the right-hand side of (1) equals
$$\begin{aligned}&\frac{\left[E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\dfrac{\varphi'(\theta)}{I(\theta)}\right]\right]^2}{E\left[\left\{\dfrac{f(X,\theta+h)}{f(X,\theta)}\dfrac{\varphi'(\theta+h)}{I(\theta+h)}\right\}^2\right]-2E\left[\dfrac{f(X,\theta+h)}{f(X,\theta)}\dfrac{\varphi'(\theta+h)\varphi'(\theta)}{I(\theta+h)I(\theta)}\right]+E\left[\left\{\dfrac{\varphi'(\theta)}{I(\theta)}\right\}^2\right]}\\ &\qquad= \frac{(5h)^2}{\frac{10}{3}\sinh^2\left(\frac{h}{2}\right)\left\{33\cosh(h)+12\cosh(2h)+\cosh(3h)+8\right\}}\\ &\qquad= \frac{30h^2}{10\cosh(2h)+10\cosh(3h)+\cosh(4h)-38\cosh(h)+17}\;(=N_h,\text{ say}),\end{aligned}\tag{42}$$
where $\sinh(x)=\frac{1}{2}(e^x-e^{-x})$ is the hyperbolic sine. Borovkov–Sakhanenko Bound (3) is calculated as
$$N_0 = \lim_{h\to 0}N_h = \frac{5}{9}\approx 0.556.\tag{43}$$
Next, we calculated Bound (4). In a similar way to (38) and (39), we have
$$\begin{aligned}E\left[\left\{\frac{f(X,\theta+h)}{f(X,\theta)}\right\}^2\right] &= E\left[\frac{(1+e^{\theta+2h})(1+e^\theta)^{13}}{(1+e^{\theta+h})^{14}}\,e^{6h}\right] = 30\,e^{6h}\int_{-\infty}^{\infty}\frac{(1+e^{\theta+2h})(1+e^\theta)^{7}\,e^{3\theta}}{(1+e^{\theta+h})^{14}}\,d\theta\\ &= \frac{1}{858}\left\{318\cosh(h)+231\cosh(2h)+116\cosh(3h)+18\cosh(4h)+175\right\}\end{aligned}\tag{44}$$
and $E\{\varphi(\theta)-\varphi(\theta-h)\}=h$. Hence, by substituting (44) into (4), we have
$$\begin{aligned}\frac{[E\{\varphi(\theta)-\varphi(\theta-h)\}]^2}{E\left[\left\{\dfrac{f(X,\theta+h)}{f(X,\theta)}\right\}^2\right]-1} &= \frac{h^2}{\frac{1}{858}\left\{318\cosh(h)+231\cosh(2h)+116\cosh(3h)+18\cosh(4h)+175\right\}-1}\\ &= \frac{858\,h^2}{318\cosh(h)+231\cosh(2h)+116\cosh(3h)+18\cosh(4h)-683}\;(=\mathrm{BMZ}_h,\text{ say}).\end{aligned}\tag{45}$$
The Van Trees bound is calculated as
$$\mathrm{BMZ}_0 = \lim_{h\to 0}\mathrm{BMZ}_h = \frac{2}{3}\approx 0.667.\tag{46}$$
Lastly, we compute the Bayes risk of the Bayes estimator $\hat\theta_B(X)$ of $\theta$, as follows. Since the posterior density of $\theta$, given $X=x$, is
$$60\,\frac{e^{\theta(x+3)}}{(1+e^\theta)^7}\quad(\theta\in\mathbb{R}^1),\tag{47}$$
the Bayes estimator is calculated as $\hat\theta_B(0) = E(\theta\mid X=0) = 60\int_{-\infty}^{\infty}\theta\,\frac{e^{3\theta}}{(1+e^\theta)^7}\,d\theta = -1/3$ and $\hat\theta_B(1) = E(\theta\mid X=1) = 60\int_{-\infty}^{\infty}\theta\,\frac{e^{4\theta}}{(1+e^\theta)^7}\,d\theta = 1/3$. Then, by an easy but tedious calculation, the Bayes risk of $\hat\theta_B$ is
$$E\left(\hat\theta_B-\theta\right)^2 = \frac{\pi^2}{3}-\frac{47}{18}\approx 0.679.\tag{48}$$
Then, we can plot the values of N h , N 0 , BMZ h , BMZ 0 and the Bayes risk of θ ^ B from (42)–(46) and (48) (Figure 1). Figure 1 shows that Bound (1) is lower than Bound (4) for any h under Prior (34) when the sample size equals 1. However, in Section 3, we show by using the Laplace method that Bound (1) is tighter than Bound (4) for a large sample size.
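The quantities plotted in Figure 1 can be regenerated directly from the closed forms (42), (45), and (48). The following sketch (ours; the grid of $h$ values is arbitrary) evaluates both bounds and confirms numerically that they stay below the Bayes risk and tend to $N_0=5/9$ and $\mathrm{BMZ}_0=2/3$ as $h\to 0$.

```python
import numpy as np

def n_h(h):
    """Bound (42) for the Bernoulli logit model with prior (34) and sample size 1."""
    return 30 * h**2 / (10*np.cosh(2*h) + 10*np.cosh(3*h) + np.cosh(4*h) - 38*np.cosh(h) + 17)

def bmz_h(h):
    """Bound (45) for the same model."""
    return 858 * h**2 / (318*np.cosh(h) + 231*np.cosh(2*h) + 116*np.cosh(3*h)
                         + 18*np.cosh(4*h) - 683)

h = np.linspace(-3.0, 3.0, 601)
h = h[h != 0.0]                      # at h = 0 the bounds are defined by their limits (43) and (46)
bayes_risk = np.pi**2 / 3 - 47/18    # (48)

print(n_h(0.01), bmz_h(0.01))        # close to N_0 = 5/9 and BMZ_0 = 2/3, respectively
print(n_h(h).max() <= bayes_risk, bmz_h(h).max() <= bayes_risk)   # both bounds stay below the Bayes risk
```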

3. Asymptotic Comparison by Laplace Approximation

In this section, we consider Example 2 of the previous section again, now with sample size $n$. Using the Laplace method, we show that Bound (1) is asymptotically better than Bound (4) for a sufficiently large $n$; the two bounds are compared only through their leading terms as $n\to\infty$. The probability density function of $X_i$ given $\theta$ is
$$f(x_i|\theta) = \left(\frac{e^\theta}{1+e^\theta}\right)^{x_i}\left(\frac{1}{1+e^\theta}\right)^{1-x_i} = e^{\theta x_i}\,\frac{1}{1+e^\theta}\quad(x_i=0,1;\ \theta\in\mathbb{R}^1;\ i=1,\ldots,n)\tag{49}$$
and the likelihood ratio of (49) is
$$\frac{f(x_i|\theta+h)}{f(x_i|\theta)} = \frac{e^{(\theta+h)x_i}\,\frac{1}{1+e^{\theta+h}}}{e^{\theta x_i}\,\frac{1}{1+e^{\theta}}} = e^{hx_i}\,\frac{1+e^\theta}{1+e^{\theta+h}}\quad(h\in\mathbb{R}^1).\tag{50}$$
Assume that the prior density of $\theta$ is
$$\lambda(\theta) = \frac{1}{B(c_1,c_2-c_1)}\,\frac{e^{c_1\theta}}{(1+e^\theta)^{c_2}}\quad(\theta\in\mathbb{R}^1;\ c_2>c_1+1>2).\tag{51}$$
Then, the ratio of (51) is equal to
$$\frac{\lambda(\theta+h)}{\lambda(\theta)} = \frac{e^{(\theta+h)c_1}\,(1+e^\theta)^{c_2}}{e^{\theta c_1}\,(1+e^{\theta+h})^{c_2}} = e^{c_1h}\,\frac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{c_2}}.\tag{52}$$
By denoting $X=(X_1,\ldots,X_n)$ and $x=(x_1,\ldots,x_n)$, the ratio of the joint probability density functions of $(X,\theta)$ is
$$P := \frac{f(x,\theta+h)}{f(x,\theta)} = \prod_{i=1}^n\frac{f(x_i|\theta+h)}{f(x_i|\theta)}\cdot\frac{\lambda(\theta+h)}{\lambda(\theta)} = e^{h\sum_{i=1}^n x_i}\,\frac{(1+e^\theta)^{n+c_2}}{(1+e^{\theta+h})^{n+c_2}}\,e^{c_1h}\tag{53}$$
by the iid assumption of $X_i\mid\theta$, (50), and (52). From (53), we have
$$E(P^2) = E\left[e^{2h\sum_{i=1}^n X_i}\,\frac{(1+e^\theta)^{2n+2c_2}}{(1+e^{\theta+h})^{2n+2c_2}}\,e^{2c_1h}\right] = E\left[E_{X|\theta}\left\{e^{2h\sum_{i=1}^n X_i}\right\}\frac{(1+e^\theta)^{2n+2c_2}}{(1+e^{\theta+h})^{2n+2c_2}}\,e^{2c_1h}\right] = E\left[\left\{E_{X_1|\theta}\left(e^{2hX_1}\right)\right\}^n\frac{(1+e^\theta)^{2n+2c_2}}{(1+e^{\theta+h})^{2n+2c_2}}\,e^{2c_1h}\right].\tag{54}$$
By (37), we have
$$\begin{aligned}E(P^2) &= E\left[\frac{(1+e^\theta)^{n+2c_2}(1+e^{\theta+2h})^n}{(1+e^{\theta+h})^{2n+2c_2}}\,e^{2c_1h}\right]\\ &= \frac{e^{2c_1h}}{B(c_1,c_2-c_1)}\int_{-\infty}^{\infty}\frac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{2c_2}}\,e^{c_1\theta}\left\{\frac{(1+e^\theta)(1+e^{\theta+2h})}{(1+e^{\theta+h})^2}\right\}^n d\theta.\end{aligned}\tag{55}$$
Here, we consider the Laplace approximation of the integral
$$I_1 = \int_{-\infty}^{\infty}\frac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{2c_2}}\,e^{c_1\theta}\left\{\frac{(1+e^\theta)(1+e^{\theta+2h})}{(1+e^{\theta+h})^2}\right\}^n d\theta\tag{56}$$
(see, e.g., [19]). $I_1$ can be expressed as
$$I_1 = \int_{-\infty}^{\infty} g_1(\theta)\exp\{nk(\theta)\}\,d\theta,\tag{57}$$
where $g_1(\theta) = \dfrac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{2c_2}}\,e^{c_1\theta}$ and $k(\theta) = \log\dfrac{(1+e^\theta)(1+e^{\theta+2h})}{(1+e^{\theta+h})^2}$.
Since
$$k'(\theta) = \frac{e^\theta(1-e^h)^2(1-e^{\theta+h})}{(1+e^\theta)(1+e^{\theta+h})(1+e^{\theta+2h})},\tag{58}$$
if $k'(\theta)=0$, then $\theta=-h$, and $k$ takes its maximum at $\theta=-h$,
$$k''(-h) = -\frac{1}{2}\tanh^2\left(\frac{h}{2}\right) < 0,\tag{59}$$
and $k''(-h)\to 0$ as $h\to 0$. Therefore, the Laplace approximation of $I_1$ gives
$$I_1 \approx \exp\{nk(-h)\}\,g_1(-h)\sqrt{\frac{2\pi}{n\{-k''(-h)\}}}\tag{60}$$
as $n\to\infty$ from (57)–(59). Here, we have $k(-h) = \log\{(1+e^{-h})(1+e^{h})/4\}\ge 0$ since $e^h+e^{-h}\ge 2$ by the arithmetic–geometric mean inequality. The equality holds if and only if $h=0$. Hence, the leading term of Bobrovsky–Mayor–Wolf–Zakai Bound (4) is
$$\frac{h^2 e^{-2c_1 h}B(c_1,c_2-c_1)}{J_n(h)}\tag{61}$$
as $n\to\infty$, from (55) and (60), where
$$J_n(h) = \exp\{nk(-h)\}\,g_1(-h)\sqrt{\frac{2\pi}{n\{-k''(-h)\}}}.\tag{62}$$
In a similar way to the above, defining
$$Q := \frac{f(x,\theta+h)}{f(x,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)} = P\,\frac{(1+e^{\theta+h})^2}{e^{\theta+h}},\tag{63}$$
we calculate
$$\begin{aligned}E(Q^2) &= E\left[\frac{(1+e^\theta)^{n+2c_2}(1+e^{\theta+2h})^n}{(1+e^{\theta+h})^{2n+2c_2-4}}\,e^{2c_1h-2\theta-2h}\right]\\ &= \frac{e^{2(c_1-1)h}}{B(c_1,c_2-c_1)}\int_{-\infty}^{\infty}\frac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{2c_2-4}}\,e^{(c_1-2)\theta}\left\{\frac{(1+e^\theta)(1+e^{\theta+2h})}{(1+e^{\theta+h})^2}\right\}^n d\theta.\end{aligned}\tag{64}$$
Here, we consider the Laplace approximation of the integral
$$I_2 = \int_{-\infty}^{\infty} g_2(\theta)\exp\{nk(\theta)\}\,d\theta,\tag{65}$$
where $g_2(\theta) = \dfrac{(1+e^\theta)^{c_2}}{(1+e^{\theta+h})^{2c_2-4}}\,e^{(c_1-2)\theta}$ and $k(\theta)$ is defined in (57). The Laplace approximation of $I_2$ gives
$$I_2 \approx \exp\{nk(-h)\}\,g_2(-h)\sqrt{\frac{2\pi}{n\{-k''(-h)\}}} = 2^4 e^{2h}J_n(h)\tag{66}$$
as $n\to\infty$. Similarly to (41), we have
$$\begin{aligned}E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\frac{\varphi'(\theta)}{I(\theta)}\right] &= E\left[h\,\frac{(1+e^\theta)^2}{e^\theta}\right] = \frac{h}{B(c_1,c_2-c_1)}\int_{-\infty}^{\infty}\frac{e^{(c_1-1)\theta}}{(1+e^\theta)^{c_2-2}}\,d\theta\\ &= \frac{h}{B(c_1,c_2-c_1)}\int_0^1 t^{c_1-2}(1-t)^{c_2-c_1-2}\,dt\quad(\text{substitute } t=e^\theta/(1+e^\theta))\\ &= \frac{h\,B(c_1-1,c_2-c_1-1)}{B(c_1,c_2-c_1)}.\end{aligned}\tag{67}$$
Hence, by using (64)–(67), the leading term of Bound (1) is
$$\frac{\left\{\dfrac{h\,B(c_1-1,c_2-c_1-1)}{B(c_1,c_2-c_1)}\right\}^2}{\dfrac{2^4 e^{2c_1 h}}{B(c_1,c_2-c_1)}\,J_n(h)}\tag{68}$$
as $n\to\infty$. Dividing (61) by (68) yields
$$\frac{\dfrac{h^2 e^{-2c_1 h}B(c_1,c_2-c_1)}{J_n(h)}}{\dfrac{\left\{h\,B(c_1-1,c_2-c_1-1)/B(c_1,c_2-c_1)\right\}^2}{2^4 e^{2c_1 h}J_n(h)/B(c_1,c_2-c_1)}} = 2^4\left\{\frac{B(c_1,c_2-c_1)}{B(c_1-1,c_2-c_1-1)}\right\}^2 = 2^4\left\{\frac{(c_1-1)(c_2-c_1-1)}{(c_2-2)(c_2-1)}\right\}^2 \le \left(\frac{c_2-2}{c_2-1}\right)^2 < 1.\tag{69}$$
The next-to-last inequality follows from $(c_2-2)/2 = \{(c_1-1)+(c_2-c_1-1)\}/2 \ge \sqrt{(c_1-1)(c_2-c_1-1)}$ by the arithmetic–geometric mean inequality. Hence, (68) is asymptotically greater than (61) for any $h$ in this setting.
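The Laplace argument can be checked numerically for a specific prior. The sketch below is only our own verification under the hyperparameters $c_1=3$, $c_2=6$ of Example 2 (with arbitrary $h$ and $n$); it compares the exact integrals $I_1$ and $I_2$ in (57) and (65), computed with SciPy's quad, against their Laplace leading terms, and evaluates the ratio in (69).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

c1, c2, h, n = 3.0, 6.0, 0.5, 200     # hyperparameters of Example 2; h and n are arbitrary choices

def k(t):        # k(theta) from (57)
    return np.log((1 + np.exp(t)) * (1 + np.exp(t + 2*h)) / (1 + np.exp(t + h))**2)

def g1(t):       # g_1(theta) from (57)
    return (1 + np.exp(t))**c2 / (1 + np.exp(t + h))**(2*c2) * np.exp(c1 * t)

def g2(t):       # g_2(theta) from (65)
    return (1 + np.exp(t))**c2 / (1 + np.exp(t + h))**(2*c2 - 4) * np.exp((c1 - 2) * t)

def laplace(g):  # leading term at the maximiser theta = -h, with k''(-h) = -tanh(h/2)^2 / 2
    kpp = -0.5 * np.tanh(h / 2) ** 2
    return np.exp(n * k(-h)) * g(-h) * np.sqrt(2 * np.pi / (n * (-kpp)))

I1 = quad(lambda t: g1(t) * np.exp(n * k(t)), -50, 50, points=[-h])[0]
I2 = quad(lambda t: g2(t) * np.exp(n * k(t)), -50, 50, points=[-h])[0]
print(I1 / laplace(g1), I2 / laplace(g2))        # both ratios approach 1 as n grows

B, Bm = beta(c1, c2 - c1), beta(c1 - 1, c2 - c1 - 1)
bmz_lead = h**2 * np.exp(-2*c1*h) * B / laplace(g1)                          # leading term (61)
new_lead = (h * Bm / B)**2 / (2**4 * np.exp(2*c1*h) * laplace(g1) / B)       # leading term (68)
print(bmz_lead / new_lead, 2**4 * (B / Bm)**2)   # both equal the ratio in (69); here 0.64 < 1
```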

4. Conclusions

Bayesian Cramér–Rao-type bounds are often useful for studying the asymptotic efficiency of estimators (see, for example, [4]). The Borovkov–Sakhanenko bound is asymptotically tighter ([14,15]) than the Van Trees bound [1]. Since the Bobrovsky–Mayor–Wolf–Zakai bound [17] and the new bound in this paper converge to the Van Trees and Borovkov–Sakhanenko bounds, respectively, as $h\to 0$ under some conditions, it is natural to expect that this asymptotic relation still holds for a small $h$. The examples in this paper support this expectation. The new bound gives an asymptotic lower bound for the normalized Bayes risk, and the bound cannot be improved as $h\to 0$.

Author Contributions

K.-i.K. contributed the method and algorithm; K.-i.K. and S.H. performed the experiments and wrote the paper. All authors have read and approved the final manuscript.

Funding

This work was supported by JSPS KAKENHI, grant numbers JP20K11702 and JP17K142334.

Acknowledgments

The authors would like to thank the referees for the careful reading of the paper, and the valuable suggestions and comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Regularity Conditions

We need the following conditions for the convergence of Bounds (1) and (4) to the Borovkov–Sakhanenko and Van Trees bounds, respectively.
(B1)
There exist $h_1>0$ and a function $b_1(x,\theta)$ such that
$$E\{b_1^2(X,\theta)\}<\infty \quad\text{and}\quad \left|\frac{f(x,\theta+h)}{f(x,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}-\frac{\varphi'(\theta)}{I(\theta)}\right| \le |h|\,b_1(x,\theta) \quad\text{for all } |h|\le h_1 \text{ and arbitrary } \theta\in\Theta.$$
(B2)
There exist $h_2>0$ and a function $b_2(x,\theta)$ such that
$$E\{b_2^2(X,\theta)\}<\infty \quad\text{and}\quad \left|\frac{f(x,\theta+h)}{f(x,\theta)}-1\right| \le |h|\,b_2(x,\theta) \quad\text{for all } |h|\le h_2 \text{ and arbitrary } \theta\in\Theta.$$

Appendix B. Proof of Theorem 1

Theorem 1 is directly derived from [7]. However, we prove it here for the sake of clarity.
Let $G_h = \frac{1}{h}\left\{\frac{f(x,\theta+h)}{f(x,\theta)}\frac{\varphi'(\theta+h)}{I(\theta+h)}-\frac{\varphi'(\theta)}{I(\theta)}\right\}$. Then, we have
$$\begin{aligned}E(G_h) &= \frac{1}{h}\left[\iint f(x,\theta+h)\frac{\varphi'(\theta+h)}{I(\theta+h)}\,d\theta\,d\mu(x)-\iint f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\\ &= \frac{1}{h}\left[\iint f(x,t)\frac{\varphi'(t)}{I(t)}\,dt\,d\mu(x)-\iint f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\quad(\text{substitute } t=\theta+h)\\ &= 0,\end{aligned}\tag{A1}$$
and
$$\begin{aligned}E\{G_h\varphi(\theta)\} &= \frac{1}{h}\left[\iint \varphi(\theta)f(x,\theta+h)\frac{\varphi'(\theta+h)}{I(\theta+h)}\,d\theta\,d\mu(x)-\iint \varphi(\theta)f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\\ &= \frac{1}{h}\left[\iint \varphi(t-h)f(x,t)\frac{\varphi'(t)}{I(t)}\,dt\,d\mu(x)-\iint \varphi(\theta)f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\quad(\text{substitute } t=\theta+h)\\ &= \frac{1}{h}\left[E\left\{\varphi(\theta-h)\frac{\varphi'(\theta)}{I(\theta)}\right\}-E\left\{\varphi(\theta)\frac{\varphi'(\theta)}{I(\theta)}\right\}\right] = \frac{1}{h}E\left[\{\varphi(\theta-h)-\varphi(\theta)\}\frac{\varphi'(\theta)}{I(\theta)}\right].\end{aligned}\tag{A2}$$
By Fubini’s theorem,
$$\begin{aligned}E\{G_h\hat\varphi(X)\} &= \frac{1}{h}\left[\iint\hat\varphi(x)f(x,\theta+h)\frac{\varphi'(\theta+h)}{I(\theta+h)}\,d\theta\,d\mu(x)-\iint\hat\varphi(x)f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\\ &= \frac{1}{h}\left[\iint\hat\varphi(x)f(x,t)\frac{\varphi'(t)}{I(t)}\,dt\,d\mu(x)-\iint\hat\varphi(x)f(x,\theta)\frac{\varphi'(\theta)}{I(\theta)}\,d\theta\,d\mu(x)\right]\quad(\text{substitute } t=\theta+h)\\ &= 0,\end{aligned}\tag{A3}$$
hence, from (A1), (A2), and (A3),
$$\mathrm{Cov}(G_h,\hat\varphi(X)-\varphi(\theta)) = \frac{1}{h}E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\frac{\varphi'(\theta)}{I(\theta)}\right].\tag{A4}$$
From the variance–covariance inequality, (A4) gives
$$E(\hat\varphi(X)-\varphi(\theta))^2 \ge \frac{\{\mathrm{Cov}(G_h,\hat\varphi(X)-\varphi(\theta))\}^2}{E(G_h^2)} = \frac{\left[E\left[\{\varphi(\theta)-\varphi(\theta-h)\}\dfrac{\varphi'(\theta)}{I(\theta)}\right]\right]^2}{E\left[\left\{\dfrac{f(X,\theta+h)}{f(X,\theta)}\dfrac{\varphi'(\theta+h)}{I(\theta+h)}-\dfrac{\varphi'(\theta)}{I(\theta)}\right\}^2\right]},\tag{A5}$$
which is the desired inequality.

References

  1. Van Trees, H.L. Detection, Estimation, and Modulation Theory, Part I; Wiley: New York, NY, USA, 1968.
  2. Borovkov, A.A.; Sakhanenko, A.U. On estimates of the expected quadratic risk (in Russian). Probab. Math. Statist. 1980, 1, 185–195.
  3. Borovkov, A.A. Mathematical Statistics; Gordon and Breach: Amsterdam, The Netherlands, 1998.
  4. Gill, R.D.; Levit, B.Y. Applications of the van Trees inequality: A Bayesian Cramér–Rao bound. Bernoulli 1995, 1, 59–79.
  5. Koike, K. An integral Bhattacharyya type bound for the Bayes risk. Commun. Stat. Theory Methods 2006, 35, 2185–2195.
  6. Hashimoto, S.; Koike, K. Bhattacharyya type information inequality for the Bayes risk. Commun. Stat. Theory Methods 2015, 44, 5213–5224.
  7. Weinstein, E.; Weiss, A.J. A general class of lower bounds in parameter estimation. IEEE Trans. Inf. Theory 1988, 34, 338–342.
  8. Renaux, A.; Forster, P.; Larzabal, P.; Richmond, C.D.; Nehorai, A. A fresh look at the Bayesian bounds of the Weiss–Weinstein family. IEEE Trans. Signal Process. 2008, 56, 5334–5352.
  9. Todros, K.; Tabrikian, J. General classes of performance lower bounds for parameter estimation—Part II: Bayesian bounds. IEEE Trans. Inf. Theory 2010, 56, 5064–5082.
  10. Ziv, J.; Zakai, M. Some lower bounds on signal parameter estimation. IEEE Trans. Inf. Theory 1969, 15, 386–391.
  11. Bell, K.L.; Steinberg, Y.; Ephraim, Y.; Van Trees, H.L. Extended Ziv–Zakai lower bound for vector parameter estimation. IEEE Trans. Inf. Theory 1997, 43, 624–637.
  12. Routtenberg, T.; Tabrikian, J. A general class of outage error probability lower bounds in Bayesian parameter estimation. IEEE Trans. Signal Process. 2012, 60, 2152–2166.
  13. Van Trees, H.L.; Bell, K.L. Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking; Wiley and IEEE Press: Piscataway, NJ, USA, 2007.
  14. Abu-Shanab, R.; Veretennikov, A.Y. On asymptotic Borovkov–Sakhanenko inequality with unbounded parameter set. Theory Probab. Math. Stat. 2015, 90, 1–12.
  15. Koike, K. Asymptotic comparison of some Bayesian information bounds. Commun. Stat. Theory Methods 2020, 49.
  16. Koike, K. Attainments of the Bayesian information bounds. Commun. Stat. Theory Methods 2019, 48.
  17. Bobrovsky, B.Z.; Mayer-Wolf, E.; Zakai, M. Some classes of global Cramér–Rao bounds. Ann. Stat. 1987, 15, 1421–1438.
  18. Prentice, R.L. A generalization of the probit and logit models for dose response curves. Biometrics 1976, 32, 761–768.
  19. Small, C.G. Expansions and Asymptotics for Statistics; CRC Press: New York, NY, USA, 2010.
Figure 1. $N_h$, $N_0$, $\mathrm{BMZ}_h$, $\mathrm{BMZ}_0$, and the Bayes risk.