Article

Measuring Bayesian Robustness Using Rényi Divergence

Luai Al-Labadi 1,*, Forough Fazeli Asl 2 and Ce Wang 1

1 Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
2 Department of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156-83111, Iran
* Author to whom correspondence should be addressed.
Stats 2021, 4(2), 251-268; https://doi.org/10.3390/stats4020018
Submission received: 17 February 2021 / Revised: 17 March 2021 / Accepted: 22 March 2021 / Published: 29 March 2021
(This article belongs to the Special Issue Robust Statistics in Action)

Abstract

This paper deals with measuring the Bayesian robustness of classes of contaminated priors. Two different classes of priors in the neighborhood of the elicited prior are considered. The first one is the well-known $\epsilon$-contaminated class, while the second one is the geometric mixing class. The proposed measure of robustness is based on computing the curvature of Rényi divergence between posterior distributions. Examples are used to illustrate the results by using simulated and real data sets.

1. Introduction

Bayesian inference requires the specification of a prior, which encodes a priori knowledge about the parameter(s). If the selected prior is flawed, the resulting inferences may be erroneous.
The goal of this paper is to measure the sensitivity of inferences to the choice of prior (known as robustness). Since, in most cases, it is very challenging to settle on a single prior distribution, we consider a class $\Gamma$ of all plausible priors over the parameter space. To construct $\Gamma$, a preliminary prior $\pi_0$ is elicited, and robustness is then assessed for all priors $\pi$ in a neighborhood of $\pi_0$. A commonly accepted way to construct neighborhoods around $\pi_0$ is through contamination. Specifically, we will consider two different classes of contaminated, or mixture, priors, which are given by
$$\Gamma_a = \left\{ \pi(\theta) : \pi(\theta) = (1-\epsilon)\,\pi_0(\theta) + \epsilon\, q(\theta),\ q \in Q \right\} \tag{1}$$

and

$$\Gamma_g = \left\{ \pi(\theta) : \pi(\theta) = c(\epsilon)\,\pi_0^{1-\epsilon}(\theta)\, q^{\epsilon}(\theta),\ q \in Q \right\}, \tag{2}$$

where $\pi_0$ is the elicited prior, $Q$ is a class of distributions, $c(\epsilon)$ is a normalizing constant and $0 \le \epsilon \le 1$ is a small given number denoting the amount of contamination. For other possible classes of priors, see, for instance, De Robertis and Hartigan (1981) [1] and Das Gupta and Studden (1988a, 1988b) [2,3].
The class (1) is known as the ϵ -contaminated class of priors. Many papers about the class (1) are found in the literature. For instance, Berger (1984, 1990) [4,5], Berger and Berliner (1986) [6], and Sivaganesan and Berger (1989) [7] used various choices of Q. Wasserman (1989) [8] used (1) to study robustness of likelihood regions. Dey and Birmiwal (1994) [9] studied robustness based on the curvature. Al-Labadi and Evans (2017) [10] studied robustness of relative belief ratios (Evans, 2015 [11]) under class (1).
On the other hand, the class (2) will be referred to as the geometric contamination or mixture class. This class was first studied, in the context of Bayesian robustness, by Gelfand and Dey (1991) [12], where posterior robustness was measured using the Kullback-Leibler divergence. Dey and Birmiwal (1994) [9] generalized the results of Gelfand and Dey (1991) [12] under (1) and (2) by using the $\phi$ divergence defined by
$$d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big) = \int \pi_0(\theta|x)\, \phi\!\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right) d\theta \tag{3}$$

for a smooth convex function $\phi$. For example, $\phi(x) = x \ln x$ gives the Kullback-Leibler divergence.
In this paper, we extend the results of Gelfand and Dey (1991) [12] and Dey and Birmiwal (1994) [9] by applying Rényi divergence to both classes (1) and (2). This gives a local sensitivity analysis of the effect of small perturbations of the prior. Rényi entropy, developed by the Hungarian mathematician Alfréd Rényi in 1961, generalizes the Shannon entropy and includes other entropy measures as special cases. It finds applications, for instance, in statistics [13], pattern recognition [14], economics [15] and biomedicine [16].
Although the focus of this paper is on Rényi divergence, the development also covers the $(h, \phi)$ family of divergence measures (Menéndez et al., 1995 [17]). Examples of $(h, \phi)$ divergences include Rényi divergence, Sharma-Mittal divergence and Bhattacharyya divergence. We refer the reader to Pardo (2006) [18] for more details about the $(h, \phi)$ divergence.
An outline of this paper is as follows. In Section 2, we give definitions, notations and some properties of Rényi divergence. In Section 3, we develop curvature formulas for measuring robustness based on Rényi divergence and ( h , ϕ ) divergence. In Section 4, three examples are studied to illustrate the results numerically. Section 5 ends with a brief summary of the results.

2. Definitions and Notations

Suppose we have a statistical model given by the density function $f_\theta(x)$ (with respect to some measure), where $\theta$ is an unknown parameter that belongs to the parameter space $\Theta$. Let $\pi(\theta)$ be the prior distribution of $\theta$. After observing the data $x$, by Bayes' theorem, the posterior distribution of $\theta$ is given by the density

$$\pi(\theta|x) = \frac{f_\theta(x)\,\pi(\theta)}{m(x|\pi)},$$

where

$$m(x|\pi) = \int f_\theta(x)\,\pi(\theta)\, d\theta$$

is the prior predictive density of the data.
To measure the divergence between two posterior distributions, we consider Rényi divergence (Rényi, 1961 [19]). The Rényi divergence of order $a$ between two posterior densities $\pi(\theta|x)$ and $\pi_0(\theta|x)$ is defined as

$$d = d\big(\pi(\theta|x), \pi_0(\theta|x)\big) = \frac{1}{a-1}\ln\int \pi(\theta|x)^a\, \pi_0(\theta|x)^{1-a}\, d\theta = \frac{1}{a-1}\ln E_{\pi_0(\theta|x)}\!\left[\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right)^{a}\right], \tag{4}$$

where $a > 0$ and $E_{\pi_0(\theta|x)}$ denotes the expectation with respect to the density $\pi_0(\theta|x)$. It is known that $d(\pi(\theta|x), \pi_0(\theta|x)) \ge 0$ for all $\pi(\theta|x)$, $\pi_0(\theta|x)$ and $a > 0$, and that $d(\pi(\theta|x), \pi_0(\theta|x)) = 0$ if and only if $\pi(\theta|x) = \pi_0(\theta|x)$. Please note that the case $a = 1$ is defined by letting $a \to 1$. Other values of $a$ of particular interest are $a = 0, 0.5, 2$ and $\infty$ (van Erven and Harremoës, 2014 [20]). For further properties of Rényi divergence consult, for example, Li and Turner (2016) [21].
Rényi divergence belongs to the following general family of divergence measures, called the $(h, \phi)$ divergence (Menéndez et al., 1995 [17]).
Definition 1.
Let $h$ be a differentiable increasing real function mapping from $\left[0, \phi(0) + \lim_{t\to\infty} \phi(t)/t\right)$ to $[0, \infty)$. The $(h, \phi)$ divergence measure between two posterior distributions $\pi(\theta|x)$ and $\pi_0(\theta|x)$ is defined as

$$d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big) = h\big(d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big)\big), \tag{5}$$

where $d_\phi(\pi(\theta|x), \pi_0(\theta|x))$ is the $\phi$ divergence defined in (3).

Please note that Rényi divergence is an $(h, \phi)$ divergence measure with $h(x) = \frac{1}{a-1}\ln\left[a(a-1)x + 1\right]$ and $\phi(x) = \frac{x^a - a(x-1) - 1}{a(a-1)}$ for $a \ne 0, 1$. To see this, from Definition 1, we have

$$\begin{aligned} h\big(d_\phi(\pi(\theta|x), \pi_0(\theta|x))\big) &= \frac{1}{a-1}\ln\left[a(a-1)\, d_\phi(\pi(\theta|x), \pi_0(\theta|x)) + 1\right] \\ &= \frac{1}{a-1}\ln\left[a(a-1)\int \pi_0(\theta|x)\,\frac{\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right)^a - a\,\frac{\pi(\theta|x)}{\pi_0(\theta|x)} + a - 1}{a(a-1)}\, d\theta + 1\right] \\ &= \frac{1}{a-1}\ln\left[\int \pi_0(\theta|x)\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right)^a d\theta - a\int \pi(\theta|x)\, d\theta + a\int \pi_0(\theta|x)\, d\theta - \int \pi_0(\theta|x)\, d\theta + 1\right] \\ &= \frac{1}{a-1}\ln E_{\pi_0(\theta|x)}\!\left[\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right)^a\right], \end{aligned}$$

which is Rényi divergence as defined in (4).
Similar to McCulloch (1989) [22] and Dey and Birmiwal (1994) [9], who calibrated the Kullback-Leibler divergence and the $\phi$ divergence, respectively, it is also possible to calibrate Rényi divergence as follows. Consider a biased coin for which $X = 1$ (heads) occurs with probability $p$. The Rényi divergence between an unbiased and a biased coin is then

$$d(f_0, f_1) = \frac{1}{a-1}\ln\left[2^{a-1}\left(p^a + (1-p)^a\right)\right],$$

where, for $x = 0, 1$, $f_0(x) = 0.5$ and $f_1(x) = p^x (1-p)^{1-x}$. Now, setting $d(f_0, f_1) = d_0$ gives

$$2^{1-a}\, e^{(a-1) d_0} = p^a + (1-p)^a. \tag{6}$$

The number $p$ is then the calibration of $d$. In general, Equation (6) needs to be solved numerically for $p$. Please note that for the case $a = 1$ (i.e., the Kullback-Leibler divergence) one may use the following explicit formula for $p$, due to McCulloch (1989) [22]:

$$p = 0.5 + 0.5\left(1 - e^{-2 d_0}\right)^{1/2}. \tag{7}$$

Values of $p$ close to 1 indicate that $f_0$ and $f_1$ are quite different, while values of $p$ close to 0.5 imply that they are similar. Here $p$ is restricted to lie between 0.5 and 1 so that there is a one-to-one correspondence between $p$ and $d_0$.
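As an illustration, the following is a minimal Python sketch of this calibration (our own code, not from the paper); the function name `calibrate` and the use of `scipy.optimize.brentq` as the root finder are choices made for this sketch:

```python
import numpy as np
from scipy.optimize import brentq

def calibrate(d0, a):
    """Calibrate a Renyi divergence value d0 into a probability p in [0.5, 1).

    For a != 1, solves Equation (6): 2^(1-a) * exp((a-1)*d0) = p^a + (1-p)^a.
    For a == 1, uses the explicit Kullback-Leibler formula (7) of McCulloch (1989).
    """
    if a == 1.0:
        return 0.5 + 0.5 * np.sqrt(1.0 - np.exp(-2.0 * d0))
    target = 2.0 ** (1.0 - a) * np.exp((a - 1.0) * d0)
    g = lambda p: p ** a + (1.0 - p) ** a - target
    # g changes sign on [0.5, 1), so a bracketing root finder applies.
    return brentq(g, 0.5, 1.0 - 1e-12)

print(calibrate(0.0032, a=0.5))  # about 0.556: d0 is small, so p stays near 0.5
```

For small $d_0$ the solution sits just above 0.5, consistent with the interpretation of $p$ given above.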
A motivating key fact about Rényi divergence follows from its Taylor expansion. Let

$$f(\epsilon) = d\big(\pi(\theta|x), \pi_0(\theta|x)\big) = \frac{1}{a-1}\ln\int \pi(\theta|x)^a\, \pi_0(\theta|x)^{1-a}\, d\theta,$$

where $\pi(\theta|x)$ is the posterior distribution of $\theta$ given the data $x$ under a prior $\pi$ as defined in (1) or (2). Assuming differentiability with respect to $\epsilon$, the Taylor expansion of $f(\epsilon)$ about $\epsilon = 0$ is given by

$$f(\epsilon) = f(0) + \epsilon \left.\frac{\partial f(\epsilon)}{\partial \epsilon}\right|_{\epsilon=0} + \frac{\epsilon^2}{2}\left.\frac{\partial^2 f(\epsilon)}{\partial \epsilon^2}\right|_{\epsilon=0} + \cdots.$$

Clearly, $f(0) = 0$. If integration and differentiation are interchangeable, we have

$$\frac{\partial f(\epsilon)}{\partial \epsilon} = \frac{a}{a-1}\,\frac{\int \pi_0(\theta|x)^{1-a}\,\pi(\theta|x)^{a-1}\,\frac{\partial \pi(\theta|x)}{\partial \epsilon}\, d\theta}{\int \pi_0(\theta|x)^{1-a}\,\pi(\theta|x)^{a}\, d\theta}.$$

Hence,

$$\left.\frac{\partial f(\epsilon)}{\partial \epsilon}\right|_{\epsilon=0} = \frac{a}{a-1}\int \left.\frac{\partial \pi(\theta|x)}{\partial \epsilon}\, d\theta\right|_{\epsilon=0} = \frac{a}{a-1}\,\frac{\partial}{\partial \epsilon}\int \pi(\theta|x)\, d\theta = \frac{a}{a-1}\,\frac{\partial}{\partial \epsilon}(1) = 0.$$

On the other hand,

$$\frac{\partial^2 f(\epsilon)}{\partial \epsilon^2} = \frac{\partial}{\partial \epsilon}\left[\frac{a}{a-1}\,\frac{\int \pi_0(\theta|x)^{1-a}\,\pi(\theta|x)^{a-1}\,\frac{\partial \pi(\theta|x)}{\partial \epsilon}\, d\theta}{\int \pi_0(\theta|x)^{1-a}\,\pi(\theta|x)^{a}\, d\theta}\right],$$

which, at $\epsilon = 0$, reduces to

$$\left.\frac{\partial^2 f(\epsilon)}{\partial \epsilon^2}\right|_{\epsilon=0} = a\int \left.\frac{\left(\frac{\partial \pi(\theta|x)}{\partial \epsilon}\right)^2}{\pi(\theta|x)}\, d\theta\right|_{\epsilon=0} = a\int \left.\left(\frac{\partial \pi(\theta|x)/\partial \epsilon}{\pi(\theta|x)}\right)^2 \pi(\theta|x)\, d\theta\right|_{\epsilon=0} = a\, E_{\pi(\theta|x)}\!\left[\left(\frac{\partial \ln \pi(\theta|x)}{\partial \epsilon}\right)^2\right]\Bigg|_{\epsilon=0} = a\, I_{\pi(\theta|x)}(\epsilon)\Big|_{\epsilon=0}.$$

Here $I_{\pi(\theta|x)}(\epsilon) = E_{\pi(\theta|x)}\!\left[\left(\partial \ln \pi(\theta|x)/\partial \epsilon\right)^2\right]$ is the Fisher information function for $\pi(\theta|x)$ (Lehmann and Casella, 1998 [23]). Thus, for $\epsilon \approx 0$, we have

$$d\big(\pi(\theta|x), \pi_0(\theta|x)\big) \approx \frac{a\,\epsilon^2}{2}\, I_{\pi(\theta|x)}(\epsilon). \tag{8}$$
Please note that $\partial^2 f(\epsilon)/\partial \epsilon^2\,|_{\epsilon=0} = \partial^2 d/\partial \epsilon^2\,|_{\epsilon=0}$ is known as the local curvature of Rényi divergence at $\epsilon = 0$. Formula (8) justifies the use of the curvature to measure the Bayesian robustness of the two classes of priors $\Gamma_a$ and $\Gamma_g$ defined in (1) and (2), respectively. This formula also provides a direct relationship between Fisher's information and the curvature of Rényi divergence.

3. Measuring Robustness Using Rényi Divergence

In this section, we explicitly obtain the local curvature of Rényi divergence at $\epsilon = 0$ (i.e., $\partial^2 d/\partial \epsilon^2\,|_{\epsilon=0}$) to measure the Bayesian robustness of the two classes of priors $\Gamma_a$ and $\Gamma_g$ defined in (1) and (2), respectively. The resulting quantities are much easier to estimate than Rényi divergence itself.
Theorem 1.
For the ϵ-contaminated class defined in (1), the local curvature of Rényi divergence at $\epsilon = 0$ is

$$C_a^{\Gamma_a} = \left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = a\, Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right),$$

where $Var_{\pi_0(\theta|x)}$ denotes the variance with respect to $\pi_0(\theta|x)$.
Proof. 
Under a prior $\pi$ in (1), the prior predictive $m(x|\pi)$ and the posterior distribution $\pi(\theta|x)$ can be written as

$$m(x|\pi) = (1-\epsilon)\, m(x|\pi_0) + \epsilon\, m(x|q)$$

and

$$\pi(\theta|x) = \frac{f_\theta(x)\,\pi(\theta)}{m(x|\pi)} = \frac{f_\theta(x)\left[(1-\epsilon)\,\pi_0(\theta) + \epsilon\, q(\theta)\right]}{m(x|\pi)} = \lambda(x)\,\pi_0(\theta|x) + \big(1-\lambda(x)\big)\, q(\theta|x), \tag{9}$$

where

$$\lambda(x) = \frac{(1-\epsilon)\, m(x|\pi_0)}{m(x|\pi)}.$$

Define

$$f(\epsilon) = d\big(\pi(\theta|x), \pi_0(\theta|x)\big) = \frac{1}{a-1}\ln\int \pi(\theta|x)^a\,\pi_0(\theta|x)^{1-a}\, d\theta = \frac{1}{a-1}\ln\int \gamma\, d\theta,$$

where

$$\gamma = \pi(\theta|x)^a\,\pi_0(\theta|x)^{1-a} = \big[\lambda(x)\,\pi_0(\theta|x) + (1-\lambda(x))\, q(\theta|x)\big]^a\,\pi_0(\theta|x)^{1-a}.$$

Clearly,

$$\gamma|_{\epsilon=0} = \pi_0(\theta|x) \quad \text{and} \quad \int \gamma|_{\epsilon=0}\, d\theta = 1. \tag{10}$$

We have

$$\frac{\partial \gamma}{\partial \epsilon} = a\, m(x|q)\, m(x|\pi_0)\,\big(q(\theta|x) - \pi_0(\theta|x)\big)\,\pi_0(\theta|x)^{1-a}\;\frac{\big[\epsilon\, q(\theta|x)\, m(x|q) + (1-\epsilon)\, m(x|\pi_0)\,\pi_0(\theta|x)\big]^{a-1}}{\big[(1-\epsilon)\, m(x|\pi_0) + \epsilon\, m(x|q)\big]^{a+1}}$$

and

$$\left.\frac{\partial \gamma}{\partial \epsilon}\right|_{\epsilon=0} = \frac{a\, m(x|q)\,\big(q(\theta|x) - \pi_0(\theta|x)\big)}{m(x|\pi_0)}.$$

Thus,

$$\int \left.\frac{\partial \gamma}{\partial \epsilon}\, d\theta\right|_{\epsilon=0} = 0. \tag{11}$$

Now,

$$\frac{\partial^2 d}{\partial \epsilon^2} = \frac{\partial}{\partial \epsilon}\left[\frac{1}{a-1}\,\frac{\int \frac{\partial \gamma}{\partial \epsilon}\, d\theta}{\int \gamma\, d\theta}\right] = \frac{1}{a-1}\;\frac{\left[\int \gamma\, d\theta\right]\left[\int \frac{\partial^2 \gamma}{\partial \epsilon^2}\, d\theta\right] - \left[\int \frac{\partial \gamma}{\partial \epsilon}\, d\theta\right]^2}{\left[\int \gamma\, d\theta\right]^2}.$$

By (10) and (11),

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = \frac{1}{a-1}\int \left.\frac{\partial^2 \gamma}{\partial \epsilon^2}\right|_{\epsilon=0} d\theta.$$

Differentiating once more gives

$$\left.\frac{\partial^2 \gamma}{\partial \epsilon^2}\right|_{\epsilon=0} = a(a-1)\left(\frac{m(x|q)}{m(x|\pi_0)}\right)^2 \frac{\big(q(\theta|x) - \pi_0(\theta|x)\big)^2}{\pi_0(\theta|x)} + 2a\,\frac{m(x|q) - m(x|\pi_0)}{m(x|\pi_0)}\left[\pi_0(\theta|x)\,\frac{m(x|q) - m(x|\pi_0)}{m(x|\pi_0)} - \frac{m(x|q)\, q(\theta|x) - m(x|\pi_0)\,\pi_0(\theta|x)}{m(x|\pi_0)}\right]. \tag{12}$$

Since

$$\frac{m(x|q)}{m(x|\pi_0)} = \frac{\int f_\theta(x)\, q(\theta)\, d\theta}{m(x|\pi_0)} = \frac{\int f_\theta(x)\,\pi_0(\theta)\,\frac{q(\theta)}{\pi_0(\theta)}\, d\theta}{m(x|\pi_0)} = \int \pi_0(\theta|x)\,\frac{q(\theta)}{\pi_0(\theta)}\, d\theta = E_{\pi_0(\theta|x)}\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right], \tag{13}$$

integrating (12) over $\theta$ (the second term of (12) integrates to zero) and dividing by $a-1$ gives

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = a\left(E_{\pi_0(\theta|x)}^2\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right] E_{\pi_0(\theta|x)}\!\left[\left(\frac{q(\theta|x)}{\pi_0(\theta|x)}\right)^2\right] - E_{\pi_0(\theta|x)}^2\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right]\right). \tag{14}$$

Please note that

$$\left(\frac{q(\theta|x)}{\pi_0(\theta|x)}\right)^2 = \left(\frac{q(\theta)\, f_\theta(x)/m(x|q)}{\pi_0(\theta)\, f_\theta(x)/m(x|\pi_0)}\right)^2 = \left(\frac{q(\theta)}{\pi_0(\theta)}\right)^2 \left(\frac{m(x|\pi_0)}{m(x|q)}\right)^2.$$

Hence, by (13),

$$E_{\pi_0(\theta|x)}\!\left[\left(\frac{q(\theta|x)}{\pi_0(\theta|x)}\right)^2\right] = E_{\pi_0(\theta|x)}\!\left[\left(\frac{q(\theta)}{\pi_0(\theta)}\right)^2\right] \frac{1}{E_{\pi_0(\theta|x)}^2\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right]}. \tag{15}$$

Thus, by (14) and (15),

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = a\left(E_{\pi_0(\theta|x)}\!\left[\left(\frac{q(\theta)}{\pi_0(\theta)}\right)^2\right] - E_{\pi_0(\theta|x)}^2\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right]\right) = a\, Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right).$$
   □
Theorem 2.
For the geometric contaminated class defined in (2), the local curvature of Rényi divergence at $\epsilon = 0$ is

$$C_a^{\Gamma_g} = \left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = a\, Var_{\pi_0(\theta|x)}\!\left(\ln \frac{q(\theta)}{\pi_0(\theta)}\right),$$

where $Var_{\pi_0(\theta|x)}$ denotes the variance with respect to $\pi_0(\theta|x)$.
Proof. 
Define

$$\gamma = \pi(\theta|x)^a\,\pi_0(\theta|x)^{1-a}.$$

Thus,

$$d = \frac{1}{a-1}\ln\int \gamma\, d\theta.$$

We have

$$\frac{\partial d}{\partial \epsilon} = \frac{1}{a-1}\times\frac{\int \frac{\partial \gamma}{\partial \epsilon}\, d\theta}{\int \gamma\, d\theta}$$

and

$$\frac{\partial^2 d}{\partial \epsilon^2} = \frac{1}{a-1}\times\frac{\left[\int \gamma\, d\theta\right]\left[\int \frac{\partial^2 \gamma}{\partial \epsilon^2}\, d\theta\right] - \left[\int \frac{\partial \gamma}{\partial \epsilon}\, d\theta\right]^2}{\left[\int \gamma\, d\theta\right]^2}.$$

Since $\gamma|_{\epsilon=0} = \pi_0(\theta|x)$,

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = \frac{1}{a-1}\left[\int \left.\frac{\partial^2 \gamma}{\partial \epsilon^2}\, d\theta\right|_{\epsilon=0} - \left(\int \left.\frac{\partial \gamma}{\partial \epsilon}\, d\theta\right|_{\epsilon=0}\right)^2\right]. \tag{16}$$

For the geometric class defined in (2),

$$\pi(\theta|x) = \frac{f_\theta(x)\,\pi(\theta)}{m(x|\pi)} = \frac{f_\theta(x)\, c(\epsilon)\,\big(\pi_0(\theta)\big)^{1-\epsilon}\big(q(\theta)\big)^{\epsilon}}{m(x|\pi)} \quad \text{and} \quad \pi_0(\theta|x) = \frac{f_\theta(x)\,\pi_0(\theta)}{m(x|\pi_0)}. \tag{17}$$

Thus,

$$\gamma = \frac{f_\theta(x)\,\big(c(\epsilon)\big)^a\,\big(\pi_0(\theta)\big)^{1 - a\epsilon}\,\big(q(\theta)\big)^{a\epsilon}}{\big(m(x|\pi)\big)^a\,\big(m(x|\pi_0)\big)^{1-a}}.$$

Therefore,

$$\ln \gamma = a\ln\frac{c(\epsilon)}{m(x|\pi)} - a\epsilon\ln\frac{\pi_0(\theta)}{q(\theta)} + \ln\frac{f_\theta(x)\,\pi_0(\theta)}{\big(m(x|\pi_0)\big)^{1-a}}.$$

We have

$$\frac{\partial \gamma}{\partial \epsilon} = \gamma\,\frac{\partial \ln \gamma}{\partial \epsilon} = a\gamma\left[\frac{\partial}{\partial \epsilon}\ln\frac{c(\epsilon)}{m(x|\pi)} - \ln\frac{\pi_0(\theta)}{q(\theta)}\right].$$

As

$$\frac{\partial}{\partial \epsilon}\ln\frac{c(\epsilon)}{m(x|\pi)} = E_{\pi(\theta|x)}\!\left[\ln\frac{\pi_0(\theta)}{q(\theta)}\right]$$

(Dey and Birmiwal, 1994 [9], Theorem 3.2), which at $\epsilon = 0$ reduces to the expectation under $\pi_0(\theta|x)$, we get

$$\frac{\partial \gamma}{\partial \epsilon} = a\gamma\left[E_{\pi(\theta|x)}\!\left(\ln\frac{\pi_0(\theta)}{q(\theta)}\right) - \ln\frac{\pi_0(\theta)}{q(\theta)}\right]. \tag{18}$$

Since $\gamma|_{\epsilon=0} = \pi_0(\theta|x)$, by (16) and (18) it follows that $\int \left.\frac{\partial \gamma}{\partial \epsilon}\, d\theta\right|_{\epsilon=0} = 0$ and

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = \frac{1}{a-1}\int \left.\frac{\partial^2 \gamma}{\partial \epsilon^2}\right|_{\epsilon=0} d\theta.$$

Now, by (18),

$$\frac{\partial^2 \gamma}{\partial \epsilon^2} = a\,\frac{\partial \gamma}{\partial \epsilon}\left[E_{\pi(\theta|x)}\!\left(\ln\frac{\pi_0(\theta)}{q(\theta)}\right) - \ln\frac{\pi_0(\theta)}{q(\theta)}\right] + a\gamma\,\frac{\partial}{\partial \epsilon}E_{\pi(\theta|x)}\!\left[\ln\frac{\pi_0(\theta)}{q(\theta)}\right].$$

Using $\gamma|_{\epsilon=0} = \pi_0(\theta|x)$ one more time, together with (18) and the fact that $\frac{\partial}{\partial \epsilon}E_{\pi(\theta|x)}\!\left[\ln\frac{\pi_0(\theta)}{q(\theta)}\right]\Big|_{\epsilon=0} = -Var_{\pi_0(\theta|x)}\!\left(\ln\frac{\pi_0(\theta)}{q(\theta)}\right)$ (which follows from differentiating the posterior in $\epsilon$ as in Dey and Birmiwal, 1994 [9]), we obtain

$$\left.\frac{\partial^2 d}{\partial \epsilon^2}\right|_{\epsilon=0} = \frac{1}{a-1}\left(a^2 - a\right) Var_{\pi_0(\theta|x)}\!\left(\ln\frac{\pi_0(\theta)}{q(\theta)}\right) = a\, Var_{\pi_0(\theta|x)}\!\left(\ln\frac{q(\theta)}{\pi_0(\theta)}\right).$$
   □
The curvature of the family ( h , ϕ ) of divergence measures under classes (1) and (2) is derived in the next theorem.
Theorem 3.
The local curvatures of the $(h, \phi)$ divergence under classes (1) and (2) are respectively given by

i.
$$C_a^{\Gamma_a} = \left.\frac{\partial^2 d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2}\right|_{\epsilon=0} = a\,\phi''(1)\, Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right),$$

ii.
$$C_a^{\Gamma_g} = \left.\frac{\partial^2 d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2}\right|_{\epsilon=0} = a\,\phi''(1)\, Var_{\pi_0(\theta|x)}\!\left(\ln\frac{q(\theta)}{\pi_0(\theta)}\right),$$

where $\phi''(1)$ is the second derivative of the smooth convex function $\phi$ at 1.
Proof. 
To prove (i), from Equation (5) we have

$$\frac{\partial d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon} = \frac{1}{a-1}\,\frac{a(a-1)\,\frac{\partial d_\phi(\pi(\theta|x), \pi_0(\theta|x))}{\partial \epsilon}}{a(a-1)\, d_\phi(\pi(\theta|x), \pi_0(\theta|x)) + 1} = \frac{a\,\frac{\partial d_\phi(\pi(\theta|x), \pi_0(\theta|x))}{\partial \epsilon}}{a(a-1)\, d_\phi(\pi(\theta|x), \pi_0(\theta|x)) + 1}.$$

Now, we get

$$\frac{\partial^2 d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2} = a\,\frac{\left[a(a-1)\, d_\phi(\pi(\theta|x), \pi_0(\theta|x)) + 1\right]\frac{\partial^2 d_\phi(\pi(\theta|x), \pi_0(\theta|x))}{\partial \epsilon^2} - a(a-1)\left(\frac{\partial d_\phi(\pi(\theta|x), \pi_0(\theta|x))}{\partial \epsilon}\right)^2}{\left[a(a-1)\, d_\phi(\pi(\theta|x), \pi_0(\theta|x)) + 1\right]^2}. \tag{19}$$

From Dey and Birmiwal (1994, Theorem 3.1) [9], under class (1) we have

$$\left.\frac{\partial d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon}\right|_{\epsilon=0} = 0$$

and

$$\left.\frac{\partial^2 d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2}\right|_{\epsilon=0} = \phi''(1)\, Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right).$$

Therefore, since $d_\phi = 0$ at $\epsilon = 0$,

$$\left.\frac{\partial^2 d_\phi^h\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2}\right|_{\epsilon=0} = a\,\frac{[0+1]\,\phi''(1)\, Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right) - 0}{1},$$

and the proof of (i) is concluded. To prove (ii), from Dey and Birmiwal (1994, Theorem 3.2) [9], under class (2) we have

$$\left.\frac{\partial d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon}\right|_{\epsilon=0} = 0$$

and

$$\left.\frac{\partial^2 d_\phi\big(\pi(\theta|x), \pi_0(\theta|x)\big)}{\partial \epsilon^2}\right|_{\epsilon=0} = \phi''(1)\, Var_{\pi_0(\theta|x)}\!\left(\ln\frac{q(\theta)}{\pi_0(\theta)}\right).$$

Similar to the proof of (i), substituting the above equations into (19) concludes the proof of (ii).    □
Please note that, since for Rényi divergence $\phi(x) = \frac{x^a - a(x-1) - 1}{a(a-1)}$, we have $\phi''(1) = 1$. This implies that Theorems 1 and 2 can be obtained from Theorem 3. However, the proofs of Theorems 1 and 2 are more general and could be applied to divergences that are not members of the $(h, \phi)$ family.

4. Examples

In this section, the derived results are illustrated through three examples: the Bernoulli model, the multinomial model and the location normal model. In each example, the curvature values for the two classes (1) and (2) are reported. Additionally, in Example 1, we computed Rényi divergence between $\pi(\theta|x)$ and $\pi_0(\theta|x)$ and reported the calibrated value $p$ as described in (6) and (7). Recall that curvature values close to zero indicate robustness of the used prior, whereas larger values suggest a lack of robustness. Similarly, values of $p$ close to 0.5 suggest robustness, whereas values of $p$ close to 1 indicate an absence of robustness.
Example 1
(Bernoulli Model). Suppose $x = (x_1, \ldots, x_n)$ is a sample from a Bernoulli distribution with parameter $\theta$. Let the prior $\pi_0(\theta)$ be $\mathrm{Beta}(\alpha, \beta)$, i.e.,

$$\pi_0(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}.$$

Thus, $\pi_0(\theta|x)$ is

$$\mathrm{Beta}\big(\alpha + t, \beta + n - t\big),$$

where $t = \sum_{i=1}^{n} x_i$. Let $q(\theta)$ be $\mathrm{Beta}(c\alpha, c\beta)$ for $c > 0$.
Now consider the two samples $x = (0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1)$ and $x = (0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0,0,0,0,0,1,0,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,0,1,1)$ of sizes $n = 20$ and $n = 50$, generated from $\mathrm{Bernoulli}(0.5)$. For comparison purposes, we consider several values of $\alpha$, $\beta$ and $c$. Although it is possible to find exact formulas for the curvature by some algebraic manipulation, it is more convenient to use a Monte Carlo approach in this example. The computational steps are summarized in Algorithm 1.
Algorithm 1 Computing the curvature based on a Monte Carlo approach
  • For $s = 1, \ldots, 10^6$, generate $\theta^{(s)}$ from the posterior $\pi_0(\theta|x)$.
  • For each $\theta^{(s)}$, find $q(\theta^{(s)})$ and $\pi_0(\theta^{(s)})$.
  • Compute the sample variance of the $10^6$ values of $q(\theta^{(s)})/\pi_0(\theta^{(s)})$. Denote this value by $\widehat{Var}_{\pi_0(\theta|x)}\big(q(\theta)/\pi_0(\theta)\big)$.
  • Return $a\,\widehat{Var}_{\pi_0(\theta|x)}\big(q(\theta)/\pi_0(\theta)\big)$ as the curvature value under class (1).
  • Compute the sample variance of the $10^6$ values of $\ln\big(q(\theta^{(s)})/\pi_0(\theta^{(s)})\big)$. Denote this value by $\widehat{Var}_{\pi_0(\theta|x)}\big(\ln q(\theta)/\pi_0(\theta)\big)$.
  • Return $a\,\widehat{Var}_{\pi_0(\theta|x)}\big(\ln q(\theta)/\pi_0(\theta)\big)$ as the curvature value under class (2).
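A minimal Python sketch of Algorithm 1 for this Bernoulli-Beta setting is given below (our own illustration, not code from the paper; the function name and the use of numpy/scipy are assumptions of this sketch). The first sample above has $t = 11$ successes:

```python
import numpy as np
from scipy import stats

def curvature_bernoulli_beta(t, n, alpha, beta, c, a, S=10**6, seed=None):
    """Monte Carlo estimates of the local curvatures of Theorems 1 and 2
    for the Bernoulli model with pi_0 = Beta(alpha, beta), q = Beta(c*alpha, c*beta).

    t is the number of successes in the sample of size n, so the posterior
    is pi_0(theta | x) = Beta(alpha + t, beta + n - t)."""
    rng = np.random.default_rng(seed)
    theta = rng.beta(alpha + t, beta + n - t, size=S)
    # log of q(theta)/pi_0(theta), evaluated at each posterior draw
    log_ratio = (stats.beta.logpdf(theta, c * alpha, c * beta)
                 - stats.beta.logpdf(theta, alpha, beta))
    ratio = np.exp(log_ratio)
    # a * sample variance gives the curvature under classes (1) and (2)
    return a * ratio.var(), a * log_ratio.var()

# First sample of Example 1: n = 20, t = 11, uniform prior, c = 3, a = 0.5.
print(curvature_bernoulli_beta(t=11, n=20, alpha=1, beta=1, c=3, a=0.5))
```

For these inputs the two returned values should be close to the corresponding entries of Table 1 (about 0.024 and 0.012), up to Monte Carlo error.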
The values of the curvature for both classes (1) and (2) are reported in Table 1. Remarkably, for the cases $\alpha = \beta = 1$ (uniform prior on $[0,1]$) and $\alpha = \beta = 0.5$ (Jeffreys' prior), the curvature values are prominently small for all values of $c$. Also, it is clear that when $c = 1$ the curvature values are 0. It is worth noticing that, for fixed $\alpha$, $\beta$ and $c$, the curvature decreases as the sample size increases. This supports the fact that the effect of the prior dissipates with increasing sample size.
While it is easier to quantify the curvature based on Theorems 1 and 2, in this example, for comparison purposes, we also computed Rényi divergence between $\pi(\theta|x)$ and $\pi_0(\theta|x)$ under classes (1) and (2). It can be shown that, under class (1), by (9), $\pi(\theta|x) = \lambda(x)\,\mathrm{Beta}(\alpha + t, \beta + n - t) + (1 - \lambda(x))\,\mathrm{Beta}(c\alpha + t, c\beta + n - t)$, where

$$\lambda(x) = \frac{(1-\epsilon)\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+t)\,\Gamma(\beta+n-t)}{\Gamma(\alpha+\beta+n)}}{(1-\epsilon)\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+t)\,\Gamma(\beta+n-t)}{\Gamma(\alpha+\beta+n)} + \epsilon\,\frac{\Gamma(c\alpha+c\beta)}{\Gamma(c\alpha)\Gamma(c\beta)}\,\frac{\Gamma(c\alpha+t)\,\Gamma(c\beta+n-t)}{\Gamma(c\alpha+c\beta+n)}}.$$

Also, from (17), it is easily concluded that the posterior $\pi(\theta|x)$ under class (2) is

$$\pi(\theta|x) = K^{-1}\,\theta^{t}(1-\theta)^{n-t}\,\frac{\big[\mathrm{Beta}(\alpha, \beta)\big]^{1-\epsilon}\big[\mathrm{Beta}(c\alpha, c\beta)\big]^{\epsilon}}{\left(\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right)^{1-\epsilon}\left(\frac{\Gamma(c\alpha+c\beta)}{\Gamma(c\alpha)\Gamma(c\beta)}\right)^{\epsilon}},$$

where $\mathrm{Beta}(\cdot,\cdot)$ denotes the corresponding Beta density and

$$K = \frac{\Gamma\big(t + (1-\epsilon)(\alpha-1) + \epsilon(c\alpha - 1) + 1\big)\,\Gamma\big(n - t + (1-\epsilon)(\beta-1) + \epsilon(c\beta - 1) + 1\big)}{\Gamma\big((1-\epsilon)(\alpha+\beta-2) + \epsilon(c\alpha + c\beta - 2) + n + 2\big)}.$$
Please note that, since $d\big(\pi(\theta|x), \pi_0(\theta|x)\big) = \frac{1}{a-1}\ln E_{\pi_0(\theta|x)}\!\left[\left(\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right)^a\right]$, it is possible to compute the distance based on a Monte Carlo approach. When $a = 1$, $d\big(\pi(\theta|x), \pi_0(\theta|x)\big) = E_{\pi_0(\theta|x)}\!\left[\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\ln\frac{\pi(\theta|x)}{\pi_0(\theta|x)}\right]$, the Kullback-Leibler divergence. We also calibrated the Rényi divergence values as described in (6) and (7). To save space, the results for the sample of size $n = 20$ under classes (1) and (2) are reported in Table 2 and Table 3, respectively.
Please note that, from (8), multiplying a curvature value in Table 1 by $\epsilon^2/2$ approximates the corresponding distance in Table 2 or Table 3. For instance, setting $\alpha = 1$, $\beta = 3$, $c = 0.5$, $a = 0.5$ in Table 1 gives $C_a^{\Gamma_a} = 0.0265$. The corresponding distance is $0.0265 \times 0.5^2/2 = 0.0033$, which is close to the value reported in Table 2.
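A hedged Python sketch of this Monte Carlo computation for class (1) follows (our own illustration; the helper name `renyi_divergence_class1` is hypothetical). It uses the mixture representation (9) together with the expression for $\lambda(x)$ above, and takes $t = 11$ for the first sample:

```python
import numpy as np
from scipy import stats
from scipy.special import betaln

def renyi_divergence_class1(t, n, alpha, beta, c, eps, a, S=10**6, seed=None):
    """Monte Carlo estimate of the Renyi divergence (4) between the posterior
    under the eps-contaminated prior of class (1) and the posterior under pi_0."""
    rng = np.random.default_rng(seed)
    # log prior predictive of a Beta(u, v) prior; the binomial coefficient
    # cancels in lambda(x) and is therefore dropped
    lm = lambda u, v: betaln(u + t, v + n - t) - betaln(u, v)
    w0 = (1 - eps) * np.exp(lm(alpha, beta))
    lam = w0 / (w0 + eps * np.exp(lm(c * alpha, c * beta)))
    theta = rng.beta(alpha + t, beta + n - t, size=S)    # draws from pi_0(theta|x)
    p0 = stats.beta.pdf(theta, alpha + t, beta + n - t)  # pi_0(theta|x)
    p1 = stats.beta.pdf(theta, c * alpha + t, c * beta + n - t)
    ratio = (lam * p0 + (1 - lam) * p1) / p0             # pi(theta|x)/pi_0(theta|x)
    if a == 1.0:                                         # Kullback-Leibler limit
        return np.mean(ratio * np.log(ratio))
    return np.log(np.mean(ratio ** a)) / (a - 1.0)

# alpha = 1, beta = 3, c = 0.5, eps = 0.5, a = 0.5 with the first sample (t = 11):
print(renyi_divergence_class1(11, 20, 1, 3, 0.5, 0.5, 0.5))  # roughly 0.003
```

The output should be close to the corresponding $d_0$ entry of Table 2, up to Monte Carlo error.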
Now we consider the Australian AIDS survival data, available in the R package "MASS". There are 2843 patients diagnosed with AIDS in Australia before 1 July 1991. The data frame contains the following columns: state, sex, date of diagnosis, date of death or end of observation, status ("$A = 0$" (alive) or "$D = 1$" (dead) at end of observation), reported transmission category, and age at diagnosis. There are 1082 alive and 1761 dead cases. We consider the values of the column status. Under the prior distribution given above, the values of the curvature for the two classes (1) and (2) are summarized in Table 4 for a random sample of size $n = 20$ and for the whole data set. The sampled data is $x = (1,1,1,0,1,0,0,0,1,1,0,0,1,0,1,0,0,1,1,0)$. It is interesting to notice that, unlike the sample of size $n = 20$, for the whole data set (i.e., $n = 2843$) the value of the curvature is small for all cases of $\alpha$, $\beta$ and $c$, demonstrating the smaller effect of the prior in the presence of a large sample size.
Example 2
(Multinomial model). Suppose that $x = (x_1, x_2, \ldots, x_k)$ is an observation from a multinomial distribution with parameters $\big(N, (\theta_1, \ldots, \theta_k)\big)$, where $\sum_{i=1}^{k} x_i = N$ and $\sum_{i=1}^{k} \theta_i = 1$. Let the prior $\pi_0(\theta_1, \ldots, \theta_k)$ be $\mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$. Then $\pi_0(\theta_1, \ldots, \theta_k | x)$ is $\mathrm{Dirichlet}(\alpha_1 + x_1, \ldots, \alpha_k + x_k)$.
Let $q(\theta_1, \ldots, \theta_k) \sim \mathrm{Dirichlet}(c\alpha_1, \ldots, c\alpha_k)$. We consider the observation $x = (6, 4, 5, 5)$ generated from $\mathrm{Multinomial}\big(20, (1/4, 1/4, 1/4, 1/4)\big)$. As in Example 1, we use a Monte Carlo approach to compute the curvature values; a sketch for this model is given after this paragraph. Table 5 reports values of the curvature for different values of $\alpha_1, \ldots, \alpha_k$ and $c$. For the cases $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 1$ (uniform prior on the simplex) and $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0.5$ (Jeffreys' prior), the curvature values are prominently small.
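The following minimal sketch (our own, with a hypothetical helper name) adapts Algorithm 1 to the multinomial-Dirichlet setting; note that scipy's Dirichlet density expects the component axis first, hence the transpose:

```python
import numpy as np
from scipy import stats

def curvature_dirichlet(x, alpha, c, a, S=10**6, seed=None):
    """Monte Carlo curvatures for Example 2: pi_0 = Dirichlet(alpha),
    q = Dirichlet(c * alpha), posterior pi_0(theta|x) = Dirichlet(alpha + x)."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    theta = rng.dirichlet(alpha + np.asarray(x, dtype=float), size=S)
    # log of q(theta)/pi_0(theta) at each posterior draw (components in rows)
    log_ratio = (stats.dirichlet.logpdf(theta.T, c * alpha)
                 - stats.dirichlet.logpdf(theta.T, alpha))
    return a * np.exp(log_ratio).var(), a * log_ratio.var()

# Observation x = (6, 4, 5, 5) with a uniform Dirichlet prior and c = 1.5:
print(curvature_dirichlet([6, 4, 5, 5], [1, 1, 1, 1], c=1.5, a=0.5))
```

For these inputs the output should be near the corresponding Table 5 entries (about 0.0185 and 0.0071), up to Monte Carlo error.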
Example 3
(Location normal model). Suppose that $x = (x_1, x_2, \ldots, x_n)$ is a sample from the $N(\theta, 1)$ distribution with $\theta \in \mathbb{R}$. Let the prior $\pi_0(\theta)$ of $\theta$ be $N(\theta_0, \sigma_0^2)$. Then

$$\pi_0(\theta|x) \sim N\big(\mu_x, \sigma_x^2\big), \quad \text{where} \quad \mu_x = \frac{\theta_0/\sigma_0^2 + n\bar{x}}{1/\sigma_0^2 + n} \quad \text{and} \quad \sigma_x^2 = \left(\frac{1}{\sigma_0^2} + n\right)^{-1}.$$

Let $q(\theta) \sim N(c\theta_0, \sigma_0^2)$, $c > 0$. Due to some interesting theoretical properties in this example, we present the exact formulas of the curvature for classes (1) and (2). We have

$$\frac{q(\theta)}{\pi_0(\theta)} = \exp\left\{\frac{\theta_0\,\theta\,(c-1) + 0.5\,\theta_0^2(1 - c^2)}{\sigma_0^2}\right\}. \tag{20}$$
Therefore, for class (1), we have

$$Var_{\pi_0(\theta|x)}\!\left(\frac{q(\theta)}{\pi_0(\theta)}\right) = E_{\pi_0(\theta|x)}\!\left[\left(\frac{q(\theta)}{\pi_0(\theta)}\right)^2\right] - E_{\pi_0(\theta|x)}^2\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right] = \exp\left\{\frac{\theta_0^2(1-c^2)}{\sigma_0^2}\right\}\left[M_{\pi_0(\theta|x)}\!\left(\frac{2\theta_0(c-1)}{\sigma_0^2}\right) - M_{\pi_0(\theta|x)}^2\!\left(\frac{\theta_0(c-1)}{\sigma_0^2}\right)\right],$$

where $M_{\pi_0(\theta|x)}(t)$ is the moment generating function with respect to the density $\pi_0(\theta|x)$. Thus, $Var_{\pi_0(\theta|x)}\big(q(\theta)/\pi_0(\theta)\big)$ is equal to

$$\begin{aligned} &\exp\left\{\frac{\theta_0^2(1-c^2)}{\sigma_0^2}\right\}\left[\exp\left\{\frac{2\theta_0(c-1)\mu_x}{\sigma_0^2} + \frac{2\theta_0^2(c-1)^2\sigma_x^2}{\sigma_0^4}\right\} - \exp\left\{\frac{2\theta_0(c-1)\mu_x}{\sigma_0^2} + \frac{\theta_0^2(c-1)^2\sigma_x^2}{\sigma_0^4}\right\}\right] \\ &\quad = \exp\left\{\frac{\theta_0^2(1-c^2)}{\sigma_0^2}\right\}\exp\left\{\frac{2\theta_0(c-1)\mu_x}{\sigma_0^2}\right\}\exp\left\{\frac{\theta_0^2(c-1)^2\sigma_x^2}{\sigma_0^4}\right\}\times\left[\exp\left\{\frac{\theta_0^2(c-1)^2\sigma_x^2}{\sigma_0^4}\right\} - 1\right]. \tag{21} \end{aligned}$$
On the other hand, for the geometric contamination class, we have

$$\ln\frac{q(\theta)}{\pi_0(\theta)} = \frac{\theta_0\,\theta\,(c-1) + 0.5\,\theta_0^2(1 - c^2)}{\sigma_0^2}.$$

Thus, by (20), we get

$$Var_{\pi_0(\theta|x)}\!\left(\ln\frac{q(\theta)}{\pi_0(\theta)}\right) = \frac{\theta_0^2(c-1)^2}{\sigma_0^4}\, Var_{\pi_0(\theta|x)}(\theta) = \frac{\theta_0^2(c-1)^2}{\sigma_0^4}\,\sigma_x^2 = \frac{\theta_0^2(c-1)^2}{\sigma_0^4}\left(\frac{1}{\sigma_0^2} + n\right)^{-1}. \tag{22}$$
Interestingly, from (22), $Var_{\pi_0(\theta|x)}\big(\ln q(\theta)/\pi_0(\theta)\big)$ depends on the sample only through its size $n$. For fixed values of $\theta_0$ and $c$, as $n \to \infty$ or $\sigma_0 \to \infty$, $Var_{\pi_0(\theta|x)}\big(\ln q(\theta)/\pi_0(\theta)\big) \to 0$, which indicates robustness. Also, for fixed values of $\sigma_0$ and $n$, as $\theta_0 \to \infty$ or $c \to \infty$, $Var_{\pi_0(\theta|x)}\big(\ln q(\theta)/\pi_0(\theta)\big) \to \infty$ and no robustness will be found.
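Because (22) is available in closed form, curvature entries can be checked directly; a small sketch (our own code, with a hypothetical function name) follows:

```python
def curvature_geometric_normal(theta0, sigma0_sq, c, a, n):
    """Exact curvature a * Var from (22) for the location-normal model:
    C_a^{Gamma_g} = a * theta0^2 * (c - 1)^2 / sigma0^4 * (1/sigma0^2 + n)^-1."""
    sigma_x_sq = 1.0 / (1.0 / sigma0_sq + n)
    return a * theta0 ** 2 * (c - 1.0) ** 2 / sigma0_sq ** 2 * sigma_x_sq

# theta0 = 0.5, sigma0^2 = 1, c = 3, a = 0.5, n = 20 gives 0.0238,
# matching the corresponding C_a^{Gamma_g} entry in Table 6.
print(curvature_geometric_normal(0.5, 1.0, 3.0, 0.5, 20))
```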
Now we consider a numerical example by generating a sample of size $n = 20$ from the $N(4, 1)$ distribution. We obtain

$$x = (3.37, 4.18, 3.16, 5.59, 4.32, 3.17, 4.48, 4.73, 4.57, 3.69, 5.51, 4.38, 3.37, 1.78, 5.12, 3.95, 3.98, 4.94, 4.82, 4.59)$$

(with $\bar{x} = 4.1905$). Table 6 reports the values of the curvature for different values of $\theta_0$, $\sigma_0$ and $c$.
Clearly, for large values of $\sigma_0^2$, the value of the curvature is small, which is an indication of robustness. For instance, for $\theta_0 = 0.5$ in Table 6, the value of the curvature when $\sigma_0^2 = 5$ is much smaller than when $\sigma_0^2 = 1$.

5. Conclusions

Measuring the Bayesian robustness of two classes of contaminated priors has been studied. The approach is based on computing the curvature of Rényi divergence between posterior distributions. Two different proofs are given for the results. The first one is general and depends on a direct derivation of the curvatures. The second one uses the connection between the $(h, \phi)$ divergence and the $\phi$ divergence. The derived results do not require specifying values for $\epsilon$, and their computation is straightforward. Examples illustrating the approach are considered. Finally, it is possible to extend the results in this paper to other divergences; see, for instance, Liese and Vajda (1987) [24]. We leave this direction for future work.

Author Contributions

All authors have contributed equally on this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not available.

Acknowledgments

The authors thank the Editor, the Associate Editor and anonymous referees for their important and constructive comments that led to significant improvement of the paper. In particular, the connection between ( h , ϕ ) divergence and ϕ divergence is highly appreciated.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Robertis, L.; Hartigan, J.A. Bayesian inference using intervals of measures. Ann. Stat. 1981, 9, 235–244.
  2. Das Gupta, A.; Studden, W.J. Robust Bayesian Analysis and Optimal Experimental Designs in Normal Linear Models with Many Parameters I; Tech. Report; Department of Statistics, Purdue University: West Lafayette, IN, USA, 1988.
  3. Das Gupta, A.; Studden, W.J. Variations in Posterior Measures for Priors in a Band: Effect of Additional Restrictions; Tech. Report; Department of Statistics, Purdue University: West Lafayette, IN, USA, 1988.
  4. Berger, J. The robust Bayesian viewpoint (with discussion). In Robustness in Bayesian Statistics; Kadane, J., Ed.; Springer: Amsterdam, The Netherlands, 1984.
  5. Berger, J. Robust Bayesian analysis: Sensitivity to the prior. J. Stat. Plan. Inference 1990, 25, 303–328.
  6. Berger, J.; Berliner, L.M. Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Stat. 1986, 14, 461–486.
  7. Sivaganesan, S.; Berger, J. Ranges of posterior measures for priors with unimodal contaminations. Ann. Stat. 1989, 17, 868–889.
  8. Wasserman, L. A robust Bayesian interpretation of likelihood regions. Ann. Stat. 1989, 17, 1387–1393.
  9. Dey, D.K.; Birmiwal, L.R. Robust Bayesian analysis using divergence measures. Stat. Probab. Lett. 1994, 20, 287–294.
  10. Al-Labadi, L.; Evans, M. Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Anal. 2017, 12, 705–728.
  11. Evans, M. Measuring Statistical Evidence Using Relative Belief; Monographs on Statistics and Applied Probability, 144; CRC Press, Taylor & Francis Group: Boca Raton, FL, USA, 2015.
  12. Gelfand, A.E.; Dey, D.K. On measuring Bayesian robustness of contaminated classes of priors. Stat. Decis. 1991, 9, 63–80.
  13. Kanaya, F.; Han, T.S. The asymptotics of posterior entropy and error probability for Bayesian estimation. IEEE Trans. Inf. Theory 1995, 41, 1988–1992.
  14. Jenssen, R.; Hild, K.E.; Erdogmus, D.; Principe, J.C.; Eltoft, T. Clustering using Rényi's entropy. In Proceedings of the International Joint Conference on Neural Networks, Portland, OR, USA, 20–24 July 2003; pp. 523–528.
  15. Bentes, S.R.; Menezes, R.; Mendes, D.A. Long memory and volatility clustering: Is the empirical evidence consistent across stock markets? Phys. A Stat. Mech. Appl. 2008, 387, 3826–3830.
  16. Lake, D.E. Renyi entropy measures of heart rate Gaussianity. IEEE Trans. Biomed. Eng. 2006, 53, 21–27.
  17. Menéndez, M.L.; Morales, D.; Pardo, L.; Salicrú, M. Asymptotic behavior and statistical applications of divergence measures in multinomial populations: A unified study. Stat. Pap. 1995, 36, 1–29.
  18. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
  19. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
  20. van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
  21. Li, Y.; Turner, R.E. Rényi Divergence Variational Inference. arXiv 2016, arXiv:1602.02311.
  22. McCulloch, R. Local prior influence. J. Am. Stat. Assoc. 1989, 84, 473–478.
  23. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
  24. Liese, F.; Vajda, I. Convex Statistical Distances; Teubner-Texte zur Mathematik, Band 95; Teubner: Leipzig, Germany, 1987.
Table 1. Values of the local curvature for the two classes Γ_a and Γ_g for samples generated from Bernoulli(0.5).

| n | α | β | c | C_a^Γa (a=0.5) | C_a^Γg (a=0.5) | C_a^Γa (a=1) | C_a^Γg (a=1) | C_a^Γa (a=2) | C_a^Γg (a=2) |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 0.5 | 0.5 | 0.5 | 8×10^-5 | 0.0002 | 0.0001 | 0.0004 | 0.0003 | 0.0008 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0003 | 0.0002 | 0.0006 | 0.0004 | 0.0013 | 0.0008 |
|  |  |  | 3 | 0.0098 | 0.0033 | 0.0196 | 0.0067 | 0.0393 | 0.0135 |
|  |  |  | 5 | 0.0531 | 0.0135 | 0.1062 | 0.0271 | 0.2125 | 0.0543 |
|  | 1 | 1 | 0.5 | 0.0003 | 0.0007 | 0.0007 | 0.0015 | 0.0014 | 0.0030 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0010 | 0.0007 | 0.0021 | 0.0015 | 0.0042 | 0.0030 |
|  |  |  | 3 | 0.0241 | 0.0121 | 0.0483 | 0.0243 | 0.0967 | 0.0486 |
|  |  |  | 5 | 0.1065 | 0.0486 | 0.2130 | 0.0972 | 0.4260 | 0.1945 |
|  | 1 | 3 | 0.5 | 0.0265 | 0.0235 | 0.0530 | 0.0470 | 0.1060 | 0.0941 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0171 | 0.0235 | 0.0342 | 0.0470 | 0.0684 | 0.0941 |
|  |  |  | 3 | 0.1061 | 0.3767 | 0.2122 | 0.7535 | 0.4244 | 1.5070 |
|  |  |  | 5 | 0.1660 | 1.5070 | 0.3320 | 3.0141 | 0.6641 | 6.0282 |
|  | 3 | 1 | 0.5 | 0.0089 | 0.0113 | 0.0179 | 0.0227 | 0.0358 | 0.0454 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0108 | 0.0113 | 0.0216 | 0.0227 | 0.0433 | 0.0454 |
|  |  |  | 3 | 0.1162 | 0.1819 | 0.2324 | 0.3638 | 0.4648 | 0.7277 |
|  |  |  | 5 | 0.2774 | 0.7277 | 0.5548 | 1.4555 | 1.1096 | 2.9110 |
| 50 | 0.5 | 0.5 | 0.5 | 10^-5 | 4×10^-5 | 3×10^-5 | 8×10^-5 | 6×10^-5 | 0.0001 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 6×10^-5 | 4×10^-5 | 0.0001 | 8×10^-5 | 0.0002 | 0.0001 |
|  |  |  | 3 | 0.0022 | 0.0006 | 0.0044 | 0.0013 | 0.0089 | 0.0026 |
|  |  |  | 5 | 0.0139 | 0.0026 | 0.0279 | 0.0052 | 0.0559 | 0.0104 |
|  | 1 | 1 | 0.5 | 6×10^-5 | 0.0001 | 0.0001 | 0.0003 | 0.0002 | 0.0006 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0002 | 0.0001 | 0.0004 | 0.0003 | 0.0009 | 0.0006 |
|  |  |  | 3 | 0.0066 | 0.0024 | 0.0132 | 0.0049 | 0.0265 | 0.0099 |
|  |  |  | 5 | 0.0359 | 0.0099 | 0.0718 | 0.0198 | 0.1437 | 0.0397 |
|  | 1 | 3 | 0.5 | 0.0106 | 0.0112 | 0.0212 | 0.0225 | 0.0425 | 0.0451 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0087 | 0.0112 | 0.0174 | 0.0225 | 0.0349 | 0.0451 |
|  |  |  | 3 | 0.0490 | 0.1805 | 0.0980 | 0.3610 | 0.1960 | 0.7221 |
|  |  |  | 5 | 0.0535 | 0.7221 | 0.1070 | 1.4442 | 0.2140 | 2.8885 |
|  | 3 | 1 | 0.5 | 0.0042 | 0.0060 | 0.0084 | 0.0121 | 0.0169 | 0.0243 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0061 | 0.0060 | 0.0123 | 0.0121 | 0.0247 | 0.0243 |
|  |  |  | 3 | 0.0672 | 0.0972 | 0.1344 | 0.1944 | 0.2688 | 0.3889 |
|  |  |  | 5 | 0.1407 | 0.3889 | 0.2814 | 0.7779 | 0.5628 | 1.5559 |
Table 2. Values of d_0 and p in (6) (for a ≠ 1) and (7) (for a = 1) under class (1) for a sample generated from Bernoulli(0.5).

| α | β | c |  | a=0.5, ϵ=0.05 | a=0.5, ϵ=0.5 | a=0.5, ϵ=1 | a=1, ϵ=0.05 | a=1, ϵ=0.5 | a=1, ϵ=1 | a=2, ϵ=0.05 | a=2, ϵ=0.5 | a=2, ϵ=1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 0.5 | d_0 | 2×10^-7 | 4×10^-6 | 9×10^-5 | 5×10^-7 | 3×10^-5 | 0.0002 | 10^-6 | 7×10^-5 | 0.0004 |
|  |  |  | p | 0.5003 | 0.5022 | 0.51 | 0.5005 | 0.5042 | 0.5107 | 0.5003 | 0.5041 | 0.5106 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 2×10^-6 | 4×10^-5 | 0.0001 | 2×10^-7 | 5×10^-5 | 0.0001 | 3×10^-7 | 0.0001 | 0.0003 |
|  |  |  | p | 0.5013 | 0.5068 | 0.5104 | 0.5003 | 0.5054 | 0.5098 | 0.5003 | 0.5053 | 0.5096 |
|  |  | 3 | d_0 | 4×10^-6 | 0.0004 | 0.0015 | 10^-5 | 0.0012 | 0.0028 | 3×10^-5 | 0.0023 | 0.0054 |
|  |  |  | p | 0.5022 | 0.5204 | 0.5393 | 0.5031 | 0.5244 | 0.5379 | 0.5030 | 0.5239 | 0.5367 |
|  |  | 5 | d_0 | 5×10^-5 | 0.0019 | 0.0055 | 0.0001 | 0.0048 | 0.0102 | 0.0002 | 0.0090 | 0.0181 |
|  |  |  | p | 0.5071 | 0.5437 | 0.5741 | 0.5074 | 0.5493 | 0.5711 | 0.5074 | 0.5476 | 0.5676 |
| 1 | 1 | 0.5 | d_0 | 7×10^-7 | 5×10^-5 | 0.0003 | 10^-6 | 0.0001 | 0.0008 | 3×10^-6 | 0.0002 | 0.0017 |
|  |  |  | p | 0.5007 | 0.5071 | 0.5193 | 0.5009 | 0.5083 | 0.5204 | 0.5007 | 0.5084 | 0.5207 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 2×10^-7 | 7×10^-5 | 0.0003 | 10^-6 | 0.0002 | 0.0006 | 2×10^-6 | 0.0003 | 0.0013 |
|  |  |  | p | 0.5003 | 0.5084 | 0.5193 | 0.5008 | 0.5100 | 0.5185 | 0.5007 | 0.51 | 0.5180 |
|  |  | 3 | d_0 | 10^-5 | 0.0013 | 0.0050 | 5×10^-5 | 0.0034 | 0.0092 | 0.0001 | 0.0065 | 0.0165 |
|  |  |  | p | 0.5042 | 0.5364 | 0.5706 | 0.5050 | 0.5416 | 0.5677 | 0.505 | 0.5405 | 0.5645 |
|  |  | 5 | d_0 | 8×10^-5 | 0.0050 | 0.0167 | 0.0002 | 0.0124 | 0.0297 | 0.0004 | 0.0225 | 0.0494 |
|  |  |  | p | 0.5092 | 0.5708 | 0.6279 | 0.5107 | 0.5785 | 0.6201 | 0.5106 | 0.5755 | 0.6125 |
| 1 | 3 | 0.5 | d_0 | 2×10^-5 | 0.0032 | 0.0133 | 7×10^-5 | 0.0067 | 0.0282 | 0.0001 | 0.0145 | 0.0623 |
|  |  |  | p | 0.5053 | 0.5565 | 0.6143 | 0.5059 | 0.5580 | 0.6171 | 0.5060 | 0.5604 | 0.6268 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 2×10^-5 | 0.0023 | 0.0104 | 3×10^-5 | 0.0045 | 0.0199 | 7×10^-5 | 0.0088 | 0.0370 |
|  |  |  | p | 0.505 | 0.5484 | 0.6015 | 0.5044 | 0.5476 | 0.5989 | 0.5044 | 0.5472 | 0.5971 |
|  |  | 3 | d_0 | 0.0001 | 0.0175 | 0.1213 | 0.0002 | 0.0349 | 0.2125 | 0.0005 | 0.0691 | 0.3421 |
|  |  |  | p | 0.5119 | 0.6308 | 0.8181 | 0.5115 | 0.6299 | 0.7942 | 0.5117 | 0.6337 | 0.8193 |
|  |  | 5 | d_0 | 0.0002 | 0.0308 | 0.3423 | 0.0004 | 0.0638 | 0.5519 | 0.0008 | 0.1337 | 0.6003 |
|  |  |  | p | 0.5145 | 0.6715 | 0.9536 | 0.5146 | 0.6731 | 0.9087 | 0.5144 | 0.6891 | 0.9535 |
| 3 | 1 | 0.5 | d_0 | 7×10^-6 | 0.0012 | 0.0063 | 2×10^-5 | 0.0027 | 0.0135 | 5×10^-5 | 0.0057 | 0.0295 |
|  |  |  | p | 0.5026 | 0.5356 | 0.5791 | 0.5036 | 0.5369 | 0.5816 | 0.5034 | 0.5379 | 0.5866 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 10^-5 | 0.0013 | 0.0051 | 2×10^-5 | 0.0025 | 0.0096 | 4×10^-5 | 0.0048 | 0.0180 |
|  |  |  | p | 0.5040 | 0.5364 | 0.5713 | 0.5034 | 0.5354 | 0.5692 | 0.5032 | 0.535 | 0.5674 |
|  |  | 3 | d_0 | 0.0001 | 0.0139 | 0.0600 | 0.0002 | 0.0286 | 0.1054 | 0.0005 | 0.0505 | 0.1711 |
|  |  |  | p | 0.5125 | 0.6168 | 0.7342 | 0.5117 | 0.6143 | 0.7180 | 0.5119 | 0.6137 | 0.7160 |
|  |  | 5 | d_0 | 0.0003 | 0.0340 | 0.1724 | 0.0006 | 0.0657 | 0.2786 | 0.0012 | 0.1231 | 0.4062 |
|  |  |  | p | 0.5196 | 0.68 | 0.865 | 0.5183 | 0.6754 | 0.8268 | 0.5177 | 0.6809 | 0.8539 |
Table 3. Values of d_0 and p in (6) (for a ≠ 1) and (7) (for a = 1) under class (2) for a sample generated from Bernoulli(0.5).

| α | β | c |  | a=0.5, ϵ=0.05 | a=0.5, ϵ=0.5 | a=0.5, ϵ=1 | a=1, ϵ=0.05 | a=1, ϵ=0.5 | a=1, ϵ=1 | a=2, ϵ=0.05 | a=2, ϵ=0.5 | a=2, ϵ=1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 0.5 | d_0 | 2×10^-7 | 2×10^-5 | 9×10^-5 | 10^-6 | 5×10^-5 | 0.0002 | 2×10^-6 | 0.0001 | 0.0004 |
|  |  |  | p | 0.5003 | 0.5043 | 0.51 | 0.5007 | 0.5054 | 0.5107 | 0.5007 | 0.5053 | 0.5106 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 7×10^-7 | 3×10^-5 | 0.0001 | 3×10^-8 | 4×10^-5 | 0.0001 | 6×10^-8 | 9×10^-5 | 0.0003 |
|  |  |  | p | 0.5007 | 0.5053 | 0.5104 | 0.5001 | 0.5048 | 0.5098 | 0.5 | 0.505 | 0.5096 |
|  |  | 3 | d_0 | 6×10^-6 | 0.0004 | 0.0015 | 6×10^-6 | 0.0007 | 0.0028 | 10^-5 | 0.0014 | 0.0054 |
|  |  |  | p | 0.5023 | 0.5204 | 0.5393 | 0.5017 | 0.5195 | 0.5379 | 0.5014 | 0.5191 | 0.5367 |
|  |  | 5 | d_0 | 2×10^-5 | 0.0015 | 0.0055 | 2×10^-5 | 0.0028 | 0.0102 | 5×10^-5 | 0.0054 | 0.0181 |
|  |  |  | p | 0.5045 | 0.5393 | 0.5741 | 0.5038 | 0.5379 | 0.5711 | 0.5036 | 0.5367 | 0.5676 |
| 1 | 1 | 0.5 | d_0 | 6×10^-8 | 8×10^-5 | 0.0003 | 2×10^-6 | 0.0002 | 0.0008 | 5×10^-6 | 0.0004 | 0.0017 |
|  |  |  | p | 0.5 | 0.5095 | 0.5193 | 0.5012 | 0.5101 | 0.5204 | 0.5011 | 0.5103 | 0.5207 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 10^-6 | 0.0001 | 0.0003 | 8×10^-7 | 0.0001 | 0.0006 | 10^-6 | 0.0003 | 0.0013 |
|  |  |  | p | 0.5013 | 0.51 | 0.5193 | 0.5006 | 0.5093 | 0.5185 | 0.5007 | 0.5093 | 0.5180 |
|  |  | 3 | d_0 | 10^-5 | 0.0014 | 0.0050 | 2×10^-5 | 0.0026 | 0.0092 | 5×10^-5 | 0.0048 | 0.0165 |
|  |  |  | p | 0.5043 | 0.5373 | 0.5706 | 0.5035 | 0.5360 | 0.5677 | 0.5037 | 0.535 | 0.5645 |
|  |  | 5 | d_0 | 6×10^-5 | 0.0050 | 0.0167 | 0.0001 | 0.0092 | 0.0297 | 0.0002 | 0.0165 | 0.0494 |
|  |  |  | p | 0.5081 | 0.5706 | 0.6279 | 0.5074 | 0.5677 | 0.6201 | 0.5073 | 0.5645 | 0.6125 |
| 1 | 3 | 0.5 | d_0 | 2×10^-5 | 0.0030 | 0.0133 | 6×10^-5 | 0.0064 | 0.0282 | 0.0001 | 0.0135 | 0.0623 |
|  |  |  | p | 0.505 | 0.5555 | 0.6143 | 0.5056 | 0.5566 | 0.6171 | 0.5054 | 0.5583 | 0.6268 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 3×10^-5 | 0.0028 | 0.0104 | 5×10^-5 | 0.0053 | 0.0199 | 0.0001 | 0.0103 | 0.0370 |
|  |  |  | p | 0.5059 | 0.5527 | 0.6015 | 0.5022 | 0.5517 | 0.5989 | 0.5053 | 0.5509 | 0.5971 |
|  |  | 3 | d_0 | 0.0004 | 0.0373 | 0.1213 | 0.0008 | 0.0690 | 0.2125 | 0.0017 | 0.1210 | 0.3421 |
|  |  |  | p | 0.5216 | 0.6878 | 0.8181 | 0.5211 | 0.6795 | 0.7942 | 0.5209 | 0.6793 | 0.8193 |
|  |  | 5 | d_0 | 0.0018 | 0.1213 | 0.3423 | 0.0034 | 0.2125 | 0.5519 | 0.0067 | 0.3421 | 0.6003 |
|  |  |  | p | 0.5425 | 0.8181 | 0.9536 | 0.5417 | 0.7942 | 0.9087 | 0.5411 | 0.8193 | 0.9535 |
| 3 | 1 | 0.5 | d_0 | 10^-5 | 0.0014 | 0.0063 | 3×10^-5 | 0.0031 | 0.0135 | 6×10^-5 | 0.0065 | 0.0295 |
|  |  |  | p | 0.5031 | 0.5381 | 0.5791 | 0.5040 | 0.5394 | 0.5816 | 0.5039 | 0.5403 | 0.5866 |
|  |  | 1 | d_0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | p | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
|  |  | 1.5 | d_0 | 10^-5 | 0.0014 | 0.0052 | 2×10^-5 | 0.0025 | 0.0096 | 4×10^-5 | 0.0049 | 0.0180 |
|  |  |  | p | 0.5041 | 0.5376 | 0.5720 | 0.5034 | 0.5359 | 0.5692 | 0.5033 | 0.5353 | 0.5674 |
|  |  | 3 | d_0 | 0.0002 | 0.0185 | 0.0604 | 0.0004 | 0.0338 | 0.1054 | 0.0008 | 0.0596 | 0.1711 |
|  |  |  | p | 0.5153 | 0.6341 | 0.735 | 0.5145 | 0.6278 | 0.7180 | 0.5143 | 0.6239 | 0.7160 |
|  |  | 5 | d_0 | 0.0008 | 0.0604 | 0.1724 | 0.0016 | 0.1054 | 0.2786 | 0.0032 | 0.1711 | 0.4074 |
|  |  |  | p | 0.53 | 0.735 | 0.865 | 0.5289 | 0.7180 | 0.8268 | 0.5284 | 0.7160 | 0.8545 |
Table 4. Values of the local curvature for the two classes Γ_a and Γ_g for the real AIDS data set.

| n | α | β | c | C_a^Γa (a=0.5) | C_a^Γg (a=0.5) | C_a^Γa (a=1) | C_a^Γg (a=1) | C_a^Γa (a=2) | C_a^Γg (a=2) |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 0.5 | 0.5 | 0.5 | 0.0001 | 0.0004 | 0.0003 | 0.0008 | 0.0006 | 0.0016 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0006 | 0.0004 | 0.0012 | 0.0008 | 0.0025 | 0.0016 |
|  |  |  | 3 | 0.0174 | 0.0065 | 0.0348 | 0.0130 | 0.0697 | 0.0260 |
|  |  |  | 5 | 0.0876 | 0.0260 | 0.1752 | 0.0521 | 0.3504 | 0.1043 |
|  | 1 | 1 | 0.5 | 0.0007 | 0.0014 | 0.0014 | 0.0028 | 0.0029 | 0.0057 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0019 | 0.0014 | 0.0038 | 0.0028 | 0.0076 | 0.0057 |
|  |  |  | 3 | 0.0395 | 0.0229 | 0.0791 | 0.0458 | 0.1583 | 0.0916 |
|  |  |  | 5 | 0.1578 | 0.0916 | 0.3156 | 0.1832 | 0.6312 | 0.3665 |
|  | 1 | 3 | 0.5 | 0.0049 | 0.0071 | 0.0099 | 0.0143 | 0.0198 | 0.0286 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0075 | 0.0071 | 0.0150 | 0.0143 | 0.0301 | 0.0286 |
|  |  |  | 3 | 0.0995 | 0.1146 | 0.1991 | 0.2293 | 0.3982 | 0.4586 |
|  |  |  | 5 | 0.2799 | 0.4586 | 0.5599 | 0.9173 | 1.1198 | 1.8346 |
|  | 3 | 1 | 0.5 | 0.0457 | 0.0319 | 0.0915 | 0.0638 | 0.1831 | 0.1277 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0195 | 0.0319 | 0.0391 | 0.0638 | 0.0782 | 0.1277 |
|  |  |  | 3 | 0.0855 | 0.5111 | 0.1710 | 1.0223 | 0.3420 | 2.0446 |
|  |  |  | 5 | 0.1030 | 2.0446 | 0.2060 | 4.0892 | 0.4121 | 8.1784 |
| 2843 | 0.5 | 0.5 | 0.5 | 9×10^-7 | 2×10^-6 | 10^-6 | 5×10^-6 | 3×10^-6 | 10^-5 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 4×10^-6 | 2×10^-6 | 8×10^-6 | 5×10^-6 | 10^-5 | 10^-5 |
|  |  |  | 3 | 0.0001 | 4×10^-5 | 0.0003 | 8×10^-5 | 0.0006 | 0.0001 |
|  |  |  | 5 | 0.0009 | 0.0001 | 0.0019 | 0.0003 | 0.0038 | 0.0006 |
|  | 1 | 1 | 0.5 | 4×10^-6 | 10^-5 | 9×10^-6 | 2×10^-5 | 10^-5 | 4×10^-5 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 10^-5 | 10^-5 | 3×10^-5 | 2×10^-5 | 6×10^-5 | 4×10^-5 |
|  |  |  | 3 | 0.0004 | 0.0001 | 0.0009 | 0.0003 | 0.0018 | 0.0006 |
|  |  |  | 5 | 0.0025 | 0.0006 | 0.0051 | 0.0013 | 0.0102 | 0.0027 |
|  | 1 | 3 | 0.5 | 0.0005 | 0.0004 | 0.0010 | 0.0008 | 0.0021 | 0.0016 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 0.0002 | 0.0004 | 0.0004 | 0.0008 | 0.0008 | 0.0016 |
|  |  |  | 3 | 0.0002 | 0.0064 | 0.0004 | 0.0129 | 0.0009 | 0.0259 |
|  |  |  | 5 | 10^-5 | 0.0259 | 3×10^-5 | 0.0518 | 7×10^-5 | 0.1037 |
|  | 3 | 1 | 0.5 | 2×10^-5 | 5×10^-5 | 5×10^-5 | 0.0001 | 0.0001 | 0.0002 |
|  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  | 1.5 | 6×10^-5 | 5×10^-5 | 0.0001 | 0.0001 | 0.0002 | 0.0002 |
|  |  |  | 3 | 0.0014 | 0.0008 | 0.0029 | 0.0016 | 0.0058 | 0.0032 |
|  |  |  | 5 | 0.0054 | 0.0032 | 0.0108 | 0.0064 | 0.0216 | 0.0129 |
Table 5. Values of the local curvature for the two classes Γ_a and Γ_g for a sample generated from Multinomial(20, (1/4, 1/4, 1/4, 1/4)).

| α_1 | α_2 | α_3 | α_4 | c | C_a^Γa (a=0.5) | C_a^Γg (a=0.5) | C_a^Γa (a=1) | C_a^Γg (a=1) | C_a^Γa (a=2) | C_a^Γg (a=2) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.25 | 0.25 | 0.25 | 0.5 | 2×10^-5 | 0.0006 | 5×10^-5 | 0.0012 | 0.0001 | 0.0024 |
|  |  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  |  | 1.5 | 0.0031 | 0.0006 | 0.0062 | 0.0012 | 0.0124 | 0.0024 |
|  |  |  |  | 3 | 0.5285 | 0.0097 | 1.0570 | 0.0195 | 2.1141 | 0.0390 |
|  |  |  |  | 5 | 8.4050 | 0.0390 | 16.816 | 0.0780 | 33.632 | 0.1560 |
| 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0001 | 0.0021 | 0.0003 | 0.0043 | 0.0004 | 0.0087 |
|  |  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  |  | 1.5 | 0.0080 | 0.0021 | 0.0161 | 0.0043 | 0.0323 | 0.0087 |
|  |  |  |  | 3 | 0.7706 | 0.0349 | 1.5413 | 0.0699 | 3.0826 | 0.1398 |
|  |  |  |  | 5 | 8.0246 | 0.1398 | 16.049 | 0.2797 | 32.098 | 0.5595 |
| 1 | 1 | 1 | 1 | 0.5 | 0.0008 | 0.0071 | 0.0017 | 0.0142 | 0.0035 | 0.0284 |
|  |  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  |  | 1.5 | 0.0185 | 0.0071 | 0.0370 | 0.0142 | 0.0741 | 0.0284 |
|  |  |  |  | 3 | 0.9799 | 0.1137 | 1.9598 | 0.2274 | 3.9196 | 0.4549 |
|  |  |  |  | 5 | 6.7661 | 0.4549 | 13.532 | 0.9098 | 27.064 | 1.8197 |
| 2 | 1 | 1 | 1 | 0.5 | 0.0018 | 0.0120 | 0.0037 | 0.0240 | 0.0074 | 0.0480 |
|  |  |  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  |  |  | 1.5 | 0.0270 | 0.0120 | 0.0540 | 0.0240 | 0.1081 | 0.0480 |
|  |  |  |  | 3 | 1.1052 | 0.1923 | 2.2104 | 0.3847 | 4.4209 | 0.7695 |
|  |  |  |  | 5 | 6.3984 | 0.7695 | 12.796 | 1.5390 | 25.593 | 3.0780 |
Table 6. Values of the local curvature for the two classes Γ_a and Γ_g for a sample generated from N(4,1).

| θ_0 | σ_0² | c | C_a^Γa (a=0.5) | C_a^Γg (a=0.5) | C_a^Γa (a=1) | C_a^Γg (a=1) | C_a^Γa (a=2) | C_a^Γg (a=2) |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.1 | 0.5 | 0.0001 | 0.0059 | 0.0002 | 0.0119 | 0.0004 | 0.0238 |
|  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  | 1.5 | 0.2908 | 0.0059 | 0.5816 | 0.0119 | 1.1633 | 0.0238 |
|  |  | 3 | 498,033.7 | 0.0953 | 996,067.4 | 0.1907 | 1,992,135 | 0.3814 |
|  |  | 5 | 8×10^12 | 0.3814 | 10^13 | 0.7629 | 3×10^13 | 1.5258 |
| 0.5 | 1 | 0.5 | 0.0002 | 0.0014 | 0.0004 | 0.0029 | 0.0009 | 0.0059 |
|  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  | 1.5 | 0.0081 | 0.0014 | 0.0162 | 0.0029 | 0.0325 | 0.0059 |
|  |  | 3 | 10.629 | 0.0238 | 21.258 | 0.0476 | 42.517 | 0.0953 |
|  |  | 5 | 2964.9 | 0.0953 | 5929.8 | 0.1907 | 11,859.7 | 0.3814 |
| 0.5 | 5 | 0.5 | 4×10^-5 | 5×10^-5 | 8×10^-5 | 0.0001 | 0.0001 | 0.0002 |
|  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  | 1.5 | 8×10^-5 | 5×10^-5 | 0.0001 | 0.0001 | 0.0003 | 0.0002 |
|  |  | 3 | 0.0031 | 0.0009 | 0.0063 | 0.0019 | 0.0127 | 0.0038 |
|  |  | 5 | 0.0288 | 0.0038 | 0.0576 | 0.0076 | 0.1152 | 0.0152 |
| 4 | 5 | 0.5 | 0.0014 | 0.0038 | 0.0029 | 0.0076 | 0.0059 | 0.0152 |
|  |  | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  |  | 1.5 | 0.0020 | 0.0038 | 0.0040 | 0.0076 | 0.0080 | 0.0152 |
|  |  | 3 | 3×10^-7 | 0.0610 | 7×10^-7 | 0.1220 | 10^-6 | 0.2441 |
|  |  | 5 | 9×10^-23 | 0.2441 | 10^-22 | 0.4882 | 3×10^-22 | 0.9765 |