Article

Bayesian Discrepancy Measure: Higher-Order and Skewed Approximations

1 Barcelona School of Economics, Universitat Pompeu Fabra, 08005 Barcelona, Spain
2 Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
3 Department of Statistical Sciences, University of Padova, 35121 Padova, Italy
* Author to whom correspondence should be addressed.
Entropy 2025, 27(7), 657; https://doi.org/10.3390/e27070657
Submission received: 29 April 2025 / Revised: 16 June 2025 / Accepted: 17 June 2025 / Published: 20 June 2025

Abstract:
The aim of this paper is to discuss both higher-order asymptotic expansions and skewed approximations for the Bayesian discrepancy measure used in testing precise statistical hypotheses. In particular, we derive results on third-order asymptotic approximations and skewed approximations for univariate posterior distributions, including cases with nuisance parameters, demonstrating improved accuracy in capturing posterior shape with little additional computational cost over simple first-order approximations. For third-order approximations, connections to frequentist inference via matching priors are highlighted. Moreover, the definition of the Bayesian discrepancy measure and the proposed methodology are extended to the multivariate setting, employing tractable skew-normal posterior approximations obtained via derivative matching at the mode. Accurate multivariate approximations for the Bayesian discrepancy measure are then derived by defining credible regions based on an optimal transport map that transforms the skew-normal approximation to a standard multivariate normal distribution. The performance and practical benefits of these higher-order and skewed approximations are illustrated through two examples.

1. Introduction

Bayesian inference often relies on asymptotic arguments, leading to approximate methods that frequently assume a parametric form for the posterior distribution. In particular, a Gaussian distribution provides a convenient density for a first-order approximation. This practice is formally justified under regularity conditions by the Bernstein–von Mises theorem. However, this approximation fails to capture potential skewness and asymmetry in the posterior distribution. To avoid this drawback, starting from third-order expansions of Laplace’s method for the posterior distributions (see, e.g., [1,2,3], and references therein), possible alternatives are as follows:
  • Higher-order asymptotic approximations: these offer improved accuracy at minimal additional computational cost compared to first-order approximations, and are applicable to posterior distributions and to quantities of interest such as tail probabilities and credible regions (see, e.g., [4], and references therein);
  • Skewed approximations for the posterior distribution: these are theoretically justified by a skewed Bernstein–von Mises theorem (see, e.g., [5,6], and references therein).
The aim of this contribution is to discuss higher-order expansions and skew-symmetric approximations for the Bayesian discrepancy measure (BDM) proposed in [7] for testing precise statistical hypotheses. Specifically, the BDM assesses the compatibility of a given hypothesis with the available information (prior and data). To summarize this information, the posterior median is used, providing a straightforward evaluation of the discrepancy with the null hypothesis. The BDM possesses desirable properties such as consistency and invariance under reparameterization, making it a robust measure of evidence.
For a scalar parameter of interest, even with nuisance parameters, computing the BDM involves evaluating tail areas of the posterior or marginal posterior distribution. A first-order Gaussian approximation can be used, but it may be inaccurate, especially with small sample sizes or many nuisance parameters, since it fails to account for potential posterior asymmetry and skewness. In this respect, the aim of this paper is to provide higher-order asymptotic approximations and skewed asymptotic approximations for the BDM. For the third-order approximations, connections with frequentist inference are highlighted when using objective matching priors.
Also, for multidimensional parameters, while a first-order Gaussian approximation of the posterior distribution can be used to calculate the BDM, it still fails to account for potential posterior asymmetry and skewness. In this respect, this paper also addresses higher-order asymptotic approximations and skewed approximations for the BDM. The latter ones are based on an optimal transport map (see [8,9]), which transforms the skew-normal approximation to a standard multivariate normal distribution.
This paper is organized as follows. Section 2 provides some background for the BDM for a scalar parameter of interest, even with nuisance parameters, and extends the definition to the multivariate framework. Section 3 illustrates higher-order Bayesian approximations for the BDM; connections with frequentist inference are highlighted when using objective matching priors. Section 4 discusses skewed approximations for the posterior distribution and for the BDM, theoretically justified by a skewed Bernstein–von Mises theorem, with new insights into the multivariate framework. Two examples are discussed in Section 5. Finally, some concluding remarks are given in Section 6.

2. Background

Consider a sampling model $f(y; \theta)$, indexed by a parameter $\theta \in \Theta \subseteq \mathbb{R}^d$, $d \geq 1$, and let $L(\theta) = L(\theta; y) = \exp\{\ell(\theta)\}$ be the likelihood function based on a random sample $y = (y_1, \ldots, y_n)$ of size $n$. Given a prior density $\pi(\theta)$ for $\theta$, Bayesian inference for $\theta$ is based on the posterior density $\pi(\theta \mid y) \propto \pi(\theta)\, L(\theta)$.
In several applications, it is of interest to test the precise (or sharp) null hypothesis
$$H_0: \theta = \theta_0$$
against $H_1: \theta \neq \theta_0$. In Bayesian hypothesis testing, the usual approach relies on the well-known Bayes factor (BF), which measures the ratio of posterior to prior odds in favor of the null hypothesis $H_0$. Typically, a high BF, or the weight of evidence $W = \log(\mathrm{BF})$, provides support for $H_0$. However, improper priors can lead to an undetermined BF, and in the context of precise null hypotheses, the BF can be subject to the Jeffreys–Lindley paradox. This paradox highlights a critical divergence between frequentist and Bayesian approaches: as the sample size increases, a p-value can become arbitrarily small, leading to the rejection of the null hypothesis, while the BF can simultaneously provide overwhelming evidence in favor of the same precise null. This typically occurs when the alternative hypothesis is associated with a diffuse prior distribution for the parameter of interest. With such priors, the BF tends to favor the simpler $H_0$ because, while the data might be unlikely under $H_0$, they may be even more poorly supported when averaged over the diffuse alternative space. Furthermore, the BF is not well calibrated, as its finite sampling distribution is generally unknown and may depend on nuisance parameters. To address these limitations, recent research has explored alternative Bayesian measures of evidence for precise null hypothesis testing, including the e-value (see, e.g., [10,11,12] and references therein) and the BDM [7]. In the following, we focus on the Bayesian discrepancy measure of evidence proposed in [7] (see also [13]).
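The Jeffreys–Lindley paradox can be reproduced with a short numerical sketch for a normal mean. The model, prior scale, and sample sizes below are illustrative assumptions, not taken from the paper: we test $H_0: \mu = 0$ with $\bar{y} \sim N(\mu, \sigma^2/n)$ and, under $H_1$, $\mu \sim N(0, \tau^2)$, keeping the z-statistic fixed at 1.96 (two-sided p-value about 0.05) while $n$ grows.

```python
# Illustrative sketch of the Jeffreys-Lindley paradox (all numbers are
# assumptions, not from the paper): normal mean, H0: mu = 0 versus
# H1: mu ~ N(0, tau^2), with ybar ~ N(mu, sigma^2/n).
from math import exp, sqrt, pi

def npdf(x, var):
    """Normal density with mean 0 and variance var."""
    return exp(-0.5 * x * x / var) / sqrt(2.0 * pi * var)

def bf01(ybar, n, sigma=1.0, tau=1.0):
    """Bayes factor in favor of H0: ratio of the marginal densities of ybar."""
    return npdf(ybar, sigma**2 / n) / npdf(ybar, tau**2 + sigma**2 / n)

# Keep the z-statistic fixed at 1.96 (p-value ~ 0.05) while n grows
for n in (10, 1000, 100000):
    ybar = 1.96 / sqrt(n)
    print(n, bf01(ybar, n))
```

As $n$ increases, the same "just significant" p-value is accompanied by a Bayes factor that increasingly favors $H_0$, which is precisely the divergence described above.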

2.1. Scalar Case

The BDM gives an absolute evaluation of a hypothesis $H_0$ in light of prior knowledge about the parameter and observed data. In the absolutely continuous case, for testing (1), the BDM is defined as
$$\delta_H = 1 - 2\,\min\left\{ \int_{\theta_0}^{+\infty} \pi(\theta \mid y)\, d\theta,\; 1 - \int_{\theta_0}^{+\infty} \pi(\theta \mid y)\, d\theta \right\}.$$
The quantity $\min\left\{ \int_{\theta_0}^{+\infty} \pi(\theta \mid y)\, d\theta,\; 1 - \int_{\theta_0}^{+\infty} \pi(\theta \mid y)\, d\theta \right\}$ can be interpreted as the posterior probability of a "tail" event concerning only the precise hypothesis $H_0$. Doubling this "tail" probability, related to the precise hypothesis $H_0$, one obtains a posterior probability assessment of how "central" the hypothesis $H_0$ is and, hence, of how well it is supported by the prior and the data. This interpretation is related to an alternative definition of $\delta_H$. Let $\theta_m$ be the posterior median and consider the interval defined as $I_E = (\theta_0, +\infty)$ if $\theta_m < \theta_0$, or as $I_E = (-\infty, \theta_0)$ if $\theta_0 < \theta_m$. Then, the BDM of the hypothesis $H_0$ can be computed as
$$\delta_H = 1 - 2\, P(\theta \in I_E \mid y) = 1 - 2 \int_{I_E} \pi(\theta \mid y)\, d\theta.$$
Note that the quantity $1 - 2\, P(\theta \in I_E \mid y)$ represents the posterior probability of an equi-tailed credible interval for $\theta$.
The Bayesian discrepancy test assesses the hypothesis $H_0$ on the basis of the BDM. High values of $\delta_H$ indicate strong evidence against $H_0$, whereas low values suggest that the data are consistent with $H_0$. Under $H_0$, for large sample sizes, $\delta_H$ is asymptotically uniformly distributed on $[0, 1]$. Conversely, when $H_0$ is false, $\delta_H$ tends to 1 in probability. While thresholds can be set to interpret $\delta_H$, in line with the ASA statement, we agree with Fisher that significance levels should be tailored to each case based on evidence and ideas.
The BDM remains invariant under invertible monotonic reparametrizations. Under general regularity conditions, and assuming Cromwell's rule for prior selection, $\delta_H$ exhibits the following properties: (1) if $\theta_0 = \theta_t$ (the true value of the parameter), $\delta_H$ tends toward a uniform distribution as the sample size increases; (2) if $\theta_0 \neq \theta_t$, $\delta_H$ converges to 1 in probability. Furthermore, using a matching prior, $\delta_H$ is exactly uniformly distributed for all sample sizes.
The practical computation of $\delta_H$ requires the evaluation of tail areas of the following form:
$$P(\theta \geq \theta_0 \mid y) = \int_{\theta_0}^{+\infty} \pi(\theta \mid y)\, d\theta.$$
The derivation of a first-order tail area approximation is simple, since it uses a Gaussian approximation. With this approximation, a first-order approximation for $\delta_H$ when testing (1) is simply given by
$$\delta_H \doteq 2\,\Phi\!\left( |\theta_0 - \hat{\theta}|\, j(\hat{\theta})^{1/2} \right) - 1,$$
where $\hat{\theta}$ is the maximum likelihood estimate (MLE) of $\theta$, $j(\theta) = -\ell^{(2)}(\theta) = -\partial^2 \ell(\theta)/\partial\theta^2$ is the observed information, the symbol "$\doteq$" indicates that the approximation is accurate to $O(n^{-1/2})$, and $\Phi(\cdot)$ is the standard normal distribution function. Thus, to first order, $\delta_H$ agrees numerically with $1 - p$, where $p$ is the p-value based on the Wald statistic $w(\theta) = (\hat{\theta} - \theta)/ j(\hat{\theta})^{-1/2}$, and also with the first-order approximation of the e-value (see, e.g., [14]). In practice, the approximation (5) of $\delta_H$ may be inaccurate, particularly for small sample sizes, because it forces the posterior distribution to be symmetric.
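As a concrete illustration of the first-order formula, the following sketch uses an assumed exponential model $y_i \sim \mathrm{Exp}(\theta)$ with a flat prior; the values of $n$, $s = \sum_i y_i$, and $\theta_0$ are illustrative and not from the paper. With a flat prior the posterior is Gamma$(n+1, s)$, which is integrated numerically as the exact benchmark.

```python
# Sketch of the first-order (Wald/Gaussian) approximation of the BDM for an
# assumed exponential model y_i ~ Exp(theta) with a flat prior; n, s and
# theta0 below are illustrative, not from the paper.
from math import exp, sqrt, erf, log

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, s = 50, 50.0           # sample size and sum of observations (assumed)
theta_hat = n / s         # MLE of the rate
j_hat = n / theta_hat**2  # observed information at the MLE
theta0 = 0.8              # hypothesised value

# First-order: delta_H ~ 2 Phi(|theta0 - theta_hat| j(theta_hat)^{1/2}) - 1
delta_first = 2.0 * Phi(abs(theta0 - theta_hat) * sqrt(j_hat)) - 1.0

# Exact BDM from the Gamma(n+1, s) posterior (flat prior), by midpoint rule
def post_cdf(t, grid=20000, upper=3.0):
    xs = [upper * (k + 0.5) / grid for k in range(grid)]
    w = [exp(n * log(x) - s * x) for x in xs]    # unnormalised posterior
    return sum(wi for x, wi in zip(xs, w) if x <= t) / sum(w)

delta_exact = abs(2.0 * post_cdf(theta0) - 1.0)
print(delta_first, delta_exact)
```

Even with $n = 50$, the symmetric approximation visibly deviates from the exact value, which motivates the higher-order and skewed corrections discussed below.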

2.2. Nuisance Parameters

In most applications, $\theta$ is partitioned as $\theta = (\psi, \lambda)$, where $\psi$ is a scalar parameter of interest and $\lambda$ is a $(d-1)$-dimensional nuisance parameter, and it is of interest to test the precise (or sharp) null hypothesis
$$H_0: \psi = \psi_0$$
against $H_1: \psi \neq \psi_0$. In the absolutely continuous case, for testing (6) in the presence of nuisance parameters, the BDM is defined as
$$\delta_H = 1 - 2\,\min\left\{ \int_{\psi_0}^{+\infty} \pi_m(\psi \mid y)\, d\psi,\; 1 - \int_{\psi_0}^{+\infty} \pi_m(\psi \mid y)\, d\psi \right\},$$
where $\pi_m(\psi \mid y)$ is the marginal posterior density for $\psi$, given by
$$\pi_m(\psi \mid y) = \int \pi(\psi, \lambda \mid y)\, d\lambda \propto \int \pi(\psi, \lambda)\, L(\psi, \lambda)\, d\lambda.$$
Also in this framework, the practical computation of $\delta_H$ requires the evaluation of tail areas of the following form:
$$P_m(\psi \geq \psi_0 \mid y) = \int_{\psi_0}^{+\infty} \pi_m(\psi \mid y)\, d\psi.$$
The derivation of a first-order tail area approximation is still simple, since it uses a Gaussian approximation. Let $\ell_p(\psi) = \log L(\psi, \hat{\lambda}_\psi)$ be the profile log-likelihood for $\psi$, where $\hat{\lambda}_\psi$ denotes the constrained MLE of $\lambda$ given $\psi$. Moreover, let $(\hat{\psi}, \hat{\lambda})$ be the full MLE, and let $j_p(\psi) = -\ell_p^{(2)}(\psi) = -\partial^2 \ell_p(\psi)/\partial\psi^2$ be the profile observed information. A first-order approximation for $\delta_H$ when testing (6) is simply given by
$$\delta_H \doteq 2\,\Phi\!\left( |\psi_0 - \hat{\psi}|\, j_p(\hat{\psi})^{1/2} \right) - 1.$$
Thus, to first order, $\delta_H$ agrees numerically with $1 - p$, where $p$ is the p-value based on the profile Wald statistic $w_p(\psi) = (\hat{\psi} - \psi)/ j_p(\hat{\psi})^{-1/2}$. In practice, as in the scalar parameter case, the approximation (10) of $\delta_H$ may be inaccurate, particularly for a small sample size or a large number of nuisance parameters, since it fails to account for potential posterior asymmetry and skewness.

2.3. The Multivariate Case

Extending the definition of the BDM to the multivariate setting, where $\theta \in \Theta \subseteq \mathbb{R}^d$ with $d > 1$, presents some challenges. The core concepts of the univariate definition rely on the unique ordering of the real line and on the uniquely defined median, which splits the probability mass into two equal halves (tail areas). In $\mathbb{R}^d$, with $d > 1$, there is no natural unique ordering, and concepts like the median and "tail areas" relative to a specific point $\theta_0$ lack a single, universally accepted definition. Despite these challenges, the fundamental goal remains the same, that is, to quantify how consistent the hypothesized value $\theta_0$ is with the posterior distribution $\pi(\theta \mid y)$; specifically, measuring how "central" or, conversely, how "extreme" $\theta_0$ lies within the posterior distribution.
Utilizing the notion of center-outward quantile functions ([8,9]), a concept from recent multivariate statistics, provides a theoretically appealing way to define the multivariate BDM. Let $F_P^{\pm}: \mathbb{R}^d \to \mathbb{B}^d$ be the center-outward distribution function mapping the posterior distribution $P_\theta$ (with density $\pi(\theta \mid y)$) to the uniform distribution $U_d$ on the unit ball $\mathbb{B}^d$. More precisely, the center-outward distribution function $F_P^{\pm}: \mathbb{R}^d \to \mathbb{B}^d$ is defined as the almost-everywhere unique gradient of a convex function that pushes a distribution $P_\theta$ forward to the uniform distribution $U_d$ on the unit ball $\mathbb{B}^d$ in $\mathbb{R}^d$. That is,
$$F_P^{\pm} := \nabla g, \quad \text{such that} \quad F_P^{\pm} \# P_\theta = U_d.$$
The center-outward quantile function $Q_P^{\pm}$ is defined as the (continuous) inverse of $F_P^{\pm}$, i.e.,
$$Q_P^{\pm} := (F_P^{\pm})^{-1}.$$
It maps the open unit ball $\mathbb{B}^d$ (minus the origin) to $\mathbb{R}^d \setminus (F_P^{\pm})^{-1}(0)$ and satisfies
$$Q_P^{\pm} \# U_d = P_\theta.$$
For $\tau \in (0, 1)$, we define the center-outward quantile region of order $\tau$ as
$$R_P^{\pm}(\tau) := Q_P^{\pm}(\tau \mathbb{B}^d),$$
and the center-outward quantile contour of order $\tau$ as
$$C_P^{\pm}(\tau) := Q_P^{\pm}(\tau \mathbb{S}^{d-1}),$$
where $\mathbb{S}^{d-1}$ is the unit sphere in $\mathbb{R}^d$. When $d = 1$, this coincides with the rescaled univariate cumulative distribution function $F_P^{\pm}(x) = 2 F_P(x) - 1$, and the BDM (7) can be expressed as
$$\delta_H = | F_P^{\pm}(\theta_0) |.$$
This measures the (rescaled) distance of the quantile rank of $\theta_0$ from the center point (corresponding to rank 0). Generalizing this, we can define the multivariate BDM for the hypothesis $H_0: \theta = \theta_0$ as
$$\delta_H = \| F_P^{\pm}(\theta_0) \|,$$
where $\|\cdot\|$ denotes the standard Euclidean norm in $\mathbb{R}^d$. Here, $F_P^{\pm}(\theta_0)$ maps the point $\theta_0$ to a location $u$ within the unit ball $\mathbb{B}^d$. This definition has desirable properties (see [8]):
  • It yields a value between 0 and 1;
  • $\delta_H = 0$ if $\theta_0$ corresponds to the geometric center (or multivariate median) of the distribution (mapped to 0 by $F_P^{\pm}$);
  • $\delta_H$ increases as $\theta_0$ moves away from the center toward the "boundary" of the distribution, approaching 1 for points mapped near the surface of the unit ball, $\mathbb{S}^{d-1}$;
  • It is invariant under suitable classes of transformations (affine transformations if $P_\theta$ is elliptically contoured and, more generally, monotone transformations linked to an optimal transport map construction);
  • It naturally reduces to the univariate definition $\delta_H = | F_P^{\pm}(\theta_0) |$ when $d = 1$.
The primary practical difficulty lies in computing the center-outward distribution function $F_P^{\pm}(\cdot)$ for an arbitrary posterior distribution $\pi(\theta \mid y)$, as it typically requires solving a complex optimal transport problem (see [15]).
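For elliptically contoured posteriors, however, the map has a closed form. As a hedged sketch (the Gaussian posterior and the value of $\theta_0$ below are assumptions for illustration): if the posterior is $N_d(\mu, \Sigma)$, the center-outward rank of $\theta_0$ reduces to the $\chi_d$ distribution function of its Mahalanobis distance, so $\delta_H = F_{\chi_d}\!\left( \| \Sigma^{-1/2}(\theta_0 - \mu) \| \right)$, which for $d = 1$ recovers $2\Phi(|z|) - 1$.

```python
# Closed-form multivariate BDM for an elliptical (here Gaussian) posterior:
# delta_H = F_{chi_d}(||Sigma^{-1/2}(theta0 - mu)||). The posterior mean,
# covariance, and theta0 are illustrative assumptions.
from math import exp, gamma, log, sqrt, erf

def reg_lower_gamma(a, x, terms=200):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    total, term = 0.0, 1.0 / a
    for k in range(1, terms):
        total += term
        term *= x / (a + k)
    return total * exp(a * log(x) - x - log(gamma(a)))

def chi_cdf(r, d):
    """F_{chi_d}(r) = P(chi^2_d <= r^2)."""
    return reg_lower_gamma(d / 2.0, r * r / 2.0)

# Hypothetical bivariate Gaussian posterior with diagonal covariance
mu = (1.0, 2.0)
sd = (0.5, 1.0)
theta0 = (1.5, 1.0)
r = sqrt(sum(((t - m) / s) ** 2 for t, m, s in zip(theta0, mu, sd)))
delta_H = chi_cdf(r, d=2)
print(delta_H)

# Sanity check: for d = 1 this reduces to the univariate 2*Phi(|z|) - 1
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
assert abs(chi_cdf(1.3, 1) - (2.0 * Phi(1.3) - 1.0)) < 1e-9
```

For non-elliptical posteriors no such shortcut exists, and the optimal transport problem must be solved numerically, which motivates the tractable skew-normal route of Section 4.3.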

3. Beyond Gaussian I: Higher-Order Asymptotic Approximations

3.1. Scalar Case

In order to obtain more accurate evaluations than the first-order approximation (5) of $\delta_H$, it may be useful to resort to higher-order approximations based on tail area arguments (see, e.g., [3,4], and references therein). Applying the tail area argument to the posterior density, we can derive the $O(n^{-3/2})$ approximation:
$$P(\theta \geq \theta_0 \mid y) \mathrel{\ddot{=}} \Phi(r_B(\theta_0)),$$
where the symbol "$\ddot{=}$" indicates that the approximation is accurate to $O(n^{-3/2})$ and
$$r_B(\theta) = r(\theta) + \frac{1}{r(\theta)} \log \frac{q(\theta)}{r(\theta)},$$
with $r(\theta) = \mathrm{sign}(\hat{\theta} - \theta)\, [ 2( \ell(\hat{\theta}) - \ell(\theta) ) ]^{1/2}$, the likelihood root, and
$$q(\theta) = \ell^{(1)}(\theta)\, j(\hat{\theta})^{-1/2}\, \frac{\pi(\hat{\theta})}{\pi(\theta)}.$$
In the expression of $q(\theta)$, $\ell^{(1)}(\theta) = \partial \ell(\theta)/\partial \theta$ is the score function.
Using the tail area approximation (12), a third-order approximation of the BDM (2) can be computed as
$$\delta_H \mathrel{\ddot{=}} 1 - 2\,\min\{ \Phi(r_B(\theta_0)),\; 1 - \Phi(r_B(\theta_0)) \} = 2\,\Phi( | r_B(\theta_0) | ) - 1.$$
Note that the higher-order approximation (13) does not impose any condition on the prior $\pi(\theta)$, i.e., the prior can also be improper, and the approximation is available at negligible additional computational cost over the simple first-order approximation.
Note also that, using $r_B(\theta)$, a $(1-\alpha)$ equi-tailed credible interval for $\theta$ can be computed as $CI = \{\theta : |r_B(\theta)| \leq z_{1-\alpha/2}\}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution; in practice, this interval can reflect asymmetries of the posterior. Moreover, from (12), the posterior median can be computed as the solution in $\theta$ of the estimating equation $r_B(\theta) = 0$.
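The quantities above can be coded directly. A minimal sketch for an assumed exponential model $y_i \sim \mathrm{Exp}(\theta)$ with a flat prior (so that the prior ratio in $q(\theta)$ cancels; the values of $n$, $s$, and $\theta_0$ are illustrative, not from the paper), with the exact Gamma posterior tail obtained by numerical integration as a benchmark:

```python
# Third-order tail-area approximation Phi(r_B) for an assumed exponential
# model with a flat prior (pi ratio in q(theta) equals 1); n, s, theta0 are
# illustrative. Flat prior => posterior is Gamma(n+1, s).
from math import exp, log, sqrt, erf, copysign

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, s = 10, 12.0
theta_hat = n / s                        # MLE
ell = lambda th: n * log(th) - s * th    # log-likelihood
score = lambda th: n / th - s            # l^(1)(theta)
j_hat = n / theta_hat**2                 # observed information at the MLE

def r_B(th):
    r = copysign(sqrt(2.0 * (ell(theta_hat) - ell(th))), theta_hat - th)
    q = score(th) * j_hat ** -0.5        # flat prior: pi(theta_hat)/pi(theta) = 1
    return r + log(q / r) / r

theta0 = 0.6
tail_approx = Phi(r_B(theta0))           # ~ P(theta >= theta0 | y)
delta_H = 2.0 * Phi(abs(r_B(theta0))) - 1.0
print(tail_approx, delta_H)
```

Even with $n = 10$, the third-order tail area typically matches the exact posterior tail to two or three decimal places, at essentially the cost of one extra log-likelihood evaluation.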

3.2. Nuisance Parameters

When $\theta$ is partitioned as $\theta = (\psi, \lambda)$, where $\psi$ is a scalar parameter of interest and $\lambda$ is a $(d-1)$-dimensional nuisance parameter, in order to obtain more accurate evaluations than the first-order approximation (10) of $\delta_H$, applying the tail area argument to the marginal posterior density we can derive the $O(n^{-3/2})$ approximation (see, e.g., [3,4]):
$$P_m(\psi \geq \psi_0 \mid y) \mathrel{\ddot{=}} \Phi(r_{Bp}(\psi_0)),$$
where
$$r_{Bp}(\psi) = r_p(\psi) + \frac{1}{r_p(\psi)} \log \frac{q_B(\psi)}{r_p(\psi)},$$
with $r_p(\psi) = \mathrm{sign}(\hat{\psi} - \psi)\, [ 2( \ell_p(\hat{\psi}) - \ell_p(\psi) ) ]^{1/2}$, the profile likelihood root, and
$$q_B(\psi) = \ell_p^{(1)}(\psi)\, j_p(\hat{\psi})^{-1/2}\, \frac{ | j_{\lambda\lambda}(\psi, \hat{\lambda}_\psi) |^{1/2} }{ | j_{\lambda\lambda}(\hat{\psi}, \hat{\lambda}) |^{1/2} }\, \frac{\pi(\hat{\psi}, \hat{\lambda})}{\pi(\psi, \hat{\lambda}_\psi)}.$$
In the expression of $q_B(\psi)$, $\ell_p^{(1)}(\psi)$ is the profile score function and $j_{\lambda\lambda}(\psi, \lambda)$ represents the $(\lambda, \lambda)$-block of the observed information $j(\psi, \lambda)$.
Using the tail area approximation (14), a third-order approximation of the BDM (7) can be computed as
$$\delta_H \mathrel{\ddot{=}} 1 - 2\,\min\{ \Phi(r_{Bp}(\psi_0)),\; 1 - \Phi(r_{Bp}(\psi_0)) \} = 2\,\Phi( | r_{Bp}(\psi_0) | ) - 1.$$
Note that the higher-order approximation (15) does not impose any condition on the prior $\pi(\psi, \lambda)$, i.e., the prior can also be improper. Note also that, using $r_{Bp}(\psi)$, a $(1-\alpha)$ equi-tailed credible interval for $\psi$ can be computed as $CI = \{\psi : |r_{Bp}(\psi)| \leq z_{1-\alpha/2}\}$. Moreover, from (14), the posterior median of (8) can be computed as the solution in $\psi$ of the estimating equation $r_{Bp}(\psi) = 0$.

Approximations with Matching Priors

The order of the approximations in the previous sections refers to the posterior distribution function and may depend, to varying degrees, on the choice of prior. A so-called strong matching prior (see [16], and references therein) ensures that a frequentist p-value coincides, to a high degree of approximation, with a Bayesian posterior survivor probability computed from the marginal posterior density (8).
Welch and Peers [17] showed that, for a scalar parameter $\theta$, Jeffreys' prior is probability matching, in the sense that posterior survivor probabilities agree with frequentist probabilities, and credible intervals of a chosen width coincide with frequentist confidence intervals. With Jeffreys' prior, $\pi(\theta) \propto i(\theta)^{1/2}$, we have
$$q(\theta) = \ell^{(1)}(\theta)\, j(\hat{\theta})^{-1/2}\, \frac{ i(\hat{\theta})^{1/2} }{ i(\theta)^{1/2} },$$
and the corresponding $r_B(\theta)$ coincides with the frequentist modified likelihood root as defined by [18]. In this case, using the tail area approximation (12), a third-order approximation of the BDM of the hypothesis $H_0: \theta = \theta_0$ coincides with $1 - p$, where $p$ is the p-value based on the frequentist modified likelihood root. Thus, when using Jeffreys' prior and higher-order asymptotics in the scalar case, there is agreement between Bayesian and frequentist hypothesis testing.
In the presence of nuisance parameters, following [4], when using a strong matching prior, the marginal posterior density can be written as
$$\pi_m(\psi \mid y) \mathrel{\ddot{\propto}} \exp\left\{ -\frac{1}{2}\, r_p(\psi)^2 \right\} \frac{s_p(\psi)}{r_p(\psi)},$$
where $s_p(\psi) = \ell_p^{(1)}(\psi) / j_p(\hat{\psi})^{1/2}$ is the profile score statistic, and $r_p^{*}(\psi)$ is the modified profile likelihood root:
$$r_p^{*}(\psi) = r_p(\psi) + \frac{1}{r_p(\psi)} \log \frac{q_p(\psi)}{r_p(\psi)},$$
which has a third-order standard normal null distribution. In (17), the quantity $q_p(\psi)$ is a suitably defined correction term (see, e.g., [18] and [19], Chapter 9). Moreover, the tail area of the marginal posterior for $\psi$ can be approximated to third order as
$$P_m(\psi \geq \psi_0 \mid y) \mathrel{\ddot{=}} \Phi( r_p^{*}(\psi_0) ).$$
A remarkable advantage of (16) and (18) is that their expressions automatically include the matching prior, without requiring its explicit computation.
Using (18), an asymptotic equi-tailed credible interval for $\psi$ can be computed as $CI = \{\psi : |r_p^{*}(\psi)| \leq z_{1-\alpha/2}\}$, i.e., as a confidence interval for $\psi$ based on (17) with approximate level $(1-\alpha)$. Note from (18) that the posterior median of $\pi_m(\psi \mid y)$ can be computed as the solution in $\psi$ of the estimating equation $r_p^{*}(\psi) = 0$, and thus it coincides with the frequentist estimator defined as the zero-level confidence interval based on $r_p^{*}(\psi)$. Such an estimator has been shown to be a refinement of the MLE $\hat{\psi}$.
Using the tail area approximation (18), a third-order approximation of the BDM of the hypothesis $H_0: \psi = \psi_0$ is
$$\delta_H \mathrel{\ddot{=}} 1 - 2\,\min\{ \Phi(r_p^{*}(\psi_0)),\; 1 - \Phi(r_p^{*}(\psi_0)) \} = 2\,\Phi( | r_p^{*}(\psi_0) | ) - 1.$$
In this case, (19) coincides with $1 - p_{r^*}$, where $p_{r^*}$ is the p-value based on (17). Thus, when using strong matching priors and higher-order asymptotics, there is agreement between Bayesian and frequentist hypothesis testing, point estimation, and interval estimation.
From a practical point of view, the computation of (19) can easily be performed using the likelihoodAsy package [20] of the statistical software R, version 4.4.1. The advantage of using this package is that it does not require the function $q_p(\psi)$ explicitly; it requires only code for computing the log-likelihood function and for generating data from the assumed model. Some examples can be found in [14].

3.3. Multidimensional Parameters

When $\theta$ is multidimensional, the derivation of a first-order tail area approximation and of a first-order approximation for $\delta_H$ remains straightforward, starting from the Laplace approximation of the posterior distribution. In particular, let $W(\theta) = 2( \ell(\hat{\theta}) - \ell(\theta) )$ be the log-likelihood ratio statistic for $\theta$. Using $W(\theta)$, a first-order approximation of the BDM for the hypothesis $H_0: \theta = \theta_0$ can be obtained as follows:
$$\delta_H \doteq 1 - P\left( \chi^2_d \geq W(\theta_0) \right),$$
where $\chi^2_d$ denotes the Chi-squared distribution with $d$ degrees of freedom. This approximation is asymptotically equivalent to the first-order approximation
$$\delta_H \doteq 1 - P\left( \chi^2_d \geq (\theta_0 - \hat{\theta})^{\top} j(\hat{\theta}) (\theta_0 - \hat{\theta}) \right).$$
Higher-order approximations based on modifications of the log-likelihood ratio are also available for multidimensional parameters of interest, both with and without nuisance parameters (see [4,19,21], and references therein). As with the approximations for a scalar parameter, the proposed results are based on the asymptotic theory of modified log-likelihood ratios [21]; they require only routine maximization output for their implementation, and they are constructed for arbitrary prior distributions. For instance, paralleling the scalar parameter case, a credible region for a $d$-dimensional parameter of interest $\theta$, with approximately $100(1-\alpha)\%$ coverage in repeated sampling, can be computed as $CR = \{\theta : W^{*}(\theta) \leq \chi^2_{d;1-\alpha}\}$, where $W^{*}(\theta)$ is a suitable modification of the log-likelihood ratio $W(\theta)$ or of the profile log-likelihood ratio (see [19,21]), and $\chi^2_{d;1-\alpha}$ is the $(1-\alpha)$-quantile of the $\chi^2_d$ distribution. In practice, the $CR$ region can be interpreted as the extension to the multidimensional case of the equi-tailed $CI$ set, i.e., the $CR$ region is the multidimensional analogue of the $CI$ set, based on the Chi-squared approximation. As in the scalar case, the $CR$ region can reflect departures from symmetry with respect to the first-order approximation based on the Wald statistic. Some simulation studies on $CR$ based on $W^{*}(\theta)$ can be found in [22].
Using $W^{*}(\theta)$, a higher-order approximation of the BDM for the hypothesis $H_0: \theta = \theta_0$ can be obtained as
$$\delta_H \mathrel{\ddot{=}} 1 - P\left( \chi^2_d \geq W^{*}(\theta_0) \right).$$
The major drawback of this approximation is that the modification $W^{*}(\theta)$, being based on the signed root log-likelihood ratio transformation, in general depends on the chosen parameter ordering. Moreover, its computation can be cumbersome when $d$ is large.

4. Beyond Gaussian II: Skewed Approximations

A major limitation of standard first-order Gaussian approximations, like (5) and (10), is their reliance on symmetric densities, which simplifies inference but can misrepresent key posterior features like skewness and heavy tails. Indeed, even simple parametric models can yield asymmetric posteriors, leading to biased and inaccurate approximations.
To overcome this, recent work has introduced flexible families of approximating posterior densities that can capture the shape and skewness [5,6,23]. In particular, [5] developed a class of closed-form deterministic approximations using a third-order extension of the Laplace approximation. This approach yields tractable, skewed approximations that better capture the actual shape of the target posterior while remaining computationally efficient.
Like the higher-order approximations discussed in Section 3, skewed approximations rely on higher-order expansions and derivatives. They start from a symmetric Gaussian approximation, centered at the maximum a posteriori (MAP) estimate, and introduce skewness through the Gaussian distribution function combined with a cubic term driven by the third derivative of the log-likelihood function.

4.1. Scalar Case

Let us denote by $\ell^{(k)}(\theta)$ the $k$-th derivative of the log-likelihood $\ell(\theta)$, i.e., $\ell^{(k)}(\theta) = \partial^k \ell(\theta)/\partial\theta^k$, $k = 1, 2, 3, \ldots$. Moreover, let $\tilde{\theta} = \operatorname{argmax}_{\theta \in \Theta} \{ \ell(\theta) + \log \pi(\theta) \}$ be the MAP estimate of $\theta$, and let $h = \sqrt{n}\,(\theta - \tilde{\theta})$ be the rescaled parameter. Using result (14) of [5] and all the regularity conditions stated there, the skew-symmetric (SKS) approximation of the posterior density of $\theta$ is
$$\pi_{SKS}(\theta \mid y) \propto 2\, \phi(h; 0, \tilde{\omega})\, \Phi( \tilde{\alpha}(h) ),$$
where $\phi(h; 0, \tilde{\omega})$ is the normal density function with mean 0 and variance $\tilde{\omega} = n\, j(\tilde{\theta})^{-1}$, and
$$\tilde{\alpha}(h) = \frac{ \sqrt{2\pi}\, \ell^{(3)}(\tilde{\theta}) }{ 12\, n^{3/2} }\, h^3$$
is the skewness component, expressed as a cubic function of $h$, reflecting the influence of the third derivative of the log-likelihood on the shape of the posterior distribution.
Equation (23) provides a practical skewed second-order approximation of the target posterior density, centered at its mode. This approach is known as the SKS approximation or skew-modal approximation. Compared with the classical first-order Gaussian approximation derived from the Laplace method, the SKS approximation remains similarly tractable while being significantly more accurate. Note that this approximation depends on the prior distribution through the MAP.
Using (23) and the approximation
$$2\, \phi(h; 0, \tilde{\omega}) \left[ \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\, \tilde{\alpha}(h) \right] = 2\, \phi(h; 0, \tilde{\omega})\, \Phi( \tilde{\alpha}(h) ) + O(n^{-1}),$$
we can derive the approximation
$$P_{SKS}(\theta \geq \theta_0 \mid y) = \frac{ \displaystyle\int_{h_0}^{+\infty} 2\, \phi(h; 0, \tilde{\omega}) \left[ \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\, \tilde{\alpha}(h) \right] dh }{ \displaystyle\int_{-\infty}^{+\infty} 2\, \phi(h; 0, \tilde{\omega}) \left[ \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\, \tilde{\alpha}(h) \right] dh }$$
for the tail area (4), where $h_0 = \sqrt{n}\,(\theta_0 - \tilde{\theta})$. Note that the denominator is simply equal to 1, due to the symmetry of $\phi(\cdot)$ and the oddness of $\tilde{\alpha}(h)$. The numerator can be split into two integrals:
$$\int_{h_0}^{+\infty} 2\, \phi(h; 0, \tilde{\omega}) \left[ \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\, \tilde{\alpha}(h) \right] dh = \int_{h_0}^{+\infty} \phi(h; 0, \tilde{\omega})\, dh + \frac{2}{\sqrt{2\pi}} \int_{h_0}^{+\infty} \phi(h; 0, \tilde{\omega})\, \tilde{\alpha}(h)\, dh.$$
The first integral can be expressed as the standard Gaussian tail, as follows:
$$\int_{h_0}^{+\infty} \phi(h; 0, \tilde{\omega})\, dh = 1 - \Phi\!\left( \frac{h_0}{\sqrt{\tilde{\omega}}} \right),$$
while the second integral involves the skewness term and can be expressed as
$$\frac{2}{\sqrt{2\pi}} \int_{h_0}^{+\infty} \phi(h; 0, \tilde{\omega})\, \tilde{\alpha}(h)\, dh = \frac{ \ell^{(3)}(\tilde{\theta}) }{ 6\, n^{3/2} } \int_{h_0}^{+\infty} h^3\, \phi(h; 0, \tilde{\omega})\, dh.$$
Substituting $z = h/\sqrt{\tilde{\omega}}$ into the integral $\int_{h_0}^{+\infty} h^3\, \phi(h; 0, \tilde{\omega})\, dh$, we have
$$\int_{h_0}^{+\infty} h^3\, \phi(h; 0, \tilde{\omega})\, dh = \int_{h_0}^{+\infty} h^3\, \frac{1}{\sqrt{2\pi \tilde{\omega}}} \exp\!\left( -\frac{h^2}{2 \tilde{\omega}} \right) dh = \tilde{\omega}^{3/2} \int_{h_0/\sqrt{\tilde{\omega}}}^{+\infty} z^3\, \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{z^2}{2} \right) dz.$$
Using the identity $\int_{z_0}^{+\infty} z^3\, \phi(z; 0, 1)\, dz = \phi(z_0; 0, 1)\,(z_0^2 + 2)$, with $z_0 = h_0/\sqrt{\tilde{\omega}}$, we obtain
$$\int_{h_0}^{+\infty} h^3\, \phi(h; 0, \tilde{\omega})\, dh = \tilde{\omega}^{3/2}\, \phi\!\left( \frac{h_0}{\sqrt{\tilde{\omega}}}; 0, 1 \right) \left( \frac{h_0^2}{\tilde{\omega}} + 2 \right).$$
Then, the resulting SKS approximation to $P(\theta \geq \theta_0 \mid y)$ is
$$P_{SKS}(\theta \geq \theta_0 \mid y) = 1 - \Phi\!\left( \frac{h_0}{\sqrt{\tilde{\omega}}} \right) + \frac{ \ell^{(3)}(\tilde{\theta}) }{ 6\, n^{3/2} }\, \tilde{\omega}^{3/2}\, \phi\!\left( \frac{h_0}{\sqrt{\tilde{\omega}}}; 0, 1 \right) \left( \frac{h_0^2}{\tilde{\omega}} + 2 \right).$$
Finally, substituting this approximation into (7), we obtain the SKS approximation of the BDM, given by
$$\delta_H^{SKS} = 2\, \Phi\!\left( \frac{|h_0|}{\sqrt{\tilde{\omega}}} \right) - 2\, \mathrm{sign}(h_0)\, \frac{ \ell^{(3)}(\tilde{\theta}) }{ 6\, n^{3/2} }\, \tilde{\omega}^{3/2}\, \phi\!\left( \frac{h_0}{\sqrt{\tilde{\omega}}}; 0, 1 \right) \left( \frac{h_0^2}{\tilde{\omega}} + 2 \right) - 1.$$
Note that the first term of this approximation differs from that in (5), since it is evaluated at the MAP rather than at the MLE.
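The closed-form $\delta_H^{SKS}$ can be checked numerically. A sketch for an assumed exponential model with a flat prior, so that MAP = MLE and the exact posterior is Gamma$(n+1, s)$; all numbers are illustrative, and the symmetric Gaussian approximation centered at the MAP is included for comparison:

```python
# Skew-modal (SKS) approximation of the BDM for an assumed exponential model
# y_i ~ Exp(theta) with flat prior (MAP = MLE = n/s); n, s and theta0 are
# illustrative placeholders.
from math import exp, log, sqrt, pi, erf, copysign

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)

n, s = 10, 12.0
theta_map = n / s                 # MAP estimate (flat prior)
ell3 = 2.0 * n / theta_map**3     # l^(3)(theta) = 2n/theta^3 for this model
omega = theta_map**2              # omega~ = n j(theta_map)^{-1} = theta_map^2

theta0 = 0.6
h0 = sqrt(n) * (theta0 - theta_map)   # rescaled parameter at theta0
u = h0 / sqrt(omega)
corr = ell3 / (6.0 * n**1.5) * omega**1.5 * phi(u) * (u * u + 2.0)

delta_sks = 2.0 * Phi(abs(u)) - 2.0 * copysign(1.0, h0) * corr - 1.0
delta_first = 2.0 * Phi(abs(u)) - 1.0   # symmetric Gaussian approx. at the MAP
print(delta_sks, delta_first)
```

The skewness correction moves the symmetric approximation substantially toward the exact BDM computed from the Gamma posterior, at the cost of one extra third-derivative evaluation.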

4.2. Nuisance Parameters

As in Section 2.2, suppose that the parameter is partitioned as $\theta = (\psi, \lambda)$, where $\psi$ is a scalar parameter of interest and $\lambda$ is a nuisance parameter of dimension $d - 1$. Also for the marginal posterior distribution $\pi_m(\psi \mid y)$, an SKS approximation is available (see [5], Section 4.2).
Adopting index notation, let us denote by $j(\theta) = [\, -\ell^{(2)}_{st}(\theta) \,]$ the observed Fisher information matrix, where $\ell^{(2)}_{st}(\theta) = \partial^2 \ell(\theta)/\partial\theta_s \partial\theta_t$, $s, t = 1, \ldots, d$, and let $\Omega = ( j(\tilde{\theta})/n )^{-1}$ be the inverse of the scaled observed Fisher information matrix evaluated at the MAP. We denote the elements of $\Omega$ by $\Omega^{st}$; in particular, $\Omega^{11}$ is the element corresponding to the parameter of interest $\psi$. Moreover, let us denote by $\ell^{(3)}_{stl}(\theta) = \partial^3 \ell(\theta)/\partial\theta_s \partial\theta_t \partial\theta_l$ the elements of the third derivative of the log-likelihood, with $s, t, l = 1, \ldots, d$. Finally, let us define the following two quantities:
$$\nu_{1,1} = 3 \sum_{i=1}^{d} \sum_{j=1}^{d} \ell^{(3)}_{1ij}(\tilde{\theta})\, \Omega^{ij} + 3 \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{k=1}^{d} \ell^{(3)}_{ijk}(\tilde{\theta})\, \Omega^{ij}\, \Omega^{k1}$$
and
$$\nu_{3,111} = \ell^{(3)}_{111}(\tilde{\theta}) + 3 \sum_{i=1}^{d} \ell^{(3)}_{11i}(\tilde{\theta})\, \Omega^{i1} + 3 \sum_{i=1}^{d} \sum_{j=1}^{d} \ell^{(3)}_{1ij}(\tilde{\theta})\, \Omega^{ij}\, \Omega^{j1} + \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{k=1}^{d} \ell^{(3)}_{ijk}(\tilde{\theta})\, \Omega^{ij}\, \Omega^{k1}\, \Omega^{11}.$$
Then, following formula (23) in [5], the SKS approximation of the marginal posterior density $\pi_m(\psi \mid y)$ can be expressed as
$$\pi_m^{SKS}(\psi \mid y) \propto 2\, \phi(h_\psi; 0, \Omega^{11})\, \Phi( \alpha_\psi(h_\psi) ),$$
where $h_\psi = \sqrt{n}\,(\psi - \tilde{\psi})$ is the rescaled parameter of interest, $\phi(\cdot\,; 0, \Omega^{11})$ is the density of a Gaussian distribution with mean 0 and variance $\Omega^{11}$, and the skewness component $\alpha_\psi(h_\psi)$ is defined as
$$\alpha_\psi(h_\psi) = \frac{\sqrt{2\pi}}{12\, n^{3/2}} \left( \nu_{1,1}\, h_\psi + \nu_{3,111}\, h_\psi^3 \right).$$
Using (25), we can derive the SKS tail area approximation of (9), given by
$$P_m^{SKS}(\psi \geq \psi_0 \mid y) = \int_{h_{\psi_0}}^{+\infty} 2\, \phi(h_\psi; 0, \Omega^{11})\, \Phi( \alpha_\psi(h_\psi) )\, dh_\psi,$$
where $h_{\psi_0} = \sqrt{n}\,(\psi_0 - \tilde{\psi})$. Finally, the marginal SKS approximation of the BDM is given by
$$\delta_H^{mSKS} = 1 - 2\, \min\left\{ P_m^{SKS}(\psi \geq \psi_0 \mid y),\; 1 - P_m^{SKS}(\psi \geq \psi_0 \mid y) \right\}.$$
The marginal SKS tail area approximation $P_m^{SKS}(\psi \geq \psi_0 \mid y)$, and thus also $\delta_H^{mSKS}$, can be evaluated numerically.
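A one-dimensional quadrature suffices for this numerical evaluation. The following sketch uses hypothetical values of $\nu_{1,1}$, $\nu_{3,111}$, $\Omega^{11}$, $n$, and $h_{\psi_0}$ (placeholders, not from the paper); setting both $\nu$ terms to zero collapses the tail area to the Gaussian one, which provides a simple check.

```python
# Numerical evaluation of the marginal SKS tail area by one-dimensional
# midpoint quadrature. nu11, nu3111, Omega11, n and h0 are hypothetical
# placeholders for illustration.
from math import exp, sqrt, pi, erf

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x, v: exp(-0.5 * x * x / v) / sqrt(2.0 * pi * v)

def tail_m_sks(h0, Omega11, nu11, nu3111, n, half_width=10.0, grid=200000):
    """P_m^SKS(psi >= psi_0 | y) = int_{h0}^inf 2 phi(h;0,Omega11) Phi(alpha(h)) dh."""
    a = sqrt(2.0 * pi) / (12.0 * n**1.5)
    alpha = lambda h: a * (nu11 * h + nu3111 * h**3)
    hi = h0 + half_width * sqrt(Omega11)       # effective upper limit
    step = (hi - h0) / grid
    hs = (h0 + (k + 0.5) * step for k in range(grid))
    return sum(2.0 * phi(h, Omega11) * Phi(alpha(h)) * step for h in hs)

# Hypothetical inputs
p = tail_m_sks(h0=-0.5, Omega11=1.0, nu11=4.0, nu3111=2.0, n=20)
delta_m_sks = 1.0 - 2.0 * min(p, 1.0 - p)
print(p, delta_m_sks)
```

Because $\alpha_\psi$ is odd, the full integral over $\mathbb{R}$ equals 1, so the computed tail area always lies in $(0, 1)$ and $\delta_H^{mSKS}$ in $[0, 1]$.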

4.3. Multidimensional Parameters

While the SKS approximation is theoretically elegant, similar to the higher-order modification of the log-likelihood ratio W ( θ ) , it has two main drawbacks. The first one is that it relies only on local information around the mode. The second is that it is computationally intensive because it relies on third-order derivatives (i.e., a tensor of derivatives) of the log-likelihood. The size of this derivative tensor increases cubically with the number of parameters, leading to substantial memory and computational demands, particularly in models with many parameters. Furthermore, quantities such as the moments, marginal distributions, and quantiles of the SKS approximation are not available in closed form, even in the scalar case.
To address these challenges, [6] proposed a class of approximations based on the standard skew-normal (SN) distribution. Their method matches posterior derivatives, aiming to preserve the ability to model skewness while employing a more computationally tractable structure. It utilizes local information around the MAP by matching the mode $m$, the negative Hessian at the mode, i.e., $j(\tilde{\theta})$, and the vector $t \in \mathbb{R}^d$ of third-order unmixed derivatives of the log-posterior. Moreover, modern tools for automatic differentiation can greatly facilitate the computation of such higher-order derivatives without manual derivation. The goal is to find the parameters of the multivariate SN distribution $\mathrm{SN}_d(\xi, \Omega, \alpha)$ that best match these quantities. The notation $\mathrm{SN}_d(\xi, \Omega, \alpha)$ indicates a $d$-dimensional SN distribution (see, e.g., [24] and references therein), with location parameter $\xi$, scale matrix $\Omega$, and shape parameter $\alpha$. The matching equations are given by
$$0 = -\Omega^{-1}(m - \xi) + \zeta_1(\kappa)\,\alpha, \qquad j(\tilde{\theta}) = \Omega^{-1} - \zeta_2(\kappa)\,\alpha\alpha^\top, \qquad t = \zeta_3(\kappa)\,\alpha^{\circ 3}, \qquad \kappa = \alpha^\top(m - \xi),$$
where $\zeta_k(\kappa)$ denotes the $k$-th derivative of $\log\Phi(\kappa)$, and $\alpha^{\circ 3}$ denotes the element-wise (Hadamard) third power of $\alpha$. The solution proceeds by reducing the system to a one-dimensional root-finding problem in $\kappa$, after which $\alpha$, $\Omega$, and $\xi$ can be obtained analytically. The marginal distributions are also available in closed form. Given its tractability, we adopt the derivative matching approach proposed by [6] to approximate the posterior in models with multidimensional parameters. For the SN model, unlike for the SKS approximation, we can easily define multivariate quantiles.
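To make the reduction concrete, the sketch below (Python; function names are ours) implements the derivatives $\zeta_k$ of $\log\Phi$ and backs out $(\xi, \Omega, \alpha)$ from the matching equations for a trial value of $\kappa$, under the parametrization of the equations above, leaving a single scalar residual to be driven to zero by any standard root finder (e.g., `scipy.optimize.brentq`):

```python
import numpy as np
from scipy.stats import norm

# zeta_k(kappa) = k-th derivative of log Phi(kappa), obtained recursively
def zeta1(k):
    return norm.pdf(k) / norm.cdf(k)

def zeta2(k):
    z1 = zeta1(k)
    return -z1 * (k + z1)

def zeta3(k):
    z1, z2 = zeta1(k), zeta2(k)
    return -z1 - k * z2 - 2 * z1 * z2

def sn_params_given_kappa(kappa, m, j_map, t):
    """Solve the matching equations for (xi, Omega, alpha) given kappa."""
    alpha = np.cbrt(t / zeta3(kappa))                  # t = zeta_3(kappa) alpha^{o3}
    omega = np.linalg.inv(j_map + zeta2(kappa) * np.outer(alpha, alpha))
    xi = m - zeta1(kappa) * omega @ alpha              # stationarity at the mode
    return xi, omega, alpha

def kappa_residual(kappa, m, j_map, t):
    """Scalar equation kappa = alpha^T (m - xi), to be solved in kappa."""
    xi, _, alpha = sn_params_given_kappa(kappa, m, j_map, t)
    return kappa - alpha @ (m - xi)
```

The recursions for $\zeta_2$ and $\zeta_3$ follow by differentiating $\zeta_1 = \phi/\Phi$ directly.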
As suggested in [8,9], an effective approach to defining quantiles in the multidimensional case is to identify the optimal transport (OT) map between the spherical uniform distribution and the target multivariate SN distribution. Given the inherent relationship between the standard multivariate Gaussian distribution and the spherical uniform distribution, we explore the OT map linking a multivariate SN distribution to a multivariate standard normal distribution. Indeed, given a multivariate standard normal $S$ in $\mathbb{R}^d$, it is well known that $U = S/\|S\|$ is uniformly distributed on the unit sphere $\mathcal{S}^{d-1}$ of $\mathbb{R}^d$. Furthermore, $2(\Phi(\|S\|) - 0.5)$ is uniform in $(0,1)$. Thus, the OT map and the quantiles of the multivariate standard Gaussian are coherently defined through a bijection of the norm of the multivariate standard normal vector $S$ (the distance from the origin). In particular, we utilize the canonical multivariate SN distribution, derived by applying a rotational transformation, and we consider a component-wise transformation using the univariate SN distribution function and the standard normal quantile function, which delineates a transport map represented as the gradient of a convex function.
From $X \sim \mathrm{SN}_d(\xi, \Omega, \alpha)$, let $\delta = \Omega\alpha/\sqrt{1 + \alpha^\top\Omega\alpha}$. We define a rotation $T_1(X) = Q(X - \xi)$ by means of a matrix $Q \in \mathbb{R}^{d\times d}$ such that
  • $Z = Q(X - \xi)$ aligns the skewness with the first coordinate;
  • in the rotated space, $Z_1 \sim \mathrm{SN}_1(0, \omega^2, (Q\alpha)_1)$, with $\omega^2 = [Q\Omega Q^\top]_{1,1}$, and $Z_{2:d}$ are Gaussian.
The matrix $Q$ is obtained by applying a (rectangular) QR decomposition to the vector $\alpha$. The mean vector is $\mathrm{E}(Z) = Q\delta\sqrt{2/\pi}$ and the covariance matrix is $V = Q\Omega Q^\top - (2/\pi)(Q\delta)(Q\delta)^\top$. Moreover, the scale parameter of $Z_1$ is $\sigma^2 = [Q\Omega Q^\top]_{1,1}$, and we denote its mean and variance by $\mu_1 = \mathrm{E}[Z_1]$ and $V_1 = \mathrm{Var}(Z_1)$, respectively.
We define the transport map $T_2(X)$ in the rotated space as
$$T_2(X) = \big( \Phi^{-1}\!\big( F_{\mathrm{SN}}(X_1; 0, \sigma^2, (Q\alpha)_1);\, \mu_1, V_1 \big),\; X_2, \ldots, X_d \big)^\top,$$
where $F_{\mathrm{SN}}(\cdot)$ is the univariate SN cumulative distribution function and $\Phi^{-1}(\cdot\,; \mu_1, V_1)$ is the quantile function of the $N(\mu_1, V_1)$ distribution. In practice, we transform the first component using the univariate SN cumulative distribution function $F_{\mathrm{SN}}$ and the Gaussian quantile function $\Phi^{-1}$ to remove its skewness, while leaving the other components unchanged. Note that the SN distribution is closed under linear transformations; in particular, after the rotation, the shape parameter of the variable $Z$ becomes $Q\alpha$ (see [24]). The variable $Z' = T_2(Z)$ is now approximately multivariate normal. Finally, we apply an affine transformation to standardize the result. More precisely, we consider $T_3(X) = V^{-1/2}(X - Q\delta\sqrt{2/\pi})$, and set $U = T_3(Z')$. The resulting $U$ is distributed as a standard normal (see Figure 1).
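In the univariate case ($d = 1$), no rotation is needed and the map reduces to the increasing transform $\Phi^{-1}(F_{\mathrm{SN}}(\cdot))$. A quick sanity check of this Gaussianization step, sketched with SciPy's `skewnorm` (the parameter values below are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import skewnorm, norm, kstest

rng = np.random.default_rng(0)
xi, omega, a = 0.5, 2.0, 4.0          # an arbitrary univariate SN target

x = skewnorm.rvs(a, loc=xi, scale=omega, size=5000, random_state=rng)

# Probability-integral transform through the SN cdf, followed by the
# standard normal quantile function: an increasing map, hence the
# derivative of a convex potential (its antiderivative).
u = norm.ppf(skewnorm.cdf(x, a, loc=xi, scale=omega))

print(kstest(u, "norm").pvalue)       # a KS test should not reject normality
```

Because the transform is an exact probability-integral transform here, `u` is standard normal in distribution; in the multivariate algorithm the same step is applied to the first rotated coordinate only.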
It follows that, using the SN approximation $\pi^{SN}(\theta \mid y)$ of the posterior distribution of $\theta$, the SN approximation of the BDM can be expressed as
$$\delta_H^{SN} = 1 - \Pr\big( \chi^2_d \ge \| T(\theta_0) \|^2 \big),$$
where $T = T_3 \circ T_2 \circ T_1$. The map $T$ is the OT map, as it is the gradient of a convex function. In particular, $T_1$ and $T_3$ are affine transformations, and the function $\Phi^{-1}(F_{\mathrm{SN}}(z; \xi, \omega, \alpha))$ is monotonically increasing in $z$, hence its integral is convex. Defining
$$g(Z) = \int_0^{Z_1} \Phi^{-1}\big(F_{\mathrm{SN}}(t; \xi, \omega, \alpha)\big)\, dt + \frac{1}{2}\sum_{i=2}^{d} Z_i^2,$$
we have $T_2(Z) = \nabla g(Z)$. The composite map $T(\cdot)$, used in (27), is the gradient of a convex function and thus represents the optimal transport map (under quadratic cost) from an SN distribution to a standard normal. The procedure is summarized in Algorithm 1.
Algorithm 1 Optimal transport from $\mathrm{SN}_d(\xi, \Omega, \alpha)$ to $N_d(0, I_d)$
  • Input: $X \in \mathbb{R}^d$, $X \sim \mathrm{SN}_d(\xi, \Omega, \alpha)$
  • Output: $U \sim N_d(0, I_d)$
  • Compute $\delta \leftarrow \Omega\alpha/\sqrt{1 + \alpha^\top\Omega\alpha}$
  • Compute the QR decomposition: $Q \leftarrow \mathrm{QR\_decomposition}(\alpha)$
  • Compute mean and covariance: $\mu_Z \leftarrow Q\delta\cdot\sqrt{2/\pi}$, $V \leftarrow Q\Omega Q^\top - (2/\pi)(Q\delta)(Q\delta)^\top$
  • Compute the variance of the rotated first component: $\sigma^2 \leftarrow [Q\Omega Q^\top]_{1,1}$
  • Set $\alpha_{\mathrm{rot}} \leftarrow Q\alpha$
  • Set $\mu_1 \leftarrow (\mu_Z)_1$ and $V_1 \leftarrow V_{1,1}$
  • Apply the rotation: $Z \leftarrow Q(X - \xi)$  {$T_1$}
  • Compute $u \leftarrow F_{\mathrm{SN}}(Z_1; 0, \sigma^2, \alpha_{\mathrm{rot}})$ and $z_1 \leftarrow \Phi^{-1}(u; \mu_1, V_1)$
  • Set $Z' \leftarrow (z_1, Z_2, \ldots, Z_d)$  {$T_2$}
  • Compute $U \leftarrow V^{-1/2}(Z' - \mu_Z)$  {$T_3$}
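A direct NumPy transcription of Algorithm 1 might look as follows. This is a sketch under the parametrization used above (function names are ours); the symmetric inverse square root of $V$ is used in $T_3$, and the complete QR decomposition supplies an orthogonal $Q$ whose first row is parallel to $\alpha$ (possibly up to sign, which the code tracks through $\alpha_{\mathrm{rot}}$):

```python
import numpy as np
from scipy.stats import skewnorm, norm, chi2

def ot_sn_to_normal(x, xi, omega_mat, alpha):
    """Map a point x through T = T3 o T2 o T1 (sketch of Algorithm 1)."""
    delta = omega_mat @ alpha / np.sqrt(1 + alpha @ omega_mat @ alpha)
    # Complete QR of alpha: Q is orthogonal, first row parallel to alpha
    q_full, _ = np.linalg.qr(alpha.reshape(-1, 1), mode="complete")
    Q = q_full.T
    mu_z = Q @ delta * np.sqrt(2 / np.pi)
    V = Q @ omega_mat @ Q.T - (2 / np.pi) * np.outer(Q @ delta, Q @ delta)
    sigma2 = (Q @ omega_mat @ Q.T)[0, 0]
    a_rot = (Q @ alpha)[0]                 # rotated skewness, +-||alpha||
    z = Q @ (x - xi)                       # T1: rotation
    u = skewnorm.cdf(z[0], a_rot, scale=np.sqrt(sigma2))
    z[0] = norm.ppf(u, loc=mu_z[0], scale=np.sqrt(V[0, 0]))   # T2
    w, P = np.linalg.eigh(V)               # T3: symmetric V^{-1/2}
    return P @ ((P.T @ (z - mu_z)) / np.sqrt(w))

def bdm_sn(theta0, xi, omega_mat, alpha):
    """SN approximation of the BDM via the chi-square radius of T(theta0)."""
    u = ot_sn_to_normal(theta0, xi, omega_mat, alpha)
    return 1 - chi2.sf(u @ u, df=len(alpha))
```

By construction, points farther from the mode in the transported geometry receive a larger $\delta_H^{SN}$.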

5. Examples of Higher-Order and Skewed Approximations

In the following, we focus on assessing the performance of the higher-order approximations and of the skewed approximations of the BDM in two examples, discussed also in [5,7].

5.1. Exponential Model

We revisit Example 1 in [7], where the model for data $y_1, \ldots, y_n$ is an exponential distribution with scale parameter $\theta$, so that $\mathrm{E}(Y) = \theta$. Employing Jeffreys' prior, $\pi(\theta) \propto \theta^{-1}$, the resulting posterior distribution is an inverse gamma with shape and rate parameters equal to $n$ and $t_n$, respectively, where $t_n = \sum_i y_i$. The quantities for the SKS approximation of the posterior distribution are available in [5] (see Section 3.1), while for the higher-order approximation, $q(\theta)$ coincides with the score statistic, i.e., $q(\theta) = \ell^{(1)}(\theta)/i(\theta)^{1/2}$. We analyze how well the two approximations align with the true BDM for growing sample size ($n = 6, 12, 20, 40$) while keeping the MLE fixed at $\hat{\theta} = 1.2$. The corresponding MAP values are 1.03 ($n = 6$), 1.11 ($n = 12$), 1.14 ($n = 20$), and 1.17 ($n = 40$).
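The exact BDM in this example is available in closed form from the inverse gamma posterior. A minimal Python check (using `scipy.stats.invgamma`, whose `scale` argument carries $t_n$ under this parametrization; the function name is ours):

```python
from scipy.stats import invgamma

def bdm_exact_exponential(theta0, t_n, n):
    """Exact BDM under the inverse gamma posterior induced by Jeffreys' prior."""
    p = invgamma(a=n, scale=t_n).cdf(theta0)
    return 1 - 2 * min(p, 1 - p)

# n = 6 with the MLE fixed at 1.2, so t_n = n * 1.2 = 7.2;
# matches the exact BDM entry (0.11) for theta_0 = 1.2 in Table 1.
print(round(bdm_exact_exponential(1.2, 7.2, 6), 2))  # -> 0.11
```

The same call reproduces the remaining entries of the "BDM" rows of Table 1 as $\theta_0$ and $n$ vary.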
Figure 2 and Figure 3 and Table 1 report the approximations of the BDM for several candidate values of $\theta_0$. In particular, we consider the first-order (IO) approximation (5), the higher-order (HO) approximation (13), the SKS approximation (24), a direct numerical tail area calculation (SKS-num) of (23), and the SN approximation (27). Figure 2 and Figure 3 also display the approximations of the corresponding posterior distributions, where the HO approximation is derived numerically by inverting the tail area. Also, note that the tail areas obtained from the SKS approximation are not guaranteed to lie within the interval (0, 1), so in practice we bounded the BDM to this interval.
The results confirm that the HO and SKS approximations yield remarkable improvements over the first-order counterpart for every $n$. Moreover, they show that the HO approximation of the BDM is almost perfectly superimposed on the true BDM, especially for values of $\theta_0$ far from the MLE. When the value under the null hypothesis is closer to the MLE, the SKS approximation and the numerical tail areas derived from the SKS and SN approximations better approximate the true BDM. Furthermore, the SN approximation captures the tail behavior of the posterior distribution more accurately than the SKS approximation.

5.2. Logistic Regression Model

We now consider a real-data application using the Cushing’s dataset (see [5], Section 5.2), which is openly available in the R library MASS. The data were obtained from a medical study on n = 27 individuals, aimed at investigating the relationship between Cushing’s syndrome and two steroid metabolites, namely Tetrahydrocortisone and Pregnanetriol.
We define a binary response variable $Y$, which takes the value 1 when the patient is affected by bilateral hyperplasia, and 0 otherwise. The two observed covariates $x_1$ and $x_2$ are dummy variables representing the presence of the two metabolites. We focus on the most popular regression model for binary data, namely logistic regression with mean function $\mathrm{logit}^{-1}(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$. As in [5], Bayesian inference is carried out by employing independent, weakly informative Gaussian priors $N(0, 25)$ for the coefficients $\beta = (\beta_0, \beta_1, \beta_2)$.
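Under these priors, the MAP estimate can be obtained by direct numerical optimization of the log-posterior. The sketch below uses synthetic stand-in data (the real observations live in the MASS R package, so the data here are simulated placeholders; all variable names are ours):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic stand-in for the Cushing's data: n = 27 binary responses,
# an intercept, and two binary covariates.
n = 27
X = np.column_stack([np.ones(n),
                     rng.integers(0, 2, n),
                     rng.integers(0, 2, n)]).astype(float)
beta_true = np.array([-0.5, 0.3, -0.8])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def neg_log_post(beta):
    """Negative log-posterior: Bernoulli log-likelihood plus
    independent N(0, 25) log-priors on each coefficient."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -0.5 * np.sum(beta ** 2) / 25.0
    return -(loglik + logprior)

res = minimize(neg_log_post, np.zeros(3), method="BFGS")
beta_map = res.x          # MAP estimates of (beta_0, beta_1, beta_2)
```

The negative Hessian of `neg_log_post` at `beta_map` then supplies $j(\tilde{\theta})$ for the approximations discussed above.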
Figure 4 displays the marginal posterior distributions for β 1 and β 2 obtained via MCMC sampling (black curves), along with the first-order, the SKS, and the SN approximations. The MAP values for the two parameters are −0.031 and −0.286, respectively.
We aim to test the two null hypotheses $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$, corresponding to a null effect of the metabolites' presence in determining Cushing's syndrome (indicated by red vertical lines in Figure 4). The exact BDM gives the values 0.592 and 0.932, respectively: the first indicates that the data may be compatible with the null hypothesis on $\beta_1$, whereas the second suggests a weak disagreement with the value assumed under $H_0: \beta_2 = 0$. The SKS approximations of the BDM for the considered hypotheses are 0.612 and 0.935, respectively; the SN approximations are 0.584 and 0.870; the first-order approximations are 0.512 and 0.891; the higher-order approximations give 0.611 and 0.998; and, finally, the approximations based on matching priors are 0.477 and 0.862. Overall, the skewed approximations (SKS, SN) provide the best results.
For the composite hypothesis $H_0: \beta_1 = \beta_2 = 0$, the ground truth is not available; however, given the low correlation between the components, one can approximate it by the geometric mean of the two marginal measures, which equals 0.743. The first-order approximation of the BDM gives 0.300, while the SN approximation gives 0.760, revealing that the value under the null is more extreme than the first-order approximation suggests (see also Figure 5).
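The geometric-mean shortcut is elementary arithmetic on the two exact marginal values reported above:

```python
import math

# Composite-hypothesis approximation under low correlation: geometric
# mean of the two exact marginal BDM values (0.592 and 0.932).
delta_composite = math.sqrt(0.592 * 0.932)
print(round(delta_composite, 3))   # -> 0.743
```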

6. Concluding Remarks

Although the higher-order and skewed approximations described in this paper are derived from asymptotic considerations, they perform well in moderate or even small sample situations. Moreover, they represent an accurate method for computing posterior quantities and approximating δ H , and they make it quite straightforward to assess the effect of changing priors (see, e.g., [25]). When using objective Bayesian procedures based on strong-matching priors and higher-order asymptotics, there is agreement between Bayesian and frequentist point and interval estimation, as well as in significance measures. This is not true, in general, with the e-value, as discussed in [14].
A significant contribution of this work is the extension to multivariate hypotheses. We propose a formal definition of the multivariate BDM based on center-outward optimal transport maps, providing a theoretically sound generalization of the univariate concept. By utilizing either the multivariate normal or multivariate SN approximations of the posterior distribution, we can formulate the multivariate quantiles in a closed form, thereby allowing us to derive the BDM for composite hypotheses. Nonetheless, precisely determining or defining these quantiles on the true posterior is challenging, as the transport map may not be available in a closed form and requires solving a complex optimization problem. However, the SN approximation as well as the derived OT map continue to be manageable in high-dimensional settings, whereas typical OT methods generally do not scale efficiently with increasing dimensions.
As a final remark, the higher-order procedures proposed and described here are tailored to continuous posterior distributions, and their extension to models with discrete or mixed-type parameters warrants further study. Moreover, although the higher-order and skewed methods, alongside SN-based OT maps, offer a useful means of approximating posterior distributions and computing tail areas, they might fail on complex or irregular posterior landscapes. In such cases, employing integrated computational procedures to find the transport map [26] and utilizing the direct definition of the multivariate BDM could be more appropriate. Furthermore, the utility of the higher-order and skew-normal approximation techniques developed here is not restricted to the Bayesian discrepancy measure: these methods hold considerable promise for approximating other Bayesian measures of evidence. For example, applying these procedures to quantities like the e-value is a natural and compelling direction for future work.

Author Contributions

Conceptualization, E.B., M.M. and L.V.; Methodology, E.B., M.M. and L.V.; Software, E.B.; Validation, E.B. and L.V.; Formal analysis, E.B., M.M. and L.V.; Investigation, E.B. and L.V.; Data curation, E.B.; Writing—original draft, E.B., M.M. and L.V.; Writing—review & editing, E.B. and M.M.; Supervision, F.B., M.M. and L.V. All authors have read and agreed to the published version of the manuscript.

Funding

Elena Bortolato acknowledges funding from the European Union under the ERC grant project number 864863.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations and Notation Glossary

The following abbreviations are used in this manuscript:
| Symbol | Meaning/Definition |
|---|---|
| BDM | Bayesian discrepancy measure |
| BF | Bayes factor |
| LR | Log-likelihood ratio |
| MAP | Maximum a posteriori |
| MLE | Maximum likelihood estimate |
| OT | Optimal transport |
| SKS | Skew-symmetric |
| SN | Skew-normal |
| $H_0$ | Sharp (precise) null hypothesis |
| $\theta$ | Scalar parameter or parameter vector in the multivariate case |
| $\theta_0$ | Specific hypothesized value of the parameter $\theta$ under $H_0$ |
| $\psi$ | Scalar parameter of interest |
| $\lambda$ | Nuisance parameter (scalar or vector) |
| $y$ | Observed data |
| $d$ | Dimension of the full parameter vector |
| $n$ | Sample size |
| $\hat{\theta}$ | MLE of $\theta$ |
| $\tilde{\theta}$ | MAP of $\theta$ |
| $\hat{\lambda}_\psi$ | Constrained MLE of $\lambda$ given $\psi$ |
| $\ell(\theta)$, $\ell(\psi, \lambda)$ | Log-likelihood function |
| $\ell_p(\psi)$ | Profile log-likelihood function for $\psi$ |
| $\ell^{(1)}(\theta)$, $\ell^{(1)}_p(\psi)$ | Score function, profile score function |
| $j(\theta)$, $j_p(\psi)$ | Observed information matrix, profile observed information |
| $j_{\psi\psi}$, $j_{\lambda\lambda}$, $j_{\psi\lambda}$ | Submatrices of $j(\theta)$ for parameter partitions |
| $w(\theta)$, $w_p(\psi)$ | Wald statistic, profile Wald statistic |
| $s(\theta)$, $s_p(\psi)$ | Score statistic, profile score statistic |
| $W(\theta)$ | Log-likelihood ratio statistic |
| $r(\theta)$, $r_p(\psi)$ | Likelihood root, profile likelihood root |
| $\pi(\theta)$, $\pi(\theta \mid y)$ | Prior density of $\theta$, posterior density of $\theta$ |
| $\delta_H$ | Bayesian discrepancy measure, quantifying evidence against $H_0$ |
| $r_B(\theta)$, $r_{Bp}(\psi)$ | Bayesian modified likelihood root statistic (scalar, with nuisance parameters) |
| $\Phi(\cdot)$ | Standard normal cumulative distribution function |
| $\pi_m(\psi \mid y)$ | Marginal posterior density of $\psi$ |
| $F_P^{\pm}$ | Center-outward distribution function mapping the posterior to the unit ball |
| $Q_P^{\pm}$ | Center-outward quantile function (inverse of $F_P^{\pm}$) |
| $U_d$ | Uniform distribution on the unit ball $B_d \subset \mathbb{R}^d$ |
| $B_d$, $\mathcal{S}^{d-1}$ | Unit ball in $\mathbb{R}^d$, unit sphere in $\mathbb{R}^d$ |
| $R_P^{\pm}(\tau)$, $C_P^{\pm}(\tau)$ | Center-outward quantile regions and quantile contours of order $\tau$ |
| $\|\cdot\|$ | Euclidean norm |
| $\dot{=}$, $\ddot{=}$ | Approximate equality to first- or third-order (e.g., $O(n^{-1/2})$ or $O(n^{-3/2})$ accuracy) |

References

  1. Kass, R.E.; Tierney, L.; Kadane, J. The validity of posterior expansions based on Laplace’s method. In Bayesian and Likelihood Methods in Statistics and Econometrics; Elsevier: Amsterdam, The Netherlands, 1990; pp. 473–488.
  2. Reid, N. Likelihood and Bayesian approximation methods. Bayesian Stat. 1995, 5, 351–368.
  3. Reid, N. The 2000 Wald memorial lectures: Asymptotics and the theory of inference. Ann. Stat. 2003, 31, 1695–1731.
  4. Ventura, L.; Reid, N. Approximate Bayesian computation with modified log-likelihood ratios. Metron 2014, 7, 231–245.
  5. Durante, D.; Pozza, F.; Szabo, B. Skewed Bernstein–von Mises theorem and skew-modal approximations. Ann. Stat. 2024, 52, 2714–2737.
  6. Zhou, J.; Grazian, C.; Ormerod, J.T. Tractable skew-normal approximations via matching. J. Statist. Comput. Simul. 2024, 94, 1016–1034.
  7. Bertolino, F.; Manca, M.; Musio, M.; Racugno, W.; Ventura, L. A new Bayesian discrepancy measure. Stat. Methods Appl. 2024, 33, 381–405.
  8. Hallin, M.; Del Barrio, E.; Cuesta-Albertos, J.; Matrán, C. Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. Ann. Stat. 2021, 49, 1139–1165.
  9. Hallin, M.; Konen, D. Multivariate Quantiles: Geometric and Measure-Transportation-Based Contours. In Applications of Optimal Transport to Economics and Related Topics; Springer Nature: Cham, Switzerland, 2024; pp. 61–78.
  10. Madruga, M.; Pereira, C.; Stern, J. Bayesian evidence test for precise hypotheses. J. Stat. Plan. Inf. 2003, 117, 185–198.
  11. Pereira, C.; Stern, J.M. Evidence and Credibility: Full Bayesian Significance Test for Precise Hypotheses. Entropy 1999, 1, 99–110.
  12. Pereira, C.; Stern, J.M. The e-value: A fully Bayesian significance measure for precise statistical hypotheses and its research program. Sao Paulo J. Math. Sci. 2022, 16, 566–584.
  13. Bertolino, F.; Columbu, S.; Manca, M.; Musio, M. Comparison of two coefficients of variation: A new Bayesian approach. Commun. Stat. Simul. Comput. 2024, 53, 6260–6273.
  14. Ruli, E.; Ventura, L. Can Bayesian, confidence distribution and frequentist inference agree? Stat. Methods Appl. 2021, 30, 359–373.
  15. Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends Mach. Learn. 2019, 11, 355–607.
  16. Fraser, D.A.S.; Reid, N. Strong matching of frequentist and Bayesian parametric inference. J. Stat. Plan. Inf. 2002, 103, 263–285.
  17. Welch, B.L.; Peers, H.W. On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. B 1963, 25, 318–329.
  18. Barndorff-Nielsen, O.E.; Chamberlin, S.R. Stable and invariant adjusted directed likelihoods. Biometrika 1994, 81, 485–499.
  19. Severini, T.A. Likelihood Methods in Statistics; Oxford University Press: Oxford, UK, 2000.
  20. Pierce, D.A.; Bellio, R. Modern likelihood-frequentist inference. Int. Stat. Rev. 2017, 85, 519–541.
  21. Skovgaard, I.M. Likelihood Asymptotics. Scand. J. Stat. 2001, 28, 3–32.
  22. Ventura, L.; Ruli, E.; Racugno, W. A note on approximate Bayesian credible sets based on modified log-likelihood ratios. Stat. Prob. Lett. 2013, 83, 2467–2472.
  23. Tan, L.S.; Chen, A. Variational inference based on a subclass of closed skew normals. J. Stat. Comput. Simul. 2024, 34, 1–15.
  24. Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew normal distribution. J. R. Stat. Soc. B 1999, 61, 579–602.
  25. Reid, N.; Sun, Y. Assessing Sensitivity to Priors Using Higher Order Approximations. Commun. Stat. Simul. Comput. 2010, 39, 1373–1386.
  26. Li, K.; Han, W.; Wang, Y.; Yang, Y. Optimal Transport-Based Generative Models for Bayesian Posterior Sampling. arXiv 2025, arXiv:2504.08214.
Figure 1. First panel: Original SN approximation of a bivariate posterior distribution, with the mode in red and skewness direction indicated by the black line. Second panel: Rotated SN distribution aligning the skewness with the first coordinate; red dashed lines show quantiles of the first rotated component. Third panel: Symmetrized distribution after applying a univariate marginal transformation. Fourth panel: Final standardized and centered normal distribution. Bottom panel: Visualization of the optimal transport (OT) map (red arrows).
Figure 2. Exact posterior (in green) and approximate posteriors for n = 6, 12 in the exponential model (top panels). The blue vertical line indicates the posterior median. BDM for a series of parameters (lower panels).
Figure 3. Exact posterior (in green) and approximate posteriors for n = 20, 40 in the exponential model (top panels). The blue vertical line indicates the posterior median. BDM for a series of parameters (lower panels).
Figure 4. Marginal posterior distributions for the regression parameters of the logistic regression example. The marginal medians are indicated in blue, while the parameters under the null hypothesis are indicated in red.
Figure 5. Joint posterior for ($\beta_1$, $\beta_2$) in the logistic regression example with the first-order (IOrder) and skew-normal (SN) approximations. The point (0,0) is marked with a cross.
Table 1. BDM for a series of values $\theta_0$ for the parameter and increasing sample sizes in the exponential example. The values of the true BDM and the best approximation(s) in each configuration are highlighted in bold.

| $n$ | Method | 0.3 | 0.6 | 0.9 | 1.2 | 1.5 | 1.8 | 2.1 | 2.4 |
|---|---|---|---|---|---|---|---|---|---|
| 6 | IO | 0.93 | 0.78 | 0.46 | 0.00 | 0.46 | 0.78 | 0.93 | 0.99 |
| 6 | HO | 1.00 | 0.96 | 0.62 | 0.00 | 0.30 | 0.57 | 0.73 | 0.83 |
| 6 | SKS | 1.00 | 1.00 | 0.80 | 0.20 | 0.32 | 0.73 | 0.94 | 0.99 |
| 6 | SKS-num | 1.00 | 0.94 | 0.53 | 0.07 | 0.58 | 0.91 | 0.99 | 1.00 |
| 6 | SN | 1.00 | 0.94 | 0.52 | 0.06 | 0.51 | 0.78 | 0.91 | 0.97 |
| 6 | **BDM** | 1.00 | 0.96 | 0.62 | 0.11 | 0.30 | 0.57 | 0.73 | 0.83 |
| 12 | IO | 0.99 | 0.92 | 0.61 | 0.00 | 0.61 | 0.92 | 0.99 | 1.00 |
| 12 | HO | 1.00 | 0.99 | 0.75 | 0.00 | 0.48 | 0.78 | 0.91 | 0.96 |
| 12 | SKS | 1.00 | 1.00 | 0.74 | −0.00 | 0.61 | 0.91 | 0.99 | 1.00 |
| 12 | SKS-num | 1.00 | 0.99 | 0.72 | 0.01 | 0.64 | 0.95 | 1.00 | 1.00 |
| 12 | SN | 1.00 | 1.00 | 0.77 | 0.04 | 0.62 | 0.89 | 0.98 | 1.00 |
| 12 | **BDM** | 1.00 | 0.99 | 0.75 | 0.08 | 0.48 | 0.78 | 0.91 | 0.96 |
| 20 | IO | 1.00 | 0.97 | 0.74 | 0.00 | 0.74 | 0.97 | 1.00 | 1.00 |
| 20 | HO | 1.00 | 1.00 | 0.85 | 0.00 | 0.62 | 0.90 | 0.97 | 0.99 |
| 20 | SKS | 1.00 | 1.00 | 0.91 | 0.08 | 0.66 | 0.96 | 1.00 | 1.00 |
| 20 | SKS-num | 1.00 | 1.00 | 0.84 | 0.02 | 0.73 | 0.98 | 1.00 | 1.00 |
| 20 | SN | 1.00 | 1.00 | 0.94 | 0.02 | 0.72 | 0.95 | 1.00 | 1.00 |
| 20 | **BDM** | 1.00 | 1.00 | 0.85 | 0.06 | 0.62 | 0.90 | 0.97 | 0.99 |
| 40 | IO | 1.00 | 1.00 | 0.89 | 0.00 | 0.89 | 1.00 | 1.00 | 1.00 |
| 40 | HO | 1.00 | 1.00 | 0.95 | 0.00 | 0.81 | 0.98 | 1.00 | 1.00 |
| 40 | SKS | 1.00 | 1.00 | 0.99 | 0.05 | 0.83 | 1.00 | 1.00 | 1.00 |
| 40 | SKS-num | 1.00 | 1.00 | 0.96 | 0.03 | 0.87 | 1.00 | 1.00 | 1.00 |
| 40 | SN | 1.00 | 1.00 | 1.00 | 0.02 | 0.87 | 0.99 | 1.00 | 1.00 |
| 40 | **BDM** | 1.00 | 1.00 | 0.95 | 0.04 | 0.81 | 0.98 | 1.00 | 1.00 |