Objective Priors for Invariant e-Values in the Presence of Nuisance Parameters

This paper aims to contribute to refining the e-values for testing precise hypotheses, especially when dealing with nuisance parameters, leveraging the effectiveness of asymptotic expansions of the posterior. The proposed approach offers the advantage of bypassing the need for elicitation of priors and reference functions for the nuisance parameters and the multidimensional integration step. For this purpose, starting from a Laplace approximation, a posterior distribution for the parameter of interest is only considered and then a suitable objective matching prior is introduced, ensuring that the posterior mode aligns with an equivariant frequentist estimator. Consequently, both Highest Probability Density credible sets and the e-value remain invariant. Some targeted and challenging examples are discussed.


Introduction
In this article, we discuss an objective matching prior that maintains the invariance of the posterior mode when testing specific hypotheses for parametric models, especially in the presence of nuisance parameters.These parameters are often introduced to establish flexible and realistic models, although the primary focus of inference is typically limited to a parameter of interest.The proposed approach offers the advantage of eliminating the need for eliciting information on the nuisance components and for conducting multidimensional integration, and it produces invariant e-values in the presence of nuisance parameters.
The parametric framework that we consider can be described as follows.Consider a random sample y = (y 1 , . . ., y n ) of size n from a random variable Y with parametric model f (y; θ), indexed by a paramater θ, with θ ∈ Θ ⊆ IR d .Given a prior π(θ) on θ, Bayesian inference for θ is based on the posterior density π(θ|y) ∝ π(θ)L(θ), (1) where L(θ) represents the likelihood function based on f (y; θ).Interest is, in particular, in the situation in which θ = (ψ, λ), where ψ is a scalar parameter for which inference is required, and λ represents the remaining (d − 1) nuisance parameters.In such case, Bayesian inference for ψ is based on the marginal posterior density π m (ψ|y) = π(ψ, λ|y) dλ ∝ π(ψ, λ)L(ψ, λ) dλ, which for its computation requires both elicitation on the nuisance parameter λ and multidimensional integration.Asymptotic arguments are widely used in Bayesian inference through (1) and ( 2), based on developments of so-called higher-order asymptotics (see, e.g., [1][2][3]).Indeed, the theory of asymptotic expansions provides very accurate approximations to posterior distributions, and to various summary quantities of interest, including tail areas, credible regions and for the Full Bayesian Significance Test (see, e.g., [4,5]).Moreover, they are particularly useful for sensitivity analyses (see [6,7]) and also for the derivation of matching priors (see [8], and references therein).For instance, focusing on the presence of nuisance parameters, the Laplace approximation to (2) provides where ℓ p (ψ) = log L p (ψ) = log L(ψ, λψ ) is the profile log-likelihood for ψ, with λψ the constrained maximum likelihood estimate (MLE) of λ given ψ, ( ψ, λ) is the full MLE, and j p (ψ) = −∂ 2 ℓ p (ψ)/∂ψ 2 is the profile observed information.Moreover, j λλ (ψ, λ) is the (λ, λ)-block of the observed information from the full log-likelihood ℓ(ψ, λ) = log L(ψ, λ), and the notation = indicates that the approximation is accurate to order O(n −3/2 ) in moderate deviation regions (see, e.g., [9], Chapter 2).One appealing feature of higher-order approximations like (3) is that they may routinely be applied in practical Bayesian inference, since they require little more than standard likelihood quantities for their implementation, and hence, they may be available at little additional computational cost over simple firstorder approximations.
In the presence of nuisance parameters, starting from approximation (3), it is possible to define a general posterior distribution for ψ of the form where π * (ψ) is now a prior distribution on ψ only.Bayesian inference based on pseudolikelihood functions-i.e., functions of ψ only and of the data y with properties similar to those of a genuine likelihood function, such as the profile likelihood-have been widely used and discussed in the recent statistical literature.Moreover, it has been theoretically motivated in several papers (see, for instance, [10][11][12], and references therein), also focusing on the derivation of suitable objective priors.Especially when the dimension of λ is large, there are two advantages in using (4) instead of the marginal posterior distribution (2).First, the elicitation over λ is not necessary, and second, the computation of the integrals in (2) is circumvented.
Focusing on (4), in this paper, it is of interest to test the precise (or sharp) null hypothesis using the measure of evidence for the Full Bayesian Significance Test (see, e.g., [4,5]).The Full Bayesian Significance Test (FBST) quantifies evidence by considering the posterior probability associated with the least probable points in the parameter space under H 0 .Higher-order asymptotic computation of the FBST for precise null hypotheses in the presence of nuisance parameters has been discussed in [13].
The original measure of evidence for the FBST is not invariant under suitable transformations of the parameter, a property which has, however, been reached in the more recent definition of the e-value (see [14,15], and references therein).Neverthless, when working on a scalar parameter of interest, in the presence of nuisance parameters, the e-value is not invariant with respect to marginalizations of the nuisance parameter, and it must be used in the full dimensionality of the parameter space.This requires elicitation on the complete parameters, numerical optimization and numerical integration, that can be computationally heavy, especially when the dimension of λ is large.
The aim of this paper is to consider the e-value in the context of the pseudo-posterior distribution π * (ψ|y), suggesting in this respect a suitable objective prior π * (ψ) to be used in (4).More precisely, focus is on a particular matching prior, which ensures the invariance of the posterior mode of the pseudo-posterior distribution.As a consequence, Highest Probability Density credible (HPD) sets are also invariant, as well as the e-value.This paper is organized as follows.Section 2 provides a short review on the FBST for testing precise null hypotheses and also illustrates asymptotic approximations for the e-value, extending results of [13].Section 3 discusses the derivation of the objective matching prior for the parameter of interest only, called median matching prior, that produces invariant e-values.Also, several targeted and challenging examples are discussed.Finally, Section 4 closes with some concluding remarks.

The FBST Measure of Evidence
Suppose that we need to decide between two hypotheses: the null H 0 and the alternative H 1 .The usual Bayesian testing procedure is based on the well-known Bayes factor (BF), defined as the ratio of the posterior odds to the prior odds in favor of the null hypothesis.A high BF or its logarithm suggests evidence in favor of H 0 .However, it is well known that, when improper priors are used, the BF can be undetermined, and when the null hypothesis is precise (as specified in (5)), the BF can lead to the so-called Jeffreys-Lindley's paradox (see, e.g., [16]).Moreover, the BF is not calibrated, i.e., its finite sampling distribution is unknown and it may depend on the nuisance parameter.
To avoid these drawbacks, in recent years, an alternative Bayesian procedure, called FBST, has been introduced by [5] in case of sharp hypothesis H 0 identified by the null set Θ 0 , a submanifold of Θ of lower dimension.The FBST quantifies evidence by considering the posterior probability associated with the least probable points in the parameter space Θ 0 .When this probability is high, it favors the null hypothesis, providing a clear and interpretable measure of support for H 0 (see, e.g., [4,15,17], and references therein).The FBST is based on a specific loss function, and thus, the decision made under this procedure is the action that minimizes the corresponding posterior risk.
The FBST operates by determining the e-value, a representation of Bayesian evidence associated to H 0 .To construct the e-value, the authors introduced the posterior surprise function and its supremum given, respectively, by where r(θ) is a suitable reference function to be chosen.The surprise function was introduced in the context of statistical inference also by [18] (see [15], and references therein).Then, they introduce the tangential set T y (θ * ) defined as the set of parameter values for which the posterior surprise function exceeds the supremum s * , that is This set, often referred to as the Highest Relative Surprise Set, includes parameter values with higher surprise than those within the null set Θ 0 .The e-value is then computed as and H 0 is rejected for small values of ev.
The original FBST, as proposed by [5,19], relies on a flat reference function r(θ) ∝ 1, so that this first version involved the determination of the tangential set T y (θ) starting only from the posterior distribution π(θ|y).However, this initial version lacked invariance under reparameterizations.Subsequent refinements of the FBST introduced the importance of reference density functions, making the e-value explicitly invariant under appropriate transformations of the parameter.Common choices for the reference function include uninformative priors, like the uniform distribution, maximum entropy densities, or Jeffreys' invariant prior.In [20], the use of the Jeffreys' prior, π(θ) ∝ |i(θ)| 1/2 , where i(θ) is the Fisher information derived from L(θ), is discussed as the reference function to derive invariant HPD sets and Maximum A Posteriori (MAP) estimators that are invariant under reparameterizations.Note that the ev uses the full dimensionality of the parameter space.Moreover, this measure is not invariant with respect to transformations of the nuisance parameters, and the use of high posterior densities to construct credible sets may produce inconsistencies.
Concerning the asymptotic behavior of the ev, it can be proven that, under suitable regularity conditions as the sample size increases, with θ 0 representing the true parameter value (see [15]), it holds: • If H 0 is false, i.e., θ 0 / ∈ H 0 , then ev converges in probability to 1.

•
If H 0 is true, i.e., θ 0 ∈ H 0 , then, denoting by V(c) = Pr(ev ≤ c) the cumulative distribution function of ev, we have that ) the cumulative chi-square distribution with k degrees of freedom.
In practice, the computation of ev is performed in two steps: (a) a numerical optimization and (b) a numerical integration.The numerical optimization step consists of finding the maximizer θ * of π s (θ|y) under the null hypothesis.The numerical integration step consists of integrating the posterior surprise function over the region where it is greater than π s (θ * |y), to obtain the e-value.These computational steps make the FBST a computationally intensive procedure.Despite efficient computational algorithms for local and global optimization, as well as numerical integration, obtaining precise results for hypotheses like (5) is highly demanding, especially with large nuisance parameter dimensions.Numerical integration can be tackled by resorting to higher-order tail area approximations, as reviewed in the Bayesian framework in [3].An application of asymptotic approximation to the FBST in its first formulation, i.e., with reference function r(θ) ∝ 1, has been discussed in [13].

Asymptotic Approximations for the e-Value
A first-order approximation for the e-value, when testing (5), is simply given by (see, e.g., [21,22]) where the symbol " =" indicates that the approximation is accurate to O(n −1/2 ), and Φ(•) is the standard normal distribution function.Thus, to first-order, ev agrees with the p-value based on the profile Wald statistic In practice, the approximation (6) of ev may be inaccurate, in particular when the dimension of λ is large with respect to the sample size, because it forces the marginal posterior distribution to be symmetric.The practical computation of ev requires the evaluation of integrals of the marginal posterior distribution.In order to have more accurate evaluations of ev, it may be useful to resort to higher-order asymptotics based on tail area approximations (see, e.g., [2,3], and references therein).Indeed, the measure of evidence involves integrals of the marginal surprise posterior density π ms (ψ|y).In particular, extending the application of the tail area argument to the marginal surprise posterior density, we can derive a O(n −3/2 ) approximation to the marginal surprise posterior tail area probability, given by where profile likelihood root and In the expression of q B (ψ), ℓ ′ p (ψ) = ∂ℓ p (ψ)/∂ψ is the profile score function.Using the tail area approximation ( 8), a third-order approximation of the measure of evidence ev can be derived.The approximation, assuming without loss of generality that ψ 0 is smaller than the MAP of π ms (ψ|y), is given by with ψ * 0 the value of the parameter such that π ms (ψ * 0 |y) = π ms (ψ 0 |y).Note that gives the posterior probability of the HPD credible interval (ψ 0 , ψ * 0 ).Note also that the higher-order approximation (9) does not call for any condition on the prior π(ψ, λ), i.e., it can be also improper.Finally, when π ms (ψ|y) is symmetric, Equation ( 9) reduces to ev = 2(1 − Φ(r * B (ψ 0 ))).While tail area approximations require little more than standard likelihood quantities for their implementation and, in this respect, they are available at little additional computational cost over the first-order approximation, they require elicitation on the complete parameter θ and to choose the reference function r(θ).

An Invariant Objective Prior
The aim of this section is to derive a default prior π * (ψ) to be used in (4).To this end, following [8], we use the shrinkage argument, which is a crucial procedure in the development of matching priors, i.e., priors that ensure, up to the desired order of asymptotics, an agreement between Bayesian and frequentist procedures.Examples of matching priors are (see [8]) for posterior quantiles, for credible regions and for prediction.Here, we focus on a specific matching prior that ensures the invariance of the posterior mode in the posterior distribution (4).As a consequence, the invariance extends to HPDs, as well as the e-value, achieved incorporating the reference function within the prior.
The proposed choice of the prior π * (ψ), which makes the MAP and thus also HPDs and the e-value invariant under 1-1 reparameterization, will depend on the log-likelihood ℓ(θ) and on its derivatives.In regular parametric estimation problems, both the MLE and the score-estimating function exhibit an asymptotically symmetric distribution centered at the true parameter value and at zero, respectively.However, these asymptotic behaviors may poorly reflect exact sampling distributions, particularly in cases with small or moderate sample information, sparse data, or complex models.Several proposals have been developed to correct the estimate or the estimating function.Most available methods are aimed at approximate bias adjustment, either of the MLE or of the profile score function, also when nuisance parameters are present (see [23] for a review of bias reduction for the MLE and [24] and subsequent literature for bias correction of the profile score).Lack of equivariance impacts the so-called implicit bias reduction methods, which achieve firstorder bias correction by modifying the score equation (see [23,25]).To avoid this drawback, in this paper we focus on the median modification of the score, or profile score equation, whose solution respects equivariance under monotone reparameterizations ( [26]).Similar to Firth's implicit method ( [25]), the median modification of the score, or profile score, does not rely on finiteness of the MLE, thereby effectively preventing infinite estimates.
In practice, to derive the median matching prior π * (ψ), we impose that the MAP of π * (ψ|y) coincides with a refined version of the MLE, obtained as the solution of the median modified score function ( [26]).To introduce this new invariant prior, we initially explore the scenario without nuisance parameters and then the situation in which nuisance parameters are present.

No Nuisance Parameters
Let us explore first the scenario where θ is scalar.In order to obtain median bias reduction of the MLE, it is possible to resort to a modified version of the score function of the form where ℓ θ (θ) = ℓ θ (θ; y) = ∂ℓ(θ; y)/∂θ is the score function, and m(θ) is a suitable correction term of order O(1).In particular, the median modified score function assumes for m(θ) the expression The solution θ to the equation t(θ) = 0 not only upholds equivariance under componentwise monotone reparameterizations, but also approximates median unbiasedness ( [26]).Note that likelihood inference based on (10) does not depend explicitely on the MLE.Indeed, the modified score function has been found to overcome infinite estimate problems.Likewise, the MLE and also θ is asymptotically N(θ, i(θ) −1 ), so that the Wald-type statistics only differ in location.Since Bayes' theorem is a statement of adittivity on the log scale log π(θ|y) = log π(θ) + log L(θ)+ constant, we observe that in the Bayesian framework, m(θ) can be interpreted as the derivative of the logarithm of a prior, that is, m(θ) = ∂ log π(θ)/∂θ.We are thus looking for a matching prior π * (θ) such that In the scalar parameter case, it is straightforward to show that the proposed median matching prior takes the form with ℓ θθ (θ) = ∂ℓ θ (θ)/∂θ and ℓ θθθ (θ) = ∂ℓ θθ (θ)/∂θ, where the second expression for π * (θ) follows from the Bartlett's identities.The posterior based on the median matching prior is thus A first-order approximation for the e-value, when testing H 0 : θ = θ 0 , is simply given by which differs in location with respect to the classical first-order approximation for the e-value based on the MLE.A second approximation for the e-value, when testing H 0 : θ = θ 0 , can be obtained from the asymptotic distribution of the modified score function (10), that is Although the first-order equivalence between ( 11) and ( 12), note that ( 11) is based on an easily understandable comparison between estimated value and hypothetical value, taking estimation error into account, and is widely used in applications but does not satisfy the principle of parameterization invariance.On the other hand, t(θ)/ i(θ) is parameterization invariant.Note that, when using a predictive matching prior, i.e., a prior ensuring asymptotic equivalence of higher-order frequentist and Bayesian predictive densities (see, e.g., [8]), the term m(θ) in (10) corresponds to the Firth's adjustment ( [25]) In view of this, for general regular models, Firth's estimate coincides with the mode of the posterior distribution obtained using the default predictive matching prior.However, lack of invariance affects this kind of adjustment ( [23]), unless dealing with linear transformations.
Example 3 (Skew-normal distribution).Consider a skew-normal distribution with shape parameter θ ∈ IR, with density f (y; θ) = 2ϕ(y)Φ(yθ), where ϕ(•) is the standard normal density function.The median correction term for the score function associated to the median matching prior is (see [26,29]) Numerical integration must be performed to obtain the expected values involved in m(θ).
In order to illustrate the proposed prior, we consider draws from the skew-normal model with true parameter θ 0 = 3 and increasing sample sizes n = 20, 30, 50, 200 (Figure 1).The posterior distributions are obtained with the method by [30], i.e., drawing 10 5 values and accepting the best 5%.The e-values associated to the null (true) hypothesis H 0 : θ = 3 and the (false) hypothesis H 0 : θ = 4 are reported in Table 1.For comparison, the Jeffreys' prior ( [31]), the predictive matching prior ( [29]) and the flat prior, with uniform reference function, are also considered.Progressive agreement among evidence values obtained with the proposed median matching prior and the other priors for larger sample size is shown.Also, as expected, when progressively increasing n, the evidence values indicate agreement with the true hypothesis H 0 : θ 0 = 3 and disagreement with H 0 : θ 0 = 4 for all the priors used.Anyway, note that the posterior distribution obtained with a flat prior and a uniform reference function is proportional to the likelihood function that can be monotone.In view of this, while the MAPs of the posterior based on the default priors are always finite, in some samples the MAP of the posterior with the non-informative prior may be infinite.An example of this effect is illustrated in Figure 2. The red line is used for the posterior obtained from the median matching prior, the green one for the predictive matching prior, the violet one for the Jeffreys' prior and the blue one from an improper flat prior.The horizontal lines identify the corresponding tangential sets associated to the hypothesis H 0 : θ = 3.
density function.The median correction term for the score function associated to the median matching prior is (see [37] and [18]) Numerical integration must be performed to obtain the expected values involved in m(θ).234 The red line is used for the posterior obtained from the median matching prior, the green one for the predictive matching prior, the violet one for the Jeffreys' prior and the blue one from an improper flat prior.The horizontal lines identify the corresponding tangential sets associated to the hypothesis H 0 : θ = 3.  H 0 : θ = 3 and the (false) hypothesis H 0 : θ = 4 are reported in Table 1.For com also the Jeffreys' prior ([24]), the predictive matching prior ( [37]) and the flat pri uniform reference function, are considered.Progressive agreement among evidenc obtained with the proposed median matching prior and the other priors for larger size is shown.Also, as expected, progressively increasing n, the evidence values agreement with the true hypothesis H 0 : θ 0 = 3 and disagreement with H 0 : θ 0 = the priors used.Anyway, note that the posterior distribution obtained with a fl and a uniform reference function, is proportional to the likelihood function tha monotone.In view of this, while the MAPs of the posterior based on the default pr always finite, in some samples the MAP of the posterior with the non-informati may be infinite.An example of this effect is illustrated in Figure 2.
The properties of first-order approximations of the e−values have been inve by a simulation study, with sample sizes n = 20, 30, 50, 200.Results are displ Figure 3. Distributions of the e−value from the posterior based on the median m prior are better, both for small and moderate sample sizes, in terms of converg The properties of first-order approximations of the e-values have been investigated by a simulation study, with sample sizes n = 20, 30, 50, 200.Results are displayed in Figure 3. Distributions of the e-value from the posterior based on the median matching prior are better, both for small and moderate sample sizes, in terms of convergence to the uniform distribution.Moreover, score-type e-values (12) are also preferable over Wald-type e-values (11).For the results with the posterior distribution obtained with a flat prior, we found 4.3%, 4.2%, 0.9%, 0% of infinite estimates for the sample sizes considered in the simulation study, and in these cases the e-value was considered as 0. : θ = 3, using a flat prior (blue line), the median matching prior (red line), the predictive matching prior (green line), and the Jeffreys' prior (violet line).The darker line is used for the approximation (11) while the lighter for that based on (12).

263
Note that, also in the context of nuisance parameters, we are in the situation in which 264 the proposed prior π * (ψ) is known through its first derivative; this is typically the situation 265 with matching priors (see, e.g., [8]).Since the parameter of interest is scalar, the posterior 266 based on the median matching prior can be written as A simple analytical way of approximating to first-order the posterior distribution ( 14) 268 based on the median matching prior is to resort to a quadratic form of t p (ψ).In particular, 269 the approximate posterior distribution for ψ takes the form Figure 3. Skew-normal model: Distributions of e-values from a simulation study under the null hypothesis H 0 : θ = 3, using a flat prior (blue line), the median matching prior (red line), the predictive matching prior (green line), and the Jeffreys' prior (violet line).The darker line is used for the approximation (11), while the lighter is for that based on (12).

Presence of Nuisance Parameters
In the presence of nuisance parameters, in order to obtain median bias reduction of the MLE, it is possible to resort to a modified version of the profile score function of the form where m(ψ, λ) is a suitable correction term of order O(1).In particular, for the median modified profile score function, the adjustment m(ψ, λ) assumes the expression where κ 1ψ , κ 2ψ and κ 3ψ are the first three cumulants of ℓ ′ p (ψ) (see [26], Section 2.2, for their expression).For the estimator ψp , defined as the solution of t p (ψ) = 0, parameterization equivariance holds under interest respecting reparameterizations ( [26]).
Note that, also in the context of nuisance parameters, we are in the situation in which the proposed prior π * (ψ) is known through its first derivative; this is typically the situation with matching priors (see, e.g., [8]).Since the parameter of interest is scalar, the posterior based on the median matching prior can be written as A simple analytical way of approximating to first-order the posterior distribution ( 14) based on the median matching prior is to resort to a quadratic form of t p (ψ).In particular, the approximate posterior distribution for ψ takes the form where s p (ψ) = t p (ψ) 2 j p (ψ) −1 is a Rao score-type statistic based on (13), and the symbol " ∝" means asymptotic proportionality to first-order.In this case, a first-order approximation of the e-value, when testing H 0 : ψ = ψ 0 , is given by In this case, an higher-order approximation via (3) would be impractical since a closed-form prior is not available.As an alternative, simulation-based approaches may be used to derive the implied posterior distribution ( 14) based on the median matching prior.The first one relies on Approximate Bayesian Computation (ABC) techniques, using ψp or the modified profile score function t p (ψ) as summary statistics; see [32] for the modification of the algorithm of [30] by using a profile estimating equation.This first method introduces an approximation at the level of the posterior estimation.The second one still relies on (13) but considers use of Manifold MCMC methods (see, e.g., [33]) to conditioning exactly on the profile equation and not up to a tolerance level, as in ABC (see [34,35]).The algorithm moves on the constrained space {(y, ψ) ∈ Y × ⊖|t p ( ψp ) = 0}, where ψp is the solution of the estimating equation on the original data.For the latter method, we need minimal regularity assumptions on m(ψ, λ), which is assumed to be continuous, differentiable and available in closed form expression. Note, for instance, that in the skew-normal example in Section 3.1 these conditions are not met.

Example 5 (Multivariate regression model). Consider a regression model of the form
where it is assumed that ϵ i ∼ N 2 (0, Σ), with Σ = σ 2 1 ρ ρ 1 positive definite matrix, and (β 0 , β 1 , β 2 , σ 2 , ρ) are unknown parameters.This model is widely used for instance in time series analysis, in which as regression covariates the past of the observables y are used.We focus on the problem of testing hypothesis on the correlation coefficient ρ.Consider a draw with true parameter ρ 0 = 0.95 and n = 20.For obtaining the proposed posterior (14) for ρ, we first compute the MAPs with the median matching prior and also, for comparison, with the predictive matching prior, which are, respectively, 0.953 and 0.92.Note that the expression of the predictive matching prior for (ψ, λ) corresponds to the Firth's adjustment to the score function.The expressions of the modified profile estimating functions t p (ψ) and t F (ψ) are obtained from [38] and are available in closed form expressions.Hence, the Manifold MCMC method can be used to obtain the implied posteriors, whose approximation is comparable to that of any MCMC sampler.In particular we used 20,000 iterations.
We compare the posterior distribution based on the proposed median matching prior, with those obtained with the predictive matching prior and with an inverse-Wishart prior for the covariance matrix Σ with one degree of freedom and identity position, and uniform prior on the regression parameters.The posterior distributions are displayed in Figure 4.The hypothesis of interest is H 0 : ρ = 0.9, and a smaller e-value indicating disagreement with the hypothesis should be preferable.The e-values are 0.25 with the median matching prior, 0.36 for the predictive matching prior and 0.60 with the inverse-Wishart prior.Note that the e-value based on the inverse-Wishart prior involves the constrained maximization and multidimensional integration and thus is not directly readable in Figure 4. Indeed, one crucial difference is that the original e-value formulation links the evidence of the null hypothesis to the evidence of a more refined hypothesis, choosing the MAP under the null hypothesis for all the nuisance parameters, while in the alternative (tangential) set, all values are used, and integration is performed on the full dimensionality of the space.On the contrary, in the proposed posterior based on the median matching prior, the maximizer of nuisance parameters are taken both in the null and non-null sets.
Finally, for the posterior based on the inverse-Wishart prior, we also computed the e-value based on the high-order tail area approximation (9) of the marginal surprise posterior, which is equal to 0.27.This procedure still avoids the multidimensional integration, but the result is not invariant to changes in parametrization.prior, 0.36 for the predictive matching prior and 0.60 with the inverse-Wishart prior.Note 315 that the e−value based on the inverse-Wishart prior involves the constrained maximization 316 and multidimensional integration thus is not directly readable in Figure 4. Indeed, one 317 crucial difference is that the original e−value formulation links the evidence of the null 318 hypothesis to the evidence of a more refined hypothesis, choosing the MAP under the 319 null hypothesis for all the nuisance parameters, while in the alternative (tangential) set all 320 values are used, and integration is performed on the full dimensionality of the space.On 321 the contrary, in the proposed posterior based on the median matching prior, the maximizer 322 of nuisance parameters are taken both in the null and non null sets.

323
Finally, for the posterior based on the inverse-Wishart prior, we also computed the 324 e−value based on high-order tail area approximation (9) of the marginal surprise posterior, 325 which is equal to 0.27.This procedure still avoids the multidimensional integration but the 326 result is not invariant to changes of parametrization.

327
Example 6: Logistic regression model.Let y i , i = 1, . . ., n, be independent realizations 328 of binary random variables with probability π i , where log(π i /(1 − π i )) = η i = x i β and 329 x i = (x i1 , . . ., x ip ) is a row vector of covariates.We assume that a generic scalar component 330 of β is of interest and we treat the remaining components as nuisance parameters.

331
As an example, we consider the endometrial cancer grade dataset analyzed, among 332 Example 6 (Logistic regression model).Let y i , i = 1, . . ., n, be independent realizations of binary random variables with probability π i , where log(π i /(1 − π i )) = η i = x i β and x i = (x i1 , . . ., x ip ) is a row vector of covariates.We assume that a generic scalar component of β is of interest, and we treat the remaining components as nuisance parameters.
As an example, we consider the endometrial cancer grade dataset analyzed, among others, in [39].The aim of the clinical study was to evaluate the relationship between the histology of the endometrium (HG), the binary response variable, of n = 79 patients and three risk factors: 1. Neovascularization (NV), that indicates the presence or extent of new blood vessel formation; 2. Pulsatility Index (PI), that measures blood flow resistance in the endometrium; 3. Endometrium Height (EH), that indicates the thickness or height of the endometrium.A logistic model for HG, including an intercept and using all the covariates (NV, PI, EH), has been fitted, but a maximum likelihood leads to the infinite MLE of the coefficient β 2 related to NV, due to quasi-complete separation.This phenomenon prohibits the use of diffuse priors for β 2 , since the corresponding posterior would not concentrate.Moreover, the e-value with non-informative priors cannot be obtained also for any hypothesis concerning parameters different from β 2 .
If we consider β 2 as the parameter of interest, while the remaining regression coefficients are treated as nuisance parameters, the analysis with the median matching prior allows us to obtain a global proper posterior, with MAP equal to 3.86, open to interpretation both in the original scale and in terms of odds ratios.Similarly, the posterior based on the predictive matching prior, which in this model coincides with Jeffreys' prior π(β) ∝ |i(β)| 1/2 is proper, with the MAP set at 2.92.The latter suffers from a lack of interpretability on different scales, since a different parametrization in the estimation phase would affect the results.
If we consider β 3 as the parameter of interest, related to the risk factor PI, the MAPs are −0.038when using the median matching prior and −0.035 when using the predictive matching prior.The e-values for the hypothesis H 0 : β 3 = 0 are 0.60 and 0.55, respectively (see Figure 5).Likewise, the interpretation of e-values remains consistent and independent of parametrization solely in the first case.
Version January 2, 2024 submitted to Entropy 13 of 15  If we consider β 3 as the parameter of interest, related to the risk factor PI, the MAPs 351 are -0.038when using the median matching prior and -0.035 when using the predictive 352 matching prior.The e−values for the hypothesis H 0 : β 3 = 0 are 0.60 and 0.55, respectively 353 (see Figure 5).Likewise, the interpretation of e-values remains consistent and independent 354 of parametrization solely in the first case.Although (14) cannot always be considered as orthodox in a Bayesian setting, the use of 357 alternative likelihoods is nowadays widely shared, and several papers focus on the Bayesian 358 application of some well-known pseudo-likelihoods.In particular, the proposed posterior 359 π * (ψ|y) has the advantages of avoiding the elicitation on the nuisance parameter λ and of 360 the computation of multidimensional integrations.Moreover, it provides invariant MAPs, 361 HPDs and e−values, without the adoption of a reference function.Finally, we remark that 362 frequentist properties of the MAP of the posterior based of the proposed median matching 363 prior in comparison with the MAP of the posterior based of the predictive matching prior 364 have been investigated in [18] and [3] for some of the examples discussed in this paper.

365
For inference on a full vector parameter θ, with d > 1 components, a direct extension 366 of the rationale leading to (10) does not seem to be practicable due to lack of a manageable 367 definition of multivariate median.Actually, in [18] and [19] it is shown how the method can 368 be extended to a vector parameter of interest, in the presence of nuisance parameters, by 369 simultaneously solving median bias corrected score equations for all parameter components.370 This leads to componentwise third-order median unbiasedness and parameterization 371

Conclusions
Although (14) cannot always be considered as orthodox in a Bayesian setting, the use of alternative likelihoods is nowadays widely shared, and several papers focus on the Bayesian application of some well-known pseudo-likelihoods.In particular, the proposed posterior π * (ψ|y) has the advantages of avoiding the elicitation on the nuisance parameter λ and of the computation of multidimensional integrations.Moreover, it provides invariant MAPs, HPDs and e-values, without the adoption of a reference function.Finally, we remark that frequentist properties of the MAP of the posterior based off the proposed median matching prior in comparison with the MAP of the posterior based of the predictive matching prior have been investigated in [26,38] for some of the examples discussed in this paper.
For inference on a full vector parameter θ, with d > 1 components, a direct extension of the rationale leading to (10) does not seem to be practicable due to lack of a manageable definition of the multivariate median.Actually, in [26,40], it is shown how the method can be extended to a vector parameter of interest in the presence of nuisance parameters by simultaneously solving median bias-corrected score equations for all parameter components.This leads to componentwise third-order median unbiasedness and parameterization equivariance.Moreover, the use of default priors involving all parameter components, also the nuisance, becomes necessary to regularize likelihoods in case of monotonicity.We note that among the possible objective priors that ensures invariance of the posterior, we did not focus on the Jeffreys' in the multidimensional case, since it often exhibits poor convergence properties.Conversely, the default matching priors considered in this paper are easily generalizable to the multidimensional case [40] preserving good convergence properties.
As a final remark, we highlight that this paper opens several topics of future research.In particular, from a computational point of view, it could be of interest:

•
To develop a library of computational routines exploring the methods proposed in this paper for a wide range of statistical models of interest; • To develop semi-automated procedures for further expanding this library, as is done for point estimation for Generalized Linear Models in the R package brglm2 [41].
Moreover, from a theoretical point of view, it could be of interest: • To further explore the theoretical connections between the e-value invariance properties and matching priors; • To explore the existence of similar connections in other classes of pseudo-likelihoods, in particular in the context of empirical and profile empirical likelihoods, with a large number of nuisance parameters ( [42]), • To apply and extend the methodology to consider other objective priors used in Bayesian inference, such as those obtained from scoring rules, as proposed in [10], that are expressed as solutions of differential equations.

Figure 1 .
Figure 1.Inference for the scalar parameter θ of the skew-normal model with sample sizes n = 20, 30, 50, 200 (top-left, top-right, bottom-left and bottom-right panels, respectively).The red line is used for the posterior obtained from the median matching prior, the green one for the predictive matching prior, the violet one for the Jeffreys' prior and the blue one from an improper flat prior.The horizontal lines identify the corresponding tangential sets associated to the hypothesis H 0 : θ = 3.

Table 1 .Figure 1 .
Figure 1.Inference for the scalar parameter θ of the skew-normal model with sample sizes n = 20, 30, 50, 200 (top-left, top-right, bottom-left and bottom-right panels, respectively).The red line is used for the posterior obtained from the median matching prior, the green one for the predictive matching prior, the violet one for the Jeffreys' prior and the blue one from an improper flat prior.The horizontal lines identify the corresponding tangential sets associated to the hypothesis H 0 : θ = 3.

Figure 2 .
Figure 2. Skew-normal model: An example of ∂ log π(θ|y)/∂θ (estimating equation) with a (blue line), the median matching prior (red line) the predictive matching prior (green line Jeffreys' prior (violet line) in a sample where all the observations are positive.

Figure 2 .
Figure 2. Skew-normal model: An example of ∂ log π(θ|y)/∂θ (estimating equation) with a flat prior (blue line), the median matching prior (red line) the predictive matching prior (green line) and the Jeffreys' prior (violet line) in a sample where all the observations are positive.

Figure 4 .
Figure 4. Posterior distributions for the correlation parameter ρ of the bivariate regression model obtained from MCMC draws and the three different priors.

Figure 4 .
Figure 4. Posterior distributions for the correlation parameter ρ of the bivariate regression model obtained from MCMC draws and the three different priors.

Figure 5 .
Figure 5. Median matching posterior distribution and predictive matching posterior distributions for β 3 in the logistic regression model.

Figure 5 .
Figure 5. Median matching posterior distribution and predictive matching posterior distributions for β 3 in the logistic regression model.
1/2is 348 proper, with the MAP set at 2.92.The latter suffers from lack of interpretability on different 349 scales, since a different parametrization in estimation phase would affect the results.