1. Introduction and General Setting
Full Bayesian Significance Testing (FBST) [1] is a Bayesian method for testing whether a parameter $\theta$ belongs to some set $\Theta_H$. In the traditional statistical setting, researchers analyze a collection of $n$ observations $x^n = (x_1, \dots, x_n)$ that are presumed to follow a specified distribution characterized by an unobserved parameter $\theta$. A Bayesian statistician makes inferences about $\theta$ by updating a prior density $\pi(\theta)$, supported on the set of all possibilities $\Theta$. After observing $x^n$, one obtains a posterior density $\pi(\theta \mid x^n)$. Often, one needs to determine whether $x^n$ supports a scientific hypothesis framed in terms of $\theta$ belonging to some subset $\Theta_H \subseteq \Theta$, written $H: \theta \in \Theta_H$. The FBST tests $H$ by comparing the posterior density of points inside and outside $\Theta_H$. This comparison is represented by the posterior probability of the tangential set
$$T_H(x^n) = \Big\{\theta \in \Theta : \pi(\theta \mid x^n) \le \sup_{\theta_0 \in \Theta_H} \pi(\theta_0 \mid x^n)\Big\}, \quad (1)$$
which encompasses all points of the parameter space whose posterior density does not exceed that attained by the best point in $\Theta_H$. The FBST methodology posits that if the posterior probability of $T_H(x^n)$ is low, the hypothesis $H$ should be rejected, as $\Theta_H$ is then located in a region characterized by low posterior density.
Definition 1. In a standard Bayesian statistical model, let $\Theta$ be a finite-dimensional parameter space, $x^n = (x_1, \dots, x_n)$ an observed sample, $L(\theta; x^n)$ the likelihood function, $\pi(\theta)$ the prior density, and $\pi(\theta \mid x^n)$ the posterior density, proportional to $L(\theta; x^n)\,\pi(\theta)$. Also, let $\Pi(\cdot \mid x^n)$ be the measure on $\Theta$ induced by $\pi(\theta \mid x^n)$. The Full Bayesian Significance Test (FBST) for testing $H: \theta \in \Theta_H$ consists of rejecting the null hypothesis based on the e-value statistic
$$\operatorname{ev}(H) = \Pi\big(T_H(x^n) \mid x^n\big),$$
where the tangential set $T_H(x^n)$ is given by Equation (1). $H$ is rejected if $\operatorname{ev}(H) < c$ for some fixed significance level $c \in (0,1)$. In other words, the e-value quantifies the credibility of a hypothesis using the maximum probability argument, whereby a system is optimally represented by its most probable realization. This probability is measured through the posterior density $\pi(\theta \mid x^n)$, which quantifies the continuous plausibility associated with a specific point $\theta \in \Theta$. The e-value directly addresses the question "What is the posterior probability of observing a $\theta$ whose posterior density does not exceed that attained by some point in $\Theta_H$?". A higher e-value signifies that $H$ is deemed more credible, whereas a lower e-value suggests that $H$ is considered less credible.
In this paper, we extend this concept to a nonparametric framework for density estimation using histograms. Bayesian nonparametric approaches for density estimation can be divided into two main categories. The first type focuses on defining priors directly on the infinite-dimensional space of probability densities. Upon observing the data $x^n$, these priors are updated into infinite-dimensional posteriors, allowing Bayesian inference to proceed as usual. Well-established examples of such priors include the Dirichlet Process Mixture (DPM) and its extensions [2]. In contrast, the second type of Bayesian nonparametric approach employs regular finite-dimensional Bayesian modeling in parameter spaces $\Theta_m$ of fixed dimension $m$, while allowing $m$ to grow gradually as the sample size increases. This includes truncated versions of infinite-dimensional priors and histograms whose number of bins is fixed for a given sample size but increases with it. This paper specifically examines a variant of the FBST applied in this context of increasing dimensionality.
In this paper, we propose an FBST for the problem of Bayesian density estimation using Dirichlet-Multinomial models, interpreted as histograms where the number of bins increases with the sample size. This methodology is in alignment with the Bayesian frameworks outlined by [
3,
4,
5]. Therefore, we will use consistent notation to leverage results from the existing literature. The primary advantages of leveraging the Dirichlet-Multinomial model include (1) the feasibility of deriving an explicit formula for the FBST test statistic in a nonparametric context, (2) the implicit relation between the formula for the FBST test statistic and the differential entropy estimation, and (3) the potential to extend frequentist consistency results from the literature to this method. These attributes collectively establish a robust framework for nonparametric hypothesis testing that is mathematically rigorous, interpretable through the lens of information geometry, and consistent from a frequentist standpoint.
This paper is structured as follows.
Section 2 outlines the essential definitions and properties of our proposed methodology.
Section 3 provides simulations demonstrating the statistical power of our test. Finally,
Section 4 offers a discussion of our findings and potential avenues for future research. The proofs of our results are presented in
Appendix A.
2. FBST for Random Histograms
We start with a formal definition of our model. To maintain clarity, we will restrict our analysis to densities on $[0,1]$.
Definition 2. For $m \in \mathbb{N}$, consider the set of densities with support on $[0,1]$ defined as
$$\mathcal{H}_m = \Big\{\theta_w : \theta_w(x) = \sum_{i=1}^{m} m\, w_i\, \mathbb{1}\{x \in B_i\},\ \ w_i \ge 0,\ \ \sum_{i=1}^{m} w_i = 1\Big\},$$
where $B_i = \big((i-1)/m,\, i/m\big]$ for $i = 1, \dots, m$. A random histogram $\theta$ is a random variable that selects a random element of $\mathcal{H}_m$.
The distribution of $\theta$ is fully characterized by the distribution of the vector of random weights $W = (W_1, \dots, W_m)$. Bayesian posterior inference on $\theta$ may also be conducted with respect to $W$ if the likelihood is given by
$$L(w; x^n) = \prod_{j=1}^{n} \theta_w(x_j) = \prod_{i=1}^{m} (m\, w_i)^{N_i}, \qquad N_i = \#\{j \le n : x_j \in B_i\}, \quad (2)$$
which corresponds to the assumption that the sample values $x_1, \dots, x_n$ are conditionally independent and share the identical density $\theta_w$. In this paper, we shall consider random histograms sampled implicitly through Dirichlet priors on the weights $W$. This approach guarantees that the posterior inference on $\theta$ is conjugate and computationally tractable, as it is equivalent to inference on a Dirichlet-Multinomial Bayesian model.
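As a concrete illustration of Definition 2 and of the Dirichlet weights just described, the following minimal R sketch, under assumed hyperparameters and not taken from the paper, simulates a random histogram on [0,1] by normalizing independent Gamma draws.

## Minimal R sketch: simulate a random histogram on [0, 1] with m bins.
set.seed(2)
m <- 10
alpha <- rep(1, m)                       # assumed Dirichlet hyperparameters
g <- rgamma(m, shape = alpha)
w <- g / sum(g)                          # W ~ Dirichlet(alpha), no extra package needed
## Piecewise-constant density: theta_w(x) = m * w_i for x in B_i = ((i-1)/m, i/m]
theta_w <- function(x) { idx <- pmax(1, ceiling(m * x)); m * w[idx] }
theta_w(c(0.05, 0.50, 0.95))             # density values in three different bins
sum(theta_w((seq_len(m) - 0.5) / m) / m) # total mass equals 1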
Proposition 1. Consider $\theta$ a random histogram with weights $W = (W_1, \dots, W_m)$. If $W \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_m)$ and $x_1, \dots, x_n \mid W$ are i.i.d. with density $\theta_W$, then the posterior remains a random histogram with weights
$$W \mid x^n \sim \mathrm{Dirichlet}(\alpha_1 + N_1, \dots, \alpha_m + N_m), \quad (3)$$
where $N_i = \#\{j \le n : x_j \in B_i\}$ is the number of observations falling in bin $B_i$.
A usual approach for Bayesian nonparametric inference on a histogram is letting $m$ grow slowly with the sample size $n$. This may be interpreted as a data-dependent prior: the full parameter space being considered is the set of all densities and, contingent on $n$, random histograms put mass only on specific subsets of this set. One could define priors that do not depend on $n$, but this would come at a heavy computational cost. Moreover, meaningful and computationally sound inference may be conducted from both the frequentist and the Bayesian perspective if we also require the priors of $W$ to depend on $n$ [2,4].
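Under the conjugacy of Proposition 1, posterior inference reduces to counting observations per bin. A minimal R sketch with a hypothetical sample and assumed hyperparameters:

set.seed(3)
n <- 200
x <- rbeta(n, 2, 2)                      # hypothetical observed sample on [0, 1]
m <- 8; alpha <- rep(1, m)               # fixed number of bins and assumed prior
breaks <- seq(0, 1, length.out = m + 1)
N <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))   # bin counts N_i
alpha_post <- alpha + N                  # posterior weights: Dirichlet(alpha + N)
alpha_post / sum(alpha_post)             # posterior mean of each weight E[W_i | x^n]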
Fixing $n$ and $m$, the original definition of the FBST may be adapted to conduct tests regarding $\theta$. Given that there exists a bijection between an element of $\mathcal{H}_m$ and its corresponding weights $w$, the FBST test statistic may be defined in terms of the Dirichlet distribution defined in Equation (3). However, this approach comes at the price of only being able to test hypotheses that can be written directly as subsets of $\mathcal{H}_m$. Therefore, if a researcher is interested in testing a hypothesis framed in terms of a general set of densities $H_0$, our proposed procedure specifies a test statistic based on its finite-dimensional counterpart. As $m$ is permitted to increase with the sample size, this translation process becomes increasingly negligible.
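To make this translation explicit, the finite-dimensional counterpart of a hypothesized density f is simply its vector of bin probabilities w(f). A short R sketch, using a Beta density purely as a hypothetical element of the hypothesis set:

## Bin probabilities w(f) attributed to B_1, ..., B_m by a density f on [0, 1]
bin_probs <- function(f, m) {
  breaks <- seq(0, 1, length.out = m + 1)
  sapply(seq_len(m), function(i) integrate(f, breaks[i], breaks[i + 1])$value)
}
f0 <- function(x) dbeta(x, 2, 2)         # hypothetical density in H_0
w_f0 <- bin_probs(f0, m = 8)
sum(w_f0)                                # equals 1 up to numerical error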
Definition 3 (FBST for random histograms). Let $\theta$ be a random histogram defined by Dirichlet weights, and let $x^n = (x_1, \dots, x_n)$ represent an i.i.d. sample drawn from $\theta$. The FBST test statistic for testing a hypothesis $H: \theta \in H_0$, where $H_0$ is an arbitrary set of densities on $[0,1]$, is given by
$$\operatorname{ev}(H) = \Pi\Big(\big\{w : \pi(w \mid x^n) \le \sup_{f \in H_0} \pi\big(w(f) \mid x^n\big)\big\} \,\Big|\, x^n\Big),$$
where $\pi(\cdot \mid x^n)$ denotes the posterior Dirichlet density of Equation (3) and $w(f)$ denotes the vector of probabilities attributed to the sets $B_1, \dots, B_m$ by each element $f$ of $H_0$:
$$w(f) = \Big(\int_{B_1} f(x)\,dx,\ \dots,\ \int_{B_m} f(x)\,dx\Big).$$
The FBST for random histograms may be interpreted in the context of information theory. Let $p$ and $q$ be two $m$-dimensional probability vectors. We recall that the cross-entropy $\mathrm{H}(p, q)$ between those vectors is given by $\mathrm{H}(p, q) = -\sum_{i=1}^{m} p_i \log q_i$ and the Kullback–Leibler divergence is given by $\mathrm{KL}(p \,\|\, q) = \sum_{i=1}^{m} p_i \log(p_i / q_i) = \mathrm{H}(p, q) - \mathrm{H}(p, p)$. By introducing the normalized posterior counts $\bar{p}_{n,i} = (\alpha_i + N_i - 1)/\sum_{j=1}^{m}(\alpha_j + N_j - 1)$, Equation (3) can be articulated as
$$\log \pi(w \mid x^n) = C_n - \Big(\sum_{j=1}^{m}(\alpha_j + N_j - 1)\Big)\, \mathrm{H}(\bar{p}_n, w) = C_n' - \Big(\sum_{j=1}^{m}(\alpha_j + N_j - 1)\Big)\, \mathrm{KL}(\bar{p}_n \,\|\, w), \quad (4)$$
where $C_n$ and $C_n'$ are constants that do not depend on $w$.
These equations demonstrate that the application of the FBST definition leads to statistical tests grounded in an information-theoretic measure of divergence between the distribution of the sample over the $m$ bins and the expected value of the counts on those same bins under the assumption that $\theta$ is some hypothesized density $f$. Indeed, in the context of this particular test, a related concept has emerged in the literature on goodness-of-fit testing, notably in G-tests [6] and other methodologies rooted in frequentist nonparametric estimates of the continuous variant of the Kullback–Leibler divergence for probability densities [7]. Both tests utilize a $\chi^2$ asymptotic distribution under the null hypothesis. For the FBST, there are specific rates of increase of $m$ that ensure the presence of analogous results.
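A minimal R sketch of these quantities, reusing bin_probs and the notation of the previous sketches (the sample, prior, and hypothesized density below are assumptions made purely for illustration), computes the cross-entropy and Kullback–Leibler divergence between the binned sample and a hypothesized density, and approximates the e-value for the point hypothesis H: theta = f0 by Monte Carlo on the Dirichlet posterior:

set.seed(4)
n <- 500; m <- 8; alpha <- rep(1, m)
x <- runif(n)                                        # hypothetical sample
breaks <- seq(0, 1, length.out = m + 1)
N <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))
a_post <- alpha + N                                  # posterior Dirichlet parameters
f0 <- function(x) dbeta(x, 2, 2)                     # hypothesized density (illustrative)
w_f0 <- bin_probs(f0, m)                             # its bin probabilities w(f0)
p_hat <- N / n
cross_entropy <- -sum(p_hat * log(w_f0))             # H(p_hat, w(f0))
kl <- sum(p_hat * log(p_hat / w_f0))                 # KL(p_hat || w(f0)), assumes N_i > 0
## Monte Carlo e-value: posterior mass of weights whose posterior density does
## not exceed the posterior density evaluated at w(f0).
log_post <- function(w) sum((a_post - 1) * log(w))   # log Dirichlet density up to a constant
draws <- matrix(rgamma(1e4 * m, shape = rep(a_post, each = 1e4)), ncol = m)
draws <- draws / rowSums(draws)                      # samples from the posterior Dirichlet
ev <- mean(apply(draws, 1, log_post) <= log_post(w_f0))
ev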
Theorem 1. Suppose that (1) $x_1, \dots, x_n$ is an independent and identically distributed sample drawn from a density $\theta_0$ that is Lipschitz continuous on $[0,1]$; (2) $\theta$ is a random histogram whose Dirichlet parameters satisfy $\underline{c} \le \alpha_i \le \overline{c}$ for all $i$ and fixed quantities $0 < \underline{c} \le \overline{c}$; and (3) the number of bins $m = m_n \to \infty$ sufficiently slowly as $n \to \infty$. Then, the FBST for random histograms with $m_n$ bins satisfies
- 1. $\operatorname{ev}(H) \overset{P}{\longrightarrow} 1$ if $\theta_0 \in H_0$ and
- 2. $\operatorname{ev}(H) \overset{P}{\longrightarrow} 0$ if $\theta_0 \notin H_0$, where $\overset{P}{\longrightarrow}$ denotes convergence in probability with respect to the distribution of the sample.
One particular virtue of Equation (4) is the simplicity of the optimization step in the FBST. This fact is due to the convexity of the cross-entropy functional. Moreover, this optimization will be able to reject false null hypotheses as the sample size grows larger, as we exemplify for the case of fixed parametric families.
Theorem 2. Let $\{F_\eta : \eta \in E\}$ be a parametric family of differentiable distribution functions with densities $\{f_\eta : \eta \in E\}$, such that $H_0 = \{f_\eta : \eta \in E\}$ and $W(H_0)$ is the corresponding subset of the $(m-1)$-dimensional simplex. Then, if the data-generating density does not belong to $H_0$, the FBST on histograms for goodness-of-fit of this parametric family satisfies $\operatorname{ev}(H) \to 0$ in probability.
This procedure is similar to other nonparametric methods that do not rely on maximum likelihood estimates for testing, but instead optimize specific statistics. This idea dates back to Berkson’s suggestion to minimize chi-squared rather than maximize likelihood [
8], although there have been few attempts to directly optimize test statistics, such as the Kolmogorov–Smirnov statistic [
This may be because optimizing usual test statistics for goodness-of-fit, such as Kolmogorov–Smirnov, Anderson–Darling, and Cramér–von Mises [
10], requires specialized optimization procedures, like the one developed in [
9].
Alternatively, the most common approach for testing adherence to a parametric family of distributions involves estimating parameters by maximum likelihood and then deriving the null distribution of an existing test through resampling [
11]. Our new test, as we will demonstrate in simulations, could also require corrections when the optimization suggested by Theorem 2 is used.
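As an illustration of this optimization step (a sketch under assumed choices, not the authors' implementation), the cross-entropy statistic can be minimized over a hypothetical Beta(a, b) family with a standard gradient-free optimizer, reusing bin_probs from the earlier sketch:

fit_family <- function(N, m) {
  p_hat <- N / sum(N)
  obj <- function(par) {
    a <- exp(par[1]); b <- exp(par[2])               # positivity via log-parametrization
    w <- bin_probs(function(x) dbeta(x, a, b), m)
    -sum(p_hat * log(w))                             # cross-entropy H(p_hat, w(f_eta))
  }
  opt <- optim(c(0, 0), obj)                         # Nelder-Mead by default
  list(a = exp(opt$par[1]), b = exp(opt$par[2]), value = opt$value)
}
## Usage with the bin counts N from the previous sketch: fit_family(N, m)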
3. Simulations
In this section, we will compare the statistical power of our test with that of other available techniques through simulations. Both simple and composite null hypotheses will be considered. Simulations will be conducted using the R programming language [
12] and its public repository of packages. For simple hypotheses, the following tests are compared:
The e-value for histograms, as defined in Definition 3, with Dirichlet prior weights and the number of bins chosen as in the hypotheses of Theorem 1;
Classic Kolmogorov–Smirnov (KS) test, as described in [
6];
Alternative versions of the KS, Anderson–Darling (AD), and Cramér–von Mises (CV) tests, constructed by [10] and implemented in the R package [13].
Following [10], we shall compare the NP-FBST of Definition 3 against the competing tests above using a range of sample sizes. The data will be simulated under four scenarios. We calculate the statistical power as the percentage of correct rejections of the null hypothesis, with rejection at a fixed significance level, over 500 Monte Carlo samples. The results are summarized in Figure 1.
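The power estimates are obtained through a standard Monte Carlo loop. The following condensed R sketch shows the kind of loop used, with a hypothetical alternative, a placeholder ev_histogram() standing in for the e-value computation of Section 2, and an assumed rejection level; the actual scenarios are those of Figure 1.

power_np_fbst <- function(n, rgen, B = 500, level = 0.05) {
  rejections <- replicate(B, {
    x <- rgen(n)                       # draw one sample from the alternative
    ev <- ev_histogram(x)              # placeholder: e-value for the assumed null
    ev < level                         # reject when the e-value is small
  })
  mean(rejections)                     # estimated power
}
## Hypothetical call: power_np_fbst(100, function(n) rbeta(n, 2, 2))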
Analyzing
Figure 1, we observe the following:
For two of the alternative scenarios, the NP-FBST may be considerably more powerful for small sample sizes, and it remains competitive for large sample sizes.
For the two remaining scenarios, which are non-Lipschitz alternative hypotheses, the test performs worse than Zhang's alternatives [10]. However, it still shows power comparable or superior to the usual Kolmogorov–Smirnov statistic.
For sample sizes below 1000, $m_n$ will usually be very small, such as 2 or 3, so the test reduces to a regular multinomial test.
To showcase the approximation properties of the NP-FBST based on Lemma A1, we shall simulate one last example, but this time we will adopt $m = \lceil 1 + \log_2 n \rceil$, usually referred to as Sturges' rule for histogram binning [14]. This choice is, of course, only justified asymptotically, since $1 + \log_2 n = o(n^{a})$ for every $a > 0$. However, it produces very competitive statistical power for testing whether a sample was drawn from the hypothesized density, as highlighted in Figure 2.
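For reference, Sturges' rule is available in base R; a one-line sketch:

n <- 500
nclass.Sturges(seq_len(n))             # base R; equals ceiling(log2(n) + 1)
ceiling(log2(n) + 1)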
4. Conclusions
In this paper, we presented a new nonparametric Bayesian procedure extending the usual FBST for Bayesian histograms. We summarize our results as practical and theoretical.
On the practical and applied front, we draw the following conclusions:
For small sample sizes, our method is competitive in terms of statistical power, even compared to sophisticated alternatives such as Zhang’s tests [
10].
For larger sample sizes, the very slow growth of the number of bins required by Theorem 1 harms the statistical power. Therefore, other binning rules could be considered, and further research should look for an adaptive number of bins. Desirable binning rules should produce more bins than the rate of Theorem 1 for small sample sizes, but fewer for large sample sizes. Our simulations suggest that the usual $m = \lceil 1 + \log_2 n \rceil$, known as Sturges' rule, is a competitive alternative for moderate sample sizes below 1000.
Unlike previous attempts, our method is computationally inexpensive with competitive statistical power.
From a theoretical perspective, we derive the following conclusions:
The natural Dirichlet-Multinomial formulation of Bayesian histograms induces statistical tests based on estimates of Kullback–Leibler divergences. This formulation logically follows from the definition of the FBST, and the same logic could be applied to other Bayesian density estimation methods. The frequentist properties of other versions of this NP-FBST remain to be studied in future works, but our results highlight the Kullback–Leibler divergence as a possible “canonical” statistic for nonparametric versions of the FBST for density estimation.
Our results show that taking the limit of a slowly increasing finite-dimensional parameter space is a viable strategy for building nonparametric versions of the FBST. The frequentist properties of the FBST are intimately related to the Bernstein–von Mises theorem. Therefore, if these types of Gaussian approximations are available, our arguments should also hold. In fact, all the main references of this paper build specific growth rates of the dimension of the parameter space and could be used to find other versions of nonparametric FBSTs [
3,
4,
5].
For composite hypotheses, our method can be used both for testing based on the maximum likelihood estimate of nuisance parameters and for directly optimizing the test statistic, which may be interpreted as a weighted likelihood function. Usual numerical methods for optimizing the likelihood will work for our test statistic, which is not the case for other usual statistics such as Anderson–Darling, Cramér–von Mises, or Kolmogorov–Smirnov.
For future research, we highlight that adaptively choosing the number of bins is crucial, as the statistical power is heavily influenced by this quantity. Additionally, Theorem 1 requires a Lipschitz continuous data-generating density, a usual assumption for histograms, but excludes unbounded densities, which are important from a practical and theoretical point of view. Extending our results to Hölder continuous densities is particularly important but requires the derivation of other versions of Bernstein–von Mises theorems.