Abstract
Bayes estimators for the unknown mean are derived against a reference, non-informative prior distribution for both the mean and the independent variances. I entertain the scenario with two groups of observables sharing the same unknown mean. The unknown variances of the first group are not supposed to be equal or restricted in any way; the observations in the second, homogeneous group all have the same unknown variance. Under the normality condition, these procedures turn out to have a very explicit form: a weighted average with data-dependent weights that admit a very clear interpretation. Approximate formulas for the variance of the considered estimators and their limiting behavior are examined. The related “self-dual” orthogonal polynomials and their properties are studied, and recursive formulas for the estimators are developed on the basis of these polynomials.
Keywords:
Bayes estimators; heterogeneous variances; non-informative prior; Mills ratio; missing uncertainties; orthogonal polynomials; poly-t distribution; repeatability
MSC:
62F10; 62F15
1. Introduction: Missing Uncertainties
Assume that the available data consist of a sequence of independent observations, , each having the same mean. We consider the situation where the unknown variances of the observables cannot be supposed to be equal or restricted in any way. This scenario may seem unusual to the statistical community. However, according to Morris (1983, p. 49) [1], “… almost all applications involve unequal variances”. The problem is to estimate the common mean, modeled as a location parameter, without traditional conditions on the standard deviations (scale parameters).
This setting appears in instances of heterogeneous research synthesis where represents the summary estimate of the common mean (e.g., the treatment effect) obtained by the j-th study. Commonly, the protocol of such studies demands that be accompanied by its uncertainty estimate, but sometimes these estimates are either unavailable or cannot be trusted. In many applications, the variances of systematic, laboratory-specific errors cannot be reliably estimated; a scientist cannot place confidence in inferences made under unrealistically low noise. The issue of underreported uncertainties, particularly those that stem from asymptotic normal theory, which presupposes large datasets, is prevalent in many applications. The existing imputation techniques (e.g., Rukhin [2], Templ [3]) may not provide justifiable uncertainty values.
The latest point of view (Spiegelhalter [4]) is that uncertainty is a subjective relationship between an observer and what is observed. The challenge of reproducibility within individual centers may be exacerbated by the nature of the measuring instruments employed, leading to heterogeneous unknown uncertainties (see Possolo [5]). The most striking example is provided by “one-shot” devices in the atomic industry, which are limited to a single use.
An additional contemporary illustration is found in citizen science or crowd-sourcing projects, where participants contribute measurement results of various random phenomena, with some of them using relatively imprecise instruments, such as smartphones. These measurements can range from precipitation levels to air quality and biological observations. See Hand [6] for an introduction. Despite anticipated heterogeneity, the project organizers are faced with the task of synthesizing data in the absence of reliable uncertainty () estimates.
Our investigation focuses on Bayes estimators obtained from the posterior distribution for the unknown mean, set against a non-informative, objective, or “uniform” prior distribution for both the mean and the independent variances. This line of inquiry, initiated by Rukhin [7] under the assumption of normality, grapples with the complete lack of variance information. Needless to say, this framework introduces several statistical complications. For instance, the classical maximum likelihood estimator becomes undefined, as the likelihood function reaches infinity at each data point. Nevertheless, the problem is well defined, as estimating the common mean requires determining at most n parameters: the mean itself and the variance ratios, which belong to the unit simplex of dimension .
In Section 2.1, we investigate the Bayes estimators in the setting allowing for a group of homogeneous observations, which have the same unknown variance. Under the normality condition, these procedures turn out to have a surprisingly explicit form. In fact, each of the derived rules is a weighted average with data-dependent weights that are invariant under location–scale transformations, admitting a very clear interpretation. Approximate formulas for the variance of the considered estimators and their limiting behavior are also examined. Section 3 contains several approaches to the distribution of the Bayes estimator. The orthogonal polynomials are discussed in Section 3.3, with recursive formulas derived in Section 3.4.
2. Non-Informative Priors and Bayes Estimators
Consider the situation where distinct independent observables are drawn from a location–scale parameter family with underlying symmetric density p, which has all necessary moments.
The principal interest is in the mean , while are positive nuisance parameters. For this purpose, one needs to estimate the -dimensional vector , with and . If is such an estimator, then one can use as a -statistic. Indeed, if all scale parameters are known, the best unbiased estimator of is the weighted means rule, .
Commonly, the estimated weights are taken to be location-invariant—i.e., for any real c,
Then, the corresponding estimator is (location) equivariant,
Most estimators used in practice are also scale-equivariant,
and this property calls for scale-invariant weights.
In the normal case, the reduction to the invariant procedures leads to an explicit form of the maximum likelihood estimators and of some Bayes procedures.
To eliminate the nuisance parameters, one can use a non-informative prior, which is a classical technique. Under mild regularity conditions on the underlying density p, Rukhin [8] derived the Bayes estimator under quadratic loss (the posterior mean) against the uniform (reference) prior . This statistic coincides with the Bayes rule within the class of invariant procedures.
The discrete posterior distribution is supported by all data points with probabilities
Here, further,
denotes the parity of observations.
Thus, the Bayes estimator has a very explicit form:
The magnitude of the probabilities (1) describes the intrinsic similarity of the observations: the weight of a data point is large if it is close to the bulk of the data, meaning that is relatively small.
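A small numerical sketch may help here. The following Python snippet is an illustration under the assumption that the weights take the inverse-product form suggested by the discussion of (1) (the function and variable names are ours, not the paper's notation); it computes such data-dependent weights and the resulting weighted average:

```python
import numpy as np

def inv_product_weights(x):
    """Hypothetical posterior weights of the inverse-product form suggested
    by the discussion of (1): w_j proportional to 1 / prod_{k != j} |x_j - x_k|."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(diffs, 1.0)       # skip the k == j factor in the product
    w = 1.0 / diffs.prod(axis=1)
    return w / w.sum()                 # normalize to probabilities

x = np.array([0.3, 1.1, 1.4, 5.0])
w = inv_product_weights(x)
mu_hat = float(w @ x)                  # weighted-average estimator

# the isolated point 5.0 receives the smallest weight
assert w[3] == w.min()
# the weights are location- and scale-invariant, so the rule is equivariant
assert np.allclose(w, inv_product_weights(x + 7.0))
assert np.allclose(w, inv_product_weights(3.0 * x))
assert 0.3 < mu_hat < 5.0
```

Because shifting the data leaves all pairwise distances unchanged, and rescaling multiplies every product by the same factor, the normalized weights are automatically invariant, matching the equivariance requirements above.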
This statistic also appears in approximation theory. According to the Tchebycheff interpolation formula, one has
where runs through polynomials of degree not exceeding . See Chapter 5 in Trefethen (2013) [9].
The probabilities (1) have their origin in optimization problems involving the discriminant function. Borodin [10] discusses their use in statistical physics. Genest et al. [11] study the remarkable mirror-symmetry (persymmetry) of the underlying Jacobi matrix.
2.1. Heterogeneity and Homogeneity
Here, normality of observations, , , is assumed. We consider the setting where, in addition to x values, there is a possible group of distinct homogeneous data that have the same unknown standard deviation , say, . In the context of citizen science projects mentioned in Section 1, may represent data supplied by smartphone users, while x values correspond to measurements derived by other means. In metrology applications, a known homogeneous group of laboratories employing the same techniques may participate in interlaboratory studies.
Then, one has independent observations and altogether unknown parameters with the main interest in .
We start with the traditional reference prior density of the form
relative to . Here,
For any continuous bounded function ,
Indeed, for any ,
and
For any fixed small and fixed , provided that all data points are different,
Therefore, for ,
where .
Thus, we can formulate the first result.
Theorem 1.
Under the prior (3), when the posterior distribution of μ is discrete with finite support and the probabilities
Here, are estimators of the common mean and variance based on the homogeneous sub-sample.
The probabilities (4) still describe the intrinsic similarity of observations: the weight of a data point is large if it is close to the greater part of the data. The attenuating factor, when , encourages values of that are close to . When the homogeneous data are absent, this factor is 1, and (5) coincides with (2).
In this situation, the posterior mode , if
presents the maximum likelihood estimator within the class of invariant procedures.
The prior density (3) for the mean and the variances represents the right Haar measure on the group of linear transforms. In the context of the multivariate normal model, it is known as the Geisser–Cornfield prior; see Geisser [12], Ch. 9.1. This prior is known to be an exact frequentist matching prior, yielding Fisher's fiducial distribution as the posterior (Fernandez and Steel [13], Severini et al. [14]).
Despite this fact, “the prior seems to be quite bad for correlations, predictions and other inferences involving a multivariate normal distribution” (Sun and Berger [15]). The mentioned drawbacks stem from the fact that if , the marginal (or prior predictive) density does not exist. A related weakness of (5) is its sensitivity to observations that are close to one another.
To mitigate these drawbacks, we look now for other prior distributions.
2.2. Conjugate Priors and Variance Formulas
A wide class of Bayes estimators of arises from conjugate prior densities,
relative to . Here, and are hyperparameters to be specified in (6), .
A slightly modified proof of Theorem 1 shows that the posterior distribution of under the prior (6) is proportional to
where , which is treated as a constant in the following discussion. The posterior distribution in this situation is the product of t-densities (with a degrees of freedom) and t-density (with degrees of freedom). Thus, it is a particular case of the poly-t distribution, which is ubiquitous in multivariate analysis. It appears in the posterior analysis of linear models (Box and Tiao [16]) and is popular in econometrics (Bauwens [17]).
The Bayes estimator has the form
If , (7) is the classical Pitman estimator of the location parameter involving t-distributions with a degrees of freedom. It is especially well studied for the Cauchy location/scale parameter family ().
If, in addition, ,
which corresponds to the formal Pitman estimator of the location parameter derived from the working family employed when the observations are normal.
Needless to say, the functions in this family are not probability densities. Moreover, they have a singularity of the third kind.
When and are fixed positive numbers, the approximate variance of (7) can be found via the usual argument employed for M-estimators, i.e., solutions of the moment-type equation (or minimizers of ). In our case, the contrast functions are,
The M-estimator satisfies the equation
According to well-known results (e.g., Huber and Ronchetti [18]),
Here, refers to the expectation evaluated under the normal distribution with zero mean and variance ; the distribution of is also normal with variance . The main restriction on is that the Central Limit Theorem for holds. For instance, one can employ (8) if the Liapounov condition for independent non-identically distributed summands is satisfied—i.e., (e.g., Lehmann [19], Theorem 2.7.3).
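As an illustration of the M-estimation argument, the sandwich formula behind (8) can be sketched generically in Python. The contrast function below is the ordinary least-squares one, used as a stand-in for the actual contrast functions of this section (the helper name `sandwich_variance` is ours):

```python
import numpy as np

def sandwich_variance(psi, dpsi, x, theta):
    """Asymptotic variance estimate for an M-estimator solving
    sum_j psi(x_j - theta) = 0:  Var ~ E[psi^2] / (E[psi'])^2 / n."""
    r = np.asarray(x, dtype=float) - theta
    return np.mean(psi(r) ** 2) / np.mean(dpsi(r)) ** 2 / len(r)

# With psi(t) = t the M-estimator is the sample mean, and the sandwich
# formula collapses to (mean squared residual) / n.
x = np.array([0.2, -1.3, 0.7, 2.1, -0.5])
theta = x.mean()
v = sandwich_variance(lambda t: t, lambda t: np.ones_like(t), x, theta)
assert np.isclose(v, np.mean((x - theta) ** 2) / len(x))
```

For a non-linear score, the numerator and denominator no longer cancel, which is exactly what produces the Mills-ratio expressions evaluated next.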
To simplify (8), we need the known formula for the standard normal Z and positive ,
where is the familiar Mills ratio (Stuart and Ord, 1994) [20]. Differentiation of this identity shows that for ,
and
where for one has to replace by .
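The Mills ratio itself is easy to evaluate numerically from the standard library. The following sketch (our own illustration) checks the classical bounds t/(t²+1) < R(t) < 1/t for t > 0, which keep the ratio well behaved for large arguments:

```python
import math

def mills(t):
    """Mills ratio R(t) = (1 - Phi(t)) / phi(t) for the standard normal,
    computed from the stdlib complementary error function."""
    sf = 0.5 * math.erfc(t / math.sqrt(2.0))            # 1 - Phi(t)
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return sf / pdf

# R(0) = sqrt(pi/2), and t/(t^2 + 1) < R(t) < 1/t for all t > 0
assert abs(mills(0.0) - math.sqrt(math.pi / 2.0)) < 1e-12
for t in (0.5, 1.0, 2.0, 4.0):
    assert t / (t * t + 1.0) < mills(t) < 1.0 / t
```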
When ,
where
Since , one has
where
is the best unbiased estimator of when all variances are equal.
If , i.e., and are adequate approximations of the common variance, then
Thus, when all are equal, and the hyperparameters in (6) are chosen so that and , the variance of is about larger than that of . Smaller values of lead to a larger variance .
If for some sequence , , the corresponding estimator is asymptotically normal, , albeit at a slower rate than . Therefore, it is no surprise that in the case of (for which , when as ), one has n Var(. Indeed, it seems that bears more resemblance to nonparametric estimates of the location parameter, for which the convergence rate is slower than . Numerical experiments suggest that in the normal case, Var.
We summarize now the main results of this section.
Theorem 2.
For the remainder of this paper, we will concentrate on the estimator .
3. Distribution of
3.1. Jacobian and Moments
Let the vector have unit coordinates, and let represent a random sample. By location equivariance,
and by scale equivariance,
Define the matrix Q by its elements , , so that
where the i-th coordinate of the vector is , and the skew-symmetric matrix Z for has elements , and a zero diagonal.
We will get an extension of (15) to higher moments: . It is shown in Rukhin (2023) [8] that for any integer ,
so that
The coefficients vanish for .
Indeed,
Because of Hermite’s (osculatory) interpolation formula, for ,
Therefore, for these values of m,
In particular,
Let so that the probabilities (4) can be written as . When and ,
For any positive integer m, the coefficients determine the asymptotic expansion in of , which is
For , the values can be found from the formula
If , is the parity of , then
is a product of two polynomials of degrees (even), or (odd); if ; if .
According to the classical Lagrange interpolation formula, one has
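The Lagrange-interpolation identity invoked here — the sum of x_j^m over the products ∏_{k≠j}(x_j − x_k) vanishes for m ≤ n − 2 and equals 1 for m = n − 1 — can be verified numerically. The nodes below are illustrative:

```python
import math

def power_sum_over_lagrange(x, m):
    """sum_j x_j^m / prod_{k != j} (x_j - x_k)  for distinct nodes x."""
    n = len(x)
    total = 0.0
    for j in range(n):
        denom = math.prod(x[j] - x[k] for k in range(n) if k != j)
        total += x[j] ** m / denom
    return total

x = [0.5, 1.0, 2.0, 3.5, 4.0]          # n = 5 distinct nodes
n = len(x)
for m in range(n - 1):                 # vanishes for m = 0, ..., n - 2
    assert abs(power_sum_over_lagrange(x, m)) < 1e-9
assert abs(power_sum_over_lagrange(x, n - 1) - 1.0) < 1e-9
# for m = n the sum equals the first symmetric function, sum(x)
assert abs(power_sum_over_lagrange(x, n) - sum(x)) < 1e-9
```

For m ≥ n the sum equals a complete homogeneous symmetric function of degree m − n + 1, which is the source of the symmetric-function coefficients appearing below.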
For non-negative integer m, define . Then, the same formula implies that for any ,
The coefficients determine the asymptotic expansion in of
For , the values can be found from the formula , e.g., . With , denoting elementary symmetric functions, one obtains for a positive integer m
Furthermore, one has
and
I will now formulate the results.
Theorem 3.
The matrix is diagonalizable with the eigenvalues .
Proof.
Let
represent the Vandermonde matrix.
We prove that
Here, is a lower triangular matrix with the elements .
Indeed, the elements of the matrix on the left-hand side are
Since all eigenvalues of are distinct, it is diagonalizable. □
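The argument parallels the classical fact that the Vandermonde matrix of distinct nodes diagonalizes the companion matrix of the monic polynomial with those roots. A Python sketch of that classical fact (an analogy only, not a computation with the matrix Q defined above) follows:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial x^n + c_{n-1}x^{n-1} + ... + c_0,
    with coeffs = [c_0, ..., c_{n-1}], in the convention C v = lam v for
    v = (1, lam, ..., lam^{n-1})."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)         # shift rows: (C v)_i = lam^{i+1}
    C[-1, :] = -np.asarray(coeffs)     # last row enforces p(lam) = 0
    return C

roots = np.array([0.5, 1.0, 2.0])
poly = np.poly(roots)                  # coefficients, highest degree first
C = companion(poly[::-1][:-1])         # reorder to [c_0, c_1, c_2]
V = np.vander(roots, increasing=True).T  # columns v_j = (1, x_j, x_j^2)
assert np.allclose(C @ V, V @ np.diag(roots))
```

Since the nodes are distinct, V is invertible and C is diagonalizable with the nodes as eigenvalues, mirroring the structure of Theorem 3.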
3.2. Differential Equations and Integration by Parts
Let and
where form a standard normal random sample.
Then, . In this notation, , , and
According to the celebrated Selberg formula for ,
which implies that
See Lu and Richards [21] for several results related to uses of the Selberg formula in statistics. In particular, these authors establish the Central Limit Theorem for L, expressed as a U-statistic,
where is Euler’s constant.
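The Gaussian specialization of Selberg's formula is Mehta's integral; for γ = 1 it states that the expected squared Vandermonde determinant of n standard normal variables equals ∏_{j=1}^{n} j!. A Monte Carlo sketch (our own illustration, not the specific statement used in the text) checks this:

```python
import numpy as np

def mean_squared_vandermonde(n, draws=200_000, seed=0):
    """Monte Carlo estimate of E prod_{i<j} (Z_i - Z_j)^2 for n iid N(0,1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((draws, n))
    prod = np.ones(draws)
    for i in range(n):
        for j in range(i + 1, n):
            prod *= (z[:, i] - z[:, j]) ** 2
    return prod.mean()

# Mehta's integral with gamma = 1: the exact value is prod_{j=1}^{n} j!
assert abs(mean_squared_vandermonde(2) - 2.0) < 0.1    # 1! * 2!  = 2
assert abs(mean_squared_vandermonde(3) - 12.0) < 1.0   # 1! * 2! * 3! = 12
```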
To simplify the formula for the quadratic risk of , we use integration by parts:
so that
Let denote the density of the estimator . This density exists and is differentiable as is the sum of two independent random variables: and normally distributed . Clearly, , and
This identity holds since the score function of is the conditional expected value of the score function of for given . It also follows from (10) in the same way as the next formula follows from (11).
For any z, so that
or
More generally,
so that
Since for any differentiable bounded function and any z,
Integrating by parts, one obtains for
or
It follows that
By putting , one obtains
One can show that by applying the above argument to . Indeed,
so that
For ,
which means that
To proceed, we need some facts about orthogonal polynomials with regard to (random) weights .
3.3. Random Orthogonal Polynomials
Using the notation of the previous section, for we define the Hankel moment matrices as follows:
Their determinants satisfy the condition . This fact is due to self-duality of the weights (Rukhin, 2023 [8]). Then, the sequence is such that , .
We consider the sequence of monic polynomials , , , which are orthogonal in the space of all functions over . They are known to satisfy the three-term recurrence:
where , and . Clearly, depends on only, while is determined by the first moments, . For example, , .
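For a generic discrete measure, the recurrence coefficients can be generated by the Stieltjes procedure. The following Python sketch (with illustrative nodes and uniform weights, not the self-dual weights of this paper) builds the monic family through the three-term recurrence and checks orthogonality:

```python
import numpy as np

def monic_ops(nodes, weights, deg):
    """Stieltjes procedure: values at the nodes of the monic orthogonal
    polynomials p_0, ..., p_deg of a discrete measure, built via
    p_{k+1}(x) = (x - a_k) p_k(x) - b_k p_{k-1}(x)."""
    x, w = np.asarray(nodes, float), np.asarray(weights, float)
    polys = [np.ones_like(x)]
    p_prev, p, h_prev = np.zeros_like(x), polys[0], 1.0
    for k in range(deg):
        h = np.sum(w * p * p)               # squared norm of p_k
        a_k = np.sum(w * x * p * p) / h     # diagonal recurrence coefficient
        b_k = h / h_prev if k > 0 else 0.0  # off-diagonal coefficient
        p_prev, p, h_prev = p, (x - a_k) * p - b_k * p_prev, h
        polys.append(p)
    return polys

x = np.array([-2.0, -0.5, 0.5, 2.0])        # illustrative symmetric nodes
w = np.full(4, 0.25)                        # NOT the paper's self-dual weights
P = monic_ops(x, w, 3)
# pairwise orthogonality with respect to the discrete measure
for i in range(4):
    for j in range(i):
        assert abs(np.sum(w * P[i] * P[j])) < 1e-9
```

For the symmetric measure chosen here all a_k vanish, so the recurrence is determined by the b_k alone, a degenerate analogue of the symmetry properties exploited in this section.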
The polynomial is the least deviant from zero in : . Therefore, any monic polynomial of degree not exceeding
The ratio gives the Stieltjes transform of M,
In addition to , the polynomial has real roots (which interlace those of ), with denoting the monic polynomial of degree , which has the following roots:
It follows that coincides with the associated polynomial :
Associated with , the orthogonal monic polynomial has degree . These polynomials satisfy the same recurrence (24) but with different initial conditions, and , so that , .
One can represent as a product of two monic polynomials (with real roots) of degrees and respectively. Then, and , so that if , and if , ,
and
Clearly,
and
To specify coefficients of , we use the identity
where denotes the elementary symmetric function based on . Because of (18) for ,
so that
Similarly, one obtains formulas for and via the elementary symmetric functions based on .
If , represent order statistics, then
The ratio coincides with the Stieltjes transform of the discrete measure defined by the weights ; is associated with . Thus, can be written as a finite continued fraction whose coefficients can be found from the three-term recurrence (24) for orthogonal polynomials on for this measure. Similar facts hold for and the probability distribution given by .
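The partial-fraction identity behind this continued-fraction representation — the Stieltjes transform of a discrete measure equals the ratio of its "numerator" polynomial to the node polynomial — can be checked directly. All names below are illustrative:

```python
import numpy as np

def stieltjes_transform(z, nodes, weights):
    """S(z) = sum_j w_j / (z - x_j), the Stieltjes transform of a
    discrete measure supported on the nodes."""
    return sum(w / (z - x) for x, w in zip(nodes, weights))

def numerator_poly(z, nodes, weights):
    """q(z) = sum_j w_j prod_{k != j} (z - x_k); by partial fractions,
    q(z) / p(z) = S(z) for p(z) = prod_k (z - x_k)."""
    n = len(nodes)
    return sum(weights[j] * np.prod([z - nodes[k] for k in range(n) if k != j])
               for j in range(n))

x = [0.0, 1.0, 2.5, 4.0]
w = [0.1, 0.4, 0.3, 0.2]
p = lambda z: np.prod([z - xk for xk in x])
for z in (5.0, -1.2, 0.5 + 0.5j):
    assert abs(numerator_poly(z, x, w) / p(z) - stieltjes_transform(z, x, w)) < 1e-12
```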
I now summarize the obtained results.
3.4. Main Representation
The polynomials allow for expressing as a rational function of , namely,
To stress dependence on n, we write , , and . The functions on the right-hand side of (34) correspond to the sample of size obtained by deleting from the original dataset.
For the reduced sample, ; if originally then . Therefore, because of (18),
For fixed n and ,
implying (34).
Since , the repeated use of (35) gives
Here,
is the resultant of polynomials and (which are supposed not to have common roots). One can check by using (31) that
where the degree of polynomial is and the degree of is .
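The product formula for the resultant of monic polynomials, Res(p, q) = ∏ q(α) taken over the roots α of p, gives a direct way to compute quantities like S. A small sketch under that standard formula (names illustrative):

```python
import numpy as np

def resultant_monic(roots_p, q_coeffs):
    """Resultant of a monic polynomial p, given by its roots, with a monic
    polynomial q, given by its coefficients (highest degree first):
    Res(p, q) = prod over the roots alpha of p of q(alpha)."""
    return float(np.prod([np.polyval(q_coeffs, r) for r in roots_p]))

# p = (x - 1)(x - 2) and q = x - 3: Res = q(1) * q(2) = (-2) * (-1) = 2
assert resultant_monic([1.0, 2.0], [1.0, -3.0]) == 2.0
# polynomials sharing a root have a vanishing resultant
assert resultant_monic([1.0, 2.0], [1.0, -2.0]) == 0.0
```

The vanishing resultant in the second assertion is precisely why S is defined only when the two polynomials have no common roots.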
Actually, (34) and (35) hold if is replaced by any , in which case refers to the monic polynomial of degree proportional to
It corresponds to the sample of size obtained by removing , when becomes
In an alternative form of (34) for , the monic polynomial is proportional to
with instead of .
Our goal is to prove the following representation of .
Theorem 6.
For any integer , with S defined in (36),
Here, represents a homogeneous symmetric function of degree ( even and odd), which is a linear combination of products of homogeneous symmetric polynomials in and with integer coefficients, with if . The recursive formula (40) relates to , based on the reduced sample . One has
and
Proof.
As was already noticed, for the subsample the polynomial coincides with . The corresponding result is .
Therefore, (35) implies that
where the coefficients of the polynomial depend only on , and are evaluated on the sample of size obtained by removing (with for ). The homogeneity degree of is . If , then
Because of (34) for ,
If is an -dimensional vector, then (39) and (40) mean that where is a matrix of size with elements
The induction assumption that for all can be represented as linear combinations of products of homogeneous symmetric polynomials in and with integer coefficients implies that has the claimed properties. Namely, it is homogeneous of the stated degree, it is symmetric in , and , and it can be written as a linear combination of products of homogeneous symmetric polynomials in these variables with integer coefficients.
In particular, with non-negative and shift-invariant specified in (39) as a product of polynomials (29) evaluated at successive order statistics, one obtains the representation of (2) as follows:
with polynomials and based on , for which .
Theorem 6 shows that has some resistance to extreme observations,
and
Here are examples of Ks for smaller values of n given in terms of elementary symmetric functions:
4. Conclusions
The present work allows for obtaining meaningful consensus values under unreliable or missing uncertainties by using Bayes estimators (5) or (7). This method leads to mathematically challenging properties of self-dual weights (1) and their extension (4). The orthogonal polynomials that involve rank parity exhibit fascinating symmetry.
The approach explores non-traditional statistical estimation in the absence of variance information. The recursive algorithm detailed in Theorem 6 may be useful for practical calculations. The paper hints at new mathematical findings and sets the stage for a detailed exploration of a novel statistical methodology for estimating common parameters in the face of variance heterogeneity.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study.
Conflicts of Interest
The author declares no conflicts of interest.
References
- Morris, C.N. Parametric empirical Bayes inference: Theory and applications. J. Am. Stat. Assoc. 1983, 78, 47–65.
- Rukhin, A.L. Estimating heterogeneity variances to select a random effects model. J. Stat. Plan. Inference 2019, 202, 1–13.
- Templ, M. Enhancing precision in large scale data-analysis: An innovative robust imputation algorithm for managing outliers and missing values. Mathematics 2023, 11, 2729.
- Spiegelhalter, D. The Art of Uncertainty; Norton: New York, NY, USA, 2025.
- Possolo, A. Measurement science meets the reproducibility challenge. Metrologia 2022, 80, 044002.
- Hand, E. Citizen science: People power. Nature 2010, 466, 685–687.
- Rukhin, A.L. Estimation of the common mean from heterogeneous normal observations with unknown variances. J. R. Stat. Soc. Ser. B 2017, 79, 1601–1618.
- Rukhin, A.L. Orthogonal polynomials for self-dual weights. J. Approx. Theory 2023, 288, 105865.
- Trefethen, L.N. Approximation Theory and Approximation Practice; SIAM: Philadelphia, PA, USA, 2013.
- Borodin, A. Duality of orthogonal polynomials on a finite set. J. Stat. Phys. 2002, 109, 1109–1120.
- Genest, V.; Tsujimoto, S.; Vinet, L.; Zhedanov, A. Persymmetric Jacobi matrices, isospectral deformations and orthogonal polynomials. J. Math. Anal. Appl. 2017, 450, 915–928.
- Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: New York, NY, USA, 1993.
- Fernandez, C.; Steel, M.F.J. Reference priors for the general location-scale model. Stat. Probab. Lett. 1999, 43, 377–384.
- Severini, T.A.; Mukherjea, R.; Ghosh, M. On an exact probability matching property of right-invariant priors. Biometrika 2002, 89, 952–957.
- Sun, D.; Berger, J.O. Objective Bayesian analysis for the multivariate normal model. In Bayesian Statistics 8; Oxford University Press: Oxford, UK, 2007; pp. 525–562.
- Box, G.; Tiao, G. Bayesian Inference in Statistical Analysis, 2nd ed.; Wiley: New York, NY, USA, 1992.
- Bauwens, L. Bayesian Full Information Analysis of Simultaneous Equation Models Using Integration by Monte Carlo; Springer: Berlin/Heidelberg, Germany, 1994.
- Huber, P.J.; Ronchetti, E.M. Robust Statistics, 2nd ed.; Wiley: New York, NY, USA, 2009.
- Lehmann, E. Elements of Large-Sample Theory; Springer: New York, NY, USA, 1999.
- Stuart, A.; Ord, J.K. Kendall's Advanced Theory of Statistics, 6th ed.; E. Arnold: London, UK, 1994; Volume 1.
- Lu, I.-L.; Richards, D. Random discriminants. Ann. Stat. 1993, 21, 1982–2000.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).