This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Peelle's Pertinent Puzzle (PPP) was described in 1987 in the context of estimating fundamental parameters that arise in nuclear interaction experiments. In PPP, generalized least squares (GLS) parameter estimates fell outside the range of the data, which has raised concerns that GLS is somehow flawed and has led to suggested alternatives to GLS estimators. However, there have been no corresponding performance comparisons among methods, and one suggested approach involving simulated data realizations is statistically incomplete. Here we provide performance comparisons among estimators, introduce approximate Bayesian computation (ABC) using density estimation applied to simulated data realizations to produce an alternative to the incomplete approach, complete the incompletely specified approach, and show that estimation error in the assumed covariance matrix cannot always be ignored.

Peelle's Pertinent Puzzle (PPP) was introduced in 1987 in the context of estimating fundamental parameters that arise in nuclear interaction experiments [ ].

Others [

The following section includes additional background for PPP and the GLS approach. Section 3 discusses the ordinary sample mean as a perhaps naive alternative and shows that estimation error in the assumed covariance matrix cannot always be ignored. Sections 4 and 5 develop an alternative involving a modified form of approximate Bayesian computation (ABC). This ABC-based approach uses density estimation applied to realizations of simulated data that follow an error model appropriate for PPP and having various error distributions. Section 6 provides a second Bayesian alternative resulting from completion of the ad hoc approach by specifying a suitable prior probability density function (pdf). Section 7 is a summary.

A situation that leads to PPP is as follows. Two experiments aim to estimate a physical constant μ; experiment one produces x_1, and experiment two produces x_2. Measurements y_1 and y_2 arise by imperfect conversion of underlying measurements x_1 and x_2. Suppose x_1 = μ + e_R1 and x_2 = μ + e_R2, where e_R1 is zero-mean random error in x_1 and similarly for e_R2. Burr et al. showed that if y_1 = x_1 + e_S and y_2 = x_2 + αe_S, where e_S is a zero-mean systematic error shared by the two experiments and α is a known conversion factor, then the observed pair (y_1, y_2) can lie in the PPP region, in which the GLS estimate of μ lies outside the range of y_1 and y_2.

Let Σ be the 2-by-2 symmetric covariance matrix for y_1 and y_2 with diagonal entries σ_1^2 and σ_2^2 and off-diagonal entry σ_12, which denote the variance of y_1, the variance of y_2, and the covariance of y_1 and y_2, respectively. For the case considered here and in Zhao and Perry [ ], the numerical values (used in the Appendix code) are σ_1^2 = 0.1134, σ_2^2 = 0.0504, and σ_12 = 0.06.

If the three errors e_R1, e_R2, and e_S are normally distributed, then the joint pdf of (y_1, y_2) is also normal. If the three errors have non-normal distributions, then the joint pdf of (y_1, y_2) will not be normal and might be difficult to calculate analytically. Therefore, to handle such cases, Sections 4 and 5 develop an estimator based on ABC. This ABC-based estimation can be applied with the error model just described, in which the two experiments produce y_1, y_2 values with covariance matrix Σ.
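
Under the error model of Section 2 (y_1 = μ + e_S + e_R1, y_2 = μ + αe_S + e_R2), the entries of Σ decompose as Var(y_1) = Var(e_R1) + Var(e_S), Var(y_2) = Var(e_R2) + α^2 Var(e_S), and σ_12 = α Var(e_S). A quick R check of this decomposition, using the numerical values from the Appendix code (α = 0.7 there):

```r
# Error model: y1 = mu + eS + eR1, y2 = mu + alpha*eS + eR2, so
# Var(y1) = rvar1 + svar, Var(y2) = rvar2 + alpha^2*svar, Cov(y1,y2) = alpha*svar.
nvar1 = 0.1134; nvar2 = 0.0504; ncov12 = 0.06  # target Sigma entries (Appendix values)
alpha = 0.7                                    # conversion factor for experiment two
svar  = ncov12 / alpha                         # variance of shared systematic error eS
rvar1 = nvar1 - svar                           # variance of random error eR1
rvar2 = nvar2 - alpha^2 * svar                 # variance of random error eR2
c(rvar1 + svar, rvar2 + alpha^2 * svar, alpha * svar)  # recovers (0.1134, 0.0504, 0.06)
```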

It is well known that GLS applied to y_1 and y_2 results in the best linear unbiased estimate (BLUE) of μ when Σ = Cov(y_1, y_2) is known. The GLS estimate is μ̂ = (1^T Σ^{-1} 1)^{-1} 1^T Σ^{-1} y, and the variance σ^2 of the GLS estimate is (1^T Σ^{-1} 1)^{-1}, where 1 = (1, 1)^T and y = (y_1, y_2)^T. Using y_1 = 1.5 and y_2 = 1.0 gives μ̂ = w_1 y_1 + w_2 y_2 = 0.89, with weights w_1 = −0.22 and w_2 = 1.22. Notice that w_1 + w_2 = 1 (so that μ̂ is unbiased), that w_1 < 0, and that μ̂ = 0.89 lies outside the range of the data.
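
The GLS computation can be verified numerically in R; the Σ entries below are the values used in the Appendix code:

```r
# GLS/BLUE for a common mean: weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1),
# estimate muhat = w' y.
Sigma = matrix(c(0.1134, 0.06, 0.06, 0.0504), nrow = 2)  # Sigma from the Appendix code
y = c(1.5, 1.0)                                          # observed data
Sinv = solve(Sigma)
w = rowSums(Sinv) / sum(Sinv)  # GLS weights; they sum to 1
muhat = sum(w * y)             # GLS estimate of mu
round(w, 2)      # (-0.22, 1.22): one weight is negative
round(muhat, 2)  # 0.89, outside the range [1.0, 1.5] of the data
```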

GLS estimation is guaranteed to produce the BLUE even if the underlying data are not normal. However, if the data are not normal, then the minimum variance unbiased estimator is not necessarily linear in the data. Also, unbiased estimation is not necessarily superior to biased estimation [ ].

The main alternative to GLS estimation in this context is estimation that uses the pdf. When viewed as a function of the unknown model parameter (μ in our case), the pdf is referred to as the likelihood. The maximum likelihood (ML) estimate therefore depends on the pdf for the errors. For example, if the normal distributions are replaced with lognormal distributions, the ML estimate will change. Because the ML approach makes strong use of the assumed error distributions, the ML estimate is sensitive to the assumed error distribution. The ML method has desirable properties, including asymptotically minimum variance as the sample size increases. However, in our example, the sample size is tiny (two), so asymptotic results for ML estimates are not relevant. It is still possible that an ML estimator will be better for non-normal data than GLS. In this paper, “better” is defined in terms of the mean squared error (MSE) of the estimator, which is well known to satisfy MSE = variance + bias^2. In some cases, biased estimators have lower MSE than unbiased estimators because the bias introduced is more than offset by a reduction in variance [ ].
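
The MSE trade-off can be made concrete with a small analytic example (all numbers hypothetical, chosen only for illustration): for the shrunk sample mean a·x̄ of n draws from N(μ, σ^2), MSE(a) = a^2 σ^2 / n + (a − 1)^2 μ^2.

```r
# MSE = variance + bias^2: a biased (shrunk) mean can beat the unbiased mean.
mu = 0.5; sigma2 = 1; n = 2  # hypothetical values for illustration
mse = function(a) a^2 * sigma2 / n + (a - 1)^2 * mu^2  # MSE of the estimator a * xbar
mse(1)    # unbiased sample mean: MSE = 0.5
mse(0.5)  # shrinking toward zero: MSE = 0.1875, smaller despite nonzero bias
```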

In nearly all real situations, Σ is not known exactly, but must be estimated. Suppose the estimate Σ̂ is computed from n auxiliary (y_1, y_2) data pairs that are separate from the observed y_1 = 1.5 and y_2 = 1.0 values. We will consider two examples.

Example 3.1

Assume that the (y_1, y_2) pairs are bivariate normal; the results below are approximately true for many other bivariate distributions [ ]. Recall that if Σ is known exactly, then the RMSE in the GLS estimate μ̂ is (1^T Σ^{-1} 1)^{-1/2}.

A third estimation option suggested in [ ] applies when y_1 and y_2 are thought to be independent, but the estimated covariance σ̂_12 is nonzero because of estimation error. In practice, neither the variances nor the covariance of y_1 and y_2 are known exactly. Therefore, there is merit in Schmelling's [ ] suggestion.
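
A minimal simulation sketch of this comparison, assuming bivariate normal data with the Σ used earlier; the auxiliary sample size n, the replication count, and the three estimators compared (GLS with known Σ, GLS with estimated Σ̂, unweighted mean) are illustrative choices and may not match the option numbering in the text exactly:

```r
# Sketch: RMSE of GLS with known Sigma, GLS with Sigma estimated from n auxiliary
# pairs, and the unweighted mean (y1 + y2)/2, under bivariate normal data.
library(MASS)  # mvrnorm for bivariate normal simulation
set.seed(1)
Sigma = matrix(c(0.1134, 0.06, 0.06, 0.0504), 2, 2)
mu = 1
gls = function(S, y) { Si = solve(S); sum(Si %*% y) / sum(Si) }
n = 10  # auxiliary sample size used to estimate Sigma (illustrative)
err = replicate(2000, {
  y = mvrnorm(1, c(mu, mu), Sigma)          # the observed pair
  Shat = cov(mvrnorm(n, c(mu, mu), Sigma))  # covariance estimated from auxiliary pairs
  c(gls(Sigma, y),   # GLS with known Sigma
    gls(Shat, y),    # GLS with estimated Sigma
    mean(y)) - mu    # unweighted mean
})
rmse = sqrt(rowMeans(err^2))
rmse  # known-Sigma GLS is best; estimation error in Sigma-hat can be costly
```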

Example 3.2

Still assuming a normal likelihood but changing Σ so that y_1 and y_2 are very strongly correlated (cor(y_1, y_2) = 0.997), options 1, 2, and 3 have RMSE of 0.025, 0.24, and 0.17, respectively, for estimating μ. In this case the strong correlation carries a great deal of information about y_1 and y_2, so even with large estimation error in Σ̂, the GLS method has by far the lowest RMSE for all values of n considered.

Bayesian analysis requires a prior probability distribution for the unknown parameter μ.

ABC has arisen relatively recently in applications for which the likelihood is difficult to compute but data are easy to simulate. If simulating data is easy but the joint pdf of y_1, y_2 could be difficult to derive analytically, then ABC is attractive. Predating ABC is a non-Bayesian (“frequentist”) method referred to as “inference for implicit statistical models” [ ].

In the common version of ABC, because data can be simulated from the pdf, candidate parameter values are accepted or rejected according to whether summary statistics computed from the simulated data are sufficiently close to the same summary statistics computed from the real data.

As a simple example to motivate our modified form of ABC, suppose x_1, x_2, …, x_n are independent observations from a normal distribution with unknown mean μ and known standard deviation σ, and the goal is to estimate μ from the x_i.

Our modified ABC that relies on density estimation has the following steps:

Determine a lower (L) and upper (U) bound for μ and a grid of candidate values μ_1, μ_2, …, μ_n1 between L and U, for example by using the range of the observed data and the known σ.

For each candidate μ_i, choose n_2 and simulate n_2 observations from N(μ_i, σ), then use the n_2 observations to estimate the pdf using density estimation [ ].

The likelihood of each observed x_j is approximated by evaluating the estimated pdf for candidate μ_i at x_j; n_2 can be increased until estimates of the likelihood stabilize.

The overall likelihood is then the product (because the x_j are independent) of the individual likelihoods of the x_j.

Use MCMC to estimate the posterior probability for μ.
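
The steps above can be condensed into a short R sketch (a compact version of the fuller Appendix code; the grid limits, n_1 = 10, and n_2 = 1000 are illustrative):

```r
# Density-estimation likelihood: for each candidate mu on a grid, simulate n2
# draws, estimate the pdf, and evaluate it at the observed data.
set.seed(2)
sigma = 0.1
xobs = rnorm(10, mean = 1, sd = sigma)  # n1 = 10 observed values (true mu = 1)
mu.grid = seq(0.7, 1.3, length = 61)    # grid of candidate mu values
loglik = sapply(mu.grid, function(m) {
  sim = rnorm(1000, mean = m, sd = sigma)    # n2 = 1000 simulated observations
  d = density(sim)                           # kernel density estimate of the pdf
  sum(log(approx(d$x, d$y, xout = xobs)$y))  # log likelihood of the observed data
})
mu.grid[which.max(loglik)]  # peaks near the true value mu = 1
```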

For the toy example, with n_1 = 10 and n_2 = 1000, the estimated log likelihood peaks very near the true value of μ.

To check our implementation, we specified a prior standard deviation of σ_prior = 10, which is large relative to σ, so the posterior distribution has mean essentially equal to the ML estimate. Computing the likelihood analytically from the n_1 = 10 observations, and applying density estimation to estimate the likelihood using n_2 = 1000 observations, resulted in no statistical difference (based on a paired test) between the corresponding posterior means.

To our knowledge, this simple use of density estimation to substitute for having an analytical form for the likelihood has not been studied in the ABC literature. However, provided the data can be simulated sufficiently fast to acquire many observations (n_2 = 1000 was sufficiently large in this toy problem), and provided the dimension of the data is not too large, density estimation is feasible and effective. In the PPP case, we have only two variables, y_1 and y_2, so certainly 2-dimensional density estimation is feasible. We note here that summary statistics such as moments from the real and simulated data in the MCMC accept/reject steps in typical ABC implementations are fast to compute, but convergence criteria and measures of adequacy are still being developed [ ].

This section again uses the slightly modified form of ABC that relies on density estimation rather than using summary statistics computed from the real and simulated data, but MCMC is applied in the context of PPP.

As mentioned in Section 2, suppose the two experiments produce x_1 = μ + e_R1 and x_2 = μ + e_R2, where e_R1 is zero-mean random error in x_1 and similarly for e_R2. Then if y_1 = x_1 + e_S and y_2 = x_2 + αe_S, where e_S is a zero-mean systematic error shared by the two experiments, the pair (y_1, y_2) can lie in the PPP region [ ].

To implement our version of ABC, many (y_1, y_2) realizations are simulated from each of many candidate values of μ. Using the observed (y_1, y_2) and its known covariance Σ, we can again set up good lower and upper bounds for the grid of candidate values of μ.

A key advantage of ABC is that any probability distribution can be easily accommodated for any of e_R1, e_R2, and e_S.
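
For instance, product-normal errors (used in the Appendix code) are simulated with one line per error term; the sketch below also checks that the implied covariance of (y_1, y_2) matches Σ, since only the error variances enter:

```r
# Simulate the Section 2 error model with product-normal errors; a unit product
# normal Z1*Z2 has mean 0 and variance 1, so scaling by sqrt(var) sets the variance.
set.seed(3)
n = 2e5; mu = 1; alpha = 0.7
svar = 0.06 / alpha; rvar1 = 0.1134 - svar; rvar2 = 0.0504 - alpha^2 * svar
eS = sqrt(svar) * rnorm(n) * rnorm(n)                     # shared systematic error
y1 = mu + eS + sqrt(rvar1) * rnorm(n) * rnorm(n)          # experiment one
y2 = mu + alpha * eS + sqrt(rvar2) * rnorm(n) * rnorm(n)  # experiment two
c(var(y1), var(y2), cov(y1, y2))  # close to (0.1134, 0.0504, 0.0600)
```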

Example 5.1.

Assume e_R1, e_R2, and e_S are all normally distributed, so that the joint distribution of y_1, y_2 is bivariate normal. Then, as it must, our version of ABC recovers the GLS estimate μ̂ = 0.89.

Example 5.2.

Assume e_R1 and e_R2 each have a centered and scaled lognormal distribution and e_S is also non-normal, so that the joint distribution of y_1, y_2 is not bivariate normal. We compared the log likelihood based on the correct joint distribution of y_1, y_2 with the log likelihood obtained by wrongly assuming a bivariate normal distribution for y_1, y_2. These log likelihoods are meaningfully different, and one might expect the corresponding estimators to behave quite differently. For the single realization y_1, y_2 = 1.08, 1.24, the GLS estimate assuming a bivariate normal distribution for y_1, y_2 is 1.27. The ABC estimate assuming the correct bivariate distribution for y_1, y_2 is 1.03.

To investigate whether using the correct likelihood leads to smaller RMSE, we performed 100 simulations for various values of μ. In each simulation, one (y_1, y_2) pair was generated to represent the observed data pair, and then 10,000 observations of (y_1, y_2) pairs were generated to use for density estimation, so that the likelihood of each candidate value of μ could be evaluated.

Recall that the GLS estimate lies outside the range of (y_1, y_2) when PPP occurs, and that some researchers are bothered by this. For the non-normal data just described, even though PPP occurs, we find numerically that the ABC estimate sometimes lies outside the range of (y_1, y_2) and sometimes lies within the range of (y_1, y_2).

Using a model of the two experiments that produce y_1, y_2, such as that by [ ], simulate many y_1, y_2 pairs but regard them as being μ_1, μ_2 pairs. See below for “justification” in switching from y_1, y_2 to μ_1, μ_2. Because of the strong prior belief that μ_1 ≈ μ_2, restrict attention to bivariate points (μ_1, μ_2) lying very near the diagonal, defined as points with μ_1 and μ_2 satisfying |μ_1 − μ_2| < ε, and use the accepted μ_1 or μ_2 values to estimate μ.

In one dimension, the analogous ad hoc procedure for a sample of size n from a normal distribution treats the sample mean as the estimate of μ, with variance σ^2/n. With a normal likelihood N(μ, σ^2) and a normal prior N(μ_p, τ^2), as the prior variance τ^2 → ∞ the prior becomes “flat,” and the ad hoc procedure is well known to be correct in an asymptotic sense.
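
The one-dimensional normal-normal case can be checked directly: the posterior mean is a precision-weighted average of the sample mean and the prior mean, and it tends to the sample mean as τ^2 → ∞ (the numbers below are illustrative):

```r
# Normal likelihood, normal prior: posterior mean is a precision-weighted average
# of xbar (precision n/sigma2) and the prior mean mup (precision 1/tau2).
post.mean = function(xbar, n, sigma2, mup, tau2)
  (n / sigma2 * xbar + mup / tau2) / (n / sigma2 + 1 / tau2)
xbar = 1.2; n = 10; sigma2 = 1; mup = 0  # illustrative values
post.mean(xbar, n, sigma2, mup, tau2 = 0.1)  # 0.6: pulled toward the prior mean 0
post.mean(xbar, n, sigma2, mup, tau2 = 1e6)  # essentially xbar = 1.2: the "flat" prior limit
```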

Although this ad hoc approach in two dimensions is a recipe that can be followed to produce an estimate of μ, two issues arise. First, regarding simulated y_1, y_2 pairs as being μ_1, μ_2 pairs requires that the y_1, y_2 pairs be interpreted as being observations from the posterior distribution for μ_1, μ_2 in a Bayesian sense. Second, to formalize the notion of restricting attention to bivariate points (μ_1, μ_2) lying very near the diagonal, we must define a prior probability distribution. Specifically, assume the following.

Let the prior for

The recipe to restrict attention to simulated values satisfying μ_1 ≈ μ_2 is ad hoc, but for a given choice of prior, one can define μ_1 ≈ μ_2 as satisfying |μ_1 − μ_2| < ε_prior. And, given ε_prior, the recipe has a Bayesian interpretation.

For example, different choices of the prior lead to (μ_1, μ_2) estimates of (1.49, 1.00), (1.08, 0.92), and (0.89, 0.89) for values of ε_prior in a range such as 0.1 to 10. The prior can be chosen such that the resulting estimate of μ_1 is within a small tolerance of the estimate of μ_2. Notice that the (0.89, 0.89) values for the (μ_1, μ_2) estimate agree with the GLS method when restricting to (μ_1, μ_2) pairs along the diagonal [ ].

This Bayesian approach revises the ad hoc recipe and is easily implemented using MCMC for any likelihood. It is sometimes difficult to calculate the likelihood for complicated combinations of distributions involving, for example, sums and products of random variables (which can arise in modifications of the prescription to generate y_1, y_2 pairs). An appeal of the modified ad hoc approach is that the likelihood does not need to be computed. Instead, one need only be able to generate many y_1, y_2 pairs. This is the same appeal of the ABC approach in our context.

To summarize this section, the ad hoc approach in [ ] regards simulated (y_1, y_2) pairs from any distribution as providing an estimate of μ by accepting only μ_1 and μ_2 satisfying |μ_1 − μ_2| < ε (that is, μ_1 and μ_2 values that are similar). This approach has a corresponding formally correct Bayesian interpretation, but is easier to implement than the corresponding correct Bayesian implementation with a particular choice of the prior pdf.
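
A sketch of the ad hoc recipe, assuming (for illustration) bivariate normal (y_1, y_2) pairs centered at the observed (1.5, 1.0) with the Σ used earlier; ε = 0.01 is an arbitrary tolerance:

```r
# Ad hoc "go along the diagonal" recipe: simulate many pairs, keep those with
# |y1 - y2| < eps, and average the kept values to estimate mu.
library(MASS)  # mvrnorm
set.seed(4)
Sigma = matrix(c(0.1134, 0.06, 0.06, 0.0504), 2, 2)
yy = mvrnorm(1e5, c(1.5, 1.0), Sigma)  # simulated pairs centered at the observed data
keep = abs(yy[, 1] - yy[, 2]) < 0.01   # keep pairs "near the diagonal" (eps = 0.01)
mean(yy[keep, ])                       # close to the GLS value 0.89
```

With this Σ, the accepted values concentrate near the GLS value, consistent with the (0.89, 0.89) diagonal estimate noted above.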

For Example 5.1 in Section 5, for which the distributions for e_R1, e_R2, and e_S are all normal, we now apply the Bayesian completion of the ad hoc approach.

Example 6.1.

If the distributions for e_R1, e_R2, and for e_S are all normal, the Bayesian approach recovers the GLS estimate.

If the distributions for e_R1, e_R2 are centered and scaled product normal and the distribution for e_S is also product normal, the resulting estimate is approximately 0.89 (see the Appendix code).

The fact that the estimates can lie between y_1 and y_2 has some appeal.

Example 6.2

The ad hoc method can be formally defended by the Bayesian framework just described. Therefore, the ad hoc approach is included among the candidates whose RMSEs were given in Example 5.2 from Section 5. In the second part of Example 5.2, there are 100 simulations of 10,000 observations of (y_1, y_2) pairs (for which the RMSE of GLS and ABC are 0.22 and 0.15, respectively). The RMSE of the ad hoc approach is 0.24 (repeatable to within ±0.01).

For Example 6.2, although the RMSEs ranked from low to high (low is good) are ABC, GLS, and alternate Bayes, we do not make any general performance claims ranking the GLS, ABC, and modified ad hoc (“alternate Bayes”) approaches. Instead, the goal is to present statistically defensible approaches and demonstrate that they do perform differently.

The top plot shows the accepted (μ_1, μ_2) pairs in the metrop MCMC (in red), which tightly follow the diagonal line. The bottom plot shows the estimated density of the accepted pairs using the Bayesian strategy and using the ad hoc strategy. Notice that the two densities are not the same. However, in this case, both methods arrive at essentially the same estimate of μ.

There will almost always be estimation error in Σ̂, and often the measurement errors are non-normal. Therefore, we considered the following three topics: (1) alternatives to GLS when there is estimation error in Σ̂; (2) approximate Bayesian computation [ ] using density estimation; and (3) completion of the ad hoc approach by specifying a suitable prior pdf.

Regarding (1), it was illustrated in the PPP case that weighted estimates do not always outperform equally-weighted estimates when there is estimation error in the weights. We have already noted that estimation error in the assumed Σ cannot always be ignored.

Regarding (2), ABC based on density estimation to estimate the required likelihood was introduced. We showed that for some likelihoods, Bayesian estimation can outperform GLS, and that for some data realizations, the ABC estimator fell within the range of the simulated y_1, y_2 data pairs. Of course, GLS provides a good estimate when the data are approximately normal and Σ is well estimated.

Regarding (3), ad hoc estimators based on incomplete statistical reasoning did produce values that lie within the data range [ ].

Example ABC. The solid black curve is for

Example ABC. The solid black curve is for a bivariate normal. The dotted red curve is for a bivariate non-normal (combination of Gamma and lognormal).

Example ABC. The ad hoc “go along the diagonal” approach in [ ].

Example ABC. The

We acknowledge the U.S. NNSA, and the Next Generation Safeguards Initiative (NGSI) within the U.S. Department of Energy.

This appendix lists the R code used for the examples.

mutrue = 1; sigma = .1

mu.grid = seq(mutrue-3*sigma,mutrue+3*sigma,length=100)

llik1 = numeric(length(mu.grid))

for(i in 1:length(mu.grid)) {

xtemp = rnorm(n=1e5,mean=mu.grid[i],sd=sigma)

temp.density = density(xtemp,n=1000)

llik1[i] = sum(log10(approx(temp.density$x,temp.density$y,xout=mu.grid)$y))

}

plot(mu.grid[5:95],llik1[5:95],xlab=expression(mu),ylab="log(likelihood)",type="l")

abline(v=1)

llik1.fun = function(mu,llik=llik1,mu.grid.use=mu.grid,prior.mean=0,prior.sd=10) {

llik.temp = approx(x=mu.grid.use,y=llik,xout=mu)$y

prior = log(dnorm(mu,mean=prior.mean,sd=prior.sd))

llik.temp = llik.temp + prior

if(is.na(llik.temp)) llik.temp = -1e10

if(llik.temp == -Inf) llik.temp = -1e10

llik.temp

}

library(mcmc)

out1 = metrop(obj=llik1.fun,llik=llik1,initial=1,blen=10,scale=0.1,nbatch=1000)

plot(out1$batch) # basic diagnostic plot to check mcmc convergence

mean(out1$batch) # Result: approximately 1.0, the correct answer.

#product normal case

llik2.fun = function(mu,llik=llik2,mu.grid.use=mu.grid,prior.mean=0,prior.sd=10) {

llik.temp = approx(x=mu.grid.use,y=llik,xout=mu)$y

prior = log(dgamma(mu,shape=prior.mean/prior.sd,rate=1/prior.sd)) # true mean is nonzero

llik.temp = llik.temp + prior

if(is.na(llik.temp)) llik.temp = -1e10

if(llik.temp == -Inf) llik.temp = -1e10

llik.temp

}

x = c(1.5,1)

library(MASS) # for kde2d (2-D kernel density estimation)

mu.grid = seq(0.3,1.7,length=100)

n = 1e5 # observations from simulation for each value of mu in grid of mu values

llik2 = numeric(length(mu.grid))

nvar1 = 0.1134; nvar2 = 0.0504; ncov12 = 0.06

alpha = .7

svar = ncov12/alpha

rvar1 = nvar1 - svar

rvar2 = nvar2 - alpha^2 * svar

for(i in 1:length(mu.grid)) {

mu = mu.grid[i]

tempS = svar^.5 * rnorm(n) * rnorm(n) # shared product-normal systematic error

x1temp = mu + tempS + rvar1^.5 * rnorm(n) * rnorm(n)

x2temp = mu + alpha * tempS + rvar2^.5 * rnorm(n) * rnorm(n)

temp.density = kde2d(x=x1temp,y=x2temp,lims=c(x[1],x[1],x[2],x[2]),n=1) # density at observed x

llik2[i] = log(temp.density$z)

}

llik2[llik2==-Inf] = min(llik2[llik2 != -Inf])

library(lokern) # lokerns: kernel regression smoother for the noisy log likelihood

tempy = lokerns(x=mu.grid,y=llik2,x.out=mu.grid)

llik2 = tempy$est

out2 = metrop(obj=llik2.fun,initial=1,llik=llik2,blen=10,scale=.3,nbatch=1000,prior.mean=1,prior.sd=1)

mean(out2$batch) # Result: approximately 0.89.

^{239}Pu Fission Cross-Section Uncertainties Using a Monte Carlo Technique