This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Generalized least squares (GLS) for model parameter estimation has a long and successful history dating to its development by Gauss in 1795. Alternatives can outperform GLS in some settings and are sometimes sought when GLS exhibits curious behavior, such as in Peelle's Pertinent Puzzle (PPP). PPP was described in 1987 in the context of estimating fundamental parameters that arise in nuclear interaction experiments. In PPP, GLS estimates fell outside the range of the data, eliciting concerns that GLS was somehow flawed. These concerns have led to suggested alternatives to GLS estimators. This paper defends GLS in the PPP context, investigates when PPP can occur, illustrates when PPP can be beneficial for parameter estimation, reviews optimality properties of GLS estimators, and gives an example in which PPP does occur.

Generalized least squares (GLS) for parameter estimation has a long and successful history dating to its development by Gauss in 1795. In some settings, alternatives to GLS can be effective, and are sometimes sought when GLS exhibits curious behavior, such as in Peelle's Pertinent Puzzle (PPP).

PPP was introduced in 1987 in the context of estimating fundamental parameters that arise in nuclear interaction experiments [

A GLS estimate lying outside the range of the data causes heartache among nuclear scientists. Therefore, PPP continues to be of theoretical and practical interest. For example, a summary report of International Evaluation of Neutron Cross Section Standards [

We quote the original PPP problem proposed by [

Although this PPP statement is vague, by converting it to something more interpretable, GLS can be applied and the resulting estimate is 0.88 (with an associated standard deviation of 0.22), which is outside the range of the measurements. Zhao and Perey [ ] interpreted the statement as follows. One experimental result is A_1 = 1.5 ± 10%. Another is A_2 = 1.0 ± 10%. To convert this quantity into another physical quantity, we need a conversion factor C = 1.0 ± 20%, giving y_1 = C A_1 = 1.5 and y_2 = C A_2 = 1.0. We are required to obtain the weighted average of those experimental data.

In this interpretation, the common error (the “fully correlated” component) is understood to be the error in the conversion factor C. Also, y_1 = 1.5 ± 10% is assumed to mean that the random-error standard deviation is 0.15 for y_1 (and 0.10 for y_2). Even after these interpretations, some vagueness remains. There is no convention regarding what confidence is associated with ±10%. Nor is there a convention for whether the standard deviation includes all error sources, or only includes random error effects, ignoring accuracy. In addition, we show below that it can matter whether the standard deviation is expressed as a fraction of the true quantity or of the measured quantity.

One of our contributions is to make explicit assumptions and examine their implications in order to convert vague statements to statements about which it is possible to find agreement among physical scientists and statisticians regarding suitable approaches. We also defend GLS in the PPP context, illustrate when PPP can be beneficial, briefly describe properties of GLS estimators, show that PPP cannot occur for certain measurement error models, and calculate a covariance matrix Σ for y_1 and y_2 for which PPP occurs, derived from a physical description of a realistic measurement scenario.

The vagueness of the original PPP statement is one reason there are so many interpretations of PPP [ ]. Broadly, PPP behavior arises when the correlation between y_1 and y_2 is strong and the variances of y_1 and y_2 are very different.

Let Σ be the 2-by-2 symmetric covariance matrix for y_1 and y_2 with diagonal entries σ_1^2 and σ_2^2 and off-diagonal entry σ_{12}, which denote the variance of y_1, the variance of y_2, and the covariance of y_1 and y_2, respectively. Zhao and Perey [ ] interpreted Peelle's statement (with y_1 = C A_1 and y_2 = C A_2 as described in the Introduction) as

Σ = [ σ_{m1}^2 + σ_c^2 y_1^2      σ_c^2 y_1 y_2
      σ_c^2 y_1 y_2               σ_{m2}^2 + σ_c^2 y_2^2 ],

where σ_{m1} and σ_{m2} are the random-error standard deviations of y_1 and y_2, and σ_c is the relative standard deviation of the common (fully correlated) error shared by y_1 and y_2. In Peelle's example, σ_{m1} = 0.15, σ_{m2} = 0.10, and σ_c = 0.20.
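Under this interpretation, Σ can be assembled directly from the stated uncertainties; a minimal numeric sketch using the example's values:

```python
import numpy as np

# Peelle's example: y1 = 1.5 and y2 = 1.0, each with 10% random error
# (absolute sd 0.15 and 0.10) plus a 20% fully correlated common error.
y = np.array([1.5, 1.0])
sigma_m = np.array([0.15, 0.10])  # random-error standard deviations
sigma_c = 0.20                    # relative sd of the common error

# Diagonal random part plus rank-one fully correlated part.
Sigma = np.diag(sigma_m**2) + sigma_c**2 * np.outer(y, y)
# Sigma = [[0.1125, 0.06], [0.06, 0.05]]
print(Sigma)
```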

Readers might find it informative that those with traditional statistical education among the authors were the most willing to accept GLS estimates despite the apparent flaw of lying outside the range of the data. Statisticians will often consider alternatives to GLS, but recognize that GLS estimation is difficult to beat, at least in terms of typical performance measures such as being close on average to the true parameter value over hypothetical repeats of the pair of experiments [

Although Peelle [ ] expressed the uncertainties as percentages of the measured values, the fully correlated component properly scales with the true value μ, so the correlated-error standard deviation is σ_c μ. Because μ is unknown, it is approximated by the measured values, replacing σ_c μ with σ_c y_1 or σ_c y_2.

As shown below, prior to substituting the approximation for μ, Σ has σ_c^2 μ^2 in the off-diagonal and σ_{mi}^2 + σ_c^2 μ^2 in the diagonal entries.

The fact that the common-error standard deviation is expressed as a fraction of the measured values rather than of the true value μ is what allows PPP to occur here: with the same μ in the diagonal and off-diagonal entries, the covariance σ_c^2 μ^2 could never exceed the smaller diagonal entry σ_{mi}^2 + σ_c^2 μ^2.

To focus on GLS behavior when PPP occurs, we take Σ as specified above to be known.

It is well known that the GLS method can be applied to y_1 and y_2 to obtain the best linear unbiased estimate (BLUE) μ̂ of μ, with Σ = Cov(y_1, y_2) given above. The estimate is

μ̂ = (1^t Σ^{-1} 1)^{-1} 1^t Σ^{-1} y,

and the variance σ^2 of the GLS estimate is σ^2 = (1^t Σ^{-1} 1)^{-1}, where 1 denotes the vector of ones and y = (y_1, y_2)^t. Writing μ̂ = w_1 y_1 + w_2 y_2, Peelle's example gives μ̂ = 0.88 with w_1 = −0.24 and w_2 = 1.24. Notice that w_1 + w_2 = 1 (so that μ̂ is unbiased), that w_1 < 0, and that μ̂ = 0.88 lies below both y_1 = 1.5 and y_2 = 1.0.
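These quantities are easy to check numerically; a minimal sketch using the covariance entries 0.1125, 0.05, and 0.06 implied by Peelle's example:

```python
import numpy as np

# BLUE of a common mean mu from y = (y1, y2):
#   mu_hat = (1' Sigma^{-1} 1)^{-1} 1' Sigma^{-1} y
#   var(mu_hat) = (1' Sigma^{-1} 1)^{-1}
y = np.array([1.5, 1.0])
Sigma = np.array([[0.1125, 0.06],
                  [0.06,   0.05]])
one = np.ones(2)

Si = np.linalg.inv(Sigma)
var_hat = 1.0 / (one @ Si @ one)
mu_hat = var_hat * (one @ Si @ y)
weights = var_hat * (Si @ one)  # mu_hat = weights @ y

print(round(mu_hat, 2), round(var_hat**0.5, 2))  # 0.88 0.22
print(weights)  # approximately -0.24 and 1.24, summing to 1
```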

GLS is usually introduced in the context of estimating parameters in a linear model, but for combining two measurements of a common mean it reduces to a weighted average. Writing μ̂ = w_1 y_1 + (1 − w_1) y_2, note that

var(μ̂) = w_1^2 σ_1^2 + (1 − w_1)^2 σ_2^2 + 2 w_1 (1 − w_1) σ_{12},

and choose w_1 to minimize var(μ̂) by setting the derivative with respect to w_1 to 0.

The GLS solution of this minimization is w_1 = (σ_2^2 − σ_{12}) / (σ_1^2 + σ_2^2 − 2σ_{12}), which for Peelle's example gives w_1 = −0.24.
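The variance-minimization route gives the same weight; a small sketch with the covariance entries from Peelle's example:

```python
# Writing mu_hat = w1*y1 + (1 - w1)*y2, minimize
#   var(mu_hat) = w1^2*S11 + (1 - w1)^2*S22 + 2*w1*(1 - w1)*S12
# by setting d var / d w1 = 0, which yields
#   w1 = (S22 - S12) / (S11 + S22 - 2*S12).
S11, S22, S12 = 0.1125, 0.05, 0.06  # Peelle's example covariance entries

w1 = (S22 - S12) / (S11 + S22 - 2 * S12)
print(round(w1, 3))  # -0.235
```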

GLS estimation has a long and successful history, but it met with serious objection within the nuclear physics community in the context of combining estimates from multiple experiments, upon observing a tendency to produce estimates that are outside the range of the data. More specifically, to date the tendency has been to produce estimates that are less than the minimum data value, and such estimates have been criticized as being “too small” [

GLS estimation is guaranteed to produce the BLUE even if the underlying data are not Gaussian. However, if the data are not Gaussian, then the minimum variance unbiased estimator (MVUE) is not necessarily linear in the data. Also, though unbiased estimation might sound politically correct, it is not necessarily superior to biased estimation [

If the data have a Gaussian distribution, then it is well known that the GLS estimate is the same as the maximum likelihood (ML) estimate. This is because the log of the Gaussian likelihood involves a sum of squares, so choosing an estimate (the GLS estimate) that minimizes a sum of squares corresponds to choosing an estimate (the ML estimate) that maximizes the likelihood. Ordinary LS (OLS), weighted LS (WLS), and GLS are all essentially the same technique, but OLS is used if Σ is proportional to a unit-diagonal matrix, WLS is used if Σ is proportional to a diagonal matrix, and GLS is used if Σ is an arbitrary positive definite covariance matrix. The Gauss-Markov theorem [ ] guarantees that the GLS estimator has minimum variance among all linear unbiased estimators.
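As a sketch of the distinction, the following computes the OLS, WLS, and GLS combinations of Peelle's two measurements; the WLS variant simply ignores the off-diagonal term:

```python
import numpy as np

y = np.array([1.5, 1.0])
Sigma = np.array([[0.1125, 0.06],
                  [0.06,   0.05]])
one = np.ones(2)

def blue(S):
    """Minimum variance linear unbiased combination for covariance S."""
    Si = np.linalg.inv(S)
    return (one @ Si @ y) / (one @ Si @ one)

ols = blue(np.eye(2))                # unit-diagonal: unweighted mean
wls = blue(np.diag(np.diag(Sigma)))  # diagonal: inverse-variance weights
gls = blue(Sigma)                    # full positive definite Sigma

print(round(ols, 3), round(wls, 3), round(gls, 3))  # 1.25 1.154 0.882
```

Only the full-covariance version leaves the data range; the OLS and WLS combinations are convex averages and cannot exhibit PPP.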

The ML estimate depends on the assumed distributions for the errors. For example, if we replace the Gaussian (Normal) distributions with logNormal distributions, the ML estimate will change. In the cases considered here, ML gives the same estimate as GLS, because the data distribution is Gaussian. Because the ML approach makes strong use of the assumed error distributions, the ML estimate is sensitive to the assumed error distribution. The ML method has desirable properties, including asymptotically minimum variance as the sample size increases. However, in our example, the sample size is tiny (two), so asymptotic results for ML estimates are not relevant. It still is possible that an ML estimator will be better for nonGaussian data than GLS [ ]. A standard overall performance measure is the mean squared error, MSE = E(μ̂ − μ)^2, which equals the variance plus the squared bias. In some cases, biased estimators have lower MSE than unbiased estimators because the bias introduced is more than offset by a reduction in variance [
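The Gaussian ML/GLS equivalence can be checked numerically; a minimal sketch using a grid search over μ, with Peelle's covariance taken as known:

```python
import numpy as np

y = np.array([1.5, 1.0])
Sigma = np.array([[0.1125, 0.06],
                  [0.06,   0.05]])
Si = np.linalg.inv(Sigma)
one = np.ones(2)

def neg_loglik(mu):
    """Gaussian negative log-likelihood in mu, dropping constants."""
    r = y - mu * one
    return 0.5 * (r @ Si @ r)

grid = np.linspace(0.0, 2.0, 20001)  # step 1e-4
mu_ml = grid[np.argmin([neg_loglik(m) for m in grid])]
mu_gls = (one @ Si @ y) / (one @ Si @ one)

print(round(mu_ml, 3), round(mu_gls, 3))  # both round to 0.882
```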

If Σ is given by

The covariance matrix Σ can be expressed as Σ = D + σ_c^2 y y^t, where D = diag(σ_{m1}^2, σ_{m2}^2) contains the random-error variances and σ_c^2 y y^t is the rank-one fully correlated component.

Notice that

Several authors explored different values for the covariance matrix Σ to understand the relationship between the covariance matrix and the estimates. Some numerical examples are in [

Next rewrite

Suppose σ_{12} > min(σ_1^2, σ_2^2) and y_1 ≠ y_2. Then μ̂ < min(y_1, y_2) or μ̂ > max(y_1, y_2). That is, μ̂ lies outside the range of (y_1, y_2).

First assume σ_{12} > σ_2^2, so that w_1 < 0. If y_1 < y_2 then μ̂ > max(y_1, y_2); if y_1 > y_2 then μ̂ < min(y_1, y_2). The case σ_{12} > σ_1^2 is analogous, with the roles of y_1 and y_2 reversed.
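Both directions can be illustrated with the example covariance matrix; a minimal numeric sketch:

```python
import numpy as np

# Peelle's covariance: sigma12 = 0.06 exceeds the smaller variance 0.05,
# so one GLS weight is negative and the estimate leaves the data range.
Sigma = np.array([[0.1125, 0.06],
                  [0.06,   0.05]])
Si = np.linalg.inv(Sigma)
one = np.ones(2)

def mu_hat(y):
    return (one @ Si @ y) / (one @ Si @ one)

m_below = mu_hat(np.array([1.5, 1.0]))  # y1 > y2: estimate below the min
m_above = mu_hat(np.array([1.0, 1.5]))  # y1 < y2: estimate above the max
print(round(m_below, 2), round(m_above, 2))  # 0.88 1.62
```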

The fact that the GLS estimate is the BLUE (and also the MVUE if the data are Gaussian) and that it lies below the range of (y_1, y_2) suggests two features. First, the true μ must often lie outside the range of (y_1, y_2). Second, there must be better than random chance capability to guess on which side of the range of (y_1, y_2) the true μ lies.

In addition to GLS's BLUE property, we can add support for GLS by numerical example to illustrate features one and two, using the contours of an example bivariate normal density for (y_1, y_2) shown in the accompanying figure, with y_1 = 1.5 and y_2 = 1.0. Informally, we can integrate this density over regions 1 and 3 to see that there is a large probability that both y_1 and y_2 lie either above the mean or below the mean, so indeed μ often lies outside the range of (y_1, y_2). Integration of the bivariate normal for Σ given by Peelle's example shows that with probability approximately 0.40, μ lies below the range of (y_1, y_2), and with the same probability μ lies above the range of (y_1, y_2). This is a total of approximately 0.80 probability that μ lies outside the range of (y_1, y_2), which is an example of feature one. Having an estimate lie outside the range of the data is therefore defensible, provided (feature two) that one can guess with better than random chance performance whether μ lies below or above the range of (y_1, y_2). To see that one can beat random guessing performance, suppose y_1 > y_2, as in our case (y_1 = 1.5 and y_2 = 1.0). Then, because y_1 has the larger variance, μ is more likely to lie below y_2 than above y_1: if instead μ were above y_1, then the distance from μ to y_1 would have to be smaller than the distance from μ to y_2, which is the less likely ordering given the unequal variances. In our simulations, 57% of the (y_1, y_2) pairs having y_1 > y_2 did in fact also have μ < y_2. On the basis of 10,000 simulations, 57% is repeatable to within ±1% or less, so this is better than random chance (50%) guessing. This is not a formal proof but does suggest a direction to understand when a GLS estimate falling outside the range of the data is effective. Note that y_1 = 1.5 > y_2 = 1.0 in the PPP statement, and the GLS estimate μ̂ = 0.88 is below y_2.
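Both quoted probabilities can be reproduced by simulation; a sketch in which the common true mean is set to an arbitrary value (1.2, an assumption for illustration; neither probability depends on this choice):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.2                               # arbitrary common true mean
Sigma = np.array([[0.1125, 0.06],
                  [0.06,   0.05]])
n = 100_000
y = rng.multivariate_normal([mu, mu], Sigma, size=n)

# Feature one: P(both measurements on the same side of mu) ~ 0.80.
same_side = np.mean((y[:, 0] - mu) * (y[:, 1] - mu) > 0)

# Feature two: among pairs with y1 > y2, fraction with mu < y2 ~ 0.57.
sel = y[:, 0] > y[:, 1]
below_min = np.mean(y[sel, 1] > mu)

print(round(same_side, 2), round(below_min, 2))
```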

Suppose x_1 = μ + e_{R1} and x_2 = μ + e_{R2}, where e_{R1} is random error in x_1 with variance
σ_{R1}^2, and similarly for e_{R2}. Then if y_1 = x_1 + f_1 e_S and y_2 = x_2 + f_2 e_S, where e_S is a shared systematic error with variance σ_S^2 entering the two measurements through known scale factors f_1 and f_2, the resulting (σ_1^2, σ_2^2, σ_{12}) can lie in the PPP region.

The proof is by demonstration. Specify any values σ_1^2, σ_2^2, and σ_{12} that satisfy the PPP condition σ_{12} > min(σ_1^2, σ_2^2); it then suffices to exhibit random-error variances, a systematic-error variance, and scale factors that reproduce these values.

Note:

If

If y_1 ≠ y_2, then μ̂ lies outside the range of (y_1, y_2) in the case σ_{12} > min(σ_1^2, σ_2^2) > 0.

One example in which the assumptions of Theorem 3 hold involves subtracting a background measurement from a region of interest (ROI) measurement to get a net result. Because the background measurement often involves a different number of channels than the ROI measurement, a scale factor is applied to the background before subtraction.

Suppose each ROI and corresponding background are analyzed separately, and consider the first ROI in the accompanying figure. Write y_1 = g_1 − f_1 × b and y_2 = g_2 − f_2 × b, where g_1 and g_2 are the gross ROI counts, b is the common background count, f_1 is the scale factor for experiment 1, and f_2 is the scale factor for experiment 2. The scale factor f_1 for experiment 1 is the ratio of the number of ROI channels to the number of background channels, and similarly for the scale factor f_2 for experiment 2. The channel counts have variation from repeat to repeat, so the detected counts will vary around the true counts with some error. As an aside, often the channel counts have approximately a Poisson distribution, which for large count rates is well approximated by a Gaussian distribution. Regardless of which probability distribution best describes the channel counts, there are measurement errors in g_1, g_2, and b, and one can divide y_1 = g_1 − f_1 × b by f_1 to convert this pair of equations to those assumed in Theorem 3.
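A numeric sketch of this scenario; the count variances and scale factors below are hypothetical illustrations, chosen only to show that the implied covariance can satisfy the PPP condition:

```python
# Shared background b subtracted with different scale factors:
#   y1 = g1 - f1*b,  y2 = g2 - f2*b
# implies var(y_i) = var(g_i) + f_i^2 * var(b) and cov(y1, y2) = f1*f2*var(b).
# Hypothetical (Poisson-like) count variances, for illustration only:
var_g1, var_g2, var_b = 50.0, 50.0, 100.0
f1, f2 = 5.0, 1.0  # e.g., experiment 1's ROI is 5x its background window

var1 = var_g1 + f1**2 * var_b   # 2550.0
var2 = var_g2 + f2**2 * var_b   # 150.0
cov = f1 * f2 * var_b           # 500.0

# PPP condition: the covariance exceeds the smaller variance.
ppp = cov > min(var1, var2)
print(var1, var2, cov, ppp)  # 2550.0 150.0 500.0 True
```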

Because Peelle's original statement is vague, there have been several interpretations and solutions. In our experience, there is considerable variation among experimentalists in the expression of measurement uncertainty, and a wide range of analyses can result from vague uncertainty statements.

The three main contributions of this paper are: (1) illustrating examples in which PPP cannot occur (Theorem 1); (2) providing insight into when PPP is effective and appropriate (related to Theorem 2); and (3) deriving a realistic covariance matrix Σ for which PPP occurs according to physical descriptions of realistic measurement scenarios (Theorem 3). We also showed via numerical integration that an estimate lying outside the range of the data can be sensible. This is because the unequal variances of y_1 and y_2 provide information regarding whether μ lies above or below the range of y_1 and y_2.

Of course, GLS provides a good estimate only to the extent that the assumed covariance matrix Σ is adequate; in practice, Σ must itself be estimated.

There will almost always be estimation error in Σ̂, and often the measurement errors are nonGaussian. Therefore, we consider the following two topics in [

Finally, [

Contours of example bivariate normal density of y_1 and y_2 illustrating that μ often lies outside the range of y_1 and y_2 because of the large probability of (y_1, y_2) falling in region 1 or 3.

Example region of interest and corresponding background that can lead to the PPP condition.

We acknowledge the Next Generation Safeguards Initiative within the U.S. Department of Energy.