1. Introduction
In the archetypal measurement, the only thing known in advance about the true value θ is that it is suitable for study using the chosen technique. Consequently, the measured value x and the accompanying standard uncertainty u are obtained using only the data and the model of the measurement procedure. However, there are some situations where more is known about θ: in particular, there are situations where there is a nominal value for θ, which we denote by x₀. For example, in the measurement of a quantity θ that acts as an unwanted bias in an experimental procedure, the nominal value will, by design, be x₀ = 0. The existence of a nominal value is information that we would like to use to obtain an estimate of θ that is, in some sense, more accurate than the raw measured value x. Here, we show how the mean square error in the overall measurement procedure is reduced by moving the estimate partway towards x₀. The analysis describes a statistical concept called shrinkage estimation [1,2].
The nominal value x₀, or, equivalently, a previous unrelated estimate, is prior information about θ. Such information is to be employed as fully as possible without overstating its meaning or importance. The use of prior information is a strength of a Bayesian statistical analysis, but the associated need to accurately encode such information in a joint prior probability distribution for every relevant constant is a weakness. Such a distribution is a mental construct that involves the idea of subjective probability, not an objective frequency distribution that can be examined experimentally. The fact that individual subjective probability statements cannot be falsified, i.e., cannot hypothetically be shown to be incorrect, is enough to render them unacceptable to many scientists. Also, a strong argument has recently been given in the metrology literature that the foundation of Bayesian theory and practice is unsound [
3]. Thus, an ideal method of analysis might make use of prior information while retaining only the objective concept of frequency-based probability. Such a method is presented here, where the prior information is in a form that is acceptable to the classical, frequentist, statistician.
The principle of shrinkage estimation might be applied in an individual measurement, but it also might be used in the combination of data, as, for example, in the formation of a consensus value in a measurement comparison.
Section 2 describes the basic idea of shrinkage estimation with an individual measurement, and then
Section 3 uses this idea to propose a new analysis for a consensus value in a comparison.
Section 4 considers how the data from the comparison can be used to accept or reject the influence of the nominal value, and
Section 5 contains a discussion. Our notation is standard in statistical writing. A Greek letter, e.g., θ or λ, indicates an unknown and unknowable true value. A lower-case italic Latin letter, e.g., x or a, is used for a figure or observation that is known or will be determined, while an upper-case italic Latin letter, e.g., X, indicates a corresponding random variable, which can be understood to represent the procedure or process that generates the observation. In keeping with the classical, frequentist, paradigm of statistics, statements of probability are only made about the potential results of procedures, not about constants. Therefore, a probability distribution is not attributed to a measurand such as θ.
2. A Shrinkage Estimator
Imagine that you measure a quantity θ known to be small and that the figure you record is y. It is natural to take y as the measured value of θ, but the additional information that θ is small has not been used in making that assessment. If someone asked you “Is θ less than y or is it greater than y?”, a reasonable answer might be “If I had to choose one or the other, I would say it was less than y because I know that θ is small, not large.” Such an answer would be based on combining knowledge of the data with prior knowledge about θ. As is now illustrated, this idea of supplementing data with prior information can lead to an improved measurement process in some scenarios.
Suppose that the measurand θ has true value 1.5 and that the measurement procedure is unbiased with standard deviation u = 1. Imagine a long series of measurements that gives the set of figures {y₁, y₂, y₃, …}. The mean square error (MSE) of an estimator is the square of its bias plus its variance, so the estimates in the set {y₁, y₂, y₃, …} have an MSE of 1, while the estimates in the alternative set {0.8y₁, 0.8y₂, 0.8y₃, …} have an MSE of (0.2 × 1.5)² + 0.8² = 0.73. Thus, with regard to the MSE, the regular choice of 0.8y instead of y as the measured value would be superior! The presence of the squared bias associated with moving an otherwise-unbiased estimate towards the origin is, in this case, more than offset by the reduction in variance: the relationship of the difference θ = 1.5 to the standard error u = 1 is such that there is benefit. The method can be seen to involve a trade-off between bias and variance that, in this case, favours the presence of a small bias.
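The bias–variance arithmetic in this example is easy to check numerically. The sketch below compares the analytic MSE with a Monte Carlo estimate; the assumption of normally distributed errors is ours, made only for the simulation (the analytic result does not require it).

```python
import random

def mse_of_scaled_estimate(a, theta, u):
    """Analytic MSE of the estimate a*y: squared bias plus variance."""
    return ((a - 1.0) * theta) ** 2 + (a * u) ** 2

theta, u = 1.5, 1.0
mse_raw = mse_of_scaled_estimate(1.0, theta, u)      # raw estimate y: MSE = 1
mse_shrunk = mse_of_scaled_estimate(0.8, theta, u)   # estimate 0.8y: MSE = 0.73

# Monte Carlo check of the 0.8y case (normal errors assumed for illustration).
random.seed(1)
reps = 200_000
mc_mse = sum((0.8 * random.gauss(theta, u) - theta) ** 2
             for _ in range(reps)) / reps
```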
The bias and the variance in the more general estimate ay are (a − 1)θ and a²u², so the MSE, which is unknown and which we denote by M(a), is

M(a) = (a − 1)²θ² + a²u².   (1)

Figure 1 depicts the ratio M(a)/u² as a function of a for various values of the dimensionless unknown λ = θ/u. The ratio is smaller than 1 for any value of a in the interval from (λ² − 1)/(λ² + 1) to 1, but it is larger for any value of a outside this interval. If 0 ≤ a ≤ 1 and λ ≤ 1, then the ratio never exceeds 1. The ratio is minimized with respect to a when

a = λ² / (λ² + 1),   (2)

where its value is equal to a (marked). In our example with λ = 1.5, any value of a between 0.38 and 1 leads to an improvement, the improvement is greatest at a ≈ 0.69, and the MSE achieved at that point is approximately 0.69u², because the minimized ratio M(a)/u² is equal to a itself.
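Equations (1) and (2) and the claims about the improvement interval can be verified directly; a minimal sketch (the function names are ours):

```python
def mse_ratio(a, lam):
    """M(a)/u^2 = (a - 1)^2 * lam^2 + a^2, from Equation (1), with lam = theta/u."""
    return (a - 1.0) ** 2 * lam ** 2 + a ** 2

def a_opt(lam):
    """The minimizing shrinkage factor of Equation (2)."""
    return lam ** 2 / (lam ** 2 + 1.0)

lam = 1.5
lower = (lam ** 2 - 1.0) / (lam ** 2 + 1.0)  # lower end of the improvement interval
best = a_opt(lam)                            # about 0.69 for lam = 1.5
```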
Thus, there is an improvement in the MSE whenever a lies between (λ² − 1)/(λ² + 1) and 1, and the improvement can be substantial at small values of λ. However, the price to pay for this too-good-to-be-true behaviour is an increase in MSE when λ is larger. For example, if θ had been equal to 5 in our example, then the MSE with a = 0.8 would have been 1.64. In this way, accurate prior information about θ (and hence λ) leads to a gain, but inaccurate prior information leads to a loss, as might be expected. (The same concepts apply with the mean absolute error. For example, with a = 0.8 and normally distributed error, the mean absolute error is smaller than that of the raw estimate when θ = 1.5 but larger when θ = 5.)
It follows that if any upper bound can be put on λ, then a range of values of a exists for which the mean square error will be less than u². Thus, if we are confident about the maximum possible value of λ, then some corresponding estimate ay should be used instead of y. The estimate ay is formed by shrinking the original estimate y towards the origin, so it is called a shrinkage estimate. The random variable realized in ay, i.e., the combined procedure of measurement and analysis that generated the estimate, is called a shrinkage estimator. The factor a is called a shrinkage factor, though the term “expansion factor” seems more meaningful. In practice, the technique finds application when there is a nominal value for the measurand or a previous estimate of it, with this figure acting as a new origin. Then, θ is the difference between the true value of the measurand and this nominal figure, so the nominal value of θ is zero and the theory above applies. (This is one context for the term “shrinkage estimation”. A different context encountered more often in the statistical literature is briefly mentioned in Section 5.5.)
Example 1—Estimation of a Systematic Effect
Suppose that a certain step in an experimental procedure incurs a fixed error for which a historical correction is available. The laboratory seeks to improve the correction by designing an experiment to measure the size of this effect. The difference between the true value of the effect and the value implied by the correction is θ, which is a small quantity with nominal value 0. Suppose that the random error in the measurement procedure has standard deviation u and that the laboratory is confident that |θ| ≤ 3u. Then, the laboratory assumes that λ = 3, meaning that it can choose any value of a in the interval from 0.8 to 1. From (2), the best choice appears to be a = 0.9. The measurement is carried out and the raw, unbiased, estimate of θ is found to be y, so the shrinkage estimate of θ is 0.9y. Both the raw estimate and the prior information have been used appropriately, and the laboratory is confident that the new figure 0.9y is the result of a procedure with reduced mean square error. Therefore, this is the preferred estimate of θ. We see in Section 3.3 that an appropriate figure to state as the standard uncertainty of this estimate is √0.9 u ≈ 0.95u.
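The figures in Example 1 follow directly from the interval endpoint (λ² − 1)/(λ² + 1) and from (2), as the following sketch confirms:

```python
import math

lam = 3.0                                     # laboratory's assumed bound: |theta| <= 3u
lower = (lam ** 2 - 1.0) / (lam ** 2 + 1.0)   # safe values of a lie between this and 1
a = lam ** 2 / (lam ** 2 + 1.0)               # best choice of shrinkage factor, from (2)
u_factor = math.sqrt(a)                       # standard uncertainty multiplier, about 0.95
```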
3. Application in a Measurement Comparison
Frequently, measured values of the same quantity are obtained from different laboratories, and these values are compared to assess whether the accompanying statements of measurement uncertainty represent the capabilities of the laboratories adequately. Usually, each measured value is compared to a consensus value that acts as the best estimate of the measurand θ. Often, because there is no other source of information, this consensus value must be calculated solely from the submitted measured values and their stated uncertainties. However, sometimes there might be a nominal value for θ, perhaps an estimate obtained in an earlier unrelated measurement. This nominal value can act as a new origin, and then shrinkage estimation can be applied. This section gives details of the analysis and shows how the relevant shrinkage factor can be chosen.
Consider the situation where a stable artefact with unknown true value θ is measured by a number of laboratories and where n submitted measured values x₁, …, xₙ are selected as being jointly consistent given their accompanying figures of standard uncertainty, u₁, …, uₙ. These n values are to be formed into a consensus value, CV, using the figures of uncertainty. The usual model for the generation of the data states that (a) the measured value from the ith laboratory, xᵢ, was drawn from a distribution with mean θ and standard deviation uᵢ, and that (b) the n overall errors in the processes of generating the measured values were incurred independently. Then, the consensus value is given by the familiar inverse-variance-weighted mean

CV₁ = Σᵢ (xᵢ/uᵢ²) / Σᵢ (1/uᵢ²).   (3)

(Unless indicated otherwise, all summation in this paper is over i = 1, …, n.) Because the process that generated this estimate is regarded as being unbiased, it is appropriate to take the standard uncertainty of CV₁ to be the standard deviation of the combined estimator of θ, which is

u(CV₁) = [Σᵢ (1/uᵢ²)]^(−1/2).   (4)

This model and the means of combination are broadly accepted, e.g., [4], and they form the basis of the procedure proposed here. (The quantity u(CV₁) appears frequently in what follows, so we often represent this quantity using the simpler symbol u.)
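Equations (3) and (4) amount to a few lines of code. A sketch, with an invented data set and a helper name of our choosing:

```python
import math

def standard_consensus(xs, us):
    """Inverse-variance-weighted mean, Eq. (3), and its standard uncertainty, Eq. (4)."""
    weights = [1.0 / u ** 2 for u in us]
    total = sum(weights)
    cv1 = sum(w * x for w, x in zip(weights, xs)) / total
    return cv1, 1.0 / math.sqrt(total)

# illustrative data only
cv1, u = standard_consensus([10.1, 9.8, 10.3], [0.2, 0.2, 0.4])
```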
Shrinkage estimation can potentially be applied to form an alternative consensus value if the true value θ has a nominal value or an existing estimate, x₀. We simply apply the theory of Section 2 with θ − x₀ in place of θ and xᵢ − x₀ in place of y, and then we readjust by adding x₀ to the figure obtained. If |θ − x₀| is sufficiently small in relation to a standard uncertainty uᵢ, then the contribution of the corresponding measured value xᵢ to the overall mean square error in a weighted sum like (3) is reduced if we use x₀ + aᵢ(xᵢ − x₀) instead, provided that the shrinkage factor aᵢ is suitably chosen. Because each uᵢ is different, each optimal aᵢ might possibly be different. But it is important to realize that this step would not amount to changing or rejecting the data: it would merely be using the nominal value x₀ and the stated uncertainties u₁, …, uₙ to conduct a more accurate combined measurement of θ.
The attributes of bias and mean square error are long-run properties of estimators (random variables), not strictly properties of the one-off estimates. Thus, it is helpful to express these ideas in terms of random variables rather than observed figures. Let Xᵢ be the random variable observed (realized) in the quantity xᵢ. The model states that Xᵢ has a mean equal to the true value θ, has variance uᵢ², and is independent of the other variables. This can be written as

E(Xᵢ) = θ,  var(Xᵢ) = uᵢ²,  with X₁, …, Xₙ independent.   (5)

Here, θ is seen as a constant parameter of the measurement process; it is not seen as the outcome or observation of a random variable. The task is to choose weights wᵢ and associated shrinkage factors aᵢ to minimize the mean square error of the random variable

T = Σᵢ wᵢ [x₀ + aᵢ(Xᵢ − x₀)].

This is to be achieved using the information available to us before studying the data, which comprises the model, the nominal value x₀ and the set of standard uncertainties u₁, …, uₙ. Subsequently, the data are observed and we observe the realization of T, which is taken as the consensus value. Thus, the consensus value obtained by this approach is

CV₂ = Σᵢ wᵢ [x₀ + aᵢ(xᵢ − x₀)].   (6)
3.1. Identifying the Consensus Value
Equation (6) shows that the consensus value CV₂ is determined by the n values a₁, …, aₙ in addition to the data and the nominal value. But there are 2n values to be chosen in the combined set of weights and shrinkage factors {wᵢ, aᵢ}. Therefore, without any loss of flexibility, we can adopt the same weights that were used in the formation of the standard consensus value CV₁ and subsequently choose the n optimal values a₁, …, aₙ. Now, we set

wᵢ = (1/uᵢ²) / Σⱼ (1/uⱼ²),

which means that Σᵢ wᵢ = 1 and that

wᵢ uᵢ² = u²  for every i.   (7)

The figure CV₂ is the outcome of a random variable T that can be written as

T = x₀ + Σᵢ wᵢ aᵢ (Xᵢ − x₀),

because Σᵢ wᵢ = 1. The bias of the individual shrunken estimator x₀ + aᵢ(Xᵢ − x₀) is E[x₀ + aᵢ(Xᵢ − x₀)] − θ = (aᵢ − 1)(θ − x₀) (where E denotes the expected value), so the bias of T is

E(T) − θ = (Σᵢ wᵢ aᵢ − 1)(θ − x₀).

Also, the variance of T is

var(T) = Σᵢ wᵢ² aᵢ² uᵢ² = u² Σᵢ wᵢ aᵢ²,

using (7). Therefore, the mean square error of T is

MSE(T) = (Σᵢ wᵢ aᵢ − 1)² (θ − x₀)² + u² Σᵢ wᵢ aᵢ².

This is unknown because θ − x₀ is unknown, but we wish to minimize it as best we can by a judicious choice of each aᵢ.
Our approach is to state some known positive figure d that is regarded as an approximation to, or upper bound on, |θ − x₀| and then to choose each aᵢ to minimize the value of MSE(T) that would exist if |θ − x₀| happened to be equal to d, which is

(Σᵢ wᵢ aᵢ − 1)² d² + u² Σᵢ wᵢ aᵢ².

Differentiating this with respect to each aᵢ, setting the results to zero and simplifying gives the n equations

aᵢ u² = d² (1 − Σⱼ wⱼ aⱼ),  i = 1, …, n.

The quantity on the right-hand sides of these equations does not depend on the index i, which implies that the optimal values for a₁, …, aₙ are equal. Substituting a for a₁, …, aₙ in these equations and solving gives

a = d² / (d² + u²)   (8)

as the optimal choice of shrinkage factor for the specified value of d. Then, (6) implies that the associated consensus value is

CV₂ = x₀ + a (CV₁ − x₀) = a CV₁ + (1 − a) x₀.   (9)

Because 0 < a < 1, we can see that the proposed consensus value CV₂ is a convex weighted sum of the nominal value x₀ and the standard consensus value CV₁. Given the data, the procedure is determined by specifying x₀ and d, which must be done without being influenced by the xᵢ figures because the procedure must be considered to be fully defined before the randomness modelled by (5) acts.
Several observations can be made about the appropriateness of CV₂:
Each estimate has been shrunk using the same factor, a. This emphasizes further that no data are being adjusted.
If d = 0, then the nominal value is being regarded as exact, in which case CV₂ = x₀.
As d → ∞, a → 1 and so CV₂ → CV₁. Thus, as the quality of the prior information diminishes, the difference between the two consensus values diminishes.
As u → 0, a → 1 and so CV₂ → CV₁. As the prior information becomes dominated by the data, the consensus value responds accordingly.
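The shrinkage factor of (8), the consensus value of (9) and the limiting behaviour just noted can be sketched as follows (the numerical values are invented for illustration):

```python
def shrinkage_consensus(cv1, u, x0, d):
    """CV2 = x0 + a*(CV1 - x0) with a = d^2/(d^2 + u^2), from Eqs. (8) and (9)."""
    a = d ** 2 / (d ** 2 + u ** 2)
    return a, x0 + a * (cv1 - x0)

a_exact, cv_exact = shrinkage_consensus(10.0, 0.1, 9.9, 0.0)  # d = 0: CV2 = x0
a_vague, cv_vague = shrinkage_consensus(10.0, 0.1, 9.9, 1e6)  # d large: CV2 -> CV1
```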
3.2. Comparison of MSEs
Let us now compare the mean square error of the proposed procedure with the MSE of the procedure that results in CV₁, which is u², from (4). Equation (9) shows that CV₂ arises as a weighted sum of the nominal value x₀ and the standard estimate CV₁. The term involving x₀ is a constant that contains bias, while the term involving CV₁ has not been subject to bias but has been subject to variance. The MSE of the proposed procedure is

MSE(CV₂) = (1 − a)² (θ − x₀)² + a² u².   (10)

Define the dimensionless unknown ρ = (θ − x₀)/u. (The quantity ρ here is analogous to the quantity λ in Section 2.) We find that

MSE(CV₂)/u² = (1 − a)² ρ² + a²,   (11)

which echoes (1). Figure 2 shows the ratio MSE(CV₂)/u² as a function of ρ for several different values of d/u (unlike Figure 1, which shows the ratio as a function of a for several different values of λ). The ratio of the mean square errors is smaller than one if and only if ρ² < 1 + 2d²/u², and it can be as low as a, which occurs when ρ² = d²/u², i.e., when |θ − x₀| = d.
(The material in Section 2 implies that the minimum value of MSE(CV₂)/u² taken with respect to a at a fixed value of ρ is ρ²/(ρ² + 1). Accordingly, the relationship ‘ratio = ρ²/(ρ² + 1)’, which is shown by the enveloping dotted line, describes the greatest lower bound to the family of curves that would be obtained using all values of d.)
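The claims about the ratio of MSEs can be checked directly from (11); a short sketch:

```python
import math

def mse_ratio(rho, d_over_u):
    """MSE(CV2)/u^2 as a function of rho = (theta - x0)/u, from Eq. (11)."""
    a = d_over_u ** 2 / (d_over_u ** 2 + 1.0)
    return (1.0 - a) ** 2 * rho ** 2 + a ** 2

d_over_u = 2.0
a = d_over_u ** 2 / (d_over_u ** 2 + 1.0)        # = 0.8 for this choice
rho_crit = math.sqrt(1.0 + 2.0 * d_over_u ** 2)  # boundary of the improvement region
```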
Figure 3 shows the ratio of MSEs against the dimensionless quantity d/u (whereas Figure 1 showed the ratio of MSEs against a). From this and from (2) and (8), we can see that for a fixed ρ, the MSE is minimized when d/u = |ρ|, i.e., when d = |θ − x₀|, as might be expected. However, from (11), we find that there is a reduction in the MSE whenever we choose d such that

2d² > (θ − x₀)² − u²,

which is guaranteed if we choose d ≥ |θ − x₀|/√2. Thus, for an improvement in MSE, it is not necessary for d to be an upper bound on |θ − x₀|, and there is considerable room for misjudgement in assessing the value of d to represent |θ − x₀|.
3.3. Standard Uncertainty of the Consensus Value
In an unbiased measurement, the square of the standard uncertainty is to act as an estimate of the variance in the measurement procedure. But when a measurement is potentially biased, any single figure of uncertainty must also include a component relating to the bias. Therefore, let us now consider how to express the measurement uncertainty when CV₂ is used as the combined estimate of θ.
3.3.1. Propagation of Mean Square Error
To find the appropriate representation of uncertainty, we consider the indirect measurement of a quantity θ = Σᵢ θᵢ by the summation of biased measured values of each component θᵢ. We can write Yᵢ = θᵢ + βᵢ + Eᵢ, where Yᵢ is the estimator of θᵢ, βᵢ is the bias, and Eᵢ is the random variable for the corresponding random error. The standard estimator of θ is Σᵢ Yᵢ and this has an overall bias Σᵢ βᵢ. If the contributing biases have different signs, then there is some cancellation and the overall bias does not have its “worst-case” magnitude. Suppose that the signs of the contributing biases are random and that we can regard the generation of βᵢ as being the realization of some random variable Bᵢ with mean zero and some variance. Then, Σᵢ βᵢ is the realization of the variable Σᵢ Bᵢ, which has mean zero, while (Σᵢ βᵢ)² is the realization of the variable (Σᵢ Bᵢ)², which has mean Σᵢ E(Bᵢ²) because the random nature of the signs implies that the expected value of the product Bᵢ Bⱼ is zero for i ≠ j. In other words, the effect of the sum of biases in the overall error is represented, on average, by the sum of the squares of the biases. We see that, on average, the squares of the biases propagate along a chain of measurement in just the same manner as the variances of the random errors. It follows that we can also consider MSEs to propagate additively along a long chain of unrelated measurements. Thus, if a single figure is to be used to describe the size of the potential error in a biased measurement, then assuming that the accurate propagation of error or uncertainty is the objective, the figure should be the square root of the MSE, not the standard deviation, the two being equal when there is no bias.
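The random-signs argument can be illustrated by simulation: when each component bias is given a random sign, the mean of the squared overall bias equals the sum of the squared biases. The bias magnitudes below are invented for the illustration.

```python
import random

random.seed(0)
biases = [0.3, 0.5, 0.2, 0.4]          # assumed component bias magnitudes
target = sum(b ** 2 for b in biases)   # sum of squared biases

reps = 200_000
total = 0.0
for _ in range(reps):
    # give each component bias a random sign, then square the overall bias
    s = sum(random.choice((-1.0, 1.0)) * b for b in biases)
    total += s ** 2
mean_squared_bias = total / reps       # close to the sum of squared biases
```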
This conclusion is in keeping with the concept of Type B analysis endorsed in CIPM Recommendation INC-1 [5,6] and described more clearly in the parent report [7]. In Type B analysis, the bias βᵢ is considered to be drawn from a population with mean zero and some known variance, i.e., to be the outcome of a random variable with mean zero and known variance. The overall variance attributed to Yᵢ is then the sum of this variance and the variance of the random error, as in (5). We can interpret the known variance as being an estimate of the square of the bias, βᵢ², in which case the overall variance is just an estimate of what, in the years preceding the acceptance of the Type B evaluation, would have been called mean square error. Thus, known variances are acting for unknown biases within the uncertainty analysis, and, in effect, all bias is modelled out of the measurement.
3.3.2. Analysis of Mean Square Error
Let us return to our context where we have n measurement results (xᵢ, uᵢ) that are to be combined to form a consensus value. Each of the measurements is subject to a Type B evaluation, so individually, each is modelled as an unbiased measurement, as in (5). But now, we have deliberately introduced an unknown bias (a − 1)(θ − x₀) into the combination process through the shrinkage factor. Following the argument above, we wish to calculate the square root of a suitable estimate of the MSE given in (11) and to state this as the “standard uncertainty” of CV₂. It seems clear that this estimate of the MSE is to have the form

a² u² + k (1 − a)² d²

for some as yet unspecified multiplier k. The choice of k depends upon how we interpret d and upon our attitude to the idea that the uncertainty analysis is to err on the side of conservatism. If we see d² as being an unbiased estimate of (θ − x₀)², then we would set k = 1, while if we see d as an upper bound on |θ − x₀|, then we would perhaps also set k = 1 and so overstate the measurement uncertainty for conservatism. However, we might choose a larger value of k for conservatism in other circumstances. In general, we suggest setting k = 1, so that the standard uncertainty stated with the proposed consensus value CV₂ is

u(CV₂) = [a² u² + (1 − a)² d²]^(1/2) = √a · u.   (12)

Then, the standard uncertainty associated with CV₂ is no greater than the standard uncertainty associated with CV₁, which accords with the idea that we have made use of more information in obtaining the alternative estimate. (When n = 1, we obtain the situation of a single measurement in Section 2. Equation (12) then justifies our use of √0.9 u as the standard uncertainty in the example of Section 2.)
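The algebraic collapse of the MSE estimate to a·u², and hence the form of (12), can be checked numerically:

```python
import math

def u_cv2(u, d):
    """Standard uncertainty of CV2 from Eq. (12): sqrt(a) * u."""
    a = d ** 2 / (d ** 2 + u ** 2)
    return math.sqrt(a) * u

u, d = 1.0, 2.0
a = d ** 2 / (d ** 2 + u ** 2)
explicit = math.sqrt(a ** 2 * u ** 2 + (1.0 - a) ** 2 * d ** 2)  # the k = 1 form
```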
3.3.3. Another Derivation—And a Potential Misconception
We can reach the same expression for the uncertainty in CV₂ by a different, perhaps faulty, argument. Dividing both numerator and denominator in (8) by d²u² shows that

a = (1/u²) / (1/u² + 1/d²).

Then, using (9), we can write

CV₂ = (CV₁/u² + x₀/d²) / (1/u² + 1/d²).   (13)

In this expression, the quantity d² appears as if it were the variance underlying an additional “observation”, x₀, so the figure CV₂ is the figure that would be obtained using the usual method of analysis if x₀ were an additional observation with standard uncertainty d. This can encourage us to think that the squared standard uncertainty to state with CV₂ should be

1 / (1/u² + 1/d²),

which is (12). This might appear to be a simpler derivation of (12), but the logic is questionable. The argument treats x₀ as if it varies around θ over repeated measurements like the other observations, but we wish to estimate the MSE with x₀ fixed, so it does not seem advisable to rely upon this derivation of (12).
The form of (13) shows again that shrinking the measured values towards the nominal value does not correspond to modifying the data.
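The numerical equivalence of (9) and (13) — shrinking towards x₀ versus treating x₀ as an extra “observation” with standard uncertainty d — is easy to confirm:

```python
def cv2_shrunk(cv1, u, x0, d):
    a = d ** 2 / (d ** 2 + u ** 2)   # Eq. (8)
    return x0 + a * (cv1 - x0)       # Eq. (9)

def cv2_extra_obs(cv1, u, x0, d):
    """Usual weighted mean with x0 treated as an observation of uncertainty d, Eq. (13)."""
    return (cv1 / u ** 2 + x0 / d ** 2) / (1.0 / u ** 2 + 1.0 / d ** 2)
```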
3.4. Example 2
Imagine that an artefact with unknown true value θ is circulated for measurement in a comparison. Suppose that the laboratory given the task of analysing the comparison data has available a previous independent estimate x₀ of θ and that the laboratory is confident about the maximum credible size of θ − x₀. Thus, the laboratory sets its figures of x₀ and d accordingly. Suppose that, after the comparison data are received, it is decided to use the set of n data pairs (xᵢ, uᵢ) indicated in Table 1 to calculate the consensus value. For the standard method, Equations (3) and (4) give the figures CV₁ and u(CV₁), and for the shrinkage estimation, (8), (9) and (12) subsequently give the figures a, CV₂ and u(CV₂). Because d is greater than u in (8), the optimal shrinkage factor is greater than 0.5, and CV₂ is closer to CV₁ than to x₀. Also, in accordance with the idea that CV₂ is derived using more information than CV₁, the standard uncertainty of CV₂ is smaller than the standard uncertainty of CV₁.
4. Compatibility of the Nominal Value with the Data
The method involves a trade-off between bias and variance, with the bias being proportional to the absolute value of the difference between θ and x₀. We can prevent an unrealistic value of x₀ from having excessive effect by assessing its compatibility with the data. If the data are collectively inconsistent with x₀, then the prior information represented by the use of x₀ can be disregarded and there is no harm to our results. This principle applies in the situation of Section 2, where we would compare the value y against u to assess whether zero was a feasible value for θ, and it also applies in the context of the measurement comparison in Section 3.
Consider the analysis in Section 3. If

|CV₁ − x₀| > d + h · u(CV₁)

for some appropriate multiplier h, then we have statistical evidence at the 0.05 level to reject the idea that x₀ and d are compatible with the data, in which case we can discard the prior information as being unreliable. In that situation, we quote the standard results CV₁ and u(CV₁) instead of CV₂ and u(CV₂). Thus, CV₂ is preferred to CV₁ only when d is large enough for x₀ to be reliable.
Because d is intended to be an estimate or overestimate of |θ − x₀|, we can set h = 1.96. The relevant condition becomes

|CV₁ − x₀| ≤ d + 1.96 u(CV₁),

and the final recommended figures CV and u(CV) are then given by the expressions

CV = CV₂ and u(CV) = u(CV₂) if the condition holds, and CV = CV₁ and u(CV) = u(CV₁) otherwise.
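The complete procedure of Sections 3 and 4 can be sketched as follows. The acceptance threshold |CV₁ − x₀| ≤ d + h·u is our reading of the condition, and the multiplier h = 1.96 is an assumption corresponding to a two-sided 0.05-level test; the data are invented for illustration.

```python
import math

def final_consensus(xs, us, x0, d, h=1.96):
    """Shrinkage consensus with the compatibility check of Section 4 (a sketch)."""
    weights = [1.0 / u ** 2 for u in us]
    total = sum(weights)
    cv1 = sum(w * x for w, x in zip(weights, xs)) / total   # Eq. (3)
    u = 1.0 / math.sqrt(total)                              # Eq. (4)
    if abs(cv1 - x0) <= d + h * u:        # nominal value compatible with the data
        a = d ** 2 / (d ** 2 + u ** 2)    # Eq. (8)
        return x0 + a * (cv1 - x0), math.sqrt(a) * u        # Eqs. (9) and (12)
    return cv1, u                         # otherwise discard the prior information

cv_ok, u_ok = final_consensus([10.0, 10.2], [0.1, 0.1], x0=10.0, d=0.1)
cv_far, u_far = final_consensus([10.0, 10.2], [0.1, 0.1], x0=5.0, d=0.1)
```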
Example 2—Continued
We find from the statistics calculated in Section 3.4 that the condition is met, so d is large enough for the perceived reliability of x₀ to be acceptable. Therefore, we set the final figures to be those that were obtained in the shrinkage estimation, i.e., we set CV = CV₂ and u(CV) = u(CV₂).
Let us consider again the data in Table 1, and now, let us examine the behaviour of the different estimates if x₀ and d are hypothetically permitted to vary. Figure 4 shows the values of CV₁, CV₂ and CV as functions of d for integer values of x₀ from 998 to 1002. We see that for the outermost values of x₀, there are points of transition where the condition moves from TRUE to FALSE as d is reduced. The point corresponding to the settings of x₀ and d in our example is marked.
5. Discussion
This section presents self-contained pieces of discussion, the first three subsections relating only to the material in
Section 3.
5.1. Redundancy in the Model
The redundancy that exists in the determination of the 2n weights and shrinkage factors might make the problem appear poorly defined. However, it must be recognized that the wᵢ and aᵢ are just quantities invented to minimize the MSE, and their actual values are unimportant. We could potentially set the weights to other values summing to one and find the corresponding optimal value of each aᵢ, but we would then have obtained the same value of CV₂ that is given by (9). Thus, we can accept a solution with the redundancy described. Our choice to set wᵢ equal to the weight of xᵢ in CV₁ facilitates the task of finding a solution and permits a simple demonstration of the result.
5.2. Statistical Validity
The values of x₀ and d are chosen by the party conducting the analysis, which presumably is the “pilot laboratory”. These figures represent an opinion, perhaps a consensus of opinions, and they would potentially be different if the pilot laboratory were different. Therefore, it might be thought that there is something too subjective about this method. However, whatever figures of x₀ and d are chosen, the method gives a legitimate estimate of θ and a corresponding legitimate statement of standard uncertainty. In other words, these figures affect the numerical result but they do not affect the statistical validity of that result, which is a concept that relates to the reliability of the statement of uncertainty, i.e., the level of confidence we can have that an interval with limits calculated from CV and u(CV) encloses θ. We are using a subjective opinion in the design of the procedure, which is a sensible thing to do, but we are not using it in reporting the reliability of the result, which would be incorrect in a classical analysis. We are using the prior opinion to engineer a solution, not in the formal inference. (The practice of statistical engineering, i.e., the construction of some algorithm to perform as a tool, is to be differentiated from the practice of statistical inference, i.e., the statement of a conclusion about the real world with a justifiable level of assurance such as 95%.)
If the subjective nature of the choice of d remains troubling, then the reader might consider three other points. First, the problematic term “uncertainty” itself implies the existence of subjectivity; otherwise, the relevant term would just be “variability”. Second, subjectivity is ubiquitous in a Type B analysis of uncertainty, yet that practice is accepted. Third, the existing figure x₀ is external information that should be employed somehow; otherwise, we are not making best use of all that we are given.
Reproducibility and transparency are also important. Provided that x₀ and d are reported, the same results will be obtained by another analyst, and provided that it is acknowledged that these values were identified without regard to the xᵢ values, the method is transparent.
5.3. Exclusive Reference Values
Up to this point, we have not distinguished the idea of a consensus estimate of θ from the idea of a reference value for a contribution such as xᵢ. There is a compelling rationale for forming a reference value for xᵢ from the set of observations obtained when xᵢ is removed, in what has been called an “exclusive” analysis. Accordingly, the calculation of this reference value just requires a simple modification to the procedure: the observation xᵢ is removed and the remaining n − 1 observations are renumbered. Subsequently, the calculation of an Eₙ value or a “degree of equivalence” for xᵢ can proceed as normal using this exclusive reference value and its standard uncertainty.
5.4. Applicability in Metrology
Shrinkage estimation is only beneficial when the nominal value lies within a few experimental standard deviations of the true value, which is unlikely in many measurements. Therefore, we make no claim that the method is to be applied generally. On the other hand, if the value of x₀ proposed is distant from the true value, then the additional method of Section 4 will prevent it from having a detrimental effect on the result, so the combined method might be regarded as being applicable in every situation but as being beneficial only in some.
5.5. Shrinkage Estimation
The type of shrinkage estimation described in this paper was proposed by Thompson [1]. One of the problems he considered relates to the archetypal form of a Type A evaluation of measurement uncertainty, where a sample of size n is used to estimate the mean μ of a normal distribution with unknown variance σ² [6]. The sample-mean random variable is X̄ and the sample-variance random variable is S². The variable X̄ is known to be the unbiased estimator of μ with minimum mean square error, which is σ²/n. However, this fact does not preclude the possibility that there is a biased estimator with smaller mean square error. In fact, the estimator cX̄ with

c = μ² / (μ² + σ²/n)

has mean square error cσ²/n and has the smallest mean square error among all fixed multiples of X̄. The optimal multiplier c is a positive value less than one, so the estimate is biased and is shrunk toward the origin. The multiplier c is unknown but can be approximated if we replace μ² and σ² by their unbiased estimators, X̄² − S²/n and S². This gives the estimator

[(X̄² − S²/n) / X̄²] X̄,

in which the shrinkage factor is now a random variable. This shrinkage estimator has a lower mean square error than X̄ when μ² is small in relation to σ²/n but has an increased mean square error otherwise. Sometimes, there is a nominal value μ₀ for μ and occasionally, there will be reason to believe that |μ − μ₀| is less than or comparable to the standard error of the sample mean, σ/√n. The value μ₀ acts as a new origin for the analysis, and the corresponding shrinkage estimator of μ is

μ₀ + {[(X̄ − μ₀)² − S²/n] / (X̄ − μ₀)²} (X̄ − μ₀),

analogously to (8) and (9). This estimator has a lower mean square error than X̄ when (μ − μ₀)² is small in relation to σ²/n.
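The behaviour of this kind of data-driven shrinkage factor can be explored by simulation. The sketch below is our own rendering of the estimator described above, with one addition that we flag explicitly: the factor is clamped at zero so that the estimate stays at μ₀ when the data-based numerator goes negative.

```python
import random
import statistics

def shrunken_mean(sample, mu0=0.0):
    """Shrink the sample mean towards mu0 with a data-based factor (a sketch)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s2_over_n = statistics.variance(sample) / n
    factor = max(0.0, 1.0 - s2_over_n / (xbar - mu0) ** 2)  # clamp is our addition
    return mu0 + factor * (xbar - mu0)

# Compare MSEs when the true mean sits at the nominal value (mu = mu0 = 0).
random.seed(0)
reps, n = 4000, 10
errs_xbar, errs_shrunk = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    errs_xbar.append(statistics.fmean(sample) ** 2)
    errs_shrunk.append(shrunken_mean(sample) ** 2)
mse_xbar = statistics.fmean(errs_xbar)      # about sigma^2/n = 0.1
mse_shrunk = statistics.fmean(errs_shrunk)  # smaller in this favourable case
```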
Thus, the concept of shrinkage estimation could also find application in a Type A evaluation of measurement uncertainty. As is explained more fully in
Section 5.6, the fact that the bias would be overtly introduced into the measurement process would be unimportant because there would already be bias from the existence of the systematic effects that were treated, for convenience, as variances in a Type B evaluation.
The idea of shrinkage estimation that we have been discussing is general, and the principle has been called “shrinkage in the direct sense” [
2]. To some extent, it is exemplified in the improved estimation of parameters of probability distributions [
8,
9,
10]. The concept of moving a raw result toward a nominal value also features in a “shrinkage confidence interval” for estimating the mean of a normal distribution [
11]. However, the context in which the term “shrinkage estimation” is encountered in statistics is often more specific, the relevant objective typically being the estimation of the mean of a
multivariate normal distribution, e.g., [
12,
13]. Thus, a reader searching for the term “shrinkage estimation” might obtain many irrelevant results.
5.6. Shrinkage, Bias and Type B Evaluation
Shrinkage estimation introduces bias into the experimental part of the measurement in order to lower the MSE. Some readers might not like the idea that the experiment has become biased, but that would be to forget the meaning of a Type B evaluation of uncertainty. Let us consider this using a simple illustration. Suppose that the measurand θ is the length of a rod at a fixed temperature, and that θ is to be measured by comparison with the length θₛ of a similar standard rod at that temperature. Suppose that the length of the standard rod has measured value xₛ and associated standard uncertainty uₛ. Then, θ = θₛ + δ, where δ is the difference measured using some comparator. The difference δ is estimated several times in a statistical process, and the results are averaged to form its measured value x_δ and standard uncertainty u_δ using the familiar concepts of Type A evaluation. The measured value of θ is then defined to be y = xₛ + x_δ, and the corresponding standard uncertainty is stated to be u = (uₛ² + u_δ²)^(1/2). In this simple situation, the difference xₛ − θₛ is an unknown error whose value does not change from experiment to experiment. It is a bias, and the fact that it is represented by a variance in the uncertainty analysis does not change that. Such systematic errors are ubiquitous in practical measurement procedures.
The point being made is that in metrology, the shrinkage procedure would not be turning an unbiased procedure into a biased one; rather, it would just be altering the existing bias, perhaps even reducing it because of a cancellation effect. There is bias in the measurement before the shrinkage procedure is carried out, but it is a bias that has been modelled out of existence by the metrologist’s device of a Type B evaluation of uncertainty. It follows that if metrologists are comfortable with stating the limits of an expanded interval of uncertainty as x ± ku when a typical Type B evaluation has been involved, then they should also be comfortable with stating the limits as x ± ku when x and u are the results after shrinkage estimation has been applied.
A Type B evaluation is a procedure that is unfamiliar to statisticians. It makes much of statistical theory irrelevant: the familiar idea of bias is replaced by one of variance. The statistical theory of measurement becomes compromised by the acceptance, albeit the necessary acceptance, of a Type B evaluation as a pragmatic solution to a long-standing problem [7]. The statistical rules change, and old tools and ideas become distractions. In an attempt to come to terms with this, we ask the question “Is the measurement as a whole biased, or is it just the experimental part of the measurement that is biased? Equivalently, do we see the establishment of the laboratory procedure and the calibration of equipment as being part of the measurement that we are referring to, or do they precede this measurement?” Consider our simple example with the rods. If we conceive of the measurement as beginning before we obtain the estimate of the length of the standard rod (i.e., before the calibration of the equipment), then, because the potential value of the calibration error is modelled using a variance, we must see the final measured value y as being the outcome of an unbiased procedure, in which case the whole measurement is unbiased. But if we only see the measurement as commencing when we subsequently begin to estimate the difference, then our measurement must be regarded as biased, because there is a pre-existing constant error from the earlier calibration. Therefore, the terms “bias” and “measurement” must be used together carefully if metrologists are to accurately bring statistical ideas into their work.
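The distinction drawn above can be made concrete with a small simulation (the figures are hypothetical): within any one calibration, the calibration error is a fixed bias of the experimental part, yet over the ensemble of possible calibrations the complete procedure is unbiased.

```python
import random

random.seed(2)
theta = 10.0   # true value of the measurand (hypothetical)
u_cal = 0.05   # standard uncertainty from calibration (Type B component)
u_rep = 0.02   # repeatability of a single reading (Type A component)

overall_errors = []
for _ in range(5_000):                    # many possible calibrations
    e_cal = random.gauss(0.0, u_cal)      # fixed error of this calibration
    readings = [theta + e_cal + random.gauss(0.0, u_rep) for _ in range(20)]
    y = sum(readings) / len(readings)
    # Within this calibration, y is biased by approximately e_cal ...
    overall_errors.append(y - theta)

# ... but across the ensemble of calibrations the mean error is near zero.
mean_error = sum(overall_errors) / len(overall_errors)
print(mean_error)
```

Whether one calls the procedure “biased” thus depends on whether the calibration step is regarded as inside or outside the measurement, which is precisely the question posed in the text.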
One reasonable answer to this question of whether “measurement” includes the preliminary steps involves the observation that the uncertainty arising from the calibration is called a component of “measurement uncertainty”. If that terminology is acceptable, then the answer must be that a measurement is the whole process; otherwise, the generation of the calibration error would not be part of the measurement and, presumably, the corresponding term could not legitimately be referred to as part of the measurement uncertainty. Therefore, a logical solution to this problem of communication is to regard measurement as being an unbiased procedure (by definition).
5.7. The Concept of “Measurement”
It is fair to suggest that, in this proposal, the nominal value is being treated as part of the measurement procedure rather than as something external to it, so it might be thought that the proposal is challenging the concept of measurement. Whether that is the case depends on what the term “measurement” means to you and how strictly you interpret it. However, as now explained, prior information such as a nominal value is already being used inside the measurement process in the approach to measurement described in the supplements (GUM-S) [14,15] to the Guide to the Expression of Uncertainty in Measurement (GUM) [6]. Consider a situation where the measurand is a nonlinear function of an input quantity that is measured directly, and suppose that x and u are the measured value of the input quantity and the associated standard uncertainty. If the original GUM formulation is applied, then the measured value of the measurand is the function evaluated at x. However, if the “Bayesian” approach advocated in GUM-S is adopted, then x and u become the mean and standard deviation of the (posterior) distribution attributed to the input quantity, and the measured value of the measurand becomes the mean of the resulting distribution attributed to it, which generally differs from the function evaluated at x. An unstated step in this Bayesian analysis is the prior attribution of a uniform distribution to the input quantity to represent minimal prior information about it. This accords with the fact that a fundamental part of a Bayesian analysis is the attribution of a prior distribution to each unknown to represent whatever prior information or belief there is about it, such as the existence of a physical bound. From this, we infer that the approach to data analysis adopted in GUM-S has implicitly accepted the use of prior information within the measurement itself. The statistical validity of any Bayesian analysis is linked to the acceptability of the subjective prior distributions, which is problematic. Also, the basic principle of attributing a continuous probability distribution to a measurand has recently been shown to lead to an internal contradiction [3]. In contrast, the proposed shrinkage-estimation procedure is a classical way to make use of prior information while maintaining statistical validity.
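The difference between the two measured values discussed above can be illustrated with a hypothetical nonlinear model (not an example taken from GUM-S itself): let the measurand be the square of the input quantity. If a normal posterior with mean x and standard deviation u is attributed to the input quantity, the mean of the resulting distribution of the measurand is x² + u², not the plug-in value x².

```python
import random

random.seed(3)
x, u = 2.0, 0.3   # measured value and standard uncertainty of the input quantity

gum_value = x**2  # original GUM formulation: evaluate the model at x

# GUM-S-style propagation of the full distribution by Monte Carlo sampling
n = 400_000
samples = (random.gauss(x, u) ** 2 for _ in range(n))
gum_s_value = sum(samples) / n

print(gum_value, gum_s_value)  # gum_s_value is close to x**2 + u**2 = 4.09
```

The offset u² in the GUM-S result is a direct consequence of the (implicitly prior-based) distribution attributed to the input quantity, which is the sense in which prior information has entered the measurement itself.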