Article

Shrinkage Estimation to Minimize Error in Measurement Estimates and Consensus Values

by Robin Willink 1,2
1 Measurement Standards Laboratory of New Zealand, Lower Hutt 5010, New Zealand
2 Department of the Dean, University of Otago, Wellington 6021, New Zealand
Metrology 2026, 6(2), 26; https://doi.org/10.3390/metrology6020026
Submission received: 9 April 2025 / Revised: 27 January 2026 / Accepted: 5 March 2026 / Published: 9 April 2026
(This article belongs to the Collection Measurement Uncertainty)

Abstract

This paper considers the measurement of a quantity when a nominal value or previous estimate is available, which is the case with a quantity designed to be zero or which might be the case when a consensus value is to be calculated in a measurement comparison. If an upper bound can be placed on the magnitude of the difference between the nominal value and the true value, then the mean square error of the overall measurement procedure can be reduced by a statistical method known as shrinkage estimation. We describe the method for use in an individual measurement, but we give a deeper analysis assuming the context of a measurement comparison.

1. Introduction

In the archetypal measurement, the only thing known in advance about the true value θ is that it is suitable for study using the chosen technique. Consequently, the measured value x and the accompanying standard uncertainty u ≡ u(x) are obtained using only the data and the model of the measurement procedure. However, there are some situations where more is known about θ: in particular, there are situations where there is a nominal value for θ, which we denote by x_nom. For example, in the measurement of a quantity θ that acts as an unwanted bias in an experimental procedure, the nominal value will, by design, be x_nom = 0. The existence of a nominal value is information that we would like to use to obtain an estimate of θ that is, in some sense, more accurate than the raw measured value x. Here, we show how the mean square error in the overall measurement procedure is reduced by moving the estimate partway towards x_nom. The analysis describes a statistical concept called shrinkage estimation [1,2].
The nominal value x nom , or, equivalently, a previous unrelated estimate, is prior information about θ . Such information is to be employed as fully as possible without overstating its meaning or importance. The use of prior information is a strength of a Bayesian statistical analysis, but the associated need to accurately encode such information in a joint prior probability distribution for every relevant constant is a weakness. Such a distribution is a mental construct that involves the idea of subjective probability, not an objective frequency distribution that can be examined experimentally. The fact that individual subjective probability statements cannot be falsified, i.e., cannot hypothetically be shown to be incorrect, is enough to render them unacceptable to many scientists. Also, a strong argument has recently been given in the metrology literature that the foundation of Bayesian theory and practice is unsound [3]. Thus, an ideal method of analysis might make use of prior information while retaining only the objective concept of frequency-based probability. Such a method is presented here, where the prior information is in a form that is acceptable to the classical, frequentist, statistician.
The principle of shrinkage estimation might be applied in an individual measurement, but it also might be used in the combination of data, as, for example, in the formation of a consensus value in a measurement comparison. Section 2 describes the basic idea of shrinkage estimation with an individual measurement, and then Section 3 uses this idea to propose a new analysis for a consensus value in a comparison. Section 4 considers how the data from the comparison can be used to accept or reject the influence of the nominal value, and Section 5 contains a discussion. Our notation is standard in statistical writing. A Greek letter, e.g., θ or Ω , indicates an unknown and unknowable true value. A lower-case italic Latin letter, e.g., x 1 or a, is used for a figure or observation that is known or will be determined, while an upper-case italic Latin letter, e.g., X 1 , indicates a corresponding random variable, which can be understood to represent the procedure or process that generates the observation. In keeping with the classical, frequentist, paradigm of statistics, statements of probability are only made about the potential results of procedures, not about constants. Therefore, a probability distribution is not attributed to a measurand such as θ .

2. A Shrinkage Estimator

Imagine that you measure a quantity ϕ known to be small and that the figure you record is y. It is natural to take y as the measured value of ϕ , but the additional information that ϕ is small has not been used in making that assessment. If someone asked you “Is ϕ less than y or is it greater than y?”, a reasonable answer might be “If I had to choose one or the other, I would say it was less than y because I know that ϕ is small, not large.” Such an answer would be based on combining knowledge of the data with prior knowledge about ϕ . As is now illustrated, this idea of supplementing data with prior information can lead to an improved measurement process in some scenarios.
Suppose that the measurand ϕ has true value 1.5 and that the measurement procedure is unbiased with standard deviation u = 1. Imagine a long series of measurements that gives the set of figures {y}. The mean square error (MSE) of an estimator is the square of its bias plus its variance, so the estimates in the set {y} have an MSE of 0² + u² = 1, while the estimates in the alternative set {0.8y} have an MSE of (0.2 × 1.5)² + 0.8²u² = 0.73. Thus, with regard to the MSE, the regular choice of 0.8y instead of y as the measured value would be superior! The presence of the squared bias associated with moving an otherwise-unbiased estimate towards the origin is, in this case, more than offset by the reduction in variance: the relationship of the difference (ϕ − origin) to the standard error u is such that there is benefit. The method can be seen to involve a trade-off between bias and variance that, in this case, favours the presence of a small bias.
The bias and the variance in the more general estimate a y are (a − 1)ϕ and a²u², so the MSE, which is unknown and which we denote by Ω, is
$$\Omega = (1-a)^2\lambda^2 u^2 + a^2 u^2, \qquad \lambda \equiv |\phi|/u. \tag{1}$$
Figure 1 depicts the ratio Ω/u² as a function of a for various values of the dimensionless unknown λ. The ratio is smaller than 1 for any value of a in the interval from (λ² − 1)/(λ² + 1) to 1, but it is larger for any value of a outside this interval. If λ ≤ 1 and 0 ≤ a ≤ 1, then the ratio never exceeds 1. The ratio is minimized with respect to a when
$$a = \frac{\lambda^2}{\lambda^2 + 1}, \tag{2}$$
where its value is equal to a (marked). In our example with λ = 1.5 , any value of a between 0.385 and 1 leads to an improvement, the improvement is greatest at a = 0.692 , and the MSE achieved at that point is 0.692 , because u = 1 .
Thus, there is an improvement in the MSE whenever λ² < (1 + a)/(1 − a), and the improvement can be substantial at small values of λ. However, the price to pay for this too-good-to-be-true behaviour is an increase in MSE when λ is larger. For example, if λ had been equal to 5 in our example, then the MSE with a = 0.8 would have been 1.64. In this way, accurate prior information about ϕ (and hence λ) leads to a gain, but inaccurate prior information leads to a loss, as might be expected. (The same concepts apply with the mean absolute error. For example, with ϕ = 1.5 and u = 1, the mean absolute error is √(2/π) = 0.798 when a = 1 but it is 0.638 when a = 0.8.)
It follows that if any upper bound can be put on λ , then a range of values of a exists for which the mean square error will be less than u 2 . Thus, if we are confident about the maximum possible value of | ϕ | / u , then some corresponding estimate a y should be used instead of y. The estimate a y is formed by shrinking the original estimate y towards the origin, so it is called a shrinkage estimate. The random variable realized in a y , i.e., the combined procedure of measurement and analysis that generated the estimate, is called a shrinkage estimator. The factor a is called a shrinkage factor, though the term “expansion factor” seems more meaningful. In practice, the technique finds application when there is a nominal value for the measurand or a previous estimate of it, with this figure acting as a new origin. Then, ϕ is the difference between the true value of the measurand and this nominal figure, so the nominal value of ϕ is zero and the theory above applies. (This is one context for the term “shrinkage estimation”. A different context encountered more often in the statistical literature is briefly mentioned in Section 5.5.)
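As a minimal numerical sketch of this trade-off (not part of the original analysis), the following Python fragment evaluates the MSE ratio of (1) on a few shrinkage factors and cross-checks the figures 0.73 and 0.692 quoted above by simulation; the normal distribution is used only for the simulation step and is an assumption of the sketch.

```python
import numpy as np

def mse_ratio(a, lam):
    # Ratio Omega / u^2 for the shrinkage estimate a*y, from Equation (1)
    return (1 - a) ** 2 * lam ** 2 + a ** 2

lam = 1.5                                  # lambda = |phi| / u in the worked example
for a in (1.0, 0.8, 0.692, 0.5, 0.385):
    print(f"a = {a:5.3f}   MSE/u^2 = {mse_ratio(a, lam):.3f}")

a_opt = lam ** 2 / (lam ** 2 + 1)          # Equation (2): optimal shrinkage factor
print("optimal a =", round(a_opt, 3))      # 0.692; the minimum MSE/u^2 equals a_opt itself

# Simulation cross-check with phi = 1.5 and u = 1 (normality assumed here only)
rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=1_000_000)
print("simulated MSE of y     :", round(np.mean((y - 1.5) ** 2), 3))        # about 1.00
print("simulated MSE of 0.8*y :", round(np.mean((0.8 * y - 1.5) ** 2), 3))  # about 0.73
```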

Example 1—Estimation of a Systematic Effect

Suppose that a certain step in an experimental procedure incurs a fixed error for which a historical correction is available. The laboratory seeks to improve the correction by designing an experiment to measure the size of this effect. The difference between the true value of the effect and the value implied by the correction is ϕ, which is a small quantity with nominal value 0. Suppose that the random error in the measurement procedure has standard deviation u = 20 and that the laboratory is confident that |ϕ| ≤ 60. Then, the laboratory assumes that λ ≤ 3, meaning that it can choose any value of a in the interval from 0.8 to 1. From (2), the best choice appears to be a = 0.9. The measurement is carried out and the raw, unbiased, estimate of ϕ is found to be y = 12.0, so the shrinkage estimate of ϕ is 0.9 × 12.0 = 10.8. Both the raw estimate and the prior information have been used appropriately, and the laboratory is confident that the new figure 10.8 is the result of a procedure with reduced mean square error. Therefore, this is the preferred estimate of ϕ. We see in Section 3.3 that an appropriate figure to state as the standard uncertainty of this estimate is √a · u ≈ 19.
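A short sketch of this example in Python (the variable names are illustrative, not from the paper): with u = 20 and the bound d = 60 playing the role of |ϕ|, the shrinkage factor, the shrunken estimate and the uncertainty √a · u of Section 3.3 follow directly.

```python
u, d, y = 20.0, 60.0, 12.0            # standard deviation, assumed bound on |phi|, raw estimate

a = d**2 / (d**2 + u**2)              # Equation (2) with lambda replaced by its bound d/u
estimate = a * y                      # shrinkage estimate of phi
u_shrunk = a**0.5 * u                 # standard uncertainty, anticipating Equation (12)

print(a, estimate, round(u_shrunk, 1))   # 0.9  10.8  19.0
```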

3. Application in a Measurement Comparison

Frequently, measured values of the same quantity θ are obtained from different laboratories, and these values are compared to assess whether the accompanying statements of measurement uncertainty represent the capabilities of the laboratories adequately. Usually, each measured value is compared to a consensus value that acts as the best estimate of θ . Often, because there is no other source of information, this consensus value must be calculated solely from the submitted measured values and their stated uncertainties. However, sometimes there might be a nominal value for θ , perhaps an estimate obtained in an earlier unrelated measurement. This nominal value can act as a new origin, and then shrinkage estimation can be applied. This section gives details of the analysis and shows how the relevant shrinkage factor can be chosen.
Consider the situation where a stable artefact with unknown true value θ is measured by a number of laboratories and where n submitted measured values x 1 , , x n are selected as being jointly consistent given their accompanying figures of standard uncertainty, u 1 , , u n . These n values of x i are to be formed into a consensus value, CV, using the figures of uncertainty. The usual model for the generation of the data states that (a) the measured value from the ith laboratory, x i , was drawn from a distribution with mean θ and standard deviation u i , and that (b) the n overall errors in the processes of generating the measured values were incurred independently. Then, the consensus value is given by the familiar inverse-variance-weighted mean
$$\mathrm{CV}_1 \equiv \frac{\sum_i x_i/u_i^2}{\sum_i 1/u_i^2}. \tag{3}$$
(Unless indicated otherwise, all summation in this paper is over i = 1, …, n.) Because the process that generated this estimate is regarded as being unbiased, it is appropriate to take the standard uncertainty of CV_1 to be the standard deviation of the combined estimator of θ, which is
$$u \equiv u_{\mathrm{CV}1} \equiv \Bigl(\sum_i 1/u_i^2\Bigr)^{-1/2}. \tag{4}$$
This model and the means of combination are broadly accepted, e.g., [4], and they form the basis of the procedure proposed here. (The quantity u CV 1 appears frequently in what follows, so we often represent this quantity using the simpler symbol u ).
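For concreteness, a small helper (hypothetical, not from the paper) that evaluates (3) and (4):

```python
import numpy as np

def standard_consensus(x, u):
    """Inverse-variance-weighted mean CV1 and its standard uncertainty, Equations (3)-(4)."""
    x, u = np.asarray(x, dtype=float), np.asarray(u, dtype=float)
    w = 1.0 / u**2                       # weights proportional to 1/u_i^2
    cv1 = np.sum(w * x) / np.sum(w)
    u_cv1 = 1.0 / np.sqrt(np.sum(w))
    return cv1, u_cv1

# e.g. standard_consensus([10.1, 9.9], [0.2, 0.3])
```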
Shrinkage estimation can potentially be applied to form an alternative consensus value if the true value θ has a nominal value or an existing estimate, x_nom. We simply apply the theory of Section 2 with ϕ = θ − x_nom and y_i = x_i − x_nom, and then we readjust by adding x_nom to the figure obtained. If |θ − x_nom| is sufficiently small in relation to a standard uncertainty u_i, then the contribution of the corresponding measured value x_i − x_nom to the overall mean square error in a weighted sum like CV_1 is reduced if we use a_i(x_i − x_nom) instead, provided that the shrinkage factor a_i is suitably chosen. Because each u_i is different, each optimal a_i might possibly be different. But it is important to realize that this step would not amount to changing or rejecting the data: it would merely be using the nominal value x_nom and the stated uncertainties {u_i} to conduct a more accurate combined measurement of θ.
The attributes of bias and mean square error are long-run properties of estimators (random variables), not strictly properties of the one-off estimates. Thus, it is helpful to express these ideas in terms of random variables rather than observed figures. Let X i be the random variable observed (realized) in the quantity x i . The model states that X i has a mean equal to the true value θ , has variance u i 2 , and is independent of X j . This can be written as
$$X_i \sim (\theta,\, u_i^2), \qquad X_i \perp X_j \ (i \ne j). \tag{5}$$
Here, u i is seen as a constant parameter of the measurement process; it is not seen as the outcome or observation of a random variable. The task is to choose weights { w i } and associated shrinkage factors { a i } to minimize the mean square error of the random variable
$$T \equiv \frac{\sum_i w_i a_i (X_i - x_\mathrm{nom})}{\sum_i w_i} + x_\mathrm{nom}.$$
This is to be achieved using the information available to us before studying the x i data, which comprises the model, the nominal value x nom and the set of standard uncertainties { u i } . Subsequently, the x i data are observed and we observe the realization of T, which is taken as the consensus value. Thus, the consensus value obtained by this approach is
$$\mathrm{CV}_2 \equiv \frac{\sum_i w_i a_i (x_i - x_\mathrm{nom})}{\sum_i w_i} + x_\mathrm{nom}. \tag{6}$$

3.1. Identifying the Consensus Value

Equation (6) shows that the consensus value CV_2 is determined by the n values of w_i a_i / Σ_j w_j in addition to the x_i data and the nominal value. But there are 2n values to be chosen in the combined set of weights and shrinkage factors {w_i, a_i}. Therefore, without any loss of flexibility, we can adopt the same weights that were used in the formation of the standard consensus value CV_1 and subsequently choose the n optimal values of a_i. Now, we set
$$w_i = \frac{1/u_i^2}{\sum_{j=1}^{n} 1/u_j^2},$$
which means that Σ_i w_i = 1 and that
$$w_i u_i^2 = u^2. \tag{7}$$
The figure CV 2 is the outcome of a random variable T that can be written as
$$T = \sum_i w_i a_i X_i - \Bigl(\sum_i w_i (a_i - 1)\Bigr) x_\mathrm{nom}$$
because Σ_i w_i = 1. The bias of the individual shrunken estimator a_i X_i is
$$\mathrm{E}[a_i X_i - \theta] = (a_i - 1)\,\theta,$$
(where E [ · ] denotes the expected value), so the bias of T is
$$\mathrm{bias}(T) = -(\theta - x_\mathrm{nom})\Bigl(1 - \sum_i w_i a_i\Bigr).$$
Also, the variance of T is
$$\mathrm{variance}(T) = \sum_i w_i^2 u_i^2 a_i^2 = u^2 \sum_i w_i a_i^2,$$
using (7). Therefore, the mean square error of T is
$$\Omega(T) = (\theta - x_\mathrm{nom})^2 \Bigl(1 - \sum_i w_i a_i\Bigr)^2 + u^2 \sum_i w_i a_i^2.$$
This is unknown because θ is unknown, but we wish to minimize it as best we can by a judicious choice of each a i .
Our approach is to state some known positive figure d that is regarded as an approximation to, or upper bound on, | θ x nom | and then to choose each a i to minimize the value of Ω ( T ) that would exist if | θ x nom | happened to be equal to d, which is
$$d^2 \Bigl(1 - \sum_i w_i a_i\Bigr)^2 + u^2 \sum_i w_i a_i^2.$$
Differentiating this with respect to each a_i, setting the results to zero and simplifying gives the n equations
$$a_i = \frac{d^2}{u^2}\Bigl(1 - \sum_{j=1}^{n} w_j a_j\Bigr), \qquad i = 1, \ldots, n.$$
The quantity on the right-hand sides of these equations does not depend on the index i, which implies that the optimal values for a_1, …, a_n are equal. Substituting a for a_i and a_j in these equations and solving gives
$$a_\mathrm{opt} \equiv \frac{d^2}{d^2 + u^2} \tag{8}$$
as the optimal choice of shrinkage factor for the specified value of d. Then, (6) implies that the associated consensus value is
$$\mathrm{CV}_2 = (1 - a_\mathrm{opt})\, x_\mathrm{nom} + a_\mathrm{opt}\, \mathrm{CV}_1 \tag{9}$$
$$\phantom{\mathrm{CV}_2} = x_\mathrm{nom} + a_\mathrm{opt}\bigl(\mathrm{CV}_1 - x_\mathrm{nom}\bigr). \tag{10}$$
Because 0 < a opt < 1 , we can see that the proposed consensus value CV 2 is a convex weighted sum of the nominal value x nom and the standard consensus value CV 1 . Given the data, the procedure is determined by specifying x nom and d, which must be done without being influenced by the x i figures because the procedure must be considered to be fully defined before the randomness modelled by (5) acts.
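In code, the whole proposal at this point reduces to two lines; the sketch below (function name hypothetical) takes CV_1 and u_CV1 from the standard analysis together with x_nom and d, and returns a_opt and CV_2 as in (8)–(10).

```python
def shrinkage_consensus(cv1, u_cv1, x_nom, d):
    """Shrunken consensus value from Equations (8)-(10)."""
    a_opt = d**2 / (d**2 + u_cv1**2)          # optimal shrinkage factor, Equation (8)
    cv2 = x_nom + a_opt * (cv1 - x_nom)       # convex combination of x_nom and CV1, Equation (10)
    return a_opt, cv2
```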
Several observations can be made about the appropriateness of CV 2 :
  • Each estimate has been shrunk using the same factor, a opt . This emphasizes further that no data are being adjusted.
  • If d = 0 , then the nominal value is being regarded as exact, in which case CV 2 = x nom .
  • As d → ∞, a_opt → 1 and so CV_2 → CV_1. Thus, as the quality of the prior information diminishes, the difference between the two consensus values diminishes.
  • As n → ∞, u² → 0 and so CV_2 → CV_1. As the prior information becomes dominated by the data, the consensus value responds accordingly.

3.2. Comparison of MSEs

Let us now compare the mean square error of the proposed procedure with the MSE of the procedure that results in CV_1, which is Ω_1 = u², from (4). Equation (9) shows that CV_2 arises as a weighted sum of the nominal value x_nom and the standard estimate CV_1. The term involving x_nom is a constant that contains bias, while the term involving CV_1 has not been subject to bias but has been subject to variance. The MSE of the proposed procedure is
$$\Omega_2 = (1 - a_\mathrm{opt})^2 (\theta - x_\mathrm{nom})^2 + a_\mathrm{opt}^2 u^2. \tag{11}$$
Define the dimensionless unknown λ ≡ |θ − x_nom|/u. (The quantity λ here is analogous to the quantity λ in Section 2.) We find that
$$\frac{\Omega_2}{\Omega_1} = (1 - a_\mathrm{opt})^2 \lambda^2 + a_\mathrm{opt}^2,$$
which echoes (1). Figure 2 shows the ratio Ω 2 / Ω 1 as a function of λ for several different values of a opt (unlike Figure 1, which shows the ratio as a function of a for several different values of λ ). The ratio of the mean square errors is smaller than one if and only if
$$\lambda^2 < \frac{1 + a_\mathrm{opt}}{1 - a_\mathrm{opt}}$$
and it can be as low as a opt 2 , which occurs when λ = 0 , i.e., when θ = x nom .
(The material in Section 2 implies that the minimum value of Ω 2 taken with respect to a opt at a fixed value of λ is λ 2 / ( λ 2 + 1 ) × u 2 . Accordingly, the relationship ‘ Ω 2 / Ω 1 = λ 2 / ( λ 2 + 1 ) ’, which is shown by the enveloping dotted line, describes the greatest lower bound to the family of curves that would be obtained using all values of a opt .)
Figure 3 shows the ratio of MSEs against the dimensionless quantity
$$\frac{d}{u} = \sqrt{\frac{a_\mathrm{opt}}{1 - a_\mathrm{opt}}}\,,$$
(whereas Figure 1 showed the ratio of MSEs against a). From this and from (2) and (8), we can see that for a fixed θ, the MSE is minimized when d = |θ − x_nom|, i.e., when d/u = λ, as might be expected. However, from (11), we find that there is a reduction in the MSE whenever we choose d such that
$$d^2 > \frac{(\theta - x_\mathrm{nom})^2 - u^2}{2},$$
which is guaranteed if we choose d > |θ − x_nom|/√2. Thus, for an improvement in MSE, it is not necessary for d to be an upper bound on |θ − x_nom|, and there is considerable room for misjudgement in assessing the value of d to represent |θ − x_nom|.
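A small numerical check of this behaviour (values chosen for illustration only): with θ − x_nom = 3u, the exact threshold from the displayed inequality is d = 2u, while the sufficient rule d > |θ − x_nom|/√2 ≈ 2.12u is slightly conservative.

```python
def mse_ratio(a_opt, lam):
    # Ratio Omega_2 / Omega_1 as a function of a_opt and lambda = |theta - x_nom| / u
    return (1 - a_opt) ** 2 * lam ** 2 + a_opt ** 2

u, delta = 1.0, 3.0                     # u and theta - x_nom, in the same (arbitrary) unit
for d in (1.0, 2.0, 2.2, 3.0, 5.0):
    a_opt = d**2 / (d**2 + u**2)        # Equation (8)
    improves = d**2 > (delta**2 - u**2) / 2
    print(f"d = {d:3.1f}   Omega2/Omega1 = {mse_ratio(a_opt, delta / u):.3f}   improvement: {improves}")
```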

3.3. Standard Uncertainty of the Consensus Value

In an unbiased measurement, the square of the standard uncertainty is to act as an estimate of the variance in the measurement procedure. But when a measurement is potentially biased, any single figure of uncertainty must also include a component relating to the bias. Therefore, let us now consider how to express the measurement uncertainty when CV 2 is used as the combined estimate of θ .

3.3.1. Propagation of Mean Square Error

To find the appropriate representation of uncertainty, we consider the indirect measurement of θ ≡ Σ_{k=1}^m θ_k by the summation of biased measured values of each component θ_k. We can write X_k = θ_k + β_k + E_{k,ran}, where X_k is the estimator of θ_k, β_k is the bias, and E_{k,ran} is the random variable for the corresponding random error. The standard estimator of θ is X ≡ Σ_{k=1}^m X_k, and this has an overall bias Σ_{k=1}^m β_k. If the contributing biases have different signs, then there is some cancellation and the overall bias does not have its “worst-case” magnitude. Suppose that the signs of the contributing biases are random and that we can regard the generation of β_k as being the realization of some random variable B_k with mean zero and some variance σ²(B_k). Then, Σ_{k=1}^m β_k² is the realization of the variable Σ_{k=1}^m B_k², which has mean Σ_{k=1}^m σ²(B_k), while (Σ_{k=1}^m β_k)² is the realization of the variable (Σ_{k=1}^m B_k)², which also has mean Σ_{k=1}^m σ²(B_k) because the random nature of the signs implies that the expected value of the product B_{k1} B_{k2} is zero for k1 ≠ k2. In other words, the effect of the sum of biases in the overall error is represented, on average, by the sum of the squares of the biases. We see that, on average, the squares of the biases propagate along a chain of measurement in just the same manner as the variances of the random errors. It follows that we can also consider MSEs to propagate additively along a long chain of unrelated measurements. Thus, if a single figure is to be used to describe the size of the potential error in a biased measurement, then assuming that the accurate propagation of error or uncertainty is the objective, the figure should be the square root of the MSE, not the standard deviation, the two being equal when there is no bias.
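A minimal simulation of this argument (the normal distribution and the σ values are assumptions of the sketch only): when independent zero-mean biases B_k have random signs, the mean of (Σ B_k)² agrees with Σ σ²(B_k), so squared biases propagate, on average, like variances.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmas = np.array([0.3, 0.7, 1.2, 0.5])                  # hypothetical sigma(B_k) values
B = rng.normal(0.0, sigmas, size=(1_000_000, sigmas.size))

print("mean of (sum_k B_k)^2 :", round(float(np.mean(B.sum(axis=1) ** 2)), 3))  # about 2.27
print("sum of sigma(B_k)^2   :", round(float(np.sum(sigmas ** 2)), 3))          # 2.27
```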
This conclusion is in keeping with the concept of Type B analysis endorsed in CIPM Recommendation INC-1 [5,6] and described more clearly in the parent report [7]. In Type B analysis, β_k is considered to be drawn from a population with mean zero and some known variance u_{k,sys}², i.e., to be the outcome of a random variable with mean zero and variance u_{k,sys}². The overall variance attributed to X_k is u_k² = u_{k,sys}² + u_{k,ran}², as in (5). We can interpret u_{k,sys}² as being an estimate of the square of the bias β_k, in which case u_k² is just an estimate of what, in the years preceding the acceptance of the Type B evaluation, would have been called mean square error. Thus, known variances are acting for unknown biases within the uncertainty analysis, and, in effect, all bias is modelled out of the measurement.

3.3.2. Analysis of Mean Square Error

Let us return to our context where we have n measurement results {(x_i, u_i)} that are to be combined to form a consensus value. Each of the measurements is subject to a Type B evaluation, so individually, each is modelled as an unbiased measurement, as in (5). But now, we have deliberately introduced an unknown bias (1 − a_opt)(θ − x_nom) into the combination process through the shrinkage factor. Following the argument above, we wish to calculate the square root of a suitable estimate of the MSE given in (11) and to state this as the “standard uncertainty” of CV_2. It seems clear that this estimate of the MSE is to have the form
$$(1 - a_\mathrm{opt})^2 (k d)^2 + a_\mathrm{opt}^2 u^2$$
for some as yet unspecified multiplier k. The choice of k depends upon how we interpret d and upon our attitude to the idea that the uncertainty analysis is to err on the side of conservatism. If we see d² as being an unbiased estimate of (θ − x_nom)², then we would set k = 1, while if we see d as an upper bound on |θ − x_nom|, then we would perhaps set k = 1 and so overstate the measurement uncertainty for conservatism. However, we might choose a larger value of k for conservatism in other circumstances. In general, we suggest setting k = 1, so that the standard uncertainty stated with the proposed consensus value CV_2 is
$$u_{\mathrm{CV}2} \equiv \sqrt{(1 - a_\mathrm{opt})^2 d^2 + a_\mathrm{opt}^2 u^2} = \sqrt{a_\mathrm{opt}}\; u_{\mathrm{CV}1}. \tag{12}$$
Then, the standard uncertainty associated with CV_2 is no greater than the standard uncertainty associated with CV_1, which accords with the idea that we have made use of more information in obtaining the alternative estimate. (When n = 1 we obtain the situation of a single measurement in Section 2. Equation (12) then justifies our use of √a · u ≈ 19 as the standard uncertainty in the example of Section 2.)
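The algebraic identity behind (12) is easy to verify numerically; the values below are arbitrary.

```python
import math

# Check that sqrt((1 - a_opt)^2 d^2 + a_opt^2 u^2) equals sqrt(a_opt) * u
# when a_opt = d^2 / (d^2 + u^2), as used in Equation (12).
for d, u in [(1.0, 0.7), (60.0, 20.0), (2.5, 3.0)]:
    a = d**2 / (d**2 + u**2)
    lhs = math.sqrt((1 - a) ** 2 * d**2 + a**2 * u**2)
    rhs = math.sqrt(a) * u
    print(f"d = {d}, u = {u}:  {lhs:.6f}  vs  {rhs:.6f}")
```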

3.3.3. Another Derivation—And a Potential Misconception

We can reach the same expression for the uncertainty in CV_2 by a different, perhaps faulty, argument. Dividing both numerator and denominator in (8) by d²u² shows that
$$a_\mathrm{opt} = \frac{\sum_i 1/u_i^2}{\sum_i 1/u_i^2 + 1/d^2}.$$
Then, using (9), we can write
$$\mathrm{CV}_2 = \frac{\sum_i x_i/u_i^2 + x_\mathrm{nom}/d^2}{\sum_i 1/u_i^2 + 1/d^2}. \tag{13}$$
In this expression, the quantity d² appears as if it were the variance underlying an additional “observation”, x_nom, so the figure CV_2 is the figure that would be obtained using the usual method of analysis if x_nom were an additional observation with standard uncertainty d. This can encourage us to think that the squared standard uncertainty to state with CV_2 should be
$$u_{\mathrm{CV}2}^2 = \frac{1}{1/d^2 + \sum_i 1/u_i^2} = a_\mathrm{opt}\, u_{\mathrm{CV}1}^2,$$
which is (12). This might appear to be a simpler derivation of (12), but the logic is questionable. The argument treats x nom as if it varies around θ over repeated measurements like the other observations, but we wish to estimate the MSE with x nom fixed, so it does not seem advisable to rely upon this derivation of (12).
The form of (13) shows again that shrinking the measured values towards the nominal value does not correspond to modifying the data.
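The numerical equivalence of the two routes to CV_2 is easily confirmed; the data below are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

x = np.array([10.2, 9.8, 10.5])        # hypothetical measured values
u = np.array([0.3, 0.2, 0.4])          # their standard uncertainties
x_nom, d = 10.0, 0.25                  # hypothetical nominal value and bound

w = 1.0 / u**2
cv1 = np.sum(w * x) / np.sum(w)
u_cv1 = 1.0 / np.sqrt(np.sum(w))
a_opt = d**2 / (d**2 + u_cv1**2)

cv2_shrunk = x_nom + a_opt * (cv1 - x_nom)                              # Equation (10)
cv2_pseudo = (np.sum(w * x) + x_nom / d**2) / (np.sum(w) + 1 / d**2)    # Equation (13)
print(cv2_shrunk, cv2_pseudo)          # identical figures
```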

3.4. Example 2

Imagine that an artefact with unknown true value θ is circulated for measurement in a comparison. Suppose that the laboratory given the task of analysing the comparison data has available a previous independent estimate x_nom = 1000 and that the laboratory is confident that 999 ≤ θ ≤ 1001. Thus, the laboratory sets d = 1. Suppose that, after the comparison data are received, it is decided to use the set of n = 6 data pairs {(x_1, u_1), …, (x_6, u_6)} indicated in Table 1 to calculate the consensus value. For the standard method, Equations (3) and (4) give
CV_1 = 1000.32, u ≡ u_CV1 = 0.70,
and for the shrinkage estimation, (8), (9) and (12) subsequently give
a_opt = 0.67, CV_2 = 1000.21, u_CV2 = 0.57.
Because d is greater than u in (8), the optimal shrinkage factor is greater than 0.5, and CV 2 is closer to CV 1 than to x nom . Also, in accordance with the idea that CV 2 is derived using more information than CV 1 , the standard uncertainty of CV 2 is smaller than the standard uncertainty of CV 1 .
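These figures can be reproduced directly from the Table 1 data; the following sketch applies Equations (3), (4), (8), (10) and (12) in turn.

```python
import numpy as np

x = np.array([1003.2, 997.9, 1004.2, 1003.9, 998.8, 1004.8])   # Table 1, x_i
u = np.array([2.1, 1.4, 2.3, 2.7, 1.1, 2.7])                   # Table 1, u_i
x_nom, d = 1000.0, 1.0

w = 1.0 / u**2
cv1 = np.sum(w * x) / np.sum(w)             # Equation (3):  1000.32
u_cv1 = 1.0 / np.sqrt(np.sum(w))            # Equation (4):  0.70
a_opt = d**2 / (d**2 + u_cv1**2)            # Equation (8):  0.67
cv2 = x_nom + a_opt * (cv1 - x_nom)         # Equation (10): 1000.21
u_cv2 = np.sqrt(a_opt) * u_cv1              # Equation (12): 0.57

print(round(cv1, 2), round(u_cv1, 2), round(a_opt, 2), round(cv2, 2), round(u_cv2, 2))
```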

4. Compatibility of the Nominal Value with the Data

The method involves a trade-off between bias and variance, with the bias being proportional to the absolute value of the difference between x_nom and θ. We can prevent an unrealistic value of x_nom from having excessive effect by assessing its compatibility with the data. If the data are collectively inconsistent with x_nom, then the prior information represented by the use of x_nom can be disregarded and there is no harm to our results. This principle applies in the situation of Section 2, where we would compare the value (y − 0) against u to assess whether zero was a feasible value for ϕ, and it also applies in the context of the measurement comparison in Section 3.
Consider the analysis in Section 3. If |CV_1 − x_nom| > 2√((hd)² + u²) for some appropriate multiplier h, then we have statistical evidence at the 0.05 level to reject the idea that x_nom and d are compatible with the data, in which case we can discard the prior information as being unreliable. In that situation, we quote the standard results CV_1 and u_CV1 instead of CV_2 and u_CV2. Thus, CV_2 is preferred to CV_1 only when d is large enough for x_nom to be reliable.
Because d is intended to be an estimate or overestimate of |θ − x_nom|, we can set h = 1. The relevant condition becomes
$$\text{condition:}\quad \left|\mathrm{CV}_1 - x_\mathrm{nom}\right| \le 2\sqrt{d^2 + u_{\mathrm{CV}1}^2},$$
and the final recommended figures CV and u_CV are then given by the expressions
$$\mathrm{CV} = \begin{cases} \mathrm{CV}_2 & \text{condition} = \text{TRUE} \\ \mathrm{CV}_1 & \text{condition} = \text{FALSE} \end{cases}$$
and
$$u_\mathrm{CV} = \begin{cases} u_{\mathrm{CV}2} & \text{condition} = \text{TRUE} \\ u_{\mathrm{CV}1} & \text{condition} = \text{FALSE.} \end{cases}$$
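A sketch of the complete decision rule (function name hypothetical), combining Equations (8), (10) and (12) with the compatibility condition above:

```python
import math

def recommended_consensus(cv1, u_cv1, x_nom, d):
    """Return (CV, u_CV): the shrinkage results if the condition holds, else the standard results."""
    a_opt = d**2 / (d**2 + u_cv1**2)
    cv2 = x_nom + a_opt * (cv1 - x_nom)
    u_cv2 = math.sqrt(a_opt) * u_cv1
    condition = abs(cv1 - x_nom) <= 2.0 * math.sqrt(d**2 + u_cv1**2)
    return (cv2, u_cv2) if condition else (cv1, u_cv1)

# With the Example 2 figures the condition holds, so the shrinkage results are returned.
print(recommended_consensus(1000.32, 0.70, 1000.0, 1.0))    # approximately (1000.21, 0.57)
```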

Example 2—Continued

We find from the statistics calculated in Section 3.4 that the condition is met, so d is large enough for the perceived reliability of x nom to be acceptable. Therefore, we set the final figures to be those that were obtained in the shrinkage estimation, i.e., we set CV = 1000.21 and u CV = 0.57 .
Let us consider again the data in Table 1, and now, let us examine the behaviour of the different estimates if x nom and d are hypothetically permitted to vary. Figure 4 shows the values of CV1, CV2 and CV as functions of d for integer values of x nom from 998 to 1002. We see that for x nom = 998 and x nom = 1002 , there are points of transition where the condition moves from TRUE to FALSE as d is reduced. The point corresponding to the settings of x nom and d in our example is marked.

5. Discussion

This section presents self-contained pieces of discussion, the first three subsections relating only to the material in Section 3.

5.1. Redundancy in the Model

The redundancy that exists in the determination of the 2n weights and shrinkage factors might make the problem appear poorly defined. However, it must be recognized that w_i and a_i are just quantities invented to minimize the MSE, and their actual values are unimportant. We could potentially set w_1 = w_2 = w_3 = ⋯ = 1 and find the corresponding optimal value of each a_i, but we would then have obtained the same value of CV_2 that is given by (9). Thus, we can accept a solution with the redundancy described. Our choice to set w_i equal to the weight of x_i in CV_1 facilitates the task of finding a solution and permits a simple demonstration of the result.

5.2. Statistical Validity

The values of x_nom and d are chosen by the party conducting the analysis, which presumably is the “pilot laboratory”. These figures represent an opinion, perhaps a consensus of opinions, and they would potentially be different if the pilot laboratory were different. Therefore, it might be thought that there is something too subjective about this method. However, whichever figures are chosen for x_nom and d, the method gives a legitimate estimate of θ and a corresponding legitimate statement of standard uncertainty. In other words, these figures affect the numerical result but they do not affect the statistical validity of that result, which is a concept that relates to the reliability of the statement of uncertainty, i.e., the level of confidence we can have that the interval with limits CV_2 ± k·u_CV2 encloses θ. We are using a subjective opinion in the design of the procedure, which is a sensible thing to do, but we are not using it in reporting the reliability of the result, which would be incorrect in a classical analysis. We are using the prior opinion to engineer a solution, not in the formal inference. (The practice of statistical engineering, i.e., the construction of some algorithm to perform as a tool, is to be differentiated from the practice of statistical inference, i.e., the statement of a conclusion about the real world with a justifiable level of assurance such as 95%).
If the subjective nature of the choice of d remains troubling, then the reader might consider three other points. The problematic term “uncertainty” itself implies the existence of subjectivity; otherwise, the relevant term would just be “variability”. And subjectivity is ubiquitous in a Type B analysis of uncertainty, yet that practice is accepted. Moreover, the existence of x nom is external information that should be employed somehow; otherwise, we are not making best use of all that we are given.
Reproducibility and transparency are also important. Provided that x nom and d are reported, the same results will be obtained by another analyst, and provided that it is acknowledged that these values were identified without regard to the x i values, the method is transparent.

5.3. Exclusive Reference Values

Up to this point, we have not distinguished the idea of a consensus estimate of θ from the idea of a reference value for a contribution such as ( x k , u k ) . There is a compelling rationale for forming a reference value for x k from the set of observations obtained when ( x k , u k ) is removed, in what has been called an “exclusive” analysis. Accordingly, the calculation of this reference value just requires a simple modification to the procedure: the observation ( x k , u k ) is removed and the remaining observations are renumbered. Subsequently, the calculation of an E n value or a “degree of equivalence” for ( x k , u k ) can proceed as normal using this exclusive reference value and its standard uncertainty.

5.4. Applicability in Metrology

Shrinkage estimation is only beneficial when the nominal value lies within a few experimental standard deviations of the true value, which is unlikely in many measurements. Therefore, we make no claim that the method is to be applied generally. On the other hand, if the value of x_nom proposed is distant from the true value, then the additional method of Section 4 will prevent it from having a detrimental effect on the result, so the combined method might be regarded as being applicable in every situation but as being beneficial only in some.

5.5. Shrinkage Estimation

The type of shrinkage estimation described in this paper was proposed by Thompson [1]. One of the problems he considered relates to the archetypal form of a Type A evaluation of measurement uncertainty, where a sample of size n is used to estimate the mean θ of a normal distribution with unknown variance σ² [6]. The sample-mean random variable is X̄ ≡ Σ_{k=1}^n X_k/n and the sample-variance random variable is S² ≡ Σ_{k=1}^n (X_k − X̄)²/(n − 1). The variable X̄ is known to be the unbiased estimator of θ with minimum mean square error, which is σ²/n. However, this fact does not preclude the possibility that there is a biased estimator with smaller mean square error. In fact, the estimator αX̄ with α ≡ θ²/(θ² + σ²/n) has mean square error ασ²/n and has the smallest mean square error among all fixed multiples of X̄. The optimal multiplier α is a positive value less than one, so the estimate is biased and is shrunk toward the origin. The multiplier α is unknown but can be approximated if we replace θ and σ² by their unbiased estimators, X̄ and S². This gives the estimator
$$\frac{\bar X^2}{\bar X^2 + S^2/n} \times \bar X,$$
in which the shrinkage factor is now a random variable. This shrinkage estimator has a lower mean square error than X̄ when |θ| is less than or comparable to σ/√n but has an increased mean square error otherwise. Sometimes, there is a nominal value x_nom for θ and occasionally, there will be reason to believe that |θ − x_nom| is less than or comparable to the standard error of the sample mean, σ/√n. The value x_nom acts as a new origin for the analysis, and the corresponding shrinkage estimator of θ is
$$\tilde X \equiv x_\mathrm{nom} + \left[1 + \frac{S^2}{n\,(\bar X - x_\mathrm{nom})^2}\right]^{-1}(\bar X - x_\mathrm{nom}),$$
analogously to (8) and (10). This estimator has a lower mean square error than X̄ when |θ − x_nom| is less than or comparable to σ/√n.
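A sketch of this estimator in Python (function name hypothetical; the normal sample is illustrative only):

```python
import numpy as np

def thompson_shrinkage(sample, x_nom=0.0):
    """Shrink the sample mean towards x_nom using the data-based factor quoted above."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()
    s2 = sample.var(ddof=1)                       # S^2, the sample variance
    delta = x_bar - x_nom
    factor = delta**2 / (delta**2 + s2 / n)       # equals [1 + S^2/(n*delta^2)]^(-1)
    return x_nom + factor * delta

rng = np.random.default_rng(3)
sample = rng.normal(0.2, 1.0, size=10)            # true mean close to the nominal value 0
print(sample.mean(), thompson_shrinkage(sample, x_nom=0.0))
```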
Thus, the concept of shrinkage estimation could also find application in a Type A evaluation of measurement uncertainty. As is explained more fully in Section 5.6, the fact that the bias would be overtly introduced into the measurement process would be unimportant because there would already be bias from the existence of the systematic effects that were treated, for convenience, as variances in a Type B evaluation.
The idea of shrinkage estimation that we have been discussing is general, and the principle has been called “shrinkage in the direct sense” [2]. To some extent, it is exemplified in the improved estimation of parameters of probability distributions [8,9,10]. The concept of moving a raw result toward a nominal value also features in a “shrinkage confidence interval” for estimating the mean of a normal distribution [11]. However, the context in which the term “shrinkage estimation” is encountered in statistics is often more specific, the relevant objective typically being the estimation of the mean of a multivariate normal distribution, e.g., [12,13]. Thus, a reader searching for the term “shrinkage estimation” might obtain many irrelevant results.

5.6. Shrinkage, Bias and Type B Evaluation

Shrinkage estimation introduces bias into the experimental part of the measurement in order to lower the MSE. Some readers might not like the idea that the experiment has become biased, but that would be to forget the meaning of a Type B evaluation of uncertainty. Let us consider this using a simple illustration. Suppose that the measurand is the length θ of a rod at a fixed temperature, and that θ is to be measured by comparison with the length θ_1 of a similar standard rod at that temperature. Suppose that the length of the standard rod has measured value y_1 and associated standard uncertainty u(y_1). Then, θ = θ_1 + θ_2, where θ_2 is the difference measured using some comparator. The difference θ_2 is estimated several times in a statistical process, and the results are averaged to form its measured value y_2 and standard uncertainty u(y_2) using the familiar concepts of Type A evaluation. The measured value of θ is then defined to be y ≡ y_1 + y_2, and the corresponding standard uncertainty is stated to be u(y) ≡ √(u²(y_1) + u²(y_2)). In this simple situation, the difference y_1 − θ_1 is an unknown error whose value does not change from experiment to experiment. It is a bias, and the fact that it is represented by a variance in the uncertainty analysis does not change that. Such systematic errors are ubiquitous in practical measurement procedures.
The point being made is that in metrology, the shrinkage procedure would not be turning an unbiased procedure into a biased one; rather, it would just be altering the existing bias, perhaps even reducing it because of a cancellation effect. There is bias in the measurement before the shrinkage procedure is carried out, but it is a bias that has been modelled out of existence by the metrologist’s device of a Type B evaluation of uncertainty. It follows that if metrologists are comfortable with stating the limits of an expanded interval of uncertainty as x ± k u when a typical Type B evaluation has been involved, then they should also be comfortable with stating the limits as x ± k u when x and u are the results after shrinkage estimation has been applied.
A Type B evaluation is a procedure that is unfamiliar to statisticians. It makes much of statistical theory irrelevant: the familiar idea of bias is replaced by one of variance. The statistical theory of measurement becomes compromised by the acceptance, albeit the necessary acceptance, of a Type B evaluation as a pragmatic solution to a long-standing problem [7]. The statistical rules change, and old tools and ideas become distractions. In an attempt to come to terms with this, we ask the question “Is the measurement as a whole biased or is it just the experimental part of the measurement that is biased? Equivalently, do we see the establishment of the laboratory procedure and the calibration of equipment as being part of the measurement that we are referring to, or do they precede this measurement?” Consider our simple example with the rods. If we conceive of the measurement as beginning before we obtain the estimate y_1 of the length of the standard rod θ_1 (i.e., before the calibration of the equipment), then, because the potential value of the error y_1 − θ_1 is modelled using a variance, we must see the final measured value y as being the outcome of an unbiased procedure, in which case the whole measurement of θ is unbiased. But if we only see the measurement as commencing when we subsequently begin to obtain the estimate y_2 of θ_2, then our measurement must be regarded as being biased because there is a pre-existing constant error y_1 − θ_1 from the earlier calibration. Therefore, the terms “bias” and “measurement” must be used together carefully if metrologists are to accurately bring statistical ideas into their work.
One reasonable answer to this question of whether “measurement” includes the preliminary steps involves the idea that u ( y 1 ) is called a component of “measurement uncertainty”. If the terminology is acceptable, then the answer must be that a measurement is the whole process; otherwise, the generation of the error y 1 θ 1 would not be part of the measurement and, presumably, the term u ( y 1 ) could not legitimately be referred to as part of the measurement uncertainty. Therefore, a logical solution to this problem of communication is to regard measurement as being an unbiased procedure (by definition).

5.7. The Concept of “Measurement”

It is fair to suggest that, in this proposal, the nominal value is being treated as part of the measurement procedure rather than as something external to it, so it might be thought that the proposal is challenging the concept of measurement. Whether that is the case depends on what the term “measurement” means to you and how strictly you interpret it. However, as now explained, prior information such as a nominal value is already being used inside the measurement process in the approach to measurement described in supplements (GUM-S) [14,15] to the Guide to the Expression of Uncertainty in Measurement (GUM) [6]. Consider the situation where the measurand is θ = λ² and the input quantity λ is measured directly, and suppose that x and u_x are the measured value of λ and the associated standard uncertainty. If the original GUM formulation is applied, then the measured value of θ is y = x². However, if the “Bayesian” approach advocated in GUM-S is adopted, then x and u_x become the mean and standard deviation of the (posterior) distribution attributed to λ, and the mean of the resulting distribution attributed to θ is x² + u_x². Thus, the measured value is instead y = x² + u_x². An unstated step in this Bayesian analysis is the prior attribution of a uniform distribution to λ to represent minimal prior information about it. This accords with the fact that a fundamental part of a Bayesian analysis is the attribution of a prior distribution to each unknown to represent whatever prior information or belief there is about it, such as the existence of a physical bound. From this, we infer that the approach to data analysis adopted in GUM-S has implicitly accepted the use of prior information within the measurement itself. The statistical validity of any Bayesian analysis is linked to the acceptability of the subjective prior distributions, which is problematic. Also, the basic principle of attributing a continuous probability distribution to a measurand has recently been shown to lead to an internal contradiction [3]. In contrast, the proposed shrinkage-estimation procedure is a classical way to make use of prior information while maintaining statistical validity.
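A minimal Monte Carlo illustration of the GUM-S figure quoted above (the normal shape and the values x = 2.0, u_x = 0.3 are assumptions of this sketch; the identity E[λ²] = x² + u_x² holds for any distribution with that mean and standard deviation):

```python
import numpy as np

rng = np.random.default_rng(7)
x, u_x = 2.0, 0.3
lam = rng.normal(x, u_x, size=2_000_000)   # distribution attributed to the input quantity
print(round(float(np.mean(lam**2)), 3))    # about 4.09
print(x**2 + u_x**2)                       # 4.09
```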

6. Conclusions

A nominal value (or an existing estimate) for a measurand is a piece of prior information that can justifiably be utilized in a classical, frequentist, statistical analysis. The concept of “shrinkage estimation” refers to the movement of a raw estimate towards this nominal value. The mean square error of measurement can be reduced if the nominal value differs from the true value by an amount smaller than or comparable to the standard error of the raw estimator. The technique can be applied in a single measurement, where the nominal value might be zero, but it can also be applied when calculating a consensus value in a measurement comparison, where there might be external knowledge of the true value of the artefact being circulated. The shrinkage incurs some bias, but the experimental procedure is biased anyway by the systematic effects that act as inputs to that procedure, even though such effects are treated as known variances in a Type B evaluation. The modern analysis of measurement uncertainty is seen to relate to the propagation of the mean square error, not the propagation of the error variance, and shrinkage estimation can potentially be used to reduce this mean square error.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Thompson, R. Some shrinkage techniques for estimating the mean. J. Am. Stat. Assoc. 1968, 63, 113–122. [Google Scholar] [CrossRef]
  2. Lemmer, H. Shrinkage estimators. In Encyclopedia of Statistical Sciences; Johnson, N.L., Kotz, S., Read, C.B., Eds.; Wiley: Hoboken, NJ, USA, 1988; Volume 8, pp. 452–456. [Google Scholar]
  3. Willink, R. On the role of probability in science, analytical measurement and QUAM. Accredit. Qual. Assur. 2025, 30, 245–252. [Google Scholar] [CrossRef]
  4. Cox, G. The evaluation of key comparison data. Metrologia 2002, 39, 589–595. [Google Scholar] [CrossRef]
  5. Giacomo, P. News from the BIPM. Metrologia 1981, 17, 69–74. [Google Scholar] [CrossRef]
  6. Joint Committee for Guides in Metrology. Guide to the Expression of Uncertainty in Measurement; Joint Committee for Guides in Metrology: Sèvres, France, 1995. [Google Scholar]
  7. Kaarls, R. Report of the BIPM Working Group on the Statement of Uncertainties (1st Meeting—21 to 23 October 1980) to the Comite International des Poids et Mesures; BIPM: Sèvres, France, 1980. [Google Scholar]
  8. Vishwakarma, G.K.; Gupta, S. Shrinkage estimator for scale parameter of gamma distribution. Commun. Stat. Simul. Comput. 2022, 51, 3073–3080. [Google Scholar] [CrossRef]
  9. Gupta, S.; Vishwakarma, G.K.; Elsawah, A.M. Shrinkage estimation for location and scale parameters of logistic distribution under record values. Ann. Data Sci. 2024, 11, 1209–1224. [Google Scholar] [CrossRef]
  10. Ghazani, Z.S. Shrinkage estimators for shape parameter of Gompertz distribution. J. Stat. Theory Pract. 2024, 18, 15. [Google Scholar] [CrossRef]
  11. Willink, R. Shrinkage confidence intervals for the normal mean: Using a guess for greater efficiency. Can. J. Stat. 2008, 36, 623–637. [Google Scholar] [CrossRef]
  12. Fourdrinier, D.; Strawderman, W.E.; Wells, M.T. Shrinkage Estimation, 1st ed.; Springer: Cham, Switzerland, 2018. [Google Scholar]
  13. Tsukuma, H.; Kubokawa, T. Shrinkage Estimation for Mean and Covariance Matrices, 1st ed.; JSS Research Series in Statistics; Springer: Singapore, 2020. [Google Scholar]
  14. Joint Committee for Guides in Metrology. JCGM 101:2008 Evaluation of Measurement Data—Supplement 1 to the “Guide to the Expression of Uncertainty in Measurement”—Propagation of Distributions Using a Monte Carlo Method. 2008. Available online: https://www.bipm.org/en/doi/10.59161/JCGM101-2008 (accessed on 27 January 2026).
  15. Joint Committee for Guides in Metrology. JCGM 102:2011 Evaluation of Measurement Data—Supplement 2 to the “Guide to the Expression of Uncertainty in Measurement”—Extension to Any Number of Output Quantities. 2011. Available online: https://www.bipm.org/en/doi/10.59161/JCGM102-2011 (accessed on 27 January 2026).
Figure 1. The ratio of mean square errors Ω / u 2 (solid curves) and the function “ ratio = a ” (dotted line).
Figure 2. Ratio of mean square errors Ω_2/Ω_1 as a function of λ ≡ |θ − x_nom|/u for different values of the optimal shrinkage factor a_opt. Dotted line—enveloping function (see text).
Figure 3. Ratio of mean square errors Ω_2/Ω_1 as a function of d/u for different values of λ ≡ |θ − x_nom|/u.
Figure 4. Behaviour of the estimates for the example dataset against d for the values of x nom indicated. Solid line—CV; dotted line—CV2; dashed line—CV1, which is equal to 1000.32 for all d and all x nom . The marker indicates the value of CV 2 and CV with x nom = 1000 and d = 1 , as in the example.
Table 1. Example dataset for combination into a consensus value.
i      1       2       3       4       5       6
x_i    1003.2  997.9   1004.2  1003.9  998.8   1004.8
u_i    2.1     1.4     2.3     2.7     1.1     2.7

