Distinguishing Constant and Variable Bias in Systematic Error: A New Error Model for Metrology and Clinical Laboratory Quality Control

Vandra, Atilla Barna; Drégelyi-Kiss, Ágota

doi:10.3390/metrology5040067


by Atilla Barna Vandra ¹,† and Ágota Drégelyi-Kiss ²,*

¹ Spitalul Clinic Județean de Urgență Brasov, Brasov County University Hospital for Urgencies, 500326 Brasov, Romania
² Bánki Donát Faculty of Mechanical and Safety Engineering, Óbuda University, 1034 Budapest, Hungary
* Author to whom correspondence should be addressed.
† Retired.

Metrology 2025, 5(4), 67; https://doi.org/10.3390/metrology5040067

Submission received: 22 September 2025 / Revised: 27 October 2025 / Accepted: 3 November 2025 / Published: 5 November 2025

(This article belongs to the Collection Measurement Uncertainty)


Abstract

This study presents a novel error model that distinguishes between constant and variable components of systematic error (bias) in measurement systems, particularly within clinical laboratory settings. Traditional approaches often conflate these components, resulting in miscalculations of total error and measurement uncertainty. Through mathematical deduction and computer simulations, the authors demonstrate that the standard deviation derived from long-term quality control (QC) data includes both random error and the variable bias component, challenging its use as a sole estimator of random error. The proposed model defines the constant component of systematic error (CCSE) as a correctable term, while the variable component (VCSE(t)) behaves as a time-dependent function that cannot be efficiently corrected. The study further reveals that long-term QC data are not normally distributed, contradicting prevailing assumptions in metrology. It advocates for revised definitions in the International Vocabulary of Metrology (VIM3), emphasizing the need to distinguish between bias types determined under different measurement conditions. By applying this refined model, laboratories can enhance decision-making accuracy and more accurately estimate measurement error and uncertainty. The findings have implications beyond clinical laboratories, suggesting a paradigm shift in how systematic error is conceptualized and managed across all domains of metrology.

Keywords:

normal distribution; standard deviation; random error; bias; systematic error; measurement uncertainty; unpredictable; variable component of the systematic error; repeatability conditions; reproducibility within laboratory conditions

1. Introduction

The instruments and measuring systems must be calibrated to link the directly measured signal to the measured quantity. Calibration is a measurement, subject to errors [1]. The need for periodic quality control (QC) is evidence of the time variability of the measuring system. This study focuses on QC decisions, and only measurement errors can be quantified in the QC. Therefore, only these will be discussed.

Metrology—the science of measurement—was developed in conjunction with the concept of the normal (Gaussian) distribution. These mathematical tools were originally created to describe measurements taken in stable and unchanging environments—usually in systems that are not alive. The probability density function of the Gaussian distribution describes how the values of a continuous random variable are distributed, and it is strictly valid only when the conditions remain constant. However, clinical laboratory measurements involve biological materials and systems that are inherently variable. This makes the application of traditional metrological models more complex and requires careful consideration of the specific conditions under which measurements are made.

A significant contribution of the theory was that it enabled the distinction between systematic and random measurement error components. The total measurement error (TE) is the sum of the systematic error component (SE) and the random error component (RE). The criterion to distinguish the two components (SE and RE) in the International Vocabulary of Metrology (3rd Ed., VIM3) [2] 2.17 and 2.19 definitions is the predictability. The systematic measurement error component (SE) is either constant or varies predictably, while the random error component (RE) varies unpredictably across replicate measurements. The stress in the definition is on the “in replicate measurements” expression, which restricts the meaning of “varies unpredictably”. The neglect of the restriction is the source of several misinterpretations of the definition, as will be detailed in the study.

Unfortunately, σ and μ, the parameters of the probability density function of the Gaussian distribution, are ideal values; therefore, estimators are used instead. The mean of the measurements, $\bar{x}$, is the estimator of μ, the ideal mean. If and only if the conditions of the probability density function of the Gaussian distribution are strictly respected, the standard deviation (SD) is the estimator of σ, the ideal standard deviation, which is the measure of the dispersion of data around the mean caused by the random error (RE). The SD can be determined by making several measurements of the same quantity. Its value depends on the conditions in which the measurements are made. VIM3 [2] distinguishes three types of measurement conditions:

Repeatability conditions (VIM3 2.20 [2]—constant conditions: same measuring procedure, same operators, same measuring system, same location, and same operating conditions over a short period of time). The measured SD is abbreviated as $s_{r}$ .
Intermediate, also called reproducibility within laboratory conditions (VIM3 2.22 [2]—variable conditions in the laboratory, over an extended period of time, maintaining the same measurement procedure and the same location). The measured SD is abbreviated $s_{R W}$ .
(Full) reproducibility conditions (VIM3 2.24 [2], including all possible variations that may happen when moving and measuring the sample in a different laboratory, including different measuring procedures and different locations). The measured SD is abbreviated $s_{R}$ .

The full reproducibility conditions include variations that a specialist (an operator) cannot influence in their own laboratory; therefore, this study will not discuss them. However, with some adaptations, the conclusions of this study can also be applied to full reproducibility conditions.

This study starts by assuming four quintessential principles valid across all fields of metrology:

QP1: A parameter must be determined under the same conditions under which it is used in calculations and predictions. If the conditions differ, the results may not be valid [3,4].
QP2: When applying a law or using an equation, we assume that all conditions of applicability are fulfilled. (It may be an unconscious, hidden assumption if the compliance with the condition is not verified.) [2].
QP3: A corrective action, such as recalibration, cannot efficiently correct an error if the average error introduced by the corrective action is larger than the original error, or if the uncertainty of the value is greater than the error it is intended to fix [2,5].
QP4: Adding a constant (as a correction, GUM B.2.23 [2,5]) or multiplying it by a constant (as a correction factor, GUM B.2.24 [2,5]) to a function does not reduce the variability of that function. This means that such corrections cannot eliminate the natural variation present in the measurements.

Quality control (QC) in clinical laboratories has unique characteristics compared to other fields that use metrology. The samples measured in clinical labs are biological materials, which are highly complex and contain proteins. These samples, along with reference materials and reagents, are often unstable. This instability leads to significant variation in measurement errors. Additionally, the technology used in the production of the control materials can alter the structure of the proteins. This change causes what is known as matrix errors. Because of this, the bias measured during internal QC in clinical laboratories is only accepted as a relative value—meaning it is compared to the target value of the control material. The true or “absolute bias value” is determined through the external quality assessment (EQA).

Reliable quality control is not possible without a solid theoretical foundation. This foundation must be built on clear definitions and well-established conditions. In the context of clinical laboratory QC, we assume the following facts:

Despite the effort to correct them, most of the monthly mean deviations of the control results from the target values are not insignificant, suggesting their incorrigibility. In the clinical laboratory, half of these relative biases are around 1 × $s_{R W}$, that is, one standard deviation (SD) measured in intermediate (reproducibility within laboratory) conditions [1]. (VIM3 suggests the term “intermediate conditions”, but older literature uses “reproducibility within laboratory conditions”; the abbreviation comes from the latter.)
The laws of the normal distribution would predict hundreds of warnings and alarms if the bias is 1 × SD, which are not observed. Instead, several ‘impossible’ QC graphs (e.g., no value beyond the 2 × SD limit) are experienced in practice [1], suggesting an overestimation of the decision limits based on $s_{R W}$ [6].
The time variability of the systematic error (bias) has been known since the beginning of metrology [7,8]. Despite the known variability, it is not usual practice to distinguish between biases measured in repeatability conditions and those measured in intermediate (reproducibility within laboratory) conditions, as is done in the case of standard deviations (SD).
The variability of the aforementioned relative biases contradicts the predictions based on Student’s t distribution tables [9]. The variability of the biases measured in external quality assessment (EQA) is even bigger [10].
Calibration is a measurement, subject to errors [1].
In normal laboratory conditions (which must be guaranteed), i.e., air-conditioned clinical laboratory (adequate temperature), using an automated analyzer, well-trained personnel, adequate deionized water quality, correctly performed maintenance, and absence of failures (which are usually not detected in the QC), only sources of random error (RE) but no sources of significant σ variability can be identified. While $s_{r}$ is not time-variable [11,12,13], the variability of the $s_{R W}$ (a value measured in a month may even double or halve in the next month [10]) contradicts the predictions based on chi-square distribution tables [9], questioning whether $s_{R W}$ is the correct estimator of the σ parameter (the mean random error).

Research aimed at understanding the causes of the previously mentioned phenomena uncovered several hidden factors (referred to as QP2), as well as some widely accepted but incorrect assumptions and contradictions. Below are a few examples:

Calibration is error-free [14].
Long-term control data are normally distributed [14].
$s_{R W}$ is the correct estimator of the σ parameter of the probability density function of the normal distribution.
The calculations are based on the SD measured in repeatability conditions ( $s_{r}$ ); (Statement acknowledged by JO. Westgard and T. Groth [14]).

The analysis of the causes suggested that the source of these false assumptions can be traced to the unverified conditions of the normal distribution, the misinterpretation of the ‘random measurement error’ definition in VIM3 2.19 [2], and the neglected separation of the components of the systematic error; however, VIM3 2.17 [2], through the word ‘or’, suggests the existence of two components of the systematic error (SE).

Modern metrology was born alongside the work of Gauss and Laplace, as well as the development of the normal distribution. However, neither of the authors used the ‘normal distribution’ term. It originates from K. Pearson [15]: “Many years ago, I called the Laplace-Gaussian curve the ‘normal’ curve, which name, while avoids an international question of priority, has the disadvantage of leading people to believe that other distributions of frequency are in one sense or other ‘abnormal.’” The two definitions of the quantitative expression of the measurement uncertainty are equivalent only if the data are normally distributed. According to GUM F.1.1.3 [2,5], “Second, it must be asked whether all of the influences that are assumed to be random really are random”.

The study starts with the description of the proposed error model in metrology, followed by a mathematical deduction that provides proof that the separation of the SE into a constant and a variable component is justified, and that $s_{R W}$ has two components, only one of which is random. The other component, although it is an SD, is systematic and exhibits a different mathematical behavior. In the next section, the study uses computer simulations to support the mathematical findings. These simulations confirmed that the equations are correct and that the separation of components is justified. The Discussion section explores the consequences of these results and formulates suggestions for future research. The findings also call for a shift in how we think about quality control (QC) and metrology. A new perspective is needed, one that clearly distinguishes between random and systematic components of uncertainty and treats them according to their true nature.

2. Proposed Error Model

The concept of bias is a subject of ongoing debate in metrology. According to the International Vocabulary of Metrology (VIM3, [2]), two definitions are relevant. First, VIM3 4.20 defines instrumental bias (IB) as the average of replicate indications (${\bar{x}}_{i}$) minus a reference quantity value ($x_{r e f}$):

IB \approx {\bar{x}}_{i} - x_{r e f}

(1)

According to the VIM3 2.18 [2], the definition of measurement bias or bias (B) is “the estimate of a systematic error”. The reference value itself (VIM3 5.18 [2]) is described as a quantity value used as a basis for comparison with quantities of the same kind. This reference can either be a true value (which is usually unknown) or a conventional value (which is known and agreed upon).

In the total error (TE) theory-based metrology, bias is calculated relative to a reference value. However, the Guide to the Expression of Uncertainty in Measurement (GUM [5]) introduced a more modern approach. According to GUM 3.2.3 [5], if a bias is identified and is ‘significant in size relative to the required accuracy of the measurement’, it must be corrected. Additionally, the uncertainty associated with this correction must be included in the overall uncertainty budget.

Despite correction efforts, some bias may remain. However, GUM 3.2.3 [5] states the following: “It is assumed that, after correction, the expectation or expected value of the error arising from a systematic effect is zero”. The statement is not true. According to QP3, a correction cannot eliminate a bias if the uncertainty of the bias is greater than the bias intended to be fixed. Consequently, after correction, the remanent bias is not zero, but is under the limit of corrigibility and can be considered insignificant. However, its value is not precisely known; the bias exists. The situation becomes more complex when the bias is not constant. If the bias changes over time or under different conditions, any correction becomes temporary. In such cases, the bias tends to reappear, making calibration or correction only a short-term solution. Only a constant bias component can be efficiently corrected (QP4).

The contradictions above could not be explained consistently without separating the systematic error (SE) (and its estimator, the bias) into two subcomponents: the constant component of the SE (CCSE) and the variable component of the SE (VCSE(t)), which changes over time. This separation is essential because the variable component behaves as a time-dependent function, making its impact on measurements unpredictable and harder to correct. Even the definition provided in VIM3 2.17 [2] suggests this distinction. It describes systematic error as “The component of measurement error that in replicate measurements remains constant or varies in a predictable manner”. The use of the word “or” implies that systematic error may either be constant or change in a way that can be anticipated. Recognizing this dual nature enables a more accurate understanding and handling of bias in measurement systems, particularly in complex environments such as clinical laboratories.

The existence of the CCSE is supported by the aforementioned incorrigible monthly biases linked to the calibration errors. In contrast, the need for periodic controls supports the existence of the VCSE(t). The bias can be measured by repeatedly measuring the same certified reference material (in repeatability conditions). On different days, different values are obtained. Let $B_{r}(t)$ denote the bias measured under repeatability conditions at time t. In long time frames, under intermediate (reproducibility within laboratory) conditions, only a mean value can be obtained. (Let it be denoted ${\bar{B}}_{R W}$, where the bar highlights that it is a mean value.) ${\bar{B}}_{R W}$ can be identified with the CCSE. The difference between $B_{r}(t)$ and ${\bar{B}}_{R W}$ is the VCSE(t) function. The relationship between them is as follows [16,17]:

SE (t) = CCSE + VCSE (t) \approx {\bar{B}}_{R W} + VCSE (t) = B_{r} (t)

(2)
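The decomposition in Equation (2) can be illustrated with a minimal numerical sketch. The daily bias values and variable names below are hypothetical, chosen only for illustration; they are not data from the study. Subtracting the long-term mean bias removes the constant component, but the spread of the daily biases stays the same, which is what QP4 states.

```python
import numpy as np

# Hypothetical daily biases B_r(t) measured under repeatability conditions
# (illustrative numbers only, not data from the study).
b_r = np.array([1.8, 2.6, 1.1, 2.9, 2.2, 1.4, 2.7, 1.3])

# The long-term mean bias is identified with the constant component (CCSE).
ccse = b_r.mean()

# Subtracting the constant leaves the variable component VCSE(t), Equation (2).
vcse = b_r - ccse

# The constant correction removes the mean bias, but the spread of the
# daily biases is unchanged (QP4).
sd_before = b_r.std(ddof=1)
sd_after = vcse.std(ddof=1)

print(f"CCSE = {ccse:.3f}")
print(f"mean after correction = {vcse.mean():.3f}")
print(f"SD before = {sd_before:.3f}, SD after = {sd_after:.3f}")
```

In this sketch the correction is applied with perfect knowledge of the mean bias; in practice the correction itself carries uncertainty (QP3).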

Without definitions in VIM3, several authors, using different names, notations, and definitions, published equations consistent with Equation (2) [18,19,20]. Using empirical methods and computer simulations, A.B. Vandra [16,17] obtained the following relationship between the standard deviations (SD) measured in repeatability and in reproducibility within laboratory conditions ($s_{r}$ and $s_{R W}$):

s_{R W} = \sqrt{s_{r}^{2} + s_{V C S E}^{2}}

(3)
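As a quick arithmetic illustration of Equation (3), the two components add in quadrature rather than linearly. The function name and the numerical values below are assumptions made for this sketch only.

```python
import math


def s_rw_from_components(s_r: float, s_vcse: float) -> float:
    """Combine the two SD components in quadrature, as in Equation (3)."""
    return math.sqrt(s_r**2 + s_vcse**2)


# Hypothetical values in arbitrary units (assumptions for illustration).
s_r = 3.0      # SD measured in repeatability conditions
s_vcse = 4.0   # SD calculated from the variable bias values

s_rw = s_rw_from_components(s_r, s_vcse)
print(f"s_RW = {s_rw:.1f}")  # smaller than the linear sum s_r + s_VCSE = 7.0
```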

where $s_{V C S E}$ is the SD calculable from the variable $B_{r}(t)$ values, or the values of the VCSE(t). An SD can be calculated from any set of variable data, not only from normally distributed ones [16,17]. Several authors using different names, definitions, and notations published equations equivalent to Equation (3) [13,18,19,20]. None of them published a mathematical proof. The differences in notations, content, definitions, and names call for standardization and for definitions in the VIM3.

Equation (3) suggests that $s_{R W}$ is not a purely random component, but a mix of two terms. $s_{r}$ is the random component ($s_{r}$ ≈ σ), while $s_{V C S E}$, which is calculated from the bias values, is a systematic component. Consequently, $s_{R W}$ is not the measure of the RE, but of the variability of data under two influences: the RE and the bias variations. Because this observation contradicts the current recommendations, which consider $s_{R W}$ the estimator of the RE [2,3,4,5,6,14], this study aims to provide mathematical proof for this statement.

Equations (2) and (3) together describe an error model that addresses the time variability of the SE (bias). In contrast to similar alternative models, this model takes an additional step, linking the bias components to the values measured in repeatability and intermediate (reproducibility within laboratory) conditions, respectively. Equations (2) and (3) have the advantage of highlighting that the VCSE(t) function is hidden in $B_{r}(t)$ in repeatability conditions, and in $s_{R W}$ in reproducibility within laboratory conditions. The VCSE(t) appears unimportant since, being included in other error components, no decisions are made based on its value; however, the statement is only apparently true. If we are consistent with Equations (1) and (2), we can realize that ${\bar{B}}_{R W}$ is a mean, not a single value. $B_{r}(t)$ is a time-dependent function, and its value has only 24 h validity. $s_{R W}$ includes a systematic error component, and considering it the measure of σ is questionable. The use of $s_{R W}$ and $B_{r}(t)$ in the same equation causes redundant use of the VCSE(t). Because there are two SD and two bias types, “the SD” and “the bias” (with the definite article) are ambiguous terms (definitional uncertainties) if it is not specified (at least in the context) which one we are referring to. (To ${\bar{B}}_{R W}$ or to $B_{r}(t)$? To $s_{r}$ or to $s_{R W}$?) For example, “according to GUM, the discovered bias must be corrected”. The ambiguity is revealed when we recognize that bias measurements (in EQA) are conducted under repeatability conditions, the bias is variable, and its validity term is limited to only 24 h. According to QP4, a variable bias cannot be eliminated by adding a constant (a correction) to the result. Meanwhile, ${\bar{B}}_{R W}$ is a constant and can be corrected.

The inclusion of the VCSE(t) in the bias ($B_{r}(t)$) in the short term, and in $s_{R W}$ in more extended time frames, suggested questionable theories, like the following: “Variable bias components become random errors over time” [21]. As will be shown, the ‘transformation’ is only subjective, based on a misinterpretation of VIM3 2.19 [2].

Although this theory was born in the clinical laboratory, it suggests modifications in the VIM3, with implications across all domains where metrology is applied. Therefore, this study aims to provide a theoretical foundation for the previously empirically deduced theory and error model based on the separation of the SE into a constant and a variable subcomponent, elucidating its implications and the significance of $s_{R W}$.

Furthermore, as a consequence of the proposed error model (Figure 1) applied consistently with the four quintessential principles of metrology mentioned above, it can be concluded that the long-term QC data are not normally distributed, and $s_{R W}$ is not the correct estimator of σ. This conclusion contradicts the current paradigm. Therefore, this study aims to provide mathematical proofs, supported by computer simulations, that the long-term QC data are not normally distributed, and that the correct estimator of the σ parameter is the SD measured under repeatability conditions ($s_{r}$). The advantages of separating the SE into a constant and a variable component, as well as the risks of not performing the separation, will be analyzed.

This study has two main goals. The first is to answer the following questions: (1) Can long-term control data follow a normal distribution? (2) What is the cause of the significant variability of $s_{R W}$ and the mean? If the answer to the first question is yes, the study also aims to define the conditions under which this is true. The second goal is to help clarify what the term “random” means in metrology. As M. Krystek observed [22], “We speak of ‘random’ variations, although we cannot explain what the attribute ‘random’ actually means”. He also mentioned that “It has long been known that there are two essentially different contributions to the measurement uncertainty, but the discussion of how to deal with them in a correct way is still going on”.

3. Mathematical Deduction

This study is a theoretical investigation based on mathematical deductions derived from the Gauss equation. It focuses on the conditions under which this equation can be applied, and on the precise definitions of its parameters, μ and σ, and their estimators, the mean and the SD, respectively. The misinterpretations highlight the weakness of the VIM3 2.19 [2] definition, and the inequivalence of the ‘random’ and ‘unpredictable’ terms will be supported by examples and counterexamples, mainly from clinical laboratory practice but also from other domains.

The primary objective of this study is to determine whether the long-term data can be assumed to be normally distributed. The demonstration starts with the probability density function of the Gaussian (normal) distribution:

f (x) = \frac{e^{- \frac{{(x - μ)}^{2}}{2 σ^{2}}}}{σ \sqrt{2 π}}

(4)

where f(x) is the probability density function, σ is the scale parameter characterizing the dispersion of data, and μ is the location parameter, the ideal mean of the data [9].

The σ and μ parameters of the equation are assumed to be constants. This is a condition of applicability. If they are not constant, Equation (4) is not valid, and the distribution is not normal. Gauss intended the equation to describe the distribution of data in lifeless domains under constant conditions [23]. This means the same measured quantity at a specific moment, that is, repeatability conditions of measurement (VIM3 2.20 [2]). In variable conditions, σ and μ are time-variable functions, and Equation (4) becomes

f (x, t) = \frac{e^{- \frac{{(x - μ (t))}^{2}}{2 {σ (t)}^{2}}}}{σ (t) \sqrt{2 π}}

(5)

where $μ (t)$ and $σ (t)$ are the functions that describe the time-dependence of the parameters. For each moment t, a different equation is obtained.
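The effect of a time-dependent $μ (t)$ in Equation (5) can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the drift is a hypothetical sine wave with an exaggerated amplitude, σ is held constant, and the excess kurtosis is used as a simple indicator of departure from normality. The outcome depends on the assumed shape of the drift.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 2000, 50   # days and replicates per day (assumed)
sigma = 1.0       # constant random-error SD (assumed)

# Hypothetical slow drift of the daily mean: 20 full sine cycles, amplitude 2.
t = np.arange(m)
mu_t = 2.0 * np.sin(2 * np.pi * 20 * t / m)

# Each row is one day: N(mu(t), sigma) under repeatability conditions.
x = mu_t[:, None] + sigma * rng.standard_normal((m, n))


def excess_kurtosis(values: np.ndarray) -> float:
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z**4) - 3.0)


# Deviations from the daily means versus the pooled long-term data.
within_day = excess_kurtosis((x - x.mean(axis=1, keepdims=True)).ravel())
pooled = excess_kurtosis(x.ravel())

print(f"within-day deviations: excess kurtosis = {within_day:+.2f}")
print(f"pooled long-term data: excess kurtosis = {pooled:+.2f}")
```

With this assumed drift, the within-day deviations stay close to the normal value of zero, while the pooled data show a clearly negative excess kurtosis.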

Quality control measurements are made at discrete moments. At each moment t = i (i = 1,…,m), n measurements are made (j = 1,…,n). Let $x_{i, j}$ denote the jth result on day i. In repeatability conditions at time point ‘i’, the mean of the results (the mean of day i obtained from a finite number of results, ${\bar{x}}_{i}$) is

{\bar{x}}_{i} = \frac{\sum_{j = 1}^{n} x_{i, j}}{n}

(6)

The theoretical mean $μ_{i}$ at time point t = i is

\lim_{n \to \infty} {\bar{x}}_{i} = \lim_{n \to \infty} \sum_{j = 1}^{n} \frac{x_{i, j}}{n} = μ_{i}

(7)

${\bar{x}}_{i}$ is the estimator of $μ_{i}$.

The deviation of the jth measurement $x_{i, j}$ from the measured mean (${\bar{x}}_{i}$), denoted $d_{i, j} ({\bar{x}}_{i})$, is

d_{i, j} ({\bar{x}}_{i}) = x_{i, j} - {\bar{x}}_{i}

(8)

Because of the properties of the mean, the following holds:

\sum_{j = 1}^{n} d_{i, j} ({\bar{x}}_{i}) = \sum_{j = 1}^{n} (x_{i, j} - {\bar{x}}_{i}) = 0

(9)

The SD measured at time point t = i ($s_{r_{i}}$, repeatability conditions) is the root mean square of the deviations $d_{i, j}$ (with Bessel’s correction [24], substituting the number of measurements n with the degrees of freedom n − 1). The σ parameter at time point t = i ($σ_{i}$) is the limit of the measured standard deviation, $s_{r_{i}}$, as n → ∞:

σ_{i} = \lim_{n \to \infty} s_{r_{i}} = \lim_{n \to \infty} \sqrt{\frac{\sum_{j = 1}^{n} {(x_{i, j} - {\bar{x}}_{i})}^{2}}{n - 1}}

(10)

Because of Equation (10), the measured SD, $s_{r_{i}}$, can be used as the estimator of $σ_{i}$, but only in the domain of validity of Equation (4), which means constant repeatability conditions. From the set of m individual values of the repeatability standard deviation, $s_{r_{i}}$, it is possible to calculate their root-mean-square average (${\bar{s}}_{r}$):

{\bar{s}}_{r} = \sqrt{\frac{\sum_{i = 1}^{m} s_{r_{i}}^{2}}{m}} = \sqrt{\frac{\sum_{i = 1}^{m} \frac{\sum_{j = 1}^{n} {(x_{i, j} - {\bar{x}}_{i})}^{2}}{n - 1}}{m}}

(11)
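Equations (6), (8), (9), and (11) translate directly into array operations. The following is a minimal sketch with a small hypothetical data set (four days, three replicates per day); the numbers and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical QC results: m = 4 days (rows), n = 3 replicates per day (columns).
x = np.array([
    [10.1, 10.3, 9.9],
    [10.6, 10.4, 10.8],
    [9.7, 9.9, 9.5],
    [10.2, 10.0, 10.4],
])

daily_mean = x.mean(axis=1)                # Equation (6)
deviations = x - daily_mean[:, None]       # Equation (8)
s_r_daily = x.std(axis=1, ddof=1)          # daily SD with Bessel's correction
s_r_bar = np.sqrt(np.mean(s_r_daily**2))   # Equation (11): RMS of the daily SDs

print("daily means:", daily_mean)
print("sum of deviations per day:", deviations.sum(axis=1))  # Equation (9)
print(f"s_r_bar = {s_r_bar:.3f}")
```

Note that ${\bar{s}}_{r}$ is obtained by averaging the variances and then taking the square root, not by averaging the daily SDs directly.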

The long-term mean $\bar{x}$ of the m daily means ${\bar{x}}_{i}$ is

\bar{x} = \frac{\sum_{i = 1}^{m} {\bar{x}}_{i}}{m} \approx \frac{\sum_{i = 1}^{m} μ_{i}}{m} \approx μ

(12)

$\bar{x}$ is the estimator of the theoretical long-term mean μ. The deviation of each measured value from the long-term mean ($d_{i, j} (\bar{x})$) is

d_{i, j} (\bar{x}) = (x_{i, j} - \bar{x}) = (x_{i, j} - {\bar{x}}_{i}) + ({\bar{x}}_{i} - \bar{x})

(13)

The first term in Equation (13), denoted as $d_{i, j} ({\bar{x}}_{i})$, represents the deviation from the daily average under repeatability conditions at time point ‘i’. This term captures the random variation in the measurements. The second term is the difference between the daily average and the long-term mean $\bar{x}$, a systematic component (Equation (1)). These two components have different physical meanings and statistical behavior. The random error term is variable, while the systematic error term is constant within a given day. Because one term is constant and the other is variable, they are statistically independent. This separation helps clarify the nature of measurement variability and supports the structure of the proposed error model.

Because of the properties of the mean, summed over all m days, the following holds:

\sum_{i = 1}^{m} \sum_{j = 1}^{n} d_{i, j} (\bar{x}) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} (x_{i, j} - \bar{x}) = \sum_{i = 1}^{m} [\sum_{j = 1}^{n} (x_{i, j} - {\bar{x}}_{i}) + n \cdot ({\bar{x}}_{i} - \bar{x})] = 0

(14)

Calculating the long-term SD ($s_{R W}$) by substituting the $d_{i, j} (\bar{x})$ values in Equation (11), the following is obtained:

s_{R W} = \sqrt{\frac{\sum_{i = 1}^{m} \sum_{j = 1}^{n} {(d_{i, j} (\bar{x}))}^{2}}{n \cdot m - 1}} = \sqrt{\frac{\sum_{i = 1}^{m} \sum_{j = 1}^{n} {((x_{i, j} - {\bar{x}}_{i}) + ({\bar{x}}_{i} - \bar{x}))}^{2}}{n \cdot m - 1}} = \sqrt{\frac{\sum_{i = 1}^{m} \sum_{j = 1}^{n} {(x_{i, j} - {\bar{x}}_{i})}^{2}}{n \cdot m - 1} + \frac{\sum_{i = 1}^{m} n \cdot {({\bar{x}}_{i} - \bar{x})}^{2}}{n \cdot m - 1}}

(15)

Because of Equations (9) and (14), the cross term $\sum_{i = 1}^{m} (\sum_{j = 1}^{n} (x_{i, j} - {\bar{x}}_{i}) \cdot ({\bar{x}}_{i} - \bar{x}))$ is 0 and drops out: the factor $({\bar{x}}_{i} - \bar{x})$ does not depend on j, and the inner sum of the deviations from the daily mean vanishes. In the first term under the square root, ${({\bar{s}}_{r})}^{2}$ can be identified, while the second term represents the square of an SD calculated from the variable daily mean values (${\bar{x}}_{i}$). Because $({\bar{x}}_{i} - \bar{x})$ is the difference between two means, it is not a random component. It characterizes the variability of the systematic error component. Let it be named the SD calculated from the variable component of the SE and denoted $s_{V C S E}$, as proposed in Section 2.

s_{V C S E} = \sqrt{\frac{\sum_{i = 1}^{m} (\sum_{j = 1}^{n} {({\bar{x}}_{i} - \bar{x})}^{2})}{n \cdot m - 1}} = \sqrt{\frac{\sum_{i = 1}^{m} n \cdot {({\bar{x}}_{i} - \bar{x})}^{2}}{n \cdot m - 1}} \approx \sqrt{\frac{\sum_{i = 1}^{m} {({\bar{x}}_{i} - \bar{x})}^{2}}{m - 1}}

(16)

Because neither of the terms under the sum depends on j, all terms in the $\sum_{j = 1}^{n} {({\bar{x}}_{i} - \bar{x})}^{2}$ sums are equal. As m increases, the difference between the last two expressions becomes insignificant. Using the $s_{V C S E}$ notation, Equation (15) becomes

s_{R W} = \sqrt{{\bar{s}}_{r}^{2} + s_{V C S E}^{2}}

(17)
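The decomposition behind Equations (15) to (17) can be checked numerically. The sketch below uses simulated data with an assumed random SD and an assumed day-to-day bias SD (both hypothetical). In this sketch the sums of squares split exactly into a within-day and a between-day part, while Equation (17) itself is reproduced only approximately, because ${\bar{s}}_{r}$ from Equation (11) uses the divisor m·(n − 1) rather than n·m − 1.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 500, 20                 # days and replicates per day (assumed)
sigma_r, sigma_b = 1.0, 0.8    # assumed random SD and day-to-day bias SD

# Hypothetical daily bias shift plus within-day random error.
daily_shift = sigma_b * rng.standard_normal(m)
x = daily_shift[:, None] + sigma_r * rng.standard_normal((m, n))

daily_mean = x.mean(axis=1)
grand_mean = x.mean()

# Sums of squares: total = within-day + between-day (cross term is zero).
ss_total = np.sum((x - grand_mean) ** 2)
ss_within = np.sum((x - daily_mean[:, None]) ** 2)
ss_between = n * np.sum((daily_mean - grand_mean) ** 2)

s_rw = np.sqrt(ss_total / (n * m - 1))               # Equation (15)
s_r_bar = np.sqrt(np.mean(x.var(axis=1, ddof=1)))    # Equation (11)
s_vcse = np.sqrt(ss_between / (n * m - 1))           # Equation (16)
combined = np.hypot(s_r_bar, s_vcse)                 # right side of Equation (17)

print(f"SS total = {ss_total:.2f}, within + between = {ss_within + ss_between:.2f}")
print(f"s_RW = {s_rw:.3f}, sqrt(s_r_bar^2 + s_VCSE^2) = {combined:.3f}")
```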

Equation (17) is Equation (3) (which was deduced using empirical methods by several authors, as mentioned in Section 2), with a single difference: Equation (17) highlights that ${\bar{s}}_{r}$ is a mean value. The preceding deduction confirms the validity of Equation (3). Equation (3) assumes a homoscedastic measurement system (constant $s_{r}$, a hidden assumption in the sense of QP2), while Equation (17) is also valid in heteroscedastic measuring systems.

Let n = 2 (two measurements in repeatability conditions on each day). According to Equation (6), ${\bar{x}}_{i} = \frac{\sum_{j = 1}^{2} x_{i, j}}{2} = \frac{x_{i, 1} + x_{i, 2}}{2}$. Consequently, according to Equation (8), $d_{i, 1} ({\bar{x}}_{i}) = - d_{i, 2} ({\bar{x}}_{i})$ and $x_{i, 1} - x_{i, 2} = 2 \cdot d_{i, 1} ({\bar{x}}_{i})$, and the SD calculated from the differences in duplicate measurements measured in repeatability conditions is $\sqrt{2} \cdot {\bar{s}}_{r}$ (Equation (11)), validating the Dahlberg method [11] of calculating $s_{r}$ using duplicate measurements obtained over long time frames. The possibility of calculating $s_{r}$ from long-term data demonstrates that $s_{r}$ is not linked to the time frame, but rather to the constant conditions. The short time frame is necessary only to guarantee the constant conditions of the measurements. Consequently, Equations (3) and (17) suggest that the second term in the equation ($s_{V C S E}$) is not linked to the random phenomenon but is a systematic component.
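The duplicate-measurement case can be sketched numerically. The example below assumes simulated duplicates with a hypothetical variable daily bias; it computes $s_{r}$ from the differences of the duplicates (the Dahlberg formula, the square root of the sum of squared differences divided by 2m) and compares it with ${\bar{s}}_{r}$ from Equation (11) for n = 2. All parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1000                                    # days (assumed)
sigma_r, sigma_b = 1.0, 0.7                 # assumed random SD and bias SD

# Hypothetical variable daily bias plus duplicate measurements on each day.
daily_shift = sigma_b * rng.standard_normal(m)
x = daily_shift[:, None] + sigma_r * rng.standard_normal((m, 2))

# Dahlberg: s_r from the differences of the duplicates.
diff = x[:, 0] - x[:, 1]
s_r_dahlberg = np.sqrt(np.sum(diff**2) / (2 * m))

# Equation (11) with n = 2 gives the same value.
s_r_bar = np.sqrt(np.mean(x.var(axis=1, ddof=1)))

# Long-term SD of all results, which also contains the bias variability.
s_rw = x.std(ddof=1)

print(f"s_r (Dahlberg) = {s_r_dahlberg:.3f}")
print(f"s_r_bar (Eq. 11) = {s_r_bar:.3f}")
print(f"s_RW = {s_rw:.3f}")
```

The daily bias cancels in each difference, so the duplicate-based value recovers the assumed random SD even though the data span many days.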

It is unusual in practice to make several replicated measurements in repeatability conditions each day and to calculate their averages. Let us imagine such an experiment, calculating the average of k measurements performed in repeatability conditions on each day. When calculating the average deviation of each day, the second term of Equation (13) will not be influenced (it is the same in each

d_{i, j} (\bar{x})

term). Therefore

s_{R W} (a v e r a g e) = \sqrt{\frac{{\bar{s}}_{r}^{2}}{k} + s_{V C S E}^{2}}

(18)

If the method of measurement is homoscedastic or quasi-homoscedastic (the SD is constant or quasi-constant),

s_{r}

is obtained; if it is heteroscedastic (variable SD), a mean value of the SD is measured in repeatability conditions (

{\bar{s}}_{r})

. Equation (18) suggests that the second term in the equation (

s_{V C S E}

) is not linked to the random phenomenon.

In constant conditions (

s_{V C S E}

= 0), Equation (18) reduces to

s_{r} (a v e r a g e) = \sqrt{\frac{{\bar{s}}_{r}^{2}}{k}} = \frac{{\bar{s}}_{r}}{\sqrt{k}}

(19)

Equation (18) contradicts what we should expect from an average of normally distributed values.
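The behavior predicted by Equation (18) can be checked numerically with a short sketch. It assumes, purely for illustration, that the bias takes one normally distributed value per day (Equation (18) itself does not require any particular bias distribution); the function names, the number of days `m`, and the seed are hypothetical.

```python
import numpy as np

def s_rw_of_daily_means(k, m=20_000, s_r=1.0, s_vcse=1.0, seed=0):
    """SD of daily means of k replicates when each day carries its own bias value."""
    rng = np.random.default_rng(seed)
    daily_bias = rng.normal(0.0, s_vcse, size=m)        # one bias value per day (assumption)
    replicates = daily_bias[:, None] + rng.normal(0.0, s_r, size=(m, k))
    return replicates.mean(axis=1).std(ddof=1)

def s_rw_predicted(k, s_r=1.0, s_vcse=1.0):
    """Equation (18): only the random term is divided by k."""
    return np.sqrt(s_r ** 2 / k + s_vcse ** 2)

for k in (1, 2, 5, 50):
    print(k, round(s_rw_of_daily_means(k), 3), round(s_rw_predicted(k), 3))
```

As k grows, the simulated SD of the daily means approaches s_VCSE instead of tending to zero, which is the departure from the 1/√k law of Equation (19).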

4. Simulation

4.1. Basics

Five basic data series were generated using Python 3.12. Each series contains 10,000 time points and consists of random values drawn from a normal distribution with a mean of 0 and a standard deviation of 1. The series are defined as follows:

○: norm1: A single random sequence from a normal distribution N(0,1) (t = 1, …, 10,000).
○: norm2: The average of two independent N(0,1) random sequences, calculated for each time point.
○: norm3: The average of three independent N(0,1) random sequences, calculated for each time point.
○: norm4: The average of four independent N(0,1) random sequences, calculated for each time point.
○: norm5: The average of five independent N(0,1) random sequences, calculated for each time point.

For each of these simulated series, the standard deviation was calculated using Bessel’s correction [24]. This corrected standard deviation is referred to as

s_{r}

, and it represents the repeatability standard deviation of the data.
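The construction of the five basic series might look like the following sketch. The article states only that Python 3.12 was used; the use of NumPy, the seed, and the names `make_norm_series`, `norm`, and `s_r` are assumptions made here for illustration.

```python
import numpy as np

N_POINTS = 10_000

def make_norm_series(k, rng, n_points=N_POINTS):
    """Average of k independent N(0,1) sequences at each time point (norm1 ... norm5)."""
    return rng.standard_normal((n_points, k)).mean(axis=1)

rng = np.random.default_rng(2025)       # arbitrary seed
norm = {k: make_norm_series(k, rng) for k in range(1, 6)}

# Corrected (Bessel) standard deviation of each series, i.e., s_r
s_r = {k: series.std(ddof=1) for k, series in norm.items()}
for k, value in s_r.items():
    print(f"norm{k}: s_r = {value:.3f}, 1/sqrt(k) = {1 / np.sqrt(k):.3f}")
```

The `ddof=1` argument applies Bessel's correction; the printed values should lie close to 1/√k, as expected for averages of k independent normal values.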

Because of the quadratic additivity law of the standard deviations, if one of the terms in a quadratic sum is significantly smaller than the other, the difference between the bigger component and the sum becomes insignificant. For this reason, the bias variations were chosen to have the same SD, s_VCSE = s_r = 1 (norm1 cases). These values are unitless; what matters is their ratio (s_VCSE/s_r), and the absolute value has no significance. (Incidentally, in some real-life cases, e.g., serum glucose, they even resemble the absolute values numerically, because s_r ≈ 1 mg/dL.) Because, in real life, biases smaller than the mean calibration error (estimated by AB. Vandra [1] to be 1–2 s_r) cannot be corrected efficiently, s_VCSE/s_r ≈ 1 is a realistic minimal value for most measurands. The bigger ratios are well simulated by the norm2–norm5 cases.

The time-dependent systematic error is described theoretically by Equation (2). This type of error, also referred to as bias, changes over time and can affect the accuracy of measurements. To study this behavior, we simulate the bias using a defined method:

-: bias1: In this case, the constant component of the systematic error (CCSE) is set to 1, while the time-dependent component is zero throughout the simulation. This means that the bias remains constant over time, with no variation. As a result, the standard deviation of the time-dependent component, denoted as $s_{V C S E}$ , is equal to 0. (This is a possible real-life case when the reagents are very stable.)
-: bias2: Here, the time-dependent component of the systematic error, denoted as SE(t), is treated as a random variable. It is simulated using a uniform distribution with an expected value E(X) = 1 and a variance $D^{2} (X) = 1$ . To achieve this, the lower and upper bounds of the uniform distribution are calculated as follows:
Lower bound: $a = 1 - \sqrt{3}$ ; upper bound: $b = 1 + \sqrt{3}$ . (The rounding error may introduce such uniformly distributed errors.)
-: bias3: In this simulation, the time-dependent component SE(t) is modeled as a normally distributed random variable. The distribution has a mean μ = 1 and a standard deviation σ = 1. This means the bias varies over time according to a normal distribution, which is commonly used to represent natural fluctuations in measurement systems. (It is a possible real-life case when, before each control measurement, a calibration is made. Calibration is a measurement and is subject to random measurement errors.)
-: bias4a: In this model, a deterministic pattern is used to simulate periodic variation. The data consists of a repeating sequence of 10 values that are linearly spaced between the bounds $1 - \sqrt{3}$ and $1 + \sqrt{3}$ . This sequence represents a controlled, systematic fluctuation across a fixed interval. The 10-element pattern is repeated continuously until a total of 10,000 data points are reached, resulting in a saw-tooth-shaped QC graph. The structure ensures uniform coverage of the defined range in each cycle, making the dataset suitable for analyzing periodic reagent changes or systematic error propagation in controlled environments.
-: bias4b: The same as bias4a, but with a fixed sequence of 25 linearly spaced values.
-: bias4c: The same as bias4a and bias4b, but with a fixed sequence of 57 linearly spaced values. (bias4a, bias4b, and bias4c are possible real-life cases. If the reagent is unstable but calibration is not performed during its on-board life, the reagent is changed before the bias reaches critical values. In the case of bias4a, the reagent changes are more frequent (after 10 control runs); in the case of bias4b, the reagent is changed only after 25 control runs; in the case of bias4c, the reagent is changed only after 57 control runs.)
-: bias5a: In this model, constant periods are interrupted by random changes. We use a normally distributed random variable, where the expected value is E(X) = μ = 1 and the variance is D²(X) = σ² = 1. A single value drawn from this distribution is considered a bias and is applied consistently across 10 consecutive cases. After these 10 cases, a new value is drawn from the same normal distribution and again used for the next 10 cases. This process continues, alternating between drawing a new value and applying it to 10 cases at a time.
-: bias5b: The same as bias5a, but the number of identical consecutive cases is 25. (bias5a and bias5b are possible real-life cases. They simulate the case when the newly installed reagents are compulsorily calibrated, introducing random changes in the mean and bias.)
-: bias6a: In this simulation, the constant component of the systematic error (CCSE) is set to 1. The time-dependent component is $V C S E (t) = \sqrt{2} \cdot s i n (t)$ . This means the bias varies over time in a smooth, periodic way, following a sine wave pattern.
-: bias6b: The same as bias6a, but $V C S E (t) = \sqrt{2} \cdot s i n (\frac{t}{10})$ . (The period is 10 times longer than in the case of bias6a.) (Although the influence of the temperature variation (±0.1–0.2 °C) in the thermo-regulated water baths of clinical laboratories is not significant, such sinusoidal influences can be imagined in other measurement systems.)

Figure 2 shows the formerly mentioned bias types represented on QC graphs.

The corrected standard deviation (SD) of the bias data series is

s_{V C S E}

. This value represents the variability of the time-dependent component of the systematic error.
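The bias types listed above could be generated along the following lines. This is a sketch under stated assumptions, not the authors' code: the function `make_biases`, its helper names, the seed, and the handling of the incomplete last cycle (simple truncation) are hypothetical.

```python
import numpy as np

N = 10_000
SQRT3 = np.sqrt(3.0)

def make_biases(rng, n=N):
    """Return a dict of simulated bias series, one per bias type described in the text."""
    t = np.arange(1, n + 1)

    def sawtooth(period):
        # Repeating sequence of linearly spaced values between 1 - sqrt(3) and 1 + sqrt(3)
        cycle = np.linspace(1 - SQRT3, 1 + SQRT3, period)
        return np.resize(cycle, n)          # repeats the cycle; the last one may be incomplete

    def steps(run):
        # One N(1, 1) value held constant for `run` consecutive cases
        levels = rng.normal(1.0, 1.0, size=-(-n // run))   # ceil(n / run) levels
        return np.repeat(levels, run)[:n]

    return {
        "bias1": np.ones(n),
        "bias2": rng.uniform(1 - SQRT3, 1 + SQRT3, n),
        "bias3": rng.normal(1.0, 1.0, n),
        "bias4a": sawtooth(10), "bias4b": sawtooth(25), "bias4c": sawtooth(57),
        "bias5a": steps(10), "bias5b": steps(25),
        "bias6a": 1 + np.sqrt(2) * np.sin(t),
        "bias6b": 1 + np.sqrt(2) * np.sin(t / 10),
    }

biases = make_biases(np.random.default_rng(42))     # arbitrary seed
s_vcse = {name: series.std(ddof=1) for name, series in biases.items()}
for name, value in s_vcse.items():
    print(f"{name}: s_VCSE = {value:.3f}")
```

A complete simulated series is then obtained by adding one of these bias series to one of the basic series (norm1 to norm5).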

The complete set of simulated data series is constructed by combining one basic data series with one bias type. Each basic series (norm1 to norm5) represents a different level of averaging of normally distributed random values. To each of these five basic series, we add one of the ten bias types described earlier. This results in a total of 50 simulated data series. These combinations enable us to examine how different forms of systematic error interact with varying levels of random noise and how they impact the overall behavior of the data.

4.2. Analysis of Simulated Data

Figure 3 shows the distribution of the simulated data series. Only four of the ten studied bias types are presented as histograms. Each line of five histograms represents the case of a different bias type. The first line presents bias1 (constant bias). The consecutive histograms show the effect of calculating the average of several results obtained in repeatability conditions. If the distribution is normal (Gaussian), the SD of the average is

\sqrt{n}

times smaller than the SD of a single measurement.

As n, the number of measurements in the average, increases, the histograms become narrower and sharper. In the other three lines (cases of gradually increasing bias (bias4b), randomly variable bias (bias5a), and sinusoidal bias (bias6a)), the phenomenon is hardly observable because the distribution is not normal, but a mix of two distributions. As Equation (18) suggests, the

s_{V C S E}

term remains unaffected by calculating the average of more results. The phenomenon is most pronounced in the last case (sinusoidal bias), where the histogram exhibits a bimodal shape (Figure 4), as in the cases of data obtained from a sinusoidal function.

4.3. Statistical Analysis

Table 1 presents the results of 45 simulated data series, each created by combining one of five base data series (norm1 to norm5) with one of nine bias types. For each combination, four key values are reported: the repeatability SD,

s_{r}

, the corrected standard deviation of the time-dependent bias component,

s_{V C S E}

, the total standard deviation of the resulting data series,

s_{R W}

, and the theoretical total standard deviation of the resulting data series,

s_{R W (t h e o r)}

, calculated using Equation (17).

In the case of bias1, which includes only a constant shift and no time-dependent variation,

s_{V C S E}

is not applicable (indicated by “–”), and the total standard deviation equals the base standard deviation. This confirms that a constant bias does not increase the spread of the data, only shifts its mean.

For all other bias types, the time-dependent bias component introduces additional variability. This is reflected in the increase in both

s_{V C S E}

and

s_{R W}

compared to the base data. The values of

s_{R W}

closely match the theoretical values

s_{R W (t h e o r)}

(calculated using Equation (17)), confirming the accuracy of the simulation. As the base data series changes from norm1 to norm5, the value of

s_{r}

decreases due to averaging over more random sequences. However, the influence of the bias remains visible, especially in bias types with stronger or more variable time-dependent components (e.g., bias5a, bias5b, bias6a, and bias6b).

The equality of variances across different simulated scenarios was examined. The analysis revealed that in the cases of bias4a versus bias4b, and bias5a versus bias5b, notable differences in variance were observed (Figure 5). In particular, the scenario involving 25 repetitions exhibited a higher variance.

As the number of repeated measurements increased (from norm1 to norm5), the extent of variance divergence also increased. The percentage differences in variance between bias4a and bias4b were found to be 2.5%, 3.4%, 3.4%, 4.0%, and 4.1%, respectively. Similarly, the differences between bias5a and bias5b were measured as 2.8%, 3.4%, 4.4%, 5.0%, and 5.3%.

These results indicate that a greater number of repetitions tends to amplify the variance differences between the simulation pairs.

The phenomenon suggests that the length of the bias variation cycles (10 and 25 in the studied cases) influences the

s_{R W}

variability.

4.4. $s_{R W}$ Variability

Because significant variability in the monthly

s_{R W}

values was observed in practice (consistent with the literature data), which raises questions about the normal distribution, 250 SDs (

s_{R W}

) were calculated from each simulated data series of 10,000 values, each SD being based on 40 consecutive simulated values. The minimum value, the maximum value, and their difference were calculated (in absolute values and as a percentage of the overall SD, the total

s_{R W}

; Table 2, line 3). Using Cochran’s F-test, the values were compared with the overall SD for the equality of the variances. Because there are 250 values, a 95% confidence level is not useful: on average, 12–13 values can be expected to be rejected by the F-test as outliers. Outliers cannot be excluded even at a 1 − 1/250 = 0.996 (99.6%) confidence level, because about 1 in 250 cases can still be expected to be an outlier. Therefore, in addition to the differences between the minimum and maximum values, the number of outliers (no. of rejected samples, last row of Table 2) was also used to estimate the variability of

s_{R W}

.
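The block-wise SD calculation described above can be sketched as follows. The sketch covers only the minimum, maximum, and percentage range; the F-test step is omitted because its exact implementation is not specified here. The function names, the seed, and the two example series (a constant-bias series and a bias5b-like series with 25-run steps) are illustrative assumptions.

```python
import numpy as np

def window_sds(series, window=40):
    """Corrected SD of each block of `window` consecutive values."""
    series = np.asarray(series, dtype=float)
    n_windows = series.size // window
    blocks = series[: n_windows * window].reshape(n_windows, window)
    return blocks.std(axis=1, ddof=1)

def sd_spread(series, window=40):
    """Minimum and maximum window SD, and their range as a percentage of the overall SD."""
    sds = window_sds(series, window)
    overall = np.std(series, ddof=1)
    return sds.min(), sds.max(), 100.0 * (sds.max() - sds.min()) / overall

rng = np.random.default_rng(7)                       # arbitrary seed
noise = rng.standard_normal(10_000)                  # norm1-like random error
constant = 1.0 + noise                               # bias1-like series
stepped = np.repeat(rng.normal(1.0, 1.0, 400), 25) + noise   # bias5b-like series

for name, data in (("constant", constant), ("stepped", stepped)):
    lo, hi, pct = sd_spread(data)
    print(f"{name}: min = {lo:.3f}, max = {hi:.3f}, range = {pct:.1f}% of overall SD")
```

With 10,000 values and a window of 40, each series yields 250 SDs, matching the count used in the text.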

In Table 2, the data from the first column (constant bias, normally distributed values) were used as a reference for comparison. For the random biases (bias2 and bias3) and bias with linear and sinusoidal variation with short cycles (bias4a and bias6a), the percentage differences were similar to bias1, and the number of outliers was very low. (The six outliers in the case of bias3 were considered a consequence of the limits of the statistical methods in the case of small sample sizes.)

In contrast, for the randomly variable biases (bias5a and bias5b) and for the sinusoidal bias with long cycles (bias6b), significant differences (105.79%, 148.71%, and 81.69%) and a high number of outliers (38, 44, and 12) were observed. In the case of bias4b, the results were contradictory: although the percentage difference was high (69.98%, compared to 56.53% for bias1), the number of outliers was low (only 4). To investigate further, a gradually increasing bias with a much longer cycle (57 runs, bias4c) was simulated; here, both the difference (69.97%) and the number of outliers (12) were large. It is concluded that the variability of the

s_{R W}

values is linked to the calculation of the

s_{R W}

from incomplete cycles. The experiment has once again demonstrated that the long-term QC data are not normally distributed and provided an explanation for the monthly variability of the

s_{R W}

.

The standard deviations calculated from 40 consecutive values are presented in Figure 6. It can be clearly observed that the distribution of standard deviations varies across the different simulation scenarios. The lowest standard deviation values were recorded in the case of bias1, while the highest values were found in the bias5b simulation. The bias5b scenario represents a situation in which newly installed reagents are mandatorily calibrated. When the standard deviation is calculated from such a process, it often results in relatively high values. This is due to the random changes introduced during mandatory calibration, which increase the variability of the data.

4.5. Mean Variability

Another contradiction with the assumed normal distribution of the long-term QC data, as suggested by day-to-day experience and also mentioned in the literature, is the significant variability of monthly means and relative biases determined from internal QC data. To verify the observation, moving averages over 1, 17, 31, and 97 values were calculated for each cyclical bias type (bias4a_1, bias4b_1, bias4c_1, bias5a_1, bias5b_1) and compared with the moving averages obtained in the case of the constant bias (which has a normal distribution). The difference between the minimum and maximum values (the range of the obtained values) was compared with the theoretical range estimated with a 99.99% confidence (z = 3.8):

\frac{2 \cdot 3.8}{\sqrt{N}}

(where N is the number of values used in the calculation of the average). Such a confidence level was necessary because there are 10,000 simulated values (Table 3).
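The comparison between observed and theoretical ranges of the moving averages can be sketched as below. The theoretical range follows the expression given in the text with σ = 1; the function names, the seed, and the bias5b-like example series are hypothetical.

```python
import numpy as np

def moving_average(x, width):
    """Simple moving average over `width` consecutive values."""
    return np.convolve(x, np.ones(width) / width, mode="valid")

def observed_range(x, width):
    """Difference between the largest and smallest moving average."""
    ma = moving_average(x, width)
    return ma.max() - ma.min()

def theoretical_range(width, z=3.8, sigma=1.0):
    """Range expected for normally distributed data: 2 * z * sigma / sqrt(width)."""
    return 2 * z * sigma / np.sqrt(width)

rng = np.random.default_rng(11)                      # arbitrary seed
noise = rng.standard_normal(10_000)
constant = 1.0 + noise                               # bias1-like series
stepped = np.repeat(rng.normal(1.0, 1.0, 400), 25) + noise   # bias5b-like series

for width in (1, 17, 31, 97):
    print(width,
          round(theoretical_range(width), 3),
          round(observed_range(constant, width), 3),
          round(observed_range(stepped, width), 3))
```

For the constant-bias series the observed range should stay near the theoretical value, whereas for the stepped series it should remain well above it as the averaging width grows.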

5. Discussion

An inaccurate definition is not necessarily incorrect. An inaccurate definition is one that omits or leaves unspecified certain conditions or details, or that leaves room for misinterpretation. One such example is the name of the ‘normal distribution.’ K. Pearson, who proposed the unlucky name, later acknowledged: “It has led many writers to try and force all frequency (…) into a ‘normal’ curve” [15].

This is the motivation behind the GUM F.1.1.3 [2,5] recommendation to verify all distributions before considering them normal. This step is essential to avoid errors in statistical analysis. In fields where periodic quality control (QC) is required, the bias is not constant but changes over time. This bias must be predictable, meaning it behaves as a time-dependent function. Without this property, reliable predictions cannot be made. Therefore, in such domains, treating bias as a time-variable function is not just practical; it is necessary.

This study focuses on the issue of measurement error, which includes both systematic and random components. The systematic error component can be divided into constant and variable parts. Our main goal is to draw attention to the fact that the variable part of systematic error is often overlooked in current computational methods. In our model, we assume that all input parameters are independent. This assumption simplifies the theory and helps us develop the model more easily. However, we do not explore the uncertainties that come from variable measurement errors caused by correlations between input parameters. Modeling the variances and covariances of these correlated systematic errors is important, but it is beyond the scope of this article.

Since the true value cannot be known, the exact value of the bias can only be determined with uncertainty. However, even though the bias cannot be measured precisely, it does exist. In the internal QC in the clinical laboratory, a single reference standard for each level is used as control material in long time frames. These control materials, because of the technology used in their production, are not commutable, which means that they do not behave exactly as patient samples, and the exact value of the bias cannot be determined because of the matrix error. Even in this condition, the variability of the measured value can be used to determine the VCSE(t) within the limits of uncertainties of the statistical methods. Only the constant component of the bias remains uncertain.

Bias does not increase or decrease indefinitely because corrective actions are regularly applied in quality control processes. As a result, bias is always a bounded function and typically exhibits a cyclical nature. Unfortunately, these cycles are unequal in length, amplitude, and mean. Bias can be strictly increasing or decreasing only over short time periods. In contrast, the support of a normal distribution is unbounded: its probability density function extends infinitely in both directions. This contradiction rules out identifying the two. Therefore, neither the daily

{\bar{x}}_{i}

, nor the variations in bias, nor the long-term control data can be normally distributed. However, the bias function is not always continuous; in fact, it is usually discontinuous in practice.

A human intervention (e.g., calibration) causes an unpredictable change in the mean and bias and makes the function discontinuous, because the results after calibration are calculated using a different calibration graph. (Although calibrations are performed by humans, calibration errors are not human errors but inherent errors, because calibration is itself a measurement.) However, bias remains predictable between such interventions. It is essential to distinguish between two types of variability: random error and randomly varying systematic error. Random error refers to unpredictable changes that occur between two consecutive measurements (in replicated measurements according to VIM3 2.19 [2]). In contrast, the randomly variable systematic error is not consistent with the VIM3 2.19 definition, as it remains predictable in the short term (constant bias in the case of a stable reagent, or gradually increasing/decreasing bias, e.g., caused by reagent degradation). This latter type of error reflects changes in the bias that are not purely random but follow a pattern or function over time. Due to the unpredictable changes in the mean (caused by human interventions, e.g., calibration errors, though unexpected phenomena such as reagent contamination may also contribute), the bias becomes unpredictable over time, but not “unpredictable in replicated measurements”. Therefore, such bias variations are not consistent with the VIM3 2.19 definition of the RE.

Equation (4), which describes the probability density function, is valid only in constant conditions, a requirement that is frequently neglected. In variable conditions, Equation (5) is valid; it describes a different normal distribution at each moment t, but the mixture of these is not necessarily a normal distribution. For example, the mixture of two normal distributions with the same σ but different means (μ1 and μ2) is a bimodal distribution with peaks near μ1 and μ2, provided the means are sufficiently far apart (for equal weights, |μ1 − μ2| > 2σ). Due to SE (bias) variability, the classical error model (TE = SE + RE, and maxTE = SE + z · SD), which assumes a normal distribution of the long-term QC data, must be reevaluated.
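The bimodality of a mixture of two normal distributions can be verified with a small numerical sketch. It assumes an equal-weight mixture and counts the local maxima of the density on a fine grid; the function names and the grid size are arbitrary choices made for illustration.

```python
import numpy as np

def mixture_pdf(x, mu1, mu2, sigma=1.0):
    """Density of an equal-weight mixture of N(mu1, sigma) and N(mu2, sigma)."""
    scale = 1.0 / (sigma * np.sqrt(2 * np.pi))

    def gauss(mu):
        return scale * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

    return 0.5 * (gauss(mu1) + gauss(mu2))

def count_modes(mu1, mu2, sigma=1.0):
    """Number of local maxima of the mixture density on a fine grid."""
    x = np.linspace(min(mu1, mu2) - 5 * sigma, max(mu1, mu2) + 5 * sigma, 20_001)
    p = mixture_pdf(x, mu1, mu2, sigma)
    interior = p[1:-1]
    return int(np.sum((interior > p[:-2]) & (interior > p[2:])))

print(count_modes(0.0, 4.0))   # means far apart relative to sigma
print(count_modes(0.0, 1.0))   # means close together
```

When the two means are well separated relative to σ, the density has two peaks; when they are close, the mixture stays unimodal but is still not a normal distribution.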

Several authors (using different names, abbreviations, and definitions for the terms) have proposed alternative, empirically deduced error models to address the SE variability. This study set out to support these models mathematically. It started from a model proposed by AB Vandra [15] (Equations (2) and (3)), which takes one more step, linking the bias components to the values measured in repeatability (B_r(t)) and intermediate reproducibility (reproducibility within laboratory) conditions (

{\bar{B}}_{R W}

).

By the mathematical deduction of Equation (17), Equation (3) is validated, with one minor difference. While Equation (3) assumes a homoscedastic measurement system, Equation (17) is more general, being valid in both homoscedastic and heteroscedastic measurement systems. Simultaneously, the Dahlberg method of calculating

s_{r}

(the SD measured in repeatability conditions) from differences between duplicate measurements was also mathematically validated. The method uses data obtained in long time frames (intermediate conditions), assuming a homoscedastic measurement system. If the measurement system is heteroscedastic, we do not obtain

s_{r}

but a mean value,

{\bar{s}}_{r}

. This possibility supports the idea that under repeatability conditions, the constant conditions are essential, not the short time frame.

According to Equations (3) and (17) (and to similar equations in the literature), the SD determined from long-term data has two terms:

s_{r}

, and

s_{V C S E}

. The second term,

s_{V C S E}

, is not a random but a systematic component, because it is calculated from means (

{\bar{x}}_{i}

and

\bar{x}

), which in repeatability conditions are constants and do not vary unpredictably in replicated measurements (VIM3 2.19). In longer time frames

\bar{x}

is a constant.

{\bar{x}}_{i}

varies, but not randomly. Its possible values form a bounded set of values; therefore, the distribution is not normal.

This aspect suggests that the long-term QC data are not normally distributed, but rather a mixture of two distinct distributions: a normal distribution caused by the random errors, and a bounded distribution of the systematic component.

An interesting aspect of the mathematical deduction is Equation (18). The two components of Equations (3) and (17) not only have different (systematic and random) origins but also have different mathematical behavior when the averages of multiple measurements obtained under repeatability conditions are calculated. The SD of the normally distributed values reduces

\sqrt{n}

times (Equation (19)) if it is calculated from averages of n values. In Equation (18), only the random component, s_r, is reduced

\sqrt{n}

times; the

s_{V C S E}

is not, proving its different origin. This fact not only challenges the normal distribution of the long-term data, but also questions whether

s_{R W}

is the correct estimator of σ.

The computer simulations and statistical analysis supported the mathematical deduction. In the case of all bias types, independent of the number n of data used in the average calculations, the differences between the obtained s_RW and the theoretically predicted values (Equation (17)) were insignificant (Table 1).

The consecutive histograms (Figure 3) visually illustrate the effect of calculating the average of several results obtained under repeatability conditions. The first line (bias1–constant bias, normally distributed values) shows what can be expected from a normal distribution. As n, the number of measurements from which the average is calculated, increases, the histograms become narrower and sharper.

In the other three lines (cases of gradually increasing bias (bias4b), randomly variable bias (bias5a), and sinusoidal bias (bias6a)), the phenomenon is hardly observable because the distribution is not normal, but a mix of two distributions. As the SD of the random component decreases (being calculated from averages of n values), the histograms become more similar to the distribution of the systematic component.

The phenomenon is most pronounced in the last case (sinusoidal bias), where the histogram exhibits a bimodal shape, as in the cases of data obtained from a sinusoidal function (Figure 4). The histograms are consistent with Equation (18); the

s_{V C S E}

term remains unaffected by calculating the average of more results. The long-term means also exhibit more significant variability than could be expected from a normal distribution (Table 3).

When a human intervention, such as calibration, is performed between each pair of quality control (QC) measurements, it introduces unpredictable changes in both the mean and the bias. Calibration is subject to measurement errors; therefore, systematic errors that vary randomly over time (as in bias5a and bias5b) begin to behave like random bias (as in the bias2 case). Consequently, the long-term QC data start to look as if they followed a normal distribution, with a standard deviation close to σ ≈

s_{R W}

.

However, this effect applies only to the QC results. The QC results change unpredictably between two consecutive measurements due to the intervention. The same does not apply to the actual measured samples. All measurements between two calibrations are affected systematically by the last bias value.

A typical example of bias variation in clinical laboratory practice was reported by Vandra AB [16], where repeated measurements of ALP using an unstable reagent and high-level control showed that bias gradually changed between calibrations and shifted suddenly after each calibration—patterns that are common in laboratory work but often hidden by random error. Similarly, it has been noted that “Within a given day the small deviations of the calibration graph from an ‘ideal calibration graph’ affect all the samples in a systematic way” [21]. These examples highlight the importance of understanding bias variation in real-world settings and support the need for experimental validation using actual measurement data.

As supported by the examples in the Basics subsection, all studied bias variations are theoretically possible cases, but not all are significant in practice. In the clinical laboratory, the most important sources of bias variations are changes in reagent properties, which cause a gradual quasilinear variation (bias4 cases). In contrast, the calibrations (including measurement, stability, nominal value, and reconstitution errors) and the control material changes (reconstitution, stability, and nominal value errors) cause randomly variable biases (alternation between constant periods and unpredictable changes–bias5 cases) (Figure 2). The result is a sawtooth-shaped graph (masked by the random errors), a cyclical variation with unequal cycles (with different lengths, amplitudes, and means). Some of these cycles may last as long as a month.

One consequence of the non-Gaussian distribution caused by the cyclical variation in the bias values (B_r(t)) is the significant variability of the means and

s_{V C S E}

, causing a similar variability in

s_{R W}

. The phenomenon can be visually predicted if the graphs of the bias variations are studied. The longer the cycles in the case of bias4 (linearly increasing bias), the larger the

s_{V C S E}

, and consequently the

s_{R W}

. In the uniform distribution

s_{V C S E}

can be obtained by dividing the range of values by 2

\sqrt{3}

. Choosing different time frames with different ranges of bias values yields different means,

s_{V C S E}

and

s_{R W}

values. If the time frame becomes longer than the cycle, all possible bias values are obtained, and the

s_{V C S E}

values become stable. (

s_{V C S E}

≈

\frac{\text{possible range}}{2 \sqrt{3}}

). In the case of bias5, the possible range of bias values is reached only after several human interventions (calibrations, newly reconstituted calibrators, and control materials). A few values, even those that originated from a normal distribution, do not behave as a normal distribution, but rather as discrete values. For this reason, it can be predicted that the variability of the mean and

s_{V C S E}

will be more significant (Figure 7).
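The dependence of s_VCSE on the portion of the cycle covered by the time frame can be illustrated for a linearly increasing bias. The sketch uses one 57-value cycle like that of bias4c; the names `cycle`, `s_vcse_from_range`, and `partial`, and the chosen time-frame lengths, are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
cycle = np.linspace(1 - SQRT3, 1 + SQRT3, 57)   # one bias4c-like cycle

def s_vcse_from_range(values):
    """Uniform-distribution estimate: range of the bias values divided by 2*sqrt(3)."""
    return (values.max() - values.min()) / (2 * SQRT3)

# SD of the bias values seen in time frames shorter than, or equal to, one cycle
partial = {length: cycle[:length].std(ddof=0) for length in (10, 20, 40, 57)}

print("range-based estimate, full cycle:", s_vcse_from_range(cycle))
for length, sd in partial.items():
    print(f"first {length} runs: SD = {sd:.3f}, "
          f"range/(2*sqrt(3)) = {s_vcse_from_range(cycle[:length]):.3f}")
```

Time frames that cover only part of a cycle see only part of the possible range, so their s_VCSE is smaller and depends on where the frame starts and ends; once a full cycle is covered, the value stabilizes near the range-based estimate.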

The computer simulation data confirmed the former predictions. Data in Table 2 show that the significant variability of the

s_{R W}

calculated from 40 data points is linked to the incomplete cycles in the cases of bias4 (gradually increasing bias), bias5 (randomly variable bias), and bias6 (sinusoidal bias). If the cycles were shorter, the variability was less significant. In the case of the bias5 series, due to the inequality between the cycles (random changes), the variability was more significant, as proven by Cochran’s F-test.

The link between the lengths of the cycles and the

s_{R W}

variability is supported by the comparison between the corresponding

s_{R W}

values obtained in the bias4a and bias4b series, and in the bias5a and bias5b series, respectively (Figure 4). (The bias4b and bias5b series have longer cycles and more significant variability.)

The more significant

s_{R W}

variability can also be visually observed on the individual value plot charts (Figure 5). (The less dispersed first five examples are from the bias1 series (constant bias–normal distribution)).

The influence of bias variation on the medium-term means (and relative biases) was studied by calculating the ranges (the differences between the minimum and maximum values in 10,000 simulated data) of the moving averages of 17, 31, and 97 values. The range of a single computer-simulated value was used as a comparison. These values were chosen to be prime numbers, avoiding being multiples of 10 and 25.

The ranges of the normally distributed values of the constant bias (bias1_1) are borderline with respect to the estimated limits. Because the probability of having no values over the limits among 10,000 values is only about 37% (0.9999¹⁰⁰⁰⁰), marginal exceedances of the estimated limits are to be expected. Meanwhile, in all the other bias series, the variability of the biases is significantly bigger than could be expected from a normal distribution (Table 3). As in the case of

s_{R W}

, the most significant increases in the variability of the mean were obtained in the cases bias4c, bias5a, and bias5b (Table 3). The phenomenon can also be observed on the boxplots of the moving averages (Figure 8).

The mathematical deductions and computer simulations confirm the experimental data from the literature and the day-to-day experience of the authors regarding the significant variability of the means and $s_{RW}$. Monthly values of $s_{RW}$, as well as monthly means and bias variations, do not follow the laws of a normal distribution. Long-term QC data are not normally distributed. This observation supports assumptions 1, 4, and 6. The reason for this non-Gaussian behavior lies in the unequal cycles in bias variation.

Although the repeatability standard deviation $s_{r}$ is defined under repeatability conditions, it can still be separated and determined from long-term data. This is possible using methods such as EP15-A3 [25] or variants of the Dahlberg method [11]. This separation works because $s_{VCSE}$, which reflects the time-dependent component of bias, is not a random error component. If it were truly random, separating $s_{r}$ from long-term data would not be possible. The phenomenon highlights once more that in repeatability conditions, it is not the short time frame but the constant conditions that are important.
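A small simulation illustrates why the separation is possible. The sketch below uses the classical Dahlberg formula for duplicates, s = √(Σd²/2n), on data with a hypothetical sinusoidal daily bias; the seed, the bias shape, the number of days and all variable names are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(seed=7)   # arbitrary seed, for reproducibility only
n_days = 2_000
s_r_true = 1.0                        # repeatability SD built into the simulation

# Hypothetical variable bias: a sinusoid over days, constant within each day
days = np.arange(n_days)
daily_bias = np.sqrt(2.0) * np.sin(2.0 * np.pi * days / 60.0)

# Two replicates per day share the same bias but have independent random errors
first = daily_bias + rng.normal(0.0, s_r_true, size=n_days)
second = daily_bias + rng.normal(0.0, s_r_true, size=n_days)

# Dahlberg-type estimate from duplicates: s = sqrt(sum(d^2) / (2 n))
differences = first - second
s_r_estimate = np.sqrt(np.sum(differences**2) / (2 * n_days))

# SD of all values pooled over the whole period (the role played by s_RW)
s_long_term = np.std(np.concatenate([first, second]), ddof=1)

print(f"s_r estimate from duplicates: {s_r_estimate:.3f}")
print(f"long-term SD of all values:   {s_long_term:.3f}")
```

Because the bias is identical for both replicates of a day, it cancels in the within-pair difference, so the duplicate-based estimate stays close to the simulated repeatability SD while the pooled long-term SD is inflated by the bias variation.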

It is important to distinguish between the two types of unpredictable variation:

True random error changes unpredictably in replicated measurements, consistent with the VIM3 2.19 definition.
Random variation in bias, an alternation between predictable periods and unpredictable changes in the mean, is not consistent with the VIM3 2.19 definition because in replicated measurements, it may be predictable. It behaves differently and follows distinct patterns.

Understanding this difference is crucial for accurately interpreting measurement data and designing reliable quality control procedures.

Another conclusion is that the correct estimator of the σ parameter is $s_{r}$, not $s_{RW}$. According to Equations (3), (17), and (18), the latter also contains a systematic component ($s_{VCSE}$), which causes the non-Gaussian distribution of the long-term control data.
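The relationship between the three standard deviations can be checked against Table 1. The sketch below assumes that Equation (17) has the quadrature form $s_{RW} = \sqrt{s_{r}^{2} + s_{VCSE}^{2}}$; the equation itself lies outside this section, so this form is an assumption, although it reproduces the tabulated theoretical values. The function name is introduced here for illustration.

```python
import math

def s_rw_theoretical(s_r, s_vcse):
    """Assumed form of Equation (17): quadrature sum of the two components."""
    return math.sqrt(s_r**2 + s_vcse**2)

# (label, s_r, s_VCSE, tabulated theoretical s_RW), copied from Table 1
rows = [
    ("norm1/bias2", 1.001, 0.992, 1.409),
    ("norm2/bias2", 0.707, 0.992, 1.218),
    ("norm1/bias4a", 1.001, 1.106, 1.491),
    ("norm1/bias5b", 1.001, 1.046, 1.448),
    ("norm1/bias6a", 1.001, 1.000, 1.415),
]

for label, s_r, s_vcse, tabulated in rows:
    computed = s_rw_theoretical(s_r, s_vcse)
    print(f"{label:13s} computed = {computed:.3f}  Table 1 = {tabulated:.3f}")
```

The computed values match the table within the rounding of the inputs, which shows how a systematic component of comparable size raises $s_{RW}$ to roughly 1.4 times $s_{r}$ in the norm1 series.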

Equation (4) can only be used correctly when the conditions are stable, meaning that both the standard deviation (σ) and the mean (μ) stay constant. If the conditions change, the measured standard deviation (SD) does not represent the random error alone. Instead, it reflects the total variation in the data, which comes from two simultaneous sources: the random error and the variations in bias. These two effects must be separated and understood individually. This challenges the current paradigm, which often treats $s_{RW}$ as the estimator for σ. However, this approach is based on a series of flawed assumptions:

The conditions required for applying the normal distribution are not verified.
It is wrongly assumed that long-term control data follows a normal distribution.
There is confusion between parameters representing random and variable error components.
The distinction between constant and variable bias components is often ignored.
The term “the bias” carries a definitional uncertainty: neither the constant and variable bias components nor the biases measured in repeatability and intermediate (reproducibility within laboratory) conditions are distinguished, which permits confusion between bias types.

Correcting the wrong use of $s_{RW}$ as the estimator of the σ parameter is especially important in clinical laboratory quality control, which is based on the Westgard rules [14]. The long-term success of Westgard-rules-based QC can be explained by two compensating errors: because $s_{RW}$ is used as the estimator of σ (instead of the correct $s_{r}$), the decision limits are enlarged in proportion to the $s_{RW}$/$s_{r}$ ratio.

As J.O. Westgard and T. Groth acknowledged, “The calculations based on computer simulations behind the power function graphs are made assuming ‘within-run (repeatability) SD’, while the graphs are designed with ‘total SD’” [14]. (The statement contradicts QP1.) In this way, hundreds of predictable false alarms are avoided in the case of incorrigible biases. (As mentioned in the introduction, as the first among the assumed facts, half of the monthly mean biases measured in the internal QC are ≈1 $s_{RW}$. Based on the laws of the normal distribution, it can be predicted that in the case of a bias of only 1 SD, assuming three control runs/day on two levels for each measurand, daily R_1–2S false warnings will occur [1].) This compensation is not accurate, as shown by several false alarms in practice (after calibration, no improvement is obtained). A new quality control system is necessary, which simultaneously corrects both errors. The necessity of reevaluating all equations used in QC that contain SD in their formula will be discussed in Section 5.3.
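The predicted warning frequency can be reproduced with a short calculation. The sketch below assumes independent, normally distributed control values and reads the warning rule in the text as a single value falling outside ±2 SD; the function names are introduced here for illustration.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_outside_2s(bias_in_sd):
    """P(value outside +/-2 SD) when the mean is shifted by bias_in_sd (SD units)."""
    upper = 1.0 - normal_cdf(2.0 - bias_in_sd)
    lower = normal_cdf(-2.0 - bias_in_sd)
    return upper + lower

controls_per_day = 3 * 2   # three control runs per day on two levels, as in the text
for bias in (0.0, 1.0):
    p = prob_outside_2s(bias)
    print(f"bias = {bias:.0f} SD: P(outside 2 SD) = {p:.4f}, "
          f"expected warnings/day = {controls_per_day * p:.2f}")
```

Under these assumptions, a bias of 1 SD raises the probability of a warning from about 4.6% to about 16% per control value, which corresponds to roughly one warning per day and measurand with six control values per day.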

5.1. Misinterpretation of Time-Dependent Bias as Random Error

The clinical laboratory has its particularities in comparison with other domains in which metrology is used. The samples are more complex, the errors are often larger, and the variability in reagents is more significant. Because of these differences, it is not surprising that the questionable theory of transforming variable bias components into random phenomena originated in the clinical laboratory literature. Behind this theory lies a binary way of thinking: the belief that there is one “correct” standard deviation (SD) that should be used in all situations, specifically $s_{RW}$. This idea contradicts QP1, which recognizes that different conditions may require different estimators. The theory is based on a misinterpretation of the VIM3 2.19 [2] definition, more precisely of the ‘unpredictable in replicate measurements’ expression, considering all unpredictable phenomena as random.

The transformation of systematic error (SE) into random error is purely subjective. It is based on the assumption that once an unpredictable change occurs, the systematic error becomes unpredictable. Randomness in the metrological sense is an objective phenomenon, whereas unpredictable is subjective, depending on our ability to predict.

Two criteria serve to make a distinction between objectively transformed and subjectively perceived as transformed phenomena: (1) what is transformed becomes unmeasurable (e.g., we cannot measure the amount of ice in the water after melting); (2) what is transformed loses its properties (e.g., melted ice loses its form).

The random error (RE), or more precisely $s_{r}$, the estimator of σ, can be determined from long-term data. Meanwhile, the non-Gaussian distribution of the long-term control data proves that the VCSE(t) maintains its influence on the distribution. Hence, neither criterion of an objective transformation is met.

5.2. The VIM3 2.19 Definition of the Random Measurement Error

The VIM3 2.19 definition of the random measurement error is the “…component of the measurement error that in replicate measurements varies in an unpredictable manner”. Focusing on the ‘in replicate measurements’ expression, the definition does not leave room for incorrect interpretations. A correct interpretation is an unpredictable variation between all consecutive measurements, not simply ‘unpredictable.’

The misinterpretations in the literature [6,21] suggest that there is room for increasing the accuracy of the definition. There is a common tendency to treat any unpredictable phenomenon as random. In metrology, the term ‘random’ has a narrower and more specific meaning than ‘unpredictable’. The confusion between ‘random’ and ‘variable’ terms is traceable back to WA Shewhart, who, referring to SE variations, stated the following [8]: “The causes of this variability are, in general, unknown”. The consequence is our limited subjective capacity to predict. To avoid such confusions, an additional note is necessary for the VIM3 2.19 [2] definition as follows:

(Proposed note) “Unpredictable has a wider meaning than random (in the metrological sense). Unpredictable in replicate measurements must be understood as an unpredictable variation between each consecutive pair of measurements.”

5.3. The Subcomponents of the Systematic Measurement Error

$s_{VCSE}$ is calculated from the $\bar{x}_{i} - \bar{x}$ differences, which can be identified with the VCSE(t) function. According to QP3, due to the uncertainties of the determinations and the inherent errors of the corrective actions, the bias cannot be efficiently eliminated, and the constant component of systematic error (CCSE) exists, confirming the experience, the literature data [10], and the validity of Equation (2).

The CCSE can be identified with the long-term mean bias: CCSE ≡ $\bar{B}_{RW}$. According to QP4, only this component can be efficiently corrected. Correcting B_r(t) is theoretically possible, but it is an ephemeral solution because the bias reappears. Due to its variability, the B_r(t) value has a short validity term, and a correction applied after this term expires, as GUM [2,5] B.2.23 and B.2.24 recommend, is risky (it may increase the bias). By being consistent with the error model described in Equations (2) and (3), and by distinguishing between the bias measured in constant (repeatability) conditions and the bias measured in variable (intermediate, reproducibility within laboratory) conditions (B_r(t) and $\bar{B}_{RW}$, respectively), two sets of error parameters are obtained: B_r(t) with $s_{r}$, and $\bar{B}_{RW}$ with $s_{RW}$. These define two different TE equations (z is a confidence factor, usually z = 2, corresponding to approximately 95% confidence).

In repeatability conditions:

$${}_{\max}TE_{r}(t) = B_{r}(t) + z \cdot s_{r}$$

(20)

In reproducibility within laboratory conditions:

$${}_{\max}TE_{RW} = \bar{B}_{RW} + z \cdot s_{RW}$$

(21)
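Equations (20) and (21) translate directly into two small functions. The sketch below is illustrative only: the function names and the numerical inputs are hypothetical, and the bias is used exactly as given (the handling of its sign is not addressed here).

```python
def total_error_repeatability(b_r_t, s_r, z=2.0):
    """Equation (20): maximum total error under repeatability conditions."""
    return b_r_t + z * s_r

def total_error_within_lab(b_rw_mean, s_rw, z=2.0):
    """Equation (21): maximum total error under reproducibility within laboratory conditions."""
    return b_rw_mean + z * s_rw

# Hypothetical values, in arbitrary units, chosen only to show the call pattern
te_r = total_error_repeatability(b_r_t=1.2, s_r=1.0)
te_rw = total_error_within_lab(b_rw_mean=0.5, s_rw=1.41)
print(f"TE_r(t) = {te_r:.2f}")
print(f"TE_RW   = {te_rw:.2f}")
```

Keeping the two calculations in separate functions makes it harder to pair a bias from one set of conditions with an SD from the other.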

The mathematical demonstration in the Results section has shown that the VCSE(t) is hidden in both $B_{r}(t)$ and $s_{RW}$. Therefore, all equations that simultaneously contain these two parameters include the VCSE(t) redundantly and are erroneous. For example:

$$TE = B_{r}(t) + z \cdot s_{RW} \quad \text{(erroneous equation!)}$$

(22)

Usually, this error is committed when the SE is expressed as “the bias” with a definite article, which carries a definitional uncertainty (which bias?). Such equations usually take the following form:

$$TE = B + z \cdot s_{RW} \quad \text{(uncertain equation!)}$$

(23)

(Equation (23)-type formulas are frequently used in the clinical laboratory literature, substituting the bias value determined in the last EQA round, which is a $B_{r}(t)$; therefore, in practice, the redundant Equation (22) is used.)
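The redundancy can be made explicit by expanding both terms. The expansion below assumes that Equation (2) has the additive form $B_{r}(t) = CCSE + VCSE(t)$ and that Equation (17) has the quadrature form $s_{RW}^{2} = s_{r}^{2} + s_{VCSE}^{2}$. Both equations lie outside this section, so these forms are assumptions, although they are consistent with the statements above and with Table 1.

```latex
\begin{aligned}
TE_{(22)} &= B_{r}(t) + z \cdot s_{RW} \\
          &= \underbrace{CCSE + VCSE(t)}_{B_{r}(t)}
             + z \cdot \underbrace{\sqrt{s_{r}^{2} + s_{VCSE}^{2}}}_{s_{RW}}
\end{aligned}
```

In this form, the variable component enters twice: once as the momentary value VCSE(t) and once through its spread $s_{VCSE}$. In Equation (20) it appears only inside $B_{r}(t)$, and in Equation (21) only inside $s_{RW}$.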

To avoid such errors, it is imperative to distinguish between the bias types. Bias evaluations (whatever the methodology) do not determine “the bias”, but either $B_{r}(t)$ or $\bar{B}_{RW}$. It is necessary to standardize the names and abbreviations of the bias subcomponents and to define them in VIM3. To be consistent with VIM3 2.17, A.B. Vandra [16,17] proposed the following definitions:

The constant component of the systematic error is “The component of measurement error that in replicate measurements remains constant”.

The variable component of the systematic error is “The component of measurement error that in replicate measurements varies in a predictable manner”.

According to QP1, the conditions for determining the error parameters must be consistent with the conditions of their use. In a laboratory that performs periodic QC (in any domain), there are two types of decisions:

QC decisions: These are short-term decisions, such as whether measurements can proceed or whether corrective actions are needed. These decisions are made under repeatability conditions. In this context, Equation (20) is valid, and correct decisions can be made based on parameters measured in repeatability conditions. This principle is often violated when the decision limits are calculated from $s_{RW}$. Unfortunately, such recommendations exist [14]. The QC must be redesigned in all domains in which the decision limits are calculated with $s_{RW}$, to ensure that decisions are made under appropriate conditions (with decision limits calculated with $s_{r}$).
The evaluation of the uncertainty of the results is a long-term decision. In this case, Equation (21) is valid. To apply the GUM [2,5] recommendations, the first step is to correct for the bias. As mentioned earlier, a key finding of this study is that only the constant bias component can be corrected efficiently (QP4).

This study demonstrates that a distinction must be made between the constant and variable bias components, and that $s_{r}$ is the correct estimator of the mean RE (the σ parameter of the Gaussian equation). It thereby becomes the basis of a fundamental reevaluation of QC, mainly in the clinical laboratory, but also in other domains. Three main suggestions arise from the study, each calling for further research.

All calculations based on equations that include SD in their formula must be reevaluated, because the correct estimator of σ is $s_{r}$, not $s_{RW}$.
A new, more accurate quality control system is necessary, based on $s_{r}$ as the correct estimator of σ, which also efficiently avoids the incorrigible biases.
The model suggests that efficient bias corrections, such as GUM B.2.23 and B.2.24 recommendations, are possible only for the constant bias component (QP4).

Although these conclusions were drawn from the authors' experience in the clinical laboratory, they may also apply in other domains of metrology, which motivates further research.

6. Conclusions

The probability density function of the normal distribution is only valid when conditions are constant, meaning that both parameters, the expected value (μ) and the standard deviation (σ), do not change. However, in practice, these conditions cannot be maintained over long periods. Despite this, error parameters such as bias and standard deviation (SD) are usually determined from long-term quality control (QC) data. Regular, repeated QC is only necessary for measurement systems where the bias changes over time. Despite our efforts to eliminate it, a long-term bias persists, suggesting the existence of a constant bias component. Therefore, bias can be divided into two parts: a constant component and a variable component.

Several empirical models have been published to address the variability of bias. However, because there is no standard definition for the subcomponents of bias (for example, in VIM3), different authors have used various names, abbreviations, and definitions. This study aims to provide a mathematical basis for these models.

The current approach does not distinguish between biases measured in constant (repeatability) conditions and biases measured in variable (reproducibility within laboratory) conditions, although this distinction is routinely made for standard deviations (SDs). In this study, we demonstrate mathematically the following:

The long-term data are not normally distributed.
The correct estimator of σ is $s_{r}$ (the SD measured in constant conditions).
The parameter $s_{RW}$ reflects the variability of data influenced by two changing components: random error (RE) and time-dependent systematic error (VCSE(t)). It therefore includes both a random and a systematic component. The two components exhibit different behaviors, justifying a distinct classification.
Neglecting to distinguish between biases measured in repeatability and in reproducibility within laboratory conditions (B_r(t) and $\bar{B}_{RW}$, respectively) creates a definitional uncertainty, which is an error source; it is therefore mandatory to distinguish them. By distinguishing between bias types, two pairs of parameters are obtained, B_r(t) with $s_{r}$ and $\bar{B}_{RW}$ with $s_{RW}$, defining two different total error (TE) equations.
Both B_r(t) and $s_{RW}$ include VCSE(t); therefore, their simultaneous use in equations creates redundancy.
Only the CCSE = $\bar{B}_{RW}$ can be efficiently corrected.
Corrected equations for TE and measurement uncertainty were proposed.
The long-term decisions (measurement uncertainty calculations) must be based on long-term determined parameters, $\bar{B}_{RW}$ and $s_{RW}$. However, it must be noted that $s_{RW}$ is not the measure of the random error, but of variability.
Short-term decisions (e.g., internal QC) must be based on B_r(t) and $s_{r}$.
In all domains in which the QC limits are based on $s_{RW}$, they must be reevaluated (e.g., in the clinical laboratory).

The errors in the current approach can be attributed to three primary issues. First, the term “normal distribution” is misleading, which can cause confusion about when this distribution applies. Second, the definition of random measurement error in VIM3 section 2.19 is often misunderstood. Third, there is no clear definition for the different components of bias. To address these problems, this study proposes a note to clarify VIM3 2.19 and introduces clear definitions for the constant component of systematic error (CCSE) and the variable component of systematic error over time (VCSE(t)). These clarifications aim to improve the understanding and handling of measurement errors in scientific practice.

The proposed error model assumes independent input parameters and does not explore the effects of correlations between them. Also, the study focuses on clinical laboratory data, so further research is needed to confirm its applicability in other domains. Future studies should test this model in different measurement environments, explore the impact of correlated errors, and propose updates to international guides such as VIM3 to include definitions for CCSE and VCSE(t).

Author Contributions

Conceptualization, A.B.V.; Methodology, Á.D.-K.; Software, Á.D.-K.; Investigation, A.B.V. and Á.D.-K.; Resources, Á.D.-K.; Writing—original draft, A.B.V.; Writing—review & editing, Á.D.-K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SD	Standard deviation (in general)
B	Bias
TE	Total error
RE	Random error component
SE	Systematic error component
CCSE	The constant component of the SE
VCSE(t)	The variable component of the SE, a time-variable function
U	Expanded measurement uncertainty
QC	Quality control
EQA	External quality assessment
VIM3	International Vocabulary of Metrology (3rd Ed.)
GUM	Guide to the Expression of Uncertainty in Measurement
QP	Quintessential principle (valid in all fields of metrology)
RMS	Root mean square
$s_{r}$ (r index)	Parameter measured in repeatability conditions of measurement
$s_{RW}$ (RW index)	Parameter measured in reproducibility within laboratory conditions
$s_{VCSE}$	SD calculated from the daily mean values $\bar{x}_{i}$

References

Vandra, A. Calibration error, a neglected error source. eJIFCC, 2025; in press.
BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. International Vocabulary of Metrology—Basic and General Concepts and Associated Terms (VIM), 3rd ed.; Joint Committee for Guides in Metrology, JCGM: Sèvres, France, 2012; Volume 200.
ISO/IEC 17025:2017; General Requirements for the Competence of Testing and Calibration Laboratories. International Organization for Standardization and International Electrotechnical Commission: Geneva, Switzerland, 2017.
ISO 5725:2023; Accuracy (Trueness and Precision) of Measurement Methods and Results-Part 1–6. International Organization for Standardization. ISO: Geneva, Switzerland, 2023.
BIPM; IEC; IFCC; ILAC; ISO; IUPAC; IUPAP; OIML. Evaluation of Measurement Data-Guide to the Expression of Uncertainty in Measurement; Joint Committee for Guides in Metrology, JCGM: Sèvres, France, 2008; Volume 100.
Westgard, J.O. QC-The Calculations. Westgard QC. 2023. Available online: http://westgard.com/lessons/basic-qc-practices-l/lesson14.html (accessed on 25 August 2025).
Pearson, K. Contributions to the Mathematical Theory of Evolution. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. A 1894, 185, 71–110.
Shewhart, W.A. Economic quality control of manufactured product. Bell Syst. Tech. J. 1930, 9, 364–389.
NIST Sematech. Engineering Statistics Handbook; NIST Sematech: Gaithersburg, MD, USA, 2020. Available online: https://www.itl.nist.gov/div898/handbook (accessed on 25 August 2025).
Kumar, B.V.; Mohan, T. Sigma metrics as a tool for evaluating the performance of internal quality control in a clinical chemistry laboratory. J. Lab. Physicians 2018, 10, 194–199.
Dahlberg, G. Statistical Methods for Medical and Biological Students. Br. Med. J. 1940, 2, 358–359.
Synek, V.; Kříženecká, S. Estimation of uncertainty from duplicate measurements: New quantification procedure in the case of concentration-dependent precision. Accredit. Qual. Assur. 2023, 28, 279–298.
Magnusson, B.; Näykki, T.; Hovind, H.; Krysell, M.; Sahlin, E. Handbook for Calculation of Measurement Uncertainty in Environmental Laboratories, 537th ed.; NORDTEST: Serravalle Scrivia, Italy, 2017; Available online: http://www.nordtest.info/wp/2017/11/29/handbook-for-calculation-of-measurement-uncertainty-in-environmental-laboratories-nt-tr-537-edition-4/ (accessed on 4 May 2025).
Westgard, J.O.; Groth, T. Power functions for statistical control rules. Clin. Chem. 1979, 25, 863–869.
Pearson, K. Notes on the History of Correlation. Biometrika 1920, 13, 25–45.
Vandra, A.B. Reevaluation of the Variable Component of the Systematic Error Calls for Paradigm Change in Clinical Laboratory Quality Control. medRxiv 2023.
Vandra, A.B. The Variable Component of the Systematic Errors Calls for Paradigm Change in the Quality Control: Theoretical Reevaluation. JMIRx Med. 2025; submitted.
Haeckel, R.; Schneider, B. Detection of Drift Effects Before Calculating the Standard Deviation as a Measure of Analytical Imprecision. Clin. Chem. Lab. Med. 1983, 21, 491–498.
Krouwer, J.S. Setting Performance Goals and Evaluating Total Analytical Error for Diagnostic Assays. Clin. Chem. 2002, 48, 919–927.
Mackay, M.; Hegedus, G.; Badrick, T. Assay Stability, the missing component of the Error Budget. Clin. Biochem. 2017, 50, 1136–1144.
Theodorsson, E.; Magnusson, B.; Leito, I. Bias in clinical chemistry. Bioanalysis 2014, 6, 2855–2875.
Krystek, M. Calculating Measurement Uncertainties. Basic Principles and Implementation, 1st ed.; DIN Deutsches Institut für Normung e. V. Beuth Verlag: Berlin, Germany, 2016; p. 6.
Gauss, C.F. Bestimmung der Genauigkeit der Beobachtungen. Z. Für Astron. Verwandte Wiss. 1816, 1, 187–197.
Weisstein, E.W. Bessel’s Correction, MathWorld—A Wolfram Resource. Available online: https://mathworld.wolfram.com/BesselsCorrection.html (accessed on 25 August 2025).
CLSI. User Verification of Precision and Estimation of Bias; Approved Guideline, 3rd ed.; CLSI document EP15-A3; Clinical and Laboratory Standards Institute: Wayne, PA, USA, 2014.

Figure 1. Step-by-step structure of the proposed error model.

Figure 2. Different bias types are shown on the QC graphs studied.

Figure 3. Effect of bias types on the distribution of simulated data, shown as histograms (each version is labeled as biasX_1 to biasX_5, where the number indicates which base series was used).

Figure 4. Effect of sinusoidal bias types on the distribution of simulated data, shown as histograms (n = 5).

Figure 5. The results of the test for equal variances (in case of bias4a,b and bias5a,b, multiple comparison intervals for the SD, α = 0.05).

Figure 6. The individual values for the standard deviations calculated for 40 consecutive datasets grouped by the different simulation cases.

Figure 7. In different time frames, different means, $s_{VCSE}$ and $s_{RW}$, are obtained.

Figure 8. Boxplot of moving averages with window sizes 17, 31, and 97 for different simulation types (for norm1 cases).

Table 1. Summary of standard deviations for simulated data series.

Data Base	Bias Data 1	$s_{r}$	$s_{VCSE}$	$s_{RW}$	$s_{RW(\text{theor})}$, Equation (17)	Bias Data 2	$s_{r}$	$s_{VCSE}$	$s_{RW}$	$s_{RW(\text{theor})}$, Equation (17)
norm1	bias1	1.001	-	1.001	1.001	bias4c	1.001	1.018	1.411	1.427
norm2	bias1	0.707	-	0.707	0.707	bias4c	0.707	1.018	1.232	1.239
norm3	bias1	0.576	-	0.576	0.576	bias4c	0.576	1.018	1.163	1.169
norm4	bias1	0.502	-	0.502	0.502	bias4c	0.502	1.018	1.133	1.135
norm5	bias1	0.451	-	0.451	0.451	bias4c	0.451	1.018	1.110	1.113
norm1	bias2	1.001	0.992	1.416	1.409	bias5a	1.001	0.986	1.402	1.405
norm2	bias2	0.707	0.992	1.219	1.218	bias5a	0.707	0.986	1.213	1.213
norm3	bias2	0.576	0.992	1.150	1.147	bias5a	0.576	0.986	1.144	1.142
norm4	bias2	0.502	0.992	1.114	1.112	bias5a	0.502	0.986	1.107	1.107
norm5	bias2	0.451	0.992	1.094	1.090	bias5a	0.451	0.986	1.087	1.084
norm1	bias3	1.001	0.998	1.409	1.414	bias5b	1.001	1.046	1.453	1.448
norm2	bias3	0.707	0.998	1.221	1.223	bias5b	0.707	1.046	1.269	1.262
norm3	bias3	0.576	0.998	1.153	1.153	bias5b	0.576	1.046	1.197	1.194
norm4	bias3	0.502	0.998	1.121	1.118	bias5b	0.502	1.046	1.164	1.160
norm5	bias3	0.451	0.998	1.102	1.095	bias5b	0.451	1.046	1.140	1.139
norm1	bias4a	1.001	1.106	1.488	1.491	bias6a	1.001	1.000	1.429	1.415
norm2	bias4a	0.707	1.106	1.301	1.312	bias6a	0.707	1.000	1.222	1.225
norm3	bias4a	0.576	1.106	1.235	1.246	bias6a	0.576	1.000	1.150	1.154
norm4	bias4a	0.502	1.106	1.206	1.214	bias6a	0.502	1.000	1.115	1.119
norm5	bias4a	0.451	1.106	1.186	1.194	bias6a	0.451	1.000	1.092	1.097
norm1	bias4b	1.001	1.041	1.452	1.444	bias6b	1.001	1.000	1.426	1.415
norm2	bias4b	0.707	1.041	1.262	1.258	bias6b	0.707	1.000	1.237	1.225
norm3	bias4b	0.576	1.041	1.192	1.189	bias6b	0.576	1.000	1.162	1.154
norm4	bias4b	0.502	1.041	1.158	1.156	bias6b	0.502	1.000	1.127	1.119
norm5	bias4b	0.451	1.041	1.138	1.134	bias6b	0.451	1.000	1.102	1.097

Table 2. The variability of the $s_{RW}$ values that contradict the normal distribution is linked to the incomplete cycles from which they are calculated, for norm1 cases ($F_{crit} = F_{0.004}(39, 9999) = 1.72$).

Bias Type	bias1	bias2	bias3	bias4a	bias4b	bias4c	bias5a	bias5b	bias6a	bias6b
${}_{\min}s_{RW}$	0.721	1.074	1.021	1.126	1.026	0.846	0.810	0.728	1.074	0.861
${}_{\max}s_{RW}$	1.287	1.785	1.864	1.957	2.039	1.836	2.213	2.605	1.736	1.968
total $s_{RW}$	1.001	1.418	1.410	1.499	1.447	1.414	1.326	1.262	1.438	1.355
difference	0.566	0.711	0.843	0.831	1.013	0.989	1.403	1.877	0.663	1.107
difference (%)	56.53	50.11	59.76	55.42	69.98	69.97	105.79	148.71	46.08	81.69
F_min	1.926	1.745	1.907	1.771	1.988	2.771	2.682	3.003	1.794	2.480
F_max	1.653	1.583	1.747	1.705	1.985	1.694	2.784	4.261	1.458	2.108
No. of rejected samples	1	0	6	0	4	12	38	44	1	12

Table 3. The variability of the medium-term means in the case of different bias types.

Stat	N	bias1	bias4a_1	bias4b_1	bias4c_1	bias5a_1	bias5b_2	Estimated
min	1	−2.953	−4.026	−3.590	−3.623	−3.941	−4.214
max		4.619	5.712	5.448	5.765	6.362	6.269
dif		7.571	9.738	9.038	9.388	10.304	10.483	7.600
min	17	−0.002	−0.082	−0.334	−0.927	−1.191	−1.808
max		1.960	2.019	2.393	2.723	3.630	5.112
dif		1.962	2.100	2.727	3.650	4.821	6.920	1.843
min	31	0.260	0.265	0.176	−0.328	−0.517	−1.459
max		1.845	1.801	2.026	2.291	2.782	4.344
dif		1.584	1.537	1.850	2.618	3.300	5.803	1.365
min	97	0.600	0.577	0.604	0.461	0.210	−0.556
max		1.445	1.450	1.485	1.529	1.865	2.593
dif		0.845	0.873	0.881	1.067	1.654	3.149	0.772

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
