Quantum Errors and Disturbances: Response to Busch, Lahti and Werner

Busch, Lahti and Werner (BLW) have recently criticized the operator approach to the description of quantum errors and disturbances. Their criticisms are justified to the extent that the physical meaning of the operator definitions has not hitherto been adequately explained. We rectify that omission. We then examine BLW's criticisms in the light of our analysis. We argue that, although the approach BLW favour (based on the Wasserstein 2-deviation) has its uses, there are important physical situations where an operator approach is preferable. We also discuss the reason why the error-disturbance relation is still giving rise to controversies almost a century after Heisenberg first stated his microscope argument. We argue that the source of the difficulties is the problem of interpretation, which is not so wholly disconnected from experimental practicalities as is sometimes supposed.


Introduction
The error-disturbance principle remains highly controversial almost a century after Heisenberg wrote the paper [1] which originally suggested it. It is remarkable that this should be so, since the disagreements concern what is arguably the most fundamental concept of all, not only in physics, but in empirical science generally: namely, the concept of measurement accuracy. Measuring instruments are not born equal. If one did not have a way to distinguish measurements which are in some sense "good" from measurements which are in some sense "bad" (if one did not have what Busch et al. [2] call a "figure of merit") one would be forced to regard all measurements as being on the same footing. There would, in fact, be no reason to prefer numbers obtained using a state-of-the-art photon counter to those obtained using the cheaper, less demanding procedure of making a blind guess. Under such conditions empirical science would be impossible. Since physics has actually made huge advances over the last century it is obvious that, on a practical level, experimentalists have ways to distinguish good measurements from bad. However, those practical methods are not supported by an adequate understanding at the theoretical level.
It is worth asking why, given the fundamental importance of the problem, progress has been so slow. Although it is true that the problem is technically demanding, it [...] less than perfectly accurate, and that the electron really will (in some sense) be disturbed by the photon. The situation seems to be crying out for proper quantum mechanical analysis. Yet it evidently did not seem that way to Heisenberg. Nor, apparently, did it seem that way to most other people before the 1960s. During the period between 1927 and the 1965 paper of Arthurs and Kelly [15] one finds various paraphrases and elaborations of the statements in Heisenberg's original paper but we are not aware of any clear statement of the error-disturbance principle conceived as a proposition distinct from the Kennard-Pauli-Weyl inequality, or any recognition of the fact that a quantum mechanical definition of measurement accuracy is needed. The question arises: Why is it that Heisenberg and so many others failed to draw what seems to most people now the obvious conclusion from his uncertainty paper? The answer, we suggest, is that their understanding was obstructed by one of the features of the Copenhagen interpretation.
In the words of Bell [16] the Copenhagen interpretation² divides the world "into speakable apparatus . . . that we can talk about . . . and unspeakable quantum system that we can not talk about" (ellipses in the original). This idea has been hard to maintain since the 1970s, when it was realized, in connection with the problem of gravity-wave detection, that the error-disturbance principle is relevant to highly accurate measurements of a macroscopic oscillator [20,21]. Such an oscillator is just as speakable as any other piece of laboratory apparatus; yet at the same time we need to analyze its behaviour quantum mechanically. But in the early days of quantum mechanics the unspeakability of quantum systems was accepted by almost everyone. Thinking of the quantum world as ineffable, and beyond the reach of thought [18] (forgetting that the quantum world is the one in front of our noses) encouraged the perception that quantum mechanical measurements are so utterly different from classical ones that no points of contact with classical concepts are possible. In particular, it encouraged the assumption that the classical concept of error cannot carry over to quantum mechanics in any shape or form. This, we would suggest, is why Heisenberg did not follow through on what now seems the obvious implication of his microscope argument, and formulate an error-disturbance principle. He did not do so because he rejected the very idea of a quantum error, or a quantum disturbance.
² Of course, the Copenhagen interpretation is a somewhat nebulous entity. For one thing different proponents had different ideas (see Faye [17] and references cited therein). For another the views of individual proponents evolved in the course of time (see Plotnitsky [18] for the evolution in Bohr's thinking, Camilleri [19] for the evolution in that of Heisenberg). It is consequently impossible to give a characterization which is both concise and fully adequate. However, it appears to us that Bell's one sentence summary does identify a theme which, in one form or another, is common to all the variants.
Corresponding to the idea that there are two different worlds, speakable and unspeakable, there is a widespread assumption that there are two kinds of measurement, classical and quantum. If highly accurate determinations of the center-of-mass motion of a macroscopic object are to be treated as quantum measurements then it is hard to see how one can consistently make such a distinction. Instead, one seems forced to the view that every measurement is a quantum measurement, measurements with a meter rule not excluded. To be sure, low precision measurements with a meter rule permit simplifying assumptions which cease to be valid as one increases the accuracy. However, that is purely a matter of practical convenience, not the signal of a fundamental difference of kind. In the case of kinematics we continue to use the Newtonian theory when analyzing low velocity motion, without taking this to mean that there is a fundamental difference of kind between the relativistic momentum of a space-ship travelling at near light speed and the Newtonian momentum of a train on the London underground. Similarly in the case of measurements: we need a unified description.
In particular, we need a unified description of measurement errors. The statement, that the kind of sophisticated measurement on a macroscopic object which demands a quantum analysis is more accurate than a commonplace measurement with a meter rule, tacitly assumes that there is a single concept of accuracy applicable to both. Otherwise, we would not have the basis for a comparison. In the case of kinematics the Newtonian definition of momentum is an approximation to the relativistic definition, valid for low velocities. In the same way, we need an overarching quantum definition of error, which effectively reduces to the classical one in limiting cases. At first sight this may seem impossible, since quantum mechanics requires us to drop the assumption that a measurement ascertains the pre-existing value of a specified observable. However, on further reflection it will be seen that even on classical assumptions one is never able to directly compare the measured value with the pre-existing true one. In classical physics as in quantum physics, measured values are the only ones available. It follows that, although in classical principle the error is the difference between the measured value and the true one, in point of classical practice it must be possible to do everything using measured values only.
The purpose of this paper is to make a small beginning on the task of constructing a unified theory of measurement. We focus on Busch, Lahti and Werner's (BLW's) criticisms [22][23][24][25] of the operator approach [26][27][28][29][30] to the description of quantum errors and disturbances. Their criticisms raise some issues which are highly relevant to the above discussion, and which need to be settled if we hope to make progress. It should be stressed, that although our conclusion is that the operator approach is more useful than BLW allow, we are far from rejecting everything they say. In particular, we completely agree with them on what is, perhaps, the most essential point, that quantum errors and disturbances need to be defined operationally. Moreover, in defending the operator approach, it is no part of our intention to impugn the distributional approach they favour. No one would say that the RMS characterization of an ordinary uncertainty is either "better" or "worse" than an entropic characterization. Rather one has different quantitative measures each of which has advantages and disadvantages. Similarly here. The task is not to single out one particular approach as somehow canonical, but rather to achieve a clear understanding, at the basic conceptual level, of what is meant by the words "error" and "disturbance" in a quantum mechanical context, and of the different ways of quantifying the concepts.
There are two versions of the operator approach (or O approach as we will call it from now on). BLW's criticisms are largely directed against the state-dependent version proposed by Ozawa [27,28]. However, we had previously proposed a state-independent version [26]. Both versions are relevant to our discussion. In Section 2 we compare and contrast them. Section 3 is the core of the paper. We begin with the classical concepts of error and disturbance. We show that there are at least two ways to reformulate them in a manner which does not involve a comparison with pre-existing values. We then show that the reformulated definitions have natural quantum generalizations, which we call the D and C definitions. The D and C errors are thus candidates for the overarching concept of measurement accuracy which, we argued above, is necessary if one wants to construct a unified theory of measurement, in which every measurement is seen as quantum. They also have an important bearing on BLW's criticism of the O approach. As BLW correctly observe, the O definitions are nonoperational. However, the D and C definitions are operational. Moreover, the O quantities are upper bounds on the corresponding D and C quantities. This gives indirect operational meaning to the O quantities. Specifically, it means that if one of the O quantities is small, then there are at least two well-defined operational senses in which the measurement is accurate or non-disturbing. The situation when an O quantity is large is more problematic. In the state-independent case it is possible that smallness of the O error/disturbance is both necessary and sufficient for the measurement to be accurate/non-disturbing in a well-defined operational sense. However, we have not been able to prove this.
In Section 4 we analyze BLW's objections to the O approach in the light of the foregoing. BLW contrast the operator approach with what they call a distributional approach. It is to be observed, however, that the D and C quantities are also defined distributionally. Since the O quantities owe their physical meaning to their connection with the D and C quantities, it follows that the O quantities are indirectly distributional. In short, the problem is not to decide between a distributional approach and some other, completely different approach. Rather it is to decide between two different kinds of distributional approach. As with all such questions, the answer is relative to the situation of interest. We show that there is at least one important class of physical problems for which the D error, and by extension the O error, are clearly more appropriate than the definition which BLW favour, based on the Wasserstein 2-deviation.
Finally, in the Appendix, we give a more careful proof of the error-disturbance and error-error relations than the one we presented in ref. [26]. In that earlier paper we skated over certain questions of domain and differentiability. We here take the opportunity to fill in the missing details.

The Operator Approach
In this section we outline the operator characterization of quantum errors and disturbances. Our aim is purely descriptive. We justify the approach, and respond to the various criticisms which have been made of it, in subsequent sections.
Consider a classical measurement of position. Let $x_i$, $p_i$ be the position and momentum immediately before the measurement and let $x_f$, $p_f$ be their values immediately after it. Let $\mu_f$ be the final value of the pointer observable. Then the error in the measurement of position is $\mu_f - x_i$ and the disturbance to the momentum is $p_f - p_i$ (classical physics does not, of course, require there to be a disturbance to the momentum, but such a disturbance is perfectly possible). On the level of formal analogy it is natural to ask what happens if one replaces the classical variables in these expressions with the corresponding Heisenberg picture operators. Let $\mathcal{H}_s$ and $\mathcal{H}_a$ be the Hilbert spaces for the system and apparatus respectively, and assume that system+apparatus are initially in the product state $\hat{\rho}\otimes\hat{\alpha}$, where $\hat{\rho}$ is the density matrix of the system and $\hat{\alpha}$ is the density matrix of the apparatus. Let $\hat{U}: \mathcal{H}_s\otimes\mathcal{H}_a \to \mathcal{H}_s\otimes\mathcal{H}_a$ be the unitary operator describing the measurement interaction, let
$$\hat{x}_i = \hat{x}, \qquad \hat{p}_i = \hat{p}, \qquad \hat{\mu}_i = \hat{\mu} \tag{1}$$
be the position, momentum and pointer Heisenberg picture observables immediately before the measurement interaction commences, and let
$$\hat{x}_f = \hat{U}^{\dagger}\hat{x}\hat{U}, \qquad \hat{p}_f = \hat{U}^{\dagger}\hat{p}\hat{U}, \qquad \hat{\mu}_f = \hat{U}^{\dagger}\hat{\mu}\hat{U} \tag{2}$$
be the Heisenberg picture observables immediately after the interaction has finished. Formal analogy with the classical case then suggests that we define³
$$\hat{\epsilon}_X = \hat{\mu}_f - \hat{x}_i, \qquad \hat{\delta}_P = \hat{p}_f - \hat{p}_i. \tag{3}$$
We refer to $\hat{\epsilon}_X$ (respectively, $\hat{\delta}_P$) as the error (respectively, disturbance) operator.
³ In Appleby [26,31-33] we also introduced the predictive error operator $\hat{\mu}_f - \hat{x}_f$. In this paper we will focus exclusively on the retrodictive operator since that is the one which gives rise to conceptual difficulties.
We then obtain a numerical characterization of the error by defining
$$\Delta^{\rho}_{e} x = \left(\mathrm{Tr}\left(\hat{\epsilon}_X^{2}\,(\hat{\rho}\otimes\hat{\alpha})\right)\right)^{1/2} \tag{4}$$
and a numerical characterization of the disturbance by defining
$$\Delta^{\rho}_{d} p = \left(\mathrm{Tr}\left(\hat{\delta}_P^{2}\,(\hat{\rho}\otimes\hat{\alpha})\right)\right)^{1/2}. \tag{5}$$
We label the quantities with a superscript $\rho$ because, while the apparatus "ready" state $\hat{\alpha}$ is assumed to be always the same, the system state $\hat{\rho}$ can vary. The operators $\hat{\epsilon}_X$, $\hat{\delta}_P$ are unbounded, which means that the quantities $\Delta^{\rho}_{e} x$, $\Delta^{\rho}_{d} p$ are not defined for every state $\hat{\rho}\otimes\hat{\alpha}$. In the following we will always assume that $\hat{\rho}$ is in the set of physical states $\mathcal{P}$ defined in the Appendix. If this is true then, provided that $\hat{\alpha}$ is appropriately chosen, the expectation value $\mathrm{Tr}(\hat{M}(\hat{\rho}\otimes\hat{\alpha}))$ is well defined and finite for every monomial $\hat{M}$ in the operators defined above.
Of course, we have not yet justified the interpretation of $\Delta^{\rho}_{e} x$ and $\Delta^{\rho}_{d} p$ as an error and disturbance (beyond noting the formal analogy with classical physics which, though suggestive, is clearly not sufficient to justify the proposal). We defer a proper justification to the next section and focus here on the question, whether there exists an error-disturbance relation expressible in terms of these quantities. In various special cases [15,31,32,34-37] one does indeed have
$$\Delta^{\rho}_{e} x\,\Delta^{\rho}_{d} p \ge \frac{\hbar}{2} \tag{6}$$
analogous to the ordinary uncertainty relation $\Delta x\,\Delta p \ge \hbar/2$. However, as we showed in ref. [26], it is easy to see that the inequality cannot be generally valid. Consider a simple model for the measurement process, in which the pointer observable $\hat{\mu}$ is the position of a particle having momentum $\hat{\pi}$ and in which the measurement rotates the system particle position onto the pointer particle position, so that
$$\hat{\mu}_f = \hat{x}_i. \tag{7}$$
Such a rotation is effected by
$$\hat{U} = \exp\left(-\frac{i\pi}{2\hbar}\hat{H}\right), \tag{8}$$
$$\hat{H} = \hat{x}\hat{\pi} - \hat{\mu}\hat{p} \tag{9}$$
(so if $\hat{x}$, $\hat{\mu}$ were different components of the position of a single particle in three dimensions $\hat{H}$ would be a component of the angular momentum operator). The fact that $\hat{\mu}_f = \hat{x}_i$ means that $\hat{\epsilon}_X = 0$. It is easy to see that $\hat{\delta}_P = -\hat{\pi}_i - \hat{p}_i$. So this is a measurement for which the error is zero while the disturbance is finite for every physical state.
Although we are mainly concerned with the error-disturbance relation in this paper it is worth noting that exactly the same argument shows [26] that the error-error relation
$$\Delta^{\rho}_{e} x\,\Delta^{\rho}_{e} p \ge \frac{\hbar}{2} \tag{10}$$
for a joint measurement of position and momentum cannot be valid in general. Indeed, consider a joint measurement in which the interaction of the particle with the position pointer is described by the unitary in Eq. (8), while the momentum pointer just goes along for the ride, without interacting at all. One then has $\Delta^{\rho}_{e} x = 0$ and $\Delta^{\rho}_{e} p = \langle(\hat{\mu}_{P,i} - \hat{p}_i)^{2}\rangle^{1/2}$ (where $\hat{\mu}_{P,i} = \hat{\mu}_{P,f}$ is the momentum pointer position). Even though the momentum is not really being measured at all, $\Delta^{\rho}_{e} p$ is still finite for every physical state. So Inequality (10) is violated for every physical state.
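As a check on the rotation model, the following sketch works out the Heisenberg picture operators explicitly. It assumes a generator of the form $\hat{H} = \hat{x}\hat{\pi} - \hat{\mu}\hat{p}$, a rotation angle $\theta = \pi/2$ (here $\pi$ is the number, not the pointer momentum $\hat{\pi}$), and the convention $\hat{A}(\theta) = \hat{U}(\theta)^{\dagger}\hat{A}\,\hat{U}(\theta)$. The symbols $\theta$, $\hat{U}(\theta)$, $\Delta_{\alpha}\pi$, $\Delta_{\rho}p$, $\bar{\pi}_{\alpha}$, $\bar{p}_{\rho}$ are introduced here for illustration only, and the sign conventions are an assumption chosen to reproduce $\hat{\mu}_f = \hat{x}_i$ and $\hat{\delta}_P = -\hat{\pi}_i - \hat{p}_i$.

```latex
% Requires amsmath. Assumed conventions: [x,p] = [mu,pi] = i*hbar,
% all other commutators between x, p, mu, pi vanish.
\begin{align*}
\hat{U}(\theta) &= \exp\!\Big(-\tfrac{i\theta}{\hbar}\hat{H}\Big),
  \qquad \hat{H} = \hat{x}\hat{\pi} - \hat{\mu}\hat{p},
  \qquad \frac{d\hat{A}(\theta)}{d\theta} = \frac{i}{\hbar}\,[\hat{H},\hat{A}(\theta)] \\[4pt]
% Commutators of the generator with the four canonical operators
[\hat{H},\hat{x}] &= i\hbar\,\hat{\mu}, \quad
[\hat{H},\hat{\mu}] = -i\hbar\,\hat{x}, \quad
[\hat{H},\hat{p}] = i\hbar\,\hat{\pi}, \quad
[\hat{H},\hat{\pi}] = -i\hbar\,\hat{p} \\[4pt]
% Solving the resulting linear equations gives a rotation in each plane
\hat{x}(\theta) &= \hat{x}\cos\theta - \hat{\mu}\sin\theta, &
\hat{\mu}(\theta) &= \hat{\mu}\cos\theta + \hat{x}\sin\theta \\
\hat{p}(\theta) &= \hat{p}\cos\theta - \hat{\pi}\sin\theta, &
\hat{\pi}(\theta) &= \hat{\pi}\cos\theta + \hat{p}\sin\theta \\[4pt]
% Setting theta = pi/2
\hat{\mu}_f &= \hat{x}_i, \quad \hat{x}_f = -\hat{\mu}_i, &
\hat{p}_f &= -\hat{\pi}_i, \quad \hat{\pi}_f = \hat{p}_i \\[4pt]
\hat{\epsilon}_X &= \hat{\mu}_f - \hat{x}_i = 0, &
\hat{\delta}_P &= \hat{p}_f - \hat{p}_i = -\hat{\pi}_i - \hat{p}_i \\[4pt]
% For the product state rho (x) alpha the cross term factorizes
\mathrm{Tr}\big(\hat{\delta}_P^{2}\,(\hat{\rho}\otimes\hat{\alpha})\big)
  &= (\Delta_{\alpha}\pi)^{2} + (\Delta_{\rho}p)^{2}
   + (\bar{\pi}_{\alpha} + \bar{p}_{\rho})^{2}
\end{align*}
```

Under these assumptions the error vanishes identically, while the squared disturbance is a sum of finite non-negative terms for any state with finite second moments.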
The fact that Inequalities (6), (10) are not generally valid was noted by ourselves⁴ [26] and subsequently by Ozawa [27,28,38-40]; in the case of (10) also by Hall [41]. We, Ozawa, and Hall responded to these facts by trying to find alternative inequalities which are generally valid. However, we on the one hand, and Ozawa and Hall on the other, were led in different directions. We begin by describing our approach to the problem, since this came first in point of time.
⁴ In ref. [26] we framed the discussion in the context of joint measurements of position and momentum. However, the example we used to make the point was the model interaction described by Eq. (8) above, in which the momentum pointer does not interact with the system at all, and which can therefore just as well be regarded as a single measurement of position only. Generally speaking any measurement of position only can be regarded as a joint measurement in which the momentum pointer does not interact with the system. Conversely, a joint measurement of both position and momentum becomes a single measurement of position only if we simply disregard the momentum reading.
The essential point will emerge most clearly if we start with the violation of Inequality (10) by the measurement described by Eq. (8). For this measurement it is not simply that the product $\Delta^{\rho}_{e} x\,\Delta^{\rho}_{e} p$ is less than $\hbar/2$ for a certain subset of initial states. The product is in fact strictly zero for every possible initial state. However, it would be rash to conclude from this that the measurement is in some sense "best possible". As we noted above, the momentum pointer does not interact with the system, which means that so far as momentum is concerned the measurement is not only not highly accurate, it cannot properly be described as a measurement at all. It is true that $\Delta^{\rho}_{e} p$ is small for a certain, highly specific set of initial states. However, that is not a reason for describing the measurement as accurate. Consider the following scenario: Alice goes to Bob's shop and buys what Bob says is a highly accurate ammeter. However, when she gets home she finds that the needle is stuck at the 1 amp position. When she goes back to complain Bob is unrepentant. He insists that the meter is indeed highly accurate provided one uses it to measure a 1 amp current.
Clearly, Alice will not be satisfied with this response. No more would she be satisfied with the claim, that the interaction described by Eq. (8) gives a highly accurate measurement of momentum.
This example shows that the smallness of the product $\Delta^{\rho}_{e} x\,\Delta^{\rho}_{e} p$ is not always the signature of a highly accurate joint measurement of position and momentum. Similar remarks apply to the product $\Delta^{\rho}_{e} x\,\Delta^{\rho}_{d} p$. Consider, for instance, a "measurement" for which $\hat{U}$ is the identity, so that there is no coupling whatever between system and apparatus. Here $\Delta^{\rho}_{d} p$ is zero for every possible initial state while $\Delta^{\rho}_{e} x$ is always finite and sometimes small. Yet, as in the broken ammeter example, it would be an abuse of language to describe this as a measurement of position which is always non-disturbing and sometimes highly accurate.
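The decoupled case can be made explicit with a short calculation. The sketch below assumes the operator definitions of error and disturbance as root-mean-square expectation values in the product state $\hat{\rho}\otimes\hat{\alpha}$; the symbols $\bar{\mu}_{\alpha}$, $\bar{x}_{\rho}$, $\Delta_{\alpha}\mu$, $\Delta_{\rho}x$ (pointer and system means and standard deviations) are introduced here for illustration.

```latex
% Requires amsmath. Identity "measurement": U = 1, so nothing changes.
\begin{align*}
\hat{U} = 1 \;&\Rightarrow\; \hat{\mu}_f = \hat{\mu}_i, \quad \hat{p}_f = \hat{p}_i \\[4pt]
\hat{\delta}_P &= \hat{p}_f - \hat{p}_i = 0
  \;\Rightarrow\; \Delta^{\rho}_{d} p = 0 \\[4pt]
\hat{\epsilon}_X &= \hat{\mu}_i - \hat{x}_i \\[4pt]
% Pointer and system are uncorrelated, so <mu x> = <mu><x>
\big(\Delta^{\rho}_{e} x\big)^{2}
  &= \mathrm{Tr}\big((\hat{\mu}-\hat{x})^{2}\,(\hat{\rho}\otimes\hat{\alpha})\big) \\
  &= (\Delta_{\alpha}\mu)^{2} + (\Delta_{\rho}x)^{2}
   + (\bar{\mu}_{\alpha} - \bar{x}_{\rho})^{2}
\end{align*}
```

On this reading the error is small exactly when the system happens to be sharply localized at the position where the pointer already sits, which is the quantum counterpart of feeding a 1 amp current to the stuck ammeter.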
In ref. [26] these considerations led us to look for replacements for the products $\Delta^{\rho}_{e} x\,\Delta^{\rho}_{e} p$, $\Delta^{\rho}_{e} x\,\Delta^{\rho}_{d} p$ whose smallness can unequivocally be regarded as the signature of a measurement which is in some sense "good". In the broken ammeter example what makes Bob's claim absurd is the fact that an accurate classical ammeter is one for which the measured value is close to the true one, not just for one particular current, but for every current within a wide range. Applying the same principle to the quantum case suggests that we define the error by
$$\Delta_{e} x = \sup_{\hat{\rho}\in\mathcal{P}} \Delta^{\rho}_{e} x \tag{11}$$
where $\mathcal{P}$ is the set of physical states, as defined in Appendix A. As we saw above, the smallness of $\Delta^{\rho}_{e} x$ for some particular $\hat{\rho}$ is consistent with the apparatus being completely decoupled from the system, so that it is not really measuring anything. But if $\Delta_{e} x$ is small it means that $\Delta^{\rho}_{e} x$ is small for every possible state and we clearly are entitled to say that the measurement is highly accurate (taking into account the discussion in Section 3). Similar principles apply to the concept of disturbance. Consider, for instance, the measurement described by Eq. (8), which rotates $\hat{\mu}$ onto $\hat{x}$. For this measurement $\Delta^{\rho}_{d} p$ will be small for certain special choices of $\hat{\rho}$ and $\hat{\alpha}$. However, it will typically be large. A medical procedure would not usually be described as non-invasive merely on the grounds that it can occasionally happen that the patient escapes almost intact. Similarly here. We accordingly define the disturbance to be
$$\Delta_{d} p = \sup_{\hat{\rho}\in\mathcal{P}} \Delta^{\rho}_{d} p. \tag{12}$$
With these definitions it can be shown
$$\Delta_{e} x\,\Delta_{d} p \ge \frac{\hbar}{2} \tag{13}$$
where we use the convention, here and elsewhere, that a product of the form $q\times\infty$ counts as infinite, even if $q = 0$. One can also prove a universally valid version of the error-error relation for a joint measurement of position and momentum
$$\Delta_{e} x\,\Delta_{e} p \ge \frac{\hbar}{2} \tag{14}$$
where $\Delta_{e} p$ is defined by taking the supremum of $\Delta^{\rho}_{e} p$. In ref. [26] we gave a proof of these relations which glossed over some questions to do with domains of definition and differentiability. A completely rigorous proof is given in Appendix A below.
The quantities $\Delta_{e} x$, $\Delta_{d} p$ are not without interest, as we discuss below. However, they are not the appropriate definitions for a real measuring instrument. The demand that $\Delta_{e} x$ be small is the demand that $\Delta^{\rho}_{e} x$ be small, not only when $\hat{\rho}$ is a wave-packet localized in the vicinity of the apparatus, but also when $\hat{\rho}$ is a wave-packet localized on the other side of the cosmic event-horizon. Clearly, this is not a reasonable demand to make of a practical laboratory instrument, which is only designed to give accurate readings for a restricted set of input states. In ref. [26] we accordingly proposed the following modified definitions
$$\Delta^{R}_{e} x = \sup_{\hat{\rho}\in\mathcal{R}} \Delta^{\rho}_{e} x \tag{15}$$
$$\Delta^{R}_{d} p = \sup_{\hat{\rho}\in\mathcal{R}} \Delta^{\rho}_{d} p \tag{16}$$
where the supremum is now taken over a proper subset $\mathcal{R}$ of the set of physical states. We took $\mathcal{R}$ to be a set of physical states for which the mean values $\mathrm{Tr}(\hat{x}\hat{\rho})$, $\mathrm{Tr}(\hat{p}\hat{\rho})$ lie in a rectangular region of phase space with sides $l_X$, $l_P$, and satisfying certain additional conditions. We then proved the inequalities [...] where we again use the convention that a product of the form $q\times\infty$ counts as infinite, even if $q = 0$. It will be observed that in the limit as $l_X, l_P \to \infty$ we recover Inequalities (13), (14). As with Inequalities (13), (14), the proof of Inequalities (17), (18) which we gave in ref. [26] glossed over certain details. We give a completely rigorous proof in Appendix A below, where we also take the opportunity to strengthen the statement somewhat.
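Let us now turn to the approach of Ozawa [27,28,39,40] and Hall [41]. In our approach we replaced the state-dependent definitions $\Delta^{\rho}_{e} x$, $\Delta^{\rho}_{d} p$ with the quantities $\Delta_{e} x$, $\Delta_{d} p$ and $\Delta^{R}_{e} x$, $\Delta^{R}_{d} p$ and proved inequalities applying to those. Ozawa [27,28], by contrast, kept with the state-dependent definitions and showed
$$\Delta^{\rho}_{e} x\,\Delta^{\rho}_{d} p + \Delta_{\rho} p\,\Delta^{\rho}_{e} x + \Delta_{\rho} x\,\Delta^{\rho}_{d} p \ge \frac{\hbar}{2} \tag{19}$$
where $\Delta_{\rho} x$, $\Delta_{\rho} p$ are the ordinary uncertainties in the state $\hat{\rho}$. He also showed that [39,40], for a joint measurement of position and momentum,
$$\Delta^{\rho}_{e} x\,\Delta^{\rho}_{e} p + \Delta_{\rho} p\,\Delta^{\rho}_{e} x + \Delta_{\rho} x\,\Delta^{\rho}_{e} p \ge \frac{\hbar}{2}. \tag{20}$$
It will be observed that these relations have a similar mathematical form to our (17), (18). Hall [41] proved a relation similar to (20). Other modifications and improvements have also been proved [42-44].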
The reader should not conclude from our earlier discussion that we have any objection to the state-dependent definitions employed by Ozawa, Hall and others. Asking whether a state-independent definition is better than a state-dependent one is like asking whether a hammer is better than a screw-driver. The answer to all such questions, concerning the suitability of a tool, is relative to the use to which it is put. The fact that Bob, in the broken ammeter example, makes an inappropriate use of it does not invalidate the idea, that the classical error is the difference between the measured value and the true one. Similarly here. It is true that there exist quantum analogues of the broken ammeter: processes which do not properly count as a measurement, yet for which the state-dependent error is small. Nevertheless, the state-dependent error has a well-defined physical meaning (as we discuss in Section 3) and this makes it a potentially useful tool. State-independent definitions, such as the ones proposed by ourselves or BLW [2,45], have the advantage that they supply what BLW call an overall figure of merit; while state-dependent definitions, if not handled with care, can lead to unreasonable conclusions. But, as Rozema et al. [46] point out, state-independent definitions have the disadvantage that they are insensitive to fine, state-dependent details which can be important. The state-dependent error can be used to analyze those details. It is to be observed, furthermore, that the state-dependent quantities $\Delta^{\rho}_{e} x$, $\Delta^{\rho}_{d} p$ are the limits of $\Delta^{R}_{e} x$, $\Delta^{R}_{d} p$ as $\mathcal{R}$ is shrunk to a single point. If one takes the view that use of $\Delta^{\rho}_{e} x$, $\Delta^{\rho}_{d} p$ is in all circumstances inappropriate then it is hard to see how one can avoid taking the view that use of $\Delta^{R}_{e} x$, $\Delta^{R}_{d} p$ is also inappropriate when $\mathcal{R}$ is very small. Which raises the question: "Just how large has $\mathcal{R}$ got to be in order for the use of $\Delta^{R}_{e} x$, $\Delta^{R}_{d} p$ to be justified?" It is difficult to see how the answer can be other than arbitrary. It appears to us that such discussions are fruitless, and that the solution to the quandary "state-dependent or state-independent?" is not to regard it as a quandary. Instead of making a once-and-for-all choice we are free to use either or both, in a manner adjusted to the question of interest⁵.
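So far from being rivals Ozawa's inequalities and ours are closely related. Let $\mathcal{R}$ be any region satisfying the condition of Theorem 4 in the Appendix. If we take the supremum on both sides of Ozawa's inequality (19) we obtain the relation
$$\Delta^{R}_{e} x\,\Delta^{R}_{d} p + \Delta_{R} p\,\Delta^{R}_{e} x + \Delta_{R} x\,\Delta^{R}_{d} p \ge \frac{\hbar}{2} \tag{21}$$
where
$$\Delta_{R} x = \sup_{\hat{\rho}\in\mathcal{R}} \Delta_{\rho} x, \qquad \Delta_{R} p = \sup_{\hat{\rho}\in\mathcal{R}} \Delta_{\rho} p. \tag{22}$$
This is weaker than our (17) if $l_X$, $l_P$ are large but stronger if they, and the region $\mathcal{R}$, are small. It is probably fair to say that an experimenter will never be committed to the proposition that the system state is precisely $\hat{\rho}$. So what is in question is always a set of states $\mathcal{R}$. If $\mathcal{R}$ is small one will want to use Ozawa's inequalities, but if it is large one will want to use ours (provided $\mathcal{R}$ satisfies the condition of Theorem 4).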
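The step of taking the supremum over a set of states can be spelled out as follows. This is a sketch under stated assumptions: Ozawa's state-dependent relation is taken in its standard three-term form, and the symbols $\Delta_{R} x$, $\Delta_{R} p$ for the largest ordinary uncertainties over $\mathcal{R}$ are notation assumed here for illustration.

```latex
% Requires amsmath. Every factor is non-negative, so each term on the
% first line is bounded above by the product of the corresponding suprema.
\begin{align*}
\frac{\hbar}{2}
 &\le \Delta^{\rho}_{e} x\,\Delta^{\rho}_{d} p
    + \Delta_{\rho} p\,\Delta^{\rho}_{e} x
    + \Delta_{\rho} x\,\Delta^{\rho}_{d} p
    && \text{for every } \hat{\rho}\in\mathcal{R} \\
 &\le \Delta^{R}_{e} x\,\Delta^{R}_{d} p
    + \Delta_{R} p\,\Delta^{R}_{e} x
    + \Delta_{R} x\,\Delta^{R}_{d} p, \\[4pt]
\Delta^{R}_{e} x &= \sup_{\hat{\rho}\in\mathcal{R}} \Delta^{\rho}_{e} x, \quad
\Delta^{R}_{d} p = \sup_{\hat{\rho}\in\mathcal{R}} \Delta^{\rho}_{d} p, \quad
\Delta_{R} x = \sup_{\hat{\rho}\in\mathcal{R}} \Delta_{\rho} x, \quad
\Delta_{R} p = \sup_{\hat{\rho}\in\mathcal{R}} \Delta_{\rho} p
\end{align*}
```

When $\mathcal{R}$ shrinks to a single state the second line reduces to the first, which is one way to see why the resulting bound is informative for small regions.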
Although $\Delta^{\rho}_{e} x$ will, in practice, only be small for a restricted set of states, the limiting situation, when it becomes zero for all $\hat{\rho}\in\mathcal{P}$, is still conceptually important. It can be shown (Appleby [33], Ozawa [47], Busch [48]) that the condition $\Delta_{e} x = 0$ is both necessary and sufficient for the distribution of measured values to be $\langle x|\hat{\rho}|x\rangle$. No real measuring instrument could have precisely this distribution of measured values for every input state $\hat{\rho}$; in particular, it cannot do so for states such that the support of $\langle x|\hat{\rho}|x\rangle$ is not compact⁶. Nevertheless the idea that $\langle x|\hat{\rho}|x\rangle$ is the probability distribution for a measurement of position has played a fundamental role in physical thinking ever since Born [49] first proposed it⁷. There is no problem here provided we understand the proposal to be, not that $\langle x|\hat{\rho}|x\rangle$ is an operational distribution (one corresponding to an actual measurement), but that it is the canonical, or target distribution to which an operational distribution may conform more or less well.
A similar result can be proved for joint measurements minimizing the product $\Delta_{e} x\,\Delta_{e} p$: namely, that the product is minimized if and only if the distribution of measured values is the Husimi function (Appleby [33], Werner [50], Busch et al. [14]). In Appleby [51] we extended the analysis to measurements of angular momentum, and showed that a determination of spin-direction is optimal if and only if the distribution of measured values is $\langle\mathbf{n}|\hat{\rho}|\mathbf{n}\rangle$, where $|\mathbf{n}\rangle$ is a suitably normalized SU(2) coherent state.

Physical Interpretation of the Operator Definitions
We now come to the problem of interpreting the quantities defined in the last section. Quantum mechanics forces us to drop the classical assumption that a measurement ascertains the pre-existing value of a specified observable [3-6]. Even if one postulates that the observable measured does have a pre-existing value, that value must typically differ from the value found by measurement. In the Bohm theory, for example, the result of a measurement of velocity is usually quite different from the postulated pre-existing velocity [52-55]. Classically, the error is usually defined in terms of the difference between the measured value and the pre-existing true one. It might consequently seem that, in abandoning the idea that measurements ascertain pre-existing values, we are obliged also to abandon the concept of experimental error (in the Introduction we argued that that is exactly how it did seem to, for example, Heisenberg). We begin by showing that this is not the case. Specifically, we describe a classical model for which the classical error can be defined in a way that does not involve a comparison with pre-existing values. We then show that this alternative definition naturally carries over to quantum mechanics.
The example we consider is that of a one-dimensional classical gas. Let x and p be the position and momentum of a particular particle in this gas, and let λ(dxdp) be the phase space probability measure. Suppose we measure x. Let µ f be the pointer position after the measurement. We assume that the measurement process is stochastic and is described by a transition kernel 8 χ(dµ f |x, p) such that the expectation value of a function f (x, p, µ f ) is given by (see, for example, Cinlar [56]) 6 In this connection it may be worth remarking that the x and p space wave-functions cannot both have compact support. So for every physical stateρ at least one of the two distributions x|ρ|x , p|ρ|p must be practically unrealizable. 7 Strictly speaking Born only proposed that we interpret p|ρ|p as a probability distribution. The superscript λ is to serve as a reminder that λ is arbitrary, unlike χ which characterizes the measurement interaction and is therefore fixed. Define 9 It will be seen that σ ce (x, p) is the RMS difference between the measured value and the pre-existing true one when λ is concentrated on the single point (x, p). We then define the classical error by Of course, this definition is open to the same objection as the quantity ∆ e x defined in the last section; namely, that it is likely to be infinite for a realistic model. However, this need not detain us because we are not interested in the model for its own sake, but only as a conceptual bridge which will take us from classical intuition to a reasonable quantum mechanical definition of measurement error. Now let be the mean and standard deviation relative to the measure λ. Then Note that (µ f −x) 2 λ , ∆ λ x are λ-dependent, but ∆ ce x is not. The inequality is actually tight. To see this choose a sequence (x n , p n ) such that σ ce (x n , p n ) → ∆ ce x, and let λ n be the measure concentrated on the point (x n , p n ). Then So where M is the set of all phase-space probability measures. 
This gives us an alternative formula for the classical error. We can derive a similar formula for the classical disturbance. Let $p_f$ be the momentum after the measurement and $\xi(dp_f|x,p)$ the transition kernel such that

$$\langle f \rangle^{\lambda} = \int f(x,p,p_f)\,\xi(dp_f|x,p)\,\lambda(dx\,dp).$$

Define the classical disturbance by

$$\Delta_{cd} p = \sup_{(x,p)} \left(\int (p_f - p)^2\,\xi(dp_f|x,p)\right)^{1/2}.$$

$^{9}$ $\sigma_{ce}(x,p)$ is the RMS difference between the measured value and the pre-existing true one if one makes the physically unrealistic assumption that the initial position and momentum are known to be precisely $x$ and $p$.
Then, by an argument similar to the one above, we find

$$\Delta_{cd} p = \sup_{\lambda \in M} \left( \sqrt{\langle (p_f - \bar{p}^{\lambda})^2 \rangle^{\lambda}} - \Delta^{\lambda} p \right) \qquad (32)$$

where $\bar{p}^{\lambda}$, $\Delta^{\lambda} p$ are the mean and standard deviation of $p$ relative to the measure $\lambda$. We are now free to throw away the ladder and take Eqs. (29), (32) to be the definitions of $\Delta_{ce} x$, $\Delta_{cd} p$. These alternative definitions do not involve a direct comparison between the measured value and the pre-existing one. Consequently, they do not involve the expectation values of products of pairs of variables like $\mu_f$ and $x$ which, in a quantum mechanical context, become non-commuting operators. Instead they are framed in terms of the moments of probability distributions which are also defined in quantum mechanics. They therefore generalize. Just as we can classically, so in quantum mechanics we can define the error and disturbance in terms of the increase in an RMS deviation from an initial state mean: where we employ the notations of the last section, together with $\bar{x}_{\rho} = \mathrm{Tr}(\hat{x}\hat{\rho})$, $\bar{p}_{\rho} = \mathrm{Tr}(\hat{p}\hat{\rho})$.
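The idea of quantifying the classical error as an increase in the RMS deviation from the initial mean can be illustrated with a toy model. The sketch below is not taken from the source: it assumes a hypothetical instrument whose pointer is the true position plus independent, zero-mean noise of standard deviation `NOISE_SD`. Under that assumption the RMS difference between pointer and true value is `NOISE_SD` at every phase-space point, and the increase in RMS deviation for an initial spread `initial_sd` is `sqrt(initial_sd**2 + NOISE_SD**2) - initial_sd`.

```python
import math


def rms_increase(initial_sd: float, noise_sd: float) -> float:
    """Increase in the RMS deviation of the pointer from the initial mean.

    Assumes mu_f = x + n, with n independent of x, zero-mean, and of
    standard deviation noise_sd, so that
    <(mu_f - mean_x)^2> = initial_sd^2 + noise_sd^2.
    """
    return math.hypot(initial_sd, noise_sd) - initial_sd


NOISE_SD = 0.3  # hypothetical instrument noise (illustrative value)

# For this kernel the pointwise RMS error equals NOISE_SD everywhere,
# so the classical error (its supremum over phase space) is NOISE_SD.
CLASSICAL_ERROR = NOISE_SD

# Initial distributions of decreasing spread; 0.0 is a point measure.
initial_sds = [10.0, 1.0, 0.1, 0.01, 0.0]
increases = [rms_increase(sd, NOISE_SD) for sd in initial_sds]

for sd, inc in zip(initial_sds, increases):
    print(f"initial spread {sd:5.2f}: RMS increase {inc:.6f}")
```

In this toy model the increase never exceeds the classical error and approaches it as the initial distribution narrows to a point, which is the sense in which the bound is tight.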
We may also define We will refer to these as the D definitions ("D" for "maximal increase in the RMS deviation from the initial state mean"). They are important because they show that the Bell-Kochen-Specker theorem is not, as it might seem, an insuperable obstacle blocking the path from the original classical intuition to a satisfactory quantum generalization. On the contrary, if the concepts are appropriately formulated there is complete continuity between classical and quantum in this regard. However, although the D definitions are valid and useful, they should not be regarded as canonical. In the first place, there are other classical definitions which also have natural quantum generalizations (as we will see in the next paragraph). In the second place, there is no reason to make classical physics the arbiter. There may be useful quantum definitions which are not the generalization of any classical concept.

We arrive at another natural generalization of classical ideas if we consider measurements on a pair of correlated particles. Suppose we have two particles with positions $\hat{x}_A$, $\hat{x}_B$ and momenta $\hat{p}_A$, $\hat{p}_B$, and suppose we measure $\hat{x}_B$. Suppose that the unitary operator describing the measurement interaction is of the form where $\Delta^{\rho}_{Ce} x = \sup$ $P_{AB}$ being the set of physical states $\hat{\rho}_{AB}$ such that $\mathrm{Tr}_A(\hat{\rho}_{AB}) = \hat{\rho}$. We may also define We refer to these quantities as the C definitions ("C" for "Correlation").
Let us now turn to the definitions in Section 2, which we will refer to as the O definitions ("O" for "Operator"). The commutators $[\hat{\mu}_f, \hat{x}_i]$ and $[\hat{p}_f, \hat{p}_i]$ are typically non-zero, so the O quantities are typically not generalizations of the corresponding classical quantities, as Busch and co-workers have stressed [22,48]. The O quantities do, however, impose bounds on the D and C quantities, and this gives them an indirect physical interpretation. We have for all $\hat{\rho} \in P$. Similarly

$$\Delta^{\rho}_{Ce} x \le \Delta^{\rho}_{e} x \le \Delta^{\rho}_{Ce} x + 2\Delta_{\rho} x \qquad (44)$$

Taking suprema we deduce When $R = P$ these inequalities reduce to We also have the following constraints on the relative sizes of the D and C quantities These inequalities mean, among other things, that the O quantities are upper bounds on the corresponding D and C quantities. Our discussion raises some important questions. If $\sup_{\hat{\rho} \in R} \Delta_{\rho} x$ is large then the above inequalities are consistent with one of the O quantities being large while the corresponding D and C quantities are both small. They also leave open the possibility that, in the case when $\sup_{\hat{\rho} \in R} \Delta_{\rho} x$ is large, one of the D quantities is large while the corresponding C quantity is small, or vice versa. One would like to know if these possibilities are actually realized.
Korzekwa et al [57] answer the first of these questions for the case of the state-dependent disturbances. Consider two non-commuting observables $\hat{R}$, $\hat{S}$ on a finite dimensional Hilbert space. Suppose that the system is initially in an eigenstate of $\hat{R}$ which is not also an eigenstate of $\hat{S}$, and suppose that one makes a von Neumann measurement of $\hat{R}$. Then the D and C disturbances are both zero while the O disturbance is non-zero.
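A minimal qubit instance of this situation can be checked numerically. The sketch below is an illustration under stated assumptions, not a construction taken from ref. [57]: it takes $\hat{R} = \hat{\sigma}_z$, $\hat{S} = \hat{\sigma}_x$, models the von Neumann measurement by a CNOT gate acting on a one-qubit apparatus prepared in $|0\rangle$, and takes the O disturbance to be the RMS value of the difference between the final and initial Heisenberg-picture operators for $\hat{S}$. It compares that quantity with the distribution of $\hat{\sigma}_x$ outcomes before and after the interaction.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def expect(op, state):
    """Expectation value <state|op|state> for a real state vector."""
    n = len(state)
    return sum(state[i] * op[i][j] * state[j]
               for i in range(n) for j in range(n))


I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]  # S: the observable that may be disturbed

# Assumed interaction: CNOT with the system as control, which copies the
# sigma_z eigenvalue of the system onto the apparatus qubit.
U = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

# System in the sigma_z eigenstate |0>, apparatus in the ready state |0>.
psi = [1, 0, 0, 0]

s_initial = kron(SX, I2)
# Heisenberg-picture operator after the interaction (U is real, so the
# adjoint is the transpose).
s_final = matmul(transpose(U), matmul(s_initial, U))

diff = [[s_final[i][j] - s_initial[i][j] for j in range(4)]
        for i in range(4)]
o_disturbance = math.sqrt(expect(matmul(diff, diff), psi))

# For a +/-1-valued observable, P(+1) = (1 + <S>) / 2.
p_plus_before = (1 + expect(s_initial, psi)) / 2
p_plus_after = (1 + expect(s_final, psi)) / 2

print("O disturbance:", o_disturbance)
print("P(sigma_x = +1) before, after:", p_plus_before, p_plus_after)
```

Under these assumptions the outcome distribution for $\hat{\sigma}_x$ is left unchanged by the measurement, while the operator-based quantity is strictly positive.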
Busch [58] gives an example which shows that it is possible for the state-dependent D and C errors to be zero while the state-dependent O error is non-zero. Unlike Korzekwa et al's example it is rather artificial (it is a quantum version of the broken-ammeter scenario); however, it is enough to establish the point of principle. Suppose the system and pointer particles are both spin-1/2 particles, and that the measured observable and pointer observables are the $\hat{\sigma}_z$ operators for their respective particles. Suppose that the initial system+apparatus state is $|\psi\rangle \otimes |\psi\rangle$, and that $\hat{U} = \hat{I}$. Then it is easily seen that the state-dependent O error is the ordinary uncertainty of $\hat{\sigma}_z$ in the state $|\psi\rangle$, while the state-dependent D and C errors are zero.
We can use a modification of this example to show that it is possible for the state-dependent D quantities to be zero while the state-dependent C quantities are non-zero. Let everything be as in the last paragraph except that system+apparatus are in the maximally mixed state $(1/4)\hat{I} \otimes \hat{I}$. Then the D error is zero while the C error is $\sqrt{2}$ (the supremum in Eq. (38) being achieved for the maximally entangled state where $|\pm\rangle$ are the eigenstates of $\hat{\sigma}_z$). To show that the same is true of the D and C disturbances continue to assume that system+apparatus are in the maximally mixed state, but take the evolution operator $\hat{U}$ to be $\hat{\sigma}_y \otimes \hat{I}$. Then the state-dependent D disturbance to the observable $\hat{\sigma}_x$ is zero, while the state-dependent C disturbance is $\sqrt{2}$ (the supremum in Eq.

The D and C quantities have a direct, operational interpretation as errors and disturbances. Smallness of one of these quantities is both necessary and sufficient for the measurement to be accurate or non-disturbing in a well-defined, operational sense. By contrast the interpretation of the O quantities, as we have presented it here, is indirect: their meaning comes from the fact that they supply various bounds on the D and C quantities. Moreover, although smallness of an O quantity is sufficient, we have not been able to show that it is necessary for the measurement to be accurate or non-disturbing in a well-defined sense. In the case of the state-independent quantities it is possible that, with more work, one could establish necessity as well. If that were so it would mean, in effect, that the state-independent O quantities were fully operational characterizations of the error and disturbance.
Finally, let us note that there is no reason to assume that our analysis is complete. The O quantities may capture other operationally identifiable features of the measurement which the D and C quantities both miss.

Response to criticisms
We now consider BLW's critique of the O definitions (also see Busch et al [48] and Korzekwa et al [57]). BLW contrast the O approach with what they call a distributional approach. They argue that, although the O approach has its uses in certain special cases, the version of the distributional approach based on the Wasserstein 2-deviation is, in general, greatly preferable. In addressing their criticisms let us begin by observing that the D and C definitions are themselves distributional definitions. Moreover, although the O quantities are not defined distributionally, their physical interpretation (as given in Section 3) depends on the fact that they supply various bounds on the corresponding D and C quantities. So the distinction between operator and distributional approaches is less clear-cut than may initially appear. The problem is not really to decide between a distributional approach and some other completely different approach; rather it is to decide between alternative versions of the distributional approach. As with all such problems the answer is dependent on the situation of interest. In the following it is certainly not our intention to suggest that the O definitions are preferable to BLW's definitions in every situation. We only argue that there is a physically important class of situations in which the D definitions, and consequently the O definitions, are preferable.
BLW accept that the O definitions give valid characterizations of the error (respectively disturbance) under conditions where the observables $\hat{x}_i$, $\hat{\mu}_f$ (respectively $\hat{p}_i$, $\hat{p}_f$) commute. However, in cases where these observables do not commute they argue that $\hat{x}_i$, $\hat{\mu}_f$ (respectively $\hat{p}_i$, $\hat{p}_f$) are not jointly measurable and, consequently, that the interpretation of $\hat{\epsilon}_X$, $\hat{\delta}_P$ as error and disturbance operators is ungrounded. This objection would be justified if we were relying on a naive, purely formal analogy with the classical expressions $(\mu_f - x_i)^2$ and $(p_f - p_i)^2$. However, since we are actually relying on the fact that the O quantities bound the D and C quantities, and since the definitions of the latter are just as operational as BLW's own definitions, there is no problem here.
BLW go on to substantiate their criticisms by giving examples of measurements where the O error is zero even though the distribution of measured values is quite different from the initial state distribution. We will here confine ourselves to their Example 7. The reader will easily perceive that a suitably modified version of our discussion applies to their Examples 8, 9 and 10 (also to Example 5 in ref. [48]). The example is of a measurement of position in which the POVM describing the distribution of measured values is the spectral measure of the shifted oscillator Hamiltonian $\hat{H}$ and in which the initial system state is the ground state of $\hat{H}$. It is easily verified that $\Delta^{\rho}_{e} x = 0$. On the other hand it can be seen from Fig. 1 that the probability distributions for $\hat{x}_i$ and $\hat{\mu}_f$ are very different. In particular the distribution for $\hat{x}_i$ is continuous whereas that for $\hat{\mu}_f$ is discrete. BLW take this to mean that the measurement is highly inaccurate, and that the O definition of error is correspondingly misleading. They are right to the extent that there are applications (tomography, for example) for which this measurement would be very ill-suited. However, the purpose of a measurement is not always to accurately reproduce the initial state probability distribution. That is obviously the case in classical physics. Consider, for instance, measurements using a digital ammeter. Here too the initial state probability distribution is continuous while the distribution of measured values is discrete. But this would not usually be seen as a reason for preferring an analogue meter. Similarly in quantum physics: there are situations where one is only concerned with certain specific features of the distribution of measured values, its detailed shape being otherwise unimportant. Consider, for instance, a state discrimination problem where Bob is promised one of a finite set of Gaussian wave-packets and has to decide which particular wave-packet Alice has sent.
In this case the crucial requirement is that the quantity $\langle (\hat{\mu}_f - \bar{x})^2 \rangle$ be as small as possible. Other considerations, such as the difference between continuous and discrete, are irrelevant. Consequently, a measurement like the one described in BLW's Example 7 is very well-suited to Bob's purpose. The distributions depicted in Fig. 1 are indeed very different. However, they have exactly the same mean and variance. Consequently, $\langle (\hat{\mu}_f - \bar{x})^2 \rangle$ is not enlarged at all as compared to the initial state variance. This is what Bob requires. It is also (see Inequalities (50)) one of the pieces of information conveyed by the statement that $\Delta^{\rho}_{e} x = 0$, which is not misleading at all, provided it is correctly understood. By contrast, the Wasserstein 2-deviation would be decidedly misleading if applied to this situation, as it would cause one to prefer, to the measurement described, one for which the second distribution was a smeared out version of the first, even though this would clearly be worse for Bob's particular purposes.
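The contrast between "same mean and variance" and "small Wasserstein 2-distance" can be made concrete with a toy pair of distributions. The sketch below does not reproduce BLW's Example 7. As a stand-in it assumes a standard Gaussian for the continuous distribution and a hypothetical two-point distribution on $\{-1, +1\}$ for the discrete one, and evaluates the one-dimensional Wasserstein 2-distance through the quantile-function formula $W_2^2 = \int_0^1 (Q_a(u) - Q_b(u))^2\,du$ using a simple midpoint rule.

```python
import math
from statistics import NormalDist


def discrete_quantile(u: float) -> float:
    """Quantile function of the two-point distribution on {-1, +1}."""
    return -1.0 if u < 0.5 else 1.0


def wasserstein2(quantile_a, quantile_b, n: int = 20000) -> float:
    """One-dimensional Wasserstein 2-distance via the quantile formula,
    integrated numerically with the midpoint rule on n sub-intervals."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n  # midpoints avoid u = 0 and u = 1
        total += (quantile_a(u) - quantile_b(u)) ** 2
    return math.sqrt(total / n)


# Stand-in for the continuous (initial state) distribution.
gauss = NormalDist(mu=0.0, sigma=1.0)

# Moments of the two-point distribution (probability 1/2 on each point).
discrete_mean = 0.5 * (-1.0) + 0.5 * (+1.0)
discrete_var = 0.5 * (-1.0) ** 2 + 0.5 * (+1.0) ** 2 - discrete_mean ** 2

w2 = wasserstein2(gauss.inv_cdf, discrete_quantile)

print("means:", gauss.mean, discrete_mean)
print("variances:", gauss.variance, discrete_var)
print("Wasserstein 2-distance:", round(w2, 4))
```

For this pair the first two moments coincide exactly while the Wasserstein 2-distance is clearly non-zero, so a figure of merit built on that distance penalizes a difference that is irrelevant when only the mean square deviation matters.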
Similarly with the disturbance: in a situation where one is interested in the deviation from the initial state mean, but not in any other feature of the probability distribution, the D definition, and consequently the O definition, of disturbance will be more useful than the one based on the Wasserstein 2-deviation.
It is seldom, if ever, the case that a single figure of merit captures every potentially relevant feature of a piece of technology. Suppose one is buying a car. If one wants a vehicle that can drive very fast round a carefully prepared track one will choose one figure of merit; if, on the other hand, one wants a vehicle suitable for conveying a family of six to the beach one will choose another, quite different figure of merit. Similarly with quantum measurements.
In their examples 7-10 BLW criticize the O definitions on the grounds that the O error can be zero in situations where the initial state and final pointer distributions are very different. In examples 4 and 6 of Busch et al [48] and example 3 of Busch [58] the authors make the opposite point, that the O error can be large in situations where the initial state and final pointer distributions are identical; a fact which they regard as an evident defect of the operator approach. Their argument is based on the principle, that a perfectly accurate measurement is one which perfectly reproduces the initial state probability distribution. To see that the principle is not generally valid consider the following scenario: Alice lives in a city where 50% of the population are infected with HIV. She is worried that she may have it, so she goes to her doctor Bob to be tested. Bob pulls a coin out of his pocket and tosses it. He then puts on a grave face and says "I am sorry, I have bad news for you." Alice is outraged, on the grounds that this isn't a proper test. Bob, however, insists that it is a proper test. After all, it has the same probability distribution. What more can she want? This is a classical example. One can easily construct a quantum example. Suppose, for instance, that Alice and Bob are two students who want to perform a test of the Bell inequalities. Unfortunately they cannot afford state-of-the-art photon counters so they decide that Alice will toss a fair coin at her station, and Bob will independently toss another fair coin at his. On the principle adopted in examples 4 and 6 of ref. [48] and example 3 of ref. [58] these are perfectly accurate measurements. But they will, of course, fail to reveal any correlations between the two particles.
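The coin-tossing Bell test can be worked through explicitly. The sketch below is an illustration under stated assumptions: it uses the standard quantum prediction for spin measurements on a singlet pair, $P(a,b) = (1 - ab\cos(\theta_a - \theta_b))/4$, a conventional choice of CHSH measurement angles, and models the students' "measurement" as two independent fair coins that ignore the settings. It compares the local outcome distributions and the CHSH combination of correlations in the two cases.

```python
import math
from itertools import product


def singlet_joint(a_out, b_out, theta_a, theta_b):
    """Quantum joint probability of outcomes (+/-1) for a singlet pair."""
    return (1 - a_out * b_out * math.cos(theta_a - theta_b)) / 4


def coin_joint(a_out, b_out, theta_a, theta_b):
    """Two independent fair coins; the measurement settings are ignored."""
    return 0.25


def correlation(joint, theta_a, theta_b):
    """Expectation value of the product of the two outcomes."""
    return sum(a * b * joint(a, b, theta_a, theta_b)
               for a, b in product((+1, -1), repeat=2))


def alice_marginal(joint, theta_a, theta_b):
    """Probability that Alice records +1."""
    return sum(joint(+1, b, theta_a, theta_b) for b in (+1, -1))


def chsh(joint):
    """CHSH combination for a conventional choice of four angles."""
    a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, -math.pi / 4
    return (correlation(joint, a, b) + correlation(joint, a, b2)
            + correlation(joint, a2, b) - correlation(joint, a2, b2))


quantum_marginal = alice_marginal(singlet_joint, 0.0, math.pi / 4)
coin_marginal = alice_marginal(coin_joint, 0.0, math.pi / 4)
quantum_chsh = chsh(singlet_joint)
coin_chsh = chsh(coin_joint)

print("P(Alice = +1):", quantum_marginal, coin_marginal)
print("|CHSH| quantum:", abs(quantum_chsh))
print("|CHSH| coins:  ", abs(coin_chsh))
```

The coins reproduce the local outcome distribution, yet their CHSH value vanishes, whereas the quantum prediction exceeds the classical bound of 2. Matching the single-station distributions therefore says nothing about whether the correlations are revealed.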
Outside of the three examples under discussion Busch and his co-workers adopt a state-independent version of the principle, according to which a measurement is perfectly accurate if it perfectly reproduces the initial state distribution for every initial state. The qualification "for every initial state" makes a crucial difference, as can be seen from the following modified version of the doctor scenario (originally suggested by Poulin [59]). Alice takes 10 cities, in each of which the incidence of HIV is different. She then takes a sample of 100 people from each of these cities and presents them to Bob for testing, without, however, telling Bob which patient comes from which city. It turns out that the proportion of positive test results for each city coincides with the actual proportion of HIV infected people in that city. Alice concludes that, whatever it is that Bob is doing, it probably deserves to be considered a test.
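The difference between the single-city and the ten-city scenario can be put in numbers. The sketch below uses hypothetical incidences (the scenario specifies none) and compares two idealized procedures by their expected positive rates: a fair coin toss, and an error-free test. The helper `worst_case_gap` is an illustrative figure of merit introduced here, namely the largest mismatch between reported and actual incidence over a family of cities.

```python
# Hypothetical incidences for the ten cities (illustrative values only).
incidences = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]


def coin_positive_rate(incidence: float) -> float:
    """Expected fraction of positive results when Bob tosses a fair coin."""
    return 0.5


def faithful_positive_rate(incidence: float) -> float:
    """Expected fraction of positives for an idealized error-free test."""
    return incidence


def worst_case_gap(positive_rate, cities) -> float:
    """Largest mismatch between reported and actual incidence."""
    return max(abs(positive_rate(q) - q) for q in cities)


# First doctor scenario: a single city with 50% incidence.
coin_single_city = worst_case_gap(coin_positive_rate, [0.50])

# Modified scenario: the same procedure judged on all ten cities.
coin_all_cities = worst_case_gap(coin_positive_rate, incidences)
faithful_all_cities = worst_case_gap(faithful_positive_rate, incidences)

print("coin, single 50% city:", coin_single_city)
print("coin, ten cities:     ", coin_all_cities)
print("faithful, ten cities: ", faithful_all_cities)
```

The coin passes when judged on the one city whose incidence happens to match it and fails badly over the whole family, which mirrors the distinction between state-dependent and state-independent criteria.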
Similarly with the state-independent version of the Busch et al principle: If a measurement reproduces the initial state distribution for every choice of state then it is very plausible to argue that it is, in some sense, highly accurate. Calculation confirms that impression. In particular, it is easily seen that a measurement for which the state-independent Wasserstein 2-error is zero will successfully reveal the correlations in a Bell experiment.
However, in the examples under discussion Busch and his co-workers adopt a state-dependent version of the principle. Like the state-dependent version of the operator approach this version of the principle can easily lead to unreasonable conclusions (cf. the broken-ammeter scenario in Section 2). To show that their objection is not valid we will focus on example 3 in ref. [58]. The extension to the other two examples will, we hope, be apparent. We have already discussed this example at the end of Section 3 (specializing to the case of a spin measurement). As we noted there the O error is non-zero if the initial system state is not an eigenstate of $\hat{\sigma}_z$. On the other hand, the fact that the initial system and apparatus states are the same, and the fact that the system and apparatus do not interact, means that the distribution of measured values is identical to the initial state probability distribution of the measured observable. Busch takes this to imply that the measurement is perfectly accurate. The fact that the example is a quantum version of the first doctor scenario may make one suspicious of this conclusion. To see that the suspicion is justified observe that if the measurement really were perfectly accurate the result of a second, von Neumann measurement of the system observable should be perfectly correlated with the result of the first. It is easily seen that that is not the case. Indeed, one actually finds where $|\Psi\rangle$ is the Schrödinger picture state of the system and two apparatuses after the second measurement is completed, $\hat{\mu}_1$, $\hat{\mu}_2$ are the two pointer observables, and $\Delta^{\rho}_{e} \sigma_z$ is the state-dependent O error. It is (to say the least) questionable whether the process just described counts as a measurement at all. Yet not only the state-dependent Wasserstein 2-error, but also the state-dependent D and C errors are zero.
That is not a weakness of the definitions: In all three cases the fact that the error is zero is a well-defined operational statement which happens to be true-just as Bob's statement, in the broken-ammeter scenario, happens to be true. It does, however, illustrate the limitations of state-dependent definitions. We argued in Section 2 that state-dependent definitions have their uses. However, they need to be used with caution. In particular, a state-dependent error is not a figure of merit: its smallness does not, by itself, mean that a measurement is in any sense "good".
At this point we ought to stress that, although their arguments are, as it seems to us, invalid, the point that Busch and his co-workers are trying to establish (that there are measurements which are highly accurate as judged by any reasonable operational criterion but for which the O error is large) could be right. In Section 3 we showed that smallness of the D or C quantities is both necessary and sufficient for a measurement to be accurate or non-disturbing in a well-defined, operational sense. But in the case of the state-independent O quantities we only established sufficiency. In the state-dependent case Korzekwa et al [57] have shown that there are processes which are completely non-disturbing as judged by any reasonable operational criterion, but for which the state-dependent O disturbance is non-zero (see the discussion at the end of Section 3). However, it remains an open question whether the same is true of the state-dependent O error. The more challenging, and to our mind more important, question of what can be said regarding the state-independent O errors and disturbances also remains open.

Conclusion
In the Introduction we argued that we need to develop a unified theory of measurement, in which classical measurements are seen as a limiting case of quantum measurements, rather in the way that Newtonian kinematics is a limiting case of relativistic kinematics. In particular, we need an overarching quantum mechanical concept of measurement accuracy which effectively reduces to the classical one in special cases, such as measurements with a meter rule. We argued that, contrary to initial appearances, the Bell-KS theorem is not a major obstacle.
We made a start on the problem by showing that there are at least two ways to reformulate the classical definitions of error and disturbance in a way which does not involve a comparison with pre-existing values. The reformulated definitions have natural quantum generalizations which we called the D and C definitions. The D and C definitions are examples of quantum definitions which reduce to the classical concept in special cases. They also bound the O quantities introduced in refs. [26][27][28]. They thereby give physical meaning to the O quantities.
We then turned to BLW's criticisms of the O definitions. We argued that one should not expect there to be a single, canonical way of quantifying the concepts of measurement accuracy and disturbance. The answer to the question "what is the most appropriate quantitative definition?" is always relative to the physical problem of interest. We specified a class of problems for which the D definitions, and consequently the O definitions, are more appropriate than BLW's definitions based on the Wasserstein 2-deviation.
Our analysis raises a number of questions which it might be interesting to investigate. Firstly, in the state-independent case, one would like to know (1) whether smallness of the O quantities is necessary and sufficient for the corresponding D and C quantities both to be small and (2) whether smallness of the D quantities is necessary and sufficient for the corresponding C quantities to be small. Secondly, we have seen that there are physical problems for which the D quantities are better-suited than the ones based on the Wasserstein 2-deviation. One would like to know if the same is true of the C quantities. Thirdly it would be interesting to see if the O quantities capture any other operationally well-defined feature of the measurement, additional to those captured by the D and C quantities. Finally, it would be interesting to see if one can prove error-disturbance and error-error relations expressed in terms of the D and C quantities.
The problem we face is that the operators $\hat{x}$, $\hat{p}$ are not defined on the whole Hilbert space. Specifically, $|\psi\rangle$ is in the domain of $\hat{x}$ if and only if the function $x^2 |\langle x|\psi\rangle|^2$ is integrable. The domain of $\hat{p}$ is defined similarly. Expressed more intuitively: $|\psi\rangle$ is in the domain of $\hat{x}$ if and only if $\langle \hat{x}^2 \rangle < \infty$. We take the view that states for which that is not true are never realized in a real, Earthbound laboratory experiment. To put it more succinctly: they are unphysical$^{10}$.
One's first thought may be that one can define the domain of physical states to be the set of all $|\psi\rangle$ such that $\langle x|\psi\rangle$ (respectively $\langle p|\psi\rangle$) is zero for all $x$ such that $|x| > B_X$ (respectively, all $p$ such that $|p| > B_P$), for suitably large positive constants $B_X$, $B_P$. However, the set of such states is empty (because, if $\langle x|\psi\rangle$ is zero outside the interval $[-B_X, B_X]$, then its Fourier transform is analytic, by Schwartz's extension of the Paley-Wiener theorem [see, for example, Treves [61]]). Nevertheless, although the theory forces at least one of the wave-functions $\langle x|\psi\rangle$, $\langle p|\psi\rangle$ to have an infinite tail, nothing observable under ordinary laboratory conditions can depend on it. It is not possible that a currently performable laboratory experiment can give rise to a state in which there is significant probability of the momentum being greater than $10^{(10^{10})}$ kg m s$^{-1}$, and even if it were possible one would not use non-relativistic quantum mechanics to describe it. Nor is it possible to produce states for which there is significant probability of $|x|$ being greater than $10^{(10^{10})}$ m. We need to give quantitative expression to this point, that the infinite tails are physically irrelevant. We accordingly take the view (inspired by the rigged Hilbert space formulation of quantum mechanics [62][63][64]) that the set of physical pure states $P_0$ consists of those states $|\psi\rangle$ for which the position space wave function $\langle x|\psi\rangle$ is (a) $C^{\infty}$ and (b) rapidly decreasing at infinity in the sense that

$$\sup_{x} \left| x^n \frac{d^m}{dx^m} \langle x|\psi\rangle \right| < \infty$$

for every pair of non-negative integers $n$, $m$ (in other words, $\langle x|\psi\rangle$ is a test function for the space of tempered distributions [61]). Note that this is equivalent to requiring that the momentum space wave function is $C^{\infty}$ and rapidly decreasing at infinity. Note also that $P_0$ is in the domain of every monomial in $\hat{x}$ and $\hat{p}$. At first sight this definition may appear arbitrary.
The reader may allow that it is reasonable to impose some restriction on the behaviour at infinity, but wonder why we make this particular choice. Indeed, the requirement is much stronger than we need for our present purposes. We make the definition nonetheless because actually it is not arbitrary, as can be seen by using a von Neumann lattice [9,65,66]. For some appropriate scale-length $\lambda$, let $|n, m\rangle$ be the coherent state with wave function Then the set $\{|n, m\rangle : n, m \in \mathbb{Z}\}$ with one point removed is a basis. Choose some suitably enormous integer $N$ and let $\hat{P}$ be the projector onto the finite dimensional subspace spanned by the set $\{|n, m\rangle : -N \le n, m \le N\}$. Then for any state $|\psi\rangle$ that is relevant to a real laboratory experiment the quantity $\|(1 - \hat{P})|\psi\rangle\|$ will be negligible.

$^{10}$ Perhaps it will be speculated that such states may exist in nature. We do not particularly deny that possibility. We only claim that they are irrelevant to the considerations of this paper.

Consequently, predictions obtained using the state $|\psi\rangle$ will be
experimentally indistinguishable from ones obtained using the state Without loss of predictive power we may therefore replace |ψ with |ψ r . The fact that |ψ r is a finite linear combination of coherent states means that it belongs to P 0 . Of course, P 0 also includes states like |n, m with n, m both much larger than N , which are certainly not relevant to ordinary, Earthbound laboratory experiments (being localized outside the cosmic event horizon). The point is only that every pure state which is experimentally relevant is empirically indistinguishable from a state in P 0 . Finally we need to define P, the set of physical density matrices. For each nonnegative integer m and real β define norms Now letρ = n ξ n |n n| be an arbitrary density matrix with eigenvectors |n . We define P to consist of thoseρ for which |n ∈ P 0 for all n and for which sup n (N m,β (|n )) < ∞ sup n Ñ m,β (|n ) < ∞ for all m, β. Note that it is enough to demand that one set of suprema is finite, since the finiteness of the other is then automatic. Note also that in the case when the spectrum ofρ has degeneracies the finiteness of the suprema does not depend on the particular choice of eigenvectors. Finally, let us remark that for the technical purposes of this appendix it would be enough to require that the suprema are finite for the particular case m = 0, β = 3/2. This definition is justified by the fact that no experimentally relevant density matrix can be distinguished empirically from a state in P. Indeed, letρ be an experimentally relevant density matrix, and letP be the projector onto the first N eigenstates. Choose N so that Tr((1 −P )ρ) is smaller than some suitably tiny number. No practicable experiment can distinguish betweenρ and (1/(Tr(Pρ))PρP . By the argument we used to justify the definition of P 0 , the state (1/(Tr(Pρ))PρP is in turn empirically indistinguishable from one of the form ρ 0 = n ξ n |n n| (61) where the states |n are a finite orthonormal set in P 0 . 
The fact that the set is finite meansρ 0 ∈ P. The proof of the main theorem depends on three technical lemmas. Define displacement operatorsD xp = e i(px−xp) , To prove the second inequality let |φ =x|ψ . Then implying x|ψ x1,p1 −x|ψ x2,p2 ≤ |φ x1,p1 − |φ x2,p2 + x 1 ψ x1,p1 − |ψ x2,p2 + |x 1 − x 2 |.
The proof now reduces to an application of the first inequality. The last inequality is proved in the same way.
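Let the initial apparatus state be $\hat{\alpha} = \sum_{n=1}^{n_a} \lambda_n |\phi_n\rangle\langle\phi_n|$ for some set of positive numbers $\lambda_n$ and orthonormal set $|\phi_n\rangle$. We argue on the same physical grounds adduced in the first few paragraphs of this appendix that the quantities $\mathrm{Tr}\big((\hat{\rho} \otimes \hat{\alpha})\hat{\epsilon}_x^2\big)$, $\mathrm{Tr}\big((\hat{\rho} \otimes \hat{\alpha})\hat{\delta}_p^2\big)$ are well-defined for all $\hat{\rho} \in P$. Finally, for given positive real numbers $l_X$, $l_P$, define $C_{l_X,l_P}$ to be the phase-space box consisting of all $x$, $p$ such that $-\frac{l_X}{2} \le x \le \frac{l_X}{2}$, $-\frac{l_P}{2} \le p \le \frac{l_P}{2}$.

Lemma 3. Let $\hat{\rho}$ be any element of $P$, and let $\hat{\rho}_{xp} = \hat{D}_{xp}\hat{\rho}\hat{D}^{\dagger}_{xp}$. Let $\hat{A}$ be any self-adjoint operator such that $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})[\hat{p},\hat{A}]\big)$, $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})[\hat{x},\hat{A}]\big)$ are defined, and $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})\hat{A}^2\big)$, $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})\hat{x}\hat{A}^2\hat{x}\big)$, $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})\hat{p}\hat{A}^2\hat{p}\big)$ are all defined and bounded on $C_{l_X,l_P}$. Then $\mathrm{Tr}\big((\hat{\rho}_{xp} \otimes \hat{\alpha})\hat{A}\big)$ is differentiable on $C_{l_X,l_P}$, and Moreover, the derivatives are uniformly continuous on $C_{l_X,l_P}$.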
Uniform continuity of the $x$ derivative now follows by another application of the Cauchy-Schwarz inequality. Uniform continuity of the $p$ derivative is proved similarly.
We are now ready to prove our main result.
Theorem 4. Let R be a subset of P containing at least one state ρ such that ρ xp , 1 Tr(ρ xpx 2 )xρ xpx , 1 Tr(ρ xpp 2 )pρ xpp ∈ R for all (x, p) ∈ C lX,lP . Then for every measurement of position, and for every joint measurement of position and momentum.
Proof. To prove the first relation observe that it is automatic if either of the quantities ∆ R e x, ∆ R d p is infinite. Suppose, on the other hand, they are both finite. It is easily seen that Letρ be any state in R satisfying condition (93). It is easily seen that Tr ρ xpx 2 , Tr ρ xpp 2 are bounded. So we can apply Lemma 3 withÂ =ˆ x ,δ p to deduce where ∇ = ∂ ∂x ∂ ∂p and v = Tr (ρ xp ⊗α)ˆ x Tr (ρ xp ⊗α)δ p .
Since ∇ · v is continuous it is integrable. Hence where B lX,lP is the boundary of C lX,lP and n is the outward-pointing normal. The second inequality is proved in the same way, starting from the commutation relation Inequalities (13), (14) are proved by specializing to the case R = P and taking the limit as l X , l P → ∞.