Measurement uncertainty relations for position and momentum: Relative entropy formulation

Heisenberg's uncertainty principle has recently led to general measurement uncertainty relations for quantum systems: incompatible observables can be measured jointly or in sequence only with some unavoidable approximation, which can be quantified in various ways. The relative entropy is the natural theoretical quantifier of the information loss when a `true' probability distribution is replaced by an approximating one. In this paper, we provide a lower bound for the amount of information that is lost by replacing the distributions of the sharp position and momentum observables, as they could be obtained with two separate experiments, by the marginals of any smeared joint measurement. The bound is obtained by introducing an entropic error function, and optimizing it over a suitable class of covariant approximate joint measurements. We fully exploit two cases of target observables: (1) $n$-dimensional position and momentum vectors; (2) two components of position and momentum along different directions. In (1), we connect the quantum bound to the dimension $n$; in (2), going from parallel to orthogonal directions, we show the transition from highly incompatible observables to compatible ones. For simplicity, we develop the theory only for Gaussian states and measurements.


Introduction
Uncertainty relations for position and momentum [40] have always been deeply related to the foundations of Quantum Mechanics. For several decades, their axiomatization has been of 'preparation' type: an inviolable lower bound for the widths of the position and momentum distributions, holding in any quantum state. Uncertainty relations of this kind, now known as preparation uncertainty relations (PURs), were later extended to arbitrary sets of $n \geq 2$ observables [44-46, 59]. All PURs trace back to Robertson's celebrated formulation [58] of Heisenberg's uncertainty principle: for any two observables, represented by self-adjoint operators $A$ and $B$, the product of the variances of $A$ and $B$ is bounded from below by the expectation value of their commutator; in formulae, $\mathrm{Var}_\rho(A)\,\mathrm{Var}_\rho(B) \geq \frac{1}{4}\,|\mathrm{Tr}\{\rho[A,B]\}|^2$, where $\mathrm{Var}_\rho$ is the variance of an observable measured in any system state $\rho$. In the case of position $Q$ and momentum $P$, this inequality gives Heisenberg's relation $\mathrm{Var}_\rho(Q)\,\mathrm{Var}_\rho(P) \geq \hbar^2/4$. About 30 years after Heisenberg and Robertson's formulation, Hirschman attempted a first statement of position and momentum uncertainties in terms of informational quantities. This led him to a formulation of PURs based on Shannon entropy [41]; his bound was later refined [12,14], and extended to discrete observables [50]. Other entropic quantities have also been used [35]. We refer to [31,63] for an extensive review on entropic PURs.
However, Heisenberg's original intent [40] was more focused on the unavoidable disturbance that a measurement of position produces on a subsequent measurement of momentum [21, 25, 26, 53-56, 65]. In an attempt to give a better understanding of his idea, new formulations were introduced more recently, based on a 'measurement' interpretation of uncertainty, rather than on bounds for the probability distributions of the target observables. Indeed, with the modern development of the quantum theory of measurement and the introduction of positive operator valued measures and instruments [1,20,23,34,39,44], it became possible to deal with approximate measurements of incompatible observables and to formulate measurement uncertainty relations (MURs) for position and momentum, as well as for more general observables. The MURs quantify the degree of approximation (or inaccuracy and disturbance) made by replacing the original incompatible observables with a joint approximate measurement of them. A very rich literature on this topic has flourished in the last 20 years, and various kinds of MURs have been proposed, based on distances between probability distributions, noise quantifications, conditional entropy, etc. [19, 21, 22, 24-26, 31, 32, 38, 53-56, 65, 66].
In this paper, we develop a new information-theoretical formulation of MURs for position and momentum, using the notion of the relative entropy (or Kullback-Leibler divergence) of two probabilities. The relative entropy $S(p\|q)$ is an informational quantity which is precisely tailored to quantify the amount of information that is lost by using an approximating probability $q$ in place of the target one $p$. Although classical and quantum relative entropies have already been used in the evaluation of the performances of quantum measurements [1, 6-11, 18, 19, 32, 51, 51], their first application to MURs is very recent [2].
In [2], only MURs for discrete observables were considered. The present work is a first attempt to extend that information-theoretical approach to the continuous setting. This extension is not trivial and reveals peculiar problems, that are not present in the discrete case. However, the nice properties of the relative entropy, such as its scale invariance, allow for a satisfactory formulation of the entropic MURs also for position and momentum.
We deal with position and momentum in two possible scenarios. Firstly, we consider the case of $n$-dimensional position and momentum, since it allows one to treat scalar particles, vector ones, or even multi-particle systems. This is the natural level of generality, and our treatment extends to it without difficulty. Then, we consider a pair made up of one position and one momentum component along two different directions of the $n$-space. In this case, we can see how our theory behaves when one moves continuously from a highly incompatible case (parallel components) to a compatible case (orthogonal ones).
The continuous case needs much care when dealing with arbitrary quantum states and approximating observables. Indeed, it is difficult to evaluate or even bound the relative entropy if no assumption is made on the probability distributions. In order to overcome these technicalities and focus on the quantum content of MURs, in this paper we consider only the case of Gaussian preparation states and Gaussian measurement apparatuses [16,36,45,46,49,59,62]. Moreover, we identify the class of the approximate joint measurements with the class of the joint POVMs satisfying the same symmetry properties as their target position and momentum observables [20,44]. We are supported in this assumption by the fact that, in the discrete case [2], symmetry covariant measurements turn out to be the best approximations without any hypothesis (see also [24-26, 65, 66] for a similar appearance of covariance within MURs for different uncertainty measures).
We now sketch the main results of the paper. In the vector case, we consider approximate joint measurements $\mathsf{M}$ of the position $Q \equiv (Q_1, \ldots, Q_n)$ and the momentum $P \equiv (P_1, \ldots, P_n)$. We find the following entropic MUR (Theorem 21, Remark 14): for every choice of two positive thresholds $\epsilon_1, \epsilon_2$, with $\epsilon_1 \epsilon_2 \geq \hbar^2/4$, there exists a Gaussian state $\rho$ with position variance matrix $A^\rho \geq \epsilon_1 \mathbb{1}$ and momentum variance matrix $B^\rho \geq \epsilon_2 \mathbb{1}$ such that
$$S(\mathsf{Q}_\rho \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_\rho \| \mathsf{M}_{2,\rho}) \geq n\,(\log e) \left\{ \ln\left(1 + \frac{\hbar}{2\sqrt{\epsilon_1 \epsilon_2}}\right) - \frac{\hbar}{\hbar + 2\sqrt{\epsilon_1 \epsilon_2}} \right\} \tag{1}$$
for all Gaussian approximate joint measurements $\mathsf{M}$ of $Q$ and $P$. Here $\mathsf{Q}_\rho$ and $\mathsf{P}_\rho$ are the distributions of position and momentum in the state $\rho$, and $\mathsf{M}_\rho$ is the distribution of $\mathsf{M}$ in the state $\rho$, with marginals $\mathsf{M}_{1,\rho}$ and $\mathsf{M}_{2,\rho}$; the two marginals turn out to be noisy versions of $\mathsf{Q}_\rho$ and $\mathsf{P}_\rho$. The lower bound is strictly positive and it grows linearly with the dimension $n$. The thresholds $\epsilon_1$ and $\epsilon_2$ are peculiar to the continuous case and they have a classical explanation: the relative entropy $S(p\|q) \to +\infty$ if the variance of $p$ vanishes faster than the variance of $q$, so that, given $\mathsf{M}$, it is trivial to find a state $\rho$ satisfying (1) if arbitrarily small variances are allowed. What is relevant in our result is that the total loss of information $S(\mathsf{Q}_\rho \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_\rho \| \mathsf{M}_{2,\rho})$ exceeds the lower bound even if we forbid target distributions with small variances.
The MUR (1) shows that there is no Gaussian joint measurement which can approximate arbitrarily well both $Q$ and $P$. The lower bound (1) is a consequence of the incompatibility between $Q$ and $P$ and, indeed, it vanishes in the classical limit $\hbar \to 0$. Both the relative entropies and the lower bound in (1) are scale invariant. Moreover, for fixed $\epsilon_1$ and $\epsilon_2$, we prove the existence and uniqueness of an optimal approximate joint measurement, and we fully characterize it.
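The properties of the lower bound listed above (strict positivity, linear growth in $n$, scale invariance, vanishing for $\hbar \to 0$) can be checked numerically. The following is a minimal sketch, assuming the right-hand side of (1) has the form $n(\log e)\{\ln(1 + \hbar/(2\sqrt{\epsilon_1\epsilon_2})) - \hbar/(\hbar + 2\sqrt{\epsilon_1\epsilon_2})\}$; the function name `mur_bound_bits` and the default `hbar=1.0` are illustrative choices, not notation from the paper.

```python
import math

def mur_bound_bits(n, eps1, eps2, hbar=1.0):
    """Assumed form of the lower bound (1), expressed in bits.

    n          : number of position/momentum components
    eps1, eps2 : variance thresholds, required to satisfy eps1*eps2 >= hbar^2/4
    hbar       : value of the reduced Planck constant in the chosen units
    """
    if eps1 <= 0 or eps2 <= 0:
        raise ValueError("thresholds must be positive")
    if eps1 * eps2 < hbar ** 2 / 4:
        raise ValueError("thresholds must satisfy eps1*eps2 >= hbar^2/4")
    r = 2.0 * math.sqrt(eps1 * eps2)
    # log1p keeps precision when hbar/r is small (classical limit).
    nats_per_mode = math.log1p(hbar / r) - hbar / (hbar + r)
    return n * math.log2(math.e) * nats_per_mode

# The bound depends on the thresholds only through their product,
# which is how scale invariance shows up in this expression.
same_product = (mur_bound_bits(1, 0.5, 0.5), mur_bound_bits(1, 2.0, 0.125))
```

Only the product $\epsilon_1\epsilon_2$ enters, so rescaling position by $c$ and momentum by $1/c$ leaves the value unchanged.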
In the scalar case, we consider approximate joint measurements $\mathsf{M}$ of the position $Q_u = u \cdot Q$ along the direction $u$ and the momentum $P_v = v \cdot P$ along the direction $v$, where $u \cdot v = \cos\alpha$. We find two different entropic MURs. The first entropic MUR in the scalar case is similar to the vector case (Theorem 17, Remark 11). The second one is (Theorem 15):
$$S(\mathsf{Q}_{u,\rho} \| \mathsf{M}_{1,\rho}) + S(\mathsf{P}_{v,\rho} \| \mathsf{M}_{2,\rho}) \geq c_\rho(\alpha), \qquad c_\rho(\alpha) = (\log e) \left\{ \ln\left(1 + \frac{\hbar\,|\cos\alpha|}{2\sqrt{\mathrm{Var}(\mathsf{Q}_{u,\rho})\,\mathrm{Var}(\mathsf{P}_{v,\rho})}}\right) - \frac{\hbar\,|\cos\alpha|}{\hbar\,|\cos\alpha| + 2\sqrt{\mathrm{Var}(\mathsf{Q}_{u,\rho})\,\mathrm{Var}(\mathsf{P}_{v,\rho})}} \right\} \tag{2}$$
for all Gaussian states $\rho$ and all Gaussian joint approximate measurements $\mathsf{M}$ of $Q_u$ and $P_v$. This lower bound holds for every Gaussian state $\rho$ without constraints on the position and momentum variances $\mathrm{Var}(\mathsf{Q}_{u,\rho})$ and $\mathrm{Var}(\mathsf{P}_{v,\rho})$; it is strictly positive unless $u$ and $v$ are orthogonal, but it is state dependent. Again, the relative entropies and the lower bound are scale invariant.

The paper is organized as follows. In Section 2, we introduce our target position and momentum observables, we discuss their general properties and define some related quantities (spectral measures, mean vectors and variance matrices, PURs for second order quantum moments, Weyl operators, Gaussian states). Section 3 is devoted to the definitions and main properties of the relative and differential (Shannon) entropies. Section 4 is a review on the entropic PURs in the continuous case [12,14,41], with a particular focus on their lack of scale invariance. This is a flaw due to the very definition of differential entropy, and one of the reasons that lead us to introduce relative entropy based MURs. In Section 5 we construct the covariant observables which will be used as approximate joint measurements of the position and momentum target observables. Finally, in Section 6 the main results on MURs that we sketched above are presented in detail. Some conclusions are discussed in Section 7.

Target observables and states
Let us start with the usual position and momentum operators, which satisfy the canonical commutation rules:
$$Q \equiv (Q_1, \ldots, Q_n), \qquad P \equiv (P_1, \ldots, P_n), \qquad [Q_i, Q_j] = 0, \quad [P_i, P_j] = 0, \quad [Q_i, P_j] = i\hbar\,\delta_{ij}. \tag{3}$$
Each of the vector operators has $n$ components; it could be the case of a single particle in one or more dimensions ($n = 1, 2, 3$), or several scalar or vector particles, or the quadratures of $n$ modes of the electromagnetic field. We assume the Hilbert space $\mathcal{H}$ to be irreducible for the algebra generated by the canonical operators $Q$ and $P$. An observable of the quantum system $\mathcal{H}$ is identified with a positive operator valued measure (POVM); in the paper, we shall consider observables with outcomes in $\mathbb{R}^k$ endowed with its Borel σ-algebra $\mathcal{B}(\mathbb{R}^k)$. The use of POVMs to represent observables in quantum theory is standard and the definition can be found in many textbooks [20,23,34,37]; the alternative name "non-orthogonal resolutions of the identity" is also used [44-46]. Following [20,23,38,46], a sharp observable is an observable represented by a projection valued measure (pvm); it is standard to identify a sharp observable on the outcome space $\mathbb{R}^k$ with the $k$ self-adjoint operators corresponding to it by the spectral theorem. Two observables are jointly measurable or compatible if there exists a POVM having them as marginals. Because of the non-vanishing commutators, each pair $Q_i$, $P_i$, as well as the vectors $Q$, $P$, are not jointly measurable. We denote by $\mathcal{T}(\mathcal{H})$ the trace class operators on $\mathcal{H}$, by $\mathcal{S} \subset \mathcal{T}(\mathcal{H})$ the subset of the statistical operators (or states, preparations), and by $\mathcal{L}(\mathcal{H})$ the space of the linear bounded operators.

Position and momentum
Our target observables will be either $n$-dimensional position and momentum (vector case) or position and momentum along two different directions of $\mathbb{R}^n$ (scalar case). The second case allows us to give an example ranging continuously from maximally incompatible observables to compatible ones.

Vector observables
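As target observables we take $Q$ and $P$ as in (3) and we denote by $\mathsf{Q}(A)$, $\mathsf{P}(B)$, $A, B \in \mathcal{B}(\mathbb{R}^n)$, their pvm's, that is
$$Q_i = \int_{\mathbb{R}^n} x_i\, \mathsf{Q}(dx), \qquad P_i = \int_{\mathbb{R}^n} p_i\, \mathsf{P}(dp). \tag{4}$$
Then, the distributions in the state $\rho \in \mathcal{S}$ of a sharp position and a sharp momentum measurement (denoted by $\mathsf{Q}_\rho$ and $\mathsf{P}_\rho$) are absolutely continuous with respect to the Lebesgue measure; we denote by $f(\bullet|\rho)$ and $g(\bullet|\rho)$ their probability densities: $\forall A, B \in \mathcal{B}(\mathbb{R}^n)$,
$$\mathsf{Q}_\rho(A) = \mathrm{Tr}\{\rho\,\mathsf{Q}(A)\} = \int_A f(x|\rho)\,dx, \qquad \mathsf{P}_\rho(B) = \mathrm{Tr}\{\rho\,\mathsf{P}(B)\} = \int_B g(p|\rho)\,dp. \tag{5}$$
In the Dirac notation, if $|x\rangle$ and $|p\rangle$ are the improper position and momentum eigenvectors, these densities take the expressions $f(x|\rho) = \langle x|\rho|x\rangle$ and $g(p|\rho) = \langle p|\rho|p\rangle$, respectively. The mean vectors and the variance matrices of these distributions will be given in (7) and (8).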

Scalar observables
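As target observables we take the position along a given direction $u$ and the momentum along another given direction $v$:
$$Q_u = u \cdot Q, \qquad P_v = v \cdot P, \qquad \text{with } u, v \in \mathbb{R}^n, \quad |u| = |v| = 1, \quad u \cdot v = \cos\alpha. \tag{6}$$
In this case we have $[Q_u, P_v] = i\hbar\cos\alpha$, so that $Q_u$ and $P_v$ are not jointly measurable, unless the directions $u$ and $v$ are orthogonal. Their pvm's are denoted by $\mathsf{Q}_u$ and $\mathsf{P}_v$, their distributions in a state $\rho$ by $\mathsf{Q}_{u,\rho}$ and $\mathsf{P}_{v,\rho}$, and their corresponding probability densities by $f_u(\bullet|\rho)$ and $g_v(\bullet|\rho)$: $\forall A, B \in \mathcal{B}(\mathbb{R})$,
$$\mathsf{Q}_{u,\rho}(A) = \int_A f_u(x|\rho)\,dx, \qquad \mathsf{P}_{v,\rho}(B) = \int_B g_v(p|\rho)\,dp.$$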
Of course, the densities in the scalar case are marginals of the densities in the vector case. Means and variances will be given in (11).

Quantum moments.
Let $\mathcal{S}_2$ be the set of states for which the second moments of position and momentum are finite: Then, the mean vector and the variance matrix of the position $Q$ in the state $\rho \in \mathcal{S}_2$ are while for the momentum $P$ we have For $\rho \in \mathcal{S}_2$ it is possible to introduce also the mixed 'quantum covariances' Since there is no joint measurement for the position $Q$ and momentum $P$, the quantum covariances $C^\rho_{ij}$ are not covariances of a joint distribution, and thus they do not have a classical probabilistic interpretation. By means of the moments above, we construct the three real $n \times n$ matrices $A^\rho$, $B^\rho$, $C^\rho$, the $2n$-dimensional vector $\mu^\rho$ and the symmetric $2n \times 2n$ matrix $V^\rho$, with We say $V^\rho$ is the quantum variance matrix of position and momentum in the state $\rho$. In [59] dimensionless canonical operators are considered, but apart from this, our matrix $V^\rho$ corresponds to their "noise matrix in real form"; the name "variance matrix" is also used [49,60].
In a similar way, we can introduce all the moments related to the position $Q_u$ and momentum $P_v$ introduced in (6). For $\rho \in \mathcal{S}_2$, the means and variances are respectively Similarly to (9), we have also the 'quantum covariance' $u \cdot C^\rho v \equiv v \cdot (C^\rho)^T u$. Then, we collect the two means in a single vector and we introduce the variance matrix: be a real symmetric $2n \times 2n$ block matrix with the same dimensions of a quantum variance matrix. Define In this case we have: $V \geq 0$, $A > 0$, $B > 0$, and The inequalities (14) for $V_\pm$ tell us exactly when a (positive semi-definite) real matrix $V$ is the quantum variance matrix of position and momentum in a state $\rho$. Moreover, they are the multidimensional version of the usual uncertainty principle expressed through the variances [44,46,59], hence they represent a form of PURs. The block matrix $\Omega$ in the definition of $V_\pm$ is useful to compress formulae involving position and momentum; moreover, it makes it simpler to compare our equations with their frequent dimensionless versions (with $\hbar = 1$) in the literature [36,49].
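The positivity criterion for $V_\pm$ can be tested numerically. The following is a minimal sketch, assuming the standard convention $V_\pm = V \pm \frac{i\hbar}{2}\Omega$ with $\Omega$ the $2n \times 2n$ block matrix having $\mathbb{1}$ and $-\mathbb{1}$ in the off-diagonal blocks; this convention, the function name and the tolerance are assumptions made for illustration. Since $V_-$ is the complex conjugate of $V_+$, the two matrices share the same spectrum and checking one of them suffices.

```python
import numpy as np

def is_quantum_variance_matrix(V, hbar=1.0, tol=1e-12):
    """Check V + (i*hbar/2)*Omega >= 0 for a real symmetric 2n x 2n matrix V.

    Assumed convention: Omega = [[0, 1], [-1, 0]] in n x n blocks,
    position block first, momentum block second.
    """
    V = np.asarray(V, dtype=float)
    if V.ndim != 2 or V.shape[0] != V.shape[1] or V.shape[0] % 2 != 0:
        raise ValueError("V must be a square matrix of even size")
    if not np.allclose(V, V.T):
        return False
    n = V.shape[0] // 2
    identity, zero = np.eye(n), np.zeros((n, n))
    omega = np.block([[zero, identity], [-identity, zero]])
    v_plus = V + 0.5j * hbar * omega  # Hermitian: V symmetric, i*Omega Hermitian
    return bool(np.linalg.eigvalsh(v_plus).min() >= -tol)

# For n = 1 and C = 0 the criterion reduces to Var(Q) Var(P) >= hbar^2 / 4.
minimum_uncertainty = is_quantum_variance_matrix(np.diag([0.5, 0.5]))
too_sharp = is_quantum_variance_matrix(np.diag([0.4, 0.5]))
```

With $\hbar = 1$, `minimum_uncertainty` is `True` (the product of the variances equals $1/4$) and `too_sharp` is `False`.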

By using the real block vector
αu ′ βv ′ , with arbitrary α, β ∈ R and given u ′ , v ′ ∈ R n , the semipositivity (14) implies which in turn implies A ≥ 0, B ≥ 0 and (15). Then, by choosing u ′ = v ′ = u i , where u 1 , . . . , u n are the eigenvectors of A (since A is a real symmetric matrix, u i ∈ R n for all i), one gets the strict positivity of all the eigenvalues of A; analogously, one gets B > 0.
Inequality (15) for $u' = u$ and $v' = v$ becomes the uncertainty rule à la Robertson [58] for the observables in (6) (a position component and a momentum component spanning an arbitrary angle $\alpha$): Inequality (16) is equivalent to Since $V_\pm$ are block matrices, their positive semi-definiteness can be studied by means of the Schur complements [27,47,57]. However, as $V_\pm$ are complex block matrices with a very peculiar structure, special results hold for them. Before summarizing the properties of $V_\pm$ in the next proposition, we need a simple auxiliary algebraic lemma.
In this case we have Moreover, we have also the following properties for the various determinants: By interchanging A with B and C with C T in (18)-(22) equivalent results are obtained.
Proof. Since we already know that V + ≥ 0 implies the invertibility of A, the equivalence between (14) and (18) with A > 0 follows from [47, Theor. 1.12 p. 34] (see also [57,Theor. 11.6] or [27,Lemma 3.2]). In (19), the first inequality follows by summing up the two inequalities in (18). The last two ones are immediate by the positivity of A −1 .
The equality in (20) is Schur's formula for the determinant of block matrices [47, Theor. 1.1 p. 19]. Then, the first inequality is immediate by the lemma above and the trivial relation $B \geq B - C^T A^{-1} C$; the second one follows from (19): The equality $\det V = (\hbar/2)^{2n}$ is equivalent to $\det\left(B - C^T A^{-1} C\right) = \det\left(\frac{\hbar^2}{4} A^{-1}\right)$; since the latter two determinants are evaluated on ordered positive matrices by (19), they coincide if and only if the respective arguments are equal (Lemma 2); this shows the equivalence in (21). Then, by (18), the self-adjoint matrix $\frac{i\hbar}{2}\left(A^{-1} C - C^T A^{-1}\right)$ is both positive semi-definite and negative semi-definite; hence it is null, that is, (19), Lemma 2 then implies $C^T A^{-1} C = 0$ and so $C = 0$.
By (18) and (19), every time three matrices A, B, C define the quantum variance matrix of a state ρ, the same holds for A, B, C = 0. This fact can be used to characterize when two positive matrices A and B are the diagonal blocks of some quantum variance matrix, or two positive numbers c Q and c P are the position and momentum variances of a quantum state along the two directions u and v. A −1 .
Two real numbers c Q > 0 and c P > 0, having the dimension of the square of a length and momentum, respectively, are such that c Q = Var(Q u,ρ ) and c P = Var(P v,ρ ) for some state ρ if and only if Proof. For A and B, the necessity follows from (19). The sufficiency comes from (18) by choosing For c Q and c P , the necessity follows from (15). The sufficiency comes from (18) with V ρ = A 0 0 B and for example the following choices of A and B: -if cos α = ±1, we take A = c Q ½ and B = c P ½; where A ′ and B ′ are any two scalar multiples of the orthogonal projection onto {u, v} ⊥ satisfying where A ′ and B ′ are as in the previous item.
In the last two cases, we chose A and B in such a way that B = cQ cP (cos α) 2 A −1 when restricted to the linear span of {u, v}.

Weyl operators and Gaussian states
In the following, we shall introduce Gaussian states, Gaussian observables and covariant observables on the phase-space. In all these instances, the Weyl operators are involved; here we recall their definition and some properties (see e.g. [45,Sect. 5.2] or [46,Sect. 12.2], where, however, the definition differs from ours in that the Weyl operators are composed with the map Ω −1 of (13)). Definition 1. The Weyl operators are the unitary operators defined by The Weyl operators (23) satisfy the composition rule in particular, this implies the commutation relation These commutation relations imply the translation property due to this property, the Weyl operators are also known as displacement operators.
With a slight abuse of notation, we shall sometimes use the identification where $\begin{pmatrix} x \\ p \end{pmatrix}$ is a block column vector belonging to the phase-space $\mathbb{R}^n \times \mathbb{R}^n \equiv \mathbb{R}^{2n}$; here, the first block $x$ is a position and the second block $p$ is a momentum. By means of the Weyl operators, it is possible to define the characteristic function of any trace-class operator.
Definition 2. For any operator ρ ∈ T(H), its characteristic function is the complex valued function ρ : Note that k is the inverse of a length and l is the inverse of a momentum, so that w is a block vector living in the space R 2n ≡ R n × R n regarded as the dual of the phase-space.
Instead of the characteristic function, sometimes the so called Weyl transform Tr {W (x, p)ρ} is introduced [45,49].
By [45,Prop. 5.3.2, Theor. 5.3.3], we have $\hat\rho(w) \in L^2(\mathbb{R}^{2n})$ and the following trace formula holds: $\forall \rho, \sigma \in \mathcal{T}(\mathcal{H})$, Moreover, the following inversion formula ensures that the characteristic function $\hat\rho$ completely characterizes the state $\rho$ [45, Coroll. 5.3.5]: The last two integrals are defined in the weak operator topology. Finally, for $\rho \in \mathcal{S}_2$, the moments (7)-(10) can be expressed as in [45,Sect. 5.4]: for a vector $\mu^\rho \in \mathbb{R}^{2n}$ and a real $2n \times 2n$ matrix $V^\rho$ such that $V^\rho_+ \geq 0$. The condition $V^\rho_+ \geq 0$ is necessary and sufficient in order that the function (31) defines the characteristic function of a quantum state [45, Theor. 5.5.1], [46,Theor. 12.17]. Therefore, Gaussian states are exactly the states whose characteristic function is the exponential of a second order polynomial [45, Eq. We shall denote by $\mathcal{G}$ the set of the Gaussian states; we have $\mathcal{G} \subset \mathcal{S}_2 \subset \mathcal{S}$. By (30), the vectors $a^\rho$, $b^\rho$ and the matrices $A^\rho$, $B^\rho$, $C^\rho$ characterizing a Gaussian state $\rho$ are just its first and second order quantum moments introduced in (7)-(9). By (31), the corresponding distributions of position and momentum are Gaussian, namely 2n if and only if ρ is pure.
, then the equivalence (22) gives $B^\rho = \frac{\hbar^2}{4} (A^\rho)^{-1}$, so that the variance matrices $A^\rho$ and $B^\rho$ have a common eigenbasis $u_1, \ldots, u_n$. Thus, all the corresponding pairs of position $Q_{u_i}$ and momentum $P_{u_i}$ have minimum uncertainties: $\mathrm{Var}(\mathsf{Q}_{u_i})\,\mathrm{Var}(\mathsf{P}_{u_i}) = \hbar^2/4$. Therefore, if we consider the factorization of the Hilbert space $\mathcal{H} = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ corresponding to the basis $u_1, \ldots, u_n$, all the partial traces of the state $\rho$ on each factor $\mathcal{H}_i$ are minimum uncertainty states. Since for $n = 1$ the minimum uncertainty states are pure and Gaussian, the state $\rho$ is a pure product Gaussian state.
The converse is immediate.

Relative and differential entropies
In this paper, we will be concerned with entropic quantities of classical type [17,33,61]. We express them in 'bits', that is, we use base-2 logarithms: $\log a \equiv \log_2 a$. We deal only with probabilities on the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which admit densities with respect to the Lebesgue measure. So, we define the relative entropy and differential entropy only for such probabilities; moreover, we list only the general properties used in the following.

Relative entropy or Kullback-Leibler divergence
The fundamental quantity is the relative entropy, also called information divergence, discrimination information, Kullback-Leibler divergence or information or distance or discrepancy. The relative entropy of a probability p with respect to a probability q is defined for any couple of probabilities p, q on the same probability space.
Given two probabilities $p$ and $q$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ with densities $f$ and $g$, respectively, the relative entropy of $p$ with respect to $q$ is
$$S(p\|q) = \int_{\mathbb{R}^n} f(x) \log\frac{f(x)}{g(x)}\,dx. \tag{33}$$
The value $+\infty$ is allowed for $S(p\|q)$; the usual convention $0 \log(0/0) = 0$ is understood. The relative entropy (33) is the amount of information that is lost when $q$ is used to approximate $p$ [17, p. 51]. Of course, if $x$ is dimensioned, then the densities $f$ and $g$ have the same dimension (that is, the inverse of $x$), and the argument of the logarithm is dimensionless, as it must be.
As $S(p\|q)$ is scale invariant, it quantifies a relative error for the use of $q$ as an approximation of $p$, not an absolute one.
Let us employ the relative entropy to evaluate the effect of an additive Gaussian noise $\nu \sim N(b; \beta^2)$ on an independent Gaussian random variable $X$. If $X \sim N(a; \alpha^2)$, then $X + \nu \sim N(a + b; \alpha^2 + \beta^2)$, and the relative entropy of the true distribution of $X$ with respect to its disturbed version $X + \nu$ is
$$S(X\|X+\nu) = \frac{\log e}{2} \left[ \ln\left(1 + \frac{\beta^2}{\alpha^2}\right) - \frac{\beta^2 - b^2}{\alpha^2 + \beta^2} \right].$$
This expression vanishes if the noise becomes negligible with respect to the true distribution, that is if $\beta^2/\alpha^2 \to 0$ and $b^2/\alpha^2 \to 0$. On the other hand, $S(X\|X+\nu)$ diverges if the noise becomes too strong with respect to the true distribution, or, in other words, if the true distribution becomes too peaked with respect to the noise, that is, $\beta^2/\alpha^2 \to +\infty$ or $b^2/\alpha^2 \to +\infty$.
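The Gaussian noise example can be evaluated directly. The following is a minimal sketch using the standard closed form of the Kullback-Leibler divergence between two one-dimensional Gaussians, converted to bits; the function name and argument names are illustrative.

```python
import math

def gaussian_relative_entropy_bits(alpha2, beta2, b=0.0):
    """S(X || X + nu) in bits for independent X ~ N(a; alpha2), nu ~ N(b; beta2).

    The result does not depend on the mean a of X, only on the noise
    mean b and on the two variances.
    """
    if alpha2 <= 0 or beta2 < 0:
        raise ValueError("alpha2 must be positive and beta2 non-negative")
    total = alpha2 + beta2  # variance of X + nu
    nats = 0.5 * (math.log(total / alpha2) - beta2 / total + b * b / total)
    return nats * math.log2(math.e)

# Scale invariance: rescaling X and nu by c = 3 multiplies the variances
# by 9 and the noise mean by 3, and leaves the relative entropy unchanged.
original = gaussian_relative_entropy_bits(1.0, 2.0, b=0.5)
rescaled = gaussian_relative_entropy_bits(9.0, 18.0, b=1.5)
```

The two values `original` and `rescaled` agree up to rounding, which illustrates why the relative entropy measures a relative rather than an absolute error.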

Differential entropy
The differential entropy of an absolutely continuous random vector $X$ with a probability density $f$ is
$$H(X) = -\int_{\mathbb{R}^n} f(x) \log f(x)\,dx.$$
This quantity is commonly used in the literature, even if it lacks many of the nice properties of the Shannon entropy for discrete random variables. For example, $H(X)$ is not scale invariant, and it can be negative [33, p. 244].
Since the density $f$ enters in the logarithm argument, the definition of $H(X)$ is meaningful only when $f$ is dimensionless, which is the same as $X$ being dimensionless. Note that, if $X$ is dimensioned and $c > 0$ is a real parameter making $\tilde X = cX$ a dimensionless random variable, then In the following, we shall consider the differential entropy only for dimensionless random vectors $X$.
The equality holds iff X is Gaussian with variance matrix A and arbitrary mean vector a.
(ii) If $X = (X_1, \ldots, X_n)$ is an absolutely continuous random vector, then $H(X) \leq \sum_{i=1}^n H(X_i)$. The equality holds iff the components $X_1, \ldots, X_n$ are independent.
Remark 1. In property (i) we have used the following well-known matrix identity, which follows by diagonalization:

Remark 2. Property (i) yields that the differential entropy of a Gaussian random variable $X \sim N(a; \alpha^2)$ is $H(X) = \frac{1}{2} \log\left(2\pi e\,\alpha^2\right)$, which is an increasing function of the variance $\alpha^2$, and thus it is a measure of the uncertainty of $X$. Note that $H(X) \geq 0$ iff $\alpha^2 \geq 1/(2\pi e)$.
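The sign change and the lack of scale invariance noted in Remark 2 are easy to see numerically. The following is a minimal sketch based on the standard formula $H(X) = \frac{1}{2}\log_2(2\pi e\,\alpha^2)$ for a one-dimensional Gaussian; the function name is illustrative.

```python
import math

def gaussian_differential_entropy_bits(alpha2):
    """Differential entropy (in bits) of a dimensionless X ~ N(a; alpha2)."""
    if alpha2 <= 0:
        raise ValueError("the variance must be positive")
    return 0.5 * math.log2(2.0 * math.pi * math.e * alpha2)

# H vanishes exactly at the threshold variance 1/(2*pi*e) ...
threshold = 1.0 / (2.0 * math.pi * math.e)
h_at_threshold = gaussian_differential_entropy_bits(threshold)

# ... and rescaling X by c = 2 (variance times 4) shifts H by log2(c) = 1 bit,
# in contrast with the scale invariance of the relative entropy.
shift = (gaussian_differential_entropy_bits(4.0)
         - gaussian_differential_entropy_bits(1.0))
```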

Entropic PURs for position and momentum
The idea of having an entropic formulation of the PURs for position and momentum goes back to [12,14,41]. However, we have just seen that, due to the presence of the logarithm, the Shannon differential entropy needs dimensionless probability densities. So, this leads us to introduce dimensionless versions of position and momentum. Let $\lambda > 0$ be a dimensionless parameter and $\kappa$ a second parameter with the dimension of a mass times a frequency. Then, we introduce the dimensionless versions of position and momentum: We use a unique dimensional constant $\kappa$, in order to respect rotation symmetry and not to distinguish different particles. Anyway, there is no natural link between the parameter multiplying $Q$ and the parameter multiplying $P$; this is the reason for introducing $\lambda$. As we see from the commutation rules, the constant $\lambda$ plays the role of a dimensionless version of $\hbar$; in the literature on PURs, often $\lambda = 1$ is used [12,14,31].

Vector observables
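Let $\tilde{\mathsf{Q}}$ and $\tilde{\mathsf{P}}$ be the pvm's of $\tilde Q$ and $\tilde P$; then, $\tilde{\mathsf{Q}}_\rho$ and $\tilde{\mathsf{P}}_\rho$ are their probability distributions in the state $\rho$.
The total preparation uncertainty is quantified by the sum of the two differential entropies $H(\tilde{\mathsf{Q}}_\rho) + H(\tilde{\mathsf{P}}_\rho)$.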
For $\rho \in \mathcal{G}$, by Proposition 8 we get In the case of product states of minimum uncertainty, we have $(\det A^\rho)(\det B^\rho) = (\hbar^2/4)^n$; then, by taking (20) into account, we get Thus, the bound (37) arises from quantum relations between $Q$ and $P$; indeed, there would be no lower bound for (36) if we could take both $\det A^\rho$ and $\det B^\rho$ arbitrarily small. By item (ii) of Proposition 8, the differential entropy for the distribution of a random vector is smaller than the sum of the entropies of its marginals; however, the final bound (37) is a tight bound for both By the results of [12,14], the same bound (37) is obtained even if the minimization is done over all the states, not only the Gaussian ones.
The uncertainty result (37) depends on $\lambda$, this being a consequence of the lack of scale invariance of the differential entropy; note that the bound is positive if and only if $\lambda > 1/(\pi e)$. Sometimes in the literature the parameter $\hbar$ appears in the argument of the logarithm [19,32]; this fact has to be interpreted as the appearance of a parameter with the numerical value of $\hbar$, but without dimensions. In this sense the formulation (37) is consistent with both the cases $\lambda = 1$ and $\lambda = \hbar$. Sometimes the smaller bound $\ln 2\pi$ appears in place of $\log \pi e$ [50]; this is connected to a state dependent formulation of the entropic PUR [31, Sect. V.B].

Scalar observables
The dimensionless versions of the scalar observables introduced in (6) are We denote by $\tilde{\mathsf{Q}}_{u,\rho}$ and $\tilde{\mathsf{P}}_{v,\rho}$ the associated distributions in the state $\rho$. For $\rho \in \mathcal{S}_2$, the respective means and variances are with $\sqrt{\mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho})\,\mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho})} \geq \lambda\,|\cos\alpha|/2$. As in the vector case, the total preparation uncertainty is quantified by the sum of the two differential entropies $H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho})$. For $\rho \in \mathcal{G}$, Proposition 8 gives Then, we have the lower bound which depends on $\lambda$, but not on $\kappa$. Of course, because of (39), for Gaussian states a lower bound for the sum $H(\tilde{\mathsf{Q}}_{u,\rho}) + H(\tilde{\mathsf{P}}_{v,\rho})$ is equivalent to a lower bound for the product $\mathrm{Var}(\tilde{\mathsf{Q}}_{u,\rho})\,\mathrm{Var}(\tilde{\mathsf{P}}_{v,\rho})$. By a slight generalization of the results of [12,14], the bound (40) is obtained also when the minimization is done over all the states. Let us note that the bound in (40) is positive for $|\lambda \cos\alpha| > 1/(\pi e)$, and it goes to $-\infty$ for $\alpha \to \pi/2$, which is the case of compatible $\tilde{\mathsf{Q}}_{u,\rho}$ and $\tilde{\mathsf{P}}_{v,\rho}$. In the case $\alpha = 0$, the bound (40) is the same as (37) for $n = 1$.
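The behaviour of the scalar entropic PUR bound as a function of the angle can be tabulated with a few lines of code. This is a sketch under an assumption: combining the Gaussian entropy $\frac{1}{2}\log(2\pi e\,\alpha^2)$ with the variance inequality above gives a bound of the form $\log(\pi e\,\lambda\,|\cos\alpha|)$, and this form is taken here as the content of (40). The function name is illustrative.

```python
import math

def entropic_pur_bound_bits(lam, alpha):
    """Assumed form of the bound (40): log2(pi * e * lam * |cos(alpha)|).

    lam   : dimensionless parameter playing the role of hbar
    alpha : angle between the position and momentum directions (radians)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    c = lam * abs(math.cos(alpha))
    if c == 0.0:
        return -math.inf  # compatible observables: no lower bound
    return math.log2(math.pi * math.e * c)

# Parallel directions (alpha = 0) versus almost orthogonal ones.
parallel = entropic_pur_bound_bits(1.0, 0.0)
nearly_orthogonal = entropic_pur_bound_bits(1.0, math.pi / 2)
```

In floating point `math.cos(math.pi / 2)` is tiny but not exactly zero, so `nearly_orthogonal` is a large negative number rather than `-inf`; this mirrors the divergence to $-\infty$ described in the text.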

Approximate joint measurements of position and momentum
In order to deal with MURs for position and momentum observables, we have to introduce the class of approximate joint measurements of position and momentum, whose marginals we will compare with the respective sharp observables. As done in [21,28,44,45], it is natural to characterize such a class by requiring suitable properties of covariance under the group of space translations and velocity boosts: namely, by approximate joint measurement of position and momentum we will mean any POVM on the product space of the position and momentum outcomes sharing the same covariance properties as the two target sharp observables. As we have already discussed, two approximation problems will be of our concern: the approximation of the position and momentum vectors (vector case, with outcomes in the phase-space $\mathbb{R}^n \times \mathbb{R}^n$), and the approximation of one position and one momentum component along two arbitrary directions (scalar case, with outcomes in $\mathbb{R} \times \mathbb{R}$). In order to treat the two cases together, we consider POVMs with outcomes in $\mathbb{R}^m \times \mathbb{R}^m \equiv \mathbb{R}^{2m}$, which we call bi-observables; they correspond to a measurement of $m$ position components and $m$ momentum components. The specific covariance requirements will be given in Definitions 5, 6, 7.

In studying the properties of probability measures on $\mathbb{R}^k$, a very useful notion is that of the characteristic function, that is, the Fourier cotransform of the measure at hand; the analogous quantity for POVMs turns out to have the same relevance. Different names have been used in the literature to refer to the characteristic function of POVMs, or, more generally, quantum instruments, such as characteristic operator or operator characteristic function [1, 3-6, 42-44, 49]. As a variant, the symplectic Fourier transform also appears quite often [46,Sect. 12.4.3].
The characteristic function has been used, for instance, to study the quantum analogues of the infinite-divisible distributions [3][4][5][6]43,44] and measurements of Gaussian type [42,46,49]. Here, we are interested only in the latter application, as our approximating bi-observables will typically be Gaussian. Since we deal with bi-observables, we limit our definition of the characteristic function only to POVMs on R m × R m , which have the same number of variables of position and momentum type.
Being measures, POVMs can be used to construct integrals, whose theory is presented e.g. in [23, Sect Here, the dimensions of the vector variables k and l are the inverses of a length and momentum, respectively, as in the definition of the characteristic function of a state (27). This definition is given so that Tr M(k, l)ρ is the usual characteristic function of the probability distribution M ρ on R 2m .

Covariant vector observables
In terms of the pvm's (4), the translation property (25) is equivalent to the symmetry properties and they are taken as the transformation property defining the following class of POVMs on R 2n [20,23,28,49,64].
We denote by C the set of all the covariant phase-space observables. The interpretation of covariant phase-space observables as approximate joint measurements of position and momentum is based on the fact that their marginal POVMs have the same symmetry properties of Q and P, respectively. Although Q and P are not jointly measurable, the following well-known result says that there are plenty of covariant phase-space observables [30,48], [45,Theor. 4.8.3]. In (43) below, we use the parity operator Π on H, which is such that Proposition 9. The covariant phase-space observables are in one-to-one correspondence with the states on H, so that we have the identification S ∼ C; such a correspondence σ ↔ M σ is given by The characteristic function (41) of a measurement M σ ∈ C has a very simple structure in terms of the characteristic function (27) of the corresponding state σ ∈ S. Proposition 10. The characteristic function of M σ ∈ C is given by and the characteristic function of the probability M σ ρ is In (44) we have used the identification (26). The characteristic function of a state is introduced in (27).
In terms of probability densities, measuring M σ on the state ρ yields the density function h σ (x, p|ρ) = Tr{M σ (x, p)ρ}. Then, by (45), the densities of the marginals M σ 1,ρ and M σ 2,ρ are the convolutions where f and g are the sharp densities introduced in (5). By the arbitrariness of the state ρ, the marginal POVMs of M σ turn out to be the convolutions (or 'smearings') Let us remark that the distribution of the approximate position observable M σ 1 in a state ρ is the distribution of the sum of two independent random vectors: the first one is distributed as the sharp position Q in the state ρ, the second one is distributed as the sharp position Q in the state σ. In this sense, the approximate position M σ 1 looks like a sharp position plus an independent noise given by σ. Of course, a similar fact holds for the momentum. However, this statement about the distributions cannot be extended to a statement involving the observables. Indeed, since Q and P are incompatible, nobody can jointly observe M σ , Q and P, so that the convolutions (46) do not correspond to sums of random vectors that actually exist when measuring M σ .

Covariant scalar observables
Now we focus on the class of approximate joint measurements of the observables Q u and P v representing position and momentum along two possibly different directions u and v (see Section 2.1.2). As in the case of covariant phase-space observables, this class is defined in terms of the symmetries of its elements: we require them to transform as if they were joint measurements of Q u and P v . Recall that Q u and P v denote the spectral measures of Q u , P v .
Due to the commutation relation (24), the following covariance relations hold for all A, B ∈ B(R) and x, p ∈ R n . We employ covariance to define our class of approximate joint measurements of Q u and P v .
We denote by C u,v the class of such bi-observables. So, our approximate joint measurements of Q u and P v will be all the bi-observables in the class C u,v . It is useful to work with a little more generality, and merge Definitions 5 and 6 into a single notion of covariance.
Thus, approximate joint observables of Q u and P v are just J-covariant observables on R 2 for the choice of the 2 × 2n matrix On the other hand, covariant phase-space observables constitute the class of ½ 2n -covariant observables on R 2n , where ½ 2n is the identity map of R 2n .

Gaussian measurements
When dealing with Gaussian states, the following class of bi-observables quite naturally arises.
for two vectors a M , b M ∈ R m , a real 2m × 2n matrix J M and a real symmetric 2m × 2m matrix V M satisfying the condition In this definition, the vector a M has the dimension of a length, and b M of a momentum; similarly, the matrices J M , V M decompose into blocks of different dimensions. The condition (49) is necessary and sufficient in order that the function (48) defines the characteristic function of a POVM.
For unbiased Gaussian measurements, i.e., Gaussian bi-observables with a M = b M = 0, the previous definition coincides with the one of [46,Section 12.4.3]. It is also a particular case of the more general definition of Gaussian observables on arbitrary (not necessarily symplectic) linear spaces that is given in [36,49]. We refer to [46,49] for the proof that Eq. (48) is actually the characteristic function of a POVM.
Measuring the Gaussian observable M on the Gaussian state ρ yields the probability distribution M ρ whose characteristic function is hence the output distribution is Gaussian,

Covariant Gaussian observables
For Gaussian bi-observables, J-covariance has a very easy characterization. Proof. For x, p ∈ R n , we let M ′ and M ′′ be the two POVMs on R 2m given by By the commutation relations (24) for the Weyl operators, we immediately get we have also Since M(k, l) ≠ 0 for all k, l, by comparing the last two expressions we see that M ′ = M ′′ if and only if which in turn is equivalent to J M = J.

Vector observables
Let us point out the structure of the Gaussian approximate joint measurements of Q and P.

Proposition 12. A bi-observable M σ ∈ C is Gaussian if and only if the state σ is Gaussian. In this case, the covariant bi-observable M σ is Gaussian with parameters
Proof. By comparing (31), (44) and (48), and using the fact that W (x 1 , p 1 ) ∝ W (x 2 , p 2 ) if and only if x 1 = x 2 and p 1 = p 2 , we have the first statement. Then, for σ ∈ G, we see immediately that M σ is a Gaussian observable with the above parameters.
We call C G the class of the Gaussian covariant phase-space observables. By (50), observing M σ on a Gaussian state ρ ∈ G yields the normal probability distribution M σ When a σ = 0 and b σ = 0, we have an unbiased measurement.

Scalar observables
We now study the Gaussian approximate joint measurements of the target observables Q u and P v defined in (6).

Proposition 13. A Gaussian bi-observable M with parameters
where J is given by (47). In this case, the condition (49) is equivalent to Proof. The first statement follows from Proposition 11. Then, the matrix inequality (49) reads which is equivalent to (52).
We write C G u,v for the class of the Gaussian (u, v)-covariant phase-space observables. An observable M ∈ C G u,v is thus characterized by the couple (µ M , V M ). From (50) with J M = J given by (47), we get that measuring M ∈ C G u,v on a Gaussian state ρ yields the probability distribution M ρ = N (µ ρ u,v + µ M ; V ρ u,v + V M ). Its marginals with respect to the first and second entry are, respectively,
Example 2. Let us construct an example of an approximate joint measurement of Q u and P v , by using a noisy measurement of position along u followed by a sharp measurement of momentum along v. Let ∆ be a positive real number yielding the precision of the position measurement, and consider the POVM M on R 2 given by

The characteristic function of M is
Then, M ∈ C G u,v with parameters a M = 0, b M = 0, V M = 0 and J M = J given by (47). Note that M can be regarded as the limit case of the observables of the previous example when cos α = 0 and ∆ ↓ 0.

Entropic MURs for position and momentum
In the case of two discrete target observables, in [2] we found an entropic bound for the precision of their approximate joint measurements, which we named entropic incompatibility degree. Its definition followed a three steps procedure. Firstly, we introduced an error function: when the system is in a given state ρ, such a function quantifies the total amount of information that is lost by approximating the target observables by means of the marginals of a bi-observable; the error function is nothing else than the sum of the two relative entropies of the respective distributions. Then, we considered the worst possible case by maximizing the error function over ρ, thus obtaining an entropic divergence quantifying the approximation error in a state independent way. Finally, we got our index of the incompatibility of the two target observables by minimizing the entropic divergence over all bi-observables. In particular, when symmetries are present, we showed that the minimum is attained at some covariant bi-observables. So, the covariance followed as a byproduct of the optimization procedure, and was not a priori imposed upon the class of approximating bi-observables.
As we shall see, the extension of the previous procedure to position and momentum target observables is not straightforward, and peculiar problems of the continuous case arise. In order to overcome them, in this paper we shall fully analyse only a case in which explicit computations can be done: Gaussian preparations, and Gaussian bi-observables, which we a priori assume to be covariant. We conjecture that the final result should be independent of these simplifications, as we shall discuss in Section 7.
As we said in Section 5, by "approximate joint measurement" we mean "a bi-observable with the 'right' covariance properties".

Scalar observables
Given the directions u and v, the target observables are Q u and P v in (6) with pvm's Q u and P v . For ρ ∈ G with parameters (µ ρ , V ρ ) given in (12), the target distributions Q u,ρ and P v,ρ are normal with means and variances (11).
An approximate joint measurement of Q u and P v is given by a covariant bi-observable M ∈ C u,v ; then, we denote its marginals with respect to the first and second entry by M 1 and M 2 , respectively. For a Gaussian covariant bi-observable M ∈ C G u,v with parameters (µ M , V M ), the distribution of M in a Gaussian state ρ is normal, M ρ = N (µ ρ u,v + µ M ; V ρ u,v + V M ), so that its marginal distributions M 1,ρ and M 2,ρ are normal with means u · a ρ + a M and v · b ρ + b M and variances Var (Q u,ρ ) + V M 11 and Var (P v,ρ ) + V M 22 , respectively. Let us recall that |u| = 1, |v| = 1, u · v = cos α, and that by (16) and (52), we have Var (Q u,ρ ) Var (P v,ρ ) ≥ (ħ²/4)(cos α)² and V M 11 V M 22 ≥ (ħ²/4)(cos α)².

Error function
The relative entropy is the amount of information that is lost when an approximating distribution is used in place of a target one. For this reason, we use it to give an informational quantification of the error made in approximating the distributions of sharp position and momentum by means of the marginals of a joint covariant observable.
Definition 9. Given the preparation ρ ∈ S and the covariant bi-observable M ∈ C u,v , the error function for the scalar case is the sum of the two relative entropies: S(ρ, M) = S(Q u,ρ ‖ M 1,ρ ) + S(P v,ρ ‖ M 2,ρ ). The relative entropy is invariant under a change of the unit of measurement, so that the error function is scale invariant, too; indeed, it quantifies a relative error, not an absolute one. In the Gaussian case the error function can be explicitly computed: for ρ ∈ G and M ∈ C G u,v , S(ρ, M) = (log e/2) [s(x) + s(y)] + ∆(ρ, M), where x = V M 11 /Var (Q u,ρ ), y = V M 22 /Var (P v,ρ ), ∆(ρ, M) = (log e/2) [(a M )²/(Var (Q u,ρ ) + V M 11 ) + (b M )²/(Var (P v,ρ ) + V M 22 )], and s : [0, +∞) → [0, +∞) is the following C ∞ strictly increasing function with s(0) = 0: s(x) = ln(1 + x) − x/(1 + x). Proof. The statement follows by a straightforward combination of (32), (34), (53) and (56).
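As a numerical companion to the Gaussian computation above, the following minimal Python sketch evaluates the relative entropy between two one-dimensional normal laws and checks that, for an unbiased noise, it reduces to (log e/2) s(x) with x the ratio "noise variance over target variance". It assumes s(x) = ln(1 + x) − x/(1 + x), the form consistent with the closed expression appearing in the proof of Theorem 15, and base-2 logarithms; the function names are illustrative and not taken from the paper.

```python
import math

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x): increasing on [0, +inf), s(0) = 0."""
    return math.log1p(x) - x / (1.0 + x)

def kl_gauss_bits(mean_p, var_p, mean_q, var_q):
    """Relative entropy S(p || q) in bits between N(mean_p; var_p) and N(mean_q; var_q)."""
    nats = 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)
    return nats * math.log2(math.e)

def error_term_bits(target_var, noise_var, bias=0.0):
    """Loss when N(a; target_var) is replaced by N(a + bias; target_var + noise_var)."""
    return kl_gauss_bits(0.0, target_var, bias, target_var + noise_var)
```

For instance, `error_term_bits(2.0, 1.4)` and `error_term_bits(200.0, 140.0)` coincide up to rounding, which illustrates the scale invariance, while a nonzero `bias` only adds a positive contribution.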
Note that the error function does not depend on the mixed covariances u · C ρ v and V M 12 . Note also that, if we select a possible approximation M, then the error function S(ρ, M) decreases for states ρ with increasing sharp variances Var (Q u,ρ ) and Var (P v,ρ ): the loss of information decreases when the sharp distributions make the approximation error negligible. Finally, note that This means that, apart from the term ∆(ρ, M) due to the bias, our error function S(ρ, M) only depends on the two ratios "variance of the approximating distribution over variance of the target distribution". Thus, in order to optimize the error function, one has to optimize these two ratios. We use formula (57) to firstly give a state dependent MUR, and then, following the scheme of [2], a state independent MUR. A lower bound for the error function can be found by minimizing it over all possible approximate joint measurements of Q u and P v . First of all, let us remark that this minimization makes sense because we consider only (u, v)-covariant bi-observables: if we minimized over all possible bi-observables, then the minimum would be trivially zero for every given preparation ρ. Indeed, the trivial bi-observable M(A × When minimizing the error function over all (u, v)-covariant bi-observables, both the minimum and the best measurement attaining it are state dependent. When α = ±π/2, the two target observables are compatible, so that their joint measurement trivially exists (see Example 3) and we get inf M∈Cu,v S(ρ, M) = 0. In order to have explicit results for any angle α, we consider only the Gaussian case.
Theorem 15 (State dependent MUR, scalar observables). For every ρ ∈ G and M ∈ C G u,v , S(ρ, M) ≥ c ρ (α), where the lower bound is c ρ (α) = s(z ρ ) log e = log(1 + z ρ ) − (z ρ /(1 + z ρ )) log e, with z ρ = ħ |cos α| / (2 [Var (Q u,ρ ) Var (P v,ρ )]^{1/2} ). The lower bound is tight and the optimal measurement is unique: c ρ (α) = S(ρ, M * ), for a unique M * ∈ C G u,v ; such a Gaussian (u, v)-covariant bi-observable is characterized by a M * = 0, b M * = 0, V M * 12 = 0, V M * 11 = z ρ Var (Q u,ρ ), V M * 22 = z ρ Var (P v,ρ ).
Proof. As already discussed, the case cos α = 0 is trivial. If cos α ≠ 0, we have to minimize the error function (57) over M. First of all we can eliminate the positive term ∆(ρ, M) by taking an unbiased measurement. Then, since s is an increasing function, by the second condition in (55) we can also take V M 11 V M 22 = (ħ²/4)(cos α)². This implies V M * 12 = 0 by (52). In this case the error function (57) reduces to (log e/2) [s(x) + s(z ρ ²/x)], where x = V M 11 /Var (Q u,ρ ), with z ρ given by (61); by the first of (55), we have z ρ ∈ (0, 1]. Now, we can minimize the error function with respect to x by studying its first derivative: d/dx [s(x) + s(z ρ ²/x)] = x/(1 + x)² − z ρ ⁴/(x (x + z ρ ²)²). Having x > 0, we immediately get that x = z ρ gives the unique minimum. Thus S(ρ, M) ≥ S(ρ, M * ) = s(z ρ ) log e = log(1 + z ρ ) − (z ρ /(1 + z ρ )) log e, and V M * 11 = z ρ Var (Q u,ρ ), V M * 22 = z ρ Var (P v,ρ ), which concludes the proof.
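The minimization step in the proof can be checked numerically. The sketch below is a brute-force illustration, not the authors' argument: it assumes that, for an unbiased measurement saturating the constraint, the error function in nats is (1/2)[s(x) + s(z²/x)] with s(x) = ln(1 + x) − x/(1 + x), and it searches a logarithmic grid for the minimizing x. The grid bounds and the function names are hypothetical choices.

```python
import math

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x)."""
    return math.log1p(x) - x / (1.0 + x)

def half_sum(x, z):
    """Unbiased error in nats for variance ratios x and y = z**2 / x."""
    return 0.5 * (s(x) + s(z * z / x))

def grid_minimizer(z, lo=1e-3, hi=10.0, steps=20001):
    """Log-spaced brute-force search for the x > 0 minimizing half_sum(x, z)."""
    best_x, best_val = None, float("inf")
    for i in range(steps):
        x = lo * (hi / lo) ** (i / (steps - 1))
        val = half_sum(x, z)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Calling `grid_minimizer(0.4)` returns a minimizer close to 0.4 and a minimum close to `s(0.4)`, in agreement with the statement that x = z ρ is the unique minimum and that the minimal loss is s(z ρ ) in nats, that is, s(z ρ ) log e.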
Remark 3. The minimum information loss c ρ (α) depends on both the preparation ρ and the angle α. When α ≠ ±π/2, that is, when the target observables are not compatible, c ρ (α) is strictly greater than zero. This is a peculiar quantum effect: given ρ, u and v, there is no Gaussian approximate joint measurement of Q u and P v that can approximate them arbitrarily well. On the other hand, in the limit α → ±π/2, the lower bound c ρ (α) goes to zero; so, the case of commuting target observables is approached with continuity.
Remark 4. The lower bound c ρ (α) goes to zero also in the classical limit ħ → 0. This holds for every angle α and every Gaussian state ρ.
Remark 5. Another case in which c ρ (α) → 0 is the limit of large uncertainty states, that is, if we let the product Var (Q u,ρ ) Var (P v,ρ ) → ∞: our entropic MUR disappears because, roughly speaking, the variance of (at least) one of the two target observables goes to infinity, its relative entropy vanishes by itself, and an optimal covariant bi-observable M * has to take care of (at most) only the other target observable.
Remark 6. Actually, something similar to the previous remark happens also at the macroscopic limit, and does not require the measuring instrument to be an optimal one; indeed, unbiasedness is enough in this case. This happens because the error function S(ρ, M) quantifies a relative error; even if the measurement approximation M is fixed, such an error can be reduced by suitably changing the preparation ρ. Indeed, if we consider the position and momentum of a macroscopic particle, for instance the center of mass of many particles, it is natural that its state has much larger position and momentum uncertainties than the intrinsic uncertainties of the measuring instrument; that is, V M 11 /Var (Q u,ρ ) ≪ 1 and V M 22 /Var (P v,ρ ) ≪ 1, implying that the error function (57) is negligible. In practice, this is a classical case: the preparation has large position and momentum uncertainties and the measuring instrument is relatively good. In this situation we do not see the difference between the joint measurement of position and momentum and their separate sharp observations.
Remark 7. The optimal approximating joint measurement M * ∈ C G u,v is unique; by (62) it depends on the preparation ρ one is considering, as well as on the directions u and v. A realization of M * is the measuring procedure of Example 2.
Remark 9. For cos α ≠ 0, we get inf M∈C G u,v S(ρ, M) = s(z ρ ) log e, where z ρ is defined by (61). As z ρ ranges in the interval (0, 1], the quantity inf M∈C G u,v S(ρ, M) takes all the values in the interval (0, 1 − (log e)/2], so that sup ρ∈G inf M∈C G u,v S(ρ, M) = 1 − (log e)/2. In order to get this result, we needed cos α ≠ 0; however, the final result does not depend on α. Therefore, in the sup ρ inf M -approach of (63), the continuity from quantum to classical is lost.
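To make the range of values in Remark 9 concrete, here is a short Python sketch that evaluates s(z) log e in bits as z runs over (0, 1]. It assumes s(x) = ln(1 + x) − x/(1 + x) and base-2 logarithms, and it takes z as a plain input rather than computing it from a state; the function name is illustrative.

```python
import math

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x)."""
    return math.log1p(x) - x / (1.0 + x)

def state_dependent_bound_bits(z):
    """Value of s(z) * log2(e) for a parameter z in (0, 1]."""
    if not 0.0 < z <= 1.0:
        raise ValueError("z must lie in the interval (0, 1]")
    return s(z) * math.log2(math.e)

# Largest value, reached at z = 1: log2(2) - log2(e) / 2 = 1 - log2(e) / 2.
supremum_bits = state_dependent_bound_bits(1.0)
```

The bound increases with z and vanishes as z tends to 0, so the supremum over z is reached at z = 1, whatever the angle that entered the definition of z.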
Now we want to find an entropic quantification of the error made in observing M ∈ C u,v as an approximation of Q u and P v in an arbitrary state ρ. The procedure of [2], already suggested in [25, Sect. VI.C] for a different error function, is to consider the worst case by maximizing the error function over all the states. However, in the continuous framework this is not possible for the error function (56); indeed, from (57) we get sup ρ∈G S(ρ, M) = +∞ even if we restrict to unbiased covariant bi-observables. Anyway, the reason for S(ρ, M) to diverge is classical: it depends only on the continuous nature of Q u and P v , without any relation to their (quantum) incompatibility. Indeed, as we noted in Section 3.1, if an instrument measuring a random variable X ∼ N (a; α 2 ) adds an independent noise ν ∼ N (b; β 2 ), thus producing an output X + ν ∼ N (a + b; α 2 + β 2 ), then the relative entropy S(X ‖ X + ν) diverges for α 2 → 0; this is what happens if we fix the noise and we allow for arbitrarily peaked preparations. Thus, the sum S(Q u,ρ ‖ M 1,ρ ) + S(P v,ρ ‖ M 2,ρ ) diverges if, for fixed M, we let Var(Q u,ρ ) or Var(P v,ρ ) go to 0.
The difference between the classical and quantum frameworks emerges if we bound from below the variances of the sharp position and momentum observables. Indeed, in the classical framework we have inf b,β 2 sup α 2 ≥ǫ S(X ‖ X + ν) = 0 for every ǫ > 0; the same holds for the sum of two relative entropies if no relation exists between the two noises. On the contrary, in the quantum framework the entropic MURs appear due to the relation between the position and momentum errors occurring in any approximate joint measurement.
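The two classical facts just described can be illustrated with a small Python sketch. It assumes an unbiased Gaussian noise, so that S(X ‖ X + ν) equals (log e/2) s(β²/α²) with s(x) = ln(1 + x) − x/(1 + x), and base-2 logarithms; the names and the sample values are hypothetical.

```python
import math

LOG2_E = math.log2(math.e)

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x)."""
    return math.log1p(x) - x / (1.0 + x)

def unbiased_loss_bits(target_var, noise_var):
    """S(X || X + nu) in bits for X ~ N(a; target_var) and centred noise of variance noise_var."""
    return 0.5 * s(noise_var / target_var) * LOG2_E

def worst_case_bits(eps, noise_var):
    """Supremum over target_var >= eps; the loss decreases in target_var, so it is taken at eps."""
    return unbiased_loss_bits(eps, noise_var)

# Fixed noise, ever sharper preparations: the loss grows without bound.
sharpening = [unbiased_loss_bits(10.0 ** -k, 1.0) for k in range(7)]
# Fixed threshold, ever better classical instrument: the worst case tends to 0.
improving = [worst_case_bits(0.01, 10.0 ** -k) for k in range(2, 9)]
```

The list `sharpening` is increasing and unbounded as the target variance shrinks, whereas `improving` decreases to zero: with a threshold on the target variance and no constraint linking the noises, a classical instrument can make the worst-case loss arbitrarily small.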
In order to avoid that S(ρ, M) → +∞ due to merely classical effects, we thus introduce the following subset of the Gaussian states: and we evaluate the error made in approximating Q u and P v with the marginals of a (u, v)-covariant bi-observable by maximizing the error function over all these states.
For Gaussian M, depending on the choice of the thresholds ǫ 1 and ǫ 2 , the divergence D G ǫ (Q u , P v ‖ M) can be easily computed or at least bounded.
Proof. By Proposition 4, maximizing the error function over the states in G u,v ǫ is the same as maximizing (57) with (54) over the parameters Var (Q u,ρ ) and Var (P v,ρ ) satisfying (55) and (64). If ǫ 1 ǫ 2 ≥ (ħ²/4)(cos α)², the thresholds themselves satisfy the Heisenberg uncertainty relation, and so equality (66) follows from the expression (57) and the fact that the functions s(x), s(y), ∆(ρ, M) are decreasing in Var (Q u,ρ ) and Var (P v,ρ ).

Entropic incompatibility degree of Q u and P v
The last step is to optimize the state independent ǫ-entropic divergence (65) over all the approximate joint measurements of Q u and P v . This is done in the next definition.
Again, depending on the choice of the thresholds ǫ 1 and ǫ 2 , the entropic incompatibility degree c G inc (Q u , P v ; ǫ) can be easily computed or at least bounded. (i) If ǫ 1 ǫ 2 ≥ (ħ²/4)(cos α)², the incompatibility degree c G inc (Q u , P v ; ǫ) is given by The infimum in (68) is attained and the optimal measurement is unique, in the sense that for a unique M ǫ ∈ C G u,v ; such a bi-observable is characterized by The latter bound is where the state ρ ǫ (u, v) is defined in item (ii) of Theorem 16 and M ǫ is the bi-observable in C G u,v such that Proof. (i) In the case ǫ 1 ǫ 2 ≥ (ħ²/4)(cos α)², due to (66), the proof is the same as that of Theorem 15 with the replacements Var (Q u,ρ ) → ǫ 1 and Var (P v,ρ ) → ǫ 2 .
Remark 11 (State independent MUR, scalar observables). By means of the above results, we can formulate a state independent entropic MUR for the position Q u and the momentum P v in the following way. Chosen two positive thresholds ǫ 1 and ǫ 2 , there exists a preparation ρ ǫ (u, v) ∈ G u,v ǫ (introduced in Theorem 16) such that, for all Gaussian approximate joint measurements M of Q u and P v , we have The inequality follows by (66) and (69)  What is relevant is that, for every approximate joint measurement M, the total information loss S(ρ, M) does exceed the lower bound (75) even if the set of states G u,v ǫ forbids preparations ρ with too peaked target distributions. Indeed, without the thresholds ǫ 1 , ǫ 2 , it would be trivial to exceed the lower bound (75), as we noted in Section 6.1.2.
We also remark that, chosen ǫ 1 and ǫ 2 , we found a single state ρ ǫ (u, v) in G u,v ǫ that satisfies (75) for every M, so that ρ ǫ (u, v) is a 'bad' state for all Gaussian approximate joint measurements of position and momentum.
When ǫ 1 ǫ 2 ≥ (ħ²/4)(cos α)², the optimal approximate joint measurement M ǫ is unique in the class of Gaussian (u, v)-covariant bi-observables; it depends only on the class of preparations G u,v ǫ : it is the best measurement for the worst choice of the preparation in the class G u,v ǫ .
Remark 12. The entropic incompatibility degree c G inc (Q u , P v ; ǫ) is strictly positive for cos α ≠ 0 (incompatible target observables) and it goes to zero in the limits α → ±π/2 (compatible observables), ħ → 0 (classical limit), and ǫ 1 ǫ 2 → ∞ (large uncertainty states).
Remark 13. The scale invariance of the relative entropy extends to the error function S(ρ, M), hence to the divergence D G ǫ (Q u , P v ‖ M) and the entropic incompatibility degree c G inc (Q u , P v ; ǫ), as well as the entropic MUR (75).

Vector observables
Now the target observables are Q and P given in (3), with pvm's Q and P; the approximating bi-observables are the covariant phase-space observables C of Definition 5. Each bi-observable M ∈ C is of the form M = M σ for some σ ∈ S, where M σ is given by (43). C G is the subset of the Gaussian bi-observables in C, and M σ ∈ C G if and only if σ is a Gaussian state.
We proceed to define the analogues of the scalar quantities introduced in Sections 6.1.1, 6.1.2, 6.1.3. In order to do it, in the next proposition we recall some known results on matrices.

Error function
Definition 12. Given the preparation ρ ∈ S and the covariant phase-space observable M σ , with σ ∈ S, the error function for the vector case is the sum of the two relative entropies: As in the scalar case, the error function is scale invariant, it quantifies a relative error, and we always have S(ρ, M σ ) > 0 because position and momentum are incompatible. Indeed, since the marginals of a bi-observable M σ ∈ C turn out to be convolutions of the respective sharp observables Q and P with some probability densities on R n , Q ρ ≠ M σ 1,ρ and P ρ ≠ M σ 2,ρ for all states ρ; this is an easy consequence, for instance, of [15,Problem 26.1,p. 362]. In the Gaussian case the error function can be explicitly computed.
Proposition 19 (Error function for the vector Gaussian case). For ρ, σ ∈ G, the error function has the two equivalent expressions: where the function s is defined in (58), and Proof. First of all, recall that A direct application of (34) yields We can transform this equation by using ln det (½ + E ρ,σ ) = Tr {ln (½ + E ρ,σ )} , This gives In the same way a similar expression is obtained for S(P ρ ‖ M σ 2,ρ ) and (77a) is proved. On the other hand, by using and the analogous expressions involving B ρ and R ρ,σ , one gets (77b).

State dependent lower bound
In principle, a state dependent lower bound for the error function could be found by analogy with Theorem 15, by taking again the infimum over all joint covariant measurements, that is inf σ S(ρ, M σ ). By considering only Gaussian states ρ and measurements M σ , from (18), (77a), (78a) the infimum over σ ∈ G can be reduced to an infimum over the matrices A σ : The above equality follows since the monotonicity of s (Proposition 18) implies that the trace term in (77a) attains its minimum when B σ = (ħ²/4) (A σ ) −1 . However, it remains an open problem to explicitly compute the infimum over the matrices A σ when the preparation ρ is arbitrary.
Nevertheless, the computations can be done at least for a preparation ρ * of minimum uncertainty (Proposition 6). Indeed, by (22) we get Now we can diagonalize E ρ,σ and minimize over its eigenvalues; since s(x) + s(x −1 ) attains its minimum value at x = 1, this procedure gives E ρ,σ = ½. So, by denoting by σ * the state giving the minimum, we For an arbitrary ρ ∈ G, we can use the last formula to deduce an upper bound for inf σ∈G S(ρ, M σ ). Indeed, if ρ * is a minimum uncertainty state with A ρ * = A ρ , then B ρ ≥ (ħ²/4) (A ρ ) −1 = B ρ * by (19), and, using again the state σ * of (79), we find The second inequality in the last formula follows from (77b), (78b) and the monotonicity of s (Prop. 18).
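The eigenvalue minimization used above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the paper's formulas: it assumes that, for a minimum uncertainty preparation, the trace term reduces to a sum of s(λ) + s(1/λ) over the eigenvalues λ of E ρ,σ , that the prefactor is (log e)/2 as in the scalar case, and that logarithms are in base 2. All names are hypothetical.

```python
import math

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x)."""
    return math.log1p(x) - x / (1.0 + x)

def trace_term(eigenvalues):
    """Sum over the eigenvalues lam of s(lam) + s(1 / lam)."""
    return sum(s(lam) + s(1.0 / lam) for lam in eigenvalues)

def minimal_loss_bits(n):
    """(log2(e) / 2) times the trace term evaluated at E = identity in dimension n."""
    return 0.5 * math.log2(math.e) * trace_term([1.0] * n)
```

Under these assumptions, moving any eigenvalue away from 1 increases `trace_term`, and `minimal_loss_bits(n)` equals n times `minimal_loss_bits(1)`, which is consistent with a bound that grows linearly with the dimension n.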

Entropic divergence of Q, P from M σ
In order to define a state independent measure of the error made in regarding the marginals of M σ as approximations of Q and P, we can proceed along the lines of the scalar case in Section 6.1.2. To this end, we introduce the following vector analogue of the Gaussian states defined in (64): In the vector case, Definition 10 then reads as follows.
As in the scalar case, when M σ is Gaussian, depending on the choice of the product ǫ 1 ǫ 2 , we can compute the divergence D G ǫ (Q, P ‖ M σ ) or at least bound it from below. Theorem 20. Let the bi-observable M σ ∈ C G be fixed.
, the divergence D G ǫ (Q, P ‖ M σ ) is given by where ρ ǫ is any Gaussian state with A ρǫ = ǫ 1 ½ and B ρǫ = ǫ 2 ½.
Remark 15. For n = 1, the vector lower bound in (92) reduces to the scalar lower bound found in (75) for two parallel directions u and v; for n ≥ 1, the bound linearly grows with n.
By (58) the function s is increasing and in a neighborhood of zero it behaves as s(x) ≃ x 2 /2; in the present case δ 1 /ǫ 1 ≪ 1 and δ 2 /ǫ 2 ≪ 1 and, so, we have that the error function is negligible. This is practically a 'classical' case: the preparation has 'large' position and momentum uncertainties and the measuring instrument is 'relatively good'. In this situation we do not see the difference between the joint measurement of position and momentum and their separate sharp distributions. Of course the bound (92) continues to hold, but it is also negligible since ǫ 1 ǫ 2 ≫ ħ²/4.
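The small-argument behaviour of s quoted above can be checked directly. The sketch below assumes s(x) = ln(1 + x) − x/(1 + x) and compares it with x²/2 for a few small ratios; the sample points are arbitrary.

```python
import math

def s(x):
    """s(x) = ln(1 + x) - x / (1 + x)."""
    return math.log1p(x) - x / (1.0 + x)

def quadratic_approx(x):
    """Leading term of s near zero."""
    return 0.5 * x * x

# Ratio s(x) / (x**2 / 2) at a few small arguments; it approaches 1 from below.
ratios = {x: s(x) / quadratic_approx(x) for x in (1e-1, 1e-2, 1e-3)}
```

The next term of the expansion is −2x³/3, so the ratio behaves like 1 − 4x/3 for small x; a noise-to-target variance ratio of 10⁻³ therefore gives a loss of order 10⁻⁷, which is the sense in which the error function is negligible here.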
Remark 18. The scale invariance of the relative entropy extends also in the vector case to the error function S(ρ, M σ ), the divergence D G ǫ (Q, P ‖ M σ ) and the entropic incompatibility degree c G inc (Q, P; ǫ), as well as the entropic MUR (92). Indeed, let us consider the dimensionless versions of position and momentum (35) and their associated projection valued measures Q̃, P̃ introduced in Section 4. Accordingly, we rescale the joint measurement M σ of (43) in the same way, obtaining the POVM M̃ σ (B) = ∫ B M̃ σ (x̃, p̃) dx̃ dp̃. Here, both the vector variables x̃ and p̃, as well as the components of the Borel set B, are dimensionless. By the scale invariance of the relative entropy, the error function takes the same value as in the dimensioned case: S(Q̃ ρ ‖ M̃ σ 1,ρ ) + S(P̃ ρ ‖ M̃ σ 2,ρ ) = S(Q ρ ‖ M σ 1,ρ ) + S(P ρ ‖ M σ 2,ρ ).

Conclusions
We have extended the relative entropy formulation of MURs given in [2] from the case of discrete incompatible observables to a particular instance of continuous target observables, namely the position and momentum vectors, or two components of them along two possibly non parallel directions. The entropic MURs we found share the nice property of being scale invariant and well-behaved in the classical and macroscopic limits. Moreover, in the scalar case, when the angle spanned by the position and momentum components goes to ±π/2, the entropic bound correctly reflects their increasing compatibility by approaching zero with continuity. Although our results are limited to the case of Gaussian preparation states and covariant Gaussian approximate joint measurements, we conjecture that the bounds we found still hold for arbitrary states and general (not necessarily covariant or Gaussian) bi-observables. Let us see with some more detail how this should work in the case when the target observables are the vectors Q and P .
The most general procedure should be to consider the error function S(Q ρ ‖ M 1,ρ ) + S(P ρ ‖ M 2,ρ ) for an arbitrary POVM M on R n × R n and any state ρ ∈ S. First of all, we need states for which neither the position nor the momentum dispersion are too small; the obvious generalization of the test states (81) is Then, the most general definitions of the entropic divergence and incompatibility degree are: It may happen that Q ρ is not absolutely continuous with respect to M 1,ρ , or P ρ with respect to M 2,ρ ; in this case, the error function and the entropic divergence take the value +∞ by definition. So, we can restrict to bi-observables that are (weakly) absolutely continuous with respect to the Lebesgue measure. However, the true difficulty is that, even with this assumption, here we are not able to estimate (94), hence (95). It could be that the symmetrization techniques used in [25,65] can be extended to the present setting, and one can reduce the evaluation of the entropic incompatibility index to optimizing over all covariant bi-observables. Indeed, in the present paper we a priori selected only covariant approximating measurements; we would like to understand if, among all approximating measurements, the relative entropy approach selects covariant bi-observables by itself. However, even if M is covariant, there remains the problem that we do not know how to evaluate (94) if ρ and M are not Gaussian. It is reasonable to expect that some continuity and convexity arguments should apply, and the bounds in Theorem 21 could be extended to the general case by taking dense convex combinations. Also the techniques used for the PURs in [12,14] could be of help in order to extend what we did with Gaussian states to arbitrary states. This leads us to conjecture: c inc (Q, P; ǫ) = c G inc (Q, P; ǫ).
Conjecture (96) is also supported by the fact that the uniqueness of the optimal approximating bi-observable in Theorem 21.(i) is reminiscent of what happens in the discrete case of two Fourier conjugated mutually unbiased bases (MUBs); indeed, in the latter case, the optimal bi-observable is actually unique among all the bi-observables, not only the covariant ones [2,Theor. 5]. Similar considerations obviously apply also to the case of scalar target observables. We leave a deeper investigation of equality (96) to future work. As a final consideration, one could be interested in finding error/disturbance bounds involving sequential measurements of position and momentum, rather than considering all their possible approximate joint measurements. As sequential measurements are a proper subset of the set of all the bi-observables, optimizing only over them should lead to bounds that are greater than or equal to c inc . This is the reason for which in [2] an error/disturbance entropic bound, denoted by c ed and distinct from c inc , was introduced. However, it was also proved that the equality c inc = c ed holds when one of the target observables is discrete and sharp. Now, in the present paper, only sharp target observables are involved; although the argument of [2] cannot be extended to the continuous setting, the optimal approximating joint observables we found in Theorems 17.(i) and 21.(i) actually are sequential measurements. Indeed, the optimal bi-observable in Theorem 17.(i) is one of the POVMs described in Examples 2 and 3 (see (74)); all these bi-observables have a (trivial) sequential implementation in terms of an unsharp measurement of Q u followed by sharp P v . On the other hand, in the vector case, it was shown in [29, Corollary 1] that all covariant phase-space observables can be obtained as a sequential measurement of an unsharp version of the position Q followed by the sharp measurement of the momentum P.
Therefore, c inc = c ed also for target position and momentum observables, in both the scalar and vector case.