Correlation Distance and Bounds for Mutual Information

The correlation distance quantifies the statistical independence of two classical or quantum systems, via the distance from their joint state to the product of the marginal states. Tight lower bounds are given for the mutual information between pairs of two-valued classical variables and quantum qubits, in terms of the corresponding classical and quantum correlation distances. These bounds are stronger than the Pinsker inequality (and refinements thereof) for relative entropy. The classical lower bound may be used to quantify properties of statistical models that violate Bell inequalities. Entangled qubits can have a lower mutual information than can any two-valued classical variables having the same correlation distance. The qubit correlation distance also provides a direct entanglement criterion, related to the spin covariance matrix. Connections of results with classically-correlated quantum states are briefly discussed.


Introduction
The relative entropy between two probability distributions has many applications in classical and quantum information theory. A number of these applications, including the conditional limit theorem [1], and secure random number generation and communication [2,3], make use of lower bounds on the relative entropy in terms of a suitable distance between the two distributions. The best known such bound is the so-called Pinsker inequality [4]
$$H(P\|Q) := \sum_j P(j)\,[\log P(j) - \log Q(j)] \;\geq\; \tfrac{1}{2}\, D(P,Q)^2 \log e, \tag{1}$$
where $D(P,Q) := \|P-Q\|_1 = \sum_j |P(j)-Q(j)|$ is the variational or L1 distance between distributions P and Q. Note that the choice of logarithm base is left open throughout this paper, corresponding to a choice of units.
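As a quick numerical illustration of the Pinsker inequality (1), the following minimal sketch evaluates both sides for a pair of distributions. The distributions `P` and `Q` are hypothetical example values, and natural logarithms are used so that log e = 1.

```python
import math

def relative_entropy(p, q):
    """H(P||Q) in nats; terms with P(j) = 0 contribute zero."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def variational_distance(p, q):
    """D(P, Q) = sum_j |P(j) - Q(j)|."""
    return sum(abs(pj - qj) for pj, qj in zip(p, q))

# Hypothetical example distributions (not taken from the paper).
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]

H = relative_entropy(P, Q)
D = variational_distance(P, Q)
pinsker_bound = 0.5 * D ** 2  # right-hand side of Eq. (1) with log e = 1

print(f"H(P||Q) = {H:.4f}, D = {D:.4f}, bound = {pinsker_bound:.4f}")
print("Pinsker inequality holds:", H >= pinsker_bound)
```

Changing the logarithm base rescales both sides by the same factor, so the comparison does not depend on the choice of units.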
There are a number of such bounds [4], all of which easily generalise to the case of quantum probabilities [5,6]. However, in a number of applications of the Pinsker inequality and its quantum analog, a lower bound is in fact only needed for the special case that the relative entropy quantifies the mutual information between two systems. Such applications include, for example, secure random number generation and coding [2,3] (both classical and quantum), and quantum de Finetti theorems [7]. Since mutual information is a special case of relative entropy, it follows that it may be possible to find strictly stronger lower bounds for mutual information.
Surprisingly little attention appears to have been paid to this possibility of better lower bounds (although upper bounds for mutual information have been investigated [8]). The results of preliminary investigations are given here, with explicit tight lower bounds being obtained for pairs of two-valued classical random variables, and for pairs of quantum qubits with maximally-mixed reduced states.
In the context of mutual information, the corresponding variational distance reduces to the distance between the joint state of the systems and the product of their marginal states, referred to here as the 'correlation distance'. It is shown that both the classical and quantum correlation distances are relevant to quantifying properties of quantum entanglement: the former with respect to the classical resources required to simulate entanglement, and the latter as providing a criterion for qubit entanglement. In the quantum case, it is also shown that the minimum value of the mutual information can only be achieved by entangled qubits if the correlation distance is more than ≈ 0.72654.
The main results are given in the following section. Lower bounds on classical and quantum mutual information for two-level systems are derived in sections 3 and 5, and an entanglement criterion for qubits in terms of the quantum correlation distance is obtained in section 4. Connections with classically-correlated quantum states are briefly discussed in section 6, and conclusions are presented in section 7.

Definitions and Main Results
For two classical random variables A and B, with joint probability distribution $P_{AB}(a,b)$ and marginal distributions $P_A(a)$ and $P_B(b)$, the Shannon mutual information and the classical correlation distance are defined respectively by
$$I(P_{AB}) := H(P_{AB}\|P_A P_B) = H(P_A) + H(P_B) - H(P_{AB}),$$
$$C(P_{AB}) := \|P_{AB} - P_A P_B\|_1 = \sum_{a,b} |P_{AB}(a,b) - P_A(a)P_B(b)|,$$
where $H(P) := -\sum_j P(j)\log P(j)$ denotes the Shannon entropy of distribution P. The term 'correlation distance' is used for $C(P_{AB})$, since it inherits all the properties of a distance from the more general variational distance, and clearly vanishes for uncorrelated A and B.
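The two classical quantities just defined can be evaluated directly from a joint probability table. The sketch below is illustrative only: the function names and the example table `P_AB` are assumptions introduced here, and entropies are computed in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Row and column sums of a joint probability table."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return p_a, p_b

def mutual_information(joint):
    """I(P_AB) = H(P_A) + H(P_B) - H(P_AB)."""
    p_a, p_b = marginals(joint)
    flat = [p for row in joint for p in row]
    return entropy(p_a) + entropy(p_b) - entropy(flat)

def correlation_distance(joint):
    """C(P_AB) = sum_{a,b} |P_AB(a,b) - P_A(a) P_B(b)|."""
    p_a, p_b = marginals(joint)
    return sum(abs(joint[a][b] - p_a[a] * p_b[b])
               for a in range(len(p_a)) for b in range(len(p_b)))

# Hypothetical joint distribution of two two-valued variables.
P_AB = [[0.4, 0.1],
        [0.1, 0.4]]

I = mutual_information(P_AB)
C = correlation_distance(P_AB)
print(f"I = {I:.6f} bits, C = {C:.6f}")
```

For a product distribution both quantities vanish, which gives a simple sanity check on any implementation.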
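For two quantum systems A and B described by density operator $\rho_{AB}$ and reduced density operators $\rho_A$ and $\rho_B$, the corresponding quantum mutual information and quantum correlation distance are analogously defined by
$$I(\rho_{AB}) := S(\rho_{AB}\|\rho_A\otimes\rho_B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}),$$
$$C(\rho_{AB}) := \|\rho_{AB} - \rho_A\otimes\rho_B\|_1 = \mathrm{tr}\,|\rho_{AB} - \rho_A\otimes\rho_B|,$$
where $S(\rho) := -\mathrm{tr}[\rho\log\rho]$ denotes the von Neumann entropy of density operator $\rho$.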
In both the classical and quantum cases, one has the lower bound
$$I \geq \tfrac{1}{2}\, C^2 \log e \tag{2}$$
for mutual information, as a direct consequence of the Pinsker inequality (1) and its quantum analog [4,5,6]. However, better bounds for mutual information can be obtained, which are stronger than any general inequality for relative entropy and variational distance. For example, for two-valued classical random variables A and B one has the tight lower bound
$$I(P_{AB}) \geq \log 2 - H\!\left(\frac{1+C(P_{AB})}{2}, \frac{1-C(P_{AB})}{2}\right) \tag{3}$$
for classical mutual information. This inequality has been previously stated without proof in Ref. [9], where it was used to bound the shared information required to classically simulate entangled quantum systems. It is proved in section 3 below.
In contrast to Pinsker-type inequalities such as Eq. (2), the quantum generalisation of Eq. (3) is not straightforward. In particular, note for a two-qubit system that one cannot simply replace $P_{AB}$ by $\rho_{AB}$ in Eq. (3), as the right hand side would be undefined for $C(\rho_{AB}) > 1$, which can occur if the qubits are entangled. Indeed, as shown in section 4, $C(\rho_{AB}) > 1$ is a sufficient condition for the entanglement of two qubits, as is the stronger condition
$$C(\rho_{AB}) > 2\left[(1-\mathrm{tr}[\rho_A^2])(1-\mathrm{tr}[\rho_B^2])\right]^{1/2}. \tag{4}$$
An explicit expression for the quantum correlation distance for two qubits, in terms of the spin covariance matrix, is also given in section 4.
It is shown in section 5 that the quantum equivalent of Eq. (3), i.e., a tight lower bound for the quantum mutual information shared by two qubits, is
$$I(\rho_{AB}) \geq \begin{cases} \log 2 - H\!\left(\dfrac{1+C(\rho_{AB})}{2}, \dfrac{1-C(\rho_{AB})}{2}\right), & C(\rho_{AB}) \leq C_0, \\[2ex] \log 4 - H\!\left(\dfrac14+\dfrac{C(\rho_{AB})}{2}, \dfrac14-\dfrac{C(\rho_{AB})}{6}, \dfrac14-\dfrac{C(\rho_{AB})}{6}, \dfrac14-\dfrac{C(\rho_{AB})}{6}\right), & C(\rho_{AB}) > C_0, \end{cases} \tag{5}$$
when the reduced density operators are maximally mixed, where $C_0 \approx 0.72654$. For $C(\rho_{AB}) > C_0$ this lower bound can only be achieved by entangled states, and cannot be achieved by any classical distribution $P_{AB}$ having the same correlation distance. It is also shown that, for $C(\rho_{AB}) > C_0$, the bound remains tight if only one of the reduced states is maximally mixed. Support is given for the conjecture that the bound in Eq. (5) in fact holds for all two-qubit states.
In section 6 the natural role of 'classically-correlated' quantum states, in comparing classical and quantum correlations, is briefly discussed. Such states have the general form $\rho_{AB} = \sum_{j,k} P(j,k)\,|j,k\rangle\langle j,k|$ [10], where $P(j,k)$ is a classical joint probability distribution and $\{|j\rangle\}$ and $\{|k\rangle\}$ are orthonormal basis sets for the two quantum systems. The lower bound in Eq. (5) can be saturated by a classically-correlated state if and only if $C \leq C_0$.
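Now, since $C(P_{AB}) = |r|$ and the entropy term is symmetric under $r \to -r$, Eq. (3) is equivalent to
$$f(r) := I(P_{AB}) - \log 2 + H\!\left(\frac{1+r}{2}, \frac{1-r}{2}\right) \geq 0. \tag{7}$$
It is easy to check that this inequality is always saturated for the case of maximally-random marginals, i.e., when $x = y = 0$. In all other cases, the inequality may be proved by showing that $f(r)$ has a unique global minimum value of 0 at $r = 0$. In particular, note first that $f(0) = 0$ (one has $P_{AB} = P_A P_B$ in this case, so that the mutual information vanishes). Further, using $P_{AB}(a,b) = [(1+ax)(1+by) + abr]/4$, one easily calculates that, using logarithm base e for convenience,
$$f'(r) = \frac14 \ln \frac{[(1+x)(1+y)+r]\,[(1-x)(1-y)+r]\,(1-r)^2}{[(1+x)(1-y)-r]\,[(1-x)(1+y)-r]\,(1+r)^2}.$$
Hence, $f'(r) = 0$ if and only if the argument of the logarithm is unity, i.e., if and only if
$$[(1+x)(1+y)+r]\,[(1-x)(1-y)+r]\,(1-r)^2 = [(1+x)(1-y)-r]\,[(1-x)(1+y)-r]\,(1+r)^2.$$
Expanding and simplifying reduces this to $r\,[x^2 + y^2 - x^2y^2 - 2xyr] = 0$, which yields two possible solutions: $r = 0$, or $r = (x^2 + y^2 - x^2y^2)/(2xy)$. However, in the latter case one has
$$|r| = \frac{2\alpha - \gamma^2}{2\gamma} \geq \frac{2\gamma - \gamma^2}{2\gamma} = 1 - \frac{\gamma}{2},$$
where $\alpha$ and $\gamma$ denote the arithmetic mean and geometric mean, respectively, of $x^2$ and $y^2$ (hence $\alpha \geq \gamma$). This is clearly inconsistent with the positivity condition (6) (unless $x = y = 0$, which trivially saturates Eq. (7) for all r as noted above). The only remaining solution to $f'(r) = 0$ is then $r = 0$, implying $f(r)$ has a unique maximum or minimum value at $r = 0$. Finally, it is easily checked that it is a minimum, since
$$f''(0) = \frac{1}{(1-x^2)(1-y^2)} - 1 \geq 0$$
(with equality only for the trivially-saturating case $x = y = 0$). Thus, $f(r) \geq f(0) = 0$ as required.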
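The proof above can be cross-checked numerically. The following sketch samples the parameters x, y and r at random, keeps only those for which all four probabilities $[(1+ax)(1+by)+abr]/4$ are non-negative, and records the smallest gap between the mutual information and the lower bound. It assumes that Eq. (3) has the form $I \geq \log 2 - H((1+C)/2, (1-C)/2)$ with $C = |r|$; the helper names, the random seed and the sample size are illustrative choices, and entropies are in bits.

```python
import math
import random

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def joint(x, y, r):
    """P_AB(a, b) = [(1 + a x)(1 + b y) + a b r] / 4 for a, b in {+1, -1}."""
    return {(a, b): ((1 + a * x) * (1 + b * y) + a * b * r) / 4
            for a in (1, -1) for b in (1, -1)}

def gap(x, y, r):
    """Mutual information minus the assumed form of the bound in Eq. (3)."""
    p = joint(x, y, r)
    p_a = [(1 + x) / 2, (1 - x) / 2]   # marginal of A
    p_b = [(1 + y) / 2, (1 - y) / 2]   # marginal of B
    info = entropy(p_a) + entropy(p_b) - entropy(p.values())
    bound = 1 - entropy([(1 + abs(r)) / 2, (1 - abs(r)) / 2])
    return info - bound

random.seed(0)
smallest_gap = float("inf")
accepted = 0
while accepted < 20000:
    x, y, r = (random.uniform(-1, 1) for _ in range(3))
    if min(joint(x, y, r).values()) < 0:
        continue  # violates positivity, so not a probability distribution
    accepted += 1
    smallest_gap = min(smallest_gap, gap(x, y, r))

print("smallest gap found:", smallest_gap)
print("gap for uniform marginals, r = 0.6:", gap(0.0, 0.0, 0.6))
```

A random scan of this kind is only a sanity check, not a substitute for the proof: it can reveal a counterexample but cannot establish the inequality.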

Application: Resources for Simulating Bell Inequality Violation
The hallmark feature of quantum correlations is that they cannot be explained by any underlying statistical model that satisfies three physically very plausible properties: (i) no signaling faster than the speed of light, (ii) free choice of measurement settings, and (iii) independence of local outcomes. Various interpretations of quantum mechanics differ in regard to which of these properties should be given up. It is of interest to consider by how much they must be given up, in terms of the information-theoretic resources required to simulate a given quantum correlation. For example, how many bits of communication, or bits of correlation between the source and the measurement settings, or bits of correlation between the outcomes, are required? The lower bound for classical mutual information in Eq. (3) is relevant to the last of these questions. In more detail, if $P_{AB}(a,b)$ denotes the joint probability of outcomes a and b, for measurements of variables A and B on respective spacelike-separated systems, and $\lambda$ denotes any underlying variables relevant to the correlations, then Bayes theorem implies that
$$P_{AB}(a,b) = \sum_\lambda p_{AB}(\lambda)\, P_{AB}(a,b|\lambda),$$
where summation is replaced by integration over any continuous values of $\lambda$. The no-signaling property requires that the underlying marginal distribution of A, $p_A(a|\lambda)$, is independent of whether $B$ or $B'$ was measured on the second system (and vice versa), while the free-choice property requires that $\lambda$ is independent of the choice of the measured variables A and B, i.e., that $p_{AB}(\lambda) = p_{A'B'}(\lambda)$ for any $A, A', B, B'$. Finally, the outcome independence property requires that any observed correlation between A and B arises from ignorance of the underlying variable, i.e., that $P_{AB}(a,b|\lambda) = P_A(a|\lambda)\,P_B(b|\lambda)$ for all A, B and $\lambda$.
Thus the correlation distance of $P_{AB}(a,b|\lambda)$ vanishes identically: $C(P_{AB|\lambda}) \equiv 0$. As is well known, the assumption of all three properties implies that two-valued random variables with values ±1 must satisfy the Bell inequality [11]
$$\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle \leq 2, \tag{9}$$
whereas quantum correlations can violate this inequality by as much as a factor of $\sqrt{2}$. It follows that quantum correlations can only be modeled by relaxing one or more of the above properties, as has recently been reviewed in detail in Ref. [9].
For example, assuming that no-signaling and measurement independence hold (as they do in the standard Copenhagen interpretation of quantum mechanics), and defining $C_{\max}$ to be the maximum value of $C(P_{AB|\lambda})$ over all A, B and $\lambda$, it can be shown that Eq. (9) generalises to the tight bound [9]
$$\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle \leq \frac{4}{2 - C_{\max}}.$$
It follows that to simulate a Bell inequality violation $\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle = 2 + V$, for some $V > 0$, the observers must share random variables having a correlation distance of at least $C_{\max} \geq 2V/(2+V)$. Hence, using the classical lower bound Eq. (3) (stated without proof in Ref. [9]), the observers must share a minimum mutual information of
$$I_{\min} = \log 2 - H\!\left(\frac{2+3V}{4+2V}, \frac{2-V}{4+2V}\right).$$
Note this reduces to zero in the limit of no violation of Bell inequality (9), i.e., when $V = 0$, and reaches a maximum of 1 bit of information in the limit of the maximum possible violation, $V = 2$.
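The dependence of the minimum shared information on the violation V can be tabulated with a few lines of code. This is a minimal sketch that assumes Eq. (3) has the form $I \geq \log 2 - H((1+C)/2, (1-C)/2)$ and inserts the minimum correlation distance $2V/(2+V)$ quoted above; the function names are introduced here for illustration, and results are in bits.

```python
import math

def binary_entropy_bits(c):
    """H((1 + c)/2, (1 - c)/2) in bits, with 0 log 0 taken as 0."""
    probs = [(1 + c) / 2, (1 - c) / 2]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_correlation_distance(v):
    """Smallest C_max compatible with a Bell violation of 2 + V."""
    return 2 * v / (2 + v)

def min_shared_information_bits(v):
    """Lower bound on the shared mutual information, in bits."""
    return 1 - binary_entropy_bits(min_correlation_distance(v))

# V = 2*sqrt(2) - 2 corresponds to the maximum quantum violation of Eq. (9).
for v in (0.0, 0.5, 2 * math.sqrt(2) - 2, 2.0):
    c = min_correlation_distance(v)
    print(f"V = {v:.4f}: C_max >= {c:.4f}, "
          f"I >= {min_shared_information_bits(v):.4f} bits")
```

The end points reproduce the limits stated in the text: no shared information is needed at V = 0, and one bit is needed at V = 2.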

Quantum Correlation Distance and Qubit Entanglement
The positivity condition (6) may be used to show that the classical correlation distance between any pair of two-valued random variables is never greater than unity, i.e., that $C(P_{AB}) = |r| \leq 1$ [9]. In contrast, the quantum correlation distance between a pair of qubits can be greater than unity, with upper bound $C(\rho_{AB}) \leq 3/2$. More generally, one has
$$C(P_{AB}) \leq 2\left(1 - \frac{1}{n}\right), \qquad C(\rho_{AB}) \leq 2\left(1 - \frac{1}{n^2}\right)$$
for pairs of n-valued random variables and n-level quantum systems, with saturation corresponding to maximal correlation and maximal entanglement respectively. Thus, quantum correlations have a quadratic advantage with respect to correlation distance (this is also the case for mutual information, for which one has $I(P_{AB}) \leq \log n$ and $I(\rho_{AB}) \leq \log n^2$). Nonclassical values of the quantum correlation distance are closely related to the quintessential nonclassical feature of quantum mechanics: entanglement. In particular, $C(\rho_{AB}) > 1$ is a direct signature of qubit entanglement. Indeed, even correlation distances smaller than unity can imply two qubits are entangled, as per the criterion given in Eq. (4) and shown below. An explicit formula for qubit correlation distance in terms of the spin covariance matrix, needed for section 5, is also obtained below.

Entanglement Criterion
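Recall that the density operator $\rho_{AB}$ of two qubits may always be written in the Fano form [12]
$$\rho_{AB} = \frac14\Big[\, I\otimes I + u\cdot\sigma\otimes I + I\otimes v\cdot\sigma + \sum_{j,k} \langle\sigma_j\otimes\sigma_k\rangle\, \sigma_j\otimes\sigma_k \Big]. \tag{13}$$
Here $I$ is the unit operator; $\{\sigma_j\}$ denotes the set of Pauli spin observables on each qubit Hilbert space; the components of the 3-vectors $u$ and $v$ are the spin expectation values $u_j := \langle\sigma_j\otimes I\rangle$ and $v_k := \langle I\otimes\sigma_k\rangle$, for A and B respectively; and $T$ denotes the 3 × 3 spin covariance matrix with coefficients
$$T_{jk} := \langle\sigma_j\otimes\sigma_k\rangle - \langle\sigma_j\otimes I\rangle\,\langle I\otimes\sigma_k\rangle.$$
It immediately follows from Eq. (13) that the quantum correlation distance may be expressed in terms of the spin covariance matrix as
$$C(\rho_{AB}) = \frac14 \Big\| \sum_{j,k} T_{jk}\, \sigma_j\otimes\sigma_k \Big\|_1. \tag{14}$$
This expression will be further simplified in subsection 4.2. Now consider the case where $\rho_{AB}$ is a separable state, i.e., of the unentangled form $\rho_{AB} = \sum_\lambda p(\lambda)\, \tau_A(\lambda)\otimes\omega_B(\lambda)$, for some probability distribution $p(\lambda)$ and local density operators $\tau_A(\lambda)$ and $\omega_B(\lambda)$ with spin vectors $u(\lambda)$ and $v(\lambda)$. One then has $T_{jk} = \sum_\lambda p(\lambda)\,[u_j(\lambda) - u_j]\,[v_k(\lambda) - v_k]$, and hence
$$\begin{aligned} C(\rho_{AB}) &= \frac14 \Big\| \sum_\lambda p(\lambda)\, [(u(\lambda)-u)\cdot\sigma] \otimes [(v(\lambda)-v)\cdot\sigma] \Big\|_1 \\ &\leq \frac14 \sum_\lambda p(\lambda)\, \|(u(\lambda)-u)\cdot\sigma\|_1\, \|(v(\lambda)-v)\cdot\sigma\|_1 \\ &= \sum_\lambda p(\lambda)\, |u(\lambda)-u|\, |v(\lambda)-v| \leq \Big[\sum_\lambda p(\lambda)\, |u(\lambda)-u|^2\Big]^{1/2} \Big[\sum_\lambda p(\lambda)\, |v(\lambda)-v|^2\Big]^{1/2} \\ &= \Big[\sum_\lambda p(\lambda)\, |u(\lambda)|^2 - u\cdot u\Big]^{1/2} \Big[\sum_\lambda p(\lambda)\, |v(\lambda)|^2 - v\cdot v\Big]^{1/2} \leq \left[(1 - u\cdot u)(1 - v\cdot v)\right]^{1/2}. \end{aligned} \tag{15}$$
Note that the second line follows from the properties $\|X + Y\|_1 \leq \|X\|_1 + \|Y\|_1$ and $\|X\otimes Y\|_1 \leq \|X\|_1 \|Y\|_1$ of the trace norm; the third line using $\|X\|_1 = \mathrm{tr}[\sqrt{X^\dagger X}]$ and the Schwarz inequality; and the last line via $|u(\lambda)|, |v(\lambda)| \leq 1$.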
Equation (15) holds for all separable qubit states. Hence, a nonclassical value of the correlation distance, $C(\rho_{AB}) > 1$, immediately implies that the qubits must be entangled. More generally, noting that $\rho_A = \frac12(I + u\cdot\sigma)$ and $\rho_B = \frac12(I + v\cdot\sigma)$, so that $1 - u\cdot u = 2(1 - \mathrm{tr}[\rho_A^2])$ and $1 - v\cdot v = 2(1 - \mathrm{tr}[\rho_B^2])$, Eq. (15) yields the entanglement criterion in Eq. (4). The fact that entanglement is required between two qubits, for $C(\rho_{AB})$ to be greater than the maximum possible value of $C(P_{AB})$ for two-valued classical variables, is a nice distinction between quantum and classical correlation distances. It would be of interest to determine whether this result generalises to n-level systems. This would follow from the validity of Eq. (4) for arbitrary quantum systems.

Explicit Expression for C(ρAB)
To explicitly evaluate $C(\rho_{AB})$ in Eq. (14), let $T = KDL^T$ denote a singular value decomposition of the spin covariance matrix. Thus, $K$ and $L$ are real orthogonal matrices and $D = \mathrm{diag}[t_1, t_2, t_3]$, with the singular values $t_1 \geq t_2 \geq t_3 \geq 0$ corresponding to the square roots of the eigenvalues of $T^T T$. Noting that any 3 × 3 orthogonal matrix is either a rotation matrix, or the product of a rotation matrix with the parity matrix $-I$, one therefore always has a decomposition of the form $T = \pm KDL^T$ where $K$ and $L$ are now restricted to be rotation matrices. Hence, defining unitary operators $U$ and $V$ corresponding to rotations $K$ and $L$, via $U\sigma_j U^\dagger = \sum_{j'} K_{jj'}\sigma_{j'}$ and $V\sigma_j V^\dagger = \sum_{j'} L_{jj'}\sigma_{j'}$, and using the invariance of the trace norm under unitary transformations, the quantum correlation distance in Eq. (14) can be rewritten as
$$C(\rho_{AB}) = \frac14 \Big\| \sum_j t_j\, \sigma_j\otimes\sigma_j \Big\|_1.$$
Determining the eigenvalues of the Hermitian operator $\sum_j t_j\, \sigma_j\otimes\sigma_j$ is a straightforward 4 × 4 matrix calculation using the standard representation of the Pauli sigma matrices; they are $-(t_1+t_2+t_3)$, $-t_1+t_2+t_3$, $t_1-t_2+t_3$ and $t_1+t_2-t_3$. Summing the absolute values of these eigenvalues then yields the explicit expression
$$C(\rho_{AB}) = \tfrac12 \max\{\, 2t_1,\; t_1+t_2+t_3 \,\} \tag{16}$$
for the quantum correlation distance, in terms of the singular values of the spin covariance matrix.
For example, for the Werner state $\rho_{AB} = p\,|\psi\rangle\langle\psi| + \frac{1-p}{4}\, I\otimes I$, where $|\psi\rangle$ is the singlet state and $-1/3 \leq p \leq 1$ [13], one has $T = -pI$ and hence that $t_1 = t_2 = t_3 = |p|$. The corresponding correlation distance is therefore $3|p|/2$, which is greater than the classical maximum of unity for $p > 2/3$. Equation (16) also allows the qubit entanglement criterion (4) to be directly compared with the strongest known criterion based on the spin covariance matrix [14]:
$$t_1 + t_2 + t_3 > 2\left[(1-\mathrm{tr}[\rho_A^2])(1-\mathrm{tr}[\rho_B^2])\right]^{1/2}.$$
For the above Werner state this criterion is tight, indicating entanglement for $p > 1/3$. Hence, the main interest in weaker entanglement criteria based on quantum correlation distance lies in their direct connection with nonclassical values of the classical correlation distance.
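The Werner-state example can be reproduced numerically. The sketch below, which relies on NumPy, computes the correlation distance directly as the trace norm of $\rho_{AB} - \rho_A\otimes\rho_B$, and compares it with the value obtained from the singular values of the spin covariance matrix. The closed form $\frac12\max\{2t_1, t_1+t_2+t_3\}$ used in `distance_from_singular_values` is an assumption about the form of Eq. (16), obtained by summing the absolute eigenvalues of $\sum_j t_j\,\sigma_j\otimes\sigma_j$; all function names are illustrative.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def trace_norm(m):
    """Sum of the absolute eigenvalues of a Hermitian matrix."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(m))))

def reduced_states(rho):
    """Partial traces of a two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)          # indices a, b, a', b'
    rho_a = np.einsum('ijkj->ik', r)     # trace over B
    rho_b = np.einsum('ijik->jk', r)     # trace over A
    return rho_a, rho_b

def correlation_distance(rho):
    """C(rho_AB) = || rho_AB - rho_A (x) rho_B ||_1."""
    rho_a, rho_b = reduced_states(rho)
    return trace_norm(rho - np.kron(rho_a, rho_b))

def spin_covariance(rho):
    """T_jk = <sigma_j (x) sigma_k> - <sigma_j (x) I><I (x) sigma_k>."""
    rho_a, rho_b = reduced_states(rho)
    u = [np.trace(rho_a @ s).real for s in PAULIS]
    v = [np.trace(rho_b @ s).real for s in PAULIS]
    return np.array([[np.trace(rho @ np.kron(sj, sk)).real - u[j] * v[k]
                      for k, sk in enumerate(PAULIS)]
                     for j, sj in enumerate(PAULIS)])

def distance_from_singular_values(rho):
    """Assumed closed form: (1/2) max{2 t1, t1 + t2 + t3}."""
    t = np.linalg.svd(spin_covariance(rho), compute_uv=False)
    return 0.5 * max(2 * t[0], t.sum())

def werner_state(p):
    """p |psi><psi| + (1 - p)/4 I (x) I, with |psi> the singlet state."""
    singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
    return p * np.outer(singlet, singlet.conj()) + (1 - p) / 4 * np.eye(4)

for p in (0.2, 0.5, 0.8):
    rho = werner_state(p)
    print(p, correlation_distance(rho), distance_from_singular_values(rho))
```

Both columns should agree with 3|p|/2 up to rounding error, so that the correlation distance exceeds the classical maximum of unity only for p > 2/3.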

Tight Lower Bound for Quantum Mutual Information
Here Eq. (5) is derived for the case $\rho_A = \rho_B = \frac12 I$. Evidence is provided for the conjecture that Eq. (5) in fact holds for all two-qubit states, including a partial generalisation of Eq. (5) when only one of $\rho_A$ and $\rho_B$ is maximally mixed.
Numerical comparison shows that H3(C) > H1(C) for C > C0 ≈ 0.72654, and H3(C) ≤ H1(C) otherwise. Hence, from Eq. (19) one has the tight lower bound and any local unitary transformations thereof, where the quantum correlation distance of ρ(C) is C by construction.
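The crossover value $C_0$ can be located with a simple bisection. The definitions of $H_1$ and $H_3$ are not reproduced above, so this sketch makes an explicit assumption: $H_1(C)$ is taken to be the entropy of the spectrum $\{(1+C)/4, (1+C)/4, (1-C)/4, (1-C)/4\}$, and $H_3(C)$ the entropy of the Werner-type spectrum $\{1/4 + C/2, 1/4 - C/6, 1/4 - C/6, 1/4 - C/6\}$. With maximally-mixed reduced states the mutual information is $\log 4$ minus the entropy, so the larger entropy gives the smaller mutual information.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def h1(c):
    """Assumed H1: entropy of {(1+C)/4, (1+C)/4, (1-C)/4, (1-C)/4}."""
    return entropy([(1 + c) / 4, (1 + c) / 4, (1 - c) / 4, (1 - c) / 4])

def h3(c):
    """Assumed H3: entropy of {1/4 + C/2, 1/4 - C/6 (three times)}."""
    return entropy([0.25 + c / 2] + [0.25 - c / 6] * 3)

def crossover(lo=0.5, hi=1.0, tol=1e-10):
    """Bisection for the root of H3(C) - H1(C) on [lo, hi]."""
    assert h3(lo) < h1(lo) and h3(hi) > h1(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h3(mid) > h1(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

c0 = crossover()
print(f"C0 is approximately {c0:.5f}")
print("mutual information at C0 (bits):", 2 - h1(c0))
```

Under these assumptions the root should agree with the value $C_0 \approx 0.72654$ quoted in the text.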

Conjecture
It is conjectured that Eq. (5) is in fact a tight lower bound for any two-qubit state. This conjecture would follow immediately if it could be shown that $F \geq 0$. It is straightforward to show that $F = 0$ and $\partial F/\partial r_j = 0$ for $r_1 = r_2 = r_3 = 0$, consistent with $F \geq 0$. However, it remains to be shown that the gradient $\partial F/\partial r_j$ does not vanish for other physically possible values of the $r_j$ (other than for the trivially saturating case $\rho_A = \rho_B = (1/2)I$).
The above conjecture is further supported by the generalisation of Eq. (5) in the following section.

Generalisation to Maximally-Mixed ρA or ρB
It is straightforward to show that the lower bound on quantum mutual information is tight for $C \geq C_0$ when just one of the reduced density operators is maximally mixed, i.e., if $\rho_A$ or $\rho_B$ is equal to $(1/2)I$. Second, let $\mathcal{T}$ denote the 'twirling' operation, corresponding to applying a random unitary transformation of the form $U \otimes U$ [15]. It is easy to check that by definition $\mathcal{T}(I\otimes I) = I\otimes I$, $\mathcal{T}(I\otimes\sigma_j) = 0 = \mathcal{T}(\sigma_j\otimes I)$ and $\mathcal{T}(\sigma_j\otimes\sigma_j) = \mathcal{T}(\sigma_k\otimes\sigma_k)$, for any j and k. Since Werner states are invariant under twirling [13,15], it follows that $\mathcal{T}(\sigma_j\otimes\sigma_j) = (1/3)\sum_k \sigma_k\otimes\sigma_k$. Using these properties, one finds that $\mathcal{T}(\tilde\rho_A\otimes\tilde\rho_B) = (1/4)\, I\otimes I$ if one of $\tilde\rho_A$ or $\tilde\rho_B$ is maximally mixed, and hence that for $C \geq C_0$. Since Werner states are invariant under twirling, this inequality is tight for $\alpha = -1$, being saturated by the choice $\tilde\rho_{AB} = \rho(C)$. Recalling that mutual information and correlation distance are invariant under local unitary operations, the inequality is therefore tight for any $\rho_{AB}$ for which one of $\rho_A$ and $\rho_B$ is maximally mixed, as claimed.

Classically-Correlated Quantum States
It is well known that a quantum system behaves classically if the state and the observables of interest all commute, i.e., if they can be simultaneously diagonalised in some basis. Hence, a joint state will behave classically if the relevant observables of each system commute with each other and the state. It is therefore natural to define $\rho_{AB}$ to be classically correlated if and only if it can be diagonalised in a joint basis [10], i.e., if and only if $\rho_{AB} = \sum_{j,k} P(j,k)\, |j\rangle\langle j| \otimes |k\rangle\langle k|$ for some distribution $P(j,k)$ and orthonormal basis set $\{|j\rangle\otimes|k\rangle\}$. Classical correlation is preserved by tensor products, and by mixtures of commuting states. While, strictly speaking, a classically-correlated quantum state only behaves classically with respect to observables that are diagonal with respect to $|j\rangle\otimes|k\rangle$, such states also have a number of classical correlation properties with respect to general observables [10,16], briefly noted here.
First, $\rho_{AB}$ above is separable by construction, and hence is unentangled. Second, since it is diagonal in the basis $\{|j\rangle\otimes|k\rangle\}$, the mutual information and correlation distance are easily calculated as
$$I(\rho_{AB}) = I(P), \qquad C(\rho_{AB}) = C(P),$$
and hence can only take classical values. Third, if M and N denote any observables for systems A and B respectively, then their joint statistics are given by
$$P_{MN}(m,n) = \sum_{j,k} S_{m,n;j,k}\, P(j,k),$$
where $S_{m,n;j,k} = p(m|j)\, p(n|k)$ is a stochastic matrix with respect to its first and second pairs of indices. Similarly, one finds $P_M(m)\, P_N(n) = \sum_{j,k} S_{m,n;j,k}\, P(j)\, P(k)$ for the product of the marginals. Since the classical relative entropy and variational distance can only decrease under the action of a stochastic matrix, it follows that one has the tight inequalities [10,16]
$$I(P_{MN}) \leq I(P) = I(\rho_{AB}), \qquad C(P_{MN}) \leq C(P) = C(\rho_{AB}).$$
For two qubits, the distribution of a classically-correlated state may be written as $P(j,k) = [(1+jx)(1+ky) + jkr]/4$ with $j, k = \pm 1$, where $x, y \in [-1, 1]$ and r satisfies Eq. (6). Hence, the mutual information is bounded by the classical lower bound in Eq. (3), and $\rho(C)$ in Eq. (23) is classically correlated for $C \leq C_0$. It follows that the lower bound for quantum mutual information in Eq. (5) can be attained by classically-correlated states if $C \leq C_0$. Conversely, the minimum possible bound cannot be reached by any classically-correlated two-qubit state if $C > C_0$.
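The monotonicity of mutual information and correlation distance under local stochastic maps can be illustrated with a small example. In this sketch the joint distribution `P` and the conditional probability tables `p_m_given_j` and `p_n_given_k` are hypothetical values chosen for illustration; each column of a table is a conditional distribution, so the product of the two tables plays the role of the stochastic matrix S above.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def mutual_information(joint):
    p_a, p_b = marginals(joint)
    return entropy(p_a) + entropy(p_b) - entropy(p for row in joint for p in row)

def correlation_distance(joint):
    p_a, p_b = marginals(joint)
    return sum(abs(joint[a][b] - p_a[a] * p_b[b])
               for a in range(len(p_a)) for b in range(len(p_b)))

def apply_local_maps(joint, chan_a, chan_b):
    """P_MN(m, n) = sum_{j,k} p(m|j) p(n|k) P(j, k)."""
    return [[sum(chan_a[m][j] * chan_b[n][k] * joint[j][k]
                 for j in range(len(joint)) for k in range(len(joint[0])))
             for n in range(len(chan_b))]
            for m in range(len(chan_a))]

# Hypothetical joint distribution P(j, k) and conditional tables.
P = [[0.4, 0.1],
     [0.1, 0.4]]
p_m_given_j = [[0.9, 0.2],   # entry [m][j] = p(m|j); columns sum to 1
               [0.1, 0.8]]
p_n_given_k = [[0.7, 0.3],   # entry [n][k] = p(n|k); columns sum to 1
               [0.3, 0.7]]

P_MN = apply_local_maps(P, p_m_given_j, p_n_given_k)
print("I:", mutual_information(P), "->", mutual_information(P_MN))
print("C:", correlation_distance(P), "->", correlation_distance(P_MN))
```

Equality is recovered when both tables are identity matrices, corresponding to observables that are diagonal in the basis of the classically-correlated state.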

Conclusion
Lower bounds for mutual information have been obtained that are stronger than those obtainable from general bounds for relative entropy and variational distance. Unlike the Pinsker inequality in Eq. (2), the quantum form of these bounds is not a simple generalisation of the classical form.
Similarly to the case of upper bounds for (classical) mutual information [8], the tight lower bounds obtained here depend on the dimension of the systems. The results of this paper represent a preliminary investigation largely confined to two-valued classical variables and qubits. It would be of interest to generalise both the classical and quantum cases, and to further investigate connections between them.
Open questions include whether a quantum correlation distance greater than the corresponding maximum classical correlation distance is a signature of entanglement for higher-dimensional systems, and whether the related qubit entanglement criterion in Eq. (4) holds more generally. The conjecture in section 5.2, as to whether the quantum lower bound in Eq. (5) is valid for all two-qubit states, also remains to be settled. Finally, it would be of interest to generalise and to better understand the role of the transition from classically-correlated states to entangled states in saturating information bounds, in the light of Eq. (23) for qubits.