Some Remarks on Classical and Classical-Quantum Sphere Packing Bounds: Rényi vs. Kullback–Leibler

Marco Dalai

doi:10.3390/e19070355

Department of Information Engineering, University of Brescia, Via Branze 38, 25123 Brescia, Italy

† This paper is an extended version of our paper published in the Proceedings of the 2016 International Zurich Seminar on Communications, Zurich, Switzerland, 2–4 March 2016.

Entropy 2017, 19(7), 355; https://doi.org/10.3390/e19070355

This article belongs to the Section Information Theory, Probability and Statistics

Abstract

We review the use of binary hypothesis testing for the derivation of the sphere packing bound in channel coding, pointing out a key difference between the classical and the classical-quantum setting. In the first case, two ways of using binary hypothesis testing are known, which lead to the same bound written in different analytical expressions. The first method, historically, compares the output distributions induced by the codewords with an auxiliary fixed output distribution, and naturally leads to an expression using the Rényi divergence. The second method compares the given channel with an auxiliary one and leads to an expression using the Kullback–Leibler divergence. In the classical-quantum case, due to a fundamental difference in quantum binary hypothesis testing, these two approaches lead to two different bounds, the first being the “right” one. We discuss the details of this phenomenon, which raises the question of whether auxiliary channels are used in the optimal way in the second approach and whether recent results on the exact strong-converse exponent in classical-quantum channel coding might play a role in the considered problem.

Keywords: channel coding; sphere packing bound; classical-quantum channels; hypothesis testing

1. Introduction

One of the central problems in coding theory deals with determining upper and lower bounds on the probability of error when communication over a given channel is attempted at some rate R. The capacity of the channel C is defined as the highest rate at which communication is possible with probability of error that vanishes as the blocklength of the code grows to infinity (see [1,2,3]). At rates R < C, it is known that the probability of error vanishes exponentially fast in the blocklength, and a classic problem in information theory is the determination of that exponential speed or, as it is customary to say, of the error exponent. This problem was dealt with in the classical setting back in the 1960s, when most of the results that are still the strongest were obtained [4,5,6,7,8]. For classical-quantum channels, instead, the topic is more recent; the first results were obtained around 1998 ([9,10]) and new ones are still in progress.

An important bound on error exponents is the so-called sphere packing bound, a fundamental lower bound on the probability of error of optimal codes and hence an upper bound on achievable error exponents. This particular result was first derived in different forms in the 1960s for classical channels (of different types) and more recently in [11,12,13] for classical-quantum channels. The aim of this paper is to present a detailed and self-contained discussion of the differences between the classical and classical-quantum settings, pointing out connections with an important open problem first suggested by Holevo in [10] and possibly with recent results derived by Mosonyi and Ogawa in [14].

2. The Problem

We consider a classical-quantum channel with finite input alphabet

X

and associated density operators

W_{x}

,

x \in X

, in a finite dimensional Hilbert space

H

. The n-fold product channel acts in the tensor product space

\mathbf{H} = H^{\otimes n}

of n copies of

H

. To a sequence

\mathbf{x} = (x_{1}, x_{2}, \dots, x_{n})

, we associate the signal state

W_{\mathbf{x}} = W_{x_{1}} \otimes W_{x_{2}} \otimes \cdots \otimes W_{x_{n}}

. A block code with M codewords is a mapping from a set of M messages

{1, \dots, M}

into a set of M codewords

{x_{1}, \dots, x_{M}}

, and the rate of the code is

R = (\log M) / n

. A quantum decision scheme for such a code, or Positive-Operator Valued Measure (POVM), is a collection of M positive operators

{Π_{1}, Π_{2}, \dots, Π_{M}}

such that

\sum Π_{m} = I

, where I is the identity operator. The probability that message

m^{'}

is decoded when message m is transmitted is

P_{m^{'} | m} = Tr Π_{m^{'}} W_{x_{m}}

and the probability of error after sending message m is

P_{e | m} = 1 - Tr (Π_{m} W_{x_{m}}) .

The maximum error probability of the code is defined as the largest

P_{e | m}

; that is,

P_{e, \max} = \max_{m} P_{e | m} .

When all the operators

W_{x}

commute, the channel is classical and we will use the classical notation

W_{x} (y)

to indicate the eigenvalues of the operators, which are the transition probabilities from inputs x to outputs

y \in Y

. Similarly,

W_{x} (y)

will represent the transition probabilities from input sequences

x

to output sequences

y \in Y^{n}

. In the classical case, it can be proved that optimal decision schemes can always be assumed to have separable measurements which commute with the states. Hence, we will use the classical notation

W_{x_{m}} (Y_{m})

in place of

Tr Π_{m} W_{x_{m}}

, where

Y_{m} \subseteq Y^{n}

is the decoding region for message m.
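
To make the classical definitions concrete, the following minimal sketch evaluates W_{x_{m}} (Y_{m}) and P_{e, \max} by brute force for a toy code. The binary symmetric channel, the length-3 repetition code, and the majority-vote decoding regions are illustrative assumptions, not objects taken from the paper.

```python
from itertools import product

def bsc_seq_prob(x, y, p):
    """W_x(y) for a memoryless binary symmetric channel with crossover p."""
    prob = 1.0
    for xi, yi in zip(x, y):
        prob *= p if xi != yi else 1.0 - p
    return prob

def max_error_probability(codewords, regions, p):
    """P_{e,max} = max_m [1 - W_{x_m}(Y_m)] for the given decoding regions."""
    return max(
        1.0 - sum(bsc_seq_prob(x_m, y, p) for y in region)
        for x_m, region in zip(codewords, regions)
    )

n, p = 3, 0.1                      # hypothetical blocklength and crossover
codewords = [(0,) * n, (1,) * n]   # M = 2 codewords, rate R = (log 2) / 3
outputs = list(product((0, 1), repeat=n))
# Majority-vote decoding regions Y_1, Y_2 (a partition of Y^n).
regions = [
    [y for y in outputs if sum(y) <= n // 2],
    [y for y in outputs if sum(y) > n // 2],
]
pe_max = max_error_probability(codewords, regions, p)
print(pe_max)  # equals 3 p^2 (1 - p) + p^3 for this toy code
```

Exhaustive enumeration is only feasible for very small n; it is used here purely to mirror the definitions.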

Let

P_{e, \max}^{(n)} (R)

be the smallest maximum error probability among all codes of length n and rate at least R. We define the reliability function of the channel as

E (R) = \limsup_{n \to \infty} - \frac{1}{n} \log P_{e, \max}^{(n)} (R) .

(1)

In this paper, we focus on the so-called sphere packing upper bound on

E (R)

, which states that

E (R) \leq E_{sp} (R)

(2)

where

E_{sp} (R) = \max_{P} E_{sp}^{cc} (R, P)

(3)

and

\begin{matrix} E_{sp}^{cc} (R, P) & = \sup_{0 < s < 1} [E_{0}^{cc} (s, P) - \frac{s}{1 - s} R], \end{matrix}

(4)

\begin{matrix} E_{0}^{cc} (s, P) & = \min_{Q} [\frac{1}{s - 1} \sum_{x} P (x) \log Tr (W_{x}^{1 - s} Q^{s})], \end{matrix}

(5)

the minimum being over density operators Q. Here

E_{sp}^{cc} (R, P)

is an upper bound on the error exponent achievable by so-called constant composition codes; that is, codes in which every symbol x appears in each codeword with empirical frequency P(x). For classical channels,

E_{0}^{cc} (s, P)

is written in the standard notation as

E_{0}^{cc} (s, P) = \min_{Q} [\frac{1}{s - 1} \sum_{x} P (x) \log \sum_{y} W_{x} {(y)}^{1 - s} Q {(y)}^{s}] .

(6)

3. Binary Hypothesis Testing

3.1. Classical Case

We start by recalling that in classical binary hypothesis testing between two distributions

P_{0}

and

P_{1}

on some set

V

, based on n independent extractions, the trade-off of the achievable exponents for the error probabilities of the first and second kind can be expressed parametrically, for

0 < s < 1

, as (e.g., [7])

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| 0} & = - μ (s) + s μ^{'} (s) + o (1) \end{matrix}

(7)

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| 1} & = - μ (s) - (1 - s) μ^{'} (s) + o (1) \end{matrix}

(8)

where

μ (s) = \log \sum_{v \in V} P_{0} {(v)}^{1 - s} P_{1} {(v)}^{s} .

(9)

The quantity

μ (s)

defined above is actually a scaled version of the Rényi divergence, usually defined as

D_{α} (P ∥ Q) = \frac{1}{α - 1} \log \sum_{v \in V} P {(v)}^{α} Q {(v)}^{1 - α} .

(10)

We have in fact

μ (s) = - s D_{1 - s} (P_{0} ∥ P_{1})

. A key role in the derivation of the above result is played by the tilted mixture

P_{s}

, defined as

P_{s} (v) = \frac{P_{0} {(v)}^{1 - s} P_{1} {(v)}^{s}}{\sum_{v^{'}} P_{0} {(v^{'})}^{1 - s} P_{1} {(v^{'})}^{s}} .

(11)

Roughly speaking, the probability of error for the optimal test is essentially due to the set of those sequences in

V^{n}

with empirical distribution close to

P_{s}

.
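
The quantities in (9), (10) and (11) are easy to evaluate numerically. The sketch below, with two arbitrarily chosen (hypothetical) distributions on a three-letter set, computes μ (s), the Rényi divergence and the tilted mixture, and checks the identity μ (s) = - s D_{1 - s} (P_{0} ∥ P_{1}). Natural logarithms are assumed throughout.

```python
import math

def mu(p0, p1, s):
    """mu(s) = log sum_v P0(v)^(1-s) P1(v)^s, as in (9)."""
    return math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p0, p1)))

def renyi(p, q, alpha):
    """Renyi divergence D_alpha(P || Q), as in (10)."""
    total = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q))
    return math.log(total) / (alpha - 1)

def tilted(p0, p1, s):
    """Tilted mixture P_s, as in (11)."""
    weights = [a ** (1 - s) * b ** s for a, b in zip(p0, p1)]
    z = sum(weights)
    return [w / z for w in weights]

# Illustrative distributions (not from the paper).
p0 = [0.5, 0.3, 0.2]
p1 = [0.1, 0.3, 0.6]
s = 0.4

gap = mu(p0, p1, s) + s * renyi(p0, p1, 1 - s)  # should vanish
ps = tilted(p0, p1, s)
print(gap, ps)
```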

A graphical representation relating the above equations suggested in [7] is shown in Figure 1. Figure 2 shows an interpretation of the role of the Rényi divergence. Note that one has the well-known property

\begin{matrix} \lim_{α \to 1} D_{α} (P ∥ Q) & = \sum_{v \in V} P (v) \log \frac{P (v)}{Q (v)} \end{matrix}

(12)

\begin{matrix} = D_{KL} (P ∥ Q), \end{matrix}

(13)

which explains the endpoints of the curve in Figure 2. In particular (though some technicalities would be needed for a rigorous derivation), the quantity

D_{KL} (P ∥ Q)

governs the “Stein regime”; if in the binary hypothesis test

{P_{e}}_{| 0}

is only required to be bounded away from 1 as

n \to \infty

, then

- \frac{1}{n} \log {P_{e}}_{| 1}

is asymptotically upper-bounded by

D_{KL} (P_{0} ∥ P_{1})

. This can be stated equivalently as saying that regions

S_{n} \subseteq V^{n}

for which

P_{0} (S_{n}) > ϵ

satisfy

P_{1} (S_{n}) > e^{- n D_{KL} (P_{0} ∥ P_{1}) + o (n)}

.

Figure 1. Interpretation of the error exponents in binary hypothesis testing from [7].

Figure 2. Error exponents in binary hypothesis testing.

An explicit computation of the derivatives

μ^{'} (s)

, or just a different way of deriving the bound, shows that equivalent expressions for the error exponents are (see for example [2])

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| 0} & = D_{KL} (P_{s} ∥ P_{0}) + o (1) \end{matrix}

(14)

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| 1} & = D_{KL} (P_{s} ∥ P_{1}) + o (1) \end{matrix}

(15)

where

P_{s}

is the tilted mixture already defined in (11). This second representation gives another interpretation of the result. As said for the previous approach, the error events essentially occur in the set of sequences in

V^{n}

with empirical distribution close to

P_{s}

, whose total probabilities under

P_{0}

and

P_{1}

vanish, according to Stein’s lemma as mentioned above, with exponents given by

D_{KL} (P_{s} ∥ P_{0})

and

D_{KL} (P_{s} ∥ P_{1})

, respectively. One can notice that the problem of determining the trade-off of the error exponents in the test between

P_{0}

and

P_{1}

is essentially reduced to the problem of testing

P_{s}

against

P_{i}

,

i = 0, 1

in the Stein regime where

{P_{e}}_{| s}

is bounded away from 1.
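
The equivalence between the Rényi form (7), (8) and the Kullback–Leibler form (14), (15) can be checked numerically. The sketch below uses the closed form μ^{'} (s) = \sum_{v} P_{s} (v) \log (P_{1} (v) / P_{0} (v)), which follows by differentiating (9); the distributions are hypothetical, strictly positive examples.

```python
import math

def tilted(p0, p1, s):
    """Tilted mixture P_s from (11)."""
    weights = [a ** (1 - s) * b ** s for a, b in zip(p0, p1)]
    z = sum(weights)
    return [w / z for w in weights]

def kl(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def exponents_renyi_form(p0, p1, s):
    """Right-hand sides of (7) and (8), without the o(1) terms."""
    mu = math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p0, p1)))
    ps = tilted(p0, p1, s)
    # mu'(s) is the mean of log(P1/P0) under the tilted mixture.
    mu_prime = sum(t * math.log(b / a) for t, a, b in zip(ps, p0, p1))
    return -mu + s * mu_prime, -mu - (1 - s) * mu_prime

def exponents_kl_form(p0, p1, s):
    """Right-hand sides of (14) and (15), without the o(1) terms."""
    ps = tilted(p0, p1, s)
    return kl(ps, p0), kl(ps, p1)

p0 = [0.5, 0.3, 0.2]   # illustrative values
p1 = [0.1, 0.3, 0.6]
for s in (0.2, 0.5, 0.8):
    print(s, exponents_renyi_form(p0, p1, s), exponents_kl_form(p0, p1, s))
```

As s moves from 0 to 1, the first exponent grows from 0 while the second decreases to 0, which traces the trade-off curve.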

3.2. Quantum Case

In binary hypothesis testing between two density operators

σ_{0}

and

σ_{1}

, based on n independent extractions (but with global measurement), the error exponents of the first and second kind can be expressed parametrically as (see [15]):

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| σ_{0}} & = - μ (s) + s μ^{'} (s) + o (1) \end{matrix}

(16)

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| σ_{1}} & = - μ (s) - (1 - s) μ^{'} (s) + o (1) \end{matrix}

(17)

where, in complete analogy with the classical case,

μ (s) = \log Tr σ_{0}^{1 - s} σ_{1}^{s} .

(18)

Upon differentiation, one finds for example for (16):

- \frac{1}{n} \log {P_{e}}_{| σ_{0}} = - \log Tr (σ_{0}^{1 - s} σ_{1}^{s}) + Tr [\frac{σ_{0}^{1 - s} σ_{1}^{s}}{Tr σ_{0}^{1 - s} σ_{1}^{s}} (\log σ_{1}^{s} - \log σ_{0}^{s})] + o (1) .

When

σ_{0}

and

σ_{1}

commute (i.e., in the classical case), we can define the density operator

σ_{s} = \frac{σ_{0}^{1 - s} σ_{1}^{s}}{Tr σ_{0}^{1 - s} σ_{1}^{s}}

(19)

and use the property

\log σ_{1}^{s} - \log σ_{0}^{s} = \log σ_{0}^{1 - s} σ_{1}^{s} - \log σ_{0}

to obtain

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| σ_{0}} & = Tr σ_{s} (\log σ_{s} - \log σ_{0}) + o (1) \end{matrix}

(20)

\begin{matrix} = D_{KL} (σ_{s} ∥ σ_{0}) + o (1) . \end{matrix}

(21)

In a similar way, we find

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| σ_{1}} & = D_{KL} (σ_{s} ∥ σ_{1}) + o (1) . \end{matrix}

(22)

This is indeed the second form of the bound as already mentioned in Section 3.1. However, if

σ_{0}

and

σ_{1}

do not commute, the above simplification is not possible. Hence, the two error exponents cannot be expressed in terms of the Kullback–Leibler divergence. So, unlike in the classical binary hypothesis testing, the problem of determining the trade-off of the error exponents in the test between

σ_{0}

and

σ_{1}

cannot be reduced to the problem of testing some

σ_{s}

against

σ_{i}

,

i = 0, 1

in the Stein regime.

To verify that this is actually a property of the quantum binary hypothesis testing and not an artificial effect of the procedure used, it is useful to consider the case of pure states; that is, when operators

σ_{0}

and

σ_{1}

have rank 1, say

σ_{0} = | ψ_{0} ⟩ ⟨ ψ_{0} |

and

σ_{1} = | ψ_{1} ⟩ ⟨ ψ_{1} |

, with non-orthogonal

ψ_{0}

and

ψ_{1}

. In this case,

σ_{0}^{1 - s} = σ_{0}

and

σ_{1}^{s} = σ_{1}

, so one simply has

\begin{matrix} μ (s) & = \log Tr σ_{0} σ_{1} \end{matrix}

(23)

\begin{matrix} = \log | ⟨ ψ_{0} | ψ_{1} ⟩ |^{2}, \end{matrix}

(24)

and at least one of the two error exponents is not larger than

- \log | ⟨ ψ_{0} | ψ_{1} ⟩ |^{2}

. These quantities cannot be expressed as

D_{KL} (σ_{s} ∥ σ_{i})

,

i = 0, 1

for any

σ_{s}

, because

D_{KL} (ρ ∥ σ_{i}) = \begin{cases} 0 & ρ = σ_{i} \\ + \infty & ρ \neq σ_{i} \end{cases}, \quad i = 0, 1,

(25)

since

σ_{0}

and

σ_{1}

are pure.
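
The pure-state computation in (23) and (24) can be reproduced with a few lines of linear algebra. The sketch below assumes real qubit states ψ = (\cos θ, \sin θ), a purely illustrative choice; it checks that each σ_{i} is idempotent (so that any positive power of σ_{i} equals σ_{i}) and that μ (s) = \log | ⟨ ψ_{0} | ψ_{1} ⟩ |^{2} regardless of s.

```python
import math

def projector(theta):
    """Rank-one density operator |psi><psi| with psi = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

theta0, theta1 = 0.0, math.pi / 6   # hypothetical, non-orthogonal states
sigma0, sigma1 = projector(theta0), projector(theta1)

# A projector has eigenvalues 0 and 1, hence sigma^t = sigma for all t > 0;
# idempotency sigma^2 = sigma is the check performed here.
sq = matmul(sigma1, sigma1)
idempotent = all(abs(sq[i][j] - sigma1[i][j]) < 1e-12
                 for i in range(2) for j in range(2))

mu_pure = math.log(trace(matmul(sigma0, sigma1)))   # same value for every s
overlap_sq = math.cos(theta1 - theta0) ** 2         # |<psi0|psi1>|^2
print(idempotent, mu_pure, math.log(overlap_sq))
```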

4. Classical Sphere-Packing Bound

Two proofs are known for the classical version of the bound, which naturally lead to two equivalent yet different analytical expressions for the function

E_{sp} (R)

. The first was developed at the Massachusetts Institute of Technology (MIT) ([5,7]) while the other is due to Haroutunian [16,17]. A preliminary technical feature common to both procedures is that they both focus on some constant-composition sub-code which has virtually the same rate as the original code, but where all codewords have the same empirical composition P. In both cases, then, the key ingredient is binary hypothesis testing (BHT).

4.1. The MIT Proof

The first proof (see [5,7]) is based on a binary hypothesis test between the output distributions

W_{x_{m}}

induced by the codewords

x_{1}, \dots, x_{M}

and an auxiliary output product distribution

\mathbf{Q} = Q^{\otimes n}

on

Y^{n}

. Let

Y_{m} \subseteq Y^{n}

be the decision region for message m. Since

Q

is a distribution, for at least one m, we have

\begin{matrix} Q (Y_{m}) & \leq 1 / M \end{matrix}

(26)

\begin{matrix} = e^{- n R} . \end{matrix}

(27)

Considering a binary hypothesis test between

W_{x_{m}}

and

Q

, with

Y_{m}

as decision region for

W_{x_{m}}

, Equation (26) gives an exponential upper bound on the probability of error under hypothesis

Q

, which implies a lower bound on the probability of error under hypothesis

W_{x_{m}}

, which is

W_{x_{m}} (\bar{Y_{m}})

, the probability of error for message m. Here the BHT is considered in the regime where both probabilities decrease exponentially. The standard procedure uses the first form of the bound mentioned in the previous section based on the Rényi divergence. The bound can be extended to the case of testing products of non-identical distributions; for the pair of distributions

W_{x_{m}} = W_{x_{m, 1}} \otimes \cdots \otimes W_{x_{m, n}}

and

\mathbf{Q} = Q \otimes \cdots \otimes Q

, it gives the performance of an optimal test in the form

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| W_{x_{m}}} & = - μ (s) + s μ^{'} (s) + o (1) \end{matrix}

(28)

\begin{matrix} - \frac{1}{n} \log {P_{e}}_{| Q} & = - μ (s) - (1 - s) μ^{'} (s) + o (1) \end{matrix}

(29)

where now

μ (s) = \sum_{x} P (x) [\log \sum_{y \in Y} W_{x} {(y)}^{1 - s} Q {(y)}^{s}] .

(30)

At this point, the arguments in [5,7] diverge a bit; while the former is not rigorous, it has the advantage of giving the tight bound for the arbitrary codeword composition P. The latter is instead rigorous, but only gives the tight bound for the optimal composition P. In [13], we proposed a variation which we believe to be rigorous and that at the same time gives the tight bound for an arbitrary composition P. The need for this variation will be clear in the discussion of classical-quantum channels in the next section.

For the test based on the decoding region

Y_{m}

, the left hand side of (29) is lower-bounded by R due to (26). So, if we choose s and Q in such a way that the right hand side of (29) is roughly

R - ϵ

, then

- (1 / n) \log {P_{e}}_{| W_{x_{m}}}

must be smaller than the right hand side of (28) computed for those same s and Q (for otherwise the decision region

Y_{m}

would give a test strictly better than the optimal one). This is obtained by choosing Q, as a function of s, as the minimizer of

- μ (s)

and then selecting s which makes the right hand side of (29) equal to

R - ϵ

(whenever possible). Extracting

μ^{'} (s)

from (29) in terms of

μ (s)

and R and using it in (28), the probability of error for message m is bounded in terms of R. After some tedious technicalities, cf. [13] (Appendix A), we get

- \frac{1}{n} \log {P_{e}}_{| W_{x_{m}}} \leq \sup_{0 < s < 1} [E_{0}^{cc} (s, P) - \frac{s}{1 - s} (R - ϵ)] + o (1)

(31)

where

\begin{matrix} E_{0}^{cc} (s, P) & = \min_{Q} [\frac{1}{s - 1} \sum_{x} P (x) \log \sum_{y} W_{x} {(y)}^{1 - s} Q {(y)}^{s}] \end{matrix}

(32)

\begin{matrix} = \min_{Q} [\frac{s}{1 - s} \sum_{x} P (x) D_{1 - s} (W_{x} ∥ Q)] \end{matrix}

(33)

\begin{matrix} = \frac{s}{1 - s} I_{1 - s} (P, W), \end{matrix}

(34)

the minimum being over distributions Q and

I_{α} (P, W)

being the

α

-mutual information as defined by Csiszár [18]. We thus find the bound, valid for codes with constant composition P

- \frac{1}{n} \log P_{e, \max} \leq \sup_{0 < s < 1} \frac{s}{1 - s} [I_{1 - s} (P, W) - R + ϵ] + o (1) .

(35)
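
As a rough numerical illustration of (31) and (32), the sketch below evaluates E_{0}^{cc} (s, P) and the resulting bound for a classical channel with binary output. It relies on two simplifying assumptions that are not in the paper: the output alphabet has two letters, so Q = (q, 1 - q) can be optimized by a one-dimensional ternary search (the objective is convex in q), and the supremum over s is approximated on a finite grid. Rates are in nats.

```python
import math

def renyi_objective(W, P, q, s):
    """(1/(s-1)) sum_x P(x) log sum_y W_x(y)^(1-s) Q(y)^s with Q = (q, 1-q)."""
    Q = (q, 1.0 - q)
    total = 0.0
    for px, row in zip(P, W):
        total += px * math.log(sum(w ** (1 - s) * qy ** s
                                   for w, qy in zip(row, Q)))
    return total / (s - 1)

def e0_cc(W, P, s, iters=60):
    """Minimum over Q of the objective in (32), found by ternary search on q."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if renyi_objective(W, P, m1, s) < renyi_objective(W, P, m2, s):
            hi = m2
        else:
            lo = m1
    return renyi_objective(W, P, (lo + hi) / 2, s)

def e_sp_cc(W, P, R, grid=400):
    """Grid approximation of sup_{0<s<1} [E_0^cc(s, P) - s/(1-s) R]."""
    best = -math.inf
    for k in range(1, grid):
        s = k / grid
        best = max(best, e0_cc(W, P, s) - s / (1 - s) * R)
    return best

# Hypothetical example: binary symmetric channel, uniform composition.
W = [[0.9, 0.1], [0.1, 0.9]]
P = [0.5, 0.5]
print(e_sp_cc(W, P, 0.2), e_sp_cc(W, P, 0.3))
```

The grid over s slightly underestimates the supremum; a finer grid or a one-dimensional concave maximization would tighten it.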

It is worth pointing out that the chosen Q, which achieves the minimum in the definition of

E_{0}^{cc} (s, P)

, satisfies the constraint (cf. [5] (Equations (9.23), (9.24), and (9.50)), [19] (Corollary 3))

Q (y) = \sum_{x} P (x) V_{x} (y), \forall y \in Y,

(36)

where we define

V_{x} (y)

as

V_{x} (y) = \frac{W_{x}^{1 - s} (y) Q^{s} (y)}{\sum_{y^{'}} W_{x}^{1 - s} (y^{'}) Q^{s} (y^{'})} .

(37)

Note the analogy with the definition of

P_{s}

in (11). So, the chosen Q is such that its tilted mixtures with the distributions

W_{x}

induce Q itself on the output set

Y

. Using the second representation of the error exponents in binary hypothesis testing mentioned in Section 3.1 (extended for independent extractions from non-identical distributions), we observe thus that the chosen Q induces the construction of an auxiliary channel V such that the induced mutual information with input distribution P, say

I (P, V)

, equals

D_{KL} (V ∥ Q | P) = \sum_{x} P (x) D_{KL} (V_{x} ∥ Q) = R - ϵ

. The second proof of the sphere packing bound, which is summarized in the next section, takes this line of reasoning as a starting point.
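
The fixed-point property (36), (37) can be observed directly in a symmetric example. The sketch below assumes a binary symmetric channel with uniform composition P, for which the uniform Q is taken as the minimizer on symmetry grounds (an assumption of this illustration); it builds the tilted channel V, checks that P V = Q, and evaluates I (P, V) = D_{KL} (V ∥ Q | P).

```python
import math

def tilted_channel(W, Q, s):
    """V_x(y) proportional to W_x(y)^(1-s) Q(y)^s, as in (37)."""
    V = []
    for row in W:
        weights = [w ** (1 - s) * q ** s for w, q in zip(row, Q)]
        z = sum(weights)
        V.append([x / z for x in weights])
    return V

def output_distribution(P, V):
    """(P V)(y) = sum_x P(x) V_x(y), the right-hand side of (36)."""
    return [sum(P[x] * V[x][y] for x in range(len(P)))
            for y in range(len(V[0]))]

def conditional_kl(V, Q, P):
    """D_KL(V || Q | P) = sum_x P(x) D_KL(V_x || Q), in nats."""
    return sum(px * sum(v * math.log(v / q) for v, q in zip(row, Q) if v > 0)
               for px, row in zip(P, V))

W = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical channel
P = [0.5, 0.5]
Q = [0.5, 0.5]
s = 0.3

V = tilted_channel(W, Q, s)
induced = output_distribution(P, V)
info_V = conditional_kl(V, Q, P)   # I(P, V), since Q = P V
info_W = conditional_kl(W, Q, P)   # I(P, W), since Q is also P W here
print(induced, info_V, info_W)
```

In this example V is a noisier symmetric channel than W, so I (P, V) < I (P, W), in line with the role of V as a channel whose mutual information sits just below the rate.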

4.2. Haroutunian’s Proof

In the second proof (see [16,17]), one considers the performance of the given coding scheme for channel W when used for an auxiliary channel V with same input and output sets such that

I (P, V) < R

. The converse to the coding theorem implies that the probability of error for channel V is bounded away from zero at rate R, which means that there exists a fixed

ϵ > 0

such that for any blocklength n,

V_{x_{m}} (\bar{Y_{m}}) > ϵ

for at least one m. Using the Stein lemma mentioned before, we deduce that

- \frac{1}{n} \log W_{x_{m}} (\bar{Y_{m}}) \leq D_{KL} (V ∥ W | P) + o (1)

(38)

where now

D_{KL} (V ∥ W | P) = \sum_{x} P (x) \sum_{y} V_{x} (y) \log \frac{V_{x} (y)}{W_{x} (y)} .

(39)

After optimization over V, we deduce that the error exponent for channel W is bounded as

- \frac{1}{n} \log {P_{e}}_{| W_{x_{m}}} \leq \min_{V : I (P, V) \leq R} D_{KL} (V ∥ W | P) + o (1) .

(40)
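
For a binary symmetric channel, the minimization in (40) can be carried out almost by hand, which gives a simple numerical check that it agrees with the Rényi form (31). The sketch below assumes crossover probability p < 1/2, uniform composition P, a restriction of V to binary symmetric channels and of Q to the uniform distribution (both justified here only by symmetry, as assumptions of this illustration), and rates in nats.

```python
import math

LOG2 = math.log(2.0)

def h(d):
    """Binary entropy in nats."""
    return -d * math.log(d) - (1 - d) * math.log(1 - d)

def kl_bin(a, b):
    """D_KL between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def haroutunian_bsc(p, R):
    """min of D_KL(V || W | P) over BSC(delta) with log 2 - h(delta) <= R."""
    if R >= LOG2 - h(p):
        return 0.0            # V = W is already feasible
    lo, hi = p, 0.5
    for _ in range(100):      # bisection on delta in [p, 1/2]
        mid = (lo + hi) / 2
        if LOG2 - h(mid) > R:
            lo = mid          # V still too informative: add noise
        else:
            hi = mid
    return kl_bin(hi, p)

def renyi_form_bsc(p, R, grid=20000):
    """Grid approximation of sup_s [E_0^cc(s, P) - s/(1-s) R], uniform Q."""
    best = -math.inf
    for k in range(1, grid):
        s = k / grid
        e0 = (math.log(p ** (1 - s) + (1 - p) ** (1 - s)) - s * LOG2) / (s - 1)
        best = max(best, e0 - s / (1 - s) * R)
    return best

p, R = 0.1, 0.3               # hypothetical channel and rate below capacity
kl_value = haroutunian_bsc(p, R)
renyi_value = renyi_form_bsc(p, R)
print(kl_value, renyi_value)
```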

We observe that a slightly different presentation (e.g., [17]) avoids the use of the Stein lemma by resorting to the strong converse rather than a weak converse. Indeed, for channel V, the coding scheme will actually incur an error probability

1 - o (1)

, which means that for at least one codeword m we must have

V_{x_{m}} (\bar{Y_{m}}) = 1 - o (1)

. Applying the data processing inequality for the Kullback–Leibler divergence, one thus finds that

V_{x_{m}} (\bar{Y_{m}}) \log \frac{V_{x_{m}} (\bar{Y_{m}})}{W_{x_{m}} (\bar{Y_{m}})} + V_{x_{m}} (Y_{m}) \log \frac{V_{x_{m}} (Y_{m})}{W_{x_{m}} (Y_{m})} \leq n D_{KL} (V ∥ W | P)

(41)

from which

\log W_{x_{m}} (\bar{Y_{m}}) \geq - \frac{n D_{KL} (V ∥ W | P) + 1}{1 + o (1)} .

(42)

So, strong converse can be traded for Stein’s lemma, and this fact (which appears as a detail here) will be seen to be related to a less trivial question.

The bound derived is precisely the same as in the previous section, and for the optimal choice of the channel V, if we define the output distribution

Q = P V

as in (36), then (37) is satisfied for some s (see Equation (19) in [16]). So, we notice that the two proofs actually rely on a comparison between the original channel and equivalent auxiliary channels/distributions. In the first procedure, we start with an auxiliary distribution Q, but we find that the optimal choice of Q is such that the tilted mixtures with the

W_{x}

distributions are the

V_{x}

which give

P V = Q

. In the second procedure, we start with the auxiliary channel V, but we find that the optimal V induces an output distribution Q whose tilted mixtures with the

W_{x}

are the

V_{x}

themselves. It is worth noting that in this second procedure we use a converse for channel V; hidden in this step we are using the output distribution Q induced by V, which we directly use for W in the MIT approach.

These observations point out that while the MIT proof follows the first formulation of the binary hypothesis testing bound in terms of Rényi divergences, Haroutunian’s proof exploits the second formulation based on Kullback–Leibler divergences, but the compared quantities are equivalent. In the classical case, there seems to be no reason to prefer the first procedure given the simplicity of the second one.

5. Classical-Quantum Sphere-Packing Bound

The different behavior of binary hypothesis testing in the quantum case with respect to the classical has a direct impact on the sphere packing bound for classical-quantum channels. Both the MIT and Haroutunian’s approaches can be extended to this setting, but the resulting bounds are different. In particular, since the binary hypothesis testing is correctly handled with the Rényi divergence formulation, the MIT form of the bound extends to what one expects as the right generalization (in particular, it matches known achievability bounds for pure-state channels), while Haroutunian’s form extends to a weaker bound. It was already observed in [20] that the latter gives a trivial bound for all pure state channels, which is a direct consequence of what has already been shown for the simple binary hypothesis testing in the previous section.

It is useful to investigate this weakness at a deeper level in order to clearly see where the problem truly is. Let now

W_{x}

,

x \in X

be general non-commuting density operators, the states of the channel to be studied. Consider then an auxiliary classical-quantum channel with states

V_{x}

and with capacity

C < R

. Again, the converse to the channel coding theorem holds for channel V, which implies that for any decoding rule, for at least one message the probability of error is larger than some fixed positive constant

ϵ

. In particular for the given POVM, for at least one m,

Tr (I - Π_{m}) V_{x_{m}} > ϵ .

(43)

Using the quantum Stein lemma, we deduce

- \frac{1}{n} \log Tr (I - Π_{m}) W_{x_{m}} \leq D_{KL} (V ∥ W | P) + o (1)

(44)

and hence, again as in the classical case,

- \frac{1}{n} \log {P_{e}}_{| W_{x_{m}}} \leq \min_{V : I (P, V) \leq R} D_{KL} (V ∥ W | P) + o (1) .

(45)

In this case as well, one can use a strong converse to replace the Stein lemma with a simpler data processing inequality.

The problem we encounter in this case is that if W is a pure state channel, at rates

R < C

, any auxiliary channel

V \neq W

gives

D_{KL} (V ∥ W | P) = \infty

, so that the bound is trivial for all pure state channels. It is important to observe that this is not due to a weakness in the use of the Stein lemma or of the data processing inequality. In a binary hypothesis test between the pure state

W_{x_{m}}

and a state

V_{x_{m}}

built from a different channel V, one can notice that the POVM

{A, I - A}

with

A = W_{x_{m}}

satisfies

Tr (I - A) V_{x_{m}} = 1 + o (1), Tr (I - A) W_{x_{m}} = 0 .

(46)

So, it is actually impossible to deduce a positive lower bound for

Tr (I - Π_{m}) W_{x_{m}}

using only the fact that

Tr (I - Π_{m}) V_{x_{m}}

is bounded away from zero, or even approaches one.

It is also worth checking what happens with the MIT procedure. All the steps can be extended to the classical-quantum case (see [13] for details) leading to a bound which has the same form as (31) where

E_{0}^{cc} (s, P)

is defined in analogy with (32) as

\begin{matrix} E_{0}^{cc} (s, P) & = \min_{Q} [\frac{1}{s - 1} \sum_{x} P (x) \log Tr W_{x}^{1 - s} Q^{s}] \end{matrix}

(47)

\begin{matrix} = \min_{Q} [\frac{s}{1 - s} \sum_{x} P (x) D_{1 - s} (W_{x} ∥ Q)], \end{matrix}

(48)

the minimum being over all density operators Q, and

D_{1 - s} (\cdot ∥ \cdot)

being the quantum Rényi divergence. However, as far as we know there is no analog of Equations (36) and (37), and the optimizing Q does not induce an auxiliary V such that

I (P, V) = R - ϵ

.

6. Auxiliary Channels and Strong Converses

We have presented the two main approaches to sphere packing as different procedures which are equivalent in the classical case but not in the classical-quantum case. However, it is actually possible to consider the two approaches as particular instances of one general approach where the channel W is compared to an auxiliary channel V, since the auxiliary distribution/state Q can be considered as a channel with constant

V_{x} = Q

. This principle is very well described in [21], where it is shown that essentially all known converse bounds in channel coding can be cast in this framework.

According to this interpretation, the starting point in Haroutunian’s proof is general enough to include the MIT approach as a special case. So, the weakness of the method in the classical-quantum case must be hidden in one of the intermediate steps. It is not difficult to notice that the key point is how the (possibly strong) converse is used in Haroutunian’s proof. The general auxiliary channel V is only assumed to have capacity

C < R

, and the strongest possible converse for V which can be used is of the simple form

P_{e} = 1 - o (1)

, which is good enough in the classical case. In the MIT proof, instead, the auxiliary channel is such that

C = 0

, so that the strong converse takes another simple form,

P_{e} \geq 1 - e^{- n R}

. The critical point is that in the classical-quantum setting a converse of the form

P_{e} = 1 - o (1)

for V does not lead to a lower bound on

P_{e}

for W in general. What is needed is a sufficiently fast exponential convergence to 1 of

P_{e}

for channel V, which essentially suggests that V should be chosen with capacity not too close to R, and that the exact strong converse exponent for V should be used.
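
The strong converse for an auxiliary channel with C = 0 is just the counting argument behind (26), and it can be checked by enumeration in the classical case. The sketch below assumes a constant channel V_{x} = Q with a hypothetical binary Q and an arbitrary partition of the output sequences into M decoding regions; since the regions' probabilities under the product distribution sum to one, the smallest is at most 1 / M = e^{- n R}.

```python
import math
from itertools import product

def product_prob(y, q):
    """Probability of the sequence y under the product distribution of q."""
    return math.prod(q[yi] for yi in y)

n, M = 4, 4                   # blocklength and number of messages
q = (0.3, 0.7)                # hypothetical output distribution Q
outputs = list(product((0, 1), repeat=n))

# Arbitrary decoding regions: any partition of Y^n works for the argument.
regions = [[y for i, y in enumerate(outputs) if i % M == m] for m in range(M)]

# With V_x = Q, the success probability of message m is Q(Y_m).
success = [sum(product_prob(y, q) for y in region) for region in regions]
pe_max = max(1.0 - sm for sm in success)
rate = math.log(M) / n
print(sum(success), pe_max, 1.0 - math.exp(-n * rate))
```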

The natural question to ask at this point is what the optimal (here we mean optimal memoryless channel for bounding the error exponent in the asymptotic regime) auxiliary channel is when the exact exponent of the strong converse is used. At high rates, the question is not really meaningful for all those cases where the known versions of the sphere packing bound coincide with achievability results; that is, for classical channels and for pure state channels [9]. However, in the remaining cases (i.e., in the low rate region for the mentioned channels or in the whole range of rates

0 < R < C

for general non-commuting mixed-state channels), the question is legitimate. In the classical case, since the choice of an (optimal) auxiliary channel with

C = 0

or

C = R^{-}

leads to the same result, one might expect that any other intermediate choice would give the same result. This can indeed be proved by noticing that any version of the sphere packing bound derived with the considered scheme, independently of the auxiliary channel used, also holds when list decoding is considered for any fixed list size L (see [7] for details, or notice that the converse to the coding theorem for V would also hold in this setting). Since the bound obtained with the mentioned choices of auxiliary Q and V is achievable at any rate R when list decoding is used with sufficiently large list size L (see [3] (Prob. 5.20)), no other auxiliary channel can give a better bound.

For classical-quantum channels, instead, the question is perhaps not trivial; it is worth pointing out that even the exact strong converse exponent has been determined only very recently [14]. What is very interesting is that while in the classical case the strong converse exponent for

R > C

is expressed in terms of the Rényi divergence [22,23] (similarly to the error exponents for

R < C

), for classical-quantum channels, the strong converse exponents are expressed in terms of the so-called “sandwiched” Rényi divergence defined by

\tilde{D}_{α} (ρ ∥ σ) = \frac{1}{α - 1} \log Tr {(σ^{\frac{1 - α}{2 α}} ρ σ^{\frac{1 - α}{2 α}})}^{α} .

(49)
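
To get a feeling for how (49) differs from the Rényi divergence \frac{1}{α - 1} \log Tr ρ^{α} σ^{1 - α} used in (48), the sketch below evaluates both for real symmetric 2x2 density matrices, using a closed-form eigendecomposition. The matrices are hypothetical full-rank examples. For commuting states the two quantities coincide; in general the sandwiched quantity is known not to exceed the other one.

```python
import math

def sym_fun(a, f):
    """Apply f to the eigenvalues of a real symmetric 2x2 matrix a."""
    (p, q), (_, r) = a
    mean, half = (p + r) / 2, math.hypot((p - r) / 2, q)
    theta = 0.5 * math.atan2(2 * q, p - r)   # angle of the top eigenvector
    c, s = math.cos(theta), math.sin(theta)
    lp, lm = f(mean + half), f(mean - half)
    off = (lp - lm) * c * s
    return [[lp * c * c + lm * s * s, off], [off, lp * s * s + lm * c * c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def petz(rho, sigma, alpha):
    """(1/(alpha-1)) log Tr rho^alpha sigma^(1-alpha)."""
    ra = sym_fun(rho, lambda x: x ** alpha)
    sb = sym_fun(sigma, lambda x: x ** (1 - alpha))
    return math.log(trace(matmul(ra, sb))) / (alpha - 1)

def sandwiched(rho, sigma, alpha):
    """Sandwiched Renyi divergence, as in (49)."""
    t = (1 - alpha) / (2 * alpha)
    st = sym_fun(sigma, lambda x: x ** t)
    m = matmul(matmul(st, rho), st)
    b = (m[0][1] + m[1][0]) / 2              # symmetrize rounding errors
    m = [[m[0][0], b], [b, m[1][1]]]
    return math.log(trace(sym_fun(m, lambda x: x ** alpha))) / (alpha - 1)

rho = [[0.7, 0.2], [0.2, 0.3]]       # hypothetical full-rank states
sigma = [[0.4, -0.1], [-0.1, 0.6]]
for alpha in (0.5, 2.0):
    print(alpha, sandwiched(rho, sigma, alpha), petz(rho, sigma, alpha))
```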

The problem to study would thus be more or less as follows: Consider an auxiliary channel V with capacity

C < R

and evaluate its strong converse exponent in terms of sandwiched Rényi divergences. Take this exponent as the exponential decay rate of the probability of error under hypothesis

V_{x_{m}}

in a test between

W_{x_{m}}

and

V_{x_{m}}

, where

Π_{m}

is the operator in favor of

W_{x_{m}}

and

I - Π_{m}

is the one in favor of

V_{x_{m}}

. Then, deduce a lower bound for the probability of error under hypothesis

W_{x_{m}}

using the standard binary hypothesis testing bound in terms of Rényi divergences. It is not entirely clear to this author that the optimal auxiliary channel should necessarily always be one such that

C = 0

, as used up to now. Since for non-commuting mixed-state channels the current known form of sphere packing bound is not yet matched by any achievability result, one cannot exclude the possibility that it is not the tightest possible form.

Acknowledgments

This research was supported by the Italian Ministry of Education, University and Research (MIUR) under grant PRIN 2015 D72F1600079000.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1990.
3. Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968.
4. Shannon, C.E. Certain results in coding theory for noisy channels. Inf. Control 1957, 1, 6–25.
5. Fano, R.M. Transmission of Information: A Statistical Theory of Communication; Wiley: New York, NY, USA, 1961.
6. Gallager, R.G. A Simple Derivation of the Coding Theorem and Some Applications. IEEE Trans. Inf. Theory 1965, 11, 3–18.
7. Shannon, C.E.; Gallager, R.G.; Berlekamp, E.R. Lower Bounds to Error Probability for Coding in Discrete Memoryless Channels. I. Inf. Control 1967, 10, 65–103.
8. Shannon, C.E.; Gallager, R.G.; Berlekamp, E.R. Lower Bounds to Error Probability for Coding in Discrete Memoryless Channels. II. Inf. Control 1967, 10, 522–552.
9. Burnashev, M.V.; Holevo, A.S. On the Reliability Function for a Quantum Communication Channel. Probl. Peredachi Inf. 1998, 34, 3–15.
10. Holevo, A.S. Reliability Function of General Classical-Quantum Channel. IEEE Trans. Inf. Theory 2000, 46, 2256–2261.
11. Dalai, M. Sphere Packing Bound for Quantum Channels. In Proceedings of the IEEE International Symposium on Information Theory, Cambridge, MA, USA, 1–6 July 2012; pp. 160–164.
12. Dalai, M. Lower Bounds on the Probability of Error for Classical and Classical-Quantum Channels. IEEE Trans. Inf. Theory 2013, 59, 8027–8056.
13. Dalai, M.; Winter, A. Constant Composition in the Sphere Packing Bound for Classical-Quantum Channels. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 151–155.
14. Mosonyi, M.; Ogawa, T. Strong converse exponent for classical-quantum channel coding. arXiv, 2014; arXiv:1409.3562.
15. Audenaert, K.; Nussbaum, M.; Szkoła, A.; Verstraete, F. Asymptotic Error Rates in Quantum Hypothesis Testing. Commun. Math. Phys. 2008, 279, 251–283.
16. Haroutunian, E.A. Estimates of the Error Exponents for the semi-continuous memoryless channel. Probl. Peredachi Inf. 1968, 4, 37–48. (In Russian)
17. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Academic Press: Cambridge, MA, USA, 1981.
18. Csiszár, I. Generalized Cutoff Rates and Rényi’s Information Measures. IEEE Trans. Inf. Theory 1995, 41, 26–34.
19. Blahut, R.E. Hypothesis testing and Information theory. IEEE Trans. Inf. Theory 1974, 20, 405–417.
20. Winter, A. Coding Theorems of Quantum Information Theory. Ph.D. Thesis, Universität Bielefeld, Bielefeld, Germany, July 1999.
21. Polyanskiy, Y.; Poor, H.; Verdú, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
22. Arimoto, S. On the converse to the coding theorem for discrete memoryless channels. IEEE Trans. Inf. Theory 1973, 19, 357–359.
23. Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, NY, USA, 29 September–1 October 2010; pp. 1327–1333.

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
