Article

Achievable Information Rates for Probabilistic Amplitude Shaping: An Alternative Approach via Random Sign-Coding Arguments

by Yunus Can Gültekin *, Alex Alvarado and Frans M. J. Willems

Information and Communication Theory Lab, Signal Processing Systems Group, Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands

* Author to whom correspondence should be addressed.
Entropy 2020, 22(7), 762; https://doi.org/10.3390/e22070762
Submission received: 3 April 2020 / Revised: 6 July 2020 / Accepted: 8 July 2020 / Published: 11 July 2020
(This article belongs to the Special Issue Information Theory for Communication Systems)

Abstract: Probabilistic amplitude shaping (PAS) is a coded modulation strategy in which constellation shaping and channel coding are combined. PAS has attracted considerable attention in both wireless and optical communications. Achievable information rates (AIRs) of PAS have been investigated in the literature using Gallager’s error exponent approach. In particular, it has been shown that PAS achieves the capacity of the additive white Gaussian noise channel (Böcherer, 2018). In this work, we revisit the capacity-achieving property of PAS and derive AIRs using weak typicality. Our objective is to provide alternative proofs based on random sign-coding arguments that are as constructive as possible. Accordingly, in our proofs, only some signs of the channel inputs are drawn from a random code, while the remaining signs and amplitudes are produced constructively. We consider both symbol-metric and bit-metric decoding.

1. Introduction

Coded modulation (CM) refers to the combined design of forward error correction (FEC) codes and high-order modulation formats to reliably transmit more than one bit per channel use. Examples of CM strategies include multilevel coding (MLC) [1,2], in which each address bit of the signal point is protected by an individual binary FEC code, and trellis CM [3], which combines the functions of a trellis-based channel code and a modulator. Among the many CM strategies, bit-interleaved CM (BICM) [4,5], which combines a high-order modulation format with a binary FEC code using a binary labeling strategy and uses bit-metric decoding (BMD) at the receiver, is the de facto standard for CM. BICM is included in multiple communication standards such as IEEE 802.11 [6] and DVB-S2 [7]. BICM is also currently the de facto CM alternative for fiber-optical communications.
Proposed in [8], probabilistic amplitude shaping (PAS) integrates constellation shaping into existing BICM systems. The shaping gap that exists for the additive white Gaussian noise (AWGN) channel [9] (Ch. 9) can be closed with PAS. To this end, an amplitude shaping block converts binary information strings into shaped amplitude sequences in an invertible manner. Then, a systematic FEC code produces parity bits by encoding the binary labels of these amplitudes. These parity bits are used to select the signs, and the combination of the amplitudes and the signs, i.e., the probabilistically shaped channel inputs, is transmitted over the channel. PAS has attracted considerable attention in fiber-optical communications due to its ability to provide rate adaptivity [10,11].
Achievable information rates (AIRs) of PAS have been investigated in the literature [12,13,14]. It has been shown that the capacity of the AWGN channel can be achieved with PAS, e.g., in [13] (Example 10.4). The achievability proofs in the literature are based on Gallager’s error exponent approach [15] (Ch. 5) or on strong typicality [16] (Ch. 1).
In this work, we provide a random sign-coding framework based on weak typicality that contains the achievability proofs relevant for the PAS architecture. We also revisit the capacity-achieving property of PAS for the AWGN channel. As explained in Section 2.5, the first main contribution of this paper is a framework that combines the constructive approach to amplitude shaping with randomly chosen error-correcting codes, where the randomness is concentrated only in the choice of the signs. The second contribution is a unifying framework of achievability proofs that brings together PAS results that are somewhat scattered in the literature, using a single proof technique, which we call the random sign-coding arguments.
This work is organized as follows. In Section 2, we briefly summarize the related literature on CM, AIRs, and PAS, and state our contribution. In Section 3, we provide background on typical sequences and define a modified (weakly) typical set. In Section 4, we explain the random sign-coding setup. Finally, in Section 5, we provide random sign-coding arguments to derive AIRs for PAS and, consequently, show that PAS achieves the capacity of a discrete-input memoryless channel with a symmetric capacity-achieving distribution. Conclusions are drawn in Section 6.

2. Related Work and Our Contribution

2.1. Notation

Capital letters $X$ are used to denote random variables, while lower case letters $x$ are used to denote their realizations. Underlined capital and lower case letters $\underline{X}$ and $\underline{x}$ denote random vectors and their realizations, respectively. Boldface capital and lower case letters $\mathbf{X}$ and $\mathbf{x}$ denote collections of random variables and their realizations, respectively. Underlined boldface capital and lower case letters $\underline{\mathbf{X}}$ and $\underline{\mathbf{x}}$ denote collections of random vectors and their realizations, respectively. Element-wise multiplication of $\underline{x}$ and $\underline{y}$ is denoted by $\underline{x}\,\underline{y}$. Calligraphic letters $\mathcal{X}$ represent sets, while $\mathcal{X}\mathcal{Y} = \{xy : x \in \mathcal{X}, y \in \mathcal{Y}\}$. We denote by $\mathcal{X}^n$ the $n$-fold Cartesian product of $\mathcal{X}$ with itself, while $\mathcal{X} \times \mathcal{Y}$ is the Cartesian product of $\mathcal{X}$ and $\mathcal{Y}$. Probability density and mass functions over $\mathcal{X}$ are denoted by $p(x)$. We use $\mathbb{1}[\cdot]$ to denote the indicator function, which is one when its argument is true and zero otherwise. The entropy of $X$ is denoted by $H(X)$ (in bits) and the expected value of $X$ by $\mathbb{E}[X]$.

2.2. Achievable Information Rates

For a memoryless channel characterized by an input alphabet $\mathcal{X}$, input distribution $p(x)$, and channel law $p(y|x)$, the maximum AIR is the mutual information (MI) $I(X;Y)$ of the channel input $X$ and output $Y$. Consequently, the capacity of this channel is defined as $I(X;Y)$ maximized over all possible input distributions $p(x)$, typically under an average power constraint, e.g., in [9] (Section 9.1). The MI can be achieved, e.g., with MLC and multi-stage decoding [1,2].

In BICM systems, channel inputs are uniquely labeled with $\log_2 |\mathcal{X}| = (m+1)$-bit binary strings. Here, we assume that $|\mathcal{X}|$ is an integer power of two. At the transmitter, the output of a binary FEC code is mapped to channel inputs using this labeling strategy. At the receiver, BMD is employed, i.e., the binary labels $\mathbf{C} = (C_1, C_2, \dots, C_{m+1})$ are assumed to be independent, and consequently, the symbol-wise decoding metric is written as the product of bit metrics:

$$q(x, y) = \prod_{i=1}^{m+1} q_i(c_i, y). \qquad (1)$$

Since the metric in (1) is in general not proportional to $p(y|x)$, i.e., there is a mismatch between the actual channel law and the one assumed at the receiver, this setup is called mismatched decoding.

Different AIRs have been derived for this mismatched decoding setup. One of these is the generalized MI (GMI) [17,18]:

$$\mathrm{GMI}(p(x)) = \max_{s \ge 0} \mathbb{E}\left[ \log_2 \frac{q(X,Y)^s}{\sum_{x \in \mathcal{X}} p(x)\, q(x,Y)^s} \right], \qquad (2)$$

which reduces to [19] (Thm. 4.11, Coroll. 4.12), [20]:

$$\mathrm{GMI}(p(c_1)p(c_2)\cdots p(c_{m+1})) = \sum_{i=1}^{m+1} I(C_i; Y) \qquad (3)$$

when the bit levels are independent at the transmitter, i.e., $p(x) = p(\mathbf{c}) = p(c_1)p(c_2)\cdots p(c_{m+1})$ where $\mathbf{c} = (c_1, c_2, \dots, c_{m+1})$, and:

$$q_i(c_i, y) = p(y|c_i). \qquad (4)$$

The rate (3) is achievable for both uniform and shaped bit levels [5,21]. The problem of computing the bit-level distributions that maximize the GMI in (3) was shown to be nonconvex in [22]. The value of $s$ that maximizes (2) and yields (3) is $s = 1$.
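As a concrete numerical illustration of (3), the following minimal Python sketch (ours, not part of the original analysis) evaluates $\sum_i I(C_i;Y)$ for uniform 4-ASK over the AWGN channel; the SNR, the Gray labeling, and the fine output discretization are illustrative assumptions.

```python
# Minimal sketch: sum_i I(C_i;Y) in (3) for uniform 4-ASK over AWGN (Gray labeling).
# The output is finely discretized so that all quantities are finite sums.
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0])                 # 4-ASK alphabet
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # binary reflected Gray code
snr_db, n_bins = 6.0, 2000                           # illustrative assumptions
sigma2 = np.mean(X**2) / 10**(snr_db / 10)           # noise variance for E[X^2]/sigma^2

y = np.linspace(X.min() - 6*np.sqrt(sigma2), X.max() + 6*np.sqrt(sigma2), n_bins)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None])**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
px = np.full(4, 0.25)
py = px @ pyx                                        # output density p(y)

rate = 0.0
for i in range(2):                                   # bit levels C_1, C_2
    for b in (0, 1):
        idx = labels[:, i] == b
        pb = px[idx].sum()                           # p(C_i = b)
        py_b = px[idx] @ pyx[idx] / pb               # p(y | C_i = b)
        # I(C_i;Y) contribution: p(b) * integral p(y|b) log2( p(y|b)/p(y) ) dy
        rate += pb * np.sum(py_b * np.log2(np.maximum(py_b, 1e-300) / py)) * dy
print(f"sum_i I(C_i;Y) at {snr_db} dB: {rate:.3f} bit/1D")
```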
Another AIR for mismatched decoding is the LM rate (a lower bound on the mismatch capacity) [18,23]:

$$\mathrm{LM}(p(x)) = \max_{s \ge 0,\, r(\cdot)} \mathbb{E}\left[ \log_2 \frac{q(X,Y)^s\, r(X)}{\sum_{x \in \mathcal{X}} p(x)\, q(x,Y)^s\, r(x)} \right], \qquad (5)$$

where $r(\cdot)$ is a real-valued cost function defined on $\mathcal{X}$. The expectations in (2) and (5) are taken with respect to $p(x,y)$.

When there is dependence among the bit levels, i.e., $p(x) = p(\mathbf{c}) \ne p(c_1)p(c_2)\cdots p(c_{m+1})$, the rate [24,25]:

$$R_{\mathrm{BMD}}(p(x)) = H(\mathbf{C}) - \sum_{i=1}^{m+1} H(C_i|Y) \qquad (6)$$

has been shown to be achievable by BMD for any joint input distribution $p(\mathbf{c}) = p(c_1, c_2, \dots, c_{m+1})$. In [24,25], the achievability of (6) was derived using random coding arguments based on strong typicality [16] (Ch. 1). Later, in [26] (Lemma 1), it was shown that (6) is an instance of the so-called LM rate (5) for $s = 1$, the symbol decoding metric (1), the bit decoding metrics (4), and the cost function:

$$r(c_1, c_2, \dots, c_{m+1}) = \frac{\prod_{i=1}^{m+1} p(c_i)}{p(c_1, c_2, \dots, c_{m+1})}. \qquad (7)$$

We note here that $R_{\mathrm{BMD}}$ in (6) can be negative, as discussed in [26] (Section II-B). In such cases, $R_{\mathrm{BMD}}$ cannot be considered an achievable rate. To avoid this, $R_{\mathrm{BMD}}$ is defined as the maximum of (6) and zero in [26] (Equation (1)).
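To make (6) concrete, the following sketch (again ours, with the same illustrative AWGN and labeling assumptions as in the previous snippet) evaluates $\max(H(\mathbf{C}) - \sum_i H(C_i|Y),\, 0)$ for a shaped, and hence bit-level-dependent, 4-ASK input.

```python
# Sketch: R_BMD in (6) for 4-ASK with a shaped (bit-level-dependent) input distribution.
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0])
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray labeling
px = np.array([0.15, 0.35, 0.35, 0.15])              # shaped input (illustrative)
snr_db, n_bins = 6.0, 2000
sigma2 = px @ X**2 / 10**(snr_db / 10)

y = np.linspace(-9, 9, n_bins); dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None])**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
py = px @ pyx

H_C = -np.sum(px * np.log2(px))                      # H(C) = H(X), labeling is bijective
r_bmd = H_C
for i in range(2):                                   # bit levels C_1, C_2
    H_Ci_Y = 0.0
    for b in (0, 1):
        pb_y = px[labels[:, i] == b] @ pyx[labels[:, i] == b]   # p(C_i=b, y)
        post = pb_y / py                                        # p(C_i=b | y)
        H_Ci_Y -= np.sum(pb_y * np.log2(np.maximum(post, 1e-300))) * dy
    r_bmd -= H_Ci_Y
print(f"R_BMD = max(H(C) - sum_i H(C_i|Y), 0) = {max(r_bmd, 0):.3f} bit/1D")
```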

2.3. Probabilistic Amplitude Shaping: Model

PAS [8] is a capacity-achieving CM strategy in which constellation shaping and FEC coding are combined as shown in Figure 1. In PAS, an amplitude shaping block first maps $k$-bit information strings to $n$-amplitude shaped sequences $\underline{a} = (a_1, a_2, \dots, a_n)$ in an invertible manner. These amplitudes are drawn from a $2^m$-ary alphabet $\mathcal{A}$. The amplitude shaping block can be realized using constant composition distribution matching [27], multiset-partition distribution matching [28], shell mapping [29], enumerative sphere shaping [30], etc.
After the $n$ amplitudes are generated, the binary labels $\underline{c}_1 \underline{c}_2 \cdots \underline{c}_m$ of the amplitudes $\underline{a}$ and an additional $\gamma n$-bit information string $\underline{s}_{\mathrm{i}} = (s_1, s_2, \dots, s_{\gamma n})$ are fed to a rate-$(m+\gamma)/(m+1)$ systematic FEC encoder. The encoder produces $(1-\gamma)n$ parity bits $\underline{s}_{\mathrm{p}} = (s_{\gamma n + 1}, s_{\gamma n + 2}, \dots, s_n)$. The additional data bits $\underline{s}_{\mathrm{i}}$ and the parity bits $\underline{s}_{\mathrm{p}}$ are used as the signs $\underline{s} = (s_1, s_2, \dots, s_n)$ for the amplitudes $\underline{a}$. Finally, the probabilistically shaped channel inputs $\underline{x} = \underline{s}\,\underline{a}$ are transmitted over the channel. Here, $\gamma$ is the rate of the additional information in bits per symbol (bit/1D) or, equivalently, the fraction of signs that are selected directly by data bits. The transmission rate of PAS is $R = k/n + \gamma$ bit/1D.
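The following toy transmitter sketch (ours; the fixed-composition shaper and the random code producing the sign parities are illustrative stand-ins, not the components of [8]) shows this data flow for $\gamma = 0$: a message index selects a shaped amplitude sequence, and parity bits computed from the amplitude labels select the signs.

```python
# Toy PAS transmitter sketch (hypothetical components, gamma = 0 for simplicity).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 6                                           # block length (illustrative)
comp = (4, 2)                                   # composition: four 1's, two 3's (shaped)

# Shaper: enumerate all sequences with this fixed composition; index = message m_a.
seqs = sorted(set(itertools.permutations([1]*comp[0] + [3]*comp[1])))
k = int(np.floor(np.log2(len(seqs))))           # k information bits per n amplitudes
print(f"{len(seqs)} shaped sequences; shaping rate k/n = {k}/{n} bit/1D")

def pas_encode(m_a, G):
    a = np.array(seqs[m_a])                     # amplitude sequence a(m_a)
    b = (a == 3).astype(int)                    # 1-bit label per amplitude (m = 1)
    parity = (b @ G) % 2                        # parity bits from a random linear code
    s = 1 - 2*parity                            # parity bits -> signs in {+1, -1}
    return s * a                                # channel input x = s * a

G = rng.integers(0, 2, size=(n, n))             # random binary matrix for the sign part
print(pas_encode(5, G))
```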

2.4. Probabilistic Amplitude Shaping: Achievable Rates

Based on Gallager’s error exponent approach [15] (Ch. 5), AIRs of PAS were investigated in [12,13,14]. In [12], a random code ensemble was considered from which the channel inputs $\underline{x}$ were drawn. Then, the AIR in [12] (Equations (32)–(34)) was derived for a general memoryless decoding metric $q(x,y)$. It was shown that by properly selecting $q(x,y)$, both $I(X;Y)$ and the rate (6) can be recovered from the derived AIR, and consequently, they can be achieved with PAS.
Computing error exponents for PAS was also the main concern of the work presented in [13] (Ch. 10). The difference from [12] was in the random coding setup. In [13] (Ch. 10), a random code ensemble was considered from which only the signs $\underline{s}$ of the channel inputs were drawn at random. We call this the random sign-coding setup. The error exponent [13] (Equation (10.42)) was then derived, again for a general memoryless decoding metric. Error exponents of PAS have also been examined based on the joint source-channel coding (JSCC) setup in [14,31]. Random sign-coding was considered in [14,31], but only with symbol-metric decoding (SMD) and only for the specific case where $\gamma = 0$.

2.5. Our Contribution

In this work, we derive AIRs of PAS in a random sign-coding framework based on weak typicality [9] (Section 3.1, Section 7.6 and Section 15.2). We first consider basic sign-coding, in which the amplitudes of the channel inputs are generated constructively while the signs are drawn from a randomly generated code. Basic sign-coding corresponds to PAS with $\gamma = 0$. Then, we consider modified sign-coding, in which only some of the signs are drawn from the random code while the remaining signs are chosen directly by information bits. Modified sign-coding corresponds to PAS with $0 < \gamma < 1$. We compute AIRs for both SMD and BMD.
Our first objective is to provide alternative proofs of achievability in which the codes are generated as constructively as possible. In our random sign-coding experiment, both the amplitude sequences ($\underline{a}$) and the sign sequence parts ($\underline{s}_{\mathrm{i}}$) that carry information bits are constructively produced, and only the remaining signs ($\underline{s}_{\mathrm{p}}$) are randomly generated, as illustrated in Figure 2. In most proofs of Shannon’s channel coding theorem, the channel input sequences ($\underline{x}$) are drawn at random, and the existence of a good code is demonstrated. Therefore, these proofs are not constructive and cannot be used to identify good codes, as discussed, e.g., in [32] (Section I) and the references therein. On the other hand, in our proofs using random sign-coding arguments, it is self-evident how at least a part of the code should be constructed. Our second objective is to provide a unified framework in which all possible PAS scenarios are considered, i.e., SMD or BMD at the receiver with $0 \le \gamma < 1$, and the corresponding AIRs are determined using a single technique, i.e., the random sign-coding argument.
Note that our approach differs from the random sign-coding setup considered in [13,14], where all signs ($\underline{s}_{\mathrm{i}}$ and $\underline{s}_{\mathrm{p}}$) were generated randomly, which was called partially systematic encoding in [13] (Ch. 10). We will show later that only $\underline{s}_{\mathrm{p}}$ needs to be chosen randomly. Furthermore, we define a special type of typicality ($\mathcal{B}$-typicality; see Definition 1 below) that allows us to avoid the mismatched JSCC approach of [14].

3. Preliminaries

3.1. Memoryless Channels

We consider communication over a memoryless channel with discrete input $X \in \mathcal{X}$ and discrete output $Y \in \mathcal{Y}$. The channel law is given by:

$$p(\underline{y}|\underline{x}) = \prod_{i=1}^{n} p(y_i|x_i). \qquad (8)$$

Later, in Example 1, we will also discuss the AWGN channel $Y = X + Z$, where $Z$ is zero-mean Gaussian with variance $\sigma^2$. In this case, we assume that the channel output $Y$ is a quantized version of the continuous channel output $X + Z$. Furthermore, we assume that this quantization has a resolution high enough that the discrete-output channel is an accurate model for the underlying continuous-output channel. Therefore, the achievability results we will obtain for discrete memoryless channels carry over to the discrete-input AWGN channel.
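As a numerical sanity check of this modeling assumption (our own experiment, not from the original text), the following sketch computes $I(X;Y)$ for uniform 4-ASK when the AWGN output is quantized with an increasingly fine uniform quantizer; the values saturate quickly as the resolution grows.

```python
# Sketch: mutual information of the quantized-output AWGN channel vs. quantizer resolution.
from math import erf, sqrt
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0]); px = np.full(4, 0.25)
sigma2 = 1.0                                       # illustrative noise variance
cdf = lambda t: 0.5 * (1 + erf(t / sqrt(2)))       # standard Gaussian CDF

def mi_quantized(n_bins):
    edges = np.linspace(-12, 12, n_bins + 1)
    # P(Y in bin j | X = x) via Gaussian CDF differences
    P = np.array([[cdf((edges[j+1]-x)/np.sqrt(sigma2)) - cdf((edges[j]-x)/np.sqrt(sigma2))
                   for j in range(n_bins)] for x in X])
    py = px @ P
    mask = (P > 0) & (py[None, :] > 0)
    return np.sum((px[:, None] * P)[mask] * np.log2((P / py[None, :])[mask]))

for n_bins in (8, 32, 128, 512):
    print(f"{n_bins:4d} bins: I(X;Y) ~ {mi_quantized(n_bins):.4f} bit")
```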

3.2. Typical Sequences

We will provide achievability proofs based on weak typicality. In this section, which is based on [9] (Section 3.1, Section 7.6, and Section 15.2), we formally define weak typicality and list the properties that will be used in this paper.
Let $\varepsilon > 0$ and let $n$ be a positive integer. Consider the random variable $X$ with probability distribution $p(x)$. Then, the (weakly) typical set $A_\varepsilon^n(X)$ of length-$n$ sequences with respect to $p(x)$ is defined as:

$$A_\varepsilon^n(X) \triangleq \left\{ \underline{x} \in \mathcal{X}^n : \left| -\frac{1}{n} \log_2 p(\underline{x}) - H(X) \right| \le \varepsilon \right\}, \qquad (9)$$

where:

$$p(\underline{x}) \triangleq \prod_{i=1}^{n} p(x_i). \qquad (10)$$

The cardinality of the typical set $A_\varepsilon^n(X)$ satisfies [9] (Thm. 3.1.2):

$$(1-\varepsilon)\, 2^{n(H(X)-\varepsilon)} \overset{(a)}{\le} |A_\varepsilon^n(X)| \overset{(b)}{\le} 2^{n(H(X)+\varepsilon)}, \qquad (11)$$

where (a) holds for $n$ sufficiently large and (b) holds for all $n$. For $\underline{x} \in A_\varepsilon^n(X)$, the probability of occurrence can be bounded as [9] (Equation (3.6)):

$$2^{-n(H(X)+\varepsilon)} \le p(\underline{x}) \le 2^{-n(H(X)-\varepsilon)}. \qquad (12)$$
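The following short simulation (ours) illustrates the membership test in (9) and the asymptotic equipartition property behind it: sequences drawn i.i.d. from $p(x)$ fall in $A_\varepsilon^n(X)$ with probability approaching one; the alphabet, $n$, and $\varepsilon$ are illustrative.

```python
# Sketch: empirical fraction of i.i.d. sequences that are weakly typical per (9).
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.7, 0.2, 0.1])                 # p(x) on a ternary alphabet
H = -np.sum(p * np.log2(p))                   # H(X)
n, eps, trials = 200, 0.1, 10000

x = rng.choice(3, size=(trials, n), p=p)
neg_log = -np.log2(p[x]).sum(axis=1) / n      # -(1/n) log2 p(x_seq)
typical = np.abs(neg_log - H) <= eps
print(f"H(X) = {H:.3f}; fraction of typical sequences ~ {typical.mean():.3f}")
```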
The idea of typical sets can be generalized to pairs of $n$-sequences. Now, consider the pair of random variables $(X, Y)$ with probability distribution $p(x,y)$. Then, the typical set $A_\varepsilon^n(XY)$ of pairs of length-$n$ sequences with respect to $p(x,y)$ is defined as:

$$A_\varepsilon^n(XY) \triangleq \Big\{ (\underline{x}, \underline{y}) \in \mathcal{X}^n \times \mathcal{Y}^n : \left| -\tfrac{1}{n} \log_2 p(\underline{x}) - H(X) \right| \le \varepsilon,\ \left| -\tfrac{1}{n} \log_2 p(\underline{y}) - H(Y) \right| \le \varepsilon,\ \left| -\tfrac{1}{n} \log_2 p(\underline{x}, \underline{y}) - H(X,Y) \right| \le \varepsilon \Big\}, \qquad (13)$$

where:

$$p(\underline{x}, \underline{y}) \triangleq \prod_{i=1}^{n} p(x_i, y_i), \qquad (14)$$

and where $p(x)$ and $p(y)$ are the marginal distributions corresponding to $p(x,y)$. The cardinality of the typical set $A_\varepsilon^n(XY)$ satisfies [9] (Thm. 7.6.1):

$$|A_\varepsilon^n(XY)| \le 2^{n(H(X,Y)+\varepsilon)} \qquad (15)$$

for all $n$. For $(\underline{x}, \underline{y}) \in A_\varepsilon^n(XY)$, the probability of occurrence can be bounded similarly to (12) as:

$$2^{-n(H(X,Y)+\varepsilon)} \le p(\underline{x}, \underline{y}) \le 2^{-n(H(X,Y)-\varepsilon)}. \qquad (16)$$

Along the same lines, joint typicality can be extended to collections of $n$-sequences $(\underline{X}_1, \underline{X}_2, \dots, \underline{X}_m)$, and the corresponding typical set $A_\varepsilon^n(X_1 X_2 \cdots X_m)$ can be defined similar to how (9) was extended to (13). Then, for $(\underline{x}_1, \underline{x}_2, \dots, \underline{x}_m) \in A_\varepsilon^n(X_1 X_2 \cdots X_m)$, the probability of occurrence can be bounded similarly to (16) as:

$$2^{-n(H(\mathbf{X})+\varepsilon)} \le p(\underline{x}_1, \underline{x}_2, \dots, \underline{x}_m) \le 2^{-n(H(\mathbf{X})-\varepsilon)}, \qquad (17)$$

where $\mathbf{X} = (X_1, X_2, \dots, X_m)$.
Finally, we fix $\underline{x}$. The conditional (weakly) typical set $A_\varepsilon^n(Y|\underline{x})$ of length-$n$ sequences is defined as:

$$A_\varepsilon^n(Y|\underline{x}) \triangleq \left\{ \underline{y} : (\underline{x}, \underline{y}) \in A_\varepsilon^n(XY) \right\}. \qquad (18)$$

In other words, $A_\varepsilon^n(Y|\underline{x})$ is the set of all $\underline{y}$ sequences that are jointly typical with $\underline{x}$. For $\underline{x} \in A_\varepsilon^n(X)$ and for sufficiently large $n$, the cardinality of the conditional typical set $A_\varepsilon^n(Y|\underline{x})$ satisfies [9] (Thm. 15.2.2):

$$|A_\varepsilon^n(Y|\underline{x})| \le 2^{n(H(Y|X)+2\varepsilon)}. \qquad (19)$$
Definition 1
($\mathcal{B}$-typicality). Let the input probability distribution $p(u)$ together with the transition probability distribution $p(v|u)$ determine the joint probability distribution $p(u,v) = p(u)p(v|u)$. Now, we define:

$$\mathcal{B}_{V,\varepsilon}^n(U) \triangleq \left\{ \underline{u} : \underline{u} \in A_\varepsilon^n(U) \text{ and } \Pr\{(\underline{u}, \underline{V}) \in A_\varepsilon^n(UV) \mid \underline{U} = \underline{u}\} \ge 1 - \varepsilon \right\}, \qquad (20)$$

where $\underline{V}$ is the output sequence of a “channel” $p(v|u)$ when the sequence $\underline{u}$ is its input.
The set $\mathcal{B}_{V,\varepsilon}^n(U)$ in (20) guarantees that a sequence $\underline{u}$ in this $\mathcal{B}$-typical set will, with high probability, lead to a sequence $\underline{v}$ that is jointly typical with $\underline{u}$. We note that $U$ and/or $V$ can be composite. The set $\mathcal{B}_{V,\varepsilon}^n(U)$ has three properties, stated in Lemma 1, the proof of which is given in Appendix A.
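The following Monte Carlo sketch (ours) illustrates Definition 1 in a toy binary setting, with a binary symmetric “channel” $p(v|u)$ as an illustrative assumption: for a fixed candidate $\underline{u}$, it estimates $\Pr\{(\underline{u}, \underline{V}) \in A_\varepsilon^n(UV) \mid \underline{U} = \underline{u}\}$ and checks the two conditions in (20).

```python
# Sketch: empirical B-typicality check for one candidate u over a BSC p(v|u).
import numpy as np

rng = np.random.default_rng(2)
pu = np.array([0.6, 0.4])                     # p(u) on {0, 1}
flip = 0.1                                    # BSC crossover probability p(v != u)
n, eps, trials = 500, 0.05, 4000

h = lambda q: -q*np.log2(q) - (1-q)*np.log2(1-q)
HU = h(pu[1]); HUV = HU + h(flip)             # H(U) and H(U,V) = H(U) + H(V|U)
pv1 = pu[1]*(1-flip) + pu[0]*flip             # marginal p(V = 1)
HV = h(pv1)

u = rng.choice(2, size=n, p=pu)               # one candidate sequence u
lp_u = np.log2(pu[u]).sum()
if abs(-lp_u/n - HU) <= eps:                  # condition 1: u in A_eps^n(U)
    v = (u[None, :] + (rng.random((trials, n)) < flip)) % 2
    lp_v_given_u = np.log2(np.where(v == u[None, :], 1-flip, flip)).sum(axis=1)
    lp_v = np.log2(np.where(v == 1, pv1, 1-pv1)).sum(axis=1)
    ok = (np.abs(-(lp_u + lp_v_given_u)/n - HUV) <= eps) \
         & (np.abs(-lp_v/n - HV) <= eps)      # joint and marginal typicality of (u, V)
    print(f"Pr((u,V) in A | u) ~ {ok.mean():.3f}; B-typical if >= {1-eps}")
else:
    print("u itself is not typical, so it cannot be B-typical")
```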
Lemma 1
($\mathcal{B}$-typicality properties). The set $\mathcal{B}_{V,\varepsilon}^n(U)$ in Definition 1 has the following properties:
P1: For $\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)$,

$$2^{-n(H(U)+\varepsilon)} \le p(\underline{u}) \le 2^{-n(H(U)-\varepsilon)}. \qquad (21)$$

P2: For $n$ large enough,

$$\sum_{\underline{u} \notin \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u}) \le \varepsilon.$$

P3: $|\mathcal{B}_{V,\varepsilon}^n(U)| \le 2^{n(H(U)+\varepsilon)}$ holds for all $n$, while $|\mathcal{B}_{V,\varepsilon}^n(U)| \ge (1-\varepsilon)\, 2^{n(H(U)-\varepsilon)}$ holds for $n$ large enough.

4. Random Sign-Coding Experiment

We consider $2^{m+1}$-ary amplitude shift keying ($M$-ASK) alphabets $\mathcal{X} = \{-M+1, -M+3, \dots, M-1\}$ where $M = 2^{m+1}$. We note that $\mathcal{X}$ is symmetric around the origin and can be factorized as $\mathcal{X} = \mathcal{S}\mathcal{A}$. Here, $\mathcal{S} = \{-1, +1\}$ and $\mathcal{A} = \{+1, +3, \dots, M-1\}$ are the sign and amplitude alphabets, respectively. Accordingly, any channel input $x \in \mathcal{X}$ can be written as the multiplication of a sign and an amplitude, i.e., $x = s\,a$.

4.1. Random Sign-Coding Setup

We cast the PAS structure shown in Figure 1 as a sign-coding structure as in Figure 3. The sign-coding setup consists of two layers: a shaping layer and a coding layer.
Definition 2
(Sign-coding). For every message index pair $(m_a, m_s)$, with uniform $m_a \in \{1, 2, \dots, M_a\}$ and uniform $m_s \in \{1, 2, \dots, M_s\}$, a sign-coding structure as shown in Figure 3 consists of the following.
  • A shaping layer that produces for every message index $m_a$ a length-$n$ shaped amplitude sequence $\underline{a}(m_a)$, where the mapping is one-to-one. The set of amplitude sequences is assumed to be shaped, but uncoded.
  • An additional $n_1$-bit (uniform) information string in the form of a sign sequence part $\underline{s}'(m_s) = (s_1(m_s), s_2(m_s), \dots, s_{n_1}(m_s))$ for every message index $m_s$.
  • A coding layer that extends the sign sequence part $\underline{s}'(m_s)$ by adding a second (uniform) sign sequence part $\underline{s}''(m_a, m_s) = (s_{n_1+1}(m_a, m_s), s_{n_1+2}(m_a, m_s), \dots, s_n(m_a, m_s))$ of length $n_2$ for all $m_a$ and $m_s$. This part is obtained using an encoder that produces redundant signs in the set $\mathcal{S}$ from $\underline{a}(m_a)$ and $\underline{s}'(m_s)$. Here, $n_1 + n_2 = n$.
Finally, the transmitted sequence is $\underline{x}(m_a, m_s) = \underline{a}(m_a)\, \underline{s}(m_a, m_s)$, where $\underline{s}(m_a, m_s) = (\underline{s}'(m_s), \underline{s}''(m_a, m_s))$. The sign-coding setup with $n_1 = 0$ ($\gamma = 0$) is called basic sign-coding, while the setup with $n_1 > 0$ ($\gamma > 0$) is called modified sign-coding.

4.2. Shaping Layer

When SMD is employed at the receiver, the shaping layer is as shown in Figure 4. Here, let $A$ be distributed with $p(a)$ over $a \in \mathcal{A}$. Then, the shaper produces for every message index $m_a$ a length-$n$ amplitude sequence $\underline{a}(m_a) \in \mathcal{B}_{SY,\varepsilon}^n(A)$. We note that for this sign-coding setup, the rate is:

$$R = \frac{1}{n} \log_2 (M_a M_s) = \gamma + \frac{1}{n} \log_2 |\mathcal{B}_{SY,\varepsilon}^n(A)| \ge H(A) + \gamma - 2\varepsilon, \qquad (22)$$

where the inequality in (22) follows, for $n$ large enough, from P3.
On the other hand, when BMD is used at the receiver, the shaping layer is as shown in Figure 5. Here, let $\mathbf{B} = (B_1, B_2, \dots, B_m)$ be distributed with $p(\mathbf{b}) = p(b_1, b_2, \dots, b_m)$ over $(b_1, b_2, \dots, b_m) \in \{0,1\}^m$. The shaper produces for every message index $m_a$ an $n$-sequence of $m$-tuples $\underline{\mathbf{b}}(m_a) = (\underline{b}_1(m_a), \underline{b}_2(m_a), \dots, \underline{b}_m(m_a)) \in \mathcal{B}_{SY,\varepsilon}^n(B_1 B_2 \cdots B_m)$. Then, each $m$-tuple is mapped to an amplitude sequence $\underline{a}(m_a)$ by a symbol-wise mapping function $f(\cdot)$. We note that for this sign-coding setup, the rate is:

$$R = \frac{1}{n} \log_2 (M_a M_s) = \gamma + \frac{1}{n} \log_2 |\mathcal{B}_{SY,\varepsilon}^n(\mathbf{B})| \ge H(\mathbf{B}) + \gamma - 2\varepsilon, \qquad (23)$$

where the inequality in (23) follows, for $n$ large enough, from P3.
To realize $f(\cdot)$, we label the channel inputs with $(m+1)$-bit strings. The amplitude is addressed by $m$ amplitude bits $(B_1, B_2, \dots, B_m)$, while the sign is addressed by a sign bit $S$. The symbol-wise mapping function $f(\cdot)$ in Figure 5 uses the addressing $(B_1, B_2, \dots, B_m) \to \mathcal{A}$. We emphasize that, unlike the case in Section 2.2, we use $(S, B_1, B_2, \dots, B_m)$ to denote a channel input instead of $(C_1, C_2, \dots, C_{m+1})$. Amplitudes and signs of $x \in \mathcal{X}$ are tabulated for 8-ASK in Table 1, along with an example of the mapping function $f(b_1, b_2)$, namely the binary reflected Gray code [19] (Defn. 2.10).
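The following short snippet (ours) reproduces this kind of addressing for 8-ASK: the amplitude bits $(B_1, B_2)$ address the amplitude via the binary reflected Gray code, and the sign bit $S$ selects the sign. The particular sign convention is an illustrative assumption.

```python
# Sketch: 8-ASK sign/amplitude addressing in the spirit of Table 1 (BRGC on amplitudes).
def brgc(m):
    # Binary reflected Gray code on m bits, returned as a list of bit tuples.
    if m == 1:
        return [(0,), (1,)]
    prev = brgc(m - 1)
    return [(0,) + c for c in prev] + [(1,) + c for c in reversed(prev)]

amps = [1, 3, 5, 7]                                  # 8-ASK amplitude alphabet
f = dict(zip(brgc(2), amps))                         # (b1, b2) -> amplitude
for s_bit in (0, 1):
    s = -1 if s_bit == 0 else 1                      # one possible sign addressing
    for b, a in f.items():
        print(f"s={s_bit} b1b2={b[0]}{b[1]}  ->  x = {s*a:+d}")
```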

4.3. Decoding Rules

At the receiver, SMD finds the unique message index pair $(\hat{m}_a, \hat{m}_s)$ such that the corresponding amplitude-sign sequence is jointly typical with the received output sequence $\underline{y}$, i.e., $(\underline{a}(\hat{m}_a), \underline{s}(\hat{m}_a, \hat{m}_s), \underline{y}) \in A_\varepsilon^n(ASY)$.
On the other hand, BMD finds the unique message index pair $(\hat{m}_a, \hat{m}_s)$ such that the corresponding bit and sign sequences are (individually) jointly typical with the received output sequence $\underline{y}$, i.e., $(\underline{s}(\hat{m}_a, \hat{m}_s), \underline{y}) \in A_\varepsilon^n(SY)$ and $(\underline{b}_j(\hat{m}_a), \underline{y}) \in A_\varepsilon^n(B_j Y)$ for $j = 1, 2, \dots, m$. We note that the decoder can use the bit metrics $p(b_{ji} = 1 \mid y_i) = 1 - p(b_{ji} = 0 \mid y_i)$ for $j = 1, 2, \dots, m$ and $i = 1, 2, \dots, n$ to find $p(\underline{b}_j | \underline{y})$. Here, $b_{ji}$ is the $j$th bit of the $i$th symbol. Together with $p(\underline{y})$ and $p(\underline{b}_j)$, the decoder can check whether $(\underline{b}_j, \underline{y}) \in A_\varepsilon^n(B_j Y)$. We note that $B_j$ is in general not uniform. A similar statement holds for the uniform sign $S$.
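A small sketch (ours) of these bit metrics: for 8-ASK over AWGN with uniform signs and a shaped amplitude distribution (both illustrative assumptions), it computes the posteriors $p(b_{ji} = 1 | y_i)$ and $p(s_i = +1 | y_i)$ that the BMD decoder needs for its typicality checks.

```python
# Sketch: symbol-wise bit posteriors for BMD over a discrete-input AWGN channel.
import numpy as np

amps = np.array([1, 3, 5, 7])
labels = {1: (0, 0), 3: (0, 1), 5: (1, 1), 7: (1, 0)}   # (b1, b2) per amplitude (BRGC)
pa = np.array([0.4, 0.3, 0.2, 0.1])                     # shaped amplitudes (illustrative)
sigma2 = 2.0

def bit_posteriors(y):
    # p(x) = p(a)/2 for x = +-a (uniform signs); p(y|x) is Gaussian.
    xs = np.concatenate([amps, -amps])
    px = np.concatenate([pa, pa]) / 2
    lik = px * np.exp(-(y - xs)**2 / (2*sigma2))
    post = lik / lik.sum()                              # p(x | y)
    out = {}
    for j in (0, 1):                                    # amplitude bit levels B1, B2
        mask = np.array([labels[abs(int(x))][j] for x in xs])
        out[f"p(b{j+1}=1|y)"] = post[mask == 1].sum()
    out["p(s=+1|y)"] = post[:4].sum()                   # sign posterior
    return out

print(bit_posteriors(y=2.3))
```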

5. Achievable Information Rates of Sign-Coding

Here, we investigate AIRs of the sign-coding architecture in Figure 3. We consider both SMD and BMD at the receiver. In what follows, four AIRs are presented. The proofs are based on $\mathcal{B}$-typicality, a variation of weak typicality, and on random sign-coding arguments; they are given in Appendix B. As indicated in Definition 2, the signs $S$ are assumed to be uniform in the proofs. We have not applied weak typicality for continuous random variables, discussed in [9] (Section 8.2) and [33] (Section 10.4), since our channels are discrete-input. However, it is also possible to develop a hybrid version of weak typicality that matches discrete-input continuous-output channels.
In the following, the concept of AIR is formally defined in the sign-coding context.
Definition 3
(Achievable information rate). A rate $R$ is said to be achievable if, for every $\delta > 0$ and $n$ large enough, there exists a sign-coding encoder and a decoder such that $(1/n)\log_2(M_a M_s) \ge R - \delta$ and the error probability satisfies $P_e \le \delta$.

5.1. Sign-Coding with Symbol-Metric Decoding

Theorem 1
(Basic sign-coding with SMD). For a memoryless channel $\{\mathcal{X}, p(y|x), \mathcal{Y}\}$ with amplitude shaping and basic sign-coding, the rate:

$$R_{\mathrm{SMD}}^{\gamma=0} = \max_{p(a):\, H(A) \le I(SA;Y)} H(A) \qquad (24)$$

is achievable using SMD.
Theorem 1 implies that for a memoryless channel, the rate $R = H(A)$ is achievable with basic sign-coding, as long as $H(A) \le I(SA;Y) = I(X;Y)$ is satisfied. For the AWGN channel, this means that a range of rate-SNR pairs is achievable. Here, SNR denotes the signal-to-noise ratio. One of these points, where $H(A) = I(SA;Y)$, is on the capacity-SNR curve. Note that here, “capacity” indicates the largest achievable rate using $\mathcal{X}$ as the channel input alphabet under the average power constraint. It can be observed from Figure 6, discussed in Example 1, that there indeed exists an amplitude distribution $p(a)$ for which $H(A) = I(SA;Y)$.
Theorem 2
(Modified sign-coding with SMD). For a memoryless channel $\{\mathcal{X}, p(y|x), \mathcal{Y}\}$ with amplitude shaping and modified sign-coding, the rate:

$$R_{\mathrm{SMD}}^{\gamma>0} = \max_{p(a),\,\gamma:\, H(A)+\gamma \le I(SA;Y)} H(A) + \gamma \qquad (25)$$

is achievable using SMD for $\gamma < 1$.
Theorem 2 implies that for a memoryless channel, the rate $H(A) + \gamma$ is achievable with modified sign-coding, as long as $R = H(A) + \gamma \le I(SA;Y) = I(X;Y)$ is satisfied. For the AWGN channel, this means that all points on the capacity-SNR curve for which $H(X|Y) \le 1 - \gamma$ are achievable. This follows from:

$$H(A) + \gamma \le I(SA;Y) = H(SA) - H(SA|Y) = H(A) + 1 - H(X|Y), \qquad (26)$$

i.e., the constraint in the maximization in (25).
Example 1.
We consider the AWGN channel with average power constraint $\mathbb{E}[X^2] \le P$. Figure 6 shows the capacity of 4-ASK:

$$C_{4\text{-}\mathrm{ASK}} = \max_{p(x):\, \mathcal{X} = \{-3,-1,+1,+3\},\ \mathbb{E}[X^2] \le P} I(X;Y) \qquad (27)$$

together with the amplitude entropy $H(A)$ of the distribution that achieves this capacity. Here, $\mathrm{SNR} = \mathbb{E}[X^2]/\sigma^2$, where $\sigma^2$ is the noise variance. Basic sign-coding achieves capacity only at $\mathrm{SNR} = 0.72$ dB, i.e., at the point where $H(A) = I(X;Y)$, which is $C_{4\text{-}\mathrm{ASK}} = 0.562$ bit/1D. We see from Figure 6 that the shaping gap is negligible around this point, i.e., the capacity $C_{4\text{-}\mathrm{ASK}}$ of 4-ASK and the MI $I(X;Y)$ for uniform $p(x)$ are virtually the same. On the other hand, this gap is significant at larger rates, e.g., it is around 0.42 dB at 1.6 bit/1D. To achieve rates larger than 0.562 bit/1D on the capacity-SNR curve, modified sign-coding ($\gamma > 0$) is required. At a given SNR, $C_{4\text{-}\mathrm{ASK}}$ can be written as $C_{4\text{-}\mathrm{ASK}} = H(A) + \gamma$, i.e., when the $H(A)$ curve is shifted up by $\gamma$, the crossing point is again at $C_{4\text{-}\mathrm{ASK}}$ for that SNR. We also plot the additional rate $\gamma = C_{4\text{-}\mathrm{ASK}} - H(A)$ in Figure 6. As an example, at $\mathrm{SNR} = 9.74$ dB, $C_{4\text{-}\mathrm{ASK}} = H(A) + \gamma = 1.6$ bit/1D can be achieved with modified sign-coding, where $H(A) = 0.9$ and $\gamma = 0.7$. We observe that sign-coding achieves the capacity of 4-ASK for $\mathrm{SNR} \ge 0.72$ dB.
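The quantities in this example can be reproduced approximately with the following sketch (ours): instead of solving (27) exactly, it sweeps Maxwell-Boltzmann input distributions $p(x) \propto e^{-\lambda x^2}$, which are known to be near-optimal for ASK under an average power constraint, and reports $I(X;Y)$, $H(A)$, and $\gamma = I(X;Y) - H(A)$.

```python
# Sketch: approximate C_4-ASK, H(A), and gamma at one SNR via a Maxwell-Boltzmann sweep.
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0])
y = np.linspace(-12, 12, 4000); dy = y[1] - y[0]

def mi(px, sigma2):
    pyx = np.exp(-(y[None, :] - X[:, None])**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
    py = px @ pyx
    return np.sum(px[:, None] * pyx *
                  np.log2(np.maximum(pyx, 1e-300) / np.maximum(py, 1e-300))) * dy

snr_db = 9.74
best = (0.0, None)
for lam in np.linspace(0.0, 0.5, 101):        # sweep the MB family
    px = np.exp(-lam * X**2); px /= px.sum()
    sigma2 = px @ X**2 / 10**(snr_db / 10)    # noise variance meeting the SNR
    best = max(best, (mi(px, sigma2), px), key=lambda t: t[0])

C, px = best
pa = px[X > 0] * 2                            # symmetric p(x) -> amplitude distribution
HA = -np.sum(pa * np.log2(pa))
print(f"SNR {snr_db} dB: I ~ {C:.3f}, H(A) ~ {HA:.3f}, gamma ~ {C - HA:.3f}")
```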

5.2. Sign-Coding with Bit-Metric Decoding

The following theorems give AIRs for sign-coding with BMD.
Theorem 3
(Basic sign-coding with BMD). For a memoryless channel $\{\mathcal{X}, p(y|x), \mathcal{Y}\}$ with amplitude shaping using $M$-ASK and basic sign-coding, the rate:

$$R_{\mathrm{BMD}}^{\gamma=0} = \max_{p(\mathbf{b}):\, H(\mathbf{B}) \le R_{\mathrm{BMD}}(p(x))} H(\mathbf{B}) \qquad (28)$$

is achievable using BMD. Here, $\mathbf{B} = (B_1, B_2, \dots, B_m)$, $p(\mathbf{b}) = p(b_1, b_2, \dots, b_m)$, and $p(x) = p(s, b_1, b_2, \dots, b_m)$, and $R_{\mathrm{BMD}}(p(x))$ is as defined in (6).
Theorem 4
(Modified sign-coding with BMD). For a memoryless channel $\{\mathcal{X}, p(y|x), \mathcal{Y}\}$ with amplitude shaping using $M$-ASK and modified sign-coding, the rate:

$$R_{\mathrm{BMD}}^{\gamma>0} = \max_{p(\mathbf{b}),\,\gamma:\, H(\mathbf{B})+\gamma \le R_{\mathrm{BMD}}(p(x))} H(\mathbf{B}) + \gamma \qquad (29)$$

is achievable using BMD for $\gamma < 1$.
Theorems 3 and 4 imply that for a memoryless channel, the rate $R = H(\mathbf{B}) + \gamma = H(A) + \gamma$ is achievable with sign-coding and BMD, as long as $R \le R_{\mathrm{BMD}}$ is satisfied.
Remark 1
(Random sign-coding with binary linear codes). An amplitude can be represented by $m$ bits. We can generate, uniformly at random, a binary code matrix with $mn$ rows of length $n$. This matrix can be used to produce the sign sequences. This results in the pairwise independence of any two different sign sequences, as explained in the proof of [15] (Theorem 6.2.1). Inspection of the proof of our Theorem 1 shows that only the pairwise independence of sign sequences is needed. Therefore, achievability can also be obtained with a binary linear code. Note that our linear code can also be seen as a systematic code that generates parity. The code rate of the corresponding systematic code is $m/(m+1)$. For BMD, a similar reasoning shows that linear codes lead to achievability, and achievability for modified sign-coding also follows for binary linear codes. The rate of the systematic code that corresponds to the modified setting is $(m+\gamma)/(m+1)$.
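The pairwise-independence property invoked here can be checked empirically; the following sketch (ours, with illustrative sizes) generates sign bits as parities of the label bits through a uniformly random binary matrix and verifies that the sign sequences of two distinct label sequences agree in each position about half the time, as pairwise independence of uniform bits predicts.

```python
# Sketch: empirical pairwise independence of sign sequences under a random linear code.
import numpy as np

rng = np.random.default_rng(3)
m_n, n, trials = 12, 8, 20000                 # m*n label bits in, n sign bits out
b1 = rng.integers(0, 2, m_n)
b2 = b1.copy(); b2[0] ^= 1                    # a different label sequence (guaranteed)

agree = np.zeros(n)
for _ in range(trials):
    G = rng.integers(0, 2, size=(m_n, n))     # fresh uniformly random code matrix
    s1, s2 = (b1 @ G) % 2, (b2 @ G) % 2
    agree += (s1 == s2)
# For pairwise-independent uniform sign bits, P(s1_i == s2_i) should be ~1/2.
print("empirical agreement per sign position:", np.round(agree / trials, 3))
```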

6. Conclusions

In this paper, we studied achievable information rates (AIRs) of probabilistic amplitude shaping (PAS) for discrete-input memoryless channels. In contrast to the existing literature in which Gallager’s error exponent approach was followed, we used a weak typicality framework. Random sign-coding arguments based on weak typicality were introduced to upper-bound the probability of error of a so-called sign-coding structure. The achievability of the mutual information was demonstrated for uniform signs, which were independent of the amplitudes. Sign-coding combined with amplitude shaping corresponded to PAS, and consequently, PAS achieved the capacity of a discrete-input memoryless channel with a symmetric capacity-achieving distribution.
Our approach differed from the random coding arguments considered in the literature in that our motivation was to provide achievability proofs that are as constructive as possible. To this end, in our random sign-coding setup, both the amplitudes and the signs of the channel inputs that are directly selected by information bits were constructively produced. Only the remaining signs were drawn at random. A study on the achievability of capacity for channels with asymmetric capacity-achieving distributions using a type of sign-coding is left for possible future research.

Author Contributions

Conceptualization, Y.C.G. and F.M.J.W.; formal analysis, Y.C.G., A.A., and F.M.J.W.; software, Y.C.G.; writing, original draft, Y.C.G. and F.M.J.W.; writing, review and editing, Y.C.G., A.A., and F.M.J.W. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Y.C.G. and A.A. received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 757791).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

Appendix A.1. Proof of P1

We see from [9] (Equation (3.6)) that for $\underline{u} \in A_\varepsilon^n(U)$,

$$2^{-n(H(U)+\varepsilon)} \le p(\underline{u}) \le 2^{-n(H(U)-\varepsilon)}. \qquad \mathrm{(A1)}$$

Due to Definition 1, each $\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)$ is also in $A_\varepsilon^n(U)$; more specifically, $\mathcal{B}_{V,\varepsilon}^n(U) \subseteq A_\varepsilon^n(U)$. Consequently, (A1) also holds for $\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)$, which completes the proof of P1.

Appendix A.2. Proof of P2

Let $(\underline{U}, \underline{V})$ be independent and identically distributed with respect to $p(u,v)$. Then:

$$\Pr\{(\underline{U}, \underline{V}) \in A_\varepsilon^n(UV)\} = \sum_{\underline{u}} p(\underline{u}) \sum_{\underline{v} : (\underline{u}, \underline{v}) \in A_\varepsilon^n(UV)} p(\underline{v}|\underline{u}) \qquad \mathrm{(A2)}$$

$$= \sum_{\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u}) \sum_{\underline{v} : (\underline{u}, \underline{v}) \in A_\varepsilon^n(UV)} p(\underline{v}|\underline{u}) + \sum_{\underline{u} \notin \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u}) \sum_{\underline{v} : (\underline{u}, \underline{v}) \in A_\varepsilon^n(UV)} p(\underline{v}|\underline{u}) \qquad \mathrm{(A3)}$$

$$\le \sum_{\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u}) + \sum_{\underline{u} \notin \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u})\,(1-\varepsilon) \qquad \mathrm{(A4)}$$

$$= 1 - \varepsilon + \varepsilon \sum_{\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)} p(\underline{u}) \qquad \mathrm{(A5)}$$

$$= 1 - \varepsilon + \varepsilon \Pr\{\underline{U} \in \mathcal{B}_{V,\varepsilon}^n(U)\}. \qquad \mathrm{(A6)}$$

Here, (A4) follows from Definition 1, which states that $\Pr\{(\underline{u}, \underline{V}) \in A_\varepsilon^n(UV) \mid \underline{U} = \underline{u}\} < 1 - \varepsilon$ for $\underline{u} \in A_\varepsilon^n(U)$ if $\underline{u} \notin \mathcal{B}_{V,\varepsilon}^n(U)$. Then, from (A6), we obtain:

$$\Pr\{\underline{U} \in \mathcal{B}_{V,\varepsilon}^n(U)\} \ge \frac{\Pr\{(\underline{U}, \underline{V}) \in A_\varepsilon^n(UV)\} - 1 + \varepsilon}{\varepsilon} \qquad \mathrm{(A7)}$$

$$= 1 - \frac{1 - \Pr\{(\underline{U}, \underline{V}) \in A_\varepsilon^n(UV)\}}{\varepsilon} \qquad \mathrm{(A8)}$$

$$\ge 1 - \varepsilon \qquad \mathrm{(A9)}$$

for large enough $n$. Here, (A9) follows from [9] (Thm. 7.6.1), which states that $\Pr\{(\underline{U}, \underline{V}) \in A_\varepsilon^n(UV)\} \to 1$ as $n \to \infty$. This implies that $1 - \Pr\{(\underline{U}, \underline{V}) \in A_\varepsilon^n(UV)\} \le \varepsilon^2$ for positive $\varepsilon$ and large enough $n$, which completes the proof.

Appendix A.3. Proof of P3

We see from [9] (Thm. 3.1.2) that:

$$|A_\varepsilon^n(U)| \le 2^{n(H(U)+\varepsilon)}. \qquad \mathrm{(A10)}$$

Since $\mathcal{B}_{V,\varepsilon}^n(U) \subseteq A_\varepsilon^n(U)$, again by Definition 1, (A10) also holds for $|\mathcal{B}_{V,\varepsilon}^n(U)|$. This proves the upper bound in P3. To prove the lower bound, we obtain from (A9), for $n$ sufficiently large, that:

$$1 - \varepsilon \le \Pr\{\underline{U} \in \mathcal{B}_{V,\varepsilon}^n(U)\} \qquad \mathrm{(A11)}$$

$$\le \sum_{\underline{u} \in \mathcal{B}_{V,\varepsilon}^n(U)} 2^{-n(H(U)-\varepsilon)} \qquad \mathrm{(A12)}$$

$$= |\mathcal{B}_{V,\varepsilon}^n(U)|\, 2^{-n(H(U)-\varepsilon)}, \qquad \mathrm{(A13)}$$

where (A12) follows from (A1).

Appendix B. Proofs of Theorems 1, 2, 3, and 4

To derive AIRs, we will follow the classical approach, e.g., as in [9] (Section 7.7), and upper-bound the average of the probability of error $\overline{P}_e$ over a random choice of sign-codebooks. This way, we will demonstrate the existence of at least one good sign-code. Again as in [9] (Section 7.7), and as explained in Section 4.3, we decode by joint typicality: the decoder looks for a unique message index pair $(\hat{m}_a, \hat{m}_s)$ for which the corresponding amplitude-sign sequence $(\underline{a}, \underline{s})$ is jointly typical with the received sequence $\underline{y}$.
By the properties of weak typicality and $\mathcal{B}$-typicality, the transmitted amplitude-sign sequence and the received sequence are jointly typical with high probability for $n$ large enough. We call the event in which the transmitted amplitude-sign sequence is not jointly typical with the received sequence the first error event, with average probability $\overline{P}_e^{(1)}$. Furthermore, the probability that any other (not transmitted) amplitude-sign sequence is jointly typical with the received sequence vanishes for asymptotically large $n$. We call the event in which there is another amplitude-sign sequence that is jointly typical with the received sequence the second error event, with average probability $\overline{P}_e^{(2)}$. Observing that these events are not disjoint, we can write [9] (Equation (7.75)):

$$\overline{P}_e \le \overline{P}_e^{(1)} + \overline{P}_e^{(2)}. \qquad \mathrm{(A14)}$$

Appendix B.1. Proof of Theorem 1

For the error of the first kind, we can write:

$$\overline{P}_e^{(1)} = \sum_{m_a=1}^{M_a} \frac{1}{M_a} \sum_{\underline{s} \in \mathcal{S}^n} p(\underline{s}) \sum_{\underline{y} \in \mathcal{Y}^n} p(\underline{y}|\underline{a}(m_a), \underline{s})\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}, \underline{y}) \notin A_\varepsilon^n(ASY)\big] \qquad \mathrm{(A15)}$$

$$= \sum_{m_a} \frac{1}{M_a} \sum_{\underline{s}} \sum_{\underline{y}} p(\underline{s}, \underline{y}|\underline{a}(m_a))\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}, \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A16)}$$

$$= \sum_{m_a} \frac{1}{M_a} \Pr\big\{(\underline{a}(m_a), \underline{S}, \underline{Y}) \notin A_\varepsilon^n \mid \underline{A} = \underline{a}(m_a)\big\} \qquad \mathrm{(A17)}$$

$$\le \sum_{m_a} \frac{\varepsilon}{M_a} \qquad \mathrm{(A18)}$$

$$= \varepsilon, \qquad \mathrm{(A19)}$$

where we simplified the notation by replacing $\sum_{m_a=1}^{M_a}$ by $\sum_{m_a}$, $\sum_{\underline{s} \in \mathcal{S}^n}$ by $\sum_{\underline{s}}$, and $\sum_{\underline{y} \in \mathcal{Y}^n}$ by $\sum_{\underline{y}}$ in (A16). Furthermore, we dropped the index of the typical set $A_\varepsilon^n(ASY)$ and used $A_\varepsilon^n$ instead. We will follow these notations for summations and for typical sets in the rest of the paper, assuming for the latter that the index of the typical set is clear from the context. To obtain (A16), we used $p(\underline{s})\, p(\underline{y}|\underline{a}(m_a), \underline{s}) = p(\underline{s}, \underline{y}|\underline{a}(m_a))$. Then, (A18) is a direct consequence of Definition 1, since $\underline{a}(m_a) \in \mathcal{B}_{SY,\varepsilon}^n(A)$ for $m_a = 1, 2, \dots, M_a$.
For the error of the second kind, we can write:

$$\overline{P}_e^{(2)} \le \sum_{m_a} \frac{1}{M_a} \sum_{\underline{s}} p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{a}(k_a), \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A20)}$$

$$= M_a \sum_{m_a} \sum_{\underline{s}} \frac{p(\underline{s})}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} \frac{p(\tilde{\underline{s}})}{M_a}\, \mathbb{1}\big[(\underline{a}(k_a), \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A21)}$$

$$\le M_a 2^{6n\varepsilon} \sum_{m_a} \sum_{\underline{s}} p(\underline{a}(m_a))\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} p(\underline{a}(k_a))\, p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{a}(k_a), \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A22)}$$

$$\le M_a 2^{6n\varepsilon} \sum_{\underline{a} \in \mathcal{A}^n} \sum_{\underline{s}} p(\underline{a})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{a}, \underline{s}) \sum_{\tilde{\underline{a}} \in \mathcal{A}^n} \sum_{\tilde{\underline{s}}} p(\tilde{\underline{a}})\, p(\tilde{\underline{s}})\, \mathbb{1}\big[(\tilde{\underline{a}}, \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A23)}$$

$$= M_a 2^{6n\varepsilon} \sum_{(\underline{y}, \tilde{\underline{x}}) \in A_\varepsilon^n} p(\tilde{\underline{x}})\, p(\underline{y}) \qquad \mathrm{(A24)}$$

$$\le 2^{n(H(A)+\varepsilon)}\, 2^{6n\varepsilon}\, |A_\varepsilon^n(XY)|\, 2^{-n(H(X)-\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)} \qquad \mathrm{(A25)}$$

$$\le 2^{n(H(A)+7\varepsilon)}\, 2^{n(H(X,Y)+\varepsilon)}\, 2^{-n(H(X)-\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)} \qquad \mathrm{(A26)}$$

$$= 2^{n(H(A) - I(SA;Y) + 10\varepsilon)}, \qquad \mathrm{(A27)}$$

where we simplified the notation by replacing $\sum_{k_a = 1, k_a \ne m_a}^{M_a}$ by $\sum_{k_a \ne m_a}$ and $\sum_{\tilde{\underline{s}} \in \mathcal{S}^n}$ by $\sum_{\tilde{\underline{s}}}$ in (A21). We will follow these notations for the rest of the paper. Then:
(A22) follows, for $n$ sufficiently large and for $\underline{a} \in \mathcal{B}_{SY,\varepsilon}^n(A)$, from:

$$\frac{1}{M_a} = \frac{1}{|\mathcal{B}_{SY,\varepsilon}^n(A)|} \le \frac{2^{-n(H(A)-\varepsilon)}}{1-\varepsilon} \qquad \mathrm{(A28)}$$

$$= \frac{2^{2n\varepsilon}}{1-\varepsilon}\, 2^{-n(H(A)+\varepsilon)} \qquad \mathrm{(A29)}$$

$$\le \frac{2^{2n\varepsilon}}{1-\varepsilon}\, p(\underline{a}) \qquad \mathrm{(A30)}$$

$$\le 2^{3n\varepsilon}\, p(\underline{a}), \qquad \mathrm{(A31)}$$

where (A28) follows from the $\mathcal{B}$-typicality property P3, (A30) follows from the $\mathcal{B}$-typicality property P1, and (A31) holds for all large enough $n$.
(A23) follows from summing over $\underline{a} \in \mathcal{A}^n$ instead of over $\underline{a}(m_a) \in \mathcal{B}_\varepsilon^n$, and over $\tilde{\underline{a}} \in \mathcal{A}^n$ instead of over $\underline{a}(k_a) \in \mathcal{B}_\varepsilon^n$ for $k_a \ne m_a$.
(A24) is obtained by working out the summations over $\underline{a}$ and $\underline{s}$, and by replacing $\tilde{\underline{a}}\tilde{\underline{s}}$ with $\tilde{\underline{x}}$.
(A25) follows from $M_a = |\mathcal{B}_\varepsilon^n(A)| \le 2^{n(H(A)+\varepsilon)}$, i.e., the $\mathcal{B}$-typicality property P3, and from (12).
(A26) follows from (15).
The conclusion from (A27) is that for $H(A) < I(X;Y) - 10\varepsilon$, the error probability of the second kind satisfies:

$$\overline{P}_e^{(2)} \le \varepsilon \qquad \mathrm{(A32)}$$

for $n$ large enough. Using (A19) and (A32) in (A14), we find that the total error probability averaged over all possible sign-codes satisfies $\overline{P}_e \le 2\varepsilon$ for $n$ large enough. This implies the existence of a basic sign-code with total error probability $P_e = \Pr\{\hat{M}_a \ne M_a\} \le 2\varepsilon$. This holds for all $\varepsilon > 0$, and therefore, the rate:

$$R = H(A) \le I(X;Y) \qquad \mathrm{(A33)}$$

is achievable with basic sign-coding, which concludes the proof of Theorem 1.

Appendix B.2. Proof of Theorem 2

For the error of the first kind, we can write:

$$\overline{P}_e^{(1)} = \sum_{m_a} \frac{1}{M_a} \sum_{m_s=1}^{M_s} \frac{1}{2^{n_1}} \sum_{\underline{s}'' \in \mathcal{S}^{n_2}} p(\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'')\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A34)}$$

$$= \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \sum_{\underline{s}''} 2^{-n} \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'')\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A35)}$$

$$= \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \sum_{\underline{s}''} \sum_{\underline{y}} p(\underline{s}'(m_s)\underline{s}'', \underline{y}|\underline{a}(m_a))\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A36)}$$

$$= \sum_{m_a} \frac{1}{M_a} \Pr\big\{(\underline{a}(m_a), \underline{S}, \underline{Y}) \notin A_\varepsilon^n \mid \underline{A} = \underline{a}(m_a)\big\} \qquad \mathrm{(A37)}$$

$$\le \sum_{m_a} \frac{\varepsilon}{M_a} \qquad \mathrm{(A38)}$$

$$= \varepsilon, \qquad \mathrm{(A39)}$$

where we simplified the notation by replacing $\sum_{\underline{s}'' \in \mathcal{S}^{n_2}}$ by $\sum_{\underline{s}''}$ and $\sum_{m_s=1}^{M_s}$ by $\sum_{m_s}$ in (A35). We will follow these notations for the rest of the paper. To obtain (A35), we used the fact that $\underline{S}''$ is uniform; more precisely, $p(\underline{s}'') = 2^{-n_2}$. To obtain (A36), we used the fact that $\underline{S}$ is also uniform, and then $2^{-n}\, p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') = p(\underline{s}'(m_s)\underline{s}'', \underline{y}|\underline{a}(m_a))$. Then, (A38) is a direct consequence of Definition 1, since $\underline{a}(m_a) \in \mathcal{B}_{SY,\varepsilon}^n(A)$ for $m_a = 1, 2, \dots, M_a$.
For the error of the second kind, we obtain:

$$\overline{P}_e^{(2)} \le \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \frac{1}{2^{n_1}} \sum_{\underline{s}''} p(\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{(k_a,k_s) \ne (m_a,m_s)} \sum_{\tilde{\underline{s}}''} p(\tilde{\underline{s}}'')\, \mathbb{1}\big[(\underline{a}(k_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big]$$

$$= M_a 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{(k_a,k_s) \ne (m_a,m_s)} \sum_{\tilde{\underline{s}}''} \frac{2^{-n}}{M_a}\, \mathbb{1}\big[(\underline{a}(k_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A40)}$$

$$= M_a 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_a \ne m_a, k_s, \tilde{\underline{s}}''} \frac{2^{-n}}{M_a}\, \mathbb{1}\big[(\underline{a}(k_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] + 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_s \ne m_s, \tilde{\underline{s}}''} 2^{-n}\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big]. \qquad \mathrm{(A41)}$$

Here, we replaced the nested summations over $m_a$, $m_s$, and $\underline{s}''$ by a single summation over $(m_a, m_s, \underline{s}'')$ for better readability. We will use this notation for the rest of the paper. Then:
(A40) follows from $n = n_1 + n_2$ and from the fact that $\underline{S}''$ is uniform; more precisely, $p(\underline{s}'') = 2^{-n_2}$.
(A41) is obtained by splitting $(k_a, k_s) \ne (m_a, m_s)$ into $\{k_a \ne m_a,\ k_s \text{ arbitrary}\}$ and $\{k_a = m_a,\ k_s \ne m_s\}$.
From (A41), we obtain:

$$\overline{P}_e^{(2)} \le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{m_a, m_s, \underline{s}''} p(\underline{a}(m_a))\, p(\underline{s}'(m_s)\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_a \ne m_a, k_s, \tilde{\underline{s}}''} p(\underline{a}(k_a))\, p(\underline{s}'(k_s)\tilde{\underline{s}}'')\, \mathbb{1}\big[(\underline{a}(k_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] + 2^{n_1} 2^{3n\varepsilon} \sum_{m_a, m_s, \underline{s}''} p(\underline{a}(m_a))\, p(\underline{s}'(m_s)\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_s \ne m_s, \tilde{\underline{s}}''} p(\underline{s}'(k_s)\tilde{\underline{s}}'')\, \mathbb{1}\big[(\underline{a}(m_a), \underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A42)}$$

$$\le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{a}, \underline{s}'\underline{s}''} p(\underline{a})\, p(\underline{s}'\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}, \underline{s}'\underline{s}'') \sum_{\tilde{\underline{a}}, \tilde{\underline{s}}'\tilde{\underline{s}}''} p(\tilde{\underline{a}})\, p(\tilde{\underline{s}}'\tilde{\underline{s}}'')\, \mathbb{1}\big[(\tilde{\underline{a}}, \tilde{\underline{s}}'\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{a}, \underline{s}'\underline{s}''} p(\underline{a})\, p(\underline{s}'\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{a}, \underline{s}'\underline{s}'') \sum_{\tilde{\underline{s}}'\tilde{\underline{s}}''} p(\tilde{\underline{s}}'\tilde{\underline{s}}'')\, \mathbb{1}\big[(\underline{a}, \tilde{\underline{s}}'\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A43)}$$

$$= M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{a}, \underline{s}} p(\underline{a})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{a}, \underline{s}) \sum_{\tilde{\underline{a}}, \tilde{\underline{s}}} p(\tilde{\underline{a}})\, p(\tilde{\underline{s}})\, \mathbb{1}\big[(\tilde{\underline{a}}, \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{a}, \underline{s}} p(\underline{a})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{a}, \underline{s}) \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{a}, \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big], \qquad \mathrm{(A44)}$$

where:
(A42) follows, for $n$ sufficiently large and for $\underline{a} \in \mathcal{B}_{SY,\varepsilon}^n(A)$, from:

$$\frac{1}{M_a} \overset{\mathrm{(A31)}}{\le} 2^{3n\varepsilon}\, p(\underline{a}) \qquad \mathrm{(A45)}$$

and from $p(\underline{s}'\underline{s}'') = 2^{-n}$.
(A43) follows from summing over $\underline{a} \in \mathcal{A}^n$ instead of over $\underline{a}(m_a) \in \mathcal{B}_\varepsilon^n$ and over $\tilde{\underline{a}} \in \mathcal{A}^n$ instead of over $\underline{a}(k_a) \in \mathcal{B}_\varepsilon^n$ for $k_a \ne m_a$. Moreover, it follows from summing over $\tilde{\underline{s}}' \in \mathcal{S}^{n_1}$ instead of over $\underline{s}'(k_s)$ for $k_s = 1, 2, \dots, M_s$ with $k_s \ne m_s$.
(A44) follows from substituting $\underline{s}$ for $\underline{s}'\underline{s}''$ and $\tilde{\underline{s}}$ for $\tilde{\underline{s}}'\tilde{\underline{s}}''$.
Finally, from (A44), we obtain:

$$\overline{P}_e^{(2)} \le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{y}} p(\underline{y}) \sum_{\tilde{\underline{x}}} p(\tilde{\underline{x}})\, \mathbb{1}\big[(\tilde{\underline{x}}, \underline{y}) \in A_\varepsilon^n\big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{a}, \underline{y}} p(\underline{a}, \underline{y}) \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{a}, \tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A46)}$$

$$\le 2^{n(H(A)+\varepsilon)}\, 2^{n\gamma}\, 2^{6n\varepsilon}\, |A_\varepsilon^n(XY)|\, 2^{-n(H(X)-\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)} + 2^{n\gamma}\, 2^{3n\varepsilon}\, |A_\varepsilon^n(SAY)|\, 2^{-n(H(A,Y)-\varepsilon)}\, 2^{-n(H(S)-\varepsilon)} \qquad \mathrm{(A47)}$$

$$\le 2^{n(H(A)+7\varepsilon)}\, 2^{n\gamma}\, 2^{n(H(X,Y)+\varepsilon)}\, 2^{-n(H(X)-\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)} + 2^{n\gamma}\, 2^{3n\varepsilon}\, 2^{n(H(S,A,Y)+\varepsilon)}\, 2^{-n(H(A,Y)-\varepsilon)}\, 2^{-n(H(S)-\varepsilon)} \qquad \mathrm{(A48)}$$

$$= 2^{n(H(A)+\gamma+10\varepsilon-I(X;Y))} + 2^{n(\gamma+6\varepsilon-I(S;A,Y))}. \qquad \mathrm{(A49)}$$

Here, we substituted $n_1 = n\gamma$ in (A47). Then:
(A46) is obtained by working out the summations over $\underline{a}$ and $\underline{s}$ in the first part and over $\underline{s}$ in the second part. Moreover, we replaced $\tilde{\underline{a}}\tilde{\underline{s}}$ with $\tilde{\underline{x}}$.
(A47) is obtained, for the first part, using $M_a = |\mathcal{B}_\varepsilon^n(A)| \le 2^{n(H(A)+\varepsilon)}$, i.e., the $\mathcal{B}$-typicality property P3, and (12). For the second part, we used (12) for $p(\tilde{\underline{s}})$ and (16) for $p(\underline{a}, \underline{y})$.
(A48) follows from (15) and its extension to jointly typical triplets; more precisely, $|A_\varepsilon^n(SAY)| \le 2^{n(H(S,A,Y)+\varepsilon)}$.
The conclusion from (A49) is that for $H(A) + \gamma < I(X;Y) - 10\varepsilon$ and $\gamma < I(S;A,Y) - 6\varepsilon$, the error probability of the second kind satisfies:

$$\overline{P}_e^{(2)} \le \varepsilon \qquad \mathrm{(A50)}$$

for $n$ large enough. The first constraint, i.e., $H(A) + \gamma < I(X;Y) - 10\varepsilon$, already implies the second constraint, i.e., $\gamma < I(S;A,Y) - 6\varepsilon$, since:

$$\gamma < I(X;Y) - H(A) - 10\varepsilon \le I(S,A;Y) - I(A;Y) - 10\varepsilon \qquad \mathrm{(A51)}$$

$$= I(S;Y|A) - 10\varepsilon \qquad \mathrm{(A52)}$$

$$\le I(S;Y|A) + I(S;A) - 10\varepsilon \qquad \mathrm{(A53)}$$

$$= I(S;A,Y) - 10\varepsilon \le I(S;A,Y) - 6\varepsilon, \qquad \mathrm{(A54)}$$

where we substituted $(S,A)$ for $X$ in (A51). Here, (A51) follows from [9] (Thm. 2.4.1), and both (A52) and (A54) follow from the chain rule for MI [9] (Thm. 2.5.2).
Using (A39) and (A50) in (A14), we find that the total error probability averaged over all possible modified sign-codes satisfies $\overline{P}_e \le 2\varepsilon$ for $n$ large enough. This implies the existence of a modified sign-code with total error probability $P_e = \Pr\{(\hat{M}_a, \hat{M}_s) \ne (M_a, M_s)\} \le 2\varepsilon$. This holds for all $\varepsilon > 0$, and thus, the rate:

$$R = H(A) + \gamma \le I(X;Y) \qquad \mathrm{(A55)}$$

is achievable with modified sign-coding, which concludes the proof of Theorem 2.

Appendix B.3. Proof of Theorem 3

For the error of the first kind, we can write:

$$\overline{P}_e^{(1)} = \sum_{m_a} \frac{1}{M_a} \sum_{\underline{s}} p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s})\, \mathbb{1}\big[\big((\underline{b}_1(m_a), \underline{y}) \notin A_\varepsilon^n\big) \vee \big((\underline{b}_2(m_a), \underline{y}) \notin A_\varepsilon^n\big) \vee \cdots \vee \big((\underline{b}_m(m_a), \underline{y}) \notin A_\varepsilon^n\big) \vee \big((\underline{s}, \underline{y}) \notin A_\varepsilon^n\big)\big] \qquad \mathrm{(A56)}$$

$$\le \sum_{m_a} \frac{1}{M_a} \sum_{\underline{s}} \sum_{\underline{y}} p(\underline{s}, \underline{y}|\underline{\mathbf{b}}(m_a))\, \mathbb{1}\big[(\underline{\mathbf{b}}(m_a), \underline{s}, \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A57)}$$

$$= \sum_{m_a} \frac{1}{M_a} \Pr\big\{(\underline{\mathbf{b}}(m_a), \underline{S}, \underline{Y}) \notin A_\varepsilon^n \mid \underline{\mathbf{B}} = \underline{\mathbf{b}}(m_a)\big\} \qquad \mathrm{(A58)}$$

$$\le \sum_{m_a} \frac{\varepsilon}{M_a} \qquad \mathrm{(A59)}$$

$$= \varepsilon, \qquad \mathrm{(A60)}$$

where we used $\underline{\mathbf{b}}(m_a)$ to denote $(\underline{b}_1(m_a), \underline{b}_2(m_a), \dots, \underline{b}_m(m_a))$ in (A56) and $\underline{\mathbf{B}}$ to denote $(\underline{B}_1, \underline{B}_2, \dots, \underline{B}_m)$ in (A58). Then, we used $p(\underline{s})\, p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}) = p(\underline{s}, \underline{y}|\underline{\mathbf{b}}(m_a))$ in (A57). Here, (A57) follows from the fact that if at least one of $\underline{b}_1(m_a), \underline{b}_2(m_a), \dots, \underline{b}_m(m_a)$ or $\underline{s}$ is not jointly typical with $\underline{y}$, then $(\underline{\mathbf{b}}(m_a), \underline{s}, \underline{y})$ is not jointly typical. Then, (A59) is a direct consequence of Definition 1, since $\underline{\mathbf{b}}(m_a) \in \mathcal{B}_{SY,\varepsilon}^n(B_1 B_2 \cdots B_m)$ for $m_a = 1, 2, \dots, M_a$.
For the error of the second kind, we can write:

$$\overline{P}_e^{(2)} \le \sum_{m_a} \frac{1}{M_a} \sum_{\underline{s}} p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{b}_1(k_a), \underline{y}) \in A_\varepsilon^n, \dots, (\underline{b}_m(k_a), \underline{y}) \in A_\varepsilon^n, (\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big]$$

$$= M_a \sum_{m_a} \sum_{\underline{s}} \frac{p(\underline{s})}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} \frac{p(\tilde{\underline{s}})}{M_a}\, \mathbb{1}\big[(\underline{b}_1(k_a), \underline{y}) \in A_\varepsilon^n, \dots, (\underline{b}_m(k_a), \underline{y}) \in A_\varepsilon^n, (\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big]$$

$$\le M_a 2^{6n\varepsilon} \sum_{m_a} \sum_{\underline{s}} p(\underline{\mathbf{b}}(m_a))\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}) \sum_{k_a \ne m_a} \sum_{\tilde{\underline{s}}} p(\underline{\mathbf{b}}(k_a))\, p(\tilde{\underline{s}})\, \mathbb{1}\big[(\underline{b}_1(k_a), \underline{y}) \in A_\varepsilon^n, \dots, (\underline{b}_m(k_a), \underline{y}) \in A_\varepsilon^n, (\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A61)}$$

$$\le M_a 2^{6n\varepsilon} \sum_{\underline{\mathbf{b}} \in \{0,1\}^{mn}} \sum_{\underline{s}} p(\underline{\mathbf{b}})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}, \underline{s}) \sum_{\tilde{\underline{\mathbf{b}}} \in \{0,1\}^{mn}} \sum_{\tilde{\underline{s}}} p(\tilde{\underline{\mathbf{b}}})\, p(\tilde{\underline{s}})\, \mathbb{1}\big[(\tilde{\underline{b}}_1, \underline{y}) \in A_\varepsilon^n, \dots, (\tilde{\underline{b}}_m, \underline{y}) \in A_\varepsilon^n, (\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A62)}$$

$$= M_a 2^{6n\varepsilon} \sum_{\underline{y}} p(\underline{y}) \sum_{\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}}} p(\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}})\, \mathbb{1}\big[(\tilde{\underline{b}}_1, \underline{y}) \in A_\varepsilon^n, \dots, (\tilde{\underline{b}}_m, \underline{y}) \in A_\varepsilon^n, (\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big] \qquad \mathrm{(A63)}$$

$$\le 2^{n(H(\mathbf{B})+7\varepsilon)}\, |A_\varepsilon^n(Y)|\, 2^{-n(H(Y)-\varepsilon)}\, |A_\varepsilon^n(B_1|\underline{y})|\, |A_\varepsilon^n(B_2|\underline{y})| \cdots |A_\varepsilon^n(B_m|\underline{y})|\, |A_\varepsilon^n(S|\underline{y})|\, 2^{-n(H(\mathbf{B},S)-\varepsilon)} \qquad \mathrm{(A64)}$$

$$\le 2^{n(H(\mathbf{B})+7\varepsilon)}\, 2^{n(H(Y)+\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)}\, 2^{n(H(B_1|Y)+H(B_2|Y)+\cdots+H(B_m|Y)+H(S|Y)+2(m+1)\varepsilon)}\, 2^{-n(H(\mathbf{B},S)-\varepsilon)} \qquad \mathrm{(A65)}$$

$$= 2^{n(H(\mathbf{B})-H(\mathbf{B},S)+H(B_1|Y)+H(B_2|Y)+\cdots+H(B_m|Y)+H(S|Y)+(12+2m)\varepsilon)}, \qquad \mathrm{(A66)}$$

where we used $\underline{\mathbf{b}}$ to denote $(\underline{b}_1, \underline{b}_2, \dots, \underline{b}_m)$ and $\tilde{\underline{\mathbf{b}}}$ to denote $(\tilde{\underline{b}}_1, \tilde{\underline{b}}_2, \dots, \tilde{\underline{b}}_m)$ in (A62). We also used $\mathbf{B}$ to denote $(B_1, B_2, \dots, B_m)$ in (A64). Finally, we simplified the notation by replacing $\sum_{\tilde{\underline{\mathbf{b}}} \in \{0,1\}^{mn}}$ by $\sum_{\tilde{\underline{\mathbf{b}}}}$ in (A63). Then:
(A61) follows, for $n$ sufficiently large and for $\underline{\mathbf{b}} \in \mathcal{B}_{SY,\varepsilon}^n(\mathbf{B})$, from $1/M_a \le 2^{3n\varepsilon}\, p(\underline{\mathbf{b}})$, which can be shown in a similar way as (A31) was derived.
(A62) follows from summing over $\underline{\mathbf{b}} \in \{0,1\}^{mn}$ instead of over $\underline{\mathbf{b}}(m_a) \in \mathcal{B}_\varepsilon^n$, and over $\tilde{\underline{\mathbf{b}}} \in \{0,1\}^{mn}$ instead of over $\underline{\mathbf{b}}(k_a) \in \mathcal{B}_\varepsilon^n$ for $k_a \ne m_a$.
(A63) is obtained by working out the summations over $\underline{b}_1, \underline{b}_2, \dots, \underline{b}_m$, and $\underline{s}$.
(A64) follows from $M_a = |\mathcal{B}_\varepsilon^n(\mathbf{B})| \le 2^{n(H(\mathbf{B})+\varepsilon)}$, i.e., the $\mathcal{B}$-typicality property P3, from (12), and from (17).
(A65) follows from (11) and (19).
The conclusion from (A66) is that for:

$$H(\mathbf{B}) < H(\mathbf{B}, S) - H(S|Y) - \sum_{i=1}^{m} H(B_i|Y) - (12+2m)\varepsilon = R_{\mathrm{BMD}}(p(\mathbf{b}, s)) - (12+2m)\varepsilon, \qquad \mathrm{(A67)}$$

the error probability of the second kind satisfies:

$$\overline{P}_e^{(2)} \le \varepsilon \qquad \mathrm{(A68)}$$

for $n$ large enough. Using (A60) and (A68) in (A14), we find that the total error probability averaged over all possible sign-codes satisfies $\overline{P}_e \le 2\varepsilon$ for $n$ large enough. This implies the existence of a sign-code with total error probability $P_e = \Pr\{\hat{M}_a \ne M_a\} \le 2\varepsilon$. This holds for all $\varepsilon > 0$, and thus, the rate:

$$R = H(\mathbf{B}) \le R_{\mathrm{BMD}}$$

is achievable with sign-coding and BMD, which concludes the proof of Theorem 3.

Appendix B.4. Proof of Theorem 4

For the error of the first kind, we can write:

$$\overline{P}_e^{(1)} = \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \frac{1}{2^{n_1}} \sum_{\underline{s}''} p(\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'')\, \mathbb{1}\Big[\bigvee_{i=1}^{m} \big((\underline{b}_i(m_a), \underline{y}) \notin A_\varepsilon^n\big) \vee \big((\underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big)\Big]$$

$$= \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \sum_{\underline{s}''} 2^{-n} \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'')\, \mathbb{1}\Big[\bigvee_{i=1}^{m} \big((\underline{b}_i(m_a), \underline{y}) \notin A_\varepsilon^n\big) \vee \big((\underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big)\Big] \qquad \mathrm{(A69)}$$

$$\le \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \sum_{\underline{s}''} \sum_{\underline{y}} p(\underline{s}'(m_s)\underline{s}'', \underline{y}|\underline{\mathbf{b}}(m_a))\, \mathbb{1}\big[(\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'', \underline{y}) \notin A_\varepsilon^n\big] \qquad \mathrm{(A70)}$$

$$= \sum_{m_a} \frac{1}{M_a} \Pr\big\{(\underline{\mathbf{b}}(m_a), \underline{S}, \underline{Y}) \notin A_\varepsilon^n \mid \underline{\mathbf{B}} = \underline{\mathbf{b}}(m_a)\big\} \le \sum_{m_a} \frac{\varepsilon}{M_a} \qquad \mathrm{(A71)}$$

$$= \varepsilon. \qquad \mathrm{(A72)}$$

Here, to obtain (A69), we used the fact that $\underline{S}''$ is uniform; more precisely, $p(\underline{s}'') = 2^{-n_2}$. Then, we used $2^{-n}\, p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') = p(\underline{s}'(m_s)\underline{s}'', \underline{y}|\underline{\mathbf{b}}(m_a))$ in (A70). Furthermore, (A70) also follows from the fact that if at least one of $\underline{b}_1(m_a), \underline{b}_2(m_a), \dots, \underline{b}_m(m_a)$ or $\underline{s}'(m_s)\underline{s}''$ is not jointly typical with $\underline{y}$, then $(\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'', \underline{y})$ is not jointly typical. Then, (A71) is a direct consequence of Definition 1, since $\underline{\mathbf{b}}(m_a) \in \mathcal{B}_{SY,\varepsilon}^n(B_1 B_2 \cdots B_m)$ for $m_a = 1, 2, \dots, M_a$.
For the error of the second kind, we can write:

$$\overline{P}_e^{(2)} \le \sum_{m_a} \frac{1}{M_a} \sum_{m_s} \frac{1}{2^{n_1}} \sum_{\underline{s}''} p(\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{(k_a,k_s) \ne (m_a,m_s)} \sum_{\tilde{\underline{s}}''} p(\tilde{\underline{s}}'')\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(k_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big]$$

$$= M_a 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{(k_a,k_s) \ne (m_a,m_s)} \sum_{\tilde{\underline{s}}''} \frac{2^{-n}}{M_a}\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(k_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] \qquad \mathrm{(A73)}$$

$$= M_a 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_a \ne m_a, k_s, \tilde{\underline{s}}''} \frac{2^{-n}}{M_a}\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(k_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] + 2^{n_1} \sum_{m_a, m_s, \underline{s}''} \frac{2^{-n}}{M_a} \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_s \ne m_s, \tilde{\underline{s}}''} 2^{-n}\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(m_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big], \qquad \mathrm{(A74)}$$

where (A73) follows from $n = n_1 + n_2$ and from the fact that $\underline{S}''$ is uniform; more precisely, $p(\underline{s}'') = 2^{-n_2}$. Then, (A74) is obtained by splitting $(k_a, k_s) \ne (m_a, m_s)$ into $\{k_a \ne m_a,\ k_s \text{ arbitrary}\}$ and $\{k_a = m_a,\ k_s \ne m_s\}$.
From (A74), we obtain:

$$\overline{P}_e^{(2)} \le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{m_a, m_s, \underline{s}''} p(\underline{\mathbf{b}}(m_a))\, p(\underline{s}'(m_s)\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_a \ne m_a, k_s, \tilde{\underline{s}}''} p(\underline{\mathbf{b}}(k_a))\, p(\underline{s}'(k_s)\tilde{\underline{s}}'')\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(k_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] + 2^{n_1} 2^{3n\varepsilon} \sum_{m_a, m_s, \underline{s}''} p(\underline{\mathbf{b}}(m_a))\, p(\underline{s}'(m_s)\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}(m_a), \underline{s}'(m_s)\underline{s}'') \sum_{k_s \ne m_s, \tilde{\underline{s}}''} p(\underline{s}'(k_s)\tilde{\underline{s}}'')\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i(m_a), \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\underline{s}'(k_s)\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] \qquad \mathrm{(A75)}$$

$$\le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{\mathbf{b}}, \underline{s}'\underline{s}''} p(\underline{\mathbf{b}})\, p(\underline{s}'\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}, \underline{s}'\underline{s}'') \sum_{\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}}'\tilde{\underline{s}}''} p(\tilde{\underline{\mathbf{b}}})\, p(\tilde{\underline{s}}'\tilde{\underline{s}}'')\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\tilde{\underline{b}}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}'\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{\mathbf{b}}, \underline{s}'\underline{s}''} p(\underline{\mathbf{b}})\, p(\underline{s}'\underline{s}'') \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}, \underline{s}'\underline{s}'') \sum_{\tilde{\underline{s}}'\tilde{\underline{s}}''} p(\tilde{\underline{s}}'\tilde{\underline{s}}'')\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}'\tilde{\underline{s}}'', \underline{y}) \in A_\varepsilon^n\big)\Big] \qquad \mathrm{(A76)}$$

$$= M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{\mathbf{b}}, \underline{s}} p(\underline{\mathbf{b}})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}, \underline{s}) \sum_{\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}}} p(\tilde{\underline{\mathbf{b}}})\, p(\tilde{\underline{s}})\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\tilde{\underline{b}}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big)\Big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{\mathbf{b}}, \underline{s}} p(\underline{\mathbf{b}})\, p(\underline{s}) \sum_{\underline{y}} p(\underline{y}|\underline{\mathbf{b}}, \underline{s}) \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big)\Big], \qquad \mathrm{(A77)}$$

where:
(A75) follows, for $n$ sufficiently large and for $\underline{\mathbf{b}} \in \mathcal{B}_{SY,\varepsilon}^n(\mathbf{B})$, from $1/M_a \le 2^{3n\varepsilon}\, p(\underline{\mathbf{b}})$ and from $p(\underline{s}'\underline{s}'') = 2^{-n}$.
(A76) follows from summing over $\underline{\mathbf{b}} \in \{0,1\}^{mn}$ instead of over $\underline{\mathbf{b}}(m_a) \in \mathcal{B}_\varepsilon^n$, and over $\tilde{\underline{\mathbf{b}}} \in \{0,1\}^{mn}$ instead of over $\underline{\mathbf{b}}(k_a) \in \mathcal{B}_\varepsilon^n$ for $k_a \ne m_a$. Moreover, it follows from summing over $\tilde{\underline{s}}' \in \mathcal{S}^{n_1}$ instead of over $\underline{s}'(k_s)$ for $k_s = 1, 2, \dots, M_s$ with $k_s \ne m_s$.
(A77) follows from substituting $\underline{s}$ for $\underline{s}'\underline{s}''$ and $\tilde{\underline{s}}$ for $\tilde{\underline{s}}'\tilde{\underline{s}}''$.
Finally, from (A77), we obtain:

$$\overline{P}_e^{(2)} \le M_a 2^{n_1} 2^{6n\varepsilon} \sum_{\underline{y}} p(\underline{y}) \sum_{\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}}} p(\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}})\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\tilde{\underline{b}}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big)\Big] + 2^{n_1} 2^{3n\varepsilon} \sum_{\underline{\mathbf{b}}, \underline{y}} p(\underline{\mathbf{b}}, \underline{y}) \sum_{\tilde{\underline{s}}} p(\tilde{\underline{s}})\, \mathbb{1}\Big[\bigwedge_{i=1}^{m} \big((\underline{b}_i, \underline{y}) \in A_\varepsilon^n\big) \wedge \big((\tilde{\underline{s}}, \underline{y}) \in A_\varepsilon^n\big)\Big] \qquad \mathrm{(A78)}$$

$$\le 2^{n(H(\mathbf{B})+\varepsilon)}\, 2^{n\gamma}\, 2^{6n\varepsilon}\, |A_\varepsilon^n(Y)|\, 2^{-n(H(Y)-\varepsilon)} \prod_{i=1}^{m} |A_\varepsilon^n(B_i|\underline{y})|\, |A_\varepsilon^n(S|\underline{y})|\, 2^{-n(H(B_1 B_2 \cdots B_m S)-\varepsilon)} + 2^{n\gamma}\, 2^{3n\varepsilon}\, |A_\varepsilon^n(Y)|\, 2^{-n(H(\mathbf{B},Y)-\varepsilon)}\, 2^{-n(H(S)-\varepsilon)} \prod_{i=1}^{m} |A_\varepsilon^n(B_i|\underline{y})|\, |A_\varepsilon^n(S|\underline{y})| \qquad \mathrm{(A79)}$$

$$\le 2^{n(H(\mathbf{B})+\varepsilon)}\, 2^{n\gamma}\, 2^{6n\varepsilon}\, 2^{n(H(Y)+\varepsilon)}\, 2^{-n(H(Y)-\varepsilon)} \prod_{i=1}^{m} 2^{n(H(B_i|Y)+2\varepsilon)}\, 2^{n(H(S|Y)+2\varepsilon)}\, 2^{-n(H(\mathbf{B},S)-\varepsilon)} + 2^{n\gamma}\, 2^{3n\varepsilon}\, 2^{n(H(Y)+\varepsilon)}\, 2^{-n(H(\mathbf{B},Y)-\varepsilon)}\, 2^{-n(H(S)-\varepsilon)} \prod_{i=1}^{m} 2^{n(H(B_i|Y)+2\varepsilon)}\, 2^{n(H(S|Y)+2\varepsilon)} \qquad \mathrm{(A80)}$$

$$= 2^{n\left(H(\mathbf{B})+\gamma+\sum_{i=1}^{m} H(B_i|Y)+H(S|Y)-H(\mathbf{B},S)+(12+2m)\varepsilon\right)} + 2^{n\left(\gamma+H(Y)-H(\mathbf{B},Y)-H(S)+\sum_{i=1}^{m} H(B_i|Y)+H(S|Y)+(8+2m)\varepsilon\right)}. \qquad \mathrm{(A81)}$$

Here, we substituted $n_1 = n\gamma$ in (A79). Then:
(A78) is obtained by working out the summations over $\underline{b}_1, \underline{b}_2, \dots, \underline{b}_m, \underline{s}$ in the first part and over $\underline{s}$ in the second part.
(A79) is obtained, for the first part, using $M_a = |\mathcal{B}_\varepsilon^n(\mathbf{B})| \le 2^{n(H(\mathbf{B})+\varepsilon)}$, i.e., the $\mathcal{B}$-typicality property P3, (12) for $p(\underline{y})$, and (17) for $p(\tilde{\underline{\mathbf{b}}}, \tilde{\underline{s}})$. For the second part, we used (12) for $p(\tilde{\underline{s}})$ and (17) for $p(\underline{\mathbf{b}}, \underline{y})$.
(A80) follows from (11) and (19).
The conclusion from (A81) is that for:

$$H(\mathbf{B}) + \gamma < R_{\mathrm{BMD}} - (12+2m)\varepsilon \qquad \mathrm{(A82)}$$

and for:

$$\gamma < H(\mathbf{B},Y) + H(S) - H(Y) - \sum_{i=1}^{m} H(B_i|Y) - H(S|Y) - (8+2m)\varepsilon, \qquad \mathrm{(A83)}$$

the error probability of the second kind satisfies:

$$\overline{P}_e^{(2)} \le \varepsilon \qquad \mathrm{(A84)}$$

for $n$ large enough. The second constraint (A83) is already implied by the first constraint (A82), since:

$$\gamma < H(\mathbf{B},Y) + H(S) - H(Y) - \sum_{i=1}^{m} H(B_i|Y) - H(S|Y) - (8+2m)\varepsilon \qquad \mathrm{(A85)}$$

$$= H(\mathbf{B},Y) + H(S) - H(Y) - \sum_{i=1}^{m} H(B_i|Y) - H(S|Y) + H(\mathbf{B},S) - H(\mathbf{B},S) - (8+2m)\varepsilon \qquad \mathrm{(A86)}$$

$$= H(\mathbf{B},Y) + H(S) - H(Y) + R_{\mathrm{BMD}} - H(\mathbf{B}) - H(S) - (8+2m)\varepsilon \qquad \mathrm{(A87)}$$

$$= H(\mathbf{B}|Y) + R_{\mathrm{BMD}} - H(\mathbf{B}) - (8+2m)\varepsilon, \qquad \mathrm{(A88)}$$

where in (A87), we used $H(\mathbf{B},S) = H(\mathbf{B}) + H(S)$, which holds since the signs are uniform and independent of the amplitude bits. The right-hand side of (A88) is not smaller than the right-hand side of (A82), since $H(\mathbf{B}|Y) \ge 0$ and $(12+2m)\varepsilon > (8+2m)\varepsilon$; hence, (A82) implies (A83).
Using (A72) and (A84) in (A14), we find that the total error probability averaged over all possible modified sign-codes satisfies $\overline{P}_e \le 2\varepsilon$ for $n$ large enough. This implies the existence of a modified sign-code with total error probability $P_e = \Pr\{(\hat{M}_a, \hat{M}_s) \ne (M_a, M_s)\} \le 2\varepsilon$. This holds for all $\varepsilon > 0$, and thus, the rate:

$$R = H(\mathbf{B}) + \gamma \le R_{\mathrm{BMD}} \qquad \mathrm{(A89)}$$

is achievable with modified sign-coding, which concludes the proof of Theorem 4.

References

1. Imai, H.; Hirakawa, S. A new multilevel coding method using error-correcting codes. IEEE Trans. Inf. Theory 1977, 23, 371–377.
2. Wachsmann, U.; Fischer, R.F.H.; Huber, J.B. Multilevel codes: Theoretical concepts and practical design rules. IEEE Trans. Inf. Theory 1999, 45, 1361–1391.
3. Ungerböck, G. Channel coding with multilevel/phase signals. IEEE Trans. Inf. Theory 1982, 28, 55–67.
4. Zehavi, E. 8-PSK trellis codes for a Rayleigh channel. IEEE Trans. Commun. 1992, 40, 873–884.
5. Caire, G.; Taricco, G.; Biglieri, E. Bit-interleaved coded modulation. IEEE Trans. Inf. Theory 1998, 44, 927–946.
6. IEEE Standard for Information Technology—Telecommunications and Information Exchange between Systems Local and Metropolitan Area Networks—Specific Requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications; IEEE Std 802.11-2016 (Revision of IEEE Std 802.11-2012); IEEE Standards Association: Piscataway, NJ, USA, 2016; pp. 1–3534.
7. Digital Video Broadcasting (DVB); 2nd Generation Framing Structure, Channel Coding and Modulation Systems for Broadcasting, Interactive Services, News Gathering and Other Broadband Satellite Applications (DVB-S2); ETSI Standard EN 302 307, Rev. 1.2.1; European Telecommunications Standards Institute: Valbonne, France, 2009.
8. Böcherer, G.; Steiner, F.; Schulte, P. Bandwidth efficient and rate-matched low-density parity-check coded modulation. IEEE Trans. Commun. 2015, 63, 4651–4665.
9. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
10. Buchali, F.; Steiner, F.; Böcherer, G.; Schmalen, L.; Schulte, P.; Idler, W. Rate adaptation and reach increase by probabilistically shaped 64-QAM: An experimental demonstration. J. Lightw. Technol. 2016, 34, 1599–1609.
11. Idler, W.; Buchali, F.; Schmalen, L.; Lach, E.; Braun, R.; Böcherer, G.; Schulte, P.; Steiner, F. Field trial of a 1 Tb/s super-channel network using probabilistically shaped constellations. J. Lightw. Technol. 2017, 35, 1399–1406.
12. Böcherer, G. Achievable rates for probabilistic shaping. arXiv 2018, arXiv:1707.01134.
13. Böcherer, G. Principles of Coded Modulation. Habilitation Thesis, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany, 2018.
14. Amjad, R.A. Information rates and error exponents for probabilistic amplitude shaping. In Proceedings of the 2018 IEEE Information Theory Workshop (ITW), Guangzhou, China, 25–29 November 2018.
15. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: New York, NY, USA, 1968.
16. Kramer, G. Topics in multi-user information theory. Found. Trends Commun. Inf. Theory 2008, 4, 265–444.
17. Kaplan, G.; Shamai, S. Information rates and error exponents of compound channels with application to antipodal signaling in a fading environment. AEÜ Archiv für Elektronik und Übertragungstechnik 1993, 47, 228–239.
18. Merhav, N.; Kaplan, G.; Lapidoth, A.; Shamai, S. On information rates for mismatched decoders. IEEE Trans. Inf. Theory 1994, 40, 1953–1967.
19. Szczecinski, L.; Alvarado, A. Bit-Interleaved Coded Modulation: Fundamentals, Analysis, and Design; John Wiley & Sons: Chichester, UK, 2015.
20. Martinez, A.; Guillén i Fàbregas, A.; Caire, G.; Willems, F.M.J. Bit-interleaved coded modulation revisited: A mismatched decoding perspective. IEEE Trans. Inf. Theory 2009, 55, 2756–2765.
21. Guillén i Fàbregas, A.; Martinez, A. Bit-interleaved coded modulation with shaping. In Proceedings of the 2010 IEEE Information Theory Workshop, Dublin, Ireland, 30 August–3 September 2010.
22. Alvarado, A.; Brännström, F.; Agrell, E. High SNR bounds for the BICM capacity. In Proceedings of the 2011 IEEE Information Theory Workshop, Paraty, Brazil, 16–20 October 2011.
23. Peng, L. Fundamentals of Bit-Interleaved Coded Modulation and Reliable Source Transmission. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 2012.
24. Böcherer, G. Probabilistic signal shaping for bit-metric decoding. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014.
25. Böcherer, G. Probabilistic signal shaping for bit-metric decoding. arXiv 2014, arXiv:1401.6190.
26. Böcherer, G. Achievable rates for shaped bit-metric decoding. arXiv 2016, arXiv:1410.8075.
27. Schulte, P.; Böcherer, G. Constant composition distribution matching. IEEE Trans. Inf. Theory 2016, 62, 430–434.
28. Fehenberger, T.; Millar, D.S.; Koike-Akino, T.; Kojima, K.; Parsons, K. Multiset-partition distribution matching. IEEE Trans. Commun. 2019, 67, 1885–1893.
29. Schulte, P.; Steiner, F. Divergence-optimal fixed-to-fixed length distribution matching with shell mapping. IEEE Wirel. Commun. Lett. 2019, 8, 620–623.
30. Gültekin, Y.C.; van Houtum, W.J.; Koppelaar, A.; Willems, F.M.J. Enumerative sphere shaping for wireless communications with short packets. IEEE Trans. Wirel. Commun. 2020, 19, 1098–1112.
31. Amjad, R.A. Information rates and error exponents for probabilistic amplitude shaping. arXiv 2018, arXiv:1802.05973.
32. Shulman, N.; Feder, M. Random coding techniques for nonrandom codes. IEEE Trans. Inf. Theory 1999, 45, 2101–2104.
33. Yeung, R. Information Theory and Network Coding; Springer: Boston, MA, USA, 2008.
Figure 1. Probabilistic amplitude shaping with transmission rate R = k/n + γ bit/1D.
Figure 2. The scope of the random coding experiments considered in this work and in [12,13,14].
Figure 3. Sign-coding structure: sign-coding (coder) is combined with amplitude shaping (shaper). SMD, symbol-metric decoding; BMD, bit-metric decoding.
Figure 4. Shaping layer of the random sign-coding setup with SMD.
Figure 5. Shaping layer of the random sign-coding setup with BMD for M-ASK.
Figure 6. Sign-coding with SMD for 4-ASK. All rates up to $C_{4\text{-ASK}} \approx 0.562$ bit/1D can be achieved with sign-coding. AIR, achievable information rate.
Table 1. Input alphabet and mapping function for 8-ASK.

  A  |  7    5    3    1    1    3    5    7
  S  | −1   −1   −1   −1   +1   +1   +1   +1
  X  | −7   −5   −3   −1   +1   +3   +5   +7
  B1 |  0    0    1    1    1    1    0    0
  B2 |  0    1    1    0    0    1    1    0
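Read column-wise, Table 1 specifies the labeling $(B_1, B_2) \mapsto A$ together with $X = S \cdot A$. A minimal lookup-based sketch of this mapping (ours; the function name is hypothetical):

```python
# 8-ASK mapping from Table 1: the amplitude bits (b1, b2) select A via a
# reflected (Gray) labeling, and the sign s in {-1, +1} gives X = s * A.
AMPLITUDE = {(0, 0): 7, (0, 1): 5, (1, 1): 3, (1, 0): 1}

def map_8ask(b1: int, b2: int, s: int) -> int:
    """Map amplitude bits (b1, b2) and sign s in {-1, +1} to an 8-ASK point."""
    return s * AMPLITUDE[(b1, b2)]

# Reproduces the columns of Table 1, e.g.:
assert map_8ask(0, 0, -1) == -7 and map_8ask(1, 0, 1) == 1
```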
