Article

Rate and Nearly-Lossless State over the Gilbert–Elliott Channel

Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2025, 27(5), 494; https://doi.org/10.3390/e27050494
Submission received: 26 March 2025 / Revised: 29 April 2025 / Accepted: 30 April 2025 / Published: 2 May 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The capacity of the Gilbert–Elliott channel is calculated for a setting in which the state sequence is revealed to the encoder and is, along with the transmitted message, to be conveyed to the receiver with a vanishing symbol error rate. Said capacity does not depend on whether the state sequence is provided to the encoder strictly causally, causally, or noncausally. It can be achieved using a Block-Markov coding scheme with backward decoding.

1. Introduction

The Gilbert–Elliott channel [1,2,3,4] is a binary-input, binary-output finite-state channel [5,6,7,8], which was proposed as a simple model for digital communications with bursty errors. Its state sequence is a stationary two-state Markov process whose Time-$i$ state $S_i$ takes values in the set $\mathcal{S} = \{0, 1\}$. When the Time-$i$ state $S_i$ is 0, the Time-$i$ channel output $Y_i$ is the result of feeding the Time-$i$ input $x_i$ through a Binary Symmetric Channel (BSC) of crossover probability $\epsilon_0$; when $S_i$ is 1, the output $Y_i$ is the result of feeding $x_i$ through a BSC of crossover probability $\epsilon_1$. Alternatively, we can describe the output sequence as being the componentwise mod-2 addition of the input sequence with a noise sequence $\{Z_i\}$, where the latter is a binary two-state Hidden Markov process.
The capacity of the Gilbert–Elliott channel is achieved by having the input sequence comprise Independent and Identically Distributed (IID) random bits. It is given, in bits, by $1 - H(\{Z_i\})$, where $H(\{Z_i\})$ denotes the entropy rate of the noise $\{Z_i\}$ [3]. If the state sequence is revealed to the decoder, then capacity is still achieved by the above input distribution, but it now equals the weighted average of the capacities of the two BSCs, namely, $\pi_0 (1 - H_b(\epsilon_0)) + \pi_1 (1 - H_b(\epsilon_1))$. Here, $\pi_1 = 1 - \pi_0$ denotes the probability that $S_i$ is 1, and $H_b(\cdot)$ is the binary entropy function. Throughout this paper, logarithms are to base 2, and information is measured in bits.
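As a numerical illustration (not part of the original paper, and with placeholder parameter values), the following Python snippet evaluates the decoder-side-state capacity $\pi_0(1 - H_b(\epsilon_0)) + \pi_1(1 - H_b(\epsilon_1))$:

from math import log2

def H_b(p):
    # binary entropy function in bits; H_b(0) = H_b(1) = 0
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# illustrative parameters (not from the paper)
eps0, eps1 = 0.01, 0.30   # crossover probabilities in states 0 and 1
pi0 = 0.8                 # stationary probability of state 0
pi1 = 1 - pi0

capacity_state_at_decoder = pi0 * (1 - H_b(eps0)) + pi1 * (1 - H_b(eps1))
print(capacity_state_at_decoder)   # roughly 0.76 bits per channel use for these values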
Here, we study the case where the state sequence is revealed not to the decoder but to the encoder. We do not, however, seek the Shannon capacity but rather the “rate-and-nearly-lossless-state” capacity, where the decoder wishes to recover not only the transmitted message but also the state sequence. We thus require that the state sequence be conveyed to the decoder with a vanishing symbol error rate. This requirement is weaker than the requirement that the probability of the receiver correctly recovering the entire state sequence tends to one. Neither do we consider a general (nonzero) distortion constraint, as studied for memoryless channels in [9,10].
We solve for this capacity and show that it does not depend on whether the state sequence is provided to the encoder strictly causally, causally, or noncausally. Moreover, it can be achieved using a Block-Markov coding scheme with backward decoding.

2. The Gilbert–Elliott Channel

The Gilbert–Elliott channel has binary input, output, and state alphabets: $\mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0, 1\}$. The evolution of the state is unaffected by the channel inputs: it forms a stationary time-homogeneous Markov chain of kernel
$$\Pr\bigl[S_i = s_i \bigm| S^{i-1} = s^{i-1}\bigr] = \Pr\bigl[S_i = s_i \bigm| S_{i-1} = s_{i-1}\bigr] \tag{1}$$
and stationary distribution
$$P_S = (\pi_0, \pi_1), \tag{2}$$
by which we mean that
$$P_S(1) = 1 - P_S(0) = \pi_1 = 1 - \pi_0. \tag{3}$$
Above and throughout, we use $A^i$ to denote $A_1, \ldots, A_i$, and we use $A_j^i$ to denote $A_j, \ldots, A_i$. We denote the entropy rate of the state sequence by $H(\{S_i\})$. It is given explicitly by [11]
$$H(\{S_i\}) = H(S_2 \mid S_1). \tag{4}$$
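As a side computation (an illustrative sketch, not from the paper; the transition probabilities g and b are placeholders), the stationary distribution and the entropy rate in (4) can be obtained from a two-state kernel as follows:

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# illustrative kernel: Pr[S_{i+1}=1 | S_i=0] = g and Pr[S_{i+1}=0 | S_i=1] = b
g, b = 0.1, 0.4
pi0, pi1 = b / (g + b), g / (g + b)          # stationary distribution of the chain
entropy_rate = pi0 * H_b(g) + pi1 * H_b(b)   # H({S_i}) = H(S_2 | S_1), in bits
print(pi0, pi1, entropy_rate)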
Given $S = 0$, the channel from $X$ to $Y$ is a BSC with crossover probability $\epsilon_0$, whereas, given $S = 1$, it is a BSC with crossover probability $\epsilon_1$. Here, $\epsilon_0, \epsilon_1 \in [0, 1]$ are arbitrary known constants. Thus, if we define
$$W(1 - x \mid x, 0) = 1 - W(x \mid x, 0) = \epsilon_0 \tag{5}$$
and
$$W(1 - x \mid x, 1) = 1 - W(x \mid x, 1) = \epsilon_1, \tag{6}$$
then we can express the behavior of the channel given the state as
$$\Pr\bigl[Y_i = y_i \bigm| X^i = x^i,\; Y^{i-1} = y^{i-1},\; S^i = s^i\bigr] = W(y_i \mid x_i, s_i). \tag{7}$$
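For concreteness, here is a minimal simulation of the channel law (7) (a sketch under assumed placeholder parameters, not the authors' code): the state evolves according to the two-state chain, and in state s the input bit is flipped with probability eps[s].

import random

def simulate(n, g=0.1, b=0.4, eps=(0.05, 0.4), seed=0):
    rng = random.Random(seed)
    pi1 = g / (g + b)
    s = 1 if rng.random() < pi1 else 0         # start in the stationary distribution
    states, inputs, outputs = [], [], []
    for _ in range(n):
        x = rng.getrandbits(1)                 # IID Bernoulli(1/2) input bit
        z = 1 if rng.random() < eps[s] else 0  # noise bit Z_i given the current state
        states.append(s)
        inputs.append(x)
        outputs.append(x ^ z)                  # Y_i = X_i XOR Z_i
        # state transition: leave state 0 w.p. g, leave state 1 w.p. b
        s = 1 - s if rng.random() < (g if s == 0 else b) else s
    return states, inputs, outputs

print(simulate(10))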

3. The Rate-and-Nearly-Lossless-State Capacity

When discussing rate-$R$ blocklength-$n$ communications over the Gilbert–Elliott channel, we consider the message set $\mathcal{M} = \{1, \ldots, 2^{nR}\}$ with $2^{nR}$ messages, one of which is to be conveyed to the receiver. The latter observes the output sequence $Y^n$ and attempts to recover the transmitted message $m$ and the state sequence $S^n$. The decoder is thus specified by a function
$$\phi \colon \mathcal{Y}^n \to \mathcal{M} \times \mathcal{S}^n, \qquad y^n \mapsto (\hat{m}, \hat{s}^n). \tag{8}$$
As to the encoder, its structure depends on the manner in which the state information is revealed to it. In the strictly causal setting, the Time-$i$ channel input may depend not only on the transmitted message but also on the past states; it is therefore denoted $X_i(m, S^{i-1})$. The encoder is thus specified by $n$ functions
$$f_i \colon \mathcal{M} \times \mathcal{S}^{i-1} \to \mathcal{X}, \qquad i = 1, \ldots, n, \tag{9}$$
with $X_i(m, S^{i-1})$ being $f_i(m, S^{i-1})$. In the causal case, the Time-$i$ channel input is denoted $X_i(m, S^i)$, and the encoder is specified by $n$ functions
$$f_i \colon \mathcal{M} \times \mathcal{S}^i \to \mathcal{X}, \qquad i = 1, \ldots, n, \tag{10}$$
with $X_i(m, S^i)$ being $f_i(m, S^i)$. Finally, in the noncausal case, the Time-$i$ channel input is denoted $X_i(m, S^n)$, and the encoder is specified by one function
$$f \colon \mathcal{M} \times \mathcal{S}^n \to \mathcal{X}^n, \tag{11}$$
with $X_i(m, S^n)$ being the $i$-th component of $f(m, S^n)$.
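The three settings differ only in how much of the state sequence each encoding function may see. The following type sketch (purely illustrative; the names are ours, not the paper's) makes the distinction explicit:

from typing import Callable, Sequence, Tuple

Bit = int  # inputs, outputs, and states are bits

# strictly causal: f_i(m, s^{i-1}) sees the message and the past states only
StrictlyCausalEncoder = Callable[[int, Sequence[Bit]], Bit]
# causal: f_i(m, s^i) may additionally see the current state
CausalEncoder = Callable[[int, Sequence[Bit]], Bit]
# noncausal: a single map f(m, s^n) producing the entire input sequence
NoncausalEncoder = Callable[[int, Sequence[Bit]], Sequence[Bit]]
# decoder phi(y^n): returns the message estimate and the state-sequence estimate
Decoder = Callable[[Sequence[Bit]], Tuple[int, Sequence[Bit]]]

In Python the strictly causal and causal aliases coincide; the distinction lies only in which prefix of the state sequence is passed to $f_i$.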
We refer to an encoder/decoder pair as a “coding scheme”. The probability of error associated with a given coding scheme and a given message $m$ is $\Pr[\hat{M} \neq m]$, calculated when the scheme’s encoder is used to transmit Message $m$ and the scheme’s decoder is used by the receiver. The average probability of error associated with a coding scheme is the arithmetic average of the probabilities of error associated with the different messages. It can be expressed as
$$P_e \triangleq \Pr[\hat{M} \neq M] \tag{12}$$
when the transmitted message $M$ is drawn equiprobably from $\mathcal{M}$.
The symbol error rate, or (expected) Hamming distortion, in reconstructing the state sequence is
$$\beta \triangleq \frac{1}{n} \sum_{i=1}^{n} \Pr[\hat{S}_i \neq S_i], \tag{13}$$
again computed when the transmitted message $M$ is drawn equiprobably from $\mathcal{M}$.
In all cases, we say that a rate $R$ is achievable if there exists a sequence of coding schemes indexed by the blocklength for which $P_e$ and $\beta$ both tend to zero. The capacity is defined as the supremum of the achievable rates, with the understanding that, if no positive rate is achievable, then capacity is zero.
Our main result is the following theorem:
Theorem 1.
Irrespective of whether the state information is provided to the encoder strictly causally, causally, or noncausally, if
$$1 - \bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr) - H(\{S_i\}) \tag{14}$$
is positive, then it equals the rate-and-nearly-lossless-state capacity of the Gilbert–Elliott channel; else, said capacity is zero.
The proof is provided in the next section. Expression (14) can be interpreted as the result of subtracting the optimal average description length of the state sequence from the capacity of the channel when the receiver is cognizant of the state.
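As a numerical illustration of Expression (14) (a sketch with placeholder parameters, not values from the paper):

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

g, b = 0.1, 0.4                      # illustrative state transition probabilities
eps0, eps1 = 0.01, 0.30              # illustrative crossover probabilities
pi0, pi1 = b / (g + b), g / (g + b)

state_rate = pi0 * H_b(g) + pi1 * H_b(b)                # H({S_i}) = H(S_2 | S_1)
expr14 = 1 - (pi0 * H_b(eps0) + pi1 * H_b(eps1)) - state_rate
print(expr14)    # about 0.19 here; when positive, this is the capacity of Theorem 1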

4. Proof of Theorem 1

4.1. Converse

We prove the converse part of the theorem under the assumption that the state sequence is provided to the encoder noncausally; since the noncausal setting is the most informative, the converse then also holds in the causal and strictly causal settings. Let $M$ be uniform over $\mathcal{M}$. We have
$$\begin{aligned}
H(S^n \mid M, Y^n) &\le H(S^n \mid \hat{S}^n) && (15) \\
&\le \sum_{i=1}^{n} H(S_i \mid \hat{S}_i) && (16) \\
&\le \sum_{i=1}^{n} H_b\bigl(\Pr[\hat{S}_i \ne S_i]\bigr) && (17) \\
&\le n\, H_b(\beta). && (18)
\end{aligned}$$
Here, (15) holds because $\hat{S}^n$ is a function of $Y^n$ and hence also of $(M, Y^n)$; (17) because $\hat{S}_i$ is binary and because conditioning reduces entropy; and (18) because $H_b(\cdot)$ is concave and by recalling the definition of $\beta$ in (13). Note that $H_b(\beta)$ tends to zero when $\beta$ tends to zero.
Since the decoder needs to recover $M$ with high probability, Fano’s inequality implies the existence of some sequence $\{\epsilon_n\}$ tending to zero with the blocklength such that
$$n (R - \epsilon_n) \le I(M; Y^n). \tag{19}$$
From these two inequalities, we obtain
$$\begin{aligned}
n\bigl(R - \epsilon_n - H_b(\beta)\bigr) &\le I(M; Y^n) - H(S^n \mid M, Y^n) && (20) \\
&= I(M; Y^n, S^n) - I(M; S^n \mid Y^n) - H(S^n \mid M, Y^n) && (21) \\
&= I(M; Y^n, S^n) - H(S^n \mid Y^n) && (22) \\
&= I(M; Y^n \mid S^n) - H(S^n \mid Y^n) && (23) \\
&= H(Y^n \mid S^n) - H(S^n \mid Y^n) - H(Y^n \mid S^n, M) && (24) \\
&= H(Y^n) - I(Y^n; S^n) - \bigl(H(S^n) - I(Y^n; S^n)\bigr) - H(Y^n \mid S^n, M) && (25) \\
&= H(Y^n) - H(S^n) - H(Y^n \mid S^n, M). && (26)
\end{aligned}$$
Note that, above, we only used the chain rule and the fact that $M$ and $S^n$ are independent. The three terms in the last line can be bounded or simplified as follows:
$$\begin{aligned}
H(Y^n) &\le n, && (27) \\
H(S^n) &= H(S_1) + (n-1)\, H(\{S_i\}), && (28) \\
H(Y^n \mid S^n, M) &= H(Y^n \mid S^n, M, X^n) && (29) \\
&= H(Y^n \mid S^n, X^n) && (30) \\
&= \sum_{i=1}^{n} H(Y_i \mid S_i, X_i) && (31) \\
&= n\bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr). && (32)
\end{aligned}$$
The converse now follows from (26), (27), (28), and (32) upon dividing by $n$ and letting $n$ tend to infinity, so that $\epsilon_n$, $H_b(\beta)$, and $H(S_1)/n$ all vanish.
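Step (28) is the chain rule for the stationary Markov chain; the following brute-force check (illustrative only, with placeholder parameters) confirms it numerically for a small blocklength:

from itertools import product
from math import log2

g, b = 0.1, 0.4                         # illustrative transition probabilities
pi = (b / (g + b), g / (g + b))
K = ((1 - g, g), (b, 1 - b))            # K[s][t] = Pr[S_{i+1} = t | S_i = s]

def H(p_list):
    return -sum(p * log2(p) for p in p_list if p > 0)

n = 5
joint = []
for seq in product((0, 1), repeat=n):   # enumerate all state sequences of length n
    p = pi[seq[0]]
    for s, t in zip(seq, seq[1:]):
        p *= K[s][t]
    joint.append(p)

lhs = H(joint)                                               # H(S^n) by enumeration
rhs = H(pi) + (n - 1) * (pi[0] * H(K[0]) + pi[1] * H(K[1]))  # H(S_1) + (n-1) H(S_2|S_1)
print(lhs, rhs)                                              # the two values agree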

4.2. Direct Part

For the direct part of the theorem, we assume that the state sequence is revealed to the encoder strictly causally and consider a Block-Markov coding scheme with backward decoding. Consider $b$ blocks, each of $k$ channel uses. Label the length-$k$ typical state sequences $1, \ldots, 2^{k (H(\{S_i\}) + \epsilon)}$. For each block, randomly generate a codebook with $2^{k \tilde{R}}$ independent codewords (with $\tilde{R}$ to be specified later), each having $k$ independent Bernoulli$(1/2)$ components. In the first block, use the entire codebook to send a message of $k \tilde{R}$ bits. As to the transmission in Block $i$ for $i \ge 2$, first check whether or not the state sequence in the previous block, namely, $(s^k)_{i-1}$, was typical. If it was, set $\ell_{i-1}$ to be the label that is assigned to it; else, set $\ell_{i-1} = 1$. Use the generated codebook to send both $\ell_{i-1}$ and $m_i$, the latter consisting of $k \bigl(\tilde{R} - H(\{S_i\}) - \epsilon\bigr)$ information bits, so that the total number of bits matches the size of the codebook.
After completing the transmission in all $b$ blocks, add one extra block to transmit $\ell_b$. To this end, the transmitter—ignoring any information about the states in this extra block—uses the Gilbert–Elliott channel to send $\ell_b$ (which we view as a message) and nothing else. The length of the extra block may be larger than $k$, but—as long as the Shannon capacity of our channel is positive—this will not affect the overall rate when we choose $b$ to be very large. We will address this caveat shortly. However, first, we discuss the decoding.
To decode, we begin with the last block to decode $\ell_b$ and thus recover $(s^k)_b$. This task will be accomplished successfully provided that the last, extra block is sufficiently long. With the state information $(s^k)_b$ at hand, we then decode both $\ell_{b-1}$ and $m_b$ from Block $b$. Both can be decoded correctly with high probability (as $k$ grows large) provided that
$$\tilde{R} < I(X_i; Y_i \mid S_i) = 1 - \bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr). \tag{33}$$
We continue this procedure backwards: by the time we get to decoding Block $i$, we will have already reliably recovered $(s^k)_i$ while decoding Block $i+1$, so we can use it as decoder-side information. The overall information rate—when $b$ tends to infinity—approaches
$$\tilde{R} - H(\{S_i\}) - \epsilon, \tag{34}$$
which can indeed be made arbitrarily close to (14).
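The rate bookkeeping behind (34) can be spelled out as follows (an illustrative sketch; all numerical values are placeholders):

def overall_rate(R_tilde, H_S, eps, k, b, extra_block_len):
    # Block 1 carries k*R_tilde message bits; Blocks 2..b each carry
    # k*(R_tilde - H_S - eps) fresh message bits (the remainder indexes the
    # previous block's state sequence); the extra block carries no message bits.
    message_bits = k * R_tilde + (b - 1) * k * (R_tilde - H_S - eps)
    channel_uses = b * k + extra_block_len
    return message_bits / channel_uses

R_tilde, H_S, eps = 0.76, 0.57, 0.01   # placeholder values
k = 1000
for b in (10, 100, 1000, 10000):
    print(b, overall_rate(R_tilde, H_S, eps, k, b, extra_block_len=5 * k))
# as b grows, the rate approaches R_tilde - H_S - eps, i.e., Expression (34)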
We have now established that, provided that $\tilde{R}$ is appropriately chosen, as $k$ grows large, the probability of the decoder correctly decoding both the message and $S^{kb}$ tends to one. It only remains to note that this also means that the symbol error rate $\beta$ is small. Indeed, having $\hat{S}^{kb} = S^{kb}$ with high probability guarantees that the symbol error rate in the first $b$ blocks is close to zero, whereas the influence of the extra block (whose symbol error rate will typically be high, since we do not attempt to decode the states there) becomes negligible when we choose $b$ to be large.
We now return to the caveat regarding the Shannon capacity and the extra block. We need to show that the Shannon capacity is positive whenever (14) is positive. (If (14) is not positive, there is no need for a direct part.) Recall that the Shannon capacity of the Gilbert–Elliott channel without any state information is $1 - H(\{Z_i\})$, with $\{Z_i\}$ denoting the noise random variables, which form a Hidden Markov process. This Shannon capacity is positive unless $H(\{Z_i\}) = 1$ bit. Our concern regarding the caveat is thus only when $H(\{Z_i\})$ is 1.
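The entropy rate $H(\{Z_i\})$ of the hidden Markov noise process has no simple closed form; the following sketch (illustrative, with placeholder parameters) estimates it by evaluating $-\tfrac{1}{n}\log_2 P(Z^n)$ for a long simulated noise sequence via the forward recursion, which converges to the entropy rate by the Shannon–McMillan–Breiman theorem:

import random
from math import log2

g, b = 0.1, 0.4          # illustrative state transition probabilities
eps = (0.01, 0.30)       # illustrative crossover probabilities
pi = (b / (g + b), g / (g + b))
K = ((1 - g, g), (b, 1 - b))

def estimate_entropy_rate(n, seed=0):
    rng = random.Random(seed)
    s = 0 if rng.random() < pi[0] else 1   # initial state from the stationary law
    alpha = list(pi)                       # alpha[t] = Pr[S_i = t | Z^{i-1}]
    log_prob = 0.0
    for _ in range(n):
        z = 1 if rng.random() < eps[s] else 0
        # probability of the new noise bit given the past noise bits
        p_z = sum(alpha[t] * (eps[t] if z else 1 - eps[t]) for t in (0, 1))
        log_prob += log2(p_z)
        # forward update: condition on z, then propagate through the kernel
        post = [alpha[t] * (eps[t] if z else 1 - eps[t]) / p_z for t in (0, 1)]
        alpha = [post[0] * K[0][t] + post[1] * K[1][t] for t in (0, 1)]
        s = 1 - s if rng.random() < (g if s == 0 else b) else s
    return -log_prob / n

print(estimate_entropy_rate(200_000))   # Monte Carlo estimate of H({Z_i}) in bits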
In Appendix A, we show the following:
Proposition 1.
If $H(\{Z_i\}) = 1$, then
1. $\{Z_i\}$ are IID Bernoulli$(1/2)$, and
2. either $\{S_i\}$ are IID or $\epsilon_0 = \epsilon_1 = 1/2$.
If $\{Z_i\}$ are IID Bernoulli$(1/2)$ and $\{S_i\}$ are IID, then (14) cannot be positive because, in this case, it equals
$$H(Z_i) - H(Z_i \mid S_i) - H(S_i) = -H(S_i \mid Z_i). \tag{35}$$
If $\epsilon_0 = \epsilon_1 = 1/2$, then (14) is also (trivially) not positive. Hence, indeed, the Shannon capacity is positive whenever (14) is positive.
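A quick numerical sanity check of (35) (illustrative; the parameters are placeholders chosen so that $Z_i$ is Bernoulli$(1/2)$):

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

pi0, pi1 = 0.6, 0.4
eps0 = 0.2
eps1 = (0.5 - pi0 * eps0) / pi1      # chosen so that Pr[Z_i = 1] = 1/2

H_Z = 1.0                                         # H(Z_i) for a Bernoulli(1/2) bit
H_Z_given_S = pi0 * H_b(eps0) + pi1 * H_b(eps1)   # H(Z_i | S_i)
H_S = H_b(pi1)                                    # H(S_i)

# H(S_i | Z_i) via Bayes' rule
p_s1_given_z1 = eps1 * pi1 / 0.5
p_s1_given_z0 = (1 - eps1) * pi1 / 0.5
H_S_given_Z = 0.5 * H_b(p_s1_given_z1) + 0.5 * H_b(p_s1_given_z0)

print(H_Z - H_Z_given_S - H_S, -H_S_given_Z)   # the two values coincide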

Author Contributions

Writing—original draft preparation, A.L. and L.W.; writing—review and editing, A.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Swiss National Science Foundation (SNSF) under Grant 200021-215090.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BSC   Binary Symmetric Channel
IID   Independent and Identically Distributed

Appendix A. Proof of Proposition 1

In this appendix, we prove Proposition 1. We start with the first part, which we formally state as follows:
Lemma A1.
If $\{Z_i\}$ is a stationary process taking values in the binary set $\{0, 1\}$, and if its entropy rate is 1, then it must be IID Bernoulli$(1/2)$.
Proof. 
For any positive integers $\nu$ and $\ell$,
$$\begin{aligned}
\frac{1}{\ell \nu} H\bigl(Z_1^{\ell \nu}\bigr) &= \frac{1}{\ell \nu} H\bigl(Z_1^{\nu}, \ldots, Z_{(\ell - 1)\nu + 1}^{\ell \nu}\bigr) && (A1) \\
&\le \frac{1}{\ell \nu} \sum_{i=1}^{\ell} H\bigl(Z_{(i-1)\nu + 1}^{i \nu}\bigr) && (A2) \\
&= \frac{1}{\ell \nu}\, \ell\, H\bigl(Z_1^{\nu}\bigr) && (A3) \\
&= \frac{1}{\nu} H\bigl(Z_1^{\nu}\bigr), && (A4)
\end{aligned}$$
where (A3) follows from the stationarity hypothesis. Letting $\ell$ tend to infinity, so that the left-hand side of (A1) converges to the entropy rate, i.e., to 1, we obtain
$$1 \le \frac{1}{\nu} H\bigl(Z_1^{\nu}\bigr). \tag{A5}$$
We obtain the reverse inequality by noting that $Z_1^{\nu}$ takes values in a set of cardinality $2^{\nu}$. We thus conclude that $Z_1^{\nu}$ is equiprobably distributed, and its components are thus IID Bernoulli$(1/2)$. Since $\nu$ can be any positive integer, we conclude that $\{Z_i\}$ is an IID Bernoulli$(1/2)$ process. □
We next show the remaining part of Proposition 1:
Lemma A2.
If $\{Z_i\}$ are IID Bernoulli$(1/2)$, and if $\epsilon_0$ and $\epsilon_1$ are not both equal to $1/2$, then $\{S_i\}$ must be IID.
Proof. 
We assume $\pi_0, \pi_1 \in (0, 1)$; otherwise, $\{S_i\}$ are all deterministic and trivially IID. We also assume, without loss of generality, that
$$\epsilon_0 < \tfrac{1}{2}. \tag{A6}$$
For $Z_i$ to be Bernoulli$(1/2)$, we must then have
$$\epsilon_1 > \tfrac{1}{2}. \tag{A7}$$
It then follows that the equation
$$\pi \epsilon_0 + (1 - \pi)\, \epsilon_1 = \tfrac{1}{2} \tag{A8}$$
has a unique solution in $\pi$, which must be $\pi_0$ as in the stationary distribution (2).
Because $Z_i$ is Bernoulli$(1/2)$ and $S_i$ is Bernoulli$(\pi_1)$, we can use Bayes’s rule to express the conditional distribution of $S_i$ given $Z_i = 1$ as
$$\begin{aligned}
P_{S_i \mid Z_i = 1} &= \bigl(2 \epsilon_0 \pi_0,\; 2 \epsilon_1 \pi_1\bigr) && (A9) \\
&\ne (\pi_0, \pi_1), && (A10)
\end{aligned}$$
where the inequality follows from (A6) and (A7). Let $M$ denote the (matrix form of the) Markov kernel $P_{S_{i+1} \mid S_i}$. We next assume that $M$ is nonsingular and show that this leads to a contradiction. If $M$ is nonsingular, then
$$\begin{aligned}
P_{S_{i+1} \mid Z_i = 1} &= \bigl(2 \epsilon_0 \pi_0,\; 2 \epsilon_1 \pi_1\bigr)\, M && (A11) \\
&\ne (\pi_0, \pi_1)\, M && (A12) \\
&= (\pi_0, \pi_1), && (A13)
\end{aligned}$$
where the inequality follows from (A10) and the nonsingularity of $M$, and the last equality holds because $(\pi_0, \pi_1)$ is the stationary distribution corresponding to $M$.
Recalling that (A8) is only satisfied when $\pi$ equals $\pi_0$, we conclude that
$$P_{Z_{i+1} \mid Z_i = 1}(1) = P_{S_{i+1} \mid Z_i = 1}(0)\, \epsilon_0 + P_{S_{i+1} \mid Z_i = 1}(1)\, \epsilon_1 \ne \tfrac{1}{2}, \tag{A14}$$
which contradicts our assumption that $\{Z_i\}$ are IID Bernoulli$(1/2)$.
We thus conclude that the Markov kernel $M$ must be singular, which, in the $2 \times 2$ case, means that its two rows are identical, and that $\{S_i\}$ are hence IID. □
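A small numerical instantiation of this argument (illustrative; the kernel and crossover probabilities are placeholders satisfying (A6)–(A8)) shows the contradiction explicitly:

g, b = 0.1, 0.4
pi0, pi1 = b / (g + b), g / (g + b)    # stationary distribution (0.8, 0.2)
M = ((1 - g, g), (b, 1 - b))           # nonsingular kernel: its rows differ

eps0 = 0.4                             # < 1/2, as in (A6)
eps1 = (0.5 - pi0 * eps0) / pi1        # > 1/2, making pi0*eps0 + pi1*eps1 = 1/2

# Bayes step (A9): distribution of S_i given Z_i = 1
p_s_given_z1 = (2 * eps0 * pi0, 2 * eps1 * pi1)
# propagate through the kernel, as in (A11)
p_s_next_given_z1 = tuple(
    sum(p_s_given_z1[s] * M[s][t] for s in (0, 1)) for t in (0, 1)
)
# (A14): probability that the next noise bit is 1 given Z_i = 1
p_z_next = p_s_next_given_z1[0] * eps0 + p_s_next_given_z1[1] * eps1
print(p_z_next)   # 0.54 here, not 1/2, so {Z_i} cannot be IID Bernoulli(1/2)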

References

  1. Gilbert, E.N. Capacity of a burst-noise channel. Bell Syst. Tech. J. 1960, 39, 1253–1265.
  2. Elliott, E.O. Estimates of error rates for codes on burst-noise channels. Bell Syst. Tech. J. 1963, 42, 1977–1997.
  3. Mushkin, M.; Bar-David, I. Capacity and coding for the Gilbert-Elliott channels. IEEE Trans. Inf. Theory 1989, 35, 1277–1290.
  4. Han, Y.; Guillén i Fàbregas, A. Fixed-memory capacity bounds for the Gilbert-Elliott channel. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 155–159.
  5. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: Hoboken, NJ, USA, 1968.
  6. Goldsmith, A.; Varaiya, P. Capacity, mutual information, and coding for finite-state Markov channels. IEEE Trans. Inf. Theory 1996, 42, 868–886.
  7. Permuter, H.H.; Weissman, T.; Goldsmith, A.J. Finite state channels with time-invariant deterministic feedback. IEEE Trans. Inf. Theory 2009, 55, 644–662.
  8. Shrader, B.; Permuter, H. Feedback capacity of the compound channel. IEEE Trans. Inf. Theory 2009, 55, 3629–3644.
  9. Choudhuri, C.; Kim, Y.H.; Mitra, U. Causal state communication. IEEE Trans. Inf. Theory 2013, 59, 3709–3719.
  10. Bross, S.I.; Lapidoth, A. The rate-and-state capacity with feedback. IEEE Trans. Inf. Theory 2018, 64, 1893–1918.
  11. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.