Article

Rate and Nearly-Lossless State over the Gilbert–Elliott Channel

Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2025, 27(5), 494; https://doi.org/10.3390/e27050494
Submission received: 26 March 2025 / Revised: 29 April 2025 / Accepted: 30 April 2025 / Published: 2 May 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The capacity of the Gilbert–Elliott channel is calculated for a setting in which the state sequence is revealed to the encoder and is, along with the transmitted message, to be conveyed to the receiver with a vanishing symbol error rate. Said capacity does not depend on whether the state sequence is provided to the encoder strictly causally, causally, or noncausally. It can be achieved using a Block-Markov coding scheme with backward decoding.

1. Introduction

The Gilbert–Elliott channel [1,2,3,4] is a binary-input, binary-output finite-state channel [5,6,7,8], which was proposed as a simple model for digital communications with bursty errors. Its state sequence is a stationary two-state Markov process whose Time-$i$ state $S_i$ takes values in the set $\mathcal{S} = \{0, 1\}$. When the Time-$i$ state $S_i$ is 0, the Time-$i$ channel output $Y_i$ is the result of feeding the Time-$i$ input $x_i$ through a Binary Symmetric Channel (BSC) of crossover probability $\epsilon_0$; when $S_i$ is 1, the output $Y_i$ is the result of feeding $x_i$ through a BSC of crossover probability $\epsilon_1$. Alternatively, we can describe the output sequence as being the componentwise mod-2 addition of the input sequence with a noise sequence $\{Z_i\}$, where the latter is a binary two-state Hidden Markov process.
The capacity of the Gilbert–Elliott channel is achieved by having the input sequence comprise Independent and Identically Distributed (IID) random bits. It is given, in bits, by $1 - H(\{Z_i\})$, where $H(\{Z_i\})$ denotes the entropy rate of the noise $\{Z_i\}$ [3]. If the state sequence is revealed to the decoder, then capacity is still achieved by the above input distribution, but it now equals the weighted average of the capacities of the two BSCs, namely, $\pi_0 (1 - H_b(\epsilon_0)) + \pi_1 (1 - H_b(\epsilon_1))$. Here, $\pi_1 = 1 - \pi_0$ denotes the probability that $S_i$ is 1, and $H_b(\cdot)$ is the binary entropy function. Throughout this paper, logarithms are to base 2, and information is measured in bits.
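As a numerical illustration (not part of the original paper, and with placeholder parameter values), the following Python snippet evaluates the decoder-side-state capacity $\pi_0(1 - H_b(\epsilon_0)) + \pi_1(1 - H_b(\epsilon_1))$:

from math import log2

def H_b(p):
    # binary entropy function in bits; H_b(0) = H_b(1) = 0
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# illustrative parameters (not from the paper)
eps0, eps1 = 0.01, 0.30   # crossover probabilities in states 0 and 1
pi0 = 0.8                 # stationary probability of state 0
pi1 = 1 - pi0

capacity_state_at_decoder = pi0 * (1 - H_b(eps0)) + pi1 * (1 - H_b(eps1))
print(capacity_state_at_decoder)   # roughly 0.76 bits per channel use for these values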
Here, we study the case where the state sequence is revealed not to the decoder but to the encoder. We do not, however, seek the Shannon capacity but rather the “rate-and-nearly-lossless-state” capacity, where the decoder wishes to recover not only the transmitted message but also the state sequence. We thus require that the state sequence be conveyed to the decoder with a vanishing symbol error rate. This requirement is weaker than the requirement that the probability of the receiver correctly recovering the entire state sequence tends to one. Neither do we consider a general (nonzero) distortion constraint, as studied for memoryless channels in [9,10].
We solve for this capacity and show that it does not depend on whether the state sequence is provided to the encoder strictly causally, causally, or noncausally. Moreover, it can be achieved using a Block-Markov coding scheme with backward decoding.

2. The Gilbert–Elliott Channel

The Gilbert–Elliott channel has binary input, output, and state alphabets: $\mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0, 1\}$. The evolution of the state is unaffected by the channel inputs: it forms a stationary time-homogeneous Markov chain of kernel
$$\Pr\bigl[S_i = s_i \bigm| S^{i-1} = s^{i-1}\bigr] = \Pr\bigl[S_i = s_i \bigm| S_{i-1} = s_{i-1}\bigr] \tag{1}$$
and stationary distribution
$$P_S = (\pi_0, \pi_1), \tag{2}$$
by which we mean that
$$P_S(1) = 1 - P_S(0) = \pi_1 = 1 - \pi_0. \tag{3}$$
Above and throughout, we use $A^i$ to denote $A_1, \ldots, A_i$, and we use $A_j^i$ to denote $A_j, \ldots, A_i$. We denote the entropy rate of the state sequence by $H(\{S_i\})$. It is given explicitly by [11]
$$H(\{S_i\}) = H(S_2 \mid S_1). \tag{4}$$
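As a side computation (an illustrative sketch, not from the paper; the transition probabilities g and b are placeholders), the stationary distribution and the entropy rate in (4) can be obtained from a two-state kernel as follows:

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# illustrative kernel: Pr[S_{i+1}=1 | S_i=0] = g and Pr[S_{i+1}=0 | S_i=1] = b
g, b = 0.1, 0.4
pi0, pi1 = b / (g + b), g / (g + b)          # stationary distribution of the chain
entropy_rate = pi0 * H_b(g) + pi1 * H_b(b)   # H({S_i}) = H(S_2 | S_1), in bits
print(pi0, pi1, entropy_rate)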
Given $S = 0$, the channel from $X$ to $Y$ is a BSC with crossover probability $\epsilon_0$, whereas, given $S = 1$, it is a BSC with crossover probability $\epsilon_1$. Here, $\epsilon_0, \epsilon_1 \in [0, 1]$ are arbitrary known constants. Thus, if we define
$$W(1 - x \mid x, 0) = 1 - W(x \mid x, 0) = \epsilon_0 \tag{5}$$
and
$$W(1 - x \mid x, 1) = 1 - W(x \mid x, 1) = \epsilon_1, \tag{6}$$
then we can express the behavior of the channel given the state as
$$\Pr\bigl[Y_i = y_i \bigm| X^i = x^i,\; Y^{i-1} = y^{i-1},\; S^i = s^i\bigr] = W(y_i \mid x_i, s_i). \tag{7}$$
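For concreteness, here is a minimal simulation of the channel law (7) (a sketch under assumed placeholder parameters, not the authors' code): the state evolves according to the two-state chain, and in state s the input bit is flipped with probability eps[s].

import random

def simulate(n, g=0.1, b=0.4, eps=(0.05, 0.4), seed=0):
    rng = random.Random(seed)
    pi1 = g / (g + b)
    s = 1 if rng.random() < pi1 else 0         # start in the stationary distribution
    states, inputs, outputs = [], [], []
    for _ in range(n):
        x = rng.getrandbits(1)                 # IID Bernoulli(1/2) input bit
        z = 1 if rng.random() < eps[s] else 0  # noise bit Z_i given the current state
        states.append(s)
        inputs.append(x)
        outputs.append(x ^ z)                  # Y_i = X_i XOR Z_i
        # state transition: leave state 0 w.p. g, leave state 1 w.p. b
        s = 1 - s if rng.random() < (g if s == 0 else b) else s
    return states, inputs, outputs

print(simulate(10))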

3. The Rate-and-Nearly-Lossless-State Capacity

When discussing rate-$R$ blocklength-$n$ communications over the Gilbert–Elliott channel, we consider the message set $\mathcal{M} = \{1, \ldots, 2^{nR}\}$ with $2^{nR}$ messages, one of which is to be conveyed to the receiver. The latter observes the output sequence $Y^n$ and attempts to recover the transmitted message $m$ and the state sequence $S^n$. The decoder is thus specified by a function
$$\phi \colon \mathcal{Y}^n \to \mathcal{M} \times \mathcal{S}^n, \qquad y^n \mapsto (\hat{m}, \hat{s}^n). \tag{8}$$
As to the encoder, its structure depends on the manner in which the state information is revealed to it. In the strictly causal setting, the Time-$i$ channel input may depend not only on the transmitted message but also on the past states; it is therefore denoted $X_i(m, S^{i-1})$. The encoder is thus specified by $n$ functions
$$f_i \colon \mathcal{M} \times \mathcal{S}^{i-1} \to \mathcal{X}, \qquad i = 1, \ldots, n, \tag{9}$$
with $X_i(m, S^{i-1})$ being $f_i(m, S^{i-1})$. In the causal case, the Time-$i$ channel input is denoted $X_i(m, S^i)$, and the encoder is specified by $n$ functions
$$f_i \colon \mathcal{M} \times \mathcal{S}^i \to \mathcal{X}, \qquad i = 1, \ldots, n, \tag{10}$$
with $X_i(m, S^i)$ being $f_i(m, S^i)$. Finally, in the noncausal case, the Time-$i$ channel input is denoted $X_i(m, S^n)$, and the encoder is specified by one function
$$f \colon \mathcal{M} \times \mathcal{S}^n \to \mathcal{X}^n, \tag{11}$$
with $X_i(m, S^n)$ being the $i$-th component of $f(m, S^n)$.
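The three settings differ only in how much of the state sequence each encoding function may see. The following type sketch (purely illustrative; the names are ours, not the paper's) makes the distinction explicit:

from typing import Callable, Sequence, Tuple

Bit = int  # inputs, outputs, and states are bits

# strictly causal: f_i(m, s^{i-1}) sees the message and the past states only
StrictlyCausalEncoder = Callable[[int, Sequence[Bit]], Bit]
# causal: f_i(m, s^i) may additionally see the current state
CausalEncoder = Callable[[int, Sequence[Bit]], Bit]
# noncausal: a single map f(m, s^n) producing the entire input sequence
NoncausalEncoder = Callable[[int, Sequence[Bit]], Sequence[Bit]]
# decoder phi(y^n): returns the message estimate and the state-sequence estimate
Decoder = Callable[[Sequence[Bit]], Tuple[int, Sequence[Bit]]]

In Python the strictly causal and causal aliases coincide; the distinction lies only in which prefix of the state sequence is passed to $f_i$.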
We refer to an encoder/decoder pair as a “coding scheme”. The probability of error associated with a given coding scheme and a given message $m$ is $\Pr[\hat{M} \neq m]$, calculated when the scheme’s encoder is used to transmit Message $m$ and the scheme’s decoder is used by the receiver. The average probability of error associated with a coding scheme is the arithmetic average of the probabilities of error associated with the different messages. It can be expressed as
$$P_e \triangleq \Pr[\hat{M} \neq M] \tag{12}$$
when the transmitted message $M$ is drawn equiprobably from $\mathcal{M}$.
The symbol error rate, or (expected) Hamming distortion, in reconstructing the state sequence is
$$\beta \triangleq \frac{1}{n} \sum_{i=1}^{n} \Pr[\hat{S}_i \neq S_i], \tag{13}$$
again computed when the transmitted message $M$ is drawn equiprobably from $\mathcal{M}$.
In all cases, we say that a rate $R$ is achievable if there exists a sequence of coding schemes indexed by the blocklength for which $P_e$ and $\beta$ both tend to zero. The capacity is defined as the supremum of the achievable rates, with the understanding that, if no positive rate is achievable, then capacity is zero.
Our main result is the following theorem:
Theorem 1.
Irrespective of whether the state information is provided to the encoder strictly causally, causally, or noncausally, if
$$1 - \bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr) - H(\{S_i\}) \tag{14}$$
is positive, then it equals the rate-and-nearly-lossless-state capacity of the Gilbert–Elliott channel; else, said capacity is zero.
The proof is provided in the next section. Expression (14) can be interpreted as the result of subtracting the optimal average description length of the state sequence from the capacity of the channel when the receiver is cognizant of the state.
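As a numerical illustration of Expression (14) (a sketch with placeholder parameters, not values from the paper):

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

g, b = 0.1, 0.4                      # illustrative state transition probabilities
eps0, eps1 = 0.01, 0.30              # illustrative crossover probabilities
pi0, pi1 = b / (g + b), g / (g + b)

state_rate = pi0 * H_b(g) + pi1 * H_b(b)                # H({S_i}) = H(S_2 | S_1)
expr14 = 1 - (pi0 * H_b(eps0) + pi1 * H_b(eps1)) - state_rate
print(expr14)    # about 0.19 here; when positive, this is the capacity of Theorem 1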

4. Proof of Theorem 1

4.1. Converse

We prove the converse part of the theorem under the assumption that the state sequence is provided to the encoder noncausally; since the noncausal setting is the most informative, the converse then also holds in the causal and strictly causal settings. Let $M$ be uniform over $\mathcal{M}$. We have
$$\begin{aligned}
H(S^n \mid M, Y^n) &\le H(S^n \mid \hat{S}^n) && (15) \\
&\le \sum_{i=1}^{n} H(S_i \mid \hat{S}_i) && (16) \\
&\le \sum_{i=1}^{n} H_b\bigl(\Pr[\hat{S}_i \ne S_i]\bigr) && (17) \\
&\le n\, H_b(\beta). && (18)
\end{aligned}$$
Here, (15) holds because $\hat{S}^n$ is a function of $Y^n$ and hence also of $(M, Y^n)$; (17) because $\hat{S}_i$ is binary and because conditioning reduces entropy; and (18) because $H_b(\cdot)$ is concave and by recalling the definition of $\beta$ in (13). Note that $H_b(\beta)$ tends to zero when $\beta$ tends to zero.
Since the decoder needs to recover $M$ with high probability, Fano’s inequality implies the existence of some sequence $\{\epsilon_n\}$ tending to zero with the blocklength such that
$$n (R - \epsilon_n) \le I(M; Y^n). \tag{19}$$
From these two inequalities, we obtain
$$\begin{aligned}
n\bigl(R - \epsilon_n - H_b(\beta)\bigr) &\le I(M; Y^n) - H(S^n \mid M, Y^n) && (20) \\
&= I(M; Y^n, S^n) - I(M; S^n \mid Y^n) - H(S^n \mid M, Y^n) && (21) \\
&= I(M; Y^n, S^n) - H(S^n \mid Y^n) && (22) \\
&= I(M; Y^n \mid S^n) - H(S^n \mid Y^n) && (23) \\
&= H(Y^n \mid S^n) - H(S^n \mid Y^n) - H(Y^n \mid S^n, M) && (24) \\
&= H(Y^n) - I(Y^n; S^n) - \bigl(H(S^n) - I(Y^n; S^n)\bigr) - H(Y^n \mid S^n, M) && (25) \\
&= H(Y^n) - H(S^n) - H(Y^n \mid S^n, M). && (26)
\end{aligned}$$
Note that, above, we only used the chain rule and the fact that $M$ and $S^n$ are independent. The three terms in the last line can be bounded or simplified as follows:
$$\begin{aligned}
H(Y^n) &\le n, && (27) \\
H(S^n) &= H(S_1) + (n-1)\, H(\{S_i\}), && (28) \\
H(Y^n \mid S^n, M) &= H(Y^n \mid S^n, M, X^n) && (29) \\
&= H(Y^n \mid S^n, X^n) && (30) \\
&= \sum_{i=1}^{n} H(Y_i \mid S_i, X_i) && (31) \\
&= n\bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr). && (32)
\end{aligned}$$
The converse now follows from (26), (27), (28), and (32) upon dividing by $n$ and letting $n$ tend to infinity, so that $\epsilon_n$, $H_b(\beta)$, and $H(S_1)/n$ all vanish.
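Step (28) is the chain rule for the stationary Markov chain; the following brute-force check (illustrative only, with placeholder parameters) confirms it numerically for a small blocklength:

from itertools import product
from math import log2

g, b = 0.1, 0.4                         # illustrative transition probabilities
pi = (b / (g + b), g / (g + b))
K = ((1 - g, g), (b, 1 - b))            # K[s][t] = Pr[S_{i+1} = t | S_i = s]

def H(p_list):
    return -sum(p * log2(p) for p in p_list if p > 0)

n = 5
joint = []
for seq in product((0, 1), repeat=n):   # enumerate all state sequences of length n
    p = pi[seq[0]]
    for s, t in zip(seq, seq[1:]):
        p *= K[s][t]
    joint.append(p)

lhs = H(joint)                                               # H(S^n) by enumeration
rhs = H(pi) + (n - 1) * (pi[0] * H(K[0]) + pi[1] * H(K[1]))  # H(S_1) + (n-1) H(S_2|S_1)
print(lhs, rhs)                                              # the two values agree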

4.2. Direct Part

For the direct part of the theorem, we assume that the state sequence is revealed to the encoder strictly causally and consider a Block-Markov coding scheme with backward decoding. Consider $b$ blocks, each of $k$ channel uses. Label the length-$k$ typical state sequences $1, \ldots, 2^{k (H(\{S_i\}) + \epsilon)}$. For each block, randomly generate a codebook with $2^{k \tilde{R}}$ independent codewords (with $\tilde{R}$ to be specified later), each having $k$ independent Bernoulli$(1/2)$ components. In the first block, use the entire codebook to send a message of $k \tilde{R}$ bits. As to the transmission in Block $i$ for $i \ge 2$, first check whether or not the state sequence in the previous block, namely, $(s^k)_{i-1}$, was typical. If it was, set $\ell_{i-1}$ to be the label that is assigned to it; else, set $\ell_{i-1} = 1$. Use the generated codebook to send both $\ell_{i-1}$ and $m_i$, the latter consisting of $k \bigl(\tilde{R} - H(\{S_i\}) - \epsilon\bigr)$ information bits, so that the total number of bits matches the size of the codebook.
After completing the transmission in all $b$ blocks, add one extra block to transmit $\ell_b$. To this end, the transmitter—ignoring any information about the states in this extra block—uses the Gilbert–Elliott channel to send $\ell_b$ (which we view as a message) and nothing else. The length of the extra block may be larger than $k$, but—as long as the Shannon capacity of our channel is positive—this will not affect the overall rate when we choose $b$ to be very large. We will address this caveat shortly. However, first, we discuss the decoding.
To decode, we begin with the last block to decode $\ell_b$ and thus recover $(s^k)_b$. This task will be accomplished successfully provided that the last, extra block is sufficiently long. With the state information $(s^k)_b$ at hand, we then decode both $\ell_{b-1}$ and $m_b$ from Block $b$. Both can be decoded correctly with high probability (as $k$ grows large) provided that
$$\tilde{R} < I(X_i; Y_i \mid S_i) = 1 - \bigl(\pi_0 H_b(\epsilon_0) + \pi_1 H_b(\epsilon_1)\bigr). \tag{33}$$
We continue this procedure backwards: by the time we get to decoding Block $i$, we will have already reliably recovered $(s^k)_i$ while decoding Block $i+1$, so we can use it as decoder-side information. The overall information rate—when $b$ tends to infinity—approaches
$$\tilde{R} - H(\{S_i\}) - \epsilon, \tag{34}$$
which can indeed be made arbitrarily close to (14).
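The rate bookkeeping behind (34) can be spelled out as follows (an illustrative sketch; all numerical values are placeholders):

def overall_rate(R_tilde, H_S, eps, k, b, extra_block_len):
    # Block 1 carries k*R_tilde message bits; Blocks 2..b each carry
    # k*(R_tilde - H_S - eps) fresh message bits (the remainder indexes the
    # previous block's state sequence); the extra block carries no message bits.
    message_bits = k * R_tilde + (b - 1) * k * (R_tilde - H_S - eps)
    channel_uses = b * k + extra_block_len
    return message_bits / channel_uses

R_tilde, H_S, eps = 0.76, 0.57, 0.01   # placeholder values
k = 1000
for b in (10, 100, 1000, 10000):
    print(b, overall_rate(R_tilde, H_S, eps, k, b, extra_block_len=5 * k))
# as b grows, the rate approaches R_tilde - H_S - eps, i.e., Expression (34)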
We have now established that, provided that $\tilde{R}$ is appropriately chosen, as $k$ grows large, the probability of the decoder correctly decoding both the message and $S^{kb}$ tends to one. It only remains to note that this also means that the symbol error rate $\beta$ is small. Indeed, having $\hat{S}^{kb} = S^{kb}$ with high probability guarantees that the symbol error rate in the first $b$ blocks is close to zero, whereas the influence of the extra block (whose symbol error rate will typically be high, since we do not attempt to decode the states there) becomes negligible when we choose $b$ to be large.
We now return to the caveat regarding the Shannon capacity and the extra block. We need to show that the Shannon capacity is positive whenever (14) is positive. (If (14) is not positive, there is no need for a direct part.) Recall that the Shannon capacity of the Gilbert–Elliott channel without any state information is $1 - H(\{Z_i\})$, with $\{Z_i\}$ denoting the noise random variables, which form a Hidden Markov process. This Shannon capacity is positive unless $H(\{Z_i\}) = 1$ bit. Our concern regarding the caveat is thus only when $H(\{Z_i\})$ is 1.
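The entropy rate $H(\{Z_i\})$ of the hidden Markov noise process has no simple closed form; the following sketch (illustrative, with placeholder parameters) estimates it by evaluating $-\tfrac{1}{n}\log_2 P(Z^n)$ for a long simulated noise sequence via the forward recursion, which converges to the entropy rate by the Shannon–McMillan–Breiman theorem:

import random
from math import log2

g, b = 0.1, 0.4          # illustrative state transition probabilities
eps = (0.01, 0.30)       # illustrative crossover probabilities
pi = (b / (g + b), g / (g + b))
K = ((1 - g, g), (b, 1 - b))

def estimate_entropy_rate(n, seed=0):
    rng = random.Random(seed)
    s = 0 if rng.random() < pi[0] else 1   # initial state from the stationary law
    alpha = list(pi)                       # alpha[t] = Pr[S_i = t | Z^{i-1}]
    log_prob = 0.0
    for _ in range(n):
        z = 1 if rng.random() < eps[s] else 0
        # probability of the new noise bit given the past noise bits
        p_z = sum(alpha[t] * (eps[t] if z else 1 - eps[t]) for t in (0, 1))
        log_prob += log2(p_z)
        # forward update: condition on z, then propagate through the kernel
        post = [alpha[t] * (eps[t] if z else 1 - eps[t]) / p_z for t in (0, 1)]
        alpha = [post[0] * K[0][t] + post[1] * K[1][t] for t in (0, 1)]
        s = 1 - s if rng.random() < (g if s == 0 else b) else s
    return -log_prob / n

print(estimate_entropy_rate(200_000))   # Monte Carlo estimate of H({Z_i}) in bits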
In Appendix A, we show the following:
Proposition 1.
If $H(\{Z_i\}) = 1$, then
1. $\{Z_i\}$ are IID Bernoulli$(1/2)$, and
2. either $\{S_i\}$ are IID or $\epsilon_0 = \epsilon_1 = 1/2$.
If $\{Z_i\}$ are IID Bernoulli$(1/2)$ and $\{S_i\}$ are IID, then (14) cannot be positive because, in this case, it equals
$$H(Z_i) - H(Z_i \mid S_i) - H(S_i) = -H(S_i \mid Z_i). \tag{35}$$
If $\epsilon_0 = \epsilon_1 = 1/2$, then (14) is also (trivially) not positive. Hence, indeed, the Shannon capacity is positive whenever (14) is positive.
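A quick numerical sanity check of (35) (illustrative; the parameters are placeholders chosen so that $Z_i$ is Bernoulli$(1/2)$):

from math import log2

def H_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

pi0, pi1 = 0.6, 0.4
eps0 = 0.2
eps1 = (0.5 - pi0 * eps0) / pi1      # chosen so that Pr[Z_i = 1] = 1/2

H_Z = 1.0                                         # H(Z_i) for a Bernoulli(1/2) bit
H_Z_given_S = pi0 * H_b(eps0) + pi1 * H_b(eps1)   # H(Z_i | S_i)
H_S = H_b(pi1)                                    # H(S_i)

# H(S_i | Z_i) via Bayes' rule
p_s1_given_z1 = eps1 * pi1 / 0.5
p_s1_given_z0 = (1 - eps1) * pi1 / 0.5
H_S_given_Z = 0.5 * H_b(p_s1_given_z1) + 0.5 * H_b(p_s1_given_z0)

print(H_Z - H_Z_given_S - H_S, -H_S_given_Z)   # the two values coincide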

Author Contributions

Writing—original draft preparation, A.L. and L.W.; writing—review and editing, A.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Swiss National Science Foundation (SNSF) under Grant 200021-215090.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BSC   Binary Symmetric Channel
IID   Independent and Identically Distributed

Appendix A. Proof of Proposition 1

In this appendix, we prove Proposition 1. We start with the first part, which we formally state as follows:
Lemma A1.
If $\{Z_i\}$ is a stationary process taking values in the binary set $\{0, 1\}$, and if its entropy rate is 1, then it must be IID Bernoulli$(1/2)$.
Proof. 
For any positive integers $\nu$ and $\ell$,
$$\begin{aligned}
\frac{1}{\ell \nu} H\bigl(Z_1^{\ell \nu}\bigr) &= \frac{1}{\ell \nu} H\bigl(Z_1^{\nu}, \ldots, Z_{(\ell - 1)\nu + 1}^{\ell \nu}\bigr) && (A1) \\
&\le \frac{1}{\ell \nu} \sum_{i=1}^{\ell} H\bigl(Z_{(i-1)\nu + 1}^{i \nu}\bigr) && (A2) \\
&= \frac{1}{\ell \nu}\, \ell\, H\bigl(Z_1^{\nu}\bigr) && (A3) \\
&= \frac{1}{\nu} H\bigl(Z_1^{\nu}\bigr), && (A4)
\end{aligned}$$
where (A3) follows from the stationarity hypothesis. Letting $\ell$ tend to infinity, so that the left-hand side of (A1) converges to the entropy rate, i.e., to 1, we obtain
$$1 \le \frac{1}{\nu} H\bigl(Z_1^{\nu}\bigr). \tag{A5}$$
We obtain the reverse inequality by noting that $Z_1^{\nu}$ takes values in a set of cardinality $2^{\nu}$. We thus conclude that $Z_1^{\nu}$ is equiprobably distributed, and its components are thus IID Bernoulli$(1/2)$. Since $\nu$ can be any positive integer, we conclude that $\{Z_i\}$ is an IID Bernoulli$(1/2)$ process. □
We next show the remaining part of Proposition 1:
Lemma A2.
If $\{Z_i\}$ are IID Bernoulli$(1/2)$, and if $\epsilon_0$ and $\epsilon_1$ are not both equal to $1/2$, then $\{S_i\}$ must be IID.
Proof. 
We assume $\pi_0, \pi_1 \in (0, 1)$; otherwise, $\{S_i\}$ are all deterministic and trivially IID. We also assume, without loss of generality, that
$$\epsilon_0 < \tfrac{1}{2}. \tag{A6}$$
For $Z_i$ to be Bernoulli$(1/2)$, we must then have
$$\epsilon_1 > \tfrac{1}{2}. \tag{A7}$$
It then follows that the equation
$$\pi \epsilon_0 + (1 - \pi)\, \epsilon_1 = \tfrac{1}{2} \tag{A8}$$
has a unique solution in $\pi$, which must be $\pi_0$ as in the stationary distribution (2).
Because $Z_i$ is Bernoulli$(1/2)$ and $S_i$ is Bernoulli$(\pi_1)$, we can use Bayes’s rule to express the conditional distribution of $S_i$ given $Z_i = 1$ as
$$\begin{aligned}
P_{S_i \mid Z_i = 1} &= \bigl(2 \epsilon_0 \pi_0,\; 2 \epsilon_1 \pi_1\bigr) && (A9) \\
&\ne (\pi_0, \pi_1), && (A10)
\end{aligned}$$
where the inequality follows from (A6) and (A7). Let $M$ denote the (matrix form of the) Markov kernel $P_{S_{i+1} \mid S_i}$. We next assume that $M$ is nonsingular and show that this leads to a contradiction. If $M$ is nonsingular, then
$$\begin{aligned}
P_{S_{i+1} \mid Z_i = 1} &= \bigl(2 \epsilon_0 \pi_0,\; 2 \epsilon_1 \pi_1\bigr)\, M && (A11) \\
&\ne (\pi_0, \pi_1)\, M && (A12) \\
&= (\pi_0, \pi_1), && (A13)
\end{aligned}$$
where the inequality follows from (A10) and the nonsingularity of $M$, and the last equality holds because $(\pi_0, \pi_1)$ is the stationary distribution corresponding to $M$.
Recalling that (A8) is only satisfied when $\pi$ equals $\pi_0$, we conclude that
$$P_{Z_{i+1} \mid Z_i = 1}(1) = P_{S_{i+1} \mid Z_i = 1}(0)\, \epsilon_0 + P_{S_{i+1} \mid Z_i = 1}(1)\, \epsilon_1 \ne \tfrac{1}{2}, \tag{A14}$$
which contradicts our assumption that $\{Z_i\}$ are IID Bernoulli$(1/2)$.
We thus conclude that the Markov kernel $M$ must be singular, which, in the $2 \times 2$ case, means that its two rows are identical, and that $\{S_i\}$ are hence IID. □
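A small numerical instantiation of this argument (illustrative; the kernel and crossover probabilities are placeholders satisfying (A6)–(A8)) shows the contradiction explicitly:

g, b = 0.1, 0.4
pi0, pi1 = b / (g + b), g / (g + b)    # stationary distribution (0.8, 0.2)
M = ((1 - g, g), (b, 1 - b))           # nonsingular kernel: its rows differ

eps0 = 0.4                             # < 1/2, as in (A6)
eps1 = (0.5 - pi0 * eps0) / pi1        # > 1/2, making pi0*eps0 + pi1*eps1 = 1/2

# Bayes step (A9): distribution of S_i given Z_i = 1
p_s_given_z1 = (2 * eps0 * pi0, 2 * eps1 * pi1)
# propagate through the kernel, as in (A11)
p_s_next_given_z1 = tuple(
    sum(p_s_given_z1[s] * M[s][t] for s in (0, 1)) for t in (0, 1)
)
# (A14): probability that the next noise bit is 1 given Z_i = 1
p_z_next = p_s_next_given_z1[0] * eps0 + p_s_next_given_z1[1] * eps1
print(p_z_next)   # 0.54 here, not 1/2, so {Z_i} cannot be IID Bernoulli(1/2)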

References

  1. Gilbert, E.N. Capacity of a burst-noise channel. Bell Syst. Tech. J. 1960, 39, 1253–1265.
  2. Elliott, E.O. Estimates of error rates for codes on burst-noise channels. Bell Syst. Tech. J. 1963, 42, 1977–1997.
  3. Mushkin, M.; Bar-David, I. Capacity and coding for the Gilbert-Elliott channels. IEEE Trans. Inf. Theory 1989, 35, 1277–1290.
  4. Han, Y.; Guillén i Fàbregas, A. Fixed-memory capacity bounds for the Gilbert-Elliott channel. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 155–159.
  5. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: Hoboken, NJ, USA, 1968.
  6. Goldsmith, A.; Varaiya, P. Capacity, mutual information, and coding for finite-state Markov channels. IEEE Trans. Inf. Theory 1996, 42, 868–886.
  7. Permuter, H.H.; Weissman, T.; Goldsmith, A.J. Finite state channels with time-invariant deterministic feedback. IEEE Trans. Inf. Theory 2009, 55, 644–662.
  8. Shrader, B.; Permuter, H. Feedback capacity of the compound channel. IEEE Trans. Inf. Theory 2009, 55, 3629–3644.
  9. Choudhuri, C.; Kim, Y.H.; Mitra, U. Causal state communication. IEEE Trans. Inf. Theory 2013, 59, 3709–3719.
  10. Bross, S.I.; Lapidoth, A. The rate-and-state capacity with feedback. IEEE Trans. Inf. Theory 2018, 64, 1893–1918.
  11. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.