On the Interactive Capacity of Finite-State Protocols

The interactive capacity of a noisy channel is the highest possible rate at which arbitrary interactive protocols can be simulated reliably over the channel. Determining the interactive capacity is notoriously difficult, and the best known lower bounds are far below the associated Shannon capacity, which serves as a trivial (and also generally the best known) upper bound. This paper considers the more restricted setup of simulating finite-state protocols. It is shown that all two-state protocols, as well as rich families of arbitrary finite-state protocols, can be simulated at the Shannon capacity, establishing the interactive capacity for those families of protocols.

like in human conversation, on what the other would tell them. A simple instructive example (taken from [2]) is the following. Suppose that Alice and Bob play correspondence chess. Namely, they are located in two distinct places and play by announcing their moves over a communication channel (using, say, 12 bits per move, which is clearly sufficient). If the moves are conveyed without error, then both parties can keep track of the state of the board, and the game can proceed to its termination. The sequence of moves occurring over the course of this noiseless game is called a transcript, and it is dictated by the protocol of the game, which constitutes Alice and Bob's respective strategies determining their moves at any given state of the board. Now, assume that Alice and Bob play a chess game over a noisy two-way channel, yet wish to simulate the transcript as if no noise were present. In other words, they would like to communicate back and forth in a way that ensures, once communication is over, that the transcript of the noiseless game can be reproduced by both parties with a small error probability. They would also like to achieve this goal as efficiently as possible, i.e., with the fewest channel uses. One direct way to achieve this is by having both parties describe their entire protocol to their counterpart, i.e., each and every move they might take given each and every possible state of the board. This reduces the interactive problem to a non-interactive one, with the protocol becoming a pair of messages to be exchanged. However, this solution is grossly inefficient; the parties now know much more than they really need to in order to simply reconstruct the transcript. At the other extreme, Alice and Bob may choose to describe the transcript itself by encoding each move separately on-the-fly, using a short error-correcting code.
Unfortunately, this code must have some fixed error probability, and hence an undetected error is bound to occur at some unknown point, causing the states of the board held by the two parties to diverge, and rendering the remainder of the game useless. It is important to note that if Alice and Bob had wanted to play sufficiently many games in parallel, then they could have used a long error-correcting code to simultaneously protect the set of all moves taken at each time point, which in principle would have let them operate at the one-way Shannon capacity (which is the best possible). The crux of the matter therefore lies in the fact that the interactive problem is one-shot, namely, only a single instance of the game is played.
In light of the above, it is perhaps surprising that it is nevertheless possible to simulate any one-shot interactive protocol using a number of channel uses that is proportional to the length of the transcript, or in other words, that there is a positive interactive capacity whenever the Shannon capacity is positive. This fact was originally proved by Schulman [3], who was also the first to introduce the notion of interactive communication over noisy channels. The lower bound on the interactive capacity was recently studied in [4], and was found to be at least 0.0302 of the Shannon capacity for all binary memoryless symmetric channels.
In this work, rather than giving a lower bound on the interactive capacity for any protocol, we study the notion of interactive capacity where constraints are imposed on the family of protocols to be simulated. We define the family of finite-state protocols, and show that for a large class of these protocols, the Shannon capacity is achievable. In particular, we prove that Shannon capacity is achievable for all protocols having only two states (two-state protocols). For larger state-spaces, we discuss rich families of protocols which satisfy two sufficient conditions, and show that within these families almost all members can be reliably simulated at the Shannon capacity. We note that the approach of studying the interactive capacity of protocols having a specific structure was previously taken in [5]. The authors of [5] limited the "interactiveness" of the protocols by considering families of protocols whose transcript is predictable to a certain extent, and proved that they can be simulated in higher rates than general protocols. The constraints imposed on the protocols in this paper are, however, on the memory of the protocols and not on their predictability.
The rest of the paper is organized as follows: in Section II the interactive communication problem is formulated. In Section III, finite-state (or M -state) protocols, which are the main model discussed in this paper, are defined. In Section IV the basic concepts of the coding schemes are presented. In Section V a capacity achieving coding scheme for two-state protocols is presented. In Section VI it is proved that the concepts in Section IV cannot be used for at least one three-state protocol. In Section VII families of finite-state protocols for which almost-all members can be simulated at Shannon capacity are presented. Finally, Section VIII concludes the paper.
A preliminary version of some of the results in this paper appeared in [6]. Here we extend upon the results of [6] as follows: first, [6] only considered Markovian protocols, a special case of the type of protocols we consider here. Second, [6] gave a simple special case of the coding scheme of Section IV; here we generalize the scheme and give two methods that can handle more complex protocols, beyond Markovian. Finally, the Shannon capacity achieving scheme for two-state protocols in Section V, the inachievability results for three states in Section VI and the scheme for higher order models in Section VII appear here for the first time.

II. THE INTERACTIVE COMMUNICATION PROBLEM
In this paper, we define a length-$n$ interactive protocol as a triplet $\pi \triangleq (\phi^{\mathrm{Alice}}, \phi^{\mathrm{Bob}}, \mu)$, where $\phi^{\mathrm{Alice}} = \{\phi^{\mathrm{Alice}}_i\}_{i=1}^{n}$ and $\phi^{\mathrm{Bob}} = \{\phi^{\mathrm{Bob}}_i\}_{i=1}^{n}$ are sequences of transmission functions $\{0,1\}^{i-1} \to \{0,1\}$, and $\mu = \{\mu_i\}_{i=1}^{n}$ is a sequence of speaker order functions $\{0,1\}^{i-1} \to \{\mathrm{Alice}, \mathrm{Bob}\}$. The functions $\phi^{\mathrm{Alice}}$ are known only to Alice, and the functions $\phi^{\mathrm{Bob}}$ are known only to Bob. The speaker order functions $\mu$ are known to both parties. The transcript $\tau = (\tau_1, \dots, \tau_n)$ associated with the protocol $\pi$ is sequentially generated by Alice and Bob as follows:

$$\tau_i = \phi^{\sigma_i}_i(\tau^{i-1}), \qquad (1)$$

where $\tau^{i-1} = (\tau_1, \dots, \tau_{i-1})$ and $\sigma_i$ is the identity of the speaker at time $i$, which is given by:

$$\sigma_i = \mu_i(\tau^{i-1}). \qquad (2)$$

In the interactive simulation problem, Alice and Bob would like to simulate the transcript $\tau$ by communicating back and forth over a noisy memoryless channel $P_{Y|X}$. Specifically, we restrict our discussion to channels with a binary input alphabet $\mathcal{X} = \{0, 1\}$ and a general (possibly continuous) output alphabet $\mathcal{Y}$. We use $C_{Sh}(P_{Y|X})$ to denote the Shannon capacity of the channel. Note that $C_{Sh}(P_{Y|X}) \le 1$, since the input of the channel is binary. Naturally, we also limit the discussion to channels whose Shannon capacity is non-zero. Note that while the order of speakers in the interactive protocol itself might be determined on-the-fly (by the sequence of functions $\mu$), we restrict the simulating protocol to use a predetermined order of speakers. The reason is that allowing an adaptive order over a noisy channel would lead to a non-zero probability of disagreement regarding the order of speakers. This disagreement might lead to simultaneous transmissions by both parties, which is not supported by the chosen physical channel model.

To achieve their goal, Alice and Bob employ a length-$N$ coding scheme $\Sigma$ that uses the channel $N$ times. The coding scheme consists of a disjoint partition $\tilde{A} \sqcup \tilde{B} = \{1, \dots, N\}$, where $\tilde{A}$ (resp. $\tilde{B}$) is the set of time indices where Alice (resp. Bob) speaks. This disjoint partition can be a function of $\mu$, but not of $\phi^{\mathrm{Alice}}, \phi^{\mathrm{Bob}}$. At time $j \in \tilde{A}$ (resp. $j \in \tilde{B}$), Alice (resp. Bob) sends some Boolean function of $(\phi^{\mathrm{Alice}}, \mu)$ (resp. $(\phi^{\mathrm{Bob}}, \mu)$), and of everything received so far from the counterpart.
The rate of the scheme is $R = n/N$ bits per channel use. When communication terminates, Alice and Bob produce their simulations of the transcript $\tau$, denoted by $\hat{\tau}_A(\Sigma, \phi^{\mathrm{Alice}}, \mu) \in \{0,1\}^n$ and $\hat{\tau}_B(\Sigma, \phi^{\mathrm{Bob}}, \mu) \in \{0,1\}^n$ respectively. The error probability attained by the coding scheme is the probability that either of these simulations is incorrect, i.e.,

$$P_e(\Sigma, \pi) \triangleq \Pr\left(\hat{\tau}_A(\Sigma, \phi^{\mathrm{Alice}}, \mu) \neq \tau \ \text{ or } \ \hat{\tau}_B(\Sigma, \phi^{\mathrm{Bob}}, \mu) \neq \tau\right).$$

A rate $R$ is called achievable if there exists a sequence $\Sigma_n$ of length-$N_n$ coding schemes with rates $n/N_n \ge R$, such that

$$\lim_{n \to \infty} \max_{\pi \text{ of length } n} P_e(\Sigma_n, \pi) = 0, \qquad (3)$$

where the maximum is taken over all length-$n$ interactive protocols. Accordingly, we define the interactive capacity $C_I(P_{Y|X})$ as the maximum of all achievable rates for the channel $P_{Y|X}$. Note that this definition parallels the definition of maximal error capacity in the one-way setting, as we require the error probability attained by the sequence of coding schemes to be upper bounded by a vanishing term uniformly for all protocols.
It is clear that at least $n$ bits need to be exchanged in order to reliably simulate a general protocol, and hence the interactive capacity satisfies $C_I(P_{Y|X}) \le 1$. In the special case of a noiseless channel, i.e., where the output deterministically reveals the input bit, and assuming that the order of speakers is predetermined (namely $\mu$ contains only constant functions), this upper bound can be trivially achieved; Alice and Bob can simply evaluate and send $\tau_i$ sequentially according to (1) and (2). Note, however, that if the order of speakers is general, then this is not a valid solution, since we required the order of speakers in the coding scheme to be fixed in advance. Nevertheless, any general interactive protocol can be sequentially simulated using the channel $2n$ times with an alternating order of speakers, where each party sends a dummy bit whenever it is not their time to speak. Conversely, a factor-two blow-up in the protocol length in order to account for a non-predetermined order of speakers is also necessary. To see this, consider an example of a protocol where Alice's first bit determines the identity of the speaker for the rest of time; in order to simulate this protocol using a predetermined order of speakers, it is easy to see that at least $n - 1$ channel uses must be allocated to each party in advance. We conclude that under our restrictive capacity definition, the interactive capacity of a noiseless channel is exactly $1/2$. When the channel is noisy, a tighter trivial upper bound holds:

$$C_I(P_{Y|X}) \le \frac{1}{2} C_{Sh}(P_{Y|X}). \qquad (4)$$

To see this, consider the same example given above, and note that each party must have sufficient time to reliably send $n - 1$ bits over the noisy channel. Hence, the problem reduces to a pair of one-way communication problems, in which the Shannon capacity is the fundamental limit.
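The alternating-order simulation over a noiseless channel described above can be sketched as follows. This is only an illustration: the function names (`run_protocol`, `simulate_alternating`) and the representation of the protocol as Python callables taking the time index and the transcript so far are assumptions of this sketch, not notation from the paper.

```python
def run_protocol(phi_alice, phi_bob, mu, n):
    """Noiseless transcript: the speaker is mu(i, history), the bit is phi_speaker(i, history)."""
    tau = []
    for i in range(1, n + 1):
        speaker = mu(i, tau)
        phi = phi_alice if speaker == "Alice" else phi_bob
        tau.append(phi(i, tau))
    return tau


def simulate_alternating(phi_alice, phi_bob, mu, n):
    """Simulate with a fixed Alice, Bob, Alice, Bob, ... schedule over a noiseless channel.

    Every transcript bit consumes one Alice slot and one Bob slot; the party
    whose turn it is not sends a dummy 0. Returns (transcript, channel uses).
    """
    tau = []
    sent = []  # one (sender, bit) entry per channel use
    for i in range(1, n + 1):
        speaker = mu(i, tau)  # computable by both parties from the shared history
        alice_bit = phi_alice(i, tau) if speaker == "Alice" else 0  # dummy bit
        bob_bit = phi_bob(i, tau) if speaker == "Bob" else 0        # dummy bit
        sent.append(("Alice", alice_bit))
        sent.append(("Bob", bob_bit))
        tau.append(alice_bit if speaker == "Alice" else bob_bit)
    return tau, len(sent)


# Hypothetical example: Alice's first bit decides who speaks for the rest of time.
def example_mu(i, tau):
    return "Alice" if i == 1 or tau[0] == 1 else "Bob"


def example_phi_alice(i, tau):
    return 1 if i == 1 else i % 2


def example_phi_bob(i, tau):
    return (i + 1) % 2
```

With `n = 6`, both functions produce the same transcript, and the alternating schedule uses 12 channel uses, i.e., rate 1/2.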
We remark that it is reasonable to expect the bound (4) to be loose, since general interactive protocols cannot be trivially reduced to one-way communication as the parties cannot generate their part of the transcript without any interaction. However, the tightness of the bound remains a wide open question.
In the remainder of this paper we limit the discussion to protocols in which the order of speakers is predetermined and bit vs. bit. Namely, Alice speaks at odd times ($\sigma_i = \mathrm{Alice}$ for odd $i$) and Bob speaks at even times ($\sigma_i = \mathrm{Bob}$ for even $i$). We note that for such protocols, the $1/2$ penalty required for the adaptive order of speakers is not needed, and the upper bound is therefore

$$C_I(P_{Y|X}) \le C_{Sh}(P_{Y|X}).$$

A. Background
The interactive communication problem introduced by Schulman [3], [7] is motivated by Yao's communication complexity paradigm [8]. In this paradigm, the input of a function f is distributed between Alice and Bob, who wish to compute f with negligible error (nominally set to 1/3 and can be reduced to any other fixed value without changing the order of magnitude) by exchanging (noiseless) bits using some interactive protocol. The length of the shortest protocol achieving this is called the communication complexity of f , and denoted by CC(f ). In the interactive communication setup, Alice and Bob must achieve their goal by communicating through a pair of independent BSC(ε). The minimal length of an interactive protocol attaining this goal is now denoted by CC ε (f ).
In [9], Kol and Raz defined the interactive capacity as

$$C^{KR}_I(\varepsilon) \triangleq \lim_{n \to \infty} \min_{f : CC(f) = n} \frac{n}{CC_{\varepsilon}(f)},$$

and proved that

$$C^{KR}_I(\varepsilon) = 1 - \Theta\left(\sqrt{h(\varepsilon)}\right)$$

in the limit of $\varepsilon \to 0$, where $h(\cdot)$ is the binary entropy function, under the additional assumption that the communication complexity of $f$ is computed with the restriction that the order of speakers is predetermined and has some fixed period. This assumption on the order of speakers is important. Indeed, consider again the example where the function $f$ is either Alice's input or Bob's input as decided by Alice. In this case, the communication complexity with a predetermined order of speakers is double that without this restriction, and hence considering such protocols renders $C^{KR}_I(\varepsilon) \le 1/2$. For further discussion on the impact of speaking order as well as channel models that allow collisions, see [10]. For a fixed nonzero $\varepsilon$, the coding scheme presented in [3] (which precedes [9]) already showed that $C^{KR}_I(\varepsilon) = \Theta(C_{Sh}(\varepsilon))$, but the constant has not been computed. In [4] it is shown that $C_I(P_{Y|X}) \ge 0.0302 \cdot C_{Sh}(P_{Y|X})$ for any channel $P_{Y|X}$ taken from the class of binary memoryless symmetric channels (which includes the binary symmetric channel, the binary erasure channel, the binary-input additive Gaussian channel, etc.).

III. FINITE-STATE PROTOCOLS
Let us start by defining the notions of interactive rate and capacity for families of protocols. Let $\Pi = \{\Pi_1, \Pi_2, \dots\}$ be a sequence of families of protocols, where $\Pi_n$ denotes some family of length-$n$ protocols. A rate $R$ is called achievable for $\Pi$ if there exists a sequence $\Sigma_n$ of $(n, N_n)$ coding schemes where $N_n \le n/R$, and such that

$$\lim_{n \to \infty} \max_{\pi \in \Pi_n} P_e(\Sigma_n, \pi) = 0.$$

Namely, the difference from (3) is that now the maximum is taken over the protocols in $\Pi_n$ and not over the entire family of protocols with length $n$. Accordingly, we denote the interactive capacity with respect to the channel $P_{Y|X}$ and the family of protocols $\Pi$ by $C_I(\Pi, P_{Y|X})$, and define it as the maximum of all achievable rates for $P_{Y|X}$ and $\Pi$.
The family of protocols studied in this paper is the family of finite-state protocols with $M$ states, which will be referred to in short as $M$-state protocols. In these protocols, the entire history of the transcript is encapsulated in a state variable taken from a set with finite cardinality. The state variable determines the following transcript bit, and is advanced by both parties using a predetermined update rule.
The notation of finite-state protocols is given here:

Definition 1. Let $\Phi_M$ denote the family of $M$-state protocols of increasing lengths. For these protocols Alice speaks at odd times and Bob speaks at even times: namely $\sigma_i = \mathrm{Alice}$ if $i$ is odd, and $\sigma_i = \mathrm{Bob}$ if $i$ is even. The transcript of these protocols is generated by

$$\tau_i = \phi_i(s_{i-1}),$$

where $s_i \in S$ is the state variable at time $i$. $S$ is the state-space, with cardinality $|S| = M$, assumed to be $S = \{0, 1, \dots, M - 1\}$ without loss of generality. $\phi_i : S \to \{0, 1\}$ is the transmission function at time $i$, owned by Alice at odd $i$ and by Bob at even $i$, and assumed to be unknown to the counterpart. The state variable is advanced according to

$$s_i = \eta(s_{i-1}, \tau_i), \qquad (6)$$

where $\eta : S \times \{0, 1\} \to S$ is the state-advance function, which is time invariant and known to both parties.
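An example of finite-state protocols is the family of Markovian protocols, previously presented in [6] and defined as follows:

Example 1. For a Markovian protocol, the number of states $M$ is a power of two, and the state variable corresponds to the last $\log M$ bits of the transcript. Namely, the state can be regarded as the binary vector

$$s_i = (\tau_{i - \log M + 1}, \dots, \tau_{i-1}, \tau_i),$$

and the state-advance function is

$$\eta(s_{i-1}, \tau_i) = (\tau_{i - \log M + 1}, \dots, \tau_{i-1}, \tau_i),$$

i.e., the oldest bit of $s_{i-1}$ is dropped and the new transcript bit $\tau_i$ is appended.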
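The generation rule of a finite-state protocol, and the Markovian special case, can be sketched in a few lines. The representation below is an assumption of this sketch: each transmission function is stored as a lookup tuple of length $M$, and the Markovian state is encoded as an integer whose least significant bit is the newest transcript bit.

```python
import random


def run_finite_state(phis, eta, s0):
    """Generate the transcript via tau_i = phi_i(s_{i-1}) and s_i = eta(s_{i-1}, tau_i).

    phis[i-1] is the lookup table of phi_i (a tuple indexed by the state).
    Returns (transcript bits, list of states s_0, ..., s_n).
    """
    s, taus, states = s0, [], [s0]
    for phi in phis:
        t = phi[s]
        s = eta(s, t)
        taus.append(t)
        states.append(s)
    return taus, states


def markov_eta(M):
    """State-advance of a Markovian protocol: keep the last log2(M) transcript bits."""
    assert M >= 2 and M & (M - 1) == 0, "M must be a power of two"

    def eta(s, t):
        # Shift in the new bit and drop the oldest one.
        return ((s << 1) | t) % M

    return eta


def random_phis(n, M, seed=0):
    """Hypothetical helper: n arbitrary transmission functions over M states."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(M)) for _ in range(n)]
```

For $M = 4$, the state after time $i \ge 2$ returned by `run_finite_state` equals $2\tau_{i-1} + \tau_i$, i.e., the last two transcript bits.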

IV. BASIC CONCEPTS OF THE CODING SCHEMES
The proofs in this paper are based on constructive coding schemes which use the concept of vertical simulation presented below, implemented in conjunction with either one of the two methods described in Subsections IV-B and IV-C.

A. Vertical Simulation
As explained before, the transcript bits of interactive protocols are produced sequentially $(\tau_1, \tau_2, \tau_3, \dots)$. Simulating a protocol over a noisy channel requires the reliable transmission of the bits sent in every round, whose number is potentially small (and can even be equal to one, in the extreme case and in the finite-state protocols discussed in this paper), which impedes the use of efficient channel codes due to finite block-length bounds [11].
One way of circumventing the problem of a short block-length (i.e., a small number of bits per round) is using vertical simulation, as explained in this subsection. The concept of a vertical simulation is depicted in Table I, in which the protocol is simulated in vertical blocks, according to the indexing at the bottom row of the table. Namely, the first vertical block contains the transcript bits $(\tau_1, \tau_{m+1}, \tau_{2m+1}, \dots)$, the second vertical block contains the transcript bits $(\tau_2, \tau_{m+2}, \tau_{2m+2}, \dots)$, and so on. As shall be explained in the sequel, the vertical blocks can be constructed to be sufficiently long in order to allow reliable transmission at rates approaching Shannon capacity. The main obstacle to using this technique in the general case is the assumption that future transcript bits (for example $\tau_{m+1}$, $\tau_{2m+1}$, etc. for the first vertical block) are known prior to the simulation of the protocol. In the sequel, we shall provide methods for the efficient computation of these future transcript bits, which facilitate the simulation of the entire protocol at Shannon capacity.
Let us now explicitly define the concept of vertical simulation. Let the $n$ times of the protocol be divided into $n/m$ blocks of length $m$, and assume that the initial states at the beginnings of all blocks ($s_0$, $s_m$, $s_{2m}$, etc.) are known to both parties before the transcript is simulated. For simplicity of presentation, one can consider at this point that the initial states are calculated and revealed by a genie, who knows the transmission functions of both parties. More realistic methods for calculating the initial states are elaborated later in this section. We now note that, by the finite-state property in Definition 1, once the initial states of all the blocks are known, the parties can simulate the transcript of every block without needing to know the transcripts of its preceding blocks. In other words, the knowledge of the initial state of every block decouples the simulation problems of distinct blocks.
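The decoupling property can be checked numerically. The following sketch assumes a genie that supplies the block-initial states, simulates every block independently, and reads off the vertical blocks. All function names, the lookup-table representation of the transmission functions, and the three-state example in the usage note are hypothetical choices made for illustration only.

```python
import random


def simulate_block(block_phis, eta, s_init):
    """Simulate one block from its initial state; return (block transcript, final state)."""
    s, taus = s_init, []
    for phi in block_phis:
        t = phi[s]
        taus.append(t)
        s = eta(s, t)
    return taus, s


def genie_initial_states(phis, eta, s0, m):
    """Initial states s_0, s_m, s_2m, ... as a genie who knows all functions would reveal them."""
    states, s = [], s0
    for k in range(0, len(phis), m):
        states.append(s)
        _, s = simulate_block(phis[k:k + m], eta, s)
    return states


def blockwise_transcripts(phis, eta, init_states, m):
    """Simulate every block on its own, using only that block's initial state."""
    return [simulate_block(phis[k * m:(k + 1) * m], eta, s)[0]
            for k, s in enumerate(init_states)]


def vertical_block(blocks, j):
    """Vertical block j (1-based): the bits tau_j, tau_{m+j}, tau_{2m+j}, ..."""
    return [block[j - 1] for block in blocks]


def random_phis(n, M, seed=0):
    """Hypothetical helper: n arbitrary transmission functions over M states."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(M)) for _ in range(n)]
```

For instance, with $n = 16$, $m = 4$ and an arbitrary three-state advance rule such as `lambda s, t: (s + t + 1) % 3`, concatenating the independently simulated blocks reproduces the sequentially generated transcript, and `vertical_block(blocks, 1)` equals every fourth transcript bit starting from $\tau_1$.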
Using this decoupling assumption, the following coding scheme can be used for the simulation of the protocol over $P_{Y|X}$. We start by defining the vectors of state estimates and transcript estimates held by Alice and Bob. We use a distinct notation for every party and emphasize the fact that these are estimates, since they are computed over noisy channels. We denote the vector of initial state estimates at Alice's side for vertical block $j$ by

$$\hat{s}_A(j) \triangleq (\hat{s}^A_{j-1}, \hat{s}^A_{m+j-1}, \hat{s}^A_{2m+j-1}, \dots),$$

and the respective vector of transcript estimates by

$$\hat{\tau}_A(j) \triangleq (\hat{\tau}^A_{j}, \hat{\tau}^A_{m+j}, \hat{\tau}^A_{2m+j}, \dots).$$

Bob's counterparts to $\hat{s}_A(j)$ and $\hat{\tau}_A(j)$ are respectively denoted by $\hat{s}_B(j)$ and $\hat{\tau}_B(j)$ and are similarly defined. The scheme can now be presented for odd $j$ from 1 to $m$:
1) Assume that Alice and Bob have $\hat{s}_A(j)$ and $\hat{s}_B(j)$.
2) Alice uses $\hat{s}_A(j)$ and her transmission functions to calculate $\hat{\tau}_A(j)$.
3) Alice encodes $\hat{\tau}_A(j)$ using a block code with rate $R_v < C_{Sh}(P_{Y|X})$, and sends it to Bob over the channel, using $\frac{n/m}{R_v}$ channel uses. This code will be referred to as a vertical block code.
4) Bob decodes the output of the channel and obtains $\hat{\tau}_B(j)$.
5) Alice (resp. Bob) uses $\hat{s}_A(j)$ and $\hat{\tau}_A(j)$ (resp. $\hat{s}_B(j)$ and $\hat{\tau}_B(j)$) to calculate $\hat{s}_A(j + 1)$ (resp. $\hat{s}_B(j + 1)$) according to (6).
6) Alice and Bob advance $j$ by one.
For even $j$, the same steps are implemented by exchanging the roles of Alice and Bob. We recall that we previously assumed that for the first block, both parties know the actual initial states of the noiseless protocol, i.e., $\hat{s}_A(1) = \hat{s}_B(1)$ and both are equal to the state vector of the noiseless protocol. It is clear from the construction of the scheme that if all block codes are reliably decoded, the transcript is simulated without error. The following basic lemma gives a condition for the reliable decoding of block codes:

Lemma 1. Suppose $l(n)$ independent blocks of $b(n)$ bits are to be conveyed over the channel $P_{Y|X}$ at rate $R < C_{Sh}(P_{Y|X})$ and $n \to \infty$. Then, if $l(n) = o(e^{b(n)})$, the probability of error in the decoding of one or more blocks is $o(1)$.
The proof is due to the basic fact that the probability of error decays exponentially in the block length and appears in Appendix A.
From this point on, we set $m = \sqrt{n}$. We assume that if needed, the transcript is extended by zeros in order to ensure that $\sqrt{n}$ is an integer. With this choice there are $m = \sqrt{n}$ vertical blocks, each containing $n/m = \sqrt{n}$ bits, so using Lemma 1 with $l(n) = b(n) = \sqrt{n}$ ensures that reliable transmission of the vertical blocks can be accomplished at any rate $R_v < C_{Sh}(P_{Y|X})$.
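Let us now bound the total length $N$ of the simulating protocol. Each of the $m$ vertical blocks carries $n/m$ bits at rate $R_v$, so

$$N = m \left\lceil \frac{n/m}{R_v} \right\rceil \le \frac{n}{R_v} + \sqrt{n},$$

and hence $\lim_{n \to \infty} \frac{n}{N} = R_v$ for every $R_v < C_{Sh}(P_{Y|X})$, which means that the protocol can be reliably simulated at Shannon capacity if $n \to \infty$.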
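As a rough numeric illustration of how the rate $n/N$ approaches $R_v$ when $m = \sqrt{n}$, the sketch below counts channel uses under the assumption that each of the $m$ vertical blocks of $n/m$ bits is sent with $\lceil (n/m)/R_v \rceil$ channel uses. The rounding convention and the function name are assumptions of this sketch.

```python
import math


def total_channel_uses(n, r_v):
    """Channel uses for m = sqrt(n) vertical blocks of n/m bits, each coded at rate r_v."""
    m = math.isqrt(n)
    assert m * m == n, "n is assumed to be a perfect square (pad the transcript otherwise)"
    per_block = math.ceil((n // m) / r_v)  # channel uses of one vertical block code
    return m * per_block


def simulation_rate(n, r_v):
    """Overall rate n/N of the vertical simulation (state computation not included)."""
    return n / total_channel_uses(n, r_v)
```

For example, with $n = 10^6$ and $R_v = 0.3$, each of the 1000 vertical blocks needs 3334 channel uses, so $N = 3{,}334{,}000$ and the overall rate is within $10^{-3}$ of $R_v$.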
So far, we assumed without justification, that initial states of all the blocks were revealed to both parties before the beginning of the simulation. We now present two alternative methods for their efficient calculation.

B. Efficient State Lookahead
This method is based on two assumptions:
1) For every block, the last state can be calculated by both parties given the first state, without knowing the entire transcript of the block, using only $o(m)$ (clean) bits exchanged between the parties.
2) The $\frac{n}{m} o(m)$ bits required for this calculation for the entire protocol can be reliably exchanged over the noisy channels at a strictly positive rate.
Assuming that the very first state of the protocol is known to both parties, and that the first condition holds, Alice and Bob can go from the first block to the last and calculate all their respective initial states. The second condition guarantees that only an additional $\Theta\left(\frac{n}{m} o(m)\right) = o(n)$ channel uses are required for this process. The total length of the simulating protocol can thus be bounded by

$$N \le \frac{n}{R_v} + o(n),$$

so, as before, $\lim_{n \to \infty} \frac{n}{N} = R_v$ for every $R_v < C_{Sh}(P_{Y|X})$, which means that the protocol can be reliably simulated at Shannon capacity provided that $n \to \infty$.

C. Efficient Exhaustive Simulation
The following method was previously presented in [6] for the simulation of Markovian protocols. So far, we assumed that for every block, only the transcript related to a single initial state, which was assumed to be the actual state in the noiseless protocol, was simulated. Alternatively, it is possible to simulate all transcripts resulting from all possible initial states in every block, and then go from the first block to the last and estimate the transcript of the noiseless protocol according to the final state of the previous block. Such a simulation can be made possible, for example, if the parties simply describe the identities of their transmission functions to their counterparts. While it is easy to show that the required bits can be conveyed at Shannon capacity, if there are more than two possible transmission functions at every time, the total rate of such a coding scheme is bound to be lower than Shannon capacity.
However, Shannon capacity can be achieved if the following conditions hold: 1) At every block, the transcripts associated with all possible M initial states, can be encoded using only m + o(m) bits.
2) The required bits can be reliably conveyed over the noisy channels at any rate below Shannon capacity.
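If both conditions hold, then the total number of channel uses required for the simulation is

$$N \le \frac{1}{R_v} \cdot \frac{n}{m}\left(m + o(m)\right) + o(n) = \frac{n}{R_v} + o(n),$$

and the protocol can be simulated at any rate below Shannon capacity as long as $n \to \infty$.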
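The stitching step of the exhaustive simulation, in which the transcripts for all $M$ possible initial states of every block are available and the true transcript is recovered by walking from the first block to the last, can be sketched as follows. The sketch ignores the encoding and channel-coding questions addressed by the two conditions above; the function names and the lookup-table representation of the transmission functions are hypothetical.

```python
import random


def block_table(block_phis, eta, M):
    """For one block: map every initial state to (block transcript, final state)."""
    table = {}
    for s_init in range(M):
        s, taus = s_init, []
        for phi in block_phis:
            t = phi[s]
            taus.append(t)
            s = eta(s, t)
        table[s_init] = (taus, s)
    return table


def stitch(tables, s0):
    """Walk through the blocks, picking in each table the entry of the actual initial state."""
    s, transcript = s0, []
    for table in tables:
        taus, s = table[s]
        transcript.extend(taus)
    return transcript


def exhaustive_simulation(phis, eta, M, m, s0):
    """Build all per-block tables independently, then stitch them together."""
    tables = [block_table(phis[k:k + m], eta, M) for k in range(0, len(phis), m)]
    return stitch(tables, s0)


def random_phis(n, M, seed=0):
    """Hypothetical helper: n arbitrary transmission functions over M states."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(M)) for _ in range(n)]
```

The per-block tables do not depend on each other, which is what allows them to be conveyed with vertical block codes before the stitching takes place.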

V. ACHIEVING SHANNON CAPACITY WITH TWO STATES
The first result presented in this paper is that any two-state protocol can be simulated at Shannon capacity. An equivalent statement is given in the following theorem:

$$C_I(\Pi, P_{Y|X}) = C_{Sh}(P_{Y|X}),$$

where $\Pi = \Phi_2$, namely, the family of two-state protocols.
The proof is based on the following coding scheme: Proof. We assume without loss of generality that S = {0, 1} and start by presenting an algorithm for the efficient state lookahead method from Subsection IV-B. For simplicity of exposition we use the time indices of the first block. For other blocks the indices should be appropriately shifted. We also assume that the bits required for the algorithm are exchanged between Alice and Bob without error. In the sequel we explain how they can be reliably conveyed over the noisy channels.
The first step in the algorithm is the calculation of the following sequence of composite-functions, $\nu_i : S \to S$, defined as:

$$\nu_i(s_{i-1}) \triangleq \eta(s_{i-1}, \phi_i(s_{i-1})), \qquad (7)$$

for $1 \le i \le m$, which is done by Alice at odd $i$ and Bob at even $i$. We note that knowing $\nu_i(\cdot)$ and the value of $s_{i-1}$, the following state $s_i = \nu_i(s_{i-1})$ can be calculated. We also note that since $\nu_i : \{0, 1\} \to \{0, 1\}$, $\nu_i(s_{i-1})$ must be one of the following four functions:

$$\nu_i(s_{i-1}) \in \{0, \ 1, \ s_{i-1}, \ \bar{s}_{i-1}\}, \qquad (8)$$

which can also be described in the following form: either $\nu_i(s_{i-1}) = b_i$ (a constant function) or $\nu_i(s_{i-1}) = s_{i-1} \oplus c_i$, where $b_i, c_i \in \{0, 1\}$. Suppose first that all the composite-functions in the block are of the form $\nu_i(s_{i-1}) = s_{i-1} \oplus c_i$. Then

$$s_m = s_0 \oplus d_{\mathrm{Alice}} \oplus d_{\mathrm{Bob}}, \qquad (9)$$

where

$$d_{\mathrm{Alice}} \triangleq \bigoplus_{i \text{ odd}, \ i \in \{1, \dots, m\}} c_i, \qquad d_{\mathrm{Bob}} \triangleq \bigoplus_{i \text{ even}, \ i \in \{1, \dots, m\}} c_i.$$

In other words, $s_m$ can be calculated from its initial value $s_0$ and the parity of the number of times in the block it is flipped (from 0 to 1 or vice versa) by either Alice or Bob. All in all, assuming that the parties know $s_0$, they only need to exchange $d_{\mathrm{Alice}}$ and $d_{\mathrm{Bob}}$ (i.e., two bits) in order to calculate $s_m$. However, so far we assumed that all the composite-functions in the block are of the form $\nu_i(s_{i-1}) = s_{i-1} \oplus c_i$. In the general case, in which the $\nu_i(\cdot)$ are taken from the complete set of four functions in (8), the algorithm can be modified by first exchanging the location and the value of the last constant composite-function in the block, i.e., the last composite-function of the form $\nu_i(s_{i-1}) = b_i$. We note that this process requires exchanging only $O(\log m)$ bits between Alice and Bob. Then, $s_m$ can be calculated similarly to (9), but from the location of the last constant composite-function and not from the beginning of the block. The algorithm is formulated as follows:

1) Alice sends Bob her latest (odd) time index in the block for which $\nu_i(s_{i-1}) = b_i$ (i.e., her latest constant composite-function), along with the value of $b_i$. If such an index does not exist she sends zero to Bob. Bob then repeats the same process with the appropriate alterations. We use $i_{\mathrm{const}}$ to denote the maximum of the two indices, which therefore represents the location of the last constant composite-function in the block. If $i_{\mathrm{const}} = 0$ we set $b_0 = s_0$; if $i_{\mathrm{const}} > 0$, $b_{i_{\mathrm{const}}}$ is the value of that constant composite-function. This process requires exchanging $O(\log m)$ bits between Alice and Bob.
2) We now note that since $i_{\mathrm{const}}$ is the index of the latest constant composite-function in the block, for all $i_{\mathrm{const}} < i \le m$ we have $\nu_i(s_{i-1}) = s_{i-1} \oplus c_i$ for some $c_i \in \{0, 1\}$. The final state in the block, $s_m$, can therefore be calculated by

$$s_m = b_{i_{\mathrm{const}}} \oplus d_{\mathrm{Alice}} \oplus d_{\mathrm{Bob}}, \qquad (10)$$

where $d_{\mathrm{Alice}}$ (resp. $d_{\mathrm{Bob}}$) is now the XOR of the $c_i$ over the odd (resp. even) indices $i_{\mathrm{const}} < i \le m$. We finally note that $d_{\mathrm{Alice}}$ and $d_{\mathrm{Bob}}$ are single bits that can be calculated by their respective parties and then exchanged, leaving the total number of exchanged bits required by the algorithm at $O(\log m)$.
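The two steps above can be sketched and checked exhaustively for a short block. In this sketch each composite-function is stored as the pair $(\nu_i(0), \nu_i(1))$, and the bits that Alice and Bob would exchange are computed inside a single function for brevity; the function names are hypothetical.

```python
from itertools import product

# The four possible composite-functions on {0, 1}, stored as (nu(0), nu(1)).
FOUR_FUNCTIONS = [(0, 0), (1, 1), (0, 1), (1, 0)]


def lookahead_final_state(nus, s0):
    """Compute s_m from s_0 using the last constant index and two parity bits.

    nus[i-1] is nu_i; Alice owns odd i, Bob owns even i.
    """
    m = len(nus)

    def latest_constant(parity):
        # Latest index of the given parity with a constant nu_i, or 0 if none.
        for i in range(m, 0, -1):
            if i % 2 == parity and nus[i - 1][0] == nus[i - 1][1]:
                return i, nus[i - 1][0]
        return 0, None

    i_alice, b_alice = latest_constant(1)
    i_bob, b_bob = latest_constant(0)
    i_const = max(i_alice, i_bob)
    if i_const == 0:
        b = s0  # no constant function in the block: start from s_0
    else:
        b = b_alice if i_alice > i_bob else b_bob

    # After i_const every nu_i(s) = s xor c_i, where c_i = nu_i(0).
    d_alice = d_bob = 0
    for i in range(i_const + 1, m + 1):
        c = nus[i - 1][0]
        if i % 2 == 1:
            d_alice ^= c
        else:
            d_bob ^= c
    return b ^ d_alice ^ d_bob


def direct_final_state(nus, s0):
    """Reference: apply the composite-functions one by one."""
    s = s0
    for nu in nus:
        s = nu[s]
    return s


def exhaustive_check(m):
    """Compare both computations over all 4^m blocks and both initial states."""
    return all(
        lookahead_final_state(list(nus), s0) == direct_final_state(nus, s0)
        for nus in product(FOUR_FUNCTIONS, repeat=m)
        for s0 in (0, 1)
    )
```

The exchanged information consists of two indices in $\{0, \dots, m\}$, two values $b_i$ and two parity bits, which is $O(\log m)$ bits per block.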
After repeating this operation for all blocks, it is possible to calculate all the final states of all blocks (i.e., all the initial states of their following blocks) by applying (10).

For the sake of completeness, we now give a high-level description of an alternative coding scheme based on the efficient exhaustive simulation method described in Subsection IV-C. This coding scheme is a little more involved than the previously described one, and depends on the identity of the state-advance function $\eta(\cdot)$. We start by noting that $\eta(\cdot)$ is a binary function with two binary inputs, so there are in total sixteen possible such functions. In particular, there are four state-advance functions that do not depend on the transcript bit $\tau_i$:

$$\eta(s_{i-1}, \tau_i) \in \{0, \ 1, \ s_{i-1}, \ \bar{s}_{i-1}\}.$$

As the very first state of the protocol is assumed to be known to both parties, with one of these state-advance functions the state sequence of the entire protocol can be determined before its simulation, rendering the entire protocol non-interactive and hence trivial to simulate. For the remaining twelve state-advance functions, the following coding scheme is proposed, which is described for simplicity for the first block, but should be independently implemented for all blocks:
1) Before the simulation begins, both parties communicate the location of the first (rather than the last) constant composite-function in the block: the smallest value $1 \le i \le m$ for which $\nu_i(s_{i-1}) = b_i$, for some $b_i \in \{0, 1\}$. This process requires exchanging $O(\log m)$ bits.
2) The parties exchange the identities of their transmission functions (i.e., $\phi_i(\cdot)$) before the location of the first constant composite-function in the block, using a single bit per time index. In the sequel we show that there are indeed only two relevant functions to describe, so their description requires only a single bit. At the end of this process, the parties can independently simulate the transcripts for both initial states until the location of the first constant composite-function.
3) For time indices after the location of the first constant composite-function, the transcripts associated with both initial states coincide, so they can both be simulated using a single bit per time index.
Using this coding scheme, only $m + o(m)$ bits are required for the simulation of the transcripts associated with both initial states. These bits can be reliably conveyed for all blocks using vertical block codes, as required in the description of the scheme in Subsection IV-C. To see that, observe that there are only three canonical types of state-advance functions, depicted in the state diagrams in Figure 1. The nodes represent the state variables, and the directed edges show the possible state transitions. The specific values of the states and transcript bits on the edges are deliberately not indicated; it is easy to check that there are four possible settings for every type, summing up to twelve functions in total. An example for a Type I state-advance function is $\eta(s_{i-1}, \tau_i) = \tau_i$, for a Type II state-advance function is $\eta(s_{i-1}, \tau_i) = s_{i-1} \wedge \tau_i$, and for a Type III state-advance function is:

We now return to the definition of the composite-functions $\nu_i(s_{i-1})$ in (7), and note that $\nu_i(s_{i-1})$ is constant (i.e., set to either 0 or 1) if the transmission functions are such that $s_i$ receives the same value for both $s_{i-1} = 0$ and $s_{i-1} = 1$. As the transmission function $\phi_i(s_{i-1})$ determines the values associated with the edges of the state diagram, it can be seen that for every type of advance function, there exist only two transmission functions which render $\nu_i(s_{i-1})$ constant. Since there are in total four possible transmission functions, there are therefore only two possible transmission functions before the appearance of one of the two that makes $\nu_i(s_{i-1})$ constant, as required by the scheme.
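The counting argument above can be verified by brute force over all sixteen binary state-advance functions. In the sketch below, a state-advance function is stored as a dictionary keyed by $(s_{i-1}, \tau_i)$ and a transmission function as a pair $(\phi(0), \phi(1))$; these representations and the function names are assumptions of the sketch.

```python
from itertools import product

BITS = (0, 1)


def count_constant_making(eta):
    """Number of transmission functions phi (out of four) making nu(s) = eta(s, phi(s)) constant."""
    count = 0
    for phi in product(BITS, repeat=2):  # phi = (phi(0), phi(1))
        nu = tuple(eta[(s, phi[s])] for s in BITS)
        count += int(nu[0] == nu[1])
    return count


def depends_on_transcript_bit(eta):
    """True if eta(s, 0) != eta(s, 1) for at least one state s."""
    return any(eta[(s, 0)] != eta[(s, 1)] for s in BITS)


def survey_state_advance_functions():
    """Return (eta, depends on tau, number of constant-making phi) for all 16 functions."""
    survey = []
    for values in product(BITS, repeat=4):
        eta = dict(zip(product(BITS, repeat=2), values))
        survey.append((eta, depends_on_transcript_bit(eta), count_constant_making(eta)))
    return survey
```

Running the survey shows four functions that ignore $\tau_i$ and twelve that depend on it, and each of the twelve has exactly two transmission functions that make the composite-function constant.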

VI. FAILURE OF THE CODING SCHEME FOR THREE STATES
We now provide an example of a protocol for which both methods described in Subsections IV-B and IV-C fail. Since the protocol is to be used on a block (rather than on the entire protocol), we use m to denote its length.
and the following transmission function is used by Bob at even time indices: We start by proving the failure of the efficient state lookahead scheme from Subsection IV-B, by showing a reduction from the disjointness problem commonly used in the communication complexity literature [12].
Definition 2 (Disjointness). Alice and Bob are given as input the sets $X, Y \subseteq \{1, \dots, m/2\}$, respectively. The disjointness function is defined as

$$\mathrm{DISJ}(X, Y) \triangleq \mathbb{1}(X \cap Y = \emptyset),$$

where $\mathbb{1}(\cdot)$ is the indicator function, which equals one if the condition is satisfied and zero otherwise.
We now show how $\mathrm{DISJ}(X, Y)$ can be computed using the three-state protocol of Example 2. We set the values of the vector $\alpha$ for $k \in \{1, 2, \ldots, m/2\}$ according to
$$\alpha_{2k-1} = \mathbb{1}(k \in X), \qquad \alpha_{2k} = \mathbb{1}(k \in Y).$$
The values of the elements of $\beta$ do not affect the reduction from disjointness and can all be set to zero for simplicity. They will be used in the proof of the failure of the exhaustive simulation scheme shown in the sequel.
Observe that $s_m = 2$ if and only if there exists at least one $k \in \{1, 2, \ldots, m/2\}$ for which $\alpha_{2k-1} = 1$ and $\alpha_{2k} = 1$, which means that $k \in X$ and $k \in Y$, so the intersection of $X$ and $Y$ is not empty. Namely,
$$\mathrm{DISJ}(X, Y) = \mathbb{1}(s_m \neq 2), \tag{11}$$
which means that $s_m$ can be used to compute $\mathrm{DISJ}(X, Y)$.
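The reduction can be illustrated with a small simulation. The three-state machine below is a hypothetical stand-in and not the protocol of Example 2: its transition table and transmission rules are assumptions, chosen only so that the final state equals 2 exactly when some pair $\alpha_{2k-1}, \alpha_{2k}$ is all ones, starting from $s_0 = 0$. The names `ETA`, `transmitted_bit`, `final_state` and `disj_via_protocol` are illustrative.

```python
# Hypothetical three-state machine (NOT the paper's Example 2).
# State 2 is absorbing; states 0 and 1 track whether Alice's last bit was 1.
ETA = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 2, (2, 0): 2, (2, 1): 2}


def transmitted_bit(state, i, alpha, beta):
    """Bit sent at time i (1-indexed). Alice speaks at odd i, Bob at even i."""
    if state == 2:
        return beta[i - 1]
    alice_turn = i % 2 == 1
    # Assumed rule: Alice reveals alpha_i only from state 0,
    # Bob reveals alpha_i only from state 1; otherwise a 0 is sent.
    if (alice_turn and state == 0) or (not alice_turn and state == 1):
        return alpha[i - 1]
    return 0


def final_state(alpha, beta, s0=0):
    """Run the machine for len(alpha) steps and return the final state s_m."""
    state = s0
    for i in range(1, len(alpha) + 1):
        state = ETA[(state, transmitted_bit(state, i, alpha, beta))]
    return state


def disj_via_protocol(X, Y, half_m):
    """Compute DISJ(X, Y) = 1(X and Y are disjoint) from the final state."""
    alpha = [0] * (2 * half_m)
    for k in range(1, half_m + 1):
        alpha[2 * k - 2] = int(k in X)  # alpha_{2k-1}
        alpha[2 * k - 1] = int(k in Y)  # alpha_{2k}
    beta = [0] * (2 * half_m)  # beta plays no role in the reduction
    return int(final_state(alpha, beta) != 2)


print(disj_via_protocol({1, 3}, {2, 3}, half_m=4))  # prints 0 (3 is shared)
print(disj_via_protocol({1, 2}, {3, 4}, half_m=4))  # prints 1 (disjoint)
```

Under these assumptions, any procedure that outputs $s_m$ also outputs $\mathrm{DISJ}(X, Y)$ at no extra communication cost, which is the step the lower-bound argument relies on.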
Since it is assumed that $s_m$ can be computed using $o(m)$ bits, and $\mathrm{DISJ}(X, Y)$ can be computed using $s_m$ without additional communication due to (11), it follows that $\mathrm{DISJ}(X, Y)$ can also be computed using $o(m)$ bits. However, it is well known that the communication complexity of the disjointness function is $\Omega(m)$: even if Alice and Bob can use a shared randomness source in their communication protocol, and even if they are allowed to err with probability 1/3, they must still exchange $\Omega(m)$ bits in the worst case in order to compute $\mathrm{DISJ}(X, Y)$ [13], [14]. In fact, disjointness remains hard even in the amortized case, where Alice and Bob are given a sequence of inputs $X_1, \ldots, X_l \subseteq \{1, \ldots, m/2\}$ and $Y_1, \ldots, Y_l \subseteq \{1, \ldots, m/2\}$ (respectively), and their goal is to output the sequence $\mathrm{DISJ}(X_1, Y_1), \ldots, \mathrm{DISJ}(X_l, Y_l)$. The average communication per copy for this task is still $\Omega(m)$ (i.e., the total communication is $\Omega(m \cdot l)$, where $l$ is the number of copies). This result is a direct consequence of the following three results: i) the information cost of disjointness is linear [15]; ii) information cost is additive [16]; and iii) information cost is a lower bound on communication [15].
We now prove that for Example 2, the efficient exhaustive simulation of Subsection IV-C also fails, by providing a setting of $\alpha$ and $\beta$ for which simulating the transcripts of all three possible initial states requires the parties to reliably exchange $\frac{3}{2} m$ bits. This is impossible to accomplish using only $m + o(m)$ exchanged bits, as assumed in the scheme, and therefore the scheme must fail. We set up the example as follows: we set $\beta$ to be an arbitrary binary vector whose odd elements are known only to Alice and whose even elements are known only to Bob. In addition, we set the odd elements of $\alpha$ to be arbitrary and known only to Alice, and set all the even elements of $\alpha$ to zero. We observe that the transcript associated with the initial state $s_0 = 2$ essentially sends the sequence $\beta_1, \ldots, \beta_m$ non-interactively, implying that the parties exchange $m$ bits that are initially unknown to their counterparts. For the other two initial states, $s_0 \in \{0, 1\}$, setting the even entries of $\alpha$ to zero guarantees that the associated transcripts never reach the state $s_i = 2$ for any $i \leq m$. Thus, in order to simulate these transcripts, Alice must convey to Bob the odd entries of $\alpha$, which amount to $m/2$ additional bits. Hence, successful exhaustive simulation means that $\frac{3}{2} m$ bits must be exchanged overall, which cannot be done using only $m + o(m)$ bits.

VII. ACHIEVING SHANNON CAPACITY WITH MORE THAN TWO STATES
In the previous sections we presented a coding scheme that achieves capacity for all two-state protocols, but fails to achieve capacity for at least one three-state protocol. In this section we present specific families of $M$-state protocols which obey two conditions, and show that within these families, almost all protocols can be simulated at Shannon capacity. The notion of achieving capacity for almost all members of a family is demonstrated in the following example:
Example 3. Consider the family of Markovian protocols with $M$ states defined in Example 1, where $M$ is a power of two, and whose transmission functions are taken from the entire set of $S \to \{0, 1\}$ functions. We shall now show that capacity is achievable for almost all protocols in this family.
To see this, first observe that the set of possible transmission functions contains two constant functions: one that maps all states to 0 and one that maps all states to 1. Now, assume that vertical simulation is implemented as described in Section IV, but all transcripts for all initial states in all blocks are simulated for the last $n^{1/4}$ times in every block (which requires only $o(n)$ channel uses). It is easy to show (and a stronger statement is proved in Theorem 2 below) that almost all protocols in the family have at least one sequence of $\log M$ constant functions within the last $n^{1/4}$ times in every transmission block. Having this sequence of constant functions ensures that all transcripts in every block have the same final state, which can be used for the efficient state lookahead method described in Subsection IV-B.
On the other hand, one might argue that the presence of a sequence of constant transmission functions reduces the interactiveness of the protocol; in other words, that highly interactive protocols are not likely to include a constant function. However, it was previously shown in [6] that the scheme described above can be used for protocols whose transmission functions are taken from smaller families of non-constant functions, such as the family of balanced Boolean functions. We shall now extend the results from [6] to finite-state protocols. For this purpose, we define two conditions that the family of protocols should fulfill. The first condition is related to the state-advance function, and the second condition is related to the transmission functions, as defined here:
Definition 3. A state-advance function $\eta$ of an $M$-state protocol $\Phi_M$ is called "coinciding" if there exists $K \in \mathbb{N}$ such that for every pair of distinct states $j, j' \in S$, $j \neq j'$, there exists a pair of binary sequences of length $K$, $(b^j_1, b^j_2, \ldots, b^j_K)$ and $(b^{j'}_1, b^{j'}_2, \ldots, b^{j'}_K)$, for which $\tilde{s}^j_K = \tilde{s}^{j'}_K$, where $\tilde{s}^j_K$ is generated by applying
$$\tilde{s}^j_i = \eta(\tilde{s}^j_{i-1}, b^j_i)$$
for $i$ going from 1 to $K$ with the initial condition $\tilde{s}^j_0 = j$, and $\tilde{s}^{j'}_K$ is generated similarly, replacing $j$ by $j'$.
The following theorem formalizes the notion of achieving capacity for almost all members of these families of protocols:
Theorem 2. Let $\Pi$ be the family of all $M$-state protocols whose state-advance function is coinciding and whose transmission functions are taken from a fixed given set of useful functions. Then there exists a sequence of families of protocols $S = \{S_1, S_2, \ldots\}$, with $S_n \subseteq \Pi_n$ and $|S_n|/|\Pi_n| = 1 - o(1)$, for which the interactive capacity is equal to the Shannon capacity. Namely,
Proof. The proof is based on implementing the methods from Subsection IV-B or Subsection IV-C using one of the following two constructions.
We start by presenting the construction for the efficient state lookahead method from Subsection IV-B: for the last $p = n^{1/4}$ times in every transmission block, exhaustively simulate the transcripts related to all $M$ possible initial states. We assume for simplicity that the protocol is extended by zeros so that $n^{1/4}$ is an integer and, in addition, so that $n^{1/4}/K$ is also an integer, as shall be required in the sequel. This simulation can be implemented by each side describing all its respective $p/2$ transmission functions to its counterpart. After this is done, both parties can simulate the transcripts for all possible initial states in the last $p$ times in every block without any additional channel uses. Since there are only $2^M$ functions $S \to \{0, 1\}$, the description of every function in $F$ requires at most $M$ bits. The bits required for the description of all transmission functions of a party, for the last $p$ times in all $n/m$ transmission blocks, can be reliably conveyed over the noisy channel, either by a single block code per party, or by a distinct block code per time instance. It is easy to see that the setting of $p = n^{1/4}$ and $m = n^{1/2}$ ensures the transmission of these bits with a vanishing error probability using the channels only $o(n)$ times. Now, if in every block the transcripts corresponding to all possible initial states have the same (possibly block-dependent) final state, we can use this set of states as the state lookahead. We call this phenomenon state-coincidence and note that if it occurs, since the channel was used only $o(n)$ times for the calculation of the state lookahead, Shannon capacity can be achieved, as explained in
Alternatively, the efficient exhaustive simulation described in Subsection IV-C can be similarly implemented by using the construction for the first (rather than the last) $p$ times in all blocks. If all states coincide in all blocks, then for every block there is only a single transcript to simulate for the last $m - p$ times in the block.
All in all, only $m + o(m)$ bits are required for the simulation of the transcripts of all the initial states, as required by the method.
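The overhead accounting in this construction can be made concrete with a short calculation. The sketch below counts the description bits one party sends, using $p = n^{1/4}$, $m = n^{1/2}$, $p/2$ functions per party per block, at most $M$ bits per function, and $n/m$ blocks, as stated above. The function name and the sample values of $n$ and $M$ are illustrative assumptions.

```python
def lookahead_overhead_bits(n, M):
    """Upper bound on the description bits sent by one party.

    Uses p = n**(1/4) and m = n**(1/2), with p/2 transmission functions
    per party per block, at most M bits per function, and n/m blocks.
    """
    p = n ** 0.25
    m = n ** 0.5
    return M * (p / 2) * (n / m)


# The ratio bits / n equals M / (2 * n**(1/4)) and shrinks as n grows.
for n in (10**4, 10**8, 10**12):
    bits = lookahead_overhead_bits(n, M=4)
    print(n, round(bits), bits / n)
```

Converting these bits into channel uses divides by a fixed positive code rate, so the number of channel uses spent on the overhead remains $o(n)$.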
We now use $S_n \subseteq \Pi_n$ to denote the subset of protocols for which the states coincide, so that their respective transcripts can be simulated at Shannon capacity as explained above. It remains to prove that $|S_n|/|\Pi_n| = 1 - o(1)$. This is done by assuming that the protocols in $\Pi_n$ are generated by drawing all their transmission functions uniformly from the set $F$ and independently in time, and denoting the probability of drawing a protocol in $S_n$ by $\Pr(S_n)$, so
transmission functions $(\psi_{k+1}, \ldots, \psi_K)$ and particularly for $s^j_K = s^{j'}_K$. Therefore,
We now observe that (16) only assumed that the initial states are distinct, i.e., $s^j_0 = j \neq s^{j'}_0 = j'$. Therefore, in case $s^j_K \neq s^{j'}_K$, we can consider the drawing of the following $K$ functions as a repeated, statistically identical and independent experiment. Following this argument, we can consider $p/K$ consecutive such experiments, and observe that a failure of the coincidence at the end of the block of length $p$, $s^j_p \neq s^{j'}_p$, implies that all these $p/K$ experiments failed. We can therefore state the following bound:
$$\Pr\left(s^j_p \neq s^{j'}_p\right) \leq \left[\Pr\left(s^j_K \neq s^{j'}_K\right)\right]^{p/K} \tag{17}$$
where (17) is potentially loose since it considers only the coincidence events occurring in non-overlapping blocks of length $K$, (18) is due to (16), and the final step is by the inequality $(1 - x)^a \leq \exp(-ax)$, which holds for any $0 \leq x \leq 1$ and $a \in \mathbb{N}$.
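The last inequality can be checked numerically on the range relevant here, namely $x \in [0, 1]$, since $x$ plays the role of a probability. The sketch below tests $(1 - x)^a \leq \exp(-ax)$ on a grid and then prints both sides for a hypothetical per-experiment success probability of 0.1. That value, the grid, and the function names are assumptions made for illustration only.

```python
import math


def power_bound(x, a):
    """Probability that a independent experiments, each succeeding w.p. x, all fail."""
    return (1.0 - x) ** a


def exponential_bound(x, a):
    """Right-hand side of the inequality (1 - x)^a <= exp(-a x)."""
    return math.exp(-a * x)


def inequality_holds_on_grid(max_a=50, steps=100):
    """Check the inequality for x = 0, 1/steps, ..., 1 and a = 1, ..., max_a."""
    return all(
        power_bound(k / steps, a) <= exponential_bound(k / steps, a)
        for k in range(steps + 1)
        for a in range(1, max_a + 1)
    )


print(inequality_holds_on_grid())

# Hypothetical numbers: success probability 0.1 per experiment and
# p/K = 1, 10, 100 consecutive experiments.
for experiments in (1, 10, 100):
    print(experiments, power_bound(0.1, experiments), exponential_bound(0.1, experiments))
```

Both columns decay exponentially in the number of experiments $p/K$, which is what makes the subsequent union bounds affordable.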
We emphasize that so far we examined the coincidence of only two initial states, $j$ and $j'$, in a single transmission block. We denote by $E_1$ the event in which the transcripts corresponding to all initial states did not coincide to the same final state. The probability of $E_1$ can be bounded by:
$$\Pr(E_1) = \Pr\Big(\bigcup_{j \neq j'} \big\{s^j_p \neq s^{j'}_p\big\}\Big) \leq \sum_{j \neq j'} \Pr\left(s^j_p \neq s^{j'}_p\right).$$
Finally, we denote by $E_2$ the event that the final states failed to coincide in at least one transmission block. Using the union bound again, this probability can be bounded by:
$$\Pr(E_2) \leq \frac{n}{m} \Pr(E_1).$$
It now immediately follows that $\Pr(S_n) = 1 - \Pr(E_2) = 1 - o(1)$, which concludes the proof.
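Whether a given state-advance function is coinciding in the sense of Definition 3 can be checked mechanically. The sketch below assumes that the recursion in Definition 3 feeds each trajectory its own bit sequence through $\eta$, and it runs a breadth-first search over pairs of states. Since two trajectories that have merged stay merged when fed identical bits, the largest shortest merging time over all pairs is a valid $K$. The function name and the two example machines are illustrative and not taken from the paper.

```python
from collections import deque
from itertools import combinations, product


def coincidence_length(eta, M):
    """Return the smallest K that satisfies Definition 3, or None if none exists.

    eta maps (state, bit) -> next state for states 0, ..., M-1.
    """
    worst = 0
    for j, j_prime in combinations(range(M), 2):
        seen = {(j, j_prime)}
        queue = deque([(j, j_prime, 0)])
        steps_needed = None
        while queue:
            a, b, steps = queue.popleft()
            if a == b:  # the two trajectories have merged
                steps_needed = steps
                break
            # Each trajectory may receive its own bit (x for a, y for b).
            for x, y in product((0, 1), repeat=2):
                nxt = (eta[(a, x)], eta[(b, y)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt[0], nxt[1], steps + 1))
        if steps_needed is None:
            return None  # this pair can never be merged
        worst = max(worst, steps_needed)
    return worst


# Illustrative machine 1: a 2-bit shift register (M = 4).
shift_register = {(s, b): ((s << 1) | b) & 3 for s in range(4) for b in (0, 1)}
# Illustrative machine 2: the state never changes, so no pair can merge.
frozen = {(s, b): s for s in range(3) for b in (0, 1)}

print(coincidence_length(shift_register, 4))  # prints 2
print(coincidence_length(frozen, 3))          # prints None
```

The search visits at most $M^2$ state pairs per starting pair, so the check is cheap for the small values of $M$ considered here.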

VIII. CONCLUDING REMARKS
In this paper, the problem of simulating an interactive protocol over a pair of binary-input noisy channels is considered. While previous works [3], [4], [7], [9] approach this problem using worst-case assumptions (characterizing the rates at which all possible interactive protocols can be simulated), this work restricts the discussion to a specific set of finite-state protocols. A coding scheme is presented that achieves Shannon capacity for all two-state protocols, but cannot be used to simulate at least one three-state protocol at Shannon capacity. Then, specific families of finite-state protocols are considered, and Shannon capacity is proved to be achievable for almost all of their members.
Since the proofs in this paper are based on specific coding schemes, proving their failure does not prove that Shannon capacity is unachievable. It is also plausible that Shannon capacity is achievable for larger classes of nontrivial interactive protocols using different coding schemes. A nontrivial upper bound on the ratio between the Shannon capacity and the interactive capacity for a fixed channel (i.e., not in the limit of a very clean channel) remains an intriguing open question even in the simplest binary symmetric case.

APPENDIX A PROOF OF LEMMA 1
Proof. The proof is by straightforward implementation of Gallager's random coding error exponent and the union bound. Due to [17, Theorem 5.6.4], the probability of decoding error in a single block is upper bounded by:
$$\Pr(\text{error in a single block}) \leq \exp\left(-\frac{b(n)}{R} E_r(R)\right),$$
where $E_r(R)$ (the error exponent) is strictly positive for any $0 \leq R < C_{\mathrm{Sh}}(P_{Y|X})$ and $b(n)/R$ is the length of the block code. Now, having $l(n)$ independent such blocks, the probability of error in one or more blocks can be upper bounded using the union bound:
$$\Pr(\text{error in any block}) \leq l(n) \exp\left(-\frac{b(n)}{R} E_r(R)\right)$$
where (a) is by the assumption that $l(n) = o(e^{b(n)})$.