Some New Results on the Wiretap Channel with Side Information

In this paper, the model of the wiretap channel is reconsidered for the case where the main channel is controlled by channel state information (side information) that is available at the transmitter in a noncausal manner (termed here noncausal side information) or in a causal manner (termed here causal side information). Inner and outer bounds are derived on the capacity-equivocation regions for the noncausal and causal manners, and the secrecy capacities for both manners, which give the best transmission rate with perfect secrecy, are characterized and bounded. Moreover, for the case where the side information is available at the transmitter in a memoryless manner (termed here memoryless side information), both the capacity-equivocation region and the secrecy capacity are determined. The results of this paper extend the previous work on the wiretap channel with noncausal side information by providing an outer bound on the capacity-equivocation region. In addition, we find that memoryless side information cannot help to obtain the same secrecy capacity as in the causal case, which differs from the well-known fact that the memoryless manner achieves the capacity of the channel with causal side information.


Introduction
The concept of the wiretap channel was first introduced by A. D. Wyner [1]. It is a kind of degraded broadcast channel. The wiretapper knows the encoding scheme used at the transmitter and the decoding scheme used at the legitimate receiver, see Figure 1. The objective is to characterize the rate of reliable communication from the transmitter to the legitimate receiver, subject to a constraint on the equivocation to the wiretapper. After the publication of A. D. Wyner's work, I. Csiszár and J. Körner [2] investigated a more general situation: the broadcast channel with confidential messages. A. D. Wyner's wiretap channel is a special case of the model of I. Csiszár and J. Körner, in the sense that the main channel is less noisy than the wiretap channel. Furthermore, S. K. Leung-Yan-Cheong and M. E. Hellman studied the Gaussian wiretap channel (GWC) [3], and showed that its secrecy capacity is the difference between the capacity of the main channel and that of the overall wiretap channel (the cascade of the main channel and the wiretap channel).
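As a quick numeric illustration of the GWC result of [3], the following sketch evaluates the secrecy capacity as the difference between the main channel capacity and the cascaded channel capacity. The power and noise values are hypothetical.

```python
import math

def awgn_capacity(snr):
    """Capacity of an AWGN channel in bits per channel use: 0.5*log2(1 + SNR)."""
    return 0.5 * math.log2(1.0 + snr)

# Hypothetical parameters: transmit power P, main channel noise N1, and
# additional wiretap noise N2 (the wiretapper sees the cascade, noise N1 + N2).
P, N1, N2 = 10.0, 1.0, 4.0

C_main = awgn_capacity(P / N1)          # capacity of the main channel
C_mw = awgn_capacity(P / (N1 + N2))     # capacity of the cascaded channel
C_s = C_main - C_mw                     # secrecy capacity of the GWC per [3]
print(C_main, C_mw, C_s)
```

Note that increasing the wiretap noise N2 widens the gap between the two capacities and hence raises the secrecy capacity.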
The coding for channels with causal (past and current) side information at the encoder was first investigated by C. E. Shannon [4] in 1958. After that, in order to solve the problem of coding for a computer memory with defective cells, N. V. Kuznetsov and B. S. Tsybakov [5] considered a channel in the presence of noncausal side information at the transmitter. They provided some coding techniques without determining the capacity. The capacity was found in 1980 by S. I. Gel'fand and M. S. Pinsker [6]. Furthermore, M. H. M. Costa [7] investigated a power-constrained additive noise channel, where part of the noise is known at the transmitter as side information. This channel is also called the dirty paper channel. Based on the dirty paper channel, C. Mitrpant et al. [8] studied the Gaussian wiretap channel with side information, and provided an inner bound on the capacity-equivocation region. Furthermore, Y. Chen et al. [9] investigated the discrete memoryless wiretap channel with noncausal side information, and also provided an inner bound on the capacity-equivocation region. Note that the coding scheme of [9] is a combination of those in [1,6]. Chen et al. [9] generalized Mitrpant et al.'s work [8] by extending the Gaussian channel to the discrete memoryless channel (DMC), i.e., the result of [8] can be obtained from that of [9]. Recently, N. Merhav [10] studied a variation of the wiretap channel and obtained its capacity-equivocation region; in that model, both the legitimate receiver and the wiretapper have access to some leaked symbols from the source, but the channels to the wiretapper are noisier than those to the legitimate receiver, who shares a secret key with the encoder.
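To make the Gel'fand-Pinsker functional I(U;Y) − I(U;V) concrete, the following sketch evaluates it for a toy memory-with-defects channel in the spirit of [5,6]. The state distribution and the deliberately naive choice of U (the intended bit, independent of V, with X = U) are illustrative assumptions, not the maximizing choice required by the capacity formula.

```python
import math
from itertools import product

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def mutual_information(pxy):
    """I(X;Y) for a joint pmf given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Toy "memory with stuck-at cells": state v=0 forces output 0, v=1 forces
# output 1, and v=2 is a clean cell (output equals input). Probabilities
# below are assumptions for illustration only.
pV = {0: 0.1, 1: 0.1, 2: 0.8}

# Naive auxiliary choice: U is the intended bit, uniform and independent
# of V, and the channel input is X = U (defects ignored).
pU = {0: 0.5, 1: 0.5}

pUV = {(u, v): pU[u] * pV[v] for u, v in product(pU, pV)}
pUY = {}
for (u, v), p in pUV.items():
    y = u if v == 2 else v           # stuck cells override the input
    pUY[(u, y)] = pUY.get((u, y), 0.0) + p

gp_value = mutual_information(pUY) - mutual_information(pUV)
print(gp_value)   # I(U;Y) - I(U;V) for this particular (suboptimal) choice
```

Maximizing the same functional over all choices of p(u|v) and x(u, v) would give the Gel'fand-Pinsker capacity of this toy channel; the point here is only how a single candidate distribution is scored.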
In this paper, we study the model of the wiretap channel with side information, see Figure 2. The transition probability distribution of the main channel depends on a channel state sequence V^N, which is available at the encoder in a noncausal or causal manner. The wiretapper can get a degraded version of the symbols Y^N via a wiretap channel. Both the main channel and the wiretap channel are discrete memoryless channels. Inner and outer bounds are derived on the capacity-equivocation regions for the noncausal and causal manners (the inner bound for the noncausal manner is in fact equivalent to that of [9]), and the secrecy capacity for both manners is characterized and bounded. Moreover, for the case where the side information is available at the transmitter in a memoryless manner (at time i, the encoder is only allowed to use the side information V_i), both the capacity-equivocation region and the secrecy capacity are determined. Shannon's well-known paper [4] shows that the optimal way to achieve the capacity of the channel with causal side information is to use V_i instead of V^i at the channel encoder. It is therefore natural to ask whether memoryless side information can help to obtain the same secrecy capacity as that of the wiretap channel with causal side information, and this is our motivation for studying the memoryless model.
Compared with [9], the inner bound on the capacity-equivocation region for the noncausal manner in this paper is in fact equivalent to the achievable region in [9]. However, the region provided here is easier to understand than that of [9].
The remainder of this paper is organized as follows. In Section 2, we present the basic definitions and the main results on the capacity-equivocation regions. In Section 3, we prove the outer bounds on the capacity-equivocation regions for the noncausal and causal manners, and provide the converse proof of the capacity-equivocation region for the memoryless manner. The inner bound for the causal manner and the direct part of the capacity-equivocation region for the memoryless manner are proved in Section 4. Final conclusions are presented in Section 5.

Notations, Definitions and the Main Results
Throughout the paper, random variables, sample values and alphabets are denoted by capital letters, lower case letters and calligraphic letters, respectively. A similar convention is applied to random vectors and their sample values. For example, U^N denotes a random N-vector (U_1, ..., U_N), and u^N = (u_1, ..., u_N) is a specific vector value in U^N, the N-th Cartesian power of U. U_i^N denotes a random (N − i + 1)-vector (U_i, ..., U_N), and u_i^N = (u_i, ..., u_N) is a specific vector value of U_i^N. Let p_V(v) denote the probability mass function Pr{V = v}.
In this section, the model of Figure 2 is considered in three parts. The model of Figure 2 with noncausal side information is described in Section 2.1, the model with causal side information in Section 2.2, and the model with memoryless side information in Section 2.3, see the following.

The Model of Figure 2 with Noncausal Side Information
In this subsection, a description of the wiretap channel with noncausal side information is given by Definitions 1-4. The inner and outer bounds on the capacity-equivocation region C, composed of all achievable (R, d) pairs, are given in Theorem 1 and Theorem 2, respectively, where the achievable (R, d) pair is defined in Definition 5.
Definition 1 (encoder) The source S^k is defined as (S_1, S_2, ..., S_k), where the S_i (1 ≤ i ≤ k) are i.i.d. random variables that take values in the finite set S. Then H(S^k) = kH_S, where H_S = H(S_i) for 1 ≤ i ≤ k. The side information V^N is the output of a discrete memoryless source p_V(·), and it is available at the encoder in a noncausal manner. V^N is independent of S^k.
The inputs of the encoder are S^k and V^N, while the output is X^N; the encoder is a stochastic mapping f_E, where f_E(x^N|s^k, v^N) is the probability that the source s^k and the side information v^N are encoded as the channel input x^N.

Definition 2 (main channel) The main channel is a DMC with finite input alphabet X × V, finite output alphabet Y, and transition probability Q_M(y|x, v), where x ∈ X, v ∈ V, y ∈ Y. The inputs of the main channel are X^N and V^N, while the output is Y^N.

Definition 3 (wiretap channel) The wiretap channel is also a DMC, with finite input alphabet Y, finite output alphabet Z, and transition probability Q_W(z|y), where y ∈ Y, z ∈ Z. The input and output of the wiretap channel are Y^N and Z^N, respectively. The equivocation to the wiretapper is defined as ∆ = (1/k)H(S^k|Z^N). The cascade of the main channel and the wiretap channel is another DMC, with transition probability Q_MW(z|x, v) = Σ_{y∈Y} Q_M(y|x, v)Q_W(z|y). Let C_MW be the capacity of this cascaded channel.

Definition 4 (decoder) The decoder is a mapping f_D: Y^N → S^k, with input Y^N and output Ŝ^k = f_D(Y^N). Let P_e be the error probability, defined as Pr{S^k ≠ Ŝ^k}.
Definition 5 (achievable (R, d) pair in the model of Figure 2) A pair (R, d) (where R, d > 0) is called achievable if, for any ε > 0, there exists an encoder-decoder (N, k, ∆, P_e) such that (1) kH_S/N ≥ R − ε, (2) ∆/H_S ≥ d − ε, and (3) P_e ≤ ε. The capacity-equivocation region C is the set composed of all achievable (R, d) pairs. Inner and outer bounds on C are provided in the following Theorem 1 and Theorem 2, respectively.
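As a toy numeric check of the equivocation in Definition 3, the following sketch sends a single uncoded source bit over binary symmetric links; the crossover probabilities are illustrative assumptions, not part of the model above.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

# One uniform source bit S sent uncoded: the main channel is a BSC(e1)
# and the wiretap channel a BSC(e2), so the wiretapper sees S through
# the cascade BSC(e1*(1-e2) + e2*(1-e1)).  Parameters are assumptions.
e1, e2 = 0.05, 0.20
e_cascade = e1 * (1 - e2) + e2 * (1 - e1)

# With a uniform input to a BSC, H(S|Z) = h(crossover), so the
# per-letter equivocation to the wiretapper is:
equivocation = h(e_cascade)
print(equivocation)
```

Here d = equivocation/H_S is strictly less than 1 because uncoded transmission leaks information; the coding theorems below characterize how much better a proper code can do.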
Theorem 1 The capacity-equivocation region C of the wiretap channel with noncausal side information satisfies R_i ⊆ C, where the random variables U, X, V, Y and Z in the definition of R_i satisfy the Markov chain U → (X, V) → Y → Z.

Remark 1 There are some notes on Theorem 1, see the following.
• The range of the random variable U can be bounded by a standard support-lemma argument. The proof is similar to that of Theorem 2, and it is omitted here.
• The region R_i is in fact equivalent to the achievable region in [9]; however, it is easier to understand than that of [9]. The proof of Theorem 1 is a combination of Gel'fand and Pinsker's technique [6] and Wyner's random binning method [1], and we omit it here.

• Secrecy capacity
The points in R_i for which d = 1 are of considerable interest, since they imply perfect secrecy, H(S^k) = H(S^k|Z^N). Clearly, the secrecy capacity C_s of the model of Figure 2 with noncausal side information can be bounded below and above by the maximum R such that (R, 1) ∈ R_i and (R, 1) ∈ R_o, respectively.

Theorem 2 The capacity-equivocation region C, as defined above, satisfies C ⊆ R_o, where the random variables U, K, A, X, V, Y and Z in the definition of R_o satisfy the corresponding Markov chains, and A may be assumed to be a (deterministic) function of K (these follow directly from the definitions of the random variables U, K, A, X, V, Y and Z, see Equations (3.18), (3.19), (3.20) and (3.21)).
Remark 2 There are some notes on Theorem 2, see the following.
• The ranges of the random variables U, K and A can be bounded by a support-lemma argument; the proof is in Appendix 5.

• Observing the formula Rd ≤ I(U;Y) − I(K;Z|A) in Theorem 2, we have I(K;Z|A) ≤(a) I(K;Z) ≤(b) I(U;Z), where (a) is from the fact that A may be assumed to be a (deterministic) function of K, and (b) is from the Markov chain K → U → Y → Z. Then it is easy to see that R_i ⊆ R_o.
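The two steps of this inequality can be spelled out with standard identities; the following display is a sketch of the omitted derivation.

```latex
\begin{align*}
I(K;Z|A) &= H(Z|A) - H(Z|A,K)\\
         &= H(Z|A) - H(Z|K)   && \text{($A$ is a function of $K$)}\\
         &\le H(Z) - H(Z|K) = I(K;Z)\\
         &\le I(U;Z)          && \text{(data processing on $K \to U \to Y \to Z$)}.
\end{align*}
```

Subtracting both sides from I(U;Y) gives I(U;Y) − I(K;Z|A) ≥ I(U;Y) − I(U;Z), which is the claimed relation between the two regions.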

The Model of Figure 2 with Causal Side Information
The model of Figure 2 with causal side information is similar to the model with noncausal side information in Section 2.1, except that the side information V^N in Definition 1 is known to the encoder in a causal manner, i.e., at the i-th time (1 ≤ i ≤ N), the channel input is X_i, and f_i(x_i|s^k, v^i) is the probability that the source s^k and the side information v^i are encoded as the channel input x_i at time i. Inner and outer bounds on the capacity-equivocation region C_c for the model of Figure 2 with causal side information are provided in the following Theorem 3 and Theorem 4, respectively.

Theorem 3
The capacity-equivocation region C_c satisfies R_ci ⊆ C_c, where the random variables U, X, V, Y and Z in the definition of R_ci satisfy the Markov chain U → (X, V) → Y → Z, with U independent of V. There are some notes on Theorem 3, see the following.
• The range of the random variable U can be bounded by a standard support-lemma argument. The proof is similar to that in Theorem 2, and it is omitted here.

• Secrecy capacity
The points in R_ci for which d = 1 are of considerable interest, since they imply perfect secrecy. Clearly, the secrecy capacity C_s^c of the model of Figure 2 with causal side information can be bounded below and above by the maximum R such that (R, 1) ∈ R_ci and (R, 1) ∈ R_co, respectively.

Theorem 4 The capacity-equivocation region C_c satisfies C_c ⊆ R_co, where the random variables U, K, A, X, V, Y and Z in the definition of R_co satisfy the corresponding Markov chains, and A may be assumed to be a (deterministic) function of K (these follow directly from the definitions of the random variables U, K, A, X, V, Y and Z, see Equations (3.18), (3.19), (3.20) and (3.21)).
Remark 4 There are some notes on Theorem 4, see the following.
• The ranges of the random variables U, K and A can be bounded as in Theorem 2; the proof is similar to that of Theorem 2, and it is omitted here.

• Since causal side information is a special case of the noncausal manner, the outer bound R_co can be directly obtained from R_o by using the fact that U is independent of V.

• Note that I(U;Y) − I(K;Z|A) ≥ I(U;Y) − I(U;Z) (the proof is the same as that in Remark 2), and therefore it is easy to see that R_ci ⊆ R_co.

The Model of Figure 2 with Memoryless Side Information
The model of Figure 2 with memoryless side information is similar to the model with causal side information in Section 2.2, except that the side information V^N in Definition 1 is known to the encoder in a memoryless manner, i.e., at the i-th time (1 ≤ i ≤ N), the channel input is X_i, and f_i(x_i|s^k, v_i) is the probability that the source s^k and the side information v_i are encoded as the channel input x_i at time i. The capacity C_M of the main channel for the memoryless case was determined by C. E. Shannon [4], see Equation (2.9). A function Γ(R), used for describing the capacity-equivocation region composed of all achievable (R, d) pairs in the model of Figure 2 with memoryless side information, is defined in Definition 6.
It is easy to see that ρ(R) is empty for R > C_M, where C_M is the capacity of the main channel, see Equation (2.9). For 0 ≤ R ≤ C_M, denote Γ(R) = sup_{p ∈ ρ(R)} [I(U;Y) − I(U;Z)]. The following Lemma 1 provides some properties of Γ(R). The proof of Lemma 1 is in Appendix 5.

Lemma 1
The quantity Γ(R), where 0 ≤ R ≤ C_M, satisfies the following properties: (i) the "supremum" in the definition of Γ(R) is in fact a maximum, i.e., for each R there exists a mass function attaining it; (ii) Γ(R) is concave in R; (iii) Γ(R) is non-increasing in R.

Our problem in the model of Figure 2 with memoryless side information is to characterize the capacity-equivocation region C_m composed of all achievable (R, d) pairs. The following Theorem 5 gives a characterization of the capacity-equivocation region C_m, which is proved in the remaining sections. The secrecy capacity is defined in Remark 5 (see Equation (2.12)), and it is bounded by the Formula (2.14).
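A coarse numeric sketch of Γ(R) for a binary example: take U = X ~ Bernoulli(q), the main channel a BSC(e1), and the wiretapper's cascaded channel a BSC with the convolved crossover probability. The parameters, the restriction to U = X, and the grid search are all illustrative assumptions.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def star(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

# Assumed parameters: main channel BSC(e1), wiretap channel BSC(e2),
# so the wiretapper sees the cascade BSC(star(e1, e2)).
e1, e2 = 0.1, 0.3
ez = star(e1, e2)

def I_XY(q):  # I(X;Y) for X ~ Bernoulli(q)
    return h(star(q, e1)) - h(e1)

def I_XZ(q):  # I(X;Z) for X ~ Bernoulli(q)
    return h(star(q, ez)) - h(ez)

def Gamma(R, steps=2000):
    """Coarse grid approximation of max{I(X;Y)-I(X;Z) : I(X;Y) >= R}."""
    best = None
    for i in range(steps + 1):
        q = i / steps
        if I_XY(q) >= R:
            val = I_XY(q) - I_XZ(q)
            best = val if best is None else max(best, val)
    return best

C_M = I_XY(0.5)     # the grid maximum of I(X;Y) occurs at q = 1/2
for R in (0.0, 0.5 * C_M, C_M):
    print(R, Gamma(R))
```

Tightening the rate constraint R can only shrink the feasible set, so the printed values illustrate the non-increasing property (iii) numerically.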
Theorem 5 The capacity-equivocation region C_m is equal to R*, where R* = {(R, d): 0 ≤ R ≤ C_M, 0 ≤ d ≤ 1, Rd ≤ Γ(R)}.

Remark 5 There are some notes on Theorem 5, see the following.
• Comparison with A. D. Wyner's wiretap channel [1]. The main channel capacity C_M in Equation (2.9) is different from that of [1]. When the channel state information V is a constant, the model of Figure 2 reduces to A. D. Wyner's wiretap channel [1]. Substituting a constant for V and X for U in Equations (2.9), (2.10) and (2.11), the quantities C_M, ρ(R), Γ(R) and the region R* are the same as those of [1].

• Secrecy capacity
A transmission rate C_s, denoted by Equation (2.12), is called the secrecy capacity in the model of Figure 2 with memoryless side information. Furthermore, C_s is the unique solution of the equation C_s = Γ(C_s) (see Equation (2.13)) and satisfies the bounds of Equation (2.14).

Proof 1 (Proof of Equations (2.13) and (2.14)) Firstly, since Γ(·) is non-increasing while the identity map is increasing, the equation R = Γ(R) has a unique solution, and this solution is the secrecy capacity C_s in the model of Figure 2 with memoryless side information. By using the Formula (2.13) and the non-increasing property of Γ(·) (see Lemma 1 (iii)), we get Equation (2.14). The proof is completed.
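Since Γ(·) is non-increasing while R is increasing, the fixed point R = Γ(R) can be located by bisection. The following self-contained sketch does so for a binary example with U = X and assumed BSC parameters; it is an illustration of the fixed-point characterization, not the general computation.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def star(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

e1, e2 = 0.1, 0.3        # assumed BSC parameters, U = X throughout
ez = star(e1, e2)        # cascaded crossover seen by the wiretapper

def I_XY(q): return h(star(q, e1)) - h(e1)
def I_XZ(q): return h(star(q, ez)) - h(ez)

def Gamma(R, steps=2000):
    """Grid approximation of max{I(X;Y)-I(X;Z) : I(X;Y) >= R}."""
    vals = [I_XY(i / steps) - I_XZ(i / steps)
            for i in range(steps + 1) if I_XY(i / steps) >= R]
    return max(vals) if vals else 0.0

C_M = I_XY(0.5)

# Gamma(R) - R is strictly decreasing, so bisection on [0, C_M]
# converges to the unique root R = Gamma(R), i.e., the secrecy capacity.
lo, hi = 0.0, C_M
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if Gamma(mid) > mid:
        lo = mid
    else:
        hi = mid
C_s = 0.5 * (lo + hi)
print(C_s)
```

In this degraded binary example the maximizer q = 1/2 stays feasible for every R ≤ C_M, so the fixed point lands exactly at the maximal value of I(X;Y) − I(X;Z), consistent with C_s ≤ Γ(0).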
• Note that from Equation (2.14) we have C_s ≤ Γ(0), which implies C_s ≤ max(I(U;Y) − I(U;Z)). Also note that for the causal model, the secrecy capacity satisfies C_s^c ≥ max(I(U;Y) − I(U;Z)). Then it is easy to see that the memoryless manner for the encoder cannot help to obtain the same secrecy capacity as that of the wiretap channel with causal side information.

Proof of Theorem 2

Suppose (R, d) is achievable, i.e., for any given ε > 0, there exists an encoder-decoder (N, k, ∆, P_e) such that the conditions of Definition 5 hold. Then we will show the existence of random variables U, K, A, X, V, Y and Z satisfying the bounds of Theorem 2. Letting ε → 0, we have d ≤ 1.

<Part i> We begin with the left parts of the inequalities in Equations (3.4) and (3.5), see the following.
where (1) and (2) follow from Fano's inequality.

<Part ii> The term (1/N)I(S^k; Y^N) in Formulas (3.8) and (3.9) can be bounded by Equation (3.10), see the following.
Formula (a) follows from the fact that S^k is independent of V^N. Formula (b) is from Equation (3.11). Formula (c) follows from the fact that V^N is composed of N i.i.d. random variables.
Proof 3 (Proof of Equation (3.11)) The left part of Equation (3.11) equals Equation (3.12), and the right part of Equation (3.11) equals Equation (3.13). The Formula (3.11) is then verified by Equations (3.12) and (3.13).
<Part iii> The term (1/N)I(S^k; Z^N) in Formula (3.9) can be bounded by the following Equation (3.14).
Formula (1) is from the fact that S^k is independent of V^N. Formula (2) follows from Equation (3.15). Formula (3) is from the fact that V_i is independent of V_{i+1}^N. Formula (4) is from the corresponding Markov chain.

Proof 4 (Proof of Equation (3.15)) The left part of Equation (3.15) equals Equation (3.16), and the right part of Equation (3.15) equals Equation (3.17). The Formula (3.15) is then verified by Equations (3.16) and (3.17).
<Part iv> (single-letterization) To complete the proof, we introduce a random variable J, which is independent of S^k, X^N, V^N, Y^N and Z^N. Furthermore, J is uniformly distributed over {1, 2, ..., N}.
<Part v> Then Equation (3.10) can be rewritten in single-letter form, where (a) is from the fact that V_J is independent of J, i.e., p(V_J = v|J = i) = p(V_J = v), which is verified as follows. From <Part iv>, we know that the random variable J is independent of V^N, and therefore p(V_J = v|J = i) = p(V_i = v) = p_V(v), where (1) follows from Equation (3.23).
On the other hand, the probability p(V_J = v) can be calculated as p(V_J = v) = (1/N)Σ_{i=1}^{N} p(V_i = v) = p_V(v), where (a) is from the fact that J is independent of V^N, and Formula (b) is from Equation (3.23). By using Equations (3.24) and (3.25), it is easy to verify that V_J is independent of J, completing the proof.
<Part vi> Analogously, Equation (3.14) can be rewritten in single-letter form, where (a) follows from the fact that V_J is independent of J. Substituting Equations (3.22) and (3.26) into Equations (3.8) and (3.9), Lemma 2 is proved. The proof of Theorem 2 is completed.

Proof of Theorem 4
Suppose (R, d) is achievable, i.e., for any given ε > 0, there exists an encoder-decoder (N, k, ∆, P_e) such that the conditions of Definition 5 hold. Then we will show the existence of random variables U, K, A, X, V, Y and Z satisfying the bounds of Theorem 4; the Formula (3.27) follows from the definitions above. Since the model of Figure 2 with causal side information is a special case of the model of Figure 2 with noncausal side information, the Formulas (3.28) and (3.29) are obtained from Equations (3.2) and (3.3), respectively, see the following.

Proof 6 (Proof of Equation (3.28)) The parameter R of Equation (3.28) can be written as follows, where (a) follows from Equations (3.8) and (3.10), and Formula (b) is from the definitions of Y and U, see Equations (3.20) and (3.21), together with the fact that V_i is independent of (Y^{i−1}, S^k, V_{i+1}^N). Letting ε → 0, the proof of Equation (3.28) is completed.
where (a) follows from Equation (3.9), Formula (b) is from Equations (3.10) and (3.14), Formula (c) is from the fact that V_i is independent of (Y^{i−1}, S^k, V_{i+1}^N), and Formula (d) is from the definitions of Y, Z, U, K, A, see Equations (3.18), (3.19), (3.20) and (3.21). Letting ε → 0, the proof of Equation (3.29) is completed.
The proof of Theorem 4 is completed.

Converse Half of Theorem 5
In this subsection, we establish the converse theorem of Theorem 5: the region C_m, which is composed of all achievable (R, d) pairs, is contained in the set R*, i.e., C_m ⊆ R*.
Suppose (R, d) ∈ C_m, i.e., for any given ε > 0, there exists an encoder-decoder (N, k, ∆, P_e) such that the conditions of Definition 5 hold. Then we will show that (R, d) ∈ R*, i.e., (R, d) satisfies 0 ≤ R ≤ C_M, 0 ≤ d ≤ 1 and Rd ≤ Γ(R). The proof of R ≤ C_M and d ≤ 1 is obvious, and it is omitted here. It only remains to prove Rd ≤ Γ(R), see the following.
The following Lemma 3 provides a Markov chain used in the remainder of this subsection. The proof of Lemma 3 is in Appendix 5.
Lemma 3 In the model of Figure 2, the random variable Z_i and the random vectors S^k and Y^{i−1} (1 ≤ i ≤ N) form a Markov chain.

The proof of Rd ≤ Γ(R) is carried out in the following five steps: (i) show that H(S^k)∆ ≤ I(S^k; Y^N|Z^N) + kδ(P_e); (ii) bound the term I(S^k; Y^N|Z^N) on the right of step (i); (iii) bound the resulting expression of step (ii) in terms of Γ(·); (iv) establish a property of the argument of Γ(·); (v) substitute steps (ii), (iii) and (iv) into step (i).

By using Fano's inequality, H(S^k|Y^N) ≤ kδ(P_e), where δ(P_e) = h(P_e) + P_e log(|S| − 1).
Then we have H(S^k)∆ ≤ I(S^k; Y^N|Z^N) + kδ(P_e). Thus, the proof of step (i) is completed.
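The Fano term δ(P_e) = h(P_e) + P_e log(|S| − 1) used in step (i) is easy to evaluate numerically; logarithms are base 2 as elsewhere in this section, and the alphabet size below is an assumed example.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def delta(pe, alphabet_size):
    """Per-letter Fano term delta(P_e) = h(P_e) + P_e * log2(|S| - 1)."""
    return h(pe) + pe * math.log2(alphabet_size - 1)

# As P_e -> 0 the correction term vanishes, which is why the converse
# bounds tighten when epsilon (and hence P_e) is driven to zero.
for pe in (0.1, 0.01, 0.001):
    print(pe, delta(pe, alphabet_size=4))
```

The printed values shrink toward zero with P_e, matching the "letting ε → 0" arguments used throughout the converse proofs.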

Proof of Step (ii)

In the derivation, Formula (a) follows from S^k → Y^N → Z^N, see Lemma 3 in Appendix 5. Formula (b) follows from Lemma 2. Formula (d) follows from the fact that V_{n+1}^N is independent of (Z^n, Y^{n−1}, S^k). Formula (e) follows from the definition U_n = (S^k, Y^{n−1}, V_{n+1}^N), which coincides with the definition of U used in the converse proof of Equation (2.9).
The proof of step (ii) is completed.

Proof of Step (iii)
The proof of step (iii) is considered in two parts. The first part gives some definitions, and the second part gives the main proof.
• For n = 2, 3, ..., N, and any y^{n−1}, it follows from the definition of ρ(R) in Equation (2.10) that the distribution p_{n,y^{n−1}} defined above satisfies p_{n,y^{n−1}} ∈ ρ(α_n(y^{n−1})). Thus, from the definition of Γ(R) in Equation (2.11), the corresponding difference of mutual informations is bounded by Γ(α_n(y^{n−1})).

• By using the Formulas (3.35) and (3.40), the proof of step (iii) is as follows, where Formula (a) follows from the inequality in Equation (3.40), Formula (b) follows from the concavity of Γ(R) (Lemma 1 (ii)), and Formula (c) follows from the definition in Equation (3.35).
The proof of step (iii) is completed.

Proof of Step (iv)

In the derivation, Formula (a) follows from the definition U_n = (S^k, Y^{n−1}, V_{n+1}^N). Formulas (b) and (c) follow from the fact that V_{n+1}^N is independent of (Y_n, Y^{n−1}, S^k, V_n). Formula (d) follows from H(S^k) = kH_S and Fano's inequality.
The proof of step (iv) is completed.

In this section, all logarithms are taken to the base 2.

Proof of Theorem 3
In this subsection, we will show the achievability of the region R_ci; we only need to prove that the pair (R, d = (I(U;Y) − I(U;Z))/R) is achievable. A separated source-channel coding method is provided. The source encoder is a mapping with input S^k and output W. Generate a random code-book composed of 2^{N(I(U;Y)−γ_1)} codewords u^N (γ_1 is a small fixed positive number), each of them i.i.d. generated according to p_U(u). Divide the code-book into 2^{kH_S(1+k^{−1/4})} bins.

• (Proof of |U| ≤ |X|²|V|²(|X||V| + 1)) Once the alphabet of K is fixed, we apply similar arguments to bound the alphabet of U, see the following. Define the corresponding continuous scalar functions of p: since there are |X||V| − 1 functions f_{XV}(p), the total number of continuous scalar functions of p is |X||V| + 1.
Let p̄_{XV|U}(x, v|u) = Pr{X = x, V = v|U = u}. With these distributions p̄_{XV|U}, and according to the support lemma ([11], p. 310), for every fixed k the random variable U can be replaced by a new one such that the new U takes at most |X||V| + 1 different values and the expressions in Equations (5)-(7) are preserved. Therefore, |U| ≤ |X|²|V|²(|X||V| + 1) is proved.
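The random code-book construction used in the proof of Theorem 3 above can be sketched as follows; all sizes are tiny illustrative stand-ins for the 2^{N(I(U;Y)−γ_1)} codewords and the number of bins in the actual proof, not the real rates.

```python
import random

# Toy sketch of random binning: generate i.i.d. codewords and partition
# them uniformly into one bin per source message.  Sizes are assumptions.
random.seed(0)

n = 8                  # block length
num_codewords = 2**6   # stands in for 2^{N(I(U;Y) - gamma_1)}
num_bins = 2**2        # stands in for the number of source messages

# i.i.d. codeword generation according to a Bernoulli(1/2) p_U
codebook = [tuple(random.randint(0, 1) for _ in range(n))
            for _ in range(num_codewords)]

# partition: codeword j goes to bin j mod num_bins, so every bin holds
# num_codewords / num_bins codewords the encoder may choose among
bins = {b: [] for b in range(num_bins)}
for j, u in enumerate(codebook):
    bins[j % num_bins].append(u)

print([len(bins[b]) for b in range(num_bins)])
```

The surplus of codewords inside each bin is what the encoder spends on confusing the wiretapper: any codeword in the message's bin conveys the same message to the legitimate receiver.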

Proof of Lemma 1
Proof of (i) Since I(U;Y) − I(U;Z) and I(U;Y) are continuous functions of Pr{X = x, U = u|V = v}, using a similar argument to [1], p. 1382, we conclude that I(U;Y) − I(U;Z) attains a maximum on ρ(R).
Formula (1) follows from Q → U → Y. Formula (2) follows from the fact that Q is independent of U, U′, Y, Y′.
Formula (4) follows from U → Y → Z. Formula (5) follows from H(Y|U, Z) = H(Y|U, Z, Q). Formula (6) follows from the fact that Q is independent of U, U′, Y, Y′, Z, Z′. Formula (7) follows from U → Y → Z and U′ → Y′ → Z′.

Figure 1 .
Figure 1. The model of the wiretap channel.

Figure 2 .
Figure 2. Wiretap channel with side information.

Figure 3 .
Figure 3. The definition of the random variable Q.