1. Introduction
Due to the openness of wireless communication, the personal health information exchanged over the wireless channel in WBAN is readily intercepted and attacked by hackers. To address this issue, there are usually two ways to enhance the security of wireless communications: one is security guaranteed by information theory [1,2,3]; the other is security based on computational complexity [4,5]. In this paper, we study the secure transmission problem in WBAN on the basis of information theory. Here, secure transmission refers to coding the transmitted data so that attackers cannot recover it. The concept of the wiretap channel was introduced by Wyner [6]. In his model, the source message is sent to the intended user via a discrete memoryless channel (DMC), while an eavesdropper taps the transmitted data via a second DMC. It is assumed that the eavesdropper knows both the encoding and the decoding scheme. The objective is to find a pair of encoder and decoder such that the eavesdropper's level of confusion about the source message is as high as possible, while the receiver can recover the transmitted data with a small decoding error. Wyner's model is called the discrete memoryless wiretap channel, since the main channel output is taken as the input of the wiretap channel [7].
After Wyner's pioneering work, wiretap channel models have been studied from various aspects. Csiszar and Korner considered a more general model called the broadcast channel with confidential messages (BCC) [8], in which the wiretap channel is not necessarily a degraded version of the main channel; they also considered the case where public data is broadcast through both the main channel and the wiretap channel. Degraded wiretap channels with discrete memoryless side information available at the encoder were considered in Refs. [9,10,11]. BCCs with causal side information were studied in Ref. [12]. Communication models with channel states known at the receiver were considered in Refs. [13,14]. Ozarow and Wyner considered another model called the wiretap channel of type II [15], and established its secrecy capacity. In that model, the source data is encoded into N digital symbols and transmitted to the intended user through a binary noiseless channel, while the eavesdropper is able to observe an arbitrary μ-subcollection of those symbols.
In the last few decades, many capacity problems related to the wiretap channel II have been studied. A special class of non-DMC wiretap channels was studied in Ref. [16], where the main channel is a DMC instead of a noiseless channel, and the eavesdropper observes μ digital symbols chosen uniformly at random. An extension of the wiretap channel II was studied in Ref. [17], where the main channel is a DMC and the eavesdropper is able to observe μ digital bits through arbitrary strategies.
The finite-state Markov channel model was first introduced by Gilbert [18] and Elliott [19]. They studied a Markov channel model with two states, now known as the Gilbert–Elliott channel, in which one state corresponds to a noiseless channel and the other to a totally noisy channel. Wang [20] extended the Gilbert–Elliott channel to the case with finitely many states.
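To make the Gilbert–Elliott behaviour concrete, the short Python sketch below (illustrative only; the transition probabilities `p_gb`, `p_bg` and the binary alphabet are assumptions, not values from the papers cited above) passes a bit sequence through a two-state channel that is noiseless in the good state and totally noisy in the bad state:

```python
import random

def gilbert_elliott(bits, p_gb=0.1, p_bg=0.3, seed=0):
    """Pass a bit sequence through a two-state Gilbert-Elliott channel.

    State 'G' is noiseless; state 'B' is totally noisy (the output is a
    fair coin flip, independent of the input).  p_gb and p_bg are the
    homogeneous Markov transition probabilities G->B and B->G.
    """
    rng = random.Random(seed)
    state = 'G'
    out = []
    for x in bits:
        if state == 'G':
            out.append(x)                 # good state: output equals input
        else:
            out.append(rng.randrange(2))  # bad state: output is pure noise
        # Markov state transition, independent of the transmitted bit
        if state == 'G' and rng.random() < p_gb:
            state = 'B'
        elif state == 'B' and rng.random() < p_bg:
            state = 'G'
    return out
```

With `p_gb = 0`, the chain never leaves the good state and the output equals the input, recovering the noiseless extreme.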
This paper discusses the finite-state Markov erasure wiretap channel (FSME-WTC) (see Figure 1). In this new model, the source data W is encoded into N digital symbols, denoted by , and transmitted to the intended user through a DMC. The eavesdropper is able to observe the transmitted symbols through a finite-state Markov erasure channel (FSMEC). The secrecy capacity of this new communication model is established, based on the coding scheme devised by the authors of Ref. [17].
The FSME-WTC model readily applies to the security problem of WBAN. Suppose that there are N sensors in a WBAN. Then, we can treat the collection of symbols obtained from the sensors as a digital sequence of length N transmitted over an imaginary channel. This imaginary channel is not a DMC because the symbols from the sensors are correlated. The Markov chain is an important model for characterizing the correlation of random variables, since it does not add too much complexity to the system. The wiretap channel is set as an erasure channel to model the situation where the attacker in WBAN is able to tap data from only part of the sensors. Thus, our FSME-WTC model is to ensure that the attacker cannot get any information from the WBAN when he/she can only observe data from at most μ sensors.
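As a sketch of this imaginary-channel view (the state names, the transition table, and the erasure letter '?' below are assumptions for illustration, not the paper's formal definitions), the following Python function erases exactly those symbols observed while the state chain sits outside the noiseless set:

```python
import random

def fsmec(symbols, trans, noiseless_states, init_state, seed=0):
    """Finite-state Markov erasure channel (illustrative sketch).

    trans: dict mapping each state to a list of (next_state, prob) pairs.
    noiseless_states: states in which the output equals the input; in
    every other state the output is the erasure letter '?'.
    """
    rng = random.Random(seed)
    state = init_state
    out = []
    for y in symbols:
        out.append(y if state in noiseless_states else '?')
        # sample the next state from the row of the transition table
        r, acc = rng.random(), 0.0
        for nxt, p in trans[state]:
            acc += p
            if r < acc:
                state = nxt
                break
    return out
```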
The importance of this model is obvious. As 5G technology advances towards the stage of commercial application, wireless networks are becoming more and more significant in our daily lives [21,22]. Therefore, the security of wireless communication is critical from the aspects of both theory and engineering. Meanwhile, the finite-state Markov channel is a common model for characterizing the properties of wireless communication. Hence, the results of this paper are meaningful to many kinds of wireless networks with high confidentiality requirements, such as WBAN and IoT.
The remainder of this paper is organized as follows. The formal statement of the finite-state Markov erasure wiretap channel and the capacity results are given in Section 2 (see also Figure 1). The secrecy capacity of this model is established in Theorem 1. Some concrete examples of this communication model are given in Section 3. The converse part of Theorem 1, relying on Fano's inequality and Proposition 1, is proved in Section 4. The direct part of Theorem 1, based on Theorem 1 in Ref. [17], is proved in Section 5. Section 6 gives the proof of Proposition 1, and Section 7 concludes the paper.
2. Notations, Definitions and the Main Results
Throughout this paper,  is the set of positive integers.  is the set of positive integers no greater than N for any . For any index set  and random vector , denote by  the “projection” of  onto the index set  such that  for all , and , otherwise.
Let  be any finite alphabet not containing the “error” letter ? and . It follows that  is distributed on  for any random vector  over .
Example 1. Let ,  and . Then,

Let  be an arbitrary random vector distributed on . Then, the random vector  is distributed on .

Definition 1. (Encoder) Let the source message W be uniformly distributed on a certain message set . The (stochastic) encoder  is specified by a matrix of conditional probabilities  with  and . The value of  specifies the probability that message w is encoded into the sequence .
Definition 2. (Main channel) The main channel is a DMC, whose input alphabet is  and output alphabet is , where . The transition probability matrix of the main channel is denoted by  with  and . The input and output of the main channel are denoted by  and , respectively. For any  and , it follows that

where

Remark 1. From the property of the DMC, it holds that

Definition 3. (Wiretap channel) Let  be the channel state of the FSMEC at time n, such that  forms a Markov chain. The transition of channel states is homogeneous, i.e., the conditional probability  is independent of the time index n. Moreover, the channel states are stationary, i.e.,  share a generic probability distribution  on a common finite set  of channel states. Let  be the probability that the state at the next time slot changes to  when the current state is t. It follows that

for . The input of the FSMEC is a digital sequence , which is actually the main channel output. Denote by  the wiretap channel output. At each time slot n, the channel is either totally noisy, i.e., , or totally noiseless, i.e., , depending on the value of . Thus, the channel output  is totally determined by the channel input  and the channel state . Let  be the set of states under which the channel is noiseless; then  contains the states where the channel is totally noisy. Denote by  the probability that the channel outputs z when the channel input is y and the channel state is t. It follows that

where

For any ,  and , it is readily obtained that

Remark 2. Throughout this paper, it is supposed that  is independent of W,  and .
 Proposition 1.  forms a Markov chain for every .
Proof. The proof of Proposition 1 is given in Section 6. Proposition 1 will be used to establish the converse part of Theorem 1 (see Section 4). □
Definition 4. (Decoder) The decoder is specified by a mapping . In particular, the estimate of the source message is , where  is the main channel output. The average decoding error probability is denoted by .
Definition 5. (Achievability) A positive real number R is said to be achievable if, for any real number , one can find an integer  such that, for any , there exists a pair of encoder and decoder of length N satisfying

Definition 6. (Secrecy capacity) A real number  is said to be the secrecy capacity of the communication model if it is achievable for every  and unachievable for every .
Theorem 1. Let  be the function of  defined in Definition 3 such that  if , and , otherwise. If it follows that

for any , then the secrecy capacity of the communication model in Figure 1 is , where  is the capacity of the main channel, i.e.,

Proof. The proof of Theorem 1 is divided into two parts. The first part, given in Section 4, proves that every achievable real number R must satisfy , which is the converse half of the theorem. The second part, given in Section 5, proves that every real number R satisfying  is achievable, which is the direct half. □
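Since the secrecy capacity in Theorem 1 is expressed through the capacity of the main channel, it may help to recall that the capacity of an arbitrary DMC can be computed numerically. The sketch below uses the classical Blahut–Arimoto iteration (a standard algorithm, not part of this paper's proofs); the row-stochastic layout `W[x][y]` for the transition matrix is an assumption of this sketch:

```python
import math

def dmc_capacity(W, iters=200):
    """Blahut-Arimoto iteration for the capacity of a DMC, in bits.

    W[x][y] is the probability of receiving output y given input x.
    """
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx            # start from the uniform input distribution
    s = 1.0
    for _ in range(iters):
        # output distribution induced by the current input distribution
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp( D( W(.|x) || q ) ), skipping zero-probability outputs
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        s = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / s for x in range(nx)]   # reweight the inputs
    return math.log(s) / math.log(2)               # log s -> capacity (nats -> bits)
```

For a binary symmetric channel with crossover probability 0.1, this returns about 0.531 bits, i.e., 1 − h(0.1).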
Theorem 1 claims that, if the Markov chain  satisfies (2), then the secrecy capacity of the wiretap channel model depicted in Figure 1 is . In the rest of this section, we introduce a class of Markov chains satisfying (2) in Theorem 2, and provide the secrecy capacity of the related wiretap channel model in Corollary 1.
A stationary Markov chain is called ergodic if, for each pair of states , it is possible to go from state t to  in a finite expected number of steps. One can prove that, if a Markov chain is ergodic, then the stationary probability distribution of the state is unique.
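For illustration (the two-state transition matrix `P` below is an assumed toy example, not a matrix from this paper), the unique stationary distribution of an ergodic chain can be found by power iteration, and the empirical fraction of time spent in each state along one long sample path approaches it:

```python
import random

# Assumed toy two-state ergodic chain: P[i][j] = Pr(next = j | current = i).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P, iters=1000):
    """Power iteration: pi <- pi P converges to the unique stationary
    distribution when the chain is ergodic."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(len(P)))
              for j in range(len(P))]
    return pi

def empirical_freq(P, steps=200000, seed=1):
    """Fraction of time one sample path of the two-state chain spends
    in each state."""
    rng = random.Random(seed)
    state, counts = 0, [0] * len(P)
    for _ in range(steps):
        counts[state] += 1
        state = 0 if rng.random() < P[state][0] else 1
    return [c / steps for c in counts]
```

For this `P`, solving π = πP gives π = (0.8, 0.2), and the empirical frequencies agree to within sampling error.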
Theorem 2. (Law of Large Numbers for Markov Chains) If the Markov chain  is ergodic, let π be its unique stationary distribution. Then, it follows that

for each channel state t, where  is 1 or 0, indicating whether  is true or not.

With the theorem above, we immediately obtain the following.
Corollary 1. If the Markov chain  is ergodic with the unique stationary distribution π over , then the secrecy capacity of the wiretap channel model depicted in Figure 1 is given by

where  is the capacity of the main channel, and

4. Converse Half of Theorem 1
This section proves that every achievable real number R must satisfy . The proof is based on Fano's inequality (cf. Formula (76) in Ref. [6]) and Proposition 1.
For any given  and , Formula (2) indicates that

or equivalently

when N is sufficiently large, where
Suppose that there exists a code of length N satisfying (1), i.e.,

Then, we have

where  as , and the last inequality follows from Fano's inequality. Since , the formula above indicates that
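For orientation, the omitted display is presumably the standard Fano-type chain; a hedged reconstruction, with assumed notation \(\mathcal{W}\) for the message set and \(\hat{W}\) for the decoder's estimate, is:

```latex
% Assumed notation: \mathcal{W} = message set, \hat{W} = decoder estimate,
% P_e = average decoding error probability.
\begin{align*}
\log |\mathcal{W}| &= H(W) = I(W; Y^N) + H(W \mid Y^N) \\
                   &\le I(W; Y^N) + H(W \mid \hat{W}) \\
                   &\le I(W; Y^N) + 1 + P_e \log |\mathcal{W}| ,
\end{align*}
```

where the second line holds because \(\hat{W}\) is a function of \(Y^N\) and the last line is Fano's inequality. The exact display in the paper may differ, e.g., by absorbing the additive terms into \(\delta(P_e)\).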
The value of  is upper bounded by

where (a) and (b) follow from the fact that  forms a Markov chain, and (c) follows from Proposition 1 and the fact that  is independent of  and .
For any , denoting , Formula (7) can be further bounded as

where (a) follows because  forms a Markov chain when given , and (b) follows because  and  are independent of . For any fixed , denote . By the chain rule, we have

and
Moreover, from the property of the DMC, Remark 1 yields

Combining Formulas (9)–(12), it follows that

Considering that  forms a Markov chain, we have

or equivalently

Substituting the formula above into Formula (13), we have
Noticing that

Formula (14) can be further bounded as

Combining the formula above with Formula (8) gives

where the last inequality follows from (5). Combining (6) and the formula above yields

 is finally established by letting  and  converge to 0. This completes the proof of the converse half.