Stealthy Secret Key Generation

In this work, we consider a complete covert communication system that includes source-model stealthy secret key generation (SSKG) as its first phase. The generated key is used for covert communication in the second phase of the current round and also in the first phase of the next round. We investigate the stealthy SK rate of the first phase. The derived results show that the lower and upper bounds on the SK capacity of source-model SKG are not affected by the additional stealth constraint. This implies that the SSKG capacity can be attained for free when the sequences observed by the three terminals Alice ($X^n$), Bob ($Y^n$), and Willie ($Z^n$) form a Markov chain, i.e., $X^n-Y^n-Z^n$. We then prove that the sufficient condition for attaining both the SK capacity and the stealthy SK capacity can be relaxed from physical to stochastic degradedness. To underline the practical relevance, we also derive, via the usual stochastic order, a sufficient condition for degradedness in Maurer's fast fading Gaussian (satellite) model for the source of common randomness.


Introduction
Consider the following motivating example. Two agents, Alice and Bob, want to communicate without raising the suspicion of a warden, Willie, whose duty is to monitor for suspicious activity and to decrypt the data. To realize confidential transmission in such a scenario, we may adopt the following two steps. The first step is to keep Willie unaware of the existence of the meaningful communication, which is embedded in the messages intended for Bob. In contrast, in a meaningless communication, Bob ignores the received signal, which serves only to confuse Willie. If Willie nevertheless detects the meaningful transmission, the second step is to use wiretap coding [1] to provide secrecy (or hidability [2]). There are two main concepts for attaining the goal of the first step: (1) communication with a stealth constraint [2,3] and (2) communication with a covertness constraint [2,4,5]. Both concepts prevent Willie from distinguishing between the presence and absence of the meaningful transmission based solely on the probability distributions of his observations. More specifically, in the first concept, meaningful and meaningless signals are transmitted in non-overlapping time slots. Note that the meaningful signal is the one Alice wants to communicate to Bob, while the meaningless signal is used to confuse Willie. If the scheme is well designed, Willie cannot distinguish between the two signals, because the induced output distributions are close to each other (closeness can be measured in several ways, e.g., by total variation distance or divergence). In the second concept, the meaningful signal is superimposed on the meaningless one. Under the stealth constraint, the capacity can be positive, while the covert transmission rate vanishes asymptotically.
Even though the (first-order) transmission rate of the second concept is asymptotically zero in general, its second-order rate is positive, following the square root law [5,6].
For both concepts, if the channel between Alice and Bob (hereafter, Bob's channel) is no better than the channel between Alice and Willie (hereafter, Willie's channel), additional keys are needed to conceal the meaningful signals, e.g., [4,5]. In particular, these keys are used to select among codebooks to fool Willie. Our motivation is to design an achievable scheme for source-model secret key generation (SKG) in this scenario, i.e., when Bob has no channel advantage over Willie, that fulfills the secrecy and stealth constraints simultaneously. Note that the keys generated by stealthy SKG can also be used to protect the data itself, on top of concealing the transmission behavior, e.g., by encryption or a one-time pad. Note also that directly applying conventional SKG schemes [7] violates our design goal, because such schemes use public communication for several important operations, including advantage distillation, information reconciliation, and privacy amplification ([7] Chapter 4.3). Without careful modifications, these operations will raise Willie's suspicion. To attain our objective, we focus on stealthy SKG, which is inspired by its counterpart, stealth communication [3]. The main reason for considering stealth rather than covertness for SKG is that, under the assumption of a noiseless public discussion channel, there is no ambient noise in which to hide the discussion signal. Covert SKG may instead be feasible over a noisy public channel. In addition, covert SKG in general suffers from a sub-linear rate, e.g., [8], inherited from covert communication. Recall that a channel-model SKG with a rate-unlimited public channel was considered in [8], where the authors applied a covert communication scheme to the key transmission while using a stealth-like public discussion.
The main contributions of this work are summarized as follows:
• We investigate source-model SKG under strong secrecy with an additional stealth constraint.
• We derive an achievable secret key (SK) rate under the stealth constraint if $I(X;Y) \ge I(X;Z)$, where $X$, $Y$, and $Z$ are the observations of the common randomness source at Alice, Bob, and Willie, respectively. Moreover, if $(X, Y, Z)$ form a Markov chain $X - Y - Z$, the SK capacity with the additional stealth constraint is achieved at no extra cost compared to SKG without the stealth constraint.
• We prove that a sufficient condition for achieving the stealthy SK capacity can be relaxed from a physically degraded source to a stochastically degraded one.
• Using the usual stochastic order [9], we derive a sufficient condition for the existence of an equivalent degraded model for Maurer's fast fading Gaussian (satellite) model [10].
Notation: Lowercase bold letters denote deterministic vectors, and uppercase normal/bold letters denote random variables/random vectors (or matrices), which are defined when first mentioned. We denote the probability mass function (pmf) by $P$. The entropy of $X$ is denoted by $H(X)$. The mutual information between two random variables $X$ and $Y$ is denoted by $I(X;Y)$. The divergence between distributions $P_X$ and $P_Y$ is denoted by $D(P_X\|P_Y)$. $X \sim F$ denotes that the random variable $X$ follows the distribution $F$, while $\bar{F} \triangleq 1 - F$. The subscript $i$ in $X_i$ denotes the $i$th symbol, and $X^i \triangleq [X_1, X_2, \dots, X_i]$. $X - Y - Z$ denotes a Markov chain. $\lceil \cdot \rceil$ denotes the ceiling operator. All logarithms are to base two. $(a)^+ \triangleq \max(a, 0)$. The rest of the paper is organized as follows. In Section 2, we introduce the preliminaries and the system model. In Section 3, we derive our main results. Finally, Section 4 concludes the paper.

Preliminaries
We first introduce some definitions and results needed for our work. Definition 1. The strong secrecy and stealth constraints are, respectively, defined as
$$I(M; Z^n) \le \epsilon,$$
$$D(P_{Z^n}\|Q_{Z^n}) \le \epsilon,$$
for arbitrarily small $\epsilon > 0$, where $M$, $Z^n$, $P_{Z^n}$, and $Q_{Z^n}$ are the transmitted message, the signal observed at Willie, and the output distributions at Willie induced by the meaningful and meaningless signals, respectively.
The second constraint in the above definition can be interpreted via hypothesis testing, as discussed in [3]. From this viewpoint, if the second constraint is fulfilled, the adversary's best strategy is essentially to guess blindly whether the currently transmitted signal is meaningful or meaningless.
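The hypothesis-testing reading can be checked numerically. The following is a minimal Python sketch on single-letter toy pmfs (the function names and numbers are hypothetical, not from the paper): it computes by brute force the smallest achievable sum of false-alarm and missed-detection probabilities over all deterministic tests, and compares it with the Pinsker-based lower bound $1 - \sqrt{D(P\|Q)/2}$, with $D$ in nats. When $D(P\|Q)$ is small, every test is close to blind guessing.

```python
import itertools
import math


def kl_bits(p, q):
    """Divergence D(P||Q) in bits for pmfs given as equal-length lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_variation(p, q):
    """Total variation distance between two pmfs."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def min_error_sum(p, q):
    """Smallest P_FA + P_MD over all deterministic tests (meaningful ~ P, meaningless ~ Q).

    A test declares "meaningful" on an acceptance set A, so P_FA = Q(A) and P_MD = 1 - P(A).
    """
    best = float("inf")
    for mask in itertools.product((False, True), repeat=len(p)):
        p_a = sum(pi for pi, m in zip(p, mask) if m)
        q_a = sum(qi for qi, m in zip(q, mask) if m)
        best = min(best, q_a + (1.0 - p_a))
    return best


def pinsker_lower_bound(p, q):
    """Lower bound 1 - sqrt(D/2) on P_FA + P_MD, with D converted from bits to nats."""
    d_nats = kl_bits(p, q) * math.log(2)
    return max(0.0, 1.0 - math.sqrt(d_nats / 2.0))
```

Randomized tests cannot beat the best deterministic one here, because $P_{FA} + P_{MD}$ is linear in the test's decision probabilities; the brute-force minimum equals $1 - \mathrm{TV}(P, Q)$.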

Definition 2.
Denote a common random source by $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}, P_{XYZ})$, where $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are the alphabets of the observations at Alice, Bob, and Willie, respectively. The random source is stochastically degraded if its conditional marginal distributions $P_{Y|X}$ and $P_{Z|X}$ are identical to those of another source of common randomness $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}, P_{X\tilde{Y}\tilde{Z}})$ that is physically degraded, i.e., $X - \tilde{Y} - \tilde{Z}$. Corollary 1 (The same marginal property for one transmitter, [11] Theorem 13.9). Consider a discrete memoryless multiuser channel with one transmitter and two non-cooperative receivers, with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y} \times \mathcal{Z}$. The capacity region of such a channel depends only on the conditional marginal distributions $P_{Y|X}$ and $P_{Z|X}$ and not on the joint conditional distribution $P_{Y,Z|X}$, where $X \in \mathcal{X}$ is the transmit signal and $Y \in \mathcal{Y}$ and $Z \in \mathcal{Z}$ are the two receive signals. Definition 3 (δ-robust typicality, [12] Appendix). A sequence $x^n \in \mathcal{X}^n$ is δ-robust typical for $\delta > 0$ if
$$\left|\frac{N(a|x^n)}{n} - P_X(a)\right| \le \delta P_X(a), \quad \forall a \in \mathcal{X},$$
where $N(a|x^n)$ is the number of occurrences of $a$ in $x^n$.
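Definition 3 translates directly into a membership test. A minimal Python sketch (the function name is hypothetical) that counts symbol occurrences and checks the relative deviation for every symbol; symbols outside the support of $P_X$ make the sequence atypical, since the inequality forces their count to zero.

```python
from collections import Counter


def is_robust_typical(seq, pmf, delta):
    """Check |N(a|x^n)/n - P(a)| <= delta * P(a) for every symbol a.

    pmf maps symbols to probabilities; symbols of seq that have zero (or no)
    probability violate the inequality, so the sequence is atypical.
    """
    n = len(seq)
    counts = Counter(seq)
    if any(pmf.get(a, 0.0) == 0.0 for a in counts):
        return False
    return all(abs(counts.get(a, 0) / n - p) <= delta * p for a, p in pmf.items())
```

Because the tolerance scales with $P_X(a)$, rare symbols are held to a tighter absolute deviation than frequent ones.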

System Model
The considered system model is shown in Figure 1. We denote the $n$-letter source observations at Alice, Bob, and Willie by $X^n$, $Y^n$, and $Z^n$, respectively, which are independent and identically distributed (i.i.d.) according to $P_{X^nY^nZ^n} = \prod_{i=1}^n P_{X_iY_iZ_i} = \prod_{i=1}^n P_{XYZ}$, with alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, respectively. The public discussion between Alice and Bob over a noiseless channel is denoted by a random vector $F^n \in \mathcal{F}^n$. We consider the case without a rate limitation on the public discussion channel. Willie perfectly observes $F^n$. The joint distributions of the signals observed by Willie when the SKG is meaningful and meaningless are denoted by $P_{F^nZ^n}$ and $Q_{F^nZ^n}$, respectively. Alice and Bob aim at sharing keys $K \in \mathcal{K}$ satisfying the following constraints for arbitrarily small $\epsilon > 0$:
$$\Pr(K \neq \hat{K}) \le \epsilon, \quad (2)$$
$$\log|\mathcal{K}| - H(K) \le \epsilon, \quad (3)$$
$$I(K; Z^nF^n) \le \epsilon, \quad (4)$$
$$D(P_{F^nZ^n}\|Q_{F^nZ^n}) \le \epsilon, \quad (5)$$
where $\hat{K}$ denotes the key generated at Bob, (2) bounds the probability that Bob's key differs from Alice's, (3) is the key uniformity constraint with $|\mathcal{K}|$ the number of keys, and (4) is the strong secrecy constraint on the key, adapted from the strong secrecy constraint in Definition 1.
In particular, $K$ plays the role of $M$, and $(Z^n, F^n)$ plays the role of $Z^n$. The stealth constraint (5) is again adapted from Definition 1; here, Willie observes $(F^n, Z^n)$ instead of only $Z^n$ as in stealth communication.

Definition 5.
The rate of keys generated while fulfilling (2)-(5) is called an achievable stealthy strong SK rate. Definition 6. The supremum of the achievable stealthy strong SK rates is called the stealthy strong SK capacity.

Main Results
We present two main results in this section: (1) the stealthy strong SK rate and a condition for attaining the capacity; (2) a scheme that identifies the fast fading Gaussian Maurer's model as a degraded one, so that the stealthy SK capacity can be determined explicitly.

Stealthy Strong Secret Key Rate and Capacity
Our main result is stated in the following theorem and discussed below. Theorem 1. If $(X, Y, Z)$ is drawn from the common random source $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}, P_{XYZ})$, then the stealthy strong SK capacity $C_{SK}$ of source-model SKG with the stealth constraint is bounded by
$$\max\{I(X;Y) - I(X;Z),\; I(Y;X) - I(Y;Z)\} \le C_{SK} \le \min\{I(X;Y),\; I(X;Y|Z)\}. \quad (6)$$
Furthermore, if $(X, Y, Z)$ forms a Markov chain $X - Y - Z$, the stealthy strong SK capacity is
$$C_{SK} = I(X;Y) - I(X;Z). \quad (7)$$
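Theorem 1 can be evaluated numerically for a toy source. The following Python sketch assumes the standard source-model bounds, $\max\{I(X;Y)-I(X;Z),\, I(Y;X)-I(Y;Z)\}$ from below and $\min\{I(X;Y),\, I(X;Y|Z)\}$ from above, and a hypothetical binary source with $X$ uniform, $Y = X \oplus N_1$, $Z = Y \oplus N_2$, $N_1 \sim \mathrm{Bern}(p_1)$, $N_2 \sim \mathrm{Bern}(p_2)$. Because this source satisfies $X - Y - Z$, the two bounds should coincide at $I(X;Y) - I(X;Z) = h(p_1 \star p_2) - h(p_1)$, where $p_1 \star p_2 = p_1(1-p_2) + p_2(1-p_1)$ and $h$ is the binary entropy function. All function names are illustrative.

```python
import itertools
import math


def entropy(pmf):
    """Shannon entropy in bits of a pmf stored as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginalize a joint pmf over tuples (x, y, z) onto the components in idx."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def H(joint, idx):
    return entropy(marginal(joint, idx))


def mi(joint, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c index into (x, y, z)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


def degraded_binary_source(p1, p2):
    """Hypothetical toy source: X uniform, Y = X xor N1, Z = Y xor N2 (so X - Y - Z)."""
    joint = {}
    for x, n1, n2 in itertools.product((0, 1), repeat=3):
        y = x ^ n1
        z = y ^ n2
        p = 0.5 * (p1 if n1 else 1 - p1) * (p2 if n2 else 1 - p2)
        joint[(x, y, z)] = joint.get((x, y, z), 0.0) + p
    return joint


def sk_bounds(joint):
    """Lower and upper bounds on the SK capacity for a single-letter source."""
    X, Y, Z = (0,), (1,), (2,)
    lower = max(mi(joint, X, Y) - mi(joint, X, Z), mi(joint, Y, X) - mi(joint, Y, Z))
    upper = min(mi(joint, X, Y), mi(joint, X, Y, Z))
    return lower, upper
```

For a source that is not degraded (e.g., $Y$ and $Z$ obtained from $X$ through independent channels with no common ordering), the same function generally returns a strict gap between the two bounds.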

Sufficient Conditions for a Degraded Common Randomness
In this section, we derive a sufficient condition for $C_{SK} = I(X;Y) - I(X;Z)$. In particular, we show that this sufficient condition, i.e., the common randomness forming a Markov chain $X - Y - Z$, can be relaxed to stochastic degradedness. We then show that this relaxed condition can be satisfied in a fairly common setting, e.g., the fast fading Gaussian Maurer's (satellite) model [10], in which a central random source $S_0$ emits signals that pass through fast fading additive white Gaussian noise (AWGN) channels and are observed as $X$, $Y$, and $Z$ at Alice, Bob, and Willie, respectively. Theorem 2. If a common random source $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}, P_{X\tilde{Y}\tilde{Z}})$ is stochastically degraded such that $P_{\tilde{Y}|X} = P_{Y|X}$ and $P_{\tilde{Z}|X} = P_{Z|X}$, where $X - Y - Z$, then
$$\tilde{C}_{SK} = C_{SK} = I(X;Y) - I(X;Z), \quad (8)$$
where $\tilde{C}_{SK}$ denotes the stealthy strong SK capacity of $(X, \tilde{Y}, \tilde{Z})$. The proof is deferred to Appendix C. Example: Consider the fast fading Gaussian Maurer's (satellite) model [10] as a special case of Theorem 2:
$$X = A_X S_0 + N_X, \quad (9a)$$
$$Y = A_Y S_0 + N_Y, \quad (9b)$$
$$Z = A_Z S_0 + N_Z, \quad (9c)$$
where $N_X$, $N_Y$, and $N_Z$ are independent AWGNs at Alice, Bob, and Willie, respectively, each with zero mean and unit variance; $A_X$, $A_Y$, and $A_Z$ follow the CDFs $F_X$, $F_Y$, and $F_Z$, respectively, and are the i.i.d. fast fading channel gains from the source $S_0$ to Alice, Bob, and Willie, respectively. Note that, intuitively, $X$, $Y$, and $Z$ have no degradedness relation in general due to the random fading. This is because, by the definition of degradedness, the trichotomy order of all realizations of the two fading channels within a codeword should be the same. We can invoke the same marginal property [14] to construct an equivalent channel in which, by imposing the usual stochastic order constraint, we can identify the fading channels that can be re-ordered in the equivalent channel so that the trichotomy order stays fixed.
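The re-ordering idea can be made concrete with the standard quantile coupling: if $|A_X|^2$ dominates $|A_Z|^2$ in the usual stochastic order, applying both inverse CDFs to one common uniform variable yields an equivalent pair with the same marginals and $|\hat{A}_X|^2 \ge |\hat{A}_Z|^2$ for every realization. The Python sketch below assumes Rayleigh fading (exponential fading power with means 3 and 2, i.e., Nakagami-$m$ with $m = 1$); the function names are hypothetical, and this is one standard construction, not necessarily the one behind Lemma 1.

```python
import math
import random


def exp_power_quantile(v, mean_power):
    """Inverse CDF of an exponential fading power |A|^2 with E|A|^2 = mean_power (Rayleigh fading)."""
    return -mean_power * math.log1p(-v)


def coupled_fading_powers(n, mean_x, mean_z, seed=0):
    """Draw n pairs (|A_X|^2, |A_Z|^2) that share one uniform V per symbol.

    Each coordinate keeps its own exponential marginal, but when mean_x >= mean_z
    (which here is exactly the usual stochastic order), every pair is ordered.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v = rng.random()  # v in [0, 1), so log1p(-v) is finite
        pairs.append((exp_power_quantile(v, mean_x), exp_power_quantile(v, mean_z)))
    return pairs
```

Sampling the two powers independently would preserve the marginals but violate the ordering on a positive fraction of symbols; sharing $V$ is what fixes the order realization by realization.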
If the random channels $A_X$ and $A_Z$ fulfill $\bar{F}_{A_X^2}(x) \ge \bar{F}_{A_Z^2}(x)$ for all $x$, where the subscripts denote the absolute squares of the channel magnitudes, then from Lemma 1 we obtain equivalent (in the sense of having the same stealthy SK capacity) observations at Bob and Willie, $\hat{Y} = \hat{A}_X S_0 + N_Y$ and $\hat{Z} = \hat{A}_Z S_0 + N_Z$, respectively, where $\hat{A}_X^2 \ge \hat{A}_Z^2$ almost surely. Therefore, it is clear that Assume that $A_X$ and $A_Z$ in (9a)-(9c) are fast fading magnitudes following the Nakagami-$m$ distribution with shape parameters $m_x$ and $m_z$ and spread parameters $w_x$ and $w_z$ [15], respectively. From Theorem 2, we know that $Z$ is a degraded version of $X$ if
$$\frac{\gamma(m_x, m_x x/w_x)}{\Gamma(m_x)} \le \frac{\gamma(m_z, m_z x/w_z)}{\Gamma(m_z)}, \quad \forall x \ge 0,$$
where $\gamma(s, x) = \int_0^x t^{s-1}e^{-t}\,dt$ is the lower incomplete gamma function and $\Gamma(s) = \int_0^\infty t^{s-1}e^{-t}\,dt$ is the ordinary gamma function. An example satisfying this inequality is $(m_x, w_x) = (1, 3)$ and $(m_z, w_z) = (1, 2)$.
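The Nakagami condition can be checked numerically. The Python sketch below assumes the parameterization in which $|A|^2$ is Gamma distributed with shape $m$ and scale $w/m$ (so $w = \mathbb{E}|A|^2$), evaluates the regularized lower incomplete gamma function $\gamma(s, x)/\Gamma(s)$ by its power series, and tests the inequality on a finite grid. A grid check is only a sanity check, not a proof. For $m_x = m_z = 1$, the inequality reduces to $e^{-x/w_x} \ge e^{-x/w_z}$, which holds for all $x \ge 0$ whenever $w_x \ge w_z$. Function names are hypothetical.

```python
import math


def reg_lower_gamma(s, x, tol=1e-15, max_terms=10000):
    """Regularized lower incomplete gamma P(s, x) = gamma(s, x) / Gamma(s).

    Uses gamma(s, x) = x^s e^{-x} * sum_k x^k / (s (s+1) ... (s+k)).
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / s  # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= x / (s + k)
        total += term
        if term < tol * total:
            break
    return math.exp(s * math.log(x) - x - math.lgamma(s)) * total


def nakagami_power_cdf(x, m, w):
    """CDF of |A|^2 when |A| ~ Nakagami(m, w): |A|^2 ~ Gamma(shape=m, scale=w/m)."""
    return reg_lower_gamma(m, m * x / w)


def usual_order_holds(mx, wx, mz, wz, grid):
    """Check CDF_{|A_X|^2}(x) <= CDF_{|A_Z|^2}(x) on a finite grid of x values."""
    return all(
        nakagami_power_cdf(x, mx, wx) <= nakagami_power_cdf(x, mz, wz) + 1e-12
        for x in grid
    )
```

Usage: `usual_order_holds(1, 3.0, 1, 2.0, [0.05 * k for k in range(1, 400)])` returns `True` for the example pair above, and swapping the spreads makes it return `False`.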

Conclusions
In this work, we analyzed the performance of secret key generation from a common random source under the additional constraint that the key generation should not attract the attention of the warden Willie. Our results showed that, compared to conventional SKG, the additional stealth constraint can be fulfilled at no extra cost. In particular, the stealthy strong SK rate $I(X;Y) - I(X;Z)$ is achievable if the common random source satisfies $I(X;Y) \ge I(X;Z)$, and it equals the stealthy SK capacity if $X - Y - Z$ or, more generally, if the source is stochastically degraded. To emphasize the practical relevance, we derived a sufficient condition, based on the usual stochastic order, for the degradedness of the Gaussian Maurer's (satellite) model for the source of common randomness under fast fading. As a final note, the same result can also be derived using Slepian-Wolf coding with a proper use of the binning codebook.

Acknowledgments:
The authors would like to thank Matthieu Bloch for fruitful discussions and the anonymous reviewers' efforts on Remark 1 and also on improving the quality of the presentation of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
Our main idea for deriving the lower bound on the SK capacity is to construct a conceptual WTC (CWTC) as in [16]. An equivalent wiretap codebook $\{U^n(m, w)\}$ is constructed, where $m = 1, \dots, L_0$, $w = 1, \dots, L_1$, $L_0 \triangleq 2^{nR}$ is the number of secure messages, and $L_1 \triangleq 2^{nR_1}$ is the number of confusion messages; $m$ and $w$ are selected uniformly and independently, and $U^n(m, w) \in \mathcal{X}^n$ for all $(m, w)$. In addition, $(Z^n, F^n, U^n)$ are generated according to $P_{Z^nF^nU^n} = \prod_{i=1}^n P_{Z_iF_iU_i} = \prod_{i=1}^n P_{Z_iF_i|U_i}P_{U_i}$, where we consider the equivalent channel from Alice to Willie induced by the public discussion $F_i = U_i \oplus X_i$:
$$P_{Z_iF_i|U_i}(z, f|u) = P_{XZ}(f \ominus u, z), \quad (A1)$$
with $\ominus$ denoting modulo subtraction in $\mathcal{X}$, and where $(Z^n, F^n)$ is the equivalent channel output at Willie. Similarly, $(Y^n, F^n)$ is the equivalent channel output at Bob. We choose $U^n$ independent of $(X^n, Y^n, Z^n)$. To analyze stealth, the distributions at Willie's equivalent channel output induced by the meaningful and meaningless signals are, respectively,
$$P_{Z^nF^n}(z^n, f^n) = \frac{1}{L_0L_1}\sum_{m=1}^{L_0}\sum_{w=1}^{L_1} P_{Z^nF^n|U^n}(z^n, f^n|u^n(m, w)), \quad (A2)$$
$$Q_{Z^nF^n}(z^n, f^n) = \sum_{u^n} P_{Z^nF^n|U^n}(z^n, f^n|u^n)P_{U^n}(u^n). \quad (A3)$$
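The CWTC construction can be checked on a toy ternary source. The sketch below (Python; names and numbers are hypothetical) assumes modular-additive noises $Y = X + N_1$ and $Z = Y + N_2 \pmod 3$ and a public message $F = U \oplus X$ with $U$ uniform and independent of the source. It computes $I(U; Y, F) - I(U; Z, F)$, the single-letter secrecy rate of the conceptual wiretap channel for a uniform input $U$, and compares it with $I(X;Y) - I(X;Z)$. A short calculation shows that $I(U; Y, F)$ and $I(U; Z, F)$ each carry the same offset $\log q - H(X)$, which vanishes when $X$ is uniform, so the difference always matches.

```python
import itertools
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a pmf stored as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def mutual_info(joint):
    """I(A;B) in bits for a joint pmf {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return entropy(pa) + entropy(pb) - entropy(joint)


def mod_noise_source(px, n1, n2):
    """Toy source p(x, y, z) with Y = X + N1 and Z = Y + N2 (mod q), so X - Y - Z."""
    q = len(px)
    p = defaultdict(float)
    for x, a, b in itertools.product(range(q), repeat=3):
        p[(x, (x + a) % q, (x + a + b) % q)] += px[x] * n1[a] * n2[b]
    return p


def pair(p_xyz, i, j):
    """Marginal pmf of components (i, j) of a three-component joint pmf."""
    out = defaultdict(float)
    for t, p in p_xyz.items():
        out[(t[i], t[j])] += p
    return out


def cwtc_joint(p_xv, q):
    """Joint pmf of (U, (V, F)): U uniform on Z_q, independent of (X, V), F = U + X mod q."""
    joint = defaultdict(float)
    for (x, v), p in p_xv.items():
        for u in range(q):
            joint[(u, (v, (u + x) % q))] += p / q
    return joint
```

Usage: with `src = mod_noise_source([0.5, 0.3, 0.2], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1])`, the quantities `mutual_info(pair(src, 0, 1)) - mutual_info(pair(src, 0, 2))` and `mutual_info(cwtc_joint(pair(src, 0, 1), 3)) - mutual_info(cwtc_joint(pair(src, 0, 2), 3))` agree up to floating-point error.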
We first decompose the stealth secrecy constraint as follows:
$$D(P_{KZ^nF^n}\|P_KP_{Z^nF^n}) + D(P_{Z^nF^n}\|Q_{Z^nF^n}) = \sum_{k, z^n, f^n} P_{KZ^nF^n}\left[\log\frac{P_{KZ^nF^n}}{P_KP_{Z^nF^n}} + \log\frac{P_{Z^nF^n}}{Q_{Z^nF^n}}\right] \overset{(a)}{=} \sum_{k, z^n, f^n} P_{KZ^nF^n}\log\frac{P_{KZ^nF^n}}{P_KQ_{Z^nF^n}} = D(P_{KZ^nF^n}\|P_KQ_{Z^nF^n}), \quad (A4)$$
where (a) follows from the chain rule of divergence ([17] Th. 2.2.2). We then apply the channel resolvability analysis [18] to the CWTC to find the rate constraint on the confusion messages that guarantees the stealth secrecy constraint (A4).
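The divergence decomposition above holds for any joint pmf, which a quick numerical check confirms. In the Python sketch below, $W$ is a hypothetical single-letter stand-in for $(Z^n, F^n)$, and the pmfs are random with strictly positive masses; the function names are illustrative. The first summand on the right is $I(K;W)$ (secrecy), the second is the stealth divergence.

```python
import math
import random


def kl(p, q):
    """D(p||q) in bits for pmfs stored as dicts with matching keys."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def random_pmf(keys, rng):
    """Random pmf over keys with strictly positive masses."""
    weights = [rng.random() + 0.1 for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}


def effective_secrecy_split(p_kw, q_w):
    """Return (D(P_KW||P_K Q_W), D(P_KW||P_K P_W) + D(P_W||Q_W))."""
    p_k, p_w = {}, {}
    for (k, w), p in p_kw.items():
        p_k[k] = p_k.get(k, 0.0) + p
        p_w[w] = p_w.get(w, 0.0) + p
    lhs = kl(p_kw, {(k, w): p_k[k] * q_w[w] for (k, w) in p_kw})
    secrecy = kl(p_kw, {(k, w): p_k[k] * p_w[w] for (k, w) in p_kw})  # equals I(K; W)
    stealth = kl(p_w, q_w)
    return lhs, secrecy + stealth
```

Since both summands are non-negative, driving the combined divergence to zero forces both the leakage and the stealth divergence to zero, which is why a single bound on the left-hand side suffices.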
From the analysis in Appendix B, we know that
$$\mathbb{E}\left[D(P_{KZ^nF^n}\|P_KQ_{Z^nF^n})\right] \le \sum_{(z^n, f^n, u^n)} P_{Z^nF^nU^n}(z^n, f^n, u^n)\log\left(\frac{P_{Z^nF^n|U^n}(z^n, f^n|u^n)}{L_1Q_{Z^nF^n}(z^n, f^n)} + 1\right), \quad (A6)$$
where the expectation is over the random codebook. Recall that $L_1 = 2^{nR_1}$ is the number of confusion messages inside each bin, which we need to choose such that (A6) vanishes. The main difference between our proof and that in [3] is that we introduce an additional channel output at both Bob and Willie by constructing a CWTC for the considered SKG model, so the results of [3] cannot be applied directly. Now, we re-express the ratio inside the logarithm on the right-hand side (RHS) of (A6) as follows:
$$\frac{P_{Z^nF^n|U^n}}{Q_{Z^nF^n}} \overset{(a)}{=} \frac{P_{Z^nF^nU^n}}{P_{U^n}}\cdot\frac{1}{P_{Z^n}Q_{F^n}} \overset{(b)}{=} \frac{P_{Z^nF^nU^n}}{P_{Z^nU^n}Q_{F^n}} = \frac{P_{F^n|U^nZ^n}}{Q_{F^n}}, \quad (A7)$$
where (a) is due to the fact that $F^n$ and $Z^n$ are independent when a meaningless discussion, with pmf $Q_{F^n}$, is transmitted; (b) follows from the fact that $U^n$ is independent of $Z^n$ by construction, i.e., $P_{Z^nU^n} = P_{Z^n}P_{U^n}$. Note that even though $Z^n$ and $F^n$ are independent, and $Z^n$ and $U^n$ are independent by assumption, this does not mean that $Z^n$, $F^n$, and $U^n$ are necessarily generated according to $P_{Z^nF^nU^n} = P_{Z^n}P_{F^nU^n}$ or $P_{Z^nF^nU^n} = P_{Z^n}P_{F^n}P_{U^n}$. In fact, since pairwise independence does not imply mutual independence ([19] Chapter 7.1, 7.2), there exists a joint distribution $P_{Z^nF^nU^n}$ for which we can invoke tools from the joint asymptotic equipartition property [20].
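The gap between pairwise and mutual independence can be seen in a textbook example. In the Python sketch below, the variables are generic bits rather than the paper's $Z^n$, $F^n$, $U^n$: two independent fair bits and their XOR are pairwise independent, yet the third is a deterministic function of the first two. Function names are hypothetical.

```python
import itertools

# X, Y independent fair bits and W = X xor Y, stored as {(x, y, w): probability}.
joint = {(x, y, x ^ y): 0.25 for x, y in itertools.product((0, 1), repeat=2)}


def marginal(joint, idx):
    """Marginal pmf of the components listed in idx."""
    out = {}
    for t, p in joint.items():
        key = tuple(t[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(joint, idx_a, idx_b):
    """True iff the component groups idx_a and idx_b are independent."""
    pa, pb = marginal(joint, idx_a), marginal(joint, idx_b)
    pab = marginal(joint, idx_a + idx_b)
    return all(abs(pab.get(a + b, 0.0) - pa[a] * pb[b]) < 1e-12 for a in pa for b in pb)


def pairwise_independent(joint):
    """True iff every pair of the three components is independent."""
    return all(independent(joint, (i,), (j,)) for i, j in [(0, 1), (0, 2), (1, 2)])
```

Here `pairwise_independent(joint)` is `True`, while `independent(joint, (0,), (1, 2))` is `False`, since $X$ is determined by $(Y, W)$.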
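Then, we can rewrite (A6) as
$$\mathbb{E}\left[D(P_{KZ^nF^n}\|P_KQ_{Z^nF^n})\right] \le \sum_{(z^n, f^n, u^n)} P_{Z^nF^nU^n}(z^n, f^n, u^n)\log\left(\frac{P_{F^n|U^nZ^n}(f^n|u^n, z^n)}{L_1Q_{F^n}(f^n)} + 1\right). \quad (A8)$$
Similar to [3], the RHS of (A8) can be split into two terms according to whether $(z^n, f^n, u^n)$ are jointly typical or not:
$$d_1 = \sum_{(z^n, f^n, u^n) \in \mathcal{T}_\delta^n(P_{Z^nF^nU^n})} P_{Z^nF^nU^n}(z^n, f^n, u^n)\log\left(\frac{P_{F^n|U^nZ^n}(f^n|u^n, z^n)}{L_1Q_{F^n}(f^n)} + 1\right), \quad (A9)$$
$$d_2 = \sum_{(z^n, f^n, u^n) \notin \mathcal{T}_\delta^n(P_{Z^nF^nU^n})} P_{Z^nF^nU^n}(z^n, f^n, u^n)\log\left(\frac{P_{F^n|U^nZ^n}(f^n|u^n, z^n)}{L_1Q_{F^n}(f^n)} + 1\right), \quad (A10)$$
where $\mathcal{T}_\delta^n$ denotes the δ-robust typical set [12] (Definition 3) used in the subsequent derivation. The Chernoff bound and an important upper bound, which will be used later, are restated in the following.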
where $\mathcal{S}_X \triangleq \{x \in \mathcal{X}: P(x) > 0\}$ and $\mu_X \triangleq \min_{x \in \mathcal{S}_X} P(x)$.
Next, we derive the constraint on $R_1$ (the constraint that Bob should successfully decode both the secret and confusion messages corresponds to point-to-point transmission without secrecy, which can be found in [12]; therefore, we omit its proof here) as follows: where (a) comes from a specific use of the public discussion following ([16] Theorem 3), and $\oplus$ denotes modulo addition in $\mathcal{X}$. Following the argument in ([21] Appendix B), the crypto lemma used in (A12) and (A14) also applies to an unbounded $\mathcal{X}$, as in the Gaussian case; (b) follows from the fact that $U$ is uniformly distributed, together with the crypto lemma. In addition, we can show that $d_2 \to 0$ as $n \to \infty$ as follows: where (a) is due to the fact that $P_{F^n|U^nZ^n}(f^n|u^n z^n) \le 1$ and therefore $P_{F^n|U^nZ^n}(f^n|u^n z^n)/L_1 \le 1$; (b) follows by lower bounding $Q_{F^n}(f^n) = \prod_{i=1}^n Q_F(f_i) \ge \mu_f^n$, where $\mu_f \triangleq \min_{f \in \mathcal{S}_F} Q_F(f)$ and $\mathcal{S}_F \triangleq \{f \in \mathcal{X}: Q_F(f) > 0\}$; (c) is by simple algebra; (d) is by the definition of probability; and (e) is by Lemma A2. Note that $\mu_f$ in (A13) is a constant that does not depend on $n$. Therefore, the RHS of (A13) vanishes exponentially fast as $n \to \infty$. Then, from (A11) and (A13), it is clear that (4) and (5) are fulfilled.
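The crypto lemma invoked in step (b) can be verified exactly for a small alphabet. The sketch below assumes a finite alphabet $\mathbb{Z}_q$ with modulo-$q$ addition (function names are hypothetical). Using exact rational arithmetic, it checks that $F = U \oplus X$ is uniform and independent of $X$ whenever $U$ is uniform and independent of $X$, for an arbitrary pmf of $X$, and that the property fails for a non-uniform $U$.

```python
from fractions import Fraction


def joint_x_f(px, pu):
    """Exact joint pmf of (X, F) with F = U + X mod q and U independent of X."""
    q = len(px)
    joint = {}
    for x in range(q):
        for u in range(q):
            f = (u + x) % q
            joint[(x, f)] = joint.get((x, f), Fraction(0)) + px[x] * pu[u]
    return joint


def f_uniform_and_independent_of_x(px, pu):
    """True iff P(X = x, F = f) = P(X = x) / q for every pair (x, f)."""
    q = len(px)
    joint = joint_x_f(px, pu)
    return all(
        joint.get((x, f), Fraction(0)) == px[x] / q for x in range(q) for f in range(q)
    )
```

For fixed $x$, the map $u \mapsto u \oplus x$ is a bijection, so $P(X = x, F = f) = P_X(x)\,P_U(f \ominus x)$; uniformity of $U$ is exactly what makes this equal $P_X(x)/q$.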
By constructing the CWTC, the following rate between Alice and Bob is achievable: where (a) again follows from the crypto lemma, with $U^n$ selected independent of $Y^n$. Then, from (A12) and (A14), the achievable stealthy strong SK rate can be derived as follows: where (a) follows by plugging in (A12) under the assumption of a memoryless, i.e., i.i.d., common randomness source. Interchanging the roles of $X$ and $Y$ yields $I(Y;X) - I(Y;Z)$, which completes the proof.
Remark A1. In addition to the channel resolvability scheme, the proof can also be obtained with a modified Slepian-Wolf coding (SWC) scheme, sketched as follows. We first construct a binary auxiliary random variable $S$ that selects the meaningful discussion when $S = 1$ and the meaningless discussion when $S = 0$. The stealth constraint requires that Willie cannot successfully guess the realization of $S$, which can be formulated as $I(S; Z^nF^n) \le \epsilon$. By the data processing inequality for divergence [17], we have $I(S; Z^nF^n) \le D(P_{KZ^nF^n}\|P_KQ_{Z^n}Q_{F^n})$, which is an effective secrecy constraint of the considered SKG model and the counterpart of that of the wiretap channel with a stealth constraint [3]. We then construct two binning codebooks, one for $S = 0$ and one for $S = 1$, each fulfilling the secrecy constraint. The error analysis has two cases: (1) when $S = 1$, the probability that Alice and Bob have different keys; (2) when $S = 0$, the probability that Bob generates a non-null key. By forcing the average error probability to vanish, we obtain the result in Theorem 1.

Remark A2.
In the analysis via the WTC scheme in Appendix A, we derive the stealthy strong SK rate by combining a channel-resolvability-based tool developed in [3] with the CWTC. In particular, we first derive an upper bound on the averaged stealth secrecy constraint (A5) by random coding analysis. By forcing this upper bound to vanish, we obtain the constraints on the stealthy strong SK rate and on the confusion rate in the codebook design. This scheme allows us to build the derivation on results for the wiretap channel.
Appendix B
$$\mathbb{E}\left[D(P_{KZ^nF^n}\|P_KQ_{Z^nF^n})\right] \overset{(a)}{=} \mathbb{E}\left[D(P_{MZ^nF^n}\|P_MQ_{Z^nF^n})\right] \le \cdots \overset{(o)}{=} \sum_{(z^n, f^n, u^n)} P_{Z^nF^nU^n}(z^n, f^n, u^n)\log\left(\frac{P_{Z^nF^n|U^n}(z^n, f^n|u^n)}{L_1Q_{Z^nF^n}(z^n, f^n)} + 1\right),$$
where (a) follows from constructing a CWTC such that the key $K$ is interchangeable with the message $M$; (b) is by the definition of the conditional K-L distance ([17] Definition 2.2); (c) is due to the fact that $P_{Z^nF^n|M}$ is the marginalization of $P_{Z^nF^n|U^n}$ with respect to $w$, the index of the confusion message; in (d), we expand the expectation with respect to $M$; in (e), we define $A_m(z^n, f^n|u^n) \triangleq \sum_{l=1}^{L_1} P_{Z^nF^n|U^n}(z^n, f^n|u^n(m, l))$ and $B(z^n, f^n) \triangleq L_1Q_{Z^nF^n}(z^n, f^n)$ to simplify the expression, where $m = 1, \dots, L_0$; (f) is by the definition of the expectation over $\{U^n(m, w)\}_{m=1,w=1}^{L_0,L_1}$: since the $\{U^n(m, w)\}$ are generated i.i.d. according to $P_U^n$, the joint distribution of the codewords in a codebook is the product of the marginal distributions; in (g), we expand the summation with respect to $m$ and $w$; in (h), we expand the product according to the form in step (g); in (i), we collect terms to form the expectation $\mathbb{E}_{U^n \setminus U^n(k,l)}$; in (j), we collect the terms by introducing the additional indices $(a, b)$; in (k), we apply Jensen's inequality to the logarithm; (l) is by expanding the expectation $\mathbb{E}_{U^n \setminus U^n(k,l)}$; (m) is by adding the term $P_{Z^nF^nU^n}(z^n, f^n, u^n(a, b))$; (n) is by the definition of marginalization of $P_{Z^nF^nU^n}$ with respect to $U^n$. In particular, the second term on the RHS of the numerator in (m) becomes $\mathbb{E}_{U^n}[P_{Z^nF^n|U^n}] = Q_{Z^nF^n}$ from (A3); (o) is by the definition of the expectation.

Appendix C. Proof of Theorem 2
We sketch the proof in three steps; the main idea is summarized in Figure A1.
The key is to show that the stochastically degraded source $(X, \tilde{Y}, \tilde{Z})$ implies that the corresponding CWTC [16] is also stochastically degraded, so that the secret key capacity of the source equals the secrecy capacity of the CWTC constructed from a physically degraded source $(X, Y, Z)$. The first step is to construct the CWTC of the source $(X, Y, Z)$ and prove that, if $X - Y - Z$, then $U - Y' - Z'$, i.e., the corresponding CWTC is also physically degraded, where $U$ is the conceptual code symbol, uniformly distributed over $\mathcal{X}$ and independent of $(X, Y, Z)$. The equivalent received signals at Bob and Willie in the CWTC are $Y' \triangleq (Y, U \oplus X)$ and $Z' \triangleq (Z, U \oplus X)$, respectively, where $U \oplus X$ is the signal transmitted over the public channel. The second step is to construct a stochastically degraded source $(X, \tilde{Y}, \tilde{Z})$ from $(X, Y, Z)$ and then its CWTC $(U, Y'', Z'')$, where $Y'' \triangleq (\tilde{Y}, U \oplus X)$ and $Z'' \triangleq (\tilde{Z}, U \oplus X)$. The third step is to show that the CWTC described by $(U, Y'', Z'')$ has the same marginals as the CWTC described by $(U, Y', Z')$, i.e., the two CWTCs have the same secrecy capacity. Moreover, since stochastic degradedness is no stronger than physical degradedness, the former cannot yield a higher secret key capacity than the latter. Hence, the sources $(X, Y, Z)$ and $(X, \tilde{Y}, \tilde{Z})$ have the same secret key capacity, which completes the proof.

[Figure A1 panels: common random source and conceptual WTC, each shown for the physically degraded and the stochastically degraded case.]
Figure A1. Key steps in the proof of Theorem 2. First, we show that the CWTC is also physically degraded if the source is physically degraded. Then, we show that the CWTC constructed from a stochastically degraded source corresponding to the physically degraded source is a stochastically degraded CWTC with the same marginals as the physically degraded CWTC. Finally, we show that the secrecy capacity of the second CWTC is indeed the secret key capacity of the stochastically degraded source. The key and secrecy capacities of the physically degraded source and the corresponding CWTC are denoted by $C_{SK}$ and $C_S$, respectively, while the key capacity (rate) and the secrecy capacity of the stochastically degraded source and the corresponding CWTC are denoted by $\tilde{C}_{SK}$ ($\tilde{R}_{SK}$) and $\tilde{C}_S$, respectively.
In the following, we prove that the stochastically degraded random source $(X, \tilde{Y}, \tilde{Z})$ implies that its CWTC [16] is also stochastically degraded with respect to the CWTC constructed from the corresponding physically degraded source $(X, Y, Z)$. We start by constructing the CWTC of the random source $(X, Y, Z)$, where the equivalent received signals at Bob and Willie are $Y' \triangleq (Y, U \oplus X)$ and $Z' \triangleq (Z, U \oplus X)$, respectively. If $X - Y - Z$, then $U - Y' - Z'$, i.e., the CWTC is also physically degraded, as shown by
$$I(U; Z'|Y') \overset{(a)}{=} I(U; Z|Y, U \oplus X) = H(Z|Y, U \oplus X) - H(Z|Y, U, U \oplus X) \overset{(b)}{=} H(Z|Y) - H(Z|Y, U, U \oplus X) \overset{(c)}{=} H(Z|Y) - H(Z|Y, U, X) \overset{(d)}{=} H(Z|Y) - H(Z|Y, X) \overset{(e)}{=} I(X; Z|Y) = 0,$$
where (a) is by the definitions of $Y'$ and $Z'$; (b) is by the crypto lemma [22], with $U$ selected to be independent of $Y$ and $Z$; (c) is due to the fact that, given $U$, $X$ can be recovered from $U \oplus X$; (d) holds for the same reason as (b); and (e) is due to $X - Y - Z$ and the definition of conditional mutual information. Now, we consider the stochastically degraded source of common randomness $(X, \tilde{Y}, \tilde{Z})$ fulfilling $P_{\tilde{Y}|X} = P_{Y|X}$ and $P_{\tilde{Z}|X} = P_{Z|X}$. As in the first step, we construct the CWTC from $(X, \tilde{Y}, \tilde{Z})$, namely $\{(U, Y'', Z''), P_{U,Y'',Z''} = P_UP_{Y'',Z''|U}\}$, where $Y'' \triangleq (\tilde{Y}, U \oplus X)$ and $Z'' \triangleq (\tilde{Z}, U \oplus X)$ are the equivalent channel outputs at Bob and Willie, respectively. To prove that the two CWTCs $P_{Y',Z'|U}$ and $P_{Y'',Z''|U}$ are equivalent, we invoke the same marginal property ([11] Theorem 16.6) and prove that $P_{Y'|U} = P_{Y''|U}$ and $P_{Z'|U} = P_{Z''|U}$, as follows. By the definitions of $Y''$ and $Y'$, we know that $P_{Y''|U=u} = P_{\tilde{Y}, u \oplus X} \triangleq P_{\tilde{Y}, X_u}$ and $P_{Y'|U=u} = P_{Y, u \oplus X} \triangleq P_{Y, X_u}$, respectively, where we define $X_u \triangleq u \oplus X$. Note that, since $\oplus$ is closed in $\mathcal{X}$, the distribution of $X_u$ is a circular shift of that of $X$ by $u$. Since both sources share the same marginal $P_X$, and hence the same $P_{X_u}$, instead of directly proving $P_{\tilde{Y}, X_u} = P_{Y, X_u}$, we can equivalently prove $P_{\tilde{Y}|X_u} = P_{Y|X_u}$ for all $u$, as follows:
$$P_{\tilde{Y}|X_u}(y|x_u) = P_{\tilde{Y}|X}(y|x_u \ominus u) = P_{Y|X}(y|x_u \ominus u) = P_{Y|X_u}(y|x_u),$$
where $\ominus$ denotes modulo subtraction in $\mathcal{X}$ and the second equality is due to the assumption that $(X, \tilde{Y}, \tilde{Z})$ is a stochastically degraded source obtained from the physically degraded one $(X, Y, Z)$. Therefore, $P_{Y'|U} = P_{Y''|U}$. Similarly, $P_{Z'|U} = P_{Z''|U}$.
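The construction of a physically degraded source with the same conditional marginals can be illustrated on a toy binary source. The Python sketch below assumes $X$ uniform on $\{0,1\}$ and $Y$, $Z$ obtained from $X$ through independent binary symmetric channels with crossover probabilities $p_1 < p_2 < 1/2$, so that $X - Y - Z$ fails. It then builds $\tilde{Y} = X \oplus N_1$ and $\tilde{Z} = \tilde{Y} \oplus N'$ with crossover $q = (p_2 - p_1)/(1 - 2p_1)$, so that the cascade has overall crossover $p_1(1 - q) + q(1 - p_1) = p_2$. It checks that the conditional marginals match and that $I(X; Z|Y)$ vanishes only for the constructed source. All names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def bern(p, bit):
    """Probability that a Bernoulli(p) variable equals bit."""
    return p if bit else 1.0 - p


def parallel_source(p1, p2):
    """X uniform; Y = X xor N1 and Z = X xor N2 with independent noises (not X - Y - Z)."""
    j = defaultdict(float)
    for x, a, b in itertools.product((0, 1), repeat=3):
        j[(x, x ^ a, x ^ b)] += 0.5 * bern(p1, a) * bern(p2, b)
    return j


def degraded_equivalent(p1, p2):
    """X uniform; Y~ = X xor N1 and Z~ = Y~ xor N' (physically degraded X - Y~ - Z~)."""
    q = (p2 - p1) / (1.0 - 2.0 * p1)
    j = defaultdict(float)
    for x, a, b in itertools.product((0, 1), repeat=3):
        y = x ^ a
        j[(x, y, y ^ b)] += 0.5 * bern(p1, a) * bern(q, b)
    return j


def cond_marginal(j, k):
    """(P(component k = 1 | X = 0), P(component k = 1 | X = 1))."""
    px, pk = defaultdict(float), defaultdict(float)
    for t, p in j.items():
        px[t[0]] += p
        if t[k] == 1:
            pk[t[0]] += p
    return tuple(pk[x] / px[x] for x in (0, 1))


def cmi_xz_given_y(j):
    """I(X; Z | Y) in bits; it is zero iff X - Y - Z holds."""
    py, pxy, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in j.items():
        py[y] += p
        pxy[(x, y)] += p
        pyz[(y, z)] += p
    return sum(
        p * math.log2(p * py[y] / (pxy[(x, y)] * pyz[(y, z)]))
        for (x, y, z), p in j.items()
        if p > 0
    )
```

Both sources induce the same conditional marginals $P_{Y|X}$ and $P_{Z|X}$ (and hence, by the same marginal property, the same secrecy-related quantities that depend only on them), while only the constructed source is a Markov chain.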
Hence, we have proved that the CWTC of a stochastically degraded source is also a stochastically degraded CWTC and that the two CWTCs have the same secrecy capacity.
Note that, due to $X - Y - Z$, we can derive the key capacity $C_{SK}$ from the corresponding CWTC, not just an achievable key rate. For the source $(X, \tilde{Y}, \tilde{Z})$, we can so far only claim that the achievable secret key rate $\tilde{R}_{SK}$, but not the secret key capacity $\tilde{C}_{SK}$, equals $C_{SK}$, because the CWTC is in general only an achievable scheme for deriving the secret key rate. However, since stochastic degradedness is more general than physical degradedness, i.e., less stringent in characterizing the order among $X$, $\tilde{Y}$, and $\tilde{Z}$, stochastic degradedness cannot result in a larger secret key capacity than physical degradedness. Therefore, $\tilde{C}_{SK} = C_{SK} = I(X;Y) - I(X;Z)$, which completes the proof.