Compressed Secret Key Agreement: Maximizing Multivariate Mutual Information Per Bit

The multiterminal secret key agreement problem by public discussion is formulated with an additional source compression step where, prior to the public discussion phase, users independently compress their private sources to filter out strongly correlated components for generating a common secret key. The objective is to maximize the achievable key rate as a function of the joint entropy of the compressed sources. Since the maximum achievable key rate captures the total amount of information mutual to the compressed sources, an optimal compression scheme essentially maximizes the multivariate mutual information per bit of randomness of the private sources, and can therefore be viewed more generally as a dimension reduction technique. Single-letter lower and upper bounds on the maximum achievable key rate are derived for the general source model, and an explicit polynomial-time computable formula is obtained for the pairwise independent network model. In particular, the converse results and the upper bounds are obtained from those of the related secret key agreement problem with rate-limited discussion. A precise duality is shown for the two-user case with one-way discussion, and such duality is extended to obtain the desired converse results in the multi-user case. In addition to posing new challenges in information processing and dimension reduction, the compressed secret key agreement problem helps shed new light on resolving the difficult problem of secret key agreement with rate-limited discussion, by offering a more structured achieving scheme and some simpler conjectures to prove.


Introduction
In information-theoretic security, the secret key agreement problem by public discussion is the problem in which a group of users discuss in public to generate a common secret key that is independent of their discussion. The problem was first formulated by Maurer [34] and by Ahlswede and Csiszár [1] under a private source model involving two users who observe some correlated private sources. Rather surprisingly, public discussion was shown to be useful in generating the secret key. The secret key agreement formulation was subsequently extended to the multi-user case by Csiszár and Narayan [22]. Some users are also allowed to act as helpers who can participate in the public discussion but need not share the secret key. The designated set of users who need to share the secret key are referred to as the active users. Different from the two-user case, one-way discussion may not achieve the secrecy capacity when there are more than two users. Instead, an omniscience strategy was considered in [22] where the users first communicate minimally in public until omniscience, i.e., the users discuss in public at the smallest total rate until every active user can recover all the private sources. The scheme was shown to achieve the secrecy capacity in the case when the wiretapper only listens to the public discussion. This assumes, however, that the public discussion is lossless and unlimited in rate, and that the sources take values from finite alphabet sets. If the sources were continuous or if the public discussion were limited to a certain rate, it may be impossible to attain omniscience. This work is motivated by the search for a better alternative to the omniscience strategy for multiterminal secret key agreement. A prior work of Csiszár and Narayan [21] considered secret key agreement under rate-limited public discussion. The model involves two users and a helper observing correlated discrete memoryless sources.
The public discussion by the users is conducted in a particular order and direction. While the region of achievable secret key rate and discussion rates remains unknown, single-letter characterizations involving two auxiliary random variables were given for many special cases, including the two-user case with two rounds of interactive public discussion, where each user speaks once in sequence, with the last public message possibly depending on the first. By further restricting to one-way public discussion, the characterization involves only one auxiliary random variable and was extended to continuous sources by Watanabe and Oohama in [48], where they also gave an explicit characterization without any auxiliary random variable for scalar Gaussian sources. For vector Gaussian sources, the characterization by the same authors in [49] involving some matrix optimization was further improved in [31] to a more explicit formula. However, if the discussion is allowed to be two-way and interactive, Tyagi [45] showed with a concrete two-user example that the minimum total discussion rate required, called the communication complexity, can be strictly reduced. Using the technique of Kaspi [30], multi-letter characterizations were given in [45] for the communication complexity and, similarly, by Liu et al. in [32] for the region of achievable secret key rates. [32] further simplified the characterization using the idea of convex envelopes, following the technique of Ma et al. [33]. While these characterizations provide many new insights and properties, they are not considered computable, compared to the usual single-letter and explicit characterizations. Further extension to the multi-user case also appears difficult, as the converse can be seen to rely on the Csiszár sum identity [1, Lemma 4.1], which does not appear to extend beyond the two-user case.
Nevertheless, partial solutions under more restrictive public discussion constraints were possible. By simplifying the problem to the right extent, new results were discovered in the multi-user case, which has led to the formulation in this work. For instance, Gohari and Anantharam [28] characterized the secrecy capacity in the multi-user case under the simpler vocality constraint where some users have to remain silent throughout the public discussion. Using this result, simple necessary and sufficient conditions can be derived as to whether a user can remain silent without diminishing the maximum achievable key rate [7,36,50]. This is a simpler result than characterizing the achievable rate region because it does not say how much discussion is required if a user must discuss. Another line of work [9,19,35,37] follows [45] to characterize the communication complexity but in the multi-user case. Courtade and Halford [19] characterized the communication complexity under a special non-asymptotic hypergraphical source model with linear discussion. [37] obtained a multi-letter lower bound on the communication complexity for the asymptotic general source model. It also gave a precise and simple condition under which the omniscience strategy for secret key agreement is optimal for a special source model called the pairwise independent network (PIN) [40], which is a special hypergraphical source model [18]. [9,17] further derived some single-letter and more easily computable explicit lower bounds, from which one can also obtain conditions for the omniscience strategy to be optimal under the hypergraphical source model, which covers the PIN model as a special case. [10] considered the more general problem of characterizing the multiterminal secrecy capacity under rate-limited public discussion. In particular, an objective of [10] is to characterize the constrained secrecy capacity defined as the maximum achievable key rate as a function of the total discussion rate. 
This covers the communication complexity as a special case when further increase in the public discussion rate does not increase the secrecy capacity. While only single-letter bounds were derived for the general source model, a surprisingly simple explicit formula was derived for the PIN model [10]. The optimal scheme in [10] follows the tree-packing protocol in [39]. It turns out to belong to the more general approach of decremental secret key agreement in [5,6], inspired by the achieving scheme in [19] and the notion of excess edge in [18]. More precisely, the omniscience strategy is applied after some excess or less useful edge random variables are removed (decremented) from the source. Since the entropy of the decremented source is smaller, the discussion required to attain omniscience of the decremented source is also smaller. Such a decremental secret key agreement approach applies to hypergraphical sources more generally, and it results in one of the best upper bounds in [35] on the communication complexity. However, for more general source models that are not necessarily hypergraphical, the approach does not directly apply.
The objective of this work is to formalize and extend the idea of decremental secret key agreement beyond the hypergraphical source model. More precisely, the secret key agreement problem is considered with an additional source compression step before public discussion, where each user independently compresses their private source component to filter away less correlated randomness that does not contribute much to the achievable secret key rate. The compression is such that the entropy rate of the compressed sources is reduced below a certain specified level. In particular, the edge removal process in decremental secret key agreement can be viewed as a special case of source compression, and the more general problem will be referred to as compressed secret key agreement. The objective is to characterize the achievable secret key rate maximized over all valid compression schemes. For simplicity, this work will focus on the case without helpers, i.e., when all users are active and want to share a common secret key. A closely related formulation is by Nitinawarat and Narayan [38], which characterized the maximum achievable key rate for the two-user case under the scalar Gaussian source model where one of the users is required to quantize the source to within a given rate. [46] also extended the formulation and techniques in [38] to the multi-user case where every user can quantize their sources individually to a certain rate. The compression considered in this work is more general than quantization of Gaussian sources, and the new results are meaningful beyond continuous sources.
The compressed secret key agreement problem is also motivated by the study of multivariate mutual information (MMI) [15], i.e., an extension of Shannon's mutual information to the multivariate case involving possibly more than two random variables. The unconstrained secrecy capacity in the no-helper case has been viewed as a measure of mutual information in [11,15], not only because of its mathematically appealing interpretations such as the residual independence relation and data processing inequalities in [15], but also because of its operational significance in undirected network coding [13,14], data clustering [8] and feature selection [16] (cf. [20]). The optimal source compression scheme that achieves the compressed secrecy capacity can be viewed more generally as an optimal dimension reduction procedure that maximizes the MMI per bit of randomness, which is an extension of the information bottleneck problem [44] to the multivariate case. However, different from the multivariate extension in [25], the MMI is used instead of Watanabe's total correlation [47], and so it captures only the information mutual to all the random variables rather than the information mutual to any subsets of the random variables. Furthermore, the compression is on each random variable rather than subsets of random variables.
The paper is organized as follows. The problem of compressed secret key agreement is formulated in Section 2. Preliminary results of secret key agreement are given in Section 3. The main results are motivated in Section 4 and presented in Section 5, followed by the conclusion and some discussions on potential extensions in Section 6.

Problem Formulation
Similar to the multiterminal secret key agreement problem [22] without helpers or wiretapper's side information, the setting of the problem involves a finite set V of |V| > 1 users and a discrete memoryless multiple source Z_V := (Z_i | i ∈ V). N.b., letters in sans serif font are used for random variables, and the corresponding capital letters in the usual math italic font denote the alphabet sets. P_{Z_V} denotes the joint distribution of the Z_i's.
A secret key agreement protocol with source compression can be broken into the following phases:

Private observation: Each user i ∈ V observes an n-sequence

Z_i^n := (Z_{i1}, . . . , Z_{in})   (1)

i.i.d. generated from the source Z_i for some block length n.

Private randomization: Each user i ∈ V generates a random variable U_i independent of the private source, i.e., the U_i's for i ∈ V are mutually independent and independent of Z_V^n.

Source compression: Each user i ∈ V computes

Z̃_i := ζ_i(U_i, Z_i^n)   (2)

for some function ζ_i that maps to a finite set. Z̃_V is referred to as the compressed source.
Public discussion: Using a public authenticated noiseless channel, a user i_t ∈ V is chosen in round t ∈ [ℓ] to broadcast a message

F̃_t := f_t(U_{i_t}, Z̃_{i_t}, F̃^{t−1}),   (3)

where ℓ is a positive integer denoting the number of rounds and F̃^{t−1} := (F̃_1, . . . , F̃_{t−1}) denotes all the messages broadcast in the previous rounds. If the dependency on F̃^{t−1} is dropped, the discussion is said to be non-interactive. The discussion is said to be one-way (from user i) if ℓ = 1 and i_1 = i. For convenience, denote by F̃_i the aggregate message from user i ∈ V and by F̃ := (F̃_i | i ∈ V) the aggregation of the messages from all users.
Key generation: A random variable K, called the secret key, is required to satisfy the recoverability constraint that

lim_{n→∞} Pr(∃ i ∈ V, K ≠ θ_i(U_i, Z̃_i, F̃)) = 0   (4)

for some functions θ_i, and the secrecy constraint that

lim_{n→∞} (1/n) [log|K| − H(K | F̃)] = 0,   (5)

where K denotes the finite alphabet set of possible key values. N.b., unlike [45], non-interactive discussion is considered different from one-way discussion in the two-user case, since both users are allowed to discuss even though their messages cannot depend on each other. Different from [23], there is an additional source compression phase, after which the protocol can depend on the original sources only through the compressed sources.
The objective is to characterize the maximum achievable secret key rate for a continuum of different levels of source compression:

Definition 1. The compressed secrecy capacity with a joint entropy limit α ≥ 0 is defined as

C̃_S(α) := sup lim inf_{n→∞} (1/n) log|K|,   (6)

where the supremum is over all possible compressed secret key agreement schemes satisfying

lim sup_{n→∞} (1/n) H(Z̃_V) ≤ α.   (7)

This constraint limits the joint entropy rate of the compressed source.
N.b., instead of the joint entropy limit, one may also consider entropy limits on some subsets B ⊆ V, namely that

lim sup_{n→∞} (1/n) H(Z̃_B) ≤ α_B.   (8)

If multiple entropy limits are imposed, C̃_S will be a higher-dimensional surface instead of a one-dimensional curve. For example, in the two-user case under the scalar Gaussian source model, [38] considered the entropy limit only on one of the users. In the multi-user case under the Gaussian Markov tree model, [46] considered the symmetric case where the entropy limit is imposed on every user. For simplicity, however, the joint entropy constraint (7) will be the primary focus in this work. It will be shown that C̃_S(α) is closely related to the constrained secrecy capacity C_S(R) defined as [10]

C_S(R) := sup lim inf_{n→∞} (1/n) log|K|,   (9)

where the supremum is over all secret key agreement schemes with the trivial compression Z̃_i := Z_i^n in (2), i.e., without compression, and with the entropy limit (7) replaced by the constraint on the total discussion rate

lim sup_{n→∞} (1/n) Σ_{t∈[ℓ]} log|F̃_t| ≤ R.   (10)

N.b., it follows directly from the result of [22] that C̃_S(α) remains unchanged whether the discussion is interactive or not. Indeed, the relation between C̃_S(α) and C_S(R) to be shown in this work will not be affected either. Therefore, for notational simplicity, C_S(R) may refer to the case with or without interaction, even though C_S(R) may be smaller with non-interactive discussion. It is easy to show that C_S(R) is continuous, non-decreasing and concave in R [10, Proposition 3.1]. As R goes to ∞, the constrained secrecy capacity tends to the usual unconstrained secrecy capacity C_S(∞) defined in [22] without the discussion rate constraint (10). The smallest discussion rate that achieves the unconstrained secrecy capacity is the communication complexity, denoted by

R_S := min{ R ≥ 0 | C_S(R) = C_S(∞) }.   (11)

Similar to C_S(R), the following basic properties can be shown for C̃_S(α): it is continuous, non-decreasing and concave in α, with

lim_{α→∞} C̃_S(α) = C_S(∞),   (13)

achieving the unconstrained secrecy capacity in the limit.
Because of (13), a quantity playing the same role as R_S in (11) can be defined for C̃_S(α) as follows.
Definition 2. The smallest entropy limit that achieves the unconstrained secrecy capacity is defined as

α_S := min{ α ≥ 0 | C̃_S(α) = C_S(∞) }   (12)

and is referred to as the minimum admissible joint entropy.
One may also consider both the entropy limit (7) and the discussion rate constraint (10) simultaneously, and define the secrecy capacity as a function of α and R. For simplicity, however, we will not consider this case but, instead, focus on the relationship between C̃_S(α) and C_S(R).
The following example illustrates the problem formulation. It will be revisited at the end of Section 5 (Example 3) to illustrate the main results.
Example 1. Consider V := {1, 2, 3} and the source

Z_1 := (X_a, X_b),  Z_2 := (X_a, X_b, X_c),  Z_3 := (X_a, X_c),

where X_a, X_b and X_c are uniformly random and independent bits. It is easy to argue that C̃_S(α) ≥ α for α ∈ [0, 1]. To see this, notice that X_a is observed by every user. Any choice of K = θ(X_a^n) can therefore be recovered by every user without any discussion, satisfying the recoverability constraint (4) trivially. Since there is no public discussion required, the secrecy constraint (5) also holds immediately by taking a portion of the bits from X_a^n to be the key bits in K. More precisely, each user can compress their source to Z̃_i := X_a^{⌈nα⌉}, the first ⌈nα⌉ bits of X_a^n, satisfying the entropy limit (7) with α equal to the key rate. Hence, C̃_S(α) ≥ α as desired. Indeed, we will show (by Proposition 5) that the reverse inequality holds in general, and so we have equality for α ∈ [0, 1] for this example.
For α = H(Z_V) = H(X_a, X_b, X_c) = 3, every user can simply retain their source without compression, i.e., with Z̃_i = Z_i^n for i ∈ V, while satisfying the entropy limit (7). Now, with K = (X_a^n, X_b^n) and F = F̃_2 = X_b^n ⊕ X_c^n, where ⊕ is the elementwise XOR, it can be shown that both the recoverability (4) and secrecy (5) constraints hold. This is because user 3 can recover X_b from the XOR X_b ⊕ X_c with the side information X_c. Furthermore, the XOR bit is independent of (X_a, X_b) and therefore does not leak any information about the key bits. With this scheme, C̃_S(3) ≥ 2. By the usual time-sharing argument, C̃_S(α) ≥ (1 + α)/2 for α ∈ [1, 3]. Indeed, the reverse inequality can be argued using one of the main results (Theorem 1), and so the minimum admissible joint entropy will turn out to be α_S = 3.
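The XOR scheme above can be verified exhaustively for a single source symbol (n = 1); the following sanity-check script (the variable names are ours, not from any reference implementation) checks both recoverability by user 3 and the secrecy H(K | F) = H(K) = 2 bits:

```python
from itertools import product
from collections import Counter
from math import log2

# The eight equiprobable outcomes (x_a, x_b, x_c) of Example 1 with n = 1;
# the key is K = (X_a, X_b) and the discussion is F = X_b XOR X_c.
outcomes = list(product([0, 1], repeat=3))

# Recoverability: user 3 observes Z_3 = (X_a, X_c) and F, and recovers K.
for xa, xb, xc in outcomes:
    f = xb ^ xc
    assert (xa, f ^ xc) == (xa, xb)  # f XOR x_c = x_b

# Secrecy: H(K | F) = H(K) = 2 bits, i.e., F leaks nothing about K.
joint = Counter(((xa, xb), xb ^ xc) for xa, xb, xc in outcomes)
p_f = Counter(f for _, f in joint.elements())
h_k_given_f = sum(n / 8 * log2(p_f[f] / n) for (_, f), n in joint.items())
print(h_k_given_f)  # → 2.0
```

Since the source is i.i.d., the same computation extends elementwise to any block length n.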

Preliminaries
In this section, a brief summary of related results for the secrecy capacity and communication complexity will be given. The results for the two-user case will be introduced first, followed by the more general results for the multi-user case, and the stronger results for the special hypergraphical source model. An example will also be given at the end to illustrate some of the results.

Two-user case
As mentioned in the introduction, no single-letter characterization is known for C_S(R) and C̃_S(α) even in the two-user case where V := {1, 2}. Furthermore, while multi-letter characterizations for R_S and C_S(R) were given in [45] and [32] respectively in the two-user case under interactive discussion, no such multi-letter characterization is known for the case with non-interactive discussion. Nevertheless, if one-way discussion from user 1 is considered, then the result of [21, Theorem 2.4] and its extension [48] to continuous sources give the following characterization of C_S(R):

C_{S,1}(R) := sup I(Z'_1 ∧ Z_2)   (17a)
subject to  I(Z'_1 ∧ Z_1) − I(Z'_1 ∧ Z_2) ≤ R   (17b)
and  P_{Z'_1|Z_1,Z_2} = P_{Z'_1|Z_1}.   (17c)

The last constraint (17c) corresponds to the Markov chain Z'_1 − Z_1 − Z_2, and so the supremum is taken over the choices of the conditional distribution P_{Z'_1|Z_1} = P_{Z'_1|Z_1,Z_2}. Using the double Markov property as in [45], it follows that C_S(0) can be characterized more explicitly by the Gács-Körner common information

J_GK(Z_1 ∧ Z_2) := max{ H(U) | H(U|Z_1) = H(U|Z_2) = 0 },   (18)

where U is a discrete random variable. If (18) is finite, a unique optimal solution U exists and is called the maximum common function of Z_1 and Z_2, because any common function of Z_1 and Z_2 must be a function of U. The communication complexity also has a more explicit characterization [45, (44)]:

R_{S,1} = J_{W,1}(Z_1 ∧ Z_2) − I(Z_1 ∧ Z_2),   (19)
where  J_{W,1}(Z_1 ∧ Z_2) := min{ I(Z_1 ∧ W) | H(W|Z_1) = 0, Z_1 − W − Z_2 }   (20)

and W is a discrete random variable. If J_{W,1}(Z_1 ∧ Z_2) is finite, a unique optimal solution W exists and is called the minimum sufficient statistic of Z_1 for Z_2, since Z_2 can only depend on Z_1 through W.
In Section 4, the expression C_{S,1}(R) will be related to the compressed secret key agreement problem restricted to the two-user case when the entropy limit is imposed only on user 1. This duality relationship in the two-user case will serve as the motivation for the main results for the multi-user case. Indeed, the desired characterization of C̃_S(α) for the two-user case has appeared in [38, Lemma 4.1] for the scalar Gaussian source model:

C̃_{S,1}(α) := sup I(Z'_1 ∧ Z_2)   (21a)
subject to  I(Z'_1 ∧ Z_1) ≤ α   (21b)
and  P_{Z'_1|Z_1,Z_2} = P_{Z'_1|Z_1}.   (21c)

For the general source model, the expression (21) has also appeared before with other information-theoretic interpretations, as mentioned in [24]. The Lagrangian dual of (21), in particular, reduces to the dimension reduction technique called the information bottleneck method in [44], where Z_1 is an observable used to predict the target Z_2, and Z'_1 is a feature of Z_1 that captures as much mutual information with the target variable as possible per bit of mutual information with the observable. Interestingly, the principle of the information bottleneck method was also proposed in [42,43] as a way to understand deep learning, since the best prediction of Z_2 from Z_1 is nothing but a particular feature of Z_1 sharing a lot of mutual information with Z_2.
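For concreteness, the Lagrangian relaxation just described can be sketched as follows, with a multiplier β ≥ 0 trading the information about the target against the compression rate (this is our paraphrase of the information bottleneck objective of [44], not a formula taken from [24]):

```latex
% Information bottleneck form of the Lagrangian relaxation of (21):
% beta >= 0 trades target information against compression rate.
\sup_{P_{Z'_1|Z_1}} \; I(Z'_1 \wedge Z_2) \;-\; \beta\, I(Z'_1 \wedge Z_1),
\qquad \text{subject to the Markov chain } Z'_1 - Z_1 - Z_2 .
```

Sweeping β then traces out the same trade-off curve as varying the limit α in (21b).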

General source with finite alphabet set
Consider the multi-user case where |V| ≥ 2. If Z_V takes values from a finite set, then the unconstrained secrecy capacity was shown in [22] to be achievable via communication for omniscience (CO) and equal to

C_S(∞) = H(Z_V) − R_CO,   (22)

where R_CO is the smallest rate of CO [22], characterized by the linear program

R_CO = min{ r(V) | r(B) ≥ H(Z_B | Z_{V∖B}) for all non-empty B ⊊ V },   (23)

where r(B) denotes the sum Σ_{i∈B} r_i. Further, R_CO can be achieved by non-interactive discussion. It follows that

R_S ≤ R_CO  and  C_S(R) = C_S(∞) for all R ≥ R_CO.   (24)

It was also pointed out in [22] that private randomization does not increase C_S(∞). Hence, if Z_V is finite, we have

C̃_S(α) = C_S(∞)  for α ≥ H(Z_V),   (25)

because C_S(∞) can be achieved with Z̃_i = Z_i^n. While it seems plausible that randomization does not decrease R_S nor increase C_S(R) for any R ≥ 0, a rigorous proof remains elusive. Similarly, it appears plausible that neither α_S nor C̃_S(α) is affected by randomization but, again, no proof is known yet.

An alternative characterization of C_S(∞) was established in [11,18] by showing that the divergence bound in [22] is tight in the case without helpers. More precisely, with Π'(V) defined as the set of partitions of V into at least two non-empty disjoint sets,

C_S(∞) = I(Z_V) := min_{P ∈ Π'(V)} (1 / (|P| − 1)) [ Σ_{C∈P} H(Z_C) − H(Z_V) ].   (26a)

In the bivariate case when V = {1, 2}, I(Z_V) reduces to Shannon's mutual information I(Z_1 ∧ Z_2). It was further pointed out in [15] that I(Z_V) is the minimum solution γ to the residual independence relation

H(Z_V) − γ = Σ_{C∈P} [ H(Z_C) − γ ]   (26b)

for some P ∈ Π'(V). To get an intuition of the above relation, notice that γ = 0 is a solution when the joint entropy H(Z_V) on the left is equal to the sum of the entropies H(Z_C) on the right for some partition P. In other words, the MMI is the smallest value of γ whose removal leads to an independence relation, i.e., the total residual randomness on the left is equal to the sum of the individual residual randomness on the right according to some partitioning of the random variables. It was further shown in [15] that there is a unique finest optimal partition to (26a), with a clustering interpretation in [8].
The MMI is also computable in polynomial time, following the result of Fujishige [26].
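For small examples, the MMI can also be checked directly by brute force over all partitions (exponential in |V|, unlike the polynomial-time approach); a minimal sketch for the hypergraphical source of Example 1, where H(Z_C) is simply the number of unit-entropy edge variables incident to C:

```python
# Brute-force evaluation of the MMI for the source of Example 1:
# Z_1 = (X_a, X_b), Z_2 = (X_a, X_b, X_c), Z_3 = (X_a, X_c), H(X_e) = 1 bit.
edges = {'a': {1, 2, 3}, 'b': {1, 2}, 'c': {2, 3}}
V = {1, 2, 3}

def H(C):  # joint entropy (in bits) of the sources observed by users in C
    return sum(1 for users in edges.values() if users & C)

def partitions(s):  # enumerate all set partitions of s
    s = list(s)
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] | {first}] + p[i + 1:]
        yield p + [{first}]

# Minimize over partitions into at least two non-empty parts.
mmi = min((sum(H(C) for C in P) - H(V)) / (len(P) - 1)
          for P in partitions(V) if len(P) >= 2)
print(mmi)  # → 2.0
```

The minimum, 2 bits, matches the unconstrained secrecy capacity of this source discussed in the examples; the singleton partition {{1}, {2}, {3}} is among the optimal partitions.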
In the opposite extreme with R → 0, it is easy to argue that

C_S(0) ≥ J_GK(Z_V) := max{ H(U) | H(U|Z_i) = 0 for all i ∈ V },   (28)

where J_GK(Z_V) is the multivariate extension of the Gács-Körner common information in (18), with U again chosen as a discrete random variable. Note that, even without any public discussion, every user can compress their source independently to U^n, where U is the maximum common function, if J_GK(Z_V) is finite. Hence, it is easy to achieve a secret key rate of H(U) = J_GK(Z_V) without any discussion. The reverse inequality of (28) seems plausible but has not been proven yet except in the two-user case. The technique in [21], which relies on the Csiszár sum identity, does not appear to extend to the multi-user case to give a matching converse.
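For finite sources, the maximum common function can be computed from the connected components of the joint support: two outcomes are linked whenever some user observes the same value in both, and U is the component label. A sketch on the source of Example 1 (our own illustration; in general the components need not be equiprobable, but they are here):

```python
from itertools import product
from math import log2

# Joint support of Example 1: outcomes (x_a, x_b, x_c), each w.p. 1/8, with
# user observations Z_1 = (x_a, x_b), Z_2 = (x_a, x_b, x_c), Z_3 = (x_a, x_c).
support = list(product([0, 1], repeat=3))
obs = [lambda z: (z[0], z[1]),   # user 1
       lambda z: z,              # user 2
       lambda z: (z[0], z[2])]   # user 3

parent = {z: z for z in support}  # union-find over the support
def find(z):
    while parent[z] != z:
        z = parent[z]
    return z

# Merge outcomes that some user cannot distinguish.
for view in obs:
    seen = {}
    for z in support:
        key = view(z)
        if key in seen:
            parent[find(z)] = find(seen[key])
        else:
            seen[key] = z

components = {find(z) for z in support}
# The components are the halves {x_a = 0} and {x_a = 1}, each of probability
# 1/2, so the maximum common function is X_a and J_GK = 1 bit.
print(log2(len(components)))  # → 1.0
```

This agrees with the observation in Example 1 that X_a is the bit common to every user.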

Hypergraphical sources
Stronger results have been derived for the following special source model:

Definition 3 (Definition 2.4 of [18]). Z_V is a hypergraphical source w.r.t. a hypergraph (V, E, ξ) with edge functions ξ : E → 2^V ∖ {∅} iff, for some mutually independent edge random variables X_e for e ∈ E with H(X_e) > 0,

Z_i := (X_e | e ∈ E, i ∈ ξ(e))  for i ∈ V.

In the special case when the hypergraph is a graph, i.e., |ξ(e)| = 2 for all e ∈ E, the model reduces to the pairwise independent network (PIN) model in [40]. The hypergraphical source can also be viewed as a special case of the finite linear source considered in [12] if the edge random variables take values from a finite field.
For hypergraphical sources, various bounds on R S and C S (R) have been derived in [9,10,35,37]. The achieving scheme makes use of the idea of decremental secret key agreement [5,6], where the redundant or less useful edge variables are removed or reduced before public discussion. This is a special case of the compressed secret key agreement, where the compression step simply selects the more useful edge variables up to the joint entropy limit.
For the PIN model, it turns out that decremental secret key agreement is optimal, leading to a single-letter characterization of R_S and C_S(R) in [10]:

R_S = (|V| − 2) C_S(∞)   (31a)
C_S(R) = min{ R / (|V| − 2), C_S(∞) }  for |V| > 2.   (31b)

It can be verified that (31a) is the smallest value of R such that C_S(R) = C_S(∞) using (31b). While the proof of the converse, i.e., ≤ for (31b), is rather involved, the achievability is by a simple tree-packing protocol, which belongs to the decremental secret key agreement approach that removes excess edges unused by the maximum tree packing. In other words, the achieving scheme is a compressed secret key agreement scheme. This connection will lead to a single-letter characterization of C̃_S(α) for the PIN model (in Theorem 2). To illustrate the above results, a single-letter characterization for C_S(R) will be derived in the following for the source in Example 1. It will also demonstrate how an exact characterization for C_S(R) can be extended from a PIN model to a hypergraphical model via some contrived arguments. The characterization will also be useful later in Example 3 to give an exact characterization of C̃_S(α).

Example 2. Consider again the source Z_V from Example 1. By the linear program (23), we have R_CO = 1 with the optimal solution r_1 = r_3 = 0 and r_2 = 1. This means that user 2 needs to discuss 1 bit to attain omniscience. In particular, user 2 can reveal the XOR X_b ⊕ X_c so that users 1 and 3 can recover X_c and X_b respectively from their observations. By (22), then, we have C_S(∞) = H(Z_V) − R_CO = 3 − 1 = 2. It can also be checked that the alternative characterization of C_S(∞) in (26a) gives I(Z_V) = 2, with {{1}, {2}, {3}} being an optimal partition. Next, we argue that

C_S(R) = min{1 + R, 2}.   (33)

The achievability, i.e., the inequality C_S(R) ≥ min{1 + R, 2}, is by the usual time-sharing argument. In particular, the bound C_S(0.5) ≥ 1.5, for example, can be achieved by the compressed secret key agreement scheme in Example 1 with α = 2, i.e., by time-sharing the compressed secret key agreement schemes for α = 1 and for α = 3 equally. More precisely, we set Z̃_1 = (X_a^n, X_b^{n/2}), Z̃_2 = (X_a^n, X_b^{n/2}, X_c^{n/2}) and Z̃_3 = (X_a^n, X_c^{n/2}), where X^{n/2} denotes the first n/2 samples of X^n, with the discussion F = F̃_2 = X_b^{n/2} ⊕ X_c^{n/2} and the key K = (X_a^n, X_b^{n/2}). It follows that the public discussion rate is lim sup_{n→∞} (1/n) log|F| = 0.5.
Now, to prove the reverse inequality ≤ for (33), we modify the source Z_V into another source Z'_V defined as follows with an additional uniformly random and independent bit X_d:

Z'_1 := (X_a, X_b),  Z'_2 := (X_a, X_b, X_c, X_d),  Z'_3 := (X_c, X_d),

i.e., Z'_1 = Z_1, Z'_2 is obtained from Z_2 by adding X_d, and Z'_3 is obtained from Z_3 by adding X_d and removing X_a. It follows that Z'_V is a PIN. By (26a) and (31b), the constrained secrecy capacity for the modified source Z'_V is C'_S(R) = min{R, 2}. The desired inequality is proved if we can show that C'_S(R + 1) ≥ C_S(R). To argue this, note that, if user 2 reveals F'_2 = X_a^n ⊕ X_d^n in public, then user 3 can recover X_a^n. Furthermore, F'_2 does not leak any information about X_a^n, and so the source Z'_V effectively emulates the source Z_V. Consequently, any optimal discussion scheme F_V that achieves C_S(R) for Z_V can be used to achieve the same secret key rate for Z'_V after the additional bit per sample of discussion F'_2. This gives the desired inequality that establishes (33).
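The linear program (23) for Example 2 is small enough to solve by brute force over a half-bit grid (sufficient here because an optimal rate vector happens to lie on the grid); a sketch with the conditional entropies of the Example 1 source filled in by hand:

```python
from itertools import product

# CO linear program (23) for Example 2: minimize r_1 + r_2 + r_3 subject to
# r(B) >= H(Z_B | Z_{V\B}) for every non-empty proper subset B of {1, 2, 3}.
cond = {  # B -> H(Z_B | Z_{V\B}) in bits, computed by hand
    (1,): 0, (2,): 0, (3,): 0,  # each user's source is known to the other two
    (1, 2): 1,                  # X_b is unknown to user 3
    (1, 3): 0,                  # user 2 observes everything
    (2, 3): 1,                  # X_c is unknown to user 1
}

grid = [x / 2 for x in range(7)]  # candidate rates 0, 0.5, ..., 3
feasible = [r for r in product(grid, repeat=3)
            if all(sum(r[i - 1] for i in B) >= h for B, h in cond.items())]
r_co = min(sum(r) for r in feasible)
print(r_co)  # → 1.0
```

The minimizer r = (0, 1, 0) recovers the optimal solution stated in Example 2: only user 2 discusses, at rate 1 bit.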

Multi-letter characterization
We start with a simple multi-letter characterization of the compressed secrecy capacity in terms of the MMI (26).

Proposition 2.
For any α ≥ 0, we have

C̃_S(α) = sup (1/n) I(Z̃_V),   (34)

where the supremum is over all block lengths n and all valid compressed sources Z̃_V satisfying the joint entropy limit (7).
Proof. This is because compressed secret key agreement is simply secret key agreement on the compressed source. Hence, by (26), the MMI of the compressed source gives the compressed secrecy capacity.

⊓ ⊔
The characterization in (34) is simpler than the formulation in (6) because it does not involve the random variables F̃ and K, nor the recoverability (4) and secrecy (5) constraints. Although such a multi-letter expression is not computable and therefore not accepted as a solution to the problem, it serves as an intermediate step that helps derive further results. More precisely, consider the bivariate case where V = {1, 2}. Then, (34) becomes

C̃_S(α) = sup (1/n) I(Z̃_1 ∧ Z̃_2)   (35a)
subject to  lim sup_{n→∞} (1/n) H(Z̃_1, Z̃_2) ≤ α.   (35b)

If, in addition, the joint entropy constraint (35b) is replaced by the entropy constraint on user 1 only, i.e.,

lim sup_{n→∞} (1/n) H(Z̃_1) ≤ α,

then C̃_S(α) can be single-letterized by standard techniques as in [21] to C̃_{S,1}(α) defined in (21). The following gives a simple upper bound that is tight for sufficiently small α.
Proposition 3. C̃_{S,1}(α) is continuous, non-decreasing and concave in α, and

C̃_{S,1}(α) ≤ α,   (36)

with equality iff α ≤ J_GK(Z_1 ∧ Z_2).

Proof. Monotonicity is obvious. Continuity and concavity can be shown by the usual time-sharing argument as in Proposition 1. (36) follows directly from the data processing inequality, namely that I(Z'_1 ∧ Z_2) ≤ I(Z'_1 ∧ Z_1) ≤ α for any feasible solution Z'_1 to (21). If α ≤ J_GK(Z_1 ∧ Z_2), then there exists a feasible solution U to (18) (a common function of Z_1 and Z_2) with H(U) ≥ α, and so the compressed sources Z̃_1 and Z̃_2 can be chosen as a function of U^n to achieve the equality for (36). Conversely, suppose J_GK(Z_1 ∧ Z_2) is finite and (36) is satisfied with equality. Then, in addition to Z'_1 − Z_1 − Z_2, we also have Z'_1 − Z_2 − Z_1, which implies by the double Markov property that, for the maximum common function U achieving J_GK(Z_1 ∧ Z_2) defined in (18),

Z'_1 − U − (Z_1, Z_2).

In other words, the optimal Z'_1 is a stochastic function of the maximum common function of Z_1 and Z_2, and so α = I(Z'_1 ∧ Z_1) ≤ H(U) = J_GK(Z_1 ∧ Z_2). ⊓⊔

We will show that the above upper bound in (36) extends to the multi-user case (in Proposition 5). However, for α ≥ J_GK(Z_1 ∧ Z_2), the above upper bound is not tight even in the two-user case. To improve the upper bound, the following duality between C̃_{S,1} and C_{S,1} will be used and extended to the multi-user case (in Theorem 1).
Proposition 4. For α ≥ J_GK(Z_1 ∧ Z_2),

C̃_{S,1}(α) = C_{S,1}(α − C̃_{S,1}(α)).   (37)

Furthermore, the set of optimal solutions to the left (achieving C̃_{S,1}(α) defined in (21)) is the same as the set of optimal solutions to the right (achieving C_{S,1}(R) in (17) with R = α − C̃_{S,1}(α)). It follows that the minimum admissible entropy (12), but with the entropy constraint on user 1 instead, is

α_{S,1} = R_{S,1} + I(Z_1 ∧ Z_2) = J_{W,1}(Z_1 ∧ Z_2),

where R_{S,1} and J_{W,1}(Z_1 ∧ Z_2) are defined in (19) and (20) respectively.
Proof. Set R := α − C̃_{S,1}(α). Consider first an optimal solution Z'_1 to C̃_{S,1}(α) and show that it is also an optimal solution to C_{S,1}(R). By optimality, I(Z'_1 ∧ Z_2) = C̃_{S,1}(α). By the constraint (21b), I(Z'_1 ∧ Z_1) ≤ α. It follows that the constraint (17b) holds, and so Z'_1 is a feasible solution to C_{S,1}(R), i.e., we have ≥ for (37):

C_{S,1}(R) ≥ I(Z'_1 ∧ Z_2) = C̃_{S,1}(α).   (40)

To show that Z'_1 is also optimal to C_{S,1}(R), suppose to the contrary that there exists a strictly better solution Z''_1 to C_{S,1}(R), i.e., with

I(Z''_1 ∧ Z_2) > I(Z'_1 ∧ Z_2)   (41)
and  I(Z''_1 ∧ Z_1) − I(Z''_1 ∧ Z_2) ≤ R.   (42)

It follows that

I(Z''_1 ∧ Z_1) > α = I(Z'_1 ∧ Z_1).

The last equality means that the constraint (21b) is satisfied with equality. If, to the contrary, the equality did not hold, then time-sharing Z'_1 with Z''_1 for some fraction λ > 0 of the time would give a better feasible solution to (21), contradicting the optimality of Z'_1. The first inequality can also be argued similarly by the optimality of Z'_1: if I(Z''_1 ∧ Z_1) ≤ α, then Z''_1 itself would be a better feasible solution to (21). Now, we have

I(Z''_1 ∧ Z_2) ≤ C̃_{S,1}(I(Z''_1 ∧ Z_1)) ≤ (I(Z''_1 ∧ Z_1)/α) C̃_{S,1}(α) ≤ I(Z''_1 ∧ Z_1),

where (a) the second inequality is by the concavity of C̃_{S,1} (together with C̃_{S,1}(0) = 0 and I(Z''_1 ∧ Z_1) > α); and (b) the last inequality is by the upper bound C̃_{S,1}(α) ≤ α in (36). N.b., equality cannot hold simultaneously for (a) and (b) because, otherwise, we have I(Z''_1 ∧ Z_2)/I(Z''_1 ∧ Z_1) = 1, which, together with (41) and (42), contradicts the result in Proposition 3 that C̃_{S,1}(α) < α (with strict inequality) for α > J_GK(Z_1 ∧ Z_2). Hence, using C̃_{S,1}(α) < α and I(Z''_1 ∧ Z_1) > α,

I(Z''_1 ∧ Z_1) − I(Z''_1 ∧ Z_2) ≥ I(Z''_1 ∧ Z_1) (1 − C̃_{S,1}(α)/α) > α − C̃_{S,1}(α) = R,

contradicting even the feasibility of Z''_1 to C_{S,1}(R), namely, the constraint (17b) with Z'_1 replaced by Z''_1. This completes the proof of the optimality of Z'_1 to C_{S,1}(R).
Next, consider an optimal solution Z′_1 to C_S,1(R), and show that it is also optimal for C̃_S,1(α). A chain of inequalities follows, in which the first inequality is by (17b), the second equality is by the optimality of Z′_1, and the last inequality follows from (40). Hence, the constraint (21b) holds, and so Z′_1 is a feasible solution for C̃_S,1(α). If, to the contrary, there were a strictly better solution Z″_1 for C̃_S,1(α), then Z″_1 could be shown to be a feasible solution for C_S,1(R) with a strictly larger objective value, contradicting the optimality of Z′_1. ⊓⊔

Main Results
The following extends the single-letter upper bound (36) in Proposition 3 to the multi-user case.
Proof. The upper bound C̃_S(α) ≤ α holds because nC̃_S(α) cannot exceed the unconstrained secrecy capacity of the compressed source Z̃_V, which, by (22) and (7), is upper bounded by H(Z̃_V) ≤ n[α + δ_n] for some δ_n → 0 as n → ∞. Next, to prove that the equality condition is sufficient, suppose α ≤ J_GK(Z_V). Then, each user can compress their source directly to a common secret key at rate α without any public discussion. Hence, C̃_S(α) = α as desired.
⊓⊔ N.b., unlike the two-user case in Proposition 3, the equality condition above, stated in terms of the multivariate Gács–Körner common information, is shown to be sufficient but not necessary. Nevertheless, necessity seems plausible, as no counter-example suggesting otherwise is known. As in Proposition 4, a duality can be proved in the multi-user case, relating the compressed secret key agreement problem to the constrained secret key agreement problem. Theorem 1. With C_S(R) and R_S defined in (9) and (12) respectively, the duality (43a) and (43b) holds for all α ≥ 0.
Proof. (43a) can be obtained from (43b) by setting α = α_S via a chain of inequalities in which (b) is given by (43b) with α = α_S, while (a) and (c) follow directly from (11), (13) and monotonicity. It follows that the inequalities (a) and (b) hold with equality; in particular, equality in (a) gives the desired conclusion. To show (43b), consider an optimal compressed secret key agreement scheme achieving C̃_S(α) for an arbitrary entropy limit α. It suffices to show that the discussion rate need not be larger than α − C̃_S(α). Let Z̃_V be the optimal compressed source and R̃_CO be the smallest rate of communication for omniscience of Z̃_V, which is given by (23) with Z_V replaced by Z̃_V. The discussion rate for the omniscience strategy is given by (22), which simplifies to α − C̃_S(α) as desired in the limit n → ∞. N.b., since the omniscience strategy is non-interactive, the desired result holds even if C_S and R_S are defined with non-interactive discussion.

⊓ ⊔
While it is obvious from the above proof that a compressed secret key agreement scheme can be used as a constrained secret key agreement scheme, yielding one of the best lower bounds for C_S(R) in [10], the result also means that converse results on constrained secret key agreement can be applied to compressed secret key agreement. In particular, upper bounds on C̃_S(α) may be obtained from upper bounds on C_S(R) such as those in [10]. It turns out that this approach can give better upper bounds which, surprisingly, are tight for the PIN model, as mentioned in Section 3.3. This leads to the following exact single-letter characterization of C̃_S(α).
Proof. (44a) follows easily from (44b) by setting the two terms in the minimization equal and solving for α. To show (44b), note that, by (31b), we have C_S^{-1}(γ) = (|V| − 2)γ for all γ < C_S(∞), because C_S(R) is non-decreasing and concave, and so it must be strictly increasing before it reaches C_S(∞). Now, by (43b), for any α ≥ 0 such that C̃_S(α) < C_S(∞), i.e., for α ≤ α_S, we have C̃_S(α) ≤ α/(|V| − 1). This establishes the "≤" direction of (44b). The bound is achievable by the same achieving scheme as in [10, Theorem 4.4], following the idea of decremental secret key agreement and the tree-packing protocol in [39]. More precisely, every |V| − 1 bits of edge variables forming a spanning tree are turned into one secret key bit by the tree-packing protocol. This gives rise to the factor |V| − 1 in (44), which corresponds to the number of edges in a spanning tree.
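The tree-packing step can be made concrete with a small sketch. On a hypothetical path-graph PIN with |V| = 4 users and 3 unit-entropy edge bits forming a spanning tree, the protocol below turns the |V| − 1 = 3 edge bits into one key bit: each internal user publicly announces the XOR of its two incident edge bits, every user recovers the first edge bit as the key, and the discussion reveals nothing about it.

```python
import itertools

def tree_protocol(bits):
    """Path-graph PIN: n-1 unit edge bits, users 0..n-1, user j holding
    the bits of its incident edges. Returns (per-user keys, public msgs)."""
    n = len(bits) + 1                        # number of users
    # internal user i announces the XOR of its two incident edge bits
    msgs = [bits[i - 1] ^ bits[i] for i in range(1, len(bits))]
    keys = [bits[0]]                         # user 0 takes b[0] as the key
    for j in range(1, n):
        # user j knows b[j-1]; XOR of msgs[0..j-2] equals b[0] ^ b[j-1]
        acc = bits[j - 1]
        for m in msgs[: j - 1]:
            acc ^= m
        keys.append(acc)
    return keys, msgs

# Correctness: every user recovers the same key, for all edge-bit values.
for bits in itertools.product([0, 1], repeat=3):   # 4 users, 3 edges
    keys, _ = tree_protocol(list(bits))
    assert keys == [bits[0]] * 4

# Secrecy: the public discussion is independent of the key bit.
msgs_given_key = {0: set(), 1: set()}
for bits in itertools.product([0, 1], repeat=3):
    keys, msgs = tree_protocol(list(bits))
    msgs_given_key[keys[0]].add(tuple(msgs))
assert msgs_given_key[0] == msgs_given_key[1]
print("4 users, 3 edge bits -> 1 secret key bit")
```

The exhaustive checks confirm the factor |V| − 1: three bits of compressed-source entropy yield one key bit, and the announced XORs are distributed identically whether the key bit is 0 or 1.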
⊓⊔ For the more general source model, the idea of decremental secret key agreement needs to be refined because there need not be any edge variables to remove. The following simple extension leads to a single-letter lower bound on C̃_S(α).
Theorem 3. A single-letter lower bound on C̃_S(α) is given by (45), for any random vector (Q, Z′_V) taking values from a finite set and satisfying (46). Furthermore, it is admissible to have |Q| ≤ 3.
Proof. By (46b), we have Z′_i = ξ_i(Z_i, Q) for some function ξ_i. W.l.o.g., let Q := {1, ..., k} for some integer k > 0. Choose Z̃_i to be the function of Z_i^n obtained by time sharing: Q acts as a time-sharing random variable where P_Q(q) is the fraction of time the source Z_i is processed into Z_i^{(q)} := ξ_i(Z_i, q), for 1 ≤ q ≤ k. More precisely, the fractions (n_q − n_{q−1})/n converge to P_Q(q), and so, by (46c), the entropy limit (7) is satisfied. Hence, Z̃_V is a valid compressed source, the unconstrained capacity of which is I(Z′_V | Q), leading to the desired lower bound (45).
The admissibility of |Q| ≤ 3 follows from the usual argument via the well-known Eggleston–Carathéodory theorem. More precisely, consider the set S of pairs achievable with a constant Q; its defining conditions are equivalent to (46a) and (46b) respectively, and so the set of feasible values for (46) is equal to the convex hull of S. Since the dimension of S is at most 2, the pair (C_S(Z′_V), α) can be obtained as a convex combination of at most 3 points in S, as desired by the Eggleston–Carathéodory theorem.

⊓ ⊔
The main results above can be illustrated using the hypergraphical source in Example 1 given earlier. In particular, an exact single-letter characterization of C̃_S(α) will be derived, even though such an exact characterization is not known for general hypergraphical sources.
Example 3. Consider the source defined in (15) in Example 1. It will be shown that (16a) and (16b) are satisfied with equality, which gives the desired single-letter characterization of C̃_S(α).
It is easy to show that J_GK(Z_V) = 1 since X_a is the maximum common function of Z_1, Z_2 and Z_3. Hence, the reverse inequality of (16a) follows from Proposition 5.
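The computation of such a maximum common function can be sketched as follows: joint outcomes that share the observation of some user must map to the same value of any common function, so the maximum common function labels the connected components of this agreement relation, and the Gács–Körner common information is the entropy of the component label. The sketch below uses a hypothetical three-user source with Z_1 = (X_a, X_b), Z_2 = (X_a, X_c) and Z_3 = (X_a, X_b, X_c) for i.i.d. uniform bits X_a, X_b, X_c (a stand-in, since (15) is defined earlier in the paper), for which the component label is exactly X_a:

```python
from itertools import product
from math import log2
from collections import Counter

def gk_common_info(outcomes):
    """Multivariate Gacs-Korner common information of equiprobable joint
    `outcomes` (tuples of per-user observations). Outcomes sharing a
    value in any coordinate are merged; the resulting component label is
    the maximum common function U, and the value returned is H(U)."""
    parent = {z: z for z in outcomes}
    def find(z):
        while parent[z] != z:
            parent[z] = parent[parent[z]]
            z = parent[z]
        return z
    for z in outcomes:
        for w in outcomes:
            if any(zi == wi for zi, wi in zip(z, w)):
                parent[find(z)] = find(w)
    sizes = Counter(find(z) for z in outcomes)
    n = len(outcomes)
    return -sum((s / n) * log2(s / n) for s in sizes.values())

# Hypothetical source: Xa, Xb, Xc i.i.d. uniform bits,
# Z1 = (Xa, Xb), Z2 = (Xa, Xc), Z3 = (Xa, Xb, Xc).
outcomes = [((xa, xb), (xa, xc), (xa, xb, xc))
            for xa, xb, xc in product([0, 1], repeat=3)]
print(gk_common_info(outcomes))   # 1.0: U = Xa
```

The two components of the agreement relation correspond to X_a = 0 and X_a = 1, giving one bit of common information.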
Inspired by the idea of decremental secret key agreement and its application to the constrained secret key agreement problem, we have formulated a multiterminal secret key agreement problem with a more general source compression step that applies beyond the hypergraphical source model. This formulation allows us to separate and compare the issues of source compression and discussion rate constraint in secret key agreement. While a single-letter characterization of the compressed secrecy capacity and the admissible entropy limit remains unknown, single-letter bounds have been derived, and they are likely to be tight for the hypergraphical model, and possibly for more general source models such as the finite linear source model [12]. For the PIN model, in particular, the bounds are tight, giving rise to a complete characterization of the capacity in Theorem 2.

One way to improve the current converse results is to show whether the equality condition in Proposition 5 is necessary, that is, whether C̃_S(α) < α for α > J_GK(Z_V). By the duality in Theorem 1, the condition is necessary if one can show that C_S(0) = J_GK(Z_V), i.e., that (28) holds with equality. Such equality can be proved for hypergraphical as well as finite linear sources by extending the lamination techniques in [10]. It is hoped that a complete solution can be given for the finite linear source model and the well-known jointly Gaussian source model. The bounds (43) in the duality result may plausibly be tight for these special sources, in which case non-interactive discussion suffices to achieve the constrained secrecy capacity.

The current achievability results may also be improved. In particular, for the two-user case with the joint entropy constraint (35), the lower bound in (45) can be improved to C̃_S(α) ≥ max I(Z′_1 ∧ Z′_2), where the maximization is over Z′_1 and Z′_2 satisfying I(Z′_1 ∧ Z_1) + I(Z′_2 ∧ Z_2) ≤ α and the Markov chain Z′_1 − Z_1 − Z_2 − Z′_2.
Whether this improvement is strict or is the best possible is not yet clear, but an extension to the multi-user case seems possible. A related open problem is to characterize C_S(R) in the two-user case with two-way non-interactive discussion. A simpler question is whether two-way non-interactive discussion can be strictly better than one-way discussion.
As pointed out before, by regarding the secrecy capacity as a measure of mutual information, an optimal source compression scheme translates to a dimension reduction technique potentially useful for machine learning. A closely related line of work is the study of the strong data processing inequality in [2,3,24], in particular, the ratio s*(Z_1; Z_2) := sup I(Z′_1 ∧ Z_2)/I(Z′_1 ∧ Z_1), where the supremum is taken over the choice of the conditional distribution P_{Z′_1|Z_1 Z_2} such that Z′_1 − Z_1 − Z_2 forms a Markov chain and I(Z′_1 ∧ Z_1) > 0. It is straightforward to show that sup_{α≥0} C̃_S(α)/α for the two-user case in (35) is upper bounded by both s*(Z_1; Z_2) and s*(Z_2; Z_1). However, a sharper bound and a more precise mathematical connection may be possible, and the result may be extended to the multivariate case. Furthermore, the linearization considered in [29] may potentially be adopted to provide a single-letter lower bound on the compressed secrecy capacity. As in [2,32], the problem may also be related to a notion of maximal correlation appropriately extended to the multivariate case.
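For discrete pairs, the maximal correlation mentioned above admits a well-known computation: it equals the second largest singular value of the matrix Q with entries Q(x, y) = P(x, y)/√(P_X(x)P_Y(y)), the largest singular value always being 1. A minimal sketch for binary alphabets (the helper name is hypothetical; the singular values of the 2×2 matrix are computed by hand rather than with a linear algebra library), checked on a doubly symmetric binary source where the maximal correlation is 1 − 2p:

```python
from math import sqrt

def maximal_correlation_2x2(P):
    """HGR maximal correlation of a pair of binary variables with joint
    pmf P (2x2 nested list): second singular value of
    Q[x][y] = P[x][y] / sqrt(Px[x] * Py[y])."""
    Px = [P[0][0] + P[0][1], P[1][0] + P[1][1]]
    Py = [P[0][0] + P[1][0], P[0][1] + P[1][1]]
    Q = [[P[x][y] / sqrt(Px[x] * Py[y]) for y in range(2)]
         for x in range(2)]
    # singular values of Q = sqrt of eigenvalues of Q^T Q (2x2 quadratic)
    a = Q[0][0]**2 + Q[1][0]**2
    d = Q[0][1]**2 + Q[1][1]**2
    b = Q[0][0]*Q[0][1] + Q[1][0]*Q[1][1]
    tr, det = a + d, a * d - b * b
    lam_min = (tr - sqrt(tr*tr - 4*det)) / 2
    # the largest singular value is always 1 (achieved by sqrt of the
    # marginals); the maximal correlation is the second one
    return sqrt(max(lam_min, 0.0))

p = 0.1   # doubly symmetric binary source with crossover probability p
P = [[(1 - p) / 2, p / 2], [p / 2, (1 - p) / 2]]
print(maximal_correlation_2x2(P))   # ~0.8, i.e., 1 - 2p
```

Such a spectral quantity, suitably extended to more than two variables, is one candidate for the multivariate notion of maximal correlation alluded to above.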