Unique Information and Secret Key Agreement

The partial information decomposition (PID) is a promising framework for decomposing a joint random variable into the amount of influence each source variable Xi has on a target variable Y, relative to the other sources. For two sources, influence breaks down into the information that both X0 and X1 redundantly share with Y, what X0 uniquely shares with Y, what X1 uniquely shares with Y, and finally what X0 and X1 synergistically share with Y. Unfortunately, considerable disagreement has arisen as to how these four components should be quantified. Drawing from cryptography, we consider the secret key agreement rate as an operational method of quantifying unique information. Secret key agreement rate comes in several forms, depending upon which parties are permitted to communicate. We demonstrate that three of these four forms are inconsistent with the PID. The remaining form implies certain interpretations as to the PID’s meaning—interpretations not present in PID’s definition but that, we argue, need to be explicit. Specifically, the use of a consistent PID quantified using a secret key agreement rate naturally induces a directional interpretation of the PID. We further reveal a surprising connection between third-order connected information, two-way secret key agreement rate, and synergy. We also consider difficulties which arise with a popular PID measure in light of the results here as well as from a maximum entropy viewpoint. We close by reviewing the challenges facing the PID.


Introduction
Consider a joint distribution over "source" variables X 0 and X 1 and "target" Y. Such distributions arise in many settings: sensory integration, logical computing, neural coding, functional network inference, and many others. One promising approach to understanding how the information shared between (X 0 , X 1 ), and Y is organized is the partial information decomposition (PID) [1]. This decomposition seeks to quantify how much of the information shared between X 0 , X 1 , and Y is done so redundantly, how much is uniquely attributable to X 0 , how much is uniquely attributable to X 1 , and finally how much arises synergistically by considering both X 0 and X 1 together.
Unfortunately, the lack of a commonly accepted method of quantifying these components has hindered PID's adoption. In point of fact, several proposed axioms are not mutually consistent [2,3]. Furthermore, to date, there is little agreement as to which should hold. Here, we take a step toward understanding these issues by adopting an operational definition for the unique information. This operational definition comes from information-theoretic cryptography and quantifies the rate at which two parties can construct a secret while a third party eavesdrops. Said more simply, for a source and the target to uniquely share information, no other variables can have any portion of that information-their uniquely shared information is a secret that only the source and target have.
There are four varieties of secret key agreement rate depending on which parties are allowed to communicate, each of which defines a different PID. Each variety also relates to a different intuition as to how the PID operates. We discuss several aspects of these different methods and further demonstrate that three of the four fail to construct an internally consistent decomposition. The surviving method induces a directionality on the PID that has not been explicitly considered before.
Our development proceeds as follows. Section 2 briefly describes the two-source PID. Section 3 reviews the notion of secret key agreement rate and how to quantify it in three contexts: no one communicates, only Alice communicates, and both Alice and Bob communicate. Section 4 discusses the behavior of the PID quantified utilizing secret key agreement rates as unique information and what intuitions are implied by the choice of who is permitted to communicate. Section 5 compares the behavior of the one consistent secret key agreement rate PID with several others proposed in the literature. Section 6 explores two further implications of our primary results, first in a distribution where two-way communication seems to capture synergistic, third-order connected information and second in the behavior of an extant method of quantifying the PID along with maximum entropy methods. Finally, Section 7 summarizes our findings and speculates about PID's future.

Partial Information Decomposition
Two-source PID seeks to decompose the mutual information I[X 0 X 1 : Y] between "sources" X 0 and X 1 and a "target" Y into four nonnegative components. The components identify information that is redundant, uniquely associated with X 0 , uniquely associated with X 1 , and synergistic: Furthermore, the mutual information I[X 0 : Y] between X 0 and Y is decomposed into two components: unique with X 0 (2) and, similarly: In this way, PID relates the four component pieces of information. However, since Equations (1) to (3) provide only three independent constraints for four quantities; it does not uniquely determine how to quantify them in general. That is, this fourth constraint lies outside of the PID.
By the same logic, though, the decomposition is uniquely determined by quantifying exactly one of its constituents. In the case that one wishes to directly quantify the unique information , a consistency relation must hold so that they do not overconstrain the decomposition: This ensures that using either Equation (2) or Equation (3) results in the same quantification of

Secret Key Agreement
Secret key agreement is a fundamental concept within information-theoretic cryptography [4]. Consider three parties-Alice, Bob, and Eve-who each partially observe a source of common randomness, joint probability distribution A, B, E ∼ p(a, b, e), where Alice has access only to a, Bob b, and Eve e. The central challenge is to determine if it is possible for Alice and Bob to agree on a secret key of which Eve has no knowledge. The degree to which they may generate such a secret key immediately depends on the structure of the joint distribution A, B, E. It also depends on whether Alice and Bob are allowed to publicly communicate.
Concretely, consider Alice, Bob, and Eve each receiving n independent, identically distributed samples according to p(a, b, e)-Alice receiving A n , Bob B n , and Eve E n , where X n denotes a sequence of random variables X 1 , X 2 , . . . , X n . Note that, although each party's observations X i , X j are independent, the observations of different parties at the same time are correlated according to p(a, b, e). A secret key agreement scheme consists of functions f and g, as well as a protocol h for public communication allowing either Alice, Bob, neither, or both to communicate. In the case of a single party being permitted to communicate-say, Alice-she constructs C = h(A n ) and then broadcasts it to all parties over an authenticated channel. In the case that both parties are permitted communication, they take turns constructing and broadcasting messages of the form [5]. Said more plainly, Alice's public messages are a function of her observations and any prior public communication from both parties. Bob's public messages are a function of his observations and any prior communication from both parties.
Formally, a secret key agreement scheme is considered R-achievable if for all > 0: ≤ , and where (1) and (2) denote the method by which Alice and Bob construct their keys K A and K B , respectively, (3) states that their keys must agree with arbitrarily high probability, (4) states that the information about the key which Eve-armed with both her private information E n as well as the public communication C-has access is arbitrarily small, and (5) states that the key consists of approximately R bits per sample. The greatest rate R such that an achievable scheme exists is known as the secret key agreement rate. Notational variations indicate which parties are permitted to communicate. In the case that Alice and Bob are not allowed to communicate, their rate of secret key agreement is denoted S(A : B || E). When only Alice is allowed to communicate, their secret key agreement rate is S(A → B || E) or, equivalently, S(B ← A || E). When both Alice and Bob are allowed to communicate, their secret key agreement rate is denoted S(A ↔ B || E). In this, we modified the standard notation for secret key agreement rates to emphasize which party or parties communicate.
The secret key agreement rates obey a simple partial order. S(A : B || E) lower bounds both S(A → B || E) and S(A ← B || E), since no communication is a special case of one party communicating. Similarly, both S(A → B || E) and S(A ← B || E) lower bound S(A ↔ B || E), since only one party communicating is a special case of both parties communicating. Other than having some identical lower and upper bounds, S(A → B || E) and S(A ← B || E) are themselves generally incomparable.
In the case of no communication, S(A : B || E) is given by [6]: where X Y denotes the Gács-Körner common random variable [7]. It is worth noting that the entropy of this variable, the Gács-Körner common information, is not continuous under smooth changes in the probability distribution. It is also only nonzero for a measure-zero set of distributions within the simplex; specifically, the set of distributions whose joint events form bipartite graphs with multiple connected components [8].
In the case of one-way communication, S(A → B || E) is given by [9]: where the maximum is taken over all variables C and K, such that the following Markov condition holds: This quantifies the maximum amount of information that Bob can share with the key above the amount that Eve shares with the key, both given the public communication. It suffices, by the Fenchel-Eggleston strengthening of Carathéodory's theorem [10], to assume K and C alphabets are limited: |K| ≤ |A| and |C| ≤ |A| 2 .
There is no such closed-form, calculable solution for S(A ↔ B || E); however, various upper-and lower-bounds are known [5]. One simple lower bound is the supremum of the two one-way secret key agreement rates, as they are both a subset of bidirectional communication. An even simpler upper bound that we will use is the intrinsic mutual information [11]: This states that the amount of secret information that Alice and Bob share is no greater than their mutual information conditioned on any modification of Eve's observations. Here, E is an arbitrary stochastic function of E or alternatively the result of passing E through a memoryless channel.
The unique PID component I ∂ [X 0 → Y \ X 1 ] could be assigned the value of a secret key agreement rate under four different schemes. First, neither X 0 nor Y may be allowed to communicate. Second, only X 0 can communicate. Third, only Y is permitted to communicate. Finally, both X 0 and Y may be allowed to communicate. Note that the eavesdropper X 1 is not allowed to communicate in any secret sharing schemes here.
Secret key agreement rates have been associated with unique information before. One particular upper bound on S(A ↔ B || E)-the intrinsic mutual information Equation (7)-is known to not satisfy the consistency condition Equation (4) [12]. Rosas et al. [13] briefly explored the idea of eavesdroppers influence on the PID. More recently, the relationship between a particular method of quantifying unique information and one-way secret key agreement S(X 0 ← Y || X 1 ) has been considered [14]. Furthermore, there are analogous notions of secret key agreement rates within the channel setting, as opposed to the source setting considered here. We leave an analysis of that setting as future work.

Cryptographic Partial Information Decompositions
We now address the application of each form of secret key agreement rate as unique information in turn. For each resulting PID, we consider two distributions. The first is that called POINTWISE UNIQUE, chosen here to exemplify the differing intuitions that can be applied to the PID. The second distribution we look at is entitled PROBLEM as it serves as a counterexample demonstrating that three of the four forms of secret key agreement do not result in a consistent decomposition. Both distributions are given in Figure 1. Although we only consider two distributions here, their behaviors are rich enough for us to draw out the two main results of this work. Further examples are given in Section 5.
Interpreting the POINTWISE UNIQUE [15] distribution is relatively straightforward. The target Y takes on the values "1" and "2" with equal probability. At the same time, exactly one of the two sources (again with equal probability) will be equal to Y, while the other is "0". The mutual information I[X 0 : Y] = 1 /2 bit and I[X 1 : Y] = 1 /2 bit.
The PROBLEM distribution lacks the symmetry of POINTWISE UNIQUE, yet still consists of four equally probable events. The sources are restricted to take on pairs "00", "01", "02", "10". The target Y is equal to a "1" if either X 0 or X 1 is "1", and is "0" otherwise. With this distribution, the mutual information I[X 0 : Y] = 0.3113 bit and I[X 1 : Y] = 1 /2 bit.
Two distributions of interest: The first, POINTWISE UNIQUE, exemplifies the directionality inherent in the one-way secret key agreement rates. The second, PROBLEM, demonstrates that the no-communication, one-way communication with the source communicating ("camel"), and the two-way communication secret key agreement rates result in inconsistent PIDs (partial information decomposition).

No Public Communication
In the first case, we consider the unique information from X i to Y as the rate at which X i and Y can agree on a secret key while exchanging no public communication: This approach has some appeal: the PID is defined simply by a joint distribution without any express allowance or prohibition on public communication. However, given its quantification in terms of the Gács-Körner common information, the quantity S(X i : Y || X j ) does not vary continuously with the distribution of interest. Now, what is the behavior of this measure on our two distributions of interest?
When applied to POINTWISE UNIQUE, each source and the target are unable to construct a secret key. In turn, each unique information is determined to be 0 bits. This results in a redundancy and a synergy each of 1 /2 bit.
The PROBLEM distribution demonstrates the inability of S(X i : Y || X j ) to construct a consistent PID. In this instance, as in the case of POINTWISE UNIQUE, no secrecy is possible and each unique information is assigned a value of 0 bits. We therefore determine from Equation (2) that the redundancy should be I[ This contradiction demonstrates that no-communication secret key agreement rate cannot be used as a PID's unique components.
The resulting partial information decompositions for both distributions are listed in Table 1.

One-Way Public Communication
We next consider the situation when one of the two parties is allowed public communication. This gives us two options: either the source X i communicates to target Y or vice versa. Both situations enshrine a particular directionality in the resulting PID.
The first, where X i constructs C = h(X n i ) and publicly communicates it, emphasizes the channels X i → Y and the channel X 0 X 1 → Y. This creates a narrative of the sources conspiring to create the target. We call this interpretation the camel intuition, after the aphorism that a camel is a horse designed by committee. The committee member X i may announce what design constraints they brought to the table.
The second option, where Y constructs C = h(Y n ) and publicly communicates it, emphasizes the channels Y → X i and Y → X 0 X 1 . It implies the situation that the sources are imperfect representations of the target. We call this interpretation the elephant intuition, as it recalls the parable of the blind men describing an elephant for the first time. The elephant Y may announce which of its features is revealed in a particular instance.

Camels
The first option adopts , bringing to mind the idea of sources acting as inputs into some scheme by which the target is produced. When viewed this way, one may ask questions such as "How much information in X 0 is uniquely conveyed to Y?". Furthermore, the channels X 0 → Y, X 1 → Y, and X 0 X 1 → Y take center stage.
Through this lens, the POINTWISE UNIQUE distribution has a clear interpretation. Given any realization, exactly one source is perfectly correlated with the target, while the other is impotently "0". From this vantage, it is clear that the unique information should each be 1 /2 bit, and this is borne out with the one-way secret key agreement rate. To see this, consider X 0 broadcasting each time they observed a "1" or a "2". This corresponds to the joint events "101" and "202", respectively. It is clear that, when considering just these two events, X 0 and Y can safely utilize their observations which will agree exactly and with which X 1 has no knowledge. Since these events occur half the time, we conclude that the secret key agreement rate is 1 /2 bit. This implies that the redundancy and synergy of this decomposition are both 0 bits.
For the PROBLEM distribution, we find that X 1 can broadcast the times when they observed a "1" or a "2", which correspond to Y having observed a "1" or "0", respectively. In both instances, X 0 observed a "0" and so cannot deduce what the other two have agreed upon. This leads to S(X 1 → Y || X 0 ) being equal to 1 /2 bit. At the same time, S(X 0 → Y || X 1 ) vanishes. However, PROBLEM's redundancy and synergy cannot be quantified, since the two secret key agreement schemes imply different redundancies and so are inconsistent with Equation (4).
The resulting PIDs for both are given in Table 2.

Elephants
When the target Y is the one party permitted communication, one adopts I ∂ X i → Y \ X j = S(X i ← Y || X j ) and we can interpret the sources as alternate views of the singular target. Consider, for example, journalism where several sources give differing perspectives on the same event. When viewed this way, one might ask a question such as "How much information in Y is uniquely captured by X 0 ?". The channels X 0 ← Y, X 1 ← Y, and X 0 X 1 ← Y are paramount with this approach. We denote these in reverse to emphasize that Y is still the target in the PID.
Considered this way, the POINTWISE UNIQUE distribution takes on a different character. The sources each receive identical descriptions of the target-accurate half the time and erased the remainder. The description is identical, however. Nothing is uniquely provided to either source. This is reflected in the secret key agreement rates: Y can broadcast her observation, restricting events to either "011" and "101" or to "022" and "202". In either case, these restrictions do not help in the construction of a secret key since Y cannot further restrict to cases where it is X 0 that agrees with her and not X 1 . This makes each unique information 0 bits, leaving both the redundancy and synergy 1 /2 bit.
The PROBLEM distribution's unique information are S(X 0 ← Y || X 1 ) = 0 bit and S(X 1 ← Y || X 0 ) = 0.1887 bit. Unlike the prior two decompositions, this unique information satisfies Equation (4). The resulting redundancy is 0.3113 bit while the synergy is 1 /2 bit.
Their PIDs are listed in Table 3. Thus, by having Y publicly communicate and thus invoking a particular directionality, we finally get a consistent PID. Table 3. PID (partial information decomposition) for POINTWISE UNIQUE and PROBLEM when quantified using one-way communication secret key agreement rate with the target permitted public communication: POINTWISE UNIQUE decomposes into 1 /2 bit for either unique information, and into 0 bit for both the redundancy and synergy. PROBLEM admits unique information of 0 bit and 0.1887 bit, respectively. This results in a redundancy of 0.3113 bit and a synergy of 1 /2 bit, providing a consistent PID.

Two-Way Public Communication
We finally turn to the full two-way secret key agreement rate: . This approach is also appealing, as it does not ascribe any directionality to interpreting the PID. Furthermore, it varies continuously with the distribution, unlike the no-communication case. However, this quantity is generally impossible to compute directly, with only upper and lower bounds known. Fortunately, this only slightly complicates the analyses we wish to make.
In the case of the POINTWISE UNIQUE distribution, it is not possible to extract more secret information than was done in the camel situation. Therefore, the resulting PID is identical: unique information of 1 /2 bit and redundancy and synergy of 0 bits. PROBLEM, however, is again a problem. In this instance, upper and lower bounds on S(X 1 ↔ Y || X 0 ) converge: the larger of the two one-way secret key agreement rates form a lower bound of 1 /2 bit, while the upper bound provided by the intrinsic mutual information is also 1 /2 bit, and so we know this value exactly. Utilizing the consistency relation Equation (4), we find that the other unique information must be 0.3113 bit in order for the full decomposition to be consistent.
However, the intrinsic mutual information places an upper bound of 0.1887 bit on S(X 0 ↔ Y || X 1 ). We therefore must conclude that two-way secret key agreement rates cannot be used to directly quantify unique information and a consistent PID cannot be built using them.
The resulting PIDs for both these distributions can be seen in Table 4. Table 4. PID for POINTWISE UNIQUE and PROBLEM when quantified using two-way communication secret key agreement rate: POINTWISE UNIQUE decomposes into 1 /2 bit for either unique information, and into 0 bits for both the redundancy and synergy. PROBLEM's redundancy and synergy cannot be quantified because the two secret key agreement rates result in different quantifications indicated by the symbol .

Summary
To conclude, then, there is only one secret-key communication scenario-Y publicly communicates-that yields a consistent PID, as in Table 3. The above arguments by counterexample pruned away the unworkable scenarios, narrowing to only one. Naturally, this does not constitute proof that the remaining scenario always leads to a consistent PID. The narrowing, though, allowed us to turn to numerical searches using the dit [16] software package. Extensive searches were unable to find a counterexample. Thus, practically, with a high probability, this scenario leads to consistent PIDs.
Though this is the singular viable secret key agreement rate-based PID, we hesitate to fully endorse its use due to the necessary directionality that comes with it. That is, one must invoke a directionality, unspecified by the PID, to have a consistent PID when using secret key agreement as the basis for the PID component of unique information. It is not immediately obvious as to why only S(X i ← Y || X j ) results in a viable partial information decomposition, if this is indeed the case. We leave a proof as to whether (4) where both optimizations are performed over the space is the only secret key agreement rate resulting in a viable PID due to the fact that the spaces in which the solutions are found are identical. The reasons why the other three secret key agreement rates fail to form consistent decompositions are likely particular to each scenario. In the case of no communication, the limitations carried over from the Gács-Körner common information play a major role-specifically, that it vanishes even for weakly mixing distributions. In the case of the one-way "camel" secret key agreement rate, it is possible that the failure arises from the optimization spaces of each unique information being different. Finally, for the case of two-way communication, we offer several speculations in Section 6.1.
These measures of unique information can also be applied within the multivariate sources setting [1], though, like other measures of unique information [17,18], these measures cannot fully quantify the general decompositions there.

Examples
We next demonstrate the behavior of the partial information decomposition quantified using I ∂ X i → Y \ X j = S(X i ← Y || X j ), herein denoted I ← . On many of the "standard" distributions-RDN, UNQ, XOR-I ← behaves as intuited by Griffith [19]. Here, we compare its behavior with that of several other proposed measures-I min [1], I proj [20], I BROJA [17], I ccs [21], I dep [18], I ± [15], and I RR [22]-across five distributions-AND, DIFF, NOT TWO, PNT. UNQ., and TWO BIT COPY. These distributions are given in Figure 2. Note that Reference [14] proved that the secret key agreement rates S(X i ← Y || X j ) lower bound the unique information of I BROJA . And Two Bit Copy The AND distribution-the first set of results in Table 5-yields the same decomposition under I min , I proj , I BROJA , and I ← . In each of these cases, there is no unique information, resulting in 0.311 bit of redundancy and 1 /2 bit of synergy. I ± produces negative unique values of −1 /4 bit with a redundancy of 0.561 bit and a synergy of 3 /4 bit. I ccs , I dep , and I RR all produce positive unique values, indicating that, at least in this case, they interpret unique information as something more than the ability of source and target to agree on a secret when the target can communicate.
The DIFF distribution-the second set of results in Table 5-is named so due to its keen ability to differentiate PID measures. In this example, two pairs of PIDs coincide: I min with I ccs and I BROJA with I dep . In magnitude, I ← attributes the least to unique information for this distribution, while both I BROJA and I dep attribute the most.
The NOT TWO distribution-the third set of results in Table 5-is named due to it consisting of all binary events over three variables that do not contain two "1"s. As far as the behavior of the various PIDs goes, the distribution is very similar to the AND distribution. I min , I proj , I BROJA , and I ← all allot 0 bit to the unique information, while I ± finds the unique information to be negative. Here, I ccs also finds them to be negative. There is a major difference between the NOT TWO and AND distributions, however: the NOT TWO possesses a great deal of third-order information, while the AND possesses none. This indicates that, although the PIDs of the two distributions are qualitatively similar, their structures are in fact quite distinct.
Next, we consider the PNT. UNQ. distribution-the fourth set of results in Table 5-again. We see that I min , I proj , I BROJA , and I ← behave as elephants, while I ccs and I ± behave as camels. I RR does not commit to either but leans towards elephant, while I dep splits the difference.
Finally, we consider the venerable TWO BIT COPY distribution, and the final set of results in Table 5. Here, I min and I ± stand out as assigning one bit to redundant information and one bit to synergy, while assigning nothing to unique. All other measures assign one bit to each unique piece of information and nothing to redundancy and synergy. Note that this is not a directionality issue: all four secret key agreement rates agree that the rate at which a secret key can be constructed is 1 bit.

Discussion
We now turn to two follow-on developments arising from the tools developed thus far. First, we define a distribution whose two-way secret key agreement rates behave in a curious manner with very interesting implications regarding the nature of information itself. Second, we take a closer look at an alternative proposal for quantifying unique information and describe its behavior in relationship to the camel-elephant dichotomy defined in Section 4.

When Conversation Is More Powerful Than a Lecture
We now explore the PID quantified by two-way secret key agreement further. Though this does not generally form a consistent decomposition, in our exploration of its behavior, we discovered an interesting phenomena independent of its use as a measure of unique information. Consider the GIANT BIT distribution, which exemplifies redundant information. The distribution G.B. ERASED, resulting from passing each variable through an independent binary erasure channel (BEC), exhibits many interesting properties. It is listed in Figure 3. Most notably, the one-way secret key agreement rates between any two variables with the third eavesdropping vanish. However, the two-way secret key agreement rate is equal to pp 2 = I X i : Y|X j [23]. Furthermore, notice that subtracting Equation (3) from Equation (1) tells us that: That is, the conditional mutual information is equal to unique information plus synergistic information.
Evaluating the PID using S(X i ↔ Y || X j ) as unique information results, in this case, in a consistent decomposition. Furthermore, the redundant and synergistic information are zero. This is, however, troublesome: G.B. ERASED possesses nonzero third-order connected information [24], a quantity commonly considered a component of the synergy [18]. Indeed, it is provably attributed to synergy by both the I dep [18] and I BROJA [17] methods, and likely others as well. No other proposed method of quantifying the PID results in zero redundancy or synergy. The implication here is that, if indeed the third-order connected information is a form of synergy, the two-way secret key agreement rate overestimates unique information by including some types of synergistic effect. Therefore, we conclude that bidirectional communication between two parties can, in some instances, determine information held solely in trivariate interactions. This may underlie the inability of a two-way secret key agreement rate to form a consistent partial information decomposition: in some instances, the third-order connected information is attributed to both unique information when it should be attributed solely to the synergistic information. It remains to understand (i) how independently and identically transforming a distribution with no third-order connected information can result in its creation and (ii) how only two of the variables can recover it when allowed to communicate. Figure 3. Distribution whose one-way secret key agreement rates are all 0 bits, yet has a nonzero two-way secret key agreement rate. It is constructed from the GIANT BIT distribution by passing each variable independently through a binary erasure channel BEC(p) with erasure probability p. This distribution has a two-way secret key agreement rate of pp 2 between any two variables with the third as an eavesdropper.

I BROJA , the Elephant
The measure I BROJA of Bertschinger et al. [17] is perhaps the most widely accepted and used method of quantifying the PID. Though popular, it has its detractors [15,21]. Here, we interpret the criticisms leveled and I BROJA as a product of camel intuitions being applied to an elephantesque [14] measure. In doing so, we will primarily consider the POINTWISE UNIQUE distribution.
As noted in Section 4.2, if a source is permitted to communicate with the target, then a secret key agreement rate of 1 /2 bit is achievable, while, if the target communicates with the source, then it is impossible to agree on a secret key. From this camel perspective, it is clear that each source uniquely determines the target, half the time. The elephant perspective, however, allots nothing to unique information as each source is provided with identical information. This would greatly disconcert the camel and may lead one to think that the elephant has "artificially inflated" the redundancy [21]. We next take a closer look at this notion, using I BROJA .
In the course of computing I BROJA for the distribution p(X 0 , X 1 , Y), the set of distributions: is considered. The ( * ) assumption [17] is then invoked, which states that redundancy and all unique information are constant within this family of distributions. To complete the quantification, the distribution with minimum I[X 0 X 1 : Y] is selected from this family. The resulting distribution associated with the POINTWISE UNIQUE distribution can be seen in Figure 4. Made explicit, it can now be seen that I BROJA does indeed correlate the sources, but, under assumption ( * ), this does not affect the redundancy. One aspect of I BROJA and assumption ( * ) we believe warrants further investigation is its relationship with maximum entropy philosophy [25]. The latter is, in effect, Occam's razor applied to probability distributions: given a set of constraints, the most natural distribution to associate with them is that with maximum entropy. As it turns out, this is equivalent to the distribution nearest the unstructured product-of-marginals distribution p(x, y, z, . . .) = p(x)p(y)p(z) . . . [26]: where D KL [P || P] is the relative entropy between distributions P and Q. Having briefly introduced the ideas behind maximum entropy, we next cast their light on the BROJA optimization employed to calculate I BROJA .
Let us first consider the distribution resulting from BROJA optimization. Its entropy is unchanged from the POINTWISE UNIQUE distribution indicating that it has the same amount of structure-they are equally distant from the product distribution. The BROJA distribution has reduced I[X 0 X 1 : Y] mutual information, however, indicating perhaps that the optimization has shifted some of the distribution's structure away from the sources-target interaction. It is interesting that this optimization could not simply remove the synergy from the distribution altogether, resulting in a larger entropy.
If one takes assumption ( * ) and directly applies the maximum entropy philosophy, a different distribution results. This distribution, seen in Figure 4, has a larger entropy than both the POINTWISE UNIQUE and the BROJA intermediate distribution, indicating that it in fact has less structure than either. Under assumption ( * ), the MAXENT distribution, also in Figure 4, retains all the redundant and unique information, while under maximum entropy it contains no structure not implied by the source-target marginals-specifically, no synergy. The combination of assumption ( * ) and maximum entropy philosophy does not, however, result in a viable partial information decomposition; the maximization of entropy can result in source-target mutual information which exceeds that of the original distribution. To be clear, this is not to claim that assumption ( * ) or BROJA optimization are wrong or incorrect, only that the optimization's behavior in light of well-established maximum entropy principles is subtle and requires careful investigation. For example, it may be that the source-target marginals do imply some level of triadic interaction and therefore the maximum entropy distribution reflects this lingering synergy. At the same time, BROJA minimization may be capable of maintaining that level of structure implied by the marginals, but somehow shunts it into H[Y|X 0 X 1 ] .

Conclusions
At present, a primary barrier for PID's general adoption as a useful and possibly central tool in analyzing how complex systems store and process information is an agreement on a method to quantify its component information. Here, we posited that one reason for disagreement stems from conflicting intuitions regarding the decomposition's operational behavior. To give an operational meaning to unique information and address these intuitions, we equated unique information with the ability of two parties to agree on a secret-a reasonably intuitive operationalization of what it means for two variables to share a piece of information that no others have. This led to numerous observations. The first is that the PID, as currently defined, is ambivalent to any notion of directionality. There are, however, very clear cases in which the assumption of a directionality-or lack thereof-is critical to the existence of unique information. Consider, for example, the case of the McGurk effect [27] where the visual stimulus of one phoneme and the auditory stimulus of another phoneme gives rise to the perception of a third phoneme. By construction, the stimuli cause the perception, and the channels implicit in a camel intuition are central. If one were to study this interaction using an elephant-like PID, it is unclear that the resulting decomposition would reflect the neurobiological mechanisms by which the perception is produced. Similarly, a camel-like measure would be inappropriate when interpreting simultaneous positron emission tomography (PET) and magnetic resonance imaging (MRI) scans of a tumor.
One can view this as the PID being inherently context-dependent and conclude that quantification requires specifying directionality. In this case, the elephant intuition is apparently more natural, as adopting closely-related notions from cryptography results in a consistent PID. If context demands the camel intuition, though, either a noncryptographic method of quantifying unique information is needed or consistency must be enforced by augmenting the secret key agreement rate. It is additionally possible that associating secret key agreement rates with unique information is fundamentally flawed and that, ultimately, PID entails quantifying unique information as something distinct from the ability to agree on a secret key. Whatever is missing has yet to be identified.
The next observation concerns the third-order connected information. We first demonstrated that such triadic information can be constructed from common information in which each constituent variable is independently and identically modified. Furthermore, it was shown that any two of three parties, when engaging in bidirectional communication, capture the triadic information. Granted, this does not generically occur. For example, if X 0 X 1 Y are related by XOR, the distribution contains 1 bit of third-order connected information, but S(X 0 ↔ Y || X 1 ) (or any permutation of the variables) is equal to 0 bits. This suggests that the third-order connected information may not be an atomic quantity, but rather consists of two parts, one accessible to two communicating parties and one not. This idea has been explored in a different context in Reference [28].
Our third observation regards the behavior of the I BROJA measure, especially in relation to standard maximum entropy principles. We first demonstrated that I BROJA indeed correlates sources, but argued that this behavior only seems inappropriate when adopting a camel intuition. We then discussed how its intermediate distribution is as structured as the initial one and so, if indeed I BROJA is operating correctly, it must shunt the dependencies that result in synergy to another aspect of the distribution. Finally, we discussed how the standard maximum entropy approach may remove synergy from a distribution all together. This calls for a more careful investigation as to whether it does (and BROJA optimization is incorrect) or does not and if synergistic information arises from source-target marginals and Occam's razor.
Looking to the future, we trust that this exploration of the relationship between cryptographic secrecy and unique information will provide a basis for future efforts to understand and quantify the partial information decomposition. Furthermore, the explicit recognition of the role that directional intuitions play in the meaning and interpretation of a decomposition should reduce cross-talk and improve understanding as we collectively move forward.
Author Contributions: All authors contributed equally to this work.
Funding: This material is based upon work supported by, or in part by, Foundational Questions Institute grant FQXi-RFP-1609, the U.S. Army Research Laboratory and the U.S. Army Research Office under contracts W911NF-13-1-0390 and W911NF-13-1-0340 and grant W911NF-18-1-0028, and via Intel Corporation support of the Complexity Sciences Center as an Intel Parallel Computing Center.