# BROJA-2PID: A Robust Estimator for Bivariate Partial Information Decomposition


Institute of Computer Science, University of Tartu, Ülikooli 17, 51014 Tartu, Estonia

Author to whom correspondence should be addressed.

These authors contributed equally to this work.

Received: 22 February 2018 / Revised: 27 March 2018 / Accepted: 9 April 2018 / Published: 11 April 2018

(This article belongs to the Section Information Theory, Probability and Statistics)

Makkeh, Theis, and Vicente found that a Cone Programming model is the most robust way to compute the partial information decomposition measure of Bertschinger et al. (BROJA PID). We have developed production-quality, robust software that computes the BROJA PID measure based on this Cone Programming model. In this paper, we prove the important property of strong duality for the Cone Program and establish an equivalence between the Cone Program and the original Convex Program. We then describe our software in detail, explain how to use it, and report experiments comparing it to other estimators. Finally, we show that the software can be extended to compute some quantities of a trivariate PID measure.

For random variables $X,Y,Z$ with finite range, consider the mutual information $\mathrm{MI}(X;Y,Z)$: the amount of information that the pair $(Y,Z)$ contains about X. How can we quantify the contributions of Y and Z, respectively, to $\mathrm{MI}(X;Y,Z)$? This question is at the heart of (bivariate) partial information decomposition (PID) [1,2,3,4]. Information theorists agree that there can be: information shared redundantly by Y and Z; information contained uniquely within Y but not within Z; information contained uniquely within Z but not within Y; and information that synergistically results from combining both Y and Z. These quantities are denoted by $\mathrm{SI}(X;Y,Z)$, $\mathrm{UI}(X;Y\backslash Z)$, $\mathrm{UI}(X;Z\backslash Y)$, and $\mathrm{CI}(X;Y,Z)$, respectively. All four of these quantities add up to $\mathrm{MI}(X;Y,Z)$. Moreover, the total information that Y has about X decomposes into the information that Y has uniquely about X and the information that Y shares with Z about X, and similarly for Z; thus $\mathrm{SI}(X;Y,Z)+\mathrm{UI}(X;Y\backslash Z)=\mathrm{MI}(X;Y)$ and $\mathrm{SI}(X;Y,Z)+\mathrm{UI}(X;Z\backslash Y)=\mathrm{MI}(X;Z)$. Hence, if the joint distribution of $(X,Y,Z)$ is known, then there is (at most) one degree of freedom in defining a bivariate PID: fixing the value of one of the four information quantities determines the whole decomposition.
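These identities involve only standard mutual information terms, which are directly computable from the joint distribution. The following sketch (ours, not part of any package) computes them in nats from a joint distribution stored as a Python dictionary, using the And example that appears later in the paper:

```python
import math
from collections import defaultdict

# A sketch (ours, not part of the BROJA_2PID package) of computing the
# mutual information terms, in nats, from a joint distribution of (X, Y, Z)
# stored as a dictionary {(x, y, z): probability}.
def mutual_info(p, keep):
    """MI(X; V), where V consists of the coordinates selected by keep (1 = Y, 2 = Z)."""
    px, pv, pxv = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), prob in p.items():
        v = tuple((x, y, z)[i] for i in keep)
        px[x] += prob
        pv[v] += prob
        pxv[(x, v)] += prob
    return sum(prob * math.log(prob / (px[x] * pv[v]))
               for (x, v), prob in pxv.items() if prob > 0)

# X = Y AND Z, with Y, Z independent and uniform on {0, 1}:
andgate = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (0, 1, 0): 0.25, (1, 1, 1): 0.25}
mi_xy  = mutual_info(andgate, keep=(1,))    # MI(X; Y)
mi_xz  = mutual_info(andgate, keep=(2,))    # MI(X; Z)
mi_xyz = mutual_info(andgate, keep=(1, 2))  # MI(X; Y,Z)
```

The PID quantities themselves require solving an optimization problem, as described below; only the mutual information terms appearing on the right-hand sides of the identities are computable in this direct way.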

Bertschinger et al. [1] have given a definition of a bivariate PID in which the synergistic information is defined as

$$\mathrm{CI}(X;Y,Z):=\max\left(\mathrm{MI}(X;Y,Z)-\mathrm{MI}({X}^{\prime};{Y}^{\prime},{Z}^{\prime})\right),\tag{1}$$

where the maximum extends over all triples of random variables $({X}^{\prime},{Y}^{\prime},{Z}^{\prime})$ with the same 12,13-marginals as $(X,Y,Z)$, i.e., $P(X=x,Y=y)=P({X}^{\prime}=x,{Y}^{\prime}=y)$ for all $x,y$, and $P(X=x,Z=z)=P({X}^{\prime}=x,{Z}^{\prime}=z)$ for all $x,z$. It can easily be verified that this amounts to maximizing a concave function over a compact polyhedral set described by inequalities [5]. Hence, using standard theorems of convex optimization [5,6], BROJA’s bivariate PID can be efficiently approximated to any given precision.

In practice, computing CI has turned out to be quite challenging, owing to the fact that the objective function is not smooth on the boundary of the feasible region, which results in numerical difficulties for state-of-the-art interior point algorithms for solving convex optimization problems. We refer to [5] for a thorough discussion of this phenomenon.

Due to these challenges, and the need in the scientific computing community for reliable, easily usable software for computing the BROJA bivariate PID, we have made available on GitHub a Python implementation of our best method for computing it (https://github.com/Abzinger/BROJA_2PID/). The solver is based on a conic formulation of the problem, i.e., a Cone Program is used to compute the BROJA bivariate PID. This paper makes the following contributions. First, we prove the important property of strong duality for the Cone Program and prove an equivalence between the Cone Program and the original Convex problem (1). Second, we describe our software in detail and explain how to use it. Third, we test the software on different instances and compare the results with the computeUI estimator introduced in [7] and the ibroja estimator from the dit package (https://github.com/dit/dit). Finally, we show how to use the so-called exponential cone to model some quantities of the multivariate PID measure introduced in [8].

This paper is organized as follows. In the remainder of this section, we define some notation used throughout, and review the Convex Program for computing the BROJA bivariate PID from [1]. In the next section, we review the mathematics underlying our software to the extent necessary to understand how it works and how it is used. In Section 3, we walk the reader through an example of how to use the software, and then explain its inner workings and its use in detail. In Section 4, we present some computations on larger problem instances, discuss how the method scales up, and compare it to other methods. In Section 5, we show how to model, using the exponential cone, some quantities of a multivariate PID measure. We conclude the paper by discussing our plans for the future development of the code.

Denote by $\mathbf{X}$ the range of the random variable X, by $\mathbf{Y}$ the range of Y, and by $\mathbf{Z}$ the range of Z. We identify joint probability density functions with points in ${\mathbb{R}}^{\mathbf{W}}$ for the appropriate finite set $\mathbf{W}$; for example, the joint probability distribution of $(X,Y,Z)$ is a vector in ${\mathbb{R}}^{\mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$. (We measure information in nats, unless otherwise stated.) We use the following notational convention.

An asterisk stands for “sum over everything that can be plugged in instead of the ∗”; e.g., if $p,q\in {\mathbb{R}}^{\mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$, then
$${q}_{x,y,\ast}={\sum}_{z\in \mathbf{Z}}{q}_{x,y,z};\phantom{\rule{2.em}{0ex}}{p}_{\ast ,y,z}{q}_{\ast ,y,z}=\left({\sum}_{x\in \mathbf{X}}{p}_{x,y,z}\right)\left({\sum}_{x\in \mathbf{X}}{q}_{x,y,z}\right).$$
We do not use the symbol ∗ in any other context.
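In the dictionary representation used later in the paper, the asterisk convention amounts to summing over the starred coordinate; a minimal illustrative helper (ours, not package code):

```python
from collections import defaultdict

# Illustrative helper (not package code) implementing the asterisk
# convention: q_{x,y,*} sums q over the starred coordinate z.
def marg_xy(q):
    m = defaultdict(float)
    for (x, y, z), val in q.items():
        m[(x, y)] += val
    return dict(m)

marg_xy({(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.5})
# {(0, 0): 0.5, (1, 1): 0.5}
```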

We use the following notation for the marginal distributions of $(X,Y,Z)$, with p the joint probability density function of $(X,Y,Z)$:

$$\begin{array}{cc}\hfill {p}_{x,y,\ast}=\mathbf{P}\left(X=x\wedge Y=y\right)& \hfill \text{for all }x\in \mathbf{X},y\in \mathbf{Y}\\ \hfill {p}_{x,\ast ,z}=\mathbf{P}\left(X=x\wedge Z=z\right)& \hfill \text{for all }x\in \mathbf{X},z\in \mathbf{Z}.\end{array}$$

These notations allow us to write the Convex Program from [1] in a succinct way. Unraveling the objective function of (1), we find that, given the marginal conditions, it is equal, up to a constant not depending on ${X}^{\prime},{Y}^{\prime},{Z}^{\prime}$, to the conditional entropy $H({X}^{\prime}\mid {Y}^{\prime},{Z}^{\prime})$. Replacing maximizing $H(\dots )$ by minimizing $-H(\dots )$, we find (1) to be equivalent to the following Convex Program:

$$\begin{array}{ccc}\hfill \mathrm{minimize}\phantom{\rule{1.em}{0ex}}& \sum _{x,y,z}{q}_{x,y,z}\ln\frac{{q}_{x,y,z}}{{q}_{\ast ,y,z}}\hfill & \text{over }q\in {\mathbb{R}}^{\mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}\hfill \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& {q}_{x,y,\ast}={p}_{x,y,\ast}\hfill & \text{for all }(x,y)\in \mathbf{X}\times \mathbf{Y}\hfill \\ & {q}_{x,\ast ,z}={p}_{x,\ast ,z}\hfill & \text{for all }(x,z)\in \mathbf{X}\times \mathbf{Z}\hfill \\ & {q}_{x,y,z}\ge 0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}\tag{CP}$$
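The objective of this Convex Program equals $-H(X\mid Y,Z)$ under the optimization variable q; a small sketch (ours, for illustration) evaluating it on two simple distributions:

```python
import math
from collections import defaultdict

# Sketch (ours) evaluating the objective of the Convex Program above,
# sum_{x,y,z} q ln(q / q_{*,y,z}), which equals -H(X | Y, Z) under q.
def cp_objective(q):
    qyz = defaultdict(float)          # q_{*,y,z}
    for (x, y, z), val in q.items():
        qyz[(y, z)] += val
    return sum(val * math.log(val / qyz[(y, z)])
               for (x, y, z), val in q.items() if val > 0)

# For the And gate, X is determined by (Y, Z), so the objective is 0:
obj_and = cp_objective({(0, 0, 0): 0.25, (0, 0, 1): 0.25,
                        (0, 1, 0): 0.25, (1, 1, 1): 0.25})
# For X uniform and independent of (Y, Z), it is -H(X) = -ln 2:
obj_unif = cp_objective({(x, y, z): 0.125
                         for x in (0, 1) for y in (0, 1) for z in (0, 1)})
```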

In [5], we introduced a model for computing the BROJA bivariate PID based on so-called “Cone Programming”. Cone Programming is a far-reaching generalization of Linear Programming: the usual inequality constraints which occur in Linear Programs are replaced by so-called “generalized inequalities”; see below for details. As with Linear Programs, dedicated software is available for Cone Programs, but each type of generalized inequality (i.e., each cone) requires its own algorithms. The specific type of generalized inequality needed for the computation of the BROJA bivariate PID requires solvers for the so-called “Exponential Cone”, of which we are aware of two: ECOS [9] and SCS [10].

In the computational results of [5], we found that the Cone Programming approach (based on one of the available solvers) was, while not the fastest, the most robust of all the methods we tried for computing the BROJA bivariate PID, such as projected gradient descent, interior point methods for general convex programs, geometric programming, etc. The reason for this success is that the interior point method for Cone Programming extends the efficient interior point methods of Linear Programming; see [11] for more details. This is why our software is based on the Exponential Cone Programming model.

In this section, we review the mathematical definitions to the extent necessary to understand our model and the properties of the software based on it.

A nonempty closed convex cone $\mathcal{K}\subseteq {\mathbb{R}}^{m}$ is a closed set which is convex, i.e., for any $x,y\in \mathcal{K}$ and $0\le \theta \le 1$ we have
$$\theta x+(1-\theta )y\in \mathcal{K},$$
and is a cone, i.e., for any $x\in \mathcal{K}$ and $\theta \ge 0$ we have
$$\theta x\in \mathcal{K}.$$
For example, ${\mathbb{R}}_{+}^{m}$ is a closed convex cone. Cone Programming is a far-reaching generalization of Linear Programming, which may contain so-called generalized inequalities: for a fixed closed convex cone $\mathcal{K}\subseteq {\mathbb{R}}^{m}$ and any $a,b\in {\mathbb{R}}^{m}$, the generalized inequality “$a{\le}_{\mathcal{K}}b$” denotes $b-a\in \mathcal{K}$. Recall the primal-dual pair of Linear Programming. The primal problem is
$$\begin{array}{cc}\hfill \mathrm{minimize}\phantom{\rule{1.em}{0ex}}& {c}^{T}w\hfill \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& Aw=b\hfill \\ & Gw\le h\hfill \end{array}\tag{2}$$
over the variable $w\in {\mathbb{R}}^{n}$, where $A\in {\mathbb{R}}^{{m}_{1}\times n}$, $G\in {\mathbb{R}}^{{m}_{2}\times n}$, $c\in {\mathbb{R}}^{n}$, $b\in {\mathbb{R}}^{{m}_{1}}$, and $h\in {\mathbb{R}}^{{m}_{2}}$. Its dual problem is
$$\begin{array}{cc}\hfill \mathrm{maximize}\phantom{\rule{1.em}{0ex}}& -{b}^{T}\eta -{h}^{T}\theta \hfill \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& -{A}^{T}\eta -{G}^{T}\theta =c\hfill \\ & \theta \ge 0\hfill \end{array}\tag{3}$$
over the variables $\eta \in {\mathbb{R}}^{{m}_{1}}$ and $\theta \in {\mathbb{R}}^{{m}_{2}}$. There are two properties that the pair (2) and (3) may or may not have, namely weak and strong duality. The following definition makes these precise.

Consider a primal-dual pair of the Linear Program (2) and (3). Then, we define the following,

- 1.
- A vector $w\in {\mathbb{R}}^{n}$ (respectively, $(\eta ,\theta )\in {\mathbb{R}}^{{m}_{1}}\times {\mathbb{R}}^{{m}_{2}}$) is said to be a feasible solution of (2) (respectively, (3)) if $Aw=b$ and $Gw\le h$ (respectively, $-{A}^{T}\eta -{G}^{T}\theta =c$ and $\theta \ge 0$), i.e., none of the constraints in (2) (respectively, (3)) are violated by w (respectively, $(\eta ,\theta )$).
- 2.
- We say that (2) and (3) satisfy weak duality if for all w and all $(\eta ,\theta )$ feasible solutions of (2) and (3), respectively,$$-{b}^{T}\eta -{h}^{T}\theta \le {c}^{T}w.$$
- 3.
- If w is a feasible solution of (2) and $(\eta ,\theta )$ is a feasible solution of (3), then the duality gap d is$$d:={c}^{T}w+{b}^{T}\eta +{h}^{T}\theta .$$
- 4.
- We say that (2) and (3) satisfy strong duality if the following holds: feasible solutions w and $(\eta ,\theta )$ are optimal in (2) and (3), respectively, if and only if the duality gap d is zero.
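To make the definitions concrete, here is a toy numerical illustration with hand-picked values (ours, not an example from the paper): the one-variable LP “minimize w subject to $w\ge 1$”, written in the form (2) with $c=[1]$, $G=[-1]$, $h=[-1]$, and no equality constraints.

```python
# Toy illustration (hand-picked values, not from the paper) of weak duality
# and the duality gap for the LP pair (2)/(3):
# minimize w subject to w >= 1, i.e. c = [1], G = [-1], h = [-1].
c, G, h = 1.0, -1.0, -1.0

w = 2.0                      # primal feasible: G*w = -2 <= -1 = h
theta = 1.0                  # dual feasible: -G*theta = 1 = c and theta >= 0

primal_obj = c * w           # c^T w = 2
dual_obj = -h * theta        # -h^T theta = 1 <= 2 (weak duality)
gap = primal_obj - dual_obj  # duality gap d = c^T w + h^T theta = 1

w_opt = 1.0                  # at the primal optimum the gap closes:
gap_opt = c * w_opt - dual_obj  # = 0, certifying optimality of both solutions
```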

Weak duality always holds for a Linear Program; strong duality holds for a Linear Program whenever a feasible solution of (2) or (3) exists. These duality properties are used to certify the optimality of w and $(\eta ,\theta )$. The same concept of duality exists for Cone Programming. The primal cone problem is
$$\begin{array}{cc}\hfill \mathrm{minimize}\phantom{\rule{1.em}{0ex}}& {c}^{T}w\hfill \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& Aw=b\hfill \\ & Gw{\le}_{\mathcal{K}}h\hfill \end{array}\tag{P}$$
over the variable $w\in {\mathbb{R}}^{n}$, where $A\in {\mathbb{R}}^{{m}_{1}\times n}$, $G\in {\mathbb{R}}^{{m}_{2}\times n}$, $c\in {\mathbb{R}}^{n}$, $b\in {\mathbb{R}}^{{m}_{1}}$, and $h\in {\mathbb{R}}^{{m}_{2}}$. The dual cone problem is
$$\begin{array}{cc}\hfill \mathrm{maximize}\phantom{\rule{1.em}{0ex}}& -{b}^{T}\eta -{h}^{T}\theta \hfill \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& -{A}^{T}\eta -{G}^{T}\theta =c\hfill \\ & \theta {\ge}_{{\mathcal{K}}^{\ast}}0,\hfill \end{array}\tag{D}$$
where ${\mathcal{K}}^{\ast}:=\{u\in {\mathbb{R}}^{{m}_{2}}\mid {u}^{T}v\ge 0\text{ for all }v\in \mathcal{K}\}$ is the dual cone of $\mathcal{K}$. The entries of the vector $\eta \in {\mathbb{R}}^{{m}_{1}}$ are called the dual variables for the equality constraints $Aw=b$; those of $\theta \in {\mathbb{R}}^{{m}_{2}}$ are the dual variables for the generalized inequalities $Gw{\le}_{\mathcal{K}}h$. The primal-dual pair (P) and (D) of Conic Optimization satisfies weak and strong duality in the same manner as the Linear Programming pair. In the following, we define an interior point of a Cone Program, whose existence is a sufficient condition for strong duality (see Definition 1) to hold for the Conic Programming pair.

Consider a primal-dual pair of the Conic Optimization (P) and (D). Then, the primal problem (P) has an interior point $\tilde{x}$ if,

- $\tilde{x}$ is a feasible solution of (P).
- There exists $\epsilon >0$ such that for any $y\in {\mathbb{R}}^{{m}_{2}}$, we have $y\in \mathcal{K}$ whenever ${\parallel h-G\tilde{x}-y\parallel}_{2}\le \epsilon .$

(Theorem 4.7.1 [12]). Consider a primal-dual pair of the Conic Optimization (P) and (D). Let w and $(\eta ,\theta )$ be feasible solutions of (P) and (D), respectively. Then,

- 1.
- Weak duality always holds for (P) and (D).
- 2.
- If ${c}^{T}w$ is finite and (P) has an interior point $\tilde{w}$, then strong duality holds for (P) and (D).

If the requirements of Theorem 1 are met for a conic optimization problem, then weak and strong duality can be used as guarantees that the given solution of a Cone Program is optimal.

One of the closed convex cones which we use throughout the paper is the exponential cone ${\mathcal{K}}_{\mathrm{exp}}$ (see Figure 1), defined in [13] as
$$\{(r,t,q)\in {\mathbb{R}}^{3}\mid q>0\text{ and }q{e}^{r/q}\le t\}\cup \{(r,t,0)\in {\mathbb{R}}^{3}\mid r\le 0\text{ and }t\ge 0\},$$
which is the closure of the set
$$\{(r,t,q)\in {\mathbb{R}}^{3}\mid q>0\text{ and }q{e}^{r/q}\le t\}.$$
Its dual cone ${\mathcal{K}}_{\mathrm{exp}}^{\ast}$ (see Figure 1) is
$$\{(u,v,w)\in {\mathbb{R}}^{3}\mid u<0\text{ and }-u\cdot {e}^{w/u}\le e\cdot v\}\cup \{(0,v,w)\mid v\ge 0\text{ and }w\ge 0\},$$
which is the closure of the set
$$\{(u,v,w)\in {\mathbb{R}}^{3}\mid u<0\text{ and }-u\cdot {e}^{w/u}\le e\cdot v\}.$$

When $\mathcal{K}={\mathcal{K}}_{\mathrm{exp}}$ in (P), the Cone Program is referred to as “Exponential Cone Program”.
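A membership test for $\mathcal{K}_{\mathrm{exp}}$ can be sketched as follows (an illustrative version with a numerical tolerance, written by us; conic solvers such as ECOS handle cone membership internally):

```python
import math

# Illustrative membership test for K_exp with a numerical tolerance.
def in_exp_cone(r, t, q, tol=1e-12):
    if q > tol:                        # main piece: q * e^(r/q) <= t
        return q * math.exp(r / q) <= t + tol
    if abs(q) <= tol:                  # boundary piece: r <= 0 and t >= 0
        return r <= tol and t >= -tol
    return False                       # points with q < 0 are never in K_exp

in_exp_cone(-1.0, 1.0, 1.0)   # True:  1 * e^(-1) <= 1
in_exp_cone(1.0, 1.0, 1.0)    # False: e > 1
in_exp_cone(-0.5, 2.0, 0.0)   # True:  boundary piece, r <= 0 and t >= 0
```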

The Convex Program (CP) which computes the bivariate partial information decomposition can be formulated as an Exponential Cone Program. Consider the following Exponential Cone Program where the variables are $r,t,q\in {\mathbb{R}}^{\mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$.

$$\begin{array}{ccc}\hfill \mathrm{minimize}\phantom{\rule{1.em}{0ex}}& -\sum _{x,y,z}{r}_{x,y,z}\hfill & \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& {q}_{x,y,\ast}={p}_{x,y,\ast}\hfill & \text{for all }(x,y)\in \mathbf{X}\times \mathbf{Y}\hfill \\ & {q}_{x,\ast ,z}={p}_{x,\ast ,z}\hfill & \text{for all }(x,z)\in \mathbf{X}\times \mathbf{Z}\hfill \\ & {q}_{\ast ,y,z}-{t}_{x,y,z}=0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & (-{r}_{x,y,z},-{t}_{x,y,z},-{q}_{x,y,z}){\le}_{{\mathcal{K}}_{\mathrm{exp}}}0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}\tag{EXP}$$

The first two sets of constraints are the marginal equations of (CP). The third set of constraints connects the t-variables with the q-variables; we refer to these as the coupling equations. The generalized inequalities connect these variables to the ones forming the objective function.

The exponential cone program (EXP) is equivalent to the Convex Program (CP).

Let ${\mathsf{P}}_{\mathrm{CP}}\left(b\right)$ and ${\mathsf{P}}_{\mathrm{exp}}\left(b\right)$ be the feasible regions of (CP) and (EXP), respectively. We define the map
$$\begin{array}{ccc}\hfill f:{\mathsf{P}}_{\mathrm{CP}}\left(b\right)& \to \hfill & {\mathsf{P}}_{\mathrm{exp}}\left(b\right)\hfill \\ \hfill q& \mapsto \hfill & f{(q)}_{x,y,z}:=\left\{\begin{array}{cc}({q}_{x,y,z}\ln\frac{{q}_{\ast ,y,z}}{{q}_{x,y,z}},\phantom{\rule{4pt}{0ex}}{q}_{\ast ,y,z},\phantom{\rule{4pt}{0ex}}{q}_{x,y,z})\hfill & \text{if }{q}_{x,y,z}>0\hfill \\ (0,\phantom{\rule{4pt}{0ex}}{q}_{\ast ,y,z},\phantom{\rule{4pt}{0ex}}{q}_{x,y,z})\hfill & \text{if }{q}_{x,y,z}=0.\hfill \end{array}\right.\hfill \end{array}$$
For $q\in {\mathsf{P}}_{\mathrm{CP}}$, we have
$${(-1,0,0)}^{T}\cdot f{(q)}_{x,y,z}=\left\{\begin{array}{cc}{q}_{x,y,z}\ln\frac{{q}_{x,y,z}}{{q}_{\ast ,y,z}}\hfill & \text{if }{q}_{x,y,z}>0\hfill \\ 0\hfill & \text{if }{q}_{x,y,z}=0,\hfill \end{array}\right.$$
and since the contribution to the conditional entropy vanishes at ${q}_{x,y,z}=0$, the objective function of (CP) evaluated at $q\in {\mathsf{P}}_{\mathrm{CP}}$ is equal to that of (EXP) evaluated at $f\left(q\right)$. If $(r,t,q)\in {\mathsf{P}}_{\mathrm{exp}}\backslash \mathrm{Im}\left(f\right)$, then there exist $x,y,z$ such that ${r}_{x,y,z}<{q}_{x,y,z}\ln\frac{{t}_{x,y,z}}{{q}_{x,y,z}}$, and so
$$-\sum _{x,y,z}{r}_{x,y,z}>\sum _{x,y,z}{q}_{x,y,z}\ln\frac{{q}_{x,y,z}}{{t}_{x,y,z}}.$$
Hence, the minimum of (EXP) is attained on $\mathrm{Im}\left(f\right)$, and the two programs are equivalent. ☐

The dual problem of (EXP) is
$$\begin{array}{ccc}\hfill \mathrm{maximize}\phantom{\rule{1.em}{0ex}}& -\sum _{x,y}{\lambda}_{x,y}{p}_{x,y,\ast}-\sum _{x,z}{\lambda}_{x,z}{p}_{x,\ast ,z}\hfill & \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& {\nu}_{x,y,z}^{1}=-1\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & -{\mu}_{x,y,z}+{\nu}_{x,y,z}^{2}=0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & -{\mu}_{\ast ,y,z}-{\lambda}_{x,y}-{\lambda}_{x,z}+{\nu}_{x,y,z}^{3}=0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & ({\nu}_{x,y,z}^{1},{\nu}_{x,y,z}^{2},{\nu}_{x,y,z}^{3}){\ge}_{{\mathcal{K}}_{\mathrm{exp}}^{\ast}}0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$

Using the definition of ${\mathcal{K}}_{\mathrm{exp}}^{\ast}$, the system of constraints above is equivalent to
$${\lambda}_{x,y}+{\lambda}_{x,z}+{\mu}_{\ast ,y,z}+1+\ln(-{\mu}_{x,y,z})\ge 0\phantom{\rule{11.111pt}{0ex}}\text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z},$$
and so the dual problem of (EXP) can be formulated as
$$\begin{array}{ccc}\hfill \mathrm{maximize}\phantom{\rule{1.em}{0ex}}& -\sum _{x,y}{\lambda}_{x,y}{p}_{x,y,\ast}-\sum _{x,z}{\lambda}_{x,z}{p}_{x,\ast ,z}\hfill & \\ \hfill \text{subject to}\phantom{\rule{1.em}{0ex}}& {\lambda}_{x,y}+{\lambda}_{x,z}+{\mu}_{\ast ,y,z}+1+\ln(-{\mu}_{x,y,z})\ge 0\hfill & \text{for all }(x,y,z)\in \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}\tag{D-EXP}$$

Strong duality holds for the primal-dual pair (EXP) and (D-EXP).

We assume that ${p}_{x,y,\ast},{p}_{x,\ast ,z}>0.$ Consider the point $\tilde{s}$ with ${\tilde{s}}_{x,y,z}=({\tilde{r}}_{x,y,z},{\tilde{t}}_{x,y,z},{\tilde{q}}_{x,y,z})$ defined by
$$\begin{array}{cc}\hfill {\tilde{q}}_{x,y,z}& :=\frac{{p}_{x,y,\ast}\cdot {p}_{x,\ast ,z}}{{p}_{x,\ast ,\ast}}\hfill \\ \hfill {\tilde{t}}_{x,y,z}& :={\tilde{q}}_{\ast ,y,z}\hfill \\ \hfill {\tilde{r}}_{x,y,z}& :={\tilde{q}}_{x,y,z}\ln\frac{{\tilde{t}}_{x,y,z}}{{\tilde{q}}_{x,y,z}}-100.\hfill \end{array}$$
Then $\tilde{s}$ is an interior point of (EXP); we refer to [14] for the proof. Hence, by Theorem 1, strong duality holds for the primal-dual pair (EXP) and (D-EXP). ☐
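As a numerical sanity check (our own, reading the numerator inside the logarithm as $\tilde{q}_{\ast ,y,z}=\tilde{t}_{x,y,z}$), one can verify on the And distribution that $\tilde{s}$ satisfies the generalized inequality of (EXP) strictly:

```python
import math
from collections import defaultdict

# Numerical sanity check (ours) of the interior point on the And distribution.
p = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (0, 1, 0): 0.25, (1, 1, 1): 0.25}
pxy, pxz, px = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y, z), v in p.items():
    pxy[(x, y)] += v
    pxz[(x, z)] += v
    px[x] += v

# q~_{x,y,z} := p_{x,y,*} * p_{x,*,z} / p_{x,*,*}, on the marginal supports
qt = {(x, y, z): pxy[(x, y)] * pxz[(x2, z)] / px[x]
      for (x, y) in pxy for (x2, z) in pxz if x2 == x}
qyz = defaultdict(float)                 # t~_{x,y,z} = q~_{*,y,z}
for (x, y, z), v in qt.items():
    qyz[(y, z)] += v

# Strict version of the generalized inequality: q~ * e^(r~/q~) < t~,
# with r~ = q~ ln(t~/q~) - 100 as in the proof.
strict = all(
    v * math.exp((v * math.log(qyz[(y, z)] / v) - 100.0) / v) < qyz[(y, z)]
    for (x, y, z), v in qt.items())
```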

Weak and strong duality, in their turn, provide a measure of the quality of the returned solution; for more details, see Section 3.4.

We implemented the exponential cone program (EXP) in Python and use a conic optimization solver to obtain the desired solution. Note that we are aware of only two conic optimization software toolboxes which can solve Exponential Cone Programs: ECOS and SCS. The current version of Broja_2pid utilizes ECOS to solve the Exponential Cone Program (EXP). ECOS (we use the version from 8 November 2016) is a lightweight numerical software package for solving Convex Cone Programs [9].

This section describes the Broja_2pid package from the user’s perspective. We briefly explain how to install Broja_2pid. Then, we illustrate the framework of Broja_2pid and its functions. Further, we describe the input, the tuning parameters, and the output.

To install Broja_2pid, you need Python to be installed on your machine. Currently, you also need to install ECOS, the Exponential Cone solver; most likely, `pip3 install ecos` suffices. If there is trouble installing ECOS, we refer to its GitHub repository, https://github.com/embotech/ecos-python. Finally, you need to `git clone` the GitHub repository of Broja_2pid, and it is ready to use.

In this subsection, we explain how Broja_2pid works. In Figure 2, we present a script as an example of using the Broja_2pid package to compute the partial information decomposition of the And distribution, $X=Y\ \mathrm{AND}\ Z$, where Y and Z are independent and uniformly distributed in $\{0,1\}$.

We will go through the example (Figure 2) to explain how Broja_2pid works. The main function in the Broja_2pid package is `pid()`. It is a wrapper function which is used to compute the partial information decomposition. First, `pid()` prepares the “ingredients” of (EXP). Then, it calls the Cone Programming solver to find the optimal solution of (EXP). Finally, it receives from the Cone Programming solver the solution required to compute the decomposition.

The “ingredients” of (EXP) are the marginal and coupling equations, the generalized inequalities, and the objective function. Thus, `pid()` needs to compute and store ${p}_{x,y,\ast}$ and ${p}_{x,\ast ,z}$, the marginal distributions of $(X,Y)$ and $(X,Z)$. For this, `pid()` requires a distribution of $X,Y,$ and Z. In Figure 2, the distribution comes from the And gate, where $X=Y\ \mathrm{AND}\ Z$.

Distributions are stored as a Python dictionary in which an outcome $(x,y,z)$ of the random variables is a triplet key and the probability of this outcome is the value. This provides an associative memory structure in which the value of the random variables is a reference to the probability at that point. For example, the triplet $(0,0,0)$ occurs with probability $1/4$, and so on for the other triplets. Thus, the And distribution is defined as a Python dictionary, `andgate=dict()`, where `andgate[ (0,0,0) ]=0.25` assigns the key “$(0,0,0)$” the value “0.25”, and so on.
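Written out in full, the And-gate dictionary from Figure 2 is:

```python
# The And-gate distribution X = Y AND Z with Y, Z uniform on {0, 1},
# in the dictionary format expected by pid(): triplet key -> probability.
andgate = dict()
andgate[(0, 0, 0)] = 0.25
andgate[(0, 0, 1)] = 0.25
andgate[(0, 1, 0)] = 0.25
andgate[(1, 1, 1)] = 0.25   # the only outcome with X = 1

# The decomposition would then be obtained via the package, e.g.:
#   returndata = pid(andgate)
```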

Note that the user does not need to add the triplets with zero probability to the dictionary, since `pid()` will always discard such triplets. In [5], the authors discussed in detail how to handle the triplets with zero probability. The input of `pid()` is explained in detail in the following subsection.

Now, we briefly describe how `pid()` proceeds to return the promised decomposition. `pid()` calls the Cone Programming solver and provides it with the “ingredients” of (EXP) as a part of the solver’s input. The solver finds the optimal solution of (EXP) and (D-EXP). When the solver halts, it returns the primal and dual solutions. Using the returned solutions, `pid()` computes the decomposition based on (1). The full process is explained in Figure 3.

Finally, `pid()` returns a Python dictionary, `returndata` containing the partial information decomposition and information about the quality of the Cone Programming solver’s solution. In Section 3.4, we give a detailed explanation on how to compute the quality of the solution and Table 3 contains a description of the keys and values of `returndata`.

For example, in the returned dictionary `returndata` for the And gate, `returndata[’CI’]` contains the quantity of synergistic information and `returndata[’Num_err’][0]` the maximum primal feasibility violation of (EXP).

Note that a conic optimization solver is always supposed to return a solution. Thus, Broja_2pid raises an exception, `BROJA_2PID_Exception`, when no solution is returned.

In the Broja_2pid package, `pid()` is the function the user calls to compute the partial information decomposition. The function `pid()` takes as input a Python dictionary.

The Python dictionary represents a probability distribution. From this distribution, `pid()` computes the vectors ${p}_{x,y,\ast}$ and ${p}_{x,\ast ,z}$ for the marginal expressions in (EXP). A key of the Python dictionary is a triplet $(x,y,z)$, which is a possible outcome of the random variables $X,Y,$ and Z. The value of the key $(x,y,z)$ in the Python dictionary is a number: the probability that $X=x,Y=y,$ and $Z=z$.

While seeking the optimal solution of (EXP), the Cone Programming solver has to make sure that w and $(\eta ,\theta )$ are feasible, and (ideally) should halt when the duality gap is zero, i.e., when w and $(\eta ,\theta )$ are optimal. However, the entries of w and $(\eta ,\theta )$ belong to $\mathbb{R}$, and computers represent real numbers only up to floating-point precision. Thus, the Cone Programming solver considers a solution feasible when none of the constraints are violated, or optimal when the duality gap is zero, up to a numerical precision (tolerance). The Cone Programming solver allows the user to modify the feasibility and optimality tolerances along with a couple of other parameters, which are described in Table 1.

To change the default Cone Programming solver parameters, the user passes them to `pid()` as a dictionary. For example, in Figure 4, we change the maximum number of iterations the solver may perform. For this, we created a dictionary, `parms=dict()`. Then, we set the desired value, `1000`, for the key `’max_iter’`. Finally, we pass `parms` to `pid()` as keyword arguments, `pid(andgate, **parms)`. Note that, in the dictionary `parms`, the user only needs to define the keys whose values should change.

The parameter `output` determines the printing mode of `pid()`; it is an integer in $\{0,1,2\}$ and allows the user to control what is printed on the screen. Table 2 gives a detailed description of the printing modes.

Currently, we only use ECOS to solve the Exponential Cone Program, but in the future we will add the SCS solver. Thus, the user should determine which solver to use in the computations. For example, setting `cone_solver=“ECOS”` will utilize ECOS in the computations.

The function `pid()` returns a Python dictionary called `returndata`. Table 3 describes the returned dictionary.

Let $w,\eta ,$ and $\theta $ be the lists returned by the Cone Programming solver where ${w}_{x,y,z}=[{r}_{x,y,z},{t}_{x,y,z},{q}_{x,y,z}],$ ${\eta}_{x,y,z}=[{\lambda}_{x,y},{\lambda}_{x,z},{\mu}_{x,y,z}],$ and ${\theta}_{x,y,z}=[{\nu}_{x,y,z}]$. Note that w is the primal solution and $(\eta ,\theta )$ is the dual solution. The dictionary `returndata` gives the user access to the partial information decomposition, namely, shared, unique, and synergistic information. The partial information decomposition is computed using only the positive values of ${q}_{x,y,z}$. The value of the key `’Num_err’` is a triplet such that the primal feasibility violation is `returndata[’Num_err’][0]`, the dual feasibility violation is `returndata[’Num_err’][1]`, and `returndata[’Num_err’][2]` is the duality gap violation. In the following, we will explain how we compute the violations of primal and dual feasibility in addition to that of duality gap.

The primal feasibility of (EXP) is

$$\begin{array}{c}{q}_{x,y,\ast}={p}_{x,y,\ast}\\ {q}_{x,\ast ,z}={p}_{x,\ast ,z}\\ {q}_{\ast ,y,z}={t}_{x,y,z}\\ (-{r}_{x,y,z},-{t}_{x,y,z},-{q}_{x,y,z}){\le}_{{\mathcal{K}}_{\mathrm{exp}}}0\end{array}$$

We check the violation of ${q}_{x,y,z}\ge 0$ which is required by ${\mathcal{K}}_{\mathrm{exp}}$. Since all the non-positive ${q}_{x,y,z}$ are discarded when computing the decomposition, we check if the marginal equations are violated using only the positive ${q}_{x,y,z}$. The coupling equations are ignored since they are just assigning values to the ${t}_{x,y,z}$ variables. Thus, `returndata[’Num_err’][0]` (primal feasibility violation) is computed as follows,

$$\begin{array}{cc}\hfill {q}_{x,y,z}^{\prime}& =\left\{\begin{array}{cc}0\hfill & \text{if }{q}_{x,y,z}\le 0\hfill \\ {q}_{x,y,z}\hfill & \text{otherwise}\hfill \end{array}\right.\hfill \\ \hfill \mathtt{returndata['Num\_err'][0]}& =\underset{x,y,z}{\max}\left(|{q}_{x,y,\ast}^{\prime}-{p}_{x,y,\ast}|,\phantom{\rule{4pt}{0ex}}|{q}_{x,\ast ,z}^{\prime}-{p}_{x,\ast ,z}|,\phantom{\rule{4pt}{0ex}}-{q}_{x,y,z}\right)\hfill \end{array}$$
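A sketch (ours; the actual implementation lives inside the package) of this computation:

```python
from collections import defaultdict

# Sketch (ours) of the primal feasibility violation reported as
# returndata['Num_err'][0]: clip q to q', recompute its marginals, and take
# the largest deviation from the prescribed marginals or from q >= 0.
def primal_violation(q, pxy, pxz):
    qc = {k: (v if v > 0 else 0.0) for k, v in q.items()}   # q'
    mxy, mxz = defaultdict(float), defaultdict(float)
    for (x, y, z), v in qc.items():
        mxy[(x, y)] += v
        mxz[(x, z)] += v
    err = max(-v for v in q.values())                       # violation of q >= 0
    err = max(err, max(abs(mxy[k] - pv) for k, pv in pxy.items()))
    err = max(err, max(abs(mxz[k] - pv) for k, pv in pxz.items()))
    return err

# The exact And-gate distribution violates nothing:
andgate = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (0, 1, 0): 0.25, (1, 1, 1): 0.25}
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
pxz = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
viol = primal_violation(andgate, pxy, pxz)   # 0.0

# A slightly negative entry shows up as a violation of that magnitude:
q2 = dict(andgate)
q2[(1, 0, 0)] = -1e-9
viol2 = primal_violation(q2, pxy, pxz)
```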

The dual feasibility of (D-EXP) is

$${\lambda}_{x,y}+{\lambda}_{x,z}+{\mu}_{\ast ,y,z}+1+ln(-{\mu}_{x,y,z})\ge 0$$

For dual feasibility violation, we check the non-negativity of (12). Thus, the error `returndata[’Num_err’][1]` is equal to

$$\underset{x,y,z}{min}({\lambda}_{x,y}+{\lambda}_{x,z}+{\mu}_{\ast ,y,z}+1+ln(-{\mu}_{x,y,z}),0)$$

When w is the optimal solution of (EXP), we have

$$-\sum _{x,y,z}{r}_{x,y,z}=\sum _{x,y,z}{q}_{x,y,z}\ln\frac{{q}_{x,y,z}}{{q}_{\ast ,y,z}}=-H(X\mid Y,Z).$$

The duality gap of (EXP) and (D-EXP) is
$$-H(X\mid Y,Z)+{\lambda}^{T}b,$$
where
$${\lambda}^{T}b=\sum _{x,y}{\lambda}_{x,y}{p}_{x,y,\ast}+\sum _{x,z}{\lambda}_{x,z}{p}_{x,\ast ,z}.$$

Since weak duality implies $H(X\mid Y,Z)\le {\lambda}^{T}b$, we are left to check the non-negativity of (13) to inspect the duality gap. Thus, `returndata[’Num_err’][2]` is given by,

$$max(-H(X\mid Y,Z)+{\lambda}^{T}b,0)$$

In this section, we test the performance of Broja_2pid on three types of instances. We describe the instances on which Broja_2pid is tested, report the results, and finally compare the performance of other estimators on the same instances. The two estimators we compare Broja_2pid against, namely those we are aware of that produce reasonable results, are computeUI and ibroja. The first two types of instances are used as primitive validation tests, while the last type is used to evaluate the accuracy and efficiency of Broja_2pid in computing the partial information decomposition. We used a computer server with an Intel(R) Core(TM) i7-4790K CPU (4 cores) and 16 GB of RAM to compute the PID for the instances. All computations of Broja_2pid and computeUI were done using one core, whereas ibroja used multiple cores.

The following set of instances has been studied extensively throughout the literature, and the partial information decomposition of these instances is known [2]. Despite their simplicity, they exhibit the desired properties of shared or synergistic information.

The first type of instances is based on the “gates” (Rdn, Unq, Xor, And, RdnXor, RdnUnqXor, and XorAnd) described in Table 1 of [1]. Each gate is given as a function $(x,y,z)=\mathtt{G}\left(W\right)$ which maps a (random) input W to a triple $(x,y,z)$. The inputs are sampled uniformly at random, whereas, in Table 1 of [1], the inputs are independent and identically distributed.

All the gates are implemented as dictionaries, and `pid()` is called successively with different printing modes to compute them; this is coded in the script `Testing/test_gates.py` in the GitHub repository. The values of the partial information decomposition for all the gate distributions (when computed by `pid()`) were equal to the actual values up to a precision error of order ${10}^{-9}$, and the slowest computation took less than a millisecond.
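As an illustration of this input format, the And gate with uniform inputs can be written down directly as a Python dictionary mapping $(x,y,z)$ triples to probabilities. The package and function names in the import are assumptions based on the repository layout, so the call is guarded:

```python
# And gate: x = y AND z, with (y, z) uniform over {0,1}^2.
# Keys are (x, y, z) triples, values are probabilities.
and_gate = {
    (0, 0, 0): 0.25,
    (0, 0, 1): 0.25,
    (0, 1, 0): 0.25,
    (1, 1, 1): 0.25,
}

try:
    from BROJA_2PID import pid  # assumed module layout of the Broja_2pid repository
    returndata = pid(and_gate)
    print(returndata['SI'], returndata['UIY'], returndata['UIZ'], returndata['CI'])
except ImportError:
    pass  # estimator not installed; the dictionary above is still a valid input
```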

Both estimators, computeUI and ibroja, also produced values of the partial information decomposition for all the gate distributions equal to the actual values, up to a precision error of order ${10}^{-10}$, but the slowest computation took more than ten milliseconds for computeUI and 12 s for ibroja.

The Copy gate requires a large number of variables and constraints—see below for details. Thus, we used it to test the memory efficiency of the Broja_2pid estimator. Since its decomposition is known, it also provides to some extent a validation for the correctness of the solution in large systems.

The Copy gate maps a pair $(y,z)$, chosen uniformly at random, to the triplet $(x,y,z)$ where $x=(y,z)$. The overall size of the Copy distribution scales as ${\left|\mathbf{Y}\right|}^{2}\times {\left|\mathbf{Z}\right|}^{2}$, where $(y,z)\in \mathbf{Y}\times \mathbf{Z}$. Proposition 18 in [1] shows that the partial information decomposition of the Copy gate is

$$\begin{array}{cc}\hfill \mathrm{CI}(X;Y,Z)& =0\hfill \\ \hfill \mathrm{SI}(X;Y,Z)& =\mathrm{MI}(Y;Z)\hfill \\ \hfill \mathrm{UI}(X;Y\backslash Z)& =H(Y\mid Z)\hfill \\ \hfill \mathrm{UI}(X;Z\backslash Y)& =H(Z\mid Y)\hfill \end{array}$$

Since Y and Z are independent random variables, $\mathrm{UI}(X;Y\backslash Z)=H\left(Y\right)$, $\mathrm{UI}(X;Z\backslash Y)=H\left(Z\right)$, and $\mathrm{SI}(X;Y,Z)=\mathrm{MI}(Y;Z)=0$.
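A minimal sketch of the Copy construction, assuming the distribution is stored as a dictionary mapping $(x,y,z)$ triples to probabilities (the helper name is illustrative, not the test script's API):

```python
from math import log2

def copy_gate(m, n):
    """Copy distribution: (y, z) uniform on [m] x [n], x = (y, z)."""
    p = 1.0 / (m * n)
    return {((y, z), y, z): p for y in range(m) for z in range(n)}

m, n = 2, 4
pdf = copy_gate(m, n)
# With Y and Z independent and uniform, the expected decomposition is
# UI(X;Y\Z) = H(Y) = log2(m), UI(X;Z\Y) = H(Z) = log2(n), SI = CI = 0.
expected_uiy, expected_uiz = log2(m), log2(n)
```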

The Copy distribution is generated for different sizes of $\mathbf{Y}$ and $\mathbf{Z}$, where $\mathbf{Y}=\left[m\right]$ and $\mathbf{Z}=\left[n\right]$ for $m,n\in \mathbb{N}\backslash \left\{0\right\}$. Then, `pid()` is called to compute the partial information decomposition for each pair $m,n$. Finally, the `returndata` dictionary is printed along with the running time of the Broja_2pid estimator and the deviations of `returndata['UIY']` and `returndata['UIZ']` from $H\left(Y\right)$ and $H\left(Z\right)$, respectively. This process is implemented in `Testing/test_large_copy.py`. The worst relative deviation was at most ${10}^{-8}$ for any $m,n\le 90.$

The ibroja estimator failed to give a solution for any of these instances, since the machine ran out of memory. The computeUI estimator could solve instances of size up to $2.5\times {10}^{7}$, but, for larger instances, the machine ran out of memory. computeUI was slower than Broja_2pid by a factor of at least 40 and at most 113; see Figure 5 for a comparison.

This is the main set of instances on which we test the efficiency of the Broja_2pid estimator. It has three subsets of instances, each of which probes a different aspect of efficiency when the estimator is applied to large systems. This set contains some hard distributions, in the sense that the optimal solution lies on the boundary of the feasible region of problem (1).

The last type of instances are joint distributions of $(X,Y,Z)$ sampled uniformly at random over the probability simplex. We have three different sets of the joint distributions depending on the size of $\mathbf{X},\mathbf{Y},$ and $\mathbf{Z}$.

- (a) For Set 1, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=2$ and vary $\left|\mathbf{Z}\right|$ in $\{2,3,\dots ,14\}$. Then, for each size of $\mathbf{Z}$, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.
- (b) For Set 2, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Z}\right|=2$ and vary $\left|\mathbf{Y}\right|$ in $\{2,3,\dots ,14\}$. Then, for each value of $\left|\mathbf{Y}\right|$, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.
- (c) For Set 3, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|=s$, where $s\in \{8,9,\dots ,18\}$. Then, for each s, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.

Note that, in each set, instances are grouped according to the varying value, i.e., $\left|\mathbf{Y}\right|,\left|\mathbf{Z}\right|,$ and s, respectively.
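Sampling uniformly at random over the probability simplex, as done in all three sets, amounts to normalizing i.i.d. standard-exponential variates (equivalently, drawing from a flat Dirichlet distribution). A minimal sketch of how one such instance could be generated (the function name is illustrative, not the test script's internal API):

```python
import random
from itertools import product

def random_joint_pdf(nx, ny, nz, rng=random):
    """Sample a pmf over [nx] x [ny] x [nz] uniformly from the probability simplex."""
    support = list(product(range(nx), range(ny), range(nz)))
    # i.i.d. Exp(1) weights, normalized to sum to 1 <=> flat Dirichlet sample
    weights = [rng.expovariate(1.0) for _ in support]
    total = sum(weights)
    return {k: w / total for k, w in zip(support, weights)}

pdf = random_joint_pdf(2, 2, 7)  # one instance of Set 1 with |Z| = 7
```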

The instances were generated using the Python script `Testing/test_large_randompdf.py`. The script takes as command-line arguments $\left|\mathbf{X}\right|,\left|\mathbf{Y}\right|,\left|\mathbf{Z}\right|$ and the number of joint distributions of $(X,Y,Z)$ the user wants to sample from the probability simplex. For example, to create the instances of Set 1 with $\left|\mathbf{Z}\right|=7$, the corresponding command line is `python3 test_large_randompdf.py 2 2 7 500`. The script outputs `returndata` along with the running time of the Broja_2pid estimator for each distribution, and finally prints the empirical averages, over all the distributions, of $\mathrm{SI}(X;Y,Z)$, $\mathrm{UI}(X;Y\backslash Z)$, $\mathrm{UI}(X;Z\backslash Y)$, $\mathrm{CI}(X;Y,Z)$, and the running time of the Broja_2pid estimator.

In the following, for each of the sets, we look at $\mathrm{UI}(X;Y\backslash Z)$ to validate the solution, the `returndata['Num_err']` triplet to examine the quality of the solution, and the running time to analyze the efficiency of the estimator.

We compare the Broja_2pid estimator with the computeUI and ibroja estimators on instances of the same type as Set 3, for $s\in \{2,\dots ,17\}$.

Even though the optimality criterion of computeUI is ${10}^{-7}$, the solutions of Broja_2pid were closer to the optima than those of computeUI by a magnitude of ${10}^{-5}$, which shows that Broja_2pid produces more accurate solutions than computeUI. The test is implemented in `Testing/test_from_file_broja_2pid_computeUI.py`, where the distributions in the folder `randompdfs/` are the inputs.

The solutions of Broja_2pid were closer to the optima than those of ibroja by a magnitude of ${10}^{-3}$ for instances with $s=\dots $. Note that the results of this comparison align with the claims made in [5] that first-order methods are not suitable for tackling this problem. The test is implemented in `Testing/test_from_file_dit.py`, where the distributions in the folder `randompdfs/` are the inputs.

Chicharro [8] introduced a multivariate PID measure using so-called tree-based decompositions. The measure is similar to the bivariate BROJA PID measure, yet it is not an extension of it. In this section, we show how to model some of the trivariate PID quantities using the exponential cone. Thus, with some modification, Broja_2pid can be extended to compute some of the trivariate PID quantities. Note that, due to time constraints, we could not check whether the other trivariate PID quantities can also be fitted into the exponential cone.

Let $S,X,Y,Z$ be random variables with finite range, where S is the target and $X,Y,Z$ are the sources. Chicharro [8] defined the quantity of synergistic information that the sources $X,Y,Z$ hold about the target S as:

$$\mathrm{CI}(S;X,Y,Z)=max\left(\mathrm{MI}(S;X,Y,Z)-\mathrm{MI}({S}^{\prime};{X}^{\prime},{Y}^{\prime},{Z}^{\prime})\right),$$

where the maximum extends over the quadruples of random variables $({S}^{\prime},{X}^{\prime},{Y}^{\prime},{Z}^{\prime})$ with the same 12,13,14-marginals as $(S,X,Y,Z)$, i.e., $P(S=s,X=x)=P({S}^{\prime}=s,{X}^{\prime}=x)$ for all $s,x$, $P(S=s,Y=y)=P({S}^{\prime}=s,{Y}^{\prime}=y)$ for all $s,y$, and $P(S=s,Z=z)=P({S}^{\prime}=s,{Z}^{\prime}=z)$ for all $s,z$. The objective function of (14), given the marginal conditions, is equal, up to a constant not depending on ${S}^{\prime},{X}^{\prime},{Y}^{\prime},{Z}^{\prime}$, to the conditional entropy $H({S}^{\prime}\mid {X}^{\prime},{Y}^{\prime},{Z}^{\prime})$. Thus, we find that (14) is equivalent to the following Convex Program:

$$\begin{array}{ccc}\hfill \mathrm{minimize}& \sum _{s,x,y,z}{q}_{s,x,y,z}ln\frac{{q}_{s,x,y,z}}{{q}_{\ast ,x,y,z}}\hfill & \mathrm{over}q\in {\mathbb{R}}^{\mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}\hfill \\ \hfill \mathrm{subject}\mathrm{to}\phantom{\rule{1.em}{0ex}}& {q}_{s,x,\ast ,\ast}={p}_{s,x,\ast ,\ast}\hfill & \mathrm{for}\mathrm{all}(s,x)\in \mathbf{S}\times \mathbf{X}\hfill \\ & {q}_{s,\ast ,y,\ast}={p}_{s,\ast ,y,\ast}\hfill & \mathrm{for}\mathrm{all}(s,y)\in \mathbf{S}\times \mathbf{Y}\hfill \\ & {q}_{s,\ast ,\ast ,z}={p}_{s,\ast ,\ast ,z}\hfill & \mathrm{for}\mathrm{all}(s,z)\in \mathbf{S}\times \mathbf{Z}\hfill \\ & {q}_{s,x,y,z}\ge 0\hfill & \mathrm{for}\mathrm{all}(s,x,y,z)\in \mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$
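By the argument above, the objective of this program is the negative conditional entropy $-H(S\mid X,Y,Z)$ of q. A short sketch of evaluating it for a pmf stored as a dictionary over $(s,x,y,z)$ quadruples (the function name mirrors the equation number and is purely illustrative):

```python
from math import log

def objective_15(q):
    """sum_{s,x,y,z} q ln(q / q_{*,x,y,z}), i.e. -H(S | X, Y, Z) in nats."""
    # marginal q_{*,x,y,z}: sum out the target S
    marg = {}
    for (s, x, y, z), p in q.items():
        marg[(x, y, z)] = marg.get((x, y, z), 0.0) + p
    return sum(p * log(p / marg[(x, y, z)])
               for (s, x, y, z), p in q.items() if p > 0)
```

The value is 0 exactly when S is a deterministic function of $(X,Y,Z)$, and becomes more negative as the residual uncertainty about S given the sources grows.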

Hence, we obtain the following Exponential Cone Program, where the variables are $r,t,q\in {\mathbb{R}}^{\mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$:

$$\begin{array}{cc}\hfill \mathrm{minimize}& -\sum _{s,x,y,z}{r}_{s,x,y,z}\hfill \\ \hfill \mathrm{subject}\mathrm{to}\phantom{\rule{1.em}{0ex}}& {q}_{s,x,\ast ,\ast}={p}_{s,x,\ast ,\ast}\hfill & \mathrm{for}\mathrm{all}(s,x)\in \mathbf{S}\times \mathbf{X}\hfill \\ & {q}_{s,\ast ,y,\ast}={p}_{s,\ast ,y,\ast}\hfill & \mathrm{for}\mathrm{all}(s,y)\in \mathbf{S}\times \mathbf{Y}\hfill \\ & {q}_{s,\ast ,\ast ,z}={p}_{s,\ast ,\ast ,z}\hfill & \mathrm{for}\mathrm{all}(s,z)\in \mathbf{S}\times \mathbf{Z}\hfill \\ & {q}_{\ast ,x,y,z}-{t}_{s,x,y,z}=0\hfill & \mathrm{for}\mathrm{all}(s,x,y,z)\in \mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & (-{r}_{s,x,y,z},-{t}_{s,x,y,z},-{q}_{s,x,y,z}){\le}_{{\mathcal{K}}_{\mathrm{exp}}}0\hfill & \mathrm{for}\mathrm{all}(s,x,y,z)\in \mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$

The exponential cone program in (16) is equivalent to the Convex Program (15).

The proof follows in the same manner as that of Proposition 1. ☐

The dual problem of (16) can be formulated as

$$\begin{array}{cc}\hfill \mathrm{maximize}& -\sum _{s,x}{\lambda}_{s,x}{p}_{s,x,\ast ,\ast}-\sum _{s,y}{\lambda}_{s,y}{p}_{s,\ast ,y,\ast}-\sum _{s,z}{\lambda}_{s,z}{p}_{s,\ast ,\ast ,z}\hfill \\ \hfill \mathrm{subject}\mathrm{to}\phantom{\rule{1.em}{0ex}}& {\lambda}_{s,x}+{\lambda}_{s,y}+{\lambda}_{s,z}+{\mu}_{\ast ,x,y,z}+1+ln(-{\mu}_{s,x,y,z})\ge 0\hfill & \mathrm{for}\mathrm{all}(s,x,y,z)\in \mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$

Strong duality holds for the primal-dual pair (16) and (17).

The proof follows in the same manner as that of Proposition 2. ☐

Chicharro [8] defined the quantity of unique information that the source X holds about the target S as:

$$\mathrm{UI}(S;X\backslash Y,Z)=min\mathrm{MI}({S}^{\prime};{X}^{\prime},{Y}^{\prime},{Z}^{\prime})-min\mathrm{MI}({S}^{\prime};{Y}^{\prime},{Z}^{\prime}),$$

where both minimums extend over the quadruples of random variables $({S}^{\prime},{X}^{\prime},{Y}^{\prime},{Z}^{\prime})$ with the same 12,13,14-marginals as $(S,X,Y,Z)$, i.e., $P(S=s,X=x)=P({S}^{\prime}=s,{X}^{\prime}=x)$ for all $s,x$, $P(S=s,Y=y)=P({S}^{\prime}=s,{Y}^{\prime}=y)$ for all $s,y$, and $P(S=s,Z=z)=P({S}^{\prime}=s,{Z}^{\prime}=z)$ for all $s,z$. Analogously, he defines the unique informations $\mathrm{UI}(S;Y\backslash X,Z)$ and $\mathrm{UI}(S;Z\backslash X,Y)$. Computing $\mathrm{UI}(S;X\backslash Y,Z)$ consists of solving two optimization problems. The first problem in (18) can be formulated as (15), and the second problem can be formulated as follows:

$$\begin{array}{ccc}\hfill \mathrm{minimize}& \sum _{s,y,z}{q}_{s,\ast ,y,z}ln\frac{{q}_{s,\ast ,y,z}}{{q}_{\ast ,\ast ,y,z}}\hfill & \mathrm{over}q\in {\mathbb{R}}^{\mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}\hfill \\ \hfill \mathrm{subject}\mathrm{to}\phantom{\rule{1.em}{0ex}}& {q}_{s,x,\ast ,\ast}={p}_{s,x,\ast ,\ast}\hfill & \mathrm{for}\mathrm{all}(s,x)\in \mathbf{S}\times \mathbf{X}\hfill \\ & {q}_{s,\ast ,y,\ast}={p}_{s,\ast ,y,\ast}\hfill & \mathrm{for}\mathrm{all}(s,y)\in \mathbf{S}\times \mathbf{Y}\hfill \\ & {q}_{s,\ast ,\ast ,z}={p}_{s,\ast ,\ast ,z}\hfill & \mathrm{for}\mathrm{all}(s,z)\in \mathbf{S}\times \mathbf{Z}\hfill \\ & {q}_{s,x,y,z}\ge 0\hfill & \mathrm{for}\mathrm{all}(s,x,y,z)\in \mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$
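The objective of this program depends on q only through two of its marginals. A sketch of the required marginalizations, for a pmf stored as a dictionary over $(s,x,y,z)$ quadruples (the function name mirrors the equation number and is purely illustrative):

```python
from math import log

def objective_19(q):
    """sum_{s,y,z} q_{s,*,y,z} ln(q_{s,*,y,z} / q_{*,*,y,z}), i.e. -H(S | Y, Z) in nats."""
    q_syz, q_yz = {}, {}
    for (s, x, y, z), p in q.items():
        q_syz[(s, y, z)] = q_syz.get((s, y, z), 0.0) + p   # q_{s,*,y,z}: sum out X
        q_yz[(y, z)] = q_yz.get((y, z), 0.0) + p           # q_{*,*,y,z}: sum out S and X
    return sum(p * log(p / q_yz[(y, z)])
               for (s, y, z), p in q_syz.items() if p > 0)
```

The value is 0 exactly when S is determined by $(Y,Z)$ alone, which is why minimizing it isolates the information that the pair $(Y,Z)$ holds about S.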

Hence, we obtain the following Exponential Cone Program, where the variables are $r,t\in {\mathbb{R}}^{\mathbf{S}\times \mathbf{Y}\times \mathbf{Z}}$ and $q\in {\mathbb{R}}^{\mathbf{S}\times \mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$:

$$\begin{array}{cc}\hfill \mathrm{minimize}& -\sum _{s,y,z}{r}_{s,y,z}\hfill \\ \hfill \mathrm{subject}\mathrm{to}\phantom{\rule{1.em}{0ex}}& {q}_{s,x,\ast ,\ast}={p}_{s,x,\ast ,\ast}\hfill & \mathrm{for}\mathrm{all}(s,x)\in \mathbf{S}\times \mathbf{X}\hfill \\ & {q}_{s,\ast ,y,\ast}={p}_{s,\ast ,y,\ast}\hfill & \mathrm{for}\mathrm{all}(s,y)\in \mathbf{S}\times \mathbf{Y}\hfill \\ & {q}_{s,\ast ,\ast ,z}={p}_{s,\ast ,\ast ,z}\hfill & \mathrm{for}\mathrm{all}(s,z)\in \mathbf{S}\times \mathbf{Z}\hfill \\ & {q}_{\ast ,\ast ,y,z}-{t}_{s,y,z}=0\hfill & \mathrm{for}\mathrm{all}(s,y,z)\in \mathbf{S}\times \mathbf{Y}\times \mathbf{Z}\hfill \\ & (-{r}_{s,y,z},-{t}_{s,y,z},-{q}_{s,\ast ,y,z}){\le}_{{\mathcal{K}}_{\mathrm{exp}}}0\hfill & \mathrm{for}\mathrm{all}(s,y,z)\in \mathbf{S}\times \mathbf{Y}\times \mathbf{Z}.\hfill \end{array}$$

Similarly, we can prove the equivalence of (20) and (19) and that strong duality holds for (20) and its dual.

We are aware of one other Cone Programming solver with support for the exponential cone, SCS [10]. We are currently working on adding that functionality to our software. When it is completed, passing the parameter `cone_solver="SCS"` to the function `pid()` will make our software use the SCS-based model instead of the ECOS-based one. (The models themselves are in fact different: SCS requires us to start from the dual exponential cone program (D-EXP).) SCS employs parallelized first-order methods which can be run on GPUs, so we expect a considerable speedup for large-scale problem instances.

We note that other information-theoretic functions can also be fitted into the exponential cone. Thus, with some modification, the model can be used to solve other problems.

The authors would like to thank Patricia Wollstadt and Michael Wibral for their feedback on pre-production versions of our software. In addition, we thank Daniel Chicharro for fruitful discussions and pointing us to applications of our approach to estimate multivariate formulations of PID.

This research was supported by the Estonian Research Council, ETAG (Eesti Teadusagentuur), through PUT Exploratory Grant #620. R.V. also thanks the financial support from ETAG through the personal research grant PUT1476. We also gratefully acknowledge funding by the European Regional Development Fund through the Estonian Center of Excellence in IT, EXCITE.

A.M. and D.O.T. conceived and designed the experiments; A.M. performed the experiments; A.M. and D.O.T. analyzed the data; R.V. contributed reagents/materials/analysis tools; and A.M. wrote the paper.

The authors declare no conflict of interest.

1. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy **2014**, 16, 2161–2183.
2. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Springer: Berlin, Germany, 2014; pp. 159–190.
3. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E **2013**, 87, 012130.
4. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515.
5. Makkeh, A.; Theis, D.O.; Vicente, R. Bivariate Partial Information Decomposition: The Optimization Perspective. Entropy **2017**, 19, 530.
6. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
7. Banerjee, P.K.; Rauh, J.; Montúfar, G. Computing the Unique Information. arXiv 2017, arXiv:1709.07487.
8. Chicharro, D. Quantifying multivariate redundancy with maximum entropy decompositions of mutual information. arXiv 2017, arXiv:1708.03845.
9. Domahidi, A.; Chu, E.; Boyd, S. ECOS: An SOCP solver for embedded systems. In Proceedings of the European Control Conference (ECC), Zurich, Switzerland, 17–19 July 2013; pp. 3071–3076.
10. O’Donoghue, B.; Chu, E.; Parikh, N.; Boyd, S. SCS: Splitting Conic Solver, Version 1.2.7, 2016. Available online: https://github.com/cvxgrp/scs (accessed on 26 November 2017).
11. Luenberger, D.G.; Ye, Y. Linear and Nonlinear Programming; Springer: Berlin, Germany, 1984; Volume 2.
12. Gärtner, B.; Matousek, J. Approximation Algorithms and Semidefinite Programming; Springer: Berlin, Germany, 2012.
13. Chares, R. Cones and Interior-Point Algorithms for Structured Convex Optimization Involving Powers and Exponentials. Ph.D. Thesis, UCL-Université Catholique de Louvain, Louvain-la-Neuve, Belgium, 2009.
14. Makkeh, A. Applications of Optimization in Some Complex Systems. Ph.D. Thesis, University of Tartu, Tartu, Estonia, 2018; forthcoming.

Parameter | Description | Recommended Value |
---|---|---|
feastol | primal/dual feasibility tolerance | ${10}^{-7}$ |
abstol | absolute tolerance on duality gap | ${10}^{-6}$ |
reltol | relative tolerance on duality gap | ${10}^{-6}$ |
feastol_inacc | primal/dual infeasibility relaxed tolerance | ${10}^{-3}$ |
abstol_inacc | absolute relaxed tolerance on duality gap | ${10}^{-4}$ |
reltol_inacc | relative relaxed tolerance on duality gap | ${10}^{-4}$ |
max_iter | maximum number of iterations that ECOS performs | 100 |

Output | Description |
---|---|
0 (default) | `pid()` prints its output (Python dictionary, see Section 3.4). |
1 | In addition to output=0, `pid()` prints a flag when it starts preparing (EXP) and another flag when it calls the conic optimization solver. |
2 | In addition to output=1, `pid()` prints the conic optimization solver's output. (The conic optimization solver usually prints the problem statistics and the status of optimization.) |

Key | Value |
---|---|
'SI' | Shared information, $\mathrm{SI}(X;Y,Z)$. (All information quantities are returned in bits.) |
'UIY' | Unique information of Y, $\mathrm{UI}(X;Y\backslash Z)$. |
'UIZ' | Unique information of Z, $\mathrm{UI}(X;Z\backslash Y)$. |
'CI' | Synergistic information, $\mathrm{CI}(X;Y,Z)$. |
'Num_err' | Information about the quality of the solution. |
'Solver' | Name of the solver used to optimize (CP). (In this version, we only use ECOS, but other solvers might be added in the future.) |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).