# BROJA-2PID: A Robust Estimator for Bivariate Partial Information Decomposition


## Abstract


## 1. Introduction

#### Notation and Background

An asterisk stands for “sum over everything that can be plugged in instead of the ∗”, e.g., if $p,q\in {\mathbb{R}}^{\mathbf{X}\times \mathbf{Y}\times \mathbf{Z}}$, then

$${q}_{x,y,\ast}=\sum_{w\in \mathbf{Z}}{q}_{x,y,w};\qquad {p}_{\ast ,y,z}\,{q}_{\ast ,y,z}=\left(\sum_{u\in \mathbf{X}}{p}_{u,y,z}\right)\left(\sum_{u\in \mathbf{X}}{q}_{u,y,z}\right).$$

We do not use the symbol ∗ in any other context.

## 2. Cone Programming Model for Bivariate PID

#### 2.1. Background on Cone Programming

**Definition 1.**

1. A vector $w\in {\mathbb{R}}^{n}$ (respectively, $(\eta ,\theta )\in {\mathbb{R}}^{{m}_{1}}\times {\mathbb{R}}^{{m}_{2}}$) is said to be a feasible solution of (2) (respectively, (3)) if $Aw=b$ and $Gw{\le}_{\mathcal{K}}h$ (respectively, $-{A}^{T}\eta -{G}^{T}\theta =c$ and $\theta \ge 0$), i.e., none of the constraints in (2) (respectively, (3)) are violated by $w$ (respectively, $(\eta ,\theta )$).
2. We say that (2) and (3) satisfy weak duality if for all $w$ and all $(\eta ,\theta )$ feasible solutions of (2) and (3), respectively,
   $$-{b}^{T}\eta -{h}^{T}\theta \le {c}^{T}w.$$
3. If $w$ is a feasible solution of (2) and $(\eta ,\theta )$ is a feasible solution of (3), then the duality gap $d$ is
   $$d:={c}^{T}w+{b}^{T}\eta +{h}^{T}\theta .$$
4. We say that (2) and (3) satisfy strong duality when the feasible solutions $w$ and $(\eta ,\theta )$ are optimal in (2) and (3), respectively, if and only if $d$ is zero.
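Items 2 and 3 can be checked numerically on a toy instance. The following is a minimal sketch, assuming that (2) is the primal program "minimize $c^{T}w$ subject to $Aw=b$, $Gw\le_{\mathcal{K}}h$" and that (3) is its dual with objective $-b^{T}\eta-h^{T}\theta$, as the constraints in Definition 1 suggest. It takes $\mathcal{K}$ to be the nonnegative orthant, so the program is linear; the function names and the instance are illustrative only.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def duality_gap(c, b, h, w, eta, theta):
    """d := c^T w + b^T eta + h^T theta for feasible w and (eta, theta)."""
    return dot(c, w) + dot(b, eta) + dot(h, theta)

# Toy instance (assumed for illustration):
#   minimize w1 + w2  subject to  w1 + w2 = 1  and  -w <= 0.
c = [1.0, 1.0]
b = [1.0]        # right-hand side of A w = b with A = [[1, 1]]
h = [0.0, 0.0]   # right-hand side of G w <= h with G = -I

w = [0.5, 0.5]   # primal feasible, objective value 1

# Both pairs satisfy -A^T eta - G^T theta = c and theta >= 0.
eta_loose, theta_loose = [0.0], [1.0, 1.0]   # dual objective 0
eta_opt, theta_opt = [-1.0], [0.0, 0.0]      # dual objective 1

gap_loose = duality_gap(c, b, h, w, eta_loose, theta_loose)
gap_opt = duality_gap(c, b, h, w, eta_opt, theta_opt)
```

Weak duality shows up as a nonnegative gap for every feasible pair. The gap vanishes only for the second pair, which is optimal together with `w`.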

**Definition 2.**

- $\tilde{x}$ is a feasible solution of (P).
- There exists $\epsilon>0$ such that for any $y\in {\mathbb{R}}^{n}$, we have $y\in \mathcal{K}$ whenever ${\parallel h-G\tilde{x}-y\parallel}_{2}\le \epsilon$.

**Theorem 1.**

1. Weak duality always holds for (P) and (D).
2. If ${c}^{T}w$ is finite and (P) has an interior point $\tilde{w}$, then strong duality holds for (P) and (D).

#### 2.2. The Exponential Cone Programming Model

**Proposition 1.**

**Proof.**

**Proposition 2.**

**Proof.**

## 3. The BROJA_2PID Estimator

#### 3.1. Installation

ECOS is installed with `pip3 install ecos`. If there are problems installing ECOS, we refer to its GitHub repository https://github.com/embotech/ecos-python. Finally, you need to `git clone` the GitHub repository of Broja_2pid, and it is ready to be used.

#### 3.2. Computing Bivariate PID

`pid()` is a wrapper function that computes the partial information decomposition. First, `pid()` prepares the “ingredients” of (EXP). Then, it calls the Cone Programming solver to find the optimal solution of (EXP). Finally, it receives from the Cone Programming solver the solution required to compute the decomposition.

`pid()` needs to compute and store ${p}_{x,y,\ast}$ and ${p}_{x,\ast ,z}$, the marginal distributions of $(X,Y)$ and $(X,Z)$. For this, `pid()` requires a distribution of $X$, $Y$, and $Z$. In Figure 2, the distribution comes from the And gate where $X=Y\ \mathrm{AND}\ Z$.

The distribution is stored in a Python dictionary, `andgate=dict()`, where `andgate[ (0,0,0) ]=0.25` assigns the value “0.25” to the key “$(0,0,0)$”, and so on.
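As a minimal sketch of this input format, the And gate dictionary and the two marginals that `pid()` needs can be built as follows. It assumes that $Y$ and $Z$ are independent uniform bits and that keys are ordered as $(x,y,z)$, which matches the value 0.25 quoted above. The variable names `p_xy` and `p_xz` are illustrative and are not taken from the package.

```python
from collections import defaultdict

# Joint distribution of (X, Y, Z) for the And gate, keyed by (x, y, z),
# with x = y AND z and each input pair (y, z) having probability 0.25.
andgate = dict()
for y in (0, 1):
    for z in (0, 1):
        andgate[(y & z, y, z)] = 0.25

# Marginals p_{x,y,*} and p_{x,*,z}: sum out z and y, respectively.
p_xy = defaultdict(float)
p_xz = defaultdict(float)
for (x, y, z), prob in andgate.items():
    p_xy[(x, y)] += prob
    p_xz[(x, z)] += prob
```

Only the four triplets with positive probability are stored; triplets that cannot occur, such as $(1,0,0)$, are left out of the dictionary.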

`pid()` will always discard triplets with zero probability. In [5], the authors discuss in detail how to handle such triplets. The input of `pid()` is explained in detail in the following subsection.

`pid()` proceeds to return the promised decomposition.

`pid()` calls the Cone Programming solver and provides it with the “ingredients” of (EXP) as a part of the solver’s input. The solver finds the optimal solution of (EXP) and (D-EXP). When the solver halts, it returns the primal and dual solutions. Using the returned solutions, `pid()` computes the decomposition based on (1). The full process is explained in Figure 3.

`pid()` returns a Python dictionary, `returndata`, containing the partial information decomposition and information about the quality of the Cone Programming solver’s solution. In Section 3.4, we give a detailed explanation of how the quality of the solution is computed, and Table 3 contains a description of the keys and values of `returndata`.

In `returndata` for the And gate, `returndata['CI']` contains the quantity of synergistic information and `returndata['Num_err'][0]` the maximum primal feasibility violation of (EXP).

Broja_2pid raises an exception, `BROJA_2PID_Exception`, when no solution is returned.
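A returned dictionary can be sanity-checked independently of the solver. The sketch below assumes the standard consistency equations of bivariate PID: $\mathrm{MI}(X;Y)=\mathrm{SI}+\mathrm{UI}(X;Y\backslash Z)$, $\mathrm{MI}(X;Z)=\mathrm{SI}+\mathrm{UI}(X;Z\backslash Y)$, and $\mathrm{MI}(X;(Y,Z))=\mathrm{SI}+\mathrm{UI}(X;Y\backslash Z)+\mathrm{UI}(X;Z\backslash Y)+\mathrm{CI}$. The helper names are hypothetical, and the dictionary `example` is written by hand for illustration; it is not output produced by running `pid()`.

```python
from collections import defaultdict
from math import log2

def mutual_information(pdf, left, right):
    """I(A;B) in bits, where A and B are the key coordinates of `pdf`
    listed in `left` and `right`."""
    p_ab, p_a, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for key, prob in pdf.items():
        a = tuple(key[i] for i in left)
        b = tuple(key[i] for i in right)
        p_ab[(a, b)] += prob
        p_a[a] += prob
        p_b[b] += prob
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)

def check_decomposition(pdf, returndata, tol=1e-5):
    """True if SI, UIY, UIZ, CI add up to the three mutual informations."""
    si, uiy, uiz, ci = (returndata[k] for k in ("SI", "UIY", "UIZ", "CI"))
    mi_y = mutual_information(pdf, (0,), (1,))
    mi_z = mutual_information(pdf, (0,), (2,))
    mi_yz = mutual_information(pdf, (0,), (1, 2))
    return (abs(mi_y - (si + uiy)) < tol
            and abs(mi_z - (si + uiz)) < tol
            and abs(mi_yz - (si + uiy + uiz + ci)) < tol)

# And gate keyed by (x, y, z) and hand-written illustrative values.
andgate = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (0, 1, 0): 0.25, (1, 1, 1): 0.25}
example = {"SI": 0.311278, "UIY": 0.0, "UIZ": 0.0, "CI": 0.5}
consistent = check_decomposition(andgate, example)
```

Such a check complements the `'Num_err'` triplet: it tests the reported quantities against the input distribution rather than against the solver's own feasibility measures.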

#### 3.3. Input and Parameters

`pid()` is the function which the user needs to compute the partial information decomposition. The function `pid()` takes as input a Python dictionary.

Optional solver parameters (see Table 1) are passed to `pid()` as a dictionary. For example, in Figure 4, we change the maximum number of iterations which the solver can do. For this, we create a dictionary, `parms=dict()`. Then, we set the desired value, `1000`, for the key `'max_iter'`. Finally, we pass `parms` to `pid()` as a dictionary, `pid(andgate, **parms)`. Note that, in the dictionary `parms`, the user only needs to define the keys whose values are to be changed.
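The `**parms` call shape relies on ordinary Python keyword unpacking. The sketch below illustrates the mechanism with a hypothetical stand-in, `effective_parameters`, which is not part of Broja_2pid; its default values are copied from the recommended values in Table 1.

```python
# Hypothetical stand-in: it is NOT the real pid() and only mimics how
# keyword arguments override a set of default solver parameters.
DEFAULTS = {"max_iter": 100, "feastol": 1e-7, "abstol": 1e-6, "reltol": 1e-6}

def effective_parameters(**overrides):
    """Return the defaults with the user-supplied keys replaced."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

parms = dict()
parms["max_iter"] = 1000

# Same call shape as pid(andgate, **parms): each key of parms becomes
# a keyword argument, and keys that are absent keep their defaults.
settings = effective_parameters(**parms)
```

Because absent keys fall back to defaults, a dictionary with a single entry is enough to change one parameter.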

The parameter `output` determines the printing mode of `pid()` and is an integer in $\{0,1,2\}$. It allows the user to control what is printed on the screen. Table 2 gives a detailed description of the printing mode.

Setting `cone_solver="ECOS"` will utilize ECOS in the computations.

#### 3.4. Returned Data

`pid()` returns a Python dictionary called `returndata`. Table 3 describes the returned dictionary.

`returndata` gives the user access to the partial information decomposition, namely, shared, unique, and synergistic information. The partial information decomposition is computed using only the positive values of ${q}_{x,y,z}$. The value of the key `'Num_err'` is a triplet such that the primal feasibility violation is `returndata['Num_err'][0]`, the dual feasibility violation is `returndata['Num_err'][1]`, and `returndata['Num_err'][2]` is the duality gap violation. In the following, we explain how we compute the violations of primal and dual feasibility in addition to that of the duality gap.

`returndata['Num_err'][0]` (primal feasibility violation) is computed as follows,

`returndata['Num_err'][1]` is equal to

`returndata['Num_err'][2]` is given by,

## 4. Tests

#### 4.1. Paradigmatic Gates

#### 4.1.1. Data

#### 4.1.2. Testing

`pid()` is called successively with different printing modes to compute them. The latter is coded into the script file `Testing/test_gates.py` in the GitHub directory. For all the gate distributions, the values of the partial information decomposition computed by `pid()` were equal to the actual values up to a precision error of order ${10}^{-9}$, and the slowest computation took less than a millisecond.

#### 4.1.3. Comparison with Other Estimators

#### 4.2. Copy Gate

#### 4.2.1. Data

#### 4.2.2. Testing

`pid()` is called to compute the partial information decomposition for each pair of $m,n$. Finally, the `returndata` dictionary is printed along with the running time of the Broja_2pid estimator and the deviations of `returndata['UIY']` and `returndata['UIZ']` from $H\left(Y\right)$ and $H\left(Z\right)$, respectively. The latter process is implemented in `Testing/test_large_copy.py`. The worst percentage deviation was at most ${10}^{-8}$ for any $m,n\le 90$.
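The reference values $H(Y)$ and $H(Z)$ are easy to compute directly. The following is a minimal sketch, assuming the usual Copy gate construction in which $X=(Y,Z)$ with $Y$ and $Z$ independent and uniformly distributed; the construction and the helper names `copy_gate` and `entropy` are assumptions and are not taken from `Testing/test_large_copy.py`.

```python
from collections import defaultdict
from math import log2

def copy_gate(size_y, size_z):
    """Joint pdf of (X, Y, Z), keyed by (x, y, z), with x = (y, z) and
    Y, Z independent and uniform (an assumed construction)."""
    prob = 1.0 / (size_y * size_z)
    return {((y, z), y, z): prob
            for y in range(size_y) for z in range(size_z)}

def entropy(pdf, index):
    """Shannon entropy in bits of the key coordinate `index` of `pdf`."""
    marginal = defaultdict(float)
    for key, prob in pdf.items():
        marginal[key[index]] += prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

pdf = copy_gate(4, 8)
h_y = entropy(pdf, 1)   # reference value for returndata['UIY']
h_z = entropy(pdf, 2)   # reference value for returndata['UIZ']
```

For uniform inputs the entropies reduce to the base-2 logarithms of the alphabet sizes, which gives a closed-form target for the deviations reported above.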

#### 4.2.3. Comparison with Other Estimators

#### 4.3. Random Probability Distributions

#### 4.3.1. Data

- (a) For Set 1, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=2$ and vary $\left|\mathbf{Z}\right|$ in $\{2,3,\dots ,14\}$. Then, for each value of $\left|\mathbf{Z}\right|$, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.
- (b) For Set 2, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Z}\right|=2$ and vary $\left|\mathbf{Y}\right|$ in $\{2,3,\dots ,14\}$. Then, for each value of $\left|\mathbf{Y}\right|$, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.
- (c) For Set 3, we fix $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|=s$ where $s\in \{8,9,\dots ,18\}$. Then, for each $s$, we sample uniformly at random 500 joint distributions of $(X,Y,Z)$ over the probability simplex.
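One standard way to sample uniformly from the probability simplex is to normalize independent Exponential(1) draws, which yields a flat Dirichlet distribution. The sketch below uses this technique with the Python standard library; it is not necessarily how the authors' script samples, and the function name `sample_uniform_pdf` and the fixed seed are assumptions.

```python
import random

def sample_uniform_pdf(size_x, size_y, size_z, rng=random):
    """Draw a joint pdf of (X, Y, Z), keyed by (x, y, z), uniformly at
    random from the probability simplex.

    Normalized i.i.d. Exponential(1) draws follow a flat Dirichlet
    distribution, i.e., the uniform distribution on the simplex.
    """
    weights = {(x, y, z): rng.expovariate(1.0)
               for x in range(size_x)
               for y in range(size_y)
               for z in range(size_z)}
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}

rng = random.Random(0)   # fixed seed only to make the sketch reproducible

# One group of Set 1: |X| = |Y| = 2, |Z| = 7, with 500 distributions.
set1_z7 = [sample_uniform_pdf(2, 2, 7, rng) for _ in range(500)]
```

Each sampled dictionary has the same format as the And gate input, so it can be handed to the estimator without conversion.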

#### 4.3.2. Testing

The testing is implemented in `Testing/test_large_randompdf.py`. The script takes as command-line arguments $\left|\mathbf{X}\right|,\left|\mathbf{Y}\right|,\left|\mathbf{Z}\right|$ and the number of joint distributions of $(X,Y,Z)$ the user wants to sample from the probability simplex. For example, if the user wants to create the instance of Set 1 with $\left|\mathbf{Z}\right|=7$, then the corresponding command line is `python3 test_large_randompdf.py 2 2 7 500`. The script outputs the `returndata` along with the running time of the Broja_2pid estimator for each distribution, and finally it prints the empirical average over all the distributions of $\mathrm{SI}(X;Y,Z)$, $\mathrm{UI}(X;Y\backslash Z)$, $\mathrm{UI}(X;Z\backslash Y)$, $\mathrm{CI}(X;Y,Z)$, and of the running time of the Broja_2pid estimator.

We use the `returndata['Num_err']` triplet to examine the quality of the solution, and the running time to analyze the efficiency of the estimator.

**Validation**. Sets 1 and 2 are mainly used to validate the solution of Broja_2pid. For Set 1, when $\left|\mathbf{Z}\right|$ is considerably larger than $\left|\mathbf{Y}\right|$, the amount of unique information that Y has about X is more likely to be small for any sampled joint distribution. Thus, for Set 1, the average $\mathrm{UI}(X;Y\backslash Z)$ is expected to decrease as the size of Z increases. For Set 2, $\mathrm{UI}(X;Y\backslash Z)$ is expected to increase as the size of Y increases, i.e., when $\left|\mathbf{Y}\right|$ is considerably larger than $\left|\mathbf{Z}\right|$. Broja_2pid shows such behavior of $\mathrm{UI}(X;Y\backslash Z)$ on the instances of Sets 1 and 2 (see Figure 6).

**Quality**. The estimator did well on most of the instances. The percentage of instances solved to optimality was at least $99\%$ for each size in any set of instances. In Figure 7, we plot the successfully solved instances against the maximum value of the numerical error triplet `returndata['Num_err']`. On the one hand, these plots show that, whenever an instance is solved successfully, the quality of the solution is good. On the other hand, we noticed that the duality gap, `returndata['Num_err'][2]`, was very large whenever the Cone Programming solver failed to find an optimal solution for an instance, i.e., when the primal feasibility, the dual feasibility, or the duality gap is violated. In addition, even when Broja_2pid fails to solve an instance to optimality, it will return a solution. (Broja_2pid raises an exception if the conic optimization solver fails to return a solution.) Thus, these results reflect the reliability of the solution returned by Broja_2pid.

**Efficiency**. To test the efficiency of Broja_2pid in the sense of running time, we looked at Set 3 because Sets 1 and 2 are small-scale systems. Set 3 has a large input size, mimicking large-scale systems. Testing Set 3 instances also reveals how the estimator empirically scales with the size of the input. Figure 8 shows that the running time of the Broja_2pid estimator on large instances was below 50 minutes. Furthermore, the estimator has a scaling of $\left|\mathbf{X}\right|\times \left|\mathbf{Y}\right|\times \left|\mathbf{Z}\right|$, so, on Set 3, it scales as ${N}^{3}$, where $N$ is the size of the input for the sampled distributions such that $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|=N$.

#### 4.3.3. Comparison with Other Estimators

**ComputeUI**: We ran computeUI with the default parameters, namely an optimality tolerance of ${10}^{-7}$, at most 1000 outer iterations, and at most 1000 inner iterations; for more details, see [7]. The estimator computeUI was slower than Broja_2pid on the instances of sizes $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\le 12$ and faster on the larger instances. For $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\le 12$, computeUI was slower than Broja_2pid by at least a factor of 1.4 and at most a factor of 1330. For $13\le \left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\le 17$, computeUI was faster than Broja_2pid by at least a factor of 3.2 and at most a factor of 39. See Figure 9 for the actual running times in both regimes. The comparison shows that computeUI scales better than Broja_2pid on large instances, whereas in the regime $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\le 12$, which is needed in practice, Broja_2pid scales better than computeUI.

The comparison is implemented in `Testing/test_from_file_broja_2pid_computeUI.py`, where the distributions in the folder `randompdfs/` are the inputs.

**Ibroja**: The estimator ibroja is slower than Broja_2pid on all instances. For $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\le 7$, ibroja was slower than Broja_2pid by at least a factor of 206 and at most a factor of 6626. Note that the factor increased as $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|$ increased. We did not compute the instances of sizes $\left|\mathbf{X}\right|=\left|\mathbf{Y}\right|=\left|\mathbf{Z}\right|\ge 8$, since ibroja started taking an immensely long time to obtain the solutions for these instances.

The comparison is implemented in `Testing/test_from_file_dit.py`, where the distributions in the folder `randompdfs/` are the inputs.

## 5. Cone Programming Model for Multivariate PID

**Proposition 3.**

**Proof.**

**Proposition 4.**

**Proof.**

## 6. Outlook

Passing `cone_solver="SCS"` to the function `pid()` will make our software use the SCS-based model instead of the ECOS-based one. (The models themselves are in fact different: SCS requires us to start from the dual exponential cone program (D-EXP).) SCS employs parallelized first-order methods which can be run on GPUs, so we expect a considerable speedup for large-scale problem instances.

## References

1. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy **2014**, 16, 2161–2183.
2. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Springer: Berlin, Germany, 2014; pp. 159–190.
3. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E **2013**, 87, 012130.
4. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv, 2010; arXiv:1004.2515.
5. Makkeh, A.; Theis, D.O.; Vicente, R. Bivariate Partial Information Decomposition: The Optimization Perspective. Entropy **2017**, 19, 530.
6. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
7. Banerjee, P.K.; Rauh, J.; Montúfar, G. Computing the Unique Information. arXiv, 2017; arXiv:1709.07487.
8. Chicharro, D. Quantifying multivariate redundancy with maximum entropy decompositions of mutual information. arXiv, 2017; arXiv:1708.03845.
9. Domahidi, A.; Chu, E.; Boyd, S. ECOS: An SOCP solver for embedded systems. In Proceedings of the European Control Conference (ECC), Zurich, Switzerland, 17–19 July 2013; pp. 3071–3076.
10. O’Donoghue, B.; Chu, E.; Parikh, N.; Boyd, S. SCS: Splitting Conic Solver, Version 1.2.7, 2016. Available online: https://github.com/cvxgrp/scs (accessed on 26 November 2017).
11. Luenberger, D.G.; Ye, Y. Linear and Nonlinear Programming; Springer: Berlin, Germany, 1984; Volume 2.
12. Gärtner, B.; Matousek, J. Approximation Algorithms and Semidefinite Programming; Springer: Berlin, Germany, 2012.
13. Chares, R. Cones and Interior-Point Algorithms for Structured Convex Optimization Involving Powers and Exponentials. Ph.D. Thesis, UCL-Université Catholique de Louvain, Louvain-la-Neuve, Belgium, 2009.
14. Makkeh, A. Applications of Optimization in Some Complex Systems. Ph.D. Thesis, University of Tartu, Tartu, Estonia, 2018. Forthcoming.

**Figure 1.** The ${\mathcal{K}}_{\mathrm{exp}}$ cone and its dual: (**a**) ${\mathcal{K}}_{\mathrm{exp}}$ for $-2\le r\le 0$ and $0\le q,t\le 2$; and (**b**) ${\mathcal{K}}_{\mathrm{exp}}^{\ast}$ for $-2\le u\le 0$ and $0\le w,v\le 2$.

**Figure 3.** Broja_2pid workflow: (**Left**) the flow in `pid()`; and (**Right**) the flow in ECOS. The arrows with oval tail indicate passing of data, whereas the ones with line tail indicate time flow.

**Figure 5.** For each $10\le m\le 60$ and $10\le m\le 90$, the time for estimator computeUI and Broja_2pid for computing BROJA PID for the Copy gate with $\mathbf{Y}=n$ and $\mathbf{Z}=m$ is shown. The instances were arranged in increasing order with respect to the value of ${m}^{2}{n}^{2}$.

**Figure 6.** For each group of instances in Sets 1 and 2: (**a**) $\mathrm{UI}(X;Y\backslash Z)$ of Set 1; and (**b**) $\mathrm{UI}(X;Y\backslash Z)$ of Set 2 show the instance with the largest $\mathrm{UI}(X;Y\backslash Z)$, the average value of $\mathrm{UI}(X;Y\backslash Z)$ for the instances, and the instance with the smallest $\mathrm{UI}(X;Y\backslash Z)$.

**Figure 7.** For each group of instances in Sets 1, 2, and 3: (**a**) maximum numerical error of Set 1; (**b**) maximum numerical error of Set 2; and (**c**) maximum numerical error of Set 3 show the instance with the largest ϵ, the average value of ϵ for the instances, and the instance with the smallest ϵ; where ϵ is the maximum numerical error.

**Figure 8.** For each group of instances in Set 3: (**a**) ${t}^{1/6}$ versus s; (**b**) ${t}^{1/3}$ versus s; and (**c**) $t{10}^{3}$ versus s show the slowest instance, the average value of running times, and the fastest instance; where the running time of Broja_2pid, t (secs), is scaled to ${t}^{1/6},{t}^{1/3},$ and $t{10}^{3}$, respectively.

**Figure 9.** For each group of instances in Set 3: (**a**) t versus $2\le s\le 12$; and (**b**) t versus $13\le s\le 17$ show the running times of computeUI and Broja_2pid; where the running time t is in seconds.

**Table 1.** Parameters (tolerances) of ECOS. It is not recommended to set the parameter `reltol` higher. For more explanation, see https://github.com/embotech/ecos.

Parameter | Description | Recommended Value |
---|---|---|
feastol | primal/dual feasibility tolerance | ${10}^{-7}$ |
abstol | absolute tolerance on duality gap | ${10}^{-6}$ |
reltol | relative tolerance on duality gap | ${10}^{-6}$ |
feastol_inacc | primal/dual infeasibility relaxed tolerance | ${10}^{-3}$ |
abstol_inacc | absolute relaxed tolerance on duality gap | ${10}^{-4}$ |
reltol_inacc | relaxed relative duality gap | ${10}^{-4}$ |
max_iter | maximum number of iterations that “ECOS” does | 100 |

**Table 2.** Description of the printing mode in `pid()`.

Output | Description |
---|---|
0 (default) | pid() prints its output (python dictionary, see Section 3.4). |
1 | In addition to output=0, pid() prints a flag when it starts preparing (EXP) and another flag when it calls the conic optimization solver. |
2 | In addition to output=1, pid() prints the conic optimization solver’s output. (The conic optimization solver usually prints out the problem statistics and the status of optimization.) |

**Table 3.** Description of `returndata`, the dictionary returned by `pid()`.

Key | Value |
---|---|
'SI' | Shared information, $\mathrm{SI}(X;Y,Z)$. (All information quantities are returned in bits.) |
'UIY' | Unique information of Y, $\mathrm{UI}(X;Y\backslash Z)$. |
'UIZ' | Unique information of Z, $\mathrm{UI}(X;Z\backslash Y)$. |
'CI' | Synergistic information, $\mathrm{CI}(X;Y,Z)$. |
'Num_err' | Information about the quality of the solution. |
'Solver' | Name of the solver used to optimize (CP). (In this version, we only use ECOS, but other solvers might be added in the future.) |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Makkeh, A.; Theis, D.O.; Vicente, R. BROJA-2PID: A Robust Estimator for Bivariate Partial Information Decomposition. *Entropy* **2018**, *20*, 271.
https://doi.org/10.3390/e20040271
