Distributed Quantization for Partially Cooperating Sensors Using the Information Bottleneck Method

This paper addresses the optimization of distributed compression in a sensor network with partial cooperation among sensors. The widely known Chief Executive Officer (CEO) problem, where each sensor has to compress its measurements locally in order to forward them over capacity-limited links to a common receiver, is extended by allowing sensors to mutually communicate. This extension comes along with modified statistical dependencies among the involved random variables compared to the original CEO problem, such that well-known outer and inner bounds do not hold anymore. Three different inter-sensor communication protocols are investigated. The successive broadcast approach allows each sensor to exploit instantaneous side-information of all previously transmitting sensors. As this leads to dimensionality problems for larger networks, a sequential point-to-point communication scheme is considered, forwarding instantaneous side-information to only one successor. Thirdly, a two-phase transmission protocol separates the information exchange between sensors from the communication with the common receiver. Inspired by algorithmic solutions for the original CEO problem, the sensors are optimized in a greedy manner. It turns out that partial communication among sensors improves the performance significantly. In particular, the two-phase transmission can reach the performance of a fully cooperative CEO scenario, where each sensor has access to all measurements and the knowledge about all channel conditions. Moreover, exchanging instantaneous side-information increases the robustness against bad Wyner–Ziv coding strategies, which can lead to significant performance losses in the original CEO problem.


Introduction
This contribution considers a special case of the distributed source coding problem where each sensor observes the same source signal. In order to forward their measurements over capacity-limited links to a common receiver, the sensors have to compress their measurements. In the case where direct communication among operating sensors is not possible, this problem is termed the CEO problem. Here, the compression at each sensor is optimized according to the Wyner-Ziv coding principle, exploiting only statistical side-information. Within this paper, the term CEO problem always stands for this non-cooperative CEO problem, which means that sensors cannot communicate with each other during runtime.
We extend this scenario and allow sensors to cooperate with each other by exchanging instantaneous side-information. The fully cooperative Chief Executive Officer (fcCEO) problem is obtained if sensors can forward their uncompressed observations over inter-sensor links to all other sensors. The partially cooperative Chief Executive Officer (pcCEO) problem represents a scenario where instantaneous side-information is compressed before it is forwarded to other sensors.

The Information Bottleneck Principle
The information bottleneck (IB) principle was first introduced by Tishby et al. in [22,23] and defines a clustering framework based on information theoretic measures. An overview of algorithmic solutions for this basic optimization problem is given in [23,24]. The IB principle finds application in various fields of communications [25–30].
The general IB setup is depicted in Figure 1. It contains the relevant process X, a noisy observation Y of X, and a compressed version Z of Y. The IB approach aims to optimize the mapping p(z|y) in order to preserve as much information about the relevant process X in Z as possible. More precisely, it tries to maximize the relevant mutual information I(X; Z) while fulfilling a rate constraint I(Y; Z) ≤ C. This general goal is summarized in Figure 2a. The optimization can be formulated as the maximization of the Lagrangian function

L{p(z|y)} = I(X; Z) − β · I(Y; Z). (1)

It turns out to be a non-convex optimization problem, since I(X; Z) and I(Y; Z) are both convex functions of the mapping p(z|y). The parameter β is a trade-off parameter steering the focus between the preservation of relevant information and the compression of the observation. In the case of β = 0, the focus lies only on the preservation of relevant information. By increasing β, the compression becomes more and more important, up to the case of β → ∞. Here, the functional in (1) becomes maximal if I(Y; Z) = 0, which means that all information is compressed to a single cluster. Therefore, the parameter β can be used to adjust the compression rate I(Y; Z) in order to fulfill a desired rate constraint I(Y; Z) ≤ C.
Since the compression-rate curve is a monotonically increasing function of 1/β, a simple bisection search can be applied. The optimization problem in (1) can be solved by taking the derivative with respect to the mapping p(z|y) and equating it to zero. This results in the implicit update equation

p(z|y) = p(z)/ψ(y, β) · exp(−d_β(y, z)) (2)

with the statistical distance

d_β(y, z) = (1/β) · D_KL( p(x|y) ‖ p(x|z) ), (3)

where ψ(y, β) ensures proper normalization. In the case of focusing solely on the preservation of relevant information with β = 0, the optimization algorithm yields a deterministic clustering p(z|y) ∈ {0, 1}. For β > 0, the clustering p(z|y) ∈ [0, 1] is generally stochastic. The IB method can easily be extended to multiple input values. A graphical tool for visualization is the IB graph [31]. Figure 2b illustrates an example where the observations y_1, y_2 are compressed into the cluster index z. The trapezoid represents the IB compression with respect to the relevant variable written inside the trapezoid. Exemplarily, a measurement y_m = x + w_m represents the relevant signal x corrupted by zero-mean white Gaussian measurement noise w_m with measurement signal-to-noise ratio (SNR) γ_m = σ_x²/σ_wm².
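As an illustrative sketch, the fixed-point iteration implied by (2) and (3) can be implemented with a few lines of NumPy. The joint pmf, the cluster count and all variable names below are illustrative assumptions, not values from the paper; β follows the paper's convention (small β preserves relevant information, large β compresses).

```python
import numpy as np

def scalar_ib(p_xy, n_clusters, beta, n_iter=300, seed=0):
    """Blahut-Arimoto-like iteration for the scalar IB problem.

    p_xy       : joint pmf p(x, y) as a (|X|, |Y|) array
    n_clusters : cardinality |Z| of the compressed variable
    beta       : trade-off parameter (must be > 0 in this sketch)
    Returns the stochastic mapping p(z|y) as a (|Z|, |Y|) array.
    """
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    p_y = p_xy.sum(axis=0)                      # marginal p(y)
    p_x_given_y = p_xy / p_y                    # columns hold p(x|y)
    q = rng.random((n_clusters, n_y))           # random init of p(z|y)
    q /= q.sum(axis=0)
    eps = 1e-12
    for _ in range(n_iter):
        p_z = q @ p_y                           # cluster prior p(z)
        p_xz = q @ p_xy.T                       # joint p(z, x)
        p_x_given_z = p_xz / np.maximum(p_z[:, None], eps)
        # KL(p(x|y) || p(x|z)) for every (z, y) pair, shape (|Z|, |Y|)
        log_ratio = (np.log(np.maximum(p_x_given_y.T[None], eps))
                     - np.log(np.maximum(p_x_given_z[:, None], eps)))
        kl = (p_x_given_y.T[None] * log_ratio).sum(axis=2)
        # update rule (2): p(z|y) ∝ p(z) exp(-KL/beta), in log-space
        logits = np.log(np.maximum(p_z, eps))[:, None] - kl / beta
        logits -= logits.max(axis=0)            # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=0)
    return q

# toy example: binary relevant signal observed through a noisy 4-ary channel
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.4, 0.3, 0.2, 0.1],
                        [0.1, 0.2, 0.3, 0.4]])
p_xy = p_x[:, None] * p_y_given_x
q = scalar_ib(p_xy, n_clusters=2, beta=0.05)
```

Small β pushes the exponent in (2) to dominate, so the resulting mapping approaches a deterministic clustering, in line with the β = 0 discussion above.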

Non-Cooperative Distributed Sensing System
Here, σ_x² and σ_wm² denote the signal and noise variances, respectively. In order to be able to forward the measurements over capacity-limited links with capacities C_1, …, C_M, each sensor has to compress its observations using a specific encoding process. More precisely, each sensor compresses its observations y_m to a cluster index z_m using the mapping p(z_m|y_m), leading to the Markov property

p(x, y_1, …, y_M, z_1, …, z_M) = p(x) · ∏_{m=1}^{M} p(y_m|x) · p(z_m|y_m). (4)

The encoding process contains a second lossy compression step if the mapping p(z_m|y_m) is stochastic, and lossless entropy coding if the mapping p(z_m|y_m) is deterministic. Therefore, a compressed version of the index z_m is transmitted without any further loss to the common receiver. The optimization of p(z_m|y_m) for each sensor is done offline.
The mathematical analysis of the CEO problem and the structure of its rate-region for discrete input alphabets and the log-loss distortion measure was presented in [9] and exploits (4). It was proved that the extreme points of the contra-polymatroid solution space can be determined by greedy algorithms such as the one described next. Since communication among sensors during run-time is not possible in this approach, the solution represents a lower bound on the performance of cooperative distributed compression in this paper. An algorithmic solution to the CEO problem has previously been proposed in [14,15] as the so-called Greedy Distributed Information Bottleneck (GDIB) algorithm. It is based on the inner bound of the CEO rate-region for the logarithmic loss distortion measure [9] and optimizes the quantization at the sensors successively. Replacing the logarithmic loss function H(X|Z) by the relevant mutual information I(X; Z) delivers the optimization problem

max_P I(X; Z)  s.t.  I(Y_S; Z_S | Z_S̄) ≤ ∑_{m∈S} C_m  ∀ S ⊆ {1, …, M}. (5)

The set P = {p(z_1|y_1), …, p(z_M|y_M)} contains all quantizer mappings, and S̄ denotes the complement of the set S. According to [9], the compression rates I(Y_S; Z_S | Z_S̄) are supermodular set functions with respect to the sets S [32], while the relevant information I(X; Z) does not depend on S. Therefore, the greedy optimization structure of the GDIB algorithm is optimal and finds the extreme points of the solution space. It has to be emphasized that, since the GDIB algorithm is based on the inner bound of the rate-region, it does not find the complete rate-region of the CEO problem. Following this approach, M IB-related Lagrangian optimization problems are obtained, one for each sensor.

L^(1) = I(X; Z_1) − β_1 · I(Y_1; Z_1) (6)

L^(m) = I(X; Z_≤m) − β_m · I(Y_m; Z_m | Z_<m), m = 2, …, M (7)
Obviously, the optimization problem of the first sensor resembles the optimization problem for the scalar IB problem given in (1), since there is no predecessor. Subsequent sensors exploit the mappings of previously designed quantizers as statistical side-information, leading to the well-known Wyner-Ziv coding strategy. Naturally, each Lagrange multiplier β_m has to be chosen such that the corresponding compression rate fulfills the individual rate constraint I(Y_m; Z_m | Z_<m) ≤ C_m. The objectives in (6) and (7) can be solved by equating the derivative with respect to the mapping p(z_m|y_m) to zero, delivering the update rule (8) with the exponent (9). Similar to the scalar IB optimization, the implicit expression in (8) can be solved using a Blahut-Arimoto-like algorithm, providing locally optimal solutions. It has to be mentioned that for asymmetric scenarios, this optimization has to be performed for all M! possible permutations of the optimization order to find the best solution. A detailed derivation and performance analysis of this algorithm can be found in [14,15]. If the capacity is equally distributed over all sensors in the network, e.g., sensors share the same channel in an orthogonal way and a round robin fashion, numerical results demonstrate that the GDIB algorithm outperforms an individual scalar IB optimization at each sensor. However, there is still a large gap to the performance of a fcCEO scenario, which is defined in Section 4. Moreover, in asymmetric scenarios, the performance highly depends on the optimization order. Although no clear conclusion about the optimal Wyner-Ziv coding strategy can be drawn, a good solution can be expected when starting the optimization with the best forward channel conditions, i.e., the lowest compression (highest compression rate).
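The bisection search for a rate-fulfilling Lagrange multiplier β_m mentioned above can be sketched as follows. Here, `rate_of_beta` is a hypothetical callback that runs the IB design for a given β and returns the resulting compression rate; the toy rate curve only stands in for a full design run.

```python
def rate_matching_beta(rate_of_beta, capacity, lo=1e-6, hi=1e3, n_iter=60):
    """Bisection for the Lagrange parameter beta.

    rate_of_beta : callback beta -> compression rate I(Y;Z), assumed
                   monotonically decreasing in beta (increasing in 1/beta)
    capacity     : link capacity C the rate must not exceed
    Returns a beta whose rate satisfies rate_of_beta(beta) <= capacity.
    """
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if rate_of_beta(mid) > capacity:
            lo = mid          # rate too high -> compress harder (larger beta)
        else:
            hi = mid          # constraint met -> try weaker compression
    return hi

# toy monotone rate curve standing in for a full IB design run
toy_rate = lambda beta: 2.0 / (1.0 + beta)
beta_c = rate_matching_beta(toy_rate, capacity=1.0)
```

Returning the upper end of the final interval guarantees the rate constraint is met while staying as close to the capacity as the tolerance allows.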

Fully Cooperative Distributed Sensing-A Centralized Quantization Approach
This section introduces the fcCEO scenario, which considers distributed sensors being able to forward their uncompressed observations to all other sensors in the network over ideal noiseless inter-sensor links. In this case, sensors can perfectly exchange their measurements y_m before they jointly compress the received signals, taking into account the rate constraints of all individual forward channels. Naturally, the exchange has to be done by a two-phase transmission protocol, consisting of a cooperation phase and a transmission phase. During the cooperation phase, sensors exchange information until every sensor knows the measurements y = [y_1 … y_M]^T of all M sensors. The actual forwarding of the compressed observations to the common receiver is performed during the transmission phase. This full cooperation is equivalent to a single central quantizer having access to all measurements y, as depicted in Figure 4. Applying the IB principle, this central quantizer can be designed to compress the vector y onto a cluster index z using the mapping p(z|y), which motivates the name centralized IB (CIB) for the algorithmic solution in a fcCEO scenario. The optimization problem can be formulated analogously to (1) and is solved using update Equation (2) with (3), substituting the scalar y by the vector y. The number of output clusters |Z| has to be chosen as |Z| = ∏_{m=1}^{M} |Z_m|, while the single link from the imaginary central quantizer to the receiver in Figure 4 has a channel capacity of C_sum = ∑_{m=1}^{M} C_m. The actual transmission over the M links has to be coordinated such that each sensor m transmits a specific part of the bits corresponding to its link capacity C_m.
In the special case of the measurement process being modeled as additive noise, the algorithm can be simplified to a scalar optimization problem, where maximum ratio combining of all inputs y_m delivers a scalar sufficient statistic of the desired relevant signal x with an overall SNR γ = ∑_m γ_m. The solution of the fcCEO scenario serves as an upper bound in this paper.
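The SNR addition under maximum ratio combining can be checked with a quick Monte-Carlo experiment; all signal parameters below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma_x2 = 1.0
gammas = np.array([2.0, 4.0, 8.0])        # per-sensor measurement SNRs (linear)
noise_var = sigma_x2 / gammas             # gamma_m = sigma_x^2 / sigma_wm^2

x = rng.normal(0.0, np.sqrt(sigma_x2), n)
w = rng.normal(0.0, 1.0, (len(gammas), n)) * np.sqrt(noise_var)[:, None]
y = x + w                                 # y_m = x + w_m

weights = 1.0 / noise_var                 # MRC weights proportional to 1/sigma_wm^2
y_mrc = weights @ y / weights.sum()       # scalar sufficient statistic
snr_emp = sigma_x2 / np.var(y_mrc - x)    # empirical overall SNR
# theory: combined noise variance is 1/sum(1/sigma_wm^2), so snr ~ gammas.sum()
```

The empirical SNR converges to γ = ∑_m γ_m = 14 in this setup, matching the analytical result stated above.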

Partially Cooperative Distributed Sensing
In order to investigate how the gap between non-cooperative and fully-cooperative distributed compression can be reduced, partially cooperating sensors shall now be considered. Partial cooperation means a limited exchange of instantaneous side-information among the sensors during runtime due to a rate-limitation of inter-sensor links. Non-cooperative CEO and fully-cooperative CEO problems represent the extreme cases for zero rate and unlimited rate inter-sensor links, respectively. The rate limitation requires the compression of instantaneous side-information before forwarding it to other sensors.
In this paper, only deterministic mappings are considered for this compression, while the indexes z_m are still obtained by stochastic mappings. This is motivated by the fact that deterministic mappings do not require further lossy compression, and the resulting side-information indices s_m can be exploited at other sensors by choosing a particular mapping p(z_m|y_m) from a list of possible mappings designed offline in advance. As a consequence, the compression rates for instantaneous side-information can only be adjusted by changing the cardinalities |S_m|. For all results presented below, inter-sensor links are modeled as bit pipes being able to deliver s_m reliably.
The GDIB algorithm to solve the non-cooperative CEO problem is based on the inner bound (5) of the CEO rate-region. Moreover, a greedy optimization approach is optimal due to the supermodularity of the compression rates in (5). Both require the Markovian structure in (4). However, cooperation among sensors changes the Markovian structure and implies different statistical dependencies among the involved random variables. As (4) does not hold anymore in pcCEO scenarios, the inner bound on the rate-region in (5) cannot be utilized to find solutions of the pcCEO scenario. To the authors' knowledge, tight bounds on the rate-region are not available for the cooperative case. Therefore, a heuristic approach based on the greedy optimization structure of the GDIB algorithm will be applied to solve the pcCEO scenario, which is not proven to be optimal. Nevertheless, the numerical evaluation of the found solutions demonstrates their usefulness. However, the computation of the required pmfs becomes more challenging and results in the recursive calculations given in Appendices A.1 and A.2.
This paper introduces three different inter-sensor communication protocols for exchanging this instantaneous side-information: successive broadcasting, successive point-to-point transmission and a two-phase transmission. The first two protocols perform the exchange of instantaneous side-information s_m with other sensors and the forwarding of compressed versions of z_m to the common receiver in the same time slot. In contrast, the two-phase transmission protocol separates the exchange of instantaneous side-information among sensors and the communication with the common receiver into two distinct phases. The latter starts after the exchange among sensors has been completed, such that all sensors have (approximately) the same amount of side-information.

Successive Broadcasting Protocol
The system model for the successive broadcasting protocol is illustrated in Figure 5. In the same time slot, sensor m−1 not only forwards a compressed version of the quantization index z_{m−1} to the common receiver, but also broadcasts instantaneous side-information s_{m−1} to all other sensors. However, due to the greedy optimization structure, only subsequent sensors can exploit this instantaneous side-information. Thus, sensor m can exploit the indices s_<m of all previously transmitting sensors in order to select its quantization index z_m as well as a new instantaneous side-information index s_m. This scenario leads to a correspondingly extended Markov model.

Generation of Broadcast Side-Information
The design of p(s_m|y_m, s_<m) is inspired by the general GDIB algorithm, i.e., the optimization is done in a greedy manner. Again, one optimization problem emerges for each sensor, given in (12) and (13). The optimization problem of the first sensor equals the individual scalar optimization without any side-information, as described in Section 2. Subsequent sensors combine the instantaneous side-information s_<m of all previously transmitting sensors with their observation y_m. The relevant mutual information and the compression rate of sensor m are conditioned on S_<m, since broadcasting instantaneous side-information ensures that all successive sensors have access to s_<m, allowing Wyner-Ziv coding for generating s_m. Each optimization problem given in (12) and (13) can be solved by taking the derivative with respect to the mapping p(s_m|y_m, s_<m) and equating it to zero. This results in the implicit update equation (14) with the exponent (15). As in the general GDIB algorithm, the implicit update equation in (14) can be solved using a Blahut-Arimoto-like algorithm, resulting in locally optimal solutions.

Algorithmic pcCEO Solution for the Successive Broadcasting Protocol
After designing the mapping for the instantaneous side-information, the mapping p(z_m|y_m, s_<m) can be optimized, again by means of the IB principle. Therefore, the original GDIB algorithm is modified to exploit the broadcast instantaneous side-information, defining the GDIB-BC algorithm. The optimization problem for each sensor is given in (16) and (17). The main difference to the original GDIB optimization problem in (6) and (7) lies in the definition of the compression rate I(Y_m, S_<m; Z_m | Z_<m), which emerges from the combination of the observation y_m and the instantaneous side-information s_<m. Taking the derivative of the optimization problem for sensor m with respect to the mapping p(z_m|y_m, s_<m) and equating it to zero delivers the implicit update equation (18) with the statistical distance (19). Again, the implicit update equation in (18) can be solved using a Blahut-Arimoto-like algorithm. The extended Blahut-Arimoto-like algorithm to design the mapping p(z_m|y_m, s_<m) of sensor m for a specific Lagrange parameter β_m and instantaneous side-information s_<m is given in Algorithm 1. The input pmf p(y_{m−1}, s_{<m−1}, z_{<m−1}, x) can be computed during the optimization of previous sensors. Lines 3 to 5 determine the required pmfs for the calculation of the KL-divergence of (19) in lines 6 to 9. The statistical distance of (19) is determined in lines 10 to 14. It is used to update the quantizer mapping p(z_m|y_m, s_<m) of sensor m. This procedure is repeated until no significant changes of the desired mappings occur anymore. The algorithm returns the updated mapping p(z_m|y_m, s_<m) as well as the pmf p(y_m, s_<m, z_<m, x), which is used as an input for the successive sensor.
The parameter β_m, which determines the compression rate at sensor m, has to be adjusted such that I(Y_m, S_<m; Z_m | Z_<m) ≤ C_m is fulfilled. Similar to the original GDIB algorithm, the GDIB-BC algorithm has to be performed for each sensor and all possible optimization orders. Figure 7 illustrates the amount of instantaneous side-information available at the different sensors in a network of size M = 6 considering the broadcast of side-information. It depicts the relevant mutual information I(X; S_≤m) versus the sensor number m for different cardinalities |S_m| and SNRs γ_m. The relevant signal is chosen to be a uniformly distributed 4-ASK signal, leading to |X| = 4. As expected, the amount of available instantaneous side-information increases with each additional sensor for all |S_m| and γ_m. To be more specific, the resolution and the quality of the instantaneous side-information available at sensor m increase with growing m. In the considered symmetric scenario, the amount of information I(X; S_m | S_<m) a sensor can contribute to I(X; S_≤m) gets smaller for each additional sensor, and the slopes of the curves decrease. Since one bit is not enough to represent the information of |X| = 4, the largest gain can be observed between |S_m| = 2 and |S_m| = 4. Increasing the cardinality further to |S_m| = 8 results in only a small additional improvement. Certainly, this observation depends on the relevant signal X and cannot be generalized. The gray colored area represents the non-achievable region, since I(X; Z) cannot exceed I(X; Y) due to the data-processing inequality. Figures 8 and 9 consider a scenario where all sensors in the network share the same channel to the common receiver with a fixed sum-rate C_sum in an orthogonal way and a round robin fashion. Consequently, larger network sizes correspond to smaller individual capacities C_m = C_sum/M for each forward link.
The performance of partially cooperating sensors broadcasting instantaneous side-information (pcCEO-BC) is compared to the non-cooperative case (CEO) of Section 3 and the fully cooperative case (fcCEO) of Section 4. As already mentioned, these two scenarios provide lower and upper bounds, respectively. In general, it can be observed that increasing the number of sensors in the network also increases the overall relevant mutual information I(X; Z). This holds even for the case without cooperation, since each sensor applies Wyner-Ziv coding and exploits the mappings of previously designed quantizers as statistical side-information [14]. Independent of the cardinality |S_m|, the performance of the pcCEO-BC scenario is superior to the case without cooperation among sensors. This difference grows for larger network sizes because the amount of information s_<m carries about the relevant variable x increases. As expected from Figure 7, increasing the cardinality |S_m| improves not only the relevant information I(X; S_≤m), but also the overall performance measured by I(X; Z). However, it can be observed that even for large |S_m| there remains a gap to the fcCEO upper bound, especially for smaller network sizes or lower SNRs. This gap can be explained by the successive transmission protocol, resulting in a gradually increasing amount of instantaneous side-information at the sensors. For instance, the first sensor does not profit at all from the partial cooperation, in contrast to the fcCEO scenario where all sensors exploit almost the same amount of side-information. Considering the pmfs in Algorithm 1, it becomes obvious that larger networks might suffer from the curse of dimensionality. More precisely, pmfs like p(y_m, s_<m, z_<m, x) can become very large during the optimization for larger network sizes.
Moreover, the mapping p(z_m|y_m, s_<m) also depends on the network size, i.e., this problem not only occurs during the optimization, but also when storing the already optimized mapping. This numerical issue is the reason why there is no result for |S_m| = 8 and a network size of M = 6 in Figures 8 and 9. In this case, 2024 GiB (1 GiB = 1024 MiB, 1 MiB = 1024 KiB, 1 KiB = 1024 byte) are required just for storing a single instance of the pmf p(y_m, s_<m, z_<m, x).
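The dimensionality growth can be made concrete with a small helper. The cardinalities below are illustrative assumptions (the paper does not restate |Y_m| and |Z_m| here), so the result only indicates the order of magnitude of the reported 2024 GiB, not the exact figure.

```python
def pmf_size_gib(card_y, card_s, card_z, card_x, m, bytes_per_entry=8):
    """GiB needed to store a dense pmf p(y_m, s_<m, z_<m, x) at sensor m
    in double precision: one entry per tuple (y_m, s_1..s_{m-1},
    z_1..z_{m-1}, x), i.e., |Y| * |S|^(m-1) * |Z|^(m-1) * |X| entries."""
    entries = card_y * card_s ** (m - 1) * card_z ** (m - 1) * card_x
    return entries * bytes_per_entry / 1024 ** 3

# assumed cardinalities: |Y_m| = 64, |S_m| = 8, |Z_m| = 8, |X| = 4, sensor m = 6
size = pmf_size_gib(64, 8, 8, 4, 6)   # -> 2048.0 GiB, same order as reported
```

The exponential terms |S|^(m−1) and |Z|^(m−1) are the source of the curse of dimensionality discussed above; reducing |S_m| from 8 to 2 shrinks the footprint by a factor of 4^5.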

Successive Point-to-Point Protocol
For larger network sizes, broadcasting side-information might not be feasible anymore, since the dimensions of the mappings p(z_m|y_m, s_<m) and p(s_m|y_m, s_<m) as well as intermediate pmfs used within the optimization become huge. In order to relax this curse of dimensionality, the successive way of cooperation is exploited, and the instantaneous side-information of sensor m shall only be forwarded to the direct successor m+1, as depicted in Figure 10. Hence, a sequential chain is established from the first to the last sensor, leading to a correspondingly modified Markov model. Again, the instantaneous side-information is obtained by a deterministic mapping optimized by means of the information bottleneck principle, illustrated in Figure 11. With each step in the sequential chain, the information s_m carries about the relevant signal x increases.
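At runtime, the point-to-point chain amounts to a simple sequential pass over the sensors. The lookup tables below are hypothetical stand-ins for mappings designed offline: a stochastic quantizer mapping p(z_m|y_m, s_{m−1}) and a deterministic side-information mapping.

```python
import numpy as np

def run_chain(ys, p_z, s_map, rng):
    """One runtime pass of the successive point-to-point protocol.

    ys    : list of observed (already discretized) measurement indices y_m
    p_z   : p_z[m][s_prev, y] is the pmf p(z_m | y_m, s_{m-1}) over z_m
    s_map : s_map[m][s_prev, y] is the deterministic side-info index s_m
    """
    zs, s_prev = [], 0                               # first sensor: no side-info
    for m, y in enumerate(ys):
        pmf = p_z[m][s_prev, y]
        zs.append(int(rng.choice(len(pmf), p=pmf)))  # stochastic quantization
        s_prev = int(s_map[m][s_prev, y])            # forwarded to successor
    return zs

# toy tables for M = 2 sensors, |Y| = |Z| = |S| = 2 (deterministic for checking)
eye = np.eye(2)
p_z = [np.stack([eye, eye])] * 2          # p(z|y,s): one-hot at z = y
s_map = [np.array([[0, 1], [0, 1]])] * 2  # s_m = y_m
zs = run_chain([1, 0], p_z, s_map, np.random.default_rng(0))
# -> [1, 0]
```

Only the single index s_{m−1} travels along the chain, which is exactly why the per-sensor table sizes stay fixed as the network grows, in contrast to the broadcast protocol.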

Generation of Point-to-Point Side-Information
Similar to the broadcast case, the design of p(s_m|y_m, s_{m−1}) is inspired by the original GDIB algorithm. The optimization problem can be formulated in a greedy manner, where Equation (21) equals the individual scalar optimization without any side-information. Subsequent sensors combine the instantaneous side-information s_{m−1} sent by the previous sensor with their observation y_m. In contrast to the broadcast case, the relevant mutual information is not conditioned on S_<m as in (12) and (13). Taking the derivative with respect to the mapping p(s_m|y_m, s_{m−1}) and equating it to zero delivers the implicit update Equation (23) with the corresponding exponent. As in the broadcast case, using a Blahut-Arimoto-like algorithm to solve the update Equation (23) results in locally optimal solutions.

Algorithmic pcCEO Solution Applying the Successive Point-to-Point Protocol
After the optimization of the mapping for instantaneous side-information, the mapping p(z_m|y_m, s_{m−1}) can be designed by means of the information bottleneck principle. Inspired by the original GDIB algorithm, the optimization problem can be formulated analogously. The main difference to the original GDIB optimization problem in (6) and (7) is the definition of the compression rate I(Y_m, S_{m−1}; Z_m | Z_<m). Thus, the mapping p(z_m|y_m, s_{m−1}) can be optimized using a Blahut-Arimoto-like algorithm. The specific algorithm for a given sensor m and a Lagrange parameter β_m is given in Algorithm 2. The input pmfs p(z_i|y_i, s_{i−1}) ∀i < m and p(s_i|z_≤i, x) ∀i < m as well as p(z_{<m−1}, x) are calculated in advance by previous sensor optimizations. Lines 3 to 7 calculate the required pmfs as given in Appendix A.2. The KL-divergence is calculated in lines 8 to 11. Using this, the statistical distance d_{β_m}(z_m, y_m, s_{m−1}) of (28) can be calculated in lines 12 to 16, which is then used to update the quantizer mapping p(z_m|y_m, s_{m−1}). The algorithm stops if this mapping does not change significantly anymore during subsequent iterations. Finally, the output pmfs p(s_m|z_≤m, x) and p(z_<m, x) need to be calculated in lines 21 to 25 for their usage in the optimization of the next sensor. Similar to the original GDIB algorithm, the optimization needs to be done for all possible optimization orders. A simple bisection search can be applied to find the rate-fulfilling parameter β_m such that I(Y_m, S_{m−1}; Z_m | Z_<m) ≤ C_m holds. Figure 12 illustrates the amount of instantaneous side-information I(X; S_m) at a specific sensor m in a network of size M = 6 using the successive point-to-point transmission protocol for different cardinalities |S_m|. Obviously, I(X; S_m) increases with each further sensor. The main difference to the broadcast case is that the instantaneous side-information provided to sensor m is represented by a single highly compressed index s_{m−1} with cardinality |S_{m−1}|.
While the resolution |S_<m| of the available instantaneous side-information s_<m increases with m in the broadcast case, it remains the same for the successive point-to-point protocol. Therefore, a higher cardinality |S_m| is required compared to the broadcast case to avoid additional compression losses. Figures 13 and 14 illustrate the overall performance of the pcCEO system with point-to-point exchanged instantaneous side-information, where all sensors share the same channel to the common receiver with a fixed sum-rate C_sum = ∑_{m=1}^{M} C_m in an orthogonal way and a round robin fashion. Again, the black curves represent the non-cooperative CEO scenario and the fcCEO scenario. Hence, they serve as lower and upper bounds, respectively. In general, the curves are very similar to those for broadcasting instantaneous side-information in Figures 8 and 9. Independent of the SNR or the sum-rate C_sum, the relevant mutual information I(X; Z) increases for larger networks, and even a single bit of instantaneous side-information (|S_m| = 2) leads to slight improvements compared to the non-cooperative case. However, there still remains a gap to the fcCEO scenario even for large |S_m|, which results from the successive communication strategy, since sensors at the beginning of the optimization chain can exploit no or little instantaneous side-information. Figure 15 illustrates the influence of the sum-rate C_sum = ∑_{m=1}^{M} C_m for a scenario with M = 5 sensors. Naturally, larger sum-rates correlate with higher individual link capacities. Again, CEO and fcCEO scenarios provide lower and upper bounds, respectively.

Performance for Different Sum-Rates
For a cardinality of |S_m| = 2, only a small gain compared to the non-cooperative CEO scenario can be observed. However, the gain becomes more and more significant with increasing |S_m|. Comparing the results to the upper fcCEO bound reveals the loss due to the limited side-information available at early transmitting sensors. The largest difference can be observed for sum-rates 2 ≤ C_sum ≤ 4 bit/s/Hz.

Asymmetric Scenarios
An important aspect is the investigation of asymmetric scenarios. As the achievable relevant information I(X; Z) of a non-cooperative CEO scenario is very sensitive to the optimization order, i.e., the Wyner-Ziv coding strategy, in asymmetric scenarios [14], the question arises whether the exchange of instantaneous side-information can improve the robustness against bad optimization orders. Therefore, the same two asymmetric setups as in [14] are analyzed. Scenario 1 considers the case where sensors with low SNRs γ_m have low link capacities C_m, while sensors with high SNRs γ_m have high link capacities C_m. Scenario 2 considers the opposite case, where sensors with low SNRs have high link capacities and vice versa. Figure 16 illustrates the relevant mutual information I(X; Z) for all M! = 24 sensor permutations for a network of M = 4 sensors. The dots represent the results from [14] for a non-cooperative CEO scenario. Blue dots show Scenario 1, while red dots represent Scenario 2. The results for the pcCEO scenario with successive point-to-point side-information exchange are depicted as bars.
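Enumerating all M! optimization orders, as done for Figure 16, can be sketched in a few lines; `evaluate_order` is a hypothetical callback that runs the full greedy design for one permutation and returns the achieved relevant information, and the toy score below only stands in for such a run.

```python
import itertools

def best_wyner_ziv_order(n_sensors, evaluate_order):
    """Exhaustively score all M! optimization orders.

    evaluate_order : callback mapping a sensor permutation to the
                     achieved relevant information I(X;Z)
    Returns the best order, its score, and the scores of all permutations.
    """
    scores = {perm: evaluate_order(perm)
              for perm in itertools.permutations(range(n_sensors))}
    best = max(scores, key=scores.get)
    return best, scores[best], scores

# toy score: prefer orders that start with the highest sensor indices
toy_score = lambda perm: perm[0] + 0.1 * perm[1]
order, score, scores = best_wyner_ziv_order(4, toy_score)
```

For M = 4 this evaluates 24 permutations; the factorial growth is why the exhaustive order search is only practical for small networks.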
Comparing the non-cooperative case with the successive point-to-point exchange of side-information for Scenario 1, we observe a slight increase of the overall relevant mutual information I(X ; Z) for partial cooperation and this particular scenario. Moreover, the influence of the Wyner-Ziv coding strategy (optimization order) becomes smaller due to cooperation. The performance for Scenario 2 is worse than the performance for Scenario 1, again for both the cooperative and the non-cooperative case. In this scenario, accurate measurements have to be strongly compressed in order to forward them to the common receiver while unreliable measurements cannot contribute much to the overall performance although they can be forwarded to the common receiver at high rates. However, the loss due to bad optimization orders is much lower for partial cooperation. A sensor with a bad forward channel and a high SNR can still forward its information to the next sensor, which might have a better forward channel. Therefore, exchanging instantaneous side-information can improve the robustness against bad optimization orders.

Two-Phase Transmission Protocol with Artificial Side-Information
Previous subsections revealed that partial cooperation by exchanging instantaneous side-information improves the overall performance. However, a gap to the fcCEO scenario still remains, and we claimed that the successive exchange of instantaneous side-information is the reason for this difference. Due to the sequential forwarding protocols considered so far, early sensors have no or little instantaneous side-information. They hardly profit from the cooperation, as opposed to the full cooperation case where all sensors have access to the complete information. In order to substantiate this statement, a third transmission protocol consisting of two phases is considered. Inspired by the fcCEO scenario, the first cooperation phase is used to exchange instantaneous side-information between all sensors, while the transmission phase is used to forward the information to the common receiver in the usual way. The difference to the fcCEO scenario is that only compressed versions of the observations can be exchanged during the cooperation phase.
For simplicity, we assume that each sensor obtains the same instantaneous side-information represented by s*, independent of its position in the optimization chain, see Figure 17. Moreover, we pursue the EXIT chart philosophy [33], where extrinsic information is artificially created to analyze the information exchange between decoders in concatenated coding schemes. In the pcCEO context, the artificial side-information can be interpreted as extrinsic information about the relevant signal x, generated by adding AWGN to x. The noise variance is adapted to obtain a specific SNR γ_extr or, equivalently, a desired mutual side-information I(X ; S*). It has to be emphasized that γ_extr can be chosen independently of the measurement SNRs at the sensors in order to obtain general conclusions. Since the instantaneous side-information is created artificially, s* is assumed to be conditionally independent of the indexes y_m given the relevant signal x, i.e., p(y_m, s*|x) = p(y_m|x)p(s*|x) holds. This simplifies the Markovian structure of the optimization problem, which then equals that of the original CEO problem. With the same argumentation as in the original CEO problem, we claim that supermodularity holds and the greedy optimization structure is optimal. The resulting modified optimization problem can be solved using the same strategy as described in Sections 5.1 and 5.2, leading to an implicit update equation for the quantizer mappings.

Figure 18 illustrates the same experiment as in Figure 8 or Figure 13, but for the two-phase transmission protocol. The extrinsic information is chosen independently of the measurement SNR and has its own SNR γ_extr, represented by different colors in Figure 18. The cardinality of the extrinsic information is chosen as |S*| = 512 so as not to introduce any compression losses. As before, the black dashed line represents the fcCEO scenario.
The curve for γ_extr = γ_m = 8 dB represents the case where each sensor forwards instantaneous side-information whose quality corresponds to its measurement SNR. We observe the same performance as for the fcCEO scenario. This demonstrates that the remaining performance gap to the fcCEO scenario disappears completely for appropriate cooperation among sensors. Naturally, decreasing the SNR of the extrinsic information γ_extr, or equivalently I(X ; S*), leads to a lower overall performance I(X ; Z).
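The generation of the artificial side-information described above can be sketched in a few lines. For a Gaussian relevant signal x, the mutual side-information I(X ; S*) obtained by adding AWGN at SNR γ_extr follows the familiar Gaussian formula 0.5 · log2(1 + γ_extr). This is a minimal illustrative sketch; the function and variable names are our own, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def artificial_side_info(x, snr_db):
    """Create artificial side-information s* by adding AWGN to the
    relevant signal x such that the resulting SNR equals snr_db."""
    snr_lin = 10.0 ** (snr_db / 10.0)
    noise_var = np.var(x) / snr_lin             # adapt noise variance to target SNR
    return x + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)

def gaussian_mi_bits(snr_db):
    """I(X; S*) in bits for a Gaussian x observed through AWGN."""
    return 0.5 * np.log2(1.0 + 10.0 ** (snr_db / 10.0))

x = rng.standard_normal(100_000)                # unit-variance relevant signal
s_star = artificial_side_info(x, snr_db=8.0)    # gamma_extr = 8 dB
print(f"I(X; S*) = {gaussian_mi_bits(8.0):.3f} bit")
```

Sweeping `snr_db` over a range then yields the desired mutual side-information values I(X ; S*) independently of the measurement SNRs, as exploited in the experiments below.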

Influence of Extrinsic Information
The influence of extrinsic information is depicted in Figure 19 for γ_m = 8 dB and γ_m = 3 dB. To this end, the overall relevant mutual information I(X ; Z) is plotted versus the mutual information of the extrinsic information I(X ; S*) for different network sizes. As before, all sensors share the same forward channel with C_sum = 2.5 bit/s/Hz and C_m = C_sum/M. Providing no extrinsic information, i.e., I(X ; S*) = 0, delivers the same result as the non-cooperative CEO scenario of Section 3. Naturally, enhancing the quality of the extrinsic information increases the overall relevant mutual information I(X ; Z) up to the maximum of 2 bit/s/Hz.

Conclusions
This paper extends the non-cooperative CEO scenario by allowing partial cooperation among sensors in the network. To this end, it extends the algorithmic solution introduced in [14] to three different inter-sensor communication protocols: successive broadcasting, successive point-to-point communication and a two-phase transmission protocol. The first two protocols perform the exchange of instantaneous side-information and the forwarding of information to the common receiver in the same time step. Here, successive broadcasting exploits the instantaneous side-information of all previous sensors within the optimization chain. Since this may cause dimensionality problems during the optimization, the successive point-to-point transmission protocol forwards the instantaneous side-information only to the next sensor. It turns out that allowing this partial communication outperforms the non-cooperative compression where no communication among sensors is possible. Moreover, cooperative compression shows a larger robustness to suboptimal Wyner-Ziv coding strategies in asymmetric scenarios. However, a small performance gap to the fcCEO scenario still remains for the proposed successive broadcasting and successive point-to-point transmission protocols. This gap can be closed by a third protocol separating the cooperation phase from the forwarding phase and allowing each sensor to access the maximal available side-information. Although no formal conclusion about the optimality of the pcCEO solution can be drawn, the closeness to the fcCEO scenario in the investigated simulations reveals that the solutions found by the proposed greedy algorithms are at least close to optimal.
The variational problem can be solved by taking the derivative with respect to the mapping p(z_m|y_m, s_{<m}) and equating it to zero. In the following, the derivatives of both mutual information terms are given. Exploiting Bayes' theorem, the argument of the logarithmic function can be rewritten as
$$
\frac{p(z_m \mid x, z_{<m})}{p(z_m \mid z_{<m})}
= \frac{p(x \mid z_{\le m})}{p(x \mid z_{<m})}
= \frac{p(x \mid z_{\le m})}{p(x \mid y_m, s_{<m}, z_{<m})}
\cdot \frac{p(x \mid y_m, s_{<m}, z_{<m})}{p(x \mid z_{<m})} .
$$
The last ratio in (A4) can be dropped because it does not depend on p(z_m|y_m, s_{<m}); its contribution can be incorporated into the Lagrange multiplier β_m. Inserting the first ratio into (A3) yields the contribution of the derivative of the relevant mutual information.

Finally, the required pmf to calculate the conditional expectation in (A26) can be determined by
$$
p(z_{<m} \mid y_m, s_{m-1})
= \frac{\sum_{x} p(x, y_m, s_{m-1}, z_{<m})}{\sum_{x} \sum_{z_{<m}} p(x, y_m, s_{m-1}, z_{<m})}
\quad \text{(A37)}
$$
with p(x, y_m, s_{m-1}, z_{<m}) already defined in (A28). Note that all of the above equations simplify to the scalar IB equations given in Section 2 when optimizing the first sensor. Moreover, when optimizing the second sensor, there is no pre-predecessor m − 2 and its impact on the above equations can be omitted.
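To make the last remark concrete, the scalar IB fixed point to which the above equations reduce for the first sensor can be sketched as a generic self-consistent iteration for the mapping p(z|y). This is only an illustrative sketch of the standard scalar IB update; the names, discretization and iteration budget are our own choices, not taken from the paper:

```python
import numpy as np

def scalar_ib(p_xy, beta, num_z, iters=200, seed=0):
    """Generic scalar Information Bottleneck fixed-point iteration.

    p_xy : joint pmf p(x, y) as a 2-D array (rows: x, columns: y)
    beta : Lagrange multiplier trading compression against relevance
    Returns the compression mapping p(z|y) of shape (num_z, |Y|)."""
    rng = np.random.default_rng(seed)
    p_y = p_xy.sum(axis=0)                         # marginal p(y)
    p_x_given_y = p_xy / p_y                       # conditional p(x|y)
    p_z_given_y = rng.random((num_z, p_xy.shape[1]))
    p_z_given_y /= p_z_given_y.sum(axis=0)         # random stochastic init
    for _ in range(iters):
        p_z = np.maximum(p_z_given_y @ p_y, 1e-300)   # cluster marginal p(z)
        # decoder distribution p(x|z) = sum_y p(x,y) p(z|y) / p(z)
        p_x_given_z = (p_xy @ p_z_given_y.T) / p_z
        # KL divergence D(p(x|y) || p(x|z)) for every (z, y) pair
        kl = np.empty((num_z, p_xy.shape[1]))
        for z in range(num_z):
            ref = np.maximum(p_x_given_z[:, [z]], 1e-300)
            kl[z] = np.where(p_x_given_y > 0,
                             p_x_given_y * np.log(p_x_given_y / ref),
                             0.0).sum(axis=0)
        # implicit update: p(z|y) proportional to p(z) * exp(-beta * KL)
        p_z_given_y = p_z[:, None] * np.exp(-beta * kl)
        p_z_given_y /= p_z_given_y.sum(axis=0)
    return p_z_given_y

# toy example: binary relevant signal x observed through a noisy binary y
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
mapping = scalar_ib(p_xy, beta=5.0, num_z=2)
```

The greedy pcCEO optimization in the paper iterates structurally similar update equations, with the decoder distribution additionally conditioned on the indexes z_{<m} of previously optimized sensors and on the side-information s_{<m}.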