Zero Delay Joint Source Channel Coding for Multivariate Gaussian Sources over Orthogonal Gaussian Channels


Abstract: Communication of a multivariate Gaussian source transmitted over orthogonal additive white Gaussian noise channels using delay-free joint source channel codes (JSCC) is studied in this paper. Two scenarios are considered: (1) all components of the multivariate Gaussian are transmitted by one encoder as a vector, or by several ideally collaborating nodes in a network; (2) the multivariate Gaussian is transmitted through distributed nodes in a sensor network. In both scenarios, the goal is to recover all components of the multivariate Gaussian at the receiver. The paper investigates a subset of JSCC consisting of direct source-to-channel mappings that operate on a symbol-by-symbol basis to ensure zero coding delay. A theoretical analysis that helps explain and quantify distortion behavior for such JSCC is given. Relevant performance bounds for the network are also derived with no constraints on complexity and delay. Optimal linear schemes for both scenarios are presented. Results for Scenario 1 show that linear mappings perform well, except when correlation is high. In Scenario 2, linear mappings provide no gain from correlation when the channel signal-to-noise ratio (SNR) gets large. The gap to the performance upper bound is large for both scenarios, regardless of SNR, when the correlation is high. The main contribution of this paper is the investigation of nonlinear mappings for both scenarios. It is shown that nonlinear mappings can provide substantial gain compared to optimal linear schemes when correlation is high. Contrary to linear mappings for Scenario 2, carefully designed nonlinear mappings can exploit the source correlation as the SNR grows.

Introduction
We study the problem of transmitting a multivariate Gaussian source over orthogonal additive white Gaussian noise channels with joint source channel codes (JSCC), where the source and channel dimensions, M, are equal. We place special emphasis on delay-free codes. That is, we require the JSCC to operate on a symbol-by-symbol basis. Two scenarios are considered: (1) the multivariate Gaussian is communicated as an M-dimensional vector source by one encoder over M parallel channels or M channel uses. This scenario can also be seen as ideally collaborating nodes in a network ("ideal collaboration" means that all nodes have access to a noiseless version of all the other node observations without any additional cost), communicating over equal and independent channels (see Figure 1a).
(2) The multivariate Gaussian is communicated by M distributed (i.e., non-collaborating) sensor nodes with correlated measurements in a sensor network. That is, each node encodes one component of the multivariate Gaussian (see Figure 1b). Scenario 2 can be seen as a special case of Scenario 1. In a more practical setting, Scenarios 1 and 2 may, for instance, represent several wired or wireless in- or on-body sensors in a body area network communicating with a common off-body receiver. Communication problems of this nature have been investigated for several decades. For lossless channels, it was proven in [1] that distributed lossless coding of finite alphabet correlated sources can be as rate efficient as with full collaboration between the sensor nodes. This result assumes no restriction on complexity and delay. It is not known whether a similar conclusion holds in the finite complexity and delay regime. For lossy source coding of a Gaussian vector source (Scenario 1), the rate-distortion function was determined in [2]. For lossy distributed source coding (Scenario 2), the rate-distortion function for recovering information common to all sources was solved in [3]. For the case of recovering Gaussian sources (both common, as well as individual, information) from two terminals, the exact rate-distortion region was determined by [4,5]. The multi-terminal rate-distortion region is still unknown, although several efforts towards a solution have been made in [4,6].
If the channel between the source and sink is lossy, system performance is generally evaluated in terms of tradeoffs between cost on the channel, for example, transmit power, and the end-to-end distortion of the source. Considering Scenario 1, the bounds can be found by equating the rate-distortion function for vector sources with the Gaussian channel capacity. These bounds can be achieved by separate source and channel coding (SSCC), assuming infinite complexity and delay. Considering Scenario 2, the bound is determined, in the case of two sensor nodes, by combining the rate-distortion region in [4,5] with the Gaussian channel capacity. This bound is achieved through SSCC by vector quantizing each source, then applying Slepian-Wolf coding [1], followed by capacity-achieving channel codes [7].
Optimality of the aforementioned SSCC schemes comes at the expense of complexity and infinite coding delays. If the application has low complexity and delay requirements, it may be beneficial to apply JSCC. Several such schemes have been investigated in the literature: For Scenario 2, a simple nonlinear zero delay JSCC scheme for transmitting a single random variable observed through several noisy sensor measurements over a noisy wireless channel was suggested in [8]. Similar JSCC schemes for communication of two or more correlated Gaussian random variables over noisy wireless channels were proposed and optimized in [9]. An extension of the scheme suggested in [8], using multidimensional lattices to code blocks of samples, was proposed in [10]. Further, [11] found similar JSCC using variational calculus, and [12] introduced a distributed Karhunen-Loève transform. The authors of [13] examined Scenario 2, also with side information available at both encoder and decoder. A similar problem with non-orthogonal access on the channel was studied in [14,15]. At present, we do not know of any efforts specifically targeting delay-free JSCC for Scenario 1, although all schemes for Scenario 2 apply as special cases. Optimal linear solutions for this problem may be found from [16].
A theoretical analysis that helps explain and quantify distortion behavior for such mappings is given in this paper. We investigate Scenario 1 as a generalization of our previous work in [31,32,38] on dimension expanding S-K mappings for i.i.d. sources, by including arbitrary correlation. Similarly, we study Scenario 2 by extending the use of S-K mappings to a network of non-collaborating nodes with inter-correlated observations. Properly designed JSCC schemes for Scenario 1 may serve as bounds for schemes developed for Scenario 2, since Scenario 1 has more degrees of freedom in constructing encoding operations. The treatment of Scenario 2 also seeks to explain why certain existing JSCC solutions for this problem (like the ones in [9]) are configured the way they are, and it also suggests schemes that can offer better performance in certain cases. Throughout this paper, Scenarios 1 and 2 will often be referred to as the collaborative case and the distributed case, respectively.
The paper is organized as follows: In Section 2, we formulate the problem and derive performance bounds assuming arbitrary code lengths. These bounds are achievable in Scenario 1 and serve as upper bounds on performance for Scenario 2. In Section 3, we analyze optimal linear mappings and discuss under what conditions it is meaningful to apply linear schemes. In Section 4, we introduce nonlinear mappings. We revisit some results from [8,31,32,38] in order to mathematically formulate the problem and to give examples of, and optimize, selected mappings. In Section 5, we summarize and conclude.
Note that parts of this paper have previously been published in [39]. Results on nonlinear mappings in Section 4 are mostly new and constitute the main contribution of the paper.

Problem Formulation and Performance Bounds
The communication system studied in this paper is depicted in Figure 1. M correlated sources, x_1, x_2, ..., x_M, are encoded by M functions and transmitted on M orthogonal channels.
The sources have a common information, y, and an (additive) individual contribution, z_m, so that x_m = y + z_m. The source vector, x = [x_1, x_2, ..., x_M]^T, is zero-mean multivariate Gaussian with pdf

p_x(x) = (2π)^(−M/2) |C_x|^(−1/2) exp(−(1/2) x^T C_x^(−1) x),    (1)

where C_x = E{xx^T} is the covariance matrix. Two scenarios are considered: (1) Each encoding function operates on all variables, Figure 1a. This scenario can be seen as one encoder operating on an M-dimensional vector source or as M ideally collaborating encoders. (2) Each encoder operates on one variable, Figure 1b. This scenario can be seen as M non-collaborating nodes in a sensor network. The encoders operate independently, but are jointly optimized. Throughout the rest of the paper, we refer to Scenario 1 as the collaborative case, and Scenario 2 as the distributed case.
The encoded observations are transmitted over M independent, additive white Gaussian noise (AWGN) channels with noise n ∼ N(0, σ²_n I), where I is the identity matrix. For the distributed case, we impose an average transmit power constraint, P_m, for each node, m, where E{f_m(x_m)²} ≤ P_m, whereas in the collaborative case, we look at an average power constraint, P_a, over all outputs, P_a = (P_1 + P_2 + · · · + P_M)/M. These constraints are equal if the power for all nodes is the same. We will consider the special case where σ²_{x_1} = σ²_{x_2} = · · · = σ²_{x_M} = σ²_x and ρ_ij = ρ_x, ∀i, j. For this special case, the covariance matrix is simple and has eigenvalues λ_1 = σ²_x(1 + (M − 1)ρ_x) and λ_2 = · · · = λ_M = σ²_x(1 − ρ_x). We restrict our investigation to this special case for the sake of simplicity and in order to obtain compact closed-form expressions. Generalization to networks with unequal transmit power and correlation can naturally be made.
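The eigenvalue structure of the symmetric covariance matrix can be checked numerically; a minimal sketch (parameter values are illustrative, not taken from the paper):

```python
import numpy as np

M, sigma2_x, rho = 4, 1.0, 0.6
# Equicorrelation covariance: sigma2_x on the diagonal, rho*sigma2_x elsewhere
C_x = sigma2_x * ((1 - rho) * np.eye(M) + rho * np.ones((M, M)))

eigvals = np.sort(np.linalg.eigvalsh(C_x))[::-1]
# Largest eigenvalue carries the common information; the remaining M-1 are equal
lam1 = sigma2_x * (1 + (M - 1) * rho)
lam_rest = sigma2_x * (1 - rho)
```

For M = 4, σ²_x = 1 and ρ_x = 0.6, this yields λ_1 = 2.8 and λ_2 = λ_3 = λ_4 = 0.4.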
At the receiver, the decoding functions, g_m(r_1, r_2, ..., r_M), which have access to all received channel outputs, r_1, r_2, ..., r_M, produce an estimate, x̂_m, of each source. We define the end-to-end distortion, D, as the mean-squared error (MSE) averaged over all source symbols, D = (1/M) Σ_{m=1}^{M} E{(x_m − x̂_m)²}. We assume ideal Nyquist sampling and ideal Nyquist channels, where the sampling rate of each source is the same as the signaling rate of each channel. We also assume ideal synchronization and timing in the network. Our design objective is to construct the encoding and decoding functions, f_m and g_m, that minimize D, subject to a transmit power constraint, P.

Distortion Bounds
Achievable bounds for the problem at hand can be derived for the cooperative case, and these serve as lower bounds for the distributed case. The achievable bound for the distributed case is currently only known when M = 2 and was shown in [9] to be Equation (4), where SNR = P/σ²_n is the channel signal-to-noise ratio. To derive bounds for general M, we consider ideal collaboration.
Proposition 1. Consider the network depicted in Figure 1. In the symmetric case, ρ_ij = ρ_x, ∀i, j, the smallest achievable distortion for Scenario 1 and the distortion lower bound for Scenario 2 are given by Equation (5).

Proof 1. Let R*, D* and P* denote optimal rate, distortion and power, respectively. Assuming full collaboration, the M sources can be considered as a Gaussian vector source of dimension M. Then, from [2],

R* = Σ_{i=1}^{M} max{0, (1/2) log₂(λ_i/θ)},  D* = (1/M) Σ_{i=1}^{M} min{θ, λ_i},    (6)

where λ_i is the i-th eigenvalue of the covariance matrix, C_x. The channel is a memoryless Gaussian vector channel of dimension M with noise covariance matrix σ²_n I. Its capacity with power MP* per source vector, (x_1, x_2, ..., x_M), is

C = (M/2) log₂(1 + P*/σ²_n).    (7)

Now equate R* from Equation (6) with C in Equation (7) and calculate the corresponding power, P*. We get D ≥ D*(θ, M), with D* given in Equation (6) and θ given by Equation (8). The max and min in Equations (6) and (8) depend on ρ_x and the SNR. Since the special case, ρ_ij = ρ_x, ∀i, j, is treated, there are two cases to consider: either only the first eigenvalue, λ_1 (the common information), or all eigenvalues are to be represented. By solving Equation (8) with respect to θ for these two cases and inserting the result into Equation (6), the bound in Equation (5) results. Finally, the range of validity of these two cases is found by solving the equation, λ_i = θ [with θ from Equation (8)], with respect to SNR = P/σ²_n.
In the following sections, we will compare suggested mappings to OPTA_coop = σ²_x/D* (Optimal Performance Theoretically Attainable for the cooperative case), i.e., the best possible received signal-to-distortion ratio (SDR) as a function of SNR.
By comparing Equations (6) and (8) with Equation (4) (see Figure 2a), one can show that the above cooperative distortion bound is tight, even for the distributed case, when the channel SNR is high enough. For the boundary case of ρ_x = 0, the problem turns into transmitting M independent memoryless Gaussian sources over M parallel Gaussian channels, and the resulting end-to-end distortion is D* = σ²_x(1 + P*/σ²_n)^(−1). It is well known that linear schemes, often named uncoded transmission, are optimal in this case, and collaboration between the sensors would make no difference. Similarly, if ρ_x = 1, i.e., all M sources are identical, we have a single source to transmit over M orthogonal channels, where the overall source-channel bandwidth ratio is M. Then, D* = σ²_x(1 + P*/σ²_n)^(−M). As noticed in [9], this special case is equivalent to transmitting a single Gaussian source on a point-to-point channel with M times the bandwidth or channel uses (bandwidth/dimension expansion by a factor M). Additionally, for this special case, it is possible to achieve D* with distributed encoders, but with infinite complexity and delay.
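The optimality of uncoded transmission for ρ_x = 0 can be verified per channel by simulation; a minimal sketch with illustrative parameters, where the sender scales the source to the power constraint and the receiver applies the linear MMSE scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_x, P, sigma2_n, n = 1.0, 4.0, 1.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2_x), n)

# Uncoded transmission: scale to meet the power constraint P
alpha = np.sqrt(P / sigma2_x)
r = alpha * x + rng.normal(0.0, np.sqrt(sigma2_n), n)
# Linear MMSE estimate of x from r
x_hat = (alpha * sigma2_x / (alpha**2 * sigma2_x + sigma2_n)) * r

D_sim = np.mean((x - x_hat) ** 2)
D_theory = sigma2_x / (1.0 + P / sigma2_n)  # the distortion bound for rho_x = 0
```

With P/σ²_n = 4, the simulated distortion matches D* = σ²_x/(1 + SNR) = 0.2 up to sampling noise.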

Optimal Linear Mappings
Optimal linear schemes are presented for both the distributed and the cooperative case.

Distributed Linear Mapping
At the encoder side, the observations are scaled at each sensor to satisfy the power constraint, P: w_m = α x_m, with α = √(P/σ²_x). At the decoder, we estimate each sensor observation utilizing all received channel outputs, r. For memoryless Gaussian sources, the MMSE estimate can be expressed as a linear combination of the received channel symbols, x̂_i = b_i^T r, where the coefficients, b_i, satisfy the Wiener-Hopf equations, C_r b_i = C_{x_i r}, where C_r is the covariance matrix of the received vector, r, and C_{x_i r} is the cross-covariance vector for x_i and r. The average end-to-end distortion per source symbol is then given by D_nc = σ²_x − C_{x_i r}^T b_i, i = {1, 2, ..., M} (nc denotes "no cooperation"). All terms are equal for the case treated in this paper. By inserting the relevant cross-covariance matrices and the optimal coefficient vector, it is straightforward to arrive at the closed-form distortion in Equation (12).
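The distributed linear scheme above can be sketched numerically: each node scales its observation, and the decoder solves the Wiener-Hopf equations for each source. Parameter values are illustrative:

```python
import numpy as np

M, sigma2_x, rho, P, sigma2_n = 3, 1.0, 0.9, 2.0, 1.0
C_x = sigma2_x * ((1 - rho) * np.eye(M) + rho * np.ones((M, M)))
alpha = np.sqrt(P / sigma2_x)          # per-node scaling to meet power P

# r = alpha*x + n  =>  C_r = alpha^2*C_x + sigma2_n*I and C_{x_i r} = alpha*C_x[:, i]
C_r = alpha**2 * C_x + sigma2_n * np.eye(M)
D_sum = 0.0
for i in range(M):
    C_xir = alpha * C_x[:, i]
    b_i = np.linalg.solve(C_r, C_xir)  # Wiener-Hopf equations: C_r b_i = C_{x_i r}
    D_sum += sigma2_x - C_xir @ b_i    # per-source MMSE
D_nc = D_sum / M
```

Because each estimate uses all M correlated channel outputs, D_nc is strictly below the single-channel uncoded distortion σ²_x/(1 + P/σ²_n) whenever ρ_x > 0.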

Cooperative Linear Mapping
When cooperation is possible, the sources can be decorrelated prior to transmission by a diagonalizing transform (which is a simple rotation operation when M = 2). The transmit power, MP, is then optimally allocated along the eigenvectors of C_x. This scheme is also known as Block Pulse Amplitude Modulation (BPAM) [23]. The end-to-end distortion, D_BPAM, is given by [16] (pp. 65-66), where t = min(M, t′) and t′ is the smallest integer that satisfies Equation (14). Case 1: The total power is allocated to all encoders, that is, t = M. Case 2: The total power is allocated only to the first encoder, that is, t = t′ = 1. To determine for which channel SNR Case 1 and Case 2 apply, assume that t = t′ = 1 in Equation (14). By inserting the eigenvalues, one can show that Case 2 is valid below a correlation-dependent SNR threshold. The performance of any linear scheme for the network in Figure 1 is then, for SNR ≥ 0, bounded by Equation (18). Observe that the bound in Equation (5) results when inserting ρ_x = 0 in both Equations (12) and (18). The theoretical performance of both cooperative and distributed linear schemes is plotted for various M and ρ_x in Figure 2, along with the OPTA curve for ρ_x = 0 and OPTA_coop.
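The Case 1 / Case 2 active-set logic above can be sketched as an allocation routine. This is not the paper's exact Equation (14); it is a sketch under the standard assumption that a linearly coded eigenchannel with variance λ_i and power P_i incurs distortion λ_i σ²_n/(P_i + σ²_n), with dropped eigenchannels contributing their full variance:

```python
import numpy as np

def bpam_distortion(lams, P_total, sigma2_n):
    """Per-source distortion of BPAM-style power allocation over eigenchannels.

    Minimizes sum_i lam_i*sigma2_n/(P_i + sigma2_n) s.t. sum_i P_i = P_total,
    dropping the weakest eigenchannels when their optimal power would be
    negative (mirrors the t = min(M, t') logic in the text)."""
    lams = np.sort(np.asarray(lams, dtype=float))[::-1]
    for t in range(len(lams), 0, -1):
        active = lams[:t]
        # Lagrangian solution: P_i = c*sqrt(lam_i*sigma2_n) - sigma2_n
        c = (P_total + t * sigma2_n) / np.sum(np.sqrt(active * sigma2_n))
        P = c * np.sqrt(active * sigma2_n) - sigma2_n
        if P[-1] >= 0:  # smallest active eigenvalue still receives power
            D = np.sum(active * sigma2_n / (P + sigma2_n)) + np.sum(lams[t:])
            return D / len(lams)
    raise ValueError("no feasible allocation")
```

For equal eigenvalues (ρ_x = 0) and P_total = MP, this reduces to the uncoded distortion σ²_x/(1 + P/σ²_n); for skewed eigenvalues (high ρ_x), the allocation exploits the correlation and does better, and at low total power only λ_1 remains active (Case 2).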
Observe that when SNR is low, the cooperative and the distributed schemes coincide, regardless of M and ρ_x. When SNR grows, the performance of the cooperative scheme remains parallel to that of OPTA_coop, while the distributed scheme approaches the OPTA curve for ρ_x = 0. The distributed linear scheme therefore fails to exploit correlation between the sources at high SNR. The reason is that optimal power allocation is impossible in the distributed case, since decorrelation requires that each encoding function operate on all variables.
The conclusions we draw from the performance of the distributed linear scheme are somewhat different from the conclusions in [13]. There, the authors claimed that the distributed linear scheme (referred to as AF in their paper) performs close to optimal for all SNR and ρ_x. As we can see from Figure 2b, this is not necessarily the case, especially at high SNR. In addition, linear mappings are not necessarily suitable for all values of ρ_x, since their gap to OPTA_coop becomes substantial when ρ_x gets close to one. Distributed linear coding, although simple, is basically only meaningful at relatively low SNR, since its performance converges to the ρ_x = 0 case as SNR grows (except when ρ_x = 1). When ρ_x is close to one, a significant gain can be achieved by applying nonlinear mappings.

Nonlinear Mappings
The nonlinear mappings we apply for highly correlated sources are known as S-K mappings. We first review the basics of S-K mappings and illustrate how they apply directly in the distributed case when ρ_x = 1. We then generalize these mappings so that they apply when ρ_x < 1, but still close to one.

Special Case ρ_x = 1
S-K mappings can be effectively designed for both bandwidth/dimension compression and expansion on point-to-point links [27,31]. Consider the dimension expansion of a single source (random variable): each source sample is mapped into M channel symbols, or an M-dimensional channel symbol. At the decoder side, the received noisy channel symbols are collected in M-tuples to jointly identify the single source sample. Such an expanding mapping, named a 1:M mapping, can be realized by parametric curves residing in the channel space (as "continuous codebooks"), as shown in Figure 3 for the M = 2 and M = 3 cases. The curves basically depict the one-dimensional source space as it appears in the channel space after being mapped through the S-K mapping. Noise will take the transmitted value away from the curve, and the task of the decoder is to identify the point on the curve that results in the smallest error. If we consider a Maximum Likelihood (ML) receiver, the decoded source symbol is the point on the curve that is closest to the received vector. An ML decoder is therefore realized as a projection onto the curve [40]. One may also expand an M-dimensional source (M random variables), or M consecutive samples collected in a vector, into an N-dimensional channel symbol (where M < N). Such an M:N expanding S-K mapping is realized as a hyper-surface residing in the channel space.
S-K mappings can be applied distributedly by encoding each variable with a unique function, f_m(x_m), as in Equation (19). When ρ_x = 1, Equation (19) is really a dimension expanding S-K mapping, since x_m = y, ∀m. That is, the same variable is coded and transmitted by all encoders. With the received signal, r_m = f_m(y) + n_m, the ML estimate of y is given by Equation (20). When M = 2, a good choice of functions is the Archimedean spiral in Figure 3a, defined by [27] in Equation (21), where + is for positive source values (blue spiral), while − is for negative (red spiral). ∆ reflects the distance between the blue and the red curves, φ(·) is a conveniently chosen mapping function, and a determines whether the distance between the spiral arms diverges outwards (a > 1), stays constant (a = 1) or collapses inwards (a < 1). Similarly, the "Ball of Yarn" in Figure 3b is defined by [41] in Equation (22). When these transformed values are transmitted simultaneously on orthogonal channels, we get a Cartesian product resulting in the structures in Figure 3 (when ρ_x = 1). The performance of these mappings is shown in Figure 4 for a = 1.1 and several values of ∆, together with OPTA and distributed linear mappings. The optimal ∆ is found in the same way as in [31], and a similar derivation is given in Section 4.5. Interpolation between optimal points is plotted for the M = 2 case, while a robustness plot (that is, ∆ is fixed while the SNR varies; this shows how the mapping deteriorates as the channel SNR moves away from the optimal SNR) is plotted for the M = 3 case. Both the Archimedes spiral and the Ball of Yarn improve significantly over linear mappings. The distance to OPTA is quite large for the M = 3 case, but there is still a substantial gain of around 4-6 dB compared to the M = 2 case. A 1:3 mapping with better performance has been found in [42], but it can only be applied with collaborative encoders. It has also been shown that S-K mappings can perform better at low SNR using MMSE decoding [35].
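The encode/project-decode cycle for a 1:2 spiral can be sketched as follows. This is a simplified Archimedean spiral with φ(y) = c·|y| and a = 1, not the paper's optimized mapping; ∆, c and σ_n are illustrative, and ML decoding is approximated by projection onto a densely sampled version of the curve:

```python
import numpy as np

Delta, c = 0.5, 5.0  # arm spacing and stretch factor (illustrative)

def spiral_encode(y):
    """1:2 Archimedean-spiral S-K mapping; the sign of y selects the +/- arm."""
    theta = c * abs(y)
    sign = 1.0 if y >= 0 else -1.0
    return sign * (Delta / np.pi) * theta * np.array([np.cos(theta), np.sin(theta)])

# "Continuous codebook" sampled on a dense grid; ML decoding is then a
# projection onto the nearest sampled point of the curve
grid = np.linspace(-4.0, 4.0, 40001)
codebook = np.array([spiral_encode(y) for y in grid])

def spiral_decode(r):
    return grid[np.argmin(np.sum((codebook - r) ** 2, axis=1))]

rng = np.random.default_rng(1)
sigma_n = 0.05
ys = rng.normal(0.0, 1.0, 300)
sq_errs = [(y - spiral_decode(spiral_encode(y) + rng.normal(0.0, sigma_n, 2))) ** 2
           for y in ys]
mse = float(np.mean(sq_errs))
```

Because the curve stretches the source space, the reconstruction MSE falls well below the raw channel noise variance σ²_n, illustrating the weak-noise suppression discussed next.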
One can get better insight into the design process of S-K mappings by understanding their distortion behavior. Analyzing nonlinear mappings in general is difficult, and in order to provide closed-form expressions that can be interpreted further, we follow the method of Kotelnikov [18] (chapters 6-8) (see also [40] (chapter 8.2) and [38]) and divide the distortion into two main contributions: weak noise distortion, denoted by ε²_wn, and anomalous distortion, denoted by ε²_th. Weak noise distortion results from channel noise being mapped through the nonlinear mapping at the receiver and refers to the case when the error in the reconstruction varies gradually with the magnitude of the channel noise (non-anomalous errors). For a curve, f, weak noise distortion is quantified by [18,31]:

ε²_wn = σ²_n ∫_D p_y(y)/∥f′(y)∥² dy,    (23)

where D is the domain of the source and p_y(y) its pdf (we use y here, since it is the common information that the curve represents). ∥f′(y)∥ is the length of the tangent vector of the curve at the point, y. Equation (23) says that the more the source space (y) is stretched by the S-K mapping, f, at the encoder side (think of stretching the real line like a rubber band, or a nonlinear amplification), the more the channel noise will be suppressed (attenuated) when mapped through the inverse mapping at the receiver. If the curve should be stretched a significant amount without violating a channel power constraint, a nonlinear mapping that "twists" the curve into the constrained region is needed, as illustrated in Figure 5a. Still, the curve must have finite length (it cannot be stretched indefinitely), since, otherwise, anomalies, also called threshold effects [17,43], will occur. Anomalies can be understood as follows: Consider the 1:2 mapping in Figure 5b. When the distance between the spiral arms (∆) becomes too small, for a given noise variance, σ²_n, the transmitted value, f(y_0), may be detected as f(y_+) on another fold of the curve at the receiver. The resulting distortion when averaging over all such incidents is what we
have named anomalous distortion (see, e.g., [31] or Section 4.5 for more details). Since anomalous distortion depends on the structure of the chosen mapping, we only state the pdf needed to calculate the probability of such errors here and give a specific example of how to calculate anomalous distortion for the Archimedes spiral in Section 4.5. The pdf of the norm, ϱ = ∥ñ∥, of an N-dimensional vector, ñ, with i.i.d. Gaussian components is given in [44] (p. 237):

p_ϱ(ϱ) = ϱ^(N−1) e^(−ϱ²/(2σ²_n)) / (2^((N/2)−1) Γ(N/2) σ^N_n),  ϱ ≥ 0.    (24)

Note that if f(y) is chosen so that only noise vectors perpendicular to it, n_⊥, lead to anomalous errors (like the spiral in Figure 5), then N = M − 1. Anomalous errors happen, in general, if the norm, ϱ, gets larger than a specific value. For instance, for the Archimedes spiral, the probability of anomalies is given by Pr{ϱ ≥ ∆/2} when a = 1 (around the optimal ∆). There is a tradeoff between the two distortion effects, which results in an optimal curve length for a given channel SNR (this corresponds to an optimal ∆ for the Archimedes spiral and the Ball of Yarn). The two distortion effects can be seen for the M = 3 case in Figure 4, where ε²_wn dominates above the optimal point and ε²_th dominates below. Note specifically that ε²_wn has the same slope as a linear mapping, which results from the fact that a linear approximation to any continuous (one-to-one) nonlinear mapping is valid at each point if σ_n is sufficiently small.
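For the M = 2 spiral, only the perpendicular noise component matters, so N = M − 1 = 1 and Pr{ϱ ≥ ∆/2} reduces to a Gaussian tail probability. A quick Monte Carlo sanity check with illustrative σ_n and ∆:

```python
import numpy as np
from math import erfc, sqrt

sigma_n, Delta = 0.1, 0.5
# N = 1: the norm is |n_perp|, so P(|n_perp| >= Delta/2) = erfc((Delta/2)/(sqrt(2)*sigma_n))
p_analytic = erfc((Delta / 2) / (sqrt(2) * sigma_n))

rng = np.random.default_rng(0)
n_perp = rng.normal(0.0, sigma_n, 1_000_000)
p_mc = float(np.mean(np.abs(n_perp) >= Delta / 2))
```

With ∆/2 = 2.5σ_n, the anomaly probability is about 1.2%, and the empirical frequency matches the analytic tail to within Monte Carlo accuracy.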

Nonlinear Mappings for ρ_x < 1
The situation becomes more complicated when ρ_x < 1. It is straightforward to deduce from the reverse water-filling principle [45] that only the largest eigenvalue (here, λ_1) should be represented when SNR is below a certain threshold (for instance, for the bound in Equation (5), this threshold is found by solving λ_i = θ with respect to SNR). That is, only transmission and decoding of y should be considered below a certain SNR. Above this threshold, one should consider all eigenvalues, i.e., transmit and decode all individual observations. One can get an idea of how specific mappings should be constructed when M = 2 from the distributed quantizer (DQ) scheme in [9]. There, each node quantizes its source using a scalar quantizer. Figures 6a and 6b show the DQ centroids plotted in pairs in the two-dimensional channel space for ρ_x = 0.95 and 0.99. Observe that the DQ centroids in Figure 6b lie on a thin spiral-like surface strip that is "twisted" into the channel space. One possible way to construct a continuous mapping is to use the parametric curves introduced for the ρ_x = 1 case as they are, i.e., use Equations (21) and (22) directly. Inspired by Figure 6b, we choose to apply the Archimedes spiral in Equation (21), shown in Figure 6c, when ρ_x = 0.999. Compared to Figure 3a, the spiral is now "widened" into a thin surface strip.
We propose a mapping for collaborative encoders for the M = 2 case to provide insight into what benefits collaboration may bring. To simplify, we make a change of variables from x_1, x_2 to the independent variables, y_a = (x_1 + x_2)/2 and z_a = (x_2 − x_1)/2. y_a is aligned with the first eigenvector of C_x, while z_a is aligned with the second eigenvector. One possible generalization of the spiral in Equation (21) is given in Equation (25), where h(y_a) is the Archimedes spiral in Equation (21) and N(y_a) is the unit normal vector to the spiral at the point, h(y_a). The components of N(y_a) can be derived using Appendix A. A similar generalization can be applied to other parametric curves, h(y), for any M.
To provide geometrical insight into how two correlated variables are transformed by Equations (25) and (21), we show how they transform the three parallel lines, x_2 = x_1 − κ (red), x_2 = x_1 (blue) and x_2 = x_1 + κ (green), in Figures 7a and 7b, respectively.
The generalized spiral in Equation (25) represents both the common information (the blue curve) and the individual contributions from both sources uniquely. The distributed mapping in Figure 7b represents common information well, but ambiguities will distort the individual contributions in certain intervals: The green curve in Figure 7b results from inserting x_2 = x_1 + κ in Equation (21), which yields a "deformed" spiral lying inside an ellipse with its major axis aligned along the line w_2 = w_1. The red spiral, on the other hand, lies inside an ellipse with its major axis aligned along w_2 = −w_1. These spirals are therefore destined to cross at certain points. Ambiguities can also be observed for similar mappings found in the literature. One example is the DQ in Figure 6b, as illustrated in Figure 7d. Whether continuous mappings that avoid ambiguities can be found when the encoders operate on only one variable is uncertain. Further research is needed in order to conclude.
An alternative mapping that avoids ambiguities in the distributed case is the piecewise continuous sawtooth mapping proposed in [8], depicted in Figure 6d. Although this mapping was proposed for transmission of noisy observations of a single random variable, it is applicable to the coding of several correlated variables through a slight change in the decoder. The encoders for the M = 2 case are given by Equation (26), where ∆ determines the period of the sawtooth function, f_2, and α_1, α_2 make it possible to control the power for each encoder separately. We use this mapping as an example for M = 2. It can easily be extended both to arbitrary M, as shown in [8], and to blocks of samples (code length beyond one), as shown in [10], which makes it a good choice of mapping. From Figure 7c, one can observe that ambiguities are avoided.
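The sawtooth idea can be sketched in a few lines: node 1 transmits linearly, node 2 transmits only its source folded into one period, and the decoder unfolds node 2's signal using node 1's signal as side information. The correlation model and all parameter values below are illustrative, not the paper's optimized configuration:

```python
import numpy as np

Delta = 0.5            # sawtooth period (illustrative)
rho, sigma2_n, n = 0.999, 1e-4, 100_000
rng = np.random.default_rng(2)

def sawtooth(x):
    # Fold the real line periodically into [-Delta/2, Delta/2]
    return x - Delta * np.round(x / Delta)

# Rough model of two highly correlated sources: common part y plus small +/- z
y = rng.normal(0.0, 1.0, n)
z = rng.normal(0.0, np.sqrt((1.0 - rho) / 2.0), n)
x1, x2 = y - z, y + z

r1 = x1 + rng.normal(0.0, np.sqrt(sigma2_n), n)             # node 1: linear
r2 = sawtooth(x2) + rng.normal(0.0, np.sqrt(sigma2_n), n)   # node 2: folded

# Decoder unfolds r2 by picking the fold closest to the side information r1
x2_hat = r2 + Delta * np.round((r1 - r2) / Delta)
mse = float(np.mean((x2 - x2_hat) ** 2))
```

When |x_2 − x_1| plus the noise stays below ∆/2, the unfolding is exact and the residual error is just the channel noise; meanwhile the folded signal has variance of only about ∆²/12, so node 2 can be amplified aggressively under its power constraint.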
To determine the reconstruction, x̂_m, m = 1, ..., M, we first decode y, then z_m. y is found by projecting the received vector onto the closest point on the curve representing common information, f(y). f(y) corresponds to the blue curves shown in Figures 7a and 7b when M = 2. The ML detector for y is therefore given in Equation (20). Given ŷ, the individual contributions, z_m, can be found by mapping values of z = [z_1, ..., z_M] within an M-ball of a certain radius, ϱ, through the encoding functions, f, then choosing the z that results in the smallest distance to the received vector. This search is formulated separately for the collaborative case and for the distributed case. Note that ϱ decreases with increasing ρ_x, making the search for z simpler. If ϱ is chosen too large, then anomalous errors will result. The reconstruction is finally given by x̂_m = ŷ + ẑ_m. Note that, in order to achieve the best possible result at low SNR, one should use MMSE decoding. Since only y is reconstructed at low SNR, a similar approach to that in [35] for dimension expanding S-K mappings may be used. We leave out this issue and refer to [35] as one possible way to achieve better performance.
In the following sections, distortion and power expressions are given.These expressions will facilitate analysis and optimization of S-K mappings for the network under consideration.

Power and Distortion Formulation: Collaborative Encoders
To calculate power and distortion, we apply and generalize selected results from [8,31,32,38]. For both cases, the formulation of the problem will depend on whether only common information, or both common information and individual contributions, should be reconstructed at the receiver. Details in some derivations are omitted, since they require substantial space.

Reconstruction of Common Information
When the encoders collaborate, one may drop all individual contributions, z_m, prior to transmission by averaging over all variables, y_a = (1/M) Σ_{m=1}^{M} x_m, and therefore, the same distortion contributions as for the ρ_x = 1 case in Section 4.1 apply. That is, Equation (23) quantifies weak noise distortion after exchanging y with y_a and the pdf, p_y(y), with p_{y_a}(y_a), and the probability of anomalous errors can be found from Equation (24).
We also get a distortion contribution from excluding z_m. This contribution is reflected in the fact that the eigenvalues, λ_m, m = 2, ..., M, are not represented. The distortion is given by the sum of these eigenvalues divided over all M sources, (1/M) Σ_{m=2}^{M} λ_m. The power per source symbol is given by the corresponding expression in [31].

Reconstruction of Common Information and Individual Contributions
When both common information and individual contributions are transmitted, the power per source symbol changes accordingly. Since all eigenvalues are now represented, the distortion in Equation (30) disappears. The weak noise and anomalous distortion need to be modified.
Weak Noise Distortion: Although we have M variables communicated on M channels, expansion of x by an S-K mapping is possible when ρ_x is close to one (for similar reasons as in Section 4.1). A similar analysis to that in [32,38] for M:N dimension expanding S-K mappings can therefore be applied.
We now have a thin M-dimensional hyper-surface strip that is twisted and bent into the M-dimensional channel space (like the structure in Figure 7a). That is, a subset of R^M is mapped into R^M. An important fact is that weak noise distortion is defined intrinsically [32,38], i.e., it is defined only on the M-dimensional surface representing the S-K mapping, independent of any surrounding coordinate system. One can therefore calculate weak noise distortion here in the same way as in [38]. For brevity, we only state the result and explain the essentials of the given expression. The reader may consult [38] for details of the derivation. We have:

ε²_wn = (σ²_n/M) ∫_D p_x(x) Σ_{i=1}^{M} 1/g_ii(x) dx,    (33)

with p_x(x) given in Equation (1) and D the relevant domain of the source space. g_ii, i = 1, ..., M, denote the diagonal components of the so-called metric tensor, described in Appendix B (an intrinsic feature of the surface, f), which correspond to the squared norms of the tangent vectors along f(x_i), i = 1, ..., M. These components quantify the nonlinear "magnification" done by f on the source vector, x.
Note that Equation (33) is a generalization of Equation (23) and basically says that the more the source, x, is stretched/magnified by f (in all M directions) at the encoder, the more suppressed the channel noise will become when mapped through the inverse mapping at the receiver. (Note that for Equation (33) to be valid, all off-diagonal components g ij = 0; this is the case for all mappings treated in this paper.) Anomalous distortion: Anomalies now refer to the confusion of x 1 , · · · , x M with the vector, x̂ 1 , · · · , x̂ M , on another fold of the mapping. For instance, in Figure 7a, vectors between the blue and green spirals may get exchanged with values along the green spiral on another fold. We only derive the pdf needed to calculate the probability for anomalous errors here and give a specific example of how to calculate anomalous distortion in Section 4.5.
The probability for anomalies now depends on both the noise, n, and z, since the mapping "widens" with the magnitude of z m . Let y 0 denote an M -dimensional vector with all components equal to y 0 . To be able to calculate the pdf of z after the nonlinear mapping, f (x 1 , · · · , x M ), given that y = y 0 , we have to assume that ρ x is close enough to one (z m small) to consider the linear approximation: The variance per dimension of the transformed vector, z T = J(y 0 )z, is then given by σ 2 zT (y 0 ) = (1/M )E{(J(y 0 )z) T (J(y 0 )z)}. By assuming that the off-diagonal components of the metric tensor of f are g ij = 0, the same arguments as in [38] lead to: The g ii (y)'s reflect the magnification of z given y and are given by: Let z̃ T and ñ denote the N -dimensional sub-vectors of z T and n at f (y 0 ) that point in the direction of the closest point on another fold of f (like n ⊥ in Figure 5a). The pdf of the sum, z̃ T + ñ, is given by the convolution [46] p z̃T (z T , y) ∗ p ñ (ñ). Since both p z̃T and p ñ are i.i.d. Gaussians, the convolution is also Gaussian [46], with variance σ 2 zT,n (y) = σ 2 zT (y) + σ 2 n . From Equation (24), with ϱ an = ∥z̃ T + ñ∥, we get:

Distributed Encoders: ρ x < 1
Since each encoder operates on only one variable, it is not possible to diagonalize or take averages, implying that one cannot remove z m prior to transmission. The average power is therefore given by the same expression whether transmission of common information alone or of both common information and individual contributions is considered:
Weak noise distortion: The individual contributions, z, will now represent noise that corrupts the value of y. If ρ x is close to one, then the variance of z m will be small enough to consider the linear approximation, f (y + z m ) ≈ f (y) + z m f ′ (y). We are then in the same situation as in [8], and the distortion can be derived in the same way. The result is (consult [8] for details): The first term accounts for distortion due to z m , whereas the last term accounts for distortion due to channel noise and is the same as in Equation (23). It was shown in [8] that the first term in Equation (39) is minimized by a linear mapping. On the other hand, a linear mapping is, in most cases, sub-optimal when it comes to suppressing channel noise, i.e., minimizing the second term in Equation (39). There is, therefore, a tradeoff between lowering distortion due to z and due to channel noise.
Anomalous distortion: Since z m cannot be removed prior to transmission, the pdf needed to calculate the probability for anomalous errors is basically the same as in Equation (37), except that the metric tensor is different. The diagonal components of the metric tensor are now:

Reconstruction of Common Information and Individual Contributions
The power is given by Equation (38), and the pdf needed to calculate the probability for anomalous errors is given by Equation (37), with the g ii 's in Equation (40).
The weak noise distortion must be reformulated, and a distortion contribution, due to the ambiguities mentioned earlier (shown in Figure 7b), must be added.
Weak noise distortion: Weak noise distortion now refers to distortion in the areas without ambiguities, that is, where each source vector has a unique representation after being mapped through f . Since each g ii in Equation (40) is a function of only one variable, Equation (33) reduces to: The integration domain, D, is over all x i for a mapping that avoids ambiguities (like the sawtooth mapping) and over the domain without ambiguities otherwise.
Distortion due to ambiguities: Picture the M = 2 case. For continuous mappings, like the spiral shown in Figure 7b, remote source values, represented by the green and red lines, may cross in certain intervals, leading to ambiguities at the decoder. Ambiguities will make the decoder interchange values along the minor axis (or minor axes for general M ). When the green and red lines cross, positive and negative values may be interchanged, which leads to a large error in the decoded value. If a continuous mapping is to be applied, it is better to decode only common information in the areas where ambiguities are prominent (for instance, in the interval between the arrows in Figure 7b).
Assume that ambiguities happen in the intervals [y i , y i+1 ] and that there are K such intervals in total. If we decode only common information in these intervals, the distortion is quantified by: while Equation (41) quantifies distortion outside these intervals. ε 2 l is given by Equation (30) and takes into account the distortion from representing only common information. The second term takes into account the distortion due to z m , and the last term is distortion due to channel noise. To determine the values of y i and y i+1 , the relevant intersection points must be found (for instance, where the red and green spirals cross the blue spiral in Figure 7b).

Examples for the ρ x < 1 Case
In this section, the mappings in Equations (25), (21) and (27) will be optimized using the power and distortion analysis presented in the preceding sections; then, simulations of the optimized mappings are given.
First, a suitable function, φ, must be chosen for the spiral in Equation (21). In [31], it was argued why choosing φ as the inverse curve length is convenient. For the spiral, the curve length function is similar to a quadratic function. Since it is unknown which function is optimal for the problem at hand, we choose the inverse of φ to be the polynomial φ −1 = ℓ(θ) = ∆(aθ 2 + bθ), and so: a and b are coefficients that will be optimized, α is an amplification factor and ± reflects the sign of x m .
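The parametrization above can be sketched numerically. Since Equation (21) is not restated here, the encoder below is a hedged reconstruction: the angle θ = φ(y) is obtained by inverting ℓ(θ) = ∆(aθ 2 + bθ) with the quadratic formula, and the point is placed on an Archimedes-type curve whose two arms carry the sign of the source:

```python
import numpy as np

# Hedged sketch of a spiral S-K encoder of the kind assumed in the text:
# theta = phi(y) is the inverse of the polynomial l(theta) = Delta*(a*theta^2
# + b*theta), and the channel point lies on an Archimedes-type spiral.  The
# exact functional form of Equation (21) is an assumption here.
def phi(y, Delta, a, b):
    # invert Delta*(a*t^2 + b*t) = |y| for t >= 0 via the quadratic formula
    return (-b + np.sqrt(b * b + 4.0 * a * np.abs(y) / Delta)) / (2.0 * a)

def spiral_encode(y, Delta=1.0, a=0.5, b=1.0, alpha=1.0):
    t = phi(y, Delta, a, b)
    r = alpha * Delta * t / np.pi          # radius grows linearly with angle
    s = np.sign(y) if y != 0 else 1.0      # +/- arm encodes the sign of y
    return np.array([r * np.cos(t), s * r * np.sin(t)])
```

By sin 2 + cos 2 = 1, the norm of the output is αΔφ(y)/π, consistent with the power calculation that follows in the text.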

Power and Distortion Calculation for Collaborating Encoders
The spiral in Equation (21) is applied when only common information is transmitted, and the generalized spiral in Equation (25) is applied when the individual contributions, z 1 and z 2 , are included.
Reconstruction of common information: Here, only the average, y a = (x 1 + x 2 )/2, is transmitted. To calculate the power, the norm, ∥f (y a )∥, is needed. By sin 2 (x) + cos 2 (x) = 1, it is easy to show that: The power is found by inserting Equation (44) in Equation (31) with M = 2.
Since z 1 and z 2 are not transmitted, the distortion term ε 2 l = σ 2 x (1 − ρ x )/2 in Equation (30) results. Weak noise distortion is found by inserting M = 2 in Equation (23), exchanging y with y a and p y (y) with p ya (y a ). By again using sin 2 (x) + cos 2 (x) = 1, one can show that: A good approximation to anomalous distortion must be found. With only y a transmitted, we only need to consider the blue spiral in Figure 7a. Then, the same method as in [31] applies, which we restate here for clarity. Figure 8 illustrates how to determine anomalous errors approximately.

The green curve depicts the noise pdf for a given y a . Since anomalies are mainly caused by the one-dimensional noise component perpendicular to the spiral (at least, close to the optimal operation point), denoted n ⊥ , the wanted pdf is found by inserting N = 1 in Equation (24). The result is the Gaussian distribution, n ⊥ ∼ N (0, σ 2 n ), denoted by p n⊥ (n ⊥ ). When n ⊥ crosses the black dotted curve in Figure 8, anomalous errors result. The probability of anomalous errors, given that y a was transmitted, is therefore: To determine the error's magnitude, assume first that f (y a ) is moved outwards and exchanged with the nearest point on the neighboring spiral arm. That is, f (y a ) is detected as f + = f (y a ) + ∆. By converting to polar coordinates, we get −∆φ(ŷ + )/π = ∆φ(y a )/π + ∆. By solving this with respect to ŷ + and using the same argument for noise moving the transmitted vector inwards to f − , we get: An approximation of the anomalous distortion is therefore given by: This expression is accurate around the mapping's optimal SNR, whereas it may differ if the SNR drops far below optimum. It serves well for determining the optimal parameters.
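The threshold-crossing probability described above reduces to a Gaussian tail integral. The sketch below assumes, for illustration only, that the nearest fold lies roughly half an arm spacing ∆/2 away on either side; the paper's exact threshold depends on y a:

```python
import math

# Hedged sketch: the anomaly probability given y_a is the chance that the
# one-dimensional perpendicular noise component n_perp ~ N(0, sigma_n^2)
# crosses to the nearest fold.  Taking that fold to be Delta/2 away on
# either side is an illustrative assumption, not the paper's exact geometry.
def q_func(x):
    # Gaussian tail probability Q(x) = P(N(0,1) > x)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_anomaly(Delta, sigma_n):
    # two-sided crossing of a threshold Delta/2 by N(0, sigma_n^2) noise
    return 2.0 * q_func(0.5 * Delta / sigma_n)
```

As expected, the probability shrinks rapidly as the arm spacing ∆ grows relative to the noise standard deviation.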

Reconstruction of individual contributions:
The decoder in Equation (28) is simplified with the generalized spiral in Equation (25), since we only need to search over one variable, z a = (x 2 − x 1 )/2 (instead of z 1 and z 2 ).
To calculate the power, ∥f (y a , z a )∥ must be determined. Since: and by using the fact that E{h m (y a )z a } = 0, N 2 1 (y a ) + N 2 2 (y a ) = 1 (unit normal vector) and h 2 1 (y a ) + h 2 2 (y a ) = φ 2 (y a ), Equation (32) is reduced to: where Weak noise distortion is found from Equation (33) by inserting M = 2. The g ii 's must be determined. Since ∂f (y a , z a )/∂z a = α z [N 1 (y a ), N 2 (y a )] T and N 2 1 (y a ) + N 2 2 (y a ) = 1: For g 11 : By using the fact that E{f (y a )z a } = 0 (for any measurable function f ) and Equation (45), one can show that: To calculate anomalous distortion, a procedure similar to the one that led to Equation (48) can be applied. The fact that the spiral in Figure 8 "widens" due to z a has to be taken into account. With the mapping in Equation (25), y a moves along the spiral, h(y a ), while z a moves normal to it. Therefore, only g 22 affects the probability for anomalies (since it magnifies z a ). The relevant pdf is found by inserting N = 1 in Equation (37), which is a Gaussian distribution. With the construction in Equation (25), the error probability will be the same for any given n, then: In Equation (48), the magnitude of anomalous errors was calculated by assuming that points on the blue solid-line spiral in Figure 8 were exchanged with points on the dashed blue spiral (or the other way around). Here, as can be seen from Figure 7a, either values lying between the blue and green spirals will get exchanged with points on the green spiral on another fold, or values between the red and blue spirals get exchanged with values on the red spiral on another fold. This makes the error somewhat smaller than in Equation (48). The difference in the error for these two cases is small when ρ x is close to one. To simplify calculations, we therefore choose to use the same error here as in Equation (48), which gives an upper bound on the error. The anomalous distortion is therefore bounded by Equation (48) with P th given in Equation (54).
Optimization and simulation: A constrained optimization problem must be solved in order to determine the optimal free parameters: α, ∆, a and b for the spiral and α, α z , ∆, a and b for the generalized spiral. All parameters are functions of the channel SNR. With P max the maximum allowed power per encoder output and D t the sum of all distortion contributions for the relevant case: All parameters must also be positive. The problem must be solved numerically. The S-K mappings are simulated using the optimized parameters. Figure 9 shows the performance of cooperative S-K mappings compared to OPTA coop and BPAM when ρ x = 0.99 and 0.999.
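The numerical parameter search can be sketched as follows. The power and distortion models inside the loop are illustrative stand-ins for the paper's expressions in Equation (55), and a coarse grid search replaces a full constrained optimizer:

```python
import numpy as np

# Hedged sketch of the constrained search: minimize a total distortion D_t
# subject to an average power constraint P <= P_max.  Both the power model
# (alpha^2) and the two distortion terms below are illustrative stand-ins,
# not the paper's exact expressions.
def optimize(P_max, sigma_n2, grid=np.linspace(0.05, 3.0, 60)):
    best = (np.inf, None)
    for alpha in grid:
        for Delta in grid:
            power = alpha ** 2                      # stand-in power model
            if power > P_max:
                continue                            # infeasible point
            weak = sigma_n2 / alpha ** 2            # noise suppressed by stretch
            anomalous = np.exp(-Delta ** 2 / (8 * sigma_n2)) * Delta ** 2
            d_t = weak + anomalous
            if d_t < best[0]:
                best = (d_t, (alpha, Delta))
    return best
```

The same skeleton applies to the distributed optimization in Section 4.5.2; only the power and distortion expressions change.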
The nonlinear S-K mappings outperform BPAM for most SNR, most significantly so when ρ x = 0.999. Robustness plots are shown for S-K mappings for several sets of optimal parameters. The cyan curves show the performance when only common information is reconstructed. The performance levels off and becomes inferior to BPAM at about a 22 dB channel SNR when ρ x = 0.99 and 29 dB when ρ x = 0.999. The reason is the distortion term, ε 2 l = σ 2 x (1 − ρ x )/2, that results from not transmitting z a . The spiral is also quite robust against variations in SNR. With the generalized spiral in Equation (25), shown by the green curve, the performance does not level off at large SNR, and it does, in fact, maintain a constant gap to OPTA coop as SNR increases without changing α, α z , ∆, a and b. This can be explained geometrically: as long as only y a is to be transmitted, one can let the distance between the spiral arms, ∆, drop as SNR increases and, thereby, increase the curve length of f , resulting in a larger magnification of y a . This leads to a unique optimal SNR for each value of ∆, as shown by the cyan curves. When both y a and z a are transmitted, the spiral widens, and there must therefore be a lower bound, ∆ = ∆ min , if anomalous errors are to be avoided. Then, with the right choice of ∆ min , weak noise distortion will be the only contribution to total distortion when the SNR gets high enough. As mentioned earlier, weak noise distortion has the same slope as the distortion of (linear) BPAM. This effect is also reflected in OPTA coop , since it has the same slope as BPAM when SNR gets high enough.

Power and Distortion Calculation for Distributed Encoders

Diagonalizing transforms cannot be applied in this case. Some derivations and expressions are, therefore, long and complicated, and some have to be found numerically. For brevity, we avoid stating some of the power and distortion expressions and just mention how they can be found numerically.

Reconstruction of common information:
The output power is found by inserting Equation (21) with a = 1 and M = 2 in Equation (38) and doing the integration numerically. Since decoding of z 1 and z 2 is not considered, we get the distortion term ε 2 l = σ 2 x (1 − ρ x )/2. Weak noise distortion is found by inserting M = 2 and the derivatives of Equation (21), evaluated at y, in Equation (39), then doing the integration numerically.
Anomalous distortion can be calculated in a similar way as in Equation (48), but we have to take into account that z 1 and z 2 cannot be removed prior to transmission. From Figure 7b, one can observe that P th depends on y (where we are on the blue spiral) and must be moved inside the integral in Equation (48) (we now integrate over y, not y a ). P th (y) is found by changing κ in Equation (54) to κ(y) = σ 2 x (1 − ρ x )(g 11 (y) + g 22 (y)) + 2σ 2 n . The g ii 's are found from Equation (40), i.e., the partial derivatives of Equation (21) w.r.t. x 1 and x 2 , evaluated at y.

Reconstruction of individual contributions:
Two examples are considered: (1) spiral mapping and (2) sawtooth mapping.
(1) Spiral mapping: We may use the same power and anomalous distortion as when considering common information, since the encoders are the same [given by Equation (21)].
To reduce errors from ambiguities, we choose to decode only common information (values along the blue curve in Figure 7b) whenever ambiguities are present. The distortion is then given by Equation (42) with M = 2. One can numerically determine where the green and red spirals cross the blue spiral in Figure 7b in order to find the intervals, [y i , y i+1 ].
The calculation of weak noise distortion is complicated for two reasons: First, g ii (x i ) = f ′ (x i ) 2 for the functions in Equation (21) contains zeros, implying that 1/g ii (x i ) becomes infinite for certain values of x i . Second, since weak noise distortion is valid only in areas where no ambiguities occur, the domain of integration consists of several subdomains. To get around these problems, one can make the substitution x 1 = (y p − z p )/ √ 2 and x 2 = (y p + z p )/ √ 2, where y p ∼ N (0, σ 2 x (1 + ρ x )) and z p ∼ N (0, σ 2 x (1 − ρ x )) are independent. One may then formulate an integral like that in Equation (33) (with M = 2), where g ii (x) is exchanged with g ii (y p , z p ) and p x (x) is exchanged with p(y p , z p ) = p(y p )p(z p ). One can now divide the integral over y p into several intervals corresponding to the complement of the intervals [y i , y i+1 ] in Equation (42), and further integrate z p over a much smaller range (which is valid, since large values of z p are unlikely when ρ x is close to one). In order to ensure that the g ii 's stay finite when running the numerical optimization algorithm, it is convenient to further add a negligibly small constant to each of them.
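The decorrelating substitution above is easy to verify numerically: drawing independent y p and z p with the stated variances and rotating back reproduces the correlated pair (x 1 , x 2 ). This Monte Carlo check is our own illustration:

```python
import numpy as np

# Hedged sketch verifying the substitution used for the weak-noise
# integral: with x1 = (y_p - z_p)/sqrt(2) and x2 = (y_p + z_p)/sqrt(2),
# independent y_p ~ N(0, sigma_x^2 (1 + rho)) and z_p ~ N(0, sigma_x^2 (1 - rho))
# reproduce the correlated source pair (x1, x2).
rng = np.random.default_rng(0)
sigma_x2, rho = 1.0, 0.99
n = 200_000
y_p = rng.normal(0.0, np.sqrt(sigma_x2 * (1 + rho)), n)
z_p = rng.normal(0.0, np.sqrt(sigma_x2 * (1 - rho)), n)
x1 = (y_p - z_p) / np.sqrt(2.0)
x2 = (y_p + z_p) / np.sqrt(2.0)
# Empirically, var(x1) ~ sigma_x^2 and corr(x1, x2) ~ rho.
```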
(2) Sawtooth mapping: The functions in Equation (27) create a Cartesian product, f , with period α 1 ∆ and "height" α 2 ∆. Figure 10 [displaying f (y)] helps in explaining some of the calculations that follow. Only f (y), i.e., the transformation of common information, is displayed there.

The decoder applied is somewhat different than for the spirals, and similar to the decoder in [8]. First, the decoder determines to which domain, D i , i ∈ Z, (r 1 , r 2 ) belongs. That is, it decides between which two decision borders (green dashed lines in Figure 10) the received signal is located. The first source is decoded as x̂ 1 = r 1 /α 1 . If (r 1 , r 2 ) ∈ D i , the decoded value of the second source should be located in the interval [(2i − 1)∆/2, (2(i + 1) − 1)∆/2]. The second source is therefore reconstructed as: An equivalent way of determining which interval x 2 is in is to choose x̂ 2 so that: The power for Encoder 1 is: To find P 2 , we need f 2 2 , which consists of parabolas limited to the intervals [(2n − 1)∆/2, ((2n + 1) − 1)∆/2], centered at n∆, n ∈ Z. Therefore: Note that P 1 and P 2 may be unequal. To assure equal power, one can use time sharing between the two encoders. That is, encoder i uses f 1 half of the time and f 2 the other half. Weak noise distortion is found from Equation (39), where: Since f 2 is a piecewise continuous function, its derivative must be taken in the sense of distributions (distribution here means a special set of linear and continuous functionals, not a probability distribution; the reader may consult, e.g., [47] for technical details concerning such functional derivatives). That is: where δ i is the Dirac delta functional centered at i.
Since weak noise distortion is defined intrinsically, it is not affected by bending or cutting of the surface, f , into several pieces (such operations may affect anomalous distortion). Only stretching changes weak noise distortion. One may therefore disregard the sum of δ's when calculating weak noise distortion, and so: Since x 1 is encoded by a linear function, it does not experience anomalous errors. x 2 , on the other hand, experiences anomalies when the noise becomes so large that the decision borders in Figure 10 are crossed. The pdf needed to calculate the probability for anomalous errors is found by setting N = 1 in Equation (37), where σ 2 zT is found by setting M = 2, g 11 = α 2 1 and g 22 = α 2 2 in Equation (35) (the sum of δ's in Equation (58) has been removed, since they do not contribute to the magnification of z 1 and z 2 ). Anomalies happen whenever ∥ϱ an ∥ > d 1 /2. To determine d 1 , consider Figure 10. The probability for anomalous errors becomes: Whenever the green dashed border in Figure 10 is crossed, the detection, x̂ 2 , jumps across one period of the sawtooth function, which leads to an error of magnitude ∆ in the reconstruction. The anomalous distortion is therefore quantified by: ∆ must be chosen large enough relative to the noise in order to avoid anomalous errors. This places a lower bound on ∆ in any case.
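A minimal sketch of the sawtooth encoder/decoder pair may clarify the decision procedure. Since Equation (27) is not restated here, the centered-sawtooth form and the use of x̂ 1 to pick the decision region D i are our assumptions (reasonable when ρ x is close to one, so that x 1 ≈ x 2 ):

```python
import math

# Hedged sketch of the sawtooth scheme: encoder 1 is linear, encoder 2
# wraps x2 into a centered sawtooth of period Delta.  The decoder infers
# the sawtooth interval from the linearly coded x1 (an assumption, valid
# when rho_x is close to one so that x1 ~ x2), then unwraps r2.
def encode(x1, x2, alpha1, alpha2, Delta):
    i = math.floor(x2 / Delta + 0.5)           # interval index of x2
    return alpha1 * x1, alpha2 * (x2 - i * Delta)

def decode(r1, r2, alpha1, alpha2, Delta):
    x1_hat = r1 / alpha1
    i = math.floor(x1_hat / Delta + 0.5)       # decision region D_i from x1_hat
    return x1_hat, i * Delta + r2 / alpha2
```

In the noiseless case with x 1 ≈ x 2 , the round trip is exact; when noise pushes the received pair across a decision border, x̂ 2 jumps by one period ∆, which is exactly the anomalous error described above.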
Optimization and simulation: A constrained optimization problem must be solved in order to determine the optimal free parameters: α, ∆, a and b for the spiral, and α 1 , α 2 and ∆ for the sawtooth mapping. All parameters are functions of the channel SNR. With P max the maximum allowed power per encoder output and D t the sum of all distortion contributions for the relevant case, an optimization problem similar to Equation (55) is solved numerically. All parameters should also be constrained to be larger than zero.
The S-K mappings are simulated using the optimized parameters. Figure 11 shows the performance of distributed S-K mappings compared to OPTA coop , OPTA dist and the distributed linear mapping for ρ x = 0.99 and ρ x = 0.999. Robustness plots are given for S-K mappings for several sets of optimal parameters. The cyan curves show the performance of the Archimedes spiral when only common information is reconstructed. The performance levels off and becomes inferior to the distributed linear scheme at about 18 dB when ρ x = 0.99 and 28 dB when ρ x = 0.999. The reason why the Archimedes spiral levels off is that z 1 and z 2 act as noise. Also, when individual observations are reconstructed, the spiral levels off (magenta curve in Figure 11b), although at a slightly higher SNR. It becomes inferior to the linear scheme at about 32 dB. The reason for saturation is the measures taken to avoid ambiguities, resulting in the distortion term in Equation (42). The spiral is quite robust to variations in SNR. The sawtooth mapping, shown by the green line, does not level off at high SNR, since it avoids ambiguities. It also maintains a constant gap to OPTA as SNR increases without changing the parameters α 1 , α 2 , ∆. The reason is the same as for the generalized spiral in Section 4.5.1. The Archimedes spiral is somewhat closer to OPTA (at its optimal points) than the sawtooth mapping before it levels off (especially when ρ x = 0.999). A probable reason is that the spiral utilizes the available space more properly (at least with Gaussian statistics), most significantly so when ρ x = 0.999. The nonlinear solutions clearly outperform the linear ones for most SNR when ρ x is close to one.

Comparison Between Collaborative Case, Distributed Case and DQ
Figure 12 shows a comparison between the optimal performance of all the suggested cooperative and distributed S-K mappings and 5-bit DQ from [9] optimized at 18 dB SNR. There is a clear gain from collaboration for SNR above 8 dB. The reason why collaboration helps when only common information is decoded is that z 1 and z 2 can be removed prior to transmission, thereby reducing the probability for anomalous errors. The fact that z 1 and z 2 cannot be removed with distributed encoders may be one possible explanation why the OPTA bounds for the distributed and cooperative cases differ. For higher SNR, when both common information and individual observations are decoded, there is still a clear gain from collaborative encoders. A probable reason is that the generalized spiral utilizes the available space better than the sawtooth mapping as ρ x approaches one (at least with Gaussian statistics). The question is whether there exist better zero delay distributed mappings that can close the gap to the cooperative case at high SNR (the OPTA bounds are at least the same at high SNR). DQ is around 2 dB inferior to the distributed S-K mappings at its optimal point (17 dB). With a higher number of bits in the DQ encoders, one would expect DQ to become at least as good as the distributed S-K mapping. Note that the difference between these three cases is smaller when ρ x = 0.99.
Note that when ρ x gets smaller than about 0.95, the DQ optimization algorithm in [9] only generates quantized linear mappings. This corresponds to what happens with S-K mappings: when ρ x drops below a certain value, the source space will be too "wide" to be twisted into the channel space by a nonlinear S-K mapping. Either the channel power constraint will be violated or a myriad of anomalous errors will be introduced.

Extensions
A particular case that needs further investigation is the distributed case when the correlation lies in the range 0 < ρ x < 0.95. The only known zero delay JSCC applicable (to our knowledge) for this case is distributed linear schemes, which provide no gain compared to the non-correlated case when SNR is large (see Figure 2). To avoid this problem, one can look for nonlinear alternatives that may provide better performance than linear mappings at zero delay. Whether such alternatives can be found or not must be determined through further research. Alternatively, one will have to increase the code length beyond one.
The DQ algorithm discussed in Section 4.5.3 is built on uniform quantization, and it is likely that additional gains can be achieved when 0 < ρ x < 0.95 by applying nonuniform quantization. The continuous analogy of this would be an exchange of the linear encoder in Equation (9) with a nonlinear one-to-one "stretching" function, φ(x m ), that changes the coordinate grid along each dimension in a nonlinear way. This would result in g ii 's better tailored to a Gaussian distribution. Then, weak noise distortion could be made somewhat smaller without bending or twisting the source space (unlike the nonlinear mappings suggested in this paper). Anomalous errors are then avoided, and the power constraint is satisfied. The optimal g 11 was determined for a 1:N mapping in [48] (pp. 294-297) using variational calculus. An extension of this result to include several g ii could be applied for our purpose. Instead of letting all encoders be nonlinear, one may possibly achieve a gain by letting φ(x m ) be linear for some of the encoders and nonlinear for the others, then applying optimal power allocation among all the encoders. Further research is needed to come to a conclusion, however.
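As an illustration of the "stretching" idea, one monotone one-to-one candidate is an erf-based compander that expands the high-probability region of a Gaussian source. This particular φ is only an example of such a function, not the optimal one derived in [48]:

```python
import math

# Hedged illustration: a monotone one-to-one "stretching" function phi(x)
# that warps the coordinate grid toward the high-probability region of a
# Gaussian source.  The erf-based form is an example of such a compander,
# not the variational-calculus optimum from the reference.
def phi(x, sigma=1.0, gain=3.0):
    return gain * math.erf(x / (sigma * math.sqrt(2.0)))

def phi_inv(u, sigma=1.0, gain=3.0):
    # invert the monotone phi by bisection (erfinv is not in stdlib math)
    lo, hi = -10.0 * sigma, 10.0 * sigma
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid, sigma, gain) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Being one-to-one, such a function reshapes the g ii 's without folding the source space, so no anomalous errors are introduced.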
How could one go about extending the nonlinear schemes in this paper beyond zero delay? Piecewise continuous mappings, like the sawtooth mapping, have already been extended [10] using known lattice structures. At the time of writing, we know of no such tools for fully continuous mappings. Fully continuous mappings can be extended conceptually, however, as we illustrated for i.i.d. sources in [38]. Take the generalized spiral in Equation (25) as an example. The spiral, h, could be generalized to a two-dimensional "spiral-like" surface that maps pairs of samples of y a at each time instant. One could further map pairs of samples of z a along the two normal vectors of h. The difficult part is to determine the equation for the surface h, since once h is found, its normal vectors are determined by Equation (63). It might be possible to find such extensions of h when the code length is small, but this will be notoriously difficult when the code length is large. For practical reasons, it is therefore likely that fully continuous nonlinear mappings are applicable at low delay only.

Summary, Conclusions and Future Work
In this paper, delay-free joint source channel coding (JSCC) for communication of multiple inter-correlated memoryless Gaussian sources over orthogonal additive white Gaussian noise channels was investigated. Both ideally collaborating and distributed encoders were studied for the case where all sources should be reconstructed at the decoder.
First, optimal linear JSCC were investigated. With collaborative encoders, one may decorrelate the sources and then allocate power optimally among the encoders. This provides an increase in received fidelity with increasing correlation. For the distributed case, however, it is impossible to decorrelate the sources, implying that no gain in fidelity can be achieved from increasing correlation with distributed linear schemes when the channel SNR is high. Nonlinear JSCC, on the other hand, can provide significant gains in fidelity over all linear schemes when correlation is close to one for most SNR. Contrary to linear distributed schemes, carefully chosen nonlinear distributed schemes can provide an increasing gain in fidelity from increasing correlation also at high SNR. Since collaborative encoders offer more degrees of freedom in the choice of encoders, they can provide benefits over distributed encoders, except when the correlation is zero. The zero correlation case is trivial and achieves the performance upper bounds (OPTA) with linear schemes. When the correlation is nonzero, however, all suggested schemes leave a gap to the performance upper bound. All schemes studied are robust towards changes in channel SNR.
Possible extensions of the work in this paper include increasing the code length, unequal correlations between each pair of encoders and unequal attenuation on each sub-channel. Nonlinear mappings that may provide better performance at intermediate correlation should also be investigated, and the gap to the performance upper bound should be quantified. Practical issues, like imperfect synchronization and timing, could also be investigated.
where K = ∥dT/ds∥ is the curvature of f . By calculating Equation (63) for the functions in Equation (21) with a = 1, Equation (26) results.

B. Metric Tensor
Consider an M -dimensional parametric hyper-surface, f (x). The metric tensor (also called a Riemannian metric) for a smooth embedding of f in R N (M ≤ N ) can be described by the symmetric and positive definite matrix [50] (Chapter 9): where J is the Jacobian of f , given by: g ii can be interpreted as the squared norm of the tangent vector along f (x i ), where x i can be seen as the i'th parameter in the parametrization, f . All cross terms, g ij , are the inner products of tangent vectors along f (x i ) and f (x j ). See, e.g., [50] (Chapter 9) for further details. Note that the metric tensor is an intrinsic feature of a manifold/hypersurface. That is, it describes local properties of a manifold/hypersurface without any dependence on an external coordinate system.
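The definition above can be checked numerically: with the Jacobian approximated by central finite differences, G = JᵀJ gives the metric tensor of any parametric surface. The helix example is our own; its tangent (−sin t, cos t, c) has squared norm 1 + c²:

```python
import numpy as np

# Hedged sketch: numerical metric tensor G = J^T J for a parametric
# surface f: R^M -> R^N, with the Jacobian built from central finite
# differences.  g_ii is the squared norm of the tangent vector along
# the i-th parameter direction.
def metric_tensor(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        cols.append((f(x + e) - f(x - e)) / (2.0 * h))  # i-th tangent vector
    J = np.stack(cols, axis=1)                           # N x M Jacobian
    return J.T @ J                                       # M x M metric tensor

# Example: the helix f(t) = (cos t, sin t, c t) has g_11 = 1 + c^2.
```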
Furthermore, both y and z m are discrete-time, continuous-amplitude, memoryless Gaussian random variables of zero mean and variances σ 2 y and σ 2 zm , respectively, and x 1 , x 2 , . . . , x M are conditionally independent given y. The correlation coefficient between any pair of sources, m, k, is then ρ m,k = σ 2 y /(σ xm σ x k ), m ̸ = k, where σ 2 xm is the variance of source m and σ 2 xm = σ 2 y + σ 2 zm . With all observations collected in the vector x = [x 1 , x 2 , · · · , x M ] T , the joint probability density function (pdf) is given by:
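The model above is straightforward to simulate, which also serves as a sanity check of the stated variance and correlation relations. The specific parameter values are illustrative:

```python
import numpy as np

# Hedged sketch of the stated source model: x_m = y + z_m with independent
# y ~ N(0, sigma_y^2) and z_m ~ N(0, sigma_zm^2), so that
# sigma_xm^2 = sigma_y^2 + sigma_zm^2 and the pairwise correlation is
# rho_{m,k} = sigma_y^2 / (sigma_xm * sigma_xk).
rng = np.random.default_rng(1)
M, sigma_y2, sigma_z2, n = 3, 0.9, 0.1, 200_000
y = rng.normal(0.0, np.sqrt(sigma_y2), n)
z = rng.normal(0.0, np.sqrt(sigma_z2), (M, n))
x = y + z                  # broadcasts the common part y over the M rows
# Empirically, var(x_m) ~ 1.0 and corr(x_m, x_k) ~ 0.9 for m != k.
```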

Figure 3 .
Figure 3. Shannon-Kotelnikov (S-K) mappings. The curves represent a scalar source mapped through f in the channel space. Positive source values reside on the blue curves, while negative values reside on the red. (a) M = 2: Archimedes spiral; (b) M = 3: "Ball of Yarn".

Figure 4 .
Figure 4. Performance of S-K mappings when M = 2, 3 for several values of ∆.

Figure 5 .
Figure 5. 1:2 S-K mappings. (a) Linear and nonlinear mappings; (b) when spiral arms come too close, noise may take the transmitted vector, f (y 0 ), closer to another fold of the curve, leading to large decoding errors.

Figure 8 .
Figure 8. Illustration of how to approximately calculate anomalous errors when only common information is to be reconstructed.

Figure 10 .
Figure 10. Geometrical illustration of the sawtooth mapping used for calculation of distortion. Only f (y), i.e., the transformation of common information, is displayed here.