Information Properties of a Random Variable Decomposition through Lattices

A full-rank lattice in the Euclidean space is a discrete set formed by all integer linear combinations of a basis. Given a probability distribution on $\mathbb{R}^n$, two operations can be induced by considering the quotient of the space by such a lattice: wrapping and quantization. For a lattice $\Lambda$, and a fundamental domain $D$ which tiles $\mathbb{R}^n$ through $\Lambda$, the wrapped distribution over the quotient is obtained by summing the density over each coset, while the quantized distribution over the lattice is defined by integrating over each fundamental domain translation. These operations define wrapped and quantized random variables over $D$ and $\Lambda$, respectively, which sum up to the original random variable. We investigate information-theoretic properties of this decomposition, such as entropy, mutual information and the Fisher information matrix, and show that it naturally generalizes to the more abstract context of locally compact topological groups.


Introduction
Lattices are discrete sets in $\mathbb{R}^n$ formed by all integer linear combinations of a set of linearly independent vectors, and have found different applications, such as in information theory and communications [1,2,3]. Given a probability distribution on $\mathbb{R}^n$, two operations can be induced by considering the quotient of the space by a lattice: wrapping and quantization.
The wrapped distribution over the quotient is obtained by summing the probability density over each coset. It is used to define parameters for lattice coset coding, particularly for the AWGN and wiretap channels, such as the flatness factor, which is, up to a constant, the $L^\infty$ distance from a wrapped probability distribution to a uniform one [4,5]. This factor is equivalent to the smoothing parameter used in post-quantum lattice-based cryptography [6]. In the context of directional statistics, wrapping has been used as a standard way to construct distributions on a circle and on a torus [7].
The quantized distribution over the lattice can be defined by integrating over each fundamental domain translation, thus corresponding to the distribution of the fundamental domains after lattice-based quantization is applied. Lattice quantization has different uses in signal processing and coding: for instance, it can achieve the optimal rate-distortion trade-off and can be used for shaping in channel coding [2]. A special case of interest is when the distribution on the fundamental region is uniform, which amounts to high-resolution quantization or dithered quantization [8,9].
In this work, we relate these two operations by remarking that the random variables induced by wrapping and quantization sum up to the original one. We study information properties of this decomposition, both from classical information theory [10] and from information geometry [11], and provide some examples for the exponential and Gaussian distributions. We also propose a generalization of these ideas to locally compact groups. Probability distributions on these groups have been studied in [12], and some information-theoretic properties have been investigated in [13,14,15]. In addition to probability measures, one can also define the notions of lattice and fundamental domain on them, thereby generalizing the Euclidean case. We show that wrapping and quantization are also well defined, and provide some illustrative examples.

Lattices and Fundamental Domains
A lattice $\Lambda \subset \mathbb{R}^n$ is the discrete set formed by all integer linear combinations of a set of linearly independent vectors $\{b_1, \ldots, b_k\} \subset \mathbb{R}^n$, called a basis of $\Lambda$. A matrix $B$ whose column vectors form a basis is called a generator matrix of $\Lambda$, and we have $\Lambda = B\mathbb{Z}^k$. The lattice dimension is $k$, and, if $k = n$, the lattice is said to be full-rank; we henceforth consider full-rank lattices. A lattice $\Lambda$ defines an equivalence relation in $\mathbb{R}^n$: $x \sim y \iff x - y \in \Lambda$. The associated equivalence classes are denoted by $\bar{x}$ or $x + \Lambda$. The set of all equivalence classes is the lattice quotient $\mathbb{R}^n/\Lambda$, and we denote the standard projection by $\pi : \mathbb{R}^n \to \mathbb{R}^n/\Lambda$, $\pi(x) = \bar{x}$.
Let $D$ be a Lebesgue-measurable set of $\mathbb{R}^n$ and $\Lambda$ a lattice. We say that $D$ is a fundamental domain or a fundamental region of $\Lambda$, or that $D$ tiles $\mathbb{R}^n$ by $\Lambda$, if (1) $\bigcup_{\lambda \in \Lambda} (\lambda + D) = \mathbb{R}^n$, and (2) $(\lambda + D) \cap (\lambda' + D) = \emptyset$, for all $\lambda \neq \lambda'$ in $\Lambda$. Given a fundamental domain $D$, each coset $\bar{x} \in \mathbb{R}^n/\Lambda$ has a unique representative in $D$, i.e., the measurable map $\pi|_D : D \to \mathbb{R}^n/\Lambda$ is a bijection. This fact suggests using a fundamental domain to represent the quotient. Each fundamental domain contains exactly one lattice point, which may be chosen as the origin. One example of fundamental domain is the fundamental parallelotope with respect to a basis $\{b_1, \ldots, b_n\}$, namely $\mathcal{P}(B) = \{c_1 b_1 + \cdots + c_n b_n : 0 \le c_i < 1\}$. Another is the Voronoi region $\mathcal{V}(\Lambda)$ of the origin, given by the points that are closer to the origin than to any other lattice point, with an appropriate choice for ties. It is a well-known fact that every fundamental domain has the same volume, denoted by $\operatorname{covol} \Lambda := \operatorname{vol} D = |\det B|$, for any generator matrix $B$ of $\Lambda$.
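As a concrete illustration, the decomposition with respect to the fundamental parallelotope can be computed numerically. The following minimal Python sketch (the helper names `quantize` and `wrap` are ours, not from the text) recovers, for a point $x$, the lattice point and the representative in $\mathcal{P}(B)$:

```python
import numpy as np

# Minimal sketch (our own helper names) of wrapping and quantization with respect
# to the fundamental parallelotope P(B) = {B c : c in [0,1)^n} of a lattice B Z^n.

def quantize(B, x):
    """Lattice point lam in B Z^n such that x - lam lies in the parallelotope."""
    return B @ np.floor(np.linalg.solve(B, x))

def wrap(B, x):
    """Representative of the coset x + Lambda inside the fundamental parallelotope."""
    return x - quantize(B, x)

B = np.array([[2.0, 1.0], [0.0, 1.0]])   # generator matrix: columns form a basis
x = np.array([5.3, -2.7])
y, lam = wrap(B, x), quantize(B, x)

assert np.allclose(y + lam, x)                        # x = x_pi + x_Q
c = np.linalg.solve(B, y)
assert np.all((c >= -1e-9) & (c < 1 + 1e-9))          # y indeed lies in P(B)
assert np.isclose(abs(np.linalg.det(B)), 2.0)         # covol(Lambda) = |det B|
```

The same `wrap`/`quantize` pair works for any invertible generator matrix, since the parallelotope coordinates of a point are simply the fractional parts of $B^{-1}x$.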

Wrapping and Quantization
Consider $\mathbb{R}^n$ with the Lebesgue measure $\mu$, and $P$ a probability measure such that $P \ll \mu$. Then the probability density function (pdf) of $P$ is $p = \frac{dP}{d\mu}$, the Radon-Nikodym derivative. For a fixed full-rank lattice $\Lambda$ and fundamental domain $D$, the wrapping of $P$ by $\Lambda$ is the distribution $P_\pi := \pi_* P$ on $\mathbb{R}^n/\Lambda$, given by $P_\pi(A) = P(\pi^{-1}A)$. For simplicity, we identify $\mathbb{R}^n/\Lambda$ with $D$ to regard $P_\pi$ as a distribution over $D$, and then we have $\pi : \mathbb{R}^n \to D$ given by $(y + \lambda) \mapsto y$, for all $y \in D$, $\lambda \in \Lambda$. Using this identification, the wrapping has density $p_\pi = \frac{dP_\pi}{d\mu}$ given by
$$p_\pi(y) = \sum_{\lambda \in \Lambda} p(y + \lambda), \qquad y \in D.$$

A construction that is, in some sense, dual to wrapping is quantization. Note that each fundamental domain $D$ partitions the space as $\mathbb{R}^n = \bigcup_{\lambda \in \Lambda} (\lambda + D)$. The quantization function is the measurable map $Q : \mathbb{R}^n \to \Lambda$, given by $(y + \lambda) \mapsto \lambda$, for $y \in D$ and $\lambda \in \Lambda$. The quantized probability distribution of $P$ on the discrete set $\Lambda$ is $P_Q := Q_* P$, given by $P_Q(A) := P(Q^{-1}A)$. The probability mass function of the quantized distribution is then
$$p_Q(\lambda) = \int_D p(y + \lambda)\, dy, \qquad \lambda \in \Lambda.$$

Letting $X$ be a vector random variable in $\mathbb{R}^n$ with distribution $p$, we define $X_\pi := \pi(X)$ and $X_Q := Q(X)$, the wrapped and quantized random variables, respectively. By definition, they are distributed according to $p_\pi$ and $p_Q$. Interestingly, they sum up to the original one:
$$X = X_\pi + X_Q. \qquad (3)$$
Note also that $X_\pi + X_Q$ has the same distribution as $(X_\pi, X_Q)$, by the bimeasurable bijection $y + \lambda \mapsto (y, \lambda)$. These factors, however, are not independent, since, in general, $p(y + \lambda) \neq p_\pi(y)\, p_Q(\lambda)$. The difference between $p(x)$ and $(p_\pi \otimes p_Q)(x) := p_\pi(\pi(x))\, p_Q(Q(x))$ shall be illustrated in the following examples. Note that the expression for the quantized distribution depends on the choice of fundamental domain, while the wrapped distribution does not, up to a lattice translation.

We say a random variable $X$ over $[0, \infty)$ is memoryless if $\bar{C}(t) = \bar{C}(t + s)/\bar{C}(s)$ for all $t, s$, where $\bar{C}(t) := P[X > t]$ is the tail distribution function. In particular, a memoryless distribution satisfies $\bar{C}(y + \lambda) = \bar{C}(y)\, \bar{C}(\lambda)$ for all $y \in D$, $\lambda \in \Lambda$, which implies $p = p_\pi \otimes p_Q$. The converse, however, is not true; for example, independence holds whenever $p$ is constant on each region $\lambda + D$, for $\lambda \in \Lambda$.
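The wrapped density and quantized mass function can be approximated on a grid. The sketch below (our own discretization, with $\Lambda = \alpha\mathbb{Z}$, $D = [0, \alpha)$ and an exponential pdf as a stand-in) checks that both are properly normalized:

```python
import numpy as np

# Numerical sketch (our own discretization) of the wrapped density and quantized
# pmf for Lambda = alpha*Z with D = [0, alpha), applied to an exponential pdf:
# p_pi(y) = sum_k p(y + k*alpha), p_Q(k*alpha) = integral of p over k*alpha + D.

alpha = 1.0
p = lambda x: np.where(x >= 0, 2.0 * np.exp(-2.0 * x), 0.0)   # pdf of Exp(2)

dy = 1e-3
y = np.arange(0.0, alpha, dy)                 # grid on the fundamental domain D
ks = np.arange(0, 60)                         # truncate the fast-decaying sums

p_pi = sum(p(y + k * alpha) for k in ks)      # wrapped density on D
p_Q = np.array([np.sum(p(y + k * alpha)) * dy for k in ks])   # quantized pmf

assert abs(np.sum(p_pi) * dy - 1.0) < 1e-2    # p_pi integrates to 1 over D
assert abs(np.sum(p_Q) - 1.0) < 1e-2          # p_Q sums to 1 over Lambda
```

The two normalization checks reflect the fact that wrapping and quantization are marginalizations of the same partition of mass, over cosets and over domain translates respectively.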

Example 1. The exponential distribution, parametrized by $\nu > 0$, has density $p(x) = \nu e^{-\nu x}$ for $x \geq 0$, and $0$ otherwise. Choosing the lattice $\Lambda = \alpha\mathbb{Z}$, $\alpha \in \mathbb{R}_+$, and the fundamental domain $D = [0, \alpha[$, one can write closed-form expressions for the wrapped and quantized distributions:
$$p_\pi(y) = \frac{\nu e^{-\nu y}}{1 - e^{-\nu\alpha}}, \quad y \in D, \qquad p_Q(\alpha k) = e^{-\nu\alpha k}\,(1 - e^{-\nu\alpha}), \quad k \in \mathbb{Z}_{\geq 0}.$$
Note that, in this special case, $p = p_\pi \otimes p_Q$, as a consequence of memorylessness. The wrapped distribution with $\alpha = 2\pi$, which amounts to a distribution on the unit circle, is well studied in [16].
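For the exponential distribution with rate $\nu$ and $\Lambda = \alpha\mathbb{Z}$, $D = [0, \alpha[$, the wrapped density is a truncated exponential on $D$ and the quantized law is geometric; the following sketch (our own code and parameter choices) checks these expressions and the factorization $p = p_\pi \otimes p_Q$:

```python
import numpy as np

# Sketch (our own code) for Exp(nu) with Lambda = alpha*Z, D = [0, alpha):
# the wrapped density is a truncated exponential on D, the quantized pmf is
# geometric, and memorylessness gives the factorization p = p_pi (x) p_Q.

nu, alpha = 1.5, 0.7
p = lambda x: nu * np.exp(-nu * x)                                   # x >= 0
p_pi = lambda y: nu * np.exp(-nu * y) / (1 - np.exp(-nu * alpha))    # y in [0, alpha)
p_Q = lambda k: np.exp(-nu * alpha * k) * (1 - np.exp(-nu * alpha))  # point alpha*k

ks = np.arange(0, 200)
assert np.isclose(p_Q(ks).sum(), 1.0)                    # geometric pmf sums to 1
y, k = 0.31, 4
assert np.isclose(p_pi(y) * p_Q(k), p(y + k * alpha))    # p = p_pi (x) p_Q pointwise
```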

Example 2. Consider the univariate Gaussian distribution
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/2\sigma^2},$$
and the lattice $\Lambda = \alpha\mathbb{Z}$, with fundamental domain $D = [-\alpha/2, \alpha/2[$. The wrapped and quantized distributions are given respectively by
$$p_\pi(y) = \sum_{k \in \mathbb{Z}} p(y + \alpha k), \quad y \in D, \qquad p_Q(\alpha k) = \Phi\Big(\frac{\alpha k + \alpha/2 - \mu}{\sigma}\Big) - \Phi\Big(\frac{\alpha k - \alpha/2 - \mu}{\sigma}\Big), \quad k \in \mathbb{Z},$$
where $\Phi$ denotes the standard Gaussian cumulative distribution function. The value $\alpha = 2\pi$ for the wrapped distribution on a unit circle is usually considered in directional statistics [7]. Figure 1 illustrates the original, wrapped, quantized and product distributions for different zero-mean Gaussian distributions. As can be seen in the figure, in this case, $p(x) \neq p_\pi(y)\, p_Q(\lambda)$.
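A quick numerical check (our own code, taking $\mu = 0$, $\sigma = 1$, and the centered domain $D = [-\alpha/2, \alpha/2)$) confirms that, unlike the exponential case, the Gaussian factors are not independent:

```python
import numpy as np
from math import erf

# Numerical sketch for a zero-mean Gaussian with Lambda = alpha*Z and centered
# fundamental domain D = [-alpha/2, alpha/2): the wrapped and quantized factors
# are not independent, i.e., p != p_pi (x) p_Q.

sigma, alpha = 1.0, 1.0
p = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
Phi = lambda x: 0.5 * (1.0 + erf(x / (sigma * np.sqrt(2))))          # N(0, sigma^2) cdf

ks = range(-40, 41)
p_pi = lambda y: sum(p(y + k * alpha) for k in ks)                   # wrapped density
p_Q = lambda k: Phi((k + 0.5) * alpha) - Phi((k - 0.5) * alpha)      # quantized pmf

assert np.isclose(sum(p_Q(k) for k in ks), 1.0)
y, k = 0.3, 1
assert not np.isclose(p_pi(y) * p_Q(k), p(y + k * alpha))            # not independent
```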
A straightforward consequence of the decomposition (3) is
$$\mathrm{E}[X] = \mathrm{E}[X_\pi] + \mathrm{E}[X_Q], \qquad \mathrm{Var}[X] = \mathrm{Var}[X_\pi] + \mathrm{Var}[X_Q] + 2\,\mathrm{Cov}[X_\pi, X_Q],$$
where $\mathrm{E}[\cdot]$, $\mathrm{Var}[\cdot]$ and $\mathrm{Cov}[\cdot, \cdot]$ denote respectively the expectation, the variance and the cross-covariance operators. We note that different types of discretization have also been studied, other than integrating over a fundamental domain [17]. For instance, in [4,18,19] the discretized distribution is defined by restricting the original pdf $p(x)$ to the lattice $\Lambda$, and then normalizing:
$$D_{\Lambda,c}(\lambda) := \frac{p(c + \lambda)}{\sum_{\lambda' \in \Lambda} p(c + \lambda')},$$
for a fixed $c \in D$. This discretization is nothing other than the conditional distribution of $X_Q$ given that $X_\pi = c$, expressed as $p_{Q|\pi}(\lambda \mid c) = p(c + \lambda)/p_\pi(c)$. Moreover, when $p = p_\pi \otimes p_Q$, such as in the exponential distribution, cf. Example 1, then $D_{\Lambda,c}(\lambda) = p_Q(\lambda)$.
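The moment decomposition can be verified by Monte Carlo. In this sketch (our own code), for $\Lambda = \alpha\mathbb{Z}$ with $D = [0, \alpha)$, the quantized and wrapped parts are simply the floor and fractional parts of $X$ with respect to the cell of width $\alpha$:

```python
import numpy as np

# Monte Carlo sketch (our own code) of the moment identities implied by (3):
# E[X] = E[X_pi] + E[X_Q], Var[X] = Var[X_pi] + Var[X_Q] + 2 Cov[X_pi, X_Q],
# for Lambda = alpha*Z with fundamental domain D = [0, alpha).

rng = np.random.default_rng(0)
alpha = 0.8
X = rng.normal(0.4, 1.3, size=200_000)

X_Q = alpha * np.floor(X / alpha)        # quantized part: lattice point of the cell
X_pi = X - X_Q                           # wrapped part, lies in [0, alpha)

assert np.allclose(X, X_pi + X_Q)
mean_gap = X.mean() - (X_pi.mean() + X_Q.mean())
var_gap = X.var() - (X_pi.var() + X_Q.var() + 2 * np.cov(X_pi, X_Q, ddof=0)[0, 1])
assert abs(mean_gap) < 1e-9 and abs(var_gap) < 1e-9   # identities up to float error
```

Note that `ddof=0` is used in `np.cov` so that the sample covariance matches the normalization of `np.var`, making the identity exact up to floating-point error.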
3 Information Properties

Information-theoretic Measures
Let us consider a random variable $X$ with distribution $p$ and the induced wrapped and quantized ones, respectively $X_\pi \sim p_\pi$ and $X_Q \sim p_Q$. The mutual information between $X_\pi$ and $X_Q$ is defined as the Kullback-Leibler divergence $I(X_\pi; X_Q) := D_{\mathrm{KL}}\big(p \,\big\|\, p_\pi \otimes p_Q\big)$, and is a measure of how far the marginal distributions $p_\pi$ and $p_Q$ are from being independent [10]. Using the theorem of change of variables, we have
$$h(X) = h(X_\pi) + H(X_Q) - I(X_\pi; X_Q), \qquad (6)$$
where $h(\cdot)$ and $H(\cdot)$ denote the differential and discrete entropies, respectively. Note that, from this decomposition, we have $h(X) \leq h(X_\pi) + H(X_Q)$.
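The entropy decomposition can be checked numerically on a common grid, where it holds up to floating-point error by construction. The sketch below (our own discretization, zero-mean Gaussian with $\sigma = 0.4$, $\Lambda = \mathbb{Z}$, centered domain) verifies (6) and the resulting upper bound:

```python
import numpy as np

# Numerical sketch (our own common-grid discretization) of the decomposition
# h(X) = h(X_pi) + H(X_Q) - I(X_pi; X_Q) for a zero-mean Gaussian, Lambda = Z,
# with centered fundamental domain D = [-1/2, 1/2).

sigma, alpha, dy = 0.4, 1.0, 1e-3
y = np.arange(-alpha / 2, alpha / 2, dy)                 # grid on D
k = np.arange(-8, 9)[:, None]                            # lattice points (column)
p = np.exp(-((y + k * alpha) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

p_pi = p.sum(axis=0)                                     # wrapped density on D
p_Q = p.sum(axis=1) * dy                                 # quantized pmf on Lambda

h_X = -np.sum(p * np.log(p)) * dy                        # differential entropy of X
h_pi = -np.sum(p_pi * np.log(p_pi)) * dy                 # differential entropy of X_pi
H_Q = -np.sum(p_Q * np.log(p_Q))                         # discrete entropy of X_Q
I = np.sum(p * np.log(p / (p_pi[None, :] * p_Q[:, None]))) * dy

assert abs(h_X - (h_pi + H_Q - I)) < 1e-9                # identity (6) on the grid
assert I > 0 and h_X <= h_pi + H_Q                       # hence the upper bound
```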
Proposition 1. Let $X$ be a random variable, and $X_\pi$ and $X_Q$ the respective wrapped and quantized random variables, using the lattice $\alpha\mathbb{Z}$. Denote $\mu_Q := \mathrm{E}[X_Q]$ and $\sigma_Q^2 := \mathrm{Var}[X_Q]$. Then
$$I(X_\pi; X_Q) < \log \alpha + \log\Big(e\Big(\frac{\mu_Q}{\alpha} + \frac{1}{2}\Big)\Big) - h(X), \qquad (7)$$
$$I(X_\pi; X_Q) \leq \log \alpha + \frac{1}{2}\log\Big(2\pi e\Big(\frac{\sigma_Q^2}{\alpha^2} + \frac{1}{12}\Big)\Big) - h(X). \qquad (8)$$
Proof. First, $h(X_\pi) \leq \log \alpha$, since the uniform distribution maximizes entropy on a bounded support. Then, note that the mean and variance of the integer-valued random variable $\alpha^{-1} X_Q$ are $\alpha^{-1}\mu_Q$ and $\alpha^{-2}\sigma_Q^2$, respectively. For (7), use that, for positive integer random variables, $H(X_Q) < \log\big(e(\mu_Q/\alpha + 1/2)\big)$, as in [20, Thm. 8]; for (8), use the upper bound for integer-valued random variables from [20, Thm. 10]. Replacing the corresponding inequalities in (6) yields the desired results.
The following lemma can be found in [2, Appendix 3].

Mutual Information
But, by choosing $\alpha$ sufficiently large, we can make $p_Q(0) = \int_{D_\alpha} p(x)\, dx$ arbitrarily close to $1$, since $0$ is in the interior of $D_\alpha$. Therefore, $H(X_Q)$ can be made arbitrarily small.

Example 3. In the case of the exponential distributions, as in Example 1, the distributions of $X_\pi$ and $X_Q$ are independent, i.e., $p = p_\pi \otimes p_Q$, therefore $I(X_\pi; X_Q) = 0$. The mutual information and the corresponding upper bound (7) are plotted in Figure 2a, as a function of the parameter $\nu$.

Example 4. For the univariate Gaussian distribution, as in Example 2, we use (6) to numerically compute the mutual information $I(X_\pi; X_Q)$, as a function of the standard deviation $\sigma$, and compare it with the upper bound (8) (Figure 2b). Interestingly, $I(X_\pi; X_Q)$ vanishes as $\sigma \to 0$ or $\sigma \to +\infty$, which is equivalent to choosing a lattice $\Lambda = \alpha\mathbb{Z}$ with $\alpha \to +\infty$ or $\alpha \to 0$, cf. Proposition 2. The mutual information attains a maximum at $\sigma \approx 0.38$, showing this is the value for which $X_\pi$ and $X_Q$ are the least independent.
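The qualitative behaviour of Figure 2b can be reproduced with a short computation (our own discretization; the Gaussian, lattice, and centered domain are as above):

```python
import numpy as np

# Sketch reproducing the qualitative behaviour of Figure 2b with our own
# discretization: for a zero-mean Gaussian and Lambda = Z (centered domain),
# I(X_pi; X_Q) vanishes at both extremes of sigma, with a maximum in between.

def mutual_info(sigma, alpha=1.0, dy=1e-3, K=30):
    y = np.arange(-alpha / 2, alpha / 2, dy)
    k = np.arange(-K, K + 1)[:, None]
    p = np.exp(-((y + k * alpha) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p = np.clip(p, 1e-300, None)            # guard against log(0) deep in the tails
    p_pi, p_Q = p.sum(axis=0), p.sum(axis=1) * dy
    return np.sum(p * np.log(p / (p_pi[None, :] * p_Q[:, None]))) * dy

I_small, I_mid, I_big = mutual_info(0.05), mutual_info(0.38), mutual_info(5.0)
assert I_mid > I_small and I_mid > I_big    # interior maximum near sigma ~ 0.38
assert I_mid > 0.01                         # clearly bounded away from zero
```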

Fisher Information
Let $M = \{p_\theta : \theta \in \Theta\}$ be a family of probability densities $p_\theta : \mathbb{R}^n \to \mathbb{R}_+$ smoothly parametrized by $\theta$ in an open set $\Theta \subset \mathbb{R}^d$. The Fisher information matrix is defined as the positive semi-definite matrix $G(\theta)$ with coefficients
$$g_{ij}(\theta) = \mathrm{E}\Big[\frac{\partial \ell_\theta}{\partial \theta_i}(X)\, \frac{\partial \ell_\theta}{\partial \theta_j}(X)\Big],$$
where $\ell_\theta(x) := \log p_\theta(x)$. When $M$ is a manifold satisfying certain regularity conditions [11], and $G$ is positive definite, then it becomes a Riemannian manifold with the metric given by $g_{ij}(\theta)$, called a statistical manifold. Let $\preceq$ denote the Loewner partial order for matrices, given by $A \preceq B$ if, and only if, $B - A$ is positive semi-definite. The following results justify the name information matrix given to this quantity.

Proposition 3 ([11, 21]). Let $X$ be a random variable distributed according to a distribution parametrized by $\theta$, and $G(\theta)$ its information matrix. The following hold.
1. Monotonicity: if $F : \mathcal{X} \to \mathcal{Y}$ is a measurable function (i.e., a statistic) and $G_F(\theta)$ is the information matrix of $F(X)$, then $G_F(\theta) \preceq G(\theta)$, with equality if, and only if, $F$ is a sufficient statistic for $\theta$.
2. Additivity: if $X, Y$ are independent random variables, then the joint information matrix satisfies $G_{X,Y}(\theta) = G_X(\theta) + G_Y(\theta)$.

Let $X$ be a random variable on $\mathbb{R}^n$, and $X_\pi$ and $X_Q$ its wrapped and quantized factors, respectively. We denote their respective Fisher information matrices by $G(\theta)$, $G_\pi(\theta)$ and $G_Q(\theta)$. By additivity, the Fisher information of $(X_\pi, X_Q)$ is $G_\pi(\theta) + G_Q(\theta)$ whenever the factors are independent, and, by monotonicity, we have both $G_\pi(\theta) \preceq G(\theta)$ and $G_Q(\theta) \preceq G(\theta)$.

Example 5. In the family of exponential distributions, as in Example 1, the independence of $X_\pi$ and $X_Q$ implies that the Fisher information matrix is additive. Indeed, for $\Lambda = \alpha\mathbb{Z}$:
$$G(\nu) = \frac{1}{\nu^2}, \qquad G_Q(\nu) = \frac{\alpha^2 e^{-\nu\alpha}}{(1 - e^{-\nu\alpha})^2}, \qquad G_\pi(\nu) = \frac{1}{\nu^2} - \frac{\alpha^2 e^{-\nu\alpha}}{(1 - e^{-\nu\alpha})^2},$$
and $G(\nu) = G_\pi(\nu) + G_Q(\nu)$.
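The additivity in the exponential family can be checked numerically. In this sketch (our own derivation), each Fisher information equals the variance of the corresponding sufficient statistic, since the exponential is a natural exponential family; the wrapped part is evaluated by quadrature:

```python
import numpy as np

# Numerical sketch (our derivation) of additivity of Fisher information for
# Exp(nu) with Lambda = alpha*Z, D = [0, alpha): each Fisher information is the
# variance of the corresponding sufficient statistic, and G = G_pi + G_Q.

nu, alpha = 1.3, 0.6
q = np.exp(-nu * alpha)

G = 1.0 / nu**2                                  # Fisher information of Exp(nu)
G_Q = alpha**2 * q / (1.0 - q) ** 2              # variance of the geometric part

# G_pi = variance of Y under the truncated exponential on [0, alpha), by quadrature.
N = 400_000
h = alpha / N
y = (np.arange(N) + 0.5) * h                     # midpoint grid on [0, alpha)
w = nu * np.exp(-nu * y) / (1.0 - q)             # p_pi(y)
m1 = np.sum(w * y) * h
G_pi = np.sum(w * (y - m1) ** 2) * h

assert abs(G - (G_pi + G_Q)) < 1e-8              # G = G_pi + G_Q
```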

A Generalization to Topological Groups
A topological group is a topological space $(G, \tau_G)$ that is also a group with respect to some operation $\bullet$, called product, and such that the inverse $g \mapsto g^{-1}$ and the product $(g, h) \mapsto g \bullet h$ are continuous. As additional requisites, we ask $G$ to be locally compact, Hausdorff and second-countable (i.e., it has a countable basis) [22]. Let $\mathcal{B}_G$ be the Borel $\sigma$-algebra of $G$. Haar's theorem says there is a unique (up to a constant) Radon measure on $G$ that is invariant by left translations; we will suppose a fixed normalization, and denote both the measure and integration with respect to it by $dg$. The group $G$ is said to be unimodular if $dg$ is also invariant by right translations. Since $G$ is $\sigma$-compact, the Haar measure is $\sigma$-finite [12].
Let $\Gamma$ be a discrete subgroup of $G$, which is necessarily closed, since $G$ is Hausdorff, and countable, since $G$ is second-countable. Let us also consider the quotient space of left cosets $G/\Gamma = \{\bar{g} = g\Gamma \mid g \in G\}$, which has a natural projection $\pi : G \to G/\Gamma$, given by $\pi(g) = \bar{g}$. We call $\Gamma$ a lattice if the induced Haar measure on $G/\Gamma$ is finite and bi-invariant. A particular case is when the quotient $G/\Gamma$ is compact; then $\Gamma$ is said to be a uniform lattice. A cross-section is defined as a set $D \subset G$ of representatives of $G/\Gamma$ such that all cosets are uniquely represented. A fundamental domain is a measurable cross-section. It can be shown that $\Gamma$ is a lattice if, and only if, it admits a fundamental domain. Furthermore, every fundamental domain has the same measure [23,24].
Let $P$ be a probability measure on the space $(G, \mathcal{B}_G)$ that is absolutely continuous with respect to the Haar measure $dg$. By the Radon-Nikodym theorem, we can define a density function $p = \frac{dP}{dg} \in L^1(G)$, such that $p \geq 0$ and $P(A) = \int_A p(g)\, dg$, for all $A \in \mathcal{B}_G$. The original measure can be represented as $P = p\, dg$, and we consider the family of all such densities
$$\mathcal{P}(G) := \Big\{ p \in L^1(G) : p \geq 0, \ \int_G p(g)\, dg = 1 \Big\}.$$
Probability distributions on locally compact groups have been studied in [12], and some information-theoretic properties have been investigated in [13,14,15]. The result that allows us to consider wrapped distributions in this context is the Weil formula, taken as a particular case of [24, Thm. 3.4.6]:

Theorem 1 (Weil formula). For every $f \in L^1(G)$,
$$\int_G f(g)\, dg = \int_{G/\Gamma} \sum_{\lambda \in \Gamma} f(g\lambda)\, d\bar{g}.$$

As a consequence, for every probability density $p \in \mathcal{P}(G)$, we can consider its wrapping $p_\pi(\bar{g}) = \sum_{\lambda \in \Gamma} p(g\lambda)$, which is in $L^1(G/\Gamma)$, non-negative, and is also a probability density: $\int_{G/\Gamma} p_\pi\, d\bar{g} = 1$. The associated probability measure over $(G/\Gamma, \mathcal{B}_{G/\Gamma})$ is $P_\pi = p_\pi\, d\bar{g}$. This notation, suggesting $P_\pi$ as the push-forward measure by $\pi$, is not a coincidence, since, from Theorem 1,
$$\pi_* P(A) = P(\pi^{-1}A) = \int_{\pi^{-1}A} p(g)\, dg = \int_A \sum_{\lambda \in \Gamma} p(g\lambda)\, d\bar{g} = P_\pi(A).$$
Analogously, given a fundamental domain $D$, it is possible to define a quantization map $Q : G \to \Gamma$ by $Q(g\lambda) = \lambda$, for every $g \in D$, $\lambda \in \Gamma$, which is well defined since $G = \bigsqcup_{g \in D} g\Gamma$. The quantized probability distribution is the discrete probability measure $P_Q$ over $\Gamma$, defined by the mass function $p_Q(\lambda) = \int_D p(g\lambda)\, dg$, or as the push-forward measure $Q_* P$.
If $X$ is distributed according to $p$, we define $X_\pi := \pi(X)$ and $X_Q := Q(X)$; the pair $(X_\pi, X_Q)$ determines $X$, as a consequence of $g \mapsto (\pi(g), Q(g))$ being a measurable bijection whose inverse is the product $g = \pi(g) \bullet Q(g)$, identifying $\pi(g)$ with its representative in $D$. Despite being an abstract definition, this framework expands the scope of the previous approach, cf. the examples below. In the following, let $\Lambda \subset \mathbb{R}^n$ be a full-rank lattice, and $\Lambda_s \subset \Lambda$ be a full-rank sublattice, as defined in Section 2.

Example 6. Let $G = \mathbb{R}^n$ and $\Gamma = \Lambda$. This recovers the approach from Section 2 as a particular case.
Example 8. Let $G = \mathbb{R}^n/\Lambda_s$ (a torus) and $\Gamma = \pi_s(\Lambda)$ (the projection of $\Lambda$ onto $G$). Then $\pi_s(\Lambda)$ consists of a finite family of cosets $\bar\lambda_1, \ldots, \bar\lambda_k$, for $k = |\Lambda/\Lambda_s|$, and a choice of fundamental domain $\bar{D}$ is the projection of a fundamental domain $D$ of $\Lambda$. There are some standard choices for the distribution on $G$, such as a wrapping from the Euclidean space and the bivariate von Mises distribution [7, Section 11.4]. Then $p_\pi(\bar{x}) = \sum_{i=1}^{k} p(\bar{x} + \bar\lambda_i)$ and $p_Q(\bar\lambda_i) = \int_{\bar{D}} p(\bar{x} + \bar\lambda_i)\, d\bar{x}$, and, in the particular case where $p(\bar{x}) = \sum_{\lambda_s \in \Lambda_s} \tilde{p}(x + \lambda_s)$ is the $\Lambda_s$-wrapping of a density $\tilde{p}$ on $\mathbb{R}^n$, they become $p_\pi(\bar{x}) = \sum_{i=1}^{k} \sum_{\lambda_s \in \Lambda_s} \tilde{p}(x + \lambda_s + \lambda_i)$ and $p_Q(\bar\lambda_i) = \sum_{\lambda_s \in \Lambda_s} \int_D \tilde{p}(x + \lambda_s + \lambda_i)\, dx$.
Example 9. Let $G = \mathbb{F}_q^n$ (a vector space over a finite field) or $G = \mathbb{Z}_q^n$, and $\Gamma = \mathcal{C}$, any linear block code. A fundamental domain can be a finite set of points that tiles the space by $\mathcal{C}$. The distributions then become finite sums, such as in Example 7.
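A toy instance of this example can be worked out explicitly. The sketch below (our own construction, using $G = \mathbb{Z}_2^3$ and the binary repetition code as the "lattice") builds a fundamental domain of coset representatives and the corresponding wrapped and quantized mass functions:

```python
import itertools
import numpy as np

# Sketch (our construction) for G = Z_2^3 with the repetition code C = {000, 111}:
# a fundamental domain is any set of coset representatives; wrapping sums p over
# each coset, and quantization sums p over each translate of the domain.

q, n = 2, 3
G = list(itertools.product(range(q), repeat=n))
C = [(0, 0, 0), (1, 1, 1)]                          # linear block code, the "lattice"
add = lambda a, b: tuple((u + v) % q for u, v in zip(a, b))

rng = np.random.default_rng(1)
p = dict(zip(G, rng.dirichlet(np.ones(len(G)))))    # arbitrary pmf on G

# Fundamental domain: one minimal-weight representative per coset of C.
D, covered = [], set()
for g in sorted(G, key=sum):
    if g not in covered:
        D.append(g)
        covered.update(add(g, c) for c in C)

p_pi = {d: sum(p[add(d, c)] for c in C) for d in D}   # wrapped pmf on G/C
p_Q = {c: sum(p[add(d, c)] for d in D) for c in C}    # quantized pmf on C

assert len(D) * len(C) == len(G)                      # D tiles G by C
assert abs(sum(p_pi.values()) - 1.0) < 1e-12
assert abs(sum(p_Q.values()) - 1.0) < 1e-12
```

Choosing minimal-weight coset representatives mirrors the standard-array decoding convention for block codes; any other cross-section would change $p_Q$ but not $p_\pi$, consistent with the Euclidean case.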
Example 10. Let $G = \mathrm{SL}(n, \mathbb{R})$, the Lie group of square matrices with determinant $1$, and $\Gamma = \mathrm{SL}(n, \mathbb{Z})$ (the subgroup of integer matrices). This is in fact a lattice, since, for $n = 2$, $\mathrm{vol}(G/\Gamma) = \sqrt{2}\,\zeta(2)$, where $\zeta$ is the Riemann zeta function, and, for $n > 2$, the finite covolume is calculated in [27], where descriptions of fundamental domains are also given.

Conclusion
In this work, we have studied the decomposition of a random variable through lattices into its wrapping and quantization terms. Generalization of the examples and of Proposition 1 to higher dimensions constitutes work in progress. We have also proposed a generalization of this decomposition to topological groups; in particular, this allows one to study information theory on such abstract spaces, which is another perspective for future work.

Figure 2: Mutual information $I(X_\pi; X_Q)$ and its upper bound.