Isometric Signal Processing under Information Geometric Framework

Information geometry is the study of the intrinsic geometric properties of manifolds consisting of probability distributions, and it provides a deeper understanding of statistical inference. Based on this discipline, this letter reports on the influence of signal processing on the geometric structure of the statistical manifold in terms of estimation issues. This letter defines the intrinsic parameter submanifold, which reflects the essential geometric characteristics of the estimation issues. Moreover, the intrinsic parameter submanifold is proven to become a tighter one after signal processing. In addition, the necessary and sufficient condition for signal processing that leaves the geometric structure invariant, i.e., isometric signal processing, is given. Specifically, for processing of linear form, a construction method for linear isometric signal processing is proposed, and its properties are presented in this letter.


Introduction
Information geometry was pioneered by Rao [1] in 1945, and a more concise framework was built up by Chentsov [2], Efron [3,4], and Amari [5]. In information geometry, the research object is the statistical manifold, which consists of a parameterized family of probability distributions with a topological structure, M = {p(x; ξ)}. Given the Fisher information matrix as the Riemannian metric, the distance between any two points (probability distributions) can be calculated [6]. In such a manifold, the distance between two points is an intrinsic measure of the dissimilarity between two probability distributions [7]. Because information geometry provides a new perspective on signal processing, it has found many applications. In estimation issues, the natural gradient, which is based on the Riemannian distance, has been employed [8][9][10]. The intrinsic Cramér-Rao bound is a tighter bound for both biased and unbiased estimators and derives from the Grassmann manifold [11]. In addition, the geometric structure (considering the distance between all pairs of points) can be used to evaluate the quality of the observation model, which has been applied in waveform optimization [12]. In optimization problems under matrix constraints, the geometric structure has also been utilized [13][14][15]. Moreover, there are many significant works on detection based on the distance [16][17][18][19][20]. Furthermore, in image processing, target recognition in SAR (Synthetic Aperture Radar) images based on the Grassmann manifold has been proposed [21].
As this new general theory has shown its capability to solve statistical problems, the further development of information geometry demands an unambiguous relationship between the geometric structure and the intrinsic characteristics of common issues. This letter focuses on the influence of signal processing on the statistical manifold in terms of estimation issues. In estimation issues, signal processing is the common means of extracting the information about a desired parameter. With signal processing, the geometric structure of the considered statistical manifold, to which the distribution of the observed data belongs, changes. The purpose of this letter is to study this change of geometric structure and to propose appropriate processing based on it. The research is presented in the following way. First, according to the essence of the estimation issues, the intrinsic parameter submanifold, which reflects the geometric characteristics of the issues, is defined. Then, we show that the statistical manifold becomes a tighter one after processing and give the necessary and sufficient condition for signal processing that leaves the geometric structure invariant (named isometric signal processing). For the more specific case in which the processing is linear, a construction method for linear isometric processing is proposed. Moreover, the properties of the constructed processing are presented.
The following notations are adopted in this paper: math italic x, lowercase bold italic x, and uppercase bold A denote scalars, vectors, and matrices, respectively. The constant matrix I indicates the identity matrix. The symbols (·)^H, (·)^T, and (·)^* indicate the conjugate transpose, the transpose, and the complex conjugate, respectively. In addition, [A]_ij indicates the element in the ith row and jth column of matrix A, and rk(A) is the rank of matrix A. Moreover, A ≥ 0 means that the matrix A is positive semidefinite. Finally, E(·) indicates the statistical expectation of a random variable.

Intrinsic Parameter Submanifold
Let M = {p(x; ξ)} be a statistical manifold with coordinate system ξ, which consists of a family of probability distributions. Consider an estimation issue on the statistical manifold M; the observed data x = (x_1, x_2, ..., x_N) follow one of the probability distributions p(x; ξ) in M. Suppose the desired parameter θ is implied in the parameter ξ, and the relation between θ and ξ can be expressed as a mapping, h : θ → ξ. For instance, in the distance measurement of a pulse-Doppler radar, the desired distance r is embedded in the statistical mean µ of the observed data, i.e., µ = h(r) = P(t − 2r/c) (P(t) is the pulse signal, and c is the velocity of light).
Actually, not all p(x; ξ) in M are relevant to the estimation issue; the considered probability distributions {p(x; h(θ))} do not cover the whole manifold. They come only from a submanifold, which is the essential manifold in the issue and is referred to as the intrinsic parameter submanifold S. In the above example, the considered distributions are selected by the pulse signal P(t) (the statistical mean µ can be expressed as P(t − 2r/c)).
The Riemannian metric of the submanifold S is defined as I_x(θ), the Fisher information matrix associated with the parameter θ, as in Figure 1. Actually, the distance between two points on the submanifold is defined by using the Riemannian metric [6].
Figure 1. The intrinsic parameter submanifold.

Remark 1.
When the Fisher information matrices G_1, G_2 belonging to two observation models satisfy G_1 ≥ G_2, the observation model with G_1 is considered better than the other in terms of the estimation problem. The reason is that the distance D_1(θ_1, θ_2) (defined by G_1) is no smaller than D_2(θ_1, θ_2) (defined by G_2), by the definition of the distance on the manifold. That means the two parameters θ_1, θ_2 are easier to discriminate in the manifold with G_1 than in the manifold with G_2.
Furthermore, the above remark can also be explained in traditional statistical signal processing. In estimation theory, the Fisher information also plays an important role through the CRLB (Cramér-Rao Lower Bound) inequality: a larger Fisher information gives a lower bound on the estimation variance. Therefore, in traditional estimation theory, the same conclusion can be deduced.
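Remark 1 can be checked numerically with a minimal sketch. The two matrices below are hypothetical Fisher information matrices chosen so that G_1 − G_2 is positive semidefinite; the helper names `local_distance_sq` and `is_psd` are illustrative and not part of the letter. The sketch compares only the local line element ds² = dθ^T G dθ, from which the ordering of path lengths follows, and then checks the CRLB view, where the inverse matrices are ordered the other way round.

```python
import numpy as np

def local_distance_sq(G, dtheta):
    """Squared infinitesimal distance ds^2 = dtheta^T G dtheta under metric G."""
    return float(dtheta @ G @ dtheta)

def is_psd(A, tol=1e-12):
    """True if the symmetric matrix A is positive semidefinite (up to tol)."""
    return bool(np.all(np.linalg.eigvalsh((A + A.T) / 2) >= -tol))

# Hypothetical Fisher information matrices with G1 >= G2 (G1 - G2 is PSD).
G2 = np.array([[2.0, 0.5], [0.5, 1.0]])
G1 = G2 + np.array([[1.0, 0.2], [0.2, 0.5]])

# The line element under G1 is never smaller than under G2.
rng = np.random.default_rng(0)
steps = rng.normal(size=(1000, 2))
ordered = all(local_distance_sq(G1, d) >= local_distance_sq(G2, d) for d in steps)

# CRLB view: inv(G1) <= inv(G2), so the bound under G1 is the lower one.
crlb_ordered = is_psd(np.linalg.inv(G2) - np.linalg.inv(G1))
```

Both `ordered` and `crlb_ordered` evaluate to `True` for these matrices, which matches the geometric and the CRLB readings of the remark.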

Geometric Structure Change by Signal Processing
In estimation issues, the signal is often processed into another form to obtain accurate estimates. Consider the signal processing y = g(x), where x indicates the original signal and y is the processed signal. Signal processing is often accompanied by a change of the statistical manifold, especially a change of the Riemannian metric.
One of the most vital factors of the submanifold in terms of estimation issues is its Riemannian metric, because the distance between two parameters, which represents their dissimilarity, is defined by it. Suppose the intrinsic parameter submanifolds of x and y are S and S′, respectively. The Riemannian metrics of S and S′ are G_S and G_S′, respectively. If the PDFs (Probability Density Functions) p_x(x; θ), p_y(y; θ), and p_xy(x, y; θ) obey the boundary condition [22], then the Fisher information satisfies the following equation [22,23]:
Because y is produced by x via y = g(x), the following equation has been established.
Proof. Because p_xy(x, y; θ) = 0 for y ≠ g(x), p_xy(x, y; θ) can be expressed as p_xy(x, y; θ) = p_x(x; θ)δ(y − g(x)), and the Fisher information can be simplified:
Then, the following lemma holds.

Lemma 1.
The Riemannian metrics G_S and G_S′ satisfy, ∀θ:
Proof. By Equations (1) and (3), and the definitions of G_S and G_S′, the lemma has been established.
Proof. By Equation (2), Equation (3), and the definitions of G_S and G_S′, the corollary has been established.
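The relations behind Lemma 1 and its corollary can be sketched as follows. This is a sketch that assumes the standard chain rule of Fisher information for the pair (x, y). The symbols I_{x,y}, I_{y|x}, and I_{x|y} (joint and conditional Fisher information) and the identifications G_S = I_x, G_S′ = I_y are assumptions of the sketch, and the lines are left unnumbered because their correspondence to Equations (1) to (3) is not asserted here.

```latex
\begin{align*}
I_{x,y}(\theta) &= I_x(\theta) + I_{y|x}(\theta) = I_y(\theta) + I_{x|y}(\theta), \\
y = g(x) \;&\Longrightarrow\; I_{y|x}(\theta) = 0
  \;\Longrightarrow\; I_{x,y}(\theta) = I_x(\theta), \\
G_S(\theta) = I_x(\theta) &= I_y(\theta) + I_{x|y}(\theta)
  = G_{S'}(\theta) + I_{x|y}(\theta) \;\ge\; G_{S'}(\theta),
\end{align*}
```

where the last inequality uses the fact that the conditional Fisher information I_{x|y}(θ) is positive semidefinite.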
Therefore, according to Lemma 1 and its corollary, the signal processing would result in Fisher information loss. As Figure 2 shows, the signal processing would turn the intrinsic parameter submanifold into a tighter one, i.e., discriminating two parameters turns out to be more difficult.
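A small worked case may make the information loss concrete. The sketch below assumes a single observation x ∼ N(θ, σ²) and the hypothetical, non-invertible processing y = sign(x); neither is taken from the letter. The Fisher information of x is 1/σ², while y is a Bernoulli variable with success probability Φ(θ/σ), whose Fisher information has a closed form.

```python
import math

def gaussian_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fisher_info_raw(theta, sigma=1.0):
    """Fisher information about theta in x ~ N(theta, sigma^2)."""
    return 1.0 / sigma**2

def fisher_info_sign(theta, sigma=1.0):
    """Fisher information about theta in y = sign(x), a Bernoulli variable
    with P(y = +1) = Phi(theta / sigma)."""
    z = theta / sigma
    p = gaussian_cdf(z)
    dp = gaussian_pdf(z) / sigma          # derivative of p with respect to theta
    return dp * dp / (p * (1.0 - p))

grid = [-3.0 + 0.25 * i for i in range(25)]          # theta from -3 to 3
never_gains = all(fisher_info_sign(t) <= fisher_info_raw(t) for t in grid)
info_at_zero = fisher_info_sign(0.0)                 # equals 2 / pi
```

At θ = 0 the processed signal retains the fraction 2/π of the original Fisher information, and on the whole grid the processed information never exceeds the original one.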

Isometric Signal Processing
Following the above discussion, appropriate signal processing should ensure that the intrinsic parameter submanifold of the processed signal is isometric to the original submanifold, i.e., the distance between any two parameters is not reduced.
Definition 2 (Isometry). When G_S(θ) = G_S′(θ), the two intrinsic parameter submanifolds S and S′ are isometric.
Actually, the sufficient and necessary condition for the isometry of S and S′ is as follows.

Theorem 1.
G_S(θ) = G_S′(θ) if and only if y is a sufficient statistic of x.
Proof. By Lemma 1, the following relations are equivalent:
That means p_x|y(x|y; θ) is independent of the parameter θ, i.e., y is a sufficient statistic of x.
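The chain of equivalences used in this proof can be sketched as below. The sketch assumes that the gap between G_S and G_S′ is the conditional Fisher information I_{x|y}(θ), written here as the expected outer product of the conditional score; this notation is an assumption, not taken from the letter.

```latex
\begin{align*}
G_S(\theta) = G_{S'}(\theta)
&\iff I_{x|y}(\theta) = 0 \\
&\iff \mathrm{E}\!\left[
      \frac{\partial \ln p_{x|y}(x|y;\theta)}{\partial \theta}
      \left(\frac{\partial \ln p_{x|y}(x|y;\theta)}{\partial \theta}\right)^{T}
      \right] = 0 \\
&\iff \frac{\partial \ln p_{x|y}(x|y;\theta)}{\partial \theta} = 0
      \quad \text{almost surely}.
\end{align*}
```

The last step holds because the expectation of a positive semidefinite outer product vanishes only if the score itself vanishes almost surely.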
The theorem suggests using the sufficient statistic to estimate the desired parameter, from the information geometry view. Actually, this conclusion is also supported by traditional estimation theory. By the Rao-Blackwell theorem [24], for any estimator θ̂(x), the estimator θ̌(y) = E(θ̂(x)|y) is at least as good, i.e., E(θ̌(y) − θ)² ≤ E(θ̂(x) − θ)², when y = g(x) is a sufficient statistic. This theorem indicates that designing the estimator using the sufficient statistic y is more appropriate, because for each estimator θ̂(x) using the original signal x as input, there exists an estimator θ̌(y) = E(θ̂(x)|y) using the sufficient statistic y as input that is at least as good as θ̂(x). Furthermore, by the Lehmann-Scheffé theorem [25,26], when the sufficient statistic y is complete, if the estimator θ̂(y) is unbiased, i.e., E(θ̂(y)) = θ, the estimator θ̂(y) is the minimum-variance unbiased estimator.
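The Rao-Blackwell statement can be illustrated with a short Monte Carlo sketch. It assumes i.i.d. samples x_k ∼ N(θ, σ²) with known σ, for which the sample mean is a sufficient statistic; the crude estimator (the first sample only) and all numerical values are hypothetical choices for illustration. Conditioning the crude estimator on the sample mean gives the sample mean itself.

```python
import numpy as np

def simulate_mse(theta=1.5, sigma=1.0, n_samples=8, n_trials=20000, seed=0):
    """Monte Carlo mean squared error of a crude estimator and of its
    Rao-Blackwellized version for i.i.d. Gaussian samples."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, sigma, size=(n_trials, n_samples))
    crude = x[:, 0]            # unbiased, but uses only the first sample
    improved = x.mean(axis=1)  # E(x_1 | sample mean) equals the sample mean
    mse_crude = float(np.mean((crude - theta) ** 2))
    mse_improved = float(np.mean((improved - theta) ** 2))
    return mse_crude, mse_improved

mse_crude, mse_improved = simulate_mse()
```

With these settings the crude estimator has a mean squared error close to σ² = 1, while the conditioned estimator is close to σ²/8, in line with the inequality above.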

Corollary 2. If g(x) is a reversible (invertible) function, G_S(θ) = G_S′(θ).
Proof. If g(x) is a reversible function, the PDFs of x and y satisfy:
According to the Fisher-Neyman factorization theorem [27], y is a sufficient statistic of x, so G_S(θ) = G_S′(θ).
When the processed signal y = g(x) is a sufficient statistic of x, the signal processing g(x) is isometric signal processing. Specifically, reversible processing is definitely isometric processing, such as the DFT (Discrete Fourier Transformation); because the inverse discrete Fourier transformation can recover the original signal, the DFT is a reversible process. Moreover, this conclusion is also encountered in traditional estimation theory as the Rao-Blackwell theorem and Lehmann-Scheffé theorem.
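The DFT case can be checked numerically with a minimal sketch. It assumes the white Gaussian model x = s(θ) + w and uses the Gaussian Fisher information of y = Hx up to a constant factor that depends on the real or complex convention; the Jacobian `J` is a random stand-in for ∂s(θ)/∂θ, and the helper name `fisher_after_linear` is illustrative.

```python
import numpy as np

def fisher_after_linear(J, H, sigma2=1.0):
    """Fisher information of y = Hx for x = s(theta) + w with white Gaussian w.

    J is the m x p Jacobian ds/dtheta. The real part is taken so that the same
    expression covers real and complex signals up to a constant factor.
    """
    HJ = H @ J
    gram = H @ H.conj().T
    return np.real(HJ.conj().T @ np.linalg.solve(gram, HJ)) / sigma2

m = 8
rng = np.random.default_rng(1)
J = rng.normal(size=(m, 2))        # hypothetical Jacobian with p = 2 parameters
F = np.fft.fft(np.eye(m))          # m x m DFT matrix, invertible

I_x = fisher_after_linear(J, np.eye(m))   # no processing
I_y = fisher_after_linear(J, F)           # after the DFT
dft_is_isometric = np.allclose(I_x, I_y)
```

Because F F^H = mI, the factor (F F^H)^−1 cancels the scaling of the DFT and the two Fisher information matrices coincide.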

Linear Form of Signal Processing
In practice, the noise is often Gaussian or asymptotically Gaussian, and the common signal processing is linear, such as the DFT, matched filtering, coherent integration, etc. This section discusses the linear form of signal processing on the Gaussian statistical manifold.

Model Formulation
The information, as the desired parameter, is usually embedded in the signal, and the signal is often contaminated by noise, which can be described as x = s(θ) + w, where s(θ) is the uncontaminated signal waveform, w is the Gaussian noise, and x is the signal. The linear signal processing can be expressed as a matrix form, y = Hx.

Fisher Information Loss of Linear Signal Processing
Suppose the linear form of signal processing is y = Hx, where x is m-dimensional and y is n-dimensional, so the matrix H is n × m. If rk(H) < n, there are n − rk(H) rows that are linear combinations of the other rk(H) rows. Therefore, the PDF of y depends only on the rk(H) corresponding elements, and the Fisher information loss is equivalent to the loss of the submatrix consisting of those rk(H) rows. Therefore, for convenience, rk(H) is assumed to be n, i.e., the matrix H has full row rank.
The Fisher information loss will first be discussed under WGN (White Gaussian Noise). Then, the Fisher information under CGN (Colored Gaussian Noise) will be presented based on the results under WGN.

White Gaussian Noise
Suppose the noise is WGN with power σ²; then the signal also obeys a normal distribution, x ∼ N(s(θ), σ²I). By the property of the normal distribution, the distribution of y is also normal, but with different parameters, N(Hs(θ), σ²HH^H). Calculating the Fisher information of x and Hx, the loss of information is:
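The loss under WGN can be evaluated numerically with a minimal sketch. It assumes real-valued signals, for which the Fisher information of N(s(θ), σ²I) is J^T J/σ² with J = ∂s(θ)/∂θ, so that the loss takes the form J^T [I − H^T (HH^T)^−1 H] J/σ². This closed form is an assumption of the sketch (the complex case replaces transposes by conjugate transposes and may carry a convention-dependent constant); the function names and the random J and H are hypothetical.

```python
import numpy as np

def fisher_wgn(J, sigma2):
    """Fisher information of x ~ N(s(theta), sigma2 * I), with J = ds/dtheta."""
    return (J.T @ J) / sigma2

def fisher_wgn_processed(J, H, sigma2):
    """Fisher information of y = Hx ~ N(H s(theta), sigma2 * H H^T)."""
    HJ = H @ J
    return HJ.T @ np.linalg.solve(H @ H.T, HJ) / sigma2

def info_loss_wgn(J, H, sigma2):
    """Loss J^T Q J / sigma2 with Q = I - H^T (H H^T)^{-1} H."""
    m = H.shape[1]
    Q = np.eye(m) - H.T @ np.linalg.solve(H @ H.T, H)
    return J.T @ Q @ J / sigma2

rng = np.random.default_rng(2)
m, n, p, sigma2 = 6, 3, 2, 0.5
J = rng.normal(size=(m, p))        # hypothetical Jacobian
H = rng.normal(size=(n, m))        # hypothetical full-row-rank processing

loss = info_loss_wgn(J, H, sigma2)
difference = fisher_wgn(J, sigma2) - fisher_wgn_processed(J, H, sigma2)
```

The matrix Q is the orthogonal projector onto the null space of H, so the loss is positive semidefinite and vanishes exactly when every column of J is orthogonal to that null space.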

Colored Gaussian Noise
Suppose the noise is CGN with covariance matrix C. According to the property of Hermitian positive definite matrices, the covariance matrix can be expressed as C = DD^H, where D is a reversible matrix.
According to Theorem 1, after performing the reversible transformation x* = D^−1 x, the Fisher information is invariant, i.e., G_S(θ) = G_S′(θ), and the noise in x* is WGN. Performing the linear processing HD on x*, the result is y = HDx* = Hx, and the information loss can be calculated by Equation (8). Therefore, the loss of information is:
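The whitening argument can be verified numerically with a minimal sketch. It assumes real-valued signals and uses the Cholesky factor as one possible choice of D; the covariance C, the Jacobian J, and the matrix H are random stand-ins. The Fisher information is computed once directly from the colored model and once through the whitened signal D^−1 x processed by HD.

```python
import numpy as np

def fisher_colored(J, C):
    """Fisher information of x ~ N(s(theta), C), with J = ds/dtheta."""
    return J.T @ np.linalg.solve(C, J)

def fisher_colored_processed(J, H, C):
    """Fisher information of y = Hx ~ N(H s(theta), H C H^T)."""
    HJ = H @ J
    return HJ.T @ np.linalg.solve(H @ C @ H.T, HJ)

rng = np.random.default_rng(3)
m, n, p = 6, 3, 2
J = rng.normal(size=(m, p))                 # hypothetical Jacobian
H = rng.normal(size=(n, m))                 # hypothetical processing matrix
A = rng.normal(size=(m, m))
C = A @ A.T + m * np.eye(m)                 # hypothetical positive definite covariance
D = np.linalg.cholesky(C)                   # C = D D^T

# Whitening route: x* = D^{-1} x has unit-variance white noise, and y = (H D) x*.
J_white = np.linalg.solve(D, J)             # Jacobian of D^{-1} s(theta)
H_white = H @ D
I_x_white = J_white.T @ J_white
HJ_white = H_white @ J_white
I_y_white = HJ_white.T @ np.linalg.solve(H_white @ H_white.T, HJ_white)

loss_colored = fisher_colored(J, C) - fisher_colored_processed(J, H, C)
```

Both routes give the same Fisher information before and after processing, so the CGN loss equals the WGN loss formula applied to the whitened quantities.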

The Construction of the Isometric Linear Form of Signal Processing
In the previous section, the sufficient and necessary condition of isometric signal processing was that y = g(x) is the sufficient statistic of x. However, the sufficient statistic of x is often difficult to obtain, and the isometric processing should be constructed in another way. This part will introduce the construction method of linear isometric signal processing.
As discussed previously, the signal under CGN can be transformed to the signal under WGN without information loss. Therefore, the signal under WGN is discussed in this part. For the CGN condition, the signal can be whitened first; the next steps are then the same as in the WGN condition.
The linear isometric processing can be obtained in the following way. Firstly, solve the equation:

∀θ, (∂s(θ)/∂θ)^H v = 0. (12)
Suppose the solution space is V = span{v_1, v_2, ..., v_l} with dimension l, and the orthogonal complement of V is V⊥ with dimension n = m − l. Then, the desired signal processing is formed as H = [v′_1, v′_2, ..., v′_n]^H, where v′_1, v′_2, ..., v′_n is a basis of V⊥. To see that this processing loses no Fisher information, let Q = I − H^H(HH^H)^−1 H. Since Q is idempotent (Q² = Q) and rk(H) = n, the eigenvalues of Q are one (with multiplicity m − n) and zero (with multiplicity n). Then, as the matrix Q is Hermitian, it can be expressed as Q = LΛL^H, where L is unitary and Λ is diagonal with its first m − n diagonal elements equal to one and the rest equal to zero. Considering the fact that QH^H = 0, the first m − n columns of HL must equal zero. That means the first m − n columns of L are a basis of V, and the rest of the columns are a basis of V⊥. Because v_1, ..., v_{m−n} are solutions of Equation (12), Q ∂s(θ)/∂θ = 0 for all θ, i.e., the Fisher information loss is zero.
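The construction can be sketched numerically as follows. The sketch approximates the condition "for all θ" by a finite set of sampled θ values, which is an assumption, and it obtains an orthonormal basis of V⊥ (the span of all Jacobian columns) from a singular value decomposition. The signal model s(θ) = θ_1 a + θ_2² b, the vectors `a` and `b`, and the function names are hypothetical.

```python
import numpy as np

def build_isometric_matrix(jacobians, tol=1e-10):
    """Rows of H form an orthonormal basis of V_perp, where V is the set of v
    with (ds/dtheta)^H v = 0 at every sampled theta."""
    stacked = np.hstack(jacobians)                  # m x (p * number of thetas)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))              # dimension n of V_perp
    return U[:, :rank].conj().T                     # n x m

def info_loss(J, H, sigma2=1.0):
    """Loss J^H Q J / sigma2 with Q = I - H^H (H H^H)^{-1} H (white noise)."""
    m = H.shape[1]
    Q = np.eye(m) - H.conj().T @ np.linalg.solve(H @ H.conj().T, H)
    return np.real(J.conj().T @ Q @ J) / sigma2

# Hypothetical model s(theta) = theta_1 * a + theta_2**2 * b with m = 6.
a = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 3.0])
b = np.array([0.0, 1.0, 1.0, 1.0, -2.0, 0.0])

def jacobian(theta):
    """Columns: ds/dtheta_1 = a and ds/dtheta_2 = 2 * theta_2 * b."""
    return np.column_stack([a, 2.0 * theta[1] * b])

thetas = [np.array([0.3, 1.0]), np.array([-1.0, 2.5]), np.array([2.0, -0.7])]
H = build_isometric_matrix([jacobian(t) for t in thetas])

# Check the loss at a parameter value that was not used in the construction.
loss_new = info_loss(jacobian(np.array([5.0, -3.0])), H)
```

Here the Jacobian columns always lie in span{a, b}, so H has two rows and the loss is zero (up to rounding) at any θ, including values outside the sampled set.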
According to the proposed construction method, the following proposition can be obtained.

Proposition 2.
The matrix H is the isometric matrix with the minimal number of rows, i.e., the processed signal has the minimal length.
Proof. Let H′ be an isometric matrix with n′ rows and Q′ = I − H′^H(H′H′^H)^−1 H′. Similarly, the matrix Q′ can be expressed as Q′ = L′Λ′L′^H, with L′ unitary and Λ′ diagonal, where the multiplicity of eigenvalue one is m − n′ (the leading diagonal elements of Λ′). As H′ is isometric, the information loss vanishes, i.e., Q′ ∂s(θ)/∂θ = 0 for all θ; hence the first m − n′ rows of L′^H ∂s(θ)/∂θ must be zero, which means the first m − n′ columns of L′ are linearly independent solutions of Equation (12). However, the solution space V = span{v_1, v_2, ..., v_l} has dimension m − n, so we get m − n′ ≤ m − n, i.e., n′ ≥ n.
Therefore, the matrix H is the isometric matrix with the minimal number of rows.
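Proposition 2 can be illustrated with a minimal sketch. It assumes a hypothetical linear model s(θ) = Bθ with a constant real Jacobian B of rank 2, so that V⊥ is the column space of B and n = 2. A matrix with fewer than n rows then necessarily loses information, while adding rows to an isometric matrix keeps the loss at zero.

```python
import numpy as np

def info_loss(J, H):
    """Loss J^H Q J with Q = I - H^H (H H^H)^{-1} H (white noise, unit power)."""
    m = H.shape[1]
    Q = np.eye(m) - H.conj().T @ np.linalg.solve(H @ H.conj().T, H)
    return np.real(J.conj().T @ Q @ J)

rng = np.random.default_rng(5)
m = 6
B = rng.normal(size=(m, 2))        # constant Jacobian of the hypothetical model

U, _, _ = np.linalg.svd(B, full_matrices=False)
H_min = U.T                                             # n = 2 rows: basis of V_perp
H_short = U[:, :1].T                                    # n' = 1 row: too few rows
H_long = np.vstack([H_min, rng.normal(size=(1, m))])    # n' = 3 rows: one extra row

loss_min = info_loss(B, H_min)
loss_short = info_loss(B, H_short)
loss_long = info_loss(B, H_long)
largest_short_loss = float(np.max(np.linalg.eigvalsh(loss_short)))
```

The two-row and three-row matrices give zero loss, whereas the one-row matrix leaves a strictly positive eigenvalue in the loss matrix, since a rank-one projection cannot preserve a rank-two Jacobian.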

Remark 2.
Consider an isometric matrix H′ with n rows. Because the first m − n columns of L′ are linearly independent solutions of Equation (12), they span V, so any element v from V⊥ satisfies that the first m − n elements of L′^H v equal zero. Therefore, the rows of H′ and the rows of H span the same space. In other words, any isometric matrix with n rows is an equivalent matrix of H, which indicates that the proposed construction method can generate any isometric matrix with the minimal number of rows.
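The notion of equivalent matrices in Remark 2 can be illustrated with a short sketch. It assumes that "equivalent" means H′ = TH for an invertible n × n matrix T, which is an interpretation; the matrices below are hypothetical. Such matrices share the same row space, hence the same projector H^H(HH^H)^−1 H and the same information loss.

```python
import numpy as np

def row_space_projector(H):
    """Orthogonal projector H^H (H H^H)^{-1} H onto the row space of H."""
    return H.conj().T @ np.linalg.solve(H @ H.conj().T, H)

rng = np.random.default_rng(7)
m, n = 6, 2
H = np.linalg.qr(rng.normal(size=(m, n)))[0].T   # n orthonormal rows (hypothetical H)
T = np.array([[2.0, 1.0], [0.5, 3.0]])           # invertible mixing matrix, det = 5.5
H_equiv = T @ H                                  # row-equivalent processing

same_projector = np.allclose(row_space_projector(H), row_space_projector(H_equiv))
```

Since the loss depends on H only through this projector, every matrix of the form TH is isometric whenever H is.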

Sample of the Construction
Consider a radar target detection scenario: the radar emits a single-frequency signal and receives the echo to obtain the distance and RCS (Radar Cross-Section) information of the target. The observation model can be formulated as:
where j is the imaginary unit, t is the sampling interval, f is the frequency of the emitted signal, c is the velocity of light, w_k denotes WGN, r indicates the distance of the target, and A is the unknown amplitude, which contains the information of the RCS. The desired parameter is θ = (A, r). Firstly, the derivative is:
Solving Equation (12), the orthogonal complement of the solution space is: span{(exp(j2πft), ..., exp(j2πfNt))}.
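The example can be checked numerically with a minimal sketch. It assumes the observation model x_k = A exp(j2πf(kt − 2r/c)) + w_k for k = 1, ..., N, with a real amplitude A and circular complex WGN of power σ², for which the Fisher information is taken as (2/σ²) Re(J^H J). This model, the factor 2, and all numerical values are assumptions of the sketch; the variable `dt` plays the role of the sampling interval t. The processing H is the conjugate transpose of the spanning vector, i.e., coherent integration of the N samples into a single value.

```python
import numpy as np

C_LIGHT = 3.0e8   # velocity of light in m/s

def steering(f, dt, N):
    """Vector (exp(j 2 pi f dt), ..., exp(j 2 pi f N dt))."""
    k = np.arange(1, N + 1)
    return np.exp(1j * 2 * np.pi * f * k * dt)

def jacobian(A, r, f, dt, N):
    """Columns ds/dA and ds/dr for s_k = A exp(j 2 pi f (k dt - 2 r / c))."""
    phase = np.exp(-1j * 4 * np.pi * f * r / C_LIGHT)
    a = steering(f, dt, N)
    ds_dA = phase * a
    ds_dr = A * (-1j * 4 * np.pi * f / C_LIGHT) * phase * a
    return np.column_stack([ds_dA, ds_dr])

def fisher(J, H, sigma2):
    """Fisher information of y = Hx under circular complex WGN (assumed factor 2)."""
    HJ = H @ J
    return 2.0 * np.real(HJ.conj().T @ np.linalg.solve(H @ H.conj().T, HJ)) / sigma2

f, dt, N, sigma2 = 10e6, 1e-8, 16, 0.5      # hypothetical values
J = jacobian(A=2.0, r=150.0, f=f, dt=dt, N=N)
H = steering(f, dt, N).conj()[None, :]      # 1 x N coherent integration

I_x = fisher(J, np.eye(N), sigma2)          # information in the N raw samples
I_y = fisher(J, H, sigma2)                  # information in the single output
```

Both columns of the Jacobian are proportional to the spanning vector, so projecting onto it loses nothing: the N samples can be compressed to one complex value while the Fisher information about (A, r) is unchanged.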

Conclusions
This letter focuses on the influence of signal processing on the geometric structure of the statistical manifold in estimation issues. Based on the intrinsic characteristics of the estimation issues, the intrinsic parameter submanifold is defined in this letter. Then, the intrinsic parameter submanifold is proven to turn into a tighter one after signal processing. Moreover, we show that the geometric structure of the intrinsic parameter submanifold is invariant if and only if the processed signal is a sufficient statistic. In addition, a construction method for linear isometric signal processing is proposed. Moreover, the linear processing produced by the proposed method is shown to have the minimal number of rows (when it is represented as a matrix), i.e., the processed signal has the minimal length, and the proposed method can generate all linear isometries with the minimal number of rows.