Article

Lie Group Methods in Blind Signal Processing

Dariusz Mika and Jerzy Jozwik
1 Institute of Technical Sciences and Aviation, The State School of Higher Education in Chelm, 22-100 Chelm, Poland
2 Faculty of Mechanical Engineering, Lublin University of Technology, 20-618 Lublin, Poland
* Authors to whom correspondence should be addressed.
Sensors 2020, 20(2), 440; https://doi.org/10.3390/s20020440
Submission received: 31 October 2019 / Revised: 27 December 2019 / Accepted: 7 January 2020 / Published: 13 January 2020

Abstract

This paper deals with the use of Lie group methods to solve optimization problems in blind signal processing (BSP), including Independent Component Analysis (ICA) and Independent Subspace Analysis (ISA). The paper presents the theoretical fundamentals of Lie groups and Lie algebras, the geometry of problems occurring in BSP, and the basic ideas of optimization techniques based on Lie groups. Optimization algorithms built on the properties of Lie groups are characterized by the fact that, throughout the optimization motion, the solution remains permanently bound to the search space. This property is extremely significant for the stability and dynamics of optimization algorithms. The specific geometry of problems such as ICA and ISA, together with the homogeneity of the search space, enables the use of optimization techniques based on the properties of the Lie groups O(n) and SO(n). An interesting idea is that of optimization motion in one-parameter commutative subalgebras and toral subalgebras, which ensures low computational complexity and high-speed algorithms.

1. Introduction

Blind signal processing (BSP) is currently one of the most attractive and fast-growing signal processing areas, with solid theoretical foundations and many practical applications. BSP has become a vital research topic in many areas of application, particularly in biomedical engineering, medical imaging, speech and image recognition, communication systems, geophysics, economics, and data analysis. The term “blind processing” originates from the basic feature of these processing methods, i.e., the fact that no training data or a priori knowledge are needed to obtain results. These methods include, among others, Independent Component Analysis (ICA), independent subspace analysis (ISA), sparse component analysis (SCA), nonnegative matrix factorization (NMF), singular value decomposition (SVD), principal component analysis (PCA) and minor component analysis (MCA), as well as the related eigenproblem and invariant subspace problem. Optimization problems of this kind often occur in the context of artificial neural networks, signal processing, pattern recognition, computer vision and numerics [1]. BSP is widely used in biomedical engineering and technical diagnostics as well as in energy applications. The work [2] presents the use of SCA to analyze biomedical EEG and fMRI signals, proving the effectiveness of this method in the detection of ocular artifacts. The use of SCA in technical diagnostics is presented in [3], where a three-dimensional geometric-features-based SCA algorithm was used for the diagnosis of compound faults of roller bearings. A similar topic was discussed in [4], where NMF was used to extract fault signals. The conducted experiments confirmed the effectiveness of these methods in extracting fault features and diagnosing roller bearings. An interesting use of BSP techniques in energy issues is presented in [5], where a Bayesian-optimized bidirectional Long Short-Term Memory (LSTM) method was used for energy disaggregation, aiming to identify the individual contribution of appliances to the aggregate electricity load. The use of machine learning techniques such as k-means clustering and Support Vector Machines for low-complexity energy disaggregation is presented in [6].
This paper primarily focuses on ICA and ISA problems, which does not, however, limit the applicability of the described methods to other types of problems. The scope of this paper is mainly limited to presenting the geometry of ICA and ISA problems and the application of Lie groups and Lie algebra without providing specific algorithms.
Standard Independent Component Analysis (ICA) consists of a linear transformation of multidimensional data such that the transformed signal components are as statistically independent as possible. The effectiveness of ICA depends on the correct choice of a cost function and an optimization strategy. Most numerical optimization techniques assume that the model's parameter space is an ordinary Euclidean space. In many cases, however, the parameter space has a non-linear structure with its own non-Euclidean geometry. From a mathematical point of view, the search space equipped with an inner product takes on the properties of a Riemannian manifold, often with desirable algebraic properties [7].
The authors of works in this field take advantage of the specific internal geometry and algebraic properties of models such as the orthogonal group O(n) or the special orthogonal group SO(n). Apart from the general group properties, these groups also have the structure of a smooth differential manifold, and thus acquire the Lie group properties and a corresponding Lie algebra. The application of this convenient structure to ICA algorithms is described in [8,9].
From the point of view of standard optimization techniques, problems of this type involve so-called constrained optimization. Constrained optimization occurs in many issues related to signal processing. In the case of ICA, optimization of this kind consists of looking for extrema of the cost function on the set of matrices with orthonormal columns (W^T W = I). However, standard constrained algorithms operate in Euclidean space, so in each iterative step the matrix orthogonality is lost. To restore the orthogonality condition, it is necessary to perform orthogonalization in each iterative step (e.g., by the well-known Gram-Schmidt orthogonalization process), which, however, reduces the convergence rate of the algorithms. Other algorithms use the Lagrangian method of optimization, adding a so-called penalty function to the cost function to prevent deviations from orthogonality. However, such algorithms are characterized by a low convergence rate and poor quality of the achieved optimum.
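As an illustration of the re-orthogonalization approach described above, the following minimal sketch performs one Euclidean gradient step and then restores orthogonality with a QR factorization, which here plays the role of the Gram-Schmidt process; the function name, the step size and the generic gradient argument grad_J are our own assumptions, not taken from the paper.

```python
import numpy as np

def euclidean_step_with_reorthogonalization(W, grad_J, mu=0.01):
    """One unconstrained gradient step followed by restoration of W^T W = I.

    W      : current demixing matrix with orthonormal columns
    grad_J : Euclidean gradient of the cost function at W (hypothetical)
    mu     : step size (hypothetical value)
    """
    W_new = W - mu * grad_J            # unconstrained Euclidean update
    Q, R = np.linalg.qr(W_new)         # re-orthogonalize the columns
    Q = Q * np.sign(np.diag(R))        # fix column signs for a unique factor
    return Q
```

The extra factorization required in every iteration is exactly the overhead that the Lie group methods discussed below avoid.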
If the constraint takes the form of matrix orthogonality, one can use an alternative method that keeps the solution “locked” onto the hypersurface of orthogonal matrices during the optimization motion. This method uses the group structure of the set of orthogonal square matrices which, together with the properties of a smooth differential manifold, provides the set with the properties of a special structure known as a Lie group.

2. Model Definition (ICA, ISA)

Standard independent component analysis (ICA) consists of estimating a sequence of p statistically independent components (ICs) s_1, …, s_p and the mixing matrix A of dimension n × p from only a sequence of n observed signals x_1, …, x_n. Writing the source signals and observed signals as the source vector s = (s_1, …, s_p)^T and the vector of observed (mixed) signals x = (x_1, …, x_n)^T, where T stands for transposition, the standard linear ICA model takes the form (1):
x = A s
This assumes that there is no additional noise signal in the observed signal (Figure 1). The ICA model thereby formulated is characterized by a scale and permutation ambiguity, i.e., it is possible to scale (multiply by a given constant) any source signal s_i and at the same time divide the i-th column a_i of the mixing matrix A by this constant, while the observed signal x remains unchanged. The same phenomenon occurs on transposition of any rows of the source vector s (permutation of the source vector s) together with the same transposition of the columns of the mixing matrix A. It is customary to assume that the source signals have unit variance (C_s = E(ss^T) = I). In non-negative ICA, it is additionally assumed that the source signals s_i satisfy the condition s_i ≥ 0 [10,11].
A solution of the ICA problem when n = p consists of finding the demixing (filtration) matrix Q^T = A^{-1}, Q ∈ Gl(n), where the filtration matrix Q belongs to the general linear group Gl(n) of non-singular matrices (det(Q) ≠ 0). Source signals are obtained via (2):
ŝ = Q^T x = Q^T A s
where ŝ is the estimator of the source vector s (it meets the statistical assumptions for a source signal).
To reduce the computational load in ICA, pre-processing usually involves whitening the observed signal to obtain the signal z = Bx = BAs, where B is the whitening matrix, with unit variance and decorrelation C_z = E(zz^T) = I. Assuming that C_s = I, we get (3):
I = C_z = E(zz^T) = BA E(ss^T) (BA)^T = BA(BA)^T
From this it follows that (BA)^T = (BA)^{-1}. Hence, the transformation from s to z takes place via an orthogonal matrix BA. Therefore, if ŝ = W^T z = W^T BA s = U s, then the matrix U = W^T BA must be an orthogonal (permutation) matrix, and thus the new filtering matrix W (after whitening) must also satisfy the orthogonality condition. The whitening of the observed signal therefore simplifies the ICA problem from optimization on the general linear group Gl(n) (matrices Q only satisfying the invertibility condition det(Q) ≠ 0) to optimization on the special orthogonal group SO(n) (matrices W satisfying the orthogonality condition W^T W = I). Both groups are Lie groups.
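A common way to construct the whitening matrix B is through the eigendecomposition of the sample covariance of x; the sketch below follows this standard PCA-based recipe. The paper does not spell out a particular construction, so the concrete form of B and the function name are our own assumptions.

```python
import numpy as np

def whiten(x):
    """Whiten observed signals x (an n x T array of T samples per channel).

    Returns z = B x with C_z = E(zz^T) = I and the whitening matrix
    B = C_x^{-1/2}, computed from the eigendecomposition of the sample
    covariance C_x.
    """
    x = x - x.mean(axis=1, keepdims=True)        # remove the mean
    C = x @ x.T / x.shape[1]                     # sample covariance C_x
    d, E = np.linalg.eigh(C)                     # C_x = E diag(d) E^T
    B = E @ np.diag(1.0 / np.sqrt(d)) @ E.T      # symmetric inverse square root
    return B @ x, B
```

After this step, the remaining demixing matrix W can be constrained to the orthogonal group, as discussed above.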
Standard ICA is based on the assumption that n = p, i.e., the number of source signals s_i is known and equal to the number of observed signals x_i. ICA also yields interesting results in the more general case when the number of estimated source signals p is unknown. In this case, it can happen that n ≠ p. When n < p, i.e., the number of observed signals is smaller than the number of source signals, the problem is known as overcomplete bases ICA, whereas when n > p it is called undercomplete bases ICA. This kind of problem can be formally considered as unconstrained optimization on the Stiefel manifold [12,13]. It is also possible to solve ICA problems for the case p = 1. This type of problem is often called Single Channel Source Separation [14,15].
Hyvarinen and Hoyer introduced independent subspace analysis (ISA) [16] by relaxing the statistical independence condition between extracted source components. The source vector s is composed of d_k-tuples (k = 1, …, r), where within a given tuple statistical dependence between its source signals s_i is allowed, while signals belonging to different tuples are statistically independent. When the whitening process is used, the ISA problem boils down to finding orthogonal matrices (W^T W = I), as in standard ICA. However, due to the statistical relationship between the source signals in a tuple, ISA problem optimization cannot be performed on an ordinary Stiefel manifold. It is necessary to introduce a different, more universal manifold allowing for additional symmetries. This manifold is known as a flag manifold.
Traditionally, the ICA model assumes the statistical independence of the extracted source signals. It turns out, however, that there are reasons to replace the orthonormality condition with the condition of source signal normality [17]. A precise definition of the ICA problem consists of finding a linear non-orthogonal transformation (of the coordinate system) of multidimensional data such that the transformed data have minimal mutual information. Hyvarinen [18] demonstrated (in an ICA problem) the differences between the use of cost functions based on mutual information and those based on so-called non-Gaussianity. Achieving maximum decorrelation by maximizing the sum of non-Gaussianity of independent components (ICs) is not necessarily related to the minimization of mutual information (MI). In addition, the orthonormality condition leads to a smaller subset of matrices, which simplifies the optimization process yet may reduce its quality. Orthonormality imposes a greater limitation on the degrees of freedom than normality. In standard ICA, the orthonormality condition of n × n filtering matrices reduces the number of degrees of freedom to n(n − 1)/2, while the normality condition increases the number of free parameters to n(n − 1), which considerably improves the quality of the obtained results. A problem of this type can be formally considered as unconstrained optimization on an oblique manifold [19,20].

3. Geometry of ICA, ISA and Other BSP Models

The manifolds that frequently arise from BSP tasks in general do not have group properties. Nevertheless, they are homogeneous spaces of Lie groups. A homogeneous space M is a manifold on which a Lie group G acts transitively [21]. This property is fundamental for the considered manifolds because it enables analyzing them as quotient spaces. As mentioned in Section 2, the optimization problem in standard ICA (ISA) boils down to optimization on the general linear group Gl(n) (matrices Q only satisfy the invertibility condition det(Q) ≠ 0). The whitening of the observed signal simplifies the ICA problem to optimization on the special orthogonal group SO(n) (matrices W satisfy the orthogonality condition W^T W = I_n). In the case of an undercomplete problem, p < n, i.e., when the number of extracted ICs is smaller than the number of observed signals, the set of filtering matrices can be treated as an orthogonal Stiefel manifold St(n, p), defined as the set of orthonormal matrices of dimension n × p of the form (4):
St(n, p) = { W = (w_1, …, w_p) | W ∈ R^{n×p}, W^T W = I_p, rank(W) = p }
which can be regarded as the quotient space arising from the orthogonal group.
The Lie group G = O(n) acts transitively on the Stiefel manifold via (5):
O(n) × St(n, p) ∋ (Q, W) ↦ QW ∈ St(n, p)
where Q ∈ O(n), W ∈ St(n, p). It is possible to demonstrate that for two given points W_1, W_2 ∈ St(n, p) there exists Q ∈ O(n) such that W_2 = QW_1. This means that starting from any point W_0 ∈ St(n, p) it is possible to reach any point W ∈ St(n, p) by the action of G. Resorting to group theory terminology, one can say that the entire manifold St(n, p) is equivalent to the single orbit G(W_0) of a given point W_0, where
G(W_0) = { W = QW_0 | Q ∈ O(n), W_0 ∈ St(n, p) }
The point W on the manifold St(n, p) can be expressed via a certain point Q on O(n). The mapping π : Q ↦ W is surjective, i.e., many-to-one (a projective mapping). The redundancy of this mapping is described by the so-called isotropy subgroup H of the point W_0. It is the set of matrices that do not change W_0:
H = { N ∈ O(n) | N W_0 = W_0 }
The isotropy subgroup H ⊂ O(n) of the group O(n) at the point W_0 ∈ St(n, p) has the form (8):
H = (W_0, W_0^⊥) diag(I_p, O(n − p)) (W_0, W_0^⊥)^T
where W_0^⊥ ∈ St(n, n − p) is any n × (n − p) matrix that satisfies the condition (W_0, W_0^⊥) ∈ O(n). It is easy to check that the isotropy condition of the point W_0 is satisfied (9):
H · W_0 = (W_0, W_0^⊥) diag(I_p, O(n − p)) (W_0, W_0^⊥)^T W_0 = W_0
Choosing W_0 = (I_p, 0_{n−p,p})^T, the isotropy subgroup of W_0 is the set H = diag(I_p, O(n − p)).
In this picture, two n × n orthogonal matrices represent the same point of the Stiefel manifold if their first p columns are identical or, equivalently, if they are related by right multiplication by a matrix of the form diag(I_p, O(n − p)), where O(n − p) is the orthogonal matrix group of dimension (n − p) × (n − p) [1]. From a mathematical point of view, we say that such representations are in an equivalence relation. All matrices in an equivalence relation form what is called the equivalence class [W]. Thus, a point on the Stiefel manifold is the equivalence class [W] of n × n orthogonal matrices with identical first p columns, while the Stiefel manifold is a quotient space of the form (10):
St(n, p) ≅ O(n) / O(n − p)
More specifically, St(n, p) ≅ O(n)/H, where H = diag(I_p, O(n − p)). However, H is isomorphic to O(n − p), i.e., H ≅ O(n − p); therefore St(n, p) ≅ O(n)/O(n − p).
There are many applications for the problem formulated as finding an extremum (or zero) of a given field defined on a non-Euclidean space of p-dimensional subspaces embedded in the Euclidean space R^n. This non-Euclidean space is known as the Grassmann manifold Gr(n, p; R) [22,23]. A point of the Grassmann manifold can be described as an equivalence class of n × p orthogonal matrices spanning the same p-dimensional subspace, [W] = { W O(p) | W ∈ St(n, p) }. Therefore, from a theoretical point of view, the Grassmann manifold can be expressed as the quotient space Gr(n, p) ≅ St(n, p)/O(p) and, given that St(n, p) ≅ O(n)/O(n − p), the Grassmann manifold Gr(n, p) can also be seen as the quotient space O(n)/(O(p) × O(n − p)). In this case, the equivalence class [W] = { W diag(O(p), O(n − p)) | W ∈ O(n) } is a set of square n-dimensional orthogonal matrices whose first p columns span the same p-dimensional subspace. Manifolds of this type are used, among others, in invariant subspace analysis, application-driven dimension reduction and subspace tracking [24,25].
When there is a need for a simultaneous (parallel) extraction of several subspaces, as is the case in independent subspace analysis (ISA), one resorts to the concept of the generalized flag manifold, a manifold consisting of mutually orthogonal subspaces that constitutes a generalization of both the Stiefel and the Grassmann manifolds [26,27,28]. The generalized flag manifold Fl(n, d_1, …, d_r; R) is defined as (11):
Fl(n, d_1, …, d_r; R) = { W | W ∈ R^{n×p}, W^T W = I_p }
where the orthogonal matrix W takes the form (12):
W = [W_1, …, W_r],   W_i = [w_1^i, …, w_{d_i}^i]
where the vectors w_k^i ∈ R^n, k = 1, …, d_i, for a given i = 1, …, r, form an orthogonal basis spanning the subspace V_i. The subspaces V_i are orthogonal relative to each other and satisfy the condition (13):
V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_r ⊆ R^n
Points on the flag manifold are sets of vector spaces V that can be decomposed as in (13). If all d_i = 1 (1 ≤ i ≤ r), the manifold Fl(n, d_1, …, d_r; R) reduces to the Stiefel manifold St(n, p). If r = 1, it reduces to the Grassmann manifold Gr(n, p). It is abbreviated as Fl(n, d), where d = (d_1, …, d_r). The orthogonal group O(n) also acts transitively on the manifold Fl(n, d) via simple matrix multiplication (14):
O(n) × Fl(n, d) ∋ (Q, W) ↦ QW ∈ Fl(n, d)
The isotropy subgroup H ⊂ O(n) of the group O(n) at the point W ∈ Fl(n, d) has the form (15):
H = (W, W^⊥) diag(R_1, …, R_r, R_{r+1}) (W, W^⊥)^T
where diag(R_1, …, R_r, R_{r+1}) is a block-diagonal matrix with blocks R_k ∈ O(d_k) (1 ≤ k ≤ r) and R_{r+1} ∈ O(n − p), and W^⊥ ∈ St(n, n − p) is any n × (n − p) matrix that satisfies the condition [W, W^⊥] ∈ O(n). It is easy to check that the isotropy condition of the point W is satisfied (16):
H · W = (W, W^⊥) diag(R_1, …, R_r, R_{r+1}) (W, W^⊥)^T W = (WR, W^⊥R_{r+1}) (W, W^⊥)^T W = (W R W^T + W^⊥ R_{r+1} (W^⊥)^T) W = W R ∈ [W]
where R = diag(R_1, …, R_r) and W R = (W_1 R_1, …, W_r R_r) represents the equivalence class [W] of a point on Fl(n, d). This means that any two matrices W′ and W″ satisfying the condition W″ = W′ R = (W_1, …, W_r) diag(R_1, …, R_r) = (W_1 R_1, W_2 R_2, …, W_r R_r) are identified with the very same point on the manifold Fl(n, d). Given the above,
Fl(n, d) ≅ O(n) / (O(d_1) × ⋯ × O(d_r) × O(n − p))
As already mentioned, the manifold Fl(n, d), as a homogeneous space, is locally isomorphic to St(n, p) when all d_i = 1 (1 ≤ i ≤ r) and to the manifold Gr(n, p) when r = 1.
In terms of optimization, the homogeneity of the considered differential manifolds enables the search (optimization motion) to be carried out in the group O(n) or SO(n), and the use of optimization techniques that are well known and adapted to these types of groups. Section 4 presents the basic ideas of optimization methods used on SO(n) and the concept of the toral subalgebra that is characteristic of problems of this type.

4. Lie Group Optimization Methods. One-Parameter Subalgebra and Toral Subalgebra

The idea of a standard optimization procedure based on Lie groups consists of performing the optimization motion in the Lie algebra space and then using the exp mapping to find a solution in the Lie group (manifold). The optimization motion in the group SO(n) starting from the point (matrix) W_0 therefore consists of, first, the transition to the Lie algebra, Ω_0 = log W_0 ∈ so(n), via the mapping inverse to exponentiation, log := exp^{-1}; then, motion in the Lie algebra (performing the operation of addition of matrices in the abelian group) in order to obtain a new antisymmetric matrix Ω ∈ so(n); and, finally, returning to the Lie group via the exponential mapping W = exp Ω ∈ SO(n). A simple update method using a line search procedure relies on finding the search direction in the Lie algebra so(n) by calculating the gradient of the cost function J in the Lie algebra space. This gradient must be skew-symmetric (see Appendix A), so (18) [9]:
∇_A J = (∇_W J) W^T − W (∇_W J)^T
Applying the steepest descent procedure with a small constant update factor μ, we start from A = 0_n ∈ so(n), move to B = −μ ∇_A J, map to R = exp(B) ∈ SO(n) and finally perform the rotating (multiplicative) update W_{k+1} = exp(−μ ∇_A J) W_k. This kind of optimization method is called a geodesic flow method [9].
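The following sketch, written under the assumption that the Euclidean gradient ∇_W J is available as a function argument, shows one step of this multiplicative update; the helper name and the step size are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def geodesic_sd_step(W, grad_W, mu=0.05):
    """One geodesic steepest-descent step on SO(n).

    W      : current point on SO(n)
    grad_W : Euclidean gradient of the cost function J at W
    mu     : step size (hypothetical value)
    """
    A_J = grad_W @ W.T - W @ grad_W.T   # skew-symmetric gradient in so(n), Eq. (18)
    R = expm(-mu * A_J)                 # rotation obtained by the exponential mapping
    return R @ W                        # multiplicative (rotating) update, stays on SO(n)
```

Because R is itself a rotation, the updated matrix remains exactly orthogonal, so no re-orthogonalization step is needed.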
At this point it is necessary to comment on motion in the Lie algebra. In our context, the addition of vectors in the Lie algebra so(n) can only be useful if it is matched by multiplication in the Lie group SO(n). Then one can write (19):
exp(A) exp(B) = exp(B) exp(A) = exp(A + B)
As was already mentioned, this equation holds true only when the matrices A and B commute, [A, B] = 0. This condition is satisfied for all matrices in so(2). When n ≥ 3, this condition is not satisfied for all matrices in the algebra. When the matrices do not commute (a non-abelian Lie algebra), Equation (19) is not satisfied, and optimization motion in the Lie algebra (the sum A + B) in the direction of, e.g., the cost function gradient will not be reflected in the Lie group, since exp(A + B) ≠ exp(A) exp(B). However, by taking exp(A) = I_n, which is tantamount to selecting the initial matrix A_0 = 0_n, this condition will always be satisfied. In this case, [A_0, B] = 0 and Equation (19) is satisfied too. This is tantamount to motion in a one-parameter Lie algebra. Selecting A = tΩ for a given antisymmetric matrix Ω ∈ so(n) and a scalar t ∈ R, all matrices of this form commute with each other (A = t_1Ω, B = t_2Ω: [A, B] = t_1Ω t_2Ω − t_2Ω t_1Ω = 0). The set of such matrices, so_Ω(n) = { tΩ | t ∈ R }, is in itself a Lie algebra known as a one-parameter subalgebra of the Lie algebra so(n). The subalgebra so_Ω(n) is an abelian (commutative) algebra related to the one-parameter subgroup R(t) = exp(tΩ). Optimization motion in the subalgebra so_Ω(n) is therefore the equivalent (a generalization) of the idea of linear motion in Euclidean space. In this case, the optimization procedure consists of searching for a minimum of the cost function along the subalgebra so_Ω1(n) (for a chosen search direction Ω_1), which corresponds to a search along the subgroup R(t).
Having found the cost function minimum (at R(t)W_0, where W_0 is a starting point), a new direction of linear search Ω_2 is selected, and the procedure is repeated until the desired convergence is achieved. Plumbley [8] proposed a modification of the standard procedure described above. This modification consists of moving the point of “origin” of the Lie algebra from the neutral element of the group to the point W. Due to the group properties of SO(n), any new point can be written as W′ = RW for some matrix R ∈ SO(n). Moving from the matrix W = I_n W to W′ = RW is therefore equivalent to moving from the identity matrix I_n to the matrix R. This procedure consists of moving from the matrix 0_n = log I_n ∈ so(n) to Ω = log R ∈ so(n) in the Lie algebra, then returning to the group SO(n) via the exponential mapping R = exp Ω and, finally, determining W′ = RW = (exp Ω)W ∈ SO(n). This is equivalent to the concept of optimization motion in the one-parameter abelian subalgebra described above.
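A minimal sketch of the resulting line search along a one-parameter subgroup is given below; the grid of trial parameters, the function names and the generic cost callable are our own assumptions, since the paper does not fix the one-dimensional search method.

```python
import numpy as np
from scipy.linalg import expm

def line_search_on_subgroup(W0, Omega, cost, ts=np.linspace(0.0, 2.0 * np.pi, 60)):
    """Search for a minimum of `cost` along the one-parameter subgroup
    R(t) = exp(t * Omega) applied to the starting point W0.

    Omega : skew-symmetric search direction in so(n)
    cost  : callable taking an orthogonal matrix and returning a scalar
    ts    : grid of candidate parameters (a simple grid search)
    """
    best_t = min(ts, key=lambda t: cost(expm(t * Omega) @ W0))
    return expm(best_t * Omega) @ W0, best_t
```

Each trial point requires a full matrix exponential, which motivates the Jordan-form simplification described next.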
The above optimization procedures are computationally expensive due to the necessity of performing a costly matrix exponentiation in every iterative step. The representation of antisymmetric matrices in the Jordan canonical form enables the decomposition of the optimization movement in the group SO(n) into commutative rotations in orthogonal planes. Every antisymmetric matrix Ω can be presented in a block-diagonal form (for 2m ≤ n) (20):
Ω = Q diag(Φ_1, …, Φ_m, 0, …, 0) Q^T
where diag(Φ_1, …, Φ_m, 0, …, 0) is a block-diagonal matrix, Q ∈ SO(n), and Φ_i = ( 0  −φ_i ; φ_i  0 ) denote 2 × 2 antisymmetric matrices [29]. This form is known as the Jordan canonical form. Since the relationship exp(Q^T Ω Q) = Q^T exp(Ω) Q holds true, the matrix Ω can be decomposed into a sum of the form Ω = Ω_1 + ⋯ + Ω_m, where Ω_i is the matrix containing only the i-th Jordan block Φ_i and zeros elsewhere (21):
Ω = Q diag(Φ_1, 0, …, 0) Q^T + ⋯ + Q diag(0, …, Φ_m, 0, …, 0) Q^T
The exponentiation of the matrix Ω presented in this form yields an orthogonal matrix W of the form (22):
W = exp Ω = Q diag(R_1, …, R_m, 1, …, 1) Q^T
where R_i = ( cos φ_i  −sin φ_i ; sin φ_i  cos φ_i ) are 2 × 2 rotation matrices. The matrix W can be decomposed into a product W = W_1 ⋯ W_m, where W_i has the form (23):
W_i = exp Ω_i = Q diag(1, …, 1, R_i, 1, …, 1) Q^T
One can notice that the exponentiation of the matrix Ω in the Jordan form reduces to a simple and inexpensive calculation of the functions sin φ_i and cos φ_i, which significantly increases the speed of optimization algorithms. The Jordan canonical form of an antisymmetric matrix can be obtained via a symmetric eigenvalue decomposition [29]. It can be observed that the antisymmetric matrix Ω commutes with the symmetric matrix Ω² = −Ω^T Ω, which means that Ω and Ω^T Ω have the same eigenvectors. The eigenvalues of Ω^T Ω occur in pairs corresponding to the individual Jordan blocks Φ_i.
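As an illustration of this cheap exponentiation, the sketch below recovers the block-diagonal canonical form of a skew-symmetric matrix with a real Schur decomposition (an alternative to the symmetric eigenvalue decomposition cited above) and then builds exp(Ω) from sines and cosines of the block angles; it is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import schur, expm

def exp_skew_via_jordan(Omega, tol=1e-12):
    """Exponentiate a skew-symmetric Omega using its block-diagonal canonical form.

    Omega = Q diag(Phi_1, ..., Phi_m, 0, ..., 0) Q^T with 2x2 blocks Phi_i;
    exp(Omega) is obtained by replacing each Phi_i with the plane rotation
    through the angle phi_i (Equations (20)-(22)), so only sin/cos are needed.
    """
    n = Omega.shape[0]
    T, Q = schur(Omega, output='real')       # Omega = Q T Q^T, T block-diagonal
    W = np.eye(n)
    i = 0
    while i < n - 1:
        phi = T[i, i + 1]                    # angle of the 2x2 block, if any
        if abs(phi) > tol:
            c, s = np.cos(phi), np.sin(phi)
            W[i:i + 2, i:i + 2] = [[c, s], [-s, c]]
            i += 2
        else:
            i += 1
    return Q @ W @ Q.T

Omega = np.array([[0.0, 0.9, 0.1, 0.0],
                  [-0.9, 0.0, 0.0, 0.2],
                  [-0.1, 0.0, 0.0, -0.7],
                  [0.0, -0.2, 0.7, 0.0]])
print(np.allclose(exp_skew_via_jordan(Omega), expm(Omega)))   # True
```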
This form can be visualized as compounding rotations (represented by W_i) in mutually orthogonal planes. In addition, the rotation matrices W_i commute, W_i W_j = W_j W_i. The commutation property of the rotation matrices W_i makes it possible to carry out the optimization procedure on SO(n) while moving in the Lie algebra so(n).
The case of SO(4) is interesting from a geometrical point of view. The Jordan canonical form of the antisymmetric matrix Ω contains two blocks (matrices) Φ_i (24):
Ω = Q diag(Φ_1, Φ_2) Q^T,   Φ_i = ( 0  −φ_i ; φ_i  0 )
Here, the orthogonal matrix W takes the form (25):
W = exp Ω = Q diag(R_1, R_2) Q^T,   R_i = ( cos φ_i  −sin φ_i ; sin φ_i  cos φ_i )
A visual representation of this case shows rotations in two mutually orthogonal planes, which corresponds to toral geometry (Figure 2).
From the point of view of optimization procedures, the rotation angles φ_1 and φ_2 should not be free parameters (independent of each other). For the procedure to make sense, the curve along which the search is carried out should, after a complete rotation (or its multiple) relative to one of the planes of rotation, return to the starting point on the toral surface (Figure 2). This is possible when for some t the relationships tφ_1 = 2k_1π and tφ_2 = 2k_2π are satisfied for integers k_1 and k_2. Therefore, the angles of rotation should be related by φ_1/φ_2 = k_1/k_2, or φ_1 = aφ_2, where a = k_1/k_2 is a rational number. This concept carries over naturally to the general case of SO(n) for n > 4. The Jordan canonical form represents the optimization motion in the one-parameter Lie subalgebra so_Ω(n), along the subgroup R(t) = exp(tΩ), as a rotation in p mutually orthogonal planes, and these rotations are commutative. The geometry of the 2-dimensional torus for SO(4) can also be generalized to the geometry of the p-dimensional torus in SO(n), where n = 2p for even n or n = 2p + 1 for odd n. This perception of motion in SO(n) leads to the concept of the toral subalgebra t(p) ⊂ so(n). Consider the general case of motion on the surface of a p-dimensional torus where the angles of rotation φ_i are not interrelated by the above relationship, and the individual independent planes of rotation are represented by a set of p commuting matrices Ω_i (Ω = Σ_i Ω_i). The motion (or rather rotation) in each of the independent planes of rotation can be expressed in the form of a parameterized curve B_i = t_iΩ_i, or actually via its simple exponentiation W_i = exp(t_iΩ_i). The set of independent parameters t_i, which can be identified with the angles of rotation φ_i, forms a coordinate system on the toral subalgebra t(p). Compared to the original search space SO(n), the toral subalgebra t(p) is an abelian algebra, which means that motion in this search space is commutative. This ensures the possibility of motion in all directions specified by the coordinates t_i, and their sum in the form B = t_1Ω_1 + ⋯ + t_pΩ_p will be reflected in the composition of rotations W = exp B = exp(t_1Ω_1) ⋯ exp(t_pΩ_p). The optimization procedure based on this concept consists of decomposing a specific antisymmetric matrix B ∈ so(n) (this can be, for example, the cost function gradient, as in the method of steepest descent) into the canonical form, thereby establishing a toral subalgebra. Since the orthogonal matrix Q in the Jordan decomposition (24) is constant, the transition to a new point W in the search space is done by determining p values of the sin and cos functions corresponding to the p planes of rotation. After finding in the subalgebra the point that minimizes the cost function, a new antisymmetric matrix B ∈ so(n) is calculated and again presented in the Jordan canonical form, which establishes a new toral subalgebra. The procedure is repeated until the desired minimum of the cost function is reached. A separate problem concerns the determination of the search direction B ∈ so(n) and the manner of search along the subalgebra. The selection of directions and the manner of searching depend on the adopted optimization procedure. It can be the steepest descent (SD) method and, in general, geodesic flow, Newton's method or conjugate gradients. This problem has been extensively studied in [12,22,30].

5. Experimental Results

To illustrate the presented optimization methods on Lie groups, we first present a rather simple simulation experiment. The purpose of this example is to show how different algorithms work on optimization problems with a unitarity constraint. To this end, let us consider the Lie group U(1) of complex numbers with unit modulus, which is isomorphic to the group SO(2). The unitarity constraint restricts the elements of this group to the unit circle in the complex plane. The cost function we will minimize is J(z) = |z + 0.3|^2 with the constraint zz* = 1. For optimization, we will use five types of steepest descent (SD) algorithms:
(1) unconstrained SD algorithm on the Euclidean space,
(2) SD algorithm on the Euclidean space with constraint restoration,
(3) SD algorithm on the Euclidean space with a penalty function,
(4) non-geodesic SD algorithm on the Riemannian space,
(5) geodesic SD algorithm on the Riemannian space.
In Algorithm (1), the update rule has the form z_{k+1} = z_k − μ(z_k + 0.3), where μ is the step size. The quantity ∂J/∂z* = (z_k + 0.3) is the gradient of the cost function J on the Euclidean space. (The gradient of a function defined on the complex space has the form [31] ∂J/∂z* = ½(∂J/∂Re(z) + i ∂J/∂Im(z)), where Re(z) and Im(z) are, respectively, the real and imaginary parts of the complex number z.) Algorithm (2) uses the same update rule, but after each iteration the unitarity condition is restored in the form z_{k+1} ← z_{k+1}/|z_{k+1}|. In Algorithm (3) we used the Lagrange multiplier method. A penalty function of the form (|z_k|^2 − 1)^2, weighted by a Lagrange parameter λ, has been added to the initial cost function in order to penalize deviations from unitarity. In this case, the update rule is z_{k+1} = z_k − μ[(z_k + 0.3) + λ z_k(|z_k|^2 − 1)^2]. In the case of (4), the algorithm works on the Riemannian space (the unit circle) determined by the condition zz* = 1. At each point z_k the algorithm determines the search direction tangent to the unit circle, and after each iteration the obtained point is projected back onto the unit circle. In this case, the update rule has the form z_{k+1} = π(z_k − μ(∂J/∂z* − ⟨∂J/∂z*, z⟩z)) = π(z_k − μ[z_k(1 − |z_k|^2) − 0.3(z_k^2 − 1)]), where π is the projection operator onto the unit circle. In Algorithm (5) we used the multiplicative optimization algorithm on the Lie group described in Section 4. In this case, the update rule has the form z_{k+1} = exp(0.6 μ i Im(z_k)) z_k, where Im(z_k) is the imaginary part of z_k. The starting point of each algorithm is z_0 = exp(iπ/4). Figure 3 shows the results of the simulation.
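For concreteness, a small self-contained sketch of the geodesic Algorithm (5) is given below; the step size and iteration count are our own choices, not values reported in the paper.

```python
import numpy as np

def geodesic_sd_u1(mu=0.5, n_iter=200):
    """Geodesic steepest descent on U(1) for J(z) = |z + 0.3|^2 with |z| = 1.

    The multiplicative update z_{k+1} = exp(0.6 * mu * 1j * Im(z_k)) * z_k is a
    pure phase rotation, so the unitarity constraint holds exactly at every step.
    """
    z = np.exp(1j * np.pi / 4)                   # starting point z_0
    for _ in range(n_iter):
        z = np.exp(0.6 * mu * 1j * z.imag) * z   # rotation along the geodesic
    return z

z_final = geodesic_sd_u1()
print(z_final)                     # approaches -1, the constrained minimizer
print(abs(z_final + 0.3) ** 2)     # approaches J(z_min) = 0.49
```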
At the point z^(1) = −0.3, the cost function reaches its minimum J(z^(1)) = 0. However, this is the undesirable minimum determined in the Euclidean space by Algorithm (1), which does not take the unitarity constraint into account (Figure 3). The minimum respecting this constraint is at the point z_min = −1, where the cost function reaches its minimum over the Riemannian space, i.e., on the unit circle, J(z_min) = 0.49. The unconstrained SD Algorithm (1) and the penalty-function Algorithm (3) reach undesirable minima at the points z^(1) = −0.3 and z^(3) = −0.5, respectively, while Algorithms (2), (4) and (5) reach the appropriate minimum z_min = −1. In the case of Algorithms (2) and (4), a characteristic “zig-zag” motion is associated with leaving the constraint surface and is undesirable from the point of view of the optimization properties. Algorithm (4) determines the SD direction tangent to the constraint surface, thus leaving the unit circle during the optimization motion; the resulting point is then projected back onto the unit circle. Algorithm (5), using the multiplicative update rule on the Lie group (phase rotation) described in this article, naturally ensures the unitarity condition at each step. The optimization movement takes place at each step along the geodesic line. This simple one-dimensional example is only intended to present the idea of algorithms on Lie groups. The following is an example of using these methods on a real signal. As an example of the practical application of optimization methods on Lie groups, we present a solution to the ICA problem. As the source signals, three speech recordings and a quasi-noisy signal (a harmonic signal with high noise content) with a length of 5000 samples (1.25 s) were used (Figure 4). The source signals were mixed using a four-by-four random mixing matrix. The four observed signals are shown in Figure 4. An SO(4) group optimization algorithm was used to implement ICA.
For comparison, the INFOMAX algorithm in its original form was also used [32]. Based on visual inspection and listening to the separated components, it can be concluded that the ICA results obtained using the INFOMAX algorithm and the optimization on the SO(4) group are both good, up to scale and permutation. The INFOMAX algorithm with the assumed convergence criterion converges after about 30–40 steps, while the algorithm on the SO(4) group converges after about 20 steps. Figure 5 shows the sum of entropy values of the separated components as a function of the iteration number.
The optimization algorithm on the SO(4) group converges to E(Y) = 0.11, while the INFOMAX algorithm converges to E(Y) = 0.084. (The entropy value was determined according to the approximate relationship [32]: E(Y) = Σ_i E{ln tanh(y_i)} + log(det(W)).) Listening to the results and comparing them with the sources confirms the better ICA separation results obtained by the SO(4) group optimization algorithm.

6. Conclusions

This paper described the application of Lie group methods to blind signal processing, including ICA and ISA. The theoretical fundamentals of Lie groups and Lie algebras, the geometry of problems occurring in BSP, and basic optimization techniques based on the use of Lie groups were presented. Owing to the specific geometry and algebraic properties of BSP problems, it is possible to use Lie group methods to solve them. The homogeneity of the search (parameter) space in BSP problems enables the use of optimization techniques based on Lie group methods for the groups O(n) and SO(n). It has been demonstrated that the one-parameter subalgebra so_Ω(n) ensures the convenient property of commuting search directions. In addition, the representation of an antisymmetric matrix (search direction) in the Jordan canonical form establishes the toral subalgebra t(p) ⊂ so(n), which, in terms of optimization algorithms, ensures low computational complexity and high process dynamics.

Author Contributions

Conceptualization, J.J., D.M.; Methodology, D.M.; Software, D.M., Validation, J.J., Preparation, D.M.; Writing-Review and Editing, J.J., Visualization, D.M.; Supervision, J.J.; Funding Acquisition, J.J.; Formal Analysis, J.J.; Investigation, J.J.; Resources, D.M. All authors provided critical feedback and collaborated in the research. All authors have read and agreed to the published version of the manuscript.

Funding

The project/research was financed in the framework of the project Lublin University of Technology—Regional Excellence Initiative, funded by the Polish Ministry of Science and Higher Education (contract no. 030/RID/2018/19).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Lie Group and Lie Algebra

A characteristic of Lie group methods is that the optimization motion always leaves the orthogonality condition intact (the system constantly remains on the constraint surface). Before providing a formal definition of Lie groups, we will give an example to illustrate this concept. Let us consider the set of complex numbers of modulus 1 in the polar r, φ notation: G = { z ∈ C | |z| = |e^{iφ}| = 1, φ ∈ R }. If one of these numbers, a = e^{iφ}, is multiplied by a number b = e^{iω}, we get ab = e^{i(φ+ω)}, which is also a number of modulus 1, i.e., it belongs to the set G (Figure A1). On the other hand, if we want to reach any number c = e^{iτ} of modulus 1 starting from the number a = e^{iφ}, it is enough to multiply a by b = e^{iω} such that ω = τ − φ.
Figure A1. Complex value (a) a in r, φ notation, and (b) the product of a and b.
Thus, it can be observed that the set G of complex numbers of modulus 1, combined with the multiplication operation, forms a group. As a reminder, a group G is formed by a set of elements and a group operation (·) (colloquially known as multiplication) with the following properties:
  • Closure under the group operation: if a, b ∈ G then a·b ∈ G
  • Associativity: a·(b·c) = (a·b)·c
  • There exist a neutral element I and an inverse element z^{-1} ∈ G for every element of the group, such that I·z = z·I = z and z·z^{-1} = z^{-1}·z = I
It can easily be verified that the set G satisfies these conditions with the group operation in the form of multiplication of complex numbers, the neutral element I = 1 = e^{i0} and the inverse z^{-1} = e^{-iφ} of any element z = e^{iφ}. Moreover, since the multiplication operation in the set G is commutative, this set forms a commutative group, also known as an abelian group. In addition, the group G is smooth, i.e., differentiable. We can perform an infinitesimally small movement in the group by going from one number to another, the two numbers differing from each other by an infinitesimally small value. In fact, the group G locally “looks” like a section of the real line R. This means that the motion in the group G can be described with one parameter, e.g., the angle φ, and thus this group with the coordinate φ forms a differential manifold. A group which is a smooth differential manifold is called a Lie group. The Lie group combines the algebraic properties of a group with the properties of a smooth differential manifold. As a result, we can, for example, consider the tangent space T_xG of the Lie group G at a point x. In the special case when x = I, the tangent space at the identity element of the group, T_IG, additionally provided with an elementary operation known as the Lie bracket, is called the Lie algebra g. The group operation (multiplication) by a constant element Q of the group creates a differentiable mapping of the group G to itself, i.e., a diffeomorphism, known as left translation (A1):
L_Q : G → G : W ↦ QW;  Q, W ∈ G
and right translation (A2):
R_Q : G → G : W ↦ WQ;  Q, W ∈ G
and tangential mappings induced by them (A3) and (A4):
dL_Q : T_W G → T_{QW} G : Z ↦ QZ;  Z ∈ T_W G, Q ∈ G
dR_Q : T_W G → T_{WQ} G : Z ↦ ZQ;  Z ∈ T_W G, Q ∈ G
A characteristic of the Lie group is that each of its elements W can be “moved” by left or right translation to a “convenient” neighborhood of the identity element, and the same applies to every tangent space T W G :
L_{W^{-1}}(W) = W^{-1}W = I,   R_{W^{-1}}(W) = W W^{-1} = I
dL_{W^{-1}}(T_W G) = { W^{-1}Z = X ∈ T_I G = g },   dR_{W^{-1}}(T_W G) = { Z W^{-1} = X ∈ T_I G = g }
Equation (A6) can also be written in a reversed form (A7):
T_W G = dL_W(T_I G) = { WX ; X ∈ T_I G = g },   T_W G = dR_W(T_I G) = { XW ; X ∈ T_I G = g }
From the above it can be seen that the tangent space T_W G, via the tangent mapping dL_{W^{-1}} or dR_{W^{-1}}, is transferred to the Lie algebra g.
From the above one can draw an important conclusion. If the structure of the Lie algebra g is known, it can be used to conveniently parameterize the neighbourhood of the identity element of the group G via the application of a suitable homeomorphism. Such a homeomorphism is widely known as the exponential map or, in short, exponentiation, and is denoted as exp : g → G. It should be noted that “exp” here is only a symbolic denotation and, even in the case of matrix Lie groups, it does not necessarily mean the matrix exponentiation operation.
The set of 2-dimensional orthogonal matrices (W^T W = I_2) forms a Lie group, just as the set of complex numbers of modulus 1 does. It is also an abelian (commutative) group and has the same algebraic properties as the set of complex numbers of modulus 1. In general, square n-dimensional orthogonal matrices forming the group O(n) can be presented in a block-diagonal form with 2 × 2 orthogonal matrices as diagonal elements. This form is known as the Jordan canonical form of an orthogonal matrix [23]. Understanding the “behaviour” of the Lie group of complex numbers of modulus 1, and thus of 2 × 2 orthogonal matrices, is therefore crucial for the analysis of the general case. The group O(n) is a Lie group composed of two disjoint parts (subsets): the orthogonal matrices with determinant 1 and the matrices with determinant −1. For example, for the group O(2), these two components can be written in the form (A8):
W = ( cos φ  −sin φ ; sin φ  cos φ ),  det W = 1      W = ( cos φ  sin φ ; sin φ  −cos φ ),  det W = −1
for any real number φ, which can be identified with the angle of rotation and treated as a coordinate. As one can notice, both in the first and in the second part, the transition from one element to another takes place (smoothly) by changing the parameter φ. The transition between the matrices of the two parts is only possible via multiplication by a matrix belonging to the other part, which is tantamount to a permutation of the matrix columns. The transposition of the matrix columns or rows in either part leads to a change in the sign of the determinant and thus acts as a transition between the parts. However, a smooth transition between the parts is not possible. Hence, these components (parts) are called disjoint. The subset of matrices with determinant 1 is called the special orthogonal group SO(n). It is a subgroup of O(n) and constitutes a connected Lie group (it is possible to move between elements of the subgroup smoothly by performing multiplication in the subgroup).
In ICA problems, due to the permutation ambiguity (Section 2), optima of the cost function must be present in both components of O(n). We will therefore limit our considerations to the (connected) subgroup SO(n), which will not diminish the generality of the considerations. The multiplication of matrices belonging to SO(2) has the form (A9):
W V = ( cos φ  −sin φ ; sin φ  cos φ ) ( cos θ  −sin θ ; sin θ  cos θ ) = ( cos(φ + θ)  −sin(φ + θ) ; sin(φ + θ)  cos(φ + θ) )
As one can observe, this amounts to adding the angles φ and θ. The above also demonstrates that the multiplication operation in the group SO(2) is commutative. The group SO(2) is also known as the rotation group because its action on any vector can be associated with a rotation in the plane R^2. Moreover, the group SO(2), like the group of complex numbers of modulus 1, can be parameterized by means of one parameter, the angle 0 ≤ φ < 2π. Groups that “operate” in the same way (have the same algebraic properties) are called isomorphic. From the point of view of optimization techniques, the key question is how to determine a derivative in the Lie group G. Building on the previously considered Lie group of complex numbers of modulus 1, let us consider the curve z(t) = e^{iφ(t)} in the group G parameterized by t (the parameter t can be identified with time). Assuming that φ(t) = ωt, the derivative of the curve z(t) with respect to t has the form (A10):
dz/dt = d/dt ( e^{iφ(t)} ) = i (dφ/dt) e^{iφ(t)} = iω e^{iφ(t)} = iωz
As one can observe, the derivative of z(t) is proportional to iz = e^{iπ/2}z = e^{i(φ(t)+π/2)}, which means that it is perpendicular to z. This result is analogous to the relationship between the velocity vector and the radius in circular motion. Assuming that ω is an angular velocity and z is a radius, circular motion can be described in an identical manner. The velocity vector of length ω|z| is tangent to the trajectory, i.e., it is perpendicular to the radius of the circle. A similar relationship can be obtained by differentiating in the group SO(2). Likewise, assuming that φ = ωt and differentiating W = ( cos φ  −sin φ ; sin φ  cos φ ) with respect to t, we get (A11):
dW/dt = d/dt ( cos φ  −sin φ ; sin φ  cos φ ) = ω ( −sin φ  −cos φ ; cos φ  −sin φ ) = ( 0  −ω ; ω  0 ) W = XW
One can notice that, similarly to the group of complex numbers of modulus 1, the derivative in the group SO(2) is proportional to the element of the group at which the derivative is determined, dW/dt ~ W. It is easy to check that this derivative is tangent at the point W to the constraint surface: tr((dW/dt)^T W) = tr X = 0. This pattern also holds in the general case of SO(n). Differential equations with the structure dA/dt = B(t)A(t), where A(t) ∈ G belongs to the Lie group and B(t) ∈ g belongs to the Lie algebra, are known as differential equations on Lie groups. As with the solution of the scalar differential equation dz/dt = az, z(t) = e^{at}, the solution of the differential Equation (A11) is the matrix W(t) = exp(tX), where exp(·) is the matrix exponentiation operation given by the general formula (Maclaurin series) (A12):
exp(tX) = I + tX + t^2X^2/2! + ⋯ + t^kX^k/k! + ⋯
It can easily be verified that W ( t ) satisfies Equation (A11):
dW/dt = d/dt exp(tX) = X exp(tX) = XW
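A short numerical check of the series (A12) is sketched below; the truncation length is our own choice, and scipy.linalg.expm is used only as an independent reference.

```python
import numpy as np
from scipy.linalg import expm

def expm_series(X, t=1.0, terms=30):
    """Matrix exponential exp(tX) via the truncated Maclaurin series (A12)."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ (t * X) / k          # accumulates t^k X^k / k!
        result = result + term
    return result

omega = 0.7
X = np.array([[0.0, -omega], [omega, 0.0]])    # element of the Lie algebra so(2)
W = expm_series(X)
print(np.allclose(W, expm(X)))                 # True: matches the reference
print(np.allclose(W.T @ W, np.eye(2)))         # True: exp(X) lies in SO(2)
```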
The space tangent to SO(2) at the point W is given by the matrices XW, where X = −X^T = ( 0  −ω ; ω  0 ) is an antisymmetric matrix. As one can observe, given any antisymmetric matrix X ≠ 0, every element of the group SO(2) can be specified by defining a single parameter t as (A13):
W(t) = exp(tX) = exp( 0  −tω ; tω  0 ) = exp(Ω)
where Ω = −Ω^T = ( 0  −tω ; tω  0 ) is an antisymmetric matrix.
From the point of view of the optimization process, this is a very convenient property. Instead of searching in the space of orthogonal matrices, it is enough to search in the space of the angles φ or, alternatively, in the space of antisymmetric matrices Ω. The space tangent to SO(n) at the identity (neutral) element I_n is therefore determined by the set of antisymmetric matrices Ω = −Ω^T. According to the previous general considerations, this space is called the Lie algebra so(n) of the Lie group SO(n). The homeomorphism is determined by the matrix exponential operation, Ω ∈ so(n) ↦ exp(Ω) ∈ SO(n) (or exp : so(n) → SO(n)). As mentioned above, from the point of view of the optimization process, it is convenient to navigate in the Lie algebra because it is a vector space, so the addition of elements and multiplication by scalars are allowed, and the result of such operations still belongs to the Lie algebra. The idea presented above for the group SO(2) generalizes to SO(n). However, for n ≥ 3, matrix multiplication is not commutative: exp(A)exp(B) ≠ exp(B)exp(A). From the general dependence (A14):
exp(εA) exp(εB) − exp(εB) exp(εA) = ε^2[A, B] + O(ε^3)
where [·,·] is the matrix commutator or Lie bracket, defined as [A, B] = AB − BA, O(ε^3) denotes terms of order ε^3 and ε ≪ 1 is a scalar, it follows that the “non-commutativity” of matrix multiplication in the group SO(n) is expressed by the commutator [·,·] of the matrices belonging to the Lie algebra so(n). The commutator of antisymmetric matrices is also an antisymmetric matrix: [A, B]^T = (AB − BA)^T = B^T A^T − A^T B^T = BA − AB = −[A, B], which means that [A, B] ∈ so(n). The Lie algebra of antisymmetric matrices is therefore closed under addition, multiplication by scalars and the Lie bracket. In addition to this, the Lie bracket has the following properties (A15)–(A17):
[A, A] = 0
[A + B, C] = [A, C] + [B, C]
[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0
The first two properties result directly from the definition of the commutator, while the third property is called the Jacobi identity and is widely known in differential geometry. The multiplication of 2 × 2 antisymmetric matrices A, B ∈ so(2) is commutative, therefore [A, B] = 0, and hence SO(2) is an abelian group. Elements of the group SO(3) can be identified with rotations about the origin of the coordinate system of the space R^3. As we know, such rotations are not commutative. Therefore, the group SO(3) is non-abelian (non-commutative).
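A quick numerical verification of these closure and bracket properties is sketched below; the random test matrices and the helper names are our own.

```python
import numpy as np

def lie_bracket(A, B):
    """Matrix commutator [A, B] = AB - BA, the Lie bracket on so(n)."""
    return A @ B - B @ A

def random_skew(n, rng):
    """Random antisymmetric matrix, i.e., an element of so(n)."""
    M = rng.standard_normal((n, n))
    return M - M.T

rng = np.random.default_rng(0)
A, B, C = (random_skew(3, rng) for _ in range(3))

# closure: the bracket of antisymmetric matrices is antisymmetric, [A, B] in so(3)
print(np.allclose(lie_bracket(A, B), -lie_bracket(A, B).T))
# bilinearity in the first argument, property (A16)
print(np.allclose(lie_bracket(A + B, C), lie_bracket(A, C) + lie_bracket(B, C)))
# Jacobi identity, property (A17)
J = (lie_bracket(A, lie_bracket(B, C))
     + lie_bracket(B, lie_bracket(C, A))
     + lie_bracket(C, lie_bracket(A, B)))
print(np.allclose(J, np.zeros((3, 3))))   # all three checks print True
```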

References

  1. Fiori, S. Quasi-geodesic neural learning algorithms over the orthogonal group: A tutorial. J. Mach. Learn. Res. 2005, 6, 743–781. [Google Scholar]
  2. Georgiev, P.; Theis, F.; Cichocki, A.; Bakardjian, H. Sparse component analysis: A new tool for data mining. In Data Mining in Biomedicine; Springer: Boston, MA, USA, 2007; pp. 91–116. [Google Scholar]
  3. Hao, Y.; Song, L.; Cui, L.; Wang, H. A three-dimensional geometric features-based SCA algorithm for compound faults diagnosis. Measurement 2019, 134, 480–491. [Google Scholar] [CrossRef]
  4. Hao, Y.; Song, L.; Cui, L.; Wang, H. Underdetermined source separation of bearing faults based on optimized intrinsic characteristic-scale decomposition and local non-negative matrix factorization. IEEE Access 2019, 7, 11427–11435. [Google Scholar] [CrossRef]
  5. Kaselimi, M.; Doulamis, N.; Doulamis, A.; Voulodimos, A.; Protopapadakis, E. Bayesian-optimized Bidirectional LSTM Regression Model for Non-Intrusive Load Monitoring. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2747–2751. [Google Scholar]
  6. Altrabalsi, H.; Stankovic, V.; Liao, J.; Stankovic, L. Low-complexity energy disaggregation using appliance load modelling. Aims Energy 2016, 4, 884–905. [Google Scholar] [CrossRef]
  7. Smith, S.T. Optimization techniques on Riemannian manifolds. Fields Inst. Commun. 1994, 3, 113–135. [Google Scholar]
  8. Plumbley, M.D. Geometrical methods for non-negative ICA: Manifolds, Lie groups and toral subalgebras. Neurocomputing 2005, 67, 161–197. [Google Scholar] [CrossRef]
  9. Plumbley, M.D. Lie group methods for optimization with orthogonality constraints. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, Granada, Spain, 22–24 September 2004; pp. 1245–1252. [Google Scholar]
  10. Plumbley, M.D. Algorithms for nonnegative independent component analysis. IEEE Trans. Neural Netw. 2003, 14, 534–543. [Google Scholar] [CrossRef] [PubMed]
  11. Plumbley, M.D. Optimization using Fourier expansion over a geodesic for non-negative ICA. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, Granada, Spain, 22–24 September 2004; pp. 49–56. [Google Scholar]
  12. Edelman, A.; Arias, T.A.; Smith, S.T. The geometry of algorithms with orthogonality constraints. Siam J. Matrix Anal. Appl. 1998, 20, 303–353. [Google Scholar] [CrossRef]
  13. Birtea, P.; Casu, I.; Comanescu, D. Steepest descent algorithm on orthogonal Stiefel manifolds. arXiv 2017, arXiv:1709.06295. [Google Scholar]
  14. Mika, D.; Kleczkowski, P. ICA-based single channel audio separation: New bases and measures of distance. Arch. Acoust. 2011, 36, 311–331. [Google Scholar] [CrossRef] [Green Version]
  15. Mika, D.; Kleczkowski, P. Automatic clustering of components for single channel ICA-based signal demixing. In Proceedings of the INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Lisbon, Portugal, 13–16 June 2010; pp. 5350–5359. [Google Scholar]
  16. Hyvärinen, A.; Hoyer, P. Emergence of phase-and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput. 2000, 12, 1705–1720. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Selvana, S.E.; Amatob, U.; Qic, C.; Gallivanc, K.A.; Carforab, M.F.; Larobinad, M.; Alfanod, B. Unconstrained Optimizers for ICA Learning on Oblique Manifold Using Parzen Density Estimation; Tech. Rep. FSU11-05; Florida State University Department of Mathematics: Tallahassee, FL, USA, 2011. [Google Scholar]
  18. Hyvarinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; John Wiley & Sons: New York, NY, USA, 2001. [Google Scholar]
  19. Absil, P.-A.; Gallivan, K.A. Joint Diagonalization on the Oblique Manifold for Independent Component Analysis. In Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, France, 14–19 May 2006. [Google Scholar]
  20. Selvan, S.E.; Amato, U.; Gallivan, K.A.; Qi, C.; Carfora, M.F.; Larobina, M.; Alfano, B. Descent algorithms on oblique manifold for source-adaptive ICA contrast. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1930–1947. [Google Scholar] [CrossRef] [PubMed]
  21. Nishimori, Y.; Akaho, S. Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 2005, 67, 106–135. [Google Scholar] [CrossRef]
  22. Absil, P.-A.; Mahony, R.; Sepulchre, R. Optimization Algorithms on Matrix Manifolds; Princeton University Press: Princeton, NJ, USA, 2009. [Google Scholar]
  23. Absil, P.-A.; Mahony, R.; Sepulchre, R. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Appl. Math. 2004, 80, 199–220. [Google Scholar] [CrossRef]
  24. Comon, P.; Golub, G.H. Tracking a few extreme singular values and vectors in signal processing. Proc. IEEE 1990, 78, 1327–1343. [Google Scholar] [CrossRef]
  25. Demmel, J.W. Three methods for refining estimates of invariant subspaces. Computing 1987, 38, 43–57. [Google Scholar] [CrossRef]
  26. Nishimori, Y.; Akaho, S.; Abdallah, S.; Plumbley, M.D. Flag manifolds for subspace ICA problems. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing—ICASSP’07, Honolulu, HI, USA, 15–20 April 2007; pp. 1417–1420. [Google Scholar]
  27. Nishimori, Y.; Akaho, S.; Plumbley, M.D. Riemannian optimization method on the flag manifold for independent subspace analysis. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, Charleston, SC, USA, 5–8 March 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 295–302. [Google Scholar]
  28. Nishimori, Y.; Akaho, S.; Plumbley, M.D. Natural conjugate gradient on complex flag manifolds for complex independent subspace analysis. In Proceedings of the International Conference on Artificial Neural Networks, Prague, Czech Republic, 3–6 September 2008; pp. 165–174. [Google Scholar]
  29. Gallier, J.; Xu, D. Computing exponentials of skew-symmetric matrices and logarithms of orthogonal matrices. Int. J. Robot. Autom. 2003, 18, 10–20. [Google Scholar]
  30. Pecora, A.; Maiolo, L.; Minotti, A.; De Francesco, R.; De Francesco, E.; Leccese, F.; Cagnetti, M.; Ferrone, A. Strain gauge sensors based on thermoplastic nanocomposite for monitoring inflatable structures. In Proceedings of the 2014 IEEE Metrology for Aerospace (MetroAeroSpace), Benevento, Italy, 29–30 May 2014; pp. 84–88. [Google Scholar]
  31. Krantz, S.G. Function Theory of Several Complex Variables; American Mathematical Soc.: Providence, RI, USA, 2011. [Google Scholar]
  32. Cichocki, A.; Amari, S. Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications; John Wiley & Sons: New York, NY, USA, 2002. [Google Scholar]
Figure 1. Schematic block scheme of Independent Component Analysis.
Figure 2. Visual representation of the toral subalgebra t(p) for p = 2. The angles φ_1, φ_2 and the matrices W_1 and W_2 are as in Equation (23). The broken line marks the search curve for the case k_1/k_2 = 3.
Figure 3. Comparison of SD algorithms for minimizing the cost function on the group U(1): methods in the Euclidean space versus the Riemannian space (Lie group methods). * unconstrained SD algorithm on the Euclidean space (1); SD algorithm on the Euclidean space with constraint restoration (2); + SD algorithm on the Euclidean space with penalty function (3); o non-geodesic SD algorithm on the Riemannian space (4); geodesic SD algorithm on the Riemannian space (5).
Figure 4. Comparison of ICA results using the INFOMAX algorithm and optimization on the SO(4) group, (a) source signals, (b) observed signals (mixed), (c) ICA results for the INFOMAX algorithm, (d) ICA results for the algorithm on the group SO(4).
Figure 5. Comparison of the entropy sum value of the received components, (a) INFOMAX algorithm, (b) optimization algorithm on the group SO(4).
