There exists a large variety of numerical methods to solve initial value problems. In order to limit the computational burden of the learning rules considered in this paper, we considered the Euler method, the Heun method, and the Runge method (also known as the second-order Runge–Kutta method). These methods were developed to solve initial value problems over the Euclidean space ${\mathbb{R}}^{m}$, although they can be adapted to solve initial value problems over smooth manifolds or Lie groups, such as the group $\mathrm{SO}\left(m\right)$.
2.1. An HM-Type Neural SVD Learning Algorithm
Denoting as $Z\in {\mathbb{R}}^{m\times p}$ a matrix whose SVD is to be computed and as $r\le min\{m,p\}$ the rank of Z, its singular value decomposition may be written as $Z=UD{V}^{\top}$, where $U\in {\mathbb{R}}^{m\times m}$ and $V\in {\mathbb{R}}^{p\times p}$ are (special) orthogonal matrices and D is a pseudo-diagonal matrix whose entries are all zero except for the first r diagonal entries, termed singular values. It is easily checked that the columns of U coincide with the eigenvectors of the product $Z{Z}^{\top}$, while the columns of V coincide with the eigenvectors of the product ${Z}^{\top}Z$; the two products share the same nonzero eigenvalues, namely the squared singular values.
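As a quick numerical sanity check of these relations, the following NumPy sketch (illustrative only, not part of the learning algorithm) verifies that the columns of U and V are eigenvectors of $Z{Z}^{\top}$ and ${Z}^{\top}Z$, with the squared singular values as shared nonzero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 5, 3
Z = rng.standard_normal((m, p))

# Full SVD: Z = U @ D @ V.T with U (m x m) and V (p x p) orthogonal.
U, s, Vt = np.linalg.svd(Z, full_matrices=True)
D = np.zeros((m, p))
D[:p, :p] = np.diag(s)
assert np.allclose(Z, U @ D @ Vt)

# The first p columns of U are eigenvectors of Z @ Z.T and the columns
# of V are eigenvectors of Z.T @ Z, with the squared singular values
# as the shared nonzero eigenvalues.
assert np.allclose((Z @ Z.T) @ U[:, :p], U[:, :p] * s**2)
assert np.allclose((Z.T @ Z) @ Vt.T, Vt.T * s**2)
```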
The Helmke–Moore algorithm is utilized for training, in an unsupervised way, an artificial neural network to learn an SVD of a given rectangular matrix. The HM dynamics arises from the maximization of a specific criterion
${\varphi}_{W}:\mathrm{SO}\left(m\right)\times \mathrm{SO}\left(m\right)\to \mathbb{R}$ defined as:
where
$W\in {\mathbb{R}}^{p\times m}$ is a weighting kernel and we assumed that
$m\ge p$. The dynamical system, derived as a Riemannian gradient flow on
$\mathrm{SO}\left(m\right)\times \mathrm{SO}\left(m\right)$, reads:
where an over-dot denotes derivation with respect to the time parameter. The weighting matrix
W has the structure
$\left[{W}_{1}\phantom{\rule{4pt}{0ex}}{0}_{p,m-p}\right]$, where
${W}_{1}\in {\mathbb{R}}^{p\times p}$ must be diagonal with unequal entries on the main diagonal [
45].
Since the matrix A has size $m\times m$, the matrix Z has size $m\times p$, and the matrix B has size $p\times p$, the product matrix H has size $m\times p$.
Whereas the continuous-time versions of the learning algorithms leave the orthogonal group invariant, this is not true for their discrete-time counterparts, which are obtained by employing a numerical integration scheme, unless a suitable integration method is put into effect. In the present case, we may employ a convenient Lie integration method drawn from a manifold-calculus-based integration theory (see, e.g., the contribution [
46] and previous reviews in [
47,
48,
49]).
2.2. Euler Method in ${\mathbb{R}}^{m}$ and Its Extension to $\mathrm{SO}\left(m\right)$
Consider the following initial value problem:
$\dot{y}\left(t\right)=f\left(t,y\left(t\right)\right)$ (3)
with
$f:\mathbb{R}\times {\mathbb{R}}^{m}\to {\mathbb{R}}^{m}$ being a regular function and with the initial condition
$y\left(0\right)={y}_{0}\in {\mathbb{R}}^{m}$. Whenever a closed-form solution to the IVP (
3) is out of reach, the simplest scheme to approximate its solution numerically is the forward Euler method, described by the iterative rule:
${y}_{n+1}={y}_{n}+h\,f\left({t}_{n},{y}_{n}\right)$ (4)
with
${y}_{0}$ known from the initial condition. As a reference for this method and those invoked in the remainder of the paper, readers might refer to [
50]. The constant
$h>0$ denotes an integration stepsize and represents the time lapse between each time-node
${t}_{n}=nh$ and the next. The key idea behind the above forward Euler method is to estimate the value
${y}_{n+1}$ by the slope of the vector field
$f(t,y)$ at the present node through a linear interpolation across the time lapse
h. This Euler method is asymmetric: it makes use only of the value of the vector field
f computed in the leftmost point of the interval
$[{t}_{n},\phantom{\rule{4pt}{0ex}}{t}_{n+1}]$. The right-hand side of the expression (
4) coincides with the zeroth and the first terms of the Taylor series expansion of the function
f. The residue owing to truncation (also termed “truncation error”) is of type
$\mathcal{O}\left({h}^{2}\right)$; therefore, this method is of order one.
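As an illustration, the forward Euler rule may be sketched in Python as follows, applied to the scalar test problem $\dot{y}=-y$, $y(0)=1$, whose exact solution is ${e}^{-t}$ (the function name is an illustrative choice):

```python
import numpy as np

def euler(f, y0, h, n_steps):
    """Forward Euler: y_{n+1} = y_n + h * f(t_n, y_n)."""
    y = np.asarray(y0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        y = y + h * f(t, y)  # linear interpolation along the left slope
        t += h
    return y

# Test problem dy/dt = -y, y(0) = 1, integrated up to t = 1.
y_end = euler(lambda t, y: -y, 1.0, 0.01, 100)
# First-order accuracy: the error shrinks roughly linearly with h.
```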
A learning problem on the special orthogonal group is described by the IVP:
with the initial condition
$Y\left(0\right)={Y}_{0}\in \mathrm{SO}\left(m\right)$.
Let us recall that a smooth manifold $\mathbb{M}$ may be endowed with a manifold exponential map and with a parallel transport operator:
Manifold exponential: The exponential map derived from the geodesic associated with the connection is denoted by $exp:T\mathbb{M}\to \mathbb{M}$. Given $x\in \mathbb{M}$ and $v\in {T}_{x}\mathbb{M}$, the exponential of v at a point x is denoted as ${exp}_{x}\left(v\right)$. On a Euclidean manifold $\mathbb{M}={\mathbb{R}}^{m}$, ${exp}_{x}\left(v\right)=x+v$; therefore, an exponential map may be thought of as a generalization of “vector addition” to a curved space.
Parallel transport along geodesics: The parallel transport map is denoted by $\mathrm{P}:\mathbb{M}\times T\mathbb{M}\to T\mathbb{M}$. Given two points $x,y\in \mathbb{M}$ and a vector $v\in {T}_{x}\mathbb{M}$, the parallel transport of v from x to y, denoted by ${\mathrm{P}}^{x\to y}\left(v\right)$ with ${\mathrm{P}}^{x\to y}:{T}_{x}\mathbb{M}\to {T}_{y}\mathbb{M}$, moves the tangent vector between the two points along their connecting geodesic arc. Parallel transport changes neither the length of a transported vector nor the angle between transported vectors: given $x,y\in \mathbb{M}$ and $a,v\in {T}_{x}\mathbb{M}$, it holds that ${\langle {\mathrm{P}}^{x\to y}\left(a\right),{\mathrm{P}}^{x\to y}\left(v\right)\rangle}_{y}={\langle a,v\rangle}_{x}$, namely parallel transport is an isometry (and hence, in particular, a conformal map). On a Euclidean manifold $\mathbb{M}={\mathbb{R}}^{m}$, ${\mathrm{P}}^{x\to y}\left(v\right)=v$; therefore, parallel transport may be thought of as a generalization of the familiar geometric notion of the “rigid translation” of vectors to a curved space.
In the case of a special orthogonal group endowed with its canonical metric (inherited by the Euclidean metric), these maps read:
where
$X,Y\in \mathrm{SO}\left(m\right)$,
$V\in {T}_{X}\mathrm{SO}\left(m\right)$, “Exp” denotes a matrix exponential (implemented, for example, by the function
expm in MATLAB
${}^{\circledR}$),
$\sqrt{\cdot}$ denotes a matrix square-root (implemented, for example, by the function
sqrtm in MATLAB
${}^{\circledR}$). Let us recall that a matrix
$V\in {T}_{X}\mathrm{SO}\left(m\right)$ may be rewritten as
$V=X\mathsf{\Omega}$, with
$\mathsf{\Omega}$ skew-symmetric and, therefore, that the following simplified expression for the exponential map may be used:
${exp}_{X}\left(X\mathsf{\Omega}\right)=X\,\mathrm{Exp}\left(\mathsf{\Omega}\right)$ (7)
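A minimal Python sketch of this simplified exponential map, assuming SciPy’s expm plays the role of the matrix exponential “Exp”, illustrates that the update never leaves $\mathrm{SO}\left(m\right)$:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential, "Exp" in the text

rng = np.random.default_rng(1)
m = 4

# Build a point X on SO(m) via a QR factorization, enforcing det(X) = +1.
X, _ = np.linalg.qr(rng.standard_normal((m, m)))
if np.linalg.det(X) < 0:
    X[:, 0] = -X[:, 0]

# A tangent vector at X: V = X @ Omega with Omega skew-symmetric.
S = rng.standard_normal((m, m))
Omega = S - S.T

# Simplified exponential map: exp_X(X @ Omega) = X @ expm(Omega).
Y = X @ expm(Omega)

# The result stays on SO(m): Y is orthogonal with unit determinant.
assert np.allclose(Y.T @ Y, np.eye(m))
assert np.isclose(np.linalg.det(Y), 1.0)
```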
A rule of thumb to extend the forward Euler method to smooth manifolds is that the sum between a variable representing a point and a vector field at that point needs to be replaced by the exponential map applied to the point and to the vector field at that point. As a result, a Euler method on a manifold
$\mathrm{SO}\left(m\right)$ may be expressed as:
${Y}_{n+1}={exp}_{{Y}_{n}}\left(h\,f\left({t}_{n},{Y}_{n}\right)\right)$ (8)
with
${Y}_{0}$ known from the initial condition. Like the original forward Euler method, the method (
8), estimates the value of the solution at the temporal node
${t}_{n+1}$, namely
${Y}_{n+1}$, as the value of the solution at the node
${t}_{n}$ interpolated by the short geodesic arc departing from the current point
${Y}_{n}$ toward the direction indicated by the vector field
$f({t}_{n},{Y}_{n})$.
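The geodesic-based Euler step may be sketched in Python as follows. This is a sketch under the assumption that the vector field has the simple form $f(t,Y)=Y\mathsf{\Omega}$ with a constant skew-symmetric $\mathsf{\Omega}$, so that the exact solution is available for comparison:

```python
import numpy as np
from scipy.linalg import expm

m = 3
# Constant skew-symmetric generator; the flow dY/dt = Y @ Omega
# has the closed-form solution Y(t) = Y0 @ expm(t * Omega).
Omega = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 0.0]])
Y = np.eye(m)  # initial condition Y0 = I
h, n_steps = 0.01, 100

# Geodesic Euler step: Y_{n+1} = exp_{Y_n}(h * f(t_n, Y_n))
#                              = Y_n @ expm(h * Omega_n),
# where Omega_n = Y_n.T @ f(t_n, Y_n); here Omega_n = Omega at every step.
for _ in range(n_steps):
    Y = Y @ expm(h * Omega)

# The iterate stays on SO(m) at every step, up to round-off.
assert np.allclose(Y.T @ Y, np.eye(m))
# For a constant generator, the scheme even reproduces the exact flow.
assert np.allclose(Y, expm(1.0 * Omega))
```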
2.3. Heun and Runge Methods in ${\mathbb{R}}^{m}$ and Their Extension to $\mathrm{SO}\left(m\right)$
The following methods are second-order numerical schemes that may be adopted to solve the Cauchy problem (
3).
Heun’s method reads:
${k}_{1,n}=f\left({t}_{n},{y}_{n}\right)$
${\tilde{y}}_{n+1}={y}_{n}+h\,{k}_{1,n}$
${k}_{2,n}=f\left({t}_{n}+h,{\tilde{y}}_{n+1}\right)$
${y}_{n+1}={y}_{n}+\frac{h}{2}\left({k}_{1,n}+{k}_{2,n}\right)$ (9)
where
$h>0$ denotes again a stepsize (or time lapse between two adjacent time nodes). It is worth summarizing an interpretation of the equations that describe the Heun method: The quantity
${k}_{1}$ represents an estimation of the value of the vector field
f at the leftmost point of the time lapse
$[{t}_{n},\phantom{\rule{4pt}{0ex}}{t}_{n}+h]$, while the quantity
${k}_{2}$ represents an estimation of the value of the vector field
f at the rightmost point of the time lapse
$[{t}_{n},\phantom{\rule{4pt}{0ex}}{t}_{n}+h]$. Both
${k}_{1}$ and
${k}_{2}$ are estimations, in that the expression for the quantity
${k}_{1,n}$ uses the value
${y}_{n}$ that is an estimation of the solution of the Cauchy problem obtained in the previous iteration, while the expression for
${k}_{2,n}$ utilizes an estimation of
${y}_{n+1}$, indicated by
${\tilde{y}}_{n+1}$, based on a linear interpolation from
${y}_{n}$ in the direction
${k}_{1,n}$. The actual estimation of the solution at
${t}_{n}+h$ is obtained as a linear interpolation in a direction computed as the arithmetic average between the directions
${k}_{1}$ and
${k}_{2}$.
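Heun’s method may be sketched in Python on the same illustrative test problem $\dot{y}=-y$, $y(0)=1$ (the function name is an illustrative choice):

```python
import numpy as np

def heun(f, y0, h, n_steps):
    """Heun's method (explicit trapezoidal rule), second-order accurate."""
    y = np.asarray(y0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        k1 = f(t, y)                  # slope at the left endpoint
        y_tilde = y + h * k1          # Euler predictor for y_{n+1}
        k2 = f(t + h, y_tilde)        # slope at the estimated right endpoint
        y = y + 0.5 * h * (k1 + k2)   # step along the averaged slope
        t += h
    return y

# Even with a 10x larger stepsize than the Euler example, the error stays small.
y_end = heun(lambda t, y: -y, 1.0, 0.1, 10)
```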
Runge’s method (often denoted as “RK2”) is simpler than Heun’s, while presenting the same order of accuracy, and is expressed as:
${k}_{1,n}=f\left({t}_{n},{y}_{n}\right)$
${\tilde{y}}_{n+1/2}={y}_{n}+\frac{h}{2}\,{k}_{1,n}$
${k}_{2,n}=f\left({t}_{n}+\frac{h}{2},{\tilde{y}}_{n+1/2}\right)$
${y}_{n+1}={y}_{n}+h\,{k}_{2,n}$ (10)
The quantity ${k}_{1}$ represents again an estimation of the value of the vector field f at the leftmost point of the time lapse $[{t}_{n},\phantom{\rule{4pt}{0ex}}{t}_{n}+h]$, while the quantity ${k}_{2}$ represents an estimate of the value of the vector field f at the midpoint of the time lapse $[{t}_{n},\phantom{\rule{4pt}{0ex}}{t}_{n}+h]$. Both ${k}_{1}$ and ${k}_{2}$ are estimations, rather than being exact evaluations. In particular, the expression for ${k}_{2,n}$ utilizes an estimation of the exact solution to the Cauchy problem, denoted by ${\tilde{y}}_{n+1/2}$, based on a linear interpolation from ${y}_{n}$ in the direction ${k}_{1,n}$ extended only to half of a whole step. The actual estimation of the solution at ${t}_{n}+h$ is obtained as a linear interpolation, from ${y}_{n}$ in a direction ${k}_{2}$ extended to a whole step.
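Likewise, Runge’s (explicit midpoint) method may be sketched in Python on the same illustrative test problem:

```python
import numpy as np

def runge_midpoint(f, y0, h, n_steps):
    """Explicit midpoint (Runge) method, second-order accurate."""
    y = np.asarray(y0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        k1 = f(t, y)                  # slope at the left endpoint
        y_half = y + 0.5 * h * k1     # half Euler step to the midpoint
        k2 = f(t + 0.5 * h, y_half)   # slope at the estimated midpoint
        y = y + h * k2                # full step along the midpoint slope
        t += h
    return y

y_end = runge_midpoint(lambda t, y: -y, 1.0, 0.1, 10)
```

Note that Runge’s method needs one fewer state vector than Heun’s per iteration, since the averaged slope never has to be formed.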
The above second-order numerical methods may be extended to a smooth manifold by recalling two simple rules-of-thumb:
The sum between a variable representing a point on a manifold and a vector field at that point are replaced by the exponential map applied to the point and to the vector field at that point.
The sum between two tangent vectors belonging to two different tangent spaces may be carried out only upon transporting one of the vectors to the tangent space where the other vector lives by means of parallel transport.
By applying the above rules, it is possible to extend the Heun method and the Runge method from the Euclidean space ${\mathbb{R}}^{m}$ to the manifold $\mathrm{SO}\left(m\right)$.
Heun’s method on
$\mathrm{SO}\left(m\right)$ may be expressed as:
The above extension of the original Heun method was derived by following Heun’s concept faithfully and by replacing the linear operations with manifold operations. In particular, notice that the mean direction between ${K}_{1}$ and ${K}_{2}$ cannot be calculated as $\frac{1}{2}\left({K}_{1}+{K}_{2}\right)$ because ${K}_{1}$ and ${K}_{2}$ belong to different tangent spaces.
Runge’s method on
$\mathrm{SO}\left(m\right)$ may be expressed as:
The interpretation of the equations that constitute Runge’s method on the manifold $\mathrm{SO}\left(m\right)$ is completely faithful to the original method.
In the following subsection, the forward Euler method, the Heun method, and the Runge method are applied to solve the HM learning system of IVPs (
2). This application leads to at least two challenges: (1) the HM system is made of two differential equations, which entails applying each numerical method twice; this gives rise to a non-uniqueness in the extension of the equations, which will then be presented in different versions; (2) the curved space
$\mathrm{SO}\left(m\right)$ may be treated as a smooth Riemannian manifold and as a Lie group; this gives rise to two different ways to design numerical methods, which will be explored and discussed in the following, also with the help of preliminary numerical tests.
2.4. Application of the Euler, Heun, and Runge Methods to Solving an HM System
The HM-type learning system to solve is formed by two coupled neural learning equations, namely:
where the skew-symmetrization operator
$\sigma \left(X\right):={X}^{\top}-X$ has been introduced and where the initial conditions
$A\left(0\right)={A}_{0}$ and
$B\left(0\right)={B}_{0}$ have been fixed.
In the following, we shall outline three different classes of numerical methods to tackle the learning problem (
13), namely the (forward) Euler method, the (explicit second-order) Heun method, and the (explicit) Runge method (a second-order instance of the general class of Runge–Kutta methods). For the sake of notational compactness, in the following, we shall make use of the twin vector fields:
The numerical schemes adopted in this study are first- and second-order explicit methods. Implicit methods are overly complex, as they require solving a non-linear problem at every iteration (the computations involved in an implicit method may be as complex as the problem it was designed to solve). Higher-order formulations (such as the fourth-order Runge–Kutta method) are often inappropriate in optimization, because they were designed to provide a highly accurate estimation of the solution to an initial value problem at every step, whereas in optimization such accuracy is superfluous: accuracy is only needed at the final point of the trajectory, namely the optimization goal.
2.4.1. Forward Euler Method
The Euler method on
$\mathrm{SO}\left(m\right)$ outlined in
Section 2.2, applied to solve an HM learning system, may be expressed as:
with
${h}^{A}>0$ and
${h}^{B}>0$ being two different learning stepsizes.
2.4.2. Explicit Second-Order Heun Method
The Heun method outlined in
Section 2.3, applied to an HM learning problem on
$\mathrm{SO}\left(m\right)$, as a manifold, may be expressed as:
It is interesting to observe that the same numerical learning algorithm may be recast in a number of slightly different ways by operating some tiny variations on the equations. An alternative version to the original Heun method, Version 1 (
16) is:
where the differences have been highlighted. Notice that both versions are explicit numerical schemes.
From a computational burden viewpoint, it is important to underline that the above numerical algorithms require the repeated calculation of parallel transport that contributes non-negligibly to increasing the computational complexity of these methods.
As an alternative, the Heun method may be rewritten without using parallel transport by composing the partial steps in a different way over the space
$\mathrm{SO}\left(m\right)$, namely:
The differences with the Heun method, Version 1 (
16), have been highlighted. It is interesting to observe that, by swapping the order of the steps in
${A}_{n+1}$ and
${B}_{n+1}$ (
$1\leftrightarrow 2$), we get:
which is computationally lighter compared to the Heun method, Version 3 (
18). It is immediate to verify that Version 3 of the Heun method, further modified by the above equations, is computationally lighter than the versions requiring parallel transport; therefore, the Heun method considered in the following preliminary numerical test will be:
2.4.3. Explicit Second-Order Runge Method
The Runge method outlined in
Section 2.3, applied to an HM learning problem on
$\mathrm{SO}\left(m\right)$, as a manifold, may be expressed as:
The same method may be expressed in a slightly different way, namely by exchanging the order of adaptation of the variables
A and
B:
The last version of the HM-type learning systems is obtained by regarding the curved space
$\mathrm{SO}\left(m\right)$ as a Lie group with Lie algebra
$\mathfrak{so}\left(m\right)$, which leads to the following numerical method:
This version of the numerical learning scheme, too, does not require the computationally intensive parallel transport. The highlighted update equations were drawn from the survey [
51].
2.5. Preliminary Pyramidal Numerical Tests on the HM Learning Methods
This subsection illustrates the results of a pyramidal comparison of the above learning algorithms in order to find out which algorithm is more convenient from a computational viewpoint, for comparable learning performances. The numerical experiments concerned the computation of the SVD of random matrices
Z. The numerical experiments discussed in this subsection were performed on an Intel
${}^{\circledR}$ i5-6400T 4-Core CPU, 2.2 GHz clock, 8 GB RAM machine and were coded in a MATLAB
${}^{\circledR}$ R2017b environment (we used non-parallel programming in the experiments). In the following experiments, the learning stepsizes
${h}_{A}$ and
${h}_{B}$ were set to a common value denoted by
h. Moreover, we adopted the following criterion to stop the iteration based on the objective function (
1):
where
$\tau $ denotes a threshold. The size of the unknown matrix
B was set to
$p:=3$ in all experiments, and the sub-matrix
${W}_{1}$ explained in
Section 2.1 was set to
$\mathrm{diag}(3,2,1)$. The size
m of the matrix
A took values in
$\{9,25,225\}$ (these specific values would arise in the computation of the optical-flow of
$720\times 720$ pixels frames, as will be shown in
Section 4.2). In all experiments, the initial matrices were chosen as
${A}_{0}:={I}_{m}$ and
${B}_{0}:={I}_{3}$. For comparison purposes, the SVD of a matrix
Z was also computed by MATLAB’s SVD engine, whose output triple is denoted by
$(U,\cdot ,V)$. As objective measures of the quality of the learned SVD factors, we used the following figures of demerit:
where
$|\cdot |$ denotes the entry-wise absolute value of a matrix. Both the numerical orthogonality figures and the discrepancy between the learned SVD and the MATLAB-computed reference SVD were expected to be as close as possible to zero.
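Since the displayed definitions of the figures of demerit are not reproduced here, the following Python sketch only illustrates the kind of quantities involved; the specific norms, the function name, and the comparison of the first p columns of A with those of U are assumptions made for illustration:

```python
import numpy as np

def figures_of_demerit(A, B, U, V):
    """Illustrative (assumed) figures of demerit: non-orthogonality of the
    learned factors A, B and entry-wise discrepancy with a reference SVD
    pair (U, V). Absolute values discard the sign indeterminacy of singular
    vectors; the Frobenius norm is an assumed choice."""
    m, p = A.shape[0], B.shape[0]
    non_orth_A = np.linalg.norm(A.T @ A - np.eye(m))  # distance of A from orthogonality
    non_orth_B = np.linalg.norm(B.T @ B - np.eye(p))  # distance of B from orthogonality
    disc_U = np.linalg.norm(np.abs(A[:, :p]) - np.abs(U[:, :p]))
    disc_V = np.linalg.norm(np.abs(B) - np.abs(V))
    return non_orth_A, non_orth_B, disc_U, disc_V

# With a perfectly learned pair (A, B) = (U, V), all figures vanish.
rng = np.random.default_rng(3)
Z = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(Z, full_matrices=True)
vals = figures_of_demerit(U, Vt.T, U, Vt.T)
```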
The results of the first experiment, which was meant to compare the performances of the Heun method in its Versions 1 and 4, are summarized in
Table 1. The random matrix
Z was generated entry-wise as a set of random numbers uniformly distributed in the interval
$[0,\phantom{\rule{4pt}{0ex}}1]$. This choice corresponded to a normalized pixel luminance, and the uniform distribution was motivated by the fact that a single pixel location may in principle take any luminance value. In the low-dimensional cases,
$h:=0.05$, while in the higher dimensional case,
$h:=0.005$. In all cases,
$\tau :={10}^{-5}$. The acquired runtimes showed that Version 4 was much lighter than Version 1, a phenomenon that became more apparent as the size of the matrix
A increased.
A graphical comparison of the values taken by the discrepancies defined in (
25) and of the learning criterion (
1) during learning is displayed in
Figure 1. In this case,
$m:=9$. The discrepancy and learning curves showed that, as concerns the learning performances, no meaningful differences between the two versions of the HM-type learning algorithms could be appreciated.
The results of this experiment entailed that the Heun method, Version 4 was the preferable version in the Heun class.
The results of the second experiment, which was meant to compare the performances of the Runge method in its Versions 1 and 3, are summarized in
Table 2. In the low-dimensional cases,
$h:=0.05$, while in the higher dimensional case,
$h:=0.005$. In all cases,
$\tau :={10}^{-5}$. The acquired runtimes showed that Version 3 was lighter than Version 1.
A graphical comparison of the values taken by the discrepancies and of the learning criterion during learning is displayed in
Figure 2. In this case,
$m:=9$. The discrepancy and learning curves showed that, as concerns the learning performances, Version 1 converged only slightly more rapidly to the expected solutions.
The results of this experiment entailed that the Runge method, Version 3 was the preferable version in the Runge class.
The results of the last experiment of this subsection, which was meant to compare the performances of the best Runge method and of the best Heun method to the performances of the Euler method, are summarized in
Table 3. In the low-dimensional cases,
$h:=0.05$, while in the higher dimensional case,
$h:=0.005$. In all cases,
$\tau :={10}^{-5}$. The acquired runtimes revealed that the Euler method was the lightest in terms of computational complexity.
A graphical comparison of the values taken by the discrepancies and of the learning criterion during learning is displayed in
Figure 3. In this case,
$m:=225$. The discrepancy and learning curves showed that, as concerns the learning performances, the Euler method-based and the Heun method-based HM learning algorithms converged more rapidly to the expected solutions than the Runge method-based one.
A graphical comparison of the values taken by the non-orthogonality figures defined in (
25) during learning is displayed in
Figure 4. In this case,
$m:=225$. The orthogonality curves showed that the Euler method-based HM algorithm was unable to keep the same numerical precision as the Heun-based and the Runge-based HM learning algorithms, the former being a first-order method and the latter two being second-order methods. However, all these algorithms kept the non-orthogonality figures at very low values, which stabilized after learning completion.
The results of these three experiments revealed that the Euler method to implement a Helmke–Moore learning paradigm guaranteed an excellent trade-off between computational complexity and numerical precision.