First Order and Second Order Learning Algorithms on the Special Orthogonal Group to Compute the SVD of Data Matrices

: The present paper deals with neural algorithms to learn the singular value decomposition (SVD) of data matrices. The neural algorithms utilized in the present research endeavor were developed by Helmke and Moore (HM) and appear under the form of two continuous-time differential equations over the special orthogonal group of matrices. The purpose of the present paper is to develop and compare different numerical schemes, under the form of two alternating learning rules, to learn the singular value decomposition of large matrices on the basis of the HM learning paradigm. The numerical schemes developed here are both ﬁrst-order (Euler-like) and second-order (Runge-like). Moreover, a reduced Euler scheme is presented that consists of a single learning rule for one of the factors involved in the SVD. Numerical experiments performed to estimate the optical-ﬂow (which is a component of modern IoT technologies) in real-world video sequences illustrate the features of the novel learning schemes.

The neural algorithms utilized in the present research work to learn an SVD of a data matrix were developed by Helmke and Moore (HM) in [1]. An HM learning rule appeared under the form of two continuous-time differential equations over the special orthogonal group of matrices. The purpose of the present research is to develop and compare different numerical schemes, under the form of two alternating learning equations, to learn the SVD of large matrices on the basis of the HM learning paradigm. The numerical schemes developed here are both first-order (Euler-like) and second-order

Helmke-Moore Learning and Advanced Numerical Methods
In the present section, we recall some essential details about the Helmke and Moore (HM) learning algorithm from [1,45]. Moreover, we survey a first-order and two second-order numerical methods to implement a learning paradigm formulated as an initial value problem (IVP, also termed the Cauchy problem) and propose numerical schemes to extend these methods to Riemannian manifolds, of which Lie groups are instances.
There exists a large variety of numerical methods to solve initial value problems. In order to limit the computational burden of the learning rules considered in this paper, we considered a Euler method, a Heun method, and a Runge method (also known as the Runge-Kutta method of the second order). These methods were developed to solve initial values problems over the Euclidean space R m , although they can be adapted to solve initial value problems over smooth manifolds or Lie groups, as the group SO(m).

An HM-Type Neural SVD Learning Algorithm
Denoting as Z ∈ R m×p a matrix whose SVD is to be computed and as r ≤ min{m, p} the rank of Z, its singular value decomposition may be written Z = UDV , where U ∈ R m×m and V ∈ R p×p are (special) orthogonal matrices and D is a pseudo-diagonal matrix that has all-zero values except for the first r diagonal entries, termed singular values. It is easily checked that the columns of U coincide with the eigenvectors of the product ZZ , while V contains the eigenvectors of the self-product Z Z, with the same eigenvalues.
The Helmke-Moore algorithm is utilized for training, in an unsupervised way, an artificial neural network to learn an SVD of a given rectangular matrix. The HM dynamics arises from the maximization of a specific criterion φ W : SO(m) × SO(m) → R defined as: where W ∈ R p×m is a weighting kernel and we assumed that m p. The dynamical system, derived as a Riemannian gradient flow on SO(m) × SO(m), reads: where an over-dot denotes derivation with respect to the time parameter. The weighting matrix W has the structure [W 1 0 p,m−p ], where W 1 ∈ R p×p must be diagonal with unequal entries on the main diagonal [45].
Since the matrix A has size m × m, the matrix Z has size m × p, and the matrix B has size p × p, then the product matrix H has size m × p.
Whereas the continuous-time versions of the learning algorithms leave the orthogonal group invariant, this is not true for their discrete-time counterparts, which are obtained by employing a numerical integration scheme, unless a suitable integration method is put into effect. In the present case, we may employ a convenient Lie integration method drawn from a manifold-calculus-based integration theory (see, e.g., the contribution [46] and previous reviews in [47][48][49]).

Euler Method in R m and Its Extension to SO(m)
Consider the following initial value problem: with f : R × R m → R m being a regular function and with the initial condition y(0) = y 0 ∈ R m . Whenever a closed-form solution to the IVP (3) is out of reach, the simplest numerical scheme to approximate numerically its solution is the forward Euler method described by the iterative rule: with y 0 known from the initial condition. As a reference for this method and those invoked in the continuation of the paper, readers might refer to [50]. The constant h > 0 denotes an integration stepsize and represents the time lapse between each time-node t n = nh and the next. The key idea behind the above forward Euler method is to estimate the value y n+1 by the slope of the vector field f (t, y) at the present node through a linear interpolation across the time lapse h. This Euler method is asymmetric: it makes use only of the value of the vector field f computed in the leftmost point of the interval [t n , t n+1 ]. The right-hand side of the expression (4) coincides with the zeroth and the first terms of the Taylor series expansion of the function f . The residue owing to truncation (also termed "truncation error") is of type O(h 2 ); therefore, this method is of order one. A learning problem on the special orthogonal group is described by the IVP: with the initial condition Y(0) = Y 0 ∈ SO(m). Let us recall that a smooth manifold M may be endowed with a manifold exponential map and by a parallel transport operator: • Manifold exponential: The exponential map derived from the geodesic associated with the connection is denoted by exp : therefore, an exponential map may be thought of as a generalization of "vector addition" to a curved space.

•
Parallel transport along geodesics: The parallel transport map is denoted by P : M × TM → TM. Given two points x, y ∈ M, parallel transport from x to y is denoted as P x→y : T x M → T y M. Parallel transport moves a tangent vector between two points along their connecting geodesic arc. Given x, y ∈ M and v ∈ T x M, the parallel transport of a vector v from a point x to a point y is denoted by P x→y (v). Parallel transport does not change the length of a transported vector, namely it is an isometry; moreover, parallel transport does not alter the angle between transported vectors, namely, given x, y ∈ M and a, v ∈ T x M, it holds that P x→y (a), P x→y (v) y = a, v x , i.e., we say that parallel transport is a conformal map. Formally, on a Euclidean manifold M = R m , P x→y (v) = v; therefore, a vector transport may be thought of as a generalization of the familiar geometric notion of the "rigid translation" of vectors to a curved space.
In the case of a special orthogonal group endowed with its canonical metric (inherited by the Euclidean metric), these maps read: where X, Y ∈ SO(m), V ∈ T X SO(m), "Exp" denotes a matrix exponential (implemented, for example, by the function expm in MATLAB ), √ · denotes a matrix square-root (implemented, for example, by the function sqrtm in MATLAB ). Let us recall that a matrix V ∈ T X SO(m) may be rewritten as V = XΩ, with Ω skew-symmetric and, therefore, that the following simplified expression for the exponential map may be used: exp X (XΩ) = XExp(Ω).
A rule of thumb to extend the forward Euler method to smooth manifolds is that the sum between a variable representing a point and a vector field at that point needs to be replaced by the exponential map applied to the point and to the vector field at that point. As a result, a Euler method on a manifold SO(m) may be expressed as: with Y 0 known from the initial condition. Likewise, the original forward Euler method, the method (8), estimates the value of the solution at the temporal node t n+1 , namely Y n+1 , as the value of the solution at the node t n interpolated by the short geodesic arc departing from the current point Y n toward the direction indicated by the vector field f (t n , Y n ).

Heun and Runge Methods in R m and Their Extension to SO(m)
The following methods are second-order numerical schemes that may be adopted to solve the Cauchy problem (3).
Heun's method reads: y n+1 := y n + hk 1,n , k 2,n := f (t n + h,ỹ n+1 ), where h > 0 denotes again a stepsize (or time lapse between two adjacent time nodes). It is worth summarizing an interpretation of the equations that describe the Heun method: The quantity k 1 represents an estimation of the value of the vector field f at the leftmost point of the time lapse [t n , t n + h], while the quantity k 2 represents an estimation of the value of the vector field f at the rightmost point of the time lapse [t n , t n + h]. Both k 1 and k 2 are estimations, in that the expression for the quantity k 1,n uses the value y n that is an estimation of the solution of the Cauchy problem obtained in the previous iteration, while the expression for k 2,n utilizes an estimation of y n+1 , indicated byỹ n+1 , based on a linear interpolation from y n in the direction k 1,n . The actual estimation of the solution at t n + h is obtained as a linear interpolation in a direction computed as the arithmetic average between the directions k 1 and k 2 .
The quantity k 1 represents again an estimation of the value of the vector field f at the leftmost point of the time lapse [t n , t n + h], while the quantity k 2 represents an estimate of the value of the vector field f at the midpoint of the time lapse [t n , t n + h]. Both k 1 and k 2 are estimations, rather than being exact evaluations. In particular, the expression for k 2,n utilizes an estimation of the exact solution to the Cauchy problem, denoted byỹ n+1/2 , based on a linear interpolation from y n in the direction k 1,n extended only to half of a whole step. The actual estimation of the solution at t n + h is obtained as a linear interpolation, from y n in a direction k 2 extended to a whole step.
The above second-order numerical methods may be extended to a smooth manifold by recalling two simple rules-of-thumb:

•
The sum between a variable representing a point on a manifold and a vector field at that point are replaced by the exponential map applied to the point and to the vector field at that point.

•
The sum between two tangent vectors belonging to two different tangent spaces may be carried out only upon transporting one of the vectors to the tangent space where the other vector lives by means of parallel transport.
By applying the above rules, it is possible to extend the Heun method and the Runge method from the Euclidean space R m to the manifold SO(m).
Heun's method on SO(m) may be expressed as: The above extension of the original Heun method was derived by following faithfully Heun's concept and by replacing the linear operations with manifold operations. In particular, notice that the mean direction between K 1 and K 2 cannot be calculated as 1 2 (K 1 + K 2 ) because K 1 and K 2 belong to different tangent spaces.
Runge's method on SO(m) may be expressed as: The interpretation of the equations that constitute Runge's method on the manifold SO(m) is completely faithful to the original method.
In the following subsection, the forward Euler method, the Heun method, and the Runge method are applied to solve the HM learning system of IVPs (2). This application leads to facing at least two challenges: (1) the HM system is made of two differential equations, which entails the application of each numerical method twice; this gives rise to a non-univocality in the extension of the equations, which will then be presented in different versions; (2) the curved space SO(m) may be treated as a smooth Riemannian manifold and as a Lie group; this gives rise to two different ways to design numerical methods, which will be explored and discussed in the following, even by the help of preliminary numerical tests.

Application of the Euler, Heun, and Runge Method to Solving an HM System
The HM-type learning system to solve is formed by two coupled neural learning equations, namely: where the skew-symmetrization operator σ(X) := X − X has been introduced and where the initial conditions A(0) = A 0 and B(0) = B 0 have been fixed. In the following, we shall outline three different classes of numerical methods to tackle the learning problem (13), namely the (forward) Euler class, the (explicit second-order) Heun method, and the (explicit) Runge method (a second-order instance from the general class of Runge-Kutta methods). For the sake of notation compactness, in the following, we shall make use of the twin vector fields: The numerical schemes adopted in this study are first-and second-order explicit. Implicit methods are overly complex, as they require solving a non-linear problem at every iteration (the computations involved in the implicit methods may be as complex as the problem they were designed to solve). Higher order formulations (such as the fourth-order Runge-Kutta method) are often inappropriate in optimization, because they were designed to provide a highly accurate estimation of the solution to an initial value problem at every step, while in optimization, such accuracy would be out of the scope as the only step that needs accuracy is the final optimization goal.

Forward Euler Method
The Euler method on SO(m) outlined in Section 2.2, applied to solve an HM learning system, may be expressed as: with h A > 0 and h B > 0 being two different learning stepsizes.

Explicit Second-Order Heun Method
The Heun method outlined in Section 2.3, applied to an HM learning problem on SO(m), as a manifold, may be expressed as: Heun, Version 1: It is interesting to observe that the same numerical learning algorithm may be recast in a number of slightly different ways by operating some tiny variations on the equations. An alternative version to the original Heun, Version 1 (16) is: Heun, Version 2: where the differences have been highlighted. Notice that both versions are explicit numerical schemes. From a computational burden viewpoint, it is important to underline that the above numerical algorithms require the repeated calculation of parallel transport that contributes non-negligibly to increasing the computational complexity of these methods.
As an alternative, the Heun method may be rewritten without using parallel transport by composing the partial steps in a different way over the space SO(m), namely: Heun, Version 3: The differences with the Heun method, Version 1 (16), have been highlighted. It is interesting to observe that, by swapping the order of the steps in A n+1 e B n+1 (1 ↔ 2), we get: which is computationally lighter compared to the Heun method, Version 3 (18). It is immediate to verify that Version 3 of the Heun method, further modified by the above equations, will result in being computationally lighter than the versions requiring parallel transport; therefore, the Heun method considered in the following preliminary numerical test will be: Heun, Version 4:

Explicit Second-Order Runge Method
The Runge method outlined in Section 2.3, applied to an HM learning problem on SO(m), as a manifold, may be expressed as: Runge, Version 1: The same method may be expressed in a slightly different way, namely by exchanging the order of adaptation of the variables A and B: Runge, Version 2: The last version of the HM-type learning systems is obtained by regarding the curved space SO(m) as a Lie group with Lie algebra so(m), which leads to the following numerical method:

Runge, Version 3:
Even this version of the numerical learning scheme does not require using the computationally intensive parallel transport. The highlighted update equations were drawn from the survey [51].

Preliminary Pyramidal Numerical Tests on the HM Learning Methods
This subsection illustrates the results of a pyramidal comparison of the above learning algorithms in order to find out which algorithm is more convenient from a computational viewpoint, for comparable learning performances. The numerical experiments concerned the computation of the SVD of random matrices Z. The numerical experiments discussed in this subsection were performed on an Intel i5-6400T 4-Core CPU, 2.2 GHz clock, 8 GB RAM machine and were coded in a MATLAB R2017b environment (we used non-parallel programming in the experiments). In the following experiments, the learning stepsizes h A and h B were set to a common value denoted by h. Moreover, we adopted the following criterion to stop the iteration based on the objective function (1): where τ denotes a threshold. The size of the unknown-matrix B was set to p := 3 in all experiments, and the sub-matrix W 1 explained in the Section 2.1 was set to diag(3, 2, 1). The size m of the matrix B took values in {9, 25, 225} (these specific values would arise in the computation of the optical-flow of 720 × 720 pixels frames, as will be shown in Section 4.2). In all experiments, the initial matrices were chosen as A 0 := I m and B 0 := I 3 . For comparison purposes, the SVD of a matrix Z was also computed by MATLAB's SVD engine, whose output triple is denoted by (U, ·, V). As objective measures of the quality of the learned SVD factors, we used the following figures of demerit: where | · | denotes an entry-wise absolute value of a matrix. Both numerical orthogonality and discrepancy between the learned SVD and the MATLAB-computed reference SVD were expected to result in being as close as possible to zero. The results of the first experiment, which was meant to compare the performances of the Heun method in its Versions 1 and 4, are summarized in Table 1. The random matrix Z was generated entry-wise as a set of random numbers uniformly distributed in the interval [0, 1]. This choice corresponded to a normalized pixel luminance, and the uniform distribution was motivated by the fact that a single pixel location may in principle take any luminance value. In the low-dimensional cases, h := 0.05, while in the higher dimensional case, h := 0.005. In all cases, τ := 10 −5 . The acquired runtimes showed that Version 4 was much lighter than Version 1, a phenomenon that became more apparent as the size of the matrix A increased. A graphical comparison of the values taken by the discrepancies defined in (25) and of the learning criterion (1) during learning is displayed in Figure 1. In this case, m := 9. The discrepancy and learning curves showed that, as concerns the learning performances, no meaningful differences between the two versions of the HM-type learning algorithms could be appreciated. The results of this experiment entailed that the Heun method, Version 4 was the preferable version in the Heun class.
The results of the second experiment, which was meant to compare the performances of the Runge method in its Versions 1 and 3, are summarized in Table 2. In the low-dimensional cases, h := 0.05, while in the higher dimensional case, h := 0.005. In all cases, τ := 10 −5 . The acquired runtimes showed that Version 3 was lighter than Version 1. Table 2. Results of a comparison between the runtimes (in seconds) of the Runge method in its Versions 1 and 3.

Method
Size of A m := 9 m := 25 m := 225 A graphical comparison of the values taken by the discrepancies and of the learning criterion during learning is displayed in the Figure 2. In this case, m := 9. The discrepancy and learning curves showed that, as concerns the learning performances, Version 1 converged only slightly more rapidly to the expected solutions. The results of this experiment entailed that the Runge method, Version 3 was the preferable version in the Runge class.
The results of the last experiment of this subsection, which was meant to compare the performances of the best Runge method and of the best Heun method to the performances of the Euler method, are summarized in Table 3. In the low-dimensional cases, h := 0.05, while in the higher dimensional case, h := 0.005. In all cases, τ := 10 −5 . The acquired runtimes revealed that the Euler method was the lightest in terms of computational complexity.
A graphical comparison of the values taken by the discrepancies and of the learning criterion during learning is displayed in Figure 3. In this case, m := 225. The discrepancy and learning curves showed that, as concerns the learning performances, the Euler method-based HM learning algorithm and the Heun method-based learning algorithm converged more rapidly to the expected solutions.  A graphical comparison of the values taken by the non-orthogonality figures defined in (25) during learning is displayed in Figure 4. In this case, m := 225. The orthogonality curves showed that the Euler method-based HM was unable to keep the same numerical precision of the Heun-based and of the Runge-based HM learning algorithms, the former being a first-order method and the latter two second-order methods. However, all these algorithms kept the non-orthogonality figures at very low values and stabilized after learning completion. The results of these three experiments revealed that the Euler method to implement a Helmke-Moore learning paradigm guaranteed an excellent trade-off between computational complexity and numerical precision.

Reduced Euler Method
In some applications, such as optical-flow estimation recalled in the next Section 4, only the matrix B is relevant. In view of the optical-flow application discussed in the next section, assume that B is a 3 × 3 factor of the singular value decomposition of a data matrix Z. Notice that, in general, the matrix A has a much bigger size than the matrix B; therefore, it would be certainly convenient to express the matrix A as a function of the matrix B and to express the learning system (2) only in terms of the variable matrix B. The following arguments may be extended easily to the general case that B has different sizes.

Derivation of a Reduced Euler Method
The HM-type learning system (2) in the form (13) Therefore, one of these conditions may be used to express the matrix variable A in terms of the matrix variable B. In particular, the first condition, in plain form, reads: The above equation is equivalent to (W B Z )A − A (ZBW) = 0. Defining K := W B Z , Equation (27) takes the form: which is to be solved in A. Let us observe that the matrix K has size 9 × 9 and that its last six rows are zero, while the matrix A has size 9 × 9 and that its last six columns do not influence the result. Let us partition the matrices K and A as follows: with K 1 is 3 × 9, A 1 is 9 × 3, and A 2 is 9 × 6. The matrix product KA exhibits the structure: According to the condition (28), the product KA needs to result in a symmetric matrix; hence, the product K 1 A 1 needs to be symmetric, while the product K 1 A 2 needs to be zero. Since the matrix A 2 plays no role in the product KA, it is computationally convenient to turn down the orthogonality of the matrix A and to set A 2 = 0 m×n , while the columns of the sub-matrix A 1 still need to be mutually orthogonal and unit-norm. Since K 1 A 1 is symmetric, let us set K 1 A 1 =: S, where S is a 3 × 3 symmetric matrix to be determined. To determine the unknown matrix S, let us rewrite the last equation by the ansatz K 1 = A 1 S, from which it follows that: therefore: The matrix K 1 is defined as a sub-matrix of K, which, in turn, is a function of the matrix B (and depends on the weight matrix W and on the data matrix Z). Hence, the function: may be used to reduce the two update rules in the Euler method (15) to the single update rule: with h > 0 and with the only initial condition B 0 . Notice that the inverse matrix square root may be expressed in terms of the matrix exponential Exp and the matrix logarithm Log as X −1/2 = Exp − 1 2 Log(X) . The algorithm (34) will be referred to as the reduced Euler method.

Numerical Comparison of the Euler Method and Its Reduced Version
The results of a comparison of the performances of the forward Euler method (15) and of the reduced Euler method (34) to implement an HM learning paradigm are summarized in Table 4. In the low-dimensional cases, h := 0.05, while in the higher dimensional case, h := 0.005. In all cases, τ := 10 −5 . The acquired runtimes showed that the reduced Euler method was way lighter than the original Euler method, a phenomenon that became more apparent as the size m of the m × 3 matrix Z increased.
A graphical comparison of the values taken by the figures of demerit defined in (25) and of the learning criterion (1) during learning is displayed in Figure 5. In this case, m := 9. The discrepancy and learning curves showed that, as concerns the learning performances, no meaningful differences between the two versions of the HM-type learning algorithms can be appreciated.
The results of this experiment confirmed that the reduced Euler method was both effective and efficient to implement the HM learning paradigm.

Experiments on Optical Flow Estimation by a Helmke-Moore Neural Learning Algorithm
In this section, we recall the notion of optical-flow estimation by a total least squares method from [42] and illustrate the behavior of the devised learning algorithm by means of numerical experiments performed on a recorded video sequence.

Optical Flow Estimation by Total Least Squares
Let us consider a gray-scale video sequence {I (x, y, q)} q , where I denotes the scalar image intensity, the pair (x, y) denotes the coordinate pair of any pixel, and q denotes the frame index.
During motion, any pixel moves from frame q and position (x, y) to frame q + ∆q at position (x + ∆x, y + ∆y). The fact that the pixel intensity has moved over the image support may be formally expressed by the optical-flow conservation equation, namely: On the basis of the above conservation law, it is possible to estimate the motion of any pixel within the sequence {I (x, y, q)} q . In fact, let us define ∆I (x, y, q) := I (x, y, q + ∆q) − I (x, y, q). An application of the Taylor series expansion gives: ∆I (x, y, q) = I (x − ∆x, y − ∆y, q) − I (x, y, q) = −I x (x, y, q)∆x − I y (x, y, q)∆y + residual, (36) where I x and I y denote the partial derivatives of the image function along the vertical and the horizontal direction, respectively, and the term "residual" denotes the sum of higher order terms in the Taylor expansion of the image function and the sum of random disturbances occurring in the recording of a video sequence, while the vertical and horizontal displacements (∆x, ∆y) were supposed small enough for the above truncated Taylor series to represent accurately the optical-flow change. The latter hypothesis was equivalent to assuming slow motion or, equivalently, sufficiently high-rate image sampling.
As mentioned, we made the solid-block motion assumption, namely the above equation was supposed to hold true, with the same values of displacements, for a set of pixels located within the rectangular patch described by x ∈ [x 1 , x N x ] and y ∈ [y 1 , y N y ], where integers N x and N y denote the block-size. On the basis of these considerations, it was possible to write the resolving system for any block between frames q and q + ∆q, that is: I (x 2 , y 1 , q + ∆t) − I (x 2 , y 1 , q) = −I x (x 2 , y 1 , q)∆x − I y (x 2 , y 1 , q)∆y , I (x 3 , y 1 , q + ∆q) − I (x 3 , y 1 , q) = −I x (x 3 , y 1 , q)∆x − I y (x 3 , y 1 , q)∆y , . . .
where high-order terms were neglected.
By defining a displacement vector v := [∆x ∆y] and upon defining a matrix L ∈ R N x N y ×2 and a vector c ∈ R N x N y , the above system casts into Lv = c. This is an over-determined linear system of N x · N y equations in two unknowns, which may be solved by the help of a total least squares technique [52].
The TLS approach to solving an over-determined system of linear equations is based on the singular value decomposition (SVD) technique [53,54]. The resulting procedure is stated in Algorithm 1.

Algorithm 1
Total least squares based algorithm to estimate a displacement vector.
It is interesting to underline that, depending on the size of each block, the data matrix Z whose SVD is to be calculated may be large sized. Moreover, for later considerations, it is important to underline that the only factor of an SVD decomposition that plays an effective role in motion estimation is V.

Numerical Experiments on Optical-Flow Learning
In order to test the capability of the HM learning paradigm in optical-flow estimation, a video was realized and saved in MP4 format. The pre-processing of the images consisted of applying a filter after the subdivision of the frames in sub-windows: every gradient matrix was examined and then the windows, which presented the null gradient, were excluded from further processing in the successive steps as they corresponded to block of pixels that were not subjected to any movement. In the post-processing, after the velocities along the xand y-axes were calculated by using an HM learning algorithm, their norms were compared to a threshold, and all the windows that presented a velocity norm larger than the threshold were ignored in the optical-flow estimation as they were considered to correspond to noisy pixels. These filters were built to clean the image from noise created by the environment and by the camera, but they also caused a small loss of information, which could be neglected. In the experiments summarized in the present subsection, the following setting was adopted: • The optical-flow was estimated between Frame 130 and Frame 135.

•
The original colored still images drawn from the two considered frames were converted from RGB to gray scale using the luminance formula 0.2989R + 0.5870G + 0.1140B as prescribed by the recommendation ITU-R BT.601-7 [55].

•
The threshold to stop iteration was set to τ := 0.001.

•
The learning stepsizes were set as h A = h B = h := 0.0005.

•
Given the frame size of 720 × 720, each frame was subdivided into windows of size 3 × 3, 5 × 5, or 15 × 15 pixels (corresponding to 57,600 matrices of size 9 × 3 to compute the SVD, 20,736 matrices of size 25 × 3 to compute the SVD, and to 2304 matrices of size 225 × 3 to compute the SVD, respectively).
The computational complexity related to the estimation of the optical-flow between two frames depended on the number of sub-windows and on their size; therefore, it was interesting to test the HM learning algorithms in different instances.
With reference to the HM learning paradigm, in the present application, it holds that m := N x · N y (which depends on the size of a image's window) while p := 3 (fixed). Considering the block decomposition H = H 1 · · · , with H 1 of size 3 × 3, the product W H may be simplified as W 1 H 1 , while the product HW may be computed as the composite matrix [HW 1 0 NxNy,NxNy−3 ]. Table 5 shows the results of a comparison of runtimes taken to estimate the optical-flow between two frames by the forward Euler method-based HM learning paradigm and the reduced Euler method-based algorithm. It is very interesting to observe how the computational complexity of the forward Euler method increased with the size of the sub-windows, while the complexity of the reduced Euler method showed an opposite trend. The time saving afforded by the reduced Euler method ranged from about 50% for low-dimensional windows to about 98% for high-dimensional windows. Figure 6 illustrates the results of optical-flow estimation when the windows size was set to 3 × 3. Figure 7 illustrates the results of optical-flow estimation when the windows size was set to 5 × 5. Figure 8 illustrates the results of optical-flow estimation when the windows size was taken as 15 × 15.
The estimation obtained through the reduced Euler method seemed cleaner and free of artifacts, which may denote a stronger robustness against noise.

Conclusions
The aim of this paper was to present two first-order (forward Euler and reduced Euler) and two second-order (Heun and Runge) numerical learning methods to learn the SVD of large rectangular matrices.
We recalled the Helmke-Moore learning paradigm to estimate the SVD of a given rectangular matrix, which was expressed by two coupled differential equations on the special orthogonal group. Taking the move from classical numerical calculus methods to solve ordinary differential equations, we suggested several numerical schemes to extend these classical methods to the manifold of special orthogonal matrices.
The resulting methods were tested numerically against each other in a pyramidal comparison in order to determine their merits and demerits in terms of computational complexity and effectiveness in computing the SVD. Observing that, for total least-squares-based optical-flow estimation, only one among the three SVD factors was necessary, a further reduced method was developed and tested against its non-reduced counterpart.
The conclusion of the experimental tests was that the estimation obtained through the reduced Euler method looked clean and free of artifacts, which may suggest robustness against noise and that the time saving of the reduced Euler method with respect to the forward Euler method increased noticeably when the size of the sub-windows increased.