Article

Mixed Alternating Projections with Application to Hankel Low-Rank Approximation

Faculty of Mathematics and Mechanics, St. Petersburg State University, Universitetskaya nab. 7/9, St. Petersburg 199034, Russia
*
Author to whom correspondence should be addressed.
Algorithms 2022, 15(12), 460; https://doi.org/10.3390/a15120460
Submission received: 14 October 2022 / Revised: 23 November 2022 / Accepted: 30 November 2022 / Published: 5 December 2022
(This article belongs to the Collection Feature Papers in Algorithms)

Abstract
The method of alternating projections for extracting low-rank signals is considered. The problem of decreasing the computational cost while keeping the estimation accuracy is analyzed. The proposed algorithm consists of alternating projections onto the set of low-rank matrices and the set of Hankel matrices, where iterations of weighted projections with different weights are mixed. To justify the algorithm, the theory of mixed alternating projections onto linear subspaces is studied and the limit of mixed projections is obtained. The proposed approach is applied to the problem of Hankel low-rank approximation for constructing a modification of the Cadzow algorithm. Numerical examples compare the accuracy and computational cost of the proposed algorithm and the Cadzow iterations.

1. Introduction

Problem. Consider the problem of extracting a signal $\mathsf{S} = (s_1, \ldots, s_N)^{\mathrm{T}}$ from a noisy time series $\mathsf{X} = \mathsf{S} + \mathsf{N}$, where $\mathsf{S}$ is governed by a linear recurrence relation (LRR) of order $r$: $s_n = \sum_{i=1}^{r} a_i s_{n-i}$, $n = r+1, \ldots, N$, $a_r \neq 0$, and $\mathsf{N}$ is noise with zero expectation and autocovariance matrix $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_N \in \mathbb{R}^{N \times N}$. There is a well-known result that describes the set of time series governed by LRRs in parametric form: $s_n = \sum_i P_i(n) \exp(\beta_i n) \cos(2\pi \omega_i n + \psi_i)$, where $P_i(n)$ are polynomials in $n$ (see, e.g., ([1], Theorem 3.1.1)). Methods based on the estimation of the signal subspace (subspace-based methods), and in particular singular spectrum analysis (see [2,3,4]), work well for the analysis of such signals.
The subspace-based approach starts by fixing a window length $L$, $1 < L < N$, putting $K = N - L + 1$ and constructing the $L$-trajectory matrix of a series $\mathsf{S} \in \mathbb{R}^N$:
$$\mathbf{S} = \mathcal{T}_L(\mathsf{S}) = \begin{pmatrix} s_1 & s_2 & \cdots & s_K \\ s_2 & s_3 & \cdots & s_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ s_L & s_{L+1} & \cdots & s_N \end{pmatrix},$$
where $\mathcal{T}_L$ denotes the bijection between $\mathbb{R}^N$ and $\mathcal{H}$; $\mathcal{H}$ is the set of Hankel matrices of size $L \times K$ with equal values on the anti-diagonals $i + j = \mathrm{const}$.
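For readers who prefer code, the following is a minimal sketch (in Python with NumPy, not the authors' R implementation) of the embedding $\mathcal{T}_L$ and of its inverse on Hankel matrices; the series is the sinusoid used later in Section 5.2.

```python
import numpy as np

def trajectory_matrix(series, L):
    """L-trajectory (Hankel) matrix of a series: the map T_L from the text."""
    N = len(series)
    K = N - L + 1
    return np.column_stack([series[k:k + L] for k in range(K)])

def series_from_hankel(H):
    """Inverse of T_L on Hankel matrices: first column followed by the last row."""
    return np.concatenate([H[:, 0], H[-1, 1:]])

S = np.sin(2 * np.pi * np.arange(1, 41) / 6)   # sinusoid of rank 2, N = 40
X = trajectory_matrix(S, L=20)                 # a 20 x 21 Hankel matrix
print(X.shape, np.linalg.matrix_rank(X))       # (20, 21) 2
print(np.allclose(series_from_hankel(X), S))   # True
```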
We say that a time series $\mathsf{S}$ is a time series of rank $r$ if its trajectory matrix $\mathbf{S}$ is rank-deficient and has rank $r$ for any $L$, $r < L < N - r + 1$. An equivalent definition of a series of rank $r$, $2r < N$, is that its $(r+1)$-trajectory matrix has rank $r$ ([5], Corollary 5.1). Denote the set of time series of rank $r$ and length $N$ by $\mathcal{D}_r^N = \mathcal{D}_r$.
Consider the problem of approximating a time series by a series of rank $r$ in a weighted Euclidean norm as the following nonlinear weighted least-squares problem:
$$\mathsf{Y}^{\star} = \arg\min_{\mathsf{Y} \in \overline{\mathcal{D}_r}} \|\mathsf{X} - \mathsf{Y}\|_{\mathbf{W}}^{2}, \tag{1}$$
where $\mathsf{X}, \mathsf{Y} \in \mathbb{R}^N$ are time series of length $N$ and $\|\mathsf{Z}\|_{\mathbf{W}}$ is the weighted Euclidean norm generated by the inner product
$$\langle \mathsf{Z}_1, \mathsf{Z}_2 \rangle_{\mathbf{W}} = \mathsf{Z}_1^{\mathrm{T}} \mathbf{W} \mathsf{Z}_2, \tag{2}$$
$\mathbf{W} \in \mathbb{R}^{N \times N}$ is a symmetric positive-definite weight matrix and $\overline{\mathcal{D}_r}$ is the closure of the set $\mathcal{D}_r$.
The solution $\mathsf{Y}^{\star}$ of the problem (1) can be used as a finite-rank estimate of the signal $\mathsf{S}$. Moreover, the obtained estimate allows one to solve many related problems: estimating the LRR coefficients and time series parameters, forecasting the time series, filling in gaps (if any) and so on. Note that when $\mathbf{W} = \boldsymbol{\Sigma}^{-1}$, the resulting signal estimate $\mathsf{Y}^{\star}$, assuming Gaussian noise $\mathsf{N}$, is the maximum likelihood estimate.
Known solution. To solve the problem (1), consider the related problem of structured low-rank approximation (SLRA; see a review in [6]):
$$\mathbf{Y}^{\star} = \arg\min_{\mathbf{Y} \in \mathcal{M}_r \cap \mathcal{H}} f_{\mathbf{L},\mathbf{R}}(\mathbf{Y}), \qquad f_{\mathbf{L},\mathbf{R}}(\mathbf{Y}) = \|\mathbf{X} - \mathbf{Y}\|_{\mathbf{L},\mathbf{R}}^{2}, \tag{3}$$
where $\|\mathbf{Z}\|_{\mathbf{L},\mathbf{R}}$ is the matrix norm generated by the inner product described in [7]:
$$\langle \mathbf{Z}_1, \mathbf{Z}_2 \rangle_{\mathbf{L},\mathbf{R}} = \operatorname{tr}\bigl(\mathbf{L} \mathbf{Z}_1 \mathbf{R} \mathbf{Z}_2^{\mathrm{T}}\bigr), \tag{4}$$
$\mathcal{H} = \mathcal{H}_{L,K}$ is the space of Hankel matrices of size $L \times K$; $\mathcal{M}_r \subset \mathbb{R}^{L \times K}$ is the set of matrices of rank not larger than $r$; $\mathbf{L} \in \mathbb{R}^{L \times L}$ and $\mathbf{R} \in \mathbb{R}^{K \times K}$ are positive-definite matrices; and $\mathbf{X}$ and $\mathbf{Y}$ are the matrices related to (1) as $\mathbf{X} = \mathcal{T}_L(\mathsf{X})$ and $\mathbf{Y} = \mathcal{T}_L(\mathsf{Y})$. Then, the estimate of the signal $\mathsf{S}$ is $\mathcal{T}_L^{-1}(\mathbf{Y}^{\star})$.
There exists a relation between the weight matrix $\mathbf{W}$ and the matrices $\mathbf{L}$ and $\mathbf{R}$ that provides the equivalence of the problems (1) and (3) [8], and thereby the coincidence of the corresponding signal estimates.
The algorithm for the numerical solution of the Hankel SLRA problem (3) is the Cadzow iterative method (introduced in [9]; see also [4]), which belongs to the class of alternating projection methods [10]. Let us define the following sequence of matrices starting from a given matrix $\mathbf{X}$:
$$\mathbf{X}_0 = \mathbf{X}, \qquad \mathbf{X}_{k+1} = \Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{M}_r,(\mathbf{L},\mathbf{R})} \mathbf{X}_k, \quad k \geq 0, \tag{5}$$
where $\Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})}$ and $\Pi_{\mathcal{M}_r,(\mathbf{L},\mathbf{R})}$ are the projectors onto the corresponding sets in the matrix norm $\|\cdot\|_{\mathbf{L},\mathbf{R}}$ generated by the inner product (4). As a numerical solution we can take the value at the $T$-th iteration, that is, $\mathcal{T}_L^{-1}(\mathbf{X}_T)$, which can then be used as the initial value in local search methods, for example, in the VPGN [11] and MGN [12] methods.
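As an illustration of the structure of (5), the following sketch implements the unweighted special case $\mathbf{L} = \mathbf{I}_L$, $\mathbf{R} = \mathbf{I}_K$, for which $\Pi_{\mathcal{M}_r}$ is the rank-$r$ truncated SVD and $\Pi_{\mathcal{H}}$ is averaging along anti-diagonals; the weighted projectors used in the paper are more involved and are not reproduced here.

```python
import numpy as np

def proj_rank(X, r):
    """Projection onto M_r in the Frobenius norm: truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def proj_hankel(X):
    """Projection onto Hankel matrices in the Frobenius norm:
    average the entries on each anti-diagonal i + j = const."""
    L, K = X.shape
    Y = np.zeros_like(X)
    for d in range(L + K - 1):
        idx = [(i, d - i) for i in range(max(0, d - K + 1), min(L, d + 1))]
        m = np.mean([X[i, j] for i, j in idx])
        for i, j in idx:
            Y[i, j] = m
    return Y

def cadzow(series, L, r, n_iter=10):
    """Basic (unweighted) Cadzow iterations, the special case of (5)."""
    X = np.column_stack([series[k:k + L] for k in range(len(series) - L + 1)])
    for _ in range(n_iter):
        X = proj_hankel(proj_rank(X, r))
    return np.concatenate([X[:, 0], X[-1, 1:]])   # back to a series
```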
The problem of finding matrices $\mathbf{L}$ and $\mathbf{R}$ (let us call them “matrix weights”, to distinguish them from the weight matrix $\mathbf{W}$) that correspond to a given matrix $\mathbf{W}_0 = \boldsymbol{\Sigma}^{-1}$ has already been studied. The choice $\mathbf{W}_0 = \boldsymbol{\Sigma}^{-1}$ is natural for a model with noise autocovariance matrix $\boldsymbol{\Sigma}$ and leads to the maximum likelihood estimation of the signal. For the case of autoregressive noise (and for white noise in particular), it is shown in [8] that the equivalence of the problems (1) and (3) is possible only if either the matrix $\mathbf{L}$ or $\mathbf{R}$ is degenerate; this is inappropriate when solving the problem with methods of the Cadzow type; see ([13], Remark 4). Thereby, nondegenerate matrices $\mathbf{L}$ and $\mathbf{R}$, which give a matrix $\mathbf{W}$ approximating $\boldsymbol{\Sigma}^{-1}$, have to be used. The papers [8,14] are devoted to finding such matrices.
The problem to be solved is parameterized by a parameter $0 < \alpha \leq 1$. Assume that the matrix $\mathbf{L}$ is fixed and the matrix $\mathbf{R}$ is subject to the following constraints: $\mathbf{R}$ is diagonal, and its condition number is bounded, say, $\operatorname{cond} \mathbf{R} \leq 1/\alpha$; that is, $\alpha$ is the parameter regulating the conditioning. By decreasing $\alpha$, we can obtain weights closer to the required weights $\mathbf{W}_0$, but this reduces the convergence rate of the Cadzow method (5) [13]. Accordingly, by choosing $\alpha$ close to or equal to 1, we get iterations that converge faster, but the weights used are far from optimal, which leads to a loss of accuracy. The question arises: is there a way to modify the Cadzow iterations to obtain both fast convergence and high accuracy?
Proposed approach. We propose to consider “mixed Cadzow iterations” defined in the following way:
$$\mathbf{X}_T = \bigl(\Pi_{\mathcal{H},(\mathbf{L}_2,\mathbf{R}_2)} \Pi_{\mathcal{M}_r,(\mathbf{L}_2,\mathbf{R}_2)}\bigr)^{T_2} \bigl(\Pi_{\mathcal{H},(\mathbf{L}_1,\mathbf{R}_1)} \Pi_{\mathcal{M}_r,(\mathbf{L}_1,\mathbf{R}_1)}\bigr)^{T_1} \mathbf{X}_0, \tag{6}$$
where $T = T_1 + T_2$ and $T_1, T_2 \geq 0$ are the numbers of iterations performed with the matrix weights $(\mathbf{L}_1, \mathbf{R}_1)$ and $(\mathbf{L}_2, \mathbf{R}_2)$, respectively. The approach is to use different weights $(\mathbf{L}_1, \mathbf{R}_1)$ and $(\mathbf{L}_2, \mathbf{R}_2)$, corresponding to different $\alpha$, to achieve an estimate of the signal that is both sufficiently accurate and rapidly computable.
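Schematically, the proposal only changes which projector pair is applied at each step. A sketch, where `cadzow_step_accurate` and `cadzow_step_fast` are hypothetical callables assumed to implement one weighted iteration of (5) for the weight pairs $(\mathbf{L}_1, \mathbf{R}_1)$ and $(\mathbf{L}_2, \mathbf{R}_2)$:

```python
def mixed_cadzow(X0, cadzow_step_accurate, cadzow_step_fast, T1, T2):
    """Mixed Cadzow iterations (6): T1 steps with the accurate (slow) weights
    (L1, R1), then T2 steps with the fast (less accurate) weights (L2, R2)."""
    X = X0
    for _ in range(T1):
        X = cadzow_step_accurate(X)
    for _ in range(T2):
        X = cadzow_step_fast(X)
    return X
```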
Main results. The main result consists of two parts. Theorem 2 contains the limit properties of (6) in the general case of alternating projections. Theorem 6, which is its corollary, is used to justify the proposed algorithm with mixed alternating projections for solving the problem (3). Section 5.2 demonstrates the capabilities of the proposed algorithm.
Paper’s structure. In this paper, we study general properties of alternating projections (Section 2) and mixed alternating projections (Section 3) and then apply them to Cadzow iterations with a justification of the choice of weights for the projectors participating in the mixed iterations. To do this, we consider a linearized version of the algorithm, where the projection on the set of rank-deficient matrices is replaced by the projection on the tangent subspace to the set of such matrices (Section 4). This allows us to apply the linear operator theory to the mixed iterations (Section 5). Section 5.1 discusses how the algorithm’s parameters affect the accuracy and rate of convergence. In Section 5.2, the obtained results are confirmed by a numerical experiment. Section 6 and Section 7 conclude the paper.

2. Alternating Projections

Consider the linear case, where we are looking for a solution to the optimization problem
$$\min_{\mathsf{S} \in \mathbb{A} \cap \mathbb{B}} \|\mathsf{X} - \mathsf{S}\|, \tag{7}$$
where $\mathbb{A}$ and $\mathbb{B}$ are linear subspaces of $\mathbb{R}^M$ with a nonempty intersection of dimension $m < M$.
The solution to the problem can be found using the method of alternating projections with the initial value $\mathsf{S}^{(0)} = \mathsf{X}$ and the iteration step
$$\mathsf{S}^{(k+1)} = \Pi_{\mathbb{A}} \Pi_{\mathbb{B}} \mathsf{S}^{(k)}. \tag{8}$$
The following statement is known.
Theorem 1
([15], Theorem 13.7).
$$\bigl(\Pi_{\mathbb{A}} \Pi_{\mathbb{B}}\bigr)^{n} \xrightarrow[n \to +\infty]{} \Pi_{\mathbb{A} \cap \mathbb{B}},$$
that is, the limit of the iterations is equal to the projection onto the intersection of the subspaces $\mathbb{A}$ and $\mathbb{B}$.
This allows us to prove the following property. Denote $\mathbf{P}_{\mathbb{A},\mathbb{B}} = \Pi_{\mathbb{A}} \Pi_{\mathbb{B}}$.
Corollary 1.
All the eigenvalues of the matrix $\mathbf{P}_{\mathbb{A},\mathbb{B}}$ lie within the unit circle; the eigenvalue $\lambda = 1$, which is the only one of modulus 1, is semi-simple and has multiplicity equal to $m = \dim(\mathbb{A} \cap \mathbb{B})$; the corresponding eigenvectors form a basis of $\mathbb{A} \cap \mathbb{B}$.
Proof. 
The statement of the corollary is obviously true in the case of a diagonalizable matrix $\mathbf{P}_{\mathbb{A},\mathbb{B}}$, since by Theorem 1, $\mathbf{P}_{\mathbb{A},\mathbb{B}}^{n}$ tends to a projection matrix of rank $m$ whose eigenvalues (1 of multiplicity $m$ and 0 of multiplicity $M - m$) are the limits of the powers of the eigenvalues of $\mathbf{P}_{\mathbb{A},\mathbb{B}}$. In the general case, the result is based on the statement “Limits of Powers” in ([16], page 630), which, among other things, states that the eigenvalue 1 is semi-simple, i.e., its algebraic and geometric multiplicities coincide. The geometric multiplicity $m$ of the unit root follows from the fact that the vectors from $\mathbb{A} \cap \mathbb{B}$ (and only they) correspond to the eigenvalue 1 of the matrix $\mathbf{P}_{\mathbb{A},\mathbb{B}}$. □
The following proposition establishes the convergence rate of alternating projections.
Proposition 1.
The following inequality holds:
$$\bigl\|\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{n} - \Pi_{\mathbb{A} \cap \mathbb{B}}\bigr) \mathsf{X}\bigr\| \leq C \|\mathsf{X}\| \bigl(|\hat{\lambda}| + \delta\bigr)^{n}, \tag{9}$$
where $C > 0$ is some positive constant, $\delta > 0$ is an arbitrarily small real number and $\hat{\lambda}$ is the largest non-unit (that is, $(m+1)$-th) eigenvalue of the matrix $\mathbf{P}_{\mathbb{A},\mathbb{B}}$.
Proof. 
The theorem ([17], Theorem 2.12) implies that for a matrix $\mathbf{D}$, the convergence rate of $\mathbf{D}^{k}$ to the limit matrix $\mathbf{D}^{\infty}$ is determined by the spectral radius $\gamma = \rho(\mathbf{D} - \mathbf{D}^{\infty})$ of the difference matrix. Namely, for any $\mu > \gamma$, there exists a constant $C$ such that $\|\mathbf{D}^{k} - \mathbf{D}^{\infty}\| \leq C \mu^{k}$, where $\|\cdot\|$ is the induced norm. By Corollary 1, both matrices $\mathbf{P}_{\mathbb{A},\mathbb{B}}$ and $\Pi_{\mathbb{A} \cap \mathbb{B}}$ have the eigenvalue one corresponding to the same eigenspace. Since the other eigenvalues of $\Pi_{\mathbb{A} \cap \mathbb{B}}$ are zero, the spectral radius of the difference matrix is $|\hat{\lambda}|$, which proves the statement. □
Thus, the convergence rate of the algorithm with iterations given by (8) is determined by the modulus of the eigenvalue, $|\hat{\lambda}|$: the closer it is to one, the slower the algorithm converges.
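A small numerical check of Theorem 1 and Proposition 1 can be made with two planes in $\mathbb{R}^5$ that share a one-dimensional intersection. The sketch below (an illustration, not taken from the paper) compares $\|\mathbf{P}_{\mathbb{A},\mathbb{B}}^{n} - \Pi_{\mathbb{A}\cap\mathbb{B}}\|$ with the predicted geometric rate $|\hat{\lambda}|^{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(5)                         # the common direction of A and B
A = np.column_stack([v, rng.standard_normal(5)])   # basis of the plane A
B = np.column_stack([v, rng.standard_normal(5)])   # basis of the plane B

def proj(M):
    """Orthogonal projector onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

P = proj(A) @ proj(B)                  # P_{A,B}
Pi = np.outer(v, v) / (v @ v)          # projector onto A ∩ B = span{v} (generic case)

lam = np.sort(np.abs(np.linalg.eigvals(P)))[-2]    # largest non-unit eigenvalue
for n in (5, 10, 20):
    err = np.linalg.norm(np.linalg.matrix_power(P, n) - Pi, 2)
    print(n, err, lam ** n)            # err decays at the geometric rate lam**n
```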
Remark 1.
Using the commutativity properties $\Pi_{\mathbb{A} \cap \mathbb{B}} \Pi_{\mathbb{A}} = \Pi_{\mathbb{A}} \Pi_{\mathbb{A} \cap \mathbb{B}} = \Pi_{\mathbb{A} \cap \mathbb{B}}$ and $\Pi_{\mathbb{A} \cap \mathbb{B}} \Pi_{\mathbb{B}} = \Pi_{\mathbb{B}} \Pi_{\mathbb{A} \cap \mathbb{B}} = \Pi_{\mathbb{A} \cap \mathbb{B}}$, the expression $\mathbf{P}_{\mathbb{A},\mathbb{B}}^{n} - \Pi_{\mathbb{A} \cap \mathbb{B}}$ on the left side of the inequality (9) can be rewritten as
$$\mathbf{P}_{\mathbb{A},\mathbb{B}}^{n} - \Pi_{\mathbb{A} \cap \mathbb{B}} = \mathbf{P}_{\hat{\mathbb{A}},\hat{\mathbb{B}}}^{n},$$
where $\mathbf{P}_{\hat{\mathbb{A}},\hat{\mathbb{B}}} = \Pi_{\hat{\mathbb{A}}} \Pi_{\hat{\mathbb{B}}}$, $\Pi_{\hat{\mathbb{A}}} = \Pi_{\mathbb{A}} - \Pi_{\mathbb{A} \cap \mathbb{B}}$, $\Pi_{\hat{\mathbb{B}}} = \Pi_{\mathbb{B}} - \Pi_{\mathbb{A} \cap \mathbb{B}}$, and $\hat{\mathbb{A}}$ and $\hat{\mathbb{B}}$ are the orthogonal complements of $\mathbb{A} \cap \mathbb{B}$ in the subspaces $\mathbb{A}$ and $\mathbb{B}$, respectively. Then $\hat{\lambda}$ is the largest (in absolute value) eigenvalue of the matrix $\mathbf{P}_{\hat{\mathbb{A}},\hat{\mathbb{B}}}$.
Let us show the relation of $\mathbf{P}_{\mathbb{A},\mathbb{B}}$ to canonical correlations and principal angles between subspaces.
Proposition 2.
Let $\mathbf{A}$ be the matrix whose columns form a basis of the subspace $\mathbb{A}$ and $\mathbf{B}$ be the matrix whose columns form a basis of the subspace $\mathbb{B}$. Then, the following is true:
$$\mathbf{P}_{\mathbb{A},\mathbb{B}} = \bigl(\mathbf{B}^{\mathrm{T}} \mathbf{B}\bigr)^{-1} \mathbf{B}^{\mathrm{T}} \mathbf{A} \bigl(\mathbf{A}^{\mathrm{T}} \mathbf{A}\bigr)^{-1} \mathbf{A}^{\mathrm{T}} \mathbf{B}. \tag{10}$$
Proof. 
The formula (10) is obtained by multiplying the solutions of two least-squares approximation problems related to the subspaces A and B . □
Remark 2.
The matrix on the right side of the equality (10) arises in the problem of computing canonical correlations. The squares of the canonical correlations are equal to the squared cosines of the so-called principal angles between the subspaces $\mathbb{A}$ and $\mathbb{B}$ spanned by the columns of the matrices, and they are calculated via the eigenvalues of this matrix.
Since we need only the leading principal angle, which is the angle between subspaces, we will use the simpler definition of the cosine of the angle between subspaces through the inner product.
Definition 1
([18]). Let $\mathbb{A}$ and $\mathbb{B}$ be two linear subspaces of a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and the corresponding (semi-)norm $\|\cdot\|$. Then, the cosine of the angle between $\mathbb{A}$ and $\mathbb{B}$ is defined as
$$\cos(\mathbb{A}, \mathbb{B}) = \max_{A \in \mathbb{A},\, B \in \mathbb{B}} \frac{|\langle A, B \rangle|}{\|A\| \, \|B\|}.$$
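In the Euclidean case, the cosines of all principal angles are the singular values of $\mathbf{Q}_{\mathbb{A}}^{\mathrm{T}} \mathbf{Q}_{\mathbb{B}}$, where $\mathbf{Q}_{\mathbb{A}}$ and $\mathbf{Q}_{\mathbb{B}}$ are orthonormal bases [18]; the cosine of Definition 1 is the largest of them. A minimal sketch:

```python
import numpy as np

def cos_between_subspaces(A, B):
    """Cosine of the (first principal) angle between colspace(A) and colspace(B)."""
    Qa, _ = np.linalg.qr(A)    # orthonormal basis of A
    Qb, _ = np.linalg.qr(B)    # orthonormal basis of B
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)[0]

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
B = rng.standard_normal((6, 3))
print(cos_between_subspaces(A, B))   # a value in [0, 1]
```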

3. Mixed Alternating Projections

Suppose that different norms may be considered in the problem (7). For example, such a case arises when a weighted least-squares (WLS) problem is solved. Such a problem can be solved both with the weight matrix that is optimal in terms of the error of the WLS estimates and with suboptimal ones.
Thus, consider two equivalent norms and assume that the application of the method of alternating projections in the first norm leads to a more accurate estimate of the solution but the algorithm converges slowly, whereas alternating projections in the second norm, on the contrary, converge faster but the accuracy of the solution is worse.
We propose a way to speed up the convergence of the algorithm using the first norm, almost without worsening the accuracy of the solution. Consider the projectors in the two equivalent norms and denote the corresponding matrices $\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}$ and $\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}$; that is, $\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(i)} = \Pi_{\mathbb{A}}^{(i)} \Pi_{\mathbb{B}}^{(i)}$, $i = 1, 2$, where both projections are taken in the $i$-th norm.
The mixed alternating projections are defined as
$$\mathsf{X}_T = \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}\bigr)^{T_2} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \mathsf{X}_0, \tag{11}$$
where $T = T_1 + T_2$. Since this formula uses only linear operators, we can prove the main theorem of this paper.
Theorem 2.
For the algorithm given by (11), the following relations are true, where the convergence is in the operator norm:
1. 
$\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}\bigr)^{T_2} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \to \Pi_{\mathbb{A} \cap \mathbb{B}}^{(1)}$ if $T_2$ is fixed and $T_1 \to +\infty$.
2. 
There exists an equivalent third norm such that $\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}\bigr)^{T_2} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \to \Pi_{\mathbb{A} \cap \mathbb{B}}^{(3)}$ if $T_1$ is fixed and $T_2 \to +\infty$.
Proof. 
The proof of the first statement follows from Theorem 1: $\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \to \Pi_{\mathbb{A} \cap \mathbb{B}}^{(1)}$ as $T_1 \to +\infty$, and also $\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}\bigr)^{T_2} \Pi_{\mathbb{A} \cap \mathbb{B}}^{(1)} = \Pi_{\mathbb{A} \cap \mathbb{B}}^{(1)}$ for any $T_2$.
To prove the second statement, we first apply Theorem 1 to the matrix $\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(2)}\bigr)^{T_2}$ as $T_2 \to +\infty$ and then consider the matrix $\tilde{\mathbf{P}} = \Pi_{\mathbb{A} \cap \mathbb{B}}^{(2)} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1}$. It is sufficient to show that the matrix $\tilde{\mathbf{P}}$ defines a projection, i.e., $\tilde{\mathbf{P}} = \tilde{\mathbf{P}} \tilde{\mathbf{P}}$; then, the required projection is $\tilde{\mathbf{P}} = \Pi_{\mathbb{A} \cap \mathbb{B}}^{(3)}$ in a third norm. By direct substitution, we obtain
$$\tilde{\mathbf{P}} \tilde{\mathbf{P}} = \Pi_{\mathbb{A} \cap \mathbb{B}}^{(2)} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \Pi_{\mathbb{A} \cap \mathbb{B}}^{(2)} \bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1}.$$
Since $\bigl(\mathbf{P}_{\mathbb{A},\mathbb{B}}^{(1)}\bigr)^{T_1} \Pi_{\mathbb{A} \cap \mathbb{B}}^{(2)} = \Pi_{\mathbb{A} \cap \mathbb{B}}^{(2)}$ for any $T_1$, the second statement is proved. □
Theorem 2 has the following interpretation: if one first makes $T_1$ steps with projections in the first norm in the iteration (11) and then iterates with projections in the second norm, then the obtained result will still be a projection onto the desired subspace (item 2), and the larger $T_1$ is, the closer this projection will be to the projection in the first norm (item 1). This implies that to achieve fast and accurate estimation, the first norm should correspond to a more accurate solution of the problem and possibly slower convergence of the alternating projections, while the second norm can correspond to a less accurate solution but faster iterations. The rate of convergence of the corresponding operators is explained by Proposition 1.
A general rule in optimization is that the accuracy of two successive optimization methods is inherited from the second method, so the first method is chosen to be fast and possibly inaccurate, and the second method to be more accurate. Theorem 2 shows that in alternating projections, the situation is in some sense the opposite. This can be explained as follows. Optimization methods use the target function at each step, and can therefore adjust the steps. In the case of alternating projections, initial data are used only to compute the first iteration. Therefore, it is important to perform the first iterations in the proper direction.

4. Linearized Cadzow Iterations in the Signal-Estimation Problem

In the next Section 5, we apply the approach proposed in Section 3 to solving the SLRA problem (3) for estimating signals in time series. Since this approach considers alternating projections onto linear subspaces, we start with a linearization of the Cadzow iteration (5). The Cadzow iteration is not linear because it uses the projection onto the nonconvex set $\mathcal{M}_r$ of matrices of rank not larger than $r$, which is not a linear subspace but is a smooth manifold [19].
Let us sketch the linearization in the neighborhood of a time series $\mathsf{S}_0 \in \mathcal{D}_r$ of rank $r$. First, we consider the set $\mathcal{M}_r$ in the neighborhood of the point $\mathbf{S}_0 = \mathcal{T}_L(\mathsf{S}_0)$. The projection in the neighborhood of this point is correctly defined if we solve the signal-estimation problem for a noisy time series of the form $\mathsf{X} = \mathsf{S}_0 + \xi$, where $\xi$ is a random vector with zero mean and an autocovariance matrix proportional to $\mathbf{W}_0^{-1} = \boldsymbol{\Sigma}_0$, the variances are small enough, and $\mathbf{W}_0$ is a given weight matrix. After that, we construct a linearized iteration by replacing the set $\mathcal{M}_r$ with its tangent subspace at $\mathbf{S}_0$. Then, we can reduce the linearized iteration to multiplication by a real matrix and find its properties. As a result, we obtain asymptotic relations for the linearization, which explain the behaviour of the full Cadzow iteration (5).

4.1. Linearization of the Operator $\Pi_{\mathcal{M}_r,(\mathbf{L},\mathbf{R})}$

Consider the matrix $\mathbf{S}_0 \in \mathbb{R}^{L \times K}$, $L > r$ and $K > r$, where $\mathbf{S}_0 = \mathcal{T}_L(\mathsf{S}_0)$ and $\mathsf{S}_0$ is a series of rank $r$; hence, the matrix $\mathbf{S}_0$ has rank $r$. In the neighborhood of the point $\mathbf{S}_0$, the set $\mathcal{M}_r$ is a smooth manifold of dimension $r(L + K) - r^2$, and the tangent subspace $\mathcal{N} = \mathcal{N}(\mathbf{S}_0)$ at $\mathbf{S}_0$ is expressed as follows [19]:
$$\mathcal{N}(\mathbf{S}_0) = \bigl\{\mathbf{X} \in \mathbb{R}^{L \times K} : \mathbf{U}^{\mathrm{T}} \mathbf{X} \mathbf{V} = \mathbf{0}_{L-r,\,K-r}\bigr\}, \tag{12}$$
where $\mathbf{U} \in \mathbb{R}^{L \times (L-r)}$ and $\mathbf{V} \in \mathbb{R}^{K \times (K-r)}$ are arbitrary full-rank matrices such that $\mathbf{U}^{\mathrm{T}} \mathbf{S}_0 = \mathbf{0}_{L-r,\,K}$ and $\mathbf{S}_0 \mathbf{V} = \mathbf{0}_{L,\,K-r}$. Note that the definition is correct because $\mathcal{N}(\mathbf{S}_0)$ depends only on the column space and the row space of $\mathbf{S}_0$.
Let us denote by $\Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} : \mathbb{R}^{L \times K} \to \mathcal{N}(\mathbf{S}_0)$ the projection onto $\mathcal{N}(\mathbf{S}_0)$ in the norm generated by (4). Then, the linearized Cadzow iteration in the neighborhood of the point $\mathbf{S}_0$ has the following form:
$$\mathbf{X}_{k+1} = \Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} \mathbf{X}_k. \tag{13}$$
Set $\mathbf{X}_0 = \mathcal{T}_L(\mathsf{X}_0)$ and $\mathsf{X}_k = \mathcal{T}_L^{-1}(\mathbf{X}_k)$, $k \geq 1$. Then, we can rewrite (13) as
$$\mathsf{X}_{k+1} = \mathcal{T}_L^{-1}\bigl(\Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} \mathcal{T}_L(\mathsf{X}_k)\bigr).$$
Note that each of the operators inside the brackets is linear. Hence, their composition is a linear operator and can therefore be written as a matrix, which we denote $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$:
$$\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})} = \mathcal{T}_L^{-1} \circ \Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} \circ \mathcal{T}_L, \qquad \mathsf{X}_{k+1} = \mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})} \mathsf{X}_k. \tag{14}$$
Formally, we can calculate the elements of the matrix $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ by successively substituting the $N$ standard unit vectors into the composition of operators.
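A sketch of this formal calculation for an arbitrary linear map `op` on $\mathbb{R}^N$ (a hypothetical callable standing for $\mathsf{X} \mapsto \mathcal{T}_L^{-1}(\Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} \mathcal{T}_L(\mathsf{X}))$; the weighted projectors themselves are not reproduced here):

```python
import numpy as np

def operator_matrix(op, N):
    """Matrix of a linear operator op: R^N -> R^N, assembled column by column
    from the images of the N standard unit vectors."""
    P = np.zeros((N, N))
    for i in range(N):
        e = np.zeros(N)
        e[i] = 1.0
        P[:, i] = op(e)   # the i-th column is op(e_i)
    return P
```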

4.2. Intersection of the Subspaces $\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0)$

For a compact representation of the further results, let us describe the intersection of the tangent subspace to the set $\mathcal{M}_r$ of matrices of rank at most $r$ and the space of Hankel matrices $\mathcal{H}$.
Let us gather necessary definitions from [12].
Definition 2.
We say that a series $\mathsf{S}_0 \in \mathbb{R}^N$ is governed by a generalized linear recurrence relation (GLRR) with coefficients $\mathbf{a}$, where $\mathbf{a} \in \mathbb{R}^{r+1}$ and $\mathbf{a} \neq \mathbf{0}_{r+1}$, if $\mathbf{a}^{\mathrm{T}} \mathcal{T}_{r+1}(\mathsf{S}_0) = \mathbf{0}_{N-r}^{\mathrm{T}}$. We denote by $\mathcal{Z}_N(\mathbf{a}) = \mathcal{Z}(\mathbf{a}) = \{\mathsf{S} \in \mathbb{R}^N : \mathbf{a}^{\mathrm{T}} \mathcal{T}_{r+1}(\mathsf{S}) = \mathbf{0}_{N-r}^{\mathrm{T}}\}$ the subspace of time series governed by the GLRR($\mathbf{a}$).
Consider an equivalent form of $\mathcal{Z}(\mathbf{a})$. Denote by $\mathcal{Q}_{M,d}$ the operator $\mathbb{R}^{d+1} \to \mathbb{R}^{M \times (M-d)}$ defined as follows:
$$\mathcal{Q}_{M,d}(\mathbf{b}) = \begin{pmatrix} b_1 & 0 & \cdots & 0 \\ b_2 & b_1 & \ddots & \vdots \\ \vdots & b_2 & \ddots & 0 \\ b_{d+1} & \vdots & \ddots & b_1 \\ 0 & b_{d+1} & \ddots & b_2 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & b_{d+1} \end{pmatrix}, \tag{15}$$
where $\mathbf{b} = (b_1, \ldots, b_{d+1})^{\mathrm{T}} \in \mathbb{R}^{d+1}$. Then, $\mathcal{Z}(\mathbf{a})$ can be represented in another form as
$$\mathcal{Z}(\mathbf{a}) = \bigl\{\mathsf{S} \in \mathbb{R}^N : \mathbf{Q}^{\mathrm{T}}(\mathbf{a}) \mathsf{S} = \mathbf{0}_{N-r}\bigr\}, \tag{16}$$
where $\mathbf{Q} = \mathcal{Q}_{N,r}$ [20].
Remark 3.
Since the columns and rows of the trajectory matrix of a time series in $\mathcal{D}_r$ satisfy the same GLRR($\mathbf{a}$) as the original series does, we can take $\mathcal{Q}_{L,r}(\mathbf{a})$ and $\mathcal{Q}_{K,r}(\mathbf{a})$ as $\mathbf{U}$ and $\mathbf{V}$ in (12), respectively.
Definition 3.
Let $\mathbf{a} = (a_1, \ldots, a_k)^{\mathrm{T}} \in \mathbb{R}^k$. Denote by $\mathbf{a}^{\otimes 2} = (a_1^{(2)}, \ldots, a_{2k-1}^{(2)})^{\mathrm{T}} \in \mathbb{R}^{2k-1}$ the acyclic convolution of the vector with itself:
$$a_i^{(2)} = \sum_{j = \max(1,\, i-k+1)}^{\min(i,\, k)} a_j a_{i-j+1}. \tag{17}$$
As shown in [12], the tangent subspace to $\mathcal{D}_r$ at $\mathsf{S}_0$ coincides with $\mathcal{Z}(\mathbf{a}_0^{\otimes 2})$, $\dim \mathcal{Z}(\mathbf{a}_0^{\otimes 2}) = 2r$. The following theorem establishes the relation between the tangent subspaces to the manifold of time series governed by GLRRs and the manifold of their trajectory matrices.
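To make the notation concrete, the sketch below builds $\mathcal{Q}_{M,d}(\mathbf{b})$ from (15), checks that the sinusoid of Section 4.4.2 satisfies the GLRR with $\mathbf{a}_0 = (1, -1, 1)^{\mathrm{T}}$ (this particular vector is our assumption, following from $s_{n+2} = s_{n+1} - s_n$ for $\sin(2\pi n/6)$), and forms $\mathbf{a}_0^{\otimes 2}$ as an acyclic convolution.

```python
import numpy as np

def Q(b, M):
    """The M x (M - d) matrix Q_{M,d}(b) from (15): shifted copies of b in the columns."""
    d = len(b) - 1
    Qm = np.zeros((M, M - d))
    for j in range(M - d):
        Qm[j:j + d + 1, j] = b
    return Qm

N = 40
S0 = np.sin(2 * np.pi * np.arange(1, N + 1) / 6)
a0 = np.array([1.0, -1.0, 1.0])             # GLRR(a0): s_n - s_{n+1} + s_{n+2} = 0
print(np.allclose(Q(a0, N).T @ S0, 0))      # True: S0 lies in Z(a0)

a0_sq = np.convolve(a0, a0)                 # acyclic convolution a0 ⊗ a0, length 2r + 1 = 5
print(np.allclose(Q(a0_sq, N).T @ S0, 0))   # True: S0 also lies in Z(a0 ⊗ a0)
```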
Theorem 3.
Let $\mathsf{S}_0 \in \mathcal{D}_r$, i.e., let $\mathsf{S}_0$ be a series of rank $r$, $r < \min(L, K)$, and $\mathsf{S}_0 \in \mathcal{Z}(\mathbf{a}_0)$, i.e., let $\mathsf{S}_0$ be governed by the GLRR($\mathbf{a}_0$) for a vector $\mathbf{a}_0 \in \mathbb{R}^{r+1}$. Then, $\mathcal{T}_L^{-1}\bigl(\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0)\bigr) = \mathcal{Z}(\mathbf{a}_0^{\otimes 2})$, where $\mathbf{S}_0 = \mathcal{T}_L(\mathsf{S}_0)$.
Proof. 
Consider a matrix $\mathbf{X}$ in the intersection of $\mathcal{H}$ and $\mathcal{N}(\mathbf{S}_0)$. In particular, there exists $\mathsf{X} = (x_1, \ldots, x_N)^{\mathrm{T}}$ such that $\mathbf{X} = \mathcal{T}_L(\mathsf{X})$. Let us calculate the $(i,j)$-th element of the matrix $\mathbf{U}^{\mathrm{T}} \mathbf{X} \mathbf{V}$ from (12), following Remark 3:
$$\sum_{l=1}^{r+1} \sum_{p=1}^{r+1} a_l a_p x_{(i+l-1)+(p+j-1)-1} = \sum_{l=1}^{r+1} \sum_{p=1}^{r+1} a_l a_p x_{(i+j-1)+p+l-2} = \sum_{k=1}^{2r+1} a_k^{(2)} x_{(i+j-1)+k-1} = 0, \tag{18}$$
where $a_i^{(2)}$ are the coefficients of the vector $\mathbf{a}_0^{\otimes 2}$; see the definition (17). The values of $i + j - 1$, where $i$ runs from 1 to $L - r$ and $j$ runs from 1 to $K - r$, cover the entire range from 1 to $N - 2r$, and therefore, (18) is equivalent to $\mathbf{Q}(\mathbf{a}^{\otimes 2})^{\mathrm{T}} \mathsf{X} = \mathbf{0}_{N-2r}$. Thus, according to (16), all such $\mathsf{X}$ constitute $\mathcal{Z}(\mathbf{a}^{\otimes 2})$. □

4.3. Limit Property of the Composition of Projections

First, consider the relation connecting the (semi-)norm $\|\mathsf{Z}\|_{\mathbf{W}}$ generated by the inner product (2) and the (semi-)norm $\|\mathbf{Z}\|_{\mathbf{L},\mathbf{R}}$ generated by the inner product (4) (semi-norms arise if the matrices $\mathbf{W}$, $\mathbf{L}$ and $\mathbf{R}$ are not required to be nondegenerate). The mapping $\mathcal{T}_L$ preserves the norm if $\|\mathsf{Z}\|_{\mathbf{W}} = \|\mathbf{Z}\|_{\mathbf{L},\mathbf{R}}$ for any $\mathsf{Z}$, where $\mathbf{Z} = \mathcal{T}_L(\mathsf{Z})$, $\mathbf{L}$ and $\mathbf{R}$ are matrix weights and $\mathbf{W}$ is a weight matrix.
Define the convolution of two matrices, $\mathbf{C} = \mathbf{A} \circledast \mathbf{B}$, for $\mathbf{A} \in \mathbb{R}^{L \times L}$, $\mathbf{B} \in \mathbb{R}^{K \times K}$ and $\mathbf{C} \in \mathbb{R}^{N \times N}$ as
$$c_{i,j} = \sum_{i_1 + i_2 - 1 = i} \; \sum_{j_1 + j_2 - 1 = j} a_{i_1, j_1} b_{i_2, j_2}, \tag{19}$$
where $\mathbf{A} = (a_{i_1, j_1})$, $\mathbf{B} = (b_{i_2, j_2})$ and $\mathbf{C} = (c_{i,j})$.
There is a known relation linking the weight matrix to the matrix weights.
Theorem 4
([8]). $\|\mathsf{Z}\|_{\mathbf{W}} = \|\mathbf{Z}\|_{\mathbf{L},\mathbf{R}}$ for any series $\mathsf{Z} \in \mathbb{R}^N$ if and only if $\mathbf{W} = \mathbf{W}(\mathbf{L},\mathbf{R}) = \mathbf{L} \circledast \mathbf{R}$.
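Theorem 4 is easy to verify numerically; the sketch below (with arbitrarily chosen diagonal matrix weights, which is our simplifying assumption) computes $\mathbf{W} = \mathbf{L} \circledast \mathbf{R}$ as a full two-dimensional convolution and compares the two squared norms.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(2)
L_len, K_len = 4, 5
N = L_len + K_len - 1

Lw = np.diag(rng.uniform(0.5, 2.0, L_len))   # diagonal positive-definite matrix weights
Rw = np.diag(rng.uniform(0.5, 2.0, K_len))
W = convolve2d(Lw, Rw)                       # W = L ⊛ R as in (19), size N x N

Z = rng.standard_normal(N)
Zm = np.column_stack([Z[k:k + L_len] for k in range(K_len)])   # trajectory matrix T_L(Z)

norm_W = Z @ W @ Z                           # ||Z||_W^2
norm_LR = np.trace(Lw @ Zm @ Rw @ Zm.T)      # ||T_L(Z)||_{L,R}^2
print(np.allclose(norm_W, norm_LR))          # True
```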
All statements of the following theorem are a direct application of Theorem 1 and Proposition 1, taking into account that
$$\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}^{n} = \bigl(\mathcal{T}_L^{-1} \circ \Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})} \circ \mathcal{T}_L\bigr)^{n} = \mathcal{T}_L^{-1} \circ \bigl(\Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})}\bigr)^{n} \circ \mathcal{T}_L$$
and $\mathcal{T}_L^{-1}\bigl(\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0)\bigr) = \mathcal{Z}(\mathbf{a}_0^{\otimes 2})$.
Theorem 5.
Let $\mathsf{S}_0$ be a time series governed by a GLRR($\mathbf{a}_0$) and $\mathbf{S}_0 = \mathcal{T}_L(\mathsf{S}_0)$. Then:
1. 
The following operator convergence takes place:
$$\bigl(\Pi_{\mathcal{H},(\mathbf{L},\mathbf{R})} \Pi_{\mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})}\bigr)^{n} \xrightarrow[n \to +\infty]{} \Pi_{\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0),(\mathbf{L},\mathbf{R})},$$
that is, the limit of the iterations is equal to the projection onto the intersection of the subspaces $\mathcal{H}$ and $\mathcal{N}(\mathbf{S}_0)$.
2. 
The following matrix convergence is valid:
$$\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}^{n} \xrightarrow[n \to +\infty]{} \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\, \mathbf{L} \circledast \mathbf{R}},$$
where $\boldsymbol{\Pi}_{\mathcal{X},\mathbf{W}}$ denotes the matrix of the projection onto the linear subspace $\mathcal{X}$ in the norm generated by the inner product (2) with the weight matrix $\mathbf{W}$.
3. 
All the eigenvalues of the matrix $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ lie within the unit circle; the eigenvalue $\lambda = 1$, which is the only one of modulus 1, is semi-simple and has multiplicity equal to $2r$; the corresponding eigenvectors form a basis of $\mathcal{Z}(\mathbf{a}_0^{\otimes 2})$. If the weight matrix $\mathbf{W}$ is nondegenerate,
$$\bigl\|\bigl(\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}^{n} - \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\, \mathbf{L} \circledast \mathbf{R}}\bigr) \mathsf{X}\bigr\|_{\mathbf{W}} \leq C \|\mathsf{X}\|_{\mathbf{W}} \bigl(|\hat{\lambda}| + \delta\bigr)^{n},$$
where $C$ is some positive constant, $\delta > 0$ is an arbitrarily small real number, and $\hat{\lambda}$ is the largest non-unit (that is, $(2r+1)$-th) eigenvalue of the matrix $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$.

4.4. Influence of the Matrices $\mathbf{L}$ and $\mathbf{R}$ on the First Principal Angle and the Convergence Rate of the Cadzow Method

Recall that if one wants to solve the problem (1) for a given weight matrix $\mathbf{W}$ using the Cadzow method, one must require $\mathbf{W} = \mathbf{L} \circledast \mathbf{R}$. To obtain the maximum likelihood estimate, the matrix $\mathbf{W}$ is chosen as the inverse of the noise autocovariance matrix. However, for such weight matrices $\mathbf{W}$, in the case of white or autoregressive noise, there are no nondegenerate matrix weights $\mathbf{L}$ and $\mathbf{R}$. Therefore, the matrix weights $\mathbf{L}$ and $\mathbf{R}$ are sought numerically under an upper bound $1/\alpha$ on the condition number of $\mathbf{R}$ for a given $\mathbf{L}$.
The works [13,14] are devoted to the search for appropriate matrix weights. To find these weights, the matrix $\mathbf{L}$ is fixed in some specific way, and a diagonal matrix $\mathbf{R}$ is found numerically so that the discrepancy measure considered in these papers is minimal: $\rho(\mathbf{W}, \mathbf{L} \circledast \mathbf{R}) \to \min_{\mathbf{R}}$ provided $\operatorname{cond} \mathbf{R} \leq 1/\alpha$, where $0 < \alpha \leq 1$. Simulations have shown that the control of the matrix conditioning by the parameter $\alpha$ affects the convergence rate of the Cadzow method and the estimation accuracy. Decreasing $\alpha$ enlarges the set over which the matrix $\mathbf{R}$ is sought; this allows us to obtain a weight matrix with smaller values of $\rho(\mathbf{W}, \mathbf{L} \circledast \mathbf{R})$ and leads to improved accuracy of the signal estimate. However, decreasing $\alpha$ also results in slower convergence of the iterations, which requires a theoretical explanation. A general theoretical dependence on $\alpha$ could not be obtained, so we demonstrate what happens to the convergence rate as $\alpha$ tends to zero.
We know from Theorem 5 that the convergence rate of the method is governed by the $(2r+1)$-th eigenvalue of the matrix $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$. From Remark 2, we know that this eigenvalue is equal to the squared cosine of the $(2r+1)$-th principal angle between the corresponding subspaces.
The calculation of the cosine of the $(2r+1)$-th principal angle between $\mathcal{H}$ and $\mathcal{N}(\mathbf{S}_0)$ can be reduced to finding the cosine of the angle between the orthogonal complements (in the inner product $\langle \cdot, \cdot \rangle_{\mathbf{L},\mathbf{R}}$ given in (4)) of $\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0)$ in $\mathcal{H}$ and $\mathcal{N}(\mathbf{S}_0)$, respectively. Let us denote these complements $\hat{\mathcal{H}}(\mathbf{L},\mathbf{R})$ and $\hat{\mathcal{N}}(\mathbf{S}_0,\mathbf{L},\mathbf{R})$. Note that by Theorem 3, $\mathcal{H} \cap \mathcal{N}(\mathbf{S}_0) = \mathcal{T}_L(\mathcal{Z}(\mathbf{a}_0^{\otimes 2}))$, and its dimension equals $2r$. For brevity, we will omit $\mathbf{S}_0$ in $\hat{\mathcal{N}}(\mathbf{S}_0,\mathbf{L},\mathbf{R})$ and $\mathcal{N}(\mathbf{S}_0)$.
The properties of the principal angles, together with Remark 1, lead directly to the following statement.
Proposition 3.
The cosine of the $(2r+1)$-th principal angle between the subspaces $\mathcal{H}$ and $\mathcal{N}$ is equal to $\cos\bigl(\hat{\mathcal{H}}(\mathbf{L},\mathbf{R}), \hat{\mathcal{N}}(\mathbf{L},\mathbf{R})\bigr)$.

4.4.1. Example of Bad Convergence as α Tends to Zero

Using Proposition 3 and the fact that the cosine of the $(2r+1)$-th principal angle determines the convergence rate (Proposition 1), we show by a specific example that this cosine tends to one as $\alpha$ tends to zero; this makes the convergence of the linearized Cadzow algorithm arbitrarily slow.
Let $N = 2L$ and $K = L + 1$. Let the matrix weight $\mathbf{L} = \mathbf{I}_L$ be the identity matrix, which corresponds to the case of white noise. For simplicity, we take the matrix $\mathbf{R} = \mathbf{R}(\alpha) \in \mathbb{R}^{K \times K}$ equal to $\mathbf{R} = \operatorname{diag}(1, \alpha, \ldots, \alpha, 1)$ according to [13]. Numerical experiments show that similar matrix weights (up to multiplication by a constant) are obtained when applying the weight-search method from [12] for small $\alpha$. Consider the degenerate case $\mathbf{R}(0)$ studied in [13].
Proposition 4.
For any matrix $\mathbf{X} \in \mathbb{R}^{L \times (L+1)}$, we can find a matrix $\mathbf{Y} \in \mathcal{H}$ such that $\|\mathbf{X} - \mathbf{Y}\|_{\mathbf{I}_L, \mathbf{R}(0)} = 0$.
Proof. 
The inner product (4) of matrices $\mathbf{Z}_1$ and $\mathbf{Z}_2$ with the matrix weights $\mathbf{L} = \mathbf{I}_L$ and $\mathbf{R}(0) = \operatorname{diag}(1, 0, \ldots, 0, 1)$ depends only on the first and last columns of these matrices due to the form of $\mathbf{R}(0)$. Therefore, the required matrix $\mathbf{Y}$ can be written in the form $\mathbf{Y} = \Pi_{\mathcal{H},(\mathbf{I}_L,\mathbf{R}(0))} \mathbf{X}$, where the projection is calculated uniquely using the values from the first and last columns of the matrix $\mathbf{X}$. □
Corollary 2.
Let $\mathbf{X} \in \mathbb{R}^{L \times (L+1)}$ be an arbitrary matrix, $\hat{\mathbf{X}} = \mathbf{X} - \Pi_{\mathcal{T}_L(\mathcal{Z}(\mathbf{a}_0^{\otimes 2})),(\mathbf{I}_L,\mathbf{R}(0))} \mathbf{X}$ and $\hat{\mathbf{Y}} = \Pi_{\hat{\mathcal{H}}(\mathbf{I}_L,\mathbf{R}(0)),(\mathbf{I}_L,\mathbf{R}(0))} \hat{\mathbf{X}}$. Then, $\|\hat{\mathbf{X}} - \hat{\mathbf{Y}}\|_{\mathbf{I}_L,\mathbf{R}(0)} = 0$.
Proof. 
The corollary is proved by the following substitution:
$$\Pi_{\hat{\mathcal{H}}(\mathbf{I}_L,\mathbf{R}(0)),(\mathbf{I}_L,\mathbf{R}(0))} = \Pi_{\mathcal{H},(\mathbf{I}_L,\mathbf{R}(0))} - \Pi_{\mathcal{T}_L(\mathcal{Z}(\mathbf{a}_0^{\otimes 2})),(\mathbf{I}_L,\mathbf{R}(0))}. \quad \square$$
Corollary 3.
Let $\mathbf{N} \in \mathcal{N}(\mathbf{S}_0)$. Denote $\hat{\mathbf{N}}(\alpha) = \Pi_{\hat{\mathcal{N}}(\mathbf{I}_L,\mathbf{R}(\alpha)),(\mathbf{I}_L,\mathbf{R}(\alpha))} \mathbf{N}$ and $\hat{\mathbf{H}}(\alpha) = \Pi_{\hat{\mathcal{H}}(\mathbf{I}_L,\mathbf{R}(\alpha)),(\mathbf{I}_L,\mathbf{R}(\alpha))} \hat{\mathbf{N}}(\alpha)$. Then, the following holds:
$$\|\hat{\mathbf{N}}(\alpha)\|_{\mathbf{I}_L,\mathbf{R}(\alpha)} \xrightarrow[\alpha \to 0]{} \|\hat{\mathbf{N}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)}, \qquad \|\hat{\mathbf{H}}(\alpha)\|_{\mathbf{I}_L,\mathbf{R}(\alpha)} \xrightarrow[\alpha \to 0]{} \|\hat{\mathbf{H}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)},$$
$$\|\hat{\mathbf{N}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)} = \|\hat{\mathbf{H}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)}, \qquad \langle \hat{\mathbf{N}}(\alpha), \hat{\mathbf{H}}(\alpha) \rangle_{\mathbf{I}_L,\mathbf{R}(\alpha)} \xrightarrow[\alpha \to 0]{} \|\hat{\mathbf{N}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)}^{2}.$$
Proof. 
Since $\hat{\mathbf{N}}(\alpha) = \mathbf{N} - \Pi_{\mathcal{T}_L(\mathcal{Z}(\mathbf{a}_0^{\otimes 2})),(\mathbf{I}_L,\mathbf{R}(\alpha))} \mathbf{N}$, this expression is smooth in $\alpha$ and uniquely defined at $\alpha = 0$. The projection $\Pi_{\hat{\mathcal{H}}(\mathbf{I}_L,\mathbf{R}(\alpha)),(\mathbf{I}_L,\mathbf{R}(\alpha))}$ is also smooth in $\alpha$ and uniquely defined at $\alpha = 0$. Applying Corollary 2 with $\mathbf{X} = \mathbf{N}$ and $\alpha = 0$, we obtain $\|\hat{\mathbf{N}}(0) - \hat{\mathbf{H}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)} = 0$; this yields the statements of the corollary due to the smoothness of the left-hand and right-hand expressions. □
The announced result is given in the following proposition.
Proposition 5.
Let $L > 2r > 0$ and $N = 2L$. Then,
$$\cos\bigl(\hat{\mathcal{H}}(\mathbf{I}_L,\mathbf{R}(\alpha)),\, \hat{\mathcal{N}}(\mathbf{I}_L,\mathbf{R}(\alpha))\bigr) \xrightarrow[\alpha \to 0]{} 1.$$
Proof. 
Note that the cosine of the angle between a vector and a subspace (see Definition 1) is equal to the cosine of the angle between this vector and its projection onto the subspace. Thus, we can use Corollary 3. It follows from Corollary 3 that, in the limit, the numerator and denominator in the definition of the cosine are equal, and therefore, it is sufficient to prove that $\|\hat{\mathbf{N}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)} \neq 0$. The latter can be shown by constructing such a matrix $\hat{\mathbf{N}}(0)$ from the trajectory matrix $\mathbf{S}_0 = \mathcal{T}_L(\mathsf{S}_0)$.
Let $\mathsf{S}^{\mathrm{T}}$ be a row of the matrix $\mathbf{S}_0$ with a non-zero element at the first or last position (such a row always exists since $\mathbf{S}_0 \neq \mathbf{0}$). Without loss of generality, we suppose that the first element of $\mathsf{S}^{\mathrm{T}}$ is non-zero. Denote by $G : \mathbb{R}^N \to \mathbb{R}^L$ the mapping that truncates a vector to its first $L$ coordinates. Then, let $Q \in \mathbb{R}^L$ be a nonzero vector that does not lie in $G(\mathcal{Z}(\mathbf{a}_0^{\otimes 2}))$, and set $\mathbf{N} = Q \mathsf{S}^{\mathrm{T}}$. By construction, $\mathbf{N} \in \mathcal{N}(\mathbf{S}_0)$ and $\hat{\mathbf{N}}(0)$ has a nonzero first column; therefore, $\|\hat{\mathbf{N}}(0)\|_{\mathbf{I}_L,\mathbf{R}(0)} \neq 0$. □
Thus, for the popular case $N = 2L$, we have shown that the cosine of the $(2r+1)$-th principal angle can be arbitrarily close to 1; hence, the convergence of the Cadzow method becomes arbitrarily slow for small $\alpha$. Recall that the point of choosing a small $\alpha$ is to increase the estimation accuracy.

4.4.2. Numerical Example for the Dependence of Convergence on α

Figure 1 shows the $(2r+1)$-th eigenvalue of $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ (the squared cosine of the $(2r+1)$-th principal angle between the corresponding subspaces) for the series $\mathsf{S}_0 = (s_1, \ldots, s_N)^{\mathrm{T}}$ with the common term $s_k = \sin\frac{2\pi k}{6}$ and $N = 40$; the window length is $L = 20$ and the signal rank is $r = 2$. The values of $\alpha$ are taken from $[10^{-4}, 1]$ on the logarithmic scale, and the matrix $\mathbf{R}$ is calculated using the weight-search method from [14]. Figure 1 demonstrates that the eigenvalues are close to 1 for small $\alpha$; however, they do not equal one; that is, the Cadzow iterations converge. This result explains the decrease in the convergence rate of the Cadzow method with decreasing values of the parameter $\alpha$, which was numerically demonstrated in [13,14].

5. Mixed Cadzow Iterations

5.1. Convergence of Linearized Mixed Iterations

Let us introduce a linearized analogue of the mixed Cadzow algorithm (6), using the matrix $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ defined in (14). Instead of (6), we will study
$$\mathsf{X}_T = \mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_2,\mathbf{R}_2)}^{T_2} \mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_1,\mathbf{R}_1)}^{T_1} \mathsf{X}_0, \tag{20}$$
where $T = T_1 + T_2$. Since this formula uses only linear operators, the following theorem follows directly from Theorem 2.
Theorem 6.
Let $\mathsf{S}_0$ be a time series governed by the GLRR($\mathbf{a}_0$). The following relations are true for the iterations (20), where the convergence is in the operator norm:
1. 
$\mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_2,\mathbf{R}_2)}^{T_2} \mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_1,\mathbf{R}_1)}^{T_1} \to \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\, \mathbf{L}_1 \circledast \mathbf{R}_1}$ if $T_2$ is fixed and $T_1 \to +\infty$.
2. 
There is a symmetric positive-definite matrix $\mathbf{Q} \in \mathbb{R}^{N \times N}$ such that
$\mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_2,\mathbf{R}_2)}^{T_2} \mathbf{P}_{\mathbf{S}_0,(\mathbf{L}_1,\mathbf{R}_1)}^{T_1} \to \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\, \mathbf{Q}}$ if $T_1$ is fixed and $T_2 \to +\infty$.
Thus, to achieve fast and accurate estimates, the weights $(\mathbf{L}_1, \mathbf{R}_1)$ should correspond to more accurate but possibly slower-converging Cadzow iterations, and the weights $(\mathbf{L}_2, \mathbf{R}_2)$ should correspond to less accurate but faster-converging iterations. The convergence rate of the corresponding operators is explained by Proposition 1. As follows from the discussion in Section 4.4, it is natural to choose the pair of matrices $(\mathbf{L}_1, \mathbf{R}_1)$ as an approximation of the optimal weight matrix $\mathbf{W}$ with a small parameter $\alpha$, while the pair $(\mathbf{L}_2, \mathbf{R}_2)$ corresponds to a larger value of $\alpha$.

5.2. Numerical Experiments

In this section, we demonstrate the result of Theorem 6 through a numerical experiment. Consider the problem of estimating a finite-rank signal corrupted by noise. As in Section 4.4.2, the signal has the form $\mathsf{S}_0 = (s_1, \ldots, s_N)^{\mathrm{T}}$, where $s_k = \sin\frac{2\pi k}{6}$ and $N = 40$. The observed time series is $\mathsf{X} = \mathsf{S}_0 + \varepsilon$, where $\varepsilon$ is Gaussian white noise with zero mean and standard deviation $\sigma = 0.1$.
Let us apply the iterations given in (6) to estimate the signal $\mathsf{S}_0$. Set $L = 20$ and $r = 2$. We take $T_1$ iterations with one of three values of $\alpha$ (0.5, 0.1 and 0.01), and then $T_2$ iterations with $\alpha = 1$. For the second group of iterations, we take the identity matrices $\mathbf{L}_2$ and $\mathbf{R}_2$ (this choice corresponds to $\alpha = 1$). For the choice of $(\mathbf{L}_1, \mathbf{R}_1)$, we set $\mathbf{L}_1$ to the identity matrix, and for a given $\alpha$, we calculated the matrix of weights $\mathbf{R}_1$ using the method described in [14].
We denote the result of the iterations $\mathbf{Y}$ and the signal estimate $\hat{\mathsf{S}} = \mathcal{T}_L^{-1}(\mathbf{Y})$. For the linearized version (20), we can use theoretical considerations. For the mixed Cadzow iterations (6), we have no theoretical results and estimated the error using $I = 1000$ simulations of the time series with different realizations of noise. The accuracy of the estimate was measured by the RMSE (root mean square error), estimated as the square root of the sample mean, over the $I$ realizations, of $\sum_{i=1}^{N} (\hat{s}_i - s_i)^2 / N$.
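The Monte Carlo part of this computation can be sketched as follows, where `estimate_signal` stands for a full run of the mixed iterations (6) and is assumed to be available (it is not reproduced here).

```python
import numpy as np

def rmse_by_simulation(S0, estimate_signal, sigma=0.1, n_sim=1000, seed=0):
    """Estimate the RMSE of a signal estimator over n_sim noise realizations."""
    rng = np.random.default_rng(seed)
    N = len(S0)
    mse = 0.0
    for _ in range(n_sim):
        X = S0 + sigma * rng.standard_normal(N)   # noisy series
        S_hat = estimate_signal(X)
        mse += np.mean((S_hat - S0) ** 2)
    return np.sqrt(mse / n_sim)
```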
The estimate of the signal $\mathsf{S}$ by the linearized Cadzow algorithm has the form $\tilde{\mathsf{S}} = \mathsf{S}_0 + \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\mathbf{Q}} \varepsilon$, where $\boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\mathbf{Q}}$ is taken from the second statement of Theorem 6 and the GLRR($\mathbf{a}_0$) governs the signal $\mathsf{S}_0$. To calculate the RMSE of $\tilde{\mathsf{S}}$, we consider its autocovariance matrix $\boldsymbol{\Sigma}_{\tilde{\mathsf{S}}} = \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\mathbf{Q}} \boldsymbol{\Sigma} \boldsymbol{\Pi}_{\mathcal{Z}(\mathbf{a}_0^{\otimes 2}),\mathbf{Q}}^{\mathrm{T}}$, where $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_N$ is the autocovariance matrix of $\varepsilon$. Then, the RMSE can be calculated as the square root of the average of the diagonal elements of $\boldsymbol{\Sigma}_{\tilde{\mathsf{S}}}$, since the $i$-th diagonal element is the variance of $\tilde{s}_i$.
The results are shown in Figure 2. The legend indicates the values of $\alpha$ used for obtaining the matrix weights $\mathbf{R}_1$ and which method was used (“Linear” corresponds to the linearization (20), and “True” corresponds to the original Cadzow method (6)). To demonstrate statement 2 of Theorem 6, we took a large $T_2$, which numerically corresponds to the case $T_2 = \infty$.
We see that the results predicted for the linearized Cadzow iterations and estimated by simulation for the original Cadzow iterations are similar.
To demonstrate that the proposed approach allows one to reduce the computational cost while keeping the accuracy high, let us consider the dependence of the RMSE on $T_1 + T_2$, where the total number of iterations $T_1 + T_2$ changes from 1 to 10 and $\alpha = 0.02$.
In Figure 3, the case $T_1 = 0$ (marked as “$0.02 \times 0 + 1 \times 10$”) corresponds to fast convergence but low accuracy, that is, to the standard Cadzow iterations with $\alpha = 1$. The case $T_1 = 10$ (“$0.02 \times 10 + 1 \times 0$”) corresponds to slow convergence but good accuracy, that is, to Cadzow iterations with $\alpha = 0.02$. The mixed iterations with two or four iterations with $\alpha = 0.02$ followed by iterations with $\alpha = 1$ (“$0.02 \times 2 + 1 \times 8$” and “$0.02 \times 4 + 1 \times 6$”, respectively) show their advantage in computational cost. One can see that one iteration of the second type is sufficient to almost achieve the limiting accuracy. Thereby, for example, four iterations with $\alpha = 0.02$ and one iteration with $\alpha = 1$ provide approximately the same accuracy as 10 iterations with $\alpha = 0.02$ and are twice as fast.

6. Discussions

In the previous section, we demonstrated the potential advantage of mixed Cadzow iterations. Several questions arise regarding their use in practice. The first important issue to discuss is the stability of the demonstrated effect. One source of perturbation is that the signal rank can be estimated inaccurately. Of course, if the rank is underestimated, most methods fail. The question is the robustness of the method to rank overestimation. Let us take the time series from Section 5.2 with a signal of rank 2 and consider a projection onto a subspace of rank 4 in the alternating projections (target rank 4). Figure 4 shows that the general error behaviour is similar to that in Figure 3; the only differences are the convergence rate and the number of accurate steps required.
The second example is related to the stability of the result with respect to the signal rank. We changed the signal of the previous example to $s_n = \sin(2\pi n/6) + 0.5 \sin(2\pi n/8) - 0.3 \sin(2\pi n/12)$, which has rank 6. Figure 5 demonstrates the same effect: similar behavior of the error dynamics and a lower rate of convergence. Note that the convergence for the rank-6 case is much faster than for the rank-2 signal with an overestimated rank.
Both examples show that the number of first accurate iterations depends on the structure of the series. Therefore, the choice of the number of accurate steps should be investigated in the future.
The next point of discussion is the possibility of speeding up the implementation of Cadzow iterations. Ideas from [21], including the randomized linear algebra approach [22,23], can be applied to Cadzow iterations. Combining the mixed alternating projection approach and approaches to single iteration acceleration may be a challenge for future research.

7. Conclusions

In this paper, an approach for improving the alternating projection method by combining alternating projections in two equivalent norms was proposed and justified. The approach in its general form was considered for the case of projections onto linear subspaces (Theorem 2). Then, this approach was applied to the problem of Hankel low-rank approximation, which is used for the estimation of low-rank signals and is not linear. The theoretical results were obtained using the linearization (13). Numerical experiments confirmed that the linearization provides the correct conclusions about the convergence of the mixed Cadzow iterations (6). Using the theory of principal angles, it was shown why improving the estimation accuracy increases the computational cost of the conventional Cadzow iterations (Section 4.4). Using the features of the projector compositions, it was shown how the signal estimation can be improved by a combination of iterations with different norms (Theorem 6).
As a result, an algorithm was proposed that modifies the weighted Cadzow iterations by combining the good accuracy of slow iterations with the fast convergence of inaccurate ones.
The numerical example in Section 5.2 demonstrates the effectiveness of the elaborated approach. The examples are implemented in R using the R-package rhlra, which can be found at https://github.com/furiousbean/rhlra (accessed on 13 October 2022); see the R-code of the examples in Supplementary Materials.

Supplementary Materials

The R-code for producing the numerical results that are presented in the paper can be downloaded at https://www.mdpi.com/article/10.3390/a15120460/s1.

Author Contributions

Conceptualization, N.Z. and N.G.; methodology, N.G.; software, N.Z.; validation, N.Z. and N.G.; formal analysis, N.Z. and N.G.; investigation, N.Z. and N.G.; resources, N.G.; writing—original draft preparation, N.Z. and N.G.; writing—review and editing, N.Z. and N.G.; visualization, N.Z.; supervision, N.G.; project administration, N.G.; funding acquisition, N.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by RFBR, project number 20-01-00067.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the two anonymous reviewers whose comments and suggestions helped to improve and clarify this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SLRA: Structured Low-Rank Approximation
(G)LRR: (Generalized) Linear Recurrence Relation
RMSE: Root Mean Square Error

References

  1. Hall, M. Combinatorial Theory; Wiley-Interscience: Hoboken, NJ, USA, 1998. [Google Scholar]
  2. Van Der Veen, A.J.; Deprettere, E.F.; Swindlehurst, A.L. Subspace-based signal analysis using singular value decomposition. Proc. IEEE 1993, 81, 1277–1308. [Google Scholar] [CrossRef] [Green Version]
  3. Gonen, E.; Mendel, J. Subspace-based direction finding methods. In Digital Signal Processing Handbook; Madisetti, V., Williams, D., Eds.; CRC Press: Boca Raton, FL, USA, 1999; Volume 62. [Google Scholar]
  4. Gillard, J. Cadzow’s basic algorithm, alternating projections and singular spectrum analysis. Stat. Interface 2010, 3, 335–343. [Google Scholar] [CrossRef] [Green Version]
  5. Heinig, G.; Rost, K. Algebraic Methods for Toeplitz-like Matrices and Operators (Operator Theory: Advances and Applications); Birkhäuser Verlag: Basel, Switzerland, 1985. [Google Scholar]
  6. Gillard, J.; Usevich, K. Hankel low-rank approximation and completion in time series analysis and forecasting: A brief review. Stat. Interface 2022, accepted. [Google Scholar]
  7. Allen, G.I.; Grosenick, L.; Taylor, J. A generalized least-square matrix decomposition. J. Am. Stat. Assoc. 2014, 109, 145–159. [Google Scholar] [CrossRef] [Green Version]
  8. Golyandina, N.; Zhigljavsky, A. Blind deconvolution of covariance matrix inverses for autoregressive processes. Linear Algebra Appl. 2020, 593, 188–211. [Google Scholar] [CrossRef] [Green Version]
  9. Cadzow, J.A. Signal enhancement: A composite property mapping algorithm. IEEE Trans. Acoust. 1988, 36, 49–62. [Google Scholar] [CrossRef] [Green Version]
  10. Escalante, R.; Raydan, M. Alternating Projection Methods; SIAM: Philadelphia, PA, USA, 2011. [Google Scholar]
  11. Usevich, K.; Markovsky, I. Variable projection for affinely structured low-rank approximation in weighted 2-norms. J. Comput. Appl. Math. 2014, 272, 430–448. [Google Scholar] [CrossRef]
  12. Zvonarev, N.; Golyandina, N. Fast and stable modification of the Gauss–Newton method for low-rank signal estimation. Numer. Linear Algebra Appl. 2022, 29, e2428. [Google Scholar] [CrossRef]
  13. Zvonarev, N.; Golyandina, N. Iterative algorithms for weighted and unweighted finite-rank time-series approximations. Stat. Interface 2017, 10, 5–18. [Google Scholar] [CrossRef] [Green Version]
  14. Zvonarev, N.K. Search for Weights in the Problem of Finite-Rank Signal Estimation in the Presence of Random Noise. Vestn. St. Petersburg Univ. Math. 2021, 54, 236–244. [Google Scholar] [CrossRef]
  15. Von Neumann, J. Functional Operators: The Geometry of Orthogonal Spaces; Annals of Mathematics Studies, Princeton University Press: Princeton, NJ, USA, 1950. [Google Scholar]
  16. Meyer, C.D. Matrix Analysis and Applied Linear Algebra; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2001. [Google Scholar]
  17. Bauschke, H.H.; Cruz, J.Y.B.; Nghia, T.T.A.; Phan, H.M.; Wang, X. Optimal Rates of Linear Convergence of Relaxed Alternating Projections and Generalized Douglas-Rachford Methods for Two Subspaces. Numer. Algorithms 2016, 73, 33–76. [Google Scholar] [CrossRef]
  18. Bjorck, A.; Golub, G.H. Numerical Methods for Computing Angles between Linear Subspaces. Math. Comput. 1973, 27, 579. [Google Scholar] [CrossRef] [Green Version]
  19. Lewis, A.S.; Malick, J. Alternating projections on manifolds. Math. Oper. Res. 2008, 33, 216–234. [Google Scholar] [CrossRef] [Green Version]
  20. Usevich, K. On signal and extraneous roots in Singular Spectrum Analysis. Stat. Interface 2010, 3, 281–295. [Google Scholar] [CrossRef] [Green Version]
  21. Wang, H.; Cai, J.F.; Wang, T.; Wei, K. Fast Cadzow’s Algorithm and a Gradient Variant. J. Sci. Comput. 2021, 88, 41. [Google Scholar] [CrossRef]
  22. Erichson, N.B.; Mathelin, L.; Kutz, J.N.; Brunton, S.L. Randomized Dynamic Mode Decomposition. SIAM J. Appl. Dyn. Syst. 2019, 18, 1867–1891. [Google Scholar] [CrossRef] [Green Version]
  23. Minster, R.; Saibaba, A.K.; Kar, J.; Chakrabortty, A. Efficient Algorithms for Eigensystem Realization Using Randomized SVD. SIAM J. Matrix Anal. Appl. 2021, 42, 1045–1072. [Google Scholar] [CrossRef]
Figure 1. (A) The $(2r+1)$-th eigenvalue of $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ depending on $\alpha$; (B) the gap between 1 and the $(2r+1)$-th eigenvalue of $\mathbf{P}_{\mathbf{S}_0,(\mathbf{L},\mathbf{R})}$ on the logarithmic scale.
Figure 2. RMSE (root mean square error) depending on $T_1$ and $\alpha$. Signal rank 2 and target rank 2.
Figure 3. RMSE depending on $T_1 + T_2$. Signal rank 2 and target rank 2.
Figure 4. RMSE depending on $T_1 + T_2$. Signal rank 2 and target rank 4.
Figure 5. RMSE depending on $T_1 + T_2$. Signal rank 6 and target rank 6.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
