A Gradient System for Low Rank Matrix Completion

Abstract: In this article we present and discuss a two-step method for finding the closest low-rank completion of a large sparse matrix. Given a large sparse matrix M, the method consists of fixing the rank to r and then looking for the rank-r matrix X closest to M, where the distance is measured in the Frobenius norm. A key element in the solution of this matrix nearness problem is the use of a constrained gradient system of matrix differential equations. The obtained results, compared with those obtained by different approaches, show that the method behaves correctly and is competitive with the ones available in the literature.


Introduction
A large class of datasets is naturally stored in matrix form. In many important applications the challenge of filling a matrix from a sample of its entries arises; this is known as the matrix completion problem. Clearly, such a problem needs additional constraints to be well-posed. One of its most interesting variants is to find the lowest-rank matrices that best fit the given data. This constrained optimization problem is known as low-rank matrix completion.
Let M ∈ R^{m×n} be a matrix that is known only on a subset Ω of its entries. In [1], the authors provided conditions on the sampling of the observed entries under which the arising problem has a high probability of not being underdetermined. The classical mathematical formulation of the low-rank matrix completion problem is

min rank(X)  s.t.  P_Ω(X) = P_Ω(M),

where P_Ω is the projection onto Ω, defined entrywise by (P_Ω(X))_{ij} = X_{ij} if (i, j) ∈ Ω and (P_Ω(X))_{ij} = 0 otherwise. This approach may seem the most natural way to describe the problem, but it is not very useful in practice, since it is well known to be NP-hard [2]. In [3], the authors stated the problem as

min ||X||_*  s.t.  P_Ω(X) = P_Ω(M),

where || · ||_* is the nuclear norm of the matrix, i.e., the sum of its singular values. This is a convex optimization problem, and the authors proved that when Ω is sampled uniformly at random and is sufficiently large, this relaxation recovers any matrix of rank r with high probability. We will consider the following formulation, as in [4,5]:

min ||P_Ω(X) − P_Ω(M)||_F^2  s.t.  rank(X) = r.

Notice that the projection P_Ω(X) can be written as a Hadamard product. If we identify the subset Ω of the fixed entries with the 0–1 matrix Ω such that Ω_{ij} = 1 if (i, j) ∈ Ω and Ω_{ij} = 0 otherwise, it is clear that P_Ω(X) = Ω • X. By considering the manifold M_r = {X ∈ R^{n×m} : rank(X) = r}, we can write the problem as

min_{X ∈ M_r} ||P_Ω(X) − P_Ω(M)||_F^2.    (1)

This approach is based on the assumption that the rank r of the target matrix is known in advance. A key feature of the problem is that r ≪ min{m, n}, which translates, from a practical point of view, into a small extra cost to eventually update r. In [6] the possibility of estimating the rank (unknown a priori) from the gap between the singular values of the "trimmed" partial target matrix M is well explained.
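As a small illustration of the formulation above, the projection P_Ω can indeed be realized as a Hadamard product with a 0–1 mask; the sketch below (sizes, density and names are ours, not from the paper) checks the two defining properties of the projection.

```python
import numpy as np

# The projection P_Omega(X) keeps the observed entries of X and zeroes out
# the rest.  With a 0/1 mask matrix Omega, it is the Hadamard product Omega * X.
rng = np.random.default_rng(0)
m, n = 4, 5
X = rng.standard_normal((m, n))

# Mask with ~60% observed entries (1 = observed, 0 = unknown).
Omega = (rng.random((m, n)) < 0.6).astype(float)

P_Omega_X = Omega * X          # Hadamard product realizes the projection

# The projection is idempotent and vanishes on the unobserved entries.
assert np.allclose(Omega * P_Omega_X, P_Omega_X)
assert np.allclose(P_Omega_X[Omega == 0], 0.0)
```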
Furthermore, the authors highlight that in collaborative filtering applications r ranges between 10 and 30. In [4], the author employed optimization techniques that widely exploit the smooth Riemannian manifold structure of M_r. The same tools are used in [5], where the authors considered matrix completion in the presence of outliers. In the recent work [7], the authors provide a non-convex relaxation approach for matrix completion in the presence of non-Gaussian noise and/or outliers, employing correntropy-induced losses. In [8], the authors survey the literature on matrix completion and deal with target matrices whose entries are affected by a small amount of noise. Recently, the problem became popular thanks to collaborative filtering applications [9,10] and the Netflix problem [11]. It can also be employed in other fields of practical application, such as sensor network localization [12], signal processing [13] and the reconstruction of damaged images [14]. A very suggestive use of low-rank matrix completion modeling has been made in biomathematics, as shown in [15] for gene-disease associations. Applications to the minimal representation of discrete systems [16] can be considered of a more mathematical flavour. What makes the problem interesting are not just the multiple applications, but also its variants, such as, for example, the structured [17] and Euclidean distance matrix [18] cases. In this paper a numerical technique to solve the low-rank matrix completion problem is provided, which makes use of a gradient system of matrix ODEs.

General Idea : Two-Level Method
Let us write the unknown matrix X of problem (1) as X = εE, with ε > 0 and ||E||_F = 1. For a fixed norm ε > 0, we aim to minimize the functional

F_ε(E) = ||P_Ω(εE) − P_Ω(M)||_F^2,    (2)

constrained by E ∈ M_r and ||E||_F = 1 (see [19]). By computing the stationary points of a suitable differential equation, we will find a local minimizer E_ε of the functional. Setting f(ε) = F_ε(E_ε), we will look for the smallest value of ε, say ε*, such that f(ε*) = 0, by using a Newton-like method. The behaviour of f(ε) in a left neighbourhood of ε* is well understood. For ε ≥ ε* it is more challenging.
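The norm/direction splitting X = εE used by the two-level method can be sketched in a few lines (a trivial numerical illustration with our own variable names): ε carries the scale and E the unit-norm direction.

```python
import numpy as np

# Splitting X = eps * E: eps = ||X||_F is the scale, E has unit Frobenius norm.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))

eps = np.linalg.norm(X, 'fro')
E = X / eps

assert np.isclose(np.linalg.norm(E, 'fro'), 1.0)  # unit direction
assert np.allclose(eps * E, X)                    # exact reconstruction
```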
We discuss two possible scenarios: ε* can be a strict local minimum point of f, or f(ε) can become identically zero when ε exceeds ε*. Which situation occurs depends on the rank constraint and on the sparsity pattern. To motivate our assumption, we now present two simple illustrative examples. Suppose that we aim to recover a partially known matrix M. If we constrain the problem by imposing that the rank of the solution be equal to 1, we have a strict minimum point at ε* = 3, and there is a unique optimal rank-1 matrix that perfectly fits the given entries of M. If we instead consider the problem of recovering a matrix Y, requiring the solution to be of rank 2, the solutions of minimal norm have ε* = 2.6458; however, for all ε > ε*, the value f(ε) = 0 is still attained. To understand this behaviour, we can intuitively think that there are many "possibilities" to realize a rank-2 matrix by "filling" the unknown entries of Y; indeed, whole families of solutions of the problem exist. Figure 1 shows the graphs of f(ε) for the two problems considered. The paper is structured as follows. In Sections 3 and 5 we discuss the two-level method designed to solve problem (1). A characterization of local extremizers of the functional (2) is given in Section 4. In Section 6 we present a suitable splitting method for rank-r matrix ODEs, employed in the inner iteration. Finally, numerical experiments are shown in Section 7.

Differential Equation for E
3.1. Minimizing F_ε(E) for Fixed ε

Suppose that E(t) is a smooth matrix-valued function of t (we omit the argument t of E(t)). Our goal is to find an optimal direction Ė = Z (see [19,20]) along which the functional (2) has the maximal local decrease, in such a way that the matrix E remains on the manifold M_r.
To this end, we differentiate (2) with respect to t. Since P_Ω is linear and self-adjoint, we obtain

G = 2ε P_Ω(εE − M)    (3)

and

d/dt F_ε(E(t)) = ⟨G, Ė⟩,    (4)

which identifies G as the free gradient of the functional. We now have to include the constraints. Differentiating ||E(t)||_F = 1, we gain the linear constraint ⟨E, Ė⟩ = 0 for Ė. By virtue of the rank constraint, we must also guarantee that the motion of E remains on the manifold M_r for all t. In order to achieve this, we require the derivative Ė to lie in the tangent space to M_r at E, for all t. These considerations lead us to the following optimization problem.
min_Z ⟨G, Z⟩  s.t.  ||Z||_F = 1,  ⟨E, Z⟩ = 0,  Z ∈ T_E M_r,    (5)

where T_E M_r denotes the tangent space to M_r at E. The constraint ||Z||_F = 1 is introduced simply to single out a unique direction Z. In the following, we will denote by P_E(·) the orthogonal projection onto T_E M_r.

Rank-r Matrices and Their Tangent Matrices
See [21]. Every real rank-r matrix E of dimension n × m can be written in the form

E = USV^T,    (6)

where U ∈ R^{n×r} and V ∈ R^{m×r} have orthonormal columns, i.e., U^T U = I_r and V^T V = I_r, and S ∈ R^{r×r} is nonsingular. In particular, when S is diagonal, we recover the SVD. The decomposition (6) is not unique: replacing U by Û = UP and V by V̂ = VQ, with orthogonal matrices P, Q ∈ R^{r×r}, and S by Ŝ = P^T SQ, we get the same matrix E = USV^T = Û Ŝ V̂^T. However, the decomposition can be made unique in the tangent space. For all E in the manifold M_r, let us consider the tangent space T_E M_r. It is a linear space, and every tangent matrix is of the form

Ė = U̇SV^T + UṠV^T + USV̇^T,

where Ṡ ∈ R^{r×r}, and U^T U̇ and V^T V̇ are skew-symmetric r × r matrices. The factors Ṡ, U̇, V̇ are uniquely determined by Ė and U, V, S by imposing the gauge conditions U^T U̇ = 0 and V^T V̇ = 0. We consider the following important result from [21], which yields a formula for the projection of a matrix onto the tangent space to the rank-r matrices.
Lemma 1. The orthogonal projection onto the tangent space T_E M_r at E = USV^T ∈ M_r is given by

P_E(Z) = Z − (I − UU^T) Z (I − VV^T).
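The projection of Lemma 1 is easy to check numerically. The sketch below (dimensions and names are illustrative) implements P_E and verifies the two defining properties of an orthogonal projection, idempotency and orthogonality of the residual.

```python
import numpy as np

def tangent_projection(U, V, Z):
    """Orthogonal projection of Z onto T_E M_r at E = U S V^T (Lemma 1):
    P_E(Z) = Z - (I - U U^T) Z (I - V V^T)
           = U U^T Z + Z V V^T - U U^T Z V V^T."""
    ZV = Z @ V
    return U @ (U.T @ Z) + ZV @ V.T - U @ (U.T @ ZV) @ V.T

rng = np.random.default_rng(2)
n, m, r = 7, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
Z = rng.standard_normal((n, m))

PZ = tangent_projection(U, V, Z)
# A projection is idempotent ...
assert np.allclose(tangent_projection(U, V, PZ), PZ)
# ... and orthogonal: the residual Z - P_E(Z) is perpendicular to P_E(Z).
assert abs(np.sum((Z - PZ) * PZ)) < 1e-10
```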

Steepest Descent Dynamics
Lemma 2. Let E ∈ R^{n×m} be a real matrix of unit Frobenius norm that is not proportional to P_E(G). Then the solution of (5) is given by

Z = µ ( −P_E(G) + ⟨P_E(G), E⟩ E ),    (7)

where µ is the reciprocal of the Frobenius norm of the matrix in brackets.

Proof.
Let E^⊥ = {Z ∈ R^{n×m} : ⟨E, Z⟩ = 0}. The objective Z ↦ ⟨Z, G⟩ is linear in Z, and the feasible region R = E^⊥ ∩ T_E M_r is a linear subspace, being an intersection of subspaces. Observing that the inner product with a given vector is minimized over a subspace by orthogonally projecting that vector onto the subspace, the solution of (5) is proportional to the normalized orthogonal projection of the free gradient G onto R, that is, to −P_{E^⊥}(P_E(G)), since P_{E^⊥} and P_E commute. Since ||E||_F = 1, we have P_{E^⊥}(P_E(G)) = P_E(G) − ⟨P_E(G), E⟩E, so the solution is given by (7).
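The direction of Lemma 2 can be verified numerically: the sketch below (a toy instance with our own names; G is a random stand-in for the free gradient) builds Z and checks that it is a unit direction orthogonal to E, so the flow preserves ||E||_F = 1.

```python
import numpy as np

def tangent_projection(U, V, Z):
    # P_E(Z) = Z - (I - U U^T) Z (I - V V^T), as in Lemma 1.
    return U @ (U.T @ Z) + (Z @ V) @ V.T - U @ (U.T @ Z @ V) @ V.T

rng = np.random.default_rng(3)
n, m, r = 6, 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = np.diag(rng.random(r) + 0.5)
E = U @ S @ V.T
E /= np.linalg.norm(E, 'fro')            # unit Frobenius norm

G = rng.standard_normal((n, m))          # stand-in for the free gradient
PG = tangent_projection(U, V, G)
D = -PG + np.sum(PG * E) * E             # unnormalized direction of Lemma 2
Z = D / np.linalg.norm(D, 'fro')         # mu = 1/||D||_F

assert abs(np.linalg.norm(Z, 'fro') - 1.0) < 1e-12   # unit direction
assert abs(np.sum(Z * E)) < 1e-12                    # <Z, E> = 0
```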
Expression (4), jointly with Lemma 2, suggests considering the following gradient system for F_ε(E):

Ė = −P_E(G) + ⟨P_E(G), E⟩ E.    (8)

To get the differential equation in a form involving the factors of E = USV^T, we use the following result.
Lemma 3 (See [21]). For E = USV^T ∈ M_r, with nonsingular S ∈ R^{r×r} and with U ∈ R^{n×r} and V ∈ R^{m×r} having orthonormal columns, the equation Ė = Z, with Z ∈ T_E M_r, is equivalent to

Ṡ = U^T Z V,  U̇ = (I − UU^T) Z V S^{−1},  V̇ = (I − VV^T) Z^T U S^{−T}.

In our case Z = −P_E(G) + ⟨P_E(G), E⟩E; hence the differential Equation (8) for E = USV^T is equivalent to the following system of differential equations for S, U, V:

Ṡ = −U^T G V + ⟨P_E(G), E⟩ S,
U̇ = −(I − UU^T) G V S^{−1},
V̇ = −(I − VV^T) G^T U S^{−T}.    (9)

The following monotonicity result is an immediate consequence of the fact that the differential equation is the gradient flow of F_ε on the manifold of matrices of fixed rank r and unit norm.
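The equivalence stated in Lemma 3 can be checked numerically: the sketch below (toy sizes, our own names) takes a tangent matrix Z, forms the factor derivatives, and verifies that they reassemble to Ė = Z.

```python
import numpy as np

# Numerical check of Lemma 3: for a tangent matrix Z, the factor derivatives
#   S' = U^T Z V,
#   U' = (I - U U^T) Z V S^{-1},
#   V' = (I - V V^T) Z^T U S^{-T}
# reassemble to  E' = U' S V^T + U S' V^T + U S V'^T = Z.
rng = np.random.default_rng(4)
n, m, r = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = np.diag([1.0, 0.5])

# Project a random matrix so that Z lies in the tangent space T_E M_r.
W = rng.standard_normal((n, m))
Z = U @ (U.T @ W) + (W @ V) @ V.T - U @ (U.T @ W @ V) @ V.T

Sdot = U.T @ Z @ V
Udot = (Z - U @ (U.T @ Z)) @ V @ np.linalg.inv(S)
Vdot = (Z.T - V @ (V.T @ Z.T)) @ U @ np.linalg.inv(S.T)

Edot = Udot @ S @ V.T + U @ Sdot @ V.T + U @ S @ Vdot.T
assert np.allclose(Edot, Z)
```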
Theorem 1. Let E(t) ∈ M_r be a solution of unit Frobenius norm of the matrix differential Equation (8). Then

d/dt F_ε(E(t)) ≤ 0.

Proof. Using (4) and then (8),

d/dt F_ε(E(t)) = ⟨G, Ė⟩ = −||P_E(G)||_F^2 + ⟨P_E(G), E⟩^2 ≤ 0,

where the inequality follows from Cauchy–Schwarz, since ||E||_F = 1.

Stationary Points
Since we are interested in minimizing F_ε(E), we focus on the equilibria of (8), which represent local minima of (2).

Lemma 4. The following statements are equivalent along the solutions of (8):
(a) d/dt F_ε(E(t)) = 0;
(b) Ė = 0;
(c) E is a real multiple of P_E(G).

Proof. From expression (4), clearly (b) implies (a).
Supposing (c), we can write P_E(G) = αE with α ∈ R; substitution in (8) gives Ė = −αE + α⟨E, E⟩E = 0, that is, (b). So it remains to show that (a) implies (c). Note that, along solutions of (8),

d/dt F_ε(E(t)) = −||P_E(G)||_F^2 + ⟨P_E(G), E⟩^2,

which vanishes only in the equality case of the Cauchy–Schwarz inequality, i.e., only if P_E(G) is a real multiple of E. So, since ε > 0, (a) implies (c). The following result characterizes the local extremizers.
Theorem 2. Let E* ∈ M_r be a real matrix of unit Frobenius norm. Then the following two statements are equivalent:
(a) every differentiable path E(t) ∈ M_r (for small t ≥ 0) with ||E(t)||_F = 1 and E(0) = E* satisfies d/dt F_ε(E(t)) |_{t=0} ≥ 0;
(b) E* is a real multiple of P_{E*}(G).

Proof. The strategy of the proof is similar to [22]. Assume that (a) does not hold. Then there exists a differentiable path E(t) ∈ M_r of unit Frobenius norm with E(0) = E* along which F_ε strictly decreases at t = 0. Lemma 2 shows that the solution path of (8) passing through E* is such a path; hence E* is not a stationary point of (8) and, by Lemma 4, it is not a real multiple of P_{E*}(G).

Conversely, if E* is not a multiple of P_{E*}(G), then E* is not a stationary point of (8), and Theorem 1 and Lemma 4 ensure that d/dt F_ε(E(t)) < 0 along the solution path of (8). This path is admissible for (a), in contradiction with (a).

Numerical Solution of Rank-r Matrix Differential Equation
We have seen that the matrix ODE (8) is equivalent to the system (9), involving the factors of the decomposition (6). In (9), the inverse of S appears. Therefore, when S is nearly singular, stability problems can arise if one works with standard numerical methods for ODEs. To avoid these difficulties, we employ the first-order projector-splitting integrator of [23]. The algorithm directly approximates the solution of Equation (8). It starts from the normalized rank-r matrix E_0 = U_0 S_0 V_0^T at time t_0, obtained from the SVD of the matrix to be recovered. At time t_1 = t_0 + h, one step of the method works as follows.

Projector-splitting integrator
Data: U_0, S_0, V_0 and the increment D of the ODE (8) over the step, D ≈ h Ė(t_0).
1. K-step: set K_1 = U_0 S_0 + D V_0 and compute the QR factorization K_1 = U_1 Ŝ_1.
2. S-step: set S̃_0 = Ŝ_1 − U_1^T D V_0.
3. L-step: set L_1 = V_0 S̃_0^T + D^T U_1 and compute the QR factorization L_1 = V_1 S_1^T.
E_1 = U_1 S_1 V_1^T is taken as the approximation to E(t_1). All the nice features of the integrator are presented in [23], but it is already clear that no matrix inversion appears in the steps of the algorithm.
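A minimal NumPy sketch of one step of the first-order projector-splitting (KSL) integrator of [23] follows; the increment D stands for h·Ė over the step, and the variable names are ours. The check exploits a known exactness property: when the increment keeps the same row and column spaces, the step reproduces the rank-r result exactly.

```python
import numpy as np

def projector_splitting_step(U0, S0, V0, D):
    """One step of the first-order projector-splitting (KSL) integrator
    (sketch after [23]); D is the increment of the ODE over the step.
    Note that no inverse of S is ever formed."""
    # K-step: update the left factor.
    K1 = U0 @ S0 + D @ V0
    U1, S_hat = np.linalg.qr(K1)
    # S-step: a backward substep on the small r x r factor.
    S_tilde = S_hat - U1.T @ D @ V0
    # L-step: update the right factor.
    L1 = V0 @ S_tilde.T + D.T @ U1
    V1, S1T = np.linalg.qr(L1)
    return U1, S1T.T, V1

# Exactness check: an increment preserving the column/row spaces of E0
# is reproduced exactly by one step.
rng = np.random.default_rng(5)
n, m, r = 8, 6, 3
U0, _ = np.linalg.qr(rng.standard_normal((n, r)))
V0, _ = np.linalg.qr(rng.standard_normal((m, r)))
S0 = np.diag([3.0, 2.0, 1.0])
S_new = np.diag([2.5, 1.5, 0.5])
D = U0 @ (S_new - S0) @ V0.T

U1, S1, V1 = projector_splitting_step(U0, S0, V0, D)
assert np.allclose(U1 @ S1 @ V1.T, U0 @ S_new @ V0.T)
assert np.allclose(U1.T @ U1, np.eye(r))   # orthonormal columns preserved
```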

Iteration on ε
In this section we describe the outer iteration that manages ε (see [22]).

Qualitative Tools
For every fixed ε > 0, the gradient system (8) returns a stationary point E(ε) of unit Frobenius norm, which is a local minimizer of F_ε.
Setting f(ε) = F_ε(E(ε)), our purpose is to solve the problem min{ε > 0 : f(ε) = 0} by employing a Newton-like method. We assume that E(ε) is a smooth function of ε, so that f(ε) = F_ε(E(ε)) is also differentiable with respect to ε. Let us focus on its derivative. By the expression of the free gradient (3), it is clear that f′(ε) vanishes together with f(ε) at ε = ε*. This means that ε* is a double root of f(ε).

Numerical Approximation of ε *
The presence of the double root ensures that f(ε) is convex for ε ≤ ε*; therefore the classical Newton method will approach ε* from the left. Since we are not able to find an analytical formula for the derivative f′(ε), we approximate it with backward finite differences.
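The outer iteration can be sketched on a model scalar function with a double root (this is only an illustration with our own names; the actual f(ε) comes from the inner gradient flow). The derivative is replaced by a backward finite difference through the previous iterate, and the iterates approach the root from the left, as described above.

```python
def newton_bfd(f, eps0, eps1, tol=1e-10, itmax=100):
    """Newton-like iteration with the derivative approximated by a
    backward finite difference (sketch of the outer level)."""
    f0, f1 = f(eps0), f(eps1)
    for _ in range(itmax):
        if abs(f1) < tol:
            break
        dfd = (f1 - f0) / (eps1 - eps0)   # backward finite difference
        eps_next = eps1 - f1 / dfd        # Newton-like update
        eps0, f0 = eps1, f1
        eps1, f1 = eps_next, f(eps_next)
    return eps1

# Model function mimicking f(eps) ~ (eps - eps*)^2 left of the double
# root eps* = 2 and identically zero to its right.
f = lambda e: (e - 2.0) ** 2 if e < 2.0 else 0.0
eps_star = newton_bfd(f, 1.0, 1.2)
assert abs(eps_star - 2.0) < 1e-4   # approached from the left
```

For a double root, this secant-like update converges linearly from the left; the text notes that a modified Newton iteration would restore quadratic convergence.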

Algorithm for computing ε *
Data: the matrix M to be recovered, the tolerance tol, and starting values ε_0, ε_1 such that f(ε_0), f(ε_1) > 0.

Numerical Experiments
In the following experiments we randomly generate matrices of low rank. As in [3,4], with r the fixed rank, we generate two random matrices A_L, A_R ∈ R^{n×r} with i.i.d. standard Gaussian entries. We build the matrix A = A_L A_R^T and generate a uniformly distributed sparsity pattern Ω. We work on the matrix M resulting from the projection of A onto the pattern Ω. In this way we are able to compare the accuracy of the matrix computed by our code with the true solution A. As stopping criterion for the integrator of the ODE in the inner level, we use a test on the decrease of the functional governed by an input tolerance parameter tol, together with a maximum number of iterations and a minimum value for the integrator stepsize. We provide a stepsize control that reduces the step h by a factor γ (the default value is 1.25) when the functional is not decreasing, and increases the step to hγ when the value of the objective decreases with respect to the previous iteration. Some computational results are shown in Tables 1 and 2.
In particular, they show the values of the cost function evaluated at ε*, computed by the outer method, which highlight the accuracy of the method when recovering matrices of different rank and different dimension. Observe that the presence of the double root would allow us to use a modified Newton iteration (from the left), obtaining quadratic convergence. Since our purpose is to find an upper bound for ε*, if it happens that ε_k > ε*, we need a bisection step to preserve the approximation from the left. Furthermore, if we introduce g(ε) = √f(ε), then f(ε) = g(ε)^2, so the two functions have common zeros, and ε* is a simple root of g. This allows us to employ the function g(ε) instead of f(ε) in the outer level, joining classical Newton and bisection. In practice, this turns out to be the most efficient approach. Tables 3 and 4 show the behaviour of the two alternative approaches on a test matrix M of dimension 150 × 150, with ≈50% known entries and rank 15. We also tested the classical Newton method on M. The comparison is summarized in Tables 5 and 6. The accuracy is the same for all the choices, but when selecting g instead of f the computational cost is sharply reduced, both in terms of number of iterations and in terms of timing.
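The random test problems used throughout this section can be generated in a few lines; the sketch below (sizes and density are illustrative) follows the construction described above and checks its two key properties.

```python
import numpy as np

# Random rank-r test matrix A = A_L A_R^T with i.i.d. Gaussian factors,
# plus a uniformly sampled mask Omega giving the observed matrix M.
rng = np.random.default_rng(6)
n, r, density = 200, 10, 0.3

A_L = rng.standard_normal((n, r))
A_R = rng.standard_normal((n, r))
A = A_L @ A_R.T                          # exact rank r (with probability 1)

Omega = (rng.random((n, n)) < density).astype(float)
M = Omega * A                            # observed data P_Omega(A)

assert np.linalg.matrix_rank(A) == r
# Roughly `density` of the entries are observed.
assert abs(Omega.mean() - density) < 0.05
```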

Experiments with Quasi Low Rank Matrices
The following simulations are devoted to checking the "robustness" of the method with respect to small perturbations of the singular values. More precisely, we consider a rank-r matrix A, built as introduced in this section, and we perturb it in order to get a matrix A_P of almost-rank r. In other words, we aim to get a matrix A_P that has r significant singular values, while the remaining ones are very small. Let A = UΣV^T be the SVD of A, so that Σ is diagonal with only the first r diagonal entries different from zero. If Σ̃ is the diagonal matrix whose first r diagonal entries are zero and whose remaining entries (all or a part of them) are set to small random values, we build A_P as

A_P = U(Σ + Σ̃)V^T.

Table 7 shows the numerical results obtained by considering a rank-9 matrix A of size 200 and perturbations of different amplitude. The columns σ_{r+1,r+1} and err contain, respectively, the order of magnitude of the largest perturbed singular value and the real error, computed as the Frobenius distance between A and the optimal matrix returned by the code. As is natural to expect, the optimal matrix remains close to A, but the error grows with the perturbations.
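The quasi-low-rank construction above can be sketched as follows (sizes, the perturbation level delta and variable names are illustrative): the trailing singular values, which are zero in A, are replaced by tiny random values.

```python
import numpy as np

# Build A_P = U (Sigma + Sigma_tilde) V^T: same leading singular values as
# the rank-r matrix A, plus small random trailing ones.
rng = np.random.default_rng(7)
n, r, delta = 50, 9, 1e-8

A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank r
U, s, Vt = np.linalg.svd(A)
s_pert = s.copy()
s_pert[r:] = delta * rng.random(n - r)    # tiny trailing singular values
A_P = (U * s_pert) @ Vt                   # U @ diag(s_pert) @ Vt

# A_P stays within O(delta) of A but is no longer exactly rank r.
assert np.linalg.norm(A_P - A) < np.sqrt(n) * delta
assert np.linalg.matrix_rank(A_P, tol=1e-6) == r   # r significant values
```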
Another interesting example in terms of robustness, when working with quasi-low-rank matrices, is given by matrices with exponentially decaying singular values. In particular, we build an n × n matrix A whose singular values are given by the sequence {exp(−x_i)}_{i=1,…,n}, where x_1 ≤ x_2 ≤ … ≤ x_n are random increasing numbers in an interval [a, b]. We build the matrix M to be recovered by projecting A onto a sparsity pattern Ω. In the following experiment we fix n = 100, a = 1 and b = 20. The singular values of A range from 0.3511 down to 2.0870 × 10^−9, and the mask M has about 30% of known elements. We choose the values of the rank in the completion experiments by considering the changes of order of magnitude of the singular values. Table 8 shows the results, in particular the value of the cost function f(ε*), which we compare with the corresponding value given by the code of [4].
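The construction with prescribed exponentially decaying singular values can be sketched as follows (the orthogonal factors and variable names are ours; the parameters match the experiment described above).

```python
import numpy as np

# n x n matrix with singular values exp(-x_i), x_i random increasing in [a, b].
rng = np.random.default_rng(8)
n, a, b = 100, 1.0, 20.0

x = np.sort(rng.uniform(a, b, size=n))    # increasing exponents
s = np.exp(-x)                            # decaying singular values

Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal factors
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q1 * s) @ Q2.T                       # Q1 @ diag(s) @ Q2^T

sv = np.linalg.svd(A, compute_uv=False)   # returned in decreasing order
assert np.allclose(sv, s)                 # prescribed spectrum recovered
assert sv[0] <= np.exp(-a) and sv[-1] >= np.exp(-b)
```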

Experiment with Theoretical Limit Number of Samples
In the seminal paper [1], the authors focus on determining a lower bound for the cardinality |Ω| of the set of known entries such that it is possible to recover the matrix with high probability. In particular, they proved that most n × n matrices of rank r (assumed not too large) can be perfectly recovered by solving a convex optimization problem, provided |Ω| ≥ C n^{1.2} r log n for some positive constant C. Table 9 shows the results when we compare our code with the method in [4]. In particular, we present the best values of the objective functions, the real errors and the computational times. We consider n = 50 and r = 3; therefore, according to the previous bound, we have to take |Ω| ≥ C · 1.2832 × 10^3. This means that, for C = 1, the corresponding target matrix M will have ≈51.33% of given entries.

Given a test matrix M, our purpose is now to understand the behaviour of the cost function when the fixed rank differs from the exact one. In particular, we consider a matrix M with ≈44.71% of known elements, of dimension 70 × 70, such that the exact rank of the solution is 8. We compute the values of the cost function evaluated at the best fixed-rank-k approximation (say b_k) and at the solution given by our code (say f_k). For every fixed rank k, we also record the corresponding error.
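The sample-size bound quoted above is simple arithmetic and can be checked directly for the experiment's parameters (n = 50, r = 3, C = 1):

```python
import numpy as np

# Bound |Omega| >= C * n^1.2 * r * log(n) from [1], for n = 50, r = 3, C = 1.
n, r, C = 50, 3, 1.0
bound = C * n**1.2 * r * np.log(n)

# The bound is about 1.2832e3 samples, i.e. about 51.33% of the n^2 entries,
# matching the figures quoted in the text.
assert abs(bound - 1.2832e3) < 1.0
assert abs(100 * bound / n**2 - 51.33) < 0.1
```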

Conclusions
The matrix completion problem consists of recovering a matrix from a few samples of its entries. We formulate the problem as the minimization of the Frobenius distance on the set of the fixed entries over the manifold of matrices of fixed rank. In this paper we introduce a numerical technique to deal with this problem. The method works on two levels: the inner iteration computes the fixed-norm matrix that best fits the data entries by solving low-rank matrix differential equations; the outer iteration optimizes the norm by employing a Newton-like method. A key feature of the method is that it avoids the lack of vector-space structure of M_r by moving the dynamics to the tangent space. Numerical experiments show the high accuracy of the method and its robustness with respect to small perturbations of the singular values. However, for very challenging problems it may be advisable to relax the tolerance parameters. The method is particularly suited to problems for which guessing the rank is simple. In the field of low-rank matrix completion it would be useful to study matrices from real databases in order to try to establish ranges for the values of the rank. Moreover, since this is a typical context in which one works with very large matrices, future work could be devoted to developing parallel methods. Structured variants, such as nonnegative low-rank completions, are suggested by applications; these may be the subject of future work.

Figure 1 .
Figure 1. The figure on the left shows the graph of f(ε) for the problem of recovering M by a rank-1 matrix. The figure on the right shows that f(ε) is identically equal to zero for ε ≥ ε* when we require the rank of the solution to be equal to 2 in completing Y.

The results are shown in Table 10(a).

Table 10 .
The table shows the values of the cost function f_k for different values of the rank k, together with the relative errors. The true rank is 8. The values of the objective for the different ranks are represented in figure (b).

k    f_k                 err
8    … × 10^−31          0.22641 × 10^−7
9    1.6394 × 10^−30     2.76671 × 10^−7
10   3.9834 × 10^−15     2.83831 × 10^−7

Table 1 .
Computational results from recovering three matrices of different dimensions and 30% of known entries. The rank is fixed to 10.

Table 2 .
Computational results from recovering three matrices of different ranks and 30% of known entries. The dimension is always 1000 × 1000.

Table 3 .
Computational results obtained by coupling the modified Newton method and bisection.

Table 4 .
Computational results obtained by employing the function g(ε).

Table 5 .
Comparison between different approaches to the outer iteration. The table shows the number of iterations performed by each method and the optimal values of the cost function.

Table 6 .
Comparison between different approaches to the outer iteration. The table shows the real error and the timing.

Table 8 .
The table shows the behaviour of the codes when we recover the matrix M fixing different values of the rank, according to the order of magnitude of the singular values of the exact full-rank solution A.

rank   σ order      f(ε*)               value from [4]
…      ≈ 1 × 10^−3  4.98621 × 10^−11    4.2904 × 10^−8
24     ≈ 1 × 10^−4  8.98261 × 10^−32    6.0490 × 10^−16

Table 9 .
The table shows the behaviour of the codes when we recover the 50 × 50 matrix M with ≈49.88% of given elements. The rank is 3. Our results are marked by an asterisk.