A Unified Proximity Algorithm with Adaptive Penalty for Nuclear Norm Minimization

Abstract: The nuclear norm minimization (NNM) problem is to recover a matrix that minimizes the sum of its singular values while simultaneously satisfying some linear constraints. The alternating direction method (ADM) has recently been used to solve this problem. However, the subproblems in ADM are usually not easily solvable when the linear mappings in the constraints are not identities. In this paper, we propose a proximity algorithm with adaptive penalty (PA-AP). First, we formulate the nuclear norm minimization problems into a unified model. To solve this model, we improve the ADM by adding a proximal term to the subproblems that are difficult to solve. An adaptive tactic on the proximity parameters is also put forward for acceleration. By employing subdifferentials and proximity operators, an equivalent fixed-point equation system is constructed, and we use this system to further prove the convergence of the proposed algorithm under certain conditions, e.g., that the precondition matrix is symmetric positive definite. Finally, experimental results and comparisons with state-of-the-art methods, e.g., ADM, IADM-CG and IADM-BB, show that the proposed algorithm is effective.


Introduction
The rank minimization (RM) problem aims to recover an unknown low-rank matrix from very limited information. It has rapidly gained an increasing amount of attention in recent years, since it has a range of applications in many computer vision and machine learning areas, such as collaborative filtering [1], subspace segmentation [2], non-rigid structure from motion [3] and image inpainting [4]. This paper deals with the following rank minimization problem:

min_{X ∈ R^{m×n}} rank(X) s.t. AX = b, (1)

where A : R^{m×n} → R^p is a linear map and the vector b ∈ R^p is known. The matrix completion (MC) problem is a special case of the RM problem, where A is a sampling operator of the form AX = X_Ω, Ω ⊂ {1, 2, . . . , m} × {1, 2, . . . , n} is an index subset, and X_Ω is the vector formed by the entries of X with indices in Ω.
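For the MC special case, the sampling operator and its adjoint are simple to realize. The sketch below is illustrative only; representing Ω as a pair of row/column index arrays is a choice of this sketch, not the paper's:

```python
import numpy as np

def sample_operator(X, omega):
    """A(X) = X_Omega: stack the observed entries (index pairs) into a vector."""
    rows, cols = omega
    return X[rows, cols]

def sample_adjoint(y, omega, shape):
    """A*(y): place the vector entries back into an otherwise-zero matrix."""
    rows, cols = omega
    X = np.zeros(shape)
    X[rows, cols] = y
    return X
```

Note that for this operator AA* = I (sampling after re-embedding is the identity on R^p), which is what makes the ADM subproblems of matrix completion explicitly solvable, as discussed below.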
Although problem (1) is simple in form, directly solving it is NP-hard due to the discrete nature of the rank function. One popular approach is to replace the rank function with the nuclear norm, which is the sum of the singular values of a matrix. This technique is based on the fact that nuclear norm minimization (NNM) is the tightest convex relaxation of the rank minimization problem [5]. The resulting problem is

min_{X ∈ R^{m×n}} ‖X‖_* s.t. AX = b, (2)

where ‖X‖_* := ∑_{i=1}^r σ_i(X) denotes the nuclear norm, with σ_i(X) the i-th largest singular value and r the rank of X. It has been shown that recovering a low-rank matrix can be achieved by solving problem (2) [1,6].
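As a quick sanity check of the relaxation, the nuclear norm can be evaluated directly from the singular values; a minimal NumPy illustration:

```python
import numpy as np

def nuclear_norm(X):
    # ||X||_* = sum of the singular values of X
    return np.linalg.svd(X, compute_uv=False).sum()

# A rank-1 matrix u v^T has a single nonzero singular value ||u||*||v||,
# so its nuclear norm equals that value while its rank is 1.
```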
In practical applications, the observed data may be corrupted with noise, namely b = AX + e, where e contains measurement errors governed by some normal distribution. In order to recover the low-rank matrix robustly, problem (2) should be modified to the following inequality-constrained problem:

min_{X ∈ R^{m×n}} ‖X‖_* s.t. ‖AX − b‖_2 ≤ δ, (3)

where ‖·‖_2 is the ℓ2 norm of a vector and the constant δ ≥ 0 is the noise level. When δ = 0, problem (3) reduces to the noiseless case (2). Alternatively, problems (2) and (3) can be rewritten, under some conditions, as the nuclear norm regularized least-squares (NNRLS) problem

min_{X ∈ R^{m×n}} ‖X‖_* + (γ/2)‖AX − b‖_2^2, (4)

where γ > 0 is a given parameter.
The studies on the nuclear norm minimization problem mainly follow two directions. The first is enhancing the precision of the low-rank approximation by replacing the nuclear norm with a non-convex regularizer, for instance, the Schatten p-norm [7,8], the truncated nuclear norm [4,9], and log- or fraction-function-based norms [10,11]. The second is improving the efficiency of solving problems (2), (3) and (4) and their variants. For instance, the authors in [12] treated problem (2) as a standard linear semidefinite programming (SDP) problem and proposed a solver based on the dual problem. However, since the SDP solver uses second-order information, as the size of the matrix grows, the memory required to compute the descent direction quickly becomes too large. Therefore, algorithms that use only first-order information have been developed, such as the singular value thresholding (SVT) algorithm [13], the fixed point continuation algorithm (FPCA) [14], the accelerated proximal gradient Lagrangian (APGL) method [15], the proximal point algorithm based on the indicator function (PPA-IF) [16], the augmented Lagrange multiplier (ALM) algorithm [17] and alternating direction methods (ADM) [18-21].
In particular, Chen et al. [18] applied the ADM to solve the nuclear-norm-based matrix completion problem. Due to the simplicity of the linear mapping A in that setting, i.e., AA* = I, all of the ADM subproblems of the matrix completion problem can be solved exactly by explicit formulas; see [18] for details. Here and hereafter, A* and I represent the adjoint of A and the identity operator. However, for a generic A with AA* ≠ I, some of the resulting subproblems no longer have closed-form solutions. Thus, the efficiency of the ADM depends heavily on how these harder subproblems are solved.
To overcome this difficulty, a common strategy is to introduce new auxiliary variables; e.g., in [19], one auxiliary variable was introduced for solving problem (2), while two auxiliary variables were introduced for problem (3). However, with more variables and more constraints, more memory is required and the convergence of ADM also becomes slower. Moreover, to update the auxiliary variables, whose subproblems are least-squares problems, expensive matrix inversions are often necessary. Even worse, the convergence of ADM with more than two variable blocks is not guaranteed. To mitigate these problems, Yang and Yuan [21] presented a linearized ADM to solve the NNRLS (4) as well as problems (2) and (3), where each subproblem admits an explicit solution. Instead of the linearization technique, Xiao and Jin [19] solved one least-squares subproblem iteratively by the Barzilai-Borwein (BB) gradient method [22].
Unlike [19], Jin et al. [20] used the linear conjugate gradient (CG) algorithm rather than BB to solve the subproblem.
In this paper, we further investigate the efficiency of ADM in solving nuclear norm minimization problems. We first reformulate problems (2), (3) and (4) into a unified form:

min_{X ∈ R^{m×n}} f(X) + g(AX), (5)

where f : R^{m×n} → R and g : R^p → R are convex. In this paper, we always fix f(·) = ‖·‖_*. When considering problems (2) and (3), we choose g(·) = χ_{B_δ}(· − b), where χ_{B_δ}(·) denotes the indicator function over the ball B_δ := {x ∈ R^p : ‖x‖_2 ≤ δ} (with δ = 0 for problem (2)). When considering problem (4), we choose g(·) = (γ/2)‖· − b‖_2^2. As a result, for a general linear mapping A, we only need to solve a problem whose objective function is a sum of two convex functions, one of which contains an affine transformation.
Motivated by the ADM algorithms above, we then present a unified proximity algorithm with adaptive penalty (PA-AP) to solve Label (5). In particular, we employ the proximity idea to deal with the problems encountered by the present ADM, by adding a proximity term to one of the subproblems. We term the proposed algorithm as a proximity algorithm because we can rewrite it as a fixed-point equation system of proximity operators of f and g. By analyzing the fixed-point equations and applying the "Condition-M" [23], the convergence of the algorithm is proved under some assumptions. Furthermore, to improve the efficiency, an adaptive tactic on the proximity parameters is put forward. This paper is closely related to the works [23][24][25][26]. However, this paper is motivated to improve ADM to solve the nuclear norm minimization problem with linear affine constraints.
The organization of this paper is as follows. In Section 2, a review of ADM and its application to NNM is provided; in addition, the relevant properties of subdifferentials and proximity operators are introduced. In Section 3, a proximity algorithm with adaptive penalty is proposed to improve ADM, and the convergence of the proposed algorithm is established. Section 4 demonstrates the performance and effectiveness of the algorithm through numerical experiments. Finally, we conclude in Section 5.

Preliminaries
In this section, we give a brief review on ADM and its applications to the NNM problem (2) developed in [19,20]. In addition, some preliminaries on subdifferentials and proximity operators are given. Throughout this paper, linear maps are denoted with calligraphic letters (e.g., A), while capital letters represent matrices (e.g., A), and boldface lowercase letters represent vectors (e.g., x).
We begin by introducing the ADM. The basic idea of ADM goes back to the work of Gabay and Mercier [27]. ADM is designed to solve the separable convex minimization problem

min_{x,y} θ_1(x) + θ_2(y) s.t. Ax + By = c, (7)

where θ_1 : R^s → R, θ_2 : R^t → R are convex functions, and A ∈ R^{l×s}, B ∈ R^{l×t} and c ∈ R^l. The corresponding augmented Lagrangian function is

L_β(x, y, λ) = θ_1(x) + θ_2(y) − ⟨λ, Ax + By − c⟩ + (β/2)‖Ax + By − c‖_2^2, (8)

where λ ∈ R^l is the Lagrangian multiplier and β > 0 is a penalty parameter. ADM minimizes (8) first with respect to x, then with respect to y, and finally updates λ, iterating

x^{k+1} = argmin_x L_β(x, y^k, λ^k),
y^{k+1} = argmin_y L_β(x^{k+1}, y, λ^k), (9)
λ^{k+1} = λ^k − β(Ax^{k+1} + By^{k+1} − c).

The main advantage of ADM is that it exploits the separable structure of the objective function.

To solve (2) based on ADM, the authors in [19,20] introduced an auxiliary variable Y and equivalently transformed the original model into

min_{X,Y} ‖X‖_* s.t. X = Y, AY = b. (10)

The augmented Lagrangian function of (10) is

L(X, Y, Z, z) = ‖X‖_* − ⟨Z, X − Y⟩ + (µ/2)‖X − Y‖_F^2 − ⟨z, AY − b⟩ + (γ/2)‖AY − b‖_2^2, (11)

where Z ∈ R^{m×n}, z ∈ R^p are Lagrangian multipliers, µ, γ > 0 are penalty parameters and ⟨·, ·⟩ denotes the Frobenius inner product, i.e., the matrices are treated like vectors. Following the idea of ADM, given (X^k, Y^k), the next pair (X^{k+1}, Y^{k+1}) is determined by alternately minimizing (11). Firstly, X^{k+1} can be updated by

X^{k+1} = argmin_X ‖X‖_* − ⟨Z^k, X − Y^k⟩ + (µ/2)‖X − Y^k‖_F^2 = prox_{(1/µ)‖·‖_*}(Y^k + Z^k/µ), (12)

which in fact corresponds to evaluating the proximity operator of ‖·‖_*, i.e., prox_{(1/µ)‖·‖_*}, which is defined below.
Secondly, given (X^{k+1}, Z^k, z^k), Y^{k+1} can be updated by

Y^{k+1} = argmin_Y −⟨Z^k, X^{k+1} − Y⟩ + (µ/2)‖X^{k+1} − Y‖_F^2 − ⟨z^k, AY − b⟩ + (γ/2)‖AY − b‖_2^2, (13)

which is a least-squares subproblem. Its solution can be found by solving the linear equation

(µI + γA*A)Y = µX^{k+1} − Z^k + A*(z^k + γb). (14)

However, computing the matrix inverse (µI + γA*A)^{−1} is too costly to implement. Though [19,20] adopted inverse-free methods, i.e., BB and CG, to solve (14) iteratively, they are still inefficient, as will be shown in Section 4.
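The point of the remark above is that (µI + γA*A) should only be touched through matrix-vector products. A sketch of such an inverse-free solve follows (a plain conjugate gradient on a vectorized unknown; this is a generic CG, not the exact routine of [19] or [20]):

```python
import numpy as np

def solve_subproblem_cg(A, rhs, mu, gamma, tol=1e-10, maxit=200):
    """Solve (mu*I + gamma*A^T A) y = rhs by conjugate gradient,
    touching A only through matrix-vector products (no inverse is formed).
    The system matrix is symmetric positive definite since mu > 0."""
    def Hv(v):
        return mu * v + gamma * (A.T @ (A @ v))
    y = np.zeros_like(rhs)
    r = rhs - Hv(y)          # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(maxit):
        Hp = Hv(p)
        alpha = rs / (p @ Hp)
        y += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return y
```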
Next, we review the definitions of the subdifferential and the proximity operator, which play an important role in the algorithm and the convergence analysis. The subdifferential of a convex function θ at a point x ∈ R^d is the set defined by

∂θ(x) := {y ∈ R^d : θ(z) ≥ θ(x) + ⟨y, z − x⟩, ∀z ∈ R^d}. (15)

The conjugate function of θ is denoted by θ*, which is defined by

θ*(y) := sup_{x ∈ dom(θ)} {⟨x, y⟩ − θ(x)}, (16)

where dom(·) denotes the domain of a function. For a given positive definite matrix H, the weighted inner product is defined by ⟨x, y⟩_H = ⟨Hx, y⟩. Furthermore, the proximity operator of θ at x with respect to H [23] is defined by

prox_{θ,H}(x) := argmin_u { (1/2)‖u − x‖_H^2 + θ(u) }, (17)

and prox_{(1/β)θ}(·) is short for prox_{θ,βI}(·). A relation between subdifferentials and proximity operators is that

y ∈ ∂θ(x) ⟺ x = prox_{θ,H}(x + H^{−1}y), (18)

which is frequently used to construct fixed-point equations and to prove convergence of the algorithm.
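Relation (18) can be checked numerically on a simple function. Taking θ = ‖·‖_1 (whose prox with H = βI is componentwise soft-thresholding) and a valid subgradient y ∈ ∂θ(x), the identity x = prox_{(1/β)θ}(x + y/β) holds; a small sketch with hand-picked values:

```python
import numpy as np

def soft_threshold(v, t):
    """prox_{t||.||_1}(v): componentwise shrinkage toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

beta = 2.0
x = np.array([1.5, -0.3, 0.0])
# A valid subgradient of ||.||_1 at x: sign(x_i) where x_i != 0,
# any value in [-1, 1] where x_i = 0.
y = np.array([1.0, -1.0, 0.4])
# Relation (18) with theta = ||.||_1 and H = beta*I:
#   y in d theta(x)  <=>  x = prox_{(1/beta) theta}(x + y/beta)
recovered = soft_threshold(x + y / beta, 1.0 / beta)
```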

Proximity Algorithm with Adaptive Penalty
In Section 2, it is shown that the current ADM results in expensive matrix inverse computation when solving (2). Therefore, it is desirable to improve it. In this section, we propose a proximity algorithm with adaptive penalty (PA-AP) to solve the unified problem (5).

Proximity Algorithm
We derive our algorithm from ADM. First of all, we introduce an auxiliary variable y ∈ R^p and convert (5) to

min_{X,y} f(X) + g(y) s.t. AX = y. (19)

The augmented Lagrangian function of (19) is defined by

L(X, y, λ) = f(X) + g(y) − ⟨λ, AX − y⟩ + (µ/2)‖AX − y‖_2^2, (20)

where λ ∈ R^p is the Lagrangian multiplier and µ > 0 is a penalty parameter. ADM first updates X by minimizing (20) with y fixed and then updates y with X fixed at its latest value, until some convergence criteria are satisfied. After some simplification, we get

X^{k+1} = argmin_X f(X) − ⟨λ^k, AX − y^k⟩ + (µ/2)‖AX − y^k‖_2^2,
y^{k+1} = argmin_y g(y) − ⟨λ^k, AX^{k+1} − y⟩ + (µ/2)‖AX^{k+1} − y‖_2^2, (21)
λ^{k+1} = λ^k − µ(AX^{k+1} − y^{k+1}).

Note that the subproblem for X^{k+1} usually has no closed-form solution when A is not the identity. In order to design an efficient algorithm, we add a proximity term to this subproblem. More precisely, we propose the following algorithm:

X^{k+1} = argmin_X f(X) − ⟨λ^k, AX − y^k⟩ + (µ/2)‖AX − y^k‖_2^2 + (1/2)‖X − X^k‖_Q^2,
y^{k+1} = argmin_y g(y) − ⟨λ^k, AX^{k+1} − y⟩ + (µ/2)‖AX^{k+1} − y‖_2^2, (22)
λ^{k+1} = λ^k − µ(AX^{k+1} − y^{k+1}),

where Q ∈ R^{m×m} is a symmetric positive definite matrix and ‖X‖_Q^2 := ⟨X, X⟩_Q. The next lemma shows that (22) is equivalent to a fixed-point equation system of some proximity operators. The proof is similar to [25].

Lemma 1. Let H := Q + µA*A and z^k := −λ^k. Then, iteration (22) is equivalent to solving the following equations:

X^{k+1} = prox_{f,H}( X^k − H^{−1}A*(2z^k − z^{k−1}) ),
z^{k+1} = prox_{g*,(1/µ)I}( z^k + µAX^{k+1} ). (23)
Proof. By changing the order of the iterations, the first-order optimality conditions of (22) read

0 ∈ ∂f(X^{k+1}) − A*λ^k + µA*(AX^{k+1} − y^k) + Q(X^{k+1} − X^k),
0 ∈ ∂g(y^{k+1}) + λ^k + µ(y^{k+1} − AX^{k+1}), (24)
λ^{k+1} = λ^k − µ(AX^{k+1} − y^{k+1}).

Denote z^k := −λ^k, so that the third line of (24) becomes z^{k+1} = z^k + µ(AX^{k+1} − y^{k+1}), and hence µ(AX^k − y^k) = z^k − z^{k−1}. Rearranging the first line of (24) gives

−[(Q + µA*A)X^{k+1} − QX^k + A*(z^k − µy^k)] ∈ ∂f(X^{k+1}),

and applying relation (18) with H = Q + µA*A yields X^{k+1} = prox_{f,H}(X^k − H^{−1}A*(z^k + µ(AX^k − y^k))), which is the first equation of (23) since z^k + µ(AX^k − y^k) = 2z^k − z^{k−1}. For the second line of (24), we have z^{k+1} = z^k + µ(AX^{k+1} − y^{k+1}) ∈ ∂g(y^{k+1}), which is equivalent to y^{k+1} ∈ ∂g*(z^{k+1}). Applying (18) with H = (1/µ)I and substituting y^{k+1} = AX^{k+1} + (z^k − z^{k+1})/µ yields the second equation of (23). Therefore, the results in (23) are achieved.
We now discuss the choice of Q. To make the first subproblem of (22) have a closed-form solution, in this paper we simply choose Q = µηI − µA*A. To make sure Q is positive definite, η must satisfy η > ‖A‖_2^2, where ‖·‖_2 is the spectral norm. By substituting Q into (22), we obtain

X^{k+1} = argmin_X f(X) + (µη/2)‖X − X^k − A*(λ^k − µ(AX^k − y^k))/(µη)‖_F^2,
y^{k+1} = argmin_y g(y) − ⟨λ^k, AX^{k+1} − y⟩ + (µ/2)‖AX^{k+1} − y‖_2^2, (30)
λ^{k+1} = λ^k − µ(AX^{k+1} − y^{k+1}).

The subproblems in (30) can be solved explicitly based on proximity operators. Specifically, for f(·) = ‖·‖_*, we have

X^{k+1} = prox_{(1/(µη))‖·‖_*}( X^k + A*(λ^k − µ(AX^k − y^k))/(µη) ) = U max{Σ − (1/(µη))I, 0} V^T,

where UΣV^T is the singular value decomposition (SVD) of X^k + A*(λ^k − µ(AX^k − y^k))/(µη), U ∈ R^{m×m}, V ∈ R^{n×n}, Σ ∈ R^{m×n} is a diagonal matrix containing the singular values, and the maximum is taken entrywise on the diagonal. Similarly, y^{k+1} = prox_{(1/µ)g}(AX^{k+1} − λ^k/µ), which has a closed form for both choices of g in (5).
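The X-update of (30) is a singular value thresholding step. A compact sketch under the paper's notation (A and its adjoint are passed as callables; the helper names are ours, not the paper's):

```python
import numpy as np

def svt(W, tau):
    """prox_{tau||.||_*}(W): soft-threshold the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def pa_ap_x_step(X, y_vec, lam, A, A_adj, mu, eta):
    """X-update of (30): a single SVT applied to a gradient-like point.
    A maps matrices to vectors; A_adj is its adjoint."""
    W = X + A_adj(lam - mu * (A(X) - y_vec)) / (mu * eta)
    return svt(W, 1.0 / (mu * eta))
```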

Adaptive Penalty
In previous proximity algorithms [23,28,29], the penalty parameter µ is usually fixed. In view of the linearized ADM, Liu et al. [26] presented an adaptive updating strategy for the penalty parameter µ. Motivated by it, we update µ by

µ^{k+1} = min(µ_max, ρµ^k), (32)

where µ_max is an upper bound of {µ^k}. The value of ρ is defined as

ρ = ρ_0 if µ^k max{√η ‖X^{k+1} − X^k‖_F, ‖y^{k+1} − y^k‖_2} / ‖b‖_2 < ε_1, and ρ = 1 otherwise, (33)

where ρ_0 > 1 and ε_1 > 0 are given.
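A sketch of this adaptive rule, with the stabilization criterion modeled on the linearized-ADM strategy of [26] (the exact criterion in (33) follows that reference and is an assumption of this sketch):

```python
import numpy as np

def update_mu(mu, mu_max, rho0, eps1, eta, dX_fro, dy_norm, b_norm):
    """Adaptive penalty in the style of (32)-(33): grow mu by rho0 only when
    the iterates have nearly stabilized, and cap it at mu_max.
    dX_fro = ||X^{k+1}-X^k||_F, dy_norm = ||y^{k+1}-y^k||_2."""
    crit = mu * max(np.sqrt(eta) * dX_fro, dy_norm) / b_norm
    rho = rho0 if crit < eps1 else 1.0
    return min(mu_max, rho * mu)
```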
Based on the above analysis in Sections 3.1 and 3.2, the proximity algorithm with adaptive penalty (abbr. PA-AP) for solving (5) can be outlined in Algorithm 1.

Convergence
In this section, we establish the convergence of (22). For problem (5), Li et al. [23] presented a general framework of fixed-point algorithms. Given two symmetric positive definite matrices S ∈ R^{p×p} and T ∈ R^{m×m}, one can define linear maps R and E (built from S, T and A; see [23] for their explicit forms), which map (R^p) × (R^{m×n}) into itself. Defining Z := (z, X) ∈ (R^p) × (R^{m×n}) and T := prox_{(g*+f)(·),R}, the solution to (5) is equivalent to the fixed-point equation

Z = T(EZ). (34)

Furthermore, a multi-step proximity algorithm was proposed in [23], which updates Z^{k+1} from the previous iterates through T and a family of linear maps M_i. Li et al. [23] proved that the sequence {Z^k} generated by this algorithm converges to a solution of problem (5) if the maps satisfy the "Condition-M", a set of requirements (i)-(v) stated in terms of the null space N(H) and the Moore-Penrose pseudo-inverse H† of the associated operators; see [23] for the precise statement. By checking Condition-M, we prove the convergence of (22).
Since H is positive definite, it follows that N(H) = {0}, which implies that Item (iv) of Condition-M holds. Finally, M_2 = 0 implies that Item (v) holds. Consequently, the sequence generated by (23) converges to the solution of (5). The equivalence of (22) and (23) then proves the result.
If η > ‖A‖_2^2, then the sequence generated by (30) converges to the solution of (5).

Numerical Experiments
In this section, we present numerical experiments to show the effectiveness of the proposed algorithm (PA-AP). To this end, we test the algorithms on the nuclear norm minimization problem, the noiseless matrix completion problem (δ = 0), noisy matrix completion (δ > 0) and a low-rank image recovery problem. We compare PA-AP against ADM [18], IADM-CG [20] and IADM-BB [19]. All experiments are performed under Windows 10 and MATLAB R2016 running on a Lenovo laptop with an Intel Core i7 CPU at 2.7 GHz and 8 GB of memory. In the first two parts of the numerical experiments, we use randomly generated square matrices for the simulations. We denote the true solution by X* ∈ R^{m×m}, and generate the rank-r matrix X* as a product X_L X_R^T, where X_L and X_R are independent m × r matrices with i.i.d. Gaussian entries. For each test, the stopping criterion is

RelChg := ‖X^{k+1} − X^k‖_F / ‖X^k‖_F ≤ ε_2,

where ε_2 > 0. The algorithms are also forced to stop when the iteration number exceeds 10^3. Let X̂ be the solution obtained by an algorithm. We use the relative error

RelErr := ‖X̂ − X*‖_F / ‖X*‖_F

to measure the quality of X̂ compared to the original matrix X*. Each iteration of computing X^{k+1} in PA-AP contains an SVD that computes all singular values and singular vectors, although we actually only need the ones bigger than 1/(µη); the full SVD is thus the main computational load. Fortunately, this disadvantage can be alleviated by using the software PROPACK [30], which is designed to compute the singular values bigger than a threshold and the corresponding vectors. Although PROPACK can calculate a fixed number of leading singular values, it cannot automatically determine the number of singular values greater than 1/(µη). Therefore, in order to perform a partial SVD, we need to predict the number of singular values and vectors to calculate in each iteration, which is denoted by sv_k.
We initialize sv_0 = 0.01m and update it in each iteration based on svp_k, the number of singular values among the sv_k computed that are bigger than 1/(µη). We use r and p to represent the rank of an (m × n) matrix and the cardinality of the index set Ω, i.e., p = |Ω|, and use sr = p/(mn) to represent the sampling rate. The "degree of freedom" of a matrix with rank r is defined by dof = r(m + n − r). For PA-AP, we set ε_1 = 10^{-4}, µ_0 = 1/‖b‖_2, µ_max = max{10^3 µ_0, 10^{-2}}, ρ_0 = 1.7, and η = 1.01‖A‖_2^2. In all the experimental results, boldface numbers indicate the best results.
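The text does not reproduce the exact update rule for sv_k, so the sketch below uses the heuristic commonly paired with PROPACK-style partial SVDs (grow the prediction slowly while the threshold cuts inside the computed block); that rule is an assumption, and for clarity the sketch falls back to a full NumPy SVD where PROPACK would compute only the leading triplets:

```python
import numpy as np

def svd_with_prediction(W, sv, tau, m):
    """Compute the top-sv singular triplets of W (full SVD here for clarity;
    PROPACK would compute only the leading ones), count how many exceed tau,
    and predict the number to compute at the next iteration (heuristic)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U, s, Vt = U[:, :sv], s[:sv], Vt[:sv, :]
    svp = int(np.sum(s > tau))
    if svp < sv:
        # the threshold cut inside the computed block: the prediction sufficed
        sv_next = svp + 1
    else:
        # all computed values exceeded tau: enlarge the prediction
        sv_next = min(svp + round(0.05 * m), m)
    return U, s, Vt, svp, sv_next
```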

Nuclear Norm Minimization Problem
In this subsection, we use PA-AP to solve the three types of problems (2)-(4). The linear map A is chosen as a partial discrete cosine transform (DCT) matrix, generated by a MATLAB script so that A maps R^{m×n} into R^p. In the noise model (3), we further set

b = AX* + ω,

where ω is additive Gaussian noise of zero mean and standard deviation σ. In (3), the noise level δ is chosen as ‖ω‖_2.
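The MATLAB script itself is not reproduced above; one possible realization of a partial DCT sensing map (an explicit orthonormal DCT-II matrix with p randomly kept rows, practical only for small mn, and an assumption of this sketch rather than the authors' exact construction) is:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row normalization
    return C

def partial_dct_map(m, n, p, rng):
    """Vectorize X in R^{m x n}, apply the (mn x mn) DCT, keep p random rows,
    so that A maps R^{m x n} into R^p. Returns A, its adjoint, and the matrix."""
    C = dct_matrix(m * n)
    rows = rng.choice(m * n, size=p, replace=False)
    A = C[rows, :]
    op = lambda X: A @ X.ravel()
    op_adj = lambda v: (A.T @ v).reshape(m, n)
    return op, op_adj, A
```

Because the rows of an orthonormal DCT matrix are orthonormal, this map satisfies AA* = I, matching the setting in which the y-subproblem stays cheap.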
The results are listed in Table 1, where the number of iterations (Iter) and the CPU time in seconds (Time), besides RelErr, are reported. To further illustrate the efficiency of PA-AP, we test problems with different matrix sizes and sampling rates (sr). In Table 2, we compare PA-AP with IADM-CG and IADM-BB for solving the NNRLS problem (4). The table shows that our method is more efficient than the other two methods, and is thus suitable for solving large-scale problems.

Matrix Completion
This subsection adopts the PA-AP method to solve the noiseless matrix completion problem (2) and the noisy matrix completion problem (3) to verify its validity. The mapping A is a linear projection operator defined as AX* = X*_Ω, where X*_Ω is the vector formed by the components of X* with indices in Ω. The set Ω is generated by randomly permuting the indices of all entries and selecting the first sr × m × n of them. For the noisy matrix completion problems, we take δ = 10^{-2}.
In Table 3, we report the numerical results of PA-AP for noiseless and noisy matrix completion problems, taking m = n = 1000 and m = n = 2000; the rank of the original matrix is taken as r = 10 or r = 20. As can be seen from Table 3, the PA-AP method can effectively solve these problems. Compared with the noiseless problems, the accuracy of the solutions obtained on the noisy problems drops. Moreover, the number of iterations and the running time decrease as sr increases.

Table 3. PA-AP for noiseless and noisy matrix completion problems (m = n, ε_2 = 10^{-5}).

To further verify the validity of the PA-AP method, it is compared with ADM, IADM-CG and IADM-BB. The algorithms are set to terminate when RelChg falls below 10^{-5}. The numerical results of the four methods for solving the noiseless and noisy MC problems are recorded in Tables 4 and 5, from which we can see that the computation time of the PA-AP method is much less than that of IADM-BB and IADM-CG, while the number of iterations and the computation time of PA-AP and ADM are almost the same and our method is relatively more accurate. From the limited experimental data, the PA-AP method is shown to be more effective than ADM, IADM-BB and IADM-CG.

Low-Rank Image Recovery
In this section, we turn to solving problem (2) for low-rank image recovery. The effectiveness of the PA-AP method is verified by testing three 512 × 512 grayscale images. First, the original images are transformed into low-rank images with rank 40. Then, we remove some elements from the low-rank matrix to get the damaged image, and restore it using PA-AP, ADM, IADM-BB and IADM-CG, respectively. The iteration process is stopped when RelChg falls below 10^{-5}. The original images, the corresponding low-rank images, the damaged images, and the images restored by PA-AP are depicted in Figure 1. Observing the figure, we clearly see that our algorithm performs well. To evaluate the recovery performance, we employ the Peak Signal-to-Noise Ratio (PSNR), which is defined as

PSNR = 10 log_{10}( mn‖X*‖_∞^2 / ‖X̂ − X*‖_F^2 ),

where ‖X*‖_∞ is the infinity norm of X*, defined as the maximum absolute value of the elements in X*. From the definition, a higher PSNR indicates a better recovery result. Table 6 shows the running time, relative error and PSNR of the images recovered by the different methods. From Table 6, we note that the PA-AP method obtains a higher PSNR as sr increases. Moreover, the running time of PA-AP is always much less than that of the other methods under different settings. Figure 2 shows the execution process of the different methods; it is clear that our method estimates the rank exactly after about 30 iterations, and takes much less time before termination than the other methods.

Figure 2. Convergence behavior of the four methods (Lena, sr = 0.4, ε_2 = 10^{-5}). The first subfigure is the estimated rank; the second is the relative error to the original matrix; and the last is the running time.
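The PSNR can be implemented directly from its definition (peak value taken as ‖X*‖_∞ and the mean squared error averaged over all mn entries; this convention is inferred from the text):

```python
import numpy as np

def psnr(X_true, X_hat):
    """PSNR with the peak taken as ||X*||_inf (max absolute entry)."""
    m, n = X_true.shape
    peak = np.abs(X_true).max()
    mse = np.sum((X_hat - X_true) ** 2) / (m * n)   # mean squared error
    return 10.0 * np.log10(peak ** 2 / mse)
```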

Conclusions and Future Work
In this paper, a unified model and algorithm for the matrix nuclear norm minimization problem are proposed. Each iteration of the proposed algorithm mainly involves computing a matrix singular value decomposition and evaluating the proximity operators of two convex functions. In addition, the convergence of the algorithm is proved. Extensive experimental results and numerical comparisons show that the algorithm outperforms the IADM-BB, IADM-CG and ADM algorithms.
The problem of tensor completion has been widely studied recently [31]. One of our future works is to extend the proposed PA-AP algorithm to tensor completion.