A Proximal Algorithm with Convergence Guarantee for a Nonconvex Minimization Problem Based on Reproducing Kernel Hilbert Space

The underlying function in a reproducing kernel Hilbert space (RKHS) may be degraded by outliers or deviations, resulting in a symmetric ill-posed problem. This paper proposes a nonconvex minimization model with the ℓ0 quasi-norm, based on RKHS, to describe this degraded problem. Since the underlying function in an RKHS can be represented as a linear combination of reproducing kernels and their coefficients, we turn to estimating the related coefficients in the nonconvex minimization problem. An efficient algorithm is designed to solve the given nonconvex problem via a mathematical program with equilibrium constraints (MPEC) and a proximal-based strategy. We theoretically prove that the sequences generated by the designed algorithm converge to local optimal solutions of the nonconvex problem. Numerical experiments also demonstrate the effectiveness of the proposed method.


Introduction
The reproducing kernel Hilbert space (RKHS, denoted by H) has been widely studied [1][2][3][4][5][6][7]. Its most important property is that every function in an RKHS can be linearly represented by the reproducing kernel. In addition, many studies have analyzed the properties of univariate or bivariate functions in RKHS. In discrete form, such functions can be regarded as signals or images, which allows one to build optimization models and solve application problems such as image super-resolution and image restoration.
In general, the Hilbert space can be considered as H ⊂ L²(P), where P is a probability measure on a subset X ⊂ R and H is a complete space of real-valued functions f : X → R with ‖f‖_{L²(P)} < ∞. Moreover, the reproducing kernel κ of H is defined by two properties: (1) for any x ∈ X, the function κ(·, x) belongs to H; (2) κ has the so-called reproducing property, that is, f(x) = ⟨f, κ(·, x)⟩_H for all f ∈ H, where ⟨·, ·⟩_H denotes the associated inner product. By this relation, we obtain the Gram matrix K ∈ R^{n×n} by discretizing the reproducing kernel κ(·, x) for x ∈ R^n; thus, taking the bias e ∈ R^n into account, it is easy to get the following discrete formulation: g = Kα + e, (1) where α ∈ R^n is the coefficient vector we need to estimate; moreover, K is a real symmetric matrix.
In the real world, the underlying function g is generally polluted by outliers and Gaussian perturbation, which yields the following symmetric ill-posed problem: y = Kα + e + u + σ, (2) where u ∈ R^n can be viewed as outliers and σ ∈ R^n stands for the Gaussian perturbation. Our final goal is to accurately estimate the coefficients α and e from the known y and K. After obtaining α and e, we can recover the underlying function g via (1). Note that solving problem (2) is quite a challenging task, since the variables α, e, u and σ in (2) are all unknown, which leads to an ill-posed problem.
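To make this setting concrete, the following Python sketch simulates the discrete degradation model: it builds a Gram matrix from an assumed Gaussian reproducing kernel (the kernel choice, bandwidth, grid, and noise levels are all illustrative assumptions, not the paper's experimental setup), forms g = Kα + e as in (1), and degrades it with sparse outliers u and a Gaussian perturbation σ as in (2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = np.linspace(-2.0, 2.0, n)

# Gram matrix of an assumed Gaussian reproducing kernel on the grid;
# K is real and symmetric by construction.
h = 0.5  # illustrative bandwidth
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * h ** 2))

# Underlying function (1): g = K @ alpha + e.
alpha = 0.1 * rng.standard_normal(n)
e = 0.05 * np.ones(n)
g = K @ alpha + e

# Degradation (2): sparse outliers u plus Gaussian perturbation sigma.
u = np.zeros(n)
idx = rng.choice(n, size=n // 10, replace=False)  # 10% of entries are outliers
u[idx] = rng.choice([-1.0, 1.0], size=idx.size) * rng.uniform(1.0, 2.0, size=idx.size)
sigma = 0.01 * rng.standard_normal(n)
y = g + u + sigma  # the observed, degraded data

print(np.allclose(K, K.T), np.count_nonzero(u))
```

Recovering α, e, and u from y and K alone is exactly the ill-posed estimation task (2) addresses.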
In [8], Papageorgiou et al. considered that the function g can be linearly represented by a coefficient vector α and a constant c as g = Kα + c1, and proposed a kernel regularized orthogonal matching pursuit (KROMP) method to solve the resulting nonconvex problem. However, the KROMP method has two weaknesses: first, the constant bias c is neither general nor flexible; second, the convergence of its algorithm is not guaranteed theoretically. Therefore, this paper establishes a nonconvex optimization model for the degraded problem (2) in RKHS, designs an algorithm whose convergence can be guaranteed, and finally shows the effectiveness of the proposed method in simulation experiments.
Regularized modeling is a promising way to deal with ill-posed problems. In fact, the variable u representing outliers is generally sparse, which motivates us to formulate a sparsity-based regularization model. In particular, the ℓ0 quasi-norm, which counts the nonzero elements of a vector, is an ideal metric to depict sparsity. Therefore, the nonconvex minimization problem for solving the ill-posed problem (2) can be written as min_{α,e,u∈R^n} Φ(α, e, u), (3) where Φ comprises four terms with positive parameters µ, λ1 and λ2. The first term in (3) is deduced from the Gaussian perturbation σ under the framework of maximum a posteriori (MAP) estimation. The second and third terms are regularization terms encoding the underlying priors on α and e. The last term is the ℓ0 quasi-norm depicting the sparse prior on the outliers u. Note that the regularization model (3) is a nonconvex minimization problem due to the nonconvexity of the ℓ0 term. In practice, the ℓ0 term is often replaced by a convex surrogate (e.g., the ℓ1 norm) or handled by hard-thresholding schemes [9][10][11][12] for simpler computation and convergence guarantees. However, such surrogates lose accuracy in depicting sparsity, which may lead to unreasonable outcomes. On the other hand, nonconvex minimization problems usually raise the following difficulties: (1) Can the designed algorithm effectively solve the minimization model? (2) Can the convergence of the designed algorithm be guaranteed? (3) Does the initial value affect the convergence of the designed algorithm? Many studies have therefore been devoted to overcoming these weaknesses of nonconvex problems.
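The ℓ0 quasi-norm simply counts nonzero entries, and its proximal map is entry-wise hard thresholding, which is one reason it admits efficient updates despite being nonconvex. The following minimal Python sketch illustrates both facts (the vector w and weight mu are illustrative choices):

```python
import numpy as np

def l0_quasi_norm(u):
    """Number of nonzero entries: the l0 'norm' (not a true norm)."""
    return np.count_nonzero(u)

def prox_l0(w, mu):
    """Proximal map of mu * ||u||_0: argmin_u 0.5*||u - w||^2 + mu*||u||_0.
    Entry-wise hard thresholding: keep w_i iff 0.5 * w_i**2 > mu."""
    out = w.copy()
    out[np.abs(w) <= np.sqrt(2.0 * mu)] = 0.0
    return out

w = np.array([3.0, 0.1, -2.0, 0.05, 0.0])
print(l0_quasi_norm(w))    # 4 nonzeros
print(prox_l0(w, mu=0.5))  # the two small entries are zeroed, large ones kept
```

Unlike the ℓ1 surrogate, hard thresholding leaves the surviving entries unshrunk, which is precisely the "accuracy in depicting sparsity" the paragraph above refers to.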
Recently, it has been shown that such nonconvex problems can be reformulated as equivalent minimization problems via mathematical programs with equilibrium constraints (MPEC), which can be effectively solved by classical algorithms [13][14][15]. For instance, Yuan et al. [14] proposed an equivalent biconvex MPEC formulation for the ℓ0 quasi-norm in nonconvex minimization problems. Additionally, proximal alternating algorithms have been widely used to solve nonconvex and nonsmooth problems [16][17][18][19][20][21][22][23][24]. In [18], Bolte et al. proposed the proximal alternating linearized minimization (PALM) framework for nonconvex and nonsmooth minimization problems and gave a convergence analysis of the algorithm.
In this paper, we focus on the above-mentioned difficulties of the nonconvex minimization problem (3) and design an efficient algorithm with a theoretical convergence guarantee. A simple and representative example is employed to verify the effectiveness of the proposed method. The contributions of this work can be summarized as: (1) a new nonconvex minimization model based on RKHS; (2) a convergence guarantee for the designed algorithm applied to the nonconvex problem.
The outline of this paper is as follows. Section 2 details the algorithm for the nonconvex minimization problem (3). Section 3 presents the convergence analysis of the given algorithm. Numerical results are reported in Section 4. Finally, conclusions are drawn in Section 5.

The Solution for the Nonconvex Minimization Problem
Based on the MPEC lemma for the ℓ0 quasi-norm (see [14] for details), namely ‖u‖₀ = min_{v∈[0,1]^n} ⟨1, 1 − v⟩ s.t. v ⊙ |u| = 0, the nonconvex minimization problem (3) can be equivalently reformulated as a constrained model (4) over (α, e, u, v), where ⊙ denotes point-wise multiplication and ι_{[0,1]}(v) is the indicator function constraining the elements of v to lie in [0, 1]. The constrained minimization problem (4) can then be rewritten as an unconstrained minimization problem (5) by the penalty strategy. We utilize a proximal-based algorithm to deal with the unconstrained problem (5) by alternately solving for each variable, which leads to four subproblems, in which the related parameters are all positive, i.e., δ1 > 0, δ2 > 0, τ1 = γ1‖KᵀK‖_F and τ2 = γ2‖I_mᵀI_m‖_F, where I_m ∈ R^{m×m} is the identity matrix and ‖·‖_F denotes the Frobenius norm. In particular, the above subproblems are all convex, so their closed-form solutions (10)-(13) can be easily calculated. We iteratively and alternately update α^{k+1}, e^{k+1}, u^{k+1} and v^{k+1} according to (10)-(13). The final algorithm for the nonconvex minimization problem (3) is summarized in Algorithm 1.
In Algorithm 1, "Maxiter" denotes the maximum number of iterations and "ρ" the tolerance on the relative error between adjacent iterates. When the iteration stops, the final underlying function is estimated via relation (1).
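Since the closed-form updates (10)-(13) are stated in the paper's own notation, the following Python sketch only illustrates the flavor of the proximal alternating scheme on an assumed instance of (3) with squared-ℓ2 regularizers on α and e; the u-update is written directly as the hard-threshold proximal map of the ℓ0 term rather than through the MPEC variable v, and all parameters and data are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = np.linspace(-2.0, 2.0, n)
K = np.exp(-np.subtract.outer(x, x) ** 2)      # symmetric Gram matrix (Gaussian kernel)

a_true = 0.1 * rng.standard_normal(n)
u_true = np.zeros(n)
u_true[rng.choice(n, 4, replace=False)] = 2.0  # sparse outliers
y = K @ a_true + 0.05 + u_true + 0.01 * rng.standard_normal(n)

# Assumed objective: 0.5*||Ka + e + u - y||^2 + (l1/2)*||a||^2 + (l2/2)*||e||^2 + mu*||u||_0
l1, l2, mu = 1e-2, 1e-2, 0.5
tau1 = 1.1 * np.linalg.norm(K.T @ K, 'fro')    # proximal step constant for the a-block
tau2 = 1.1                                     # exceeds the Lipschitz bound (=1) of the e-block
delta = 1.0                                    # proximal weight for the u-block

def Phi(a, e, u):
    r = K @ a + e + u - y
    return 0.5 * r @ r + 0.5 * l1 * a @ a + 0.5 * l2 * e @ e + mu * np.count_nonzero(u)

a, e, u = np.zeros(n), np.zeros(n), np.zeros(n)
vals = [Phi(a, e, u)]
for _ in range(200):
    r = K @ a + e + u - y
    a = (tau1 * a - K.T @ r) / (tau1 + l1)             # prox-linear step in a
    r = K @ a + e + u - y
    e = (tau2 * e - r) / (tau2 + l2)                   # prox-linear step in e
    w = (delta * u + (y - K @ a - e)) / (delta + 1.0)  # quadratic part of the u-subproblem
    u = np.where(np.abs(w) > np.sqrt(2 * mu / (delta + 1.0)), w, 0.0)  # hard threshold
    vals.append(Phi(a, e, u))

print(vals[0], vals[-1])
```

Even this simplified variant exhibits the monotone decrease of the objective that the convergence analysis below establishes for the actual algorithm.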

Convergence Analysis
For the sake of notational simplicity, we write z := (α, e, u, v) and z^k := (α^k, e^k, u^k, v^k). Lemma 1. Let the bounded sequence {z^k}_{k∈N} be generated by the designed algorithm. Then the sequence {Φ(z^k)}_{k∈N} sufficiently decreases: there exists ρ > 0, depending on δ1, δ2, τ1 − L1 and τ2 − L2, such that Φ(z^{k+1}) + ρ‖z^{k+1} − z^k‖² ≤ Φ(z^k), where L1 is the Lipschitz constant of ∂_α H with respect to α and L2 is that of ∂_e H with respect to e.

Lemma 3.
Let the bounded sequence {z^k}_{k∈N} be generated by the designed algorithm with a bounded initial variable z^0. Then the sequence {Φ(z^k)}_{k∈N} is bounded.
Proof. The continuous function Φ is proper and coercive, i.e., Φ(z) → ∞ as ‖z‖ → ∞. Since the sequence {z^k} generated by the designed algorithm is bounded and Φ is continuous, the sequence {Φ(z^k)} is bounded.

Lemma 4. The function Φ(z) in Equation (3) is a Kurdyka-Łojasiewicz (KŁ) function (The definition of a KŁ function and some examples can be found in [18]).
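For the reader's convenience, the KŁ property invoked here can be sketched as follows (following the standard formulation in [18]; φ is the so-called desingularizing function):

```latex
% Kurdyka-Lojasiewicz property of \Phi at \bar{z} \in \mathrm{dom}\,\partial\Phi:
% there exist \eta \in (0, +\infty], a neighborhood U of \bar{z}, and a continuous
% concave function \varphi : [0, \eta) \to [0, +\infty) with \varphi(0) = 0,
% \varphi \in C^{1}((0,\eta)) and \varphi' > 0 on (0,\eta), such that
\varphi'\!\bigl(\Phi(z) - \Phi(\bar{z})\bigr)\,
\operatorname{dist}\bigl(0, \partial\Phi(z)\bigr) \;\ge\; 1
\quad \text{for all } z \in U \text{ with }
\Phi(\bar{z}) < \Phi(z) < \Phi(\bar{z}) + \eta .
```

This inequality is exactly what is used in the proof of Theorem 1 below to convert the sufficient decrease of Lemma 1 into a finite-length property of the iterates.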
Proof. According to the definition and examples of KŁ functions in [18,25], f1, f2, f3, f4 and H are polynomial functions, which are real analytic. Thus, Φ(z) is a KŁ function.
Theorem 1. Let the sequence {z^k}_{k∈N} generated by the designed algorithm be bounded. Then the sequence {z^k}_{k∈N} converges to a critical point.
Proof. Since the sequence {z^k}_{k∈N} is bounded, there must exist a subsequence {z^{k_q}}_{q∈N} converging to a limit point z̄, i.e., lim_{q→∞} z^{k_q} = z̄. Since the function Φ(z) is continuous, the sequence {Φ(z^{k_q})}_{q∈N} converges to Φ(z̄), i.e., lim_{q→∞} Φ(z^{k_q}) = Φ(z̄).
According to Lemmas 1-3, the sequence {Φ(z^k)}_{k∈N} also converges. Thus, the sequence {Φ(z^k)}_{k∈N} and the subsequence {Φ(z^{k_q})}_{q∈N} converge to the same value Φ(z̄). If there is an index k̄ such that Φ(z^{k̄}) = Φ(z̄), then by Lemma 1 the sequence {z^k}_{k∈N} has reached a stationary point and the corresponding function value no longer changes, i.e., Φ(z^{k̄+1}) = Φ(z^{k̄}). Thus z̄ is a critical point of {z^k} and the conclusion of Theorem 1 is established.
We next prove that Theorem 1 still holds when such an index k̄ does not exist. By Lemma 1, Φ(z^k) > Φ(z̄) for all k > 0. Since lim_{k→∞} Φ(z^k) = Φ(z̄), for any η > 0 there exists k₁ ∈ N such that Φ(z^k) − Φ(z̄) < η whenever k > k₁. Denote the set of limit points of the sequence {z^k} by ϑ(z^0), and let dist(z, ϑ(z^0)) denote the distance between a point z and the set ϑ(z^0), i.e., dist(z^k, ϑ(z^0)) = min{‖z^k − z‖ : z ∈ ϑ(z^0)}.
Based on Lemma 2, Φ(z) is continuous and differentiable, and 0 ∈ ∂Φ(z̄) since d^{k+1} → 0 as k → ∞. It follows that for any ε > 0 there exists a positive index k₂ ∈ N such that dist(z^k, ϑ(z^0)) < ε whenever k > k₂. Set l = max{k₁, k₂}; then for each k > l the iterates lie in the region where the KŁ inequality applies. By Lemma 4, Φ(z) is a KŁ function on the domain Ω, so there is a concave desingularizing function ϕ such that ϕ′(Φ(z^k) − Φ(z̄)) · dist(0, ∂Φ(z^k)) ≥ 1. For brevity, define Θ_{p,q} := ϕ(Φ(z^p) − Φ(z̄)) − ϕ(Φ(z^q) − Φ(z̄)) for nonnegative integers p, q ∈ N. Combining the concavity of ϕ with Lemmas 1 and 2 yields an upper bound on ‖z^{k+1} − z^k‖² in terms of Θ_{k,k+1} and ‖z^k − z^{k−1}‖. Using the well-known inequality 2√(αβ) ≤ α + β for any α, β > 0 and summing over i = l + 1, …, k, the partial sums Σ_{i=l+1}^{k} ‖z^{i+1} − z^i‖ are controlled by Θ_{l+1,k+1} plus a bounded remainder. According to the definition of Θ, lim_{k→∞} Θ_{l+1,k+1} = ϕ(Φ(z^{l+1}) − Φ(z̄)). Therefore Σ_{k=0}^{∞} ‖z^{k+1} − z^k‖ < ∞, i.e., {z^k} is a Cauchy sequence of finite length. By the completeness of the Hilbert space, the Cauchy sequence {z^k} converges. Thus, the sequence {z^k} generated by the designed algorithm converges to a critical point z* = (α*, e*, u*, v*). Moreover, this convergence guarantee holds for any bounded initial value.

Numerical Result
In this section, we conduct simple simulation examples to show the effectiveness of the proposed method. We choose f as the ground-truth function (with discrete form f) and add Gaussian noise and sparse outliers to generate the observation. The proposed method is compared with the kernel regularized orthogonal matching pursuit (KROMP) method [8], whose parameters are selected within the ranges suggested in the literature. The parameters of the proposed method are set empirically to µ = 10, λ1 ∈ [10^{−3}, 10^2], λ2 ∈ [10^{−3}, 10], β = 0.01, γ1 = 1.1 and γ2 = 10γ1; better visual and numerical results can be obtained by tuning the parameters more carefully.
The relative error (ReErr) is used for quantitative evaluation; it is a commonly used index to measure restoration quality, defined as ReErr = ‖g − f‖₂ / ‖f‖₂, where g is the restoration result estimated by each method and f is the ground truth. Experiments are implemented in MATLAB (R2016a) on a desktop with 16 GB RAM and an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz.
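A minimal implementation of this evaluation index, assuming the Euclidean norm and using hypothetical vectors f (ground truth) and g (estimate):

```python
import numpy as np

def rel_err(g, f):
    """ReErr = ||g - f||_2 / ||f||_2, with g the estimate and f the ground truth."""
    return np.linalg.norm(g - f) / np.linalg.norm(f)

f = np.array([1.0, 2.0, 3.0])  # hypothetical ground truth
g = np.array([1.0, 2.0, 3.3])  # hypothetical restoration
print(rel_err(g, f))
```

A perfect restoration gives ReErr = 0, and smaller values indicate better recovery.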
To illustrate the experimental results of the different examples, we show the ground-truth data, the degraded data polluted by noise and outliers, and the restored outcomes calculated by the proposed method and the KROMP method, respectively.

Example 2.
The bivariate continuous function f is given as follows: πx . After discretizing f, 10 dB Gaussian noise and 10% outlier noise were added to obtain the final degraded data.
As shown in Figures 1 and 2, although the shape of the ground truth in Example 2 is similar to that of Example 1 (see Figures 1a and 2a), the degree of degradation in Example 2 is in fact much greater than in Example 1, since the function values and the degree of noise pollution differ (see Figures 1b and 2b). The proposed method produces clearly restored outcomes and can effectively recover the original data (see Figure 1c); however, the restored outcomes of the KROMP method still contain obvious noise residual (Figures 1d and 2d). This also shows the effectiveness of the proposed method.

Example 3.
The bivariate continuous function f is given as follows, where x ∈ [−2, 2] and y ∈ [−2, 2] (21 discrete points in each dimension). After discretizing f, 10 dB Gaussian noise and 5% outlier noise were added to obtain the final degraded data.
It can be seen that, even under severe outlier pollution as in Example 3 (Figure 3b), the proposed method obtains more accurate recovered data (Figure 3c) than the KROMP method (Figure 3d). In addition, the relative error (ReErr) results for Examples 1 to 3 are shown in Table 1, with the better results in bold. The proposed method clearly attains smaller relative errors than the KROMP method, which verifies its effectiveness.

Conclusions
In this paper, we proposed a new nonconvex model to deal with a challenging symmetric ill-posed problem. An efficient algorithm for solving the given nonconvex problem is designed via MPEC and proximal-based regularization. Theoretically, the bounded sequence generated by the designed algorithm is guaranteed to converge to a local optimal solution of the nonconvex problem. Furthermore, the convergence of the designed algorithm is guaranteed for any initial value. The numerical experiments show that the proposed method achieves better restoration results; for example, it obtains smaller relative errors than the benchmark KROMP approach, can interpolate more mesh points, and significantly reduces noise.
Data Availability Statement: The data in this study were generated by the authors.