Stable Sparse Model with Non-Tight Frame

Overcomplete representation is attracting interest in image restoration due to its potential to generate sparse representations of signals. However, the problem of seeking a sparse representation can be unstable in the presence of noise. The Restricted Isometry Property (RIP), which plays a crucial role in providing stable sparse representations, has been ignored in existing sparse models because it is hard to integrate into conventional sparse models as a regularizer. In this paper, we propose a stable sparse model with non-tight frame (SSM-NTF) that applies the corresponding frame condition to approximate the RIP. Our SSM-NTF model retains the advantages of the traditional sparse model while incorporating the RIP and a closed-form expression of the sparse coefficients, which together ensure stable recovery. Moreover, benefitting from the pairing within the non-tight frame (the original frame and its dual frame), our SSM-NTF model combines a synthesis sparse system and an analysis sparse system. By enforcing the frame bounds and applying a second-order truncated series to approximate the inverse frame operator, we formulate a dictionary pair (frame pair) learning model along with a two-phase iterative algorithm. Extensive experimental results on image restoration tasks such as denoising, super resolution and inpainting show that our proposed SSM-NTF achieves superior recovery performance in terms of both subjective and objective quality.


Introduction
Sparse representation of signals in dictionary domains has been widely studied and has provided promising performance in numerous signal processing tasks such as image denoising [1][2][3][4][5], super resolution [6][7][8], inpainting [9,10] and compression [11,12]. It is well known that images can be represented by a linear combination of certain atoms of a dictionary. An overcomplete sparse representation is an overcomplete system with a sparsity constraint. Common overcomplete systems differ from traditional bases, such as the DCT, DFT and wavelets, because they offer a wider range of generating elements; potentially, this wider range allows more flexibility and effectiveness in sparse signal representation. However, finding the underlying overcomplete representation is a severely under-constrained, ill-posed problem due to the redundancy of the system. When the underlying representation is sparse and the overcomplete system has stable properties, the ill-posedness disappears [13]. Sparse models are generally classified into two categories: synthesis sparse models and analysis sparse models [14]. The sparse models commonly referred to are synthesis sparse models. The analysis models characterize the signal by multiplying it with an analysis overcomplete dictionary, leading to a sparse outcome. A variety of effective sparse models have been investigated and established, such as the classical synthesis sparse model [9,15], the classical analysis sparse model [14], the nonlocal sparse model [16,17] and the 2D sparse model [18]. Unfortunately, these models ignore the stable recovery property, which states that once a sufficiently sparse solution is found, all alternative solutions necessarily reside very close to it [9]. Recently, the stable recovery of sparse representations has drawn attention in signal processing theory. Generally speaking, stable recovery can be guaranteed by two properties: sufficient sparsity and a favorable structure of the dictionary [19].
Donoho defines the concept of mutual incoherence of the dictionary and applies it to prove the possibility of stable recovery [19]. The authors of [20] propose a sparsity-based orthogonal dictionary learning method to minimize the mutual incoherence. The authors of [21] propose an incoherent dictionary learning scheme by integrating a low-rank Gram matrix of the dictionary into the dictionary learning model.
A more powerful stable recovery guarantee developed by Candes and Tao, termed the Restricted Isometry Property (RIP), makes the subsequent analysis easier [22]. A matrix Φ is said to satisfy the RIP of order k if there exists a constant δ k ∈ (0, 1) such that
$$(1 - \delta_k)\,\|y\|_2^2 \le \|\Phi y\|_2^2 \le (1 + \delta_k)\,\|y\|_2^2 \qquad (1)$$
holds for all k-sparse vectors y. δ k is defined as the smallest constant that satisfies the above inequalities and is called the restricted isometry constant of Φ.
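To make the definition concrete, the following is a minimal brute-force sketch (not part of the proposed method) that evaluates the restricted isometry constant of a small matrix by enumerating all k-column submatrices. The function name `restricted_isometry_constant` and the random test matrix are illustrative assumptions; the enumeration is only feasible for tiny problems, which is one reason the RIP is hard to use directly as a regularizer.

```python
import itertools
import numpy as np

def restricted_isometry_constant(Phi, k):
    """Brute-force delta_k: worst deviation of the squared singular values
    from 1 over all N x k column submatrices (requires k <= number of rows)."""
    n_rows, n_atoms = Phi.shape
    if k > n_rows:
        raise ValueError("k must not exceed the number of rows")
    delta = 0.0
    for support in itertools.combinations(range(n_atoms), k):
        s = np.linalg.svd(Phi[:, list(support)], compute_uv=False)
        # s is sorted in descending order: s[0] is the largest, s[-1] the smallest
        delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
    return delta

rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 12))
Phi /= np.linalg.norm(Phi, axis=0)      # unit-norm atoms
delta_2 = restricted_isometry_constant(Phi, 2)
```

For unit-norm atoms and k = 2, the value returned coincides with the mutual coherence of the matrix, which gives a quick sanity check of the sketch.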
Most RIP research either applies the RIP as a stability analysis instrument [17,23,24] or seeks the optimal RIP constant [25,26]; both are theoretical analyses rather than practical applications. According to the research of [21], the intrinsic properties of a dictionary have a direct influence on its performance. All familiar algorithms are staggeringly unstable with a coherent or degenerate dictionary [19]. Recognizing the gap between theoretical analyses and practical applications of the RIP, this paper aims to build a stable sparse model satisfying the RIP.
Recently, the frame, as a stable overcomplete system, has drawn some attention in signal processing, because under a frame a given signal can be represented by its canonical expansion in a manner similar to conventional bases. Some data-driven approaches are proposed in [1,27,28,29,30]. The authors of [27,29,30] utilize redundant tight frames in compressed sensing, and [28] applies a tight frame to few-view image reconstruction. Study [1] presents a data-driven method in which the dictionary atoms associated with the tight frame are generated by filters. These approaches achieve much better image processing performance than previous methods; however, the tight frame condition, which requires the frame to be almost orthogonal, limits the flexibility of the sparse representation. Study [31] derives a stable recovery result for ℓ1-analysis minimization in redundant, possibly non-tight frames. Inspired by this result and the relationship between the RIP and frames, we aim to establish a stable sparse model with RIP based on non-tight frames.
We call a sequence {φ i }, i = 1, 2, . . . , M, in a Hilbert space H a frame if and only if there exist two positive numbers A and B such that
$$A\,\|x\|_2^2 \le \sum_{i=1}^{M} |\langle x, \phi_i \rangle|^2 \le B\,\|x\|_2^2 \quad \text{for all } x \in H. \qquad (2)$$
Here, A and B are called the bounds of the frame. We find that every submatrix Φ k satisfying the RIP is a non-tight frame with (1 − δ k ) and (1 + δ k ) as its frame bounds for a given k. Hence, there is an essential connection between the non-tight frame and the RIP.
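As an illustration of the frame condition in the finite-dimensional case, the sketch below computes the optimal frame bounds of a matrix whose columns are the frame elements, using the fact that they are the extreme eigenvalues of ΦΦ T. The helper name `frame_bounds` and the random frame are assumptions made for the example only.

```python
import numpy as np

def frame_bounds(Phi):
    """Optimal frame bounds (A, B) of the columns of Phi, i.e. the smallest
    and largest eigenvalues of the frame operator Phi Phi^T."""
    eigvals = np.linalg.eigvalsh(Phi @ Phi.T)   # ascending order
    return eigvals[0], eigvals[-1]

rng = np.random.default_rng(1)
Phi = rng.standard_normal((4, 9))               # 9 frame elements in R^4
A, B = frame_bounds(Phi)

x = rng.standard_normal(4)
energy = np.sum((Phi.T @ x) ** 2)               # sum_i |<x, phi_i>|^2
norm_sq = np.sum(x ** 2)
# The frame condition states A * norm_sq <= energy <= B * norm_sq.
```

A tight frame corresponds to the special case A = B, in which `energy` is an exact multiple of `norm_sq` for every x.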
In this paper we focus on a stable sparse model and, more specifically, on the development of an algorithm that learns a pair of non-tight frame based dictionaries from a set of signal examples. We propose a stable sparse model that applies the non-tight frame condition to approximate the RIP. This model shares the favorable overcomplete structure of common sparse models, and at the same time it incorporates the RIP and a closed-form expression of the sparse coefficients, which ensure stable recovery. Recognizing that the optimal frame bounds are essentially determined by the maximum and minimum singular values of the frame, the RIP is enforced on the dictionary pair (the frame and its dual frame) by constraining their singular values. We also formulate a dictionary pair learning model by applying a second-order truncated Taylor series to approximate the inverse frame operator.
Then we present an efficient algorithm to learn the dictionary pair via a two-phase iterative approach. To summarize, this paper makes the following contributions:
1. We propose a stable sparse model along with a dictionary pair learning model. The non-tight frame condition is utilized to develop a relaxation of the RIP that guarantees stable recovery of the sparse representation. Moreover, the sparse coefficients are also modeled, which leads to a more stable recovery, especially for seriously noisy images.
2. It is nearly impossible to solve the dictionary pair learning model in a straightforward way since the inverse frame operator is involved. We provide an effective way to modify the model by applying a second-order truncated Taylor series to approximate the inverse frame operator, and provide an efficient algorithm for the modified model.
3. We present the stability analysis of the proposed model and demonstrate it on natural and synthetic image denoising, super resolution and image inpainting. The denoising results show that the proposed approach outperforms synthesis models such as the KSVD and the data-driven tight frame based methods on natural images in terms of average PSNR. Moreover, it achieves performance comparable to the Analysis KSVD on a piecewise-constant (PWC) image in terms of average PSNR. Meaningful structures are observed in the trained dictionary pairs for natural images and a PWC image. The super resolution results show that the SSM-NTF performs better than the Bicubic interpolation method and the method in [32]. The inpainting results show that our model is able to remove text of different fonts completely.
This paper is organized as follows: Section 2 reviews the related work on frames, the synthesis sparse model and the analysis sparse model. Section 3 presents our stable sparse model with non-tight frame (SSM-NTF) along with a dictionary pair learning model. Section 4 proposes the corresponding dictionary pair learning algorithm. Section 5 presents the image restoration method based on our SSM-NTF model. In Section 6 we analyze the computational complexity of our proposed algorithm. In Section 7, we demonstrate the effectiveness of our SSM-NTF model by analyzing the convergence of the corresponding algorithm and by evaluating it on denoising of natural and piecewise-constant images, super resolution and image inpainting. Finally, Section 8 concludes this paper.

Related Work
In this section, we briefly review the related work on frame, synthesis sparse model and analysis sparse model.
Frame: A frame Φ is called a tight frame if the frame bounds in Equation (2) are equal [32]. Once a frame is defined, two associated operators can be defined between the Hilbert space H N and the square integrable space ℓ 2 M . One is the analysis operator T, defined by
$$(Tx)_i = \langle x, \phi_i \rangle, \quad i = 1, 2, \dots, M,$$
and the other is its adjoint operator T * , which is called the synthesis operator,
$$T^{*}c = \sum_{i=1}^{M} c_i\,\phi_i.$$
Then the frame operator F = T * T can be defined through the following canonical expansion:
$$Fx = \sum_{i=1}^{M} \langle x, \phi_i \rangle\,\phi_i.$$
In Euclidean space, a given frame Φ can be represented as a matrix whose columns are the frame elements. The elements of its canonical dual frame can then be represented as ψ i = F −1 φ i [32]. Let x ∈ R N be an arbitrary vector; a reconstruction function can be expressed in the following form:
$$x = \sum_{i=1}^{M} \langle x, \psi_i \rangle\,\phi_i = \sum_{i=1}^{M} \langle x, \phi_i \rangle\,\psi_i.$$
Synthesis sparse model: The conventional synthesis sparse model represents a vector x by the linear combination of a few atoms from a large dictionary Φ, denoted as x = Φy, ‖y‖ 0 ≤ L, where L is the sparsity of y. The computational techniques for approximating the sparse coefficients y for a given dictionary Φ and signal x include greedy pursuit (e.g., OMP [9]) and convex relaxation optimization, such as Lasso [33] and FISTA [8]. In order to improve the performance of sparse representation, some modified models such as the nonlocal sparse model [16], the frame based sparse model [21], and the MD sparse model [18] have also been investigated.
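The frame reconstruction formula reviewed above can be checked numerically. The following sketch, under the assumption of a random full-rank frame in R^N, builds the canonical dual frame ψ i = F −1 φ i and verifies that x = ΦΨ T x; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 11
Phi = rng.standard_normal((N, M))     # columns are the frame elements phi_i

F = Phi @ Phi.T                       # frame operator in matrix form
Psi = np.linalg.solve(F, Phi)         # canonical dual frame: psi_i = F^{-1} phi_i

x = rng.standard_normal(N)
analysis_coeffs = Psi.T @ x           # <x, psi_i> for i = 1..M
x_rec = Phi @ analysis_coeffs         # synthesis with the original frame

# The roles of the frame and its dual can be exchanged.
x_rec_swapped = Psi @ (Phi.T @ x)
```

Solving the linear system with `np.linalg.solve` avoids forming F −1 explicitly, which is numerically preferable when F is poorly conditioned.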
Analysis sparse model: The analysis sparse model is defined as y = Ωx, ‖y‖ 0 = p − l, where Ω ∈ R p×d is a linear operator (also called a dictionary), and l denotes the co-sparsity of the signal x. The analysis representation vector y is sparse with l zeros. The zeros in y indicate the low-dimensional subspace to which the signal x belongs. Analysis sparse coding [14] and dictionary learning [34] approaches have also been proposed.
However, all these models ignore the stable recovery property, which provides stable reconstruction of the signals in the presence of noise.
Dictionary learning methods: Dictionaries include analytical dictionaries, such as the DCT, DWT, curvelets and contourlets, and learned dictionaries. Several dictionary learning methods have been proposed, such as the classical KSVD algorithm [9]; the efficient sparse coding method, which converts the original dictionary learning problem into two least squares problems by applying the Lagrange dual [3]; the non-local sparse model [16], which learns a set of PCA sub-dictionaries by clustering the samples into K clusters using the image nonlocal self-similarity prior; and its improved version, which uses the l q -norm instead of the l 2 -norm in order to handle different image contents. With the recognition of the importance of stability, some mutual-coherence based methods have been proposed. In [20], a sparsity-based orthogonal dictionary learning method is proposed to minimize the mutual coherence of the dictionary. The authors of [21] propose an incoherent dictionary learning scheme by integrating a low-rank Gram matrix of the dictionary into the dictionary learning model. However, these methods only address the properties of the dictionary without modeling the sparse coefficients, which still leaves some possibility of instability.

The Proposed SSM-NTF
In this section, we present the stable sparse model with non-tight frame (Section 3.1), the stability analysis of the proposed model (Section 3.2) and the dictionary pair (frame pair) learning model (Section 3.3).

Stable Sparse Model with Non-Tight Frame
In this section, we derive our stable sparse model with non-tight frame where the non-tight frame condition serves as an approximation to the RIP.
According to [35], a k-th RIP constant can be expressed as where The Equation (7) provides a new perspective on integrating the RIP into the sparse model by using θ k max and θ k min instead of the RIP constant δ k (Φ). This reduces the difficulty of building a stable sparse model. However, the sparsity k varies with the noise level and, in a feasible numerical method, it is impossible to sweep through all the samples satisfying ‖x‖ 0 = k to pursue an unknown dictionary Φ.
Let x be a signal vector; the frame reconstruction function can be formulated as x = ΦΨ T x, where Ψ is a dual frame of Φ. Adding a reasonable sparsity prior to the signal x over the Ψ domain, we can derive Denoting the optimal frame bounds of Φ as A and B, the frame condition of Ψ can be formulated Then a pair of bounds for Equation (10) can be obtained as A ≤ A formula similar to Equation (9) is derived as where J is the data set. Imitating Equation (7), we can obtain a RIP-like constant expression Clearly, δ(Φ) can be regarded as an approximation of the RIP constant that simplifies the computation because it does not depend on the sparsity level. In short, the RIP constraint can be satisfied by constraining the frame bounds. Thus, a stable overcomplete system with a sparsity prior can be established. Now we discuss the characteristics of the frame bounds A and B. The Frame Condition (2) has a more compact form where η max and η min denote the maximum and minimum singular values of Φ, respectively. Then, we can obtain η max ≥ θ k max , η min ≤ θ k min . It follows that δ(Φ) ≥ δ k (Φ). Thus, δ(Φ) is a reasonable relaxation of δ k (Φ), as δ(Φ) slightly exceeds δ k (Φ) but resides very close to it as long as the data is not seriously degraded. Therefore, the RIP constraint can be enforced on the frames by limiting the maximum and minimum singular values.
In this paper, we integrate the non-tight frame into the traditional sparse model to establish a stable sparse model with RIP. Let x be a signal vector. Under the assumption of the sparsity prior on Ψ T x, we apply a soft thresholding operator S λ (·) (which shall be defined in the next subsection) to it such that where λ is a vector with elements λ i corresponding to ψ i , i = 1, 2, . . . , M. Therefore, we propose the stable sparse model with non-tight frame (SSM-NTF) as follows Here, the correlation between the frame Φ and its dual frame Ψ is formulated as Ψ = F −1 Φ. The frame operator F is formulated as ΦΦ T , which is indeed a Gram matrix of Φ. The singular values of Φ are constrained by to satisfy the RIP. In fact, by constraining the singular values of Φ, the elements of the Gram matrix are also bounded, which agrees with the theory of mutual coherence.
In order to be consistent with the traditional sparse models, we refer to the frame Φ and its dual frame Ψ as dictionary and its dual dictionary.
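The analysis-threshold-synthesis structure of the model can be sketched as follows. This is only an illustration under simplifying assumptions: a random dictionary Φ, its canonical dual Ψ = F −1 Φ, a constant threshold vector, and a hypothetical helper `soft_threshold`; it is not the learned dictionary pair of the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft thresholding: sgn(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(3)
N, M = 6, 12
Phi = rng.standard_normal((N, M))              # dictionary (frame)
Psi = np.linalg.solve(Phi @ Phi.T, Phi)        # dual dictionary, Psi = F^{-1} Phi

x = rng.standard_normal(N)
lam = np.full(M, 0.1)                          # one threshold per dual atom (assumed value)

y = soft_threshold(Psi.T @ x, lam)             # closed-form sparse coefficients
x_hat = Phi @ y                                # synthesis with the dictionary

# With all thresholds set to zero, the model reduces to exact frame reconstruction.
x_exact = Phi @ soft_threshold(Psi.T @ x, np.zeros(M))
```

Larger thresholds zero out more coefficients, trading reconstruction accuracy for sparsity.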

The Stability Analysis of the Proposed Model
In the sparse representation problem, a given noiseless signal x can be formulated as
$$(P_0):\quad \min_{y} \|y\|_0 \quad \text{s.t.} \quad x = \Phi y, \qquad (14)$$
where Φ is the sparse representation dictionary and y is the vector of sparse coefficients. Although x = Φy is an underdetermined linear system, the problem (P 0 ) has the unique solution y 0 as soon as it satisfies the uniqueness property, which is formulated as
$$\|y_0\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu}\right), \qquad (15)$$
where µ is the mutual coherence of Φ [9]. However, signals are usually acquired with noise, so the problem (P 0 ) should be relaxed to the problem (P ε ), which is expressed as
$$(P_\varepsilon):\quad \min_{y} \|y\|_0 \quad \text{s.t.} \quad \|x - \Phi y\|_2 \le \varepsilon,$$
where ε is an error tolerance that exists due to the noise. The problem (P ε ) no longer maintains the uniqueness of the solution, as x = Φy + ε is an inequality system. Thus, the notion of the Uniqueness Property (15) is replaced by the notion of stability, which states that all the alternative solutions reside very close to the ideal solution. Under this stability guarantee, we can still ensure that our methods produce meaningful recovery results. Assume that y 0 is the ideal solution to the problem (P ε ) and ŷ is a candidate one; the traditional sparse model has a stability claim of the form [9]
$$\|\hat{y} - y_0\|_2^2 \le \frac{4\varepsilon^2}{1 - \mu\,(2 s_0 - 1)}, \qquad (17)$$
where s 0 is the sparsity of y 0 and µ is the mutual coherence, which is formulated as
$$\mu = \max_{i \ne j} \frac{|\phi_i^{T}\phi_j|}{\|\phi_i\|_2\,\|\phi_j\|_2}.$$
The error bound of Equation (17) can only be determined for a given sparsity s 0 and mutual coherence µ. However, the mutual coherence of an unknown dictionary is very difficult to calculate, so stability cannot be ensured in the dictionary learning case. In contrast, we derive a similar stability claim for our proposed SSM-NTF model. Defining d = ŷ − y 0 with y 0 as the ideal solution to the model, we have ‖Φŷ − Φy 0 ‖ 2 = ‖Φd‖ 2 ≤ 2ε. From the previous subsection, we know that the frame Φ satisfies the RIP with the corresponding parameter δ(Φ). Thus, using this property and exploiting the lower-bound part of Equation (1), we obtain
$$(1 - \delta(\Phi))\,\|d\|_2^2 \le \|\Phi d\|_2^2 \le 4\varepsilon^2.$$
Thus, we get a stability claim of the form Clearly, the error bound of the SSM-NTF is determined by B/A, the ratio of the upper bound to the lower bound of the frame, rather than by the specific values of A and B. Thus, for the convenience of numerical experiments, we usually set A to a fixed value. A main advantage of standard orthogonal transformations is that they preserve the energy of the signals in the transform domain, as their frame bounds A and B are both equal to 1. However, a standard orthogonal basis is non-redundant, which limits its performance in sparse representation. In order to make a trade-off between the representation accuracy and the degree of redundancy, we usually set the lower frame bound A to a value a little smaller than 1 but not too small, as A is tied to the minimum singular value of Φ, which determines the condition number of Φ. Thus, once the error tolerance is given, the value of B can be easily calculated. Further, a pair of dictionaries conforming to the given error can be obtained using the proposed SSM-NTF model. On the other hand, if the value of B is chosen by experience, the error bound of our model can be measured.
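The two quantities that drive the stability bounds discussed above, the mutual coherence µ for the traditional model and the ratio B/A for a frame, can both be evaluated for a known dictionary. The sketch below shows one way to compute them; the helper names `mutual_coherence` and `frame_bound_ratio` are assumptions, and B/A is taken as the ratio of the extreme eigenvalues of ΦΦ T.

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute normalized inner product between two distinct atoms."""
    atoms = Phi / np.linalg.norm(Phi, axis=0)
    gram = np.abs(atoms.T @ atoms)
    np.fill_diagonal(gram, 0.0)        # ignore self-correlations
    return gram.max()

def frame_bound_ratio(Phi):
    """Ratio B/A of the optimal frame bounds of the columns of Phi."""
    eigvals = np.linalg.eigvalsh(Phi @ Phi.T)
    return eigvals[-1] / eigvals[0]

rng = np.random.default_rng(8)
Phi = rng.standard_normal((6, 12))
mu = mutual_coherence(Phi)
ratio = frame_bound_ratio(Phi)
```

An orthonormal basis gives µ = 0 and B/A = 1; redundancy generally moves both quantities away from these ideal values.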

Learning Model of Dictionary Pair
Assume that X ∈ R N×L is the training data with signal vectors x i ∈ R N , i = 1, 2, . . . , L, as its columns. The dictionary pair learning model can be written as However, Problem (20) is difficult to solve. First, the inverse of the frame operator F has no explicit closed-form expression. Second, the thresholding operator is highly nonlinear, which makes the optimization with respect to λ hard.
The first difficulty stems from the presence of the inverse of F. Fortunately, the matrix F −1 can be expressed as a convergent series [36], which is formulated as Here, we truncate the series at k = 1 to make a trade-off between computational complexity and approximation accuracy. It is formulated as In this way, once the frame bounds are given, the inverse of F can be calculated easily. Then the optimization problem for training the RIP-dictionary pair is formulated as where S λ (·) is the elementwise thresholding operator. There are two basic thresholding methods: hard thresholding, whose operator keeps the entries with magnitude above λ and sets the others to zero, S λ (·) → (·) 1(| · | > λ), and soft thresholding, whose operator is defined as S λ (·) → sgn(·)max(| · | − λ, 0). Both operators are non-convex and non-smooth, which poses big challenges for solving Problem (23). The main reason is that updating the thresholding values λ causes non-smooth changes to the cost function. To overcome this difficulty, we design an alternating direction method based on global search and least squares, which will be introduced in Section 4.1.
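The series approximation of the inverse frame operator can be illustrated numerically. The sketch below assumes the standard Neumann expansion F −1 = (2/(A+B)) Σ k (I − 2F/(A+B)) k , which converges when the eigenvalues of F lie in [A, B]; the exact series used in [36] may differ in form, and the function name `approx_inverse_frame_operator` is hypothetical.

```python
import numpy as np

def approx_inverse_frame_operator(F, A, B, order=1):
    """Truncated Neumann series for F^{-1}, keeping terms k = 0..order.
    Assumes the eigenvalues of the symmetric matrix F lie in [A, B]."""
    n = F.shape[0]
    c = 2.0 / (A + B)
    R = np.eye(n) - c * F              # spectral radius <= (B - A) / (B + A) < 1
    term = np.eye(n)
    total = np.eye(n)
    for _ in range(order):
        term = term @ R
        total = total + term
    return c * total

rng = np.random.default_rng(4)
A, B = 0.8, 1.8
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
F = Q @ np.diag(np.linspace(A, B, 6)) @ Q.T    # synthetic frame operator with spectrum in [A, B]

F_inv_1 = approx_inverse_frame_operator(F, A, B, order=1)
err_1 = np.linalg.norm(F_inv_1 - np.linalg.inv(F), 2)
```

Under this assumption, truncating at k = 1 gives the closed form (4/(A+B)) I − (4/(A+B)²) F, so the approximate inverse is an affine function of F and can be differentiated with respect to Φ without any matrix inversion.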

Dictionary Pair Learning Algorithm
In this section, we propose the two-phase iterative algorithm for dictionary pair learning by dividing Problem (23) into two subproblems: the sparse coding phase, which updates the sparse coefficients Y and the thresholding values λ, and the dictionary pair update phase, which computes Φ and Ψ.

Sparse coding phase
In this subsection, we discuss how to calculate the sparse coefficients Y and the threshold values λ with given Φ and Ψ under our SSM-NTF model. Given a pair of dictionaries Φ and Ψ, calculating Y and λ from X is formulated as: We pursue the two variables alternately. First, with fixed λ, we obtain the sparse coefficients Y by solving Problem (24) through OMP [9], as it can be easily converted to the classical synthesis sparse expression min Second, the pursuit of λ is equivalent to solving the following problem which can be decomposed into M individual optimization problems where ψ i is the i-th column of Ψ. From the definition of the soft thresholding operator, we know that the objective function of Problem (26) is discrete. By denoting the set of data indices that remain intact after the thresholding as J i , we split the data X into two parts, X J i and X Ĵ i , such that where Ĵ i is the complement of the intact index set J i and contains the indices whose elements are all set to zero. Clearly, J i and Ĵ i are both functions of λ i without explicit expressions, which poses a large challenge for optimization.
In order to solve Problem (26), an intermediate variable µ i needs to be introduced to separate the whole problem into two parts: the update of the index sets J i and Ĵ i (determined by µ i ) and the update of the explicit thresholding value λ i . Then Problem (26) can be transformed into another optimization problem: where J i and Ĵ i are two functions of the intermediate variable µ i . At the k-th step, to obtain µ i , we solve Problem (29) with λ i fixed and denote the functions Optimizing this expression is non-trivial, as the target function is non-convex and highly discontinuous. However, with λ i fixed, the minimization of f (µ i ) + g(µ i ) can be solved globally due to its discrete, finite nature. In other words, if a series of candidate values of µ i is given, the global search is guaranteed to succeed.
Once λ i is given, f (µ i ) + g(µ i ) is a piecewise constant function. This means that the function values remain unchanged within a series of intervals determined by |ψ T i x i |, i = i 1 , i 2 , . . . , i l . Therefore, |ψ T i x i |, i = 1, 2, . . . , L, can be taken as a portion of the candidate values of µ i . The function l(µ i ) is minimized at µ i = λ i and increases monotonically with the distance between µ i and the given λ i . So, to minimize l(µ i ), we only need to choose the closest point in the feasible region.
Without loss of generality, we assume that all the |ψ T i x i |, i = 1, 2, . . . , L, are sorted in ascending order and that the corresponding signals are in the same order. We compute all the possible values of f (µ i ) + g(µ i ) by where . . , L in descending order of f (µ i ) + g(µ i ), and every two adjacent values form an interval on which the function value remains unchanged. In other words, on each interval the objective function f (µ i ) + g(µ i ) + l(µ i ) is minimized at the point closest to λ i . Thus, we compute the minimizer on every interval, and the minimum among them is the optimal result.
With µ i fixed, we solve the following problem in order to pursue λ i : This is a standard continuous convex problem that can be easily solved by least squares. We summarize our sparse coding method in Algorithm 1.
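The combination of an interval-wise global search with a least squares update can be illustrated on a simplified scalar problem. The sketch below is not the paper's Algorithm 1: it omits the intermediate variable µ i and directly minimizes Σ j (y j − S λ (c j ))² over a single threshold λ ≥ 0, where c j plays the role of ψ i T x j . On each interval between consecutive sorted magnitudes the active set is fixed, so the objective is a convex quadratic in λ whose minimizer is a clipped mean. The names `soft_threshold` and `best_soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def best_soft_threshold(c, y):
    """Globally minimize sum_j (y_j - S_lam(c_j))^2 over lam >= 0 by scanning
    the intervals between consecutive sorted magnitudes |c_j|."""
    mags = np.abs(c)
    signs = np.sign(c)
    edges = np.concatenate(([0.0], np.sort(mags)))
    best_lam, best_cost = 0.0, np.inf
    for lo, hi in zip(edges[:-1], edges[1:]):
        active = mags >= hi                       # entries that survive for lam in [lo, hi]
        # Least squares minimizer of the quadratic piece, clipped to the interval.
        lam = float(np.clip(np.mean(mags[active] - signs[active] * y[active]), lo, hi))
        cost = float(np.sum((y - soft_threshold(c, lam)) ** 2))
        if cost < best_cost:
            best_lam, best_cost = lam, cost
    return best_lam, best_cost

rng = np.random.default_rng(5)
c = rng.standard_normal(200)                      # stand-in for psi_i^T x_j
y = soft_threshold(c, 0.5)                        # targets generated with a known threshold
lam_hat, cost_hat = best_soft_threshold(c, y)
```

Thresholds beyond the largest magnitude zero every entry and give the same cost as the right end of the last interval, so scanning [0, max |c j |] is sufficient.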

Input and Initialization: Training data X ∈ R N×L , iteration number r, initial value λ i = 0.
Output: Sparse coefficients y, and thresholding values λ i
1: Compute the sparse coefficients y via Problem (24) according to the OMP algorithm.
Denote them as a vector ν.
End for
4: Sort the elements of |ψ i T X| in descending order of ν. Denote the intervals bounded as ξ q , q = 1, 2, . . . , L − 1.

Dictionary Pair Update Phase
To obtain Ψ, we solve the following problem with all other variables fixed: This problem is a highly nonlinear optimization due to the definition of S λ . Here we solve for Ψ column by column, updating each column of Ψ in turn.
For each ψ i , we solve the following subproblem: We denote J i and Ĵ i as the index sets defined before. We set the elements of y i corresponding to the indices Ĵ i to zero and denote the new vector as z i . As a consequence of this operation, ψ T i X Ĵ i ≈ 0. Then we solve the following quadratic optimization problem which is easy to solve with least squares.
The optimization problem to pursue Φ is formulated as where the frame operator F is given by ΦΦ T and F −1 is defined as in Equation (22). The target function then becomes which is denoted by h(Φ). We apply the gradient descent method to the unconstrained version of Problem (35) and then project the solution onto the feasible space. The gradient has a very complicated form, as follows In order to reduce the complexity, the gradient can also be computed with the fixed F calculated in the previous step of the ADM. Then at the k-th iteration, the gradient can be written as where F = Φ k−1 Φ (k−1) T . The descent step length can be obtained by optimizing the problem min θ h(Φ + θ∇h(Φ)) with fixed F, which is given by To project the solution, we apply an SVD decomposition Φ = UΣV T and map the singular values to the interval [ √ A, √ B] linearly. We denote the mapped singular value matrix as Σ̂ and reconstruct Φ by Φ = UΣ̂V T . We summarize our algorithm in Algorithm 2.
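The projection step onto the frame-bound constraint can be sketched as follows. The exact linear map is not spelled out in the text, so this example assumes an affine map that sends the smallest singular value to √A and the largest to √B; the function name `project_to_frame_bounds` and the handling of the degenerate case are likewise assumptions.

```python
import numpy as np

def project_to_frame_bounds(Phi, A, B):
    """Rescale the singular values of Phi linearly into [sqrt(A), sqrt(B)]
    so that the eigenvalues of Phi Phi^T lie in [A, B]."""
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    lo, hi = np.sqrt(A), np.sqrt(B)
    spread = s.max() - s.min()
    if spread > 0:
        s_new = lo + (s - s.min()) * (hi - lo) / spread
    else:
        # Assumed convention when all singular values coincide.
        s_new = np.full_like(s, 0.5 * (lo + hi))
    return (U * s_new) @ Vt            # U diag(s_new) V^T

rng = np.random.default_rng(6)
Phi = rng.standard_normal((5, 10))
A, B = 0.8, 1.8
Phi_proj = project_to_frame_bounds(Phi, A, B)
eigvals = np.linalg.eigvalsh(Phi_proj @ Phi_proj.T)
```

Because only the singular values change, the singular vectors found by gradient descent are preserved by the projection.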

Input and Initialization: Training data X, frame bounds A, B, iteration number num, gradient descent iterations r.
Build frames Φ ∈ R N×M and Ψ ∈ R N×M , either by using random entries or by using M randomly chosen data samples.

Restoration
Image restoration aims to reconstruct a high-quality image I from its degraded (e.g., noisy, blurred and/or downsampled) version L, denoted by L = SHI + n, where H represents a blurring filter, S the downsampling operator, and n the noise. For signals that satisfy the SSM-NTF, the restoration model based on SSM-NTF is formulated as where R i is an operator that extracts the i-th patch of the image I and y i is the i-th column of Y . λ denotes a vector [λ 1 , λ 2 , · · · , λ M ] with λ j operating on the j-th element of Ψ T R i I. On the right side of Equation (41), the first term is the global force that demands proximity between the degraded image L and its high-quality version I. The remaining terms are the local constraints that make sure every patch at location i satisfies the SSM-NTF.
To solve Problem (41), we apply Algorithm 1 to obtain the sparse coefficients Y and the threshold values λ. We mainly describe the iterative method to obtain I. Assuming that the sign of Ψ T R i I k will not change much between two steps, we set it in the k-th step by c k = sign(Ψ T R i I k−1 ), where sign is the sign function. Denote d k = Ψ T R i I k−1 . We set O k as an index set that satisfies |d l | ≤ λ l , l ∈ O k . Set u k ∈ R M as a vector with elements u l = λ l for l ∈ O k and 0 otherwise. Then the non-convex and non-smooth thresholding can be removed with the substitution y i − S λ (Ψ T R i I k ) ≈ y i + c k ⊙ u k − Ψ T R i I k . Thus, in the k-th step, the problem to be solved is expressed as where ⊙ denotes pointwise multiplication. This convex problem can be easily solved by the gradient descent algorithm. We summarize the restoration algorithm in Algorithm 3.
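The patch extraction operator R i and the aggregation of overlapping patch estimates used in patch-based restoration can be sketched as below. This is a generic illustration with uniform averaging weights, not the restoration algorithm itself; the function names `extract_patches` and `average_patches` and the random test image are assumptions.

```python
import numpy as np

def extract_patches(image, size, stride=1):
    """Stack all size x size patches (as columns) and record their top-left corners."""
    height, width = image.shape
    coords = [(r, c)
              for r in range(0, height - size + 1, stride)
              for c in range(0, width - size + 1, stride)]
    patches = np.stack([image[r:r + size, c:c + size].ravel() for r, c in coords], axis=1)
    return patches, coords

def average_patches(patches, coords, shape, size):
    """Put patches back at their locations and average the overlaps uniformly."""
    acc = np.zeros(shape)
    weight = np.zeros(shape)
    for column, (r, c) in zip(patches.T, coords):
        acc[r:r + size, c:c + size] += column.reshape(size, size)
        weight[r:r + size, c:c + size] += 1.0
    return acc / np.maximum(weight, 1.0)   # pixels covered by no patch stay zero

rng = np.random.default_rng(9)
image = rng.uniform(0.0, 255.0, size=(12, 12))
patches, coords = extract_patches(image, size=4, stride=1)
restored = average_patches(patches, coords, image.shape, size=4)
```

In a restoration pipeline, each column of `patches` would be replaced by its processed estimate before the averaging step.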

Input: Trained dictionaries Φ, Ψ, iteration number r, a degraded image L, set I 0 = L.
Output: The high quality image Î
1: Compute Y and λ via the method in Algorithm 1.

Complexity Analysis
In this section, we discuss the computational complexity of our sparse coding and dictionary pair learning algorithms with regard to those of conventional sparse model counterparts.
We first analyze the complexities of the main components of the sparse coding (SC) and dictionary updating (DU) algorithms. In terms of SC, given a set of training samples X ∈ R N×L , the complexity of BtOMP of calculating Ŷ = min where K is the target sparsity and the complexity of the thresholding step of calculating λ = min is O(N ML), which takes most of the time in the SC step at each iteration. The sparse coefficients Y ∈ R N×L and the threshold values λ are computed with fixed dictionaries Φ ∈ R N×M and Ψ ∈ R N×M . Correspondingly, the traditional sparse coefficients B ∈ R N×L are sparsely approximated with the dictionary D ∈ R N×M , and the computational complexity is O(K 2 ML).
In terms of DU, with the given training samples X ∈ R N×L , we learn a pair of dictionaries Φ ∈ R N×M and Ψ ∈ R N×M . We update Ψ via Problem (34) with a computational complexity of O(N 2 L). In order to update Φ, we need to calculate the gradient via Equation (39) with a computational complexity of O(N ML) and the step size via Equation (40) with a computational complexity of O(rN ML), where r is the iteration number of the gradient descent. For traditional dictionary learning, the corresponding training set is X ∈ R N×L and the dictionary D ∈ R N×M is updated by a rank-1 SVD decomposition with a computational complexity of O(KML).

Experimental Results
We demonstrate the effectiveness of our SSM-NTF model by first discussing the convergence of our dictionary pair learning algorithm and then evaluating the performance on natural and piecewise constant image denoising, super resolution and image inpainting.

Convergence Analysis
The convergence of the presented dictionary pair learning algorithm is evaluated in Figure 1. Here, we train a pair of dictionaries Φ and Ψ of size 100 × 200 from 65,000 patches of size 10 × 10, randomly sampled from six natural images. We apply the frame reconstruction function ΦS λ (Ψ T x) to reconstruct the patches. The dictionary pair is illustrated in Figure 2. These results show that our dictionary pair learning method converges and is able to capture the features of the images.
Figure 1. Convergence analysis. The x-axes show the iteration number. The y-axes show the objective function of Problem (20) (left) and the restoration result measured by PSNR (right). The curves show that our dictionary pair learning algorithm is convergent.

Image Denoising
In this subsection, we evaluate the performance of our proposed SSM-NTF model on image denoising. Benefitting from the concept of the non-tight frame, the proposed SSM-NTF model contains a pair of dictionaries: the frame and its dual frame. As a result, our proposed SSM-NTF model contains an analysis system and a synthesis system. The analysis-like system is denoted as which analyzes the signals in the Ψ domain. The synthesis system is denoted as which reconstructs the analyzed signals. The two systems share the same sparse coefficients Y. Therefore, we compare our proposed SSM-NTF with synthesis and analysis models, respectively. It is well known that the synthesis sparse model has an advantage in dealing with natural images, while the analysis sparse model is mostly used to address piecewise constant images. Therefore, we perform the denoising experiments on natural images and piecewise constant images separately, comparing with the most closely related approaches.

Natural Image Denoising
We now present experimental results on six classical natural images, 'Barbara', 'Boat', 'Couple', 'Hill', 'Lena' and 'Man', which are shown in [1], to evaluate the performance of the training algorithm. The denoising problem, which has been widely studied in sparse representation, is used as the target application. We add Gaussian white noise to these images at noise levels σ = 20, 30, 40, 50, 60, 70, 80, 90, 100. We then use the learned dictionary pair to denoise the natural images, with overlap of 1 pixel between adjacent patches of size 10 × 10. The patch denoising stage is followed by weighted averaging of the overlapping patch recoveries to obtain the final clean image. The parameters in our scheme are γ_1 = 1.1 and γ_3 = 1.2(L/M)^2, where L and M are the sample and dictionary size, respectively. As stated in Section 3.2, we usually set A to a positive number close to but smaller than 1. To determine its specific value, we vary A from 0.6 to 1 in steps of 0.03 and test the denoising performance. Then, with A fixed, we vary B from 1 to 4 in steps of 0.3 and run experiments at every noise level to determine the value of B. The resulting frame bounds A and B are shown in Table 1. For example, when the noise level is σ = 40, A and B are set to 0.8 and 1.8, respectively.
Table 2 shows the comparison results in terms of PSNR. Three related image denoising methods are involved: the classical dictionary learning algorithm KSVD [9], the data-driven tight frame based denoising method [1] and the incoherent dictionary learning based method [21]. KSVD [9] and the method in [21] use patches of size 8 × 8 with stride 1 and dictionaries of size 64 × 256, which is their optimal setting according to the previous work. We point out that [1] works on filters of size 16 × 16 instead of image patches and is initialized by 64 3-level Haar wavelet filters of size 16 × 16.
All three compared methods achieve their best performance with 50 iterations. Table 2 shows that the incoherent dictionary learning method [21] outperforms KSVD [9] on average, as the mutual incoherence of the dictionary can provide stable recovery. That [1] outperforms [21] implies that the tight frame is a more stable system. That our stable sparse model based method in turn outperforms [1] on average suggests that applying a non-tight frame to approximate RIP can provide even better and more stable reconstruction results. Figure 3 shows two exemplary visual results on the images 'Man' and 'Couple' at noise levels σ = 50 and σ = 40, respectively. The proposed method shows much clearer and better visual results than the other competing methods.

Piecewise Constant Image Denoising
In this subsection, we demonstrate the analytical property of our SSM-NTF model using a synthetic image. The denoising problem, which has been widely studied in sparse representation, is used as the target application. We start with a piecewise constant image of size 256 × 256 contaminated by Gaussian white noise with noise level σ = 5 and extract all possible 5 × 5 image patches. For the denoising, we apply the dictionary pair learning algorithm with the parameters γ_1 = 1.5, γ_3 = 1.2 L/M, A = 0.8 and B = 1.8, in parallel with patch denoising by the synthesis KSVD [9] and the analysis KSVD [14]. We apply 100 iterations of our dictionary learning method on this training set and learn a dictionary pair of size 25 × 50. The experimental settings of the synthesis KSVD [9] and the analysis KSVD [14] are at their optimal state according to the previous work.
The learned dictionary pair is illustrated in Figure 4: Φ closely resembles a synthesis dictionary, while Ψ closely resembles an analysis dictionary. The figure shows that our dictionary pair learning method is able to capture the features of the piecewise constant image. The resulting PSNRs of the denoised images are 45.32 dB for the analysis KSVD, 43.60 dB for the synthesis KSVD, and 45.17 dB for our proposed algorithm. Figure 5 shows the absolute difference images for each of the three methods. Note that these images are displayed in the dynamic range [0, 20] to make the differences more pronounced. Our proposed approach leads to a much better denoising result than the synthesis KSVD and is comparable with the analysis KSVD.
Figure 5. Visual quality comparison of denoising results for the piecewise constant image. Images of the absolute errors are displayed in the dynamic range [0, 20]. From left to right: original image, noisy image, analysis KSVD [14], synthesis KSVD [9], our proposed method.
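The patch-based pipeline used in both denoising experiments (denoise each patch, average the overlapping recoveries, report PSNR) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the averaging weights are taken to be uniform, the `stride` parameter and the function names are hypothetical, and `patch_fn` stands for any per-patch operator such as x ↦ ΦS_λ(Ψ^T x).

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(est, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def aggregate_patches(img, patch_fn, p=10, stride=1):
    """Apply patch_fn to every p x p patch and average the overlapping results.

    Uniform weights are an assumption; the paper only states that the
    overlapping patch recoveries are combined by weighted averaging.
    """
    H, W = img.shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for i in range(0, H - p + 1, stride):
        for j in range(0, W - p + 1, stride):
            vec = img[i:i + p, j:j + p].reshape(-1)   # vectorized patch
            rec = patch_fn(vec).reshape(p, p)         # recovered patch
            acc[i:i + p, j:j + p] += rec
            cnt[i:i + p, j:j + p] += 1.0
    # Pixels never covered (possible when stride > 1) are left at zero.
    return acc / np.maximum(cnt, 1.0)

# Toy usage: a crude per-patch "denoiser" that shrinks each patch toward its mean.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 32), (32, 1))
noisy = clean + rng.normal(0.0, 20.0, clean.shape)
shrink = lambda v: v.mean() + 0.5 * (v - v.mean())
denoised = aggregate_patches(noisy, shrink, p=10, stride=1)
print("PSNR noisy:", psnr(clean, noisy), "PSNR denoised:", psnr(clean, denoised))
```

The toy shrinkage operator is only a placeholder; in the experiments above the per-patch operator would be the learned frame reconstruction.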

Super Resolution
We evaluate our SSM-NTF in comparison with the exemplar-based scheme [37] for the image Super Resolution (SR) Problem (41) with a bicubic filter. Figure 6 shows the 15 test natural images [18], which have both rich texture and structure. All the schemes are applied to the illumination channel with a scale factor of 3. We always use 3 × 3 low-resolution patches with overlap of 1 pixel between adjacent patches, corresponding to 9 × 9 patches with overlap of 3 pixels for the high-resolution patches. In these experiments, we have used the following parameters: A = 0.8, B = 1.8, γ_1 = 1.1 and γ_3 = 1.2 L/M, where L and M are the sample and dictionary size, respectively.
In our scheme, dictionary learning is performed between HR and middle-level (MR) images, which are the first- and second-order derivatives of the version of one LR image upsampled by a factor of 2. Four 1D filters are used to extract the derivatives. We train two pairs of HR/LR dictionaries {Φ_h, Ψ_h} and {Φ_l, Ψ_l} from 100,000 HR/LR patch pairs [X_h, X_l] randomly sampled from the collected natural images that are also used in [37], where X_h is sampled from the HR images and X_l is sampled from the four feature images. The feature images are obtained by applying the four filters to the upsampled LR image. Given Φ and Ψ and the four MR feature images, the sparse coefficients Y and the threshold value λ can be calculated by Algorithm 1. With the theory in [37], the HR image can be recovered via Algorithm 3.
In the experiment, our HR dictionary pair is of size 81 × 450 and the MR pair is of size 144 × 450. The dictionary size of [37] is 81 × 1024 (HR) and 144 × 1024 (MR) at its best performance, as stated in that paper. Thus, the dictionary size of [37] is larger than the sum of our dictionaries. Table 3 shows the objective evaluation results of our proposed SSM-NTF compared with bicubic interpolation and [37]. On average, our SSM-NTF achieves the best PSNR.
Figure 7 presents the corresponding visual comparison of the illumination SR results for Image 12. We can observe that the result of bicubic interpolation is too smooth and the result of [37] suffers from obvious ringing artifacts and noise. The HR reconstruction of our SSM-NTF method provides clearer details.
Figure 7. From left to right: original image, result of bicubic interpolation (PSNR = 30.28), [32] (PSNR = 30.62) and our SSM-NTF method (PSNR = 30.99), respectively.
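The patch and dictionary sizes quoted in this subsection can be checked with a few lines of arithmetic. The sketch below is only a consistency check: reading the MR dimension 144 as four feature maps times a 6 × 6 patch (a 3 × 3 LR patch upsampled by 2) is an assumption that matches the stated sizes but is not spelled out in the text, and all variable names are hypothetical.

```python
scale = 3            # SR scale factor (from the text)
lr_patch = 3         # LR patch side length (from the text)
lr_overlap = 1       # LR patch overlap in pixels (from the text)
mr_upsample = 2      # upsampling factor used for the MR feature images
n_feature_maps = 4   # one feature image per 1D derivative filter

hr_patch = scale * lr_patch            # side length of the HR patch
hr_overlap = scale * lr_overlap        # HR patch overlap in pixels
hr_dim = hr_patch ** 2                 # rows of the HR dictionaries
# Assumed layout: four 6 x 6 feature patches stacked into one vector.
mr_dim = n_feature_maps * (mr_upsample * lr_patch) ** 2

atoms_per_pair = 2 * 450               # Phi and Psi at one resolution
atoms_ref = 1024                       # atoms per dictionary in [37]

print(hr_patch, hr_overlap, hr_dim, mr_dim)
print(atoms_per_pair, "<", atoms_ref)
```

Under this reading, the HR and MR row dimensions 81 and 144 follow directly from the patch geometry, and a frame pair with 450 atoms per dictionary uses fewer atoms than a single 1024-atom dictionary.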

Image Inpainting
To illustrate the potential applicability of our proposed SSM-NTF model to image inpainting, we apply it to text removal. In these experiments, we have used the following parameters: A = 0.8, B = 1.8, γ_1 = 1.1 and γ_3 = 1.2 L/M, where L and M are the sample and dictionary size, respectively. We operate on the images 'Adar', 'Lena', 'Couple' and 'Hill' with superimposed text of various fonts.
In this experiment, we apply our SSM-NTF model to image inpainting in a way similar to the non-blind KSVD inpainting algorithm [9], which requires knowledge of which pixels are corrupted and require inpainting. Only the non-corrupted pixels are used to train the dictionary pair and to inpaint the images. Our method operates on patches of size 10 × 10 extracted from the images with overlap of 1 pixel between adjacent patches. The trained dictionary pair is of size 100 × 200. The KSVD algorithm in this experiment operates on patches of size 8 × 8 extracted from the images with overlap of 1 pixel between adjacent patches. Its dictionary size is 64 × 256, at its best performance according to [9]. The patch inpainting stage is followed by solving Problem (41). Table 4 shows the objective evaluation results of our proposed SSM-NTF compared with DCT and KSVD [9]. The visual comparisons are shown in Figures 8 and 9. We find that the proposed SSM-NTF method is able to remove the text of the various fonts completely, while the KSVD result appears dull. Our SSM-NTF method achieves better performance in terms of both subjective and objective quality.

Conclusions
In this paper, we propose a stable sparse model with non-tight frame (SSM-NTF) and further formulate a dictionary pair learning model to stably recover the signals. We theoretically justify approximating RIP with the non-tight frame condition. The proposed SSM-NTF has RIP and a closed-form expression of the sparse coefficients, which ensure stable recovery, especially for seriously noisy images. The proposed SSM-NTF contains both a synthesis sparse system and an analysis sparse system, which share common sparse coefficients when the thresholding is not taken into account. We also propose an efficient dictionary pair learning algorithm by developing an explicit analytical expression of the inherent relation between the two dictionaries. The proposed algorithm is capable of approximating the structures of signals via a pair of adaptive dictionaries. The effectiveness of our proposed SSM-NTF and its corresponding algorithms is demonstrated in image denoising, image super-resolution and image inpainting. The results of the numerical experiments show that the proposed SSM-NTF is superior to the compared methods in objective and subjective quality in most cases.
On the other hand, our proposed SSM-NTF is actually a 1D sparse model. The 1D sparse model suffers from high memory and computational costs, especially when handling high-dimensional data. A multi-dimensional (MD) frame can be expressed as the Kronecker product of a series of 1D frames. Exploiting this property, in future work we will extend our stable sparse model to an MD stable sparse model. Moreover, the proposed SSM-NTF is not effective enough at removing other kinds of noise (e.g., salt and pepper noise), as the loss function of SSM-NTF assumes Gaussian noise. We would like to improve the performance of our model by changing the loss function.
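The Kronecker structure mentioned above as a route to an MD model can be illustrated numerically. The sketch below is not part of the proposed method; it uses two random 1D frames with hypothetical sizes and checks two standard facts: applying the Kronecker product to a vectorized coefficient array equals applying the two small frames separably, and the frame bounds of the Kronecker product are the products of the 1D frame bounds.

```python
import numpy as np

def frame_bounds(F):
    """Smallest and largest eigenvalue of the frame operator F @ F.T."""
    eig = np.linalg.eigvalsh(F @ F.T)
    return eig[0], eig[-1]

rng = np.random.default_rng(1)
Phi1 = rng.standard_normal((4, 8))    # 1D frame acting along rows (hypothetical size)
Phi2 = rng.standard_normal((5, 10))   # 1D frame acting along columns (hypothetical size)
Y = rng.standard_normal((8, 10))      # 2D coefficient array

# Full 2D frame: a 20 x 80 matrix applied to the row-major vectorization of Y.
full = np.kron(Phi1, Phi2) @ Y.reshape(-1)
# Separable form: the same result without ever forming the large matrix.
separable = (Phi1 @ Y @ Phi2.T).reshape(-1)

A1, B1 = frame_bounds(Phi1)
A2, B2 = frame_bounds(Phi2)
Ak, Bk = frame_bounds(np.kron(Phi1, Phi2))

print("max difference:", np.abs(full - separable).max())
print("bounds of the product frame:", Ak, Bk, "expected:", A1 * A2, B1 * B2)
```

The separable form stores 4 × 8 + 5 × 10 entries instead of 20 × 80, which is the memory saving that motivates an MD extension.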