Error Estimations for Total Variation Type Regularization

Abstract: This paper provides several error estimates for total variation (TV) type regularization, which arises in a range of areas, for instance signal and image processing and machine learning. Basic properties of the minimizer of the TV regularization problem, such as stability, consistency and convergence rate, are fully investigated. Both a priori and a posteriori parameter choice rules are considered. Furthermore, an improved convergence rate is given under a sparsity assumption. The non-sparse case, which is common in practice, is also discussed, and the corresponding convergence rate is presented under certain mild conditions.


Introduction
Compressed sensing [1,2] has gained increasing attention in recent years; it plays an important role in signal processing [3,4], imaging science [5,6] and machine learning [7]. Compressed sensing focuses on signals with a sparse representation. Let $H_1$ be a Hilbert space, and $\{e_i \in H_1 \mid i \in \mathbb{N}\}$ be an orthonormal basis of $H_1$. For any $x \in H_1$, let $x_i := \langle x, e_i \rangle$. Provided the operator $K$ satisfies certain conditions, it is possible to recover a sparse signal $x^\dagger \in \mathbb{C}^n$ of length $n$ from the samples $y^\dagger = Kx^\dagger$ by Basis Pursuit (BP) [8], i.e.,
$$\min_x \|x\|_1 \quad \text{s.t.} \quad Kx = y^\dagger,$$
even when $K$ is ill-posed [2,9,10]. However, in most cases noise is inevitable, and the literature has turned to studying the noisy BP model
$$\min_x \|x\|_1 \quad \text{s.t.} \quad \|Kx - y^\delta\|_2 \le \delta,$$
where $\delta$ is the allowed error. Actually, the unconstrained form of the noisy BP model, i.e., sparse regularization, which is the focus of [11-16], is more attractive. While the success of compressed sensing greatly inspired the development of sparse regularization, it is interesting to note that sparse regularization appeared much earlier than compressed sensing [11,12]. As an inverse problem, the error theory of sparse regularization is well studied in the literature [17-19].
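To make the BP model concrete, the following finite-dimensional sketch (not part of the original analysis) solves BP with the cvxpy modeling package; the matrix K, the sparsity level and the random seed are illustrative stand-ins.

```python
# Minimal finite-dimensional Basis Pursuit sketch (illustrative only).
# Assumes numpy and cvxpy are available; K and x_true are synthetic stand-ins.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, s = 64, 24, 4                      # signal length, measurements, sparsity
K = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = K @ x_true                           # noiseless measurements y† = Kx†

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [K @ x == y])
prob.solve()
print("recovery error:", np.linalg.norm(x.value - x_true))
```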
In practice, a large class of signals is not sparse unless transformed by some operator (possibly ill-posed). Thus, many studies have analyzed the corresponding regularized optimization problem [20]. A typical example is a signal with a sparse gradient, which arises frequently in image processing (natural images are usually piecewise constant, i.e., they have a sparse gradient). Total Variation (TV) has been used extensively in imaging sciences for decades, and a series of techniques has been dedicated to the choice of its regularization parameter [21-31]; other methods [32,33] were developed based on this observation. Similar to [34], Total Variation can also smooth the signal of interest. Let $H_2$ be another Hilbert space. For any $x \in H_1$, define $T : H_1 \to H_2$ by $(Tx)_i := x_i - x_{i+1}$. Under the above definition, $T$ is an ill-posed linear operator. Given a linear map $K : H_1 \to H_2$ and $y^\delta \in H_2$, the total variation regularization problem can be represented as
$$\min_{x \in H_1} \Psi_\alpha(x) := \frac{1}{2}\|Kx - y^\delta\|_2^2 + \alpha \sum_i |(Tx)_i|,$$
where $\alpha > 0$ is the regularization parameter. The regularization term $\sum_i |(Tx)_i|$ is precisely the total variation (TV) of $x$. TV type regularization has a form similar to sparse regularization. However, the perfect reconstruction results established for sparse regularization cannot be applied to the TV type directly, especially since $T$ is ill-posed ($T$ has a nontrivial null space). So in this paper we first discuss the stability and consistency of the minimizers of $\Psi_\alpha$. Besides these basic properties, we are also interested in the convergence rate. Under source conditions [19,35,36], convergence rates are obtained for both a priori and a posteriori parameter choice rules. However, the linear convergence rate requires $K$ to be injective, which is usually restrictive. We then show that the linear convergence rate can also be derived under a sparsity assumption on $Tx^\dagger$ together with suitable conditions on $K$; this derivation does not depend on the injectivity of $K$. Meanwhile, this paper also considers the case where the sparsity assumption on $Tx^\dagger$ fails. Last, based on some recent works [37-39], which also assume that $Tx^\dagger$ is not sparse, a convergence rate is given for this case as well.
The rest of this paper is organized as follows. Section 2 summarizes the notation. Section 3 presents some basic properties and gives convergence rates for the minimizer. Section 4 proves the improved convergence rate. Finally, Section 5 concludes the paper.

Notation
The notations described in this section are adopted throughout this paper. Let $H_1$, $H_2$ be two Hilbert spaces and $\{e_i \in H_1 \mid i \in \mathbb{N}\}$, $\{\xi_i \in H_2 \mid i \in \mathbb{N}\}$ be orthonormal bases of $H_1$ and $H_2$, respectively. For any $x \in H_1$ and $y \in H_2$, $x_i := \langle x, e_i \rangle$ and $y_i := \langle y, \xi_i \rangle$. The $\ell^1$ and $\ell^2$ norms of $x$ and $y$ are denoted by $\|x\|_1 := \sum_i |x_i|$, $\|x\|_2 := (\sum_i |x_i|^2)^{1/2}$ and $\|y\|_1 := \sum_i |y_i|$, $\|y\|_2 := (\sum_i |y_i|^2)^{1/2}$, respectively. In this paper, if not specified, for any $x \in H_1$ and $y \in H_2$ we assume that $x, y \in \ell^2$, i.e., $\|x\|_2 < +\infty$ and $\|y\|_2 < +\infty$. $x_n \rightharpoonup x$ means that $x_n$ converges weakly to $x$, while $x_n \to x$ means that $x_n$ converges strongly to $x$. The operator norm of a linear operator $K : H_1 \to H_2$ is defined as $\|K\| := \sup_{\|x\|_2 = 1} \|Kx\|_2$. Throughout the paper, $x^\dagger$ denotes the signal of interest and $y^\dagger := Kx^\dagger$ the measurements; $y^\delta$ denotes an element of $H_2$ satisfying $\|y^\dagger - y^\delta\|_2 \le \delta$. Under these notations, the TV regularization problem can be expressed as
$$\min_{x \in H_1} \Psi_\alpha(x) := \frac{1}{2}\|Kx - y^\delta\|_2^2 + \alpha \|Tx\|_1.$$
We denote by $x^\delta_\alpha$ one of the minimizers of $\Psi_\alpha$.
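For readers who want to experiment, the following is a minimal finite-dimensional sketch of $\Psi_\alpha$ and its minimizer using cvxpy. The forward-difference matrix realizes $(Tx)_i = x_i - x_{i+1}$; the helper names `difference_operator` and `tv_minimizer` are our own, not the paper's.

```python
# Sketch of the TV-type objective Psi_alpha(x) = 0.5*||Kx - y_delta||^2 + alpha*||Tx||_1
# in finite dimensions, with T the forward-difference operator.
# Illustrative only; assumes numpy and cvxpy.
import numpy as np
import cvxpy as cp

def difference_operator(n):
    """(n-1) x n forward-difference matrix: each row is [.., 1, -1, ..]."""
    T = np.zeros((n - 1, n))
    T[np.arange(n - 1), np.arange(n - 1)] = 1.0
    T[np.arange(n - 1), np.arange(1, n)] = -1.0
    return T

def tv_minimizer(K, y_delta, alpha):
    """Return one minimizer x^delta_alpha of Psi_alpha via convex programming."""
    n = K.shape[1]
    T = difference_operator(n)
    x = cp.Variable(n)
    objective = 0.5 * cp.sum_squares(K @ x - y_delta) + alpha * cp.norm1(T @ x)
    cp.Problem(cp.Minimize(objective)).solve()
    return x.value
```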

Remark 2.
Let $D = T - I_d$, where $I_d$ is the identity operator over $H_1$. Then $(Dx)_i = -x_{i+1}$ for any $i \in \mathbb{N}$. It is easy to verify that $D$ is continuous; hence $T$ is continuous over $H_1$ with $\|T\| \le \|I_d\| + \|D\| \le 2$. In practice, the ill-posedness of $T$ complicates the analysis. To overcome this problem, we consider a condition which plays an important role in the derivation.

Condition 1.
There exist two constants $c, m > 0$ such that
$$\|x\|_2 \le c\|Kx\|_2 + m\|Tx\|_1 \quad \text{for all } x \in H_1.$$
We present a finite-dimensional understanding of this condition. Let $\dim(H_1) = N$ and $\dim(H_2) = M$. Then $K \in \mathbb{R}^{M \times N}$ (possibly with a nontrivial null space) and $T \in \mathbb{R}^{(N-1) \times N}$. In the finite-dimensional case, $T$ has the form
$$T = \begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{pmatrix}.$$
The definition of $T$ gives that $\operatorname{null}(T) = \operatorname{span}(\vec{1})$. If $K\vec{1} \neq 0$, then $\operatorname{null}(K) \cap \operatorname{null}(T) = \{0\}$, and we have that $\operatorname{null}\begin{pmatrix} cK \\ mT \end{pmatrix} = \{0\}$ for any $c, m > 0$. Hence, the inequality above holds for any $x \in \mathbb{R}^N$ and suitable $c, m > 0$.
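The following numerical sketch illustrates the finite-dimensional reading of Condition 1, under the assumption (stated above) that it takes the form $\|x\|_2 \le c\|Kx\|_2 + m\|Tx\|_1$: since $\|z\|_1 \ge \|z\|_2$, it suffices that the smallest singular value of the stacked matrix $\begin{pmatrix} cK \\ mT \end{pmatrix}$ is at least one.

```python
# Numerical check of the finite-dimensional reading of Condition 1:
# if K.1 != 0, the stacked matrix [cK; mT] has trivial null space, and
# sigma_min([cK; mT]) >= 1 yields ||x||_2 <= c||Kx||_2 + m||Tx||_2
#                                        <= c||Kx||_2 + m||Tx||_1.
# Illustrative sketch; assumes numpy.
import numpy as np

rng = np.random.default_rng(1)
N, M = 32, 12
K = rng.standard_normal((M, N)) / np.sqrt(M)    # K.1 != 0 almost surely
T = np.zeros((N - 1, N))
T[np.arange(N - 1), np.arange(N - 1)] = 1.0
T[np.arange(N - 1), np.arange(1, N)] = -1.0

c = m = 1.0
sigma_min = np.linalg.svd(np.vstack([c * K, m * T]), compute_uv=False)[-1]
while sigma_min < 1.0:                           # rescale until the bound holds
    c *= 2.0; m *= 2.0
    sigma_min *= 2.0                             # stacking scales linearly in (c, m)
print(f"c = {c}, m = {m}, smallest singular value = {sigma_min:.3f}")
```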

Basic Error Estimations
The properties of TV type regularization are investigated in this section. First, we introduce a lemma that will be used frequently in this section.

Lemma 1. Let $y^\delta$ be bounded, $\alpha$ be fixed and $\{x_n\}_{n=1,2,\dots}$ be a sequence. Assume that Condition 1 holds and $\{\Psi_\alpha(x_n)\}_{n=1,2,\dots}$ is bounded. Then $\{x_n\}_{n=1,2,\dots}$ is also bounded.

Stability
In this subsection, we investigate the behavior of $x^\delta_\alpha$ as $\alpha \to \bar\alpha$ for fixed $y^\delta$. We first recall a lemma from convex optimization.
Lemma 2 ([40,41]). Let $\chi^*$ be the solution set of the convex minimization problem $\min_x \Psi_\alpha(x)$. Then $Kx$ and $\|Tx\|_1$ are constant over $\chi^*$.

Theorem 1. Assume that $K$, $T$ satisfy Condition 1. For any fixed $\bar\alpha > 0$, $y^\delta \in H_2$ and any sequence $\alpha_n \to \bar\alpha$, we have
$$\Psi_{\alpha_n}(x^\delta_{\alpha_n}) \to \Psi_{\bar\alpha}(x^\delta_{\bar\alpha}), \quad \|Tx^\delta_{\alpha_n}\|_1 \to \|Tx^\delta_{\bar\alpha}\|_1 \quad \text{and} \quad Kx^\delta_{\alpha_n} \to Kx^\delta_{\bar\alpha}.$$

Proof. The minimizing property of $x^\delta_{\alpha_n}$ gives that $\frac{1}{2}\|Kx^\delta_{\alpha_n} - y^\delta\|_2^2 + \alpha_n \|Tx^\delta_{\alpha_n}\|_1 \le \Psi_{\alpha_n}(0)$. Then Lemma 1 indicates that there exists a subsequence of $\{x^\delta_{\alpha_n}\}$ converging weakly to some $x^* \in \ell^2$; for simplicity, we also denote this subsequence by $\{x^\delta_{\alpha_n}\}$. By the weak lower semicontinuity of the norms, we have
$$\Psi_{\bar\alpha}(x^*) \le \liminf_n \Psi_{\alpha_n}(x^\delta_{\alpha_n}).$$
On the other hand, by the minimizing property of $x^\delta_{\alpha_n}$, $\Psi_{\alpha_n}(x^\delta_{\alpha_n}) \le \Psi_{\alpha_n}(x^\delta_{\bar\alpha})$. Obviously, it holds that $\limsup_n \Psi_{\alpha_n}(x^\delta_{\bar\alpha}) = \Psi_{\bar\alpha}(x^\delta_{\bar\alpha})$. That means $x^*$ is also a minimizer of $\Psi_{\bar\alpha}$ and
$$\lim_n \Psi_{\alpha_n}(x^\delta_{\alpha_n}) = \Psi_{\bar\alpha}(x^\delta_{\bar\alpha}). \qquad (3)$$
In the following, we argue by contradiction. Assume that $t := \limsup_n \|Tx^\delta_{\alpha_n}\|_1 \neq \|Tx^\delta_{\bar\alpha}\|_1$. Passing to a further subsequence realizing $t$ and repeating the argument above, we obtain a minimizer of $\Psi_{\bar\alpha}$ whose value of $\|T\cdot\|_1$ differs from $\|Tx^\delta_{\bar\alpha}\|_1$, which contradicts Lemma 2. Then we have
$$\lim_n \|Tx^\delta_{\alpha_n}\|_1 = \|Tx^\delta_{\bar\alpha}\|_1.$$
From relation (3), we can obtain that $Kx^\delta_{\alpha_n} \to Kx^\delta_{\bar\alpha}$.
If $K$ is injective, we can further obtain that $\lim_{\alpha_n \to \bar\alpha} x^\delta_{\alpha_n} = x^\delta_{\bar\alpha}$. The theorem above indicates that $\Psi_\alpha(x^\delta_\alpha)$ and $\|Tx^\delta_\alpha\|_1$ are continuous at $\bar\alpha$. In fact, we can obtain a stronger result: the value function $F(\alpha) := \Psi_\alpha(x^\delta_\alpha)$ is differentiable at $\bar\alpha$.
Since $x^\delta_{\bar\alpha}$ minimizes $\Psi_{\bar\alpha}$ and $x^\delta_\alpha$ minimizes $\Psi_\alpha$, for $\alpha > \bar\alpha$ we have
$$F(\alpha) \le \Psi_\alpha(x^\delta_{\bar\alpha}) = F(\bar\alpha) + (\alpha - \bar\alpha)\|Tx^\delta_{\bar\alpha}\|_1$$
and
$$F(\alpha) = \Psi_{\bar\alpha}(x^\delta_\alpha) + (\alpha - \bar\alpha)\|Tx^\delta_\alpha\|_1 \ge F(\bar\alpha) + (\alpha - \bar\alpha)\|Tx^\delta_\alpha\|_1.$$
Combining the two inequalities above, we have
$$\|Tx^\delta_\alpha\|_1 \le \frac{F(\alpha) - F(\bar\alpha)}{\alpha - \bar\alpha} \le \|Tx^\delta_{\bar\alpha}\|_1.$$
When $\alpha < \bar\alpha$, similar results can also be obtained, with the inequalities reversed. The continuity of $\|Tx^\delta_\alpha\|_1$ at $\bar\alpha$ gives that
$$\frac{dF(\bar\alpha)}{d\alpha} = \|Tx^\delta_{\bar\alpha}\|_1.$$
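The differentiability claim can be checked numerically. The sketch below is illustrative only; it reuses the hypothetical helpers `difference_operator` and `tv_minimizer` from the earlier sketch and compares a central finite difference of $F$ with $\|Tx^\delta_\alpha\|_1$.

```python
# Numerical check (illustrative) of dF/dalpha = ||T x^delta_alpha||_1 for the
# value function F(alpha) = min_x Psi_alpha(x).
# Reuses difference_operator and tv_minimizer from the earlier sketch.
import numpy as np

rng = np.random.default_rng(2)
M, N = 12, 32
K = rng.standard_normal((M, N)) / np.sqrt(M)
y_delta = rng.standard_normal(M)
T = difference_operator(N)

def F(alpha):
    x = tv_minimizer(K, y_delta, alpha)
    return 0.5 * np.linalg.norm(K @ x - y_delta) ** 2 + alpha * np.abs(T @ x).sum()

alpha, h = 0.1, 1e-4
x_alpha = tv_minimizer(K, y_delta, alpha)
print("central finite difference of F:", (F(alpha + h) - F(alpha - h)) / (2 * h))
print("||T x^delta_alpha||_1         :", np.abs(T @ x_alpha).sum())
```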

Consistency
The behavior of $x^\delta_\alpha$ as $\delta \to 0$ is investigated under an a priori parameter choice rule. In the analysis, we assume that the following condition holds.

Condition 2.
For any $x \in H_1$ obeying $Kx = y^\dagger$, $x^\dagger$ satisfies that $\|Tx^\dagger\|_1 \le \|Tx\|_1$. The equality holds if and only if $x = x^\dagger$.
Lemma 3. Let $\{x_n\} \subseteq H_1$ with $x_n \rightharpoonup x^*$, $\|Tx_n\|_1 \to \|Tx^*\|_1$ and $\|Kx_n\|_2 \to \|Kx^*\|_2$. Then $\|T(x_n - x^*)\|_1 \to 0$ and $\|K(x_n - x^*)\|_2 \to 0$.

Proof. Since $x_n \rightharpoonup x^*$, $Tx_n$ converges coordinatewise to $Tx^*$. Combining this with $\|Tx_n\|_1 \to \|Tx^*\|_1$ and the triangle inequality, we obtain $\limsup_n \|T(x_n - x^*)\|_1 = 0$.
By the same method, we can also obtain that $\|K(x_n - x^*)\|_2 \to 0$.
Theorem 3. Assume that $K$, $T$ satisfy Condition 1 and that $x^\dagger$ satisfies Condition 2. Let the parameter choice $\alpha = \alpha(\delta)$ satisfy
$$\alpha \to 0 \quad \text{and} \quad \frac{\delta^2}{\alpha} \to 0 \quad \text{as } \delta \to 0$$
(for instance, $\alpha(\delta) = \delta$). Then $\|x^\delta_\alpha - x^\dagger\|_2 \to 0$ as $\delta \to 0$.

Proof. By the definition of $x^\delta_\alpha$, we have
$$\frac{1}{2}\|Kx^\delta_\alpha - y^\delta\|_2^2 + \alpha\|Tx^\delta_\alpha\|_1 \le \Psi_\alpha(x^\dagger) \le \frac{1}{2}\delta^2 + \alpha\|Tx^\dagger\|_1.$$
From the parameter choice rule for $\alpha$ and $\delta$, we can see that $\{\Psi_\alpha(x^\delta_\alpha)\}$ is bounded. Then, by Lemma 1, there exists a subsequence, also denoted by $\{x^\delta_\alpha\}_\delta$, and some point $x^*$ such that $x^\delta_\alpha \rightharpoonup x^*$. By weak lower semicontinuity, we have
$$\|Kx^* - y^\dagger\|_2 \le \liminf_\delta \big(\|Kx^\delta_\alpha - y^\delta\|_2 + \delta\big) = 0.$$
This means $Kx^* = y^\dagger$. It is easy to see that $\lim_\delta \|Kx^\delta_\alpha - y^\dagger\|_2^2 = 0$. On the other hand, we can obtain that $\limsup_\delta \|Tx^\delta_\alpha\|_1 \le \|Tx^\dagger\|_1$. Condition 2 gives that $x^* = x^\dagger$. From the inequality above, we see that $\lim_\delta \|Tx^\delta_\alpha\|_1 = \|Tx^\dagger\|_1$. By Lemma 3, we have $\|T(x^\delta_\alpha - x^*)\|_1 \to 0$ and $\|K(x^\delta_\alpha - x^*)\|_2 \to 0$. Consequently, from Condition 1, it holds that $\|x^\delta_\alpha - x^\dagger\|_2 \to 0$.

Convergence Rate
This subsection concerns the convergence rate under different parameter choice rules (a priori and a posteriori). First, we discuss the a priori rule. As in the classical Tikhonov regularization theory [19,35,36], we introduce a source condition.

Condition 3. Let $x^\dagger$ satisfy the source condition: there exists $w \in H_2$ such that $K^*w \in \partial\big(\|T\cdot\|_1\big)(x^\dagger)$.
Theorem 4. If $x^\dagger$ satisfies the source condition and $\alpha$ is chosen proportional to $\delta$, it holds that
$$\|Kx^\delta_\alpha - y^\delta\|_2 = O(\delta) \quad \text{and} \quad D_v(x^\delta_\alpha, x^\dagger) = O(\delta),$$
where $v = K^*w$ and $D_v(x, x^\dagger) := \|Tx\|_1 - \|Tx^\dagger\|_1 - \langle v, x - x^\dagger\rangle$ is the Bregman distance. If $K$ is injective, there exists $\gamma > 0$ such that $\|x^\delta_\alpha - x^\dagger\|_2 = O(\delta)$.

Proof. The definition of $x^\delta_\alpha$ gives that
$$\frac{1}{2}\|Kx^\delta_\alpha - y^\delta\|_2^2 + \alpha\|Tx^\delta_\alpha\|_1 \le \frac{1}{2}\delta^2 + \alpha\|Tx^\dagger\|_1.$$
Using the notation $C(x) = \|Tx\|_1$, we obtain that
$$\frac{1}{2}\|Kx^\delta_\alpha - y^\delta\|_2^2 \le \frac{1}{2}\delta^2 + \alpha\big(C(x^\dagger) - C(x^\delta_\alpha)\big).$$
For any $v \in \partial C(x^\dagger)$, the convexity of $C$ indicates $C(x^\delta_\alpha) \ge C(x^\dagger) + \langle v, x^\delta_\alpha - x^\dagger \rangle$. Then we have that
$$\frac{1}{2}\|Kx^\delta_\alpha - y^\delta\|_2^2 + \alpha D_v(x^\delta_\alpha, x^\dagger) \le \frac{1}{2}\delta^2 - \alpha\langle v, x^\delta_\alpha - x^\dagger\rangle + \alpha D_v(x^\delta_\alpha, x^\dagger).$$
Choose $v = K^*w$ as in the source condition; after simplification, we derive that
$$-\langle K^*w, x^\delta_\alpha - x^\dagger\rangle = -\langle w, Kx^\delta_\alpha - y^\delta\rangle - \langle w, y^\delta - y^\dagger\rangle \le \|w\|_2\big(\|Kx^\delta_\alpha - y^\delta\|_2 + \delta\big).$$
By completing the square, we obtain that
$$\|Kx^\delta_\alpha - y^\delta\|_2 \le \delta + 2\alpha\|w\|_2 \quad \text{and} \quad D_v(x^\delta_\alpha, x^\dagger) \le \frac{\delta^2}{2\alpha} + \|w\|_2\delta + \frac{\alpha\|w\|_2^2}{2}.$$
This means both quantities are $O(\delta)$ when $\alpha$ is proportional to $\delta$. If $K$ is injective, there exists $\gamma > 0$ such that $\|x\|_2 \le \gamma\|Kx\|_2$. Then we derive that
$$\|x^\delta_\alpha - x^\dagger\|_2 \le \gamma\|K(x^\delta_\alpha - x^\dagger)\|_2 \le \gamma\big(\|Kx^\delta_\alpha - y^\delta\|_2 + \delta\big) = O(\delta).$$
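The a priori rate can be observed empirically. The following sketch is illustrative only; it assumes a square (almost surely injective) random $K$, the rule $\alpha(\delta) = \delta$, and the hypothetical helpers from the earlier sketches. The printed error should shrink roughly in proportion to $\delta$.

```python
# Illustrative experiment for the a priori rate: with alpha(delta) = delta and
# an injective K, the error ||x^delta_alpha - x_true||_2 should scale like O(delta).
# Reuses tv_minimizer from the earlier sketch; assumes numpy.
import numpy as np

rng = np.random.default_rng(3)
N = 32
K = rng.standard_normal((N, N)) / np.sqrt(N)        # square K: injective a.s.
x_true = np.repeat([0.0, 1.0, -0.5, 2.0], N // 4)   # piecewise-constant x with sparse Tx
y_true = K @ x_true

for delta in [1e-1, 1e-2, 1e-3]:
    noise = rng.standard_normal(N)
    y_delta = y_true + delta * noise / np.linalg.norm(noise)  # ||y_true - y_delta||_2 = delta
    x_rec = tv_minimizer(K, y_delta, alpha=delta)
    print(f"delta = {delta:.0e}: error = {np.linalg.norm(x_rec - x_true):.2e}")
```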

Remark 3.
In fact, the first result in Theorem 4 has been proved in [42] for general convex regularization. The proof is included here for completeness.
The following part investigates the a posteriori parameter choice rule. The analysis is motivated by the work in [43,44]. For simplicity of presentation, the parameter $\alpha$ is chosen according to the strong discrepancy principle, referred to as rule (5).

Theorem 5. Assume that $\alpha$ is chosen by rule (5) and that $x^\dagger$ satisfies Condition 2. Then it holds that $x^\delta_\alpha \rightharpoonup x^\dagger$ and $\lim_\delta \|Tx^\delta_\alpha\|_1 = \|Tx^\dagger\|_1$. If $K$ is injective, there exists $\theta > 0$ such that $\|x^\delta_\alpha - x^\dagger\|_2 = O(\delta)$.

Proof. It is trivial to prove that $\{\Psi_\alpha(x^\delta_\alpha)\}_\delta$ is bounded. Then the sequence has a subsequence, also denoted by $\{x^\delta_\alpha\}_\delta$, converging weakly to some $x^*$. We can easily see that $\|Kx^* - y^\dagger\|_2 = 0$; that is actually to say that $Kx^* = y^\dagger$. Moreover, it is easy to see that $\lim_\delta \|Kx^\delta_\alpha - y^\dagger\|_2^2 = 0$. Using relation (6), we have that $\limsup_\delta \|Tx^\delta_\alpha\|_1 \le \|Tx^\dagger\|_1$. Condition 2 gives that $x^* = x^\dagger$; hence, the whole sequence converges weakly to $x^\dagger$, and we have $\lim_\delta \|Tx^\delta_\alpha\|_1 = \|Tx^\dagger\|_1$. If $K$ is injective, there exists $\theta > 0$ such that $\|x\|_2 \le \theta\|Kx\|_2$. Then we derive that
$$\|x^\delta_\alpha - x^\dagger\|_2 \le \theta\|K(x^\delta_\alpha - x^\dagger)\|_2 \le \theta\big(\|Kx^\delta_\alpha - y^\delta\|_2 + \delta\big) = O(\delta),$$
where the last step uses the discrepancy rule (5).
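Rule (5) itself comes from [43,44]; as a hedged illustration only, the sketch below implements the common Morozov-type variant $\|Kx^\delta_\alpha - y^\delta\|_2 \approx \tau\delta$ with $\tau > 1$, using the standard fact (not proven here) that the residual is nondecreasing in $\alpha$ to run a bisection on $\log\alpha$. The helper name `discrepancy_alpha` is ours, not the paper's.

```python
# Sketch of a Morozov-type discrepancy rule: find alpha whose residual
# ||K x^delta_alpha - y_delta||_2 is close to tau * delta.
# Reuses tv_minimizer from the earlier sketch; assumes numpy.
import numpy as np

def discrepancy_alpha(K, y_delta, delta, tau=1.5, lo=1e-8, hi=1e2, iters=40):
    """Bisection on log(alpha) targeting residual ~ tau * delta."""
    for _ in range(iters):
        alpha = np.sqrt(lo * hi)                  # geometric midpoint
        residual = np.linalg.norm(K @ tv_minimizer(K, y_delta, alpha) - y_delta)
        if residual < tau * delta:
            lo = alpha                            # residual too small: raise alpha
        else:
            hi = alpha                            # residual too large: lower alpha
    return np.sqrt(lo * hi)
```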

Improved Convergence Rate
In this section, we investigate the convergence rate when $K$ may not be injective. The first part presents the analysis under the sparsity assumption, while the second deals with the case when the sparsity assumption fails.

Performance under Sparsity Assumption
The analysis in this subsection assumes that $Tx^\dagger$ is sparse. To prove the convergence rate, we need the finite injectivity property [45].

Condition 4.
The operator $K$ satisfies the uniform finite injectivity property, i.e., for any finite subset $S \subseteq \mathbb{N}$, the restriction $K|_S$ is injective.

Remark 4.
In the finite-dimensional case, if $S$ is small, it is easy to see that the finite injectivity property is essentially the restricted isometry property [2,46].
Let $z := Tx$ and $z^\dagger := Tx^\dagger$. Denote by $S$ the set $S := \{i \in \mathbb{N} : |v_i| > \frac{1}{2}\}$, where $v \in \partial\|z^\dagger\|_1$ satisfies the source condition, and let $m = \sup_{i \notin S} \{|v_i|\}$. Since $v \in \ell^2$, $S$ is finite, and it contains the support of $z^\dagger$. Let $P$ be the coordinate projection onto $S$ and $P^\perp$ the one onto $\mathbb{N} \setminus S$. From Condition 4, there exists some $d > 0$ such that $d\|KPz\|_2 \ge \|Pz\|_2$.
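As a small illustration of these definitions, the sketch below forms $S$, the projections $P$, $P^\perp$ and the constant $m$ for a synthetic, hypothetical subgradient $v$ (a stand-in, not derived from any actual source condition).

```python
# Sketch of the index set S = {i : |v_i| > 1/2} and the coordinate
# projections P, P_perp used in the sparsity analysis; assumes numpy.
import numpy as np

v = np.array([1.0, -1.0, 0.3, 1.0, 0.1, -0.2])   # hypothetical v in the subdifferential of ||z||_1
S = np.abs(v) > 0.5                               # S is finite since v is square-summable
P = np.diag(S.astype(float))                      # projection onto S
P_perp = np.eye(len(v)) - P                       # projection onto the complement of S
m = np.abs(v[~S]).max()                           # m = sup_{i not in S} |v_i| <= 1/2
print("S =", np.flatnonzero(S), " m =", m)
```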

Lemma 4.
Assume that $x^\dagger$ satisfies the source condition and Condition 1 holds. If $md\|K\| < 1$, there exist $c_1 > 0$ and $c_2 > 0$ such that
$$\|x - x^\dagger\|_2 \le c_1 D_v(z, z^\dagger) + c_2 \|K(x - x^\dagger)\|_2 \quad \text{for all } x \in H_1,$$
where $z = Tx$ and $D_v$ denotes the Bregman distance of $\|\cdot\|_1$ at $z^\dagger$.

Proof. Assume the conditions in Lemma 4 hold. Then, via $d\|KPz\|_2 \ge \|Pz\|_2$, we obtain an estimate for $\|P(z - z^\dagger)\|_2$ in terms of the residual. We now turn to estimating $\|P^\perp z\|_2$. Recall that $m = \sup_{i \notin S}\{|v_i|\}$; obviously, $m \le \frac{1}{2}$. The source condition then gives a bound for $\|P^\perp z\|_1$ in terms of the Bregman distance. Combining these estimates with Condition 1 and noting that $md\|K\| < 1$, we let $q = \frac{1}{1 - md\|K\|}$ and obtain the asserted inequality.

With the lemma above, we can obtain the following result; the proofs can be found in [44,47-49].

Theorem 6. Let the regularization parameter be chosen a priori as $\alpha(\delta) = O(\delta)$ or a posteriori according to the strong discrepancy principle (5). Then we have the convergence rate
$$\|x^\delta_\alpha - x^\dagger\|_2 = O(\delta).$$

Performance if Sparsity Assumption Fails
In this subsection, we focus on the case where $Tx^\dagger$ is not sparse. As presented in the last subsection, Lemma 4 is critical for the convergence rate analysis. In this part, a similar lemma is proposed, and then the convergence rate is proved. The first lemma is motivated by [37].
Algebraic computation gives the corresponding estimate. Note that, by the triangle inequality, $\|P_n Tx^\dagger\|_1 \le \|P_n T(x - x^\dagger)\|_1 + \|P_n Tx\|_1$, where $P_n$ denotes the coordinate projection onto the first $n$ coordinates. Combining the relations above, we obtain the desired bound.

Condition 5. For all $k \in \mathbb{N}$ there exists $f_k \in H_2$ such that $T^*e_k = K^*f_k$, and $\|f_k\|_2 \to +\infty$ as $k \to \infty$.
Lemma 5. Assume that $x^\dagger$ satisfies the source condition and Conditions 1 and 5 hold. If $c\|K\| < 1$, the following estimate holds in terms of the function $\varphi$.

Proof. $\varphi$ is concave and upper semicontinuous since it is an infimum of affine functions. For any $t \ge 0$, $\varphi$ is finite and continuous. Note that $\varphi(0) = 0$; the upper semicontinuity at $t = 0$ gives the continuity of $\varphi$ at $t = 0$. We turn to the strict monotonicity of $\varphi$. Condition 5 implies that the infimum defining $\varphi(t)$ is attained at some $n \in \mathbb{N}$. Considering $0 < t_1 < t_2 < +\infty$, we obtain that $\varphi(t_1) < \varphi(t_2)$. From Condition 1, letting $q = \frac{1}{1 - c\|K\|}$, we obtain the claim.

Theorem 7. Let the regularization parameter be chosen a priori as $\alpha(\delta) = O\big(\frac{\delta^2}{\varphi(\delta)}\big)$ or a posteriori according to the strong discrepancy principle (5). Then we have the convergence rate $\|x^\delta_\alpha - x^\dagger\|_2 = O(\varphi(\delta))$.

Conclusions
In this paper, we study several problems in total variation type regularization. While it has a form similar to sparse regularization, the TV type is harder to investigate due to the ill-posedness of $T$. A group of regularization conditions has been given in this paper. Under these conditions, we study several theoretical properties of the minimizer of TV type regularization, such as stability, consistency and convergence rates. These analyses are deepened to an improved convergence rate under a sparsity assumption. In the non-sparse case, we also present a more conservative result based on some recent works. Regularizers learned from data are currently a very active research topic, so in future work we will develop error estimates for this type of regularization problem.

Conflicts of Interest:
The authors declare no conflict of interest.