The Singular Value Expansion for Arbitrary Bounded Linear Operators

The singular value decomposition (SVD) is a basic tool for analyzing matrices. Regarding a general matrix as defining a linear operator and choosing appropriate orthonormal bases for the domain and co-domain allows the operator to be represented as multiplication by a diagonal matrix. It is well known that the SVD extends naturally to a compact linear operator mapping one Hilbert space to another; the resulting representation is known as the singular value expansion (SVE). It is less well known that a general bounded linear operator defined on Hilbert spaces also has a singular value expansion. This SVE allows a simple analysis of a variety of questions about the operator, such as whether it defines a well-posed linear operator equation and how to regularize the equation when it is not well posed.


Introduction
One of the most powerful ideas in linear algebra is diagonalization, which renders many problems completely transparent. For example, if A ∈ R^{n×n} is a symmetric matrix, the spectral theorem implies that there exists an orthogonal matrix V ∈ R^{n×n} and a diagonal matrix D ∈ R^{n×n} such that

A = V D V^T

(see, for instance, ([1], Section 7.1)). To say that V is orthogonal means that V^T V = I, the n × n identity matrix, which implies that the columns v_1, v_2, . . . , v_n of V form an orthonormal basis for R^n. If D has diagonal entries λ_1, λ_2, . . . , λ_n, then we can also write

A = ∑_{k=1}^{n} λ_k v_k v_k^T.

If A ∈ R^{m×n} is a general matrix (not assumed to be symmetric or even square), the singular value decomposition (SVD) allows us to diagonalize the matrix, at the cost of using two different orthonormal bases. There exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} and a diagonal matrix S ∈ R^{m×n} such that A = U S V^T and S = diag(σ_1, σ_2, . . . , σ_n), with σ_1 ≥ σ_2 ≥ · · · ≥ σ_n ≥ 0 (see [2] or ([1], Chapter 8)). This decomposition follows immediately from the spectral theorem for symmetric matrices; in fact, A^T A = V S^T S V^T is the spectral decomposition of the symmetric matrix A^T A.
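In finite dimensions, both decompositions can be computed and checked directly. The following NumPy sketch (illustrative only; the matrices and names are not part of the original development) verifies A = VDV^T for a symmetric matrix, A = USV^T for a rectangular matrix, and the connection between the singular values of A and the eigenvalues of A^T A:

```python
import numpy as np

rng = np.random.default_rng(0)

# Spectral decomposition of a symmetric matrix: A_sym = V D V^T.
B = rng.standard_normal((4, 4))
A_sym = B + B.T                      # symmetric by construction
lam, V = np.linalg.eigh(A_sym)       # columns of V are orthonormal eigenvectors
assert np.allclose(V @ np.diag(lam) @ V.T, A_sym)

# SVD of a general (rectangular) matrix: A = U S V^T.
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((5, 3))
np.fill_diagonal(S, s)               # s is returned in decreasing order
assert np.allclose(U @ S @ Vt, A)

# A^T A = V S^T S V^T: the squared singular values of A
# are the eigenvalues of the symmetric matrix A^T A.
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```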
Among its other virtues, the SVD reveals the rank of A and bases for the four fundamental subspaces associated with A. If A has rank r, then exactly r of the singular values σ_1, σ_2, . . . , σ_n are positive, and we can write the SVD in the reduced form

A = Û Ŝ V̂^T = ∑_{k=1}^{r} σ_k u_k v_k^T,

where u_1, u_2, . . . , u_m ∈ R^m and v_1, v_2, . . . , v_n ∈ R^n are the columns of U and V, respectively, Û = [u_1 | u_2 | · · · | u_r], V̂ = [v_1 | v_2 | · · · | v_r], and Ŝ = diag(σ_1, σ_2, . . . , σ_r). The four fundamental subspaces are represented by orthonormal bases as follows: {u_1, . . . , u_r} is a basis for R(A), {u_{r+1}, . . . , u_m} is a basis for N(A^T), {v_1, . . . , v_r} is a basis for R(A^T), and {v_{r+1}, . . . , v_n} is a basis for N(A).

It is well known that spectral theory is considerably more complicated for linear operators defined on infinite-dimensional spaces. However, for compact operators, the finite-dimensional theory carries through almost unchanged. Throughout the rest of this paper, X and Y will denote real Hilbert spaces. If T : X → Y is linear, then T is called compact if and only if {Tx_n} has a convergent subsequence in Y for every bounded sequence {x_n} in X (this is equivalent to T's being continuous when the weak topology is imposed on X and the norm topology on Y).
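The correspondence between the reduced SVD and the four fundamental subspaces can likewise be checked numerically. The sketch below (an illustrative, arbitrarily chosen rank-2 example) recovers the rank from the singular values and verifies the subspace bases:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a 5x4 matrix of rank 2, so some singular values are (numerically) zero.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
r = int(np.sum(s > tol))
assert r == 2                            # exactly r singular values are positive

# Reduced SVD: A = U_r diag(sigma_1, ..., sigma_r) V_r^T.
Ur, Vr = U[:, :r], Vt[:r, :].T
assert np.allclose(Ur @ np.diag(s[:r]) @ Vr.T, A)

# Orthonormal bases for the four fundamental subspaces:
#   R(A)   = span{u_1..u_r}      N(A^T) = span{u_{r+1}..u_m}
#   R(A^T) = span{v_1..v_r}      N(A)   = span{v_{r+1}..v_n}
assert np.allclose(A.T @ U[:, r:], 0)    # trailing columns of U lie in N(A^T)
assert np.allclose(A @ Vt[r:, :].T, 0)   # trailing columns of V lie in N(A)
```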
The spectral theorem for self-adjoint compact operators (see ([3], Section 4.3) or ([4], Section 8.2)) says that if T : X → X is compact and self-adjoint, then there exist an orthonormal sequence {φ_k} in X and a sequence {λ_k} of nonzero real numbers such that

T = ∑_{k=1}^{∞} λ_k φ_k ⊗ φ_k,

where the outer product φ_k ⊗ φ_k : X → X is the bounded linear operator defined by

(φ_k ⊗ φ_k)x = ⟨φ_k, x⟩_X φ_k.

For ease of exposition, we will assume that the sequences {φ_k} and {λ_k} are infinite sequences; in the contrary case, T is a finite-rank operator and the infinite series becomes a finite sum. For a general (that is, not necessarily self-adjoint) compact operator T : X → Y, we can derive the singular value expansion (SVE) of T by applying the spectral theorem for self-adjoint compact operators to T*T (see [3], Section 4.4). The result is

T = ∑_{k=1}^{∞} σ_k ψ_k ⊗ φ_k,    (2)

where {φ_k} is an orthonormal sequence in X, {ψ_k} is an orthonormal sequence in Y, and σ_1 ≥ σ_2 ≥ · · · are positive numbers converging to zero. Moreover, {φ_k} is a complete orthonormal set for N(T)⊥ and {ψ_k} is a complete orthonormal set for the closure of R(T).
The SVE of a compact operator has many applications, particularly in analyzing linear operator equations of the form Tx = y (that is, given y ∈ Y, find or estimate x ∈ X satisfying Tx = y). When T is compact and not of finite rank, this equation is ill-posed in the sense that the solution x (if it exists) does not depend continuously on the data y; in this case, the equation is often referred to as an inverse problem. The singular value expansion of T is useful in analyzing approaches to regularizing Tx = y, that is, to computing a stable (approximate) solution to the equation in the presence of noisy data.
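A finite-dimensional caricature may help fix ideas about this ill-posedness. The sketch below (an illustrative construction, not taken from the text) builds a matrix with rapidly decaying singular values, a stand-in for a discretized compact operator, and shows how a tiny data perturbation is enormously amplified by naive inversion:

```python
import numpy as np

rng = np.random.default_rng(2)

# A matrix with rapidly decaying singular values serves as a finite-dimensional
# stand-in for a compact operator that is not of finite rank.
n = 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(n)              # sigma_k -> 0 rapidly
T = U @ np.diag(sigma) @ V.T

x_true = V @ np.ones(n)
y = T @ x_true
noise = 1e-8 * U[:, -1]                   # tiny error along the last left singular vector
x_naive = np.linalg.solve(T, y + noise)   # naive inversion of the noisy data

rel_err_y = np.linalg.norm(noise) / np.linalg.norm(y)
rel_err_x = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
amplification = rel_err_x / rel_err_y

# The relative error in x exceeds the relative error in y by roughly
# the factor 1/sigma_n = 2^19.
assert amplification > 1e4
```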
When T : X → Y is not necessarily compact, there still exists a singular value expansion in the following form.

Theorem 1. Let X and Y be real Hilbert spaces and let T : X → Y be a bounded linear operator. Then there exist a Borel space (M, A, µ) with a second-countable topology T, isometries V : L²(µ) → X, U : L²(µ) → Y, and an essentially bounded measurable function σ : M → R such that

T = U m_σ V†,

where V† is the generalized inverse of V and m_σ is the multiplication operator defined by σ:

(m_σ f)(t) = σ(t) f(t) for all f ∈ L²(µ), t ∈ M.

Moreover, σ > 0 a.e.
By Borel space, we mean a measure space (M, A, µ) such that there is a topology T defined on the set M and A (the collection of measurable subsets of M) is the σ-algebra of Borel sets of T. As noted in the theorem, the topology is guaranteed to be second-countable, that is, to have a countable base. Furthermore, note that for an isometry V, V† = V*.
We will call the representation of Theorem 1 the SVE of T. Pietsch ([5], Section D.3) outlines a short proof of Theorem 1 based on the polar decomposition. We give a direct proof in the next section that is analogous to the derivation of the SVD of a matrix A from the spectral decomposition of A T A. In Section 3, we derive some basic results about this form of the SVE, including its relationship to the classical SVE of a compact operator and how to recognize from the SVE when R(T) fails to be closed. We also include a brief discussion of the relationship of the SVE to notions of s-numbers (the generalization of singular values) that have appeared in the literature. In Section 4, we analyze the inverse problem Tx = y, including methods for regularizing the equation, using Theorem 1. The results in Section 4 are not new, but we hope to convince the reader that the analysis based on Theorem 1 is particularly convenient. We conclude with a brief discussion in Section 5.

The SVE of a Bounded Linear Operator
As noted above, the SVD of a matrix A ∈ R^{m×n} can be derived from the spectral decomposition of the symmetric matrix A^T A ∈ R^{n×n}. In the same way, the SVE described by Theorem 1 can be derived from the following spectral theorem for a bounded, self-adjoint linear operator.

Theorem 2. Let T be a bounded and self-adjoint linear operator mapping a real Hilbert space X into itself. Then there exist a Borel space (M, A, µ) with a second-countable topology T, a unitary operator V : L²(µ) → X, and an essentially bounded measurable function θ : M → R such that

T = V m_θ V^{-1},

where m_θ is the multiplication operator defined by θ.
This version of the spectral theorem is usually stated in terms of a complex Hilbert space X and complex L 2 (µ) (see, for instance, [6] for an accessible exposition), but it can be verified that the same proof yields this representation when X is a real Hilbert space and L 2 (µ) denotes the space of real-valued square-integrable functions.
To derive Theorem 1 from Theorem 2, we require the following preliminary result.

Lemma 1. Let (M, A, µ) be a measure space and let σ : M → R be measurable with σ > 0 a.e. Then

S = { f ∈ L²(µ) : σ^{-1} f ∈ L²(µ) }

is a dense subspace of L²(µ).
We can now prove a special case of Theorem 1.
Theorem 3. Let X and Y be real Hilbert spaces and let T : X → Y be a bounded linear operator with N(T) = {0}. Then there exist a Borel space (M, A, µ) with a second-countable topology T, a unitary operator V : L²(µ) → X, an isometry U : L²(µ) → Y, and an essentially bounded measurable function σ : M → R such that

T = U m_σ V^{-1}.

Moreover, σ > 0 a.e. and R(U) is the closure of R(T).
Proof. By Theorem 2, there exist a measure space (M, A, µ), a unitary operator V : L²(µ) → X, and a bounded measurable function θ : M → R such that

T*T = V m_θ V^{-1}.

We first show that θ ≥ 0 a.e., which obviously follows if we prove that ⟨m_θ f, f⟩_{L²(µ)} ≥ 0 for all f ∈ L²(µ). However, m_θ = V^{-1} T*T V and hence, for any f ∈ L²(µ),

⟨m_θ f, f⟩_{L²(µ)} = ⟨V^{-1} T*T V f, f⟩_{L²(µ)} = ⟨T*T V f, V f⟩_X = ‖T V f‖²_Y ≥ 0,

where we have used the fact that V is unitary. Therefore, θ ≥ 0 a.e., as desired, and we can assume that θ ≥ 0 everywhere.
Next, we show that θ > 0 a.e. Suppose, to the contrary, that E = {t ∈ M : θ(t) = 0} has positive measure. Then m_θ χ_E = 0, where χ_E is the characteristic function of the set E; this implies that T*T V χ_E = V m_θ χ_E = 0 and hence that V χ_E = 0 in X (since N(T*T) = N(T) is trivial). However, V is unitary and ‖χ_E‖_{L²(µ)} = µ(E)^{1/2} > 0, a contradiction. Therefore, if we define σ = √θ and

S = { f ∈ L²(µ) : σ^{-1} f ∈ L²(µ) },

then Lemma 1 applies and we see that S is dense in L²(µ). We define U : S → Y by U f = T V m_{σ^{-1}} f = T V (σ^{-1} f).
Since σ^{-1} f ∈ L²(µ) for all f ∈ S, U is well-defined, and we see that it is linear and densely defined. Moreover, for every f ∈ S,

‖U f‖²_Y = ‖T V (σ^{-1} f)‖²_Y = ⟨T*T V (σ^{-1} f), V (σ^{-1} f)⟩_X = ⟨m_θ (σ^{-1} f), σ^{-1} f⟩_{L²(µ)} = ∫_M θ σ^{-2} f² dµ = ∫_M f² dµ = ‖f‖²_{L²(µ)},

using the facts that V is unitary and θ = σ². However, now we see that U is bounded and densely defined and hence it extends to a bounded operator defined on all of L²(µ). We will use U to denote this extension as well (therefore U : L²(µ) → Y); since ‖U f‖_Y = ‖f‖_{L²(µ)} holds on the dense subspace S, it holds on all of L²(µ), that is, U is an isometry and R(U) is closed. For each x ∈ X, we have m_σ V^{-1} x ∈ S (because σ^{-1}(σ V^{-1} x) = V^{-1} x ∈ L²(µ)) and hence

U m_σ V^{-1} x = T V m_{σ^{-1}} (m_σ V^{-1} x) = T V V^{-1} x = T x,

which shows that T = U m_σ V^{-1}. Finally, suppose y belongs to the closure of R(T), say y = lim_n T x_n; then y = lim_n U (m_σ V^{-1} x_n), and since R(U) is closed, this shows that y ∈ R(U). Since R(U) is contained in the closure of R(T) by definition of U, it follows that R(U) is the closure of R(T). This completes the proof.
Theorem 1 is an immediate corollary of Theorem 3.
Proof of Theorem 1. If we apply Theorem 3 to T|_{N(T)⊥}, we obtain

T|_{N(T)⊥} = U m_σ V^{-1},

where V : L²(µ) → N(T)⊥ is unitary, U : L²(µ) → Y is an isometry, and σ > 0 a.e. Regarding V as an operator from L²(µ) into X, V is obviously an isometry, so proving that T = U m_σ V† will complete the proof. By definition, V† x is the minimum-norm least-squares solution of V f = x, which is f = V^{-1} proj_{N(T)⊥} x. However, then

U m_σ V† x = U m_σ V^{-1} (proj_{N(T)⊥} x) = T (proj_{N(T)⊥} x) = T x,

since Tx = T(proj_{N(T)⊥} x) for every x ∈ X. This proves the desired representation, which completes the proof.

Relationship to the SVE of a Compact Operator
Suppose T : X → Y is a compact operator with singular value expansion

T = ∑_{k=1}^{∞} σ_k ψ_k ⊗ φ_k.

As noted above, {φ_k} is a complete orthonormal sequence for N(T)⊥ and {ψ_k} is a complete orthonormal sequence for the closure of R(T).
Let us define M = Z⁺, A = P(Z⁺) (the power set of Z⁺), and µ to be counting measure (that is, for E ⊂ Z⁺, µ(E) is the cardinality of E). Then L²(µ) is the space of square-summable sequences of real numbers (usually denoted by ℓ²) and, for α = {α_k} ∈ L²(µ), we define V : L²(µ) → X by

V α = ∑_{k=1}^{∞} α_k φ_k.

Then it is straightforward to verify that V is an isometry and that V† = V* is defined by (V† x)_k = ⟨φ_k, x⟩_X for all k ∈ Z⁺. The sequence σ = {σ_k} is bounded and measurable with respect to the measure space (M, A, µ), and m_σ α = {σ_k α_k}.
Finally, U : L²(µ) → Y is defined to be the extension to all of L²(µ) of T V m_{σ^{-1}}, which is given, for α ∈ L²(µ) such that m_{σ^{-1}} α also lies in L²(µ), by

U α = T V (m_{σ^{-1}} α) = ∑_{k=1}^{∞} σ_k^{-1} α_k T φ_k = ∑_{k=1}^{∞} α_k ψ_k.

Clearly this formula extends to every α ∈ L²(µ). Therefore, for each x ∈ X, we have

U m_σ V† x = ∑_{k=1}^{∞} σ_k ⟨φ_k, x⟩_X ψ_k = T x.

This shows that T = U m_σ V† and also that U m_σ V† is just another way of writing the usual singular value expansion of T.
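In finite dimensions, the representation T = U m_σ V† of Theorem 1 can be modeled with matrices: V and U become matrices with orthonormal columns (isometries), and m_σ becomes componentwise multiplication by a vector σ. The following sketch (illustrative; the dimensions and values are arbitrary) checks that U m_σ V† reproduces the classical SVE formula:

```python
import numpy as np

rng = np.random.default_rng(3)

# Finite-dimensional model of T = U m_sigma V^dagger with r = 3 "singular
# functions": R^3 stands in for L^2(mu), and m_sigma is a diagonal matrix.
m, n, r = 6, 5, 3
V = np.linalg.qr(rng.standard_normal((n, r)))[0]   # isometry: V^T V = I_r
U = np.linalg.qr(rng.standard_normal((m, r)))[0]   # isometry: U^T U = I_r
sigma = np.array([3.0, 2.0, 0.5])
T = U @ np.diag(sigma) @ V.T                       # V^dagger = V^* = V^T for an isometry

assert np.allclose(V.T @ V, np.eye(r))             # V is an isometry
assert np.allclose(np.linalg.pinv(V), V.T)         # V^dagger coincides with V^T

# T x = sum_k sigma_k <v_k, x> u_k: the classical SVE, written as U m_sigma V^dagger x.
x = rng.standard_normal(n)
sve = sum(sigma[k] * (V[:, k] @ x) * U[:, k] for k in range(r))
assert np.allclose(T @ x, sve)
```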

The SVE of Operators Related to T
Throughout the rest of the paper, we assume that T : X → Y is a bounded linear operator from one real Hilbert space X to another such space Y, and that T = U m_σ V† is the SVE of Theorem 1. The associated Borel space is denoted by (M, A, µ) and the topology of M is denoted by T.
Since V* = V† and V*V = I hold for any isometry V, we immediately have the following: from T = U m_σ V† it follows that

T* = V m_σ U†, T*T = V m_{σ²} V†, TT* = U m_{σ²} U†,

so that the operators related to T are represented in terms of the same Borel space and the same isometries U and V.

Inverse Problems and the SVE
It is well known that the equation Tx = y represents a true inverse problem if and only if R(T) fails to be closed (see ([3], Section 2.3)). In this case, the solution x (if it exists) does not depend continuously on the data vector y. One way to state this precisely is to note that the generalized inverse T† is unbounded if and only if R(T) fails to be closed. We will study T† below in Section 4; for now, we prove the following necessary and sufficient condition for R(T) to be closed.

Theorem 4. R(T) is closed if and only if σ is essentially bounded away from zero, that is, if and only if there exists γ > 0 such that σ(t) ≥ γ for almost all t ∈ M.

Proof. It is a standard result that R(T) is closed if and only if there exists γ > 0 such that

‖T x‖_Y ≥ γ ‖x‖_X for all x ∈ N(T)⊥

(see ([3], Theorem 2.20)). If there exists γ > 0 such that σ(t) ≥ γ for almost all t ∈ M, then

‖m_σ f‖_{L²(µ)} ≥ γ ‖f‖_{L²(µ)} for all f ∈ L²(µ).

We have ‖V f‖_X = ‖f‖_{L²(µ)} for all f ∈ L²(µ); moreover, V defines an isomorphism from L²(µ) to N(T)⊥. Therefore, for every x ∈ N(T)⊥,

‖T x‖_Y = ‖U m_σ V† x‖_Y = ‖m_σ V† x‖_{L²(µ)} ≥ γ ‖V† x‖_{L²(µ)} = γ ‖x‖_X,

and hence R(T) is closed. Conversely, suppose σ is not essentially bounded away from zero. It follows that S_k = {t ∈ M : σ(t) < 1/k} has positive measure for each k ∈ Z⁺. Therefore, with f_k = χ_{S_k} and x_k = V f_k ∈ N(T)⊥, we have

‖T x_k‖_Y = ‖m_σ f_k‖_{L²(µ)} ≤ (1/k) ‖f_k‖_{L²(µ)} = (1/k) ‖x_k‖_X,

so no γ > 0 as above can exist. This shows that R(T) fails to be closed in this case, and the proof is complete.

s-Numbers
Given the utility of singular values for matrices and compact operators, it is natural to try to extend the concept to more general operators. This can be done in various ways. The Courant-Fischer characterization [8,9] of the singular values σ_1, σ_2, . . . of a compact operator T : X → Y is the following:

σ_k = min { max { ‖T x‖_Y : x ∈ S⊥, ‖x‖_X = 1 } : S a subspace of X with dim(S) = k − 1 }.    (6)

Equation (6) can be taken as the definition of the s-numbers of an arbitrary bounded linear operator T : X → Y by replacing "min" with "inf" and "max" with "sup." Alternatively, the singular values of a compact operator can be characterized [10] as

σ_k = min { ‖T − F‖ : F : X → Y bounded and linear, rank(F) < k },    (7)

which again extends to arbitrary bounded linear operators T by replacing "min" with "inf." It can be shown that (6) and (7) are equivalent for operators on Hilbert space. In fact, Pietsch [11] formulated a list of five axioms that characterize s-numbers on Hilbert space (in the sense that two definitions, such as (6) or (7), that satisfy the axioms are equivalent).
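Characterization (7) can be tested numerically for matrices, where a minimizer of rank less than k is the truncated SVD. The sketch below (illustrative) checks that the operator-norm distance from A to the rank-(k − 1) matrices equals σ_k:

```python
import numpy as np

rng = np.random.default_rng(4)

# Characterization (7) in finite dimensions: sigma_k is the distance (in the
# operator norm) from A to the set of operators of rank < k, attained by the
# truncated SVD.
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A)

for k in range(1, 5):
    # Best approximation of rank k - 1 (for k = 1 this is the zero matrix).
    A_trunc = U[:, :k-1] @ np.diag(s[:k-1]) @ Vt[:k-1, :]
    # The spectral norm of the residual is exactly sigma_k = s[k-1] (0-based).
    assert np.isclose(np.linalg.norm(A - A_trunc, 2), s[k-1])
```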
The above definitions of s-numbers are limited, in that they may not give much information if the continuous spectrum of T * T is nonempty. Fack and Kosaki [12] defined generalized s-numbers for certain operators in a von Neumann algebra, and their techniques allow for a nonempty continuous spectrum. In the context of the SVE presented in this paper, it would be natural to define the set of s-numbers of a bounded linear operator T : X → Y as the essential range of σ (where T = Um σ V † ); we refer the reader to the first author's PhD dissertation [13] for a discussion. The relationship between these two approaches remains to be investigated.

The SVE and Tikhonov Regularization
We believe that Theorem 1 will prove to be useful in a variety of applications. Here we show that it can be used to give transparent proofs of convergence theorems in the theory of Tikhonov regularization, the most popular method for addressing inverse problems.
We consider an equation of the form Tx = y. We are given y ∈ Y and wish to compute or estimate x ∈ X satisfying the equation. The problem is well-posed if there exists a unique solution x for each y ∈ Y, where x depends continuously on y. Existence fails, at least for some y ∈ Y, if R(T) is a proper subspace of Y. However, in that case, it is common to settle for a least-squares solution of the equation, that is, an x ∈ X that minimizes ‖Tx − y‖²_Y. Uniqueness fails to hold if N(T) is nontrivial, but we can select a unique (least-squares) solution by choosing the unique solution lying in N(T)⊥, which is equivalent to choosing the minimum-norm least-squares solution. The interesting case occurs when R(T) fails to be closed. In that case:
1. Least-squares solutions exist only for y in the dense subspace R(T) ⊕ R(T)⊥ of Y;
2. For each y ∈ R(T) ⊕ R(T)⊥, there exists a unique minimum-norm least-squares solution x ∈ N(T)⊥, but x does not depend continuously on y.
The generalized inverse T† : D(T†) → X, where D(T†) = R(T) ⊕ R(T)⊥, is defined by the condition that T†y is the minimum-norm least-squares solution of Tx = y. It follows from the above discussion that, when R(T) fails to be closed, T† is a densely defined unbounded linear operator. In this case, the problem Tx = y, even when interpreted as asking for the minimum-norm least-squares solution, is ill-posed in that the solution does not depend continuously on the data, and we call Tx = y a (linear) inverse problem.
Many regularization techniques for solving Tx = y approximate T† by a family {R_λ : λ > 0} of bounded operators. Here λ is called the regularization parameter and it is required that R_λ → T† pointwise as λ → 0⁺. The most popular regularization method is Tikhonov regularization, in which R_λ = (T*T + λI)^{-1} T*. This operator arises from solving the minimization problem

min_{x ∈ X} ‖T x − y‖²_Y + λ ‖x‖²_X.

We first show that R_λ y → T† y for all y ∈ D(T†). For convenience, we will write x_{λ,y} = R_λ y.
By definition, T†y ∈ N(T)⊥. Furthermore, x_{λ,y} is defined by the equation

(T*T + λI) x_{λ,y} = T* y,

so that x_{λ,y} ∈ R(T*) ⊂ N(T)⊥ as well. We will write ȳ for the projection of y onto the closure of R(T), and notice that T*ȳ = T*y, since y − ȳ ∈ R(T)⊥ = N(T*). It follows that x_{λ,y} = (T*T + λI)^{-1} T* y = (T*T + λI)^{-1} T* ȳ. Moreover, since the least-squares solutions of Tx = y are precisely the solutions of T*Tx = T*y, it also follows that T†y = T†ȳ. These two facts (that T†y, x_{λ,y} ∈ N(T)⊥ and that T†y, x_{λ,y} can be defined by ȳ in place of y) make it convenient to use the singular value expansion as expressed in Theorem 3 (as opposed to the version of Theorem 1). Suppose that, using the notation of Theorem 3,

T|_{N(T)⊥} = U m_σ V^{-1},

and recall that σ > 0 a.e. in M. Since x = T†y satisfies Tx = ȳ (here and below we regard T as an operator from N(T)⊥ to Y when it is convenient to do so), we have U m_σ V^{-1} T†y = ȳ, and hence

m_σ V^{-1} T† y = U† ȳ.

This leads to

T† y = V m_{σ^{-1}} U† ȳ.

However, then

x_{λ,y} = (T*T + λI)^{-1} T* ȳ = V (m_{σ²} + λI)^{-1} m_σ U† ȳ = V m_{σ/(σ² + λ)} U† ȳ,

and therefore

T† y − x_{λ,y} = V m_{1/σ − σ/(σ² + λ)} U† ȳ = V m_{λ/(σ² + λ)} m_{σ^{-1}} U† ȳ.

To show that T†y − x_{λ,y} → 0 as λ → 0, it suffices to show that ‖m_{λ/(σ² + λ)} m_{σ^{-1}} U† ȳ‖_{L²(µ)} → 0. Moreover, since ȳ ∈ R(T) (as opposed to merely belonging to the closure of R(T); this follows from the fact that y ∈ D(T†) = R(T) ⊕ R(T)⊥), it follows that f = m_{σ^{-1}} U† ȳ ∈ L²(µ). However, then, since λ/(σ² + λ) is bounded on M and goes to 0 pointwise as λ → 0, it follows that

‖m_{λ/(σ² + λ)} f‖_{L²(µ)} → 0 as λ → 0

by the dominated convergence theorem. This shows that x_{λ,y} → T†y as λ → 0. Henceforth, we will write x_{0,y} = T†y.
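The convergence R_λ y → T†y can be observed numerically in finite dimensions, where T† is the Moore-Penrose pseudoinverse. The following sketch (illustrative; the matrix and data are arbitrary) computes x_{λ,y} = (A^T A + λI)^{-1} A^T y for decreasing λ:

```python
import numpy as np

rng = np.random.default_rng(5)

# Tikhonov regularization in finite dimensions:
# R_lam = (A^T A + lam I)^{-1} A^T converges pointwise to pinv(A) as lam -> 0.
A = rng.standard_normal((8, 5))
y = rng.standard_normal(8)
x_dagger = np.linalg.pinv(A) @ y          # minimum-norm least-squares solution

errs = []
for lam in [1e-1, 1e-3, 1e-5, 1e-7]:
    x_lam = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ y)
    errs.append(np.linalg.norm(x_lam - x_dagger))

# Each error component shrinks like lam / (sigma_k^2 + lam), so the error
# decreases as lam -> 0.
assert all(e1 > e2 for e1, e2 in zip(errs, errs[1:]))
assert errs[-1] < 1e-2
```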
We will prove two other results to demonstrate the usefulness of the singular value expansion. The result just proved shows that, for each y ∈ D(T†), x_{λ,y} → x_{0,y} as λ → 0. However, the result says nothing about the rate of convergence and, in fact, the convergence can be arbitrarily slow depending on the data y ∈ D(T†) (or, equivalently, on the solution x_{0,y}). For certain x_{0,y}, though, we can bound the rate of convergence. We will not attempt to prove the most general theorem, but rather just consider what turns out to be the optimal rate of convergence. We will show that if x_{0,y} ∈ R(T*T), then ‖x_{0,y} − x_{λ,y}‖_X = O(λ). Indeed, writing f_0 = V^{-1} x_{0,y}, the computation above shows that

x_{0,y} − x_{λ,y} = V m_{λ/(σ² + λ)} f_0.

If x_{0,y} = T*T w for some w ∈ N(T)⊥, then f_0 = m_{σ²} V^{-1} w, and hence

‖x_{0,y} − x_{λ,y}‖_X = ‖m_{λσ²/(σ² + λ)} V^{-1} w‖_{L²(µ)} ≤ λ ‖w‖_X,

since λσ²/(σ² + λ) ≤ λ on M. Thus ‖x_{0,y} − x_{λ,y}‖_X = O(λ), as claimed.
We can also prove the following converse result, namely, that if y ∈ D(T†) and ‖x_{0,y} − x_{λ,y}‖_X = O(λ), then x_{0,y} ∈ R(T*T). We will use the fact, easily verified, that x ∈ R(T*T) if and only if m_{σ^{-2}} V^{-1} x ∈ L²(µ), that is, if and only if V^{-1} x belongs to the domain of the densely defined operator m_{σ^{-2}}. Let us write f_0 = V^{-1} x_{0,y}; then we must show that

∫_M f_0² / σ⁴ dµ < ∞.

We have

x_{0,y} − x_{λ,y} = V m_{λ/(σ² + λ)} f_0,

which implies that

‖x_{0,y} − x_{λ,y}‖²_X = ∫_M λ² f_0² / (σ² + λ)² dµ.

Since ‖x_{0,y} − x_{λ,y}‖_X = O(λ) by assumption, there exists a constant C > 0 such that

∫_M λ² f_0² / (σ² + λ)² dµ ≤ C² λ² for all sufficiently small λ > 0,

that is,

∫_M f_0² / (σ² + λ)² dµ ≤ C².

Since f_0² / (σ² + λ)² converges monotonically to f_0² / σ⁴ a.e. in M as λ → 0, it follows from the monotone convergence theorem that

∫_M f_0² / σ⁴ dµ ≤ C² < ∞,

as desired.
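The O(λ) rate under the source condition x_{0,y} ∈ R(T*T) can also be checked numerically. In the sketch below (illustrative; the matrix and vector are arbitrary), the error divided by λ stays bounded by ‖w‖:

```python
import numpy as np

rng = np.random.default_rng(6)

A = rng.standard_normal((8, 5))
w = rng.standard_normal(5)
x0 = A.T @ (A @ w)                        # source condition: x0 in R(A^T A)
y = A @ x0                                # exact data, so x0 is the pseudoinverse solution

ratios = []
for lam in [1e-2, 1e-4, 1e-6]:
    x_lam = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ y)
    ratios.append(np.linalg.norm(x0 - x_lam) / lam)

# Since lam * sigma^2 / (sigma^2 + lam) <= lam, the error is at most lam * ||w||,
# so the ratio error / lam is bounded by ||w|| (with a small margin for rounding).
assert all(r <= np.linalg.norm(w) * 1.01 for r in ratios)
```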

Discussion
The proofs of the last section are offered as an illustration of the power of the singular value expansion. The reader can compare these proofs to other treatments of the same results that can be found in the literature on inverse problems. In Groetsch's monograph [14], the analysis is restricted to compact operators and Theorem 2.1.1, Corollary 3.1.2, and Theorem 3.2.2 correspond to our results; Groetsch's proofs use the singular value expansion (2) for compact operators. The reader will see that our proofs are direct generalizations of the derivations given there, and also that there is no difficulty in extending his other conclusions to general bounded linear operators. Groetsch does present his theory in greater generality, with much of the analysis applying to a certain family of regularization operators R λ , as opposed to just the Tikhonov approach. We restricted our presentation to Tikhonov regularization simply for convenience of exposition; there would be no difficulty in reproducing his results in the same level of generality.
To extend the results of [14] to general operators, the standard approach is to use the spectral representation of T*T in the form

T*T = ∫ α dE_α,

where {E_α} is the spectral resolution of T*T, and apply the so-called functional calculus, which allows the representation of functions of T*T via

f(T*T) = ∫ f(α) dE_α.

It can be shown, for example, that

(T*T + λI)^{-1} = ∫ (α + λ)^{-1} dE_α.

A good reference for this approach is the book [15] by Engl, Hanke, and Neubauer, which (among other things) extends the results of [14] to general bounded linear operators. There is no intrinsic difficulty in doing so, but it may be argued that the arguments are less intuitive and therefore harder to follow. For instance, it is necessary to work with integrals of the following types:

∫ (· · ·) dE_α, ∫ (· · ·) d‖E_α x‖²_X, ∫ (· · ·) d⟨E_α x, y⟩_X.
As Halmos stated in his popular expository article [6] on the spectral theorem (one of the most-downloaded articles from the American Mathematical Monthly), the result (namely, the spectral theorem for Hermitian matrices, when expressed using a resolution {E_α} of the identity) is not intuitive in any language; neither Stieltjes integrals with unorthodox multiplicative properties, nor bounded operator representations of function algebras, are in the daily toolbox of every working mathematician. In contrast, the formulation of the spectral theorem given in Theorem 2 uses only the relatively elementary concepts of measure theory.
We believe that the singular value expansion for general bounded linear operators, as described above, offers a similarly intuitive tool that can replace the standard use of the functional calculus in many contexts.