The Impact of the Discrepancy Principle on the Tikhonov-Regularized Solutions with Oversmoothing Penalties

Abstract: This paper deals with the Tikhonov regularization for nonlinear ill-posed operator equations in Hilbert scales with oversmoothing penalties. One focus is on the application of the discrepancy principle for choosing the regularization parameter and its consequences. Numerical case studies are performed in order to complement analytical results concerning the oversmoothing situation. For example, case studies are presented for exact solutions of Hölder type smoothness with a low Hölder exponent. Moreover, the regularization parameter choice using the discrepancy principle, for which rate results are proven in the oversmoothing case in (Hofmann, B.; Mathé, P. Inverse Probl. 2018, 34, 015007), is compared to Hölder type a priori choices. On the other hand, well-known analytical results on the existence and convergence of regularized solutions are summarized and partially augmented. In particular, a sketch for a novel proof to derive Hölder convergence rates in the case of oversmoothing penalties is given, extending ideas from (Hofmann, B.; Plato, R. ETNA. 2020, 93).


Introduction
This paper aims to complement the theory and practice of Tikhonov regularization with oversmoothing penalties for the stable approximate solution of nonlinear ill-posed problems in a Hilbert scale setting. Thus, we consider the operator equation

$$F(x) = y \tag{1}$$

with a nonlinear forward operator $F : D(F) \subseteq X \to Y$, possessing the domain $D(F)$ and mapping between the infinite dimensional real Hilbert spaces $X$ and $Y$. In this context, $\|\cdot\|_X$ and $\|\cdot\|_Y$ denote the norms in $X$ and $Y$, respectively. Throughout the paper, let $x^\dagger \in D(F)$ be a solution to Equation (1) for a given right-hand side $y$. We restrict our considerations to problems that are locally ill-posed at $x^\dagger$. This means that the replacement of the exact right-hand side $y$ by noisy data $y^\delta \in Y$, obeying the deterministic noise model

$$\|y - y^\delta\|_Y \le \delta \tag{2}$$

with noise level $\delta > 0$, may lead to significant errors in the solution of Equation (1) measured by the $X$-norm, even if $\delta$ tends to zero (cf. [1], Def. 2, for details).
For finding approximate solutions, we apply a Hilbert scale setting, where the densely defined, unbounded and self-adjoint linear operator $B : D(B) \subset X \to X$ with domain $D(B)$ generates the Hilbert scale. This operator is assumed to be strictly positive such that we have for some $m > 0$

$$\|Bx\|_X \ge m\,\|x\|_X \quad \text{for all } x \in D(B). \tag{3}$$

In this sense, we exploit the Hilbert scale $\{X_\tau\}_{\tau \in \mathbb{R}}$ generated by $B$ with $X_0 = X$, $X_\tau = D(B^\tau)$, and with corresponding norms $\|x\|_\tau := \|B^\tau x\|_X$.
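To make the scale norms concrete, the following minimal sketch uses a sequence-space model that is an assumption of this illustration and not part of the paper: $B$ is taken as a diagonal operator with eigenvalues $b_i = i$, and the element $x$ has coefficients $i^{-0.8}$ (the names `scale_norm` and `coefficients` are hypothetical). In this toy model, $x$ has a finite norm $\|x\|_\tau$ only for $\tau < 0.3$, so the level-one norm of its truncations grows without bound while the level-zero norm stays bounded.

```python
import numpy as np

def scale_norm(x, b, tau):
    """Hilbert scale norm ||x||_tau = ||B^tau x|| for a diagonal B with eigenvalues b > 0."""
    return np.linalg.norm(b**tau * x)

def coefficients(n, decay=0.8):
    """Eigenvalues b_i = i of the toy operator B and coefficients x_i = i^(-decay)."""
    i = np.arange(1, n + 1, dtype=float)
    return i, i ** (-decay)

# Truncations of the same element at two dimensions (toy data, assumption).
b_small, x_small = coefficients(100)
b_large, x_large = coefficients(1000)

for b, x in ((b_small, x_small), (b_large, x_large)):
    print(len(x), scale_norm(x, b, 0.0), scale_norm(x, b, 1.0))
```

The level-zero norms of both truncations stay close to each other, whereas the level-one norm keeps increasing with the dimension, which mirrors the oversmoothing situation $x^\dagger \notin D(B)$ discussed in the paper.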
As approximate solutions to $x^\dagger$, we use Tikhonov-regularized solutions $x^\delta_\alpha \in D(F)$ that are minimizers of the extremal problem

$$T^\delta_\alpha(x) := \|F(x) - y^\delta\|_Y^2 + \alpha\,\|Bx\|_X^2 \to \min, \quad \text{subject to } x \in D := D(F) \cap D(B), \tag{4}$$

where $\alpha > 0$ is the regularization parameter and $\|F(x) - y^\delta\|_Y^2$ characterizes the misfit or fidelity term. The penalty functional $\|Bx\|_X^2 = \|x\|_1^2$ in the Tikhonov functional $T^\delta_\alpha$ is adjusted to level one of the Hilbert scale such that all regularized solutions have the property $x^\delta_\alpha \in D(B)$. A more general form of the penalty functional in the Tikhonov functional would be $\|B(x - \bar{x})\|_X^2$, where $\bar{x} \in D$ denotes a given smooth reference element. The element $\bar{x}$ then plays the role of the origin (the point of central interest), which can be very different for nonlinear problems. Without loss of generality, we set $\bar{x} := 0$ in the sequel, which makes the formulas simpler.
In our study, the discrepancy principle named after Morozov (cf. [2]), the most prominent a posteriori choice of the regularization parameter $\alpha > 0$, plays a substantial role. On the one hand, the simplified version of the discrepancy principle in equation form

$$\|F(x^\delta_{\alpha_{\mathrm{discr}}}) - y^\delta\|_Y = C\delta \tag{5}$$

with a prescribed constant $C > 1$ is important for theory (cf. [3]). On the other hand, it is well known that there are nonlinear problems where that version is problematic due to duality gaps that prevent the solvability of Equation (5). To overcome the remaining weaknesses of the parameter choice expressed in Equation (5), sequential versions of the discrepancy principle can be applied that approximate $\alpha_{\mathrm{discr}}$, and we refer to [4-6] for more details. Such an approach is used for performing the numerical case studies in Section 6.

Our focus is on oversmoothing penalties in the Tikhonov functional $T^\delta_\alpha$, where $x^\dagger \notin D(B) = X_1$ such that $T^\delta_\alpha(x^\dagger) = \infty$. In this case, the regularizing property $T^\delta_\alpha(x^\delta_\alpha) \le T^\delta_\alpha(x^\dagger)$ does not yield any information. This property, however, is the basic tool for obtaining error estimates and convergence assertions for the Tikhonov-regularized solutions in the standard case, where $T^\delta_\alpha(x^\dagger) < \infty$. We refer as an example to Chapter 10 of the monograph [7], which also deals with nonlinear operator equations, but adjusts the penalty functional to level zero. To derive error estimates in the oversmoothing case, the regularizing property must be replaced by inequalities of the form $T^\delta_\alpha(x^\delta_\alpha) \le T^\delta_\alpha(x_{\mathrm{aux}})$, where $x_{\mathrm{aux}} \in D$ is an appropriately chosen auxiliary element.
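The idea of a sequential discrepancy principle can be sketched on a toy problem. Everything below is an assumption made for illustration and not the authors' implementation: the forward operator is linear and diagonal with singular values $s_i = 1/i$, the scale generator has eigenvalues $b_i = i$, the regularization parameter runs through a geometric grid $\alpha_k = \alpha_0 q^k$, and the function names are hypothetical. The search stops at the first grid value whose residual does not exceed $C\delta$.

```python
import numpy as np

def tikhonov_diag(s, b, y_delta, alpha):
    """Minimizer of ||s*x - y_delta||^2 + alpha*||b*x||^2 for diagonal operators."""
    return s * y_delta / (s**2 + alpha * b**2)

def sequential_discrepancy(s, b, y_delta, delta, C=1.3, alpha0=1.0, q=0.5, max_iter=200):
    """Decrease alpha along alpha0*q^k until the residual norm is at most C*delta."""
    alpha = alpha0
    for _ in range(max_iter):
        x = tikhonov_diag(s, b, y_delta, alpha)
        if np.linalg.norm(s * x - y_delta) <= C * delta:
            return alpha, x
        alpha *= q
    raise RuntimeError("no admissible alpha found on the grid")

# Toy data (assumption): coefficients i^(-0.8) have finite level-one norm only after truncation.
n = 200
i = np.arange(1, n + 1, dtype=float)
s, b = 1.0 / i, i
x_true = i ** (-0.8)
delta = 1e-3
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
noise *= delta / np.linalg.norm(noise)      # noise with norm exactly delta
y_delta = s * x_true + noise

alpha_discr, x_reg = sequential_discrepancy(s, b, y_delta, delta)
print(alpha_discr, np.linalg.norm(x_reg - x_true))
```

For the linear toy model the residual is monotone in $\alpha$, so the grid search is well defined; for nonlinear problems the references [4-6] cited above discuss when such a sequential search is justified.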
The seminal paper on Tikhonov regularization in Hilbert scales that includes the oversmoothing case was written by F. Natterer in 1984 (cf. [8]) and was restricted to linear operator equations. Error estimates in the $X$-norm and convergence rates were proven under two-sided inequalities that characterize the degree of ill-posedness $a > 0$ of the problem. We follow this approach, adapt it to the case of nonlinear problems throughout the subsequent sections, and assume the inequality chain

$$c_a\,\|x - x^\dagger\|_{-a} \le \|F(x) - F(x^\dagger)\|_Y \le C_a\,\|x - x^\dagger\|_{-a} \quad \text{for all } x \in D \tag{6}$$

with constants $0 < c_a \le C_a < \infty$. The left-hand inequality in Equation (6) represents a conditional stability estimate and is substantial for obtaining stable regularized solutions, whereas the right-hand inequality in Equation (6) contributes to the determination of the nonlinearity structure of the forward operator $F$. Convergence and rate results for the Tikhonov regularization expressed in Equation (4) with oversmoothing penalties under the inequality chain expressed in Equation (6) were recently presented in [3,6,9] and complemented by case studies in [10]. The present paper continues this series of articles by addressing open questions with respect to the discrepancy principle for choosing the regularization parameter $\alpha$ and its comparison to a priori parameter choices. In this context, one of the examples from [10] is reused for performing new numerical experiments in order to obtain additional assertions that cannot be taken from analytical investigations.

The paper is organized as follows: We summarize in Section 2 basic properties of regularized solutions under assumptions that are typical for oversmoothing penalties and in Section 3 assertions concerning the convergence. In Section 4 we show that the error estimates derived in [6] for obtaining low order convergence rates are also applicable to obtain the order optimal Hölder convergence rates under the associated Hölder-type source conditions.
Section 5 recalls a nonlinear inverse problem from an exponential growth model and an appropriate Hilbert scale, which can both be used for performing numerical experiments in the subsequent section. In that section (Section 6), the obtained numerical results are presented and interpreted based on a series of tables and figures.

Assumptions and Properties of Regularized Solutions
In this section, we formulate the standing assumptions concerning the forward operator F, the Tikhonov functional T δ α , and the solution x † of Equation (1) in order to ensure the existence and stability of regularized solutions x δ α for all regularization parameters α > 0 and noisy data y δ .

Assumption 1.
(a) The operator F : D(F) ⊆ X → Y mapping between the real Hilbert spaces X and Y is weakly sequentially continuous, and its domain D(F) with 0 ∈ D(F) is a convex and closed subset of X.
(e) There is a number a > 0, and there are constants 0 < c a ≤ C a < ∞ such that the two-sided estimates expressed in Equation (6) hold.
As a specific impact of Item (d) of Assumption 1 on approximate solutions to $x^\dagger$, we have the following proposition that is of interest for the behavior of regularized solutions in the case of oversmoothing penalties.

Proof. In order to construct a contradiction, let us assume that the sequence $\{x_n\} \subset D(B)$ (or some of its subsequences) is bounded in $X_1$, i.e., $\|Bx_n\|_X \le K$ for all $n \in \mathbb{N}$. Thus, a subsequence of $\{Bx_n\}$ converges weakly in $X$ to some element $z \in X$, because bounded sets are weakly pre-compact in the Hilbert space $X$. Since the operator $B$ is densely defined and self-adjoint, it is closed, i.e., the graph $\{(x, Bx) : x \in D(B)\}$ is closed and, due to the convexity of this set, a weakly closed subset of $X \times X$. Hence, the operator $B$ is weakly closed, which implies that $x^\dagger \in D(B)$ and $Bx^\dagger = z$. This, however, contradicts the assumed property $x^\dagger \notin D(B)$ and proves the proposition.

Remark 1.
As a consequence of Proposition 1, any sequence of regularized solutions $\{x_n = x^{\delta_n}_{\alpha_n}\}$ that is norm-convergent (and thus also weakly convergent) to $x^\dagger \notin D(B)$ for $\delta_n \to 0$ as $n \to \infty$ blows up to infinity with respect to the $X_1$-norm. In other words, we have the limit condition $\lim_{n \to \infty} \|Bx^{\delta_n}_{\alpha_n}\|_X = \|Bx^\dagger\|_X = \infty$.
Based on Lemma 1, we can formulate in Proposition 2 the existence of minimizers to the extremal problem expressed in Equation (4).

Lemma 1.
The non-negative penalty functional $\|B\,\cdot\|_X^2 : D(B) \subset X \to \mathbb{R}$ as part of the Tikhonov functional $T^\delta_\alpha$ is a proper, convex, lower semi-continuous, and stabilizing functional.
Proof. The obviously convex penalty functional is proper, since it attains finite values for all $x \in D(B) = X_1 \neq \emptyset$. It is also a stabilizing functional because, as a consequence of Equation (3), the sub-level sets $\{x \in D(B) : \|Bx\|_X^2 \le c\}$ are weakly sequentially pre-compact subsets in $X$ for all constants $c \ge 0$. Namely, all such non-empty sub-level sets are bounded in $X$ and hence weakly pre-compact. For showing that the functional $\|B\,\cdot\|_X^2 = \|\cdot\|_1^2$ is lower semi-continuous, by taking into account Proposition 1 and its proof, it is enough to show that a sequence $\{x_n\} \subset D(B)$ with $\|x_n\|_1 \le K < \infty$ for all $n \in \mathbb{N}$ that converges weakly in $X$ to $\tilde{x} \in X$ implies that $\tilde{x} \in D(B)$ and that this sequence also converges weakly in $X_1$ to $\tilde{x}$. The lower semi-continuity of the norm functional $\|\cdot\|_1$ then yields $\|B\tilde{x}\|_X^2 \le \liminf_{n \to \infty} \|Bx_n\|_X^2$. Now note that a sequence $\{x_n\}$ bounded in $X_1$ has a subsequence that converges weakly in $X_1$ to some element $z \in X_1$. Since the operator $B$ is weakly closed, we then have $B\tilde{x} = z$. Since $z$ is the same for all subsequences, this completes the proof.

Proposition 2.
For all α > 0 and y δ ∈ Y, there is a regularized solution x δ α ∈ D, solving the extremal problem expressed in Equation (4).
Proof. Proposition 4.1 from [11], which coincides with our proposition, is immediately applicable, since the Assumptions 3.11 and 3.22 from [11] are satisfied due to Assumption 1 and Lemma 1 above.
In addition to the existence assertion of Proposition 2, we also have, under the assumptions stated above, the stability of regularized solutions, which means that small changes in the data y δ yield only small changes in x δ α . For a detailed description of this fact, see Proposition 4.2 from [11] that applies here under Assumption 1.

Remark 2.
From Assumption 1, we have that there are no solutions $x^* \in D = D(F) \cap D(B)$ of the operator Equation (1), i.e., with $F(x^*) = y$, because this would contradict, with $F(x^\dagger) = F(x^*)$ and $\|x^\dagger - x^*\|_X > 0$, the left-hand inequality of Equation (6). Besides $x^\dagger$, however, other solutions with $x^* \notin D(B)$ may exist. The regularized solutions $x^\delta_\alpha$, for fixed $\alpha > 0$ and $y^\delta \in Y$, need not be uniquely determined, since, though possessing a convex part $\|Bx\|_X^2$, the Tikhonov functional $T^\delta_\alpha(x)$ is not necessarily convex.

Convergence of Regularized Solutions in the Case of Oversmoothing Penalties
In this section, we discuss assertions about the X-norm convergence of regularized solutions with the Tikhonov functional T δ α introduced in Equation (4). First we recall the following lemma (from [6], Proposition 3.4).
From Lemma 2, we directly obtain the following theorem (cf. [6], Theorem 4.1):

Theorem 1. For any a priori parameter choice $\alpha_* = \alpha(\delta)$ and any a posteriori parameter choice $\alpha_* = \alpha(\delta, y^\delta)$, the regularized solutions $x^\delta_{\alpha_*}$ converge under Assumption 1 to the solution $x^\dagger$ of the operator Equation (1) for $\delta \to 0$, i.e.,

$$\lim_{\delta \to 0} \|x^\delta_{\alpha_*} - x^\dagger\|_X = 0, \tag{9}$$

whenever

$$\alpha_* \to 0 \quad \text{and} \quad \frac{\delta}{\alpha_*^{a/(2a+2)}} \to 0 \quad \text{as } \delta \to 0. \tag{10}$$

Remark 3. By inspection of the corresponding proofs in [6], it becomes clear that the validity of Lemma 2 and consequently of Theorem 1 is not restricted to the case of oversmoothing penalties, but it holds if Items (a), (b), (c), and (e) of Assumption 1 are fulfilled. This means that the solution $x^\dagger \in D(F)$ can possess arbitrary smoothness.

Example 1.
In this example, we consider with respect to Theorem 1 the a priori parameter choice

$$\alpha_* = \alpha(\delta) \sim \delta^{\kappa} \tag{11}$$

for varying exponents $\kappa > 0$. As the following proposition, a consequence of Theorem 1, indicates, there is a wide range of exponents $\kappa$ yielding convergence.

Proposition 3. For the a priori choice expressed in Equation (11) of the regularization parameter $\alpha > 0$, the condition expressed in Equation (10) in Theorem 1 holds if and only if $0 < \kappa < 2 + \frac{2}{a}$.
However, the proof of the underlying Theorem 4.1 in [6] shows that the general verification of the basic estimate expressed in Equation (8), developed with a focus on the case of oversmoothing penalties, requires both the left-hand inequality and the right-hand inequality in the nonlinearity condition expressed in Equation (6) and, moreover, that $x^\dagger$ is an interior point of $D(F)$. More discussions in that direction can be made if we distinguish the following three $\kappa$-intervals:

(i): $0 < \kappa < 2$ with

$$\alpha_* \to 0 \quad \text{and} \quad \frac{\delta^2}{\alpha_*} \to 0 \quad \text{as } \delta \to 0, \tag{12}$$

(ii): $\kappa = 2$ with two constants $\underline{c}$ and $\overline{c}$ such that

$$0 < \underline{c} \le \frac{\delta^2}{\alpha_*} \le \overline{c} < \infty, \tag{13}$$

and (iii): $2 < \kappa < 2 + \frac{2}{a}$ with

$$\alpha_* \to 0 \quad \text{and} \quad \frac{\delta^2}{\alpha_*} \to \infty \quad \text{as } \delta \to 0. \tag{14}$$

If we have $x^\dagger \in X_1$ in contrast to Item (d) of Assumption 1, then, for the convergence of regularized solutions to $x^\dagger$ in Case (i), the nonlinearity condition expressed in Equation (6) is not needed at all provided that Items (a) and (b) of Assumption 1 are fulfilled. Item (c) is not necessary there either. However, to derive Equation (9), $x^\dagger$ must be the uniquely determined penalty-minimizing solution to Equation (1) (cf. [11], Sect. 4.1.2 or alternatively [12,13]). Note that, for $x^\dagger \in X_1$ and parameter choices according to (i), conditions of the type expressed in Equation (6) are only relevant for proving convergence rates.
If regularization parameters are chosen such that Equation (12) is violated as in Cases (ii) and (iii), then even for $x^\dagger \in X_1$ inequalities from the condition expressed in Equation (6) are important. Precisely, Case (iii) seems to require both inequalities of Equation (6) for deriving convergence of regularized solutions to $x^\dagger$. The parameter choice according to Case (ii) with $\alpha_* = \alpha(\delta) \sim \delta^2$ represents for $x^\dagger \in X_1$ the typical conditional stability estimate situation introduced in the seminal paper [14]. There, only the left-hand inequality of the condition expressed in Equation (6) is needed for convergence, which then is a consequence of convergence rate results (cf. [15-17] and references therein). However, to derive Equation (9), $x^\dagger$ must be the uniquely determined solution to Equation (1). In the oversmoothing case $x^\dagger \notin X_1$, both inequalities in Equation (6) seem to be indispensable for obtaining convergence; moreover, for all suggested choices of the regularization parameter $\alpha$, the convergence proofs published so far, all using auxiliary elements, are essentially based on the fact that $x^\dagger$ is an interior point of the domain $D(F)$. Determining the conditions under which convergence takes place if $\kappa \ge 2 + 2/a$ is chosen in Equation (11) is an open problem.

Now we turn to convergence assertions, provided that the regularization parameter $\alpha > 0$ is selected according to the discrepancy principle expressed in Equation (5) with prescribed constant $C > 1$. The main ideas of the proof are outlined along the lines of [6], Theorem 4.9, where a sequential discrepancy principle has been considered.

Theorem 2.
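Under Assumption 1, let there be, for a sequence $\{\delta_n\}$ of positive noise levels with $\lim_{n \to \infty} \delta_n = 0$ and all admissible noisy data $y^{\delta_n} \in Y$ obeying $\|y^{\delta_n} - y\|_Y \le \delta_n$, regularization parameters $\alpha_n := \alpha_{\mathrm{discr}}(\delta_n, y^{\delta_n}) > 0$, satisfying the discrepancy principle

$$\|F(x^{\delta_n}_{\alpha_n}) - y^{\delta_n}\|_Y = C\delta_n \tag{15}$$

for a prescribed constant $C > 1$. We then have

$$\lim_{n \to \infty} \alpha_n = 0 \tag{16}$$

and convergence as

$$\lim_{n \to \infty} \|x^{\delta_n}_{\alpha_n} - x^\dagger\|_X = 0. \tag{17}$$

Proof. First, we show that Equation (16) always takes place for oversmoothing penalties with $x^\dagger \notin D(B) = X_1$. To find a contradiction, we assume that $\liminf_{n \to \infty} \alpha_n > 0$. Since $0 \in D$ as a consequence of Item (a) of Assumption 1, we have $T^{\delta_n}_{\alpha_n}(x^{\delta_n}_{\alpha_n}) \le T^{\delta_n}_{\alpha_n}(0)$ and thus $\alpha_n \|x^{\delta_n}_{\alpha_n}\|_1^2 \le \|F(0) - y^{\delta_n}\|_Y^2$, which means that $\|x^{\delta_n}_{\alpha_n}\|_1 = \|Bx^{\delta_n}_{\alpha_n}\|_X$ and, by Equation (3), $\|x^{\delta_n}_{\alpha_n}\|_X$ are uniformly bounded from above for all $n \in \mathbb{N}$. We then have for a subsequence the weak convergences in $X$ as $x^{\delta_{n_k}}_{\alpha_{n_k}} \rightharpoonup \tilde{x} \in X$ and $Bx^{\delta_{n_k}}_{\alpha_{n_k}} \rightharpoonup \hat{x} \in X$ as $k \to \infty$. Since the operator $B$ is weakly sequentially closed, we therefore obtain $\tilde{x} \in D(B)$ and, for $F$ weakly sequentially continuous (cf. Item (a) of Assumption 1), also $F(x^{\delta_{n_k}}_{\alpha_{n_k}}) \rightharpoonup F(\tilde{x})$. Now, by Equation (15), we easily derive that $F(\tilde{x}) = y = F(x^\dagger)$. Thus, the left-hand inequality of Equation (6) (cf. Item (e) of Assumption 1) yields $\tilde{x} = x^\dagger$, which contradicts the assumption $x^\dagger \notin D(B)$ and proves the property expressed in Equation (16) of the regularization parameter choice.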

Remark 4.
In the case $x^\dagger \in X_1$ of non-oversmoothing penalties, the limit condition expressed in Equation (16) represents a canonical situation for regularized solutions, whereas the non-existence of $\alpha_{\mathrm{discr}}$ from Equation (5) and the violation of Equation (16) only occur in exceptional cases. For the sequential variant of the discrepancy principle, the exceptional case $\liminf_{n \to \infty} \alpha_n > 0$ is discussed in [4] in the context of the exact penalization veto introduced there.

An Alternative Approach to Prove Hölder Convergence Rates in the Case of Oversmoothing Penalties for an A Priori Parameter Choice of the Regularization Parameter
In this section, we consider order optimal convergence rate results in the case of oversmoothing penalties for an a priori parameter choice of the regularization parameter $\alpha > 0$. Such results have been proven in the paper [9] under the condition $x^\dagger \in X_p = D(B^p) = R(B^{-p})$ for $0 < p < 1$, which is a Hölder-type source condition. In that paper, the proof is formulated for the penalty $\|B(x - \bar{x})\|_X^2$ with reference element $\bar{x} \in D$. This proof has been repeated in the appendix of the paper [10] in the simplified version with $\bar{x} = 0$ and penalty term $\|Bx\|_X^2$, which is also utilized in the present work. In the following, we present the sketch of an alternative proof for the order optimal Hölder convergence rates under the Hölder-type source condition $x^\dagger \in X_p$ for $0 < p < 1$. This alternative approach is based on error estimates that have been verified in [6] for showing convergence of the regularized solutions $x^\delta_\alpha$ to $x^\dagger$ and for proving low order (e.g., logarithmic) convergence rates under corresponding low order source conditions. By one novel idea outlined below, the results from [6] can be extended to prove Hölder convergence rates, too.
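For the subsequent investigations, we complement Assumption 1 with an assumption that specifies the smoothness of the solution $x^\dagger$:

Assumption 2. There are $0 < p < 1$ and $w \in X$ such that

$$x^\dagger = B^{-p} w. \tag{18}$$

Theorem 3. Under Assumptions 1 and 2, we have for the a priori parameter choice

$$\alpha_* = \alpha(\delta) \sim \delta^{\frac{2a+2}{a+p}} \tag{19}$$

the convergence rate

$$\|x^\delta_{\alpha_*} - x^\dagger\|_X = O\left(\delta^{\frac{p}{a+p}}\right) \quad \text{as } \delta \to 0. \tag{20}$$

Proof. We give only a sketch of a proof for this theorem, presupposing the results of the recent paper [6]. Precisely, we outline only the points where we amend and complement the results of [6] in order to extend [6], Theorem 5.3, to the case of appropriate power-type functions $\varphi$. Auxiliary elements $z_\alpha \in D(B)$, which are for all $\alpha > 0$ the uniquely determined minimizers of the artificial Tikhonov functional $T_{\alpha,a}(x) := \|x - x^\dagger\|_{-a}^2 + \alpha \|Bx\|_X^2$, represent, in combination with the moment inequality, the essential tool for the proof. By introducing the self-adjoint and positive semi-definite bounded linear operator $G := B^{-(2a+2)} : X \to X$, we can verify these elements in an explicit manner as

$$z_\alpha = G(G + \alpha I)^{-1} x^\dagger,$$

which implies that, for all $\alpha > 0$,

$$z_\alpha - x^\dagger = -\alpha (G + \alpha I)^{-1} x^\dagger.$$

According to [6], Lemma 3.1, we then have the functions $f_1(\alpha) = o(1)$, $f_2(\alpha) = o(1)$ and $f_3(\alpha) = o(1)$ as $\alpha \to 0$ introduced there, which can be found in our notation from the representations

$$\|z_\alpha - x^\dagger\|_X = f_1(\alpha), \qquad \|z_\alpha - x^\dagger\|_{-a} = f_2(\alpha)\,\alpha^{a/(2a+2)}, \qquad \|Bz_\alpha\|_X = f_3(\alpha)\,\alpha^{-1/(2a+2)}.$$

Under the source condition expressed in Equation (18), which attains the form $x^\dagger = G^{p/(2a+2)} w$ with some source element $w \in X$, we derive in detail the formulas

$$f_1(\alpha) = \alpha\,\|G^{p/(2a+2)} (G + \alpha I)^{-1} w\|_X = O(\alpha^{p/(2a+2)}), \tag{21}$$

$$f_2(\alpha) = \alpha^{1 - a/(2a+2)}\,\|G^{(a+p)/(2a+2)} (G + \alpha I)^{-1} w\|_X = O(\alpha^{p/(2a+2)}), \tag{22}$$

and

$$f_3(\alpha) = \alpha^{1/(2a+2)}\,\|G^{(2a+p+1)/(2a+2)} (G + \alpha I)^{-1} w\|_X = O(\alpha^{p/(2a+2)}). \tag{23}$$

The asymptotics $O(\alpha^{p/(2a+2)})$ in Equations (21)-(23) is a consequence of the properties $\|G(G + \alpha I)^{-1}\| \le 1$ and $\|(G + \alpha I)^{-1}\| \le 1/\alpha$, which yield, by exploiting the moment inequality (cf. [7], Formula (2.49)),

$$\|G^\theta (G + \alpha I)^{-1}\| \le \|G(G + \alpha I)^{-1}\|^{\theta}\,\|(G + \alpha I)^{-1}\|^{1-\theta} \le \alpha^{\theta - 1} \quad \text{for } \alpha > 0 \text{ and } 0 \le \theta \le 1 \tag{24}$$

for the self-adjoint and positive semi-definite operator $G$. Here, $\|\cdot\|$ denotes the operator norm in the space of bounded linear operators mapping in $X$.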
In Equation (21), the inequality expressed in Equation (24) is applied with θ = p/(2a + 2), it is applied with θ = (a + p)/(2a + 2) in Equation (22), and it is applied with θ = (2a + p + 1)/(2a + 2) in Equation (23), taking into account that all three θ-values are smaller than one. These are the new ideas of the present proof.
Because the function $f_9(\alpha)$ in [6], Formula (3.12), is found by linear combination and maximum-building of the functions $f_1(\alpha)$, $f_2(\alpha)$, and $f_3(\alpha)$, we derive here $f_9(\alpha) \sim \alpha^{p/(2a+2)}$ along the lines of Section 3 in [6] and consequently the error estimate

$$\|x^\delta_\alpha - x^\dagger\|_X \le K\,\alpha^{p/(2a+2)} + \bar{K}\,\frac{\delta}{\alpha^{a/(2a+2)}} \tag{25}$$

with constants $K, \bar{K} > 0$, which is valid for all $\delta > 0$ and a sufficiently small $\alpha > 0$. Such a restriction to a sufficiently small $\alpha > 0$ is due to the fact that $z_\alpha$ has to belong to $D(F)$ in order to apply the inequality chain expressed in Equation (6), but this is the case for small $\alpha$, since $x^\dagger$ is assumed to be an interior point of $D(F)$. Under the a priori parameter choice expressed in Equation (19), both terms on the right-hand side of Equation (25) are of order $\delta^{p/(a+p)}$, so that we immediately obtain the convergence rate expressed in Equation (20). This completes the sketch of the proof of the theorem.
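The bound in Equation (24) can be checked numerically in a simple setting. The sketch below assumes, purely for illustration, that $G$ is diagonal with eigenvalues $g_i = i^{-(2a+2)}$ (which corresponds to a diagonal $B$ with eigenvalues $i$); in that case the operator norm of $G^\theta (G + \alpha I)^{-1}$ is the maximum of $g_i^\theta / (g_i + \alpha)$. The values $a = 1$, $p = 0.3$ and the helper name `bound_ratio` are hypothetical choices.

```python
import numpy as np

def bound_ratio(g, alpha, theta):
    """Operator norm of G^theta (G + alpha I)^(-1) for diagonal G, divided by alpha^(theta - 1)."""
    return np.max(g**theta / (g + alpha)) / alpha ** (theta - 1.0)

a, p = 1.0, 0.3                                              # illustrative values (assumption)
g = np.arange(1, 2001, dtype=float) ** (-(2.0 * a + 2.0))    # eigenvalues of the toy operator G

# The three theta-values used in the proof sketch for Equations (21), (22), and (23).
thetas = [p / (2 * a + 2), (a + p) / (2 * a + 2), (2 * a + p + 1) / (2 * a + 2)]
alphas = 10.0 ** np.arange(-8, 0)

ratios = [bound_ratio(g, alpha, theta) for alpha in alphas for theta in thetas]
print(max(ratios))   # stays below one, in line with Equation (24)
```

The ratio never exceeds one because, for scalars $g, \alpha > 0$, the weighted arithmetic-geometric mean inequality gives $g^\theta \alpha^{1-\theta} \le \theta g + (1-\theta)\alpha \le g + \alpha$.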

Remark 5.
Obviously, the a priori parameter choice expressed in Equation (19) satisfies the sufficient condition expressed in Equation (10) for the convergence of regularized solutions from Theorem 1.
More precisely, taking into account Example 1, the choice expressed in Equation (19) has the form of Equation (11) with $\kappa = \frac{2(a+1)}{a+p}$, which for $0 < p < 1$ yields $2 < \kappa < 2 + \frac{2}{a}$ and belongs to Case (iii), where the quotient $\delta^2/\alpha_*$ tends toward infinity as $\delta \to 0$. We mention that the choice expressed in Equation (19) coincides with the choice in [8] suggested by Natterer, who proved the order optimal convergence rate expressed in Equation (20) for linear ill-posed operator equations. For the nonlinear operator Equation (1), the convergence rate expressed in Equation (20) has also been proven in [3] for the a posteriori parameter choice $\alpha_{\mathrm{discr}} = \alpha(\delta, y^\delta)$ from Equation (5). However, so far there are no analytical results about the $\alpha_{\mathrm{discr}}$-asymptotics of the discrepancy principle as $\delta$ tends toward zero. The numerical experiments in the subsequent sections will provide some hints that the hypothesis $\alpha_{\mathrm{discr}} \sim \delta^{\frac{2(a+1)}{a+p}}$ does not have to be rejected.
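The exponent arithmetic behind this remark can be verified with exact rational numbers. The sketch below assumes a two-term error bound of the form $K \alpha^{p/(2a+2)} + \bar{K} \delta \alpha^{-a/(2a+2)}$ (an assumption of this illustration) and checks that the exponent $\kappa = 2(a+1)/(a+p)$ balances both terms at the rate $\delta^{p/(a+p)}$; the function names are hypothetical.

```python
from fractions import Fraction

def a_priori_exponent(a, p):
    """Exponent kappa in alpha ~ delta^kappa for the choice (2a+2)/(a+p)."""
    return (2 * a + 2) / (a + p)

def rate_exponent(a, p):
    """Exponent of the order optimal Hölder rate delta^(p/(a+p))."""
    return p / (a + p)

a, p = Fraction(1), Fraction(1, 2)          # illustrative values (assumption)
kappa = a_priori_exponent(a, p)
rate = rate_exponent(a, p)

# delta-exponents of the two terms of the assumed bound after inserting alpha = delta^kappa
approx_term = kappa * p / (2 * a + 2)       # from alpha^(p/(2a+2))
noise_term = 1 - kappa * a / (2 * a + 2)    # from delta / alpha^(a/(2a+2))

print(kappa, rate, approx_term, noise_term)
```

For $a = 1$ and $p = 1/2$ this gives $\kappa = 8/3$, which lies strictly between $2$ and $2 + 2/a = 4$, and both terms decay with the exponent $1/3$.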

Model Problem and Appropriate Hilbert Scale
In the following, we introduce an example of a nonlinear inverse problem of the form of the operator Equation (1) together with an appropriate Hilbert scale, for which we will investigate the analytic results from the previous section numerically, following up on [10]. The well-known scale of Hilbert-type Sobolev spaces $H^p(0, 1)$ with integer values of $p \ge 0$ consists of functions whose $p$-th derivative is still in $L^2(0, 1)$. For non-integer positive indices $p$, the spaces can be defined by using an interpolation argument, and for general real parameters $p \in \mathbb{R}$ the norms of $H^p(0, 1)$ can be defined by using the Fourier transform $\hat{x}$ of the function $x$ as in Equation (26) (cf. [18]). The Sobolev scales do not constitute a Hilbert scale in the strict sense, but for each $0 < p^* < \infty$ there is an operator $B : L^2(0, 1) \to L^2(0, 1)$ such that $\{X_p\}_{0 \le p \le p^*}$ is a Hilbert scale (see [19]). In order to form a full Hilbert scale for arbitrary real values of $p$, boundary value conditions need to be imposed.
Model problem. The exponential growth model of this example was discussed in early literature (cf., e.g., [24], Section 3.1). More details and properties can be found in [25] and in the appendix of [3]. To identify the time dependent growth rate $x(t)$ $(0 \le t \le T)$ of a population, we use observations $y(t)$ $(0 \le t \le T)$ of the time-dependent size of the population with initial size $y(0) = y_0 > 0$, where the O.D.E. initial value problem

$$y'(t) = x(t)\,y(t) \quad (0 < t \le T), \qquad y(0) = y_0, \tag{29}$$

is assumed to hold. For simplicity, we set $T := 1$ and consider the space setting $X = Y := L^2(0, 1)$. Thus, we derive the nonlinear forward operator $F : x \mapsto y$ mapping in the real Hilbert space $L^2(0, 1)$ as

$$[F(x)](t) = y_0 \exp\left(\int_0^t x(\tau)\,d\tau\right) \quad (0 \le t \le 1), \tag{30}$$

with full domain $D(F) = L^2(0, 1)$ and with the Fréchet derivative

$$[F'(x)h](t) = [F(x)](t) \int_0^t h(\tau)\,d\tau \quad (0 \le t \le 1,\ h \in X). \tag{31}$$

It can be shown that there is some constant $\hat{K} > 0$ such that for all $x \in X$ the inequality expressed in Equation (32) is valid. This in turn guarantees that a tangential cone condition holds with some $0 < \eta < 1$ in $D(F) = B_r(x^\dagger)$ for a sufficiently small $r > 0$ (cf. [3], Example A.2), where $B_r(x^\dagger)$ denotes a closed ball around $x^\dagger$ with radius $r$. According to the construction of the Hilbert scale $\{X_\tau\}_{\tau \in \mathbb{R}}$ generated by the operator $B$ in Equation (28), and due to $0 < \underline{c} \le [F(x^\dagger)](t) \le \overline{c} < \infty$ as a consequence of Equation (30), we obtain from [3], Proposition A.4, that the inequality chain expressed in Equation (6) holds with $a = 1$ in this example.
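A discretized version of the forward operator of the exponential growth model can be sketched as follows. The sketch assumes that $F$ is the solution map of the initial value problem $y' = x\,y$, $y(0) = y_0$, so that $[F(x)](t) = y_0 \exp(\int_0^t x(\tau)\,d\tau)$ and $[F'(x)h](t) = [F(x)](t) \int_0^t h(\tau)\,d\tau$; the grid, the value $y_0 = 1$, and all function names are hypothetical choices for illustration. The integral is approximated by the cumulative trapezoidal rule, and the derivative formula is compared with a finite difference quotient.

```python
import numpy as np

def cumulative_trapezoid(f, t):
    """Cumulative trapezoidal approximation of the integral of f from t[0] to t[k]."""
    increments = 0.5 * (f[1:] + f[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(increments)))

def forward(x, t, y0=1.0):
    """Discretized forward operator: y(t) = y0 * exp(int_0^t x)."""
    return y0 * np.exp(cumulative_trapezoid(x, t))

def forward_derivative(x, h, t, y0=1.0):
    """Discretized directional derivative: F(x)(t) * int_0^t h."""
    return forward(x, t, y0) * cumulative_trapezoid(h, t)

t = np.linspace(0.0, 1.0, 201)
x = np.ones_like(t)                      # constant growth rate, so y(t) = exp(t)
h = np.cos(3.0 * t)                      # arbitrary test direction
eps = 1e-6

y = forward(x, t)
finite_difference = (forward(x + eps * h, t) - y) / eps
linearization = forward_derivative(x, h, t)
print(np.max(np.abs(finite_difference - linearization)))
```

For the constant growth rate the trapezoidal rule is exact, so the discrete output agrees with $e^t$ up to rounding, and the finite difference quotient matches the linearization up to an error of order `eps`.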

Numerical Case Studies
In this section, numerical evidence for the behavior of the regularized solutions $x^\delta_\alpha$ of the model problem introduced in Section 5 is provided. In Section 6.1, numerical experiments with a focus on the discrepancy principle are conducted using exact solutions with low order Hölder-type smoothness $x^\dagger \in X_p$ with $0 < p < 1/2$, while the focus of the recent paper [10] was on results for $p = 1/2$ and larger values of $p$. The essential point of Section 6.2 is the comparison of results obtained by the discrepancy principle with those calculated by a priori choices expressed in Equation (11) of the regularization parameter $\alpha$.

In our first series of experiments, we investigated the interplay between the value $p \in (0, \frac{1}{2})$, the decay rates of the regularization parameter $\alpha_{\mathrm{discr}}$ with respect to the noise level $\delta$ for different values of $p$ as $\delta$ tends toward zero, and the corresponding rates of the error of regularized solutions $x^\delta_\alpha$. Therefore, we turn to exact solutions of the form $x^\dagger(t) = c\,t^{-\beta}$ $(0 < t \le 1)$ with $\beta \in (0, 1/2)$. These functions $x^\dagger$ do not belong to the Sobolev space $H^p(0, 1)$ with fractional order $p$ if $1/2 - \beta < p$ (see for example [26], p. 422). This allows us to study the behavior of the regularized solutions for exact solutions with low order Hölder-type solution smoothness. For the numerical simulations, we therefore assume that $1/2 - \beta$ is at least approximately the smoothness of the exact solution $x^\dagger$.
To confirm our theoretical findings, we solve Equation (4) after discretization, using the trapezoidal rule for the integral, with the MATLAB® function fmincon. We would also like to point out the difficulties associated with the numerical treatment of functions of this particular type. Obviously, the pole occurring at zero is the source of the low smoothness of the exact solution and needs to be captured accordingly. After multiple different approaches, an equidistant discretization with the first discretization point very close to zero proved to be very successful. Typically, a discretization level $N = 200$ is used. To the simulated data $y = F(x^\dagger)$, we added random noise for which we prescribe the relative error $\hat{\delta}$ such that $\|y - y^\delta\| = \hat{\delta}\,\|y\|$; i.e., we have Equation (2) with $\delta = \hat{\delta}\,\|y\|$. To obtain the $X_1$-norm in the penalty, we set $\|\cdot\|_1 = \|\cdot\|_{H^1(0,1)}$ and additionally enforce the boundary condition $x(1) = 0$. The regularization parameter $\alpha$ in this series of experiments is chosen as $\alpha_{\mathrm{discr}} = \alpha(\delta, y^\delta)$ using, with some prescribed multiplier $C > 1$, the discrepancy principle in the form of Equation (33), which approximates $\alpha_{\mathrm{discr}}$ from Equation (5). Unless otherwise noted, $C = 1.3$ was used. From the case studies in [10], we can conjecture, but have no stringent proof, that the $\alpha$-rate of the discrepancy principle does not systematically deviate from the a priori rate expressed in Equation (19), which for $a = 1$ attains the form

$$\alpha(\delta) \sim \delta^{\frac{4}{1+p}}. \tag{34}$$

This $\alpha$-rate already occurred in Natterer's paper [8] for linear problems, and occurs in the case of oversmoothing penalties. In our numerical experiments, we should be able to observe the order optimal convergence rate, which is for $a = 1$

$$\|x^\delta_\alpha - x^\dagger\|_X = O\left(\delta^{\frac{p}{p+1}}\right) \quad \text{as } \delta \to 0. \tag{35}$$

This convergence rate was proven for the a priori parameter choice expressed in Equation (19) as well as for the discrepancy principle expressed in Equation (5) in [9] and [3], respectively.
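The data perturbation with a prescribed relative error can be sketched as follows. This is not the authors' MATLAB code; it is a minimal Python illustration that assumes Gaussian noise rescaled in a trapezoidal-rule approximation of the $L^2(0,1)$-norm, with hypothetical helper names, grid, and noise level.

```python
import numpy as np

def l2_norm(f, t):
    """Trapezoidal-rule approximation of the L2(0,1) norm of f on the grid t."""
    g = f**2
    return np.sqrt(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t)))

def add_relative_noise(y, t, relative_level, rng):
    """Return noisy data y_delta with ||y - y_delta|| = relative_level * ||y|| and the noise level delta."""
    noise = rng.standard_normal(y.shape)
    delta = relative_level * l2_norm(y, t)
    noise *= delta / l2_norm(noise, t)
    return y + noise, delta

t = np.linspace(0.0, 1.0, 201)
y = np.exp(t)                               # stand-in for simulated exact data (assumption)
rng = np.random.default_rng(1)
y_delta, delta = add_relative_noise(y, t, 1e-2, rng)
print(delta, l2_norm(y_delta - y, t))
```

Rescaling the noise vector guarantees that the discrete counterpart of Equation (2) holds with equality, so the noise level $\delta$ entering the discrepancy principle is known exactly in the experiment.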
As the exact solutions $x^\dagger$ are known, we can compute the regularization errors $\|x^\delta_\alpha - x^\dagger\|_X$. Interpreting these errors as a function of $\delta$ justifies a regression for the convergence rates according to

$$\|x^\delta_\alpha - x^\dagger\|_X \approx c_x\,\delta^{\kappa_x}. \tag{36}$$

The $\alpha$-rates are then computed in a similar fashion using

$$\alpha \sim c_\alpha\,\delta^{\kappa_\alpha}. \tag{37}$$

Both exponents $\kappa_x$ and $\kappa_\alpha$ and the corresponding multipliers $c_x$ and $c_\alpha$, all obtained by using a least squares regression based on samples for varying $\delta$, are displayed for different values of $\beta$ in Table 1. As we know the convergence rate expressed in Equation (35), we can estimate the smoothness $p$ by the formula $p_{\mathrm{est}} := \frac{\kappa_x}{1 - \kappa_x}$. The far right column of Table 1 displays the quotient $\frac{4}{p_{\mathrm{est}} + 1}$ estimating the exponent in Equation (34), which can be compared with the $\kappa_\alpha$-values in the second column from the right obtained by regression from a data sample. By comparing these two columns of Table 1, we can state that the asymptotics of $\alpha_{\mathrm{discr}}$ as $\delta \to 0$ seems to be approximately the same as for the optimal a priori parameter choice expressed in Equation (34). Such an observation was already made for larger values of $p$ in [10] for the same model problem.

Figure 1 illustrates results from Table 1 for $x^\dagger(t) = c\,t^{-\beta}$ $(0 < t \le 1)$, characterizing with varying $\beta \in (0, 1/2)$ different smoothness levels of the solution. Since we have an oversmoothing penalty for all such $\beta$, the $\kappa_\alpha$-values lie between $2$ and $2 + \frac{2}{a} = 4$ (cf. the $\kappa$-interval (iii) in Example 1). Additionally, the border lines for $\kappa_\alpha = 2$ and $\kappa_\alpha = 2 + \frac{2}{a}$ are displayed, taking into account that [6] guarantees convergence of the regularized solution $x^\delta_\alpha$ to the exact solution $x^\dagger$ as $\delta \to 0$ for a priori choices in the sense of Equation (19) whenever $2 \le \kappa_\alpha < 2 + \frac{2}{a}$. It becomes evident that the $\alpha$-rates resulting from the discrepancy principle also lie between those bounds. Figures 2 and 3 give some more insight into the situation for the special case $\beta = 0.2$, which approximately corresponds with the smoothness $x^\dagger \in X_{0.31}$.
In Figure 2 (left), the realized errors $\|x^\delta_\alpha - x^\dagger\|_X$ are visualized for a discrete set of noise levels and compared with the associated regression line in a double-logarithmic scale. It becomes evident that the approximation using Hölder rates is highly accurate. The right image of Figure 2 visualizes the behavior of $\delta^2/\alpha_{\mathrm{discr}}$ for various noise levels on a logarithmic scale. The tendency that $\delta^2/\alpha_{\mathrm{discr}} \to \infty$ as $\delta \to 0$ seems to be convincing. Figure 3 (left) displays the realized $\alpha_{\mathrm{discr}}$-values for this particular situation together with the best approximating regression line according to Equation (37). We see again a very good fit for this type of approximation. The right subfigure shows the exact and regularized solution for $\delta = 10^{-3.5}$. The excellent fit of the regularized solution supports our confidence in the numerical implementation, especially considering the problems associated with this type of exact solution.

(Figure caption) ... Table 1 on a double-logarithmic scale.
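The regression step described in this section can be sketched in a few lines. The data below are synthetic and generated from exact power laws with $p = 0.3$ and $a = 1$; they are not the values of Table 1, and the function names are hypothetical. A least squares fit in log-log coordinates recovers the exponents, from which $p_{\mathrm{est}} = \kappa_x / (1 - \kappa_x)$ and the predicted $\alpha$-exponent $4/(p_{\mathrm{est}} + 1)$ follow.

```python
import numpy as np

def fit_power_law(delta, values):
    """Least squares fit of values ~ c * delta**kappa in log-log coordinates."""
    kappa, log_c = np.polyfit(np.log(delta), np.log(values), 1)
    return kappa, np.exp(log_c)

def estimated_smoothness(kappa_x):
    """Invert kappa_x = p / (1 + p), the rate exponent for a = 1."""
    return kappa_x / (1.0 - kappa_x)

# Synthetic samples (assumption): exact power laws for p = 0.3 and a = 1.
p_true = 0.3
delta = 10.0 ** np.linspace(-4.0, -2.0, 9)
errors = 0.8 * delta ** (p_true / (1.0 + p_true))
alphas = 2.0 * delta ** (4.0 / (1.0 + p_true))

kappa_x, c_x = fit_power_law(delta, errors)
kappa_alpha, c_alpha = fit_power_law(delta, alphas)
p_est = estimated_smoothness(kappa_x)
print(kappa_x, kappa_alpha, p_est, 4.0 / (p_est + 1.0))
```

With real experimental data the fitted exponents scatter around the theoretical values, and the comparison of `kappa_alpha` with `4 / (p_est + 1)` corresponds to comparing the two rightmost columns of Table 1.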

A Comparison with Results from A Priori Parameter Choices
The question of whether the a posteriori choices using the discrepancy principle or appropriate a priori choices according to Equation (19) yield better results is of interest. In particular, the influence of the constant $c_\alpha$ when using an a priori choice according to Equation (37) remains unclear. To investigate this numerically, we remain in the setting of Section 6.1; i.e., we consider $x^\dagger(t) = c\,t^{-\beta}$ $(0 < t \le 1)$ as an exact solution. Figure 4 illuminates this situation for $\beta = 0.2$. The error $\|x^\delta_\alpha - x^\dagger\|_X$ is plotted for various constants $c_\alpha$, where we use an a priori choice $\alpha_* = \alpha(\delta)$ of the form of Equation (37).

Figure 4. Regularization error $\|x^\delta_\alpha - x^\dagger\|_X$ using the a priori rate expressed in Equation (19) implemented in the sense of Equation (37) depending on various constants $c_\alpha$. Comparison with the error occurring for the discrepancy principle with $C = 1.4$ (orange) or $C = 1.6$ (red) and with noise level $\delta = 10^{-2.5}$.
We complete our numerical experiments on a priori choices of the regularization parameter with Table 2 and Figure 5, where we list and illustrate the best regression exponents $\kappa_x$ according to the error norm estimate expressed in Equation (36) for different exponents $\kappa_\alpha$ in the a priori parameter choice expressed in Equation (37). In this case study, we used the exact solution $x^\dagger(t) = 1$ $(0 \le t \le 1)$ with the higher smoothness $p = 0.5$. For the a priori parameter choice expressed in Equation (37) with varying exponents $\kappa_\alpha$, the factor $c_\alpha = 1$ has been fixed. The discretization level $N = 1000$ was considered. As expected, Table 2 indicates that maximal error rates occur if $\kappa_\alpha$ is close to the optimal value $\frac{4}{1+p}$. These rates also correspond with the order optimal rates according to Equation (20). For smaller exponents $\kappa_\alpha$, the error rates decrease, and for large exponents $\kappa_\alpha \ge 2 + \frac{2}{a} = 4$, the convergence seems to degenerate. This is visualized in Figure 5: For $\kappa_\alpha = 3.5$, convergence still takes place, whereas for $\kappa_\alpha = 5.5$ convergence cannot be observed anymore.

Figure 5. Regularization error $\|x^\delta_\alpha - x^\dagger\|_X$ and regression line for different noise levels $\delta$ on a log-log scale. $x^\dagger(t) = 1$, a priori parameter choice according to Equation (37) with $\kappa_\alpha = 3.5$ (left) and $\kappa_\alpha = 5.5$ (right).

Remark 6.
As an alternative a posteriori approach for choosing the regularization parameter α, one could also consider the balancing (Lepskiȋ) principle (cf., e.g., [27,28]). In [29], this principle is adapted to the Hilbert scale setting, but not with respect to oversmoothing penalties. In future work, we can discuss this missing facet and perform numerical experiments for the balancing principle in the case of oversmoothing penalties.