Article

Complexity L0-Penalized M-Estimation: Consistency in More Dimensions

1. Institute of Computational Biology, German Research Center for Environmental Health, Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
2. Computer Systems Institute, Swiss Federal Institute of Technology (ETH), Zürich 8092, Switzerland
3. Department of Mathematics and Computer Science, University of Greifswald, Domstr. 11, Greifswald 17489, Germany
4. Mathematical Institute, Ludwig-Maximilian University of Munich, Professor-Huber-Platz 2, München 80539, Germany
* Author to whom correspondence should be addressed.
Axioms 2013, 2(3), 311-344; https://doi.org/10.3390/axioms2030311
Submission received: 7 April 2013 / Revised: 15 May 2013 / Accepted: 4 June 2013 / Published: 9 July 2013
(This article belongs to the Special Issue Wavelets and Applications)

Abstract:
We study the asymptotics in L 2 for complexity penalized least squares regression for the discrete approximation of finite-dimensional signals on continuous domains—e.g., images—by piecewise smooth functions. We introduce a fairly general setting, which comprises most of the presently popular partitions of signal or image domains, like interval, wedgelet or related partitions, as well as Delaunay triangulations. Then, we prove consistency and derive convergence rates. Finally, we illustrate by way of relevant examples that the abstract results are useful for many applications.

1. Introduction

We are going to study the consistency of special complexity penalized least squares estimators for noisy observations of finite-dimensional signals on multi-dimensional domains, in particular, of images. The estimators discussed in the present paper are based on partitioning combined with piecewise smooth approximation. In this framework, consistency is proven and convergence rates are derived in L 2 . Finally, the abstract results are applied to a couple of relevant examples, including popular methods, like interval, wedgelet or related partitions, as well as Delaunay triangulations. Figure 1 illustrates a typical wedgelet representation of a noisy image.
Figure 1. A noisy image (left) and (right) a fairly rough wedgelet representation for n = 256; the (middle) picture also shows the boundaries of the smoothness regions.
Consistency is a strong indication that an estimation procedure is meaningful. Moreover, it allows for structural insight, since a sequence of discrete estimation procedures is embedded into a common continuous setting and the quantitative behavior of estimators can be compared. It is frequently used as a substitute or approximation for missing or vague knowledge in the real finite sample situation. Plainly, one must be aware of various shortcomings and should not rely on asymptotics in case of a small sample size. Nevertheless, consistency is a broadly accepted justification of statistical methods. Convergence rates are of particular importance, since they indicate the quality of discrete estimates or approximations and allow for comparison of different methods.
Observations or data will be governed by a simple regression model with additive white noise: Let S n = { 1 , , n } d be a finite discrete signal domain, interpreted as the discretization of the continuous domain, S = [ 0 , 1 ) d . Data, y = ( y s ) s S n , are available for the discrete domains and generated by the model:
$Y_s^n = \bar f_s^n + \xi_s^n, \qquad n \in \mathbb{N},\ s \in S_n$
where ( f ¯ s n ) s S n is a discretization of an original or “true” signal, f, on S and ( ξ s n ) s S n is white sub-Gaussian noise.
The present approach is based on a partitioning of the discrete signal domain into regions on each of which a smooth approximation of noisy data is performed. The choice of a particular partition is obtained by a complexity penalized least squares estimation, dependent on the data. Between the regions, sharp breaks of intensity may happen, which allow for edge-preserving piecewise smoothing. In one dimension, a natural way to model jumps in signals is to consider piecewise regular functions. This naturally leads to representations based on partitions consisting of intervals. The number of intervals on a discrete line of length n is of the polynomial order $n^2$.
In more dimensions, however, the definition of elementary fragments is much more involved. For example, in a discrete square of side-length n, the number of all subregions is of the exponential order $2^{n^2}$. When dealing with images, one of the difficulties consists in constructing reduced sets of fragments, which, at the same time, take into account the geometry of images and lead to computationally feasible algorithms for the computation of estimators.
The estimators adopted here are minimal points of complexity penalized least squares functionals: If y = ( y s ) s S n is a sample and x = ( x s ) s S n a tentative representation of y, the functional:
$H_n(x, y) = \gamma\,|\mathcal{P}(x)| + \sum_{s \in S_n} (y_s - x_s)^2$
has to be minimized in x given y; the penalty, | 𝒫 ( x ) | , is the number of subdomains into which the entire domain is divided and on which x is smooth in a sense to be made precise by the choice of suitable function spaces (see Section 2.1 and Section 5); γ is a parameter that reflects the tradeoff between the quadratic error and the size of the partition.
Due to the non-convexity of the L 0 -type penalty, one has to solve hard optimization problems in general. If all possible partitions of the signal domain are admitted, such optimization problems are not computationally feasible. A popular attempt to circumvent this nuisance is simulated annealing; see, for instance, the seminal paper [1]. This paper had a considerable impact on imaging; the authors transferred models from statistical physics to image analysis as prior distributions in the framework of Bayesian statistics. This approach was intimately connected with Markov Chain Monte Carlo methods, like Metropolis Sampling and Simulated Annealing [2].
On the other hand, transferring spatial complexity to time complexity, like in such metaheuristics, does not remove the basic problem; it rather transforms it. Such algorithms are not guaranteed to find the optimum or even a satisfactory near-optimal solution [2], Section 6.2. All metaheuristics will eventually encounter problems on which they perform poorly.
Moreover, if the number of partitions grows, at least, exponentially, it is difficult to derive useful uniform bounds on the projections of noise onto the subspaces induced by the partitions. Reducing the search space drastically allows the designing of exact and fast algorithms. Such a reduction basically amounts to restrictions on admissible partitions of the signal domain. There are various suggestions, some of them mentioned initially.
In one dimension, regression onto piecewise constant functions was proposed by the legendary [3], who called the respective representations regressograms. The Functional (2) is by some (including the authors) referred to as the Potts functional. It was introduced in [4] as a generalization of the well-known Ising model [5] from statistical physics from two to more than two spins. It was suggested by [6] and penalizes the length of contours between regions of constant spins. In fact, in one dimension, a partition, 𝒫, into, say, k intervals on which the signal is constant admits k − 1 jumps and, therefore, has contour-length k − 1.
The one-dimensional Potts model for signals was studied in detail in a series of theses and articles; see [7,8,9,10,11,12,13,14]. Consistency was first addressed in [10] and, later on, exhaustively treated in [15,16]. Partitions there consist of intervals. Our study of the multi-dimensional case started with the thesis [8]; see also [17].
In two or more dimensions, the model (2) differs substantially from the classical Potts model. The latter penalizes the length of contours—locations of intensity breaks—whereas (2) penalizes the number of regions. This allows, for instance, good performance on filamentous structures, albeit having long borders compared to their area.
Let us give an informal introduction into the setting. The aim is to estimate a function, f, on the d-dimensional unit cube, S = [ 0 , 1 ) d , from discrete data. To this end, S and f are discretized to cubic grids, S n = { 1 , , n } d , n N , and functions, f ¯ n , on S n . For each n, data, y s n , s S n , is available, i.e., noisy observations of the f ¯ s n .
In this paper, we prove almost sure convergence of the quadratic error associated with complexity penalized least squares estimators f ^ n ( y ) , which are minimal points of functionals of the form (2) (see Section 2.2). Note that the partition, P , is chosen among a suitable class of admissible partitions. Moreover, we derive almost sure convergence rates of quadratic errors, whenever a decay rate of the approximation errors is assumed in addition.
We are faced with three kinds of error: The error caused by noise, the approximation and the discretization error. Noise is essentially controlled regardless of the specific form of f. For the approximation and the discretization error, special assumptions on the function classes in question are needed.
The results presented here are closely related to—but differ from—those obtained by the classical model selection theory developed, for instance, in [18,19]. In fact, classical model selection works within the minimax setting, where the quadratic risk, $E\big(\|\hat f^n - \bar f^n\|^2\big)$, is controlled. Let us stress that neither of the two results directly implies the other.
Due to the approximation error term, there are deep connections to approximation theory. In particular, when dealing with piecewise regular images, non-linear approximation rates obtained by wavelet shrinkage methods are known to be suboptimal, as discussed in [20,21]. In the last decade, the challenging problem to improve upon wavelets has been addressed in very different directions.
The search for a good paradigm for detecting and representing curvilinear discontinuities of bivariate functions remains a fundamental issue in image analysis. Ideally, an efficient representation should use atomic decompositions, which are local in space (like wavelets), but also possess appropriate directional properties (unlike wavelets). One of the most prominent examples is given by curvelet representations, which are based on multiscale directional filtering combined with anisotropic scaling. [22] proved that thresholding of curvelet coefficients provides estimators, which yield the minimax convergence rate up to a logarithmic factor for piecewise 𝒞 2 functions with 𝒞 2 boundaries. Another interesting representation is given by bandelets, as proposed in [23]. Bandelets are based on optimal local warping in the image domain relative to the geometrical flow, and [24] proved, also, the optimality of the minimax convergence rates of their bandelet-based estimator, for a larger class of functions, including piecewise 𝒞 α functions with 𝒞 α boundaries.
In Section 5, we apply the abstract framework proposed in Section 4 to bidimensional examples that rely on explicit geometrical constructions: In particular, the corresponding approaches are aimed at avoiding the pseudo-Gibbs artifacts produced by the above methods.
Wedgelet partitions were introduced by [21] and belong to the class of shape-preserving image segmentation methods. The decompositions are based on local polynomial approximation on some adaptively selected leaves of a quadtree structure. The use of a suitable data structure allowed for the development of fast algorithms for wedgelet decomposition; see [17].
An alternative is provided by anisotropic Delaunay triangulations, which have been proposed in the context of image compression in [25]. The flexible design of the representing system allows for a particularly fine selection of triangles fitting the anisotropic geometrical features of images. In contrast to curvelets, such representations preserve the advantage of wavelets and are still able to approximate point singularities optimally; see [26].
Both wedgelet representations and anisotropic Delaunay triangulations lead to optimal non-linear approximation rates for some classes of piecewise smooth functions. Note that the classes of (generalized horizon) functions considered in this paper contain and are larger than the above mentioned horizon functions (or boundary fragments). For a brief discussion on the generalization to more general piecewise regular functions, see Section 6. In the present paper, we use this optimality to derive convergence rates of the estimators. We prove almost sure consistency rates for function classes where the piecewise regularity is controlled by a parameter, α. More precisely, for these classes, we obtain that, for almost each ω:
$\|\hat f^n(\omega) - f\| = O\big( \varepsilon_n^{2\alpha/(\alpha+1)}\,|\log \varepsilon_n| \big), \qquad \varepsilon_n = \sigma^2 / n^d$
where σ 2 is the variance of noise.
In the minimax setting, decay rates similar to those in (3) are known to be optimal for the respective function classes. Note that, using slightly modified penalties, which are not merely proportional to the number of pieces, [27] were able to show that optimal minimax rates may be achieved, optimally meaning that the rates are of the same order as in (3), but without the log factor. In the present paper, in contrast to [27], we control the almost sure convergence instead of the L 2 -risk. Moreover, we explicitly restrict our attention to the classical penalty given by the number of pieces (or, equivalently, the dimension of the model) as in (2), noting that this is strongly connected to the sparse ansatz, which is currently popular in the signal community. We refer to [28] for a comprehensive review on sparsity. The generalization of the results in the present paper to other penalties is straightforward, but would be rather technical and, thus, might obscure the main ideas.
We address, first, noise and its projections to the approximation spaces; see Section 3. In Section 4, we derive convergence rates in the general context. Finally, in Section 5, we illustrate the abstract results by specific applications. Dimension, 1, is included, thus generalizing the results from [15] to piecewise polynomial regression and piecewise Sobolev classes. Our two-dimensional examples, wedgelets and Delaunay triangulations both rely on a geometric and edge-preserving representation. Our main motivation is the optimal approximation properties of these methods, the key feature to apply the previous framework being an appropriate discretization of these schemes.

2. The Setting

In this section, we introduce the formal framework for piecewise smooth representations, the regression model for data and the estimation procedure.

2.1. Regression and Segmentations

Image domains will generically be denoted by S. We choose S = [ 0 , 1 ) d , d N , as the continuous and S n = { 1 , , n } d as the generic discrete image domain. Let f L 2 ( S ) represent the “true” image, which has to be reconstructed from noisy discrete data. For the latter, we adopt a simple linear regression model of the form:
$Y_s^n = \bar f_s^n + \xi_s^n, \qquad n \in \mathbb{N},\ s \in S_n$
The noise variables, $\xi_s^n$, in the regression model are random variables on a common probability space, $(\Omega, \mathcal{A}, P)$. $\bar f^n = (\bar f_s^n)_{s \in S_n}$ is a discretization of f. To be definite, divide S into $n^d$ semi-open cubes:
$I^n_{i_1, \ldots, i_d} = \prod_{1 \le j \le d} \big[ (i_j - 1)/n,\ i_j/n \big), \qquad 1 \le i_j \le n$
of volume 1 / n d and for g L 2 ( S ) take local means
$\bar g_s^n = n^d \int_{I_s} g(u)\, du, \qquad s \in S_n$
This specifies maps, δ n , from L 2 ( S ) to R S n by:
$\delta_n g = (\bar g_s^n)_{s \in S_n}$
Conversely, embeddings of R S n into L 2 ( S ) are defined by:
$z = (z_s)_{s \in S_n} \;\longmapsto\; \iota_n z = \sum_{s \in S_n} z_s\, \mathbf{1}_{I_s}$
As an aid to memory, keep the following chain of maps in mind:
$L^2(S) \xrightarrow{\;\delta_n\;} \mathbb{R}^{S_n} \xrightarrow{\;\iota_n\;} L^2(S)$
In the absence of noise, the discrete approximations of f are the functions, $\iota_n \bar f^n = \iota_n \delta_n f$, which approximate f more and more precisely as n tends to infinity. Thus, a crucial issue will be to control noise. In fact, the function, $\iota_n \delta_n f = \iota_n \bar f^n$, is the conditional expectation of f with regard to the σ-algebra, $\mathcal{A}_n$, generated by the cubes, $I_s^n$, and convergence can be seen by a martingale argument.
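The maps $\delta_n$ and $\iota_n$ are elementary to implement. The following Python sketch is an illustration only, assuming d = 2; the grid size, the sub-sampling used to approximate the cell means and the example image are arbitrary choices and not part of the paper.

```python
import numpy as np

def delta_n(f, n, subsample=4):
    """Approximate the local means (6) of f over the n x n cells I_s of [0,1)^2
    by averaging f over subsample x subsample points per cell."""
    offs = (np.arange(subsample) + 0.5) / (subsample * n)   # offsets inside one cell
    x = (np.arange(n)[:, None] / n + offs[None, :]).ravel()
    X, Y = np.meshgrid(x, x, indexing="ij")
    vals = f(X, Y)
    return vals.reshape(n, subsample, n, subsample).mean(axis=(1, 3))

def iota_n(z):
    """Embed z in R^{S_n} into L^2(S) as the piecewise constant function
    sum_s z_s 1_{I_s}, returned as a callable on [0,1)^2."""
    n = z.shape[0]
    def g(x, y):
        i = np.minimum((np.asarray(x) * n).astype(int), n - 1)
        j = np.minimum((np.asarray(y) * n).astype(int), n - 1)
        return z[i, j]
    return g

# iota_n(delta_n(f, n)) is the conditional expectation of f with respect to
# the sigma-algebra generated by the cells I_s
f = lambda x, y: np.where(y <= 0.4 + 0.2 * x, 1.0, 0.0)     # a horizon-type image
f_bar = delta_n(f, 64)
f_n = iota_n(f_bar)
```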
We are dealing with estimates of f or, rather, of $\bar f^n$ on each level, n. An image domain, S, will be partitioned by the method into sets, on which the future representations are members of initially chosen spaces of smooth functions. To keep control, we choose a class, $\mathcal{R} \subset 2^S$, of admissible fragments; later on, these will be rectangles, wedges or triangles. A subset, $\mathcal{P} \subset 2^S$, is a partition if (a) the elements in $\mathcal{P}$ are mutually disjoint and (b) S is the union of all $P \in \mathcal{P}$. In the following, $\mathbf{P}$ is a fixed set of partitions, $\mathcal{P}$, such that $\mathcal{P} \subset \mathcal{R}$. We call elements of $\mathbf{P}$ admissible partitions.
For each fragment, $P \in \mathcal{R}$, we choose a finite dimensional linear space, $\mathbb{F}_P$, of real functions on S, which vanish off P. Examples are spaces of constant functions or polynomials of higher degree, set to 0 outside P. If $\mathcal{P} \in \mathbf{P}$ and $f_{\mathcal{P}} = (f_P)_{P \in \mathcal{P}}$ is a family of such functions, we also denote by $f_{\mathcal{P}}$ the function defined on all of S and whose restriction to P is equal to $f_P$ for each $P \in \mathcal{P}$. The pair, $(\mathcal{P}, f_{\mathcal{P}})$, is a segmentation, and each element $(P, f_P)$ is a segment. For each partition, $\mathcal{P}$, define the linear space, $\mathbb{F}_{\mathcal{P}} = \bigoplus_{P \in \mathcal{P}} \mathbb{F}_P$. A family of segmentations is called a segmentation class. In particular, for a given family of spaces, $\mathbb{F}_P$, $P \in \mathcal{R}$, we let:
$\mathbb{S}(\mathbf{P}) := \big\{ (\mathcal{P}, f_{\mathcal{P}}) : \mathcal{P} \in \mathbf{P},\ f_{\mathcal{P}} \in \mathbb{F}_{\mathcal{P}} \big\}$
with partitions in P and functions whose restrictions to P 𝒫 are in P . We will also use the notation, S n , for the discrete equivalent of S obtained from discrete partitions, 𝒫 n .

2.2. Complexity Penalized Least Squares Estimation

We want to produce appropriate discrete representations or estimates of the underlying function, f, on the basis of random data, Y, from the regression model (4). We are watching out for a segmentation that is in proper balance between fidelity to data and complexity.
We decide in advance on a class, S n , of (admissible) discrete segmentations, which should contain the desired representations. The segmentations, given data, Y n , are scored by the functional:
$H_\gamma^n : \mathbb{S}_n \times \mathbb{R}^{S_n} \to \mathbb{R}, \qquad H_\gamma^n\big((\mathcal{P}, f_{\mathcal{P}}), Y^n\big) = \gamma\,|\mathcal{P}| + \|f_{\mathcal{P}} - Y^n\|^2$
with $\gamma \ge 0$ and $|\mathcal{P}|$ being the cardinality of $\mathcal{P}$. The symbol, $\|\cdot\|$, denotes the $\ell^2$-norm on $\mathbb{R}^{S_n}$. The last term measures fidelity to data. The other term is a rough measure of overall smoothness. As estimators for f, given data, Y, we choose minimizers, $(\hat{\mathcal{P}}^n, \hat f^n)$, of (7). Note that both $\hat{\mathcal{P}}^n$ and $\hat f^n$ are random, since $Y^n$ is random.
The definition makes sense, since minimal points of (7) always exist. This can be easily verified by the reduction principle, which relies on the decomposition:
$\min_{\mathcal{P} \in \mathbf{P}_n,\ f_{\mathcal{P}} \in \mathbb{F}_{\mathcal{P}}} H_\gamma^n\big((\mathcal{P}, f_{\mathcal{P}}), Y\big) = \min_{\mathcal{P} \in \mathbf{P}_n} \Big( \gamma\,|\mathcal{P}| + \min_{f_{\mathcal{P}} \in \mathbb{F}_{\mathcal{P}}} \|f_{\mathcal{P}} - Y\|^2 \Big)$
Given $\mathcal{P}$, the inner minimization problem has as its unique solution the orthogonal projection, $\hat f_{\mathcal{P}}^n$, of Y onto $\mathbb{F}_{\mathcal{P}} = \bigoplus_{P \in \mathcal{P}} \mathbb{F}_P$. The outer minimization problem runs over a finite set, and hence, a minimum of (7) exists. Let us pick one of the minimal points, $\hat f^n$.
The reduction principle connects complexity penalized least squares to model choice techniques. The right side can be interpreted as a criterion for the choice of a partition, P . The chosen partition determines the linear space, F P , which is a linear model for data. Based on this model, the estimate is finally derived.
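The reduction principle translates directly into a brute-force algorithm once the list of admissible partitions is small enough to be enumerated. The following Python sketch is a minimal illustration under that assumption, restricted to piecewise constant fits; the toy signal and candidate partitions are made up for the example and are not taken from the paper.

```python
import numpy as np

def penalized_fit(y, partitions, gamma):
    """Minimise gamma * |P| + ||f_P - y||^2 by the reduction principle:
    the inner minimisation is the orthogonal projection onto the span of the
    cell indicators (here: cell-wise means), the outer minimisation scans the
    finite list of admissible partitions."""
    best = (np.inf, None, None)
    for partition in partitions:              # outer minimisation over P_n
        fit = np.empty_like(y, dtype=float)
        for cell in partition:                # inner minimisation: projection
            fit[cell] = y[cell].mean()        # L2-projection onto constants on the cell
        score = gamma * len(partition) + np.sum((y - fit) ** 2)
        if score < best[0]:
            best = (score, partition, fit)
    return best

# toy example: a 1D signal and two candidate interval partitions
y = np.array([0.1, -0.2, 0.0, 3.9, 4.2, 4.1])
candidates = [[np.arange(6)], [np.arange(3), np.arange(3, 6)]]
score, part, f_hat = penalized_fit(y, candidates, gamma=1.0)
```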

3. Noise and Its Projections

For consistency, resolutions at infinitely many levels are considered simultaneously. Frequently, segmentations are not defined for all n N , but only for a cofinal subset of N . Typical examples are all dyadic quad-tree partitions or dyadic wedgelet segmentations, where only indices of the form, n = 2 p , appear. Therefore, we adopt the following convention:
The symbol, $M$, denotes any infinite subset of $\mathbb{N}$ endowed with the natural order, ≤. $(M, \le)$ is a totally ordered set, and we may consider nets, $(x_n)_{n \in M}$. For example, $x_n \to x$, $n \in M$, means that $x_n$ converges to x along $M$. We deal similarly with notions like lim sup, etc. Plainly, we might resort to subsequences instead, but this would cause a change of indices, which is notationally inconvenient.

3.1. Sub-Gaussian Noise and a Tail Estimate

We introduce now the main hypotheses on noise accompanied by a brief discussion. The core of the arguments in later sections is the tail Estimate Equation (9) below.
As Theorem 2 will show, the appropriate framework is provided by sub-Gaussian random variables. A random variable, ξ, enjoys this property if one of the following conditions is fulfilled:
Theorem 1 
The following two conditions on a random variable ξ are equivalent:
(a)
There is a R , such that:
$E\big( \exp(t\xi) \big) \le \exp\big( a^2 t^2 / 2 \big) \quad \text{for every } t > 0$
(b)
ξ is centered and majorized in distribution by some centered Gaussian variable, η, i.e.:
$\text{there is } c_0 \ge 0, \text{ such that } P(|\xi| \ge c) \le P(|\eta| \ge c) \quad \text{for all } c > c_0$
This and most other facts about sub-Gaussian variables quoted in this paper are verified in the first few sections of the monograph, [29]; one may also consult [30], Section 3.4.
The definition in (a) was given in the celebrated paper [31], which uses the term generalized Gaussian variables. The closely related concept of semi-Gaussian variables, which requires symmetry of ξ, seems to go back to [32].
The class of all sub-Gaussian random variables living on a common probability space, $(\Omega, \mathcal{A}, P)$, is denoted by $\mathrm{Sub}(\Omega)$. The sub-Gaussian standard of a variable is the number:
$\tau(\xi) = \inf\{ a \ge 0 : a \text{ is feasible in (8)} \}$
The infimum is attained and, hence, is a minimum. $\mathrm{Sub}(\Omega)$ is a linear space, and τ is a norm on $\mathrm{Sub}(\Omega)$ if variables differing on a null-set only are identified. Equipped with the norm, τ, $\mathrm{Sub}(\Omega)$ is a Banach space. It is important to note that $\mathrm{Sub}(\Omega)$ is strictly contained in all spaces, $L_0^p(\Omega)$, $p \ge 1$, the spaces of all centered variables with finite p-th order absolute moments.
Remark 1 
The most prominent sub-Gaussians are centered Gaussian variables, η, with standard deviation, σ, and $\tau(\eta) = \sigma$. For them, Inequality (8) is an equality with $a = \sigma$. The specific characteristic of sub-Gaussian variables is that their tails are lighter than those of Gaussians, as expressed in (b) of Theorem 1.
The following theorem is essential in the present context:
Theorem 2 
For each n M , suppose that the variables, ξ s n , s S n , are independent. Then:
(a)
Suppose that there is a real number, β > 0 , such that for each n M and real numbers, μ s , s S n , and each c R + , the inequality:
$P\Big( \Big| \sum_{s \in S_n} \mu_s\, \xi_s^n \Big| \ge c \Big) \le 2 \cdot \exp\Big( -\frac{c^2}{\beta \sum_{s \in S_n} \mu_s^2} \Big)$
holds. Then, all variables ξ s n are sub-Gaussian with a common scale factor, β.
(b)
Let all variables, ξ s n , be sub-Gaussian. Suppose further that:
$\beta = 2 \cdot \sup\big\{ \tau^2(\xi_s^n) : n \in M,\ s \in S_n \big\} < \infty$
Then, (a) is fulfilled with this factor, β.
This is probably folklore, and we skip the proof. A detailed proof can be found in the extended version [33].
Remark 2 
For white Gaussian noise, one has τ ( ξ s n ) = σ and, hence, β = 2 σ 2 .

3.2. Noise Projections

In this section, we quantify projections of noise. Choose for each $n \in M$ a class, $\mathcal{R}_n \subset 2^{S_n}$, of admissible fragments over $S_n$ and a set, $\mathbf{P}_n$, of admissible partitions consisting of such fragments. As previously, for each $P \in \mathcal{R}_n$, a linear function space, $\mathbb{F}_P$, is given. We shall denote the orthogonal $L^2$-projection onto the linear space, $\mathbb{F}_{\mathcal{P}} = \bigoplus_{P \in \mathcal{P}} \mathbb{F}_P$, by $\pi_{\mathcal{P}}$.
The following result provides almost sure L 2 -estimates for the projections of noise to these spaces, as there are more and more admissible segments.
Proposition 1 
Suppose that $\dim \mathbb{F}_P \le D$ for all $n \in M$ and each $P \in \mathcal{R}_n$. Assume in addition that there is a number, $M > 0$, such that for some $\kappa > 0$:
$|\mathcal{R}_n| \ge M \cdot n^{\kappa} \quad \text{eventually}$
Let γ > 0 . Then, we have:
$P\Big( \text{there is } \mathcal{P}_n \in \mathbf{P}_n \text{ with } \|\pi_{\mathcal{P}_n} \xi^n\|^2 > (\gamma/\kappa + 1)\,\beta D\,|\mathcal{P}_n|\,\ln(|\mathcal{R}_n|) \Big) \le 2 D\, M^{-\gamma/\kappa}\, n^{-\gamma}$
Moreover, for $\gamma > 1$ and for almost all ω, there is $n_0(\omega) \in M$, such that for $n \ge n_0$ and for all $\mathcal{P}_n \in \mathbf{P}_n$:
$\|\pi_{\mathcal{P}_n} \xi^n(\omega)\|^2 \le (\gamma/\kappa + 1)\,\beta D\,|\mathcal{P}_n|\,\ln(|\mathcal{R}_n|)$
This will be proven at a more abstract level. No structure of the finite sets, S n , is required. Nevertheless, we adopt all definitions from Section 1, mutatis mutandis. All Euclidean spaces, R k , will be endowed with their natural inner products, · , · , and respective norms. The projection onto the linear subspace will be denoted by π .
Theorem 3 
Suppose that the noise variables, ξ s n , fulfill (9) accordingly. Consider finite nonempty collections, H n , of linear subspaces in R S n , and assume that the dimensions of all subspaces, H n , n M , are uniformly bounded by some number, D N . Assume in addition that there is a number, M > 0 , such that for some κ > 0 :
$|\mathcal{H}_n| \ge M \cdot n^{\kappa} \quad \text{eventually}$
Let γ > 0 . Then, we have
$P\Big( \text{there is } \mathbb{H} \in \mathcal{H}_n \text{ with } \|\pi_{\mathbb{H}} \xi^n\|^2 \ge (\gamma/\kappa + 1)\,\beta D\,\ln(|\mathcal{H}_n|) \Big) \le 2 D\, M^{-\gamma/\kappa}\, n^{-\gamma}$
Note that $\|\cdot\|$ is the Euclidean norm in the spaces, $\mathbb{R}^{S_n}$, since each $\xi^n(\omega)$ is simply a vector. The assumption in the theorem can be reformulated as $|\mathcal{H}_n|^{-1} = O(n^{-\kappa})$.
Proof. 
Choose $n \in M$ and $\mathbb{H} \in \mathcal{H}_n$ with $\dim \mathbb{H} = d_n$. Let $e_i$, $1 \le i \le d_n$, be some orthonormal basis of $\mathbb{H}$. Observe that for any real number, $c > 0$:
$\sum_{i=1}^{d_n} |\langle \xi^n(\omega), e_i \rangle|^2 > c^2 \ln|\mathcal{H}_n|$
implies that:
$|\langle \xi^n(\omega), e_i \rangle|^2 > \frac{c^2}{d_n} \ln|\mathcal{H}_n| \quad \text{for at least one } i = 1, \ldots, d_n$
We derive a series of inequalities:
$P\big( \|\pi_{\mathbb{H}} \xi^n\|^2 > c^2 \ln|\mathcal{H}_n| \big) = P\Big( \sum_{i=1}^{d_n} |\langle \xi^n, e_i\rangle|^2 > c^2 \ln|\mathcal{H}_n| \Big) \le P\Big( \bigcup_{i=1}^{d_n} \Big\{ |\langle \xi^n, e_i\rangle|^2 > \frac{c^2}{d_n} \ln|\mathcal{H}_n| \Big\} \Big) \le \sum_{i=1}^{d_n} P\Big( |\langle \xi^n, e_i\rangle|^2 > \frac{c^2}{d_n} \ln|\mathcal{H}_n| \Big) = \sum_{i=1}^{d_n} P\Big( \Big| \sum_{s \in S_n} \xi_s^n\, e_{i,s} \Big| > c\,\big( \ln|\mathcal{H}_n| / d_n \big)^{1/2} \Big)$
where the first inequality follows from the introductory implication. By (9), we may continue with:
$\le 2 \cdot d_n \exp\Big( -\frac{c^2 \ln|\mathcal{H}_n|}{\beta\, d_n \sum_{s \in S_n} e_{i,s}^2} \Big) \le 2 \cdot D \cdot |\mathcal{H}_n|^{-\frac{c^2}{\beta D}}$
This implies that:
$P\Big( \text{there is } \mathbb{H} \in \mathcal{H}_n \text{ with } \|\pi_{\mathbb{H}} \xi^n\|^2 \ge c^2 \ln(|\mathcal{H}_n|) \Big) \le \sum_{\mathbb{H} \in \mathcal{H}_n} P\big( \|\pi_{\mathbb{H}} \xi^n\|^2 > c^2 \ln|\mathcal{H}_n| \big) \le 2 D\, |\mathcal{H}_n| \cdot |\mathcal{H}_n|^{-\frac{c^2}{\beta D}} = 2 D\, |\mathcal{H}_n|^{1 - \frac{c^2}{\beta D}} \le 2 D \Big( \frac{1}{M \cdot n^{\kappa}} \Big)^{\frac{c^2}{\beta D} - 1} = 2 D \cdot M^{1 - c^2/(\beta D)}\, n^{-\kappa\left( \frac{c^2}{\beta D} - 1 \right)}$
Let $\gamma > 0$. For $c^2 = (\gamma/\kappa + 1)\,\beta D$:
$P\Big( \text{there is } \mathbb{H} \in \mathcal{H}_n \text{ with } \|\pi_{\mathbb{H}} \xi^n\|^2 \ge (\gamma/\kappa + 1)\,\beta D\,\ln(|\mathcal{H}_n|) \Big) \le 2 D\, M^{-\gamma/\kappa}\, n^{-\gamma}$
and the assertion is proven.
Now, let us prove Proposition 1:
Proof of Proposition 1. 
We apply Theorem 3 to the collections, $\mathcal{H}_n = \{ \mathbb{F}_R : R \in \mathcal{R}_n \}$. Then, $|\mathcal{H}_n| = |\mathcal{R}_n|$. Since for each $\mathcal{P}_n \in \mathbf{P}_n$, the spaces, $\mathbb{F}_P$, $P \in \mathcal{P}_n$, are mutually orthogonal, one has for $z \in \mathbb{R}^{S_n}$ that:
$\|\pi_{\mathcal{P}_n} z\|^2 = \sum_{P \in \mathcal{P}_n} \|\pi_P\, z\|^2$
In particular, we have:
$P\Big( \sum_{P \in \mathcal{P}_n} \|\pi_P\, \xi^n\|^2 > \big( \tfrac{\gamma}{\kappa} + 1 \big)\beta D\,\ln(|\mathcal{H}_n|)\,|\mathcal{P}_n| \Big) \le P\Big( \text{there is } P \in \mathcal{P}_n \text{ with } \|\pi_P\, \xi^n\|^2 > \big( \tfrac{\gamma}{\kappa} + 1 \big)\beta D\,\ln(|\mathcal{H}_n|) \Big)$
Applying Theorem 3 to the latter inequality proves (11). Moreover, for $\gamma > 1$, we observe that the right hand side of (11) has a finite sum over n. Thus, the Borel-Cantelli lemma yields:
$P\Big( \|\pi_{\mathcal{P}_n}\xi^n\|^2 > \big( \tfrac{\gamma}{\kappa} + 1 \big)\beta D\,|\mathcal{P}_n|\,\ln|\mathcal{R}_n| \ \text{ for only finitely many } (n, \mathcal{P}_n) \text{ with } \mathcal{P}_n \in \mathbf{P}_n \Big) = 1$
This implies (12).
Let us finally illustrate the above concept in the classical case of Gaussian white noise.
Remark 3 
Continuing from Remark 2, we illustrate the behavior of the lower bound for the constant, C, in Proposition 1 and Theorem 3, in the case of white Gaussian noise and polynomially growing number of fragments, i.e., | n | is asymptotically equivalent to n κ . In this case, the estimate for the norm of noise projections takes the form, for almost all ω and for n n 0 :
$\|\pi_{\mathcal{P}_n}\xi^n(\omega)\|^2 \le \Big( \frac{1}{\kappa} + 1 \Big)\,\kappa\, 2\sigma^2 D\,|\mathcal{P}_n|\,\ln n = (1 + \kappa)\, 2\sigma^2 D\,|\mathcal{P}_n|\,\ln n$
This underlines the dependency between the noise projections, the number of fragments, the noise variance, the dimension of the regression spaces and the size of the partitions.
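The logarithmic inflation of the noise projections can be observed numerically. The following Python sketch is a simulation for illustration only; the subspace dimension D, the number m of subspaces and the constant in the threshold are arbitrary choices, not prescribed by the theory above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, m, sigma = 4096, 3, 2000, 1.0          # signal size, subspace dimension, #subspaces

xi = rng.normal(0.0, sigma, size=N)          # white Gaussian noise, tau(xi_s) = sigma

max_proj = 0.0
for _ in range(m):
    Q, _ = np.linalg.qr(rng.normal(size=(N, D)))   # random D-dimensional subspace
    max_proj = max(max_proj, np.sum((Q.T @ xi) ** 2))

beta = 2 * sigma ** 2                        # as in Remark 2
threshold = 2 * beta * D * np.log(m)         # C * beta * D * ln|H_n| with the ad hoc choice C = 2
print(f"max ||pi_H xi||^2 = {max_proj:.2f}, threshold = {threshold:.2f}")
```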

3.3. Discrete and Continuous Functionals

We want to approximate functions, f, on the continuous domain, S = [ 0 , 1 ) d , by estimates on discrete finite grids, S n . The connections between the two settings are provided by the maps, ι n and δ n , introduced in (5) and (6). Note first that:
$\langle \iota_n x, \iota_n y \rangle = \langle x, y \rangle / |S_n| \quad \text{and} \quad \|\iota_n x\|^2 = \|x\|^2 / |S_n| \qquad \text{for } x, y \in \mathbb{R}^{S_n}$
where the inner product and norm on the respective left-hand sides are those on $L^2(S)$, and on the right-hand sides, one has the Euclidean inner product and norm. Furthermore, one needs appropriate versions of the Functionals (7). Let now $\mathbb{S}_n$ be segmentation classes on the domains, $S_n$, and $\mathbb{S} \supseteq \bigcup_n \iota_n \mathbb{S}_n$ a segmentation class on S. Set:
$H_\gamma^n : \mathbb{R}^{S_n} \times \mathbb{S}_n \to \mathbb{R}, \qquad H_\gamma^n\big( z, (\mathcal{P}^n, g^n_{\mathcal{P}^n}) \big) = \gamma\,|\mathcal{P}^n| + \|z - g^n_{\mathcal{P}^n}\|^2 / |S_n|$
$\tilde H_\gamma^n : L^2(S) \times \mathbb{S} \to \mathbb{R} \cup \{\infty\}, \qquad \tilde H_\gamma^n\big( f, (\mathcal{P}, g_{\mathcal{P}}) \big) = \begin{cases} \gamma\,|\mathcal{P}| + \|f - g_{\mathcal{P}}\|^2 & \text{if } (\mathcal{P}, g_{\mathcal{P}}) \in \iota_n \mathbb{S}_n \\ \infty & \text{otherwise} \end{cases}$
The two functionals are compatible.
Proposition 2 
Let n M and ( 𝒫 n , g 𝒫 n ) S n and z n R S n . Then,
$H_\gamma^n\big( z^n, (\mathcal{P}^n, g^n_{\mathcal{P}^n}) \big) = \tilde H_\gamma^n\big( \iota_n z^n, \iota_n(\mathcal{P}^n, g^n_{\mathcal{P}^n}) \big)$
If, moreover, f L 2 ( S ) then:
$(\mathcal{P}^n, g^n_{\mathcal{P}^n}) \in \operatorname{argmin} H_\gamma^n(\delta_n f, \cdot\,) \iff \iota_n(\mathcal{P}^n, g^n_{\mathcal{P}^n}) \in \operatorname{argmin} \tilde H_\gamma^n(f, \cdot\,)$
Proof. 
The identity is an immediate consequence of (13). Hence, let us turn to the equivalence of minimal points. The key is a suitable decomposition of the functional, $\tilde H_\gamma^n(f, \cdot\,)$. The map, $\iota_n \delta_n$, is the orthogonal projection of $L^2(S)$ onto the linear space, $\mathcal{F}_n = \operatorname{span}\{ \mathbf{1}_{I_{ij}} : 1 \le i, j \le n \}$, and for any $(\mathcal{P}, h) \in \iota_n \mathbb{S}_n$, the function, h, is in $\mathcal{F}_n$. Hence:
$\|f - h\|^2 + \gamma\,|\mathcal{P}| = \|f - \iota_n\delta_n f\|^2 + \|\iota_n\delta_n f - h\|^2 + \gamma\,|\mathcal{P}|$
The quantity, f ι n δ n f 2 , does not depend on ( 𝒫 , h ) . Therefore, a pair, ( 𝒫 , h ) , minimizes
$\|f - \iota_n\delta_n f\|^2 + \|\iota_n\delta_n f - h\|^2 + \gamma\,|\mathcal{P}|$
if and only if it minimizes
$\|\iota_n\delta_n f - h\|^2 + \gamma\,|\mathcal{P}| = \tilde H_\gamma^n\big( \iota_n\delta_n f, (\mathcal{P}, h) \big)$
Setting $z^n = \delta_n f$ in the identity of Proposition 2, this completes the proof.

3.4. Upper Bound for Projective Segmentation Classes

We compute an upper bound for the estimation error in a special setting: choose in advance a finite dimensional linear subspace, $\mathcal{G}$, of $L^2(S)$. Discretization induces linear spaces, $\delta_n\mathcal{G} = \{ \delta_n f : f \in \mathcal{G} \}$ and $\mathcal{G}_P = \{ \mathbf{1}_P \cdot g : g \in \delta_n\mathcal{G} \}$, for any $P \subset S_n$, of functions on $S_n$. Let further, for each $n \in M$, a set, $\mathcal{R}_n$, of admissible fragments and a family, $\mathbf{P}_n$, of partitions with fragments in $\mathcal{R}_n$ be given. Set $G_n := \{ \mathcal{G}_{\mathcal{P}} : \mathcal{P} \in \mathbf{P}_n \}$. The induced segmentation class,
$\mathbb{S}_n(\mathbf{P}_n, G_n) = \big\{ (\mathcal{P}, f) : \mathcal{P} \in \mathbf{P}_n,\ f \in \mathcal{G}_{\mathcal{P}} \big\}$
will be called projective 𝒢-segmentation class at stage, n.
The following inequality is at the heart of later arguments since it controls the distance between the discrete M-estimates and the “true” signal. Note that this result is also central in the derivation of the results of the model selection theory.
Lemma 1 
Let for n M a 𝒢-projective segmentation class, S n , over S n be given and choose a signal, f L 2 ( S ) , and a vector, ξ n R S n . Let further
$(\hat{\mathcal{P}}^n, \hat f^n) \in \operatorname*{argmin}_{(\mathcal{Q}, h) \in \mathbb{S}_n} H_\gamma^n\big( \delta_n f + \xi^n, (\mathcal{Q}, h) \big)$
and ( 𝒬 , h ) S n . Then,
$\|\iota_n \hat f^n - f\|^2 \le 2\gamma\big( |\mathcal{Q}| - |\hat{\mathcal{P}}^n| \big) + 3\,\|\iota_n h - f\|^2 + \frac{16}{n^d}\Big( \|\pi_{\hat{\mathcal{P}}^n}\xi^n\|^2 + \|\pi_{\mathcal{Q}}\xi^n\|^2 \Big)$
Proof. 
Since ( 𝒫 ^ n , f ^ n ) is a minimal point of H γ n ( δ n f + ξ n , · ) , the embedded segmentation, ι n ( 𝒫 ^ n , f ^ n ) , is a minimal point of H ˜ γ n ( f + ι n ξ n , · ) by Proposition 2 and, hence:
$\gamma\,|\hat{\mathcal{P}}^n| + \|(\iota_n \hat f^n - f) - \iota_n\xi^n\|^2 \le \gamma\,|\mathcal{Q}| + \|(\iota_n h - f) - \iota_n\xi^n\|^2$
Expansion of squares yields that:
$\gamma\,|\hat{\mathcal{P}}^n| + \|\iota_n \hat f^n - f\|^2 - 2\,\langle \iota_n \hat f^n - f, \iota_n\xi^n\rangle + \|\iota_n\xi^n\|^2 \le \gamma\,|\mathcal{Q}| + \|\iota_n h - f\|^2 - 2\,\langle \iota_n h - f, \iota_n\xi^n\rangle + \|\iota_n\xi^n\|^2$
and hence:
$\|\iota_n \hat f^n - f\|^2 \le \gamma\big( |\mathcal{Q}| - |\hat{\mathcal{P}}^n| \big) + \|\iota_n h - f\|^2 + 2\,\langle \iota_n \hat f^n - \iota_n h, \iota_n\xi^n\rangle$
By definition, $h \in \mathcal{G}_{\mathcal{Q}}$ and $\hat f^n \in \mathcal{G}_{\hat{\mathcal{P}}^n}$, which implies that $\hat f^n - h \in \mathbb{V} := \operatorname{span}\big( \mathcal{G}_{\hat{\mathcal{P}}^n} \cup \mathcal{G}_{\mathcal{Q}} \big)$ and, hence, $\pi_{\mathbb{V}}(\hat f^n - h) = \hat f^n - h$. We proceed with:
$|\langle \iota_n h - \iota_n \hat f^n, \iota_n\xi^n\rangle| = |S_n|^{-1}\,|\langle \pi_{\mathbb{V}}(\hat f^n - h), \xi^n\rangle| = |S_n|^{-1}\,|\langle \hat f^n - h, \pi_{\mathbb{V}}\xi^n\rangle| \le \|\iota_n \hat f^n - \iota_n h\| \cdot |S_n|^{-1/2} \cdot \|\pi_{\mathbb{V}}\xi^n\| \le |S_n|^{-1/2}\,\|\pi_{\mathbb{V}}\xi^n\| \cdot \big( \|\iota_n \hat f^n - f\| + \|f - \iota_n h\| \big)$
Since $ab \le a^2 + b^2/4$, we conclude:
$|\langle \iota_n h - \iota_n \hat f^n, \iota_n\xi^n\rangle| \le \|\iota_n \hat f^n - f\|^2/4 + \|f - \iota_n h\|^2/4 + 2\,\|\pi_{\mathbb{V}}\xi^n\|^2/|S_n| \le \|\iota_n \hat f^n - f\|^2/4 + \|f - \iota_n h\|^2/4 + 4\,\big( \|\pi_{\hat{\mathcal{P}}^n}\xi^n\|^2 + \|\pi_{\mathcal{Q}}\xi^n\|^2 \big)/|S_n|$
Putting this into Inequality (15) results in:
$\|\iota_n \hat f^n - f\|^2 \le \gamma\big( |\mathcal{Q}| - |\hat{\mathcal{P}}^n| \big) + \|\iota_n h - f\|^2 + \|\iota_n \hat f^n - f\|^2/2 + \|f - \iota_n h\|^2/2 + 8\,\big( \|\pi_{\hat{\mathcal{P}}^n}\xi^n\|^2 + \|\pi_{\mathcal{Q}}\xi^n\|^2 \big)/|S_n|$
which implies the asserted inequality.

4. Consistency

In this section, we complete the abstract considerations and summarize the preliminary work in two theorems on consistency. The first one concerns the desired L 2 -convergence of estimates to the “truth”, and the second one provides convergence rates.

4.1. L 2 -Convergence

We will prove now that the estimates of image converge almost surely to the underlying true signal in L 2 ( S ) for almost all observations. We adopt the projective setting introduced in Section 3.4. Let us make some agreements in advance.
Hypothesis 1 
Assume that
(H1.1)
There are $\kappa > 0$ and $C > 0$, such that, eventually, $|\mathcal{R}_n| \ge C \cdot n^{\kappa}$,
(H1.2)
The random variables, ξ s n , are sub-Gaussian, such that
$\beta = 2 \cdot \sup\big\{ \tau^2(\xi_s^n) : n \in M,\ s \in S_n \big\} < \infty$
(H1.3)
The positive sequence, ( γ n ) n N , satisfies:
$\gamma_n \to 0 \quad \text{and} \quad \gamma_n > C\,\frac{D \cdot \ln|\mathcal{R}_n|}{|S_n|} \quad \text{for eventually all } n$
with $C = 4\beta(\kappa + 1)/\kappa$, and D is, as in Proposition 1, an upper bound for the dimension of the linear spaces, $\mathbb{F}_P$.
Remark. 
Let us briefly comment on these hypotheses. (H1.1) means that the number of fragments of which the models are built grows in n at least as rapidly as the (possibly fractional) monomial, $n^{\kappa}$. Observe that the central point here is to use the number of fragments rather than the number of models. Note further that, besides the wedgelets and triangulations discussed in Section 5, this setting includes many other partitioning schemes, for instance, smoothlets; see [34]. (H1.2) reflects the assumption that noise is uniformly sub-Gaussian, i.e., the tails are asymptotically comparable to those of Gaussians with uniform variance. (H1.3) expresses that the variational parameters, $\gamma_n$, should tend to zero, but at a controlled velocity: If $\gamma_n$ converges too fast to zero, the noise dominates, while too slow a convergence of $\gamma_n$ leads to a dominating approximation error. The constant which controls the decay of $\gamma_n$ depends on the maximal dimension of the local regression spaces, D, on the noise level, β, and on the polynomial growth rate, κ, of the number of fragments used in the model. Finally, note that the condition, $\gamma_n \cdot |S_n| / \ln n \to \infty$, implies the second part of (H1.3) by (H1.1). It was used, for example, in [8,15,16].
Given a signal, f L 2 ( S ) , we must assure that our setting actually allows for good approximations of f at all. If so, least squares estimates are consistent.
Theorem 4 
Assume that Hypothesis 1 holds. Let f L 2 ( S ) and suppose:
$\lim_{k \to \infty}\ \limsup_{n \to \infty}\ \inf_{(\mathcal{Q}, h) \in \mathbb{S}_n,\ |\mathcal{Q}| \le k} \|\iota_n h - f\|^2 = 0$
Then:
$\|\iota_n \hat f^n(\omega) - f\|^2 \to 0 \quad \text{as } n \to \infty, \text{ for almost all } \omega \in \Omega$
We formulate part of the proof separately, since it will be needed later once more.
Lemma 2 
We maintain the assumptions of Theorem 4. Then, given k > 0 , for almost all ω, there exists n 0 ( ω ) , such that for all n n 0 ( ω ) , and for all ( Q , h ) S n , such that | Q | k :
$\|\iota_n \hat f^n(\omega) - f\|^2 \le 3\,\gamma_n\,|\mathcal{Q}| + 3\,\|\iota_n h - f\|^2$
Proof. Lemma 1 yields:
$\|\iota_n \hat f^n(\omega) - f\|^2 \le 2\gamma_n\big( |\mathcal{Q}| - |\hat{\mathcal{P}}^n| \big) + 3\,\|\iota_n h - f\|^2 + \frac{16}{n^d}\Big( \|\pi_{\hat{\mathcal{P}}^n}\xi\|^2 + \|\pi_{\mathcal{Q}}\xi\|^2 \Big)$
and application of Proposition 1 implies that, for any real number, $C > \frac{\kappa+1}{\kappa}\,\beta D$, the following inequality holds for almost all $\omega \in \Omega$:
$\|\iota_n \hat f^n(\omega) - f\|^2 \le 2\gamma_n k + 3\,\|\iota_n h - f\|^2 + \frac{16\,C \ln(|\mathcal{R}_n|)}{n^d}\,\big( |\mathcal{Q}| + |\hat{\mathcal{P}}^n| \big) - 2\gamma_n\,|\hat{\mathcal{P}}^n| \le 2\gamma_n k + 3\,\|\iota_n h - f\|^2 + \frac{16\,C \ln|\mathcal{R}_n|}{n^d}\,k + 2\,|\hat{\mathcal{P}}^n|\Big( \frac{8\,C \ln|\mathcal{R}_n|}{n^d} - \gamma_n \Big)$
For $\gamma_n$ satisfying Hypothesis (H1.3), the term in parentheses is negative. Therefore, (17) holds, and the assertion is proven.
Theorem 4 follows now easily.
Proof of Theorem 4. 
The following formulae hold almost surely. Lemma 2 implies that:
$\|\iota_n \hat f^n - f\|^2 \le 3\,\gamma_n\,k + 3\,\inf_{(\mathcal{Q}, h) \in \mathbb{S}_n,\ |\mathcal{Q}| \le k} \|\iota_n h - f\|^2 \quad \text{eventually}$
Therefore:
$\limsup_{n} \|\iota_n \hat f^n - f\|^2 \le \limsup_{n}\Big( 3\,\gamma_n\,k + 3\,\inf_{(\mathcal{Q}, h) \in \mathbb{S}_n,\ |\mathcal{Q}| \le k} \|\iota_n h - f\|^2 \Big) = 0 + 3\,\limsup_{n}\ \inf_{(\mathcal{Q}, h) \in \mathbb{S}_n,\ |\mathcal{Q}| \le k} \|\iota_n h - f\|^2$
By Assumption (16), the right-hand side converges to zero as k tends to . Hence:
$\limsup_{n} \|\iota_n \hat f^n - f\|^2 = 0$
which completes the proof.

4.2. Convergence Rates

The final abstract result provides almost sure convergence rates in the general setting.
Theorem 5 
Suppose that Hypothesis 1 holds, and assume further that there are real numbers, $\alpha, C > 0$, $\varrho \ge 0$, and a sequence, $(F_n)_{n \in \mathbb{N}}$, with $\lim_n F_n = +\infty$, such that:
$\|\iota_n h - f\| \le C \cdot \Big( \frac{k^{\varrho}}{F_n} + \frac{1}{k^{\alpha}} \Big)$
for all $n \in M$ and $k \in \mathbb{N}$, and some $(\mathcal{Q}, h) \in \mathbb{S}_n$ with $|\mathcal{Q}| \le k$.
Then:
$\|\iota_n \hat f^n(\omega) - f\|^2 = O\Big( \gamma_n^{\frac{2\alpha}{2\alpha+1}} \Big) + O\Big( F_n^{-\frac{2\alpha}{\alpha+\varrho}} \Big) \quad \text{for almost all } \omega \in \Omega$
Proof. 
Let ( k n ) n M be a sequence in R + . Recall from Lemma 2 that
$\|\iota_n \hat f^n - f\|^2 \le 3\,\gamma_n\,k_n + 3\,\|\iota_n h - f\|^2$
for sufficiently large n M and any ( 𝒬 , h ) S n with | 𝒬 | k n on a set of ω of full measure. The following arguments hold for all such ω. We will write C for constants; hence, the C below may differ. Since ( a + b ) 2 2 ( a 2 + b 2 ) , Assumption (18) implies that:
$\|\iota_n \hat f^n - f\|^2 \le C\,\Big( \gamma_n\,k_n + \frac{k_n^{2\varrho}}{F_n^2} + \frac{1}{k_n^{2\alpha}} \Big)$
This decomposition of the error into variance, discretization and bias can be interpreted as follows: The first term corresponds to an estimate of the error due to the noise, the second term corresponds to the discretization, while the third term can be directly related to the approximation error of the underlying scheme, in the continuous domain.
One has free choice of the parameters, $k_n$. We enforce the same decay rate for the first and third term, setting $\gamma_n k_n = k_n^{-2\alpha}$, i.e., $k_n = \gamma_n^{-1/(2\alpha+1)}$. Then, in view of (20):
$\|\iota_n \hat f^n - f\|^2 \le C\,\Big( \gamma_n^{\frac{2\alpha}{2\alpha+1}} + \frac{\gamma_n^{-\frac{2\varrho}{2\alpha+1}}}{F_n^2} \Big)$
To get the same rate for the discretization and the approximation error set:
$\frac{k_n^{2\varrho}}{F_n^2} = \frac{1}{k_n^{2\alpha}} \qquad \text{or equivalently} \qquad k_n = F_n^{\frac{1}{\varrho+\alpha}}$
which, together with Estimate (20), yields:
$\|\iota_n \hat f^n - f\|^2 \le C\,\Big( \gamma_n\,F_n^{\frac{1}{\varrho+\alpha}} + F_n^{-\frac{2\alpha}{\alpha+\varrho}} \Big)$
Straightforward calculation gives:
$\gamma_n^{\frac{2\alpha}{2\alpha+1}} \ge \frac{\gamma_n^{-\frac{2\varrho}{2\alpha+1}}}{F_n^2} \qquad \text{if and only if} \qquad \gamma_n\,F_n^{\frac{1}{\alpha+\varrho}} \ge F_n^{-\frac{2\alpha}{\alpha+\varrho}}$
Hence, the first term on the right-hand side of Inequality (21) dominates the second one if and only this holds in Inequality (22). We discriminate between the two cases, ≥ and <. The first one is:
$\gamma_n^{\frac{2\alpha}{2\alpha+1}} \ge \frac{\gamma_n^{-\frac{2\varrho}{2\alpha+1}}}{F_n^2}$
Combination with (21) results in:
$\|\iota_n \hat f^n - f\|^2 \le 2\,C \cdot \gamma_n^{\frac{2\alpha}{2\alpha+1}}$
for some C > 0 . In view of the equivalence, replacement of ≥ by < in (23), results in:
$\gamma_n\,F_n^{\frac{1}{\alpha+\varrho}} < F_n^{-\frac{2\alpha}{\alpha+\varrho}}$
which, together with Estimate (22) gives for some C > 0 that:
$\|\iota_n \hat f^n - f\|^2 \le C \cdot F_n^{-\frac{2\alpha}{\alpha+\varrho}}$
Combination of (25) and (24) completes the proof of (19).
Remark 4 
Let us continue from Remark 3. If $|\mathcal{R}_n| \asymp n^{\kappa}$ and noise is white Gaussian with $\beta = 2\sigma^2$, then Hypothesis (H1.3) boils down to
$\gamma_n \to 0 \quad \text{and} \quad \gamma_n > \frac{2\,(\kappa+1)\,\sigma^2\, D \cdot \ln n}{n^d}$
Setting ε n = σ / n d / 2 , the Estimate (19) then reads as follows. For almost all ω, there exists n 0 ( ω ) , such that for all n n 0 ,
$\|\iota_n \hat f^n(\omega) - f\|^2 = O\Big( \big( \varepsilon_n^2\, |\ln \varepsilon_n| \big)^{\frac{2\alpha}{2\alpha+1}} \Big)$
as long as the growth of F n is sufficient. In particular, we obtain almost sure convergence rates of the same order as those obtained by model selection with the same penalty, for the control of the L 2 -risk of the estimators, see, for instance, [18].

5. Special Segmentations

We are going to now exemplify the abstract Theorem 5 by way of typical partitions and spaces of functions. On the one hand, this extends a couple of already existing results, and on the other hand, it illustrates the wide range of possible applications.

5.1. One Dimensional Signals—Interval Partitions

Although the focus of this paper is on two or more dimensions, we start with one dimension. There are at least two reasons for that: to illustrate the abstract results by the (seemingly) most elementary example and to generalize results like some of those in [15,16] to classes of piecewise Sobolev functions.
To be definite, let $S_n = \{1, \ldots, n\}$ and let $\mathcal{R}_n = \{ [i, j] : 1 \le i \le j \le n \}$ be the discrete intervals serving as admissible fragments. Then, $\mathbf{P}_n$ is the collection of partitions of $S_n$ into intervals. Plainly, $|\mathcal{R}_n| = (n+1)n/2$ and $|\mathbf{P}_n| = 2^{n-1}$. We deal with approximation by local polynomials. To this end and in accordance with Section 3.4, we choose the finite dimensional linear subspace, $\mathcal{G}_p \subset L^2([0,1))$, of polynomials of maximal degree, p. The induced segmentation class, $\mathbb{S}_n$, consists of the piecewise polynomial functions relative to a partition in $\mathbf{P}_n$.
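For interval partitions, the minimizer of the penalized functional can be computed exactly by dynamic programming. The sketch below (Python, piecewise constant fits, i.e., p = 0; higher degrees would replace the mean by a local least squares fit) is a standard O(n²) implementation and is given only to illustrate the estimator used in this section, not as the authors' implementation.

```python
import numpy as np

def potts_1d(y, gamma):
    """Exact minimiser of gamma * |P| + sum_s (y_s - x_s)^2 over all interval
    partitions of {1, ..., n} with a constant fit on each interval."""
    n = len(y)
    csum = np.cumsum(np.r_[0.0, y])
    csum2 = np.cumsum(np.r_[0.0, y ** 2])

    def sse(i, j):                        # squared error of the best constant on y[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):             # best[j]: optimal value for y[:j]
        for i in range(j):
            cand = best[i] + gamma + sse(i, j)
            if cand < best[j]:
                best[j], last[j] = cand, i
    x, j = np.empty(n), n                 # backtrack the optimal partition
    while j > 0:
        i = last[j]
        x[i:j] = (csum[j] - csum[i]) / (j - i)
        j = i
    return x, best[n]

x_hat, value = potts_1d(np.array([0.1, 0.0, 0.2, 2.1, 1.9, 2.0]), gamma=0.5)
```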
The signals to be estimated will be members of the fractional Sobolev space, W α , 2 ( ( 0 , 1 ) ) , of the order, α > 0 . The main task is to verify Condition (18). Note that this class of functions is slightly larger than the classical Hölder spaces of the order, α, as usually treated. For results in the case of equidistant partitioning, we refer, for instance, to [35], Section 11.2.
For the following lemma, we adopt classical arguments from approximation theory.
Lemma 3 
For any $f \in W^{\alpha,2}((0,1))$, with $p < \alpha < p+1$, there is $C > 0$, such that for all $k \le n \in \mathbb{N}$, there is $(\mathcal{P}_k^n, h_k^n) \in \mathbb{S}_n$, such that $|\mathcal{P}_k^n| \le k$ and which satisfies:
$\|f - \iota_n h_k^n\| \le C \cdot \Big( \frac{1}{k^{\alpha}} + \frac{k}{n} \Big)$
For the proof, let us introduce the partitions, $\mathcal{I}_k = \{ [(i-1)/k,\, i/k) : i = 1, \ldots, k \}$, of $[0,1)$ into k intervals, each of length $1/k$.
Proof. 
Let f W α , 2 ( ( 0 , 1 ) ) . From classical approximation theory (see e.g., [36], Chapter 12, Theorem 2.4), we learn that there is C > 0 , such that there is a piecewise polynomial function, h k , of a degree at most p, such that:
$\|f - h_k\| \le \frac{C}{k^{\alpha}}$
For each i = 1 , , k , let h k , i denote the restriction of h k to I i = ( ( i 1 ) / k , i / k ) . We consult the Bramble-Hilbert lemma (for a version corresponding to our needs, we refer to Theorem 6.1 in [37]) and find C > 0 , such that:
$|f - h_{k,i}|_{W^{1,2}(I_i)} \le C \cdot |f|_{W^{1,2}(I_i)} \quad \text{for each } i = 1, \ldots, k$
This yields for some C > 0 , independent of k and n, that
$|h_{k,i}|_{W^{1,2}(I_i)} \le |f - h_{k,i}|_{W^{1,2}(I_i)} + |f|_{W^{1,2}(I_i)} \le C \cdot |f|_{W^{1,2}(I_i)} \quad \text{for all } i = 1, \ldots, k$
We turn now to the piecewise constant approximation on the partition, $\mathcal{I}_n$. We split $[0,1)$ into the union, $J_k^n$, of those intervals in $\mathcal{I}_n$ which do not contain knots, $i/k$, and the union, $K_k^n$, of those intervals in $\mathcal{I}_n$ which do contain knots, $i/k$. For $I \in \mathcal{I}_n$ with $I \subset J_k^n$, we have:
$|h_{k,i}|_{W^{1,2}(I)} \le C\,|f|_{W^{1,2}(I)} \quad \text{if and only if} \quad \|h_{k,i}'\|^2_{L^2(I)} \le C^2 \cdot \|f'\|^2_{L^2(I)}$
This implies:
$\sum_{I \subset J_k^n} \|h_{k}'\|^2_{L^2(I)} \le C^2 \sum_{I \subset J_k^n} \|f'\|^2_{L^2(I)} \le C^2\,\|f'\|^2_{L^2([0,1])}$
which in turn leads to:
$|h_k|^2_{W^{1,2}(J_k^n)} \le C^2\,|f|^2_{W^{1,2}((0,1))}$
Hence we are ready to conclude that for some constant, C > 0 :
$\|h_k - \iota_n\delta_n h_k\|_{L^2(J_k^n)} \le C/n$
For $I \in \mathcal{I}_n$ with $I \subset K_k^n$, we use the fact that $\|h_k\|_{L^\infty} \le C \cdot \|f\|_{L^\infty([0,1])}$ and deduce:
$\|h_k - \iota_n\delta_n h_k\|^2_{L^2(I)} \le C\,\|f\|_{L^\infty(I)}/n$
Summation over all intervals included in K k n results in:
$\|h_k - \iota_n\delta_n h_k\|^2_{L^2(K_k^n)} \le C \cdot k/n$
This yields for the entire interval [ 0 , 1 ) that
$\|f - \iota_n\delta_n h_k\| \le \|f - h_k\| + \|h_k - \iota_n\delta_n h_k\| \le C\,\Big( \frac{k}{n} + \frac{1}{k^{\alpha}} \Big)$
with h k n = δ n h k . This completes the proof.
Piecewise smooth functions have only a very low Sobolev regularity. Indeed, recall that piecewise smooth functions belong to W α , 2 ( ( 0 , 1 ) ) only for α > 1 / 2 . In order to overcome this limitation, we consider a larger class of functions, the class of piecewise Sobolev functions.
Definition 1 
Let $\alpha > 1/2$ be a real number, $J \in \mathbb{N}$, and $x_0 = 0 < x_1 < \cdots < x_{J+1} = 1$. A function, f, is said to be piecewise $W^{\alpha,2}([0,1])$ with J jumps, relative to the partition, $\{ [x_i, x_{i+1}) : i = 0, \ldots, J \}$, if
$f|_{(x_i, x_{i+1})} \in W^{\alpha,2}\big( (x_i, x_{i+1}) \big) \quad \text{for each } i = 0, \ldots, J$
Remark 5 
Definition 1 is consistent, due to the Sobolev embedding theorem. For an open interval, I, of $\mathbb{R}$, $W^{\alpha,2}(I)$ is continuously embedded into $\mathcal{C}(\bar I)$, the space of uniformly continuous functions on the closure, $\bar I$, of I.
We conclude from Lemma 3:
Lemma 4 Let f be piecewise $W^{\alpha,2}([0,1))$ with J jumps and with $p < \alpha < p+1$. Then, there are $C > 0$ and $(\mathcal{P}_k^n, h_k^n) \in \mathbb{S}_n$, such that $|\mathcal{P}_k^n| \le k$ and:
$\|f - \iota_n h_k^n\| \le C \cdot \Big( \frac{1}{k^{\alpha}} + \frac{k}{n} + \frac{J}{n} \Big)$
Proof. With the same arguments as in the proof of Lemma 3, we just have to incorporate the error made at each jump of the original piecewise regular function. More precisely, we use a similar splitting into $J_k^n$ and $K_k^n$, where $K_k^n$ also contains the intervals containing $x_i$ for $i = 1, \ldots, J$. Since there are at most $k + J$ intervals in $K_k^n$, this gives Estimate (28).
By Lemma 4, a piecewise Sobolev function satisfies Condition (18) with ρ = 1 and F n = n , and therefore, Theorem 5 applies. In summary:
Theorem 6 Let α ( 0 , p + 1 ) , where p is the maximal degree of the approximating polynomials, and let f be piecewise, W α , 2 ( [ 0 , 1 ] ) . We assume further that (H1.3) holds and that the noise variables, ξ s n , from Section 2.1 satisfy (9). Then:
$\|\iota_n \hat f^n(\omega) - f\|^2 = O\Big( \gamma_n^{\frac{2\alpha}{2\alpha+1}} \Big) \quad \text{for almost all } \omega \in \Omega$
Proof. Let us check the assumptions in Theorem 5. Since $|\mathcal{R}_n| = (n+1)n/2$, Hypothesis (H1.1) holds with $\kappa = 2$. Hypotheses (H1.2) and (H1.3) were required separately. Finally, Condition (18) holds with $\varrho = 1$ and $F_n = n$ by Lemma 4, so Theorem 5 yields the assertion.
Let $\mathcal{C}^1([0,1])$ denote the set of continuously differentiable functions. For $p \in \mathbb{N}$, $\alpha \in (p, p+1]$, a function, $f \in \mathcal{C}^p([0,1])$, is said to be α-Hölder if there is $C > 0$, such that
$|f^{(p)}(x) - f^{(p)}(y)| \le C\,|x - y|^{\alpha - p} \quad \text{for any } x, y \in [0,1],\ x \ne y$
The linear space of α-Hölder functions will be denoted by $\mathcal{C}^{\alpha}([0,1])$ if $\alpha \notin \mathbb{N}$ and by $\mathcal{C}^{\alpha-1,1}([0,1])$ if $\alpha \in \mathbb{N}$.
Remark. Choose γ n = C ln n / n with large enough C, independently of f. Then, the almost sure Estimates (29) of the estimation error simplify to:
$\|\iota_n \hat f^n(\omega) - f\|^2 = O\Big( \Big( \frac{\ln n}{n} \Big)^{\frac{2\alpha}{2\alpha+1}} \Big) \quad \text{for almost all } \omega \in \Omega$
These convergence rates are, up to the logarithmic factor, the optimal rates for mean square error in the Hölder classes 𝒞 α ( [ 0 , 1 ] ) . Thus, our estimate automatically adapts to all smoothness, α < p + 1 , of the signal.

5.2. Wedgelet Partitions

Wedgelet decompositions are content-adapted partitioning methods based on elementary geometric atoms, called wedgelets. A wedge results from the splitting of a square into two pieces by a straight line, and in our setting, a wedgelet will be a piecewise polynomial function over a wedge partition. The discrete setting requires a careful treatment. We adopt the discretization scheme from [17], which relies on the digitalization of lines from [38]. This discretization differs from that in [21], where all pairs of pixels on the boundary of a discrete square are used as endpoints of line segments. One of the main reasons for our special choice is an efficient algorithm, which returns exact solutions of the Functional (7). It relies on rapid moment computation, based on lookup tables [17].

5.2.1. Wedgelet Partitions

Let us first recall the relevant concepts and definitions. Only the case of dyadic wedgelet partitions will be discussed. Generalizations are straightforward, but technical.
We start from discrete dyadic squares, S m = { 1 , , m } 2 with m M = { 2 p : p N 0 } . Admissible fragments are dyadic squares of the form:
$\big[ (i-1)\cdot 2^{q},\ i\cdot 2^{q} \big) \times \big[ (j-1)\cdot 2^{q},\ j\cdot 2^{q} \big), \qquad 1 \le i, j \le 2^{p-q},\ 0 \le q \le p$
The collection of dyadic squares can be interpreted as the set of leaves of a quadtree, where each internal node has exactly four children obtained by subdividing one square into four.
Digital lines in $\mathbb{Z}^2$ are defined for angles $\vartheta \in (-\pi/4, 3\pi/4]$. Let:
$d(\vartheta) = \max\{ |\cos\vartheta|, |\sin\vartheta| \}, \qquad v(\vartheta) = \begin{cases} (-\sin\vartheta, \cos\vartheta) & \text{if } |\cos\vartheta| \ge |\sin\vartheta| \\ (\sin\vartheta, -\cos\vartheta) & \text{otherwise} \end{cases}$
The digital line through the origin in direction ϑ is defined as:
$L_\vartheta^0 = \big\{ s \in \mathbb{Z}^2 : -d(\vartheta)/2 < \langle s, v(\vartheta) \rangle \le d(\vartheta)/2 \big\}$
Lines parallel to L ϑ 0 are shifted versions:
$L_\vartheta^r = \big\{ s \in \mathbb{Z}^2 : (r - 1/2)\,d(\vartheta) < \langle s, v(\vartheta) \rangle \le (r + 1/2)\,d(\vartheta) \big\}$
with the line numbers, $r \in \mathbb{Z}$. One distinguishes between flat lines, where $|\cos\vartheta| \ge |\sin\vartheta|$, and steep lines, where $|\cos\vartheta| < |\sin\vartheta|$. For $x \in \mathbb{R}$, set $\mathrm{round}(x) = \max\{ i \in \mathbb{Z} : i \le x + 1/2 \}$, let $y_\vartheta(x) = \mathrm{round}(x \cdot \tan\vartheta)$ and $x_\vartheta(y) = \mathrm{round}(y \cdot \cot\vartheta)$. According to Lemma 2.7 in [17]:
$L_\vartheta^r = (0, r) + \{ (x, y_\vartheta(x)) : x \in \mathbb{Z} \} \quad \text{for flat lines}, \qquad L_\vartheta^r = (r, 0) + \{ (x_\vartheta(y), y) : y \in \mathbb{Z} \} \quad \text{for steep lines}$
By Lemma 2.8 in the same reference, all parallel lines partition $\mathbb{Z}^2$. We are now ready to define wedgelets. Let Q be a square in $\mathbb{Z}^2$ and $L_\vartheta^r$ a line with $L_\vartheta^r \cap Q \ne \emptyset$ and $L_\vartheta^{r+1} \cap Q \ne \emptyset$. A wedge split is a partition of Q into the lower and upper wedge, respectively, given by:
$W_\vartheta^{r,l} = \bigcup_{k \le r} L_\vartheta^k \cap Q, \qquad W_\vartheta^{r,u} = \bigcup_{k > r} L_\vartheta^k \cap Q$
Let 𝒬 be a partition of some domain, S m , into squares. Then, a wedge partition of S m is obtained, replacing some of these squares by the two wedges of a wedge split. It is called dyadic if m M , and the squares Q 𝒬 are dyadic.
We assume that a finite set, Θ, of angles is given. The set, $\mathcal{R}_m$, of admissible fragments consists of the wedges obtained by wedge splits of dyadic squares, given by (31) with $\vartheta \in \Theta$, together with the dyadic squares themselves.
Focus is on piecewise polynomial approximation of low order. The induced segmentation classes, S m , consist of piecewise polynomial functions relative to a wedgelet partition. The cases of piecewise constant (original wedgelets) and piecewise linear polynomials (platelets) will be treated explicitly.
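The digital lines and wedge splits of this subsection are straightforward to generate. The following Python sketch is an illustration only; the orientation convention chosen for the normal vector $v(\vartheta)$ is one of several possibilities and not necessarily the one of [17].

```python
import numpy as np

def wedge_split(n, theta, r):
    """Split the discrete square {0,...,n-1}^2 into the lower wedge
    (union of digital lines L_theta^k with k <= r) and the upper wedge (k > r)."""
    d = max(abs(np.cos(theta)), abs(np.sin(theta)))
    v = np.array([-np.sin(theta), np.cos(theta)])        # a unit normal of the line
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    proj = xs * v[0] + ys * v[1]                         # <s, v(theta)> for all pixels s
    lower = proj <= (r + 0.5) * d                        # slabs with line number k <= r
    return lower, ~lower

def digital_line(n, theta, r):
    """Membership mask of the single digital line L_theta^r inside the square."""
    d = max(abs(np.cos(theta)), abs(np.sin(theta)))
    v = np.array([-np.sin(theta), np.cos(theta)])
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    proj = xs * v[0] + ys * v[1]
    return ((r - 0.5) * d < proj) & (proj <= (r + 0.5) * d)

low, up = wedge_split(8, theta=np.pi / 6, r=2)
```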

5.2.2. Wedgelets and Approximations

We first recall some approximation results for wedgelets. They stem from [21,39]. Since we are not working with the same discretization, we rewrite them for the continuous setting and provide elementary self-contained proofs. The discussion of the discretization is postponed to Section 5.2.3. We start with the definition of horizon functions, like in [21].
Definition 2 (Horizon functions) Let α ( 1 , 2 ] and h 𝒞 α ( [ 0 , 1 ] ) if α < 2 or 𝒞 1 , 1 ( [ 0 , 1 ] ) if α = 2 . Let, further, f be a bivariate function, which is piecewise constant relative to the partition of [ 0 , 1 ] 2 in an upper and a lower part induced by h:
$f(x, y) = \begin{cases} c_1 & \text{if } y \le h(x) \\ c_2 & \text{if } y > h(x) \end{cases}$
with real numbers, c 1 and c 2 . Such a function is called an α-horizon function; the set of such functions will be denoted by H o r α ( [ 0 , 1 ] 2 ) . h is called the horizon boundary of f.
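Images such as those shown in Figure 2 below are easily reproduced. The following Python sketch is an illustration only; the particular boundary function and noise level are arbitrary choices and not the ones used for the figures.

```python
import numpy as np

def horizon_image(n, h, c1=0.0, c2=1.0):
    """Discretisation of a horizon function f (Definition 2) with boundary h,
    evaluated at the midpoints of the n x n cells for simplicity."""
    t = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(t, t, indexing="ij")
    return np.where(Y <= h(X), c1, c2)

# a boundary of roughly Hoelder smoothness alpha = 1.5 (a made-up example)
h = lambda x: 0.5 + 0.15 * np.sin(2 * np.pi * x) + 0.05 * np.abs(x - 0.5) ** 1.5
clean = horizon_image(256, h)
noisy = clean + 0.3 * np.random.default_rng(1).normal(size=clean.shape)
```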
Discretization at various levels of a typical horizon function is plotted in Figure 2, left column. In the right column, respective noisy versions are shown.
Figure 2. (Left) $\delta_n f$, for n = 64, 128, 256, respectively, where f is a horizon function, according to Definition 2. Here, the horizon boundary is in $\mathcal{C}^{\alpha}((0,1))$ and α = 1.5; (Right) Respective noisy images $\delta_n f + \xi^n$.
Lemma 5 Let $\alpha \in [1, 2]$ and $f \in \mathrm{Hor}^{\alpha}([0,1]^2)$ with boundary function, h. Then, there are $C, C' > 0$—independent of k—and, for each k, a continuous wedge partition, $\mathcal{W}_k$, of the unit square, $[0,1]^2$, such that $|\mathcal{W}_k| \le C' k$ and
$\|f - f_k\|_{L^2([0,1]^2)} \le \frac{C}{k^{\alpha/2}}$
where f k is the L 2 -projection of f on the space of piecewise constant functions relative to the wedge partition, 𝒲 k .
Proof. Let us first approximate the graph of h by linear pieces. We consider the uniform partition induced by x i = i / k . We denote by S k ( h ) the continuous linear spline interpolating, h, relatively to the uniform subdivision:
$S_k(h)(x) = h(x_i) + (x - x_i)\,\frac{h(x_{i+1}) - h(x_i)}{x_{i+1} - x_i} \qquad \text{for } i = 0, \ldots, k-1 \text{ and } x \in I_i$
where I i = [ x i , x i + 1 ] . Therefore, we have:
$h(x) - S_k(h)(x) = h(x) - h(x_i) - \frac{h(x_{i+1}) - h(x_i)}{x_{i+1} - x_i}\,(x - x_i) \qquad \text{for each } x \in I_i$
Since $h' \in \mathcal{C}^{0,\alpha-1}([0,1])$, there exists $C > 0$, such that:
$\Big| \frac{h(x_{i+1}) - h(x_i)}{x_{i+1} - x_i} - h'(x_i) \Big| \le C\,|x_{i+1} - x_i|^{\alpha-1} = \frac{C}{k^{\alpha-1}}$
This implies that:
$h(x) - S_k(h)(x) = h(x) - h(x_i) - \Big( h'(x_i) + O\Big( \frac{1}{k^{\alpha-1}} \Big) \Big)(x - x_i) \qquad \text{for } x \in I_i$
On the other hand:
$h(x) = h(x_i) + h'(x_i)(x - x_i) + O\big( |x - x_i|^{\alpha} \big)$
Hence, Equation (32) can be rewritten as:
$h(x) - S_k(h)(x) = O\big( |x - x_i|^{\alpha} \big) + O\Big( \frac{1}{k^{\alpha}} \Big)$
and there is a constant C > 0 (independent of k) such that:
$\|h - S_k(h)\|_{L^{\infty}([0,1])} \le \frac{C}{k^{\alpha}}$
Now, we will use this estimate to derive error bounds for the optimal wedge representation. As a piecewise approximation of f, we propose:
$f_k(x, y) = \begin{cases} c_1 & \text{if } y \le S_k(h)(x) \\ c_2 & \text{if } y > S_k(h)(x) \end{cases}$
We bound the error by the area between the horizon, h, and its piecewise affine reconstruction:
$\|f - f_k\|_{L^2([0,1]^2)} \le |c_1 - c_2|\,\Big( \int_0^1 |h(x) - S_k(h)(x)|\,dx \Big)^{1/2} \le |c_1 - c_2|\,\|h - S_k(h)\|_{L^{\infty}([0,1])}^{1/2} \le \frac{C}{k^{\alpha/2}}$
It remains to bound the size of the minimal continuous wedgelet partition, 𝒲 k , such that f k 𝒲 k . A proof is given in Lemma 4.3 in [21]; it uses h 𝒞 1 ( [ 0 , 1 ] ) .
Remark. For an arbitrary horizon function, the approximation rates obtained by non-linear wavelet approximation (with sufficiently smooth wavelets) cannot be better than
$\|f - f_k\|_{L^2([0,1]^2)} = O\Big( \frac{1}{k^{1/2}} \Big)$
where f k is the non-linear k-term wavelet approximation of f. This means that for such a function, the asymptotical behavior in terms of approximation rates is strictly better for wedgelet decompositions than for wavelet decompositions. For a discussion on this topic, see Section 1.3 in [22].
Piecewise constant wedgelet representations are limited by the degree, zero, of the regression polynomials on each wedge. This is reflected by the choice of the horizon functions, which are not only piecewise smooth, but even piecewise constant. A similar phenomenon arises also in the case of approximation by Haar wavelets.
R. Willett and R. Nowak [39] extended the regression model to piecewise linear functions on each leaf of the wedgelet partition and called the according representations platelets. This allows for an improved approximation rate for larger classes of piecewise smooth functions.
Let h be a function in 𝒞 ( [ 0 , 1 ] ) . We define the two subdomains, S + and S , respectively, as the hypograph and the epigraph of h restricted to ( 0 , 1 ) 2 . In other words:
$S^+ = \big\{ (x, y) \in (0,1)^2 : y > h(x) \big\}, \qquad S^- = \big\{ (x, y) \in (0,1)^2 : y < h(x) \big\}$
Let us introduce the following generalized class of horizon functions:
$\mathrm{Hor}_1^{\alpha}([0,1]^2) := \big\{ f : [0,1]^2 \to \mathbb{R} \,:\, f|_{S^+},\ f|_{S^-} \in \mathcal{C}^{\alpha}(S^{\pm}),\ h \in \mathcal{C}^{\alpha}([0,1]) \big\}$
The following result from [39] gives approximation rates by platelet approximations for $\mathrm{Hor}_1^{\alpha}$.
Proposition 3 Let $f \in \mathrm{Hor}_1^{\alpha}([0,1]^2)$ for $1 < \alpha \le 2$. Then, the k-term platelet approximation, $h_k$, satisfies:
$\|f - h_k\|_{L^2([0,1]^2)} = O\Big( \frac{1}{k^{\alpha/2}} \Big)$
Proof. A sketch of the proof is given by the following two steps: (1) The boundary between the two subdomains is approximated uniformly, like in [21]; (2) In the rest of the subdomains, we also use uniform approximation with dyadic cubes, together with the corresponding Hölder bounds. The partition generated consists of squares of sidelength at least O ( 1 / k 1 / 2 ) . There are at most O ( k ) such squares.

5.2.3. Wedgelets and Consistency

Now, we apply the continuous approximation results to the consistency problem of the wedgelet estimator based on the above discretization. Note that, due to our specific discretization, the arguments below differ from those in [21].
Two ingredients are needed: Pass over to a suitable discretization and bound the number of generated discrete wedgelet partitions polynomially in n, in order to apply the general consistency results. Let us first state a discrete approximation lemma:
Lemma 6 Let f be an α-horizon function in $\mathrm{Hor}_1^{\alpha}$ with $1 < \alpha < 2$. There is $C > 0$, such that for all $k \le n \in \mathbb{N}$, there is $(\mathcal{P}_k^n, h_k^n) \in \mathbb{S}_n$, such that $|\mathcal{P}_k^n| \le k$ and which satisfies:
$\|f - \iota_n h_k^n\| \le C \cdot \Big( \frac{1}{k^{\alpha/2}} + \frac{k^{1/2}}{n^{1/2}} \Big)$
Proof. The triangular inequality yields the following decomposition of the error:
$\|f - \iota_n\delta_n h_k\| \le \|f - h_k\| + \|h_k - \iota_n\delta_n h_k\|$
The first term may be approximated by (35), whereas the second term corresponds to the discretization. Let us estimate the error induced by discretization.
One just has to split [ 0 , 1 ) 2 into J n k , the union of those squares in 𝒬 n , which do not intersect the approximating wedge lines and K n k , the union of such squares meeting the approximating wedge lines. We obtain the following estimates:
$\|h_k - \iota_n\delta_n h_k\|^2_{L^2(Q)} \le \frac{C}{n^2} \quad \text{for any } Q \subset K_k^n \text{ and some constant } C > 0$
Since there are, at most, C k n such squares, for some constant, C , not depending on k and n, this implies that:
h k ι n δ n h k L 2 ( K n k ) 2 C k n n 2 = C n and h k ι n δ n h k L 2 ( J k n ) 2 C k n
where C > 0 is a constant. Taking h k n = δ n h k completes the proof.
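The following small Python check illustrates the discretization step of the proof for a single wedge ($k = 1$), assuming point sampling at pixel centres for $\delta_n$ and piecewise constant extension for $\iota_n$: the error is supported on the $O(n)$ pixels crossed by the wedge line, each contributing at most $C/n^2$, so that n times the squared error remains bounded. The concrete wedge and the oversampled evaluation grid are illustrative choices.

```python
import numpy as np

def wedge(x, y):
    # indicator of a half-plane: a single wedge, i.e. h_k with k = 1 (illustrative choice)
    return (y > 0.3 + 0.4 * x).astype(float)

for n in (64, 128, 256, 512):
    centres = (np.arange(n) + 0.5) / n                  # delta_n: sample at pixel centres
    Xc, Yc = np.meshgrid(centres, centres)
    pixelized = wedge(Xc, Yc)                           # iota_n delta_n h: constant on each pixel
    m = 4 * n                                           # 4x oversampled grid for the L2 error
    fine = (np.arange(m) + 0.5) / m
    Xf, Yf = np.meshgrid(fine, fine)
    rows = np.minimum((Yf * n).astype(int), n - 1)      # pixel containing each fine point
    cols = np.minimum((Xf * n).astype(int), n - 1)
    err2 = np.mean((wedge(Xf, Yf) - pixelized[rows, cols]) ** 2)
    print(f"n = {n:4d}   squared error ≈ {err2:.5f}   n * squared error ≈ {n * err2:.3f}")
```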
Finally, the following lemma provides an estimate of the number of fragments in $\mathcal{F}_n$.
Lemma 7 There is a constant, $C > 0$, such that for all $n \in M$, the number, $|\mathcal{F}_n|$, of fragments used to form the wedgelet partitions is bounded as follows:
$$|\mathcal{F}_n| \le C \cdot n^4$$
Proof. In a dyadic square of sidelength j, there are at most $j^4$ possible digital lines. For dyadic $n \in M$, one can write $n = 2^J$, and therefore, we have:
$$|\mathcal{F}_n| \le \sum_{i=0}^{J} 2^{2(J-i)} \cdot 2^{2\cdot 2i} = n^2 \sum_{i=0}^{J} 2^{2i} = n^2 \cdot \frac{2^{2J+2}-1}{2^2-1} \le C \cdot n^4 \quad \text{for some constant } C > 0$$
This completes the proof.
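As a numerical illustration of the counting argument, the following Python sketch enumerates the distinct digital segments joining boundary pixels of a $j \times j$ square, using a simple rounded parametrization as a stand-in for the Bresenham lines of [38], and sums the counts over all dyadic squares of an $n \times n$ image; the total stays well below the crude $C \cdot n^4$ bound of Lemma 7. The rasterization rule is an assumption made for illustration, not the exact fragment dictionary.

```python
from itertools import product

def digital_segment(p, q):
    # simplified digital segment between pixel centres p and q (stand-in for Bresenham [38])
    (x0, y0), (x1, y1) = p, q
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    return frozenset((round(x0 + t * (x1 - x0) / steps), round(y0 + t * (y1 - y0) / steps))
                     for t in range(steps + 1))

def lines_in_square(j):
    # distinct digital segments with both endpoints on the boundary of a j x j pixel square
    boundary = [(x, y) for x, y in product(range(j), repeat=2)
                if x in (0, j - 1) or y in (0, j - 1)]
    return len({digital_segment(p, q) for p in boundary for q in boundary})

J = 4
n = 2 ** J                                               # n x n image, n = 16
total = sum(2 ** (2 * (J - i)) * lines_in_square(2 ** i) for i in range(J + 1))
print(f"n = {n}: {total} digital segments over all dyadic squares, crude bound n^4 = {n ** 4}")
```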
Note that the discretization of the continuous approximation, $h_k$, leads to a wedgelet partition composed of fragments in $\mathcal{F}_n$. Therefore, combining Lemmas 6 and 7 yields:
Theorem 7 Let $\alpha \in (1,2)$, and let f be an $\alpha$-horizon function in $\mathrm{Hor}_1^{\alpha}([0,1]^2)$. Assume further that the noise is such that (H1.2) holds and suppose that the parameters, $\gamma_n$, satisfy (H1.3) with $\kappa = 4$. Then:
$$\left\|\hat{f}^{\,n}_{\gamma_n}(\omega) - f\right\|^2 = O\!\left(\gamma_n^{\frac{\alpha}{\alpha+1}}\right) + O\!\left(n^{-\frac{\alpha}{\alpha+1}}\right) \quad \text{for almost all } \omega \in \Omega$$
where $\hat{f}^{\,n}_{\gamma_n}$ is the wedgelet-platelet estimator.
Remark 6 Choosing $\gamma_n$ of the order $\ln n / n^2$, Estimate (37) reads:
$$\left\|\hat{f}^{\,n}_{\gamma_n}(\omega) - f\right\|^2 = O\!\left(\frac{(\ln n)^{\frac{2\alpha}{\alpha+1}}}{n^{\frac{2\alpha}{\alpha+1}}}\right) + O\!\left(\frac{1}{n^{\frac{\alpha}{\alpha+1}}}\right) \quad \text{for almost all } \omega \in \Omega$$
Whereas the first term on the right-hand side consists of the best compromise between approximation and noise removal, the second term on the right-hand side corresponds to the discretization error. Note that, in contrast to the 1D-case, the discretization error asymptotically dominates the first term. This is related to the piecewise constant nature of the discretization. In concrete applications, this may severely limit the actual quality of the estimation. Finally, note that, neglecting this discretization problem, the decay rate given by (38) is of the same order as the bounds for the decay rates of the risk, which would have been obtained by model selection.
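The following small Python computation makes the comparison in Remark 6 concrete: for $\gamma_n$ of the order $\ln n / n^2$, it evaluates the penalty-driven term $((\ln n)/n)^{2\alpha/(\alpha+1)}$ and the discretization term $n^{-\alpha/(\alpha+1)}$ of Estimate (38) for a few values of n, showing that the discretization term decays more slowly and eventually dominates. The chosen values of n and $\alpha$ are arbitrary.

```python
import math

alpha = 1.5
p = alpha / (alpha + 1.0)                                # the exponent alpha / (alpha + 1)
for n in (2 ** 8, 2 ** 12, 2 ** 16, 2 ** 20):
    penalty_term = (math.log(n) / n) ** (2.0 * p)        # ((ln n) / n)^(2 alpha / (alpha + 1))
    discretization_term = n ** (-p)                      # n^(-alpha / (alpha + 1))
    print(f"n = 2^{int(math.log2(n)):2d}:  penalty term = {penalty_term:.3e}, "
          f"discretization term = {discretization_term:.3e}, "
          f"ratio = {discretization_term / penalty_term:.1f}")
```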
In the left column of Figure 3, wedgelet estimators for a typical noisy horizon function are shown.
Figure 3. Estimators of the noisy images of Figure 2. (Left) piecewise linear wedgelet estimator; (Right) piecewise linear and continuous Delaunay estimators.

5.3. Triangulations

Adaptive triangulations have been used since the emergence of early finite element methods to approximate solutions of elliptic differential equations. They have also been used in the context of image approximation; we refer to [40] for an account of recent triangulation methods applied to image approximation. The idea of using discrete triangulations leading to partitions based on a polynomially growing number of triangles was proposed in [41] in the context of piecewise constant functions over triangulations. In the present example, we deal with a different approximation scheme, where the triangulations are Delaunay triangulations and where the approximating functions are continuous linear splines. One key ingredient is the use of recent approximation results [26], which show the asymptotic optimality of approximations based on Delaunay triangulations having at most n vertices. Due to this specific approximation context, a key feature of the consistency proof is a suitable discretization scheme, which still preserves the approximation property.

5.3.1. Continuous and Discrete Triangulations

Let us start with some definitions. We begin with triangulations in the continuous setting:
Definition 3 A conforming triangulation, $\mathcal{T}$, of the domain $[0,1]^2$ is a finite set of closed triangles, $T \subseteq [0,1]^2$, satisfying the following conditions:
(i)
The union of the triangles in $\mathcal{T}$ covers the domain $[0,1]^2$;
(ii)
For each pair, $T, T' \in \mathcal{T}$, of distinct triangles, the intersection of their interiors is empty;
(iii)
Any two distinct triangles in $\mathcal{T}$ intersect at most in one common vertex or along one common edge.
We denote the set of (conforming) triangulations by 𝒯 ( [ 0 , 1 ] 2 ) . We will use the term triangulations for conforming triangulations.
Accordingly, we define the following discrete sets, relative to the partitions $\mathcal{Q}_k = \{[(i-1)/k, i/k) \times [(j-1)/k, j/k) : i, j = 1, \dots, k\}$ of $[0,1)^2$ into $k^2$ squares, each of side length $1/k$.
For a , b [ 0 , 1 ] 2 , we denote by [ a , b ] the line segment with endpoints, a and b.
Definition 4 For a triangle, T [ 0 , 1 ] 2 , with vertices a , b and c, we define the following discrete sets:
(i)
For each $p \in \{a, b, c\}$, the square $Q \in \mathcal{Q}_n$ such that $p \in Q$ is called a discrete vertex of T;
(ii)
For each edge, $e \in \{[a,b], [b,c], [c,a]\}$, the set of squares $Q \in \mathcal{Q}_n$, such that $Q \cap e \neq \emptyset$ and Q is not a discrete vertex, is called a discrete (open) edge of the triangle T;
(iii)
The set of squares $Q \in \mathcal{Q}_n$, such that $Q \cap T \neq \emptyset$ and Q is neither a discrete vertex nor belongs to a discrete open edge, is called a discrete open triangle (a small illustrative sketch of this discretization follows the definition).
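A sampling-based Python sketch of Definition 4 follows: it marks the grid squares of $\mathcal{Q}_n$ containing the three vertices, the additional squares met by each edge, and the remaining squares met by the triangle. Sampling points along the edges and over the triangle is only an approximation of the exact intersection tests; the sampling densities and helper names are assumptions made for illustration.

```python
import numpy as np

def squares_containing(points, n):
    # indices (i, j) of grid squares [i/n, (i+1)/n) x [j/n, (j+1)/n) hit by the sample points
    idx = np.clip(np.floor(np.asarray(points, dtype=float) * n).astype(int), 0, n - 1)
    return set(map(tuple, idx))

def discretize_triangle(a, b, c, n, samples=4000):
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    vertices = squares_containing([a, b, c], n)                   # Definition 4 (i)
    t = np.linspace(0.0, 1.0, samples)[:, None]
    edges = set()
    for p, q in ((a, b), (b, c), (c, a)):
        edges |= squares_containing(p + t * (q - p), n)           # squares meeting an edge
    edges -= vertices                                             # Definition 4 (ii)
    u, v = np.meshgrid(np.linspace(0, 1, 300), np.linspace(0, 1, 300))
    inside = u + v <= 1.0                                         # barycentric sampling of T
    pts = a + u[inside, None] * (b - a) + v[inside, None] * (c - a)
    interior = squares_containing(pts, n) - vertices - edges      # Definition 4 (iii)
    return vertices, edges, interior

V, E, I = discretize_triangle((0.1, 0.1), (0.8, 0.2), (0.3, 0.9), n=32)
print(len(V), "discrete vertices,", len(E), "edge squares,", len(I), "interior squares")
```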

5.3.2. Piecewise Polynomial Functions on Triangulations

We take $S_n = \{1, \dots, n\}^2$, and the set of fragments, $\mathcal{F}_n$, is given as the set of discrete vertices, open edges and open triangles:
$$\mathcal{F}_n = S_n \,\cup\, \left\{ [a\,b] : a, b \in S_n \right\} \,\cup\, \left\{ [a\,b\,c] : a, b, c \in S_n \right\}$$
We then let $\mathcal{P}_n$ be the collection of partitions of $S_n$ into discrete triangles, obtained from continuous triangulations and assuming that there is a rule deciding to which triangle discrete open segments and discrete vertices belong. Each such discrete triangle is then the union of elementary digital sets in $\mathcal{F}_n$. We remark that $|\mathcal{F}_n| = n + n(n-1)/2 + n(n-1)(n-2)/6$ and, therefore, $|\mathcal{F}_n|$ is of the order $n^3/6$. As in the one-dimensional case described in Section 5.1, we choose the finite-dimensional linear subspace, $\Pi_p \subset L^2([0,1)^2)$, of polynomials of maximal degree p. The induced segmentation classes, $S_n(\mathcal{P}_n, F_n)$, consist of piecewise polynomial functions relative to partitions in $\mathcal{P}_n$.
We have the following approximation lemma:
Lemma 8 Let $f \in \mathcal{C}^{\alpha}([0,1]^2)$, with $p < \alpha < p+1$. There is $C > 0$, such that for all $k \le n \in \mathbb{N}$, there is $(\mathcal{P}_k^n, h_k^n) \in S_n$, such that $|\mathcal{P}_k^n| \le k$ and which satisfies:
$$\|f - \iota_n h_k^n\| \le C \cdot \left(\frac{1}{k^{\alpha/2}} + \frac{k^{1/2}}{n^{1/2}}\right)$$
Proof. We first use classical approximation theory, which tells us the existence of a function, $h_k : [0,1]^2 \to \mathbb{R}$, piecewise polynomial relative to a triangulation with k triangles and such that the error on the whole domain is bounded by:
$$\|f - h_k\| \le \frac{C}{k^{\alpha/2}}$$
As in the 1D-case, we split $[0,1)^2$ into the union, $J_n^k$, of those squares in $\mathcal{Q}_n$ which do not meet the continuous triangulation, and $K_n^k$, the set of those squares meeting the triangulation, i.e., which intersect some edge of the triangulation. For each small square $Q \in \mathcal{Q}_n$ with $Q \subset K_n^k$, the following estimate holds:
$$\|h_k - \iota_n \delta_n h_k\|^2_{L^2(Q)} \le \frac{C}{n^2} \quad \text{for any } Q \subset K_n^k \text{ and some constant } C > 0$$
and there are at most $3 \cdot 2kn$ such squares. Altogether, we obtain:
$$\|h_k - \iota_n \delta_n h_k\|_{L^2(K_n^k)} \le \frac{C k^{1/2}}{n^{1/2}}, \quad \text{for some constant } C > 0$$
Now, for each square $Q \in \mathcal{Q}_n$ with $Q \subset J_n^k$, an argument similar to that in the 1D-proof yields:
$$\|h_k - \iota_n \delta_n h_k\|_{L^2(J_n^k)} \le \frac{C}{n}$$
This completes the proof.
Due to Lemma 8, (18) is satisfied: A function in 𝒞 α satisfies (18) with ρ = 1 / 2 and F n = n 1 / 2 and, therefore, Theorem 5 applies.

5.3.3. Continuous Linear Splines

We turn now to the more subtle case of continuous linear splines on Delaunay triangulations. Anisotropic Delaunay triangulations have been recently applied successfully to the design of a full image compression/decompression scheme, [25,42]. Here, we investigate the behavior of such triangulation schemes in the context of image estimation.
To this end, we first introduce the associated function space in the continuous setting. We restrict the discussion to the case of piecewise affine functions, i.e., p = 1 :
Definition 5 Let T be a conforming triangulation of [ 0 , 1 ] 2 . Let
$$S^0_{\mathcal{T}} = \left\{ f \in \mathcal{C}\big([0,1]^2\big) \;:\; f|_T \in \Pi_1, \; T \in \mathcal{T} \right\}$$
be the set of piecewise affine and continuous functions on T .
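To illustrate the spline space $S^0_{\mathcal{T}}$, the following Python sketch builds a continuous, piecewise affine function on a Delaunay triangulation with k vertices using SciPy and measures its $L^2$ distance to a horizon-type test function. For simplicity, it interpolates f at the vertices instead of computing the $L^2$ projection appearing in Theorem 8 below, and the vertices are drawn at random rather than adapted anisotropically as in [26]; the test function, k and the evaluation grid are illustrative choices.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

def f(x, y):
    # horizon-type test function: two smooth pieces separated by a smooth edge (illustrative)
    return np.where(y > 0.5 + 0.2 * np.sin(2 * np.pi * x), x * y, 1.0 + 0.3 * x)

rng = np.random.default_rng(0)
k = 400
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
pts = np.vstack([corners, rng.random((k - 4, 2))])            # k vertices, corners included
tri = Delaunay(pts)                                           # Delaunay triangulation with k vertices
spline = LinearNDInterpolator(tri, f(pts[:, 0], pts[:, 1]))   # continuous, piecewise affine on tri

g = (np.arange(512) + 0.5) / 512                              # fine evaluation grid
X, Y = np.meshgrid(g, g)
err = np.sqrt(np.mean((spline(X, Y) - f(X, Y)) ** 2))
print(f"k = {k} vertices:  L2 error ≈ {err:.4f}")
```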
The following piecewise smooth functions generalize the horizon functions from (34):
Definition 6 Let α ( 1 , 2 ) and g 𝒞 α ( [ 0 , 1 ] ) . Let S + and S be two subdomains defined as in (33). A generalized α-horizon function is an element of the set,
$$\mathcal{H}^{\alpha,2}([0,1]^2) := \left\{ f \in L^2([0,1]^2) \;\middle|\; f|_{S_+},\, f|_{S_-} \in W^{\alpha,2}(S_\pm) \right\}$$
where W α , 2 ( S ± ) is the Sobolev space of regularity, α, relative to the L 2 -norm on S ± .
In order to obtain convergence rates of the triangulation-based estimators for this class of functions, we need the following recent result, Theorem 4 in [26]:
Theorem 8 Let f be an $\alpha$-horizon function in $\mathrm{Hor}_1^{\alpha}$, with $\alpha \in (1,2)$, such that $f|_{S_\pm} \in W^{\alpha,2}(S_\pm)$. Then, there is $C > 0$, such that for all $k \in \mathbb{N}$, there is a Delaunay triangulation $\mathcal{D}_k$ with k vertices and such that:
$$\left\|f - \pi_{S^0_{\mathcal{D}_k}} f\right\|_{L^2([0,1]^2)} \le \frac{C}{k^{\alpha/2}}$$
Using arguments as in the proof of Lemma 8, we obtain the following lemma:
Lemma 9 Let $f \in \mathcal{H}^{\alpha,2}([0,1]^2)$, with $1 < \alpha < 2$. There is $C > 0$, such that for all $k \le n \in \mathbb{N}$, there is $(\mathcal{P}_k^n, h_k^n)$, such that $\mathcal{P}_k^n \in \mathcal{P}_n$ is a discretization of a continuous Delaunay triangulation, $\mathcal{D}_k$, $|\mathcal{P}_k^n| \le k$, $h_k^n = \delta_n h_k$, where $h_k \in S^0_{\mathcal{D}_k}$, and which satisfies:
$$\|f - \iota_n h_k^n\| \le C \cdot \left(\frac{1}{k^{\alpha/2}} + \frac{k^{1/2}}{n^{1/2}}\right)$$
The previous machinery cannot be applied directly without an explanation: since we are dealing with the space of continuous linear splines, our scheme is not properly a projective $\mathcal{F}$-segmentation class. However, for each fixed partition, $\mathcal{P} \in \mathcal{P}_n$, with elements in $\mathcal{F}_n$, the spline space $S^0_{\mathcal{T}}$ is a subspace of the space of piecewise polynomial functions relative to $\mathcal{P}$. Observe that all arguments in Lemma 1 remain valid if we replace these spaces by subspaces and also consider the minimization of the functional $H_{\gamma_n}$ over functions in these subspaces. We can thus apply Theorem 5 to obtain the equivalent of Theorem 6.
Theorem 9 Let $\alpha \in (1,2)$, and let f be a generalized horizon function in $\mathcal{H}^{\alpha,2}([0,1]^2)$. Let us further assume that the noise in (4) is such that (H1.2) holds. Assume further that $\gamma_n$ satisfies (H1.3), with $\kappa = 3$. Then:
$$\left\|\hat{f}^{\,n}_{\gamma_n}(\omega) - f\right\|^2 = O\!\left(\gamma_n^{\frac{\alpha}{\alpha+1}}\right) + O\!\left(n^{-\frac{\alpha}{\alpha+1}}\right) \quad \text{for almost all } \omega \in \Omega$$
where $\hat{f}^{\,n}_{\gamma_n}$ is the Delaunay estimator.
Proof. We check the assumptions in Theorem 5. Since $|\mathcal{F}_n|$ is of the order $(n^2)^3$, Hypothesis (H1.1) holds with $\kappa = 3$. Hypotheses (H1.2) and (H1.3) were required separately. Finally, (18) holds with $\varrho = 1/2$ and $F_n = n^{1/2}$ by Lemma 9. This completes the proof.
Remark 7 Similarly to Remark 6 and choosing $\gamma_n$ of the order $\ln n / n^2$, Estimate (40) reads:
$$\left\|\hat{f}^{\,n}_{\gamma_n}(\omega) - f\right\|^2 = O\!\left(\frac{(\ln n)^{\frac{2\alpha}{\alpha+1}}}{n^{\frac{2\alpha}{\alpha+1}}}\right) + O\!\left(\frac{1}{n^{\frac{\alpha}{\alpha+1}}}\right) \quad \text{for almost all } \omega \in \Omega$$
As in Remark 6, the first term on the right-hand side consists of the best compromise between approximation and noise removal, whereas the second term corresponds to the discretization error, which, in contrast to the 1D-case, asymptotically dominates. This is related to the piecewise constant nature of the discretization and, in concrete applications, may severely limit the actual quality of the estimation. Neglecting the discretization term, we have obtained almost sure estimates of the same asymptotic decay order as those that would have been obtained by model selection to control the risk of the estimators; these are the usual optimal rates for the function class under consideration.
In the right column of Figure 3, Delaunay triangulation estimators are shown for the same noisy horizon function as in the wedgelet case.
The rates in Theorem 9 are, up to a logarithmic factor, similar to the minimax rates obtained in [22] with curvelets for α = 2 and, more recently, in [24] with bandelets for general α. This is in contrast to isotropic approximation methods, e.g., shrinkage of tensor product wavelet coefficients, which only attain the rate for α = 1 .

6. Conclusion

In the last section, we have mainly discussed the application of our abstract consistency results to two partitioning schemes, wedgelets and Delaunay triangulations. Note that the classes of functions covered by our results are very simple prototypes for images dominated by geometric information. Recently, optimal decay rates of the wedgelet approximations were obtained for larger classes of piecewise regular images, which include possible junctions between the curves of discontinuities (see, e.g., [43], Theorem 5.16). This suggests that the proof of optimal rates presented in this paper could be easily transferred to more general classes of piecewise regular images. Note that the result of Theorem 5.16 in [43] requires that at the intersection between two curves, a certain conic condition is fulfilled. Such a condition or a similar one, which, for instance, prevents singularities of the cusp type, seems to be necessary to obtain the desired generalization.

Acknowledgments

Finally, the authors thank the editors and the unknown reviewers for their efforts and valuable suggestions.

Conflict of Interest

The authors declare no conflict of interest.

References

  1. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI 1984, 6, 721–741. [Google Scholar] [CrossRef]
  2. Winkler, G. Stochastic Modelling and Applied Probability. In Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction, 2nd ed.; Springer Verlag: Berlin, Germany, 2003; Volume 27. [Google Scholar]
  3. Tukey, J.W. Curves as Parameters, and Touch Estimation. In Proceedings of 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; Volume I, pp. 681–694. [Google Scholar]
  4. Potts, R.B. Some generalized order-disorder transitions. Proc. Camb. Philos. Soc. 1952, 48, 106–109. [Google Scholar] [CrossRef]
  5. Ising, E. Beitrag zur Theorie des Ferromagnetismus. Z. Phys. 1925, 31, 253–258. [Google Scholar] [CrossRef]
  6. Lenz, W. Beiträge zum Verständnis der magnetischen Eigenschaften in festen Körpern. Phys. Z. 1920, 21, 613–615. [Google Scholar]
  7. Liebscher, V.; Winkler, G. A Potts Model for Segmentation and Jump. In Proceedings S4G International Conference on Stereology, Spatial Statistics and Stochastic Geometry, Prague, Czech Republic, 21–24 June 1999; Benes, V., Janacek, J., Saxl, I., Eds.; Union of Czech Mathematicians and Physicists: Prague, Czech Republic, 1999; pp. 185–190. [Google Scholar]
  8. Friedrich, F. Complexity Penalized Segmentations in 2D—Efficient Algorithms and Approximation Properties. Ph.D. Thesis, Munich University of Technology, Institute of Biomathematics and Biometry, National Research Center for Environment and Health, Munich, Germany, 2005. [Google Scholar]
  9. Friedrich, F.; Kempe, A.; Liebscher, V.; Winkler, G. Complexity penalized M-estimation: Fast computation. J. Comput. Graph. Stat. 2008, 17, 1–24. [Google Scholar] [CrossRef]
  10. Kempe, A. Statistical Analysis of Discontinuous Phenomena with Potts Functionals. Ph.D. Thesis, Institute of Biomathematics and Biometry, National Research Center for Environment and Health, Munich, Germany, 2004. [Google Scholar]
  11. Winkler, G.; Kempe, A.; Liebscher, V.; Wittich, O. Parsimonious Segmentation of Time Series by Potts Models. In Proceedings of 27th Annual GfKl Conference on Innovations in Classification, Data Science, and Information Systems, Cottbus, Germany, 12–14 March 2003; Baier, D., Wernecke, K.-D., Eds.; Springer-Verlag: Heidelberg-Berlin, Germany, 2004; pp. 295–302. [Google Scholar]
  12. Winkler, G.; Liebscher, V. Smoothers for discontinuous signals. J. Nonparametr. Stat. 2002, 14, 203–222. [Google Scholar] [CrossRef]
  13. Winkler, G.; Wittich, O.; Liebscher, V.; Kempe, A. Don’t shed tears over breaks. Jahresber. Deutsch. Math.-Ver. 2005, 107, 57–87. [Google Scholar]
  14. Wittich, O.; Kempe, A.; Winkler, G.; Liebscher, V. Complexity penalized least squares estimators: Analytical results. Math. Nachr. 2008, 281, 1–14. [Google Scholar] [CrossRef]
  15. Boysen, L.; Kempe, A.; Liebscher, V.; Munk, A.; Wittich, O. Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Stat. 2009, 37, 157–183. [Google Scholar] [CrossRef]
  16. Boysen, L.; Liebscher, V.; Munk, A.; Wittich, O. Scale space consistency of piecewise constant least squares estimators—Another look at the regressogram. IMS Lect. Notes-Monogr. Ser. 2007, 55, 65–84. [Google Scholar]
  17. Friedrich, F.; Demaret, L.; Führ, H.; Wicker, K. Efficient moment computation over polygonal domains with an application to rapid wedgelet approximation. SIAM J. Sci. Comput. 2007, 29, 842–863. [Google Scholar] [CrossRef]
  18. Birgé, L.; Massart, P. From Model Selection to Adaptive Estimation. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics; Springer: New York, NY, USA, 1997; pp. 55–87. [Google Scholar]
  19. Donoho, D.L.; Johnstone, I.M. Ideal Denoising in an Orthonormal Basis Chosen from a Library of Bases; Gauthier-Villars: Paris, France, 1994; Volume 319, pp. 1317–1322. [Google Scholar]
  20. Korostelev, A.P.; Tsybakov, A.B. Minimax Theory of Image Reconstruction; Lecture Notes in Statistics 82; Springer: New York, NY, USA, 1993. [Google Scholar]
  21. Donoho, D. Wedgelets: Nearly minimax estimation of edges. Ann. Stat. 1999, 27, 859–897. [Google Scholar] [CrossRef]
  22. Candès, E.; Donoho, D. New tight frames of curvelets and optimal representations of objects with piecewise-C2 singularities. Comm. Pure Appl. Math. 2002, 57, 219–266. [Google Scholar] [CrossRef]
  23. Le Pennec, E.; Mallat, S. Sparse geometrical image approximation using bandelets. IEEE Trans. Image Process. 2005, 14, 423–438. [Google Scholar] [CrossRef] [PubMed]
  24. Dossal, C.; Mallat, S.; Le Pennec, E. Bandelets image estimation with model selection. Signal Process. 2011, 91, 2743–2753. [Google Scholar] [CrossRef] [Green Version]
  25. Demaret, L.; Dyn, N.; Iske, A. Image compression by linear splines over adaptive triangulations. Signal Process. J. 2006, 86, 1604–1616. [Google Scholar] [CrossRef]
  26. Demaret, L.; Iske, A. Optimal n-term approximation by linear splines over anisotropic Delaunay triangulations. Preprint. 2012. Available online: http://www.math.uni-hamburg.de/home/iske/papers/atapprox.pdf (accessed on 27 June 2013).
  27. Birgé, L.; Massart, P. Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 2007, 138, 33–73. [Google Scholar] [CrossRef]
  28. Elad, M. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing; Springer Verlag: New York, NY, USA, 2010. [Google Scholar]
  29. Buldygin, V.V.; Kozachenko, Y.V. Metric Characterization of Random Variables and Random Processes; American Mathematical Society: Providence, Rhode Island, 2000. [Google Scholar]
  30. Petrov, V.V. Sums of Independent Random Variables; Springer Verlag: New York, NY, USA, 1975. [Google Scholar]
  31. Chow, Y.S. Some convergence theorems for independent random variables. Ann. Math. Stat. 1966, 35, 1482–1493. [Google Scholar] [CrossRef]
  32. Kahane, J.P. Séminaire de Mathématiques Supérieures; Technical Report for Université de Montréal: Montréal, QC, Canada, 1963. [Google Scholar]
  33. Demaret, L.; Friedrich, F.; Liebscher, V.; Winkler, G. Complexity l0-penalized m-estimation: Consistency in more dimensions. Available online: http://arxiv.org/abs/1301.5492 (accessed on 27 June 2013).
  34. Lisowska, A. Smoothlets: Multiscale functions for adaptive representation of images. IEEE Trans. Image Process. 2011, 20, 1777–1787. [Google Scholar] [CrossRef] [PubMed]
  35. Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer Series in Statistics; Springer Verlag: New York, NY, USA, 2002. [Google Scholar]
  36. DeVore, R.; Lorentz, G. Constructive Approximation. In Grundlehren der Mathematischen Wissenschaften; Springer Verlag: Heidelberg, Germany, 1993. [Google Scholar]
  37. Dupont, T.; Scott, R. Polynomial approximation of functions in Sobolev spaces. Math. Comput. 1980, 34, 441–463. [Google Scholar] [CrossRef]
  38. Bresenham, J. Algorithm for computer control of a digital plotter. IBM Syst. J. 1965, 4, 25–30. [Google Scholar] [CrossRef]
  39. Willett, R.; Nowak, R. Platelets: A multiscale approach for recovering edges and surfaces in photon-limited medical imaging. IEEE Trans. Med. Imaging 2003, 22, 332–350. [Google Scholar] [CrossRef] [PubMed]
  40. Demaret, L.; Iske, A. Anisotropic Triangulation Methods in Image Approximation. In Algorithms for Approximation; Georgoulis, E.H., Iske, A., Levesley, J., Eds.; Springer-Verlag: Berlin, Germany, 2010; pp. 47–68. [Google Scholar]
  41. Candès, E. Modern statistical estimation via oracle inequalities. Acta Numer. 2005, 15, 257–325. [Google Scholar] [CrossRef]
  42. Demaret, L.; Iske, A.; Khachabi, W. Contextual Image Compression from Adaptive Sparse Data Representations. In Proceedings of SPARS’09, Saint-Malo, France, 6–9 April 2009.
  43. Demaret, L. Geometric anisotropy and image approximation. Habilitationsschrift, 2013; submitted. [Google Scholar]
