Article

A Probabilistic Result on Impulsive Noise Reduction in Topological Data Analysis through Group Equivariant Non-Expansive Operators

Department of Mathematics, University of Bologna, 40126 Bologna, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2023, 25(8), 1150; https://doi.org/10.3390/e25081150
Submission received: 5 May 2023 / Revised: 22 July 2023 / Accepted: 25 July 2023 / Published: 31 July 2023

Abstract

In recent years, group equivariant non-expansive operators (GENEOs) have started to find applications in the fields of Topological Data Analysis and Machine Learning. In this paper, we show how these operators can also be used to remove impulsive noise and to increase the stability of TDA in the presence of noisy data. In particular, we prove that GENEOs can control the expected value of the perturbation of persistence diagrams caused by uniformly distributed impulsive noise, when data are represented by $L$-Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$.
MSC:
55N31; 62R40; 60-08; 65D18; 68T09; 68U05

1. Introduction

In the last thirty years, Topological Data Analysis (TDA) has developed into a useful mathematical theory for analyzing data [1], benefiting from the dimensionality reduction guaranteed by topology [2,3,4]. Topology can construct low-dimensional representations of high-dimensional manifolds of data, thereby reducing the dimensionality of the parameter space and ultimately the learning cost of machine learning models. One of the main tools in TDA is the persistence diagram, a collection of points in the real plane that describes the homological changes of the sublevel sets of suitable continuous functions. These changes give important information about the data of interest, focusing on some of their most relevant properties. Persistence diagrams can be used in the presence of noise, since a well-known stability theorem states that these topological descriptors change in a controlled way when the functions expressing the filtrations of interest change in a controlled way with respect to the sup-norm. More precisely, the sup-norm distance between two functions is an upper bound for the bottleneck distance between the corresponding persistence diagrams [5]. Furthermore, the $L^p$-stability of persistence diagrams with respect to the sup-norm has been proved in [6]. Unfortunately, in many applications the sup-norm of the noise is not guaranteed to be small, and hence these results cannot be directly applied. In particular, they cannot be directly used when data are represented by functions from $\mathbb{R}$ to $\mathbb{R}$ affected by impulsive noise, i.e., noise that carries a sudden sharp change of short duration, when the variable is interpreted as time. It is indeed well known that persistence diagrams can completely change when the data are subject to impulsive noise.
Analogously, in the discrete setting of TDA, the presence of outliers in point clouds can drastically affect the corresponding persistence diagrams. The problem of managing outliers has been studied by several authors with different techniques. In [7], an approach based on confidence sets has been introduced. A method inspired by k-nearest neighbors regression and local median filtering has been used in [8], while the concept of the bagplot has been applied in [9]. In [10], an approach based on reproducing kernels has been proposed.
In our paper, we start exploring a different probabilistic approach in the topological setting. The main idea is that studying the properties of data should primarily be based on the analysis of the observers that are asked to examine the data, since we cannot ignore that different observers can judge the same data differently. This approach was initially proposed in [11] and requires both the definition of the space of group equivariant non-expansive operators (GENEOs) and the development of geometrical techniques to move around in this space [12,13]. In other words, in this model we should not ask how to manage the data but rather how to manage the observers (i.e., GENEOs) analyzing the data. As a first step in this direction, in this paper we show how GENEOs can be used to obtain stability of persistence diagrams of 1D signals in the presence of impulsive noise. For the use of such operators in the comparison of 1D signals under the action of several transformation groups, we refer the interested reader to [14]. This type of signal is important in many applications, such as those concerning EEG data and time series.
GENEOs have been studied in [14] as a new tool in TDA, since they allow for an extension of the theory that is not invariant under the action of every homeomorphism of the considered domain. This is important in applications where the invariance group is not the group of all homeomorphisms, such as the ones concerning shape comparison. Interestingly, GENEOs are also deeply related to the foliation method used to define the matching distance in two-dimensional persistent homology [15,16] and can be seen as a theoretical bridge between TDA and Machine Learning [11]. Furthermore, these operators make available lower bounds for the natural pseudo-distance $d_G(\varphi_1, \varphi_2) := \inf_{g \in G} \|\varphi_1 - \varphi_2 \circ g\|_\infty$, associated with a group $G$ of self-homeomorphisms of the domain of the signals $\varphi_1, \varphi_2$ [14]. For general information about the interest in the theory of GENEOs and its applications, we refer the reader to [17].
In our paper, we prove that GENEOs can control the expected value of the perturbation of persistence diagrams caused by uniformly distributed impulsive noise when data are represented by $L$-Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$. In order to do that, we choose a “mother” bump function $\psi$, i.e., a continuous function that is non-negative, upper bounded by 1, and compactly supported, and we assume that our noise is made up of finitely many bumps, each obtained by translating, heightening, and/or widening $\psi$. The function $\hat\varphi = \varphi + R$ represents the corrupted data, where the noise is given by the function $R = \sum_{i=1}^{k} a_i\,\psi(b_i(x - c_i))$ for some $a_i, b_i, c_i \in \mathbb{R}$, with $b_i > 0$, and some positive integer $k$.
In this situation, trying to use a convolution to approximate our starting data is not effective: even though it contracts the bumps, it does not cut them, and hence it does not improve the sup-norm distance from the original data. A classical approach to the removal of impulsive noise is the use of a median filter [18]. Although this approach would be quite efficient in the discrete case, let us remark that in our setting we are considering continuous functions. The median in the continuous case is defined as the interval of those values $m$ such that
$$\int_{-\infty}^{m} f(x)\,dx = \int_{m}^{+\infty} f(x)\,dx,$$
where $f$ is a probability density. However, this operator is not stable: a small alteration in the starting function can lead to a significant change in the median.
The operators we consider in this paper are $F_\delta(\varphi)(x) = \max(\varphi(x - \delta), \varphi(x + \delta))$ and $F'_\varepsilon(\varphi)(x) = \min(\varphi(x - \varepsilon), \varphi(x + \varepsilon))$. The main idea is that $F_\delta$ removes the noise “directed downwards” and $F'_\varepsilon$ the noise “directed upwards”, and hence their composition should be able to eliminate all the bumps. These operators are GENEOs with respect to isometries of the real line. We prove that, moving in the space of GENEOs by taking suitable values for $\varepsilon$ and $\delta$, we can get quite close to restoring the original function $\varphi$, depending on how the bumps are positioned. The closer the bumps are to being Dirac delta functions and the further they are from each other, the better our approximation can be. On the ground of this result, we finally obtain an estimate of the expected value $\mathbb{E}\big(\|F_\delta \circ F'_\varepsilon(\hat\varphi) - \varphi\|_\infty\big)$.
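To make the construction concrete, here is a minimal numerical sketch of the two operators (an illustration we add for this presentation, not the authors' code; the grid, bump shape, and parameter values are our own choices): the signal is sampled on a uniform grid, the shifts are realized by index shifts with constant edge padding, and a single thin upward bump is annihilated by the min-type operator before the max-type operator compensates the downward bias.

```python
import numpy as np

def shift_left(y, s):   # samples of y(x - s*dx), padding with the first value
    return np.concatenate([np.full(s, y[0]), y[:-s]]) if s > 0 else y.copy()

def shift_right(y, s):  # samples of y(x + s*dx), padding with the last value
    return np.concatenate([y[s:], np.full(s, y[-1])]) if s > 0 else y.copy()

def F_max(y, delta, dx):
    """Discretization of F_delta(phi)(x) = max(phi(x - delta), phi(x + delta))."""
    s = int(round(delta / dx))
    return np.maximum(shift_left(y, s), shift_right(y, s))

def F_min(y, eps, dx):
    """Discretization of F'_eps(phi)(x) = min(phi(x - eps), phi(x + eps))."""
    s = int(round(eps / dx))
    return np.minimum(shift_left(y, s), shift_right(y, s))

# 1-Lipschitz signal plus one thin upward bump of half-width 0.05 and height 5
x = np.linspace(0.0, 2 * np.pi, 2001)
dx = x[1] - x[0]
phi = np.sin(x)
phi_hat = phi + 5.0 * np.maximum(0.0, 1.0 - ((x - 3.0) / 0.05) ** 2)

# first the min-type operator (cuts the upward bump), then the max-type one
denoised = F_max(F_min(phi_hat, 0.06, dx), 0.02, dx)
```

With $\varepsilon$ larger than the bump half-width, the min-type operator removes the bump entirely, and the residual sup-norm error stays of the order of $L(\varepsilon + \delta)$, as the results below predict.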
We conclude this introduction by observing that, while the operators we use in this paper are elementary, the theory of GENEOs makes available procedures to learn more complex operators by combining simple GENEOs through techniques belonging to the rising field of Geometric Deep Learning [11,12,13]. For this reason, we hope that our results can not only illustrate a practical use of some particular GENEOs but also be a first step towards the application of Machine Learning for increasing the stability of TDA in the presence of noise.
The paper is structured as follows. In Section 2, the mathematical background is laid. The case we consider and our notations are explained in Section 3. In Section 4, we prove the results that are needed in order to demonstrate our main results. In Section 5, the main theorems giving us probabilistic upper bounds are formulated. In Section 6, some examples and experiments are presented in order to better illustrate the use of our results. A brief discussion concludes the paper.

2. Mathematical Setting

In this section, we recall some basic concepts that will be used throughout the paper.
To avoid misunderstandings about pairs and intervals, we will use the symbols ( u , v ) , [ u , v ] , and ] u , v [ to denote a pair, a closed interval, and an open interval, respectively.

2.1. Representing Data as Real Functions

Let us consider a set $\Phi$ of bounded functions from a set $X$ to $\mathbb{R}$, which will represent the data we wish to take into account (e.g., functions describing sounds). We shall call $\Phi$ the set of admissible measurements on $X$. We endow $\Phi$ with the topology induced by the sup-norm $\|\cdot\|_\infty$ and the corresponding distance $D_\Phi$. A pseudo-metric $D_X$ can be defined on $X$ by setting $D_X(x_1, x_2) = \sup_{\varphi \in \Phi} |\varphi(x_1) - \varphi(x_2)|$ for every $x_1, x_2 \in X$. We recall that a pseudo-metric on a set $X$ is a distance $d$ without the property $d(x, y) = 0 \implies x = y$ (in other words, we do not require the so-called “identity of indiscernibles”). We will consider the topological space $(X, \tau_{D_X})$, where $\tau_{D_X}$ is the topology induced by $D_X$. A base for this topology is given by the set of open balls $\{B(x, r) : x \in X, r \in \mathbb{R}\}$, where $B(x, r) := \{x' \in X : D_X(x, x') < r\}$. The choice of this topology makes every function in $\Phi$ a continuous function. As shown in [11], this fact enables us to use persistence diagrams in the study of $\Phi$.
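As a toy illustration (our own, with an assumed two-element $\Phi$): take $X = \mathbb{R}$ and $\Phi = \{\sin, \cos\}$. The induced $D_X$ can be computed directly, and it exhibits the failure of the identity of indiscernibles, since points at distance $2\pi$ are indistinguishable for every $\varphi \in \Phi$.

```python
import math

# Assumed admissible measurements on X = R (a hypothetical toy choice)
Phi = [math.sin, math.cos]

def D_X(x1, x2):
    """Pseudo-metric induced by Phi: sup over phi in Phi of |phi(x1) - phi(x2)|."""
    return max(abs(phi(x1) - phi(x2)) for phi in Phi)
```

Here $D_X(0, 2\pi)$ vanishes even though $0 \ne 2\pi$, which is exactly why $D_X$ is only a pseudo-metric.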

2.2. GENEOs as Operators Acting on Data

We are interested in considering transformations of data. Let $\mathrm{Homeo}_\Phi(X)$ be the group of $\Phi$-preserving homeomorphisms from $X$ to $X$ with respect to the topology $\tau_{D_X}$, meaning that every $g$ in $\mathrm{Homeo}_\Phi(X)$ is a homeomorphism of $X$ such that both $\varphi \circ g$ and $\varphi \circ g^{-1}$ belong to $\Phi$ for every $\varphi$ in $\Phi$. Let $G$ be a subgroup of $\mathrm{Homeo}_\Phi(X)$. $G$ represents the set of transformations of data for which we will require equivariance to be respected.
Under the previously stated assumptions, we call the ordered pair ( Φ , G ) a perception pair. We can now introduce the concept of GENEO.
Definition 1. 
Let ( Φ , G ) and ( Ψ , H ) be perception pairs and assume that a homomorphism T : G H is given. A function F : Φ Ψ is called a Group Equivariant Non-Expansive Operator (GENEO) from ( Φ , G ) to ( Ψ , H ) with respect to T if the following properties hold:
1. 
(Group Equivariance) $F(\varphi \circ g) = F(\varphi) \circ T(g)$ for every $\varphi \in \Phi$, $g \in G$;
2. 
(Non-Expansivity) $D_\Psi(F(\varphi_1), F(\varphi_2)) \le D_\Phi(\varphi_1, \varphi_2)$ for every $\varphi_1, \varphi_2 \in \Phi$.
Let us now consider the set $\mathcal{F}^{\mathrm{all}}$ of all GENEOs from $(\Phi, G)$ to $(\Psi, H)$ with respect to $T: G \to H$. The space $\mathcal{F}^{\mathrm{all}}$ is endowed with the extended pseudo-metric $D_{\mathcal{F}^{\mathrm{all}}}$, defined by setting $D_{\mathcal{F}^{\mathrm{all}}}(F_1, F_2) = \sup_{\varphi \in \Phi} D_\Psi(F_1(\varphi), F_2(\varphi))$ for every $F_1, F_2 \in \mathcal{F}^{\mathrm{all}}$. The word extended refers to the possibility that $D_{\mathcal{F}^{\mathrm{all}}}$ takes an infinite value.
The following result can be proven [11]:
Theorem 1. 
If $(\Phi, D_\Phi)$ and $(\Psi, D_\Psi)$ are compact and convex, then the metric space $(\mathcal{F}^{\mathrm{all}}, D_{\mathcal{F}^{\mathrm{all}}})$ is compact and convex.
If a non-empty set $\mathcal{F} \subseteq \mathcal{F}^{\mathrm{all}}$ is fixed, we can define the following pseudo-distance $D_{\mathcal{F},\Phi}$ on $\Phi$:
Definition 2. 
For any φ 1 , φ 2 in Φ we set
$$D_{\mathcal{F},\Phi}(\varphi_1, \varphi_2) = \sup_{F \in \mathcal{F}} \|F(\varphi_1) - F(\varphi_2)\|_\infty.$$
This pseudo-distance allows us to compare data by taking into account how agents operate on them. Notice that, as $G$ becomes larger, the natural pseudo-distance $d_G$ becomes harder to compute, while the new pseudo-distance $D_{\mathcal{F},\Phi}$ becomes easier to evaluate. In order to find a lower bound for $D_{\mathcal{F},\Phi}$, it is useful to introduce the notion of persistence diagram.

2.3. Persistence Diagrams

We will now recall some basic definitions and results in persistent homology. The interested reader can find more details in [4].
Let us consider an ordered pair $(X, \varphi)$, where $X$ is a topological space and $\varphi: X \to \mathbb{R}$ is a continuous function. For any $t \in \mathbb{R}$ we can set $X_t := \varphi^{-1}(]-\infty, t])$. If $u < v$, the inclusion $i_{u,v}: X_u \hookrightarrow X_v$ induces a homomorphism $i^k_{u,v}: H_k(X_u) \to H_k(X_v)$ between the $k$th homology groups of $X_u$ and $X_v$. We can define the $k$th persistent homology group, with respect to $\varphi$ and computed at the point $(u, v)$, as $PH_k(u, v) := i^k_{u,v}(H_k(X_u))$. Moreover, we can define the $k$th persistent Betti numbers function $r_k(u, v)$ as the rank of $PH_k(u, v)$.
The $k$th persistent Betti numbers function can be represented by the $k$th persistence diagram. This diagram is defined as the multiset of all the ordered pairs $(u_j, v_j)$, where $u_j$ and $v_j$ are the times of birth and death of the $j$th $k$-dimensional hole in $X$, respectively. We call time of birth of a hole the first time at which the corresponding homology class appears, and time of death the first time at which that homology class merges with an older one. When a hole never dies, we set its time of death equal to $\infty$. We also add to this set all points of the form $(w, w)$ for $w \in \mathbb{R}$.
In Figure 1, the filtration of the set $X := [0, \frac{3}{4}\pi]$ given by the function $\varphi(x) = 2\sin x$ is illustrated. In this example, the topology on $X$ is the one defined by the space $\Phi$ of admissible functions given by all functions $a\sin(x + b)$ with $a, b \in \mathbb{R}$. The reader can easily check that this topology is the same as the Euclidean topology. The persistence diagram in degree $k = 0$ of the function $\varphi$ is displayed in Figure 2.
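For sampled 1D signals, the degree-0 sublevel-set persistence diagram can be computed with a standard union-find sweep over sample values (elder rule). The sketch below is ours, not taken from the paper; on the example just described it should recover the finite pair $(\sqrt{2}, 2)$ together with the essential class born at $0$.

```python
import math

def persistence0(y):
    """Degree-0 sublevel-set persistence of a function sampled on a line graph.
    Returns (birth, death) pairs; the essential class gets death = inf."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for v in order:
        parent[v], birth[v] = v, y[v]       # a new component is born at y[v]
        for u in (v - 1, v + 1):
            if u in parent:                 # neighbor already in the sublevel set
                ru, rv = find(u), find(v)
                if ru != rv:
                    # elder rule: the younger component dies at level y[v]
                    elder, younger = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
                    if birth[younger] < y[v]:
                        pairs.append((birth[younger], y[v]))
                    parent[younger] = elder
    pairs.append((min(y), float("inf")))    # essential class of the global minimum
    return pairs

# The example of Figure 1: phi(x) = 2 sin x sampled on [0, 3*pi/4]
y = [2 * math.sin(0.75 * math.pi * i / 1000) for i in range(1001)]
pairs = persistence0(y)
```

The second local minimum (value $\sqrt{2}$ at the right endpoint) creates a component that merges with the older one at the global maximum $2$.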

2.4. Comparing Persistence Diagrams

Persistence diagrams can be efficiently compared by means of a suitable metric d match . In order to define it, we first define the pseudo-distance
$$\delta\big((x, y), (x', y')\big) = \min\left\{\max\{|x - x'|, |y - y'|\},\ \max\left\{\frac{|x - y|}{2}, \frac{|x' - y'|}{2}\right\}\right\}$$
for all $(x, y), (x', y') \in \{(x, y) \in \mathbb{R}^2 : x \le y\} \cup \{(x, \infty) : x \in \mathbb{R}\}$, by agreeing that $\infty - y = y - \infty = \infty$ for $y \ne \infty$, $\infty - \infty = 0$, $\infty/2 = \infty$, $|\pm\infty| = \infty$, $\min\{\infty, c\} = c$, and $\max\{\infty, c\} = \infty$.
If two persistence diagrams $D, D'$ are given, we can set
$$d_{\mathrm{match}}(D, D') = \inf_{\sigma \in \Sigma} \sup_{P \in D} \delta(P, \sigma(P)),$$
where $\Sigma$ represents the set of all bijections between the multisets $D$ and $D'$. This function is usually called the bottleneck distance between $D$ and $D'$.
For every degree k we can now define a new pseudo-metric:
$$D^{\mathrm{match}}_{\mathcal{F},\Phi}(\varphi_1, \varphi_2) = \sup_{F \in \mathcal{F}} d_{\mathrm{match}}\big(D_{F(\varphi_1)}, D_{F(\varphi_2)}\big),$$
where $D_{F(\varphi_1)}, D_{F(\varphi_2)}$ are the persistence diagrams in degree $k$ of the functions $F(\varphi_1), F(\varphi_2)$, respectively.
In this paper we will limit ourselves to considering data represented as functions from $\mathbb{R}$ to $\mathbb{R}$, and we recall that for this kind of data, persistence diagrams are non-trivial only in degree $k = 0$ (i.e., when persistent homology is used to count connected components). For this reason, in the following we will always assume $k = 0$.

3. Our Model

In this paper, we will be mainly interested in the set $\mathrm{Lip}_L$ of all $L$-Lipschitz functions from $\mathbb{R}$ to $\mathbb{R}$, for some fixed constant $L \in \mathbb{R}$, and in the set $C^0(\mathbb{R})$ of all functions from $\mathbb{R}$ to $\mathbb{R}$ that are continuous with respect to the Euclidean topology. We will set $X = \mathbb{R}$ and consider the perception pairs $(\mathrm{Lip}_L, G)$ and $(C^0(\mathbb{R}), G)$, where $G$ is the group of the Euclidean isometries of $\mathbb{R}$.
We will assume our noise originates from a finite number of copies of a “mother” non-negative continuous bump function $\psi: \mathbb{R} \to \mathbb{R}$, such that $\mathrm{supp}(\psi) \subseteq [-\sigma, \sigma]$ for some $\sigma > 0$ and $\|\psi\|_\infty \le 1$. We recall that the support of a function is the closure of the set of points where the function is non-zero. After fixing two positive real numbers $\eta$ and $\beta$, the noise we will be adding is a function $R$ belonging to the space $\mathcal{R}_{\eta,\beta}$ that contains the null function and all functions of the form $\sum_{i=1}^{k} a_i\,\psi(b_i(x - c_i))$, where $k$ is a positive integer and $a_i, b_i, c_i$ are real numbers such that $|c_i - c_j| \ge \eta$ for $i \ne j$ and $b_i \ge \beta$ for every index $i$. For any $R \in \mathcal{R}_{\eta,\beta}$, we define $S(R) := \bigcup_{i=1}^{k} [c_i - \sigma/b_i, c_i + \sigma/b_i]$ and remark that if $x \notin S(R)$, then $R(x) = 0$.
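The noise model can be instantiated directly. In the sketch below (ours, not the paper's code), the mother bump is the smooth function $e^{1 - 1/(1 - x^2)}$ used later in Section 6, and the noise is a sum of rescaled translates of it.

```python
import numpy as np

def psi(x):
    """Mother bump psi(x) = exp(1 - 1/(1 - x^2)) on ]-1, 1[, zero outside:
    continuous, non-negative, bounded by 1, with support in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - x[inside] ** 2))
    return out

def noise(x, a, b, c):
    """R(x) = sum_i a_i * psi(b_i * (x - c_i)): bump i has height |a_i|
    and support [c_i - 1/b_i, c_i + 1/b_i]."""
    return sum(ai * psi(bi * (x - ci)) for ai, bi, ci in zip(a, b, c))
```

A single bump with $a_1 = 2$, $b_1 = 4$, $c_1 = 0.5$ reaches height $2$ at its center and vanishes outside $[0.25, 0.75]$.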
Figure 3 shows how drastically persistence diagrams can change when the data are subject to impulsive noise. Our purpose will be to recover φ Lip L as well as possible from the function φ ^ = φ + R . An example of such a situation is depicted in Figure 4.
The following result will be of use.
Proposition 1. 
Let $F_1, F_2$ be GENEOs from $(C^0(\mathbb{R}), G)$ to $(C^0(\mathbb{R}), G)$ with respect to the trivial homomorphism $T = \mathrm{id}: G \to G$. Then $F_1 \circ F_2$ is a GENEO from $(C^0(\mathbb{R}), G)$ to $(C^0(\mathbb{R}), G)$ with respect to $T$.
Proof. 
For every $\varphi \in C^0(\mathbb{R})$, $g \in G$ we have that
$$F_1 \circ F_2(\varphi \circ g) = F_1(F_2(\varphi \circ g)) = F_1(F_2(\varphi) \circ g) = F_1(F_2(\varphi)) \circ g = (F_1 \circ F_2)(\varphi) \circ g.$$
Therefore, $F_1 \circ F_2$ is $G$-equivariant. Moreover, for any $\varphi_1, \varphi_2 \in C^0(\mathbb{R})$
$$D_{C^0(\mathbb{R})}\big(F_1 \circ F_2(\varphi_1), F_1 \circ F_2(\varphi_2)\big) = D_{C^0(\mathbb{R})}\big(F_1(F_2(\varphi_1)), F_1(F_2(\varphi_2))\big) \le D_{C^0(\mathbb{R})}\big(F_2(\varphi_1), F_2(\varphi_2)\big) \le D_{C^0(\mathbb{R})}(\varphi_1, \varphi_2).$$
It follows that $F_1 \circ F_2$ is non-expansive. □

4. Cutting Off the Noise by GENEOs

We start by introducing two families of GENEOs from ( C 0 ( R ) , G ) to ( C 0 ( R ) , G ) with respect to the identical homomorphism.
Definition 3. 
Let $\varphi \in \mathrm{Lip}_L$ and $\varepsilon > 0$. For all $x \in \mathbb{R}$ we define:
1. 
$F_\varepsilon(\varphi)(x) = \max(\varphi(x - \varepsilon), \varphi(x + \varepsilon))$;
2. 
$F'_\varepsilon(\varphi)(x) = \min(\varphi(x - \varepsilon), \varphi(x + \varepsilon))$.
Proposition 2. 
The maps $F_\varepsilon$ and $F'_\varepsilon$ are GENEOs from $(C^0(\mathbb{R}), G)$ to $(C^0(\mathbb{R}), G)$ with respect to the identical homomorphism.
Proof. 
We start by proving that $F_\varepsilon$ is $G$-equivariant.
Let $g$ be the translation $x \mapsto x + k$ with $k \in \mathbb{R}$; then
$$F_\varepsilon(\varphi \circ g)(x) = \max\{\varphi((x + k) - \varepsilon), \varphi((x + k) + \varepsilon)\} = \max\{\varphi((x - \varepsilon) + k), \varphi((x + \varepsilon) + k)\} = (F_\varepsilon(\varphi) \circ g)(x).$$
Let $g$ be the symmetry $x \mapsto -x$; then
$$F_\varepsilon(\varphi \circ g)(x) = \max\{\varphi(-(x - \varepsilon)), \varphi(-(x + \varepsilon))\} = \max\{\varphi((-x) + \varepsilon), \varphi((-x) - \varepsilon)\} = (F_\varepsilon(\varphi) \circ g)(x).$$
Since every isometry in $G$ can be written as the composition of a symmetry and a translation, our statement follows.
Furthermore, for any $x \in \mathbb{R}$
$$|F_\varepsilon(\varphi_1)(x) - F_\varepsilon(\varphi_2)(x)| = |\max\{\varphi_1(x - \varepsilon), \varphi_1(x + \varepsilon)\} - \max\{\varphi_2(x - \varepsilon), \varphi_2(x + \varepsilon)\}| \le \max\{|\varphi_1(x - \varepsilon) - \varphi_2(x - \varepsilon)|, |\varphi_1(x + \varepsilon) - \varphi_2(x + \varepsilon)|\} \le \|\varphi_1 - \varphi_2\|_\infty.$$
It follows that $\|F_\varepsilon(\varphi_1) - F_\varepsilon(\varphi_2)\|_\infty \le \|\varphi_1 - \varphi_2\|_\infty$, and hence $F_\varepsilon$ is non-expansive.
We observe that $\min\{a, b\} = -\max\{-a, -b\}$, and hence $F'_\varepsilon(\varphi) = -F_\varepsilon(-\varphi)$ for every $\varphi \in C^0(\mathbb{R})$. Moreover, the map that takes $\varphi$ to $-\varphi$ is a GENEO from $(C^0(\mathbb{R}), G)$ to $(C^0(\mathbb{R}), G)$ with respect to the identical homomorphism. Therefore, Proposition 1 ensures that $G$-equivariance and non-expansivity also hold for $F'_\varepsilon$. □
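Non-expansivity can also be checked numerically on random sampled signals (a sanity check we add; it mirrors the inequality in the proof at the level of samples, where the discrete max over shifted copies satisfies the same bound exactly).

```python
import numpy as np

def F_eps_discrete(y, s):
    """Sampled F_eps: pointwise max of the two copies shifted by s grid steps,
    with constant padding at the boundary."""
    left = np.concatenate([np.full(s, y[0]), y[:-s]])
    right = np.concatenate([y[s:], np.full(s, y[-1])])
    return np.maximum(left, right)

rng = np.random.default_rng(0)
violations = 0
for _ in range(200):
    y1 = rng.standard_normal(500)
    y2 = rng.standard_normal(500)
    lhs = np.max(np.abs(F_eps_discrete(y1, 7) - F_eps_discrete(y2, 7)))
    rhs = np.max(np.abs(y1 - y2))
    if lhs > rhs + 1e-12:
        violations += 1
```

Since $|\max\{a, b\} - \max\{c, d\}| \le \max\{|a - c|, |b - d|\}$ holds samplewise, no violation can occur.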
Since Proposition 1 shows that the composition of GENEOs is still a GENEO, the operator $F_\delta \circ F'_\varepsilon$ is a GENEO from $(C^0(\mathbb{R}), G)$ to $(C^0(\mathbb{R}), G)$ with respect to the identical homomorphism.
We now want to prove that if a function $\hat\varphi$ is obtained by adding impulsive noise to a function $\varphi$, then the value of
$$\|F_\delta \circ F'_\varepsilon(\hat\varphi) - \varphi\|_\infty$$
is bounded, and possibly small, provided that $\delta$ and $\varepsilon$ are chosen appropriately. The main idea is that the operator $F'_\varepsilon$ cuts the noise “directed upwards” and $F_\delta$ cuts the noise “directed downwards”.
In order to proceed, we need two lemmas.
Lemma 1. 
Let $R \in C^0(\mathbb{R})$, $\varphi \in \mathrm{Lip}_L$ for some $L \in \mathbb{R}$, and set $\hat\varphi := \varphi + R$. Then for any $\varepsilon > 0$ and $\delta > 0$:
(i) $-L\varepsilon + F'_\varepsilon(R) \le F'_\varepsilon(\hat\varphi) - \varphi \le L\varepsilon + F'_\varepsilon(R)$;
(ii) $-L\delta + F_\delta(R) \le F_\delta(\hat\varphi) - \varphi \le L\delta + F_\delta(R)$;
(iii) $-L(\delta + \varepsilon) + F_\delta \circ F'_\varepsilon(R) \le F_\delta \circ F'_\varepsilon(\hat\varphi) - \varphi \le L(\delta + \varepsilon) + F_\delta \circ F'_\varepsilon(R)$.
Proof. 
Since $\varphi$ is Lipschitz with constant $L$, we have that for any $x \in \mathbb{R}$, $|\varphi(x - \varepsilon) - \varphi(x)| \le L\varepsilon$ and $|\varphi(x + \varepsilon) - \varphi(x)| \le L\varepsilon$. Therefore,
$$F'_\varepsilon(\hat\varphi)(x) = F'_\varepsilon(\varphi + R)(x) = \min\{\varphi(x - \varepsilon) + R(x - \varepsilon), \varphi(x + \varepsilon) + R(x + \varepsilon)\} \le \min\{\varphi(x) + L\varepsilon + R(x - \varepsilon), \varphi(x) + L\varepsilon + R(x + \varepsilon)\} = \varphi(x) + L\varepsilon + F'_\varepsilon(R)(x).$$
Analogously, $F'_\varepsilon(\hat\varphi)(x) \ge \varphi(x) - L\varepsilon + F'_\varepsilon(R)(x)$. The same steps applied to $F_\delta$ yield the second statement of the lemma. As for the last claim, we can see that:
$$F_\delta \circ F'_\varepsilon(\hat\varphi)(x) = \max\{F'_\varepsilon(\hat\varphi)(x - \delta), F'_\varepsilon(\hat\varphi)(x + \delta)\} \le \max\{\varphi(x - \delta) + L\varepsilon + F'_\varepsilon(R)(x - \delta), \varphi(x + \delta) + L\varepsilon + F'_\varepsilon(R)(x + \delta)\} \le \varphi(x) + L\delta + L\varepsilon + \max\{F'_\varepsilon(R)(x - \delta), F'_\varepsilon(R)(x + \delta)\} = \varphi(x) + L\delta + L\varepsilon + F_\delta \circ F'_\varepsilon(R)(x).$$
Analogously, we can prove the lower bound. □
Henceforth, we will assume that any summation on an empty set of indexes is the null function.
Lemma 2. 
Let $R \in \mathcal{R}_{\eta,\beta}$ and $\lambda \ge \sigma/\beta$. If $\lambda \le \rho \le \frac{\eta}{2} - \lambda$, then
(a) $F'_\rho(R)(x) = \sum_{a_i < 0} \big(a_i\,\psi(b_i(x - \rho - c_i)) + a_i\,\psi(b_i(x + \rho - c_i))\big) \le 0$;
(b) $F_\rho(R)(x) = \sum_{a_i > 0} \big(a_i\,\psi(b_i(x - \rho - c_i)) + a_i\,\psi(b_i(x + \rho - c_i))\big) \ge 0$
for all $x \in \mathbb{R}$. Moreover, $F_\rho(R), F'_\rho(R) \in \mathcal{R}_{2\lambda,\beta}$.
Proof. 
We will suppose, without loss of generality, that $c_i < c_{i+1}$ for all $i = 1, \dots, k-1$ and $a_i \ne 0$ for all indexes $i$.
We want to show that at least one of $x - \rho$ and $x + \rho$ must always be outside $\bigcup_{i=1}^{k} [c_i - \lambda, c_i + \lambda]$. If, by contradiction, both $x - \rho$ and $x + \rho$ were in the same $[c_i - \lambda, c_i + \lambda]$, then we would have $\rho < \lambda$, against our hypotheses. Moreover, $x - \rho$ and $x + \rho$ cannot belong to different intervals $[c_i - \lambda, c_i + \lambda]$ and $[c_j - \lambda, c_j + \lambda]$ for $i < j$: if by contradiction $x - \rho \in [c_i - \lambda, c_i + \lambda]$ and $x + \rho \in [c_j - \lambda, c_j + \lambda]$ for some $i < j$, then $2\rho = (x + \rho) - (x - \rho) \ge (c_j - \lambda) - (c_i + \lambda) = c_j - c_i - 2\lambda \ge \eta - 2\lambda$, and hence we would have $\rho \ge \eta/2 - \lambda$; by our hypotheses, this is possible only in the boundary case in which $x - \rho$ and $x + \rho$ are endpoints of the two intervals, where $R$ vanishes anyway.
Since $[c_i - \sigma/b_i, c_i + \sigma/b_i] \subseteq [c_i - \lambda, c_i + \lambda]$, it follows that at least one of the values $R(x - \rho)$ and $R(x + \rho)$ must always be zero. Let us now set $I_i^- := [(c_i - \rho) - \lambda, (c_i - \rho) + \lambda]$ and $I_i^+ := [(c_i + \rho) - \lambda, (c_i + \rho) + \lambda]$. These two intervals must be disjoint, since $(c_i + \rho - \lambda) - (c_i - \rho + \lambda) = 2\rho - 2\lambda \ge 0$.
Let us now consider $\{I_1^-, I_1^+, \dots, I_k^-, I_k^+\}$. We will now prove that any two distinct elements of this set must be disjoint. Since we have just proven that $I_i^- \cap I_i^+ = \emptyset$ for $i = 1, \dots, k$, the following holds in the case $i < j$:
1. $I_i^+ \cap I_j^+ = \emptyset$, since $(c_j + \rho - \lambda) - (c_i + \rho + \lambda) = c_j - c_i - 2\lambda \ge \eta - 2\lambda \ge 0$;
2. $I_i^- \cap I_j^- = \emptyset$, since $(c_j - \rho - \lambda) - (c_i - \rho + \lambda) = c_j - c_i - 2\lambda \ge \eta - 2\lambda \ge 0$;
3. $I_i^+ \cap I_j^- = \emptyset$, since $(c_j - \rho - \lambda) - (c_i + \rho + \lambda) = c_j - c_i - 2\rho - 2\lambda \ge \eta - 2(\eta/2 - \lambda) - 2\lambda = 0$;
4. $I_j^+ \cap I_i^- = \emptyset$, since $(c_j + \rho - \lambda) - (c_i - \rho + \lambda) = c_j - c_i + 2\rho - 2\lambda \ge \eta + 2\lambda - 2\lambda \ge 0$.
Let us now fix $\bar{k} \in \{1, \dots, k\}$. Since $\mathrm{supp}(a_{\bar{k}}\,\psi(b_{\bar{k}}(x - c_{\bar{k}}))) \subseteq [c_{\bar{k}} - \lambda, c_{\bar{k}} + \lambda]$, we have $\mathrm{supp}(a_{\bar{k}}\,\psi(b_{\bar{k}}(x - \rho - c_{\bar{k}}))) \subseteq I_{\bar{k}}^+$ and $\mathrm{supp}(a_{\bar{k}}\,\psi(b_{\bar{k}}(x + \rho - c_{\bar{k}}))) \subseteq I_{\bar{k}}^-$.
Hence, for any $x \in \mathbb{R}$ we have that
$$F'_\rho(R)(x) = \min\{R(x - \rho), R(x + \rho)\} = \begin{cases} 0 & \text{if } x \notin \bigcup_{i=1}^{k} (I_i^- \cup I_i^+), \\ a_j\,\psi(b_j(x + \rho - c_j)) & \text{if } x \in I_j^- \text{ and } a_j < 0, \\ a_j\,\psi(b_j(x - \rho - c_j)) & \text{if } x \in I_j^+ \text{ and } a_j < 0, \\ 0 & \text{if } x \in I_j^- \cup I_j^+ \text{ and } a_j > 0. \end{cases}$$
This means that
$$F'_\rho(R)(x) = \sum_{a_i < 0} \big(a_i\,\psi(b_i(x - \rho - c_i)) + a_i\,\psi(b_i(x + \rho - c_i))\big) \le 0.$$
Now, given two centers $c_i + \rho$ and $c_j - \rho$ of bumps of $F'_\rho(R)$, we have that
$$|(c_i + \rho) - (c_j - \rho)| \ge \min\{\eta - 2\rho, 2\rho\} \ge 2\lambda.$$
It follows that $F'_\rho(R) \in \mathcal{R}_{2\lambda,\beta}$.
By noting that $F_\rho(R) = -F'_\rho(-R)$, we obtain the second part of the thesis. □
Let us remark that, in particular, if $\lambda = 2\sigma/\beta$, then $F_\rho(R), F'_\rho(R) \in \mathcal{R}_{4\sigma/\beta,\beta}$. We observe that, for a value $\rho$ satisfying the hypotheses of Lemma 2 to exist, the inequality $\eta \ge 4\lambda$ must hold.
We are now ready to prove a result that will be useful in the remainder of the paper.
Theorem 2. 
Given $\theta, \beta, L \in \mathbb{R}$, let $\bar{R} \in \mathcal{R}_{\theta,\beta}$, $\varphi \in \mathrm{Lip}_L$, and set $\hat\varphi := \varphi + \bar{R}$. If $2\sigma/\beta \le \varepsilon \le \theta/2 - 2\sigma/\beta$, then for any $\delta$ with $\sigma/\beta \le \delta \le \frac{1}{2}\min\{\theta - 2\varepsilon, 2\varepsilon\} - \sigma/\beta$ the following inequality holds:
$$\|F_\delta \circ F'_\varepsilon(\hat\varphi) - \varphi\|_\infty \le L(\varepsilon + \delta).$$
Proof. 
Let $x \in \mathbb{R}$. Firstly, let us notice that the condition $2\sigma/\beta \le \varepsilon \le \theta/2 - 2\sigma/\beta$ implies that $\sigma/\beta \le \frac{1}{2}\min\{\theta - 2\varepsilon, 2\varepsilon\} - \sigma/\beta$. From Lemma 1, we know that
$$-L(\delta + \varepsilon) + F_\delta \circ F'_\varepsilon(\bar{R})(x) \le F_\delta \circ F'_\varepsilon(\hat\varphi)(x) - \varphi(x) \le L(\delta + \varepsilon) + F_\delta \circ F'_\varepsilon(\bar{R})(x).$$
Let us now prove that $F_\delta \circ F'_\varepsilon(\bar{R})(x) = 0$.
By applying Lemma 2 with $\eta := \theta$, $\lambda := 2\sigma/\beta$, $\rho := \varepsilon$, and $R := \bar{R}$, we obtain
$$F'_\varepsilon(\bar{R})(x) = \sum_{a_i < 0} \big(a_i\,\psi(b_i(x - \varepsilon - c_i)) + a_i\,\psi(b_i(x + \varepsilon - c_i))\big) \le 0.$$
Let us remark that $F'_\varepsilon(\bar{R}) \in \mathcal{R}_{4\sigma/\beta,\beta}$. Moreover,
1. $F_\delta \circ F'_\varepsilon(\bar{R})(x) = \max\{F'_\varepsilon(\bar{R})(x - \delta), F'_\varepsilon(\bar{R})(x + \delta)\} \le 0$, since both terms are non-positive;
2. $F_\delta \circ F'_\varepsilon(\bar{R})(x) = F_\delta(F'_\varepsilon(\bar{R}))(x) \ge 0$.
The latter inequality follows from statement (b) of Lemma 2, by setting $\eta := \min\{\theta - 2\varepsilon, 2\varepsilon\}$ (so that $F'_\varepsilon(\bar{R}) \in \mathcal{R}_{\eta,\beta}$ with $\eta \ge 4\sigma/\beta$, by the estimate on the distance between centers in the proof of Lemma 2), $\lambda := \sigma/\beta$, $\rho := \delta$, and $R := F'_\varepsilon(\bar{R})$: since all the coefficients of the bumps of $F'_\varepsilon(\bar{R})$ are negative, the sum over positive coefficients is empty, and hence $F_\delta(F'_\varepsilon(\bar{R}))(x) = 0$.
Therefore, we have proved that $|F_\delta \circ F'_\varepsilon(\hat\varphi)(x) - \varphi(x)| \le L(\delta + \varepsilon)$ for any $x \in \mathbb{R}$. It follows that $\|F_\delta \circ F'_\varepsilon(\hat\varphi) - \varphi\|_\infty \le L(\varepsilon + \delta)$. □
Let us remark that Theorem 2 works under the (only) implicit assumption that $\theta \ge 8\sigma/\beta$. In our setting, this should not be restrictive, since it means that the added noise is made up of scattered, thin bumps, without any reference to the height of the bumps. This is what we expect when considering additive impulsive noise.
Corollary 1. 
Given $\theta, \beta, L \in \mathbb{R}$, let $R \in \mathcal{R}_{\theta,\beta}$, $\varphi \in \mathrm{Lip}_L$, and set $\hat\varphi := \varphi + R$. If $\theta \ge 8\sigma/\beta$, then
$$\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty \le 3L\sigma/\beta.$$
Proof. 
The claim follows from Theorem 2 by taking $\varepsilon := 2\sigma/\beta$ and $\delta := \sigma/\beta$. □
Corollary 1 and the well-known stability of persistence diagrams with respect to the max-norm [5] immediately imply the following result, which is of interest in TDA (the symbol d match denotes the usual bottleneck distance between persistence diagrams).
Corollary 2. 
Given $\theta, \beta, L \in \mathbb{R}$, let $R \in \mathcal{R}_{\theta,\beta}$, $\varphi \in \mathrm{Lip}_L$, and set $\hat\varphi := \varphi + R$. If $\theta \ge 8\sigma/\beta$, and $D$ and $D'$ are the persistence diagrams in degree $0$ of the filtering functions $\varphi$ and $F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi)$, respectively, then
$$d_{\mathrm{match}}(D, D') \le 3L\sigma/\beta.$$

5. Our Main Results

We are now ready to prove our main results. We start by recalling a simple statement concerning the probability $p$ that any two distinct points in a randomly chosen set of cardinality $k$ in an interval of length $\ell$ have a distance greater than $\eta$ (cf., e.g., [19]). For the reader’s convenience, we report the proof here.
Lemma 3. 
Let $X_1, \dots, X_k$, with $k \ge 2$, be independent random variables, uniformly distributed on the interval $[0, \ell]$, for some $\ell > 0$. Let
$$M := \min_{\substack{1 \le i, j \le k \\ i \ne j}} |X_i - X_j|$$
be the minimal distance between two distinct random variables. Then, we have
$$P(M > \eta) = \begin{cases} 1 & \text{if } \eta \le 0, \\ \left(1 - \dfrac{(k - 1)\eta}{\ell}\right)^{k} & \text{if } 0 < \eta < \dfrac{\ell}{k - 1}, \\ 0 & \text{if } \eta \ge \dfrac{\ell}{k - 1}. \end{cases}$$
Proof. 
It suffices to consider the case $0 < \eta < \frac{\ell}{k-1}$. By symmetry, we have
$$P(M > \eta) = k!\,P\big((M > \eta) \cap (X_1 < X_2 < \dots < X_k)\big)$$
(since $X_1, \dots, X_k$ are uniformly distributed)
$$= k!\,\frac{\mathrm{Leb}(S)}{\ell^k}, \qquad (1)$$
where $S = \{x \in [0, \ell]^k \mid x_1 < x_2 - \eta < x_3 - 2\eta < \dots < x_k - (k-1)\eta\}$, and $\mathrm{Leb}$ denotes the Lebesgue measure. Setting $y_i = x_i - (i - 1)\eta$ for $i = 1, \dots, k$, we have that $\mathrm{Leb}(S) = \mathrm{Leb}(S')$, where
$$S' = \{y \in [0, \ell - (k - 1)\eta]^k \mid y_1 < y_2 < \dots < y_k\}.$$
On the other hand, again by symmetry, we have
$$\mathrm{Leb}(S') = \frac{\mathrm{Leb}([0, \ell - (k - 1)\eta]^k)}{k!} = \frac{(\ell - (k - 1)\eta)^k}{k!},$$
and plugging this last identity into (1) we obtain the thesis. □
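Lemma 3 is easy to validate by simulation (our check, not part of the paper): for $k = 3$, $\ell = 1$, $\eta = 0.1$, the formula predicts $P(M > \eta) = (1 - 0.2)^3 = 0.512$.

```python
import random

random.seed(0)
k, ell, eta = 3, 1.0, 0.1
trials = 200_000
hits = 0
for _ in range(trials):
    xs = sorted(random.uniform(0, ell) for _ in range(k))
    # M > eta iff every gap between consecutive order statistics exceeds eta
    if min(b - a for a, b in zip(xs, xs[1:])) > eta:
        hits += 1
est = hits / trials
pred = (1 - (k - 1) * eta / ell) ** k   # Lemma 3 formula
```

With this many trials, the Monte Carlo estimate agrees with the closed-form probability to well within 1%.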
We can now prove the following result, concerning the expected value of the error $\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty$. We remark explicitly that the error is random because it is a function of $c_1, \dots, c_k$, which are independent random variables uniformly distributed on the interval $[0, \ell]$.
Theorem 3. 
Let us choose a function $\varphi \in \mathrm{Lip}_L$, a non-negative continuous function $\psi: \mathbb{R} \to \mathbb{R}$ with $\|\psi\|_\infty \le 1$ and $\mathrm{supp}(\psi) \subseteq [-\sigma, \sigma]$ for some $\sigma > 0$, two positive numbers $\beta$ and $\ell$, and an integer $k \ge 2$. For $i = 1, \dots, k$, let us fix $a_i \in \mathbb{R}$ and $b_i \ge \beta$, and set $\bar\alpha := \max_i |a_i|$. Moreover, let $c_1, \dots, c_k$ be independent random variables, uniformly distributed on the interval $[0, \ell]$. Let us consider the random variable $\hat\varphi := \varphi + R$, where $R(x) := \sum_{i=1}^{k} a_i\,\psi(b_i(x - c_i))$ for any $x \in \mathbb{R}$. If $\sigma/\beta < \frac{\ell}{8(k-1)}$, then
$$\mathbb{E}\big(\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty\big) \le 3L\frac{\sigma}{\beta} + k\bar\alpha\left(1 - \left(1 - \frac{8(k-1)}{\ell}\frac{\sigma}{\beta}\right)^{k}\right).$$
Proof. 
By setting $\delta = \sigma/\beta$ and $\varepsilon = 2\sigma/\beta$ in statement (iii) of Lemma 1, we have that
$$\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty \le 3L\sigma/\beta + \|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(R)\|_\infty.$$
Since the operator $F_{\sigma/\beta} \circ F'_{2\sigma/\beta}$ is non-expansive and $F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(0) = 0$, it follows that
$$\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(R)\|_\infty \le \|R\|_\infty \le k\bar\alpha\,\|\psi\|_\infty \le k\bar\alpha.$$
Therefore, $\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty \le 3L\sigma/\beta + k\bar\alpha$. If we apply Lemma 3 with $\eta = 8\sigma/\beta$, we obtain that $R \in \mathcal{R}_{8\sigma/\beta,\beta}$ with probability $p := \left(1 - \frac{8(k-1)}{\ell}\frac{\sigma}{\beta}\right)^{k}$. If $R \in \mathcal{R}_{8\sigma/\beta,\beta}$, we can apply Theorem 2 by setting $\delta = \sigma/\beta$, $\varepsilon = 2\sigma/\beta$, and $\theta = 8\sigma/\beta$, and hence we obtain that $\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty \le 3L\sigma/\beta$ with probability at least $p$. Since $\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty \le 3L\sigma/\beta + k\bar\alpha$ in any case, it follows that
$$\mathbb{E}\big(\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty\big) \le 3L\frac{\sigma}{\beta}\,p + \left(3L\frac{\sigma}{\beta} + k\bar\alpha\right)(1 - p) = 3L\frac{\sigma}{\beta} + k\bar\alpha\,(1 - p). \qquad \square$$
Theorem 3 and the well-known stability of persistence diagrams with respect to the max-norm [5] immediately imply the following result, which is of interest in TDA (the symbol d match denotes the usual bottleneck distance between persistence diagrams).
Corollary 3. 
Let us make the same assumptions as in Theorem 3. Let $D$ and $D'$ be the persistence diagrams in degree $0$ of the filtering functions $\varphi$ and $F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi)$, respectively. If $\sigma/\beta < \frac{\ell}{8(k-1)}$, then
$$\mathbb{E}\big(d_{\mathrm{match}}(D, D')\big) \le 3L\frac{\sigma}{\beta} + k\bar\alpha\left(1 - \left(1 - \frac{8(k-1)}{\ell}\frac{\sigma}{\beta}\right)^{k}\right).$$
This result shows that the use of suitable GENEOs can make TDA (relatively) stable also in the presence of impulsive noise, under the assumptions we have considered in this paper.
Theorem 3 and Corollary 3 can be easily extended to the case that the number k of bumps is a random variable:
Theorem 4. 
Let $k$ be a random variable taking values in the subset $S = \{2, 3, \dots, K\}$ of the positive integers, where we assume that the probability of the integer $j \in S$ is $p_j$. Let us choose a function $\varphi \in \mathrm{Lip}_L$, a non-negative continuous function $\psi: \mathbb{R} \to \mathbb{R}$ with $\|\psi\|_\infty \le 1$ and $\mathrm{supp}(\psi) \subseteq [-\sigma, \sigma]$ for some $\sigma > 0$, and two positive numbers $\beta$ and $\ell$. Let $c_1^k, \dots, c_k^k$ be random variables that, conditioned on $k$, are independent and uniformly distributed on the interval $[0, \ell]$. Moreover, for $i = 1, \dots, k$, let us fix $a_i^k \in \mathbb{R}$ and $b_i^k \ge \beta$, and set $\bar\alpha_k := \max_i |a_i^k|$. Let us also consider the random variable $\hat\varphi := \varphi + R_k$, where $R_k(x) := \sum_{i=1}^{k} a_i^k\,\psi(b_i^k(x - c_i^k))$ for any $x \in \mathbb{R}$. If $\sigma/\beta < \frac{\ell}{8(K-1)}$, then
$$\mathbb{E}\big(\|F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi) - \varphi\|_\infty\big) \le 3L\frac{\sigma}{\beta} + \sum_{j=2}^{K} p_j\, j\,\bar\alpha_j\left(1 - \left(1 - \frac{8(j-1)}{\ell}\frac{\sigma}{\beta}\right)^{j}\right).$$
Proof. 
It follows immediately from Theorem 3. □
Corollary 4. 
Let us make the same assumptions as in Theorem 4. Let $D$ and $D'$ be the persistence diagrams in degree $0$ of the filtering functions $\varphi$ and $F_{\sigma/\beta} \circ F'_{2\sigma/\beta}(\hat\varphi)$, respectively. If $\sigma/\beta < \frac{\ell}{8(K-1)}$, then
$$\mathbb{E}\big(d_{\mathrm{match}}(D, D')\big) \le 3L\frac{\sigma}{\beta} + \sum_{j=2}^{K} p_j\, j\,\bar\alpha_j\left(1 - \left(1 - \frac{8(j-1)}{\ell}\frac{\sigma}{\beta}\right)^{j}\right).$$
Proof. 
It follows immediately from Corollary 3. □

6. Examples and Experiments

We will now validate our approach based on GENEOs by giving two examples and illustrating some experimental results.

6.1. Examples

In order to verify how our approach works, we will set $\tau_n := \left(1 - \frac{1}{n}\right)\frac{2\sigma}{\beta} + \frac{1}{n}\theta$, for a fixed $\theta \ge \frac{2\sigma}{\beta}$, and consider the upper bound $\left\| F^{\tau_n/2} \circ F_{\tau_n}(\hat{\varphi}) - \varphi \right\|_\infty \le \frac{3}{2} L \tau_n$, obtained by applying Corollary 1. We observe that $\tau_n \ge \frac{2\sigma}{\beta}$ for every index $n$, and $\lim_{n \to +\infty} \tau_n = \frac{2\sigma}{\beta}$. We will examine two examples that use the GENEOs $F^{\tau_n/2} \circ F_{\tau_n}$ and show how our method based on such operators differs from the method based on convolutions in its capability of removing additive impulsive noise. Moreover, we will compare the actual error $\left\| F^{\tau_n/2} \circ F_{\tau_n}(\hat{\varphi}) - \varphi \right\|_\infty$ to its upper bound $\frac{3}{2} L \tau_n$ by running several simulations. The convolutions that will be applied in our examples use the functions $T_h \colon \mathbb{R} \to \mathbb{R}$ defined by setting
$$T_h(x) := \begin{cases} \frac{h}{2} & \text{if } -\frac{1}{h} \le x \le \frac{1}{h} \\ 0 & \text{otherwise} \end{cases}$$
for h > 0 . We will see that, although the convolution with such functions is also a GENEO, it will not be able to efficiently remove the noise.
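On a uniform grid, the convolution $\hat{\varphi} * T_h$ can be approximated by a discrete convolution with a sampled, mass-one kernel. A minimal sketch (the grid spacing and function names are our own choices, not taken from the paper):

```python
import numpy as np

def convolve_Th(phi_hat, dx, h):
    """Approximate phi_hat * T_h on a uniform grid with spacing dx.
    T_h equals h/2 on [-1/h, 1/h] and 0 elsewhere, so it integrates to 1."""
    half_width = max(1, int(round(1.0 / (h * dx))))     # number of samples in [0, 1/h]
    kernel = np.full(2 * half_width + 1, h / 2.0)       # sampled box kernel of height h/2
    # multiply by dx so that the discrete sum approximates the integral
    return np.convolve(phi_hat, kernel, mode="same") * dx
```

Since the kernel has unit mass, convolving a constant signal leaves it (approximately) unchanged away from the window boundary, which is a quick sanity check for the discretization.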
Our noise function $R$ will be constructed starting from the mother function $\psi$ defined by setting $\psi(x) := e^{1 - \frac{1}{1 - x^2}}$ for $x \in \left]-1, 1\right[$ and $\psi(x) := 0$ for $x \notin \left]-1, 1\right[$. Using the notation introduced in the previous sections, we will set $\sigma = 1.1$, thus satisfying the condition $\mathrm{supp}(\psi) \subseteq [-\sigma, \sigma]$. The impulsive noise will be added in an interval $[-\ell, \ell]$. The following parameters are considered, with these respective uniform distributions:
  • $k \sim \mathrm{Unif}\{1, \dots, 10\}$
  • $a_i \sim \mathrm{Unif}[-100, 100]$ for $i = 1, \dots, k$
  • $b_i \sim \mathrm{Unif}[0, 100]$ for $i = 1, \dots, k$
  • $c_i \sim \mathrm{Unif}\left[-\ell + \frac{\sigma}{\beta}, \ell - \frac{\sigma}{\beta}\right]$ for $i = 1, \dots, k$.
We set $\beta := \min_{i=1,\dots,k} b_i$, $\bar{\alpha} := \max_{i=1,\dots,k} |a_i|$, and $\eta := \min_{i \ne j} |c_i - c_j|$. After producing random values for the parameters $k, a_i, b_i, c_i$, our algorithm checks whether $\eta > 8\frac{\sigma}{\beta}$; otherwise, it generates another set of parameters.
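The rejection-sampling procedure just described can be sketched as follows. This is an illustrative reconstruction with hypothetical helper names, not the authors' code; we also add a guard (redrawing when $\sigma/\beta \ge \ell$) so that the interval for the centres stays non-degenerate.

```python
import numpy as np

def bump(x):
    """Mother function: psi(x) = e^{1 - 1/(1 - x^2)} on ]-1, 1[, 0 elsewhere."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - x[inside] ** 2))
    return out

def draw_noise_parameters(ell, sigma=1.1, rng=np.random.default_rng(0)):
    """Rejection sampling: redraw until the centres satisfy eta > 8*sigma/beta."""
    while True:
        k = rng.integers(1, 11)                        # k ~ Unif{1, ..., 10}
        a = rng.uniform(-100, 100, size=k)             # bump amplitudes
        b = rng.uniform(0, 100, size=k)                # bump thinness parameters
        beta = b.min()
        if sigma / beta >= ell:                        # guard: keep the centre interval valid
            continue
        c = rng.uniform(-ell + sigma / beta, ell - sigma / beta, size=k)
        if k == 1:
            return a, b, c
        eta = np.min(np.abs(c[:, None] - c[None, :])[~np.eye(k, dtype=bool)])
        if eta > 8 * sigma / beta:                     # separation condition on the centres
            return a, b, c
```

The noisy function is then obtained by adding $\sum_i a_i\, \psi(b_i(x - c_i))$, with `bump` playing the role of $\psi$, to the chosen $\varphi$.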

6.1.1. First Example

Let us consider the function
$$\varphi(x) := \begin{cases} \sin x & \text{if } -4\pi \le x \le 4\pi \\ 0 & \text{otherwise} \end{cases}$$
for $x \in \mathbb{R}$. We observe that $\varphi \in \mathrm{Lip}_L$ for $L = 1$. We will add noise in the interval $[-4\pi, 4\pi]$ (meaning $\ell = 4\pi$) and visualize the results in that interval. Figure 5 illustrates how the function $\hat{\varphi}$ compares to $\varphi$.
We will start by considering how well the convolution $\hat{\varphi} * T_n$ can approximate the original function $\varphi$ as $n$ goes from 3 to 100. From Figure 6, it is immediately apparent that the max-norm distance between $\hat{\varphi} * T_n$ and $\varphi$ remains quite large.
If we apply a convolution with $T_{1/n}$, for $3 \le n \le 100$, we obtain the results displayed in Figure 7, showing that all the information represented by the function $\varphi$ is progressively destroyed.
In contrast, if we apply the operator $F^{\tau_n/2} \circ F_{\tau_n}$ to $\hat{\varphi}$, for $n = 3, 5, 20, 100$, we obtain the results displayed in Figure 8. As we can see, this operator is much more efficient in removing the bumps and restoring the function $\varphi$. We recall that, as $n$ increases, $\tau_n$ tends to $\frac{2\sigma}{\beta}$, and our operator becomes more effective at removing the noise.
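On a uniform grid, and under the reading that $F^\varepsilon$ and $F_\varepsilon$ take the max and min of the two samples at distance $\varepsilon$ (with functions assumed to vanish outside the sampled window), the composite $F^{\tau/2} \circ F_{\tau}$ can be discretized in a few lines. The names and discretization choices below are ours, given as a sketch rather than the paper's implementation:

```python
import numpy as np

def shift(phi, s):
    """Shift samples by s grid cells, padding with 0 (functions vanish off-window)."""
    out = np.zeros_like(phi)
    if s > 0:
        out[s:] = phi[:-s]
    elif s < 0:
        out[:s] = phi[-s:]
    else:
        out[:] = phi
    return out

def F_sup(phi, r):
    """Dilation F^eps: pointwise max of the samples at distance r grid cells."""
    return np.maximum(shift(phi, r), shift(phi, -r))

def F_inf(phi, r):
    """Erosion F_eps: pointwise min of the samples at distance r grid cells."""
    return np.minimum(shift(phi, r), shift(phi, -r))

def denoise(phi_hat, tau_cells):
    """Apply the composite F^{tau/2} o F_{tau}, with tau measured in grid cells."""
    return F_sup(F_inf(phi_hat, tau_cells), max(1, tau_cells // 2))
```

A one-sample upward spike is annihilated by the erosion step (its two probe points can never both hit the spike), after which the dilation has nothing left to restore, which is exactly the impulsive-noise-removal behaviour discussed here.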
As a matter of fact, when we apply a convolution with the function $T_h$ and measure the corresponding errors in the sup-norm, we obtain the results displayed in Figure 9. Since $\lim_{h \to +\infty} \hat{\varphi} * T_h = \hat{\varphi}$ and $\lim_{h \to +\infty} \hat{\varphi} * T_{1/h} = 0$, the errors tend to $\|\hat{\varphi} - \varphi\|_\infty = \max_{i=1,\dots,k} |a_i| = \bar{\alpha}$ and $\|\varphi\|_\infty$, respectively.
In contrast, if we apply the operator $F^{\tau_n/2} \circ F_{\tau_n}$, we obtain the results displayed in Figure 10. This shows that the upper bound for the error stated in Corollary 1 is quite tight. As we could expect, the best denoising is achieved when we replace $\tau_n$ with $\frac{2\sigma}{\beta}$ (see Figure 11).
Finally, we executed 1000 simulations. In each simulation, we added random impulsive noise to the function $\varphi$ to produce a function $\hat{\varphi}$. We then applied $F^{\sigma/\beta} \circ F_{2\sigma/\beta}$ to $\hat{\varphi}$ to see how close the upper bound in Corollary 1 is to the actual error $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$. The same parameters as at the beginning of this example were used. As we can see in Figure 12, the overestimation committed by our upper bound is often quite close to zero relative to the Lipschitz constant $L$ of the function $\varphi$.
This suggests that this upper bound is quite accurate.

6.1.2. Second Example

Let us consider the function
$$\varphi(x) := \begin{cases} \frac{1}{1000}(x-5)(x-3)(x+1)(x+4)(x+5) & \text{if } -5 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
for $x \in \mathbb{R}$. The coefficient $\frac{1}{1000}$ was chosen so that the Lipschitz constant $L$ would be comparable to the one in the previous example. In this example, $L = \frac{27}{25}$. Figure 13 illustrates how the function $\hat{\varphi}$ compares to $\varphi$.
We will start by considering how well the convolution $\hat{\varphi} * T_n$ can approximate the original function $\varphi$ as $n$ goes from 3 to 100. From Figure 14, it is immediately apparent that the max-norm distance between $\hat{\varphi} * T_n$ and $\varphi$ remains quite large.
If we apply a convolution with $T_{1/n}$, for $3 \le n \le 100$, we obtain the results displayed in Figure 15, showing that all the information represented by the function $\varphi$ is progressively destroyed.
In contrast, if we apply the operator $F^{\tau_n/2} \circ F_{\tau_n}$ to $\hat{\varphi}$, for $n = 3, 5, 20, 100$, we obtain the results displayed in Figure 16. As we can see, this operator is much more efficient in removing the bumps and restoring the function $\varphi$.
When we apply a convolution with the function $T_h$ and measure the corresponding errors in the sup-norm, we obtain the results displayed in Figure 17. As we have already seen, since $\lim_{h \to +\infty} \hat{\varphi} * T_h = \hat{\varphi}$ and $\lim_{h \to +\infty} \hat{\varphi} * T_{1/h} = 0$, the errors tend to $\|\hat{\varphi} - \varphi\|_\infty = \max_{i=1,\dots,k} |a_i| = \bar{\alpha}$ and $\|\varphi\|_\infty$, respectively.
In contrast, if we apply the operator $F^{\tau_n/2} \circ F_{\tau_n}$, we obtain the results displayed in Figure 18. This shows that the upper bound for the error stated in Corollary 1 is quite tight. As we could expect, the best denoising is achieved when we replace $\tau_n$ with $\frac{2\sigma}{\beta}$ (see Figure 19).
Finally, we again executed 1000 simulations. We used the same methodology as in the previous case, but this time we considered the polynomial presented at the beginning of this example. As we can see in Figure 20, the overestimation committed by our upper bound is often quite close to zero relative to the Lipschitz constant L of the polynomial φ .

6.2. Experiments

In order to check how good the upper bound stated in Theorem 3 is, we carried out the following experiment. In the first step, we fixed $\ell = 20$ and $\sigma = 1.1$, and assumed the following parameters to be given: $L > 0$, $N \in \mathbb{N}$, $\alpha > 0$, $\beta > 0$, $k \in \mathbb{N}$, $a_1, \dots, a_k \in \left]0, \alpha\right[$, $b_1, \dots, b_k > \beta$, $c_1, \dots, c_k \in \left]0, \ell\right[$.
Firstly, we used the parameters $L > 0$ and $N \in \mathbb{N}$ to generate a random $L$-Lipschitz function $\varphi$ in the following way. We randomly chose, with uniform distribution, $N$ points $x_1, \dots, x_N$ in the open interval $\left]0, \ell\right[$ and sorted them in ascending order, obtaining the decomposition $\left]0, \ell\right[ \, = \, \left]0, x_1\right] \cup \left]x_1, x_2\right] \cup \dots \cup \left]x_N, \ell\right[$. We defined our Lipschitz function $\varphi$ to be 0 outside $\left]0, \ell\right[$. After setting $(x_0, y_0) = (0, 0)$ and $(x_{N+1}, y_{N+1}) = (\ell, 0)$, for $i \in \{1, \dots, N\}$ the value $y_i$ of the function at $x_i$ was randomly chosen, with uniform distribution, in an interval that allows for an $L$-Lipschitz extension of the function to $[0, \ell]$, i.e.,
$$\left[ \max\{y_{i-1} - L(x_i - x_{i-1}), \, -L(\ell - x_i)\}, \;\; \min\{y_{i-1} + L(x_i - x_{i-1}), \, L(\ell - x_i)\} \right].$$
Finally, the graph of the Lipschitz function $\varphi$ on $[0, \ell]$ was obtained by connecting each point $(x_{i-1}, y_{i-1})$ to $(x_i, y_i)$ with a segment, for $i \in \{1, \dots, N+1\}$. We observe that the function $\varphi$ constructed in this way is $L$-Lipschitz.
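The construction above can be sketched in a few lines; this is an illustrative reconstruction (function and variable names are ours), not the authors' code. Each $y_i$ is drawn inside the interval that keeps both the slope to the previous node and the slope needed to reach $(\ell, 0)$ bounded by $L$:

```python
import numpy as np

def random_lipschitz(L, N, ell, rng=np.random.default_rng(1)):
    """Random L-Lipschitz function on [0, ell], zero at both endpoints.
    Returns the nodes (xs, ys) of the piecewise-linear graph."""
    xs = np.sort(rng.uniform(0.0, ell, size=N))            # sorted interior nodes
    xs = np.concatenate(([0.0], xs, [ell]))
    ys = np.zeros(N + 2)                                   # y_0 = y_{N+1} = 0
    for i in range(1, N + 1):
        # interval allowing an L-Lipschitz extension to (ell, 0)
        lo = max(ys[i - 1] - L * (xs[i] - xs[i - 1]), -L * (ell - xs[i]))
        hi = min(ys[i - 1] + L * (xs[i] - xs[i - 1]),  L * (ell - xs[i]))
        ys[i] = rng.uniform(lo, hi)
    return xs, ys   # graph: join (x_{i-1}, y_{i-1}) to (x_i, y_i) by segments
```

One can check inductively that the sampling interval is never empty, since the clamp guarantees $|y_{i-1}| \le L(\ell - x_{i-1})$ at every step, so every slope of the resulting polyline is at most $L$ in absolute value.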
Secondly, we used the parameters $\alpha > 0$, $\beta > 0$, $k \in \mathbb{N}$, $a_1, \dots, a_k \in \left]0, \alpha\right[$, $b_1, \dots, b_k > \beta$, $c_1, \dots, c_k \in \left]0, \ell\right[$ to generate a noise function as follows. We considered the mother function $\psi$ defined by setting $\psi(x) := e^{1 - \frac{1}{1 - x^2}}$ for $x \in \left]-1, 1\right[$ and $\psi(x) := 0$ for $x \notin \left]-1, 1\right[$. For each $L$-Lipschitz function $\varphi$ produced in the previously described way, we considered the function $\hat{\varphi}(x) = \varphi(x) + \sum_{i=1}^{k} a_i \, \psi(b_i(x - c_i))$.
In Figure 21 and Figure 22, some examples of the functions we have produced are displayed for N = 3 and N = 7 , respectively. In each figure, the functions are displayed without noise (left) and with added noise (right).
In the second step, we fixed $\ell = 20$ and $\sigma = 1.1$ again, and considered a probabilistic model where $\alpha$, $\beta$, and $L$ are given and the values $N$, $k$, $a_i$, $b_i$, $c_i$ are random variables. In this setting, we compared the noise $\|\hat{\varphi} - \varphi\|_\infty$ with the probabilistic upper bound stated in Theorem 3 and with the value $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$, which represents the reduced noise that we can obtain by applying our method. In order to average our results, for each triplet $(\alpha, \beta, L)$ with $\alpha \in \{50, 55, 60, \dots, 100\}$, $\beta \in \{3, 4, 5, \dots, 13\}$, and $L \in \{1, 2, \dots, 10\}$, we randomly generated 100 examples of an $L$-Lipschitz function $\varphi$ and its noisy version $\hat{\varphi}$ by randomly choosing the parameters $N$, $k$, $a_i$, $b_i$, $c_i$ according to the following distributions:
  • $N \sim \mathrm{Unif}\{1, \dots, 10\}$
  • $k \sim \mathrm{Unif}\{1, \dots, 10\}$
  • $a_i \sim \mathrm{Unif}[0, \alpha]$ for $i = 1, \dots, k$
  • $b_i \sim \mathrm{Unif}[\beta, 20]$ for $i = 1, \dots, k$
  • $c_i \sim \mathrm{Unif}\left[3\frac{\sigma}{\beta}, \ell - 3\frac{\sigma}{\beta}\right]$ for $i = 1, \dots, k$.
Then, for the chosen values of each one of the three variables $\alpha$, $\beta$, $L$, we computed the mean, with respect to the other two variables, of the average of the noise $\|\hat{\varphi} - \varphi\|_\infty$ and of the average of the reduced noise $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ obtained by our method, both evaluated as $(\varphi, \hat{\varphi})$ varies in the set of cardinality 100 that we produced. We also computed the mean of the probabilistic upper bound stated in Theorem 3 with respect to the same two variables. The results are displayed in Figure 23, Figure 24 and Figure 25. We remind the reader that $\alpha$ and $\beta$, respectively, express the maximum height and the thinness of the noise bumps, while $L$ is a Lipschitz constant for each function $\varphi$ we are interested in. These results illustrate the effectiveness of using GENEOs in the reduction of impulsive noise.

7. Conclusions

In this paper, we have proved a stability property for persistence diagrams of functions from $\mathbb{R}$ to $\mathbb{R}$ in the presence of impulsive noise. This property shows that TDA can also be useful when noise drastically changes the topology of the sublevel sets of the filtering functions under consideration, and it highlights some new potential interactions between TDA and the theory of GENEOs. The experimental section shows that our approach is indeed able to remove impulsive noise in most cases. It would be interesting to investigate the possibility of extending our method to real-valued functions defined on $n$-dimensional domains and of allowing the function $\psi$ to be a random variable, by selecting suitable GENEOs. We observe that the operators $F^\varepsilon$ and $F_\varepsilon$ introduced in Definition 3 can be easily adapted to the case of functions from $\mathbb{R}^n$ to $\mathbb{R}$ by setting $F^\varepsilon(\varphi)(x) = \max\{\varphi(x + v) \mid v \in \mathbb{R}^n, \|v\| = \varepsilon\}$ and $F_\varepsilon(\varphi)(x) = \min\{\varphi(x + v) \mid v \in \mathbb{R}^n, \|v\| = \varepsilon\}$ for all $x \in \mathbb{R}^n$. Applying a suitable operator $F^\delta \circ F_\varepsilon$ to real-valued filtering functions defined on $\mathbb{R}^n$ should allow us to control the expected value of the perturbation of persistence diagrams caused by uniformly distributed impulsive noise, even for $n > 1$ and in degrees greater than 0. Our long-term goal is to develop Machine Learning methods in which optimal operators for impulsive noise removal can be learned and selected in the convex hull of a given finite set of GENEOs, using a theorem that guarantees that any convex combination of GENEOs is still a GENEO (cf. [11]). We plan to devote our future research to this topic.
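For instance, for $n = 2$ the operators just described can be approximated on a pixel grid by taking the max (respectively, min) over integer offsets discretizing the circle of radius $\varepsilon$. The following sketch is purely illustrative; the number of sampled directions and the helper names are our own choices:

```python
import numpy as np

def circle_offsets(r_cells, n_dirs=16):
    """Integer offsets approximating the circle of radius r_cells pixels."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    return {(int(round(r_cells * np.cos(t))), int(round(r_cells * np.sin(t))))
            for t in angles}

def shift2d(phi, di, dj):
    """Shift a 2D sample array by (di, dj) cells, padding with 0."""
    out = np.zeros_like(phi)
    src_i = slice(max(0, -di), phi.shape[0] - max(0, di))
    dst_i = slice(max(0, di), phi.shape[0] - max(0, -di))
    src_j = slice(max(0, -dj), phi.shape[1] - max(0, dj))
    dst_j = slice(max(0, dj), phi.shape[1] - max(0, -dj))
    out[dst_i, dst_j] = phi[src_i, src_j]
    return out

def F_sup2d(phi, r_cells):
    """Dilation F^eps: pointwise max over the discretized circle of radius r_cells."""
    return np.max([shift2d(phi, di, dj) for di, dj in circle_offsets(r_cells)], axis=0)

def F_inf2d(phi, r_cells):
    """Erosion F_eps: pointwise min over the discretized circle of radius r_cells."""
    return np.min([shift2d(phi, di, dj) for di, dj in circle_offsets(r_cells)], axis=0)
```

As in the one-dimensional case, the erosion annihilates a one-pixel spike (no point of the grid sees the spike in every sampled direction), so a composite $F^\delta \circ F_\varepsilon$ built from these operators behaves as an impulsive-noise filter on images as well.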

Author Contributions

P.F. devised the project. All authors contributed equally to the manuscript. The authors of this paper have been listed in alphabetical order. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the European Union in the context of the H2020 EU Marie Curie Initial Training Network project named WAKEUPCALL, and by INdAM-GNSAGA.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Edelsbrunner, H.; Morozov, D. Persistent homology: Theory and practice. In European Congress of Mathematics; European Mathematical Society: Zürich, Switzerland, 2013; pp. 31–50.
  2. Biasotti, S.; Floriani, L.D.; Falcidieno, B.; Frosini, P.; Giorgi, D.; Landi, C.; Papaleo, L.; Spagnuolo, M. Describing shapes by geometrical-topological properties of real functions. ACM Comput. Surv. 2008, 40, 12:1–12:87.
  3. Carlsson, G. Topology and data. Bull. Amer. Math. Soc. 2009, 46, 255–308.
  4. Edelsbrunner, H.; Harer, J. Persistent homology—A survey. In Surveys on Discrete and Computational Geometry; American Mathematical Society: Providence, RI, USA, 2008; Volume 453, pp. 257–282.
  5. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. Discrete Comput. Geom. 2007, 37, 103–120.
  6. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J.; Mileyko, Y. Lipschitz functions have Lp-stable persistence. Found. Comput. Math. 2010, 10, 127–139.
  7. Fasy, B.T.; Lecci, F.; Rinaldo, A.; Wasserman, L.; Balakrishnan, S.; Singh, A. Confidence sets for persistence diagrams. Ann. Stat. 2014, 42, 2301–2339.
  8. Buchet, M.; Chazal, F.; Dey, T.K.; Fan, F.; Oudot, S.Y.; Wang, Y. Topological analysis of scalar fields with outliers. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG 2015), Eindhoven, The Netherlands, 22–25 June 2015; Arge, L., Pach, J., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2015; Volume 34, pp. 827–841.
  9. Adler, R.J.; Agami, S. Modelling persistence diagrams with planar point processes, and revealing topology with bagplots. J. Appl. Comput. Topol. 2019, 3, 139–183.
  10. Vishwanath, S.; Fukumizu, K.; Kuriki, S.; Sriperumbudur, B.K. Robust persistence diagrams using reproducing kernels. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 21900–21911.
  11. Bergomi, M.G.; Frosini, P.; Giorgi, D.; Quercioli, N. Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning. Nat. Mach. Intell. 2019, 1, 423–433.
  12. Conti, F.; Frosini, P.; Quercioli, N. On the construction of Group Equivariant Non-Expansive Operators via permutants and symmetric functions. Front. Artif. Intell. 2022, 5, 786091.
  13. Bocchi, G.; Botteghi, S.; Brasini, M.; Frosini, P.; Quercioli, N. On the finite representation of linear group equivariant operators via permutant measures. Ann. Math. Artif. Intell. 2023, 1–23.
  14. Frosini, P.; Jabłoński, G. Combining persistent homology and invariance groups for shape comparison. Discrete Comput. Geom. 2016, 55, 373–409.
  15. Cerri, A.; Fabio, B.D.; Ferri, M.; Frosini, P.; Landi, C. Betti numbers in multidimensional persistent homology are stable functions. Math. Methods Appl. Sci. 2013, 36, 1543–1557.
  16. Cerri, A.; Ethier, M.; Frosini, P. On the geometrical properties of the coherent matching distance in 2D persistent homology. J. Appl. Comput. Topol. 2019, 3, 381–422.
  17. Micheletti, A. A new paradigm for artificial intelligence based on group equivariant non-expansive operators. Eur. Math. Soc. Mag. 2023, 128, 4–12.
  18. Vaseghi, S.V. Impulsive Noise: Modelling, Detection and Removal; John Wiley and Sons, Ltd.: Chichester, UK, 2008; Chapter 13; pp. 341–358.
  19. Earnest, M. Average Minimum Distance between n Points Generated i.i.d. with Uniform Dist. Mathematics Stack Exchange. Available online: https://math.stackexchange.com/q/2001026 (accessed on 13 July 2021).
Figure 1. How the sublevel sets change with respect to the filtration induced by φ .
Figure 2. Persistence diagram of the function $2\sin x$ on $\left[0, \frac{3}{4}\pi\right]$. The point $(0, \infty)$ describes the existence of a connected component that is born at 0 and never dies. The point $(\sqrt{2}, 2)$ states that there is a connected component born at $\sqrt{2}$ that dies (merges with the other one) at 2. The trivial points on the diagonal $u = v$ are not displayed.
Figure 3. How drastically impulsive noise can influence persistence diagrams.
Figure 4. In the figure on the left, we have our original function φ . In the middle we have a noise function R, and in the right figure, we have the corrupted function φ ^ : = φ + R .
Figure 5. Example 1: Comparison between φ and φ ^ .
Figure 6. Example 1: Denoising via convolution with T h ( h = n ).
Figure 7. Example 1: Denoising via convolution with $T_h$ ($h = 1/n$).
Figure 8. Example 1: Denoising via the proposed GENEO $F^{\tau_n/2} \circ F_{\tau_n}$.
Figure 9. Example 1: Error made by applying a convolution with $T_n$ (left) or $T_{1/n}$ (right).
Figure 10. Example 1: Error made by using the GENEO $F^{\tau_n/2} \circ F_{\tau_n}$.
Figure 11. Example 1: Denoising via $F^{\sigma/\beta} \circ F_{2\sigma/\beta}$.
Figure 12. Example 1: Histogram counting the number of cases in each of ten bins for the overestimation value $3L\frac{\sigma}{\beta} - \left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ obtained in our simulations. Most of the cases belong to the first bin, specifically 665 cases, containing the functions $\varphi$ for which $0 \le 3L\frac{\sigma}{\beta} - \left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty \le 0.02$.
Figure 13. Example 2: Comparison between φ and φ ^ .
Figure 14. Example 2: Denoising via convolution with T h ( h = n ).
Figure 15. Example 2: Denoising via convolution with $T_h$ ($h = 1/n$).
Figure 16. Example 2: Denoising via the proposed GENEO $F^{\tau_n/2} \circ F_{\tau_n}$.
Figure 17. Example 2: Error made by applying a convolution with $T_n$ (left) or $T_{1/n}$ (right).
Figure 18. Example 2: Error made by using the GENEO $F^{\tau_n/2} \circ F_{\tau_n}$.
Figure 19. Example 2: Denoising via $F^{\sigma/\beta} \circ F_{2\sigma/\beta}$.
Figure 20. Example 2: Histogram counting the number of cases in each of ten bins for the overestimation value $3L\frac{\sigma}{\beta} - \left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ obtained in our simulations: 431 cases belong to the first bin and 154 belong to the second one; hence, most of the cases belong to the first two bins, containing the functions $\varphi$ for which $0 \le 3L\frac{\sigma}{\beta} - \left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty \le 0.24$.
Figure 21. Five examples of the functions we produced for N = 3 . In each figure, the functions are displayed without noise (left) and with added noise (right).
Figure 22. Five examples of the functions we produced for N = 7 . In each figure, the functions are displayed without noise (left) and with added noise (right).
Figure 23. Plots of the means of the averaged values of $\|\hat{\varphi} - \varphi\|_\infty$ (yellow), the means of $3L\frac{\sigma}{\beta} + k\alpha\left(1 - \left(1 - \frac{8(k-1)\sigma}{\beta\ell}\right)^{k}\right)$ (brown), and the means of the averaged values of $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ (blue) for $\beta \in \{3, 4, 5, \dots, 13\}$ and $L \in \{1, 2, \dots, 10\}$, when $\alpha$ varies in the set $\{50, 55, 60, \dots, 100\}$.
Figure 24. Plots of the means of the averaged values of $\|\hat{\varphi} - \varphi\|_\infty$ (yellow), the means of $3L\frac{\sigma}{\beta} + k\alpha\left(1 - \left(1 - \frac{8(k-1)\sigma}{\beta\ell}\right)^{k}\right)$ (brown), and the means of the averaged values of $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ (blue) for $\alpha \in \{50, 55, 60, \dots, 100\}$ and $L \in \{1, 2, \dots, 10\}$, when $\beta$ varies in the set $\{3, 4, 5, \dots, 13\}$.
Figure 25. Plots of the means of the averaged values of $\|\hat{\varphi} - \varphi\|_\infty$ (yellow), the means of $3L\frac{\sigma}{\beta} + k\alpha\left(1 - \left(1 - \frac{8(k-1)\sigma}{\beta\ell}\right)^{k}\right)$ (brown), and the means of the averaged values of $\left\| F^{\sigma/\beta} \circ F_{2\sigma/\beta}(\hat{\varphi}) - \varphi \right\|_\infty$ (blue) for $\alpha \in \{50, 55, 60, \dots, 100\}$ and $\beta \in \{3, 4, 5, \dots, 13\}$, when $L$ varies in the set $\{1, 2, \dots, 10\}$.

Frosini, P.; Gridelli, I.; Pascucci, A. A Probabilistic Result on Impulsive Noise Reduction in Topological Data Analysis through Group Equivariant Non-Expansive Operators. Entropy 2023, 25, 1150. https://doi.org/10.3390/e25081150
