A Probabilistic Result on Impulsive Noise Reduction in Topological Data Analysis through Group Equivariant Non-Expansive Operators

In recent years, group equivariant non-expansive operators (GENEOs) have started to find applications in the fields of Topological Data Analysis and Machine Learning. In this paper we show how these operators can be of use also for the removal of impulsive noise and to increase the stability of TDA in the presence of noisy data. In particular, we prove that GENEOs can control the expected value of the perturbation of persistence diagrams caused by uniformly distributed impulsive noise, when data are represented by L-Lipschitz functions from R to R.


Introduction
In the last thirty years, Topological Data Analysis (TDA) has developed into a useful mathematical theory for analyzing data, benefiting from the reduction of dimensionality guaranteed by topology [3,5,11]. One of the main tools in TDA is the persistence diagram, a collection of points in the real plane that describes the homological changes of the sublevel sets of suitable continuous functions. These changes give important information about the data of interest, focusing on some of their most relevant properties. Persistence diagrams can be used in the presence of noise, since a well-known stability theorem states that these topological descriptors change in a controlled way when the functions expressing the filtrations of interest change in a controlled way with respect to the sup-norm [8]. Furthermore, the $L^p$-stability of persistence diagrams with respect to the sup-norm has been proved in [9]. Unfortunately, in many applications the sup-norm of the noise is not guaranteed to be small, and hence these results cannot be directly applied. In particular, they cannot be directly used when data are affected by impulsive noise.
Analogously, in the discrete setting of TDA the presence of outliers in point clouds can drastically affect the corresponding persistence diagrams. The problem of managing outliers has been studied by several authors using different techniques. In [12] an approach based on confidence sets was introduced. A method inspired by k-nearest neighbors regression and local median filtering was used in [4], while the concept of bagplot was applied in [1]. In [16] an approach based on reproducing kernels was proposed.
In our paper we start exploring a different probabilistic approach, in the topological setting. The main idea is that studying the properties of data is a process that should be primarily based on the analysis of the observers that are asked to examine the data, since we cannot ignore that different observers can judge the same data differently. This approach was initially proposed in [2] and requires both the definition of the space of Group Equivariant Non-Expansive Operators (GENEOs) and the development of geometrical techniques to move around in this space [10]. In other words, in this model we should not ask how to manage the data, but rather how to manage the observers (i.e., GENEOs) analyzing the data. As an initial step in this direction, in this paper we show how GENEOs can be used to get stability of persistence diagrams of 1D-signals in the presence of impulsive noise.
GENEOs have been studied in [13] as a new tool in TDA, since they allow for an extension of the theory that is not invariant under the action of every homeomorphism of the considered domain. This is important in applications where the invariance group is not the group of all homeomorphisms, such as the ones concerning shape comparison. Interestingly, GENEOs are also deeply related to the foliation method used to define the matching distance in 2-dimensional persistent homology [6,7] and can be seen as a theoretical bridge between TDA and Machine Learning [2]. Furthermore, these operators make available lower bounds for the natural pseudo-distance $d_G(ϕ_1, ϕ_2) := \inf_{g ∈ G} \|ϕ_1 - ϕ_2 \circ g\|_\infty$, associated with a group G of self-homeomorphisms of the domain of the signals $ϕ_1, ϕ_2$ [13].
In our paper, we prove that GENEOs can control the expected value of the perturbation of persistence diagrams caused by uniformly distributed impulsive noise, when data are represented by L-Lipschitz functions from ℝ to ℝ. In order to do that, we choose a "mother" bump function ψ, i.e. a continuous function that is non-negative, upper bounded by 1 and compactly supported, and we assume that our noise is made up of finitely many bumps, each obtained by translating, heightening and/or widening ψ. The function φ = ϕ + R represents the corrupted data, where the noise is given by the function $R(x) := \sum_{i=1}^{k} a_i ψ(b_i(x - c_i))$ for some real numbers $a_i, b_i, c_i$ and some positive integer k.
In this situation, trying to use a convolution to approximate our starting data is not effective, because even if it does contract the bumps, it does not cut them, and hence does not improve the sup-norm distance from the original data. A classical approach for the removal of impulsive noise is to use a median filter [15]. Although this approach would be quite efficient in the discrete case, let us remark that in our setting we are considering continuous functions. The analogue of the median for the continuous case would be defined as the interval of those values m such that $\int_{-\infty}^{m} f(x)\,dx ≥ \frac{1}{2}$ and $\int_{m}^{+\infty} f(x)\,dx ≥ \frac{1}{2}$, where f is a probability density. However, this operator is not stable: a small alteration of the starting function could lead to a significant change of the median.
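The remark on convolutions can be illustrated numerically. The following is a minimal sketch in Python (standard library only), not the experiment reported later in the paper: the bump profile `bump`, the signal sin x, the grid step and the window sizes are all assumptions chosen for illustration. It smooths a signal corrupted by one thin bump with box kernels of increasing width and measures the remaining sup-norm error.

```python
import math

def bump(x):
    """Hypothetical mother bump: continuous, values in [0, 1], zero outside ]-1, 1[."""
    d = 1.0 - x * x
    return math.exp(1.0 - 1.0 / d) if d > 0.0 else 0.0

def moving_average(values, half_window):
    """Discrete convolution with a normalized box kernel (truncated at the borders)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def sup_dist(u, v):
    """Sup-norm distance between two sampled functions."""
    return max(abs(a - b) for a, b in zip(u, v))

STEP = 0.01
xs = [i * STEP for i in range(-1000, 1001)]             # grid on [-10, 10]
clean = [math.sin(x) for x in xs]                        # a 1-Lipschitz signal
noisy = [c + 5.0 * bump(20.0 * x) for c, x in zip(clean, xs)]  # thin bump of height 5 at 0

inner = slice(100, -100)   # compare away from the borders of the grid
errors = {}
for half_window in (5, 20, 80):
    smoothed = moving_average(noisy, half_window)
    errors[half_window] = sup_dist(smoothed[inner], clean[inner])

for half_window, err in errors.items():
    print(f"half window {half_window:3d} samples: sup-norm error {err:.3f}")
```

In this toy configuration the bump is flattened but never removed: the residual error shrinks only as the window widens, and a wide window also starts to distort the underlying signal.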
The operators we consider in this paper are $F_δ(ϕ)(x) = \max(ϕ(x - δ), ϕ(x + δ))$ and $F_ε(ϕ)(x) = \min(ϕ(x - ε), ϕ(x + ε))$. The main idea is that $F_δ$ cuts the noise "directed downwards" and $F_ε$ the noise "directed upwards", and hence their composition should be able to eliminate all the bumps. These operators are GENEOs with respect to isometries of the real line. We prove that, moving in the space of GENEOs by taking suitable values for ε and δ, we can get quite close to restoring the original function ϕ, depending on how the bumps are positioned. The closer the bumps are to being Dirac delta functions and the further they are from each other, the better our approximation can be. On the basis of this result, we finally get an estimate of the expected value of the error.
The paper is structured as follows. In Section 2 the mathematical background is laid. The case we consider and our notation are explained in Section 3. In Section 4 we prove the results that are needed in order to demonstrate our main result. In Section 5 the main theorem giving us a probabilistic upper bound is formulated. In Section 6 some examples and experiments are presented in order to better illustrate the use of our results. A brief discussion concludes the paper.

Mathematical setting
In this section we will recall some basic concepts we will use in this paper.

Representing data as real functions
Let us consider a set Φ of bounded functions from a set X to ℝ, which will represent the data we wish to compare. We shall call Φ the set of admissible measurements on X. We endow Φ with the topology induced by the sup-norm $\|\cdot\|_\infty$ and the corresponding distance $D_Φ$. A pseudo-metric $D_X$ can be defined on X by setting $D_X(x_1, x_2) := \sup_{ϕ ∈ Φ} |ϕ(x_1) - ϕ(x_2)|$. We recall that a pseudo-metric on a set X is a distance d without the property d(x, y) = 0 ⟹ x = y. We will consider the topological space $(X, τ_{D_X})$, where $τ_{D_X}$ is the topology induced by $D_X$. A base for this topology is given by the open balls $B(x, r) := \{x' ∈ X : D_X(x, x') < r\}$ with x ∈ X, r ∈ ℝ. The choice of this topology makes every function in Φ a continuous function. As shown in [2], this fact enables us to use persistence diagrams in the study of Φ.

GENEOs as operators acting on data
We are interested in considering transformations of data. Let $Homeo_Φ(X)$ be the set of Φ-preserving homeomorphisms from X to X with respect to the topology $τ_{D_X}$, meaning that every g in $Homeo_Φ(X)$ is a homeomorphism of X such that both $ϕ \circ g$ and $ϕ \circ g^{-1}$ belong to Φ for every ϕ in Φ. Let G be a subgroup of $Homeo_Φ(X)$. G represents the set of transformations on data for which we will require equivariance to be respected. Under the previously stated assumptions we call the ordered pair (Φ, G) a perception pair. We can now introduce the concept of GENEO.
Definition 1 Let (Φ, G) and (Ψ, H) be perception pairs and assume that a homomorphism T : G → H is given. A function F : Φ → Ψ is called a Group Equivariant Non-Expansive Operator (GENEO) from (Φ, G) to (Ψ, H) with respect to T if the following properties hold:
1. (Group equivariance) $F(ϕ \circ g) = F(ϕ) \circ T(g)$ for every ϕ ∈ Φ and every g ∈ G;
2. (Non-expansivity) $D_Ψ(F(ϕ_1), F(ϕ_2)) ≤ D_Φ(ϕ_1, ϕ_2)$ for every $ϕ_1, ϕ_2 ∈ Φ$.
Let us now consider the set $\mathcal{F}_{all}$ of all GENEOs from (Φ, G) to (Ψ, H) with respect to T : G → H. The space $\mathcal{F}_{all}$ is endowed with the extended pseudo-metric $D_{\mathcal{F}_{all}}$, defined by setting $D_{\mathcal{F}_{all}}(F_1, F_2) = \sup_{ϕ ∈ Φ} D_Ψ(F_1(ϕ), F_2(ϕ))$ for every $F_1, F_2 ∈ \mathcal{F}_{all}$. The word extended refers to the possibility that $D_{\mathcal{F}_{all}}$ takes an infinite value.

The following result can be proven [2]:
Theorem 1 If (Φ, D Φ ),(Ψ, D Ψ ) are compact and convex then the metric space (F all , D F all ) is compact and convex.
If a non-empty set $\mathcal{F} ⊆ \mathcal{F}_{all}$ is fixed, we can define the following pseudo-distance $D_{\mathcal{F},Φ}$ on Φ: $D_{\mathcal{F},Φ}(ϕ_1, ϕ_2) := \sup_{F ∈ \mathcal{F}} \|F(ϕ_1) - F(ϕ_2)\|_\infty$. This pseudo-distance allows us to compare data by taking into account how agents operate on data. Notice that, if G becomes larger, the natural pseudo-distance $d_G$ becomes harder to compute, but this new pseudo-distance $D_{\mathcal{F},Φ}$ becomes easier to evaluate. In order to find a lower bound for $D_{\mathcal{F},Φ}$ it is useful to introduce the notion of persistence diagram.

Persistence Diagrams
We will now recall some basic definitions and results in persistent homology. The interested reader can find more details in [11].
Let us consider an ordered pair (X, ϕ) where X is a topological space and ϕ : X → ℝ is a continuous function. For any t ∈ ℝ we can set $X_t := ϕ^{-1}((-\infty, t])$. If u < v, the inclusion of $X_u$ into $X_v$ induces a homomorphism $i_k^{u,v} : H_k(X_u) \to H_k(X_v)$ between the k-th homology groups of $X_u$ and $X_v$. We can define the k-th persistent homology group, with respect to ϕ and computed at the point (u, v), as $PH_k(u, v) := i_k^{u,v}(H_k(X_u))$. Moreover, we can define the k-th persistent Betti numbers function $r_k(u, v)$ as the rank of $PH_k(u, v)$.
The k-th persistent Betti numbers function can be represented by the k-th persistence diagram. This diagram is defined as the multi-set of all the ordered pairs $(u_j, v_j)$, where $u_j$ and $v_j$ are the times of birth and death of the j-th k-dimensional hole in X, respectively. We call time of birth of a hole the first time at which the homology class appears, and time of death the first time at which the homology class merges with an older one. When a hole never dies, we set its time of death equal to ∞. We also add to this set all points of the form (w, w) for w ∈ ℝ.
In Figure 1 the filtration of the set $X := [0, \frac{3}{4}π]$ given by the function ϕ(x) = 2 sin x is illustrated. In this example, the topology on X is the one defined by the space Φ of admissible functions given by all functions a sin(x + b) with a, b ∈ ℝ. The reader can easily check that this topology coincides with the Euclidean topology. The persistence diagram in degree k = 0 of the function ϕ is displayed in Figure 2.
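The degree-0 diagram of this example can be reproduced approximately on a grid. The sketch below is a minimal Python illustration (standard library only) under stated assumptions: the function is sampled at finitely many points, the domain is treated as a path graph, and the helper name `persistence_0d` is hypothetical. It uses a union-find structure and the rule that, when two components merge, the younger one dies.

```python
import math

def persistence_0d(values):
    """Degree-0 sublevel-set persistence pairs of a function sampled on a path graph."""
    n = len(values)
    parent = [-1] * n          # -1 marks vertices not yet in the filtration
    birth = [0.0] * n          # birth value, meaningful at component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda j: values[j]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # The younger component (larger birth value) dies at this level.
                if birth[a] < birth[b]:
                    a, b = b, a
                if values[i] > birth[a]:
                    pairs.append((birth[a], values[i]))
                parent[a] = b
    pairs.append((min(values), math.inf))   # the oldest component never dies
    return sorted(pairs)

n = 300
xs = [0.75 * math.pi * i / n for i in range(n + 1)]    # grid on [0, 3*pi/4]
diagram = persistence_0d([2.0 * math.sin(x) for x in xs])
print(diagram)
```

On this grid the output consists of one essential point born at 0 and one finite point close to (√2, 2), in agreement with the description of Figure 2; points on the diagonal are not produced.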

Comparing Persistence Diagrams
Persistence diagrams can be efficiently compared by means of a suitable metric $d_{match}$. In order to define it, we first define the pseudo-distance For every degree k we can now define a new pseudo-metric: where $D_{F(ϕ_1)}$, $D_{F(ϕ_2)}$ are the persistence diagrams at degree k of the functions $F(ϕ_1)$, $F(ϕ_2)$, respectively. In this paper we will limit ourselves to considering data represented as functions from ℝ to ℝ, and we recall that for this kind of data persistence diagrams are non-trivial only in degree k = 0 (i.e., when persistent homology is used to count connected components). For this reason, in the following we will always assume k = 0.
We will assume our noise to be made up of a finite number of copies of a "mother" non-negative continuous bump function ψ : ℝ → ℝ, such that supp(ψ) ⊆ ]−σ, σ[ for some σ > 0 and $\|ψ\|_\infty ≤ 1$. We recall that the support of a function f is the closure of the set of points where f is non-vanishing. After fixing two positive real numbers η, β, the noise we will be adding is a function R belonging to the space $\mathcal{R}_{η,β}$ that contains the null function and all functions of the form $R(x) = \sum_{i=1}^{k} a_i ψ(b_i(x - c_i))$, where k is a positive integer and $a_i, b_i, c_i$ are real numbers such that $|c_i - c_j| ≥ η$ for i ≠ j and $b_i ≥ β$ for every index i.
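The noise model can be made concrete with a short sketch in Python (standard library only). It assumes that each bump has the form a_i ψ(b_i (x − c_i)), so that a_i sets the height, b_i the width and c_i the position; the profile `psi` below is a hypothetical choice with σ = 1, and the function names are illustrative.

```python
import math
from itertools import combinations

SIGMA = 1.0  # assumption: supp(psi) is contained in ]-SIGMA, SIGMA[

def psi(x):
    """Hypothetical mother bump: continuous, non-negative, bounded by 1."""
    d = 1.0 - x * x
    return math.exp(1.0 - 1.0 / d) if d > 0.0 else 0.0

def noise(x, bumps):
    """Evaluate R(x) = sum_i a_i * psi(b_i * (x - c_i)); bumps is a list of (a, b, c)."""
    return sum(a * psi(b * (x - c)) for a, b, c in bumps)

def in_noise_space(bumps, eta, beta):
    """Check the two constraints: centers at least eta apart, every b_i at least beta."""
    centers_ok = all(abs(c1 - c2) >= eta
                     for (_, _, c1), (_, _, c2) in combinations(bumps, 2))
    widths_ok = all(b >= beta for _, b, _ in bumps)
    return centers_ok and widths_ok

bumps = [(3.0, 10.0, 0.0), (-2.0, 20.0, 1.0)]   # one upward and one downward bump
print(in_noise_space(bumps, eta=0.8, beta=10.0))
print(noise(0.0, bumps), noise(0.5, bumps), noise(1.0, bumps))
```

With this form, the i-th bump vanishes outside the interval of half-width σ/b_i ≤ σ/β centered at c_i, which is why the ratio σ/β measures how thin the bumps are.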
Our aim is to recover $ϕ ∈ Lip_L$ as well as possible from the function φ = ϕ + R. An example of such a situation is depicted in Figure 4.
The following result will be of use.
Proof For every ϕ ∈ C 0 (R), g ∈ G we have that

Cutting off the noise by GENEOs
We start by introducing two families of GENEOs from (C 0 (R), G) to (C 0 (R), G) with respect to the identical homomorphism.
Definition 3 Let $ϕ ∈ Lip_L$ and ε > 0. For all x ∈ ℝ we define:
Proposition 3 The maps F ε and Fε are GENEOs from $(C^0(ℝ), G)$ to $(C^0(ℝ), G)$ with respect to the identical homomorphism.
Proof We start by proving that F ε is G-equivariant.
Let g be the translation Since every isometry in G can be written as the composition of a symmetry and a translation, our statement follows.
Since Proposition 2 shows that the composition of GENEOs is still a GENEO, the operator $F_δ \circ F_ε$ is a GENEO from $(C^0(ℝ), G)$ to $(C^0(ℝ), G)$ with respect to the identical homomorphism.
We now want to prove that if a function φ is obtained by adding impulsive noise to a function ϕ, then the value of $\|F_δ \circ F_ε(φ) - ϕ\|_\infty$ is bounded, and possibly small, provided that δ and ε are suitably chosen. The main idea is that the operator $F_ε$ cuts the noise "directed upwards" and $F_δ$ cuts the noise "directed downwards".
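The cutting mechanism can be tried on sampled data. The following is a minimal sketch in Python (standard library only), assuming the two operators described in the Introduction: a pointwise maximum of the two δ-shifts and a pointwise minimum of the two ε-shifts. The grid, the bump profile `psi`, the bump parameters, the clamping of indices at the borders and the function names are assumptions made for illustration.

```python
import math

def psi(x):
    """Hypothetical mother bump with support in ]-1, 1[ (so sigma = 1)."""
    d = 1.0 - x * x
    return math.exp(1.0 - 1.0 / d) if d > 0.0 else 0.0

def shift_max(values, m):
    """max(f(x - delta), f(x + delta)) on a uniform grid, delta = m steps (indices clamped)."""
    n = len(values)
    return [max(values[max(i - m, 0)], values[min(i + m, n - 1)]) for i in range(n)]

def shift_min(values, m):
    """min(f(x - eps), f(x + eps)) on a uniform grid, eps = m steps (indices clamped)."""
    n = len(values)
    return [min(values[max(i - m, 0)], values[min(i + m, n - 1)]) for i in range(n)]

def sup_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

STEP, SIGMA, BETA, LIP = 0.01, 1.0, 20.0, 1.0
xs = [i * STEP for i in range(-500, 501)]                 # grid on [-5, 5]
clean = [math.sin(x) for x in xs]                          # 1-Lipschitz signal
bumps = [(5.0, 20.0, 0.0), (-4.0, 20.0, 3.0)]             # (height, width factor, center)
noisy = [c + sum(a * psi(b * (x - cen)) for a, b, cen in bumps)
         for c, x in zip(clean, xs)]

delta_steps = round(SIGMA / BETA / STEP)       # delta = sigma / beta
eps_steps = round(2.0 * SIGMA / BETA / STEP)   # epsilon = 2 * sigma / beta
denoised = shift_max(shift_min(noisy, eps_steps), delta_steps)

err_before = sup_dist(noisy, clean)
err_after = sup_dist(denoised, clean)
print(f"error before: {err_before:.3f}, after: {err_after:.3f}")
```

In this configuration the minimum removes the upward bump and duplicates the downward one, and the subsequent maximum removes both copies. The remaining error is controlled here by the elementary Lipschitz estimate L(δ + ε); this check is not meant to reproduce the sharper constants proved in the paper.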
In order to proceed, we need two lemmas.
Lemma 1 Let $R ∈ C^0(ℝ, ℝ)$, $ϕ ∈ Lip_L$ for some L ∈ ℝ, and set φ := ϕ + R. Then for any ε > 0 and δ > 0
Proof Since ϕ is Lipschitz of constant L we have that for any value The same steps applied to $F_δ$ yield the second statement of the lemma. As for the last claim we can see that: Analogously, we can prove the lower bound.
Henceforth we will assume that any summation on an empty set of indexes is the null function.
Proof We will suppose without loss of generality that c i < c i+1 for all i = 1, . . ., k −1 and a i = 0 for all indexes i.We want to show that at least one of x − ρ and x + ρ must always be outside Let us now consider {I − 1 , I + 1 , . . ., I − k , I + k }.We will now prove that any two distinct elements from this set must be disjoint.Since we have just proven that . ., k, the following holds in the case i < j: 1.
Now, given c i + ρ and c j − ρ centers of bumps of Fρ(R), we have that By noticing that F ρ (R) = −Fρ(−R) we get the second part of the thesis.
Let us remark that, in particular, if $λ = 2σ/β$ then F ρ (R), F ρ (R) ∈ $\mathcal{R}_{4σ/β, β}$. We observe that, for a ρ satisfying the hypotheses of Lemma 2 to exist, the inequality η ≥ 4λ must hold.
We are now actually ready to prove a key result in our paper.
Therefore, we have proved that
Let us remark that Theorem 4 works under the (only) implicit assumption that $θ ≥ 8σ/β$. In our setting this should not be restrictive, since it means that the noise added is made up of scattered, thin bumps, without any reference to the height of the bumps: this is what we expect when considering additive impulsive noise.
Corollary 5 and the well known stability of persistence diagrams with respect to the max-norm [8] immediately imply the following result, which is of interest in TDA (the symbol d match denotes the usual bottleneck distance between persistence diagrams).

Our main result
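We are now ready to prove our main result. We start by stating a lemma concerning the probability p that any two distinct points in a randomly chosen set of cardinality k in an interval of length ℓ have a distance greater than η. A proof of the following lemma is provided in [14]: for the reader's convenience, we report it here.
Lemma 3 Let $X_1, \ldots, X_k$, with k ≥ 2, be independent random variables, uniformly distributed on the interval [0, ℓ], for some ℓ > 0. Let $\min_{i \ne j} |X_i - X_j|$ be the minimal distance between two distinct random variables. Then we have
$$p = P\Big(\min_{i \ne j} |X_i - X_j| > η\Big) = \Big(1 - \frac{(k-1)η}{ℓ}\Big)^k \quad \text{for } 0 ≤ η ≤ \frac{ℓ}{k-1},$$
and p = 0 for $η > \frac{ℓ}{k-1}$.
Proof It suffices to consider the case $0 < η < \frac{ℓ}{k-1}$. By symmetry, we have
$$p = \frac{k!\,\mathrm{Leb}(S)}{ℓ^k} \qquad (1)$$
where $S := \{(x_1, \ldots, x_k) ∈ [0, ℓ]^k : x_{i+1} - x_i > η \text{ for } i = 1, \ldots, k-1\}$ and Leb denotes the Lebesgue measure. Setting $y_i = x_i - (i - 1)η$ for i = 1, ..., k, we have that Leb(S) = Leb(S'), where $S' := \{(y_1, \ldots, y_k) : 0 ≤ y_1 < y_2 < \ldots < y_k ≤ ℓ - (k-1)η\}$. On the other hand, again by symmetry, we have $\mathrm{Leb}(S') = \frac{(ℓ - (k-1)η)^k}{k!}$, and plugging this last identity into (1) we get the thesis.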
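A quick Monte Carlo check can make the lemma tangible. The sketch below (Python, standard library only) assumes the closed form p = (1 − (k − 1)η/ℓ)^k for 0 ≤ η ≤ ℓ/(k − 1), which is what the change of variables in the proof yields; the values of k, ℓ, η, the number of trials and the function names are illustrative assumptions.

```python
import random

def min_gap_exceeds(points, eta):
    """True if every two distinct points are more than eta apart."""
    pts = sorted(points)
    return all(b - a > eta for a, b in zip(pts, pts[1:]))

def estimate_probability(k, length, eta, trials, rng):
    """Empirical frequency of the event 'minimal distance > eta' for k uniform points."""
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, length) for _ in range(k)]
        hits += min_gap_exceeds(sample, eta)
    return hits / trials

def closed_form(k, length, eta):
    """Assumed closed form: (1 - (k - 1) * eta / length) ** k, and 0 when eta is too large."""
    return max(0.0, 1.0 - (k - 1) * eta / length) ** k

rng = random.Random(0)                 # fixed seed for reproducibility
k, length, eta = 4, 10.0, 1.0
estimate = estimate_probability(k, length, eta, 100_000, rng)
exact = closed_form(k, length, eta)
print(f"Monte Carlo estimate: {estimate:.4f}, closed form: {exact:.4f}")
```

The two printed values should agree up to the sampling error of the simulation, which is of the order of a few thousandths for this number of trials.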
We can now prove the following result, concerning the expected value of the error
Proof By setting $δ = σ/β$ and $ε = 2σ/β$ in statement iii) of Lemma 1, we have that Since the operator , we can apply Theorem 4 by setting $δ = σ/β$, $ε = 2σ/β$ and $θ = 8σ/β$, and hence we obtain that
Theorem 7 and the well-known stability of persistence diagrams with respect to the max-norm [8] immediately imply the following result, which is of interest in TDA (the symbol $d_{match}$ denotes the usual bottleneck distance between persistence diagrams).
Corollary 8 Let us make the same assumptions as in Theorem 7. Let D and D be the persistence diagrams in degree 0 of the filtering functions ϕ and
This result shows that the use of suitable GENEOs can make TDA (relatively) stable also in the presence of impulsive noise, under the assumptions we have considered in this paper.

Examples and experiments
We will now validate our approach based on GENEOs by giving two examples and illustrating some experimental results.

Examples
In order to verify how our approach works, we will set τ and consider the upper bound $Lτ_n$, obtained by applying Corollary 5. We observe that $τ_n ≥ 2σ/β$ for every index n, and $\lim_{n \to +\infty} τ_n = 2σ/β$. We will examine two examples that use the GENEOs $F_{τ_n/2} \circ F_{τ_n}$ and show how our method based on such operators and the method based on convolutions differ in their capability of removing additive impulsive noise. Moreover, we will compare the actual error with the upper bound $Lτ_n$, by running several simulations. The convolutions that will be applied in our examples use the functions $T_h$ : ℝ → ℝ defined by setting for h > 0. We will see that, although the convolution with such functions is also a GENEO, it will not be able to efficiently remove the noise. Our noise function R will be constructed starting from the mother function ψ defined by setting ψ(x) := e After producing random values for the parameters N, $a_i$, $b_i$, $c_i$, our algorithm checks whether $η > 8σ/β$; otherwise it generates another set of parameters.
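The parameter-generation step described above amounts to rejection sampling. The following is a minimal sketch in Python (standard library only), not the authors' algorithm: the sampling ranges, the value σ = 1.1, the choice of taking η as the smallest gap between centers and β as the smallest b_i, and the function name are all assumptions.

```python
import math
import random

SIGMA = 1.1   # assumed half-width of the support of the mother bump

def sample_noise_parameters(rng, length, max_bumps=5, max_tries=10_000):
    """Draw (a_i, b_i, c_i) at random and retry until eta > 8 * sigma / beta holds."""
    for _ in range(max_tries):
        n = rng.randint(1, max_bumps)
        bumps = [(rng.uniform(-5.0, 5.0),          # height a_i (hypothetical range)
                  rng.uniform(10.0, 30.0),         # width factor b_i (hypothetical range)
                  rng.uniform(-length, length))    # center c_i
                 for _ in range(n)]
        beta = min(b for _, b, _ in bumps)
        centers = sorted(c for _, _, c in bumps)
        # Smallest gap between consecutive centers; infinite for a single bump.
        eta = min((q - p for p, q in zip(centers, centers[1:])), default=math.inf)
        if eta > 8.0 * SIGMA / beta:
            return bumps, eta, beta
    raise RuntimeError("no admissible parameters found")

rng = random.Random(1)
bumps, eta, beta = sample_noise_parameters(rng, length=4.0 * math.pi)
print(len(bumps), eta, beta)
```

Rejecting whole parameter sets keeps the accepted bumps thin and well separated, at the cost of conditioning the distribution of the centers on that event.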

First example
Let us consider the function for x ∈ ℝ. We observe that $ϕ ∈ Lip_L$, for L = 1. We will add noise in the interval [−4π, 4π] (meaning ℓ = 4π) and visualize the results in such an interval. Figure 5 illustrates what the function φ looks like compared to ϕ. We will start by considering how well the convolution $φ * T_n$ can approximate the original function ϕ, when n goes from 3 to 100. From Figure 6, it is immediately apparent that the max-norm distance between $φ * T_n$ and ϕ remains quite large. If we apply a convolution with $T_{1/n}$, for 3 ≤ n ≤ 100, we get the results displayed in Figure 7, showing that all information represented by the function ϕ is progressively destroyed. In contrast, if we apply the operator $F_{τ_n/2} \circ F_{τ_n}$ to φ, for n = 3, 5, 20, 100, we get the results displayed in Figure 8. As we can see, this operator is much more efficient in removing the bumps and restoring the function ϕ.
As a matter of fact, when we apply a convolution with the function $T_h$ and check the corresponding errors via the sup-norm, we get the results displayed in Figure 9. In contrast, if we apply the operator $F_{τ_n/2} \circ F_{τ_n}$, we get the results displayed in Figure 10, showing that the upper bound for the error stated in Corollary 5 is quite tight. As we can expect, we get the best denoising by replacing $τ_n$ with $2σ/β$ (see Figure 11). Finally, we executed 1000 simulations. In each of them we produced a function φ by adding random impulsive noise to the function ϕ, and then applied $F_{σ/β} \circ F_{2σ/β}$ to φ, in order to see how tight the upper bound in Corollary 5 is with respect to the actual error. The same parameters as at the beginning of this example have been used. As we can see in Figure 12, the overestimation committed by our upper bound is often quite close to zero, relative to the Lipschitz constant L of the function ϕ. This suggests that such an upper bound is quite accurate.

Second example
Let us consider the function for x ∈ ℝ. The coefficient 1/1000 was chosen so that the Lipschitz constant L is comparable to the one of the previous example. In this example L = 27/25. Figure 13 illustrates what the function φ looks like compared to ϕ. We will start by considering how well the convolution $φ * T_n$ can approximate the original function ϕ, when n goes from 3 to 100. From Figure 14, it is immediately apparent that the max-norm distance between $φ * T_n$ and ϕ remains quite large. If we apply a convolution with $T_{1/n}$, for 3 ≤ n ≤ 100, we get the results displayed in Figure 15, showing that all information represented by the function ϕ is progressively destroyed. In contrast, if we apply the operator $F_{τ_n/2} \circ F_{τ_n}$ to φ, for n = 3, 5, 20, 100, we get the results displayed in Figure 16. As we can see, this operator is much more efficient in removing the bumps and restoring the function ϕ.
When we apply a convolution with the function $T_h$ and check the corresponding errors via the sup-norm, we get the results displayed in Figure 17. In contrast, if we apply the operator $F_{τ_n/2} \circ F_{τ_n}$, we get the results displayed in Figure 18, showing that the upper bound for the error stated in Corollary 5 is quite tight. As we can expect, we get the best denoising by replacing $τ_n$ with $2σ/β$ (see Figure 19). Finally, we again executed 1000 simulations, using the same methodology as in the previous case, this time considering the polynomial presented at the beginning of this example. As we can see in Figure 20, the overestimation committed by our upper bound is often quite close to zero, relative to the Lipschitz constant L of the polynomial ϕ.

Experiments
In order to check how good the upper bound stated in Theorem 7 is, we have made the following experiment.
Finally, the graph of the Lipschitz function ϕ on [0, ℓ] has been obtained by connecting each point $(x_{i-1}, y_{i-1})$ to $(x_i, y_i)$ with a segment, for i ∈ {1, ..., N + 1}. We observe that ϕ constructed this way is an L-Lipschitz function.
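A construction of this kind can be sketched as follows in Python (standard library only). The sampling rule for the nodes is an assumption: abscissas are drawn uniformly in the interval and each slope is drawn uniformly in [−L, L], which guarantees that the resulting piecewise-linear function is L-Lipschitz. The function names and the numerical values are illustrative.

```python
import bisect
import random

def random_lipschitz_nodes(rng, n_inner, length, lipschitz):
    """Nodes (x_0, y_0), ..., (x_{N+1}, y_{N+1}) with x_0 = 0 and x_{N+1} = length."""
    xs = [0.0] + sorted(rng.uniform(0.0, length) for _ in range(n_inner)) + [length]
    ys = [0.0]
    for x_prev, x_next in zip(xs, xs[1:]):
        slope = rng.uniform(-lipschitz, lipschitz)   # |slope| <= L on every segment
        ys.append(ys[-1] + slope * (x_next - x_prev))
    return xs, ys

def evaluate(xs, ys, x):
    """Piecewise-linear interpolation through the nodes."""
    i = bisect.bisect_right(xs, x) - 1
    i = min(max(i, 0), len(xs) - 2)
    x0, x1 = xs[i], xs[i + 1]
    if x1 == x0:
        return ys[i]
    t = (x - x0) / (x1 - x0)
    return ys[i] + t * (ys[i + 1] - ys[i])

rng = random.Random(2)
LENGTH, LIP, N = 20.0, 1.5, 7
xs, ys = random_lipschitz_nodes(rng, N, LENGTH, LIP)
grid = [LENGTH * j / 2000 for j in range(2001)]
samples = [evaluate(xs, ys, x) for x in grid]
max_ratio = max(abs(b - a) / (LENGTH / 2000) for a, b in zip(samples, samples[1:]))
print(f"largest difference quotient on the grid: {max_ratio:.3f}")
```

Bounding each slope separately is the simplest way to enforce the Lipschitz condition; other sampling rules for the ordinates would work as long as every segment has slope at most L in absolute value.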
Secondly, we have used the parameters α > 0, β > 0, k ∈ N, a 1 , . . .ϕ produced in the previously described way, we have considered the function In Figures 21 and 22 some examples of the functions we have produced are displayed, for N = 3 and N = 7, respectively.In each figure, the functions are displayed without noise (left) and with added noise (right).
In a second step, we have fixed ℓ = 20 and σ = 1.1 once again, and considered a probabilistic model assuming that α, β, L are given and the values N, k, $a_i$, $b_i$, $c_i$ are random variables. In this setting, we have compared the noise $\|φ - ϕ\|_\infty$ with the probabilistic upper bound stated in Theorem 7 and the value F , representing the reduced noise that we can obtain by applying our method. In order to average our results, for each triplet (α, β, L) with α ∈ {50, 55, 60, ..., 100}, β ∈ {3, 4, 5, ..., 13}, and L ∈

Conclusion
In our paper we have proved a stability property for persistence diagrams of functions from ℝ to ℝ, in the presence of impulsive noise. This property shows that TDA can also be of use when noise drastically changes the topology of the sublevel sets of the filtering functions we are considering, and highlights some new possible interactions between TDA and the theory of GENEOs. The experimental section shows that our approach is indeed able to remove the impulsive noise in most cases. It would be interesting to check the possibility of extending our method to real-valued functions defined on n-dimensional domains, by selecting suitable GENEOs. We plan to devote our research to this topic in the future.

Fig. 1 How the sublevel sets change with respect to the filtration induced by ϕ.

Fig. 2 Persistence diagram of the function 2 sin x on $[0, \frac{3}{4}π]$. The point (0, ∞) describes the existence of a connected component that is born at zero and never dies. The point (√2, 2) indicates that there is a connected component born at √2 that dies (merges with the other one) at 2. The trivial points on the diagonal u = v are not displayed.

Fig. 4 In the figure on the left we have our original function ϕ, in the middle a noise function R and in the right figure the corrupted function φ := ϕ + R.
Fig. 17 Example 2: Error made by applying a convolution with $T_n$ (left) or $T_{1/n}$

Fig. 21 Five examples of the functions we have produced for N = 3.In each figure, the functions are displayed without noise (left) and with added noise (right).

Fig. 22 Five examples of the functions we have produced for N = 7.In each figure, the functions are displayed without noise (left) and with added noise (right).