Generalized Persistence for Equivariant Operators in Machine Learning

Abstract: Artificial neural networks can learn complex, salient data features to achieve a given task. On the opposite end of the spectrum, mathematically grounded methods such as topological data analysis allow users to design analysis pipelines fully aware of data constraints and symmetries. We introduce an original class of neural network layers based on a generalization of topological persistence. The proposed persistence-based layers allow the users to encode specific data properties (e.g., equivariance) easily. Additionally, these layers can be trained through standard optimization procedures (backpropagation) and composed with classical layers. We test the performance of generalized persistence-based layers as pooling operators in convolutional neural networks for image classification on the


Introduction
Artificial neural networks (ANNs) can approximate arbitrarily complex functions provided that they are fed with a high-quality, sufficiently large training set. Users do not need to develop a profound knowledge of the data involved in the task. However, they do not control the features that the network will learn to solve the task. This lack of control makes it difficult to predict the generalization capacity of ANN-based analysis pipelines, causing pathologies such as vulnerability to adversarial attacks [1] or requiring the investigation of custom data augmentation algorithms [2] to reduce errors or tame unwanted behaviors caused by noisy features interpreted as salient by the network. In contrast, topological persistence (TP) requires the user to explicitly declare, in the form of a continuous real-valued function, which features of the data are relevant to tackle a given task. This procedure does not require harvesting training data and gives the user complete control over the features used to solve the task.
Both the ANN and TP frameworks have apparent drawbacks. Gathering the massive training datasets required to train complex neural architectures can be extremely hard. Symmetries of the data can be leveraged to reduce the dimensionality of the parameter space of ANNs and learn efficiently from smaller datasets [3][4][5]. However, it is often difficult to adapt such constructions to arbitrary features. In the TP framework, in turn, accessing sufficient information to determine which features are needed to achieve the desired results can be equally daunting. Moreover, TP is bound to topological data types (e.g., triangulable manifolds). Thus, data lacking such topological structures need to be mapped, often via complex transformations, to topological spaces. See [5][6][7] for examples of topological constructions mapping graphs to simplicial complexes.
Interactions between TP and deep learning are of broad interest [8][9][10]. Ideally, combining the two methods would yield constrainable and learnable models composable with state-of-the-art neural networks. However, we believe that the need to map data to topological objects and express their features as critical points of continuous functions hinders the development of TP-inspired neural layers.

Aim
At the crossroads between artificial neural networks and topological persistence, we provide an algorithmic approach to the design of learnable persistence-based layers, focusing on generality in terms of data types, usability through a streamlined user interface, and flexibility. We do this by building on the framework of rank-based persistence [11], which allows us to avoid auxiliary topological constructions mapping data to topological objects. This approach broadens the spectrum of applicability of our solutions, naturally including data types such as undirected and directed graphs, and metric spaces. Moreover, in an attempt to simplify TP's algorithmic pipeline, we leverage the notion of persistent features [12]. Persistent features allow us to define learnable, persistence-based neural network layers based on Boolean features of the data, rather than defining them as continuous functions. Finally, such layers can be easily constrained with respect to data symmetries.

Contribution
In this study, we define and provide constructive examples of operators based on persistent features and discuss their properties, particularly constrainability and noise robustness. After showcasing these properties on images, we introduce an original neural network layer, namely the persistent feature-based layer. We base our proposal on two primary principles. On the one hand, we aim to learn relevant (persistent) features directly from data points. On the other hand, we provide simple strategies to take advantage of locality and equivariance, which are two critical notions for analyzing structured data. We provide an algorithm for a persistence-based pooling operator, test it on several architectures and datasets, and compare it with some classical pooling layers.

Structure
In Section 2, we retrace the interplay between networks and topological persistence, in particular for the use of the latter in a neural network. Section 3 intuitively introduces the central mathematical concepts involved in the definition of a persistence-based layer: persistent homology, rank-based persistence, and persistent feature. In Section 4, we devise an image-filtering algorithm based on persistent features, discuss its main properties, and provide examples. Thereafter, we define a persistence-based neural network layer and specialize it to act as a pooling layer in convolutional neural networks. Computational experiments in Section 5 evaluate and compare the performance of the proposed pooling layer in an image-classification task on several datasets and on two different architectures. In the same section, we provide a qualitative analysis of the most salient features detected by the persistence-based pooling, the classical max-pooling, and LEAP [13].

State of the Art
The interplay between topological persistence (TP) and networks is nowadays a wide and ramified research area; it dates back at least to [6], where the clique and neighborhood complexes were built on a time-varying network for application in statistical mechanics. TP was then applied to graph-derived simplicial complexes in several contexts: polymers, collaboration networks, various aspects of brain connections, social networks, language families, and many more.
Comparing large networks in search of a possible isomorphism is not only computationally unfeasible but also nonsensical; therefore, a notion of weak isomorphism [14] and an interesting pseudo-metric on the space of all networks [15] were introduced. A natural strategy is the reduction to a set of invariants; along the same lines, the same authors applied persistent homology to the Dowker and clique complexes of a directed network [16,17] and to its path homology [18]. The stability and convergence of the derived invariants are studied comprehensively in [19]. Two interesting papers on similar problems are [20,21].
TP is used in [22] for assessing a sort of complexity of a neural network: each edge is given a weight derived from the activation function; then, the diagram of the persistent Betti number in degree zero is computed. Finally, the p-th "neural persistence" of the network is defined as the p-norm of the diagram. A "persistence interaction detection" framework is the core of [23]. This shows how TP can be of use in the analysis of neural networks; conversely, a neural network can be employed to directly produce the persistence image of a given picture [24].
The first paper we know of in which TP is part of a neural network is [25], which adopts as the input layer a (vectorized) persistence diagram of the object to be classified. This strategy was then refined by [26,27] and followed in different forms by a large number of researchers. A similar technique, based on the "element specific persistent homology", is applied in a series of papers (starting from [8]) for the prediction of protein-ligand binding affinity; see the rich bibliography of [10]. In [28,29], the Wasserstein-1 distance between persistence diagrams contributes to the loss function of a deep neural network for 3D segmentation.
Neural networks that have graphs as input benefit quite naturally from topological expedients for down-scaling [30][31][32][33]. Coming closer to the subject of our research, a form of topological pooling based on persistent homology was defined in [34] for pose recognition; a message reweighting graph convolution is also based on TP in [35]. TP-based pooling is the subject of two well-structured papers [36,37]. Still, these articles are based on the classical pipeline: constructing weighted simplicial complexes, getting a filtration, and computing persistent homology modules. Our approach differs in that we bypass the simplicial and homological passages, thanks to a generalization that produces persistence diagrams directly from graph-theoretical features.

Background
In the following paragraphs, we sketch and provide references for the essential mathematical constructions that motivate and inspire the idea of persistence-based layers: persistent homology, rank-based persistence, and steady persistent features.

Rank-Based Persistence
Persistent homology requires three main ingredients:
1. A filtered topological space, e.g., a space X filtered by the sublevel sets of a continuous function f : X → R;
2. The homology functor H_k mapping topological spaces to finite-dimensional vector spaces;
3. A notion of rank, e.g., the dimension for vector spaces or cardinality for sets [38].
In Figure 1, we show how a topological sphere X ⊂ R^3 filtered by the height function f : X → R, (x, y, z) ↦ z, yields a persistence diagram whose points correspond to the maxima and minima of X with respect to the vertical axis. We refer the reader to [39] for details on topological persistence and persistent homology.
We choose to frame our work in a more general context than homological persistence, namely rank-based persistence [11]. Although topological persistence and persistent homology have been extended in several ways, e.g., [40][41][42][43][44], rank-based persistence allows us to work directly in the category of the data of choice, rather than in that of topological spaces. The authors of [11] assume an axiomatic standpoint based on the three building blocks of homological persistence mentioned above and generalize persistence to categories and functors other than topological spaces and homology. Importantly, under a few assumptions, persistence built in the rank-based framework still guarantees fundamental properties such as flexibility (dependence on the filtering function), stability [45], and robustness [46]. In this setting, we can work with data types such as images and time series without intermediary topological constructions. We refer to [11] for details and provide a list of analogies between classical and rank-based persistence in Table 1.

An algorithm for computing persistence
Persistence is computed through an algorithm mirroring the one we described in Algorithm 7.1. Let K be a triangulation of X, and f : K → R a monotone function, i.e., such that f(σ) ≤ f(τ) if σ is a face of τ. Consider an ordering of the simplices of K such that each simplex is preceded by its faces and f is non-decreasing. This ordering allows us to store the simplicial complex in a boundary matrix B, whose entry B[i, j] is 1 if the i-th simplex is a codimension-one face of the j-th simplex and 0 otherwise. The persistence diagram summarizes the filtration induced by the sub-level sets of a tame function f. Moreover, the lifespan of the homology class represented by a cornerpoint corresponds to its distance from the diagonal. Thus, noisy and persistent homological classes are represented by cornerpoints lying near to or far from the diagonal, respectively.

Bottleneck distance
Persistence diagrams are simpler than the shape they represent and describe its topological and geometrical properties, as highlighted by the homological critical values of the function used to build the filtration. The bottleneck distance allows us to compare such diagrams.
Definition 7.2.7. Let X be a triangulable topological space and f, g : X → R two tame functions. The bottleneck distance between D_k(f) and D_k(g) is d_B(D_k(f), D_k(g)) = inf_γ sup_{p ∈ D_k(f)} ‖p − γ(p)‖_∞, where γ ranges over the bijections between D_k(f) and D_k(g), both augmented with the points of the diagonal. In Figure 7.9 a bijection between two k-persistence diagrams is depicted. Corner points belonging to the two diagrams are depicted in orange and yellow, respectively. Observe how the inclusion of the points of the diagonal allows the comparison of multisets of points whose underlying sets have different cardinalities (see Section 3.1 for a definition of multiset) by associating one of the purple points with one of the points lying on the diagonal.
An important property of persistence diagrams is their stability: a small perturbation of the tame function f produces small variations in the persistence diagram with respect to the bottleneck distance.
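To make the role of the diagonal in the bottleneck distance concrete, the following is a minimal brute-force sketch. It is an illustration under stated assumptions, not the procedure used by the authors: diagrams are represented as lists of (birth, death) pairs, each diagram is padded with as many diagonal slots as the other diagram has points, and all matchings are enumerated, which is only viable for very small diagrams. The function names are hypothetical.

```python
from itertools import permutations


def linf(p, q):
    """L-infinity distance between two diagram points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def diagonal_distance(p):
    """L-infinity distance from (birth, death) to the diagonal."""
    return abs(p[1] - p[0]) / 2.0


def matching_cost(p, q):
    """Cost of matching p with q; None stands for a diagonal slot."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return diagonal_distance(q)
    if q is None:
        return diagonal_distance(p)
    return linf(p, q)


def bottleneck_bruteforce(diagram_a, diagram_b):
    """Smallest possible 'largest matching cost' over all bijections."""
    left = list(diagram_a) + [None] * len(diagram_b)
    right = list(diagram_b) + [None] * len(diagram_a)
    best = float("inf")
    for perm in permutations(range(len(right))):
        worst = max(
            (matching_cost(left[i], right[j]) for i, j in enumerate(perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best
```

For instance, a diagram with a single point (0, 4) compared with an empty diagram is matched entirely to the diagonal, at cost 2; adding a low-persistence point such as (1, 1.2) to one of two otherwise equal diagrams changes the distance by only 0.1, in line with the stability property.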

Table 1. Analogy between the classical and rank-based persistence frameworks.

Classical Framework               | Categorical Framework
Topological spaces                | Arbitrary source category C
Vector spaces                     | Regular target category R
Dimension                         | Rank function on R
Homology functor                  | Arbitrary functor from C to R
Filtration of topological spaces  | (R, ≤)-indexed diagram in C

Persistent Features
In the spirit of the aforementioned generalization, ref. [12] (Section 2.2) introduces the concept of steady persistent features for weighted graphs.
A weighted graph is a pair (G, f), where G = (V, E) is a graph defined by a set of vertices V and edges E, and f is a function assigning (tuples of) real-valued weights to the edges of G, in symbols f : E → R^n. The weighting function naturally induces a sublevel set filtration {S_i}_i of G. For an intuition, see Figure 2a,b. Let S = 2^(V∪E) be the set of all subsets of elements (vertices and edges) of G. Let F be a graph-theoretical property, e.g., local degree prevalence or independence. The persistent feature associated with F, which we also denote by F : S → {true, false}, is a Boolean mapping returning true if the property F holds for a certain subset s ∈ S and false otherwise. Symmetrically to the topological persistence framework, we evaluate F for every subset of every sublevel S_i. See Figure 2b. Then, we compute the steadiness σ of each subset s along the filtration by counting consecutive sublevels such that F(s) = true. As in [12], we refer to σ((G, f, F)) as the steady persistence of the feature F on (G, f). This construction yields a persistence diagram. See Figure 2c.
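The construction above can be sketched in a few lines. The following is a minimal illustration, assuming scalar edge weights, sublevels made of the edges whose weight does not exceed a threshold, and independence (no two vertices of the subset are adjacent) as the Boolean feature. The helper names are hypothetical, and reporting maximal runs of consecutive sublevels is one plausible reading of steadiness rather than the reference implementation of [12].

```python
def sublevel_edges(weights, threshold):
    """Edges of the weighted graph whose weight is at most the threshold."""
    return {edge for edge, w in weights.items() if w <= threshold}


def is_independent(subset, edges):
    """True if all vertices of the subset are present and pairwise non-adjacent."""
    vertices = {v for edge in edges for v in edge}
    if not subset <= vertices:
        return False
    return not any(u in subset and v in subset for u, v in edges)


def steady_intervals(sublevels, feature, subset):
    """Maximal runs [start, end) of consecutive sublevels where the feature holds."""
    intervals, start = [], None
    for index, level in enumerate(sublevels):
        holds = feature(subset, level)
        if holds and start is None:
            start = index
        elif not holds and start is not None:
            intervals.append((start, index))
            start = None
    if start is not None:
        intervals.append((start, len(sublevels)))
    return intervals


# Toy triangle with edge weights 1, 2, 3 and one sublevel per weight.
weights = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3}
levels = [sublevel_edges(weights, t) for t in (1, 2, 3)]
intervals = steady_intervals(levels, is_independent, {"a", "c"})
```

In this toy example, the subset {a, c} is independent only at the second sublevel: vertex c is still absent at the first one, and the edge (a, c) appears at the third. Each interval can be read as one point of the resulting diagram, with its length counting the consecutive sublevels on which the feature holds.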
[Figure panels: Sublevel 0, Sublevel 1, Sublevel 2, (c)]
Figure 7.9: A matching between two k-persistence diagrams. The bijection between elements of the diagrams is denoted using left-right arrows.

Persistence-Based Layers
In the following sections, we first build a persistence-based operator that can act as a filter on grayscale images (the operator can also be applied to RGB images by treating each channel independently). This construction follows naturally from the definition of steady persistent feature. Then, we discuss the main properties of such operators: locality and equivariance. Finally, we specialize our construction to operate as a pooling layer in a convolutional neural network.

Persistent Features as Equivariant Filters
Locality and equivariance are crucial features for convolutional neural networks, and in general for any group-equivariant model. Indeed, we have:

1. The intensity of a pixel in a grayscale image carries knowledge only when compared to neighboring pixels;
2. Identical configurations located in different regions of the image (translated) should be recognized as such by the model, as is the case for convolutional neural networks [47].

Mathematically, a function f is equivariant with respect to the action of a group G if f(gx) = g f(x) for every g ∈ G and every input x. See [4,9,48,49] for an overview and examples of equivariant machine learning models.
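The condition f(gx) = g f(x) can be checked numerically on a toy example. The sketch below assumes a one-dimensional signal and the group of cyclic shifts; the two operators (a circular local mean and a position-dependent scaling) are hypothetical examples chosen only to contrast an equivariant map with a non-equivariant one.

```python
import numpy as np


def local_mean(x):
    """Circular three-tap mean: depends only on each sample and its neighbors."""
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3.0


def positional_scaling(x):
    """Multiplies each sample by its index: depends on the absolute position."""
    return x * np.arange(len(x))


def is_shift_equivariant(op, x, shift):
    """Check op(g x) == g op(x), where g is a cyclic shift by `shift` samples."""
    return bool(np.allclose(op(np.roll(x, shift)), np.roll(op(x), shift)))


signal = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
mean_ok = is_shift_equivariant(local_mean, signal, 2)
scaling_ok = is_shift_equivariant(positional_scaling, signal, 2)
```

The local mean commutes with shifts because it only compares each sample with its neighbors, which is the locality requirement of item 1; the positional scaling fails the check because it uses absolute positions.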

Locality
In this setting, considering the notion of persistent feature introduced in Section 3.2, we think of an image as a graph in which vertices are pixels and edges connect adjacent pixels. Patches (or windows in a time series) of size k around a pixel (point) correspond to k-distance neighborhoods of the vertex associated with that pixel.

Flexibility
Persistent features require a filtered space to be computed. Thus, after associating a graph to an image (or time series), we define the weighting function f : S → R^n, where S is the set of all subsets of V ∪ E and n ∈ N. Importantly, f can carry additional information about the original data. For instance, when considering images, one can associate with each vertex the intensity value of its underlying pixel, and leverage this information to compute appropriate weights in a process reminiscent of message passing in graph neural networks [50]. The weighting function f induces a sub-level set filtration of the graph associated with each pixel of the image. See Figure 3a,b for an example.
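As an illustration of how pixel intensities can be turned into edge weights, the sketch below builds the grid graph of a small image and assigns to each edge the larger of its two endpoint intensities, so that an edge enters the filtration as soon as both of its pixels are present. The 4-adjacency and the function names are assumptions made for this example; other weighting rules are equally admissible.

```python
import numpy as np


def grid_edge_weights(image):
    """Map each edge between 4-adjacent pixels to the larger endpoint intensity."""
    height, width = image.shape
    weights = {}
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    weights[((r, c), (rr, cc))] = float(max(image[r, c], image[rr, cc]))
    return weights


def sublevel(weights, threshold):
    """Edges present in the sublevel set at the given threshold."""
    return {edge for edge, w in weights.items() if w <= threshold}


image = np.array([[0, 2], [1, 3]])
weights = grid_edge_weights(image)
first_level = sublevel(weights, 1)
```

On this 2 × 2 image, the only edge in the sublevel at threshold 1 connects the pixels of intensity 0 and 1; raising the threshold progressively adds the remaining edges.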

Equivariance
Once the pair (G, f) has been associated with our data, the proposed construction is naturally equivariant with respect to translation. Indeed, sublevel set filtrations and persistent features are totally determined by the weights and connectivity of the graphs at each filtration level. Importantly, the flexibility of the proposed solution makes it possible to further control the equivariance of the operator. For instance, weighting functions that only depend on pairwise intensity values will generate operators that are equivariant not only with respect to translations, but also with respect to isometries (translations, rotations, and reflections) and scaling of the original data.

Parametrization
We can add parameters to the weighting function f and the feature F to create operators endowed with more complex equivariance and learnable parameters. Let S_t = ι^−1((−∞, t]) be the sublevel of G naturally induced by the pixel intensity ι (edges are added when the vertices they connect are added). We define G^k_{m,n}(v, t) as the feature mapping v to true if more than m and fewer than n pixels in the k-distance neighborhood of v have an intensity less than t. Figure 4a shows how varying the parameters m and n allows us to highlight radically different aspects of a binary image.
Remark 1. The operator G defined above does not rely on the 2-dimensional structure of the selected patch. Thus, its steady persistence diagram is invariant not only with respect to translations, but also with respect to permutations of the pixels in the patch. This kind of equivariance meshes the standard convolutional equivariance with the fully-connected input/output representation typical of dense layers of an artificial neural network.
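The feature G^k_{m,n} and its steady persistence can be sketched as follows. This is a minimal illustration under explicit assumptions: the k-distance neighborhood is taken to be the square window of radius k clipped at the image border and including v itself, and the interval is truncated at the largest intensity of the neighborhood when n exceeds the number of pixels. Since the number of pixels with intensity below t can only grow with t, the feature holds on a single interval of thresholds, whose length is obtained by sorting the neighborhood and reading two entries. Function names are hypothetical.

```python
import numpy as np


def neighborhood(image, row, col, k):
    """Intensities in the square window of radius k around (row, col), clipped."""
    r0, r1 = max(row - k, 0), min(row + k + 1, image.shape[0])
    c0, c1 = max(col - k, 0), min(col + k + 1, image.shape[1])
    return image[r0:r1, c0:c1].ravel()


def feature_G(image, row, col, t, m, n, k):
    """True if more than m and fewer than n neighbors have intensity below t."""
    count = int(np.sum(neighborhood(image, row, col, k) < t))
    return m < count < n


def steady_persistence_G(image, row, col, m, n, k):
    """Length of the threshold interval on which feature_G holds."""
    values = np.sort(neighborhood(image, row, col, k))
    birth_index = m                            # holds once t exceeds values[m]
    death_index = min(n, values.size) - 1      # stops once t exceeds values[n - 1]
    if birth_index >= death_index:
        return 0.0
    return float(values[death_index] - values[birth_index])


image = np.arange(9, dtype=float).reshape(3, 3)
persistence = steady_persistence_G(image, 1, 1, m=2, n=6, k=1)
```

In this example the feature holds for thresholds in (2, 5], so the persistence is 3. Because only the sorted intensities enter the computation, shuffling the pixels of the window leaves the result unchanged, which is the permutation invariance noted in Remark 1.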

Robustness
In the context of both topological and rank-based persistence, robustness means that small changes in the image do not give rise to significant perturbations of the corresponding persistence diagram. This concept is defined in [12] for persistent features and dubbed balancedness. In [51], it is formally shown that G is a balanced feature in the sense of [12]. Robustness to noise is showcased in Figure 4b. Additionally, and in line with the principles of functional data analysis [52], the proposed operator is adapted to work on continuous data and multiple resolutions: salient features are maintained across different resolutions, as shown in [53] in the context of persistent images; in computational experiments, the authors demonstrate how downsampling the persistence diagram does not affect the classification accuracy.

A Steady, Persistent-Feature Layer
The filter G and its steady persistence introduced in Section 4.1 are endowed with properties inherited from their mathematical foundation, which make them suitable for tackling typical machine learning tasks:

1. G enhances the signal at abrupt changes (a max-pooling filter could be blind to such features);
2. The steady persistence σ(G) yields representations of the input that are invariant with respect to the group of isometries;
3. Salt-and-pepper noise does not impair the quality of the detected features.
These properties motivate implementing and testing σ(G) as a complement to pooling and convolutional operators in standard artificial neural networks. In the following paragraphs, we discuss the implementation of a pooling layer with learnable parameters based on persistence features. The same operator can be easily adapted to work as a convolution-like layer.

Persistence-Based Pooling
We consider an image I and split it into a collection of patches {P_i(h, w)} of size (h, w) ∈ N^2. For every pixel p ∈ P_i, we compute σ(G^k_{m,n})(p) for some fixed values of m, n, and k, padding the patch whenever necessary. This procedure, which computationally boils down to sorting and slicing operations, yields a persistence diagram with one point per pixel in P_i. Symmetrically to the classical max-pooling operator, the maximum persistence, i.e., the largest distance from the diagonal, determines the value associated with the entire patch in its downsampling. As an example, see Figure 5a,b, and the top row of (c).
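The pooling procedure can be sketched with plain sorting and slicing. The following is a simplified illustration, not the authors' implementation: patches are non-overlapping, neighborhoods are square windows clipped to the patch instead of padded, and the per-pixel persistence is the difference between two entries of the sorted neighborhood, which is an assumed closed form for the feature described above. All names are hypothetical.

```python
import numpy as np


def pixel_persistence(patch, row, col, m, n, k):
    """Persistence of the (m, n, k) feature at one pixel, restricted to the patch."""
    r0, r1 = max(row - k, 0), min(row + k + 1, patch.shape[0])
    c0, c1 = max(col - k, 0), min(col + k + 1, patch.shape[1])
    values = np.sort(patch[r0:r1, c0:c1].ravel())
    birth_index, death_index = m, min(n, values.size) - 1
    if birth_index >= death_index:
        return 0.0
    return float(values[death_index] - values[birth_index])


def persistence_max_pool(image, pool_size, m, n, k):
    """Downsample by keeping the maximum per-pixel persistence of each patch."""
    h, w = pool_size
    rows, cols = image.shape[0] // h, image.shape[1] // w
    pooled = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            pooled[i, j] = max(
                pixel_persistence(patch, r, c, m, n, k)
                for r in range(h)
                for c in range(w)
            )
    return pooled


image = np.array(
    [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 5, 2, 2],
     [0, 0, 2, 9]],
    dtype=float,
)
pooled = persistence_max_pool(image, (2, 2), m=0, n=4, k=1)
```

With these parameters, constant patches are mapped to zero, whereas patches containing an abrupt intensity change receive a large value. A constant bright patch would be mapped to its intensity by max-pooling but to zero here, which illustrates how the two operators respond to different aspects of the input.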

Parametrization
Alternatively, and following the idea of learnable pooling operators, e.g., [13], we propose to learn weights Λ to modulate the contribution of the non-zero components of the persistence diagram, as depicted in Figure 5c. Because the persistence value associated with each pixel is continuous and G is balanced, it is possible to learn such weights through standard backpropagation.
Figure 5 (caption fragment). The steady persistence operator σ(G) yields a persistence diagram associating a persistence value (namely, the distance from the diagonal of the points of the persistence diagram on the right of the panel) with each of the pixels of the considered patch. (c) We associate to each patch either the maximum persistence or a weighted sum of the persistence values associated with each pixel. In the latter case, weights are learned through gradient descent.
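The weighted variant can be illustrated on precomputed persistence values. The sketch below assumes that each patch is summarized by a vector of per-pixel persistence values and that Λ holds one weight per pixel position; it fits Λ by gradient descent on a toy mean squared error, standing in for the backpropagation through a full network described in the text. The names, the loss, and the learning rate are hypothetical.

```python
import numpy as np


def weighted_pool(persistence, weights):
    """Weighted sum of per-pixel persistence values, one output per patch."""
    return persistence @ weights


def gradient_step(persistence, weights, target, learning_rate):
    """One gradient-descent update of the weights on a mean squared error."""
    error = weighted_pool(persistence, weights) - target
    gradient = 2.0 * persistence.T @ error / len(target)
    return weights - learning_rate * gradient


# Two patches with two pixels each; targets generated by weights (2, 1).
persistence = np.array([[1.0, 0.0], [0.0, 2.0]])
target = persistence @ np.array([2.0, 1.0])

weights = np.zeros(2)
for _ in range(500):
    weights = gradient_step(persistence, weights, target, learning_rate=0.1)
```

Because the pooled value is linear in Λ, its gradient with respect to each weight is simply the corresponding persistence value, so the layer can be trained alongside the other parameters of a network.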

Computational Experiments
We assess the performance of the suggested pooling layer, embedding it into two neural architectures. First, we compare the performance of the persistence-based pooling with some classical pooling implementations. Then, we compute saliency maps (projections of the network's gradients onto the input images) to highlight qualitative differences across the considered pooling layers.

Datasets
We perform computational experiments on the MNIST [54], Fashion-MNIST [55], and CIFAR-10 [56] datasets. The MNIST and Fashion-MNIST datasets are composed of grayscale images labeled according to ten classes of hand-written digits and fashion articles, respectively. Each of the two datasets comprises 60,000 images of 28 × 28 pixels for training and 10,000 for testing. The CIFAR-10 dataset is composed of 50,000 RGB training images of 32 × 32 pixels that belong to ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The test is performed on 10,000 additional labeled images.

Architectures
We test the proposed layer via two neural network architectures depicted in Figure 6. We designed the first architecture to provide the simplest configuration, in which different pooling operators could be compared during a supervised task where parameters are learned via gradient propagation. The second architecture is more akin to the standard convolutional neural network topology in a simple image-classification task; we alternate the convolutional and pooling layers before a dense classifier is provided.

Figure 6. [Panel labels: Convolution, Pooling, Convolution, Dense.] (a) A toy neural network where a pooling operator is applied directly to input images. The downsampled output of the pooling layer is passed to a dense classifier. In other words, first, the pooled output is flattened, then its dimensionality is reduced to match the number of classes to be predicted by the network. (b) As is common practice for CNNs, this architecture alternates convolutional and pooling layers before the dense classifier.

Training
We use sparse categorical cross-entropy as a loss function [57]. Optimization is conducted in batches of 32 images through the ADAM optimizer [58] with a learning rate of 3 × 10^−4. We train each model for 100 epochs and make use of early stopping [59] with parameters min_delta = 0 and patience = 3.
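The effect of the early-stopping parameters can be illustrated with a short, framework-independent sketch. It assumes that a validation loss is monitored once per epoch (the monitored quantity is not stated in the text) and that an epoch counts as an improvement only if the loss decreases by more than min_delta; the function name is hypothetical.

```python
def early_stopping_epoch(losses, min_delta=0.0, patience=3):
    """Index of the epoch at which training stops under early stopping."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses) - 1  # no early stop: training ran to the last epoch


stop = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.81, 0.7])
```

In this toy run, the best loss is reached at epoch 1 and the three following epochs do not improve on it, so training stops at epoch 4 even though a later epoch would have been better. With min_delta = 0, any strict decrease resets the counter.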

Results
We tested the architectures presented in Section 5.2 on image classification on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. We adapted the persistence-based and convolutional layers to work with the RGB images of CIFAR-10. Specifically, the persistence-based pooling layer treats channels independently; thus, multiple channels are processed in parallel, ignoring their interactions.
Additionally, we considered a combination between the learnable persistence-based pooling and max-pooling layers. The two pooling approaches are combined by adding the value obtained through max-pooling on a specific patch to the weighted sum defined in Section 4.2 and depicted in Figure 5c. In this way, the weights learned during training combine and modulate the contribution of persistent features detected in each patch and the maximum intensity value of the pixels therein.
We quantify the performance of each network through a standard accuracy metric. We list the results in Table 2. The proposed layers outperform max-pooling and LEAP across datasets and architectures. The combination of max- and persistence-based pooling realizes the maximum performance in three out of six experiments. We believe this result points to the complementarity of the two approaches.

Qualitative Comparison of Salient Features
Pooling operators aim to select salient features of their input. Grad-CAM heatmaps [60] are a visualization tool for highlighting regions of the input that contributed most to the model's prediction. Figure 7 showcases Grad-CAM heatmaps obtained through the considered architectures on random input samples. As expected, the persistence-based pooling and its combination with max-pooling yield similar yet distinct heatmaps. In particular, although non-topological, the persistence-based pooling seems to capture geometrical and topological features of the images, such as corners and regions whose removal would disconnect the foreground. See Figure 7a.

Conclusions
At the crossroads between artificial intelligence and generalizations of topological data analysis, we propose original constructions of neural network layers, taking advantage of the formalism of rank-based persistence. Building on persistent features, we define a convolution-like operator that can be tailored to specific tasks by imposing equivariance through simple invariants. Such invariants can be defined easily by considering features of the data at hand rather than mapping data to the category of topological spaces. Thanks to their mathematical foundation, which relies on steady persistence and is inherited from persistent homology, the proposed operators enjoy mathematically guaranteed properties such as noise robustness and stability to perturbations. We showcase these properties by explicitly defining an image filter operator relying on steady persistence and testing it in an edge-detection task on noisy images. We compare the performance of the proposed filter against state-of-the-art edge detectors, namely the Canny and Sobel operators, in terms of the mean squared error and peak signal-to-noise ratio achieved by the three methods on images affected by salt-and-pepper noise. In our tests, the proposed operator outperforms its competitors in edge detection and image restoration (noise removal).
Locality, equivariance, and robustness to noise make the proposed class of operators a plausible complement to existing neural layers. Indeed, neural network layers are designed to be as agnostic as possible to the intrinsic properties of their input data. These features make them incredibly general and, at the same time, challenging to constrain to specific data features. This generality induces problematic behaviors and hinders the intelligibility of neural networks' learning patterns [61][62][63]. The proposed persistence-based layers, allowing the user to select relevant (persistent) features a priori, pave the road to the design and implementation of hybrid strategies leveraging the data-driven, learnable nature of artificial neural networks and the flexibility of the persistence-based approach.
We devise and implement a special class of persistent-feature-based layers called persistence-based pooling. The proposed layer uses the previously defined edge detection operator to act as a pooling layer in convolutional neural networks. A natural choice of operation for persistence-based pooling is to consider the maximum persistence realized in the diagram associated with each input image patch. We implement an alternative parametrized version that is equipped with learnable weights and easily combinable with standard max-pooling. We test this learnable version of the layer in image classification on three benchmark datasets. We compare the classification accuracy achieved by our layers with that of max-pooling and LEAP, embedding the pooling layer into two neural architectures (dense and convolutional, respectively). Persistence-based pooling realizes a better performance than its competitors across architectures and datasets. Finally, utilizing Grad-CAM heatmaps, we visualize salient features of input samples to provide a qualitative comparison across the considered pooling layers. There, the persistence-based pooling seems to retrieve relevant geometrical properties of the images.
This approach meshes well with the framework developed in [64], where inputs, outputs, and weights of a neural network layer are expressed as functions on smooth or discrete spaces. There, we demonstrated how several classes of linear neural network layers can be expressed as a combination of the function pullback (from a smaller to a larger space), pointwise multiplication of functions, and integration along fibers. We believe that, by combining the two approaches, it will be possible to define general parametric nonlinear layers, where the architecture is defined through the parametric spans introduced in [64], whereas the geometry-aware nonlinear computation descends from the methods developed here.