Optimizing the Simplicial-Map Neural Network Architecture

Simplicial-map neural networks are a recent neural network architecture induced by simplicial maps defined between simplicial complexes. It has been proved that simplicial-map neural networks are universal approximators and that they can be refined to be robust to adversarial attacks. In this paper, the refinement toward robustness is optimized by reducing the number of simplices (i.e., nodes) needed. We show experimentally that such a refined neural network is equivalent to the original network as a classification tool while requiring much less storage.


Introduction
Despite the undoubted advantages of deep learning techniques for classification tasks [1], many important problems remain unsolved in this context. In particular, if we focus on the efficiency of such models, one of their main drawbacks is the huge amount of resources needed to train competitive networks (for instance, vision models based on the transformer architecture need billions of images to be trained [2]). In many cases, only big companies can afford the expensive cost of training competitive architectures [3,4]. From a practical point of view, one of the open research lines in deep learning is the exploration of ways to reduce training resources without reducing the accuracy of the trained models.
One way to reduce time (to train the model) and space (to store the training data set) is to take a small subset of the training data set that summarizes its useful information. Several authors have explored this idea. For example, in [5], a data set representative of the training data set was considered. In [6], techniques of active learning were applied to classify images using convolutional neural networks. In [7], the authors reduced the volume of the training data set using stochastic methods. Other authors, in [8], replaced the training data set with a small number of synthetic samples containing all the original information.
Another approach tries to reduce the number of training parameters by pruning the model. This is a general technique in machine learning and has a long tradition in neural networks [9]. The importance of pruning neural networks has grown in recent years due to the large amount of resources required in deep learning [10][11][12]. Since local search techniques based on backpropagation play a central role in weight optimization, the different pruning techniques can be classified using the training process as the main reference. For example, there are studies where pruning occurs at the end of the training process [13], after the training process [14], or in its early stages [15].
Topological data analysis (TDA) provides a completely different approach to reducing the number of resources in the neural network classification process. In [16], the authors provided a constructive approach to the problem of approximating a continuous function on a compact set in a triangulated space. Once a triangulation of the space is given, a two-hidden-layer feedforward network with a concrete set of weights, called a simplicial-map neural network, is computed. The construction is based on several strong theorems from algebraic topology and avoids the heavy process of optimizing the weights of the neural network, since the weights can be computed directly from the triangulation of the space. Later, in [17], the authors showed that simplicial-map neural networks can be defined to be robust to adversarial attacks of a given size.
Simplicial-map neural networks are loosely related to margin-based classifiers, such as support vector machines (SVMs), and to nonparametric methods, such as k-nearest neighbors (k-NN). These algorithms are widely used and, in both cases, there exist efforts to study their robustness to adversarial examples, such as [18] in the case of k-NN or [19] for SVMs. Simplicial-map neural networks are not trained but defined on a triangulation of the data set, and their decision boundaries are based on that triangulation. One of the greatest advantages of this approach is the possibility of formally proving different properties, such as the universal approximation ability and, as previously mentioned, robustness against adversarial examples. However, both properties rely on barycentric subdivisions of the triangulation, with a large increase in required storage as the number of simplices grows, which is a bottleneck for applicability.
In this paper, we propose an algorithm to reduce the number of parameters of simplicial-map neural networks without reducing their accuracy. The key observation behind the proposed method is that barycentric subdivisions, in particular, and triangulations of training data sets, in general, introduce many simplices that are unnecessary or redundant. The paper is organized as follows. In Section 2, we recall some basic concepts. In Section 3, we provide the description of our methodology. The description is illustrated with some examples in Section 4. We finish the paper with some conclusions and hints for future work.

Background
In [16,17], a new approach to construct neural networks based on simplicial maps was introduced. Roughly speaking, a combinatorial structure (a simplicial complex) K is built on top of a labeled data set using Delaunay triangulations in order to later construct a neural network based on a simplicial map defined between K and a simplicial complex with just one maximal simplex. This section is devoted to recalling some of the basic concepts used in that construction.
The research field of neural networks is growing exponentially and, recently, many different architectures, activation functions, and regularization methods have been introduced; thus, it is difficult to find a general definition that covers all cases. In this paper, we adapt a definition from [20] that fits our purposes. From now on, n, m, d, k denote positive integers and 1, n denotes the set of integers {1, . . . , n}.
Definition 1 (adapted from [20]). A multilayer feedforward network defined between spaces X ⊆ R d and Y ⊆ R k is a function N : X → Y composed of m + 1 functions:
N = f m+1 • · · · • f 1 ,
where the integer m > 0 is the number of hidden layers and, for i ∈ 1, m + 1 , the function f i : X i−1 → X i is defined as
f i (x) = φ i (x W (i) + b i ),
where X 0 = X, X m+1 = Y, and X i ⊆ R d i for i ∈ 1, m ; d 0 = d, d m+1 = k, and d i > 0 being an integer for i ∈ 1, m (called the width of the ith hidden layer); W (i) ∈ M d i−1 ×d i being a real-valued d i−1 × d i matrix (called the matrix of weights of N ); b i being a point in R d i (called the bias term); and φ i being a function (called the activation function). The width of the neural network is the maximum width of its hidden layers.
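For illustration, the composition in Definition 1 can be sketched in a few lines of NumPy, reading f i (x) = φ i (x W (i) + b i ) with W (i) of size d i−1 × d i . The layer widths, random weights, and ReLU activation below are arbitrary choices for the sketch, not part of the definition.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def feedforward(x, weights, biases, activations):
    # Evaluate N = f_{m+1} o ... o f_1 with f_i(x) = phi_i(x W^(i) + b_i),
    # where W^(i) has shape d_{i-1} x d_i as in Definition 1.
    for W, b, phi in zip(weights, biases, activations):
        x = phi(x @ W + b)
    return x

rng = np.random.default_rng(0)
# A toy network from X = R^2 to Y = R^3: m = 1 hidden layer of width 4.
Ws = [rng.normal(size=(2, 4)), rng.normal(size=(4, 3))]
bs = [np.zeros(4), np.zeros(3)]
y = feedforward(np.array([1.0, -1.0]), Ws, bs, [relu, lambda z: z])
print(y.shape)  # (3,)
```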
Throughout this paper, neural networks will be considered as classification models.

Definition 2. A labeled data set D is a finite set of pairs
D = { (p j , ℓ j ) : j ∈ 1, n }, with p j ∈ R d and ℓ j ∈ E k ,
where, for j, h ∈ 1, n , p j ≠ p h if j ≠ h, and ℓ j is a one-hot vector. We say that ℓ j is the label of p j or, equivalently, that p j belongs to the class ℓ j . We will denote by D P the ordered set of points { p j } j∈ 1,n .
Given a data set and a set of neural networks that differ only in their weights, the supervised classification problem consists of finding a neural network in the set that provides the best classification of the data set. Since the neural networks in the set differ only in their weights, finding the best neural network is equivalent to finding the best possible weights. Again, several definitions of the concept of supervised classification problem can be provided, mainly depending on the method used to search for the possible weights and the notion of improvement chosen to define the best option.
In this paper, the concept of supervised classification problem for neural networks is defined as follows.
Definition 3. Given a labeled data set D ⊂ R d × E k , an integer m > 0, and a set of activation functions φ i for i ∈ 1, m , a supervised classification problem consists of looking for the weights W (i) and bias terms b i for i ∈ 1, m such that the associated neural network N satisfies N (p j ) = ℓ j for all j ∈ 1, n . If such a neural network N exists, we will say that N characterizes D or, equivalently, that N correctly classifies D.
The process of searching for optimal weights is usually called the training of the neural network. The most commonly used training is based on backpropagation [21]. Nevertheless, in this paper, the optimal weights are not found through an optimization process. Instead, a combinatorial structure is built on top of the training samples and a function called a simplicial map is defined on it; then, a special kind of neural network, named a simplicial-map neural network, is constructed. In order to recall the definition of a simplicial-map neural network, we start by recalling the definitions of convex hull and convex polytope.

Definition 4.
The convex hull of a set S ⊂ R d , denoted by conv(S), is the smallest convex set containing S. If S is finite, then conv(S) is called a convex polytope and denoted by P. The set of vertices of a convex polytope P is the minimum set V P of points in P such that P = conv(V P ).
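As a quick sanity check of Definition 4, SciPy can compute the vertex set V P of a convex polytope; the points below are arbitrary illustration data.

```python
import numpy as np
from scipy.spatial import ConvexHull

# A square plus one interior point: conv(S) is a convex polytope P.
S = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]])
hull = ConvexHull(S)
# hull.vertices indexes the minimal vertex set V_P such that P = conv(V_P);
# the interior point (1, 1) is not among them.
print(sorted(hull.vertices))  # [0, 1, 2, 3]
```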
Our construction of simplicial-map neural networks is based on the simplicial complex obtained after a triangulation of the given convex polytope. Let us now recall the concept of simplicial complex.

Definition 5.
Let us consider a finite set V whose elements will be called vertices. A simplicial complex K consists of a finite collection of nonempty subsets (called simplices) of V such that:

1.
Any subset of V with exactly one point is a simplex of K, called a 0-simplex or vertex.

2.
Any nonempty subset of a simplex σ is a simplex, called a face of σ.
A simplex σ with exactly k + 1 points is called a k-simplex. We also say that the dimension of σ is k and write dim σ = k. A maximal simplex of K is a simplex that is not a face of any other simplex in K. The dimension of K, denoted by dim K, is the maximum dimension of its maximal simplices. The set of vertices of a simplicial complex K will be denoted by K (0) . A simplicial complex K is pure if all its maximal simplices have the same dimension.
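Definition 5 can be made concrete with a small helper that generates a simplicial complex from its maximal simplices by closing under the face relation. This is only a sketch of the combinatorial structure; simplices are stored as frozensets of vertex labels.

```python
from itertools import combinations

def closure(maximal_simplices):
    # Build a simplicial complex (Definition 5) from its maximal simplices
    # by adding every nonempty face of each of them.
    K = set()
    for sigma in maximal_simplices:
        for r in range(1, len(sigma) + 1):
            K.update(frozenset(face) for face in combinations(sigma, r))
    return K

# Two triangles sharing the edge {1, 2}: a pure simplicial complex with
# 4 vertices, 5 edges, and 2 maximal 2-simplices, so dim K = 2.
K = closure([{0, 1, 2}, {1, 2, 3}])
print(len(K), max(len(s) for s in K) - 1)  # 11 2
```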
An example of a simplicial complex is the Delaunay complex defined from the Voronoi diagram of a given finite set of points.

Definition 6. Let S = {p 1 , . . . , p n } be a finite set of points in R d in general position. The Voronoi cell V (p i , S) is defined as:
V (p i , S) = { x ∈ R d : ||x − p i || ≤ ||x − p j || for all j ∈ 1, n }.
The Voronoi diagram of S, denoted as V (S), is the set of Voronoi cells:
V (S) = { V (p i , S) : i ∈ 1, n }.
The Delaunay complex of S can be defined as:
D (S) = { σ ⊆ S : ∩ p∈σ V (p, S) ≠ ∅ }.

The following lemma, which characterizes Delaunay simplices through empty circumscribing balls, is just another view of the definition of Delaunay complexes.

Lemma 1. A simplex σ ⊆ S belongs to D (S) if and only if there exists a closed ball whose boundary contains the points of σ and whose interior contains no point of S.

Given d > 0, an embedding of a simplicial complex K in the d-dimensional space R d is usually called a geometric realization of K, and it will be denoted by |K|.
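In practice, Delaunay complexes and Voronoi diagrams of points in general position can be computed with SciPy; the random point cloud below is only for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

rng = np.random.default_rng(1)
S = rng.random((10, 2))   # 10 random points in R^2, in general position
tri = Delaunay(S)         # maximal simplices of the Delaunay complex D(S)
vor = Voronoi(S)          # the dual Voronoi diagram V(S)

# In R^2, each maximal Delaunay simplex is a triangle (3 vertex indices).
print(tri.simplices.shape)
```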
One of the key ideas throughout this paper is that a triangulation can be refined by successive subdivisions of the simplicial complex obtained from the triangulation. There are many different ways to obtain a subdivision of a simplex; in our case, we will use the barycentric subdivision.

Definition 7. Let K be a simplicial complex with vertices in R d . The barycentric subdivision Sd K is the simplicial complex defined as follows. The set (Sd K) (0) of vertices of Sd K is the set of barycenters of all the simplices of K. The simplices of Sd K are the finite nonempty collections of vertices of (Sd K) (0) whose corresponding simplices of K are totally ordered by the face relation. That is, any k-simplex σ of Sd K can be written as an ordered set {w 0 , . . . , w k } such that w i is the barycenter of µ i , where µ i is a face of µ j ∈ K for i, j ∈ 0, k with i < j. In particular, if σ is maximal, then there exists a d-simplex {u 0 , . . . , u d } ∈ K satisfying that w i is the barycenter of {u 0 , . . . , u i } for i ∈ 0, d .
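The combinatorial description in Definition 7 can be sketched directly: each maximal simplex of the subdivision of one simplex σ corresponds to a chain of faces {u 0 } ⊂ {u 0 , u 1 } ⊂ · · · ⊂ σ, i.e., to a permutation of the vertices of σ. The helper below is a sketch of that correspondence (the function name is ours).

```python
from itertools import permutations

def sd_maximal_simplices(simplex):
    # Maximal simplices of the barycentric subdivision of one simplex:
    # one chain of faces per permutation of its vertices (Definition 7).
    chains = []
    for perm in permutations(simplex):
        chains.append(tuple(frozenset(perm[:i + 1]) for i in range(len(perm))))
    return chains

# A 2-simplex subdivides into 3! = 6 triangles; a d-simplex into (d+1)!.
print(len(sd_maximal_simplices((0, 1, 2))))  # 6
```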
Let us now introduce the notion of simplicial approximation, which is a simplicial map defined between two simplicial complexes K and L that approximates a given continuous function g between the geometric realizations of K and L. First, we recall the concept of vertex maps between two simplicial complexes.

Definition 8. Given two simplicial complexes K and L, a vertex map ϕ (0) : K (0) → L (0) is a function from the vertices of K to the vertices of L such that, for any simplex σ ∈ K, the set
ϕ (0) (σ) = { ϕ (0) (v) : v ∈ σ }
is a simplex of L.

A vertex map defined on the vertices of a simplicial complex K can be linearly extended to a continuous function on the whole simplicial complex K.

Definition 9.
The simplicial map ϕ c : |K| → |L| induced by the vertex map ϕ (0) : K (0) → L (0) is a continuous function defined as follows. Let x ∈ |K|. Then, x ∈ |σ| for some simplex σ = {u 0 , . . . , u k } of K, so x = ∑ i∈ 0,k λ i u i with λ i ≥ 0 for all i ∈ 0, k and ∑ i∈ 0,k λ i = 1. Then,
ϕ c (x) = ∑ i∈ 0,k λ i ϕ (0) (u i ).

Intuitively, a simplicial approximation between two simplicial complexes K and L is a simplicial map that preserves the star of each vertex. Recall that, for a vertex v of K (0) , the star of v, denoted by st v, is the set of simplices of K having {v} as a face.
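Definition 9 can be evaluated numerically: compute the barycentric coordinates λ i of x in |σ| and push them through the vertex map. The simplex, vertex images, and test point below are arbitrary illustration data.

```python
import numpy as np

def barycentric_coords(x, simplex_pts):
    # Solve x = sum_i lambda_i u_i with sum_i lambda_i = 1
    # (simplex_pts has one vertex u_i per row).
    A = np.vstack([simplex_pts.T, np.ones(len(simplex_pts))])
    b = np.append(x, 1.0)
    return np.linalg.solve(A, b)

def simplicial_map(x, simplex_pts, image_pts):
    # phi_c(x) = sum_i lambda_i phi0(u_i): linear extension of the vertex map.
    return barycentric_coords(x, simplex_pts) @ image_pts

# A 2-simplex in R^2 whose vertices are sent to one-hot vectors in R^2.
sigma = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
images = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(simplicial_map(np.array([0.25, 0.25]), sigma, images))  # [0.75 0.25]
```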
Definition 10. Let g : |K| → |L| be a continuous function between the geometric realizations of two simplicial complexes K and L. A simplicial map ϕ c : |K| → |L| induced by a vertex map ϕ (0) : K (0) → L (0) is a simplicial approximation of g if, for every vertex v of K,
g(|st v|) ⊆ |st ϕ (0) (v)|.

Next, the main definition used in this paper is recalled. Given a simplicial map between the geometric realizations of two finite pure simplicial complexes, a two-hidden-layer feedforward network can be built. Such a neural network is called a simplicial-map neural network, and the values of its weights can be exactly computed from the vertex map associated to the simplicial map. In other words, there is no need to train the neural network to find the optimal weights. This way, N ϕ = f 3 • f 2 • f 1 , with f i (y) = φ i (y W (i) + b i ) for i ∈ 1, 3 , defined as follows.
The explicit expressions of the weight matrices W (i) , the bias terms b i , and the activation functions φ i are computed directly from the vertex map ϕ (0) (see [16,17] for the exact construction). As shown in [17], simplicial-map neural networks can be used for classification purposes. Given a labeled data set D ⊂ R d × R k , we first compute a convex polytope P surrounding D. Second, we compute the Delaunay complex K = D (S) of the set S = D P ∪ V P and define a simplicial complex L composed of one maximal simplex σ = {v 0 , . . . , v k } whose dimension equals the number of classes of D. Finally, a vertex map sending each vertex of K to the vertex of L determined by its class induces a simplicial-map neural network that correctly classifies D (see Proposition 4 in [17]). However, this simplicial-map neural network is not robust to adversarial attacks, as shown in Proposition 5 in [17]. To construct simplicial-map neural networks that are robust to adversarial attacks of a given bounded size, the idea is to define a wide decision boundary through barycentric subdivisions. Nevertheless, with each barycentric subdivision iteration, the number of simplices grows, as claimed in Remark 1 of [16].
Once we have introduced all the necessary notions to explicitly construct a neural network to solve a classification problem, we present a methodology to reduce the size of such a network without hindering its performance.

Description of the Methodology
In this section, we propose a methodology to reduce the size of a simplicial-map neural network used for classification tasks.
Recall that given a labeled data set D with k classes, the process to obtain a simplicial-map neural network that correctly classifies D is: (1) to compute a convex polytope P surrounding D; (2) to compute the Delaunay complex K of the set D P ∪ V P ; (3) to compute a vertex map ϕ (0) from the vertices of K to the vertices of a simplicial complex L with only one maximal k-simplex; and (4) to compute a simplicial-map neural network N ϕ : |K| → |L| from the simplicial map ϕ c .
However, this simplicial-map neural network N ϕ , like many other neural networks, can be fooled by adversarial examples. In [17], a method to increase the robustness of the simplicial-map neural network to such attacks was developed by applying successive barycentric subdivisions to K and L, depending on the desired robustness. However, iterating barycentric subdivisions results in exponential growth of the number of simplices. Therefore, the storage and computational cost of the simplicial map ϕ c and of the simplicial-map neural network N ϕ grow exponentially.
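The blow-up is easy to quantify: one barycentric subdivision replaces each maximal d-simplex by (d + 1)! smaller ones, so k iterations multiply the simplex count by ((d + 1)!)^k. A two-line sketch (the function name is ours):

```python
from math import factorial

def maximal_simplices_after(n0, d, k):
    # k barycentric subdivisions of a pure d-complex with n0 maximal simplices.
    return n0 * factorial(d + 1) ** k

# Starting from 42 triangles (d = 2), as in the example of the next section:
print([maximal_simplices_after(42, 2, k) for k in range(4)])  # [42, 252, 1512, 9072]
```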
In order to avoid this problem, in this paper, we propose a method to reduce the storage and computational cost of the simplicial-map neural network N ϕ : |K| → |L| by removing points from the given labeled data set D while keeping exactly the same accuracy as N ϕ . The idea is to remove from K those simplices all of whose vertices belong to the same class. Consequently, the simplices with vertices on the decision boundary remain, leaving the decision boundary invariant.
Let us now formalize this idea. Let D = { (p j , ℓ j ) : j ∈ 1, n }, with p j ∈ R d and ℓ j ∈ E k , be a data set and let N ϕ be the simplicial-map neural network obtained using the process described above. Our aim is to obtain a subset D̃ ⊆ D that induces a simplicial-map neural network Ñφ with exactly the same behavior as N ϕ . The procedure is described in Algorithm 1.

Algorithm 1 Simplicial-map neural network optimization
Input: A labeled data set D, a convex polytope P surrounding D, and the simplicial-map neural network N ϕ : |K| → |L| that correctly classifies D;
Result: The simplicial-map neural network Ñφ;
M := ∅;
for each maximal simplex µ of K do
    if the vertices of µ do not all belong to the same class then
        add to M the vertices of µ that belong to D P ;
D̃ P := M; compute the Delaunay complex K̃ = D ( D̃ P ∪ V P );
compute Ñφ : | K̃ | → |L| from the vertex map ϕ (0) restricted to the vertices of K̃;

In Section 4, using a high-dimensional data set composed of digit images, we check experimentally that both simplicial-map neural networks Ñφ and N ϕ have the same behavior. The following partial result also supports that idea.
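The core reduction step of Algorithm 1 can be sketched with SciPy as follows. This is an illustrative implementation under our reading of the algorithm (the helper name `reduce_dataset` and the sentinel label for polytope vertices are our own choices), not the authors' reference code:

```python
import numpy as np
from scipy.spatial import Delaunay

def reduce_dataset(points, labels, polytope_vertices):
    # Triangulate D_P union V_P and keep only the data points lying in some
    # maximal simplex whose vertices do not all share the same label.
    pts = np.vstack([points, polytope_vertices])
    labs = np.concatenate([labels, np.full(len(polytope_vertices), -1)])
    tri = Delaunay(pts)
    keep = set()
    for simplex in tri.simplices:
        if len(set(labs[simplex])) > 1:            # mixed-class simplex survives
            keep.update(int(i) for i in simplex if labs[i] != -1)
    keep = sorted(keep)
    return points[keep], labels[keep]

rng = np.random.default_rng(2)
# Two well-separated classes: most interior points should be removed.
X = np.vstack([rng.normal([-3, 0], 0.5, (50, 2)), rng.normal([3, 0], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
P = np.array([[-8.0, -8.0], [8.0, -8.0], [8.0, 8.0], [-8.0, 8.0]])
Xr, yr = reduce_dataset(X, y, P)
print(len(X), len(Xr))
```

Recomputing the Delaunay complex on the reduced point set plus the polytope vertices then yields the smaller complex K̃ from which Ñφ is defined.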

Lemma 2.
Let D be a labeled data set, let N ϕ : |K| → |L| be the simplicial-map neural network that correctly classifies D, constructed following the method given in [17], and let Ñφ be the simplicial-map neural network obtained from Algorithm 1. If σ = {v 0 , . . . , v n } ∈ K satisfies that N ϕ (v i ) ≠ N ϕ (v j ) for some i ≠ j, then Ñφ(x) = N ϕ (x) for all x ∈ |σ|.
Proof. Let σ = {v 0 , . . . , v n } be a simplex of K such that N ϕ (v i ) ≠ N ϕ (v j ) for some i ≠ j. Then, σ is a face of a maximal simplex µ of K with all its vertices belonging to D̃ P ∪ V P . Therefore, µ is a maximal simplex of K̃ (by Lemma 1), and Ñφ(x) = N ϕ (x) for any x ∈ |µ|. Since σ is a face of µ, then Ñφ(x) = N ϕ (x) for any x ∈ |σ|.
In order to illustrate Algorithm 1, let us consider the two-dimensional labeled data set D given in Figure 1. Let us consider a square surrounding the data set as the convex polytope P, and let us compute the Delaunay complex K = D (D P ∪ V P ) as shown in Figure 2. Then, K is composed of 24 points and 42 2-simplices. Applying Algorithm 1 is equivalent to removing those 2-simplices of K whose vertices all belong to the same class. Then, we consider only the vertices of the surviving 2-simplices, and the Delaunay complex is computed again. In this case, the resulting simplicial complex is composed of 18 points and 30 2-simplices (see Figure 2).

Lemma 3.
If the points of D P ∪ V P are in general position, then the reduced simplicial-map neural network Ñφ can always be computed from Algorithm 1.
Proof. If the points of D P ∪ V P are in general position, then any subset of points of D P ∪ V P is in general position, so the Delaunay triangulation of D̃ P ∪ V P can always be computed, as well as the simplicial-map neural network Ñφ.

Figure 2. On the left, the Delaunay triangulation of the data set D given in Figure 1, together with the vertices of the square polytope surrounding D. On the right, the Delaunay triangulation of a subset of D obtained as described in Algorithm 1. As we can see, the triangles whose vertices all belonged to the same class disappeared.
Let us notice that, depending on the distribution of the data set, the reduction obtained after applying Algorithm 1 may or may not be significant. Specifically, if the different classes of D are not mixed, then we can expect good results from Algorithm 1. The reduction will be optimal when the data set is separable and dense. In such a case, most of the simplices would have vertices of the same class and be removed when Algorithm 1 is applied. Examples of these two opposite cases are shown in Figure 3.

Figure 3. (1) A data set D; (2) the data set obtained after applying Algorithm 1 to D; (3) a data set that cannot be reduced by applying Algorithm 1.

Experiments
In this section, a high-dimensional data set composed of digit images is considered. For visualization purposes, the data set is first embedded to obtain a low-dimensional representation using the UMAP algorithm [24]. The data set is composed of 1797 greyscale images of dimension 8 × 8. These images represent digits from 0 to 9. In Figure 4, some of the images are shown and, in Figure 5, the two-dimensional UMAP output representing the full data set is displayed. In order to illustrate our method with a graphical intuition, we focus on the 2D representation of the digits data set, but the construction can be carried out for inputs of any dimension.
Let us focus on the 1797 two-dimensional points of the UMAP representation of the digits data set D depicted in Figure 5, and let us consider a square P surrounding such a cloud of points D P . According to [17], a simplicial-map neural network N ϕ can be built to correctly classify D. Now, let us apply Algorithm 1 to obtain a simplified version of N ϕ that also correctly classifies D. In this way, all of the points in D P surrounded by points belonging to the same class were removed, obtaining a reduced data set D̃ inducing the same simplicial-map neural network as D. In Figure 5, the two-dimensional representation of the reduced data set is shown. The next step is the computation of the Delaunay triangulation using the data set D̃ and the vertices of the square P. In Figure 6, the Delaunay triangulation is shown for both the original and the simplified data sets. The Delaunay triangulation of the original data set is composed of 3596 2-simplices, whereas that of the simplified data set is composed of 604 2-simplices and 305 points, reaching a remarkable reduction in the number of simplices. The results are summarized in Table 1. Finally, the induced simplicial-map neural networks were experimentally compared, obtaining exactly the same performance.
Lastly, Algorithm 1 was experimentally tested on synthetically generated two- and three-dimensional data sets. The numerical results can be found in Tables 2 and 3, respectively. Let us point out that, for the three-dimensional data set with the largest number of points, the reduced data set is approximately 73% smaller while inducing the same simplicial-map neural network.
The code of the experimentation can be consulted at https://github.com/Cimagroup/DelaunayTriangAndNN (accessed on 30 August 2021).

Figure 4. Some of the 1797 images used in the experiment. The images are labeled from 0 to 9 in a natural way. Each image is grey-scaled and has 8 × 8 pixels, so it can be represented as a point in R 64 . In order to visualize these 1797 64-dimensional points, R 64 has been projected into R 2 using the UMAP algorithm. Figure 5 shows the projection onto R 2 of the 1797 images.

Conclusions
Simplicial-map neural networks are a recent neural network architecture based on simplicial maps defined between a triangulation of the given data set and a simplicial complex encoding the classification problem. These neural networks are refined by applying barycentric subdivisions to ensure their robustness. The iterative application of barycentric subdivisions increases the number of simplices exponentially and, therefore, the width of the neural network also increases exponentially. In this paper, we have provided a way to reduce the number of simplices while maintaining the performance of the neural network. The proposed method has been experimentally tested. As further work, we plan to formally prove that our optimized simplicial-map neural network Ñφ is equivalent to the original one N ϕ .