Simplicial-Map Neural Networks Robust to Adversarial Examples

Paluzo-Hidalgo, Eduardo; Gonzalez-Diaz, Rocio; Gutiérrez-Naranjo, Miguel A.; Heras, Jónathan

doi:10.3390/math9020169


by Eduardo Paluzo-Hidalgo ^1,*,†, Rocio Gonzalez-Diaz ^1,†,‡, Miguel A. Gutiérrez-Naranjo ^2,†,‡ and Jónathan Heras ^3,‡

¹ Department of Applied Mathematics I, University of Seville, 41012 Seville, Spain

² Department of Computer Sciences and Artificial Intelligence, University of Seville, 41012 Seville, Spain

³ Department of Mathematics and Computer Sciences, University of La Rioja, 26006 Logroño, Spain

^* Author to whom correspondence should be addressed.

^† These authors are partially supported by MICINN, FEDER/UE under grant PID2019-107339GB-100.

^‡ These authors contributed equally to this work.

Mathematics 2021, 9(2), 169; https://doi.org/10.3390/math9020169

Submission received: 11 December 2020 / Revised: 8 January 2021 / Accepted: 12 January 2021 / Published: 15 January 2021

(This article belongs to the Special Issue Computational Algebraic Topology and Neural Networks in Computer Vision)


Abstract

Broadly speaking, an adversarial example against a classification model occurs when a small perturbation on an input data point produces a change on the output label assigned by the model. Such adversarial examples represent a weakness for the safety of neural network applications, and many different solutions have been proposed for minimizing their effects. In this paper, we propose a new approach by means of a family of neural networks called simplicial-map neural networks constructed from an Algebraic Topology perspective. Our proposal is based on three main ideas. Firstly, given a classification problem, both the input dataset and its set of one-hot labels will be endowed with simplicial complex structures, and a simplicial map between such complexes will be defined. Secondly, a neural network characterizing the classification problem will be built from such a simplicial map. Finally, by considering barycentric subdivisions of the simplicial complexes, a decision boundary will be computed to make the neural network robust to adversarial attacks of a given size.

Keywords: algebraic topology; neural network; adversarial examples

1. Introduction

Adversarial examples are currently one of the main problems for the robustness of neural network applications [1]. Broadly speaking, an adversarial example against a classification model occurs when a small perturbation on an input data point produces a change in its classification. Adversarial examples are usually associated with computer vision tasks [2]. In this context, "small" generally refers to changes that are not appreciable by human perception. Recently, several studies have shown that adversarial examples also appear in other contexts such as natural language processing [3], multivariate time series [4] or recommendation systems [5]. Therefore, the study of adversarial examples is of undoubted importance for building reliable models, and also leads us to wonder about the mechanisms of our brain and the differences between artificial and natural classification processes.

Since the discovery of adversarial examples as a weakness for the safety of neural network models in real-world problems, many attacks and defenses have been proposed [6], each of which builds on the other. One of the most popular approaches to study models’ robustness and adversarial examples is based on the concept of margin. From a geometrical point of view, data are points in a d-dimensional metric space and a classifier splits such a metric space into regions. In a classification problem, each region is associated with a label and all the points in such a region are classified with the corresponding label. Roughly speaking, the margin of the model is the minimum distance between the training data and the decision boundary (i.e., the set of regions’ boundaries). In the literature, there are many approaches that try to maximize such a margin (see, for instance, [7]). The concept of margin in neural networks is strongly influenced by its use in Support Vector Machines (SVMs) [8]. In this vein, in [9], the final softmax layer of the neural network is replaced with a linear SVM. In [10], a way of reducing empirical margin errors was proposed, and in [11], the discriminability of Deep Neural Network (DNN) features is enhanced via an ensemble strategy.

In this paper, we explore this idea of margin between regions associated with labels from a novel point of view. To the best of our knowledge, this is the first time that adversarial examples are studied with techniques from Algebraic Topology. Our approach can be summarized as follows. The starting point is a classification problem where the data are d-dimensional vectors which are mapped onto a set of k labels with a one-hot representation. From a topological point of view, such a set of instances can be seen as the vertices of a simplicial complex embedded in a bounded polytope in $\mathbb{R}^{d}$ (details are given below), and the set of k one-hot labels can be endowed with the structure of a k-dimensional simplex (in fact, $k + 1$ labels are considered since we add an unknown label to the set of one-hot labels). In this way, a simplicial map between both topological structures arises in a natural manner, since each vertex in the simplicial complex corresponds to an instance of the dataset, and it is mapped onto the vertex of the simplex that represents the corresponding label. The second step is to apply the extended Simplicial Approximation Theorem [12], which allows us to provide a constructive proof of the Universal Approximation Theorem, obtaining a neural network that correctly classifies all the instances of the dataset. As shown in [12], all the weights of such a neural network can be computed directly from the simplicial complexes without any kind of training process. Finally, by considering these ideas together with a subdivision process of the simplices, a decision boundary for the classification problem will be computed to make the neural network robust to adversarial attacks of a given size, since the mesh of a simplicial complex can be bounded by the number of subdivisions and the neural network obtained is based on the simplices used in the simplicial map.

Regarding other approaches to these ideas found in the literature, in [13], the authors proved the existence of a two-hidden-layer neural network which can approximate any continuous multivariable function with arbitrary precision, and, in [14], they provided a constructive method through a numerical analysis approach. Therefore, such papers can be seen as alternative constructive proofs of the Universal Approximation Theorem where no adversarial examples on classification problems were considered. A related approach that uses simplicial complexes to feed a neural network is the Simplicial Neural Network (SNN) introduced in [15], a generalization of Graph Neural Networks (GNNs). Compared to GNNs, SNNs exploit higher-order relationships between the input data because the data are represented using simplicial complexes. Let us observe that, despite the similar name, our approach has a totally different goal.

The paper is organized as follows. In Section 2, all the basic concepts needed to understand the rest of the paper are presented. In Section 3, we introduce the concept of simplicial-map neural networks. Their use to build neural networks for classification tasks robust to adversarial attacks of a given size is presented in Section 4. The paper ends with conclusions and future works listed in Section 5.

2. Background

In this section, some of the preliminary concepts from Algebraic Topology and Neural Networks are recalled. Several useful references for this section are [16,17,18,19]. Let us notice that, in order to provide a bridge between Algebraic Topology and Neural Network, some concepts need to be reinterpreted.

Firstly, let us state some basic notation. Given two integers $j \leq m$, let $⟦ j, m ⟧ := \{i \in \mathbb{Z} : j \leq i \leq m\}$. Hereafter, let $k > 0$ be an integer and let $e_{0}^{k} := (0, \dots, 0)$ be the origin of the Euclidean space $\mathbb{R}^{k}$. Let a one-hot vector of length k be denoted as $e_{i}^{k} = (0, \overset{i - 1}{\dots}, 0, 1, 0, \overset{k - i}{\dots}, 0)$ with $i \in ⟦ 1, k ⟧$. Let $E^{k} := \{e_{i}^{k} : i \in ⟦ 1, k ⟧\}$ be the set of all the one-hot vectors of length k. Let us observe that $E^{k + 1} = \{e_{i}^{k} \times 0 : i \in ⟦ 1, k ⟧\} \cup \{e_{0}^{k} \times 1\}$ where $e_{i}^{k} \times 0 := (0, \overset{i - 1}{\dots}, 0, 1, 0, \overset{k - i}{\dots}, 0, 0) = e_{i}^{k + 1}$ for $i \in ⟦ 1, k ⟧$ and $e_{0}^{k} \times 1 := (0, \overset{k}{\dots}, 0, 1) = e_{k + 1}^{k + 1}$.
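As a small illustration of this notation, the following sketch builds one-hot vectors as tuples and checks the stated decomposition of $E^{k + 1}$ for one value of k. The function names `one_hot` and `extended_one_hots` are hypothetical helpers introduced only for this example.

```python
def one_hot(i, k):
    """Return e_i^k as a tuple; i = 0 gives the origin of R^k."""
    if not 0 <= i <= k:
        raise ValueError("index i must lie in [0, k]")
    return tuple(1 if pos == i else 0 for pos in range(1, k + 1))


def extended_one_hots(k):
    """Build E^{k+1} as {e_i^k x 0 : i in [1, k]} together with e_0^k x 1."""
    lifted = {one_hot(i, k) + (0,) for i in range(1, k + 1)}
    unknown = one_hot(0, k) + (1,)   # the extra vertex used later for the unknown label
    return lifted | {unknown}


k = 3
E_next = {one_hot(i, k + 1) for i in range(1, k + 2)}
assert extended_one_hots(k) == E_next
```

The appended last coordinate is what later allows an extra "unknown" label to be attached to the vertices of the bounding polytope.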

Now, we recall different fundamental structures such as polytopes and simplicial complexes. Convex polytopes can be seen as a generalization in any dimension of the notion of polygons.

Definition 1.

The convex hull of a set $S \subset \mathbb{R}^{d}$, denoted by $\operatorname{conv}(S)$, is the smallest convex set containing S. A convex polytope $P$ in $\mathbb{R}^{d}$ is the convex hull of a finite set of points. Besides, the set of vertices of a convex polytope $P$ is the minimum set $V_{P}$ of points in $P$ such that $P = \operatorname{conv}(V_{P})$.

Accordingly, a convex polytope $P = \operatorname{conv}(S)$ is a closed bounded subset of $\mathbb{R}^{d}$, and the set of vertices $V_{P}$ of a convex polytope $P$ always exists and is unique. Simplices are a particular case of convex polytopes. Geometrically, a simplex is a generalization of a triangle to any dimension. For example, a 0-simplex is a point, a 1-simplex is a line segment, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and so on. In this paper, all the considered simplicial complexes have their vertices in the Euclidean space $\mathbb{R}^{d}$. Nevertheless, simplicial complexes can be defined abstractly.

Definition 2.

Let us consider a finite set V whose elements will be called vertices. A simplicial complex K consists of a finite collection of nonempty subsets (called simplices) of V such that:

1. Any subset of V with exactly one point of V is a simplex of K, called a 0-simplex or vertex.
2. Any nonempty subset of a simplex σ is a simplex, called a face of σ.

A simplex σ with exactly $k + 1$ points is called a k-simplex. We also say that the dimension of σ is k and write $\dim σ = k$. A maximal simplex of K is a simplex that is not a face of any other simplex in K. The dimension of K is denoted by $\dim K$ and it is the maximum dimension of its maximal simplices. The set of vertices of a simplicial complex K will be denoted by $K^{(0)}$. For a vertex v of V, the star of v is the set of simplices having v as a face and it is denoted by $\operatorname{st} v$. A simplicial complex K is pure if all its maximal simplices have the same dimension.
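The notions in Definition 2 can be made concrete with a minimal sketch that stores an abstract simplicial complex as a set of `frozenset`s. The helper names (`closure`, `maximal`, `star`, `is_pure`) and the toy complex are assumptions made for illustration only.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All nonempty faces of the given simplices (condition 2 of Definition 2)."""
    simplices = set()
    for sigma in maximal_simplices:
        sigma = tuple(sigma)
        for size in range(1, len(sigma) + 1):
            simplices.update(frozenset(face) for face in combinations(sigma, size))
    return simplices


def dimension(simplex):
    """A simplex with k + 1 points has dimension k."""
    return len(simplex) - 1


def maximal(K):
    """Simplices of K that are not a proper face of another simplex of K."""
    return {s for s in K if not any(s < other for other in K)}


def star(K, v):
    """Simplices of K having the vertex v as a face."""
    return {s for s in K if v in s}


def is_pure(K):
    """True when all maximal simplices share the same dimension."""
    return len({dimension(s) for s in maximal(K)}) == 1


# A triangle {a, b, c} glued to an edge {c, d}: dimension 2, but not pure.
K = closure([("a", "b", "c"), ("c", "d")])
```

In this toy complex the maximal simplices are the triangle and the edge, so `is_pure(K)` is false, while the closure of the triangle alone is pure.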

Let us consider a simplicial complex K whose vertices are in $\mathbb{R}^{d}$. If a k-simplex $σ$ of K is a set of affinely independent points, then its realization $|σ|$ is the convex polytope $|σ| = \operatorname{conv}(σ)$, which is the convex hull of its $k + 1$ vertices. If all the simplices of K have a realization in $\mathbb{R}^{d}$ such that the intersection of two realizations is the realization of a simplex of K, then the union of their realizations is a subspace of $\mathbb{R}^{d}$ denoted by $|K|$ and called the embedding of K in $\mathbb{R}^{d}$.

Next, the definition of triangulation of a convex polytope is recalled.

Definition 3.

A triangulation of a convex polytope $P$ is a simplicial complex K such that $|K| = P$.

Let us recall that, given a set $S = \{p_{1}, \dots, p_{n}\}$ of points in $\mathbb{R}^{d}$, its barycenter, denoted by $\operatorname{bar} S$, is $\operatorname{bar} S := \frac{1}{n} \sum_{i \in ⟦ 1, n ⟧} p_{i} \in \mathbb{R}^{d}$. In particular, $\operatorname{bar} \{p\} = p$ for $p \in \mathbb{R}^{d}$. The barycentric subdivision of a simplicial complex will be the main tool to refine the neural network in the next sections and consists of getting a new simplicial complex by splitting the simplices in a standard way (see Figure 1). The t-th iteration of the barycentric subdivision of a simplicial complex K will be denoted by $\operatorname{Sd}^{t} K$, with $\operatorname{Sd}^{0} K := K$. The next definition formalizes this idea.

Definition 4.

Let K be a simplicial complex with vertices in $\mathbb{R}^{d}$. The barycentric subdivision $\operatorname{Sd} K$ is the simplicial complex defined as follows. The set $(\operatorname{Sd} K)^{(0)}$ of vertices of $\operatorname{Sd} K$ is the set of barycenters of all the simplices of K. The simplices of $\operatorname{Sd} K$ are the finite nonempty collections of $(\operatorname{Sd} K)^{(0)}$ which are totally ordered by the face relation in K. That is, any k-simplex σ of $\operatorname{Sd} K$ can be written as an ordered set $\{w_{0}, \dots, w_{k}\}$ such that $w_{i} = \operatorname{bar} μ_{i}$, being $μ_{i}$ a face of $μ_{j} \in K$ for $i, j \in ⟦ 0, k ⟧$ and $i < j$. In particular, if σ is maximal then there exists a k-simplex $\{u_{0}, \dots, u_{k}\} \in K$ satisfying that $w_{i} = \operatorname{bar} \{u_{0}, \dots, u_{i}\}$ for $i \in ⟦ 0, k ⟧$.
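To make the last statement of Definition 4 concrete, here is a minimal sketch that subdivides a single simplex: every ordering $(u_{0}, \dots, u_{k})$ of its vertices gives one maximal simplex of the subdivision with $w_{i} = \operatorname{bar} \{u_{0}, \dots, u_{i}\}$. The function names and the example triangle are assumptions for illustration; exact rational arithmetic is used so that equal barycenters compare equal.

```python
from fractions import Fraction
from itertools import permutations


def barycenter(points):
    """bar S: coordinate-wise mean of a finite list of points."""
    n = len(points)
    return tuple(sum(Fraction(c) for c in coords) / n for coords in zip(*points))


def subdivide_simplex(simplex):
    """Maximal simplices of Sd of one simplex given as a list of vertices.

    Each ordering (u_0, ..., u_k) of the vertices yields the simplex
    {w_0, ..., w_k} with w_i = bar {u_0, ..., u_i}.
    """
    pieces = set()
    for order in permutations(simplex):
        chain = [barycenter(order[: i + 1]) for i in range(len(order))]
        pieces.add(frozenset(chain))
    return pieces


triangle = [(0, 0), (3, 0), (0, 3)]
pieces = subdivide_simplex(triangle)      # the smaller triangles of Sd
vertices = set().union(*pieces)           # barycenters of all faces of the triangle
```

A k-simplex has $(k + 1)!$ vertex orderings, so a triangle splits into six smaller triangles whose vertices are the seven barycenters of its faces.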

Let us recall the definition of the Voronoi diagram of a set of points.

Definition 5.

Let $S = \{p_{1}, \dots, p_{n}\}$ be a set of points in $\mathbb{R}^{d}$. The Voronoi cell $V(p_{i}, S)$ is defined as:

$$V(p_{i}, S) := \{x \in \mathbb{R}^{d} : \| x - p_{i} \| \leq \| x - p_{j} \|, \ \forall p_{j} \in S\}.$$

Then, the Voronoi diagram of S, denoted as $V(S)$, is the set of Voronoi cells:

$$V(S) := \{V(p_{1}, S), \dots, V(p_{n}, S)\}.$$
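Definition 5 translates directly into a membership test: x lies in $V(p_{i}, S)$ exactly when $p_{i}$ is a nearest point of S to x. The sketch below is illustrative only; the function names are hypothetical, and squared distances are compared so that integer inputs stay exact.

```python
def squared_distance(x, p):
    """Squared Euclidean distance, which preserves the ordering of distances."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def voronoi_cells_containing(x, S):
    """Indices i such that x lies in the Voronoi cell V(p_i, S)."""
    distances = [squared_distance(x, p) for p in S]
    nearest = min(distances)
    # Exact equality is reliable for integer or rational input;
    # floating-point input would need a tolerance instead.
    return [i for i, dist in enumerate(distances) if dist == nearest]


S = [(0, 0), (4, 0), (0, 4)]
```

A point that belongs to several cells at once, such as `(2, 2)` here, witnesses a nonempty intersection of Voronoi cells.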

From the Voronoi diagram $V(S)$, a particular simplicial complex, called the Delaunay complex of S and denoted as $D(S)$, can be constructed. Both structures can be computed in time $Θ(n \log n + n^{\lceil d / 2 \rceil})$ (see [17], Chapter 4).

Definition 6.

Given a finite set of points $S = \{p_{1}, \dots, p_{n}\}$ in $\mathbb{R}^{d}$ and its Voronoi diagram $V(S) = \{V(p_{1}, S), \dots, V(p_{n}, S)\}$, the Delaunay complex of S can be defined as:

$$D(S) := \{ς \subseteq S : \cap_{p \in ς} V(p, S) \neq \emptyset\}.$$

The Delaunay complex is a well-defined concept in the sense that $D(S)$ is always a simplicial complex [17]. Usually, a finite set of points $S \subset \mathbb{R}^{d}$ is said to be in general position when any subset of S with size at most $d + 1$ is a set of affinely independent points. When the set of points $S \subset \mathbb{R}^{d}$ is in general position, the embedding of the Delaunay complex $D(S)$ in $\mathbb{R}^{d}$ is a triangulation of $P = \operatorname{conv}(S)$. Figure 2 shows an example of the computation of a triangulation of a convex polytope $P$, namely the Delaunay complex of the set of vertices of $P$ together with a labelled dataset lying in the interior of $P$.
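For points in the plane, the condition of Definition 6 can be checked by brute force: three non-collinear points span a Delaunay triangle exactly when their circumcenter, the only point equidistant from all three, has no point of S strictly closer to it. The sketch below assumes integer coordinates and is not the $Θ(n \log n + n^{\lceil d / 2 \rceil})$ algorithm referenced above; all names and the sample points are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def circumcenter(a, b, c):
    """Circumcenter of a triangle with integer vertices, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    na, nb, nc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = Fraction(na * (by - cy) + nb * (cy - ay) + nc * (ay - by), d)
    uy = Fraction(na * (cx - bx) + nb * (ax - cx) + nc * (bx - ax), d)
    return ux, uy


def delaunay_triangles(S):
    """Index triples whose three Voronoi cells share a point (the circumcenter)."""
    triangles = []
    for i, j, k in combinations(range(len(S)), 3):
        center = circumcenter(S[i], S[j], S[k])
        if center is None:
            continue
        radius2 = sum((u - a) ** 2 for u, a in zip(center, S[i]))
        # The circumcenter lies in the three cells iff no point of S is closer to it.
        if all(sum((u - a) ** 2 for u, a in zip(center, p)) >= radius2 for p in S):
            triangles.append((i, j, k))
    return triangles


# Three polytope vertices plus one interior data point.
S = [(0, 0), (4, 0), (0, 4), (1, 1)]
triangles = delaunay_triangles(S)
```

Only the maximal simplices are listed; their edges and vertices belong to the complex as faces. In this example the interior point splits the large triangle into three Delaunay triangles, and the large triangle itself is rejected because the interior point lies inside its circumcircle.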

Let us see now how to define maps between simplicial complexes.

Definition 7.

Given two simplicial complexes K and L, a vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$ is a function from the vertices of K to the vertices of L such that for any simplex $σ \in K$, the set $φ(σ) := \{v \in L^{(0)} : \exists u \in σ, φ^{(0)}(u) = v\}$ is a simplex of L.

Let us observe that $φ(u) = φ^{(0)}(u)$ if $u \in K^{(0)}$ and that the composition of vertex maps is a vertex map. Let us see now that a vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$ can always be extended to a continuous function $φ^{c} : |K| \to |L|$ satisfying that if $x = \operatorname{bar} σ$ then $φ^{c}(x) = \operatorname{bar} φ(σ)$.

Definition 8.

The simplicial map $φ^{c} : |K| \to |L|$ induced by the vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$ is a continuous function defined as follows. Let $x \in |K|$. Then,

$$φ^{c}(x) := \sum_{i \in ⟦ 0, k ⟧} λ_{i} φ^{(0)}(u_{i}),$$

being $λ_{i} \geq 0$, for all $i \in ⟦ 0, k ⟧$, such that

$$\sum_{i \in ⟦ 0, k ⟧} λ_{i} = 1 \quad \text{and} \quad x = \sum_{i \in ⟦ 0, k ⟧} λ_{i} u_{i},$$

where $σ = \{u_{0}, \dots, u_{k}\}$ is a simplex of K such that $x \in |σ|$.
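Definition 8 can be evaluated numerically by solving for the barycentric coordinates $λ_{i}$ of x in a maximal simplex that contains it. The following is a minimal sketch under the assumption that K is given by its maximal d-simplices in $\mathbb{R}^{d}$; the helper names, the two-triangle complex and the vertex map are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A z = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


def barycentric(simplex, x):
    """Coordinates (lambda_0, ..., lambda_d) of x with respect to a d-simplex in R^d."""
    d = len(x)
    A = [[u[c] for u in simplex] for c in range(d)] + [[1] * len(simplex)]
    return solve(A, list(x) + [1])


def simplicial_map(maximal_simplices, vertex_map, x):
    """Evaluate phi^c(x) = sum_i lambda_i * phi^(0)(u_i) on a simplex containing x."""
    for simplex in maximal_simplices:
        lam = barycentric(simplex, x)
        if all(l >= 0 for l in lam):
            images = [vertex_map[u] for u in simplex]
            return tuple(sum(l * img[c] for l, img in zip(lam, images))
                         for c in range(len(images[0])))
    raise ValueError("x lies outside |K|")


# Two triangles covering a square, mapped onto the vertices of a 2-simplex in R^3.
K = [((0, 0), (2, 0), (0, 2)), ((2, 0), (0, 2), (2, 2))]
vertex_map = {(0, 0): (1, 0, 0), (2, 0): (0, 1, 0), (0, 2): (0, 1, 0), (2, 2): (0, 0, 1)}
```

For instance, the point `(1, 0)` is the midpoint of an edge whose endpoints have different images, so its image is the midpoint of those two images.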

Next, we recall one of the key ideas in this paper. Simplicial maps can be used to approximate continuous functions as closely as desired.

Definition 9.

Let K and L be simplicial complexes and $g : |K| \to |L|$ a continuous function. A simplicial map $φ^{c} : |K| \to |L|$ induced by a vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$ is a simplicial approximation of g if $g(|\operatorname{st} v|) \subseteq |\operatorname{st} φ(v)|$ for each vertex v of K.

Let us notice that $|\operatorname{st} x|$ is thought of here as an open set of points. That is, $|\operatorname{st} x| := \cup_{σ \in \operatorname{st} x} \operatorname{int} σ$.

Theorem 1 (Simplicial Approximation Theorem; [20], p. 56).

If $g : |K| \to |L|$ is a continuous function between the underlying spaces of two simplicial complexes K and L, then there is a sufficiently large integer $t > 0$ such that $φ^{c} : |\operatorname{Sd}^{t} K| \to |L|$ is a simplicial approximation of g.

In Figure 3, an example of a simplicial approximation is provided. Theorem 1 was extended in [12] by introducing a bound to the distance between the continuous function and its simplicial approximation.

Proposition 1 (Simplicial Approximation Theorem Extension [12]).

Given $ϵ > 0$ and a continuous function $g : |K| \to |L|$ between the underlying spaces of two simplicial complexes K and L, there exist $s, t > 0$ such that $φ^{c} : |\operatorname{Sd}^{s} K| \to |\operatorname{Sd}^{t} L|$ is a simplicial approximation of g and $\| g - φ^{c} \| \leq ϵ$.

Once concepts from Algebraic Topology have been stated, let us provide the definition of neural network, and a connection between these two fields using results from [12].

Definition 10 (adapted from [19]).

Given $d, k > 0$, a multi-layer feed-forward network defined between spaces $X \subseteq \mathbb{R}^{d}$ and $Y \subseteq \mathbb{R}^{k}$ is a function $N : X \to Y$ composed of $m + 1$ functions:

$$N = f_{m + 1} \circ f_{m} \circ \dots \circ f_{1}$$

where the integer $m > 0$ is the number of hidden layers and, for $i \in ⟦ 1, m + 1 ⟧$, the function $f_{i} : X_{i - 1} \to X_{i}$ is defined as

$$f_{i}(y) := ϕ_{i}(W^{(i)}; y; b_{i})$$

where $X_{0} = X$, $X_{m + 1} = Y$, and $X_{i} \subseteq \mathbb{R}^{d_{i}}$ for $i \in ⟦ 1, m ⟧$; $d_{0} = d$, $d_{m + 1} = k$, and $d_{i} > 0$ being an integer for $i \in ⟦ 1, m ⟧$ (called the width of the i-th hidden layer); $W^{(i)} \in M_{d_{i - 1} \times d_{i}}$ being a real-valued $d_{i - 1} \times d_{i}$ matrix (called the matrix of weights of $N$); $b_{i}$ being a point in $\mathbb{R}^{d_{i}}$ (called the bias term); and $ϕ_{i}$ being a function (called the activation function).

In the literature, many other definitions of neural networks are available. The field is continuously adding new ideas and there is no general definition that covers all the possible approaches. Nevertheless, many of the problems where neural networks are applied consist of finding a set of weights and biases once the remaining features of the neural network (number of hidden layers, their dimensions, and activation functions) have been fixed at the beginning of the problem. As usual, such a set of features of the neural network beyond the weights and the biases will be called the architecture of the neural network. A constructive method for approximating multidimensional functions with neural networks was provided in [12]. Such networks have two hidden layers and the weights are not obtained by a training method, but are determined by a given simplicial map.

Theorem 2 (Theorem 4 of [12]).

Let us consider a simplicial map $φ^{c} : |K| \to |L|$ between the embeddings of two finite pure simplicial complexes K and L of dimension d and k, respectively. Then a two-hidden-layer feed-forward network $N_{φ} : |K| \to |L|$ such that $N_{φ}(x) = φ^{c}(x)$ for all $x \in |K|$ can be explicitly defined.

The construction of the neural network given in [12] to prove Theorem 2 gives rise to the concept of simplicial-map neural network introduced in the next section.

3. Simplicial-Map Neural Networks

The explicit construction of the neural network given in [12] is the main tool used in this paper for computing neural networks robust to adversarial attacks. Such a concrete construction is called simplicial-map neural network.

Definition 11.

Let K and L be two finite pure simplicial complexes of dimension d and k, respectively. Let us consider the simplicial map $φ^{c} : |K| \to |L|$ induced by a vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$. Let $\{σ_{1}, \dots, σ_{n}\}$ be the maximal simplices of K, where $σ_{s} = \{u_{0}^{s}, \dots, u_{d}^{s}\}$ and $u_{h}^{s} \in \mathbb{R}^{d}$ for $s \in ⟦ 1, n ⟧$ and $h \in ⟦ 0, d ⟧$. Let $\{μ_{1}, \dots, μ_{m}\}$ be the maximal simplices of L, where $μ_{j} = \{v_{0}^{j}, \dots, v_{k}^{j}\}$ and $v_{h}^{j} \in \mathbb{R}^{k}$ for $j \in ⟦ 1, m ⟧$ and $h \in ⟦ 0, k ⟧$. The simplicial-map neural network induced by $φ^{c}$ is a two-hidden-layer feed-forward neural network denoted by $N_{φ}$ with the following architecture:

an input layer composed of $d_{0} = d$ neurons;
a first hidden layer composed of $d_{1} = n (d + 1)$ neurons;
a second hidden layer composed of $d_{2} = m (k + 1)$ neurons; and
an output layer with $d_{3} = k$ neurons.

Then, $N_{φ} = f_{3} \circ f_{2} \circ f_{1}$, being

$$f_{i}(y) = ϕ_{i}(W^{(i)}; y; b_{i}), \quad \text{for } i \in ⟦ 1, 3 ⟧.$$

Firstly,

$$W^{(1)} = \begin{pmatrix} W_{1}^{(1)} \\ \vdots \\ W_{n}^{(1)} \end{pmatrix} \in M_{n (d + 1) \times d}$$

being, for each $s \in ⟦ 1, n ⟧$,

$$\begin{pmatrix} W_{s}^{(1)} & | & B_{s} \end{pmatrix} = \begin{pmatrix} u_{0}^{s} & \dots & u_{d}^{s} \\ 1 & \dots & 1 \end{pmatrix}^{-1} \in M_{(d + 1) \times (d + 1)}$$

where $W_{s}^{(1)} \in M_{(d + 1) \times d}$ and $B_{s} \in \mathbb{R}^{d + 1}$. The bias term $b_{1} \in \mathbb{R}^{n (d + 1)}$ is

$$b_{1} = \begin{pmatrix} B_{1} \\ \vdots \\ B_{n} \end{pmatrix}$$

and the function $ϕ_{1}$ is then defined as:

$$ϕ_{1}(W^{(1)}; y; b_{1}) := W^{(1)} y + b_{1}.$$

Secondly, $W^{(2)} = (W_{h, ℓ}^{(2)}) \in M_{m (k + 1) \times n (d + 1)}$ where

$$W_{h, ℓ}^{(2)} := \begin{cases} 1 & \text{if } φ^{(0)}(u_{t}^{s}) = v_{r}^{j}, \\ 0 & \text{otherwise;} \end{cases}$$

being $h = j (r + 1)$ and $ℓ = s (t + 1)$ for $s \in ⟦ 1, n ⟧$; $j \in ⟦ 1, m ⟧$; $t \in ⟦ 0, d ⟧$; and $r \in ⟦ 0, k ⟧$. The bias term $b_{2} \in \mathbb{R}^{m (k + 1)}$ is null and the function $ϕ_{2}$ is defined as:

$$ϕ_{2}(W^{(2)}; y; b_{2}) := W^{(2)} y.$$

Thirdly, $W^{(3)} = \begin{pmatrix} W_{1}^{(3)} & \dots & W_{m}^{(3)} \end{pmatrix} \in M_{k \times m (k + 1)}$, being

$$W_{j}^{(3)} := \begin{pmatrix} v_{0}^{j} & \dots & v_{k}^{j} \end{pmatrix} \quad \text{for } j \in ⟦ 1, m ⟧,$$

the bias term $b_{3}$ is null, and $ϕ_{3}$ is defined as:

$$ϕ_{3}(W^{(3)}; y; b_{3}) := \frac{\sum_{j \in ⟦ 1, m ⟧} z^{j} ψ(y^{j})}{\sum_{j \in ⟦ 1, m ⟧} ψ(y^{j})}$$

being $z^{j} := W_{j}^{(3)} y^{j}$ for

$$y = \begin{pmatrix} y^{1} \\ \vdots \\ y^{m} \end{pmatrix} \in \mathbb{R}^{m (k + 1)}$$

and

$$ψ(y^{j}) := \begin{cases} 1 & \text{if all the coordinates of } y^{j} \text{ are} \geq 0, \\ 0 & \text{otherwise.} \end{cases}$$

In [12], it is proven that $N_{φ}(x)$ and $φ^{c}(x)$ coincide for all $x \in |K|$.
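The role of the first hidden layer can be illustrated with a short sketch: inverting the matrix whose columns are the vertices of a maximal simplex with a 1 appended, and splitting the inverse into a weight block and a bias column, yields an affine map that returns the barycentric coordinates of x with respect to that simplex. The code below covers only this first layer, not the full three-layer construction of [12]; the helper names and the two-triangle complex are assumptions for illustration.

```python
from fractions import Fraction


def inverse(A):
    """Exact inverse of a square matrix via Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]


def first_layer_block(simplex):
    """Split the inverse of (u_0 ... u_d; 1 ... 1) into a weight block W and a bias B."""
    d = len(simplex[0])
    A = [[u[c] for u in simplex] for c in range(d)] + [[1] * len(simplex)]
    inv = inverse(A)
    W = [row[:d] for row in inv]
    B = [row[d] for row in inv]
    return W, B


def first_layer(maximal_simplices, x):
    """W^(1) x + b_1, stacked over all maximal simplices: n (d + 1) values."""
    out = []
    for simplex in maximal_simplices:
        W, B = first_layer_block(simplex)
        out.extend(sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B))
    return out


K = [((0, 0), (2, 0), (0, 2)), ((2, 0), (0, 2), (2, 2))]
hidden = first_layer(K, (1, 0))
```

Each block of $d + 1$ outputs sums to 1, and a block has only nonnegative entries exactly when x lies in the corresponding simplex. Here the first block is nonnegative and the second contains a negative entry, since `(1, 0)` lies in the first triangle only.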

Proposition 2 ([12]).

Let K and L be two finite pure simplicial complexes of dimension $d > 0$ and $k > 0$, respectively. Let us consider the simplicial map $φ^{c} : |K| \to |L|$ induced by a vertex map $φ^{(0)} : K^{(0)} \to L^{(0)}$. Then, the simplicial-map neural network $N_{φ} : |K| \to |L|$ induced by the simplicial map $φ^{c}$ satisfies that $N_{φ}(x) = φ^{c}(x)$ for all $x \in |K|$.

4. Classification with Simplicial-Map Neural Networks

In this section, simplicial-map neural networks are considered as tools for classification tasks and for the study of adversarial examples. As usual, the classification problem will consist of finding a set of weights adapted to a labelled dataset given a fixed architecture.

Definition 12.

Let $n, d, k > 0$ be integers. A labelled dataset D is a finite set of pairs

$$D = \{(p_{j}, ℓ_{j}) : j \in ⟦ 1, n ⟧, p_{j} \in \mathbb{R}^{d}, ℓ_{j} \in E^{k}\}$$

where, for $j, h \in ⟦ 1, n ⟧$, $p_{j} \neq p_{h}$ if $j \neq h$, and $ℓ_{j}$ represents a one-hot vector. We say that $ℓ_{j}$ is the label of $p_{j}$ or, equivalently, that $p_{j}$ belongs to the class $ℓ_{j}$. Besides, we will denote by $D_{P}$ the ordered set of points $\{\langle p_{j} \rangle\}_{j}$.

The concept of supervised classification problem for neural networks can be defined as follows.

Definition 13.

Given a labelled dataset $D \subset \mathbb{R}^{d} \times E^{k}$, an integer $m > 0$, and activation functions $ϕ_{i}$ for $i \in ⟦ 1, m ⟧$, a supervised classification problem consists of looking for the weights $W^{(i)}$ and bias terms $b_{i}$ for $i \in ⟦ 1, m ⟧$, such that the associated neural network $N : X \to Y$, with $X \subseteq \mathbb{R}^{d}$, $Y \subseteq \mathbb{R}^{k}$ and $D \subseteq X \times Y$, satisfies:

$N (p) = ℓ$ for all $(p, ℓ) \in D$ .
$N$ maps $x \in X$ to a vector of scores $N (x) = (y_{1}, \dots, y_{k}) \in Y$ such that $y_{i} \in [0, 1]$ for $i \in ⟦ 1, k ⟧$ and $\sum_{i \in ⟦ 1, k ⟧} y_{i} = 1$ .

If such a neural network $N$ exists, we will say that $N$ characterizes D, or, equivalently, that $N$ correctly classifies D.

Let us remark that the success of a classification model such as a neural network is not usually measured by the correct classification of the input dataset, but by the correct classification of unseen examples (that is, pairs not in D) collected in a test set. In this paper, we chose such a restrictive definition since we are more interested in dealing with the problem of the robustness of neural networks against adversarial attacks than in the problem of overfitting. Besides, let us observe that, as usual, the scores can be interpreted as a probability distribution over the labels.

Remark 1.

It is known that some functions like the logistic sigmoid, the softmax or the softplus satisfy the properties of a probability distribution and they are broadly applied in deep learning models. Our function also behaves like a probability distribution which is adequate for multiclassification tasks.

Next, we provide the definition of some of the main concepts in this paper: the confidence set $T_{N}$, the classified set $C_{N}$, and the decision boundary $Γ_{N}$ of $N$. The intuition behind these concepts is that x belongs to the confidence set of $N$ if the output $N(x)$ is one of the possible one-hot vectors. If the output is a vector where the maximum is reached in exactly one coordinate, we say that x belongs to the classified set of $N$. Otherwise, the maximum is reached in two or more coordinates, i.e., the instance has equal probability of belonging to two or more output classes, and we say that x belongs to the decision boundary of $N$. Let us observe that $T_{N} \subseteq C_{N}$ and $C_{N} ⊔ Γ_{N} = X$.

Definition 14.

Let $d, k > 0$ be integers. Let $D \subset \mathbb{R}^{d} \times E^{k}$ be a labelled dataset and $N : X \to Y$ a neural network that characterizes D. Let $x \in X$, with $N(x) = (y_{1}, \dots, y_{k}) \in Y$. If there exists $j \in ⟦ 1, k ⟧$ such that $y_{j} > \max \{y_{i} : i \in ⟦ 1, k ⟧, i \neq j\}$, then we say that x belongs to the set $C_{N}^{j}$ and it has label $e_{j} \in E^{k}$ (with probability $y_{j}$). Moreover, we define $C_{N}$ to be the union of the sets $C_{N}^{j}$ for $j \in ⟦ 0, k ⟧$. Besides, when $y_{j} = 1$, we say that x belongs to the confidence set $T_{N}$. Finally, we say that x belongs to the decision boundary $Γ_{N}$ if there exists $j \in ⟦ 1, k ⟧$ such that $y_{j} = \max \{y_{i} : i \in ⟦ 1, k ⟧, i \neq j\}$.
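The three sets of Definition 14 amount to a simple test on the score vector $N(x)$. The sketch below is illustrative only: the function name `region` and its return convention are assumptions, and a point reported as "confidence" also belongs to the classified set, since $T_{N} \subseteq C_{N}$.

```python
def region(scores):
    """Return ('confidence' | 'classified', j) or ('boundary', None) for N(x).

    Labels j are 1-based, following Definition 14.
    """
    top = max(scores)
    winners = [j for j, y in enumerate(scores, start=1) if y == top]
    if len(winners) > 1:
        # The maximum is reached in two or more coordinates.
        return ("boundary", None)
    kind = "confidence" if top == 1 else "classified"
    return (kind, winners[0])
```

For example, `region((0, 1, 0))` reports the confidence set with label 2, while `region((0.5, 0.5, 0))` reports the decision boundary.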

The following is a key result to define a simplicial-map neural network that characterizes a given labelled dataset.

Proposition 3.

Let $d, k > 0$ be integers. Let L be the simplicial complex with only one maximal k-simplex $σ = \{v_{0}, \dots, v_{k}\}$ with $v_{i} = e_{i}^{k} \times 0$ for $i \in ⟦ 1, k ⟧$ and $v_{0} = e_{0}^{k} \times 1$. Let $D \subset \mathbb{R}^{d} \times E^{k}$ be a labelled dataset and let $V_{P}$ be the vertices of a convex polytope $P$ such that $D_{P} \subset P$. Let us assume that $D_{P}$ is in general position. Let $K = D(D_{P} \cup V_{P})$. Then, the map $φ^{(0)} : K^{(0)} \to L^{(0)}$ defined as follows is a vertex map:

$$φ^{(0)}(u) := \begin{cases} ℓ \times 0 & \text{if } (u, ℓ) \in D, \\ v_{0} & \text{if } u \in V_{P}. \end{cases}$$

Proof.

L is composed of a single maximal simplex and its faces. Any nonempty subset of vertices of a simplex is a simplex by definition. Then, any map from $K^{(0)}$ to $L^{(0)}$ is a vertex map. Specifically, $φ^{(0)}$ is a vertex map. □

By abuse of notation, we will say that a point $y \in \mathbb{R}^{k}$ with barycentric coordinates $(y_{0}, \dots, y_{k})$ has label $j \in ⟦ 0, k ⟧$ if $y_{j} > \max \{y_{i} : i \in ⟦ 0, k ⟧, i \neq j\}$. Let us notice that an unknown label has been assigned to the vertex $v_{0}$ of L.

Proposition 4.

Let $φ^{(0)} : K^{(0)} \to L^{(0)}$ be the vertex map defined in Proposition 3. Then, the simplicial-map neural network $N_{φ} : |K| \to |L|$ induced by the simplicial map $φ^{c}$ characterizes D.

Proof.

By Proposition 2, the neural network $N_{φ}$ satisfies that $N_{φ}(x) = φ^{c}(x)$ for all $x \in |D(D_{P} \cup V_{P})|$. Besides, let us observe that the Cartesian coordinates of $N_{φ}(x)$ coincide with its barycentric coordinates. Moreover, for all $x \in D_{P}$, $N_{φ}(x) = φ^{(0)}(x)$ and, by definition, $φ^{(0)}(x) = ℓ \times 0$ when $(x, ℓ) \in D$. Then, we can conclude that $N_{φ}$ characterizes D. □
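A one-dimensional toy case shows how Propositions 3 and 4 fit together. In $\mathbb{R}^{1}$ the Delaunay complex joins consecutive points, so the simplicial map reduces to linear interpolation of the vertex images on each edge. The dataset, the interval $[0, 10]$ and the function name `network` are hypothetical choices made for this sketch, which evaluates $φ^{c}$ directly (equal to $N_{φ}$ by Proposition 2) and accepts integer or `Fraction` inputs.

```python
from fractions import Fraction

# Hypothetical toy dataset in R^1 with k = 2 classes (labels are one-hot in E^2).
D = {2: (1, 0), 4: (1, 0), 8: (0, 1)}
V_P = [0, 10]                      # vertices of the polytope P = [0, 10]
UNKNOWN = (0, 0, 1)                # v_0 = e_0^k x 1

# Vertex map of Proposition 3: data points go to l x 0, polytope vertices to v_0.
phi0 = {p: label + (0,) for p, label in D.items()}
phi0.update({v: UNKNOWN for v in V_P})

# In dimension 1 the Delaunay complex joins consecutive points.
vertices = sorted(phi0)
edges = list(zip(vertices, vertices[1:]))


def network(x):
    """phi^c(x): linear interpolation of the vertex images on the edge containing x."""
    for u0, u1 in edges:
        if u0 <= x <= u1:
            lam1 = Fraction(x - u0, u1 - u0)
            lam0 = 1 - lam1
            return tuple(lam0 * a + lam1 * b for a, b in zip(phi0[u0], phi0[u1]))
    raise ValueError("x lies outside |K|")
```

Every data point is sent to its own label followed by a 0, which is the sense in which the network characterizes D. The point 3 lies between two points of the same class and falls in the confidence set, whereas the midpoint 6 between differently labelled points receives the score vector $(1/2, 1/2, 0)$ and lies on the decision boundary.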

Again, by abuse of notation, when $N_{φ}$ is a simplicial-map neural network, we will denote by $T_{φ}$, $C_{φ}$, and $Γ_{φ}$ its confidence set, classified set and decision boundary, respectively.

Remark 2.

Firstly, let us observe that, with the assumptions of Proposition 3, if $x \in \mathbb{R}^{d}$ belongs to the decision boundary $Γ_{φ}$ then $x \in |σ|$ for some $σ \in K$ satisfying that there are at least two vertices in σ having different labels. Moreover, $x \in |μ|$ for some $μ \in \operatorname{Sd} K$ with all its vertices in $Γ_{φ}$. Secondly, if σ is a d-simplex in $\operatorname{Sd} K$, then either all its vertices belong to the confidence set $T_{φ}$ or $σ = σ^{1} \cup σ^{2}$ with $σ^{1}, σ^{2} \in \operatorname{Sd} K$ satisfying that $|σ^{1}| \subseteq Γ_{φ}$ and $|σ^{2}| \subseteq T_{φ}$. Finally, if σ is a d-simplex in $\operatorname{Sd}^{t} K$ for $t > 0$ then either all its vertices belong to the classified subset $C_{φ}^{j}$ for some $j \in ⟦ 0, k ⟧$, or $σ = σ^{1} \cup σ^{2}$ with $σ^{1}, σ^{2} \in \operatorname{Sd}^{t} K$ satisfying that $|σ^{1}| \subseteq Γ_{φ}$ and $\emptyset \neq |σ| ∖ |σ^{1}| \subseteq C_{φ}^{j}$.

As the following result states, when $K = D(D_{P} \cup V_{P})$, we can obtain a vertex map $φ_{t}^{(0)}$ from $(\operatorname{Sd}^{t} K)^{(0)}$ to $(\operatorname{Sd}^{t} L)^{(0)}$ by applying the barycentric subdivision, inducing a neural network $N_{φ_{t}}$ that coincides with $N_{φ}$ for any integer $t > 0$. Figure 4 illustrates these concepts.

Lemma 1.

Let $φ_{0}^{(0)} := φ^{(0)}$ be the vertex map defined in Proposition 3. For an integer $t > 0$ and any $w \in (\operatorname{Sd}^{t} K)^{(0)}$, there exists $μ \in \operatorname{Sd}^{t - 1} K$ such that $w = \operatorname{bar} μ$. Then, the map $φ_{t}^{(0)} : (\operatorname{Sd}^{t} K)^{(0)} \to (\operatorname{Sd}^{t} L)^{(0)}$ defined as:

$$φ_{t}^{(0)}(w) := \operatorname{bar} φ_{t - 1}(μ)$$

is a vertex map inducing a neural network $N_{φ_{t}}$ that coincides with $N_{φ}$ for any integer $t \geq 0$.

Proof.

Let

t > 0

be an integer. Let us observe that

φ_{0}^{(0)}

is a vertex map. By induction, let us assume that

φ_{t - 1}^{(0)} : {({Sd}^{t - 1} K)}^{(0)} \to {({Sd}^{t - 1} L)}^{(0)}

is a vertex map. Let

σ \in {Sd}^{t} K

. By definition of barycentric subdivision, we can assume that

σ = {w_{0}, \dots, w_{k}} with w_{i} = bar μ_{i}, being μ_{i} a face of μ_{j} \in {Sd}^{t - 1} K for i, j \in ⟦ 0, k ⟧ and i < j .

Then,

{v \in {Sd}^{t} L : \exists w \in σ, φ_{t}^{(0)} (w) = v} = {φ_{t}^{(0)} (w_{i}) : i \in ⟦ 0, k ⟧} = {bar φ_{t - 1} (μ_{i}) : i \in ⟦ 0, k ⟧} .

Since

φ_{t - 1}^{(0)}

is a vertex map, then

φ_{t - 1} (μ_{i})

is a simplex of

{Sd}^{t - 1} L

and

φ_{t - 1} (μ_{i})

is a face of

φ_{t - 1} (μ_{j})

for all

i, j \in ⟦ 0, k ⟧

with

i < j

, by definition of

φ_{t - 1}

. Then,

{bar φ_{t - 1} (μ_{i}) : i \in ⟦ 0, k ⟧}

is a simplex of

{Sd}^{t} L

.

Now, let us see that

N_{φ_{t}} = N_{φ}

. By induction, let us prove that

N_{φ_{t}} = N_{φ_{t - 1}}

. Let

x \in | K |

. Then, there exist a d-simplex

μ = {w_{0}, \dots, w_{d}} \in {Sd}^{t} K

and a d-simplex

σ = {u_{0}, \dots, u_{d}} \in {Sd}^{t - 1} K

such that

x \in | μ | \subset | σ |

and

w_{i} = bar {u_{0}, \dots, u_{i}}

for all

i \in ⟦ 0, d ⟧

.

Then,

N_{φ_{t}} (x) = φ_{t}^{c} (x) = \sum_{i \in ⟦ 0, d ⟧} λ_{i} φ_{t}^{(0)} (w_{i})

being

λ_{i} \in [0, 1]

for all

i \in ⟦ 0, d ⟧

and

\sum_{i \in ⟦ 0, d ⟧} λ_{i} = 1

. Therefore,

N_{φ_{t}} (x) = \sum_{i \in ⟦ 0, d ⟧} λ_{i} \sum_{j \in ⟦ 0, i ⟧} \frac{1}{i + 1} φ_{t - 1}^{(0)} (u_{j}) = \sum_{i \in ⟦ 0, d ⟧} λ_{i}^{'} φ_{t - 1}^{(0)} (u_{i})

being

λ_{i}^{'} = \sum_{j \in ⟦ i, d ⟧} \frac{λ_{j}}{j + 1}

. Let us observe that

\sum_{i \in ⟦ 0, d ⟧} λ_{i}^{'} = \sum_{i \in ⟦ 0, d ⟧} (i + 1) \frac{λ_{i}}{i + 1} = \sum_{i \in ⟦ 0, d ⟧} λ_{i} = 1 .

Now, let us observe that

λ_{i}^{'} = \sum_{j \in ⟦ i, d ⟧} \frac{λ_{j}}{j + 1} \geq 0 and \sum_{j \in ⟦ i, d ⟧} \frac{λ_{j}}{j + 1} \leq \frac{1}{i + 1} \sum_{j \in ⟦ i, d ⟧} λ_{j} \leq \frac{1}{i + 1} \leq 1,

for all

i \in ⟦ 0, d ⟧

. Then, for all

x \in | K |

,

N_{φ_{t}} (x) = \sum_{i \in ⟦ 0, d ⟧} λ_{i}^{'} φ_{t - 1}^{(0)} (u_{i}) = N_{φ_{t - 1}} (x),

with

λ_{i}^{'} \in [0, 1]

for all

i \in ⟦ 0, d ⟧

and

\sum_{i \in ⟦ 0, d ⟧} λ_{i}^{'} = 1

, concluding the proof. □
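The change of coefficients used in this proof can be checked on a small case. The following is only a minimal sketch under stated assumptions, not code from the paper: the helper names `coarse_coefficients`, `barycenter` and `combine` are hypothetical, the images of the vertices u_0, u_1, u_2 are made-up one-hot labels, and barycenters are taken as plain averages, as in the displayed computation.

```python
from fractions import Fraction


def coarse_coefficients(lam):
    """Coefficients lam'_i = sum_{j >= i} lam_j / (j + 1) with respect to u_0..u_d,
    given coefficients lam_i with respect to w_i = bar{u_0, ..., u_i}."""
    d = len(lam) - 1
    return [sum(lam[j] / (j + 1) for j in range(i, d + 1)) for i in range(d + 1)]


def barycenter(points):
    """Plain average of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[c] for p in points) / n for c in range(len(points[0])))


def combine(coeffs, points):
    """Convex combination sum_i coeffs[i] * points[i]."""
    dim = len(points[0])
    return tuple(sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(dim))


# Hypothetical images of u_0, u_1, u_2 under the coarse vertex map.
images = [
    (Fraction(0), Fraction(1), Fraction(0)),
    (Fraction(1), Fraction(0), Fraction(0)),
    (Fraction(0), Fraction(0), Fraction(1)),
]
# Barycentric coordinates of a point x with respect to w_0, w_1, w_2.
lam = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Fine map: w_i is sent to the barycenter of the images of u_0..u_i.
fine_images = [barycenter(images[: i + 1]) for i in range(len(images))]
fine_output = combine(lam, fine_images)

# Coarse map evaluated with the transformed coefficients.
lam_prime = coarse_coefficients(lam)
coarse_output = combine(lam_prime, images)
```

In this example both evaluations give the same point, and the transformed coefficients are again non-negative and sum to 1, which is the content of the last two displays of the proof.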

Computing Simplicial-Map Neural Networks Robust to Adversarial Attacks

In this subsection, the main result of the paper is provided. It states that we can always compute a neural network that characterizes a given labelled dataset and is robust to adversarial attacks of a given, sufficiently small, size. Firstly, let us define the concepts of adversarial example and robustness of neural networks against adversarial attacks. Some interesting references on these concepts are [21,22].

Definition 15.

Let

d, k > 0

be integers. Let

D \subset R^{d} \times E^{k}

be a labelled dataset and

N

a neural network that characterizes D. Let

B (r) = {α \in R^{d} : | | α | | \leq r}

, where

| | \cdot | |

is a norm on

R^{d}

. Let us suppose that

x \in R^{d}

has label ℓ. Then, an adversarial example of size r is defined as

x^{'} = x + α

with

α \in B (r)

such that

x^{'}

has label

ℓ^{'}

with

ℓ^{'} \neq ℓ

. A neural network is called robust to adversarial attacks of size r if no labelled point

x \in R^{d}

has an adversarial example of size r.
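Definition 15 can be read as a simple predicate. The sketch below is illustrative only: `is_adversarial` and the toy classifier `sign_label` are hypothetical names, and the Euclidean norm is assumed although the definition allows any norm on R^d.

```python
import math


def is_adversarial(classify, x, alpha, r):
    """Return True if x + alpha is an adversarial example of size r for x.

    `classify` is a hypothetical function returning the label of a point;
    the Euclidean norm is an assumption made for this sketch.
    """
    norm = math.sqrt(sum(a * a for a in alpha))
    if norm > r:
        return False  # the perturbation lies outside the ball B(r)
    x_adv = tuple(xi + ai for xi, ai in zip(x, alpha))
    return classify(x_adv) != classify(x)


def sign_label(point):
    """Toy classifier on the real line, used only for illustration."""
    return 1 if point[0] >= 0 else 0


flips = is_adversarial(sign_label, (-0.1,), (0.2,), 0.5)
stays = is_adversarial(sign_label, (-0.1,), (0.05,), 0.5)
too_big = is_adversarial(sign_label, (-0.1,), (0.2,), 0.1)
```

Here `flips` is true because the perturbed point crosses the threshold of the toy classifier, while `stays` and `too_big` are false, the latter because the perturbation is larger than r.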

Proposition 5.

With the assumptions of Proposition 4, we have that

N_{φ}

is not robust to adversarial attacks of size r for

0 < r < d (T_{φ}, Γ_{φ})

.

Proof.

By Remark 2, consider

σ \in K

such that there exist v and w being two vertices of

σ

with different labels. Let

z = bar {v, w}

. Then z is in the decision boundary of

| K |

and

{v, z}

,

{z, w}

are edges of

Sd K

. Let

x = (1 - a) z + a v

where

a = \frac{r}{2 d (z, v)}

. Then x has the same label as v and

d (x, z) = \frac{r}{2}

. Let

x^{'} = (1 - a^{'}) z + a^{'} w

where

a^{'} = \frac{r}{2 d (z, w)}

. Then

x^{'}

has the same label as w and

d (z, x^{'}) = \frac{r}{2}

. Since x, z and x^{'} lie on the segment with endpoints v and w, with z between x and x^{'}, we have

d (x, x^{'}) = d (x, z) + d (z, x^{'}) = r

concluding that

x^{'}

is an adversarial example of size r for x, for any r with

0 < r < d (T_{φ}, Γ_{φ})

. □
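The construction in this proof can be sketched numerically. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the Euclidean distance is assumed, and the sample vertices v and w are borrowed from the dataset of Example 1.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def adversarial_pair(v, w, r):
    """Return (x, z, x_prime) with z = bar{v, w} and d(x, z) = d(z, x_prime) = r / 2.

    Assumes r <= 2 * d(z, v), so that x and x_prime stay on the edge {v, w}.
    """
    z = tuple((vi + wi) / 2 for vi, wi in zip(v, w))
    a = r / (2 * dist(z, v))
    a_prime = r / (2 * dist(z, w))
    x = tuple((1 - a) * zi + a * vi for zi, vi in zip(z, v))
    x_prime = tuple((1 - a_prime) * zi + a_prime * wi for zi, wi in zip(z, w))
    return x, z, x_prime


# Two vertices with different labels (values taken from Example 1).
v, w = (4.0, 8.0), (8.0, 4.0)
x, z, x_prime = adversarial_pair(v, w, r=1.0)
gap = dist(x, x_prime)
```

Up to floating-point rounding, `gap` equals r, and x and x_prime lie on opposite sides of the barycenter z.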

Example 1.

Let

d = k = 2

. Let us consider the labelled dataset

D = {(A = (4, 8), (0, 1, 0)), (B = (8, 4), (1, 0, 0))}

and the convex polytope

P

with vertices

{C = (5, 10), D = (0, 0), E = (15, 0)}

. Then,

K = D (D_{P} \cup V_{P})

is composed of five maximal 2-simplices and L of just one maximal 2-simplex. This way,

Sd K

and

Sd L

are composed of the maximal 2-simplices shown in Figure 5. Let

x = (6 - a, 6 + a) \in | K |

, with

a \in (0, 2]

, be in the geometric realization of the segment with endpoints

{(4, 8), (8, 4)}

. Then,

φ^{c} (x) = (c, d, 0)

with

0 \leq c < \frac{1}{2}

and

\frac{1}{2} < d \leq 1

. Therefore, x is classified as

(0, 1, 0)

with probability d. Take

z = (6, 6)

. Then, z belongs to the decision boundary since

φ^{c} (z) = (\frac{1}{2}, \frac{1}{2}, 0)

. Take

x^{'} = (6 + a, 6 - a)

. Then,

φ^{c} (x^{'}) = (c^{'}, d^{'}, 0)

with

\frac{1}{2} < c^{'} \leq 1

and

0 \leq d^{'} < \frac{1}{2}

. Therefore,

x^{'}

is classified as

(1, 0, 0)

with probability

c^{'}

.

Since

d (x, z) = a \sqrt{2} = d (x^{'}, z)

and

a \in (0, 2]

, then

N_{φ}

is not robust to adversarial attacks of any size r with

0 < r \leq 4 \sqrt{2}

. See Figure 5.
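The values of φ^{c} along the edge {A, B} in Example 1 can be reproduced with exact arithmetic. The sketch below is an illustration under the assumption that φ^{c} is the linear interpolation of the one-hot labels of A and B on that edge; the function name `phi_on_segment` is hypothetical.

```python
from fractions import Fraction

A, B = (4, 8), (8, 4)
LABEL_A, LABEL_B = (0, 1, 0), (1, 0, 0)


def phi_on_segment(x):
    """Evaluate phi^c at a point x of the edge {A, B} via barycentric coordinates."""
    # Write x = t * A + (1 - t) * B and solve for t with the first coordinate.
    t = Fraction(x[0] - B[0], A[0] - B[0])
    if x[1] != t * A[1] + (1 - t) * B[1]:
        raise ValueError("x is not on the line through A and B")
    return tuple(t * la + (1 - t) * lb for la, lb in zip(LABEL_A, LABEL_B))


a = Fraction(1)
x = (6 - a, 6 + a)        # on the side of A
z = (Fraction(6), Fraction(6))  # barycenter of {A, B}
x_prime = (6 + a, 6 - a)  # on the side of B

out_x = phi_on_segment(x)
out_z = phi_on_segment(z)
out_x_prime = phi_on_segment(x_prime)
```

For a = 1 this gives (1/4, 3/4, 0) at x, (1/2, 1/2, 0) at z and (3/4, 1/4, 0) at x', so the predicted label changes across z, in agreement with the ranges for c, d, c' and d' stated in the example.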

Let us now introduce the main result of this paper, which states that there exists a two-hidden-layer neural network that characterizes a given labelled dataset and is robust to adversarial attacks of size

r > 0

when r is small enough. To define such a neural network robust to adversarial examples, we will construct a continuous function from

| K |

to

| K |

with the idea of later applying the Simplicial Approximation Theorem and the composition of simplicial maps to obtain a simplicial map from

| K |

to

| L |

that will give rise to a neural network robust to adversarial attacks of a given size

r > 0

. Let us observe that, to be able to compute such a robust neural network, the size r should be smaller than the distance between the decision boundary and the confidence set.

Theorem 3.

Let

n, d, k > 0

be integers. Let

D = {(p_{j}, ℓ_{j}) : j \in ⟦ 1, n ⟧, p_{j} \in R^{d}, ℓ_{j} \in E^{k}}

be a labelled dataset. Then, there exists a two-hidden-layer neural network

N

characterizing D and robust to adversarial attacks of size

r > 0

, for r being small enough.

Proof.

Let us consider a convex polytope

P

such that the points of

D_{P}

are inside

P

. Then, we can compute the Delaunay complex

D (D_{P} \cup V_{P})

that will be denoted simply by K (see Figure 4), and a simplicial complex L composed of just one maximal k-simplex. As claimed in Proposition 4, a simplicial map

φ^{c}

can be defined between

| K |

and

| L |

giving rise to a neural network

N_{φ}

that characterizes D (see Proposition 2). However,

N_{φ}

is not robust to adversarial attacks (see Proposition 5). Our goal is to define a new simplicial map such that its associated simplicial-map neural network is robust to adversarial attacks. To reach that aim, we need r to be small enough, that is,

0 < r < d (T_{φ}, Γ_{φ})

, where

d (T_{φ}, Γ_{φ}) = min {d (p, q) : p \in T_{φ}, q \in Γ_{φ}}

, so adversarial attacks will be placed between the confidence set

T_{φ}

and the decision boundary

Γ_{φ}

. Then, a continuous function

g : | Sd K | \to | Sd K |

will be defined depending on such r, to later apply the Simplicial Approximation Theorem Extension (Proposition 1), obtaining a simplicial approximation

ω^{c} : {Sd}^{s} K \to {Sd}^{t} K

of g as close to g as desired. Then,

φ_{t}^{c} \circ ω^{c} : | K | \to | L |

will be a simplicial map giving rise to a simplicial-map neural network

N_{φ_{t} \circ ω}

robust to adversarial attacks of size r.

Let us define now the continuous function

g : | Sd K | \to | Sd K |

. Let

σ = {u_{0}, \dots, u_{d}}

be a d-simplex of

Sd K

. Let us observe that, by Remark 2, the vertices of

σ

satisfy one of the following properties:

(1): All the vertices of $σ$ are in $T_{φ}$ . Then, $| σ | \subseteq T_{φ}$ and we set $g (x) : = x$ for all $x \in | σ |$ .
(2): Otherwise, $σ = σ^{1} \cup σ^{2}$ with $\emptyset \neq | σ^{1} | \subseteq Γ_{φ}$ and $\emptyset \neq | σ^{2} | \subseteq T_{φ}$ .

In the latter case, let us define the continuous function

g : | σ | \to | σ |

as follows. Without loss of generality, let us assume that

σ^{1} = {u_{0}, \dots, u_{h}}

with

h \in ⟦ 0, d ⟧

.

Let us compute the set of points of

| σ |

at distance less than r to

| σ^{1} |

and let us send, by g, such points to points in

| σ^{1} |

.

Let x be a point of

| σ |

with barycentric coordinates

(x_{0}, \dots, x_{d})

with respect to

σ

.

Let

λ = \sum_{i \in ⟦ 0, h ⟧} x_{i}

.

Let

z^{1} \in R^{d}

be the projection of x in

| σ^{1} |

whose barycentric coordinates with respect to

σ

are:

(z_{0}, \dots, z_{h}, 0, \dots, 0), where z_{i} = \frac{x_{i}}{λ} for i \in ⟦ 0, h ⟧ .

Let

z^{2}

be the point in

| σ^{2} |

with barycentric coordinates

(0, \dots, 0, z_{h + 1}, \dots, z_{d})

with respect to

σ

, aligned with x and

z^{1}

. Then,

\frac{x_{i} - z_{i}}{x_{i}} = \frac{x_{j}}{x_{j} - z_{j}} for i \in ⟦ 0, h ⟧ and j \in ⟦ h + 1, d ⟧ .

So,

z_{j} = \frac{x_{j}}{1 - λ}

for

j \in ⟦ h + 1, d ⟧

.

Now,

x = (1 - a) z^{1} + a z^{2}

for

a \in [0, 1]

.

Then,

d (x, σ^{1}) \leq d (x, z^{1}) = a \cdot d (z^{1}, z^{2})

.

Let

ε = \frac{r}{d (z^{1}, z^{2})}

. Then,

d (x, σ^{1}) \leq r

if

a \leq ε

. Then,

g (x) : = \begin{cases} z^{1} & \text{if } a \in [0, ε], \\ (1 - a^{'}) z^{1} + a^{'} z^{2} & \text{if } a \in [ε, 1], \text{ with } a^{'} = \frac{a - ε}{1 - ε} . \end{cases}

Let us observe that

a^{'} \in [0, 1]

and for

a = ε

, we have that

a^{'} = 0

so

(1 - a^{'}) z^{1} + a^{'} z^{2} = z^{1}

.

Besides, for

a = 1

, we have that

a^{'} = 1

so

(1 - a^{'}) z^{1} + a^{'} z^{2} = z^{2}

.
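The definition of g on a single simplex can be sketched in barycentric coordinates. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Euclidean distance is assumed, and the treatment of the two degenerate cases λ = 0 and λ = 1 (where z^{2} or z^{1} is the point x itself) is an assumption made here so that the code never divides by zero.

```python
import math


def point_from_barycentric(coords, vertices):
    """Cartesian point with the given barycentric coordinates."""
    dim = len(vertices[0])
    return tuple(sum(c * v[k] for c, v in zip(coords, vertices)) for k in range(dim))


def g_barycentric(x, h, vertices, r):
    """Barycentric coordinates of g(x) with respect to the simplex `vertices`.

    x: barycentric coordinates (x_0, ..., x_d) of the point.
    Vertices 0..h span sigma^1 (decision boundary); h+1..d span sigma^2.
    """
    lam = sum(x[: h + 1])
    if lam == 0 or lam == 1:
        return tuple(x)  # x already lies in |sigma^2| or |sigma^1| (assumed convention)
    z1 = [xi / lam for xi in x[: h + 1]] + [0.0] * (len(x) - h - 1)
    z2 = [0.0] * (h + 1) + [xj / (1 - lam) for xj in x[h + 1:]]
    a = 1 - lam  # x = (1 - a) z1 + a z2
    eps = r / math.dist(point_from_barycentric(z1, vertices),
                        point_from_barycentric(z2, vertices))
    if a <= eps:
        return tuple(z1)  # points close to sigma^1 are sent to sigma^1
    a_prime = (a - eps) / (1 - eps)
    return tuple((1 - a_prime) * p + a_prime * q for p, q in zip(z1, z2))


# Hypothetical 1-simplex: u_0 = 0 on the decision boundary, u_1 = 1 in the confidence set.
edge = [(0.0,), (1.0,)]
near = g_barycentric((0.9, 0.1), 0, edge, r=0.25)
far = g_barycentric((0.5, 0.5), 0, edge, r=0.25)
```

In this toy edge, the point at distance 0.1 from u_0 is collapsed onto u_0, while the midpoint is moved to the point with barycentric coordinates (2/3, 1/3), since a' = (0.5 - 0.25) / (1 - 0.25) = 1/3.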

Let us prove that g is continuous at any point

x \in | K |

.

Let us observe that, by construction, g is continuous in the interior of

| σ |

for every d-simplex

σ \in Sd K

.

Let x be a point in

| σ \cap μ |

for some d-simplices

σ, μ \in Sd K

.

Let

σ = σ^{1} \cup σ^{2}

and

μ = μ^{1} \cup μ^{2}

with

| σ^{1} |, | μ^{1} | \subseteq Γ_{φ}

and

| σ^{2} |, | μ^{2} | \subseteq T_{φ}

. Let

γ \in Sd K

be the simplex of lowest dimension such that

x \in | γ |

. Then, by definition of simplicial complex,

γ \subseteq σ \cap μ

and the barycentric coordinates of x with respect to

σ

and

μ

coincide.

By Remark 2, we have to consider two cases:

(1): All the vertices of $γ$ belong to $T_{φ}$ . Then $γ \subseteq σ^{2} \cap μ^{2}$ and $g (x) = x$ .
(2): $γ = γ^{1} \cup γ^{2}$ with $| γ^{1} | \subseteq Γ_{φ}$ and $| γ^{2} | \subseteq T_{φ}$ . Then $γ^{1} \subseteq σ^{1} \cap μ^{1}$ and $γ^{2} \subseteq σ^{2} \cap μ^{2}$ so the definition of $g (x)$ with respect to $σ$ and $μ$ coincides.

Now, by Proposition 1, given

r_{1} > 0

, there exist

s, t > 0

and a simplicial map

ω^{c} : | {Sd}^{s} K | \to | {Sd}^{t} K |

such that

| | g - ω^{c} | | < r_{1}

. By Lemma 1,

φ_{t}^{(0)} : {({Sd}^{t} K)}^{(0)} \to {({Sd}^{t} L)}^{(0)}

is a vertex map. Since the composition of simplicial maps is a simplicial map, then

φ_{t}^{c} \circ ω^{c} : | {Sd}^{s} K | \to | {Sd}^{t} L |

is a simplicial map, concluding that

N_{φ_{t} \circ ω}

is a simplicial-map neural network.

Let us prove now that

N_{φ_{t} \circ ω}

is robust to adversarial attacks of size r. First of all, the following properties hold:

(1): If $x \in C_{φ_{t} \circ ω}^{j}$ then $ω^{c} (x) \in C_{φ_{t}}^{j}$ , for $j \in ⟦ 0, k ⟧$ .
(2): Let $v \in {({Sd}^{t} K)}^{(0)}$ and $z \in | st v |$ . If $v \in C_{φ_{t}}^{j}$ then $z \in C_{φ_{t}}^{j}$ , for $j \in ⟦ 0, k ⟧$ .
Since $z \in | st v |$ then $z \in | σ |$ for a d-simplex $σ \in {Sd}^{t} K$ with $v \in σ$ . Then, by Remark 2, $σ = σ^{1} \cup σ^{2}$ , with $| σ^{1} | \subseteq Γ_{φ_{t}}$ and $| σ | ∖ | σ^{1} | \subseteq C_{φ_{t}}^{j}$ . Besides, since $v \in σ^{2}$ then $z \notin | σ^{1} |$ , therefore $z \in | σ | ∖ | σ^{1} | \subseteq C_{φ_{t}}^{j}$ .
(3): If $x \in C_{φ_{t} \circ ω}^{j}$ then $g (x) \in C_{φ_{t}}^{j}$ , for $j \in ⟦ 0, k ⟧$ .
If $x \in C_{φ_{t} \circ ω}^{j}$ then there exists $v \in C_{φ_{t} \circ ω}^{j}$ such that $x \in | st v |$ . Then, $ω (v) \in C_{φ_{t}}^{j}$ by (1). Now, since $g (| st v |) \subseteq | st ω (v) |$ , then $g (x) \in C_{φ_{t}}^{j}$ by (2).
(4): Let $x \in | {Sd}^{s} K |$ . If $g (x) \in Γ_{φ_{t}}$ then $x \in Γ_{φ_{t} \circ ω}$ .
By contradiction, let us assume that $g (x) \in Γ_{φ_{t}}$ and $x \in C_{φ_{t} \circ ω}^{j}$ for some $j \in ⟦ 0, k ⟧$ . Then, $g (x) \in C_{φ_{t}}^{j}$ by (3), leading to a contradiction.
(5): Let $x \in | {Sd}^{s} K |$ . If $g (x) \in C_{φ_{t}}^{j}$ with probability $y_{j}$ and $x \in C_{φ_{t} \circ ω}^{j^{'}}$ with probability $y_{j}^{'}$ then $j = j^{'}$ and $| y_{j} - y_{j}^{'} | < r_{1}$ . This last statement is a consequence of (3) and the fact that $| | g - ω^{c} | | < r_{1}$ .

Now, let

x \in C_{φ_{t} \circ ω}^{j}

being

j \in ⟦ 0, k ⟧

. Let

α \in R^{d}

with

| | α | | < r

and let

x^{'} : = x + α

. Let us prove that

x^{'} \in Γ_{φ_{t} \circ ω}

or

x^{'} \in C_{φ_{t} \circ ω}^{j}

.

On one hand, if

g (x^{'}) \in Γ_{φ_{t}}

then

x^{'} \in Γ_{φ_{t} \circ ω}

by (4). On the other hand, if

g (x^{'}) \in C_{φ_{t}}^{j}

then

x^{'} \in C_{φ_{t} \circ ω}^{j}

or

x^{'} \in Γ_{φ_{t} \circ ω}

by (3), concluding the proof. □

Example 2.

Let us consider a labelled dataset

D = {(p, 1)}

with

p \in R

composed of just one point. Let

P

be a segment with endpoints

p_{1}

and

p_{2}

in

R

such that

p_{1} < p < p_{2}

. Let

r \in R

such that

0 < r < min {| p_{1} - p |, | p_{2} - p |}

. Let K be the Delaunay complex of

{p, p_{1}, p_{2}}

that consists of just the two maximal simplices

{p, p_{1}}

and

{p, p_{2}}

. Let L be a simplicial complex composed of a maximal 1-simplex with endpoints

v_{0} = (0, 1)

and

v_{1} = (1, 0)

. Then, a simplicial map

φ^{c}

can be defined as in Proposition 3 together with a neural network

N_{φ}

as in Proposition 4. However,

N_{φ}

is not robust to attacks of size r, as proved in Proposition 5. Then, following the proof of Theorem 3, in Figure 6, we have computed barycentric subdivisions of K until g can be approximated by the simplicial map

ω^{c} : | {Sd}^{3} K | \to | {Sd}^{2} K |

. Finally, the neural network induced by the composition

φ_{2}^{c} \circ ω^{c}

is robust to adversarial attacks of size r.
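In dimension one, the iterated barycentric subdivisions used in Example 2 are easy to enumerate. The sketch below only lists the vertices of {Sd}^{2} K and {Sd}^{3} K, the codomain and domain of ω^{c}; it does not compute ω^{c} itself. The function names and the values p_1 = 0, p = 1, p_2 = 2 are hypothetical and not taken from the paper.

```python
def subdivide(points):
    """One barycentric subdivision of a 1-dimensional complex given by its sorted vertices."""
    refined = []
    for left, right in zip(points, points[1:]):
        # Keep the left endpoint and add the barycenter of the edge.
        refined.extend([left, (left + right) / 2])
    refined.append(points[-1])
    return refined


def iterated_subdivision(points, t):
    """Vertices of Sd^t K for a 1-dimensional complex K."""
    for _ in range(t):
        points = subdivide(points)
    return points


# Hypothetical coordinates p_1 = 0, p = 1, p_2 = 2.
K = [0.0, 1.0, 2.0]
sd2 = iterated_subdivision(K, 2)
sd3 = iterated_subdivision(K, 3)
```

With these values, {Sd}^{2} K has 9 vertices spaced 0.25 apart and {Sd}^{3} K has 17 vertices spaced 0.125 apart, so each subdivision halves the edge length.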

5. Conclusions and Future Work

Neural networks are one of the most promising tools in artificial intelligence and, given the recent success of Deep Learning architectures on real-world problems, they have become one of the most widely used. From a mathematical point of view, a neural network can be seen as the composition of a large number of simple functions, mainly from linear algebra, and the so-called activation functions. Since the efficiency of such neural networks depends on the choice of an appropriate set of parameters, most of the effort in their study has focused on optimization techniques. After a first wave of research based on these optimization techniques, many researchers are now studying neural networks using other mathematical techniques, such as analysis, geometry or, as in this paper, algebraic topology.

Specifically, in this paper, we have presented a family of neural networks, called simplicial-map neural networks, that are robust to adversarial examples. The main contribution of the paper is a constructive proof that shows how to define a neural network robust to adversarial attacks of a given size. This result is proven thanks to the connection of neural networks with concepts from Algebraic Topology. Endowing the set of instances of a classification problem with the structure of a simplicial complex, and considering the set of one-hot labels as a simplex, provides a new point of view that allows the exact values of the weights of the associated network to be found without any kind of training or optimization process.

Finally, we plan to provide an implementation of our methods that takes into account the efficiency issues that arise in creating neural networks following our approach. We believe that this point of view opens a new bridge between Neural Networks and Algebraic Topology, which can lead to a fruitful flow of concepts, problems and solutions in both directions.

Author Contributions

Conceptualization, R.G.-D., M.A.G.-N., J.H. and E.P.-H.; methodology, R.G.-D., M.A.G.-N., J.H. and E.P.-H.; formal analysis, R.G.-D., M.A.G.-N., J.H. and E.P.-H.; investigation, R.G.-D., M.A.G.-N., J.H. and E.P.-H.; writing—original draft preparation, R.G.-D., M.A.G.-N., J.H. and E.P.-H.; writing—review and editing, R.G.-D., M.A.G.-N., J.H. and E.P.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by MICINN, FEDER/UE under grant PID2019-107339GB-100.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:cs.CV/1312.6199.
2. Fezza, S.A.; Bakhti, Y.; Hamidouche, W.; Déforges, O. Perceptual Evaluation of Adversarial Attacks for CNN-based Image Classification. In Proceedings of the Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–6.
3. Garg, S.; Ramakrishnan, G. BAE: BERT-based Adversarial Examples for Text Classification. arXiv 2020, arXiv:cs.CL/2004.01970.
4. Karim, F.; Majumdar, S.; Darabi, H. Adversarial Attacks on Time Series. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 1.
5. Christakopoulou, K.; Banerjee, A. Adversarial Attacks on an Oblivious Recommender. In Proceedings of the 13th ACM Conference on Recommender Systems, Association for Computing Machinery, Copenhagen, Denmark, 20 September 2019; pp. 322–330.
6. Xu, H.; Ma, Y.; Liu, H.; Deb, D.; Liu, H.; Tang, J.; Jain, A.K. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Int. J. Autom. Comput. 2020, 17, 151–178.
7. Yan, Z.; Guo, Y.; Zhang, C. Adversarial Margin Maximization Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 1.
8. Cortes, C.; Vapnik, V. Support Vector Networks. Mach. Learn. 1995, 20, 273–297.
9. Tang, Y. Deep Learning using Linear Support Vector Machines. arXiv 2013, arXiv:cs.LG/1306.0239.
10. Sun, S.; Chen, W.; Wang, L.; Liu, X.; Liu, T. On the Depth of Deep Neural Networks: A Theoretical View. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Schuurmans, D., Wellman, M.P., Eds.; AAAI Press: Palo Alto, CA, USA, 2016; pp. 2066–2072.
11. Wang, X.; Zhang, S.; Lei, Z.; Liu, S.; Guo, X.; Li, S.Z. Ensemble Soft-Margin Softmax Loss for Image Classification. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, Stockholm, Sweden, 13–19 July 2018; AAAI Press: Palo Alto, CA, USA, 2018; pp. 992–998.
12. Paluzo-Hidalgo, E.; Gonzalez-Diaz, R.; Gutiérrez-Naranjo, M.A. Two-hidden-layer feed-forward networks are universal approximators: A constructive approach. Neural Netw. 2020, 131, 29–36.
13. Ismailov, V.E. On the approximation by neural networks with bounded number of neurons in hidden layers. J. Math. Anal. Appl. 2014, 417, 963–969.
14. Guliyev, N.J.; Ismailov, V.E. Approximation capability of two hidden layer feedforward neural networks with fixed weights. Neurocomputing 2018, 316, 262–269.
15. Ebli, S.; Defferrard, M.; Spreemann, G. Simplicial Neural Networks. arXiv 2020, arXiv:cs.LG/2010.03633.
16. Spanier, E.H. Algebraic Topology; Springer: New York, NY, USA, 1995.
17. Boissonnat, J.D.; Chazal, F.; Yvinec, M. Geometric and Topological Inference; Cambridge Texts in Applied Mathematics; Cambridge University Press: Cambridge, UK, 2018.
18. Okabe, A.; Boots, B.; Sugihara, K.; Chiu, S.N.; Kendall, D.G. Definitions and Basic Properties of Voronoi Diagrams. In Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed.; John Wiley & Sons: Chichester, UK, 2000; pp. 43–112.
19. Hornik, K. Approximation Capabilities of Multilayer Feedforward Networks. Neural Netw. 1991, 4, 251–257.
20. Edelsbrunner, H.; Harer, J. Computational Topology: An Introduction; American Mathematical Society: Providence, RI, USA, 2010; pp. 1–241.
21. Lecuyer, M.; Atlidakis, V.; Geambasu, R.; Hsu, D.; Jana, S. Certified Robustness to Adversarial Examples with Differential Privacy. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–19 May 2019; pp. 656–672.
22. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824.

Figure 1. Example of a barycentric subdivision. Let

V = {a, b, c}

be the set of the three vertices (in blue) of the triangle depicted on the left. Let

K = {{a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}

. From left to right:

| K |

,

| Sd K |

, and

| {Sd}^{2} K |

are shown.

Figure 2. Given a labelled dataset D, a convex polytope

P

containing D can be computed. Then, the simplicial complex K can be obtained using the Delaunay triangulation of all the points of D and the vertices of

P

.

Figure 3. On the top, we can see a 1-simplex with two iterated applications of the barycentric subdivision. On the bottom, a continuous function was applied to the straight line and a simplicial approximation (in red) is provided. The star condition is satisfied and no more barycentric subdivisions are needed.

Figure 4. Let

K = D (D_{P} \cup V_{P})

be the Delaunay complex of

D_{P} \cup V_{P}

where

D_{P}

is the set

{A, B, C}

of red and blue points, and

V_{P}

are the green vertices (depicted in the center). Let L be the simplicial complex with one maximal simplex

σ = {v_{0} = (0, 0, 1), v_{1} = (0, 1, 0), v_{2} = (1, 0, 0)}

(pictured on the right). Let us consider the vertex map

φ^{(0)}

that sends the blue points

A, B

to

v_{1}

, the red point C to

v_{2}

, and the green points (labelled as unknown) to

v_{0}

. Then,

φ^{(0)}

gives rise to the simplicial map

φ^{c}

and the simplicial-map neural network

N_{φ}

. The decision boundary of

N_{φ}

is pictured in the center as the set of points on the boundary of the red, blue or green regions. For example,

N_{φ} (D) = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})

,

N_{φ} (F) = (\frac{1}{2}, \frac{1}{2}, 0)

and

N_{φ} (G) = (0, \frac{1}{2}, \frac{1}{2})

. Let us consider now the barycentric subdivision of K shown on the left and the simplicial map

ω^{c}

which relates both simplicial complexes. The decision boundary of

N_{φ \circ ω}

is the gray zone on the left picture.

Figure 5. An adversarial example x for the simplicial-map neural network

N_{φ} : | K | \to | L |

.

Figure 6. Three simplicial complexes with simplicial maps

ω^{c}

and

φ_{2}^{c}

between them are shown illustrating a neural network

N_{φ_{2} \circ ω}

robust to adversarial attacks of size r.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
