Adversarial Robustness with Partial Isometry

Despite their remarkable performance, deep learning models still lack robustness guarantees, particularly in the presence of adversarial examples. This significant vulnerability raises concerns about their trustworthiness and hinders their deployment in critical domains that require certified levels of robustness. In this paper, we introduce an information geometric framework to establish precise robustness criteria for l2 white-box attacks in a multi-class classification setting. We endow the output space with the Fisher information metric and derive criteria on the input–output Jacobian to ensure robustness. We show that model robustness can be achieved by constraining the model to be partially isometric around the training points. We evaluate our approach using MNIST and CIFAR-10 datasets against adversarial attacks, revealing its substantial improvements over defensive distillation and Jacobian regularization for medium-sized perturbations and its superior robustness performance to adversarial training for large perturbations, all while maintaining the desired accuracy.


Introduction
One of the primary motivations for investigating machine learning robustness stems from the susceptibility of neural networks to adversarial attacks, wherein small perturbations in the input data can deceive the network into making the wrong decision. These adversarial attacks have been shown to be both ubiquitous and transferable [1][2][3]. Beyond posing a security threat, adversarial attacks underscore the glaring lack of robustness in machine learning models [4,5]. This deficiency in robustness is a critical challenge, as it undermines trustworthiness in machine learning systems [6].
In this paper, we bring an information geometric perspective to adversarial robustness in machine learning models. We show that robustness can be achieved by encouraging the model to be isometric on the orthogonal complement of the kernel of the pullback Fisher information metric (FIM). We subsequently formulate a regularization defense method for adversarial robustness. While our focus is on l2 white-box attacks within multi-class classification tasks, the method's applicability extends to more general settings, including unrestricted attacks and black-box attacks across various supervised learning tasks. The regularized model is evaluated on the MNIST and CIFAR-10 datasets against projected gradient descent (PGD) l∞ attacks and AutoAttack [7] with l∞ and l2 norms. Comparisons with the unregularized model, defensive distillation [8], Jacobian regularization [9], and Fisher information regularization [10] show significant improvements in robustness. Moreover, the regularized model is able to ensure robustness against larger perturbations compared to adversarial training.
The remainder of this paper is organized as follows. Section 2 introduces notation, notions of adversarial machine learning, and definitions related to geometry. Then, we derive a sufficient condition for adversarial robustness at a given sample point. Section 3 presents our method for approximating the robustness condition, which involves promoting model isometry on the orthogonal complement of the kernel of the pullback of the FIM. In Section 4, several experiments are presented to evaluate the proposed method. Section 5 discusses the results in the context of related work on adversarial defense. Finally, Section 6 concludes the paper and outlines potential extensions of this research. Appendix A provides the proofs of the results stated in the main text.

Notations
Let d, c ∈ N* be such that d ≥ c > 1, and let m = c − 1. In the learning framework, d will be the dimension of the input space, while c will be the number of classes. The range of a matrix M is denoted as rg(M), and its rank as rk(M). The Euclidean norm (i.e., the l2 norm) is denoted as ∥·∥. We use the notation δ_ij = 1 if i = j and 0 otherwise. We denote the components of a vector v by v^i ∈ R, with a superscript. Smooth means C∞.

Adversarial Machine Learning
An adversarial attack is any strategy aiming at deliberately altering the expected behavior of a model or extracting information from a model. In this work, we focus on attacks performed at inference time (i.e., after training), sometimes referred to as evasion attacks. The most well-known evasion attacks are gradient-based. Such gradient-based attacks all follow the same idea, which we explain below.
To reach good accuracy and generalization, a machine learning model f (with input x and parameters w) is typically trained by minimizing a loss function L(y, f(x, w)) with respect to the parameters w of the model. In its simplest form, the loss function quantifies the error between the prediction of the model f(x, w) and the ground truth y. Given a clean input x0, an adversarial example x* can be crafted by maximizing the loss function L(y, f(x, w)), starting from x0 and using gradient ascent x_{t+1} − x_t ∝ ∇_x L(y, f(x_t, w)), where the gradient is computed with respect to the input x (and not the parameters w, as during training). In order for x* to be an adversarial example, x0 and x* must be close to each other according to some dissimilarity measure, typically an lp norm. An adversarial example x* is successful if the model f classifies x* differently from x0. Some well-known gradient-based attacks include the fast gradient sign method [2] and projected gradient descent [3].
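As an illustration, the gradient-ascent recipe above can be sketched with a one-step attack (the fast gradient sign method) on a hypothetical linear-softmax model; the model, dimensions, and label below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy model: linear logits s(x) = W x followed by softmax.
d, c = 20, 3
W = rng.normal(size=(c, d))
x0 = rng.normal(size=d)
y = 1                                   # assumed ground-truth label

def loss(x):
    # cross-entropy between the prediction and the label y
    return -np.log(softmax(W @ x)[y])

def input_grad(x):
    # For this linear model, dL/dx = W^T (softmax(W x) - onehot(y)).
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p

# Fast gradient sign method: a single ascent step on the loss,
# staying inside the l_inf ball of radius eps by construction.
eps = 0.1
x_adv = x0 + eps * np.sign(input_grad(x0))
```

Since the cross-entropy of a linear model is convex in x, this single step is guaranteed to increase the loss whenever the input gradient is nonzero.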
Adversarial attacks can be classified according to their threat model. White-box attacks assume that the adversary has perfect knowledge of the targeted model, including access to the training data, model architecture, and model parameters. Such an adversary can directly compute the gradient ∇_x L(y, f(x, w)) of the targeted model and craft adversarial examples. More realistic threat models are classified as gray-box or black-box attacks, where some or all of this information is unknown to the adversary. In this work, we use both white-box attacks and simple gray-box attacks where the adversary can access the training data and model architecture, but not the model parameters. To craft such gray-box adversarial examples, another model is trained with the same data and architecture. Then, white-box attacks are performed on this model. Finally, the adversarial examples can be transferred to the targeted model.
Adversarial robustness aims to build models that classify both x* and x0 with the same class while preserving sufficient accuracy for the clean examples x0. Various defenses have been proposed to improve adversarial robustness. The most efficient defense is called adversarial training, which was first described in [2] and further developed in [3]. The idea behind adversarial training is to obtain the parameters w* of the trained model as w* = arg min_w max_{ϵ∈∆(x)} L(y, f(x + ϵ, w)), in place of the original arg min_w L(y, f(x, w)). The set ∆(x) is a set of allowed adversarial perturbations for x, e.g., an l2 ball with a given radius (or budget). In practice, adversarial training is performed by adding adversarial examples to the training set, thus providing a lower bound for max_{ϵ∈∆(x)} L(y, f(x + ϵ, w)).
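The min-max objective above can be sketched as a training loop on a toy linear model with synthetic separable data (all names and sizes are illustrative): the inner loop runs a few PGD steps to approximate the maximization, and the outer loop descends on the resulting adversarial loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy separable data: three well-separated class means in R^10.
d, c, n = 10, 3, 60
means = 4.0 * rng.normal(size=(c, d))
Y = rng.integers(0, c, size=n)
X = means[Y] + 0.5 * rng.normal(size=(n, d))
W = np.zeros((c, d))

def input_grad(W, x, y):
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p                     # dL/dx for a linear softmax model

def pgd(W, x, y, eps=0.1, alpha=0.05, steps=5):
    # inner maximization: ascend, then project back into the l_inf ball
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(W, x_adv, y))
        x_adv = x + np.clip(x_adv - x, -eps, eps)
    return x_adv

# outer minimization: gradient descent on the loss at the adversarial points
lr = 0.01
for _ in range(50):
    grad_W = np.zeros_like(W)
    for x, y in zip(X, Y):
        x_adv = pgd(W, x, y)
        p = softmax(W @ x_adv)
        p[y] -= 1.0
        grad_W += np.outer(p, x_adv) / n   # dL/dW = (p - onehot(y)) x^T
    W -= lr * grad_W
```

On this easily separable toy problem, the adversarially trained model still reaches high clean accuracy; the trade-offs discussed later in the paper appear on harder datasets.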

Geometrical Definitions
Consider a multi-class classification task. Let X ⊆ R^d be the input domain, and let Y = {1, ..., c} ⊂ N be the set of labels for the classification task. For example, for MNIST, we have X = [0, 1]^d (with d = 784) and c = 10. We assume that X is a d-dimensional embedded smooth connected submanifold of R^d. Recall that m = c − 1.
Definition 1 (Probability simplex). Define the probability simplex of dimension m by ∆m = {θ ∈ R^c : θ^i > 0 for i = 1, ..., c and Σ_{i=1}^c θ^i = 1}. ∆m is a smooth submanifold of R^c of dimension m. We can see θ = (θ^1, ..., θ^m) as a coordinate system from ∆m to R^m, the last component being recovered as θ^c = 1 − Σ_{i=1}^m θ^i.

A machine learning model (e.g., a neural network) is often seen as assigning a label y ∈ Y to a given input x ∈ X. Instead, in this work, we see a model as assigning the parameters of a random variable Y to a given input x ∈ X. The random variable Y has a probability density function p_θ belonging to the family of c-dimensional categorical distributions S = {p_θ : θ ∈ ∆m}.
S can be endowed with a differentiable structure by using p_θ ∈ S ↦ (θ^1, ..., θ^m) ∈ R^m as a global coordinate system. Hence, S becomes a smooth manifold of dimension m (more details on this construction can be found in [11], Chapter 2). We can identify p_θ with (θ^1, ..., θ^m).
We see any machine learning model as a smooth map f : X → ∆m that assigns to an input x ∈ X the parameters θ = f(x) ∈ ∆m of a c-dimensional categorical distribution p_θ ∈ S. In practice, a neural network produces a vector of logits s(x). Then, these logits are transformed into the parameters θ with the softmax function: θ = softmax(s(x)).
In order to study the sensitivity of the prediction f(x) ∈ ∆m with respect to the input x ∈ X, we need to be able to measure distances both in X and in ∆m. To measure distances on a smooth manifold, we equip it with a Riemannian metric.
First, we consider ∆m. As described above, we see ∆m as the family of categorical distributions. A natural Riemannian metric for ∆m (i.e., a metric that reflects the statistical properties of ∆m) is the Fisher information metric (FIM).

Definition 2 (Fisher information metric). For each θ ∈ ∆m, the Fisher information metric (FIM) g defines a symmetric positive-definite bilinear form g_θ over the tangent space T_θ∆m. In the coordinates θ, for all θ ∈ ∆m and all tangent vectors v, w ∈ T_θ∆m, we have g_θ(v, w) = v^T G_θ w, where G_θ is the Fisher information matrix for parameter θ ∈ ∆m, defined by G_θ = E[∇_θ log p_θ(Y) (∇_θ log p_θ(Y))^T], which for the categorical family gives (G_θ)_ij = δ_ij/θ^i + 1/θ^c for i, j = 1, ..., m.

For any θ ∈ ∆m, the matrix G_θ is symmetric positive-definite and non-singular (see Proposition 1.6.2 in [12]). The FIM induces a distance on ∆m, called the Fisher-Rao distance, denoted as d(θ1, θ2) for any θ1, θ2 ∈ ∆m.

The FIM has two remarkable properties. First, it is the "infinitesimal distance" of the relative entropy, which is the loss function used to train a multi-class classification model. More precisely, if D is the relative entropy (also known as the Kullback-Leibler divergence) and if d is the Fisher-Rao distance, then given two distributions θ1 and θ2, we have (see Theorem 4.4.5 in [12]): D(θ1 || θ2) = (1/2) d(θ1, θ2)^2 + o(d(θ1, θ2)^2). The same result can be restated infinitesimally using the FIM g, as follows: D(θ || θ + dθ) = (1/2) g_θ(dθ, dθ) + o(∥dθ∥^2), where dθ is seen as a tangent vector of T_θS.
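For the categorical family, the closed form of the Fisher information matrix in the coordinates (θ^1, ..., θ^m) is a standard result: (G_θ)_ij = δ_ij/θ^i + 1/θ^c. The sketch below builds G_θ and can be checked directly against the defining expectation of the score outer product; the particular θ is an arbitrary example.

```python
import numpy as np

def categorical_fim(theta):
    """Fisher information matrix of the categorical family in the
    coordinates (theta^1, ..., theta^m), with theta^c = 1 - sum(theta)."""
    theta = np.asarray(theta, dtype=float)
    theta_c = 1.0 - theta.sum()
    # (G)_ij = delta_ij / theta^i + 1 / theta^c
    return np.diag(1.0 / theta) + 1.0 / theta_c

theta = np.array([0.5, 0.3])        # c = 3 classes, so theta^3 = 0.2
G = categorical_fim(theta)          # 2 x 2 symmetric positive-definite matrix
```

The matrix is always symmetric positive-definite, matching Definition 2.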
The other remarkable property of the FIM is Chentsov's theorem [13], which states that the FIM is the unique Riemannian metric on ∆m that is invariant under sufficient statistics (up to a multiplicative constant). Informally, the FIM is the only Riemannian metric that is statistically meaningful. In [14], Amari and Nagaoka state a more general result. Along with the FIM, they introduce a family of affine connections parameterized by a real parameter α, called the α-connections. Theorem 2.6 in [14] states that an affine connection is invariant under sufficient statistics if and only if it is an α-connection for some α ∈ R. In other words, the α-connections are the only affine connections that have a statistical meaning. While Equation (4) gives the second-order approximation of the relative entropy, an α-connection can be seen as the third-order term in the Taylor approximation of some divergence [14]. More precisely, a given α-connection can be canonically associated with a unique divergence (while the second-order term is always given by the FIM). If α = ±1, the canonical divergences are the relative entropy and its dual (obtained by switching the arguments in D(θ2 || θ1)). More generally, for α ≠ 0, the canonical divergence is not symmetric. The only canonical divergence that is symmetric is obtained for α = 0, and it is precisely the square of the Fisher-Rao distance. Thus, the Fisher-Rao distance is the only statistically meaningful distance. This motivates the use of the Fisher-Rao distance to measure lengths in ∆m.

Now, we consider X. Since we are studying adversarial robustness, we need a metric that formalizes the idea that two close data points must be "indistinguishable" from a human perspective (or any other relevant perspective). A natural choice is the Euclidean metric induced from R^d on X.

Definition 3 (Euclidean metric). We consider the Euclidean space R^d endowed with the Euclidean metric ḡ. It is defined in the standard coordinates of R^d, for all x ∈ R^d and all tangent vectors v, w ∈ T_xR^d, by ḡ_x(v, w) = v^T w; thus, its matrix is the identity matrix of dimension d, denoted as I_d. The Euclidean metric induces a distance on R^d given by the l2 norm: ∥x1 − x2∥.

From now on, we fix x ∈ X.

Definition 4. Define the set (Figure 1): A_x = {θ ∈ ∆m : arg max_i θ^i = arg max_i f^i(x)}. For simplicity, assume that f(x) is not on the "boundary" of A_x, so that arg max_i f^i(x) is well-defined. The set A_x is the subset of distributions of ∆m that have the same class as f(x).
Definition 5 (Geodesic ball of the FIM). Let δ > 0 be the Fisher-Rao distance between f(x) and ∆m \ A_x (Figure 2), i.e., the Fisher-Rao distance between f(x) and the closest distribution of ∆m with a different class. Define the geodesic ball centered at f(x) with radius δ by b(f(x), δ) = {θ ∈ ∆m : d(f(x), θ) < δ}. In Section 3.3, we propose an efficient approximation of δ.

Definition 6 (Pullback metric). On X, define the pullback metric ĝ of g by f. In the standard coordinates of R^d, ĝ is defined for all tangent vectors v, w ∈ T_xX by ĝ_x(v, w) = v^T J_x^T G_{f(x)} J_x w, where J_x is the Jacobian matrix of f at x (in the standard coordinates of R^d and R^c). Define the matrix of ĝ_x in the standard coordinates of R^d by Ĝ_x = J_x^T G_{f(x)} J_x.

Definition 7 (Geodesic ball of the pullback metric). Let d̂ be the distance induced by the pullback metric ĝ on R^d. We can define the geodesic ball centered at x with radius δ by b̂(x, δ) = {x′ ∈ R^d : d̂(x, x′) < δ}. Note that the radius δ is the Fisher-Rao distance between f(x) and ∆m \ A_x, as defined in Definition 5.
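The matrix Ĝ_x = J_x^T G_{f(x)} J_x of Definition 6 can be assembled numerically. The sketch below uses a hypothetical linear-softmax model and a finite-difference Jacobian (an autodiff framework would be used in practice); it illustrates that Ĝ_x is symmetric positive-semidefinite with rank at most m.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, c = 8, 3
m = c - 1
W = rng.normal(size=(c, d))             # hypothetical linear-softmax model

def f(x):
    # model output in theta-coordinates: first m components of softmax(W x)
    return softmax(W @ x)[:m]

def num_jacobian(fun, x, h=1e-6):
    # central finite differences; column i is d(fun)/d(x_i)
    return np.stack([(fun(x + h * e) - fun(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))], axis=1)

def categorical_fim(theta):
    theta_c = 1.0 - theta.sum()
    return np.diag(1.0 / theta) + 1.0 / theta_c

x = rng.normal(size=d)
J = num_jacobian(f, x)                  # m x d Jacobian of f at x
G = categorical_fim(f(x))               # m x m FIM at f(x), positive-definite
G_hat = J.T @ G @ J                     # d x d matrix of the pullback metric
```

Because G_hat factors through the m x d Jacobian, it is degenerate whenever m < d, which is exactly the rank observation made below Proposition 2.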
Our goal is to start from Proposition 1 and make several assumptions in order to derive a condition that can be efficiently implemented.
Working with the geodesic balls b(x, ϵ) and b̂(x, δ) is intractable, so our first assumption consists of using an "infinitesimal" condition by restating Proposition 1 in the tangent space T_xX instead of working directly on X. In T_xX, define the Euclidean ball of radius ϵ by B_x(0, ϵ) = {v ∈ T_xX : ∥v∥ ≤ ϵ}. Similarly, in T_xX, define the ĝ_x-ball of radius δ by B̂_x(0, δ) = {v ∈ T_xX : v^T Ĝ_x v ≤ δ^2}.

Assumption 1. We replace Proposition 1 by the inclusion B_x(0, ϵ) ⊆ B̂_x(0, δ) (Equation (16)).

Proposition 2. Equation (16) is equivalent to λ_max(Ĝ_x) ≤ δ^2/ϵ^2, where λ_max denotes the largest eigenvalue.

Since m < d, the Jacobian matrix J_x has rank smaller than or equal to m. Thus, since G_{f(x)} has full rank, Ĝ_x = J_x^T G_{f(x)} J_x has rank at most m (exactly m when J_x has rank m).
Assumption 2. The Jacobian matrix J_x has full rank, equal to m.
Using Assumptions 1 and 2, the constant rank theorem ensures that, for small enough δ, f is ϵ-robust at x. However, contrary to Proposition 1, Assumption 1 does not offer any guarantee on the ϵ-robustness at x for arbitrary δ.

Derivation of the Regularization Method
In this section, we derive a condition for robustness (Proposition 4), which can be implemented as a regularization method. Then, we provide two useful results for the practical implementation of this method: an explicit formula for the decomposition of the FIM as G = P^T P (Section 3.2), and an easy-to-compute upper bound of δ, i.e., the Fisher-Rao distance between f(x) and ∆m \ A_x (Section 3.3).

The Partial Isometry Condition
In order to simplify the notation, we replace:
• G_{f(x)} with G, which is an m × m symmetric positive-definite real matrix;
• Ĝ_x with Ĝ, which is a d × d symmetric positive-semidefinite real matrix;
• J_x with J.
We define D = (ker(Ĝ))^⊥. We will use the two following facts.

Fact 1. Since G is positive-definite, ker(Ĝ) = ker(J); hence, D = (ker(J))^⊥ = rg(J^T).
Fact 2. J^T GJ is symmetric positive-semidefinite. Thus, by the spectral theorem, the eigenvectors associated with its nonzero eigenvalues all lie in D = rg(J^T).
In particular, since rk(J) = m, there exists an orthonormal basis of T_xX, denoted as B = (e_1, ..., e_m, e_{m+1}, ..., e_d), such that each e_i is an eigenvector of J^T GJ, (e_1, ..., e_m) is a basis of D = rg(J^T), and (e_{m+1}, ..., e_d) is a basis of ker(J).
The set D = rg(J^T) is an m-dimensional subspace of T_xX. ĝ_x does not define an inner product on T_xX because Ĝ has a nontrivial kernel of dimension d − m. In particular, the set B̂_x(0, δ) is not bounded, i.e., it is a cylinder rather than a ball. However, when restricted to D, ĝ_x|_D defines an inner product. We define the restriction of B̂_x(0, δ) to D: B_D(0, δ) = B̂_x(0, δ) ∩ D, and similarly, we define the restriction of B_x(0, ϵ) to D: B_D(0, ϵ) = B_x(0, ϵ) ∩ D.

Assume that f is such that Equation (16) holds (i.e., B_x(0, ϵ) ⊆ B̂_x(0, δ)). Moreover, assume that we are in the limit case defined as follows: for any perturbation size, we can find a smaller perturbation of f such that Equation (16) no longer holds. This limit case is equivalent to having B_D(0, ϵ) = B_D(0, δ). In this case, B̂_x(0, δ) is the smallest possible ĝ_x-ball (for the inclusion) such that Equation (16) holds. We noticed experimentally that enforcing this stronger criterion yields a larger robustifying effect. Thus, we make the following assumption:

Assumption 3. We replace Equation (16) with the equality B_D(0, ϵ) = B_D(0, δ) (Equation (21)).

Proposition 3. Equation (21) is equivalent to: for all v ∈ D, v^T Ĝ v = (δ/ϵ)^2 ∥v∥^2 (Equation (22)).

We can rewrite Equation (22) in matrix form (Equation (23)). In Section 3.2, we show how to exploit the properties of the FIM to derive a closed-form expression for a matrix P ∈ GL_m(R) such that G = P^T P. For now, we assume that we can easily access such a P, and we look for a condition on P and J that is equivalent to Equation (23).
Proposition 4. The following statements are equivalent: (i) Equation (23) holds; (ii) (PJ)(PJ)^T = (δ/ϵ)^2 I_m, where I_m is the identity matrix of dimension m × m.
Proposition 4 constrains the matrix PJ to be a semi-orthogonal matrix (multiplied by a homothety). A smooth map f between the Riemannian manifolds (X, ḡ) and (∆m, g) is said to be (locally) isometric if the pullback metric (denoted f*g) coincides with ḡ, i.e., f*g = ḡ. Such a map f locally preserves distances. In our case, f*g = ĝ is not a metric (since its kernel is non-trivial); thus, f cannot be an isometry. However, Equation (22) ensures that f locally preserves distances (up to the factor δ/ϵ) along the directions of D. Hence, f becomes a partial isometry, at least in the neighborhood of the training points.
Under Assumptions 1-3, statement (ii) in Proposition 4 implies robustness as defined in Definition 8. In other words, statement (ii) is a sufficient condition for robustness. However, there is no reason for a neural network to satisfy statement (ii). This is why we define the following regularization term: α(x, ϵ, f) = ∥(PJ)(PJ)^T − (δ/ϵ)^2 I_m∥ (Equation (24)), where ∥·∥ is any matrix norm, such as the Frobenius norm or the spectral norm. We use the Frobenius norm in the experiments of Section 4. To compute α(x, ϵ, f), we only need the Jacobian matrix J, which can be efficiently obtained with backpropagation. Finally, the loss function is: L = l(y, f(x)) + λ α(x, ϵ, f) (Equation (25)), where l is the cross-entropy loss and λ > 0 is a hyperparameter controlling the strength of the regularization with respect to the cross-entropy loss. The regularization term α(x, ϵ, f) is minimized during training, so that the model is pushed to satisfy the sufficient condition for robustness.
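A minimal sketch of the penalty, assuming the form α(x, ϵ, f) = ∥(PJ)(PJ)^T − (δ/ϵ)² I_m∥_F suggested by Proposition 4, and obtaining P from a Cholesky factorization G = P^T P (the paper instead derives a closed form for P in Section 3.2). The dimensions, δ, and ϵ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

m, d = 2, 8
delta, eps = 1.2, 0.5

# FIM at a fixed theta = (0.5, 0.3) with c = 3, and a factor P with G = P^T P.
G = np.diag(1.0 / np.array([0.5, 0.3])) + 1.0 / 0.2
L = np.linalg.cholesky(G)               # G = L @ L.T
P = L.T                                 # hence G = P.T @ P

def alpha(J):
    """Frobenius penalty pushing (eps/delta) * PJ toward semi-orthogonality."""
    A = P @ J
    return np.linalg.norm(A @ A.T - (delta / eps) ** 2 * np.eye(m))

# A Jacobian satisfying the condition exactly: J = (delta/eps) P^{-1} Q^T
# with Q^T Q = I_m, so that (PJ)(PJ)^T = (delta/eps)^2 I_m.
Q, _ = np.linalg.qr(rng.normal(size=(d, m)))
J_iso = (delta / eps) * np.linalg.solve(P, Q.T)
J_rand = rng.normal(size=(m, d))
```

The penalty vanishes on the partially isometric Jacobian and is strictly positive on a generic one, which is the behavior the regularizer exploits during training.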

Coordinate Change
In this subsection, we show how to compute the matrix P introduced in Proposition 4. To this end, we isometrically embed ∆m into the Euclidean space R^c using the following inclusion map: μ(θ) = (2√θ^1, ..., 2√θ^c). We can easily see that μ is an embedding. If S_m(2) is the m-sphere of radius 2 centered at the origin in R^c, then μ(∆m) is the subset of S_m(2) where all coordinates are strictly positive (using the standard coordinates of R^c).
Proposition 5. Let g be the Fisher information metric on ∆m (Definition 2), and let ḡ be the Euclidean metric on R^c. Then, μ is an isometric embedding of (∆m, g) into (R^c, ḡ).

Now, we use the stereographic projection τ to embed ∆m into R^m.

Proposition 6. In the coordinates τ, the FIM is conformal to the Euclidean metric, i.e., its matrix G_τ is a positive scalar multiple of I_m at each point. Let J̃ be the Jacobian matrix of τ ∘ μ : ∆m → R^m at f(x). Then, we have G = J̃^T G_τ J̃. Thus, writing G_τ = φ^2 I_m, we can choose P = φ J̃.

Proposition 7 provides the explicit entries P_ij, for i, j = 1, ..., m.
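The embedding can be checked numerically, assuming μ(θ) = 2√θ componentwise (which maps ∆m onto the positive part of the radius-2 sphere): the pullback of the Euclidean metric through μ should match the categorical FIM, which is the content of Proposition 5.

```python
import numpy as np

def embed(theta):
    # theta-coordinates -> full probability vector in R^c
    return np.append(theta, 1.0 - np.sum(theta))

def mu(p):
    # assumed inclusion map onto the sphere of radius 2
    return 2.0 * np.sqrt(p)

def categorical_fim(theta):
    theta_c = 1.0 - np.sum(theta)
    return np.diag(1.0 / np.asarray(theta)) + 1.0 / theta_c

theta = np.array([0.5, 0.3])            # c = 3, so theta^3 = 0.2
point = mu(embed(theta))                # lies on the sphere of radius 2

# Jacobian of mu o embed w.r.t. (theta^1, theta^2) by central differences;
# the pullback of the Euclidean metric is J^T J.
h = 1e-6
J = np.stack([(mu(embed(theta + h * e)) - mu(embed(theta - h * e))) / (2 * h)
              for e in np.eye(2)], axis=1)
pullback = J.T @ J
```

The numerical pullback J^T J agrees with the FIM of Definition 2 up to finite-difference error, confirming the isometry.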

The Fisher-Rao Distance
In this subsection, we derive a simple upper bound for δ (i.e., the Fisher-Rao distance between f(x) and ∆m \ A_x). In Proposition 5, we showed that the probability simplex ∆m endowed with the FIM can be isometrically embedded into the m-sphere of radius 2. Thus, the angle β between two distributions with coordinates θ1 and θ2 in ∆m, with μ1 = μ(θ1) and μ2 = μ(θ2), satisfies cos β = ⟨μ1, μ2⟩/4 = Σ_i √(θ1^i θ2^i). The Riemannian distance between these two points is the arc length on the sphere: d(θ1, θ2) = 2β. In the regularization term defined in Equation (24), we replace δ with the following upper bound: δ̄ = d(f(x), O), where O = (1/c)(1, ..., 1) is the center of the simplex ∆m; since O lies in the closure of ∆m \ A_x, we have δ ≤ δ̄. Thus, δ̄ = 2 arccos(Σ_i √(f^i(x)/c)).
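Under the sphere embedding, the Fisher-Rao distance between two categorical distributions p and q is the arc length 2 arccos(Σ_i √(p_i q_i)). A short sketch of the distance and of the upper bound δ̄ built from the simplex center (the distributions p and q are illustrative):

```python
import numpy as np

def fisher_rao(p, q):
    # arc length on the radius-2 sphere: d(p, q) = 2 * arccos(sum_i sqrt(p_i q_i))
    inner = np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0)
    return 2.0 * np.arccos(inner)

def delta_bar(p):
    # assumed upper bound: Fisher-Rao distance from p to the simplex center O
    c = len(p)
    return fisher_rao(p, np.full(c, 1.0 / c))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
```

The clip guards against floating-point values of the Bhattacharyya coefficient slightly above 1 when p = q.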

Experiments
The regularization method introduced in Section 3 is evaluated on the MNIST and CIFAR-10 datasets. Our method uses the loss function introduced in Equation (25).

Experimental Setup
For the MNIST dataset, we implement a LeNet model with two convolutional layers of 32 and 64 channels, respectively, followed by one hidden layer with 128 neurons. The code is available here: https://github.com/lshigarrier/geometric_robustness.git (accessed on 1 December 2022). We train three models: one regularized model, one baseline unregularized model, and one model trained with adversarial training. All three models are trained with the Adam optimizer (β1 = 0.9 and β2 = 0.999) for 30 epochs, with a batch size of 64 and a learning rate of 10^-3. For the regularization term, we use a budget of ϵ = 5.6, which is chosen to contain the l∞ ball of radius 0.2. The adversarial training is conducted with 10 iterations of PGD with a budget ϵ_adv = 0.2 using the l∞ norm. We found that λ = 10^-6 yields the best performance in terms of the robustness-accuracy trade-off; this value is small because we did not attempt to normalize the regularization term.
The models are trained on the 60,000 images of MNIST's training set and then tested on the 10,000 images of the test set. The baseline model achieves an accuracy of 98.9% (9893/10,000), the regularized model achieves an accuracy of 94.0% (9403/10,000), and the adversarially trained model achieves an accuracy of 98.8% (9883/10,000). Although the current implementation of the regularized model is almost six times slower to train than the baseline model, it may be possible to accelerate the training using, for example, the technique proposed by Shafahi et al. [15], or using another method to approximate the spectral norm of J. Even without relying on these acceleration techniques, the regularized model is still faster to train than the adversarially trained model.

Robustness to Adversarial Attacks
To measure the adversarial robustness of the models, we use the PGD attack with the l∞ norm, 40 iterations, and a step size of 0.01. The l∞ norm yields the hardest possible attack for our method and corresponds more closely to the human notion of "indistinguishable images" than the l2 norm. The attacks are performed on the test set, and only on images that are correctly classified by each model. The results are reported in Figure 3. The regularized model has a slightly lower accuracy than the baseline model for small perturbations, but the baseline model suffers a sharp drop in accuracy above the attack level ϵ = 0.1. Adversarial training achieves high accuracy for small- to medium-sized perturbations, but its accuracy decreases sharply above ϵ = 0.3. The regularized model remains robust even for large perturbations. The baseline model reaches 50% accuracy at ϵ = 0.2 and the adversarially trained model at ϵ = 0.325, while the regularized model reaches 50% accuracy at ϵ = 0.4.
Table 1 provides further results against AutoAttack (AA) [7], which was designed to offer a more reliable evaluation of adversarial robustness. For a fair comparison, and in addition to a baseline model (BASE), we compare the partial isometry defense (ISO) with several other computationally efficient defenses: distillation (DIST) [8], Jacobian regularization (JAC) [9], which also relies on the Jacobian matrix of the network, and Fisher information regularization (FIR) [10], which also leverages information geometry. We also consider an adversarially trained (AT) model using PGD. ISO is the best defense that does not rely on adversarial training. In future work, ISO may be combined with AT to further boost performance. Note that ISO and JAC are more robust against l2 attacks, since they were designed to defend the model against such attacks. On the other hand, AT is more robust against l∞ attacks, because the adversarial training was conducted with the l∞ norm.

Experiments on CIFAR-10 Dataset
We consider a DenseNet121 model fine-tuned on CIFAR-10 using weights pre-trained on ImageNet. The code is available here: https://github.com/lshigarrier/iso_defense.git (accessed on 26 January 2023). As for the MNIST experiments, we compare the partial isometry defense with distillation (DIST), Jacobian regularization (JAC), and Fisher information regularization (FIR). Here, adversarial training (AT) relies on the fast gradient sign method (FGSM) attack [16]. All defenses are compared against PGD for various attack strengths. The results are presented in Table 2. The defenses are evaluated in a "gray-box" setting where the adversary can access the architecture and the data, but not the weights. More precisely, the adversarial examples are crafted from the test set of CIFAR-10 using another, unregularized DenseNet121 model. AT is the most robust method, but ISO achieves a robust accuracy 30% higher than the next best analogous method (FIR).
One of our goals is to provide alternatives to adversarial training (AT). Apart from its high computational cost, AT suffers from several limitations: it only robustifies against the chosen attack at the chosen budget, and it does not offer a robustness guarantee. For example, under Gaussian noise, AT accuracy decreases faster than baseline accuracy (i.e., no defense). Achieving high robust accuracy against specific attacks on a specific benchmark is an insufficient and misleading measure of the true robustness of the evaluated model. Our method offers a new point of view that can be extended to certified defense methods in future work.

Discussion and Related Work
In 2019, Zhao et al. [17] proposed to use the Fisher information metric in the setting of adversarial attacks. They used the eigenvector associated with the largest eigenvalue of the pullback of the FIM as an attack direction. Following their work, Shen et al. [10] suggested a defense mechanism that suppresses the largest eigenvalue of the FIM. They upper-bounded the largest eigenvalue by the trace of the FIM. As in our work, they added a regularization term to encourage the model to have smaller eigenvalues. Moreover, they showed that their approach is equivalent to label smoothing [18]. In our framework, their method consists of expanding the geodesic ball b(x, δ) as much as possible. However, their approach does not guarantee that the constraint imposed on the model will not harm the accuracy more than necessary. In our framework, the matrix PJ (compared with δ/ϵ) informs the model of the precise restriction that must be imposed to achieve adversarial robustness in the l2 ball of radius ϵ.
Cisse et al. [19] introduced another adversarial defense called Parseval networks. To achieve adversarial robustness, the authors control the Lipschitz constant of each layer of the model to be close to unity. This is achieved by constraining the weight matrix of each layer to be a Parseval tight frame, which is another name for a semi-orthogonal matrix. Since the Jacobian matrix of the entire model with respect to the input is almost the product of the weight matrices, the Parseval network defense is similar to our proposed defense, albeit with a completely different rationale. This suggests that geometric reasoning could successfully supplement the line of work on Lipschitz constants of neural networks, such as [20].
Following another line of work, Hoffman et al. [9] advanced a Jacobian regularization to improve adversarial robustness. Their regularization penalizes the Frobenius norm of the input-output Jacobian matrix. To avoid computing the true Frobenius norm, they relied on random projections, which are shown to be both efficient and accurate. This method is similar to the method of Shen et al. [10] in the sense that it also increases the radius of the geodesic ball. However, the Jacobian regularization does not take into account the geometry of the output space (i.e., the Fisher information metric) and assumes that the probability simplex ∆m is Euclidean.
Although this study focuses on l2 norm robustness, it must be pointed out that there are other "distinguishability" measures that can be used to study adversarial robustness, including all the other lp norms. In particular, the l∞ norm is often considered the most natural choice when working with images. However, the l∞ norm is not induced by any inner product and, hence, there is no Riemannian metric that induces the l∞ norm. Nevertheless, given an l∞ budget ϵ∞, we can choose an l2 budget ϵ2 = √d ϵ∞, such that any attack within the ϵ∞ budget also respects the ϵ2 budget. When working on images, other dissimilarity measures include rotations, deformations, and color changes of the original image. Contrary to the l2 or l∞ norms, these measures do not rely on a pixel-based coordinate system. However, it is possible to define unrestricted attacks based on these spatial dissimilarities, as, for example, in [21].
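As a quick sanity check of this conversion (and of the ϵ = 5.6 budget used in Section 4 for MNIST):

```python
import math

d = 784                          # MNIST input dimension
eps_inf = 0.2                    # l_inf budget
eps_2 = math.sqrt(d) * eps_inf   # smallest l2 radius covering the l_inf ball

# Any perturbation e with ||e||_inf <= eps_inf satisfies
# ||e||_2 <= sqrt(d) * ||e||_inf, so the l2 ball of radius eps_2
# contains the entire l_inf ball of radius eps_inf.
```

For MNIST, this gives eps_2 = 28 × 0.2 = 5.6, which is the budget used for the regularization term in Section 4.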
In this work, we derive the partial isometry regularization for a classification task. The method can be extended to regression tasks by considering the family of multivariate normal distributions as the output space. On the probability simplex ∆m, the FIM is a metric with constant positive curvature, while it has constant negative curvature on the manifold of multivariate normal distributions [22].
Finally, the precise quantification of the robustness condition presented in Equation (12) and Proposition 4 paves the way for the development of a certified defense [23] within this framework. By strictly enforcing Proposition 4 on a chosen proportion of the training set, it may be possible to maximize the accuracy under the constraint of a chosen robustness level, offering another solution to the robustness-accuracy trade-off [24,25]. Certifiable defenses are a necessary step for the deployment of deep learning models in critical domains and missions, such as civil aviation, security, defense, and healthcare, where a certification may be required to ensure a sufficient level of trustworthiness.

Conclusions and Future Work
In this paper, we introduce an information geometric approach to the problem of adversarial robustness in machine learning models. The proposed defense consists of enforcing a partial isometry between the input space endowed with the Euclidean metric and the probability simplex endowed with the Fisher information metric. We subsequently derive a regularization term to achieve robustness during training. The proposed strategy is tested on the MNIST and CIFAR-10 datasets and shows a considerable increase in robustness without harming the accuracy. Future work will evaluate the method on other benchmarks and real-world datasets. Several attack methods will also be considered in addition to PGD and AutoAttack. Although this work focuses on l2 norm robustness, future work will consider other "distinguishability" measures.
Our work extends a recent, promising, but understudied framework for adversarial robustness based on information geometric tools. The FIM has already been harnessed to develop attacks [17] and defenses [10,26], but a precise robustness analysis has yet to be proposed. Our work is a step toward the development of such an analysis, which might yield certified guarantees relying on these geometric tools. The study of adversarial robustness, which is non-local by definition (contrary to accuracy), should benefit greatly from a geometrical vision. However, the current literature on adversarial robustness is mainly concerned with the FIM and its spectrum (which are very local objects), without unfolding the full arsenal developed in information geometry. In our work, we demonstrate the usefulness of such an approach by developing a preliminary robustification method. Model robustification is a hard, unsolved, yet vital problem for ensuring the trustworthiness of deep learning tools in safety-critical applications. Our framework could be extended and applied to existing certification strategies, such as Lipschitz-based certification [27] or randomized smoothing [23], where statistical models naturally appear.
Proof of Proposition 5. We need to show that μ*ḡ = g. Using the coordinates θ on ∆m (Definition 1) and the standard coordinates on R^c, and writing f(x) = θ0 = (θ0^1, ..., θ0^m), we have, for i, j = 1, ..., m: (μ*ḡ)_ij = Σ_{k=1}^c (∂μ^k/∂θ^i)(∂μ^k/∂θ^j) = δ_ij/θ0^i + 1/θ0^c, which is the FIM, as defined in Definition 2.

Figure 3 .
Figure 3. Accuracy of the baseline (dashed, blue), regularized (solid, green), and adversarially trained (dotted, red) models for various attack perturbations on the MNIST dataset. The perturbations are obtained with PGD using the l∞ norm.

Table 1 .
Clean and robust accuracy on MNIST against AA, averaged over 10 runs. The number in parentheses is the attack strength.

Table 2 .
Clean and robust accuracy on CIFAR-10 against PGD. The number in parentheses is the attack strength.