Applications of Neural Network-Based AI in Cryptography

Abstract: Artificial intelligence (AI) is a modern technology that offers many advantages in daily life, such as predicting weather, finding directions, classifying images and videos, and even automatically generating code, text, and videos. Other essential technologies such as blockchain and cybersecurity also benefit from AI. As a core component used in blockchain and cybersecurity, cryptography can benefit from AI in order to enhance the confidentiality and integrity of cyberspace. In this paper, we review the algorithms underlying four prominent cryptosystems, namely the Advanced Encryption Standard, the Rivest–Shamir–Adleman cryptosystem, Learning With Errors, and the Ascon family of cryptographic algorithms for authenticated encryption. Where possible, we pinpoint areas where AI can be used to help improve their security.


Introduction
In 1991, Rivest [1] presented a talk about relationships between machine learning and cryptography. Surprisingly, while artificial intelligence (AI) is being extensively developed in a number of applications, only a very limited number of research studies have addressed the use of AI in cryptography.
AI is the set of tools, methodologies, and implementations deployed to enable a digital computer or a robot to perform tasks that are usually associated with human intelligence [2,3]. In the last few decades, AI has developed rapidly in many sectors, such as big data, Internet of Things, robotics, banking, finance, healthcare, e-commerce, meteorology, education, facial recognition, information systems, autonomous driving, and data security. Machine learning (ML) is a subset of AI that enables smart machines and computers to learn without human intervention and gives them the ability to imitate human behavior. The goal of ML is to design algorithms that extract information from data in order to build models capable of deriving predictions and patterns. ML covers a range of methodologies and applications, such as facial recognition, natural language processing, online chatbots, medical imaging, diagnostics, and self-driving vehicles.
On the other hand, cybersecurity is the set of methods and tools deployed to protect electronic devices, such as information systems, servers, computers, networks, and data centers, from all kinds of threats, vulnerabilities, and attacks. Often, an attack on an electronic device has severe impacts on regular operations, or even worse, can completely destroy the stored data. Therefore, the goal of cybersecurity is to detect any attack, handle it, and recover the system after the incident. Cybersecurity includes the use of cryptography and cryptographic protocols for protecting data in transit and in storage.
The aim of this paper is to present an overview of the applications of AI and ML in cryptography with a focus on four prominent cryptosystems, namely, AES, RSA, LWE, and Ascon. For this, we start by providing an overview of AI and ML techniques before presenting the above-referenced cryptosystems and explaining how AI and ML can be applied to improve their security for the benefit of cybersecurity.
Cryptography is concerned with protecting information and communications transferred over public communication channels in the presence of adversaries. It allows only the recipient of a message to view its contents and is used in all domains where security is a concern. To transmit a message or electronic data, two families of cryptography can be used: symmetric and asymmetric cryptography. In symmetric cryptography, also called secret key cryptography, the same key is used for both encryption and decryption. Typically, a message is encrypted using a secret key, and the recipient needs both the encrypted message and the same secret key for decryption. In asymmetric cryptography, invented by Diffie and Hellman [4] in 1976 and mostly known as public key cryptography, two keys are involved: one is public, and one is private. In general, the two keys are related by a mathematical process with the idea that it is computationally infeasible to determine the private key given the public one. To encrypt and send a message, the sender uses the public key of the recipient. To decrypt, the recipient uses his or her private key.
Cryptanalysis is the study of cryptographic schemes for vulnerabilities. Specifically, there are mainly two methods of deploying cryptanalysis: mathematical and side-channel. Mathematical cryptanalysis, or algebraic cryptanalysis, consists of breaking cryptographic schemes by scrutinizing their mathematical properties, while side-channel cryptanalysis consists of studying and manipulating the implementations in order to collect information on the keys or on the plaintext itself.
In symmetric cryptography, the security of a scheme is largely based on the robustness of its S-box, a nonlinear operator that is often related to a vectorial Boolean function with very good cryptographic properties, such as resistance to differential cryptanalysis, linear cryptanalysis, and boomerang cryptanalysis, together with a variety of other cryptographic criteria [5]. Artificial intelligence can be used to design S-boxes from vectorial Boolean functions and to study their cryptographic properties in order to select the most efficient and the most secure schemes.
In asymmetric cryptography, security is often based on a hard mathematical problem such as the integer factorization problem, the discrete logarithm problem, the Shortest Vector Problem (SVP), and the Closest Vector Problem (CVP) in a lattice.
Currently, the most widely used asymmetric cryptosystem is RSA, invented in 1978 by Rivest, Shamir, and Adleman [6]. RSA is used for encryption, signatures, and key distribution, and is a powerful tool to provide privacy and to ensure authenticity of emails, digital data, and payment systems. The mathematics behind RSA are based on the ring $\mathbb{Z}/N\mathbb{Z}$, where $N = pq$ is the product of two large prime numbers. In RSA, the public key is an integer $e$ satisfying $\gcd(e, (p-1)(q-1)) = 1$, and the private key is the integer $d$ satisfying $ed \equiv 1 \pmod{(p-1)(q-1)}$. To encrypt a message $1 < m < N$, one computes $c \equiv m^e \pmod{N}$, and to decrypt it, one computes $m \equiv c^d \pmod{N}$. Since its invention, RSA has been intensively analyzed for vulnerabilities [7][8][9][10].
A promising family of asymmetric cryptography has appeared with the Learning With Errors (LWE) problem and its variants. LWE was proposed by Regev [11] in 2005. Several homomorphic encryption libraries, public key encryption schemes, and digital signature systems are based on LWE or on one of its variants [12]. In 2016, NIST initiated a process to select and standardize post-quantum cryptographic algorithms [13], and in 2022, it selected CRYSTALS-Kyber [14] for public-key encryption and CRYSTALS-Dilithium [15], Falcon [16], and SPHINCS+ [17] for digital signatures. Among the four algorithms selected for standardization, three are based on hard problems in lattices, namely CRYSTALS-Kyber, CRYSTALS-Dilithium, and Falcon. There are several variants of LWE, such as Polynomial-LWE [18], Ring-LWE [19], Module-LWE [20], and Continuous LWE [21]. The goal in LWE is to find a secret vector $s \in \mathbb{Z}_q^n$ given $m \ge n$ samples of the form $(a_i, \langle a_i, s \rangle + e_i \bmod q)$, where $a_i \in \mathbb{Z}_q^n$ is uniformly generated, $e_i \in \mathbb{Z}_q$ is a small error term chosen according to a probability distribution, and $\langle a_i, s \rangle$ is the inner product of $a_i$ and $s$. The security of LWE is based on several hard problems in lattices, specifically the Gap Shortest Vector Problem (GapSVP) and the Shortest Independent Vectors Problem (SIVP).
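To make the shape of LWE samples concrete, the following minimal Python sketch generates pairs (a_i, b_i) with b_i = ⟨a_i, s⟩ + e_i mod q and checks that, for someone who knows s, the residual b_i − ⟨a_i, s⟩ is small. The function names, the toy dimensions, the modulus q = 3329, and the use of a bounded uniform error instead of a discrete Gaussian are all illustrative assumptions and offer no security.

```python
import random


def inner_product(a, s, q):
    """Inner product <a, s> reduced modulo q."""
    return sum(ai * si for ai, si in zip(a, s)) % q


def lwe_samples(m, q, secret, noise_bound, rng):
    """Return m samples (a_i, b_i) with b_i = <a_i, s> + e_i mod q.

    Assumption: the error e_i is drawn uniformly from
    [-noise_bound, noise_bound]; real schemes use other distributions.
    """
    n = len(secret)
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-noise_bound, noise_bound)
        samples.append((a, (inner_product(a, secret, q) + e) % q))
    return samples


def centered(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


rng = random.Random(0)          # fixed seed, for reproducibility only
n, m, q, noise_bound = 8, 16, 3329, 2
secret = [rng.randrange(q) for _ in range(n)]
samples = lwe_samples(m, q, secret, noise_bound, rng)

# With the secret, each sample reveals only its small error term.
residuals = [centered(b - inner_product(a, secret, q), q) for a, b in samples]
```

Without the secret, the values b_i are designed to look uniform modulo q, which is what makes recovering s from the samples a hard problem for suitable parameters.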
The explosion of the Internet of Things (IoT), characterized by low-energy and low-computation power devices, has prompted the need for efficient and strong lightweight ciphers for protecting the privacy and authenticity of data transmitted by these devices. The U.S. government's National Institute of Standards and Technology (NIST) recently selected the Ascon family of lightweight symmetric ciphers for authenticated encryption [22,23] to be used by IoT devices. The Ascon family ensures 128-bit security and uses a 320-bit permutation internally.
Common attacks on both symmetric and asymmetric cryptography are side-channel attacks, introduced by Kocher [24] in 1996. Side-channel attacks are used to retrieve private keys from electronic devices. There are various types of possible side-channel attacks depending on the cryptosystem and the device. These include timing attacks [24], power consumption attacks [25], electromagnetic radiation attacks [26], fault injection attacks [27], and acoustic attacks [28].
In symmetric and asymmetric cryptography, a plaintext M is encrypted by a nonlinear trapdoor function F together with a secret or a private key K so that C = F(M, K). An algebraic attack consists of finding the plaintext M or the key K using publicly accessible Cs and, possibly, their known corresponding Ms. Moreover, for symmetric ciphers, an algebraic attack can be used to approximate the whole or a partial (reduced-round) encryption process by a linear function, which makes the cipher vulnerable.
The rest of this paper is organized as follows. In Section 2, we review the main facts of artificial intelligence (AI) and machine learning (ML). In Section 3, we discuss the differences between AI and ML. In Section 4, we provide a list of possible applications of AI in cryptography. In Sections 5-8, we review the four prominent cryptosystems, namely AES, RSA, LWE, and Ascon, and present possible applications of AI to test and enhance their security. We conclude the paper in Section 9.

Artificial Intelligence and Machine Learning
Artificial intelligence (AI) is a subarea of computer science that concerns itself with building rational agents, i.e., agents that sense the world (read a percept), map the percept to some internal representation, and identify the best action to take among a set of possible actions given the percept. The agent selects the action that minimizes its objective function, enacts that action, and updates its internal representation of the world as a consequence. There exist various types of agents: search agents, adversarial search agents, planning agents, logical agents, probabilistic agents, and learning agents. The latter are data-driven and use machine learning algorithms to predict the action to take as a function of the input percept from a collection of tuples (percept, action) [29].
Machine learning (ML) is a subarea of AI that concerns itself with learning agents and algorithms. There exist three major classes of learning agents/algorithms:

1. Supervised learning algorithms. These use tuples (input vector; output vector, also called label) to learn/approximate the output vector for an unseen given input vector.

2. Unsupervised learning algorithms. These do not make use of the label and use the input vector only to learn/infer/approximate the output vector for an unseen given input vector.

3. Reinforcement learning algorithms. This class progressively learns/infers/approximates the output vector for an unseen given input vector from the positive or negative feedback returned from the external world.
Recently, two more subclasses have come to light. These are:

1. Self-supervised learning algorithms. These algorithms mask parts of the input and try to learn them. In essence, these algorithms transform an unsupervised problem (i.e., a problem for which no labels exist) into a supervised problem by auto-generating the labels.

2. Imitation learning algorithms. These are very recent. In essence, imitation learning algorithms reproduce the actions of other agents/machine learning algorithms by observing them [30].
A comprehensive review and taxonomy of artificial intelligence and machine learning techniques together with their disadvantages and challenges can be found in [31].
Artificial neural networks (ANNs) in particular have proven to be more powerful than other ML techniques in many applications, including computer vision (CV), natural language processing (NLP), and autonomous driving (AD).

Artificial Neural Networks (ANNs) as a Non-Linear Approximation Function
Artificial neural network models are mainly characterized by their architecture, the number of densely connected hidden layers, the number of cells/perceptrons (see Figure 1) per layer, the activation functions used in the neurons for firing, and the cost function used to train the network (or, similarly, to find the weights of the interconnections between layers) using gradient descent and back propagation for computing the gradient of the cost function in the weight space [32] (see Figure 2). ANNs can be divided into three major categories: those that perform input classification, sequence learning, or function approximation.

Figure 2. A multi-layer perceptron forming a 4-layer neural net with 3 input units, 5 units in the first hidden layer, 4 units in the second hidden layer, and 2 output units.
Thanks to nonlinear activation functions such as the Sigmoid, Tanh, or ReLU functions, ANNs are great at approximating arbitrary nonlinear functions (often as piece-wise linear approximations).
In fact, an ANN with at least one hidden layer is a universal approximator, i.e., it can approximate any continuous function on a bounded domain to arbitrary accuracy [33]. However, one hidden layer might need an exponential number of neurons, so an architecture with many fully connected hidden layers (deep neural network) is often preferred, as it is more compact. Arguably, the performance of the network increases with more hidden units and more hidden layers (see Figure 3). Although there is no universal method to approximate any arbitrary given function, there are development procedures that consistently lead to successful approximations of nonlinear functions within a specific field. There is a wealth of literature for such applications in all fields, including bioengineering, mechanics, agriculture, digital design, and, last but not least, intrusion detection and cybersecurity.
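As an illustration of the layered structure described above, the following sketch builds the multi-layer perceptron of Figure 2 (3 inputs, hidden layers of 5 and 4 units, 2 outputs) in plain Python and runs one forward pass. The helper names, the uniform random initialization, and the choice of ReLU on the hidden layers are assumptions made for the example; training by gradient descent is not shown.

```python
import random


def relu(vector):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in vector]


def dense(x, weights, biases):
    """Affine map of one fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (an arbitrary choice)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Feed x through all layers; ReLU on hidden layers, linear output."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if index < len(layers) - 1:
            x = relu(x)
    return x


rng = random.Random(42)
sizes = [3, 5, 4, 2]  # layer widths taken from Figure 2
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward([0.5, -1.0, 2.0], layers)

# Number of trainable parameters (weights plus biases).
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

Because every hidden unit applies ReLU to an affine function of its inputs, the network as a whole computes a piece-wise linear function of the input, which is the kind of approximation mentioned above.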
As arbitrary function approximators, ANNs lend themselves naturally to cryptanalysis techniques, such as known and chosen plaintext attacks, and linear, nonlinear, and differential attacks. Furthermore, the feed-forward processing of data through multiple layers of a neural network has a functional resemblance to the multiple-round operations in a symmetric cipher, i.e., linear permutations followed by nonlinear transformations. This also invites the attempt to leverage ANNs for the design of ciphers.

ANN Types and Their Domains of Application
Due to numerous advantages over other ML techniques, such as the ability to learn hierarchical features, the ability to handle multiple outputs, and the ability to deal with nonlinear data, which is clearly highlighted by the rapid development of foundation models (FMs) [35], ANN-based learning agents have overshadowed other types of intelligent agents, including the ones that use other machine learning techniques. They have become a synonym for AI. This said, and despite the aforementioned advantages, the excellent performance of deep ANNs on domain problems dealing with language, images, and videos does not guarantee that they will outperform other ML techniques in cryptography [36,37]. Unless otherwise specified, we will use the term AI to designate agents that use ANNs.
There are many types of neural networks, such as auto-encoders, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and transformers. They stand apart from one another through the type of processing attached to individual layers and the architecture used to connect the hidden layers, among other things, as well as whether a cell uses its own output from a previous excitation or not.

Convolutional Neural Networks (CNNs)
In addition to hidden layers, a CNN contains multiple convolution layers, which are responsible for the extraction of important features from spatial data such as images. The earlier layers are responsible for low-level details, and the later layers are responsible for higher-level features. As such, CNNs are well-suited for applications such as facial recognition, medical analysis, and image classification.

Recurrent Neural Networks (RNNs)
RNNs are used to predict the next item in sequential data, which can be videos or text. In RNNs, a neuron in a layer also receives a time-delayed input from its own previous instance prediction. This instance prediction is stored in the RNN cell, which is a second input for every prediction. RNNs are typically used in tasks such as text or speech generation, text translation, and sentiment analysis.

Autoencoders
An autoencoder is a type of artificial neural network used to learn efficient coding of unlabeled data (unsupervised learning). An autoencoder learns two functions: an encoding function that transforms the input data into a low-dimension latent space representation, and a decoding function that recreates the input data from the latent space representation. Autoencoders are used in dimensionality reduction, image compression, image denoising, feature extraction, image generation using generative adversarial networks (GANs), sequence-to-sequence predictions, and recommendation systems.

Long Short-Term Memory Networks (LSTMs)
LSTMs use gates, namely an input gate, an output gate, and a forget gate, to control which information should be kept or forgotten. LSTMs are best applied in speech recognition and text prediction.

Generative Adversarial Networks (GANs)
GANs learn to create new data instances that resemble the training data. For example, GANs can create images that look like photographs of human faces, even though the faces do not belong to any real person [38].

Transformers
The transformer model architecture drops recurrence and convolutions and uses an attention mechanism to connect an encoder network with a decoder network. Applications of this model include machine translation [39].
All of these variants of ANNs are mainly characterized by their architecture, the number of parameters/weights that make up the model and that have to be learned, and the training corpora used (Common Crawl, The Pile, MassiveText, Wikipedia, GitHub, books, articles, logs, etc.). At the time of writing, the Megatron-Turing Natural Language Generation (MT-NLG), a transformer-based language generation model, uses 530 billion parameters, roughly six times the commonly cited estimate of 86 billion neurons in the adult human brain. Such models are also called foundation/base models since they can be adapted/finetuned for different tasks/contexts. A head-to-head comparison of existing (commercial and open source) large models (LMs) can be found in [40]. It is worth noting that the size of a model depends largely on the nature of the problem at hand. In many engineering domains, the models used are not as big as foundation models.

Possible Applications of AI in Cryptography
The combination of cryptography with artificial intelligence will be beneficial for the security of several applications. One of the goals of applying AI is to identify potential vulnerabilities of a cryptographic system.

1. In cybersecurity: Cybersecurity can readily benefit from the applications of AI. By applying AI, it is possible to write and process software to detect cyberattacks and to defend a system against them. The advantage of applying AI instead of traditional security systems is that AI provides faster solutions and better security.

2. In blockchain: Blockchain is a new technology with various industrial and economic applications. It plays a prominent role in many sectors, such as banking, cryptocurrencies, and data management. It achieves complete independence from any central authority and guarantees secure communications thanks to advanced cryptographic techniques. AI can be used to analyze the security and the efficiency of blockchain applications in order to improve their practicability, security, and profitability.

3. In symmetric cryptography: AI can be deployed to analyze the security of a symmetric system defined by an S-box or a vectorial Boolean function by testing all possible cryptographic criteria, including bijectivity, nonlinearity, linear analysis, differential analysis, balancedness, correlation immunity, algebraic degree, side-channel analysis, strict avalanche criterion (SAC), bit independence criterion (BIC), and the NIST Statistical Test Suite [41], which is used to guarantee the quality of random number generators for cryptographic applications. In particular, the security of AES and Ascon can be much improved if tested with the help of AI.

4. In asymmetric cryptography based on RSA: AI can be used to generate safe primes for the RSA modulus, and to generate safe public and private keys by running known attacks such as factorization, small private key attacks, partial key exposure attacks, and side-channel attacks.

5. In asymmetric cryptography based on LWE: Attacks on LWE and its variants are very limited because their security is based on the hardness of lattice problems. Nevertheless, AI can be used to test the hardness of lattice problems with different parameters in order to guarantee the safety and the efficiency of the cryptosystem.
This said, AI itself can benefit from modern cryptographic techniques, such as homomorphic encryption, to resolve the privacy issue related to data used in learning without disclosing it [42].
The application of advanced ANNs such as deep, convolutional, and generative adversarial neural networks is also gaining momentum. In this regard, Ref. [52] deployed a deep network for side-channel attacks on masked and unprotected AES implementations. Ref. [53] deployed a linear attack on round-reduced DES using deep learning with plaintext-ciphertext pairs. Ref. [54] posed the cryptanalysis of a cipher as a language translation problem to be solved using a GAN, which was adapted to handle discrete data. The GAN is trained to learn the mapping between plaintext and ciphertext distributions without supervision. Ref. [55] used deep convolutional neural networks to exploit differential properties of the round-reduced Speck cipher to perform a differential distinguishing attack that did not involve key search. Ref. [56] used a deep neural network to perform a known-plaintext attack on AES and its modes of operation to restore different bit lengths with probabilities. Ref. [57] used deep learning in side-channel attacks against a secure implementation of the RSA algorithm. Surprisingly, the applications of advanced ANNs to asymmetric encryption have yet to begin. Therefore, in this article, we attempt to pinpoint stages in prominent encryption algorithms, namely AES, RSA, and LWE, where the applications of advanced ANNs can help increase their security.

Datasets
Datasets are structured collections of data used to train a model for the nonlinear trapdoor function C = F(M, K). They consist of collected pairs (M, C) that are generated synthetically at the design phase of F. Typically, all combinations of Ms and their differences are generated and fed to F to obtain Cs, leading to a balanced dataset of pairs (M, C).
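A minimal sketch of how such a dataset could be generated is given below. The function `toy_cipher` is a hypothetical 8-bit stand-in for F(M, K), introduced only so that the example runs; it is not a real cipher, and the helper names and the bit-vector encoding are likewise assumptions.

```python
def toy_cipher(message, key):
    """Hypothetical 8-bit stand-in for C = F(M, K); not a real cipher.

    XOR with the key, then an invertible map modulo 256 (5 is odd,
    so multiplication by 5 is a bijection on 8-bit values).
    """
    return ((message ^ key) * 5 + 3) % 256


def to_bits(value, width=8):
    """Little-endian list of bits, a common input encoding for ANNs."""
    return [(value >> i) & 1 for i in range(width)]


def build_dataset(key):
    """All 256 pairs (M, C) for a fixed key, encoded as bit vectors."""
    return [(to_bits(m), to_bits(toy_cipher(m, key))) for m in range(256)]


def build_difference_dataset(key, delta):
    """Pairs (input difference, output difference) for a fixed delta."""
    rows = []
    for m in range(256):
        out_diff = toy_cipher(m, key) ^ toy_cipher(m ^ delta, key)
        rows.append((to_bits(delta), to_bits(out_diff)))
    return rows


key = 0xA7  # arbitrary illustrative key
dataset = build_dataset(key)
difference_dataset = build_difference_dataset(key, delta=0x01)

# Every ciphertext occurs exactly once, so the dataset is balanced.
ciphertexts = {tuple(c) for _, c in dataset}
```

For a real cipher with 128-bit blocks, enumerating all messages is infeasible, so in practice only a sample of pairs (M, C) can be generated.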

The Advanced Encryption Standard (AES)
The Advanced Encryption Standard [58], also known as the Rijndael algorithm, is a symmetric block cipher that was designed by Daemen and Rijmen [59] in 1999. It was adopted by the U.S. National Institute of Standards and Technology (NIST) in 2001 to supersede the Data Encryption Standard (DES) [60]. AES allows key lengths of size 128, 192, or 256 bits, with a block length of 128 bits. In AES, the encryption performs 10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key.
The encryption and the decryption in the AES algorithm start with two parameters: a block B of length 128 bits, and a key K of length 128, 192, or 256 bits (see Table 1). In all steps of the encryption and decryption in AES, the blocks $B = \{B_0, \ldots, B_{15}\}$ are represented by 4 × 4 square matrices of bytes called state arrays.

The Encryption Process of AES
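At the beginning of the encryption, each key K is expanded into n + 1 subkeys by an algorithm called key expansion, where n ∈ {10, 12, 14} is the number of rounds. The encryption phase starts with the initial round by XORing the plaintext with the first subkey. Then, the rounds are composed of four algorithms, namely AddRoundKey, SubBytes, ShiftRows, and MixColumns, so that a round $R_i$ with $0 \le i \le n$ is of the form:
$$R_i = \begin{cases} \mathrm{AddRoundKey} & \text{if } i = 0, \\ \mathrm{AddRoundKey} \circ \mathrm{MixColumns} \circ \mathrm{ShiftRows} \circ \mathrm{SubBytes} & \text{if } 1 \le i \le n-1, \\ \mathrm{AddRoundKey} \circ \mathrm{ShiftRows} \circ \mathrm{SubBytes} & \text{if } i = n. \end{cases}$$
The four algorithms can be summarized as follows:
• AddRoundKey: The subkey for the round is bitwise XORed with the state array computed in the previous step. In the first round, the state array is the input block, and in the last round, the resulting state array is the ciphertext (see Table 2).
• SubBytes: The SubBytes transformation is a byte substitution that operates on each byte of the state using a substitution table called S-box (see Table 3). Algebraically, each byte x is viewed as a list of 8 bits and is transformed via the rule $T(x) = A \cdot x^{-1} + c$, where A is a fixed 8 × 8 binary matrix, c is the constant byte 0x63, and $x^{-1}$ is the inverse of x in the finite field $\mathbb{F}_{2^8}$ modulo the polynomial $X^8 + X^4 + X^3 + X + 1$, with the convention $0^{-1} = 0$.
• ShiftRows: In this transformation, the four rows of the state array are cyclically shifted to the left by 0, 1, 2, and 3 bytes, respectively.
• MixColumns: In this transformation, each column is multiplied by a fixed matrix, as in Table 5.
In MixColumns, the operations are performed in $\mathbb{F}_{2^8}$ modulo the polynomial $X^8 + X^4 + X^3 + X + 1$.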
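The algebraic description of SubBytes and MixColumns can be checked with a short Python sketch. It computes the S-box from the field inverse followed by the standard affine map (written here with bit rotations and the constant 0x63), and multiplies one column by the standard MixColumns matrix with rows (2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2). The function names are illustrative, and the code favors clarity over speed and side-channel resistance.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} modulo X^8 + X^4 + X^3 + X + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Field inverse computed as a^254, with the convention 0 -> 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, shift):
    """Rotate a byte left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def sub_byte(x):
    """SubBytes on one byte: field inversion, then the affine map."""
    inv = gf_inv(x)
    return (inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
            ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)


MIX_MATRIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def mix_column(column):
    """MixColumns on one 4-byte column of the state array."""
    mixed = []
    for row in MIX_MATRIX:
        value = 0
        for coefficient, byte in zip(row, column):
            value ^= gf_mul(coefficient, byte)
        mixed.append(value)
    return mixed


SBOX = [sub_byte(x) for x in range(256)]
mixed_column = mix_column([0xDB, 0x13, 0x53, 0x45])
```

In this sketch, `SBOX[0x00]` evaluates to 0x63 and `SBOX[0x53]` to 0xED, and the column (0xDB, 0x13, 0x53, 0x45) is mapped to (0x8E, 0x4D, 0xA1, 0xBC), matching commonly used AES test values.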

The Decryption Process in AES
The decryption process in AES is performed by applying the inverses of the algorithms used in the encryption process. If n is the number of rounds in the encryption process, then there are m = n rounds in the decryption process. The decryption starts by XORing the ciphertext with the last subkey of the key expansion. For 0 ≤ i ≤ m, the inverse round $\mathrm{InvR}_i$ is composed of the inverses of the four encryption algorithms, which can be summarized as follows.
• InvAddRoundKey: As in the AddRoundKey algorithm, the subkey for the round is bitwise XORed with the state array computed in the previous step. In the first round, the state array is the ciphertext block, and in the last round, the resultant state array is the plaintext.
In InvMixColumns, the operations are performed in $\mathbb{F}_{2^8}$ modulo the polynomial $X^8 + X^4 + X^3 + X + 1$.

Main Attacks on AES
The goal of the attacks on a symmetric cryptosystem is to find exploitable properties inside the cipher that allow for retrieval of partial or total information on the secret key. In addition to the exhaustive attack, the two prominent attacks are linear cryptanalysis and differential cryptanalysis.

• Exhaustive search attack. Brute force attacks, or exhaustive attacks, consist of trying all possible keys on a ciphertext and checking whether the resulting plaintext is recognizable. It is easy to prevent such attacks by using large keys. In AES, the key lengths are 128, 192, and 256 bits. This makes the total number of keys for each key length $2^{128}$, $2^{192}$, and $2^{256}$, respectively, which is infeasible to enumerate even for the fastest supercomputers today. On the other hand, with a quantum computer, Grover's algorithm [61] makes it possible to perform an exhaustive search in the square root of the classical time, so the key length should be 256 bits.
• Linear attack. In 1993, Matsui [62] invented one of the most practical attacks on DES, known as linear cryptanalysis. It can be applied to AES by approximating the nonlinear parts in the rounds by linear expressions. This makes the round a linear function where the input or the output is easy to compute. In the situation where the S-box of the system is constructed from a vectorial Boolean function $F : \mathbb{F}_{2^n} \to \mathbb{F}_{2^n}$, linear cryptanalysis depends on the value of its nonlinearity, which is defined by:
$$NL_F = 2^{n-1} - \frac{1}{2} \max_{a \in \mathbb{F}_{2^n},\ b \in \mathbb{F}_{2^n} \setminus \{0\}} \left| \sum_{x \in \mathbb{F}_{2^n}} (-1)^{b \cdot F(x) + a \cdot x} \right|,$$
where $a \cdot x$ is the inner product with values in $\mathbb{F}_2$, defined as $a \cdot x = \bigoplus_{i=0}^{n-1} a_i x_i$. The nonlinearity of the function F represents the minimum Hamming distance between the component functions of F and all possible affine functions. It is well-known that $NL_F$ is upper bounded by $2^{n-1} - 2^{\frac{n}{2}-1}$. Vectorial Boolean functions that achieve $NL_F = 2^{n-1} - 2^{\frac{n}{2}-1}$ are called bent. Bent functions exist only when n is even and are important for building balanced S-boxes. In practice, the nonlinearity of the vectorial Boolean function F is studied via the linear probability table (LPT), whose entries are indexed by the pairs $(a, b) \in \mathbb{F}_{2^n}^2$. For AES, except for the first row and first column, all rows and columns of the LPT have the same distribution of values as given in Table 8.
• Differential attack. For a vectorial Boolean function F, the difference distribution table (DDT) is defined for the entry $(a, b) \in \mathbb{F}_{2^n}^2$ by:
$$DDT_F(a, b) = \#\{x \in \mathbb{F}_{2^n} : F(x) + F(a + x) = b\}.$$
The differential uniformity of F is defined by:
$$\delta_F = \max_{a \in \mathbb{F}_{2^n} \setminus \{0\},\ b \in \mathbb{F}_{2^n}} DDT_F(a, b).$$
Differential cryptanalysis exploits the differential probability $DP_F$, specifically:
$$DP_F(a, b) = \frac{DDT_F(a, b)}{2^n}.$$
For a randomly chosen permutation and for any $a \in \mathbb{F}_{2^n} \setminus \{0\}$, the value $F(x) + F(a + x)$ is expected to be uniformly distributed. This makes $DDT_F(a, b)$ a reliable and practical distinguisher if $DP_F(a, b)$ is sufficiently large. For the AES S-box, Table 9 shows the distribution of the $DP_F$ values and their frequencies. If $x_0$ is a solution to the equation $F(x) + F(a + x) = b$, then $x_0 + a$ is also a solution. This implies that $DDT_F(a, b)$ is even for all $a \neq 0$ and, consequently, $\delta_F \ge 2$. Vectorial Boolean functions satisfying $\delta_F = 2$ are called almost perfect nonlinear (APN) functions. As shown in Table 9, the differential uniformity of the AES S-box is 4. Hence, the AES S-box does not belong to the APN family; nevertheless, its differential uniformity is very small. This makes AES resistant to differential cryptanalysis.
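The two criteria discussed above can be evaluated directly on the AES S-box. The sketch below recomputes the S-box, then derives its differential uniformity from the difference distribution table and its nonlinearity from the Walsh spectrum of the nonzero component functions, using a fast Walsh-Hadamard transform. The function names are illustrative assumptions, and the nonlinearity is computed with the usual formula $2^{n-1} - \frac{1}{2}\max|W|$.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} modulo X^8 + X^4 + X^3 + X + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def aes_sbox():
    """AES S-box: field inversion (0 -> 0) followed by the affine map."""
    table = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        y = inv
        for shift in range(1, 5):
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(y ^ 0x63)
    return table


def differential_uniformity(sbox):
    """Largest DDT entry over all nonzero input differences a."""
    size = len(sbox)
    best = 0
    for a in range(1, size):
        row = [0] * size
        for x in range(size):
            row[sbox[x] ^ sbox[x ^ a]] += 1
        best = max(best, max(row))
    return best


def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a list of length 2^n."""
    w = list(values)
    half = 1
    while half < len(w):
        for start in range(0, len(w), 2 * half):
            for j in range(start, start + half):
                w[j], w[j + half] = w[j] + w[j + half], w[j] - w[j + half]
        half *= 2
    return w


def nonlinearity(sbox):
    """2^(n-1) minus half the largest Walsh coefficient (b nonzero)."""
    size = len(sbox)
    max_abs = 0
    for b in range(1, size):
        signs = [1 - 2 * (bin(b & sbox[x]).count("1") & 1) for x in range(size)]
        max_abs = max(max_abs, max(abs(c) for c in walsh_hadamard(signs)))
    return size // 2 - max_abs // 2


SBOX = aes_sbox()
delta = differential_uniformity(SBOX)
nl = nonlinearity(SBOX)
```

For the AES S-box this yields a differential uniformity of 4 and a nonlinearity of 112. The same two functions can be applied to any candidate 8-bit S-box given as a list of 256 values, for example one proposed by a search procedure.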

Applications of AI to Block Ciphers
There are plenty of attacks on AES that can be performed by AI. The goal of using AI with AES is to test the resistance of its secret keys and its S-boxes to such attacks. AI can be used for the following tasks.

1. Resistance to side-channel attacks [64]: Side-channel attacks exploit the operations performed by a cryptographic system during encryption or decryption to gain information about the secret key. The most used side-channel attacks are timing attacks, simple power attacks, differential power attacks, electromagnetic radiation attacks, correlation power attacks, etc. These attacks rely on collecting and interpreting observations in order to infer information about key size and bits. These inferences lend themselves naturally to ML and ANNs in general and to advanced ANNs/models in particular. As described earlier, some work has already been initiated in this direction [52].

2. Resistance to fault attacks [65]: Fault attacks are deployed to disturb the normal functioning of a cryptosystem. Faults are injected by various techniques such as lasers, light pulses, electromagnetic perturbations, tampering with the clock, etc. This enables the attacker to collect the erroneous result and to gain information about the secret key. As with side-channel attacks, fault attacks can be countered by testing AES cipher implementations, before deployment, against an advanced ANN model that tries to leverage the collected erroneous results to infer information about the key.

3. Resistance to linear attacks [62]: This task can be processed by computing the linear probability table of the S-box. ANNs, as excellent function approximators, can be used to model the nonlinearity of S-boxes, similarly to the work of [53] on DES.

4. Resistance to differential attacks [63]: This task can be performed by computing the difference distribution table of the S-box. As with linear attacks, ANNs, as excellent function approximators, can be used to model the differential properties of S-boxes, similarly to what has been done by [55] on the round-reduced Speck cipher and by [47] on the round function of GIFT.

7. Algebraic immunity [69,70]: The algebraic immunity of a vectorial Boolean function F defined on $\mathbb{F}_{2^n}$ is the lowest degree of all functions G ≠ 0 satisfying the annihilation relation associated with F.

8. Resistance to other attacks: There are plenty of attacks and criteria that can be implemented with AI to test the security of block ciphers. These include correlation immunity [72], strict avalanche criterion (SAC) [73], fixed points and opposite fixed points [59], algebraic degree [72], impossible differential [74], etc. A complete list of such attacks can be found in [5,75].
In sum, AI can be used to test the AES SubBytes() and MixColumns() functions, as well as the AES cipher with its modes of operation and their implementations, against all of the attacks listed above, and to propose useful and efficient countermeasures, such as the choice of the key space, the MixColumns() matrix polynomials, etc., that nullify or undermine the attacks.
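The difference distribution table mentioned in item 4 can be computed exhaustively for the AES S-box, which gives an exact baseline against which any ANN-based model of the differential behavior could be compared. The following Python sketch is illustrative only: the helper names (`gf_mul`, `gf_inv`, `aes_sbox`, `ddt`) are hypothetical, and the S-box is rebuilt from its standard definition (inversion in GF(2^8) followed by the affine map with constant 0x63) rather than copied from a table.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254, with 0 mapped to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def rotl8(x, k):
    """Rotate a byte left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def aes_sbox():
    """Build the AES S-box: field inversion followed by the affine map."""
    table = []
    for x in range(256):
        y = gf_inv(x)
        table.append(y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63)
    return table


def ddt(sbox):
    """ddt[a][b] = number of x with sbox[x] XOR sbox[x XOR a] == b."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


SBOX = aes_sbox()
DDT = ddt(SBOX)
# Largest count over all nonzero input differences (the differential uniformity).
max_nontrivial = max(max(row) for row in DDT[1:])
print("differential uniformity of the AES S-box:", max_nontrivial)
```

The same `ddt` function applies unchanged to any S-box given as a list, so a candidate substitution layer can be screened by comparing its `max_nontrivial` value with that of AES.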

The RSA Cryptosystem
In 1978, Rivest, Shamir, and Adleman [6] introduced RSA, a public-key encryption and digital signature scheme. RSA is used in various industrial applications, such as privacy, VPNs, communication channels, email services, cybersecurity, and web browsers.

The RSA Encryption Scheme
The RSA encryption scheme is composed of three algorithms.

1.
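Key Generation: Given a parameter n,

•
Select a random prime number p of bit size n.

•
Select a random prime number q of bit size n with p ≠ q.

•
Compute N = pq and φ(N) = (p − 1)(q − 1).

•
Choose an integer e such that gcd(e, φ(N)) = 1.

•
Compute the integer d such that ed ≡ 1 (mod φ(N)).

•
Publish the public key (N, e) and keep the private key (N, d).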

2.
Encryption: Given a public key (N, e) and a message M ∈ Z/NZ,

•
Compute the ciphertext $C \equiv M^e \pmod{N}$.

3.
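Decryption: Given the private key (N, d) and a ciphertext C,

•
Compute the plaintext $M \equiv C^d \pmod{N}$.

The correctness of the decryption follows from Euler's Theorem:
$$C^d \equiv M^{ed} \equiv M^{1 + k\varphi(N)} \equiv M \cdot \left(M^{\varphi(N)}\right)^k \equiv M \pmod{N},$$
where k is the integer such that ed = 1 + kφ(N).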
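The three algorithms can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the primes are tiny and are passed in by the caller instead of being generated at random, there is no padding, and the function names (`rsa_keygen`, `rsa_encrypt`, `rsa_decrypt`) are hypothetical. It should not be read as a secure implementation.

```python
from math import gcd


def rsa_keygen(p, q, e):
    """Build a toy key pair from two distinct primes (primality is assumed, not checked)."""
    if p == q:
        raise ValueError("p and q must be distinct")
    N = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(N)")
    d = pow(e, -1, phi)  # modular inverse of e modulo phi(N), Python 3.8+
    return (N, e), (N, d)


def rsa_encrypt(public_key, M):
    """C = M^e mod N."""
    N, e = public_key
    return pow(M, e, N)


def rsa_decrypt(private_key, C):
    """M = C^d mod N."""
    N, d = private_key
    return pow(C, d, N)


public_key, private_key = rsa_keygen(61, 53, 17)
M = 65
C = rsa_encrypt(public_key, M)
assert rsa_decrypt(private_key, C) == M

# The key equation ed = 1 + k*phi(N) holds for an integer k.
N, d = private_key
phi = 60 * 52
k = (17 * d - 1) // phi
assert 17 * d == 1 + k * phi
```

With p = 61, q = 53, and e = 17, the modulus is N = 3233 and the private exponent is d = 2753, so the key equation reads 17 · 2753 = 1 + 15 · 3120.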

Attacks on RSA
RSA involves three parameters: a modulus N = pq with two large prime numbers p and q, a public exponent e satisfying gcd(e, (p − 1)(q − 1)) = 1, and a private exponent d such that ed ≡ 1 (mod (p − 1)(q − 1)). This modular equation can be rewritten as ed − kφ(N) = 1 and is called the key equation. Since its invention in 1978, RSA has been intensively cryptanalyzed by various methods [7,8,10]. We describe below some of these attacks. The prominent attacks on RSA can be categorized into three groups:

•
Factorization attacks. The most obvious attack on RSA is to factor its modulus N. There are several algorithms devoted to factoring integers, such as the Number Field Sieve method [76], Pollard's Rho method [77], the Elliptic Curve Method [78], the Quadratic Sieve [79], and others, with different running times as presented in Table 10. Despite the existence of such factorization algorithms, since N is the product of two balanced large prime numbers, there is no known non-quantum method that can efficiently factor an RSA modulus of 1024 bits or more. The latest record for integer factorization was obtained in 2020 by Boudot et al. [80], who factored RSA-250, an RSA modulus with 829 bits.

•
Algebraic attacks. Such attacks are based on the mathematical structure of the cryptosystem. Typically, for RSA, the algebraic attacks are related to the key equation ed − kφ(N) = 1. In 1996, Coppersmith [81] proposed a method to solve certain polynomial equations and applied it to factor an RSA modulus if half of the bits of one of the prime factors were known. Since then, various generalizations of Coppersmith's method have been proposed [7–10,82].
In 1990, Wiener [83] showed that using RSA with a small private exponent is insecure. Using the key equation ed − kφ(N) = 1 with φ(N) = (p − 1)(q − 1) = N + 1 − (p + q) ≈ N, he showed that if p and q have the same bit size, and if $d < \frac{1}{3} N^{1/4}$, then:
$$\left| \frac{e}{N} - \frac{k}{d} \right| < \frac{1}{2d^2},$$
which implies that $\frac{k}{d}$ is one of the convergents of the continued fraction expansion of $\frac{e}{N}$. The convergents of $\frac{e}{N}$ can efficiently be computed by applying the continued fraction algorithm. In 1999, Boneh and Durfee [84] improved the bound up to $d < N^{0.292}$ by applying Coppersmith's method and lattice reduction techniques.

•
Side-channel attacks. The modular exponentiation is a crucial operation in RSA and must be implemented securely to prevent side-channel attacks. The application of side-channel attacks against RSA started in 1996 with the work of Kocher [24]. Since then, numerous studies have been conducted to make side-channel attacks infeasible against RSA [85–88].
For the RSA cryptosystem, the running time of the decryption process can leak information about the private key. This method is known as a timing attack and is one of the most popular side-channel attacks. In RSA, the timing attack concerns the modular exponentiation if the square-and-multiply method is used. To compute $m^d \pmod{N}$, the square-and-multiply method consists of expanding d in binary as $d = \sum_{i=0}^{r-1} d_i 2^i$ with $d_i \in \{0, 1\}$, taking a = 1, and then, for i from r − 1 down to 0, computing $a \equiv a^2 \pmod{N}$ and, additionally, if $d_i = 1$, $a \equiv am \pmod{N}$. The drawback of this method is that the computation time is not the same when $d_i = 1$ and when $d_i = 0$. This can be exploited to guess the binary decomposition of d and then to compute d. To avoid timing attacks, there are various implementations of the modular exponentiation, such as square-always exponentiation [89].
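The leak in the square-and-multiply method can be made concrete with a short Python sketch. It records an idealized operation trace ("S" for a squaring, "M" for a multiplication) instead of real timings, which is an assumption made for illustration: an actual attacker only observes noisy measurements, and classifying those measurements is the step where an ML model would be used. The function names are hypothetical.

```python
def square_and_multiply(m, d, N):
    """Left-to-right binary exponentiation; returns (m^d mod N, operation trace)."""
    a = 1
    trace = []
    for bit in bin(d)[2:]:  # bits d_{r-1}, ..., d_0 of the exponent (d > 0 assumed)
        a = (a * a) % N
        trace.append("S")
        if bit == "1":
            a = (a * m) % N  # extra multiplication only when the bit is 1
            trace.append("M")
    return a, trace


def recover_exponent(trace):
    """Rebuild d from a noise-free trace: 'S' followed by 'M' is bit 1, a lone 'S' is bit 0."""
    bits = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


result, trace = square_and_multiply(65, 2753, 3233)
assert result == pow(65, 2753, 3233)
assert recover_exponent(trace) == 2753
```

A square-always or otherwise constant-time exponentiation removes the dependence of the trace on the bits of d, which is why such implementations are preferred.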

Applications of AI to RSA

1.
Resistance to side-channel attacks [24]: RSA is vulnerable to side-channel attacks depending on its arithmetic operations, especially during the decryption process. Numerous studies have been proposed to protect it from side-channel attacks [87,88,90]. As with side-channel attacks on AES, advanced ANNs can be used to test the RSA cryptosystem and its implementations against side-channel attacks before deployment. Some work has already been done in this direction. Ref. [57] used deep learning in side-channel attacks against a secure implementation of the RSA algorithm.

2.
Resistance to fault attacks [91]: In addition to side-channel attacks, RSA is vulnerable to fault attacks [92,93]. There are many techniques to force faults, such as variations in the clock, lasers, X-rays, voltage, etc. These attacks also lend themselves to the use of advanced ANNs to infer key bits or plaintext from the collected output resulting from the faults.

3.
Resistance to factorization attacks: The security of RSA is partly based on the difficulty of factoring its modulus N. Obviously, the bit size of N is crucial against factoring algorithms, such as the Number Field Sieve and the Elliptic Curve method. The current recommendation for the size of the RSA modulus is at least 3000 bits [94]. Some initial work has been conducted in this direction by [95,96], but more is needed in order to strengthen the choice of primes p and q.

4.
Resistance to Fermat's factoring method [97]: This method is based on solving the equation $N = x^2 - y^2 = (x - y)(x + y)$, which leads to $p = x + y$ and $q = x - y$, that is, $x = \frac{p + q}{2}$ and $y = \frac{p - q}{2}$. If the difference |p − q| is small relative to N, then y is small, and $\sqrt{N}$ is a good approximation of x. This can be exploited to retrieve x and y, and hence the prime factors p and q. The method works efficiently when $|p - q| < N^{1/4}$. AI can be used to learn x and y for different values of N and to eliminate the RSA prime factors p and q that are vulnerable to Fermat's factoring method during the generation phase. Furthermore, biases in the distribution of consecutive primes [98] can be learned using an advanced ANN to help reduce the search space in factorization and Fermat's factoring attacks.

5.
RSA with existing modulus: If $N_1 = p_1 q_1$ is the RSA modulus of two independent entities, then both entities know the prime factors and can decrypt the encrypted messages of each other. Unfortunately, AI cannot help guard against this scenario. Luckily, the likelihood that two organizations generate the same primes p and q is extremely slim, knowing that p and q are on the order of $2^{1024}$.

6.
RSA moduli with common factors: If $N_1 = p q_1$ and $N_2 = p q_2$ are two RSA moduli, then an attacker can compute $p = \gcd(N_1, N_2)$, $q_1 = \frac{N_1}{p}$, and $q_2 = \frac{N_2}{p}$. This factors the two moduli. To generate a safe RSA modulus N, testing whether N is coprime to every modulus in the list of collected moduli can be efficiently performed by using the method of Bernstein [99,100] without the need for AI.

7.
RSA moduli with primes sharing most, middle, or least significant bits: If $N_1 = p_1 q_1$ and $N_2 = p_2 q_2$ are two RSA moduli, where $p_1 \approx p_2$ share an amount of their least, middle, or most significant bits, then one can apply the method of May and Ritzenhofen [101] or the method of Faugère et al. [102] to factor $N_1$ and $N_2$. Here too, the factorization problem can be posed as an approximation function implemented using ANNs, leading to the elimination of the prime factors that share a significant number of their bits.

8.
Resistance to small private exponents: The private exponent in RSA with a modulus N = pq and a public exponent e is the integer $d \equiv e^{-1} \pmod{\varphi(N)}$. Because of the attack of Wiener [83] and the attack of Boneh-Durfee [84], it is required that d be larger than $\sqrt{N}$. Nevertheless, in many instances, one can find the value d even if d is arbitrarily large [103,104]. AI can be used to build an approximation function using advanced ANNs for solving the key equation ed − kφ(N) = 1 and using it to test the resistance of a generated RSA modulus to such attacks.

9.
Resistance to partial key exposure attacks: When a fraction of the most significant or the least significant bits of the private exponent d is guessed by an attacker, then Coppersmith's method can be used to retrieve d entirely [105–107]. An ANN approximator for learning d from its fractions and known plaintext-ciphertext pairs can be used to test any generated private key d against such attacks before using it for practical applications.
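Several of the checks in this list are classical and cheap, so any AI-based key screening would be measured against them. The Python sketch below gathers three of them under hypothetical function names: Fermat's method (item 4), the shared-factor test by a plain pairwise gcd (item 6, without the batch method of Bernstein), and Wiener's continued fraction attack (item 8). The numeric values are toy examples chosen for illustration.

```python
from math import gcd, isqrt


def fermat_factor(N, max_steps=100000):
    """Search x >= sqrt(N) with x^2 - N a perfect square; returns (p, q) or None."""
    x = isqrt(N)
    if x * x < N:
        x += 1
    for _ in range(max_steps):
        y2 = x * x - N
        y = isqrt(y2)
        if y * y == y2:
            return x + y, x - y
        x += 1
    return None


def shares_factor(N, other_moduli):
    """Return a proper common factor of N with a previously seen modulus, else None."""
    for other in other_moduli:
        g = gcd(N, other)
        if 1 < g < N:
            return g
    return None


def convergents(num, den):
    """Yield the convergents (k, d) of the continued fraction expansion of num/den."""
    h0, h1 = 0, 1  # previous two numerators
    k0, k1 = 1, 0  # previous two denominators
    while den:
        a = num // den
        num, den = den, num - a * den
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield h1, k1


def wiener_attack(e, N):
    """Return d if a convergent k/d of e/N gives a consistent phi(N), else None."""
    for k, d in convergents(e, N):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k
        s = N - phi + 1        # candidate value of p + q
        disc = s * s - 4 * N   # equals (p - q)^2 when the guess is right
        if disc >= 0:
            root = isqrt(disc)
            if root * root == disc and (s + root) % 2 == 0:
                return d
    return None


print(fermat_factor(10007 * 10009))             # primes that are too close
print(shares_factor(239 * 379, [239 * 383]))    # moduli sharing the prime 239
print(wiener_attack(17993, 90581))              # small private exponent
```

A key generator could reject any candidate for which one of these functions succeeds. The cases where no such closed-form test is known, such as items 7 and 9, are the ones where a learned approximator is proposed in the text.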

Learning with Errors
In 2005, Regev [11] introduced the Learning With Errors (LWE) problem. It has become an important computational problem in lattice-based cryptography.

Description of Learning with Errors
An instance of LWE is parameterized by positive integers m and n, a prime number q, and a probability distribution χ over $\mathbb{Z}_q$, the ring of integers modulo q. A typical example of a probability distribution is the continuous Gaussian distribution centered at $c \in \mathbb{R}^n$ with a parameter σ > 0, defined for vectors $x \in \mathbb{R}^n$. There are two main sub-problems in LWE, Search LWE and Decision LWE, which are known to be equivalent.

•
Search LWE can be summarized as follows. Let χ be a probability distribution over $\mathbb{Z}_q$. Given a matrix $A \in \mathbb{Z}_q^{m \times n}$ whose entries are chosen uniformly and a vector $b \in \mathbb{Z}_q^m$, find a vector $s \in \mathbb{Z}_q^n$ such that $As + e = b$, where $e \in \mathbb{Z}_q^m$ is a vector generated by χ.

•
Decision LWE can be summarized as follows. Given a matrix $A \in \mathbb{Z}_q^{m \times n}$ and a vector $b \in \mathbb{Z}_q^m$, determine whether $(A, b) \in L_1$ or $(A, b) \in L_2$, where $L_1$ is the set of all tuples $(A, b) \in \mathbb{Z}_q^{m \times n} \times \mathbb{Z}_q^m$ generated by the uniformly random distribution, and $L_2$ is the set of all tuples $(A, b) \in \mathbb{Z}_q^{m \times n} \times \mathbb{Z}_q^m$ such that $b = As + e$ for a uniformly distributed vector $s \in \mathbb{Z}_q^n$ and a vector $e \in \mathbb{Z}_q^m$ generated by χ.

The first cryptosystem based on LWE was presented by Regev in 2005. It is parameterized by the four parameters m, n, q, and χ, with q prime, $m \geq 4(n + 1)\log(q)$, and χ a probability distribution such that a vector $e \in \mathbb{Z}_q^m$ generated by χ satisfies $\|e\| < B < \frac{q}{4m}$ with overwhelming probability. The system is composed of three algorithms, which can be summarized as follows.

1.
Key Generation: Given the parameters m, n, q, and χ,

•
Select a matrix $A \in \mathbb{Z}_q^{m \times n}$ at random.

•
Select a secret vector $s \in \mathbb{Z}_q^n$.

•
Select a private vector $e \in \mathbb{Z}_q^m$ according to a probability distribution χ over $\mathbb{Z}_q$.

•
Compute $b \in \mathbb{Z}_q^m$ such that $b = As + e$, and publish the public key (A, b).

2.
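Encryption: Given the parameters m, n, q, χ, a public key (A, b), and a message M ∈ {0, 1},

•
Select a vector $r \in \{0, 1\}^m$ at random.

•
Compute $C_1 = r^t A$ and $C_2 = r^t b + M \lfloor q/2 \rfloor \pmod{q}$, and output the ciphertext $(C_1, C_2)$.

Here, $x^t$ represents the transpose of x.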

3.
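Decryption: Given the parameters m, n, q, χ, a ciphertext $(C_1, C_2)$, and a secret key s,

•
Compute $u = C_2 - C_1 s \pmod{q}$, reduced to the interval $(-q/2, q/2]$.

•
If $|u| < \frac{q}{4}$, then the decryption is 0, else the decryption is 1.

The correctness of the decryption depends on the size of u. Indeed,
$$u = r^t b + M \lfloor q/2 \rfloor - r^t A s = r^t e + M \lfloor q/2 \rfloor \pmod{q},$$
and if e satisfies $\|e\| < B$ with high probability, then $|r^t e| < mB < \frac{q}{4}$, and the decryption is correct.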
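The three algorithms can be exercised on toy parameters with the following Python sketch. Several choices are assumptions made for illustration and are not taken from the text: the bit M is encoded as M⌊q/2⌋, the vector r is binary, the error entries are drawn uniformly from {−B, ..., B} instead of a Gaussian χ, and the parameters are far too small to offer any security. Function names are hypothetical.

```python
import random


def keygen(n, m, q, B, rng):
    """Return ((A, b), s) with b = A s + e mod q and error entries bounded by B."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    s = [rng.randrange(q) for _ in range(n)]
    e = [rng.randint(-B, B) for _ in range(m)]  # stand-in for the distribution chi
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]
    return (A, b), s


def encrypt(public_key, M, q, rng):
    """Encrypt one bit M as (C1, C2) = (r^t A, r^t b + M * floor(q/2)) mod q."""
    A, b = public_key
    m, n = len(A), len(A[0])
    r = [rng.randrange(2) for _ in range(m)]  # binary vector r (assumption)
    C1 = [sum(r[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    C2 = (sum(r[i] * b[i] for i in range(m)) + M * (q // 2)) % q
    return C1, C2


def decrypt(s, ciphertext, q):
    """Return 0 if u = C2 - C1.s is closer to 0 than q/4 modulo q, else 1."""
    C1, C2 = ciphertext
    u = (C2 - sum(c * x for c, x in zip(C1, s))) % q
    u = min(u, q - u)  # absolute value of the centered representative
    return 0 if u < q / 4 else 1


# Toy parameters: m * B = 64 is below q / 4, so decryption never fails here.
N_DIM, M_ROWS, Q, BOUND = 8, 32, 3329, 2
rng = random.Random(2023)
public_key, secret = keygen(N_DIM, M_ROWS, Q, BOUND, rng)
for bit in (0, 1):
    assert decrypt(secret, encrypt(public_key, bit, Q, rng), Q) == bit
```

Samples (A, b) produced by `keygen` are also the natural training data for any experiment that tries to learn s or to distinguish LWE tuples from uniform ones.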

Hardness of LWE
The security of LWE is based on the hardness of various open problems in lattice reduction theory. A lattice $L \subset \mathbb{R}^n$ is a discrete subgroup of $\mathbb{R}^n$ that is generated by m linearly independent vectors $u_1, \ldots, u_m \in \mathbb{R}^n$ using integer coefficients; that is,
$$L = \left\{ \sum_{i=1}^{m} a_i u_i \; : \; a_i \in \mathbb{Z} \right\}.$$
The set $B = \{u_1, \ldots, u_m\}$ is called a basis of L, m is its rank, and n is its dimension. When n = m, the lattice is called a full-rank lattice.
Lattices have plenty of properties and hard unsolved problems that are used to build cryptosystems that are still resistant, even to quantum computers. The best-known and most used hard problems in lattice theory are the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP). Both problems use the minimum distance $\lambda_1(L)$ of L, which is defined by:
$$\lambda_1(L) = \min\{\|v\| \; : \; v \in L \setminus \{0\}\},$$
where $\|v\|$ is the Euclidean norm, defined for $v = (v_1, \ldots, v_n)$ by $\|v\| = \sqrt{v_1^2 + \cdots + v_n^2}$.

•
Shortest Vector Problem (SVP): Let L be a lattice with a basis B. Find the shortest nonzero lattice vector $u \in L$ with $\|u\| = \lambda_1(L)$.

•
Closest Vector Problem (CVP): Let L be a lattice with a basis B and $v \notin L$ be a vector. Find a lattice vector $u \in L$ such that $\|u - v\| \leq \lambda_1(L)$.

In 2005, Regev [11] showed that, when the LWE error e is generated by a Gaussian distribution with a parameter σ = αq, where $\frac{\sqrt{n}}{q} < \alpha < 1$, then solving LWE implies a quantum solution of GapSVP$_\gamma$ and SIVP$_\gamma$ over n-dimensional lattices in the worst case for $\gamma = \tilde{O}(n/\alpha)$, where $\tilde{O}(\cdot)$ hides various poly-logarithmic factors. In 2009, Peikert [108] showed that classical reductions are possible from the worst-case hardness of the GapSVP problem to the search version of LWE when the modulus q is exponential in the dimension n, especially when $q \geq 2^{n/2}$. In 2013, Brakerski et al. [109] showed that LWE is classically at least as hard as standard worst-case lattice problems with any subexponential modulus.
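To make the quantity $\lambda_1(L)$ concrete, the following Python sketch solves SVP exactly in dimension 2 with the classical Lagrange-Gauss reduction. This is only an illustration: the algorithm is specific to rank-2 lattices, the basis is assumed to consist of two linearly independent integer vectors, and the function name is hypothetical. The hardness results quoted above concern lattices of large dimension, where no comparably efficient method is known.

```python
def lagrange_gauss(u, v):
    """Reduce a basis (u, v) of a rank-2 integer lattice; the first output is a shortest vector."""
    def norm2(w):
        return w[0] * w[0] + w[1] * w[1]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        # Nearest integer to <u, v> / <u, u>, computed with integers only.
        n = norm2(u)
        mu = (2 * dot(u, v) + n) // (2 * n)
        v = (v[0] - mu * u[0], v[1] - mu * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u


# A deliberately skewed basis of the integer lattice Z^2 (determinant 1).
short, second = lagrange_gauss((5, 8), (8, 13))
lambda_1_squared = short[0] ** 2 + short[1] ** 2
print(short, second, lambda_1_squared)
```

The reduction preserves the lattice, so the absolute value of the determinant of the output basis equals that of the input basis.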

Applications of AI to LWE
The security of LWE comes from its reduction to worst-case lattice problems. Such problems are believed to be hard for both classical and quantum computers. As a consequence, there is a very limited number of attacks that can be launched against LWE. While the theoretical security of LWE depends on hard problems in lattices, its practical security depends on the parameters used in a specific instantiation. Intuitively, both the GapSVP$_\gamma$ and SIVP$_\gamma$ problems can benefit from the power of advanced ANNs as approximators. The sets of parameters that are vulnerable to solutions of GapSVP$_\gamma$ and SIVP$_\gamma$ by an advanced ANN should be discarded. To date, no such attempts can be found in the literature.

The Ascon Family of Ciphers
In this section, we describe Ascon [22], the family of authenticated encryption and hashing algorithms selected by NIST for future standardization of lightweight cryptography.

Description of Ascon
Ascon is a family of several algorithms devoted to different tasks. The family includes the Ascon-128 and Ascon-128a authenticated ciphers, the Ascon-Hash hash function, and the Ascon-Xof extendable output function. They ensure 128-bit security and use a common 320-bit permutation. All of the algorithms operate on 320-bit states. Each state S is divided into an outer part $S_r$ of size r bits and an inner part $S_c$ of size c = 320 − r bits, where r depends on the Ascon variant. Moreover, each state is divided into five 64-bit registers $x_0, \ldots, x_4$, such that: $S = S_r \| S_c = x_0 \| x_1 \| x_2 \| x_3 \| x_4$.
The authenticated encryption design of Ascon is parameterized by a key bit length of k ≤ 160, a rate r, and two integers a and b. The integers a and b give the number of times a permutation p of the set $\{0, 1\}^{320}$ is composed with itself ($p^a$ and $p^b$). The permutation p is the composition of three permutations, specifically:
$$p = p_L \circ p_S \circ p_C,$$
where $p_C$ is a constant addition, $p_S$ is a substitution layer, and $p_L$ is a linear diffusion layer (see [22] for more details).
In the encryption algorithm, $\lceil x \rceil^k$ denotes the bitstring x truncated to its least significant k bits. The ciphertext is finally composed as:
$$C = C_1 \| \cdots \| C_{t-1} \| \tilde{C}_t,$$
which is transmitted together with the tag T.

Ascon Decryption
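In Ascon, the decryption algorithm starts with the parameters already fixed in the encryption algorithm. The decryption is nearly identical to encryption. More precisely, the initialization and the processing of associated data are identical, while plaintext processing is replaced by ciphertext processing as follows.
First, the ciphertext $C \in \{0, 1\}^*$ is split into t blocks of size r, the last of which may be shorter, so that:
$$C = C_1 \| \cdots \| C_{t-1} \| \tilde{C}_t, \qquad 0 \leq |\tilde{C}_t| < r.$$
Second, for $i = 1, \ldots, t - 1$, the following calculations are performed:
$$P_i \leftarrow S_r \oplus C_i, \qquad S \leftarrow p^b(C_i \| S_c).$$
Next, the following values are computed:
$$S \leftarrow p^a\left(S \oplus (0^r \| K \| 0^{320-r-k})\right), \qquad T^* \leftarrow \lceil S \rceil^{128} \oplus \lceil K \rceil^{128}.$$
Finally, if $T = T^*$, then return $P_1 \| \cdots \| P_{t-1} \| \tilde{P}_t$, else return ⊥.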

Security of Ascon
The hardness of Ascon is tightly linked to the choice of the parameters a, b, and r and to the key size k used to perform the encryption and the decryption. To guarantee 128-bit security, the recommended parameters are listed in Table 11. Since its selection as the winner of the CAESAR competition, the Ascon family has been intensively analyzed for vulnerabilities. Section 6 of [22] presents an overview of the security analysis and resistance to attacks. All published results so far support its security and efficiency.

Applications of AI to Ascon
The Ascon family uses a 5-bit S-box [22] that needs to be immune against known attacks on the substitution layer $p_S$, such as the linear and differential attacks. Section 5.4 presents an exhaustive list of these attacks that are also applicable to the Ascon lightweight cipher.
To this end, AI can be used to test the Ascon S-box, as well as the full or partial transformation process reflected by small a and b values against all these attacks, in order to propose the set of parameters that nullify/undermine these attacks.
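Because the Ascon S-box has only 32 inputs, its linear behavior can be tabulated exhaustively, which provides exact reference values for any ANN-based test. The Python sketch below computes a linear approximation table in the convention LAT[a][b] = #{x : a·x = b·S(x)} − 16. The lookup table is transcribed from the Ascon specification [22] and should be checked against it before being relied upon; the helper names are hypothetical.

```python
# 5-bit Ascon S-box as a lookup table (transcribed from the specification [22]).
ASCON_SBOX = [
    0x04, 0x0B, 0x1F, 0x14, 0x1A, 0x15, 0x09, 0x02,
    0x1B, 0x05, 0x08, 0x12, 0x1D, 0x03, 0x06, 0x1C,
    0x1E, 0x13, 0x07, 0x0E, 0x00, 0x0D, 0x11, 0x18,
    0x10, 0x0C, 0x01, 0x19, 0x16, 0x0A, 0x0F, 0x17,
]


def parity(x):
    """Parity of the bits of x, i.e. the inner product of a masked value over F_2."""
    return bin(x).count("1") & 1


def lat(sbox):
    """lat[a][b] = #{x : a.x == b.sbox[x]} - len(sbox)/2 for input mask a and output mask b."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            agree = sum(1 for x in range(n) if parity(a & x) == parity(b & sbox[x]))
            table[a][b] = agree - n // 2
    return table


LAT = lat(ASCON_SBOX)
# Largest absolute entry over nonzero output masks; dividing by 32 gives the bias.
max_abs = max(abs(LAT[a][b]) for a in range(32) for b in range(1, 32))
print("largest absolute LAT entry:", max_abs)
```

The same function applies to a round-reduced permutation restricted to a few bits, at a cost that grows exponentially with the number of bits, which is the regime where an approximating model becomes attractive.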

Conclusions
Artificial intelligence (AI), and in particular deep learning using sophisticated artificial neural network (ANN) architectures, is developing rapidly and gaining practical use in all sectors of daily life. In this paper, we presented areas where the use of AI can help enhance the security of cryptographic systems. We particularly focused on four prominent systems in modern cryptography, namely, the Advanced Encryption Standard (AES), the Rivest-Shamir-Adleman (RSA) scheme, the Learning With Errors (LWE) scheme, and the lightweight Ascon cipher family. We reviewed their security and pinpointed layers, functions, and areas that could potentially benefit from cryptanalysis that uses advanced ANN architectures. This said, depending on the function to approximate (S-box and vectorial Boolean functions for AES, S-box and permutations for Ascon, Diophantine equations and factorization for RSA, lattice problems for LWE), ANNs may not necessarily outperform other machine learning (ML) techniques. For instance, LWE introduces vectors of errors similar to noise in the encryption process, which may hinder the performance of ANNs, since ANNs are known to suffer from noisy training data. Experimentation is needed to confirm this hypothesis. Furthermore, sophisticated ANN architectures can have the tendency to overfit the presented training data, which may lead to errors for unseen encrypted data or plaintext.
Finally, beyond prediction, ANNs do not provide any insights into the structure of the function being approximated, which may not help in fine-tuning the function/layer being approximated (see [110] for Explainable AI).For further research, we envisage experimenting with different ANN architectures and building an ANN generator that automatically generates an adversary ANN from the specification of the S-box or the vectorial Boolean function to help cryptosystem designers quickly test the strength of the cryptographic functions and substitution layers.

Figure 1. Basic unit of artificial neural networks: the perceptron.

Figure 3. Learning the nonlinear line that separates the green dots from the red dots using 2 hidden layers with 20 neurons each. Figure generated using [34].
where $a \cdot b$ is the inner product of the vectors a and b. The underlying vectorial Boolean function of AES can be modeled and tested using advanced ANNs.

8.
Balancedness [71]: A vectorial Boolean function $F : \mathbb{F}_2^n \to \mathbb{F}_2^m$ is balanced if every value of $\mathbb{F}_2^m$ is the image of exactly $2^{n-m}$ values from $\mathbb{F}_2^n$. The task of verifying balancedness can be processed by studying the vectorial Boolean function that defines the S-box of AES.
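Then, two values $\tilde{P}_t$ and $S_r$ are computed as:
$$\tilde{P}_t \leftarrow \lfloor S_r \rfloor_{|\tilde{C}_t|} \oplus \tilde{C}_t, \qquad S_r \leftarrow S_r \oplus \left(\tilde{P}_t \| 1 \| 0^{r - |\tilde{C}_t| - 1}\right).$$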

Table 1. Representation of the block and the subkey with bytes.
S_0   S_1   S_2   S_3
S_4   S_5   S_6   S_7
S_8   S_9   S_10  S_11
S_12  S_13  S_14  S_15

Table 3. SubBytes operation yielding a new state vector. The transformation T is applied to the state (S_0, ..., S_15) and yields the new state.

•
ShiftRows: In this transformation, the bytes of the first row in the state array remain unchanged, and the bytes of rows 2, 3, and 4 are cyclically shifted left by 1, 2, and 3 positions, respectively (see Table 4).

Table 5. MixColumns operation yielding a new state.

Table 6. InvShiftRows transforms the state on the left to the state on the right.
S_0   S_1   S_2   S_3          S_0   S_1   S_2   S_3
S_4   S_5   S_6   S_7    →     S_7   S_4   S_5   S_6
S_8   S_9   S_10  S_11         S_10  S_11  S_8   S_9
S_12  S_13  S_14  S_15         S_13  S_14  S_15  S_12

Table 7. InvMixColumns multiplies the state (S_0, ..., S_15) with the given matrix.

Table 8. Distribution of the linear probability values of AES.
In 1991, Biham and Shamir [63] proposed differential cryptanalysis and applied it to DES. Differential cryptanalysis is a chosen-plaintext attack and works with pairs of plaintexts $(P_1, P_2)$ with a fixed difference $a = P_1 + P_2$ and their corresponding ciphertexts $(C_1, C_2)$. The goal of differential cryptanalysis is to study the behavior of the difference $b = C_1 + C_2$. For a vectorial Boolean function $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$, differential cryptanalysis is studied via the difference distribution table (DDT), which is defined for $(a, b) \in \mathbb{F}_2^n \times \mathbb{F}_2^n$ by:
$$\mathrm{DDT}(a, b) = \#\{x \in \mathbb{F}_2^n \; : \; F(x) + F(x + a) = b\}.$$

Table 9. Distribution of the differential probability values of AES.

5.
Truncated differentials [66]: This variant of the differential attack was presented by Knudsen in 1994. This task can be processed by adapting the difference distribution table of the S-box under the truncated differentials criteria. As with differential attacks, ANNs, as excellent function approximators, can be used to model the truncated differential properties of S-boxes.

6.
Resistance to boomerang attacks [67]: The task of testing the boomerang cryptanalysis can be accomplished by studying the boomerang connectivity table (BCT) as defined by Cid et al. in 2018 [68]. The BCT of an invertible vectorial function $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$ is defined at the entry $(a, b) \in \mathbb{F}_2^n \times \mathbb{F}_2^n$ by:
$$\mathrm{BCT}(a, b) = \#\{x \in \mathbb{F}_2^n \; : \; F^{-1}(F(x) + b) + F^{-1}(F(x + a) + b) = a\}.$$

Table 10. Algorithms to factor an integer n with running times.
The security of LWE is based on two sub-problems in lattices: the Decisional Approximate SVP (GapSVP$_\gamma$) and the Approximate Shortest Independent Vectors Problem (SIVP$_\gamma$), where γ ≥ 1 is a positive real parameter.

•
Decisional Approximate SVP (GapSVP$_\gamma$): Let L be a lattice with a basis B and r > 0 be a real number. Decide whether $\lambda_1(L) \leq r$ or $\lambda_1(L) > \gamma r$.

•
Approximate Shortest Independent Vectors Problem (SIVP$_\gamma$): Let L be a full-rank lattice with dimension n and a basis B. Find n linearly independent vectors $v_i \in L$ such that $\|v_i\| \leq \gamma \lambda_n(L)$, where $\lambda_n(L)$ is the n-th successive minimum of the lattice.