Algebraic Cryptanalysis with MRHS Equations

Abstract: In this work, we survey the existing research in the area of algebraic cryptanalysis based on Multiple Right-Hand Sides (MRHS) equations (MRHS cryptanalysis). An MRHS equation is a formal inclusion that contains linear combinations of variables on the left-hand side, and a set of potential values for these combinations on the right-hand side. We describe MRHS equation systems in detail, including the evolution of this representation. Then we provide an overview of the methods that can be used to solve MRHS equation systems. Finally, we explore the use of MRHS equation systems in algebraic cryptanalysis and survey existing experimental results.


Introduction
The basic concept of algebraic cryptanalysis was already introduced in the seminal work of Shannon [1]. Shannon introduces a method of confusion as a way to prevent statistical cryptanalysis of ciphers. He notes that a set of statistics observed from a secret communication is connected to some coordinates of the key space through some algebraic equations. The ultimate goal of algebraic cryptanalysis is to solve this set of equations. On the other hand, good ciphers are designed in such a way, that this task should be very difficult. A summary of algebraic cryptanalysis can be found in [2]. Methods to solve algebraic equations in cryptanalysis are also summarized in [3].
The basic principle of algebraic cryptanalysis is to represent a cryptanalytic problem in an abstract setting, and then to solve this representation with generic tools. In general, each problem can be represented as a set of non-linear equations over finite fields. Theoretically, non-linear equation systems over finite fields can be solved by using general Gröbner bases techniques and related solvers, such as [4]. However, no algorithm is known that can solve most non-linear systems in practice. Specific techniques, such as XL [5], and XSL [6] were developed for solving problems related to algebraic cryptanalysis [7][8][9][10].
Another approach to algebraic cryptanalysis is to encode a cryptographic problem as a hard instance of the satisfiability problem [11], and then to use existing SAT solvers to solve this problem instance [9][12][13][14][15]. In our experience, SAT solvers can be employed in large-scale distributed algebraic attacks [16] targeting specific weak keys in large sets of data encoded as a SAT problem.
Algebraic cryptanalysis can support and complement other types of cryptanalytic techniques, such as using algebraic techniques in differential cryptanalysis [17][18][19][20]. Algebraic side-channel attacks [21,22] use algebraic techniques to complement information leaked from the cipher through side-channels, or through errors [23].
In this paper, we focus on a different representation of algebraic problems that is suitable for algebraic cryptanalysis. This representation is based on the so-called Multiple Right-Hand Sides (MRHS) equation systems [24]. The MRHS representation can separate the representation of non-linear (confusion-based) and linear (diffusion-based) components of the cipher, and thus represent problems of algebraic cryptanalysis in a way similar to how the ciphers are designed. MRHS representation focuses on the main potential weakness of the symmetric cipher design: unlike random functions (that we try to emulate), practical ciphers must be efficiently implemented in hardware (and software) with a limited number of components. Thus the MRHS representation of a practical cipher is relatively small and compact in comparison to a representation of a random function. In general, the problem of whether a random (polynomially sized) MRHS equation system has a solution is NP-complete [25]. In practice, experiments [26] show that equations derived from (round reduced) ciphers can be in some instances solved faster than with an exhaustive search through the key space.
We describe MRHS equation systems and survey their evolution in Section 2. Section 3 is then devoted to surveying methods that can be used to solve MRHS equation systems. The aim of Section 4 is to introduce the techniques used in MRHS cryptanalysis, connecting the cryptanalytic problems and the MRHS representation. In Section 5, we specifically survey the existing results of MRHS cryptanalysis.

What Is an MRHS Equation System?
MRHS equation systems are related to the ideas of Zakrevskij [27]. Systems of Boolean equations can be sparse in the following sense: each Boolean equation in the system depends on only a small subset of the variables. Such systems can be solved by assigning values for particular variables and removing some potential solutions by observing local dependencies (individual possible values of the active variables in individual sparse Boolean equations).
A new representation of sparse Boolean equations related to the algebraic cryptanalysis was presented by Raddum and Semaev in [28]. The equations were represented by "symbols" containing lists of active variables and their possible values. The solution of the system was done by manipulating such symbols (Agreeing and Gluing). The representation of sparse equation systems can be generalized from Boolean equations to equations over any finite field [29].
Further generalization comes from replacing individual active variables with linear combinations of variables, coining the term Multiple Right-Hand Sides (linear) equation systems [24,30]. The original definition preserves the symbol notation, with a list of possible assignments of values for (active) linear combinations of variables.
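In this article, we use a newer definition of MRHS equation systems introduced in [31] (equivalent to the original one). We will use the following symbolic notation:
• Symbol $\mathbb{F}$ denotes a finite field, $\mathbb{Z}$ denotes the ring of integers, and $\mathbb{N}$ denotes the natural numbers.
• We use row vectors, denoted by bold lowercase letters: $\mathbf{v} \in \mathbb{F}^n$.
• Matrices are denoted by bold uppercase letters: $\mathbf{M} \in \mathbb{F}^{n \times k}$.
• Standard sets are denoted by uppercase slanted letters: $S \subset \mathbb{F}^n$. The size of the set $S$ is denoted by $|S|$. When $S$ is a set of vectors, $\mathbf{S}$ denotes a matrix with $|S|$ rows, where each row is in $S$. By $S \cdot \mathbf{A}$ we denote the set of vectors $\{\mathbf{v} \cdot \mathbf{A};\ \mathbf{v} \in S\}$.

Definition 1. Let $\mathbb{F}$ be a finite field, $n, l \in \mathbb{N}$, $\mathbf{M} \in \mathbb{F}^{n \times l}$, and $S \subset \mathbb{F}^{l}$. An MRHS equation is a formal inclusion
$$\mathbf{x} \cdot \mathbf{M} \in S.$$
A vector $\mathbf{x} \in \mathbb{F}^n$ is a solution of the MRHS equation if and only if $\mathbf{x} \cdot \mathbf{M} \in S$.

The set of solutions of an MRHS equation is the union of the solution sets of the linear systems $\mathbf{x} \cdot \mathbf{M} = \mathbf{v}$, for each $\mathbf{v} \in S$. We can see that if $|S| = 1$, an MRHS equation corresponds to a standard system of linear equations.

Definition 2. Let $\mathbb{F}$ be a finite field, $n, m \in \mathbb{N}$. For each $i \in \{1, 2, \dots, m\}$ let $l_i \in \mathbb{N}$, $\mathbf{M}_i \in \mathbb{F}^{n \times l_i}$, and $S_i \subset \mathbb{F}^{l_i}$. An MRHS (equation) system is a set of MRHS equations
$$\mathbf{x} \cdot \mathbf{M}_i \in S_i, \quad i = 1, 2, \dots, m.$$
A vector $\mathbf{x} \in \mathbb{F}^n$ is a solution of the system if it is a solution of each of its equations. We can write an MRHS system in a joint form
$$\mathbf{x} \cdot (\mathbf{M}_1 | \mathbf{M}_2 | \cdots | \mathbf{M}_m) \in S_1 \times S_2 \times \cdots \times S_m.$$

We can see that the joint form of an MRHS system is an MRHS equation, given by the left-hand side matrix $\mathbf{M} = (\mathbf{M}_1 | \mathbf{M}_2 | \cdots | \mathbf{M}_m)$ and the right-hand side set $S = S_1 \times S_2 \times \cdots \times S_m$. The dimension of $\mathbf{M}$ is $n \times \sum_{i=1}^{m} l_i$. The size of the set $S$ grows quickly with $m$: $|S| = \prod_{i=1}^{m} |S_i|$. To store the joint form efficiently, we typically do not evaluate the Cartesian product and store only the individual sets $S_i$.

Definition 3. Let $\mathrm{poly}(n)$ denote any polynomial function in $n$. We say that a family of MRHS systems parametrized by $n$ has a polynomial representation if for each $n$: $m < \mathrm{poly}(n)$, and for each $i \in \{1, 2, \dots, m\}$ we have $l_i < \mathrm{poly}(n)$ and $|S_i| < \mathrm{poly}(n)$.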
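As a small illustration of Definition 2, consider a system over $\mathbb{F}_2$ with $n = 3$ unknowns and $m = 2$ equations. The concrete matrices and right-hand side sets below are toy values chosen for illustration only; they are not taken from the cited works.

```latex
% Two MRHS equations in the unknown row vector x = (x_1, x_2, x_3):
\[
\mathbf{x}\cdot
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
\in \{(0,0),\,(1,1)\},
\qquad
\mathbf{x}\cdot
\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}
\in \{(1,1),\,(0,1)\}.
\]
% Joint form: a single MRHS equation with |S| = |S_1| \cdot |S_2| = 4 right-hand sides.
\[
\mathbf{x}\cdot
\left(\begin{array}{cc|cc}
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{array}\right)
\in \{(0,0),\,(1,1)\} \times \{(1,1),\,(0,1)\}.
\]
```

In this toy system, $\mathbf{x} = (0,0,1)$ gives $\mathbf{x}\cdot\mathbf{M} = (0,0,1,1)$ and $\mathbf{x} = (1,1,0)$ gives $\mathbf{x}\cdot\mathbf{M} = (1,1,1,1)$. Both vectors lie in $S_1 \times S_2$, and no other $\mathbf{x} \in \mathbb{F}_2^3$ satisfies both equations.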

Example 1.
Let us construct a family of MRHS systems with $n \geq 1$ variables and $m = 1$ MRHS equation ($m < n + 1$ for any $n$). Let $S_1 = \{\mathbf{0}, \mathbf{c}\}$, with $\mathbf{c}$ some randomly selected constant ($|S_1| < n + 2$). This family has a polynomial representation if we restrict the dimension $l_1$ by some polynomial function of $n$. A counterexample is selecting all linear combinations of $n$ input variables as columns of $\mathbf{M}_1$, which requires $l_1 = 2^n > \mathrm{poly}(n)$ for any polynomial function $\mathrm{poly}(n)$ (and sufficiently large $n$).
We can verify in polynomial time whether $\mathbf{x}$ is a solution of an MRHS system from a family with polynomial representation. Firstly, we compute $\mathbf{u} = \mathbf{x} \cdot \mathbf{M}$. Then we verify the right-hand sides with $m$ tests $\mathbf{u}_i \in S_i$, where $\mathbf{u}_i$ represents the projection of $\mathbf{u}$ to the coordinates corresponding to $S_i$. The MRHS problem is a decision problem: given an MRHS system, is there any solution $\mathbf{x}$ of this MRHS system? In [25] we prove that this problem is NP-complete for a family of MRHS systems with polynomial representation.
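The verification procedure described above can be sketched in Python for systems over $\mathbb{F}_2$. This is a minimal sketch under stated assumptions: the data layout (tuples for vectors, lists of rows for matrices, Python sets for the $S_i$), the helper names `vec_mat` and `is_solution`, and the toy system are all illustrative choices, not part of the cited works.

```python
from typing import List, Set, Tuple

Vector = Tuple[int, ...]
Matrix = List[List[int]]  # n rows, l_i columns, entries in {0, 1}
Equation = Tuple[Matrix, Set[Vector]]  # one MRHS equation (M_i, S_i)


def vec_mat(x, M: Matrix) -> Vector:
    """Row vector times matrix over GF(2)."""
    cols = len(M[0])
    return tuple(sum(x[r] * M[r][c] for r in range(len(M))) % 2
                 for c in range(cols))


def is_solution(x, system: List[Equation]) -> bool:
    """Check u_i = x * M_i against S_i for every equation (m membership tests)."""
    return all(vec_mat(x, M) in S for M, S in system)


# Toy system with n = 3 unknowns and m = 2 equations (illustrative values).
M1 = [[1, 0], [0, 1], [0, 0]]   # left-hand sides (x1, x2)
S1 = {(0, 0), (1, 1)}
M2 = [[0, 1], [1, 0], [1, 1]]   # left-hand sides (x2 + x3, x1 + x3)
S2 = {(1, 1), (0, 1)}
system = [(M1, S1), (M2, S2)]
```

Storing each $S_i$ as a hash set makes each membership test cheap, and the Cartesian product $S_1 \times \cdots \times S_m$ is never materialized.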
A further evolution of MRHS equation systems is the Compressed Right-Hand Sides (CRHS) equation [32]. In this form, the right-hand side set $S$ is represented by a binary decision diagram (BDD). This form can represent large sets of right-hand side vectors efficiently, but requires new methods to solve such systems, such as [33]. Each MRHS equation system can be rewritten as a CRHS equation. The opposite direction is also possible, but given a general CRHS equation, the number of right-hand sides in the MRHS representation can grow too quickly to be efficiently represented. In the rest of the article, we focus mostly on MRHS equations, but we also survey cryptanalytic results obtained with the CRHS representation.

Algorithms for Solving MRHS Equations
The MRHS problem in the decision setting asks whether there exists some $\mathbf{x} \in \mathbb{F}^n$ that is a solution of the MRHS equation system $\mathbf{x}\mathbf{M} \in S_1 \times S_2 \times \cdots \times S_m$. In practical algebraic cryptanalysis, we typically know that some solution of the MRHS system exists. Instead, we focus on computing any solution of the system (or all of the existing solutions).
Multiple algorithms can solve MRHS systems. The basic algorithm is the exhaustive search: iterate through each element of $\mathbb{F}^n$, and verify whether it is a solution of the system. This gives us an upper bound on the complexity of the MRHS problem: $|\mathbb{F}|^n$ iterations, and in each iteration we do one vector-matrix multiplication (on the left-hand side) plus verification of the set membership (on the right-hand side). The iteration can be improved by classic techniques such as using a Gray code for element enumeration.
A specific issue arises in algebraic cryptanalysis. Suppose that some MRHS system with $n$ unknowns over $\mathbb{F}_2$ is derived from a cryptanalytic problem with $k < n$ unknown key bits. Then in the cryptanalytic setting, the complexity of the search should be bounded by $2^k$ key verification operations, instead of $2^n$, the complexity of the exhaustive MRHS solver. Thus, a generic MRHS solver based on an exhaustive search seems unsuitable for algebraic cryptanalysis.
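The exhaustive search with Gray-code enumeration can be sketched as follows for systems over $\mathbb{F}_2$. The function name `solve_exhaustive`, the data layout, and the toy system are assumptions made for illustration. Consecutive Gray codewords differ in one bit, so the left-hand side $\mathbf{x}\cdot\mathbf{M}_i$ is updated by adding a single row of $\mathbf{M}_i$ instead of recomputing the full product.

```python
def solve_exhaustive(system, n):
    """Return all solutions x in GF(2)^n, visiting candidates in Gray-code order.

    `system` is a list of pairs (M_i, S_i): M_i is a list of n rows of 0/1
    entries, S_i is a set of 0/1 tuples.
    """
    x = [0] * n
    # u[i] holds the current left-hand side x * M_i (zero vector for x = 0).
    u = [[0] * len(M[0]) for M, _ in system]
    solutions = []
    for step in range(1 << n):
        if all(tuple(ui) in S for ui, (_, S) in zip(u, system)):
            solutions.append(tuple(x))
        # The next Gray codeword flips the bit indexed by the number of
        # trailing zeros of step + 1.
        j = ((step + 1) & -(step + 1)).bit_length() - 1
        if j < n:  # j == n only after the last candidate
            x[j] ^= 1
            for ui, (M, _) in zip(u, system):
                for c in range(len(ui)):
                    ui[c] ^= M[j][c]  # add row j of M_i
    return solutions


# Toy system with n = 3 unknowns (illustrative values).
M1 = [[1, 0], [0, 1], [0, 0]]
S1 = {(0, 0), (1, 1)}
M2 = [[0, 1], [1, 0], [1, 1]]
S2 = {(1, 1), (0, 1)}
toy_system = [(M1, S1), (M2, S2)]
```

The loop still performs $2^n$ iterations, so this only reduces the cost per candidate; it does not change the exponential bound discussed above.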

Solving MRHS Systems with Linear Algebra
Similarly to standard systems of linear equations, we can perform some operations on the MRHS system that do not change the (size of the) set of solutions of the MRHS system:
Note that it would be possible to define a similar operation with a general invertible matrix $\mathbf{B}$. However, in such a case we would have to evaluate Cartesian products of the sets $S_i$, and thus in general the equivalent system would lose the polynomial representation.
The main difference between the MRHS systems is that $S_{m-1}$ requires more space to explicitly list all its vectors, in comparison to the original MRHS system. In general, we can join any pair of RHS sets (computing $S_i \times S_j$).
• RHS compression. Let $\mathrm{rank}(\mathbf{M}_i) < l_i$ for some $i$. We can use column operations with matrix $\mathbf{B}_i$ to change the first column of $\mathbf{M}_i$ to all zeroes. The vector $\mathbf{x}$ is a solution of $\mathbf{x} \cdot \mathbf{M}_i \in S_i$ if and only if it is a solution of $\mathbf{x} \cdot (\mathbf{M}_i \mathbf{B}_i) \in S_i \cdot \mathbf{B}_i$. This means we can remove all vectors from $S_i \cdot \mathbf{B}_i$ that have a non-zero first coordinate after the column operation.
We can transform the joint form of the MRHS equation to a single compact MRHS equation by a series of RHS joining and compression operations. This is the basis of the original Gluing algorithm [28,30] proposed to solve MRHS equation systems. Note that the sequence of operations performed by the Gluing algorithm can lose the polynomial representation property.
Another solving algorithm that uses linear algebra was proposed in [34]. This algorithm uses the reduced row echelon form of the joint matrix to efficiently expand and test partial solutions of the system.

Solving MRHS Systems with Local Reduction
It was already observed in the seminal works [28,30] that (sparse) MRHS equation systems can be solved more efficiently than with exhaustive search. They proposed the methods of Agreeing and Gluing to solve the MRHS system. The main idea of Agreeing is to use "local information" obtained from individual MRHS equations in the system to reduce the size of individual right-hand side sets. Let us suppose that we have two MRHS equations $\mathbf{x} \cdot \mathbf{M}_i \in S_i$ and $\mathbf{x} \cdot \mathbf{M}_j \in S_j$ within the target MRHS system, with linearly dependent columns in $(\mathbf{M}_i | \mathbf{M}_j)$. There exists a matrix $\mathbf{U}$ such that $(\mathbf{M}_i | \mathbf{M}_j) \cdot \mathbf{U} = \mathbf{0}$. Every solution $\mathbf{x}$ then satisfies $\mathbf{x} \cdot (\mathbf{M}_i | \mathbf{M}_j) \cdot \mathbf{U} = \mathbf{0}$, so on the right-hand side we can remove each $\mathbf{v} \in S_i \times S_j$ for which $\mathbf{v} \cdot \mathbf{U} \neq \mathbf{0}$. The Agreeing method verifies the parts of $\mathbf{v} = (\mathbf{v}_i, \mathbf{v}_j)$ separately: we remove $\mathbf{v}_i$ from $S_i$ if there is no $\mathbf{v}_j \in S_j$ such that $(\mathbf{v}_i, \mathbf{v}_j) \cdot \mathbf{U} = \mathbf{0}$ (and similarly for $\mathbf{v}_j$ and $S_j$).
In [35], it was observed that the Agreeing algorithm can be translated into the language of electric wires and switches, and can be efficiently implemented in specialized hardware. In [36], a special-purpose architecture to implement an algebraic attack in hardware (called PET SNAKE) was proposed. The proposed use of ASICs seems to enable significant performance gains over a software implementation of an MRHS solver based on Agreeing.
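One Agreeing step between two equations over $\mathbb{F}_2$ can be sketched as follows. This is a simplified illustration, not the implementation from [28,30]: the function names `column_dependencies` and `agree` are hypothetical, the columns of $\mathbf{U}$ are computed by plain Gaussian elimination on the columns of $(\mathbf{M}_i | \mathbf{M}_j)$, and the pairs in $S_i \times S_j$ are enumerated directly.

```python
def column_dependencies(M):
    """Basis of all 0/1 vectors w with M * w = 0 over GF(2) (columns of U)."""
    n, l = len(M), len(M[0])
    basis = {}  # pivot row -> (reduced column as bitmask, combination of columns)
    deps = []
    for c in range(l):
        vec = sum(M[r][c] << r for r in range(n))
        comb = 1 << c
        while vec:
            p = vec.bit_length() - 1
            if p not in basis:
                basis[p] = (vec, comb)
                break
            bvec, bcomb = basis[p]
            vec ^= bvec
            comb ^= bcomb
        if vec == 0:  # column c depends on the previous columns
            deps.append([(comb >> k) & 1 for k in range(l)])
    return deps


def agree(Mi, Si, Mj, Sj):
    """Keep v_i (v_j) only if some partner v_j (v_i) gives (v_i, v_j) * U = 0."""
    joint = [ri + rj for ri, rj in zip(Mi, Mj)]  # (M_i | M_j)
    U = column_dependencies(joint)

    def consistent(v):
        return all(sum(a * b for a, b in zip(v, w)) % 2 == 0 for w in U)

    pairs = [(vi, vj) for vi in Si for vj in Sj if consistent(vi + vj)]
    return {vi for vi, _ in pairs}, {vj for _, vj in pairs}


# Toy equations sharing the variable x2 (illustrative values).
Mi = [[1, 0], [0, 1], [0, 0]]          # left-hand sides (x1, x2)
Si = {(0, 0), (0, 1), (1, 1)}
Mj = [[0, 0], [1, 0], [0, 1]]          # left-hand sides (x2, x3)
Sj = {(1, 0), (1, 1)}
new_Si, new_Sj = agree(Mi, Si, Mj, Sj)
```

In this toy instance, $S_j$ forces $x_2 = 1$, so the vector $(0,0)$ is removed from $S_i$ while $S_j$ stays unchanged.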
The Agreeing method can be generalized to different polynomial time "local reduction" methods [26], such as the Method of Syllogisms, or the Relinearisation method. However, MRHS systems in general cannot be solved by just these local reduction methods. When considering random sparse Boolean equations there is an observable phase transition between systems that can be solved by local reduction (easy problems) and systems that cannot be solved directly (hard problems) [37]. The main strategy in utilizing the local reduction is to combine it with Guessing. This means that we explicitly try to substitute some value (either of some variable or some combination of variables), and try to verify (with Agreeing) whether the reduced system still has a solution. This leads to a class of algorithms based on recursive search similar to DPLL algorithm [38] used in SAT solvers. Similarly to DPLL, additional information from guess and verify can be learned and used to improve further guessing [39].
Alternatively, local reduction methods can be combined with the Gluing method, which means explicitly joining individual MRHS equations to find all solutions of the MRHS system. Local reduction is used to keep the size of the intermediate systems as low as possible. An analysis of the improved Agreeing-Gluing algorithm can be found in [40]. The Gluing algorithm complexity depends on the order of MRHS equations used by individual Gluing operations. This gives rise to a new combinatorial MaxMinMax problem [41][42][43]. The solution to this problem can provide an optimal Gluing strategy. It is an open problem whether the MaxMinMax also applies to MRHS solver based on linear algebra [34], which has a complexity that also depends on the order of the MRHS equations within the joint form of the MRHS system.

Solving MRHS Systems in Dual Code
A new method to solve MRHS equation systems and their connection to group factorization was studied in [44]. The method is essentially a generalization of Agreeing to the whole joint matrix of the MRHS system, for MRHS systems over the binary field $\mathbb{F}_2$. We can observe that on the left-hand side, the possible vectors $\mathbf{x}\mathbf{M}$ form a binary linear code $C$ with parity check matrix $\mathbf{H}$. Thus, valid solutions $\mathbf{x}$ correspond exactly to those right-hand side vectors $\mathbf{v} \in S$ that are also codewords of $C$, i.e., $\mathbf{v} \cdot \mathbf{H}^T = \mathbf{0}$. The problem of solving an MRHS system can be reduced to solving a group factorization problem in the form $\sum_{i=1}^{m} \mathbf{v}_i \cdot \mathbf{H}_i^T = \mathbf{0}$ with $\mathbf{v}_i \in S_i$, where $\mathbf{H} = (\mathbf{H}_1 | \mathbf{H}_2 | \cdots | \mathbf{H}_m)$ is split into blocks matching $\mathbf{M} = (\mathbf{M}_1 | \mathbf{M}_2 | \cdots | \mathbf{M}_m)$. In [31], we have followed this reduction to change the MRHS problem into a specific instance of a decoding problem. We also explore how the complexity of solving Multivariate Quadratic (MQ) and MRHS systems is connected to the complexity of the decoding problem. In [45] we show how the transformation to the decoding problem can be used to estimate the upper bounds on the complexity of algebraic attacks on ciphers with low multiplicative complexity (low number of AND gates).
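The dual-code view can be illustrated with a small Python sketch over $\mathbb{F}_2$. It is an assumption-laden illustration, not the algorithm of [44] or [31]: the names `parity_checks` and `solvable_via_syndromes` are hypothetical, and the search simply accumulates all reachable partial syndromes block by block, which can take time exponential in the number of parity checks. The sketch only decides whether some $\mathbf{v} \in S_1 \times \cdots \times S_m$ has zero syndrome.

```python
def parity_checks(M):
    """Rows h of a parity check matrix H of the code {x*M}: all h with M*h^T = 0."""
    n, l = len(M), len(M[0])
    basis, checks = {}, []
    for c in range(l):
        vec = sum(M[r][c] << r for r in range(n))  # column c as a bitmask
        comb = 1 << c
        while vec:
            p = vec.bit_length() - 1
            if p not in basis:
                basis[p] = (vec, comb)
                break
            vec ^= basis[p][0]
            comb ^= basis[p][1]
        if vec == 0:
            checks.append([(comb >> k) & 1 for k in range(l)])
    return checks


def solvable_via_syndromes(system):
    """True iff some v in S_1 x ... x S_m satisfies v * H^T = 0."""
    n = len(system[0][0])
    joint = [sum((M[r] for M, _ in system), []) for r in range(n)]
    H = parity_checks(joint)
    reachable = {tuple(0 for _ in H)}  # partial syndromes, start with zero
    offset = 0
    for M, S in system:
        l = len(M[0])
        # Syndrome contributions v_i * H_i^T of block i.
        T = {tuple(sum(v[k] * h[offset + k] for k in range(l)) % 2 for h in H)
             for v in S}
        reachable = {tuple(a ^ b for a, b in zip(s, t))
                     for s in reachable for t in T}
        offset += l
    return tuple(0 for _ in H) in reachable


# Toy system with n = 3 unknowns (illustrative values).
M1 = [[1, 0], [0, 1], [0, 0]]
S1 = {(0, 0), (1, 1)}
M2 = [[0, 1], [1, 0], [1, 1]]
S2 = {(1, 1), (0, 1)}
toy_system = [(M1, S1), (M2, S2)]
```

For the toy system the joint matrix has a single parity check $(1,1,1,1)$, so the test reduces to finding $\mathbf{v}_1 \in S_1$ and $\mathbf{v}_2 \in S_2$ whose combined Hamming weight is even.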

Solving MRHS Systems with Heuristic Search
In [46] we have introduced a new method for solving sparse random MRHS systems based on bit-flipping. This method starts with a random $\mathbf{x}$. In sufficiently sparse systems, each bit of the unknown $\mathbf{x}$ only influences a limited number of individual MRHS equations. The bit-flipping method is based on marking those bits of $\mathbf{x}$ that can change unsatisfied MRHS equations ($\mathbf{x} \cdot \mathbf{M}_i \notin S_i$) to a satisfied state. We then change (some of) the marked bits, gradually improving the number of satisfied MRHS equations (until the system is solved, or we end in a cycle and need to restart the method). Experiments show that this method can solve MRHS systems more efficiently than exhaustive search, but its complexity is significantly influenced by the density of the left-hand side joint matrix $\mathbf{M}$.
An alternative formulation of the bit-flipping approach is based on the hill-climbing algorithm [46,47]. In this case we again start from a random $\mathbf{x}$, and choose some neighboring $\mathbf{x} \oplus \mathbf{e}_i$, where $w_H(\mathbf{e}_i) = 1$. With the greedy approach, we try to maximize the new number of satisfied MRHS equations (or restart, if this is not possible). Experiments show that the hill-climbing-based method has a better chance of success than bit-flipping, but the individual steps of the algorithm are slower (as we need to explore all neighbors of $\mathbf{x}$).
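A greedy hill-climbing search of this kind can be sketched as follows. This is a minimal sketch, not the implementation evaluated in [46,47]: the function names, the restart limit, the fixed random seed, and the toy system are assumptions chosen for illustration, and the score is simply the number of satisfied equations.

```python
import random


def vec_mat(x, M):
    """Row vector times matrix over GF(2)."""
    return tuple(sum(x[r] * M[r][c] for r in range(len(M))) % 2
                 for c in range(len(M[0])))


def score(x, system):
    """Number of satisfied MRHS equations x * M_i in S_i."""
    return sum(vec_mat(x, M) in S for M, S in system)


def hill_climb(system, n, restarts=100, seed=0):
    """Return a solution found by greedy single-bit moves, or None."""
    rng = random.Random(seed)
    m = len(system)
    for _ in range(restarts):
        x = [rng.randrange(2) for _ in range(n)]
        best = score(x, system)
        while best < m:
            # Score every neighbour x XOR e_i (Hamming distance 1).
            candidates = []
            for i in range(n):
                x[i] ^= 1
                candidates.append((score(x, system), i))
                x[i] ^= 1
            s, i = max(candidates)
            if s <= best:
                break  # local maximum reached: restart from a new random x
            x[i] ^= 1
            best = s
        if best == m:
            return tuple(x)
    return None


# Toy system with n = 3 unknowns (illustrative values).
M1 = [[1, 0], [0, 1], [0, 0]]
S1 = {(0, 0), (1, 1)}
M2 = [[0, 1], [1, 0], [1, 1]]
S2 = {(1, 1), (0, 1)}
toy_system = [(M1, S1), (M2, S2)]
found = hill_climb(toy_system, 3)
```

Each step evaluates all $n$ neighbours, which reflects the higher per-step cost mentioned above. A sparse implementation would rescore only the equations that actually depend on the flipped bit.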
A natural extension of the hill-climbing method is the application of evolutionary computing and stochastic optimization algorithms. Successful solving of (specific random) MRHS equations with genetic algorithms was reported in [48]. This research area is however still very fresh, with many open questions and potential for research: Which methods are suitable for generic systems/specific systems related to algebraic cryptanalysis? How to select the parameters of the heuristic methods? Which scoring functions should be used? Can the methods be combined with other MRHS-solving methods?

Using MRHS Systems in Algebraic Cryptanalysis
Algebraic cryptanalysis typically involves three basic steps. Firstly, we transform the cryptanalytic problem into an algebraic representation. Then we solve the algebraic problem with a solver. Finally, we use the algebraic solution to determine the result of the cryptanalysis (e.g., extracting the key bits). We will call algebraic cryptanalysis that involves MRHS representation an MRHS cryptanalysis.
MRHS representation is especially suited for the cryptanalysis of block ciphers composed of small non-linear elements (S-boxes) and linear diffusion layers. Let us consider an example based on the Substitution-Permutation Network (SPN). The SPN has $r$ rounds composed of key addition, bricklayer substitution with $s$ parallel S-boxes given by a non-linear Boolean function $F: \mathbb{Z}_2^m \to \mathbb{Z}_2^m$, and a diffusion layer that can be described as a linear transformation by an invertible diffusion matrix $\mathbf{L} \in \mathbb{Z}_2^{sm \times sm}$. Let us denote the input plaintext by $\mathbf{x}$, the output ciphertext by $\mathbf{y}$, and the unknown key bits by $\mathbf{k}$. For the sake of simplicity, let us suppose that round keys are computed from key bits $\mathbf{k}$ by a linear transformation given by matrices $\mathbf{K}_i$ (for round $i$, the round key is $\mathbf{k}_i = \mathbf{k} \cdot \mathbf{K}_i$). Note that a non-linear key schedule can be included in the MRHS system similarly to individual rounds.
Let us denote S-box inputs in round $i$ by $\mathbf{u}_i$, and S-box outputs by $\mathbf{v}_i$. The first S-box layer input is computed as $\mathbf{u}_1 = \mathbf{x} \oplus \mathbf{k}_1$ (here $\mathbf{x}$ is just the plaintext). The diffusion layer gives us linear equations $\mathbf{u}_{i+1} = \mathbf{v}_i \cdot \mathbf{L} \oplus \mathbf{k}_{i+1}$, and in the final round we get $\mathbf{y} = \mathbf{v}_r \cdot \mathbf{L} \oplus \mathbf{k}_{r+1}$. The initial MRHS system has a set of "unknowns" given by the concatenation $\mathbf{z} = (\mathbf{k}, \mathbf{x}, \mathbf{y}, \mathbf{u}_1, \dots, \mathbf{u}_r, \mathbf{v}_1, \dots, \mathbf{v}_r)$. Individual MRHS equations in the system are based on S-boxes. Each S-box corresponds to a single MRHS equation in the form of $\mathbf{z} \cdot (\mathbf{I} | \mathbf{J}) \in S$, where $\mathbf{I}$ and $\mathbf{J}$ are identity matrices selecting the corresponding input and output bits of the S-box. The set $S$ consists of all possible pairs of S-box inputs and outputs: $S = \{(\mathbf{a}, \mathbf{b}) \in \mathbb{Z}_2^m \times \mathbb{Z}_2^m;\ \mathbf{b} = F(\mathbf{a})\}$. The final MRHS system is obtained by substituting known values of plaintext and ciphertext, and partially solving the system of linear equations given by diffusion layers and key schedule. The resulting linear expressions are substituted into the MRHS system. The detailed algorithm is presented in [49].
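The construction of one S-box equation can be sketched in Python. The sketch makes several assumptions for illustration: the 3-bit table `TOY_SBOX` is a hypothetical permutation (not an S-box of any real cipher), bits are ordered least-significant first, and the helper names `bits`, `sbox_rhs`, and `selector` are not taken from [49]. The detailed algorithm in [49] additionally substitutes known values and the linear layers, which is not shown here.

```python
def bits(value, width):
    """Integer -> tuple of bits, least-significant bit first (assumed ordering)."""
    return tuple((value >> k) & 1 for k in range(width))


def sbox_rhs(table, m):
    """Right-hand side set S = {(a, F(a))} for an m-bit S-box lookup table."""
    return {bits(a, m) + bits(table[a], m) for a in range(1 << m)}


def selector(n, positions):
    """n x len(positions) 0/1 matrix; column c selects the unknown positions[c]."""
    return [[1 if r == p else 0 for p in positions] for r in range(n)]


def vec_mat(x, M):
    """Row vector times matrix over GF(2)."""
    return tuple(sum(x[r] * M[r][c] for r in range(len(M))) % 2
                 for c in range(len(M[0])))


# Hypothetical 3-bit S-box, used only as an example.
TOY_SBOX = [3, 6, 0, 5, 7, 1, 4, 2]
S = sbox_rhs(TOY_SBOX, 3)

# Unknowns z = (u, v): positions 0..2 hold the S-box input, 3..5 the output.
I = selector(6, [0, 1, 2])
J = selector(6, [3, 4, 5])
M = [ri + rj for ri, rj in zip(I, J)]  # left-hand side matrix (I | J)

z = bits(5, 3) + bits(TOY_SBOX[5], 3)  # a consistent input/output assignment
```

An $m$-bit S-box yields $|S| = 2^m$ right-hand sides regardless of the total number of unknowns, which is why small S-boxes keep the representation compact.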
It is possible to represent the same system on different levels, e.g., by replacing S-boxes with their AND-XOR decomposition [47]. In general, any family of Boolean functions that can be implemented with a polynomial number of AND gates in an AND-XOR logic leads to a family of MRHS systems with polynomial representation.
Note that every MRHS system can be rewritten as an XOR-SAT problem [25], and then converted to a CNF-SAT instance used by SAT solvers. The main advantage of MRHS representation in comparison with CNF-SAT representation is that the MRHS system can handle XOR clauses from complex diffusion layers more naturally. There is also a simple correspondence between MRHS representation and MQ (multivariate quadratic) representation of the system [31].
Various representations of the same cryptanalytic problem can exploit different types of "sparsity". As there is a polynomial-time algorithm to transform between the representations, the expected theoretical complexity of the problem should remain the same (decision versions of these problems are NP-complete). It is an open research question which of these representations is more suitable for particular cryptanalytic tasks.

Experimental MRHS Cryptanalysis
From the research perspective, the aim of cryptanalysis is not to "break ciphers", but to give insights into cipher security. Experimental algebraic cryptanalysis focuses on performing practical attacks on a smaller version of the cipher (with a reduced number of rounds, state size, key bits, ...). It might be problematic to compare results across different types of algebraic cryptanalysis, as different types of attacks use different methodologies and metrics. Some optimizations in algebraic solvers can be advantageous for small systems but do not scale well with the increasing system size (parameters).
In [26] we have proposed a methodology of experimental MRHS cryptanalysis that splits the algebraic attack into a polynomial part (local reduction) and an exponential part (guessing). The evaluator uses instances with known solutions to estimate the complexity of the attacks, and the response to changing parameters of the problem. The methodology can be used to reject weak ciphers, or as a tool for qualitative comparison of cipher designs. The methodology is illustrated by algebraic cryptanalysis of the former encryption standard DES [50].
Experimental algebraic cryptanalysis was applied to multiple well-known ciphers. In [51], local reduction techniques were used to evaluate the security of the block cipher GOST [52]. A comparison of local reduction techniques and SAT-solver-based algebraic attacks used in cryptanalysis of the SHA-3 candidates JH [53] and Keccak [54] was presented in [55]. In [56], a particular local reduction method (the method of syllogisms) was used to solve reduced versions of the stream cipher Trivium [57]. In [58], the local reduction method was independently applied to algebraic cryptanalysis of the lightweight cipher Present [59]. The block cipher DESL [60] was analyzed in [61].
The stream cipher Trivium was also analyzed in [32]. However, in this case, a representation based on compressed right-hand side (CRHS) equations was used. This approach was later explored in more detail in [62], in the context of the DES and the MiniAES ciphers. Algebraic attacks based on CRHS equations on small-scale variants of AES [63] were explored in more detail in [64]. In [65], the CRHS representation was adapted for the factorization problem. In [66], a new tool called CryptaPath was proposed for assisted algebraic cryptanalysis of symmetric primitives that can be described with an SPN structure. This tool also uses the CRHS representation.
In [34], a new solver that can solve MRHS equations was proposed alongside a methodology for using the solver in algebraic cryptanalysis. The methodology was tested on instances of scaled-down DES, AES, Present, and LowMC [67] ciphers. The experimental MRHS cryptanalysis of LowMC based on a custom implementation of the algorithm proposed in [45] was conducted in [68]. However, the results of the attack were worse than the brute-force approach. The use of the hill-climbing method for MRHS cryptanalysis was explored in [47] in the context of cryptanalysis of the block cipher Ascon [69].
The use of MRHS representation is not limited to algebraic cryptanalysis. In [70], a new approach to linear cryptanalysis of the block cipher DES was proposed. The MRHS equation system is built from linear approximations obtained by linear cryptanalysis. This approach was later extended to multidimensional linear cryptanalysis in [71].

Conclusions
Multiple Right-Hand Sides equation systems can be used in algebraic cryptanalysis instead of standard representations such as CNF for SAT solvers and ANF for Gröbner bases and related solvers. The main advantage of MRHS equations is the separation of linear and non-linear components of analyzed ciphers and cryptographic primitives. As the main disadvantage, we perceive a lack of freely available universal and specialized MRHS solvers, as well as a relative lack of research on using MRHS equation systems outside cryptographic applications.
While MRHS equation systems were primarily used for experimental algebraic cryptanalysis, they have also been used in theoretical studies. In [45] we use MRHS systems and their transformation to a decoding problem to provide upper bounds on the complexity of algebraic cryptanalysis of ciphers with low multiplicative complexity. In [31], we use the MRHS equation system as a middle step in connecting the complexity of MQ- and code-based cryptosystems used in post-quantum cryptography. We have even proposed a new type of post-quantum signature scheme that can be derived from an MRHS representation of a symmetric cipher such as AES or LowMC [49].
We conclude that the use of MRHS equation systems, not only in algebraic cryptanalysis, still has significant research potential, both theoretical and experimental. We believe that there is also potential for applications of MRHS systems and solvers in other problem areas dominated by SAT solvers, such as circuit optimization.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: