A new idea for RSA backdoors

This article proposes a new method to inject backdoors in RSA and other cryptographic primitives based on the Integer Factorization problem for balanced semi-primes. The method relies on mathematical congruences among the factors of the semi-primes modulo a large prime number, which acts as a "designer key" or "escrow key". In particular, two different backdoors are proposed, one targeting a single semi-prime and the other one a pair of semi-primes. The article also describes the results of tests performed on a SageMath implementation of the backdoors.


Introduction
Impairing the robustness of cryptographic applications is a sensitive topic. The interest in direct attacks, vulnerabilities, and backdoors for all currently used ciphers is certainly justified by economic or geopolitical reasons. If a vulnerable implementation of a cryptographic algorithm is surreptitiously distributed, an "evil" actor or a national security agency might get easy access to any sort of sensitive and precious information. On the other hand, there could be "legal" actors that openly mandate or encourage the adoption of cryptographic implementations that include backdoors in order to realize "key escrow" mechanisms. For instance, a country might legislate that judiciary representatives should always be able to recover any kind of encrypted communication involved in a criminal case.
Up to a few years ago, it was only conjectured [1] that major security agencies were able to decrypt a large portion of the world's encrypted traffic, mainly thanks to vulnerabilities hidden in pseudorandom generators or major cryptographic algorithms and applications. Some examples of this practice might be the Hans Bühler case in 1994 [2], the Dual-EC algorithm proposed in 2004 by the US National Institute of Standards and Technology [3,4], and perhaps the OpenBSD backdoor incident that emerged in 2010 [5,6]. However, in the last few years many government bodies have openly talked about enforcing by law "responsible encryption" or "exceptional access to encrypted documents" [7,8], which are essentially more palatable words for "escrow key" and "backdoors".
1 ORCID iD: https://orcid.org/0000-0002-7492-3129
Moreover, even the approach to backdoor construction has changed in recent years. While in the past the focus was mainly on weaknesses in pseudo-random generators or software implementations that might allow an attacker to predict some secret data of the target users, nowadays the emphasis is on theoretical backdoors based on mathematical properties of the cryptographic primitives. Perhaps the main reason for this new approach is that it is very difficult to discover a mathematical backdoor by just looking at the cryptographic algorithm: for example, Bannier and Filiol [9] showed in 2017 how a block cipher similar to AES can be devised so that it includes, by design, a hidden mathematical backdoor that allows a knowledgeable attacker to effectively break the cipher and recover the key.
Evil actors and legal actors pursue very different goals, which justify the adoption of very different backdoor mechanisms. An evil actor is primarily concerned with how convenient it is to trigger the backdoor, and secondarily with how well the backdoor mechanism is hidden from the final user; whether that mechanism also impairs the security of the cipher is not crucially important. Thus, a backdoor introduced by an evil actor might even be a vulnerability hidden in a cipher implementation such that anyone knowing about its existence could easily break the cipher and recover the encrypted messages. For instance, a mechanism that can be easily exploited might be based on a semi-prime generator that selects just one of the primes at random, while the other prime is fixed. The Euclidean algorithm applied to two different vulnerable semi-primes outputs the fixed prime, thus anyone can easily break the cipher even if the fixed prime is not known in advance. Perhaps not surprisingly, there are a lot of very weak public keys on the Internet [10,11]. On the other hand, a legal actor does not want to significantly impair the security of a cryptographic algorithm, because the final users might just refuse to adopt an insecure cipher. A backdoor introduced by a legal actor is likely a vulnerability embedded in a cryptographic implementation that allows only "authorized" actors to decipher the encrypted messages without knowing the private keys of the final users. Usually, this means that the retrieval of the encrypted messages can be done only if the actor knows a secret escrow key related to the backdoor itself.
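The fixed-prime weakness mentioned above can be illustrated with a minimal Python sketch. The primes below are toy values chosen only for illustration (they are not taken from any real key), and the variable names are hypothetical.

```python
from math import gcd

# Toy values, far too small for real use (illustrative assumption).
FIXED_PRIME = 10007               # prime reused by the hypothetical weak generator
n1 = FIXED_PRIME * 10009          # public modulus of a first user
n2 = FIXED_PRIME * 10037          # public modulus of a second user

# The Euclidean algorithm on the two public moduli exposes the shared factor,
# with no prior knowledge of FIXED_PRIME.
shared = gcd(n1, n2)
cofactor1 = n1 // shared
cofactor2 = n2 // shared
print(shared, cofactor1, cofactor2)
```

Only the two public moduli are needed, which is why such a generator is a vulnerability open to anyone rather than a backdoor reserved to its designer.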
Among the most widespread cryptographic algorithms, RSA [12] likely deserves special consideration, because it is conveniently used to protect any kind of sensitive data transmitted over the Internet. While it is commonly believed that RSA has been properly designed and that, by itself, it does not contain hidden vulnerabilities, a large number of attacks on RSA have been proposed since its invention. These attacks span from directly factoring the semi-prime in the public key to exploiting weaknesses in the generation algorithm for the prime factors; for a survey, see [13]. Furthermore, several RSA backdoors have been proposed: they are specially crafted values in RSA parameters that allow a knowledgeable attacker to recover the private key from publicly available information. For an in-depth discussion of several RSA backdoors see [14].
In this work we propose a new idea to inject backdoors in RSA key generators, which was loosely inspired by the concept of "implicit hints" of May and Ritzenhofen [15] in pairs of semi-primes. On the other hand, our idea differs significantly from the backdoors based on implicit hints and, as far as we know, from any other published backdoor proposal.
More specifically, May and Ritzenhofen proposed the Implicit Factorization Problem (IFP), which is based on the premise that two or more semi-primes having factors sharing some common bits can be factored with some variants of Coppersmith's algorithm [16,17]. The authors stated that "[...] one application of our result is malicious key generation of RSA moduli, i.e. the construction of backdoored RSA moduli". In our opinion, however, a backdoor based on shared bits, as described in [15], is not really effective for RSA. In fact, it is practically not possible to exploit this backdoor in large "balanced" semi-primes, such as those used in current RSA moduli, because the time required by Coppersmith's algorithm to factor a semi-prime grows exponentially as the difference in the size of the factors becomes smaller. Moreover, this vulnerability is self-evident to anyone looking at the factors, because there would be a long run of identical bits in the two values, which means that the backdoor cannot be easily concealed from the owner of the private keys.
Our new idea is the following: rather than prescribing that the bit-expansions of the factors include a long run of identical bits, we impose that the bit-expansions include portions of correlated bits, where the correlation is bound to a secret designer key not known to the owner of the keys. In practice, we impose some mathematical conditions on the values of the factors as congruences modulo a large prime (of nearly the same size as the factors), which acts as the designer key.
Following the IFP approach in [15], we first devised a backdoor (named TSB) based on mutual correlations between the factors of two distinct semi-primes. Afterwards we devised a simpler backdoor (named SSB) based on the same idea but suitable for injecting a backdoor in a single semi-prime. The backdoors can be applied to RSA or to any other cipher whose security is based on the difficulty of the integer factorization of semi-primes.
A key difference with the IFP approach is that in order to trigger the backdoors, that is, in order to factor the semi-prime(s) by exploiting the designer key, there is no need to apply any variant of Coppersmith's algorithm. Therefore, if the value of the designer key is known, factoring the semi-prime(s) is easy and efficient. On the other hand, if the designer key is not known, there seems to be no efficient way to factor the semi-prime(s). Moreover, without the designer key, there seems to be no efficient way to detect the existence of the backdoor, even when looking at the distinct prime factors of the semi-prime(s).
The rest of the article is organized as follows.
In Section 2 we define some mathematical notation and introduce the basic RSA algorithm. In Section 3 we present the prior works related to RSA backdoors and the Implicit Factorization Problem (IFP). In Section 4 we discuss our simpler backdoor, SSB, while in Section 5 we discuss the more sophisticated backdoor for a pair of semi-primes, TSB. Finally, in Section 6 we draw some conclusions from this work.

Preliminaries
Let us establish some notation: $a \equiv b \pmod{c}$ denotes the relation in which $a - b$ is a multiple of $c$; often we will use the shorter notation $a \equiv_c b$. The notation $a \bmod b$ denotes the remainder of the division $a/b$; hence, $a \equiv_c (a \bmod c)$ and $0 \le a \bmod c < c$.
If $N \ge 0$ is an integer, its size in bits is defined as $\ell(N) = \max\{1, \lceil \log_2(N+1) \rceil\}$. We write $x \simeq y$ if $x$ and $y$ are equal or differ by at most one, while we write $x \approx y$ if $x$ and $y$ differ by a value negligible with respect to the sizes of $x$ and $y$. If $N \approx 2^n$, with $n$ large, then $\ell(N) \simeq \log_2 N$, that is, we may consider both $\ell(N)$ and $\log_2 N$ to be approximately equal to $n$, ignoring a $\pm 1$ difference in size.

If $h$ is an integer, $h\rceil^{k}$ denotes the $k$ most significant bits of $h$ (a value from $0$ to $2^k - 1$), while $h\rfloor_{k}$ denotes the $k$ least significant bits.
A semi-prime is a number $N$ such that $N = p\,q$ where $p$ and $q$ are primes. Therefore, $\ell(N) \simeq \ell(p) + \ell(q)$. If $\ell(p) \simeq \ell(q)$, then the semi-prime is said to be balanced. In the following we also consider sequences of semi-primes $N_i = p_i q_i$ ($i = 1, 2, \ldots$) having common size $n = \ell(N_i)$, for every $i$; furthermore, the primes $q_i$ have common size $\ell(q_i) = \alpha$; it follows that all primes $p_i$ have the same size $n - \alpha$.
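The size and bit-selection notation above can be made concrete with a short Python sketch. The helper names `ell`, `msb`, and `lsb` are hypothetical, introduced here only for illustration; `msb` assumes that the "most significant bits" are counted from the leading 1 bit of the value.

```python
def ell(n: int) -> int:
    """Size in bits of n >= 0, i.e. max(1, ceil(log2(n + 1)))."""
    return max(1, n.bit_length())

def msb(h: int, k: int) -> int:
    """The k most significant bits of h, as a value in [0, 2**k)."""
    return h >> max(0, ell(h) - k)

def lsb(h: int, k: int) -> int:
    """The k least significant bits of h, as a value in [0, 2**k)."""
    return h & ((1 << k) - 1)

# A toy balanced semi-prime: both factors have size 8, the product has size 16.
p, q = 251, 241
N = p * q
print(ell(p), ell(q), ell(N))
```

In general the size of the product equals the sum of the sizes of the factors or that sum minus one, which is what the "differ by at most one" relation captures.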
The RSA public key cryptosystem was invented by Rivest, Shamir, and Adleman [12] in 1977. In its simplest form, the algorithm is based on a balanced semi-prime $N = p\,q$ and a pair of exponents $e$, $d$ such that $\gcd(e, \varphi(N)) = 1$ and $e\,d \equiv 1 \pmod{\varphi(N)}$. Here $\varphi(N)$ denotes Euler's totient function, which can be easily computed as $(p-1)(q-1)$ if the prime factors $p$ and $q$ are known. Theoretically, the value of $e$ could be random, while the value of $d$ can be computed from $e$ and $\varphi(N)$ by using the Extended Euclidean algorithm. The pair $(N, e)$ is the "public key" of RSA, and the encryption function is $M^e \bmod N$. Either the pair $(p, q)$ or the pair $(N, d)$ is the "private key", and the decryption function is $C^d \bmod N$, where $C$ is the ciphertext. Of course, factoring $N$ allows an attacker to recover the private key from the public key, because from $p$ and $q$ we can compute $\varphi(N)$ and then $d \equiv e^{-1} \pmod{\varphi(N)}$.
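As a minimal sketch of the textbook scheme just described, the following Python fragment builds a toy key pair and performs one encryption and decryption. The numeric values are illustrative assumptions, far too small for real use, and no padding is applied.

```python
from math import gcd

# Toy parameters (illustrative only).
p, q = 61, 53
N = p * q
phi = (p - 1) * (q - 1)        # Euler's totient of N, computable only from p and q

e = 17                         # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

M = 65                         # plaintext, 0 <= M < N
C = pow(M, e, N)               # encryption: M^e mod N
recovered = pow(C, d, N)       # decryption: C^d mod N
print(N, d, C, recovered)
```

Anyone who factors `N` can repeat the computation of `phi` and `d`, which is exactly what a semi-prime backdoor aims to make easy for the holder of the designer key.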

Related work
Many authors have proposed classifying backdoors embedded in cryptographic applications according to several different criteria. Following [18], we consider three types of backdoors: (1) weak backdoors, (2) information transfer via subliminal channels, and (3) SETUP mechanisms. Weak backdoors are based on modifications of the cryptographic protocol such that it would be possible for anyone to break the cipher and recover the secret data. Vulnerabilities falling under the information transfer via subliminal channels category allow an attacker to exploit the cryptographic protocol so as to create a hidden communication channel that cannot be intercepted or unambiguously detected. Finally, SETUP (Secretly Embedded Trapdoor with Universal Protection) mechanisms create vulnerabilities in the cryptographic protocols that cannot be easily exploited by third-party attackers. SETUP mechanisms were first proposed by Young and Yung [19,20] in 1996: they coined the term "kleptography" to denote the usage of cryptographic primitives in order to design "safe" backdoors in other cryptographic protocols. Following the classical distinction between asymmetric and symmetric cryptography, SETUP mechanisms can lead to asymmetric backdoors and symmetric backdoors.
In an asymmetric backdoor the information required to recover the encrypted messages is protected by an asymmetric cipher. Usually, this means that some data that allow an actor to recover any user private key are encrypted with the public key of the designer of the RSA implementation and stored inside the corresponding user public key. Any actor that knows the corresponding designer private key may extract the data from the user public key and decipher them to recover the user private key. Observe that in this case the RSA implementation is "tamper resistant": even reverse engineering cannot reveal the designer private key.
In a symmetric backdoor, on the other hand, the designer key that allows an actor to recover the user private key from the user public key is stored in some form inside the RSA implementation itself. To be secure and undetected, the RSA implementation (perhaps, a physical device) must be "tamper proof".
Existing RSA backdoors may also be categorized according to the place where the backdoor's specific data are stored: either in the semi-prime $N$ alone, or also in the exponent $e$ of any public key $(N, e)$. "Exponent-based" backdoors are somewhat easier to devise, because $e$ could theoretically be any random value coprime with $\varphi(N)$. However, most RSA implementations make use of special fixed values for the public exponent, such as small values or values having small Hamming weight, in order to improve the efficiency of the RSA algorithm. Thus, exponent-based backdoors cannot be easily hidden from the final user, and can be perceptibly slower than honest RSA implementations. Backdoors embedded in the public key's semi-prime do not limit the choice of the public exponent; however, they must address a crucial problem: how to encode information about the factorization of the semi-prime in the semi-prime itself, in such a way that the information is encrypted with a secret key and, possibly, the pair $(p, q)$ is indistinguishable from a pair of primes generated by an honest RSA implementation.
In this work we propose two backdoors embedded in the semi-primes of RSA public keys; as a matter of fact, the backdoors apply to any cryptographic protocol based on the integer factorization of semi-primes. Therefore, we do not discuss at length related work concerning exponent-based backdoors; examples can be found in [21,22,14,23].

Symmetric backdoors
The proposed SSB algorithm implements a symmetric backdoor, because the escrow key is fixed and hard-wired in the hardware or software device that generates the vulnerable semi-primes. As we shall see, TSB might be considered either a symmetric or an asymmetric backdoor.
The first RSA backdoor was proposed by Anderson [24] in 1993. It is a symmetric backdoor embedded in the public key's semi-prime: let $\beta$ be an $m$-bit secret prime (the "backdoor key"), and let $\pi_\beta$ and $\pi'_\beta$ be pseudo-random functions that, given a seed as argument, produce an $(n - m)$-bit value (in the original article, $n = 256$ and $m = 200$). For any vulnerable $2n$-bit semi-prime $N = p\,q$, let $t, t' < \sqrt{\beta}$ be $(m/2)$-bit random numbers coprime with $\beta$, and let $p = \pi_\beta(t) \cdot \beta + t$ and $q = \pi'_\beta(t') \cdot \beta + t'$. Given $N$ and $\beta$, it is possible to compute $t\,t' = N \bmod \beta$, then factor the $m$-bit number $t\,t'$, and finally compute $p$ and $q$. Kaliski [25] proved that it is possible to discover the backdoor either by computing the continued fraction expansion of $p/q$, because the expansion likely contains an approximation of the fraction $\pi_\beta(t)/\pi'_\beta(t')$, or by finding a reduced basis of a suitable lattice built on the primes of two vulnerable moduli. He also showed that the backdoor can be detected by the lattice method when 14 or more non-factored vulnerable moduli are available. It is easy to observe that Kaliski's detection algorithm can be defeated by introducing a "dynamic backdoor key" whose exact value depends, for instance, on an incremental counter. However, another drawback of Anderson's backdoor is that $m \approx 3/4 \cdot n$, hence triggering the backdoor for currently used public key sizes might require factoring too large an integer.
Our first proposed backdoor, SSB, is similar to Anderson's construction, in that the first step in triggering the backdoor is computing the remainder of the integer division of the semi-prime by the designer (escrow) key. However, a key difference with Anderson's idea is the form of the primes $p$ and $q$, which allows SSB to escape detection by Kaliski's algorithms and to avoid factoring a large integer when exploiting the backdoor.
In 2003, Crépeau and Slakmon [22] presented, among several other exponent-based backdoors, a semi-prime-based backdoor that relies on Coppersmith's attack [17] and encrypts the factor $p$ in the RSA modulus $N = p\,q$ in such a way that the $n/8$ most significant bits of $N$ have the correct distribution for a random semi-prime, while the middle $n/4$ bits of $N$ are an encryption, via a pseudo-random function $\pi_\beta$, of the $n/4$ most significant bits of $p$. Our proposed backdoors use an entirely different mechanism and do not rely on Coppersmith's attack, which means that they can be efficiently exploited even on very large balanced semi-primes.
In 2008, Joye [26] studied the performance of generating a semi-prime $N$ in which some bits are prescribed; he developed as an example an RSA symmetric backdoor based on Coppersmith's attack in which some of the bits of $p$ are encrypted in $q$. While this study is relevant when analyzing the generation times of any semi-prime backdoor, his proposal is entirely different from the present one.
The symmetric backdoor proposed by Patsakis [27] in 2012 is based on yet another idea: the parameterized, randomized backdoor algorithm decomposes an integer as a sum of squares in a way depending on a designer's secret parameter. The backdoor consists in imposing that the semi-prime, once decomposed by using the secret parameter, can be easily factored by solving a nonlinear system whose solutions are properly bounded.
In 2017, Nemec, Sys, and others [28] exposed a critical vulnerability (perhaps unintentional), known as ROCA, in the key generation algorithm of the RSALib library, which is written, adopted, and distributed to third parties by Infineon, one of the top producers of cryptographic hardware devices. This work raised much interest because the flaw was already present in devices produced in 2012 and the total number of affected devices, and consequently of vulnerable keys, is huge. In any $N = p\,q$ generated by the flawed RSALib, all primes $p$ and $q$ have the form $k \cdot M_t + (65537^a \bmod M_t)$, where $M_t$ is the primorial number given by the product of the first $t$ primes, and $k$, $a$ are random integers. The values of $t$ for semi-primes of bit length $n = 512$, $1024$, $2048$, and $4096$ are, respectively, $t = 39$, $71$, $126$, and $225$. This means that the number of truly random bits in each of the primes is reduced, respectively, to 98, 171, 308, and 519. In order to find the factors of a vulnerable semi-prime, a variant of Coppersmith's attack is used: it is possible to efficiently factor $N = p\,q$ when the value $p \bmod M$ is known. Hence, the recovering procedure determines a suitable divisor $M$ of $M_t$ of size $\ell(M) \ge n/4$ (to reduce the search space for $a$), guesses an exponent $a$, computes $65537^a \bmod M$, and factors $N$. It is also easy to verify whether a given key is flawed: $N$ is likely vulnerable if the discrete logarithm $\log_{65537} N \bmod M_t$ exists. Actually, this logarithm can be easily computed by the Pohlig-Hellman algorithm [29] because $M_t$ is the product of many small consecutive primes. Hence, ROCA arguably belongs to the weak backdoor category.
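The detection test described above can be sketched in Python on a toy scale. This is only an illustration under stated assumptions: the primorial uses the first 8 primes instead of the real values of t, and the subgroup generated by 65537 is enumerated exhaustively instead of computing a discrete logarithm with Pohlig-Hellman. The function names are hypothetical, and the example factors are built only to have the prescribed residues (they are not checked for primality, which the residue test does not need).

```python
from math import prod

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]   # toy primorial: first 8 primes
M = prod(SMALL_PRIMES)
G = 65537

def generated_subgroup(g: int, m: int) -> set:
    """All powers of g modulo m (g must be coprime with m)."""
    elems, x = set(), 1
    while x not in elems:
        elems.add(x)
        x = (x * g) % m
    return elems

SUBGROUP = generated_subgroup(G, M)

def has_fingerprint(n: int) -> bool:
    """True if n mod M is a power of 65537 modulo M, i.e. the log exists."""
    return n % M in SUBGROUP

# Two factors of the form k*M + (65537^a mod M): their product has the fingerprint.
f1 = 5 * M + pow(G, 11, M)
f2 = 9 * M + pow(G, 29, M)
flagged = has_fingerprint(f1 * f2)

# 37 * 103 is congruent to 3 modulo 17, while the powers of 65537 modulo 17
# are powers of 2 and never equal 3: no fingerprint.
clean = has_fingerprint(37 * 103)
print(len(SUBGROUP), flagged, clean)
```

Because the generated subgroup is a tiny fraction of the invertible residues modulo the primorial, a random modulus passes the test only with small probability, which is why the check works as a detector.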

Asymmetric backdoors
The proposed TSB algorithm can be used to implement both symmetric and asymmetric backdoors. In fact, TSB makes use of an embedded designer key, but also generates two distinct semi-primes. If both semi-primes are used to build two distinct public keys, both available to a third-party attacker, then tampering with the TSB device may expose the designer key and break the keys. On the other hand, TSB can be used to generate a public key (from one of the generated semi-primes) and a dedicated escrow key composed of the hard-coded large prime inside the device and the second semi-prime, which must be considered as the designer's secret key. This is a reasonable scenario for cryptographic keys used in a highly secure work environment. In this second case, TSB must be considered an asymmetric backdoor, because tampering with the device is not enough to break any key already generated.
The first examples of asymmetric backdoors proposed by Young and Yung [19] in 1996 were exponent-based. However, that article also includes the description of an asymmetric semi-prime-based backdoor named PAP, for "Pretty Awful Privacy". The backdoor designer defines a designer's RSA public key $(N' = p'q', e')$ and private key $(p', q', d')$, where $\ell(N') = n/2$. Let $F_K$ and $G_K$ be invertible functions depending on a fixed key $K$ that transform a seed of $n/2$ bits into a pseudo-random value of $n/2$ bits. In order to create a backdoor in an RSA modulus, the designer first chooses a prime $p$ of bit length $n/2$ at random, then searches for the smallest value $K$ such that $\rho = F_K(p) < N'$. The value $\rho$ is then encrypted as $\rho_2 = G_K(\rho^{e'} \bmod N')$. The RSA semi-prime $N$ results from the search for a prime $q$ such that the $n/2$ most significant bits of $N = p\,q$ coincide with $\rho_2$. The attacker can easily break the public key by extracting $\rho_2$ from $N$, then starting an exhaustive search for the value of $K$ that, when applied to the inverse permutations $G_K^{-1}$ and $F_K^{-1}$, makes it possible to extract the proper factor $p$ using the RSA private key $(p', q', d')$.
In a series of articles published between 1997 and 2008, Young and Yung [20,30,31,32] proposed several kleptographic backdoors for RSA using different cryptographic algorithms for embedding the factor p in N .
In 2010, Patsakis [33,27] proposed yet another kleptographic mechanism that relies on Coppersmith's attack and forges $p$ and $q$ so that the most significant bits of both of them are of the form $(a + r)^{e'} \bmod N'$, where $a$ is a secret design parameter, $r$ is a random value, and $(N', e')$ is the designer's asymmetric public key.
In 2016, Wüller, Kühnel, and Meyer [34] proposed an RSA backdoor called PHP, for "Prime Hiding Prime", in which the information required to factor $N$ is hidden in $N$ itself. The idea is to select a prime $p$ such that $q = (p^{e'} \cdot p^{-1}) \bmod N'$ is a prime, where $(N', e')$ is the RSA public key of the designer. To factor $N = p\,q$, the actor computes $N^{d'} \equiv_{N'} (p \cdot p^{e'} \cdot p^{-1})^{d'} \equiv_{N'} p$. An improvement of PHP, called PHP', is also described in [34]: here $q = (s^{e'} \cdot p^{-1}) \bmod N'$, where $s$ is the concatenation of $n/4$ random bits and $n/4$ bits of $p$. Half of the bits of $p$ are enough to recover the factorization of $N$ thanks to Coppersmith's attack.
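The basic PHP congruence can be checked with a toy Python sketch. Everything numeric here is an illustrative assumption: the designer key is a tiny RSA key, the primality test is plain trial division, and the size constraints that a real generator would enforce on the resulting modulus are ignored. The function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division, adequate only for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Toy designer RSA key (N1, E1) with private exponent D1 (illustrative values).
P1, Q1 = 1009, 1013
N1 = P1 * Q1
E1 = 5
D1 = pow(E1, -1, (P1 - 1) * (Q1 - 1))

def php_generate(start: int):
    """Search for a prime p < N1 such that q = p^E1 * p^-1 mod N1 is also prime."""
    p = start
    while p < N1:
        if is_prime(p) and p % P1 != 0 and p % Q1 != 0:
            q = (pow(p, E1, N1) * pow(p, -1, N1)) % N1
            if q != p and is_prime(q):
                return p, q
        p += 1
    raise ValueError("no suitable prime found below the designer modulus")

def php_recover(n: int):
    """With the designer private exponent: n = p*q = p^E1 (mod N1), so n^D1 = p."""
    p = pow(n % N1, D1, N1)
    return p, n // p

p, q = php_generate(500_000)
N = p * q
print(p, q, php_recover(N))
```

The recovery needs only the public modulus and the designer's private exponent, which is what makes the mechanism asymmetric.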
Markelova [18] revisited Anderson's idea for a symmetric backdoor and devised SETUP mechanisms that protect the backdoor by means of some public-key algorithms, in particular based on discrete logarithm problems on both finite fields and elliptic curves. The author also presented a SETUP backdoor exploiting the Chinese Remainder Theorem. The article [18] also includes a discussion of the similarities of these SETUP backdoors with the ROCA backdoor.

The Implicit Factorization Problem
In 1985, Rivest and Shamir [35] introduced the oracle complexity as a new way to look at the complexity of the factorization problem (and the related RSA attack): they showed that the semi-prime $N$ can be factored in polynomial time if an oracle provides 3/5 of the bits of $p$. In 1996, Coppersmith [16,17] improved the oracle complexity by showing that an explicit "hint" about the top half of the bits of $p$ is sufficient for factoring $N$ in polynomial time. In particular, Coppersmith described some algorithms based on lattice reduction and the LLL procedure [36] to find small integer roots of univariate modular polynomials or bivariate integer polynomials. Later [37,38], these algorithms were reformulated in simpler ways and heuristically extended to the multivariate polynomial case.
The seminal article [15] focusing on "implicit hints" was published in 2009 by May and Ritzenhofen. An oracle gives an implicit hint when it does not output the value of some bits of one of the factors of the semi-prime; rather, the oracle outputs another semi-prime whose primes share some bits with the primes of the original semi-prime. The authors formally introduced the Implicit Factorization Problem (IFP), and showed that two semi-primes $N_1 = p_1 q_1$ and $N_2 = p_2 q_2$ can be factored in polynomial time when $p_1$ and $p_2$ share $t$ least significant bits, provided that $t$ is slightly larger than $2\alpha$. The algorithm is based on a lattice reduction: the search for the unknown primes $q_i$ is reduced to a search for a basis of a suitable lattice by means of the quadratic Gaussian reduction algorithm. This result implies that only highly imbalanced semi-primes can be factored, because $\ell(q_1) = \ell(q_2) = \alpha$, hence $\ell(p_i) > 2\,\ell(q_i)$. The authors also extended this result to $k > 2$ semi-primes, and showed that a polynomial algorithm based on the LLL algorithm [36] exists if $t \ge \alpha k/(k - 1)$. For the balanced case this result is not useful, because it means that all primes $p_i$ are identical, hence they can be easily recovered by the Euclidean algorithm. However, the authors also showed that their method can be used to factor $k$ balanced semi-primes when some additional conditions are satisfied and $n/4$ bits are discovered by brute force.
All attacks and vulnerabilities based on these results assume that the factors of vulnerable semi-primes share some identical bits. From a practical point of view, backdoors relying on shared identical bits cannot be easily concealed from anyone looking at the factors, that is, the private key. Furthermore, all the results cited in this section are based on some variants of Coppersmith's algorithms [16,17]. On the other hand, our proposed backdoors generate semi-primes whose factors have no common shared bits and do not require Coppersmith's algorithm. Therefore, they are difficult to detect and are much more efficient when applied to semi-primes of large size, such as those used in current RSA public keys.

SSB: a backdoor embedded in a single semi-prime
In this section we introduce SSB (Single Semiprime Backdoor), our proposal for a backdoor encoded in the value of a semi-prime $N$. We first describe the vulnerability and how the semi-prime is generated; then, we describe the procedure to efficiently factor it, provided that the corresponding "escrow key" is known. Finally, we analyze the theoretical and practical "efficiency" of the backdoor.

Generation of a vulnerable semi-prime
The first step of the generation of a vulnerable semi-prime is choosing an "escrow key", namely a secret value that must be known in order to detect and exploit the backdoor. The escrow key is a prime $T$ of size slightly smaller than the size of the primes in the semi-prime. Thus, if $\alpha$ is the reference bit length of the primes (e.g., $\alpha = 512$ for RSA-1024), then $\ell(T) \simeq \alpha - c$, where typically $4 \le c \le 10$ for $\alpha \le 2048$; a good value for $\alpha = 512$, $1024$, and $2048$ appears to be $c = 7$. The backdoor designer must also choose the value of a constant $K$, which must be related to the value of $\alpha$, as discussed later; in particular, we will show that a good practical choice may be $K \approx \alpha$.
In order to create a vulnerable semi-prime, two distinct primes $p$ and $q$, each of them having bit length roughly $\alpha$, must be generated. The backdoor exists whenever the following condition holds:
(H0) There exists a positive integer $k$ with $1 < k \le K$ such that $p \equiv_T k\,q$.
GetCorrelPrimes:
    Input: α, T, K
    Output: p, q
    while true
        generate random primes q and r of size α
        for k := 2 to K:
            p := r + ((k * q − r) mod T)
            if p is prime:
                return p, q
        end for
    end while

The GetCorrelPrimes algorithm in Figure 1 can be used to generate the two primes $p$ and $q$ satisfying the condition H0 above: by construction, $p \bmod T = (k\,q) \bmod T$. It is based on Dirichlet's theorem stating that there are infinitely many primes of the form $a + b\,c$ if $\gcd(a, b) = 1$ (consider $b = T$ and $a = k \cdot (q \bmod T)$). The semi-prime is then computed as $N = p\,q$.
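A minimal Python sketch of the generation algorithm follows, under stated assumptions: it is not the SageMath code used for the experiments, the sizes are toy values (32-bit primes, a 25-bit escrow key), and the helpers `is_prime` and `random_prime` are hypothetical names. The Miller-Rabin test uses the first twelve primes as bases, which is deterministic only for inputs well below the sizes used in practice. The function also returns k, purely to make the congruence easy to check.

```python
import random

def is_prime(n: int) -> bool:
    """Miller-Rabin with the first 12 primes as bases (exact for n < 3.3e24)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for b in bases:
        if n % b == 0:
            return n == b
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits: int, rng: random.Random) -> int:
    """Random odd prime with exactly `bits` bits."""
    while True:
        cand = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(cand):
            return cand

def get_correl_primes(alpha: int, T: int, K: int, rng: random.Random):
    """Return primes p, q and a multiplier k with p = k*q (mod T), 1 < k <= K."""
    while True:
        q = random_prime(alpha, rng)
        r = random_prime(alpha, rng)
        for k in range(2, K + 1):
            p = r + (k * q - r) % T      # forces p mod T == (k*q) mod T
            if p != q and is_prime(p):
                return p, q, k

rng = random.Random(2022)                # fixed seed, only for reproducibility
ALPHA, C, K = 32, 7, 32                  # toy parameters
T = random_prime(ALPHA - C, rng)         # escrow key
p, q, k = get_correl_primes(ALPHA, T, K, rng)
N = p * q
print(T, p, q, k, N)
```

Since p is obtained by adding a value smaller than T to a random prime r of the target size, its bit length stays within one bit of that size.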

Recovering procedure
The key idea of SSB, and also the proof that it works as expected, is its recovering procedure. Formally, the factors of $N$ can be efficiently recovered by knowing in advance only the semi-prime $N$ and the escrow key $T$. The values of the parameters $\alpha$, $K$, and $c$ may affect the running time of the recovering procedure; however, there is no need to know them to recover the factors.

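The recovering procedure consists of three phases: recovering the "low-level" coefficients, recovering the "high-level" coefficients, and recovering the factors.
Generally speaking, in a practical implementation of the recovering procedure it might be convenient to interleave the executions of these three phases. However, we discuss the phases independently to simplify the description of the whole procedure.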

Recovering "low-level" coefficients
At the beginning we only know $N$ and $T$. The equation $N = p\,q$ and the equation in condition H0 imply:
$$N \equiv_T (p \bmod T)\,(q \bmod T), \tag{1}$$
$$p \bmod T \equiv_T k\,(q \bmod T). \tag{2}$$
By combining them we get:
$$N \equiv_T k\,(q \bmod T)^2, \quad \text{that is,} \quad q^2 \equiv_T N\,k^{-1}. \tag{3}$$
Because $1 < k \le K$, where $K$ is a reasonably small constant, we can exhaustively test every possible value for $k$ and discard any value for which $N \cdot k^{-1}$ in the Galois field GF($T$) is a quadratic non-residue, that is, discard any value $k$ such that for all integers $\gamma \in [0, T)$, $(N \bmod T)\,(k^{-1} \bmod T) \not\equiv_T \gamma^2$. Here $k^{-1}$ denotes the value in GF($T$) such that $k \cdot k^{-1} \equiv_T 1$. At the end of this phase we have a list containing candidate values for the "low-level" coefficient $k$ and the corresponding quadratic residue $\gamma^2$ in GF($T$). The correct value of $k$ yields $\gamma^2 \equiv_T q^2$.
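The first phase can be sketched in Python as follows. This is an illustration under stated assumptions: the quadratic residue test uses Euler's criterion (a standard alternative to running a full square-root algorithm), the function name is hypothetical, and the example values are toy integers built only to satisfy the congruence of condition H0 (the factors are not checked for primality, which this phase does not rely on).

```python
def candidate_multipliers(N: int, T: int, K: int):
    """List the pairs (k, g2) with 1 < k <= K such that
    g2 = N * k^-1 mod T is a nonzero quadratic residue modulo the prime T."""
    out = []
    for k in range(2, K + 1):
        g2 = (N % T) * pow(k, -1, T) % T          # candidate value of q^2 mod T
        if g2 != 0 and pow(g2, (T - 1) // 2, T) == 1:   # Euler's criterion
            out.append((k, g2))
    return out

# Toy instance: T is a small prime, p is congruent to k_true * q modulo T.
T = 10007
k_true = 5
q_ex = 1234577
p_ex = 7 * T + (k_true * q_ex) % T
N_ex = p_ex * q_ex

candidates = candidate_multipliers(N_ex, T, 16)
print(len(candidates), (k_true, q_ex * q_ex % T) in candidates)
```

About half of the tested multipliers are expected to survive, and the correct one always does; the wrong survivors are eliminated in the following phases.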

Recovering "high-level" coefficients
Let us assume that we start this phase by knowing $N$, $T$, $k$, and $q^2 \bmod T$. Actually, we execute this phase for any candidate in the list built in the previous phase and discard any candidate as soon as it yields inconsistent results.
As a first step, we compute the square root of $\gamma^2 = q^2 \bmod T$ in GF($T$), that is, we find the values whose square is congruent to $\gamma^2$ modulo $T$, typically by means of the Tonelli-Shanks algorithm [57,58]. Because in general any square root has two distinct values in GF($T$), we get two possible values $\gamma_1$ and $\gamma_2$ for $q \bmod T$, where $\gamma_1 \equiv_T T - \gamma_2$. In the following, let $\gamma$ be either $\gamma_1$ or $\gamma_2$; we have to perform this phase with both values and discard the one that yields inconsistent results.
We can easily compute the value $p \bmod T$ from equation (2), so we may now assume to know the values $N$, $T$, $q \bmod T$, and $p \bmod T$.
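For reference, a compact Python version of the Tonelli-Shanks algorithm is sketched below. It is a standard textbook formulation, not the code used in the experiments, and it assumes that the modulus is an odd prime.

```python
def tonelli_shanks(a: int, p: int) -> int:
    """Return r with r*r = a (mod p), for an odd prime p and a quadratic residue a.
    The other square root is p - r."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        raise ValueError("a is not a quadratic residue modulo p")
    # Write p - 1 = q * 2^s with q odd.
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    # Find any quadratic non-residue z.
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1
    m, c = s, pow(z, q, p)
    t, r = pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:
        # Least i > 0 such that t^(2^i) = 1 (mod p).
        i, t2 = 0, t
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c = i, b * b % p
        t, r = t * c % p, r * b % p
    return r

root = tonelli_shanks(2, 17)
print(root, 17 - root)
```

Both returned roots must be tried as candidate values of q mod T, as explained above.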
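The semi-prime $N$ can be written as:
$$N = p\,q = \bigl(\pi\,T + (p \bmod T)\bigr)\,\bigl(\nu\,T + (q \bmod T)\bigr), \tag{4}$$
where $\pi = \lfloor p/T \rfloor$ and $\nu = \lfloor q/T \rfloor$ are the "high-level" coefficients; that is, if $\delta = (N - (p \bmod T)\,(q \bmod T))/T$:
$$\delta = \pi\,\nu\,T + \pi\,(q \bmod T) + \nu\,(p \bmod T). \tag{5}$$
From the last equation we easily get the following bounds:
$$\pi\,\nu \le \delta/T < N/T^2, \tag{6}$$
$$(\pi + 1)\,(\nu + 1) > N/T^2. \tag{7}$$
Therefore, $\ell(\pi) + \ell(\nu) \simeq \ell(\pi\,\nu) \approx 2c$. Because by construction $c$ is a small constant, we can adopt a brute force approach to discover the missing "high-level" coefficients $\pi$ and $\nu$. The brute force search guesses the value of the sum $\pi + \nu$, starting from the lower bound $2\,(\sqrt{N/T^2} - 1)$ (from equation (7)) and ending at the upper bound $N/T^2 \approx 2^{2c}$ (from equation (6)).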
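For any candidate value of the sum $\pi + \nu$, let us transform equation (5) by introducing an unknown $x = \pi$ and a constant $C = \pi + \nu$, so that $\nu = C - x$:
$$T\,x^2 - \bigl(C\,T + (q \bmod T) - (p \bmod T)\bigr)\,x + \bigl(\delta - C\,(p \bmod T)\bigr) = 0.$$
Because we are looking for integer solutions for $x$ and $C - x$, the brute force attack just tries all values for $C$, in increasing order, and immediately discards any value such that the discriminant
$$\Delta = \bigl(C\,T + (q \bmod T) - (p \bmod T)\bigr)^2 - 4\,T\,\bigl(\delta - C\,(p \bmod T)\bigr)$$
is not a perfect square. If the value of $C$ survives, the solutions $x = \bigl(C\,T + (q \bmod T) - (p \bmod T) \pm \sqrt{\Delta}\bigr)/(2T)$ are computed; if either one of the solutions is an integer, the pair $(x, C - x) = (\pi, \nu)$ is recorded as a candidate solution.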
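The brute force search over the sum of the two "high-level" coefficients can be sketched in Python as follows. The function name and the toy values are assumptions made for illustration; the residues p mod T and q mod T are taken as already known from the previous steps, and the example factors are not required to be prime for the arithmetic to work. The search bounds are slightly widened with integer square roots so that no valid sum is missed.

```python
from math import isqrt

def recover_high_level(N: int, T: int, a: int, b: int):
    """Given a = p mod T and b = q mod T, return all pairs (pi, nu) such that
    N == (pi*T + a) * (nu*T + b)."""
    delta, rem = divmod(N - a * b, T)
    if rem != 0:
        return []                                  # inconsistent residues
    lo = max(0, 2 * (isqrt(N // (T * T)) - 1))     # lower bound on pi + nu
    hi = N // (T * T) + 1                          # upper bound on pi + nu
    found = []
    for C in range(lo, hi + 1):
        B = C * T + b - a
        disc = B * B - 4 * T * (delta - C * a)     # discriminant of the quadratic
        if disc < 0:
            continue
        s = isqrt(disc)
        if s * s != disc:
            continue                               # not a perfect square
        for num in (B + s, B - s):
            if num >= 0 and num % (2 * T) == 0:
                x = num // (2 * T)
                if 0 <= x <= C and (x, C - x) not in found:
                    found.append((x, C - x))
    return found

# Toy instance with known coefficients.
T = 10007
a_ex, b_ex = 1234, 4321
p_ex = 101 * T + a_ex
q_ex = 77 * T + b_ex
N_ex = p_ex * q_ex
solutions = recover_high_level(N_ex, T, a_ex, b_ex)
print(solutions)
```

Every integer solution of the quadratic reproduces the semi-prime when plugged back into the two factor expressions, so the final verification step only has to multiply the reconstructed values.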

Recovering the factors
When this phase starts, we know N , T , p mod T , q mod T , and a list of candidate solutions (π, ν).
For any candidate solution $(\pi, \nu)$, we compute the corresponding $p = \pi\,T + (p \bmod T)$ and $q = \nu\,T + (q \bmod T)$, then we simply verify whether $p \cdot q = N$. One of the candidate solutions certainly yields a factorization of the semi-prime.

Analysis
We briefly describe here the time complexity of SSB's recovering procedure. As explained in the previous subsection, the procedure starts by recovering the "low-level" coefficients by means of an exhaustive search among $O(K)$ possible values for $k$. For every candidate value we must execute some operations in GF($T$) whose cost is in $O((\log T)^2) = O(\alpha^2)$, and also the Tonelli-Shanks algorithm to determine if a value $< T$ is a quadratic residue, which costs $O((\log T)^3) = O(\alpha^3)$ [59]. The list of candidate values for $k$ has expected length $K/2$, because in a finite field with an odd number of elements any nonzero quadratic residue has two square roots, thus about half of the elements of the field are not the square of another element. Therefore, the "high-level" coefficients recovery phase is executed on $O(K)$ candidate values for $k$ and includes an exhaustive search in an interval of size $O(2^{2c})$; in every iteration we execute a few integer operations on values of bit length $\approx 2(\alpha + c)$; hence, every execution of this phase has a cost in $O(2^{2c}\,(\alpha + c)^2)$. Finally, the cost of every execution of the third phase is dominated by two multiplications of values of bit length $\approx \alpha - c$, hence it is in $O(\alpha^2)$. Summing up, the worst-case cost of the whole recovering procedure is in $O(K\,(\alpha + c)^3\,2^{2c})$.
The values of the parameters K and c are chosen by the backdoor designer.We would expect that larger values of K and c yield smaller running times for the search algorithm in Figure 1 and longer running times for the recovery procedure; this intuition is confirmed by the experiments.Anyway, the value of c cannot be made too large, or it would be possible to discover the backdoor by just guessing the design key T of bit length (T ) = α − c.By letting K ∈ O(α) and c ∈ O(log α), for instance K ≈ α and c = 7 as suggested in subsection 4.1, we obtain a running time for the recovery procedure in O(α 4 ), that is, polynomial in the size of the semi-prime.

Experimental results
In order to confirm that the backdoor works as expected and to assess the execution times with respect to the designer's parameters, we implemented SSB in SageMath [60] and performed extensive tests.² In particular, we considered three values for α: 512 (the size of factors for RSA-1024), 1024 (RSA-2048), and 2048 (RSA-4096). All tests have been performed by choosing c = 7. This means that the escrow keys have sizes 505, 1017, and 2041 bits, respectively. The value of c is so small that detecting the existence of the backdoor by simply guessing the value of the escrow key does not appear to be significantly easier than guessing one of the factors of the corresponding semi-primes. Every test trial involves choosing a value for the parameter K, generating an escrow key T and a vulnerable semi-prime, then recovering the factors of the semi-prime by just using the values of the semi-prime and the escrow key. We executed the tests by varying the parameter K so as to determine a value yielding both fast generation of vulnerable semi-primes and reasonably quick recovery of the factors.
The tests have been executed on three computational nodes with 16 physical cores Intel Xeon E5-2620 v4 running at 2.1 GHz with 64 GiB of RAM. The nodes are based on the Slackware 14.2 software distribution with Linux kernel version 5.4.78 and SageMath version 9.1. All tests have properly recovered the factors of the vulnerable semi-primes. Each value of K ∈ {100 · i | i = 1, …, 50} has been tested 20 times. The SageMath code is sequential, that is, each test trial runs on a single computation core. We report in Table 1 and Figure 2 averages and standard deviations of the running times.
The experimental results confirm that the value of K is crucial in determining both the time required to generate a vulnerable semi-prime and the time required to recover the factors. Even if the code has not been optimized at all, the recovery time is reasonably small for all tested values of K, hence SSB is a practically effective backdoor. However, generation time is also very important whenever the backdoor mechanism has to be hidden in hardware devices or software programs that are supposed to yield robust, legitimate semi-primes. While in general larger values of K are associated with smaller generation times, there seems to be a threshold value for K above which the generation times are essentially constant and near the minimum observed value. From the values shown in Table 1 and Figure 2

TSB: a backdoor embedded in a pair of semi-primes
In this section we describe TSB (Twin Semiprime Backdoor), our proposal for a backdoor embedded in the values of a pair of semi-primes N₁ and N₂. These semi-primes are typically generated on the same device but can be used independently. For instance, the two semi-primes might be used in two different RSA keys.³ Alternatively, one of the semi-primes can be used to build an RSA key while the other one would be separately stored as an escrow key for the RSA key.
We first describe how the two semi-primes are generated. Then we describe the procedure to efficiently factor both semi-primes, provided that the corresponding "designer key" is known.

Generation of the vulnerable pair of semiprimes
The first step of the generation of a vulnerable pair is choosing a "designer key", namely a secret value that must be known in order to detect and exploit the backdoor. The designer key is a prime T of size slightly smaller than the size of the primes in each semi-prime. Thus, if α is the reference bit length of the primes (e.g., α = 512 for RSA-1024), then the bit length of T is about α − c, where typically 4 ≤ c ≤ 10 for α ≤ 2048; a good value for α = 512, 1024, and 2048 appears to be c = 7. The backdoor designer must also choose the values of two constants K and B. The value of K is related to the value of α, as discussed later; typically, we may set K ≈ α/5, e.g., K = 100, 200, and 400 for α = 512, 1024, and 2048, respectively. The constant B < T acts as a detection threshold, so any value for B whose bit length is about α − 2c is valid.
In order to create a vulnerable pair, four distinct primes p₁, q₁, p₂, q₂, each of them having bit length roughly α, must be generated. The backdoor exists whenever the following conditions hold: (H1) There exists a positive integer h with 1 < h ≤ K such that q₂ ≡ h² q₁ (mod T).
³ It is not hard to justify the generation of two different RSA keys. For instance, the user might be told that one RSA key is for business or work usage and the other one is for personal usage.
The algorithm in Figure 3 can be used to generate the four primes p₁, q₁, p₂, and q₂ satisfying the conditions H1-H6 above. Once more, the algorithm is implicitly based on Dirichlet's theorem, which states that there are infinitely many primes of the form a + n b (n = 0, 1, 2, …) when gcd(a, b) = 1.
Finally, the semi-primes are computed as N₁ = p₁ q₁ and N₂ = p₂ q₂. Observe that N₁ and N₂ are coprime, because all factors are necessarily different by construction.
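As an illustration of this first step, the following Python sketch draws a probable prime T of bit length α − c and a threshold B of bit length α − 2c. The helper names (is_probable_prime, random_prime, choose_designer_parameters) are hypothetical and do not come from the SageMath implementation; the Miller-Rabin test merely stands in for whatever primality test an actual generator would use, and taking B as the smallest value of the required bit length is only one of the admissible choices.

```python
import secrets

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for sp in SMALL_PRIMES:
        if n % sp == 0:
            return n == sp
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True

def random_prime(bits):
    """Random probable prime with exactly `bits` bits (bits >= 2)."""
    while True:
        cand = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(cand):
            return cand

def choose_designer_parameters(alpha, c=7):
    """Return (T, B): T prime of alpha - c bits, B of alpha - 2c bits (assumed choice)."""
    T = random_prime(alpha - c)
    B = 1 << (alpha - 2 * c - 1)
    return T, B
```

For α = 512 and c = 7 this would produce a 505-bit designer key, matching the key size reported in the experiments.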

Recovering procedure
The key idea of TSB, and also the proof that it works as expected, is its recovering procedure. Formally, the factors of N₁ and N₂ can be efficiently recovered by knowing in advance only the pair of semi-primes (N₁, N₂) and the designer key T. The values of the parameters α, K, and c may affect the running time of the recovering procedure; however, there is no need to know them to recover the factors.
Generally speaking, in a practical implementation of the recovering procedure it might be convenient to interleave the execution of its phases. However, we discuss the phases independently to simplify the description of the whole procedure.
GeneratePair:
  Input: α, c, K, T
  Output: p1, q1, p2, q2
  do
    generate random primes q1,

GetCorrelPrime:
  Input: α, q, j, T, K, c
  Output: p, k
  while true
    k := random value between 2 and K
    t1 := (k j q) mod T
    for p := t1 + 2^(c−3) to t1 + 2^(2c−2)
      if p ≡ t1 (mod T) and p is prime then
        return p, k
      end if
  end while

Recovering "medium-level" coefficients
When starting the recovering procedure we assume to know the following data: N₁, N₂, and the "secret" prime T.
Equations in conditions H1, H2, and H3 enforce the following congruences of N₁ and N₂ modulo T:
  N₁ ≡ h³ k₁ q₁² (mod T)   (8)
  N₂ ≡ h² k₂ q₁² (mod T)   (9)
It turns out that N₁ and N₂ are congruent modulo T to two values that have a big common factor, h² q₁². However, the Euclidean algorithm on N₁ mod T and N₂ mod T does not really help here: the greatest common divisor is relative to the lifted images of the products in the Galois field GF(T), and it is not related to the greatest common divisor of the products h³ k₁ q₁² and h² k₂ q₁² in Z.
To overcome this problem, observe that equations (8) and (9) also imply the following ones:
  N₁ ≡ (h k₁) · ((h² q₁²) mod T) (mod T)   (10)
  N₂ ≡ k₂ · ((h² q₁²) mod T) (mod T)   (11)
and therefore there exist two integers k̃₁, k̃₂ such that
  k̃₁ T + (N₁ mod T) = (h k₁) · ((h² q₁²) mod T)   (12)
  k̃₂ T + (N₂ mod T) = k₂ · ((h² q₁²) mod T)   (13)
From the last two equations we derive:
  gcd(k̃₁ T + (N₁ mod T), k̃₂ T + (N₂ mod T)) = ((h² q₁²) mod T) · gcd(h k₁, k₂)   (14)
which reduces to (h² q₁²) mod T when h k₁ and k₂ are coprime. Observe that dropping N₁ mod T from equation (12) yields k̃₁ T ≤ (h k₁) · ((h² q₁²) mod T) < h k₁ T, hence k̃₁ < h k₁ ≤ K². Similarly, from equation (13): k̃₂ < k₂ ≤ K. Hence, the sizes of the "medium" coefficients k̃₁ and k̃₂ are so small that they can be quickly recovered by a brute force approach as in Figure 4.
It is possible to recognize the proper values of k̃₁ and k̃₂ because the size of (h² q₁²) mod T produced by the gcd with the right values is usually much larger than the average value resulting from a gcd with random wrong values. In fact, by condition H6, (h² q₁²) mod T > B; hence we select any candidate pair of medium-level coefficients (k̃₁, k̃₂) for which the greatest common divisor in equation (14) is between B and T. Moreover, the value returned by the Euclidean algorithm with the right values must be a square in the Galois field GF(T), hence we may use this condition to filter some false positives. In all our test cases, the first value found by this brute force procedure yields a proper factorization result.
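The search of Figure 4 can be sketched in Python as follows. This is only an illustrative reading of the pseudocode: the function name find_medium_coefficients is hypothetical, the cap max_s replaces the unbounded outer loop, and Euler's criterion is used here as the test for "being a square in GF(T)", which assumes that T is an odd prime.

```python
from math import gcd

def find_medium_coefficients(N1, N2, T, B, max_s):
    """Enumerate pairs (kt1, kt2) by increasing sum and keep those whose
    gcd lies strictly between B and T and is a square modulo T."""
    r1, r2 = N1 % T, N2 % T
    found = []
    for s in range(max_s + 1):
        for kt1 in range(s + 1):
            kt2 = s - kt1
            gg = gcd(kt1 * T + r1, kt2 * T + r2)
            # Euler's criterion: gg is a nonzero square mod T
            # iff gg^((T-1)/2) == 1 (mod T).
            if B < gg < T and pow(gg, (T - 1) // 2, T) == 1:
                found.append((kt1, kt2, gg))
    return found
```

Each returned triple carries the gcd value, which is the candidate for (h² q₁²) mod T used by the following phases.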

Recovering "low-level" coefficients
The previous phase might determine several candidate pairs of medium-level coefficients, and the current phase must be applied to each of them.
Let us assume at this point to know the following data: N₁, N₂, T, k̃₁, k̃₂, and the value γ² = (h² q₁²) mod T derived from equation (14). The value of the "low-level" coefficient k₂ can be immediately computed by using equation (13):
  k₂ = (k̃₂ T + (N₂ mod T)) / γ²
or, assuming K < T,
  k₂ = (N₂ · (γ²)⁻¹) mod T.
On the other hand, by inverting equation (12) we get the value of the product h k₁:
  h k₁ = (k̃₁ T + (N₁ mod T)) / γ²
or, assuming K < T, h k₁ = (N₁ · (γ²)⁻¹) mod T.
Since both h and k₁ are not greater than K, their product does not exceed K². Moreover, by condition H4, gcd(h, k₁) = 1. Because the number of multiplicative partitions of this product does not exceed K² [61,62], we may exhaustively generate all possible candidate pairs (h, k₁) and apply the forthcoming phases to each of them. When these phases are performed on the true pair (h, k₁), a proper factorization of N₁ and N₂ is computed.
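A minimal Python sketch of this phase is given below, under stated assumptions: the function names are hypothetical, the modular inverse is obtained with the three-argument pow (Python 3.8 or later), and the lower bound 2 on h and k₁ follows the ranges used in condition H1 and in Figure 3. The products are recovered as residues modulo T, so the sketch implicitly assumes that h k₁ and k₂ are smaller than T.

```python
from math import gcd

def low_level_products(N1, N2, T, gamma_sq):
    """Return (h*k1, k2) as residues modulo T, given gamma_sq = (h^2 q1^2) mod T."""
    inv = pow(gamma_sq, -1, T)  # modular inverse of gamma_sq in GF(T)
    return (N1 * inv) % T, (N2 * inv) % T

def candidate_pairs(hk1, K):
    """All coprime pairs (h, k1) with h * k1 == hk1 and 2 <= h, k1 <= K."""
    pairs = []
    for h in range(2, K + 1):
        if hk1 % h == 0:
            k1 = hk1 // h
            if 2 <= k1 <= K and gcd(h, k1) == 1:
                pairs.append((h, k1))
    return pairs
```

Every pair returned by candidate_pairs would then be fed to the high-level recovery phase, and the wrong ones are discarded when no consistent factorization is found.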

Recovering "high-level" coefficients
Let us assume that we start this phase by knowing the following data: N₁, N₂, T, h, k₁, k₂, and γ².
As a first step, we compute the square roots of γ² = (h² q₁²) mod T in GF(T), that is, we find the values whose square is congruent to γ² modulo T, typically by means of the Tonelli-Shanks algorithm [57,58]. Because in general a quadratic residue has two distinct square roots in GF(T), we get two possible values γ₁ and γ₂ for (h q₁) mod T, where γ₁ ≡ T − γ₂ (mod T). In the following, let γ be either γ₁ or γ₂; we have to perform this phase with both values and discard the one that yields inconsistent results.
We can now compute the value q₁ mod T, because γ = (h q₁) mod T means:
  q₁ mod T = (γ · h⁻¹) mod T
where obviously h⁻¹ is computed in GF(T), that is, h h⁻¹ ≡ 1 (mod T).
From the equation in condition H1 we can now infer the value q₂ mod T, because:
  q₂ mod T = (h² · (q₁ mod T)) mod T
Also, from the equations in conditions H2 and H3 we compute p₁ mod T and p₂ mod T:
  p₁ mod T = (h³ k₁ · (q₁ mod T)) mod T
  p₂ mod T = (k₂ · (q₁ mod T)) mod T
At this point we know the values N₁, N₂, T, q₁ mod T, q₂ mod T, p₁ mod T, and p₂ mod T. The semi-prime Nᵢ (i ∈ {1, 2}) can be written as:
  Nᵢ = (πᵢ T + (pᵢ mod T)) · (νᵢ T + (qᵢ mod T))
that is, if δᵢ = (Nᵢ − (pᵢ mod T) (qᵢ mod T)) / T:
  δᵢ = πᵢ νᵢ T + πᵢ (qᵢ mod T) + νᵢ (pᵢ mod T)
From the last equation we easily get bounds on the product πᵢ νᵢ and, therefore, on the sum πᵢ + νᵢ. Because by construction c is a small constant, we can adopt a brute force approach to discover the missing "high-level" coefficients πᵢ and νᵢ. The brute force search guesses the value of the sum πᵢ + νᵢ, starting from the lower bound 2(√(Nᵢ/T²) − 1) (from equation (23)) and ending at the upper bound Nᵢ/T² ≈ 2^(2c) (from equation (22)).
For any candidate value of the sum πᵢ + νᵢ, let us transform equation (21) by introducing an unknown x and writing the two coefficients as x and C − x, where C is the candidate sum. Because we are looking for integer solutions for x and C − x, the brute force attack just tries all values for C, in increasing order, and immediately discards any value for which the discriminant of the resulting quadratic equation in x is not a perfect square. If the value of C survives, the solutions are computed; if either one of the solutions is an integer, the pair (x, C − x) = (πᵢ, νᵢ) is recorded as a candidate solution.
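The two steps above (square roots of γ² in GF(T), then division by h) can be sketched in Python as follows. The code is a textbook formulation of the Tonelli-Shanks algorithm rather than the SageMath routine used in the experiments, and the function names tonelli_shanks and q1_residues are hypothetical; T is assumed to be an odd prime.

```python
def tonelli_shanks(n, p):
    """Return r with r*r % p == n % p for an odd prime p, or None if n is a non-residue."""
    n %= p
    if n == 0:
        return 0
    if pow(n, (p - 1) // 2, p) != 1:
        return None  # Euler's criterion: n is not a square mod p
    # Write p - 1 = q * 2^s with q odd.
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    # Find any quadratic non-residue z.
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1
    m = s
    g = pow(z, q, p)
    t = pow(n, q, p)
    r = pow(n, (q + 1) // 2, p)
    while t != 1:
        # Least i > 0 such that t^(2^i) == 1 (mod p).
        i, t2 = 0, t
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        b = pow(g, 1 << (m - i - 1), p)
        m = i
        g = b * b % p
        t = t * g % p
        r = r * b % p
    return r

def q1_residues(gamma_sq, h, T):
    """Candidate values of q1 mod T: (gamma * h^-1) mod T for both square roots gamma."""
    root = tonelli_shanks(gamma_sq, T)
    if root is None:
        return []
    h_inv = pow(h, -1, T)  # requires Python 3.8+
    return sorted({(g * h_inv) % T for g in (root, T - root)})
```

Both returned residues have to be tried, since only one of them corresponds to the true value of q₁ mod T.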

Recovering the factors
When this phase starts, we know Nᵢ, T, pᵢ mod T, qᵢ mod T, and a list of candidate solutions (πᵢ, νᵢ), for i = 1, 2. We work on every semi-prime separately.
For any candidate solution (πᵢ, νᵢ), we compute the corresponding pᵢ = πᵢ T + (pᵢ mod T) and qᵢ = νᵢ T + (qᵢ mod T), then we simply verify whether pᵢ · qᵢ = Nᵢ. One of the candidate solutions certainly yields a factorization of the semi-prime.
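The last two phases can be combined in a short Python sketch that handles one semi-prime at a time. It is an illustration under assumptions, not the authors' code: the function name recover_factors is hypothetical, the quadratic equation is the one obtained by substituting ν = C − π into δ = π ν T + π (q mod T) + ν (p mod T), and the search over the sum C simply runs from 0 to a caller-supplied cap max_sum instead of using the tighter bounds discussed in the text.

```python
from math import isqrt

def recover_factors(N, T, p_mod, q_mod, max_sum):
    """Find p = pi*T + p_mod and q = nu*T + q_mod with p*q == N, or return None."""
    delta, rem = divmod(N - p_mod * q_mod, T)
    if rem != 0:
        return None  # the residues are inconsistent with N
    for C in range(max_sum + 1):
        # pi is a root of T*x^2 - (C*T + q_mod - p_mod)*x + (delta - C*p_mod) = 0
        lin = C * T + q_mod - p_mod
        disc = lin * lin - 4 * T * (delta - C * p_mod)
        if disc < 0:
            continue
        root = isqrt(disc)
        if root * root != disc:
            continue  # discriminant is not a perfect square
        for num in (lin + root, lin - root):
            if num < 0 or num % (2 * T) != 0:
                continue  # the solution is not a non-negative integer
            pi = num // (2 * T)
            nu = C - pi
            if nu < 0:
                continue
            p, q = pi * T + p_mod, nu * T + q_mod
            if p * q == N:
                return p, q
    return None
```

For instance, with T = 101, N = 211 · 307, p mod T = 9 and q mod T = 4, the sketch finds the sum C = 5 and returns the two factors.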

Analysis
We briefly describe here the time complexity of TSB's recovering procedure. As already explained, the procedure starts by recovering the "medium-level" coefficients by means of an exhaustive search among K³ possible values for the pair (k̃₁, k̃₂). For every candidate pair we must execute the Euclidean algorithm on values of bit length up to about that of k̃₁ T, which costs O(log(k̃₁ T)) = O(log k̃₁ + log T) = O(log K + α − c). We may also use the Tonelli-Shanks algorithm to determine if a value < T is a quadratic residue, which costs O((log T)³) = O(α³) [59]. The "low-level" coefficients recovery phase involves a couple of integer divisions on values ≈ k̃₁ T, a factorization of a value < K², and the generation of up to K² candidate pairs (h, k₁); hence, each execution of this recovery phase has a cost in O(α² + K²). The "high-level" coefficients recovery phase includes an exhaustive search in an interval of size O(2^(2c)); in every iteration we execute a few integer operations on values of bit length ≈ 2(α + c); hence, every execution of this phase has a cost in O(2^(2c) (α + c)²). Finally, the cost of every execution of the fourth phase is dominated by four multiplications of values of bit length ≈ α − c, hence it is in O(α²). Summing up, the worst-case cost of the whole recovering procedure is in O(K⁵ (α + c)² 2^(2c)).
The values of the parameters K and c are chosen by the backdoor designer. It is easy to observe that larger values of K and c yield shorter running times for the search algorithm in Figure 3 and longer running times for the recovery procedure. However, the value of c cannot be made too large, or it would be possible to discover the vulnerability by just guessing the designer key T, whose bit length is α − c. On the other hand, experimental results show that larger values of c do not necessarily yield shorter times for the generation phase. By letting K ≈ α/5 and c = 7, as suggested in subsection 5.1, we obtain a running time for the recovery procedure in O(α⁷), that is, polynomial in the size of the semi-primes.
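To get a feeling for how the worst-case bound K⁵ (α + c)² 2^(2c) grows with the parameters, the following Python sketch evaluates its base-2 logarithm for the parameter choices mentioned above. The function name tsb_bound_log2 is hypothetical, and the figures it prints ignore all constant factors, so they are not estimates of actual running times.

```python
from math import log2

def tsb_bound_log2(alpha, K, c):
    """log2 of K^5 * (alpha + c)^2 * 2^(2c), with constant factors ignored."""
    return 5 * log2(K) + 2 * log2(alpha + c) + 2 * c

if __name__ == "__main__":
    # K is about alpha/5 and c = 7, as suggested in the text.
    for alpha, K in ((512, 100), (1024, 200), (2048, 400)):
        print(alpha, K, round(tsb_bound_log2(alpha, K, 7), 1))
```

Doubling α (and hence K) adds 5 + 2 = 7 to the logarithm, up to the small effect of the additive constant c, which is the O(α⁷) growth stated above.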

Experimental results
In order to confirm that the backdoor works as expected and to assess the execution times with respect to the designer's parameters, we implemented TSB in SageMath [60] and performed extensive tests.⁴ In particular, we considered three values for α: 512 (the size of factors for RSA-1024), 1024 (RSA-2048), and 2048 (RSA-4096). All tests have been performed by choosing c = 7. This means that the designer keys have sizes 505, 1017, and 2041 bits, respectively. The value of c is so small that detecting the existence of the backdoor by simply guessing the value of the designer key does not appear to be significantly easier than guessing one of the factors of the corresponding semi-primes. Every test trial involves choosing a value for the parameter K, generating a designer key T and a pair of vulnerable semi-primes, then recovering the factors of the semi-primes by just using the values of the semi-primes and the designer key. We executed the tests by varying the parameter K so as to determine a value yielding both fast generation of vulnerable semi-primes and reasonably quick recovery of the factors.
The tests have been executed on the same computational nodes described in Section 4. All tests have properly recovered the factors of the vulnerable semi-primes. Each value of K ∈ {10 · i | i = 1, …, 40} has been tested 20 times. The SageMath code is sequential, that is, each test trial runs on a single computation core. We report in Table 2 and Figure 5 averages and standard deviations of the running times.
The value of K is crucial in determining both the time required to generate a pair of semi-primes and the time required to recover the factors. The experimental results show that, even if the SageMath code is not optimized, the recovery time is reasonably small for all tested values of K, hence TSB is a practically effective backdoor. However, generation time is also very important whenever the backdoor mechanism has to be hidden in hardware devices or software programs that are supposed to yield robust, legitimate semi-primes. While in general larger values of K are associated with smaller generation times, there seems to be a threshold value for K above which the generation times are essentially constant and near the minimum observed value. From the values shown in Figure 5

Conclusions
We presented a new idea for designing backdoors in cryptographic systems based on the integer factorization problem. The idea consists in introducing some mathematical relations among the factors of the semi-primes based on congruences modulo a large prime chosen by the designer. A first algorithm, SSB, can be used to implement a symmetric backdoor, hence the designer key acts as a pure escrow key that must be kept hidden from the owner of the generated keys (in order to hide the vulnerability) and from third-party attackers. Another proposed algorithm, TSB, injects a vulnerability in a pair of distinct semi-primes and may be used to implement both a symmetric backdoor and an asymmetric one. It is interesting to observe, however, that it does not seem to be hard to plug an asymmetric cipher into both SSB and TSB, similarly to the mechanism implemented in [18] for Anderson's backdoor; this may be a future evolution of the present work.
We implemented both SSB and TSB in SageMath and conducted extensive experiments to determine optimal values for a crucial parameter of the algorithms, which basically sets a trade-off between the generation time of the vulnerable semi-primes and the recovery time when exploiting the backdoors. The SageMath code has not been optimized at all; however, even for large RSA-4096 keys the recovery time is reasonably small (a few hours, at worst, on a single computation core).
A crucial point is minimizing the generation time of the vulnerable semi-primes. Our generation algorithms are not very sophisticated or optimized, because basically they generate random values in the hope of finding the proper primes satisfying the mathematical conditions of the backdoors. We would like to get generation times similar to those of legitimate public key generators. However, an analysis of the performance of semi-prime generators likely depends on the characteristics of the underlying pseudo-random number generators, which also may depend on external factors such as the amount of entropy collected by the system (see for example Linux's PRNG). Such analysis is not simple, hence it has to be deferred to future work.

[…] experimental evaluation of our algorithms. We also thank E. Ingrassia and P. Santucci for their valuable comments and suggestions.

Declaration of interests
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Figure 1: Generation of a vulnerable semi-prime with escrow key T

Figure 2: SSB: average running times for α = 512, 1024, 2048 (20 trials for each value of K). Magnitudes of the standard deviations are shown as vertical bars.

Figure 3: Generation of a vulnerable pair of semi-primes with designer key T

Input: N1, N2, T
Output: a list of pairs (k̃1, k̃2)
for s := 0 to ∞
  for k̃1 := 0 to s
    k̃2 := s − k̃1
    gg := gcd(k̃1 · T + N1 mod T, k̃2 · T + N2 mod T)
    if (B < gg < T) and (gg ≡ γ² (mod T) for some γ) then
      add (k̃1, k̃2) to the list of coeff.
    end if
  end for
end for

Figure 4: Brute-force search of the medium-level coefficients

Figure 5: TSB: average running times for α = 512, 1024, 2048 (20 trials for each value of K). Magnitudes of the standard deviations are shown as vertical bars.