Mathematical Attack of RSA by Extending the Sum of Squares of Primes to Factorize a Semi-Prime

Abstract: The security of RSA relies on the computationally challenging factorization of the RSA modulus $N = p_1 p_2$, with $N$ being a large semi-prime consisting of two primes $p_1$ and $p_2$, for the generation of RSA keys in commonly adopted cryptosystems. The property of $p_1$ and $p_2$, both congruent to 1 mod 4, is used in Euler's factorization method to theoretically factorize them. While this caters to only a quarter of the possible combinations of primes, the rest of the combinations congruent to 3 mod 4 can be found by extending the method using Gaussian primes. However, based on Pythagorean primes that are applied in RSA, the semi-prime has only two sums of two squares in the range of possible squares $\sqrt{N-1}$, $\sqrt{N/2}$. As $N$ becomes large, finding the two sums of two squares becomes computationally intractable in the practical world. In this paper, we apply Pythagorean primes to explore how the number of sums of two squares in the search field can be increased, thereby increasing the likelihood that a sum of two squares can be found. Even though many such sums of squares may exist, we show that finding any two of them is sufficient to factorize the original semi-prime. We present the algorithm showing the simplicity of steps that use rudimentary arithmetic operations requiring minimal memory, with search cycle time being a factor for very large semi-primes, which can be contained. We demonstrate the correctness of our approach with practical illustrations for breaking RSA keys. Our enhanced factorization method is an improvement on our previous work, with results compared to other factorization algorithms, and continues to be an ongoing area of our research.


Introduction
The RSA (Rivest-Shamir-Adleman) public-key primitive, named after its inventors, has been the most widely used in cryptography since its introduction in 1977 [1]. The generation of RSA keys in public key cryptosystems is based on the modulus of a positive integer $N = p_1 p_2$, where $N$ is a semi-prime, which is a product of two large primes $p_1$ and $p_2$ [2,3]. The computationally intractable factorization property of semi-primes when they are large has been the fundamental premise in using RSA keys for computer security in several applications [4][5][6]. Modern cryptosystems make wide use of RSA encryption and digital signatures for secure message exchange and communication over different types of networks within government as well as various industry sectors [7][8][9]. These include popular transactions in various applications including Internet of Things (IoT) such as mobile banking, online shopping, smart card payments, e-health and e-mail communications that are available to the general public over the Internet [10,11]. Despite the difficulty in breaking RSA keys (i.e. semi-prime factorization), cybercrimes and RSA key attacks are still on the rise [12][13][14].
A cryptanalytic attack on a short RSA key by M. J. Wiener was established as the first of its kind in 1990 [15,16]. Hence, increasing the difficulty of factoring the RSA modulus $N$ by choosing strong prime factors $p_1$ and $p_2$ was considered as a solution to address these attacks. Since then, it has become a common practice to employ 512-bit RSA to provide the required strong primes for many cryptographic security protocols [17]. Even though a 512-bit RSA modulus was first factored in 1999, with the high computational power of today, there is still difficulty with 512-bit factorization in real-time applications [18]. Therefore, 512-bit RSA keys are actively used by popular protocols for email authentication such as Domain Keys Identified Mail (DKIM), retrieving messages from email servers by the client such as Post Office Protocol 3 (POP3), Simple Mail Transfer Protocol (SMTP) and Internet Message Access Protocol (IMAP), data exchange on the web such as Hypertext Transfer Protocol Secure (HTTPS), connecting two computers such as Secure Shell (SSH), and providing privacy such as Pretty Good Privacy (PGP) [19][20][21].
Enthusiastic amateurs have tried to factor 512-bit RSA keys and in 2012, Zachary Harris factored the 512-bit DKIM RSA keys used by Google and several other major companies in 72 hours per key, by using distributed cloud services [22][23][24]. The Factoring as a Service (FaaS) project demonstrates that a 512-bit RSA key can be factored reliably within four hours in a public cloud environment [19]. Factorization of the 768-bit RSA key has also been demonstrated using various methods such as the number field sieve factoring method and is several thousand times more difficult than 512-bit RSA keys. The higher the key size, the more difficult it is to factor the RSA key. Hence, with new methods and progress in computing power over time, there are risks and implications for the future of RSA. This forms the motivation for studying the mathematical principles of semi-prime factorization in proposing a novel method to increase the likelihood of breaking an RSA key.
Euler's factorization of a semi-prime $N = p_1 p_2$ is based on the property of $p_1$ and $p_2$ both being congruent to 1 mod 4 [25]. These constructions are based on Pythagorean primes that are applied in RSA [26][27][28]. These contribute to only a quarter of the possible combinations of primes, and the rest of the combinations congruent to 3 mod 4 are found based on the Gaussian prime extension method [29]. The semi-prime has only two sums of two squares in the range of possible squares from $\sqrt{N-1}$ to $\sqrt{N/2}$, and therefore we extend previously established methods to increase the likelihood that a sum of two squares can be found [30,31]. Even though many sums of squares exist, we show that finding any two of them is sufficient to factorize the original semi-prime. Our enhanced factorization method is an improvement of our previous work. We apply our method to case scenarios and provide the necessary conjectures. Our algorithm is practically simple and is implemented using rudimentary arithmetic operations that require minimal computational memory, with search cycle time being a factor to be considered for very large semi-primes. Further, we demonstrate the successful breaking of RSA keys such as 768-bit RSA, verified through the implementation of our algorithm in Java. We provide the results highlighting the complexity of our enhanced factorization algorithm and comparing its performance with other factorization algorithms, laying the scope for future research as well.
The rest of the paper is organized as follows. Section 2 discusses related work and the uniqueness of our work. Section 3 provides our enhanced semi-prime factorization by extending the sum of squares method and specifies our algorithm with implementation steps. In addition, the performance of the algorithm, its complexity and comparison with other factoring methods are described. Section 4 demonstrates the correctness of our algorithm when applied to break the 768-bit RSA key. Section 5 discusses the results and finally conclusions along with future research directions are given in Section 6.

Background
The difficulty of the semi-prime factorization problem is essential to the security of an RSA cryptosystem. Revisiting RSA cryptography [1,3,17], assuming that $p_1$ and $p_2$ are two primes used to generate a semi-prime $N = p_1 p_2$, Euler's totient function is given by $\varphi(N) = (p_1 - 1)(p_2 - 1)$. In RSA-based public key cryptography, two different keys known as public and private keys are used to perform the encryption and decryption of data or messages [32,33]. Any sensitive information is encrypted with a public key and it requires a matching private key to decrypt it. The public key is chosen as a pair $(N, e)$, where $e$ is an integer coprime to $\varphi(N)$ and $1 < e < \varphi(N)$. The private key is based on the tuple $(p_1, p_2, \varphi(N), d)$, such that the private key is the pair $(N, d)$ and $d$ is determined using the extended Euclidean algorithm from $e d \equiv 1 \pmod{\varphi(N)}$. A public key is used to encrypt a message $P$ into a cipher text $C$, such that $C = P^e \bmod N$. To retrieve the original message, the corresponding private key is used to decrypt the encrypted message such that $P = C^d \bmod N$. RSA cryptosystems make use of public key and private key generation techniques for security of data and end-to-end security of information transmission that cannot be understood by anyone except the intended recipients. These techniques are employed to authenticate the sender and the receiver of a message and to ensure the integrity of the data or message received, i.e. that it has not been tampered with. However, the problem of determining $d$ from the public key $(N, e)$ is equivalent to finding the factors of the RSA modulus $N$. Hence, choosing strong primes for $p_1$ and $p_2$ is very important, so that the factorization of the semi-prime $N$ becomes computationally infeasible and nontrivial for an adversary. In practical applications, a smaller private key $d$ may be used for a faster decryption algorithm to improve the computational speed of online transactions. Once the private key is found, it can result in a total breaking of RSA, posing a great security risk.
Hence, an enhanced prime factorization method to attack the small decryption exponent could pose serious security challenges to RSA cryptosystems that are widely adopted even today.
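The key relations above can be illustrated with a toy example. This is a minimal sketch in Python, not the authors' Java code; the primes 61 and 53, the exponent 17 and the message 65 are illustrative assumptions chosen only to keep the numbers readable. It shows why knowing the factors of $N$ is enough to recompute the private exponent.

```python
# Toy RSA round trip (illustrative values only, far too small for real use).
p1, p2 = 61, 53                 # assumed example primes
N = p1 * p2                     # RSA modulus (a semi-prime)
phi = (p1 - 1) * (p2 - 1)       # Euler's totient of N
e = 17                          # public exponent, coprime to phi

# Anyone who can factor N can rebuild phi and hence the private exponent d.
d = pow(e, -1, phi)             # modular inverse (Python 3.8+)

P = 65                          # plaintext message, 0 <= P < N
C = pow(P, e, N)                # encryption: C = P^e mod N
recovered = pow(C, d, N)        # decryption: P = C^d mod N

print(N, phi, d, C, recovered)
```

The only secret step is the computation of `phi`, which requires $p_1$ and $p_2$; everything else uses public values.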
Several studies have attempted to perform a general survey of attacks on the RSA cryptosystem since its introduction [16,19,34]. In 1990, Wiener introduced the first RSA cryptanalytic attack, showing that the private key can be recovered using the continued fractions method if the decryption exponent $d$ is small, with an upper bound given by $d < N^{0.25}$ [35]. Subsequently, Boneh and Durfee [36] proposed an attack on short decryption exponents with an improved upper bound, while the RSA attack by Blömer and May [37] demonstrated an upper bound of $N^{0.290}$ with lattices of smaller dimension. Coppersmith's technique used a lattice reduction approach for finding small solutions of modular bivariate integer polynomial equations [38].
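For context, Wiener's continued-fraction idea can be sketched in a few lines. This is a simplified illustration of that earlier attack, not part of the method proposed in this paper; the function names and the toy key ($N = 90581$, $e = 17993$) are assumptions made for the example.

```python
from math import isqrt

def convergents(num, den):
    """Yield the convergents (k, d) of the continued fraction of num/den."""
    h_prev, h = 0, 1            # numerator recurrence
    k_prev, k = 1, 0            # denominator recurrence
    while den:
        a = num // den
        num, den = den, num % den
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield h, k

def wiener(e, N):
    """Try to recover (d, p1, p2) when d is small; return None on failure."""
    for k, d in convergents(e, N):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k          # candidate totient
        s = N - phi + 1                 # candidate p1 + p2
        disc = s * s - 4 * N            # (p1 - p2)^2 if the guess is right
        if disc >= 0:
            r = isqrt(disc)
            if r * r == disc and (s + r) % 2 == 0:
                return d, (s + r) // 2, (s - r) // 2
    return None

print(wiener(17993, 90581))
```

Each convergent $k/d$ of $e/N$ is tested as a guess for the true ratio; a correct guess makes the discriminant a perfect square and reveals both primes.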
Most of the recent research has also been focusing on extending the upper bound on the range of $d$ and $e$ in the RSA private and public keys, working on the limitations of the Wiener and Coppersmith methods by approaching the problem differently. One recent work [39] considers the prime sum $p_1 + p_2$ using sublattice reduction techniques and Coppersmith's methods for finding small roots of modular polynomial equations, achieving a slight improvement in the upper bound with reduced lattice dimension. Another work [17] uses a small prime difference $p_1 - p_2$ method, which is then developed into a continued fraction as per Wiener's original method. While these extend the range of Boneh's original limit, the Lenstra-Lenstra-Lovász (LLL) method [40], introduced by Lenstra, Lenstra and Lovász in 1982, is still seen as the state of the art.
A survey on the history of number theory reveals that it has been explored widely by mathematicians to establish different representations of integers, in particular prime numbers, with a view to arrive at more efficient methods in deriving them [41][42][43][44][45][46][47]. Up until about a decade ago, there had been a strong mathematical interest in polynomials which generated sums of squares. However, recently there is a reviving interest in their application to practical experimentations for establishing their rudimentary computations and implications to cryptography [29,[48][49][50][51][52]. In line with these approaches, it has been proved in our previous work that the semi-primes can be represented as the sum of four squares [30]. A new factorization method as a faster alternative to Euler's method [25] was proposed by establishing the relationship among the four squares. Our interest is to apply this method in a novel way to the semi-prime factorization problem and is part of our ongoing research. This paper aims to explore further, on new findings of the method that once one sum of the squares is known, this can be used to find the other.

Proposed Method
We consider the basis of an earlier work [25,42] that showed a semi-prime $N$, constructed from two primes $p_1$ and $p_2$ that are each congruent to 1 (mod 4), is also congruent to 1 (mod 4), i.e. $N \equiv 1 \pmod 4$. Further, in [30], it was established that a sum of four squares could be reduced to two sums of two squares using the Brahmagupta-Fibonacci identity given in [49]. We note that Gaussian primes are of the form $p \equiv 3 \pmod 4$ and cannot be represented as the sum of two squares [29]. On the other hand, Pythagorean primes are of the form $p \equiv 1 \pmod 4$ [26][27][28]53]. In accordance with Fermat's Christmas theorem, an odd prime $p$ can be represented as the sum of two squares of integers $a$ and $b$, $p = a^2 + b^2$, if and only if $p \equiv 1 \pmod 4$ [42,54]. This property was useful to determine which numbers could be represented as the sum of two squares, which was later proved by Euler [25].
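The two facts used above can be checked numerically. The following is a small sketch under our own naming (`two_squares` and `brahmagupta` are hypothetical helper names, not from the paper): a brute-force search for $p = a^2 + b^2$, and the Brahmagupta-Fibonacci identity that combines two such representations into two representations of the product.

```python
from math import isqrt

def two_squares(p):
    """Return (a, b) with a <= b and a*a + b*b == p, or None if none exists."""
    for a in range(isqrt(p) + 1):
        rest = p - a * a
        b = isqrt(rest)
        if b * b == rest:
            return a, b
    return None

def brahmagupta(a, b, c, d):
    """Two representations of (a^2 + b^2)(c^2 + d^2) as a sum of two squares."""
    return (abs(a * c - b * d), a * d + b * c), (a * c + b * d, abs(a * d - b * c))

print(two_squares(5), two_squares(13))   # primes congruent to 1 mod 4
print(two_squares(7))                    # a prime congruent to 3 mod 4
print(brahmagupta(1, 2, 2, 3))           # representations of 5 * 13 = 65
```

A semi-prime built from two Pythagorean primes therefore has exactly the two representations the identity produces, which is what Euler's method later consumes.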
In an earlier work [30], it was proved that if a semi-prime is constructed using two Pythagorean primes of the form $p \equiv 1 \pmod 4$, then Euler's factorization can be used to find two representations as the sum of two squares. Finding these two representations is non-trivial and computationally intensive for large numbers, even with high performance computers. We make use of a previously established property that all Pythagorean triples can be represented as $N = n^2 + (n + 2m - 1)^2$ [28]. This equation provides a computationally simpler search using increments of $n$ and fine convergence using $m$. In this paper, we extend our related works that were reported previously to show that once one sum of the squares is known, it can be used to find the other. We present our proposed method in a step-by-step manner. First, we apply our method to the semi-prime factorization of two simple case examples and arrive at a conjecture. Following this, we apply our proposed method to two more large semi-primes as case examples. Finally, in the next section, we demonstrate the breaking of 768-bit RSA using our proposed method as the final result achieved.
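To make the parameterization concrete, the sketch below searches for pairs $(m, n)$ such that $N = n^2 + (n + 2m - 1)^2$, reading the garbled formula in that form (an assumption on our part). It uses the semi-prime $27161 = (157)(173)$ that appears later in the paper; the function name is hypothetical.

```python
from math import isqrt

def pythagorean_params(N):
    """Return all (m, n) with N == n^2 + (n + 2m - 1)^2, smallest n first."""
    found = []
    n = 1
    while 2 * n * n < N:                 # n is the smaller of the two roots
        rest = N - n * n
        r = isqrt(rest)
        if r * r == rest and (r - n) % 2 == 1:
            found.append(((r - n + 1) // 2, n))   # solve r = n + 2m - 1 for m
        n += 1
    return found

print(pythagorean_params(27161))
```

Because $n$ and $n + 2m - 1$ always have opposite parity, every value generated by this form is congruent to 1 mod 4, which is why the search never wastes time on candidates of the wrong residue class.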
The first 3 sums of two squares can easily be found by decrementing from 403 such that a perfect square remains. By applying Equations (1)-(5), we arrive at the following sequence of steps: In this way, by using the modified Euler factorization from previous work [30], the factors of a compound number can be found. This can be extended to compound numbers where the factorization is not known.
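The decrement search described above can be sketched as follows, taking the value $162565 = (65)(2501)$ discussed later in the paper, whose integer square root is 403. This is an illustrative Python sketch, not the authors' Java implementation, and the function name is our own.

```python
from math import isqrt

def sums_of_two_squares(n):
    """Return all (a, b), a >= b >= 0, with a*a + b*b == n, largest a first."""
    found = []
    a = isqrt(n)                    # start at the integer square root
    while 2 * a * a >= n:           # stop once a would drop below b
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:           # a perfect square remains
            found.append((a, b))
        a -= 1
    return found

reps = sums_of_two_squares(162565)
print(reps[:3])     # the three representations closest to the square root
print(len(reps))    # total number of representations
```

The first three hits occur within four decrements of 403, which illustrates how closely spaced the representations of this particular number are.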

Factorizing a Semi-Prime
With the increasing application of number theory in information security, researchers have been drawn to explore the interesting problem of factorizing a semi-prime, which is a positive integer that has two prime factors, $p_1$ and $p_2$, forming $N = p_1 p_2$. Encryption algorithms such as RSA and public-key cryptosystems rely on special large prime factors of a semi-prime for encoding a sender's message and decoding it at the receiver end. Since only one of the two prime factors of the semi-prime is known at either end of sender and receiver, even if the semi-prime is revealed, an interceptor is required to know both prime factors and only then can the message be decoded. Hence, with the evolution of information and communication technologies, a fast and efficient method for the factorization of very large semi-primes is of much interest among mathematicians and information security researchers.
With the current state of knowledge, we apply our proposed method of using the sum of squares (SoS) to factorize a semi-prime and illustrate our algorithm with case examples. In broad terms, our proposed algorithm for factorizing a semi-prime consists of three parts as given below:
1. Part 1 of the algorithm, namely new sum of squares (N-SoS1), is proposed in this paper,
2. Part 2 of the algorithm, namely polynomial-based sum of squares (P-SoS2), leverages another work [53], and
3. Part 3 of the algorithm, namely a modified Euler-based sum of squares (E-SoS3) factorization, uses an earlier work [30].
In the above algorithm, we take advantage of $M$ having special attributes which, when known, allow for the recovery of N-SoS1 using "simple" algebra as given below: $M(m, n) = (2n^2 + 2mn + 2n + m + 1)^2 + m^2$. This has special properties to be discussed in a future paper. In short, it generates two numbers of the form: $M(m, n) = (n^2 + (n + 1)^2)((m + n)^2 + (m + n + 1)^2)$. Numbers whose squares are one apart, as per $n$ and $(n + 1)$, enable special factorizing properties to be maintained. Hence, multiplying a semi-prime ($N$) to be factored by such a number allows for the recovery of N-SoS1 for $N$. It is conjectured in this paper that multiplying $N$ by $M$ leads to the determination of N-SoS1 more quickly. The algebra takes the form of 8 possible equations as explained below: We posit that once an SoS for the product $N \times M$ is found, N-SoS1 for $N$ can be recovered using one of 8 equations.
This is the essence of the N-SoS1 algorithm we have proposed in this work.
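As a sanity check on the multiplier, the sketch below assumes the two forms of $M(m, n)$ read as $(2n^2 + 2mn + 2n + m + 1)^2 + m^2$ and $(n^2 + (n+1)^2)((m+n)^2 + (m+n+1)^2)$, which is our reconstruction from the surviving terms, and confirms that they agree. It also reproduces the values 65 and 2501 quoted in the discussion; the function names are hypothetical.

```python
def M_as_squares(m, n):
    """M(m, n) written as a single sum of two squares."""
    return (2 * n * n + 2 * m * n + 2 * n + m + 1) ** 2 + m * m

def M_as_product(m, n):
    """M(m, n) written as a product of two 'consecutive squares' numbers."""
    return (n * n + (n + 1) ** 2) * ((m + n) ** 2 + (m + n + 1) ** 2)

# The two forms agree on a grid of small parameters.
agree = all(M_as_squares(m, n) == M_as_product(m, n)
            for m in range(1, 30) for n in range(1, 30))

print(agree)
print(M_as_product(1, 1), M_as_product(1, 4))      # 5*13 and 41*61
print(M_as_product(1, 1) * M_as_product(1, 4))     # combined multiplier
```

The agreement follows from the Brahmagupta-Fibonacci identity applied to $(n, n+1)$ and $(m+n, m+n+1)$, where the cross term collapses to $m$.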
Once an N-SoS1 for $(N \times M)$ becomes known, a simple GCD test with a result of $M$ determines which of the 8 equations will yield N-SoS1 for $N$. Only a simple division is then required to yield the N-SoS1 solution for $N$. Once N-SoS1 is known, this is then used to find P-SoS2. Further, once N-SoS1 and P-SoS2 are known, a modified Euler's factorization using sum of squares (E-SoS3) is able to yield $p_1$ and $p_2$, and hence factorization is achieved.
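One way to picture the recovery step is as a Gaussian-integer division: if $A^2 + B^2 = N \times M$ and $M = c^2 + d^2$, then dividing $A + Bi$ by $c \pm di$ gives a sum of two squares for $N$ whenever both parts are divisible by $M$. The sketch below is our own formulation under that assumption and does not reproduce the paper's 8 equations; the names and the pair $(60574, 27317)$ are constructed for illustration using $N = 27161$ and $M = 162565$.

```python
from math import gcd, isqrt

def sums_of_two_squares(n):
    """All (a, b), a >= b >= 0, with a*a + b*b == n."""
    found, a = [], isqrt(n)
    while 2 * a * a >= n:
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:
            found.append((a, b))
        a -= 1
    return found

def recover_sos(A, B, N, M):
    """Given A^2 + B^2 == N*M, return (a, b) with a^2 + b^2 == N, or None."""
    assert A * A + B * B == N * M
    for c, d in sums_of_two_squares(M):
        for s in (1, -1):                      # try c + di and its conjugate
            x = A * c + s * B * d              # real part of (A+Bi)(c - s*di)
            y = B * c - s * A * d              # imaginary part
            if gcd(x, y) % M == 0:             # GCD test: both divisible by M
                return abs(x) // M, abs(y) // M
    return None

print(recover_sos(60574, 27317, 27161, 162565))
```

Only the representation of $M$ that was actually "mixed into" $(A, B)$ passes the GCD test, so a single exact division then yields the pair for $N$.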
A Java implementation of N-SoS1 uses simple arithmetic operations such as +, -, * and / as well as GCD and is provided as Supplementary Resources.
It is possible to avoid P-SoS2 once the first square becomes known, as shown in Case Example 3 (below), by decrementing the square from ( , ). However, Case Example 4 illustrates that, even for small numbers, not using P-SoS2 makes finding the second SoS unpredictable. The RSA case illustrates that, as per Case Examples 3 and 4, a solution exists but poor selection of $M$ leads to an intractable result for RSA768. Hence, from the above, the factorization of $N = 27161 = (157)(173)$.
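The final step, going from two sums of two squares to the factors, can be sketched with the classical form of Euler's method. This is not the modified E-SoS3 procedure of [30], whose details are not given here; the representations $155^2 + 56^2$ and $131^2 + 100^2$ of 27161 were computed for this illustration.

```python
from math import gcd

def euler_factor(N, a, b, c, d):
    """Factor N from N == a^2 + b^2 == c^2 + d^2 (two distinct representations)."""
    assert a * a + b * b == N and c * c + d * d == N
    if (a - c) % 2:                 # pair up the terms of equal parity
        c, d = d, c
    if a < c:                       # arrange a > c, which forces d > b
        a, b, c, d = c, d, a, b
    k = gcd(a - c, d - b)           # both differences are even, so k is even
    h = gcd(a + c, d + b)           # likewise h is even
    l, m = (a - c) // k, (d - b) // k
    # Euler: N = ((k/2)^2 + (h/2)^2) * (l^2 + m^2)
    return (k * k + h * h) // 4, l * l + m * m

print(euler_factor(27161, 155, 56, 131, 100))
```

Only two gcd computations and a handful of additions and multiplications are needed once both representations are available, which is why the search for the representations dominates the cost.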

Results
We employ our proposed method to demonstrate the factorization of a large RSA key. The commonly adopted 512-bit RSA key has been successfully factored several times with different factorization methods [22][23][24]. To take the challenge of RSA key attack further in this work, we consider a larger key size, the 768-bit RSA key, and apply our proposed semi-prime factorization method with the enhanced sum of squares of primes. The RSA case illustrates that a solution exists but poor selection of $M$ leads to an intractable result for RSA768. The selection of $m$, $n$ of $M(m, n)$ can be better determined. As presented in this paper, as $n$ is incremented, a range of $m$ values is searched through. This is continued until a perfect square is found. The attributes of $M$ have been determined via experimentation. The authors are currently characterizing $M$ with a view to reducing the search field for a suitable $M$. This is an area of ongoing research.

Performance and Comparison
In Table 1, we summarize the cost performance of our proposed method of the extended sum of squares approach to factorize semi-primes of different lengths (key sizes). For the various case examples of semi-primes, including RSA768, that were computed in this work, we provide the cost factors of memory used and the non-optimized search time taken, measured in decrement loops of the square, for completing the semi-prime factorization using our method. If a linear search is used from the square root of the product $N \times M$, the number of iterations is given thus. However, this can be significantly reduced by using $N = n^2 + (n + 2m - 1)^2$, which is congruent to 1 (mod 4). Table 1 demonstrates that the memory required for our algorithm is minimal and that the search cycle time is the main factor that needs to be contained for large RSA keys. It is important to note that each search cycle consists of only six steps involving rudimentary algebra (multiplication, addition, subtraction and division operations) with very low processing time. Next, we elaborate on the complexity of our algorithm and its comparison to other factoring algorithms.

Complexity of the Algorithm
We summarize the complexity of our proposed semi-prime factorization algorithm in terms of memory and computational time. These complexity measures are followed in similar lines to existing methods reported in the literature [55].
The memory requirement of our algorithm is minimal, with most computations operating on the accumulator (using BigInteger arithmetic). The number of memory variables used for each major part and step involved in our algorithm is given below. The manipulation of $M(m, n)$ to generate $M$ requires the use of three variables. The resultant SoS pair requires the use of two variables. The recovered SoS pair requires the use of two variables. Hence, these requirements occupy 7 BigInteger values in memory.
Studying the time complexity of our proposed algorithm, we find that each of the algorithm steps is rudimentary. Steps 3 and 5 use a square root function, which is the most expensive function in this algorithm. Only the integer part is considered, so a Newton-Raphson algorithm is used here.
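A Newton-Raphson integer square root of the kind referred to above can be sketched as follows. This is a generic textbook version written in Python for illustration (the paper's implementation is in Java with BigInteger); the function names are our own.

```python
def isqrt_newton(n):
    """Integer part of the square root of n >= 0 by Newton-Raphson iteration."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)    # initial guess, never below sqrt(n)
    while True:
        y = (x + n // x) // 2               # Newton step on f(x) = x^2 - n
        if y >= x:                          # no further decrease: converged
            return x
        x = y

def is_perfect_square(n):
    """Perfect square test built on the integer square root."""
    r = isqrt_newton(n)
    return r * r == n

print(isqrt_newton(162565))
print(is_perfect_square(961), is_perfect_square(156))
```

Starting from a guess above the true root guarantees that the iterates decrease monotonically, so the loop needs only integer division and a comparison per step.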
Step 6 uses a GCD function. The remaining parts are simply multiplication, addition, subtraction and division operations that require minimal processing time. Continued research is underway to characterize $M(m, n)$ so that a reduced stochastic search can be undertaken in a reduced search field. Our proposed semi-prime factorization method is simple to understand and to implement. It uses rudimentary algebra of multiplication, addition, subtraction, and division operations that require minimal processing time, unlike other algorithms that focus on lattice reduction (LR) techniques requiring time-consuming operations. While this work is limited to an empirical study of our proposed method, formal proofs are quite extensive and are reserved for future work.

Discussion
By factoring the modulus of an RSA public key, an attacker can compute the corresponding private key. Various studies have surveyed RSA key size and limitations across public key infrastructure, and new methods for attacking commonly used 512-bit and 768-bit RSA keys are continuing to interest researchers [19,56]. Most of the previous studies have examined attacks on RSA cryptosystems with a specific focus. For instance, partial key exposure (PKE) attacks were studied in [57,58]. A PKE attack on a low public-exponent RSA key of a variant of the RSA public-key cryptosystem was found to be less effective than for standard RSA. However, the large public exponent RSA key was reported to be more vulnerable to such attacks than for standard RSA. Some generalized studies have also been conducted comprehensively at a practical Internet scale to survey vulnerabilities of RSA key generation for various protocols such as SSH, HTTPS, and DKIM [19][20][21]59]. However, in this paper, we have considered the number theoretic properties of semi-primes that form the underlying principle behind any RSA key generation to be the fundamental cause for any RSA attack. For instance, by choosing an RSA modulus with a small difference of its prime factors, private key attacks could be improved [60]. Some early studies have considered polynomial equations to study low exponent RSA vulnerabilities and their variants [61,62]. Recent number theory-based studies in this domain have focused on lattice reduction (LR) techniques with prime number theory to study RSA key attacks [31,38,39]. Such LR-based techniques come under the general category of Coppersmith attacks. However, the uniqueness of our contribution is that we propose factorization using the sum of squares, which is more in line with Euler's sum of squares approach than with Coppersmith's approach.
Whilst there is active research in LR methods, sum of squares remains an area of interest and continues to yield surprises, and it has been the motivation of this work. The subtleties in the value of $M$ are an area of interest that we explored in this study. The value of $M$ in this paper has special properties, which are described briefly.
$M$ was constructed with two semi-primes. Each of the primes has special properties. They are all congruent to 1 (mod 4). Their squares are one apart. In the case of both semi-primes, the highest square of the smallest prime is the lowest square in the larger prime. Both semi-primes are perfect squares when 1 is subtracted. Multiplying the two semi-primes together creates 8 sums of 2 squares, three of which are very close together (402, 401, 399). When such a number is multiplied with a semi-prime to be factored, 32 sums of 2 squares are available. As was shown previously, only two sums of two squares are required to factor the semi-prime in question. This increases the likelihood of finding two such sums of squares. Equations (4)-(8) describe the general form for $M$. In this case $M = (65)(2501) = 162565$. It would appear counter-intuitive to make the semi-prime to be factored ($N$) larger; however, this is offset by the likelihood of finding two sums of two squares near the square root of the product $N \times M$. In the case of RSA768 the first solution was some way off from the square root. The search can be restricted by only considering congruent co-prime sums of squares that are consistent with a previous study [30]. In this way only sums of two squares which approximate $N \times M$ need be tested and, of those, only the ones equal to $N \times M$ yield a valid solution. This substantially reduces the search cost. Since a linear search is conducted, the search field can easily be divided up, and this lends itself very well to parallel processing. The mathematics in our approach is rudimentary, consisting of multiplications, additions and subtractions. The perfect square test does require a square root, which is computationally costly, but this too can be avoided if the approach in [30] is used to find solutions equivalent to $N = n^2 + (n + 2m - 1)^2$, all of which are congruent to 1 (mod 4). Overall, the search area to find the first sum of squares is key to finding the second.
Hence, the search is focused on finding the first sum of two squares. The relationship between the two squares of the sum of two squares of the semi-prime determines the spread of, and the likelihood of finding, the first sum of two squares for $N \times M$. Table 1 shows the search cycle time for different key sizes explored in this work. The search area can be limited by $\sqrt{N-1}$ and $\sqrt{N/2}$. This can be narrowed by considering the properties of $M$ and the closeness of the sums of squares of $M$ (402, 401, 399). The smearing of $N$ across $N \times M$ by $M$ determines the search field. This field is then divided over a number of parallel processes running a very simple (but fast) $n^2 + (n + 2m - 1)^2$ algorithm. This search field to find the first sum of two squares is an area of ongoing research.
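The division of the search field can be illustrated with a short sketch. It assumes the candidate range for the larger square root of a number $n$ runs from just above $\sqrt{n/2}$ up to $\sqrt{n-1}$, splits that range into equal chunks, and scans each chunk independently; here the chunks are scanned one after another, whereas in practice each could be handed to a separate process. The function names and the choice of four workers are assumptions.

```python
from math import isqrt

def search_bounds(n):
    """Inclusive range of candidates a with n/2 < a*a <= n - 1."""
    return isqrt(n // 2) + 1, isqrt(n - 1)

def split(lo, hi, workers):
    """Cut [lo, hi] into at most `workers` contiguous inclusive chunks."""
    size = -(-(hi - lo + 1) // workers)          # ceiling division
    return [(s, min(s + size - 1, hi)) for s in range(lo, hi + 1, size)]

def scan(n, lo, hi):
    """Linear decrement search for sums of two squares within one chunk."""
    hits = []
    for a in range(hi, lo - 1, -1):
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:
            hits.append((a, b))
    return hits

M = 162565
parts = split(*search_bounds(M), workers=4)
hits = [h for lo, hi in parts for h in scan(M, lo, hi)]
print(parts)
print(sorted(hits, reverse=True))
```

Because the chunks share no state, the only coordination needed between workers is collecting the hits, and the search can stop as soon as any two representations have been reported.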

Conclusion and Future Work
We studied semi-prime factorization and its impact on the security of the RSA cryptosystem. We proposed a novel extension to our previously established methods of semi-prime factorization using a sum of squares approach for cryptanalytic attacks on the RSA modulus $N = p_1 p_2$, with $N$ being a large semi-prime constituting two primes $p_1$ and $p_2$. We gradually developed our proposed technique by providing illustrations of semi-prime factorization for small and large numbers. We arrived at the conjecture that a composite consisting of $n$ unique primes $\equiv 1 \pmod 4$ has $2^{n-1}$ sums of squares. We provided the detailed steps involved in the implementation of our enhanced semi-prime factorization algorithm. We then applied our proposed factorization algorithm to attack 768-bit RSA successfully. Finally, we discussed the strengths and limitations of our algorithm by providing the complexity of the algorithm in terms of memory and search cycle time as well as its comparison to other factorization algorithms.
In future, a more extensive comparison of various RSA keys adopted in cryptosystems will be studied for key attacks using our method as against other contemporary methods such as Shor's algorithm. Our ongoing research will also focus on characterising a reduced search field in arriving at the semi-prime factorization result and the formal theoretical proofs.