1. Introduction
The RSA (Rivest–Shamir–Adleman) public-key primitive, named after its inventors, has been the most widely used primitive in public-key cryptography since its introduction in 1977 [1]. RSA key generation is based on a modulus $N$ that is a semi-prime, that is, the product of two large primes $p$ and $q$, so that $N = pq$ [2,3]. The computational intractability of factoring large semi-primes has been the fundamental premise for using RSA keys for computer security in many applications [4,5,6]. Modern cryptosystems make wide use of RSA encryption and digital signatures to secure message exchange and communication over different types of networks in government and in various industry sectors [7,8,9]. These include everyday transactions available to the general public over the Internet, such as mobile banking, online shopping, smart card payments, e-health, e-mail communications and Internet of Things (IoT) applications [10,11]. Despite the difficulty of breaking RSA keys (i.e., of semi-prime factorization), cybercrimes and attacks on RSA keys are still on the rise [12,13,14].
A cryptanalytic attack on RSA with a short private key, published by M. J. Wiener in 1990, was the first of its kind [15,16]. Consequently, making the RSA modulus $N$ difficult to factor by choosing strong prime factors $p$ and $q$ was considered a way to counter such attacks. Since then, it has become common practice to employ 512-bit RSA to provide the required strong primes in many cryptographic security protocols [17]. Even though a 512-bit RSA modulus was first factored in 1999, 512-bit factorization remains difficult in real-time applications, even with today's computational power [18]. Therefore, 512-bit RSA keys are still actively used by popular protocols, including those for email authentication, such as DomainKeys Identified Mail (DKIM); for sending and retrieving email, such as the Simple Mail Transfer Protocol (SMTP), Post Office Protocol 3 (POP3) and Internet Message Access Protocol (IMAP); for data exchange on the web, such as Hypertext Transfer Protocol Secure (HTTPS); for connecting two computers, such as Secure Shell (SSH); and for providing privacy, such as Pretty Good Privacy (PGP) [19,20,21].
Enthusiastic amateurs have also tried to factor 512-bit RSA keys: in 2012, Zachary Harris factored the 512-bit DKIM RSA keys used by Google and several other major companies in 72 h per key, using distributed cloud services [22,23,24]. The Factoring as a Service (FaaS) project demonstrated that a 512-bit RSA key can be factored reliably within four hours in a public cloud environment [19]. Factorization of the 768-bit RSA key, which is several thousand times more difficult than that of a 512-bit key, has also been demonstrated using methods such as the number field sieve. The larger the key size, the more difficult it is to factor the RSA key. Nevertheless, new methods and growing computing power pose risks for the future of RSA. This motivates our study of the mathematical principles of semi-prime factorization and our proposal of a novel method that increases the likelihood of breaking an RSA key.
Euler’s factorization of a semi-prime $N = pq$ is based on the property that $p$ and $q$ are both congruent to 1 mod 4 [25]. Such primes are Pythagorean primes, and constructions based on them have been applied in RSA [26,27,28]. They account for only a quarter of the possible combinations of primes; the remaining combinations, involving primes congruent to 3 mod 4, are handled by the Gaussian prime extension method [29]. A semi-prime has only two sums of two squares within the range of possible squares, and therefore we extend previously established methods to increase the likelihood that a sum of two squares can be found [30,31]. Even though many sums of squares can exist, we show that finding any two of them is sufficient to factorize the original semi-prime. Our enhanced factorization method improves on our previous work. We apply our method to several case examples and state the necessary conjectures. Our algorithm is simple in practice and is implemented using rudimentary arithmetic operations that require minimal computational memory, with the search cycle time being the main factor to consider for very large semi-primes. Further, we demonstrate the breaking of RSA keys such as the 768-bit RSA key, verified through a Java implementation of our algorithm. We present results highlighting the complexity of our enhanced factorization algorithm and compare its performance with other factorization algorithms, which also sets the scope for future research.
The rest of the paper is organized as follows. Section 2 discusses related work and the uniqueness of our work. Section 3 presents our enhanced semi-prime factorization, which extends the sum of squares method, and specifies our algorithm with its implementation steps. Section 4 demonstrates the correctness of our algorithm when applied to break the 768-bit RSA key and describes the algorithm's performance, its complexity and a comparison with other factoring methods. Section 5 discusses the results, and Section 6 concludes with future research directions.
  2. Background
The difficulty of the semi-prime factorization problem is essential to the security of an RSA cryptosystem. Revisiting RSA cryptography [1,3,17], assume that $p$ and $q$ are two primes used to generate a semi-prime $N = pq$. Euler’s totient function $\varphi(N)$ is then given by
$$\varphi(N) = (p-1)(q-1).$$
In RSA-based public-key cryptography, two different keys, known as the public and private keys, are used to encrypt and decrypt data or messages [32,33]. Any sensitive information is encrypted with a public key and requires the matching private key to decrypt it. The public key is chosen as a pair $(e, N)$, where $e$ is an integer with $1 < e < \varphi(N)$ that is coprime to $\varphi(N)$. The private key is the pair $(d, N)$, such that $ed \equiv 1 \pmod{\varphi(N)}$, and $d$ is determined using the extended Euclidean algorithm. The public key $(e, N)$ is used to encrypt a message $M$ into a ciphertext $C$, such that
$$C = M^{e} \bmod N.$$
To retrieve the original message, the corresponding private key $(d, N)$ is used to decrypt the ciphertext, such that
$$M = C^{d} \bmod N.$$
RSA cryptosystems use public and private key generation techniques to secure data and to provide end-to-end security of transmitted information, so that it cannot be understood by anyone except the intended recipients. These techniques are also employed to authenticate the sender and the receiver of a message and to ensure the integrity of the received data or message, that is, that it has not been tampered with. However, the problem of determining the private exponent $d$ from the public key $(e, N)$ is equivalent to finding the factors of the RSA modulus $N$. Hence, it is very important to choose strong primes $p$ and $q$ so that factorization of the semi-prime $N$ becomes computationally infeasible for an adversary. In practical applications, a small private exponent may be used for faster decryption, to improve the computational speed of online transactions. Once the private key is found, RSA is completely broken, posing a great security risk. Hence, an enhanced prime factorization method that attacks small decryption exponents could pose serious security challenges to RSA cryptosystems, which remain widely adopted today.
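The key relations above can be checked with a toy example. The Java sketch below is illustrative only: it assumes the textbook-sized primes $p = 61$ and $q = 53$ and public exponent $e = 17$ (real keys use primes hundreds of digits long together with a padding scheme, which is omitted here), and it relies on the standard `java.math.BigInteger` methods `gcd`, `modInverse` and `modPow`. Because $d$ is computed directly from $p$ and $q$, anyone who factors $N$ can repeat the same computation, which is why factoring breaks RSA.

```java
import java.math.BigInteger;

/** Toy, unpadded ("textbook") RSA for illustration only; never use it to protect real data. */
class ToyRsa {
    final BigInteger n; // public modulus N = p * q
    final BigInteger e; // public exponent
    final BigInteger d; // private exponent

    ToyRsa(BigInteger p, BigInteger q, BigInteger e) {
        this.n = p.multiply(q);
        // Euler's totient: phi(N) = (p - 1)(q - 1)
        BigInteger phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        if (!e.gcd(phi).equals(BigInteger.ONE)) {
            throw new IllegalArgumentException("e must be coprime to phi(N)");
        }
        this.e = e;
        this.d = e.modInverse(phi); // d such that e * d = 1 (mod phi(N))
    }

    BigInteger encrypt(BigInteger m) { return m.modPow(e, n); } // C = M^e mod N
    BigInteger decrypt(BigInteger c) { return c.modPow(d, n); } // M = C^d mod N

    public static void main(String[] args) {
        ToyRsa rsa = new ToyRsa(BigInteger.valueOf(61), BigInteger.valueOf(53), BigInteger.valueOf(17));
        BigInteger c = rsa.encrypt(BigInteger.valueOf(65));
        System.out.println("N=" + rsa.n + " d=" + rsa.d + " C=" + c + " M=" + rsa.decrypt(c));
    }
}
```

Running `main` prints `N=3233 d=2753 C=2790 M=65`.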
Several studies have surveyed attacks on the RSA cryptosystem since its introduction [16,19,34]. In 1990, Wiener introduced the first RSA cryptanalytic attack, showing that if the decryption exponent is small, with an upper bound of $d < \frac{1}{3}N^{1/4}$, it can be recovered using the continued fractions method [35]. Subsequently, Boneh and Durfee [36] proposed an attack on short decryption exponents with an improved upper bound, while the RSA attack by Blömer and May [37] demonstrated an upper bound of $d < N^{0.290}$ with lattices of smaller dimension. Coppersmith’s technique uses a lattice reduction approach for finding small solutions of modular univariate and integer bivariate polynomial equations [38].
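Wiener’s continued-fraction idea can be illustrated with a short Java sketch. It is a minimal, unhardened illustration rather than a reference implementation: it expands $e/N$ as a continued fraction, treats each convergent $k/d$ as a guess for the integers in $ed - 1 = k\varphi(N)$, and accepts a guess when the implied $p + q = N - \varphi(N) + 1$ gives a perfect-square discriminant $(p+q)^2 - 4N = (p-q)^2$. The toy key $N = 90581$, $e = 17993$ (with $d = 5$) is an assumption chosen for illustration, and `BigInteger.sqrt()` requires Java 9 or later.

```java
import java.math.BigInteger;

/** Sketch of Wiener's attack: recover a small private exponent d from (e, N). */
class WienerSketch {
    /** Returns {d, p, q} if some convergent of e/N reveals the key, otherwise null. */
    static BigInteger[] attack(BigInteger e, BigInteger n) {
        BigInteger num = e, den = n;                                  // continued fraction of e/N
        BigInteger kPrev = BigInteger.ONE, kPrev2 = BigInteger.ZERO;  // convergent numerators
        BigInteger dPrev = BigInteger.ZERO, dPrev2 = BigInteger.ONE;  // convergent denominators
        while (den.signum() != 0) {
            BigInteger[] qr = num.divideAndRemainder(den);
            BigInteger a = qr[0];                                     // next partial quotient
            BigInteger k = a.multiply(kPrev).add(kPrev2);             // candidate k
            BigInteger d = a.multiply(dPrev).add(dPrev2);             // candidate d
            kPrev2 = kPrev; kPrev = k;
            dPrev2 = dPrev; dPrev = d;
            num = den; den = qr[1];
            if (k.signum() == 0) continue;
            BigInteger edMinusOne = e.multiply(d).subtract(BigInteger.ONE);
            if (edMinusOne.mod(k).signum() != 0) continue;            // phi(N) must be an integer
            BigInteger phi = edMinusOne.divide(k);
            BigInteger s = n.subtract(phi).add(BigInteger.ONE);       // candidate p + q
            BigInteger disc = s.multiply(s).subtract(n.shiftLeft(2)); // (p + q)^2 - 4N = (p - q)^2
            if (disc.signum() < 0) continue;
            BigInteger r = disc.sqrt();
            if (!r.multiply(r).equals(disc)) continue;
            BigInteger p = s.add(r).shiftRight(1);
            BigInteger q = s.subtract(r).shiftRight(1);
            if (p.multiply(q).equals(n)) return new BigInteger[] {d, p, q};
        }
        return null;
    }

    public static void main(String[] args) {
        BigInteger[] r = attack(BigInteger.valueOf(17993), BigInteger.valueOf(90581));
        System.out.println(r == null ? "no key found" : "d=" + r[0] + " p=" + r[1] + " q=" + r[2]);
    }
}
```

For this toy key, `main` prints `d=5 p=379 q=239`.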
Most recent research has also focused on extending the upper bounds on $d$ and $e$ in the RSA private and public keys, working around the limitations of the Wiener and Coppersmith methods by approaching the problem differently. One recent work [39] considers the prime sum $p + q$ using sublattice reduction techniques and Coppersmith’s methods for finding small roots of modular polynomial equations, achieving a slight improvement in the upper bound with reduced lattice dimension. Another work [17] uses a small prime difference $|p - q|$, which is then developed into a continued fraction as in Wiener’s original method. While these extend the range of Boneh’s original limit, the Lenstra–Lenstra–Lovász (LLL) lattice reduction method [40] is still seen as the state of the art.
A survey of the history of number theory reveals that mathematicians have widely explored different representations of integers, in particular prime numbers, with a view to arriving at more efficient methods for deriving them [41,42,43,44,45,46,47]. Until about a decade ago, there was strong mathematical interest in polynomials that generate sums of squares. Recently, however, there has been renewed interest in applying them in practical experiments to establish their rudimentary computations and implications for cryptography [29,48,49,50,51,52]. In line with these approaches, our previous work proved that semi-primes can be represented as the sum of four squares [30], and proposed a new factorization method, as a faster alternative to Euler’s method [25], by establishing the relationship among the four squares. Our interest is to apply this method in a novel way to the semi-prime factorization problem, as part of our ongoing research. This paper explores new findings of the method, namely that once one sum of squares is known, it can be used to find the other.
  3. Proposed Method
We build on earlier work [25,42] showing that a semi-prime $N = pq$, constructed from two primes $p$ and $q$ that are congruent to 1 mod 4, is also congruent to 1 mod 4. Further, in [30], it was established that a sum of four squares can be reduced to two sums of two squares using the Brahmagupta–Fibonacci identity given in [49]. We note that Gaussian primes are of the form $4k + 3$ and cannot be represented as the sum of two squares [29]. On the other hand, Pythagorean primes are of the form $4k + 1$ [26,27,28,53]. In accordance with Fermat’s Christmas theorem, an odd prime $p$ can be represented as the sum of two squares of integers $x$ and $y$, that is, $p = x^2 + y^2$, if and only if $p \equiv 1 \pmod 4$ [42,54]. This property is useful for determining which numbers can be represented as the sum of two squares, and it was later proved by Euler [25].
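The statement of Fermat’s Christmas theorem can be checked empirically for small primes. The Java sketch below is a brute-force illustration under our own assumptions (`long` arithmetic and a hypothetical helper named `twoSquares`), not an efficient algorithm: it decrements $x$ from $\lfloor\sqrt{p}\rfloor$, tests whether $p - x^2$ is a perfect square, and then confirms that each prime below a chosen limit has such a representation exactly when $p = 2$ or $p \equiv 1 \pmod 4$.

```java
/** Brute-force check of Fermat's two-squares (Christmas) theorem for small primes. */
class TwoSquaresCheck {
    /** Floor of the square root, corrected for floating-point rounding. */
    static long isqrt(long n) {
        long r = (long) Math.sqrt((double) n);
        while (r * r > n) r--;
        while ((r + 1) * (r + 1) <= n) r++;
        return r;
    }

    /** Returns {x, y} with x*x + y*y == n and x >= y >= 0, or null if no such pair exists. */
    static long[] twoSquares(long n) {
        for (long x = isqrt(n); x >= 0 && 2 * x * x >= n; x--) {
            long rest = n - x * x;
            long y = isqrt(rest);
            if (y * y == rest) return new long[] {x, y};
        }
        return null;
    }

    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long k = 2; k * k <= n; k++) {
            if (n % k == 0) return false;
        }
        return true;
    }

    /** True if every prime p < limit is a sum of two squares exactly when p == 2 or p % 4 == 1. */
    static boolean theoremHoldsBelow(long limit) {
        for (long p = 2; p < limit; p++) {
            if (!isPrime(p)) continue;
            boolean representable = twoSquares(p) != null;
            boolean predicted = (p == 2) || (p % 4 == 1);
            if (representable != predicted) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] r = twoSquares(13);
        System.out.println("13 = " + r[0] + "^2 + " + r[1] + "^2");
        System.out.println("7 representable: " + (twoSquares(7) != null));
        System.out.println("theorem holds below 10000: " + theoremHoldsBelow(10_000));
    }
}
```

`main` prints `13 = 3^2 + 2^2`, reports that 7 is not representable, and confirms the theorem for primes below 10000.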
In an earlier work [30], it was proved that if a semi-prime is constructed using two Pythagorean primes of the form $4k + 1$, then Euler’s factorization can be used to find two representations of it as the sum of two squares. Finding these two representations is non-trivial and computationally intensive for large numbers, even with high-performance computers. We make use of a previously established property that all Pythagorean triples can be represented as
 [
28]. This equation provides a computationally simpler search using increments of 
 and fine convergence using 
. In this paper, we extend our previously reported related work to show that once one sum of squares is known, it can be used to find the other. We present our proposed method step by step. First, we apply our method to the semi-prime factorization of two simple case examples and arrive at a conjecture. Following this, we apply our proposed method to two larger semi-primes as case examples. Finally, in the next section, we demonstrate the breaking of 768-bit RSA using our proposed method.
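Case Example 1: Consider the semi-prime $N = 65$.
The semi-prime 65 consists of two primes, $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$.
By applying the Brahmagupta identity [49], $65 = 8^2 + 1^2 = 7^2 + 4^2$.
Case Example 2: Consider the semi-prime $N = 2501$.
The semi-prime 2501 consists of two primes, $41 = 4^2 + 5^2$ and $61 = 5^2 + 6^2$.
By applying the Brahmagupta identity [49], $2501 = 50^2 + 1^2 = 49^2 + 10^2$.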
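The Brahmagupta–Fibonacci identity used in both case examples can be written as a small Java function. This sketch takes the prime decompositions $5 = 1^2 + 2^2$, $13 = 2^2 + 3^2$, $41 = 4^2 + 5^2$ and $61 = 5^2 + 6^2$ as inputs and returns the two representations of each product as absolute values; the class and method names are hypothetical.

```java
/** Brahmagupta-Fibonacci identity: combines two sums of two squares into two for the product. */
class BrahmaguptaSketch {
    /** Given m = a^2 + b^2 and n = c^2 + d^2, returns both representations of m * n. */
    static long[][] combine(long a, long b, long c, long d) {
        return new long[][] {
            {Math.abs(a * c + b * d), Math.abs(a * d - b * c)},  // (ac + bd)^2 + (ad - bc)^2
            {Math.abs(a * c - b * d), Math.abs(a * d + b * c)}   // (ac - bd)^2 + (ad + bc)^2
        };
    }

    static void print(long n, long[][] reps) {
        for (long[] r : reps) {
            System.out.println(n + " = " + r[0] + "^2 + " + r[1] + "^2");
        }
    }

    public static void main(String[] args) {
        print(65, combine(1, 2, 2, 3));    // 5 = 1^2 + 2^2, 13 = 2^2 + 3^2
        print(2501, combine(4, 5, 5, 6));  // 41 = 4^2 + 5^2, 61 = 5^2 + 6^2
    }
}
```

`main` prints `65 = 8^2 + 1^2`, `65 = 4^2 + 7^2`, `2501 = 50^2 + 1^2` and `2501 = 10^2 + 49^2`.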
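From Case Example 1 and Case Example 2, a general equation can be derived as follows. By using the Brahmagupta identity [49], we get
$$(x_1^2 + y_1^2)(x_2^2 + y_2^2) = (x_1x_2 + y_1y_2)^2 + (x_1y_2 - y_1x_2)^2 = (x_1x_2 - y_1y_2)^2 + (x_1y_2 + y_1x_2)^2.$$
For Case Example 1, $(x_1, y_1, x_2, y_2) = (1, 2, 2, 3)$ gives $65 = 8^2 + 1^2 = 4^2 + 7^2$.
For Case Example 2, $(x_1, y_1, x_2, y_2) = (4, 5, 5, 6)$ gives $2501 = 50^2 + 1^2 = 10^2 + 49^2$.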
Let 
Considering Case Example 2, we have, 
Conjecture: A composite consisting of $k$ unique primes $p_1, \dots, p_k$, each congruent to 1 mod 4, has $2^{k-1}$ sums of two squares.
Let 
The four possible combinations are expanded to produce the 4 sums of 4 squares. Using the Brahmagupta identity [49] and Equations (1)–(3), we obtain 8 sums of 2 squares.
      
In accordance with the conjecture, we can arrive at the following:
By employing Equations (4)–(8), when  the 8 sums of 2 squares can be derived as follows:
Let us consider $\sqrt{65 \times 2501} = \sqrt{162565}$, which is approximately 403.
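The decrementing search used to find the first sums of two squares of $65 \times 2501 = 162565$ can be sketched as a plain linear scan in Java. This is an illustrative sketch that assumes small inputs (it uses `long`, so it is not suitable for RSA-sized numbers): it decrements $x$ from $\lfloor\sqrt{n}\rfloor$, keeps every $x$ for which $n - x^2$ is a perfect square, and stops once $x^2 < n/2$. For 162565 it lists all eight representations, consistent with the 8 sums of 2 squares stated in the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists every x^2 + y^2 = n with x >= y >= 0 by decrementing x from floor(sqrt(n)). */
class SumOfSquaresSearch {
    /** Floor of the square root, corrected for floating-point rounding. */
    static long isqrt(long n) {
        long r = (long) Math.sqrt((double) n);
        while (r * r > n) r--;
        while ((r + 1) * (r + 1) <= n) r++;
        return r;
    }

    static List<long[]> allRepresentations(long n) {
        List<long[]> reps = new ArrayList<>();
        // Stop once x^2 < n / 2, because beyond that point y would exceed x.
        for (long x = isqrt(n); x >= 0 && 2 * x * x >= n; x--) {
            long rest = n - x * x;                              // what must remain as y^2
            long y = isqrt(rest);
            if (y * y == rest) reps.add(new long[] {x, y});    // perfect-square test
        }
        return reps;
    }

    public static void main(String[] args) {
        long m = 65L * 2501L; // 162565
        for (long[] r : allRepresentations(m)) {
            System.out.println(m + " = " + r[0] + "^2 + " + r[1] + "^2");
        }
    }
}
```

The first three lines printed are `162565 = 402^2 + 31^2`, `162565 = 401^2 + 42^2` and `162565 = 399^2 + 58^2`; the last is `162565 = 303^2 + 266^2`.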
The first three sums of two squares can easily be found by decrementing from 403 until a perfect square remains. By applying Equations (1)–(5), we arrive at the following sequence of steps:
In an earlier work [30], the modified Euler factorization was given by
Consider the solutions (402,31), (401,42)
      
Consider the solutions (401,42), (399,58)
      
A factorization of 
In this way, by using the modified Euler factorization from previous work [30], the factors of a composite number can be found. This can be extended to composite numbers whose factorization is not known.
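The Java sketch below illustrates a standard, textbook Euler-style way of splitting a number given two sums of two squares; it is a stand-in chosen by us and is not necessarily identical to the modified Euler factorization of [30]. If $N = a^2 + b^2 = c^2 + d^2$ are two different representations, then $(ad - bc)(ad + bc) = N(d^2 - b^2)$, so $\gcd(N, ad - bc)$ is often a proper factor of $N$. Here $d$ denotes an integer in the representation, not the RSA private exponent, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

/**
 * Euler-style split (textbook form): if N = a^2 + b^2 = c^2 + d^2 are two different
 * representations, then (ad - bc)(ad + bc) = N(d^2 - b^2), so gcd(N, ad - bc) is often a proper factor.
 */
class TwoSumsFactor {
    /** Returns gcd(N, a*d - b*c); a result of 1 or N means this pair gives no split. */
    static BigInteger factor(BigInteger n, BigInteger a, BigInteger b, BigInteger c, BigInteger d) {
        BigInteger t = a.multiply(d).subtract(b.multiply(c));
        return n.gcd(t); // BigInteger.gcd is non-negative even if t < 0
    }

    static BigInteger v(long x) { return BigInteger.valueOf(x); }

    public static void main(String[] args) {
        System.out.println(factor(v(65), v(8), v(1), v(7), v(4)));          // 65 = 8^2 + 1^2 = 7^2 + 4^2
        System.out.println(factor(v(2501), v(50), v(1), v(49), v(10)));     // 2501 = 50^2 + 1^2 = 49^2 + 10^2
        System.out.println(factor(v(162565), v(402), v(31), v(401), v(42)));
        System.out.println(factor(v(162565), v(401), v(42), v(399), v(58)));
    }
}
```

`main` prints 5, 41, 61 and 65, that is, a factor of 65, of 2501, and two different factors of 162565 from the pairs of representations passed in.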
  Factorizing a Semi-Prime
With the increasing application of number theory in information security, the problem of factorizing a semi-prime, a positive integer $N$ with two prime factors $p$ and $q$ such that $N = pq$, has drawn the attention of researchers. Encryption algorithms such as RSA and other public-key cryptosystems rely on the large prime factors of a semi-prime remaining secret: a sender encrypts a message using the public semi-prime, and only the receiver, who knows the prime factors, can decode it. Even if the semi-prime is revealed, an interceptor must know both prime factors before the message can be decoded. Hence, with the evolution of information and communication technologies, a fast and efficient method for factorizing very large semi-primes is of great interest to mathematicians and information security researchers.
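With the current state of knowledge, we apply our proposed method of using the sum of squares (SoS) to factorize a semi-prime and illustrate our algorithm with case examples. In broad terms, our proposed algorithm for factorizing a semi-prime consists of three parts:
- N-SoS1, which finds a first sum of two squares for $N$;
- P-SoS2, which uses the first sum of squares to find a second one;
- E-SoS3, a modified Euler’s factorization using the two sums of squares, which yields $p$ and $q$.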
The N-SoS1 algorithm consists of the following seven key steps:
- Step 1. Generate a special number (the upscaling number).
- Step 2. Multiply the semi-prime $N$ by the special number.
- Step 3. Find the square root of ($N$ × special number).
- Step 4. Subtract the square of the integer part of the square root from the product.
- Step 5. Test the remainder for a perfect square.
- Step 6. If it is a perfect square, recover N-SoS1 for $N$ and GOTO P-SoS2.
- Step 7. If it is NOT a perfect square, increment $n$ and GOTO Step 1.
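Steps 2 to 5 can be read as a decrementing perfect-square search on the scaled product. The Java sketch below is our illustrative interpretation under stated assumptions, not the authors’ Supplementary implementation: the special number is passed in as a parameter `s` (a hypothetical name), the search starts at the integer square root of $N \times s$ and stops at the first remainder that is a perfect square, and Steps 1, 6 and 7 are not modelled. The example values $N = 2501$ and $s = 65$ are chosen only for illustration, and `BigInteger.sqrt()` requires Java 9 or later.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of Steps 2-5 as interpreted here: search x^2 + y^2 = N * s by
 * decrementing x from floor(sqrt(N * s)).
 */
class ScaledSosSearch {
    /** Returns {x, y, cycles} for the first representation found, or null within maxCycles. */
    static BigInteger[] search(BigInteger n, BigInteger s, long maxCycles) {
        BigInteger product = n.multiply(s);                      // Step 2: N x s
        BigInteger x = product.sqrt();                           // Step 3: integer square root
        for (long cycle = 1; cycle <= maxCycles && x.signum() >= 0; cycle++) {
            BigInteger rest = product.subtract(x.multiply(x));   // Step 4: remove x^2
            if (rest.compareTo(x.multiply(x)) > 0) break;        // y would exceed x: stop
            BigInteger y = rest.sqrt();
            if (y.multiply(y).equals(rest)) {                    // Step 5: perfect-square test
                return new BigInteger[] {x, y, BigInteger.valueOf(cycle)};
            }
            x = x.subtract(BigInteger.ONE);                      // next candidate square
        }
        return null;
    }

    public static void main(String[] args) {
        BigInteger[] hit = search(BigInteger.valueOf(2501), BigInteger.valueOf(65), 1_000);
        System.out.println("x=" + hit[0] + " y=" + hit[1] + " cycles=" + hit[2]);
    }
}
```

`main` prints `x=402 y=31 cycles=2`: 403 fails the perfect-square test and 402 succeeds, since $162565 - 402^2 = 961 = 31^2$.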
In the above algorithm, we take advantage of the special number having special attributes which, when known, allow N-SoS1 to be recovered using “simple” algebra as given below:
This number has special properties, to be discussed in a future paper. In short, it generates two numbers of the form:
Numbers whose squares are one apart enable special factorizing properties to be maintained. Hence, multiplying a semi-prime $N$ to be factored by such a number allows N-SoS1 for $N$ to be recovered. We conjecture that multiplying $N$ by this number leads to the determination of N-SoS1 more quickly. The algebra takes the form of 8 possible equations, as explained below:
We posit that once an SoS for the product of $N$ and the special number is found, N-SoS1 for $N$ can be recovered using one of 8 equations.
- Let  ×  =  
The result of each of these equations is then tested by computing its greatest common divisor (GCD) with the product, and if the GCD has the required value for any one (or two) of these equations, then N-SoS1 is recovered.
This is the essence of the N-SoS1 algorithm we have proposed in this work.
Once an SoS for the product ($N$ × special number) becomes known, a simple GCD test determines which of the 8 equations will yield N-SoS1 for $N$. Only a simple division is then required to yield the N-SoS1 solution for $N$. Once N-SoS1 is known, it is used to find P-SoS2. Further, once N-SoS1 and P-SoS2 are known, a modified Euler’s factorization using sum of squares (E-SoS3) yields $p$ and $q$, and hence factorization is achieved.
A Java implementation of N-SoS1 uses simple arithmetic operations such as +, −, * and / as well as GCD and is provided as Supplementary Resources.
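One standard way to realise a recovery step of this kind is Gaussian-integer division, sketched below in Java. We present it as an assumption-labelled illustration, not as the authors’ eight equations: if $N \times s = x^2 + y^2$ and a representation $s = a^2 + b^2$ is known, then whenever $s$ divides both $xa + yb$ and $ya - xb$ (or both $xa - yb$ and $ya + xb$), dividing by $s$ gives $u$ and $v$ with $u^2 + v^2 = N$. The method name `recover` is hypothetical, and the example uses $N = 2501$ with $s = 65$ purely for illustration.

```java
/** Recovers u^2 + v^2 = N from x^2 + y^2 = N * s, given known representations of s. */
class RecoverSos {
    static long[] recover(long x, long y, long s, long[][] sReps) {
        for (long[] rep : sReps) {
            long a = rep[0], b = rep[1];
            long[][] candidates = {
                {x * a + y * b, y * a - x * b},   // (x + iy)(a - ib) = s * (x + iy) / (a + ib)
                {x * a - y * b, y * a + x * b}    // (x + iy)(a + ib) = s * (x + iy) / (a - ib)
            };
            for (long[] c : candidates) {
                if (c[0] % s == 0 && c[1] % s == 0) {                       // divisibility test
                    return new long[] {Math.abs(c[0] / s), Math.abs(c[1] / s)}; // simple division
                }
            }
        }
        return null; // no candidate divisible by s
    }

    public static void main(String[] args) {
        long[][] repsOf65 = {{8, 1}, {7, 4}};
        long[] uv = recover(402, 31, 65, repsOf65); // 402^2 + 31^2 = 162565 = 2501 * 65
        System.out.println("2501 = " + uv[0] + "^2 + " + uv[1] + "^2");
    }
}
```

`main` prints `2501 = 49^2 + 10^2`. Checking divisibility by $s$ directly plays the role of the GCD test in this sketch.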
Once the first square becomes known, P-SoS2 can be avoided by decrementing the square from the integer square root, as shown in Case Example 3 below. However, Case Example 4 illustrates that, even for small numbers, not using P-SoS2 makes it unpredictable when the second SoS will be found. The RSA case illustrates that, as in Case Examples 3 and 4, a solution exists, but a poor parameter selection leads to an intractable result for RSA768.
Case Example 3: Consider a semi-prime  along with an upscaling number 
Let 
We provide the results obtained after search iterations as follows:
Iteration 1:
Iteration 16:
Iteration 19:
 (18 search cycles required)
	  
A factorization of 
From which the factorization of .
Case Example 4: Consider another semi-prime  along with an upscaling number 
Let 
We provide the results obtained after search iterations as follows:
Iteration 1: 
Iteration 14: 
Iteration 27:
Iteration 155:
 (154 search cycles required)
A factorization of 
However, this did not lead to the factorization of 
 and an additional sum of squares is required.
A factorization of 
Hence, from the above, the factorization of 
  4. Results
We employ our proposed method to demonstrate the factorization of a large RSA key. Keys of the commonly adopted 512-bit RSA size have been successfully factored several times using different factorization methods [22,23,24]. To take the challenge of RSA key attacks further, in this work we consider a larger key size, the 768-bit RSA key, and apply our proposed semi-prime factorization method with the enhanced sum of squares of primes.
  4.1. The Case of 768-bit RSA Key (RSA768)
Consider the semi-prime 
Let 
From the modified Euler factorization of previous work [30], we have
		
p1
The factorization of 
The RSA case illustrates that a solution exists, but a poor parameter selection leads to an intractable result for RSA768, and this selection can be better determined. As presented in this paper, as $n$ is incremented, a range of $m$ values is searched until a perfect square is found. The attributes of the parameter have so far been determined via experimentation, and the authors are currently characterizing it with a view to reducing the search field. This is an area of ongoing research.
  4.2. Performance and Comparison
In Table 1, we summarize the cost performance of our proposed extended sum of squares approach for factorizing semi-primes of different lengths (key sizes). For the various case examples of semi-primes computed in this work, including RSA768, we provide the cost factors of memory used and the non-optimized search time, measured in decrement loops of the square, for completing the semi-prime factorization using our method. If a linear search is used from the square root of
 the number of iterations is given thus. However, this can be significantly reduced by using 
 congruent to 
.
Table 1 demonstrates that the memory required by our algorithm is minimal and that the search cycle time is the main factor to be contained for large RSA keys. Note that each search cycle involves only six steps of rudimentary algebra (multiplication, addition, subtraction and division), with very low processing time. Next, we elaborate on the complexity of our algorithm and compare it with other factoring algorithms.
   4.2.1. Complexity of the Algorithm
We summarize the complexity of our proposed semi-prime factorization algorithm in terms of memory and computational time, using measures similar to those reported for existing methods in the literature [55].
The memory requirement of our algorithm is minimal, with most computations operating on the accumulator (using BigInteger arithmetic). The number of memory variables used for each major part and step of our algorithm is given below:
- The manipulation of the inputs to generate the scaled product requires three variables.
- The resultant SoS requires two variables.
- The recovery of N-SoS1 requires two variables.
- Hence, these requirements occupy seven BigInteger values in memory.
Examining the time complexity of our proposed algorithm, we find that each of the algorithm steps is rudimentary. Steps 3 and 5 use a square root function, which is the most expensive function in this algorithm. Only the integer part is needed, so a Newton–Raphson algorithm is used here. Step 6 uses a GCD function. The remaining parts are simply multiplication, addition, subtraction and division operations that require minimal processing time.
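The integer square root used in Steps 3 and 5 can be computed with Newton–Raphson iteration on `BigInteger`. The Java sketch below is one common formulation, offered as an illustration: it starts from a power of two above $\sqrt{n}$, iterates $x_{k+1} = \lfloor (x_k + \lfloor n/x_k \rfloor)/2 \rfloor$ while the value keeps decreasing, and returns $\lfloor\sqrt{n}\rfloor$. Since Java 9, `BigInteger.sqrt()` returns the same result, so a hand-written version is mainly useful on older platforms or for instrumentation.

```java
import java.math.BigInteger;

/** Integer square root by Newton-Raphson iteration on BigInteger, plus a perfect-square test. */
class NewtonIsqrt {
    static BigInteger isqrt(BigInteger n) {
        if (n.signum() < 0) throw new ArithmeticException("negative argument");
        if (n.signum() == 0) return BigInteger.ZERO;
        // Start above sqrt(n): 2^ceil(bitLength/2) > sqrt(n) because n < 2^bitLength.
        BigInteger x = BigInteger.ONE.shiftLeft((n.bitLength() + 1) / 2);
        while (true) {
            BigInteger y = x.add(n.divide(x)).shiftRight(1); // x_{k+1} = (x_k + n / x_k) / 2
            if (y.compareTo(x) >= 0) return x;                // stopped decreasing: x = floor(sqrt(n))
            x = y;
        }
    }

    static boolean isPerfectSquare(BigInteger n) {
        if (n.signum() < 0) return false;
        BigInteger r = isqrt(n);
        return r.multiply(r).equals(n);
    }

    public static void main(String[] args) {
        System.out.println(isqrt(BigInteger.valueOf(162565)));
        System.out.println(isPerfectSquare(BigInteger.valueOf(961)));
    }
}
```

`main` prints 403 and then `true`, since $961 = 31^2$.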
  4.2.2. Comparison to other Factoring Algorithms
Part 1 of our proposed factorization algorithm (N-SoS1) currently requires two input variables, one of which is usually small.
The current algorithm calculates  and each cycle of  creates a larger and larger range for .  when . For .
Currently, a brute-force search is conducted downwards from the integer square root until an SoS is found. This exhaustive search is no better than Fermat’s factorization method. However, Case Example 5 below illustrates the gain in search cycle time achieved by our proposed factorization method when the search is contained within a search field.
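For comparison, Fermat’s factorization method can be sketched in Java as follows. It is the textbook method, shown as an illustration rather than as part of our algorithm: starting from $a = \lceil\sqrt{N}\rceil$ and moving upwards, it looks for an $a$ such that $a^2 - N$ is a perfect square $b^2$, and then $N = (a - b)(a + b)$. The step limit `maxSteps` is our own assumption, and `BigInteger.sqrt()` requires Java 9 or later.

```java
import java.math.BigInteger;

/** Fermat's factorization: find a with a^2 - N = b^2, then N = (a - b)(a + b). N must be odd. */
class FermatSketch {
    static BigInteger[] factor(BigInteger n, long maxSteps) {
        if (!n.testBit(0)) throw new IllegalArgumentException("N must be odd");
        BigInteger a = n.sqrt();
        if (a.multiply(a).compareTo(n) < 0) a = a.add(BigInteger.ONE); // a = ceil(sqrt(N))
        for (long step = 0; step < maxSteps; step++) {
            BigInteger b2 = a.multiply(a).subtract(n);                 // candidate b^2
            BigInteger b = b2.sqrt();
            if (b.multiply(b).equals(b2)) {
                return new BigInteger[] {a.subtract(b), a.add(b)};
            }
            a = a.add(BigInteger.ONE);                                 // move upwards
        }
        return null;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(2501);
        BigInteger[] f = factor(n, 1_000);
        System.out.println(n + " = " + f[0] + " x " + f[1]);
    }
}
```

For $N = 2501$ the first candidate $a = 51$ already gives $51^2 - 2501 = 100 = 10^2$, so `main` prints `2501 = 41 x 61`. The contrast with the search above is that Fermat’s method moves upwards looking for a difference of squares, whereas the sum of squares search moves downwards from the square root.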
Case Example 5: When the search field is defined, the result is achieved in a much shorter overall search cycle time.
- A search value of 15325 generates the required perfect square.
- From this, an SoS is obtained.
- Further, N-SoS1 and P-SoS2 are obtained.
- Finally, the factors are obtained using E-SoS3.
Research is continuing to characterize this parameter so that a reduced search can be undertaken within a reduced search field. Our proposed semi-prime factorization method is simple to understand and to implement. It uses rudimentary multiplication, addition, subtraction and division operations that require minimal processing time, unlike other algorithms based on lattice reduction (LR) techniques, which require time-consuming operations. While this work is limited to an empirical study of our proposed method, formal proofs are extensive and are reserved for future work.
  5. Discussion
By factoring the modulus of an RSA public key, an attacker can compute the corresponding private key. Various studies have surveyed RSA key sizes and their limitations across public key infrastructures, and new methods for attacking the commonly used 512-bit and 768-bit RSA keys continue to interest researchers [19,56]. Most previous studies have examined attacks on RSA cryptosystems with a specific focus. For instance, partial key exposure (PKE) attacks were studied in [57,58]. A PKE attack on a low public-exponent key of a variant of the RSA public-key cryptosystem was found to be less effective than on standard RSA, whereas large public-exponent keys of the variant were reported to be more vulnerable to such attacks than those of standard RSA. Some studies have also comprehensively surveyed, at a practical Internet scale, vulnerabilities in RSA key generation for various protocols such as SSH, HTTPS and DKIM [19,20,21,59]. In this paper, however, we consider the number-theoretic properties of semi-primes, which form the underlying principle behind any RSA key generation, to be the fundamental cause of any RSA attack. For instance, choosing an RSA modulus whose prime factors have a small difference can improve private key attacks [60]. Some early studies considered polynomial equations to study low-exponent RSA vulnerabilities and their variants [61,62]. Recent number-theory-based studies in this domain have focused on lattice reduction (LR) techniques combined with prime number theory to study RSA key attacks [31,38,39]. Such LR-based techniques fall under the general category of Coppersmith attacks. The uniqueness of our contribution is that we propose factorization using the sum of squares, which is more in line with Euler’s sum of squares approach than with Coppersmith’s approach.
Whilst there is active research in LR methods, sum of squares remains an area of interest that continues to yield surprises, and it has been the motivation for this work. The subtleties in the value of the special number are an area of interest that we explored in this study. The special number used in this paper has special properties, which are described briefly here. It was constructed from two semi-primes, 65 and 2501, and each of their primes (5, 13, 41 and 61) has special properties. They are all congruent to 1 mod 4. Each is the sum of the squares of two consecutive integers, so that their squares are one apart (for example, $41 = 4^2 + 5^2$). In the case of both semi-primes, the larger square of the smaller prime is the smaller square of the larger prime (for example, $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$). Both semi-primes become perfect squares when 1 is subtracted ($65 - 1 = 8^2$ and $2501 - 1 = 50^2$). Multiplying the two semi-primes together created 8 sums of 2 squares, three of which are very close together (402, 401, 399). When such a number is multiplied by a semi-prime to be factored, 32 sums of 2 squares are available. As shown previously, only two sums of two squares are required to factor the semi-prime in question, so this increases the likelihood of finding two such sums of squares. Equations (4)–(8) describe the general form of the special number. In this case, the special number is $65 \times 2501 = 162565$.
It may appear counter-intuitive to make the semi-prime $N$ to be factored larger; however, this is offset by the increased likelihood of finding two sums of two squares near the square root of the scaled product. In the case of RSA768, the first solution was some way off from the square root. The search can be restricted by considering only congruent co-prime sums of squares, consistent with a previous study [30]. In this way only sums of two squares which approximate
 need be tested and of those, only the ones equal to 
yield a valid solution. This substantially reduces the search cost. Since a linear search is conducted, the search field can easily be partitioned, which lends itself very well to parallel processing. The mathematics in our approach is rudimentary, consisting of multiplications, additions and subtractions. The perfect square test does require a square root, which is computationally costly, but this too can be avoided if the approach in [30] is used to find solutions equivalent to
 all of which are congruent to 
.
Overall, the search area to find the first sum of squares is key to finding the second. Hence, the search is focused on finding the first sum of two squares. The relationship between the two squares of the sum of two squares of the semi-prime 
 determines the spread of, and the likelihood of, finding the first sum of two squares for 
. 
Table 1 shows the search cycle time for different key sizes explored in this work. The search area can be limited by 
 and 
. This can be narrowed by considering the properties of 
 and the closeness of the sums of squares of 
 (402, 401, 399). The smearing of 
 across 
 by 
 determines the search field. This field is then divided over a number of parallel processes running a very simple (but fast) 
 algorithm. This search field to find the first sum of two squares is an area of ongoing research.