New Semi-Prime Factorization and Application in Large RSA Key Attacks

Abstract: Semi-prime factorization is an increasingly important number theoretic problem, since it is computationally intractable. This property has been applied in public-key cryptography, such as the Rivest–Shamir–Adleman (RSA) encryption systems for secure digital communications. Hence, alternate approaches to solve the semi-prime factorization problem are proposed. Recently, Pythagorean tuples have been explored to factor semi-primes in light of Fermat's Christmas theorem, with the two squares having opposite parity. This paper is motivated by the property that, when the integer separating these two squares is odd, the search for a semi-prime factorization is reduced by half. In this paper, we prove that if a Pythagorean quadruple is known and one of its squares represents a Pythagorean triple, then the semi-prime is factorized. The problem of semi-prime factorization is thus reduced to the problem of finding only one such sum of three squares. We modify the Lebesgue identity as the sum of four squares to obtain four sums of three squares. These are then expressed as four Pythagorean quadruples. The Brahmagupta–Fibonacci identity reduces these four Pythagorean quadruples to two Pythagorean triples. The greatest common divisors of the sides contained therein are the factors of the semi-prime. We then prove that to factor a semi-prime, it is sufficient that only one of these Pythagorean quadruples be known. We provide the algorithm of our proposed semi-prime factorization method, highlighting its complexity and the comparative advantage of its solution space over Fermat's method. Our algorithm has the advantage when the factors of a semi-prime are congruent to 1 modulo 4. Illustrations of our method for real-world applications, such as factorization of the 768-bit number RSA-768, are provided. Further, the computational viabilities, despite the mathematical constraints and the unexplored properties, are suggested as opportunities for future research.


Introduction
Prime numbers have caught the attention of mathematicians since the work of Euclid due to their unfathomable structural properties. This paper leverages some elegant properties of prime numbers beyond their basic definition of being divisible by themselves and one only. They also possess the property of being randomly distributed, which is not exploited fully [1][2][3]. The current perception is that there appears to be only a limited understanding of their underlying structure, and several mathematicians are constantly trying to uncover the mysteries behind these prime numbers. There is still much to be carried out, and areas of further interest are channeled towards a better understanding of the structure of primes for arriving at faster prime number generating algorithms and faster solutions to the prime factorization problem [4][5][6][7]. There is a need for generating more robust primes that are less susceptible to known factorization methods. In this paper, we draw attention to creating new approaches for the prime factorization of large prime factors or the semi-prime factorization problem. The focus of this research problem has an impact worldwide due to its practical application in digital communication, and in particular, the associated information security attacks and privacy challenges of today [8,9].
In information security, large semi-primes have applications in encryption algorithms. They are used for generating public keys and private keys, such as in the Rivest-Shamir-Adleman (RSA) cryptosystems [10]. The property that the prime factorization of large numbers is a challengingly difficult task is well utilized in RSA-based encryption algorithms. Due to the dominant application of the RSA public-key primitive in cryptography, the security of RSA has been extensively analyzed for various attack scenarios [11]. In this paper, we take a modest step further by proposing new methods of semi-prime factorization of the RSA primitive.
Previous research [1] proved that the semi-primes can be represented as the sum of four squares. A new factorization method, exploiting the relationship among the four squares, was proposed as a faster alternative to Euler's method, as given in [12]. The purpose of this paper is to explore the topic of factorization further with a computationally simple approach for applications in RSA cryptography. In earlier work [1], we showed that a semi-prime $N$ could be constructed from two primes, $p_1$ and $p_2$, in accordance with Fermat's Christmas theorem, as given in [13]. In other words, Fermat stated that an odd prime $p$ can be represented as the sum of two squares of integers $x_1$ and $x_2$, $p = x_1^2 + x_2^2$, if and only if $p \equiv 1 \pmod 4$. This determined which primes can be represented as the sum of two squares and was later proved by Euler [12]. The semi-prime product of two such primes also satisfies $p_1 p_2 \equiv 1 \pmod 4$. This paper advances further by applying the Brahmagupta-Fibonacci identity to establish that the product of these two primes (which is a sum of four squares) can be mathematically reduced to two sums of two squares [14]. In other words, we prove mathematically how the Brahmagupta-Fibonacci identity reduces the four Pythagorean quadruples to two Pythagorean triples. Hence, our proposed factorization method based on this property is applicable whenever the semi-prime, a product of two primes, is such that both factors are congruent to 1 modulo 4.
In this work, we leverage the gaps found in the literature towards providing a novel proposal for semi-prime factorization. While there are several properties of Pythagorean triples, new patterns based on these properties are yet to be researched in the context of semi-prime factorization [15]. The main contributions of the paper are envisaged via the key features of our proposed factorization method, as listed below:
i. The novel semi-prime factorization method uses simple number theory in a way that is the first of its kind;
ii. The method applies new patterns of Pythagorean tuples and triples that are unexplored so far in the literature;
iii. By employing simple arithmetic operations, the semi-prime factorization algorithm assures a low order of computing cost;
iv. The algorithm exhibits an enhanced solution space as compared to Fermat's method.
The paper is organized as follows: Section 2 provides related works, and Section 3 postulates the background theory of our proposed new method, which includes definitions, a theorem, and four lemmas. In Section 4, we show how to factorize semi-primes using Pythagorean quadruples and triples by proving the theorem and the four lemmas. Further, we provide the algorithm of our proposed semi-prime factorization method and qualitatively summarize its complexity, comparison, and constraints with respect to existing similar works. In Section 5, case study examples numerically illustrate the ease of computing the factors of a semi-prime integer with the proposed approach. Moreover, we demonstrate the application of the method to RSA-768. Finally, we draw key conclusions and future research directions in Section 6.

Related Works
Several studies on the important problem of semi-primes factorization have challenged the intractability of RSA [16][17][18]. Since the first RSA cryptanalytic attack by Wiener in 1990 using the continued fractions method [19], several methods have been explored, such as the lattice reduction approach by Coppersmith [20] and by Blomer and May [21], as well as the Boneh and Durfee attack on short decryption exponents of RSA [22]. More recent works approach the problem differently, with varying purposes, such as to find weak RSA keys for transport layer security (TLS) attacks [23], LogJam [24], or to factorize RSA keys for smartcard cryptography attacks on several devices [25].
Historically, general-purpose factorization of integers can be dated back to the continued fraction factorization method (CFRAC) introduced by Lehmer and Powers in 1931 [26], which was later implemented as a computer algorithm by Morrison and Brillhart [27]. Subsequently, Pomerance and Wagstaff devised an improved algorithm [28], and such works, including Pollard's ρ algorithm [29] and Pomerance's quadratic sieve method (1985) [30], laid the foundation for generating interest in general-purpose factorization of integers in various applications. In the context of RSA cryptography applications, recent works have considered such known factorization methods to focus on security parameters such as the length of the prime factors $p_1$ and $p_2$ of the RSA modulus $n = p_1 p_2$, or other structural properties of the primes. Modifications of existing methods have become popular recently, such as using the prime sum $p_1 + p_2$ with sublattice reduction techniques and Coppersmith's methods [31] or using a small prime difference $p_1 - p_2$ with Wiener's original method [32]. Lenstra's elliptic-curve method also serves as the state-of-the-art proposal for several subsequent studies [33]. A recent work has considered Cheng's $4p - 1$ method [34] to provide simplified and asymptotically deterministic versions, as it is similar to the well-known Lenstra's methods [35]. The study analyzed existing methods as the means of a potential backdoor for the RSA primes generated.
A survey of the literature shows that the factorization problem of prime numbers is gaining research popularity, with a recent focus towards developing efficient mathematical techniques that are computationally faster and simpler [36][37][38][39]. Recent research interest in polynomials, which generate sums of squares, has featured applications to cryptography [40][41][42][43][44][45]. In this context, our previous works leverage the semi-prime representation as the sum of four squares [1,46] with an enhancement to Euler's method [47,48]. Such a factorization method focusing on the special form of primes allows for an efficient factorization of RSA moduli. Thus, an adversary is motivated to subvert the prime generation to produce such RSA keys and could serve as a backdoor. With this view, we propose a new semi-prime factorization based on unique properties of Pythagorean triples with new mathematical theories and underlying patterns that are unexplored so far, with the purpose of advancing our research further in this direction.

Background Theory
This section provides the background theory that forms the mathematical foundations of this research work. Mathematical proofs, along with a summary of the definitions and a theorem with the supporting lemmas that are used for our proposed semi-prime factorization approach, are given below.
Let us consider the Brahmagupta-Fibonacci identity [14], which expresses the product of two primes, each a sum of two squares, as a sum of two squares in two different ways, with the following mathematical representation:

Jacobi [2] provides various sums of four squares for a particular number. Among these, Equation (1) is a special case in the set of possible $r_4(n)$ solutions.
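As a numerical illustration of the identity, the sketch below uses the standard textbook form $(a^2+b^2)(c^2+d^2) = (ac-bd)^2 + (ad+bc)^2 = (ac+bd)^2 + (ad-bc)^2$. The symbols $a, b, c, d$ and the helper names `two_squares` and `brahmagupta_fibonacci` are assumptions introduced here for illustration and are not the paper's notation. The brute-force search for the two squares is only meant for small primes.

```python
from math import isqrt

def two_squares(p):
    """Return (x1, x2) with x1 <= x2 and x1^2 + x2^2 == p, or None if no such pair exists."""
    for x1 in range(0, isqrt(p // 2) + 1):
        rest = p - x1 * x1
        x2 = isqrt(rest)
        if x2 * x2 == rest:
            return x1, x2
    return None

def brahmagupta_fibonacci(a, b, c, d):
    """Two ways of writing (a^2 + b^2)(c^2 + d^2) as a sum of two squares."""
    first = (abs(a * c - b * d), a * d + b * c)
    second = (a * c + b * d, abs(a * d - b * c))
    return first, second

# Illustrative primes, both congruent to 1 modulo 4 (hypothetical small example)
a, b = two_squares(13)   # 13 = 2^2 + 3^2
c, d = two_squares(29)   # 29 = 2^2 + 5^2
reps = brahmagupta_fibonacci(a, b, c, d)
print(reps)  # the two representations of 13 * 29 = 377
```

For the semi-prime 377 this gives the two sums of two squares $11^2 + 16^2$ and $19^2 + 4^2$. A prime congruent to 3 modulo 4, such as 7, yields `None` from `two_squares`.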
Let us consider Lagrange's four-square theorem, also known as Bachet's conjecture [49], which states that every natural number can be represented as the sum of four integer squares with the following mathematical representation: $n = y_1^2 + y_2^2 + y_3^2 + y_4^2$, where $y_k$ are integers. (3)

According to Legendre's three-square theorem [50], a natural number $n$ can be represented as the sum of three squares of integers if and only if $n$ is not of the form $4^a(8b+7)$ for non-negative integers $a$ and $b$.
From Lebesgue's identity [51], the square of the sum of four squares can be given by the sum of three squares and can be represented as follows:

Definitions 1. The definitions of Pythagorean triple and Pythagorean quadruple are given below [52].

A Pythagorean triple is an ordered triple of distinct positive integers $(\alpha, \beta, N)$ with a mathematical notation as follows [48]: $N^2 = \alpha^2 + \beta^2$, $\alpha \neq \beta$. (5)

A Pythagorean quadruple is an ordered quadruple of positive integers $(z_1, z_2, z_3, N)$ expressed mathematically as follows: $N^2 = z_1^2 + z_2^2 + z_3^2$. (6)

In this paper, we prove the following theorem with some lemmas:

Theorem 1. If a Pythagorean quadruple has a square of a triple and N is semi-prime, then N can be factored.
Lemma 1. The square product of two primes (each sum of two squares) is four sums of three squares.

Lemma 2. The square of a semi-prime is four sums of two squares.

Lemma 3. The greatest common divisor of all the two squares factors the semi-prime.

Lemma 4. One of the squares of a Pythagorean quadruple represents a Pythagorean triple.
Lemma 1 produces the four Pythagorean quadruples. Lemma 2 reduces the four Pythagorean quadruples to two Pythagorean triples. Lemma 3 factors the semi-prime. Lemma 4 reduces a Pythagorean quadruple to a triple. Overall, the theorem shows that it is sufficient to only find such a Pythagorean quadruple to factor the semi-prime.
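The chain from a sum of four squares to a factor can be checked numerically on a small case. The sketch below assumes the standard form of Lebesgue's identity, $(w^2+x^2+y^2+z^2)^2 = (w^2+x^2-y^2-z^2)^2 + (2wy+2xz)^2 + (2wz-2xy)^2$; the variable names and the function `lebesgue_quadruple` are hypothetical and chosen here for illustration only. The input uses $377 = 13 \cdot 29$ with $13 = 2^2+3^2$ and $29 = 2^2+5^2$, so that $377 = 4^2 + 10^2 + 6^2 + 15^2$.

```python
from math import gcd, isqrt

def lebesgue_quadruple(w, x, y, z):
    """Map a sum of four squares n = w^2+x^2+y^2+z^2 to a triple (t1, t2, t3)
    with t1^2 + t2^2 + t3^2 == n^2, using the standard Lebesgue identity."""
    n = w * w + x * x + y * y + z * z
    t1 = abs(w * w + x * x - y * y - z * z)
    t2 = abs(2 * w * y + 2 * x * z)
    t3 = abs(2 * w * z - 2 * x * y)
    assert t1 * t1 + t2 * t2 + t3 * t3 == n * n  # sanity check of the identity
    return t1, t2, t3, n

# One ordering of the four squares of 377 (other orderings give other quadruples)
t1, t2, t3, n = lebesgue_quadruple(4, 15, 10, 6)

# Two of the three squares sum to a perfect square, which yields a Pythagorean triple
alpha = isqrt(t1 * t1 + t3 * t3)
assert alpha * alpha == t1 * t1 + t3 * t3
beta = t2
factor = gcd(alpha, beta)
print((t1, t2, t3, n), (alpha, beta), factor)
```

Here the quadruple is (105, 260, 252, 377); since $105^2 + 252^2 = 273^2$, the triple (273, 260, 377) follows, and $\gcd(273, 260) = 13$ is a factor of 377.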

Proposed Method for Factoring Semi-Primes
The inherent hardness of finding the prime factors of large semi-primes forms the fundamental premise for RSA robustness [53]. However, this holds good until there is an efficient method found to compute the unknown prime factors of RSA keys, which are essentially large semi-prime numbers [54]. While several methods for integer factorization exist in general [55][56][57], none of them focus on semi-prime factorization relating to RSA cryptosystems. Hence, in this paper we will take this challenge and focus on solving the semi-prime factorization problem with a proposed reduced problem approach. Through the rest of the paper, it will be shown that the problem can be reduced to finding only one suitable sum of three squares to factorize a semi-prime. We provide a mathematical proof for our proposed approach in this section.
Proposed method: "The sum of four squares factorization" generates $r_4(n)$ solutions [48] (Equation (3)), of which only one solution is applicable to the Brahmagupta-Fibonacci identity, namely Equations (1) and (2). The remaining solutions do not lead to solutions for the semi-prime factorization. Some simple case examples are given below.
Once the two sums of two squares are known, the prime factors can be found using a modified Euler factorization [1].
Restating Theorem 1. If a Pythagorean quadruple has a square of a triple and N is semi-prime, then N can be factored. Some lemmas required are provided below.
Lemma 1. The square product of two primes (each sum of two squares) is four sums of three squares.
Proof. Consider the Lebesgue identity. The square of the sum of four squares can be given by the sum of three squares. From Equation (4), we have: The four sums of three squares are thus: The four Pythagorean quadruples from Equations (9)-(12) are given as:

Lemma 2. The square of the semi-prime is four sums of two squares.

Proof. We have the following product of two sums of squares: Using the Brahmagupta-Fibonacci identity, we obtain: We make use of Equations (7) and (8), and we have the four sums of two squares for $(c_1 c_2)^2$, as follows: Two of the four Pythagorean triples from Equations (18) and (19) have no common factors. The other two of the four Pythagorean triples, which are factorable from Equations (7) and (8), are given by:

Lemma 3. The greatest common divisors of the sums of two squares factor the semi-prime.
Proof. Consider the semi-prime $n = c_1 c_2$ and assume that $c_1, c_2 \equiv 1 \pmod 4$, i.e., $c_1$ and $c_2$ are each the sum of two squares; then we have: From Equation ( The greatest common divisors of the sums of two squares are the factors of the semi-prime.
Lemma 4. There exists a Pythagorean quadruple, a square of which represents a triple.
Proof. From Pythagorean quadruple Equation (6) given by $N^2 = z_1^2 + z_2^2 + z_3^2$ and Pythagorean triple Equation (5) given by $N^2 = \alpha^2 + \beta^2$, $\alpha \neq \beta$, we have the following: From Equation (13), it can be seen that:

Theorem 1. If a Pythagorean quadruple has a square of a triple, the semi-prime can be factored.

Proof. From Lemma 1, we have:
The four Pythagorean quadruples from Equation (13) are given as: From Lemma 4, there exists a Pythagorean quadruple, a square of which represents a triple.

Algorithm of Our Proposed Semi-Prime Factorization
Cryptographic algorithms make use of standard integer factorization algorithms found in the literature, such as Pollard's factoring algorithm, Lenstra's elliptic curve factorization algorithm, and others with different kinds of number sieves [58,59]. For our algorithm to factor semi-primes, we capitalize on the properties of semi-prime representations as the sum of three squares, which is supported by well-proven theorems [60]. Based on the theoretical background postulated in this work, our algorithm consists of four key steps, as given below:
Step 1. Square the semi-prime to be factored: $N^2$.
Step 2. Find sums of three squares such that: $N^2 = z_1^2 + z_2^2 + z_3^2$.
Step 3.2 Test for perfect square // use a square root function
Step 4. Calculate and output the semi-prime factors $c_1$ and $c_2$ such that $N = c_1 c_2$
Step 4.1 $c_1 = \gcd(\alpha, \beta)$ // use gcd function
Step 4.2 $c_2 = N / c_1$
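The steps above can be sketched for small semi-primes as follows. This is a minimal brute-force illustration, not the authors' implementation: the exhaustive enumeration of sums of three squares, the choice to test every pair of squares in a quadruple for a perfect-square sum, and the function names `is_square` and `factor_semiprime` are all assumptions made here. Python integers are arbitrary precision, which plays the role of BigInteger arithmetic.

```python
from math import gcd, isqrt

def is_square(m):
    """Perfect-square test using an exact integer square root."""
    r = isqrt(m)
    return r * r == m

def factor_semiprime(N):
    """Search Pythagorean quadruples (z1, z2, z3, N) for one in which two
    squares sum to a perfect square alpha^2, then return gcd-based factors."""
    N2 = N * N                                   # Step 1: square the semi-prime
    for z1 in range(1, N):
        for z2 in range(z1, N):
            rest = N2 - z1 * z1 - z2 * z2
            if rest < z2 * z2:                   # keep z1 <= z2 <= z3
                break
            z3 = isqrt(rest)
            if z3 * z3 != rest:                  # Step 2: need N^2 = z1^2 + z2^2 + z3^2
                continue
            # Assumed reading of Step 3: look for a pair whose squares sum to a square
            for u, v, w in ((z1, z2, z3), (z1, z3, z2), (z2, z3, z1)):
                s = u * u + v * v
                if is_square(s):
                    alpha, beta = isqrt(s), w    # (alpha, beta, N) is a Pythagorean triple
                    c1 = gcd(alpha, beta)        # Step 4.1
                    if 1 < c1 < N:
                        return c1, N // c1       # Step 4.2
    return None

print(factor_semiprime(377))
```

The search cost of this sketch grows roughly quadratically in N, so it is only suitable for demonstrating the logic on toy inputs such as 377.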

Complexity, Comparison and Constraints of Our Algorithm
We summarize the complexity of our proposed semi-prime factorization algorithm in terms of memory and computational time. These complexity measures are followed in similar lines to existing methods reported in the literature [61]. The memory requirement of our algorithm is very minimal, with most computations operating on the memory variables (N, x, a, b, α, β), which use BigInteger arithmetic. In terms of time complexity, our method compares favorably against Fermat's factorization method, in that only one solution exists in Fermat's method. As was shown in Case Example 2, our method has identified 45 solutions, of which 11 could lead directly to a factorization and the others may as well, if their sums of squares are part of a tree that reveals other sums of squares.
In our algorithm, the initial step of squaring the number to be factorized could be a constraint for large semi-prime numbers. Further, the method is probabilistic and is in many ways comparable to the stochastic nature of finding a sum of two squares. However, the advantage of our algorithm is that there are many possible solutions in the set of the sum of three squares, making it more likely that such a solution also leads towards finding a sum of two squares. Moreover, once a sum of three squares is known, the squares themselves form trees, which will be explored in future work. For instance, in Case Example 2 discussed earlier, 45 possible sums of squares were obtained. Among these, 11 created direct solutions to the factorization of the semi-prime, so our method provides more possibilities in the solution space. By contrast, using common approaches such as Fermat's, only two sums of squares exist for the semi-prime. Therefore, using our method, the probability of finding a solution is greatly enhanced; in this case, it is roughly six times more likely than with the common methods (11 direct solutions against 2). Further, when traversing the tree of squares, if any of the 45 possible solutions is discovered, it may lead to a factorization solution, which is about 22 times more likely (45 against 2). These trees provide additional sums of three squares, some of which will lead to a sum of two squares. Once a sum of two squares is known, the search for the second sum of squares is contained in a subset, thereby reducing the search space via the efficient use of polynomials to factorize semi-primes quickly [1].
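For reference, the comparison baseline can be sketched as the classical Fermat difference-of-squares search, which looks for the single pair $(a, b)$ with $N = a^2 - b^2$. This is the textbook method rather than code from the paper, the function name `fermat_factor` is chosen here for illustration, and the sketch assumes that N is odd.

```python
from math import isqrt

def fermat_factor(N):
    """Classical Fermat factorization for odd N: find a, b with N = a^2 - b^2."""
    a = isqrt(N)
    if a * a < N:
        a += 1                      # start at ceil(sqrt(N))
    while True:
        b2 = a * a - N
        b = isqrt(b2)
        if b * b == b2:             # a^2 - N is a perfect square
            return a - b, a + b
        a += 1

print(fermat_factor(377))  # 21^2 - 8^2 = 377
```

For 377 the search succeeds at $a = 21$, $b = 8$, giving the factors 13 and 29.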

Case Study Examples Applied to RSA Key Factorization
In this section, we illustrate the application of our proposed semi-prime factorization method using case study examples. An application area of particular interest in considering specific variants of RSA is how both small and large encryption keys perform within our proposed factorization approach. With the evolution of the Internet of things (IoT), the emergence of lightweight cryptography is on the rise. However, due to the relatively low computational power of personal devices, malicious attacks are recently targeting IoT networks [62][63][64]. Since the security of cryptographic operations in both small keys as well as large keys depends upon whether the semi-prime factorization can be solved efficiently, we consider one application example for each of the two scenarios.
Application Example 1. Let us consider Case Example 2 with N = 377.
The 45 Pythagorean quadruples are: (12,81,368,377), (12,108,361,377), (12,156,343,377), (12,224,303,377), (15,252,280,377), (17,144,348,377), (17,192,324,377), (24,143,348,377), (24,177,332,377), (28,72,369,377), (28,252,279,377), (39,72,368,377), (39,208,312,377), (44,207,312,377), (44,228,297,377), (64,252,273,377), (72,199,312,377), (72,252

Application Example 2. In computer security applications, the aim of a cryptosystem is to encrypt a message before communication so that only the authenticated user who has the right key is able to decrypt the message, thereby enforcing message integrity, confidentiality, authentication, and privacy. Among several ciphers developed for this purpose, RSA is widely used in key exchange, digital signatures, and the encryption of small blocks of data. For the second application example, we consider RSA where the key is derived from a very large number (a semi-prime). Since determining the two prime factors of the large number is computationally difficult, research studies on evaluating RSA and the factorization schemes are gaining attention [65][66][67]. Hence, we describe how our proposed semi-prime factorization method can be applied to factorize RSA moduli.
In the RSA scheme, the public key $(N, e)$ is distributed for encrypting a message, while the private key $(N, d)$ is kept secret so that only an authorized entity can decrypt the message. Hence, based on this RSA scheme, it is evident that with the factorization of $N$, we can compute $d$ from $de \equiv 1 \pmod{\varphi(N)}$ [68], which gives the private key $(N, d)$. When the factorization is efficient, RSA can be broken. The factoring challenge was introduced to identify the safety limits of the key length to be used for the RSA encryption algorithm that can ensure information security. Hence, researchers focus on mathematically proving the cryptanalytic strength using efficient RSA factorization methods. Table 1 demonstrates the application of our proposed semi-prime factorization method for a key length of 768 bits (232 decimal digits), denoted as RSA-768.

The sums of squares and polynomials have been explored for semi-prime factorization in previous research works [1,46,69]. However, more than 50 properties of Pythagorean triples have been reported, and new patterns are yet to be explored [15,70]. In this research work, the proposed method applies such patterns, unexplored so far in the literature, and maintains the algorithm's order of computational complexity, similar to the existing approaches that have been evaluated and reported recently [71,72]. A recent experimental study employed the representation of primes in the form $p = 6x \pm 1$ and applied the theory to the RSA factorization problem [73]. Another work shows the decomposition of the two prime numbers with the Pisano period factorization method, which has been proven to be a subexponential complexity method [74]. Several integer factorization methods have also suggested direct application to cryptanalysis of RSA by applying different genetic algorithms [75]. While genetic algorithms could be a promising avenue of research for integer factorization, they are computationally complex. This paper's focus on simple arithmetic operations, while exploring unique structures of prime numbers, has resulted in an efficient factorization.
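To make the consequence of a successful factorization concrete, the sketch below recovers a private exponent from known factors using the relation $de \equiv 1 \pmod{\varphi(N)}$. The toy modulus 377, the public exponent 5, the message 42, and the function name `private_exponent` are illustrative assumptions; real RSA keys use much larger parameters and padding. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def private_exponent(p, q, e):
    """Given the prime factors p, q of N and the public exponent e,
    return d with d * e == 1 (mod phi(N)), where phi(N) = (p - 1)(q - 1)."""
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)          # modular inverse (Python 3.8+)

# Toy example (hypothetical values): N = 13 * 29 = 377, e = 5
p, q, e = 13, 29, 5
N = p * q
d = private_exponent(p, q, e)

message = 42
ciphertext = pow(message, e, N)     # encryption with the public key (N, e)
recovered = pow(ciphertext, d, N)   # decryption with the recovered key (N, d)
print(d, ciphertext, recovered)
```

Once the factors are known, the only remaining work is a single modular inversion, which is why the security of the scheme rests on the difficulty of the factorization step.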
In our computationally simple method, the need to square the number that is to be factorized could be considered a constraint when large semi-primes are to be attacked quickly. An example for RSA-768 is given below, illustrating our method of squaring the number to achieve semi-prime factorization. Future work will be devoted to how this inherent mathematical constraint can be overcome by reducing the solution search space. One method is to use low order prime multipliers congruent to 1 mod 4 (for example, 5 and 13) to increase the likelihood of finding sums of three squares (and two squares) without the complexity of first squaring the number to be factored. An illustration of the complexity is given below. Suitable values of $\beta$ are sparse, and a solution becomes less likely to be found for larger semi-primes, as shown below. Future work should explore reducing the search space implied by squaring the semi-prime to be factored.

Table 1. Quantities computed when applying the proposed method to RSA-768:
$c_1 c_2$ = RSA768
$(c_1 c_2)^2$
$\beta$
$\alpha = \sqrt{(c_1 c_2)^2 - \beta^2}$
$c_1 = \gcd(\alpha, \beta)$
$c_2 = \mathrm{RSA768} / c_1$
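The sequence of quantities in Table 1 can be traced on a toy modulus. The sketch below assumes a candidate $\beta$ is already available and only performs the final steps: $\alpha = \sqrt{N^2 - \beta^2}$, $c_1 = \gcd(\alpha, \beta)$, $c_2 = N / c_1$. The function name `factor_from_beta` and the exhaustive scan over $\beta$ are illustrative assumptions, used here only to show how sparse the useful values are even for $N = 377$.

```python
from math import gcd, isqrt

def factor_from_beta(N, beta):
    """Return (c1, c2) if N^2 - beta^2 is a perfect square alpha^2 and
    gcd(alpha, beta) is a non-trivial factor of N; otherwise return None."""
    alpha2 = N * N - beta * beta
    alpha = isqrt(alpha2)
    if alpha * alpha != alpha2:
        return None                 # (alpha, beta, N) is not a Pythagorean triple
    c1 = gcd(alpha, beta)
    if 1 < c1 < N:
        return c1, N // c1
    return None                     # primitive triple: no factor revealed

N = 377
hits = [beta for beta in range(1, N) if factor_from_beta(N, beta)]
print(hits, factor_from_beta(N, 348))
```

For this toy modulus only 4 of the 376 candidate values of $\beta$ reveal a factor, which is consistent with the sparsity described above.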

Conclusions and Future Research
In this paper, we proposed a new method for semi-prime factorization and emphasized its key contribution in the context of information security underpinned by the RSA cryptosystem of the current digital world. Some new ideas have resulted in a breakthrough in factoring the RSA-129 challenge number, but this was possible only after several years. Our novel method follows from the proof that the sum of four squares of a semi-prime $c_1 c_2$ has many solutions, but only one solution leads to factorization. The validation is fast, and the method uses a binary greatest common divisor approach with simple arithmetic operations to find the sum of two squares of one (or both) of the prime factors.
The sum of two squares has only two solutions and both are valid, though hard to find. Once these are known, previous work has proved that a modified Euler factorization can easily determine the prime factorization [1]. This paper extends that work by considering the sum of three squares, which has many solutions. However, the semi-prime must first be squared, resulting in larger numbers to be processed. This is offset by the abundance of suitable solutions that lead to a factorization without affecting the order of computational complexity. The algorithm and the case examples have demonstrated the simplicity of our proposed method and its enhanced solution space, as compared to Fermat's method. The complexity of our proposed approach was demonstrated using numerical illustrations, including the factorization of the 768-bit number RSA-768.
It is noted that for extremely large semi-primes, the search space may be constrained with the need to square the semi-prime. One approach to address this is highlighted and forms the key motivation for future research. In this context, one of the properties of semi-primes that forms a motivation for future research is given as follows: once the sum of three squares is known, the squares themselves form trees. Hence, reducing the solution search space of these trees for such cases, using our earlier associated research work, will be quite promising to explore.