1. Introduction
Integer factorization (IF) is the problem of decomposing a composite number into a product of smaller integers, ideally primes. More formally, given a positive integer N, the goal is to find its prime factors $p_1, p_2, \ldots, p_k$ such that $N = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, where $e_1, e_2, \ldots, e_k$ are positive integers. IF is a hard problem on a classical computer, particularly when N is the product of two large prime numbers [1]; no polynomial-time classical algorithm for IF is known to date.
The problem plays a central role in the design and security of several public-key (asymmetric) cryptosystems, such as the Rivest–Shamir–Adleman (RSA) system [2]. RSA encryption uses modular exponentiation with the public key $(e, N)$, computing $c = m^e \bmod N$. Decryption, on the other hand, requires the private key d, the modular inverse $d \equiv e^{-1} \pmod{\varphi(N)}$, where $N = pq$ and $\varphi(N) = (p-1)(q-1)$ is Euler’s totient function; it computes $m = c^d \bmod N$. Since computing $\varphi(N)$ without factoring N is infeasible, RSA remains secure as long as integer factorization is computationally difficult [3]. Conversely, knowing the private key d makes it possible to factor the modulus N [4].
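The following toy sketch (Python, with the textbook-sized primes 61 and 53 and e = 17, chosen only for illustration and far too small for real use) shows the relations above. It also illustrates why $\varphi(N)$ must stay secret: for $N = pq$, knowing $\varphi(N)$ gives $p + q = N - \varphi(N) + 1$, so p and q are the roots of a quadratic. Recovering p and q from d itself, as in [4], needs a different (randomized) procedure that is not shown here.

```python
from math import isqrt


def toy_rsa(p: int, q: int, e: int):
    """Return (N, phi, d) for a toy RSA key built from the primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)           # Euler's totient of N = pq
    d = pow(e, -1, phi)               # private exponent: e*d = 1 (mod phi), Python 3.8+
    return n, phi, d


def factor_from_phi(n: int, phi: int):
    """Recover (p, q) from N and phi(N) by solving x^2 - (N - phi + 1)x + N = 0."""
    s = n - phi + 1                   # s = p + q
    disc = s * s - 4 * n              # disc = (q - p)^2
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("phi is not consistent with N = pq")
    return (s - r) // 2, (s + r) // 2


n, phi, d = toy_rsa(61, 53, 17)
m = 65
c = pow(m, 17, n)                     # encryption with the public key (e, N)
assert pow(c, d, n) == m              # decryption with the private key d
print(n, phi, d, factor_from_phi(n, phi))
```

Running it prints `3233 3120 2753 (53, 61)`.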
Over the decades, various algorithms have been developed to solve the IF problem, which can be classified as (1) deterministic or probabilistic, (2) general-purpose or special-purpose, and (3) sequential or parallel. The trial division method tests 2 and then every odd integer up to $\lfloor \sqrt{N} \rfloor$ to check whether it divides the given integer N [5]. This method was optimized by constructing a “wheel” that skips multiples of the first few primes, known as the wheel factorization method [6, 7]. Fermat’s method is based on the fact that every odd composite integer can be written as the difference of two squares, $N = x^2 - y^2 = (x - y)(x + y)$. The algorithm therefore starts with $x = \lceil \sqrt{N} \rceil$ and checks whether $x^2 - N$ is a perfect square $y^2$, in which case $x - y$ is a factor; otherwise, it increases x by 1 [8].
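A minimal Python sketch of Fermat’s method as described above, assuming N is odd (for odd N the search always terminates, at worst with the trivial split (1, N) when N is prime):

```python
from math import isqrt


def fermat_factor(n: int):
    """Fermat's method for odd n: find x with x^2 - n = y^2, return (x - y, x + y)."""
    if n % 2 == 0:
        raise ValueError("this sketch expects an odd n")
    x = isqrt(n)
    if x * x < n:
        x += 1                        # start at ceil(sqrt(n))
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:               # n = x^2 - y^2 = (x - y)(x + y)
            return x - y, x + y
        x += 1                        # otherwise try the next x
```

For example, `fermat_factor(5959)` finds $80^2 - 5959 = 441 = 21^2$ at the third value of x it tries and returns `(59, 101)`. The method is fast only when the two factors are close to $\sqrt{N}$.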
The Strassen factorization (SF) method uses batch greatest common divisor (GCD) computations and fast polynomial arithmetic to detect prime divisors of N [9]. Pollard’s rho method is a probabilistic method that is useful for factoring numbers with small prime divisors [10, 11]. Pollard’s $p - 1$ method [12] finds a prime factor p if $p - 1$ has only small prime divisors. The elliptic curve (EC) method generalizes Pollard’s $p - 1$ algorithm by operating on elliptic curves [13, 14]. In the EC method, randomly chosen elliptic curves and points are used, scalar multiplications are performed, and the GCD is used to extract factors.
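A compact sketch of stage 1 of Pollard’s $p - 1$ method, with an assumed smoothness bound `bound` and base 2 (both parameters are illustrative choices):

```python
from math import gcd


def pollard_p_minus_1(n: int, bound: int, base: int = 2):
    """Stage-1 Pollard p-1: return a nontrivial factor of n, or None.

    Succeeds when some prime p | n has p - 1 dividing bound! while the
    other prime factors of n do not have this property.
    """
    a = base
    for k in range(2, bound + 1):
        a = pow(a, k, n)              # after the loop, a = base^(bound!) mod n
    g = gcd(a - 1, n)
    return g if 1 < g < n else None
```

For $N = 299 = 13 \cdot 23$, `pollard_p_minus_1(299, 4)` returns 13 because $13 - 1 = 12$ divides $4! = 24$, while $23 - 1 = 22$ does not.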
The quadratic sieve (QS) method is based on finding two integers x and y such that $x^2 \equiv y^2 \pmod{N}$ with $x \not\equiv \pm y \pmod{N}$. This congruence yields a nontrivial factor $\gcd(x - y, N)$ of N [15, 16]. The general number field sieve (GNFS) extends the idea of the QS method by working in algebraic number fields rather than just the integers [17]. Lehman’s factorization method combines trial division with a square-checking strategy inspired by Fermat’s method [18]. The generalization of Lehman’s method is based on finding a square identity of the form $x^2 - 4kN = y^2$, while optimizing the way k is chosen and incorporating early-abort techniques, preprocessing by sieving, and modular arithmetic filters to skip bad candidates.
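A tiny worked instance of the congruence of squares used by the QS method (numbers chosen for illustration):

```latex
\begin{aligned}
&N = 91,\quad x = 10,\quad y = 3:\qquad x^2 = 100 \equiv 9 = y^2 \pmod{91},\qquad x \not\equiv \pm y \pmod{91},\\
&\gcd(x - y, N) = \gcd(7, 91) = 7,\qquad \gcd(x + y, N) = \gcd(13, 91) = 13,\qquad N = 7 \cdot 13.
\end{aligned}
```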
Another strategy for IF maps the arithmetic structure of integer multiplication into manageable Boolean constraints and shows that most of the resulting equations admit compact ordered binary decision diagram (OBDD) representations, potentially enabling a symbolic approach to factorization [19].
In this paper, we propose an improved integer factorization algorithm based on polynomial evaluation and the GCD. The algorithm searches for the prime factors backward, that is, from the largest candidates toward the smallest. The goal of the proposed algorithm is to reduce the memory complexity of the SF algorithm so that it is suitable for hardware with limited resources. Furthermore, the proposed algorithm can factor a number such that the bit size of . The proposed algorithm is compared with the SF method.
The proposed algorithm is inspired by the SF algorithm but uses neither the subproduct tree nor multipoint evaluation. The SF algorithm has several advantages, as follows.
(1) It is a deterministic algorithm.
(2) Strassen’s method was among the first deterministic algorithms to provably beat the $O(N^{1/2})$ cost of naive trial division, achieving a running time of roughly $\tilde{O}(N^{1/4})$, although this is still exponential in the bit size n.
(3) Unlike heuristic or randomized methods (e.g., EC, GNFS), Strassen’s method offers provable worst-case runtime guarantees, which is valuable in theoretical cryptographic settings or formal analyses.
(4) Strassen’s approach inspired subsequent theoretical improvements. Hittmeir [20] introduced a babystep–giantstep method for deterministic integer factorization, giving the first major improvement over the long-standing Strassen/Pollard–Strassen type bound [21] by a superpolynomial factor. Hittmeir [9] developed a time–space tradeoff for Lehman’s deterministic factorization method and obtained an improved runtime; this was the first exponential improvement over the classical deterministic bound. Harvey [22] improved the general deterministic factorization bound to $O(N^{1/5} \log^{16/5} N)$, surpassing Hittmeir’s result and establishing the best general-purpose deterministic complexity known in this line of work. Harvey and Hittmeir [23] improved this result further by reducing the secondary (logarithmic) factors while keeping the main exponent equal to 1/5.
On the other hand, the effect of the difference between the two factors on the running time of the SF algorithm has not been investigated experimentally. Furthermore, to the best of our knowledge, no research papers have examined the use of a high-performance (parallel) system to run the SF algorithm.
The rest of the paper is organized into four sections. Section 2 outlines the basic concepts related to the SF algorithm. Section 3 explains in detail the steps of the proposed algorithm through various approaches. Section 4 discusses the experimental studies that compare the proposed algorithms with the SF algorithm. Finally, Section 5 presents the conclusions and explores open questions related to the proposed solutions.
2. The SF Algorithm
In this section, we provide a brief overview of the SF algorithm, which, with some modifications, underlies the proposed algorithm. Assume that $N = pq$, where p and q are two primes with $p \le q$.
The SF algorithm, introduced by Volker Strassen in 1977 [24], is a deterministic method for factoring a composite integer N that uses batch GCD computations and fast polynomial arithmetic to detect prime divisors of N [25]. Unlike trial division, which tests each potential factor individually, the SF algorithm encodes a large number of candidates simultaneously into a single polynomial, significantly reducing the computational complexity.
The main steps of the SF algorithm are given in Algorithm 1. First, the SF algorithm generates two sets B and S such that (1) the two sets are disjoint modulo N, and (2) the two sets are not disjoint modulo p or q. Hittmeir [9] constructed the two sets as follows:
where
The SF algorithm consists of two main phases. In the first phase (lines 6–7 of Algorithm 1), the algorithm evaluates a polynomial function f modulo N at the elements of S, where  is the cardinality of B. The results of evaluating f at the elements of S are stored in the auxiliary array Y, where $Y_j = f(S_j)$. This phase can be implemented with the subproduct-tree algorithm to compute f and the multipoint-evaluation technique to compute the values $f(S_j)$.
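The set definitions and the formula for f are not reproduced in this text, so the sketch below assumes the classical Pollard–Strassen choice: $d = \lceil N^{1/4} \rceil$, $B = \{1, \ldots, d\}$, $S = \{0, d, 2d, \ldots, (d-1)d\}$ and $f(x) = \prod_{b \in B}(x + b) \bmod N$. These are assumptions for illustration and may differ from Hittmeir’s construction [9]. The sketch also evaluates f naively rather than with a subproduct tree and multipoint evaluation, so it shows what the first phase computes and stores, not how fast SF computes it.

```python
from math import isqrt


def ceil_fourth_root(n: int) -> int:
    """Smallest integer d with d**4 >= n."""
    d = isqrt(isqrt(n))               # floor(n^(1/4))
    while d ** 4 < n:
        d += 1
    return d


def sf_first_phase(n: int):
    """Return (d, Y) where Y[j] = f(S[j]) mod n for every element of S.

    Hypothetical sets (classical Pollard-Strassen choice):
      B = {1, ..., d},  S = [0, d, 2d, ..., (d - 1)d],  f(x) = prod_{b in B} (x + b).
    f is evaluated naively (d multiplications per point), i.e. O(d^2) modular
    multiplications overall, instead of using a subproduct tree and
    multipoint evaluation.
    """
    d = ceil_fourth_root(n)
    B = range(1, d + 1)
    S = [j * d for j in range(d)]
    Y = []
    for s in S:
        acc = 1
        for b in B:
            acc = acc * (s + b) % n
        Y.append(acc)                 # the whole array Y is kept in memory
    return d, Y
```

Note that the whole array Y, with d entries, is materialized before any GCD is taken.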
In the second phase (lines 8–24), the algorithm computes $g_j = \gcd(Y_j, N)$. Then $g_j$ is a nontrivial factor of N if $1 < g_j < N$. If $g_j = N$, a nontrivial factor of N is still hidden in the j-th evaluation; it is recovered by the inner loop (line 15), which computes the GCD of N with each candidate associated with $S_j$ and the elements of B until a value greater than 1 is found. Once a factor is found, the algorithm terminates, and there is no need to start a new iteration of the main loop at line 8.
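A matching sketch of the second phase, again under the assumed Pollard–Strassen sets (0-based $S_j = jd$ and $B = \{1, \ldots, d\}$, so $Y_j$ covers the candidates $jd + 1, \ldots, jd + d$). The function name and this indexing are hypothetical; like SF, the function takes the precomputed array Y as input.

```python
from math import gcd


def sf_second_phase(n: int, d: int, Y: list):
    """Scan the precomputed values Y[j] = f(S[j]) and return (p, q) or None.

    Hypothetical sets (classical Pollard-Strassen choice, 0-based): S[j] = j*d
    and B = {1, ..., d}, so Y[j] covers the candidates j*d + 1, ..., j*d + d.
    """
    for j, y in enumerate(Y):                 # outer loop (line 8 of Algorithm 1)
        g = gcd(y, n)
        if 1 < g < n:                         # the block reveals a proper factor
            return g, n // g
        if g == n:                            # n divides f(S[j]): test candidates one by one
            for b in range(1, d + 1):         # inner loop (line 15 of Algorithm 1)
                g = gcd(j * d + b, n)
                if 1 < g < n:
                    return g, n // g
    return None
```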
| Algorithm 1 Strassen Factorization (SF) Algorithm |
Require: A composite number N of size n bits.
Ensure: Two factors p and q of N, or no factor found.
- 1:
- 2:
- 3:
- 4:
- 5:
- 6:
- 7: Compute
- 8: for to d do
- 9:
- 10: if then
- 11:
- 12:
- 13: return p, q
- 14: else if then
- 15: for to d do
- 16:
- 17: if then
- 18:
- 19:
- 20: return p, q
- 21: end if
- 22: end for
- 23: end if
- 24: end for
- 25: return No factor found
|
The time complexity of the SF algorithm is , where  is the cost of multiplying two polynomials of degree less than k in  [25]. This corresponds to  bit operations. The memory complexity of the SF algorithm is , where one term is the memory required by the subproduct-tree algorithm used to compute the function f, and the other term is the memory required by the four auxiliary arrays B, S, Y, and G, each of size d.
3. The Proposed Solutions
In this section, we describe in detail an improved algorithm for integer factorization from several perspectives, including memory and time consumption. The section includes three subsections. In Section 3.1, we make five observations on the computational time and memory usage of the SF algorithm. In Section 3.2, we propose solutions that address these observations. In Section 3.3, we analyze the complexity of the proposed algorithms.
3.1. Comments on the SF
By analyzing the SF algorithm in terms of memory consumption and computational time, we make the following five observations.
The first observation is that the memory consumption of SF increases rapidly as n increases. SF uses four auxiliary arrays B, S, Y, and G of size d, and the memory consumption of each array is . Figure 1 illustrates how memory consumption increases with the number of bits. In the proposed improved SF, we reduce the memory requirement from  to a constant, $O(1)$.
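A back-of-the-envelope sketch of how the four arrays grow with n. It assumes, for illustration only, $d = 2^{\lceil n/4 \rceil}$ entries per array and n bits per entry; it ignores GMP’s per-integer overhead and the fact that entries of B and S are shorter than n bits, so it is an order-of-magnitude estimate, not a measurement.

```python
def sf_array_memory_bytes(n_bits: int, arrays: int = 4) -> int:
    """Raw storage for the SF auxiliary arrays B, S, Y and G, in bytes.

    Assumptions (illustrative only): each array has d = 2**ceil(n_bits / 4)
    entries and every entry is stored with n_bits bits.
    """
    d = 2 ** (-(-n_bits // 4))        # 2^ceil(n/4)
    return arrays * d * n_bits // 8


for n in (64, 72, 80, 88):
    print(n, "bits:", sf_array_memory_bytes(n) / 2**20, "MiB")
```

The loop prints 2.0, 9.0, 40.0 and 176.0 MiB for n = 64, 72, 80 and 88: every 8 extra bits of N multiply d by 4, which is the rapid growth described above.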
The second observation is that the elements of the auxiliary arrays G and Y are used only once, so they do not need to be retained before or after applying the GCD function. Additionally, the elements of the two arrays S and B need not be precomputed or stored, because their structure is simple enough to generate them on the fly. We therefore use only two variables to store the results of applying the polynomial function and the GCD; their values are updated at every iteration of the outer loop (line 8 in Algorithm 1).
The third observation is that, in some cases, SF spends a significant amount of time computing the polynomial function for all elements of S, even though most of these values are never used in the testing. For example, if an element near the beginning of the array S already reveals a factor, the time spent evaluating f at all the remaining elements of S is wasted.
Additionally, the running time of the first phase is significantly larger than that of the second phase. For example, for N = 70,919,611,552,003 = 8,388,673 × 8,454,211, the first phase takes far longer than the second phase, which finds the first factor, 8,388,673. Therefore, instead of computing the polynomial function for all elements of S first and then testing them one by one, we compute the polynomial function of a single element of S and immediately test the result. If a prime factor is found, the algorithm terminates; otherwise, it takes the next element of S and repeats the process.
The fourth observation is that, in the RSA cryptosystem, the modulus N is constructed as the product of two primes p and q, each of size $n/2$ bits. This means that finding a small factor has no practical significance in real applications. Therefore, a solution is more likely to be found by searching the array S from the back than from the beginning. This motivates replacing the forward scan of S with a backward strategy.
The fifth observation is that evaluating f at each element of S and computing the GCD of the result are independent across iterations. Hence, with a parallel model of t threads, t elements can be tested concurrently, which can significantly reduce the running time. If many threads are available, parallelism can also be used inside the computation of f and the GCD. Moreover, the inner loop of Algorithm 1 (line 15) can be parallelized.
3.2. The Algorithms
In this subsection, we explain how to address the five observations on the SF algorithm for factoring N. For the first four observations, we propose an improved sequential algorithm based on polynomial evaluation and the GCD; the last observation leads to the proposed parallel algorithms.
Algorithm 2 presents the pseudocode of the improved algorithm (ISF). The algorithm begins by computing the value of d in lines 1–3. In line 4, it starts a backward loop that runs from the largest value of j (equal to d) down to the smallest ($j = 1$). The backward loop addresses the fourth observation. Additionally, there is no need to construct the two sets B and S, as was done in lines 4 and 5 of Algorithm 1.
In line 5, the polynomial function f is evaluated at the element of S that corresponds to the current value of j; it neither uses the auxiliary arrays S and B nor stores its result in the auxiliary arrays Y and G. Line 6 stores the result of applying the GCD function in a single auxiliary variable y, which is updated at every iteration. This addresses the second observation. The same step also addresses the third observation, since no time is spent evaluating f at elements that are never tested.
Lines 7–11 are similar to the corresponding lines of the SF algorithm. In line 13, the algorithm computes the tested value directly instead of reading it from the two auxiliary arrays S and B. The GCD result is again stored in the auxiliary variable y, rather than in the auxiliary array G, to test whether a prime factor has been found.
The element  belongs to the set S. Therefore, the first step is to compute . If this GCD is greater than 1 and less than N, the algorithm returns , where  follows from the definition of f and the set B.
Otherwise, . This means that , i.e., every prime factor of N divides the product
Therefore, the algorithm computes , .
Note that the element  does not belong to the set B (it is a non-positive integer). Hence, directly computing  and  is not arbitrary: number theory guarantees that it exposes the factors of N.
| Algorithm 2 Improved Integer Factorization (ISF) |
Require: A composite positive integer N of size n bits.
Ensure: Two factors p and q of N, or no factor found.
- 1:
- 2:
- 3:
- 4: for downto 1 do
- 5:
- 6:
- 7: if then
- 8:
- 9:
- 10: return p, q
- 11: else if then
- 12: for to d do
- 13:
- 14: if then
- 15:
- 16:
- 17: return p, q
- 18: end if
- 19: end for
- 20: end if
- 21: end for
- 22: return “No factor found”
|
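A minimal Python sketch of the backward, constant-memory search of Algorithm 2. It assumes the classical Pollard–Strassen blocks ($d = \lceil N^{1/4} \rceil$, block j holds the candidates $(j-1)d + 1, \ldots, jd$), which is our assumption rather than the paper’s exact sets, and it evaluates f naively. It is not the authors’ C/GMP implementation.

```python
from math import gcd, isqrt


def isf_factor(n: int):
    """Constant-memory, backward-search sketch in the spirit of ISF (Algorithm 2).

    Hypothetical sets (classical Pollard-Strassen choice, not taken from the paper):
    d = ceil(n^(1/4)) and block j (j = d, ..., 1) holds the candidates
    (j - 1)*d + 1, ..., j*d; f evaluated at (j - 1)*d is their product mod n.
    Returns (p, q) with p * q == n, or None if no factor is found.
    """
    d = isqrt(isqrt(n))
    while d ** 4 < n:                         # d = ceil(n^(1/4))
        d += 1
    for j in range(d, 0, -1):                 # backward outer loop: largest candidates first
        s = (j - 1) * d                       # element of S, generated on the fly
        acc = 1
        for i in range(1, d + 1):             # f(s) computed directly, no arrays B, S, Y, G
            acc = acc * (s + i) % n
        y = gcd(acc, n)                       # single scalar instead of the array G
        if 1 < y < n:
            return y, n // y
        if y == n:                            # n divides f(s): test the candidates one by one
            for i in range(1, d + 1):
                y = gcd(s + i, n)
                if 1 < y < n:
                    return y, n // y
    return None
```

Only the scalars `acc` and `y` are kept across an outer iteration, which is the point of the second observation; for balanced factors near $\sqrt{N}$ the backward loop reaches the relevant block early, which is the point of the fourth.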
Based on the fifth observation, we propose two parallel versions of the ISF algorithm. The first parallelizes only the outer loop of ISF, i.e., one level of parallelism. The second uses multilevel parallelism: the first level is the outer loop, while the second level parallelizes the body of the outer loop, including the two functions f and GCD and the inner loop (lines 12–19 of the ISF algorithm).
Algorithm 3 describes the first way of parallelizing the ISF algorithm, denoted by PISF. In this algorithm, t threads work concurrently on the outer loop, each thread handling one iteration at a time. Initially, thread i is assigned the iteration $j = d - i + 1$ (line 6). Each thread then works sequentially, as in the ISF algorithm (lines 8–23). If no prime factor is found in the current iteration, thread i updates j to $j - t$ (line 24) and repeats the process (line 7) until one of the threads finds the prime factors (found = true) or all iterations of the outer loop are exhausted, which indicates that N is prime.
The second way of parallelizing the ISF algorithm additionally parallelizes the following parts:
- (1) The function f, using parallel prefix-product and parallel multiplication algorithms [26, 27, 28, 29, 30, 31].
- (2) The GCD function [32, 33].
- (3) The inner loop at line 15 in Algorithm 3, using m threads.
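Item (1) rests on regrouping the product that defines f. The sketch below is a plain balanced product (reduction) tree, not the parallel prefix-product of [26–31], and it runs sequentially; the point is that the multiplications inside one level are independent of each other, so a parallel version could assign them to different threads, and for k factors the tree has only about $\log_2 k$ levels.

```python
def tree_product_mod(values, n: int) -> int:
    """Multiply values modulo n by repeated pairwise reduction.

    Each level multiplies disjoint pairs, so the products within one level
    are independent (parallelizable); this sketch runs them sequentially.
    """
    level = [v % n for v in values]
    if not level:
        return 1 % n                          # empty product
    while len(level) > 1:
        nxt = [level[k] * level[k + 1] % n for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])             # odd element carried to the next level
        level = nxt
    return level[0]
```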
| Algorithm 3 Parallel Improved Integer Factorization (PISF) |
Require: A composite number N of size n bits, and t threads.
Ensure: Two factors p and q of N, or no factor found.
- 1:
- 2:
- 3:
- 4: found = false
- 5: for to t do parallel
- 6:
- 7: while (found == false) and () do
- 8:
- 9:
- 10: if then
- 11:
- 12:
- 13: found = true
- 14: else if then
- 15: for to d do
- 16:
- 17: if then
- 18:
- 19:
- 20: found = true
- 21: end if
- 22: end for
- 23: end if
- 24:
- 25: end while
- 26: end for
- 27: if found == true then
- 28: return p, q
- 29: else
- 30: return “No factor found”
- 31: end if
|
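A sketch of the PISF schedule in Python. Threads from `concurrent.futures` and a `threading.Event` standing for the shared flag `found` replace OpenMP here (a swapped-in mechanism); because of CPython’s global interpreter lock this shows how the iterations are distributed, not the speedup measured in Section 4. The initial iteration $j = d - i + 1$ and the step $j - t$ are our reading of lines 6 and 24, and the blocks are the classical Pollard–Strassen ones (an assumption).

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from math import gcd, isqrt


def pisf_factor(n: int, t: int = 4):
    """Sketch of the PISF schedule with t workers (threads stand in for OpenMP).

    Hypothetical blocks: block j holds the candidates (j - 1)*d + 1, ..., j*d.
    Worker i (1 <= i <= t) starts at j = d - i + 1 and then moves to j - t.
    """
    d = isqrt(isqrt(n))
    while d ** 4 < n:                         # d = ceil(n^(1/4))
        d += 1
    found = threading.Event()                 # shared 'found' flag
    lock = threading.Lock()
    result = []

    def check_block(s):
        """Return a nontrivial factor revealed by candidates s + 1, ..., s + d, or None."""
        acc = 1
        for i in range(1, d + 1):
            acc = acc * (s + i) % n
        y = gcd(acc, n)
        if 1 < y < n:
            return y
        if y == n:                            # inner loop (line 15 of Algorithm 3)
            for i in range(1, d + 1):
                y = gcd(s + i, n)
                if 1 < y < n:
                    return y
        return None

    def worker(i):
        j = d - i + 1                         # initial iteration of worker i
        while not found.is_set() and j >= 1:
            y = check_block((j - 1) * d)
            if y is not None:
                with lock:                    # the first worker to succeed records the factor
                    if not found.is_set():
                        result.append(y)
                        found.set()
                return
            j -= t                            # next iteration owned by worker i

    with ThreadPoolExecutor(max_workers=t) as pool:
        list(pool.map(worker, range(1, t + 1)))   # list() surfaces worker exceptions
    if result:
        return result[0], n // result[0]
    return None
```

With `t = 1` the schedule reduces to a single backward loop. With several workers, whichever worker first reaches a factor-bearing block wins, so the returned factor may differ between runs.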
3.3. Complexity Analysis
In this subsection, we analyze the complexity of the proposed algorithms in terms of running time and memory consumption.
For Algorithm 2, the time complexity is bounded by , where d is the total number of iterations of the outer loop (line 4), and  and  are the time complexities of the functions f and GCD, respectively. Assume that  is the cost of multiplying two integers, each of size  bits. Therefore,
Note that in computing , we considered it as a multiplication of d integers, each of size .
Since the time complexity of is greater than , the time complexity of the ISF algorithm is .
In the best case, the ISF algorithm finds the prime factor after a few iterations of the outer loop, so its time complexity reduces to . In contrast, the best-case time complexity of SF is  bit operations.
The memory consumption of Algorithm 2 is constant, $O(1)$, because the algorithm uses a constant number of auxiliary variables during the computation.
For Algorithm 3, the time complexity is , while the memory consumption is . PISF therefore reduces the time complexity of both SF and ISF, and, unlike SF, its memory consumption does not grow with d.
4. Experimental Studies
In this section, we experimentally investigate the time and memory consumption of three algorithms: SF, ISF, and PISF. We also investigate the speedup and scalability of the proposed PISF algorithm as the number of threads increases. The section is divided into three subsections. The first describes the methodology for evaluating the three algorithms. The second compares the proposed sequential algorithm ISF with SF. The last compares the parallel algorithm PISF with the sequential algorithm ISF.
4.1. Methodology
We have two directions for measuring the performance of the proposed algorithms.
The first direction concerns sequential computation, in which we compare the two algorithms SF and ISF based on running time and memory consumption. The percentage of improvement in the running time of the ISF algorithm is equal to .
The second direction concerns parallel computation, in which we evaluate the proposed parallel algorithm PISF using several criteria, including running time, speedup, efficiency, and scalability. The speedup of the parallel solution with t threads, $S_t$, is the ratio of the running time of the sequential algorithm ISF to the running time of the parallel solution PISF. The speedup is linear if $S_t = t$ and sublinear if $S_t < t$. The efficiency of the parallel algorithm is the ratio of its speedup to the number of threads, $E_t = S_t / t$.
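The three measures can be written as small helpers. The improvement percentage is assumed here to be $(T_{\mathrm{SF}} - T_{\mathrm{ISF}})/T_{\mathrm{SF}} \times 100$, since the formula is not reproduced in this text; speedup and efficiency follow the definitions above.

```python
def improvement_percent(t_old: float, t_new: float) -> float:
    """Relative reduction in running time, assumed (t_old - t_new) / t_old * 100."""
    return (t_old - t_new) / t_old * 100.0


def speedup(t_seq: float, t_par: float) -> float:
    """S_t = T(ISF) / T(PISF with t threads)."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, threads: int) -> float:
    """E_t = S_t / t, i.e. T(ISF) / (t * T(PISF))."""
    return speedup(t_seq, t_par) / threads
```

For example, the average speedups reported in Section 4.3 (1.78 with 2 threads and 3.64 with 4 threads) correspond to efficiencies of 0.89 and 0.91.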
For both directions, we implement all algorithms in the C language, using the GNU Multiple Precision (GMP) library [34] to manipulate large integers. The parallel regions of PISF are implemented with the Open Multi-Processing (OpenMP) application programming interface [35]. All programs run under Linux on a machine with a 2.5 GHz processor that can run 16 threads simultaneously, 15 MB of cache memory, and 32 GB of RAM.
For all experimental studies, we generate a test dataset that satisfies the following properties:
- 1. The size of each prime factor is $n/2$ bits, because in real applications, such as RSA, the modulus N of size n is a product of two primes of the same size. Moreover, if a prime factor is small, trial division and wheel factorization are efficient strategies for finding it.
- 2. The difference between the two prime factors is bounded by d.
Accordingly, a random integer N of size n bits is constructed by generating a random prime p of size $n/2$ bits and then another prime q of size $n/2$ bits such that $q - p$ has a prescribed bit size. For fixed values of n and of the bit size of $q - p$, ranging from  to , we generate 50 instances that satisfy the properties of the test dataset.
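The text does not say how q is chosen, so the following is only a hypothetical generator for instances with the stated properties: p is a random $n/2$-bit prime and q is p plus a random even gap with a prescribed number of bits (`delta`), retried until q is prime and N has exactly n bits. Primality is checked with Miller–Rabin using the first 12 prime bases, which is deterministic for all inputs below $2^{64}$ and thus for the factor sizes used here.

```python
import random


def is_probable_prime(m: int) -> bool:
    """Miller-Rabin with the first 12 prime bases; deterministic for m < 2**64."""
    if m < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for sp in bases:
        if m % sp == 0:
            return m == sp
    r, s = m - 1, 0
    while r % 2 == 0:                         # write m - 1 = r * 2^s with r odd
        r //= 2
        s += 1
    for a in bases:
        x = pow(a, r, m)
        if x == 1 or x == m - 1:
            continue
        for _ in range(s - 1):
            x = x * x % m
            if x == m - 1:
                break
        else:
            return False                      # a is a witness: m is composite
    return True


def random_prime(bits: int, rng: random.Random) -> int:
    """Random prime with exactly `bits` bits."""
    while True:
        c = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(c):
            return c


def make_instance(n: int, delta: int, rng: random.Random):
    """Return (N, p, q): N = p*q has n bits, p and q have n/2 bits, q - p has delta bits.

    Hypothetical generator (2 <= delta < n/2): q = p + an even delta-bit gap,
    retried until q is prime and all bit sizes match.
    """
    half = n // 2
    while True:
        p = random_prime(half, rng)
        for _ in range(1000):
            gap = (rng.getrandbits(delta) | (1 << (delta - 1))) & ~1
            q = p + gap
            if (q.bit_length() == half and (p * q).bit_length() == n
                    and is_probable_prime(q)):
                return p * q, p, q
```

A set of 50 instances for one configuration could then be drawn as `[make_instance(72, 16, rng) for _ in range(50)]` with `rng = random.Random(seed)`.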
The primary metric used to compare the algorithms is the running time. For fixed n and a fixed bit size of the difference between the factors, the running time of each algorithm is the average over the 50 instances. For the first direction (sequential computation), we use only one thread for both SF and ISF. For the second direction (parallelism), we run PISF on the same data with 2, 4, 8, and 16 threads. In the second direction, we also compute (1) the speedup, which is the running time of ISF divided by the running time of PISF, and (2) the efficiency, which is the running time of ISF divided by the product of the running time of PISF and the number of threads. The values of n are  and 88.
4.2. Performance of ISF
To measure the performance of the proposed algorithm ISF, we compare it with the SF algorithm.
Table 1 shows the running times of the two algorithms SF and ISF for different values of n and . The results lead to the following observations:
There is a significant difference between the running times of the SF and ISF algorithms for each n and ; see Columns 3 and 4. For example, when  bits and the difference between the two prime factors is 16 bits, the running times of the SF and ISF algorithms are 11,358.32 s and 0.521 s, respectively.
The average percentage of improvement of ISF over SF across all studied cases is , which is a significant improvement.
For fixed values of n below 64 bits, the running time of the original SF algorithm is acceptable, as it takes at most a few minutes. On the other hand, when $n \ge 64$ bits, the running time of SF increases rapidly, especially as the difference between the two prime factors grows.
Figure 2 illustrates the execution time of the ISF algorithm for 50 instances when .
For fixed values of n below 64 bits, the running time of ISF is very short.
For  bits, the running time of SF is considerable (more than one hour), so we did not measure it. The running time of ISF for these sizes is reported together with the performance of PISF for .
4.3. Performance of PISF
To measure the performance of the PISF algorithm, we run it on the same data used in the previous subsection with different numbers of threads for n equal to 72, 80, and 88, since the running time of ISF is small for smaller values of n. To measure the speedup of PISF, we also run ISF on the same dataset using one thread.
Figure 3 compares the performance of the PISF algorithm with that of ISF for varying numbers of threads. From Figure 3, we observe the following:
Parallelism significantly reduces the running time of the ISF as the number of threads increases.
The PISF is scalable, which means that increasing the number of threads reduces its running time. For example, when , the running time of ISF is 166.22 s, while the running times of PISF using 2, 4, 8, and 16 threads are , , , and s, respectively.
The average percentage of improvement in the running time for the ISF algorithm when we use two threads for PISF is approximately for the studied cases. This percentage increases to when we use four threads. Using eight and sixteen threads, the PISF improved the ISF with average percentages of and , respectively, for all studied cases.
The average speedup of the PISF using 2, 4, 8, and 16 threads is 1.78, 3.64, 7.1, and 12.9, respectively. This means that the speedup of PISF is nearly linear.
The average efficiency of the PISF using 2, 4, 8, and 16 threads is 0.89, 0.91, 0.88, and 0.81, respectively.