Article

Research on Software Optimization for Discrete Fourier Test

Department of Basic Education, Wuxi University of Technology, Wuxi 214121, China
*
Author to whom correspondence should be addressed.
Submission received: 27 October 2025 / Revised: 18 December 2025 / Accepted: 19 December 2025 / Published: 22 December 2025

Abstract

Random sequences are critical to cryptographic technologies and applications. Randomness testing typically employs probabilistic statistical techniques for evaluating the randomness properties of such sequences. Both the National Institute of Standards and Technology (NIST, Gaithersburg, MD, U.S.) and the State Cryptography Administration (SCA, China) have issued guidelines for randomness testing, each of which includes the Discrete Fourier Transform (DFT) test as one of the mandatory assessments. This paper focuses on the efficient implementation of the DFT test and proposes a fast implementation approach that leverages FFTW (Fastest Fourier Transform in the West). Comprehensive experimental tests and performance evaluations were performed both before and after optimization of the algorithm. The results show that the optimized algorithm increases the speed of the DFT test for a single sample by a factor of 2.33.
MSC:
65T50; 94A60; 60B20; 68W30

1. Introduction

Binary random sequences occupy a pivotal role in cryptographic applications, forming the cornerstone of secure algorithms in both Shannon’s perfect secrecy system and modern cryptographic frameworks. Today, they are widely used in computer systems for tasks such as key generation, digital signatures, and authentication—underscoring their practical significance and criticality.
Within applied cryptography, the goal of randomness testing is to employ probabilistic statistical methods to analyze and test the randomness of binary sequences generated by random number generators (RNGs). Its core objective is to determine if the tested binary sequences are statistically indistinguishable from true random numbers. Different randomness testing algorithms assess discrepancies between tested binary sequences and true random sequences from diverse angles. Over the years, substantial progress has been made in developing such algorithms, with numerous existing methods and new approaches continuously emerging.
Randomness testing specifications provide a scientific basis for randomness evaluation. The U.S. National Institute of Standards and Technology (NIST) has published and subsequently revised its SP 800-22 document [1], which recommends 15 statistical tests for randomness assessment. Building on this standard, Germany issued its BSI AIS 30 specification [2]. In 2021, China’s State Cryptography Administration (SCA) released its national randomness testing specification [3], which also recommends 15 statistical tests for randomness evaluation.
Randomness testing methods find broad practical use, including
  • Evaluating pseudo-random data streams generated in accordance with specific standards;
  • Testing data streams produced by cryptographic algorithms paired with specific operation modes [4];
  • Assessing data streams derived from hash functions—for instance, China’s SM3 hash function [5].
These applications help reduce analysts’ workloads and identify security risks that may be overlooked by other testing methods. Consequently, extensive randomness testing and evaluation have been conducted on candidate algorithms in cryptographic initiatives such as the AES competition and NIST Post-Quantum Cryptography Standardization [6].
Numerous studies have explored the accelerated implementation of randomness testing components and the evaluation of cryptographic sequence randomness, providing a solid foundation for subsequent research. The relevant work can be organized into two key strands:
(1) Optimization of existing randomness test items: Scholars have focused on improving the efficiency of the poker test [6], enhancing the implementation of the monobit test and block frequency test [7], developing new statistical tests to supplement existing standards [8], and investigating GPU-based parallel implementations of statistical tests while establishing corresponding parallel testing frameworks [9];
(2) Evaluation of ciphertext randomness via standard test suites: Researchers have utilized mainstream randomness testing suites to assess the randomness of sequences generated by cryptographic algorithms—for instance, Li et al. [10] designed a jump test based on jump complexity to calculate the sum of jump heights across intervals; Haramoto [11] proposed upper bounds for sample sizes in secondary tests corresponding to six items in NIST SP 800-22; Abdelwahab et al. [12] analyzed the randomness of different configurations of the SNOW and ZUC algorithms using the NIST SP 800-22 suite; Deb et al. [13] employed the NIST randomness testing tool to compare the randomness quality of outputs from software-oriented stream cipher algorithms, achieving effective discrimination of cryptographic systems.
Beyond the optimization of randomness testing itself, extensive studies on stochastic system stability and control further confirm that randomness factors are critical to the performance of complex systems, including cryptographic systems, where pseudo-random sequences serve as the core security foundation. Specifically, Xu et al. [14] analyzed the exponential stability of stochastic nonlinear delay systems, revealing that random factors significantly influence the long-term operational reliability of dynamic systems, a finding that extends to the reliability of cryptographic systems dependent on random sequences. Ding et al. [15] studied intermittent static output feedback control for stochastic delayed-switched positive systems, emphasizing that rational management of random disturbances is key to ensuring system performance, analogous to how precise characterization of randomness in cryptographic sequences safeguards encryption effectiveness. Zhu et al. [16] investigated event-triggered feedback control for stochastic nonlinear delay systems with exogenous disturbances, demonstrating that ignoring randomness can lead to severe performance degradation, which aligns with the security risks posed by non-random sequences in cryptography. Wang et al. [17] explored the stability of discrete-time semi-Markov jump linear systems with time delay, highlighting the indispensable role of randomness modeling in system performance optimization, which is relevant to the accurate modeling of random sequences in cryptographic applications. Zhao et al. [18] focused on the stabilization of stochastic highly nonlinear delay systems with a neutral term, confirming that random factors are inherent to system dynamics and directly impact control effectiveness, much as the inherent randomness of pseudo-random sequences determines cryptographic security. Wang et al. [19] researched the stabilization of discrete-time hidden semi-Markov jump linear systems, further verifying that accurate characterization of randomness is critical to maintaining system stability and performance, echoing the need for rigorous randomness testing in cryptography.
Notably, while existing optimizations have made progress in various randomness test items, most focus on non-DFT test components. As the Discrete Fourier Transform (DFT) test is a mandatory item in both NIST and Chinese randomness testing standards yet remains one of the least efficient, it lacks targeted acceleration solutions. This research gap motivates the work presented in this paper.
This paper is organized as follows: Section 2 provides a brief introduction to the basic notations used herein. Section 3 presents an overview of the DFT test execution process, alongside an analysis of the efficiency of existing detection algorithms. Section 4 analyzes the causes of low algorithm efficiency, then leverages the free, open-source FFTW (Fastest Fourier Transform in the West, version 3.3.5) software package [20] to optimize Fourier transformation, refine the algorithm execution workflow, reduce computational complexity, and propose an accelerated implementation algorithm for the DFT test. Section 5 conducts simulations and tests to evaluate the detection efficiency and speed improvements of the proposed algorithm. Section 6 offers additional optimization recommendations for multi-sample detection scenarios. Finally, Section 7 concludes the paper.

2. Terms and Notations Explanation

Let ε = ε1ε2…εn denote the binary sequence of n bits to be tested.
According to the randomness testing specifications in China, the sample length n is set to one million bits.
Let α denote the significance level, which is defined by the Chinese randomness testing specifications as α = 0.01.
The complementary error function, represented as erfc, is defined as
erfc(z) = (2/√π) ∫_z^∞ e^(−u²) du.
Let Xk be the k-th converted bit, where k = 1, 2, …, n, and assume the bits are coded as −1 and +1. The Fourier transform produces a series of complex numbers, denoted as fj,
fj = Σ_{k=1}^{n} Xk [cos(2π(k − 1)j/n) + i sin(2π(k − 1)j/n)], j = 0, 1, …, n − 1,
where i = √−1.
The magnitude of the complex number is represented by modj = modulus(fj) = |fj|; for f = a + bi,
|f| = √(a² + b²).
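These definitions can be exercised on a toy sequence in Python; the ±1 values and length n = 8 below are illustrative choices, and Python's built-in `abs()` on complex numbers computes exactly the modulus √(a² + b²):

```python
import cmath
import math

# Toy sequence of +/-1 values (n = 8), as produced from a binary sequence
x = [1, -1, 1, 1, -1, 1, -1, -1]
n = len(x)

def coeff(j):
    # f_j = sum_k X_k [cos(2*pi*(k-1)*j/n) + i*sin(2*pi*(k-1)*j/n)], 0-based k
    return sum(x[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))

f1 = coeff(1)
# |f| = sqrt(a^2 + b^2) coincides with Python's abs() on complex numbers
assert math.isclose(abs(f1), math.sqrt(f1.real ** 2 + f1.imag ** 2))
# For real-valued input, f_{n-j} is the conjugate of f_j, so |f_{n-j}| = |f_j|
assert cmath.isclose(coeff(n - 1), f1.conjugate())
```

The conjugate symmetry shown in the last line is what later allows the test to examine only the first n/2 coefficients.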

3. Introduction to the Discrete Fourier Test

Both the Chinese randomness testing specifications and the NIST randomness testing specifications include 15 tests each. Eleven of these tests are identical, including the Block Frequency Test, Matrix Rank Test, Approximate Entropy Test, Monobit Test, Cumulative Sums Test, Runs Test, Maximum Ones in a Block Test, Linear Complexity Test, and Maurer’s Universal Statistical Test. The DFT Test is one of these 11 common tests.
The Chinese randomness testing specifications include four unique tests: Runs Distribution Test, Poker Test, Serial Test, and Binary Derivative Test.
The NIST randomness testing specifications include four unique tests: Random Excursions Test, Random Excursions Variant Test, Non-overlapping Template Matching Test, and Overlapping Template Matching Test.

3.1. Overview of the Discrete Fourier Test Procedure

The Discrete Fourier Test is used to determine if the number of abnormal peaks obtained after performing a Fourier transform on the sequence exceeds an allowable threshold. The DFT test uses spectral analysis to assess the randomness of the sequence. After performing the Fourier transform on the sequence, peak heights are obtained. According to the assumption of randomness, these peak heights should not exceed a certain threshold value (which is related to the sequence length); otherwise, they are considered abnormal. If the number of abnormal peaks exceeds the allowed limit, the sequence is deemed non-random.
The description of the Discrete Fourier Test in NIST SP800-22 is outlined as follows [1] (Algorithm 1):
Algorithm 1: Discrete Fourier Test Algorithm
Input: Binary test sequence ε = ε1ε2…εn.
Output: Test result (pass/fail) and corresponding P-value.
Execution Steps:
Step A1: Convert the 0s and 1s in the sequence ε = ε1ε2…εn to −1 and +1, respectively, obtaining a new sequence X1X2…Xn, where Xi = 2εi − 1, 1 ≤ i ≤ n.
Step A2: Perform the Fourier transform on the new sequence to obtain a series of complex numbers f0, f1, f2, …, fn−1, where
fj = Σ_{k=1}^{n} Xk [cos(2π(k − 1)j/n) + i sin(2π(k − 1)j/n)], j = 0, 1, …, n − 1. (4)
Step A3: Compute the modulus of fi for 0 ≤ in/2 − 1,
modi = modulus(fi) = |fi|. (5)
Step A4: Calculate the threshold value T = √(2.995732274·n).
Step A5: Calculate N0 = 0.475n.
Step A6: Count the number of coefficients |fi| that are less than the threshold value T, denoted as N1.
Step A7: Compute the statistic
V = (N1 − N0)/√(0.95 × 0.05 × n/4). (6)
Step A8: Compute P-value = erfc(|V|/√2).
Step A9: If P-value ≥ α, then the tested sequence passes the Discrete Fourier Test.
Step A3 leverages the symmetry of the transformation from real numbers to complex numbers, considering only half of the Fourier coefficients to accelerate the test algorithm. Based on the randomness hypothesis, a confidence interval (e.g., the 95% confidence level specified in relevant standards) can be predefined; this interval implies that at least 95% of the modi values (modulus values of the Fourier coefficients) should be less than a predefined threshold. Let N1 denote the count of modi values that fall below this threshold. The statistic V computed in Step A7 follows a standard normal distribution.
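Algorithm 1 can be sketched end-to-end in Python. The version below uses a naive O(n²) DFT built from the standard library so the example is self-contained; it is suitable only for toy lengths, not the 1-million-bit samples used in the paper:

```python
import math
import cmath

def dft_test(bits, alpha=0.01):
    """Discrete Fourier Test, Steps A1-A9 of Algorithm 1.
    Naive O(n^2) DFT via the standard library: toy lengths only."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]                   # Step A1: 0/1 -> -1/+1
    T = math.sqrt(2.995732274 * n)                  # Step A4: peak threshold
    N1 = 0
    for j in range(n // 2):                         # Steps A2-A3: first n/2 coeffs
        f_j = sum(x[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))
        if abs(f_j) < T:                            # Step A6: count moduli below T
            N1 += 1
    N0 = 0.475 * n                                  # Step A5: expected count
    V = (N1 - N0) / math.sqrt(0.95 * 0.05 * n / 4)  # Step A7: test statistic
    p_value = math.erfc(abs(V) / math.sqrt(2))      # Step A8
    return p_value, p_value >= alpha                # Step A9: pass/fail
```

For n = 10⁶ this naive transform is far too slow, which is precisely the bottleneck the FFTW-based optimization addresses.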

3.2. Efficiency of the DFT Test

The DFT test relies on Fourier transformation, making it typically among the least efficient assessments specified in both NIST and Chinese randomness testing standards. Its low computational speed substantially limits the overall efficiency of randomness evaluation. Optimizing the test’s speed is therefore essential to enhance the execution efficiency of both this specific test component and the overall randomness testing workflow.
Traditionally, the DFT test uses the randomness testing code provided by NIST [21]. The subsequent measurements quantify the time consumption of each step in the DFT test for a single sample (1 million bits).
The testing platform employed in this study is detailed below; all subsequent experiments were performed on this identical platform (Table 1).
The test data comprised pseudorandom numbers generated by the SM4 algorithm, split into multiple samples—each containing 1 million bits.
The testing method adopted a simplified version of the speed testing model from the European eSTREAM algorithm competition. This model was not only utilized in the eSTREAM competition but also widely employed in subsequent performance evaluations of numerous cryptographic algorithms.
Algorithm 2: Performance Evaluation Algorithm
Input: Target code segment for performance testing; number of repeated tests C (requirement: C is an odd number).
Output: Statistical results of performance (i.e., timing values).
Execution Steps:
Step B1: Initialize the timing array such that T[i] = 0, 1 ≤ iC.
Step B2: Repeat the timing of the target code segment C times, recording each timing value T[i], 1 ≤ iC. This involves two sub-steps:
B2.1 Initialize a timer (denoted as A) immediately before the target code segment and another timer (denoted as B) immediately after the segment.
B2.2 Calculate the elapsed time for the code segment as the difference between the two timers: T[i] = BA.
Step B3: Sort the timing sequence T[i], 1 ≤ iC, in descending order to generate a non-increasing sequence T[1] ≥ T[2] ≥ … ≥ T[C]. Ascending order is also acceptable as an alternative.
Step B4: Extract the median value T[(C + 1)/2] from the sorted sequence. This value serves as the statistical timing result for the target code segment, which is then output.
In the experiment, the number of repeated tests C was set to 21 (satisfying the requirement that C be odd). To ensure the accuracy of test results, the time counter leveraged a CPU frequency-based timer: it either invoked the RDTSC assembly instruction directly or used the __rdtsc() intrinsic in the Windows environment. Given the clock frequency of modern CPUs, this timer achieves nanosecond-level precision. The elapsed time (in seconds) was derived by dividing the difference between the cycle counts returned by two RDTSC reads by the CPU’s operating frequency.
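Algorithm 2 can be sketched portably in Python, with the monotonic nanosecond clock standing in for the RDTSC-based timer described above:

```python
import time

def time_median(target, C=21):
    """Median-of-C timing (Algorithm 2). `target` is the code segment under
    test, wrapped as a zero-argument callable. Returns nanoseconds."""
    assert C % 2 == 1, "C must be odd so the median is a single sample"
    samples = []
    for _ in range(C):                   # Step B2: repeat the timing C times
        a = time.perf_counter_ns()       # timer A, immediately before the segment
        target()                         # the target code segment
        b = time.perf_counter_ns()       # timer B, immediately after it
        samples.append(b - a)            # Step B2.2: elapsed = B - A
    samples.sort()                       # Step B3 (ascending order variant)
    return samples[(C - 1) // 2]         # Step B4: the median timing
```

Taking the median rather than the mean makes the measurement robust against outliers caused by OS scheduling or cache effects, which is why the eSTREAM-style model uses it.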
The time consumption of each step in the DFT Test for a single sample (1 million bits) is presented in Table 2. Note that the time for Steps A4 to A9 is negligible and thus aggregated for statistical analysis (Figure 1).
From the test results presented above, the DFT test took 136.369 ms to complete detection for a single sample—a duration significantly longer than that of most other randomness tests [6,7]. Specifically, Step A2 (which entailed performing Fourier transformation on the newly constructed sequence) was the most time-consuming step, accounting for approximately 68.89% of the total detection time. The second most time-consuming step was Step A3, which involved computing the modulus of Fourier coefficients and occupied around 27.31% of the total time. Collectively, these two steps accounted for 96.20% of the total detection time, confirming that they were the primary bottlenecks limiting the DFT test’s overall efficiency.

4. Optimization of the DFT Algorithm

4.1. Bottleneck Analysis and Improvement Strategy

Based on the preceding analysis, Steps A2 and A3 of the DFT Test were the two most time-consuming components. Thus, the optimization efforts focused on these two steps.
The key improvements proposed for the DFT Test in this study included two aspects:
1. Optimizing the Fourier transformation in Step A2 via the FFTW software package (version 3.3.5); detailed implementation is provided in Section 4.2.
2. Optimizing the modulus calculation and the statistics of N1 across Steps A3 to A6. This aspect primarily involved reducing the number of square root operations and optimizing the associated code to achieve the required improvements; further details are presented in Section 4.3. A complete description of the optimized DFT Test is provided in Section 4.4.

4.2. Optimization of Discrete Fourier Transform Using FFTW

FFTW [20] (Fastest Fourier Transform in the West) is a free, open-source C-language software package (version 3.3.5) developed by Matteo Frigo and Steven G. Johnson from the Massachusetts Institute of Technology (MIT). It features easy portability across different platforms.
FFTW supports DFT for data of arbitrary size and dimensions, as well as other transforms, including Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and Discrete Hartley Transform (DHT). Per developer testing, FFTW outperforms other free DFT-computing libraries and even rivals commercial DFT libraries in computational speed.
FFTW achieves optimal performance for transform lengths n = 2^a·3^b·5^c·7^d·11^e·13^f (with a, b, c, d, e, f non-negative integers and e + f ≤ 1). The sample length specified in Chinese randomness testing standards, 1 million bits, falls within this class (10^6 = 2^6·5^6), where FFTW performs exceptionally well.
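Whether a given length falls into this class can be checked with a short factorization routine; the sketch below is illustrative and not part of FFTW itself:

```python
def fftw_friendly(n):
    """True if n = 2^a * 3^b * 5^c * 7^d * 11^e * 13^f with e + f <= 1,
    the class of lengths for which FFTW has highly optimized algorithms."""
    if n < 1:
        return False
    exponents = {}
    for p in (2, 3, 5, 7, 11, 13):
        exponents[p] = 0
        while n % p == 0:        # strip out all factors of p
            n //= p
            exponents[p] += 1
    # n must factor completely over {2,...,13}, with at most one factor of 11 or 13
    return n == 1 and exponents[11] + exponents[13] <= 1

assert fftw_friendly(1_000_000)   # 10^6 = 2^6 * 5^6: a fast size
assert not fftw_friendly(11 * 13) # e + f = 2 exceeds the limit
```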
In the DFT Test, the new sequence employed for Fourier transformation was composed entirely of real numbers. FFTW supports fast Fourier transforms (FFTs) for real-valued inputs. Additionally, FFTW supports multi-threaded computation and parallel processing.
The steps for integrating FFTW into the DFT Test were as follows (Algorithm 3):
Algorithm 3: FFTW-Based Execution of the DFT Test
Input: Test data (binary sequence).
Output: Test result (statistical outcome of the DFT Test).
Execution Steps:
Step C1: Allocate memory for a real-number array (size n) and a complex-number array (size n/2 + 1) using the recommended function void *fftw_malloc(size_t n).
Step C2: Create a plan for the Fourier transform of the real-number array by calling fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags).
Step C3: Execute Step A1 of the DFT Test to compute the new sequence X1X2…Xn, then store it in the pre-allocated input real-number array.
Step C4: Perform the Fourier transform by calling void fftw_execute(const fftw_plan plan), thereby completing Step A2 of the DFT Test.
Step C5: Sequentially execute Steps A3 to A9 of the DFT Test to derive the test result.
Step C6: Destroy the plan using void fftw_destroy_plan(fftw_plan plan) and release memory for the input real-number array and output complex-number array via void fftw_free(void *p).
Step C7: Return the test result computed in Step C5.
The fftw_plan_dft_r2c_1d function called in Step C2 includes a critical flags parameter, a key factor influencing plan generation cost and transform performance. This parameter typically accepts values such as FFTW_MEASURE, FFTW_ESTIMATE, or other relevant options, with distinct functionalities:
  • FFTW_MEASURE: Instructs FFTW to time multiple FFT algorithms for the target transform size, then selects the one with optimal performance.
  • FFTW_ESTIMATE: Only creates a plan deemed “reasonable” by FFTW based on heuristic rules. This plan is lightweight to generate but not guaranteed to be optimal for the specific task.
When generating the FFTW plan via fftw_plan_dft_r2c_1d, additional flags are available to refine plan selection, with FFTW_PATIENT and FFTW_EXHAUSTIVE being notable examples. Their characteristics and trade-offs are as follows:
  • FFTW_PATIENT: Operates similarly to FFTW_MEASURE but expands the algorithm search scope to identify a more optimized plan—an advantage that becomes particularly pronounced for large-scale transformations. However, this thoroughness comes at a cost: it substantially prolongs plan creation time.
  • FFTW_EXHAUSTIVE: Takes an even broader approach, evaluating algorithms that are typically dismissed as suboptimal. This exhaustive search generally yields superior plans compared to FFTW_PATIENT but, as expected, results in significantly longer plan generation times.
Generating FFTW plans can be computationally intensive—particularly when using FFTW_PATIENT and FFTW_EXHAUSTIVE, which demand substantial time. To mitigate this, two key strategies are employed:
1. Reusing FFTW plans to minimize the frequency of plan regeneration.
2. Leveraging FFTW’s “wisdom” feature: The core concept of wisdom is to store configuration details (e.g., memory allocation patterns and register usage) associated with generated plans on disk. Upon subsequent program execution, this information is reloaded into memory, thereby saving significant time during plan regeneration. Importantly, wisdom stores configuration metadata rather than the plans themselves; thus, plan generation functions must still be invoked after loading wisdom.
Critical considerations for wisdom usage include the following:
  • Wisdom is runtime-environment-specific: Any change to the runtime environment (e.g., hardware, operating system, or compiler) requires wisdom to be regenerated.
  • Wisdom is program-specific: Modifications to the program (e.g., adjustments to transform parameters or input size) also necessitate regenerating wisdom.

4.3. Further Optimization of Steps A3–A6

Steps A3 to A6 of the DFT Test—primarily involving the calculation of |fi| (modulus of complex coefficients) and the counting of N1—admit further optimization. The key strategies are as follows:

Eliminating Redundant Square Root Operations

The modulus of a complex number fi = a + bi is defined as |fi| = √(a² + b²), which requires a square root operation. However, since |fi| is only used for comparison with the threshold T to determine N1, the square root is redundant. Instead, we can
1. Adjust Step A3 to compute the squared modulus |fi|² = a² + b² (avoiding the square root);
2. Adjust Step A4 to use T² = 2.995732274·n (the square of the original threshold) as the new comparison criterion;
3. Adjust Step A6 to count N1 as the number of coefficients with |fi|² < T².
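The equivalence of the two comparisons can be checked numerically. In the sketch below the coefficient pairs are synthetic stand-ins for the real and imaginary parts of the fi values, not outputs of an actual transform:

```python
import math
import random

n = 1_000_000
T = math.sqrt(2.995732274 * n)    # original threshold (Step A4)
T2 = 2.995732274 * n              # squared threshold (optimized Step A4)

# Synthetic stand-ins for Fourier coefficients f_i = a + b*i
random.seed(7)
coeffs = [(random.gauss(0, 1000), random.gauss(0, 1000)) for _ in range(1000)]

# Counting |f_i| < T and a^2 + b^2 < T^2 yields the same N1,
# since both sides of each comparison are non-negative
n1_sqrt = sum(1 for a, b in coeffs if math.sqrt(a * a + b * b) < T)
n1_fast = sum(1 for a, b in coeffs if a * a + b * b < T2)
assert n1_sqrt == n1_fast
```

The fast variant saves one square root per coefficient, i.e., n/2 square roots per 1-million-bit sample.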

4.4. Description of Optimized Algorithm

According to the preceding descriptions, the optimized algorithm for the DFT Test is detailed in Algorithm 4.
Algorithm 4: Optimized Implementation of the Discrete Fourier Test
Input: Test data ε = ε1ε2…εn (binary sequence).
Output: Test result (pass/fail with corresponding P-value).
Execution Steps:
Step D1: Allocate memory for a real-number array (size n) and a complex-number array (size n/2 + 1) using the fftw_malloc function.
Step D2: Create a plan for the Fourier transform of the real-number array by calling fftw_plan_dft_r2c_1d.
Step D3: Convert the binary sequence ε = ε1ε2…εn (composed of 0s and 1s) to X1X2…Xn (where Xi = 2εi − 1, 1 ≤ in), yielding a new sequence of −1s and +1s.
Step D4: Perform the Fourier transform by calling fftw_execute, resulting in a series of complex numbers f0, f1, f2, …, fn−1.
Step D5: For fi (where 0 ≤ in/2 − 1), compute the squared modulus |fi|2 = a2 + b2 (where fi = a + bi).
Step D6: Calculate N0 = 0.475n, then count N1 as the number of coefficients with |fi|² less than the threshold T² = 2.995732274·n.
Step D7: Compute the statistical value:
V = (N1 − N0)/√(0.95 × 0.05 × n/4) (7)
Step D8: Compute the P-value = erfc(|V|/√2).
Step D9: If P-value ≥ α = 0.01, the tested sequence passes the DFT Test.
Step D10: Destroy the FFTW plan and release memory for the input real-number array and output complex-number array using fftw_free. Return the test result.
The flowchart for the optimized DFT Test implementation is presented in Figure 2.

5. Experimental Setup and Results

A core prerequisite for any speed improvement is that the test results of the optimized algorithm remain statistically equivalent to those of the original algorithm. The proposed optimization applies only mathematically equivalent transformations in three places: the execution of the DFT (delegated to the FFTW library), the modulus calculation (replacing the modulus with the squared modulus to avoid redundant square root operations), and the counting of the statistic N1. It does not alter the statistical core logic, hypothesis testing criteria, or decision thresholds of the algorithm, so the results of the optimized algorithm are statistically identical to those of the original.

In addition, a correctness verification experiment was conducted before the performance evaluations. The verification data comprised the five benchmark sequences (data.e, data.pi, data.sqrt2, data.sqrt3, data.sha1) provided with the NIST STS standard test datasets [21]. Each sequence was tested with three approaches: the original DFT test implementation in [21], the optimized algorithm proposed in this paper, and the standard procedure specified in NIST SP 800-22rev1a [1]. The comparison covered the statistic N1 and P-value produced by [21], the statistic N1 and P-value produced by the optimized algorithm, and the reference P-values provided in [1]. The correctness verification results are shown in Table 3.
From the comparative verification results in Table 3, the test results of the optimized algorithm are statistically identical to those of the original algorithm, which verifies the effectiveness and statistical correctness of the optimization scheme.
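The statistical equivalence of the squared-modulus shortcut can also be demonstrated directly. The sketch below implements both variants with a naive stdlib DFT (toy lengths only; the sequence length 64 is an illustrative choice) and checks that they produce identical P-values:

```python
import math
import cmath
import random

def dft_pvalue(bits, squared=False):
    """DFT-test P-value; squared=True uses the square-root-free comparison
    of the optimized algorithm. Naive O(n^2) DFT: toy lengths only."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]
    f = [sum(x[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))
         for j in range(n // 2)]
    if squared:
        T2 = 2.995732274 * n                                  # squared threshold
        N1 = sum(1 for c in f if c.real ** 2 + c.imag ** 2 < T2)
    else:
        T = math.sqrt(2.995732274 * n)                        # original threshold
        N1 = sum(1 for c in f if abs(c) < T)
    V = (N1 - 0.475 * n) / math.sqrt(0.95 * 0.05 * n / 4)
    return math.erfc(abs(V) / math.sqrt(2))

random.seed(3)
bits = [random.randint(0, 1) for _ in range(64)]
# Equivalent transformation: both variants produce the same statistic and P-value
assert dft_pvalue(bits) == dft_pvalue(bits, squared=True)
```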
To rigorously validate the efficacy of the optimized algorithm, this section details the experimental evaluations conducted on the optimized algorithm and presents a comparative analysis with the performance metrics of the pre-optimized algorithm. The data sources, testing environment, and methodology are consistent with those specified in Section 3.2. Specifically, a discrete Fourier transform analysis was performed on a 1-million-bit sample derived from pseudorandom numbers generated by the SM4 encryption algorithm. The testing platform remained unchanged.
The performance metrics of the optimized DFT algorithm—encompassing both the Fourier transformation process and the modulus computation—are summarized in Table 4. In these experiments, the ‘flags’ parameter within the fftw_plan_dft_r2c_1d function was set to FFTW_ESTIMATE, its default heuristic mode for plan generation.
As can be observed from Table 4, the overall execution efficiency of the DFT Test improved significantly: the elapsed time decreased from 136.369 ms (pre-optimization) to 58.651 ms, a speed-up ratio of 2.325×. This enhancement is primarily attributable to two factors: FFTW substantially reduced the Fourier transform execution time, and the optimized modulus calculation eliminated inefficiencies in the execution procedures of the NIST Statistical Test Suite (STS). The execution efficiency of the optimized DFT Test now depends primarily on the Fourier transform process. The efficiency of each major step in the optimized algorithm is detailed in Table 5 (Figure 3).
As can be seen from Table 5, the most time-consuming steps in the optimized algorithm were Step D1–D2 (involving memory allocation and FFTW plan creation via fftw_plan_dft_r2c_1d) and Step D4 (invoking fftw_execute for Fourier transformation). Collectively, these two steps accounted for approximately 94.81% of the total elapsed time. Additionally, other steps such as Step D3 (sequence conversion), Step D5 (modulus calculation), and Step D6–D10 (statistical analysis and resource release) contributed minimally to the overall execution time.

6. Further Optimization for Multi-Sample Detection

6.1. Multi-Sample Detection Optimization via FFTW Plan Reuse

In practical testing scenarios, multiple sample evaluations are frequently required (e.g., testing 1000 sample sets). For such cases, the previously optimized algorithm can be further enhanced. A key improvement is reusing FFTW plans to minimize plan regeneration frequency. The specific implementation process is detailed as follows (Algorithm 5):
Algorithm 5: Optimized Multi-Sample DFT Detection
Input: Multi-sample test data (multiple binary sequences).
Output: Detection results for all samples.
Execution Steps:
Step E1: During module initialization, execute Steps D1 and D2, and cache relevant resources: real-number arrays, complex-number arrays, and FFTW plans.
Step E2: For each test sample, use the cached FFTW plans from Step E1 to execute Steps D3–D9, completing individual sample detection.
Step E3: After all samples are processed, execute Step D10 during module deinitialization: destroy FFTW plans and release memory for the cached real-number and complex-number arrays.
This optimized approach reduces the frequency of FFTW plan generation and eliminates redundant memory allocation/release operations for plans. When the number of samples to be tested is large, the efficiency gain from this method becomes particularly significant.
The implementation workflow of the optimized multi-sample DFT detection is depicted in Figure 4.
The aforementioned optimization algorithm adopts the concept of reusing FFTW plans. The benefit of this reuse is that there is no need to recreate and destroy the FFTW plans, or to allocate and release FFTW working memory, for each sample detection. According to the test results in Table 5, reusing FFTW plans can reduce the detection time per sample from around 58.651 ms to approximately 10.594 ms.
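The init-once/reuse pattern of Algorithm 5 can be illustrated in Python. Here a precomputed twiddle-factor table plays the role of the cached FFTW plan; this is a structural analog only, using a naive O(n²) transform on a toy length:

```python
import math
import cmath
import random

def make_plan(n):
    """One-time setup (Step E1 / Steps D1-D2): precompute twiddle factors for
    the first n/2 coefficients. A stand-in for a cached FFTW plan."""
    return [[cmath.exp(2j * math.pi * k * j / n) for k in range(n)]
            for j in range(n // 2)]

def dft_test_with_plan(bits, plan):
    """Per-sample work (Step E2 / Steps D3-D9), reusing the cached plan."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]
    T2, N0 = 2.995732274 * n, 0.475 * n
    N1 = 0
    for row in plan:
        f = sum(xk * w for xk, w in zip(x, row))
        if f.real ** 2 + f.imag ** 2 < T2:    # squared-modulus comparison
            N1 += 1
    V = (N1 - N0) / math.sqrt(0.95 * 0.05 * n / 4)
    return math.erfc(abs(V) / math.sqrt(2))

# Step E1: build the plan once; Step E2: reuse it for every sample
random.seed(11)
n = 64
plan = make_plan(n)
p_values = [dft_test_with_plan([random.randint(0, 1) for _ in range(n)], plan)
            for _ in range(5)]
assert all(0.0 <= p <= 1.0 for p in p_values)
```

The per-sample loop touches only the cached table, mirroring how Algorithm 5 confines plan creation and destruction to module initialization and deinitialization.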
For comparative experiments involving 1000 sample sets (each containing 1 million bits), two testing approaches were evaluated:
Approach 1.
Repeatedly execute the full Algorithm 4 until all 1000 samples are processed. This requires performing FFTW memory allocation/deallocation and plan creation/destruction for each individual sample set.
Approach 2.
Utilize the FFTW plan reuse strategy of Algorithm 5, where FFTW memory allocation/deallocation and plan creation/destruction are performed only once (during initialization and deinitialization).
The results of this comparison are summarized in Table 6.
From Table 6, employing FFTW plan reuse in Algorithm 5 significantly reduces the overhead of FFTW plan creation and memory management. This decreases the total detection time for 1000 sample sets from 58,410 ms to 10,593 ms, demonstrating a substantial efficiency gain in multi-sample scenarios.

6.2. Multi-Sample Detection Optimization via Optimal FFTW Plan Generation

Beyond the aforementioned method, efficiency can be further enhanced by integrating plan storage and loading (via FFTW’s wisdom feature).
While the FFTW_PATIENT and FFTW_EXHAUSTIVE planner flags prolong plan creation, they offer a distinct benefit for multi-sample detection: FFTW's "wisdom" feature can store the resulting plan configuration, which can then be reloaded when processing subsequent samples.
This approach provides two key advantages:
1. Plans can be generated during idle periods, further reducing preprocessing overhead.
2. The more highly optimized plans found by the extensive search further reduce the execution time of Step D4.
Both advantages contribute to a decrease in the total DFT detection time for multiple samples.
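The store-and-reload idea can be illustrated with a minimal sketch. FFTW's actual wisdom API (fftw_export_wisdom_to_filename and fftw_import_wisdom_from_filename in the C library) persists the planner's accumulated knowledge; in the sketch below a precomputed twiddle table merely plays that role, and the helper names and file path are hypothetical.

```python
import cmath
import os
import pickle
import tempfile

def create_plan_tables(n):
    # Expensive one-off planning step (the analogue of an
    # FFTW_PATIENT or FFTW_EXHAUSTIVE planner search).
    return {"n": n,
            "twiddles": [cmath.exp(-2j * cmath.pi * k / n)
                         for k in range(n // 2)]}

def export_wisdom(tables, path):
    # Persist the planning result so later runs can skip the search.
    with open(path, "wb") as f:
        pickle.dump(tables, f)

def import_wisdom(path):
    # Returns None when no stored wisdom exists, so the caller can
    # fall back to planning from scratch.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

# Idle-time run: plan once and persist the result (advantage 1).
wisdom_path = os.path.join(tempfile.gettempdir(), "dft_wisdom_demo.pkl")
export_wisdom(create_plan_tables(1024), wisdom_path)

# Later detection run: reload instead of re-planning (advantage 2).
tables = import_wisdom(wisdom_path)
```

The design point is simply that the costly search happens off the critical path: detection runs pay only the cost of loading the stored result.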
Due to space constraints, test results using flags such as FFTW_PATIENT and FFTW_EXHAUSTIVE are not presented here.

7. Conclusions

This paper presents an optimized implementation of the DFT test algorithm, which is specified in both China’s randomness testing standards and the NIST Statistical Test Suite (STS). The optimizations are achieved through three key strategies: leveraging the FFTW library for efficient Fourier transformation, reducing redundant square root computations, and refining critical code segments.
Experimental results validate significant performance improvements of the proposed optimized algorithm:
  • For a single 1-million-bit sample, the testing time is reduced from 136.37 ms (as reported in [21]) to 58.65 ms, achieving a speedup factor of approximately 3.37.
  • For a batch of 1000 1-million-bit samples, the total processing time decreases from 136.37 s (at a speed of 7.333 Mbps) to 10.593 s (at 94.402 Mbps). This corresponds to a 12.873× speedup compared to the method in [21] and a 10.290× speedup compared to [23].
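As a quick sanity check, the quoted batch-mode speedup factors follow directly from the measured totals and throughputs reported in Table 6:

```python
# Consistency check of the reported batch-mode figures: total times in
# milliseconds and throughputs in 10^6 bit/s, taken from Table 6
# (136.37 s is 136.369 ms per sample over 1000 samples).
t_ref21_ms = 136369.0   # Reference [21], 1000 samples
t_alg5_ms = 10593.0     # Algorithm 5 (plan reuse), 1000 samples
v_ref21, v_ref23, v_alg5 = 7.333, 9.174, 94.402

speedup_time = t_ref21_ms / t_alg5_ms   # ~12.873x vs. [21]
speedup_vs_23 = v_alg5 / v_ref23        # ~10.290x vs. [23]
```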
For practical multi-sample testing scenarios, we recommend adopting the further optimizations outlined in Section 6 to enhance detection efficiency.
Beyond the DFT test, both the NIST STS and China’s randomness testing standards include numerous other test items, most of which exhibit potential for further optimization. This will be a primary focus of our future research.

Author Contributions

X.Y.: Conceptualization, Methodology, Software, Investigation, Writing—original draft preparation, and Writing—review and editing; L.W.: Visualization and Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Research of Jiangsu Higher Education Institutions (19KJB120013) and the 333 Project of Jiangsu Province (BRA2022316825).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Rukhin, A.; Soto, J.; Nechvatal, J.; Smid, M.; Barker, E.; Leigh, S.; Levenson, M.; Vangel, M.; Banks, D.; Heckert, N.; et al. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications; NIST Special Publication 800-22 Rev. 1a (Version STS-2.1); NIST: Gaithersburg, MD, USA, April 2010. Available online: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-22r1a.pdf (accessed on 15 December 2025).
  2. BSI AIS-20, AIS-31; Application Notes and Interpretation of the Scheme: Functionality Classes and Evaluation Methodology for Deterministic and Physical Random Number Generators. German Federal Office for Information Security: Bonn, Germany, 2008.
  3. GM/T 0005-2021; Randomness Test Specification. National Cryptography Administration: Beijing, China, 2021.
  4. Luo, Y.; Liu, D.; Kang, H.-J. NIST New Block Cipher Modes of Operation and Their Fast Implementation. Commun. Technol. 2014, 47, 1066–1070. [Google Scholar]
  5. Yang, X.-W.; Kang, H.-J. Fast Software Implementation of SM3 Hash Algorithm. CAAI Trans. Intell. Syst. 2015, 10, 954–959. [Google Scholar]
  6. Yang, X.-W.; Kang, H.-J.; Liao, Z.-H. Study On Optimization of Poker Test of Random Sequences. CAAI Trans. Intell. Syst. 2016, 11, 513–518. [Google Scholar]
  7. Luo, Y.; Zhang, W.-K.; Yin, Y.-H.; Xu, Y.-Z. Fast Implementation of Monobit Frequency Test And Frequency Test within a Block. Commun. Technol. 2015, 48, 1073–1077. [Google Scholar]
  8. Alcover, P.M.; Guillamón, A.; Ruiz, M.D.C. A New Randomness Test for Bit Sequences. Informatica 2013, 24, 339–356. [Google Scholar] [CrossRef]
  9. Kaminsky, A. GPU Parallel Statistical and Cube Test Analysis of the SHA-3 Finalist Candidate Hash Functions. 13 February 2012. Available online: https://www.cs.rit.edu/~ark/parallelcrypto/sha3test01/jce2011.pdf (accessed on 15 December 2025).
  10. Li, H.; Liu, Y.; Su, M.; Wang, G. Jump and hop randomness tests for binary sequences. Cryptogr. Commun. 2022, 14, 483–502. [Google Scholar] [CrossRef]
  11. Haramoto, H. Study on upper limit of sample size for a two-level test in NIST SP800-22. Jpn. J. Ind. Appl. Math. 2021, 38, 193–209. [Google Scholar] [CrossRef]
  12. Abdelwahab, Z.H.; Elgarf, T.A.; Zekry, A. Analyzing SNOW and ZUC security algorithms using NIST SP 800-22 and enhancing their randomness. J. Cyber Secur. Mobil. 2020, 9, 535–576. [Google Scholar] [CrossRef]
  13. Deb, S.; Pal, S.; Bhuyan, B. NMRMG: Nonlinear multiple-recursive matrix generator design approaches and its randomness analysis. Wirel. Pers. Commun. 2022, 125, 577–597. [Google Scholar] [CrossRef]
  14. Xu, H.; Zhu, Q.; Zheng, W.X. Exponential stability of stochastic nonlinear delay systems subject to multiple periodic impulses. IEEE Trans. Autom. Control 2024, 69, 2621–2628. [Google Scholar] [CrossRef]
  15. Ding, K.; Zhu, Q. Intermittent static output feedback control for stochastic delayed-switched positive systems with only partially measurable information. IEEE Trans. Autom. Control 2023, 68, 8150–8157. [Google Scholar] [CrossRef]
  16. Zhu, Q. Stabilization of stochastic nonlinear delay systems with exogenous disturbances and the event-triggered feedback control. IEEE Trans. Autom. Control 2019, 64, 3764–3771. [Google Scholar] [CrossRef]
  17. Wang, B.; Zhu, Q.; Li, S. Stability analysis of discrete-time semi-Markov jump linear systems with time delay. IEEE Trans. Autom. Control 2023, 68, 6758–6765. [Google Scholar] [CrossRef]
  18. Zhao, Y.; Zhu, Q. Stabilization of stochastic highly nonlinear delay systems with neutral-term. IEEE Trans. Autom. Control 2023, 68, 2544–2551. [Google Scholar] [CrossRef]
  19. Wang, B.; Zhu, Q.; Li, S. Stabilization of discrete-time hidden semi-Markov jump linear systems with partly unknown emission probability matrix. IEEE Trans. Autom. Control 2024, 69, 1952–1959. [Google Scholar] [CrossRef]
  20. Frigo, M.; Johnson, S.G. The Design and Implementation of FFTW3. Proc. IEEE 2005, 93, 216–231. [Google Scholar] [CrossRef]
  21. NIST. NIST Statistical Test Suite; NIST: Gaithersburg, MD, USA, 2010. Available online: https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software (accessed on 15 December 2025).
  22. Sýs, M.; Říha, Z. Faster randomness testing with the NIST statistical test suite. In International Conference on Security, Privacy, and Applied Cryptography Engineering; Springer: Cham, Switzerland, 2014; pp. 272–284. [Google Scholar]
  23. Sýs, M.; Říha, Z.; Matyáš, V. Algorithm 970: Optimizing the NIST statistical test suite and the Berlekamp-Massey algorithm. ACM Trans. Math. Softw. 2016, 43, 1–11. [Google Scholar] [CrossRef]
Figure 1. The performance of the discrete Fourier test.
Figure 2. The flow chart of the optimized algorithm for the discrete Fourier test.
Figure 3. The performance of the optimized algorithm.
Figure 4. The flow chart of the optimized algorithm for the discrete Fourier test with multiple samples.
Table 1. The test platform information.
Component | Parameter
Processor | Intel Core i5-1135G7, 2400 MHz; L1 data cache: 48 KB per core; L1 instruction cache: 32 KB per core; L2 cache: 256 KB per core; L3 cache: 8 MB shared
Memory | 16 GB DDR3 SDRAM
Operating System | Windows 10
Compiler | Visual Studio 2022 Community Edition
Performance Evaluation Algorithm | A simplified version of the speed-testing model from the European eSTREAM algorithm competition; see Algorithm 2.
Table 2. The performance of the discrete Fourier test (10^6 bits).
Step | Time Consumption (ms) | Percentage
Step A1 | 1.775 | 1.30%
Step A2 | 93.946 | 68.89%
Step A3 | 37.243 | 27.31%
Steps A4–A9 | 2.404 | 1.76%
Other | 1.001 | 0.73%
Total | 136.369 | 100.00%
Table 3. Comparison Test of Algorithm Correctness Before and After Optimization.
Component | Data.e | Data.pi | Data.sqrt2 | Data.sqrt3 | Data.sha1
Reference [1], P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062
Reference [21], N1 | 475,021 | 475,280 | 475,060 | 475,031 | 475,152
Reference [21], P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062
This paper, N1 | 475,021 | 475,280 | 475,060 | 475,031 | 475,152
This paper, P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062
Table 4. Performance Comparison of Pre-Optimized and Optimized Algorithms.
Algorithm | Sample Size (Bytes) | Elapsed Time (ms) | Speed (10^6 bit/s)
Reference [21] | 125,000 | 136.369 | 7.333
Reference [22] | 20,971,520 | 25,062 | 6.694
Reference [23] | 125,000 | 109 | 9.174
Algorithm 4 (this paper) | 125,000 | 58.651 | 17.050
Table 5. Performance Breakdown of the Optimized Algorithm.
Step | Elapsed Time (ms) | Percentage
Steps D1–D2 | 48.103 | 82.02%
Step D3 | 2.489 | 4.24%
Step D4 | 7.501 | 12.79%
Step D5 | 0.539 | 0.92%
Steps D6–D10 | 0.019 | 0.03%
Total | 58.651 | 100.00%
Table 6. Performance of Multi-Sample DFT Test.
Algorithm | Sample Size (Bytes) | Elapsed Time (ms) | Speed (10^6 bit/s)
Reference [21] | 125,000 | 136.369 | 7.333
Reference [22] | 20,971,520 | 25,062 | 6.694
Reference [23] | 125,000 | 109 | 9.174
Algorithm 4 (this paper) | 1000 × 125,000 | 58,410 | 17.120
Algorithm 5 (this paper) | 1000 × 125,000 | 10,593 | 94.402
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, X.; Wang, L. Research on Software Optimization for Discrete Fourier Test. Axioms 2026, 15, 4. https://doi.org/10.3390/axioms15010004

