Research on Software Optimization for Discrete Fourier Test
Abstract
1. Introduction
- Evaluating pseudo-random data streams generated in accordance with specific standards;
- Testing data streams produced by cryptographic algorithms paired with specific operation modes [4];
- Assessing data streams derived from hash functions—for instance, China’s SM3 hash function [5].
2. Terms and Notations Explanation
3. Introduction to the Discrete Fourier Test
3.1. Overview of the Discrete Fourier Test Procedure
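Algorithm 1: Discrete Fourier Test Algorithm

Input: Binary test sequence $\varepsilon = \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{n-1}$.
Output: Test result (pass/fail) and the corresponding P-value.
Execution Steps:
- Step A1: Convert the 0s and 1s in the sequence to −1 and 1, respectively, obtaining a new sequence $X = X_0, X_1, \ldots, X_{n-1}$, where $X_i = 2\varepsilon_i - 1$, $0 \le i \le n-1$.
- Step A2: Perform the Fourier transform on the new sequence to obtain a series of complex numbers $f_0, f_1, \ldots, f_{n-1}$, where

$$f_i = \sum_{k=0}^{n-1} X_k \exp\left(\frac{2\pi\sqrt{-1}\,ki}{n}\right), \quad 0 \le i \le n-1. \tag{4}$$

- Step A3: Compute the modulus of $f_i = a + b\sqrt{-1}$, $0 \le i \le n/2 - 1$,

$$|f_i| = \sqrt{a^2 + b^2}. \tag{5}$$

- Step A4: Calculate the threshold value $T = \sqrt{2.995732274\,n}$.
- Step A5: Calculate $N_0 = 0.475n$.
- Step A6: Count the number of coefficients $|f_i|$ that are less than the threshold value $T$, denoted as $N_1$.
- Step A7: Compute the statistic

$$V = \frac{N_1 - N_0}{\sqrt{0.95 \times 0.05 \times n/4}}. \tag{6}$$

- Step A8: Compute P-value $= \mathrm{erfc}\left(|V|/\sqrt{2}\right)$.
- Step A9: If P-value $\ge \alpha$, the tested sequence passes the Discrete Fourier Test; otherwise it fails.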
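The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation measured in this paper: it uses a naive O(n²) transform from the standard library, so it is suitable for short sequences only. The function name `dft_test` and the default `alpha=0.01` are assumptions made for the example, and the statistic and P-value follow the definitions in NIST SP 800-22 [1].

```python
import cmath
import math


def dft_test(bits, alpha=0.01):
    """Run Steps A1-A9 on a list of 0/1 integers (hypothetical helper)."""
    n = len(bits)
    # Step A1: map 0 -> -1 and 1 -> +1.
    x = [2 * b - 1 for b in bits]
    # Steps A2-A3: naive DFT; only the first n/2 moduli are needed.
    moduli = []
    for i in range(n // 2):
        f_i = sum(x[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n))
        moduli.append(abs(f_i))
    # Step A4: threshold T = sqrt(2.995732274 * n).
    threshold = math.sqrt(2.995732274 * n)
    # Step A5: expected number of moduli below the threshold.
    n0 = 0.475 * n
    # Step A6: observed number of moduli below the threshold.
    n1 = sum(1 for m in moduli if m < threshold)
    # Step A7: test statistic.
    v = (n1 - n0) / math.sqrt(0.95 * 0.05 * n / 4)
    # Step A8: P-value.
    p_value = math.erfc(abs(v) / math.sqrt(2))
    # Step A9: pass/fail decision at significance level alpha.
    return n1, p_value, p_value >= alpha


n1, p_value, passed = dft_test([1, 0, 0, 1, 0, 1, 0, 0, 1, 1])
print(n1, round(p_value, 6), passed)
```

For realistic sample sizes (for example, 10^6 bits) the quadratic transform above is far too slow, which is why a fast Fourier transform library is used in practice.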
3.2. Efficiency of the DFT Test
Algorithm 2: Performance Evaluation Algorithm

Input: Target code segment for performance testing; number of repeated tests C (C must be odd).
Output: Statistical result of performance (i.e., a timing value).
Execution Steps:
- Step B1: Initialize the timing array such that T[i] = 0, 1 ≤ i ≤ C.
- Step B2: Time the target code segment C times, recording each timing value T[i], 1 ≤ i ≤ C. Each measurement involves two sub-steps:
  - B2.1: Read a timer value A immediately before the target code segment and a timer value B immediately after it.
  - B2.2: Calculate the elapsed time for the code segment as the difference between the two values: T[i] = B − A.
- Step B3: Sort the timing sequence T[i], 1 ≤ i ≤ C, in descending order to obtain a non-increasing sequence. Ascending order is equally acceptable.
- Step B4: Extract the median of the sorted sequence, that is, the element at position (C + 1)/2, and output it as the timing result for the target code segment.
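A minimal Python sketch of Algorithm 2 follows. The helper names `median_of_timings` and `time_segment` are hypothetical, and `time.perf_counter_ns` is merely one convenient timer; the timer used in the experiments is not specified here.

```python
import time


def median_of_timings(timings):
    """Steps B3-B4: sort in descending order and return the middle element."""
    if len(timings) % 2 == 0:
        raise ValueError("the number of timings must be odd")
    ordered = sorted(timings, reverse=True)
    return ordered[len(ordered) // 2]


def time_segment(segment, repeats=11):
    """Steps B1-B4: time `segment` `repeats` times and return the median (ns)."""
    if repeats % 2 == 0:
        raise ValueError("repeats must be odd")
    timings = [0] * repeats                  # Step B1
    for i in range(repeats):                 # Step B2
        start = time.perf_counter_ns()       # B2.1: timer value A
        segment()
        end = time.perf_counter_ns()         # B2.1: timer value B
        timings[i] = end - start             # B2.2: T[i] = B - A
    return median_of_timings(timings)


median_ns = time_segment(lambda: sum(range(10_000)), repeats=11)
```

Requiring an odd C guarantees that the median is an actual measured value rather than an average of two measurements, and the median is less sensitive than the mean to occasional outliers caused by scheduling or cache effects.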
4. Optimization of the DFT Algorithm
4.1. Bottleneck Analysis and Improvement Strategy
4.2. Optimization of Discrete Fourier Transform Using FFTW
Algorithm 3: FFTW-Based Execution of the DFT Test

Input: Test data (binary sequence).
Output: Test result (statistical outcome of the DFT Test).
Execution Steps:
- Step C1: Allocate memory for a real-number array (n elements) and a complex-number array (n/2 + 1 elements) using the recommended function void *fftw_malloc(size_t n). Note that the argument of fftw_malloc is a size in bytes, so the element counts must be multiplied by sizeof(double) and sizeof(fftw_complex), respectively.
- Step C2: Create a plan for the Fourier transform of the real-number array by calling fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags).
- Step C3: Execute Step A1 of the DFT Test to compute the new sequence X, then store it in the pre-allocated input real-number array.
- Step C4: Perform the Fourier transform by calling void fftw_execute(const fftw_plan plan), thereby completing Step A2 of the DFT Test.
- Step C5: Sequentially execute Steps A3 to A9 of the DFT Test to derive the test result.
- Step C6: Destroy the plan using void fftw_destroy_plan(fftw_plan plan) and release the memory of the input real-number array and the output complex-number array via void fftw_free(void *p).
- Step C7: Return the test result computed in Step C5.
- FFTW_MEASURE: Instructs FFTW to time multiple FFT algorithms for the target transform size, then selects the one with optimal performance.
- FFTW_ESTIMATE: Only creates a plan deemed “reasonable” by FFTW based on heuristic rules. This plan is lightweight to generate but not guaranteed to be optimal for the specific task.
- FFTW_PATIENT: Operates similarly to FFTW_MEASURE but expands the algorithm search scope to identify a more optimized plan—an advantage that becomes particularly pronounced for large-scale transformations. However, this thoroughness comes at a cost: it substantially prolongs plan creation time.
- FFTW_EXHAUSTIVE: Takes an even broader approach, evaluating algorithms that are typically dismissed as suboptimal. This exhaustive search generally yields superior plans compared to FFTW_PATIENT but, as expected, results in significantly longer plan generation times.
- Wisdom is runtime-environment-specific: Any change to the runtime environment (e.g., hardware, operating system, or compiler) requires wisdom to be regenerated.
- Wisdom is program-specific: Modifications to the program (e.g., adjustments to transform parameters or input size) also necessitate regenerating wisdom.
4.3. Further Optimization of Steps A3–A6
Eliminating Redundant Square Root Operations
4.4. Description of Optimized Algorithm
Algorithm 4: Optimized Implementation of the Discrete Fourier Test

Input: Test data (binary sequence).
Output: Test result (pass/fail with the corresponding P-value).
Execution Steps:
- Step D1: Allocate memory for a real-number array (n elements) and a complex-number array (n/2 + 1 elements) using the fftw_malloc function.
- Step D2: Create a plan for the Fourier transform of the real-number array by calling fftw_plan_dft_r2c_1d.
- Step D3: Convert the binary sequence $\varepsilon$ (composed of 0s and 1s) to $X$, where $X_i = 2\varepsilon_i - 1$ for $0 \le i \le n-1$, yielding a new sequence of −1s and 1s.
- Step D4: Perform the Fourier transform by calling fftw_execute, resulting in a series of complex numbers $f_0, f_1, \ldots, f_{n/2}$.
- Step D5: For $f_i$, $0 \le i \le n/2 - 1$, compute the squared modulus $|f_i|^2 = a^2 + b^2$, where $f_i = a + b\sqrt{-1}$.
- Step D6: Calculate $N_0 = 0.475n$, then count $N_1$ as the number of coefficients $|f_i|^2$ that are less than the squared threshold $T^2 = 2.995732274\,n$.
- Step D7: Compute the statistic

$$V = \frac{N_1 - N_0}{\sqrt{0.95 \times 0.05 \times n/4}}. \tag{7}$$

- Step D8: Compute P-value $= \mathrm{erfc}\left(|V|/\sqrt{2}\right)$.
- Step D9: If P-value $\ge \alpha$ (e.g., $\alpha = 0.01$), the tested sequence passes the DFT Test; otherwise it fails.
- Step D10: Destroy the FFTW plan using fftw_destroy_plan and release the memory of the input real-number array and the output complex-number array using fftw_free. Return the test result.
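Steps D5 and D6 rely on the fact that, for non-negative quantities, $|f_i| < T$ holds exactly when $|f_i|^2 < T^2$, so the square root of every coefficient can be skipped. The sketch below compares the two counting strategies on synthetic coefficients; the function names and the Gaussian test data are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math
import random

THRESHOLD_FACTOR = 2.995732274  # T^2 = THRESHOLD_FACTOR * n


def count_with_sqrt(coeffs, n):
    """Steps A3-A6 style: one square root per coefficient."""
    t = math.sqrt(THRESHOLD_FACTOR * n)
    return sum(1 for a, b in coeffs if math.sqrt(a * a + b * b) < t)


def count_without_sqrt(coeffs, n):
    """Steps D5-D6 style: compare squared moduli against T^2 directly."""
    t_squared = THRESHOLD_FACTOR * n
    return sum(1 for a, b in coeffs if a * a + b * b < t_squared)


# Synthetic (real, imaginary) pairs standing in for Fourier coefficients.
random.seed(0)
n = 1024
scale = math.sqrt(n / 2)
coeffs = [(random.gauss(0, scale), random.gauss(0, scale)) for _ in range(n // 2)]
same = count_with_sqrt(coeffs, n) == count_without_sqrt(coeffs, n)
```

In floating-point arithmetic the two comparisons could in principle differ for a coefficient lying within rounding error of the threshold, but away from that boundary they give the same count.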
5. Experimental Setup and Results
6. Further Optimization for Multi-Sample Detection
6.1. Multi-Sample Detection Optimization via FFTW Plan Reuse
Algorithm 5: Optimized Multi-Sample DFT Detection

Input: Multi-sample test data (multiple binary sequences).
Output: Detection results for all samples.
Execution Steps:
- Step E1: During module initialization, execute Steps D1 and D2, and cache the relevant resources: real-number arrays, complex-number arrays, and FFTW plans.
- Step E2: For each test sample, use the cached FFTW plans from Step E1 to execute Steps D3–D9, completing the detection of that sample.
- Step E3: After all samples are processed, execute Step D10 during module deinitialization: destroy the FFTW plans and release the memory of the cached real-number and complex-number arrays.
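The initialize-once, reuse-per-sample structure of Algorithm 5 can be illustrated without FFTW. In the sketch below, a precomputed table of complex exponentials and a reusable input buffer stand in for the cached FFTW plan and arrays; this is an analogy only, since the transform itself is a naive O(n²) loop and the class name `DftTestModule` is hypothetical.

```python
import cmath
import math


class DftTestModule:
    """Caches per-length resources once (Step E1) and reuses them (Step E2)."""

    def __init__(self, n):
        self.n = n
        # Stand-in for the cached plan: exponentials exp(2*pi*sqrt(-1)*k/n).
        self.twiddles = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
        # Stand-in for the cached real-number input array.
        self.buffer = [0] * n
        self.threshold_squared = 2.995732274 * n
        self.n0 = 0.475 * n
        self.denominator = math.sqrt(0.95 * 0.05 * n / 4)

    def test_sample(self, bits):
        """Steps D3-D8 for one sample, using only cached resources."""
        n = self.n
        if len(bits) != n:
            raise ValueError("sample length does not match the cached size")
        buf = self.buffer
        for k, bit in enumerate(bits):
            buf[k] = 2 * bit - 1
        n1 = 0
        for i in range(n // 2):
            f = sum(buf[k] * self.twiddles[(k * i) % n] for k in range(n))
            if f.real * f.real + f.imag * f.imag < self.threshold_squared:
                n1 += 1
        v = (n1 - self.n0) / self.denominator
        return n1, math.erfc(abs(v) / math.sqrt(2))


def run_all(samples):
    module = DftTestModule(len(samples[0]))          # Step E1: once
    return [module.test_sample(s) for s in samples]  # Step E2: per sample


results = run_all([[1, 0, 0, 1, 0, 1, 0, 0, 1, 1], [0] * 10, [1, 0] * 5])
```

All samples must share the same length, because the cached resources are specific to one transform size; this mirrors the fact that an FFTW plan is created for a fixed n.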
6.2. Multi-Sample Detection Optimization via Optimal FFTW Plan Generation
7. Conclusions
- For a single 1-million-bit sample, the testing time is reduced from 136.37 ms (as reported in [21]) to 58.65 ms, a speedup factor of approximately 2.33.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rukhin, A.; Soto, J.; Nechvatal, J.; Smid, M.; Barker, E.; Leigh, S.; Levenson, M.; Vangel, M.; Banks, D.; Heckert, N.; et al. A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, [EB/OL]. Version STS-2.1, NIST Special Publication 800-22rev1a (April 2010). Available online: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-22r1a.pdf (accessed on 15 December 2025).
- BSI AIS-20, AIS-30; Application Notes and Interpretation of the Scheme Functionality Classes and Evaluation Methodology for Deterministic and Physical Random Number Generators. German Federal Office for Information Security: Berlin, Germany, 2008.
- GM/T 0005-2021; Randomness Test Specification. National Cryptography Administration: Beijing, China, 2021.
- Luo, Y.; Liu, D.; Kang, H.-J. NIST New Block Cipher Modes of Operation and Their Fast Implementation. Commun. Technol. 2014, 47, 1066–1070.
- Yang, X.-W.; Kang, H.-J. Fast Software Implementation of SM3 Hash Algorithm. CAAI Trans. Intell. Syst. 2015, 10, 954–959.
- Yang, X.-W.; Kang, H.-J.; Liao, Z.-H. Study On Optimization of Poker Test of Random Sequences. CAAI Trans. Intell. Syst. 2016, 11, 513–518.
- Luo, Y.; Zhang, W.-K.; Yin, Y.-H.; Xu, Y.-Z. Fast Implementation of Monobit Frequency Test And Frequency Test within a Block. Commun. Technol. 2015, 48, 1073–1077.
- Alcover, P.M.; Guillamón, A.; Ruiz, M.D.C. A New Randomness Test for Bit Sequences. Informatica 2013, 24, 339–356.
- Kaminsky, A. GPU Parallel Statistical and Cube Test Analysis of the SHA-3 Finalist Candidate Hash Functions [EB/OL]. (13 February 2012). Available online: https://www.cs.rit.edu/~ark/parallelcrypto/sha3test01/jce2011.pdf (accessed on 15 December 2025).
- Li, H.; Liu, Y.; Su, M.; Wang, G. Jump and hop randomness tests for binary sequences. Cryptogr. Commun. 2022, 14, 483–502.
- Haramoto, H. Study on upper limit of sample size for a two-level test in NIST SP800-22. Jpn. J. Ind. Appl. Math. 2021, 38, 193–209.
- Abdelwahab, Z.H.; Elgarf, T.A.; Zekry, A. Analyzing SNOW and ZUC security algorithms using NIST SP 800-22 and enhancing their randomness. J. Cyber Secur. Mobil. 2020, 9, 535–576.
- Deb, S.; Pal, S.; Bhuyan, B. NMRMG: Nonlinear multiple-recursive matrix generator design approaches and its randomness analysis. Wirel. Pers. Commun. 2022, 125, 577–597.
- Xu, H.; Zhu, Q.; Zheng, W.X. Exponential stability of stochastic nonlinear delay systems subject to multiple periodic impulses. IEEE Trans. Autom. Control 2024, 69, 2621–2628.
- Ding, K.; Zhu, Q. Intermittent static output feedback control for stochastic delayed-switched positive systems with only partially measurable information. IEEE Trans. Autom. Control 2023, 68, 8150–8157.
- Zhu, Q. Stabilization of stochastic nonlinear delay systems with exogenous disturbances and the event-triggered feedback control. IEEE Trans. Autom. Control 2019, 64, 3764–3771.
- Wang, B.; Zhu, Q.; Li, S. Stability analysis of discrete-time semi-Markov jump linear systems with time delay. IEEE Trans. Autom. Control 2023, 68, 6758–6765.
- Zhao, Y.; Zhu, Q. Stabilization of stochastic highly nonlinear delay systems with neutral-term. IEEE Trans. Autom. Control 2023, 68, 2544–2551.
- Wang, B.; Zhu, Q.; Li, S. Stabilization of discrete-time hidden semi-Markov jump linear systems with partly unknown emission probability matrix. IEEE Trans. Autom. Control 2024, 69, 1952–1959.
- Frigo, M.; Johnson, S.G. The Design and Implementation of FFTW3. Proc. IEEE 2005, 93, 216–231.
- NIST. NIST Statistical Test Suite. [EB/OL] NIST, 2010. Available online: https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software (accessed on 15 December 2025).
- Sýs, M.; Říha, Z. Faster randomness testing with the NIST statistical test suite. In International Conference on Security, Privacy, and Applied Cryptography Engineering; Springer: Cham, Switzerland, 2014; pp. 272–284.
- Sýs, M.; Říha, Z.; Matyáš, V. Algorithm 970: Optimizing the NIST statistical test suite and the berlekamp-massey algorithm. ACM Trans. Math. Softw. 2016, 43, 1–11.




| Component | Parameter |
|---|---|
| Processor | Intel Core i5-1135G7, 2400 MHz; L1 data cache: 48 KB per core; L1 instruction cache: 32 KB per core; L2 cache: 256 KB per core; L3 cache: 8 MB shared |
| Memory | 16 GB DDR3 SDRAM |
| Operating System | Windows 10 |
| Compiler | Visual Studio 2022 Community Edition |
| Performance Evaluation Algorithm | A simplified version of the speed testing model from the European eSTREAM algorithm competition; see Algorithm 2. |

| Step | Elapsed Time (ms) | Percentage |
|---|---|---|
| Step A1 | 1.775 | 1.30% |
| Step A2 | 93.946 | 68.89% |
| Step A3 | 37.243 | 27.31% |
| Steps A4–A9 | 2.404 | 1.76% |
| Other | 1.001 | 0.73% |
| Total | 136.369 | 100.00% |

| Source | Metric | Data.e | Data.pi | Data.sqrt2 | Data.sqrt3 | Data.sha1 |
|---|---|---|---|---|---|---|
| Reference [1] | P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062 |
| Reference [21] | $N_1$ | 475,021 | 475,280 | 475,060 | 475,031 | 475,152 |
| Reference [21] | P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062 |
| This paper | $N_1$ | 475,021 | 475,280 | 475,060 | 475,031 | 475,152 |
| This paper | P-value | 0.847187 | 0.010186 | 0.581909 | 0.776046 | 0.163062 |

| Algorithm | Sample Size (Bytes) | Elapsed Time (ms) | Speed ($10^6$ bit/s) |
|---|---|---|---|
| Reference [21] | 125,000 | 136.369 | 7.333 |
| Reference [22] | 20,971,520 | 25,062 | 6.694 |
| Reference [23] | 125,000 | 109 | 9.174 |
| Algorithm 4 in this paper | 125,000 | 58.651 | 17.050 |

| Step | Elapsed Time (ms) | Percentage |
|---|---|---|
| Steps D1–D2 | 48.103 | 82.02% |
| Step D3 | 2.489 | 4.24% |
| Step D4 | 7.501 | 12.79% |
| Step D5 | 0.539 | 0.92% |
| Steps D6–D10 | 0.019 | 0.03% |
| Total | 58.651 | 100.00% |

| Algorithm | Sample Size (Bytes) | Elapsed Time (ms) | Speed ($10^6$ bit/s) |
|---|---|---|---|
| Reference [21] | 125,000 | 136.369 | 7.333 |
| Reference [22] | 20,971,520 | 25,062 | 6.694 |
| Reference [23] | 125,000 | 109 | 9.174 |
| Algorithm 4 in this paper | 1000 × 125,000 | 58,410 | 17.120 |
| Algorithm 5 in this paper | 1000 × 125,000 | 10,593 | 94.402 |
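
The speed column in the tables above follows directly from the sample size and the elapsed time: speed = (bytes × 8) / elapsed time. The short sketch below recomputes a few entries and the resulting speedup ratios; the function name `throughput_mbit_per_s` is an assumption for this example, and the inputs are the values listed in the tables.

```python
def throughput_mbit_per_s(sample_bytes, elapsed_ms):
    """Return the processing speed in units of 10^6 bit/s."""
    bits = sample_bytes * 8
    seconds = elapsed_ms / 1000.0
    return bits / seconds / 1e6


baseline = throughput_mbit_per_s(125_000, 136.369)            # Reference [21]
single = throughput_mbit_per_s(125_000, 58.651)               # Algorithm 4
multi_alg4 = throughput_mbit_per_s(1000 * 125_000, 58_410)    # Algorithm 4, 1000 samples
multi_alg5 = throughput_mbit_per_s(1000 * 125_000, 10_593)    # Algorithm 5, 1000 samples

single_speedup = single / baseline       # Algorithm 4 versus Reference [21]
multi_speedup = multi_alg5 / multi_alg4  # Algorithm 5 versus Algorithm 4
print(round(baseline, 3), round(single, 3), round(multi_alg5, 3))
print(round(single_speedup, 2), round(multi_speedup, 2))
```

With these inputs, Algorithm 4 is roughly 2.3 times as fast as the reference implementation on a single sample, and Algorithm 5 is roughly 5.5 times as fast as Algorithm 4 over 1000 samples.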
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, X.; Wang, L. Research on Software Optimization for Discrete Fourier Test. Axioms 2026, 15, 4. https://doi.org/10.3390/axioms15010004

