Automated Framework for Testing Random Number Generators for IoT Security Applications Using NIST SP 800-22

Castillo, Juan; Aran Vila, Pere; Palacio, Francisco; Garrido, Blas; Hernández, Sergi; Cirera, Albert

doi:10.3390/iot7010026


by Juan Castillo ¹,²,*, Pere Aran Vila ¹, Francisco Palacio ¹, Blas Garrido ¹, Sergi Hernández ¹ and Albert Cirera ¹

¹ Departament d’Enginyeria Electrònica i Biomèdica, Universitat de Barcelona, Carrer de Martí i Franquès 1-11, Les Corts, 08028 Barcelona, Spain

² Departamento de Sistemas, Universidad de Nariño, Calle 18 Carrera 50 Torobajo, San Juan de Pasto 520001, Colombia

* Author to whom correspondence should be addressed.

IoT 2026, 7(1), 26; https://doi.org/10.3390/iot7010026

Submission received: 27 January 2026 / Revised: 20 February 2026 / Accepted: 5 March 2026 / Published: 7 March 2026

(This article belongs to the Topic Privacy Challenges and Solutions in the Internet of Things)


Abstract

The continuous expansion of the Internet of Things (IoT) has intensified the need to evaluate and guarantee the quality of entropy sources used in random number generation, an essential element in securing communications in IoT ecosystems. This work presents an automated, web-based framework designed to execute and analyze the results of the statistical tests defined in the NIST SP 800-22 standard, enabling systematic assessment of entropy sources and random number generators in IoT devices and environments. The proposed system integrates a Python-based backend built upon an optimized implementation of the original NIST suite, along with an intuitive web interface that facilitates configuration, monitoring, and parallel execution of tests through Representational State Transfer (REST) endpoints. Session management based on Redis ensures reliable and concurrent operation of multiple users or devices while maintaining isolation and data integrity. To demonstrate its applicability, an emulated IoT ecosystem was implemented in which multiple virtual devices periodically and asynchronously request real-time validation of their local random number generators. The obtained results confirm the system’s capability to detect deficiencies in pseudorandom generators and validate true random number sources, highlighting its potential as a diagnostic and verification tool for distributed IoT security systems. The tool developed in this work is fully accessible to the public, allowing researchers, engineers, and practitioners to evaluate random number generators without requiring specialized hardware or proprietary software.

Keywords: NIST SP 800-22; randomness; entropy sources; IoT; security; cryptography; automation; frameworks; RNG

1. Introduction

Nowadays, an increasing number of electronic devices are interconnected through the Internet to exchange information with people or with other devices, enabling automation and data-driven decision-making in diverse domains such as healthcare, transportation, and industrial systems. This pervasive interconnectivity, known as the Internet of Things (IoT), has led to an exponential growth in the volume and sensitivity of transmitted data. Consequently, ensuring the protection, reliability, confidentiality, and integrity of this information has become a fundamental challenge for both researchers and industry practitioners [1,2].

The cryptographic strength of many widely used public-key schemes relies on the computational infeasibility of solving certain number-theoretic problems within a practical time frame. These cryptographic primitives underpin widely used secure communication protocols such as TLS and SSL. However, the reliability and robustness of such algorithms critically depend on the quality of the random numbers used for key generation, nonce creation, and initialization vectors. Weak or biased randomness can lead to predictable keys or reproducible cryptographic material, compromising overall system security [3,4]. Therefore, assessing the statistical quality of random number generators (RNGs) is essential across domains such as cryptography, simulation, and embedded-systems verification.

RNGs underpin stochastic modelling and Monte Carlo methods in scientific computing, where any bias or structure in the output can compromise research validity [5]. In cryptographic applications, weak entropy sources jeopardize key generation, nonces, and initialization vectors, weakening even mathematically robust algorithms [6]. In embedded and IoT devices, hardware and resource limitations often expose RNGs to non-ideal conditions, making rigorous statistical evaluation crucial for reliable operation [7]. Critical ecosystems such as IoT therefore depend on high-quality RNGs and on their continuous evaluation, as proposed in this work.

A true random number generator (TRNG) relies on the inherent unpredictability of physical noise sources to produce non-deterministic bit sequences. Different implementations use a variety of entropy sources, including electronic noise [8], hybrid architectures combining memristors with microcontrollers [9], and random telegraph noise (RTN) [10], among others. These approaches show that TRNGs can be realized through diverse physical phenomena, offering robust and flexible solutions for security-critical applications. To ensure their reliability, TRNGs must produce sequences indistinguishable from ideal randomness, while pseudorandom number generators (PRNGs) emulate such behavior deterministically [11]. Rigorous evaluation of both types ensures compliance with international randomness standards.

The National Institute of Standards and Technology (NIST), part of the U.S. Department of Commerce, developed a comprehensive suite of 15 statistical tests known as NIST SP 800-22 [12], designed to evaluate whether a binary sequence exhibits properties consistent with randomness. Alongside the specification document, NIST released an original implementation in the C programming language. Despite its robustness, this reference implementation, developed more than two decades ago, poses challenges when compiled or executed in modern environments. Its reliance on legacy tools, rigid output structure, and limited configurability often hinder integration into automated workflows. Moreover, the post-processing of test results requires additional utilities to interpret and summarize data stored in static directory structures.

Several authors have proposed enhancements to the original algorithms and implementations [13,14], including versions rewritten in modern programming languages such as Python or C++ [15,16]. Among these, the implementation known as Fast NIST [17] stands out for significantly reducing the execution time of computationally intensive tests. However, it still inherits certain limitations from the original software, particularly regarding the fixed organization of results and the lack of a graphical interface.

Recent efforts have produced modern reimplementations of the NIST test suite in languages such as Python, providing enhanced usability and integration with existing data analysis pipelines. For example, Ref. [18] compared three open-source Python-based implementations, highlighting the challenges of adherence to the original test specifications and the impact on randomness assessments. Optimizations for computational efficiency have also been proposed, such as in [19], which introduces fast software implementations of the Serial Test and Approximate Entropy Test by replacing bit-level operations with byte-level operations, achieving speedups exceeding 2× over the original methods. Combining test parameters further enhanced performance, with speedups above 4× compared to individual test implementations.

Beyond standalone software, several web-based platforms have been developed to perform statistical randomness testing directly through web browsers, eliminating the need for local installation or configuration. However, the review of available implementations [20,21,22] reveals that many of these systems either limit their functionality to a subset of the NIST SP 800-22 tests, offer incomplete result reporting, or are no longer actively maintained. Although such platforms improve accessibility and user experience, they often lack transparency regarding parameter configuration, reproducibility of results, or full adherence to NIST specifications, issues that still constrain their suitability for research and validation workflows.

In contrast, the testing platform presented in this work provides a fully accessible and user-friendly framework that can be used from any computer with an Internet connection, without requiring installation or specialized software or hardware. The system supports multiple simultaneous users through isolated session handling, ensuring independent and conflict-free execution of experiments. All tests included in the NIST SP 800-22 suite are implemented with complete compatibility in output format and statistical interpretation, enabling reproducibility and adherence to established standards. Furthermore, the platform exposes a dedicated API that allows both users and IoT devices to submit binary sequences, execute the full set of tests programmatically, and retrieve structured results, thereby broadening its applicability for automated validation workflows and large-scale experimentation.

In April 2022, the National Institute of Standards and Technology (NIST) announced its decision to revise Special Publication 800-22 Rev. 1a, following multiple rounds of public feedback and evaluation [23]. This has not materialized into a public draft to date.

The reviewed literature and practical implementations indicate the importance of refining and adapting the original NIST SP 800-22 suite for modern use cases. Interoperability of test results, adherence to specifications, scalability, and integration with contemporary software ecosystems remain active areas of advancement. Online implementations particularly represent a growing trend towards democratizing access to cryptographic randomness evaluation tools.

To address these issues, this work presents a web-based framework that automates the execution and analysis of randomness tests under the NIST SP 800-22 specification. The proposed system provides an intuitive user interface for uploading data files, selecting specific tests, and obtaining detailed results almost instantly. Its backend adopts a Model–View–Controller (MVC) architecture that allows parallel execution of independent tests, thereby reducing total processing time.

Beyond NIST SP 800-22, the application also incorporates an entropy estimation module based on the ent utility, which is widely available in most GNU/Linux distributions. This opens the possibility of efficiently adding new tests to the server in the future, allowing researchers to quickly and easily assess the quality of random numbers without installing additional software or performing manual result processing.

2. Materials and Methods

The proposed system consists of two main components: a Python-based backend and a frontend for user interaction. The general architecture is shown in Figure 1.

The platform integrates the compiled Fast NIST executable—slightly modified to accept an output directory as a runtime parameter—within a Python-based backend that acts as a wrapper for the optimized implementation presented in [17]. The backend exposes a set of endpoints for data upload, test execution, and result retrieval, which can be accessed either from the web-based application or directly by IoT devices through HTTPS requests. Upon receiving test parameters, it executes the corresponding NIST SP 800-22 tests using the compiled executable, collects and organizes the results, and performs a preliminary statistical evaluation to determine whether each test passes or fails, thereby streamlining the overall workflow and ensuring compatibility with the original software output format.

The frontend, developed with HTML5, CSS3, and Vanilla JavaScript, provides a lightweight and intuitive interface. Users can upload binary sequence files and configure parameters such as the number of sequences, the bit length, and the block size for the tests that require it. Progress and results are displayed almost immediately. All requests to the backend are managed via Representational State Transfer (REST) endpoints, enabling simultaneous test execution and significantly improving system performance.

To isolate execution contexts when multiple users use the tool concurrently, session handling was implemented using Redis, an in-memory key-value database. The backend’s static content is served through the Nginx web server with SSL encryption via Let’s Encrypt certificates. The endpoints were implemented in Python 3.13 using the Flask framework, and all components were deployed on a server with a 16-core Intel Xeon processor at 2.8 GHz, 128 GB RAM, and a 500 GB HDD, running GNU/Linux Ubuntu Server 24.04 LTS. The domain name was registered and managed through the No-IP service. Access to the tool requires a computer with Internet connectivity and a browser compatible with the mentioned standards. Our online NIST testing tool is publicly accessible at https://mindub.ddns.net/ (accessed on 4 March 2026). Figure 2 shows a screenshot of the graphical user interface.

To evaluate our NIST testing tool, a representative sample of random numbers was generated using an ESP32-C3 RISC-V microcontroller developed by Espressif®, a low-cost device featuring integrated Wi-Fi and Bluetooth, widely employed in prototyping, research projects, and commercial IoT applications. A custom program was implemented using the Espressif IoT Development Framework (ESP-IDF), following the workflow depicted in Figure 3, to produce sufficiently long binary sequences in accordance with the NIST SP 800-22 specifications (i.e., a minimum of 55 sequences of 1,000,000 bits each). For the experiments reported in this study, a dataset of 2 GB was generated using the hardware-based TRNG of the ESP32-C3.

From this dataset, several experiments were conducted, each consisting of 1000 sequences of 1,000,000 bits. The generated sequences were analyzed using the proposed web-based tool. To validate the correctness and reliability of the tool, the same sequences were also evaluated using the original NIST software, and the outcomes were found to be consistent, confirming that the tool faithfully reproduces the results of the original implementation while providing enhanced usability, automated processing, and real-time reporting.

To demonstrate the practical applicability of the proposed framework, a prototype emulation of an IoT ecosystem was implemented, comprising multiple interconnected devices. Each emulated device periodically generates local random bit sequences, which are subsequently validated in real time by the framework before being used as seeds for cryptographic operations during data transmission.

The emulation was conducted using a Python-based script that simulates the validation requests of N IoT devices. Each device is represented by an independent thread that sends requests to the backend at randomized intervals, emulating asynchronous operation in a real-world network. Validation results are logged and visualized in real time, enabling assessment of how the dynamic quality of local entropy sources impacts the security of communications within a multi-node, concurrent environment.
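The thread-per-device pattern described above can be sketched as follows. This is only an illustrative outline, not the authors' script: `validate_sequence` is a hypothetical local stand-in for the HTTPS validation request (the endpoint paths are not given in the text), and the sequence length, cycle count, and delay bounds are arbitrary assumptions.

```python
import random
import threading
import time


def validate_sequence(device_id, bits):
    """Hypothetical stand-in for the HTTPS validation request to the backend."""
    return {"device": device_id, "n_bits": len(bits), "ones": sum(bits)}


def device_loop(device_id, cycles, max_delay, results, lock):
    """Emulate one device: wait a random interval, then submit a sequence."""
    rng = random.Random(device_id)  # independent generator per device
    for _ in range(cycles):
        time.sleep(rng.uniform(0.0, max_delay))  # randomized request interval
        bits = [rng.getrandbits(1) for _ in range(1000)]
        outcome = validate_sequence(device_id, bits)
        with lock:  # the shared log is protected against concurrent writes
            results.append(outcome)


def run_emulation(n_devices=4, cycles=3, max_delay=0.01):
    """Start one thread per emulated device and collect all outcomes."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=device_loop,
                         args=(d, cycles, max_delay, results, lock))
        for d in range(n_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for entry in run_emulation():
        print(entry)
```

In a real deployment the stub would be replaced by an HTTPS client call, and the order of entries in the log would reflect the asynchronous arrival of requests.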

The system ensures simultaneous evaluation of multiple devices without interference, maintaining data integrity and consistency of results. In addition, the framework allows execution of supplementary entropy tests via the GNU/Linux ent utility through the same interface, and its modular design facilitates the integration of new randomness tests in future expansions. The topology of the emulated IoT ecosystem is depicted in Figure 4.

To further evaluate the framework’s capability to distinguish between high- and low-quality entropy sources, the IoT emulation included a set of four devices that draw on software-based pseudorandom number generators (PRNGs) and on the GNU/Linux /dev/urandom interface as a TRNG-like source. Each device continuously transmitted generated sequences to the backend for real-time NIST SP 800-22 testing. For testing purposes, the number of sequences was limited to 55, and the length of 1,000,000 bits per sequence was maintained according to the NIST specification. The results demonstrate that the framework can reliably detect the statistical deficiencies of PRNG-generated sequences in comparison to true random sequences. This highlights the system’s potential for immediate identification of inadequate entropy sources in cryptographic applications, providing actionable insights for developers and security engineers in IoT deployments.

3. Results

Table 1 presents the results of applying the NIST SP 800-22 suite to 1000 sequences of 1,000,000 bits from the 2 GB dataset generated by the ESP32-C3 as described above.

The table includes the chi-square p-values (Chi²), the Kolmogorov–Smirnov (KS) p-values, the number of sequences $n$, the number of passing sequences $n_{pass}$, the observed success proportion (proportion), the minimum pass rate (passrate), and the global result for each test. The chi-square statistic quantifies the deviation between the observed and expected frequency distributions, serving as an indicator of how well the random data conforms to a uniform distribution [24]. The KS test measures the maximum absolute difference between the empirical cumulative distribution function (ECDF) of the observed data and the cumulative distribution function (CDF) of the reference theoretical distribution, which in the context of randomness testing is typically the uniform distribution. This statistic quantifies the largest deviation between the empirical and ideal behavior expected from a perfectly random sequence. Owing to its non-parametric nature, the KS test is particularly valuable because it does not assume any specific form of the underlying distribution, making it sensitive to a wide range of deviations from uniformity [25,26].

According to NIST SP 800-22, each statistical test is applied to a set of $n$ independent binary sequences produced by the generator under evaluation. Every execution of a given test on a single sequence yields a p-value; that sequence is considered to have passed the test when its p-value satisfies $p > \alpha$, where $\alpha$ is the significance level (the probability of a Type I error). NIST recommends $\alpha = 0.01$ for the suite, i.e., a 1% nominal probability of incorrectly rejecting a truly random sequence. After processing all $n$ sequences, the observed success proportion is compared to the expected value and a goodness-of-fit analysis is performed on the collection of p-values. In this second stage the Kolmogorov–Smirnov (KS) test (or a chi-square test on binned p-values) is used to assess whether the distribution of the p-values is consistent with the uniform distribution on $[0, 1]$. The KS p-value is not used as a per-sequence decision rule against $\alpha$; rather it provides a global measure of uniformity: a large KS p-value indicates that the ensemble of p-values behaves as expected for randomness, while a small KS p-value signals systematic deviation from uniformity and therefore potential non-randomness in the generator. This two-tier procedure (per-sequence acceptance with $\alpha$, followed by global uniformity testing) combines sensitivity to individual failures with a test for collective anomalies in the p-value distribution [25,27]. The observed pass proportion is then given by Equation (1):

$$\mathrm{proportion} = \frac{n_{pass}}{n}, \qquad (1)$$

where $n_{pass}$ is the number of sequences that passed the test.
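As an illustration of the second tier, the chi-square variant of the uniformity check can be sketched with the standard library alone. The sketch assumes the ten equal-width bins and the $\mathrm{igamc}(9/2, \chi^2/2)$ form used in the NIST SP 800-22 documentation; the helper names are hypothetical, and the closed form for the incomplete gamma function is valid only for half-integer arguments such as 9/2.

```python
import math


def igamc_half_integer(a, x):
    """Upper regularized incomplete gamma Q(a, x) for a = k + 1/2, k >= 0.

    Uses Q(1/2, x) = erfc(sqrt(x)) and the recurrence
    Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    """
    q = math.erfc(math.sqrt(x))
    s = 0.5
    while s < a - 1e-9:
        q += x ** s * math.exp(-x) / math.gamma(s + 1.0)
        s += 1.0
    return q


def uniformity_p_value(p_values):
    """Chi-square test of p-value uniformity over ten bins on [0, 1]."""
    bins = 10
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1  # p = 1.0 goes to last bin
    expected = len(p_values) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2, igamc_half_integer((bins - 1) / 2, chi2 / 2)


if __name__ == "__main__":
    evenly_spread = [(i + 0.5) / 100 for i in range(100)]
    print(uniformity_p_value(evenly_spread))  # chi2 = 0 for a flat histogram
```

NIST SP 800-22 treats the p-values as uniformly distributed when this second-level p-value is at least 0.0001; a KS-based check, as also reported in this work, would replace the binned statistic with the maximum ECDF deviation.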

To determine whether the generator passes a given statistical test, NIST defines a minimum acceptable pass rate that accounts for statistical variability due to the finite sample size. This value is given by Equation (2):

$$\mathrm{passrate} = (1 - \alpha) - 3.0 \sqrt{\frac{\alpha (1 - \alpha)}{n}}, \qquad (2)$$

where the constant 3.0 adopted by NIST corresponds approximately to the 99.73% confidence interval of the normal distribution. Therefore, a test is considered successful if the observed pass proportion satisfies Equation (3):

$$\mathrm{proportion} \geq \mathrm{passrate}. \qquad (3)$$

This expression ensures that minor statistical fluctuations due to limited sample size do not incorrectly classify a test as failed. In other words, Equation (2) defines a statistically justified lower bound on the expected pass rate, placed three standard deviations below the nominal value $1 - \alpha$, which corresponds to the 99.73% interval mentioned above.
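A short worked example may help to make Equations (2) and (3) concrete for the two sample sizes used in this study. The function names below are illustrative only.

```python
import math


def min_pass_rate(n, alpha=0.01):
    """Minimum acceptable pass rate from Equation (2)."""
    return (1.0 - alpha) - 3.0 * math.sqrt(alpha * (1.0 - alpha) / n)


def generator_passes(n_pass, n, alpha=0.01):
    """Decision rule from Equation (3): proportion >= passrate."""
    return n_pass / n >= min_pass_rate(n, alpha)


if __name__ == "__main__":
    for n in (55, 1000):
        print(n, round(min_pass_rate(n), 4))
    # n = 55   -> about 0.9498, so at least 53 of 55 sequences must pass
    # n = 1000 -> about 0.9806, so at least 981 of 1000 sequences must pass
```

The smaller sample tolerates a proportionally larger number of failures, which is why results obtained with 55 sequences are less discriminating than those obtained with 1000.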

Within the NIST SP 800-22 statistical test suite, certain tests produce more than one p-value per analyzed sequence. This occurs because these tests evaluate the same statistical property under multiple conditions or subpatterns. For instance, the Cumulative Sums test performs two independent evaluations (forward and backward), while the Random Excursions and Random Excursions Variant tests generate several p-values for each valid state of the random walk, depending on the number of state visits in each sequence. Similarly, the Serial test computes two p-values corresponding to patterns of length $m$ and $m - 1$, and the Non-overlapping Template Matching test can generate up to 148 p-values, one for each analyzed template.

4. Discussion

In the original NIST implementation, each of these p-values is treated as an independent statistical observation, and the overall test result is obtained by analyzing the combined distribution of all p-values. Specifically, the uniformity of the p-values is evaluated using the chi-square goodness-of-fit test, and the proportion of sequences passing the significance criterion ($p > \alpha$) is computed over the entire consolidated set.

Following this approach, in the present work all p-values generated by a multi-output test were aggregated into a single column vector, which was subsequently used to calculate the chi-square and Kolmogorov–Smirnov statistics, the pass proportion, and the overall pass rate. This ensures that the final result of each NIST test corresponds to a single aggregated metric, thus maintaining consistency with the methodology applied to single-output tests.
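The aggregation step can be sketched as below, assuming the per-sequence p-values of a multi-output test are available as a list of lists (possibly of different lengths, as in the Random Excursions tests). The function name and the data layout are hypothetical.

```python
def pooled_pass_proportion(per_sequence_p_values, alpha=0.01):
    """Flatten all p-values of a multi-output test and compute the proportion.

    per_sequence_p_values: one inner list of p-values per analyzed sequence.
    Returns the pooled vector and the fraction of entries with p > alpha.
    """
    pooled = [p for seq in per_sequence_p_values for p in seq]
    if not pooled:
        raise ValueError("no p-values to aggregate")
    n_pass = sum(1 for p in pooled if p > alpha)
    return pooled, n_pass / len(pooled)


if __name__ == "__main__":
    # Three sequences; the last one produced no valid p-values.
    example = [[0.50, 0.20], [0.005, 0.90], []]
    vector, proportion = pooled_pass_proportion(example)
    print(len(vector), proportion)
```

The pooled vector is then the input to the uniformity statistics, and its length takes the place of $n$ when the minimum pass rate is evaluated.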

Regarding performance, Table 2 presents the execution times obtained for the NIST SP 800-22 tests using three different implementations: the original NIST implementation, the optimized Fast NIST version, and the proposed system described in this work. The test was performed using 55 independent sequences, each containing 1,000,000 bits, extracted from a 2 GB dataset of random binary sequences generated by the ESP32-C3 microcontroller. This setup ensures reproducibility and provides a representative sample for evaluating the statistical properties of the generated data. The results demonstrate a significant reduction in execution time compared to the official implementation, particularly in computationally intensive tests such as Linear Complexity and Non-overlapping Template. The optimized Fast NIST offers a substantial improvement over the original NIST implementation. The architecture proposed in this work builds on this improvement and obtains additional performance gains through asynchronous task management, concurrent test execution, and efficient server-side processing.

It should be noted that in Table 2 the total execution time reported for this work (1750.60 ms) is computed differently from the totals in the NIST and Fast NIST columns, where tests are executed sequentially. In our framework, the reported time is not the arithmetic sum of individual test durations, because multiple tests are executed in parallel across independent threads. This value, retrieved from the web application’s Activity Log, represents the total wall-clock time from the initiation of the first task to the completion of the last one. Furthermore, these results may vary slightly depending on the server’s instantaneous workload and CPU availability at the time of execution, as concurrent requests and resource scheduling in a multi-user environment can influence the final response time and computational overhead.
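The difference between the sum of individual durations and the wall-clock total can be illustrated with a small sketch. Here `run_test` is a hypothetical placeholder that merely sleeps; in the actual framework each task would wrap a call to the compiled test executable, whose invocation details are not reproduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_test(name, duration):
    """Placeholder for one statistical test; sleeps instead of computing."""
    start = time.perf_counter()
    time.sleep(duration)
    return name, time.perf_counter() - start


def run_parallel(tests):
    """Run all tests concurrently; return per-test times and wall-clock time."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(tests)) as pool:
        timings = dict(pool.map(lambda item: run_test(*item), tests.items()))
    wall_clock = time.perf_counter() - t0
    return timings, wall_clock


if __name__ == "__main__":
    durations = {"frequency": 0.05, "runs": 0.05, "serial": 0.05, "fft": 0.05}
    timings, wall_clock = run_parallel(durations)
    print("sum of durations:", round(sum(timings.values()), 3))
    print("wall-clock total:", round(wall_clock, 3))
```

With four concurrent tasks of similar length, the wall-clock total stays close to the duration of the longest task, whereas a sequential run would approach the sum.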

The observed performance gain in the Non-overlapping Template test is primarily due to the multi-core task distribution and the asynchronous execution model. These results reflect an optimal resource allocation by the server’s task manager during the case study.

Additionally, the developed tool executes an entropy test using the ent utility package available for GNU/Linux systems. This test complements the NIST SP 800-22 suite by measuring the average amount of information produced by a random source. The ent utility estimates the randomness of a file through several statistical indicators. The entropy (measured in bits per byte) represents the average information content of each byte, with a perfectly random sequence approaching 8.0 bits per byte; lower values indicate redundancy or predictability. The optimum compression percentage estimates how much a file could be compressed without information loss, with values near 0% corresponding to highly random data. The chi-square test compares the distribution of byte values (0–255) against a uniform distribution, outputting a percentage representing how often a truly random sequence would exceed the observed chi-square value. The arithmetic mean value of data bytes reflects the average byte value, expected to be close to 127.5 in random sequences, with deviations revealing potential bias. The Monte Carlo estimate of $\pi$ uses successive groups of bytes as coordinates in a square to assess the uniformity of the data distribution, with estimates closer to $\pi$ indicating higher uniformity. Finally, the serial correlation coefficient measures the relationship between consecutive bytes; values near 0 indicate no correlation, whereas values approaching +1 or −1 reveal strong correlation or anti-correlation, exposing non-random structures. Table 3 shows the results obtained for the 2 GB dataset file generated as indicated above.
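Three of these indicators are simple enough to sketch directly. The code below is an illustrative approximation and not a reimplementation of ent: in particular, the serial correlation is computed here as a plain lag-1 Pearson coefficient, which may differ in detail from the formula used by the utility.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def arithmetic_mean(data):
    """Average byte value; about 127.5 is expected for uniform data."""
    return sum(data) / len(data)


def serial_correlation(data):
    """Lag-1 Pearson correlation between consecutive bytes (non-circular)."""
    x, y = data[:-1], data[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined for constant data
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    ramp = bytes(range(256)) * 4  # uniform histogram, but fully predictable
    print(byte_entropy(ramp), arithmetic_mean(ramp), serial_correlation(ramp))
```

The ramp example has maximal byte entropy and an ideal mean while being strongly correlated, which shows why the indicators have to be read together rather than individually.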

To emulate the behavior of an IoT ecosystem using the framework, a Python script was developed to simulate multiple IoT devices performing randomness-validation requests to the backend via HTTPS. For experimental purposes, each simulated device randomly selects, at every execution cycle, between two internal sources: (i) a TRNG-like source based on the GNU/Linux /dev/urandom interface, and (ii) a PRNG implemented using Python’s random module, which relies on the Mersenne Twister algorithm. Since Mersenne Twister is deterministic and not intended for cryptographic use, it provides a suitable contrast against the higher-entropy output typically obtained from /dev/urandom. Each emulated client generates in memory 55 sequences of 1,000,000 bits (the minimum recommended by NIST) and periodically submits them for validation using independent threads and randomized time intervals, effectively reproducing the asynchronous and concurrent nature of real IoT communication. For each validation request, five representative NIST statistical tests were executed, as summarized in Table 4.

It is important to emphasize that the use of a PRNG does not necessarily imply that statistical tests will always fail. When the number of sequences is small or when only a limited subset of tests is applied, certain deterministic patterns may remain undetected, allowing the PRNG to produce outputs that satisfy the NIST SP 800-22 criteria. This behavior is expected, as the statistical power of the tests increases with both sample size and test diversity. Consequently, passing a small set of tests should not be interpreted as evidence of cryptographic strength, but rather as a reminder that deterministic generators may mimic randomness under restricted testing conditions.
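The two sources and one of the simplest NIST tests can be sketched as follows. The Frequency (monobit) p-value uses the formula given in NIST SP 800-22, $p = \mathrm{erfc}(|S_n| / \sqrt{2n})$; the helper names, the use of `os.urandom` as the interface to the system source, and the sequence lengths are assumptions made for illustration.

```python
import math
import os
import random


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def urandom_bits(n_bits):
    """TRNG-like source: bits drawn from the operating system generator."""
    return bytes_to_bits(os.urandom(n_bits // 8))


def mersenne_bits(n_bits, seed=None):
    """PRNG source: bits from Python's Mersenne Twister (n_bits multiple of 8)."""
    rng = random.Random(seed)
    return bytes_to_bits(rng.getrandbits(n_bits).to_bytes(n_bits // 8, "big"))


def monobit_p_value(bits):
    """Frequency (monobit) test p-value for a sequence of 0/1 values."""
    s_n = sum(2 * b - 1 for b in bits)  # map 0 -> -1 and 1 -> +1, then sum
    s_obs = abs(s_n) / math.sqrt(len(bits))
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    for label, bits in (("urandom", urandom_bits(100_000)),
                        ("mersenne", mersenne_bits(100_000, seed=1))):
        print(label, monobit_p_value(bits))
```

Both sources will usually pass this single test, which is consistent with the caveat above: one test on a short sample says little about the suitability of a generator for cryptographic use.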

For each client request, the backend processes the data and returns the results in real time, including key statistical metrics and the execution time of each test. The execution time reflects the computational performance and responsiveness of the system under varying workloads. Representative results obtained from four emulated IoT clients in one experiment are summarized in Table 5, demonstrating the system’s scalability, responsiveness, and robustness when handling concurrent, high-load scenarios. As can be observed in the table, all the statistical tests applied to the true random number generator (TRNG) passed successfully, achieving proportions above the minimum pass rate threshold.

However, for the pseudorandom number generator (PRNG), the Frequency test (IoT Device Number 3) and the Longest Run of Ones in a Block test (IoT Device Number 2) reported a FAILURE, indicating that, for the corresponding device, the generated sequence did not exhibit the expected uniformity. This is because a PRNG relies on deterministic algorithms, which means its output contains structural patterns that can cause certain NIST SP 800-22 tests to fail, especially when the internal state or seeding process is not ideal. In contrast, a TRNG extracts randomness from a physical source of entropy, such as thermal noise or quantum effects, which naturally produces sequences with no algorithmic structure. As a result, TRNG outputs tend to pass the NIST statistical tests more consistently, since they more closely resemble the behavior of an ideal random process. This highlights the importance of studying and developing high-quality entropy sources.

5. Conclusions

This work presented the development of an automated framework for executing and analyzing the randomness tests defined in the NIST SP 800-22 standard, designed to evaluate entropy sources used in security applications within the Internet of Things (IoT). The proposed solution integrates a Python-based backend and a lightweight, browser-accessible frontend, enabling remote, concurrent, and reproducible execution of statistical tests on binary sequences generated by different devices or systems.

The system’s modular architecture, based on REST endpoints and Redis-managed sessions, allows the simultaneous management of multiple users or devices while ensuring data isolation and integrity of results. Deployment on a GNU/Linux environment with Flask and Nginx provides stability, security, and scalability, allowing the framework to be adapted easily to both experimental and production contexts.

Experimental validation was conducted using 2 GB of data generated by the hardware TRNG of the ESP32-C3, from which 1000 sequences of 1,000,000 bits were evaluated. All fifteen statistical tests defined in NIST SP 800-22 were successfully passed, with observed proportions exceeding the minimum required pass rates and global p-value distributions consistent with theoretical expectations. Cross-validation against the official NIST implementation and Fast NIST confirmed statistical equivalence of results, demonstrating the correctness of the proposed framework.

In terms of performance, the proposed architecture reduced total execution time to approximately 1.75 s for 55 sequences of 1 Mbit, outperforming both the original NIST implementation (∼49.3 s) and Fast NIST (∼2.27 s). This improvement is achieved through asynchronous execution and efficient task parallelization, making the framework suitable for large-scale and near real-time evaluations. Complementary entropy analysis using the ent tool yielded an estimated entropy of 7.999999 bits per byte with negligible serial correlation, further supporting the statistical quality of the evaluated source.

Validation in a simulated IoT environment demonstrated the framework’s ability to manage concurrent requests from heterogeneous devices and to detect statistical deficiencies in deterministic generators when compared with true entropy sources. These results confirm the robustness, scalability, and diagnostic capability of the system for distributed and resource-constrained environments.

Finally, the tool has been made publicly accessible through an open and easy-to-use framework that promotes transparency, reproducibility, and broader adoption of statistical testing practices. Its modular architecture makes it easily extensible, enabling future integration of additional statistical test suites such as Dieharder and deployment in real IoT testbeds for continuous entropy monitoring in emerging IoT and cryptographic applications.

Author Contributions

Conceptualization, J.C. and A.C.; methodology, A.C.; software, J.C.; validation, F.P., S.H. and B.G.; investigation, J.C.; resources, A.C.; writing—original draft preparation, J.C.; writing—review and editing, J.C., P.A.V., F.P., S.H., B.G. and A.C.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

The authors wish to acknowledge the financial support from the Spanish Ministry of Science, Innovation and Universities for the FLEXRAM project (TED2021-129643B-I00) and the LIP-FREE project (PID2022-140978OB-I00). J. Castillo also thanks the Universidad de Nariño (San Juan de Pasto, Colombia) for providing a funded study leave.

Data Availability Statement

The 2 GB dataset used in this study is publicly available at https://mindub.ddns.net/SAMPLE.BIN (accessed on 4 March 2026). The source code of the software developed for this work is available at https://github.com/jjcastilloj/rtt (accessed on 4 March 2026).

Acknowledgments

During the preparation of this study, the authors used OpenAI ChatGPT 4 and Google Gemini 2.5 Pro to write the source code of some software components. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NIST	National Institute of Standards and Technology
REST	Representational State Transfer
PRNG	Pseudo Random Number Generator
TRNG	True Random Number Generator
TLS	Transport Layer Security
SSL	Secure Sockets Layer
IoT	Internet of Things
API	Application Programming Interface

Figure 1. General system architecture.

Figure 2. Graphical user interface of the tool.

Figure 3. Generation of long binary random sequences using the ESP32-C3.

Figure 4. IoT ecosystem topology used for emulation.

Table 1. Results of the NIST SP 800-22 randomness tests on 1000 sequences of 1,000,000 bits each, generated by the TRNG of the ESP32-C3 microcontroller and evaluated with our tool.

#	Test	Chi² p-Value	KS p-Value	n	$n_{pass}$	Proportion	Passrate	Global Result
1	Frequency (Monobit)	0.605916	0.358883	1000	986	0.986000	0.980561	SUCCESS
2	Block Frequency	0.834308	0.156062	1000	990	0.990000	0.980561	SUCCESS
3	Cumulative Sums	0.650652	0.860185	2000	1970	0.985000	0.983325	SUCCESS
4	Runs	0.907419	0.888421	1000	992	0.992000	0.980561	SUCCESS
5	Longest Run of Ones	0.496351	0.867173	1000	991	0.991000	0.980561	SUCCESS
6	Binary Matrix Rank	0.564639	0.500264	1000	990	0.990000	0.980561	SUCCESS
7	Discrete Fourier Transform	0.743915	0.045175	1000	992	0.992000	0.980561	SUCCESS
8	Non Overlapping Template MT	0.096102	0.148176	148,000	146,513	0.989953	0.989224	SUCCESS
9	Overlapping Template MT	0.242986	0.243342	1000	989	0.989000	0.980561	SUCCESS
10	Maurer’s Universal	0.844641	0.678546	1000	984	0.984000	0.980561	SUCCESS
11	Approximate Entropy	0.138069	0.590097	1000	988	0.988000	0.980561	SUCCESS
12	Random Excursions	0.240039	0.257038	5056	4988	0.986551	0.985802	SUCCESS
13	Random Excursions Variant	0.140850	0.120954	11,374	11,235	0.987779	0.987201	SUCCESS
14	Serial	0.013664	0.390557	2000	1975	0.987500	0.983325	SUCCESS
15	Linear Complexity	0.248014	0.456822	1000	985	0.985000	0.980561	SUCCESS
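
The Passrate column can be reproduced with a short sketch, assuming the proportion criterion of NIST SP 800-22 with significance level α = 0.01, i.e. a lower bound of (1 − α) − 3·sqrt(α(1 − α)/n) for n evaluated p-values. The function names are illustrative and are not the framework's own.

```python
import math

def minimum_pass_rate(n, alpha=0.01):
    """Lower bound on the proportion of passing sequences (SP 800-22)."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * alpha / n)

def global_result(n_pass, n, alpha=0.01):
    """Compare the observed proportion with the minimum pass rate."""
    proportion = n_pass / n
    return "SUCCESS" if proportion >= minimum_pass_rate(n, alpha) else "FAILURE"
```

For n = 1000 the bound evaluates to about 0.980561, and for n = 2000 to about 0.983325, matching the values listed in the table.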

Table 2. Execution time comparison between the official NIST implementation, the optimized Fast NIST implementation, and this work.

Test	NIST (ms)	Fast NIST (ms)	This Work (ms)
Frequency	127.60	18.99	21.28
Block Frequency	123.46	19.46	21.22
Cumulative Sums	187.93	21.27	24.53
Runs	243.27	20.43	22.55
Longest Run	172.20	21.68	23.28
Rank	610.42	36.52	42.87
FFT	1409.15	1356.87	1367.77
Non-overlapping Template	14,737.64	49.65	47.02
Overlapping Template	367.21	31.12	35.12
Maurer’s Universal	362.17	40.84	43.61
Approximate Entropy	1902.87	34.10	37.91
Random Excursions	155.25	57.40	64.15
Random Excursions Variant	535.52	57.05	60.34
Serial	3757.43	189.72	194.58
Linear Complexity	24,605.60	314.39	321.14
Overall time	49,297.72	2269.50	1750.60

Table 3. Entropy test results for the 2 GB dataset file.

Parameter	Result
Entropy (bits per byte)	7.999999
Optimum compression (%)	0% (no compression possible)
Chi-square test	2316.53 for 2,147,483,648 samples; would be exceeded by random data less than 0.01% of the time.
Arithmetic mean value of data bytes	127.4877 (127.5 = random)
Monte Carlo value for $π$	3.141833170 (error 0.01%)
Serial correlation coefficient	−0.000003 (totally uncorrelated = 0.0)
Execution time (ms)	23,917.39
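
The first and fourth rows of the table correspond to simple byte-level statistics. The sketch below shows how such values can be computed; it is an independent illustration of the Shannon entropy and arithmetic mean of a byte stream, not the code of the ent tool, and the function names are hypothetical.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def byte_mean(data: bytes) -> float:
    """Arithmetic mean of the byte values (127.5 for ideal random data)."""
    return sum(data) / len(data)
```

A stream in which all 256 byte values occur equally often reaches the maximum of 8 bits per byte, while a constant stream yields 0.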

Table 4. Relevant NIST tests for an initial randomness approximation.

Test	Measures	Description
Frequency (Monobit)	Distribution of 0s and 1s	Checks whether the number of 1’s and 0’s in the entire sequence are approximately the same.
Block Frequency	Distribution in blocks	Divides the sequence into blocks and checks if the proportion of 1’s within each block is close to 0.5. Detects local imbalances of bits within blocks.
Runs	Length of consecutive sequences	Evaluates whether there are too many or too few consecutive sequences of the same bit.
Longest Run	Length of the longest sequence	Detects anomalous patterns in long sequences of 0s or 1s.
Approximate Entropy	Complexity and repetition of patterns	Detects repetitive patterns that do not appear in simpler tests.
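
Two of the tests listed above are simple enough to sketch in a few lines. The following follows the Frequency (Monobit) and Runs test definitions of NIST SP 800-22; it is a plain reference illustration, not the optimized implementation used by the framework, and the function names are assumptions.

```python
import math

def monobit_p_value(bits):
    """Frequency (Monobit) test: balance of ones and zeros."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)      # map 0 -> -1, 1 -> +1 and sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def runs_p_value(bits):
    """Runs test: too many or too few runs of identical bits."""
    n = len(bits)
    pi = sum(bits) / n
    # Prerequisite: the sequence must first be close to balanced.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    return math.erfc(numerator / denominator)
```

A sequence passes a test when its p-value is at least the significance level, typically 0.01. With the short examples given in SP 800-22, `monobit_p_value` applied to 1011010101 gives about 0.527089 and `runs_p_value` applied to 1001101011 gives about 0.147232.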

Table 5. Results of the NIST SP 800-22 statistical tests for TRNG and PRNG sources from four emulated IoT devices (each device randomly chooses between the PRNG and the TRNG source in each execution cycle).

IoT Device Number	RNG Source	Test	Result	Chi² p-Value	KS p-Value	Proportion	Passrate	Time (ms)
0	PRNG	Frequency	SUCCESS	0.525011	0.276682	1	0.949751	100.7
0	PRNG	BlockFrequency	SUCCESS	0.948119	0.658426	0.981818	0.949751	101.82
0	PRNG	Runs	SUCCESS	0.784407	0.727477	1	0.949751	108.56
0	PRNG	LongestRun	SUCCESS	0.637119	0.127281	1	0.949751	110.96
1	PRNG	BlockFrequency	SUCCESS	0.712343	0.499477	0.981818	0.949751	112.72
1	PRNG	Frequency	SUCCESS	0.057975	0.010683	1	0.949751	100.72
1	PRNG	LongestRun	SUCCESS	0.996795	0.765644	1	0.949751	109.97
0	PRNG	ApproximateEntropy	SUCCESS	0.229125	0.729546	1	0.949751	186.54
2	PRNG	Runs	SUCCESS	0.489065	0.126917	1	0.949751	109.32
1	PRNG	Runs	SUCCESS	0.15455	0.26697	0.981818	0.949751	108.65
2	PRNG	BlockFrequency	SUCCESS	0.275709	0.149362	0.981818	0.949751	101.51
1	PRNG	ApproximateEntropy	SUCCESS	0.818179	0.699444	1	0.949751	185.76
3	PRNG	Frequency	FAILURE	0.454224	0.345266	0.945455	0.949751	99.46
3	TRNG	Runs	SUCCESS	0.525011	0.855644	1	0.949751	109.89
3	TRNG	LongestRun	SUCCESS	0.328861	0.131923	0.963636	0.949751	111.07
2	PRNG	Frequency	SUCCESS	0.229125	0.574534	1	0.949751	98.84
2	PRNG	LongestRun	FAILURE	0.712343	0.659847	0.945455	0.949751	108.48
3	TRNG	ApproximateEntropy	SUCCESS	0.712343	0.71612	1	0.949751	185.17
2	PRNG	ApproximateEntropy	SUCCESS	0.964295	0.970919	1	0.949751	186.09
3	TRNG	BlockFrequency	SUCCESS	0.849861	0.920421	1	0.949751	102.67

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
