1. Introduction
Internet of Things (IoT) devices frequently collect, transmit, and manage data that might include personal information, financial data, health records, or other sensitive information. Many regions and industries have strict regulations and standards regarding data protection and privacy, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe. Implementing strong cryptographic protocols helps IoT device manufacturers and service providers comply with these regulations, avoiding legal penalties and reputational damage. Thus, the significance of secure cryptographic protocols in the IoT cannot be overstated, especially as the proliferation of IoT devices continues to grow at an unprecedented rate.
As the demand for secure communication over untrusted networks increased, especially in decentralized IoT deployments, the need for more scalable key management led to the adoption of public-key cryptography. The advent of public-key cryptography revolutionized secure communications, allowing secure exchanges over insecure channels without a shared secret, using a pair of keys—one public, one private. One of the earliest practical implementations was RSA (Rivest–Shamir–Adleman), introduced in the late 1970s, which remains widely used for secure data transmission and digital signatures. Later, Elliptic Curve Cryptography (ECC) emerged, offering comparable security with much smaller key sizes, making it particularly suitable for resource-constrained environments such as mobile and IoT devices. Closely related to public key cryptography is the development of digital signatures, which authenticate the identity of the sender and ensure the message’s integrity, akin to a handwritten signature but far more secure. Hashing, another critical cryptographic technique, transforms input into a fixed-size string of bytes, used in data retrieval, integrity checks, and cryptographic applications. Cryptographic methods have evolved to address the needs of increasingly sophisticated digital environments, constantly adapting to new challenges, including those posed by the advent of quantum computing.
A particularly alarming issue is the “store now, decrypt later” dilemma, where attackers capture and save encrypted data today, planning to decrypt it later, either through conventional cryptanalysis or, more critically, once quantum computing has sufficiently evolved [1]. Recent advances in quantum computing therefore challenge traditional cryptographic methods and are driving a shift toward quantum-resistant algorithms. Developments such as Microsoft’s recent unveiling of Majorana 1, the world’s first quantum chip based on the novel Topological Core architecture, highlight the acceleration toward operational quantum computers. This new processor architecture promises the potential to fit a million qubits on a single chip small enough to fit in the palm of one’s hand. This emerging quantum capability underscores the urgent need for quantum-resistant cryptographic solutions.
Ascon, a symmetric-key cryptographic scheme selected by the National Institute of Standards and Technology (NIST) in 2023 as its lightweight cryptography standard, is engineered to provide Authenticated Encryption with Associated Data (AEAD), hashing, and Extendable Output Function (XOF) capabilities [2]. It was chosen as the primary option for lightweight authenticated encryption in the final portfolio of the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR), which ran from 2014 to 2019. It is designed to be efficient in both software and hardware, making it particularly suitable for constrained devices. It offers robust side-channel resistance and resilience against misuse. However, a critical observation is that it cannot distinguish between original data messages and unauthorized retransmissions. This vulnerability to replay attacks poses a significant security risk in stateless protocols typical of IoT deployments.
This paper presents a Field Programmable Gate Array (FPGA)-based extension of the Ascon cryptographic protocol, specifically engineered to address critical security gaps in lightweight IoT environments. We focus on mitigating Ascon’s inherent vulnerability to replay attacks by implementing a hardware-based nonce generation and verification framework. A 128-bit Linear Feedback Shift Register (LFSR) is deployed on a Xilinx Artix-7 FPGA to ensure per-message nonce uniqueness during encryption. At the decryption end, replay protection is enforced using a Bloom Filter-based detection system stored on FPGA Block RAM (BRAM). Filter indices are derived from the received nonce using Ascon-XOF128, which utilizes the same permutation logic as the Ascon core, effectively eliminating the overhead of implementing a separate hashing module. The Bloom Filter provides guaranteed detection of replayed messages, i.e., zero false negatives, while maintaining a low False Positive Rate (FPR) [3], thus effectively safeguarding the integrity of IoT communications. This design is inherently scalable, efficiently adapting to diverse security requirements across low to high-end IoT devices. Furthermore, leveraging the FPGA’s intrinsic support for parallel processing and high-speed operations, our solution aligns with IoT constraints such as limited computational resources, power efficiency, and minimal latency.
The rest of the paper is organized as follows:
Section 2 reviews related work on replay attack mitigation and FPGA-based security designs.
Section 3 details the methodology, including the implementation of Ascon AEAD on FPGA, with a focus on the nonce generation mechanism, followed by the integration of Bloom Filter-based replay attack detection and the design of the Bloom Filter using Ascon-XOF128 hashing.
Section 4 describes the FPGA implementation.
Section 5 presents the experimental results, including timing validation, permutation latency analysis, replay detection and hashing optimizations, hardware resource utilization, and a scalability analysis of the Bloom Filter-based replay detection mechanism.
Section 6 provides a combined discussion and conclusion, highlighting key findings and outlining potential directions for future work.
3. Methodology
This section details the implementation of Ascon-AEAD128 on FPGA [2], focusing on its architectural design, optimization strategies, and hardware resource management to ensure efficient encryption and authentication. An important addition to this implementation is the LFSR-based nonce generation scheme, which guarantees nonce uniqueness while maintaining a lightweight and efficient hardware footprint. Following this, we introduce our novel replay attack detection strategy using a Bloom Filter, offering a probabilistic, memory-efficient alternative to conventional nonce-tracking mechanisms. The integration of Ascon-XOF128 hashing with the Bloom Filter transforms nonces into secure hash mappings, eliminating the need for direct nonce storage. We discuss the system-level integration of Ascon encryption and decryption along with LFSR-based nonce generation and Bloom Filter-based replay detection, emphasizing parallel processing, BRAM utilization, and FPGA resource efficiency to achieve real-time attack detection in constrained IoT environments.
3.1. Ascon AEAD Implementation on FPGA
Ascon operates on a 320-bit state, which is updated using two types of permutations, denoted as $p^a$ ($a$ rounds) and $p^b$ ($b$ rounds). This 320-bit state $S$ is divided into two components: an outer part $S_r$ of $r$ bits (rate) and an inner part $S_c$ of $c$ bits (capacity), where the values of $r$ and $c = 320 - r$ vary depending on the specific Ascon variant. For this implementation, we adopt Ascon-AEAD128, where $r$ equals 128. To facilitate the definition and application of round transformations, the 320-bit state $S$ is further divided into five 64-bit words denoted as $x_0$, $x_1$, $x_2$, $x_3$, $x_4$. This allows for efficient processing and manipulation of the cipher state throughout the encryption and decryption operations.
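The partition described above can be sketched in a few lines of Python. This is only an illustration: treating the state as one 320-bit integer with the first word in the most significant position is an assumption made here for readability, and the helper names are hypothetical rather than taken from the implementation.

```python
from typing import List, Tuple

STATE_BITS = 320
RATE_BITS = 128                           # r for Ascon-AEAD128, as stated in the text
CAPACITY_BITS = STATE_BITS - RATE_BITS    # c = 320 - r
WORD_MASK = (1 << 64) - 1


def join_words(words: List[int]) -> int:
    """Pack five 64-bit words into one 320-bit integer (first word on top)."""
    state = 0
    for word in words:
        state = (state << 64) | (word & WORD_MASK)
    return state


def split_words(state: int) -> List[int]:
    """Recover the five 64-bit words from the 320-bit state."""
    return [(state >> (64 * (4 - i))) & WORD_MASK for i in range(5)]


def split_rate_capacity(state: int) -> Tuple[int, int]:
    """Return the outer (rate) part and the inner (capacity) part."""
    outer = state >> CAPACITY_BITS
    inner = state & ((1 << CAPACITY_BITS) - 1)
    return outer, inner
```

With $r = 128$, the outer part covers exactly the first two words and the inner part the remaining three, which is why the rate can be read or written as two 64-bit registers in hardware.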
3.1.1. Nonce Generation Mechanism
The nonce generation mechanism in this implementation utilizes a 128-bit LFSR to produce a unique nonce for each encryption session. The LFSR is initialized with a fixed seed to establish a complex starting state and is updated using a tapped feedback polynomial, where the new input bit is computed as the XOR of selected tap positions from the current register state. This structure ensures sufficient diffusion and enables a long pseudo-random sequence before repeating.
To maximize hardware efficiency and reduce latency, a one-to-many feedback configuration is employed, in which a single feedback bit is distributed to multiple tap positions. This configuration allows the logic to be implemented in just two levels, minimizing critical path delay compared to conventional many-to-one feedback designs. It is particularly advantageous for high-throughput or resource-constrained environments. An illustration of this structure is shown in Figure 1.
During each clock cycle, when enabled, the LFSR shifts its state by one bit while applying the feedback logic. This produces a distinct 128-bit value for each encryption operation without requiring external randomness or counters. The design prioritizes nonce uniqueness, which is critical for ensuring authenticated encryption and preventing replay attacks. As each encryption operation uses a new LFSR state, the resulting nonce stream provides a long non-repeating sequence suitable for preventing reuse over the device’s operational lifetime. In AEAD schemes like Ascon, the nonce is not a secret but must be unique for each encryption operation to ensure security. This design goal aligns with the use of an LFSR, which provides deterministic, non-repeating nonce values with minimal hardware overhead.
While LFSRs are not cryptographically secure random number generators, they are lightweight and efficient for ensuring nonce uniqueness in hardware. In this design, the LFSR is used strictly to guarantee distinct nonces across encryption operations. Potential risks from seed reuse or LFSR periodicity are acknowledged and can be mitigated by initializing the seed from device-specific constants or startup entropy sources.
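A software model can make the one-to-many (Galois-style) update concrete. The sketch below is an assumption-laden illustration, not the paper's circuit: the feedback polynomial $x^{128} + x^{126} + x^{101} + x^{99} + 1$ is a tap set recalled from published LFSR tables and believed to be maximal-length, the seed is arbitrary, and the class name is hypothetical. The actual taps and seed of the hardware design are not given in the text.

```python
NONCE_BITS = 128
STATE_MASK = (1 << NONCE_BITS) - 1

# Assumed tap positions (not from the paper): x^128 + x^126 + x^101 + x^99 + 1.
TAPS = (128, 126, 101, 99)
FEEDBACK_MASK = sum(1 << (tap - 1) for tap in TAPS)


class LfsrNonceGenerator:
    """128-bit LFSR in one-to-many form: one feedback bit drives several taps."""

    def __init__(self, seed: int):
        seed &= STATE_MASK
        if seed == 0:
            # The all-zero state maps to itself, so it must never be used.
            raise ValueError("seed must be non-zero")
        self.state = seed

    def step(self) -> int:
        feedback = self.state & 1        # the single feedback bit
        self.state >>= 1                 # shift by one position
        if feedback:
            self.state ^= FEEDBACK_MASK  # distribute it to all tap positions
        return self.state

    def next_nonce(self) -> bytes:
        """Advance the register once and return the state as a 16-byte nonce."""
        return self.step().to_bytes(NONCE_BITS // 8, "big")
```

Because each step is an invertible map on the 128-bit state, a non-zero seed can never reach the all-zero state, and values repeat only after a full cycle. Deriving the seed from a device-specific constant, as suggested above, keeps two devices from walking the same sequence.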
3.1.2. Authenticated Encryption and Verified Decryption
Ascon follows a sponge-based encryption approach, as shown in Figure 2, and is structured into four distinct phases: initialization, associated data processing, plaintext processing, and finalization. Its operational mode is inspired by duplex-based constructions such as MonkeyDuplex [20], but enhances security by using a stronger keyed initialization and keyed finalization function.
The 320-bit initial state of Ascon is formed by the secret key $K$ of 128 bits, the nonce $N$ of 128 bits generated using the method elaborated in the previous section, and the Initialization Vector ($IV$), assigned to $S$ as $S \leftarrow IV \,\|\, K \,\|\, N$. After completing all processing stages, it produces a ciphertext $C$ of the same length as plaintext $P$ and a 128-bit authentication tag $T$:
$$\mathcal{E}(K, N, A, P) = (C, T)$$
Here, $A$ denotes the associated data. A twelve-round permutation, $p^a$, is first applied to the concatenated input $IV \,\|\, K \,\|\, N$, ensuring strong diffusion and secure mixing of the key and nonce. During the associated data absorption phase, each associated data block is XORed into the rate portion of the state, followed by an eight-round permutation, $p^b$, after each block.
The plaintext processing phase absorbs the message by XORing each plaintext block into the state’s rate portion, producing the ciphertext while applying the eight-round permutation $p^b$ after each block. During the finalization phase, the key is reintegrated into the state, followed by another twelve-round permutation, $p^a$. The ciphertext $C$ is thus read from the rate portion as each plaintext block is processed, whereas the authentication tag $T$ is extracted from the last 128 bits of the state, which lie in the capacity portion, after finalization. This tag ensures message integrity, preventing unauthorized modifications or forgeries.
Similarly, as shown in Figure 3, the decryption function begins by initializing the 320-bit internal state as $S \leftarrow IV \,\|\, K \,\|\, N$, where $IV$ is a fixed constant, $K$ is the shared secret key, and $N$ is the nonce received alongside the ciphertext. The associated data $A$, ciphertext $C$, and authentication tag $T$ are also provided as inputs for verification and decryption.
The decryption process mirrors the encryption phases. The associated data is first absorbed into the state using XOR operations over the rate portion, with an 8-round permutation, $p^b$, applied after each block. Following that, the ciphertext blocks are processed to recover the plaintext $P$, again using XOR operations, interleaved with $p^b$ permutations after each block.
In the finalization phase, the key $K$ is XORed back into the state, and a 12-round permutation, $p^a$, is applied. The resulting state is then used to generate a recomputed authentication tag, which is compared against the received tag, $T$. If the tags match, the decryption is deemed successful, and the original plaintext $P$ is returned. If the tags differ, the ciphertext is considered unauthenticated, and the decryption process fails, preventing the release of invalid or potentially tampered data.
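The four phases and the verified-decryption rule can be traced in a compact software model. The following sketch is illustrative only and rests on several assumptions: it uses big-endian byte ordering and the initialization vector of the earlier Ascon-128a submission, the round constants and rotation amounts are written from the Ascon specification as recalled here, and the output has not been checked against official Ascon-AEAD128 test vectors, so it should not be expected to interoperate with the standardized scheme. All function names are hypothetical.

```python
import hmac

MASK64 = (1 << 64) - 1
ROUND_CONSTANTS = (0xF0, 0xE1, 0xD2, 0xC3, 0xB4, 0xA5,
                   0x96, 0x87, 0x78, 0x69, 0x5A, 0x4B)
ROTATIONS = ((19, 28), (61, 39), (1, 6), (10, 17), (7, 41))
RATE = 16                    # bytes, i.e. r = 128 bits
IV = 0x80800C0800000000      # assumed Ascon-128a-style IV (see the note above)


def _ror(w, n):
    return ((w >> n) | (w << (64 - n))) & MASK64


def permute(x, rounds):
    """Apply the last `rounds` of the twelve permutation rounds."""
    x = list(x)
    for c in ROUND_CONSTANTS[12 - rounds:]:
        x[2] ^= c                                          # round constant
        x[0] ^= x[4]; x[4] ^= x[3]; x[2] ^= x[1]           # bit-sliced S-box
        t = [(x[i] ^ MASK64) & x[(i + 1) % 5] for i in range(5)]
        x = [x[i] ^ t[(i + 1) % 5] for i in range(5)]
        x[1] ^= x[0]; x[0] ^= x[4]; x[3] ^= x[2]; x[2] ^= MASK64
        x = [w ^ _ror(w, a) ^ _ror(w, b) for w, (a, b) in zip(x, ROTATIONS)]
    return x


def _words(data):
    return [int.from_bytes(data[i:i + 8], "big") for i in range(0, len(data), 8)]


def _get_rate(s):
    return s[0].to_bytes(8, "big") + s[1].to_bytes(8, "big")


def _set_rate(s, block):
    s[0], s[1] = _words(block)


def _xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def _pad(data):
    # Append a single 1 bit and zeros up to the next multiple of the rate.
    return data + b"\x80" + bytes(RATE - 1 - len(data) % RATE)


def _process(key, nonce, ad, data, decrypting):
    k0, k1 = _words(key)
    s = permute([IV, k0, k1] + _words(nonce), 12)          # initialization
    s[3] ^= k0; s[4] ^= k1
    if ad:                                                 # associated data
        padded = _pad(ad)
        for i in range(0, len(padded), RATE):
            _set_rate(s, _xor(_get_rate(s), padded[i:i + RATE]))
            s = permute(s, 8)
    s[4] ^= 1                                              # domain separation
    out = b""
    full = len(data) - len(data) % RATE
    for i in range(0, full, RATE):                         # full blocks
        block = _xor(_get_rate(s), data[i:i + RATE])
        out += block
        # After either direction the rate holds the ciphertext block.
        _set_rate(s, data[i:i + RATE] if decrypting else block)
        s = permute(s, 8)
    tail = _xor(_get_rate(s), data[full:])                 # last partial block
    out += tail
    plain_tail = tail if decrypting else data[full:]
    _set_rate(s, _xor(_get_rate(s), _pad(plain_tail)))
    s[2] ^= k0; s[3] ^= k1                                 # finalization
    s = permute(s, 12)
    tag = (s[3] ^ k0).to_bytes(8, "big") + (s[4] ^ k1).to_bytes(8, "big")
    return out, tag


def encrypt(key, nonce, ad, plaintext):
    return _process(key, nonce, ad, plaintext, decrypting=False)


def decrypt(key, nonce, ad, ciphertext, tag):
    plaintext, expected = _process(key, nonce, ad, ciphertext, decrypting=True)
    # Release the plaintext only when the recomputed tag matches.
    return plaintext if hmac.compare_digest(expected, tag) else None
```

Calling `decrypt` with a modified ciphertext, nonce, or associated data returns `None`, which mirrors the rule that unauthenticated data is never released. Note that replaying an unmodified `(nonce, ciphertext, tag)` triple still succeeds here, which is the gap the replay detection stage addresses.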
3.2. Replay Attack Detection Using Bloom Filters
Bloom Filters are space- and time-efficient probabilistic data structures that enable fast membership checks with significantly reduced memory requirements [3]. While the standard Ascon specification does not include any built-in mechanism for replay protection, our system extends its security by integrating a Bloom Filter that probabilistically determines whether a given nonce has likely been seen before, without storing each nonce individually. This eliminates the need for full-length nonce comparisons. This approach offers substantial memory savings, requiring only a fraction of the space used by traditional error-free hashing methods, while maintaining high detection accuracy. In our design, the Bloom Filter integrates with the Ascon-based decryption pipeline, offering scalable and efficient replay detection suitable for constrained hardware environments. Building on this foundation, we now present the underlying architecture, design choices, and operational flow of the Bloom Filter within the replay detection system, highlighting how it complements the Ascon decryption process in both functionality and efficiency.
3.2.1. Framework of Bloom Filter
Let $S = \{x_1, x_2, \ldots, x_n\}$ be a subset of a universal set $U$, containing $n$ elements. A Bloom Filter represents these elements using a bit vector of length $m$, with all bits initially set to zero. To include an element $x$ in $S$, $k$ distinct hash functions, $h_1, h_2, \ldots, h_k$, are used to assign $x$ to $k$ specific positions $h_1(x), h_2(x), \ldots, h_k(x)$ within the bit vector, where each $h_i(x)$ falls within the range $[0, m - 1]$. The bits at these positions in the vector are then set to 1. To check if a given element is part of set $S$, the element is hashed to the bit vector using the same $k$ hash functions, and the bits at the corresponding positions are examined. If any of these bits is 0, the Bloom Filter determines that the element is not part of $S$; if all are 1s, the Bloom Filter suggests that the element might be in $S$. Importantly, it guarantees no false negatives, meaning any element reported as “not present” is definitely not in the set.
Figure 4 shows an example of a Bloom Filter with filter size $m = 12$ bits and $k = 3$ hash functions, used to represent a set $S = \{x_1, x_2, x_3\}$. The 12-bit vector is initialized to all zeros. Upon inserting elements, specific bits corresponding to each element are set to 1, as determined by the hash functions.
For $x_1$, suppose the hash functions determine the positions 1, 4, and 7. These bits are set to 1 in the vector.
For $x_2$, the hash functions map it to positions 2, 4, and 9. Note the shared position 4 with $x_1$, showcasing a hash collision.
For $x_3$, let us say the bits at positions 0, 3, and 11 are set to 1.
Now, when querying the set,
A query for $x_1$ checks bits at positions 1, 4, and 7. Since all these bits are 1, the Bloom Filter returns “Positive”, correctly indicating $x_1$’s membership.
A query for an element $y$, which is not part of set $S$, might check bits at positions 2, 5, and 8. Since the bit at position 5 is 0 (assuming no previous element has affected this bit), the Bloom Filter returns “Negative”, correctly indicating that $y$ is not in the set.
However, the possibility of false positives arises:
Suppose a query for $z$ (not in $S$) maps to positions 1, 9, and 11. All these positions have bits set to 1 due to the insertion of $x_1$, $x_2$, and $x_3$. The Bloom Filter would incorrectly return “Positive”, suggesting $z$ is a member of $S$ despite it not being true.
This example illustrates the inherent risk of false positives in Bloom Filters due to hash conflicts, where different input elements result in the same hash values affecting the same bits in the vector. Accordingly, it is essential to adopt a well-balanced configuration of Bloom Filter parameters, specifically the bit array size and the number of hash functions, to optimize the tradeoffs among memory efficiency, computational speed, and detection accuracy. While the Bloom Filter is not inherently cryptographic and has been criticized for vulnerabilities in uncontrolled environments due to its susceptibility to false positives and pollution attacks [21], it remains highly effective when deployed as an auxiliary mechanism in controlled systems [22]. In our implementation, we overcome these limitations by incorporating Ascon-XOF128, a lightweight, post-quantum secure hash function, ensuring that the indices generated for Bloom Filter updates are tamper-resistant and difficult to predict. This combination provides a robust and efficient replay attack detection scheme suited for resource-constrained IoT environments.
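The 12-bit example above can be replayed directly in code. In this sketch the hash functions are replaced by a fixed lookup table holding the positions quoted in the example, and the labels `x1`, `x2`, `x3`, `y`, and `z` are hypothetical names for the three inserted elements and the two queried non-members.

```python
M = 12  # filter size in bits, as in the example

# Positions that the three hash functions are supposed to yield for each element.
POSITIONS = {
    "x1": (1, 4, 7),
    "x2": (2, 4, 9),
    "x3": (0, 3, 11),
    "y": (2, 5, 8),     # not inserted
    "z": (1, 9, 11),    # not inserted
}


def insert(bits, element):
    """Set the bits selected by the hash functions."""
    for position in POSITIONS[element]:
        bits[position] = 1


def query(bits, element):
    """Return True ("Positive") only if every selected bit is set."""
    return all(bits[position] for position in POSITIONS[element])


bits = [0] * M
for member in ("x1", "x2", "x3"):
    insert(bits, member)

print(bits)               # [1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
print(query(bits, "x1"))  # True: a genuine member
print(query(bits, "y"))   # False: bit 5 is still 0
print(query(bits, "z"))   # True: a false positive
```

Eight of the twelve bits end up set after only three insertions, which shows how quickly a small filter saturates and why the bit array size and the number of hash functions have to be chosen together.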
3.2.2. Implementation of Bloom Filter-Based Replay Protection for Ascon on FPGA
Despite Ascon’s adoption as a lightweight AEAD standard, existing FPGA implementations focus primarily on encryption efficiency, energy optimization, and side-channel resistance, with no prior work explicitly addressing replay attack mitigation in Ascon-based cryptographic systems. Traditional approaches rely on explicit nonce tracking, protocol-layer defenses, or storage-heavy mechanisms, all of which impose significant memory and computational overhead, making them unsuitable for resource-constrained FPGA-based IoT applications. To bridge this gap, we propose a novel, hardware-efficient replay detection mechanism that integrates Bloom Filters with Ascon-XOF128 hashing, providing a lightweight, scalable, and high-speed security enhancement for Ascon on FPGA.
Unlike previous methods that rely on persistent storage of every nonce, our design employs a Bloom Filter to efficiently track nonces, thereby eliminating the need to store each nonce explicitly. It reuses Ascon’s existing permutation modules for hashing, reducing hardware complexity and improving resource efficiency. By exploiting FPGA-level parallelism, the system achieves real-time, high-speed replay attack detection, in contrast to the latency overhead of software-based approaches. To the best of our knowledge, this work presents the first FPGA-based replay attack mitigation mechanism for Ascon, offering a novel, efficient, and scalable solution for securing IoT environments against replay attacks.
3.3. Bloom Filter Design and Setup
A Bloom Filter is a space-efficient probabilistic data structure that supports set membership queries while allowing false positives but no false negatives. The filter consists of an array of $m$ bits, initially set to zero, and utilizes $k$ independent hash functions to map each inserted element to $k$ positions in the bit array. The theoretical foundation of Bloom Filters, as detailed by Tarkoma et al. [23], enables a balance between memory efficiency and query performance.
The number of bits $m$ required for a given number of elements $n$ and a False Positive Rate $p$ is given by
$$m = -\frac{n \ln p}{(\ln 2)^2} \tag{4}$$
The probability of a false positive occurring after inserting $n$ elements into the Bloom Filter can be approximated as
$$p \approx \left(1 - e^{-kn/m}\right)^{k} \tag{5}$$
The likelihood of false positives in a Bloom Filter can be minimized by choosing appropriate values for the array size $m$ and the number of hash functions $k$. Increasing the number of hash functions reduces the FPR up to an optimal value $k_{opt}$ calculated using Equation (6), beyond which additional hash functions may degrade performance by setting too many bits in the filter.
$$k_{opt} = \frac{m}{n} \ln 2 \tag{6}$$
A higher $k$ also increases computational overhead, making it a trade-off between accuracy and efficiency. Similarly, expanding the filter size ($m$) lowers the FPR by providing more space for hash results, reducing unintended bit collisions. However, this comes at the cost of increased memory consumption, which may not be feasible in resource-constrained environments. Additionally, as the number of inserted elements grows, the probability of hash collisions rises, leading to a higher FPR [23].
In our implementation, the Bloom Filter uses 1 Megabit of on-chip BRAM, configured as a bit array of size $m = 1{,}048{,}576$ with $k = 10$ hash functions. This setup supports the tracking of approximately 100,000 nonces while maintaining an FPR below 1%. While the selected number of hash functions slightly exceeds the theoretical optimum for minimizing false positives in a Bloom Filter, this choice is both deliberate and justified within the context of our hardware architecture. In conventional Bloom Filters, increasing $k$ beyond the optimal point can lead to diminishing returns in accuracy and increased computational overhead due to the need for multiple independent hash computations.
However, we have addressed this concern through the use of Ascon-XOF128, a cryptographically secure, post-quantum resistant Extendable Output Function. It enables the generation of multiple pseudo-random outputs from a single absorbed input by incrementally squeezing the state, enabling the efficient generation of all ten Bloom Filter indices in a single hashing pass. This approach incurs minimal hardware overhead, as the hashing logic is lightweight and structurally aligned with the primary Ascon core. Furthermore, the cryptographic strength of Ascon-XOF128 guarantees high entropy and uniformity in its output distribution, reducing the likelihood of bit saturation and ensuring that Bloom Filter indices are well-dispersed, which is an essential property for maintaining a low FPR.
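The stated configuration can be checked numerically with the standard Bloom Filter approximations. The sketch below assumes the usual formulas for the required size, the optimal number of hash functions, and the false positive rate; the function names are hypothetical, and the figures are estimates from these approximations rather than measured results.

```python
import math


def required_bits(n, p):
    """Bits needed to hold n elements at target false positive rate p."""
    return math.ceil(-n * math.log(p) / math.log(2) ** 2)


def optimal_hashes(m, n):
    """Number of hash functions that minimizes the false positive rate."""
    return (m / n) * math.log(2)


def false_positive_rate(m, n, k):
    """Approximate false positive rate after inserting n elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


M = 1 << 20      # 1,048,576 bits of BRAM
N = 100_000      # tracked nonces
K = 10           # indices derived per nonce

print(required_bits(N, 0.01))         # minimum size for a 1% target
print(optimal_hashes(M, N))           # theoretical optimum for k
print(false_positive_rate(M, N, K))   # estimate for the chosen k
```

Under these approximations the optimum is a little above 7 hash functions, and the chosen $k = 10$ gives an estimated FPR of roughly 0.77%, compared with roughly 0.65% at $k = 7$. Both stay below the 1% target, so the additional indices cost little in accuracy.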
3.3.1. Integration of Ascon-XOF128 Hashing in Bloom Filter
Efficient hash function selection is crucial for optimizing Bloom Filter-based replay attack detection in IoT security applications. Integrating Ascon-XOF128 into Bloom Filter-based replay attack detection systems offers a balanced approach between security and performance, particularly in FPGA-based IoT environments. In this work, the replay attack detection system is designed as an extension of the existing Ascon core, making the integration of Ascon-XOF128 hashing both efficient and resource-conscious. Unlike implementations where hashing is treated as an independent module, here, Ascon-XOF128 is derived from the same permutation functions already present in the Ascon encryption-decryption core. This significantly reduces the hardware overhead. The diagram below illustrates the integration of Ascon-XOF128 as a lightweight hashing mechanism for Bloom Filter-based authentication within the extended Ascon core.
The 128-bit nonce, generated per message during encryption using a seeded LFSR, is transmitted to the decryption side and serves as the input to Ascon-XOF128, which consists of three main stages: initialization, absorbing the nonce $N$, and squeezing out the hashed nonce $H$, as shown in Figure 5. The Initialization Vector $IV$ is a fixed constant that encodes the Ascon-XOF128 parameters. Given the 128-bit nonce input $N$, the algorithm produces a 256-bit hash output structured as
$$H = H_0 \,\|\, H_1 \,\|\, H_2 \,\|\, H_3$$
where each $H_i$ is a 64-bit block squeezed from the state. This is partitioned into ten segments that are used as indices into the Bloom Filter for replay detection. By deriving all ten indices from a single, lightweight Ascon-XOF128 module, the design avoids redundant hashing operations, minimizes latency, and maintains a high level of resistance to collision-based attacks, all within the constraints of low-resource FPGA environments. This integrated approach demonstrates how post-quantum cryptographic primitives like Ascon-XOF128 can be repurposed to support efficient, hardware-friendly security enhancements beyond their traditional hashing roles.
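The index derivation can be modelled in software as follows. Two assumptions are made loudly here: SHAKE128 from the Python standard library stands in for Ascon-XOF128 purely for illustration, and the 256-bit output is cut into ten 20-bit segments because $2^{20}$ matches the filter size. The text does not state the exact segment width, and the class and function names are hypothetical.

```python
import hashlib

FILTER_BITS = 1 << 20    # m = 1,048,576
NUM_INDICES = 10         # ten indices per nonce
INDEX_BITS = 20          # assumed segment width (2**20 == FILTER_BITS)


def xof256(nonce: bytes) -> bytes:
    """Stand-in XOF: SHAKE128 replaces Ascon-XOF128 in this sketch."""
    return hashlib.shake_128(nonce).digest(32)


def bloom_indices(nonce: bytes):
    """Slice one 256-bit digest into ten filter indices."""
    digest = int.from_bytes(xof256(nonce), "big")
    mask = (1 << INDEX_BITS) - 1
    return [(digest >> (INDEX_BITS * i)) & mask for i in range(NUM_INDICES)]


class NonceBloomFilter:
    """Bit array packed eight bits per byte, similar to a BRAM image."""

    def __init__(self):
        self.bits = bytearray(FILTER_BITS // 8)

    def seen(self, nonce: bytes) -> bool:
        return all((self.bits[i >> 3] >> (i & 7)) & 1
                   for i in bloom_indices(nonce))

    def add(self, nonce: bytes) -> None:
        for i in bloom_indices(nonce):
            self.bits[i >> 3] |= 1 << (i & 7)
```

Only 200 of the 256 output bits are consumed under this assumed slicing. A single digest call yields every index, which is the software analogue of squeezing all indices from one hashing pass.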
3.3.2. Security Strength and On-Chip Efficiency
Traditional cryptographic hash functions like SHA-256, while secure, introduce significant computational overhead, making them impractical for real-time Bloom Filter operations [24]. In contrast, non-cryptographic hash functions like MurmurHash offer superior performance but lack the necessary security guarantees, making them vulnerable in authentication-based systems.
Ascon-XOF128 offers an efficient and lightweight hashing mechanism by using the existing permutation units of the Ascon core, eliminating the need for additional hardware [2]. This integration ensures both cryptographic strength and hardware efficiency. According to the security properties summarized in Table 2, Ascon-XOF128 achieves up to 128-bit security, defined as $\min(L/2, 128)$ bits for collision resistance and $\min(L, 128)$ bits for preimage and second preimage resistance, where $L$ is the output length. These properties are especially critical for the proposed replay detection system, which relies on Bloom Filter indexing based on hashed nonces.
In the implemented design, each 256-bit output of Ascon-XOF128 is divided into multiple segments that determine the Bloom Filter indices. A decryption session is flagged as a replay, and plaintext release is suppressed, even when tag verification passes, if all indices derived from the session’s nonce are already set in the Bloom Filter. This rule ensures accurate detection of repeated nonces, prioritizing security over false positive suppression. While this approach may lead to occasional false positives, where a fresh nonce is incorrectly flagged as reused due to the probabilistic nature of Bloom Filters, such occurrences are statistically bounded and tunable based on filter size and the number of hash segments.
An attacker aiming to exploit this mechanism would need to craft a nonce that, when processed by the XOF logic, maps to the same Bloom Filter indices as a previously stored one. However, due to the 128-bit preimage and collision resistance of Ascon-XOF128, the probability of generating such a spoofed nonce is computationally negligible. More importantly, even if such a nonce were crafted, the corresponding authentication tag would also need to match for the message to be accepted. Since tag generation depends on both the nonce and the secret key, any mismatch results in authentication failure. Thus, our system not only prevents straightforward replay attacks but also defeats advanced forgery-based Denial-of-Service (DoS) attempts, demonstrating strong resilience in adversarial conditions.
Furthermore, Ascon-XOF128’s extendable output, which allows generation of variable length hash outputs, makes it particularly well-suited for Bloom Filter indexing. The number and size of hash-derived indices can be flexibly adapted to meet target FPRs and hardware resource constraints, without the need for multiple independent hash functions. This simplifies hardware implementation, reduces logic duplication, and minimizes synchronization overhead. Combined with the use of on-chip BRAM for storing the Bloom Filter bit array, the proposed system enables high-speed lookup operations with minimal latency, avoiding external DRAM access and ensuring scalability for real-time, resource-constrained IoT deployments.
3.4. System Integration and Optimization
A core design objective is to integrate replay detection into the Ascon framework without introducing significant hardware or timing overhead. Rather than introducing an independent hashing function for the Bloom Filter indexing, this design utilizes the existing AEAD permutation logic to implement the Ascon-XOF128 hash, thereby minimizing design overhead. The system workflow consists of three primary stages: Ascon encryption, replay attack detection using a Bloom Filter driven by Ascon-XOF128 hashing, and Ascon decryption, as illustrated in Figure 6.
In the encryption path, a 128-bit nonce is generated using an LFSR-based mechanism. This ensures a unique nonce for each encryption cycle and maintains synchronization with the Ascon encryption core. The generated nonce, along with the plaintext, key, and associated data, is fed into the Ascon authenticated encryption module, which outputs the ciphertext and a 128-bit authentication tag.
For decryption with replay detection, the received nonce is hashed using Ascon-XOF128 to produce a variable-length output. This hash is segmented into multiple parts, each of which is used to compute an index for Bloom Filter lookup. If all the derived indices are already set in the Bloom Filter, the nonce is flagged as a potential replay. Otherwise, once the authentication tag has been verified, a logic 1 is written to each of the computed indices in the Bloom Filter to mark the nonce as seen. The ciphertext is decrypted in parallel using the same nonce, key, and associated data to recompute the authentication tag. The decrypted plaintext is released only if the authentication tag is valid and the nonce passes the replay detection check.
Thus, the proposed architecture builds on the inherent security guarantees of the Ascon-AEAD128 scheme, which ensures message integrity and authenticity through tag verification. Any tampering with the ciphertext, nonce, or associated data results in a tag mismatch, causing authentication to fail and preventing plaintext release. This behavior is preserved in the current design, which enforces tag verification alongside a nonce freshness check using a Bloom Filter, establishing a robust dual-layer security mechanism against message forgery and replay attacks.
To maintain hardware efficiency, the design integrates replay detection into the existing Ascon-based system without introducing dedicated hashing modules or redundant logic. The Bloom Filter implementation achieves an FPR below 1%, enabling reliable nonce tracking with minimal memory overhead. To mitigate the risk of DoS attacks resulting from early message rejection, decryption is initiated in parallel with nonce verification. Importantly, only authenticated messages are allowed to update the Bloom Filter, inherently preventing nonce flooding attacks from poisoning its state. Decrypted plaintext is released only after both authentication and freshness conditions are confirmed. This tightly integrated architecture ensures low-latency operation and strong security guarantees while remaining resource-efficient, making it well-suited for deployment on low-range FPGAs and scalable to more complex platforms.
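The release rule described in this subsection reduces to a small decision function. The sketch below is a software illustration under stated assumptions: the tags and the Bloom Filter indices are taken as already computed, the filter is modelled as a `bytearray` holding one flag per entry, and the function name is hypothetical. The hardware evaluates both checks in parallel, whereas this model simply evaluates them in sequence.

```python
import hmac
from typing import List, Optional


def release_plaintext(plaintext: bytes,
                      received_tag: bytes,
                      recomputed_tag: bytes,
                      indices: List[int],
                      filter_flags: bytearray) -> Optional[bytes]:
    """Return the plaintext only for an authentic, previously unseen nonce."""
    tag_ok = hmac.compare_digest(received_tag, recomputed_tag)
    replayed = all(filter_flags[i] for i in indices)
    if not tag_ok or replayed:
        # Nothing is released and the filter stays untouched, so forged
        # messages cannot fill it with spurious entries.
        return None
    for i in indices:
        filter_flags[i] = 1   # mark the nonce as seen
    return plaintext
```

Because the filter is written only on the success path, a flood of messages with invalid tags leaves its state unchanged, which is the property the text relies on to resist nonce flooding.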
4. FPGA Implementation
We implemented the proposed system on a Xilinx Arty A7-100T FPGA board, as shown in Figure 7. The design features a post-quantum-ready authenticated encryption core with hardware-level replay attack detection using nonce tracking and Bloom Filter logic. Optimization for both hardware efficiency and performance is achieved by employing lightweight cryptographic primitives, bit-sliced computation techniques, and parallel processing wherever applicable.
The Ascon core initializes a 320-bit state comprising the secret key, the LFSR-generated nonce, and an Initialization Vector that encodes algorithm parameters such as the version identifier, the number of rounds, the rate, and mode-specific constants. This state serves as the starting point for the permutation-based sponge construction. The permutations apply a round-based transformation in an iterative manner, where each round follows a Substitution–Permutation Network (SPN) structure composed of three steps: the addition of round constants $p_C$, a non-linear substitution layer $p_S$, and a linear diffusion layer $p_L$. Equation (1) describes the 320-bit state on which the round transformations are applied. The Finite State Machine (FSM) manages the transitions between initialization, associated data processing, plaintext/ciphertext processing, and finalization while maintaining correct control signals in the encryption steps.
The SPN structure is a self-contained algorithm that performs essential cryptographic operations such as non-linear substitution, mixing, and diffusion, making Ascon a strong candidate for secure applications in the post-quantum era. As shown in Table 3, the number of permutation rounds differs between the AEAD mode and the hashing mode, with the hashing variant requiring twelve rounds during absorption and squeezing to meet higher diffusion and uniformity requirements. In this work, the permutation core supports two hardware variations that differ in how many rounds are executed per clock cycle, enabling performance optimization based on the operation type. These variations are evaluated in the context of encryption, decryption, and nonce hashing for replay detection. Their impact on latency and hardware resource utilization is discussed in Section 5.
In the round constant addition step ($p_C$), a predefined constant $c_r$, as shown in Table 4, is XORed into the third 64-bit word $x_2$ of the 320-bit internal state S during each round. Here, i denotes the current round number (starting from 0), and r is the index used to select the appropriate round constant. For the 12-round permutation $p^a$, used in Ascon-128 and Ascon-XOF128 Hash, the constant is selected using $r = i$. For the reduced-round permutation $p^b$, applied during absorption and squeezing in Ascon-128 AEAD, the round constant index is calculated as $r = i + a - b$, where $a = 12$ and $b = 8$.
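The constant selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' HDL: the function names are hypothetical, the constant values follow the published Ascon specification and are assumed to match Table 4, and the state is modeled as a list of five 64-bit integers with the third word at index 2.

```python
A_ROUNDS = 12  # rounds in the full permutation (a)
B_ROUNDS = 8   # rounds in the reduced permutation (b)


def round_constant(r: int) -> int:
    """Return the 8-bit constant for constant index r in 0..11."""
    if not 0 <= r < A_ROUNDS:
        raise ValueError("constant index out of range")
    # High nibble counts down from 0xF, low nibble counts up from 0x0.
    return ((0xF - r) << 4) | r


def constant_index(i: int, rounds: int) -> int:
    """Map round number i (starting from 0) of a `rounds`-round permutation to r."""
    if not 0 <= i < rounds <= A_ROUNDS:
        raise ValueError("round number out of range")
    # r = i for 12 rounds; r = i + a - b for the reduced permutation.
    return i + A_ROUNDS - rounds


def add_round_constant(state: list[int], i: int, rounds: int) -> None:
    """XOR the selected constant into the third 64-bit word of the state."""
    state[2] ^= round_constant(constant_index(i, rounds))
```

With these definitions, the first round of the 8-round permutation uses constant index 4, so the reduced permutation simply reuses the last eight constants of the 12-round schedule.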
The substitution layer, $p_S$, which is the S-box transformation layer, utilizes a 5-bit S-box applied in a bit-sliced manner across the entire state. This S-box, detailed in Table 5, defines the core non-linear transformation.
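To make the bit-sliced evaluation concrete, the following Python sketch applies the S-box to all 64 bit-columns of the state at once using only XOR, AND, and NOT on whole words. The operation sequence follows the published Ascon reference description and is assumed to be equivalent to Table 5; the function names are hypothetical, and the helper `sbox` exists only to read back single 5-bit entries for checking.

```python
MASK64 = (1 << 64) - 1


def substitution_layer(x: list[int]) -> list[int]:
    """Apply the 5-bit S-box to every bit-column of the five 64-bit words."""
    s = list(x)
    s[0] ^= s[4]
    s[4] ^= s[3]
    s[2] ^= s[1]
    # Non-linear step: each word is combined with (NOT next) AND (next-but-one).
    t = [(~s[i] & MASK64) & s[(i + 1) % 5] for i in range(5)]
    s = [s[i] ^ t[(i + 1) % 5] for i in range(5)]
    s[1] ^= s[0]
    s[0] ^= s[4]
    s[3] ^= s[2]
    s[2] ^= MASK64  # bitwise complement of the third word
    return s


def sbox(v: int) -> int:
    """Evaluate the S-box on one 5-bit value via a single bit-column (word 0 is the MSB)."""
    column = [(v >> (4 - i)) & 1 for i in range(5)]
    out = substitution_layer(column)
    return sum((out[i] & 1) << (4 - i) for i in range(5))
```

Because every operation acts on full words, one pass computes 64 S-box lookups in parallel, which is what makes this form attractive for a purely combinational hardware round.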
The linear diffusion layer $p_L$ ensures diffusion within each 64-bit register word $x_i$, significantly increasing the avalanche effect. It applies a linear function, as specified in Equation (7):
$$x_i \leftarrow x_i \oplus (x_i \ggg \rho_{i,1}) \oplus (x_i \ggg \rho_{i,2}), \quad i = 0, \ldots, 4 \qquad (7)$$
where the symbol ⋙ denotes a right rotation operation, and the rotation constants $\rho_{i,1}$ and $\rho_{i,2}$ for each 64-bit word $x_i$ are defined in Table 6.
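A short Python sketch of the linear diffusion layer is given below. The rotation amounts are those of the published Ascon specification and are assumed to match Table 6; the names `ROTATIONS`, `rotr64`, and `linear_layer` are illustrative only.

```python
MASK64 = (1 << 64) - 1

# Pair of right-rotation amounts for each of the five state words
# (values from the Ascon specification, assumed to match Table 6).
ROTATIONS = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]


def rotr64(word: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits."""
    return ((word >> n) | (word << (64 - n))) & MASK64


def linear_layer(x: list[int]) -> list[int]:
    """XOR each word with two rotated copies of itself."""
    return [w ^ rotr64(w, r1) ^ rotr64(w, r2) for w, (r1, r2) in zip(x, ROTATIONS)]
```

Each word is processed independently, so in hardware this layer reduces to fixed wiring plus two XOR levels per bit.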
In the proposed system, a 128-bit nonce is generated using an LFSR before each encryption operation. Both the plaintext and the associated data are processed in 128-bit blocks, consistent with the Ascon-128 specification. After the final permutation, the ciphertext is extracted from the state, and a 128-bit authentication tag is derived to ensure message authenticity and integrity.
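As an illustration of LFSR-based nonce generation, the sketch below models a 128-bit Fibonacci LFSR in Python. It rests on several assumptions: the tap set (128, 126, 101, 99) is taken from commonly published LFSR tap tables and is not stated in this section, the register is clocked 128 times between nonces, and the class and method names are hypothetical.

```python
class Lfsr128:
    """Software model of a 128-bit Fibonacci LFSR used as a nonce source."""

    TAPS = (128, 126, 101, 99)  # assumed tap positions (1-indexed), illustrative only
    MASK = (1 << 128) - 1

    def __init__(self, seed: int):
        if seed & self.MASK == 0:
            raise ValueError("seed must be non-zero, or the register stays at zero")
        self.state = seed & self.MASK

    def step(self) -> None:
        """Advance by one clock: XOR the tapped bits and shift the result in."""
        feedback = 0
        for tap in self.TAPS:
            feedback ^= (self.state >> (tap - 1)) & 1
        self.state = ((self.state << 1) | feedback) & self.MASK

    def next_nonce(self) -> bytes:
        """Clock the register 128 times and return the state as 16 bytes."""
        for _ in range(128):
            self.step()
        return self.state.to_bytes(16, "big")
```

For example, `Lfsr128(seed=1).next_nonce()` returns a 16-byte value that can be fed directly to the encryption core model; the seed must differ from zero because the all-zero state is a fixed point of the shift register.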
In the decryption phase, the received ciphertext and associated data are processed in 128-bit blocks, consistent with the encryption procedure. Simultaneously, the 128-bit nonce associated with the message is passed through the Ascon-XOF128 hashing module to generate a 256-bit digest. This digest is segmented into ten indices, each of which is used to query a Bloom Filter stored in on-chip BRAM for replay detection. While decryption begins in parallel, plaintext is withheld unless the authentication tag is successfully verified and the nonce is confirmed to be fresh. If all ten indices derived from the nonce are set in the Bloom Filter, the message is flagged as a potential replay, and the plaintext is securely discarded even if tag verification succeeds. Otherwise, the nonce is inserted into the Bloom Filter for future comparison, and the authenticated plaintext is released.
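The acceptance rule described above can be modeled compactly in Python. This is a behavioral sketch under stated assumptions, not the hardware implementation: `hashlib.shake_128` stands in for Ascon-XOF128 (which is not in the Python standard library), the ten indices are assumed to be consecutive 20-bit slices of the 256-bit digest (matching a 2^20-bit filter), and the names `nonce_indices` and `ReplayFilter` are hypothetical.

```python
import hashlib

NUM_INDICES = 10
INDEX_BITS = 20                 # assumed: 2**20 bits = 1 Mbit filter
FILTER_BITS = 1 << INDEX_BITS


def nonce_indices(nonce: bytes) -> list[int]:
    """Hash the nonce to 256 bits and slice the digest into ten filter indices."""
    digest = hashlib.shake_128(nonce).digest(32)  # stand-in for Ascon-XOF128
    value = int.from_bytes(digest, "big")
    mask = FILTER_BITS - 1
    return [(value >> (INDEX_BITS * j)) & mask for j in range(NUM_INDICES)]


class ReplayFilter:
    """Bloom filter that records nonces of authenticated messages only."""

    def __init__(self) -> None:
        self.bits = bytearray(FILTER_BITS // 8)

    def _get(self, index: int) -> int:
        return (self.bits[index >> 3] >> (index & 7)) & 1

    def _set(self, index: int) -> None:
        self.bits[index >> 3] |= 1 << (index & 7)

    def accept(self, nonce: bytes, tag_valid: bool) -> bool:
        """Return True (release plaintext) only for a valid tag and a fresh nonce."""
        indices = nonce_indices(nonce)
        replay = all(self._get(i) for i in indices)  # all ten bits set: possible replay
        if tag_valid and not replay:
            for i in indices:
                self._set(i)  # record the nonce only after authentication
            return True
        return False
```

Calling `accept` twice with the same nonce and a valid tag returns `True` and then `False`, while a call with `tag_valid=False` leaves the filter unchanged, mirroring the rule that unauthenticated traffic cannot populate the filter.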
Figure 8 illustrates the FSM that governs the replay detection process using a Bloom Filter. The system begins in the IDLE state, waiting for a hash_start signal indicating that a new nonce is available for evaluation. Upon receiving this signal, it transitions to the START_HASH state, where Ascon-XOF128 hashing is triggered on the received 128-bit nonce. The FSM then enters the WAIT_HASH_DONE state, holding until XOF hashing completes. Once the digest is ready, the system proceeds to GEN_INDICES, where the 256-bit output of Ascon-XOF128 is segmented into ten index values corresponding to positions in the Bloom Filter. In the BF_CHECK state, the system sequentially reads BRAM at the ten derived indices and collects the corresponding bit values. These bits are evaluated in the REPLAY_EVAL state using a logical AND operation. If all bits are set to 1, the nonce is considered previously seen, and a replay is flagged; otherwise, the nonce is considered fresh.
If the tag associated with the message is valid and no replay is detected, the FSM transitions to the BF_UPDATE state, where a 1 is written to each of the ten Bloom Filter indices to record the nonce as seen. This update occurs only for fresh, authenticated messages, ensuring that replayed or unauthenticated inputs do not contaminate the filter. This update process is non-blocking and occurs in the background, as the system does not initiate the next replay check until nonce hashing for the subsequent message has completed. As measured in simulation, the Bloom Filter update process completes significantly faster than the nonce hashing operation. This temporal gap ensures that all filter updates safely complete before the next replay check begins, even when processed in the background. Finally, the system enters the DONE state, where the bf_done signal is asserted and the replay result computed prior to the update phase is communicated to the top-level controller before returning to IDLE. This standalone FSM enables low-latency, parallel replay detection by efficiently reusing the existing Ascon permutation core and performing all checks entirely in hardware, ensuring that only messages with fresh nonces are permitted for further processing.
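The state sequence of the replay detection FSM can be summarized as a next-state function. The Python sketch below uses the state names from the text; the function name, the keyword arguments, and the simplification that BF_CHECK and BF_UPDATE each take a single step (rather than ten sequential BRAM accesses) are assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    START_HASH = auto()
    WAIT_HASH_DONE = auto()
    GEN_INDICES = auto()
    BF_CHECK = auto()
    REPLAY_EVAL = auto()
    BF_UPDATE = auto()
    DONE = auto()


def next_state(state: State, *, hash_start: bool = False, hash_done: bool = False,
               tag_valid: bool = False, replay: bool = False) -> State:
    """Return the state that follows `state` for the given input signals."""
    if state is State.IDLE:
        return State.START_HASH if hash_start else State.IDLE
    if state is State.START_HASH:
        return State.WAIT_HASH_DONE
    if state is State.WAIT_HASH_DONE:
        return State.GEN_INDICES if hash_done else State.WAIT_HASH_DONE
    if state is State.GEN_INDICES:
        return State.BF_CHECK
    if state is State.BF_CHECK:
        return State.REPLAY_EVAL
    if state is State.REPLAY_EVAL:
        # The filter is updated only for fresh, authenticated messages.
        return State.BF_UPDATE if (tag_valid and not replay) else State.DONE
    if state is State.BF_UPDATE:
        return State.DONE
    return State.IDLE  # DONE returns to IDLE
```

Stepping this function with `tag_valid=True, replay=False` visits BF_UPDATE before DONE, whereas a flagged replay or an invalid tag skips the update and goes straight to DONE.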
Although a traditional hash table is not implemented, the Bloom Filter combined with Ascon-XOF128 hashing provides equivalent functionality for tracking previously seen nonces in a lightweight and memory-efficient manner. The hashed indices serve as a compact representation for freshness checking without requiring explicit key-value storage. In practical IoT deployments, the nonce, ciphertext, and authentication tag can be transmitted over serial interfaces such as UART or SPI and parsed into 128-bit blocks by the hardware modules. This design ensures compatibility with standard IoT communication protocols while maintaining low area and latency overhead.
5. Results
This section evaluates the proposed FPGA-based Ascon architecture across several dimensions, including functional correctness, latency performance, and hardware efficiency. We begin with simulation-based validation of the replay detection mechanism, followed by a detailed analysis of permutation latency, timing synchronization, and Bloom Filter operations. The scalability and resource footprint of the system are assessed to demonstrate its suitability for low-power and high-throughput IoT deployments.
5.1. Experimental Validation
Figure 9 illustrates a simulation of the test sequence, conducted at a 100 MHz clock frequency, to validate the effectiveness of the proposed replay attack detection mechanism in lightweight IoT security applications. It includes two scenarios.
In scenario 1, a valid message is encrypted using a nonce freshly generated by an LFSR. Upon receiving the start signal, the FSM transitions from the IDLE state to WAIT_NONCE, where it waits for the LFSR to generate a valid nonce. Once available, the nonce is latched, encryption begins, and the FSM moves to WAIT_ENC. After encryption completes (enc_done), both decryption and hashing are triggered in parallel as the FSM progresses through WAIT_DEC and WAIT_XOF. After decryption (dec_done_pulse), the decrypted plaintext and computed authentication tag are buffered. Once hashing completes (hash_done_pulse), the FSM proceeds to BF_CHECK, where the Bloom Filter is checked for nonce freshness. In the DECISION state, the FSM evaluates both the tag match and the Bloom Filter output. Since the tag is valid and the nonce is fresh, authentication passes, and auth_valid_pulse is asserted. The authenticated plaintext is released immediately as the FSM transitions to the DONE state. Simultaneously, the system triggers the BF_UPDATE phase, where a 1 is written to each of the ten Bloom Filter indices to record the nonce as seen. This update is performed in the background and completes independently, without stalling the main control flow.
In scenario 2, the same LFSR-generated nonce from scenario 1 is intentionally reused to emulate a replay attack. After encryption, decryption and nonce hashing begin in parallel, as before. Although the authentication tag again matches, the Bloom Filter detects that the nonce has already been used. During the REPLAY_EVAL phase, this replay is flagged and latched internally (replay_detected_pulse). In the subsequent DECISION state, the FSM evaluates the tag match and the replay detection result. Since the nonce is no longer fresh, the FSM bypasses the Bloom Filter update phase and transitions directly to DONE without asserting auth_valid_pulse. As a result, the plaintext output is suppressed. This sequence confirms that both a valid authentication tag and a fresh nonce are required for successful message acceptance. Two independent FSMs manage decryption and nonce hashing in parallel, with the top-level controller ensuring that Bloom Filter-based replay detection is triggered immediately after hashing completes. This design minimizes authentication latency without compromising security.
Figure 10 shows the hardware behavior of the Ascon-based IoT authentication system during a replay detection test. This configuration is intended solely for verification and demonstration and does not reflect the operational setup, where detection signals are integrated with high-speed data handling modules. In the first execution (left), a unique nonce is used, the authentication passes, and the auth_valid LED (green) lights up, confirming the message integrity, while the dec_done LED (blue) lights up to indicate valid decryption. In the second execution (right), the same nonce is reused, triggering replay detection. The replay LED (red) lights up while the auth_valid and dec_done LEDs remain off. This hardware demonstration validates the system’s ability to detect replay attempts and ensure message freshness in real time, an essential feature for secure IoT authentication. These observed LED outcomes correspond directly to the internal FSM states and replay detection signals shown in Figure 9. Specifically, the auth_valid_pulse, dec_done_pulse, and replay_detected_pulse signals are routed to drive the green, blue, and red LEDs, respectively.
5.2. Permutation Latency Analysis
To evaluate the efficiency of the Ascon permutation under different configuration strategies, two designs were implemented and compared based on how permutation rounds are distributed across clock cycles. The one-round-per-clock-cycle (1RPCC) design achieves a balance between latency and resource usage by executing each round in a single cycle that integrates round constant addition, substitution, and linear diffusion. In contrast, the 2RPCC design offers the lowest latency by cascading two full rounds within a single cycle. This optimized version reduces the total cycle count while introducing a moderate increase in combinational logic complexity, a tradeoff that enables faster hashing performance [26].
Table 7 summarizes the latency for both 12-round and 8-round permutations under each configuration. The 2RPCC configuration achieves the lowest overall latency and is preferred for the Ascon-XOF128 hashing variant for the Bloom Filter, while the 1RPCC version provides a favorable trade-off between speed and resource utilization for authenticated encryption and decryption.
This distinction arises from different data handling characteristics of these operations: the Ascon-XOF128 module processes a fixed 128-bit nonce, which allows two permutation rounds to be unrolled and executed in a single clock cycle without timing closure issues. This makes the 2RPCC configuration highly effective for nonce hashing in replay detection. Conversely, encryption and decryption handle variable-length messages, such as 200-byte payloads, and must process many sequential 128-bit blocks. In such scenarios, the 1RPCC schedule is more suitable, as it supports higher clock frequencies, reduced resources, better timing closure, and consistent throughput for streaming multi-block messages. This makes it an optimal choice for scalable, high-throughput authenticated encryption on resource-constrained IoT-class FPGAs.
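The relationship between rounds per clock cycle and permutation latency can be expressed with a small Python helper. This is an idealized count that considers only the permutation rounds themselves; the figures reported in Table 7 may include additional control or register overhead that this sketch does not model, and the function names are hypothetical.

```python
import math


def permutation_cycles(rounds: int, rounds_per_cycle: int) -> int:
    """Ideal number of clock cycles to execute `rounds` permutation rounds."""
    return math.ceil(rounds / rounds_per_cycle)


def cycles_to_ns(cycles: int, clock_mhz: float = 100.0) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz
```

Under these assumptions, a 12-round permutation takes 12 cycles with 1RPCC and 6 cycles with 2RPCC, and an 8-round permutation takes 8 and 4 cycles, respectively, so unrolling halves the ideal cycle count at the cost of a longer combinational path.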
5.3. Replay Detection Timing and Hashing Optimization
Replay detection performance was evaluated by analyzing the latency of nonce hashing and Bloom Filter verification relative to the decryption path. In the proposed architecture, nonce hashing is accelerated using a 2RPCC permutation core. Rather than executing one permutation round per cycle as in a standard Ascon-XOF128 configuration, the 2RPCC module performs two rounds per cycle using fully combinational logic. This enables the permutations required for nonce hashing to complete in approximately 560 ns. The subsequent Bloom Filter check adds 250 ns. Due to FSM transitions and control logic overhead, the total measured replay detection latency is approximately 820 ns, slightly exceeding the analytical sum.
In the implemented design, Bloom Filter updates are non-blocking and proceed in parallel with the start of the next nonce hashing cycle. Once a replay check passes, the Bloom Filter begins updating while the system prepares for the next input. This overlap ensures that the update process does not introduce additional latency or stall the replay detection pipeline, allowing continuous operation without interrupting throughput. A detailed latency breakdown for all stages, including a comparison with the 1RPCC hashing configuration at 100 MHz clock, is provided in Table 8. Since the latency in clock cycles remains fixed, the absolute time increases proportionally at lower frequencies and decreases at higher ones. This frequency dependence allows system designers to tune clock rates according to power and performance needs, while maintaining the correctness and synchronization of the replay detection process.
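The frequency scaling argument can be checked numerically. The sketch below takes the latencies reported in this subsection at 100 MHz (560 ns hashing, 250 ns Bloom Filter check, 820 ns measured total), converts them to cycle counts, and rescales them to other clock frequencies; the function names and the example frequencies of 50 MHz and 200 MHz are illustrative assumptions.

```python
REF_CLOCK_MHZ = 100.0
HASH_NS = 560.0            # nonce hashing with the 2RPCC core
BF_CHECK_NS = 250.0        # Bloom Filter check
MEASURED_TOTAL_NS = 820.0  # measured replay detection latency


def to_cycles(latency_ns: float, clock_mhz: float = REF_CLOCK_MHZ) -> int:
    """Convert a latency measured at `clock_mhz` into clock cycles."""
    return round(latency_ns * clock_mhz / 1000.0)


def latency_at(latency_ns_at_ref: float, clock_mhz: float) -> float:
    """Rescale a latency measured at the reference clock to another frequency."""
    return to_cycles(latency_ns_at_ref) * 1000.0 / clock_mhz


# FSM and control overhead: measured total minus the analytical sum.
overhead_ns = MEASURED_TOTAL_NS - (HASH_NS + BF_CHECK_NS)
```

Here the 820 ns total corresponds to 82 cycles, of which one cycle (10 ns) is control overhead beyond the analytical sum; the same 82 cycles would take 1640 ns at 50 MHz and 410 ns at 200 MHz.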
Since replay detection latency is independent of message size, the decryption path can avoid stalling as long as the input data meets the required minimum threshold. Based on a simulated decryption latency of approximately 520 ns for a 32-byte plaintext with 16-byte associated data, measured at 100 MHz, this implies that a minimum plaintext size of 64 bytes (four 128-bit blocks) is required to match the 820 ns replay detection latency when using the 2RPCC hashing configuration. For the same associated data size, if hashing were implemented with a 1RPCC core instead, the replay detection latency would increase to approximately 1180 ns, requiring a minimum plaintext size of 80 bytes (five blocks) to avoid stalling the data path. These values are summarized in Table 9, providing guidance for latency-aware system design in secure IoT applications.
This latency threshold aligns well with typical IoT message sizes in applications such as smart meters, wearables, and industrial sensor nodes, where payloads commonly range from 100 to 200 bytes. These values are consistent with protocol constraints found in lightweight IoT communication standards. For example, LoRaWAN supports maximum uplink payloads ranging from 51 to 222 bytes depending on region and data rate [27], while MQTT and MQTT-SN protocols allow for small, frequent sensor messages, with MQTT supporting payloads up to 256 MB depending on negotiated packet limits [28,29]. As such, the proposed design remains broadly compatible with a wide spectrum of real-world IoT deployments. These results confirm that the implemented design supports parallel, high-throughput, and replay-resilient authenticated message processing on FPGA hardware.
5.4. Hardware Resource Utilization
The hardware resource utilization of the complete Ascon-based authentication and replay detection system under three different architectural configurations is summarized in Table 10. The design was synthesized for a Xilinx Arty A7-100T FPGA, operating at a clock frequency of 100 MHz. All configurations include the same top-level functionality, integrating modules for authenticated encryption (AEAD), nonce generation, nonce hashing, and replay detection using a Bloom Filter.
Type 1 employs a 1RPCC permutation across the encryption, decryption, and hashing cores; Type 2 uses 1RPCC for encryption and decryption but adopts a 2RPCC configuration for the hashing core to reduce verification latency; and Type 3 applies a 2RPCC permutation for all three cryptographic cores.
Across these implementations, the top-level resource utilization ranges from 3490 to 5335 Slice LUTs (5.5–8.4%) and 4285 to 4287 Slice Registers (3.38%). The Bloom Filter controller consistently uses 32 BRAM tiles, representing 23.7% of the available BRAM. No UltraRAM or DSP resources are consumed. The architecture demonstrates a flexible and compact hardware footprint, allowing trade-offs between area and performance depending on deployment constraints.
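The utilization percentages above can be reproduced from the absolute counts. The Python sketch below assumes the device totals of the XC7A100T part on the Arty A7-100T (63,400 LUTs, 126,800 flip-flops, 135 block RAM tiles) and assumes each 36 Kb block RAM tile provides 32,768 data bits in a 1-bit-wide configuration; these figures and the function names are not taken from this section.

```python
import math

# Assumed XC7A100T device totals (not stated in this section).
DEVICE_TOTALS = {"lut": 63_400, "ff": 126_800, "bram": 135}
BRAM_DATA_BITS = 32 * 1024  # assumed usable data bits per tile in 32K x 1 mode


def utilization_percent(used: int, resource: str) -> float:
    """Percentage of the device's `resource` consumed by `used` units."""
    return 100.0 * used / DEVICE_TOTALS[resource]


def bram_tiles_for_filter(filter_bits: int) -> int:
    """Number of block RAM tiles needed to hold a filter of `filter_bits` bits."""
    return math.ceil(filter_bits / BRAM_DATA_BITS)
```

Under these assumptions the reported 3490 to 5335 LUTs correspond to about 5.5% to 8.4% of the device, and a 2^20-bit filter maps to exactly 32 tiles, which is consistent with the 23.7% block RAM usage.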
The total on-chip power was measured as 208 mW, 258 mW, and 377 mW for the Type 1, Type 2, and Type 3 configurations, respectively. The corresponding dynamic power values were 109 mW, 159 mW, and 277 mW respectively. These results reflect the increased combinational logic activity in the higher-throughput 2RPCC configuration and illustrate the trade-off between latency and energy efficiency. This architectural flexibility enables deployment on a wide range of low to mid-tier FPGAs, with configurations tailored to the energy and performance requirements of specific IoT applications.
Using a 1RPCC permutation for both AEAD and hashing in the replay detection system provides a highly area-efficient solution. This configuration is well-suited for low-power or resource-constrained systems where performance requirements are moderate. At the same time, as seen in the Type 2 configuration, the 2RPCC hashing design offers significant latency benefits while maintaining a modest resource profile, making it a practical option when higher throughput or faster tag verification is required. This architectural flexibility allows system integrators to trade off area and latency based on specific application needs.
It is also important to note that the total resource utilization of the top-level module is not a strict sum of the individual submodules. Additional control logic, such as Finite State Machines, signal routing, and orchestration logic, contributes to the overall footprint. Moreover, FPGA synthesis optimizations, including resource sharing and boundary logic absorption, may cause the final resource count to differ slightly from the sum of its parts. Overall, the design achieves an efficient and scalable partitioning of functionality, maintaining a compact resource profile while supporting secure, high-throughput authenticated communication.
Table 11 compares our architecture with representative Ascon hardware implementations. While prior works primarily focus on performance and area, they do not address replay protection. Moreover, due to differences in FPGA platforms and inconsistent reporting of power consumption and message size, a fully equivalent comparison is not feasible. Our implementation uniquely integrates real-time replay detection using a Bloom Filter, while maintaining a compact hardware footprint on an Artix-7 platform.
5.5. Scalability Analysis of Bloom Filter Replay Detection
To evaluate the scalability of the proposed replay detection system, we analyze the Bloom Filter’s theoretical capacity using the standard FPR model. The Bloom Filter is configured with a size of 1 Mbit and utilizes $k = 10$ hash functions derived from the 256-bit Ascon-XOF128 output. This estimation, based on Equation (5), assumes independent and uniformly distributed hash functions and no bit-level errors. These are conditions commonly accepted in Bloom Filter applications. The use of Ascon-XOF128, an NIST-standard lightweight Extendable Output Function, provides high entropy and strong diffusion, aligning well with the theoretical assumptions. This ensures that the Bloom Filter indices are evenly distributed, which is critical to minimizing FPR and maintaining predictable performance.
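The theoretical estimate can be evaluated directly. The sketch below uses the standard Bloom filter approximation, FPR ≈ (1 − e^(−kn/m))^k, which is assumed to correspond to Equation (5); it also assumes that 1 Mbit means m = 2^20 bits and uses k = 10 indices per nonce. The function name is hypothetical.

```python
import math


def bloom_fpr(n: int, m: int = 1 << 20, k: int = 10) -> float:
    """Approximate false-positive rate after inserting n items into an m-bit filter."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

For example, `bloom_fpr(100_000)` evaluates to roughly 0.0077, i.e., about 0.77%, which stays below the 1% design target.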
To empirically validate these assumptions, we conducted a simulation in Python 3.11.13 comparing two nonce generation strategies: a simple counter and the proposed hardware-oriented 128-bit LFSR. In both cases, nonces were hashed using Ascon-XOF128 to generate Bloom Filter indices.
Table 12 summarizes theoretical design parameters, while Table 13 presents the experimental results from Python simulation. These collectively confirm the system’s scalability and robustness in replay protection, especially under high-throughput conditions.
For up to 100,000 unique nonce insertions, both methods maintained an FPR well below 1%, consistent with theoretical predictions. To evaluate behavior beyond the nominal capacity, the test was extended to 200,000 total queries. After 193,191 insertions, the counter-based approach yielded an FPR of 3.40%, while the LFSR-based method resulted in 3.37%. These findings confirm that Ascon-XOF128’s strong diffusion produces uniformly distributed indices regardless of nonce structure, validating the suitability of LFSR as a lightweight hardware nonce generator.
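A simplified version of such an experiment can be written in a few lines of Python. The sketch below is not the authors' script: `hashlib.shake_128` stands in for Ascon-XOF128, the indices are assumed to be ten 20-bit slices of the digest, only counter-based nonces are generated, and the counting convention (a unique nonce flagged as seen is counted as a false positive and not inserted) is an assumption inferred from the reported query and insertion totals. Because of these substitutions, the output will not reproduce Table 13 exactly.

```python
import hashlib
from typing import Iterable, Iterator

FILTER_BITS = 1 << 20  # assumed 1 Mbit = 2**20 bits
NUM_INDICES = 10
INDEX_BITS = 20


def indices(nonce: bytes) -> list[int]:
    """Derive ten filter indices from a 256-bit digest of the nonce."""
    value = int.from_bytes(hashlib.shake_128(nonce).digest(32), "big")  # stand-in XOF
    return [(value >> (INDEX_BITS * j)) & (FILTER_BITS - 1) for j in range(NUM_INDICES)]


def counter_nonces(count: int) -> Iterator[bytes]:
    """Yield `count` unique 128-bit nonces from a simple counter."""
    return (i.to_bytes(16, "big") for i in range(count))


def streaming_fpr(nonces: Iterable[bytes]) -> float:
    """Stream unique nonces through the filter and return the flagged fraction."""
    set_bits: set[int] = set()
    queries = flagged = 0
    for nonce in nonces:
        idx = indices(nonce)
        queries += 1
        if all(i in set_bits for i in idx):
            flagged += 1          # nonce is unique, so this is a false positive
        else:
            set_bits.update(idx)  # fresh nonce: record it
    return flagged / queries
```

Running `streaming_fpr(counter_nonces(100_000))` gives the cumulative false-positive fraction for the nominal capacity; any other nonce source that yields unique 16-byte values can be passed in place of the counter.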
The current Bloom Filter configuration supports up to 100,000 unique nonces with an FPR of 0.17%. However, continued insertions beyond this capacity, as illustrated by our empirical test, lead to filter saturation and a gradual increase in FPR. This is a well-documented limitation of fixed-size Bloom Filters in long-running deployments. To address this in practice, strategies such as periodic filter resets, rotation, or time-based partitioning can be employed depending on application and security requirements [23]. However, periodic resets inherently discard prior entries, which may permit undetected replay of older messages. Future work will further evaluate long-term FPR behavior under varying nonce distributions and saturation scenarios.
While the current implementation uses a fixed-size Bloom Filter, future iterations may incorporate adaptive structures such as counting filters, time-based aging, or sliding windows to maintain replay detection performance under prolonged high-throughput operation. These mechanisms can help mitigate saturation without requiring full resets, especially in resource-constrained devices with continuous communication.
The actual nonce requirement depends heavily on device behavior. High-throughput devices (e.g., smart meters, industrial sensors) may quickly saturate the Bloom Filter, while low-duty-cycle nodes (e.g., wearables, environmental sensors) may operate within capacity over long durations. For these cases, session-based reset policies aligned with key rotation can be employed. Since nonce tracking is only valid within a key epoch, resetting the Bloom Filter alongside key updates avoids stale data accumulation and ensures effective replay detection. These system-level strategies maintain security while keeping memory use minimal, demonstrating the proposed method’s practicality for diverse IoT deployments.
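The capacity and reset considerations above can be quantified with a short calculation. The Python sketch below inverts the standard Bloom filter FPR approximation to find how many nonces fit under a target FPR, and then estimates how long a device can run before a reset is due. It assumes m = 2^20 bits and k = 10 indices; the function names and the example message rates are hypothetical.

```python
import math


def max_nonces(target_fpr: float, m: int = 1 << 20, k: int = 10) -> int:
    """Largest n for which (1 - exp(-k*n/m))**k stays at or below target_fpr."""
    return math.floor(-(m / k) * math.log(1.0 - target_fpr ** (1.0 / k)))


def hours_until_reset(capacity: int, messages_per_second: float) -> float:
    """Operating time before `capacity` nonces have been recorded."""
    return capacity / messages_per_second / 3600.0
```

Under these assumptions, a 1% target gives a capacity of slightly more than 104,000 nonces, in line with the nominal 100,000. A node sending one message per second would reach 100,000 nonces after about 27.8 hours, while a node sending one message per minute would take roughly 69 days, which illustrates why reset intervals tied to key rotation depend strongly on device duty cycle.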
Importantly, the proposed design ensures that only authenticated messages, i.e., those passing Ascon tag verification, result in the nonce being inserted into the Bloom Filter. As such, even if an adversary attempts to flood the system with spoofed messages containing randomly generated nonces, these nonces will not be recorded unless the corresponding ciphertext passes authentication. Given the strength of Ascon’s tag validation, the likelihood of a forged message being accepted is negligible. This design choice mitigates DoS attempts via nonce flooding, as unauthorized or unauthenticated messages cannot contribute to Bloom Filter saturation. Consequently, the filter maintains its integrity and performance, even under adversarial conditions.
Additionally, the architecture is well-suited for extension to multi-core or distributed deployments, where synchronized hardware nonce generation across nodes can ensure system-wide protection against replay attacks. These directions offer pathways for further scaling the design to match the evolving demands of next-generation IoT and edge-computing systems.
6. Conclusions
This work presents a lightweight, FPGA-based implementation of Ascon-128 with integrated replay attack detection using a Bloom Filter. The proposed architecture addresses a key limitation of the standard Ascon specification, its inability to distinguish between original messages and unauthorized replays, by introducing a hardware-based replay protection layer. The design leverages a 128-bit LFSR-based nonce generator, which produces a unique nonce per encryption operation. Unlike counter-based methods that require persistent state, the LFSR offers a lightweight, stateless hardware solution that maintains uniqueness without relying on external entropy sources. Replay detection is achieved by hashing each nonce using Ascon-XOF128 and mapping the output to a Bloom Filter stored in on-chip BRAM. The hashing utilizes the same permutation core as the main AEAD datapath, minimizing hardware duplication and ensuring a compact implementation. Session isolation is enforced by resetting the Bloom Filter when the session key changes, preventing stale entries from persisting across authentication sessions. This tightly integrated approach maintains low latency and achieves reliable, low-overhead replay detection, with an FPR below 1%, where the theoretical and empirical values are 0.77% and 0.17%, respectively, based on testing with 100,000 nonces. This enhancement is achieved without modifying the Ascon-128 AEAD algorithm, preserving NIST compliance while extending its capabilities. The replay protection module and nonce generator act as modular extensions, providing a lightweight and scalable plug-in that enhances Ascon-128 from a lightweight encryption scheme into a secure authentication framework with integrated replay detection. The architecture supports a range of configurations, enabling trade-offs between latency and area efficiency.
Overall, the proposed design is well-suited for low-power, regulation-compliant IoT applications that require secure and efficient message authentication at scale.