Article

High-Throughput Post-Quantum Cryptographic System: CRYSTALS-Kyber with Computational Scheduling and Architecture Optimization

1
Department of Engineering Science, National Cheng Kung University, No. 1 University Road, Tainan 701, Taiwan
2
Program on Integrated Circuit Design, National Cheng Kung University, No. 1 University Road, Tainan 701, Taiwan
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 2969; https://doi.org/10.3390/electronics14152969
Submission received: 21 June 2025 / Revised: 21 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025

Abstract

Once large-scale quantum computers become available, classical public-key cryptography will be vulnerable to quantum algorithms such as Shor’s algorithm. At the same time, as communication technology advances rapidly, a great deal of personal information is being transmitted over the Internet. We observe that the Kyber algorithm exhibits a significant number of idle cycles during execution when implemented following the conventional software procedure. This paper therefore proposes a high-throughput scheduling for Kyber that parallelizes the SHA-3 function, the sampling algorithm, and the NTT computations to improve hardware utilization and reduce latency. We also introduce an 8-stage pipelined SHA-3 architecture and a multi-mode polynomial arithmetic module to increase area efficiency. With these optimizations to the hardware architecture of the computational modules used by Kyber, the implementation achieves an aggregate throughput of 877.192 kOPS for Kyber KEM on a TSMC 40 nm process. The design achieves the highest throughput among existing studies and also improves area and power efficiency.

1. Introduction

With the development of quantum computing technology, traditional encryption algorithms are facing severe challenges, especially in the field of public-key cryptography. Historically, public-key cryptography has relied on hard mathematical problems, such as integer factorization for the Rivest–Shamir–Adleman (RSA) cryptosystem and the discrete logarithm problem for elliptic curve cryptography (ECC). However, a large-scale quantum computer running a suitable algorithm can solve these problems efficiently. For example, Shor’s algorithm [1] is a quantum algorithm that accelerates integer factorization. Therefore, research institutions around the world have already begun to develop post-quantum cryptography (PQC) that can effectively resist the threats posed by quantum computers while remaining secure on classical computers. The National Institute of Standards and Technology (NIST) also started standardizing public-key encryption and key-encapsulation mechanism (PKE/KEM) algorithms as early as 2016. CRYSTALS-Kyber is a PKE/KEM standardized by NIST under the name ML-KEM in August 2023 [2,3]. Kyber is built upon the module-learning-with-errors (MLWE) problem over lattices, forming a robust foundation for its encryption and key exchange protocols [4,5].
The methods for implementing Kyber in hardware can be divided into two main categories. The first supports instruction set extensions to increase hardware flexibility [6,7,8]; the second accelerates specific computations of Kyber to enhance overall performance [9,10,11,12,13,14,15,16,17,18].
The work [6] entailed the design of 20 customized instruction codes to facilitate the control and operation of Kyber. It also proposed a parallel scheduling in polynomial sampling and SHA-3 computation [19] to decrease the required cycles. A vector-architecture processor based on RISC-V called VPQC was proposed in [8], which has a custom instruction set extended from RISC-V to improve the flexibility and efficiency in ASIC.
Since Kyber requires modular reductions after addition, subtraction and multiplication, ref. [18] proposed the K2-RED modular reduction algorithm to decrease the additional cost of memory utilization. Chauhan et al. [9] proposed a reconfigurable and hardware-efficient KECCAK architecture that integrates SHAKE functionality and supports dynamic input processing, enabling compatibility with a wide range of post-quantum cryptographic algorithms, including Kyber. In addition, Alrayes et al. [16] investigated pipeline optimization techniques for SHA-3 on FPGA, analyzing how different levels of pipelining impact performance, latency, and resource usage. The study [17] implemented a systolic array architecture for the number-theoretic transform (NTT) to enhance throughput during NTT operations. Furthermore, it also developed an out-of-order execution mechanism to reduce the overall latency. Shimada et al. [15] presented a high-throughput CRYSTALS-Kyber processor featuring pipelined NTT and SHA-3 units, achieving a good balance between performance and energy efficiency. Li et al. [12] proposed a secure and energy-efficient post-quantum crypto-processor based on modular design and separated secure/public memory architecture, achieving an energy consumption of 2.76 μJ for the complete Kyber KEM operation. Subsequently, an ultra-low-power variant [11] further reduced the energy to 0.72 μJ and the chip area to 0.34 mm², making it well suited for resource-constrained edge devices.
Since the standardization of post-quantum cryptography has been finalized, Kyber—now established as the first NIST-approved KEM—has been integrated into various secure communication protocols. In server-side environments, the growing demand for large-scale and latency-sensitive cloud services requires systems to handle a high number of concurrent secure connections. Post-quantum key exchange using Kyber has already been adopted or proposed in TLS 1.3, QUIC (HTTP/3), VPN gateways, and edge cloud platforms. These scenarios demand frequent and efficient key encapsulation, making high-throughput Kyber implementations critical for maintaining performance at scale. We present a solution that can significantly reduce computational latency and enhance performance by optimizing computational scheduling and hardware architecture. For clarity, this work focuses on Kyber512.
The contributions of this work are briefly listed below:
  • We propose a high-throughput scheduling for Kyber by parallelizing and pipelining the SHA-3 function, the sampling algorithm, and the NTT computations. The proposed scheduling reduces the time required to generate the necessary polynomial coefficients to just one-fourth of that required by the conventional software procedure.
  • Due to the iterative and repetitive nature of hash operations in Kyber, pipeline bubbles are inevitable with both round-based and unrolled-pipeline architectures. Therefore, we propose an 8-stage pipelined architecture that effectively addresses feedback issues during the absorbing and squeezing phases of SHA-3 operations.
  • We propose a multi-mode polynomial arithmetic module based on an unrolled-pipeline mixed-radix architecture, which can be configured into different modes for various polynomial computations in Kyber. Moreover, it employs resource-sharing techniques to maximize the utilization of each pipeline stage within a limited and reasonable hardware area.
The rest of this paper is organized as follows. In Section 2, we briefly describe the background of Kyber. In Section 3, our proposed efficient scheduling and high-speed architecture are presented. In Section 4, we discuss the experimental results. Finally, Section 5 draws the conclusions.

2. Preliminaries

Learning with Errors (LWE) is a cryptographic problem based on the hardness of the Closest Vector Problem in lattice theory. It is formulated as follows: given a public matrix $\mathbf{A} \in \mathbb{Z}_q^{k \times k}$ and a noisy vector $\mathbf{t} = \mathbf{A}\mathbf{s} + \mathbf{e} \in \mathbb{Z}_q^{k}$, recover the secret vector $\mathbf{s} \in \mathbb{Z}_q^{k}$. The error vector $\mathbf{e} \in \mathbb{Z}_q^{k}$ consists of small random entries. The presence of $\mathbf{e}$ makes the problem computationally intractable and forms the security foundation for many lattice-based schemes.
Kyber adopts a structured variant called Module Learning with Errors (MLWE) [20], which generalizes LWE to polynomial rings. In MLWE, the matrix $\mathbf{A}$ and vectors $\mathbf{s}$, $\mathbf{e}$, and $\mathbf{t}$ have the same dimensions as in LWE, but their entries are polynomials with coefficients in a finite field, rather than individual integers. The problem is defined as:
$$\mathrm{MLWE}_{\mathbf{A}}(\mathbf{s}, \mathbf{e}) = \mathbf{t} = \mathbf{A} \cdot \mathbf{s} + \mathbf{e}. \tag{1}$$
To accelerate polynomial multiplication in MLWE-based schemes such as Kyber, the Number Theoretic Transform (NTT) is used. As a finite-field analog of the Discrete Fourier Transform, the NTT converts polynomials from the coefficient domain to a form that allows efficient element-wise (point-wise) multiplication. The result is then transformed back using the inverse NTT (INTT).
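To make the role of the NTT concrete, the following minimal Python sketch multiplies two polynomials in $\mathbb{Z}_q[X]/(X^N+1)$ in two ways: by schoolbook multiplication and by transforming, multiplying point-wise, and transforming back. It is an illustration only, under stated assumptions: it uses a plain $O(N^2)$ summation rather than a fast butterfly network, a 128-coefficient ring, and the constant 17 as a primitive 256th root of unity modulo 3329 (the root commonly used for Kyber). The function names are hypothetical and not taken from the paper.

```python
import random

Q = 3329                      # Kyber prime modulus
N = 128                       # illustrative ring dimension (assumption for this sketch)
GAMMA = 17                    # primitive 2N-th (256th) root of unity mod Q
OMEGA = GAMMA * GAMMA % Q     # primitive N-th root of unity mod Q


def ntt(f):
    """Direct-sum negacyclic NTT: f_hat[i] = sum_j f[j] * GAMMA^j * OMEGA^(i*j)."""
    return [sum(f[j] * pow(GAMMA, j, Q) * pow(OMEGA, i * j, Q) for j in range(N)) % Q
            for i in range(N)]


def intt(f_hat):
    """Inverse transform: undo OMEGA, undo the GAMMA^j scaling, divide by N."""
    n_inv = pow(N, -1, Q)
    g_inv = pow(GAMMA, -1, Q)
    w_inv = pow(OMEGA, -1, Q)
    return [n_inv * pow(g_inv, j, Q)
            * sum(f_hat[i] * pow(w_inv, i * j, Q) for i in range(N)) % Q
            for j in range(N)]


def schoolbook_mul(a, b):
    """Reference product in Z_q[X]/(X^N + 1); X^N wraps around with a sign flip."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


def ntt_mul(a, b):
    """Multiply via NTT, point-wise product, and INTT."""
    a_hat, b_hat = ntt(a), ntt(b)
    return intt([x * y % Q for x, y in zip(a_hat, b_hat)])


rng = random.Random(0)
a = [rng.randrange(Q) for _ in range(N)]
b = [rng.randrange(Q) for _ in range(N)]
print(ntt_mul(a, b) == schoolbook_mul(a, b))
```

The point-wise step costs $N$ multiplications instead of $N^2$, which is why hardware designs concentrate their effort on the transform itself.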

2.1. Symbol Definition

The ring of integer polynomials modulo $X^n + 1$ is defined as $R_q = \mathbb{Z}_q[X]/(X^n + 1)$, in which $n = 256$ is the dimension and $q = 3329$ is the prime modulus. Thus, a polynomial $F \in R_q$ can be written as $F = \sum_{i=0}^{n-1} f_i X^i$, where $f_i \in \mathbb{Z}_q$ for all $i$.

2.2. Auxiliary Functions in Kyber

To better understand the Kyber algorithm, we need to first illustrate various functions it uses, as follows:
  • Hash function: The hash function used in Kyber is the SHA-3 algorithm. Four different SHA-3 modes are employed to increase randomness in Kyber: SHA3-256, SHA3-512, SHAKE128, and SHAKE256.
  • Sampling function: The sampling functions in Kyber include SampleNTT and SamplePolyCBD. The former samples the coefficients of the matrix $\hat{\mathbf{A}}$ directly in the NTT domain, while the latter samples coefficients that follow a centered binomial distribution (CBD) on $\mathbb{Z}_q$. In this work, the hat, $\hat{(\cdot)}$, denotes a term in the NTT domain.
  • Compress function: The compress function primarily serves two purposes: reducing the size of the ciphertext and creating an error tolerance gap for MLWE during decryption. The function scales the input $x$ by $2^d/q$, rounds the result to the nearest integer, and maps it into the range $\{0, 1, \ldots, 2^d - 1\}$, with $d \in \{1, 4, 10\}$, as shown below:
    $$\mathrm{Compress}_d(x) = \left\lceil \frac{2^d}{q} \cdot x \right\rfloor \bmod 2^d \tag{2}$$
    However, if the compression module is directly implemented with (2), it would require 256 dividers for fully parallel processing. Therefore, we propose a hardware-friendly implementation to reduce its complexity.
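As a point of reference for the hardware simplification, the Compress function and its approximate inverse can be written with exact integer arithmetic. The sketch below is a software illustration, not the proposed circuit; the function names are hypothetical. It relies on $q$ being odd, so rounding to nearest is the same as adding $\lfloor q/2 \rfloor = 1664$ before the integer division.

```python
Q = 3329  # Kyber prime modulus


def compress(x, d):
    """Round (2^d / q) * x to the nearest integer, then reduce modulo 2^d."""
    return (((x << d) + Q // 2) // Q) % (1 << d)


def decompress(y, d):
    """Approximate inverse: round (q / 2^d) * y to the nearest integer."""
    return (y * Q + (1 << (d - 1))) >> d


# A message bit 1 is lifted to 1665 and survives small additive noise.
print(decompress(1, 1))          # 1665
print(compress(1665 + 300, 1))   # 1
print(compress(0 + 300, 1))      # 0
```

The division by $q$ in `compress` is the operation that a direct hardware mapping would have to replicate once per coefficient.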

2.3. Kyber KEM Algorithms

Kyber is one of the post-quantum KEM algorithms standardized by NIST and consists of key-generation, encapsulation and decapsulation. These algorithms are introduced below.
  • Key-Generation: The key-generation algorithm produces a public key pk and a secret key sk. The function is derived from the MLWE-based construction in (1), where the secret vector $\mathbf{s}$ and error vector $\mathbf{e}$ are sampled from the centered binomial distribution (CBD), and the matrix $\hat{\mathbf{A}}$ is sampled uniformly in the NTT domain. The resulting pk consists of $(\hat{\mathbf{t}}, \rho)$, where $\rho$ is derived from SHA3-512 applied to a random seed $d$, and the secret key sk is the NTT form of $\mathbf{s}$. A simplified version of the key-generation process is listed in Algorithm 1:
Algorithm 1: K-PKE: Key Generation
Output: Public key pk, Secret key sk
$d \leftarrow \mathrm{RandomBytes}()$;
$(\rho, \sigma) \leftarrow \mathrm{SHA3\_512}(d)$;
$\hat{\mathbf{A}} \in R_q^{k \times k}$ in NTT domain $\leftarrow \mathrm{Gen\_matrix}(\rho)$;
$(\mathbf{s}, \mathbf{e}) \in R_q^{k} \leftarrow \mathrm{CBD\_Sample}_{\eta_1}(\sigma)$;
$(\hat{\mathbf{s}}, \hat{\mathbf{e}}) \leftarrow (\mathrm{NTT}(\mathbf{s}), \mathrm{NTT}(\mathbf{e}))$;
$\hat{\mathbf{t}} \leftarrow \hat{\mathbf{A}} \circ \hat{\mathbf{s}} + \hat{\mathbf{e}}$;
$(pk, sk) \leftarrow (\mathrm{Encode}(\hat{\mathbf{t}} \,\|\, \rho), \mathrm{Encode}(\hat{\mathbf{s}}))$;
return $(pk, sk)$
  • Encapsulation: The encapsulation algorithm takes the public key pk and the message $m$ as input to produce the ciphertext ct and a 32-byte shared secret key $K$. First, $m$ is decompressed to generate an error tolerance gap, resulting in $\mu$, where bit “0” maps to 0 and bit “1” maps to 1665. Next, the ciphertext components are computed as $\mathbf{u} = \mathrm{INTT}(\hat{\mathbf{A}}^{T} \hat{\mathbf{r}}) + \mathbf{e}_1$ and $v = \mathrm{INTT}(\hat{\mathbf{t}}^{T} \hat{\mathbf{r}}) + e_2 + \mu$. Finally, the shared key $K$ is derived as the first 32 bytes of $\mathrm{SHA3\text{-}512}(m, \mathrm{SHA3\text{-}256}(pk))$, and the ciphertext ct consists of (Compress($\hat{\mathbf{u}}$), Compress($v$)). A simplified version of the encryption process is listed in Algorithm 2:
Algorithm 2: K-PKE: Encryption
Input: Public key pk, Message m, Random coins r
Output: Ciphertext c
$(\hat{\mathbf{t}}, \rho) \leftarrow \mathrm{Decode}(pk)$
$\hat{\mathbf{A}}^{T} \in R_q^{k \times k}$ in NTT domain $\leftarrow \mathrm{Gen\_matrix}(\rho)$
$\mathbf{r} \in R_q^{k} \leftarrow \mathrm{CBD\_Sample}_{\eta_1}(r)$
$\mathbf{e}_1 \in R_q^{k} \leftarrow \mathrm{CBD\_Sample}_{\eta_2}(r)$
$e_2 \in R_q \leftarrow \mathrm{CBD\_Sample}_{\eta_2}(r)$
$\hat{\mathbf{r}} \leftarrow \mathrm{NTT}(\mathbf{r})$
$\mathbf{u} \leftarrow \mathrm{INTT}(\hat{\mathbf{A}}^{T} \circ \hat{\mathbf{r}}) + \mathbf{e}_1$
$v \leftarrow \mathrm{INTT}(\hat{\mathbf{t}}^{T} \circ \hat{\mathbf{r}}) + e_2 + \mathrm{Decompress}(m)$
$(c_1, c_2) \leftarrow (\mathrm{Encode}(\mathbf{u}), \mathrm{Encode}(v))$
return $(c_1 \,\|\, c_2)$
  • Decapsulation: The decapsulation algorithm takes dk and ct as input to produce the shared secret key $K$. First, $m'$ is computed as Compress($v - \mathrm{INTT}(\hat{\mathbf{s}}^{T} \hat{\mathbf{u}})$). Next, $K$ is the first 32 bytes of SHA3-512($m'$, $h$), where $h$ is SHA3-256(ek). Finally, through re-encryption verification, the same $K$ can be obtained. A simplified version of the decryption process is listed in Algorithm 3:
Algorithm 3: K-PKE: Decryption
Input: Secret key sk, Ciphertext c
Output: Message m
$(\mathbf{u}, v) \leftarrow \mathrm{Decode}(c)$
$\hat{\mathbf{s}} \leftarrow \mathrm{Decode}(sk)$
$m \leftarrow \mathrm{Compress}\left(v - \mathrm{INTT}\left(\hat{\mathbf{s}}^{T} \circ \mathrm{NTT}(\mathbf{u})\right)\right)$
return $m$
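The three algorithms above can be exercised end to end with a small software model. The following Python sketch is a toy under explicit simplifications, none of which come from the paper: it works in the coefficient domain with schoolbook multiplication instead of the NTT, draws all randomness from Python's `random` module rather than SHA-3, uses $\eta = 2$ for every CBD sample, and skips encoding and ciphertext compression. It is meant only to show why decryption recovers the message; it is not secure and not an implementation of ML-KEM. All function names are hypothetical.

```python
import random

Q, N, K, ETA = 3329, 256, 2, 2   # ETA = 2 everywhere is a simplification of this sketch


def cbd_poly(rng):
    """Centered binomial coefficients: (sum of ETA bits) - (sum of ETA bits), mod Q."""
    return [(sum(rng.getrandbits(1) for _ in range(ETA))
             - sum(rng.getrandbits(1) for _ in range(ETA))) % Q for _ in range(N)]


def add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]


def sub(a, b):
    return [(x - y) % Q for x, y in zip(a, b)]


def mul(a, b):
    """Schoolbook product in Z_q[X]/(X^N + 1)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
            else:
                c[i + j - N] -= ai * bj      # X^N = -1
    return [x % Q for x in c]


def dot(u, v):
    """Inner product of two length-K vectors of polynomials."""
    acc = [0] * N
    for a, b in zip(u, v):
        acc = add(acc, mul(a, b))
    return acc


def keygen(rng):
    A = [[[rng.randrange(Q) for _ in range(N)] for _ in range(K)] for _ in range(K)]
    s = [cbd_poly(rng) for _ in range(K)]
    e = [cbd_poly(rng) for _ in range(K)]
    t = [add(dot(A[i], s), e[i]) for i in range(K)]          # t = A s + e
    return (A, t), s


def encrypt(pk, bits, rng):
    A, t = pk
    r = [cbd_poly(rng) for _ in range(K)]
    e1 = [cbd_poly(rng) for _ in range(K)]
    e2 = cbd_poly(rng)
    # u = A^T r + e1: row i of A^T is column i of A
    u = [add(dot([A[j][i] for j in range(K)], r), e1[i]) for i in range(K)]
    mu = [1665 * b for b in bits]                            # Decompress_1(m)
    v = add(add(dot(t, r), e2), mu)
    return u, v


def decrypt(s, ct):
    u, v = ct
    w = sub(v, dot(s, u))                                    # = mu + small noise
    return [1 if 833 <= x <= 2496 else 0 for x in w]         # Compress_1


rng = random.Random(1)
pk, sk = keygen(rng)
msg = [rng.getrandbits(1) for _ in range(N)]
print(decrypt(sk, encrypt(pk, msg, rng)) == msg)
```

Decryption works because $v - \mathbf{s}^{T}\mathbf{u} = \mu + \mathbf{e}^{T}\mathbf{r} + e_2 - \mathbf{s}^{T}\mathbf{e}_1$, and the noise terms are far smaller than the gap of roughly $q/4$ around each encoded bit.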

2.4. Mathematical Foundations of the Kyber Mixed Radix-2/4 NTT

Although [21] presents a radix-4 NTT algorithm applicable to Kyber, it does not provide a derivation showing how the radix-4 butterfly arises from combining two radix-2 stages. In contrast, the mathematical derivation in [22] demonstrates that the Number Theoretic Transform (NTT) adopted in Kyber can be decomposed using a radix-2 structure. Based on this foundation, the derivation is further extended to show that two consecutive radix-2 stages can be consolidated into a single radix-4 stage. Consequently, the original 7-stage radix-2 NTT in Kyber can be equivalently realized using three radix-4 stages and one radix-2 stage.
A mathematical derivation is presented to establish the theoretical foundation for merging two consecutive radix-2 stages into a single radix-4 butterfly. This formulation serves as the basis for constructing an equivalent mixed-radix structure applicable to Kyber’s NTT.
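The NTT is written as a summation of $N$ terms:
$$\hat{f}_i = \sum_{j=0}^{N-1} f_j \cdot \gamma_{2N}^{j} \cdot \omega_N^{ij}, \quad \text{for } i = 0, 1, \ldots, N-1. \tag{3}$$
  • $\omega_N$ is a primitive $N$-th root of unity modulo $q$.
  • $\gamma_{2N}$ is a scaling factor introduced to simplify reduction, where $\omega_N = \gamma_{2N}^{2}$.
The initial radix-2 decomposition can be written as [22]:
$$\begin{aligned} \hat{f}_i &= \underbrace{\sum_{j=0}^{N/2-1} f_{2j} \cdot \gamma_N^{j} \cdot \omega_{N/2}^{ij}}_{\text{NTT of even-indexed inputs}} + \underbrace{\gamma_{2N} \cdot \omega_N^{i}}_{\text{Twiddle rotation term}} \cdot \underbrace{\sum_{j=0}^{N/2-1} f_{2j+1} \cdot \gamma_N^{j} \cdot \omega_{N/2}^{ij}}_{\text{NTT of odd-indexed inputs}} \\ &= \underbrace{\hat{f}_i^{(0)}}_{\text{Partial sum from even terms}} + \underbrace{\gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1)}}_{\text{Weighted partial sum from odd terms}}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{2} - 1, \end{aligned} \tag{4}$$
$$\hat{f}_{i+N/2} = \underbrace{\hat{f}_i^{(0)}}_{\text{Partial sum from even terms}} - \underbrace{\gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1)}}_{\text{Weighted partial sum from odd terms}}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{2} - 1. \tag{5}$$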
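We now expand both $\hat{f}_i^{(0)}$ and $\hat{f}_i^{(1)}$ from Equations (4) and (5) using another radix-2 decomposition, with $\gamma_N^{2} = \gamma_{N/2}$ and $\omega_{N/2}^{2} = \omega_{N/4}$. For $i = 0, 1, \ldots, \tfrac{N}{4} - 1$, the even-indexed partial NTT $\hat{f}_i^{(0)}$ becomes:
$$\hat{f}_i^{(0)} = \sum_{j=0}^{N/4-1} f_{4j} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \sum_{j=0}^{N/4-1} f_{4j+2} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{6}$$
Similarly, the odd-indexed partial NTT $\hat{f}_i^{(1)}$ becomes:
$$\hat{f}_i^{(1)} = \sum_{j=0}^{N/4-1} f_{4j+1} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \sum_{j=0}^{N/4-1} f_{4j+3} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{7}$$
By grouping terms according to their input index patterns (even-of-even, odd-of-even, even-of-odd, and odd-of-odd), we define four partial sums $\hat{f}_i^{(0,0)}$, $\hat{f}_i^{(0,1)}$, $\hat{f}_i^{(1,0)}$, $\hat{f}_i^{(1,1)}$ as follows:
$$\hat{f}_i^{(0,0)} = \sum_{j=0}^{N/4-1} f_{4j} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{8}$$
$$\hat{f}_i^{(0,1)} = \sum_{j=0}^{N/4-1} f_{4j+2} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{9}$$
$$\hat{f}_i^{(1,0)} = \sum_{j=0}^{N/4-1} f_{4j+1} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{10}$$
$$\hat{f}_i^{(1,1)} = \sum_{j=0}^{N/4-1} f_{4j+3} \cdot \gamma_{N/2}^{j} \cdot \omega_{N/4}^{ij}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{11}$$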
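Substituting Equations (6) and (7) into Equation (4), we obtain:
$$\begin{aligned} \hat{f}_i &= \hat{f}_i^{(0,0)} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(0,1)} + \gamma_{2N} \cdot \omega_N^{i} \cdot \left( \hat{f}_i^{(1,0)} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(1,1)} \right) \\ &= \hat{f}_i^{(0,0)} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(0,1)} + \gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1,0)} + \gamma_{2N} \cdot \gamma_N \cdot \omega_N^{i} \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(1,1)} \\ &= \hat{f}_i^{(0,0)} + \gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1,0)} + \gamma_{2N}^{2} \cdot \omega_N^{2i} \cdot \hat{f}_i^{(0,1)} + \gamma_{2N}^{3} \cdot \omega_N^{3i} \cdot \hat{f}_i^{(1,1)}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1. \end{aligned} \tag{12}$$
Similarly, from Equation (5), we derive:
$$\begin{aligned} \hat{f}_{i+N/2} &= \hat{f}_i^{(0,0)} + \gamma_N \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(0,1)} - \gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1,0)} - \gamma_{2N} \cdot \gamma_N \cdot \omega_N^{i} \cdot \omega_{N/2}^{i} \cdot \hat{f}_i^{(1,1)} \\ &= \hat{f}_i^{(0,0)} - \gamma_{2N} \cdot \omega_N^{i} \cdot \hat{f}_i^{(1,0)} + \gamma_{2N}^{2} \cdot \omega_N^{2i} \cdot \hat{f}_i^{(0,1)} - \gamma_{2N}^{3} \cdot \omega_N^{3i} \cdot \hat{f}_i^{(1,1)}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1. \end{aligned} \tag{13}$$
To derive the other outputs $\hat{f}_{i+N/4}$ and $\hat{f}_{i+3N/4}$, we apply phase rotation patterns in the radix-4 structure using the 4th root of unity $\omega_N^{N/4}$. The resulting expressions are:
$$\hat{f}_{i+N/4} = \hat{f}_i^{(0,0)} + \gamma_{2N} \cdot \omega_N^{i} \cdot \omega_N^{N/4} \cdot \hat{f}_i^{(1,0)} - \gamma_{2N}^{2} \cdot \omega_N^{2i} \cdot \hat{f}_i^{(0,1)} - \gamma_{2N}^{3} \cdot \omega_N^{3i} \cdot \omega_N^{N/4} \cdot \hat{f}_i^{(1,1)}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{14}$$
$$\hat{f}_{i+3N/4} = \hat{f}_i^{(0,0)} + \gamma_{2N} \cdot \omega_N^{i} \cdot \omega_N^{3N/4} \cdot \hat{f}_i^{(1,0)} - \gamma_{2N}^{2} \cdot \omega_N^{2i} \cdot \hat{f}_i^{(0,1)} - \gamma_{2N}^{3} \cdot \omega_N^{3i} \cdot \omega_N^{3N/4} \cdot \hat{f}_i^{(1,1)}, \quad \text{for } i = 0, 1, \ldots, \tfrac{N}{4} - 1 \tag{15}$$
This result corresponds exactly to a radix-4 butterfly in which the four partial sums $\hat{f}_i^{(0,0)}$, $\hat{f}_i^{(0,1)}$, $\hat{f}_i^{(1,0)}$, $\hat{f}_i^{(1,1)}$ are processed in a single stage. This derivation provides the mathematical foundation for implementing the Kyber NTT using a mixed-radix architecture.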
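The radix-4 relations can be checked numerically. The Python sketch below is a verification aid rather than part of the proposed design: it evaluates the $N$-term summation directly and compares it against the four radix-4 outputs assembled from quarter-size partial sums. It assumes $N = 128$, $q = 3329$, and $\gamma_{2N} = 17$ (a primitive 256th root of unity modulo 3329). The quarter-size sums use the quarter-size roots $\gamma_{2N}^{4}$ and $\omega_N^{4}$, and the function names are hypothetical.

```python
import random

Q = 3329
N = 128
G = 17                 # gamma_{2N}: primitive 256th root of unity mod Q (assumed constant)
W = G * G % Q          # omega_N = gamma_{2N}^2


def ntt_direct(f):
    """Direct N-term summation: f_hat[i] = sum_j f[j] * G^j * W^(i*j) mod Q."""
    return [sum(f[j] * pow(G, j, Q) * pow(W, i * j, Q) for j in range(N)) % Q
            for i in range(N)]


def quarter_sum(f, offset, i):
    """Quarter-size partial sum over the inputs f[4j + offset]."""
    g4, w4 = pow(G, 4, Q), pow(W, 4, Q)
    return sum(f[4 * j + offset] * pow(g4, j, Q) * pow(w4, i * j, Q)
               for j in range(N // 4)) % Q


def radix4_outputs(f, i):
    """Return (f_hat[i], f_hat[i+N/4], f_hat[i+N/2], f_hat[i+3N/4])."""
    f00 = quarter_sum(f, 0, i)     # inputs 4j
    f01 = quarter_sum(f, 2, i)     # inputs 4j + 2
    f10 = quarter_sum(f, 1, i)     # inputs 4j + 1
    f11 = quarter_sum(f, 3, i)     # inputs 4j + 3
    t1 = G * pow(W, i, Q) * f10 % Q
    t2 = pow(G, 2, Q) * pow(W, 2 * i, Q) * f01 % Q
    t3 = pow(G, 3, Q) * pow(W, 3 * i, Q) * f11 % Q
    r1 = pow(W, N // 4, Q)         # 4th root of unity omega_N^(N/4)
    r3 = pow(W, 3 * N // 4, Q)     # omega_N^(3N/4) = -omega_N^(N/4)
    return ((f00 + t1 + t2 + t3) % Q,
            (f00 + r1 * t1 - t2 - r1 * t3) % Q,
            (f00 - t1 + t2 - t3) % Q,
            (f00 + r3 * t1 - t2 - r3 * t3) % Q)


rng = random.Random(0)
f = [rng.randrange(Q) for _ in range(N)]
direct = ntt_direct(f)
ok = all(radix4_outputs(f, i) == (direct[i], direct[i + N // 4],
                                  direct[i + N // 2], direct[i + 3 * N // 4])
         for i in range(N // 4))
print(ok)
```

Each call to `radix4_outputs` shares the three products `t1`, `t2`, `t3` across four outputs, which is the saving that a hardware radix-4 butterfly exploits.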

3. Proposed Hardware Architecture

3.1. Proposed Computational Scheduling

Since Kyber is based on MLWE, it requires the generation of $\mathbf{A}$, $\mathbf{s}$, and $\mathbf{e}$ during its computation process. We observed that, in the computational procedure of Kyber’s pseudocode [3], the matrix $\hat{\mathbf{A}}$ is generated first, followed by the sequential generation of vectors $\mathbf{s}$ and $\mathbf{e}$. However, the sampling time for $\mathbf{s}$ and $\mathbf{e}$ is shorter than that for $\hat{\mathbf{A}}$. Once $\mathbf{s}$ and $\mathbf{e}$ have been sampled, their NTTs can start immediately, without waiting for $\hat{\mathbf{A}}$, which improves overall throughput. Therefore, there are opportunities for parallel processing between SHA-3, sampling, and NTT operations.
To that end, we propose an optimized scheduling framework, as shown in Figure 1, that overlaps SHA-3 hashing, sampling, and NTT computations. We first pipeline the SHA-3 core into 8 stages, where each stage executes three Keccak permutation rounds. This allows the hardware to output hash values at regular intervals without pipeline bubbles. These hash outputs are then alternately fed into two dedicated samplers: SamplePolyCBD (used for $\mathbf{s}$ and $\mathbf{e}$) and SampleNTT (used for $\hat{\mathbf{A}}$).
Moreover, as SamplePolyCBD only requires two hash blocks to generate a polynomial while SampleNTT requires four, we prioritize s and e generation. This design enables polynomials from SamplePolyCBD to be transformed by the NTT unit earlier. Meanwhile, SampleNTT continues sampling polynomials for A in the background. This interleaved scheduling absorbs the latency of SHA-3, sampling, and NTT operations. As a result, it significantly improves throughput and area efficiency, without additional buffering or complex control logic.

3.2. Proposed High-Speed Architecture

Figure 2 presents the overall architecture of this work, including a multi-mode polynomial arithmetic, SHA-3, sampling, and compress and decompress modules, which can be configured for key generation, encapsulation, and decapsulation in Kyber. It also illustrates the data flow configuration for key generation, while the data flows for encapsulation and decapsulation are similar and omitted here.
For the key generation described in (1), the input data are first processed through the SHA-3 module to compute the seeds for $\mathbf{A}$, $\mathbf{s}$, and $\mathbf{e}$. Next, these seeds are sampled to obtain the correct coefficients. The vectors $\mathbf{s}$ and $\mathbf{e}$ are then transformed by the multi-mode polynomial arithmetic module in NTT mode, followed by pointwise multiplication and modular addition. Finally, the generated pk and sk are processed through SHA3-256 to produce the final ek and dk.

3.3. Multi-Mode Polynomial Arithmetic Module

Since Kyber is based on the MLWE problem, it involves several polynomial operations, including the NTT, INTT, pointwise multiplication, modular addition, and modular subtraction. The fundamental computation units for these operations are modular adders, modular subtractors, and modular multipliers, which can be combined into a common butterfly architecture. Therefore, we utilize resource-sharing techniques to balance high throughput with hardware efficiency in implementing these operations.
There exist various implementation approaches for the Number Theoretic Transform (NTT) used in the Kyber algorithm. One prominent method is the mixed-radix NTT architecture [23], which serves as a foundational concept for mixed-radix implementations. In [21], a mixed-radix method combining radix-2 and radix-4 butterfly operations was proposed to optimize the NTT architecture, particularly benefiting implementations with an odd number of stages.
However, these existing approaches typically utilize only a small number of butterfly units for computation, which leads to a performance bottleneck in high-throughput Kyber scenarios. To address this issue, we propose a mixed-radix architecture based on a fully unrolled-pipeline technique. By leveraging this technique, the performance bottleneck caused by the limited number of butterfly units is effectively addressed, thereby enabling high-throughput operation suitable for practical Kyber deployments. Moreover, the proposed design eliminates the need for an additional address generator, thereby enhancing throughput and achieving a more efficient and scalable implementation suitable for post-quantum cryptographic applications. These hardware architectures are introduced below.
  • Proposed Mixed-Radix $2^1/4^3$ Butterfly Architecture: We propose a mixed-radix architecture based on unrolled-pipeline techniques. Under this architecture, additional computational resources such as address generators or delay registers are not required to determine the input address for the next stage. Moreover, by fully unrolling the computation, polynomial operations are no longer the bottleneck in Kyber.
    Because no primitive 512th root of unity exists modulo $q = 3329$, a 256-point NTT/INTT is infeasible in Kyber, and the transform is limited to 128 points. Consequently, the polynomial must be divided into odd and even terms, which are transformed using the same roots of unity. Based on this characteristic, we design a 128-point mixed-radix $2^1/4^3$ NTT/INTT architecture to compute the 256-term polynomial NTT and INTT; the incoming 256-term polynomial is divided into odd and even groups, which are then input into the mixed-radix architecture sequentially. However, when performing pointwise multiplication, modular addition, and modular subtraction, the 256-term polynomial can be directly input into the mixed-radix architecture for computation. Figure 3 is the block diagram of the proposed mixed-radix $2^1/4^3$ architecture. In the proposed mixed-radix architecture, the first stage consists of 64 radix-2 butterfly units, while each of the second to fourth stages consists of 32 radix-4 butterfly units; the individual radix-2 and radix-4 butterfly units are shown in Figure 4 and Figure 5, respectively. The reordering module, required due to the shared architecture between the NTT and INTT, is shown in Figure 6. Since all modules are pipelined, the input interface of this module receives 256 polynomial coefficients simultaneously, with each coefficient represented in 12 bits, resulting in a total input width of 3072 bits. The output follows the same format. A control signal, mode, determines the operational behavior of the module.
    In the original algorithm, the output addresses of each stage in the NTT and INTT operations differ, resulting in distinct unfolded architectures. This difference arises because, during NTT computation, the input is in normal order while the output is in bit-reversed order, whereas INTT operates in the opposite manner. To create a shared architecture capable of computing both NTT and INTT, adjustments are necessary. Thus, we designed a coefficient permutation module to ensure the correctness of the order after transformation. This module adjusts the output order from bit-reversed to normal, as shown in Figure 6. Notably, due to the unrolled-pipeline architecture, the permutation module only requires rewiring and no logic gates.
  • Proposed Hardware Architecture of Radix-2 Butterfly: To achieve high throughput, we implement the radix-2 butterfly using a three-stage pipelined architecture, combined with modular adders, modular subtractors, and multipliers. Since the product may exceed $q - 1$ after multiplication, we need to insert a modular reduction unit after the multipliers.
    The modular reduction unit in our design is based on [7], which proposes a hardware-friendly modular reduction algorithm tailored for Kyber’s mathematical background. The reduction unit simplifies the modular reduction of 24-bit products in $\mathbb{Z}_q$. The algorithm decomposes the 24-bit product into its upper 12 bits and lower 12 bits and uses mathematical manipulations to simplify values that exceed 12 bits. The detailed implementation of the radix-2 butterfly is shown in Figure 4. The red datapath represents the three-stage pipelined architecture of the single-stage radix-2 NTT butterfly, including modular addition, subtraction, and multiplication.
    To reduce resource usage, arithmetic units are shared between the single-stage radix-2 NTT and INTT butterflies. The blue datapath illustrates the INTT butterfly that reuses the same computational resources.
  • Proposed Hardware Architecture of Radix-4 Butterfly: We implement the radix-4 butterfly in a five-stage pipelined architecture. The mul_quarter modules of NTT and INTT are constant multiplications designed for $\omega_4^{1}$ and $\omega_4^{-1}$, respectively. These modules adopt the same modular multiplication strategy as the radix-2 design, where the product is followed by a modular reduction to ensure correctness. The detailed implementation of the radix-4 butterfly is shown in Figure 5. The red datapath represents the five-stage pipelined architecture of a single-stage radix-4 NTT butterfly, which includes modular addition, subtraction, and multiplication. To minimize resource consumption, arithmetic units are shared between the radix-4 NTT and INTT butterflies. The blue datapath highlights the INTT computation, which reuses the same set of arithmetic units.
    When implementing the INTT, it is necessary to multiply all output coefficients by the modular inverse of the number of points used in the computation at the final stage. Our proposed approach preprocesses the roots of unity in the radix-4 butterfly at the final stage. Figure 7 represents the radix-4 INTT butterfly in the final stage. The blue multiplier denotes the additional multiplication for the modular inverse, while the black ones perform multiplication with the roots of unity after preprocessing in the original radix-4 INTT butterfly. Here, we preprocess the roots of unity and modular inverse offline; 3303 is the modular inverse of 128 in $\mathbb{Z}_q$. Through this method, the mixed-radix $2^1/4^3$ INTT architecture can eliminate 75% of the computational workload at the final stage.
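Two of the ideas in this subsection lend themselves to a quick check in software: folding the final $128^{-1}$ scaling into precomputed twiddle factors, and treating the bit-reversal reordering as a fixed wiring pattern. The Python sketch below is illustrative only. It uses a radix-2 inverse butterfly for brevity, whereas the paper applies the idea to the radix-4 final stage. Reading the 75% figure as "three of the four radix-4 outputs already pass through a twiddle multiplier" is an assumption on our part. All names are hypothetical.

```python
import random

Q = 3329
N_INV = pow(128, -1, Q)        # modular inverse of the 128-point transform size


def gs_butterfly(a, b, w):
    """Radix-2 inverse-style butterfly: (a + b, (a - b) * w) mod Q."""
    return (a + b) % Q, (a - b) * w % Q


def final_stage_separate(a, b, w):
    """Final stage followed by a separate 1/128 scaling on both outputs."""
    x, y = gs_butterfly(a, b, w)
    return x * N_INV % Q, y * N_INV % Q


def final_stage_folded(a, b, w_scaled):
    """Same result when 1/128 is folded offline into the twiddle constant."""
    x, y = gs_butterfly(a, b, w_scaled)
    return x * N_INV % Q, y        # only the twiddle-free output needs an extra multiply


def bit_reverse(i, bits=7):
    """Bit-reversed index for a 128-point transform (7 address bits)."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)


rng = random.Random(0)
a, b, w = rng.randrange(Q), rng.randrange(Q), rng.randrange(1, Q)
w_scaled = w * N_INV % Q           # precomputed offline
print(N_INV)                                                            # 3303
print(final_stage_separate(a, b, w) == final_stage_folded(a, b, w_scaled))
print([bit_reverse(i) for i in range(4)])                               # [0, 64, 32, 96]
```

Because `bit_reverse` is its own inverse and depends only on the index, a fully unrolled datapath can realize it as a fixed permutation of wires, consistent with the statement that the permutation module needs no logic gates.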

3.4. SHA-3 Module

SHA-3 consists mainly of 24 rounds of iteration functions, where each round implements five types of operations: θ , ρ , π , χ , ι [19]. Due to the iterative and repetitive nature of hash operations, implementing a 24-stage fully unrolled-pipeline architecture for Kyber would consume excessive resources and cause pipeline bubbles. This is because the SHA-3 requires feedback processing for more than one absorbing or squeezing operation, leading to inefficient performance in a fully unrolled-pipeline architecture. Therefore, according to the proposed scheduling, we propose a pipelined architecture with 8 stages, as shown in Figure 8. The Keccak state register pads the 1344-bit input data to 1600 bits, and then processes the computation of 24 rounds over an 8-stage pipeline. This design approach can effectively address feedback issues during the SHA-3 operations.

3.5. Sampling Module

In addition to optimizing the hardware architecture of the multi-mode polynomial arithmetic module and SHA-3 modules, the sampling module also affects the overall throughput of Kyber. According to the computation scheduling, the SampleNTT module must complete its operation within 2 clock cycles. Therefore, we have designed the SampleNTT module to sample 84 bytes per clock cycle. As for SamplePolyCBD$_2$ and SamplePolyCBD$_3$, they must be able to sample 64 and 66 bytes per clock cycle, respectively.
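For readers unfamiliar with the two samplers, the following Python sketch shows their behavior in software, following the rejection-sampling and centered-binomial procedures as we understand them from the ML-KEM specification. It is a functional model only and says nothing about the cycle-level hardware design. The seed layout (`rho + bytes([j, i])`, `sigma + bytes([nonce])`) and the function names are assumptions of this sketch. Note that a 1344-bit SHAKE128 block is 168 bytes, so consuming 84 bytes per cycle drains one block in two cycles.

```python
import hashlib

Q, N = 3329, 256


def sample_ntt(rho, j, i, nbytes=504):
    """Rejection-sample 256 coefficients below Q from a SHAKE128 stream."""
    stream = hashlib.shake_128(rho + bytes([j, i])).digest(nbytes)
    coeffs = []
    for k in range(0, nbytes - 2, 3):
        b0, b1, b2 = stream[k:k + 3]
        d1 = b0 + 256 * (b1 % 16)          # low 12 bits of the 3-byte group
        d2 = (b1 // 16) + 16 * b2          # high 12 bits of the 3-byte group
        for d in (d1, d2):
            if d < Q and len(coeffs) < N:
                coeffs.append(d)
    if len(coeffs) < N:                    # too many rejections: extend the stream
        return sample_ntt(rho, j, i, nbytes * 2)
    return coeffs


def sample_poly_cbd(sigma, nonce, eta):
    """Centered binomial sampling from 64 * eta bytes of SHAKE256 output."""
    buf = hashlib.shake_256(sigma + bytes([nonce])).digest(64 * eta)
    bits = [(byte >> k) & 1 for byte in buf for k in range(8)]
    return [(sum(bits[2 * i * eta: 2 * i * eta + eta])
             - sum(bits[2 * i * eta + eta: 2 * i * eta + 2 * eta])) % Q
            for i in range(N)]


a_hat_00 = sample_ntt(bytes(32), 0, 0)
s0 = sample_poly_cbd(bytes(32), 0, 2)
print(len(a_hat_00), max(a_hat_00) < Q)
print(len(s0), sorted(set(s0)))
```

Each 3-byte group yields two 12-bit candidates, so an 84-byte slice carries 56 candidates, of which roughly $3329/4096 \approx 81\%$ are accepted on average.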

3.6. Compress Module

Due to the different values of $d$ in (2), the Compress function can be configured into three modes: Compress$_1$, Compress$_4$, and Compress$_{10}$. For Compress$_1$, the computation can be seen as compressing to 1 if the input $x$ is between 833 and 2496; otherwise, it is compressed to 0. Therefore, it only requires comparators and does not need additional optimization. Hence, our optimization focuses on Compress$_4$ and Compress$_{10}$. Taking Compress$_{10}$ as an example, $2^{10}/3329$ is approximately 0.3076, represented in binary as $0.0100111011_2$. To ease the hardware implementation, it can be approximated as shown in (16):
$$\frac{2^{10} \cdot x}{3329} \approx \frac{315 \cdot x}{1024}. \tag{16}$$
According to (2), a rounding operation is necessary. Since (16) already approximates the computation, the rounding can be absorbed into a single additive constant in the numerator. The simplified forms of the Compress$_4$ and Compress$_{10}$ functions can then be expressed as (17) and (18), respectively:
$$\mathrm{Compress}_4(x) = \left\lfloor \frac{2^{4} \cdot x + 1664}{3329} \right\rfloor = \left\lfloor \frac{315 \cdot x + 32{,}701}{65{,}536} \right\rfloor \tag{17}$$
$$\mathrm{Compress}_{10}(x) = \left\lfloor \frac{2^{10} \cdot x + 1664}{3329} \right\rfloor \approx \left\lfloor \frac{315 \cdot x + 484}{1024} \right\rfloor. \tag{18}$$
Even though Compress$_{10}$ does not have an optimal approximation expression after simplification, the result of (18) is still very close to the correct result. Therefore, a simple comparator can be used for error correction to filter out incorrect terms, as shown in Figure 9. For Compress$_4$, the optimal rounding constant was found to be 32,701. Therefore, no further error correction is necessary, as the result of (17) equals (2) using fixed-point representations.
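The claims about the division-free approximations can be checked exhaustively over all 3329 inputs. The Python sketch below is a software check of the arithmetic, not a model of the circuit in Figure 9; the function names are hypothetical. It compares the shift-based forms against the exact rounded division before the final reduction modulo $2^d$ and lists the inputs on which the Compress$_{10}$ approximation would need correction.

```python
Q = 3329


def compress_exact(x, d):
    """Exact rounded value floor((2^d * x + 1664) / 3329), before the mod 2^d step."""
    return ((x << d) + 1664) // Q


def compress4_approx(x):
    """Division-free form for d = 4: multiply by 315, add 32701, shift by 16."""
    return (315 * x + 32701) >> 16


def compress10_approx(x):
    """Division-free form for d = 10: multiply by 315, add 484, shift by 10."""
    return (315 * x + 484) >> 10


def compress1_comparator(x):
    """d = 1 reduces to a range comparison."""
    return 1 if 833 <= x <= 2496 else 0


mismatch4 = [x for x in range(Q) if compress4_approx(x) != compress_exact(x, 4)]
mismatch10 = [x for x in range(Q) if compress10_approx(x) != compress_exact(x, 10)]
max_err10 = max(abs(compress10_approx(x) - compress_exact(x, 10)) for x in range(Q))

print(len(mismatch4))              # inputs where the d = 4 form is wrong
print(len(mismatch10), max_err10)  # inputs needing correction for d = 10, and worst error
```

The multiplier 315 works for both modes because $315 \cdot 3329 = 1{,}048{,}635$ is within 59 of $2^{20}$, so $315/2^{16}$ and $315/2^{10}$ closely track $2^{4}/3329$ and $2^{10}/3329$.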

4. Experimental Results

Firstly, as mentioned in Section 3, we found opportunities for parallel processing between SHA-3, sampling, and NTT operations. Therefore, we proposed a computational scheduling for reducing the computation time and improving the overall throughput. The speedup of the proposed computation scheduling compared to that of the original pseudocode is shown in Figure 10. Notably, the speedup in computation latency can only be observed when using the same hardware module. For the key generation phase, our proposed scheduling reduces the total computational latency to approximately 140 clock cycles. This speedup is achieved by parallelizing the sampling and SHA-3 operations. Specifically, the computational latency of the sampling is fully absorbed by the SHA-3 module. As a result, our proposed computational scheduling significantly reduces the computation time, achieving an average reduction of about 23% compared to the pseudocode scheduling.
Next, to validate the hardware cost and performance of the proposed design, we use Synopsys Design Compiler with the TSMC 40 nm standard cell library for logic synthesis. The power consumption is estimated by gate-level simulation at 500 MHz. Table 1 shows the implementation results of the proposed design and state-of-the-art works, and Table 2 gives a detailed breakdown of the area usage within our design. The polynomial arithmetic unit and the SHA-3 engine account for a significant portion of the area (40.1% and 26.5%, respectively) because both modules are fully unrolled and deeply pipelined to maximize throughput. This design decision trades area for speed and is the main contributor to our high throughput of 877.192 kOPS. The “Others” category, which accounts for 28.2% of the total area, includes controller logic and registers.
To provide a comprehensive evaluation, we compare our architecture with prior works that pursue both similar and differing design goals. Ref. [8] adopts a RISC-V processor with an extended instruction set to enhance hardware computation flexibility, so it focuses on general-purpose architectural enhancements; despite specific optimizations, this approach falls short in improving overall computational efficiency. Refs. [11,12], on the other hand, are designed specifically for low-power scenarios, such as energy-constrained IoT and edge devices. Given the differing application contexts, a direct comparison of raw throughput and area may not fully reflect the effectiveness of each design. We therefore also consider area efficiency and power efficiency as key metrics.
The works [6,17] optimize the computations and modules required for Kyber to achieve higher throughput and area efficiency. Notably, the two configurations of [17] marked c and d in Table 1 can accelerate all operations required by Kyber, resulting in superior performance across various metrics. Compared with these designs, the implementation results in Table 1 show that the proposed design achieves the highest throughput, 877.192 kilo-operations per second (kOPS) for Kyber KEM, significantly outperforming all prior designs. Although the area of our design is significantly larger than that of lightweight implementations such as [11,12], which are optimized for energy-constrained IoT and edge devices, our architecture achieves the highest area efficiency of 0.276 kOPS/kGE, demonstrating excellent hardware utilization. The proposed design also achieves the highest power efficiency of 2272.73 kOPS/W, indicating that the performance gains do not come at the cost of wasted energy; both speed and energy per operation are improved.
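The relationship between the reported metrics can be reproduced with a few lines of arithmetic. The Python sketch below uses the values read from the "This Work" row of Table 1 (0.57 kCCs at 500 MHz, 2272.73 kOPS/W) and the module areas of Table 2. It assumes that throughput is the reciprocal of the total time and that area efficiency is throughput divided by the logic gate count; the function and variable names are illustrative.

```python
# Module areas in kGE, as listed in Table 2.
AREA_KGE = {
    "SHA-3": 843.495,
    "Multi-mode Polynomial Arithmetic Unit": 1276.383,
    "CBD Sampler": 6.366,
    "Compression/Decompression": 159.15,
    "Others": 897.606,
}


def kem_metrics(cycles_kcc, freq_mhz, area_kge, power_eff_kops_per_w):
    """Derive time, throughput, area efficiency, and energy per operation."""
    total_time_us = cycles_kcc * 1e3 / freq_mhz       # cycles / (cycles per microsecond)
    throughput_kops = 1e3 / total_time_us             # one KEM run per total_time_us
    area_eff = throughput_kops / area_kge             # kOPS per kGE
    energy_uj_per_op = 1e3 / power_eff_kops_per_w     # reciprocal of kOPS/W, in microjoules
    return total_time_us, throughput_kops, area_eff, energy_uj_per_op


if __name__ == "__main__":
    total_area = sum(AREA_KGE.values())
    for name, kge in AREA_KGE.items():
        print(f"{name}: {100 * kge / total_area:.1f}% of {total_area:.0f} kGE")
    t_us, kops, eff, uj = kem_metrics(0.57, 500.0, total_area, 2272.73)
    print(f"time {t_us:.2f} us, throughput {kops:.3f} kOPS, "
          f"area eff. {eff:.3f} kOPS/kGE, energy {uj:.2f} uJ/op")
```

Under these assumptions the sketch gives 1.14 μs, about 877.193 kOPS (agreeing with the reported 877.192 kOPS to within the last digit), 0.276 kOPS/kGE, and roughly 0.44 μJ per operation.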
In summary, while refs. [11,12] offer compact and ultra-low-power solutions tailored for edge applications, and ref. [8] demonstrates higher flexibility through a RISC-V based general-purpose design, these approaches sacrifice performance either because they are not optimized for Kyber, or because they are designed for low-resource environments. In contrast, our design targets performance-critical scenarios such as data centers or high-throughput systems, where maximizing throughput, area efficiency, and power efficiency is essential.

5. Conclusions

To meet the growing demand for secure high-throughput data transmission in cloud services, we propose an optimized CRYSTALS-Kyber hardware design that parallelizes NTT, SHA-3, and sampling operations for improved utilization and reduced latency. Synthesized under the TSMC 40 nm node, our design achieves a throughput of 877.192 kOPS and demonstrates the highest area efficiency among existing works at 0.276 kOPS/kGE. In addition, it achieves the highest power efficiency of 2272.73 kOPS/W, ensuring that performance gains are delivered without excessive energy consumption. These results highlight the suitability of our architecture for server-side applications, where maximizing both computational throughput and energy efficiency is essential for large-scale, performance-critical workloads.
While this work focuses on throughput-oriented architectural optimization, secure hardware deployment in real-world applications also requires consideration of side-channel resistance and fault tolerance. In future work, we plan to explore integrating basic countermeasures to enhance robustness against such attacks, depending on specific deployment scenarios.
Although our current implementation targets Kyber512, the proposed architecture is extensible to support Kyber768 and Kyber1024. These higher security levels would primarily require additional SHA-3 executions and sampling operations to generate more polynomials, as well as increased NTT computations due to the larger matrix dimensions. Modifications to the control logic and expanded on-chip memory would also be necessary to support the storage and processing of a larger number of polynomials. The Compress module can likewise be extended to support parameters such as d = 5 and d = 11, by applying the same fixed-point approximation and rounding techniques presented in this work. These extensions can be integrated without a fundamental redesign of the architecture, and we plan to investigate them further in future research.

Author Contributions

Conceptualization, S.-H.C., Y.-H.Y., and W.-L.C.; Methodology, S.-H.C., Y.-H.Y., and W.-L.C.; Software, S.-H.C., Y.-H.Y., and W.-L.C.; Validation, S.-H.C., Y.-H.Y., and W.-L.C.; Formal analysis, S.-H.C., Y.-H.Y., and W.-L.C.; Investigation, S.-H.C., Y.-H.Y., and W.-L.C.; Resources, W.-L.C.; Data curation, W.-L.C.; Writing—original draft, W.-L.C.; Writing—review and editing, W.-L.C.; Visualization, C.C., C.-Y.T., P.-L.T., and W.-L.C.; Supervision, W.-L.C., C.C., C.-Y.T., and P.-L.T.; Project administration, W.-L.C.; Funding acquisition, W.-L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Science and Technology Council (NSTC), Taiwan, under Grant No. NSTC 114-2221-E-006-184-MY2.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CBD: Centered Binomial Distribution
ECC: Elliptic Curve Cryptography
HTTP: HyperText Transfer Protocol
INTT: Inverse Number Theoretic Transform
KEM: Key Encapsulation Mechanism
LWE: Learning With Errors
MLWE: Module Learning With Errors
NIST: National Institute of Standards and Technology
NTT: Number Theoretic Transform
PKE: Public-Key Encryption
PQC: Post-Quantum Cryptography
QUIC: Quick UDP Internet Connections
RLWE: Ring Learning With Errors
RSA: Rivest–Shamir–Adleman
SHA-3: Secure Hash Algorithm 3
TLS: Transport Layer Security
VPN: Virtual Private Network

References

  1. Shor, P.W. Algorithms for Quantum Computation: Discrete Logarithms and Factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, NM, USA, 20–22 November 1994; pp. 124–134.
  2. Avanzi, R.; Bos, J.; Ducas, L.; Kiltz, E.; Lepoint, T.; Lyubashevsky, V.; Schanck, J.M.; Schwabe, P.; Seiler, G.; Stehlé, D. CRYSTALS-Kyber: Algorithm Specifications and Supporting Documentation; Version 3.01; NIST (National Institute of Standards and Technology): Gaithersburg, MD, USA, 2021.
  3. FIPS PUB 203; Specification for the Module-Lattice-Based Key-Encapsulation Mechanism Standard. National Institute of Standards and Technology: Gaithersburg, MD, USA, 2023.
  4. Moody, D.; Alagic, G.; Apon, D.; Cooper, D.; Dang, Q.; Kelsey, J.; Liu, Y.; Miller, C.; Peralta, R.; Perlner, R.; et al. Status Report on the Second Round of the NIST Post-Quantum Cryptography Standardization Process; NIST Interagency/Internal Report (NISTIR) 8309; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2020.
  5. Nejatollahi, H.; Dutt, N.; Ray, S.; Regazzoni, F.; Banerjee, I.; Cammarota, R. Post-Quantum Lattice-Based Cryptography Implementations: A Survey. ACM Comput. Surv. 2019, 51, 129.
  6. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. Instruction-Set Accelerated Implementation of CRYSTALS-Kyber. IEEE Trans. Circuits Syst. I Reg. Pap. 2021, 68, 4648–4659.
  7. Huang, Z.; Chen, S.; Sun, P.; Deng, D.; Sun, G. An Efficient and Low-Cost Design of Modular Reduction for CRYSTALS-Kyber. Electronics 2025, 14, 2309.
  8. Xin, G.; Han, J.; Yin, T.; Zhou, Y.; Yang, J.; Cheng, X.; Zeng, X. VPQC: A Domain-Specific Vector Processor for Post-Quantum Cryptography Based on RISC-V Architecture. IEEE Trans. Circuits Syst. I Reg. Pap. 2020, 67, 2672–2684.
  9. Chauhan, S.; Shrestha, R. Reconfigurable and Hardware-Efficient KECCAK Architecture with SHAKE Integration and Dynamic Input Processing for Post Quantum Cryptography. In Proceedings of the 2025 International VLSI Symposium on Technology, Systems and Applications (VLSI TSA), Hsinchu, Taiwan, 21–24 April 2025; pp. 1–4.
  10. Kieu-Do-Nguyen, B.; The Binh, N.; Pham-Quoc, C.; Nghi, H.P.; Tran, N.-T.; Hoang, T.-T.; Pham, C.-K. Compact and Low-Latency FPGA-Based Number Theoretic Transform Architecture for CRYSTALS-Kyber Postquantum Cryptography Scheme. Information 2024, 15, 400.
  11. Li, A.; Lu, J.; Liu, D.; Yang, S.; Huang, T.; Zhang, J.; Xiong, S.; Yang, C.; Li, X. A 273 μW 0.34 mm² Efficient CRYSTALS-KYBER Processor for PQC Towards Edge Computing. In Proceedings of the 2024 IEEE European Solid-State Electronics Research Conference (ESSERC), Bruges, Belgium, 9–12 September 2024; pp. 472–475.
  12. Li, A.; Lu, J.; Liu, D.; Li, X.; Yang, S.; Huang, T.; Zhang, J.; Xiong, S.; Yang, C. A 40 nm 2.76 μJ/Op Energy-Efficient Secure Post-Quantum Crypto-Processor for CRYSTALS-Kyber on Module-LWE. In Proceedings of the 2023 IEEE Asian Solid-State Circuits Conference (A-SSCC), Haikou, China, 5–8 November 2023; pp. 1–3.
  13. Ni, Z.; Khalid, A.; Kundi, D.-S.; O’Neill, M.; Liu, W. HPKA: A High-Performance CRYSTALS-Kyber Accelerator Exploring Efficient Pipelining. IEEE Trans. Comput. 2023, 72, 3340–3353.
  14. Nguyen, T.T.; Kim, S.; Eom, Y.; Lee, H. Area-Time Efficient Hardware Architecture for CRYSTALS-Kyber. Appl. Sci. 2022, 12, 5305.
  15. Shimada, T.; Ikeda, M. High-Speed and Energy-Efficient Crypto-Processor for Post-Quantum Cryptography CRYSTALS-Kyber. In Proceedings of the 2022 IEEE Asian Solid-State Circuits Conference (A-SSCC), Taipei, Taiwan, 6–9 November 2022; pp. 12–14.
  16. Sideris, A.; Dasygenis, M. Enhancing the Hardware Pipelining Optimization Technique of the SHA-3 via FPGA. Computation 2023, 11, 152.
  17. Zhao, Y.; Xie, R.; Xin, G.; Han, J. A High-Performance Domain-Specific Processor with Matrix Extension of RISC-V for Module-LWE Applications. IEEE Trans. Circuits Syst. I Reg. Pap. 2022, 69, 2871–2884.
  18. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. High-Speed NTT-Based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography. In Proceedings of the 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, Denmark, 14–16 June 2021; Paper 2021/563.
  19. FIPS PUB 202; SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. National Institute of Standards and Technology: Gaithersburg, MD, USA, 2015.
  20. Albrecht, M.R.; Deo, A. Large Modulus Ring-LWE ≥ Module-LWE. In International Conference on the Theory and Application of Cryptology and Information Security; Paper 2017/612; Springer International Publishing: Cham, Switzerland, 2017.
  21. Guo, W.; Li, S. Highly-Efficient Hardware Architecture for CRYSTALS-Kyber with a Novel Conflict-Free Memory Access Pattern. IEEE Trans. Circuits Syst. I Reg. Pap. 2023, 70, 4505–4515.
  22. Zhang, N.; Yang, B.; Chen, C.; Yin, S.; Wei, S.; Liu, L. Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2, 49–72.
  23. Duong-Ngoc, P.; Lee, H. Configurable Mixed-Radix Number Theoretic Transform Architecture for Lattice-Based Cryptography. IEEE Access 2022, 10, 12732–12741.
  24. Nguyen, T.-H.; Nguyen, H.-D.; Chen, J.-Y.; Lin, Y.-H. An Area-Time Efficient Hardware Architecture for ML-KEM Post-Quantum Cryptography Standard. IEEE Access 2025, 13, 103834–103847.
Figure 1. Proposed computational scheduling.
Figure 2. Proposed architecture for Kyber KEM. The data flow of key generation is also illustrated.
Figure 3. The block diagram of the proposed mixed-radix NTT architecture.
Figure 4. Hardware architecture of radix-2 butterfly.
Figure 5. Hardware architecture of radix-4 butterfly.
Figure 6. Order adjustment for the coefficient permutation.
Figure 7. The final stage of the radix-4 INTT butterfly.
Figure 8. Proposed 8-stage pipelined SHA-3 architecture.
Figure 9. Proposed architecture of the Compress module.
Figure 10. The speedup of proposed computational scheduling.
Table 1. Comparison of synthesis results with state-of-the-art.
WorkTech. (nm)Freq. (MHz)Logic Gates (kGE) bMemory (kB)Cycles a (kCCs)Total Time a ( μ s)Throughput a (kOPS)Area Eff. a (kOPS/kGE)Power Eff. a (kOPS/W)
[8]2830097912144.54822.0740.00278.13
[17]2854069724.7572.21347.4620.011900.01
[6]652009580211059.5230.091
[17] c2854062336.757.61471.4280.114
[17] d2854062336.7561190.9090.146
[15]6533613706.13212.977.5190.0566
[24]180931607.377.782.812.0770.075156.25
[12]401155321412.245106.499.3910.018362.32
[11]40270253721.14378.3112.2700.0511388.89
This Work4050031830.571.14877.1920.2762272.73
a The cycles, total time, throughput, area efficiency, and power efficiency are the results of (Key-Generation + Encapsulation + Decapsulation).
b The area is represented by the two-input NAND equivalent gate counts for fair comparison, where a two-input NAND gate is 0.68 μm².
c [17] with in-order instruction execution, without dynamic scheduling.
d [17] with out-of-order instruction execution, with dynamic scheduling.
Eff.: Efficiency.
Table 2. Area breakdown of major modules in the proposed design.
Module | Logic Gates (kGE) | Area (%)
SHA-3 | 843.495 | 26.5%
Multi-mode Polynomial Arithmetic Unit | 1276.383 | 40.1%
CBD Sampler | 6.366 | 0.2%
Compression/Decompression | 159.15 | 5.0%
Others | 897.606 | 28.2%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Chou, S.-H.; Yang, Y.-H.; Chin, W.-L.; Chen, C.; Tsao, C.-Y.; Tung, P.-L. High-Throughput Post-Quantum Cryptographic System: CRYSTALS-Kyber with Computational Scheduling and Architecture Optimization. Electronics 2025, 14, 2969. https://doi.org/10.3390/electronics14152969
