1. Introduction
Large-scale quantum computing poses a direct threat to the public-key cryptosystems that currently secure internet communications, financial transactions, and critical infrastructures. Classical schemes such as RSA and Elliptic-Curve Cryptography (ECC) rely on the hardness of integer factorization and discrete logarithms, problems that can be solved in polynomial time on a quantum computer using Shor’s algorithm [1]. Once cryptanalytically relevant quantum computers become available, today’s public-key infrastructure will no longer provide adequate security [2], motivating the design and deployment of Post-Quantum Cryptography (PQC).
Over the last decade, several mathematical families have emerged as leading candidates for PQC, including lattice-, code-, isogeny-, multivariate-, and hash-based constructions. A key milestone was the report on Post-Quantum Cryptography published by the U.S. National Institute of Standards and Technology (NIST) (NISTIR 8105 [3]), which set out the initial threat assessment and transition considerations and announced the plan for an open standardization process. Comprehensive surveys such as [4] synthesize these algorithmic families and discuss directions for the transition process. After a multi-year public evaluation process, NIST selected CRYSTALS-Kyber for key establishment and CRYSTALS-Dilithium, Falcon, and SPHINCS+ for digital signatures. NIST has since published the first Federal Information Processing Standards (FIPS) for these schemes: ML-KEM (FIPS 203) [5], corresponding to CRYSTALS-Kyber; ML-DSA (FIPS 204) [6], corresponding to CRYSTALS-Dilithium; and SLH-DSA (FIPS 205) [7], corresponding to SPHINCS+. Falcon is being standardized separately as FN-DSA in FIPS 206 (currently in development). In parallel, additional code-based proposals such as HQC [8] and BIKE [9] were kept under evaluation to provide algorithmic diversity in case new attacks on structured lattices appear; HQC has since been selected for standardization as a backup KEM.
While the asymptotic security of these schemes has been analyzed extensively, their practical performance on real platforms remains a critical factor for adoption. Implementations must cope with larger keys and signatures, different memory access patterns, and more complex arithmetic than traditional RSA/ECC, which can stress constrained devices and high-throughput servers alike. Recent surveys and measurement studies emphasize that deployment decisions hinge not only on security guarantees but also on execution time, memory footprint, throughput, and communication overhead in specific environments. At the same time, a broader literature is emerging on how quantum computing interacts with cybersecurity and sustainability in industrial systems, analyzing its role in next-generation quantum security architectures [2], in circular economy and Industry 4.0 settings [10], and in energy efficiency and environmental performance [11]. In parallel, recent work has begun to measure the impact of PQC on constrained devices and Industrial Internet of Things (IIoT) scenarios [12], where the computational and communication overhead of new primitives is particularly critical. These perspectives reinforce the importance of understanding the concrete resource costs associated with PQC adoption.
A growing body of work has begun to quantify these costs. In [13], Abbasi et al. present a cross-platform benchmark of NIST-selected and candidate algorithms, such as Kyber, Dilithium, and BIKE, across heterogeneous computing environments, measuring latency, memory usage, and protocol overhead at multiple security levels. Other studies evaluate PQC in specific protocol contexts: Paquin et al. [14] and Sikeridis et al. [15] analyze the impact of hybrid and post-quantum primitives on TLS 1.3 handshakes, Montenegro et al. [16] propose a performance evaluation framework for post-quantum TLS configurations combining classical and PQC cipher suites, and Juaristi et al. [17] benchmark PQC in Ethereum-based blockchains, focusing on transaction-level costs. Complementary work targets resource-constrained platforms, assessing the feasibility of PQC on microcontrollers and IoT devices and measuring its impact in IIoT scenarios where computational and communication overheads may be particularly restrictive. These contributions collectively show that performance varies substantially across algorithm families, parameter sets, and deployment scenarios and that no single primitive dominates along all dimensions. Recent experimental work also emphasizes that primitive-level and deployment-level costs can diverge substantially once post-quantum schemes are integrated into real protocols and services. In particular, hybrid TLS 1.3 deployments that combine classical and post-quantum mechanisms (and, in some proposals, additional key material sources) show that operational overhead depends not only on the primitive’s standalone cost but also on how it is integrated into the handshake and the surrounding protocol logic [18]. Complementary TLS-oriented analyses [19] further illustrate how updated post-quantum standardization outcomes and candidate diversity (e.g., the selection of HQC for key establishment alongside ongoing evaluation of code-based alternatives) motivate continuous performance validation in deployment-relevant settings.
Despite this progress, there is still value in controlled, platform-level benchmarks that (i) compare multiple families of key encapsulation mechanisms (KEMs) and signature schemes under a unified methodology, (ii) use commodity general-purpose processors representative of client and server machines, and (iii) include both standardized algorithms and research-stage proposals that may influence future designs. In particular, lattice-based schemes such as Kyber, Dilithium, and HAWK; code-based KEMs such as HQC and BIKE; isogeny-based signatures such as SQISign; multivariate signatures such as SNOVA; and the hash-based SPHINCS+ offer very different trade-offs between key and signature size, computational cost, and implementation complexity. A comparative evaluation of these schemes on typical desktop-class CPUs, in a networked setting, can help bridge the gap between algorithm design and system deployment.
This paper contributes to that effort with an experimental evaluation of eight post-quantum primitives: three key encapsulation mechanisms (Kyber/FIPS 203, HQC, and BIKE) and five digital signature schemes (CRYSTALS-Dilithium/FIPS 204, HAWK, SQISign, SNOVA, and SPHINCS+/FIPS 205). We implement a TCP client–server testbed in Python that invokes compiled C implementations of each primitive and measure key generation, encapsulation/decapsulation, and signature generation/verification times on two commodity processors: an AMD Ryzen 7 4000 series (8 cores, 16 threads, 1.8 GHz base frequency) and an Intel Core i5-1035G1 (4 cores, 8 threads, 1.0 GHz base frequency), both running Windows 11. Each operation is executed ten times under controlled load, and the results are aggregated to obtain stable performance indicators.
The main contributions of this work are as follows:
A unified benchmarking framework for post-quantum key encapsulation mechanisms (KEM) and digital signatures in a client–server setting, suitable for evaluating networked cryptographic operations on general-purpose processors;
A comparative performance study of five signature schemes and three KEMs spanning lattice-based, code-based, isogeny-based, multivariate-based, and hash-based families, using consistent metrics (key generation, encapsulation/decapsulation, signing, verification, key sizes) across two hardware platforms;
An analysis of algorithm–platform interactions, highlighting how architectural differences between AMD Ryzen 7 4000 and Intel Core i5-1035G1 influence the relative cost of PQC operations and the practicality of different schemes for deployment in networked information systems.
Prior benchmarking studies often focus only on NIST-selected schemes, a single protocol embedding (e.g., TLS), or constrained platforms (IoT/microcontrollers). In contrast, this study aims at a like-for-like comparison of both standardized baselines (e.g., Kyber/ML-KEM, Dilithium/ML-DSA, SPHINCS+/SLH-DSA) and research-stage designs (e.g., BIKE, HQC, HAWK, SQISign, SNOVA) executed under the same orchestration pipeline, measurement boundary, and reporting format. This consolidated view is intended to help practitioners understand how non-standardized alternatives compare to today’s standardization baseline on mainstream Windows client/server hardware, including both timing and artefact-size trade-offs. The insights derived from these measurements complement existing PQC benchmarking and protocol integration studies and provide empirical guidance for practitioners selecting and implementing quantum-resistant primitives on mainstream hardware.
From a practical standpoint, the resulting benchmark data support two immediate needs in quantum-resistant planning: first, evidence-based primitive selection under a consistent measurement boundary across multiple PQC families and parameter sets, and second, capacity planning on mainstream client/server hardware by showing how observed overheads vary across operations and across representative CPU platforms. These outcomes are intended to help practitioners anticipate computational and artefact-size trade-offs when integrating post-quantum primitives into networked services.
The rest of this article is organized as follows.
Section 2 reviews the relevant background and related work on post-quantum key encapsulation mechanisms and digital signature schemes.
Section 3 describes the experimental methodology, including the client–server testbed, hardware platforms, and performance metrics.
Section 4 presents and analyzes the benchmarking results for the evaluated KEMs and signature schemes on both processors.
Section 5 discusses the main findings, practical implications, and limitations of this study.
Section 6 concludes this paper and outlines directions for future work.
2. Background and Related Work
2.1. Quantum Computing and Its Impact on Cryptography
Quantum computing has introduced a paradigm shift in computational capabilities. By exploiting quantum superposition and entanglement, quantum devices can, in principle, solve specific problems much more efficiently than classical computers. One of the most critical implications for information security is Shor’s algorithm, which can solve integer factorization and discrete logarithm problems in polynomial time on a sufficiently powerful quantum computer. As a result, widely deployed public-key schemes such as RSA and ECC would become vulnerable once cryptanalytically relevant quantum computers are available.
This threat motivates a transition towards cryptographic schemes that are secure against both classical and quantum adversaries. The challenge is not only mathematical but also practical: new schemes must be integrated into existing protocols and infrastructures without unacceptable performance or interoperability penalties.
2.2. Post-Quantum Cryptography and the NIST Standardization Process
Recognizing the vulnerabilities posed by quantum computing, NIST launched a public process to standardize PQC. NISTIR 8105 outlined the initial threat assessment and transition considerations and announced the plan for an open evaluation of candidate schemes. Over several rounds, NIST assessed submissions based on security, performance, implementation characteristics, and suitability for different applications.
As summarized in recent surveys such as [4], five main mathematical families have emerged as leading candidates: lattice-based cryptography (e.g., Kyber, CRYSTALS-Dilithium, Falcon), code-based cryptography (e.g., HQC, BIKE), isogeny-based cryptography (e.g., SQISign), multivariate-based cryptography (e.g., SNOVA), and hash-based cryptography (e.g., SPHINCS+).
After a multi-year public evaluation process, NIST selected CRYSTALS-Kyber as the primary key establishment mechanism and CRYSTALS-Dilithium, Falcon, and SPHINCS+ as digital signature schemes. It has published the first Federal Information Processing Standards (FIPS) for three of these algorithms, namely ML-KEM (Kyber), ML-DSA (Dilithium), and SLH-DSA (SPHINCS+), while Falcon is being standardized separately as FN-DSA in FIPS 206 (currently in development). Additional code-based proposals, such as HQC and BIKE, were kept under evaluation to provide algorithmic diversity in case new structural weaknesses are found in lattice-based designs, and HQC has since been selected for standardization as a backup KEM. Each family presents distinct trade-offs in terms of key and ciphertext size, signature size, computational complexity, and implementation constraints, which makes empirical performance evaluation essential for deployment decisions.
2.3. Post-Quantum Key Encapsulation Mechanisms
KEMs are crucial for establishing shared secrets over untrusted networks. In the PQC context, lattice-based and code-based constructions are among the most prominent KEM families.
Kyber is a lattice-based KEM built on the Module-LWE problem. It was selected by NIST as the primary standard for key establishment due to its strong security arguments, relatively compact keys and ciphertexts, and efficient implementations on a wide range of platforms. Its structured lattice design allows for vectorized and constant-time implementations, making it attractive for high-performance servers and constrained devices alike.
HQC and BIKE are code-based KEMs in the line of work initiated by McEliece. They use quasi-cyclic codes to reduce key sizes compared with classical code-based proposals while retaining conservative security assumptions. HQC relies on the hardness of decoding random quasi-cyclic codes in the Hamming metric (syndrome decoding), whereas BIKE is based on the hardness of decoding quasi-cyclic moderate-density parity-check (QC-MDPC) codes. Both schemes provide valuable algorithmic diversity but typically exhibit larger public keys or higher computational cost than lattice-based KEMs, which motivates careful performance benchmarking on different hardware platforms.
2.4. Post-Quantum Digital Signatures
Digital signatures provide authenticity, integrity, and non-repudiation and are a critical component of public-key infrastructures, software update mechanisms, and many authentication protocols. In the post-quantum setting, several families of signature schemes have been proposed and analyzed.
First, CRYSTALS-Dilithium is a lattice-based signature scheme built on the Module-LWE and Module-SIS problems. It was standardized by NIST as ML-DSA and is widely regarded as a strong default choice due to its favorable balance between security, signature size, and computational efficiency. Second, SPHINCS+ is a stateless hash-based signature scheme that relies only on the security of its underlying hash functions. It offers strong long-term security and robustness against advances in number-theoretic cryptanalysis at the cost of relatively large signatures and higher signing time, especially at higher security levels. Third, SQISign is an isogeny-based signature scheme that achieves very compact signatures and public keys by leveraging hard problems in the isogeny graphs of supersingular elliptic curves or abelian varieties. However, it typically entails higher computational cost and a more complex implementation than lattice-based designs. Fourth, SNOVA is a multivariate signature scheme derived from Unbalanced Oil and Vinegar (UOV) that operates over a noncommutative matrix ring in order to reduce public-key size. Fifth and last, HAWK is a lattice-based hash-and-sign signature scheme whose security rests on the module Lattice Isomorphism Problem; it is designed to produce compact signatures while maintaining strong security guarantees. Its structure and parameter choices differ from those of Dilithium, providing an additional design point within the lattice-based family.
These schemes illustrate the diversity of design approaches and efficiency profiles in post-quantum digital signatures, underscoring the need for comparative performance studies under realistic conditions.
2.5. Performance Evaluation of PQC Implementations
Beyond asymptotic complexity, the adoption of PQC depends critically on measured performance in specific environments. Recent work has begun to quantify the cost of PQC primitives in various deployment scenarios.
Abbasi et al. [13] present a cross-platform benchmark of NIST-selected and candidate algorithms, such as Kyber, Dilithium, and BIKE, across heterogeneous computing environments, measuring latency, memory usage, and protocol overhead at multiple security levels. Paquin et al. [14] study the impact of post-quantum and hybrid key exchange and authentication mechanisms on TLS 1.3 handshakes, providing empirical data for protocol-level design choices. In [16], Montenegro et al. propose a performance evaluation framework for post-quantum TLS configurations combining classical and PQC cipher suites, and Juaristi et al. [17] benchmark PQC in Ethereum-based blockchains, focusing on transaction-level costs and throughput.
Complementary studies analyze PQC on microcontrollers and IoT devices, as well as in IIoT contexts (e.g., [12]), where CPU cycles, memory, and bandwidth are limited and even moderate overheads may be problematic. In parallel, broader work on quantum computing and cybersecurity examines how quantum-safe primitives interact with sustainability, energy efficiency, and Industry 4.0 architectures (e.g., [2,10,11]).
These studies collectively show that performance varies substantially across algorithm families, parameter sets, platforms, and protocol embeddings and that no single primitive is optimal along all dimensions. However, there remains a need for controlled, platform-level benchmarks that directly compare multiple KEMs and signature schemes on commodity general-purpose processors under a unified methodology, which is precisely the focus of the experimental study presented in this paper.
In this sense, our measurements complement (rather than replace) both microbenchmarks that isolate kernel-level costs and protocol-level studies (e.g., TLS) that include full handshake and network effects. Specifically, our results quantify a deployment-proximate invocation boundary for primitive operations under a unified orchestration workflow on commodity CPUs while explicitly excluding TCP transmission and request parsing/serialization, as defined in Section 3.2.
3. Methodology
This section describes the experimental design used to evaluate the performance of selected post-quantum KEMs and digital signature schemes on commodity general-purpose processors, using a unified measurement procedure and consistent metrics.
3.1. Evaluated Primitives and Measured Operations
We evaluate three KEMs, namely, Kyber, HQC, and BIKE, and five digital signature schemes, namely, CRYSTALS-Dilithium, HAWK, SQISign, SNOVA, and SPHINCS+ (see Table 1). For KEMs, we measure the execution time of the following operations:
Key generation: generation of a public/secret key pair;
Encapsulation (encryption): generation of a ciphertext and shared secret using the public key;
Decapsulation (decryption): derivation of the shared secret from the ciphertext using the secret key.
For digital signatures, we measure the following:
Key generation: generation of a public/secret signing key pair;
Signature generation (signing): creation of a signature over an input message;
Signature verification: verification of a signature using the corresponding public key.
In addition to execution time, we include size-related indicators (where available through the invoked implementations) for the main artefacts exchanged or stored by each primitive (e.g., public keys, secret keys, ciphertexts, and signatures). These size indicators are used to support the discussion of practical deployment trade-offs together with timing results.
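As a minimal sketch of how such size indicators can be collected, assuming each invoked implementation writes its artefacts to files in a working directory, the byte size of each file can be read after the operation completes. The file names below are hypothetical placeholders, not the names used by the evaluated implementations.

```python
from pathlib import Path

# Hypothetical artefact file names; the real names depend on each
# implementation's command-line interface or wrapper.
DEFAULT_ARTEFACTS = ("public_key.bin", "secret_key.bin", "ciphertext.bin", "signature.bin")


def artefact_sizes(workdir: Path, names=DEFAULT_ARTEFACTS) -> dict:
    """Return {file name: size in bytes} for the artefacts present in workdir."""
    sizes = {}
    for name in names:
        path = workdir / name
        if path.is_file():  # KEMs produce no signature, signatures no ciphertext
            sizes[name] = path.stat().st_size
    return sizes
```

Skipping missing files lets the same helper serve both KEMs and signature schemes without per-scheme configuration.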
3.2. Testbed Architecture and Implementation Approach
To orchestrate experiments in a way that resembles typical deployment patterns (client requesting cryptographic services from a server-side component), we implement a lightweight TCP client–server testbed.
A client component initiates requests specifying the primitive (e.g., Kyber, Dilithium) and the operation type (e.g., key generation, encapsulation, signing). Then, a server component performs the requested operation and returns the results needed to complete the measurement workflow (e.g., produced keys, ciphertexts, signatures, and/or verification outcomes depending on the operation).
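The request/response pattern described above can be sketched as follows. This is an illustrative sketch, not the testbed code in [20]: the newline-delimited JSON message format, the HANDLERS registry, and the field names are assumptions. It does, however, mirror the timing boundary used in this study, since the timer wraps only the handler call and not socket I/O or JSON parsing/serialization.

```python
import json
import socket
import time

# Hypothetical registry mapping (primitive, operation) to a callable; in the
# testbed such callables would invoke the compiled implementations.
HANDLERS = {}


def serve_one(sock: socket.socket) -> None:
    """Accept one connection, run the requested operation, reply with JSON."""
    conn, _ = sock.accept()
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())            # parsing: not timed
        handler = HANDLERS[(request["primitive"], request["op"])]
        start = time.perf_counter()                         # timing starts here
        result = handler(request["params"])
        elapsed = time.perf_counter() - start               # ...and stops here
        reply = {"ok": True, "elapsed_s": elapsed, "result": result}
        stream.write(json.dumps(reply).encode() + b"\n")    # serialization: not timed
        stream.flush()


def send_request(host: str, port: int, primitive: str, params: str, op: str) -> dict:
    """Client side: send one request line and read one reply line."""
    with socket.create_connection((host, port)) as conn, conn.makefile("rwb") as stream:
        message = {"primitive": primitive, "params": params, "op": op}
        stream.write(json.dumps(message).encode() + b"\n")
        stream.flush()
        return json.loads(stream.readline())
```

Running the server on the loopback address 127.0.0.1, as in the default configuration of the scripts, keeps network effects out of the picture; the reported elapsed_s already excludes them by construction.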
The testbed is implemented in Python as the orchestration layer. The actual cryptographic computations are performed by invoking compiled implementations corresponding to each evaluated primitive. Depending on the available implementation, primitives are executed either as standalone binaries (invoked via subprocess.run(…)) or as in-process dynamic libraries (DLLs) accessed via ctypes. This design keeps the orchestration code consistent across schemes while allowing each primitive to be executed using its own implementation. The testbed code is available at [20].
For reproducibility, in addition to providing the source code, we define the measurement boundary as follows. For each request, the client submits a primitive identifier, parameterization identifier, and operation type to the server. The server invokes the corresponding compiled implementation and records the wall-clock duration from immediately before invoking the operation (i.e., right before calling the executable through subprocess.run(…) or the corresponding DLL routine) to immediately after it returns (i.e., when the subprocess exits or the DLL call returns). This timing excludes TCP send/receive overhead and request parsing/serialization; it is taken after the request has been received and inputs are prepared. For executable-based schemes, it includes process start-up and the complete execution of the invoked binary; for DLL-based schemes, it covers only the duration of the library routine call. For signature generation/verification, the workload consists of signing/verifying a fixed-length 4-byte message, the UTF-8 encoding of the ASCII string “hola” (written to message.txt). For KEM operations, the workflow follows the standard key generation–encapsulation–decapsulation sequence, and correctness is verified by comparing the derived shared secrets. All reported results correspond to the parameterizations configured in the evaluated executables, summarized in Table 1.
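A minimal sketch of the executable-based boundary follows, assuming the primitive is a command-line program that reads message.txt from its working directory (the command line passed in is hypothetical). The message file is written before the timer starts, and the interval spans exactly one subprocess.run(…) call, so it includes process start-up. The sketch uses time.perf_counter(), a monotonic high-resolution clock, whereas the original scripts use time.time(), as noted in Section 3.4.

```python
import subprocess
import time
from pathlib import Path


def prepare_message(workdir: Path) -> Path:
    """Write the 4-byte signing workload (UTF-8 encoding of 'hola'); not timed."""
    path = workdir / "message.txt"
    path.write_bytes("hola".encode("utf-8"))
    return path


def timed_executable(cmd: list, workdir: Path):
    """Run one compiled primitive as a child process and time only that call.

    Returns (elapsed_seconds, CompletedProcess). The interval includes process
    start-up and the full run of the binary, but no input preparation.
    check=True raises on a non-zero exit, so failed runs are never recorded.
    """
    start = time.perf_counter()                # immediately before subprocess.run(...)
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, check=True)
    elapsed = time.perf_counter() - start      # immediately after the child exits
    return elapsed, proc
```

Because process creation is inside the interval, very fast primitives can be dominated by start-up cost; this is the integration-class effect discussed in Section 3.7.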
3.3. Hardware and Operating System Environment
All experiments are conducted on two commodity platforms representative of widely deployed client/server machines: (1) AMD Ryzen 7 4000 series (8 cores/16 threads, base frequency of 1.8 GHz), Windows 11, and (2) Intel Core i5-1035G1 (4 cores/8 threads, base frequency of 1.0 GHz), Windows 11. Details are included in Table 2.
Using two distinct CPUs allows us to observe how the relative cost of PQC operations varies with architectural characteristics (e.g., core count and base frequency) under the same operating system family and measurement tooling.
3.4. Timing Methodology and Execution Procedure
We report wall-clock execution time (seconds) for each primitive, parameterization, and operation type (key generation, encapsulation/decapsulation for KEMs, and signature generation/verification for signature schemes). Timing is recorded locally at the endpoint that executes the cryptographic operation inside our TCP client–server testbed (default loopback configuration 127.0.0.1 in the scripts), so network transmission time is not part of the reported cryptographic timings.
The timing boundary depends on how the primitive is integrated in the testbed. For schemes executed via standalone binaries, the orchestration layer measures elapsed time as the difference between timestamps taken immediately before and after the corresponding subprocess.run(…) call (implemented using time.time() or time.time()*1000 in the scripts). This boundary includes process start-up and the work performed inside the executable but excludes any preprocessing performed outside the timed region (e.g., preparing input files before launching the process). For schemes accessed as in-process dynamic libraries (e.g., via ctypes.CDLL/ctypes.WinDLL), elapsed time is measured around the specific library function call using time.perf_counter(), capturing the cost of the cryptographic routine itself. In the SNOVA scripts specifically, the message hashing step (SHAKE256) is computed outside the timed interval, and the recorded signing/verification times correspond to the core signing/verification routines over the derived digest. Because some primitives are invoked via standalone executables and others via in-process DLL calls, absolute timings should be interpreted within this measurement boundary; relative comparisons are primarily meaningful within each integration class.
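For the in-process (DLL) integration class, the interval wraps only the library function call. The sketch below illustrates this together with the SNOVA-specific arrangement in which the SHAKE256 message digest is computed before the timer starts. The routine, its (digest buffer, length) argument convention, and the 32-byte digest length are hypothetical; in practice the callable would be a function pointer obtained from ctypes.CDLL(…) or ctypes.WinDLL(…) with argtypes/restype configured for the actual library.

```python
import ctypes
import hashlib
import time
from typing import Callable


def timed_library_call(func: Callable, *args):
    """Time one in-process call, e.g. a ctypes function pointer from a DLL."""
    start = time.perf_counter()
    status = func(*args)
    return time.perf_counter() - start, status


def sign_prehashed(sign_digest: Callable, message: bytes, digest_len: int = 32):
    """SNOVA-style arrangement: hash outside the interval, time only signing.

    sign_digest is a hypothetical routine taking (digest buffer, digest length).
    """
    digest = hashlib.shake_256(message).digest(digest_len)   # untimed pre-hash
    buf = ctypes.create_string_buffer(digest, len(digest))   # C-compatible copy
    return timed_library_call(sign_digest, buf, ctypes.c_size_t(len(digest)))
```

Since process start-up is absent here, DLL timings are systematically smaller than executable timings for comparable work, which is why cross-class comparisons are interpreted with care.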
Each experiment is repeated ten times on each platform. Figures 1–12 visualize per-operation comparisons across schemes, reporting the mean and standard deviation, while Table 3 and Table 4 report the mean and 95% confidence intervals.
In summary, for each platform and configuration in Table 1, we execute the following workflow: (1) the client submits the primitive identifier, parameterization, and operation type to the server; (2) the server invokes the corresponding compiled implementation either as a standalone executable (subprocess.run(…)) or as an in-process library call (DLL via ctypes), depending on availability; (3) the wall-clock timing interval is recorded only around the cryptographic invocation boundary defined in Section 3.2 and Section 3.4 (excluding TCP send/receive and request parsing/serialization); (4) the operation is repeated ten times and aggregated as mean, standard deviation, and 95% confidence interval; and (5) functional correctness is validated through shared secret agreement for KEMs and signature verification for signature schemes.
3.5. Data Collection and Result Aggregation
For each combination of platform (AMD vs. Intel), primitive (KEM or signature scheme), and operation type, the testbed produces a set of ten per-run timing values. These values are collected in structured logs and exported to tabular form for post-processing and visualization. The results reported in subsequent sections are derived from these measurements, using aggregated indicators (mean, standard deviation, and confidence intervals across runs) to compare primitives across platforms under a unified methodology. Figures report the mean ±1 standard deviation over 10 runs, and Table 3 and Table 4 report the mean and 95% confidence intervals.
3.6. Measured Outputs and Correctness Checks
In addition to timing, the testbed verifies functional correctness for each primitive. For KEMs, the encapsulated and decapsulated shared secrets are checked for agreement. For digital signatures, a generated signature is verified with the corresponding public key, and the verification outcome is recorded. This ensures that reported performance measurements correspond to successfully executed cryptographic operations rather than failed or degenerate runs.
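The KEM agreement check can be expressed as a small helper that runs key generation, encapsulation, and decapsulation and compares the two shared secrets. The callables and their argument order are hypothetical stand-ins for the invoked implementations; hmac.compare_digest is used as a timing-safe byte comparison. Signature runs would analogously record the boolean outcome of the verify call.

```python
import hmac
from typing import Callable, Tuple


def kem_roundtrip_ok(keygen: Callable[[], Tuple[bytes, bytes]],
                     encaps: Callable[[bytes], Tuple[bytes, bytes]],
                     decaps: Callable[[bytes, bytes], bytes]) -> bool:
    """Run keygen -> encaps -> decaps and check that both sides agree.

    Hypothetical conventions: keygen() -> (pk, sk); encaps(pk) -> (ct, ss);
    decaps(ct, sk) -> ss.
    """
    pk, sk = keygen()
    ct, ss_encaps = encaps(pk)
    ss_decaps = decaps(ct, sk)
    # Timing-safe comparison; a mismatch marks the run as failed.
    return hmac.compare_digest(ss_encaps, ss_decaps)
```

Recording only runs for which this returns True is what ties each timing value to a successful cryptographic operation.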
3.7. Limitations
As with any empirical performance study, the results reported in this paper should be interpreted in light of the following limitations.
Our timings reflect wall-clock execution time as observed by the Python orchestration layer within a TCP client–server testbed. By construction, the timed interval starts after the request has been received and inputs are prepared; therefore, TCP send/receive and request parsing/serialization are excluded from the reported timings. The measured time may still include overhead from the orchestration environment within the timing boundary (e.g., process invocation/start-up for executable-based schemes, wrapper logic, file I/O performed by the invoked implementation, and OS scheduling). This overhead is applied consistently within each integration class, but it may be non-negligible for very fast operations and should be considered when interpreting small differences.
Performance results depend strongly on the specific implementation choices embedded in the invoked executables (e.g., algorithmic optimizations, compiler and build configuration, use of platform-specific instructions, and defensive coding for constant-time behavior). Consequently, the measurements in this work should be understood as evidence for the evaluated implementations as executed in our testbed, rather than as universal lower bounds (or definitive rankings) for the underlying algorithms.
PQC schemes typically offer multiple parameter sets and variants. The performance reported here corresponds to the parameterization configured in the tested executables. Since different parameter sets can shift the time/size trade-offs substantially, our results should not be extrapolated to all configurations without additional measurement.
Experiments were conducted on two commodity processors and under a single operating system family (Windows 11). While these platforms are representative of common endpoints, they do not capture the full diversity of deployment environments (e.g., servers with different microarchitectures, Linux-based infrastructure, mobile devices, microcontrollers, and specialized accelerators). Results may therefore differ on other hardware/software stacks.
Each operation was repeated ten times to reduce transient variability and obtain stable indicators under controlled load. However, this repetition count does not provide comprehensive statistical characterization of variance under diverse system conditions (e.g., background load, thermal throttling, or long-running contention effects). For deployment planning, additional testing under production-like conditions may be required.
The benchmark focuses on primitive-level operations (key generation, encapsulation/decapsulation, signing, verification) in a controlled client–server workflow. It does not directly measure full protocol-level behavior under concurrency (e.g., multi-session TLS termination, PKI renewal at scale) or real wide-area network conditions. Therefore, the results primarily inform primitive selection and relative computational cost, rather than complete end-to-end system throughput in production.
This study emphasizes execution time and includes size-related indicators where available. It does not provide a systematic evaluation of additional deployment-relevant metrics such as memory consumption, energy usage, side-channel resistance, or formal constant-time validation. These factors can be decisive in high-assurance or resource-constrained environments and should be assessed separately when required.
These limitations are common in comparative benchmarking studies and do not detract from the main objective of this work: providing a consistent, platform-level comparison of representative PQC KEMs and signature schemes under a unified measurement procedure.
5. Discussion
This section interprets the benchmarking results presented in Section 4, focusing on implications for algorithm selection, implications for common deployment contexts, and limitations and reproducibility considerations.
5.1. Implications for Selecting PQC Primitives
A consistent outcome of our measurements is that lattice-based primitives provide the most favorable efficiency profile on the evaluated commodity CPUs. In particular, Kyber (ML-KEM) exhibits the lowest computational cost among the evaluated KEMs for key generation and encapsulation/decapsulation, making it a strong default candidate for latency-sensitive key establishment. Similarly, Dilithium (ML-DSA) offers stable signing and verification performance across both platforms, supporting its role as a pragmatic baseline for post-quantum authentication in many applications.
By contrast, the code-based KEMs evaluated in this study (HQC and BIKE) incur higher encapsulation/decapsulation times in our setup. From a deployment perspective, this does not reduce their relevance because algorithmic diversity is a recognized risk-management strategy. However, it means that adopting these schemes may require additional engineering effort (optimized implementations or platform tuning) and careful capacity planning, especially on systems where key establishment is performed at high rates.
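A back-of-the-envelope form of this capacity planning is sketched below: one core can sustain at most 1/t operations per second for an operation taking t seconds, and a target rate R needs roughly ⌈R · t / u⌉ cores at a target utilization u. This ignores all non-KEM work and parallel-scaling effects, and the figures in the usage note are hypothetical, not measurements from this study.

```python
import math


def per_core_rate(op_seconds: float) -> float:
    """Upper bound on operations per second for one core running them back to back."""
    return 1.0 / op_seconds


def cores_needed(handshakes_per_s: float, server_seconds_per_handshake: float,
                 target_utilization: float = 1.0) -> int:
    """Cores needed so that KEM work alone stays at or below target_utilization."""
    load = handshakes_per_s * server_seconds_per_handshake   # busy core-seconds per second
    return math.ceil(load / target_utilization)
```

For example, a hypothetical 0.42 ms of server-side KEM work per handshake at 10,000 handshakes per second gives a load of about 4.2 core-seconds per second, i.e., 5 cores at full utilization or 9 cores at a 50% utilization target. Substituting the measured mean encapsulation/decapsulation times of a slower KEM into the same formula shows directly how the core budget scales.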
For digital signatures, the results reinforce the importance of matching the primitive to the operational profile. SPHINCS+ (SLH-DSA) provides strong long-term security assumptions (hash-based) but introduces different cost and size trade-offs, including potentially high signing latency depending on the chosen configuration and relatively large artefacts compared to lattice-based signatures. This can be acceptable for low-frequency signing use cases, but it may be less attractive for high-throughput transactional signing workloads. On the other hand, SQISign and SNOVA show a higher computational burden under the evaluated configurations, which limits their immediate attractiveness for performance-sensitive systems. However, these schemes remain relevant as research directions and as alternative design points. In some contexts, implementers may accept higher computation if it enables favorable properties elsewhere (e.g., certain size or key-management considerations).
We emphasize that these conclusions are drawn under the end-to-end measurement boundary defined in Section 3.4. Therefore, the results should be interpreted as implementation and integration costs within our testbed rather than as isolated cryptographic kernel timings. Overall, the results support a practical near-term narrative: ML-KEM/ML-DSA-class schemes are well positioned for mainstream adoption on general-purpose processors, whereas other families may be best viewed either as diversity options or as specialized candidates requiring stronger justification and optimization work.
5.2. Implications for Deployment Contexts
We consider four deployment contexts: interactive key establishment (e.g., TLS-like handshakes); PKI, identity, and authentication services; software distribution and update ecosystems; and constrained and industrial environments (IoT/IIoT).
Regarding key establishment, KEM encapsulation and decapsulation are performed during session establishment and can directly affect latency, throughput, and CPU cost at termination points (e.g., load balancers and application gateways). The relative advantage observed for Kyber suggests that lattice-based KEMs are particularly suitable for high-rate session establishment and for environments where key exchange occurs frequently. This is consistent with protocol-level evaluations that measure the overhead of hybrid and post-quantum mechanisms in TLS contexts (e.g., [14,16]), which highlight that the feasibility of PQC in network protocols depends strongly on the computational and communication costs of the selected primitives.
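Because KEM operations are split unevenly between the two endpoints of a handshake, it can help to attribute them per role when estimating termination-point cost. The following minimal Python sketch assumes the key-share pattern used for KEMs in TLS 1.3 (the client generates the key pair and decapsulates; the server encapsulates to the client's key). The `KemTimings` type and both helper functions are hypothetical, and the example timings are placeholders rather than measurements from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KemTimings:
    """Hypothetical mean per-operation KEM timings (ms) on one platform."""
    keygen_ms: float
    encaps_ms: float
    decaps_ms: float


def handshake_cost_ms(t: KemTimings) -> dict[str, float]:
    """KEM CPU time per handshake, split by role.

    Assumes the TLS 1.3-style KEM key share: the client generates the key
    pair and decapsulates; the server encapsulates to the client's key.
    """
    return {
        "client": t.keygen_ms + t.decaps_ms,
        "server": t.encaps_ms,
    }


def server_handshakes_per_core_per_s(t: KemTimings) -> float:
    """Optimistic upper bound: KEM work only, no signatures, symmetric crypto, or I/O."""
    return 1000.0 / handshake_cost_ms(t)["server"]


if __name__ == "__main__":
    # Placeholder timings for illustration; substitute measured values.
    example = KemTimings(keygen_ms=0.05, encaps_ms=0.08, decaps_ms=0.10)
    print(handshake_cost_ms(example))
    print(f"{server_handshakes_per_core_per_s(example):.0f} handshakes/s per core")
```

The per-core figure ignores signatures, symmetric cryptography, and network I/O, so it should be read as an upper bound rather than as a throughput prediction.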
For PKI, identity, and authentication services, signature verification is often performed at scale (e.g., validation of certificates, signed messages, and software artefacts), whereas signing may be centralized and rate-limited (e.g., certificate issuance or internal code-signing services). In such settings, schemes with efficient and stable verification, such as Dilithium in our experiments, are attractive because they reduce the cost of widespread verification across endpoints.
Software signing pipelines and firmware update mechanisms typically exhibit an asymmetric workload: a small number of signing operations on the producer side, followed by many verification operations on endpoints. This pattern can accommodate schemes with computationally heavier signing (provided signing is infrequent), but it may still be sensitive to verification latency and to signature size, which affects bandwidth and storage. Consequently, selecting a signature scheme for these ecosystems requires balancing long-term security objectives against operational constraints (e.g., update package size or verification on heterogeneous endpoints).
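The asymmetry described above can be made explicit by weighting each operation by how often it occurs. The sketch below is a hypothetical cost model (`SigProfile` and `release_cost` are illustrative names, not part of any library): it assumes each release is signed once by the producer and verified once by every endpoint, and it counts only signature bytes, not public keys or certificate chains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigProfile:
    """Hypothetical per-scheme inputs measured on the target platform."""
    sign_ms: float
    verify_ms: float
    sig_bytes: int


def release_cost(p: SigProfile, releases: int, endpoints: int) -> dict[str, float]:
    """Aggregate cost of distributing `releases` signed artefacts to `endpoints`."""
    return {
        # Signing happens once per release, on the producer side.
        "producer_cpu_ms": releases * p.sign_ms,
        # Every endpoint verifies every release.
        "fleet_cpu_ms": releases * endpoints * p.verify_ms,
        # Signature bytes delivered across the fleet (keys/certificates excluded).
        "sig_bytes_total": float(releases * endpoints * p.sig_bytes),
    }


if __name__ == "__main__":
    # Placeholder profiles for two hypothetical schemes; not measured values.
    profiles = {
        "scheme_a": SigProfile(sign_ms=2.0, verify_ms=0.2, sig_bytes=3000),
        "scheme_b": SigProfile(sign_ms=200.0, verify_ms=1.0, sig_bytes=8000),
    }
    for name, p in profiles.items():
        print(name, release_cost(p, releases=12, endpoints=100_000))
```

In this model, signing time is multiplied only by the number of releases, whereas verification time and signature size are multiplied by the fleet size, which is why the latter tend to dominate in update ecosystems.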
For constrained and industrial environments, these performance trends matter because they indicate how quickly costs can grow as implementations move away from highly optimized primitives. Recent work on PQC overhead in IIoT settings [12] similarly stresses that even moderate overheads can become problematic when CPU, memory, and bandwidth are limited. In these environments, platform-specific optimization and careful protocol design are often prerequisites for deployment.
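The bandwidth constraint can be illustrated with simple serialization arithmetic. The sketch below uses the published FIPS 203 and FIPS 204 sizes for ML-KEM-768 and ML-DSA-65, which may not match the parameter sets evaluated in this study; the link rate is an assumed input (250 kbit/s is the nominal IEEE 802.15.4 rate in the 2.4 GHz band), and framing, fragmentation, and retransmissions are ignored.

```python
# Published sizes (bytes) from FIPS 203 (ML-KEM-768) and FIPS 204 (ML-DSA-65).
# The parameter sets benchmarked in this study may differ.
SIZES_BYTES = {
    "ML-KEM-768 encapsulation key": 1184,
    "ML-KEM-768 ciphertext": 1088,
    "ML-DSA-65 public key": 1952,
    "ML-DSA-65 signature": 3309,
}


def airtime_ms(size_bytes: int, link_bps: float) -> float:
    """Serialization time in ms at link_bps; ignores framing and retransmissions."""
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    return size_bytes * 8 / link_bps * 1000.0


if __name__ == "__main__":
    link_bps = 250_000  # assumed link rate (nominal IEEE 802.15.4, 2.4 GHz)
    for name, size in SIZES_BYTES.items():
        print(f"{name}: {airtime_ms(size, link_bps):.1f} ms")
```

For example, the 1184-byte ML-KEM-768 encapsulation key alone takes about 37.9 ms to serialize at 250 kbit/s, before any protocol overhead is added.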
5.3. Cross-Platform Effects and Capacity Planning
Across all evaluated primitives, the AMD platform consistently outperformed the Intel platform in our measurements. This reinforces a practical lesson for transition planning: PQC performance is not only algorithm-dependent but also hardware-dependent. Organizations estimating the cost of adopting PQC should therefore avoid relying on a single benchmark number and should instead evaluate representative target hardware classes (endpoints, servers, gateways) while considering refresh cycles and scaling strategies. Even when the same primitives are selected, the operational cost profile can vary materially with CPU generation, microarchitecture, and system configuration.
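The capacity-planning recommendation can be turned into a rough sizing rule: the work generated per second is the operation rate multiplied by the per-operation time, divided by an acceptable utilization ceiling. The sketch below is a hypothetical helper; platform names and timings are placeholders to be replaced with measurements on each target hardware class, and the 0.7 default ceiling is an arbitrary assumption.

```python
import math


def cores_needed(ops_per_s: float, op_ms: float, max_util: float = 0.7) -> int:
    """Smallest core count keeping PQC work below max_util of capacity (hypothetical rule)."""
    if not 0.0 < max_util <= 1.0:
        raise ValueError("max_util must be in (0, 1]")
    busy_cores = ops_per_s * op_ms / 1000.0  # core-seconds of work per second
    return math.ceil(busy_cores / max_util)


if __name__ == "__main__":
    # Placeholder per-operation timings (ms) for two hardware classes; not measured.
    per_op_ms = {"platform_a": 0.12, "platform_b": 0.18}
    for platform, t in per_op_ms.items():
        print(platform, cores_needed(ops_per_s=20_000, op_ms=t))
```

Applying the same rule with timings from different CPUs expresses hardware dependence directly as a difference in required cores.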
5.4. Relation to Prior Benchmarking Work
Our findings are broadly aligned with the existing benchmarking literature, which reports strong performance for NIST-selected lattice-based primitives in practical settings. For instance, Abbasi et al. [13] benchmark Kyber, Dilithium, and BIKE across heterogeneous environments and emphasize the sensitivity of PQC overhead to platform characteristics and deployment assumptions. Protocol-level studies, particularly around TLS [14,16], further confirm that practical viability depends on the combined effect of primitive performance, protocol integration, and implementation choices. Work exploring PQC in distributed ledgers similarly illustrates that performance trade-offs may shift when primitives are embedded in complex transaction workflows [17].
Importantly, different studies adopt different measurement boundaries. Many benchmarks rely on in-process microbenchmarks of cryptographic kernels [21], while others evaluate protocol-level or application-level behavior. In contrast, our results reflect an end-to-end primitive invocation boundary through a common TCP client–server orchestration workflow (Section 3.4). Absolute timings are therefore not directly comparable across studies, because ours can incorporate non-cryptographic overheads such as process invocation, wrapper logic, and artifact handling within the testbed boundary. For the same reason, cross-scheme rankings should be interpreted cautiously when schemes fall into different integration classes (executable vs. DLL); they are most directly comparable within each class under the same boundary.
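The effect of the measurement boundary can be illustrated by timing the same trivial workload in two ways: as a direct in-process call, and by launching a fresh interpreter that executes it, which adds process start-up cost much as invoking a standalone executable does. The sketch below is illustrative only: the workload is a stand-in rather than a PQC primitive, and reporting the median of a few runs is an assumed aggregation choice, not necessarily the statistic used in this study.

```python
import statistics
import subprocess
import sys
import time
from collections.abc import Callable


def workload() -> int:
    """Stand-in computation; not a cryptographic primitive."""
    return sum(range(1000))


def median_ms(fn: Callable[[], object], runs: int = 5) -> float:
    """Median wall-clock time of fn over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def in_process() -> None:
    # Boundary: function call only.
    workload()


def via_subprocess() -> None:
    # Boundary: process creation + interpreter start-up + the same work.
    subprocess.run([sys.executable, "-c", "sum(range(1000))"], check=True)


if __name__ == "__main__":
    print(f"in-process median: {median_ms(in_process):.4f} ms")
    print(f"subprocess median: {median_ms(via_subprocess):.4f} ms")
```

On typical systems the subprocess figure is dominated by interpreter start-up, which is the kind of non-cryptographic overhead that an end-to-end boundary includes and an in-process microbenchmark excludes.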
In this context, the contribution of our study is not to replace microbenchmarking or protocol-level analysis but to provide a controlled, unified, deployment-proximate comparison across multiple algorithm families (including both standardized baselines and research-stage schemes) on two representative commodity platforms. This supports evidence-based selection and planning decisions, and it highlights the extent to which integration choices can materially affect observed PQC overheads in practical computing environments.
6. Conclusions
This paper presented an experimental performance evaluation of post-quantum cryptographic primitives on commodity general-purpose processors, focusing on three key encapsulation mechanisms (Kyber, HQC, and BIKE) and five digital signature schemes (CRYSTALS-Dilithium, HAWK, SQISign, SNOVA, and SPHINCS+). Using a unified TCP client–server testbed, we measured key generation, encapsulation/decapsulation, signing, and verification times on two representative platforms, namely, AMD Ryzen 7 4000 and Intel Core i5-1035G1, with repeated runs under controlled conditions.
Across the evaluated configurations, the results indicate that lattice-based schemes provide the most favorable performance profile for mainstream deployment on commodity CPUs. Kyber consistently achieved the lowest computational overhead among the evaluated KEMs, and Dilithium exhibited stable and efficient signing and verification performance. The code-based KEMs (HQC and BIKE) and the evaluated signatures SQISign (isogeny-based) and SNOVA (multivariate) incurred higher computational costs in our testbed, illustrating the practical trade-offs that arise when selecting schemes outside the lattice-based family. SPHINCS+, as a hash-based construction, offers conservative security assumptions and a distinct trade-off space that can be appropriate in specific contexts, but its performance characteristics require careful consideration for high-throughput use cases. Finally, the AMD platform consistently outperformed the Intel platform, underlining that hardware characteristics can materially affect observed PQC overheads and should be considered in capacity planning for migration. These findings complement prior PQC benchmarking and protocol integration studies by providing a controlled primitive-level comparison across multiple families, and they can inform practitioners when selecting primitives and estimating operational costs for quantum-resistant deployments.
Future work will extend this benchmark from primitive-level timings to end-to-end evaluation in real web service deployments, capturing operational impact under realistic loads. We will also replicate the experiments on broader hardware platforms and operating systems to improve generalizability. Finally, we will evaluate additional parameter sets and complementary metrics, such as memory or communication overhead, to study the impact of implementation choices on performance.