GPU Acceleration for KLSS Key Switching in Fully Homomorphic Encryption
Abstract
1. Introduction
- A CUDA-optimized GPU implementation of KLSS key switching for the BGV, BFV, and CKKS schemes.
- A comprehensive benchmark against CPU and prior state-of-the-art GPU implementations, demonstrating speedups of up to 181.35× over the CPU baseline and up to 2.06× over GPU hybrid key switching.
- An in-depth analysis of the trade-off between single- and double-decomposition for key switching, specifically within the context of GPU-accelerated homomorphic encryption.
2. Related Works
2.1. Fully Homomorphic Encryption
2.2. Research in Key Switching
2.3. GPU-Accelerated FHE
3. Preliminaries
3.1. BGV, BFV, CKKS
- Setup: The process begins by establishing the cryptographic environment based on a security parameter $\lambda$. This involves selecting public parameters that define the polynomial ring dimension $N$, plaintext modulus $t$, ciphertext modulus $Q$, and the distributions for key and error sampling.
- Key Generation: A secret key $s$ is created by sampling a polynomial from the key distribution. The corresponding public key is a pair of polynomials built from $a$, a uniformly random polynomial in the ciphertext ring, and $e$, a small error drawn from the error distribution. Additionally, evaluation keys, such as relinearization keys and automorphism keys, are generated for homomorphic operations.
- Encryption: To encrypt a plaintext, it is first encoded into a polynomial. The specific encoding method depends on the scheme:
  - BFV: The message is scaled to the most significant bits of the ciphertext modulus, i.e., multiplied by a factor of roughly $Q/t$.
  - BGV: A correction factor is applied to the message.
  - CKKS: The message is directly encoded as a polynomial.

  The final ciphertext is then formed by adding this encoded message to a fresh encryption of zero.
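- Decryption: The original message is recovered from a ciphertext using the secret key. Each scheme first combines the ciphertext components with the secret key modulo the ciphertext modulus and then decodes the result:
  - BFV: The result is scaled by $t/Q$ and rounded, which yields the message modulo $t$.
  - BGV: The result is reduced modulo $t$, and the correction factor applied during encoding is undone.
  - CKKS: The result is itself the encoded message, up to a small approximation error.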
- Homomorphic Operations: These schemes support computations directly on encrypted data.
  - Addition: The sum of two ciphertexts is simply their component-wise sum.
  - Multiplication: The product of two ciphertexts is computed, followed by a relinearization step using the relinearization key to produce the resulting ciphertext.
  - Automorphism: An automorphism (e.g., rotation) is applied to a ciphertext using a corresponding automorphism key to yield the transformed ciphertext.
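
The encode, encrypt, decrypt, and add steps listed above can be illustrated with a toy BFV-style example. This is only a sketch under stated assumptions: it uses a symmetric-key variant, tiny insecure parameters, the sign convention that decryption computes c0 + c1·s, and hypothetical helper names (`polymul`, `keygen`, `encrypt`, `decrypt`, `add`) that do not come from the paper or any library.

```python
import random

N = 8            # toy ring degree: arithmetic is in Z_Q[X]/(X^N + 1)
t = 17           # plaintext modulus
Q = 2**20        # ciphertext modulus (toy value, not secure)
DELTA = Q // t   # BFV scaling factor, roughly Q/t

def polymul(a, b, q):
    """Schoolbook negacyclic product in Z_q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                res[k] = (res[k] + ai * bj) % q
            else:  # X^N = -1 wraps around with a sign flip
                res[k - N] = (res[k - N] - ai * bj) % q
    return res

def keygen(rng):
    """Ternary secret key (assumed key distribution)."""
    return [rng.choice([-1, 0, 1]) for _ in range(N)]

def encrypt(mu, s, rng):
    """Symmetric encryption: (c0, c1) = (-a*s + e + DELTA*mu, a)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = [rng.randint(-2, 2) for _ in range(N)]
    a_s = polymul(a, s, Q)
    c0 = [(-a_s[i] + e[i] + DELTA * mu[i]) % Q for i in range(N)]
    return c0, a

def decrypt(ct, s):
    """Compute c0 + c1*s mod Q, then scale by t/Q and round."""
    c0, c1 = ct
    c1_s = polymul(c1, s, Q)
    phase = [(c0[i] + c1_s[i]) % Q for i in range(N)]
    return [((t * p + Q // 2) // Q) % t for p in phase]

def add(ct1, ct2):
    """Homomorphic addition is a component-wise sum modulo Q."""
    return tuple([(x + y) % Q for x, y in zip(u, v)] for u, v in zip(ct1, ct2))
```

Decrypting `add(ct1, ct2)` should return the coefficient-wise sum of the two messages modulo t, as long as the accumulated error stays far below Q/(2t).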
3.2. Key Switching
3.2.1. Relinearization
3.2.2. Modulus Switching
4. GPU Acceleration for KLSS
4.1. The Double-Decomposition Approach
4.2. GPU Acceleration
4.2.1. Stage 1: Key Generation (Offline)
4.2.2. Stage 2: GPU-Accelerated Key Switching (Online)
| Algorithm 1 KLSS Key Switching | |
|---|---|
| Input: Context parameters: bases; decompositions. | |
| Input: Offline: secret keys. | |
| Input: Online: ciphertext component. | |
| Output: Switched ciphertext parts. | |
| STAGE 1. Generate and store the key. | |
| 1: for … to d do | |
| 2: Sample … and compute the raw parts: | |
| 3: … | |
| 4: // BaseConvert (map from …) | |
| 5: for … do | |
| 6: … | ▹ Extract digit |
| 7: … | ▹ Store in NTT form for fast online dot products. |
| STAGE 2. Transform the input ciphertext component using the stored key. | |
| 1. Input Preparation (Decompose & BaseConvert) | |
| 8: for … to d do | |
| 9: … | ▹ Decompose input |
| 10: … | ▹ Move input parts to the target ring and transform to NTT. |
| 2. Inner Product | |
| 11: for … do | |
| 12: Initialize accumulator. | |
| 13: for … to … do | |
| 14: … | |
| 15: for … to d do | |
| 16: … | ▹ Pointwise Mult-Add |
| 17: … | ▹ Reconstruct in … |
| 18: … | ▹ Inverse NTT |
| 3. Post-Processing (BaseConvert & ModDown) | |
| 19: for … do | |
| 20: … | ▹ Convert back to large modulus |
| 21: … | ▹ Correction factor |
| 22: … | ▹ Scale down to … |
| 23: return the switched ciphertext parts. | |
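
The Decompose and inner-product steps of Algorithm 1 rest on an RNS gadget identity: the digits of a value, taken modulo each group of primes, recombine to the original value when multiplied by CRT basis elements. The sketch below shows this for plain integers rather than polynomials, with a single decomposition and no noise. The function names and the prime groups are hypothetical choices for illustration, not the paper's parameters.

```python
from math import prod

def rns_gadget(groups):
    """Return (Q, gadget) for prime groups that partition the RNS basis of Q.

    gadget[j] = (Q/Q_j) * ((Q/Q_j)^{-1} mod Q_j), so it is 1 mod Q_j and
    0 modulo every other group.
    """
    Q = prod(prod(g) for g in groups)
    gadget = []
    for g in groups:
        Qj = prod(g)
        Qhat = Q // Qj
        gadget.append(Qhat * pow(Qhat, -1, Qj) % Q)
    return Q, gadget

def decompose(a, groups):
    """Digit j is a reduced modulo the j-th group product (digit < Q_j)."""
    return [a % prod(g) for g in groups]

def key_switch_scalar(a, s_old, groups):
    """Noise-free scalar analogue of key switching: returns a * s_old mod Q."""
    Q, gadget = rns_gadget(groups)
    key = [(g * s_old) % Q for g in gadget]   # offline: key parts encode g_j * s_old
    digits = decompose(a, groups)             # online: decompose the input
    return sum(d * k for d, k in zip(digits, key)) % Q  # inner product
```

Because every digit is smaller than its group product, the error term of a real key (omitted here) is multiplied only by small values, which is the reason for decomposing before the inner product.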
| Algorithm 2 GPU-Accelerated Key Generation for KLSS | |
|---|---|
| Input: New secret key, old secret key (on Device). | |
| Output: Double-decomposition key stored in GPU Device memory. | |
| // Step 1: Generate standard key parts in a batched kernel. | |
| 1: parallel for … do | ▹ Launch a kernel to process all parts in parallel. |
| 2: Sample a batch of d polynomials … and … on the GPU. | ▹ All polynomial arithmetic is coefficient-wise and parallelized within the kernel. |
| 3: … | |
| 4: // Step 2: Apply second decomposition and base-extend in a single, large-scale kernel. | |
| 5: parallel for … do | ▹ A massively parallel kernel. Each thread block is assigned to compute one or more …. |
| 6: Decompose … to get …. | ▹ RNS decomposition is coefficient-wise. |
| 7: Let …. | |
| 8: … | ▹ Batched BaseExt with error correction. |
| 9: return the double-decomposition key (residing on the GPU Device). | |
| Algorithm 3 GPU-Accelerated Double-Decomposition KLSS Key Switching | |
|---|---|
| Input: Ciphertext component (on Host), key (on Device). | |
| Output: Switched ciphertext components (on Host). | |
| 1: Transfer the ciphertext component from Host to GPU Device memory. | |
| // Step 1: Batched Input Extension. | |
| 2: Decompose the ciphertext component on the GPU. | |
| 3: parallel for … do | ▹ Launch a kernel for batched base extension. |
| 4: … | ▹ Batched BaseExt |
| // Step 2: Batched Dot Product in …. | |
| 5: parallel for … do | ▹ Batched NTTs and multiplications. |
| 6: … | ▹ ⊙ is coefficient-wise product. |
| 7: parallel for … do | ▹ Parallel reduction over index l. |
| 8: … | ▹ Kernel using parallel reduction. |
| 9: parallel for … do | ▹ Parallel CRT reconstruction (another reduction). |
| 10: … | |
| // Step 3: Batched Base Conversion from … back to …. | |
| 11: parallel for … do | |
| 12: … | ▹ Batched … and BaseExt. |
| // Step 4: Batched Final Scaling. | |
| 13: parallel for … do | |
| 14: … | ▹ Batched kernels for all steps. |
| 15: … | |
| 16: Transfer the switched ciphertext components from GPU Device to Host memory. | |
| 17: return the switched ciphertext components. | |
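
The BaseExt steps in Algorithms 2 and 3 move residues from one RNS basis to another without reconstructing the large integer. A minimal sketch of the standard fast (approximate) base conversion is given below for a single coefficient. It assumes the uncorrected variant, so the result may be offset by a small multiple of the source modulus; the error-corrected form mentioned in Algorithm 2 is not shown. The function name `base_convert` and the example primes are hypothetical.

```python
from math import prod

def base_convert(residues, src, dst):
    """Fast base conversion of one coefficient from basis src to basis dst.

    residues[i] = a mod src[i]. The output equals (a + u*Q) mod p for each
    p in dst, where Q = prod(src) and 0 <= u < len(src) is the same for
    every p. Each output residue is independent, which is what makes the
    step easy to batch across coefficients and target primes on a GPU.
    """
    Q = prod(src)
    # y_i = a_i * (Q/q_i)^{-1} mod q_i, computed per source prime
    y = [(r * pow(Q // q, -1, q)) % q for r, q in zip(residues, src)]
    # Accumulate y_i * (Q/q_i) modulo each target prime
    return [sum(yi * ((Q // q) % p) for yi, q in zip(y, src)) % p for p in dst]
```

For example, with `src = [97, 193, 257]` and `dst = [769, 12289]`, converting the residues of a value `a < Q` gives the residues of `a + u*Q` for some `u` in {0, 1, 2}.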
4.3. Unified Framework for BFV, BGV, and CKKS
5. Results
5.1. Experimental Setup
5.2. GPU Acceleration Performance
5.3. Comparison with Hybrid Key Switching
6. Discussion
6.1. Single- or Double-Decomposition? A Trade-Off Analysis
6.2. GPU Key Size Constraints
- Impact of Gadget Dimension: The gadget vector dimension, r (and rdouble for double decomposition), has a dramatic effect on the total key size. In the single-decomposition case, increasing the dimension results in a significant increase in memory footprint.
- The Cost of Double Decomposition: The penalty is even more pronounced with double decomposition. As shown in the figure, moving from ‘Double-Decomp (r = 3, rdouble = 2)’ to ‘(r = 3, rdouble = 8)’ causes a steep rise in key size. Crucially, for comparable parameters, double decomposition consistently requires more memory than single decomposition.
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Symbol | Description |
|---|---|
| N | The polynomial ring degree, a power of two. |
| | The ring. |
| | The ciphertext modulus, composed of ℓ co-prime primes. |
| P | The key-switching extension modulus, composed of k co-prime primes. |
| | The total number of co-primes in the mod-up ring. |
| T | The double-decomposition modulus, composed of r co-prime primes. |
| | The old secret key and the new secret key. For relinearization, the old key is the square of the new key. |
| d | The number of decomposition groups for the first decomposition over q. |
| | The number of decomposition groups for the second decomposition. |
| r | The RNS word width of the first decomposition. |
| | The RNS word width of the second decomposition. |
| | The first decomposition function and its reconstruction basis. |
| | The second decomposition function and its reconstruction basis. |
| | The input ciphertext polynomial component to be switched, e.g., from a multiplication result. |
| | The standard (single-decomposition) key-switching key. |
| | The double-decomposition key-switching key. |
References
1. Zhang, Q.; Yang, L.T.; Chen, Z. Privacy Preserving Deep Computation Model on Cloud for Big Data Feature Learning. IEEE Trans. Comput. 2016, 65, 1351–1362.
2. Chabanne, H.; Wargny, A.; Milgram, J.; Morel, C.; Prouff, E. Privacy-preserving classification on deep neural network. Cryptol. ePrint Arch. 2017. Available online: https://ia.cr/2017/035 (accessed on 30 October 2025).
3. Jia, H.; Cai, D.; Huo, Z.; Wang, C.; Zhang, S.; Zhang, S.; Li, X.; Yang, S. Evaluation of Activation Functions in Convolutional Neural Networks for Image Classification Based on Homomorphic Encryption. In Proceedings of the 13th International Conference on Computer Engineering and Networks, Wuxi, China, 3–5 November 2023; Lecture Notes in Electrical Engineering. Springer Nature: Singapore, 2024; Volume 1127, pp. 343–355.
4. Jiang, S.; Yang, H.; Xie, Q.; Ma, C.; Wang, S.; Xing, G. Lancelot: Towards Efficient and Privacy-Preserving Byzantine-Robust Federated Learning Within Fully Homomorphic Encryption. arXiv 2024, arXiv:2408.06197.
5. Asiri, M.; Khemakhem, M.A.; Alhebshi, R.M.; Alsulami, B.S.; Eassa, F.E. Decentralized Federated Learning for IoT Malware Detection at the Multi-Access Edge: A Two-Tier, Privacy-Preserving Design. Future Internet 2025, 17, 475.
6. Kim, M.; Lee, D.; Seo, J.; Song, Y. Accelerating HE operations from key decomposition technique. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 20–24 August 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 70–92.
7. Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory TOCT 2014, 6, 309–325.
8. Fan, J.; Vercauteren, F. Somewhat Practical Fully Homomorphic Encryption. Cryptol. ePrint Arch. 2012. Available online: https://eprint.iacr.org/2012/144 (accessed on 30 October 2025).
9. Brakerski, Z.; Vaikuntanathan, V. Efficient fully homomorphic encryption from (standard) LWE. SIAM J. Comput. 2014, 43, 831–871.
10. Brakerski, Z.; Vaikuntanathan, V. Fully homomorphic encryption from Ring-LWE and security for key dependent messages. In Proceedings of the Annual Cryptology Conference, Santa Barbara, CA, USA, 14–18 August 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 505–524.
11. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology—ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Springer: Berlin/Heidelberg, Germany, 2017. Proceedings, Part I 23. pp. 409–437.
12. Gentry, C. A Fully Homomorphic Encryption Scheme. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2009.
13. Brakerski, Z.; Vaikuntanathan, V. Lattice-based FHE as secure as PKE. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science, ITCS’14, New York, NY, USA, 12–14 January 2014; pp. 1–12.
14. Gentry, C.; Sahai, A.; Waters, B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In Proceedings of the Advances in Cryptology—CRYPTO 2013: 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2013; Springer: Berlin/Heidelberg, Germany, 2013. Proceedings, Part I. pp. 75–92.
15. Naehrig, M.; Lauter, K.; Vaikuntanathan, V. Can homomorphic encryption be practical? In Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, Chicago, IL, USA, 21 October 2011; pp. 113–124.
16. Gentry, C.; Halevi, S.; Smart, N.P. Homomorphic Evaluation of the AES Circuit. Cryptol. ePrint Arch. 2012. Available online: https://eprint.iacr.org/2012/099 (accessed on 30 October 2025).
17. Aslett, L.; Esperança, P.; Holmes, C. Encrypted statistical machine learning: New privacy preserving methods. arXiv 2015, arXiv:1508.06845.
18. Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1333–1345.
19. Chen, H.; Cammarota, R.; Valencia, F.; Regazzoni, F.; Koushanfar, F. Ahec: End-to-end compiler framework for privacy-preserving machine learning acceleration. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), Virtual, 20–24 July 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
20. Juvekar, C.; Vaikuntanathan, V.; Chandrakasan, A. GAZELLE: A Low Latency Framework for Secure Neural Network Inference. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 1651–1669.
21. Ao, W.; Boddeti, V.N. AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–18 August 2024; pp. 2173–2190.
22. Chen, H.; Laine, K.; Player, R. Simple encrypted arithmetic library-SEAL v2.1. In Proceedings of the Financial Cryptography and Data Security: FC 2017 International Workshops, WAHC, BITCOIN, VOTING, WTSC, and TA, Sliema, Malta, 7 April 2017; Springer: Berlin/Heidelberg, Germany, 2017. Revised Selected Papers 21. pp. 3–18.
23. Halevi, S.; Shoup, V. Design and implementation of a homomorphic-encryption library. IBM Res. 2013, 6, 8–36.
24. Al Badawi, A.; Bates, J.; Bergamaschi, F.; Cousins, D.B.; Erabelli, S.; Genise, N.; Halevi, S.; Hunt, H.; Kim, A.; Lee, Y.; et al. Openfhe: Open-source fully homomorphic encryption library. In Proceedings of the 10th Workshop on Encrypted Computing & Applied Homomorphic Cryptography, Los Angeles, CA, USA, 7 November 2022; pp. 53–63.
25. Kim, A.; Polyakov, Y.; Zucca, V. Revisiting Homomorphic Encryption Schemes for Finite Fields. In Proceedings of the Advances in Cryptology—ASIACRYPT 2021, Singapore, 6–10 December 2021; pp. 608–639.
26. Han, K.; Ki, D. Better Bootstrapping for Approximate Homomorphic Encryption. Cryptol. ePrint Arch. 2019. Available online: https://eprint.iacr.org/2019/688 (accessed on 30 October 2025).
27. Zhou, L.; Huang, R.; Wang, B. Enhancing Multi-Key Fully Homomorphic Encryption with Efficient Key Switching and Batched Multi-Hop Computations. Appl. Sci. 2025, 15, 5771.
28. Hwang, I.; Seo, J.; Song, Y. Optimizing HE operations via Level-aware Key-switching Framework. In Proceedings of the 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography, Copenhagen, Denmark, 26 November 2023; pp. 59–67.
29. Wang, W.; Hu, Y.; Chen, L.; Huang, X.; Sunar, B. Accelerating fully homomorphic encryption using GPU. In Proceedings of the 2012 IEEE Conference on High Performance Extreme Computing, Waltham, MA, USA, 10–12 September 2012; IEEE: New York, NY, USA, 2012; pp. 1–5.
30. Wang, W.; Chen, Z.; Huang, X. Accelerating leveled fully homomorphic encryption using GPU. In Proceedings of the 2014 IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, Australia, 1–5 June 2014; IEEE: New York, NY, USA, 2014; pp. 2800–2803.
31. Dai, W.; Sunar, B. cuHE: A homomorphic encryption accelerator library. In Proceedings of the Cryptography and Information Security in the Balkans: Second International Conference, BalkanCryptSec 2015, Koper, Slovenia, 3–4 September 2015; pp. 169–186.
32. Al Badawi, A.; Veeravalli, B.; Mun, C.F.; Aung, K.M.M. High-performance FV somewhat homomorphic encryption on GPUs: An implementation using CUDA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018, 2018, 70–95.
33. Al Badawi, A.; Veeravalli, B.; Lin, J.; Xiao, N.; Kazuaki, M.; Mi, A.K.M. Multi-GPU design and performance evaluation of homomorphic encryption on GPU clusters. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 379–391.
34. Goey, J.Z.; Lee, W.K.; Goi, B.M.; Yap, W.S. Accelerating number theoretic transform in GPU platform for fully homomorphic encryption. J. Supercomput. 2021, 77, 1455–1474.
35. Alves, P.G.M.; Ortiz, J.N.; Aranha, D.F. Faster homomorphic encryption over GPGPUs via hierarchical DGT. In Proceedings of the International Conference on Financial Cryptography and Data Security, Virtual, 1–5 March 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 520–540.
36. vernamlab. cuFHE; GitHub. 2018. Available online: https://github.com/vernamlab/cuFHE (accessed on 5 November 2025).
37. Xiao, Y.; Liu, F.H.; Ku, Y.T.; Ho, M.C.; Hsu, C.F.; Chang, M.C.; Hung, S.H.; Chen, W.C. GPU Acceleration for FHEW/TFHE Bootstrapping. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2025, 2025, 314–339.
38. Shen, S.; Yang, H.; Liu, Z.; Liu, Y.; Lu, X.; Dai, W.; Zhou, L.; Zhao, Y.; Cheung, R.C.C. VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2025, 2025, 81–114.
39. Jin, S.; Shen, S.; Yang, H.; Chen, D.; Dai, W.; Cheung, R.C.C. CuFDFB: Fast and Private Computation on Non-Linear Functions Using FHE. Cryptol. ePrint Arch. 2025, 14, 1–13. Available online: https://eprint.iacr.org/2025/1096 (accessed on 30 October 2025).
40. Jung, W.; Kim, S.; Ahn, J.H.; Cheon, J.H.; Lee, Y. Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 114–148.
41. Yang, H.; Shen, S.; Dai, W.; Zhou, L.; Liu, Z.; Zhao, Y. Phantom: A CUDA-accelerated Word-Wise Homomorphic Encryption Library. IEEE Trans. Dependable Secur. Comput. 2024, 21, 4895–4906.
42. Ozcan, A.S.; Savas, E. HEonGPU: A GPU-Based Fully Homomorphic Encryption Library 1.0. Cryptol. ePrint Arch. 2024. Available online: https://eprint.iacr.org/2024/1543 (accessed on 30 October 2025).
43. Fan, S.; Wang, Z.; Xu, W.; Hou, R.; Meng, D.; Zhang, M. TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU. In Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Montreal, QC, Canada, 25 February–1 March 2023; pp. 922–934, ISSN 2378-203X.
44. Fan, G.; Zhang, M.; Zheng, F.; Fan, S.; Zhou, T.; Deng, X.; Tang, W.; Kong, L.; Song, Y.; Yan, S. WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores. In Proceedings of the 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), Las Vegas, NV, USA, 1–5 March 2025; pp. 1187–1200, ISSN 2378-203X.
45. Kim, J.; Choi, W.; Ahn, J.H. Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs. arXiv 2024, arXiv:2407.13055.
46. Mono, J.; Güneysu, T. A New Perspective on Key Switching for BGV-like Schemes. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2025, 2025, 763–794.




| Set # | BFV Hybrid [26,41] (ms) | BFV Ours (ms) | BFV Speedup | BGV Hybrid [26,41] (ms) | BGV Ours (ms) | BGV Speedup | CKKS Hybrid [26,41] (ms) | CKKS Ours (ms) | CKKS Speedup |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.94 | 0.85 | 1.10× | 1.03 | 0.97 | 1.06× | 1.00 | 0.95 | 1.06× |
| 2 | 0.61 | 0.68 | 0.89× | 0.67 | 0.74 | 0.91× | 0.66 | 0.72 | 0.92× |
| 3 | 0.49 | 0.71 | 0.69× | 0.55 | 0.81 | 0.67× | 0.53 | 0.73 | 0.73× |
| 4 | 1.36 | 1.19 | 1.15× | 1.49 | 1.36 | 1.10× | 1.48 | 1.32 | 1.12× |
| 5 | 0.86 | 0.79 | 1.09× | 0.96 | 0.89 | 1.08× | 0.92 | 0.88 | 1.04× |
| 6 | 0.64 | 0.79 | 0.80× | 0.75 | 0.85 | 0.89× | 0.71 | 0.87 | 0.81× |
| 7 | 1.86 | 1.51 | 1.23× | 1.93 | 1.63 | 1.18× | 1.94 | 1.67 | 1.17× |
| 8 | 1.14 | 1.03 | 1.11× | 1.20 | 1.13 | 1.07× | 1.17 | 1.12 | 1.05× |
| 9 | 0.80 | 0.92 | 0.88× | 0.89 | 1.03 | 0.87× | 0.91 | 1.02 | 0.89× |
| 10 | 5.58 | 3.30 | 1.69× | 5.66 | 3.73 | 1.52× | 5.67 | 3.62 | 1.57× |
| 11 | 3.46 | 2.96 | 1.17× | 3.55 | 3.23 | 1.10× | 3.56 | 3.17 | 1.12× |
| 12 | 2.48 | 2.29 | 1.09× | 2.70 | 2.58 | 1.05× | 2.70 | 2.53 | 1.07× |
| 13 | 1.91 | 2.02 | 0.95× | 2.11 | 2.26 | 0.94× | 2.04 | 2.30 | 0.89× |
| 14 | 8.62 | 4.41 | 1.95× | 8.62 | 4.77 | 1.81× | 8.54 | 4.72 | 1.81× |
| 15 | 5.23 | 4.13 | 1.27× | 5.39 | 4.53 | 1.19× | 5.32 | 4.44 | 1.20× |
| 16 | 3.90 | 2.87 | 1.36× | 4.07 | 3.23 | 1.26× | 4.01 | 3.29 | 1.22× |
| 17 | 2.89 | 2.63 | 1.10× | 3.09 | 2.91 | 1.06× | 3.06 | 2.92 | 1.05× |
| 18 | 12.00 | 5.81 | 2.06× | 12.12 | 6.29 | 1.93× | 12.17 | 6.27 | 1.94× |
| 19 | 7.41 | 5.27 | 1.41× | 7.52 | 5.68 | 1.32× | 7.55 | 5.68 | 1.33× |
| 20 | 5.22 | 3.73 | 1.40× | 5.51 | 4.15 | 1.33× | 5.38 | 4.15 | 1.30× |
| 21 | 4.15 | 3.14 | 1.32× | 4.33 | 3.62 | 1.20× | 4.26 | 3.62 | 1.17× |

| Set # | BFV Hybrid [26,41] (ms) | BFV Ours (ms) | BFV Speedup | BGV Hybrid [26,41] (ms) | BGV Ours (ms) | BGV Speedup | CKKS Hybrid [26,41] (ms) | CKKS Ours (ms) | CKKS Speedup |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.65 | 0.60 | 1.07× | 0.71 | 0.67 | 1.06× | 0.71 | 0.67 | 1.05× |
| 2 | 0.37 | 0.45 | 0.83× | 0.41 | 0.51 | 0.80× | 0.40 | 0.50 | 0.80× |
| 3 | 0.29 | 0.41 | 0.71× | 0.34 | 0.47 | 0.72× | 0.33 | 0.46 | 0.72× |
| 4 | 0.91 | 0.85 | 1.07× | 0.98 | 0.92 | 1.06× | 0.97 | 0.91 | 1.06× |
| 5 | 0.54 | 0.54 | 1.00× | 0.58 | 0.60 | 0.97× | 0.57 | 0.59 | 0.98× |
| 6 | 0.39 | 0.47 | 0.83× | 0.44 | 0.53 | 0.82× | 0.42 | 0.53 | 0.79× |
| 7 | 1.20 | 1.15 | 1.04× | 1.29 | 1.24 | 1.04× | 1.28 | 1.23 | 1.04× |
| 8 | 0.73 | 0.70 | 1.04× | 0.79 | 0.77 | 1.02× | 0.78 | 0.76 | 1.03× |
| 9 | 0.51 | 0.56 | 0.91× | 0.57 | 0.64 | 0.90× | 0.56 | 0.62 | 0.91× |
| 10 | 3.84 | 3.03 | 1.27× | 4.03 | 3.23 | 1.25× | 4.04 | 3.21 | 1.26× |
| 11 | 2.37 | 2.59 | 0.92× | 2.52 | 2.75 | 0.92× | 2.51 | 2.73 | 0.92× |
| 12 | 1.74 | 1.88 | 0.93× | 1.88 | 2.03 | 0.93× | 1.86 | 2.02 | 0.92× |
| 13 | 1.34 | 1.58 | 0.85× | 1.46 | 1.76 | 0.83× | 1.45 | 1.72 | 0.84× |
| 14 | 6.18 | 4.15 | 1.49× | 6.38 | 4.42 | 1.44× | 6.38 | 4.40 | 1.45× |
| 15 | 3.83 | 3.85 | 0.99× | 3.90 | 4.07 | 0.96× | 3.89 | 4.06 | 0.96× |
| 16 | 2.86 | 2.56 | 1.12× | 2.94 | 2.78 | 1.06× | 2.91 | 2.77 | 1.05× |
| 17 | 2.19 | 2.22 | 0.99× | 2.26 | 2.43 | 0.93× | 2.24 | 2.42 | 0.93× |
| log N | Set # | ℓ | r | Optimal | |
|---|---|---|---|---|---|
| 15 | 1 | 16 | 15 | 1 | 4 |
| 15 | 2 | 16 | 14 | 2 | 5 |
| 15 | 3 | 16 | 13 | 3 | 5 |
| 15 | 4 | 20 | 19 | 1 | 4 |
| 15 | 5 | 20 | 18 | 2 | 5 |
| 15 | 6 | 20 | 17 | 3 | 5 |
| 15 | 7 | 24 | 23 | 1 | 4 |
| 15 | 8 | 24 | 22 | 2 | 5 |
| 15 | 9 | 24 | 21 | 3 | 7 |
| 16 | 10 | 32 | 31 | 1 | 5 |
| 16 | 11 | 32 | 30 | 2 | 3 |
| 16 | 12 | 32 | 29 | 3 | 3 |
| 16 | 13 | 32 | 28 | 4 | 6 |
| 16 | 14 | 40 | 39 | 1 | 5 |
| 16 | 15 | 40 | 38 | 2 | 3 |
| 16 | 16 | 40 | 37 | 3 | 5 |
| 16 | 17 | 40 | 36 | 4 | 6 |
| 16 | 18 | 48 | 47 | 1 | 5 |
| 16 | 19 | 48 | 46 | 2 | 3 |
| 16 | 20 | 48 | 45 | 3 | 5 |
| 16 | 21 | 48 | 44 | 4 | 6 |

| Set # | CPU [6] (ms) | NVIDIA A100 Time (ms) | NVIDIA A100 Speedup | NVIDIA RTX 4090 Time (ms) | NVIDIA RTX 4090 Speedup |
|---|---|---|---|---|---|
| 1 | 84 | 0.95 | 88.59× | 0.67 | 124.66× |
| 2 | 62 | 0.72 | 86.68× | 0.50 | 123.04× |
| 3 | 66 | 0.73 | 90.35× | 0.46 | 143.87× |
| 4 | 112 | 1.32 | 85.01× | 0.91 | 122.69× |
| 5 | 85 | 0.88 | 96.15× | 0.59 | 144.52× |
| 6 | 87 | 0.87 | 99.61× | 0.53 | 164.61× |
| 7 | 140 | 1.67 | 84.07× | 1.23 | 113.68× |
| 8 | 110 | 1.12 | 98.13× | 0.76 | 144.39× |
| 9 | 101 | 1.02 | 99.26× | 0.62 | 162.05× |
| 10 | 451 | 3.62 | 124.75× | 3.21 | 140.57× |
| 11 | 352 | 3.17 | 110.95× | 2.73 | 128.96× |
| 12 | 335 | 2.53 | 132.63× | 2.02 | 165.50× |
| 13 | 301 | 2.30 | 131.01× | 1.72 | 174.67× |
| 14 | 604 | 4.72 | 127.92× | 4.40 | 137.34× |
| 15 | 508 | 4.44 | 114.31× | 4.06 | 125.14× |
| 16 | 479 | 3.29 | 145.43× | 2.77 | 172.64× |
| 17 | 438 | 2.92 | 149.91× | 2.42 | 181.35× |

| Polynomial Degree N | ℓ | Max Single Decomposition Key Size (MB) | Max Double Decomposition Key Size (MB) |
|---|---|---|---|
| | 60 | 221 | 9071 |
| | 60 | 443 | 18,143 |
| | 60 | 885 | 36,285 |
| | 60 | 1770 | 72,570 |
| | 60 | 3540 | 145,140 |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jin, S.; Cheung, R.C.C. GPU Acceleration for KLSS Key Switching in Fully Homomorphic Encryption. Mathematics 2025, 13, 3809. https://doi.org/10.3390/math13233809

