Article

Optimized Hybrid Central Processing Unit–Graphics Processing Unit Workflow for Accelerating Advanced Encryption Standard Encryption: Performance Evaluation and Computational Modeling

Division of Artificial Intelligence Convergence Engineering, Sahmyook University, Seoul 01795, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3863; https://doi.org/10.3390/app15073863
Submission received: 17 February 2025 / Revised: 24 March 2025 / Accepted: 28 March 2025 / Published: 1 April 2025


Featured Application

This study provides an efficient GPU-accelerated AES encryption and decryption method, along with a hybrid CPU–GPU workflow, which is highly applicable to cloud-based data encryption and protection systems. The proposed approach ensures the rapid and secure processing of large-scale data during storage and transmission in cloud environments, offering significant reductions in processing time and operational costs for cloud service providers.

Abstract

This study addresses the growing demand for scalable data encryption by evaluating the performance of AES (Advanced Encryption Standard) encryption and decryption using CBC (Cipher Block Chaining) and CTR (Counter Mode) modes across various CPU (Central Processing Unit) and GPU (Graphics Processing Unit) hardware models. The objective is to highlight GPU acceleration benefits and propose an optimized hybrid CPU–GPU workflow for large-scale data security. Methods include benchmarking encryption performance, supported by mathematical models and computational analysis. The results indicate significant performance gains with GPU acceleration, particularly for large datasets, and demonstrate that the hybrid CPU–GPU approach balances speed and resource utilization efficiently.

1. Introduction

The rapid advancement of ICT (information and communication technologies), including AI (Artificial Intelligence), IoT (Internet of Things), and telecommunications, has led to an exponential increase in data generation and utilization [1,2]. Ensuring robust data security is essential, particularly for sensitive information such as financial records, medical data, and digital multimedia content, which are vulnerable to piracy, copyright infringement, and data breaches [3,4]. As data size and complexity grow, efficient computational methods and strong encryption algorithms have become indispensable [5,6].
The surge in big data has introduced challenges in maintaining data confidentiality and integrity during storage, transmission, and processing [7]. Traditional CPU (Central Processing Unit)-based encryption methods, though reliable, often struggle with the computational demands of large-scale operations, leading to performance bottlenecks [8,9]. Parallel computing solutions, particularly GPUs (Graphics Processing Units), have emerged as effective tools due to their high throughput and parallel processing capabilities [10].
GPUs, initially designed for graphics rendering, have evolved into versatile parallel processors capable of handling complex mathematical operations, making them suitable for cryptographic tasks [2,11,12]. GPU acceleration enhances encryption and decryption processes and supports real-time data security applications in cloud computing, multimedia streaming, and secure communications [13,14,15].
AES (Advanced Encryption Standard) [9,10], standardized by NIST (National Institute of Standards and Technology), remains a cornerstone of symmetric encryption for its balance of security, performance, and adaptability [16]. AES-128 is widely used for its efficiency and strong security [17]. However, the sequential nature of CBC (Cipher Block Chaining) mode poses challenges for parallel encryption, whereas CTR (Counter Mode) mode’s inherent parallelism is well suited for GPU acceleration [18].
This study benchmarks AES encryption in CBC and CTR modes across multiple CPU and GPU models, comparing their encryption performance. It emphasizes the benefits of GPU acceleration and presents a hybrid CPU–GPU workflow that optimizes performance and resource utilization based on data size and operational requirements. The growing demand for scalable and efficient encryption solutions in data-intensive applications underscores the significance of this study. By harnessing the parallelism of GPUs and the flexibility of CPUs, the proposed hybrid approach balances speed and resource efficiency, making it suitable for modern encryption needs.
Experiments applied AES encryption in CBC and CTR modes across different CPU and GPU hardware models, analyzing performance under varying data sizes. Performance data and mathematical models were used to calculate encryption times, GPU resource utilization, and cost-performance metrics. The results demonstrated that GPUs significantly accelerate large-scale data encryption, while the hybrid CPU–GPU approach enhances performance for specific workloads. This study highlights the importance of hardware selection for cryptographic tasks and illustrates the effectiveness of integrating theoretical models with performance analysis. Practical applications of the proposed hybrid workflow include cloud computing, data centers, and real-time processing systems; detailed background on GPU architecture, encryption challenges, and prior research on parallel cryptographic processing is provided for context.
Existing CPU-only encryption methods face limitations in processing large-scale data efficiently, while GPU-only methods can suffer from suboptimal resource utilization and I/O bottlenecks. This study aims to bridge this gap by proposing and evaluating a hybrid CPU–GPU encryption workflow.
The objective of this study is to benchmark AES encryption performance across various CPU and GPU hardware models, highlighting the computational benefits provided by GPU acceleration. Additionally, we propose an optimized hybrid CPU–GPU workflow designed to balance workload distribution efficiently, tailored to specific data sizes and operational scenarios. To achieve this, we first formulate the research problem and identify key computational bottlenecks in AES encryption (Section 2.1). We then analyze CPU and GPU architectures (Section 2.2) and describe the AES-128 algorithm, emphasizing CBC and CTR modes suitable for parallel computing (Section 2.3). Next, we introduce our proposed conceptual hybrid CPU–GPU workflow and the structured experimental approach adopted for its evaluation (Section 2.4). Subsequently, we present detailed descriptions of five hybrid encryption methods: data shuffling, I/O bottleneck reduction, adaptive workload distribution, ping-pong processing, and bit-slicing parallelization (Section 2.5). Computational models and analyses applied to quantitatively evaluate performance improvements, resource utilization, and energy efficiency are detailed in Section 2.6. Finally, we describe the experimental environment, hardware specifications, and software tools employed to ensure reproducibility and validation of our results (Section 2.7). The outcomes of our comprehensive evaluations demonstrate the effectiveness of the hybrid CPU–GPU encryption methods, offering valuable insights and practical guidelines for efficient large-scale data encryption.

Related Work

Previous research has explored GPU-based AES encryption, primarily focusing on leveraging GPU architectures to accelerate encryption processes. For instance, NVIDIA’s implementation demonstrated that GPU-based encryption could achieve speeds almost 1.7 times faster than traditional CPU-based methods. Similarly, studies have achieved up to 60 Gbps throughput on NVIDIA Tesla GPUs, showcasing significant performance improvements over sequential CPU implementations [19].
Hybrid computing approaches combining CPUs and GPUs have also been investigated to enhance computational efficiency. Udagawa and Sekijima proposed a method to balance workloads between CPUs and GPUs, aiming for efficient processor utilization and acceleration. Additionally, hybrid co-processing has been applied to various computational problems, demonstrating the potential of collaborative CPU–GPU utilization [20,21].
However, these studies often focus on specific aspects of CPU–GPU interactions or target particular applications without implementing a fully adaptive workload balancing strategy. Our framework differentiates itself by introducing a comprehensive adaptive hybrid approach that dynamically leverages CPU strengths for tasks such as metadata and small block encryption, while fully exploiting GPU parallelism for bulk data processing. This balanced methodology offers clear performance and efficiency advantages not thoroughly addressed in prior studies.

2. Proposed Framework and Experimental Methodology

Large datasets ranging from megabytes to gigabytes were used to assess the encryption and decryption performance. Data files in raw byte format were encrypted without restrictions on file type or extension to provide a broad comparison of hardware performance. This approach allowed the accurate evaluation of computational performance by minimizing the influence of file-specific overhead such as compression and metadata.

2.1. Problem Formulation

Despite the recognized security and efficiency of AES encryption, traditional CPU-based implementations encounter significant performance bottlenecks in large-scale data encryption tasks due to their inherently sequential execution characteristics [9,10]. GPU-based encryption, leveraging parallel processing capabilities, presents a promising alternative. However, purely GPU-based approaches often suffer from issues such as inefficient workload distribution, high memory transfer overhead, and the suboptimal utilization of CPU resources [19,20]. Addressing these limitations necessitates a carefully designed hybrid CPU–GPU approach to optimally balance computational tasks, efficiently distributing workloads according to data characteristics and resource availability.

2.2. CPU and GPU Architectures

Figure 1 illustrates the fundamental differences in AES encryption and decryption processes between CPUs and GPUs. As depicted, CPUs handle encryption sequentially (Figure 1a), while GPUs utilize parallel threads to concurrently process multiple encryption blocks (Figure 1b), emphasizing GPU suitability for parallel cryptographic operations.
Performance was quantified using the theoretical throughput equation:
Throughput = (Operations / Time) × Number of Cores

2.3. AES-128 Encryption Algorithm and Models

Figure 2 presents the AES-128 encryption algorithm flowchart, detailing each encryption step. The process begins with loading a 128-bit plaintext block, followed by an initial AddRoundKey operation. A conditional loop checks if the current round (r) is less than the total number of rounds (Nr). If true, the algorithm performs the SubBytes, ShiftRows, MixColumns, and AddRoundKey operations sequentially, incrementing the round count until the final round is reached. In the last round, MixColumns is omitted, and the ciphertext is stored after the final AddRoundKey operation. This flowchart clarifies the AES encryption process, showing the iterative nature of the encryption steps and their importance in securing data.
Figure 3 illustrates the CBC and CTR encryption modes used in AES-128. In Figure 3a, CBC mode shows that each plaintext block is XORed with the previous ciphertext block before encryption, making it inherently sequential and limiting parallel processing. Decryption reverses this process by XORing each decrypted block with the previous ciphertext. In Figure 3b, CTR mode demonstrates parallel-friendly encryption where a counter block (Nonce and Counter) is encrypted to produce a key stream, which is XORed with the plaintext blocks independently, enabling concurrent encryption and decryption operations. This independence highlights the CTR mode’s suitability for GPU-based parallel processing.
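The block independence that makes CTR mode GPU-friendly can be sketched in a few lines of Python. The snippet below is illustrative only: a keyed hash stands in for the AES block cipher so the example stays self-contained, and all names are our own rather than the paper's implementation.

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES-128 so the sketch needs no external crypto library.
    return hashlib.sha256(key + block).digest()[:16]

def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # CTR mode: keystream block i = E(key, nonce || counter_i).
    # Each keystream block depends only on its own counter, so the
    # blocks can be produced by independent GPU threads.
    out = bytearray()
    for i in range(0, len(plaintext), 16):
        counter = (i // 16).to_bytes(8, "big")
        keystream = toy_block_cipher(key, nonce + counter)
        out.extend(b ^ k for b, k in zip(plaintext[i:i + 16], keystream))
    return bytes(out)

key, nonce = b"K" * 16, b"N" * 8
msg = b"independent blocks enable parallel CTR"
ct = ctr_encrypt(key, nonce, msg)
# In CTR mode, decryption is the same XOR operation as encryption.
assert ctr_encrypt(key, nonce, ct) == msg
```

In CBC mode, by contrast, block i cannot be encrypted before block i − 1 is finished, which is why CBC encryption resists this kind of parallel decomposition.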

2.4. Proposed Hybrid CPU–GPU Workflow

2.4.1. Conceptual Model

The proposed hybrid CPU–GPU workflow optimizes AES encryption by dynamically allocating computational workloads to CPUs and GPUs based on the characteristics of the data being encrypted. Initially, input data are analyzed to determine their size and computational complexity. Small data blocks and metadata are assigned to the CPU, exploiting its strengths in sequential and control-oriented tasks. In contrast, larger data chunks, which are more suited for parallel processing, are assigned to the GPU for efficient parallel encryption. Following parallel encryption, encrypted outputs from the CPU and GPU are combined to generate the final encrypted result (Figure 4).

2.4.2. Experimental Flowchart

The experimental approach to validating the proposed hybrid workflow involved clearly defined steps, as illustrated in Figure 5. First, datasets of various sizes ranging from megabytes to gigabytes were prepared. Benchmarking scripts were executed multiple times (ten iterations each) to measure encryption performance precisely, employing Python 3.7’s “timeit” library and NVIDIA’s profiling tool “nvprof”. Based on iterative feedback from performance data, chunk sizes, CPU–GPU task ratios, and thread counts were dynamically optimized. Statistical analyses were performed using NumPy and SciPy libraries to verify the consistency and reliability of the results. This structured experimental flow ensured accurate evaluation and reproducibility.

2.5. Proposed Encryption Workflow with Hybrid CPU–GPU Strategies

This hybrid approach balances workload distribution effectively, maximizing parallel processing on GPUs while efficiently leveraging CPU strengths for sequential tasks, small block encryption, and metadata processing. Prior research has demonstrated that parallel AES algorithms implemented on CPU–GPU heterogeneous platforms can significantly enhance energy efficiency, highlighting the practicality and relevance of hybrid encryption approaches [22].

2.5.1. Data Shuffling

In this method, data are partitioned into blocks, and a seeded pseudo-random generator assigns each block to either the CPU or GPU, as shown in Algorithm 1. For each block, a value R is drawn uniformly from the interval [0, 1); if R < 0.5, the block is assigned to the CPU; otherwise, it is assigned to the GPU. This randomization balances the workload and helps avoid predictable encryption patterns, enhancing both performance and security. The CPU processes its assigned blocks sequentially, while the GPU encrypts its assigned blocks in parallel [22].
Algorithm 1: Data shuffling for hybrid CPU–GPU encryption
This algorithm partitions input data into multiple blocks and assigns each block to either the CPU or GPU using a seeded random value. This approach aims to balance workload and enhance parallel encryption efficiency.
1. Partition data D into n blocks
2. Generate random seed R
3. for each block d_i in D do
4.   Assign d_i to CPU or GPU based on R
5.   if d_i assigned to CPU then
6.     CPU_encrypt(d_i)
7.   else
8.     GPU_encrypt(d_i)
9. Combine all encrypted blocks
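Algorithm 1 can be sketched in a few lines of Python; the block size, seed, and the "cpu"/"gpu" labels below are illustrative placeholders for the actual encryption dispatch, not the paper's implementation.

```python
import random

def shuffle_assign(data: bytes, block_size: int, seed: int):
    # Partition data into fixed-size blocks (Algorithm 1, step 1).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    rng = random.Random(seed)  # seeded generator (step 2)
    # R < 0.5 -> CPU, otherwise GPU (step 4); seeding keeps the split
    # reproducible while remaining unpredictable block to block.
    return [("cpu" if rng.random() < 0.5 else "gpu", blk) for blk in blocks]

assignments = shuffle_assign(b"x" * 64, block_size=16, seed=42)
assert len(assignments) == 4
# The same seed yields the same CPU/GPU split on every run.
assert assignments == shuffle_assign(b"x" * 64, block_size=16, seed=42)
```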

2.5.2. I/O Bottleneck Reduction

As shown in Algorithm 2, this method reduces GPU memory transfer bottlenecks by having the CPU pre-encrypt metadata and small data blocks while the GPU handles bulk data encryption in parallel. Small blocks (S) are empirically defined as data chunks smaller than 4 MB, based on benchmarking that indicated reduced GPU memory transfer overhead at this threshold. Data larger than 4 MB are considered normal or large and are processed directly by GPU encryption. The reduction in data transfer time, given by T_transfer = (TotalData − PreEncryptedData) / Bandwidth, improves overall encryption performance by minimizing GPU idle time. Recent studies on CPU–GPU heterogeneous platforms have shown that efficient workload partitioning significantly reduces data transfer overheads, thereby improving energy efficiency and overall encryption performance.
Algorithm 2: I/O bottleneck reduction with CPU pre-encryption
This method reduces data transfer bottlenecks by encrypting metadata and smaller data blocks on the CPU in advance, minimizing GPU idle time and improving overall encryption throughput.
1. CPU_encrypt(metadata M and small blocks S)
2. GPU_encrypt(remaining data D − M − S)
3. Combine encrypted metadata, small blocks, and bulk data
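The transfer-time saving that motivates this method follows directly from the formula above. A minimal sketch, using illustrative byte counts and bandwidth (not measured values from the paper):

```python
SMALL_BLOCK_LIMIT = 4 * 1024**2  # 4 MB threshold from the benchmarks

def transfer_time(total_bytes: int, pre_encrypted_bytes: int,
                  bandwidth_bytes_per_s: float) -> float:
    # T_transfer = (TotalData - PreEncryptedData) / Bandwidth
    return (total_bytes - pre_encrypted_bytes) / bandwidth_bytes_per_s

# 1 GB payload, 64 MB pre-encrypted on the CPU, 8 GB/s effective
# transfer bandwidth (illustrative figures only).
t = transfer_time(1024**3, 64 * 1024**2, 8 * 1024**3)
assert abs(t - 0.1171875) < 1e-12  # ~117 ms of transfer remains
```

Every byte the CPU pre-encrypts is a byte that never crosses the PCIe bus, which is where the idle-time reduction comes from.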

2.5.3. Adaptive Workload Distribution

Algorithm 3 explains the adaptive workload distribution, where tasks are dynamically allocated based on data size. Smaller data are encrypted by the CPU, while larger datasets are processed by the GPU, with the hybrid encryption time minimized as T_hybrid = min(T_CPU, T_GPU), optimizing resource usage. The threshold value (N_threshold) was set at 32 MB, determined through iterative benchmarking, indicating optimal CPU–GPU workload distribution. Lower N_threshold values increase CPU utilization but might reduce overall throughput, while higher values effectively utilize GPU resources but might increase initial CPU overhead. The adaptive workload distribution strategy is further supported by recent research, highlighting substantial performance improvements when dynamically adjusting CPU and GPU tasks based on data size and resource availability in heterogeneous computing environments [23].
Algorithm 3: Adaptive CPU–GPU workload distribution
The workload distribution between CPU and GPU is determined adaptively based on data size relative to a predefined threshold N_threshold. Smaller datasets are processed by CPU encryption, while larger datasets are efficiently handled by GPU encryption.
1. Measure data size N
2. if N < N_threshold then
3.   CPU_encrypt(N)
4. else
5.   GPU_encrypt(N)
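Algorithm 3 amounts to a single threshold comparison plus the T_hybrid bound. A sketch with the 32 MB threshold reported above; the dispatch labels are placeholders for the real encryption paths:

```python
N_THRESHOLD = 32 * 1024**2  # 32 MB, from iterative benchmarking

def choose_processor(n_bytes: int, threshold: int = N_THRESHOLD) -> str:
    # Small payloads stay on the CPU; large ones go to the GPU.
    return "cpu" if n_bytes < threshold else "gpu"

def hybrid_time(t_cpu: float, t_gpu: float) -> float:
    # T_hybrid = min(T_CPU, T_GPU): dispatch to the faster processor.
    return min(t_cpu, t_gpu)

assert choose_processor(4 * 1024**2) == "cpu"    # 4 MB -> CPU
assert choose_processor(256 * 1024**2) == "gpu"  # 256 MB -> GPU
assert hybrid_time(0.8, 0.3) == 0.3
```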

2.5.4. Ping-Pong Processing

The ping-pong processing technique, explained in Algorithm 4, pipelines data processing between the CPU and GPU. The CPU processes a chunk and sends it to the GPU while working on the next chunk, ensuring continuous data flow and maximizing resource utilization, with total processing time T_pipeline = max(T_CPU, T_GPU).
Algorithm 4: Ping-pong CPU–GPU processing
CPU and GPU alternately encrypt data chunks in a pipelined fashion, maintaining continuous processing flow and maximizing resource efficiency.
1. for each chunk i in data D do
2.   if i mod 2 = 0 then
3.     CPU_encrypt(d_i)
4.   else
5.     GPU_encrypt(d_i)
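Algorithm 4 can be sketched with two worker threads standing in for the CPU and GPU paths. The XOR placeholder encryptors below are ours, not the paper's kernels; the point of the sketch is only the alternating, concurrent dispatch:

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_encrypt(chunk: bytes) -> bytes:
    return bytes(b ^ 0xAA for b in chunk)  # placeholder for an AES routine

def gpu_encrypt(chunk: bytes) -> bytes:
    return bytes(b ^ 0xAA for b in chunk)  # placeholder for a CUDA kernel

def ping_pong_encrypt(chunks):
    # Even-indexed chunks go to the CPU path, odd-indexed to the GPU
    # path; two workers keep both processors busy concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(cpu_encrypt if i % 2 == 0 else gpu_encrypt, c)
                   for i, c in enumerate(chunks)]
        return [f.result() for f in futures]  # results stay in chunk order

chunks = [bytes([i]) * 8 for i in range(4)]
out = ping_pong_encrypt(chunks)
assert out == [cpu_encrypt(c) for c in chunks]  # placeholders are identical
```

Because the two paths overlap, the pipeline's total time is governed by the slower processor, matching T_pipeline = max(T_CPU, T_GPU).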

2.5.5. Bit-Slicing Parallelization

As shown in Algorithm 5, bit-slicing parallelization breaks data into slices at the bit level, allowing simultaneous AES operations on the GPU. The CPU prepares the slices, and the GPU processes them concurrently, achieving a speedup approximated by O(N/k), where k is the number of bit slices processed in parallel. This method significantly enhances parallel processing efficiency. Additionally, parallel implementations such as Galois/Counter Mode (GCM) on GPUs demonstrate significant throughput enhancements, further validating the efficacy of bit-slicing parallelization techniques for accelerating AES encryption on modern GPUs [24].
Algorithm 5: Bit-slicing parallelization in GPU
This algorithm divides data into bit-level slices, enabling GPU threads to encrypt multiple data slices simultaneously, significantly increasing parallel processing performance.
1. CPU prepares k-bit slices of data D
2. GPU runs parallel AES operations on each bit slice
3. Combine all encrypted slices into final output
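True bit-slicing rearranges data at the bit level across machine words; the byte-level interleave below is only a sketch of the slice/process/recombine flow in Algorithm 5, with all names ours:

```python
def make_slices(data: bytes, k: int):
    # CPU side: split the payload into k interleaved slices that can be
    # processed independently (a byte-level stand-in for bit slicing).
    return [data[i::k] for i in range(k)]

def combine_slices(slices, total_len: int) -> bytes:
    # Reassemble processed slices back into the original order (step 3).
    out = bytearray(total_len)
    for i, s in enumerate(slices):
        out[i::len(slices)] = s
    return bytes(out)

data = bytes(range(32))
slices = make_slices(data, k=4)  # four slices of eight bytes each
# Each slice could be handed to a separate group of GPU threads.
assert combine_slices(slices, len(data)) == data  # lossless round trip
```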

2.6. Computational Analysis and Models

GPU Performance Estimation Model: Amdahl’s Law was applied to estimate theoretical GPU speedup based on the parallelizable fraction of encryption tasks.
Speedup = T_CPU / T_GPU = 1 / ((1 − p) + p/n)
where p is the parallelizable fraction of the workload and n is the number of parallel processing units.
Time Complexity Modeling: Linear regression analyzed encryption time as a function of data size:
T = aN + b
Resource Utilization Analysis: GPU core usage, memory bandwidth, and clock speed were evaluated using:
U = (Used Resources / Total Resources) × 100%
Cost-Performance Analysis: GPU cost versus encryption throughput was calculated:
C = Cost / Throughput
Energy Consumption Estimation: Power consumption during encryption was estimated based on GPU TDP and encryption time:
E = P × T
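Each of the models above reduces to a one-line function. The sketch below uses illustrative figures only (a 95% parallel fraction and hypothetical core count, price, and TDP), not measurements from this study:

```python
def amdahl_speedup(p: float, n: int) -> float:
    # Amdahl's Law: p = parallelizable fraction, n = parallel units.
    return 1.0 / ((1.0 - p) + p / n)

def utilization(used: float, total: float) -> float:
    return used / total * 100.0  # U, in percent

def cost_performance(cost: float, throughput: float) -> float:
    return cost / throughput     # C: lower is cheaper per unit throughput

def energy(power_w: float, time_s: float) -> float:
    return power_w * time_s      # E = P * T, in joules

# With 95% of the work parallelizable, even 1024 cores cap out near 20x:
assert 19.6 < amdahl_speedup(0.95, 1024) < 19.7
assert abs(utilization(6800, 8704) - 78.125) < 1e-9  # hypothetical core usage
assert energy(320.0, 0.5) == 160.0                   # 320 W TDP for 0.5 s
```

The Amdahl bound makes the practical point of Section 2.5 explicit: the serial fraction (I/O, metadata, CBC chaining) limits GPU gains, which is exactly what the hybrid methods attack.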

2.7. Experimental Environment and Setup

Experiments utilized three NVIDIA GPUs—GTX 1080 Ti (Pascal) (NVIDIA, Santa Clara, CA, USA), RTX 2080 Ti (Turing), and RTX 3080 (Ampere)—and multi-core Intel/AMD CPUs (AMD, Santa Clara, CA, USA) as baselines (Table 1). Python-based benchmarking scripts were executed with “timeit”, and NVIDIA’s “nvprof” profiled GPU workloads. Statistical analyses employed the NumPy/SciPy libraries (scipy.stats.linregress), ensuring reliability via repeated measurements (10 iterations each). All code, libraries (CUDA (Compute Unified Device Architecture), cryptographic), and datasets are available upon request for reproducibility.

3. Results

3.1. Performance Comparison of CPU and GPU-Based AES Encryption

In our experiments, we utilized raw binary data files without additional metadata or compression to accurately measure pure computational encryption throughput.
Different file formats (e.g., compressed images, videos, and structured text) may influence actual encryption performance due to variable I/O overhead and preprocessing requirements, but these variations were deliberately excluded from this study to maintain a clear focus on computational speed comparisons. Figure 6 presents a comparison of encryption time across various CPUs (Intel 6-core, AMD 7-core, and Intel 12-core) and GPUs (GTX 1080 Ti, RTX 2080 Ti, and RTX 3080) for increasing file sizes ranging from 100 MB to 1200 MB. As shown, GPU-based encryption consistently outperforms CPU-based encryption, with encryption times remaining under 1 s even for the largest file sizes. This performance gain is attributed to the massive parallelism offered by GPUs, making them suitable for large-scale data encryption tasks.

3.2. GPU Performance Across Different Models

Figure 7 illustrates the encryption time comparison among the three GPU models tested. Despite differences in architecture and specifications, the encryption times show minor variations, with the RTX 3080 marginally outperforming the others due to its higher CUDA core count and memory bandwidth. This result indicates that while hardware improvements enhance performance, the fundamental parallelism of the AES-CTR mode ensures consistent results across models.

3.3. Evaluation of Hybrid CPU–GPU Methods

In the evaluation of the proposed hybrid CPU–GPU encryption methods, the experimental flow chart shown in Figure 8 outlines the detailed process undertaken to assess the performance improvements. This flow chart highlights key steps such as data size analysis, chunk size and thread count definition, key generation, and iterative optimization through benchmarking and workload simulations across both CPU and GPU processes. Each method, including data shuffling, I/O bottleneck reduction, adaptive workload distribution, ping-pong processing, and bit-slicing parallelization, was systematically evaluated using this experimental approach. The iterative feedback loops ensured optimal parameter selection, and data exchange between CPU and GPU was managed efficiently to reflect realistic encryption scenarios. This experimental setup provided a comprehensive assessment of the proposed methods.
The encryption times presented in Table 2 highlight that all the proposed hybrid CPU–GPU methods significantly reduce encryption times compared to the GPU baseline, with bit-slicing parallelization achieving the fastest results across all the file sizes.
Table 3 further emphasizes the performance improvements, showing that bit-slicing yields the highest speedup (35–40%), while data shuffling also achieves notable gains (25–30%). These results underscore the efficiency of each hybrid method, with adaptive distribution, I/O bottleneck reduction, and ping-pong processing also contributing substantial performance enhancements across varying data sizes. This comprehensive evaluation illustrates that the proposed hybrid approaches effectively optimize encryption workloads by leveraging both CPU and GPU resources.

4. Discussion

The results presented in this study highlight the significant performance advantages of GPU-based AES encryption over traditional CPU-based methods, particularly when handling large datasets. As demonstrated in Figure 6 and Figure 7, GPU acceleration substantially reduces encryption times, making it a viable solution for data-intensive applications such as cloud computing and real-time processing. This aligns with previous studies that emphasized the benefits of GPU parallelism for cryptographic operations, but our study extends these findings by integrating a hybrid CPU–GPU approach tailored for optimal workload distribution [25,26,27].
The AES-CTR mode’s inherent parallelism facilitated efficient GPU utilization, while the CBC mode, though challenging due to its sequential dependency, benefited from our proposed hybrid methods. The bit-slicing parallelization method showed the highest performance improvement (35–40%), consistent with the theoretical advantages of bit-level parallelism highlighted. The adaptive workload distribution method, providing 15–25% improvement, reinforces the importance of dynamic task allocation in heterogeneous computing environments [19,28].
Our results also illustrate that while hardware advancements (such as increased CUDA cores and memory bandwidth in the RTX 3080) contribute to performance gains, the architectural differences among GPUs have a lesser impact than the employed encryption strategies. This finding is supported by prior research and underscores the importance of algorithmic optimization alongside hardware selection [28,29,30]. Although our analysis identifies the number of CUDA cores and memory bandwidth as key contributors to GPU performance improvements, a deeper exploration of the interactions between GPU hardware architecture (such as warp scheduling, shared memory utilization, and cache hierarchies) and AES encryption algorithms would provide further insight. Future studies should therefore conduct in-depth profiling using GPU performance monitoring tools (e.g., NVIDIA Nsight and nvprof), examining kernel execution patterns, memory access behaviors, and cache hit/miss rates, to precisely quantify these interactions and further optimize AES implementations on modern GPU architectures.
The hybrid CPU–GPU workflow, detailed in Figure 8, offers a balanced approach by leveraging the strengths of both CPUs (for sequential tasks and control operations) and GPUs (for parallel data processing). This synergy not only enhances performance but also optimizes resource utilization, making it particularly suitable for large-scale data encryption in distributed systems and cloud environments.
Furthermore, our study introduces practical techniques to address common bottlenecks in GPU-based encryption, such as I/O data transfer latency. The proposed I/O bottleneck reduction method achieved a 20% improvement by minimizing GPU idle time through CPU pre-encryption, echoing similar strategies in recent works.
The implications of these findings are far-reaching. In an era where data security is paramount, our proposed hybrid encryption methods offer scalable solutions that can be integrated into the existing security infrastructures, enhancing both performance and security. Future research could explore adaptive hybrid models that dynamically adjust encryption strategies based on real-time performance metrics, further improving efficiency.
Additionally, integrating advanced GPU optimization techniques, such as tensor cores for cryptographic computations, presents an exciting avenue for future exploration. Investigating the impact of varying encryption key sizes (AES-192 and AES-256) on hybrid workflows could also provide valuable insights, as would applying similar hybrid strategies to other cryptographic algorithms like RSA or ECC.

5. Conclusions

This study conducted a detailed evaluation of AES-128 encryption performance across a range of CPU and GPU hardware platforms, with a focus on designing and validating an optimized hybrid CPU–GPU encryption framework. By analyzing both the serial (CBC) and parallel (CTR) modes, we identified key performance bottlenecks such as I/O latency and workload imbalance. To address these, we proposed five enhancement strategies: data shuffling, I/O bottleneck reduction, adaptive workload distribution, ping-pong processing, and bit-slicing parallelization.
The experimental results demonstrated that the hybrid methods notably improved encryption throughput across different file sizes and hardware setups. In particular, bit-slicing parallelization achieved the highest performance gains on large datasets, while adaptive workload distribution consistently reduced execution time variance. These findings confirm the effectiveness of combining CPU–GPU collaboration with tailored optimization techniques for high-performance cryptographic computing.
Despite these improvements, certain limitations remain. The study focused primarily on NVIDIA GPU architectures, and the results may vary when applied to AMD or Intel GPU platforms. Additionally, the research was limited to AES-128 encryption, while different key lengths (AES-192 and AES-256) could exhibit varied performance characteristics. Furthermore, operating system and driver variations were not explored, which could impact real-world encryption speeds.
Future research should address these limitations by extending hybrid CPU–GPU optimization techniques to heterogeneous GPU architectures and investigating performance implications for different AES key lengths. Additionally, real-time adaptive encryption models should be developed to dynamically allocate tasks based on system workload and power efficiency. The integration of emerging GPU architectures, such as those optimized for AI acceleration, into cryptographic frameworks presents another promising avenue for exploration.
The proposed hybrid encryption strategies pave the way for more scalable, efficient, and hardware-optimized data security solutions, contributing significantly to the broader field of high-performance cryptographic computing.

Author Contributions

Conceptualization, M.K.Y. and J.-S.J.; Methodology, M.K.Y. and J.-S.J.; Software, J.-S.J.; Validation, M.K.Y. and J.-S.J.; Formal analysis, J.-S.J.; Investigation, J.-S.J.; Resources, M.K.Y.; Data curation, J.-S.J.; Writing – original draft, J.-S.J.; Writing – review & editing, M.K.Y. and J.-S.J.; Supervision, M.K.Y.; Project administration, M.K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (Approval No. RS-2024-00445552).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available upon request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AES: Advanced Encryption Standard
GPU: Graphics Processing Unit
CPU: Central Processing Unit
CBC: Cipher Block Chaining
CTR: Counter Mode
CUDA: Compute Unified Device Architecture

Figure 1. Schematic of encryption and decryption process (a) CPU, (b) GPU. Note: Ellipses in values (e.g., ‘…’) are used to indicate truncated representations for visual clarity.
Figure 2. Flowchart for AES algorithm.
Figure 3. Comparison of AES encryption and decryption workflows using (a) CBC mode and (b) CTR mode. In CBC mode, encryption is performed sequentially due to the dependency between blocks, while CTR mode allows parallel processing by precomputing key streams using counter blocks. Arrows represent data flow, and ellipses (“…”) indicate continuation to subsequent blocks.
Figure 4. Proposed conceptual hybrid CPU–GPU workflow.
Figure 5. Experimental flowchart for hybrid method evaluation. *The asterisk symbols indicate implementation details: “*Python/CUDA/nvprof Profiling” refers to the software environment used for benchmarking setup, and “AES-128 Encryption in CTR” indicates the encryption scheme used during execution.
Figure 6. Comparison of encryption time for different file sizes across CPU and GPU models.
Figure 7. Comparison of encryption time for different file sizes across three GPU models (GeForce GTX 1080 Ti, GeForce RTX 2080 Ti, and GeForce RTX 3080).
Figure 8. Experimental flow chart illustrating the evaluation process of the proposed hybrid CPU–GPU encryption methods, including data size analysis, chunk and thread management, key generation, benchmarking, workload simulation, and performance evaluation.
Table 1. Specifications of the GPUs (GeForce GTX 1080 Ti, RTX 2080 Ti, and RTX 3080).

Specification               GTX 1080 Ti   RTX 2080 Ti   RTX 3080
Architecture                Pascal        Turing        Ampere
CUDA Cores                  3584          4352          8704
Memory [GB]                 11            11            10
Memory Bandwidth [GB/s]     484           616           760
Boost Clock Speed [MHz]     1582          1545          1710
Table 2. Encryption times of hybrid CPU–GPU methods for different file sizes (in seconds).

Method                     75 MB    150 MB   300 MB   600 MB   1200 MB
GPU Baseline               0.05     0.09     0.18     0.36     0.72
Data Shuffling             0.04     0.07     0.14     0.28     0.56
I/O Bottleneck Reduction   0.042    0.08     0.16     0.32     0.64
Adaptive Distribution      0.043    0.081    0.162    0.324    0.648
Ping-Pong Processing       0.041    0.078    0.155    0.31     0.62
Bit-Slicing Parallel       0.037    0.07     0.14     0.28     0.56
Table 3. Performance improvements of hybrid CPU–GPU methods (percentages by file size).

Method                     Improvement (%)
Data Shuffling             25–30
I/O Bottleneck Reduction   20
Adaptive Distribution      15–25
Ping-Pong Processing       18–22
Bit-Slicing Parallel       35–40

Share and Cite

MDPI and ACS Style

Yang, M.K.; Jeong, J.-S. Optimized Hybrid Central Processing Unit–Graphics Processing Unit Workflow for Accelerating Advanced Encryption Standard Encryption: Performance Evaluation and Computational Modeling. Appl. Sci. 2025, 15, 3863. https://doi.org/10.3390/app15073863

