Article

High-Resolution Hogel Image Generation Using GPU Acceleration

by Hyunmin Kang, Byungjoon Kim and Yongduek Seo

1 Digital Healthcare Center, Gumi Electronics & Information Technology Research Institute, Gumi 39253, Republic of Korea
2 Department of Artificial Intelligence, Sogang University, Seoul 04107, Republic of Korea
3 Korean AI Certification, Seoul 04778, Republic of Korea
* Author to whom correspondence should be addressed.
Photonics 2025, 12(9), 882; https://doi.org/10.3390/photonics12090882
Submission received: 29 July 2025 / Revised: 25 August 2025 / Accepted: 30 August 2025 / Published: 1 September 2025
(This article belongs to the Special Issue Holographic Information Processing)

Abstract

A holographic stereogram displays reconstructed 3D images by rearranging multiple 2D viewpoint images into small holographic pixels (hogels). However, conventional CPU-based hogel generation processes these images sequentially, causing computation times to soar as the resolution and number of viewpoints increase, which makes real-time implementation difficult. In this study, we introduce a GPU-accelerated parallel processing method to speed up the generation of high-resolution hogel images and achieve near-real-time performance. Specifically, we implement the pixel-rearrangement algorithm for multiple viewpoint images as a CUDA-based GPU kernel, designing it so that thousands of threads process individual pixels simultaneously. We also optimize CPU–GPU data transfers and improve memory access efficiency to maximize GPU parallel performance. The experimental results show that the proposed method achieves over a 5× speedup compared to the CPU across resolutions from FHD to 8K while maintaining output image quality equivalent to that of the CPU approach. Notably, we confirm near-real-time performance by processing large-scale 8K-resolution inputs with 16 viewpoints in just tens of milliseconds. This achievement significantly alleviates the computational bottleneck in large-scale holographic image synthesis, bringing real-time 3D holographic displays one step closer to realization. Furthermore, the proposed GPU acceleration technique is expected to serve as a foundational technology for real-time high-resolution hogel image generation in next-generation immersive display devices such as AR/VR/XR.

1. Introduction

Hologram technology, which records and reproduces the wavefront information of three-dimensional objects to deliver realistic volumetric images, has attracted attention as a next-generation display technology [1,2]. In particular, advances in computer-generated holography (CGH) and holographic printing techniques have enabled a variety of applications for visualizing 3D virtual models in the real world [3,4]. Among these holographic 3D imaging methods, the holographic stereogram (HS) is the most widely used: it converts multiple 2D parallax images into an array of small holographic elements called hogels and reconstructs a 3D image incoherently [1]. Each hogel, as a single holographic element, displays different brightness levels from different viewing angles and forms a volumetric image through plane-wave diffraction. The advantage of the HS method is that unlike conventional wavefront-based holograms, it can record multi-angle images without complex diffraction calculations, making digital content generation relatively simple [1].
However, existing hogel image generation methods face significant limitations. Traditionally, hogel images are constructed by sequentially processing M × N parallax images using the pixel-rearrangement algorithm proposed by Bjelkhagen et al. [5], but this approach suffers from rapidly increasing memory usage and computation time as the number of perspectives grows [3,4]. For example, processing hundreds of parallax images for high-resolution 3D video requires vast memory resources and long computation times, posing a major obstacle to real-time processing [1,6]. In practice, even when employing various acceleration techniques proposed to date—such as LUT-based, WRP-based, and FPGA acceleration—the real-time color reproduction of large-format, high-resolution holograms remains challenging [6]. In particular, as the display size or resolution increases, the trade-off between computation time and image quality becomes more pronounced, further limiting the real-time synthesis of high-quality holographic images [6]. Therefore, accelerating hogel image generation has emerged as an urgent research task for enabling immersive 3D displays (e.g., AR/VR headsets and multi-viewpoint displays) [1].
In this paper, we propose a GPU-parallel-processing-based method for high-resolution hogel image generation to address these challenges. By leveraging the thousands of cores in modern graphics processing units (GPUs), we have massively parallelized the traditional CPU-based sequential hogel generation algorithm, significantly accelerating processing speed. Without introducing any additional complex hologram generation algorithms, our approach simply parallelizes the existing computations and achieves a 92% reduction in processing time for 8K images. Other studies have similarly reported 60–66% reductions in hologram computation time via GPU acceleration [7] or over 94% reductions when combined with novel algorithms to enable 24 fps real-time playback [12]. Our results demonstrate that high-resolution hogel images can be generated rapidly on a single GPU, paving the way for real-time or near-real-time high-resolution hogel generation in immersive AR/VR/XR displays within metaverse environments.
The organization of this paper is as follows. First, we review related work to examine the limitations of existing hogel generation techniques and the use cases for GPU parallelization. Next, we describe the detailed implementation and system architecture of the proposed GPU-parallel hogel generation method. We then quantitatively evaluate the performance of our approach through experimental results and analyze the visual outputs from various perspectives. Finally, in the conclusion, we summarize the significance of this study and discuss potential future applications.

2. Related Work

Hogel image generation techniques have been developed over many years by CGH researchers. Traditional methods use a pixel-rearrangement algorithm to reorder multiple parallax images into individual hogels [5], which is relatively simple to implement but leads to a steep, multiplicative increase in computation as resolution or the number of viewpoints grows. For example, generating an FHD hogel image by copying pixels one by one from multiple 1920 × 1080 parallax images involves nested double loops on the CPU, resulting in delays that become impractical for real-time processing at 4K or 8K resolutions. Indeed, one study pointed out that “existing LUT-, WRP-, and GPU-based methods are all insufficient for the real-time playback of large-scale color holograms” [7]. Another study reported that for large-format 3D holographic displays, a trade-off between computation time and image quality limits real-time, high-quality synthesis [6].
To overcome these limitations, various acceleration techniques have been explored. Numerous CGH computation-optimization algorithms have been proposed—such as the NLUT method, which enhances look-up tables (LUTs); voxel-based and hierarchical algorithms; and optical-path segmentation optimizations [7]—yet none fully eliminate the fundamental computational burden. On the hardware side, FPGA acceleration and GPU-based parallelization have also been attempted [8,9,10,11]. For example, Ma et al. reported a superpixel-based sub-hologram approach that reduced computations by 94.9% and achieved real-time color hologram generation at over 24 fps on a GPU [12], while Cao et al. successfully generated 3D holographic video in real time using two GPUs [7]. More recently, Dashdavaa et al. highlighted the importance of parallel computing in hogel-based holographic printing, demonstrating that because hogels are independent, their synthesis can be parallelized on GPUs for high-resolution holograms [13]. Kim et al. introduced several GPU optimization techniques—such as memory coalescing, block-level parallel processing, and overlapping computation with data transfer—to accelerate point-based hologram generation [14]. These prior works collectively suggest that the massive parallelism of GPUs can dramatically improve hologram computation speeds. However, most existing studies have focused on developing new hologram generation algorithms or leveraging multi-GPU infrastructures; examples of real-time implementation of the traditional pixel-rearrangement hogel algorithm on a single, general-purpose GPU remain rare.
In this paper, we differentiate our approach by parallelizing the conventional pixel-rearrangement hogel algorithm on a modern GPU to demonstrate high-resolution hogel generation. By simply restructuring the existing algorithm’s compute operations into parallel form, we measure its pure performance gains and establish a foundation that can readily integrate with other high-speed hologram-synthesis techniques. This approach directly implements and validates the GPU parallelization strategy previously suggested for reducing high-resolution hogel generation delays [1].

3. Materials and Methods

3.1. Overview of GPU Parallelization of the Hogel Generation Process

The proposed method restructures the entire hogel-image generation pipeline to execute in parallel on a GPU. Figure 1 illustrates this generation process. First, multiple perspective images—each a 2D capture of the same 3D scene from different viewing angles—are provided as input. Given an M × N array of these perspective images, the hogel images are produced by a pixel-rearrangement algorithm: pixels at the same location across all perspectives are gathered to form a single hogel. More formally, if we represent the set of perspective images as a matrix P, then the set of hogel images H is obtained as follows:
Hogel Generation as Pixel Rearrangement: The pixel value at coordinate (x, y) in each perspective image is extracted and combined to form a single hogel image H_{x,y}. Here, each hogel image H_{x,y} has dimensions M × N, and its pixel at position (i, j) corresponds to the (x, y) pixel of perspective image P_{i,j}. Performing this operation for every (x, y) location produces W_p × H_p hogel images. In this process, the resolution of each hogel (i.e., the number of pixels per hogel image) is determined by the total number of perspectives M × N; increasing the number of perspective images yields larger hogel images with higher angular resolution. However, as the number of perspectives grows, the computational workload and memory requirements increase dramatically.
$$P = M \times N \tag{1}$$

Equation (1) defines the total number of perspective images P as the product of M and N, where M and N are the numbers of images arranged horizontally and vertically, respectively.
$$W_p \times H_p \tag{2}$$

Equation (2) states that each perspective image has a horizontal resolution W_p and a vertical resolution H_p, and that the total number of pixels in each image is given by W_p × H_p.
$$W_{out} = M \times W_p, \qquad H_{out} = N \times H_p \tag{3}$$

Equation (3) defines that the output hogel image, formed by rearranging the M × N perspective images, has a total width W_{out} = M × W_p and a total height H_{out} = N × H_p.
$$i = \left\lfloor u / W_p \right\rfloor \;(x = u \bmod W_p), \qquad j = \left\lfloor v / H_p \right\rfloor \;(y = v \bmod H_p) \tag{4}$$

Equation (4) determines, for a pixel at position (u, v) in the output hogel image, which perspective image and which coordinates (x, y) it was copied from. The perspective-image column index is given by the quotient ⌊u/W_p⌋ and the local column coordinate by the remainder u mod W_p; the row index and local row coordinate are obtained similarly as ⌊v/H_p⌋ and v mod H_p.
$$H(u, v) = I_{\,j \times M + i}(x, y) \qquad (0 \le u < W_{out},\; 0 \le v < H_{out}) \tag{5}$$

Equation (5) forms the final hogel image by copying the pixel at (x, y) from the selected perspective image I_{j×M+i} (i.e., view (i, j)) directly into the output image at position (u, v).
$$T_{OPS} = P \times W_p \times H_p = W_p H_p M N = (M W_p) \times (N H_p) = O(W_{out} \times H_{out}) \tag{6}$$

Equation (6) expresses that the total number of pixel-copy operations—i.e., the time complexity T_{OPS} required to generate one hogel image—equals the total number of output pixels W_{out} × H_{out}. Because W_{out} = M W_p and H_{out} = N H_p (where M and N are the numbers of horizontal and vertical views, and W_p and H_p are the per-view width and height), T_{OPS} = W_p H_p M N. Letting P = M × N denote the total number of perspective images (viewpoints), Equation (6) can be written equivalently as T_{OPS} = P × W_p × H_p. In other words, the algorithm’s execution time increases linearly with both the number of perspective images P and the per-view resolution.
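For concreteness, the sequential baseline that Equation (6) characterizes can be sketched in a few lines of single-threaded C++; the packed, interleaved-RGB buffer layout and the function name below are illustrative assumptions rather than our exact implementation.

```cpp
#include <cstdint>
#include <vector>

// Single-threaded baseline sketch: rearrange M x N perspective views
// (each Wp x Hp pixels, interleaved RGB) into one Wout x Hout hogel
// mosaic, following Equations (3)-(5). Layout assumption: 'views' holds
// all P = M*N images back to back, row-major, with view (i, j) starting
// at byte offset (j*M + i) * Wp * Hp * 3.
void generateHogelsCPU(const std::vector<std::uint8_t>& views,
                       std::vector<std::uint8_t>& hogel,
                       int M, int N, int Wp, int Hp)
{
    const int Wout = M * Wp;
    const int Hout = N * Hp;                      // Eq. (3)
    for (int v = 0; v < Hout; ++v) {              // output row
        for (int u = 0; u < Wout; ++u) {          // output column
            const int i = u / Wp, x = u % Wp;     // view column / local x, Eq. (4)
            const int j = v / Hp, y = v % Hp;     // view row / local y, Eq. (4)
            const std::size_t src =
                ((std::size_t)(j * M + i) * Wp * Hp + (std::size_t)y * Wp + x) * 3;
            const std::size_t dst = ((std::size_t)v * Wout + u) * 3;
            for (int b = 0; b < 3; ++b)           // copy R, G, B, Eq. (5)
                hogel[dst + b] = views[src + b];
        }
    }
}
```

Every output pixel is visited exactly once, so the loop body executes T_{OPS} = W_{out} × H_{out} times, matching Equation (6).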
GPU Kernel Implementation: In this study, the pixel-rearrangement operation described above is executed in parallel on the GPU. Whereas the CPU version relies on nested loops, the GPU version distributes the work so that thousands of threads each handle a single pixel-copy operation. We implement a CUDA-based kernel in which each thread reads the value of a specific element from the input perspective-image array P and writes it to the corresponding position in the output hogel array H. The number of CUDA blocks launched is set to ⌈T/C⌉, where T is the total number of pixels to process (i.e., the sum of all perspective-image pixels) and C is the number of threads per block. Each thread computes its unique global index p, uses it to determine its assigned pixel’s coordinates, and then performs the memory read from P and the write to H. In this way, tens of millions of pixel-copy operations can be performed concurrently on the GPU, completing the entire hogel-rearrangement process in just a few milliseconds.
$$p = \mathrm{blockIdx.x} \times \mathrm{blockDim.x} + \mathrm{threadIdx.x}, \qquad p < W_{out} \times H_{out} \tag{7}$$

Each thread running on the GPU is assigned a unique index p. This index is computed by multiplying the block index by the number of threads per block and then adding the thread’s index within the block, as shown in Equation (7). The thread proceeds with further operations only if p is less than the total number of output pixels (W_{out} × H_{out}).
$$v = \left\lfloor p / W_{out} \right\rfloor, \qquad u = p \bmod W_{out} \tag{8}$$

Using the quotient and remainder of the global index p divided by the output width W_{out}, the 2D coordinates (v, u) in the output image are obtained as shown in Equation (8), where v denotes the row coordinate and u denotes the column coordinate.
$$r = \left\lfloor v / H_p \right\rfloor, \qquad c = \left\lfloor u / W_p \right\rfloor, \qquad k = r \times M + c \tag{9}$$

To determine which hogel tile—that is, which perspective image—this pixel belongs to, we define the tile row index r as the quotient of v divided by the single-view vertical resolution H_p and the tile column index c as the quotient of u divided by the single-view horizontal resolution W_p. The final view index k is then computed as k = r × M + c, as shown in Equation (9).
$$v' = v \bmod H_p, \qquad u' = u \bmod W_p \tag{10}$$

Next, the local pixel coordinates (u′, v′) within the selected view are calculated as shown in Equation (10).
$$\delta = v' \times W_p + u' \tag{11}$$

The pixel offset δ within the view is computed by multiplying the local row coordinate v′ by the view’s width W_p and then adding the local column coordinate u′, as shown in Equation (11).
$$s = k \times W_p \times H_p + \delta \tag{12}$$

By adding this offset to the starting position of view k in memory—namely k × W_p × H_p—we obtain the final source-image linear index s, as shown in Equation (12).
$$O[3p + b] = I[3s + b], \qquad b = 0, 1, 2 \tag{13}$$

Finally, assuming a three-channel (RGB) interleaved image, the pixel value for each channel b = 0, 1, 2 is copied from the input byte array I to the output byte array O with a single memory read–write operation, as shown in Equation (13).
$$T_{par} = \frac{W_{out} \times H_{out}}{C} = O\!\left(\frac{P \times W_p \times H_p}{C}\right) \tag{14}$$

Although the total workload remains proportional to the output pixel count W_{out} × H_{out}, distributing it in parallel across C concurrent hardware threads allows the effective execution time, denoted T_{par} in Equation (14) to distinguish it from the total operation count T_{OPS} of Equation (6), to be modeled as shown. In this manner, each thread independently performs its pixel mapping and copy operations, enabling the entire hogel-rearrangement process to complete in just a few milliseconds.
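Collecting Equations (7)–(13), a minimal CUDA sketch of such a kernel is shown below; the buffer names, 64-bit index arithmetic, and launch configuration are illustrative assumptions consistent with the packed layouts described above.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// CUDA kernel sketch of Equations (7)-(13): each thread maps one output
// pixel of the hogel mosaic O back to its source pixel in the packed
// view array I and copies the three RGB bytes.
__global__ void hogelKernel(const std::uint8_t* __restrict__ I,
                            std::uint8_t* __restrict__ O,
                            int M, int Wp, int Hp, int Wout, int Hout)
{
    long long p = (long long)blockIdx.x * blockDim.x + threadIdx.x;   // Eq. (7)
    if (p >= (long long)Wout * Hout) return;       // bounds guard, Eq. (7)

    int v  = (int)(p / Wout), u = (int)(p % Wout); // output coords, Eq. (8)
    int r  = v / Hp, c = u / Wp;                   // tile row/column, Eq. (9)
    int k  = r * M + c;                            // view index, Eq. (9)
    int vl = v % Hp, ul = u % Wp;                  // local coords, Eq. (10)
    int delta = vl * Wp + ul;                      // in-view offset, Eq. (11)
    long long s = (long long)k * Wp * Hp + delta;  // source index, Eq. (12)

    for (int b = 0; b < 3; ++b)                    // RGB copy, Eq. (13)
        O[3 * p + b] = I[3 * s + b];
}

// Launch with ceil(T / C) blocks of C threads (Section 3.1), e.g.:
//   const int C = 256;
//   const long long T = (long long)Wout * Hout;
//   hogelKernel<<<(unsigned)((T + C - 1) / C), C>>>(dI, dO, M, Wp, Hp, Wout, Hout);
```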

3.2. GPU Implementation Optimizations

To maximize GPU performance, we focused on memory-use optimizations and thread-structure design. First, we used pinned memory to boost the CPU–GPU data-transfer bandwidth and reduce the time needed to upload large volumes of perspective images to the GPU. Next, we arranged the perspective-image data in a contiguous memory buffer on the GPU so that threads would achieve coalesced global-memory access. Concretely, all perspective images are stored sequentially in a single array P, and each thread uses its linear thread index to access the corresponding element, ensuring that adjacent threads read from adjacent memory locations. Similarly, the output hogel images are allocated in one contiguous array H so that each thread’s write operations are also coalesced. This access pattern resembles a large matrix transpose, for which shared-memory tiling is a well-known optimization. However, in our case, each input pixel is used exactly once and never reused, so shared-memory tiling offers little benefit. Nonetheless, if output writes become heavily non-contiguous, one could load input tiles into shared memory within a block, transpose them, and then write them out to achieve coalesced writing. In summary, our implementation maximizes parallel efficiency without extra computation by optimizing global-memory access patterns and mapping CUDA threads efficiently. Additionally, we considered GPU memory-management strategies for handling high-resolution, multi-view data. Loading hundreds of 8K-resolution images into GPU memory at once can consume a tremendous amount of memory. To mitigate this, we can employ streaming techniques instead of loading all data simultaneously. For example, rather than uploading all 8K perspective images to GPU memory at once, we can partition them into smaller chunks (e.g., sub-blocks of the matrix) and process them sequentially. Alternatively, we can use multiple CUDA streams to overlap data transfers with kernel execution, thereby reducing memory pressure and overall latency.
High-resolution (e.g., 8K) multi-view data dramatically increase GPU memory usage. In this study, we reduced peak memory consumption and overall latency by (i) employing transfer–compute–overlapped streaming with pinned host memory and multiple CUDA streams; (ii) arranging the input and output as single contiguous buffers (P and H) to guarantee coalesced accesses; and (iii) when necessary, using shared-memory tile-transpose schemes to achieve coalesced writes. Looking ahead, we plan to combine a CUDA memory pool and Unified Memory with prefetch/advise, the on-device decoding of compressed inputs, multi-resolution and ROI-based selective high-fidelity processing, and out-of-core streaming via multi-GPU/NVLink and GPUDirect Storage, thereby scaling both memory efficiency and throughput for larger datasets.
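A condensed sketch of this pinned-memory, multi-stream pattern follows; the stream count, the chunk granularity, and the per-chunk kernel hogelKernelChunk are illustrative assumptions, not our exact implementation.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Streaming sketch: round-robin chunks of view data over several CUDA
// streams so one chunk's H2D copy overlaps another chunk's compute.
// 'hostViews' is assumed to be pinned (allocated with cudaMallocHost);
// async copies from pageable memory silently fall back to synchronous.
void streamedHogelGeneration(const std::uint8_t* hostViews,
                             std::uint8_t* dViews,
                             std::size_t chunkBytes, int numChunks)
{
    const int S = 4;                               // illustrative stream count
    cudaStream_t streams[S];
    for (int i = 0; i < S; ++i) cudaStreamCreate(&streams[i]);

    for (int c = 0; c < numChunks; ++c) {
        cudaStream_t st = streams[c % S];
        const std::size_t off = (std::size_t)c * chunkBytes;
        cudaMemcpyAsync(dViews + off, hostViews + off, chunkBytes,
                        cudaMemcpyHostToDevice, st);
        // A per-chunk rearrangement kernel (hypothetical name) would be
        // launched on the same stream so copy and compute overlap:
        // hogelKernelChunk<<<blocks, threads, 0, st>>>(/* ... */);
    }
    for (int i = 0; i < S; ++i) {
        cudaStreamSynchronize(streams[i]);         // drain all in-flight work
        cudaStreamDestroy(streams[i]);
    }
}
```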

3.3. System Architecture and Workflow

The overall system workflow is illustrated in Figure 2:
  • Data Preparation: Prepare multiple perspective images on the CPU or acquire them via a camera array or rendering. In our experiments, we used 16 perspective images at FHD, QHD, 4K, and 8K resolutions.
  • Data Transfer: Copy the set of perspective images into GPU memory, using the optimized transfer techniques described above.
  • Parallel Hogel Generation: Launch the CUDA kernel on the GPU to perform pixel rearrangement. Millions to billions of individual pixel-copy operations run in parallel to produce the hogel image array H .
  • Output: Either display the generated hogel images directly from the GPU or, if needed, transfer them back to CPU memory.
Figure 2. Hogel image generation system architecture diagram.
In this process, we fully leveraged the GPU to ensure real-time performance while the CPU was dedicated to input preparation and result handling. Since our approach does not introduce any complex algorithmic transformations, it is easy to implement, highly robust, and exploits the GPU’s parallel computing resources to achieve a dramatic speedup over the conventional method.
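For reference, the four workflow steps can be condensed into a single-stream host driver; the sketch below reuses the hogelKernel sketch from Section 3.1, with the buffer names and the 256-thread block size assumed for illustration.

```cpp
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Host-side sketch of the Figure 2 workflow on one stream:
// (1) views prepared on the CPU, (2) copied to the GPU, (3) rearranged
// in parallel by hogelKernel, (4) copied back to CPU memory.
void runHogelPipeline(const std::vector<std::uint8_t>& views,
                      std::vector<std::uint8_t>& hogel,
                      int M, int N, int Wp, int Hp)
{
    const long long Wout = (long long)M * Wp;      // Eq. (3)
    const long long Hout = (long long)N * Hp;
    const std::size_t outBytes = (std::size_t)(Wout * Hout) * 3;

    std::uint8_t *dI = nullptr, *dO = nullptr;
    cudaMalloc(&dI, views.size());                 // step 2: transfer
    cudaMalloc(&dO, outBytes);
    cudaMemcpy(dI, views.data(), views.size(), cudaMemcpyHostToDevice);

    const int C = 256;                             // threads per block
    const long long T = Wout * Hout;               // total output pixels
    hogelKernel<<<(unsigned)((T + C - 1) / C), C>>>(dI, dO, M, Wp, Hp,
                                                    (int)Wout, (int)Hout);

    hogel.resize(outBytes);                        // step 4: output
    cudaMemcpy(hogel.data(), dO, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(dI);
    cudaFree(dO);
}
```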

4. Results and Discussion

4.1. Experimental Setup and Configuration

To evaluate the performance of the proposed GPU-parallel hogel generation method, we conducted experiments under the following setup. The hardware platform comprised an Intel Core i9-14900KS CPU (3.2 GHz base clock, 24 cores) and an NVIDIA GeForce RTX 5090 GPU (21,760 CUDA cores, 32 GB VRAM). The software implementation used CUDA C/C++ for the kernel code, which was built with the CUDA 12 Toolkit and NVIDIA drivers on Windows 11. The CPU baseline was the same algorithm implemented in single-threaded C++ without any optimizations, while the GPU version incorporated all of the optimization techniques described in Section 3.
The perspective images were color captures of a 3D scene from multiple viewpoints, which were either rendered virtually or acquired with a multi-camera array. The hogel generation algorithm consumed all input views to produce a full set of hogel images at the target resolution. For instance, in the 8K scenario, 16 separate 8K-resolution perspective images were used to generate a single 8K-resolution hogel image, entailing a very large computational load. In every test, the GPU’s clock frequency was held constant, and each configuration was executed at least five times to compute the average performance and reduce variance.

4.2. Speed Comparison and Analysis

Table 1 compares the average times required to generate hogel images on a single-threaded CPU versus a GPU for various resolutions and numbers of viewpoints. Table 1 reports programmatically measured runtimes for executing the Section 3 pipeline across image resolutions under 16 × 1 and 16 × 16 viewpoint configurations. The GPU-accelerated implementation showed a significant speedup over the CPU in every case. For example, at FHD resolution with 16 × 1 viewpoints, the CPU required approximately 83.2 ms (≈12 fps), which the GPU reduced to 16 ms (≈62.5 fps), a 5.2× speedup. At 8K resolution with 16 × 1 viewpoints, the CPU needed about 1.05 s, while the GPU took only around 80 ms, achieving over a 13× acceleration. Generally, as resolution increases, CPU execution time grows more than linearly, whereas GPU execution time increases relatively slowly, leading to larger speedup factors at higher resolutions. For instance, the maximum speedup of about 8.2× at QHD resolution occurs because the GPU’s absolute time stays nearly the same as at FHD (16 ms), while the CPU time grows more with increased resolution. Even at 4K resolution, the GPU achieves real-time performance (48 ms, >20 fps), whereas the CPU slows to 248.8 ms, which is a ~5.1× difference.
Unless otherwise stated, the numbers in Table 1 are end-to-end GPU times measured for the sequence H2D copy → kernel execution → D2H copy on a single CUDA stream without overlap. We timed the GPU using CUDA events with sub-millisecond resolution:
  • Create events (cudaEventCreate);
  • Record before/after each stage (cudaEventRecord);
  • Ensure completion (cudaEventSynchronize); and
  • Compute elapsed times (cudaEventElapsedTime).
We report T_H2D, T_K, and T_D2H, as well as the total T_GPU = T_H2D + T_K + T_D2H. For fairness with the CPU baseline, timing excludes disk I/O and covers only the in-memory Section 3 pipeline. Each setting was warmed up (10 iterations) to stabilize JIT compilation and caches, followed by 50 timed iterations; we report the mean (and use the median to verify robustness) with the GPU application clocks fixed and with pinned (page-locked) host memory plus cudaMemcpyAsync for transfers. We also verified kernel-only timings (recording events immediately around the kernel); these follow the same trend but are smaller than the end-to-end times reported in Table 1. Figure 3 depicts this measurement workflow.
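A minimal sketch of this timing harness is shown below; the stage calls are commented placeholders, and only the CUDA event usage reflects the protocol described above.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Event-based timing sketch: one event before and after each stage
// (H2D, kernel, D2H) on a single stream; elapsed milliseconds come
// from cudaEventElapsedTime, which has sub-millisecond resolution.
void timeEndToEnd()
{
    cudaEvent_t e[4];
    for (int i = 0; i < 4; ++i) cudaEventCreate(&e[i]);

    cudaEventRecord(e[0]);
    // cudaMemcpyAsync(dI, hI, inBytes, cudaMemcpyHostToDevice);    // H2D
    cudaEventRecord(e[1]);
    // hogelKernel<<<blocks, threads>>>(dI, dO, M, Wp, Hp, Wout, Hout);
    cudaEventRecord(e[2]);
    // cudaMemcpyAsync(hO, dO, outBytes, cudaMemcpyDeviceToHost);   // D2H
    cudaEventRecord(e[3]);
    cudaEventSynchronize(e[3]);                    // wait for the whole run

    float tH2D, tK, tD2H;
    cudaEventElapsedTime(&tH2D, e[0], e[1]);
    cudaEventElapsedTime(&tK,   e[1], e[2]);
    cudaEventElapsedTime(&tD2H, e[2], e[3]);
    std::printf("H2D %.3f ms  kernel %.3f ms  D2H %.3f ms  total %.3f ms\n",
                tH2D, tK, tD2H, tH2D + tK + tD2H);

    for (int i = 0; i < 4; ++i) cudaEventDestroy(e[i]);
}
```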
For the case with an increased viewpoint count of 16 × 16 (256 total views), the GPU’s superior performance was again maintained. Across FHD–4K resolutions, speedup factors ranged from approximately 5× to 8×. In the most demanding scenario—8K resolution with 256 views—the GPU completed processing in 1.28 s, whereas the CPU required 16.95 s, which was a 13.2× speedup. Although the 8K/256-view combination still falls short of real time (>20 fps) in absolute terms, reducing a task that once took tens of seconds to just over one second demonstrates its practical viability. In other words, our method makes it feasible to synthesize ultra-high-resolution 8K hogel images—previously virtually impossible—within practical time frames. Overall, GPU parallelization delivered an average speedup of more than 5× across resolutions from FHD to 8K, effectively eliminating the bottleneck in hogel generation and greatly enhancing the potential for real-time or near-real-time 3D image synthesis.

4.3. Visualization and Analysis

To visually verify the output of the proposed GPU-accelerated hogel generation method, we evaluated the holographic stereogram results for several sample scenes. As a basic validation, we compared the hogel images generated by the CPU version and the GPU version from the same input perspective images and confirmed that the two outputs matched exactly on a pixel-by-pixel basis. This demonstrates that the GPU implementation performs the identical operations of the CPU’s sequential algorithm without any numerical floating-point errors or omissions introduced by parallelization.
Figure 4 illustrates, for the QHD case with M = N = 16 (a total of 256 views), the visual relationship between the proposed forward hogel transform and the inverse hogel transform. The left column arranges example input view images from 1 × 1 to 16 × 16 with row-wise view patterns (black–white–gradient, etc.), and the right column shows the hogel mosaic formed as tiles by gathering, according to Equations (3)–(5), the pixels at identical coordinates from each view. The arrows in the diagram indicate the 1:1 correspondence between the forward (views → hogel) and inverse (hogel → views) mappings. That is, in the forward direction, we use
$$u = i\,W_p + x, \qquad v = j\,H_p + y, \qquad H(u, v) = P_{i,j}(x, y)$$
while in the inverse direction, the relation becomes
$$P_{i,j}(x, y) = H(i\,W_p + x,\; j\,H_p + y)$$
Thus, the mapping can be exactly reversed; because no interpolation, filtering, or compression is involved, this pure rearrangement is lossless. As in this example, when distinct ramp/tone patterns are assigned to each view, the patterns are exactly restored per view (pixel-wise identical) after the inverse transform, allowing us to visually confirm that there are no numerical errors or omissions due to parallelization. This visualization also explains the mechanism of angular-resolution improvement. Increasing the number of views from 1 × 1 to 16 × 16 enlarges each hogel tile by M × N and makes the pixel spacing per viewing angle denser, thereby reducing discrete parallax jumps. Even under the linear time complexity of Equation (6), T_{OPS} = O(M N W_p H_p), our GPU implementation shows a small constant overhead and near-linear scaling, demonstrating practical processing times even for high-viewpoint cases (e.g., 16 × 16). Therefore, while securing angular continuity due to the increased number of views, the method alleviates the previous bottleneck without degrading image quality.
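This losslessness can also be checked mechanically with a forward plus inverse round trip; the sketch below reuses the generateHogelsCPU sketch from Section 3.1 and assumes the same packed, interleaved-RGB layout.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Round-trip sketch: apply the forward transform, then the inverse
// mapping P(i,j)(x,y) = H(i*Wp + x, j*Hp + y), and verify byte-wise
// equality with the original views.
bool roundTripIsLossless(const std::vector<std::uint8_t>& views,
                         int M, int N, int Wp, int Hp)
{
    std::vector<std::uint8_t> hogel(views.size());
    std::vector<std::uint8_t> restored(views.size());
    generateHogelsCPU(views, hogel, M, N, Wp, Hp);   // forward transform

    const int Wout = M * Wp;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < M; ++i)
            for (int y = 0; y < Hp; ++y)
                for (int x = 0; x < Wp; ++x) {
                    const std::size_t src =          // H(i*Wp + x, j*Hp + y)
                        ((std::size_t)(j * Hp + y) * Wout +
                         (std::size_t)i * Wp + x) * 3;
                    const std::size_t dst =          // P(i,j)(x, y)
                        ((std::size_t)(j * M + i) * Wp * Hp +
                         (std::size_t)y * Wp + x) * 3;
                    for (int b = 0; b < 3; ++b)
                        restored[dst + b] = hogel[src + b];
                }
    // A pure pixel rearrangement admits exact byte-wise equality.
    return std::memcmp(restored.data(), views.data(), views.size()) == 0;
}
```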
As predicted by Equation (6), the runtime necessarily increases with the number of perspectives and the per-view resolution (T_{OPS} = O(W_p H_p M N)). Nevertheless, our GPU implementation exhibits a small constant factor and near-linear scaling, which keeps wall-clock times within a practical range across the tested settings—16–80 ms for 16-view inputs from FHD to 8K, and 1.28 s for the most demanding 8K/256-view case—corresponding to 5–13× speedups over the CPU. This scalability allows us to raise the number of views to improve angular resolution while maintaining feasible processing times. Whereas multi-viewpoint synthesis was previously limited by computational load, GPU parallelization enables a substantial increase in viewpoint count, resulting in smoother and more natural transitions between views as an observer moves. For example, increasing the number of viewpoints from 16 × 1 to 16 × 16 makes each hogel 16 times larger and reduces the viewing angle per pixel, thereby minimizing discrete jumps between adjacent viewpoints and producing a more continuous 3D image. Because our GPU implementation can process these high-viewpoint datasets rapidly, it ultimately contributes to improved visual quality.
Finally, Figure 5 visualizes performance by plotting processing time against resolution and viewpoint count. The graph shows that compared to the CPU, the GPU’s computation time curve flattens as resolution increases, reflecting a widening gap. This trend matches the values in Table 1 and highlights that the benefits of GPU parallel processing become more pronounced as the problem size grows. In summary, the proposed GPU-acceleration method significantly alleviates the computational bottleneck of the conventional approach without degrading output image quality, and its effectiveness is successfully validated through various angular-view renderings.

5. Conclusions

In this study, we proposed and experimentally validated a method leveraging GPU parallel processing to dramatically accelerate high-resolution hogel image generation. By implementing the traditional CPU-based sequential hogel generation algorithm on the massively parallel architecture of a modern GPU, we achieved near-real-time performance while preserving the output image’s resolution and quality. Experimental results show that GPU parallelization delivers over a 5× speedup across resolutions from FHD to 8K and enables processing very large-scale inputs—such as 8K resolution with 16 viewpoints—in practical time frames. This achievement significantly alleviates the longstanding temporal bottleneck in large-scale hologram synthesis, marking a major step toward real-time 3D holographic displays.
The significance and potential applications of this technique can be considered from several perspectives. First, it can be applied to real-time hogel image generation in future augmented reality (AR), virtual reality (VR), and extended reality (XR) devices. For example, by using our method in head-mounted holographic displays or in metaverse 3D streaming environments, it would be possible to deliver high-resolution images that adapt to the user’s viewpoint without perceptible latency. Second, this approach can be combined with other hologram-acceleration techniques to achieve even greater performance gains. For instance, coupling algorithmic optimizations (such as NLUT) with GPU hardware acceleration could yield tens-fold improvements [7]. Furthermore, extending the method to multi-GPU systems or leveraging next-generation GPU features (e.g., Tensor Cores) can continue to push the limits of resolution and scale for real-time 3D image synthesis. Going forward, we can consider improving energy efficiency for portable XR headsets—whose single-digit-watt power and tight thermal limits make performance-per-watt as critical as frame rate—by (i) reducing off-chip bandwidth via coalesced access, kernel fusion, and on-chip reuse; (ii) adopting mixed-precision and LUT/NLUT approximations; and (iii) applying head-pose/eye-gaze-guided foveated/ROI hogel synthesis. We can also consider adapting the workload to motion and scene complexity with viewpoint/resolution scaling and DVFS-aware scheduling, employing asynchronous streaming and duty-cycling to cut I/O and idle energy, and, when appropriate, using split/edge rendering to offload non-critical passes while keeping low-latency foveal synthesis on-device to balance latency, bandwidth, and power.
For future work, additional optimizations are required to fully secure real-time performance in video scenarios. Specifically, techniques to reduce data I/O bottlenecks during continuous, frame-by-frame processing and methods to adaptively adjust computational load based on scene complexity (e.g., variable-resolution schemes that process only regions of interest at high resolution) should be explored. Furthermore, integration with dedicated hardware (e.g., hologram-specific processors) or coupling with deep-learning–based correction techniques also presents promising directions. Finally, while prior studies [15,16,17,18,19,20,21] have accelerated CGH/HS via EPISM-based hogel-size optimization, commodity GPU/OpenCL and Xeon Phi acceleration, multi-GPU/cluster architectures, and task-specific formulations (e.g., compression or line drawings), our approach parallelizes the classical hogel pixel-rearrangement pipeline on a single GPU without new approximations, achieving 5–13× speedups and pixel-wise equivalence from FHD to 8K under multi-view inputs. Because it is complementary to those algorithmic and hardware strategies, the proposed GPU parallelization provides a practical foundation for next-generation real-time holographic stereogram systems and is readily applicable to other multi-view display paradigms—light-field and integral imaging—thereby contributing to the broader advancement of 3D display technologies.

Author Contributions

Conceptualization, H.K. and B.K.; methodology, H.K.; software, H.K.; validation, H.K.; formal analysis, H.K.; investigation, H.K.; resources, H.K.; writing—original draft preparation, H.K. and B.K.; writing—review and editing, H.K. and Y.S.; visualization, H.K.; supervision, Y.S.; project administration, H.K.; funding acquisition, H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2022-II220128, Hologram-based measurement and Inspection substantiation).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Byungjoon Kim was employed by the company Korean AI Certification. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Dashdavaa, E.; Khuderchuluun, A.; Wu, H.-Y.; Lim, Y.-T.; Shin, C.-W.; Kang, H.; Jeon, S.-H.; Kim, N. Efficient Hogel-Based Hologram Synthesis Method for Holographic Stereogram Printing. Appl. Sci. 2020, 10, 8088.
  2. Yan, X.; Zhang, T.; Wang, C.; Liu, Y.; Wang, Z.; Wang, X.; Zhang, Z.; Lin, M.; Jiang, X. View-Flipping Effect Reduction and Reconstruction Visualization Enhancement for EPISM-Based Holographic Stereogram with Optimized Hogel Size. Sci. Rep. 2020, 10, 13492.
  3. Yang, X.; Xu, F.; Zhang, H.; Zhang, H.; Huang, K.; Li, Y.; Wang, Q. High-Resolution Hologram Calculation Method Based on Light Field Image Rendering. Appl. Sci. 2020, 10, 819.
  4. Pi, D.; Liu, J.; Wang, Y. Review of Computer-Generated Hologram Algorithms for Color Dynamic Holographic Three-Dimensional Display. Light Sci. Appl. 2022, 11, 231.
  5. Bjelkhagen, H.; Brotherton-Ratcliffe, D. Ultra-Realistic Imaging: Advanced Techniques in Analog and Digital Colour Holography, 1st ed.; CRC Press: Boca Raton, FL, USA, 2013.
  6. Li, Y.-L.; Wang, D.; Li, N.-N.; Wang, Q.-H. Fast Hologram Generation Method Based on the Optimal Segmentation of a Sub-CGH. Appl. Opt. 2021, 60, 4235–4244.
  7. Cao, H.; Jin, X.; Ai, L.; Kim, E.-S. Faster Generation of Holographic Video of 3-D Scenes with a Fourier Spectrum-Based NLUT Method. Opt. Express 2021, 29, 39738–39754.
  8. Sugie, T.; Akamatsu, T.; Nishitsuji, T.; Hirayama, R.; Masuda, N.; Nakayama, H.; Ichihashi, Y.; Shiraki, A.; Oikawa, M.; Takada, N.; et al. High-Performance Parallel Computing for Next-Generation Holographic Imaging. Nat. Electron. 2018, 1, 254–259.
  9. Kwon, M.W.; Kim, S.C.; Yoon, S.E.; Ho, Y.S.; Kim, E.S. Object Tracking Mask-Based NLUT on GPUs for Real-Time Generation of Holographic Videos of Three-Dimensional Scenes. Opt. Express 2015, 23, 2101–2120.
  10. Sato, H.; Kakue, T.; Ichihashi, Y.; Endo, Y.; Wakunami, K.; Oi, R.; Yamamoto, K.; Nakayama, H.; Shimobaba, T.; Ito, T. Real-Time Color Hologram Generation Based on Ray-Sampling Plane with Multi-GPU Acceleration. Sci. Rep. 2018, 8, 1500.
  11. Niwase, H.; Takada, N.; Araki, H.; Maeda, Y.; Fujiwara, M.; Nakayama, H.; Kakue, T.; Shimobaba, T.; Ito, T. Real-Time Electroholography Using a Multiple-Graphics Processing Unit Cluster System with a Single Spatial Light Modulator and the InfiniBand Network. Opt. Eng. 2016, 55, 093108.
  12. Ma, H.; Wei, C.; Wei, J.; Han, Y.; Liu, J. Superpixel-Based Sub-Hologram Method for Real-Time Color Three-Dimensional Holographic Display with Large Size. Opt. Express 2022, 30, 4235–4244.
  13. Khuderchuluun, A.; Piao, Y.-L.; Erdenebat, M.-U.; Dashdavaa, E.; Lee, M.-H.; Jeon, S.-H.; Kim, N. Simplified Digital Content Generation Based on an Inverse-Directed Propagation Algorithm for Holographic Stereogram Printing. Appl. Opt. 2021, 60, 4235–4244.
  14. Kim, D.-W.; Lee, Y.-H.; Seo, Y.-H. High-Speed Computer-Generated Hologram Based on Resource Optimization for Block-Based Parallel Processing. Appl. Opt. 2018, 57, 3511–3518.
  15. Magallón, J.A.; Blesa, A.; Serón, F.J. Monte-Carlo Techniques Applied to CGH Generation Processes and Their Impact on the Image Quality Obtained. Eng. Rep. 2025, 7, e1410.
  16. Ahrenberg, L.; Benzie, P.; Magnor, M.; Watson, J. Computer Generated Holography Using Parallel Commodity Graphics Hardware. Opt. Express 2006, 14, 7636–7641.
  17. Shimobaba, T.; Ito, T.; Masuda, N.; Ichihashi, Y.; Takada, N.; Oikawa, M. Fast Calculation of Computer-Generated Hologram on AMD HD5000 Series GPU and OpenCL. arXiv 2010, arXiv:1002.0916.
  18. Murano, K.; Shimobaba, T.; Sugiyama, A.; Takada, N.; Kakue, T.; Oikawa, M.; Ito, T. Fast Computation of Computer-Generated Hologram Using Xeon Phi Coprocessor. arXiv 2013, arXiv:1309.2734.
  19. Endo, Y.; Shimobaba, T.; Kakue, T.; Ito, T. GPU-Accelerated Compressive Holography. Opt. Express 2016, 24, 8437–8445.
  20. Nishitsuji, T.; Blinder, D.; Kakue, T.; Shimobaba, T.; Schelkens, P.; Ito, T. GPU-Accelerated Calculation of Computer-Generated Holograms for Line-Drawn Objects. Opt. Express 2021, 29, 12849–12866.
  21. Watanabe, S.; Jackin, B.J.; Ohkawa, T.; Ootsu, K.; Yokota, T.; Hayasaki, Y. Acceleration of Large-Scale CGH Generation Using Multi-GPU Cluster. In Proceedings of the 2017 Fifth International Symposium on Computing and Networking (CANDAR), Aomori, Japan, 19–22 November 2017; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
Figure 1. GPU parallelization of the hogel generation process: (a) Overview of GPU parallel processing for multiple perspective images; (b) GPU-based hogel image generation pipeline.
Figure 3. Hogel generation runtime measurement workflow.
Figure 4. Hogel transform and inverse transform for QHD 16 × 16-view images on CPU and GPU.
Figure 5. Speed comparison between CPU and GPU across resolutions and viewpoints.
Table 1. Average hogel-image generation times and speedups on CPU vs. GPU.

| Resolution (W × H) | Views (M × N) | CPU Time (ms) | GPU Time (ms) | Speedup (×) |
|---|---|---|---|---|
| FHD (1920 × 1080) | 16 × 1 | 83.2 (≈12 FPS) | 16 (≈62.5 FPS) | 5.2 |
| QHD (2560 × 1440) | 16 × 1 | 131.2 (≈7.6 FPS) | 16 (≈62.5 FPS) | 8.2 |
| 4K (3840 × 2160) | 16 × 1 | 248.8 (≈3.5 FPS) | 48 (≈20.8 FPS) | 5.1 |
| 8K (7680 × 4320) | 16 × 1 | 1052.8 (≈0.9 FPS) | 80 (≈12.5 FPS) | 13.1 |
| FHD (1920 × 1080) | 16 × 16 | 1280 (≈0.7 FPS) | 256 (≈3.9 FPS) | 5.0 |
| QHD (2560 × 1440) | 16 × 16 | 2099.2 (≈0.4 FPS) | 256 (≈3.9 FPS) | 8.2 |
| 4K (3840 × 2160) | 16 × 16 | 4608 (≈0.2 FPS) | 819.2 (≈1.2 FPS) | 5.6 |
| 8K (7680 × 4320) | 16 × 16 | 16,947.2 (≈0.05 FPS) | 1280 (≈0.7 FPS) | 13.2 |