Holographic 3D Display Using Depth Maps Generated by 2D-to-3D Rendering Approach

Abstract: Holographic display has the potential to be utilized in many 3D application scenarios because it provides all the depth cues that human eyes can perceive. However, the shortage of 3D content has limited the application of holographic 3D displays. To enrich 3D content for holographic display, a 2D-to-3D rendering approach is presented. In this method, 2D images are first classified into three categories: distant view images, perspective view images, and close-up images. For each category, the computer-generated depth map (CGDM) is calculated using a corresponding gradient model. The resulting CGDMs are applied in a layer-based holographic algorithm to obtain computer-generated holograms (CGHs). The correctly reconstructed region of the image changes with the reconstruction distance, providing a natural 3D display effect. This realistic 3D effect means the proposed approach can be applied in many fields in the future, such as education, navigation, and the health sciences.


Introduction
Holography is a technology that builds mathematical and physical connections between targets and holographic fringes. Thus, it has been widely employed in the fields of 3D imaging and 3D display. In the field of 3D imaging, captured holographic fringes are often employed to reconstruct corresponding targets [1][2][3]; applications of holographic imaging include sonar [4], radar [5], and microscopy [6], among others. In the field of 3D display, holographic fringes are often calculated from targets by algorithms [7][8][9]. As holographic display can provide all the depth cues that human eyes are capable of perceiving, it is considered a promising option for 3D display [10][11][12][13]. It has the potential to be utilized in many augmented reality (AR) application scenarios, including video education [14,15], spatial cognition and navigation [16], and the health sciences [17].
Currently, the shortage of 3D content limits the application of holographic 3D displays. Three-dimensional acquisition devices, including light-field cameras [18] and time-of-flight (TOF) cameras [19], are regarded as a solution for producing 3D content. For light-field cameras, image quality degrades as the shooting distance increases [20], and the additional processing required to address this issue [21] increases system complexity. For most TOF cameras, the resolution is insufficient, leading to lower-quality displays with limited definition. In addition, the production of 3D content by 3D acquisition devices is expensive and hardware intensive. Furthermore, existing 2D content cannot be fully utilized by 3D acquisition devices.
2D-to-3D rendering provides an alternative option to enrich 3D content. Various features, including edges [22,23], texture [22,24], color [25], and motion [24,26], have been used to calculate computer-generated depth maps (CGDMs) from 2D images. However, a 2D-to-3D rendering approach that uses only one of these features may not be widely applicable. Therefore, 2D-to-3D rendering approaches that use mixed features have been employed [27][28][29]. The CGDMs calculated by mixed-features-based methods are more stable than those calculated by single-feature-based methods. Currently, most 2D-to-3D rendering approaches are utilized for spatial cognition and image identification, and the resulting CGDMs are usually not optimized for holographic algorithms.
With the development of machine learning technology, learning-based 2D-to-3D rendering approaches have also been widely employed to enrich 3D content [30][31][32][33][34]. Learning-based approaches utilize deep neural networks to generate CGDMs of 2D images, offering strong generalization and high depth-estimation accuracy. However, learning-based approaches require large amounts of training data, and obtaining reliable 2D/3D data pairs is a challenging task for current learning-based approaches.
In this study, we present a 2D-to-3D rendering approach with mixed features. Based on features, 2D images are first classified into three categories, including the distant view, perspective, and close-up types. The CGDM for each category is obtained by using a corresponding model. The obtained CGDMs have been optimized for the layer-based holographic algorithm and can be applied directly to calculate the computer-generated holograms (CGHs) of the 2D images. The resulting CGHs provide 3D reconstructions with prominent depth variations.

Distant View Images
Currently, 2D images are usually captured by 2D cameras. Most 2D cameras can be simplified as lens-based imaging systems, as shown in Figure 1a. When dis is the distance between the object and the focal plane in the object space, f is the focal length in the object space, and siz is the size of the object, the size of the object in the captured 2D image, siz′, can be obtained from Newton's image formula:

siz′ = (f / dis) · siz. (1)

An object located at an infinite distance from the lens appears as a point in a 2D image. This point is called the vanishing point. In a 2D image, as the distance to the vanishing point changes, objects of the same size appear to be distributed along divergent lines originating at the vanishing point, as shown in Figure 1b. These divergent lines are called vanishing lines.

There are two types of 2D images that contain a vanishing point and vanishing lines. The first type is a 2D image with a large shooting distance. These images are often captured outdoors and typically present the sky, land areas, and water bodies. Images of this type are referred to as distant view images, and they primarily capture scenes on the horizontal plane. The second type is a 2D image with a moderate shooting distance; such images contain an obvious perspective effect. Scenes on both the horizontal and vertical planes are presented, and such images are referred to as perspective images. The vanishing point of a distant view image is always located on the borderline between the sky and other physical elements (Figure 1c), while that of a perspective image is located near the central area of the image (Figure 1d).
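As a minimal numerical illustration of the scaling in Equation (1) (image size proportional to f and inversely proportional to shooting distance), the following sketch uses hypothetical values; the helper name `image_size` is our own:

```python
def image_size(siz, f, dis):
    """siz' = (f / dis) * siz: Newtonian magnification for an object of
    size `siz` at distance `dis` beyond the front focal plane of a lens
    with object-space focal length `f` (all in the same length unit)."""
    return f * siz / dis

# A 1 m object with a 50 mm lens: halving the distance doubles the image size.
near = image_size(1.0, 0.05, 5.0)
far = image_size(1.0, 0.05, 10.0)
```

This inverse relation is also what makes the depth gradients in the following sections meaningful: more distant objects occupy fewer pixels along the vanishing lines.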
The CGDMs of these two types of 2D images are calculated according to different depth gradient models in the proposed method. Therefore, the image type should be identified first. Identification of a distant view image uses the color feature. The 2D image is transformed from the RGB color space to the hue-saturation-intensity (HSI) color space [35]. Pixels representing the sky, land areas, and water bodies have typical values within characteristic ranges of hue (H), saturation (S), and intensity (I) [36]. In addition, p is the proportion of pixels whose values fall within these ranges (Equation (2)). When p > 0.5, the 2D image is classified as a distant view type.

Because the vanishing point of a distant view image is always located on the borderline between the sky and other physical elements, the CGDM is calculated using a cumulative horizontal edge histogram [37]. In this model, the sky is assumed to be infinitely far from the observer, and the distances to other physical elements vary linearly from far to near, from the top edge of the image to its bottom edge. The borderline is therefore distinguished first, and subsequently the CGDM depth(x, y) can be expressed as:

depth(x, y) = 0, for y < ybo,
depth(x, y) = (2^BD − 1) · (y − ybo) / (N − ybo), for y ≥ ybo,

where BD is the bit depth of the CGDM, N is the pixel number of the CGDM in the vertical direction, and ybo is the vertical coordinate value of the borderline. A larger pixel value indicates that the point is nearer to the observer; the pixel value for the sky is assigned as zero. As the distant view image appears far-to-near, extraction of the vanishing point and vanishing lines is unnecessary in the cumulative horizontal edge histogram.
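The far-to-near ramp described above can be sketched as follows. The function name `distant_view_cgdm`, its arguments, and the piecewise-linear form are our own illustrative reading of the cumulative-horizontal-edge-histogram model (sky assigned 0, depth increasing linearly below the borderline), not the paper's reference code:

```python
import numpy as np

def distant_view_cgdm(height, width, y_bo, bit_depth=8):
    """Sketch of the distant-view depth model: rows above the borderline
    y_bo (the sky) get depth 0; below it, depth increases linearly to the
    maximum gray level (2**bit_depth - 1) at the bottom edge."""
    max_val = 2 ** bit_depth - 1
    depth = np.zeros((height, width), dtype=np.float64)
    ys = np.arange(height)
    below = ys >= y_bo                      # rows at or below the borderline
    ramp = max_val * (ys[below] - y_bo) / (height - 1 - y_bo)
    depth[below, :] = ramp[:, None]         # broadcast the column ramp
    return depth.astype(np.uint8)
```

Note that larger pixel values mean nearer content, matching the convention stated above.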

Perspective View Images
If p ≤ 0.5, it is necessary to determine whether the 2D image is a perspective type. This is determined from edges extracted from the original image. Edges in the 2D image are extracted using the Canny algorithm [38], and the Hough transform [39] is used to detect straight lines among the edges. If and only if the detected straight lines intersect at a single point is that intersection regarded as a vanishing point. The existence of a vanishing point is the key criterion for determining whether the 2D image belongs to the perspective type.
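A minimal sketch of the single-intersection test described above, assuming straight lines have already been extracted in Hough (ρ, θ) form by the Canny/Hough stage; the helper name `common_intersection` and the pixel tolerance `tol` are our own assumptions, as the paper does not specify how near-intersections are handled numerically:

```python
import numpy as np
from itertools import combinations

def common_intersection(lines, tol=2.0):
    """Given lines as (rho, theta) pairs (x*cos(theta) + y*sin(theta) = rho),
    return their shared intersection point if all pairwise intersections
    agree within `tol` pixels; otherwise return None (no vanishing point)."""
    pts = []
    for (r1, t1), (r2, t2) in combinations(lines, 2):
        A = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        if abs(np.linalg.det(A)) < 1e-9:          # parallel pair: no point
            return None
        pts.append(np.linalg.solve(A, np.array([r1, r2])))
    pts = np.array(pts)
    if np.all(np.linalg.norm(pts - pts.mean(axis=0), axis=1) < tol):
        return tuple(pts.mean(axis=0))
    return None
```

In a real pipeline the (ρ, θ) pairs would come from, e.g., an OpenCV `HoughLines` call on the Canny edge map.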
For the perspective image, the vanishing point is regarded as the farthest point. Since a typical perspective scene contains image data on both the horizontal and vertical planes, the CGDMs for content on the two planes are calculated separately. Vanishing lines are used to distinguish the horizontal and vertical planes [40]. The CGDMs are calculated from two depth gradients, where depth_h and depth_v are the depth gradients on the horizontal and vertical planes, respectively, and (xvp, yvp) is the coordinate of the vanishing point.
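Since the exact formulas for depth_h and depth_v are not reproduced here, the following is only an illustrative sketch of the idea: depth grows linearly with distance from the vanishing point, separately along the vertical image axis (horizontal plane) and the horizontal image axis (vertical planes). The function `perspective_cgdm`, the linear ramps, and the max-combination are all our own assumptions:

```python
import numpy as np

def perspective_cgdm(h, w, x_vp, y_vp, bit_depth=8):
    """Illustrative perspective depth map: the vanishing point (x_vp, y_vp)
    is the farthest point (depth 0); depth increases linearly away from it,
    with depth_h varying vertically (ground/ceiling) and depth_v varying
    horizontally (side walls)."""
    max_val = 2 ** bit_depth - 1
    yy, xx = np.mgrid[0:h, 0:w]
    depth_h = np.abs(yy - y_vp) / max(y_vp, h - 1 - y_vp)   # horizontal plane
    depth_v = np.abs(xx - x_vp) / max(x_vp, w - 1 - x_vp)   # vertical planes
    # Combine the two plane gradients per pixel (our choice; the paper
    # assigns each pixel to one plane using the vanishing lines instead).
    return (max_val * np.maximum(depth_h, depth_v)).astype(np.uint8)
```

In the paper's method the vanishing lines partition the image so each pixel takes either depth_h or depth_v, rather than a per-pixel maximum.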

Close-Up Images
If no convergence point is detected in the image, or multiple convergence points are detected, the image is classified as a close-up image rather than a perspective view image. For a close-up image, the CGDM can be found using occlusion [41]. Regions that contain fewer edges are generally farther away, so the spatial relationship of a series of objects at multiple depths can easily be determined by counting the number of edges. The local edge histogram [40] is employed to calculate the CGDM. As shown in Figure 2, the edges of the 2D image are first extracted by the Canny algorithm. Then, the edge image is divided into 5 × 5 blocks, and the number of edges Nij in each block is counted. Blocks where Nij is larger than the average (Nav) are defined as the main blocks. The total number of main blocks is M, and the number of edges in each main block is denoted by N1′, N2′, …, NM′. From a series of simulations, the reliable CGDM for each main block proves to be a circle with a depth gradient. The center of the circle is located at the center of the corresponding block. The pixel values of the center and the circumference are assigned as 255 and 0, respectively, and the pixel value decreases evenly from the center to the circumference. The radius of the circle is obtained by traversal comparisons; the optimized radius is half the length (or width) of the image. The CGDM for the entire close-up image, Df, is obtained by fusing together the depth maps Di of the i = 1, …, M main blocks.
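The close-up model above can be sketched as follows. This is our own illustrative reading: the fusion rule (per-pixel maximum over the main-block discs) is an assumption, since the fusion formula is not reproduced here, and `close_up_cgdm` is a hypothetical helper name:

```python
import numpy as np

def close_up_cgdm(edges, radius=None):
    """Sketch of the local-edge-histogram depth model: divide the binary
    edge map into 5x5 blocks, mark blocks with above-average edge counts
    as main blocks, place a radial depth disc (255 at center, 0 at the
    circumference) on each, and fuse discs by per-pixel maximum."""
    h, w = edges.shape
    if radius is None:
        radius = max(h, w) // 2            # half the image length, as stated
    bh, bw = h // 5, w // 5
    counts = edges[:bh * 5, :bw * 5].reshape(5, bh, 5, bw).sum(axis=(1, 3))
    depth = np.zeros((h, w))
    yy, xx = np.mgrid[0:h, 0:w]
    for i in range(5):
        for j in range(5):
            if counts[i, j] > counts.mean():          # main block
                cy, cx = int((i + 0.5) * bh), int((j + 0.5) * bw)
                r = np.hypot(yy - cy, xx - cx)
                disc = np.clip(255 * (1 - r / radius), 0, 255)
                depth = np.maximum(depth, disc)       # assumed fusion rule
    return depth.astype(np.uint8)
```

In practice `edges` would be the binary output of a Canny detector on the close-up image.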

Calculation of CGHs
A layer-based holographic algorithm [42] is employed to calculate the CGHs. Because an 8-bit CGDM is employed, the 3D model obtained by 2D-to-3D rendering is sliced into 256 parallel layers. A random phase r(x, y) is superposed on each layer to simulate the diffusive effect of the object surface. The complex amplitude distribution on the holographic plane, Ecom(x, y), is calculated by propagating each layer to the holographic plane and summing the contributions, where FT represents the Fourier transform, Ul(x, y) is the amplitude of the lth layer, zl is the distance between the lth layer and the holographic plane, λ is the wavelength, u and v are the spatial frequencies, and α and β are the angles between the incident wave and the x- and y-directions, respectively. As this study uses a phase-only spatial light modulator (SLM), the phase-only distribution Ep(x, y) = arg[Ecom(x, y)] is extracted from Ecom(x, y) (Equation (7)).
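A hedged sketch of the layer-based calculation described above: each CGDM gray level defines one layer, a seeded random phase stands in for r(x, y), and each layer is propagated with an angular-spectrum transfer function (our choice of propagator; the paper's transfer function with the oblique-incidence angles α and β is not reproduced, so normal incidence is assumed). The layer-to-distance mapping via `z0` and `dz` is illustrative; wavelength and pixel pitch follow the experimental section:

```python
import numpy as np

def layer_based_cgh(image, cgdm, z0=0.155, dz=1e-5, wl=532e-9, pitch=3.74e-6):
    """Sum angular-spectrum propagations of the CGDM layers and keep the
    phase of the resulting field for a phase-only SLM."""
    h, w = image.shape
    fy = np.fft.fftfreq(h, pitch)[:, None]        # spatial frequencies v
    fx = np.fft.fftfreq(w, pitch)[None, :]        # spatial frequencies u
    arg = np.maximum(1 / wl**2 - fx**2 - fy**2, 0.0)  # evanescent cut-off
    e_com = np.zeros((h, w), dtype=complex)
    rng = np.random.default_rng(0)
    for level in np.unique(cgdm):
        layer = image * (cgdm == level)           # amplitude of this layer
        if not layer.any():
            continue
        u = layer * np.exp(1j * 2 * np.pi * rng.random((h, w)))  # random phase
        z = z0 + int(level) * dz                  # gray level -> distance
        H = np.exp(1j * 2 * np.pi * z * np.sqrt(arg))  # transfer function
        e_com += np.fft.ifft2(np.fft.fft2(u) * H)
    return np.angle(e_com)                        # Ep(x, y) = arg[Ecom(x, y)]
```

The returned phase map, rescaled to the SLM's gray levels, is what would be uploaded to the phase-only SLM.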

Reconstruction of CGHs
To verify the effectiveness of the 2D-to-3D rendering approach in practical applications, we built a phase-only holographic display system, as shown in Figure 3. The illumination laser beam was filtered by a pinhole and collimated by a lens. The CGH was uploaded to a phase-only SLM (Holoeye Gaea-2 VIS). After being modulated by the CGH on the SLM, the reconstructed wavefront was captured by a Canon 60D digital camera. In the experimental system, the illumination wavelength, pixel pitch, and resolution of the CGH were 532 nm, 3.74 μm, and 2000 × 2000, respectively. During the experiments, the camera was placed at shooting distances of 155, 160, 165, and 170 mm. The captured results are shown in Figure 4. When the shooting distance is 155 mm, the part of the reconstruction nearest to the camera is clear, while the distant part is blurred. As the shooting distance varies, the focus position of the reconstruction also changes. Hence, a 3D effect can be obtained using CGDMs in this 2D-to-3D rendering approach.

Discussion
In Figure 5a, objects A and B are the same size but are placed at different shooting distances. Thus, their sizes in the captured 2D image, sizA and sizB, are different. For simplicity, only the image sizes in the x-direction, xA and xB, are discussed. The pixel value of the CGDM changes linearly along the vanishing line, so the CGDM values for these two objects, depthA and depthB, are proportional to their image sizes. From Equation (1), the real shooting distance is therefore inversely proportional to the pixel value of the CGDM, i.e., dis · depth = C, where C is a constant determined by f and siz. When Δdepth is infinitely small, this relation can be rewritten in differential form as d(dis)/d(depth) = −C/depth². As the shooting distance increases, each increment in the CGDM gray scale therefore represents a larger change in depth, as shown in Figure 5b. The human eye's perception of depth information wanes as the observation distance increases; thus, human factor engineering is fully considered in the design of the CGDM.

In this study, the average processing times for image classification, CGDM calculation, and CGH generation are 9.67, 87.33, and 1201.67 ms, respectively. The total calculation time is 1298.67 ms, which limits the application of the proposed approach in dynamic 3D display. Considering that both the calculation of CGDMs and the generation of CGHs can be realized by deep learning [43], further optimization of the calculation time would be practical. Combining the proposed method with deep learning networks is a future direction of this work.
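The inverse relation between CGDM gray value and shooting distance discussed above can be illustrated numerically; the constant `C` and the gray values below are arbitrary illustrative choices:

```python
def shooting_distance(depth, C=1000.0):
    """dis = C / depth: shooting distance as a function of CGDM gray value
    (larger gray value = nearer content; C is an illustrative constant)."""
    return C / depth

# One gray-level decrement corresponds to a much larger distance change
# for far content (small gray values) than for near content (large ones).
near_step = shooting_distance(199) - shooting_distance(200)
far_step = shooting_distance(19) - shooting_distance(20)
```

This is exactly the nonlinearity shown in Figure 5b: equal gray-scale increments span progressively larger depth intervals as the distance grows.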

Conclusions
Adoption of holographic 3D displays is inhibited by the dearth of rich 3D content. To address this issue, we demonstrate a layer-based holographic algorithm applied to 2D images via a 2D-to-3D rendering approach. In this study, 2D images are first classified into three categories: the distant view, perspective, and close-up types. A cumulative horizontal edge histogram, the vanishing line method, and a local edge histogram are employed to calculate the CGDMs. The layer-based holographic algorithm is then employed to calculate the CGHs for the 3D models obtained by the 2D-to-3D approach. Natural 3D depth variations are obtained in the reconstructions of 2D images. The total processing time is approximately 1300 ms; further improvement of the CGH calculation time would assist application of the proposed approach in dynamic holographic displays.

Data Availability Statement: All data generated or analyzed during this study are included in this published article.

Conflicts of Interest:
The authors declare no conflict of interest.