Article
Peer-Review Record

Transforming Monochromatic Images into 3D Holographic Stereograms Through Depth-Map Extraction

Appl. Sci. 2025, 15(10), 5699; https://doi.org/10.3390/app15105699
by Oybek Mirzaevich Narzulloev 1, Jinwon Choi 1, Jumamurod Farhod Ugli Aralov 1, Leehwan Hwang 1, Philippe Gentet 1 and Seunghyun Lee 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 April 2025 / Revised: 13 May 2025 / Accepted: 15 May 2025 / Published: 20 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Review of the manuscript “Transforming Monochromatic Images into 3D Holographic Stereograms through Depth-Map Extraction” by Oybek Mirzaevich Narzulloev, Jinwon Choi, Jumamurod Farhod Ugli Aralov, Leehwan Hwang, Philippe Gentet and Seunghyun Lee.

 

The manuscript is well written and informative. The research results can have practical applications, and these studies are useful for improving the visual perception of black-and-white photographs. However, it should be noted that image processing does not increase the amount of information in an image; it can only reduce it. In my opinion, for photographs of a person, it is rational to filter the image to reduce noise and to produce a colour image of the face for better perception. At the same time, a particular colour of clothing may not correspond to historical truth. On the other hand, for buildings, especially those for which many black-and-white photographs exist and whose photographs contain straight lines, rectangles, cylinders, etc., artificial intelligence can be used to form 3D colour images, and the method described in this manuscript can then be used to obtain a holographic image.

 

 

 

I have these minor comments:

 

  1. In Figure 3, additional building elements appeared in the Result: 3D scene that were not present in the original colour image.
  2. Figure 10 shows the final digital hologram of An Jung-geun from different angles. We can see that the image of the face is shifted relative to the building for different angles, but the image of the face remains flat. That is, there is no significant advantage in creating a digital hologram.
  3. Figure 10 shows the final digital hologram of An Jung-geun from different angles. We can see that the image of the face is shifted relative to the building for different angles, but the image of the face remains flat. That is, there is no significant advantage in creating a digital hologram.
  4. I would phrase the captions for figures 10 and 11 a little differently: "Image of An Jung-geun created from the final digital hologram".
  5. Some paragraphs in the manuscript could be shortened somewhat.

 

Conclusion. These studies are useful and can be used to process black and white photographs for better visual perception and archiving. The manuscript may be published with minor changes.

Author Response

Responses to Reviewer Comments:

            The authors extend their gratitude to the reviewers for their insightful comments. We have thoroughly considered each suggestion and made the necessary revisions, with the hope that these changes align with your expectations. Detailed responses to all questions and comments are provided in the subsequent replies.

 

Comments and Suggestions for Authors (Reviewer 1):

 

The manuscript is well written and informative. The research results can have practical applications, and these studies are useful for improving the visual perception of black-and-white photographs. However, it should be noted that image processing does not increase the amount of information in an image; it can only reduce it. In my opinion, for photographs of a person, it is rational to filter the image to reduce noise and to produce a colour image of the face for better perception. At the same time, a particular colour of clothing may not correspond to historical truth. On the other hand, for buildings, especially those for which many black-and-white photographs exist and whose photographs contain straight lines, rectangles, cylinders, etc., artificial intelligence can be used to form 3D colour images, and the method described in this manuscript can then be used to obtain a holographic image.

 

Author Response:

Thank you for this insightful observation. We agree that for portraits, especially of historical figures, denoising and careful colorization of facial regions can significantly enhance viewer perception. However, as the reviewer rightly points out, clothing colorizations are speculative and may deviate from historical accuracy.

In our work, we employ a model trained on a diverse dataset of indoor and outdoor scenes, enabling the colorization process to infer plausible colors based on contextual cues. As discussed in the paper by Iizuka et al., “Let There Be Color!: Joint End-to-End Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification”, colorization inherently involves ambiguity. The authors emphasize that the colorization of garments, particularly in human portraits, often relies on priors learned from training data, and while visually convincing, the output may deviate from historical truth.

Considering this, we have added a clarifying note in Section 4.4 “Results and Analysis” of our manuscript, highlighting this limitation and recommending cautious interpretation when applying AI-based colorization to historically significant images.
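To make the colour-inference step concrete, the sketch below illustrates the general Lab-space formulation used by models such as Iizuka et al.: the network predicts only the chroma channels, while the original luminance is retained, which is why image structure is preserved even though garment colours remain speculative. The file names and the predict_ab stub are placeholders, not code from our pipeline.

```python
# Conceptual sketch of Lab-space colorization (Iizuka et al.-style): a network
# predicts the chroma channels (a, b) from the luminance channel (L), and the
# original L channel is kept, so structure is preserved while colours are inferred.
import numpy as np
from skimage import io, color, img_as_float

def predict_ab(L):
    """Placeholder for the pretrained colorization network; returns neutral chroma."""
    return np.zeros(L.shape + (2,))

gray = img_as_float(io.imread("portrait_bw.png", as_gray=True))  # placeholder path
L = gray * 100.0                                                 # Lab luminance, range 0-100

ab = predict_ab(L)                                               # shape (H, W, 2)
lab = np.dstack([L, ab[..., 0], ab[..., 1]])                     # merge L with predicted chroma
rgb = color.lab2rgb(lab)
io.imsave("portrait_colorized.png", (rgb * 255).astype("uint8"))
```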

 

I have these minor comments (Reviewer 1):

 

  1. In Figure 3, additional building elements appeared in the Result: 3D scene that were not present in the original colour image.

 

Author Response: Thank you for your valuable feedback regarding the additional building elements in Figure 3. We appreciate your observation and acknowledge that the image might have unintentionally included elements not present in the original colorized photograph. This discrepancy may have resulted from the process used to generate the 3D scene, where some elements were inferred by the AI-based colorization and depth estimation algorithms to create a more coherent depth map. We have revisited the figure and made the necessary corrections to ensure that the 3D scene more accurately reflects the original colorized image without introducing unintended features. The updated version of Figure 3 has been included in the revised manuscript for your review. We hope this addresses your concern, and we welcome any further suggestions.

 

 

  2. Figure 10 shows the final digital hologram of An Jung-geun from different angles. We can see that the image of the face is shifted relative to the building for different angles, but the image of the face remains flat. That is, there is no significant advantage in creating a digital hologram.

 

Author Response: Thank you for your insightful comment. The observed shift of the face relative to the building in Figure 10 is a consequence of the limited field of view (FOV) in our holographic setup. As discussed in the conclusion, we deliberately constrained the horizontal viewing angle to approximately 85–100° to maintain a balance between visual realism and viewer comfort. A wider FOV could enhance parallax effects, but it would also introduce significant distortions and reduce image quality, especially when working with limited input data such as a single monochromatic image. Moreover, a key principle in our approach was to preserve the original visual appearance of the historical figure as faithfully as possible. Rather than introducing artificial exaggeration of depth or perspective in the facial region, we focused on achieving authentic and respectful reconstructions of archival portraits. This design choice ensures that the face of the historical figure remains recognizable and visually consistent with the original photograph. In this way, the method contributes to the creation of aesthetically pleasing and meaningful 3D holographic representations from heritage photographs using minimal input data. We appreciate your observation.

 

 

 

  3. Figure 10 shows the final digital hologram of An Jung-geun from different angles. We can see that the image of the face is shifted relative to the building for different angles, but the image of the face remains flat. That is, there is no significant advantage in creating a digital hologram.

 

Author Response: Thank you for your insightful feedback. The observation that the face appears flat in the final digital hologram of An Jung-geun, despite the shifting relative to the building, is indeed an important point. This phenomenon is primarily due to the limitations of the current depth estimation and the restricted field of view (FOV) used in our approach.

In our setup, we focused on maintaining a balanced FOV (85°–100°) to ensure visual comfort and minimize distortions caused by extreme parallax. However, this narrower FOV limits the depth perception and does not fully account for the depth variations in the face relative to other elements in the scene, resulting in a flat appearance.

Moreover, our method of creating 3D representations from a single black-and-white image, while successful in many aspects, still faces challenges in fully reconstructing facial depth, especially in terms of maintaining natural depth perception for all objects within the scene.

We believe that the digital hologram provides a significant advantage over traditional 2D images by enabling dynamic viewing angles and a more immersive experience.
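As a purely illustrative aid (the view count, distance, and angle below are hypothetical and not taken from the manuscript), the geometry of such a constrained viewing window can be sketched as a linear virtual-camera array whose lateral offsets follow from the chosen horizontal FOV:

```python
import numpy as np

def camera_offsets(num_views=60, fov_deg=90.0, scene_distance=1.0):
    """Lateral positions of a linear virtual-camera array spanning a given
    horizontal viewing angle at a fixed distance from the scene plane.
    All defaults are illustrative, not values used in the paper."""
    half_angle = np.radians(fov_deg) / 2.0
    angles = np.linspace(-half_angle, half_angle, num_views)  # evenly spaced view angles
    return scene_distance * np.tan(angles)                    # horizontal camera offsets

offsets = camera_offsets(num_views=60, fov_deg=90.0, scene_distance=1.0)
print(f"{offsets.size} cameras spanning {offsets[-1] - offsets[0]:.2f} scene units")
```

A narrower viewing angle shrinks the span of these offsets and therefore the parallax between foreground and background, which is why the flatter facial relief is a deliberate trade-off rather than a defect of the hologram itself.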

 

  4. I would phrase the captions for figures 10 and 11 a little differently: “Image of An Jung-geun created from the final digital hologram”.

 

Author Response: Thank you for your suggestion regarding the captions for Figures 10 and 11. We agree that the current phrasing can be improved for clarity. We will revise the captions as follows:

Revised Captions:

Figure 10: “Images of the final digital hologram portrait of An Jung-geun.”

Figure 11: “Images of the final digital hologram portrait of Yu Gwan-sun.”

This change more accurately reflects the nature of the images and their relation to the digital hologram creation process.

 

  5. Some paragraphs in the manuscript could be shortened somewhat.

 

Author Response: Thank you for your suggestion regarding the length of certain paragraphs in the manuscript. We appreciate your feedback and understand the importance of clarity and conciseness in scientific writing. We will review the manuscript carefully and shorten the relevant paragraphs without compromising the essential information and context. This revision will help improve readability and make the manuscript more concise.

 

Conclusion. These studies are useful and can be used to process black and white photographs for better visual perception and archiving. The manuscript may be published with minor changes.

 

Author Response: Thank you for your positive feedback and for recognizing the practical utility of our approach in processing black-and-white photographs for enhanced visual perception and archiving. We are grateful for your suggestion to make minor changes. Based on your feedback, we will address the necessary revisions to improve the manuscript. We hope that these changes will help strengthen the overall quality of the paper. We look forward to your further suggestions and are confident that the revised version will meet the required standards for publication.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a novel method for generating digital holographic stereograms using a single monochromatic photograph. By integrating deep learning techniques for image colorization and depth-map estimation, the approach synthesizes the multi-view perspective images required for holographic stereogram creation. This methodology also establishes a groundbreaking framework for applications in cultural heritage preservation and personal archival systems. However, this paper contains some minor issues that need to be addressed. My questions and comments are as follows:

  • In Section 2, the literature review of existing technologies is not comprehensive enough, as it fails to mention the latest advancements in the field of AI-based hologram generation. This omission may impact the demonstration of the novelty of the present study. It is suggested that the authors supplement the literature review with recent studies on AI-driven hologram generation to provide a more complete background and strengthen the novelty of this work.
  • In Section 4.1, when describing the dataset, the paper only mentions the resolution of the images but does not provide information about other image characteristics, such as brightness, contrast, and sharpness. It is suggested to supplement the description of the image characteristics to better assess the quality and complexity of the dataset.
  • The paper describes the final holograms in Section 4.3 but does not provide a detailed comparison of the visual effects and quality between the generated holograms and the original images. It is recommended to include specific evaluation metrics and comparative analysis to more objectively demonstrate the performance of the proposed method.
  • In Sections 3.1 and 3.3, key technical terms such as "hogel," "DCNF model," and "CRF loss layer" are introduced without sufficient context or definition. For instance, the paper does not clearly explain how a hogel differs from a conventional pixel in terms of data structure or the specific advantages of the DCNF model over other depth-estimation models like MiDaS. It is recommended to add short explanations for these technical terms.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript presents a novel and practical method for transforming single black-and-white images into digital holographic stereograms. However, some revisions need to be addressed by the authors before it can be considered for publication.

  • Limited Test Cases. The study evaluates the method using only two monochromatic portraits. While these examples are historically significant and well-chosen, this small sample size limits the assessment of the pipeline’s robustness and generalizability. Please include additional test cases with varying image quality, content complexity, and resolution to strengthen the experimental validation.
  • Lack of Quantitative Evaluation. The paper relies primarily on visual inspection to assess the results. While the images are visually appealing, no objective or subjective metrics are used to evaluate the quality of colorization, depth accuracy, or final holographic output. Please add a quantitative analysis, such as SSIM or PSNR for colorization, or depth estimation error compared to available ground truth (if feasible). Alternatively, a small user study can be conducted for perceptual quality assessment.
  • To strengthen the technical rigor of the paper, we recommend including a comparative analysis between your proposed method and at least one or two existing approaches for generating 3D holograms or synthesizing depth maps from limited input data. This comparison could be carried out on depth estimation output or even the final holograms generated, to demonstrate how your single-image pipeline performs relative to more data-intensive methods.
  • Manual vs. Automated Pipeline Steps. It is unclear which parts of the pipeline are fully automated and which require manual intervention, particularly regarding Adobe After Effects usage. This lack of detail limits reproducibility and practical adoption. Please, clearly distinguish between automated and manual stages. If scripts or templates were used in After Effects, consider providing more technical details or including them in supplementary materials.
  • Time Required for Full Hologram Generation. The manuscript does not specify how long it takes to complete the full process, from input image preprocessing to final hologram printing. This information is important for assessing the practical feasibility of the method. Please specify the average or estimated time required to complete the entire hologram generation workflow, ideally breaking it down by stage (e.g., preprocessing, colorization, depth estimation, rendering, printing).
  • Expand the discussion on potential issues and limitations in hologram reconstruction (e.g., how to handle uncertainties or potential misrepresentations).

Minor comments:

  • Please specify whether the AI models (e.g., for colorization and depth estimation) were used as-is from public sources (such as Hugging Face) or fine-tuned for this application. If fine-tuned, briefly describe the dataset, training parameters (e.g., epochs, learning rate), and any architectural or preprocessing modifications. This clarification will improve the reproducibility and technical transparency of your method.
  • Discuss whether the method is compatible with open-source alternatives to Adobe After Effects for future scalability.

Author Response

Responses to Reviewer Comments:

           

            The authors thank the reviewers for their insightful comments. We have thoroughly considered each suggestion and made the necessary revisions, hoping these changes align with your expectations. Subsequent replies provide detailed responses to all questions and comments.

 

Comments and Suggestions for Authors:

 

This manuscript presents a novel and practical method for transforming single black-and-white images into digital holographic stereograms. However, some revisions need to be addressed by the authors before it can be considered for publication.

 

Author Response: Thank you for your thoughtful and constructive comments on our manuscript. We greatly appreciate your recognition of the novelty and practicality of our proposed method. We have carefully addressed all the points raised in your review and revised the manuscript accordingly to improve its clarity, accuracy, and overall quality.

 

 

  • Limited Test Cases. The study evaluates the method using only two monochromatic portraits. While these examples are historically significant and well-chosen, this small sample size limits the assessment of the pipeline’s robustness and generalizability. Please include additional test cases with varying image quality, content complexity, and resolution to strengthen the experimental validation.

 

Author Response: We appreciate your valuable feedback. While our current study evaluates two historically significant monochromatic portraits, those of An Jung-geun and Yu Gwan-sun, these cases were carefully selected not only for their cultural relevance but also to demonstrate the practical application of our pipeline under challenging real-world conditions. It is important to emphasize that both the facial portraits and their corresponding backgrounds were originally 2D images. Our method successfully reconstructs them into coherent 3D holographic scenes using only single-image inputs. The ability to generate visually compelling 3D representations from limited monochromatic data, including architectural elements in the background, reflects the robustness and adaptability of the proposed approach across both foreground and environmental content. This demonstrates that the pipeline is capable of handling diverse image components consistently and reliably, even when applied to historical materials with limited visual data.

 

  • Lack of Quantitative Evaluation. The paper relies primarily on visual inspection to assess the results. While the images are visually appealing, no objective or subjective metrics are used to evaluate the quality of colorization, depth accuracy, or final holographic output. Please add a quantitative analysis, such as SSIM or PSNR for colorization, or depth estimation error compared to available ground truth (if feasible). Alternatively, a small user study can be conducted for perceptual quality assessment.

 

Author Response: Thank you for your constructive feedback. We understand your concern regarding the lack of quantitative metrics such as SSIM or PSNR in the original manuscript. However, we would like to clarify that the algorithms we used for colorization and depth estimation are directly based on previous works, such as the methods by Iizuka et al. (2016) for colorization and Liu et al. (2015) for depth estimation. These foundational works already include extensive quantitative evaluations using metrics like SSIM, PSNR, and others, which we have referenced in our manuscript. Given that the core algorithms for colorization and depth estimation are derived from these well-established methods, we believe that a detailed repetition of these quantitative evaluations is not necessary, as they have already been thoroughly validated in the original studies. Instead, we focused on demonstrating the application of these techniques in our specific context of creating 3D digital holograms from monochromatic images. Nevertheless, we acknowledge the importance of validation and will provide additional clarification in the manuscript, emphasizing the reliance on these prior works for the colorization and depth estimation steps. We hope that this will provide sufficient justification for the approach without repeating the quantitative metrics already established in the cited literature.
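For completeness, the kind of quantitative check the reviewer suggests can be run with standard tooling; the snippet below is a minimal sketch using scikit-image, with placeholder file names, and it presumes that a colour reference image is available for comparison.

```python
# Minimal SSIM/PSNR check of a colorized result against a colour reference
# image (file names are placeholders; a ground-truth reference must exist).
from skimage import io, img_as_float
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

reference = img_as_float(io.imread("reference_color.png"))   # ground-truth colour image
colorized = img_as_float(io.imread("colorized_output.png"))  # pipeline output, same size

ssim = structural_similarity(reference, colorized, channel_axis=-1, data_range=1.0)
psnr = peak_signal_noise_ratio(reference, colorized, data_range=1.0)
print(f"SSIM: {ssim:.3f}  PSNR: {psnr:.2f} dB")
```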

 

  • To strengthen the technical rigor of the paper, we recommend including a comparative analysis between your proposed method and at least one or two existing approaches for generating 3D holograms or synthesizing depth maps from limited input data. This comparison could be carried out on depth estimation output or even the final holograms generated, to demonstrate how your single-image pipeline performs relative to more data-intensive methods.

 

Author Response: Thank you for this valuable suggestion. We have addressed your comment by expanding the discussion in the revised Related Work section and clarifying how our method differs from existing techniques. Most of the previously proposed methods, such as those by Lee et al. [17], Sarakinos et al. [12], and Dashdavaa et al. [18], require either multiple photographs, perspective image sequences, or hardware-based depth sensing (e.g., time-of-flight sensors) to generate holograms. These approaches are highly effective for controlled imaging environments or pre-existing 3D datasets. In contrast, our method is tailored for situations where only a single monochromatic image is available, such as historical portraits. By leveraging deep learning-based colorization and monocular depth estimation, followed by manual refinement in Adobe After Effects, our pipeline can reconstruct plausible depth with perceptual accuracy even from minimal input data. While it does not claim to surpass hardware-based or multi-view approaches in depth precision, it offers a practical and accessible alternative for heritage preservation, media restoration, and personal archiving tasks where input is inherently limited. This distinction has been clarified in the manuscript (see Section 2).

 

  • Manual vs. Automated Pipeline Steps. It is unclear which parts of the pipeline are fully automated and which require manual intervention, particularly regarding Adobe After Effects usage. This lack of detail limits reproducibility and practical adoption. Please, clearly distinguish between automated and manual stages. If scripts or templates were used in After Effects, consider providing more technical details or including them in supplementary materials.

 

Author Response: Thank you for your insightful comment. We have clarified that the image layers used in our 3D reconstruction were directly obtained as separate foreground and background images. The reconstruction process in Adobe After Effects involved the generation and refinement of depth maps using the Face 3D plugin, where parameters such as Depth Map Control and Blur were manually adjusted. Scene composition and the setup of the virtual camera array were carried out manually, while the rendering of perspective images was automated using After Effects’ rendering engine. These updates have been incorporated into the revised manuscript, specifically in Section 4.1: Dataset, in the Results and Discussion.

 

  • Time Required for Full Hologram Generation. The manuscript does not specify how long it takes to complete the full process, from input image preprocessing to final hologram printing. This information is important for assessing the practical feasibility of the method. Please specify the average or estimated time required to complete the entire hologram generation workflow, ideally breaking it down by stage (e.g., preprocessing, colorization, depth estimation, rendering, printing).

 

Author Response: Thank you for your valuable comment. We agree that providing a breakdown of the processing time for each stage contributes to understanding the practical feasibility of our method. We have added the following information to the revised manuscript (Section 4.3. Final Holograms):

“In this experiment, we successfully printed two hologram portraits, each measuring 15 × 20 cm. The printing process was carried out using the Digital CHIMERA holographic stereogram printer. The system achieved a spatial resolution of 250 µm at a printing frequency of 60 Hz, resulting in the production of 60 hogels per second. The entire production process for a single hologram required approximately four hours.”

As for the earlier stages, such as image colorization and depth estimation, these were conducted using publicly available implementations from GitHub repositories (Iizuka et al. for colorization and “Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields” by Liu et al. for depth estimation). Since these processes depend on hardware specifications and computational resources, we did not include a precise time breakdown for them in the manuscript.
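For readers who wish to relate the quoted printer specifications to the reported duration, a rough back-of-the-envelope estimate of the raw hogel-exposure time is sketched below; it is an approximation only, and the balance of the reported four hours is presumably taken up by the remaining steps of the production process.

```python
# Rough estimate of raw hogel-exposure time from the printer specifications
# quoted above: 15 x 20 cm plate, 250 um hogel pitch, 60 hogels per second.
plate_w_mm, plate_h_mm = 150, 200
hogel_pitch_mm = 0.25
hogels_per_second = 60

total_hogels = (plate_w_mm / hogel_pitch_mm) * (plate_h_mm / hogel_pitch_mm)
exposure_hours = total_hogels / hogels_per_second / 3600
print(f"{total_hogels:,.0f} hogels -> ~{exposure_hours:.1f} h of exposure time")
# ~480,000 hogels and ~2.2 h of pure exposure; the remainder of the ~4 h
# reported above covers the other stages of the production process.
```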

 

 

  • Expand the discussion on potential issues and limitations in hologram reconstruction (e.g., how to handle uncertainties or potential misrepresentations).

 

Author Response: Thank you for your valuable suggestion. We have carefully considered your comment and have expanded the discussion on the limitations and potential issues in hologram reconstruction, particularly addressing the impact of half-parallax generation, depth-map inaccuracies, and the challenges associated with AI-based colorization of historical portraits. These additions aim to provide a more comprehensive and balanced perspective on the constraints of our proposed method.

 

Minor comments:

  • Please specify whether the AI models (e.g., for colorization and depth estimation) were used as-is from public sources (such as Hugging Face) or fine-tuned for this application. If fine-tuned, briefly describe the dataset, training parameters (e.g., epochs, learning rate), and any architectural or preprocessing modifications. This clarification will improve the reproducibility and technical transparency of your method.

 

Author Response: Thank you for your valuable comment. The AI models used for image colorization and monocular depth estimation were utilized as-is from publicly available repositories, without any fine-tuning or retraining. Specifically, we employed the pre-trained models from “Let There Be Color!: Automatic Colorization of Grayscale Images” by Iizuka et al. for colorization, and “Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields” by Liu et al. for depth estimation.

After the initial depth maps were extracted, further refinement was performed in Adobe After Effects using built-in tools such as Depth Map Control and Blur. In this stage, depth values were manually adjusted using slider controls, and blur parameters were fine-tuned to achieve smoother and more perceptually accurate depth transitions. These adjustments were part of the 3D scene composition process and independent of the original AI model training.
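To give a sense of how such an off-the-shelf monocular depth estimator is invoked before the manual refinement stage, the sketch below uses MiDaS (the publicly available model mentioned by Reviewer 2) as a stand-in; it is not the DCNF implementation of Liu et al. that we actually employed, and the file paths are placeholders.

```python
# Illustrative only: extracting an initial depth map with an off-the-shelf
# monocular depth estimator (MiDaS, as a stand-in for the DCNF model of Liu et al.).
import cv2
import torch

model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
model.eval()

img = cv2.cvtColor(cv2.imread("portrait.png"), cv2.COLOR_BGR2RGB)  # placeholder path
batch = transforms.small_transform(img)

with torch.no_grad():
    prediction = model(batch)
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# 'depth' is a relative (inverse) depth map; in our pipeline such a map is then
# refined manually (e.g., Depth Map Control and Blur in After Effects) before
# the layered 3D scene is composed.
depth_8bit = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("depth_map.png", depth_8bit)
```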

 

  • Discuss whether the method is compatible with open-source alternatives to Adobe After Effects for future scalability.

 

Author Response: We appreciate your suggestion regarding compatibility with open-source alternatives to Adobe After Effects. In response, we have added a brief extension to Section 4.4, “Results and Analysis”.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have made a commendable effort to improve the manuscript, and the revised version is suitable for publication. However, Figure 11 currently lacks a caption, and this should be addressed to ensure clarity and completeness.

Author Response

Responses to Reviewer Comments:         

Comments and Suggestions for Authors:

The authors have made a commendable effort to improve the manuscript, and the revised version is suitable for publication. However, Figure 11 currently lacks a caption, and this should be addressed to ensure clarity and completeness.

Author Response: Thank you for your encouraging feedback and for recognizing our efforts to improve the manuscript. We appreciate your attention to detail. We have addressed the issue by adding the caption to Figure 11 as follows:

“Figure 11. Images of the final digital hologram portrait of Yu Gwan-sun.”  We believe this addition improves the clarity and completeness of the figure.

Author Response File: Author Response.docx
