Leveraging NeRF for Cultural Heritage Preservation: A Case Study of the Katolička Porta in Novi Sad
Jean-Pierre Jessel
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- It is recommended to supplement the effectiveness data in practical applications (such as the improvement degree of texture details after activating "Rand Levels") in Figure 3.
- The virtual restoration section (such as the application of IP-Adapter and ControlNet) involves numerous technical terms. It is recommended to add flowcharts or schematic diagrams.
- It is recommended to add detailed comparison data and case analyses regarding accuracy, efficiency, cost, etc., between NeRF and traditional 3D reconstruction methods, so as to clarify the advantages and applicable scenarios of NeRF
- The following studies were recommended to be properly cited: [1] Research on Linear Active Disturbance Rejection Control for Uncertain Systems with Output Noise; [2] Ship Detection under Low-Visibility Weather Interference via an Ensemble Generative Adversarial Network
Author Response
We want to thank the Reviewer for the valuable comments and suggestions, which helped us to improve the quality of the manuscript.
Please note that the line numbers correspond to the PDF version of the revised manuscript without the Track Changes option (electronics-3731358_Track Changes_accepted.pdf).
We have addressed all comments in the revised version of the manuscript. Please follow our detailed responses to the comments below:
Reviewer comment:
- It is recommended to supplement the effectiveness data in practical applications (such as the improvement degree of texture details after activating "Rand Levels") in Figure 3.
Response:
In the revised version of the manuscript, qualitative evidence from this project is presented in the third column of Table 3. Please see revised Table 3.
- The virtual restoration section (such as the application of IP-Adapter and ControlNet) involves numerous technical terms. It is recommended to add flowcharts or schematic diagrams.
Response:
A flowchart for the application of IP-Adapter and ControlNet was added; please see Figure 14 in the revised version of the manuscript (page 16). An explanation has also been added above the figure; see lines 548-558.
- It is recommended to add detailed comparison data and case analyses regarding accuracy, efficiency, cost, etc., between NeRF and traditional 3D reconstruction methods, so as to clarify the advantages and applicable scenarios of NeRF.
Response:
In the revised version of the manuscript, a detailed comparison between NeRF and traditional reconstruction methods such as photogrammetry and laser scanning was added. Please see lines 46-59.
- The following studies were recommended to be properly cited: [1] Research on Linear Active Disturbance Rejection Control for Uncertain Systems with Output Noise; [2] Ship Detection under Low-Visibility Weather Interference via an Ensemble Generative Adversarial Network
Response:
In the revised version of the manuscript, the recommended studies were cited; please see references [3] and [4].
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The preservation and documentation of architectural and cultural heritage increasingly rely on digital technologies, as they have become increasingly powerful and accessible. Among the most commonly used technologies, photogrammetry and laser scanners require specific equipment and enable 3D reconstruction of buildings or objects, following specific and relatively unscalable manual and (semi)automatic processing.
Two-dimensional image sets can be transformed into three-dimensional scenes using a neural network. In particular, as presented in this article, Neural Radiance Field (NeRF) reconstruction and the assistance of artificial intelligence can produce photorealistic 3D scenes from a limited set of 2D images. Thus, using machine learning to represent the 3D scene in interaction with light offers an alternative to traditional methods. These approaches significantly accelerate workflows, reduce costs and minimize manual intervention, and are suitable for inaccessible or fragile sites.
This article presents a methodology designed for the rapid, high-quality creation of 3D environments using NeRF technology. The data and images used to train NeRF were captured by drone, with particular attention paid to critical aspects of scene capture to ensure data usability and enable rapid convergence to a 3D model.
The application of NeRF, combined with high-resolution drone-acquired images, as in the Katolička Porta project in Novi Sad presented in the article, produced detailed and accurate digital twins. This approach also enabled virtual restoration and texture enhancement. The article also highlights the importance of Katolička Porta, a historic site that has evolved over the centuries, which has benefited from these techniques to preserve its architectural and cultural identity. Thus, the article demonstrates the potential of this technology for cultural heritage conservation.
The NeRF pipeline is described, including video input conversion, image preprocessing, desampling, stylization, and final video generation. This approach was evaluated in collaboration with 20 experts specializing in cultural heritage protection. Professionals indicate a good appreciation of the importance of using advanced visualization tools, particularly 3D reconstructions, in the assessment, preservation, and presentation of immovable cultural heritage. However, these technologies are still considered auxiliary tools for interpretation rather than substitutes for formal documentation.
The article also presents work to be developed in several directions such as integrating NeRF models into virtual and augmented reality (VR/AR) environments to create immersive and interactive experiences or developing user-friendly interfaces for educational applications or in museums. To develop the method, it is proposed to automate and optimize data acquisition and processing in order to increase efficiency and scalability, and to develop multimodal sensor fusion to produce more efficient 3D models. Similarly, to increase application possibilities, possibilities of integration into geographic information systems (GIS), or building information modeling (BIM) are discussed.
Other remarks:
The NeRF method and the use of AI could have been described in as much detail as the implementation and application.
The population used for the assessment is too small, which devalues the results, even if they are encouraging.
There is a subsection 4.3.2.1 without any other 4.3.2.x, and a 7.1 without a 7.2.
Author Response
We want to thank the Reviewer for the valuable comments and suggestions, which helped us to improve the quality of the manuscript.
Please note that the line numbers correspond to the PDF version of the revised manuscript without the Track Changes option (electronics-3731358_Track Changes_accepted.pdf).
We have addressed all comments in the revised version of the manuscript. Please follow our detailed responses to the comments below:
Reviewer comment:
The preservation and documentation of architectural and cultural heritage increasingly rely on digital technologies, as they have become increasingly powerful and accessible. Among the most commonly used technologies, photogrammetry and laser scanners require specific equipment and enable 3D reconstruction of buildings or objects, following specific and relatively unscalable manual and (semi)automatic processing.
Two-dimensional image sets can be transformed into three-dimensional scenes using a neural network. In particular, as presented in this article, Neural Radiance Field (NeRF) reconstruction and the assistance of artificial intelligence can produce photorealistic 3D scenes from a limited set of 2D images. Thus, using machine learning to represent the 3D scene in interaction with light offers an alternative to traditional methods. These approaches significantly accelerate workflows, reduce costs and minimize manual intervention, and are suitable for inaccessible or fragile sites.
This article presents a methodology designed for the rapid, high-quality creation of 3D environments using NeRF technology. The data and images used to train NeRF were captured by drone, with particular attention paid to critical aspects of scene capture to ensure data usability and enable rapid convergence to a 3D model.
The application of NeRF, combined with high-resolution drone-acquired images, as in the Katolička Porta project in Novi Sad presented in the article, produced detailed and accurate digital twins. This approach also enabled virtual restoration and texture enhancement. The article also highlights the importance of Katolička Porta, a historic site that has evolved over the centuries, which has benefited from these techniques to preserve its architectural and cultural identity. Thus, the article demonstrates the potential of this technology for cultural heritage conservation.
The NeRF pipeline is described, including video input conversion, image preprocessing, desampling, stylization, and final video generation. This approach was evaluated in collaboration with 20 experts specializing in cultural heritage protection. Professionals indicate a good appreciation of the importance of using advanced visualization tools, particularly 3D reconstructions, in the assessment, preservation, and presentation of immovable cultural heritage. However, these technologies are still considered auxiliary tools for interpretation rather than substitutes for formal documentation.
The article also presents work to be developed in several directions such as integrating NeRF models into virtual and augmented reality (VR/AR) environments to create immersive and interactive experiences or developing user-friendly interfaces for educational applications or in museums. To develop the method, it is proposed to automate and optimize data acquisition and processing in order to increase efficiency and scalability, and to develop multimodal sensor fusion to produce more efficient 3D models. Similarly, to increase application possibilities, possibilities of integration into geographic information systems (GIS), or building information modeling (BIM) are discussed.
Response:
We want to thank the Reviewer for the valuable comments.
Other remarks:
The NeRF method and the use of AI could have been described in as much detail as the implementation and application.
Response:
The NeRF method and the use of AI in this study were discussed in more detail in the revised version of the manuscript. Please see the revised Discussion section and lines 825-834.
The population used for the assessment is too small, which devalues the results, even if they are encouraging.
Response:
A detailed explanation of the limited number of participants was added to the Discussion; please see lines 815-824.
There is a subsection 4.3.2.1 without any other 4.3.2.x, and a 7.1 without a 7.2.
Response:
In the revised version of the manuscript, subsection 4.3.2.1 was renumbered as 4.3.3 (line 590), and subsection 7.1 was removed. Please see line 885.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript focuses on the digital preservation and virtual restoration of Katolička Porta in Novi Sad through the integration of cutting-edge technologies that combine aerial data acquisition, advanced 3D reconstruction, and texture enhancement methods. The methodology uses a neural network for reconstructing (the neural radiance field (NeRF)), where 2D images can become a complex 3D scene.
The results of a conducted survey among experts in the field of cultural heritage protection clearly indicate a high level of acceptance and recognition of the value of digital technologies, particularly 3D visualizations based on methods such as NeRF.
The proposed methodology would be especially valuable for documenting hard-to-reach or restricted parts of monuments that are typically difficult to access using conventional methods.
The manuscript is very well structured. The use of figures and tables is helpful in a better understanding of the study. The research is based on appropriate, recent, and qualitative references. The research design is appropriate, and the methodology is well defined and easy to replicate. The contribution of this study is easy to read and understand.
Some minor issues would further improve the quality of this research:
- In the Abstract, remove the abbreviation (line 16).
- In the Introduction, the manuscript presents the pillars of the research. Relevant and appropriate references support it. However, there is a need to split the very huge paragraph into smaller paragraphs (see lines 46 to 123). This will be helpful for the readers.
- In the first appearance, set the abbreviations and use only them in the whole document (i.e., NeRF in lines 41, 46, and 90). Cross-check all the abbreviations in the whole document.
- Try to include and clearly define the research questions or the hypotheses of this study. Place this subsection at the end of the Introduction, or the end of Section 3.
- In Section 2, the background of the use case is described. Split the long paragraph into smaller ones.
- In Section 3, some representative related works are presented in four subsections. Subsection 3.2 could be enhanced with the addition of some earlier studies (i.e., 10.1109/3DTV.2017.8280406 in lines 220-222). In subsection 3.3, consider replacing "augmented reality (AR)" with "extended reality (XR)". This will be more accurate, as AR does not offer immersive experiences. However, XR includes VR, MR, and AR.
- In Materials and Methods, the main issue is the limited number of participants in the survey of the methodology. An explanation of the limited participants and the risks of this fact would provide more transparency in the research. It is risky to make generalizations based on a limited number of participants (N=20). Consider providing this clarification in the last paragraph of the Discussion, as limitations.
- There are no issues in the Results. The results are briefly presented.
- In the Discussion, recall the research questions or the hypotheses and validate your results. Include a subsection with the limitations met.
- In the Conclusion, remove the title of the subsection 7.1.
Author Response
We want to thank the Reviewer for the valuable comments and suggestions, which helped us to improve the quality of the manuscript.
Please note that the line numbers correspond to the PDF version of the revised manuscript without the Track Changes option (electronics-3731358_Track Changes_accepted.pdf).
We have addressed all comments in the revised version of the manuscript. Please follow our detailed responses to the comments below:
Reviewer comment:
This manuscript focuses on the digital preservation and virtual restoration of Katolička Porta in Novi Sad through the integration of cutting-edge technologies that combine aerial data acquisition, advanced 3D reconstruction, and texture enhancement methods. The methodology uses a neural network for reconstructing (the neural radiance field (NeRF)), where 2D images can become a complex 3D scene.
The results of a conducted survey among experts in the field of cultural heritage protection clearly indicate a high level of acceptance and recognition of the value of digital technologies, particularly 3D visualizations based on methods such as NeRF.
The proposed methodology would be especially valuable for documenting hard-to-reach or restricted parts of monuments that are typically difficult to access using conventional methods.
The manuscript is very well structured. The use of figures and tables is helpful in a better understanding of the study. The research is based on appropriate, recent, and qualitative references. The research design is appropriate, and the methodology is well defined and easy to replicate. The contribution of this study is easy to read and understand.
Some minor issues would further improve the quality of this research:
- In the Abstract, remove the abbreviation (line 16).
Response:
In the revised version of the manuscript, the abbreviation in line 16 was removed. Please see line 16.
- In the Introduction, the manuscript presents the pillars of the research. Relevant and appropriate references support it. However, there is a need to split the very huge paragraph into smaller paragraphs (see lines 46 to 123). This will be helpful for the readers.
Response:
In the revised version of the manuscript, the huge paragraph was split into smaller ones. Please see lines 60-123.
- In the first appearance, set the abbreviations and use only them in the whole document (i.e., NeRF in lines 41, 46, and 90). Cross-check all the abbreviations in the whole document.
Response:
All abbreviations were checked in the whole document. Please see the revised version of the manuscript.
- Try to include and clearly define the research questions or the hypotheses of this study. Place this subsection at the end of the Introduction, or the end of Section 3.
Response:
The research questions (hypotheses) were added at the end of the section The History and Cultural Significance of Katolička Porta in Novi Sad. Please see lines 224-237.
- In Section 2, the background of the use case is described. Split the long paragraph into smaller ones.
Response:
In the revised version of the manuscript, the long paragraph from Section 2 was split into smaller ones. Please see lines 148-211.
- In Section 3, some representative related works are presented in four subsections. Subsection 3.2 could be enhanced with the addition of some earlier studies (i.e., 10.1109/3DTV.2017.8280406 in lines 220-222). In subsection 3.3, consider replacing "augmented reality (AR)" with "extended reality (XR)". This will be more accurate, as AR does not offer immersive experiences. However, XR includes VR, MR, and AR.
Response:
In the revised version of the manuscript, the recommended study was cited; please see reference [12]. The term "augmented reality (AR)" was replaced with "extended reality (XR)"; please see lines 281-283 and 885-888.
- In Materials and Methods, the main issue is the limited number of participants in the survey of the methodology. An explanation of the limited participants and the risks of this fact would provide more transparency in the research. It is risky to make generalizations based on a limited number of participants (N=20). Consider providing this clarification in the last paragraph of the Discussion, as limitations.
Response:
A detailed explanation of the limited number of participants was added to the Discussion; please see lines 815-824.
- There are no issues in the Results. The results are briefly presented.
Response:
We want to thank the Reviewer again for the valuable comments.
- In the Discussion, recall the research questions or the hypotheses and validate your results. Include a subsection with the limitations met.
Response:
In the Discussion, a recall of the hypotheses was added; please see lines 756-770.
- In the Conclusion, remove the title of the subsection 7.1.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Comments were addressed.
