Leveraging Neural Radiance Fields for Large-Scale 3D Reconstruction from Aerial Imagery
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this work, the authors evaluated three Neural Radiance Field (NeRF)-based methods for large-scale 3D reconstruction from nadir-looking aerial imagery. The authors analyzed different configurations of the methods by varying hyperparameters such as the number of NeRF submodules, the density threshold for extracting the point cloud, and the presence of an additional neural network to predict point visibility. The manuscript is well written and includes experiments on multiple datasets. However, I have a few clarifications, questions, and suggestions that I would like the authors to address first, to further improve the quality and clarity of the manuscript.
In page 4, lines 132--133: Something is wrong with the sentence. Please rephrase for clarity.
In page 4, line 137: It is not clear whether p is a viewing point or a viewed/reconstructed point.
In page 7, equation 2: Is "model" not equivalent to "point cloud" so P_{gt} = \Epsilon?
In page 7, lines 229--230: How is this filter applied? Are points ranked based on absolute distance from the nearest ground-truth point?
In page 7, equation 3: Equation 3 is not so clear to me. Is X_{th} used as a condition to include/filter points in the calculation of the absolute distance? If so, wouldn't it be clearer to put it under the summation sign? In the current format, it looks like it is part of the expression being evaluated instead.
In page 7, lines 232--233: How is the threshold X_{th} chosen?
In page 7, lines 234--235: Incomplete sentence. Please complete.
In page 7, lines 250--252: It is not clear whether the original and reconstructed images are equivalent to x and y.
In page 8, lines 283--284: Is this a completeness threshold? From the previous text it is more like a ground-truth-based outlier filter.
In page 8, lines 291--292: Why not perform the same quantitative assessment for the H3D dataset?
In page 9, lines 306--308: Seems consistent except for the TMB, DVGO entries. Any reason behind this?
In page 10, lines 331--332: Difficult to see this observation. Perhaps error maps could help.
In page 10, lines 334--335: I am not quite sure what this means. I remember the parameter was mentioned, but its exact value or use is not clear to me here. Could you clarify / expound on this part?
In page 11, lines 345--346: This internal density value is not normalized? What does a threshold of 100 mean compared to a threshold of 200?
In page 11, lines 356--357: Any reason behind this exception?
In page 12, lines 368--371: What is the target reference for the VisibilityNet, and how is it obtained?
In page 12, lines 373--375: This is counterintuitive to me, as I thought the visibility prediction would filter out points. Could you clarify / expound on this part?
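As an illustration of the question on equation 3 and X_{th}, the following is a minimal sketch, assuming that X_{th} is a cut-off on the distance from each reconstructed point to its nearest ground-truth point. The manuscript's actual definition is not reproduced here; the function name, the toy arrays, and the brute-force nearest-neighbour search are hypothetical choices made only for this example.

```python
import numpy as np

def thresholded_mean_distance(points, reference, x_th):
    """Mean nearest-neighbour distance over the points closer than x_th to the reference.

    Illustrative only: x_th is treated as an inclusion condition on the sum,
    which is an assumption about the metric, not the manuscript's definition.
    """
    # Pairwise Euclidean distances, shape (len(points), len(reference)).
    diffs = points[:, None, :] - reference[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    # Keep only points whose nearest-reference distance is below the threshold.
    kept = nearest[nearest < x_th]
    mean = float(kept.mean()) if kept.size else float("nan")
    # Also report the fraction of points that passed the filter.
    return mean, kept.size / len(points)

# Toy data (hypothetical): two reference points, three reconstructed points,
# the last of which is a far-away outlier.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
points = np.array([[0.0, 0.0, 0.1], [1.0, 0.3, 0.0], [5.0, 5.0, 5.0]])
mean_dist, kept_fraction = thresholded_mean_distance(points, reference, x_th=1.0)
print(mean_dist, kept_fraction)
```

Under this reading, the threshold only selects which points enter the average, which corresponds to placing the condition under the summation sign. The brute-force distance matrix is fine for a toy example but would need a spatial index for real point clouds.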
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
1. This study is an application of the neural radiance field (NeRF) network. The authors should describe the principle and structure of NeRF so that readers can better understand it and the subsequent content.
2. According to the results in the manuscript, NeRF-based 3D reconstruction can achieve complete reconstruction under unfavorable conditions, but its accuracy is lower than that of traditional photogrammetry methods, which to some extent limits the application scope of this method. If possible, it is recommended that the authors modify the NeRF model or integrate it with other methods to enhance the accuracy of reconstruction.
3. The explanation of some issues in the manuscript is too general, with "possible" used in multiple places. It is recommended to use actual data or results to support these statements.
4. The parameters such as threshold and voxel grid size used in the manuscript also need to be clearly stated, and it is recommended that the author provide a more detailed description of the algorithm so that the methods described in the manuscript can be reproduced.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This study investigates the application of neural radiance fields (NeRF) to large-scale 3D reconstruction of outdoor scenes. The authors evaluated three NeRF-based approaches: Mega-NeRF, BlockNeRF, and Direct Voxel Grid Optimization (DVGO), focusing on their accuracy and completeness compared to ground-truth point clouds. The effects of using multiple submodules, estimating visibility through an additional neural network, and changing the density threshold for point cloud extraction are also analyzed. The results show that, despite the lower quality compared to classical photogrammetry methods, the NeRF-based reconstruction provides visually compelling results in challenging areas. In particular, increasing the number of submodules and using an additional multilayer perceptron (MLP) to predict visibility can significantly improve the quality of the resulting reconstruction.
The manuscript is very innovative, and the method proposed by Su Oh is very interesting and, in my opinion, very consistent with the scope of remote sensing. But before being considered for acceptance, there are a few minor issues that should be noted:
For Tables 2 and 3, it is recommended to highlight the contrast.
Figure 6 makes a lot of sense in my opinion, but the description is not detailed enough, so I suggest adding more intuitive descriptions.
It would improve the quality of the manuscript if the poor-quality datasets could be quantified and if it were stated which of the compared methods performs better on which data.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This paper proposes a method utilizing Neural Radiance Fields (NeRF) for three-dimensional reconstruction from large-scale aerial imagery, and compares it with the traditional Multi-View Stereo (MVS) approach. It elaborates on the fundamental principles of NeRF and its advantages in large-scale scene reconstruction in detail, and verifies the effectiveness and superiority of the proposed method through experiments. The paper showcases innovative thinking and provides a novel perspective and methodology for the three-dimensional reconstruction of aerial imagery, positively contributing to the advancement of remote sensing applications.
Comprehensive Comments:
This paper demonstrates a certain level of innovation and possesses significant research value. The organizational structure of the paper is clear and logically rigorous. The constructed method is scientific and reasonable, and the conclusions drawn are highly credible. The paper is recommended for publication after appropriate revisions.
Specific Revision Suggestions:
(1) Refine the abstract to highlight the key findings and contributions of the research. The language throughout the paper should be further condensed, with appropriate simplification and explanation of overly specialized terminology or complex sentence structures to enhance readability and ease of understanding.
(2) Incorporate a technical route flowchart to improve the overall logical structure of the paper.
(3) The figures and tables in the paper are incorrectly formatted, particularly those representing experimental results. Please revise them according to standard academic paper formatting guidelines.
(4) For the different results observed in the experiments, such as the differences between the NeRF method and traditional methods, it is recommended to conduct a more in-depth analysis and discussion to reveal the underlying causes and mechanisms.
(5) In the conclusion section, provide a more comprehensive explanation of the COLMAP and NeRF methods and add prospects for future research, including potential new application scenarios and improvement methods.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
In the experimental verification section, it is recommended that the authors list the overall computation time of the four cases in a chart, which would make the comparison more obvious. This could include the data collection time of each case, the time used by the comparison algorithm, the time used by the algorithm in this manuscript, the time after reducing the image resolution, and the time that can be expected after the authors' suggestion. Although the latter cannot be calculated at present, a relatively reliable expected time would help readers understand whether real-time or quasi-real-time 3D modeling can be achieved.
Author Response
Comment 1: In the experimental verification section, it is recommended that the authors list the overall computation time of the four cases in a chart, which would make the comparison more obvious. This could include the data collection time of each case, the time used by the comparison algorithm, the time used by the algorithm in this manuscript, the time after reducing the image resolution, and the time that can be expected after the authors' suggestion. Although the latter cannot be calculated at present, a relatively reliable expected time would help readers understand whether real-time or quasi-real-time 3D modeling can be achieved.
Reply: Thank you for this feedback. Because the NeRF-based results are quantitatively considerably worse than conventional photogrammetry and also slower, we have decided not to include more detailed time measurements in this work. The many different configurations that we evaluated, such as the number of submodules, the image resolution, and the architecture of the MLP, would all influence these time measurements, making interpretation difficult. We already give rough numbers in the discussion chapter to provide an idea of how the methods compare in terms of computation time. For methods where the focus is on speed, such an analysis of runtime would definitely be of great interest.