Article
Peer-Review Record

Comparative Analysis of Novel View Synthesis and Photogrammetry for 3D Forest Stand Reconstruction and Extraction of Individual Tree Parameters

Remote Sens. 2025, 17(9), 1520; https://doi.org/10.3390/rs17091520
by Guoji Tian, Chongcheng Chen and Hongyu Huang *
Reviewer 1: Anonymous
Submission received: 6 March 2025 / Revised: 11 April 2025 / Accepted: 23 April 2025 / Published: 25 April 2025
(This article belongs to the Section AI Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study first applied NeRF and 3DGS to complex forest stands. The results provide a clear comparison among the three methods (photogrammetry, NeRF, and 3DGS). However, in my opinion, the study lacks novelty, as it primarily reports comparative results without providing deeper insights. Further discussion and interpretation of the findings are necessary to ensure the paper does not read merely as a technical report.

In addition, the authors should consider how their findings can be generalized across different forest types. Although two forest types were selected, it remains difficult to determine which method is most suitable for a given environment. For instance, even a structurally simple forest may have highly complex terrain, which can significantly influence reconstruction results. Moreover, the study lacks a detailed explanation of the differences between the two plots, limiting the reader’s ability to fully understand and evaluate the discussion.

Furthermore, the authors did not provide details on how the models were tuned. I think the occlusion effects might be a potentially key factor influencing the results, but there was no further analysis of how the three methods have advantages in reducing these effects.

Last, the registration process using TLS data lacks explanation (What is the accuracy after registration?).

Author Response

1. Summary

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted in track changes in the re-submitted file.

2. Questions for General Evaluation

Reviewer’s Evaluation

Response and Revisions

Does the introduction provide sufficient background and include all relevant references?

Can be improved

We included more related information in the introduction, and added more references.

Is the research design appropriate?

Must be improved

We adopted a general and comprehensive approach to investigate the applicability of NVS methods for the 3D reconstruction of forest stands. Additionally, we incorporated more detailed information and descriptions to demonstrate the rationality of the study design.

Are the methods adequately described?

Must be improved

We added more detailed descriptions of the 3D reconstruction method, point cloud filtering, and single tree segmentation algorithm.

Are the results clearly presented?

Can be improved

We supplemented the analysis with more detailed result descriptions and discussions.

Are the conclusions supported by the results?

Yes

Thank you

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: This study first applied NeRF and 3DGS to complex forest stands. The results provide a clear comparison among the three methods (photogrammetry, NeRF, and 3DGS). However, in my opinion, the study lacks novelty, as it primarily reports comparative results without providing deeper insights. Further discussion and interpretation of the findings are necessary to ensure the paper does not read merely as a technical report.

 

Response 1: Thank you for your valuable suggestions. We applied novel view synthesis (NVS) technology to the 3D reconstruction of forest stands, following a general and reasonable experimental design to achieve a complete reconstruction process and comparison. The focus of our study is to evaluate the differences, advantages, and limitations of this method relative to photogrammetry, and to verify its applicability in forest stand scenarios. Although this study does not present further theoretical innovations, we believe our findings provide a valuable reference for subsequent research on NVS applications. Additionally, we have taken your suggestions into account and further discussed and interpreted the results. Please refer to Section 3 (Results) and Section 4 (Discussion) for more details.

 

Comments 2: In addition, the authors should consider how their findings can be generalized across different forest types. Although two forest types were selected, it remains difficult to determine which method is most suitable for a given environment. For instance, even a structurally simple forest may have highly complex terrain, which can significantly influence reconstruction results. Moreover, the study lacks a detailed explanation of the differences between the two plots, limiting the reader’s ability to fully understand and evaluate the discussion.

 

Response 2: Thank you for your insightful comments. In Section 4 (Discussion), we have addressed the limitations of our study on forest stands. Building on this research, we will continue to explore more complex forest scenes with varying tree structures and terrain complexity. Based on the reconstruction results from this study, however, it is clear that NeRF has advantages over photogrammetry and is better suited to complex forest environments. Additionally, we have incorporated your suggestions by adding more descriptions of the differences between the two plots, such as terrain elevation differences and tree height ranges. Please refer to lines 172-188 in the revised version for further details.

 

Comments 3: Furthermore, the authors did not provide details on how the models were tuned. I think the occlusion effects might be a potentially key factor influencing the results, but there was no further analysis of how the three methods have advantages in reducing these effects.

 

Response 3: Following your suggestion, we have supplemented more details on the tuning of the NeRF and 3DGS models, including training loss convergence, batch size, learning rate, and other relevant parameters. Please refer to lines 221–224, 310–316, and 355–358 for these updates. Regarding the occlusion effect: in Section 3.2 (Point Cloud Comparison), we compared the impact of canopy occlusion in Plot_2 on the completeness of tree point clouds across the three methods. In Section 3.3 (Extraction of Tree Parameters from Stand Plot Point Cloud), we analyzed the effect of occlusion on the extraction of the TH and CD parameters between Plot_1 and Plot_2 (see revised lines 530–533). In the discussion (Section 4), we also added a description highlighting NeRF's better adaptability to reconstructing occluded scenes (see revised lines 635–638).

 

Comment 4: Last, the registration process using TLS data lacks explanation (What is the accuracy after registration?).

 

Response 4: Thank you for your valuable comment. In the revised version, lines 316–320, we have added details on the registration process between the three types of point cloud models and the TLS point cloud, including the registration methods used and the corresponding accuracy information.

 

4. Response to Comments on the Quality of English Language

Point 1:

Quality of English Language

( ) The English could be improved to more clearly express the research.
(x) The English is fine and does not require any improvement.

Response: Thank you.

5. Additional clarifications

We gratefully appreciate your valuable comments and suggestions.

 

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

       The manuscript “Comparative Analysis of Novel View Synthesis and Photogrammetry for 3D Forest Stand Reconstruction and Extraction of Individual Tree Parameters” mentions that Novel View Synthesis (NVS) technology, such as NeRF and 3DGS, has become an emerging research topic. The application of these technologies to the 3D reconstruction of plants and trees is also a current area of research. Research on applying NVS technology to forest stands, which consist of multiple trees and present challenges such as self-occlusion and increasing complexity, remains scarce and still has several gaps to be explored.

       The term Lidar is a more widespread acronym than laser scanner. Consider using the acronym Lidar.

       I made suggestions in the comments of the digital archive. Below are my general recommendations:

1) Improve the abstract: state the types of data used in the experiment, the best result found and the superiority of the proposed method over the current method.

2) Figure1: plot 1b is it possible to place the aerial view as in plot 2b?

3) Detail the two study areas. For example, for Plot_1, mention the approximate density of the trees or the variation in diameter of the visible trunks (since they were without leaves).

4) More explicitly justify the choice of these two “contrasting” areas. Although it is mentioned that Plot_1 was without leaves and Plot_2 with leaves, the specific relationship of this choice with the research objectives (comparing the performance of the methods in different conditions) could be reinforced.

5) The description of the topography as “higher in the center and lower at the edges” for both areas could be complemented with information on the magnitude of this topographic variation, if relevant to data acquisition or reconstruction results.

6) When mentioning COLMAP, it would be useful to indicate the main parameters used during the SfM (Structure from Motion) and MVS (Multi-View Stereo) process, such as the specific feature extraction and matching algorithms that were used.

7) When describing the use of NeRFStudio and the Nerfacto algorithm, it would be beneficial to provide more details about the training parameters used. For example, the number of layers and neurons in the MLP network, the loss function used, the learning rate and the batch size. This would increase the transparency and replicability of the method.

8) The choice of 30,000 epochs for training the NeRF is mentioned as being sufficient for stable results. However, a brief justification for this choice (e.g. based on the observation of loss convergence during preliminary tests) could be added.

9) Similar to NeRF, when describing the use of the 3DGS algorithm, it would be important to detail how the sparse point cloud obtained from SfM was used to build the initial distribution of Gaussian points.

10) More information on the training and optimization parameters of the 3DGS would be valuable. The article mentions the optimization of the mean, covariance matrix, opacity and spherical harmonic coefficients, but detailing the specific settings used (e.g. loss functions, learning rates, learning rate scheduling) would improve the description of the method.

11) For the video acquisition with the iPhone 11 in Plot_1, it would be useful to briefly describe how the individual frames were extracted from the video. What was the frame extraction frequency? This can influence the overlapping of the images and, consequently, the quality of the reconstruction.

12) For the aerial and ground images captured with the DJI Phantom 4 drone, more details about the flight plan (height, lateral and frontal overlap of the images) and the capture trajectories of the ground images could be provided for a better understanding of the scene coverage. Although Figure 3 illustrates the camera positions, an additional textual description could be beneficial.

13) With regard to terrestrial laser scanning with the RIEGL VZ-400, information on the positioning strategy of the six scanning stations in each plot (e.g. spatial distribution and justification for the number of stations) could be relevant for assessing the completeness of the reference data.

14) When mentioning the registration of the point clouds with the TLS cloud using common control points, it would be useful to indicate the number of control points used in each plot and the precision achieved in the registration (the article mentions a registration precision of approximately 4 mm for the fine registration of the TLS scans, but not for the registration with the other clouds).

15) More details on the pre-processing steps carried out in LiDAR360, such as the specific types of filters applied to remove noise and the normalization method used, would be important to understand the treatment of the data prior to segmentation and parameter extraction.

16) The description of the automatic segmentation tool used in LiDAR360 could include the name or a brief description of the segmentation algorithm used and the parameters configured for the extraction of individual trees.

17) Ensure that the methodology for obtaining tree parameters (TH, DBH, CD) was consistent for all point clouds (TLS, COLMAP, NeRF, 3DGS) using LiDAR360. Any difference in approach for different types of point clouds should be clearly mentioned.

18) Section 3.1 mentions that 30,000 epochs for NeRF and 20,000 for 3DGS were used to achieve stable reconstruction results. It would be beneficial to add a brief explanation of how this stability was determined.

19) Table 2 shows different processing times for COLMAP on different datasets. Although section 3.1 suggests that total image resolution can have an influence, the observation that Plot_2_UAV (with more images and presumably higher total resolution) took less time for COLMAP than Plot_1_UAV merits further discussion. It could be hypothesized that the complexity of the scene, the number of detectable features or other factors could explain this difference.

20) Although Table 3 provides the number of points, section 3.2 qualitatively describes the point clouds as having "more points" (COLMAP), "less noise" (NeRF) and being "more sparse" (3DGS). For a more rigorous comparison, it would be useful to consider including a point density metric (for example, number of points per cubic meter or per surface area of the reconstructed scene). Alternatively, the discussion could be more explicit about how the total point count translates into visible density in the different areas of the trees (crown, trunk).

21) In section 3.3, the R² and RMSE values for TH, DBH and CD are presented. It would be useful to provide additional context for these values. For example, for the specific study area or for forest management applications, what ranges of R² would be considered indicative of a good fit? What magnitudes of RMSE would be considered acceptable for the different variables (TH, DBH, CD)? This would help the reader to interpret the practical significance of the results.

22) Section 3.2 and the discussion on Figure 9 mention that Plot_2's Lidar model presented incomplete canopy information due to tree height and foliage density. When presenting the results of the parameter extraction for Plot_2 in section 3.3, it is crucial to reiterate this limitation of the Lidar reference and how it may affect the assessment of the accuracy of the other methods for canopy (CD) and potentially for height (TH). The exclusion of central trees for the comparison of TH and CD already partially addresses this, but an explicit mention at the beginning of subsection 3.3 about the limitations of the reference data for Plot_2 would be beneficial.

23) Section 3.3 mentions that models generated from higher resolution images (UAV) resulted in greater accuracy in parameter extraction. This observation is also made in the summary. In Chapter 3, it would be useful to highlight these comparisons (e.g. by directly comparing the R² and RMSE values between Plot_1_Phone and Plot_1_UAV for each method) to reinforce this conclusion.

24) Expand on the mechanism of NeRF that allows it to reconstruct dense canopies with less noise compared to photogrammetry, perhaps linking this to its nature as an implicit volumetric representation.

25) Chapter 4 makes some references to other studies. It would be beneficial to expand on these connections, discussing how the findings of this study align with or diverge from the results of previous research mentioned in the Introduction or in these specific references. For example, how do the observed processing times compare to those reported in previous individual tree reconstruction studies?

26) For which specific applications would each method be best suited? (e.g. NeRF for high-quality visualization of complex areas, photogrammetry for accurate DBH estimation in less dense areas despite the processing time).

I end my review by congratulating you on your study.

Respectfully,

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Review punctuation, especially the use of commas. Watch out for some long paragraphs, check the punctuation as this tends to confuse the reader. Standardize: when introducing acronyms for the first time, their meaning should be presented. Check the rules for citations. See the citation for the applications as an example (e.g. line 284).

Author Response

 

1. Summary

 

 

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted in track changes in the re-submitted files.

2. Questions for General Evaluation

Reviewer’s Evaluation

Response and Revisions

Does the introduction provide sufficient background and include all relevant references?

Must be improved

We included more related information in the introduction, and added more references.

Is the research design appropriate?

Can be improved

We adopted a general and comprehensive approach to investigate the applicability of NVS methods for 3D reconstruction of forest stands. Additionally, we incorporated more detailed information and descriptions to demonstrate the rationality of the study design.

Are the methods adequately described?

Must be improved

We added more detailed descriptions of the 3D reconstruction method, point cloud filtering, and single tree segmentation algorithm.

Are the results clearly presented?

Must be improved

We supplemented the analysis with more detailed result descriptions and discussions.

Are the conclusions supported by the results?

Can be improved

We have added more content to support our conclusions. Please refer to the following section of our response for details.

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: The manuscript “Comparative Analysis of Novel View Synthesis and Photogrammetry for 3D Forest Stand Reconstruction and Extraction of Individual Tree Parameters” mentions that Novel View Synthesis (NVS) technology, such as NeRF and 3DGS, has become an emerging research topic. The application of these technologies to the 3D reconstruction of plants and trees is also a current area of research. Research on applying NVS technology to forest stands, which consist of multiple trees and present challenges such as self-occlusion and increasing complexity, remains scarce and still has several gaps to be explored.

       The term Lidar is a more widespread acronym than laser scanner. Consider using the acronym Lidar.

 

Response 1: Thank you for your positive feedback on our research. Indeed, LiDAR is the more widely used term for this type of active sensing technology. In this paper, however, we use "laser scanner" to refer to a specific type of instrument, for example, "terrestrial laser scanner" in Line 64 of the revised version. Therefore, we have retained the term "laser scanner" in three instances throughout the manuscript, while using "LiDAR" in all other contexts.

 

Comment 2: Improve the abstract: state the types of data used in the experiment, the best result found and the superiority of the proposed method over the current method.

 

Response 2: Thank you for your valuable suggestions. Following your advice, we have added descriptions of the data used and differences between the sample plots in the abstract (Lines 13–15 of the revised version). Additionally, we have included a discussion on the applicability of different methods to various forest stand scenarios and highlighted the advantages of NeRF reconstruction results (Lines 19–24 of the revised version).

 

Comment 3: Figure1: plot 1b is it possible to place the aerial view as in plot 2b?

 

Response 3: Since the trees in Plot 1 are neither as tall nor as densely canopied as those in Plot 2, we did not use a UAV to capture a top-down aerial view of the entire plot. We believe the current display provides sufficient visual cues about this plot. However, in Sections 2.1 (Study Area) and 2.3.1 (Data Acquisition) of the revised manuscript, we have provided more detailed descriptions of the plot conditions and the data collection process.

 

Comment 4: Detail the two study areas. For example, for Plot_1, mention the approximate density of the trees or the variation in diameter of the visible trunks (since they were without leaves).

 

Response 4: Based on your suggestion, we have added descriptions of the average spacing between trees, the range of tree heights and DBH, as well as ground elevation differences between the two plots in Section 2.1 Study Area of the revised manuscript.

 

Comment 5: More explicitly justify the choice of these two “contrasting” areas. Although it is mentioned that Plot_1 was without leaves and Plot_2 with leaves, the specific relationship of this choice with the research objectives (comparing the performance of the methods in different conditions) could be reinforced.

 

Response 5: We have added a comparison of the differences between the two plots, and emphasized the rationale for their selection and how it relates to the research objectives in lines 188–190 of the revised manuscript.

 

Comment 6: The description of the topography as “higher in the center and lower at the edges” for both areas could be complemented with information on the magnitude of this topographic variation, if relevant to data acquisition or reconstruction results.

 

Response 6: Based on your suggestion, we have included descriptions of the terrain variation magnitude in lines 174–175 and 181–182 of the revised manuscript.

 

Comment 7: When mentioning COLMAP, it would be useful to indicate the main parameters used during the SfM (Structure from Motion) and MVS (Multi-View Stereo) process, such as the specific feature extraction and matching algorithms that were used.

 

Response 7: Thank you for your suggestion. In the revised version, we have added the specific feature extraction and matching algorithms, as well as the MVS algorithm, in lines 201–206. Additionally, in section 4. Discussion, lines 553–583, we have provided a detailed and in-depth discussion on the selection of feature matching methods and their impact on the reconstruction results. We have also outlined potential directions for future improvements and related technologies.
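For reference, an SfM + MVS pipeline of the kind described above can be run with COLMAP's standard command-line tools roughly as follows. This is a generic sketch: the paths are placeholders, and the specific feature-extraction and matching settings reported in the revised manuscript are not reproduced here.

```shell
# Sparse reconstruction (SfM): SIFT feature extraction, matching, mapping
colmap feature_extractor --database_path db.db --image_path images
colmap exhaustive_matcher --database_path db.db
colmap mapper --database_path db.db --image_path images --output_path sparse

# Dense reconstruction (MVS): undistortion, patch-match stereo, fusion
colmap image_undistorter --image_path images --input_path sparse/0 --output_path dense
colmap patch_match_stereo --workspace_path dense
colmap stereo_fusion --workspace_path dense --output_path dense/fused.ply
```

The fused point cloud (`fused.ply`) is the dense MVS output that would then be compared against the NeRF and 3DGS reconstructions.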

 

Comment 8: When describing the use of NeRFStudio and the Nerfacto algorithm, it would be beneficial to provide more details about the training parameters used. For example, the number of layers and neurons in the MLP network, the loss function used, the learning rate and the batch size. This would increase the transparency and replicability of the method.

 

Response 8: Thank you for your suggestion. We have added details on the number of layers and neurons in the NeRF MLP network (lines 211–224), as well as the loss functions, learning rate, and batch size for NeRF and 3DGS (lines 256–259).

 

Comment 9: The choice of 30,000 epochs for training the NeRF is mentioned as being sufficient for stable results. However, a brief justification for this choice (e.g. based on the observation of loss convergence during preliminary tests) could be added.

 

Response 9: Thank you for your suggestion. We have added observations on the loss convergence during NeRF and 3DGS training to explain the rationale for our choices (lines 355–360 in the revised version).
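As an illustration of this kind of stopping criterion, one common approach is to compare the mean loss over two consecutive windows of iterations and stop when the relative change falls below a tolerance. The sketch below is hypothetical: the window size, tolerance, and synthetic loss curve are illustrative, not the actual training values.

```python
import numpy as np

def has_converged(losses, window=1000, tol=1e-3):
    """Declare convergence when the mean loss over the last `window`
    iterations differs from the preceding window by less than `tol`
    (relative change)."""
    if len(losses) < 2 * window:
        return False
    prev = np.mean(losses[-2 * window:-window])
    last = np.mean(losses[-window:])
    return abs(prev - last) / max(prev, 1e-12) < tol

# Synthetic example: an exponentially decaying loss curve that flattens out
steps = np.arange(40000)
losses = 0.5 * np.exp(-steps / 3000) + 0.01
print(has_converged(losses[:10000]))  # -> False (loss still decreasing)
print(has_converged(losses[:35000]))  # -> True  (loss has plateaued)
```

A criterion like this makes "sufficient for stable results" reproducible: training stops once the loss plateau is reached rather than at a fixed, hand-picked iteration count.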

 

Comment 10: Similar to NeRF, when describing the use of the 3DGS algorithm, it would be important to detail how the sparse point cloud obtained from SfM was used to build the initial distribution of Gaussian points.

 

Response 10: In the revised version, lines 262-265 provide a detailed explanation of how the sparse point cloud from SfM is used to adaptively adjust and construct the initial distribution of Gaussian points.

 

Comment 11: More information on the training and optimization parameters of the 3DGS would be valuable. The article mentions the optimization of the mean, covariance matrix, opacity and spherical harmonic coefficients, but detailing the specific settings used (e.g. loss functions, learning rates, learning rate scheduling) would improve the description of the method.

 

Response 11: In the revised version, lines 315-316 include a description of the learning rate, batch size, and other parameter settings used during training.

 

Comment 12: For the video acquisition with the iPhone 11 in Plot_1, it would be useful to briefly describe how the individual frames were extracted from the video. What was the frame extraction frequency? This can influence the overlapping of the images and, consequently, the quality of the reconstruction.

 

Response 12: Based on your suggestion, in the revised version, lines 271-273, we explain the frequency at which image frames are extracted to ensure at least 70% overlap between images.
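As a rough illustration of how the extraction frequency relates to image overlap, the required frame rate can be estimated from the camera's along-track scene footprint and the walking speed. The numbers below are hypothetical, chosen only to show the calculation, not the actual acquisition values.

```python
def extraction_fps(footprint_m, speed_m_s, overlap=0.70):
    """Frames per second needed so that consecutive frames overlap
    by at least `overlap` along the walking direction."""
    advance = (1.0 - overlap) * footprint_m  # allowed camera advance per frame
    return speed_m_s / advance

# Hypothetical values: ~6 m scene footprint, ~1 m/s walking pace
fps = extraction_fps(footprint_m=6.0, speed_m_s=1.0)
print(round(fps, 2))  # -> 0.56, i.e. about one frame every 1.8 s
```

Frames at or above this rate can then be extracted with a standard tool, e.g. `ffmpeg -i walk.mp4 -vf fps=1 img_%04d.png`.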

 

Comment 13: For the aerial and ground images captured with the DJI Phantom 4 drone, more details about the flight plan (height, lateral and frontal overlap of the images) and the capture trajectories of the ground images could be provided for a better understanding of the scene coverage. Although Figure 3 illustrates the camera positions, an additional textual description could be beneficial.

 

Response 13: Based on your suggestion, in the revised manuscript, lines 276-279, we have provided a detailed description of how data were collected using the UAV, including image overlap, flight trajectories, and other relevant details. The caption of Figure 3 (lines 294-298) was modified to better describe the camera positions.

 

Comment 14: With regard to terrestrial laser scanning with the RIEGL VZ-400, information on the positioning strategy of the six scanning stations in each plot (e.g. spatial distribution and justification for the number of stations) could be relevant for assessing the completeness of the reference data.

 

Response 14: Thank you for your valuable suggestion. To ensure data integrity, we set up six scanning stations carefully spaced around the plot, based on the extent of the study area. For more details, please refer to lines 287-288 in the revised manuscript.

 

Comment 15: When mentioning the registration of the point clouds with the TLS cloud using common control points, it would be useful to indicate the number of control points used in each plot and the precision achieved in the registration (the article mentions a registration precision of approximately 4 mm for the fine registration of the TLS scans, but not for the registration with the other clouds).

 

Response 15: Thank you for your suggestion. Based on your advice, we have described the process of point cloud registration using the ICP algorithm in CloudCompare software, as well as the registration accuracy. For more details, please refer to lines 316-320 in the revised manuscript.
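For readers interested in the mechanics, the core of each ICP iteration (as run in CloudCompare) is a closed-form rigid alignment of corresponding points, and the reported registration accuracy is the RMS residual after alignment. The following is a minimal NumPy sketch on synthetic data, not our point clouds or the CloudCompare implementation.

```python
import numpy as np

def rigid_align(src, dst):
    """Best-fit rotation R and translation t mapping src -> dst
    (Kabsch/SVD solution for known point correspondences)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection in the SVD solution
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

def rms_error(src, dst, R, t):
    """RMS registration residual after applying (R, t) to src."""
    res = dst - (src @ R.T + t)
    return np.sqrt((res ** 2).sum(axis=1).mean())

# Synthetic check: rotate/translate a cloud, then recover the transform
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = rigid_align(src, dst)
print(rms_error(src, dst, R, t))  # ~0 for an exact rigid transform
```

On real data the residual is non-zero (e.g. the ~4 mm fine-registration accuracy reported for the TLS scans), and ICP iterates this step while re-estimating correspondences.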

 

Comment 16: More details on the pre-processing steps carried out in LiDAR360, such as the specific types of filters applied to remove noise and the normalization method used, would be important to understand the treatment of the data prior to segmentation and parameter extraction.

 

Response 16: In the preprocessing steps of LiDAR360, we used the Gaussian filtering method to remove noise and applied the Cloth Simulation Filter (CSF) algorithm to separate ground points from vegetation points. The specific details have been added to lines 325-328 in the revised manuscript.
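To illustrate the idea behind such noise removal (LiDAR360's Gaussian filter itself is proprietary), the sketch below applies a generic statistical-outlier criterion to synthetic data: points whose mean distance to their nearest neighbours is anomalously large are treated as noise.

```python
import numpy as np

def sor_filter(points, k=8, n_sigma=2.0):
    """Statistical outlier removal: drop points whose mean distance to
    their k nearest neighbours exceeds mean + n_sigma * std over the cloud."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    keep = knn <= knn.mean() + n_sigma * knn.std()
    return points[keep]

rng = np.random.default_rng(1)
cloud = rng.normal(scale=0.5, size=(200, 3))   # dense "vegetation" cluster
outliers = rng.uniform(5, 10, size=(5, 3))     # isolated noise points
filtered = sor_filter(np.vstack([cloud, outliers]))
print(len(filtered))  # the isolated noise points are removed
```

Ground/vegetation separation with CSF is a different, physically motivated step (simulating a cloth draped over the inverted cloud) and is not sketched here.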

 

Comment 17: The description of the automatic segmentation tool used in LiDAR360 could include the name or a brief description of the segmentation algorithm used and the parameters configured for the extraction of individual trees.

 

Response 17: For all forest stand point cloud models, we applied a distance-based clustering method, setting the same threshold parameters to automatically extract individual trees and obtain the structural parameter information for each tree. The specific details have been added to lines 328-331 in the revised manuscript.
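The distance-based clustering we describe follows the general Euclidean-clustering idea sketched below: points closer than a threshold to any member of a cluster join that cluster. This is a toy NumPy version on two synthetic point groups, not the LiDAR360 implementation or its threshold parameters.

```python
import numpy as np
from collections import deque

def euclidean_cluster(points, threshold=0.8):
    """Distance-based clustering via BFS: neighbours within `threshold`
    of any cluster member are absorbed into that cluster."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    labels = np.full(n, -1)
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in np.where((d[i] < threshold) & (labels == -1))[0]:
                labels[j] = cluster
                queue.append(j)
        cluster += 1
    return labels

# Two synthetic "trees" 3 m apart should fall into separate clusters
tree_a = np.random.default_rng(2).normal([0, 0, 0], 0.3, (40, 3))
tree_b = np.random.default_rng(3).normal([3, 0, 0], 0.3, (40, 3))
labels = euclidean_cluster(np.vstack([tree_a, tree_b]))
print(len(set(labels)))  # number of clusters found
```

Each cluster then corresponds to one candidate tree, from which structural parameters can be measured.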

 

Comment 18: Ensure that the methodology for obtaining tree parameters (TH, DBH, CD) was consistent for all point clouds (TLS, COLMAP, NeRF, 3DGS) using LiDAR360. Any difference in approach for different types of point clouds should be clearly mentioned.

 

Response 18: In LiDAR360, the same steps and parameters were applied to extract the structural parameters (TH, DBH, CD) from all forest stand point cloud models.

 

Comment 19: Section 3.1 mentions that 30,000 epochs for NeRF and 20,000 for 3DGS were used to achieve stable reconstruction results. It would be beneficial to add a brief explanation of how this stability was determined.

 

Response 19: As addressed in Response 9, we assess the stability of NeRF and 3DGS training based on the convergence of the loss function.

 

Comment 20: Table 2 shows different processing times for COLMAP on different datasets. Although section 3.1 suggests that total image resolution can have an influence, the observation that Plot_2_UAV (with more images and presumably higher total resolution) took less time for COLMAP than Plot_1_UAV merits further discussion. It could be hypothesized that the complexity of the scene, the number of detectable features or other factors could explain this difference.

 

Response 20: Thank you for your insightful suggestion. In Section 4 (Discussion), we note that this result may be due to the simpler structure and richer texture of Plot 1, which yielded a higher number of detectable features (lines 595-601 in the revised manuscript). At the same time, we point out that the causes are likely multifaceted and require further experimental analysis.

 

Comment 21: Although Table 3 provides the number of points, section 3.2 qualitatively describes the point clouds as having "more points" (COLMAP), "less noise" (NeRF) and being "more sparse" (3DGS). For a more rigorous comparison, it would be useful to consider including a point density metric (for example, number of points per cubic meter or per surface area of the reconstructed scene). Alternatively, the discussion could be more explicit about how the total point count translates into visible density in the different areas of the trees (crown, trunk).

 

Response 21: Thank you for your valuable suggestion. First, the visual comparison of the stand-level point clouds in Figures 5 and 8, and of the individual-tree point clouds in Figures 6, 7, and 9, led to the preliminary conclusions that COLMAP produces the most points, NeRF the least noise, and 3DGS the sparsest point clouds. We then support these conclusions quantitatively through the vertical point-count profiles of individual trees. In Figure 7, for example, COLMAP's point count in each 0.1 m height interval exceeds 5000, higher than both NeRF and 3DGS, showing that the COLMAP single-tree point cloud contains the most points. Using the vertical distribution of the LiDAR point cloud as a reference, NeRF's distribution closely follows LiDAR, whereas COLMAP's curve fluctuates more strongly due to noise, indicating that NeRF's single-tree point cloud contains the least noise. In Figure 6, the trunk portion of the 3DGS single-tree point cloud is visibly sparser, and its vertical distribution approaches zero points, confirming that the 3DGS point cloud is the sparsest. We therefore believe that both the qualitative and quantitative analyses validate the conclusions of this study.

 

Comment 22: In section 3.3, the R² and RMSE values for TH, DBH and CD are presented. It would be useful to provide additional context for these values. For example, for the specific study area or for forest management applications, what ranges of R² would be considered indicative of a good fit? What magnitudes of RMSE would be considered acceptable for the different variables (TH, DBH, CD)? This would help the reader to interpret the practical significance of the results.

 

Response 22: Thank you for your valuable suggestion. Following your advice, we have added the ranges of the relevant metrics. Taking the TLS point cloud, which extracts tree structure parameters with higher accuracy, as the reference, the acceptable accuracy ranges are: DBH (0-2 cm), TH (0-0.5 m), and CD (0.5-1 m). A more detailed description can be found in lines 481-483 of the revised manuscript.
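The R² and RMSE used to judge these accuracy ranges can be computed as below. This is a generic sketch with toy DBH values, not the study's data; the TLS measurements serve as the reference, as in the response above.

```python
import numpy as np

def fit_metrics(reference, estimated):
    """R-squared and RMSE of estimated tree parameters vs. a TLS reference."""
    reference = np.asarray(reference, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    residuals = estimated - reference
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((reference - reference.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse

# Toy DBH values in cm: estimates fall within the 0-2 cm tolerance above
tls_dbh = [20.0, 25.0, 30.0, 35.0]
est_dbh = [20.5, 24.6, 30.8, 34.2]
r2, rmse = fit_metrics(tls_dbh, est_dbh)
```

An RMSE below 2 cm for DBH (or 0.5 m for TH, 1 m for CD) would fall inside the acceptable ranges cited in the response.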

 

Comment 23: Section 3.2 and the discussion on Figure 9 mention that Plot_2's Lidar model presented incomplete canopy information due to tree height and foliage density. When presenting the results of the parameter extraction for Plot_2 in section 3.3, it is crucial to reiterate this limitation of the Lidar reference and how it may affect the assessment of the accuracy of the other methods for canopy (CD) and potentially for height (TH). The exclusion of central trees for the comparison of TH and CD already partially addresses this, but an explicit mention at the beginning of subsection 3.3 about the limitations of the reference data for Plot_2 would be beneficial.

 

Response 23: Thank you for your suggestion. Based on your advice, we have explicitly mentioned the limitations of the reference data for Plot_2 at the beginning of the "3.3. Extraction of Tree Parameters from Stand Plot Point Cloud" section (lines 472-475 of the revised manuscript).

 

Comment 24: Section 3.3 mentions that models generated from higher resolution images (UAV) resulted in greater accuracy in parameter extraction. This observation is also made in the summary. In Chapter 3, it would be useful to highlight these comparisons (e.g. by directly comparing the R² and RMSE values between Plot_1_Phone and Plot_1_UAV for each method) to reinforce this conclusion.

 

Response 24: Thank you for your suggestion. Following your advice, we directly compared the R² and RMSE values for each method between Plot_1_Phone and Plot_1_UAV in Section 3 (lines 503-507 and 514-517 of the revised manuscript). Additionally, we further analyzed and discussed this conclusion in Section 4, Discussion (lines 665-677 of the revised manuscript).

 

Comment 25: Expand on the mechanism of NeRF that allows it to reconstruct dense canopies with less noise compared to photogrammetry, perhaps linking this to its nature as an implicit volumetric representation.

 

Response 25: Thank you for your suggestion. We believe that NeRF's ability to reconstruct dense canopies with less noise may be attributed to its differentiable implicit volumetric representation, which allows camera poses to be refined by backpropagating loss gradients. Even with imperfect input data, this significantly reduces pose-estimation errors, enhancing scene clarity and detail and ultimately yielding high-quality reconstruction. For more details, please refer to the Discussion section (lines 620-624 of the revised manuscript).
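The implicit volumetric mechanism referred to here can be illustrated with the standard NeRF compositing step for a single ray. This is a generic sketch with toy densities and colors, not the authors' implementation: each sample's opacity comes from its density, and transmittance accumulates occlusion along the ray, which is what lets NeRF represent semi-transparent, layered structures such as foliage.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Classic NeRF alpha compositing along one camera ray.

    densities: (S,) per-sample volume density sigma.
    colors:    (S, 3) per-sample RGB.
    deltas:    (S,) distances between adjacent samples.
    Returns the rendered RGB for the ray.
    """
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Toy ray with 4 samples: a dense green "leaf" at sample 1 dominates
sigma = np.array([0.0, 5.0, 1.0, 1.0])
rgb = np.array([[0.0, 0.0, 0.0],
                [0.1, 0.6, 0.1],
                [0.5, 0.5, 0.5],
                [1.0, 1.0, 1.0]])
delta = np.full(4, 0.5)
pixel = composite_ray(sigma, rgb, delta)
```

Because every step here is differentiable with respect to the densities, colors, and (in practice) the camera poses that generate the rays, a photometric loss on the rendered pixels can backpropagate to refine pose estimates, which is the mechanism the response points to.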

 

Comment 26: Chapter 4 makes some references to other studies. It would be beneficial to expand on these connections, discussing how the findings of this study align with or diverge from the results of previous research mentioned in the Introduction or in these specific references. For example, how do the observed processing times compare to those reported in previous individual tree reconstruction studies?

 

Response 26: Thank you for your suggestion. In Chapter 4, we referenced relevant studies to discuss how to improve the quality and results of the upstream SfM for 3D reconstruction, such as through more accurate feature matching. Additionally, in lines 584-594 of the revised manuscript, we compared the time taken by the current method for processing forest stand scenes with the time required for reconstructing a single tree.

 

Comment 27: For which specific applications would each method be best suited? (e.g. NeRF for high-quality visualization of complex areas, photogrammetry for accurate DBH estimation in less dense areas despite the processing time).

 

Response 27: Thank you for your suggestion. Based on your feedback, we have clarified that NeRF is more suitable for reconstructing trees in complex areas, while photogrammetry is better suited for regions with lower density and simpler structures. We also emphasized that NeRF requires complete perspectives to ensure high-quality reconstruction, whereas photogrammetry can provide more accurate DBH estimations (see lines 17-27 and 633-637 in the revised manuscript).

 

4. Response to Comments on the Quality of English Language

Point 1: Review punctuation, especially the use of commas. Watch out for some long paragraphs, check the punctuation as this tends to confuse the reader. Standardize: when introducing acronyms for the first time, their meaning should be presented. Check the rules for citations. See the citation for the applications as an example (e.g. line 284).

Response 1: Thank you very much for the questions and comments you raised in the PDF file. We have reviewed the manuscript accordingly and corrected the punctuation and sentence structure; for example, revisions were made in lines 74, 102-103, 121-125, and 150-153, among others.

5. Additional clarifications

We fully appreciate your valuable comments and suggestions.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

I would like to thank you for submitting the revised version of the manuscript and for implementing the changes suggested during the review process. I have carefully analyzed your responses and also compared the two versions of the manuscript.

I note that most of the recommendations have been implemented, which has contributed to the clarity and scientific rigor of the manuscript. The inclusion of additional details about the study areas, the methodology employed for photogrammetry, NeRFStudio and 3DGS, the data acquisition and pre-processing, and the justification for the parameter choices has considerably enriched the text.

The way you addressed the suggestions related to the presentation and discussion of the results also demonstrates an engagement with the feedback provided.

I also appreciate the responses that accompanied the revised version of the manuscript.

Respectfully,
