Mapping Trails and Tracks in the Boreal Forest Using LiDAR and Convolutional Neural Networks
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The present study focuses on the detection and mapping of trails and tracks, including those caused by wildlife or off-highway vehicles (OHVs). The study uses a convolutional neural network-based approach to process DTMs. Data were acquired through (1) a UAV-mounted LiDAR survey, (2) an ALS survey, and (3) UAV-based sub-centimeter resolution optical imagery. The Digital Terrain Model (DTM) was produced at a resolution of 10 centimeters for the UAV data, whereas the resolution was 50 centimeters for the ALS data. The third survey was conducted a year after the others to provide visual guidance for the creation of training samples. The results section includes a comparison of the trails and tracks predicted from the two datasets, as well as an analysis of trail and track density across different land-cover classes and canopy densities.
In general, the manuscript is well-structured. The objectives, materials, and evaluation metrics are described well. The methods section needs some improvements for a better understanding of the article. Line-by-line comments follow.
Line 165
Please check the Section numbering. It seems 2.2, 2.3, 2.4 are missing
2.2. Data Acquisition and Processing
Lines 175-176 and 197-198
DTM generation is done using the las2dem tool for the aerial LiDAR data, whereas a kNN-based inverse-distance-weighting algorithm was used for the UAV data. Can you explain why different methods were used? Please explain whether there is a process- or data-related limitation.
2.3.1 Training Data Preparation
Lines 232-236
The reviewer understands that a labeling process was conducted through visual interpretation of the aerial LiDAR data to create training samples. An initial U-Net model was then trained on these samples. Finally, the predictions of the initial model were used as training samples for the U-Net models of the 10-cm and 50-cm DTMs, respectively. The reviewer suggests the following:
1- The labeling process should be explained in more detail. It is unclear whether the DTMs or the points are labeled.
2- Although the authors create a training dataset, they do not use these training data directly but rather the predictions on the DTMs. Please provide more information on why the initial training was necessary.
2.3.2. U-Net
Lines 262-263
Please give information about the input images. U-Net training and prediction take single-band 512×512 images as input, but the image specifications are missing. Do the gray values directly represent the elevation in the DTMs, or are enhancements such as grayscale conversion or filtering applied before the images are forwarded to the network?
2.3.3. Accuracy Assessment
Lines 305-308
Evaluation metrics seem to be computed from randomly sampled data. This can be a reasonable approach, especially when the ground truth is based on visual interpretation rather than field measurements. However, the reviewer suggests extending the Discussion section to explain the rationale for, and the impact of, an accuracy assessment based on randomly selected points.
Line 365
In Figure 5, the trail densities are hardly visible in the upper image. It would be helpful to provide an enlarged view of part of the map.
Author Response
(Comments 1) In general, the manuscript is well-structured. The objectives, materials, and evaluation metrics are described well. The methods section needs some improvements for a better understanding of the article.
(Response 1) Thank you for your thoughtful comments on our work!
(Comments 2) Line 165
Please check the Section numbering. It seems 2.2, 2.3, 2.4 are missing
(Response 2) Good catch. The section numbers have been fixed.
(Comments 3) Lines 175-176 and 197-198
DTM generation is done using the las2dem tool for the aerial LiDAR data, whereas a kNN-based inverse-distance-weighting algorithm was used for the UAV data. Can you explain why different methods were used? Please explain whether there is a process- or data-related limitation.
(Response 3) The two DTM-generation methods reflect differences in point density and file structure between the piloted-aircraft and UAV LiDAR datasets. For the piloted-aircraft data (30 pts/m²), we used the las2dem tool in LAStools, which performed robustly with vendor-classified ground returns at this resolution. In contrast, the UAV dataset had a much higher point density (185 pts/m²) and was processed manually from raw point clouds. We applied a k-nearest neighbor inverse distance weighting (knnIDW) algorithm via the lidR package in R to create smoother DTMs at 10-cm resolution and to exert greater control over interpolation behavior in complex microtopography. While both methods produced high-quality terrain models, the choice of interpolation strategy was guided by the structure and characteristics of the raw input data for each platform. We have added the following passage to Section 2.2 to explain this:
“The use of different DTM-generation methods for the drone and piloted-aircraft data reflects differences in point density and preprocessing requirements between the two data sets. The vendor-supplied piloted-aircraft data were processed with las2dem due to their standardized classification and moderate density (30 pts/m²), while the higher-density UAV data (185 pts/m²) were interpolated using a k-nearest neighbor inverse distance weighting algorithm (knnIDW) in lidR to better preserve fine-scale topographic detail in complex terrain.”
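For illustration, the sketch below shows the kNN inverse-distance-weighting idea in Python. It is not the code used in the study (which relied on the knnidw algorithm in the lidR R package); it only demonstrates how classified ground returns can be interpolated to a regular DTM grid.

```python
# Illustrative kNN inverse-distance-weighting (IDW) interpolation of classified
# ground returns to a DTM grid. The study itself used knnidw() in the lidR R
# package; this Python sketch only demonstrates the underlying idea.
import numpy as np
from scipy.spatial import cKDTree

def knn_idw_dtm(ground_xyz, res=0.10, k=10, power=2.0):
    """Interpolate ground points (N x 3 array of x, y, z) to a regular grid."""
    xy, z = ground_xyz[:, :2], ground_xyz[:, 2]
    tree = cKDTree(xy)

    # Build the output grid over the point-cloud extent.
    xmin, ymin = xy.min(axis=0)
    xmax, ymax = xy.max(axis=0)
    xs = np.arange(xmin, xmax, res)
    ys = np.arange(ymin, ymax, res)
    gx, gy = np.meshgrid(xs, ys)
    cells = np.column_stack([gx.ravel(), gy.ravel()])

    # For each cell, take the k nearest ground returns and weight by 1/d^power.
    dist, idx = tree.query(cells, k=k)
    dist = np.maximum(dist, 1e-6)            # avoid division by zero
    w = 1.0 / dist ** power
    dtm = (w * z[idx]).sum(axis=1) / w.sum(axis=1)
    return dtm.reshape(gy.shape), (xs, ys)
```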
(Comments 4) Lines 232-236
The reviewer understands that a labeling process was conducted through visual interpretation of the aerial LiDAR data to create training samples. An initial U-Net model was then trained on these samples. Finally, the predictions of the initial model were used as training samples for the U-Net models of the 10-cm and 50-cm DTMs, respectively. The reviewer suggests the following:
1- The labeling process should be explained in more detail. It is unclear whether the DTMs or the points are labeled.
2- Although the authors create a training dataset, they do not use these training data directly but rather the predictions on the DTMs. Please provide more information on why the initial training was necessary.
(Response 4) Below, we clarify both the labeling process and our rationale for the two-stage training strategy:
- Labeling process:
Our initial training labels were manually created by visually interpreting DTM raster tiles derived from the piloted-aircraft LiDAR data (50-cm resolution). We did not label raw point clouds. Instead, experienced interpreters annotated linear depressions directly on rasterized terrain surfaces using GIS software, identifying likely trail or track features. These raster labels were binary masks corresponding to presence or absence of trails/tracks, which were aligned pixel-wise with their respective DTM input patches.
- Rationale for initial U-Net model and refinement:
Our final models for the 10-cm and 50-cm DTMs were trained using labels generated by an initial U-Net model. This two-stage approach was adopted to address the incompleteness and sparsity of the manually labeled training data. Because visual labeling of fine-scale features is time-consuming and subjective, the initial model allowed us to bootstrap a larger, more representative training set with improved label consistency. Manual review and post-processing were applied to filter obvious false positives from the initial predictions before using them as labels for final training. This strategy enabled us to scale the training dataset while maintaining quality, and has been used in other segmentation workflows to reduce reliance on exhaustive manual annotation.
The first paragraph of Section 2.3.1 has been updated to clarify both points:
“We adopted a progressive refinement approach to training-data preparation and labeling. First, we used visual interpretation of aerial LiDAR data to manually label a preliminary training dataset. Labels were created by visually interpreting rasterized 50-cm DTMs derived from the piloted-aircraft LiDAR data. Trail and track features were delineated manually by expert interpreters using GIS software, based on their morphometric appearance in the terrain models. No point-cloud labeling was performed. An initial U-Net model was trained using the manually labeled patches. This model’s predictions were used to generate a refined set of labels for final model training on both the 10-cm and 50-cm DTMs. To ensure quality, we manually inspected the output from the initial model and corrected errors prior to using them as labels in the second-stage training. This hybrid approach allowed us to accelerate label creation while maintaining control over training data quality.”
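To make the labeling workflow concrete, the following hypothetical Python sketch shows how digitized trail polylines could be rasterized into binary masks aligned pixel-wise with a DTM tile. The file names and the geopandas/rasterio stack are illustrative; the actual digitizing was done interactively in GIS software.

```python
# Hypothetical sketch of turning manually digitized trail polylines into binary
# raster masks aligned pixel-wise with a DTM tile (file names are illustrative,
# not the authors' actual data).
import geopandas as gpd
import rasterio
from rasterio.features import rasterize

with rasterio.open("dtm_tile_50cm.tif") as src:
    dtm_meta = src.meta.copy()
    out_shape = (src.height, src.width)
    transform = src.transform

trails = gpd.read_file("digitized_trails.gpkg").to_crs(dtm_meta["crs"])

# Burn a value of 1 wherever a digitized trail crosses a DTM pixel; 0 elsewhere.
mask = rasterize(
    [(geom, 1) for geom in trails.geometry],
    out_shape=out_shape,
    transform=transform,
    fill=0,
    dtype="uint8",
)

dtm_meta.update(count=1, dtype="uint8", nodata=None)
with rasterio.open("trail_mask_50cm.tif", "w", **dtm_meta) as dst:
    dst.write(mask, 1)
```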
(Comments 5) Lines 262-263
Please give information about the input images. U-Net training and prediction take single-band 512×512 images as input, but the image specifications are missing. Do the gray values directly represent the elevation in the DTMs, or are enhancements such as grayscale conversion or filtering applied before the images are forwarded to the network?
(Response 5) The input images used for U-Net training were single-band patches of LiDAR-derived DTMs, clipped around center points using a fixed patch size (e.g., 256×256 or 512×512 px). No grayscale conversion was applied, as the input data already represent continuous elevation values in a single channel. The DTM values were not filtered or transformed beyond alignment with the label layer. During training, pixel values were normalized to [0, 1] using min-max scaling per patch. We have updated the second paragraph of Section 2.3.1 with this information:
“Each DTM patch was independently normalized to a 0 – 1 range using min-max scaling. No further transformations were applied.”
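A minimal sketch of this per-patch scaling, assuming the patches are held as NumPy arrays:

```python
# Minimal sketch of the per-patch min-max scaling described above: each
# single-band DTM patch is independently rescaled to [0, 1] before being
# passed to the U-Net.
import numpy as np

def normalize_patch(dtm_patch: np.ndarray) -> np.ndarray:
    lo, hi = np.nanmin(dtm_patch), np.nanmax(dtm_patch)
    if hi - lo < 1e-9:          # flat patch: avoid division by zero
        return np.zeros_like(dtm_patch, dtype="float32")
    return ((dtm_patch - lo) / (hi - lo)).astype("float32")
```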
(Comments 6) Lines 305-308
Evaluation metrics seem to be computed from randomly sampled data. This can be a reasonable approach, especially when the ground truth is based on visual interpretation rather than field measurements. However, the reviewer suggests extending the Discussion section to explain the rationale for, and the impact of, an accuracy assessment based on randomly selected points.
(Response 6) We agree that the use of randomly sampled points – rather than full pixel-wise comparisons – is suitable when ground truth is derived from visual interpretation rather than field measurements. We have now extended the Discussion (Section 4.5) to explain the rationale for this approach, as well as its implications for model evaluation:
“Because our ground truth dataset was based on visual interpretation of ultra-high-resolution orthomosaics – rather than exhaustive field surveys – traditional full-scene pixel-wise accuracy assessments could introduce spatial uncertainty and inflate error estimates due to edge effects, subjective interpretation boundaries, or mixed pixels near feature edges. To mitigate this, we adopted a stratified random point sampling strategy, which enabled more consistent evaluation of trail/no-trail classification while minimizing spatial ambiguity. This approach reduces the likelihood of over-penalizing near-boundary errors and has been used in other remote sensing applications where reference data quality is constrained. However, we acknowledge that this method may underestimate small-scale misalignments or partial detections, and future work could explore pixel-wise comparisons or polygonal overlap metrics in conjunction with this approach.”
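The point-based assessment can be illustrated with the following Python sketch. It uses simple random sampling and synthetic rasters so that it runs as written; the study itself used a stratified sample of interpreted reference points.

```python
# Illustrative sketch of a point-based accuracy assessment: sample random
# locations, read the predicted and reference classes at each point, and
# compute precision, recall, and F1. Variable contents are synthetic.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(42)

# "reference" and "predicted" stand in for the interpreted ground-truth raster
# and the thresholded model output; random arrays are used so the sketch runs.
reference = rng.integers(0, 2, size=(2048, 2048)).astype(np.uint8)
predicted = rng.integers(0, 2, size=(2048, 2048)).astype(np.uint8)

# Draw random sample locations (simple random sampling shown for brevity;
# the study used a stratified design).
n_points = 1000
rows = rng.integers(0, reference.shape[0], n_points)
cols = rng.integers(0, reference.shape[1], n_points)

y_true = reference[rows, cols]
y_pred = predicted[rows, cols]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```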
(Comments 7) Line 365
In Figure 5, the trail densities are hardly visible in the upper image. It would be helpful to provide an enlarged view of part of the map.
(Response 7) We have provided enlarged subsets below the main panel in Figure 5, where individual trails are shown at full resolution. This combined approach (density map in the main body; individual trails in the insets) offers both a landscape-scale overview of trail and track density and a detailed view of individual features.
Reviewer 2 Report
Comments and Suggestions for Authors
Core Research Questions
The aim of this study is to achieve automated mapping of wildlife and off-road vehicle trails/tracks in boreal forests through high-density LiDAR data and convolutional neural networks (CNNs), and to answer the following questions:
Feasibility of the Method: Can LiDAR and CNN effectively detect linear features with diverse morphologies in natural environments?
Comparison of Platforms: Is there a significant difference in data accuracy between drones (10cm DTM) and piloted aircraft (50cm DTM)?
Ecological Impact: How is the distribution of trails/tracks affected by land cover types (such as peatlands) and industrial disturbances (seismic lines)?
Originality and Contribution to the Field
Originality: This is the first systematic evaluation of the application of LiDAR and CNN in mapping boreal forest trails, filling the gap in existing literature regarding automated methods for detecting natural trails in complex terrains and vegetation cover.
Relevance: The research results have direct application value for wildlife habitat protection, monitoring of human activity disturbances, and ecosystem restoration (such as seismic line restoration).
Added Value: Compared with traditional manual interpretation or optical imaging methods (such as Kaiser et al., 2004), this study proposes a scalable automated process and reveals the "funnel effect" of seismic lines on the trail network (with a density 4.4 times that of natural areas).
Suggestions for Methodological Improvement
Details of the U-Net Architecture:
It is necessary to supplement the network structure diagram and the specific implementation of the attention mechanism (such as the difference between gated attention and standard self-attention) to enhance reproducibility.
It is recommended to compare the performance of other segmentation models (such as DeepLabv3+ or Swin-UNet) in sparse feature detection to verify the optimization space of the current architecture.
Experimental Basis for Threshold Selection:
It is necessary to provide a quantitative basis for the 40% probability threshold (such as maximizing the F1 score or application scenario requirements), referring to the threshold optimization strategies of similar studies (such as Bhatnagar et al., 2022).
Limitations of Data Augmentation:
Currently, only rotation, flipping, and brightness adjustment are used. It is recommended to add affine transformation or simulate different terrain noises to improve the generalization ability of the model in complex environments (such as dense forest canopies).
Further Comparison and Validation
Impact of Seasonal Changes: The LiDAR data was collected in summer, and the snow cover in winter may obscure the trail features. It is necessary to discuss the applicability of the model in cross-seasonal scenarios.
Ground Validation Data: Currently, it relies on the interpretation of drone orthophotos as a reference. It is recommended to supplement field sampling (such as RTK-GNSS track records) to reduce potential errors in visual interpretation.
Refinement of Feature Classification: The model does not distinguish between wildlife and off-road vehicle tracks. It can be attempted to fuse multi-temporal LiDAR or optical data (such as texture features) to assist in classification.
Consistency of Conclusions
The conclusions are consistent with the experimental evidence. The effectiveness of LiDAR and CNN in open environments such as peatlands has been successfully verified (F1 = 77%), and the aggregation effect of seismic lines on the trail network has been revealed.
It is necessary to further clarify the limitations of the method in areas with dense forest canopies (such as track fragmentation) and propose improvement directions in the discussion (such as combining the minimum cost path algorithm).
Appropriateness of References
The literature covers LiDAR applications (such as Du et al., 2024), CNN segmentation (Ronneberger et al., 2015), and ecological impacts (Dabros et al., 2022), but lacks the citation of recent multi-scale fusion frameworks (such as the "multi-scale spatial enhancement network" in Li et al., 2023). It is recommended to supplement it to enhance the depth of method comparison.
Suggestions for Chart Improvement
Figure 5 (Trail Density Map): It is recommended to add a scale bar and legend to clarify the density range (km/km²) corresponding to the color gradient.
Table 2 (Statistics of Land Cover Types): It is necessary to define the quantitative criteria for "high/low density tree bogs" (such as the canopy coverage threshold) in the footnotes.
Figure 8 (Comparison of LiDAR and CNN Outputs): Local enlarged views can be added to highlight the model's ability to capture weak terrain features.
Other Suggestions
Terminology Unification: In the article, "piloted aircraft" and "aerial platform" are used alternately. It is recommended to unify them as "piloted aircraft platform".
Code and Data Publicity: The GitHub code repository needs to supplement the training protocol (such as the learning rate scheduling strategy) and the version of the dependent libraries to ensure reproducibility.
Final Conclusion
This study provides an innovative method for the automated mapping of boreal forest trails, and the conclusion of the platform comparison (no significant difference in accuracy between drones and piloted aircraft) has important application value. By supplementing method details, expanding literature comparison, and strengthening verification data, the academic influence of the paper can be further enhanced, providing a reliable tool for ecological management and industrial disturbance assessment.
Comments on the Quality of English Language
- Some sections (e.g., Introduction, Discussion) contain overly elaborate sentences that impede readability. Streamlining the language and eschewing redundancy (e.g., repeated references to peatland substrate properties) would enhance clarity.
- Terminology inconsistencies exist. For example, "piloted aircraft" is used interchangeably with "aerial platform"; standardizing the terminology would reduce ambiguity.
Author Response
(Comments 1) It is necessary to supplement the network structure diagram and the specific implementation of the attention mechanism (such as the difference between gated attention and standard self-attention) to enhance reproducibility.
(Response 1) The novelty of our work lies primarily in the application context, not in the underlying network architecture. We feel that a new figure illustrating the attention mechanism would bring undue focus to a very minor element of our contribution, particularly since this is already a very figure-heavy manuscript (ten!). To address the reviewer’s comment, we have added the following passage to Section 2.3.2:
“We implemented gated attention, following the Attention U-Net architecture [47], in which a gating signal from coarser feature maps is used to selectively highlight relevant activations in skip connections. This approach can improve segmentation accuracy by enhancing model sensitivity to important structures while maintaining computational efficiency. The implementation of the Attention U-Net was adapted from the open-source repository available at https://github.com/karolzak/keras-unet.”
We trust that this is acceptable.
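For readers unfamiliar with gated attention, the following minimal Keras sketch shows the general form of an attention gate. It is an illustrative re-implementation rather than the exact code in the karolzak/keras-unet repository, and it assumes the gating signal has already been upsampled to the spatial size of the skip connection.

```python
# Minimal Keras sketch of a gated attention block of the kind used in the
# Attention U-Net [47]; illustrative only, not the authors' exact implementation.
from tensorflow.keras import layers

def attention_gate(skip, gating, inter_channels):
    """Return the skip-connection feature map re-weighted by attention."""
    theta_x = layers.Conv2D(inter_channels, 1)(skip)      # project skip features
    phi_g = layers.Conv2D(inter_channels, 1)(gating)      # project gating signal
    attn = layers.Activation("relu")(layers.Add()([theta_x, phi_g]))
    attn = layers.Conv2D(1, 1)(attn)                      # collapse to one channel
    attn = layers.Activation("sigmoid")(attn)             # attention coefficients in [0, 1]
    return layers.Multiply()([skip, attn])                # suppress irrelevant activations
```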
(Comments 2) It is recommended to compare the performance of other segmentation models (such as DeepLabv3+ or Swin-UNet) in sparse feature detection to verify the optimization space of the current architecture.
(Response 2) While we appreciate the reviewer’s suggestion to compare our architecture with other segmentation models, this falls outside the scope of our current study. Our primary objective was not to conduct a comprehensive model comparison, but rather to demonstrate the feasibility of detecting trails and tracks using LiDAR and CNN-based methods. That said, we fully agree that exploring the performance of transformer-based or encoder-decoder hybrid architectures is a worthwhile direction. In fact, we have recently trained a new trail segmentation model with transformer-based SegFormer architecture, and plan to publish those results as part of a follow-up study. We have now acknowledged this explicitly in our revised Discussion (Section 4.6):
“Subsequent studies could also explore alternative segmentation architectures, such as transformer-based models like SegFormer [71], which have demonstrated strong performance on a variety of semantic segmentation tasks. These architectures may offer advantages in capturing complex spatial patterns or improving generalization, particularly in large-scale or diverse datasets.”
(Comments 3) It is necessary to provide a quantitative basis for the 40% probability threshold (such as maximizing the F1 score or application scenario requirements), referring to the threshold optimization strategies of similar studies (such as Bhatnagar et al., 2022).
(Response 3) We have clarified the rationale for selecting the 40% probability threshold in the manuscript. While we explored a range of thresholds during accuracy assessment, the 40% value was selected for mapping trails across larger areas based on visual inspection of model outputs, with a deliberate bias toward precision over recall. Although the threshold was not formally optimized (e.g., using F1 score), it was chosen to ensure reliability under current data constraints. A brief justification has been added to Section 2.3.2:
“We selected this threshold based on visual inspection of model outputs to reduce false positives while maintaining sufficient recall for ecological mapping purposes.”
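For reference, a threshold search of the kind the reviewer describes (selecting the probability cut-off that maximizes F1) could be sketched as follows; this was not performed in the study, where the 40% threshold was chosen by visual inspection.

```python
# Sketch of a threshold optimization: sweep candidate probability thresholds
# and keep the one that maximizes F1. Illustrative only.
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_prob, candidates=np.arange(0.1, 0.91, 0.05)):
    """y_true: binary labels; y_prob: predicted trail probabilities in [0, 1]."""
    scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```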
(Comments 4) Currently, only rotation, flipping, and brightness adjustment are used. It is recommended to add affine transformation or simulate different terrain noises to improve the generalization ability of the model in complex environments (such as dense forest canopies).
(Response 4) We appreciate the reviewer’s suggestion to incorporate more diverse data augmentation strategies, such as affine transformations or simulated terrain noise, to improve model generalization under complex conditions (e.g., dense canopy cover). In this study, we intentionally kept our augmentation strategy simple. More aggressive augmentations require careful design and preprocessing to avoid introducing unrealistic terrain features that could negatively impact model performance. We have added a statement to Section 4.6 to bring attention to the potential of advanced augmentation techniques in future studies:
“Additional gains may also be achieved by including more advanced data augmentation techniques, such as affine transformations or simulated topographic noise, to improve model robustness in complex environments with dense vegetation cover.”
(Comments 5) Impact of Seasonal Changes: The LiDAR data was collected in summer, and the snow cover in winter may obscure the trail features. It is necessary to discuss the applicability of the model in cross-seasonal scenarios.
(Response 5) Our current model was developed using LiDAR and reference imagery collected under snow-free conditions, which are ideal for detecting persistent linear depressions in the terrain. Winter brings not only snow cover, which can obscure or flatten these features, but also the formation of new, temporary trails. As such, it would not be appropriate to apply a summer model to winter conditions. We have added a note in the Discussion (Section 4.5) to acknowledge this limitation and to emphasize the need for future work focused on developing winter-specific models:
“Our models were trained on summer, snow-free LiDAR data and have not been tested under winter conditions. Snow cover may obscure existing trails and create new, temporary ones on the snow surface. Future research should investigate winter-specific models to address these differences.”
(Comments 6) Ground Validation Data: Currently, it relies on the interpretation of drone orthophotos as a reference. It is recommended to supplement field sampling (such as RTK-GNSS track records) to reduce potential errors in visual interpretation.
(Response 6) We agree that field-based reference data (e.g., RTK-GNSS track logs) would further improve the robustness of the accuracy assessment. However, as noted in Section 4.5, generating a complete ground-based census of trails and tracks across our 50 × 50 m test plots would have been logistically prohibitive, given the high density and subtlety of features in our study area. We found that visual interpretation of ultra-high-resolution (0.5 cm) orthomosaics by trained experts provided a practical and precise alternative. Nevertheless, we fully agree that integrating RTK-GNSS field data could reduce interpretation errors and offer valuable complementary validation in future work. We have expanded the discussion of this point in Section 4.5.
“While our reference dataset was developed through expert interpretation of high-resolution drone orthomosaics, we acknowledge that this approach may introduce some subjectivity and potential errors of omission or commission. Although field-based GNSS track surveys would offer a more direct form of ground validation, the sheer density and complexity of features in our test plots made comprehensive field mapping impractical. Future studies could benefit from combining orthophoto interpretation with targeted RTK-GNSS ground sampling to strengthen reference datasets and reduce interpreter uncertainty.”
(Comments 7) Refinement of Feature Classification: The model does not distinguish between wildlife and off-road vehicle tracks. It can be attempted to fuse multi-temporal LiDAR or optical data (such as texture features) to assist in classification.
(Response 7) We agree that distinguishing between wildlife and OHV tracks presents a meaningful challenge with important ecological and management implications. As noted in Section 4.6, this study focused on undifferentiated trails and tracks in line with our scope of work. However, we have already developed and tested a complementary model specifically designed to map OHV tracks. That work is being prepared for a separate publication.
(Comments 8) It is necessary to further clarify the limitations of the method in areas with dense forest canopies (such as track fragmentation) and propose improvement directions in the discussion (such as combining the minimum cost path algorithm).
(Response 8) We agree that trail and track detection under dense forest canopies remains a significant challenge due to LiDAR occlusion and the resulting fragmentation of detectable features. While we noted this limitation in our discussion of canopy effects (Section 4.2), we have now expanded the discussion in Section 4.6 to more explicitly highlight the potential for post-processing approaches—such as minimum-cost path algorithms—to reconnect fragmented segments and improve spatial coherence in trail and track maps:
“Dense canopy cover remains a key limitation in our current workflow, as reduced LiDAR ground returns in these areas can lead to fragmentation and omission of trail and track features. Future research could explore the application of post-processing techniques—such as minimum-cost path algorithms or graph-based network reconstruction methods—to infer likely connections between detected fragments and restore spatial continuity in areas with limited terrain visibility [67–68].”
(Comments 9) The literature covers LiDAR applications (such as Du et al., 2024), CNN segmentation (Ronneberger et al., 2015), and ecological impacts (Dabros et al., 2022), but lacks the citation of recent multi-scale fusion frameworks (such as the "multi-scale spatial enhancement network" in Li et al., 2023). It is recommended to supplement it to enhance the depth of method comparison.
(Response 9) We thank the reviewer for highlighting this gap in the literature. We have updated the Discussion to acknowledge recent advances in multi-scale spatial enhancement networks for image segmentation. While our present study focused on a lightweight U-Net architecture, we agree that multi-scale feature fusion frameworks represent a promising direction for future trail and track mapping applications. A brief comparison and citation have been added in Section 4.6 (Future Research Needs):
“Subsequent studies could also explore alternative segmentation architectures, such as transformer-based models like SegFormer [71], which have demonstrated strong performance on a variety of semantic segmentation tasks. These architectures may offer advantages in capturing complex spatial patterns or improving generalization, particularly in large-scale or diverse datasets.”
(Comments 10) Figure 5 (Trail Density Map): It is recommended to add a scale bar and legend to clarify the density range (km/km²) corresponding to the color gradient.
(Response 10) Figure 5 already includes a scale bar and a legend.
(Comments 11) Table 2 (Statistics of Land Cover Types): It is necessary to define the quantitative criteria for "high/low density tree bogs" (such as the canopy coverage threshold) in the footnotes.
(Response 11) Definitions of treed fens follow the Alberta Wetland Classification System (2015), which defines fens with >25% tree cover as "wooded." These were further categorized into high-density (>50% canopy cover) and low-density (25–50% canopy cover) classes.
This clarification has been added to the caption of Table 2: “Note that high-density treed fens have >50% canopy cover, while low-density treed fens have 25–50% canopy cover.”
(Comments 12) Figure 8 (Comparison of LiDAR and CNN Outputs): Local enlarged views can be added to highlight the model's ability to capture weak terrain features.
(Response 12) While we appreciate the reviewer’s suggestion here (we love maps!), we have elected not to make this enhancement. Adding insets would take up additional space in a manuscript that is already figure-heavy (10 figures), and the current figure illustrates the point adequately.
(Comments 13) Terminology Unification: In the article, "piloted aircraft" and "aerial platform" are used alternately. It is recommended to unify them as "piloted aircraft platform".
(Response 13) An aerial platform can be either one with a pilot on-board (“piloted aircraft”) or a controller on the ground (“drone”). “Aerial platform” refers to both. This is why we have elected to retain all three terms.
(Comments 14) Code and Data Publicity: The GitHub code repository needs to supplement the training protocol (such as the learning rate scheduling strategy) and the version of the dependent libraries to ensure reproducibility.
(Response 14) Thank you for the suggestion. We have significantly improved the GitHub repository to enhance reproducibility. This includes:
- Adding complete preprocessing, training, and inference scripts.
- Moving all parameters to structured Hydra configuration files under a dedicated configs/ folder.
- Integrating learning rate scheduling (ReduceLROnPlateau) into the training script (1_train.py), as sketched below.
- Updating the README.md with clear instructions on data formats, training workflows, and model usage.
- Providing a requirements.txt with fixed library versions.
These updates ensure that the training protocol is fully documented and reproducible.
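As an example of the scheduling now documented in the repository, a minimal Keras sketch is shown below; the hyperparameter values are illustrative and may differ from those in the Hydra configuration files.

```python
# Minimal sketch of learning-rate scheduling with the Keras ReduceLROnPlateau
# callback: the learning rate is lowered when validation loss stops improving.
# Values shown here are illustrative, not necessarily those in the repository.
from tensorflow.keras.callbacks import ReduceLROnPlateau

lr_schedule = ReduceLROnPlateau(
    monitor="val_loss",   # watch validation loss
    factor=0.5,           # halve the learning rate on plateau
    patience=5,           # after 5 epochs without improvement
    min_lr=1e-6,          # never go below this
)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[lr_schedule])
```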
(Comments 15) This study provides an innovative method for the automated mapping of boreal forest trails, and the conclusion of the platform comparison (no significant difference in accuracy between drones and piloted aircraft) has important application value. By supplementing method details, expanding literature comparison, and strengthening verification data, the academic influence of the paper can be further enhanced, providing a reliable tool for ecological management and industrial disturbance assessment.
(Response 15) We thank the reviewer for their thoughtful comments on our work!
Reviewer 3 Report
Comments and Suggestions for Authors
This study validates the effectiveness of using LiDAR and CNN to map trails and tracks in boreal forests and discusses the impact of different land cover types on the accuracy of trail and track mapping, as well as their driving factors. Additionally, the study analyzes wildlife preferences for path selection under different land cover types. The research demonstrates a certain degree of applied innovation. However, the manuscript requires improvement in writing, including language expression and methodological design. Specific comments are as follows:
Introduction
- Lines 78-80 are unrelated to the research topic of the paper.
- Lines 98-100 claim a lack of studies applying remote sensing and mention that some methods rely on manual editing. However, the previously listed literature utilizes remote sensing data and deep learning models. This conclusion is weak and should be logically refined.
- The conclusion in lines 100-101 could be placed after introducing LiDAR data. It is recommended to merge lines 98-108 into a single paragraph and rewrite them.
- In lines 115-116, please clarify the purpose of comparing the two datasets. Also, remove the final “and” in line 116.
- Please supplement the final paragraph of the introduction with a justification. Given that UAV data can also provide high-resolution mapping, why is it necessary to validate manned aircraft data for producing large-scale, high-quality maps? Does manned aircraft data cover a wider area than UAVs, or is it more cost-effective?
Methods
- L234-236: Is the accuracy of the new model ensured when using the initial U-Net output to train it? Please provide supporting evidence.
- In lines 242-243, it is stated that a 50 cm DTM corresponds to 1 pixel, while a 10 cm DTM corresponds to 3 pixels. Is this a typo?
- Please provide relevant details on how the trails and tracks density map in Figure 5 was generated. Also, clarify why a 5-meter resolution density map was created.
- The manuscript preparation appears somewhat careless. Revision marks are still visible in lines 323-334.
Results
- In lines 361-362, the term “probability threshold of 40” should be written as “40%” to maintain consistency with the description in the Methods section.
- The manuscript inconsistently refers to the trails and tracks detection model using different terms: DL model (such as the caption of Table 1), CNN, and U-Net. Please use a consistent term throughout the paper.
- In Figure 6B on page 11, the subtitle is not fully displayed (the word “data” is cut off).
Discussion
- The comparison of model results in lines 455-461 is meaningless if it only considers the CNN model without taking into account the differences in study areas and datasets.
- The authors argue that the manned aircraft results slightly outperform UAV results due to the higher-powered Riegl VQ-1560ii system. Please explain the functions of this system and why it leads to better results.
Conclusion
- It is recommended to merge the second and third paragraphs to express the ecological implications based on the study results. The final paragraph could add the study’s limitations and future research directions.
Writing Issues
- Abbreviations should be fully spelled out upon first appearance.
- In line 85, a reference should also be provided for Kaiser et al., even though it was cited earlier.
- Some expressions are not typical for academic papers, and many are overly colloquial. For example, Line 319: “This output can be seen in figures throughout the manuscript.”
The manuscript contains numerous awkward and overly colloquial expressions. The excessive use of short, fragmented paragraphs (particularly in the Results and Discussion sections) requires consolidation. I recommend the authors engage a professional editing service for thorough language polishing.
Author Response
(Comments 1) Lines 78-80 are unrelated to the research topic of the paper.
(Response 1) We agree that the connection could be more explicit. We have revised lines 78 - 80 to better clarify how these indirect studies differ from our work and to reinforce the focus of our research. The passage now reads as follows:
“While some studies have assessed the indirect effects of concentrated animal activity on vegetation using spectral indices (e.g., [30 - 32]), these do not attempt to detect or map trails and tracks directly. The body of peer-reviewed research focused on the automated detection of trails and tracks in natural areas using remote sensing is quite limited.”
(Comments 2) Lines 98-100 claim a lack of studies applying remote sensing and mention that some methods rely on manual editing. However, the previously listed literature utilizes remote sensing data and deep learning models. This conclusion is weak and should be logically refined.
(Response 2) We have revised the passage to avoid overstating the gap and to better reflect the examples previously cited. The passage now reads as follows:
“Recent studies have begun exploring automated trail and track detection with remote sensing, though applications remain limited in scope and geography. The potential of CNNs and other forms of artificial intelligence to map trails and tracks over large terrestrial areas using high-resolution airborne datasets is largely untapped.”
(Comments 3) The conclusion in lines 100-101 could be placed after introducing LiDAR data. It is recommended to merge lines 98-108 into a single paragraph and rewrite them.
(Response 3) We appreciate the reviewer’s comment and considered the proposed restructuring. However, we feel the current paragraph organization — which first identifies the limited literature on automated trail/track detection and then introduces LiDAR as our chosen tool — allows us to clearly distinguish the research gap before motivating our methodology. The claim in lines 100–101 is intended to summarize the state of the field based on the literature discussed immediately prior. While we agree that this section could be improved for clarity (which we have done in response to your previous comment), we believe the current structure appropriately reflects the logical progression of our argument.
(Comments 4) In lines 115-116, please clarify the purpose of comparing the two datasets. Also, remove the final “and” in line 116.
(Response 4) Thank you for the helpful comment. We have clarified the purpose of comparing the two LiDAR datasets in Objective 2 by highlighting the trade-offs between spatial resolution and operational scalability. With regard to the final “and” in line 116, we have chosen to retain it, as this follows standard grammatical convention for multi-line lists separated by semicolons and improves the clarity of the objective structure. The passage now reads as follows:
“Our goal is to create tools and processing workflows for mapping trails and tracks automatically over large terrestrial areas using remote sensing. To help reach this goal, we established three research objectives, which are the subject of this paper:
- To demonstrate the capacity of high-density LiDAR and CNNs to map trails and tracks automatically in a natural environment;
- To compare the accuracy of trail/track maps developed with LiDAR from a drone platform (185 points/m2) and a piloted aircraft platform (30 points/m2) to evaluate trade-offs between spatial resolution and operational scalability; and
- To measure the abundance and distribution of tracks and trails across different land-cover classes, and their co-location with anthropogenic disturbances across our study area in the Canadian boreal forest.”
(Comments 5) Please supplement the final paragraph of the introduction with a justification. Given that UAV data can also provide high-resolution mapping, why is it necessary to validate manned aircraft data for producing large-scale, high-quality maps? Does manned aircraft data cover a wider area than UAVs, or is it more cost-effective?
(Response 5) We have supplemented the final paragraph of the Introduction with a justification for evaluating piloted aircraft LiDAR data. While UAVs offer excellent spatial resolution, their utility is constrained by regulatory limitations, range, and operational cost when applied to large or remote areas. By validating the performance of piloted aircraft data, we aim to support the scalability of trail and track mapping workflows to broader operational contexts. The paragraph has been revised:
“To our knowledge, this is the first study on the use of LiDAR and CNNs for mapping terrestrial trails and tracks to appear in the peer-reviewed literature. Our research demonstrates that not only can trails and tracks be detected accurately with remote sensing under the correct conditions, but that high-quality maps can be obtained over large areas using data from piloted aircraft. While drone platforms deliver very high-resolution data, their operation can be limited by line-of-sight restrictions, battery life, and regulatory constraints. Piloted aircraft platforms can be more practical for mapping large or inaccessible areas. Assessing their performance helps to establish scalable solutions. While our work is a case study, our findings reveal the potential for these workflows across a variety of exciting applications.”
(Comments 6) L234-236: Is the accuracy of the new model ensured when using the initial U-Net output to train it? Please provide supporting evidence.
(Response 6) We have clarified that the output of the initial U-Net model was manually reviewed and corrected prior to being used in second-stage training. This quality control step helped ensure that any inaccuracies in the preliminary model did not propagate into the final models. We have added the following passage to the first paragraph of Section 2.3.1 (Training Data Preparation) as a result:
“To ensure quality, we manually inspected the output from the initial model and corrected errors prior to using them as labels in the second-stage training. This hybrid approach allowed us to accelerate label creation while maintaining control over training data quality.”
(Comments 7) In lines 242-243, it is stated that a 50 cm DTM corresponds to 1 pixel, while a 10 cm DTM corresponds to 3 pixels. Is this a typo?
(Response 7) The values of 1 pixel (for the 50-cm DTM) and 3 pixels (for the 10-cm DTM) correspond to comparable real-world trail widths. We have revised the sentence to clarify that pixel width reflects ground resolution, and that the buffer step was designed to standardize physical trail widths across the two datasets. The passage now reads as follows:
“Finally, we thinned the segmented features (trails and tracks) to a skeletal form, then buffered them back to standardized widths appropriate to each resolution — 1 pixel (50 cm) for the 50-cm DTM, and 3 pixels (30 cm) for the 10-cm DTM — to represent approximate real-world trail widths.”
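The thin-then-buffer step can be sketched with scikit-image as follows (illustrative only; parameter names assume scikit-image >= 0.19, and the study's own post-processing may differ in detail).

```python
# Illustrative sketch of the thin-then-buffer post-processing step: segmented
# trail pixels are reduced to a one-pixel skeleton, then dilated back to a
# standardized width.
import numpy as np
from skimage.morphology import skeletonize, binary_dilation, disk

def standardize_trail_width(segmentation: np.ndarray, radius_px: int = 0) -> np.ndarray:
    """segmentation: binary array (1 = trail/track). Returns re-buffered mask."""
    skeleton = skeletonize(segmentation.astype(bool))
    if radius_px > 0:
        return binary_dilation(skeleton, footprint=disk(radius_px))
    return skeleton

# e.g. standardize_trail_width(pred_50cm, radius_px=0)  -> 1-px (50 cm) lines
#      standardize_trail_width(pred_10cm, radius_px=1)  -> 3-px (30 cm) lines
```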
(Comments 8) Please provide relevant details on how the trails and tracks density map in Figure 5 was generated. Also, clarify why a 5-meter resolution density map was created.
(Response 8) We have added methodological details to the caption of Figure 5 explaining how the map was produced. We prefer to keep this description in the figure caption to avoid cluttering the Results section. The figure caption now contains the following description:
“To generate this map, we rasterized the vectorized trail/track outputs onto a 5-meter grid using a line density calculation. Each grid cell value represents the total length (in meters) of trails and tracks within a 5-meter radius of the cell center. The 5-meter resolution was selected to balance detail and interpretability across the 59-km² study area, allowing spatial patterns in trail and track concentration to be visualized clearly without excessive pixel-level noise.”
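A hypothetical Python sketch of this line-density calculation using geopandas and shapely (the file name and the brute-force grid loop are illustrative only):

```python
# Hypothetical sketch of the line-density calculation: for each 5-m grid cell,
# sum the length of vectorized trail/track features within a 5-m radius of the
# cell centre. A production version would use a spatial index for speed.
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

trails = gpd.read_file("predicted_trails.gpkg")   # vectorized model output
xmin, ymin, xmax, ymax = trails.total_bounds
res, radius = 5.0, 5.0                            # metres

xs = np.arange(xmin + res / 2, xmax, res)
ys = np.arange(ymin + res / 2, ymax, res)
density = np.zeros((len(ys), len(xs)))

for i, y in enumerate(ys):
    for j, x in enumerate(xs):
        window = Point(x, y).buffer(radius)
        # total trail length (m) intersecting the search window around this cell
        density[i, j] = trails.intersection(window).length.sum()
```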
(Comments 9) The manuscript preparation appears somewhat careless. Revision marks are still visible in lines 323-334.
(Response 9) Thank you for pointing this out. We apologize for the oversight and have removed all remaining revision marks from lines 323–334. We have also reviewed the full manuscript to ensure that no other tracked changes or comments remain.
(Comments 10) In lines 361-362, the term “probability threshold of 40” should be written as “40%” to maintain consistency with the description in the Methods section.
(Response 10) We have revised the sentence to refer to the threshold as “40%” to ensure consistency with the description in the Methods section.
(Comments 11) The manuscript inconsistently refers to the trails and tracks detection model using different terms: DL model (such as the caption of table 1), CNN, and U-Net. Please use a consistent term throughout the paper.
(Response 11) We have standardized the terminology throughout the manuscript to refer to the model as a “U-Net model,” which accurately reflects the architecture used. This change improves clarity and consistency for the reader.
(Comments 12) In Figure 6B on page 11, the subtitle is not fully displayed (the word “data” is cut off).
(Response 12) Thank you. This has been fixed.
(Comments 13) The comparison of model results in lines 455-461 is meaningless if it only considers the CNN model without taking into account the differences in study areas and datasets.
(Response 13) We agree that comparisons across studies using different sensors, study areas, and feature types must be interpreted with caution. We have revised the text in lines 455–461 to acknowledge these differences and to clarify that these performance values are presented only as general context for situating our work within the broader literature. The paragraph in question now reads as follows:
“Our models achieved F1 scores the same as those reported by Bhatnagar et al. [36], who mapped mechanical wheel-rut trails in Norwegian forest-harvest blocks using CNNs, and somewhat lower than the 89.5% reported by Yamamoto et al. [35], who used drone imagery and CNNs to detect dugong feeding trails in intertidal seagrass beds. Our performance was substantially higher than that reported by Kaiser et al. [34], who used semi-automated techniques to map human trails in the transboundary region between the United States and Mexico, achieving an F1 score of 56%. It is important to note that these comparisons should be interpreted with caution, as the studies differ in environmental context, data sources, target features, and validation protocols. We present these benchmarks only to provide general context for situating our results within the broader domain of automated trail and track mapping.”
(Comments 14) The authors argue that the manned aircraft results slightly outperform UAV results due to the higher-powered Riegl VQ-1560ii system. Please explain the functions of this system and why it leads to better results.
(Response 14) We have added a technical explanation to clarify why the piloted aircraft LiDAR sensor (Riegl VQ-1560ii) may have delivered slightly better results than the drone-mounted Zenmuse L1. The Riegl system offers higher pulse energy, better ground-return sensitivity, and more robust georeferencing, which collectively improve terrain model quality — especially under canopy — despite its lower point density. The following passage has been inserted:
“We attribute the nominally better performance of the piloted aircraft model to the advanced capabilities of the Riegl VQ-1560ii sensor system. This dual-channel, full-waveform LiDAR unit offers higher pulse energy, superior range accuracy, and greater sensitivity to weak ground returns than the compact drone-mounted Zenmuse L1. These characteristics improve ground surface reconstruction under canopy and in variable terrain, even at lower point densities.”
(Comments 15) It is recommended to merge the second and third paragraphs to express the ecological implications based on the study results. The final paragraph could add the study’s limitations and future research directions.
(Response 15) While we appreciate the recommendation to restructure the Conclusion, we feel the current separation between ecological patterns related to natural land cover and those associated with anthropogenic disturbances helps clarify the distinct drivers of trail and track distribution. However, we have revised the final paragraph to more clearly articulate study limitations and opportunities for future research.
(Comments 16) Abbreviations should be fully spelled out upon first appearance.
(Response 16) We have reviewed the manuscript and checked for adherence to this rule.
(Comments 17) In line 85, a reference should also be provided for Kaiser et al., even though it was cited earlier.
(Response 17) Fixed.
(Comments 18) Some expressions are not typical for academic papers, and many are overly colloquial. For example, Line 319: “This output can be seen in figures throughout the manuscript.”
(Response 18) We have revised that passage and others throughout the manuscript.
(Comments 19) The manuscript contains numerous awkward and overly colloquial expressions. The excessive use of short, fragmented paragraphs (particularly in the Results and Discussion sections) requires consolidation. I recommend the authors engage a professional editing service for thorough language polishing.
(Response 19) We thank the reviewer for this comment. In response, we have carefully revised the manuscript to remove overly colloquial expressions and improve the overall clarity and tone of the writing. We appreciate the importance of maintaining a professional academic voice and have made numerous adjustments throughout the text to that end.
With respect to the reviewer’s concern regarding short or fragmented paragraphs—particularly in the Results and Discussion sections—we disagree. Our paragraphing was intentionally structured to isolate distinct findings and thematic ideas. This modular approach was chosen to enhance clarity and reader accessibility, especially when presenting multi-dimensional results across spatial, ecological, and methodological contexts. We believe this structure aligns with contemporary academic writing norms in our field and helps avoid overly dense blocks of text that could obscure key findings.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
There are no further problems with this paper!
Author Response
(Comments 1) There are no further problems with this paper!
(Response 1) Thank you!
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have largely addressed the concerns I raised in the first round of review. However, I remain reserved about the authors’ insistence on keeping multiple short paragraphs in some sections, as this often suggests limited ability to synthesize and summarize content.
There are some numbering errors in the level-2 headings within the Discussion section.
Author Response
(Comments 1) The authors have largely addressed the concerns I raised in the first round of review. However, I remain reserved about the authors’ insistence on keeping multiple short paragraphs in some sections, as this often suggests limited ability to synthesize and summarize content.
There are some numbering errors in the level-2 headings within the Discussion section.
(Response 1) Thank you for your assistance with this manuscript. We respect your difference of opinion regarding paragraph structure but will refrain from making any further changes. Our use of shorter paragraphs is a deliberate stylistic choice aimed at enhancing readability and emphasizing key points. The Chicago Manual of Style acknowledges that paragraph length should vary to suit the content and that even single-sentence paragraphs can be appropriate when used judiciously. We believe that our paragraphing approach aligns with these guidelines and effectively supports the clarity of our arguments.​
We have fixed the numbering errors in the Discussion section.