A Comparative Study of Deep Semantic Segmentation and UAV-Based Multispectral Imaging for Enhanced Roadside Vegetation Composition Assessment
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper conducted a comparative study to identify eight roadside vegetation species using various deep learning models, including UNet++, PSPNet, DeepLabV3+, and MAnet. The datasets were all collected from construction sites. The topic is interesting.
However, the authors failed to explain in the Introduction why vegetation was detected along roadsides. If roadside vegetation species detection is essential, why did the authors choose construction sites rather than roads? The significance of the paper should be explained more clearly.
Moreover, it would be helpful to add the coordinate systems of Figures 10-12 to indicate the location of the construction site.
Comments on the Quality of English Language
There are several instances of typos and incorrect capitalization; the quality of the English writing needs improvement. For example:
Page 2, Line 67: two consecutive "in"s.
Page 4, Line 156: Table 1 has no title. What is meant by "Table 1. This is a table. Tables should be placed in the main text near to the first time they are cited."?
Page 7, Line 226: "Figure 5. Illustrates"
Page 10, Line 318: "multispectral Images" should be "multispectral images."
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
This paper demonstrates the benefits of deep learning (DL) techniques for roadside vegetation assessment, making a timely and important contribution to ecological monitoring. Semantic segmentation combined with UAV-based RGB imagery provides a scalable substitute for time-consuming, conventional monitoring methods. It is good that a pixel-wise annotated dataset for eight plant species has been developed and used; it offers a useful standard for further study. When compared to traditional vegetation indices like SAVI, the MAnet model's high accuracy (IoU = 0.90, R² = 0.99) highlights the accuracy and resilience of DL in vegetation classification and cover estimation. However, generalizability would be strengthened by additional validation in a variety of settings and lighting circumstances. Overall, this work lays a solid basis for useful ecological UAV applications. However, some minor suggestions can further improve the understanding of the manuscript.
Add more numerical results in the abstract section.
Remove the third keyword.
The height of different vegetation types is not uniform; does this affect the accuracy?
Mention the objectives of the study as a, b, and c.
Figure 4: What is the core reason for the low R²?
Has the study considered the feature selection?
Figures 10-12: Please provide a scale bar for each vegetation map.
What is the main reason for selecting this location?
How have you considered the application of bands? Can the study consider the exploration of the texture features?
L#633, 650: Properly cite the references.
Throughout the manuscript, do not use "our" and "we".
Please shorten the conclusion section; it is currently too long.
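Regarding the question about the low R² in Figure 4, it may help the authors to state explicitly how R² is computed for the cover-estimation comparison. A minimal sketch (the cover values below are purely illustrative, not taken from the manuscript):

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative ground-truth vs. estimated vegetation cover fractions
truth = [0.10, 0.25, 0.40, 0.55, 0.70]
good = [0.12, 0.24, 0.41, 0.53, 0.71]
print(round(r_squared(truth, good), 3))
```

Note that R² computed this way can be low even when per-pixel segmentation accuracy is high, if systematic over- or under-segmentation biases the plot-level cover estimates.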
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript provides an insightful comparison between deep learning-based semantic segmentation (SS) models and traditional vegetation indices (VI) for vegetation composition assessment using UAV-based imagery. The proposed study is well-structured and tackles a relevant problem in vegetation monitoring, specifically for roadside vegetation, which has critical applications in ecological management and construction zone monitoring. The results are promising. However, there are several areas where the paper can be enhanced.
- The methodology mentions the use of UAV-based imagery but does not provide details on the type of UAV used, its specifications (e.g., camera resolution, flight altitude), or the time of data collection. It would be useful to include this information to allow readers to assess the applicability of the findings to other UAV systems.
- While the paper discusses the class imbalance issue in the dataset, it does not mention the specific techniques used to address this, such as data augmentation or class weighting.
- The evaluation metrics are well-chosen; however, there is no mention of the number of epochs or the training time for each model.
- The paper should explain the role of the Mix Vision Transformer backbone in more detail and how it contributed to the model's success.
- Providing examples of these misclassifications would help illustrate the practical limitations of the VI approach.
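On the class-imbalance point above, one common remedy the authors could report is inverse-frequency class weighting in the segmentation loss. A minimal sketch of deriving such weights from per-class pixel counts (the counts below are illustrative, not from the dataset):

```python
import numpy as np

def inverse_frequency_weights(pixel_counts):
    """Per-class loss weights proportional to 1/frequency, normalized to mean 1."""
    counts = np.asarray(pixel_counts, dtype=float)
    freqs = counts / counts.sum()
    weights = 1.0 / freqs
    return weights / weights.mean()

# Illustrative pixel counts: a dominant background class and rarer species
counts = [900_000, 50_000, 30_000, 20_000]
w = inverse_frequency_weights(counts)
print(w.round(2))  # rare classes receive proportionally larger weights
```

Such a weight vector can typically be passed directly to a weighted cross-entropy loss, so that errors on rare species contribute more to the gradient than errors on the dominant background.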
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
Comments and Suggestions for Authors
This is a fine article that presents significant results, helping us move forward with CNN-based strategies for identifying specific plant species that are very difficult to classify using either conventional methods or AI-based ones in general. Here are a few recommendations to further improve it:
1) Pay attention to Figures and Tables. Some do not have a title (instead, they have the MDPI template text).
2) This is a journal on Remote Sensing. Most readers are well acquainted with the specificities of the field (types of bands and sensors, classification methods, photogrammetric corrections, etc.), but most are not specialists on grasslands. I have no idea, for example, what the difference between Crabgrass and Lespedeza is. I recommend adding an image mosaic with both close-up photos of each species and how they appear in orthoimagery. This will help readers evaluate the level of difficulty behind the classification process.
3) Figure 1 is too bland. I suggest improving the map considerably (perhaps adding water bodies, urban sprawl and a delicate hillshade). It often surprises me how context maps are so ignored by authors. They need to provide readers with the "big picture" of the natural landscape of those counties. Are they hilly, are they traversed by rivers? Where are the major cities? Some of those factors impact the research directly. For a few examples of Alabama maps, see: https://alabamamaps.ua.edu/contemporarymaps/alabama/physical/index.html
4) You say "The multispectral bands, excluding the RGB channel, had a spatial resolution of 3.2 MP", but I couldn't find the resolution of the RGB bands.
5) Why did you use only Precision, Recall and IoU? What about F1, Accuracy and mIoU? I'm not suggesting you redo the article, but an explanation of why just three metrics were used is necessary.
6) NDVI and NDRE are considerably correlated (see, for example: https://doi.org/10.3390/agriengineering5020052), and I believe SAVI will be too, since all of them end up using the NIR band (which is very significant for vegetation). Have you considered PCA components so you would extract the truly relevant components of each index, and then combine the three into one relevant 3-band layer?
7) For training each model, what was the early stop criterion?
8) How many epochs did it take to train each model?
9) In the results section, you discuss several other metrics in further detail. The paragraph that starts on line 392 deserves a table outlining all those metrics, for better comparative analysis.
10) The discussion and the conclusion are well developed, but I suggest the authors discuss the applicability of using the DL model with indices as reference data instead of regular bands. I gave the suggestion of using PCA, but this is just an idea. It is worth mentioning, though, that it could be possible to add them as bands. Also, a whole discussion emerges on trying to establish new indices based on DL approaches (for example, see https://doi.org/10.3390/rs13122261). I believe these possible developments should be cited, at least.
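On item 5, it may be worth noting in the authors' response that F1 (Dice) and IoU are monotonically related per class, F1 = 2·IoU / (1 + IoU), and that mIoU is simply the class-wise mean of IoU, so the three reported metrics already pin down F1. A small sketch from per-class pixel counts (the counts are illustrative):

```python
def seg_metrics(tp, fp, fn):
    """Precision, recall, IoU and F1 from per-class pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, iou, f1

p, r, iou, f1 = seg_metrics(tp=90, fp=10, fn=10)
# F1 and IoU carry the same ranking information: f1 == 2 * iou / (1 + iou)
print(round(iou, 3), round(f1, 3))
```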
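On item 6, the PCA suggestion could be sketched as follows: stack the index layers pixel-wise, center them, and take the leading components. The data below are synthetic, correlated "index" bands standing in for NDVI/NDRE/SAVI (all NIR-driven); this is purely illustrative, not the authors' pipeline:

```python
import numpy as np

def pca_components(bands, n_components=3):
    """PCA over pixels: rows = pixels, columns = index layers (e.g. NDVI, NDRE, SAVI)."""
    X = bands - bands.mean(axis=0)           # center each index layer
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt.T[:, :n_components]      # projected pixel values
    explained = (S ** 2) / np.sum(S ** 2)    # explained-variance ratios
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))                 # shared NIR-driven signal
noise = 0.1 * rng.normal(size=(1000, 3))
indices = np.hstack([base, base, base]) + noise   # three highly correlated "indices"
scores, explained = pca_components(indices)
print(explained.round(3))  # first component should dominate for correlated indices
```

If the indices are as correlated as suggested, most of the variance lands in the first component, supporting the reviewer's point that a compact PCA layer could replace the three raw indices.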
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have revised the manuscript well.
Author Response
Thank you for confirming the quality of our revision.
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks for the revision. The current version can be accepted for publication with minor edits.
In the Introduction, several highly cited articles (since 2023) presenting representative work on semantic segmentation of remote sensing images are missing, such as RSMamba, SAPNet, BEDSN, GPINet, and TransUNet. These works are closely related to this paper. I suggest a minor revision with an expanded literature review.
Author Response
Please see the attachment.
Author Response File: Author Response.docx