Peer-Review Record

Automated Identification and Classification of Plant Species in Heterogeneous Plant Areas Using Unmanned Aerial Vehicle-Collected RGB Images and Transfer Learning

Drones 2023, 7(10), 599; https://doi.org/10.3390/drones7100599
by Girma Tariku 1,*, Isabella Ghiglieno 2, Gianni Gilioli 2, Fulvio Gentilin 2, Stefano Armiraglio 3 and Ivan Serina 1
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Reviewer 6:
Submission received: 28 July 2023 / Revised: 16 September 2023 / Accepted: 20 September 2023 / Published: 25 September 2023

Round 1

Reviewer 1 Report

Automated Plant Species Identification and Classification in Heterogeneous Plant Species Areas Using UAV RGB Images and Transfer Learning

The topic presented by the authors is of high importance for biodiversity studies, especially in terrains with difficult access. However, the image processing methodology used presents several shortcomings.

First, the semi-automatic annotation procedure used somewhat biases the results obtained towards the algorithm used for annotation.

The authors use a clustering algorithm (the algorithm itself should be cited instead of the commercial implementation in the "multi-resolution segmentation" menu in eCognition) to group the image pixels. Then the resulting pixel clusters (objects) are assigned to different categories using the KNN algorithm. This is a very convenient way to annotate data that uses some expert input when the expert chooses some of the objects and assigns them to some of the classes that they can observe. This is usually done to reduce the time it takes to annotate the data, as this is a difficult and time-consuming process. However, the degree of control that the expert has in the whole process is very limited. The boundaries of objects are automatically determined, and objects that the expert sees as belonging to one class may end up assigned to another. The process can be improved by manually fixing the mistakes that the clustering and KNN algorithms make, but if this is to be done right, the time taken is similar to the time it would take to annotate the data manually from the start. The authors need to provide more details on the annotation process, acknowledge the limitations of their annotations and provide visual examples of all the annotated classes and illustrations of the imprecisions that they observe.

The description of how transfer learning is used is not rigorous enough. For example, "concentrated at the top" does not have a clear meaning here. Give a brief overview of the architecture used and state clearly which layers are kept frozen and which layers are retrained. I assume you also change the last layer so that the number of outputs matches your problem; you should state this explicitly too.

The training and testing sets are built in a way that will overestimate the practical potential of the system (dependent test set). Either a separate test set should be chosen or the limitation acknowledged.

You mention that your classes are balanced. Is that a characteristic of the data collection site or a choice that you made? If it is the second case, you are limiting the practical usability of the system, since if you use it with real data the class balance need not be uniform, and that will affect your results. From the confusion matrix in Figure 4 it looks like the balance is not perfect. It would probably help if you provided the exact number of examples in every class in the training and testing sets.

Reporting training values is not very helpful, as they only refer to the optimization of the model and act, at best, as surrogates for performance on the practical problem. I would advise that you report only test values.

Other minor issues:

Line 78: Drone UAV? Better just use UAV.

Line 98: You mentioned the area of the park, but unless I missed it, I could not find the area that was surveyed by the UAV. Flying at 30 m height, I think it will not be possible to cover the 960 ha.

Line 117: Maps do not have dimensions or directions.

Line 125: ... support the training of samples...

Line 137: Can the additional class be called 'others'?

Line 147: Use 'unclear photos' instead of 'inappropriate'.

Line 168: What is 'incarnation'?

Line 291: In our work, we deal with two...

Line 311: It would have been great if you could have done some basic biodiversity study with the data you obtained from this study. Adding this in the paper would have been very significant.

Lines 317-321: We all know this. Unless you actually do it in your study, it is not worth mentioning in the discussion; it belongs more in the introduction.

Line 334: What do you mean by 'mask extraction'?

 

Author Response

Point 1: The authors use a clustering algorithm (the algorithm itself should be cited instead of the commercial implementation in the "multi-resolution segmentation" menu in eCognition) to group the image pixels. Then the resulting pixel clusters (objects) are assigned to different categories using the KNN algorithm. This is a very convenient way to annotate data that uses some expert input when the expert chooses some of the objects and assigns them to some of the classes that they can observe. This is usually done to reduce the time it takes to annotate the data, as this is a difficult and time-consuming process. However, the degree of control that the expert has in the whole process is very limited. The boundaries of objects are automatically determined, and objects that the expert sees as belonging to one class may end up assigned to another. The process can be improved by manually fixing the mistakes that the clustering and KNN algorithms make, but if this is to be done right, the time taken is similar to the time it would take to annotate the data manually from the start. The authors need to provide more details on the annotation process, acknowledge the limitations of their annotations, and provide visual examples of all the annotated classes and illustrations of the imprecisions that they observe.

Response 1: Thank you for your comment. I agree; you are right in highlighting the most appropriate way of annotating data. Actually, we usually do the same. In the revised version, we have emphasized the importance of using a clear methodology (lines 127-186), and a separate, clear explanation of both the software tools and the algorithms used in each software is provided in lines 130-132 and 179-184. Plant experts provide only a few samples for each plant class to train the mapping; however, they do not participate in the entire process. After the mapping, less visible objects are removed. The experts' involvement is not so much in the classification process itself but rather in the assessment of the mapping quality. This assessment is based on the color indicators placed before the drone flight (lines 163-177). Data annotation limitations have been indicated in lines 231-235 of the revised version, with the addition of sample images for each label class as illustrated in Figure 4.
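To make the object-based annotation workflow discussed above easier to picture, the following is a minimal, hypothetical sketch of the same idea built from open-source pieces: SLIC superpixels stand in for eCognition's multi-resolution segmentation, per-object mean colour is used as a stand-in feature, and scikit-learn's KNN assigns the remaining objects. All parameter values, features, and labeled-object indices are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.neighbors import KNeighborsClassifier

def mean_color_per_segment(image, segments):
    """Average RGB of each segment: one feature vector per object."""
    ids = np.unique(segments)
    feats = np.array([image[segments == i].mean(axis=0) for i in ids])
    return ids, feats

# Placeholder for a real orthophoto tile (H x W x 3, values in [0, 1])
image = np.random.rand(512, 512, 3)

# 1) Group pixels into objects (stand-in for multi-resolution segmentation)
segments = slic(image, n_segments=400, compactness=10, start_label=0)
ids, feats = mean_color_per_segment(image, segments)

# 2) An expert labels a few objects per class (indices and labels are illustrative)
labeled_ids = np.array([3, 57, 120, 250])   # objects the expert inspected
labels      = np.array([0, 0, 1, 2])        # their species classes

# 3) KNN assigns every remaining object to a class
knn = KNeighborsClassifier(n_neighbors=3).fit(feats[labeled_ids], labels)
predicted = knn.predict(feats)              # one class per object
class_map = predicted[segments]             # per-pixel class raster
```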

 

Point 2: The description of how transfer learning is used is not rigorous enough. For example "concentrated at the top" does not have a clear meaning here. Give a brief overview of the architecture used and state clearly what layers are kept frozen and what layers are retrained. I assume you also change the last layer so that the number of outputs matches your problem, you should state this explicitly too.

Response 2: Thank you for your comment. We have explained how the EfficientNetV2 architecture works by specifically defining what the top layer is and which layers are added to fulfil our image classification task (lines 281-291).
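As an illustration of what "freezing the base and adding a new top layer" typically looks like, here is a minimal Keras sketch with seven output classes; the input size, dropout rate, and optimizer are assumptions for illustration and not necessarily the configuration described in lines 281-291 of the revised manuscript.

```python
import tensorflow as tf

NUM_CLASSES = 7          # seven plant species, as reported in the paper
IMG_SIZE = (224, 224)    # assumed input resolution

# Pre-trained backbone without its ImageNet classification head
base = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False   # freeze all convolutional layers

# New task-specific head: its output size matches the number of species
inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)          # assumed regularization
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```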

 

Point 3: The training and testing sets are built in a way that will overestimate the practical potential of the system (dependent test set); see https://www.mdpi.com/2072-4292/13/14/2837 for example. Either a separate test set should be chosen or the limitation acknowledged.

Response 3: Thank you. Your comment is correct. In our case, we subdivided the general dataset into training and testing sets.
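For context, the spatially independent testing the reviewer asks for is often approximated with a grouped split when a second site is unavailable; below is a hypothetical sketch assuming each image patch carries an identifier of the spatial block it came from (a `blocks` array that the paper itself does not report).

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholders: one row per image patch, a species label, and the spatial block it came from
X = np.arange(1000).reshape(-1, 1)            # stand-ins for image patches
y = np.random.randint(0, 7, size=1000)        # species labels (7 classes)
blocks = np.random.randint(0, 20, size=1000)  # hypothetical spatial block / site IDs

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=blocks))

# No spatial block contributes patches to both sets, so the test set is spatially independent
assert set(blocks[train_idx]).isdisjoint(set(blocks[test_idx]))
```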

 

Point 4: You mention that your classes are balanced. Is that a characteristic of the data collection site or a choice that you made? If it is the second case, you are limiting the practical usability of the system, as if you use it with real data the class balance need not be uniform and that will affect your results. From the confusion matrix in Figure 4 it looks like the balance is not perfect. It would probably help if you provided the exact number of examples in every class in the training and testing sets.

Response 4: Thank you for sharing your perspective. Our intention was not to suggest that all classes have equal and balanced representation. Rather, we aimed to convey that there is no substantial imbalance or noticeable disparity in the number of images in each class. Your insight is appreciated and we hope this clarifies our approach. We have put the exact number of training and testing samples for each class in Table 1.

 

Point 5: Line 78: Drone UAV? Better just use UAV.

Response 5: Done

 

Point 6: You mentioned the area of the park, but unless I missed it, I could not find the area that was surveyed by the UAV. Flying at 30 m height, I think it will not be possible to cover the 960 ha.

Response 6: Thank you for your comment. We wish to specify that the total area of the park is 960 hectares, while our study focused on a specific area covering 1 hectare. In the revised version, we provided a separate description in this regard (lines 106-110). Additionally, we included a figure illustrating the study area. Please refer to Figure 1 for visual reference.

 

 Point 7: Line 117: Maps do not have dimensions or directions.

Response 7:  Thank you for your feedback. A thorough description of the orthophoto image is added in lines 123-126 of the revised version.

 

Point 8:  Line 125: ... support the training of samples...

Response 8: Done

 

Point 9: Lines 137: Can the additional class be called 'others'.

Response 9: Done

 

Point 10: Line 147: Use 'unclear photos' instead of 'inappropriate'.

Response 10: Done

 

Point 11: Line 168: what is 'incarnation'

Response 11: Thank you for your comment. The sentence has been changed for clarification purposes.

 

Point 12: Line 291: In our work, we deal with two...

Response 12: Done

 

Author Response File: Author Response.pdf

Reviewer 2 Report

L 45: Please complete the sentence ‘…machine learning techniques in…’

L 33-59: These paragraphs seem to jump between various ideas and topics without a clear flow. Consider reorganizing and structuring the content to ensure a logical progression of ideas.

L 66-79: A lot of redundant information about transfer learning exists; please remove it.

L 80-82: It could be beneficial to explain why these contributions are important and how they address the existing gaps in the field.

L 84-93: Here, the authors list the main contributions of the study, but it could be beneficial to explain why these contributions are important and how they address the existing gaps in the field.

L 100-106: Please mention the location (city and country) of drones and software used.

L 101: Please clarify how the Litchi software is used for the drone operation. Is it used to control the flight path, capture images, or both?

L 109: Please provide more detail about the segmentation process used.

L 108-140: This subsection lacks a clear flow of the methodological steps. Provide a step-by-step explanation of the process.

L 123: Please clarify. Specify which data is being classified and how the KNN algorithm is applied.

L 125-131: This section explaining the use of color indicators and training samples for plant species classification could be clearer. Explain how these color indicators are used and why they are effective.

L 139: Please explain what the "Merge Region algorithm" is and how it is applied, as it is mentioned but not described.

L 147: The process of extracting plant images with class labels from the orthomosaic image could be more detailed. How are these images identified and extracted?

L 153 – 158: The "Transfer learning: EfficientNetV2" section could be expanded to provide more context about transfer learning, its benefits, and how EfficientNetV2 specifically is used.

L 236: In this paragraph, comparing previous works with this study, it would be beneficial to go into more detail about what specific challenges the authors address that were missing in the previous ones. Please discuss in 1-2 lines the studies you cite to compare.

L 248-258: In this section comparing different pre-trained models, please mention how and why EfficientNetV2 outperforms other models or why ResNet50 performs the worst. This section just repeats the results.

Author Response

Response to Reviewer 2 Comments

Point 1: L 45: Please complete the sentence ‘…machine learning techniques in…’

Response 1: Done

 

Point 2: L 33-59: These paragraphs seem to jump between various ideas and topics without a clear flow. Consider reorganizing and structuring the content to ensure a logical progression of ideas.

Response 2: Thank you for your comment. We made the corrections in the revised version (lines 29-53). The changes are now organized in the following format:

 

  • The first paragraph provides a general introduction (lines 29-32).
  • The second paragraph covers previous research conducted using public datasets and the models chosen for classification. This section focuses on comparing the performance of different models (lines 33-48).
  • The third paragraph addresses the limitations of using public datasets and introduces the use of UAVs as a solution (lines 49-53).

 

Point 3: L 66-79: A lot of redundant information about transfer learning exists; please remove it.

Response 3: Thank you for your feedback. The paragraph originally covering lines 66-70 has been reformulated with the addition of paragraph 2 in the revised version (now lines 33-48), thereby minimizing redundancy.

 

Point 4: L 80-82: It could be beneficial to explain why these contributions are important and how they address the existing gaps in the field.

Response 4: Thank you for the feedback. Lines 75-80 of the revised version clarify this point.

 

Point 5: L 84-93: Here, the author listed the main contributions of the study, but it could be beneficial to explain why these contributions are important and how they address the existing gaps in the field.

Response 5: Thank you for offering your valuable suggestion. In response, we have taken proactive measures in the revised version to enhance the manuscript. More specifically, we have provided a detailed explanation for each of the listed main contributions, specifying how each component effectively addresses the existing gaps in the research (lines 81-96).

 

Point 6: L 109: Please provide more detail about the segmentation process used.

Response 6: Thank you for your feedback. This point is clarified in the revised version (lines 130-148).

 

Point 7: L 108-140: This subsection lacks a clear flow of the methodological steps. Provide a step-by-step explanation of the process.

Response 7: Thank you for your feedback. In the revised version, we explain this point step by step. Firstly, we explain the segmentation process in lines 127-147. Then, we provide an explanation of the mapping of image classification in lines 152-186. Finally, we detail the image dataset extraction process in lines 208-220.

 

Point 8: L 123: Please clarify. Specify which data is being classified and how the KNN algorithm is applied.

Response 8: Thank you for your feedback. In the revised version, we provide a detailed explanation in lines 152-195.

 

Point 9: L 125-131: This section explaining the use of color indicators and training samples for plant species classification could be clearer. Explain how these color indicators are used and why they are effective.

Response 9:  Thank you for your feedback. In response to your comment, we provide a clear and detailed explanation of how the color indicators are used and how they effectively contribute to the process (lines 163-177).  

 

Point 10: L 139: Please explain what the "Merge Region algorithm" is and how it is applied, as it is mentioned but not described.

Response 10: Thank you for your feedback. In the revised version, we provide a clear definition of how the merge region algorithm works (lines 196-202).

 

Point 11: L 147: The process of extracting plant images with class labels from the orthomosaic image could be more detailed. How are these images identified and extracted?

Response 11: Thank you for your feedback. We addressed this issue by providing a detailed explanation in the revised version. You can find further details in lines 214-221 of the revised version.

 

Point 12: L 153 – 158: The "Transfer learning: EfficientNetV2" section could be expanded to provide more context about transfer learning, its benefits, and how EfficientNetV2 specifically is used.

Response 12: Thank you for your feedback. In lines 246-251 of the revised version, we explain the transfer learning model; the benefits are dealt with in lines 252-263,  and how EfficientNetV2 was used is included in lines 281-291.

 

Point 13: L 236: In this paragraph, comparing previous works with this study, it would be beneficial to go into more detail about what specific challenges the authors address that were missing in the previous ones. Please discuss in 1-2 lines the studies you cite to compare.

Response 13: Thank you for your feedback. In the revised version, we explicitly address the challenges that were missing in previous works and specify how our approach fills the gap (lines 397-408).

 

Point 14: L 248-258: In this section comparing different pre-trained models, please mention how and why EfficientNetV2 outperforms other models or why ResNet50 performs the worst. This section just repeats the results.

Response 14: Thank you for your comments. We have taken your feedback into due account and made the necessary adjustments in the revised version. We provide a detailed explanation to address your specific comment (lines 421-442).

 

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript “Automated Plant Species Identification and Classification in Heterogeneous Plant Species Areas Using UAV RGB Images and Transfer Learning” introduces a methodology using remote sensing and machine learning techniques for plant species identification and classification.

While the paper aims to address an important subject matter, focusing on plant species identification using UAV RGB images and transfer learning techniques, there are several critical issues that prevent its acceptance. The primary concern is that the paper doesn't present a new perspective, given the numerous existing studies on the same topic. Moreover, there's evident redundancy, especially in the repeated listing of plant species. This verbosity detracts from the core content. In addition, the manuscript contains grammatical and typographical errors, such as the inconsistency in naming conventions (for example, "EfficentNetV2" vs. "EfficientNetV2"), in line 100 ("Using a UAV MAVIC…", which is a sentence fragment), and in line 149 ("A training dataset...are extracted", which is a subject-verb agreement error). These issues raise concerns about the manuscript's thoroughness and editorial rigor, suggesting it has not undergone review before submission.

The introduction fails to explain the state of the art related to UAVs for agricultural and forestry purposes. Moreover, the paper lacks sufficient depth in the dataset acquisition and preprocessing sections. For a study claiming to distinguish between heterogeneous plant species areas, this is a major oversight.

Introducing performance metrics after results is counter-intuitive. The metrics should be presented in the results section and the discussion section should be reserved for the discussion of the results. This demonstrates a lack of logical structure and can be confusing.

In the discussion section, while the comparison with other pre-trained models is appreciated, the absence of a rationale for EfficientNetV2's superior performance makes the discussion superficial and not insightful. Additionally, it is essential in a scientific article to discuss the results of the present study in comparison with the findings from other authors in other articles.

The conclusion, though summarizing the findings, fails to explain the significance of the research, or its real-world implications.

Finally, some figures are not referenced in the text. Furthermore, for figures, captions should be self-explanatory. For each figure and table, there should be clear descriptions and interpretations, allowing the reader to understand the significance of the data without referring to the text. Therefore, all captions in the paper should be enlarged to explain much better what is contained in the figure or table without resorting to the text.

Given the range and depth of these issues, it is recommended that the paper be rejected in its current form. The authors should strongly consider a comprehensive review, restructuring, and thorough proofreading before resubmitting to any journal.

 

 


Author Response

Point 1: While the paper aims to address an important subject matter, focusing on plant species identification using UAV RGB images and transfer learning techniques, there are several critical issues that prevent its acceptance. The primary concern is that the paper doesn't present a new perspective, given the numerous existing studies on the same topic. Moreover, there's evident redundancy, especially in the repeated listing of plant species. This verbosity detracts from the core content. In addition, the manuscript contains grammatical and typographical errors, such as the inconsistency in naming conventions (for example, "EfficentNetV2" vs. "EfficientNetV2"), in line 100 ("Using a UAV MAVIC…", which is a sentence fragment), and in line 149 ("A training dataset...are extracted", which is a subject-verb agreement error). These issues raise concerns about the manuscript's thoroughness and editorial rigor, suggesting it has not undergone review before submission.

Response 1: Thank you very much for your valuable feedback. We have incorporated the grammatical corrections into the revised version. Additionally, we have submitted the paper to an English reviewer for further improvements. We have undertaken a thorough revision to enhance the clarity of our work and have made improvements in all sections, from the abstract to the conclusion. Your insights have been instrumental in refining the quality of our manuscript.

 

Point 2: The introduction fails to explain the state of the art related to UAVs for agricultural and forestry purposes. Moreover, the paper lacks sufficient depth in the dataset acquisition and preprocessing sections. For a study claiming to distinguish between heterogeneous plant species areas, this is a major oversight.

Response 2: Thank you for your feedback. In the revised version, we improved the introduction by including additional references that help provide a more comprehensive context. One notable addition is the incorporation of your comment regarding the effective use of UAVs in agricultural applications (lines 54-66). Furthermore, we focused on enhancing the methodology section. By offering a more detailed and consistent explanation of the entire process, we make sure that readers can easily follow each step of the methodology. We clarified the problem step by step. Firstly, we explained the segmentation process (lines 127-147). Then, we provided an explanation of the mapping of image classification (lines 152 to 185). Finally, we detailed the image dataset extraction process, as described in lines 208 to 224.

 

Point 3: Introducing performance metrics after results is counter-intuitive. The metrics should be presented in the results section and the discussion section should be reserved for the discussion of the results. This demonstrates a lack of logical structure and can be confusing.

Response 3: Thank you for taking the time to provide your feedback. We wish to clarify that our manuscript does indeed separate the results and discussion sections. Furthermore, we made sure to present the performance measurement metrics within the results section, as you rightly mentioned. In the revised version, we describe these measurement metrics in more detail, as can be seen in lines 315 to 326. Your focus on these aspects is greatly appreciated, as it helps us enhance the clarity and consistency of our paper.

 

Point 4: In the discussion section, while the comparison with other pre-trained models is appreciated, the absence of a rationale for EfficientNetV2's superior performance makes the discussion superficial and not insightful. Additionally, it is essential in a scientific article to discuss the results of the present study in comparison with the findings from other authors in other articles.

Response 4: Thank you for your feedback. In the revised version, we explicitly addressed the specific challenges presented in previous works and explained how our approach addresses the gaps that were previously missing (lines 397-408). Additionally, we explained why EfficientNetV2 outperforms other models and why ResNet50's performance is inferior. In lines 421-442, we delved into the reasons and mechanisms behind the superior performance of EfficientNetV2 compared to other models.

 

Point 5: The conclusion, though summarizing the findings, fails to explain the significance of the research or its real-world implications.

Response 5: Thank you for your feedback. In the revised version, we took your feedback into consideration and tried to enhance the conclusions. We added several sentences that summarize the significance of the research in the real world (lines 520-526).

 

Point 6: Finally, some figures are not referenced in the text. Furthermore, for figures, captions should be self-explanatory. For each figure and table, there should be clear descriptions and interpretations, allowing the reader to understand the significance of the data without referring to the text. Therefore, all captions in the paper should be enlarged to explain much better what is contained in the figure or table without resorting to the text.

Response 6: Done

Author Response File: Author Response.pdf

Reviewer 4 Report

The paper underscores biodiversity's vital role in agroecosystem stability and the need to preserve it for sustainable agriculture. It focuses on using remote sensing to identify plant species and introduces a novel approach combining object-based supervised machine learning with the EfficientNetV2 transfer learning model. This aims to map and classify plants via high-resolution UAV images. The process involves segmenting UAV orthophotos into canopy objects, using K-nearest neighbor for classification, and employing EfficientNetV2 for species identification. The model achieves 99% accuracy in classifying seven species. The results showcase the possibility of precise plant species identification.

I have the following comments that need to be addressed for the next round of reviews:
  1. There are works that use transfer learning to address the challenge of having small datasets when working with aerial images:  

a. Liu, J., Chen, K., Xu, G., Sun, X., Yan, M., Diao, W. and Han, H., 2019. Convolutional neural network-based transfer learning for optical aerial images change detection. IEEE Geoscience and Remote Sensing Letters, 17(1), pp. 127-131.

b. Rostami, M., Kolouri, S., Eaton, E. and Kim, K., 2019. Deep transfer learning for few-shot SAR image classification. Remote Sensing, 11(11), p. 1374.

c. Tufail, A.B., Ullah, I., Khan, R., Ali, L., Yousaf, A., Rehman, A.U., Alhakami, W., Hamam, H., Cheikhrouhou, O. and Ma, Y.K., 2021. Recognition of ziziphus lotus through aerial imaging and deep transfer learning approach. Mobile Information Systems, 2021, pp. 1-10.

d. Fyleris, T., Kriščiūnas, A., Gružauskas, V., Čalnerytė, D. and Barauskas, R., 2022. Urban change detection from aerial images using convolutional neural networks and transfer learning. ISPRS International Journal of Geo-Information, 11(4), p. 246.

e. Ahmadibeni, A., Jones, B. and Shirkhodaie, A., 2020, August. Transfer learning from simulated SAR imagery using multi-output convolutional neural networks. In Applications of Machine Learning 2020 (Vol. 11511, pp. 174-183). SPIE.

The above work should be discussed in the introduction section to provide a thorough background to the reader.  

2. The image captions are too brief. Please add more explanations in the captions to make them self-contained and explain more in the text.

3. Comparisons in the paper are limited. Are there prior methods for solving this problem? Is it possible to compare them to evaluate how competitive the proposed method is?

4. In Figure 5, what is the reason behind validation accuracy being larger at initial epochs compared to the training accuracy?   

5. Please report the standard deviation values in your results to make the comparisons statistically meaningful.  

6. Please release your code on a public domain such as GitHub to make reproducing the results straightforward.

Author Response

Point 1: There are works that use transfer learning to address the challenge of having small datasets when working with aerial images: ……….. The above work should be discussed in the introduction section to provide a thorough background to the reader. 

Response 1: Thank you for your feedback. In the introduction of the revised version, we incorporated your recommended references related to the convolutional neural network-based transfer learning method for aerial images (lines 45-48). We also included other references for additional content support.

Point 2: The image captions are so brief. Please add more explanations in the captions to make them self-contained and explain more in the text. 

Response 2: Thank you for your feedback. In the revised version, we provided more details and explanations to clarify the captions more thoroughly.

 

Point 3: Comparisons in the paper are limited. Are there prior methods for solving this problem? Is it possible to compare them to evaluate how competitive the proposed method is?  

Response 3: Thank you for your feedback. In the revised version, we explicitly addressed the specific challenges dealt with in previous works and explained how our approach fills the gaps they left open (lines 397-408).

 

Point 4:  In Figure 5, what is the reason behind validation accuracy being larger at initial epochs compared to the training accuracy?   

Response 4: Thank you for taking the time to share your comment. We wish to address your concern regarding the scenario in which validation accuracy outperforms training accuracy at initial epochs. This phenomenon does not necessarily create a problem, especially in the initial stages of training. A number of factors linked to the training process and dataset characteristics may contribute to this occurrence. In neural networks, weights are often initialized randomly. In the early epochs, the model is working on grasping meaningful patterns from scratch. If the initial random weights align well with the validation set, a temporary rise in validation accuracy may indeed occur.

 

Point 5:  Please report the standard deviation values in your results to make the comparisons statistically meaningful.  

Response 5: Thank you for your suggestion. Done in lines 334-376.
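As a small illustration of the kind of reporting requested, one can repeat training over several random seeds and report the mean and standard deviation of the test accuracy; the values below are made up.

```python
import numpy as np

# Test accuracies from repeated runs with different random seeds (illustrative values)
accuracies = np.array([0.990, 0.987, 0.992, 0.985, 0.991])
print(f"accuracy = {accuracies.mean():.3f} ± {accuracies.std(ddof=1):.3f}")
```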

 

Point 6: Please release your code on a public domain such as GitHub to make reproducing the results straightforward.

Response 6: Thank you for your feedback. I have uploaded the dataset to Zenodo at this link: https://doi.org/10.5281/zenodo.8297802

Author Response File: Author Response.pdf

Reviewer 5 Report

The paper deals with the investigation of the plant species classification by applying pre-trained transfer learning model (EfficientNetV2) in high-resolution RGB images captured by UAV platform. The results seem promising concerning the classification accuracy of plant species.

 

Please find in the attached file the amendments which I believe are required prior to accepting the paper. All in all, my recommendation is to accept the paper for publication, subject to minor revision.

 

More generic comments:

-        Is this methodology easily transferable elsewhere? Different commercial software has been used (eCognition, ArcGIS), raising the cost in time and money. Please add a short description of the implications for future research in the "Discussion" section.

Comments for author File: Comments.pdf

Author Response

Point 1: Please find in the attached file the amendments which I believe are required prior to accepting the paper. All in all, my recommendation is to accept the paper for publication, subject to minor revision.

Response 1: Thank you for your comments. As you suggested, in the revised version, we included the flowchart (see Figure 1). We also provided the specifics regarding the number of images captured by the UAV and the count of overlapped points during the orthophoto preparation (lines 120-126). Moreover, we included the optimal values for shape, scale, and compactness parameters (lines 134-141).

 

Point 2: Is this methodology easily transferable elsewhere? Different commercial software has been used (eCognition, ArcGIS), raising the cost in time and money. Please add a short description of the implications for future research in the "Discussion" section.

Response 2: Thank you for your feedback. A short description in this regard has been included in the conclusions of the revised version.

 

Author Response File: Author Response.pdf

Reviewer 6 Report

1. The abstract is too redundant, please rewrite it to highlight novelty.

2. What is the significance of the dataset established in this article?

3. This article provides a lengthy introduction to the established dataset, and it is recommended to reduce it.

4. It is recommended to use algorithms from relevant applications when comparing algorithms.

5. This article has not made any improvements to the algorithm, and it is recommended to start from the dataset. Otherwise, readers cannot be convinced.

6. It is recommended to use a three-line table format for the tables.

7. It is recommended to cite more articles from recent years as references.


Author Response

Point 1: The abstract is too redundant, please rewrite it to highlight novelty.

Response 1: Thank you for your comment, we rewrote the abstract to remove redundancy. Below is a list of words and phrases that have been eliminated in the revised version:

  1. "plays a critical role in regulating agroecosystem processes and ensuring their stability"
  2. "Preserving and restoring biodiversity is essential for improving sustainability in agricultural production"
  3. "Species identification and classification within plant communities constitute key factors in biodiversity studies"
  4. "Remote sensing techniques provide valuable support for species identification and classification"
  5. "However, accurately identifying plant species in heterogeneous plant areas presents challenges in dataset acquisition, preparation, and model selection for image classification"
  6. "This study introduces a methodology that combines object-based supervised machine learning for dataset preparation and a pre-trained transfer learning model (EfficientNetV2) for precise plant species classification"
  7. "The plants within the orthophoto are subsequently classified using the K-nearest neighbor (KNN) supervised machine learning algorithm"
  8. "To create the training dataset, individual plant species canopies are extracted using ArcGIS software"
  9. "Finally, a pre-trained transfer learning model is applied for plant species classification"
  10. "The test results demonstrate that the pre-trained transfer learning model (EfficientNetV2) achieves an impressive classification accuracy of 99% for seven plant species"
  11. "Furthermore, a comparative study was conducted, comparing EfficientNetV2 with other widely used transfer learning models such as ResNet50, Xception, DenseNet121, InceptionV3, and MobileNetV2"

 

The novelty of this paper lies in the introduction of a methodology that combines object-based supervised machine learning for dataset preparation and a pre-trained transfer learning model (EfficientNetV2) for precise plant species classification in heterogeneous areas. This is also highlighted in the revised abstract (lines 16-19).

 

Point 2: What is the significance of the dataset established in this article?

Response 2: Thank you for your question. The importance of creating datasets in different areas lies in the fact that it makes it easier to prepare plant image datasets using drone images for specific regions and phenological stages. What limits the reliance on publicly available datasets is their tendency to encompass vast regions, which may include areas that are not relevant or contain information of no significance to the specific research focus. Our proposed methodology addresses this issue by gathering plant image datasets from regions with heterogeneous plant species. The novelty of our approach is based on the efficient utilization of a UAV for dataset collection.

Point 3: This article provides a lengthy introduction to the established dataset, and it is recommended to reduce it.

Response 3: Thank you for your comment. We made the corrections in the revised version (lines 29-53). The changes are now organized in the following format:

 

  • The first paragraph provides a general introduction (lines 29-32).
  • The second paragraph covers previous research conducted using public datasets and the models chosen for classification. This section focuses on comparing the performance of different models (lines 33-48).
  • The third paragraph addresses the limitations of using public datasets and introduces the use of UAVs as a solution (lines 49-53).

Point 4: It is recommended to use algorithms from relevant applications when comparing algorithms.

Response 4: Thank you for your input. We incorporated six commonly utilized transfer learning models for image classification (lines 415-420). Moreover, we conducted a comprehensive performance comparison using a range of performance metrics.

 

Point 5: This article has not made any improvements to the algorithm, and it is recommended to start from the dataset. Otherwise, readers cannot be convinced.

Response 5: Thank you for your comments. Indeed, we first created a plant image dataset from scratch using UAVs and then selected the model that best suited our prepared dataset.

 

Point 6: It is recommended to use a three-line table for the table.

Response 6: Done.

 

Point 7: It is recommended to cite more articles from recent years as references.

Response 7: Thank you for your feedback. In the introduction of the revised version, we incorporated references related to convolutional neural network-based transfer learning for aerial images (lines 45-47). We also included other references to support the contents (lines 56-59).

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Automated Plant Species Identification and Classification in Heterogeneous Plant Species Areas Using UAV RGB Images and Transfer Learning

General Comments

I want to thank the authors for the effort they have clearly put into improving the paper. Sadly, I am afraid that the paper has not yet reached the level needed to grant publication. As it is, the paper has some inaccuracies and some serious methodological problems. As the review modality in MDPI journals does not allow for longer revision periods, I am now going to recommend the rejection of the paper while encouraging the authors to improve the study and re-submit when the problems have been solved.

Specific comments

1) The methodology used casts serious doubts on the practical usability of the system developed. 

1.1) As mentioned in my previous review, the semi-automatic annotation will likely bias the result towards the automatic clustering method used and not towards human expert opinion. At the very least, I think this limitation should be clearly stated in the paper, but given the time that the process you describe likely takes, I am pretty sure that you could use expert manual annotation in a similar amount of time (using dedicated software such as LabelMe or similar, ArcGIS or QGIS, or even a simple image processing software like GIMP). The part of the process where you move from pixel clusters to the square patches that you need to fit to the deep learning networks is not trivial and not properly detailed. For example, if you use an inner bounding box you are missing the boundaries; if you use an outer bounding box you need to define what you do with the exterior part, and any resizing that you do can have consequences on the performance of the network. From the patches in the figure close to line 237 (I think this figure does not have a caption) it looks like you use central patches of fixed size; you should make this clear.

1.2) The random subdivision used for the training and testing sets potentially compromises the generalization power of the networks trained. If at all possible, the authors should use a second, separate site for testing. 

2) There appear to be significant errors in the numbers presented. In particular, the number of "true labels" for each species in Figure 7 is not consistent with what is presented in the test set in Table 1. The same happens with Figures 10-14. Additionally, the accuracy results reported should coincide with the confusion matrices too, but they do not. For example, the first row of the confusion matrix in Figure 7 shows 4 wrong classifications out of 56, which stands for 52/56 = 0.9286 accuracy and not the 0.997 reported in Table 2. As far as I can see, the numbers presented are inconsistent. This may be due to some methodological error in some of the calculations and this is not acceptable.
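The reviewer's arithmetic can be checked mechanically from any confusion matrix: per-class accuracy is the diagonal entry divided by the row sum, and overall accuracy is the trace divided by the total. A short sketch with a made-up matrix (not the paper's Figure 7) follows.

```python
import numpy as np

# Rows = true class, columns = predicted class (illustrative numbers only)
cm = np.array([[52,  2,  2],
               [ 1, 60,  0],
               [ 0,  3, 55]])

per_class_acc = np.diag(cm) / cm.sum(axis=1)   # e.g. 52 / 56 = 0.9286 for class 0
overall_acc   = np.trace(cm) / cm.sum()

print(per_class_acc, overall_acc)
```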

3) The metrics used (apart from the confusion matrix and accuracy) are not clearly defined. In particular, what is a "True Positive" in a multi-class classification problem? More specifically: if an image belongs to Class A and is classified as belonging to Class B, it would make sense to classify it as a "False Positive" when looking at it from the perspective of Class B, and a "False Negative" when looking at it from the perspective of Class A. Even more, how would you classify that image from the perspective of a third class (Class C)? If it was classified as a "True Negative", given that it is not from Class C and was not classified in it, that would overestimate the performance of the classifier, as the classification would be counted as correct when it was not. There are ways to use these metrics for multi-class classification, but they all have caveats. Either define what you are doing clearly or stick to using only accuracy (percentage of correctly classified images) and confusion matrices.

4) I am not sure what the training and validation accuracy and loss graphs (Figures 8 and 9) bring to the discussion. As these are not directly related to the practical performance of the network, they are more likely to confuse readers, and the text does not use them to make any convincing point; please consider eliminating them. From my point of view, the flat curves from some epoch onward likely indicate that your patience parameter was too high.
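For readers unfamiliar with the "patience" remark: in Keras-style training, early stopping watches a validation metric and halts once it has failed to improve for a fixed number of epochs. A minimal sketch, with an illustrative patience value rather than the authors' actual setting:

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",        # watch validation loss
    patience=5,                # stop after 5 epochs without improvement
    restore_best_weights=True  # roll back to the best epoch
)

# Typical usage (model and datasets defined elsewhere):
# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=100, callbacks=[early_stop])
```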

5) Some of the classification results obtained are a bit strange. For example, the big jump from ResNet50 to other networks is pretty surprising. This does not mean the paper is wrong, and unexpected results may be interesting, but the inconsistencies in the numbers make me wary. Please improve the rigor of your experiments and try to find a reason why you observe the numbers that you do. In this respect, maybe exploring some recent reviews in the area of deep learning in forestry may help you get some context. While number-by-number comparison is always difficult and not always meaningful, adding this context to the discussion may help the readers assess the strength of your work.

There are some small details that need to be corrected, but I think the ones above are crucial for the publication of the paper.

The English for the paper is good.

Author Response

Comments 1: As mentioned in my previous review, the semi-automatic annotation will likely bias the result towards the automatic clustering method used and not towards human expert opinion. At the very least, I think this limitation should be clearly stated in the paper, but given the time that the process you describe likely takes, I am pretty sure that you could use expert manual annotation in a similar amount of time (using dedicated software such as LabelMe or similar, ArcGIS or QGIS, or even a simple image processing software like GIMP). The part of the process where you move from pixel clusters to the square patches that you need to fit into the deep learning networks is not trivial and not properly detailed. For example, if you use an inner bounding box, you are missing the boundaries; if you use an outer bounding box, you need to define what you do with the exterior part, and any resizing that you do can have consequences on the performance of the network. From the patches in the figure close to line 237 (I think this figure does not have a caption) it looks like you use central patches of fixed size; you should make this clear.
Response 1: Thank you for your detailed feedback. As suggested, we have addressed the limitation of the semi-automatic annotation in the revised version, as outlined in lines 490-494. Nevertheless, the color indicator plays a vital role in ensuring sufficient training data is sampled for each plant species during the mapping process and for validation purposes, mitigating the impact of the process on the results.

In response to your feedback regarding the transition from pixel clusters to square patches, we have elaborated on the image dataset extraction description in lines 208-220. To achieve this, we employ rectangular bounding boxes to isolate individual plant images from the mapped orthophoto using ArcGIS software. We generated individual plant class images through the 'Extract by Mask' operation with a shifting clip polygon. This tool allowed us to systematically extract images from different parts of the entire mapped large image by adjusting the position of the clip polygon. By shifting the polygon across the large image, we collected a diverse set of images, ensuring that we had enough data to create a comprehensive dataset.

Regarding the patches in the figure near line 237, we have updated the caption.
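The "shifting clip polygon" described in the response can be pictured as a sliding window over the class-labeled orthophoto. The sketch below is a hypothetical NumPy equivalent only: the array sizes, patch size, and purity threshold are assumptions, and it does not reproduce the ArcGIS 'Extract by Mask' tool itself.

```python
import numpy as np

def extract_patches(rgb, class_map, patch=128, stride=128, purity=0.9):
    """Slide a fixed-size window; keep patches dominated by a single class."""
    samples = []
    h, w = class_map.shape
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            window = class_map[r:r + patch, c:c + patch]
            vals, counts = np.unique(window, return_counts=True)
            if counts.max() / window.size >= purity:   # window is mostly one species
                samples.append((rgb[r:r + patch, c:c + patch], vals[counts.argmax()]))
    return samples

# Placeholder arrays standing in for the orthophoto and its per-pixel class labels
rgb = np.random.rand(1024, 1024, 3)
class_map = np.random.randint(0, 7, size=(1024, 1024))
patches = extract_patches(rgb, class_map)
```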
Comments 2: The random subdivision used for the training and testing sets potentially compromises the generalization power of the networks trained. If at all possible, the authors should use a second, separate site for testing.
Response 2: Thank you for your valuable feedback. In response to your input, we've made improvements in how we split the training and testing images for each class. We've now implemented class weight balancing using the Python Keras library.

In our previous manuscript, we employed a random subdivision approach with class weights, which sometimes caused confusion. To address this, we have adopted a more consistent approach by ensuring a constant number of training and testing images for all pre-trained models, as presented in Table 1. This adjustment aims to provide greater clarity and consistency in our image classification process.
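For readers unfamiliar with class-weight balancing in a Keras workflow, a hedged sketch of the usual pattern follows: inverse-frequency weights are computed from the training labels and passed to `model.fit` through its `class_weight` argument. The class counts are invented, and this is not the authors' exact code.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Training labels, one integer per image (illustrative distribution over 7 species)
y_train = np.repeat(np.arange(7), [220, 180, 240, 200, 210, 190, 230])

weights = compute_class_weight(class_weight="balanced",
                               classes=np.arange(7), y=y_train)
class_weight = dict(enumerate(weights))   # e.g. {0: 0.94, 1: 1.15, ...}

# Typical usage (model and datasets defined elsewhere):
# model.fit(train_ds, validation_data=val_ds, epochs=50, class_weight=class_weight)
```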
Comments 3: There appear to be significant errors in the numbers presented. In particular, the number of "true labels" for each species in Figure 7 is not consistent with what is presented in the test set in Table 1. The same happens with Figures 10-14. Additionally, the accuracy results reported should coincide with the confusion matrices too, but they do not. For example, the first row of the confusion matrix in Figure 7 shows 4 wrong classifications out of 56, which stands for 52/56 = 0.9286 accuracy and not the 0.997 reported in Table 2. As far as I can see, the numbers presented are inconsistent. This may be due to some methodological error in some of the calculations and this is not acceptable.
Response 3: Thank you for your useful comment. We agree with your observation and have taken steps to address it.

To provide clarity and consistency in our results, we have implemented a constant weight-balancing approach for both training and testing sets across all pre-trained models. This adjustment aims to eliminate any confusion related to the presentation of the confusion matrix figures.

Additionally, we have utilized the training-test split methodology as detailed in Table 1 to ensure accurate representation. After conducting the Python simulations once more, we have successfully resolved all issues related to the confusion matrices in Figure 7, as well as Figures 10 through 14.
Comments 4: The metrics used (apart from the confusion matrix and accuracy) are not clearly defined. In particular, what is a "True Positive" in a multi-class classification problem? More specifically: if an image belongs to Class A and is classified as belonging to Class B, it makes sense to classify it as a "False Positive" when looking at it from the perspective of Class B, and a "False Negative" when looking at it from the perspective of Class A. Even more, how would you classify that image from the perspective of a third class (Class C)? If it was classified as a "True Negative", given that it is not from Class C and was not classified in it, that would overestimate the performance of the classifier, as the classification would be counted as correct when it was not. There are ways to use these metrics for multi-class classification, but they all have caveats. Either define what you are doing clearly or stick to using only accuracy (percentage of correctly classified images) and confusion matrices.
Response 4: Thank you for your feedback. In the revised version, we have defined the terms "true positive," "false positive," and "false negative" in lines 325 to 328.
Comments 5: I am not sure what the training and validation accuracy and loss graphs (Figures 8 and 9) bring to the discussion. As these are not directly related to the practical performance of the network, they are more likely to confuse readers, and the text does not use them to make any convincing point; please consider eliminating them. From my point of view, the flat curves from some epoch onward likely indicate that your patience parameter was too high.
Response 5: Thank you for your comments. We have included the training and validation accuracy and loss graphs in the results section, as shown in lines 382-388. These graphs are essential tools for optimizing, evaluating, and understanding the behavior of deep learning models in image classification tasks.
Comments 6: Some of the classification results obtained are a bit strange. For example, the big jump from ResNet50 to other networks is pretty surprising. This does not mean the paper is wrong, and unexpected results may be interesting, but the inconsistencies in the numbers make me wary. Please improve the rigor of your experiments and try to find a reason why you observe the numbers that you do. In this respect, maybe exploring some recent reviews in the area of deep learning in forestry may help you get some context. While number-by-number comparison is always difficult and not always meaningful, adding this context to the discussion may help the readers assess the strength of your work.
Response 6: Thank you for your feedback. We have provided an explanation for why EfficientNetV2 outperforms other models and why ResNet50's performance is comparatively lower. In lines 427-443, we have delved into the underlying reasons and mechanisms that contribute to the superior performance of EfficientNetV2 in comparison to other models. Additionally, we have incorporated relevant literature references within the text.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thank you for addressing the reviewers' comments. Please consider the following comments.

L 429-445 (revised manuscript): Please cite relevant literature related to the information provided. 

Author Response

1. Summary

We thank reviewer 2 for the useful comment. Please find the detailed responses below and the corresponding revisions in the re-submitted files.

2. A point-by-point response to Comments and Suggestions for Authors

Comments 1: L 429-445 (revised manuscript): Please cite relevant literature related to the information provided.

Response 1: Thank you for your comment. We have taken your feedback into consideration and included the corresponding literature in the revised version, specifically in lines 429-445. Your input is greatly appreciated.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have addressed my prior concerns.

The figure captions have been notably improved. However, I recommend further refining them to be concise and direct. Taking Figure 5 as an example:

Instead of "Figure 5 shows the result of ground truth map-making. The various plant species classes identified in the image are indicated by color as explained in the above legend. This figure illustrates the plant mapping prepared using eCognition software."

Consider:
"Figure 5: Ground truth plant mapping using eCognition software, with distinct plant species classes color-coded according to the provided legend."

I advise the authors to revise all captions similarly.

English should be written carefully. There are several grammatical errors in the text. For example, in line 72: "However, each dataset was collected from the homogeneity plant species area." This phrase is grammatically awkward. Is the word "homogenous"?

The entire document should be reviewed for similar errors.

Author Response

1. Summary

We thank reviewer 3 for the useful comment. Please find the detailed responses below and the corresponding revisions in the re-submitted files.

2. A point-by-point response to Comments and Suggestions for Authors

Comments 1: The figure captions have been notably improved. However, I recommend further refining them to be concise and direct. Taking Figure 5 as an example:

Instead of "Figure 5 shows the result of ground truth map-making. The various plant species classes identified in the image are indicated by color as explained in the above legend. This figure illustrates the plant mapping prepared using eCognition software."

Consider:
"Figure 5: Ground truth plant mapping using eCognition software, with distinct plant species classes color-coded according to the provided legend," I advise the authors to revise all captions similarly.

Response 1: Thank you for your comment. We have taken your advice into account and have incorporated the necessary changes in the revised version.

 

Comments 2: English should be written carefully. There are several grammatical errors in the text. For example, in line 72: "However, each dataset was collected from the homogeneity plant species area." This phrase is grammatically awkward. Is the word "homogenous"?

Response 2: Thank you for your feedback. We have implemented the suggested corrections, and additionally, we have had the manuscript reviewed by a professional English reviewer during the first round of review. We are pleased to provide you with the attached certificate as evidence of the review.

 

 

 

Author Response File: Author Response.pdf

Reviewer 4 Report

The authors have addressed my concerns well.

Author Response

We thank reviewer 4 for the useful comment. 

Reviewer 6 Report

Accept

Author Response

We thank reviewer 6 for the useful comment. 
