Article
Peer-Review Record

Transferring Face Recognition Techniques to Entomology: An ArcFace and ResNet Approach for Improving Dragonfly Classification

Appl. Sci. 2025, 15(13), 7598; https://doi.org/10.3390/app15137598
by Zhong Li 1,2,3, Shaoyan Pu 1,2,3, Jingsheng Lu 1,2,3, Ruibin Song 4, Haomiao Zhang 1,3,4,*, Xuemei Lu 1,2,3,5,* and Yanan Wang 1,2,3,*
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 29 May 2025 / Revised: 25 June 2025 / Accepted: 28 June 2025 / Published: 7 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article presents two computer vision algorithms (namely face recognition) for identifying dragonflies. The comments below are with respect to clarity and methodological rigor of the article.

1) The article is somewhat unclear regarding its principal contribution. Is this in terms of the novelty of the application area in entomology? The methodological adaptations are relatively minor.

2) The reasons for choice of the particular two recognition algorithms should be elucidated further, especially in the presence of several other similar algorithms.

3) Sensitivity with respect to other train and test ratios should be investigated.

4) In several places: the authors mention "two" data sets but enumerate three.

5) The relevance of the accuracy metric and its implications should be elaborated on further. What does 97% imply for entomology? Are partial matches or recognitions given any credit? Would elaborating false positives and false negatives separately not provide greater accuracy-related detail for the reader?

Author Response

Comments 1:  

The article is somewhat unclear regarding its principal contribution. Is this in terms of the novelty of the application area in entomology? The methodological adaptations are relatively minor.

Thank you for the reviewer's valuable comments.

We clarify that the core contribution is the novel application of face recognition techniques (specifically the ArcFace loss) to dragonfly classification, promoting the development of computer vision in entomology. While the backbone (ResNet50) is established, the integration of ArcFace for similarity-based classification of species with subtle morphological differences (e.g., Copera ciliata and Copera marginipes) is methodologically innovative for entomology. This approach not only improves accuracy but also enables quantitative similarity analysis between species, offering new insights for biodiversity studies. We have revised the Introduction (Section 1) and Conclusion to emphasize this application-driven novelty. These changes can be found on page 2, lines 71–76, and page 13, lines 384–394.

Comments 2:

The reasons for choice of the particular two recognition algorithms should be elucidated further, especially in the presence of several other similar algorithms.

Thank you for the reviewer's valuable comments.

  • ArcFace’s advantages: ArcFace introduces an angular margin, which excels in separating classes in closed-set tasks. This is critical for distinguishing dragonfly species with subtle morphological differences (e.g., Figure 1).
  • ResNet’s robustness: ResNet, as a classic residual network, has been extensively validated for robust feature extraction. Its open-source implementations (e.g., InsightFace) also facilitated rapid experimentation.
  • Necessity of comparative experiments: While other algorithms (e.g., CosFace, SphereFace) exist, ArcFace has been shown to outperform them in face recognition (Reference 30). Our experiments further confirm its superiority in entomological classification (Table 2).
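As an aside, the angular-margin idea behind ArcFace described above can be sketched in a few lines. This is a minimal NumPy illustration under assumed default hyperparameters (s = 30, m = 0.5), not the InsightFace implementation used in the paper:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=30.0, m=0.5):
    """Sketch of the ArcFace margin: add an angular margin m to the
    target-class angle theta before scaling by s. Embeddings and class
    centers (columns of `weights`) are L2-normalized first."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(emb @ w, -1.0, 1.0)   # cosine similarity to each class center
    theta = np.arccos(cos)
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m  # margin only on the true class
    return s * np.cos(theta + margin)

# Toy example: 2 samples, 3 classes, 4-dim embeddings (illustrative values)
rng = np.random.default_rng(0)
logits = arcface_logits(rng.normal(size=(2, 4)), rng.normal(size=(4, 3)), [0, 2])
```

Because the margin shrinks the target-class logit, training must pull same-class embeddings closer together to compensate, which is what tightens the decision boundary between visually similar species.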

 

Comments 3:

Sensitivity with respect to other train and test ratios should be investigated.

Thank you for the reviewer's valuable comments.

We have added experiments with alternative train/val/test ratios (6:2:2 and 8:1:1) in the revised manuscript. The results show that the model's conclusions remain largely unchanged across different proportions, demonstrating strong robustness (see Table 4 in the revised version). This addition validates the model's adaptability to different data splits. These changes can be found on page 9, lines 251–252, and page 10, lines 262–267.

Comments 4:

In several places: the authors mention "two" data sets but enumerate three.

Thank you for identifying this inconsistency. We have uniformly updated all references from "two datasets" to "three datasets" (e.g., abstract, introduction, and methodology sections).

Comments 5:

The relevance of the accuracy metric and its implications should be elaborated on further. What does 97% imply for entomology? Are partial matches or recognitions given any credit? Would elaborating false positives and false negatives separately not provide greater accuracy-related detail for the reader?

Thank you for raising this critical question. We have expanded our analysis of the accuracy metric in the revised manuscript and added Precision (P), Recall (R), and F1-Score (Table 2) to clarify model performance:

  • Implications of 97% Top1 Accuracy:
    • Biodiversity Monitoring: A 97% Top1 accuracy ensures efficient identification of most dragonfly species, which is vital for large-scale ecological surveys (e.g., rapid censuses). For example, in Data1, ResNet50+ArcFace achieves 98.8% Top1 accuracy at a confidence threshold of 0.95, significantly outperforming ResNet50’s 93.7% and demonstrating robustness under high-quality data.
    • Ecological Health Assessment: High accuracy reduces manual verification costs and accelerates species distribution dynamics analysis, providing critical data for conservation.
  • Partial Matches and Threshold Analysis:
    • Top1-t Metrics: We introduced Top1-t metrics (Table 2: Top1-0.7 to Top1-0.95) to quantify performance across confidence thresholds. For instance, in Data3, ResNet50+ArcFace maintains 99.6% Top1 accuracy at 0.95 confidence, highlighting its robustness for high-confidence predictions.
    • Implicit Value of Partial Matches: Top5 accuracy (e.g., 97.8% in Data1) indirectly reflects tolerance for suboptimal predictions, which is valuable for auxiliary classification of morphologically similar species (e.g., Lamelligomphus spp.).
  • False Positives (FP) and False Negatives (FN) Analysis:
    • Precision and Recall: To better evaluate the algorithm, we conducted a statistical analysis of False Positives (FP), False Negatives (FN), and True Negatives (TN). Based on these values, we calculated Precision (P) = TP / (TP + FP), Recall (R) = TP / (TP + FN), and the F1-Score = 2 * (P * R) / (P + R), which have been added to Table 2. Table 2 explicitly distinguishes FP and FN impacts via Precision (P) and Recall (R). For example, in Data3, ResNet50+ArcFace improves Precision from 0.636 to 0.66 and Recall from 0.574 to 0.600 compared to ResNet50, indicating reduced misclassification and missed detections. This change can be found on page 9, lines 251–252.
  • F1-Score as a Balanced Metric:
    • The F1-Score balances Precision and Recall, making it ideal for practical entomological classification. For instance, in Data2, ResNet50+ArcFace achieves an F1-Score of 0.857, surpassing ResNet50's 0.843 and demonstrating a superior balance between reducing FP and FN. This change can be found on page 9, lines 251–252.
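The metric definitions above (P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)) follow directly from raw counts and can be sketched as below; the counts in the example are illustrative, not values from Table 2:

```python
def prf1(tp, fp, fn):
    """Precision, Recall, and F1-Score from raw TP/FP/FN counts,
    guarding against zero denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative counts only: 9 true positives, 1 false positive, 3 false negatives
p, r, f = prf1(tp=9, fp=1, fn=3)
```

Reporting all three together separates the cost of misclassifying other species as the target (FP, lowering Precision) from the cost of missing target specimens (FN, lowering Recall), which overall accuracy alone conflates.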

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Summary
The manuscript proposes to adapt face‐recognition methodologies—specifically ResNet50 coupled with the ArcFace loss—to the task of dragonfly species classification. The authors construct three datasets of varying balance and size (Data1, Data2, Data3), apply a YOLOv10‐based preprocessing pipeline to crop and clean images, and demonstrate that ResNet50+ArcFace outperforms standard ResNet50 by up to 1.6 % in Top‐1 accuracy. They further explore confidence‐threshold “Top1‑t” metrics and present an ablation study showing the model’s capacity to flag and correct deliberately mislabeled samples.

General Concept Comments
- It is interesting to move ArcFace to a new domain, but the backbone (ResNet50) and loss function (ArcFace) are both well-known. The manuscript would be better if it explained more clearly why this is a big step forward over other deep-learning classifiers in entomology, like ViT-based, ensemble, or few-shot methods.
- There aren't enough specifics about annotation tools, class balance during detection training, and exact augmentation procedures in the description of YOLOv10 training (lines 68–76).
- ArcFace hyperparameters (scale s, margin m) are not specified (Eq. 2, line 115).
- Random seeds and splitting protocols for train/validation/test (6:2:2) are not given, hindering reproducibility.
- All of the datasets are from iNaturalist and a small selection of expert-curated data. It is not obvious how the model works in areas that it hasn't seen before or with new imaging settings.
- Data3's imbalanced classes make evaluation harder than overall accuracy alone; you should also add measures like per-class recall/F1 or confusion matrices.

Specific comments
- The introduction (lines 27–32) mentions recent AI-based odonate categorisation research (e.g., Sun et al., 2021; Theivaprakasham et al., 2022) but does not examine them here. Include them sooner.
- Data Process: Indicate which annotation technique was used for the YOLOv10 bounding boxes and explain how experts judged photos that were mislabeled or had multiple dragonflies.
- Table 1: In the caption, specify Box(P), Box(R), and mAP50-95. To put precision and recall in context, also include the number of false positives and negatives.
- Lines 153–161 of the experimental setup should include the software versions (CUDA, PyTorch), the precise data-augmentation parameters, and the random seed(s) for dataset splits.
- Figure 5: Legends for both models should be included, and the axes should be clearly labelled "Training Loss," "Training Accuracy," and "Validation Accuracy."
- Conclusions: Wall-clock time and iteration counts do not support the claim of "faster training convergence" for ArcFace; please supply quantitative training-time data. Additionally, the results are repeated in the last paragraph; think about concentrating more on the wider ramifications and possible future paths (e.g., extension to other insect orders).

Author Response

Comments 1

- It is interesting to move ArcFace to a new domain, but the backbone (ResNet50) and loss function (ArcFace) are both well-known. The manuscript would be better if it explained more clearly why this is a big step forward over other deep-learning classifiers in entomology, like ViT-based, ensemble, or few-shot methods.

Thank you for the reviewer's valuable comments. Although ResNet50 and ArcFace have been widely used in face recognition, the novelty of this study lies in the innovative application of face recognition technology (particularly the ArcFace loss function) to dragonfly classification, achieving a significant improvement in the identification of dragonfly species with high inter-class similarity (as shown in Table 2, Top1 accuracy improves by up to 1.6%).

Our objective is to demonstrate that face recognition algorithms are more suitable for dragonfly species classification than traditional classification algorithms. Since the feature extraction method in face recognition algorithms employs the backbone of classification algorithms, our study primarily aims to prove that ResNet50+ArcFace outperforms ResNet50 alone. Subsequently, other classification algorithm backbones combined with ArcFace (e.g., ViT backbone+ArcFace), or backbones from other classifiers combined with ArcFace, could be further explored.

Comments 2

- There aren't enough specifics about annotation tools, class balance during detection training, and exact augmentation procedures in the description of YOLOv10 training (lines 68–76).

Thank you for the reviewer's valuable comments. The annotation process was performed using the CVAT (Computer Vision Annotation Tool) platform. For training the object detection model, only the 'dragonfly' category was utilized, thereby eliminating potential class imbalance issues. The YOLOv10 implementation employed in this study corresponds to the official codebase provided in the original YOLOv10 paper, incorporating the default data augmentation techniques specified in the code. This change can be found on page 4, lines 105–111.

 

Comments 3

- ArcFace hyperparameters (scale s, margin m) are not specified (Eq. 2, line 115).

Thank you for the reviewer's valuable comments.

The scaling factor s and the margin m use the default values, s = 30 and m = 0.5; this has been added to the paper. This change can be found on page 5, lines 144–145.

 

Comments 4

- Random seeds and splitting protocols for train/validation/test (6:2:2) are not given, hindering reproducibility.

Thank you for the reviewer's valuable comments.

The training, validation, and test sets were randomly divided in a 6:2:2 ratio within each class. This change can be found on page 8, lines 214–215.
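A per-class (stratified) 6:2:2 split of this kind can be sketched as follows; the seed and data structure are illustrative assumptions, not the exact protocol used in the paper:

```python
import random

def split_per_class(items_by_class, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle each class independently and split it 6:2:2 into
    train/val/test, so every class appears in all three sets."""
    train, val, test = [], [], []
    rng = random.Random(seed)  # illustrative seed, not the paper's
    for cls, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n = len(items)
        a = int(n * ratios[0])
        b = a + int(n * ratios[1])
        train += [(cls, x) for x in items[:a]]
        val   += [(cls, x) for x in items[a:b]]
        test  += [(cls, x) for x in items[b:]]
    return train, val, test
```

Splitting within each class (rather than over the pooled dataset) keeps the class proportions identical across the three sets, which matters when some species have few images.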

 

Comments 5

- All of the datasets are from iNaturalist and a small selection of expert-curated data. It is not obvious how the model works in areas that it hasn't seen before or with new imaging settings.

Thank you for the reviewer's valuable comments.

Our research topic focuses on the species classification of dragonflies based on visible-light images, and data collection was likewise conducted with visible-light images. Validating performance in unseen regions or under new imaging settings would require collecting additional data.

Comments 6

- Data3's imbalanced classes make things harder than just getting the right answer; you need also add measures like per-class recall/F1 or confusion matrices.

Thank you for the reviewer's valuable comments.

Data imbalance is a challenging issue in deep learning, and while some methods exist to mitigate this problem, addressing data imbalance is not the focus of our research; therefore, no specific measures have been taken to alleviate it. In terms of evaluation metrics, we have included Precision, Recall, and F1-score in the paper. This change can be found on page 9, lines 251–252.

 

Specific comments

Comments 7

- The introduction (lines 27–32) mentions recent AI-based odonate categorisation research (e.g., Sun et al., 2021; Theivaprakasham et al., 2022) but does not examine them here. Include them sooner.

Thank you for the reviewer's valuable comments.

We have added a brief review and comparison of the works by Sun et al. (2021) and Theivaprakasham et al. (2022) in the early part of the introduction to better position the research value of this paper. This change can be found on page 2, lines 37–50.

Comments 8

- Data Process: Indicate which annotation technique was used for the YOLOv10 bounding boxes and explain how experts judged photos that were mislabeled or had multiple dragonflies.

Thank you for the reviewer's valuable comments.

We used the CVAT annotation tool to annotate bounding boxes for dragonflies. During annotation, all dragonflies are labeled with bounding boxes, and their category is uniformly labeled as "dragonfly." For photos containing multiple dragonflies, each dragonfly is individually annotated with a separate bounding box. This change can be found on page 4, lines 105–111.

Comments 9

- Table 1: In the caption, specify Box(P), Box(R), and mAP50-95. To put precision and recall in context, also include the number of false positives and negatives.

Thank you for the reviewer's valuable comments.

The number of false positives and false negatives has been added to Table 1, and the Precision (P) and Recall (R) values have been calculated. This change can be found on page 4, line 112.

 

Comments 10

- Lines 153–161 of the experimental setup include the software versions (CUDA, PyTorch), the precise data-augmentation parameters, and the random seed(s) for dataset splits.

Thank you for the reviewer's valuable comments.

The software versions are CUDA 12.4, Python 3.10, and PyTorch 2.6.0. This change can be found on page 7, lines 181–184.

Data augmentation uses random horizontal flipping, random vertical flipping, color jittering, and random affine transformation. This change can be found on page 7, lines 196–201.
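A minimal NumPy sketch of these four augmentations on an HxWxC image array is shown below; the flip probabilities, jitter range, and translation-only affine are illustrative assumptions, not the exact parameters used in the paper:

```python
import numpy as np

def augment(img, rng):
    """Sketch of the listed augmentations on an HxWxC float array:
    random horizontal/vertical flips, a simple brightness jitter
    standing in for color jittering, and a small random translation
    standing in for the random affine transform."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                               # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                               # vertical flip
    img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)   # brightness jitter
    dx = int(rng.integers(-4, 5))                        # shift by up to 4 px
    img = np.roll(img, dx, axis=1)                       # translation (affine stand-in)
    return img

rng = np.random.default_rng(0)  # illustrative seed
sample = augment(np.full((8, 8, 3), 100.0), rng)
```

In practice such transforms are applied on the fly each epoch, so the model rarely sees the exact same pixels twice.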

The training, validation, and test sets were randomly divided in a 6:2:2 ratio within each class. This change can be found on page 8, lines 214–215.

 

Comments 11

- Figure 5: Legends for both models should be included, and the axes should be clearly labelled "Training Loss," "Training Accuracy," and "Validation Accuracy."

Thank you for the reviewer's valuable comments.

We appreciate the comment and have revised the paper accordingly. This change can be found on page 8, line 216.

 

Comments 12

- Conclusions: Wall-clock time and iteration counts do not support the claim of "faster training convergence" for ArcFace; please supply quantitative training-time data. Additionally, the results are repeated in the last paragraph; think about concentrating more on the wider ramifications and possible future paths (e.g., extension to other insect orders).

Thank you for the reviewer's valuable comments.

We agree that the statement "ArcFace training converges faster" was imprecise: ArcFace converges faster in terms of the number of training iterations. Our conclusion is based on iteration counts, not wall-clock time, and the text has been revised accordingly. This change can be found on page 8, lines 222–225.

The Conclusions of the paper have been updated. This change can be found on page 13, lines 384–394.

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Transferring Face Recognition Techniques to Entomology: An ArcFace and ResNet Approach for Improving Dragonfly Classification

Summary

This paper proposes a novel application of facial recognition techniques using ArcFace and ResNet50 for fine-grained dragonfly species classification. The authors employ three unique datasets and demonstrate that their adapted model outperforms a conventional ResNet50 classifier, especially under varying data distribution scenarios. The paper is well-organized and innovative with real-world implications for biodiversity research and ecological monitoring.

General 

  • The paper offers an innovative interdisciplinary application that adapts human face recognition models to species classification in entomology. 
  • The experimental design is well done, with ablation studies and performance analysis across multiple datasets. However, the paper would benefit from more clarity on novelty along with a discussion on limitations and generalizability. 
  • While the results are promising, slight improvements to the presentation of figures and threshold-related results can elevate this manuscript further.

Article 

  1. The experimental setup is sound, with consistent metrics and datasets to compare performance.
  2. There is no discussion of limitations or generalizability.
  3. The ablation study on mislabeled data is well-executed, illustrating the algorithm’s robustness.

Review

  1. The literature review is thorough and impactful.
  2. The authors identify a gap in the lack of confidence-based predictions in conventional models and propose an alternative.
  3. The comparison to existing work in species identification (e.g., CNN, ViT, etc.) could improve the value of the contribution.
  4. Figures like Figure 6 and 7 are too dense - breaking them down into subfigures can improve readability.

Specific

  • Figure 5: Improve font sizes and axis labels for better readability.
  • Lines 273–278: Including a brief discussion on the impact of dataset imbalance on performance could be useful here.

Author Response

Review

  1. The literature review is thorough and impactful.

Thank you for the reviewer's valuable comments.

 

  2. The authors identify a gap in the lack of confidence-based predictions in conventional models and propose an alternative.

Thank you for the reviewer's valuable comments.

 

  3. The comparison to existing work in species identification (e.g., CNN, ViT, etc.) could improve the value of the contribution.

Thank you for the reviewer's valuable comments. The novelty of this study lies in the innovative application of face recognition technology (specifically the ArcFace loss function) to dragonfly classification, achieving significant improvements in the identification of dragonfly species with high inter-class similarity (as shown in Table 2, Top1 accuracy improved by up to 1.6%). Our goal is to validate that face recognition algorithms are more suitable for dragonfly species classification than traditional classification algorithms. Since the feature extraction method in face recognition algorithms utilizes the backbone of classification algorithms, our study primarily demonstrates that ResNet50+ArcFace outperforms ResNet50 alone. Subsequently, other CNN classification backbones combined with ArcFace, or other classifier backbones combined with ArcFace (e.g., ViT backbone + ArcFace), can be employed.

 

  4. Figures like Figure 6 and 7 are too dense - breaking them down into subfigures can improve readability.

Thank you for the reviewer's valuable comments. The relevant content has been revised. These changes can be found on page 10, line 267, and page 11, line 289.

 

Specific

  • Figure 5: Improve font sizes and axis labels for better readability.

Thank you for the reviewer's valuable comments. The relevant content has been revised. This change can be found on page 8, line 216.

 

  • Lines 273–278: Including a brief discussion on the impact of dataset imbalance on performance could be useful here.

Thank you for the reviewer's valuable comments. The relevant content has been revised. This change can be found on page 13, lines 344–361.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

No additional comments.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors addressed all my concerns.

I have no further comments.
