Peer-Review Record

Automated Synthesis of Hierarchical Deep Learning Cascades for Identifying Visually Similar Objects in UAV Imagery

Technologies 2026, 14(6), 360; https://doi.org/10.3390/technologies14060360 (registering DOI)

by Dmytro Borovyk ¹, Oleksander Barmak ¹, Pavlo Radiuk ¹,* and Iurii Krak ²,³

Reviewer 1: Anonymous

Reviewer 2: Serhii Semenov

Submission received: 29 April 2026 / Revised: 6 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026

(This article belongs to the Special Issue Advanced Technologies in Computer Vision and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

To address the challenge of accurately identifying similar objects in UAV imagery, this paper proposes a method for automatic detection using hierarchical deep learning cascades. Overall, the paper lacks innovation, and the experimental validation is insufficient. There are some issues that need to be addressed carefully.

There are currently many state-of-the-art (SOTA) CNN-based single-stage in-class object detection algorithms. What are the advantages of the multi-stage detection algorithm proposed in this paper?
The paper should include comparative experiments with SOTA object detection algorithms based on CNNs.
Ablation experiments for the proposed method should include qualitative evaluations to verify its effectiveness at different stages.
A diagram showing the overall framework of the proposed method should be provided to facilitate a clearer understanding.
Validation experiments using other datasets need to be included to demonstrate the superiority of the proposed method.
The formatting of some figure captions needs to be consistent.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript addresses a relevant and practically important problem: automated synthesis of hierarchical deep-learning cascades for identifying visually similar objects in UAV imagery. The proposed idea of combining latent-space proximity with empirical classifier confusion is potentially useful, especially for cases where semantic class taxonomies do not correspond to visual similarity in aerial images. The paper is generally well structured and contains a clear experimental narrative. However, in its current form, the manuscript requires substantial revision before it can be considered for publication.

1. The literature review could be further strengthened by incorporating recent studies devoted to feature-space optimization, imbalance-aware preprocessing, and lightweight intelligent classification pipelines operating under constrained computational conditions. In particular, the following recent work is highly relevant to the discussed problem because it investigates preprocessing of highly correlated and imbalanced data, classifier robustness, and adaptive machine-learning-based decision systems:

Semenov, S.; Krupska-Klimczak, M.; Czapla, R.; Krzaczek, B.; Gavrylenko, S.; Poltorazkiy, V.; Vladislav, Z. Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data. Applied Sciences, 2025, 15(8), 4243. https://doi.org/10.3390/app15084243

Additionally, the discussion could benefit from comparison with recent lightweight and feature-selection-oriented intelligent classification frameworks such as:

Kaushik, S.; Bhardwaj, A.; Almogren, A.; et al. Robust machine learning based intrusion detection system using simple statistical techniques in feature selection. Scientific Reports, 2025, 15, 3970. https://www.nature.com/articles/s41598-025-88286-9

Although these studies belong to the cybersecurity domain, they are methodologically relevant because they address highly correlated feature spaces, imbalance-aware optimization, lightweight architectures, and computational efficiency for Edge AI deployment — all of which are closely related to the objectives of the reviewed manuscript.

2. The novelty claim should be formulated more carefully. The manuscript presents the hybrid metric as the main contribution, but the distinction from existing approaches based on feature-space clustering, confusion-matrix-driven grouping, hierarchical classification, and automated taxonomy synthesis is not sufficiently demonstrated.

3. The experimental improvement over the monolithic YOLOv11s baseline is relatively small (approximately 0.8% in F1-score), while the FPS decreases from 50 to 41. Therefore, stronger statistical validation is required, including confidence intervals, statistical significance analysis, or a larger number of independent experimental runs.

4. The claim of fully automated hierarchy synthesis is weakened by the statement that the threshold τ was adjusted to obtain three hierarchy levels for comparison with the manually designed architecture from [1]. The manuscript should clarify whether τ is genuinely determined automatically or partially constrained to reproduce the expert-defined structure.

5. The confusion-based distance metric requires additional theoretical justification. Specifically, the normalization strategy in Equation (3) is insufficiently motivated, and the statement that class imbalance does not influence the correctness of the calculations is not convincingly supported.

6. The generalization experiment on UAV123 remains ambiguous. It is unclear whether the hierarchy generated on VisDrone2019 was transferred directly to UAV123 or whether a new hierarchy was synthesized specifically for UAV123. These two scenarios correspond to fundamentally different claims regarding transferability and adaptability.

7. Figures 3 and 5 resemble manually prepared schematic hierarchy illustrations rather than genuine HAC dendrograms. The manuscript would be considerably stronger if real dendrogram visualizations with linkage distances and cutting thresholds were presented.

8. The manuscript requires language polishing. Several expressions are grammatically incorrect or stylistically informal for a scientific publication, for example:

- “losses in FPS rate”;
- “FPS lose doesn’t seem really critical”;
- “what can be seen in comparison of Figure 6 and Figure 7”.
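
As an illustration of the statistical validation requested in comment 3, the following is a minimal sketch of a paired percentile-bootstrap confidence interval for the F1 gain of a cascade over a monolithic baseline across independent runs. The function name `paired_bootstrap_ci` and the per-run F1 values are hypothetical and chosen only for demonstration; they are not taken from the manuscript, and the authors may prefer a different test.

```python
import random
import statistics


def paired_bootstrap_ci(baseline, cascade, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (cascade - baseline).

    Each index is assumed to be one independent run (same seed/split) evaluated
    with both models, so the differences are paired.
    """
    if len(baseline) != len(cascade):
        raise ValueError("baseline and cascade must contain the same number of runs")
    diffs = [c - b for b, c in zip(baseline, cascade)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    low = means[round((alpha / 2) * n_resamples)]
    high = means[round((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), low, high


# Hypothetical per-run F1 scores (illustrative only, NOT from the manuscript).
baseline_f1 = [0.712, 0.708, 0.715, 0.710, 0.709]
cascade_f1 = [0.720, 0.715, 0.721, 0.719, 0.716]

mean_diff, ci_low, ci_high = paired_bootstrap_ci(baseline_f1, cascade_f1)
print(f"mean F1 gain = {mean_diff:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

If the resulting interval excludes zero, the reported gain is less likely to be run-to-run noise. With only a handful of runs the interval is coarse, which is one reason a larger number of independent runs is also requested.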

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper's innovation is insufficient to meet the journal's publishing requirements, so it is recommended that the authors submit it to another journal.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed the comments and answered the questions.

Author Response

Dear Reviewer,

We extend our sincere gratitude for your detailed and insightful review of our manuscript. Your highly constructive suggestions have been beneficial in enhancing the methodological rigor, clarity, and overall impact of our work.
