Detection and Recognition of Drones Based on a Deep Convolutional Neural Network Using Visible Imagery
Round 1
Reviewer 1 Report
The subject is current, and the paper demonstrates fluency in applying ML methods to the problem at hand. However, it falls a bit short when it comes to framing the problem itself and to substantiating the claim.
Problem statement. I started reading the paper thinking that the problem lies in detecting drones (i.e. the problem is binary: there is a drone - maybe of a particular type - or there is not). It turned out that the problem is to detect the drone (of one of two types) or a bird, i.e. the problem researched is different from the problem discussed e.g. in the literature review. I appreciate that making the distinction between birds and drones is one of the more challenging tasks, but nevertheless it is not what has been promised. I would also suggest reviewing the lit review from this perspective, to see whether the paper compares - so to say - pears with pears.
The same goes for the fact that this seems to be a study analysing stationary images, not video feeds. It would be beneficial to structure the lit review into stationary and video sections, as video provides both challenges and advantages over stationary images. This will also make it easier to verify the claim of this solution being better than the known ones.
Comparing results in ML is tricky, as there are at least two factors that may influence results: the available dataset and the chosen class of model, with the parameters that affect its operation. Thus, if the claim is that the network is better, I would always prefer having identical datasets. Here, I do not know how much of the gain can be attributed e.g. to better data preparation.
Note that the claim of being better than other solutions is actually not properly substantiated, either in the discussion or in the conclusions. The paper only states that this solution solved the challenge, which may be true enough. So, if this is better than competing solutions, I would like to know what solution forms the baseline and to what extent this one is better.
Dataset. If the challenge is to detect drones, then the dataset should contain images of drones under different conditions and over different backgrounds, together with images where there are no drones. I am missing a thorough explanation of the rationale for leaving aside images with neither of them. I am also not sure whether the weather element has been properly factored in, e.g. inclement weather, fog etc. I can see from the attached examples that various backgrounds were used, which is always good.
Bounding box. This is a mundane element of the data flow, but an important one. My question is whether the bounding box for the ground truth was established manually or automatically. Both methods are feasible (yet not easy), but they may lead to different results.
As for references it may be worth looking at https://doi.org/10.3390/s21082824.
From the editorial perspective, the paper is readable, but the use of some words and the order of words may sometimes confuse the reader.
Author Response
We appreciate the reviewer’s comments. The following are our point-by-point responses:
- The subject is current, and the paper demonstrates fluency in applying ML methods to the problem at hand. However, it falls a bit short when it comes to framing the problem itself and to substantiating the claim.
Response: Thank you for your constructive comments. In response to your suggestion, the section "Drone Detection and Recognition Challenges" has been added on line 85 to provide more detail on the challenges in this area.
--------------------------------------------------------------------------------------------------------------------------
- Problem statement. I started reading the paper thinking that the problem lies in detecting drones (i.e. the problem is binary: there is a drone - maybe of a particular type - or there is not). It turned out that the problem is to detect the drone (of one of two types) or a bird, i.e. the problem researched is different from the problem discussed e.g. in the literature review. I appreciate that making the distinction between birds and drones is one of the more challenging tasks, but nevertheless it is not what has been promised. I would also suggest reviewing the lit review from this perspective, to see whether the paper compares - so to say - pears with pears.
Response: In line with your comment, these points have been corrected in lines 17 to 24. To remove the ambiguity for readers, the task of detecting and recognizing the two types of drones and distinguishing them from birds has also been stated in lines 36, 40, and 47.
--------------------------------------------------------------------------------------------------------------------------
- The same goes for the fact that this seems to be a study analysing stationary images, not video feeds. It would be beneficial to structure the lit review into stationary and video sections, as video provides both challenges and advantages over stationary images. This will also make it easier to verify the claim of this solution being better than the known ones.
Response: In addition to stationary images, this article uses images obtained by converting video collections to frames at a rate of 2 FPS. The purpose is to complete the dataset with all types of challenging drone images and to train the network properly. In line with your comment, this point was added to the improved version of the article on line 399.
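As a rough illustration of the 2 FPS conversion mentioned in this response, the sketch below computes which frame indices of a video would be kept. It is not the authors' code: the helper name frame_indices_at_rate, its parameters, and the example clip length and source frame rate are all assumptions made for illustration, and reading the actual frames from a video file is left out.

```python
def frame_indices_at_rate(total_frames, source_fps, target_fps=2.0):
    """Return indices of frames to keep so that roughly `target_fps` frames
    are retained per second of video (hypothetical helper, not the authors' code)."""
    if total_frames < 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be non-negative and rates positive")
    # Number of source frames between two kept frames; never below 1,
    # because a video cannot be sampled faster than it was recorded.
    step = max(source_fps / target_fps, 1.0)
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


if __name__ == "__main__":
    # Assumed example: a 3-second clip recorded at 30 FPS keeps 6 frames at 2 FPS.
    print(frame_indices_at_rate(90, 30.0))  # [0, 15, 30, 45, 60, 75]
```

Sampling at a low rate like this keeps consecutive training images from being near-duplicates, which is one plausible reason for choosing 2 FPS; the response itself does not state the motivation for that particular value.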
--------------------------------------------------------------------------------------------------------------------------
- Comparing results in ML is tricky, as there are at least two factors that may influence results: the available dataset and the chosen class of model, with the parameters that affect its operation. Thus, if the claim is that the network is better, I would always prefer having identical datasets. Here, I do not know how much of the gain can be attributed e.g. to better data preparation.
Note that the claim of being better than other solutions is actually not properly substantiated, either in the discussion or in the conclusions. The paper only states that this solution solved the challenge, which may be true enough. So, if this is better than competing solutions, I would like to know what solution forms the baseline and to what extent this one is better.
Response: The basis of our study was the success rate of the work, and we did not claim that the results were better. In this study, for the first time, the anatomy of 4 types of multirotors (quadrotor, hexarotor +, octo coax wide and octorotor +) was studied together with helicopters, along with their discrimination from birds, without identifying their commercial models. Moreover, the dataset used is unique, and we have not seen a similar article before: not only were stationary images collected, but various videos were also converted into images to add more sophisticated images to the drone-vs-bird dataset.
--------------------------------------------------------------------------------------------------------------------------
- Dataset. If the challenge is to detect drones, then the dataset should contain images of drones under different conditions and over different backgrounds, together with images where there are no drones. I am missing a thorough explanation of the rationale for leaving aside images with neither of them. I am also not sure whether the weather element has been properly factored in, e.g. inclement weather, fog etc. I can see from the attached examples that various backgrounds were used, which is always good.
Response: Thank you for your constructive comments. As we mentioned at the beginning of the article, the challenges in drone detection and recognition include the small size of the drone, the presence of drones in a crowded environment, varying light, and different weather conditions. However, so as not to mislead the reader, we have tried to include in Figure 17 examples where drone detection and recognition is clearer.
In line with your comment, the following items have been added in the improved version of the article.
- The challenge related to weather conditions has been added in line 144.
- We have changed Figures 1, 2, 3, 4, and 17 by adding more complex images to better demonstrate the ability of the trained model to detect drones and birds.
- We have presented more sophisticated images in Figure 18 and described them in line 520.
-------------------------------------------------------------------------------------------------------------------------------
- Bounding box. This is a mundane element of the data flow, but an important one. My question is whether the bounding box for the ground truth was established manually or automatically. Both methods are feasible (yet not easy), but they may lead to different results.
Response: The Computer Vision Annotation Tool (CVAT) offers two methods for drawing bounding boxes: manual and automatic. The manual mode is accurate but time-consuming. The automatic mode speeds up the drawing of bounding boxes, but in some cases the boxes are not drawn accurately around the object. Therefore, to benefit from both the speed of the automatic mode and the accuracy of the manual mode, we used a semi-automatic method. First, we tracked the object using the automatic method, and then we manually adjusted the bounding boxes drawn in that step. The semi-automatic method performs better than either the manual or the automatic method alone because it yields the best-fitting bounding box.
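One way to quantify how much the manual step changes an automatically tracked box is the intersection over union (IoU) between the two boxes. The sketch below is only an illustration under stated assumptions: it does not use the CVAT API, the (x_min, y_min, x_max, y_max) box format, the function names, and the 0.9 review threshold are hypothetical, and the response does not say that the authors measured this.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max). The box format is an assumption."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def needed_large_correction(auto_box, manual_box, threshold=0.9):
    """Flag frames where the manually adjusted box differs noticeably from the
    automatically tracked one. The 0.9 threshold is an illustrative assumption."""
    return iou(auto_box, manual_box) < threshold


if __name__ == "__main__":
    tracked = (100, 100, 160, 140)    # box proposed by automatic tracking
    corrected = (104, 102, 158, 140)  # box after manual adjustment
    print(round(iou(tracked, corrected), 3), needed_large_correction(tracked, corrected))
```

A summary of such IoU values over the dataset would show how often the automatic mode needed substantial correction, which speaks directly to the reviewer's point that manual and automatic ground truth may lead to different results.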
-------------------------------------------------------------------------------------------------------------------------------
- As for references it may be worth looking at https://doi.org/10.3390/s21082824.
Response: Thank you for introducing this article. We have reviewed and referenced this article in Related Work on line 192.
--------------------------------------------------------------------------------------------------------------------------
- From the editorial perspective, the paper is readable, but the use of some words and the order of words may sometimes confuse the reader.
Response: Following your comment, we have re-examined the text of the article and corrected some vague and confusing terms.
Author Response File: Author Response.pdf
Reviewer 2 Report
The automatic recognition of drones is a topic of great interest at a time when these remote-controlled aircraft have become widespread thanks to their ease of use and low prices. More and more often we hear about unauthorized access to closed areas and dangerous situations.
The manuscript addresses a current theme, and proposes and analyzes a new approach starting from the current state of research.
The manuscript may be of interest to readers and I have highlighted only a few small observations. I hope that in the future there will be further insights into more complete scenarios (for example fixed-wing drones, which can be even more dangerous and for which quick, automatic localization is therefore important).
341: 4.1. Data Acquisition and Model Implementation
It would be useful to know the systems and methods of image acquisition.
346: Multi-rotors include 4 types as Quadrotor, Hexarotor +, Octo Coax Wide, and Octorotor….
It would be useful to specify if, for each type of drone (quadcopter, hexacopter, etc.), the dataset includes different commercial models. If yes, how many.
395: ....detection and recognition of the 2 types of drones and birds.
Did you include several bird species in the dataset? Some brief information on this aspect would also be useful.
423: 4.3. Model Evaluation in Addressing the Challenges
Have you estimated the applicability thresholds of the method? For example, what minimum dimensions are usable to have an acceptable error? What minimum brightness? etc. It would be useful to have some quantitative information on the limits if you have measured them, or at least more observations.
Have you thought about using the method also with fixed-wing drones? A few comments would still be interesting.
Have you done any tests in realtime applications? Some more detailed comments would be interesting.
Author Response
We appreciate the reviewer’s comments. The following are our point-by-point responses:
- It would be useful to know the systems and methods of image acquisition in the Data Acquisition and Model Implementation section.
Response: Thank you for your constructive comments. This section was added in the improved version of the article on line 392.
-------------------------------------------------------------------------------------------------------------------------------
- It would be useful to specify if, for each type of drone (quadcopter, hexacopter, etc.), the dataset includes different commercial models. If yes, how many.
Response: The case you mention is essentially about drone identification, in which each of the 4 types of multirotor is identified by its commercial model. However, this study addressed the detection and recognition of drones and birds; identification will be explored in future work. To enhance the drone dataset for the training process, we also included commercial examples for each multirotor type (quadrotor, hexarotor +, octo coax wide, and octorotor +).
In line with your comment, the following items have been added in the modified version.
- Multiple examples have been added for each multirotor type in Figure 12.
- The counts for the 4 types of multirotors, helicopters, and birds are given in line 408.
- Line 47 introduces the topic of detection and recognition in this article.
- Line 583 describes future research to identify different types of drones and distinguish them from birds.
-------------------------------------------------------------------------------------------------------------------------------
- Did you include several bird species in the dataset? Some brief information on this aspect would also be useful.
Response: In the training phase, we used a dataset of 1166 flying birds, ranging in size from the smallest to the largest, such as an eagle. In line with your comment, different bird species were added to Figure 12.
-------------------------------------------------------------------------------------------------------------------------------
- Have you estimated the applicability thresholds of the method? For example, what minimum dimensions are usable to have an acceptable error? What minimum brightness? etc. It would be useful to have some quantitative information on the limits if you have measured them, or at least more observations.
Response: Thank you for your helpful comments. Yes, our model is able to detect all types of drones and birds at a minimum size of 15 x 30 pixels and a maximum size of 600 x 600 pixels, and there are no limits on detecting and recognizing drones at short and long distances. Also, in this article brightness alone is not the important criterion; what matters is the difference between the brightness of the object and that of the background. Thus, a bright object may be seen against a crowded background (or vice versa).
In line with your comment, this part was added in the improved version of the article in line 584.
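The two criteria in this response, object size and object-to-background brightness difference, can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the response does not say which side of the 15 x 30 minimum is width or height (the check below is orientation-agnostic), it does not name a contrast measure (a normalized difference of mean intensities is used), and the function names are hypothetical.

```python
def within_reported_size_range(width, height):
    """Check an object's pixel size against the range reported in the response
    (15 x 30 minimum, 600 x 600 maximum). Which side is width or height is not
    stated, so the shorter side is compared with 15 and the longer with 30."""
    short_side, long_side = sorted((width, height))
    return short_side >= 15 and long_side >= 30 and long_side <= 600


def mean_contrast(object_pixels, background_pixels):
    """Normalized difference between mean object and mean background intensity,
    in [0, 1] for non-negative intensities. This measure is an assumption;
    the response does not define how the brightness difference is computed."""
    mean_obj = sum(object_pixels) / len(object_pixels)
    mean_bg = sum(background_pixels) / len(background_pixels)
    total = mean_obj + mean_bg
    return abs(mean_obj - mean_bg) / total if total > 0 else 0.0


if __name__ == "__main__":
    print(within_reported_size_range(20, 40))                # inside the range
    print(within_reported_size_range(10, 30))                # too small
    print(mean_contrast([200, 210, 190], [50, 60, 40]))      # bright object, dark sky
    print(mean_contrast([120, 125, 115], [118, 122, 120]))   # object blends in
```

Reporting detection accuracy binned by values like these would give the kind of quantitative applicability limits the reviewer asks for.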
-------------------------------------------------------------------------------------------------------------------------------
- Have you thought about using the method also with fixed-wing drones?
Response: Yes, in future research we intend to detect and recognize fixed-wing and VTOL drones, in addition to multirotors and helicopters. We have included this point in line 584 of the Conclusions section.
-------------------------------------------------------------------------------------------------------------------------------
- Have you done any tests in real-time applications?
Response: Yes, we have done some work on real-time drone detection applications with onboard systems. However, since these results are beyond the scope of this paper, we intend to present them in separate research papers.
Accordingly, we have added this point at line 586 as future work.
-------------------------------------------------------------------------------------------------------------------------------
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I found your explanations and corrections satisfactory. Still, I would like you to consider, just for yourself, what would happen if you presented your algorithm with some pages from the Rorschach test. Would the network identify them as a drone, a bird or something else?
Possibly it is an inspiration for some new research :)