Peer-Review Record

Drone Detection and Tracking in Real-Time by Fusion of Different Sensing Modalities

Drones 2022, 6(11), 317; https://doi.org/10.3390/drones6110317
by Fredrik Svanström, Fernando Alonso-Fernandez * and Cristofer Englund
Submission received: 29 September 2022 / Revised: 14 October 2022 / Accepted: 19 October 2022 / Published: 26 October 2022

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Most comments from the first round were taken into account. Justifications were added in cases where no change in architecture was possible.

However, some parameters of the system are still not described, such as the number of components used in the GMM, and the system, process noise, and sensor noise matrices of the Kalman filters. Additionally, the acquisition or estimation of these parameters needs to be described to allow the transfer of the presented approach to similar problems.

The layout of the article is currently messed up, as the references are interrupted by pictures from the appendix.

The necessary changes should be easily integrated in a minor revision.

Author Response

Most comments from the first round were taken into account. Justifications were added in cases where no change in architecture was possible.

Thank you for the kind consideration of the paper!

 

However, some parameters of the system are still not described, such as the number of components used in the GMM, and the system, process noise, and sensor noise matrices of the Kalman filters. Additionally, the acquisition or estimation of these parameters needs to be described to allow the transfer of the presented approach to similar problems.

Response: We apologize for this. The method employed is described in two references from MathWorks [50], [51], where the initial values and the estimation of the parameters are fully defined. In the paper, we only highlighted the parameters that were changed from their defaults, and why, while all others were kept at their default values. For clarity, we now state this in the paper, and we also provide a description of the other relevant parameters and methods as requested (even if left at their defaults). Section 4.3 has been modified accordingly to include this information (changes highlighted in red). In particular (see also the sketch after the list):

  • The number of Gaussian modes in the mixture model is left at the default value of 5.
  • The initial location of an unassigned object is given by the centroid output by the foreground detector.
  • The initial state estimation error covariance matrix is built as the diagonal matrix [200 0; 0 200]. These values correspond to the initial estimate error of location and velocity, defined as [200, 200].
  • The process noise covariance matrix is built similarly, but from the tolerance (variance) for deviations in location and velocity, defined as [50, 50].
  • The measurement noise covariance is set to 100.
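
For illustration, a minimal MATLAB sketch of this configuration, assuming the Computer Vision Toolbox objects described in [50], [51] and the constant-velocity motion model implied by the location/velocity pairs above; the example centroid is a hypothetical value, and everything not listed stays at its toolbox default:

```matlab
% Sketch of the GMM foreground detector and per-object Kalman filter
% configuration listed above. Parameters not shown keep their defaults.
detector = vision.ForegroundDetector('NumGaussians', 5);  % default 5 modes

centroid = [320, 240];  % example initial location from the foreground detector
kalmanFilter = configureKalmanFilter( ...
    'ConstantVelocity', ...  % motion model
    centroid,           ...  % initial location of the unassigned object
    [200, 200],         ...  % initial estimate error (location, velocity)
    [50, 50],           ...  % process noise tolerance (location, velocity)
    100);                    % measurement noise covariance
```

configureKalmanFilter builds the diagonal covariance matrices internally from the (location, velocity) pairs, matching the [200 0; 0 200] form described above.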

 

The layout of the article is currently messed up, as the references are interrupted by pictures from the appendix.

Response: A page break has been introduced, so the references now appear together at the end.

 

The necessary changes should be easily integrated in a minor revision.

Reviewer 2 Report (Previous Reviewer 2)

I really appreciate the hard work of the authors. Since the authors have addressed all the comments in detail, I would like to recommend acceptance after minor changes to the manuscript. Minor comments are:

1. Figure 1: instead of writing "top left", "bottom left", and so on, divide the figure into parts (a), (b), and so on. Do the same in all figures, such as Figures 7, 9, 10, 13, 16, 19, 20, 22, 23, etc.

2. Line 306: where are the blue boxes?

3. The heading of Sec. 3.2.1 is similar to that of Sec. 2.1. In Section 2, the authors describe the literature in detail, which is unnecessary. I strongly recommend that the authors focus in Section 3 only on the present work in comparison to previous work. Hence, the length of the paper should be decreased for the sake of readers. I think the previous version of the manuscript consisted of 34 pages, while the current version has 39 pages (almost equivalent to a thesis). I have no objection to the length of the paper; this is your work and this is just my suggestion, because research students usually look for papers of at most 25 pages.

4. Give citations for each figure, as they have been taken from the thesis, and similarly give citations for each table.

 

Thank you.

Author Response

I really appreciate the hard work of the authors. Since the authors have addressed all the comments in detail, I would like to recommend acceptance after minor changes to the manuscript.

Thank you for the kind consideration of the paper!

 

Minor comments are:

 

  1. Figure 1: instead of writing "top left", "bottom left", and so on, divide the figure into parts (a), (b), and so on. Do the same in all figures, such as Figures 7, 9, 10, 13, 16, 19, 20, 22, 23, etc.

Response: The figures have been modified as requested, and the references to the figures in the text have been updated accordingly.

 

  2. Line 306: where are the blue boxes?

Response: It refers to the blue boxes at the bottom right of Figure 1a. We have added text to the figure to make this element (the GUI layout) easier to find.

 

  3. The heading of Sec. 3.2.1 is similar to that of Sec. 2.1. In Section 2, the authors describe the literature in detail, which is unnecessary. I strongly recommend that the authors focus in Section 3 only on the present work in comparison to previous work. Hence, the length of the paper should be decreased for the sake of readers. I think the previous version of the manuscript consisted of 34 pages, while the current version has 39 pages (almost equivalent to a thesis). I have no objection to the length of the paper; this is your work and this is just my suggestion, because research students usually look for papers of at most 25 pages.

Response: The heading of 3.2.1 is “Thermal Infrared Camera (IRcam)” and that of 2.1 is “Thermal Infrared Sensors”, so we are not sure what the reviewer is referring to. However, we have changed the heading of Section 2.1 to “Sensors Detecting Thermal Infrared Emission” to make sure that the two headings differ sufficiently. Regarding paper length, we agree that it has many pages. The appendix takes 4 pages and could be removed, but we decided to include it so that the study is complete. In the same way, we wanted the review of the state of the art to be as complete as possible (especially compared with the conference version, where it is heavily reduced). Also, the related literature takes 3 pages in the previous version, which is a small percentage of the total length. Having said that, we have gone through Section 2 to edit and remove redundancies, so that in the new version it sits at a little over 2.5 pages.

 

  4. Give citations for each figure, as they have been taken from the thesis, and similarly give citations for each table.

Response: All figures and tables now contain citations to the original source.

Reviewer 3 Report (Previous Reviewer 3)

The authors addressed my concerns. 

Author Response

Thank you for the kind consideration of the paper.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The paper describes the setup of a drone detection and tracking system using multiple sensors. The system is designed to be mobile and to detect drones outdoors without additional infrastructure. The sensors used are two different cameras, one infrared camera, a microphone array, and an ADS-B receiver. The focus of the paper is the design and evaluation of the individual sensor signal processing and detection mechanisms. Additionally, the authors design and evaluate a sensor fusion approach to further enhance the output quality of the system.

The proposed system is rather well described, clearly indicating the mechanisms used to process the sensor signals. Most concepts are justified by state-of-the-art works using similar approaches. The dataset necessary for training the machine-learning-based classifiers is made publicly available, which is very beneficial for the whole community, as such datasets are scarce and valuable.

However, there are some issues in the current paper:

The implementation part of the paper is very Matlab-focused, which limits its accessibility to a large part of the drone community, as Matlab is not the most natural choice. A more generic description of the processing steps, together with a visualization in the form of flow charts, algorithms, or data-flow graphs, would greatly aid the reader's understanding.

Even though most of the parameters of the system and of the machine-learning algorithms are well described, some are left out. Examples are the parameters of the multi-hypothesis Kalman filter and of the Gaussian mixture model.

Some concepts are described somewhat unclearly. An example is the ADS-B worker and its two output queues; their usage and importance are only described superficially.

Most of the design choices are described clearly by the authors and are either directly justified or justified through related work. However, the processing chosen for the fish-eye camera differs strongly from the other approaches, and this choice is left nearly unjustified. A clear definition of the parameters and requirements of the processing steps and of the chosen concept would help readers adapt the system to their own needs.

The sensor fusion part of the system is minimal and seems to be added on top of the whole system without much fine-tuning. The current approach is a two-way weighted sum over sensors and time. However, much better concepts for sensor fusion exist and could be exploited, such as Kalman or particle filters. The design rationale for this specific approach is not described at all; only a generic discussion of early vs. late sensor fusion is given.

The YOLOv2 used is old, and newer versions of YOLO typically provide better detection quality, as they use newer and more sophisticated neural networks. An example would be the current YOLOv5.

The English is understandable and mostly fine. However, some grammatical errors persist, which call for an additional round of proofreading.

Author Response

Reviewer 1

The paper describes the setup of a drone detection and tracking system using multiple sensors. The system is designed to be mobile and to detect drones outdoors without additional infrastructure. The sensors used are two different cameras, one infrared camera, a microphone array, and an ADS-B receiver. The focus of the paper is the design and evaluation of the individual sensor signal processing and detection mechanisms. Additionally, the authors design and evaluate a sensor fusion approach to further enhance the output quality of the system.

 

The proposed system is rather well described, clearly indicating the mechanisms used to process the sensor signals. Most concepts are justified by state-of-the-art works using similar approaches. The dataset necessary for training the machine-learning-based classifiers is made publicly available, which is very beneficial for the whole community, as such datasets are scarce and valuable.

 

Response: Thank you for the kind consideration of the paper. We hope that the changes indicated below help mitigate the issues raised about the paper.

 

However, there are some issues in the current paper:

 

The implementation part of the paper is very Matlab-focused, which limits its accessibility to a large part of the drone community, as Matlab is not the most natural choice. A more generic description of the processing steps, together with a visualization in the form of flow charts, algorithms, or data-flow graphs, would greatly aid the reader's understanding.

Response: We apologize for this. We have added a more elaborate description in the text of what each function does, and of its parameters if they were not mentioned before, so that researchers wishing to replicate the system can find equivalent functions on other language platforms. In some cases, where the flow of the text recommends it, the description is added via footnotes.

The hardware and software elements of each sub-system have also been added to Figure 1, which has been enlarged for better viewing.

 

Even though most of the parameters of the system and of the machine-learning algorithms are well described, some are left out. Examples are the parameters of the multi-hypothesis Kalman filter and of the Gaussian mixture model.

Response: Section 4.3 has been modified to describe and justify the parameters chosen for the Kalman and GMM methods.

 

Some concepts are described somewhat unclearly. An example is the ADS-B worker and its two output queues; their usage and importance are only described superficially.

Response: One queue consists of the current tracks and the other of the history tracks. The “current” queue contains the current positions and additional information (ID, position, altitude, calculated distance from the system, azimuth and elevation in degrees relative to the system, time since the message was received, and the target category). The “history” queue is just a list of the old positions and altitudes. This partition into two queues saves computational resources by drastically reducing the amount of data sent from the worker, so that only the information needed by the main process is transmitted. It also makes it easier to control the data flow, since the length of the history queue is easily capped when it is kept separate.

Section 3.3 has been modified to include this information.
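
As an illustration only, here is a hypothetical MATLAB sketch of the two-queue structure described above; the field names, example values, and the history cap are our own illustrative assumptions, not the identifiers used in the actual implementation:

```matlab
% Hypothetical sketch of the ADS-B worker's two output queues.
% Field names, values, and the history cap are illustrative assumptions.

% "Current" queue entry: latest state plus derived information per target.
current = struct( ...
    'id',        'ABC123',       ... % transponder identifier
    'position',  [56.67, 12.86], ... % latitude, longitude
    'altitude',  450,            ... % altitude
    'distance',  1200,           ... % calculated distance from the system
    'azimuth',   135,            ... % degrees relative to the system
    'elevation', 20,             ... % degrees relative to the system
    'age',       0.5,            ... % seconds since the message was received
    'category',  'UAV');             % target category

% "History" queue: only old positions and altitudes, capped to a fixed
% length so the data sent from the worker to the main process stays bounded.
maxHistory = 50;                     % assumed cap
history = zeros(0, 3);               % columns: latitude, longitude, altitude

% Executed once per received message:
history = [history; current.position, current.altitude];
if size(history, 1) > maxHistory
    history(1, :) = [];              % drop the oldest entry
end
```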

 

Most of the design choices are described clearly by the authors and are either directly justified or justified through related work. However, the processing chosen for the fish-eye camera differs strongly from the other approaches, and this choice is left nearly unjustified. A clear definition of the parameters and requirements of the processing steps and of the chosen concept would help readers adapt the system to their own needs.

Response: Please see our response to comment #2 above regarding the parameters of the Kalman filters and GMMs.

 

The sensor fusion part of the system is minimal and seems to be added on top of the whole system without much fine-tuning. The current approach is a two-way weighted sum over sensors and time. However, much better concepts for sensor fusion exist and could be exploited, such as Kalman or particle filters. The design rationale for this specific approach is not described at all; only a generic discussion of early vs. late sensor fusion is given.

Response: Avoiding a trained fusion approach is partly motivated in the text by the need for sufficient training data. Another underlying reason is that this work stems from a master's thesis, which has a well-defined timeframe. After the thorough literature review, choice of hardware, software implementation, setup of the acquisition system, database acquisition at three different airports, and experiments with single sensors, little time was left for fusion experiments. Therefore, we opted for untrained fusion, in particular weighted fusion. This choice was also motivated by the literature, which reports it to be more robust than other untrained approaches.

The text of Section 4.6 about sensor fusion has been modified to reinforce the above-mentioned ideas, including a reference that justifies the choice of weighted fusion. The explanation of early vs. late sensor fusion has been improved as well.
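
For illustration, a minimal MATLAB sketch of such an untrained weighted fusion over sensors and time might look as follows; the weights, window length, threshold, and stand-in scores are hypothetical values, not those used in the paper:

```matlab
% Sketch of untrained weighted fusion over sensors and time.
% Weights, window length, and threshold are illustrative assumptions.

w = [0.5; 0.3; 0.2];           % per-sensor weights, summing to 1
T = 5;                         % temporal smoothing window (frames)

% scores(t, s): detection confidence of sensor s at frame t (stand-in data)
scores = rand(100, numel(w));

perFrame = scores * w;                  % weighted sum over sensors
fused    = movmean(perFrame, [T-1 0]);  % average over the last T frames
detected = fused > 0.5;                 % assumed decision threshold
```

The appeal of this scheme is that it requires no training data: only the per-sensor weights need to be chosen, and a temporal window suppresses spurious single-frame detections.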

 

The YOLOv2 used is old, and newer versions of YOLO typically provide better detection quality, as they use newer and more sophisticated neural networks. An example would be the current YOLOv5.

Response: We agree; indeed, the use of more recent YOLO versions is mentioned as future work. However, when the thesis described in this paper was carried out, YOLOv2 was the predominant choice in the literature (only one paper used the newer v3), so we decided to stick with YOLOv2 to enable a fairer comparison with previous papers. A recent survey paper on drone detection confirms this trend as well.

All these considerations have been further strengthened in the discussion of the conclusions section.

 

The English is understandable and mostly fine. However, some grammatical errors persist, which call for an additional round of proofreading.

Response: The paper has been thoroughly checked for typos and grammatical mistakes.

Reviewer 2 Report

I really appreciate the hard work of the authors. However, the following are the major concerns related to the paper:

 

1. Plagiarism is reported as 68%.

2. The reference section is missing.

3. The paper was submitted as an article, but it looks like a review paper, consisting of 34 pages. It is recommended to reduce Section 2.

 

Therefore, it is recommended to revise the paper completely and include the references.

Author Response

Reviewer 2

I really appreciate the hard work of the authors. However, the following are the major concerns related to the paper:

  1. Plagiarism is reported as 68%.

Response: This is very likely because the paper presents the work of a master's thesis that is available in a public repository. In addition, a preliminary version of the paper appeared at a conference, and the submitted paper has also been made available on arXiv.

Master thesis: http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-42141

Conference paper: https://ieeexplore.ieee.org/document/9413241 and https://arxiv.org/abs/2007.07396

The submitted paper: https://arxiv.org/abs/2207.01927

 

  2. The reference section is missing.

Response: In our local LaTeX compilation the references section was present, so we do not know what could have happened. We will make sure that in the current submission the references section is included in the paper generated by the submission system.

 

  3. The paper was submitted as an article, but it looks like a review paper, consisting of 34 pages. It is recommended to reduce Section 2.

Response: Section 2 (related work) has been reduced/summarized as requested.

 

Therefore, it is recommended to revise the paper completely and include the references.

Reviewer 3 Report

The authors consider real-time drone detection and tracking using different sensing modalities, which is a timely topic of broad interest to researchers working on signal processing, sensor fusion, and other related fields. The paper is quite comprehensive, with great technical detail, and provides real datasets. Overall, the paper is easy to follow and well organized. To further enrich its scope, the authors may discuss distributed detection and tracking; in this scenario, each distributed agent may only have access to a few common modalities/features, and it would be interesting to see how to fuse the information in a collaborative and optimal manner. The authors may refer to the following paper, which brings up some discussion in this regard.

https://ieeexplore.ieee.org/abstract/document/9250516/ 

Author Response

Reviewer 3

The authors consider real-time drone detection and tracking using different sensing modalities, which is a timely topic of broad interest to researchers working on signal processing, sensor fusion, and other related fields. The paper is quite comprehensive, with great technical detail, and provides real datasets. Overall, the paper is easy to follow and well organized. To further enrich its scope, the authors may discuss distributed detection and tracking; in this scenario, each distributed agent may only have access to a few common modalities/features, and it would be interesting to see how to fuse the information in a collaborative and optimal manner. The authors may refer to the following paper, which brings up some discussion in this regard.

https://ieeexplore.ieee.org/abstract/document/9250516/ 

 

Response: Thank you for the interesting suggestion. Indeed, distributed operation was mentioned as future work but not elaborated on. In the new version, we have added a discussion of this topic in the conclusions section:

“In a distributed scenario [60], several agents (detection stations in our case) could cooperate, each one perhaps having access to only a few sensing modalities, or to a varying degree of computing capability or battery power. Replicating one collection station with many different sensors may be cost-prohibitive, or difficult if transport and deployment need to be done quickly (e.g., we used a lighter version with only two sensors and manual steering for dataset collection, Figure 2). However, cooperative operation entails challenges, such as handling orchestration and task distribution between units, connectivity via 4G/5G networks, or cloud/edge computing if the deployed hardware lacks such capability. Another challenge is the optimal fusion of information from sources that may have different data quality (e.g., sensors with different features), different modalities (e.g., visible, NIR, audio...), or different availability at a given time (e.g., a specific sensor is not deployed or the object is not visible from that location). In the case of visual sensors, there is also a need to match targets observed by sensors in different physical locations, since the object is seen from different points of view. On the other hand, the first station that detects a target can notify the others and provide useful information to aid their detection.”
