Drones are commonly used in numerous applications, such as surveillance, navigation, spraying pesticides in autonomous agricultural systems, various military services, etc., due to their variable sizes and workloads. However, malicious drones that carry harmful objects are often adversely used to intrude restricted areas and attack critical public places. Thus, the timely detection of malicious drones can prevent potential harm. This article proposes a vision transformer (ViT) based framework to distinguish between drones and malicious drones. In the proposed ViT based model, drone images are split into fixed-size patches; then, linearly embeddings and position embeddings are applied, and the resulting sequence of vectors is finally fed to a standard ViT encoder. During classification, an additional learnable classification token associated to the sequence is used. The proposed framework is compared with several handcrafted and deep convolutional neural networks (D-CNN), which reveal that the proposed model has achieved an accuracy of 98.3%, outperforming various handcrafted and D-CNNs models. Additionally, the superiority of the proposed model is illustrated by comparing it with the existing state-of-the-art drone-detection methods.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.