A Deep Learning Approach of Intrusion Detection and Tracking with UAV-Based 360° Camera and 3-Axis Gimbal
Abstract
1. Introduction
- (1) An intrusion target detection system based on a multi-rotor UAV was designed and implemented, covering hardware selection, software architecture design, and implementation of the target detection algorithm. Field flight tests were carried out with this UAV system.
- (2) To address the limited viewing angle of a UAV when searching for intrusion targets, this research proposes a method that combines 360° panoramic imaging with 3-axis gimbal image tracking to enlarge the search and discovery range for intrusion targets (a geometric sketch of this panorama-to-gimbal handover is given after this list).
- (3) Based on the field flight tests, 3043 flight images captured by a 360° panoramic camera and a 3-axis gimbal camera in various environments were collected and compiled into an intrusion dataset. Considering the applicability of the YOLO family to intrusion target detection, this paper proposes an improved YOLOv5s-360ID model based on the original YOLOv5-s model.
- (4) The YOLOv5s-360ID model uses the K-Means++ clustering algorithm to regenerate anchor boxes matched to the small-target detection task, and it also improves the bounding box regression loss function of the original YOLOv5-s model (an anchor-clustering sketch also follows this list).
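The panorama-to-gimbal handover in item (2) can be pictured with a minimal geometric sketch, assuming an equirectangular 360° frame at the DUXCAM's 3840 × 1920 photo size and the A8 Mini's controllable range from the hardware specification table; the linear pixel-to-angle mapping and all function names below are illustrative assumptions rather than the implementation described in the paper.

```python
# Illustrative sketch (not the authors' implementation): convert a detection
# centre in a 3840 x 1920 equirectangular 360° frame into gimbal pointing
# angles, clamped to the A8 Mini's controllable range from the hardware table.

PANO_W, PANO_H = 3840, 1920          # DUXCAM 360° camera photo size (pixels)
YAW_RANGE = (-30.0, 30.0)            # A8 Mini controllable yaw range (degrees)
PITCH_RANGE = (-135.0, 45.0)         # A8 Mini controllable pitch range (degrees)


def pano_pixel_to_angles(cx: float, cy: float) -> tuple[float, float]:
    """Map a pixel centre (cx, cy) to azimuth/elevation in degrees.

    Azimuth is 0 at the panorama centre column (positive to the right);
    elevation is 0 at the horizon row (positive upward).
    """
    azimuth = (cx / PANO_W - 0.5) * 360.0
    elevation = (0.5 - cy / PANO_H) * 180.0
    return azimuth, elevation


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def gimbal_command(cx: float, cy: float) -> tuple[float, float]:
    """Clamp the desired angles to the gimbal limits; azimuths beyond the
    ±30° yaw range would have to be absorbed by yawing the UAV itself."""
    azimuth, elevation = pano_pixel_to_angles(cx, cy)
    return clamp(azimuth, *YAW_RANGE), clamp(elevation, *PITCH_RANGE)


if __name__ == "__main__":
    # A detection slightly right of and below the panorama centre.
    print(gimbal_command(2100.0, 1100.0))   # -> (16.875, -13.125)
```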
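The anchor-box regeneration in item (4) can likewise be sketched under stated assumptions: the labelled bounding-box widths and heights are taken as an N × 2 array already rescaled to the 640 × 640 network input, and scikit-learn's Euclidean K-Means with k-means++ initialisation stands in for the clustering step. The paper's exact distance metric and anchor/scale assignment are not reproduced here, so the conventional nine-anchor, three-per-scale grouping of YOLOv5 is assumed.

```python
# Minimal sketch of regenerating YOLO anchor boxes with K-Means++ clustering.
# Assumes `wh` holds labelled bounding-box widths/heights (pixels, already
# rescaled to the 640 x 640 network input); not the authors' code.
import numpy as np
from sklearn.cluster import KMeans


def regenerate_anchors(wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster (width, height) pairs into n_anchors anchor shapes.

    YOLOv5 conventionally uses 9 anchors, assigned 3 per detection scale
    from smallest to largest.
    """
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(wh)
    anchors = km.cluster_centers_
    # Sort by anchor area so the smallest anchors go to the highest-resolution head.
    return anchors[np.argsort(anchors.prod(axis=1))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wh_demo = rng.uniform(4, 200, size=(500, 2))   # stand-in for real label statistics
    print(regenerate_anchors(wh_demo).round(1))
```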
2. Materials and Methods
2.1. Hardware Design
2.2. Software Design
2.2.1. Intrusion Detection Model Implementation
2.2.2. YOLOv5-s Detection Algorithm
2.2.3. The Improved Detection Algorithm YOLOv5s-360ID
2.2.4. Anchor Box Improvement Optimization in YOLOv5s-360ID
2.2.5. Improvement of Bounding Box Regression Loss Function in YOLOv5s-360ID
2.2.6. Target Tracking Algorithm
3. Results
3.1. Intrusion Detection Model Experiment
3.2. Field Flight Test
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Component | Specification | Value |
| --- | --- | --- |
| DJI M600 PRO UAV | Dimensions | 1668 × 1518 × 727 mm |
| | Max Take-Off Weight | 15.5 kg |
| | Max Forward Speed | 65 km/h |
| | Max Ascent Speed | 5 m/s |
| | Max Descent Speed | 3 m/s |
| | Max Angular Velocity | Pitch: 300°/s; Yaw: 150°/s |
| | Max Endurance Time | 16 min |
| | Max Remote Control Distance | 5 km |
| A8 Mini 3-axis Gimbal | Dimensions | 55 × 55 × 70 mm |
| | Photo Size | 1920 × 1080 |
| | Lens | FOV: 93° |
| | Focal Length | 21 mm |
| | Angular Vibration Range | ±0.01° |
| | Controllable Range | Pitch: −135°~+45°; Yaw: −30°~30° |
| DUXCAM 360° Camera | Dimensions | 80 × 80 × 160 mm |
| | Photo Size | 3840 × 1920 |
| | Lens | 4 × F2.0 fisheye lens; FOV: 360° |
| Component | Specification | Value |
| --- | --- | --- |
| Jetson Xavier NX | AI Performance | 21 TOPS |
| | CPU Max Frequency | 1.9 GHz |
| | GPU Max Frequency | 1100 MHz |
| | Memory | 16 GB |
| | DL Accelerator | 2× NVDLA |
| | USB | 1× USB 3.2 Gen 2 (10 Gbps); 3× USB 2.0 |
| | Power | 10 W~20 W |
| | Mechanical | 103 × 90.5 × 34 mm |
| Image Source | Personnel | Vehicle | Crane | Truck | Bicycle |
| --- | --- | --- | --- | --- | --- |
| 360° panoramic images | 3810 | 5463 | 257 | 432 | 341 |
| 3-axis gimbal images | 1330 | 4330 | 226 | 386 | 311 |
| Model | Epochs | Batch Size | Learning Rate | Input Shape | Train/Validation Split |
| --- | --- | --- | --- | --- | --- |
| Original YOLOv5-s | 200 | 32 | 0.005 | 640 × 640 | 9:1 |
| YOLOv5s-360ID | 200 | 32 | 0.005 | 640 × 640 | 9:1 |
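As a hedged illustration of how these settings might be materialised, the sketch below fixes the listed hyperparameters and performs the 9:1 train/validation split over the 3043 collected images; the file names and the random-shuffle split are assumptions, not the authors' pipeline.

```python
# Illustrative sketch (assumed file layout, not the authors' pipeline):
# fix the training settings from the table and split the 3043 collected
# images into train/validation subsets at a 9:1 ratio.
import random

EPOCHS = 200
BATCH_SIZE = 32
LEARNING_RATE = 0.005
INPUT_SHAPE = (640, 640)


def split_dataset(image_paths, train_ratio=0.9, seed=0):
    """Shuffle and split image paths into train and validation lists (9:1)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * train_ratio)
    return paths[:n_train], paths[n_train:]


if __name__ == "__main__":
    images = [f"images/{i:04d}.jpg" for i in range(3043)]   # 3043 field-test images
    train_set, val_set = split_dataset(images)
    print(len(train_set), len(val_set))   # -> 2738 305
```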
| Model | mAP@50 | AP (Personnel) | AP (Vehicle) | AP (Crane) | AP (Truck) | AP (Bicycle) |
| --- | --- | --- | --- | --- | --- | --- |
| Original YOLOv5-s | 72.4% | 81.6% | 86.1% | 93.1% | 64.2% | 37.0% |
| YOLOv5s-360ID | 75.2% | 82.7% | 88.8% | 96.5% | 66.8% | 41.2% |
Model | FPS |
---|---|
Original YOLOv5-s | 33 |
YOLOv5s-360ID | 31 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).