Real-Time Object Detection from UAV Inspection Videos by Combining YOLOv5s and DeepStream
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
One of the hot topics in industrial applications of various kinds is real-time object detection. This area receives considerable attention in computer vision. Within it, a very challenging problem is the real-time inspection of objects on the Earth's surface from high-altitude UAVs.
The reviewed paper proposes one of the possible solutions to this problem. Namely, a technology of automatic inspection of ground objects based on the use of YOLOv5 type models is developed for UAVs. As an example of implementation of the proposed technology, training of these models is performed using five datasets that contain images of vehicles as inspected objects. It is shown that the detection time of a single object using the described tools is relatively short. In particular, in one of the examples it amounted to 11.26 ms, which is quite acceptable for many computer vision applications. It should be noted that the effectiveness of the proposed approach is demonstrated on a relatively simple example, but it has the potential to solve more complex problems.
The article is in general adequately structured and presented. It may be useful for researchers in the field of computer vision, especially as applied to tasks whose specific constraints stem from implementation on board a UAV.
As a remark on the layout of the paper, the unsatisfactory quality of Fig. 11 should be noted. This figure shows five confusion matrices, and for all of them it is almost impossible to understand what they demonstrate. All numbers in the figure are very small and unsharp (blurred). Even at high magnification, it is not possible to read the numerical values in the cells of these matrices or along the abscissa and ordinate axes.
Author Response
Comments 1: As a remark on the layout of the paper, the unsatisfactory quality of Fig. 11 should be noted. This figure shows five confusion matrices, and for all of them it is almost impossible to understand what they demonstrate. All numbers in the figure are very small and unsharp (blurred). Even at high magnification, it is not possible to read the numerical values in the cells of these matrices or along the abscissa and ordinate axes.
Response 1: Thank you for pointing this out. We have improved the quality of the image: it has been regenerated at a higher pixel resolution so that it is larger and clearer.
Reviewer 2 Report
Comments and Suggestions for Authors
Below is a list of comments from my end:
-``Unmanned aerial vehicles (UAVs) high-altitude real-time inspection has always been a very challenging task.''
-But why is it very challenging? It is also unclear what problems are being solved in this work.
-``YOLOv5s object detection'' needs significant improvement!
-The descriptions in Sections 2.1 and 2.2 need clearer and more elaborate explanations.
-Apart from Figs 9-14, each of them needs adequate discussion.
-Mechanism part looks very weak.
-Also provide mathematical analysis to support the contribution of your work.
-`Motivation and research gap' is unclear from introduction and related work.
Comments on the Quality of English Language
requires proofreading.
Author Response
Comments 1: -``Unmanned aerial vehicles (UAVs) high-altitude real-time inspection has always been a very challenging task.''
-But why is it very challenging? It is also unclear what problems are being solved in this work.
Response 1: Thank you for pointing this out. We have revised the text as follows:
High-altitude inspection is challenging because it is susceptible to interference from varying weather conditions and from communication signals, and because the larger field of view at high altitude means that the objects to be identified occupy only a small area of the image.
Comments 2: -``YOLOv5s object detection'' needs significant improvement!
Response 2: Thank you for pointing this out. We have revised the text as follows:
Because the UAV flies at high altitude, it shakes continuously, and both its field of view and the area it photographs are large, so the targets to be detected appear very small in the image. This makes it difficult for object detection algorithms to detect such small objects. This article mainly uses YOLOv5s, a lightweight object detection model, combined with the DeepStream architecture to achieve real-time video-stream detection. To implement a complete real-time object detection system architecture, we made a series of improvements to the original DeepStream framework and combined it with the YOLOv5s model to perform object detection. YOLOv5s is used to detect the objects that require inspection, so that automatic detection can be achieved. The main improvement lies in the design of the overall logical architecture of DeepStream, and the experimental results obtained with the improved architecture are very effective.
Comment 3: -The descriptions in Sections 2.1 and 2.2 need clearer and more elaborate explanations.
Response 3: Thank you for pointing this out. We have revised the text as follows:
Figure 1 shows the basic workflow of the original DeepStream architecture. First, we start a DeepStream video streaming service on the back end of the platform, immediately create the DeepStream pipeline, and check its initialization status. If the pipeline initialization succeeds, the pipeline state is set to PLAYING, which is the state in which the pipeline starts working. Secondly, the pipeline starts to push the stream, and the algorithm performs video-stream pre-processing, including normalization and scaling. The video stream is then fed into the YOLOv5 model for inference, and finally the DeepSORT [37,38] tracking algorithm is used to track the corresponding objects. Alarm logic is attached to the objects that need to be identified, the processed video stream is pushed to the platform service, and the detection results are displayed in real time in the platform software.
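As an illustrative sketch only (not the code used in the paper), the pipeline-creation, initialization-check and PLAYING-state steps described above can be expressed with the GStreamer Python bindings that DeepStream builds on; the launch string below is a placeholder, whereas a real DeepStream pipeline would chain elements such as nvstreammux, nvinfer (running the YOLOv5s engine) and nvtracker.

```python
import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Placeholder launch string: a test source stands in for the UAV video stream.
pipeline = Gst.parse_launch("videotestsrc num-buffers=100 ! fakesink")

# Switch the pipeline to PLAYING, i.e. the "working" state in Figure 1.
if pipeline.set_state(Gst.State.PLAYING) == Gst.StateChangeReturn.FAILURE:
    sys.exit("pipeline failed to start")

# Block until the stream ends or an error occurs, then shut down cleanly.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```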
Figure 2 shows the improved DeepStream workflow. First, the platform issues the command to start the DeepStream service by sending an HTTP request. As soon as the algorithm service receives this request, it starts the DeepStream service, parses the request parameters, checks whether they meet the requirements, and generates a configuration file for the corresponding task. Secondly, after the configuration succeeds, a sub-process is created to execute the task and build a DeepStream pipeline. The pipeline initialization is then checked; once it succeeds, the pipeline state is set to PLAYING. The pipeline starts pushing the stream and the algorithm starts working, including pre-processing, inference, tracking and other operations. After an object is detected, alarm logic is applied. Finally, the video stream is pushed to the platform service, and the platform interface displays the stream annotated by the algorithm. In addition, we added an auxiliary service whose purpose is to monitor whether the DeepStream service has disconnected; if a disconnection occurs, the auxiliary service immediately terminates the task and restarts it.
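The auxiliary watchdog service can be sketched as follows. This is an illustrative simplification: the command, configuration file name and polling interval are hypothetical, and a production service would also probe the stream endpoint rather than relying only on the process state.

```python
import subprocess
import time

# Hypothetical command that runs one DeepStream task from a generated config file.
CMD = ["deepstream-app", "-c", "task_config.txt"]


def start_task() -> subprocess.Popen:
    """Launch the DeepStream task as a sub-process."""
    return subprocess.Popen(CMD)


proc = start_task()
while True:
    time.sleep(5)                     # watchdog polling interval
    if proc.poll() is not None:       # process exited => service disconnected
        proc = start_task()           # restart the task immediately
```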
Figure 3 shows the two key parts of the method: data processing and model training. The first is the data processing stage. To collect data, we plan the UAV inspection routes; in the areas that need to be inspected, the UAV automatically flies the planned routes and collects photos or videos. During image collection we flew in different time periods, in different weather, and under different lighting conditions, so that the model would perform better and be more robust to the complex backgrounds encountered in high-altitude inspection. The planned flight areas include different regions such as villages, water bodies and towns. The collected UAV inspection videos are manually edited, the required object information is retained, and frames are extracted as images. The images are then manually filtered and labeled with the LabelImg software. Finally, the data are divided into a training set and a test set.
The second is the model training stage. The data processed in the first stage have already been divided into a training set and a test set. Planning routes for different scenarios and collecting data under different conditions during the collection phase ensures the richness of the training samples and effectively prevents over-fitting during training. We evaluate the performance of the YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x models for UAV high-altitude inspection, using the training and test sets obtained from the same dataset. Finally, we select the model with the most robust performance for real-time detection, based on its real-time detection results.
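As an illustration of this comparison (not the training code used in the paper), the five model sizes can be loaded and run on the same held-out images through the public ultralytics/yolov5 torch.hub interface; the image path below is a placeholder, and in the actual experiments the weights fine-tuned on the UAV dataset would be loaded instead of the pretrained ones.

```python
import torch

# Compare the five YOLOv5 variants on the same held-out image.
for size in ["yolov5n", "yolov5s", "yolov5m", "yolov5l", "yolov5x"]:
    # Pretrained weights from the public hub; the paper's experiments would
    # load the weights fine-tuned on the UAV dataset instead.
    model = torch.hub.load("ultralytics/yolov5", size, pretrained=True)
    results = model("test_images/sample.jpg")     # single-image inference
    print(size, len(results.xyxy[0]), "detections")
```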
Comment 4: -Apart from Figs 9-14, each of them needs adequate discussion.
Response 4: Thank you for pointing this out. For the discussion added to Figures 1 to 3, please refer to the response to Comment 3.
Figure 4: we have added the following discussion.
Through the interface of the command and dispatch system, the status of the UAV and its nest can be monitored in real time. A white drone nest can be seen: its door has been opened, the platform has been raised, and a black UAV is waiting on it, ready to fly. Three buttons are available on the left side of the command and dispatch system; the main functions include UAV control, landing control, system functions, power control and emergency control. The system can not only command the takeoff and landing of the drone but also control the angle and focus of the gimbal. While the drone is flying, its battery level can be monitored in real time, and in an emergency, such as low battery or automatic hovering, the drone can also be returned home with one click.
Figure 5: we have added the following discussion.
To ensure the effectiveness of model training in this article, we increase the richness of the samples as much as possible. Therefore, when collecting images, we planned routes in different areas and at different altitudes and, while ensuring flight safety, set different flight speeds and flight distances for the drone.
Figure 6: we have added the following discussion.
We divide the data into a training set and a test set at a ratio of 9:1. To ensure the effectiveness of model training, the two subsets are kept independent of each other.
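A minimal sketch of such a 9:1 split is shown below; the directory layout and file extension are our assumptions and are not taken from the paper.

```python
import random
import shutil
from pathlib import Path

random.seed(0)                                   # reproducible split
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)

split = int(0.9 * len(images))                   # 90% train, 10% test
for subset, files in (("train", images[:split]), ("test", images[split:])):
    out_dir = Path("dataset") / subset / "images"
    out_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(f, out_dir / f.name)         # subsets stay disjoint
```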
Figure 7: we have added the following discussion.
The routes are all contained within a yellow circular area displayed on a map interface. It can be clearly seen that there are residential areas and river areas next to the planned routes. To increase the richness of the samples, our routes span multiple regions for image collection.
Figure 8: we have added the following discussion.
After model training was completed, to verify the detection performance of the different models, we used images containing five different categories to test the five models. It can be clearly seen that the different YOLOv5 detection models can detect five different types of objects: car, truck, excavator, pile driver and crane.
Figure 15: we have added the following discussion.
Figure 15 shows real-time inspection images. During this kind of high-altitude drone inspection, the drone's camera looks down at the ground, so information such as buildings, ponds and vehicles on the ground can be clearly seen. It can also be clearly seen that the YOLOv5s model trained in this article automatically detects the required targets during the automatic drone inspection and correctly distinguishes the target areas from the background. In the lower right corner of the image, the inspection route of the drone and its specific position along this route can be seen.
Comment 5: -Mechanism part looks very weak.
Response 5: Thank you for pointing this out. We have revised the text as follows:
Figure 2 shows the improved DeepStream workflow. First, the platform issues the command to start the DeepStream service by sending an HTTP request. As soon as the algorithm service receives this request, it starts the DeepStream service, parses the request parameters, checks whether they meet the requirements, and generates a configuration file for the corresponding task. Secondly, after the configuration succeeds, a sub-process is created to execute the task and build a DeepStream pipeline. The pipeline initialization is then checked; once it succeeds, the pipeline state is set to PLAYING. The pipeline starts pushing the stream and the algorithm starts working, including pre-processing, inference, tracking and other operations. After an object is detected, alarm logic is applied. Finally, the video stream is pushed to the platform service, and the platform interface displays the stream annotated by the algorithm. In addition, we added an auxiliary service whose purpose is to monitor whether the DeepStream service has disconnected; if a disconnection occurs, the auxiliary service immediately terminates the task and restarts it.
Comment 6: -Also provide mathematical analysis to support the contribution of your work.
Response 6: Thank you for pointing this out. We have revised the text as follows:
3.5. Resource Utilization Analysis
To assess the quality of a framework, we consider not only its detection capability but also how reasonably it uses the available resources. It is therefore necessary to compare the real-time video streaming before and after the improvement in order to check the resource overhead of the system architecture. The main comparison is as follows.
1. Before the improvement, it was necessary to bind drone devices to specific execution tasks. In terms of resource consumption, a server with one Tesla T4 GPU and 16 GB of GPU memory could bind at most 6 devices and their tasks. That is, the binding of tasks limited the reasonable utilization of GPU and memory resources, which could not be adjusted dynamically according to task requirements. In short, hardware resources were used very inefficiently before the improvement.
2. After the improvement, our real-time video-stream detection system no longer needs to bind devices to tasks: tasks are not bound, and resource allocation can be adjusted dynamically according to real-time needs. Not only can multiple devices be controlled at the same time, but different tasks can also be switched freely. The flexibility and scalability of the system are therefore greatly improved.
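The difference can be illustrated with a small scheduling sketch (all names are hypothetical; this is not the authors' implementation): instead of a fixed device-to-task binding, incoming detection tasks are drawn from a shared queue and handed to whichever worker is free, so devices can be added or switched without re-binding.

```python
import queue
import threading


def run_detection_task(task_id: str) -> None:
    """Placeholder for launching one DeepStream detection task."""
    print(f"running task {task_id}")


task_queue: "queue.Queue[str]" = queue.Queue()


def worker() -> None:
    while True:
        task_id = task_queue.get()     # take the next task from any device
        run_detection_task(task_id)
        task_queue.task_done()


# A small worker pool replaces the fixed device-to-task binding.
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

for uav in ["uav-01", "uav-02", "uav-03"]:
    task_queue.put(uav)                # devices/tasks can be added freely
task_queue.join()
```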
Comment 7: -`Motivation and research gap' is unclear from introduction and related work.
Response 7: Thank you for pointing this out. We have revised the text as follows:
Existing real-time inspection technology, especially monitoring with fixed camera equipment, suffers from a small field of view, poor equipment mobility, the inability to switch between different areas in real time, and large monitoring blind spots, among other problems. UAV high-altitude inspection directly addresses these limitations of field of view and flexibility. Therefore, this article proposes a method that combines UAVs with real-time artificial-intelligence detection to implement an automatic real-time UAV inspection system.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
A few of the previous round's review comments have been addressed well; however, many major comments still need work.
More effort is required to address the following comments in order to convince this reviewer.
`Motivation and research gap' is unclear from introduction and related work.
`Also provide mathematical analysis to support the contribution of your work.'
`Apart from Figs 9-14, each of them needs adequate discussion.'
Comments on the Quality of English Language
Can be improved.
Author Response
Response to Reviewer 2 Comments
Thank you very much for taking the time to review this manuscript. Please find
the detailed responses below and the corresponding revisions/corrections
highlighted/in track changes in the re-submitted files.
Comments 1: -`Motivation and research gap' is unclear from introduction and related work.
Response 1: Thank you for pointing this out. We have revised the text as follows:
Haq et al. [37] deployed the DeepStream framework on an NVIDIA Jetson single-board computer to run deep learning algorithms, in particular the YOLO algorithm. They also verified that the DeepStream framework runs well in virtual machines, especially when using Docker, which can further improve model performance and portability during deployment. Huu et al. [38] proposed a method based on the NVIDIA DS-SDK architecture that uses multiple surveillance cameras to apply deep-learning-based algorithms to vehicle monitoring. Ghaziamin et al. [39] deployed an object detection model on NVIDIA Jetson devices and designed a passenger counting system; after edge deployment through NVIDIA DeepStream, it improves efficiency while saving hardware resources. Smink et al. [40] used edge devices combined with the detection and tracking pipeline of the NVIDIA DeepStream framework to implement a real-time tag-reading application. Qaraqe et al. [41] designed an end-to-end intelligent security monitoring system that uses the DeepStream Software Development Kit (SDK) for real-time inference, which can have a significant impact on public safety and crowd management.
Comments 2: -`Also provide mathematical analysis to support the contribution of your work.'
Response 2: Thank you for pointing this out. We have revised the text as follows:
The DeepStream startup time consists of two parts: the time from the request to pipeline initialization, and the time from pipeline initialization to the pipeline state switching to PLAYING. The sum of the two is the startup time. The original DeepStream service has a startup time of approximately 7 seconds, so there is considerable room for improvement. On this basis, we test the startup of video streams at different definitions. The definition is divided into five levels: ultra-high definition (Ultra HD), ultra definition (UD), high definition (HD), standard definition (SD) and smooth. The results are shown in Table 5.
Table 5. Startup time test.

| Test | First stage (s) | Second stage (s) | Total (s) | Definition |
|------|-----------------|------------------|-----------|------------|
| 1    | 2.21            | 0.60             | 2.81      | Ultra HD   |
| 2    | 2.24            | 0.28             | 2.52      | Ultra HD   |
| 3    | 2.27            | 1.24             | 3.51      | UD         |
| 4    | 2.27            | 0.50             | 2.77      | UD         |
| 5    | 2.19            | 3.52             | 5.71      | HD         |
| 6    | 2.21            | 3.42             | 5.63      | HD         |
| 7    | 2.21            | 3.03             | 5.24      | SD         |
| 8    | 2.29            | 3.24             | 5.53      | SD         |
| 9    | 2.23            | 3.40             | 5.63      | Smooth     |
| 10   | 2.22            | 3.51             | 5.73      | Smooth     |
As shown in Table 5, in the first test the first-stage startup time for Ultra HD was 2.21 seconds and the second-stage startup time was 0.60 seconds, for a total of 2.81 seconds. The total time for the second Ultra HD test was 2.52 seconds. The overall time consumed is noticeably shorter than for the lower-definition videos. Moreover, the startup time of the optimized architecture is significantly shorter than the original DeepStream startup time.
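For reference, the two stages in Table 5 can be instrumented as in the sketch below; this is our illustration only, and the launch string is a trivial placeholder rather than the actual DeepStream pipeline.

```python
import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

t0 = time.monotonic()
pipeline = Gst.parse_launch("videotestsrc ! fakesink")  # request -> pipeline init
t1 = time.monotonic()

pipeline.set_state(Gst.State.PLAYING)                   # init -> PLAYING
pipeline.get_state(Gst.CLOCK_TIME_NONE)                 # wait for the state change
t2 = time.monotonic()

print(f"stage 1: {t1 - t0:.2f} s, stage 2: {t2 - t1:.2f} s, total: {t2 - t0:.2f} s")
pipeline.set_state(Gst.State.NULL)
```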
Comment 3: -`Apart from Figs 9-14, each of them needs adequate discussion.'
Response 3: Thank you for pointing this out. We have revised the figure captions as follows:
Figure 1. DeepStream architecture diagram. First, the back-end server starts DeepStream via the command line, and then, after a series of processing steps, alarm information is finally pushed to the platform.
Figure 2. Improved DeepStream architecture diagram. First, the back-end service is started; the pipeline then goes through additional checks, initialization and other operations, and results are finally fed back through the back-end service.
Figure 3. Algorithm training architecture diagram. It consists of two stages: data preprocessing and data training.
Figure 4. UAV command and dispatch platform. It shows a UAV ready to take off, as displayed on the UAV monitoring platform.
Figure 5. Original image of the dataset. Images of different scenes collected by UAV from a high-altitude perspective.
Figure 6. Dataset statistics. For model training, the dataset is divided into a training set and a test set.
Figure 7. Different route planning. The flight routes of UAV include different areas such as towns and water
bodies.
Figure 8. YOLOv5 detection results. The predictions of (a) YOLOv5n, (b) YOLOv5s, (c) YOLOv5m, (d) YOLOv5l and (e) YOLOv5x, showing the detection results of the YOLOv5 models on this dataset.
Figure 15. UAV inspection interface. It shows a scene diagram of real-time detection and tracking of high-
altitude inspections, and the UAV flight route can be seen in the lower right corner.