Article

Long-Strip Target Detection and Tracking with Autonomous Surface Vehicle

Meiyan Zhang, Dongyang Zhao, Cailiang Sheng, Ziqiang Liu and Wenyu Cai
1 College of Electrical Engineering, Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, China
2 College of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
3 Jiangsu Yongkang Machinery Co., Ltd., Wuxi 214299, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(1), 106; https://doi.org/10.3390/jmse11010106
Submission received: 23 October 2022 / Revised: 24 December 2022 / Accepted: 26 December 2022 / Published: 5 January 2023
(This article belongs to the Section Ocean Engineering)

Abstract

Target detection and tracking are of great significance for marine exploration and protection. In this paper, we propose a Convolutional-Neural-Network-based target detection method named YOLO-Softer NMS for long-strip target detection on the water, which combines the You Only Look Once (YOLO) and Softer NMS algorithms to improve detection accuracy. The traditional YOLO network structure is improved, the number of prediction scales is increased from three to four, and a Softer NMS strategy is used to refine the original outputs of the YOLO method. The performance improvement is compared to the Faster-RCNN algorithm and the traditional YOLO method in both mAP and speed, and the proposed YOLO-Softer NMS reaches an mAP of 97.09% while still maintaining the same speed as YOLOv3. In addition, the camera imaging model is used to obtain accurate target coordinate information for target tracking. Finally, using a dicyclic PID control diagram, the Autonomous Surface Vehicle is controlled to approach the long-strip target along a near-optimal path. Field test results verify that our long-strip target detection and tracking method achieves satisfactory results.

1. Introduction

Target detection and tracking is a very important field in modern science and technology [1,2]. Generally, different cameras and visual algorithms are used to analyze target information, replacing the human eye in classification, recognition, and other tasks. Target detection has been widely used in resource exploration, civil and military aircraft navigation, traffic management, security monitoring, and other fields. More recently, target detection and tracking have come to play an important role in unmanned systems.
Nowadays, researchers have developed many kinds of intelligent oceanic equipment, such as the Autonomous Underwater Vehicle (AUV) [3], Unmanned Surface Vessel (USV) [4,5], and Autonomous Surface Vehicle (ASV) [6]. These oceanic unmanned systems play an important role in marine resource exploration [7], naval battles, and accident rescue [8]. In the military field, they can be used for military guidance [9], enemy reconnaissance [10], etc. In the civilian field, they can be used for ship detection and tracking [11], biological exploration [12], etc. As an important branch of ocean robots, the ASV has a wide range of applications, one of which is searching for ocean surface and underwater targets. More specifically, water surface target detection is among the important applications of the marine ASV. As far as we know, identification of small ships, obstacles, mines, and other targets at sea level is essential for normal offshore operations [13,14].
Target detection technology can be roughly divided into traditional target detection methods and deep-learning-based target detection methods. Traditional target detection methods extract the target area with image processing algorithms, while deep-learning-based target detection methods are mainly realized by various Convolutional Neural Networks. The pipeline of traditional target detection models can be divided into three stages: informative region selection, feature extraction, and object classification [15]. The main feature extraction methods include Haar [16], HOG [17], etc. The Support Vector Machine (SVM), AdaBoost, and the deformable part-based model (DPM) are commonly used classifiers. Maire et al. [18] use a Haar-like feature classifier to detect poles for docking. Zhang et al. [19] detect USVs based on bounding box regression. Jin et al. [20] propose a Centroid Matching (CM) algorithm for target detection based on a frequency-tuned saliency method. Because traditional target detection methods rely on hand-crafted feature extraction, their detection accuracy may be poor under illumination change, scale change, and occlusion. Moreover, for the special scenario studied in this paper, there is little existing research on long-strip target detection and tracking.
Existing research has verified that target detection methods based on CNNs (Convolutional Neural Networks) can greatly improve target detection accuracy [21]. CNN-based target detection algorithms can be divided into one-stage and two-stage methods. Two-stage algorithms first determine candidate regions, then classify targets within the candidate regions, and finally apply coordinate corrections to the detection results. One-stage regression models use a neural network to directly output coordinates and complete classification. Region-CNN (R-CNN) and Faster R-CNN [22] are typical two-stage target detection algorithms, while You Only Look Once (YOLO) [23] and the Single Shot Detector (SSD) [24] are two widely used one-stage detectors. Pixel-level semantic segmentation methods include SegNet, PSPNet, Mask-RCNN, etc. There is also research that combines Visual SLAM with CNNs for object detection. Kulkarni et al. [25] propose a method in which traditional Visual SLAM (VSLAM) is accompanied by object detection using YOLOv3 to enhance efficient navigation. Chen et al. [26] use YOLOv4 combined with VSLAM to detect dynamic objects. Hu et al. [27] propose a VSLAM framework integrating a neural network for moving object detection. Recently, different variants of the YOLO algorithm have been proposed in [28,29,30] to improve target detection accuracy. These studies demonstrate the feasibility and progress of combining object detection with VSLAM, and most of them adopt YOLO-series algorithms, indicating that the YOLO series performs strongly in the field of object detection.
Although CNN-based detection technology greatly improves target detection accuracy, its computing time may be much longer than that of traditional algorithms. In this paper, we introduce YOLO for surface long-strip object detection, contribute several improvements to the original YOLO algorithm, and finally make a horizontal comparison with Faster R-CNN detection results.
In terms of target tracking control, fuzzy control, backstepping, nonlinear sliding mode control, neural network control, etc., have been widely applied [31]. However, most of these control algorithms involve a large amount of computation, complex design, and high requirements on the steering angle, which may not be conducive to engineering implementation.
To the best of our knowledge, little attention has been paid to long-strip object detection and tracking for Autonomous Surface Vehicles. In this paper, an ASV system with a long-strip target tracking objective is designed and realized to meet the competition requirements. We propose a CNN-based target detection method named YOLO-Softer NMS (Non-Maximum Suppression) [32] for long-strip target detection, which combines the YOLO detection model and the Softer NMS algorithm to improve target detection accuracy. The traditional structure of the YOLO network is improved by increasing the number of prediction scales to four, and the accuracy of the bounding box is further improved with the Softer NMS strategy. Further, the camera imaging model is used to obtain accurate target coordinate information. Finally, a dicyclic PID control diagram is applied to control the ASV approaching the long-strip target.
The key contributions of this paper are three-fold:
(1) This paper proposes an improved YOLOv3-based network architecture with an extra fourth prediction scale and a Softer NMS selection strategy. As a result, it achieves good detection accuracy when deployed in the actual ASV system.
(2) For fixed-size long-strip targets, a camera imaging model is introduced to obtain accurate target coordinate and heading information, so the ASV can determine its distance and deflection angle relative to the remote long-strip target.
(3) A dicyclic PID control model is designed for the underactuated ASV to ensure the stability of the tracking process, and the proposed method is well suited to practical engineering implementation.
The rest of this paper is organized as follows: Section 2 presents the detailed system framework of the ASV and introduces the operation process of the whole system. In Section 3, the YOLO-Softer NMS target detection algorithm, which modifies the network structure of YOLOv3, is described in detail. The improved target detection and tracking model, combined with a dicyclic PID controller, is proposed in Section 4. Experimental results are presented in Section 5, and a summary of the whole paper is provided in Section 6.

2. System Framework and Problem Statement

2.1. System Framework

The typical application scenario of the ASV-based long-strip target detection and tracking system is shown in Figure 1. The ASV has the ability to detect remote targets and search for long-strip targets near the shore. Once a target has been identified, the ASV uses the designed controller to approach the long-strip target until it reaches close proximity. On the ASV, a HikVision waterproof camera is equipped for real-time target image acquisition, a Jetson Xavier NX GPU platform is used for image processing, and an STM32-MCU-based controller implements the ASV's motion control.
The overall process of the proposed method can be divided into three stages, as demonstrated in Figure 2: CNN-based target detection, target location estimation, and target tracking control. Firstly, a deep-learning-based target detection algorithm processes the images acquired by the waterproof camera, and the long-strip target is characterized with the modified YOLO-Softer NMS algorithm. Secondly, the position and shape size of the long-strip target are estimated in real time to determine the forward direction of the ASV. Finally, an optimal tracking trajectory to approach the surface long-strip target is realized through the dicyclic PID controller.
To clearly describe this article, we list the main notations in Table 1.

2.2. Dynamic Model of ASV

The dynamics of the self-made ASV can be expressed as follows:
$$M \dot{v} + C(v)v + D(v)v = \tau, \qquad \dot{\eta} = J(\eta) v$$
where M, C(v), D, and τ denote the inertia matrix, the Coriolis and centrifugal matrix, the damping matrix, and the force and moment vector of the ASV, respectively:
$$M = \mathrm{diag}[m_{11}, m_{22}, m_{66}]$$
$$C(v) = \begin{bmatrix} 0 & 0 & -m_{22}v \\ 0 & 0 & m_{11}u \\ m_{22}v & -m_{11}u & 0 \end{bmatrix}$$
$$D = \mathrm{diag}[d_{11}, d_{22}, d_{66}]$$
$$\tau = [\tau_u, 0, \tau_r]^{T}$$
As seen in Figure 3, there are two coordinate systems for the ASV: the inertial coordinate system (Ixy) and the body coordinate system (Buv). v = [u, v, r]T denotes the velocity vector in the body coordinate system, η = [x, y, ψ]T denotes the position vector in the inertial coordinate system, and J(η) denotes the conversion matrix.
$$J(\eta) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
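For readers who want to simulate this model, the following is a minimal numerical sketch of the 3-DOF dynamics above; the inertia and damping values are illustrative assumptions, not the identified parameters of the self-made ASV.

```python
import numpy as np

# Illustrative (not identified) parameters of a 3-DOF surface vehicle.
m11, m22, m66 = 25.0, 33.0, 2.8   # inertia terms (assumed values)
d11, d22, d66 = 12.0, 17.0, 0.5   # damping terms (assumed values)

def asv_derivatives(eta, v, tau_u, tau_r):
    """Return (eta_dot, v_dot) for eta = [x, y, psi] and v = [u, v, r]."""
    eta = np.asarray(eta, dtype=float)
    v = np.asarray(v, dtype=float)
    u, sway, r = v
    psi = eta[2]

    M = np.diag([m11, m22, m66])
    C = np.array([[0.0,         0.0,       -m22 * sway],
                  [0.0,         0.0,        m11 * u   ],
                  [m22 * sway, -m11 * u,    0.0       ]])
    D = np.diag([d11, d22, d66])
    tau = np.array([tau_u, 0.0, tau_r])   # underactuated: no direct sway force

    J = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                  [np.sin(psi),  np.cos(psi), 0.0],
                  [0.0,          0.0,         1.0]])

    v_dot = np.linalg.solve(M, tau - C @ v - D @ v)
    eta_dot = J @ v
    return eta_dot, v_dot
```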

2.3. Problem Description

For the above application scenario, two key issues need to be addressed.
(1) The accuracy of target detection should be improved as much as possible. For a real-time target detection and tracking system, the object detection algorithm needs to be both accurate and fast, so we use the YOLOv3 model as the base target detection algorithm. In order to detect small targets as far away as possible, the original structure of the YOLOv3 network is modified by adding a fourth feature layer on the basis of the original three feature layers. In addition, a Softer NMS step is introduced at the output stage to improve the target detection accuracy.
(2) An implementable visual navigation method for the ASV is hard to design. In order to track long-strip targets in the shortest time, the approaching trajectory needs to be designed carefully. The objective of target tracking in this paper is to keep the cruise direction aligned with the long-strip target. If there is a deviation angle between them, the ASV should adjust its course continuously to reduce the deviation angle.

3. YOLO–Softer-NMS-Based Target Detection Algorithm

3.1. Improved Network Structure for YOLO–Softer NMS

The well-known YOLOv3 [21] network directly performs regression to detect targets in images without requiring a Region Proposal Network (RPN) to detect regions of interest in advance, which accelerates the target detection process. Since the YOLOv3 target detection algorithm offers high detection speed and high accuracy, it is widely used in real-time target detection. Unfortunately, the traditional YOLO algorithm still has some defects, such as inaccurate detection of long-distance small targets and frame-selection deviation of the target area. Therefore, in this paper, we modify the network structure to eliminate these two problems.
Figure 4 describes the improved network structure of the proposed YOLO–Softer NMS, which can be divided into three parts: the original YOLOv3 network structure, the fourth feature map, and Softer NMS module.
The network structure of the YOLOv3 model can be divided into two main parts: the feature extraction network Darknet-53 and the YOLO multi-scale prediction layers. As the backbone network of YOLOv3, Darknet-53 is responsible for extracting features from the input image and obtaining deep-seated feature information. The residual blocks use shortcut connections that further deepen the network structure without causing the gradient to vanish. Three scales, namely 13 × 13, 26 × 26, and 52 × 52, are responsible for detecting large, medium, and small targets, respectively. Deep feature maps contain rich semantic information, while shallow feature maps contain abundant fine-grained information. Therefore, when feature fusion is performed, the network uses up-sampling to keep the sizes of the feature maps consistent.
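As an illustration of the shortcut connections mentioned above, the following is a minimal PyTorch-style sketch of a Darknet residual block; the channel sizes and activation parameters are assumptions for illustration, not an extract of the trained network.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """1x1 bottleneck followed by a 3x3 convolution, added back to the input."""
    def __init__(self, channels: int):
        super().__init__()
        hidden = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        # The shortcut keeps gradients flowing even in a very deep backbone.
        return x + self.block(x)

# Example: a 52x52 feature map with 64 channels keeps its shape.
y = DarknetResidual(64)(torch.randn(1, 64, 52, 52))
```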
For example, the input image is divided into 6 × 6 grids (Figure 5a), and the grid cell in which the target center falls is responsible for predicting the target. Each grid cell predicts bounding boxes and confidence scores for these bounding boxes. The confidence score indicates whether the cell contains a target and how accurate the prediction is, so a high confidence score indicates relatively high prediction accuracy. The prediction confidence score is defined as follows:
$$\mathrm{Confidence} = Pr(\mathrm{Object}) \times \mathrm{IoU}_{pred}^{truth} = \begin{cases} \mathrm{IoU}_{pred}^{truth}, & Pr(\mathrm{Object}) = 1 \\ 0, & Pr(\mathrm{Object}) = 0 \end{cases}$$
where IoU, a standard indicator in target detection, measures detection accuracy as the overlap ratio between the true bounding box and the bounding box predicted by the detection method. Finally, the YOLOv3 object detection algorithm uses non-maximum suppression to avoid repeated detection of the same target (Figure 5c).
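For reference, the IoU of two boxes given as (x1, y1, x2, y2) corner coordinates can be computed with the following short sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap rectangle (zero area if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```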

3.2. The Fourth Feature Map Improvement

In actual tests, we found that small targets are difficult to detect due to their low resolution and small size. To solve this problem, we obtain more reliable semantic information by adding a fourth 104 × 104 feature map to the original YOLO framework, as shown in Figure 6.
Through a further double up-sampling, the output feature scale of the fourth feature layer is enlarged from 52 × 52 to 104 × 104. The features of the 109th layer and the 11th layer of the feature extraction network are fused through a route layer to make full use of deep and shallow features. The remaining fusions use the outputs of the 85th and 97th layers after double up-sampling: the feature maps of the 85th and 61st layers, and of the 97th and 36th layers, are combined by route layers. As a result, there are four output feature scales: 104 × 104, 52 × 52, 26 × 26, and 13 × 13. Training on the data set showed that the detection effect for large targets is very good, but the result for small targets is still not ideal with the original network. In other words, small targets are often missed, as in Figure 7a using traditional YOLOv3. However, the detection ability increases with the fourth feature map, as shown in Figure 7b. A schematic fusion step for this extra scale is sketched below.
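The sketch below illustrates, in PyTorch-like code, how a route-style fusion of an up-sampled deep feature map with a shallow 104 × 104 feature map could be written; the channel counts are assumptions, and the snippet is not the actual network definition used in the experiments.

```python
import torch
import torch.nn as nn

def fuse_for_fourth_scale(deep_52, shallow_104):
    """Fuse a deep 52x52 feature map with a shallow 104x104 feature map.

    deep_52:     semantic features at 52x52 resolution
    shallow_104: fine-grained features at 104x104 resolution
    """
    up = nn.Upsample(scale_factor=2, mode="nearest")(deep_52)   # 52x52 -> 104x104
    # Route-layer style fusion: concatenate along the channel dimension.
    return torch.cat([up, shallow_104], dim=1)

# Dummy example (batch = 1, assumed channel counts).
deep = torch.randn(1, 128, 52, 52)
shallow = torch.randn(1, 64, 104, 104)
fused = fuse_for_fourth_scale(deep, shallow)   # shape: (1, 192, 104, 104)
```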

3.3. Softer NMS Improvement

Softer NMS adopts the Kullback–Leibler divergence loss (KL-Loss) and a variance voting method [32]; KL-Loss improves the model's ability to learn location information. When multiple prediction boxes overlap, the variance voting method obtains the optimal result through a weighted judgment of each box. Softer NMS is introduced here to solve the problem of inaccurate detection boxes for certain targets.
Take the coordinate information {x1, y1, x2, y2} of a certain bounding box, where (x1, y1) and (x2, y2) represent its upper-left and lower-right corners. Softer NMS first models the coordinate information of the bounding box and the ground truth. Assuming that the bounding box follows a Gaussian distribution, the bounding box prediction model is shown in Equation (8):
$$P_{\Theta}(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x - x_e)^{2}}{2\sigma^{2}}}$$
where P_Θ(x) is the Gaussian distribution function of the bounding box prediction, Θ represents the parameter set to be learned in the bounding box model, x and xe represent the bounding box and the estimated bounding box, and σ is the standard deviation. The smaller the deviation σ, the closer the prediction result is to the actual value.
The location of the ground truth is modeled as a Dirac delta function, as shown in Equation (9):
$$P_{D}(x) = \delta(x - x_g)$$
where xg denotes the location information of the ground truth.
The Kullback–Leibler divergence measures the asymmetry between two probability distributions: the smaller the divergence, the closer the two distributions are. Therefore, Softer NMS calculates the KL divergence between the probability distributions of the bounding box and the ground truth, and the objective is to minimize this KL divergence, which is converted into the KL-Loss. The calculation process is described in Equation (10):
$$\mathrm{Loss}_{reg} = D_{KL}\big(P_{D}(x)\,\|\,P_{\Theta}(x)\big) = \int P_{D}(x)\log P_{D}(x)\,dx - \int P_{D}(x)\log P_{\Theta}(x)\,dx = \frac{(x_g - x_e)^{2}}{2\sigma^{2}} + \frac{\log(\sigma^{2})}{2} + \frac{\log(2\pi)}{2} - H\big(P_{D}(x)\big)$$
Finally, the coordinate information loss function is simplified to Equation (11) following Ref. [28]:
$$\mathrm{Loss}_{reg} = e^{-\alpha}\left(|x_g - x_e| - \frac{1}{2}\right) + \frac{1}{2}\alpha$$
The first step is to delete the bounding boxes below the set confidence threshold. Then, the bounding box with the maximum confidence is selected and the remaining bounding boxes are weighted: the more a remaining bounding box overlaps with the maximum-confidence bounding box, the stronger its confidence attenuation.
Finally, variance is used to weight the bounding boxes, and the best one is selected according to the final confidence value. The variance weighting is calculated as follows:
$$p_i = e^{-\left(1 - \mathrm{IoU}(b_i, b)\right)^{2} / \sigma_t}$$
$$x = \frac{\sum_i p_i\, x_i / \sigma_{x,i}^{2}}{\sum_i p_i / \sigma_{x,i}^{2}}$$
where bi is a candidate bounding box, b is the maximum-confidence bounding box, and σt is the weighting parameter. After weighting, the greater the variance between a bounding box and the ground truth, or the smaller its IoU, the more strongly the confidence of that prediction box is attenuated.
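A compact sketch of these two ingredients is given below: the simplified KL regression loss of Equation (11) (with α standing for the learned log-variance, an assumption consistent with Ref. [32]) and the coordinate-wise variance voting described above. The value σt = 0.05 is an assumed setting for illustration.

```python
import numpy as np

def kl_reg_loss(xg, xe, alpha):
    """Simplified KL regression loss of Equation (11); alpha plays the role of log(sigma^2)."""
    return np.exp(-alpha) * (abs(xg - xe) - 0.5) + 0.5 * alpha

def variance_voting(boxes, variances, scores, sigma_t=0.05):
    """Refine the top-scoring box coordinate-wise by variance-weighted voting.

    boxes:     (N, 4) candidate boxes (x1, y1, x2, y2)
    variances: (N, 4) predicted localization variances for each coordinate
    scores:    (N,)   confidence scores
    """
    boxes = np.asarray(boxes, dtype=float)
    variances = np.asarray(variances, dtype=float)
    best = boxes[int(np.argmax(scores))]

    # IoU of every candidate with the top-scoring box.
    ix1 = np.maximum(boxes[:, 0], best[0])
    iy1 = np.maximum(boxes[:, 1], best[1])
    ix2 = np.minimum(boxes[:, 2], best[2])
    iy2 = np.minimum(boxes[:, 3], best[3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    best_area = (best[2] - best[0]) * (best[3] - best[1])
    ious = inter / (areas + best_area - inter)

    p = np.exp(-(1.0 - ious) ** 2 / sigma_t)   # closeness weight for each candidate
    weights = p[:, None] / variances           # per-coordinate voting weights
    return (weights * boxes).sum(axis=0) / weights.sum(axis=0)
```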
For example, an inaccurate bounding box occurs in Figure 8a. Obviously, in this case, the target detection result is inaccurate, leading to large deviations. Moreover, when the target is far away from the ASV, although the long-strip target is detected in Figure 8c, the match between the bounding box and the target is not good enough. The comparison clearly shows that the target detection results of YOLO-Softer NMS are much better than those of the original YOLOv3: in Figure 8b,d, the bounding boxes are more accurate and fit the target more closely. In conclusion, the proposed Softer NMS optimizes the YOLO network's learning of the location loss function, and the IoU of the identified regions improves by 1.8% and 0.9%, respectively, in Figure 8.

4. Target Tracking Model of Autonomous Surface Vehicle

4.1. Target Detection Method

The long-strip target detection principle is described in Figure 9. The waterproof camera collects environmental images in front of the ASV. In the imaging model for the long-strip target in Figure 9, the coordinate point (x, y) denotes the upper-left corner of the target's bounding box, w and h denote the width and length of the bounding box, and Δx is the pixel distance between the target and the midline of the imaging plane. The area of the yellow bounding box, s = w × h, is derived from the YOLO-Softer NMS method. The vertical distance between the ASV and the target imaging plane is d, and the deviation θ is the angle between the heading of the ASV and the target direction in the body coordinate system. During the target tracking process, the ASV constantly adjusts its heading angle to keep the target image as close to the midline of the imaging plane as possible, so that the ASV can approach the long-strip target along an optimal or suboptimal trajectory.
In this paper, we assume that the long-strip target is of fixed size. Therefore, there is a definite relationship between the imaging size and the relative distance. The values of s, w, and h at different relative distances are compared in Figure 10. It is obvious that the width w, length h, and area s of the bounding boxes decrease as the distance d grows larger. After repeating the experiments more than 10 times, it is verified that the curves formed by the actual data follow the same trend.
Further, we find that the curve of distance d against area s is the smoothest one. Hence, a second-order exponential function is used to fit the measured data, as shown in Equation (14):
$$d = A e^{Bs} + C e^{Ds}$$
where A, B, C, and D are the parameters to be fitted.
By fitting the observed data using the MATLAB fitting toolbox, the four parameters are obtained as A = 0.001372, B = −21.84, C = 7.668, and D = −1.288. The curve-fitting result is illustrated in Figure 11. It is clear that the fitted curve represents most raw data values well.
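The fit above is performed with the MATLAB fitting toolbox; an equivalent fit can be reproduced with scipy.optimize.curve_fit, as in the sketch below. The data arrays are placeholders generated from the reported model, not the actual measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def dist_model(s, A, B, C, D):
    """Second-order exponential model d = A*exp(B*s) + C*exp(D*s) of Equation (14)."""
    return A * np.exp(B * s) + C * np.exp(D * s)

# Placeholder samples: normalized bounding-box area s vs. measured distance d (m).
s_data = np.array([0.02, 0.05, 0.10, 0.20, 0.40, 0.80])
d_data = np.array([7.47, 7.19, 6.75, 5.93, 4.58, 2.74])

params, _ = curve_fit(dist_model, s_data, d_data,
                      p0=(0.001, -20.0, 7.0, -1.0), maxfev=10000)
A, B, C, D = params
print(f"A={A:.4g}, B={B:.4g}, C={C:.4g}, D={D:.4g}")
```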
According to the above analysis, we obtain the mapping relationship between the area of the bounding box and the distance between the ASV and the target.
The horizontal imaging diagram is shown in Figure 12, where L is the focal length of the camera, the maximal distance d can reach 35 m, and Δx is measured in pixels with a value range of [−400, 400]. The horizontal pixel distance Δx between the long-strip-shaped target and the midline is given by Equation (15):
$$\Delta x = x + w/2$$
Therefore, the calculation formula for θ can be obtained as follows:
$$\theta = \arctan\left(\frac{\Delta x \cdot d_x - u_0}{d_x \cdot L}\right)$$
where dx and u0 denote the internal parameters of the camera; after camera calibration, dx and u0 were set to 1705.4 and −0.3284, respectively.
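The sketch below chains Equations (14) and (15) with an angle estimate to map a detected bounding box to the pair (d, θ). Because the text does not state the units of s used in the fit or a numerical focal length, the area normalization is an assumption, and θ is computed with a generic pinhole relation in which dx is treated as a focal length in pixels and u0 as a small principal-point offset; these interpretations are assumptions, not the paper's calibration procedure.

```python
import math

# Constants reported in the text: Equation (14) fit and camera calibration values.
A, B, C, D = 0.001372, -21.84, 7.668, -1.288
DX, U0 = 1705.4, -0.3284

IMG_W, IMG_H = 416, 416   # assumed normalization resolution for the box area

def target_pose_from_bbox(x, y, w, h):
    """Estimate distance d and deviation angle theta from a detected bounding box.

    (x, y) is the upper-left corner of the box; w and h are its width and height
    in pixels. The area normalization below is an assumption.
    """
    s = (w * h) / (IMG_W * IMG_H)                  # normalized bounding-box area
    d = A * math.exp(B * s) + C * math.exp(D * s)  # Equation (14)
    delta_x = x + w / 2.0                          # Equation (15)
    # Generic pinhole relation standing in for Equation (16): DX treated as a
    # focal length in pixels, U0 as a principal-point offset (assumptions).
    theta = math.atan((delta_x - U0) / DX)
    return d, theta
```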

4.2. Target Tracking Method

On the basis of the two inputs d and θ, the applied target tracking method is described in Figure 13. Suppose the starting position of the ASV is P1(x1, y1); the speed and heading adjustment process is repeated at each position Pk iteratively until the target position Pt is reached. When d and θ approach 0, the ASV is assumed to have reached its destination.
A dicyclic PID control diagram is proposed to implement the iterative target tracking, as shown in Figure 14. For the speed controller, a greater d denotes a farther distance between the ASV and the target; in this case, the power of the forward thruster is increased to raise the forward speed of the ASV, and otherwise it is reduced to slow the ASV down. For the heading controller, both d and θ are used as inputs, and d is used as the basis for controlling the power of the steering thruster.
The control law of the dicyclic PID controller is described as follows:
$$\delta_1 = k_{p1}\, d + k_{d1}\, \frac{dd}{dt} + k_{i1} \int_0^t d\, dt$$
$$\delta_2 = k_{p2}\, \theta + k_{d2}\, \frac{d\theta}{dt} + k_{i2} \int_0^t \theta\, dt$$
By setting appropriate PID controller parameters, a stably convergent control law can be obtained, so that δ1 and δ2 gradually converge to zero. If δ1 and δ2 are nonzero at a certain moment, the forward and steering thrusters work together to control the ASV. As a result, the heading angle of the ASV approaches the desired heading angle, and δ1 and δ2 decrease to zero.
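A minimal sketch of the two control loops above is given next; the gains and sampling interval are illustrative assumptions, not the tuned values used on the actual ASV.

```python
class PID:
    """Basic PID term with simple numerical integration and differentiation."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Gains below are illustrative assumptions.
speed_loop = PID(kp=0.8, ki=0.05, kd=0.1)     # delta_1: driven by the distance d
heading_loop = PID(kp=1.2, ki=0.02, kd=0.3)   # delta_2: driven by the deviation angle theta

def control_step(d, theta, dt=0.1):
    """One iteration of the dicyclic controller: returns thruster commands."""
    delta_1 = speed_loop.update(d, dt)        # forward-thruster command
    delta_2 = heading_loop.update(theta, dt)  # steering-thruster command
    return delta_1, delta_2
```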

5. Experiments and Results

5.1. Target Detection Results

In order to obtain a rich data set, we made the ASV approach the target gradually from all directions via remote control and changed the position of the long-strip target. In our experiments, 3502 images were generated. The training server is configured with eight NVIDIA Tesla V100-SXM2 32 GB GPUs, and the software platform is based on Ubuntu 20.04, Darknet, and Python 2.7. Three detection models are compared in this paper: Faster-RCNN, YOLOv3, and YOLO-Softer NMS. In order to ensure the fairness of the comparison, the same initial training parameters were set for each group of experiments. The input resolution was uniformly adjusted to 416 × 416, and the number of iterations was 20,000.
In this experimental scene, the proposed YOLO-Softer NMS algorithm was implemented on the ASV to search the video stream and continuously output the coordinate information of a remote long-strip-shaped target. In Figure 15, as the distance between the ASV and the target decreases, the target image in the imaging plane grows gradually. Therefore, the YOLO-Softer NMS based target detection and tracking algorithm not only has high detection speed but can also identify small and large targets with high accuracy in the application process (Table 2).
The evaluation indicators used in this paper are mean Average Precision (mAP) and speed (FPS). The numerical results are summarized in Table 2.
It is obvious from Table 2 that the performance of YOLO–Softer NMS is superior to Faster-RCNN in both mAP and speed. Furthermore, the proposed YOLO–Softer NMS's mAP reaches 97.09%, while still maintaining the same speed as YOLOv3.

5.2. Motion Control Results

The experiments were carried out in an experimental pool of size 40 m × 20 m. Two experiments with different trajectories were carried out to verify the performance. In the first set of experiments, a long-strip target lies in the middle of the imaging plane, and the ASV moves forward in a straight line to reach the target. In the second set of experiments, the target is on the left side of the imaging plane; in other words, there is a large θ0 in the initial state, which requires the ASV to adjust its heading angle substantially. The whole process involves the ASV adjusting its direction until it is aligned with the remote target and then moving forward. During the two groups of experiments, images were collected during movement, as exhibited in Figure 16. The experimental results verify that the proposed method can obtain the target position well throughout the whole process.
In order to verify the feasibility of the proposed method, we used MATLAB/Simulink to carry out simulation experiments on actual situations by modifying the initial conditions.
In these simulation experiments, the ASV starts at coordinate point (0, 0) or (0, 30) and ends at coordinate point (30, 30). The trajectory diagrams are shown in Figure 17a,b, respectively. Furthermore, the initial angle of the ASV was also set to different values: 0°, 15°, 30°, and 45°. These results show that when the starting angle is larger, the initial trajectory changes more obviously, but all trajectories eventually reach the target.
Figure 18 shows the change curve of the forward speed u. Regardless of the starting angle, the forward speed increases from 0 to the maximum, remains at the maximum for a while, and then decreases as the ASV approaches the target. Figure 19 shows the change of the heading angle θ of the ASV. It is obvious that the larger the initial deviation angle, the longer the stabilization time.

5.3. Lake Test Results

Finally, the proposed long-strip target tracking method is implemented on the self-made Autonomous Surface Vehicle, as shown in Figure 20.
In the first experimental case, the ASV only needs to move forward until it approaches the surface long-strip target; there is only a small deviation angle, as seen in Figure 21a. In the second experimental case, the ASV has a large deviation angle at the beginning; the deviation angle gradually decreases and then fluctuates around zero, as in Figure 21b.
The velocity change curve of the ASV is demonstrated in Figure 22. In the first experimental case, the velocity of the ASV increases from 0 and decreases when it gets closer to the long-strip target. In the second experimental case, the ASV has an upper speed limit during the steering process. After steering, the ASV accelerates again to the maximum speed. Finally, the ASV's speed decreases as it approaches the destination point.
The tracking trajectory is described in Figure 23. The ASV starts from the initial position (0, 0) toward the target position (25, 25), but the deviation angles are different at the initial stage. It can be seen from Figure 23b that the ASV adjusts its heading angle at the beginning and while moving forward, and finally it is driven in the direction of the long-strip target.

6. Conclusions

In this paper, we provide a novel YOLO–Softer NMS algorithm running on an ASV to detect long-strip targets on the water. The coordinate information of the fixed-size long-strip target is obtained with the improved YOLO algorithm. In addition, a dicyclic PID controller is used for the motion control of the ASV. The experimental results verify that the investigated system has higher detection accuracy, with the mAP increased to 97.09%. In our future work, we will consider target detection methods based on acousto-optic information fusion, which will further broaden the application scenarios of the ASV system.

Author Contributions

Conceptualization, M.Z. and W.C.; methodology, W.C. and D.Z.; software, Z.L.; validation, D.Z.; formal analysis, Z.L. and W.C.; investigation, C.S.; writing—original draft preparation, W.C., D.Z. and C.S.; writing—review and editing, W.C., D.Z. and Z.L.; visualization, M.Z. and C.S.; supervision, M.Z. and C.S.; project administration, C.S. and W.C.; funding acquisition, C.S. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the Natural Science Foundation of Zhejiang Province (No. LZJWY22E090001 and LZ22F010004), the Fundamental Research Funds for the Provincial Universities of Zhejiang (No. GK209907299001-001), the National Natural Science Foundation of China (No. 62271179 and No. 61871163), the Scientific Research Foundation of Zhejiang University of Water Resources and Electric Power (XT-202105), the Stable Supporting Fund of Acoustics Science and Technology Laboratory, and the Foundation of Science and Technology on Near-Surface Detection Laboratory under Grant 6142414200410.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available on request due to restrictions (e.g., privacy or ethical considerations).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Teixeira, E.; Araujo, B.; Costa, V.; Mafra, S.; Figueiredo, F. Literature Review on Ship Localization, Classification, and Detection Methods Based on Optical Sensors and Neural Networks. Sensors 2022, 22, 6879. [Google Scholar] [CrossRef] [PubMed]
  2. Liang, J.-M.; Mishra, S.; Cheng, Y.-L. Applying Image Recognition and Tracking Methods for Fish Physiology Detection Based on a Visual Sensor. Sensors 2022, 22, 5545. [Google Scholar] [CrossRef] [PubMed]
  3. Vagale, A.; Oucheikh, R.; Bye, R.T.; Osen, O.L.; Fossen, T.I. Path planning and collision avoidance for autonomous surface vehicles I: A review. J. Mar. Sci. Technol. 2021, 26, 1292–1306. [Google Scholar] [CrossRef]
  4. Zhang, X.D.; Liu, S.L.; Liu, Y.; Hu, X.F.; Gao, C. Review on development trend of launch and recovery technology for USV. Chin. J. Ship Res. 2018, 13, 50–57. [Google Scholar]
  5. Liu, W.; Liu, Y.; Bucknall, R. A Robust Localization Method for Unmanned Surface Vehicle (USV) Navigation Using Fuzzy Adaptive Kalman Filtering. IEEE Access 2019, 7, 46071–46083. [Google Scholar] [CrossRef]
  6. Busquets, J.; Zilic, F.; Aron, C.; Manzoliz, R. AUV and ASV in twinned navigation for long term multipurpose survey applications. In Proceedings of the MTS/IEEE OCEANS, Bergen, Norway, 10–14 June 2013. [Google Scholar]
  7. Wu, J.; Liu, J.; Xu, H. A variable buoyancy system and a recovery system developed for a deep-sea AUV Qianlong I. In Proceedings of the OCEANS 2014, Taipei, Taiwan, 7–10 April 2014. [Google Scholar]
  8. Venkatesan, S. AUV for Search & Rescue at sea—An innovative approach. In Proceedings of the 2016 IEEE/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan, 6–9 November 2016; pp. 1–9. [Google Scholar]
  9. Martins, R.; De Sousa, J.B.; Afonso, C.C.; Incze, M.L. REP10 AUV: Shallow water operations with heterogeneous autonomous vehicles. In Proceedings of the OCEANS 2011 IEEE—Spain, Santander, Spain, 6–9 June 2011. [Google Scholar]
  10. Rashid, M.; Roy, R.; Ahsan, M.M.; Siddique, Z. Design and Development of an Autonomous Surface Vehicle for Water Quality Monitoring. Electr. Eng. Syst. Sci. 2022, 1, 1–14. [Google Scholar] [CrossRef]
  11. Im, S.; Kim, D.; Cheon, H.; Ryu, J. Object Detection and Tracking System with Improved DBSCAN Clustering Using Radar on Unmanned Surface Vehicle. In Proceedings of the 2021 21st International Conference on Control, Automation and Systems (ICCAS), Jeju, Korea, 12–15 October 2021. [Google Scholar]
  12. Xu, H.X.; Jiang, C.L. Heterogeneous oceanographic exploration system based on USV and AUV: A survey of developments and challenges. J. Univ. Chin. Acad. Sci. 2021, 38, 145–159. [Google Scholar]
  13. Yang, Z.; Li, Y.; Wang, B.; Ding, S.; Jiang, P. A Lightweight Sea Surface Object Detection Network for Unmanned Surface Vehicles. J. Mar. Sci. Eng. 2022, 10, 965. [Google Scholar] [CrossRef]
  14. Park, H.; Ham, S.-H.; Kim, T.; An, D. Object Recognition and Tracking in Moving Videos for Maritime Autonomous Surface Ships. J. Mar. Sci. Eng. 2022, 10, 841. [Google Scholar] [CrossRef]
  15. Masita, K.L.; Hasan, A.N.; Shongwe, T. Deep Learning in Object Detection: A Review. In Proceedings of the International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 6–7 August 2020; pp. 1–11. [Google Scholar]
  16. Li, X.; Nishida, Y.; Myint, M.; Yonemori, K.; Mukada, N.; Lwin, K.N.; Takayuki, M.; Minami, M. Dual-eyes vision-based docking experiment of AUV for sea bottom battery recharging. In Proceedings of the OCEANS 2017—Aberdeen, Aberdeen, UK, 19–22 June 2017; pp. 1–5. [Google Scholar]
  17. Neves, G.; Cerqueira, R.; Albiez, J.; Oliveira, L. Rotation-invariant shipwreck recognition with forward-looking sonar. Comput. Vis. Pattern Recognit. 2019, 1, 1–14. [Google Scholar] [CrossRef]
  18. Maire, F.; Prasser, D.; Dunbabin, M.D.; Ict, C.; Dawson, M. A Vision Based Target Detection System for Docking of an Autonomous Underwater Vehicle. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), Sydney, Australia, 2–4 December 2009; pp. 1–7. [Google Scholar]
  19. Zhang, Y.H.; Wu, S.; Liu, Z.H.; Yang, Y.J.; Zhu, D.; Chen, Q. A real-time detection USV algorithm based on bounding box regression. J. Phys. Conf. Ser. 2020, 1544, 12–22. [Google Scholar] [CrossRef]
  20. Jin, J.; Zhang, J.; Liu, D.; Shi, J.; Wang, D.; Li, F. Vision-Based Target Tracking for Unmanned Surface Vehicle Considering Its Motion Features. IEEE Access 2020, 8, 132655–132664. [Google Scholar] [CrossRef]
  21. Liu, H.; Sun, F.; Gu, J.; Deng, L. SF-YOLOv5: A Lightweight Small Object Detection Algorithm Based on Improved Feature Fusion Mode. Sensors 2022, 22, 5817. [Google Scholar] [CrossRef] [PubMed]
  22. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  24. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Lecture Notes in Computer Science; Lecture Notes in Computer Science: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  25. Kulkarni, M.; Junare, P.; Deshmukh, M.; Rege, P.P. Visual SLAM Combined with Object Detection for Autonomous Indoor Navigation Using Kinect V2 and ROS. In Proceedings of the 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA), New Delhi, India, 17–19 December 2021; pp. 478–482. [Google Scholar]
  26. Chen, B.; Peng, G.; He, D.; Zhou, C.; Hu, B. Visual SLAM Based on Dynamic Object Detection. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 5966–5971. [Google Scholar]
  27. Hu, J.; Fang, H.; Yang, Q.; Zha, W. MOD-SLAM: Visual SLAM with Moving Object Detection in Dynamic Environments. In Proceedings of the 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 4302–4307. [Google Scholar]
  28. Li, Y.; Zhang, X.; Shen, Z. YOLO-Submarine Cable: An Improved YOLO-V3 Network for Object Detection on Submarine Cable Images. J. Mar. Sci. Eng. 2022, 10, 1143. [Google Scholar] [CrossRef]
  29. Kim, J.-H.; Kim, N.; Park, Y.W.; Won, C.S. Object Detection and Classification Based on YOLO-V5 with Improved Maritime Dataset. J. Mar. Sci. Eng. 2022, 10, 377. [Google Scholar] [CrossRef]
  30. Liu, T.; Pang, B.; Zhang, L.; Yang, W.; Sun, X. Sea Surface Object Detection Algorithm Based on YOLO v4 Fused with Reverse Depthwise Separable Convolution (RDSC) for USV. J. Mar. Sci. Eng. 2021, 9, 753. [Google Scholar] [CrossRef]
  31. Yildiz, Ö.; Gökalp, R.B.; Yilmaz, A.E. A review on motion control of the Underwater Vehicles. In Proceedings of the 2009 International Conference on Electrical and Electronics Engineering—ELECO 2009, Bursa, Turkey, 5–8 November 2009; pp. II-337–II-341. [Google Scholar]
  32. He, Y.H.; Zhu, C.C.; Wang, J.R.; Savvides, M.; Zhang, X.Y. Bounding Box Regression with Uncertainty for Accurate Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
Figure 1. Typical application scenario.
Figure 2. The main flow of our proposed algorithm.
Figure 3. Inertial Coordinate System and Body Coordinate System.
Figure 4. The improved structure of YOLO–Softer NMS.
Figure 5. The detection process of original YOLOv3. (a) Original image, (b) bounding boxes, (c) final result.
Figure 6. The fourth feature map for YOLO.
Figure 7. Small target detecting results. (a) Using original YOLOv3, (b) using YOLO with the fourth feature map.
Figure 8. Target detection results with YOLOv3 and YOLO–Softer NMS: (a) near case with YOLOv3, (b) near case with YOLO–Softer NMS, (c) far case with YOLOv3, (d) far case with YOLO–Softer NMS.
Figure 9. Long-strip target tracking principle.
Figure 10. The curve diagrams of area s, width w, length h, and distance d.
Figure 11. The curve-fitting result.
Figure 12. Horizontal imaging diagram of camera.
Figure 13. Iterative target tracking model.
Figure 14. Dicyclic PID control diagram.
Figure 15. Different views with different distance.
Figure 16. Different views during movement process of ASV: (a) forward experiment, (b) left and forward experiments.
Figure 17. The trajectory diagrams with two paths: (a) from (0, 0) to (30, 30), (b) from (0, 30) to (30, 30).
Figure 18. The change curve of forward speed u.
Figure 19. The change curve of heading angle θ.
Figure 20. Self-made Autonomous Surface Vehicle.
Figure 21. Heading angle variations curve of ASV: (a) forward, (b) left and forward.
Figure 22. The velocity variations curve of ASV: (a) forward, (b) left and forward.
Figure 23. Practical target tracking trajectory: (a) forward, (b) left and forward.
Table 1. Notations used in this paper.

Notation | Meaning
M | Inertia matrix
C(v) | Coriolis and centrifugal terms matrix
D | Damping matrix
τ | Force and moment of ASV
v = [u, v, r]T | Velocity vector
η = [x, y, ψ]T | Position vector
J(η) | Conversion matrix
IoU | Intersection over Union (detection accuracy indicator)
s = w × h | Bounding-box area (width × length)
θ | Angle deviation
d | Distance between ASV and target
Table 2. Test Results.

Model | mAP (%) | Speed (FPS)
Faster-RCNN | 85.67 | 16
YOLOv3 | 92.52 | 27
YOLO–Softer NMS | 97.09 | 27