Research on an Underwater Target-Tracking Method Based on Zernike Moment Feature Matching

Abstract: Sonar images have lower resolution and blurrier edges than optical images, which makes feature matching in underwater target tracking less robust. To solve this problem, we propose a particle filter (PF)-based underwater target-tracking method utilizing Zernike moment feature matching. Zernike moments are used to construct the feature-description vector for feature matching and contribute to the update of particle weights. In addition, the particle state transition method is optimized by using a first-order autoregressive model. In this paper, we compare Hu moments with Zernike moments, and we also compare the tracking results obtained with and without the optimized particle state transition. The experimental results based on an AUV (autonomous underwater vehicle) show that the robustness and accuracy of this method are better than those of the other combined methods considered in this paper.


Introduction
The PF is a commonly used state estimation method that has unique advantages for nonlinear processes and non-Gaussian models in practical problems [1,2]. It is currently widely used in scenarios such as target tracking, robot localization, etc. I. Masmitja [3] used a combination of PF and extended Kalman filter (EKF) to achieve the localization and tracking of an AUV, and they found that the PF provides better estimation for moving targets, while the EKF is more suitable for static targets.
Target characterization and recognition are the key aspects of tracking. In recent years, deep learning has been applied to target recognition [4], e.g., Shan Ma [5] proposed a method to solve the optimal feature combination based on GRNN (general regression neural network) judging criterion. The literature reference [6] used FCN (fully convolutional networks) to train and test an underwater sonar image data set and expressed the data set in a pixel matrix. However, due to the complexity of the underwater environment and noise overlay, dynamic target tracking still has urgent problems to be solved, such as harder target identification and poorer tracking accuracy. Manual feature-extraction methods are still of great importance.
Generally, we can characterize the target by the edges, contours, shapes, textures, regions, histograms, moment features, transformation coefficients, etc., in the images.
However, in dynamic underwater observation, sonar images are of lower resolution and are more susceptible to noise interference compared to optical images. The relative motion between the sonar and the target can easily lead to distortions in target imaging [7,8]. Therefore, a single-feature description method such as contour or texture is less accurate. There are many scholars who use histogram methods, such as Songwei Huang [9], who used a color-feature histogram to construct feature-description vectors for sonar images, and Xiao Wang [10], who extracted a color histogram and the shape texture to describe features together. The histogram can reflect the probability distribution of pixel values but lacks the target spatial location and shape information. In recent years, the multifeature fusion method has often been used to solve the tracking of moving targets in cluttered environments. The color features, edge features, and direction features of the target are the more commonly fused features. This method helps to improve the tracking accuracy, but fusing more features will enhance the complexity of the algorithm and affect the tracking in real time [11,12].
In this paper, we study the application of invariant moments in underwater target tracking. Hu moments, Zernike moments, Radon moments, etc., all belong to statistical feature-description methods. Because of the invariance in terms of translation, rotation, and scale change, these methods are suitable for image matching, pattern recognition, and other fields. Tiedong Zhang [13] designed an observation model in a PF tracker by fusing the area features and Hu moment features of the target region. Ziqi Wang [14] and Ji Li [15] conducted experiments in an indoor pool, and they demonstrated that using the Hu moment method can obtain better tracking results than the grayscale template matching method. However, these experiments had fewer disturbances and lacked the dynamic process of more outdoor environments. Hengguang Li [16] compared the performance of different moments on processing the sonar images of marine organisms and demonstrated that Zernike moments are more advantageous than Hu and Radon moments for static target recognition, but the robustness of dynamic target recognition was not further explored.
Hu invariant moments consist only of a nonlinear combination of second- or third-order normalized central moments. In contrast, Zernike moments are defined in the unit circle, and they are orthogonal moments and outperform Hu moments in terms of information redundancy. Zernike low-order moments can describe the overall shape of the target, and the extracted features are less correlated and more noise-resistant. In addition, arbitrary higher-order moments can be constructed to describe the image details [17]. In this paper, in order to study the robustness and accuracy of invariant moment features in dynamic target recognition, we use Zernike moments to describe the features and construct the observation model based on the assumption of an approximate 2D tracking scenario. Also, the Hu moment method is implemented and compared.
This paper contains four sections. The first section discusses the origin of the target-tracking method. The second section introduces the hardware platform and algorithm design of the tracking system in the PF framework. The third section presents the experimental and data analysis results, and demonstrates the feasibility and robustness of the algorithm. The fourth section provides a conclusion to the whole paper and an outlook for the future.

Hardware Architecture
The target-tracking sensor system of the AUV in this paper mainly includes an IMAGENEX MODEL 881L mechanical scanning sonar, an HsKINS-SG-4500C1 fiber-optic gyro, a PATHFINDER DVL, an altimeter, etc. The sensor distribution of the AUV tracking system is shown in Figure 1. The observation module consists of a mechanical scanning sonar. The navigation module consists mainly of a fiber-optic gyro, DVL, and altimeter, and this module calculates the position and attitude information of the AUV. The control module uses position closed-loop and velocity closed-loop control. We fuse the position and attitude information of the AUV to correct the distortion of the sonar image, then we obtain the target information in an odometry (referred to as odom hereafter) coordinate system, which is then input into the PF tracker (Figure 2).

Data Preprocessing
The mechanical scanning sonar is a single-beam sonar which has the characteristics of simpler structure, lower cost, and lower power consumption compared to the multibeam sonar, and it can easily be mounted on a small AUV. The echo data of each beam comprise a Ping, and each Ping contains 500 points to record the echo information. Each point is recorded as a Bin, which contains the position coordinates and echo intensity information. In the sonar coordinate system, when we analyze the position and intensity information of the 500 points of each echo beam, we use PCL (Point Cloud Library) to construct a point cloud to express the echo information. In this paper, the navigation module error is 0.7% of the distance. Since the research in this paper focuses on the application of invariant moments to dynamic target feature recognition, we designed the time and distance scales to be small in the experiment so that the positioning drift error can be ignored.
The target detection is performed by the process in Figure 3. After obtaining the point cloud data in the odom coordinate system, we can extract the position of the target to be tracked. The position information of the target is described by the centroid coordinates of the extracted target point cloud.
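As a small illustration of the last step, the sketch below computes the centroid of an extracted target point cloud. It assumes the target points have already been segmented and are given as plain (x, y) tuples in the odom frame; the function name `centroid` is hypothetical, and an unweighted mean is assumed because the text does not say whether echo intensity is used as a weight.

```python
def centroid(points):
    """Unweighted centroid of a list of (x, y) points (assumed already segmented)."""
    if not points:
        raise ValueError("empty point cloud")
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return (cx, cy)
```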

Calibration
In order to obtain the observation data in the odom coordinate system, a coordinate transformation from the sonar coordinate system to the base_link carrier coordinate system and then to the odom coordinate system is required. In this paper, the base_link coordinate system and the fiber-optic gyro coordinate system are considered to be the same, and the base_link X axis passes through the center of the sonar installation position. The dynamic transformation relationship between the base_link and odom is obtained from the navigation module. The transformation relationship between the sonar coordinate system and the base_link is static, and its translational transformation can be approximated by the sensor mounting position; as for the rotation relationship, the calibration of the horizontal deflection β of the two coordinate systems is required (Figure 4). The AUV needs to observe obstacle A at two positions B and C, and the yaw angles at the two positions are similar (an error within 5° can be ignored). There are three sets of data needed to solve β:
• The absolute coordinates of the AUV at B and C, obtained from GPS;
• The relative coordinates of target A, observed from positions B and C, in the sonar coordinate system;
• The yaw angles at B and C.
The value of β is −2.70° after performing statistical processing for multiple sets of experiments. The transformation matrix for converting points under the sonar coordinate system to the base_link combines the rotation by β with the translation given by the sensor mounting position. Finally, the dynamic conversion from the base_link to the odom coordinate system is realized by fusing the position and attitude information of the AUV in real time.
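The chain sonar, base_link, odom can be sketched as two planar rigid transforms. This is only an illustration under stated assumptions: the mounting offset `mount_xy`, the function names, and the sign convention of β (counterclockwise positive about the vertical axis) are hypothetical and not taken from the paper; only the value β = −2.70° comes from the text.

```python
import math

BETA_DEG = -2.70  # horizontal deflection reported in the text


def sonar_to_base_link(p, mount_xy=(0.5, 0.0), beta_deg=BETA_DEG):
    """Rotate a sonar-frame point by beta, then shift by the mounting offset.

    mount_xy is a made-up sonar position in base_link, in metres.
    """
    b = math.radians(beta_deg)
    x, y = p
    return (math.cos(b) * x - math.sin(b) * y + mount_xy[0],
            math.sin(b) * x + math.cos(b) * y + mount_xy[1])


def base_link_to_odom(p, auv_xy, auv_yaw_rad):
    """Apply the AUV pose (from the navigation module) to a base_link point."""
    x, y = p
    c, s = math.cos(auv_yaw_rad), math.sin(auv_yaw_rad)
    return (c * x - s * y + auv_xy[0], s * x + c * y + auv_xy[1])


def sonar_to_odom(p, auv_xy, auv_yaw_rad):
    """Static calibration followed by the dynamic pose transform."""
    return base_link_to_odom(sonar_to_base_link(p), auv_xy, auv_yaw_rad)
```

Because the AUV pose changes while the mechanical sonar sweeps, `base_link_to_odom` would be evaluated with the pose valid at the time of each Ping rather than once per full scan.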

Feature-Description Vector Construction
Target detection requires the identification and localization of a specific target object from an image, and the target detection step described above relies on prior information to obtain the position of the target of interest. Feature extraction is a multidimensional mathematical description of the target, which in turn enables the automatic identification of the target and is the basis of target detection. In this paper, we have implemented both Zernike moment-based and Hu moment-based [18] feature-matching methods and compared the robustness and accuracy of both.
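The regular moments project the image function $f(x, y)$ onto the monomial base $x^p y^q$. Further, we can extend the base to a more general polynomial base. Zernike proposed a set of orthogonal polynomials $V_{nm}(x, y)$, which are orthogonal in the unit circle $x^2 + y^2 \le 1$. Zernike's $n$th-order polynomial $V_{nm}(x, y)$ is defined as

$$V_{nm}(x, y) = V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta}, \tag{7}$$

where $n = 0, 1, 2, \ldots, \infty$, and $m$ takes on positive and negative integer values subject to the following conditions:

$$n - |m| \ \text{is even}, \qquad |m| \le n.$$

In Equation (7), $\rho$ is the length of the vector from the origin to the pixel point $(x, y)$, with $-1 \le x, y \le 1$, and $\theta$ is the angle between the vector and the X axis. $R_{nm}(\rho)$ is a radial polynomial, which can be expressed as

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s\,(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{\,n-2s}.$$

M.R. Teague proposed Zernike moments based on the theory of orthogonal polynomials [19]. For a two-dimensional image function $f(x, y)$, the $n$th-order Zernike moment with repetition $m$ is defined as

$$Z_{nm} = \frac{n+1}{\pi} \iint_{x^2+y^2 \le 1} f(x, y)\, V_{nm}^{*}(x, y)\, dx\, dy,$$

where * indicates that the conjugate is taken, and the above equation is expressed in the polar coordinate system as

$$Z_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} f(\rho, \theta)\, R_{nm}(\rho)\, e^{-jm\theta}\, \rho\, d\rho\, d\theta. \tag{11}$$

The Zernike moments of a two-dimensional image are complex numbers, whose real and imaginary parts are, respectively,

$$\operatorname{Re}(Z_{nm}) = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} f(\rho, \theta)\, R_{nm}(\rho) \cos(m\theta)\, \rho\, d\rho\, d\theta,$$

$$\operatorname{Im}(Z_{nm}) = -\frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} f(\rho, \theta)\, R_{nm}(\rho) \sin(m\theta)\, \rho\, d\rho\, d\theta.$$

The image analysis in practical problems requires a discretization of Equation (11):

$$Z_{nm} = \frac{n+1}{\pi} \sum_{x}\sum_{y} f(x, y)\, V_{nm}^{*}(x, y), \qquad x^2 + y^2 \le 1.$$

In the calculation of Zernike moments, the center of the target image region must be used as the origin (position normalization), and the pixel coordinates are projected into the unit circle (scale normalization). In this paper, the transformation method for solving Zernike moments proposed in the literature [20] is adopted to compute the moments for the sonar point cloud.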
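To make the discrete definition concrete, here is a minimal sketch that evaluates the radial polynomial and a Zernike moment for a small set of weighted points. It does not reproduce the transformation method of [20]; the function names, the use of the intensity-weighted centroid for position normalization, and scaling by the farthest point for scale normalization are assumptions made for illustration.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); needs |m| <= n and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(points, n, m):
    """Discrete Z_nm for points given as (x, y, intensity) tuples.

    Points are shifted to their intensity-weighted centroid and scaled so
    that the farthest point lies on the unit circle (assumed normalization).
    """
    w = sum(p[2] for p in points)
    cx = sum(p[0] * p[2] for p in points) / w
    cy = sum(p[1] * p[2] for p in points) / w
    r_max = max(math.hypot(p[0] - cx, p[1] - cy) for p in points) or 1.0
    z = 0j
    for x, y, f in points:
        rho = math.hypot(x - cx, y - cy) / r_max
        theta = math.atan2(y - cy, x - cx)
        # conjugate basis function: R_nm(rho) * exp(-j m theta)
        z += f * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * z
```

Rotating the point set only multiplies `zernike_moment(points, n, m)` by a unit-modulus phase factor, so its magnitude `abs(...)` is the rotation-invariant quantity that would enter a feature vector.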
The feature vector constructed with the seven Hu moment values is seven-dimensional, i.e., $(\phi_1, \phi_2, \ldots, \phi_7)$, while the dimensionality of the feature vector constructed with Zernike moments is related to the values of $n$ and $m$. Therefore, we take the combinations of $n$ and $m$ values listed in Table 1 to construct a seven-dimensional vector of Zernike invariant moments so that it can be compared directly with the Hu moments.

Tracking System Initialization
The basic idea of the PF is to represent the posterior probability distribution with a series of random state samples obtained from the posterior. The PF algorithm estimates the statistical properties of random variables based on the state and respective weights of the samples after the samples pass through the nonlinear system. The samples here are the so-called particles, and a particle is a possible hypothesis of the real-world state at moment $t$. The PF algorithm consists of initialization, state transition, update, and resampling steps [21], and its algorithmic flow is shown in Figure 5. In this paper, we approximate the tracking scenario as two-dimensional, considering only the position information as a state quantity. The position is written as $(x, y)$, where $x$ and $y$ are the coordinates of the target centroid position in the odom coordinate system.
Firstly, we construct the feature template using the first three frames of the sonar point cloud. Considering that the Zernike moments are solved on the orthogonal basis of the unit circle and that the tracking target is blocky and not elongated, a circle is used to construct the feature template. We calculate the farthest distance of the target point cloud to be tracked from the segmentation and approximate the distance value as the template's radius r (after calculation, this paper takes r = 1 m). Then, we can calculate three feature-description vectors of the target, one in each of the three frames, and construct the template feature vector using the mean values of the corresponding elements of the three vectors. Further, we can initialize the target position $(x_0, y_0)$ using the motion state of the target in the third frame and then randomly initialize particles within a certain range of the target. Finally, the individual particle positions are obtained as

$$x_0^i = x_0 + \Delta x^i, \qquad y_0^i = y_0 + \Delta y^i, \qquad i = 1, 2, \ldots, N,$$

where $N$ is the total number of particles, which is set to 120 in this paper. $\Delta x^i$ and $\Delta y^i$ are expressed as

$$\Delta x^i,\ \Delta y^i \sim \mathcal{N}(0, \sigma^2), \tag{17}$$

i.e., each is a random number obeying a zero-mean Gaussian distribution, and the initial particle weights are

$$w_0^i = \frac{1}{N}.$$

When subsequent point clouds arrive, the positions and weights of the particles are updated based on the state transition model and the observation model.
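The initialization described above can be sketched as follows. The template is the element-wise mean of three feature vectors, and the particles are scattered around the initial target position with equal weights. The spread `sigma` is a hypothetical value (the text only says "within a certain range"), and the function names are illustrative.

```python
import random


def mean_template(vectors):
    """Element-wise mean of equally long feature vectors (e.g. three frames)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def init_particles(x0, y0, n=120, sigma=0.3, rng=None):
    """Scatter n particles around (x0, y0) with Gaussian offsets.

    n = 120 follows the text; sigma (metres) is an assumed value.
    Returns (particles, weights) with uniform weights 1/n.
    """
    rng = rng or random.Random()
    particles = [(x0 + rng.gauss(0.0, sigma), y0 + rng.gauss(0.0, sigma))
                 for _ in range(n)]
    weights = [1.0 / n] * n
    return particles, weights
```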

State Transition Model
PF tracking algorithms are mostly based on the smooth motion assumption in the state transition process, i.e., no abrupt changes occur during the target motion. The literature [22] reviewed the application of PF algorithms in smooth motion tracking. For the case of drastic changes in target appearance and attitude, a PF tracking algorithm based on a memory mechanism was proposed in the literature [23], which used historical target state sequences to achieve the simultaneous estimation of target position and attitude, but this algorithm has a high spatial complexity. In the real environment, the motion of the target has more autonomy, which is difficult to accurately express using mathematical modeling. The first-order autoregressive model [13][14][15] is often used for state transition analysis. This paper draws on this idea and also optimizes the state transition process for the possible degradation of the particle weights.
Only considering the position state of the target, the first-order autoregressive model is written as

$$x_t = A\,x_{t-1} + B\,\xi_x, \tag{20}$$

$$y_t = A\,y_{t-1} + B\,\xi_y. \tag{21}$$

To simplify the state transition model, $A = 1$ and $B = 1$ are taken in Equations (20) and (21), i.e., the current position is the previous moment's position plus the Gaussian term $\xi$, where $\xi$ is a random number obeying the zero-mean Gaussian distribution $\mathcal{N}(0, \sigma^2)$. The Gaussian variance is selected by considering the maximum edge distance of the tracked target and its motion speed (about 0.1 m/s in this experiment). The above-mentioned particle state transition method is the Gaussian random method for each frame of observation. For a dynamic target with autonomous motion, this method generates a large number of particles that move in the opposite direction of the target motion. Therefore, this method could reduce the ability of the particles to "catch" the target, cause degradation of the particle weights, and reduce the efficiency of the algorithm. Thus, we propose an optimized particle state transition method in contrast to the above method.
At the state transition of each frame, the two most recent position estimations are considered together. The angle between the line connecting these two adjacent position points and the X axis is $\alpha$, as in Figure 6, and $\Delta x$ and $\Delta y$ denote the displacement between them in the X and Y directions. We define two modes of particle state transition and denote the distance between the positions of two adjacent frames as L.
Mode 1: If L ≤ 0.3 m, we consider that the target motion is slow and use the ordinary particle state transition method outlined in the previous section.
Mode 2: If L > 0.3 m, we consider that the target keeps a faster motion and let the particles tend to the direction of the target motion with a certain probability (which is 0.5) when performing particle state transition. We then generate Gaussian random numbers $\xi_x$ and $\xi_y$, obeying the $\mathcal{N}(0, \sigma^2)$ distribution, in the X and Y directions and compute $|\alpha|$:
• If $|\alpha| < 30°$ or $|\alpha| > 150°$, the target is considered to be moving more towards the X axis. We adjust $\xi_x$ and $\xi_y$ so that the larger absolute value corresponds to the motion in the X direction and set $\xi_x$ and $\xi_y$ to have the same sign as $\Delta x$ and $\Delta y$, respectively.
• If $|\alpha| > 60°$ and $|\alpha| < 120°$, the target is considered to be moving more towards the Y axis. We adjust $\xi_x$ and $\xi_y$ so that the larger absolute value corresponds to the motion in the Y direction and set $\xi_x$ and $\xi_y$ to have the same sign as $\Delta x$ and $\Delta y$, respectively.
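One possible reading of the two-mode state transition is sketched below for a single particle. The 0.3 m distance threshold, the probability 0.5, and the 30°/60°/120°/150° bands come from the text; `sigma`, the function name, and the handling of the angle bands the text does not mention (left as plain Gaussian steps here) are assumptions.

```python
import math
import random


def transition(particle, prev_est, curr_est, sigma=0.2, rng=random,
               fast_dist=0.3, p_bias=0.5):
    """Move one particle, biasing it along the target motion when it is fast.

    prev_est and curr_est are the two most recent position estimates.
    """
    dx = curr_est[0] - prev_est[0]
    dy = curr_est[1] - prev_est[1]
    step = math.hypot(dx, dy)
    gx, gy = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    # Mode 2 applies only for fast motion, and then only with probability p_bias.
    if step > fast_dist and rng.random() < p_bias:
        ang = abs(math.degrees(math.atan2(dy, dx)))
        big = max(abs(gx), abs(gy))
        small = min(abs(gx), abs(gy))
        if ang < 30 or ang > 150:      # motion mostly along X
            gx, gy = math.copysign(big, dx), math.copysign(small, dy)
        elif 60 < ang < 120:           # motion mostly along Y
            gx, gy = math.copysign(small, dx), math.copysign(big, dy)
        # Other angles: not specified in the text, draws are left unchanged.
    return (particle[0] + gx, particle[1] + gy)
```

With `p_bias=0.5`, roughly half of the particles keep an unbiased random step in Mode 2, which preserves some diversity in case the target reverses direction.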

Observation Model
For each frame, we take particle samples from the prior distribution $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$, then obtain a point cloud of the same range as the initialized template at each particle position after state transition and calculate the feature vector of this local point cloud. Finally, we can update the weight of each particle according to its similarity with the template feature vector.
For the feature vectors constructed with Zernike or Hu moments, the data are processed in the same way. To facilitate the analysis, each moment value is first transformed. Then, the cosine formula is used to obtain the similarity of the current feature vector $\mathbf{F}$ and the template feature vector $\mathbf{F}_T$. Define the cosine value as $c$:

$$c = \frac{\mathbf{F} \cdot \mathbf{F}_T}{\lVert \mathbf{F} \rVert\, \lVert \mathbf{F}_T \rVert}.$$

The closer $c$ is to 1, the more similar the two vectors are. In order to compare similarity in linear dimension, every obtained $c$ value is converted to the corresponding vector angle $\gamma$:

$$\gamma = \arccos(c).$$

The smaller $\gamma$ is, the more similar the two point clouds are, and the corresponding particle weight should be larger. In order to avoid the impact of extreme differences on the algorithm performance, a threshold $\gamma_{th}$ is set. In the first calculation of particle weights, the $\gamma$ values corresponding to all particles are arranged in an array from smallest to largest, and the value at 2/3 of the total number of particles from the minimum is taken as the threshold $\gamma_{th}$ of the whole tracking process. Then, we use the minimum absolute difference (MAD) function to calculate the similarity value for each particle. A particle with $\gamma$ greater than $\gamma_{th}$ is considered to be too different from the template, and its weight is set to 0. The probability distribution $p(\mathbf{z}_t \mid \mathbf{x}_t^i)$ based on the angular similarity is then defined with a constant parameter. Thus, the particle weights based on sequence importance sampling are updated as

$$w_t^i = w_{t-1}^i\, p(\mathbf{z}_t \mid \mathbf{x}_t^i). \tag{30}$$

After normalizing the particle weights, we sum the weighted states to obtain an estimate of the target position in the current frame,

$$(\hat{x}_t, \hat{y}_t) = \sum_{i=1}^{N} w_t^i\, (x_t^i, y_t^i), \tag{31}$$

and wait for the next frame to arrive.
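The weight update can be illustrated with the following sketch. The cosine angle, the threshold taken at the 2/3 rank of the first frame's sorted angles, the zero weight beyond the threshold, the multiplicative update, and the weighted-mean estimate follow the text. The exponential likelihood `exp(-lam * angle)` and the value of `lam` are stand-in assumptions, since the paper's exact probability expression and its MAD step are not reproduced here; all function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle (radians) between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return math.acos(c)


def rank_threshold(angles, frac=2.0 / 3.0):
    """Angle found at the given rank fraction of the sorted angles."""
    ordered = sorted(angles)
    idx = min(len(ordered) - 1, int(frac * len(ordered)))
    return ordered[idx]


def update_weights(prev_w, angles, thresh, lam=5.0):
    """Multiply previous weights by an assumed likelihood, then normalize."""
    raw = [w * math.exp(-lam * a) if a <= thresh else 0.0
           for w, a in zip(prev_w, angles)]
    total = sum(raw)
    if total == 0.0:  # every particle rejected: fall back to uniform weights
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def estimate(particles, weights):
    """Weighted mean of the particle positions."""
    return (sum(w * p[0] for p, w in zip(particles, weights)),
            sum(w * p[1] for p, w in zip(particles, weights)))
```

In use, `rank_threshold` would be called once on the first frame's angles and the returned value reused for the rest of the track, as the text describes.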

Resampling
The resampling method is designed to solve the particle weight degradation problem that occurs during the iterative process of the PF. This method can effectively allow the PF to avoid wasting computing power on particles with tiny weights. The essence is to copy the particles with large weights, eliminate the particles with small weights, and assign the same weights to the new particles. Define the effective sampling scale:

$$N_{eff} = \frac{1}{\sum_{i=1}^{N} \left(w_t^i\right)^2}.$$

After updating the weights of particles using each frame's data, if the $N_{eff}$ value is less than 2/3 of the total number of initial particles, it is considered necessary to perform particle resampling. When resampling, Gaussian random numbers are added to the positions of the new particles obtained by replication, with the aim of enhancing particle diversity. After resampling, the weights of all new particles are 1/N.
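A minimal sketch of this resampling rule is given below. The effective sample size test against 2N/3, the added Gaussian jitter, and the reset to weights 1/N follow the text. The multinomial draw via `random.choices` and the jitter spread `jitter_sigma` are assumptions, because the paper does not state which resampling scheme or variance it uses.

```python
import random


def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles, weights, jitter_sigma=0.05, rng=random):
    """Draw N particles in proportion to their weights and jitter the copies."""
    n = len(particles)
    picks = rng.choices(particles, weights=weights, k=n)
    new = [(x + rng.gauss(0.0, jitter_sigma), y + rng.gauss(0.0, jitter_sigma))
           for x, y in picks]
    return new, [1.0 / n] * n


def maybe_resample(particles, weights, rng=random):
    """Resample only when N_eff drops below two thirds of the particle count."""
    if effective_sample_size(weights) < 2.0 * len(particles) / 3.0:
        return resample(particles, weights, rng=rng)
    return particles, weights
```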

Algorithm
Algorithm 1 in the following table is the PF underwater target-tracking process based on mechanical scanning sonar.

14: END WHILE
The first row is for template initialization and particle set initialization. Rows 2 to 14 iteratively solve for the target position when the sonar sequence observation point clouds arrive. Among them, rows 3-7 update the state transition and weights of the particles, rows 8-11 calculate the normalized weights of the particles, and row 12 calculates the target position estimate at the current moment, while row 13 resamples the particles.

Experimental Scenes
We conducted AUV target-tracking experiments in late spring in Maishan Reservoir, Zhoushan, which is a freshwater reservoir. The reservoir is surrounded by a dike on the west side and mountains on the remaining three sides. The water area is wide; at the time of the experiments, the water temperature was about 8 °C, and the wind and waves were calm, which provided suitable experimental conditions. For the experiments, a white plastic bucket with a diameter of 780 mm and a height of 1000 mm was chosen to simulate a dynamic obstacle, and a 14 kg iron block was tied to the bucket so that it could be suspended in the water. The iron block was approximately on the axis of the bucket in about 2.5 m of water. The experimental layout is shown in Figure 7. A rubber boat equipped with a propeller hookup was used to drag the bucket slowly toward the shore (due to water resistance and the impact of the flow, the actual path was not straight). In order to assess the accuracy of the algorithm results, a differential GPS module was placed directly above the bucket to record the position change, which was used as the reference position to compare with the estimated position. The AUV continuously observed the target from behind, at a depth of about 1.5 m throughout (Figure 8). Based on the above scenarios, we conducted three different sets of experiments to verify the robustness and accuracy of the algorithms in this paper:
1. Static target with AUV hovering observation;
2. Dynamic target with AUV hovering observation;
3. Dynamic target with AUV following observation.
Considering the characteristics of this mechanical scanning sonar and the application scenario of this paper, we set the parameters of the sonar as shown in Table 2. In this paper, a laptop equipped with an Intel Core i5 2.40 GHz processor was used to process the data offline. We built the PF tracking algorithm system in C++ based on the ROS (Robot Operating System) platform. The point cloud data obtained from sonar detection were exchanged with other modules of the AUV through the "Topic" communication mechanism in ROS.

Static Target with AUV Hovering Observation
In this set of static target data, the target to be tracked in the first three frames of the sonar scan point cloud is extracted (Figure 9), and it is used to construct the feature template. The white dashed circle indicates the size of the template, which surrounds the target to be tracked. We take the Hu moments of the above three frames and construct a 7-dimensional feature-description vector $(\phi_1, \phi_2, \ldots, \phi_7)$ using the mean values of the corresponding elements. The individual elements of the feature-description vector are shown in Table 3. Then, the same operation is performed for the Zernike moments.
Figure 10 shows the tracking process after using the Zernike moment feature-description method and optimized state transition. The white point set is the set of position estimation points, and the green point set is the particle population distributed at the current moment. Since the target to be tracked is static, the position estimation changes very little. Due to the small distance between the position estimation points of two adjacent moments of this static target (L < 0.3 m in most cases), the algorithm enters mode 1 under the optimized state transition method. The tracking deviations (the difference between the GPS values and the estimated values) of the PF tracker in both the X and Y directions are shown in Figure 11. Define the mean value of the distance deviation S as

$$S = \frac{1}{K} \sum_{k=1}^{K} \sqrt{\Delta x_k^2 + \Delta y_k^2},$$

where $\Delta x_k$ and $\Delta y_k$ are the deviations in the X and Y directions at frame $k$, and $K$ is the number of frames. Further, we calculate the means and variances of the deviations in both the X and Y directions for the first data set, as in Table 4. For the static target in the first set of tracking data, it can be seen from the above analysis that the feature-description method using Zernike moments is better. The absolute values of means and variances in the X and Y directions are smaller in this method, where the mean value is about 41% and 81% smaller than the Hu moment method, respectively, and the S value is about 50% smaller.
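The deviation statistics reported in the tables can be computed along the following lines. The sketch assumes that GPS and estimated positions are already time-aligned lists of (x, y) pairs, that the deviation is taken as GPS minus estimate, that variances are population variances, and that S is the mean Euclidean distance per frame; the function name and dictionary keys are illustrative.

```python
import math


def deviation_stats(gps_xy, est_xy):
    """Mean and variance of X/Y deviations plus mean distance deviation S."""
    dx = [g[0] - e[0] for g, e in zip(gps_xy, est_xy)]
    dy = [g[1] - e[1] for g, e in zip(gps_xy, est_xy)]
    k = len(dx)
    mean_x, mean_y = sum(dx) / k, sum(dy) / k
    var_x = sum((d - mean_x) ** 2 for d in dx) / k
    var_y = sum((d - mean_y) ** 2 for d in dy) / k
    s = sum(math.hypot(a, b) for a, b in zip(dx, dy)) / k
    return {"mean_x": mean_x, "mean_y": mean_y,
            "var_x": var_x, "var_y": var_y, "S": s}
```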

Dynamic Target with AUV Hovering Observation
Similar to the previous section, for this dynamic target data set, we take the first 3 frames of the target to construct the feature template ( Figure 12). The corresponding feature-description vectors are obtained as in Table 5.  Figure 13 shows the tracking process after using the Zernike moment feature-description method and optimized state transition method. The white point set is the set of position estimation points, and the green point set is the particle population distributed at the current moment. In the optimized state transition method, the algorithm almost always goes to mode 2 and obtains a set of position estimation points. Figure 14 shows the differential GPS path of the target and the tracking paths under several methods.  The tracking deviations of the PF tracker in both the X and Y directions are shown in Figure 15. Further, we calculate the means and variances of the deviations in both the X and Y directions for the second data set, as in Table 6. For the second data set, it can be seen from Figure 15 and Table 6 that better tracking accuracy is obtained by using Zernike moments under the optimized state transition condition. The absolute values of the means of deviations in the X and Y directions in the Zernike moment method are about 98% and 61% smaller than those in the Hu moment method, respectively. The S value is nearly 69% smaller, and the corresponding variances are smaller, too. In addition, the Hu moment method quickly fails to track when the state transition is not optimized, while the Zernike moment method still maintains stable tracking. On the other hand, from the moments perspective, in the Hu moment method, stable tracking is achieved only after the optimized state transition. However, in the Zernike moment method, stable tracking is achieved whether the state transition is optimized or not. 
The absolute values of the means of the deviations in the X and Y directions after optimization are about 98% and 62% smaller than those without optimization, respectively, in the Zernike moment method, and the distance mean S is about 81% smaller; the corresponding variances are smaller, too.

Dynamic Target with AUV Following Observation
In the same way, we extract the first three frames to construct the feature template, as shown in Figure 16. The corresponding feature-description vectors are obtained in Table 7. Figure 17 shows the tracking process after using the Zernike moment feature-description method and optimized state transition. The white points are the position estimation points, the green point set is the particle population distributed at the current moment, and the purple line is the motion path of the AUV. For the optimized state transition method, the algorithm almost always goes to mode 2 and obtains a set of position estimation points. Figure 18 shows the differential GPS path of the target and the tracking paths under several methods. The tracking deviations of the PF tracker in both the X and Y directions are shown in Figure 19. Further, we calculate the means and variances of the deviations in both the X and Y directions for the third data set, as in Table 8. Similar to the second data set, for the third data set, it can be seen from Figure 19 and Table 8 that better tracking accuracy is obtained by using Zernike moments under the optimized state transition condition. The absolute values of the means of deviations in the X and Y directions in the Zernike moment method are about 22% and 36% smaller than those in the Hu moment method, respectively. The S value is nearly 21% smaller, and the corresponding variances are smaller. In addition, the Hu moment method quickly fails to track when the state transition is not optimized, while the Zernike moment method still maintains stable tracking. On the other hand, from the moments perspective, in the Hu moment method, stable tracking is achieved only after the optimized state transition. However, in the Zernike moment method, stable tracking is achieved whether the state transition is optimized or not.
The absolute values of the means of the deviations in the X and Y directions after optimization are about 32% and 18% smaller than those without optimization, respectively, in the Zernike moment method; the distance mean S is about 17% smaller and the corresponding variances are smaller, too.

Conclusions
In this paper, we study the PF target-tracking method based on mechanical scanning sonar. Through AUV tracking experiments and data analysis, we demonstrate that the method of Zernike moment feature matching leads to better tracking results. In addition, the optimization of the first-order autoregressive model using target motion convergence is also beneficial to the accuracy of the algorithm. This paper further corroborates the view in the literature [16] that Zernike moments are superior to Hu moments in target identification.
As we can see, the experimental data are processed offline for comparison, and we are ready to carry out online real-time tracking experiments in the future. At the same time, we will optimize the case of unstable observation caused by disruption to improve the tracking accuracy of a single target.