Target Tracking Algorithm Based on an Adaptive Feature and Particle Filter

To boost the robustness of the traditional particle-filter-based tracking algorithm in complex scenes and to tackle the drift problem caused by fast-moving targets, an improved particle-filter-based tracking algorithm is proposed. Firstly, all of the particles are divided into two parts that are put separately. The number of particles put the first time is large enough to ensure that as many particles as possible cover the target; the second part of the particles is then put at the location of the first-stage particle with the highest similarity to the template, to improve the tracking accuracy. Secondly, in order to obtain a sparser solution, a novel minimization model for an $L_p$ tracker is proposed. Finally, an adaptive multi-feature fusion strategy is proposed to deal with more complex scenes. The experimental results demonstrate that the proposed algorithm can not only improve the tracking robustness but can also enhance the tracking accuracy in complex scenes. In addition, our tracker achieves better accuracy and robustness than several state-of-the-art trackers.


Introduction
Target tracking has always been a popular research direction in the field of computer vision, as it has important applications in scene monitoring, behavior analysis, autopilot, robotics, and so forth [1][2][3]. Although visual tracking technology has made considerable progress, and a large number of excellent tracking algorithms have been proposed [4][5][6][7][8][9][10][11], there is still a series of unpredictable challenges, such as occlusion, motion blur, pose and shape change, illumination change, scale variation, and so on. Therefore, developing a robust tracking algorithm has always been a very tough task.
The particle-filter-based tracking algorithm has attracted a great deal of scholars' attention because it has the advantages of simple implementation, parallel structure, strong practicality, etc. Inspired by Yang [11], certain relationships between the particles can be exploited through the low rank and the temporal consistency. However, the low rank and the temporal consistency cannot exploit the relationship between the particles when the scene changes greatly between two adjacent frames or when the distribution of the particles in the current frame is relatively dispersed. Therefore, a relatively simple and effective approach for exploiting the relationship between the particles is proposed. In contrast with traditional particle-filter-based tracking algorithms, where all of the particles are put at once in each frame, in our tracking algorithm the particles are divided into two parts that are put separately in each frame. Through the intelligent cooperation between the two parts of the particles, the number of particles that cover the target increases, which improves the tracking accuracy.

Materials and Methods
According to the different expression strategies adopted by the appearance models, tracking algorithms can generally be divided into two categories: generative and discriminative approaches. Among them, the generative approach establishes a descriptive model to represent the target, and then uses it to search the image for the most similar region as the target [16]. Because the proposed tracking algorithm belongs to the generative approach, we focus on related works of the generative approach.
The theory of sparse representation has been widely used in target tracking algorithms [4][5][6][7][8]. Mei et al. [6] combined sparse representation with an $L_1$ norm minimization model to improve the particle-filter-based tracking algorithm, which obtained good tracking results. However, the $L_1$ norm minimization model needed to be solved for each particle at each frame, which led to a high computational complexity. To enhance the tracking speed, Mei et al. [7] proposed a minimum boundary error rule to remove some insignificant particles, which reduced the number of times the $L_1$ norm minimization model had to be solved. To improve the tracking speed and accuracy at the same time, Bao et al. [8] improved the algorithms proposed in [6,7] by adding an $L_2$ regularization term on the coefficients associated with the trivial templates in the $L_1$ norm minimization model, and used the accelerated proximal gradient (APG) method to accelerate the solution of the sparse coefficients. Zhang et al. [17] imposed a weighted least squares technique, which relaxed the sparsity constraint of traditional sparse representation methods to achieve strong robustness against appearance variations, and utilized structurally random projection to reduce the dimensionality of the feature while improving computational efficiency. Meshgi et al. [18] proposed an occlusion-aware particle filter framework that attaches a binary flag to each particle in order to estimate the occlusion state and treat occlusions in a probabilistic manner.
In recent years, the $L_p$ norm has been widely used. Zhang et al. [19] proposed an $L_p$ norm relaxation to improve the sparsity exploitation of the total generalized variation. Xie et al. [20] generalized nuclear norm minimization to Schatten $L_p$ norm minimization, which achieved good results in both background subtraction and image denoising. Wang et al. [21] proposed to impose an $L_{2,p}$ norm regularization on self-representation coefficients for unsupervised feature selection. Chartrand [22] developed a new non-convex approach for matrix optimization problems involving sparsity, applied to the decomposition of video into low-rank and sparse components, which could separate moving objects from the stationary background better than in the convex case.
Using the advantages of multiple features to improve the tracking robustness is a popular approach. Hong et al. [15] represented the target using four complementary features to overcome the problem of complex scenes, yet they did not highlight which feature was more important in certain scenes. Dash et al. [23] utilized texture and Ohta color features in the feature vector of a covariance tracking algorithm, which was capable of handling occlusion, camera motion, appearance change, and illumination change; nevertheless, the algorithm had poor tracking performance for targets with insignificant color features. Yoon et al. [24] proposed a novel visual tracking framework that fused multiple trackers with different feature descriptors in order to cope with motion blur, illumination changes, pose variations, and occlusions, but every tracker used a single feature and ignored the other features. Morenonoguer et al. [25] developed a robust tracking system by applying appearance and geometric features to segment a target from its background, and showed impressive results on the challenges, but the tracking system had poor real-time performance. Yin et al. [26] combined color, motion, and edge information in the particle filtering framework to solve the problems of illumination variation and similarly colored background clutter, but the tracking performance was not stable when the background was complicated and the contrast between the target and the background was low. Zhao et al. [27] fused the color feature and the Haar feature to overcome the challenges of illumination and pose change, but the algorithm could not handle partial occlusion. Tian [28] represented the target by multiple feature descriptions based on a selected color subspace, improving the robustness of target tracking to some extent. However, the algorithm was not able to handle occlusion.
Based on the ideas of the above literature, we employed a particle filter as the tracking framework and made use of the properties of the $L_p$ norm; a tracking algorithm based on the $L_p$ norm and an intelligent particle filter ($L_p$-IPFT) is thus proposed. In addition, on the basis of the $L_p$-IPFT, combined with the advantages of multiple features, an adaptive multi-feature $L_p$-IPFT (AMFL$_p$-IPFT) is proposed.

Particle Filter Framework
The particle filter transforms target tracking into the problem of estimating the target state from the known target measurement information. It calculates the posterior probability $p(x_t|z_{1:t})$ in two steps, prediction and update, supposing that $x_t$ represents the target state (the location and the shape of the target) in the $t$-th frame, and $z_{1:t} = \{z_1, z_2, \cdots, z_t\}$ represents the target observations from the first frame to the $t$-th frame. According to the maximal approximate posterior probability, the optimal state $x_t^*$ of the target in the $t$-th frame can be obtained as follows:

$$x_t^* = \arg\max_{x_t^i} \, p(x_t^i \mid z_{1:t}),$$

where $x_t^i$ represents the state of the $i$-th particle in the $t$-th frame. In the particle-filter-based tracking algorithm, the prediction and update equations are as follows:

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid z_{1:t-1}) \, dx_{t-1},$$

$$p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t) \, p(x_t \mid z_{1:t-1}),$$

where $p(z_t|x_t)$ represents the observation likelihood between the target state $x_t$ and the observed state $z_t$, which is inversely proportional to the reconstruction error of the particle, see Equation (8). $p(x_t|x_{t-1})$ denotes the state transition model, which uses six independent parameters of the affine transformation to represent the target state $x_t = \{r, c, \theta, s, \alpha, \phi\}$, where $\{r, c, \theta, s, \alpha, \phi\}$ represent the x translation, y translation, rotation angle, scale, aspect ratio, and skew direction, respectively. In general, the state transition model $p(x_t|x_{t-1})$ can be described by a zero-mean Gaussian distribution with diagonal covariance, as follows:

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1},\, \Psi), \qquad \Psi = \mathrm{diag}(\sigma_r^2, \sigma_c^2, \sigma_\theta^2, \sigma_s^2, \sigma_\alpha^2, \sigma_\phi^2),$$

where $\sigma_r^2, \cdots, \sigma_\phi^2$ denote the variances of the above six independent parameters, respectively.
Applying the method mentioned above, according to the particle state $x_{t-1}^i$, the candidate particles $Y = [y_t^1, y_t^2, \cdots, y_t^n] \in \mathbb{R}^{d \times n}$ can be obtained in the $t$-th frame, where $\mathbb{R}$ denotes the set of real numbers, $d$ represents the number of rows of a candidate particle $y^i$, and $n$ represents the number of candidate particles.
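As a concrete illustration, the prediction step above can be sketched as follows. This is a minimal sketch, not the authors' implementation; the variance values and the flat resampling weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative standard deviations for the six affine parameters
# {r, c, theta, s, alpha, phi}; actual values are set per sequence.
SIGMA = np.array([4.0, 4.0, 0.02, 0.01, 0.005, 0.001])

def propagate(prev_states, n):
    """Draw n candidate states from p(x_t | x_{t-1}): resample parent
    particles (uniform weights for simplicity), then add zero-mean
    Gaussian noise with diagonal covariance."""
    idx = rng.integers(0, len(prev_states), size=n)  # resample parents
    noise = rng.normal(0.0, SIGMA, size=(n, 6))      # Gaussian diffusion
    return prev_states[idx] + noise

prev = np.tile([100.0, 80.0, 0.0, 1.0, 1.0, 0.0], (10, 1))
candidates = propagate(prev, 500)  # Y: 500 candidate states, one per row
```

Each row of `candidates` is one candidate state $y_t^i$; the observation likelihood of each candidate is then evaluated by the sparse representation model of the next subsection.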

Sparse Representation
The sparse representation model is mainly utilized to calculate the observation likelihood $p(z_t|x_t)$ of the sample state $x_t$, which reflects the similarity between the candidate particle and the target template. Assume that the target template set of the $t$-th frame is $T_t = [t_1, t_2, \cdots, t_m]$. Each candidate particle $y_t^i$ can then be represented over the target templates and a trivial template set $I$ as

$$y_t^i = T_t a_T^i + I a_I^i,$$

where $a_T^i$ and $a_I^i$ are the sparse coefficients of the target templates and the trivial templates, respectively, so $a_t^i = [a_T^i; a_I^i]$ is sparse. In addition, to ensure that the $L_1$ tracking algorithm has better robustness, nonnegative constraints need to be imposed on $a_T^i$. Then, the sparse representation of $y_t^i$ can be obtained by solving Equation (7), as follows:

$$\min_a \; \frac{1}{2}\|A a - y_t^i\|_2^2 + \lambda \|a\|_1, \quad \text{s.t. } a \geq 0,$$

where $A = [T_t, I, -I]$. Finally, the observation likelihood of $x_t^i$ can be given by the following expression:

$$p(z_t \mid x_t^i) = \frac{1}{\Gamma} \exp\!\left(-\alpha \, \|y_t^i - T_t c_T^i\|_2^2\right),$$

where $\alpha$ is a constant controlling the shape of the Gaussian kernel, $\Gamma$ is a normalization factor, and $c_T^i$ is the minimizer of Equation (7), restricted to $T_t$.
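The likelihood computation above can be sketched as follows. This is a minimal sketch: the projected-ISTA solver and the parameter values are our illustrative choices, not the solver used in the paper, and the templates here are random stand-ins.

```python
import numpy as np

def sparse_code(A, y, lam=0.01, iters=500):
    """Nonnegative L1 sparse coding: min 0.5*||A a - y||^2 + lam*||a||_1,
    s.t. a >= 0, solved here by projected ISTA (illustrative solver)."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2            # step from spectral norm
    a = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ a - y)
        a = np.maximum(a - lr * (grad + lam), 0.0)  # soft threshold + a >= 0
    return a

def observation_likelihood(A, n_target, y, alpha=20.0):
    """Likelihood ~ exp(-alpha * reconstruction error over the target
    templates only), up to the normalizer Gamma."""
    a = sparse_code(A, y)
    err = np.linalg.norm(y - A[:, :n_target] @ a[:n_target]) ** 2
    return np.exp(-alpha * err)

rng = np.random.default_rng(0)
T = rng.normal(size=(8, 2))
T /= np.linalg.norm(T, axis=0)                      # unit-norm target templates
A = np.hstack([T, np.eye(8), -np.eye(8)])           # A = [T_t, I, -I]
```

A candidate that matches a target template yields a much higher likelihood than an unrelated candidate, since its residual over $T_t$ is small.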

The Proposed Tracker
To improve the tracking accuracy and robustness, we propose three improvements on the basis of the $L_1$-APG tracker proposed by Bao et al. [8]: (1) an intelligent particle filter; (2) a minimization model for the $L_p$ tracker; and (3) an adaptive multi-feature fusion strategy. With these improvements, the proposed algorithm can not only take advantage of the robustness to occlusion offered by sparse representation, but can also introduce complementary features' representation for appearance modeling.

Intelligent Particle Filter
The traditional particle-filter-based tracking algorithms directly put all of the particles in accordance with the Gaussian distribution, according to the state of the particles in the previous frame, and do not take into account the relationship between these particles. When tracking a fast-moving target, the tracking accuracy suffers because the number of particles covering the target is small or even zero. To tackle this problem, in our tracking algorithm, all of the particles are divided into two parts. In the first part, the particles are put in the same way as in the traditional particle filter, while the other particles are put at the location of the first-part particle that is most similar to the target template. Figure 1 illustrates the details of the particles put in the third frame of the Deer sequence.


Firstly, the first part of the particles (red particles) are put according to the target state of the second frame, and the locations of these particles are in accordance with the Gaussian distribution. Then, the similarity of each red particle to the template is calculated, and the second part of the particles (green particles) are put at the location of the red particle with the highest similarity; the locations of the green particles are still in accordance with the Gaussian distribution. Finally, the similarity of each green particle is calculated, and the most similar particle (among both red and green particles) is selected as the candidate target. When the particles are put in the fourth frame, the same method as in the third frame is utilized. It can be seen that the most similar red particle is already close to the target, and the green particles are then used to cover the target, which makes the number of particles distributed around the target as large as possible.
As we know, the more particles cover the target, the more accurate the tracking results are. Assume that the total number of particles is $n_0$, the number of particles in the first part is $n_1$, and the number of particles in the other part is $n_2$, with $n_1 \gg n_2$. If $n_1 \leq n_2$, then, taking the computational complexity of the algorithm into account (the total number $n_0$ of particles is usually not too large), the number of particles in the first part would be too small to cover the target. This makes it difficult to ensure that the first part of the particles can provide effective information for the second part, and the function of the second part of the particles would be completely lost. For the problem that it is difficult for the particles to cover the target, one common solution is to increase the number of particles, and the other is to modify the affine parameters so as to make the particles more dispersed and enlarge the covered area. For the latter solution, since the distribution of the particles becomes too scattered, the interval between the particles becomes larger and the reliability of the particles becomes smaller, so it is easy to provide wrong information to the second part of the particles. Therefore, for the first part of the particles, we have employed the first solution, that is, increasing the number of particles in the first part to ensure that as many particles as possible can cover the target; hence we choose $n_1 \gg n_2$.
In this paper, the particles are divided into two parts. The second part of the particles are put at the location of the first-part particle that is most similar to the target template, which is equivalent to putting the second part of the particles at or around the candidate target; this can improve the tracking accuracy. It can be said that the second part of the particles play a supporting role in the tracking process: their effect is equivalent to slightly adjusting the state (including the location, rotation, scale, and so on) of the optimal particle (candidate target) in the first part, to improve the tracking accuracy and robustness. At the same time, taking the relationship between the two parts of the particles into consideration, the affine parameters of the particles in the second part are smaller than those of the particles in the first part. Therefore, compared with the traditional particle filter, we have taken full account of the relationship between the particles. Our particles are intelligent: the first part and the second part effectively assist each other, thus improving the tracking accuracy and robustness.
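The two-stage placement described above can be sketched as follows. This is a toy sketch in a 2-D state space; the `similarity` function is a hypothetical stand-in for the sparse-representation likelihood, and the spread values mimic the wide-then-tight affine parameters of the two stages.

```python
import numpy as np

rng = np.random.default_rng(1)

def similarity(state, target):
    # Hypothetical stand-in for the template similarity used in the paper.
    return np.exp(-np.linalg.norm(state - target) ** 2)

def intelligent_put(prev_state, target, n1=450, n2=50, sigma1=4.0, sigma2=0.8):
    # Stage 1: n1 >> n2 particles spread widely around the previous state.
    part1 = prev_state + rng.normal(0.0, sigma1, size=(n1, prev_state.size))
    best1 = max(part1, key=lambda p: similarity(p, target))
    # Stage 2: n2 particles placed tightly around the best stage-1 particle
    # (smaller affine variance), slightly adjusting its state.
    part2 = best1 + rng.normal(0.0, sigma2, size=(n2, prev_state.size))
    both = np.vstack([part1, part2])
    return max(both, key=lambda p: similarity(p, target))

prev = np.zeros(2)
target = np.array([6.0, -3.0])  # fast motion: target far from previous state
estimate = intelligent_put(prev, target)
```

Even when the target has moved far from the previous state, the wide first stage finds a particle near it and the tight second stage refines the estimate.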

The Minimization Model for the $L_p$ Tracker
According to the different values of $p$, the shape of the classic $L_p$ norm ball changes, as shown in Figure 2. It can be seen from Figure 2 that the intersections in Figure 2a,b are not on the coordinate axes, while the intersection in Figure 2c is on the coordinate axis. Thus, the $L_1$ norm can obtain a sparser solution than the $L_2$ norm, and the $L_p$ norm can obtain a sparser solution than the $L_1$ norm. In addition, although the $L_p$ norm leads to a non-convex minimization problem, excellent solutions can still be obtained efficiently. Thus, compared with the $L_1$ norm, the $L_p$ norm ($0 < p < 1$) has two advantages: (1) a sparser solution; and (2) higher flexibility, because $p$ is no longer a fixed value.

The minimization model for the $L_1$ tracker proposed by Bao et al. [8] is as follows:

$$\min_a \; \frac{1}{2}\|A a - y\|_2^2 + \lambda \|a\|_1 + \frac{\mu_t}{2}\|a_I\|_2^2, \quad \text{s.t. } a \geq 0,$$

where $A = [T_t, I]$, $T_t$ is the target template set, $I$ is the trivial template set, $a = [a_T; a_I]$, $\lambda$ is the regularization factor, and $\mu_t$ is used to control the energy in the trivial templates. When occlusion is detected, $\mu_t$ is set to 0; otherwise, $\mu_t$ is set to a preset constant. In order to obtain a sparser solution and thereby improve the tracking accuracy, we solve the minimization problem of the $L_p$ norm ($0 < p < 1$) instead of the $L_1$ norm minimization presented by Bao et al. [8], and construct a novel minimization model for the $L_p$ tracker, as follows:

$$\min_a \; \frac{1}{2}\|A a - y\|_2^2 + \lambda \|a\|_p^p + \frac{\mu_t}{2}\|a_I\|_2^2, \quad \text{s.t. } a \geq 0.$$

Similarly, $A = [T_t, I]$, $T_t$ is the target template set, $I$ is the trivial template set, and $a = [a_T; a_I]$. To solve the non-convex minimization problem of the $L_p$ norm more effectively, the generalized shrinkage operator $S_\lambda(x) = \mathrm{sign}(x)\max(|x| - \lambda^{2-p}|x|^{p-1}, 0)$ proposed by Chartrand [14] is employed, instead of the soft threshold $S_\lambda(x) = \mathrm{sign}(x)\max(|x| - \lambda, 0)$ used for the $L_1$ norm. In addition, the generalized shrinkage operator can also be applied within the framework of the APG algorithm. Therefore, the approach utilized to solve the sparse coefficients $a$ is called $L_p$-APG in this paper, and its detailed process is shown in Algorithm 1.
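The two shrinkage operators can be compared side by side as below. This is our transcription of Chartrand's p-shrinkage; the exponent form $\lambda^{2-p}|x|^{p-1}$ should be checked against [14].

```python
import numpy as np

def soft_threshold(x, lam):
    """Prox of the L1 norm: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def p_shrinkage(x, lam, p=0.5):
    """Generalized p-shrinkage: sign(x) * max(|x| - lam^(2-p)*|x|^(p-1), 0).
    Small entries face a larger threshold (hence a sparser solution),
    while large entries are shrunk less than by soft thresholding."""
    mag = np.abs(x)
    safe = np.where(mag > 0.0, mag, 1.0)   # avoid 0 ** (p - 1); result is 0 anyway
    thresh = lam ** (2.0 - p) * safe ** (p - 1.0)
    return np.sign(x) * np.maximum(mag - thresh, 0.0)

x = np.array([-0.05, 0.2, 1.5])
s1 = soft_threshold(x, 0.1)
sp = p_shrinkage(x, 0.1, p=0.5)
```

For $p = 0.5$ and $\lambda = 0.1$, both operators zero the small entry $-0.05$, but the p-shrinkage leaves the large entry $1.5$ nearly unchanged, which is why it behaves closer to hard thresholding and yields sparser, less biased solutions.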
Through a large number of simulation experiments, it can be concluded that the $L_p$-APG performs best when $p$ is set to 0.5. In addition, we compared $L_{0.5}$-APG with $L_1$-APG; the tracking results obtained are shown in Table 1. It can be seen from Table 1 that $L_{0.5}$-APG is superior to $L_1$-APG. Therefore, the tracking accuracy can be improved by solving the $L_p$ minimization model.

Adaptive Multi-Feature Fusion Strategy
In complex scenes, it is difficult to accurately represent the target with a single feature, which may lead to tracking drift or even failure. On the basis of the multi-feature fusion approach proposed in the MTMVT algorithm, we utilized four complementary features (histogram of color (HOC), intensity, HOG, and LBP) to represent the target, and propose an adaptive multi-feature fusion strategy, as shown in Figure 3a.

In multi-feature tracking algorithms, simply adding the tracking result of each feature does not make full use of the advantages of the multiple features. Meanwhile, the robustness of each feature to different scenes is different. When the robustness of a feature to a scene is high, the weight of the feature should be increased; otherwise, the weight of the feature should be reduced. As shown in Figure 3b, the weights of the four features change during the tracking of the Panda sequence. The weight of each feature is calculated by the following method.
In the first frame, the weight $w_t^j$ of each feature is set to be the same, as follows:

$$w_1^j = \frac{1}{N}, \quad j = 1, \cdots, N,$$

where $t$ denotes the frame number and $N$ is the number of features. From the second frame on, the new weight is updated by the similarity between the particle with the highest similarity calculated according to the $j$-th feature and the particle with the highest similarity calculated according to all of the features. The expression of this similarity between the particles is as follows:

$$sim_j = \exp\!\left(-\alpha_{feature} \cdot angle_j^2\right),$$

where $\alpha_{feature}$ is a constant for controlling the shape of the Gaussian kernel, and $angle_j$ is defined as follows:

$$angle_j = calAngle(affine_{j_{best}}, affine_i),$$

where $affine_{j_{best}}$ represents the state (affine parameters) of the $j_{best}$-th particle, which has the highest similarity calculated according to the $j$-th feature, and $affine_{j_{best}} = \{r_{j_{best}}, c_{j_{best}}, \theta_{j_{best}}, s_{j_{best}}, \alpha_{j_{best}}, \phi_{j_{best}}\}$; $affine_i$ represents the state (affine parameters) of the $i$-th particle, which has the highest similarity to the target template calculated according to all of the features, and $affine_i = \{r_i, c_i, \theta_i, s_i, \alpha_i, \phi_i\}$. The function $calAngle(\cdot, \cdot)$, which calculates the angle between two vectors, is given by

$$calAngle(u, v) = \arccos\!\left(\frac{\langle u, v \rangle}{\|u\| \, \|v\|}\right).$$

The larger the angle, the greater the difference between the state of the particle with the highest similarity corresponding to the $j$-th feature and the state of the particle calculated according to all of the features; that is, the farther apart the two particles are, or the greater the difference in their scale or pose. Conversely, the smaller the angle, the closer the state of the particle with the highest similarity corresponding to the $j$-th feature is to the state of the particle calculated according to all of the features, and the more reliable the $j$-th feature is considered. The $j_{best}$-th particle corresponding to $affine_{j_{best}}$ can be obtained by minimizing the sparse representation error corresponding to Equation (15), and the $i$-th particle corresponding to $affine_i$ can be obtained by minimizing the sparse error corresponding to Equation (16).
where $e_{j_{best}}^j$ denotes the sparse representation error of the $j_{best}$-th particle determined by the $j$-th feature, $e_i$ denotes the sparse error of the $i$-th particle determined by all of the features, and $c$ is the sparse coefficient of the $j_{best}$-th particle corresponding to the $j$-th feature according to Equation (10). Finally, the feature weight is updated as follows:

$$w_t^j = \frac{w_{t-1}^j \, sim_j}{\sum_k w_{t-1}^k \, sim_k}.$$

In the proposed intelligent particle filter framework, in order to obtain a more reasonable target state from the first part of the particles, we simultaneously consider the state of the particle with the highest similarity corresponding to each feature and the state of the particle with the highest similarity corresponding to all of the features. Then, the state of the candidate target is given by Equation (18), in which $affine$ is defined via $m_{j_{best}}$, the number of features (in addition to the $j$-th feature) for which the $j_{best}$-th particle ranks in the top $n_1 \beta$ of the sparse representation errors sorted in ascending order. The larger $m_{j_{best}}$ is, the more reliable the $j$-th feature is, and the more reliable the $j_{best}$-th particle is.
The second part of the particles are put according to the affine parameters obtained by Equation (18), and the state of the final candidate target is determined by the following steps. Firstly, find the top two features by weight using Equation (17). Then, directly add the sparse errors corresponding to these two features. Finally, the particle with the smallest combined error is taken as the state of the target.
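The angle-based reliability measure and weight update can be sketched as follows. This is a minimal sketch: the Gaussian-kernel similarity and the normalization step follow the description above, but the exact normalization is our assumption.

```python
import numpy as np

def cal_angle(u, v):
    """calAngle(., .): angle in radians between two affine-parameter vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def update_weights(weights, per_feature_best, fused_best, alpha_feature=20.0):
    """Features whose best particle agrees with the fused estimate
    (small angle) gain weight; disagreeing features lose weight."""
    sims = np.array([np.exp(-alpha_feature * cal_angle(b, fused_best) ** 2)
                     for b in per_feature_best])
    new = weights * sims
    return new / new.sum()   # renormalize so the weights sum to 1

w = np.full(2, 0.5)                                   # two features, equal weights
best_per_feature = [np.array([1.0, 0.0]),             # agrees with fused estimate
                    np.array([0.0, 1.0])]             # disagrees (90 degrees)
fused = np.array([1.0, 0.0])
w_new = update_weights(w, best_per_feature, fused)
```

After one update, nearly all of the weight shifts to the feature whose best particle agrees with the fused estimate.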

Setting Parameters
The parameters involved in the experiments were set as follows: the template size was 12 × 15 for $L_p$-IPFT and 15 × 15 for AMFL$_p$-IPFT; the affine transformation parameters of the particles put the first time were 0.03, 0.0005, 0.0005, 0.03, 3.5, and 3.5, and those of the particles put the second time were 0.005, 0.0005, 0.0005, 0.005, 0.8, and 0.8. In Equation (10), $p = 0.5$, $\lambda = 0.01$, and $\mu_t = 10$. In the intelligent particle filter algorithm, the total number of particles was $n_0 = 500$, the number of particles put the first time was $n_1 = 450$, and the number of particles put the second time was $n_2 = 50$. In Algorithm 1, the number of trivial templates was $I = 180$, the number of positive templates was $T = 10$, and the Lipschitz constant was $L = 8$. In the multi-feature fusion strategy, $\beta = 10\%$ and $\alpha_{feature} = 20$. The proposed AMFL$_p$-IPFT algorithm ran at 0.45 frames per second, and the $L_p$-IPFT algorithm ran at 25 frames per second.

Quantitative Analysis
The performance of our approach was quantitatively validated by two metrics, distance precision (DP) and overlap success (OS), in a one-pass evaluation (OPE). The DP was defined as the percentage of frames where the center location error (CLE) was within a threshold of 20 pixels. The OS was defined as the percentage of frames where the bounding box overlap surpassed a threshold of 0.5. The CLE was $\sqrt{(x_p - x_g)^2 + (y_p - y_g)^2}$, where $(x_p, y_p)$ is the center location of the target tracked by the algorithm, and $(x_g, y_g)$ is the real center location of the target. The region area overlap ratio is defined as $\frac{area(S_{gt} \cap S_{tr})}{area(S_{gt} \cup S_{tr})}$, where $S_{tr}$ and $S_{gt}$ are the tracked bounding box and the ground-truth bounding box, respectively.
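The two underlying quantities can be computed as follows; this is a straightforward sketch of the CLE and overlap-ratio definitions, with boxes represented as (x, y, w, h) tuples.

```python
import numpy as np

def center_location_error(pred_center, gt_center):
    """CLE: Euclidean distance between predicted and ground-truth centers."""
    (xp, yp), (xg, yg) = pred_center, gt_center
    return float(np.hypot(xp - xg, yp - yg))

def overlap_ratio(box_tr, box_gt):
    """area(S_gt & S_tr) / area(S_gt | S_tr) for (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_tr
    bx, by, bw, bh = box_gt
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union

# DP: fraction of frames with CLE within 20 px;
# OS: fraction of frames with overlap ratio above 0.5.
```

For example, a box shifted by half its width against an otherwise identical ground-truth box yields an overlap ratio of 1/3, which would count as a failure under the 0.5 OS threshold.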

Overall Performance Analysis
We plotted the precision and success plots, including the area-under-the-curve (AUC) score, over all of the 50 sequences, as shown in Figure 4. It can be seen from Figure 4 that the proposed AMFLp-IPFT was superior to the other trackers. Table 2 illustrates that our algorithm performed favorably against the competitive trackers. The AMFLp-IPFT outperformed the Lp-IPFT (by 17.9% and 13.9%), MTMVT (by 21.0% and 28.0%), and L1-APG (by 53.7% and 40.4%), in terms of DP and OS, respectively. Compared with L1-APG, Lp-IPFT registered a performance improvement of 35.8% in terms of OS and 26.5% in terms of DP.

Attribute-Based Performance Analysis
In order to fully evaluate the effectiveness of the proposed algorithms, we further evaluated their performance using 11 attributes on the OTB-50 video dataset. All of the AUC results for each tracker are given in Tables 3 and 4. The best result is highlighted in red and the second best in blue.

We noted that the proposed AMFLp-IPFT performed well in handling challenging factors, including fast motion (precision plots: 62.9%, success plots: 49.7%), illumination variation (precision plots: 71.4%, success plots: 52.8%), and out-of-plane rotation (precision plots: 70.6%, success plots: 52.3%). The Lp-IPFT performed well in dealing with the fast motion (precision plots: 54.2%, success plots: 44.1%) and low resolution (precision plots: 66.3%, success plots: 47.5%) challenging factors.

Qualitative Analysis
In view of the different characteristics of the video sequences mentioned above, we discuss four groups of experiments on eight video sequences with the 12 trackers described above.

Experiment 1: Robustness analysis of partial occlusion
There were short-term partial occlusion challenges in the Figure 5a Girl and Figure 5b Soccer video sequences. In the Girl sequence, the target was occluded by a man's head, as in frame 436. In the Soccer sequence, there were occlusions by red confetti, as in frames 110 and 183, and an occlusion by the trophy at frame 96. For these videos, the Lp-IPFT, L1-APG, and SCM algorithms all used only the intensity feature and relied on trivial templates to judge occlusion, but in complex scenes, the trivial templates were prone to misjudgment, and the templates could not be updated in time. The proposed AMFLp-IPFT algorithm had a better tracking performance.

Experiment 2: Robustness analysis of fast motion
There were fast motion challenges in the Figure 5c Deer and Figure 5d Jumping video sequences. In the Deer sequence, the target moved significantly around the 32nd frame, and there was motion blur around the 24th and 57th frames. In the Jumping sequence, fast motion and motion blur also occurred, as in frames 16, 35, and 207. For the fast motion challenge, the proposed intelligent particle filter enabled more particles to cover the target, so the Lp-IPFT and AMFLp-IPFT algorithms could handle the fast motion problem well and further improved the robustness and accuracy of the tracking.

Experiment 3: Robustness analysis of illumination variation
There were illumination variation challenges in the Figure 5e Shaking and Figure 5f Singer2 video sequences. In both sequences, the targets were affected by the background lighting, as in the 60th frame of the Shaking sequence and the 156th and 349th frames of the Singer2 sequence. Since the color feature is sensitive to illumination, it cannot adapt to complex scenes with illumination variation. However, the LBP feature can make up for this deficiency and overcome the influence of illumination variation. In addition, the HOG feature is also robust to illumination variation. Therefore, the proposed AMFLp-IPFT algorithm achieved good tracking results on these sequences.

Experiment 4: Robustness analysis of deformation
There were deformation challenges in the Figure 5g Bolt and Figure 5h Dudek video sequences. The targets' limbs changed during the tracking, as in frames 6, 95, and 269 of the Bolt sequence. In addition, there were changes in facial expression, such as in the 572nd frame of the Dudek sequence. Since the HOG feature maintains a good invariance to the target's geometric and photometric deformation, subtle body movements can be ignored by the HOG feature without affecting the detection result. When the target underwent large deformation, other features, such as color histograms, guaranteed that the final tracking result did not greatly deviate from the actual target.

Conclusions
In order to enhance the tracking performance, we improved the L1-APG tracker [8]. Firstly, we divided the particles into two parts and put them separately; in this way, the intelligent particles could cooperate with each other to achieve accurate tracking. Then, to obtain a sparser solution, a novel minimization model for the Lp tracker was proposed. Finally, an adaptive multi-feature fusion strategy was proposed to solve the problem that a single feature cannot deal with complex scenes ideally. The experimental results on a benchmark with 50 challenging sequences validated that the proposed AMFLp-IPFT algorithm achieved better accuracy and robustness than several state-of-the-art trackers under challenges such as fast motion, occlusion, illumination variation, and deformation. However, since the AMFLp-IPFT algorithm utilized four features to represent the target at the same time, its real-time performance was not satisfactory. Therefore, improving the tracking efficiency is the next step of our research.


Figure 1.
Figure 1. The details of the particles put in the third frame of the Deer sequence. All of the particles are divided into two parts and put separately: the red particles indicate the particles put for the first time, the green particles indicate the particles put for the second time, and the location of a particle is assumed to correspond to the upper left corner of the tracked bounding box. Firstly, the first part of the particles (red) is put according to the target state of the second frame, with locations following a Gaussian distribution. Then, the similarity of each red particle to the template is calculated, and the second part of the particles (green) is put at the location of the red particle with the highest similarity, with locations again following a Gaussian distribution. Finally, the similarity of each green particle is calculated, and the most similar particle (among both red and green particles) is selected as the candidate target. When the particles are put in the fourth frame, the same method as in the third frame is utilized. It can be seen that the most similar red particle is already close to the target, and the green particles then cover the target, which makes the number of particles distributed around the target as large as possible.
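The two-stage placement described in the caption can be sketched as follows (a simplified 2-D location-only sketch; the actual tracker samples full six-parameter affine states, and `similarity` stands in for the template-matching score):

```python
import numpy as np

rng = np.random.default_rng(0)

def put_particles(center, sigma, n):
    """Draw n particle locations from a Gaussian around `center`
    (a 2-D stand-in for the full affine particle state)."""
    return rng.normal(loc=center, scale=sigma, size=(n, 2))

def two_stage_placement(prev_state, similarity, n1=450, n2=50,
                        sigma1=3.5, sigma2=0.8):
    """Two-stage placement: a wide first cloud around the previous
    target state, then a tight second cloud around the best
    first-stage particle; the most similar particle overall is
    selected as the candidate target."""
    first = put_particles(prev_state, sigma1, n1)            # red particles
    best = first[np.argmax([similarity(p) for p in first])]
    second = put_particles(best, sigma2, n2)                 # green particles
    particles = np.vstack([first, second])
    return particles[np.argmax([similarity(p) for p in particles])]
```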


Figure 3.
Figure 3. In (a), the red bounding box represents the final target state; the blue one represents the most similar particle to the target template for the histogram of color (HOC) feature; the yellow one, for the intensity feature; the white one, for the histogram of oriented gradient (HOG) feature; and the purple one, for the LBP feature. (b) shows the variation curves of the weights of the different features during the tracking of the Panda sequence (in this paper, we set upper and lower limits on the weights, with a maximum of 0.5 and a minimum of 0.1).
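The weight bounds mentioned in the caption can be enforced with a simple clipping step (a sketch only; the full adaptive weight update of the fusion strategy in Equation (17) is not reproduced here):

```python
def bound_weights(weights, lo=0.1, hi=0.5):
    """Keep each feature weight inside [lo, hi], as stated in the
    Figure 3 caption, so no single feature dominates the fusion
    and no feature is switched off entirely."""
    return [min(max(w, lo), hi) for w in weights]
```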


Figure 4.
Figure 4. Comparisons of different trackers using precision and success plots over all 50 sequences. The legend contains the AUC score for each tracker.


Figure 5.
Figure 5. Four groups of tracking results corresponding to different trackers.


Table 1.
The average center location error (CLE) of the L0.5-accelerated proximal gradient (APG) and L1-APG trackers on 8 video sequences.

Table 2.
Quantitative comparison of 12 trackers on 50 sequences. The maximum distance precision (DP) and overlap success (OS) values are highlighted in red, and the second-best values are highlighted in light blue. MTMVT: multi-task multi-view tracking algorithm.

Table 3.
AUC results of each tracker on sequences with different challenges, for the precision of OPE.


Table 4.
AUC results of each tracker on sequences with different challenges, for the success of OPE.