Parallel Correlation Filters for Real-Time Visual Tracking

Correlation filter-based methods have recently performed remarkably well in terms of accuracy and speed in visual object tracking. However, most existing correlation filter-based methods are not robust to significant appearance changes in the target, especially when the target undergoes deformation, illumination variation, and rotation. In this paper, a novel parallel correlation filters (PCF) framework is proposed for real-time visual object tracking. Firstly, the proposed method constructs two parallel correlation filters, one for tracking the appearance changes in the target and the other for tracking the translation of the target. Secondly, by merging the response maps of these two parallel correlation filters with weights, the proposed method accurately locates the center position of the target. Finally, in the training stage, a new distribution of the correlation output is proposed to replace the original Gaussian distribution and train more accurate correlation filters, which prevents the model from drifting and yields excellent tracking performance. Extensive qualitative and quantitative experiments on the common object tracking benchmarks OTB-2013 and OTB-2015 demonstrate that the proposed PCF tracker outperforms most state-of-the-art trackers while running in real time.


Introduction
Visual tracking plays a core role in computer vision for its wide applications including video surveillance, robotics, driver-less vehicles, intelligent interaction, and various automatic systems [1]. The goal of visual tracking is to track the trajectory of the target that is initialized only by the bounding box from the first frame among the video sequences [2]. During tracking, the appearance of the target changes randomly and unpredictably when the target undergoes deformation, illumination variation, and rotation. It is one of the core issues determining the tracking accuracy and robustness in visual tracking. Although significant progress has been achieved in recent decades, visual tracking is still a challenging problem due to these factors.
In general, visual trackers can be broadly classified into two categories: generative trackers [3][4][5][6][7][8][9][10][11][12] and discriminative trackers [13][14][15][16][17][18][19][20][21][22][23][24][25]. Generative trackers describe the target with a target representation method and dynamically establish a target appearance model, then search the video sequence for the candidate most similar to that model. Therefore, generative trackers can reflect the similarity of the same kind of target [3]. However, they tend to produce a significant number of false positives, and the learning process is complicated. Discriminative trackers extract the discriminative features of the target and use classification methods from machine learning to search for the region most similar to the target and locate its position [13]. Discriminative

Standard Discriminative Correlation Filter
The standard DCF [2] has been widely studied due to its superior tracking accuracy and speed. The framework of the classical DCF is described in Figure 1. The DCF tracker trains a correlation filter model efficiently in the frequency domain by applying a machine-learning technique to distinguish the target from the background. It then updates the model online, exploiting the features extracted from the detected result in the current frame. The objective for training the filter $h_t^l$ is an $L_2$ error function, expressed as Equation (1), where $*$ denotes the convolution operator, and $k$ and $l$ denote the number of training samples and the total dimension of the extracted features, respectively. The expected correlation output $g_k$ is a 2-dimensional Gaussian distribution of the same size as $h_t^l$. The second term in Equation (1) is a regularization term that prevents overfitting, and $\lambda$ ($\lambda \geq 0$) is the regularization parameter.
For computational efficiency, Equation (1) can be transformed into the frequency domain by the fast Fourier transform (FFT). The objective function then becomes Equation (2), where capital letters denote discrete Fourier transforms, $H_t^l$ denotes the correlation filter in the frequency domain, and the overbar $\overline{H_t^l}$ represents the complex conjugate of $H_t^l$. The convolution operation $*$ can be implemented efficiently as the element-wise product $\cdot$ in the frequency domain.
Therefore, by differentiating Equation (2) and setting the derivative to zero, the final solution is given by Equation (5), where $t$ is the index of the current video frame. In Equation (5), $H_t^l$ denotes the trained correlation filter in the frequency domain and $l$ is the dimension of the filter, which equals the dimension of the extracted features. For computational efficiency, the solution is split into two parts, Equations (3) and (4): $A_t^l$ represents the numerator of the filter and $B_t$ represents the denominator. The overbar on $F$ denotes complex conjugation.
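For orientation, the numerator, denominator, and filter described above are commonly written as follows in DSST-style formulations. This is a sketch under the assumption of a single training sample per frame; the channel count $d$ and the summation index $j$ are notation introduced here, and the expressions are not claimed to match Equations (3)-(5) term by term.

```latex
A_t^l = \overline{G}\, F_t^l, \qquad
B_t = \sum_{j=1}^{d} \overline{F_t^j}\, F_t^j, \qquad
H_t^l = \frac{A_t^l}{B_t + \lambda}, \quad l = 1, \dots, d
```

Splitting the filter this way means that only $A_t^l$ and $B_t$ need to be stored and updated, while the division by $B_t + \lambda$ is deferred to detection time.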
To adapt to photometric and geometric variations of the target appearance, $A_t^l$ and $B_t$ of the correlation filter are updated online by Equations (6) and (7), respectively, where the scalar $\eta_0$ is a fixed learning rate. To detect the position of the target in frame $t$, the features $Z^l$ are extracted from the search region. The corresponding correlation scores $y_t$ are computed by Equation (8). The position of the maximum value of $y_t$ is then taken as the center position of the target area in the current frame.
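The train, update, and detect cycle described above can be sketched in a few lines of NumPy. This is a minimal illustration in the DSST style rather than the authors' MATLAB implementation: the function names, the label width `sigma`, and the default `lam` are assumptions made for the example, and features are assumed to be a real array of shape (channels, height, width).

```python
import numpy as np


def gaussian_label(height, width, sigma=2.0):
    """Desired output g: a 2-D Gaussian peaked at index (0, 0).

    Distances wrap around the borders, matching the circular shifts
    implied by the FFT. The value of sigma is an assumption.
    """
    rows = np.arange(height)
    cols = np.arange(width)
    rows = np.minimum(rows, height - rows)
    cols = np.minimum(cols, width - cols)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma ** 2))


def train(features, label):
    """Return numerator A (d, h, w) and denominator B (h, w) for one sample."""
    F = np.fft.fft2(features, axes=(-2, -1))
    G = np.fft.fft2(label)
    A = np.conj(G)[None, :, :] * F          # per-channel numerator
    B = np.sum(np.conj(F) * F, axis=0).real  # shared denominator
    return A, B


def update(A_prev, B_prev, features, label, eta0):
    """Blend the previous model with the current sample at learning rate eta0."""
    A_new, B_new = train(features, label)
    A = (1.0 - eta0) * A_prev + eta0 * A_new
    B = (1.0 - eta0) * B_prev + eta0 * B_new
    return A, B


def detect(A, B, features, lam=1e-2):
    """Return the correlation scores and the (row, col) of their maximum."""
    Z = np.fft.fft2(features, axes=(-2, -1))
    spectrum = np.sum(np.conj(A) * Z, axis=0) / (B + lam)
    scores = np.real(np.fft.ifft2(spectrum))
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return scores, (int(row), int(col))
```

Because the label peaks at index (0, 0), the location of the maximum score directly gives the circular displacement of the target relative to the training patch.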
In this paper, the DCF-based tracker ECO-HC [15] is chosen as the baseline tracker due to its excellent accuracy and robustness. The ECO-HC tracker makes several improvements over the standard DCF tracker. Firstly, it factorizes the convolution operation to reduce the number of model parameters. Secondly, it simplifies the generation of the training set by merging similar samples into a component to guarantee the diversity of samples. Thirdly, it employs a sparse update strategy, with the update interval set to 6, to avoid the model drift problem. However, the ECO-HC tracker has its own deficiencies. Updating the correlation model with a fixed learning rate is unreasonable when the sample appearance changes greatly due to deformation, illumination variation, or rotation. Hence, the ECO-HC tracker has difficulty tracking the target when the target appearance changes greatly. Furthermore, a Gaussian distribution with a smooth peak is used as the desired output of the correlation filter, which makes the localization inaccurate. In contrast to ECO-HC, the proposed PCF tracker addresses these issues and obtains remarkable improvements on the benchmarks, as shown in Figure 2.

Parallel Correlation Filters
The framework of the proposed PCF is described in Figure 3. In the initial frame, the PCF tracker trains two parallel correlation filters, PCF1 and PCF2, online in the frequency domain using shared samples and the sharp correlation output. In each subsequent frame, PCF1 and PCF2 each track the target. By weighting the response maps of PCF1 and PCF2, the PCF tracker detects the position of the target with the Newton method. It then uses the Gaussian mixture model (GMM) [15] to generate a new sample set by adding a new sample or merging the two closest samples. Following the update strategy of the baseline tracker [15], it uses the new sample set to update the two parallel correlation filters with different learning rates every six frames.
PCF learns two parallel correlation filters, PCF1 $h1_t^l$ and PCF2 $h2_t^l$, by exploiting $m$ target samples $f_{1k}^l, f_{2k}^l, \ldots, f_{mk}^l$ with different weights $w_{kp}^t$, which are determined by the learning rate. The objective function of PCF1 is expressed as Equation (9) and that of PCF2 as Equation (10), where $k$ and $p$ denote the number of samples in the training set and in the circular shift set, respectively. The sharp output $gs_k$ merges the Gaussian distribution and the triangle distribution, as described in detail in Section 3.3. The second term of both equations is a regularization term, and $\omega$ denotes the spatial regularization parameter.
On all five cases, the PCF tracker achieves better center position precision and overlap precision than the baseline tracker. The PCF tracker successfully tracks the target when the target undergoes significant rotation, illumination variation, and deformation.
Equations (9) and (10) can be transformed into the frequency domain efficiently by FFT. Because the spatial regularization parameter $\omega$ breaks the closed-form solution of the objective functions, the solutions $H1_{t-1}^l$ and $H2_{t-1}^l$ are obtained by the conjugate gradient (CG) iterative method [15]. To detect the position of the target in frame $t$, the shared features $Z_{t,\mathrm{pos}}^l$ are extracted from the search region, whose center is determined by the previously detected position. The corresponding correlation scores $y_{t,\mathrm{pos}}$ are computed by Equation (11). The position of the target in the current frame is then optimized by the Newton iterative method.
where $\alpha$ denotes the fusion factor, and $H1_{t-1}^l$ and $H2_{t-1}^l$ represent the two parallel correlation filters in the frequency domain.
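The weighted merging of the two response maps can be illustrated with a short sketch. It assumes that the fusion is a convex combination in which $\alpha$ weights the first map and $1 - \alpha$ the second, which the text does not spell out, and it uses an integer argmax in place of the Newton refinement. The function names are hypothetical.

```python
import numpy as np


def fuse_responses(response1, response2, alpha=0.9):
    """Merge two response maps; alpha weights the first map (assumed form)."""
    if response1.shape != response2.shape:
        raise ValueError("response maps must have the same shape")
    return alpha * response1 + (1.0 - alpha) * response2


def peak_position(response):
    """Integer (row, col) of the maximum; a stand-in for Newton refinement."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)
```

With a fusion factor close to 1, the merged peak follows the first filter unless the second filter responds much more strongly elsewhere.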
After detection, a new sample is extracted from the tracking result. The GMM is then used to compute the similarities between the new sample and the components of the training set, and a new training set is generated by adding the new sample or merging the two closest samples. The weights $w_{kp}^t$ of the training samples are updated by Equation (12), where $t$ denotes the $t$-th frame and $\eta$ is the learning rate. Different learning rates yield different sample weights.
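A simplified sketch of this sample-set maintenance is given below. It assumes an ECO-style rule in which existing weights decay by $1 - \eta$, the new sample receives weight $\eta$, and the two closest samples (by Euclidean distance) are merged once the set exceeds its capacity. The exact form of Equation (12) and the GMM similarity measure may differ, and the function name is hypothetical.

```python
import numpy as np


def update_sample_set(samples, weights, new_sample, eta, max_samples=30):
    """Add a sample, re-weight the set, and merge the closest pair if full.

    samples: list of 1-D feature arrays; weights: list of floats.
    The first sample receives weight 1.0.
    """
    if not samples:
        return [new_sample], [1.0]

    # Decay the old weights and give the new sample weight eta.
    weights = [w * (1.0 - eta) for w in weights] + [eta]
    samples = samples + [new_sample]

    if len(samples) > max_samples:
        # Find the two closest samples (Euclidean distance is an assumption).
        stacked = np.stack(samples)
        dist = np.linalg.norm(stacked[:, None] - stacked[None], axis=-1)
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)

        # Merge them into one component with the combined weight.
        merged_weight = weights[i] + weights[j]
        merged = (weights[i] * samples[i] + weights[j] * samples[j]) / merged_weight
        keep = [k for k in range(len(samples)) if k not in (i, j)]
        samples = [samples[k] for k in keep] + [merged]
        weights = [weights[k] for k in keep] + [merged_weight]

    return samples, weights
```

A larger learning rate shifts weight toward recent samples more quickly, which is why two filters trained on the same sample set with different learning rates can behave differently.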
To further detect the scale of the target [13] in frame $t$, the standard DCF is used to extract scale features $Z_{t,\mathrm{scale}}^l$ at different scale factors and to compute the scale correlation scores $y_{t,\mathrm{scale}}$ by Equation (13). The scale factor at the maximum value of $y_{t,\mathrm{scale}}$ is then used to compute the scale of the target in the current frame.

where $A_{t-1,\mathrm{scale}}$ and $B_{t-1,\mathrm{scale}}$ denote the numerator and the denominator of the scale correlation filter in the previous frame, respectively.
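The scale selection step can be sketched as follows. The geometric grid of scale factors, the number of scales (33), and the step (1.02) are assumptions borrowed from common DSST settings and are not stated in the text; the function names are hypothetical.

```python
import numpy as np


def scale_factors(num_scales=33, step=1.02):
    """Geometric grid of scale factors centered on 1.0 (assumed values)."""
    exponents = np.arange(num_scales) - (num_scales - 1) / 2.0
    return step ** exponents


def select_scale(previous_scale, scale_scores, factors):
    """Multiply the previous scale by the factor with the highest score."""
    best = int(np.argmax(scale_scores))
    return previous_scale * factors[best]
```

Here `scale_scores` plays the role of $y_{t,\mathrm{scale}}$, with one score per candidate scale factor.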

Reasonable Distribution of the Correlation Response
The Gaussian distribution, the triangle distribution, and the merged distribution are shown in Figure 4a-c, respectively. The desired output of standard DCF methods follows a 2-D Gaussian distribution with a smooth peak. Because DCF trains the model on synthetic samples generated by circular shift windows, the peak of the desired output should be sharp to avoid model drift. The 2-D Gaussian probability density $g(x, y)$ is expressed in Equation (14),
where $\sigma$ represents the standard deviation, set to 1/16 in this paper; $x, y$ denote the pixel coordinates; $w, h$ denote the width and height of the figure; and $(w/2, h/2)$ is regarded as the origin.
Unlike the Gaussian distribution, the triangle distribution shown in Figure 4b has a sharp peak. However, it has a small slope on the hillside of the distribution. Because the synthetic samples are generated by circular shift windows, the positions below the hillside of the distribution are regarded as the labels of negative samples, so the values at these positions should be close to zero. The 2-D triangle probability density $s(x, y)$ is expressed in Equation (15),
where $(w/2, h/2)$ is regarded as the origin. Neither the Gaussian distribution nor the triangle distribution is well suited to DCF. Hence, a new sharp distribution of the desired output is proposed to replace the original Gaussian distribution and achieve higher tracking accuracy and robustness. The 2-D merged probability density $m(x, y)$ is expressed in Equation (16). The sharp distribution merges the Gaussian distribution and the triangle distribution by multiplication. It combines the merits of both distributions, with a sharp peak and a large slope on the hillside. Extensive experiments on benchmarks have demonstrated the effectiveness of the sharp distribution.
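The three desired outputs can be compared numerically with a small sketch. Several choices here are assumptions, since the exact forms of Equations (14)-(16) are not reproduced above: coordinates are normalized by the patch size so that $\sigma = 1/16$ is relative to the patch, the triangle is taken as a separable pyramid that falls linearly to zero at the border, and the merged output is the element-wise product of the two.

```python
import numpy as np


def centered_coords(height, width):
    """Normalized coordinates with the origin at (height // 2, width // 2)."""
    ys = (np.arange(height) - height // 2) / height
    xs = (np.arange(width) - width // 2) / width
    return np.meshgrid(ys, xs, indexing="ij")


def gaussian_output(height, width, sigma=1.0 / 16.0):
    """Smooth-peaked Gaussian output g(x, y)."""
    yy, xx = centered_coords(height, width)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def triangle_output(height, width):
    """Sharp-peaked pyramid s(x, y); this particular shape is an assumption."""
    yy, xx = centered_coords(height, width)
    return np.clip(1.0 - 2.0 * np.abs(xx), 0.0, 1.0) * np.clip(
        1.0 - 2.0 * np.abs(yy), 0.0, 1.0
    )


def merged_output(height, width, sigma=1.0 / 16.0):
    """Merged output m(x, y) = g(x, y) * s(x, y)."""
    return gaussian_output(height, width, sigma) * triangle_output(height, width)
```

Since both factors equal 1 at the origin and are below 1 elsewhere, the product keeps the peak value while falling off faster than either distribution alone.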

The Brief Outline of the PCF Tracking Algorithm
The proposed PCF tracking algorithm is outlined in Algorithm 1. The PCF tracker first detects the position of the target, and then the scale correlation filter refines the scale of the target. The scale and position correlation filters differ in several respects. For the position filters, the PCF tracker extracts the position features from the first frame with a bounding box. It then uses the extracted features to initialize two parallel correlation filters, PCF1 $H1_1^l$ and PCF2 $H2_1^l$, online in the frequency domain with a sharp correlation output. From the second frame to the last frame of the sequence, PCF1 and PCF2 each track the target. By weighting the response maps of these two parallel correlation filters, the PCF tracker detects the center position $P_t$ of the target by applying the Newton method (NM). It then uses the GMM to generate a new sample set, and uses the features extracted from the new sample set to update the two parallel correlation filters $H1_t^l$, $H2_t^l$ with different learning rates $\eta_{t1}$, $\eta_{t2}$ every six frames. For the scale filter, the PCF tracker extracts the scale features from the first frame with different scale factors. It then uses the extracted features to initialize the scale correlation filter $A_{1,\mathrm{scale}}$, $B_{1,\mathrm{scale}}$. In subsequent frames, similar to the position filters, the scale filter uses the detected result $S_t$ to update $A_{t,\mathrm{scale}}$, $B_{t,\mathrm{scale}}$ with the learning rate $\eta_0$ every frame.

Algorithm 1. The proposed PCF tracking algorithm.
 2: for $t \in [2, t_f]$ do
 3:   Position detection:
 4:     Extract position features $Z_{t,\mathrm{pos}}$ from $I_t$ at $P_{t-1}$ and $S_{t-1}$ by a search region.
 6:     Merge the two correlation scores to $y_{t,\mathrm{pos}}$ by Equation (11).
 7:     Set $P_t$ to the target position by the Newton iterative method.
 8:   Scale detection:
 9:     Extract scale features $Z_{t,\mathrm{scale}}$ from $I_t$ at $P_{t-1}$ and $S_{t-1}$ by a search region.
11:     Set $S_t$ to the target scale that maximizes $y_{t,\mathrm{scale}}$.
12:   Model update every six frames:
13:     Extract new sample features $F_{t,\mathrm{pos}}$ and $F_{t,\mathrm{scale}}$ from $I_t$ at $P_t$ and $S_t$.
14:     Generate a new training set by the GMM method.
15:     Update the PCF model $H1_t^l$, $H2_t^l$ by the learning rates $\eta_{t1}$, $\eta_{t2}$.
16:     Update the scale model $A_{t,\mathrm{scale}}$, $B_{t,\mathrm{scale}}$ by the learning rate $\eta_0$.
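The control flow of the algorithm can be summarized as a small driver loop. This is a sketch only: the four callables are hypothetical placeholders for the detection and update steps, both detections receive the previous position and scale as in the listing, position models are updated when the frame index is a multiple of six (one reading of "every six frames"), and the scale model is updated every frame as stated in the prose.

```python
def run_tracker(frames, init_position, init_scale,
                detect_position, detect_scale,
                update_position_models, update_scale_model,
                update_interval=6):
    """Run the detect/update loop over frames 2..t_f and return the track.

    All callables are hypothetical placeholders supplied by the caller.
    """
    position, scale = init_position, init_scale
    trajectory = [(position, scale)]

    for t in range(2, len(frames) + 1):
        frame = frames[t - 1]  # the text counts frames from 1

        # Both detections start from the previous position and scale.
        new_position = detect_position(frame, position, scale)
        new_scale = detect_scale(frame, position, scale)
        position, scale = new_position, new_scale

        # Sparse update for the two parallel position filters.
        if t % update_interval == 0:
            update_position_models(frame, position, scale)

        # The scale filter is updated on every frame.
        update_scale_model(frame, position, scale)

        trajectory.append((position, scale))

    return trajectory
```

Passing the steps in as callables keeps the schedule separate from the filter mathematics, so the update interval can be changed without touching the detection code.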

Experiments
In this section, the implementation details and parameter settings are described. Comprehensive experiments are then performed on two benchmarks, OTB-2013 [28] and OTB-2015 [29], to validate the effectiveness of the PCF tracker. Finally, the tracking failure cases of the proposed PCF tracker are analyzed briefly, and directions for future improvement are presented on this basis.
The results of these experiments have demonstrated that the proposed PCF tracker performs better than most of the state-of-the-art methods.

Implementation Details
All experiments are performed on the same desktop (equipped with an Intel i5-4590 CPU and 8 GB of RAM). The PCF tracker presented in this paper is implemented in MATLAB R2016a. The relevant parameters of the PCF tracker are listed in Table 1. For parameter tuning, extensive parameter-setting experiments are conducted on the OTB-2013 benchmark. Figure 5 shows that the PCF tracker achieves the best OPE accuracy at the point (0.9, 0.5). Therefore, the learning rate $\eta_{t2}$ of PCF2 is set to 0.5 in Equation (12), and the merging factor $\alpha$ of the two response maps is set to 0.9 in Equation (11). Following the baseline tracker [15], the learning rate $\eta_{t1}$ of PCF1 is set to 0.009 in Equation (12). For the generative model of the training set mentioned in Section 3.2, the initial weight $w_{kp}^0$ of the training sample is set to 1.0 and the number of target samples $m$ is set to 30. For the scale filter, the relevant parameters are set to the same values as in the baseline tracker proposed in [19]. To make a fair comparison with the state-of-the-art trackers, the same parameters are used for all experiments on the benchmark data sets.

Ablation Experiments
To evaluate the effect of progressively integrating the strategies proposed in this paper, ablation experiments are conducted on the OTB-2013 benchmark. The proposed PCF tracker is compared with the baseline tracker introduced in Section 3.1, the PCF with weighted fusion of the correlation response maps (PCF_WF tracker) proposed in Section 3.2, and the PCF with sharp output response (PCF_SR tracker) described in Section 3.3. The performance of these trackers is evaluated on both precision at different center location error thresholds and success rate at different overlap thresholds.
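The two measures can be sketched as follows. The sketch assumes bounding boxes stored as rows of (x, y, width, height) and the 21 overlap thresholds 0, 0.05, ..., 1 commonly used with the OTB benchmarks, with the area under the success curve taken as the mean success rate; the function names are illustrative.

```python
import numpy as np


def center_errors(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers."""
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centers - gt_centers, axis=1)


def overlap_ratios(pred, gt):
    """Intersection over union for boxes given as (x, y, width, height)."""
    left = np.maximum(pred[:, 0], gt[:, 0])
    top = np.maximum(pred[:, 1], gt[:, 1])
    right = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    bottom = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(right - left, 0, None) * np.clip(bottom - top, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within the threshold (pixels)."""
    return float(np.mean(center_errors(pred, gt) <= threshold))


def success_auc(pred, gt, thresholds=None):
    """Mean success rate over the overlap thresholds (area under the curve)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)
    overlaps = overlap_ratios(pred, gt)
    rates = [np.mean(overlaps > t) for t in thresholds]
    return float(np.mean(rates))
```

Precision at 20 pixels rewards accurate localization of the center only, whereas the success measure also penalizes errors in the estimated box size.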

Figure 6 shows the comparison results on the OTB-2013 benchmark in terms of the precision plots (PP) and success plots (SP) of one-pass evaluation (OPE). The proposed strategies of weighted fusion and sharp output response each obtain a significant improvement over the baseline tracker. In terms of the OPE at a location error threshold (LET) of 20 pixels, the PCF_WF tracker and the PCF_SR tracker achieve gains of 3.3% and 3.1% over the baseline tracker, respectively. In terms of the area under the curve (AUC) of the OPE, the PCF_WF tracker and the PCF_SR tracker obtain gains of 1.9% and 1.7% over the baseline tracker, respectively. This indicates that both strategies are effective. Overall, the proposed PCF tracker, which integrates the two strategies, performs best in terms of precision and success rate. Concretely, the PCF tracker achieves 89.2% in the OPE at an LET of 20 pixels and 67.5% in AUC, a remarkable gain of 4.3% and 3.2% over the baseline tracker.
In Table 2, comparing the PCF_WF tracker with the PCF_SR tracker, the former performs better than the latter in terms of IV, OPR, BC, DEF, and IPR. However, on SV, OCC, MB, FM, OV, and LR, the latter outperforms the former. This indicates that the weighted fusion strategy is more robust to significant appearance changes in the target, especially when the target undergoes deformation, illumination variation, and rotation, while the sharp response scheme copes better with model drift caused by distractors or blur. In general, Tables 2 and 3 indicate that the proposed PCF tracker achieves the best or second-best results in terms of PP and SP of the OPE on all eleven attributes. At the same time, the PCF tracker runs at an average speed of 42 FPS on all sequences of OTB-2013.

Experiments on OTB-2013
OTB-2013 is a common benchmark with 50 sequences that are labeled with 11 different attributes: SV, IV, OPR, OCC, BC, DEF, MB, FM, IPR, OV, and LR. On this challenging benchmark, the proposed PCF tracker is compared with 18 state-of-the-art trackers: tracking-learning-detection (TLD) [2], distribution fields for tracking (DFT) [8], discriminative scale space tracking (DSST) [13], fast discriminative scale space tracking (FDSST) [13], spatially regularized discriminative correlation filters (SRDCF) [14], sum of template and pixel-wise learners (Staple) [16], compressive tracking (CT) [17], long-term correlation tracking (LCT) [18], locally orderless tracking (LOT) [19], least soft-threshold squares tracking (LSS) [21], visual tracking with online multiple instance learning (MIL) [22], scale adaptive kernel correlation filter tracker with feature integration (SAMF) [23], exploiting the circulant structure of tracking-by-detection with kernels (CSK) [24], high-speed tracking with kernelized correlation filters (KCF) [25], adaptive decontamination of the training set: a unified formulation for discriminative visual tracking (SRDCFdecon) [31], convolutional features for correlation filter-based visual tracking (DeepSRDCF) [32], fully-convolutional Siamese networks for object tracking (SiamFC_3s) [36], and object tracking via dual linear structured SVM and explicit feature map (DLSSVM) [37]. Only the ranks of the top 10 trackers are reported. Figure 7 illustrates the PP and SP of the OPE on three attributes: DEF with 19 sequences, OPR with 39 sequences, and IV with 25 sequences. As shown in Figure 7, the proposed PCF achieves the best results among the top 10 trackers on these attributes. Specifically, the PCF tracker obtains 91.3% and 69.6% on DEF, 88.4% and 65.9% on OPR, and 82.1% and 62.3% on IV.
This indicates that the PCF tracker is more accurate and more robust to significant changes in the target appearance than the other top nine trackers. Furthermore, the proposed PCF tracker is very effective at handling the challenges of deformation, illumination variation, and out-of-plane rotation.
The overall results on the OTB-2013 benchmark among the top ten trackers are illustrated in Figure 8. Compared with the second-ranked tracker, DeepSRDCF, which is based on deep features, the proposed PCF tracker obtains the best scores of 89.2% and 67.5% on the PP and SP of the OPE, a visible gain of 3.9% and 7.5%, respectively. Among the compared trackers employing handcrafted features, deep features, or a combination of the two, the proposed PCF tracker achieves the best ranks while running at a real-time speed of 41 FPS on a CPU. Furthermore, Tables 4 and 5 show the SP and PP of the OPE for the top 10 trackers on the 11 challenge attributes, respectively. As demonstrated in Tables 4 and 5, the PCF tracker outperforms the other top nine trackers on 10 out of 11 attributes of the OPE and obtains the best average precision (AP) and area under the curve (AUC). This validates that the proposed sharp response strategy can effectively prevent the model from drifting and that the proposed weighted fusion strategy is effective at tracking changes in the target appearance. Both strategies bring remarkable improvements in accuracy and robustness. In addition, qualitative experiments are conducted on this benchmark and the results are reported in Figure 9. They further indicate that the proposed PCF tracker can accurately track the target under deformation, rotation, and illumination variation.

Experiments on OTB-2015
OTB-2015 is a more challenging benchmark than OTB-2013, including 100 videos with 11 different attributes: SV, IV, OPR, OCC, BC, DEF, MB, FM, IPR, OV, LR. The proposed PCF tracker is evaluated with 18 state-of-the-art trackers on this benchmark from the works: TLD [2], Incremental learning for robust visual tracking (IVT) [4], DFT [8], DSST [13], FDSST [13], SRDCF [14], Staple [16], CT [17], LCT [18], LOT [19], LSS [21], MIL [22], SAMF [23], CSK [24], KCF [25], SRDCFdecon [31], DeepSRDCF [32], and DLSSVM [37]. Only the ranks for the top 10 trackers are reported. Figure 10 reports the PP and SP of the OPE determined by three challenges attributes: OPR with 63 videos, IV with 38 videos, and DEF with 44 videos. As is clearly illustrated in Figure 10, the proposed PCF provides the best results among the top 10 trackers in all three attributes. More particularly, the PCF tracker obtains 85.1% and 62.6%; 82.5% and 63.6%, and 83.0% and 62.5% on attributes OPR, IV, and DEF respectively. It again validates the effectiveness of the proposed PCF tracker for handling the issues of the significant changes in the target appearance.  (e) (f)  Figure 11 shows the overall results for the top ten trackers on OTB-2015 benchmark. Among these compared trackers, the proposed PCF tracker obtains the best ranks in PP and SP of the OPE including the AP scores of 86.3% and the AUC scores of 64.7%. In addition, attributes-based evaluations are conducted on this benchmark. The results of this experiment are reported in Tables 6  and 7. Specifically, the proposed PCF tracker achieves the top scores in terms of PP on 9 out of 11 attributes as shown in Table 7, and obtains the best ranks in terms of SP on all 11 attributes as demonstrated in Table 7. As is demonstrated explicitly in Tables 6 and 7, the proposed PCF tracker obtains the best AP and AUC scores compared to the other top nine trackers, at the same time, the PCF tracker achieves a real-time performance running about 41 FPS on a CPU. 
It indicates the superior tracking performance for all 11 attributes. In general, the proposed tracker achieves a substantial improvement of the other top nine trackers in terms of accuracy, robustness and real-time performance. Furthermore, the qualitative experiments are also conducted in all videos of OTB-2015 benchmark. The results for the four representative videos are illustrated in Figure 12. Among the compared trackers, the proposed PCF tracker significantly outperforms the other top nine trackers in terms of location and scale estimation. It again validates that the effectiveness of the proposed PCF tracker when the target undergoes the situations of deformation, rotation, and illumination variation.  Figure 11 shows the overall results for the top ten trackers on OTB-2015 benchmark. Among these compared trackers, the proposed PCF tracker obtains the best ranks in PP and SP of the OPE including the AP scores of 86.3% and the AUC scores of 64.7%. In addition, attributes-based evaluations are conducted on this benchmark. The results of this experiment are reported in Tables 6 and 7. Specifically, the proposed PCF tracker achieves the top scores in terms of PP on 9 out of 11 attributes as shown in Table 7, and obtains the best ranks in terms of SP on all 11 attributes as demonstrated in Table 7. As is demonstrated explicitly in Tables 6 and 7, the proposed PCF tracker obtains the best AP and AUC scores compared to the other top nine trackers, at the same time, the PCF tracker achieves a real-time performance running about 41 FPS on a CPU. It indicates the superior tracking performance for all 11 attributes. In general, the proposed tracker achieves a substantial improvement of the other top nine trackers in terms of accuracy, robustness and real-time performance. Furthermore, the qualitative experiments are also conducted in all videos of OTB-2015 benchmark. The results for the four representative videos are illustrated in Figure 12. 
In Figure 12, the results for the top ten trackers are marked in different colors. In these challenging videos, the proposed PCF tracker performs better than the other top nine trackers. Figure 13 shows the failure cases of the proposed PCF tracker. In the first row, the target underwent complete occlusion for long spans of time, which caused the PCF tracker to drift off the target, whereas the LCT tracker can still track the target because of its re-detection strategy [18]. In the second row, the target underwent heavy occlusion with background clutter, which also caused the PCF tracker to lose the target. However, the trackers with deep features (e.g., DeepSRDCF and SRDCFdecon) are highly robust in this situation. Furthermore, in the field of multi-object tracking, the top-down Bayesian formulation proposed in [41] can also handle occlusion and background clutter effectively. Hence, in future work, the approaches in [14,18,31,41] will be analyzed in detail and their strategies will be incorporated into the proposed PCF tracker to address these issues.
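For reference, the AP and AUC scores quoted in this section can be computed along the following lines. This is a minimal sketch rather than the benchmark toolkit itself: it assumes boxes are stored as (x, y, w, h) rows, that AP is the fraction of frames whose center location error is within the conventional 20-pixel threshold, and that AUC is the mean success rate over evenly spaced overlap thresholds in [0, 1]. The function names and the number of thresholds are illustrative choices.

```python
import numpy as np


def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are (x, y, w, h) rows."""
    pred_center = pred[:, :2] + pred[:, 2:] / 2.0
    gt_center = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_center - gt_center, axis=1)


def overlap_ratio(pred, gt):
    """Intersection over union for each pair of (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision_score(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))


def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over overlap thresholds evenly spaced in [0, 1]."""
    overlaps = overlap_ratio(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = [np.mean(overlaps > t) for t in thresholds]
    return float(np.mean(success))
```

Both functions take two arrays of shape (num_frames, 4), one with the tracker output and one with the ground truth for a sequence; per-attribute scores follow by restricting the evaluation to the videos annotated with that attribute.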

Conclusions
In this paper, a novel PCF framework was proposed to address the issues of target appearance changes and model drift. The proposed method constructs two parallel correlation filters with different learning rates. The filter with the larger learning rate tracks the appearance changes in the target, and the filter with the smaller learning rate tracks the location of the target. By weighting the response maps of the two parallel correlation filters, the position and scale of the target can be estimated accurately. Furthermore, a new distribution of the correlation response that merges the Gaussian distribution and the triangle distribution was proposed to replace the original Gaussian distribution, which prevents the model from drifting and yields higher tracking accuracy and robustness. Extensive qualitative and quantitative evaluations on several common benchmarks have demonstrated the competitive accuracy and robustness and the superior tracking speed of the proposed PCF tracker compared with state-of-the-art trackers. Based on the analysis of the failure cases of the proposed PCF tracker, a new re-detection strategy will be studied in detail to address heavy or complete occlusion in visual tracking.
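The idea summarized above can be illustrated with a small sketch. It is not the implementation evaluated in this paper: it assumes a single-channel, MOSSE-style linear filter in the Fourier domain, omits scale estimation, and uses placeholder values for the learning rates, the merge weights, and the way the Gaussian and triangular peaks are blended. The names `desired_response`, `CorrelationFilter`, and `locate` are hypothetical.

```python
import numpy as np


def desired_response(shape, sigma=2.0, mix=0.5):
    """Target output: an assumed blend of a Gaussian and a triangular peak.

    `sigma`, `mix`, and the cone width are illustrative placeholders.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - h // 2, xs - w // 2)
    gauss = np.exp(-0.5 * (dist / sigma) ** 2)
    triangle = np.clip(1.0 - dist / (3.0 * sigma), 0.0, None)
    return mix * gauss + (1.0 - mix) * triangle


class CorrelationFilter:
    """Single-channel linear correlation filter with a running-average update."""

    def __init__(self, learning_rate, reg=1e-2):
        self.lr = learning_rate
        self.reg = reg
        self.num = None  # running numerator of the filter
        self.den = None  # running denominator of the filter

    def update(self, patch, target):
        X = np.fft.fft2(patch)
        Y = np.fft.fft2(target)
        num = Y * np.conj(X)
        den = np.abs(X) ** 2 + self.reg
        if self.num is None:
            self.num, self.den = num, den
        else:
            # A larger learning rate adapts faster to appearance changes.
            self.num = (1.0 - self.lr) * self.num + self.lr * num
            self.den = (1.0 - self.lr) * self.den + self.lr * den

    def respond(self, patch):
        X = np.fft.fft2(patch)
        return np.real(np.fft.ifft2(self.num / self.den * X))


def locate(filters, weights, patch):
    """Weighted merge of the response maps; returns the (row, col) of the peak."""
    merged = sum(w * f.respond(patch) for f, w in zip(filters, weights))
    peak = np.unravel_index(np.argmax(merged), merged.shape)
    return tuple(int(p) for p in peak)
```

In use, one filter would be created with a larger learning rate and one with a smaller rate, both would be updated on each new target patch, and `locate` would be called on the search patch of the next frame. If the search patch is a circularly shifted copy of the training patch, the merged peak should move away from the patch center by the same offset.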