Minimum Barrier Distance-Based Object Descriptor for Visual Tracking

Abstract: In most visual tracking tasks, the target is tracked by a bounding box given in the first frame. Background information inside the bounding box is inevitably present, and its complexity and redundancy affect tracking performance. To alleviate the influence of the background, we propose a robust object descriptor for visual tracking. First, we decompose the bounding box into non-overlapping patches and extract color and gradient histogram features for each patch. Second, we adopt the minimum barrier distance (MBD) to calculate patch weights. Specifically, we consider the boundary patches as background seeds and take the MBD from each patch to the seed set as the weight of that patch, since a weight calculated by the MBD represents the difference between each patch and the background more effectively. Finally, we impose the weight on the extracted feature to get the descriptor of each patch and then incorporate our MBD-based descriptor into the structured support vector machine algorithm for tracking. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach.


Introduction
Object tracking is an important issue for video analysis in the field of computer vision, with wide-ranging applications including surveillance, human-computer interaction and medical imaging. Given the target location and the size of the bounding box in the first frame, the object tracking task is to estimate the location of the target in each subsequent frame without prior knowledge of the target motion [1]. In recent years, with the emergence of many new tracking algorithms, the performance of visual tracking has greatly improved. However, object tracking remains challenging because problems such as illumination variation, occlusion, large deformation and background clutter are still not solved well. In this paper, we tackle the complex background inside the bounding box caused by target occlusion, large deformation or background clutter.
As an important branch of visual tracking, tracking-by-detection algorithms [2][3][4][5][6][7][8][9] have attracted much attention. The idea of tracking-by-detection is to treat target localization as a classification problem. The classifier is trained with positive and negative samples corresponding to foreground and background areas, respectively. During tracking, the object often undergoes occlusion, deformation or size variation, so the bounding box contains too much background area, which negatively influences classifier updating. To reduce the effect of an enlarged background, a new target representation called the spatially-ordered and weighted patch descriptor (SOWP) [10] was proposed, which conveys structural information of the object within the bounding box. However, SOWP, like many current tracking algorithms [6,11,12,13], uses a measure similar to the Euclidean distance to calculate the similarity between two patches, which makes it difficult to distinguish the target from the background when background blur, occlusion or fast motion produces similar appearances inside the bounding box.
To address this problem, and considering that the minimum barrier distance (MBD) [14] is more robust to pixel value fluctuation caused by motion or noise, we construct a robust object representation by applying the MBD to calculate the weights of patches. Given the bounding box of a target in the search area, we divide it into non-overlapping patches and describe them with a combination of an RGB (Red-Green-Blue) color histogram and a histogram of oriented gradients (HOG) feature. Then, we expand the bounding box outwards, and the boundary patches in the extended bounding box are regarded as the seed patches according to the appearance-based backgroundness cue [15,16]. The image boundary prior assumes that most of the image boundary area is background [15], so we set the boundary patches as the background seed set. Instead of generating an affinity matrix by calculating the similarity between patches, we construct the MBD-based descriptor through a distance transform map. Specifically, we use each patch as a node, connect adjacent nodes with edges and obtain a distance transform map by iteratively calculating the MBD from each patch to the seed set. The weight of each node, given by its MBD value, describes how likely the patch is to belong to the target. The larger the MBD, the less similar the current patch is to the boundary patches and the greater the probability that it belongs to the target.
As shown in Figure 1, the target in the sequence "Skating2-1" is a female skater, and the video clearly includes fast motion and deformation due to the rapid movement of the skater. The bounding box in the result of SOWP [10] contains a large number of background regions (male skaters) in Frame 136, resulting in false tracking and the loss of the real target at Frame 396. The athletes in the sequence "Bolt2" are very similar to each other in appearance, and their appearance changes significantly due to background clutter and deformation, which leads SOWP [10], ACFN (Attentional Correlation Filter Network) [18] and MEEM (Multiple Experts Using Entropy Minimization) [19] to mistakenly track other athletes; meanwhile, the trackers MUSTer (MUlti-Store Tracker) [20] and TLD (Tracking-Learning-Detection) [21] drift. Our algorithm achieves accurate tracking under these challenges because the MBD is used for calculating the similarity between the target and the background.
The main contribution of this paper is a novel MBD-based descriptor for visual tracking. Specifically, we apply the MBD to calculate the weights of patches when constructing the object descriptor, since the MBD is more robust to pixel value fluctuation caused by motion or noise. We adopt the boundary patches on each side of the bounding box as labeled background queries and calculate the MBD between the image patches and the seed patches; thus, the foreground target is highlighted, and the background noise is suppressed. Extensive experiments demonstrate that the proposed MBD-based descriptor is more robust than other recent visual trackers when confronting fast motion and background clutter and that it reduces drift effectively.
The rest of this paper is organized as follows: Section 2 reviews the background and some state-of-the-art or classic methods related to our method. Section 3 gives a detailed introduction of our proposed algorithm. Section 4 discusses the experimental results, including qualitative and quantitative analysis. Section 5 concludes the paper.

Related Work
Visual tracking is an important issue in computer vision, as it is the foundation of high-level visual tasks. Current tracking algorithms can be divided into two major categories: generative models [22][23][24][25] and discriminative models [26][27][28][29][30][31]. Generative methods describe the target appearance using generative models such as subspace models, search for the regions most similar to the target object and update the appearance model dynamically. In general, generative models focus on the characterization of the target and ignore the background information; hence, they are prone to drift when the target changes dramatically and often fail in a cluttered background [21]. In contrast, discriminative trackers build a binary classifier, which focuses on differentiating the target from the background [32]. Among discriminative trackers, adaptive tracking-by-detection approaches build a classifier during tracking and update it using binary labeled training samples generated around the current object location [21]. To avoid label errors during the classification step, the kernelized structured output support vector machine (SVM) has been used in [33] for adaptive visual object tracking. However, this method uses simple low-level features and is not effective at handling occlusion. Therefore, generating correctly labeled training samples is very important for classifier updating, and designing a good sample descriptor is crucial for clearly distinguishing positive from negative samples.
As mentioned above, tracking-by-detection approaches often perform poorly when the bounding box contains complex background information, which is inevitably introduced by occlusion, deformation, size variation and background clutter. Many works have been proposed to make trackers more robust to these challenges. Lu et al. [34] proposed a pixel-wise appearance model called the pixel-wise spatial pyramid, which employs a pixel feature vector to combine several features. This pixel-wise spatial pyramid contributes to the accurate localization of the tracked object, especially under drastic appearance change. Wei et al. [35] proposed a locality sensitive histogram, which takes into account the contributions from each pixel and facilitates robust multi-region tracking. However, these pixel-based trackers fail to handle occlusion and cluttered backgrounds because pixel-wise appearance models lack structural information.
To improve on pixel-based trackers, superpixel-based trackers [36][37][38] were introduced to handle non-rigid and deformable targets by segmenting images into superpixels and providing the semantic structure of the target. Wang and Yang et al. [36,39] built superpixel-based appearance models, which helped to some extent with object occlusion and deformation. However, trackers based on superpixels are complex and have difficulty dealing with cluttered backgrounds, which often leads to unstable results. In recent years, patch-based tracking methods have received more attention, because patch-based trackers describe the structural information of the object better, and they often use different descriptors to represent the object [10,12,40]. As a new and effective method, SOWP [10] decomposed the object bounding box into multiple patches and calculated the patch weights by performing random walk with restart simulations. The weighted patch descriptor [10] is more flexible under partial occlusion and appearance changes of the object. Similar to SOWP, some methods [41,42] weight the samples with patch likelihoods in particle filters, and [43] used adaptive weighting in patch-based correlation filters to mitigate the effect of occlusion. However, when the object is occluded for a long time, SOWP still drifts, and it is difficult to distinguish target and background when background blur or target occlusion produces similar appearances inside the bounding box. In this paper, we propose a novel approach that addresses the appearance similarity measure problem by minimum barrier distance weighting. Experiments show that our method is more robust than many other visual trackers.

Overview
We propose a novel tracking algorithm built on an MBD-based descriptor. Given a candidate bounding box of a target in a search region at the current frame, we first divide it into $n \times n$ patches and expand the bounding box to $(n + 2) \times (n + 2)$ patches, as shown in Figure 2b. Each patch is described with a combination of an RGB color histogram and a HOG feature. Then, in each frame, an MBD map is constructed for each bounding box: we calculate the MBD from each patch to the seed set $S$ by performing raster scans, which generates an MBD transform map. The MBD of each patch serves as the weight of the patch and reflects the difference between target and background. Finally, we combine the MBD weights with the corresponding patch features to construct a robust object descriptor and evaluate the descriptor with a structured SVM to carry out the tracking. The pipeline of our algorithm is shown in Figure 2.

Patch-Based Representation
First, we divide the bounding box into $n \times n$ patches. We tried gridding sizes from $7 \times 7$ to $10 \times 10$ and chose 64 patches, i.e., $n = 8$, as a trade-off between accuracy and efficiency. The results become worse with gridding sizes larger or smaller than $8 \times 8$, so a patch should have a proper size to make the tracker robust. Each patch is characterized by a $d$-dimensional feature vector that combines the RGB color histogram and the HOG feature. Then, the feature descriptor [10] for a bounding box in the $t$-th frame is given by:

$$\psi(z_t, r) = [f_1^{T}, \ldots, f_{n^2}^{T}]^{T} \tag{1}$$

where $r$ represents a bounding box in the $t$-th frame $z_t$ and $f_i^{T}$ is the transposed feature vector of the $i$-th patch. $\psi(z_t, r)$ is a simple feature descriptor for $r$. To assign a weight to each patch, we adopt the MBD. By considering the patches as nodes $V$ and the 4-neighbor links between patches as edges $E$, the bounding box in the current frame can be represented by a graph $G(V, E)$. The MBD from a patch to the seed set $S$ reflects the difference between the patch and the background. Therefore, our next step is to calculate the MBD between each patch and the boundary patches, while the MBD of each boundary patch is set to 0. The distance is then regarded as the weight of each patch, which describes how likely it is to belong to the target object.
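The patch decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`patch_grid`, `color_histogram`, `box_descriptor`), the 8-bin quantization and the use of a per-channel color histogram alone (the HOG part is omitted) are hypothetical choices, not the implementation used in our tracker.

```python
import numpy as np

def patch_grid(box, n=8, expand=False):
    """Split box = (x, y, w, h) into an n x n grid of (x0, y0, x1, y1) rectangles.

    With expand=True, one extra ring of patches is added on every side,
    which yields (n + 2) x (n + 2) rectangles.
    """
    x, y, w, h = box
    pw, ph = w / n, h / n
    first, last = (-1, n + 1) if expand else (0, n)
    return [(x + c * pw, y + r * ph, x + (c + 1) * pw, y + (r + 1) * ph)
            for r in range(first, last) for c in range(first, last)]

def color_histogram(image, rect, bins=8):
    """L1-normalized per-channel histogram of one patch (assumed stand-in feature)."""
    height, width = image.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in rect)
    # Clip the rectangle to the image so that expanded patches stay valid.
    x0, x1 = max(x0, 0), min(x1, width)
    y0, y1 = max(y0, 0), min(y1, height)
    patch = image[y0:y1, x0:x1]
    if patch.size == 0:
        return np.zeros(3 * bins)
    hist = np.concatenate([
        np.histogram(patch[..., ch], bins=bins, range=(0, 256))[0]
        for ch in range(3)
    ]).astype(float)
    return hist / hist.sum()

def box_descriptor(image, box, n=8):
    """Concatenate the patch features of the n x n grid into one vector."""
    return np.concatenate([color_histogram(image, rect)
                           for rect in patch_grid(box, n)])
```

For an $8 \times 8$ grid with 8 bins per channel, `box_descriptor` returns a vector of length $64 \times 24$; calling `patch_grid(box, 8, expand=True)` gives the 100 rectangles of the extended bounding box whose outer ring plays the role of the seed patches.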

MBD-Based Patch Weighting
The MBD is defined through a path cost function. The barrier strength of a path is the difference between the highest and lowest values along the path, and the MBD between two nodes is the minimum barrier strength over all paths connecting them [14]. Unlike a common distance function, the length of a path may remain constant as the path grows, until a new, stronger barrier is encountered on the path. The MBD can be calculated effectively on a digital image and has many interesting theoretical properties. It has proven to be a potentially useful tool in image processing [44]. With respect to variation of the seed position and the introduction of noise, it is advantageous compared to fuzzy and geodesic distances [45].
In the graph $G(V, E)$, let $\pi(i)$ denote the $i$-th patch on a path, with $a = \pi(0)$ and $c = \pi(k)$. A path $\pi = \langle \pi(0), \pi(1), \ldots, \pi(k) \rangle$ on the graph is a sequence of patches from $a$ to $c$ in which consecutive patches are adjacent, i.e., $\langle \pi(i-1), \pi(i) \rangle \in E$ for $i \in \{1, \ldots, k\}$. We use $\Pi_{S,c}$ to denote the set of all paths connecting the seeds to $c$, and $f(\pi)$ indicates the cost of the path $\pi$. Then, we calculate the distance $D(c)$ from node $c$ to the seeds and update the distance map with the following formula:

$$D(c) = \min_{\pi \in \Pi_{S,c}} f(\pi) \tag{2}$$

The definition of the function $f$ depends on the image intensity and differs for different methods. In our work, to calculate the barrier distance, we use $G(\pi(i))$ to denote the value of node $\pi(i)$ on graph $G$, and the cost function of the distance transform [46] is defined as:

$$f(\pi) = \beta_G(\pi) = \max_{0 \le i \le k} G(\pi(i)) - \min_{0 \le i \le k} G(\pi(i)) \tag{3}$$

where $\beta_G(\pi)$ returns the barrier cost of path $\pi$, which is the cost used in the MBD transform. Similar to the raster scan algorithm used for the geodesic distance transform [46], we visit each patch in raster scan or inverse raster scan order. In the raster scan method, the distance values are propagated by sequentially scanning the image in a predefined order. That is, the nodes are scanned first from the upper left to the lower right and then from the lower right to the upper left, until convergence (see the illustration in Figure 3). For example, when visiting a node $x$ in raster scan order, for each node $b$ in the mask area of the current node $x$, as shown in Figure 3b, we compute the MBD $\beta_G(\pi_x \cup b)$ of $x$ according to Equation (4):

$$\beta_G(\pi_x \cup b) = \max\{H(b), G(x)\} - \min\{L(b), G(x)\} \tag{4}$$

where $H(b)$ and $L(b)$ represent the highest and lowest values on the current path to $b$, respectively, and $H$ and $L$ are auxiliary maps that keep track of these values.

Given the image $G$, $D(x)$ is set to 0 for the seed set $S$ and initialized to $\infty$ for all other nodes. $D(x)$ refers to the MBD path cost from node $x$ to $S$. Then, the distance map $D$ and the auxiliary maps are updated in a raster scan pass and an inverse raster scan pass, until the number of iterations is $k$. The barrier of a path changes only when the value of the newly added node is greater than the current maximum or less than the current minimum on the path [46,47], and the distance is updated as:

$$D(x) \leftarrow \min\{D(x), \beta_G(\pi_x \cup b)\} \tag{5}$$

where $\pi_x$ refers to the path from $x$ to $S$, $b$ is an adjacent node of $x$ and $\pi_x \cup b$ represents the path from node $x$ to $S$ through node $b$. That is, given a path $\pi_b$ connecting node $b$ to the seed set $S$, the edge $\langle b, x \rangle$ is appended to $\pi_b$ to form the new path $\pi_x \cup b$. As shown in Figure 3, one raster scan accesses only two neighbors of the current node, since graph $G$ is 4-connected. $\pi_{x_1}$, indicated by the green arrow, represents a path from the current node $x$ to $S$ through its upper neighbor $b_1$, and $\pi_{x_2}$ represents the path from the current node to $S$ through its left neighbor $b_2$. In each pass, $D(x)$ is updated according to Equation (5) for every visited node.

Finally, we obtain the enhanced object descriptor by multiplying the feature descriptor with the corresponding patch weight as follows:

$$\psi(z_t, r) = [D(1) f_1^{T}, \ldots, D(n^2) f_{n^2}^{T}]^{T} \tag{6}$$

where $D(i)$ is the MBD of the $i$-th patch.
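The raster scan procedure can be illustrated with the following minimal Python sketch, written in the spirit of the fast MBD transform [46]. It rests on several assumptions that are not fixed by the text: each patch is summarized by a single scalar value in the array `G` (how the $d$-dimensional patch feature is reduced to that value is left open), one iteration consists of a raster pass followed by an inverse raster pass, and the names `mbd_transform` and `weighted_descriptor` are hypothetical.

```python
import numpy as np

def mbd_transform(G, k=3):
    """Approximate MBD from every node of a 2D grid to its boundary seeds.

    G : 2D array holding one scalar value per patch (assumption of this sketch).
    k : number of iterations (raster pass + inverse raster pass).
    """
    G = np.asarray(G, dtype=float)
    rows, cols = G.shape
    D = np.full((rows, cols), np.inf)
    # Boundary patches are the background seeds, so their distance is 0.
    D[0, :] = D[-1, :] = 0.0
    D[:, 0] = D[:, -1] = 0.0
    H = G.copy()  # highest value on the current best path to each node
    L = G.copy()  # lowest value on the current best path to each node

    def relax(r, c, neighbours):
        for dr, dc in neighbours:
            br, bc = r + dr, c + dc
            if not (0 <= br < rows and 0 <= bc < cols):
                continue
            if not np.isfinite(D[br, bc]):
                continue  # this neighbour has no path to a seed yet
            hi = max(H[br, bc], G[r, c])
            lo = min(L[br, bc], G[r, c])
            if hi - lo < D[r, c]:
                # Keep the smaller barrier and remember the path extremes.
                D[r, c], H[r, c], L[r, c] = hi - lo, hi, lo

    for _ in range(k):
        # Raster pass: upper and left neighbours.
        for r in range(rows):
            for c in range(cols):
                relax(r, c, ((-1, 0), (0, -1)))
        # Inverse raster pass: lower and right neighbours.
        for r in reversed(range(rows)):
            for c in reversed(range(cols)):
                relax(r, c, ((1, 0), (0, 1)))
    return D

def weighted_descriptor(features, D):
    """Scale each inner patch feature (one row per patch) by its MBD weight."""
    weights = D[1:-1, 1:-1].ravel()
    return (np.asarray(features, dtype=float) * weights[:, None]).ravel()
```

On a $5 \times 5$ grid whose boundary values are 0, whose inner ring is 0.2 and whose center is 0.9, the sketch assigns a distance of 0 to the seeds, 0.2 to the ring and 0.9 to the center, so the patch that differs most from the boundary receives the largest weight.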

Structured SVM Tracking
In this section, we incorporate our descriptor into a conventional tracking algorithm, the structured SVM tracker Struck [4], as Struck excels in the recent benchmark according to [1,10]. First, we set a search window around the bounding box obtained in the previous frame. Then, within the search window, the MBD-based descriptors of the candidate bounding boxes are fed to the structured SVM, and the candidate with the maximum classification score is selected as the current bounding box of the object. Finally, we update the classifier if needed and proceed to the next frame. Specifically, Struck estimates the object bounding box $r_t$ in the $t$-th frame $z_t$ by maximizing a classification score. To avoid model drift, we fuse the model of the first frame with the current model in our algorithm. The bounding box $r_t$ that maximizes the classifier score is taken as the object:

$$r_t = \arg\max_{r} \left( \gamma\, w_0^{T} \psi(z_t, r) + (1 - \gamma)\, w_{t-1}^{T} \psi(z_t, r) \right) \tag{7}$$

where $w_{t-1}$ is the normal vector of the current decision plane and $w_0$ is learned in the initial frame. $\psi(z_t, r)$ represents the weighted feature descriptor for bounding box $r$ in frame $t$, and $\gamma$ is a trade-off parameter. Maximizing the classifier score over samples represented with weighted features yields the best classification. We adopt the strategy of SOWP [10] to detect abrupt changes of object appearance, and we update the classifier when the confidence score $\rho$ of the tracking result is greater than a threshold $\theta$. The confidence score $\rho$ is given by:

$$\rho = \frac{1}{|S_t|} \sum_{s \in S_t} s^{T} \psi(z_t, r_t) \tag{8}$$

where $s$ is a positive support vector and $S_t$ is the set of positive support vectors at time $t$. The confidence score $\rho$ in the $t$-th frame is the average similarity between the MBD descriptor of the tracked bounding box and the positive support vectors. Therefore, the confidence score measures the reliability of the object in the bounding box in the current frame.
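The selection and update logic of this section can be sketched as follows. The sketch makes assumptions that the text does not pin down: the fused score is taken to be a convex combination in which $\gamma$ weights the initial-frame classifier, the similarity in the confidence score is taken to be an inner product, and the function names are hypothetical. It does not reproduce the Struck optimization itself.

```python
import numpy as np

def fused_score(w0, w_prev, psi, gamma=0.395):
    """Assumed fusion: gamma weights the first-frame model, (1 - gamma) the current one."""
    return gamma * float(w0 @ psi) + (1.0 - gamma) * float(w_prev @ psi)

def select_box(w0, w_prev, candidates, gamma=0.395):
    """Return the index and score of the candidate descriptor with the highest fused score."""
    scores = [fused_score(w0, w_prev, psi, gamma) for psi in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]

def confidence(psi_best, positive_support_vectors):
    """Average inner product between the tracked descriptor and the positive support vectors."""
    return float(np.mean(np.asarray(positive_support_vectors) @ psi_best))

def should_update(rho, theta=0.3):
    """The classifier is updated only when the confidence exceeds the threshold."""
    return rho > theta
```

A low confidence score leaves the classifier untouched, which keeps descriptors contaminated by occluders or background from entering the model.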
We estimate the target scale by training a scale classifier on a scale pyramid [48]. This allows us to estimate the target scale independently after the optimal position is found. Let $p \times q$ denote the object size in the current frame and $M$ the size of the scale filter. Then, we extract an image patch $J_m$ centered around the target with the following size:

$$a^{m} p \times a^{m} q \tag{9}$$

In Formula (9), $a$ denotes the scale factor between feature layers and $m \in \{-\frac{M-1}{2}, \ldots, \frac{M-1}{2}\}$. The value of the training example at scale level $m$ is set to the $d$-dimensional feature descriptor of $J_m$ [48]. Finally, the new sample is used to update the scale filter.
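As a small worked example of the scale pyramid, the following Python sketch lists the patch sizes for all scale levels. It assumes that $M$ is odd and that the size at level $m$ is $a^m p \times a^m q$; the function name `scale_pyramid_sizes` is hypothetical.

```python
def scale_pyramid_sizes(p, q, a=1.052, M=33):
    """Return the (width, height) of the patch at every scale level m.

    Levels run from -(M - 1) / 2 to (M - 1) / 2, so M is assumed to be odd.
    """
    half = (M - 1) // 2
    return [(a ** m * p, a ** m * q) for m in range(-half, half + 1)]
```

With $a = 1.052$ and $M = 33$, the levels run from $m = -16$ to $m = 16$, so the smallest and largest patches are roughly 0.44 and 2.25 times the current object size, and the middle level reproduces the current size $p \times q$.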

Experimental Results
The proposed tracker with the MBD-based descriptor was implemented in C++ on an Intel Core i7-6700K (Intel Corporation, Santa Clara, CA, USA) 3.40 GHz CPU with 8 GB RAM and ran at 4.7 FPS (frames per second). We evaluated our tracker on the OTB-100 (object tracking benchmark) dataset [17] and the TColor-128 (Temple Color) dataset [49]. The experimental results demonstrate the effectiveness of our algorithm through qualitative and quantitative analysis. For a fair comparison, the parameters were fixed for all sequences in the experiment: $\gamma$ was empirically set to 0.395, and $a = 1.052$, $M = 33$, $\theta = 0.3$ and $k = 3$.

Evaluation Method
We evaluated our tracker on the OTB-100 dataset [17] and the TColor-128 dataset [49]. In these datasets, the precision rate (PR) and success rate (SR) are adopted to evaluate the quantitative performance of trackers. The precision is the ratio of frames in which the distance between the centers of the estimated bounding box and the given ground-truth is smaller than a threshold. OTB-100 uses the score at a threshold of 20 pixels as the representative precision rate of each tracker. The success rate measures the overlap ratio between the estimated bounding box and the ground-truth and counts the successful frames whose overlap ratio is larger than the given threshold of 0.5. The area under curve (AUC) of each success plot is adopted to rank the tracking algorithms.
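The two measures can be computed as in the following Python sketch. Boxes are assumed to be given as rows of (x, y, w, h), and the AUC is approximated by averaging the success rate over 21 evenly spaced overlap thresholds, which is a common convention but an assumption here; the function names are hypothetical and this is not the benchmark toolkit itself.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def overlap_ratio(pred, gt):
    """Intersection over union of each predicted box with its ground-truth box."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    x0 = np.maximum(pred[:, 0], gt[:, 0])
    y0 = np.maximum(pred[:, 1], gt[:, 1])
    x1 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y1 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is smaller than the threshold (pixels)."""
    return float(np.mean(center_error(pred, gt) < threshold))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap ratio is larger than the threshold."""
    return float(np.mean(overlap_ratio(pred, gt) > threshold))

def success_auc(pred, gt, num_thresholds=21):
    """Assumed AUC approximation: mean success rate over evenly spaced thresholds."""
    overlaps = overlap_ratio(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return float(np.mean([np.mean(overlaps > t) for t in thresholds]))
```

For example, for three frames with a ground-truth box of (0, 0, 10, 10) and predictions that match exactly, are shifted by 5 pixels and are lost entirely, the precision rate at 20 pixels is 2/3, while the success rate at 0.5 is 1/3 because the shifted box overlaps by only one third.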

Qualitative Evaluation
We first perform the qualitative analysis. Compared with SOWP [10], our algorithm tracks the target accurately in many challenging sequences where SOWP fails. As shown in Figure 4, the "Basketball", "Car4", "Freeman1", "Girl2", "Lemming", "Suv" and "Human3" sequences contain occlusion, background clutter, scale variation and deformation, which prevent many trackers from tracking the given object accurately. In this work, we compare our tracker with SOWP on OTB-100 and give some results in Figure 4. The illustration shows that the proposed approach effectively handled fast motion ("Basketball" and "Lemming"), background clutter ("Car4" and "Basketball"), occlusion ("Freeman1", "Suv" and "Human3") and deformation ("Basketball", "Girl2" and "Human3"). For example, SOWP lost the object on the sequence "Basketball" due to sudden illumination variation and another athlete with a similar appearance, whereas our tracker tracked the target successfully. SOWP also did not perform well on the sequence "Lemming": when the object reappeared in frame 901, SOWP failed to find it, but our tracker tracked it steadily. The performance of SOWP was not satisfactory for the sequence "Freeman1" due to background blur; it considered some background as a part of the target, which led to inaccurate tracking results. Our algorithm achieved better results on these challenging videos owing to the patches weighted with the MBD.

In addition to the 29 trackers given by OTB-100, we also compared the proposed algorithm with some recently proposed tracking methods. Table 1 shows the results of a comparison between our tracker and nine recent trackers, including SOWP [10], ACFN [18], MEEM [19], MUSTer [20], LCT (Long-term correlation tracking) [50], DSST (Discriminative Scale Space Tracker) [48], KCF (Kernelized Correlation Filters) [5], Struck [4] and TLD [21], among which ACFN [18] is a deep learning method. We used the PR/SR scores proposed in OTB-100, and the results of the trackers were obtained by running the published codes. Our tracker did not perform well on the low resolution (LR) sequences in particular, as shown in Table 1, because our algorithm takes boundary patches of the bounding box as seeds, and low resolution or heavy background clutter influences the initialization of the seeds, which leads to unsatisfactory results in these cases. Nevertheless, our tracker performed favorably against the other state-of-the-art trackers: it achieved the best overall performance and performed well on most of the attributes.
In order to evaluate the effectiveness of the proposed MBD-based object descriptor, we compared our algorithm with recent trackers and replaced the MBD with different distance metrics; the results are given in Table 2. The results indicate that our algorithm performed well compared to several state-of-the-art algorithms including MemTrack (Dynamic Memory Networks) [51], scaleDLSSVM (Dual Linear Structured SVM) [29], DCFNet (Discriminant Correlation Filters Network) [52], CNN-SVM (Convolutional Neural Network) [8], SiamFC-3s (Fully-Convolutional Siamese Networks) [53] and SINT++ (Siamese Network) [2]. Of these trackers, MemTrack [51] and SINT++ [2] are neural network-based methods; scaleDLSSVM [29] and CNN-SVM [8] are tracking-by-detection-based methods; DCFNet [52] and SiamFC-3s [53] are correlation filter-based methods. Except for the latest tracker MemTrack, which obtained results comparable to ours, our method was better than the other state-of-the-art algorithms. OURS-Eu, OURS-L1 and OURS-KL are the results of replacing our MBD with the Euclidean distance, the L1 distance and the KL divergence, respectively. To verify the effectiveness of the MBD-tuned weights, we also created a version of our algorithm with random weights drawn from a Gaussian distribution, named OURS-Ga. Our method obtained better results than OURS-Eu, OURS-L1, OURS-KL and OURS-Ga, which verifies the effectiveness of MBD-tuned weighting.
We rank the top 16 trackers by running the one-pass evaluation (OPE) on OTB-100, as shown in Figure 5. MemTrack [51] is the latest method based on a deep network and obtained results comparable to ours. Our tracker (0.835/0.595) achieved performance gains of 5.4% in precision and 6.5% in success rate over MEEM [19] (0.781/0.530), even though MEEM can recover from a corrupted classifier by utilizing a multi-expert tracking framework. Our tracker outperformed SOWP [10] (0.803/0.560) by 3.2%/3.5% in precision and success rate, respectively. This means that our descriptor is more robust than that of SOWP, which also uses patch descriptors. We present the precision plots and success plots for eight different attributes in Figure 6: deformation (DEF), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), out-of-view (OV), motion blur (MB), fast motion (FM) and background clutter (BC). The illustration shows that our algorithm outperformed the other trackers on deformation; in particular, our tracker was better than the second-ranked tracker CNN-SVM [8] by 2% and 1.6% in precision and success rate on DEF, respectively. At the same time, our tracker achieved performance gains of 5.7% in the PR score and 3.9% in the SR score over SOWP [10] on FM. Since SOWP also adopts a weighted descriptor, this shows that the MBD-based weighted patches are more robust to the fluctuation of node values. The results in Figure 6 show that the hard positive samples generated by training the hard positive transformation network in SINT++ [2] were advantageous for OV, while our method outperformed SINT++ on the other seven attributes. The latest MemTrack [51] obtained the best results on most of the attributes due to the superiority of deep networks. The MUSTer [20] tracker achieved better results than ours for BC by exploiting both short-term and long-term systems to process the image with the target being tracked, but its overall performance was not as robust as ours. ACFN [18] achieved better results than our method for SV in the success plots, but our algorithm outperformed ACFN on the other attributes, including OCC, FM and BC. This shows that our proposed MBD-based weighted descriptor can locate the target object in a complex background even without strong features. These experiments validate the effectiveness of the proposed algorithm.
We compared our tracker with real-time part-based visual tracking (RPVT) [43] with dynamic weighting. RPVT was tested on 16 selected challenging tracking sequences in OTB-50 [1] to show its ability to handle occlusion, scale and appearance changes. We ran our method on the same sequences and adopted the same average center location error and average overlap rate as in that paper; the results are shown in Tables 3 and 4. In general, RPVT obtained better results than ours on these 16 tracking sequences. However, our tracker performed better on most of the other 34 sequences in Table 5. This indicates that RPVT was better at handling occlusion, scale and appearance changes, while our tracker was good at challenges including fast motion, background clutter and deformation.
We also compared our tracker with the occlusion probability mask-based tracker (OPM) [41]. OPM selected eight sequences from OTB-100 in its experiment. We ran our method on the same sequences and adopted the same average center location error for the comparison; the results are shown in Table 6. We can see from Table 6 that OPM performed better than ours on these sequences, which shows that OPM has advantages in dealing with some sequences with occlusion; however, our tracker performed better on the sequences with illumination variation and scale variation.

TColor-128 [49] contains a large set of 129 color sequences with ground truth and challenge factor annotations. Compared with OTB-100, TColor-128 contains more challenging sequences and is more difficult to track. We evaluated our tracker on the TColor-128 dataset and show the evaluation results for our tracker and recent trackers including Staple [54], DGT (Dynamic Graph for Visual Tracking) [9], MEEM [19], Struck [4], KCF [5], ALSA (Adaptive Structural Local Sparse Appearance Model) [55] and LCT [50] in Figure 7.

Evaluation on Challenges
Occlusion is a difficult issue, as the object might be occluded by other objects or disappear from the scene. Inspired by [56,57], we evaluated the proposed algorithm on motion blur and different levels of occlusion and background clutter, as shown in Table 7. We divided these challenges into partial occlusion (PO) [57], heavy occlusion (HO) [57], slight background clutter (SBC), heavy background clutter (HBC), motion blur (MB) and deformation (DEF). We found 84 sequences with occlusion in total in the OTB-100 and TColor-128 datasets, including 41 partial occlusion and 43 heavy occlusion sequences. In almost all of the sequences with heavy occlusion, the target was completely occluded and disappeared. We also found 31 sequences with slight background clutter, 27 sequences with heavy background clutter, 53 sequences with motion blur and 63 sequences with deformation in OTB-100 and TColor-128. Our proposed method was more robust to PO, SBC and DEF, as shown in Table 7. This shows that the MBD-based object descriptor can effectively alleviate the influence of background information in the bounding box. ACFN [18] performed best on HBC due to its powerful deep features. As mentioned in Section 3.1, we regard the boundary patches of the bounding box as seeds. If the boundary patches are heavily blurred, they affect the initialization of the seeds, which is why our algorithm did not perform best on MB. In our future work, we will address this shortcoming to make our tracker more robust to these challenges.

Conclusions
In this paper, we propose an effective descriptor, the minimum barrier distance-based object descriptor, to obtain a more accurate target object representation for visual tracking. We adopt the MBD to calculate reliable patch weights iteratively, which highlights the foreground target and suppresses the background noise. Finally, we incorporate the feature descriptor constructed by combining patch weights with patch features into the structured SVM framework and achieve accurate visual tracking results. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach against many other visual trackers.

Figure 2. The pipeline of our proposed algorithm. (a) A frame with some candidate bounding boxes in a search region around the position from the previous frame. (b) An extended bounding box and the bounding box shown in red. (c) The minimum barrier distance (MBD) map of the bounding box in the current frame. The MBD of boundary patches in the extended bounding box is set to 0 (the dark blue area) because they are seeds. (d) The MBD map of the bounding box in the first frame. (e) The bounding box obtained by our tracker. Structured support vector machine (SVM) is learned online to provide adaptive tracking.
After the distance update, $H(x)$ is updated to $\max\{H(b), G(x)\}$ and $L(x)$ to $\min\{L(b), G(x)\}$, and then the next node is visited. We summarize the MBD transform in Algorithm 1, where $k$ is the number of iterations of the raster scan.

Figure 3. Illustration of the raster scan pass and the inverse raster scan pass. The node x is the currently visited patch, and its masked neighbor area for 4-adjacency is denoted as b.

Figure 4. The visualization of the results from our tracker and SOWP [10] on several challenging sequences from OTB-100. The red bounding box is ours, and the green is SOWP.

Figure 5. The evaluation results on OTB-100. The left shows the precision plots, and the right shows the success plots of one-pass evaluation (OPE).

Figure 6. The evaluation results of precision rate (PR) and success rate (SR) on eight attributes.

Figure 7. The evaluation results on TColor-128 (Temple Color). The left shows the precision plots, and the right shows the success plots.

Table 6. Average center location error comparison with the occlusion probability mask-based tracker (OPM) [41] on some sequences in OTB-100. The red number is the best result.

Table 7. The PR/SR score comparison of our tracker with recent state-of-the-art trackers on some challenges of motion blur and different levels of occlusion and background clutter: partial occlusion (PO), heavy occlusion (HO), slight background clutter (SBC), heavy background clutter (HBC), motion blur (MB) and deformation (DEF). The red number is the best result.