Article

An Adaptive Dynamic Multi-Template Correlation Filter for Robust Object Tracking

Institute of Electrical and Control Engineering, National Yang Ming Chiao Tung University, No. 1001, Daxue Rd. East Dist., Hsinchu City 300093, Taiwan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(20), 10221; https://doi.org/10.3390/app122010221
Submission received: 24 July 2022 / Revised: 1 October 2022 / Accepted: 3 October 2022 / Published: 11 October 2022

Abstract

In the fields of computer vision and robotics, object tracking technology is used to follow objects of interest in video streams, with practical applications such as unmanned vehicles, self-driving cars, robotics, drones, and security surveillance. Although object tracking is a mature technology in these fields, no single tracking algorithm yet solves, comprehensively and simultaneously, the four problems encountered when tracking objects: deformation, illumination variation, motion blur, and occlusion. We propose an algorithm called the adaptive dynamic multi-template correlation filter (ADMTCF), which addresses these four difficulties simultaneously. The ADMTCF encodes local binary pattern (LBP) features in the HSV color space, so the encoded features resist the corruption of the tracking image caused by illumination variation. The ADMTCF maintains four templates that can be adaptively and dynamically resized to preserve tracking accuracy against deformation, motion blur, and occlusion. In this paper, we compare our ADMTCF algorithm with various state-of-the-art tracking algorithms in scenarios involving deformation, illumination variation, motion blur, and occlusion. Experimental results show that the proposed ADMTCF exhibits excellent performance, stability, and robustness in these scenarios.

1. Introduction

From traditional analog TV to recent digital TV, digital video advertising billboards, smartphones, and in-car audio and video equipment, digital video has brought convenience to human life. The human desire for clear images drives the continuous improvement of digital image resolution. In December 2019, the Consumer Technology Association (CTA) announced the industry-recognized standard and official designation for 8K UHD TVs, used in models from 2020 onward to help retailers and consumers identify technologies that comply with the Certified 8K Ultra HD requirements [1]. With the growth of digital image resolution, scholars in digital image processing have not only improved image quality optimization techniques such as color correction and noise reduction but also developed a variety of image processing techniques such as object recognition and image tracking. Object tracking is a fast-growing technology in image processing, computer vision, and robotics. It has many practical applications, such as self-driving cars that automatically brake to avoid collisions [2,3,4,5] and face recognition at subway gates to maintain the safety of passengers [6,7,8,9]. Object tracking includes single object tracking (SOT) and multiple object tracking (MOT). The main difference between the two problems is whether there is one moving object or several in the image. The focus of SOT is high tracking accuracy; it is therefore necessary to distinguish the background from the foreground and to overcome the factors that affect tracking accuracy, including deformation, illumination changes, motion blur, and occlusion. MOT is committed to distinguishing multiple objects, so it must separate the objects from one another and handle the association of multiple objects across adjacent frames [10,11,12]. MOT is built on the basis of good SOT, so even when there is only one tracked object, the importance of SOT cannot be ignored. To date, scholars are still trying to solve the four difficulties encountered in SOT, namely deformation, illumination variation, motion blur, and occlusion.

2. Related Work

In recent years, some scholars have tried to improve the robustness of object tracking through deep learning methods, enlarging the training sets to improve feature extraction or to obtain a more general tracker. Other scholars use filters to compare partially extracted object features and then dynamically update the filter.
Many scholars have proposed methods to address the motion blur encountered during tracking. In 2021, Qing Guo et al. proposed a new generic scheme based on the generative adversarial network (GAN). They used a fine-tuned discriminator as an adaptive blur evaluator to enable selective frame deblurring during tracking and improve the robustness of tracking blurred objects [13]. In 2021, Zhongjie Mao et al. introduced image quality assessment (IQA) and deblurring components into the basic D3S (a discriminative single-shot segmentation tracker) framework to enhance context patches, thereby improving the accuracy of tracking blurred objects [14]. In 2021, Zehan Tan et al. used the circle algorithm to calculate the neighborhood of offset estimation and then used a short-learning approach based on SiamRPN++ to achieve online tracking of high-speed targets and mitigate the impact of motion blur [15]. In 2021, Zhongguan Zhai et al. proposed a space-time memory planar object tracking network (STMPOT) that classifies the pixels of the current frame as object or background by remembering the object and background information of each frame, mitigating motion blur effects [16]. In 2022, Chao Liang et al. built a new one-shot MOT tracker. They used a global embedding search to propagate previous trajectories to the current frame and extended the role of ID embedding from data association to motion prediction, improving a tracker that relies only on single-frame detections to predict candidate bounding boxes and thereby addressing the impact of motion blur [17]. In 2022, Jeongseok Hyun et al. proposed a novel JDT model that recovers missed detections by learning object-level spatiotemporal coherence from edge features in graph neural networks (GNNs) while associating detection candidates across consecutive frames, addressing the effect of motion blur [18].
Many scholars have proposed methods to address the deformation encountered during tracking. In 2019, Wenxi Liu et al. proposed a deformable convolutional layer that enriches object appearance representations in a detection-tracking framework to adaptively enhance the original features. They believe that the rich feature representation provided by deformable convolution helps the convolutional neural network (CNN) classifier distinguish the target object from the background and reduces the influence of deformation on tracking [19]. In 2019, Detian Huang et al. proposed an improved algorithm that strengthens feature extraction by incorporating multi-domain training, redesigns both the selection criteria for the optimal action and the reward function, and uses an effective online adaptive update strategy to adapt to the deformation of the object during tracking [20]. In 2020, Jianming Zhang et al. proposed using an offline pre-trained ResNet-101 to obtain mid-level and high-level features combined with correlation filters to improve the ability to track deforming objects [21]. In 2021, Shiyong Lan et al. proposed a new approach that embeds an occlusion perception block into the model update stage to adaptively adjust the model update according to the occlusion situation, uses relatively stable color statistics to deal with appearance changes in large targets, and computes histogram response scores as a complementary part of the final correlation response to mitigate appearance deformation [22]. In 2022, Xuesong Gao et al. proposed a novel deformed sample generator to obtain a more general classifier and avoid larger training datasets. The classifier and the deformed sample generator are learned jointly, improving the robustness of tracking deformed objects [23].
Many scholars have proposed methods to address the illumination changes encountered during tracking. In 2017, Yijun Yan et al. proposed foreground detection in visible and thermal images to reduce the sensitivity of red-green-blue (RGB) color to lighting noise and shadows and thus the impact of illumination variations on tracking moving objects [24]. In 2018, Shuai Liu et al. proposed an optimized discriminative correlation filter (DCF) tracker that improves accuracy under illumination changes by performing multiple region detection and using alternate templates (MRAT), while saving alternate templates through a template update mechanism [25]. In 2021, Jieming Yang et al. proposed a neural network that uses the historical locations of the target to expand the training data and uses a metric loss on the target's historical appearance features to train the appearance feature extraction module, improving extraction performance and addressing the effect of lighting changes on tracking moving objects [26]. In 2022, Yuxin Zhou and Yi Zhang proposed SiamET, a Siamese-based network using ResNet-50 as its backbone with enhanced template modules. They address the effect of illumination variations on tracking moving objects by using templates obtained from all historical frames [27].
Many scholars have proposed methods to address the occlusion encountered during tracking. In 2019, Wei Feng et al. proposed a new dynamic saliency-aware regularized CF tracking (DSAR-CF) scheme that defines a simple and efficient energy function to guide the online update of the regularized weight map, addressing the effect of occlusion during tracking [28]. In 2020, Yue Yuan et al. proposed a scale-adaptive object-tracking method to reduce the impact of occlusion on tracking moving objects. They extracted features from different layers of ResNet to produce response maps fused with the AdaBoost algorithm, prevented the filters from updating when occlusion occurs, and used a scale filter to estimate the target scale [29]. In 2020, Di Yuan et al. designed a mask set to generate local filters that capture the local structures of the target and adopted an adaptive weighting fusion strategy for these local filters to adapt to changes in the target appearance, which effectively enhances the robustness of the tracker [30]. In 2021, Yuan Tai et al. constructed a subspace from image patches of the search window in previous frames. When the appearance of an object is occluded, the original image patch used to learn the filter is replaced by the reconstructed patch so that the filter learns from the object instead of the background, reducing the effect of occlusion [31]. In 2022, Jinkun Cao et al. demonstrated the effect of a simple motion model, observation-centric SORT (OC-SORT), in reducing the errors accumulated by linear motion models during loss, thereby reducing the effect of occlusion on tracking moving objects [32].
However, most of these scholars only propose individual solutions to the problems of deformation, illumination variation, motion blur, and occlusion when tracking moving objects. A solution for a single phenomenon is not enough to track moving objects correctly and stably. When the surface features of the tracked object change drastically because several factors, such as deformation, illumination variation, motion blur, and occlusion, occur simultaneously, a tracker that can only handle a single phenomenon will still suffer low tracking accuracy or tracking loss. Unfortunately, when tracking moving objects, these four phenomena are encountered almost simultaneously. For these reasons, we propose an adaptive dynamic multi-template correlation filter (ADMTCF) in this paper, which can simultaneously overcome the difficulty of tracking moving objects under deformation, motion blur, illumination variation, and occlusion. We believe a great tracker must have the following capabilities to overcome the four problems of motion blur, deformation, illumination variation, and occlusion simultaneously when tracking.
  • The template of the tracking object must have sufficient characteristics of the target object;
  • The template of the tracking object must not be sensitive to illumination variations;
  • The template of the tracking object must be more than one set;
  • The template of the tracking object must be dynamically updatable.
Adel Bibi and Bernard Ghanem proposed similar concepts. They believed that using multiple, multi-scale templates can improve tracking accuracy and can overcome the shortcomings of KCF's single template size [33]. We differ from them in two ways. The first difference is that they use multiple templates for calculation at the same time, whereas we choose the template with the highest similarity, after computing and sorting over multiple templates, for tracking the target in the next frame. The second difference is that they update the scale of the tracker by maximizing over the posterior distribution of a grid of scales, whereas we use the most similar template and then adjust its size adaptively to further improve the similarity between the template and the target.
The adaptive dynamic multi-template object tracker proposed in this paper has several characteristics. First, we convert the image of the selected object from RGB to the HSV color space and then perform LBP conversion on the luminance to obtain a sample of the object's image features [34,35]. After HSV color space conversion and local binary pattern (LBP) conversion, the image features do not change drastically with illumination variation. Therefore, our tracker does not lose stability or accuracy because of changes in ambient light. Secondly, unlike general trackers, the proposed tracker not only keeps multiple sets of templates sorted by time but also dynamically updates or adds tracker templates while tracking. Our strategy for updating or adding tracker templates satisfies one of two conditions: we choose a smaller change threshold when the characteristics of the target change greatly, and a larger change threshold when the characteristics of the target change less. Since our tracker has multiple sets of templates sorted by time that can be updated dynamically, it can overcome the tracking loss caused by deformation or motion blur during tracking. Third, our tracker template can be resized as the surface features of the moving object change. Compared with our adaptively adjusted template, the traditional use of a single threshold to judge the similarity between the moving object and the template is too rigid and prone to tracking loss. Therefore, during the tracking process, our tracker exhibits excellent robustness even when the target is occluded.
In this paper, we have two main contributions. The first contribution is that our proposed ADMTCF algorithm can maintain the tracking accuracy and robustness even if the object encounters deformation, illumination change, motion blur, and occlusion. The second contribution is that we propose an evaluation method with a penalty factor, which can objectively reflect the accuracy of various algorithms for estimating the object’s size.

3. Methods

We divided this section into Section 3.1 and Section 3.2. In Section 3.1, we detail the operation flow of the ADMTCF. In Section 3.2, we describe the two evaluation methods used to compare the tracking results.

3.1. The Process of the ADMTCF

In this subsection, we detail the operation flow of the ADMTCF by splitting it into four parts. The first part is HSV color space conversion and LBP encoding. The second part is the strategy for adding and updating templates. The third part is the dynamic change of the adaptive template's size. The fourth part is the mechanism of the multi-template with adaptive size change during tracking. The operation flow of our adaptive dynamic multi-template object tracker is described below, with reference to the flow charts in Figure 1 and Figure 2.

3.1.1. HSV Color Space and LBP Encoding

According to the literature [34,35], there are two key points. Firstly, in the HSV color space, the hue is less sensitive to illumination changes. Secondly, having sufficient object features yields high tracking accuracy. Therefore, when the tracked object encounters illumination changes, the ADMTCF converts the image from RGB to the HSV color space and then performs LBP encoding to maintain good tracking accuracy. In the ADMTCF, we select the object using the ground truth of the first frame, as shown in Figure 3a. We convert the image of the object from the RGB to the HSV color space, as shown in Figure 3b. We then use LBP to encode the luminance of the image, as shown in Figure 3c.
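To make this pre-processing step concrete, the following is a minimal sketch, assuming OpenCV and NumPy, a basic 8-neighbor LBP, and the HSV value (V) channel as the "luminance" that is encoded; the function names and the channel choice are our own illustration, not the authors' released code.

```python
import cv2
import numpy as np

def lbp_8neighbor(gray):
    """Basic 3x3 local binary pattern: each pixel is encoded by comparing its
    8 neighbors with the center value (bit = 1 when neighbor >= center)."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    code = np.zeros((h - 2, w - 2), dtype=np.int32)
    # offsets of the 8 neighbors, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code = code + (neighbor >= center) * (1 << bit)
    return code.astype(np.uint8)

def encode_template(bgr_patch):
    """RGB (OpenCV BGR) -> HSV, then LBP-encode the brightness (V) channel,
    taken here as the 'luminance' mentioned in the text (our assumption)."""
    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    return lbp_8neighbor(hsv[:, :, 2])
```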

3.1.2. The Strategy for Adding and Updating Templates

There are three key points in the strategy for adding and updating templates. The first key point is that the first template image is the target image of the first frame. The second key point is that, before reading the next image frame, we calculate the similarity between every template image and the current target image. The similarity is calculated as in Equations (1)–(3), where $I$ is the current object image, $\bar{I}_{xy}$ is the pixel average of the current object image, $T$ is the current template image, $\bar{T}_{xy}$ is the pixel average of the current template image, and $R(x,y)$ is the similarity. The closer the value of $R(x,y)$ is to 1, the more similar the current object image and the current template image are.
$\bar{I}_{xy} = \dfrac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} I(x+i,\, y+j)$  (1)

$\bar{T}_{xy} = \dfrac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} T(i,\, j)$  (2)

$R(x,y) = \dfrac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(x+i,\,y+j)-\bar{I}_{xy}\big]\big[T(i,\,j)-\bar{T}_{xy}\big]}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(x+i,\,y+j)-\bar{I}_{xy}\big]^{2}\;\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[T(i,\,j)-\bar{T}_{xy}\big]^{2}}}$  (3)
If the similarity between the currently selected template image and the target image is less than 80%, we add the current target image as a new template image, until there are four sets of template images. Once there are four sets of template images, the template image most similar to the current target image is replaced with the current target image. The third key point is that, in each frame, we set the template image most similar to the current target image as the template for tracking the target in the next frame. Owing to this strategy for adding and updating templates, the features of the template used during tracking maintain a high similarity with the current object image. Furthermore, the four sets of template images memorize the object's appearance at different times. When the object image is deformed, the ADMTCF selects the most similar of the four templates as the tracking feature, achieving resistance to deformation.
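The following sketch illustrates Equations (1)–(3) (a zero-normalized cross-correlation between an m × n template and an equally sized image patch) together with the 80% add/replace rule described above. It assumes the patch and template have already been cropped or resized to the same size; the helper names and the exact pool-handling details are our own assumptions, not the authors' implementation.

```python
import numpy as np

def similarity(patch, template):
    """Equations (1)-(3): zero-normalized cross-correlation between an m x n
    template and an equally sized image patch; a value near 1 means similar."""
    I = patch.astype(np.float64) - patch.mean()
    T = template.astype(np.float64) - template.mean()
    denom = np.sqrt((I ** 2).sum() * (T ** 2).sum())
    return float((I * T).sum() / denom) if denom > 0 else 0.0

def most_similar(templates, target_patch):
    """Third key point: the template most similar to the current target is the
    one used to track the object in the next frame."""
    scores = [similarity(target_patch, t) for t in templates]
    best = int(np.argmax(scores))
    return best, scores[best]

def update_pool(templates, target_patch, max_templates=4, add_threshold=0.8):
    """If the best match falls below 80%, store the current target image:
    append it while fewer than four templates exist, otherwise overwrite the
    template that is most similar to the current target."""
    best, score = most_similar(templates, target_patch)
    if score < add_threshold:
        if len(templates) < max_templates:
            templates.append(target_patch.copy())
        else:
            templates[best] = target_patch.copy()
    return templates
```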

3.1.3. The Dynamic Change of the Adaptive Template’s Size

The dynamic change of the adaptive template's size is how we further improve tracking accuracy, since knowing the scale of the target improves the tracking accuracy. There are three key points. The first is that the size change of the adaptive template must be based on tracking with the template most similar to the target. The second is to decide whether the template should expand or shrink according to whether the similarity between the template and the target improves. The third is that we limit the adaptive size change to an expansion or shrinkage of less than 20 pixels, to avoid the drop in tracking accuracy caused by the tracker over-covering the background.

3.1.4. The Mechanism of Multi-Template with Adaptive Size Change for the Process of Tracking

The mechanism of the multi-template with adaptive size change is the combined use of the two mechanisms described above (the template add/update strategy and the adaptive size change). The ADMTCF selects the template image most similar to the current target as the tracking feature in each image frame. After selecting the most similar template image, the size of the template is adaptively and dynamically changed to conform to the actual scale of the object. After dynamically adjusting the template's size, the ADMTCF calculates the response. We follow João F. Henriques et al. to calculate the response value of the correlation filter [36]. The response is calculated as in Equations (4)–(7) [36], where $z_1$ and $z_2$ are the input images, $w$ is the parameter matrix of the regression model, and $f(z)$ is the response value. The position with the highest response value in the image is the predicted position of the object. The ADMTCF selects the template most similar to the object and then performs adaptive size changes to improve the tracking accuracy.
$z_1 = \varphi(x_1), \quad z_2 = \varphi(x_2)$  (4)

$w = \sum_{i} a_i\, \varphi(x_i), \quad a = [a_1, a_2, a_3, \ldots, a_n]^{T}$  (5)

$\varphi^{T}(x)\,\varphi(x') = k(x, x')$  (6)

$f(z) = w^{T} z = \sum_{i=1}^{n} a_i\, k(z, x_i)$  (7)
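A compact sketch of this dual-form correlation-filter response, in the spirit of Henriques et al. [36], is given below, assuming a Gaussian kernel evaluated in the Fourier domain; the values of sigma and the regularization term, and the function names, are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    """Kernel correlation k(z, x) for all cyclic shifts, evaluated via the FFT
    (Equation (6) only fixes the kernel form; a Gaussian kernel is assumed here)."""
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(np.conj(xf) * zf))      # circular cross-correlation
    d2 = np.maximum(0.0, (x ** 2).sum() + (z ** 2).sum() - 2.0 * cross)
    return np.exp(-d2 / (sigma ** 2 * x.size))

def train_alpha(x, y, sigma=0.5, lam=1e-4):
    """Dual ridge-regression coefficients of Equation (5), in the Fourier domain."""
    kxx = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)     # alpha-hat

def response_map(alpha_hat, x, z, sigma=0.5):
    """Response f(z) of Equation (7); its arg-max is the predicted object position."""
    kxz = gaussian_correlation(x, z, sigma)
    return np.real(np.fft.ifft2(alpha_hat * np.fft.fft2(kxz)))
```

Here x would be the feature-encoded template patch, y a Gaussian-shaped label image, and z the search patch cropped from the new frame; np.unravel_index(np.argmax(response), response.shape) then gives the predicted position.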
We now go through all the steps of the ADMTCF, described below. We take different strategies depending on the total number of templates. We calculate the similarity according to Equations (1)–(3). If the total number of templates is 0, then we make three settings. Firstly, we set the No. 1 template as the target in the first frame. Secondly, we set the current template as the No. 1 template. Thirdly, the ADMTCF uses the No. 1 template to calculate the response according to Equations (4)–(7) and obtains the next frame of the image. If the total number of templates is equal to 1, we calculate the similarity by using the No. 1 template and the current adaptive adjustment template, respectively. If the current adaptive adjustment template changes size to obtain greater similarity than the No. 1 template's similarity and the similarity difference is greater than the threshold, we make three operations. Firstly, we add a new No. 1 template. Secondly, we set the image of the original No. 1 template to the No. 2 template. Thirdly, we set the current template as the No. 1 template to calculate the response according to Equations (4)–(7). If not, the algorithm will use the No. 1 template to calculate the response according to Equations (4)–(7) and fetch the next frame of the image. If the total number of templates is equal to 2, we make three operations. Firstly, we use No. 1~2 templates to calculate the similarity in sequence and reorder the templates. Secondly, we set the template with the largest similarity as the No. 1 template. Thirdly, we set the current adaptive adjustment template as the No. 1 template.
If the current adaptive adjustment template changes size to obtain a greater similarity than the No. 1 template’s similarity and the similarity difference is greater than the threshold, we make three operations. Firstly, we add a new No. 1 template. Secondly, we set the image of the original No. 1 template to the No. 2 template, set the image of the original No. 2 template to the No. 3 template. Thirdly, we set the current template as No. 1 template to calculate the response according to Equations (4)–(7). If not, the algorithm will use the No. 1 template to calculate the response according to Equations (4)–(7) and fetch the next frame of the image. If the total number of templates is equal to 3, we make three operations. Firstly, we use No. 1~3 templates to calculate the similarity in sequence and reorder the templates. Secondly, we set the template with the largest similarity as the No. 1 template. Thirdly, we set the current adaptive adjustment template as the No. 1 template. If the current adaptive adjustment template changes size to obtain greater similarity than the No. 1 template’s similarity and the similarity difference is greater than the threshold, we make four operations. Firstly, we add a new No. 1 template. Secondly, we set the image of the original No. 1 template to the No. 2 template and set the image of the original No. 2 template to the No. 3 template. Thirdly, we set the image of the original No. 3 template to the No. 4 template. Fourthly, we set the current template as the No. 1 template to calculate the response according to Equations (4)–(7). If not, the algorithm will use the No. 1 template to calculate the response according to Equations (4)–(7) and fetch the next frame of the image. If the total number of templates is equal to 4, we make three operations. Firstly, we use No. 1~4 templates to calculate the similarity in sequence. Secondly, we reorder the templates and set the template with the largest similarity as the No. 1 template. Thirdly, we set the current adaptive adjustment template as the No. 1 template. If the current adaptive adjustment template changes size to obtain greater similarity than the No. 1 template and the similarity difference is greater than the threshold, we set the No. 1 template as the current adaptive adjustment template to calculate the response according to Equations (4)–(7). If not, the algorithm will use the original No. 1 template to calculate the response according to Equations (4)–(7) and fetch the next frame of the image. If there is no next frame of the image, the tracking ends.
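The count-by-count description above can be condensed into a single per-frame routine. The sketch below is a hedged, pseudocode-level summary: `crop`, `adapt_size`, and `cf_track` are hypothetical helpers supplied by the caller, `most_similar` is the function from the similarity sketch above, and `threshold` is the similarity-difference test described in the text.

```python
def track_frame(frame, pool, prev_box, threshold, crop, adapt_size, cf_track):
    """One ADMTCF iteration over a new frame. crop(frame, box) cuts out a patch,
    adapt_size(frame, box, template) returns an adaptively resized template patch
    plus its similarity (Section 3.1.3), and cf_track(frame, template, box)
    evaluates the correlation-filter response of Equations (4)-(7)."""
    target = crop(frame, prev_box)                    # current target image
    best, base_score = most_similar(pool, target)     # from the similarity sketch above
    pool.insert(0, pool.pop(best))                    # reorder: best match becomes template No. 1

    resized, resized_score = adapt_size(frame, prev_box, pool[0])
    if resized_score - base_score > threshold:        # resized patch is clearly better
        if len(pool) < 4:
            pool.insert(0, resized)                   # new No. 1; older templates shift down
        else:
            pool[0] = resized                         # with four templates, No. 1 is replaced
    new_box = cf_track(frame, pool[0], prev_box)      # track with the chosen template
    return new_box, pool
```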
We use the current adaptive adjustment template, adaptively resized, to track a target that changes shape. Before resizing the template, we record the current target location, target size, and image features as a reference for resizing the current adaptive adjustment template. When the template is resized, we check the top, bottom, left, and right edges of the expanded or reduced template in order.
We extend the top side of the current template upwards and re-compare the similarity with the new frame of the target image. If the similarity between the current template and the temporary target increases, we continue to expand the top side of the current template until the similarity is no longer improved, as in Figure 4a. If the similarity between the current template and the temporary target decreases, we reduce the top side of the current template downward until the similarity no longer increases, as in Figure 4b.
We extend the bottom side of the current template down and re-compare the similarity with the new frame of the target image. If the similarity between the current template and the temporary target increases, we continue to expand the bottom side of the current template until the similarity is no longer improved, as in Figure 5a. If the similarity between the current template and the temporary target decreases, we reduce the scope of the bottom side of the current template upward until the similarity no longer increases, as in Figure 5b.
We extend the left side of the current template to the left and re-compare the similarity with the new frame of the target image. If the similarity between the current template and the temporary target increases, we continue to expand the left side of the current template leftward until the similarity no longer increases, as in Figure 6a. If the similarity between the current template and the temporary target decreases, we reduce the range of the left side of the current template to the right until the similarity no longer increases, as in Figure 6b.
We extend the right side of the current template to the right and re-compare the similarity with the new frame of the target image. If the similarity between the current template and the temporary target increases, we continue to expand the range of the right side of the current template to the right until the similarity is no longer improved, as in Figure 6c. If the similarity between the current template and the temporary target decreases, we reduce the scope of the right side of the current template to the left until the similarity no longer increases, as in Figure 6d.
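One way to realize this edge-by-edge adjustment (Figures 4–6) is a greedy per-border loop, sketched below. The one-pixel step, the way the score function is supplied by the caller, and the 20-pixel cap carried over from Section 3.1.3 are our reading of the procedure, not the authors' exact code.

```python
def adapt_box(box, score, step=1, max_change=20):
    """Greedily grow or shrink each border of box = (x1, y1, x2, y2) while
    score(box) -- the similarity between the cropped patch and the current
    template, Equations (1)-(3) -- keeps improving; no border is moved by more
    than max_change pixels (the 20-pixel cap of Section 3.1.3)."""
    cur = list(box)
    # (index into the box, direction that expands the box along that border)
    edges = [(1, -1),   # top: expanding moves y1 upward
             (3, +1),   # bottom: expanding moves y2 downward
             (0, -1),   # left: expanding moves x1 leftward
             (2, +1)]   # right: expanding moves x2 rightward
    for idx, expand_dir in edges:
        for sign in (+1, -1):             # +1 = try expanding, -1 = try shrinking
            moved = 0
            while moved < max_change:
                cand = cur.copy()
                cand[idx] += sign * expand_dir * step
                if score(tuple(cand)) <= score(tuple(cur)):
                    break                 # similarity no longer improves
                cur = cand
                moved += step
            if moved:                     # expanding helped, so skip shrinking this border
                break
    return tuple(cur)
```

A caller would wrap the crop-and-compare step of Equations (1)–(3) into `score` and run this routine on the box predicted by the correlation filter.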
After updating the current template and calculating the response value between the adaptively and dynamically adjusted template and the target, we obtain the new response value, the fine-tuned position of the target, and the correct target image. An important contribution is that, by adaptively and dynamically adjusting the size of the tracking template, the template maintains a high similarity to the target in every frame.

3.2. Evaluation of the Tracking Performance

In general, Metrics-1, which is commonly used to evaluate a tracker's ability, calculates the ratio of area C to the real object area B. Area A, formed by the four coordinates inferred by the tracker, is marked with the symbol D. Area B, formed by the four coordinates of the object's ground truth, is marked with the symbol G. Area C, marked $G \cap D$, is where area A and area B overlap, as in Figure 7 and Equations (8) and (9).
$\text{overlap area } C = G \cap D$  (8)

$\text{overlap ratio} = \dfrac{G \cap D}{G}$  (9)
We believe that such an overlap ratio loses its validity in two situations. In the first case, when the bounding box inferred by the tracker encloses the real object, the overlap area equals the area of the real object; the general evaluation method then rates the tracker's ability at 100% even though the inferred box is too large, as in Figure 8a. In the other case, when the inferred bounding box is enclosed by the real moving object, the overlap area equals the area of the box identified by the tracker, so the reported score cannot reflect the real size of the object, as in Figure 8b.
$\text{non-overlap area} = \big(G - (G \cap D)\big) + \big(D - (G \cap D)\big)$  (10)

$\text{overlap ratio} = \dfrac{3\,(G \cap D) - (G + D)}{G}$  (11)
Therefore, we adopted a different tracking ability estimation method, Metrics-2. It computes the ratio of the difference between the overlap area and the non-overlap area to the area of the real object, as in Equations (10) and (11). Subtracting the non-overlap area from the overlap area magnifies the contribution of tracking inaccuracy and thus highlights how well the tracker localizes and sizes the target.
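For axis-aligned boxes given as (x1, y1, x2, y2) corners, Equations (8)–(11) can be implemented directly; the sketch below is our reading of the two metrics, not code from the paper.

```python
def box_area(b):
    x1, y1, x2, y2 = b
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def overlap_area(g, d):
    """Equation (8): overlap area C = G ∩ D of ground-truth box g and tracker box d."""
    inter = (max(g[0], d[0]), max(g[1], d[1]), min(g[2], d[2]), min(g[3], d[3]))
    return box_area(inter)

def metrics_1(g, d):
    """Equation (9): overlap ratio = (G ∩ D) / G."""
    return overlap_area(g, d) / box_area(g)

def metrics_2(g, d):
    """Equations (10)-(11): subtract the non-overlap area (G + D - 2(G ∩ D))
    from the overlap area before dividing by G, penalizing boxes that are too
    large or too small for the real object."""
    inter = overlap_area(g, d)
    return (3.0 * inter - (box_area(g) + box_area(d))) / box_area(g)
```

For example, if the tracker's box fully contains the ground truth but has twice its area, Metrics-1 reports 1.0 while Metrics-2 reports (3G - (G + 2G))/G = 0, exposing the oversized box of Figure 8a.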

4. Results

In this section, we used the original KCF, four state-of-the-art tracking algorithms proposed recently, and our adaptive dynamic multi-template object tracker to perform the tracking experiments. In the experiments, the objects suffered from motion blur, deformation, illumination variation, and occlusion. We used four scenes to represent objects encountering illumination variation, deformation, motion blur, and occlusion, respectively. Firstly, we used the first scene to test the various trackers on an object in a light-changing environment, as shown in Figure 9. Secondly, we used the second scene to test the various trackers on a deforming object, as shown in Figure 10. Thirdly, we used the third scene to test the various trackers on a motion-blurred object, as shown in Figure 11. Finally, we used the fourth scene to test the various trackers on an occluded object, as shown in Figure 12.
The algorithms we used as the experimental control group were KCF, MCCTH, MKCFup, LDES, and SiamRPN++ [36,37,38,39,40]. According to João F. Henriques et al., KCF, which uses HOG features for object tracking, achieves higher tracking accuracy than trackers using grayscale or color features [36]. According to Ning Wang et al., MCCTH uses an adaptively updated multi-cue analysis framework for object tracking and has good robustness [37]. According to Ming Tang et al., MKCFup introduces a new type of multi-kernel learning (MKL) that exploits the powerful discriminability of nonlinear kernels and can track high-speed moving objects [38]. According to Yang Li et al., LDES can cope with scale change, rotation, and large displacement of the target [39]. Bo Li et al. used a training dataset in which objects are offset from the image center to deepen the network of SiamRPN++ and proposed a new model architecture that performs layer-wise and depth-wise aggregations to improve tracking accuracy [40]. Tracking experiments were performed for scenarios 1, 2, 3, and 4. Screenshots of the tracking results of the algorithms ADMTCF, KCF, MCCTH, MKCFup, LDES, and SiamRPN++ are shown in Figure 9, Figure 10, Figure 11 and Figure 12, respectively. The experimental data are shown in Table 1 and Table 2.
In scenario 1, the target is a red car driving out of a tunnel. Inside the tunnel, due to the dim light, the target appears dark red, nearly black. When the target exits the tunnel and meets the light outside, it regains its pristine red appearance. This video tests the tracker's ability to follow an object whose color changes, in part or in whole, as the illumination varies. According to Table 1, the experimental results show that KCF, MKCFup, LDES, and our tracker exhibited good tracking ability under illumination variation.
In scenario 2, the target is a fish that moves from left to right. During the movement, the fish gradually turns from sideways to facing the camera. The image features, size, and outline of the fish's side and front differ. This video tests the tracker's ability to follow changes in the appearance and shape of the target. According to Table 1, the experimental results show that MCCTH, LDES, and KCF exhibited good tracking ability under deformation.
In scenario 3, the target is a pedestrian moving from the left side of the screen to the right. During the movement, the outline of the pedestrian appears blurred. This video tests the tracker's ability to track an object with a blurry appearance. According to Table 1, the experimental results show that ADMTCF, KCF, and MCCTH exhibited good tracking ability under motion blur.
In scenario 4, the target is a pedestrian moving from right to left. During the movement, the pedestrian is occluded by a white car. This video tests the tracker's ability to handle occlusion of the target. According to Table 1, the experimental results show that MKCFup, LDES, and MCCTH exhibited good tracking ability under occlusion.
In scenario 1 and scenario 2, because the position and angle of the camera were fixed, the coordinates of the target did not shift due to camera movement or angle swing. Therefore, we marked the bounding boxes of the various trackers and the tracked trajectories in the images of scenario 1 and scenario 2, as shown in Figure 10 and Figure 11. The trajectories are the lines connecting the center points of the bounding boxes across frames.
According to Table 1 and Figure 13, we can observe the tracking performance of the various algorithms evaluated using Metrics-1. By using HOG features, KCF performed best in scenario 1 (illumination variation). Owing to its adaptively updated multi-cue analysis framework, MCCTH performed best in scenario 2 (deformation), with LDES second. Our adaptive dynamic multi-template CF performed best in scenario 3 (motion blur), and MKCFup performed best in scenario 4 (occlusion).
According to Table 2 and Figure 14, we can observe the tracking performance of various algorithms evaluated using Metrics-2. SiamRPN++ performed best in scenario 1. MCCTH performed best in scenario 2. Our adaptive dynamic multi-template CF performed best in scenarios 3 and 4.
We averaged the accuracy over the four scenarios, as reported in Table 3 and Table 4, to indicate the tracking ability of each algorithm when it encounters illumination variation, deformation, motion blur, and occlusion at the same time, and plotted the results in Figure 15a,b. Figure 15a shows that our proposed adaptive dynamic multi-template object tracker achieved third place and maintained good tracking stability and robustness when tracking under illumination variation, deformation, motion blur, and occlusion. Figure 15b shows that our proposed ADMTCF achieved the highest tracking accuracy of all the trackers.
Finally, we evaluated the running time of the algorithms. For objectivity, we measured the running time on two different CPUs, an Intel Celeron N2940 (1.83 GHz) and an Intel i5-8250U (3.4 GHz), and recorded the frame rates in Table 5 and Table 6, respectively. For a fair comparison, we resized all images of the scenarios to 720 × 480. Table 5 and Table 6 show that our algorithm runs faster than SiamRPN++, meaning that the ADMTCF not only runs faster but also reduces the influence of illumination variation, deformation, motion blur, and occlusion when tracking objects. In addition, on the i5-8250U CPU, the best frame rate of our ADMTCF exceeded 30 fps, meeting the requirements of real-time applications without a high-power-consumption GPU.

5. Discussion

Our ADMTCF maintained good tracking accuracy when the target encountered illumination variations because it tracks with features obtained by LBP encoding of the image in the HSV color space, which are less affected by illumination changes. The deformation of the target is a severe test for a tracker: if the tracker only keeps the target features of the first frame, its accuracy easily drops. The tracker must therefore be able to update its templates to follow objects with different shapes at different times. Our ADMTCF keeps four sets of templates, which remember the appearance of the target at different times. In addition, we select the template with the highest similarity to the target in the current frame and then adaptively and dynamically fine-tune its size to maximize the feature similarity between the template and the target image. After that, the tracker infers the location and size of the target based on this template. Owing to the dynamic updating and size fine-tuning of the templates, the ADMTCF maintains good tracking stability and robustness even when the shape of the target changes, the motion is blurred, or occlusion occurs.
Table 1 lists the tracking accuracy of the various trackers in the various scenarios according to evaluation Metrics-1. Metrics-1 expresses the overlap between the real target and the target inferred by the tracker but cannot reflect whether the inferred object size is close to the real one. Therefore, we proposed evaluation Metrics-2 to distinguish the capability of inferring the object's size. To obtain a high Metrics-2 score, the tracker must infer a target size as close to the real object as possible. Since the ADMTCF has four sets of templates that are dynamically adjusted and updated, its inferred object size approaches the ground truth; according to Table 2, the ADMTCF performed quite well. We averaged the data of the various scenarios from Table 1 and Table 2, respectively, to produce Table 3 and Table 4. We believe that the data in Table 3 and Table 4 show the tracking ability and versatility of the various trackers when they encounter tracking problems such as illumination variation, deformation, motion blur, and occlusion. According to Table 3 and Table 4, our ADMTCF is well suited to tracking objects under illumination variation, deformation, motion blur, and occlusion.

6. Conclusions

We firmly believe that a tracker can overcome the four tracking problems of motion blur, deformation, illumination variation, and occlusion at the same time if it has sufficient characteristics of the target object, more than one set of templates, dynamically updated templates, and low sensitivity to illumination variations. Our proposed ADMTCF has several features. Firstly, we convert the image of the pre-framed target object into the HSV color space and then perform LBP conversion on the luminance to obtain the image features of the target object, so our tracker does not lose stability or accuracy because of changes in ambient light. Secondly, our tracker has four sets of dynamically updatable templates. The ADMTCF uses the template with the highest similarity to track the object while retaining the templates with lower similarity to memorize the previous poses of the object, guarding against future deformation of the target. Therefore, the ADMTCF maintains good tracking accuracy when the shape of the target changes or the motion is blurred during tracking. Thirdly, the template of our tracker can be adaptively resized as the surface features of the moving object change, maintaining a high similarity between the template and the target object, so the ADMTCF maintains good tracking accuracy even if the target is occluded. Experiments show that the proposed ADMTCF is a good object tracker when tracking encounters illumination variation, deformation, motion blur, and occlusion.

Author Contributions

Conceptualization, K.-C.H.; methodology, K.-C.H.; validation, K.-C.H.; writing—original draft preparation, K.-C.H. and S.-F.L.; writing—review and editing, K.-C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

No consent was required. The OTB-100 dataset is publicly available and can be used for research purposes.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks to friends who are keen on object tracking for their suggestions on my program.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. CTA Launches Industry-Led 8K Ultra HD Display Definition, Logo Program. Available online: https://www.cta.tech/Resources/i3-Magazine/i3-Issues/2019/November-December/CTA-Launches-Industry-Led-8K-Ultra-HD-Display-Defi (accessed on 30 September 2022).
  2. Agarwal, N.; Chiang, C.-W.; Sharma, A. A Study on Computer Vision Techniques for Self-Driving Cars. In Proceedings of the International Conference on Frontier Computing, Kuala Lumpur, Malaysia, 3–6 July 2018; pp. 629–634. [Google Scholar]
  3. Buyval, A.; Gabdullin, A.; Mustafin, R.; Shimchik, I. Realtime Vehicle and Pedestrian Tracking for Didi Udacity Self-Driving Car Challenge. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 2064–2069. [Google Scholar]
  4. Cho, H.; Seo, Y.-W.; Kumar, B.V.; Rajkumar, R.R. A multi-sensor fusion system for moving object detection and tracking in urban driving environments. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1836–1843. [Google Scholar]
  5. Petrovskaya, A.; Thrun, S.J.A.R. Model based vehicle detection and tracking for autonomous urban driving. Auton Robot 2009, 26, 123–139. [Google Scholar] [CrossRef]
  6. Gajjar, V.; Gurnani, A.; Khandhediya, Y. Human detection and tracking for video surveillance: A cognitive science approach. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2805–2809. [Google Scholar]
  7. Lee, Y.-G.; Tang, Z.; Hwang, J.-N. Online-learning-based human tracking across non-overlapping cameras. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2870–2883. [Google Scholar] [CrossRef]
  8. Xu, R.; Nikouei, S.Y.; Chen, Y.; Polunchenko, A.; Song, S.; Deng, C.; Faughnan, T.R. Real-time human objects tracking for smart surveillance at the edge. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6. [Google Scholar]
  9. Zhouabc, Y.; Zlatanovac, S.; Wanga, Z.; Zhangcd, Y.; Liuc, L. Moving human path tracking based on video surveillance in 3D indoor scenarios. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 97–101. [Google Scholar]
  10. Teutsch, M.; Krüger, W. Detection, segmentation, and tracking of moving objects in UAV videos. In Proceedings of the 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, Beijing, China, 18–21 September 2012; pp. 313–318. [Google Scholar]
  11. Muresan, M.P.; Nedevschi, S.; Danescu, R.J.S. Robust data association using fusion of data-driven and engineered features for real-time pedestrian tracking in thermal images. Sensors 2021, 21, 8005. [Google Scholar] [CrossRef] [PubMed]
  12. Karunasekera, H.; Wang, H.; Zhang, H.J.I.A. Multiple object tracking with attention to appearance, structure, motion and size. IEEE Access 2019, 7, 104423–104434. [Google Scholar] [CrossRef]
  13. Guo, Q.; Feng, W.; Gao, R.; Liu, Y.; Wang, S.J. Exploring the effects of blur and deblurring to visual object tracking. IEEE Trans. Image Process. 2021, 30, 1812–1824. [Google Scholar] [CrossRef]
  14. Mao, Z.; Chen, X.; Wang, Y.; Yan, J. Robust Tracking for Motion Blur Via Context Enhancement. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 659–663. [Google Scholar]
  15. Tan, Z.; Yang, W.; Li, S.; Chen, Y.; Ma, X.; Wu, S. Research on High-speed Object Tracking Based on Circle Migration Estimation Neighborhood. In Proceedings of the 2021 8th International Conference on Computational Science/Intelligence and Applied Informatics (CSII), Zhuhai, China, 13–15 September 2021; pp. 29–33. [Google Scholar]
  16. Zhai, Z.; Sun, S.; Liu, J. Tracking Planar Objects by Segment Pixels. In Proceedings of the 2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, China, 10–12 December 2021; pp. 308–311. [Google Scholar]
  17. Liang, C.; Zhang, Z.; Zhou, X.; Li, B.; Lu, Y.; Hu, W.J. One More Check: Making “Fake Background” Be Tracked Again. Process. AAAI Conf. Artif. Intell. 2022, 36, 1546–1554. [Google Scholar] [CrossRef]
  18. Hyun, J.; Kang, M.; Wee, D.; Yeung, D.-Y.J. Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker. arXiv 2022, arXiv:2205.00968. [Google Scholar]
  19. Liu, W.; Song, Y.; Chen, D.; He, S.; Yu, Y.; Yan, T.; Hancke, G.P.; Lau, R.W. Deformable object tracking with gated fusion. IEEE Trans. Image Process. 2019, 28, 3766–3777. [Google Scholar] [CrossRef] [Green Version]
  20. Huang, D.; Kong, L.; Zhu, J.; Zheng, L.J. Improved action-decision network for visual tracking with meta-learning. IEEE Access 2019, 7, 117206–117218. [Google Scholar] [CrossRef]
  21. Zhang, J.; Sun, J.; Wang, J.; Yue, X.-G. Visual object tracking based on residual network and cascaded correlation filters. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8427–8440. [Google Scholar] [CrossRef]
  22. Lan, S.; Li, J.; Sun, S.; Lai, X.; Wang, W. Robust Visual Object Tracking with Spatiotemporal Regularisation and Discriminative Occlusion Deformation. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1879–1883. [Google Scholar]
  23. Gao, X.; Zhou, Y.; Huo, S.; Li, Z.; Li, K.J. Robust object tracking via deformation samples generator. J. Vis. Commun. Image Represent. 2022, 83, 103446. [Google Scholar] [CrossRef]
  24. Yan, Y.; Ren, J.; Zhao, H.; Sun, G.; Wang, Z.; Zheng, J.; Marshall, S.; Soraghan, J.J. Cognitive fusion of thermal and visible imagery for effective detection and tracking of pedestrians in videos. Cogn. Comput. 2018, 10, 94–104. [Google Scholar] [CrossRef] [Green Version]
  25. Liu, S.; Liu, G.; Zhou, H.J. A robust parallel object tracking method for illumination variations. Mob. Netw. Appl. 2019, 24, 5–17. [Google Scholar] [CrossRef] [Green Version]
  26. Yang, J.; Ge, H.; Yang, J.; Tong, Y.; Su, S.J. Online multi-object tracking using multi-function integration and tracking simulation training. Appl. Intell. 2022, 52, 1268–1288. [Google Scholar] [CrossRef]
  27. Zhou, Y.; Zhang, Y.J. SiamET: A Siamese based visual tracking network with enhanced templates. Appl. Intell. 2022, 52, 9782–9794. [Google Scholar] [CrossRef]
  28. Feng, W.; Han, R.; Guo, Q.; Zhu, J.; Wang, S.J. Dynamic saliency-aware regularization for correlation filter-based object tracking. IEEE Trans. Image Process. 2019, 28, 3232–3245. [Google Scholar] [CrossRef]
  29. Yuan, Y.; Chu, J.; Leng, L.; Miao, J.; Kim, B.-G. A scale-adaptive object-tracking algorithm with occlusion detection. EURASIP J. Image Video Process. 2020, 2020, 1–15. [Google Scholar] [CrossRef] [Green Version]
  30. Yuan, D.; Li, X.; He, Z.; Liu, Q.; Lu, S.J. Visual object tracking with adaptive structural convolutional network. Knowl.-Based Syst. 2020, 194, 105554. [Google Scholar] [CrossRef]
  31. Tai, Y.; Tan, Y.; Xiong, S.; Tian, J.J. Subspace reconstruction based correlation filter for object tracking. Comput. Vis. Image Underst. 2021, 212, 103272. [Google Scholar] [CrossRef]
  32. Cao, J.; Weng, X.; Khirodkar, R.; Pang, J.; Kitani, K.J. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv 2022, arXiv:2203.14360. [Google Scholar]
  33. Bibi, A.; Ghanem, B. Multi-template scale-adaptive kernelized correlation filters. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 7–13 December 2015; pp. 50–57. [Google Scholar]
  34. Ojala, T.; Pietikainen, M.; Harwood, D. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, Israel, 9–13 October 1994; pp. 582–585. [Google Scholar]
  35. Cucchiara, R.; Grana, C.; Neri, G.; Piccardi, M.; Prati, A. The Sakbot System for Moving Object Detection and Tracking. In Video-Based Surveillance Systems; Springer: Boston, MA, USA, 2002; pp. 145–157. [Google Scholar]
  36. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J.J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  37. Wang, N.; Zhou, W.; Tian, Q.; Hong, R.; Wang, M.; Li, H. Multi-cue correlation filters for robust visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4844–4853. [Google Scholar]
  38. Tang, M.; Yu, B.; Zhang, F.; Wang, J. High-speed tracking with multi-kernel correlation filters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4874–4883. [Google Scholar]
  39. Li, Y.; Zhu, J.; Hoi, S.C.; Song, W.; Wang, Z.; Liu, H. Robust estimation of similarity transformation for visual object tracking. AAAI 2019, 33, 8666–8673. [Google Scholar] [CrossRef]
  40. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291. [Google Scholar]
Figure 1. The operation flow of the ADMTCF.
Figure 2. There are two sub-functions in the operation of ADMTCF: (a) one sub-function is calculating the response that is used to obtain the response of the four templates after adaptively resizing forward up, down, left, and right directions; (b) another sub-function is called the adaptive adjustment template which expands or shrinks the size of the template.
Figure 3. There are three operations of pre-processing of ADMTCF: (a) the first one is to select the object in the first frame; (b) the second one is to convert the object’s image from the RGB color domain to the HSV color domain; (c) the third one is to carry out LBP encoding.
Figure 4. Adaptive adjustment of the top side of the template in the ADMTCF algorithm: (a) expanding upward; (b) shrinking downward.
Figure 5. Adaptive adjustment of the bottom side of the template in the ADMTCF algorithm: (a) expanding downward; (b) shrinking upward.
Figure 6. Adaptive adjustment of the left side and the right side of the template in the ADMTCF algorithm: (a) expanding leftward of the left side of the template; (b) shrinking rightward of the left side of the template; (c) expand rightward of the right side of the template; (d) shrink leftward of the right side of the template.
Figure 7. The overlap is used to evaluate the tracking ability.
Figure 8. The two overlap conditions under which the general overlap ratio is misleading: (a) the bounding box inferred by the tracker encloses the real object; (b) the bounding box inferred by the tracker is enclosed by the real object.
Figure 9. The experimental results of the scenario 1 video with illumination variation: (a) the 1st frame of the video; (b) the 43rd frame of the video; (c) the 71st frame of the video; (d) the 126th frame of the video; (e) the cropped and enlarged image of image (a); (f) the cropped and enlarged image of image (b); (g) the cropped and enlarged image of image (c); (h) the cropped and enlarged image of image (d).
Figure 10. The experimental results of the scenario 2 video with deformation: (a) the 15th frame of the video; (b) the 38th frame of the video; (c) the 49th frame of the video; (d) the 58th frame of the video.
Figure 11. The experimental results of the scenario 3 video with motion blur: (a) the 1st frame of the video; (b) the 15th frame of the video; (c) the 31st frame of the video; (d) the 46th frame of the video; (e) the cropped and enlarged image of image (a); (f) the cropped and enlarged image of image (b); (g) the cropped and enlarged image of image (c); (h) the cropped and enlarged image of image (d).
Figure 12. The experimental results of the scenario 4 video with occlusion: (a) the 100th frame of the video; (b) the 113th frame of the video; (c) the 140th frame of the video; (d) the 172nd frame of the video; (e) the cropped and enlarged image of image (a); (f) the cropped and enlarged image of image (b); (g) the cropped and enlarged image of image (c); (h) the cropped and enlarged image of image (d).
Figure 13. Tracking accuracy using Metrics-1: (a) scenario 1; (b) scenario 2; (c) scenario 3; (d) scenario 4.
Figure 14. Tracking accuracy using Metrics-2: (a) scenario 1; (b) scenario 2; (c) scenario 3; (d) scenario 4.
Figure 15. Tracking accuracy using Metrics-1 and Metrics-2: (a) tracking accuracy using Metrics-1; (b) tracking accuracy using Metrics-2.
Table 1. Tracking accuracy using Metrics-1.
Scenario      Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1    0.943     0.983 *   0.935     0.945     0.943     0.924
Scenario 2    0.914     0.935     0.944 *   0.850     0.943     0.863
Scenario 3    0.965 *   0.933     0.879     0.866     0.873     0.852
Scenario 4    0.843     0.836     0.902     0.937 *   0.935     0.890
* First place.
Table 2. Tracking accuracy using Metrics-2.
Scenario      Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1    0.593     0.456     0.614     0.672     0.649     0.690 *
Scenario 2    0.730     0.793     0.815 *   0.675     0.754     0.637
Scenario 3    0.581 *   0.486     0.424     0.443     0.449     0.506
Scenario 4    0.242 *   0.117     0.123     0.081     0.122     0.237
* First place.
Table 3. Tracking accuracy using Metrics-1.
Scenarios                Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1 + 2 + 3 + 4   0.916     0.922     0.915     0.900     0.924 *   0.882
* First place.
Table 4. Tracking accuracy using Metrics-2.
Scenarios                Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1 + 2 + 3 + 4   0.537 *   0.463     0.494     0.468     0.494     0.518
* First place.
Table 5. The running time of the algorithms by using Intel Celeron N2940, 1.83 GHz.
fps           Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1    7.37      9.31 *    2.56      5.14      1.41      0.07
Scenario 2    3.28 *    1.62      2.13      1.46      0.22      0.06
Scenario 3    3.24      3.54 *    1.9       3.04      0.67      0.07
Scenario 4    4.28 *    2.11      2.66      2.19      0.36      0.04
* First place.
Table 6. The running time of the algorithms by using Intel I5 8250U, 3.4 GHz.
fps           Ours      KCF       MCCTH     MKCFup    LDES      SiamRPN++
Scenario 1    33.77 *   12.67     6.58      11.54     2.29      0.52
Scenario 2    13.9 *    4.4       5.93      4.25      0.93      0.11
Scenario 3    14.12 *   7.01      4.75      6.59      1.35      0.27
Scenario 4    18.57 *   5.99      6.61      5.66      3.35      0.41
* First place.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
