Article

Efficient Online Object Tracking Scheme for Challenging Scenarios

Khizer Mehmood, Ahmad Ali, Abdul Jalil, Baber Khan, Khalid Mehmood Cheema, Maria Murad and Ahmad H. Milyani

1 Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan
2 Department of Software Engineering, Bahria University, Islamabad 44000, Pakistan
3 School of Electrical Engineering, Southeast University, Nanjing 210096, China
4 School of Electrical and Computer Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Sensors 2021, 21(24), 8481; https://doi.org/10.3390/s21248481
Submission received: 4 October 2021 / Revised: 3 December 2021 / Accepted: 14 December 2021 / Published: 20 December 2021
(This article belongs to the Special Issue Sensors for Object Detection, Classification and Tracking)

Abstract: Visual object tracking (VOT) is a vital part of various computer vision applications such as surveillance, unmanned aerial vehicles (UAV), and medical diagnostics. In recent years, substantial improvement has been made in solving various challenges of VOT such as change of scale, occlusions, motion blur, and illumination variations. This paper proposes a tracking algorithm in the spatiotemporal context (STC) framework. To overcome the limitations of STC with respect to scale variation, a max-pooling-based scale scheme is incorporated by maximizing over the posterior probability. To prevent the target model from drifting, an efficient mechanism is proposed for occlusion handling. Occlusion is detected by an average peak-to-correlation energy (APCE)-based mechanism applied to the response map of consecutive frames. On successful occlusion detection, a fractional-gain Kalman filter is incorporated to handle the occlusion. An additional extension to the model includes APCE criteria to adapt the target model under motion blur and other factors. Extensive evaluation indicates that the proposed algorithm achieves significant results against various tracking methods.

1. Introduction

Visual object tracking (VOT) is an essential task in a variety of computer vision applications such as video surveillance [1,2,3], automobiles [4], human–computer interaction [5], cinematography [6], sensor networks [7], motion analysis [8], robotics [9,10,11], anti-aircraft systems [12], autonomous vehicles [13], and traffic monitoring [14]. As presented in Figure 1, VOT remains challenging due to motion blur, occlusion, and fast motion, among other factors [15,16,17,18,19].
Tracking methods can be categorized as generative and discriminative. Generative tracking methods have a high computational cost and, because they largely ignore background information, might fail in background clutter situations [20,21,22]. Discriminative tracking methods perform better in cluttered backgrounds since they treat tracking as a binary classification problem between the target and its surroundings. However, they are slow, making them unsuitable for real-time applications [23,24,25].

1.1. Related Work

The STC tracker [27] has been widely used in recent years due to its computational efficiency. STC integrates spatial context information around the target of interest and considers prior information from previous frames, computing the maximum of the confidence map by using the Fourier transform. Die et al. [28] combined a correlation filter (CF) and STC. They extracted histogram of oriented gradients (HOG), color naming (CN), and gray-level features for learning the correlation filters, and then fused the responses of the CF and STC. Yang et al. [29] proposed an improved tracking method by incorporating a peak to sidelobe ratio (PSR)-based occlusion detection mechanism and a model update scheme in the STC framework. Zhang et al. [30] proposed a tracking method by incorporating HOG and CN features and an adaptive learning rate mechanism based on the average difference between frames in the spatiotemporal context framework. Zhang et al. [31] suggested a tracking method by incorporating a selection update mechanism in the spatiotemporal context framework. Song et al. [32] proposed an improved STC-based tracking method by combining a scale filter and a loss function criterion for better performance in UAV applications.
During the past decade, significant progress has been made in accurate scale estimation for VOT [33,34,35,36,37,38]. Danelljan et al. [39] proposed a tracking-by-detection framework by learning separate filters for translation and scale estimation based on a pyramid representation. Li et al. [40] incorporated an adaptive scale scheme in the kernelized correlation filter (KCF) tracker using HOG and CN features. Bibi et al. [41] modified the KCF tracker by maximizing the posterior distribution over a grid of scales and updating the filter by fixed-point optimization. Lu et al. [42] combined KCF and the Fourier–Mellin transform to deal with rotation and scale variation of the target. Yin et al. [43] modified the scale adaptive with multiple features (SAMF) tracker by using the APCE-based rate of change between consecutive frames to control the scale size. Ma et al. [44] incorporated APCE in discriminative correlation filters to address the fixed template size.
A Kalman filter is used in various tracking algorithms for occlusion handling [45,46,47,48,49]. Kaur et al. [50] suggested a real-time tracking approach using a fractional-gain Kalman filter for nonlinear systems. Soleh et al. [51] proposed the Hungarian Kalman filter (HKF) for multiple target tracking. Farahi et al. [52] proposed a probabilistic Kalman filter (PKF) by incorporating an extra stage that estimates the target position by applying the Viterbi algorithm to a probabilistic graph. Gunjal et al. [53] proposed a Kalman filter-based tracking algorithm for moving targets in surveillance applications. Ali et al. [54] addressed issues in VOT such as fast maneuvering of the target, occlusions, and deformation by combining a Kalman filter, CF, and adaptive mean shift in a heuristic framework. Kaur et al. [55] proposed a modified fractional-gain Kalman filter for vehicle tracking by incorporating a fractional feedback loop and cost function minimization. Zhou et al. [56] addressed issues in VOT such as occlusions, motion blur, and background clutter by incorporating a Kalman filter in a compressive tracking framework.
Summarizing the current methods, it can be seen that significant work has been done to develop robust tracking algorithms by incorporating scale update schemes, model update mechanisms, and occlusion detection and handling techniques in different tracking frameworks. The STC algorithm proposed in [27] uses the FFT for detection and context information for the model update. However, it cannot effectively deal with occlusions, scale variations, and motion blur.

1.2. Our Contributions

To address the limitations of the STC, this paper proposes a robust tracking algorithm suitable for various image processing applications, such as surveillance and autonomous vehicles. The contributions can be listed concisely as follows.
  • We introduce novel criteria for detecting occlusion by utilizing APCE, model update rules, and the previous history of the modified response map, preventing the tracking model from incorrect updates.
  • We introduce an effective occlusion handling mechanism by incorporating a modified feedback-based fractional-gain Kalman filter in the spatiotemporal context framework to track an object’s motion.
  • We incorporate a max-pooling-based scale scheme by maximizing over posterior probability in the STC framework’s detection stage. We applied a combination of STC and max-pooling to attain higher accuracy.
  • We introduce an APCE-based adaptive learning rate mechanism that utilizes information from the current frame and the previous history to reduce error accumulation and prevent updates from a corrupted appearance of the target.
  • Extensive performance analysis of the proposed tracker is carried out on standard benchmark videos in comparison with STC [27], KCF_MTSA [41], MACF [57], MOSSECA [58], and Modified KCF [59].

1.3. Organization

The rest of this paper is organized as follows: brief explanations of STC and fractional calculus are provided in Section 2. In Section 3, the tracking modules of the proposed tracker are explained. Section 4 presents the performance analysis. A discussion is given in Section 5, while Section 6 concludes the paper.

2. Review of STC and Fractional Calculus

2.1. STC Tracking

The STC tracking algorithm formulates the relation between the target of interest and its context in the Bayesian framework. The feature set is defined as $X_c = \{ l(r) = (I(r), r) \mid r \in \Omega_c(x^*) \}$, where $I(r)$ is the image intensity at location $r$ and $\Omega_c(x^*)$ is the local context region around the target location $x^*$. The spatial relation between the target and its context is presented in Figure 2.
The confidence map is given as follows:
$$l(x) = P(x \mid k) = \sum_{l(r) \in X_c} P(x, l(r) \mid k) = \sum_{l(r) \in X_c} P(x \mid l(r), k)\, P(l(r) \mid k) \tag{1}$$
where $P(l(r) \mid k)$ is the context prior model and $P(x \mid l(r), k)$ is the spatial context model. The confidence map function $l(x)$ is defined in (2):
$$l(x) = P(x \mid k) = v\, e^{-\left| \frac{x - x^*}{\alpha} \right|^{\beta}} \tag{2}$$
where $v$ is a normalization constant, $\beta$ is a shape parameter, and $\alpha$ is a scale parameter. The context prior model combines the intensity of the image with a weighted Gaussian function, as given in (3) and (4):
$$P(l(r) \mid k) = I(r)\, \omega_\gamma(r - x^*) \tag{3}$$
$$\omega_\gamma(x - x^*) = \theta\, e^{-\frac{|x - x^*|^2}{\sigma^2}} \tag{4}$$
Equation (5) describes the spatial context model:
$$P(x \mid l(r), k) = h^{sc}(x - r) \tag{5}$$
Substituting (3) and (5) into (1), the confidence map can be expressed as a convolution, as given in (6) and (7):
$$l(x) = \sum_{r \in \Omega_c(x^*)} h^{sc}(x - r)\, I(r)\, \omega_\gamma(r - x^*) \tag{6}$$
$$l(x) = h^{sc}(x) \otimes \big( I(x)\, \omega_\gamma(x - x^*) \big) \tag{7}$$
where $\otimes$ denotes the convolution operator.
Applying the fast Fourier transform (FFT), the convolution in (7) becomes an element-wise product, as given in (8):
$$\mathcal{F}\big(l(x)\big) = \mathcal{F}\big(h^{sc}(x)\big) \odot \mathcal{F}\big( I(x)\, \omega_\gamma(x - x^*) \big) \tag{8}$$
where $\mathcal{F}$ denotes the FFT and $\odot$ denotes element-wise multiplication.
The solution of (8) follows:
$$h^{sc}(x) = \mathcal{F}^{-1}\!\left( \frac{ \mathcal{F}\!\left( v\, e^{-\left| \frac{x - x^*}{\alpha} \right|^{\beta}} \right) }{ \mathcal{F}\!\left( I(x)\, \omega_\gamma(x - x^*) \right) } \right) \tag{9}$$
As presented in (10), the target location $x^*_{t+1}$ in frame $t+1$ is obtained by maximizing the confidence map:
$$x^*_{t+1} = \arg\max_{x \in \Omega_c(x^*_t)} l_{t+1}(x) \tag{10}$$
The confidence map of frame $t+1$ is computed as in (11):
$$l_{t+1}(x) = \mathcal{F}^{-1}\Big( \mathcal{F}\big(H^{stc}_{t+1}(x)\big) \odot \mathcal{F}\big( I_{t+1}(x)\, \omega_\gamma(x - x^*_t) \big) \Big) \tag{11}$$
The spatiotemporal context model is updated with learning rate $\rho$, as given in (12):
$$H^{stc}_{t+1} = (1 - \rho)\, H^{stc}_t + \rho\, h^{sc}_t \tag{12}$$
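The STC learning and detection equations above can be summarized in a few lines of code. The following NumPy sketch illustrates Equations (2)–(12) on a toy example; the 64 × 64 random frames, the parameter values, and the helper-function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def confidence_map(shape, center, alpha=2.25, beta=1.0):
    """Target likelihood l(x) = v * exp(-|(x - x*)/alpha|^beta), Eq. (2)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
    conf = np.exp(-(dist / alpha) ** beta)
    return conf / conf.sum()                      # normalisation plays the role of v

def context_prior(patch, center, sigma):
    """Context prior P(l(r)|k) = I(r) * w_gamma(r - x*), Eqs. (3)-(4)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    weight = np.exp(-dist2 / sigma ** 2)
    prior = patch * weight
    return prior / prior.sum()

def learn_spatial_context(patch, center, sigma):
    """Spatial context model h_sc of Eq. (9): deconvolution in the Fourier domain."""
    conf = confidence_map(patch.shape, center)
    prior = context_prior(patch, center, sigma)
    H = fft2(conf) / (fft2(prior) + 1e-10)        # small constant avoids division by zero
    return np.real(ifft2(H))

def detect(patch, center_prev, H_stc, sigma):
    """Confidence map of the new frame, Eq. (11), and its maximiser, Eq. (10)."""
    prior = context_prior(patch, center_prev, sigma)
    conf = np.real(ifft2(fft2(H_stc) * fft2(prior)))
    return np.unravel_index(np.argmax(conf), conf.shape), conf

# --- toy run on random frames (64x64 context window, target at its centre) ---
rng = np.random.default_rng(0)
frame0, frame1 = rng.random((64, 64)), rng.random((64, 64))
center, sigma, rho = (32, 32), 16.0, 0.075
h_sc = learn_spatial_context(frame0, center, sigma)
H_stc = h_sc.copy()                               # first-frame initialisation
H_stc = (1 - rho) * H_stc + rho * h_sc            # temporal update, Eq. (12)
new_center, conf = detect(frame1, center, H_stc, sigma)
print("estimated target centre:", new_center)
```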

2.2. Fractional Calculus

In this work, the Grünwald–Letnikov definition [60] is used for calculating fractional difference defined in (13):
$$\Delta^{\gamma} x_k = \frac{1}{h^{n}} \sum_{q=0}^{k} (-1)^q \binom{n}{q} x_{k-q} \tag{13}$$
where $n$ is the fractional order, $h$ is the sampling interval, $k$ is the number of samples of the signal $x$, and the binomial coefficient $\binom{n}{q}$ is obtained using (14):
$$\binom{n}{q} = \begin{cases} 1 & \text{for } q = 0 \\ \dfrac{n(n-1)\cdots(n-q+1)}{q!} & \text{for } q > 0 \end{cases} \tag{14}$$
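The following Python sketch illustrates Equations (13) and (14); the short test signal and the order value of 0.5 are arbitrary illustrative choices.

```python
import math
import numpy as np

def frac_binom(n, q):
    """Generalised binomial coefficient (n over q) of Eq. (14) for a fractional order n."""
    if q == 0:
        return 1.0
    num = 1.0
    for i in range(q):
        num *= (n - i)                      # n (n - 1) ... (n - q + 1)
    return num / math.factorial(q)

def gl_fractional_difference(x, order, h=1.0):
    """Grunwald-Letnikov fractional difference of Eq. (13), evaluated at the last sample."""
    k = len(x) - 1
    total = 0.0
    for q in range(k + 1):
        total += (-1) ** q * frac_binom(order, q) * x[k - q]
    return total / h ** order

# Example: fractional difference of order 0.5 on a short signal
signal = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
print(gl_fractional_difference(signal, order=0.5))
```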

3. Proposed Solution

In this section, tracking modules are elaborated. First, the max-pooling-based scale mechanism is presented. Second, the APCE-based occlusion detection mechanism is discussed. Third, the fractional-gain Kalman filter-based mechanism for occlusion handling is examined. Fourth, an APCE-based modified learning rate mechanism is explained. The flowchart of the proposed tracker is displayed in Figure 3.
As presented in Figure 3, for each sequence, the ground truth of the target is manually initialized in the first frame. Afterward, the confidence map of the target is calculated. The scale of the target is then estimated by maximizing the posterior probability. Next, the APCE of the response map is calculated along with the difference of APCE between consecutive frames. If occlusion is detected according to the occlusion criteria, the fractional-gain Kalman filter is activated to predict the location of the target. Afterward, the learning rate of the tracking model is updated by utilizing the current target position and the previous history of APCE values.

3.1. Scale Integration Scheme

One limitation of STC is its inability to handle rapid scale changes. During the detection phase of STC, we apply max-pooling over multiple scales by maximizing the posterior probability, as given in (15):
$$\max_i P(r_i \mid y) \propto \max_i P(y \mid r_i)\, P(r_i) \tag{15}$$
where $r_i$ represents the $i$th scale and $P(y \mid r_i)$ is the maximum detection-likelihood response at the $i$th scale. The prior term $P(r_i)$ is a Gaussian distribution whose standard deviation is set through experimentation; it enforces a smooth scale transition between frames, since the target scale does not vary much from one frame to the next.
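The scale selection of Equation (15) can be sketched as follows; the candidate scale set, the Gaussian prior standard deviation, and the likelihood values are illustrative assumptions.

```python
import numpy as np

def select_scale(likelihoods, scales, prev_scale, prior_sigma=0.05):
    """Pick the scale maximising P(y|r_i) * P(r_i), Eq. (15).

    likelihoods: maximum confidence-map response obtained at each candidate scale.
    scales:      candidate scale factors r_i (relative to the previous frame).
    prev_scale:  scale of the previous frame, used as the mean of the Gaussian prior.
    """
    prior = np.exp(-0.5 * ((np.asarray(scales) - prev_scale) / prior_sigma) ** 2)
    posterior = np.asarray(likelihoods) * prior
    return scales[int(np.argmax(posterior))]

# Example: three candidate scales around the previous scale of 1.0
print(select_scale(likelihoods=[0.41, 0.47, 0.40], scales=[0.95, 1.0, 1.05], prev_scale=1.0))
```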

3.2. Occlusion Detection Mechanism

The performance of any tracking algorithm is affected by various factors, of which the most common is occlusion. It is essential to create a mechanism for the detection of occlusion. In the present work, an occlusion feedback mechanism is presented, which detects occlusion and updates the target model by evaluating the tracking status of each frame.
Average peak-to-correlation energy (APCE) [61] measures the reliability of the tracker response. The value of APCE changes according to the occlusion state of the target; small values of APCE indicate tracking failure or target occlusion. It is given in (16):
$$\mathrm{APCE}_t = \frac{\left| g_{max} - g_{min} \right|^2}{\mathrm{mean}\!\left( \sum_{w,h} \left( g_{w,h} - g_{min} \right)^2 \right)} \tag{16}$$
where $g_{max}$ and $g_{min}$ are the maximum and minimum response values, respectively, and $g_{w,h}$ is the response value at position $(w, h)$ of the response map. The occlusion detection criteria are built as given in (17) and (18):
$$\delta = \mathrm{APCE}_t - \mathrm{APCE}_{t-1} \tag{17}$$
$$\mathrm{APCE}_t < \epsilon_{th} \tag{18}$$
where $\mathrm{APCE}_t$ and $\mathrm{APCE}_{t-1}$ are the APCE values at frames $t$ and $t-1$, respectively, $\delta$ is the difference of the APCE between two consecutive frames, and $\epsilon_{th}$ is a threshold value acquired by performing multiple experiments. The rules for occlusion detection and model update are as follows:
  • When $\delta \le 0$ and $\mathrm{APCE}_t \ge \epsilon_{th}$, the target is starting to become occluded but tracking is still reliable; both tracking and the model update are based on STC.
  • When $\delta \le 0$ and $\mathrm{APCE}_t < \epsilon_{th}$, the target is in the occlusion state; tracking is based on the fractional-gain Kalman filter, and the tracking model is also updated based on the Kalman filter prediction.
  • When $\delta > 0$ and $\mathrm{APCE}_t < \epsilon_{th}$, the target is coming out of the shelter; both tracking and the model update are based on STC.
  • When $\delta > 0$ and $\mathrm{APCE}_t \ge \epsilon_{th}$, target tracking is good; both tracking and the model update are based on STC.
As seen in Figure 4a, without occlusion, both APCE and $\delta$ are high; therefore, no occlusion is declared. However, when both APCE and $\delta$ give low values, as shown in Figure 4b, occlusion is declared and the occlusion handling mechanism is activated. By using this mechanism, the proposed tracker achieves significant results for the occlusion challenge.
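A minimal sketch of the APCE computation in Equation (16) and the occlusion test of rule 2 is shown below; the synthetic response maps and the threshold value are illustrative assumptions.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map, Eq. (16)."""
    g_max, g_min = response.max(), response.min()
    return (g_max - g_min) ** 2 / np.mean((response - g_min) ** 2)

def occlusion_detected(apce_t, apce_prev, eps_th):
    """Rule 2 of Section 3.2: APCE decreasing (delta <= 0) and below the threshold."""
    delta = apce_t - apce_prev             # Eq. (17)
    return delta <= 0 and apce_t < eps_th  # Eq. (18)

# Example with synthetic response maps: a sharp peak gives a high APCE;
# adding heavy noise (as under occlusion) lowers it.
rng = np.random.default_rng(1)
clean = np.zeros((50, 50)); clean[25, 25] = 1.0
noisy = clean * 0.3 + 0.2 * rng.random((50, 50))
print(apce(clean), apce(noisy))
print(occlusion_detected(apce(noisy), apce(clean), eps_th=50.0))
```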

3.3. Fractional-Gain Kalman Filter

The Kalman filter is widely used in the research area of VOT. A modified discrete time linear system can be characterized by Equations (19) and (20):
$$x_k = A x_{k-1} + B u_k + w_k \tag{19}$$
$$z_k = H x_k + v_k \tag{20}$$
where $x_k$ is the state vector, $z_k$ is the system output, $u_k$ is the system input, $w_k$ is the process noise, and $v_k$ is the measurement noise. $A$, $B$, and $H$ are the transition, control, and measurement matrices, respectively. The innovation is the difference between the actual output $z_k$ and the estimated output $\hat{z}_k$, as defined in (21):
$$\mathrm{innovation} = z_k - \hat{z}_k = z_k - H \hat{x}^{-}_k \tag{21}$$
where $\hat{x}^{-}_k$ is the a priori state estimate. The a posteriori state estimate $\hat{x}_k$ with the modified gain is given in (22) and (23):
$$\hat{x}_k = \hat{x}^{-}_k + K_{new}(\mathrm{innovation}) = \hat{x}^{-}_k + K_{new}\big( z_k - H \hat{x}^{-}_k \big) \tag{22}$$
$$K_{new} = K_k + f_k = K_k + \Delta^{\gamma} K_k \tag{23}$$
where $\Delta^{\gamma} K_k$ is the fractional difference of the previous Kalman gains. The a priori error $\hat{e}_k$ between the actual and estimated state and its covariance $P^{-}_k$ are given in (24) and (25):
$$\hat{e}_k = x_k - \hat{x}^{-}_k \tag{24}$$
$$P^{-}_k = E\big\{ (\hat{e}_k)^2 \big\} \tag{25}$$
The a posteriori error $e_k$ between the actual and estimated state and its covariance $P_k$ are given in (26) and (27):
$$e_k = x_k - \hat{x}_k \tag{26}$$
$$P_k = E\big\{ (e_k)^2 \big\} \tag{27}$$
The Kalman gain $K$ is calculated by minimizing the a posteriori error covariance $P_k$, as given in (28):
$$P_k = E\big\{ (x_k - \hat{x}_k)^2 \big\} = E\Big\{ \big( x_k - \hat{x}^{-}_k - (K + \Delta^{\gamma} K)(z_k - H \hat{x}^{-}_k) \big)^2 \Big\} \tag{28}$$
The value of $K$ is found by setting the derivative of (28) with respect to $K$ to zero, as in (29):
$$\frac{d\, E\Big\{ \big( x_k - \hat{x}^{-}_k - (K + \Delta^{\gamma} K)(z_k - H \hat{x}^{-}_k) \big)^2 \Big\}}{dK} = 0 \tag{29}$$
$K_{new}$ can be written as in (30):
$$K_{new} = K + E\left\{ \sum_{q=0}^{k} (-1)^{q+1} \binom{n}{q} K_{k-q} \right\} \tag{30}$$
The modified Kalman gain $K_{new}$ consists of two terms. The first term is the gain of the classical Kalman filter, and the second is the mean of the fractional difference of the previous gains. The factor $(-1)^{q+1}$ keeps this mean contribution small.
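The following sketch shows a one-dimensional constant-velocity Kalman filter whose gain is corrected by the fractional difference of its past gains, in the spirit of Equations (19)–(30). The noise covariances, the memory length, and the reading of the expectation in Equation (30) as an average over the stored gain history are assumptions, not the authors' exact implementation.

```python
import math
import numpy as np

def frac_binom(n, q):
    """Generalised binomial coefficient used in Eq. (30)."""
    if q == 0:
        return 1.0
    num = 1.0
    for i in range(q):
        num *= (n - i)
    return num / math.factorial(q)

class FractionalGainKalman1D:
    """Constant-velocity Kalman filter whose gain is corrected by the fractional
    difference of its own past values, Eqs. (19)-(30). Scalar measurement (one
    image coordinate); all noise settings below are illustrative."""

    def __init__(self, pos, order=0.5, q_var=1e-2, r_var=1.0, memory=10):
        self.x = np.array([pos, 0.0])                 # state: position and velocity
        self.P = np.eye(2)
        self.A = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition matrix A
        self.H = np.array([[1.0, 0.0]])               # measurement matrix H
        self.Q = q_var * np.eye(2)
        self.R = np.array([[r_var]])
        self.order = order                            # fractional order n
        self.gain_hist = []                           # previous gains K_{k-q}
        self.memory = memory

    def predict(self):
        self.x = self.A @ self.x                      # Eq. (19) with u_k = 0
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[0]

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # classical Kalman gain K_k
        self.gain_hist.append(K)
        hist = self.gain_hist[-self.memory:]
        # fractional correction of Eq. (30): mean of the signed gain-history terms
        terms = [(-1) ** (q + 1) * frac_binom(self.order, q) * hist[-(q + 1)]
                 for q in range(len(hist))]
        K_new = K + np.mean(terms, axis=0)            # Eq. (23): K_new = K_k + corr.
        innovation = z - self.H @ self.x              # Eq. (21)
        self.x = self.x + (K_new @ innovation)        # Eq. (22)
        self.P = (np.eye(2) - K_new @ self.H) @ self.P
        return self.x[0]

# Example: the filter keeps predicting through three "occluded" frames.
kf = FractionalGainKalman1D(pos=100.0)
for z in [101.0, 103.2, 104.8]:                       # measurements before occlusion
    kf.predict(); kf.update(z)
for _ in range(3):                                    # occlusion: prediction only
    print(round(kf.predict(), 1))
```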

3.4. Adaptive Learning Rate

The motion and appearance of the target change in each frame during tracking. It is therefore necessary to update the target model adaptively rather than with a fixed learning rate. We use an APCE-based degree indicator to better cope with environmental changes that occur during tracking. Since the raw APCE value can be very large, we normalize the current APCE by the maximum of its historical values. The degree indicator $d_{APCE}$ is defined in (31):
$$d_{APCE} = \frac{\mathrm{APCE}_t}{\max\big( \{\mathrm{APCE}_{t_s}, \ldots, \mathrm{APCE}_{t-1}\} \big)} \tag{31}$$
where $t_s$ is the index of the start frame. The value of the learning rate is adjusted as in (32):
$$\rho = \begin{cases} 0.075 & \text{if } d_{APCE} > \tau_{th} \\ 0.075\, d_{APCE} & \text{if } d_{APCE} \le \tau_{th} \end{cases} \tag{32}$$
where τ th is the threshold value acquired by performing multiple experiments.
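A minimal sketch of Equations (31) and (32) is given below; the threshold value of 0.6 and the example APCE histories are illustrative assumptions.

```python
def adaptive_learning_rate(apce_history, base_rho=0.075, tau_th=0.6):
    """Learning rate of Eq. (32) from the APCE degree indicator of Eq. (31).

    apce_history: APCE values from the start frame t_s up to the current frame t
    (the last element). base_rho comes from Eq. (32); tau_th is illustrative.
    """
    apce_t = apce_history[-1]
    d_apce = apce_t / max(apce_history[:-1])      # Eq. (31)
    return base_rho if d_apce > tau_th else base_rho * d_apce

# Example: a sudden APCE drop (e.g., motion blur) slows the model update.
print(adaptive_learning_rate([30.1, 29.4, 31.0, 28.7]))   # no blur: 0.075
print(adaptive_learning_rate([30.1, 29.4, 31.0, 9.3]))    # blur: much smaller rho
```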
Figure 5a shows that, without motion blur, both APCE and $d_{APCE}$ are high; therefore, the learning rate of tracking should be fast. However, when motion blur occurs, both APCE and $d_{APCE}$ give low values, as shown in Figure 5b. Thus, in that case, the model should be updated slowly due to the appearance change of the target. By using this mechanism, the proposed tracker achieves significant results for the motion blur challenge. The tracker is given in Algorithm 1.
Algorithm 1: Proposed Tracking Method
  Input: Video with initialized ground truth on frame 1.
  Output: Rectangle on each frame.
for each frame from the first to the last:
  Compute context prior model by using (3).
  Compute confidence map by using (11).
  Compute center of target location.
  Estimate scale by using (15).
  Compute APCE by using (16).
  Determine occlusion detection using (17) and (18).
  Check four rules of occlusion detection given in Section 3.2.
  if rule 2 occurs
    Activate fractional-gain Kalman filter.
    Compute fractional Kalman gain by using (30).
    Predict position by using (22).
    Compute error covariance by using (28).
  end
  Calculate occlusion indicator using (31).
  Calculate learning rate using (32).
  Update context prior model by using (3).
  Update spatial context model by using (9).
  Update STC model by using (12).
  Estimate the position of target.
End

4. Performance Analysis

Comprehensive assessments were conducted on videos taken from the OTB-100 [26] dataset for the qualitative and quantitative evaluation of the proposed tracking method. These sequences include scale variation, motion blur, and fast motion challenges.

4.1. Evaluation Criteria

The proposed algorithm is compared with tracking methods on two evaluation criteria: distance precision rate (DPR) and center location error (CLE). The calculation formula for CLE is mentioned in (33):
$$\mathrm{CLE} = \sqrt{ (x_i - x_{gt})^2 + (y_i - y_{gt})^2 } \tag{33}$$
where $(x_i, y_i)$ is the tracked target center and $(x_{gt}, y_{gt})$ is the ground-truth center in frame $i$.
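The two metrics can be computed as sketched below; the 20-pixel threshold commonly used for DPR on OTB sequences is an assumption, since the text does not state the value used.

```python
import numpy as np

def center_location_errors(tracked, ground_truth):
    """Per-frame CLE of Eq. (33): Euclidean distance between tracked and true centres."""
    tracked, ground_truth = np.asarray(tracked, float), np.asarray(ground_truth, float)
    return np.sqrt(((tracked - ground_truth) ** 2).sum(axis=1))

def distance_precision_rate(tracked, ground_truth, threshold=20.0):
    """Fraction of frames whose CLE is below the threshold (20 px is a common OTB setting)."""
    return float(np.mean(center_location_errors(tracked, ground_truth) <= threshold))

# Example on three frames
trk = [(100, 80), (103, 82), (140, 95)]
gt  = [(101, 80), (105, 84), (110, 90)]
print(center_location_errors(trk, gt))        # [1.0, 2.83, 30.41]
print(distance_precision_rate(trk, gt))       # 0.667
```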

4.2. Quantitative Analysis

DPR evaluation is presented in Table 1. In the videos Blurcar1, Car2, Human7, Jogging1, and Jogging2, the proposed algorithm outperforms Modified KCF, MOSSECA, MACF, KCF_MTSA, and STC. For the sequences Blurcar3, Blurcar4, Boy, Dancer2, and Suv, the proposed tracker has a marginally lower precision value. Overall, the proposed algorithm has a higher mean value than the other algorithms.
Average center location error evaluation is presented in Table 2. In the videos Blurcar1, Car2, Dancer2, Jogging1, and Human7, the proposed algorithm outperforms Modified KCF, MOSSECA, MACF, KCF_MTSA, and STC. For the videos Blurcar3, Blurcar4, Boy, Jogging2, and Suv, the proposed algorithm has marginally higher error values. Overall, the proposed algorithm has the lowest mean error compared to the other algorithms.
The precision and error plots are presented in Figure 6 and Figure 7, respectively. These plots provide a frame-by-frame comparison over the entire image sequences. Since precision and location error give the mean over the entire sequence, an algorithm may lose the target for a few frames and then track it correctly again; these plots are therefore presented to show the effectiveness of the tracking method. In the videos Blurcar1, Human7, Jogging1, and Jogging2, the proposed algorithm has the highest precision over the entire video. It has slightly lower accuracy in the Blurcar3, Blurcar4, Boy, Car2, Dancer2, and Suv videos. The proposed algorithm has the lowest error in the Blurcar1, Human7, Jogging1, and Jogging2 videos. It has a marginally higher error compared with a few trackers for the Blurcar3, Blurcar4, Boy, Car2, Dancer2, and Suv sequences.
Frames per second (fps) analysis is presented in Table 3. In the Blurcar1, Car2, Dancer2, Human7, and Jogging1 videos, the proposed algorithm outperforms Modified KCF, MOSSECA, MACF, KCF_MTSA, and STC in terms of precision and error at the expense of a modest frame rate.
The computational time for the learning rate module is presented in Table 4. It can be seen that the proposed tracker takes less time in motion blur sequences. However, the overall speed of the tracker is somewhat low, as given in Table 3. By combining the different tracking modules presented in Section 3, the performance of the proposed tracker is significant, as each module is specifically designed and incorporated into the STC framework, making it efficient in terms of lower error and higher precision for different challenging attributes in VOT.

4.3. Qualitative Analysis

Figure 8 depicts the qualitative analysis of the proposed tracking method compared with five state-of-the-art trackers. Modified KCF and KCF_MTSA are extensions of KCF [62]-based tracking methods. However, Modified KCF is not robust to motion blur (Blurcar1, Blurcar3, and Human7), whereas the performance of KCF_MTSA is affected by occlusion (Jogging2) and motion blur (Human7). MACF is an improved version of fast discriminative scale space tracking [63] and achieves favorable results in various challenges of VOT. However, it does not perform well in motion blur (Blurcar1) and occlusion (Jogging1 and Jogging2). MOSSECA is an improved context-aware formulation of the MOSSE [64] tracker. Its results are exceptional except in the Jogging1 and Human7 sequences. STC is the baseline tracker of the proposed method and achieves favorable results. However, it can be seen that it does not address occlusion (Jogging1 and Jogging2) or motion blur (Blurcar1, Blurcar3, Blurcar4, Boy, and Human7).
It can be seen that the proposed tracker outperforms the other tracking methods in these sequences. This performance is attributed to three factors. First, a max-pooling-based scale scheme is incorporated, making the tracker less sensitive to scale variations (Boy). Second, the APCE-based modified occlusion detection mechanism together with the fractional-gain Kalman filter-based occlusion handling makes it effective under occlusions (Jogging1, Jogging2, and Suv). Third, the APCE-based adaptive learning rate updates the model effectively, making the tracker robust to motion blur (Blurcar1, Blurcar3, Blurcar4, Boy, and Human7) and illumination variations (Car2 and Dancer2).

5. Discussion

We discuss several observations from the performance analysis. First, the max-pooling-based scale formulation in the spatiotemporal context outperforms trackers without this formulation. This can be attributed to estimating the maximum likelihood using the target appearance sampled at a set of different scales. Second, trackers that include occlusion detection and handling modules outperform trackers without these modules. This can be attributed to the APCE-based occlusion detection mechanism and the fractional-gain Kalman filter preventing the tracker from drifting. Third, trackers with an adaptive learning rate perform better than those with a fixed learning rate.

6. Conclusions

This paper presented an accurate STC-based tracking algorithm that incorporates max-pooling-based scale estimation, a fractional-gain Kalman filter, and APCE measures for occlusion detection and tracking model updates. These modules improve the adaptability of the target model and prevent error accumulation. Evaluations indicate that the proposed tracker achieves improved results in various complicated scenarios. However, some problems remain: (1) tracking performance is severely affected by dense occlusion; (2) the tracker may lose the target of interest under deformation and fast motion; and (3) the frame rate of the proposed tracking method is low. These three points will be the focus of follow-up research. Additionally, considering the challenges of VOT, we plan to perform in-depth research on feature fusion and better prediction estimation mechanisms, and to carry out Raspberry Pi-, FPGA-, and DSP-based hardware implementations for practical applications.

Author Contributions

Writing—original draft presentation, K.M.; writing—review and editing, K.M., A.A., A.J., B.K., M.M. and K.M.C.; conceptualization, K.M., A.J. and A.A.; supervision, A.A. and A.J.; data analysis and interpretation, K.M., B.K. and M.M.; investigation, K.M., A.A. and B.K.; methodology, K.M., A.J. and A.A.; software, B.K., M.M., K.M.C. and A.H.M.; visualization, K.M., M.M. and B.K.; resources, A.J., K.M.C. and A.H.M.; project administration, A.A. and A.J.; funding acquisition, K.M.C. and A.H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

No consent was required. The OTB-100 dataset is publicly available and can be used for research purposes; it has been used in more than 5000 published research articles.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pantrigo, J.J.; Hernández, J.; Sánchez, A. Multiple and variable target visual tracking for video-surveillance applications. Pattern Recognit. Lett. 2010, 31, 1577–1590. [Google Scholar] [CrossRef]
  2. Ahmed, I.; Jeon, G. A real-time person tracking system based on SiamMask network for intelligent video surveillance. J. Real-Time Image Process 2021, 18, 1803–1814. [Google Scholar] [CrossRef]
  3. Carcagnì, P.; Mazzeo, P.L.; Distante, C.; Spagnolo, P.; Adamo, F.; Indiveri, G. A UAV-Based Visual Tracking Algorithm for Sensible Areas Surveillance. In Proceedings of the International Workshop on Modelling and Simulation for Autonomous Systems, Rome, Italy, 5–6 May 2014; Springer: New York, NY, USA, 2014; pp. 12–19. [Google Scholar]
  4. Geiger, A.; Lauer, M.; Wojek, C.; Stiller, C.; Urtasun, R. 3d traffic scene understanding from movable platforms. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1012–1025. [Google Scholar] [CrossRef] [Green Version]
  5. Wang, N.; Shi, J.; Yeung, D.-Y.; Jia, J. Understanding and diagnosing visual tracking systems. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 3101–3109. [Google Scholar]
  6. Bonatti, R.; Ho, C.; Wang, W.; Choudhury, S.; Scherer, S. Towards a robust aerial cinematography platform: Localizing and tracking moving targets in unstructured environments. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 229–236. [Google Scholar]
  7. Petitti, A.; di Paola, D.; Milella, A.; Mazzeo, P.L.; Spagnolo, P.; Cicirelli, G.; Attolico, G. A distributed heterogeneous sensor network for tracking and monitoring. In Proceedings of the 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, Krakow, Poland, 27–30 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 426–431. [Google Scholar]
  8. Wu, Y.; Lim, J.; Yang, M.-H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2411–2418. [Google Scholar]
  9. Danelljan, M.; Khan, F.S.; Felsberg, M.; Granström, K.; Heintz, F.; Rudol, P.; Wzorek, M.; Kvarnström, J.; Doherty, P. A low-level active vision framework for collaborative unmanned aircraft systems. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: New York, NY, USA, 2014; pp. 223–237. [Google Scholar]
  10. Petitti, A.; di Paola, D.; Milella, A.; Mazzeo, P.L.; Spagnolo, P.; Cicirelli, G.; Attolico, G. A heterogeneous robotic network for distributed ambient assisted living. In Human Behavior Understanding in Networked Sensing; Springer: Cham, Switzerland, 2014; pp. 321–338. [Google Scholar]
  11. Amorim, T.G.S.; Souto, L.A.; Nascimento, T.P.D.; Saska, M. Multi-Robot Sensor Fusion Target Tracking with Observation Constraints. IEEE Access 2021, 9, 52557–52568. [Google Scholar] [CrossRef]
  12. Ali, A.; Kausar, H.; Muhammad, I.K. Automatic visual tracking and firing system for anti aircraft machine gun. In Proceedings of the 6th International Bhurban Conference on Applied Sciences & Technology, Islamabad, Pakistan, 19–22 January 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 253–257. [Google Scholar]
  13. Cao, J.; Song, C.; Song, S.; Xiao, F.; Zhang, X.; Liu, Z.; Ang, M.H., Jr. Robust Object Tracking Algorithm for Autonomous Vehicles in Complex Scenes. Remote Sens. 2021, 13, 3234. [Google Scholar] [CrossRef]
  14. Lee, Y.; Lee, S.; Yoo, J.; Kwon, S. Efficient Single-Shot Multi-Object Tracking for Vehicles in Traffic Scenarios. Sensors 2021, 21, 6358. [Google Scholar] [CrossRef] [PubMed]
  15. Ali, A.; Jalil, A.; Niu, J.; Zhao, X.; Rathore, S.; Ahmed, J.; Iftikhar, M.A. Visual object tracking—Classical and contemporary approaches. Front. Comput. Sci. 2016, 10, 167–188. [Google Scholar] [CrossRef]
  16. Mazzeo, P.L.; Spagnolo, P.; Distante, C. Visual Tracking by using dense local descriptors. In Adaptive Optics: Analysis, Methods & Systems; Optical Society of America: Washington, DC, USA, 2015; p. JT5A-16. [Google Scholar]
  17. Ali, A.; Jalil, A.; Ahmed, J. A new template updating method for correlation tracking. In Proceedings of the 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ), Palmerston North, New Zealand, 21–22 November 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
  18. Abbasi, S.; Rezaeian, M. Visual object tracking using similarity transformation and adaptive optical flow. Multimed. Tools Appl. 2021, 80, 33455–33473. [Google Scholar] [CrossRef]
  19. Adamo, F.; Mazzeo, P.L.; Spagnolo, P.; Distante, C. A FragTrack algorithm enhancement for total occlusion management in visual object tracking. In Automated Visual Inspection and Machine Vision; International Society for Optics and Photonics: Bellingham, WA, USA, 2015; Volume 9530, p. 95300R. [Google Scholar]
  20. Yang, L.; Zhong-li, W.; Bai-gen, C. An intelligent vehicle tracking technology based on SURF feature and Mean-shift algorithm. In Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics (ROBIO 2014), Bali, Indonesia, 5–10 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1224–1228. [Google Scholar]
  21. Matsushita, Y.; Yamaguchi, T.; Harada, H. Object tracking using virtual particles driven by optical flow and Kalman filter. In Proceedings of the 2019 19th International Conference on Control, Automation and Systems (ICCAS), Jeju, Korea, 15–18 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1064–1069. [Google Scholar]
  22. Mei, X.; Ling, H. Robust visual tracking and vehicle classification via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2259–2272. [Google Scholar] [PubMed] [Green Version]
  23. Judy, M.; Poore, N.C.; Liu, P.; Yang, T.; Britton, C.; Bolme, D.S.; Mikkilineni, A.K.; Holleman, J. A digitally interfaced analog correlation filter system for object tracking applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 2764–2773. [Google Scholar] [CrossRef]
  24. Adamo, F.; Carcagnì, P.; Mazzeo, P.L.; Distante, C.; Spagnolo, P. TLD and Struck: A Feature Descriptors Comparative Study. In International Workshop on Activity Monitoring by Multiple Distributed Sensing; Springer: Cham, Switzerland, 2014; pp. 52–63. [Google Scholar]
  25. Dong, E.; Deng, M.; Tong, J.; Jia, C.; Du, S. Moving vehicle tracking based on improved tracking–learning–detection algorithm. IET Comput. Vis. 2019, 13, 730–741. [Google Scholar] [CrossRef]
  26. Wu, Y.; Lim, J.; Yang, M. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Zhang, K.; Zhang, L.; Liu, Q.; Zhang, D.; Yang, M.H. Fast visual tracking via dense spatio-temporal context learning. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: New York, NY, USA, 2014; Volume 8693, pp. 127–141. [Google Scholar] [CrossRef] [Green Version]
  28. Die, J.; Li, N.; Liu, Y.; Wu, Y. Correlation Filter Tracking Algorithm Based on Spatio-Temporal Context. In Proceedings of the International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, Kunming, China, 20–22 July 2019; Springer: Cham, Switzerland, 2019; pp. 279–289. [Google Scholar]
  29. Yang, X.; Zhu, S.; Zhou, D.; Zhang, Y. An improved target tracking algorithm based on spatio-temporal context under occlusions. Multidimens. Syst. Signal Process. 2020, 31, 329–344. [Google Scholar] [CrossRef]
  30. Zhang, Y.; Wang, L.; Qin, J. Adaptive spatio-temporal context learning for visual tracking. Imaging Sci. J. 2019, 67, 136–147. [Google Scholar] [CrossRef]
  31. Zhang, D.; Dong, E.; Yu, H.; Jia, C. An Improved Object Tracking Algorithm Combining Spatio-Temporal Context and Selection Update. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 4822–4827. [Google Scholar]
  32. Song, H.; Wu, Y.; Zhou, G. Design of bio-inspired binocular UAV detection system based on improved STC algorithm of scale transformation and occlusion detection. Int. J. Micro Air Veh. 2021, 13, 17568293211004846. [Google Scholar] [CrossRef]
  33. Feng, F.; Shen, B.; Liu, H. Visual object tracking: In the simultaneous presence of scale variation and occlusion. Syst. Sci. Control Eng. 2018, 6, 456–466. [Google Scholar] [CrossRef] [Green Version]
  34. Li, J.; Zhou, X.; Chan, S.; Chen, S. Robust object tracking via large margin and scale-adaptive correlation filter. IEEE Access 2017, 6, 12642–12655. [Google Scholar] [CrossRef]
  35. Zhang, M.; Xing, J.; Gao, J.; Hu, W. Robust visual tracking using joint scale-spatial correlation filters. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1468–1472. [Google Scholar]
  36. Nguyen, A.H.; Mai, L.; Do, H.N. Visual Object Tracking Method of Spatio-temporal Context Learning with Scale Variation. In Proceedings of the International Conference on the Development of Biomedical Engineering in Vietnam, Ho Chi Minh City, Vietnam, 20–22 July 2020; Springer: Cham, Switzerland, 2020; pp. 733–742. [Google Scholar]
  37. Wang, X.; Hou, Z.; Yu, W.; Pu, L.; Jin, Z.; Qin, X. Robust occlusion-aware part-based visual tracking with object scale adaptation. Pattern Recognit. 2018, 81, 456–470. [Google Scholar] [CrossRef]
  38. Ma, H.; Lin, Z.; Acton, S.T. FAST: Fast and Accurate Scale Estimation for Tracking. IEEE Signal Process. Lett. 2019, 27, 161–165. [Google Scholar] [CrossRef]
  39. Danelljan, M.; Häger, G.; Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Durham, UK, 2014. [Google Scholar]
  40. Li, Y.; Zhu, J. A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European conference on computer vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 254–265. [Google Scholar]
  41. Bibi, A.; Ghanem, B. Multi-template scale-adaptive kernelized correlation filters. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 50–57. [Google Scholar]
  42. Lu, H.; Xiong, D.; Xiao, J.; Zheng, Z. Robust long-term object tracking with adaptive scale and rotation estimation. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420909736. [Google Scholar] [CrossRef]
  43. Yin, X.; Liu, G.; Ma, X. Fast Scale Estimation Method in Object Tracking. IEEE Access 2020, 8, 31057–31068. [Google Scholar] [CrossRef]
  44. Ma, H.; Acton, S.T.; Lin, Z. SITUP: Scale invariant tracking using average peak-to-correlation energy. IEEE Trans. Image Process. 2020, 29, 3546–3557. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Mehmood, K.; Jalil, A.; Ali, A.; Khan, B.; Murad, M.; Khan, W.U.; He, Y. Context-Aware and Occlusion Handling Mechanism for Online Visual Object Tracking. Electronics 2021, 10, 43. [Google Scholar] [CrossRef]
  46. Khan, B.; Ali, A.; Jalil, A.; Mehmood, K.; Murad, M.; Awan, H. AFAM-PEC: Adaptive Failure Avoidance Tracking Mechanism Using Prediction-Estimation Collaboration. IEEE Access 2020, 8, 149077–149092. [Google Scholar] [CrossRef]
  47. Mehmood, K.; Jalil, A.; Ali, A.; Khan, B.; Murad, M.; Cheema, K.M.; Milyani, A.H. Spatio-Temporal Context, Correlation Filter and Measurement Estimation Collaboration Based Visual Object Tracking. Sensors 2021, 21, 2841. [Google Scholar] [CrossRef] [PubMed]
  48. Yang, H.; Wang, J.; Miao, Y.; Yang, Y.; Zhao, Z.; Wang, Z.; Sun, Q.; Wu, D.O. Combining Spatio-Temporal Context and Kalman Filtering for Visual Tracking. Mathematics 2019, 7, 1059. [Google Scholar] [CrossRef] [Green Version]
  49. Ali, A.; Mirza, S.M. Object tracking using correlation, Kalman filter and fast means shift algorithms. In Proceedings of the 2006 International Conference on Emerging Technologies, Peshawar, Pakistan, 13–14 November 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 174–178. [Google Scholar]
  50. Kaur, H.; Sahambi, J.S. Vehicle tracking using fractional order Kalman filter for non-linear system. In Proceedings of the International Conference on Computing, Communication & Automation, Greater Noida, India, 15–16 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 474–479. [Google Scholar]
  51. Soleh, M.; Jati, G.; Hilman, M.H. Multi Object Detection and Tracking Using Optical Flow Density–Hungarian Kalman Filter (Ofd-Hkf) Algorithm for Vehicle Counting. J. Ilmu Komput. dan Inf. 2018, 11, 17–26. [Google Scholar] [CrossRef]
  52. Farahi, F.; Yazdi, H.S. Probabilistic Kalman filter for moving object tracking. Signal Process. Image Commun. 2020, 82, 115751. [Google Scholar] [CrossRef]
  53. Gunjal, P.R.; Gunjal, B.R.; Shinde, H.A.; Vanam, S.M.; Aher, S.S. Moving object tracking using kalman filter. In Proceedings of the 2018 International Conference on Advances in Communication and Computing Technology (ICACCT), Sangamner, India, 8–9 February 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 544–547. [Google Scholar]
  54. Ali, A.; Jalil, A.; Ahmed, J.; Iftikhar, M.A.; Hussain, M. Correlation, Kalman filter and adaptive fast mean shift based heuristic approach for robust visual tracking. Signal Image Video Process. 2015, 9, 1567–1585. [Google Scholar] [CrossRef]
  55. Kaur, H.; Sahambi, J.S. Vehicle tracking in video using fractional feedback Kalman filter. IEEE Trans. Comput. Imaging 2016, 2, 550–561. [Google Scholar] [CrossRef]
  56. Zhou, X.; Fu, D.; Shi, Y.; Wu, C. Adaptive Learning Compressive Tracking Based on Kalman Filter. In Proceedings of the International Conference on Image and Graphics, Shanghai, China, 13–15 September 2017; Springer: Cham, Switzerland, 2017; pp. 243–253. [Google Scholar]
  57. Zhang, Y.; Yang, Y.; Zhou, W.; Shi, L.; Li, D. Motion-aware correlation filters for online visual tracking. Sensors 2018, 18, 3937. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  58. Mueller, M.; Smith, N.; Ghanem, B. Context-aware correlation filter tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1396–1404. [Google Scholar]
  59. Shin, J.; Kim, H.; Kim, D.; Paik, J. Fast and robust object tracking using tracking failure detection in kernelized correlation filter. Appl. Sci. 2020, 10, 713. [Google Scholar] [CrossRef] [Green Version]
  60. Sierociuk, D.; Dzieliński, A. Fractional Kalman filter algorithm for the states, parameters and order of fractional system estimation. Int. J. Appl. Math. Comput. Sci. 2006, 16, 129–140. [Google Scholar]
  61. Wang, M.; Liu, Y.; Huang, Z. Large margin object tracking with circulant feature maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 4021–4029. [Google Scholar]
  62. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  63. Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Discriminative scale space tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1561–1575. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  64. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2544–2550. [Google Scholar]
Figure 1. Challenging scenarios in visual object tracking (VOT). The first row shows motion blur in an image sequence. The second row shows the scale variation of the target. The third row shows heavy occlusion of the target. Pictures in the figure are part of OTB-100 dataset [26].
Figure 2. The spatial relation between object and its context. Picture in the figure is part of OTB-100 dataset [26].
Figure 3. Flowchart of proposed tracking method.
Figure 4. Occlusion detection mechanism. Pictures in the figure are part of OTB-100 dataset [26].
Figure 5. Learning rate mechanism. Pictures in the figure are part of OTB-100 dataset [26].
Figure 6. Precision plot comparison for the OTB-100 dataset [26].
Figure 7. Center location error (in pixels) comparison for the OTB-100 dataset [26].
Figure 8. Qualitative comparison for the OTB-100 dataset [26].
Table 1. Distance precision rate.
Sequence | Proposed | Modified KCF [59] | STC [27] | MACF [57] | MOSSECA [58] | KCF_MTSA [41]
Blurcar1 | 0.978 | 0.858 | 0.024 | 0.698 | 0.999 | 0.999
Blurcar3 | 0.896 | 0.829 | 0.406 | 1 | 1 | 1
Blurcar4 | 0.876 | 0.987 | 0.113 | 0.944 | 1 | 1
Boy | 0.973 | 0.64 | 0.761 | 1 | 1 | 1
Car2 | 0.988 | 1 | 1 | 1 | 0.993 | 1
Dancer2 | 0.993 | 1 | 1 | 1 | 1 | 1
Human7 | 0.904 | 0.76 | 0.332 | 0.636 | 0.824 | 0.448
Jogging1 | 0.973 | 0.993 | 0.228 | 0.231 | 0.231 | 0.964
Jogging2 | 0.866 | 0.945 | 0.186 | 0.166 | 1 | 0.189
Suv | 0.778 | 0.978 | 0.805 | 0.978 | 0.976 | 0.98
Mean Precision | 0.923 | 0.899 | 0.486 | 0.765 | 0.902 | 0.858
Table 2. Average center location error.
Sequence | Proposed | Modified KCF [59] | STC [27] | MACF [57] | MOSSECA [58] | KCF_MTSA [41]
Blurcar1 | 4.86 | 16.05 | 1.31 × 10^6 | 85.16 | 6.34 | 6.01
Blurcar3 | 9.12 | 14.46 | 71.37 | 3.69 | 2.98 | 3.7
Blurcar4 | 15.01 | 11.19 | 2.61 × 10^3 | 8.04 | 10.15 | 7.15
Boy | 8.09 | 50.34 | 27.4 | 2.65 | 2.31 | 2.91
Car2 | 2.68 | 3.96 | 12.43 | 1.55 | 5.39 | 2.13
Dancer2 | 6.82 | 6.41 | 15.3 | 6.48 | 5.8 | 6.68
Human7 | 7.59 | 16.74 | 42.98 | 19.62 | 12.14 | 36.63
Jogging1 | 8.39 | 3.72 | 5010 | 94.97 | 115.98 | 4.27
Jogging2 | 14.2 | 4.74 | 104.02 | 147.77 | 3.47 | 136.4
Suv | 15.36 | 3.65 | 48 | 3.34 | 3.73 | 3.71
Mean Error | 9.212 | 13.126 | 1.3 × 10^6 | 37.327 | 16.829 | 20.959
Table 3. Frames per second (fps).
Sequence | Proposed | Modified KCF [59] | STC [27] | MACF [57] | MOSSECA [58] | KCF_MTSA [41]
Blurcar1 | 10.78 | 66.29 | 27.75 | 18.55 | 3.06 | 15.35
Blurcar3 | 18.04 | 33.62 | 28.87 | 32.75 | 1.74 | 6.08
Blurcar4 | 5.72 | 1.42 | 20.07 | 8.64 | 27.65 | 5.83
Boy | 26.67 | 85.51 | 33.48 | 58.71 | 57.17 | 22.02
Car2 | 57.18 | 90.79 | 4.08 | 55.39 | 5.38 | 11.2
Dancer2 | 29.66 | 29.65 | 65.12 | 9.23 | 8.87 | 6.26
Human7 | 25.17 | 34.44 | 59.66 | 40.52 | 6.11 | 11.48
Jogging1 | 42.71 | 95.45 | 61.75 | 49 | 36.59 | 12.55
Jogging2 | 22.77 | 33.01 | 56.92 | 34.63 | 3.97 | 11
Suv | 69.61 | 76.32 | 98.03 | 50.97 | 9.7 | 8.44
Table 4. Computation time of the proposed tracker’s learning rate module.
Sequence | Frame Size | Number of Frames | Time
Blurcar1 | 640 × 480 | 742 | 0.011
Blurcar3 | 640 × 480 | 357 | 0.008
Blurcar4 | 640 × 480 | 380 | 0.009
Boy | 640 × 480 | 602 | 0.009
Car2 | 320 × 240 | 913 | 0.018
Dancer2 | 320 × 262 | 150 | 0.006
Human7 | 320 × 240 | 250 | 0.007
Jogging1 | 352 × 288 | 307 | 0.012
Jogging2 | 352 × 288 | 307 | 0.008
Suv | 320 × 240 | 945 | 0.017
