This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Tracking moving targets in complex scenes using an active video camera is a challenging task. Tracking accuracy and efficiency are two key yet generally incompatible aspects of a Target Tracking System (TTS). This paper studies a compromise between the two. A fast mean-shift-based target tracking scheme is designed and realized, which is robust to partial occlusion and changes in object appearance. The physical simulation shows that the image signal processing speed is >50 frames/s.

Visual tracking plays an important role in various computer vision applications, such as surveillance [

Target tracking, according to its properties, can be mainly divided into two types: feature- and optical flow-based approaches. Optical flow is the vector field which describes how the image changes with time [

Feature-based algorithms were originally developed for tracking a small number of salient features in an image sequence. These features include color, texture, contour, and detection operators such as the scale-invariant feature transform (SIFT) [

Template matching (TM) is a simple and popular technique in target tracking, widely used in civilian and military automatic target recognition systems. Given an input image and a template image, the matching algorithm finds the region of the input image that most closely matches the template according to a specific criterion, such as the Euclidean distance or cross correlation. Conventional template matching methods consume a large amount of computational time. A number of techniques have been investigated to speed up template matching, with good results [

The Kalman filter and particle filter, which are used to estimate the target location in the next frame, have also been extensively studied. Compared with the Kalman filter, the particle filter is more robust for nonlinear and non-Gaussian problems because it simulates the posterior distribution. Much effort has been devoted to speeding up the particle filter. Martinez-del-Rincon

In image sequences, the target appearances have a strong correlation. Among all appearance based tracking models, there is one popular subset called “subspace model”. Black [

The mean shift (MS) based tracker is highly robust to variations in translation, rotation and scale. The MS algorithm is a nonparametric density gradient estimation approach to local mode seeking, originally invented for data clustering. Comaniciu [

One of MS's drawbacks is that it often converges slowly. To the best of our knowledge, few attempts have been made to speed up the convergence of MS. The k-d tree can be used to reduce the large number of nearest-neighbor queries. Although this achieves a dramatic decrease in computational time for high-dimensional clustering, such techniques are not attractive for relatively low-dimensional problems such as visual tracking. Cheng [

The main contribution of this paper is a novel fast and robust tracking algorithm that combines MS with template matching (TM), balancing robustness and real-time performance. A fast MS-based target tracking scheme is designed and implemented, which is robust to target pose variation and partial occlusion. The hardware-in-loop simulation shows that the image signal processing speed is >50 frames/s.

The paper is organized as follows: the target tracking system is described in Section 2, the hardware composition is presented in Section 3, the software structure and algorithms are described in detail in Section 4, Section 5 reports tests and results, and Section 6 describes future work.

As shown in

Robustness. In a complex background, most of the applications require the tracker to be robust to partial occlusion, clutter and changes in object appearance.

Real-time performance. The TTS needs to complete image signal pre-processing, target tracking and location prediction, 2D-turntable control and other computational tasks, which requires an image processing speed of >25 frames/s; for some special applications the processing speed needs to be >50 frames/s.

The signal flow diagram of a typical target tracking system is shown in _{p}_{c}_{m}

The video signal processor used in this paper is the TDS642EVM multi-channel real-time image processing platform produced by the TI Company. Its main performance features are listed in

The structure of the TDS642EVM is shown in

The pitch and yaw axis of 2D-turntable (as shown in

The structure of the TTS software is shown in

The tracking algorithm identifies the location of the target in the current image. A fast, robust MS-based target tracking algorithm is presented.

The target prediction algorithm predicts the location of the target in the next image from the image sequence. Many algorithms can achieve this goal, such as the Kalman filter, the particle filter and the linear prediction method. Although the Kalman filter and particle filter [

Kernel density estimation is a nonparametric method that extracts information about the underlying structure of a data set when no appropriate parametric model is available. Given _{i}^{d×d}, the kernel density estimation at the location _{k}_{G}
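For reference, the kernel density estimate and the mean shift vector are usually written in the following standard form. The symbols are assumptions chosen to match common notation (n data points x_i in R^d, a symmetric positive definite d×d bandwidth matrix H, a kernel K with profile k, and g = −k′) and are not necessarily the authors' exact notation.

```latex
\begin{align}
% Kernel density estimate with a full bandwidth matrix H
\hat{f}_K(\mathbf{x}) &= \frac{1}{n}\sum_{i=1}^{n} |\mathbf{H}|^{-1/2}\,
    K\!\left(\mathbf{H}^{-1/2}(\mathbf{x}-\mathbf{x}_i)\right) \\
% Special case H = h^2 I with a radially symmetric kernel K(x) = c_k k(||x||^2)
\hat{f}_K(\mathbf{x}) &= \frac{c_k}{n h^{d}}\sum_{i=1}^{n}
    k\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right) \\
% Mean shift vector, with g(x) = -k'(x)
\mathbf{m}_G(\mathbf{x}) &= \frac{\sum_{i=1}^{n}\mathbf{x}_i\,
    g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right)}
    {\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right)} - \mathbf{x}
\end{align}
```

The mean shift vector points in the direction of the density gradient, which is why repeatedly moving by it seeks a local mode.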

According to the classical MS tracking algorithm [

Target feature vectors:

Candidate target feature vectors:
_{i}_{h}
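As an illustration of how such feature vectors can be computed, the following is a minimal sketch of a kernel-weighted histogram. It assumes a single-channel patch with integer intensities, an Epanechnikov profile, and a uniform quantization function; the function names and the bin count are hypothetical and not taken from the paper.

```python
def epanechnikov_profile(r2):
    """Epanechnikov profile up to a constant: 1 - r2 inside the unit ball, else 0."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def weighted_histogram(patch, n_bins, levels=256):
    """Kernel-weighted, normalized histogram of a patch.

    patch  : list of rows, each a list of ints in [0, levels)
    n_bins : number of histogram bins m (assumed value, chosen by the caller)
    """
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # patch center
    ry, rx = h / 2.0, w / 2.0               # bandwidth: half the patch size
    hist = [0.0] * n_bins
    for i in range(h):
        for j in range(w):
            # Normalized squared distance of the pixel from the center.
            r2 = ((i - cy) / ry) ** 2 + ((j - cx) / rx) ** 2
            # Uniform quantization of the intensity into a bin index.
            u = patch[i][j] * n_bins // levels
            # Pixels near the center contribute more than peripheral ones.
            hist[u] += epanechnikov_profile(r2)
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist
```

The same function can serve for both the target model (patch centered on the target in the reference frame) and a candidate (patch centered on the hypothesized location in the current frame), for example with a 20 × 20 patch.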

The similarity function defines a distance between target model and candidates. To accommodate comparisons among various targets, this distance should have a metric structure. We define the distance between two discrete distributions as:
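A minimal sketch of such a distance is given here, assuming the form commonly used with MS tracking: d = sqrt(1 − ρ), where ρ is the Bhattacharyya coefficient between the two normalized histograms. The function names are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """rho = sum_u sqrt(p_u * q_u) for two normalized discrete distributions."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """d = sqrt(1 - rho); 0 for identical distributions, 1 for disjoint ones."""
    rho = bhattacharyya_coefficient(p, q)
    # Clamp to guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - rho))
```

Minimizing this distance over candidate locations is equivalent to maximizing the Bhattacharyya coefficient.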

To find the location of the target in the current frame, the distance (5) of a function of

Thus, the probabilities {_{u}_{0}_{u=1,…}_{m}_{0} in the current frame must be computed first. Using Taylor expansion around the values _{u}_{0}

This approximation is satisfactory when the candidate {_{u}_{u}_{=1,…}_{m}_{u}_{1})}_{u}_{=1,…}_{m}

In which:

In this way, minimizing _{1}. Thus we can use the MS procedure to find the maximum of the density estimate in the neighborhood:

The general MS algorithm steps are as follows [

Given: the target model {_{u}_{u=1,…,m} at _{0}_{1} is the new location of spot. Then the flow of MS algorithm is:

Set the spot with a feature vector {_{u}_{u}_{=1,…,}_{m}_{0} in the previous frame.

Compute the feature vector of candidate spot {_{u}_{0})}_{u}_{=1,…,}_{m}

Derive {_{i}_{i}_{=}_{1}_{…}_{m}

Find the new location of spot with

Compute {_{u}_{1})}_{u}_{=1,…}_{m}

While _{1}_{0}

Do

Evaluate ρ[_{1}),

If ║ _{1} − _{0} ║<

Else _{0} ← _{1} jump to 2
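The steps above can be sketched as follows. This is an illustrative reading of the classical MS tracking loop, not the authors' implementation: it assumes a grayscale image stored as a list of rows, an Epanechnikov profile (so the derivative profile g is constant and the new location is simply the weight-averaged pixel position), and hypothetical names and parameter values (`h`, `n_bins`, `eps`, `max_iter`).

```python
import math


def window_pixels(img, cy, cx, h):
    """Yield (i, j, k) for pixels inside the kernel window; k is the profile value."""
    rows, cols = len(img), len(img[0])
    for i in range(max(0, int(math.floor(cy - h))), min(rows, int(math.ceil(cy + h)) + 1)):
        for j in range(max(0, int(math.floor(cx - h))), min(cols, int(math.ceil(cx + h)) + 1)):
            r2 = ((i - cy) ** 2 + (j - cx) ** 2) / (h * h)
            if r2 < 1.0:
                yield i, j, 1.0 - r2


def histogram(img, cy, cx, h, n_bins):
    """Kernel-weighted, normalized intensity histogram centered at (cy, cx)."""
    hist = [0.0] * n_bins
    for i, j, k in window_pixels(img, cy, cx, h):
        hist[img[i][j] * n_bins // 256] += k
    total = sum(hist)
    return [v / total for v in hist]


def bhattacharyya(p, q):
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def mean_shift_track(img, q, y0, h, n_bins=8, eps=0.05, max_iter=50):
    """Move from y0 toward the location whose histogram best matches the model q."""
    cy, cx = float(y0[0]), float(y0[1])
    for _iteration in range(max_iter):
        p0 = histogram(img, cy, cx, h, n_bins)
        rho0 = bhattacharyya(p0, q)
        sy = sx = sw = 0.0
        for i, j, _k in window_pixels(img, cy, cx, h):
            u = img[i][j] * n_bins // 256
            w = math.sqrt(q[u] / p0[u])      # weight sqrt(q_u / p_u(y0))
            sy, sx, sw = sy + w * i, sx + w * j, sw + w
        if sw == 0.0:
            break                             # no pixel supports the model
        ny, nx = sy / sw, sx / sw             # new location y1
        for _attempt in range(10):
            # Halve the step while the similarity would decrease.
            if bhattacharyya(histogram(img, ny, nx, h, n_bins), q) >= rho0:
                break
            ny, nx = 0.5 * (cy + ny), 0.5 * (cx + nx)
        else:
            ny, nx = cy, cx                   # no improving step found
        step = math.hypot(ny - cy, nx - cx)
        cy, cx = ny, nx
        if step < eps:
            break                             # ||y1 - y0|| < eps: converged
    return cy, cx
```

In a tracker, `q` would be computed once with `histogram` on the reference frame, and `y0` would be the target location from the previous frame.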

References [

From another point of view, bound optimization methods always adopt conservative bounds in order to guarantee increasing the cost function value at each iteration [_{G} is the MS shift vector, and then the over-relaxed bound optimization iteration is given by:

Apparently when the

Initialization:

Set the iteration index

Iterate until convergence condition is met:

Compute _{i}_{+1} with _{G}(_{i}_{+1})= _{i}_{+1}_{i}

_{i}_{+1} = _{i}_{G}(_{i}_{+1})

If _{i}_{+1})_{i}

Accept _{i}_{+1} and

Else reject _{i}_{+1}, and _{i}_{+1} = _{i}_{+1},

Set

If m_{G}(y_{i}_{+1})<
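The accept/reject logic of the over-relaxed iteration can be sketched on a one-dimensional mode-seeking problem. This is a simplified illustration under stated assumptions: a Gaussian kernel on scalar samples rather than the image histogram used in the tracker, with hypothetical names (`mode_seek`, `alpha`, `beta`, `eps`) and parameter values.

```python
import math


def density(y, xs, h):
    """Unnormalized Gaussian kernel density estimate at y."""
    return sum(math.exp(-0.5 * ((y - x) / h) ** 2) for x in xs)


def shift(y, xs, h):
    """Mean shift vector m_G(y) for a Gaussian kernel."""
    ws = [math.exp(-0.5 * ((y - x) / h) ** 2) for x in xs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws) - y


def mode_seek(y, xs, h, alpha=1.0, eps=1e-6, max_iter=1000):
    """Seek a density mode starting from y.

    alpha = 1 reduces to standard mean shift; alpha > 1 enables over-relaxation.
    Returns the final location and the number of iterations used.
    """
    beta, n = 1.0, 0
    m = shift(y, xs, h)
    while abs(m) >= eps and n < max_iter:
        candidate = y + beta * m              # over-relaxed step
        if density(candidate, xs, h) > density(y, xs, h):
            y, beta = candidate, beta * alpha  # accept and enlarge the step factor
        else:
            y, beta = y + m, 1.0               # reject: fall back to the standard step
        m = shift(y, xs, h)
        n += 1
    return y, n
```

Running `mode_seek` once with `alpha=1.0` and once with a value above 1 on the same samples allows the iteration counts of the standard and over-relaxed variants to be compared directly.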

We compare the performance of the accelerated MS algorithm to the standard MS algorithm on real images (as shown in

The occlusion issue is a technical challenge in the image tracking field, and many methods have been proposed to address it. In this paper, the Bhattacharyya coefficient is used to determine whether the target is occluded or lost. Given two thresholds T1 < T2: if T1 < Bhattacharyya coefficient < T2, the target is considered to be occluded; if Bhattacharyya coefficient < T1, the target is considered to be lost. In addition, owing to changes in environmental illumination and target appearance, the Bhattacharyya coefficient of the target candidate is, in general, a local maximum rather than the global maximum. When the target is occluded, the distance between the local maximum and the global maximum increases, so a special method is needed to improve tracking robustness. The Local Template Matching (LTM) method is used in this article to solve this problem. Template Matching (TM) is an existing algorithm that is usually applied globally. In this paper, template matching is performed only in the region of the candidate target, so it is called Local Template Matching.
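The threshold test can be written compactly as follows. The function name and the handling of values exactly equal to a threshold are assumptions, since the text only specifies the strict inequalities; the threshold values in the usage note are placeholders, not the ones used in the experiments.

```python
def classify_target(rho, t1, t2):
    """Classify the target state from the Bhattacharyya coefficient rho.

    rho < t1        -> "lost"
    t1 <= rho < t2  -> "occluded"  (boundary handling is an assumption)
    rho >= t2       -> "visible"
    """
    if not t1 < t2:
        raise ValueError("thresholds must satisfy t1 < t2")
    if rho < t1:
        return "lost"
    if rho < t2:
        return "occluded"
    return "visible"
```

For example, with placeholder thresholds `t1=0.5` and `t2=0.8`, a coefficient of 0.65 is classified as occluded, which is the state in which the local template matching step would be invoked.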

The final location (_{Min}(
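A minimal sketch of local template matching is shown here. It assumes a sum-of-squared-differences (squared Euclidean distance) criterion and an exhaustive search restricted to a small neighborhood of the MS result; the function names and the `radius` parameter are hypothetical.

```python
def ssd(img, tmpl, top, left):
    """Sum of squared differences between the template and an image region."""
    return sum((img[top + i][left + j] - tmpl[i][j]) ** 2
               for i in range(len(tmpl)) for j in range(len(tmpl[0])))


def local_template_match(img, tmpl, center, radius):
    """Search top-left positions within +/- radius of `center`.

    Returns ((top, left), score) for the position with the smallest SSD,
    or (None, None) if no position in the neighborhood fits in the image.
    """
    th, tw = len(tmpl), len(tmpl[0])
    best_score, best_pos = None, None
    for top in range(center[0] - radius, center[0] + radius + 1):
        for left in range(center[1] - radius, center[1] + radius + 1):
            # Skip positions where the template would leave the image.
            if top < 0 or left < 0 or top + th > len(img) or left + tw > len(img[0]):
                continue
            score = ssd(img, tmpl, top, left)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (top, left)
    return best_pos, best_score
```

Restricting the search to (2·radius + 1)² positions instead of the whole frame is what keeps the cost of this step small compared with global template matching.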

In order to improve the TTS response speed it is necessary to use the prediction method in the tracking scheme. Compared to the Kalman filter and particle filter, the linear prediction algorithm is less complex and offers moderate performance. In this paper we use the linear prediction method to get the predicted angular position of the target.

A simple method to estimate the location of the target in the image can be formulated by the following equation:

where (x_{t+1}, y_{t+1}) represents the estimated location of the target, and (Δx_{t+1}, Δy_{t+1}) represents the estimated shift vector from (x_{t}, y_{t}).

A more advanced algorithm formulates the shift vector (Δx_{t+1}, Δy_{t+1}) as a linear combination of the previous shift vectors {(Δx_{t−k}, Δy_{t−k}), (Δx_{t−k+1}, Δy_{t−k+1}), …, (Δx_{t}, Δy_{t})}, weighted by a group of fixed coefficients that are set offline.
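Both variants of linear prediction can be sketched in a few lines. The function name and the coefficient values are hypothetical; in particular, the default of reusing the most recent shift is an assumption about the simple method, and real coefficients would be tuned offline as the text states.

```python
def predict_next(positions, coeffs=None):
    """Predict the next image location from a history of locations.

    positions : list of (x, y) tuples, oldest first, at least two entries
    coeffs    : weights for the most recent shift vectors, oldest first;
                None means "reuse the last shift" (assumed simple method)
    """
    shifts = [(x1 - x0, y1 - y0)
              for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    if coeffs is None:
        coeffs = [1.0]
    if len(shifts) < len(coeffs):
        raise ValueError("not enough history for the given coefficients")
    recent = shifts[-len(coeffs):]
    # Predicted shift: linear combination of the recent shift vectors.
    dx = sum(a * s[0] for a, s in zip(coeffs, recent))
    dy = sum(a * s[1] for a, s in zip(coeffs, recent))
    x, y = positions[-1]
    return x + dx, y + dy
```

For example, `predict_next([(0, 0), (1, 0), (3, 0)], coeffs=[0.25, 0.75])` weights the latest shift more heavily than the older one.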

The 2D-Turntable's pitch and yaw angular deviation can be obtained by the following formula:
_{x}_{y}

A reliable PD controller is used for the tracking system, and the angular deviation _{m}
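A discrete PD control law with an optional feed-forward term can be sketched as follows. This is a generic textbook form under stated assumptions, not the controller used in the paper: the class name, the gains `kp`, `kd`, `kff`, and the use of a predicted target rate as the feed-forward signal are all hypothetical.

```python
class PDController:
    """Discrete PD controller with an optional feed-forward term."""

    def __init__(self, kp, kd, kff=0.0):
        self.kp, self.kd, self.kff = kp, kd, kff
        self.prev_error = None

    def update(self, error, dt, target_rate=0.0):
        """Return the control output for the current angular deviation.

        error       : angular deviation between optical axis and target
        dt          : sample period in seconds
        target_rate : predicted target angular rate (feed-forward input)
        """
        if self.prev_error is None:
            derivative = 0.0                    # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative + self.kff * target_rate
```

One controller instance would be used per axis (pitch and yaw), each fed with its own angular deviation.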

The kernel function has an important influence on the experimental results. In this paper the Epanechnikov kernel profile is used as:

The quantization function

Region of interest (ROI) is 20 × 20.
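For reference, the Epanechnikov profile named above is commonly written in the following form, shown here up to a normalization constant. This is the standard textbook expression and may differ from the authors' equation by that constant.

```latex
\begin{align}
k(x) &=
\begin{cases}
1 - x, & 0 \le x \le 1 \\
0, & x > 1
\end{cases} \\
% The derivative profile used in the mean shift update is constant on the support
g(x) &= -k'(x) =
\begin{cases}
1, & 0 \le x < 1 \\
0, & x > 1
\end{cases}
\end{align}
```

Because g is constant inside the window, the MS update with this profile reduces to a weighted average of the pixel positions in the region of interest.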

Four experiments have been implemented to test the above target tracking scheme. A wireless remote control car (as shown in

Two rectangular boxes appear in the following tracking image sequences. One represents the center of the optical system; the other represents the target location in the current image. The distance between the two boxes is used as the error signal to control the 2D-turntable. When the target is stationary, the two boxes should overlap.

The experimental results below show that the TTS designed in this paper is robust to target pose variation and occlusion. The system processes an image in 18.21 ms in total (about 54.9 frames/s), of which the fast MS consumes 14.6 ms, TM consumes 1.83 ms, and the other algorithms consume 1.78 ms. The time consumption statistics of the target tracking scheme are shown in

In this paper, a scheme that balances the robustness and real-time performance of a TTS is presented. A novel robust tracking algorithm combining MS with template matching (TM) has been proposed, which is robust to target pose variation and partial occlusion, and a fast MS-based target tracking scheme has been designed and implemented. The hardware-in-loop simulation shows that the image signal processing speed is >50 frames/s. The TTS presented in this paper uses a common CCD camera to acquire images, but some special applications use infrared (IR) CCD sensors or heterogeneous sensors, so fast target tracking techniques based on IR CCD or heterogeneous sensors are a direction for future research.

The authors thank the anonymous reviewers for their professional questions and constructive comments, which helped improve the quality of this manuscript. This paper was funded by the National Natural Science Foundation of China (No. 60904089) and the Fundamental Research Funds for the Central Universities of China. This research work was mainly carried out at Northwest Polytechnical University. The authors wish to acknowledge the important contributions made by Fengqi Zhou, Jun Zhou and Yanning Zhang.

The target tracking system structure chart.

Signal flow diagram of a typical target tracking system.

The structure of TDS642EVM.

2D turntable and video camera.

The structure of the TTS software.

Two images for fast MS

The template matching algorithm.

The scheme of the feed forward compensation-based PD controller.

The wireless remote control car.

Tracking with the traditional MS.

Tracking with the proposed method in case of pose variation.

Tracking with the proposed method in case of occlusion.

Tracking with the proposed method in case of pose variation under a complex scene.

The main performance features of the TDS642EVM.

Video In/Out | PAL/NTSC/SECAMS |
External Interface | RS232 UART |

Performance of the 2D turntable.

Maximum Speed | 10°/s |
Rotation Range | Pitch: ±20°; Yaw: ±80° |
Motor Type | Stepper Motor |
Maximum Torque | 2 Nm |

Comparison of CPU times for two cases.

 | Case 1: number of iterations | Case 1: CPU time | Case 2: number of iterations | Case 2: CPU time |
---|---|---|---|---|
Fast MS | 8 | 18.8 ms | 7 | 14.2 ms |
Standard MS | 26 | 61.1 ms | 28 | 60.8 ms |

Time consumption statistics of the TTS.

Fast MS (10 iterations) | 14.6 ms |
TM | 1.83 ms |
Other | 1.78 ms |
Total | 18.21 ms |