This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.

Model-free tracking is important for solving tasks such as moving-object tracking and action recognition in cases where no prior object knowledge is available. For this purpose, we extend the concept of spatially synchronous dynamics in spin-lattice models to the spatiotemporal domain to track segments within an image sequence. The method is related to synchronization processes in neural networks and based on superparamagnetic clustering of data. Spin interactions result in the formation of clusters of correlated spins, providing an automatic labeling of corresponding image regions. The algorithm obeys detailed balance. This is an important property as it allows for consistent spin-transfer across subsequent frames, which can be used for segment tracking. Therefore, in the tracking process the correct equilibrium will always be found, which is an important advance as compared with other more heuristic tracking procedures. In the case of long image sequences,

How can we make sense of a complex visual scene with little or no prior knowledge about its contents and the objects therein? Such problems occur, for example, if we wish to learn cause-effect relationships in a hitherto unknown environment. Conversely, many object definitions are only meaningful within the context of a given scenario and a set of possible actions.

Object tracking,

In the works described above, assumptions about the nature of the objects being tracked or about the data itself are made either in the tracking procedure (matching assumptions) or already in the segmentation step, for example by assuming a priori a data model of some kind. In our work, we aim to reduce a priori assumptions on the data by choosing a data-driven method,

Superparamagnetic clustering finds the equilibrium states of a ferromagnetic Potts model defined by an energy function in the superparamagnetic phase [

The equilibrium states of the Potts model have been approximated in the past using the Metropolis-Hastings algorithm with annealing [

In our method, tracking of segments is accomplished through simultaneous segmentation of adjacent frames which are linked using local correspondence information, e.g., computed via standard algorithms for optic flow [

Thus, the main contribution here is the development of a data-driven, model-free tracking algorithm,

The paper is structured as follows: In Section 2, we extend the method of superparamagnetic clustering in spin models to the temporal dimension and introduce the controller algorithm. We also discuss the method against the background of energy minimization methods in Markov random fields. In Section 3, we first verify the core algorithm using short image sequences because these are better suited for introducing and testing the method. We further investigate the sensitivity of the algorithm to system parameters and noise. Then, we demonstrate that segment tracking can be achieved for real movies. The performance of the method is quantified in terms of partitioning consistency along an artificial image sequence. In Section 4, the results are discussed.

Segment tracking can be roughly divided into the following subtasks: (i) image segmentation, (ii) linking (tracking), and (iii) stabilization (tracking). The third point acknowledges that segments, unlike objects, are not per se stable entities, but are sensitive to changes in the visual scene. Subtasks (i–ii) will be solved using a conjoint spin-relaxation process emulated in an n-dimensional (n-D) lattice, which defines the core algorithm (Section 2.1). Local correspondence information for linking is obtained using standard algorithms for either stereo or optic flow [

Since simultaneous segmentation of long image sequences is practically impossible due to the high computational costs, we usually split the image sequence into a sequence of pairs. For example, the subsequent frames t_{0}, t_{1} and t_{2} are split into two pairs {t_{0}, t_{1}} and {t_{1}, t_{2}}, where the last frame of the previous pair is identical to the first frame of the next pair. If a segment of the last frame of {t_{0}, t_{1}} and a segment of the first frame of {t_{1}, t_{2}} occupy the same image region, we can assign the same segment label to both segments. In this way, segments can be tracked through the entire sequence. Since the algorithm preserves detailed balance (Section 2.1), spins can be transferred from one frame to the next, greatly reducing the number of iterations required to achieve a stable segmentation.
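The pairwise splitting with spin transfer can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `overlapping_pairs` and `track_pairwise`, and the callable `relax` standing in for the spin-relaxation process, are hypothetical and introduced only for illustration.

```python
def overlapping_pairs(frames):
    """Split a frame sequence into consecutive pairs that share one frame."""
    return [(frames[k], frames[k + 1]) for k in range(len(frames) - 1)]


def track_pairwise(frames, relax, init_spins):
    """Relax each pair in turn, warm-starting from the previous pair.

    `relax(pair, spins_first, spins_second)` is a hypothetical placeholder
    for the relaxation process; it must return the final spin states of
    both frames of the pair.
    """
    results = []
    spins = init_spins
    for pair in overlapping_pairs(frames):
        # Both frames of the pair start from the spins of the shared frame.
        spins_first, spins_second = relax(pair, spins, spins)
        results.append((spins_first, spins_second))
        # The second frame of this pair is the first frame of the next pair.
        spins = spins_second
    return results


pairs = overlapping_pairs(["t0", "t1", "t2"])
```

Because each pair starts from an already relaxed configuration, the placeholder `relax` would be expected to need far fewer iterations for later pairs than for the first one.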

We further stabilize segment tracking by introducing a feedback controller (Section 2.2). In long image sequences, partitioning instabilities are likely to arise at some point during the tracking process. Thus, segments may be lost due to merging or splitting of segments. The feedback controller detects these kinds of instabilities and adjusts a control parameter of the core algorithm to recover the original segments.

The method of superparamagnetic clustering has been previously used to segment single images [

The aim of this work is to find corresponding image regions in image sequences, _{1}, .., _{M}

_{i}_{j}_{i}

if point _{i}_{i}_{j}

if _{i}, y_{i}, t_{i}_{r}_{i}_{i}, y_{i}_{i}, t_{i}_{r}_{i}_{i}_{i}_{i}, t_{i}_{i}_{i}

To perform this task, we assign a spin variable _{i}_{2}_{D}_{i}, y_{i}, t_{i}_{k}, y_{k},t_{k}_{2}_{D}_{nD}_{nD}_{i}_{j}_{ij}

We define for every bond on the lattice the distance
_{i}_{j}

The spin model is now implemented in such a way that neighboring spins with similar color have a tendency to align. We use a _{2}_{D}_{nD}_{2}_{D}_{nD}_{i}

We find the equilibrium spin configuration using a clustering algorithm. In a first step, “satisfied” bonds, _{i}_{j}_{ij}

The ECU algorithm computing the equilibrium of

Initialization: A spin value _{i}

Computing bond freezing probabilities: If two spins _{i}_{j}

Cluster identification: Pixels which are connected by frozen bonds define a cluster. A pixel belonging to a cluster

Cluster updating: We perform a Metropolis update [_{k}_{i}

The probability of choosing the new spin value _{k}_{k}_{2}_{D}_{k}_{j}_{nD}_{k}_{j}

Similar to a Gibbs sampler, the selecting probability
_{k}

Iteration: The new spin states are returned to step 2 of the algorithm, and steps 2–5 are repeated, until the total number of clusters stabilizes,
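The overall structure of steps 2 to 4 (bond freezing, cluster identification, cluster updating) can be sketched in Python for a single 2D image. This is a simplified illustration under stated assumptions, not the ECU algorithm itself: the coupling `1 - d / mean_diff`, the freezing probability `1 - exp(-coupling / temperature)`, and the function names are assumptions, and the energy-based Metropolis update of each cluster is replaced here by a plain Swendsen-Wang step that assigns every cluster a uniformly random spin.

```python
import numpy as np


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_sweep(spins, image, q, temperature, rng):
    """One Swendsen-Wang style sweep on a 4-connected 2D lattice."""
    h, w = spins.shape
    # Bonds between horizontal and vertical nearest neighbors.
    bonds = [((y, x), (y, x + 1)) for y in range(h) for x in range(w - 1)]
    bonds += [((y, x), (y + 1, x)) for y in range(h - 1) for x in range(w)]
    diffs = np.array([abs(float(image[a]) - float(image[b])) for a, b in bonds])
    mean_diff = diffs.mean() or 1.0  # avoid division by zero on flat images

    parent = list(range(h * w))
    for (a, b), d in zip(bonds, diffs):
        if spins[a] != spins[b]:
            continue  # only bonds between aligned spins can freeze
        coupling = 1.0 - d / mean_diff  # assumed coupling, not from the paper
        p_freeze = 1.0 - np.exp(-coupling / temperature) if coupling > 0 else 0.0
        if rng.random() < p_freeze:
            root_a = find(parent, a[0] * w + a[1])
            root_b = find(parent, b[0] * w + b[1])
            parent[root_a] = root_b  # merge the two clusters

    # Pixels connected by frozen bonds share a root, which defines a cluster.
    roots = np.array([find(parent, i) for i in range(h * w)]).reshape(h, w)
    new_spins = spins.copy()
    for root in np.unique(roots):
        # Simplification: uniform random spin instead of an energy-based update.
        new_spins[roots == root] = rng.integers(q)
    return new_spins, roots


rng = np.random.default_rng(0)
image = np.zeros((4, 4))
image[:, 2:] = 1.0  # two uniform halves with a sharp edge between them
spins = np.zeros((4, 4), dtype=int)
spins, clusters = cluster_sweep(spins, image, q=10, temperature=0.01, rng=rng)
```

At this low temperature, bonds inside each uniform half freeze essentially with certainty while bonds across the edge never freeze, so the toy image splits into one cluster per half.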

Segments are defined as groups of correlated spins and can be extracted using a thresholding procedure. All pairs of pixels connected by a bond (_{i}_{j}

In an earlier study, we provided evidence that this algorithm obeys

The consequence of detailed balance is that spin states can be transferred across image pairs: spins are computed for one pair (the first pair), and the pixels in the next two frames (the second pair) are then simply assigned these spins, from which a new relaxation process starts (see

The following should be noted. In this method, bonds between adjacent frames are created from the precomputed optic-flow or disparity maps and frozen with a probability that depends on the feature similarity of the respective corresponding pixels. Whether these bonds are frozen in the final configuration (

Segmentation instabilities arising during the tracking process can be partly removed by adjusting the temperature parameter of the core algorithm. The temperature choice affects the formation of segments; hence, a segment that was lost in a previous frame can sometimes be recovered by increasing the temperature for a certain period.
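As a rough illustration of temperature-based recovery, the following Python sketch raises the temperature for a fixed number of frames whenever the size of a monitored segment deviates strongly from its recent history. It is not the controller defined in this paper: the class name `SegmentSizeController`, the relative-change test, and all parameter names and values are hypothetical.

```python
from collections import deque


class SegmentSizeController:
    """Hypothetical controller that boosts the temperature after a size jump."""

    def __init__(self, base_temperature, boost=0.2, hold_frames=3,
                 max_rel_change=0.4, history_len=5):
        self.base_temperature = base_temperature
        self.boost = boost                    # temperature increase while recovering
        self.hold_frames = hold_frames        # number of frames the boost is held
        self.max_rel_change = max_rel_change  # assumed instability threshold
        self.history = deque(maxlen=history_len)
        self.countdown = 0

    def update(self, size):
        """Return the temperature to use after observing a segment size."""
        reference = sum(self.history) / len(self.history) if self.history else 0.0
        jump = abs(size - reference) / reference if reference > 0 else 0.0
        if jump > self.max_rel_change:
            # Sudden size change: start (or restart) the recovery period.
            self.countdown = self.hold_frames
        else:
            # Only stable observations enter the size history.
            self.history.append(size)
        if self.countdown > 0:
            self.countdown -= 1
            return self.base_temperature + self.boost
        return self.base_temperature


controller = SegmentSizeController(base_temperature=1.0, boost=0.5, hold_frames=2)
temperatures = [controller.update(s) for s in [100, 102, 98, 200, 101, 100, 99]]
```

Keeping unstable observations out of the history is a design choice of this sketch: the reference size then continues to describe the original segment, so the controller can tell when it has been recovered.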

The feedback controller tracks the size of the segments and reacts if the size of a segment changes suddenly. The first controller function
_{j}_{j}_{j}_{j}_{1} are constant parameters. The history of segment _{j}_{j}

Segmentation instabilities may cause a segment to be lost, for example through segment merging or splitting. We define two threshold parameters _{2} and _{3}. An unexpected segment loss is detected by the controller if the conditions

A schematic of the entire system,

The method described in this paper is compared with energy minimization in Markov random fields. Similarities and differences of the approaches are analyzed and summarized in this section. Particular attention is given to combinatorial graph cuts methods, which have provided powerful computer vision algorithms for stereo, motion, image restoration, and image segmentation in recent years [

The image segmentation problem has been previously formulated in terms of finding a discrete labeling _{p}_{smooth} = 0. We are assuming _{p,q}

Various techniques have been proposed to find the minimum energy configuration, e.g., simulated annealing [

The method of superparamagnetic clustering also formulates an energy function (see

Finding the equilibrium states in the Potts model is in general NP-hard. In practice, the Swendsen-Wang algorithm often shows fast convergence to the equilibrium state, unlike the Metropolis-Hastings algorithm, which usually requires exponential time. For certain instances of the Potts model, rapid mixing,

For the 2D Ising model without an external field,

We apply the algorithm to various synthetic and real image sequences. Unless otherwise indicated, the following parameter values _{2}_{D}_{nD}

We first use stereo image pairs and a three-frame motion sequence to test and verify the core method (Section 2.1) before applying the algorithm to long image sequences,

We first demonstrate the algorithm for a synthetic scene which contains a single, solid square, which is shifted by a disparity value of 40 pixels along the ^{2}. We estimate the disparity of the pixels by applying a stereo algorithm [

We investigate the sensitivity of the algorithm with regard to the parameter _{c}_{c}/N^{−4}. For a noise level of 0%, the performance of the algorithm is only weakly sensitive to changes in temperature (red line). However, when noise is added to the images, the algorithm becomes more sensitive to changes in temperature (blue line), although this effect quickly saturates with increasing noise levels (black and green lines). For each noise level, the segmentation results are depicted for

We further investigate the performance of the algorithm with respect to establishing correspondences on the example of the Cones stereo pair (^{2} image points. The plot demonstrates that the performance of the algorithm is higher for large segments than for small segments, confirming our expectation that color segmentation works best for large uniform image regions. In textured areas, corresponding to very small segment sizes, the performance of the algorithm decreases rapidly.

We also investigated the influence of errors in the precomputed disparity on the performance of the algorithm by replacing disparity values of the ground-truth map randomly by erroneous values ranging from 0 to

This stereo pair shows two views of a scene of cluttered objects, ^{2}. This stereo pair is demanding because of the amount of occlusion, the light reflections, shadows, and the large disparities, which lead to perspective distortions, posing a problem for approaches based on segment matching. The stereo algorithm returns reliable disparity values at the edges (

So far, we have validated our method using synthetic and real stereo pairs. Now we demonstrate that spatiotemporal synchronization of spins enables segments to be tracked through the frames of real movies.

We apply the core algorithm to three frames of a motion sequence showing a woman walking from the right to the left. The sequence was obtained from ^{2} (_{0} to frame _{1}, and from frame _{1} to frame _{2}, are depicted in

When analyzing long motion sequences, it is inefficient to apply the algorithm to all frames at once because the computational costs increase with the number of pixels. Hence, we split the sequences into pairs of frames, where the last frame of the previous sequence is identical to the first frame of the next sequence. Then, we initialize the spin states of each sequence with the final spin states of the previous sequence. The spin states for the first sequence containing frame _{0}_{1} after 100 iterations are shown in _{1} and _{2}, where the spin states of both frames have been initialized to the final spin states of frame _{1} of the previous sequence. The spin states after 13 iterations are shown in

The segments of adjacent image pairs are connected as follows. Two segments belonging to the segmentation of frame t_{1} of pair {t_{0}, t_{1}} and frame t_{1} of pair {t_{1}, t_{2}}, respectively, are assigned the same label if they occupy the same region in image frame t_{1}. For this purpose, we compute the percentage of the cluster areas which overlap in the image space. If the overlap is larger than a fixed threshold, the clusters are assigned the same label. This way we can track the segment through the whole sequence.
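The overlap-based linking described above can be sketched in Python as follows. The sketch rests on assumptions not specified in the text: overlap is measured here as intersection over union of the two segment areas, the threshold value `min_overlap=0.5` is arbitrary, and the function name `link_labels` is hypothetical.

```python
import numpy as np


def link_labels(labels_prev, labels_next, min_overlap=0.5):
    """Map each label of `labels_next` to the best-overlapping previous label.

    Both inputs are label maps of the same shared frame, obtained from the
    two adjacent pairs. Labels without a sufficient overlap stay unmapped.
    """
    mapping = {}
    for lab_next in np.unique(labels_next):
        mask_next = labels_next == lab_next
        best_label, best_overlap = None, 0.0
        # Only previous labels that touch this segment can overlap with it.
        for lab_prev in np.unique(labels_prev[mask_next]):
            mask_prev = labels_prev == lab_prev
            inter = np.logical_and(mask_prev, mask_next).sum()
            union = np.logical_or(mask_prev, mask_next).sum()
            overlap = inter / union  # assumed overlap measure
            if overlap > best_overlap:
                best_label, best_overlap = lab_prev, overlap
        if best_label is not None and best_overlap > min_overlap:
            mapping[int(lab_next)] = int(best_label)
    return mapping


labels_prev = np.array([[1, 1, 2, 2],
                        [1, 1, 2, 2]])
labels_next = np.array([[7, 7, 8, 8],
                        [7, 8, 8, 8]])
mapping = link_labels(labels_prev, labels_next)
```

In this toy example, segment 7 overlaps segment 1 with a ratio of 0.75 and segment 8 overlaps segment 2 with a ratio of 0.8, so both are relabeled; a segment with no sufficiently overlapping predecessor would be treated as new.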

We add feedback control (see Section 2.2) with parameters _{1}_{2} = 0.9, and _{3} = 0.6 to the core algorithm with temperature

The work of the feedback controller is further illustrated in

We further applied the algorithm to another movie, showing the filling of a cup with sugar (

We obtained similar segment-tracking results for other real movies, e.g., Moving Object, Making Sandwich, Opening a Book. Results can be found at

We use an artificial image sequence to demonstrate that both 3D linking and feedback control improve consistency of the partitioning into segments of adjacent frames. The original image consists of 4 × 4 uniformly-valued squares. By adding Gaussian noise to the image, we create an image sequence of 40 frames (see

We presented an algorithm for model-free segment tracking based on a novel, conjoint framework, combining local correspondences and image segmentation to synchronize the segmentation of adjacent images. The algorithm provides a partitioning of the image sequence into segments, such that points in a segment are more similar to each other than to points in another segment, and such that corresponding image points belong to the same segment. We tested the method on various synthetic and real image sequences, and showed stable and reliable results overall, thus fulfilling the most important requirement of segmentation algorithms. The method leads to the formation of stable region correspondences despite largely incomplete disparity or optic-flow maps. Similar algorithms for the extraction of region correspondences could potentially be constructed using other image segmentation algorithms,

We further introduced a feedback controller that allows segmentation instabilities to be detected,

Segment tracking has been performed previously in the context of video segmentation [

There have been a few other approaches combining image segmentation with correspondence information. The work by Toshev

The controller employed in this model serves to detect and remove segmentation instabilities. No assumptions about the objects giving rise to the measurements,

The method described here is related to energy minimization in Markov random fields which has been used to solve vision problems many times before [

The algorithm has potential applications in model-free moving object detection and tracking by merging coherently moving segments (Gestalt law of common fate). The method is further applicable to action-recognition tasks, where certain characteristic action patterns are inferred from the spatiotemporal relationships of segments. First results for this problem are reported in [

Currently, the algorithm requires ≈ 4-5 s per frame for images of size 160 × 140 pixels and ≈ 43 s per frame for images of size 360 × 240 pixels (Taking-an-apple sequence) on an Intel Dual Core CPU with 3.16 GHz RAM (for each core). Since our goal is the development of a vision-front end for real-time video segment tracking on top of which other algorithms,

We thank Sinan Kalkan for valuable discussion. The work has received support from the German Ministry for Education and Research (BMBF) via the Bernstein Center for Computational Neuroscience (BCCN) Göttingen under Grant No. 01GQ0430 and the EU Project PACO-PLUS under Contract No. 027657.

Solid-square stereo pair.

Sensitivity analysis.

Cluttered-objects stereo pair.

_{0}, _{1}, and _{2}, respectively. _{0} to _{1}, and _{1} to _{2}. _{0} and _{1}. _{1} and _{2}. _{0} and _{1} (dashed line) and for the second sequence containing frame _{1} and _{2} (solid line).

Feedback control for segmentation stabilization.

Segment tracking for real movies.

Improving partitioning consistency using 3D linking and feedback control.