
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In this paper, we propose an application of a compressive imaging (CI) system to wide-area video surveillance. A parallel coded aperture compressive imaging system is proposed to reduce the required resolution of the coded masks and to facilitate storage of the projection matrix. Random Gaussian, Toeplitz and binary phase-coded masks are utilized to obtain the compressive sensing images. Motion target detection and tracking algorithms that operate directly on the compressive measurements are developed. A mixture of Gaussians is applied in the compressive measurement space to model the background image and detect the foreground. Each motion target in the compressive sampling domain is sparsely represented over a compressive feature dictionary spanned by target templates and noise templates, and an ℓ1 optimization algorithm is used to solve for the sparse template coefficients. Experimental results demonstrate that a low-dimensional compressed imaging representation is sufficient to determine spatial motion targets. Compared with random Gaussian and Toeplitz phase masks, motion detection with a random binary phase mask yields better detection results, whereas random Gaussian and Toeplitz phase masks achieve higher-resolution reconstructed images. Our tracking algorithm runs in real time, up to 10 times faster than the ℓ1 tracker, without any optimization.

In the field of computer vision, video surveillance has long been an important tool in a variety of security applications. The challenge in video surveillance systems is that conventional imaging approaches can result in overwhelming data bandwidths. To solve this problem, researchers generally compress the high-resolution video streams with various data compression algorithms to reduce the overall bandwidth to a more manageable level. However, the optics and photodetector hardware must still operate at the native bandwidth, which wastes valuable sensing resources and increases overall system cost. In fact, in video surveillance systems moving objects occupy only a small part of the full image, and a large portion of any captured frame is redundant, such as the static background that is repeated in every frame. We thus pose the following question: could we directly acquire compressed images during the collection process, while ensuring that relevant information is preserved, and use only these compressive measurements for detection and tracking of moving objects?

The emerging theory of compressive sensing (CS) demonstrates that it is possible to reconstruct signals exactly, or robustly approximate them, from far fewer samples than the Shannon sampling theorem requires, provided the signals are sparse in some linear transform domain [

The main contributions of this research can be summarized in three aspects. First, we propose a coded aperture lens array optical system to realize CS imaging; this architecture effectively reduces the required resolution of the coded masks and facilitates storage of the projection matrix. Second, we describe a motion detection algorithm that operates directly on CI data without recovering traditional images; a mixture of Gaussians is applied to model the background image directly in the CS space. Third, a real-time CS ℓ1 tracking algorithm, which is 10 times faster than the ℓ1 tracking method, is proposed.

The rest of this paper is organized as follows: in Section 2, related work on compressive sensing theory, state-of-the-art CS imaging, and motion detection and tracking algorithms using CS theory is reviewed. In Section 3, CS imaging based on the coded aperture lens array system is discussed. In Sections 4 and 5, motion detection and tracking algorithms that operate directly in the compressive sampling space are developed. Experimental results for our CI optical system and the motion detection and tracking methods are presented in Section 6. In Section 7 we draw some conclusions from the results of our simulation study.

Consider a scene represented as a vector x ∈ ℝ^N. Compressive sensing acquires linear measurements of the scene,

y = Φx,

or, written element-wise:

y_i = ⟨φ_i, x⟩, i = 1, 2, …, M,

where the dimensions of the projection matrix Φ are M × N with M ≪ N.

In many cases the scene is not sparse itself but is sparse in some basis Ψ = [ψ_1, ψ_2, …, ψ_N], i.e., x = Ψs with only a few nonzero entries in s; the effective sensing matrix is then A = ΦΨ ∈ ℝ^{M×N}.
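The acquisition model above can be sketched numerically as follows; the dimensions, the identity sparsifying basis, and the Gaussian projection matrix are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 8          # scene size, measurements (M << N), sparsity

# K-sparse coefficient vector s; identity sparsifying basis for simplicity
s = np.zeros(N)
s[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Psi = np.eye(N)
x = Psi @ s                   # the scene, x = Psi s

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian projections

y = Phi @ x                   # M compressive measurements
A = Phi @ Psi                 # effective sensing matrix A = Phi Psi

print(y.shape)                # (64,)
```

Each measurement y_i is simply the inner product of one row of Φ with the scene, so the sensor records M numbers instead of N pixels.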

CS addresses the problem of solving for x (equivalently, its sparse coefficient vector s) from the underdetermined linear system y = Φx.

Candès and Tao [ ] showed that, under suitable conditions on the sensing matrix, the sparse coefficients can be recovered by solving the following ℓ2–ℓ1 minimization problem:

min_s (1/2)‖y − As‖₂² + λ‖s‖₁.

Here the regularization parameter λ > 0 helps to overcome the ill-posedness of the problem, and the ℓ1 penalty term drives small components of s to zero, yielding a sparse solution.
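A minimal sketch of one standard solver for this ℓ2–ℓ1 objective is iterative soft-thresholding (ISTA); ISTA is an illustrative choice, not necessarily the solver used in the paper, and the problem sizes below are arbitrary:

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=500):
    """Iterative soft-thresholding (ISTA) for
    min_s 0.5*||y - A s||_2^2 + lam*||s||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = s - A.T @ (A @ s - y) / L        # gradient step on the smooth term
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s

rng = np.random.default_rng(1)
N, M, K = 128, 48, 5
s_true = np.zeros(N)
s_true[rng.choice(N, K, replace=False)] = 1.0
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ s_true

s_hat = ista(A, y, lam=0.01, n_iter=2000)
print(np.linalg.norm(A @ s_hat - y))         # small residual: y is explained sparsely
```

The soft-threshold step is exactly the proximal operator of the ℓ1 penalty, which is what drives small coefficients to zero.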

Compared with conventional camera architectures, the CI camera is specifically designed to exploit the CS framework for imaging. For example, the single pixel camera designed by Rice University differs fundamentally from a conventional camera [

In summary, all the aforementioned compressive sampling strategies share the following feature: each measurement y_i of y = [y_1, y_2, …, y_m]^T is an inner product between the scene x = [x_1, x_2, …, x_n]^T and a distinct test function φ_i.

In surveillance systems, background subtraction is commonly used for segmenting out objects of interest in a scene. However, background subtraction techniques may require complicated density estimates for each pixel, which becomes burdensome for high-resolution images. In fact, performing background subtraction on compressed images, such as MPEG images, is not novel. In [

Compared with the information that is ultimately of use, researchers have begun to question whether such a large amount of image data is truly necessary, and new motion target detection and tracking strategies are being developed. With the emergence of CS theory, researchers have begun to design motion detection and tracking algorithms that use CS data. For example, [ ] projects each frame x_i onto a matrix Φ_i to obtain compressive measurements y_i; a Kalman filter in the compressive domain is then used to estimate signal changes. This algorithm is only suitable for stationary or slowly-moving objects in surveillance scenarios. Wang et al. proposed the ℓ1 tracker, in which each motion target is expressed as a sparse representation over multiple pre-established templates. The ℓ1 tracker demonstrates promising robustness compared with a number of existing trackers; however, its computational complexity hinders real-time application.

Developing practical optical systems that exploit CS theory is a significant challenge. Researchers have proposed several CS imaging architectures and have tested them in the laboratory (see Section 2.2). As Stern noted in [ ], for a high-resolution scene (e.g., N on the order of 10^6 pixels) a CI system must store a projection matrix Φ of size M × N, with up to roughly 10^12 entries. Storage of, and computation with, such an M × N matrix is prohibitive. By partitioning the aperture into B sub-apertures, each block requires only a small M_B × N_B projection matrix, which greatly reduces both the required resolution of each coded mask and the storage of the projection matrix.
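The storage saving from block partitioning can be made concrete with back-of-the-envelope arithmetic; the scene size, the 50% sampling rate, and B = 256 blocks below are illustrative assumptions, not the paper's exact numbers:

```python
# Back-of-the-envelope storage comparison for the projection matrix.
N = 2 ** 20                      # ~10^6 scene pixels (illustrative)
M = N // 2                       # 50% measurement rate (illustrative)

full_entries = M * N             # one global M x N projection matrix

B = 256                          # number of 4f sub-apertures (blocks)
N_B, M_B = N // B, M // B
block_entries = B * M_B * N_B    # B small per-block matrices

print(full_entries // block_entries)   # 256: storage shrinks by a factor of B
```

In general the ratio is (M·N) / (B·(M/B)·(N/B)) = B, so doubling the number of blocks halves nothing per block but doubles the overall saving factor.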

For each 4f subsystem, the action of the phase-coded mask can be considered as implementing a linear projection across one block of the original scene. The data collected by a compressive imaging 4f subsystem for that block is represented as:

y_B = h * x_B,

where * denotes convolution, x_B is the scene block and h is the point spread function of the phase-coded mask. In the Fourier domain this becomes:

y_B = F⁻¹(F_h ⊙ F(x_B)),

where F and F⁻¹ denote the 2D Fourier transform and its inverse, F_h is the transfer function of the phase-coded mask, and ⊙ denotes element-wise multiplication; the equivalent per-block projection matrix is Φ_B = F⁻¹ diag(F_h) F.
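One 4f block measurement can be sketched as follows, assuming the mask acts as a pure phase screen in the Fourier plane; the block size and binary phase values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
B = 64                                       # block size of one 4f sub-aperture

x_block = rng.random((B, B))                 # one block of the scene

# Random binary phase mask in the Fourier plane of the 4f system
phase = rng.choice([0.0, np.pi], size=(B, B))
F_h = np.exp(1j * phase)                     # transfer function of the coded mask

# Block measurement: inverse FFT of the masked spectrum (linear in x_block)
y_block = np.fft.ifft2(F_h * np.fft.fft2(x_block))

# Linearity check: doubling the scene doubles the measurement
y_double = np.fft.ifft2(F_h * np.fft.fft2(2 * x_block))
print(np.allclose(y_double, 2 * y_block))    # True
```

Because the whole pipeline is FFT → element-wise mask → inverse FFT, the measurement operator is linear, which is the property the detection and tracking algorithms in later sections rely on.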

As previously mentioned, our CI system will segment the CS image into small blocks by using lens arrays. In this section we will demonstrate the method by which to detect CS motion targets directly for each CS imaging block without performing any recovery algorithm. This motion detection algorithm in the CS space is robust and has low computational cost, which will make it suitable for embedded systems.

For motion detection algorithms, background images are generally assumed to be temporally stationary, whereas moving (foreground) objects change over time. Suppose that x_b is the background image, x_t is the current frame at time t, and x_d = x_t − x_b is the foreground difference image. Because the measurement process is linear, the compressive measurements satisfy:

y_d = y_t − y_b,

where y_t = Φx_t, y_b = Φx_b and y_d = Φx_d. Since x_d is sparse (moving objects occupy only a small part of the scene), it can be recovered from y_d by solving the ℓ2–ℓ1 minimization problem [

The foreground image x_d can thus be reconstructed by ℓ2–ℓ1 minimization; however, reconstructing the foreground image frame by frame is time consuming. Can we detect the moving object directly in the compressive domain without recovering the foreground image? If so, the computational cost and energy consumption of surveillance systems would be dramatically reduced.
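The linearity that makes compressive background subtraction possible can be checked with a minimal numerical sketch; the dimensions and the synthetic "moving object" below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 1024, 256
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # shared measurement matrix

x_b = rng.random(n)                  # static background scene
x_t = x_b.copy()
x_t[100:140] += 0.8                  # a small synthetic moving object

y_b = Phi @ x_b                      # background measurement (computed once)
y_t = Phi @ x_t                      # current-frame measurement
y_d = y_t - y_b                      # foreground measurement, no image recovery

print(np.allclose(y_d, Phi @ (x_t - x_b)))   # True by linearity of Phi
```

Subtracting stored background measurements is an m-dimensional operation, so the cost is independent of the full image resolution n.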

The Gaussian background model is often used to segment the foreground and background in conventional motion detection algorithms, where each pixel is modeled by a Gaussian distribution N(μ, σ²). The key observation is that Gaussianity survives the linear CS measurement process: if x_1 and x_2 are independent Gaussian random variables with means μ_1, μ_2 and standard deviations σ_1, σ_2, then any linear combination of independent Gaussian variables is also Gaussian distributed:

y = Σ_i a_i x_i ~ N(Σ_i a_i μ_i, Σ_i a_i² σ_i²).
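The closure of Gaussians under linear combination can be checked empirically; the coefficients and moments below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
a = np.array([0.3, -1.2, 0.7])            # one row of the projection matrix
mu = np.array([5.0, 2.0, -1.0])           # per-pixel means
sigma = np.array([1.0, 0.5, 2.0])         # per-pixel standard deviations

# Empirical distribution of the projected measurement y = sum_i a_i x_i
samples = rng.normal(mu, sigma, size=(200_000, 3)) @ a

print(np.isclose(samples.mean(), a @ mu, atol=0.02))                # True
print(np.isclose(samples.var(), (a ** 2) @ sigma ** 2, atol=0.05))  # True
```

This is why a per-measurement Gaussian (mixture) model remains valid in the compressive domain: each measurement is exactly such a linear combination of pixel values.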

Using K Gaussian distributions, the probability density function of each compressive measurement at time t is modeled as:

P(y_{i,j,t}) = Σ_{k=1}^{K} ω_{i,j,t}^k · η(y_{i,j,t}; μ_{i,j,t}^k, σ_{i,j,t}^k),

where ω_{i,j,t}^k, μ_{i,j,t}^k and σ_{i,j,t}^k are the weight, mean and standard deviation of the k-th Gaussian component for measurement (i, j) at time t, and η(·) denotes the Gaussian probability density function.

When a compressive measurement matches one of the Gaussian distributions, the weight ω_{i,j,t} of that distribution is increased while the weights of the unmatched distributions are decreased, and the mean μ_{i,j,t} and standard deviation σ_{i,j,t} of the matched distribution are updated toward the new measurement.

With a static background and constant lighting, only additive Gaussian noise is incurred in the sampling process, and the density of the background image can be described by a Gaussian distribution centered at the mean pixel value. However, most surveillance videos involve lighting changes, shadows, slow-moving objects, and objects introduced to or removed from the scene. It is therefore necessary to update the background model continuously; otherwise, errors in the background accumulate over time and eventually trigger false detections.

To update the background, the model parameters for measurement y_{i,t+1} at time instant t + 1 are updated as:

ω_{k,t+1} = (1 − α)ω_{k,t} + α M_{k,t},
μ_{t+1} = (1 − ρ)μ_t + ρ y_{t+1},
σ²_{t+1} = (1 − ρ)σ²_t + ρ(y_{t+1} − μ_{t+1})²,

where α is the learning rate, ρ = α η(y_{t+1}; μ_k, σ_k), and M_{k,t} is 1 for the matched distribution and 0 otherwise. If y_{i,t+1} matches one of the K distributions, that matched distribution is updated as defined above. Otherwise, the distribution with the smallest weight is discarded and replaced by a new distribution initialized to this measurement's value.
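The update step above can be sketched as follows. This is a simplified Stauffer–Grimson style rule: the learning rate, the 2.5σ match threshold, and the re-initialization values are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def mog_update(y, w, mu, var, alpha=0.01, match_th=2.5):
    """One simplified Stauffer-Grimson style update for a single compressive
    measurement y against K Gaussian modes (weights w, means mu, variances var).
    Returns updated parameters and a foreground flag."""
    d = np.abs(y - mu) / np.sqrt(var)              # deviation in std units
    matched = np.flatnonzero(d < match_th)
    if matched.size:
        k = matched[np.argmax(w[matched])]         # strongest matching mode
        m = np.zeros_like(w)
        m[k] = 1.0
        w = (1 - alpha) * w + alpha * m            # weight update
        rho = alpha * np.exp(-0.5 * d[k] ** 2) / np.sqrt(2 * np.pi * var[k])
        mu[k] += rho * (y - mu[k])                 # pull mean toward y
        var[k] = (1 - rho) * var[k] + rho * (y - mu[k]) ** 2
    else:
        k = np.argmin(w)                           # replace the weakest mode
        w[k], mu[k], var[k] = 0.05, y, 900.0       # illustrative re-init values
    return w / w.sum(), mu, var, matched.size == 0

w = np.array([0.6, 0.3, 0.1])
mu = np.array([10.0, 50.0, 90.0])
var = np.full(3, 25.0)
w, mu, var, fg = mog_update(11.0, w, mu, var)
print(fg)   # False: 11.0 matches the first (background) mode
```

A measurement far from every mode (e.g., 200.0 against the modes above) would instead replace the weakest mode and be flagged as foreground.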

As described in [

The other distributions are considered to represent the foreground. At time t, a measurement that is not matched by any of the background distributions is declared foreground.

The ℓ1 tracker represents each target candidate y ∈ ℝ^d as a sparse linear combination over a feature dictionary A = [T, I, −I] ∈ ℝ^{d×(Nt+2d)} spanned by the target template set T ∈ ℝ^{d×Nt} and positive and negative trivial (noise) templates.

They use a particle filter to estimate the posterior distribution of the target state s_t given the observations up to time t. Each candidate observation y = [y^1, y^2, …, y^n]^T is approximated by the template set T = [T^1, T^2, …, T^{Nt}].

An ℓ1 minimization is then solved for each candidate to obtain its sparse coefficients over the dictionary, and the candidate best explained by the target templates is taken as the tracking result.

A template update scheme is subsequently employed to reduce drift. The main problem of the ℓ1 tracker is its computational cost, which grows with the dimensionality d of the templates in T. In our method, the templates and each candidate y are projected into the compressive domain with the same measurement matrix:

y′ = Φ′y, A′ = Φ′A,

where Φ′ ∈ ℝ^{m×d} with m ≪ d. The sparse coefficients are then solved in the low-dimensional space by standard ℓ1–ℓ2 algorithms:

min_c (1/2)‖y′ − A′c‖₂² + λ‖c‖₁.

The compressive feature dictionary A′ ∈ ℝ^{m×(Nt+2d)} is therefore much smaller along its measurement dimension (m ≪ d), which substantially reduces the cost of each ℓ1 minimization.
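Building the compressive dictionary can be sketched as follows; the template size, number of templates, and compressed dimension are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d, Nt, m = 32 * 32, 10, 128       # template dim, #target templates, compressed dim

T = rng.random((d, Nt))                            # vectorised target templates
A = np.hstack([T, np.eye(d), -np.eye(d)])          # d x (Nt + 2d) full dictionary

Phi = rng.standard_normal((m, d)) / np.sqrt(m)     # shared measurement matrix
A_c = Phi @ A                                      # m x (Nt + 2d) compressive dictionary

# A candidate observation is compressed with the same Phi before sparse coding
y_c = Phi @ (T[:, 3] + 0.01 * rng.standard_normal(d))

print(A_c.shape)   # (128, 2058): rows shrink from d = 1024 to m = 128
```

Because both the candidate and the dictionary are projected by the same Φ′, the sparse coding problem keeps its structure while every matrix-vector product becomes roughly d/m times cheaper.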

After observing the tracking result, the weights of the target templates in T are updated: the template most similar to the tracking result has its weight increased, and when the similarity between the tracking result and the best template falls below a threshold, the template with the smallest weight is replaced by the current tracking result.

Romberg has proven that random Toeplitz or Gaussian matrices are incoherent with any fixed orthonormal basis Ψ with high probability [

A total variation (TV) optimization algorithm is used to reconstruct the original image from compressive measurements [

From y_d = y_t − y_b, the foreground measurement y_d is obtained by simply subtracting the background measurement y_b from the current-frame measurement y_t in the compressive domain.

As presented earlier, we utilize a mixture of Gaussians to model the background, and the foreground detection algorithm described in Section 4.3 is used to declare motion objects in the compressive sampling space. The motion detection algorithms using random binary, Gaussian and Toeplitz phase masks are denoted by RB, RG and RT, respectively, in this paper. A compressive measurement y is declared foreground when:

|y − μ| > T_bu · σ,

where T_bu is the detection threshold and μ, σ are the mean and standard deviation of the background model.
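The per-measurement test can be sketched as below, assuming the deviation test takes the standard form |y − μ| > T_bu·σ; the background parameters and sample values are illustrative:

```python
import numpy as np

def detect_foreground(y_t, mu_b, sigma_b, t_bu=2.0):
    """Flag compressive measurements deviating more than t_bu standard
    deviations from the background model."""
    return np.abs(y_t - mu_b) > t_bu * sigma_b

mu_b = np.zeros(8)                      # background means (illustrative)
sigma_b = np.ones(8)                    # background standard deviations
y_t = np.array([0.1, -0.3, 5.0, 0.2, -4.2, 0.0, 1.1, -0.9])

print(detect_foreground(y_t, mu_b, sigma_b))   # flags elements 2 and 4 only
```

Raising T_bu trades missed detections for fewer false alarms, which is exactly the threshold sweep evaluated by the AUC metric below.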

We employ the Area Under Curve (AUC) metric to evaluate the performance of our motion detection algorithm.

To evaluate the performance of our tracking algorithm, three videos were used in the experiments. The first test sequence is an infrared (IR) image sequence that was also used in [ ]. We recorded the running speed of the ℓ1 tracker and our CS tracker for each test experiment. Our CS tracker is faster than the ℓ1 tracker even without any dimensionality-reduction operation, and as the sampling rate decreases, our CS tracker becomes up to 10 times faster than the ℓ1 tracker.

From the experimental results we can see that the computation of our CS-ℓ1 tracking algorithm is much cheaper. First, reducing the dimensionality of the templates speeds up the optimization process. Second, and probably most importantly, our method lowers the rank of the feature dictionary matrix: the dictionary of our CS tracker is smaller than that of the ℓ1 tracker, which noticeably accelerates iteration convergence and hence makes it faster than its counterpart.

Intuitively, tracking accuracy will decrease as the sampling rate is reduced, so we also compare the tracking accuracy of our tracker against the ℓ1 tracker. For the PetsD2 video sequence, the red points are the trajectories of the motion target computed using the ℓ1 tracker; the cyan, blue and green points are the positions computed using our method with sampling rates of 22%, 55% and 100%, respectively. As illustrated in

We have demonstrated that by using a CI system we can detect and track objects in motion with significantly fewer data samples than conventional imaging methods require. A parallel coded aperture imaging array, based on a phase-coded 4f system, is used to simulate compressive sensing images. A Gaussian mixture model is generated off-line for later use in on-line foreground detection directly in the compressive domain, and a TV optimization algorithm is used for image reconstruction. A real-time CS tracking algorithm is proposed and applied to the compressive sensing images. For the compressive imaging system, experimental results show that as the measurement rate decreases, the quality of the recovered image gradually degrades. Compared with the random binary mask, simulation results show that random Gaussian or Toeplitz phase masks achieve higher-resolution reconstructed images. Motion detection experiments demonstrate that a low-dimensional compressed imaging representation is sufficient to determine spatial motion targets; the minimum number of measurements needed to perform motion detection in the compressive domain is smaller than the number needed to recover the background and test images. Motion tracking results show that we can construct a compressive dictionary and use it as a template set in the CS image space. With the same ℓ1 reconstruction algorithm, our CS tracking method is 10 times faster than the ℓ1 tracking method.

This work is supported by the National Basic Research Program of China (2010CB732505) and the National Natural Science Foundation of China (60903070, 61271375, 60903069, 60902103).


Calculation of CS motion target.

Detection and tracking framework using CS images.

Different mask types.

Original image and the corresponding compressive image via the Matlab simulation platform.


Energy curves computed in a 64 × 64 CI block using different phase masks with sampling rates of 70%, 50% and 10%, respectively.

The tracking results with our CS tracker.

The positions of motion targets computed using our method and the ℓ1 tracker for the Pets sequences.

Reconstruction performance (PSNR, dB) with different phase-coded mask styles; columns correspond to decreasing measurement rates.

Binary | 32 | 15.9 | 13 | 10.3 | 7.2 | 5.7 |
Gaussian | 32.1 | 26.6 | 20.3 | 14.4 | 9.2 | 7.4 |
Toeplitz | 32 | 25.7 | 19.5 | 14.1 | 9.0 | 7.3 |

AUC for motion detection using different thresholds T_bu and sampling rates of 50% and 10%.

T_bu | 0.975 | 0.8875 | 0.9375 | 0.9625 | 0.825 | 0.8 |
T_bu | 0.975 | 0.9625 | 0.9625 | 0.95 | 0.9625 | 0.9625 |
T_bu | 0.9375 | 0.95 | 0.95 | 0.925 | 0.95 | 0.95 |

The running speed of the ℓ1 tracker (first data column) and our CS tracker at decreasing sampling rates (remaining columns), with 300 particles.

IR image | 4.6 s | 1 s | 0.77 s | 0.56 s | 0.50 s | 0.45 s |
CAVIAR | 4.79 s | 0.91 s | 0.68 s | 0.61 s | 0.55 s | 0.51 s |
Pets | 5.14 s | 0.72 s | 0.63 s | 0.57 s | 0.51 s | 0.47 s |