<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/s121114397</article-id>
<article-id pub-id-type="publisher-id">sensors-12-14397</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>A Coded Aperture Compressive Imaging Array and Its Visual Detection and Tracking Algorithms for Surveillance Systems</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Jing</given-names></name><xref ref-type="corresp" rid="c1-sensors-12-14397"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Yongtian</given-names></name></contrib>
<contrib contrib-type="author">
<name><surname>Wu</surname><given-names>Hanxiao</given-names></name></contrib>
<aff id="af1-sensors-12-14397">Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China; E-Mails: <email>wyt@bit.edu.cn</email> (Y.W.); <email>whx0647@163.com</email> (H.W.)</aff></contrib-group>
<author-notes>
<corresp id="c1-sensors-12-14397">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>chen74jing29@bit.edu.cn</email>; Tel.: +86-010-689-125-6515.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>10</month>
<year>2012</year></pub-date>
<volume>12</volume>
<issue>11</issue>
<fpage>14397</fpage>
<lpage>14415</lpage>
<history>
<date date-type="received">
<day>17</day>
<month>07</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>17</day>
<month>09</month>
<year>2012</year></date>
<date date-type="accepted">
<day>15</day>
<month>10</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>In this paper, we apply a compressive imaging system to the problem of wide-area video surveillance. A parallel coded aperture compressive imaging system is proposed to relax the resolution required of the coded mask and to facilitate the storage of the projection matrix. Random Gaussian, Toeplitz and binary phase coded masks are used to obtain the compressive sensing images. Motion target detection and tracking algorithms that operate directly on the compressive sampling images are developed. A mixture of Gaussians is applied in the compressive image space to model the background image and to detect the foreground. Each motion target in the compressive sampling domain is sparsely represented over a compressive feature dictionary spanned by target templates and noise templates. An <italic>l</italic><sub>1</sub> optimization algorithm is used to solve for the sparse coefficients of the templates. Experimental results demonstrate that a low-dimensional compressed imaging representation is sufficient to determine spatial motion targets. Compared with the random Gaussian and Toeplitz phase masks, motion detection algorithms using a random binary phase mask yield better detection results. However, the random Gaussian and Toeplitz phase masks yield high-resolution reconstructed images. Our tracking algorithm achieves real-time speed and is up to 10 times faster than the <italic>l</italic><sub>1</sub> tracker without any optimization.</p></abstract>
<kwd-group>
<kwd>compressive imaging</kwd>
<kwd>coded aperture</kwd>
<kwd>compressive sensing</kwd>
<kwd>motion detection and tracking</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>In the field of computer vision, video surveillance is always an important tool in a variety of security applications. The challenge in video surveillance systems is that the use of conventional imaging approaches in such applications can result in overwhelming data bandwidths. To solve this problem, researchers generally compress those high-resolution video streams by using various data compression algorithms to reduce the overall bandwidth to a more manageable level. However, the optics and photo detector hardware must still operate at the native bandwidth, which seriously wastes valuable sensing resources and increases overall system cost. In fact, in video surveillance systems moving objects occupy only a small part of the full image, and a large portion of any obtained image data is redundant, such as the static background in the field of view that is repeated in every frame. We thus pose the following question: could we directly obtain compressed images during the collection process while ensuring that relevant information is preserved, only using these compressive measurements for detection and tracking of objects in motion?</p>
<p>The emerging theory of compressive sensing (CS) demonstrates that signals can be reconstructed exactly, or approximated robustly, from far fewer samples than the Shannon sampling theorem implies, provided that the signals are sparse in some linear transform domain [<xref ref-type="bibr" rid="b1-sensors-12-14397">1</xref>,<xref ref-type="bibr" rid="b2-sensors-12-14397">2</xref>]. In fact, almost all natural images are sparse or compressible in a suitable transform domain. Based on this observation, a new research direction on compressive imaging (CI) has been developed [<xref ref-type="bibr" rid="b3-sensors-12-14397">3</xref>]. The objective of a compressive imager is to design optical sensors that collect linear random projections of a scene onto a small focal plane array and allow sophisticated computational methods to recover the original scene image. CI has valuable implications for image acquisition, especially in fields with limited power, communication bandwidth and image sensor hardware, such as distributed camera networks, camera arrays and IR or UV cameras, and several promising compressive optical imaging architectures have been proposed. Although CI is rapidly becoming viable for real-world sensing applications, little attention has been paid to motion target detection and tracking using compressive sampling images, which could be an important application field of practical compressive imaging systems. In this paper, our goal is to optimize the optical CS imaging process not only to collect data in a compressed format, but also to perform motion target detection and tracking directly in a CI surveillance system.</p>
<p>The main contributions of this research can be summarized in the following three aspects. First, we propose a coded aperture lens array optical system to realize CS imaging. This architecture relaxes the resolution required of the coded mask and facilitates the storage of the projection matrix. Second, we describe a motion detection algorithm that operates directly on CI data without recovering traditional images. A mixture of Gaussians is applied to model the background image directly in the CS space. Third, we propose a real-time CS <italic>l</italic><sub>1</sub> tracking algorithm that is 10 times faster than the <italic>l</italic><sub>1</sub> tracking method.</p>
<p>The rest of this paper is organized as follows: in Section 2 the related work on the compressive sensing theory, state of the art CS imaging and motion detection and tracking algorithms using CS theory is reviewed. In Section 3, CS imaging based on the coded aperture lens array system is discussed. In Sections 4 and 5, motion detection and tracking algorithms applied directly on compressive sampling space are exploited. Experimental results for our CI optical system and the motion detection and tracking methods are presented in Section 6. In Section 7 we draw some conclusions from the results of our simulation study.</p></sec>
<sec>
<label>2.</label>
<title>Related Work</title>
<sec>
<label>2.1.</label>
<title>Background of CS</title>
<p>Consider a scene represented as a vector <italic>X</italic> of length <italic>N</italic>. The CI camera observes the scene and generates a measurement vector <italic>Y</italic> of length <italic>M</italic>. In a noise free scenario, each of the <italic>M</italic> elements in the measurement <italic>Y</italic> represents a projection of the scene <italic>X</italic> onto the basis vectors comprising the projection matrix Φ. In matrix vector form, this set of linear equations can be expressed as:</p>
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>11</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>…</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>…</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>21</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>22</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>…</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>…</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd>
<mml:mtd>
<mml:mo>⋱</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow/></mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow/></mml:mtd>
<mml:mtd>
<mml:mrow/></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mo>Φ</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>or:</p>
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mi>X</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where the dimensions of the projection matrix Φ are <italic>M</italic> × <italic>N</italic>, and each row of Φ represents a sampling of the underlying image signal. If image signals are sparse, such signals can be expressed by a set of coefficients <italic>θ</italic> ∈ <italic>R<sup>N</sup></italic> in some orthonormal basis Ψ ∈ <italic>R<sup>N</sup></italic><sup>×</sup><italic><sup>N</sup></italic>:</p>
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm3" display="block">
<mml:semantics id="sm3">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>Ψ</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>In many cases, the basis Ψ = [<italic>ψ</italic><sub>1</sub> <italic>ψ</italic><sub>2</sub> … <italic>ψ<sub>n</sub></italic>] can be chosen so that only <italic>K</italic> ≪ <italic>N</italic> coefficients have significant magnitude; the image signal is then called <italic>K</italic>-sparse. The key principle of CS is that a <italic>K</italic>-sparse signal can be recovered from slightly more than <italic>K</italic> well-chosen measurements, which are obtained by multiplying the signal by a random projection matrix Φ<italic><sub>M</sub></italic><sub>×</sub><italic><sub>N</sub></italic>. Here <italic>M</italic> is significantly smaller than <italic>N</italic> but larger than <italic>K</italic>. Substituting <xref rid="FD3" ref-type="disp-formula">Equation (3)</xref> into <xref rid="FD2" ref-type="disp-formula">Equation (2)</xref> we observe that:</p>
<disp-formula id="FD4">
<label>(4)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mo>Ψ</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
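<p>The measurement model of Equations (2)–(4) can be sketched numerically in a few lines of Python. The example below is only an illustration under stated assumptions and is not part of the proposed system: the sizes <italic>N</italic> = 64, <italic>M</italic> = 16 and <italic>K</italic> = 4, the identity basis used for Ψ, the Gaussian entries of Φ with variance 1/<italic>M</italic>, and the helper names matvec, sparse_coefficients and gaussian_matrix are all hypothetical choices made for this sketch.</p>
<preformat>
```python
import random

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def sparse_coefficients(n, k, rng):
    """Return a length-n coefficient vector with exactly k nonzero entries."""
    theta = [0.0] * n
    for idx in rng.sample(range(n), k):
        theta[idx] = rng.gauss(0.0, 1.0)
    return theta

def gaussian_matrix(m, n, rng):
    """Random M x N projection matrix with i.i.d. N(0, 1/M) entries (assumption)."""
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

rng = random.Random(0)
N, M, K = 64, 16, 4  # hypothetical sizes with K much smaller than M, M smaller than N

# Identity basis as a stand-in for Psi (assumption); any orthonormal basis works.
Psi = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

theta = sparse_coefficients(N, K, rng)  # K-sparse coefficients
X = matvec(Psi, theta)                  # Equation (3): X = Psi * theta
Phi = gaussian_matrix(M, N, rng)        # random projection matrix
Y = matvec(Phi, X)                      # Equation (2): Y = Phi * X

print(len(X), len(Y))
```
</preformat>
<p>With the identity basis, <italic>X</italic> coincides with <italic>θ</italic> and <italic>Y</italic> has only <italic>M</italic> entries; another orthonormal basis could be substituted for Psi without changing the remaining steps.</p>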
<p>CS addresses the problem of solving for <italic>X</italic> when the number of measurements is much smaller than the dimension of the original image signal. This is generally an ill-posed problem, because there are infinitely many candidate solutions for <italic>X</italic>. Nevertheless, CS theory provides conditions under which recovery is possible: <italic>X</italic> must be sparse or compressible in a basis Ψ, and Φ in conjunction with Ψ must satisfy, for every <italic>K</italic>-sparse coefficient vector <italic>x</italic> and some constant <italic>δ</italic> ∈ (0, 1), a technical condition called the Restricted Isometry Property (RIP):</p>
<disp-formula id="FD5">
<label>(5)</label>
<mml:math id="mm5" display="block">
<mml:semantics id="sm5">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>δ</mml:mi></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>≤</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:mo>Φ</mml:mo>
<mml:mo>Ψ</mml:mo>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>≤</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>δ</mml:mi></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:semantics></mml:math></disp-formula>
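<p>The energy-preservation reading of Equation (5) can be probed empirically. The following Python sketch draws one Gaussian matrix Φ with entries of variance 1/<italic>M</italic>, takes Ψ as the identity, and records the largest deviation of ‖Φ<italic>x</italic>‖<sup>2</sup>/‖<italic>x</italic>‖<sup>2</sup> from 1 over randomly drawn <italic>K</italic>-sparse vectors. All sizes, the seed and the function names are assumptions introduced for illustration. Sampling random vectors only gives a lower bound on the true RIP constant <italic>δ</italic>, which is defined over all <italic>K</italic>-sparse vectors, so this is a sanity check and not a verification of the RIP.</p>
<preformat>
```python
import random

def energy_ratio(Phi, x):
    """Return ||Phi x||^2 / ||x||^2 for one vector x."""
    y = [sum(p * v for p, v in zip(row, x)) for row in Phi]
    return sum(v * v for v in y) / sum(v * v for v in x)

def empirical_delta(m, n, k, trials, seed=0):
    """Largest observed |ratio - 1| over random k-sparse vectors (lower bound on delta)."""
    rng = random.Random(seed)
    scale = (1.0 / m) ** 0.5  # entries N(0, 1/m) so that E||Phi x||^2 = ||x||^2
    Phi = [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]
    worst = 0.0
    for _ in range(trials):
        x = [0.0] * n
        for idx in rng.sample(range(n), k):
            x[idx] = rng.gauss(0.0, 1.0)
        worst = max(worst, abs(energy_ratio(Phi, x) - 1.0))
    return worst

# Hypothetical sizes chosen only to keep the run short.
delta = empirical_delta(m=64, n=256, k=4, trials=200)
print(delta)
```
</preformat>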
<p>Candes and Tao [<xref ref-type="bibr" rid="b4-sensors-12-14397">4</xref>,<xref ref-type="bibr" rid="b5-sensors-12-14397">5</xref>] show that, under this condition, the signal <italic>X</italic> can be exactly recovered from few measurements by solving an <italic>l</italic><sub>2</sub> – <italic>l</italic><sub>1</sub> minimization problem:</p>
<disp-formula id="FD6">
<label>(6)</label>
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>−</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>λ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mo>Ψ</mml:mo>
<mml:mi>T</mml:mi></mml:msup>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
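<p>One simple way to minimize an objective of the form of Equation (6) is the iterative shrinkage-thresholding algorithm (ISTA). The Python sketch below is a swapped-in, unoptimized illustration and is not claimed to be the solver used in this work. It assumes that Ψ is the identity, so that the penalty acts on <italic>x</italic> directly, uses the squared Frobenius norm of Φ as a conservative step-size bound, and introduces hypothetical names (soft_threshold, ista, objective) and problem sizes. With this conservative step the objective decreases monotonically, but convergence is slow, so more iterations or an accelerated variant may be needed for accurate recovery.</p>
<preformat>
```python
import math
import random

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def objective(Phi, y, x, lam):
    """0.5 * ||y - Phi x||_2^2 + lam * ||x||_1 (Equation (6) with Psi = identity)."""
    r = [yi - sum(p * v for p, v in zip(row, x)) for row, yi in zip(Phi, y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(Phi, y, lam, iterations=500):
    """Plain ISTA with a fixed step 1 / L, where L bounds the gradient's Lipschitz constant."""
    m, n = len(Phi), len(Phi[0])
    # The squared Frobenius norm upper-bounds the squared spectral norm of Phi.
    lipschitz = sum(p * p for row in Phi for p in row)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [y[i] - sum(Phi[i][j] * x[j] for j in range(n)) for i in range(m)]
        step = [sum(Phi[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] + step[j] / lipschitz, lam / lipschitz) for j in range(n)]
    return x

# Hypothetical test problem: a 3-sparse signal observed through a Gaussian matrix.
rng = random.Random(1)
M, N = 20, 50
Phi = [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
x_true = [0.0] * N
x_true[3], x_true[17], x_true[41] = 1.5, -2.0, 1.0
y = [sum(p * v for p, v in zip(row, x_true)) for row in Phi]

x_hat = ista(Phi, y, lam=0.01)
print(objective(Phi, y, [0.0] * N, 0.01), objective(Phi, y, x_hat, 0.01))
```
</preformat>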
<p>Here the regularization parameter λ &gt; 0 helps to overcome the ill-posed nature of the problem, and the <italic>l</italic><sub>1</sub> penalty term drives small components of <italic>θ</italic> = Ψ<sup><italic>T</italic></sup><italic>x</italic> to zero, which promotes sparse solutions. In fact, the RIP condition of <xref rid="FD5" ref-type="disp-formula">Equation (5)</xref> implies that the energy contained in the projected image <italic>Y</italic> is close to the energy contained in the original image <italic>X</italic>.</p></sec>
<sec>
<label>2.2.</label>
<title>CI</title>
<p>Compared with conventional camera architectures, the CI camera is specifically designed to exploit the CS framework for imaging. For example, the single pixel camera designed by Rice University differs fundamentally from a conventional camera [<xref ref-type="bibr" rid="b6-sensors-12-14397">6</xref>]. A programmed digital micro-mirror device is used to perform linear projections of an image onto a single optical photodiode. In this type of optical architecture, the system cycles sequentially through the rows of the projection matrix Φ to determine the measurement elements one at a time. Any pattern of values in the range [0, 1] can easily be used by reprogramming the control software. However, because the elements of <italic>y</italic> are measured sequentially, imaging dynamic scenes is inherently time consuming. To address the dynamic scene imaging problem, researchers have proposed other optical CI systems. Rather than sequentially projecting the scene image onto a single pixel, these systems make a parallel measurement of the original scene image onto a small set of pixels. For example, the Duke University group describes the design of coded aperture masks for super resolution image reconstruction from a single, low-resolution, noisy observation image [<xref ref-type="bibr" rid="b7-sensors-12-14397">7</xref>,<xref ref-type="bibr" rid="b8-sensors-12-14397">8</xref>]. This architecture is simple and highly suitable for optical CS imaging because all measurements are collected at one time. More recently, building on their prior work, Harmany <italic>et al.</italic> [<xref ref-type="bibr" rid="b9-sensors-12-14397">9</xref>] proposed a coded aperture keyed exposure sensing paradigm to realize spatio-temporal compressive sensing imaging. However, how to fabricate the random coded aperture in practice remains a key problem that needs to be solved. Fergus <italic>et al</italic>. reported a compact CI camera that uses a random lens [<xref ref-type="bibr" rid="b10-sensors-12-14397">10</xref>]. This approach can achieve an ultra-thin optical system design and can be applied to numerous practical applications. However, obtaining the sensing matrix from these random lenses is difficult. Shi <italic>et al.</italic> [<xref ref-type="bibr" rid="b11-sensors-12-14397">11</xref>] proposed a compressive optical imaging system based on spherical aberration. Spherical aberration is an optical phenomenon attributed to the intrinsic refraction property of a spherical lens. The larger the curvature of the lens surface, the more serious the aberration will be. The optical structure of this architecture only needs a lens with significant spherical aberration. Although research on this method is ongoing, designing and manufacturing such a special lens may not be easy. In [<xref ref-type="bibr" rid="b12-sensors-12-14397">12</xref>,<xref ref-type="bibr" rid="b13-sensors-12-14397">13</xref>], Neifeld <italic>et al</italic>. proposed an adaptive feature-specific imaging system for face recognition tasks.</p>
<p>In summary, all the aforementioned compressive sampling strategies share the following features: each element <italic>x<sub>i</sub></italic> in the source image contributes to all compressed measurements {<italic>y</italic><sub>1</sub>, <italic>y</italic><sub>2</sub>, …, <italic>y<sub>m</sub></italic>}, and each compressed measurement <italic>y<sub>i</sub></italic> is a linear combination of all source elements {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, …, <italic>x<sub>n</sub></italic>}. The coding of a particular pixel <italic>y<sub>i</sub></italic> is relatively uncorrelated with that of its neighbors.</p></sec>
<sec>
<label>2.3.</label>
<title>Motion Targets Detection and Tracking by Using CS</title>
<p>In surveillance systems, background subtraction is commonly used for segmenting out objects of interest in a scene. However, background subtraction techniques may require complicated density estimates for each pixel, which become burdensome in the case of a high-resolution image. Performing background subtraction on compressed images, such as MPEG images, is not novel. In [<xref ref-type="bibr" rid="b14-sensors-12-14397">14</xref>], the authors performed background subtraction on an MPEG-compressed video by using the DC-DCT coefficients of image frames. Toreyin <italic>et al.</italic> [<xref ref-type="bibr" rid="b15-sensors-12-14397">15</xref>] similarly used this technique on a wavelet representation. However, our technique focuses on CS imaging data, not on compressed video files. For motion tracking, Kalman filter, particle filter and mean shift methods are often used. However, higher data dimensionality leads to greater computational complexity when performing the density and background model estimations, which may be detrimental to the real-time performance of tracking.</p>
<p>Given how little of the acquired information is ultimately of use, researchers have begun to question whether such a large amount of image data is really necessary. New motion target detection and tracking strategies need to be developed. With the emergence of CS theory, researchers have begun to develop motion detection and tracking algorithms that use CS data. For example, [<xref ref-type="bibr" rid="b16-sensors-12-14397">16</xref>] describes a method to directly recover background subtracted images by using the CS theory. A single Gaussian distribution background model is employed and a compressive single-pixel camera is used to obtain the compressive sampling images. However, this method must recover the original image to update the background model, and it relies on a single-pixel camera to obtain the compressive images, which is time consuming and unsuitable for imaging dynamic scenes. In [<xref ref-type="bibr" rid="b17-sensors-12-14397">17</xref>], compressive measurements of a surveillance video sequence are decomposed into a low rank matrix and a sparse matrix. The low rank matrix represents the background model, and the sparse components are utilized to identify the moving objects. The augmented Lagrangian alternating direction method is employed to solve for the low rank and the sparse matrices simultaneously. However, this algorithm requires a video sequence to identify the moving targets and therefore cannot be used in real-time applications. In [<xref ref-type="bibr" rid="b18-sensors-12-14397">18</xref>], the authors propose a signal tracking algorithm that uses compressive observations. The signal being tracked is assumed to be sparse and slowly varying. Compressive measurements are obtained by projecting the known signal <italic>x</italic><sub>i</sub> onto a matrix Φ<italic><sub>i</sub></italic>, which retains only the columns of Φ with indices that lie in the support of <italic>x</italic><sub>i</sub>. A Kalman filter in the compressive domain is utilized to estimate signal changes. This algorithm is only suitable for stationary or slowly-moving objects in surveillance scenarios. Wang <italic>et al</italic>. [<xref ref-type="bibr" rid="b19-sensors-12-14397">19</xref>] developed a compressive particle filtering algorithm that tracks moving targets from compressive measurements and avoids image reconstruction procedures. Recently, Mei <italic>et al</italic>. [<xref ref-type="bibr" rid="b20-sensors-12-14397">20</xref>] proposed a robust <italic>l</italic><sub>1</sub> tracker. Each motion target is expressed as a sparse representation of multiple pre-established templates. The <italic>l</italic><sub>1</sub> tracker demonstrates promising robustness compared with a number of existing trackers. However, its computational complexity hinders real-time applications.</p></sec></sec>
<sec>
<label>3.</label>
<title>Coded Aperture CI Array</title>
<p>Developing practical optical systems to exploit CS theory is a significant challenge. Researchers have proposed several CS imaging architectures and have tested these architectures in the laboratory (see Section 2.2). As Stern noted in [<xref ref-type="bibr" rid="b21-sensors-12-14397">21</xref>], a conventional image typically contains on the order of a megapixel (<italic>N</italic> = 10<sup>6</sup>). A CI system needs to store the projection matrix Φ<italic><sup>M</sup></italic><sup>×</sup><italic><sup>N</sup></italic>, which has <italic>M</italic> × <italic>N</italic> entries and can therefore reach up to 10<sup>12</sup> entries. Data storage and the computation for <xref rid="FD6" ref-type="disp-formula">Equation (6)</xref> will be challenging. Furthermore, to calibrate the projection matrix Φ<italic><sup>M</sup></italic><sup>×</sup><italic><sup>N</sup></italic>, <italic>N</italic> point spread functions have to be measured, which is laborious and time consuming. In order to solve the aforementioned problems, we propose a coded aperture array optical system to realize CS imaging. <xref ref-type="fig" rid="f1-sensors-12-14397">Figure 1(a)</xref> shows the architecture of our CI system. The general design is based on a 4f system, which comprises a Fourier transform lens array, an inverse Fourier transform lens array and the corresponding phase-coded masks located between these two lens arrays. In each phase-coded 4f system (see <xref ref-type="fig" rid="f1-sensors-12-14397">Figure 1(b)</xref>), the first lens is a Fourier lens, which produces on its focal plane the frequency spectrum of the light beam, i.e., its Fourier transform. By placing a spatial light modulator on this plane to modulate the phase of the light, a phase-coded “frequency image” is obtained. A second Fourier lens then transforms the modulated frequency spectrum back to the spatial image domain. Thus, through a phase-coded 4f system, the scene we wish to image yields phase-coded measurements on the detector elements, which can then be digitally post-processed to reconstruct the original scene. For a megapixel image, if we consider a 9 × 9 array of 4f subsystems, the original image is separated into 9 × 9 blocks, and the image data of each block are 1/81 of the original image. Therefore the stored sensing matrix Φ<sub>B</sub><italic><sup>MB</sup></italic><sup>×</sup><italic><sup>NB</sup></italic> (<italic>M<sub>B</sub></italic> ≪ <italic>N<sub>B</sub></italic>) of each block is smaller by a factor of 81 in each dimension, i.e., 1/81 × 1/81, which is only 1/6561 of the size required by a single aperture CI system. Using this separable scheme relaxes the resolution required of the coded mask and facilitates the storage of the coded matrix.</p>
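<p>The storage saving quoted above follows from simple counting, sketched below in Python. The compression ratio <italic>M</italic>/<italic>N</italic> = 1/4 and the function name matrix_entries are hypothetical choices for illustration; the ratio cancels in the final result as long as every block uses the same ratio as the single aperture system.</p>
<preformat>
```python
from fractions import Fraction

def matrix_entries(n_pixels, ratio):
    """Number of entries in an M x N projection matrix with M = ratio * N."""
    return ratio * n_pixels * n_pixels

N = Fraction(10 ** 6)     # megapixel image, as in the text
ratio = Fraction(1, 4)    # hypothetical compression ratio M/N (assumption)
blocks = 9 * 9            # 9 x 9 array of 4f subsystems

single = matrix_entries(N, ratio)              # single aperture system
per_block = matrix_entries(N / blocks, ratio)  # one 4f subsystem
all_blocks = blocks * per_block                # every block stores its own matrix

print(single / per_block)    # prints 6561
print(single / all_blocks)   # prints 81
```
</preformat>
<p>Under these assumptions each block matrix is 1/6561 the size of the single aperture matrix; if every one of the 81 blocks stores its own matrix, the total storage is still 81 times smaller.</p>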
<p>For each 4f subsystem, the action of the phase-coded mask can be considered as implementing a linear projection across one block of the original scene. The data of each block collected by a compressive imaging 4f subsystem are represented as:</p>
<disp-formula id="FD7">
<label>(7)</label>
<mml:math id="mm7" display="block">
<mml:semantics id="sm7">
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mi>B</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>*</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>B</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where * denotes convolution, <italic>h</italic> is the phase-coding mask, and D is the random sampling operation of the scene. As shown in [<xref ref-type="bibr" rid="b22-sensors-12-14397">22</xref>,<xref ref-type="bibr" rid="b23-sensors-12-14397">23</xref>], the convolution of <italic>h</italic> with an image <italic>x</italic> can be computed by applying the Fourier transform to <italic>x</italic> and <italic>h</italic>, multiplying the two spectra element-wise and applying the inverse Fourier transform. In matrix notation, <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref> can be expressed as:</p>
<disp-formula id="FD8">
<label>(8)</label>
<mml:math id="mm8" display="block">
<mml:semantics id="sm8">
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mi>B</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>*</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>B</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>h</mml:mi></mml:msub>
<mml:mi>F</mml:mi>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>B</mml:mi></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>F</italic> is the two-dimensional Fourier transform matrix and <italic>C<sub>h</sub></italic> is the diagonal matrix whose diagonal entries are given by <italic>F</italic>(<italic>h</italic>). If the matrix product <italic>F</italic><sup>−1</sup><italic>C<sub>h</sub>F</italic> satisfies the RIP, we can accurately recover the original image <italic>x<sup>B</sup></italic> with high probability when the number of compressive measurements <italic>m</italic> satisfies <italic>m</italic> ≥ <italic>Ck</italic> log(<italic>n</italic>/<italic>k</italic>) for some constant <italic>C</italic>. After obtaining the CI signals of all 4f subsystems, the block CS algorithm can be used to reconstruct the original signals. Thus, by designing such a special optical system, we can acquire compressed imaging measurements.</p></sec>
<sec>
<label>4.</label>
<title>Motion Objects Detection Based on CS Images</title>
<p>As previously mentioned, our CI system segments the CS image into small blocks by using lens arrays. In this section we describe how to detect motion targets directly in each CS imaging block without performing any recovery algorithm. This motion detection algorithm in the CS space is robust and has a low computational cost, which makes it suitable for embedded systems.</p>
<sec>
<label>4.1.</label>
<title>Background Model</title>
<p>In motion detection algorithms, background images are generally assumed to be temporally stationary, whereas moving objects or foreground objects change over time. Suppose that <italic>x<sub>b</sub></italic> and <italic>x<sub>t</sub></italic> are the real background and test images of the scene and <italic>x<sub>d</sub></italic> is the difference image, or foreground image. The foreground image is composed only of those pixels that differ from the background image. The foreground therefore generally occupies far fewer pixels than the background and can be considered a sparse signal in a suitable transform domain. Suppose that we obtain compressive measurements <italic>y<sub>b</sub></italic> of the training background images <italic>x<sub>b</sub></italic> and compressive measurements <italic>y<sub>t</sub></italic> of the current image; the compressive measurements of the foreground image <italic>y<sub>d</sub></italic> can then be expressed as:</p>
<disp-formula id="FD9">
<label>(9)</label>
<mml:math id="mm9" display="block">
<mml:semantics id="sm9">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>b</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>n<sub>t</sub></italic> is the additive Gaussian noise on <italic>y<sub>t</sub></italic>, and <italic>n<sub>b</sub></italic> and <italic>n<sub>d</sub></italic> are the noise terms of <italic>y<sub>b</sub></italic> and <italic>y<sub>d</sub></italic>, respectively. By solving an <italic>l</italic><sub>2</sub> – <italic>l</italic><sub>1</sub> minimization problem [<xref ref-type="bibr" rid="b4-sensors-12-14397">4</xref>–<xref ref-type="bibr" rid="b5-sensors-12-14397">5</xref>]:</p>
<disp-formula id="FD10">
<label>(10)</label>
<mml:math id="mm10" display="block">
<mml:semantics id="sm10">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>d</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>d</mml:mi></mml:msub></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>λ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mo>Ψ</mml:mo>
<mml:mi>T</mml:mi></mml:msup>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>d</mml:mi></mml:msub></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>The foreground image <italic>x<sub>d</sub></italic> can be exactly recovered. In <xref rid="FD10" ref-type="disp-formula">Equation (10)</xref>, Ψ can be a wavelet basis, which is commonly used as the sparsifying basis. Although moving objects can be detected by applying a background subtraction algorithm in the compressive domain and then recovering the foreground image in the image domain with <italic>l</italic><sub>2</sub> – <italic>l</italic><sub>1</sub> minimization, reconstructing the foreground image frame by frame is time consuming. Can we detect the moving object directly in the compressive domain without recovering the foreground image? If so, the computational cost and energy consumption of surveillance systems would be reduced dramatically.</p>
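<p>The linearity behind Equation (9) can be checked numerically. The following Python sketch is illustrative only and is not the implementation used in this work. It assumes NumPy, a random Gaussian matrix standing in for Φ, a synthetic one-dimensional image block, and hypothetical names and sizes (compressive_difference, n = 256, m = 64).</p>
<preformat><![CDATA[
```python
import numpy as np

def compressive_difference(phi, x_background, x_test, noise_std=0.0, rng=None):
    """Return (y_d, phi @ x_d) for one image block, following Equation (9)."""
    rng = np.random.default_rng() if rng is None else rng
    m = phi.shape[0]
    n_b = noise_std * rng.standard_normal(m)    # noise on the background measurement
    n_t = noise_std * rng.standard_normal(m)    # noise on the test measurement
    y_b = phi @ x_background + n_b
    y_t = phi @ x_test + n_t
    y_d = y_t - y_b                             # subtraction in the compressive domain
    x_d = x_test - x_background                 # sparse foreground in the image domain
    return y_d, phi @ x_d

rng = np.random.default_rng(0)
n, m = 256, 64                                  # hypothetical block size and measurement count
phi = rng.standard_normal((m, n)) / np.sqrt(m)  # assumed random Gaussian projection
x_b = rng.uniform(0.0, 255.0, size=n)           # synthetic background block
x_t = x_b.copy()
x_t[100:110] += 50.0                            # small synthetic moving object
y_d, projected_foreground = compressive_difference(phi, x_b, x_t, rng=rng)
```
]]></preformat>
<p>With the noise level set to zero, the two returned vectors agree up to floating-point rounding, which is the property that allows background subtraction to be carried out on the measurements themselves.</p>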
<p>The Gaussian background model is often used to separate the foreground and background regions in conventional motion detection algorithms. Each pixel (<italic>x</italic>, <italic>y</italic>) over a time series <italic>t</italic> = 1, 2, …, <italic>T</italic> is modeled by a Gaussian distribution <italic>I</italic>(<italic>x</italic>, <italic>y</italic>) ∼ <italic>N</italic>(<italic>μ</italic>, <italic>σ</italic><sup>2</sup><italic>I</italic>), where <italic>σ</italic><sup>2</sup><italic>I</italic> is the covariance matrix of the Gaussian model and <italic>N</italic> is a Gaussian probability density function. By a standard property of Gaussian random variables, if <italic>M</italic><sub>1</sub>, <italic>M</italic><sub>2</sub> are two independent Gaussian random variables with means <italic>μ</italic><sub>1</sub>, <italic>μ</italic><sub>2</sub> and standard deviations <italic>σ</italic><sub>1</sub>, <italic>σ</italic><sub>2</sub>, then their linear combination is also Gaussian distributed: 
<inline-formula>
<mml:math id="mm11" display="inline">
<mml:semantics id="sm11">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>~</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mi>b</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></inline-formula>. Therefore it is reasonable to model each compressive measurement with a Gaussian distribution 
<inline-formula>
<mml:math id="mm12" display="inline">
<mml:semantics id="sm12">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></inline-formula>. Here the mean value is <italic>y<sub>i</sub></italic> = Φ<italic><sub>i</sub>x</italic>. When the scene changes to include an object that was not part of the background model, in theory every compressive measurement <italic>y<sub>i</sub></italic>, <italic>i</italic> = 1, 2, …, <italic>m</italic>, will deviate from its existing Gaussian distribution. To handle image acquisition noise and illumination changes, we use a mixture of Gaussian distributions [<xref ref-type="bibr" rid="b24-sensors-12-14397">24</xref>,<xref ref-type="bibr" rid="b25-sensors-12-14397">25</xref>] to model the background of compressive images and a simple threshold test to declare motion targets.</p>
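<p>The Gaussian argument above can be illustrated with a short numerical check. The Python sketch below is an illustration under stated assumptions rather than part of the proposed system: it assumes NumPy, independent Gaussian pixels, and one hypothetical random row of Φ. It computes the mean and variance that the linear-combination property predicts for one compressive measurement and compares them with Monte Carlo estimates.</p>
<preformat><![CDATA[
```python
import numpy as np

def projected_gaussian_moments(phi_row, pixel_means, pixel_stds):
    """Mean and variance of y_i = phi_row @ x for independent Gaussian pixels."""
    mean = float(phi_row @ pixel_means)
    variance = float((phi_row ** 2) @ (pixel_stds ** 2))
    return mean, variance

rng = np.random.default_rng(1)
n = 64                                          # hypothetical number of pixels in a block
phi_row = rng.standard_normal(n)                # assumed random Gaussian row of the projection
pixel_means = rng.uniform(0.0, 255.0, size=n)
pixel_stds = rng.uniform(1.0, 5.0, size=n)

mean, variance = projected_gaussian_moments(phi_row, pixel_means, pixel_stds)

# Monte Carlo check: draw many noisy blocks and project each of them.
samples = pixel_means + pixel_stds * rng.standard_normal((50_000, n))
y = samples @ phi_row
empirical_mean, empirical_variance = y.mean(), y.var()
```
]]></preformat>
<p>The predicted and empirical moments should agree to within sampling error, which supports modeling each measurement by its own Gaussian when the scene is static.</p>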
<p>Using K Gaussian distributions, the probability density function of each compressive measurement at time <italic>t</italic> can be expressed as:</p>
<disp-formula id="FD11">
<label>(11)</label>
<mml:math id="mm13" display="block">
<mml:semantics id="sm13">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>K</mml:mi></mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>w<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic>, <italic>μ<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> and Σ<italic><sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> are the estimates of the weight, mean value, and covariance matrix of the <italic>j</italic> th Gaussian distribution of the <italic>i</italic> th pixel at time <italic>t</italic> in the mixture model respectively. The <italic>j</italic> th Gaussian probability density function <italic>p</italic>(<italic>y<sub>i</sub></italic><sub>,</sub><italic><sub>t</sub></italic>, <italic>μ<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub>,</italic> Σ<italic><sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic>) is defined as:</p>
<disp-formula id="FD12">
<label>(12)</label>
<mml:math id="mm14" display="block">
<mml:semantics id="sm14">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>∑</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>π</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mi>n</mml:mi>
<mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>∑</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>exp</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>T</mml:mi></mml:msup>
<mml:msubsup>
<mml:mi>∑</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
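<p>For a single scalar measurement, the mixture density of Equations (11) and (12) can be evaluated as in the following Python sketch. It is a minimal illustration, not the authors' code: it assumes NumPy, the scalar case (n = 1, so the covariance reduces to a variance), and the standard Gaussian density in which the squared deviation is divided by the variance. The weights, means, and variances are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np

def gaussian_pdf(y, mean, variance):
    """Scalar case of Equation (12): n = 1 and the covariance is a variance."""
    return np.exp(-0.5 * (y - mean) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)

def mixture_pdf(y, weights, means, variances):
    """Equation (11) for one compressive measurement with K components."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return float(np.sum(weights * gaussian_pdf(y, means, variances)))

# Hypothetical K = 3 model for one measurement.
weights = [0.7, 0.2, 0.1]
means = [120.0, 135.0, 200.0]
variances = [4.0, 9.0, 400.0]

p_background_like = mixture_pdf(121.0, weights, means, variances)
p_outlier = mixture_pdf(60.0, weights, means, variances)
```
]]></preformat>
<p>A measurement close to a heavily weighted, narrow component receives a much higher density than one far from every component, which is the basis of the threshold test.</p>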
<p>When a compressive measurement belongs to one Gaussian distribution, its weight parameter <italic>w<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> will be large and the standard deviation <italic>σ<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> will be small, which indicates that the measurement belongs to that distribution with high certainty. In this paper, the background model parameters <italic>w<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic>, <italic>μ<sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> and Σ<italic><sub>i</sub></italic><sub>,</sub><italic><sub>j</sub></italic><sub>,</sub><italic><sub>t</sub></italic> are estimated using the EM algorithm [<xref ref-type="bibr" rid="b26-sensors-12-14397">26</xref>].</p></sec>
<sec>
<label>4.2.</label>
<title>Background Model Update</title>
<p>With a static background and constant lighting, only additive Gaussian noise is incurred in the sampling process, and the density of the background image can be described by a Gaussian distribution centered at the mean pixel value. However, most surveillance videos involve lighting changes, shadows, slow-moving objects, and objects introduced to or removed from the scene. It is therefore necessary to update the background model continuously. Otherwise, errors in the background accumulate over time and eventually trigger unwanted detections.</p>
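<p>One step of such a continuous update, corresponding to Equations (13)–(15) of this subsection, might be sketched in Python as follows for a single scalar measurement. Several details are assumptions that the text does not specify: the term ΔΣ is taken to be the squared deviation from the updated mean, a match means lying within 2.5 standard deviations of a component, ρ is clipped to at most 1, a replaced component receives a hypothetical initial variance, and the weights are renormalized after each step. NumPy is assumed.</p>
<preformat><![CDATA[
```python
import numpy as np

def gaussian_pdf(y, mean, variance):
    """Scalar Gaussian density."""
    return np.exp(-0.5 * (y - mean) ** 2 / variance) / np.sqrt(2.0 * np.pi * variance)

def update_mixture(y, weights, means, variances, alpha=0.01,
                   match_sigmas=2.5, init_variance=100.0):
    """One online update of a K-component scalar mixture for a new measurement y."""
    weights = np.array(weights, dtype=float)
    means = np.array(means, dtype=float)
    variances = np.array(variances, dtype=float)
    distances = np.abs(y - means) / np.sqrt(variances)
    matched = np.flatnonzero(distances < match_sigmas)    # assumed matching rule
    if matched.size > 0:
        j = matched[np.argmin(distances[matched])]         # closest matching component
        weights[j] = (1.0 - alpha) * weights[j] + alpha     # Equation (13)
        # rho = N(y; mu_j, Sigma_j); clipping to 1 is an added safeguard (assumption)
        rho = min(1.0, float(gaussian_pdf(y, means[j], variances[j])))
        means[j] = (1.0 - rho) * means[j] + rho * y         # Equation (14)
        # Equation (15), with the increment assumed to be the squared deviation
        variances[j] = (1.0 - rho) * variances[j] + rho * (y - means[j]) ** 2
    else:
        j = int(np.argmin(weights))                         # discard the weakest component
        means[j] = y                                        # reinitialize at the new value
        variances[j] = init_variance                        # hypothetical initial variance
    weights /= weights.sum()                                # renormalization (assumption)
    return weights, means, variances

w, mu, var = update_mixture(121.0, [0.6, 0.3, 0.1], [120.0, 135.0, 200.0], [4.0, 9.0, 400.0])
```
]]></preformat>
<p>In this example the first component is matched, so its weight grows and its mean moves toward the new measurement, while a measurement that matches no component would replace the component with the smallest weight.</p>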
<p>To update the background, the background parameters of pixel <italic>y<sub>i,t</sub></italic><sub>+1</sub> at time instant <italic>t</italic> + 1 can be estimated using the following equations:</p>
<disp-formula id="FD13">
<label>(13)</label>
<mml:math id="mm15" display="block">
<mml:semantics id="sm15">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>w</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>α</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>α</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD14">
<label>(14)</label>
<mml:math id="mm16" display="block">
<mml:semantics id="sm16">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>μ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>ρ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>ρ</mml:mi>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD15">
<label>(15)</label>
<mml:math id="mm17" display="block">
<mml:semantics id="sm17">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>∑</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>ρ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>∑</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>ρ</mml:mi>
<mml:mo>Δ</mml:mo>
<mml:msub>
<mml:mi>∑</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>α</italic> is the learning rate and the parameter <italic>ρ</italic> = <italic>N</italic>(<italic>y<sub>t</sub></italic><sub>+1</sub>,<italic>μ<sub>j</sub></italic>,Σ<italic><sub>j</sub></italic>). If the pixel <italic>y<sub>i,t</sub></italic><sub>+1</sub> matches one of the K distributions and is declared as the foreground, then that matched distribution is updated as defined above. Otherwise, the distribution with the smallest weight is discarded and reinitialized to this pixel's value.</p></sec>
<sec>
<label>4.3.</label>
<title>Motion Detection Based on Compressive Sampling Images</title>
<p>As described in [<xref ref-type="bibr" rid="b27-sensors-12-14397">27</xref>], at time <italic>t</italic> the K distributions of the background model are ordered in descending order based on 
<inline-formula>
<mml:math id="mm18" display="inline">
<mml:semantics id="sm18">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></inline-formula>. This ordering assumes that a background pixel corresponds to a high weight and a small variance, because the background is more static and the background pixel value is practically constant. The first B Gaussian distributions whose cumulative weight exceeds a certain threshold <italic>T</italic> are considered background distributions:</p>
<disp-formula id="FD16">
<label>(16)</label>
<mml:math id="mm19" display="block">
<mml:semantics id="sm19">
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mi>b</mml:mi></mml:munder>
<mml:mo stretchy="false">(</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>b</mml:mi></mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi></mml:mrow></mml:msub>
<mml:mo>&gt;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
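<p>The ordering by weight over standard deviation and the selection rule of Equation (16) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes NumPy and scalar components, the function names are hypothetical, and the 2.5-standard-deviation matching rule used to classify a measurement is an assumption that the text does not state.</p>
<preformat><![CDATA[
```python
import numpy as np

def select_background_components(weights, sigmas, threshold):
    """Indices of the B background components for one measurement, Equation (16)."""
    weights = np.asarray(weights, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    order = np.argsort(-(weights / sigmas))       # descending w / sigma
    cumulative = np.cumsum(weights[order])
    # smallest b whose cumulative weight exceeds the threshold
    b = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    b = min(b, len(order))                        # keep all components if T is never exceeded
    return order[:b]

def is_background(y, means, sigmas, background_idx, match_sigmas=2.5):
    """True if y matches any background component (assumed matching rule)."""
    means = np.asarray(means, dtype=float)[background_idx]
    sigmas = np.asarray(sigmas, dtype=float)[background_idx]
    return bool(np.any(np.abs(y - means) < match_sigmas * sigmas))

# Hypothetical K = 3 model for one measurement.
weights = [0.1, 0.6, 0.3]
means = [200.0, 120.0, 135.0]
sigmas = [20.0, 2.0, 3.0]
background_idx = select_background_components(weights, sigmas, threshold=0.7)
```
]]></preformat>
<p>In this example the two narrow, heavily weighted components form the background, so a measurement that matches only the broad, lightly weighted component is treated as foreground.</p>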
<p>The other distributions are considered to represent foreground distributions. At time <italic>t</italic> + 1, if a pixel matches any of the B background distributions, it is identified as “background”; otherwise it is classified as “foreground”. If no match is found with any of the K Gaussians, the pixel is also classified as “foreground”. We declare that there is a new object when the result of <xref rid="FD17" ref-type="disp-formula">Equation (17)</xref> is above a threshold.</p>
<disp-formula id="FD17">
<label>(17)</label>
<mml:math id="mm20" display="block">
<mml:semantics id="sm20">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>y</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>M</mml:mi></mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>K</mml:mi></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></sec></sec>
<sec>
<label>5.</label>
<title>Motion Objects Tracking Based on CS Images</title>
<sec>
<label>5.1.</label>
<title>CS-l<sub>1</sub> Tracking Algorithm</title>
<p>The <italic>l<sub>1</sub></italic> tracker proposed in [<xref ref-type="bibr" rid="b20-sensors-12-14397">20</xref>] is a promising target tracking algorithm that can handle occlusion, corruption, and lighting changes. The algorithm is based on a particle filter framework, and each tracking target <italic>x<sup>T</sup></italic> ∈ ℝ<italic><sup>d</sup></italic> is sparsely represented in a feature dictionary <italic>A</italic> ∈ ℝ<italic><sup>d</sup></italic><sup>×(</sup><italic><sup>Nt</sup></italic><sup>+2</sup><italic><sup>d</sup></italic><sup>)</sup> spanned by the target template set <italic>T</italic> ∈ ℝ<italic><sup>d</sup></italic><sup>×</sup><italic><sup>Nt</sup></italic> and the noise template sets [<italic>I</italic> −<italic>I</italic>] as:</p>
<disp-formula id="FD18">
<label>(18)</label>
<mml:math id="mm21" display="block">
<mml:semantics id="sm21">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>a</mml:mi></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>+</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>−</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>They use a particle filter to estimate the posterior distribution 
<inline-formula>
<mml:math id="mm22" display="inline">
<mml:semantics id="sm22">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>T</mml:mi></mml:msubsup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></inline-formula>. The state variable <italic>s<sub>t</sub></italic> is modeled by the affine transformation parameters of the target object at time <italic>t</italic>, and the observation <italic>x<sub>t</sub></italic> is the corresponding object cropped from the image by using <italic>s<sub>t</sub></italic> as parameters. Let <italic>S</italic> = {<italic>s</italic><sup>1</sup>, <italic>s</italic><sup>2</sup>, …, <italic>s<sup>n</sup></italic>} be the <italic>n</italic> state candidates and <italic>X<sup>T</sup></italic> = {<italic>x<sup>T</sup></italic><sup>1</sup>, <italic>x<sup>T</sup></italic><sup>2</sup>, …, <italic>x<sup>Tn</sup></italic>} be the corresponding target candidates at time <italic>t</italic>. The target is estimated as the candidate with the smallest projection error:</p>
<disp-formula id="FD19">
<label>(19)</label>
<mml:math id="mm23" display="block">
<mml:semantics id="sm23">
<mml:mrow>
<mml:msup>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>max</mml:mo></mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi>X</mml:mi>
<mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:msub>
<mml:munder>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi></mml:mrow></mml:munder>
<mml:mi>ℕ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>;</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>An <italic>l<sub>1</sub></italic> optimization algorithm is used to solve for the sparse coefficient vector <italic>c</italic> as follows:</p>
<disp-formula id="FD20">
<label>(20)</label>
<mml:math id="mm24" display="block">
<mml:semantics id="sm24">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>λ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>A template update scheme is subsequently employed to reduce the drift. The main problem of the <italic>l<sub>1</sub></italic> tracker is the extremely high dimensionality of its feature dictionary space, which leads to a heavy computation burden. Inspired by their outstanding work, we aim to accelerate their tracking algorithm and discuss its application in CI systems. According to <xref rid="FD18" ref-type="disp-formula">Equation (18)</xref>, in the context of CS the corresponding compressive measurements <italic>y<sup>T</sup></italic> of <italic>x<sup>T</sup></italic> can be represented by:</p>
<disp-formula id="FD21">
<label>(21)</label>
<mml:math id="mm25" display="block">
<mml:semantics id="sm25">
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mo>'</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>=</mml:mo>
<mml:mo>Φ</mml:mo>
<mml:mo>'</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where Φ′ ∈ ℝ<italic><sup>m</sup></italic><sup>×</sup><italic><sup>d</sup></italic> is a projection matrix. The sparse coefficient vector <italic>c</italic> in <xref rid="FD21" ref-type="disp-formula">Equation (21)</xref> can likewise be recovered with high probability by using a TV optimization algorithm [<xref ref-type="bibr" rid="b28-sensors-12-14397">28</xref>], the OMP algorithm [<xref ref-type="bibr" rid="b29-sensors-12-14397">29</xref>], gradient projection algorithms [<xref ref-type="bibr" rid="b30-sensors-12-14397">30</xref>], the LARS algorithm [<xref ref-type="bibr" rid="b31-sensors-12-14397">31</xref>], or other <italic>l</italic><sub>2</sub> – <italic>l</italic><sub>1</sub> algorithms:</p>
<disp-formula id="FD22">
<label>(22)</label>
<mml:math id="mm26" display="block">
<mml:semantics id="sm26">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mi>Φ</mml:mi>
<mml:mo>'</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>λ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>arg</mml:mo>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>y</mml:mi>
<mml:mi>T</mml:mi></mml:msup>
<mml:mo>−</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>c</mml:mi></mml:mrow>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>λ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>‖</mml:mo></mml:mrow></mml:mrow>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>
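<p>The structure of Equation (22) can be illustrated with a small Python sketch. It is not the solver used in this work: iterative soft thresholding (ISTA) is substituted here as a simple stand-in for the solvers cited above, and the code assumes NumPy, a random Gaussian matrix for Φ′, random unit-norm templates, and hypothetical sizes (d = 64, ten templates, m = 32).</p>
<preformat><![CDATA[
```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(D, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - D c||_2^2 + lam * ||c||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

def objective(D, y, c, lam):
    """Value of the cost function in Equation (22)."""
    return 0.5 * np.sum((y - D @ c) ** 2) + lam * np.sum(np.abs(c))

rng = np.random.default_rng(0)
d, n_templates, m = 64, 10, 32                   # hypothetical sizes
T = rng.standard_normal((d, n_templates))        # synthetic target templates
T /= np.linalg.norm(T, axis=0)
A = np.hstack([T, np.eye(d), -np.eye(d)])        # A = [T, I, -I], Equation (18)
phi = rng.standard_normal((m, d)) / np.sqrt(m)   # fixed projection, assumed Gaussian
D = phi @ A                                      # compressed dictionary D = Phi' A

x_target = T[:, 3].copy()                        # candidate equal to one template
y_target = phi @ x_target                        # Equation (21)
lam = 0.01                                       # hypothetical regularization weight
c_hat = ista(D, y_target, lam)
```
]]></preformat>
<p>The dictionary passed to the solver has m rows instead of d, so each matrix-vector product inside the loop is correspondingly cheaper, which is the source of the speed-up discussed in this subsection.</p>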
<p>The feature dictionary <italic>A</italic> in <xref rid="FD18" ref-type="disp-formula">Equation (18)</xref> is replaced by a sparse projection dictionary <italic>D</italic> = Φ′ <italic>A</italic>, which can be considered a compressive measurement of the original feature dictionary <italic>A</italic>. As in [<xref ref-type="bibr" rid="b20-sensors-12-14397">20</xref>], the dictionary <italic>D</italic> should also be updated to avoid drift. Clearly, the dimension of the dictionary <italic>D</italic> ∈ ℝ<italic><sup>m</sup></italic><sup>×(</sup><italic><sup>Nt</sup></italic><sup>+2</sup><italic><sup>d</sup></italic><sup>)</sup> (<italic>m</italic> ≪ <italic>d</italic>) is reduced by the random projection matrix Φ′. This significantly speeds up the process of solving <xref rid="FD22" ref-type="disp-formula">Equation (22)</xref>.</p></sec>
<sec>
<label>5.2.</label>
<title>Compressive Target Image in CI system</title>
<p><xref rid="FD21" ref-type="disp-formula">Equation (21)</xref> raises a natural question: can the compressive measurements <italic>y<sup>T</sup></italic> be obtained in a CI system? Suppose that the motion target <italic>x<sup>T</sup></italic> has been detected by our motion detection algorithm and then reconstructed and labeled (see <xref ref-type="fig" rid="f2-sensors-12-14397">Figure 2</xref>); we can then use a projection matrix Φ<italic><sub>T</sub></italic> to obtain the compressive measurement image <italic>y<sup>T</sup></italic>. Here Φ<italic><sub>T</sub></italic> is the projection matrix obtained by keeping only those columns of Φ whose indices lie in <italic>x<sup>T</sup></italic>. For our CI system, the projection matrix Φ can be accurately identified by an optical calibration method. Therefore, given the location indices of the motion targets, the projection matrix Φ<italic><sub>T</sub></italic> can be acquired. However, as the target <italic>x<sup>T</sup></italic> moves, the projection matrix Φ<italic><sub>T</sub></italic> changes as well. To simplify our tracking algorithm, the projection matrix Φ′ used in <xref rid="FD21" ref-type="disp-formula">Equation (21)</xref> is fixed. The compressive dictionary <italic>D</italic> can be constructed from these compressive target templates. <xref ref-type="fig" rid="f3-sensors-12-14397">Figure 3</xref> illustrates our motion detection and tracking framework that uses CS sampling images.</p></sec></sec>
<sec>
<label>6.</label>
<title>Experiments</title>
<sec>
<label>6.1.</label>
<title>Optical System Simulated in Matlab</title>
<p>Romberg has proven that a random Toeplitz or Gaussian matrix is incoherent with any orthonormal basis Ψ with high probability [<xref ref-type="bibr" rid="b32-sensors-12-14397">32</xref>]. In [<xref ref-type="bibr" rid="b33-sensors-12-14397">33</xref>], a random binary matrix is also proven to be suitable as a projection matrix. Therefore, in our experiments, random Gaussian, Toeplitz and binary matrices are all used for the phase coded masks. The CAVIAR database provided by INRIA Labs at Grenoble [<xref ref-type="bibr" rid="b34-sensors-12-14397">34</xref>] is used as the source of the original image sequences. In an outdoor sequence, each frame has a size of 288 × 384 with dynamic range [0,255] and motion objects have been generated manually. <xref ref-type="fig" rid="f4-sensors-12-14397">Figure 4</xref> shows the three different phase coded masks used in our simulation experiments. The corresponding compressive image obtained with the random Gaussian phase mask via Matlab simulation is shown in <xref ref-type="fig" rid="f5-sensors-12-14397">Figure 5</xref>.</p></sec>
<sec>
<label>6.2.</label>
<title>Performance of Reconstruction Algorithm</title>
<p>A total variation (TV) optimization algorithm is used to reconstruct the original image from compressive measurements [<xref ref-type="bibr" rid="b28-sensors-12-14397">28</xref>]. The reconstruction is performed using several measurement rates ranging from 50% to 5% and with random Gaussian, Toeplitz and binary phase coded masks, respectively. In our experiments, the signal-to-noise ratio (SNR) is applied to evaluate reconstruction performance. <xref ref-type="fig" rid="f6-sensors-12-14397">Figure 6</xref> shows the reconstruction results with a random Gaussian phase mask.</p>
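<p>As a concrete reference for the evaluation metric, the following minimal Python sketch computes an SNR in decibels. The exact SNR definition is not stated in this section, so the common form 10 log<sub>10</sub>(Σ<italic>x</italic><sup>2</sup> / Σ(<italic>x</italic> − <italic>x̂</italic>)<sup>2</sup>) is assumed here; the function name <monospace>snr_db</monospace> and the toy pixel values are illustrative and are not taken from our experiments.</p>
<preformat><![CDATA[
```python
import math


def snr_db(original, reconstructed):
    """Return the SNR in dB between two flattened images.

    Assumed definition: 10 * log10(sum(x^2) / sum((x - x_hat)^2)).
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy 4-pixel "image" and a reconstruction that is off by one grey level.
original = [10.0, 20.0, 30.0, 40.0]
reconstructed = [11.0, 19.0, 31.0, 39.0]
print(round(snr_db(original, reconstructed), 2))  # 28.75
```
]]></preformat>
<p>Averaging this value over repeated runs with independently drawn masks would give a figure comparable in spirit to an averaged SNR, under the assumed definition.</p>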
<p>From <xref ref-type="fig" rid="f6-sensors-12-14397">Figure 6(a)</xref>, we can see that the measurement rate can be reduced to 20% without sacrificing performance. As the measurement rate decreases further, the performance gradually degrades. With rates as low as 5%, the background and test images are not recovered accurately. <xref ref-type="fig" rid="f6-sensors-12-14397">Figure 6(b)</xref> shows the reconstruction results for the foreground <italic>y<sub>d</sub></italic>. <xref ref-type="fig" rid="f6-sensors-12-14397">Figure 6(b)</xref> clearly shows that the sparser foreground can be recovered correctly from <italic>y<sub>d</sub></italic> with rates as low as 5%. These simulation results can be explained as follows: when the moving objects are small relative to the original image, we can assume that the sparsity of the motion image <italic>K<sub>d</sub></italic> is smaller than <italic>K<sub>b</sub></italic> and <italic>K<sub>t</sub></italic>. According to CS theory, the number of compressive measurements necessary to reconstruct the original image is on the order of <italic>K</italic>log(<italic>N</italic>/<italic>K</italic>). Therefore, if <italic>K<sub>d</sub></italic> &lt; <italic>K<sub>b</sub></italic> ≈ <italic>K<sub>t</sub></italic>, the number of compressive measurements needed for the foreground will be smaller than that needed for the background and test images.</p>
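<p>The scaling argument above can be made concrete with a small worked example. The sketch below evaluates <italic>K</italic>log(<italic>N</italic>/<italic>K</italic>) for a 64 × 64 block (<italic>N</italic> = 4096); the sparsity values <italic>K<sub>d</sub></italic> = 50 and <italic>K<sub>b</sub></italic> = 400, the unit constant factor and the function name are hypothetical choices made purely for illustration, not measured quantities.</p>
<preformat><![CDATA[
```python
import math


def measurement_estimate(k, n, c=1.0):
    """Order-of-magnitude measurement count c * K * log(N / K).

    The constant c is unknown in practice; c = 1 is an assumption.
    """
    if not 0 < k < n:
        raise ValueError("sparsity must satisfy 0 < K < N")
    return math.ceil(c * k * math.log(n / k))


n = 64 * 64          # pixels in one block
k_foreground = 50    # hypothetical sparsity of the motion image (K_d)
k_background = 400   # hypothetical sparsity of the background image (K_b)

m_foreground = measurement_estimate(k_foreground, n)
m_background = measurement_estimate(k_background, n)

# Express both estimates as measurement rates relative to N.
print(m_foreground, round(100.0 * m_foreground / n, 1))
print(m_background, round(100.0 * m_background / n, 1))
```
]]></preformat>
<p>Under these assumed sparsities, the foreground estimate corresponds to a rate of a few percent while the background estimate corresponds to a rate above 20%, which is qualitatively consistent with the trend described above.</p>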
<p><xref ref-type="table" rid="t1-sensors-12-14397">Table 1</xref> compares the reconstruction results obtained with different phase coded masks. Here, the sampling rate is decreased from 100% to 5%, the same TVAL3 recovery algorithm is used to reconstruct the original image, and the SNR is taken as the average of 10 tests. According to <xref ref-type="table" rid="t1-sensors-12-14397">Table 1</xref>, reconstruction with random Gaussian and Toeplitz masks achieves better recovery performance than reconstruction with a random binary mask.</p></sec>
<sec>
<label>6.3.</label>
<title>Performance of Motion Detection Algorithm</title>
<p>As presented earlier, we use a Gaussian mixture distribution to model the background. The foreground detection algorithm described in Section 4.3 is used to detect motion objects in the compressive sampling space. The motion detection algorithms that use random binary, Gaussian, and Toeplitz phase masks are denoted by RB, RG, and RT, respectively, in this paper. <xref ref-type="fig" rid="f7-sensors-12-14397">Figure 7</xref> shows the energy curves computed by using <xref rid="FD17" ref-type="disp-formula">Equation (17)</xref> for the three different phase mask systems with sampling rates of 10%, 50% and 70% in a 64 × 64 CI block that includes a motion target. Comparing random Gaussian, Toeplitz and binary projections, the energy of the compressive measurements is ordered as <italic>E<sub>binary</sub></italic> &gt; <italic>E<sub>gaussian</sub></italic> &gt; <italic>E<sub>toeplitz</sub></italic>. As the sampling rate decreases, the energy values computed with the different phase coded masks all decrease gradually. The CS image is declared to include motion targets according to the following rule:</p>
<disp-formula id="FD23">
<label>(23)</label>
<mml:math id="mm27" display="block">
<mml:semantics id="sm27">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtext>If</mml:mtext>
<mml:mo>log</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>y</mml:mi></mml:msub>
<mml:mo>≥</mml:mo>
<mml:mi mathvariant="italic">threshold</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mtext>motion target</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mtext>true</mml:mtext></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtext>Otherwise</mml:mtext>
<mml:mo>log</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>y</mml:mi></mml:msub>
<mml:mo>&lt;</mml:mo>
<mml:mi mathvariant="italic">threshold</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mtext>motion target</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mtext>false</mml:mtext></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>threshold</italic> = log(<italic>E<sub>bμ</sub></italic> + <italic>Cσ</italic>), <italic>E<sub>y</sub></italic> is the energy computed by using <xref rid="FD17" ref-type="disp-formula">Equation (17)</xref>, and <italic>E<sub>bμ</sub></italic> is the mean energy of the background CS image. <italic>σ</italic> is the standard deviation of the background energy and <italic>C</italic> is a constant.</p>
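<p>The decision rule of Equation (23) can be sketched in a few lines of Python. Because Equation (17) is not reproduced in this section, the sketch assumes, as a stand-in only, that the block energy is the sum of squared measurements; the function names, the toy background energies and the use of the population standard deviation are likewise illustrative assumptions rather than our implementation.</p>
<preformat><![CDATA[
```python
import math
from statistics import mean, pstdev


def energy(measurements):
    """Stand-in for Equation (17): assumed sum of squared measurements."""
    return sum(v * v for v in measurements)


def detection_threshold(background_energies, c):
    """threshold = log(E_b_mean + C * sigma), estimated from background blocks."""
    e_mean = mean(background_energies)
    sigma = pstdev(background_energies)  # population standard deviation (assumed)
    return math.log(e_mean + c * sigma)


def contains_motion_target(measurements, background_energies, c=8.0):
    """Apply the rule: log(E_y) >= threshold means a motion target is present."""
    e_y = energy(measurements)
    if e_y <= 0:
        return False  # log undefined; an all-zero block carries no target energy
    return math.log(e_y) >= detection_threshold(background_energies, c)


# Toy background energies collected from target-free CS blocks.
background = [100.0, 102.0, 98.0, 101.0, 99.0]
print(contains_motion_target([6.0, 6.0, 6.0, 6.0], background))  # energy 144
print(contains_motion_target([5.0, 5.0, 5.0, 5.0], background))  # energy 100
```
]]></preformat>
<p>In this toy setting the first block exceeds the threshold and the second does not. A larger <italic>C</italic> raises the threshold and therefore trades missed detections against false alarms.</p>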
<p>We employ the area under the curve (AUC) metric to evaluate the performance of our motion detection algorithm. <xref ref-type="table" rid="t2-sensors-12-14397">Table 2</xref> shows that the AUC values are affected by the constant <italic>C.</italic> The motion detection performance is best with <italic>C</italic> = 8. The motion detection performance of RB is slightly better than that of RG and RT. In contrast, the reconstruction performance of RG and RT is better than that of RB (see <xref ref-type="table" rid="t1-sensors-12-14397">Table 1</xref>). The latter observation can be explained by CS theory. In [<xref ref-type="bibr" rid="b32-sensors-12-14397">32</xref>], it is proven that random Gaussian and random Toeplitz matrices are incoherent with almost any sparsifying basis Ψ and can thus recover compressive signals with high probability. The binary matrices used in our experiments, however, are 0–1 matrices, and it has been shown that 0–1 matrices require more than O (k log (n/k)) rows to satisfy the RIP [<xref ref-type="bibr" rid="b35-sensors-12-14397">35</xref>]. Therefore, when the sparsity of the original image is fixed, more compressive measurements are needed to recover the original signal with a random binary mask.</p></sec>
<sec>
<label>6.4.</label>
<title>Performance of Our Motion Tracking Algorithm</title>
<sec>
<label>6.4.1.</label>
<title>Tracking Efficiency</title>
<p>To evaluate the performance of our tracking algorithm, three videos were used in the experiments. The first test sequence is an infrared (IR) image sequence that was also used in [<xref ref-type="bibr" rid="b20-sensors-12-14397">20</xref>]. The CAVIAR [<xref ref-type="bibr" rid="b34-sensors-12-14397">34</xref>] and PETS2001 [<xref ref-type="bibr" rid="b36-sensors-12-14397">36</xref>] databases were also used to examine our algorithm in terms of efficiency and accuracy. In our experiments, a random Gaussian projection matrix was applied, with the dictionary dimension reduced from 100% to 83%, 55%, 22% and 10%. All other experimental parameters were kept as in [<xref ref-type="bibr" rid="b20-sensors-12-14397">20</xref>]. <xref ref-type="table" rid="t3-sensors-12-14397">Table 3</xref> records the elapsed time of the <italic>l</italic><sub>1</sub> tracker and our CS tracker for each test experiment. According to <xref ref-type="table" rid="t3-sensors-12-14397">Table 3</xref>, our CS tracker is 4–5 times faster than the <italic>l</italic><sub>1</sub> tracker, even without the dimension reduction operation. As the sampling rate decreases, our CS tracker becomes 10 times faster than the <italic>l</italic><sub>1</sub> tracker. <xref ref-type="fig" rid="f8-sensors-12-14397">Figure 8</xref> shows our tracking results with the three video sequences.</p>
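<p>The dictionary dimension reduction used in these experiments can be illustrated with a toy Python sketch. It assumes the compressive dictionary is formed as <italic>D</italic> = Φ<italic>A</italic> with a random Gaussian Φ whose row count is the sampling rate times the template dimension; the matrix sizes, the random template values, the normalization of Φ and all function names are hypothetical, and the rank routine is a plain Gaussian elimination written only for this illustration.</p>
<preformat><![CDATA[
```python
import random


def gaussian_matrix(rows, cols, rng):
    """Random Gaussian projection matrix with entries N(0, 1/rows) (assumed scaling)."""
    scale = 1.0 / rows ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    found = 0
    for col in range(n_cols):
        if found == n_rows:
            break
        pivot = max(range(found, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[found], m[pivot] = m[pivot], m[found]
        for r in range(found + 1, n_rows):
            factor = m[r][col] / m[found][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[found][c]
        found += 1
    return found


def compress_dictionary(templates, rate, rng):
    """Return D = Phi * A, where Phi keeps round(rate * dimension) rows."""
    dimension = len(templates)
    kept_rows = max(1, round(rate * dimension))
    phi = gaussian_matrix(kept_rows, dimension, rng)
    return matmul(phi, templates)


rng = random.Random(0)
# A: 8 toy templates (columns), each with 20 "pixels" (rows).
A = [[rng.random() for _ in range(8)] for _ in range(20)]
D = compress_dictionary(A, 0.22, rng)
print(len(A), "->", len(D))          # template dimension before and after
print(rank(D) <= rank(A))            # rank cannot increase under projection
```
]]></preformat>
<p>With a 22% rate, each 20-dimensional toy template is represented by 4 compressive measurements, so the optimization operates on a much smaller dictionary.</p>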
<p>From the experimental results we can see that our CS-<italic>l</italic><sub>1</sub> tracking algorithm is computationally much cheaper. First, reducing the dimensionality of the templates speeds up the optimization process. Second, and probably more important, our method can lower the rank of the feature dictionary matrix <italic>A</italic>. Mathematically, <italic>rank</italic>(<italic>AB</italic>) ≤ min {<italic>rank</italic>(<italic>A</italic>), <italic>rank</italic>(<italic>B</italic>)}, therefore <italic>rank</italic>(<italic>D</italic>) = <italic>rank</italic>(Φ<italic>A</italic>) ≤ <italic>rank</italic>(<italic>A</italic>). The rank of the dictionary in our CS-<italic>l</italic><sub>1</sub> tracker is thus no larger than that in the <italic>l</italic><sub>1</sub> tracker, and it is strictly smaller whenever Φ has fewer rows than <italic>rank</italic>(<italic>A</italic>). This noticeably accelerates the convergence of the iterations and hence makes our tracker faster than its counterpart.</p></sec>
<sec>
<label>6.4.2.</label>
<title>Tracking Accuracy</title>
<p>Intuitively, the tracking accuracy will decrease as the sampling rate is reduced. We therefore also compare the tracking accuracy of our tracker with that of the <italic>l</italic><sub>1</sub> tracker. For the PetsD2 video sequence, the red points are the trajectories of the motion target computed by using the <italic>l</italic><sub>1</sub> tracker. Cyan, blue and green points are the positions computed using our method with sampling rates of 22%, 55% and 100%, respectively. As illustrated in <xref ref-type="fig" rid="f9-sensors-12-14397">Figure 9</xref>, the two tracking approaches achieve similar performance on the video sequence at a sampling rate of 100%. As the sampling rate decreases, the position error gradually increases.</p></sec></sec></sec>
<sec sec-type="conclusions">
<label>7.</label>
<title>Conclusions</title>
<p>We have demonstrated that, by using a CI system, we can detect and track objects in motion with significantly fewer data samples than conventional imaging methods. A parallel coded aperture imaging array, based on a phase-coded 4F system, is used to simulate compressive sensing images. A Gaussian mixture model is generated off-line for later use in on-line foreground detection directly in the compressive domain, and a TV optimization algorithm is used for image reconstruction. A real-time CS tracking algorithm is proposed and applied to compressive sensing images. For the compressive imaging system, experimental results show that the quality of the recovered image gradually degrades as the measurement rate decreases. Simulation results also show that random Gaussian or Toeplitz phase masks yield higher-quality reconstructed images than the random binary mask. Motion detection results demonstrate that a low-dimensional compressed imaging representation is sufficient to detect motion targets. The minimum number of measurements needed to perform motion detection in the compressive domain is smaller than the number needed to recover the background and test images. Motion tracking results show that we can construct a compressive dictionary and use it as a template set in the CS image space. With the same <italic>l</italic><sub>1</sub> reconstruction algorithm, our CS tracking method is up to 10 times faster than the <italic>l</italic><sub>1</sub> tracking method.</p></sec></body>
<back>
<ack>
<p>This work is supported by the National Basic Research Program of China (2010CB732505) and the National Natural Science Foundation of China (60903070, 61271375, 60903069, 60902103).</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-sensors-12-14397"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Candes</surname><given-names>E.J.</given-names></name><name><surname>Romberg</surname><given-names>J.</given-names></name><name><surname>Tao</surname><given-names>T.</given-names></name></person-group><article-title>Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information</article-title><source>IEEE Trans. Inform. Theory</source><year>2006</year><volume>52</volume><fpage>489</fpage><lpage>509</lpage><pub-id pub-id-type="doi">10.1109/TIT.2005.862083</pub-id></citation></ref>
<ref id="b2-sensors-12-14397"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donoho</surname><given-names>D.L.</given-names></name></person-group><article-title>Compressed sensing</article-title><source>IEEE Trans. Inform. Theory</source><year>2006</year><volume>52</volume><fpage>1289</fpage><lpage>1306</lpage><pub-id pub-id-type="doi">10.1109/TIT.2006.871582</pub-id></citation></ref>
<ref id="b3-sensors-12-14397"><label>3.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Haupt</surname><given-names>J.</given-names></name><name><surname>Nowak</surname><given-names>R.</given-names></name></person-group><article-title>Compressive sampling <italic>vs.</italic> conventional imaging</article-title><conf-name>Proceedings of International Conference on Image Processing (ICIP)</conf-name><conf-loc>Atlanta, GA, USA</conf-loc><conf-date>8–11 October 2006</conf-date><fpage>1269</fpage><lpage>1272</lpage></citation></ref>
<ref id="b4-sensors-12-14397"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Candes</surname><given-names>E.J.</given-names></name><name><surname>Tao</surname><given-names>T.</given-names></name></person-group><article-title>Near optimal signal recovery from random projections: Universal encoding strategies?</article-title><source>IEEE Trans. Inform. Theory</source><year>2006</year><volume>52</volume><fpage>5406</fpage><lpage>5425</lpage><pub-id pub-id-type="doi">10.1109/TIT.2006.885507</pub-id></citation></ref>
<ref id="b5-sensors-12-14397"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tropp</surname><given-names>J.A.</given-names></name></person-group><article-title>Just relax: Convex programming methods for identifying sparse signals in noise</article-title><source>IEEE Trans. Inform. Theory</source><year>2006</year><volume>52</volume><fpage>1030</fpage><lpage>1051</lpage><pub-id pub-id-type="doi">10.1109/TIT.2005.864420</pub-id></citation></ref>
<ref id="b6-sensors-12-14397"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duarte</surname><given-names>M.F.</given-names></name><name><surname>Davenport</surname><given-names>M.A.</given-names></name><name><surname>Takhar</surname><given-names>D.</given-names></name><name><surname>Laska</surname><given-names>J.N.</given-names></name><name><surname>Sun</surname><given-names>T.</given-names></name><name><surname>Kelly</surname><given-names>K.F.</given-names></name><name><surname>Baraniuk</surname><given-names>R.G.</given-names></name></person-group><article-title>Single-pixel imaging via compressive sampling</article-title><source>IEEE Signal Process. Mag.</source><year>2008</year><volume>25</volume><fpage>83</fpage><lpage>91</lpage><pub-id pub-id-type="doi">10.1109/MSP.2007.914730</pub-id></citation></ref>
<ref id="b7-sensors-12-14397"><label>7.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Marcia</surname><given-names>R.F.</given-names></name><name><surname>Willett</surname><given-names>R.M.</given-names></name></person-group><article-title>Compressive coded aperture video reconstruction</article-title><conf-name>Proceedings of 2008 Sixteenth European Signal Processing Conference</conf-name><conf-loc>Lausanne, Switzerland</conf-loc><conf-date>25–29 August 2008</conf-date></citation></ref>
<ref id="b8-sensors-12-14397"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marcia</surname><given-names>R.F.</given-names></name><name><surname>Harmany</surname><given-names>Z.T.</given-names></name><name><surname>Willett</surname><given-names>R.M.</given-names></name></person-group><article-title>Compressive coded apertures for high-resolution imaging</article-title><source>Proc. SPIE</source><year>2010</year><volume>7723</volume><pub-id pub-id-type="doi">10.1117/12.849487</pub-id></citation></ref>
<ref id="b9-sensors-12-14397"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harmany</surname><given-names>Z.T.</given-names></name><name><surname>Marcia</surname><given-names>R.F.</given-names></name><name><surname>Willett</surname><given-names>R.M.</given-names></name></person-group><article-title>Spatio-temporal compressed sensing with coded apertures and keyed exposures</article-title><source>IEEE Trans. Image Process.</source><year>2011</year><comment>submitted</comment></citation></ref>
<ref id="b10-sensors-12-14397"><label>10.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Fergus</surname><given-names>R.</given-names></name><name><surname>Torralba</surname><given-names>A.</given-names></name><name><surname>Freeman</surname><given-names>W.T.</given-names></name></person-group><source>Random Lens Imaging</source><comment>MIT-CSAIL-TR-2006-058</comment><publisher-name>Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory</publisher-name><publisher-loc>Cambridge, MA, USA</publisher-loc><year>2006</year></citation></ref>
<ref id="b11-sensors-12-14397"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>Q.</given-names></name><name><surname>Shi</surname><given-names>G.M.</given-names></name></person-group><article-title>Super-resolution imager via compressive sensing</article-title><conf-name>Proceedings of 2010 IEEE 10th International Conference on Signal Processing</conf-name><conf-loc>Beijing, China</conf-loc><conf-date>24–28 October 2010</conf-date><fpage>956</fpage><lpage>959</lpage></citation></ref>
<ref id="b12-sensors-12-14397"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Neifeld</surname><given-names>M.A.</given-names></name><name><surname>Shankar</surname><given-names>P.M.</given-names></name></person-group><article-title>Feature-specific imaging</article-title><source>Appl. Opt.</source><year>2003</year><volume>42</volume><fpage>3379</fpage><lpage>3389</lpage><pub-id pub-id-type="doi">10.1364/AO.42.003379</pub-id><pub-id pub-id-type="pmid">12816325</pub-id></citation></ref>
<ref id="b13-sensors-12-14397"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baheti</surname><given-names>P.</given-names></name><name><surname>Neifeld</surname><given-names>M.A.</given-names></name></person-group><article-title>Adaptive feature-specific imaging: A face recognition example</article-title><source>Appl. Opt.</source><year>2008</year><volume>47</volume><fpage>B21</fpage><lpage>B31</lpage><pub-id pub-id-type="doi">10.1364/AO.47.000B21</pub-id><pub-id pub-id-type="pmid">18382548</pub-id></citation></ref>
<ref id="b14-sensors-12-14397"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aggarwal</surname><given-names>A.</given-names></name><name><surname>Biswas</surname><given-names>S.</given-names></name><name><surname>Singh</surname><given-names>E.</given-names></name><name><surname>Sural</surname><given-names>S.</given-names></name><name><surname>Majumdar</surname><given-names>A.K.</given-names></name></person-group><article-title>Object tracking Using Background Subtraction and Motion Estimation in MPEG Videos</article-title><source>Lect. Notes Comput. Sci.</source><year>2006</year><volume>3852</volume><fpage>121</fpage><lpage>130</lpage></citation></ref>
<ref id="b15-sensors-12-14397"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Toreyin</surname><given-names>B.U.</given-names></name><name><surname>Cetin</surname><given-names>A.E.</given-names></name><name><surname>Aksay</surname><given-names>A.</given-names></name><name><surname>Akhan</surname><given-names>M.B.</given-names></name></person-group><article-title>Moving object detection in wavelet compressed video</article-title><source>Signal Process. Image Commun.</source><year>2005</year><volume>20</volume><fpage>255</fpage><lpage>264</lpage><pub-id pub-id-type="doi">10.1016/j.image.2004.12.002</pub-id></citation></ref>
<ref id="b16-sensors-12-14397"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cevher</surname><given-names>V.</given-names></name><name><surname>Sankaranarayanan</surname><given-names>A.</given-names></name><name><surname>Duarte</surname><given-names>M.F.</given-names></name><name><surname>Reddy</surname><given-names>D.</given-names></name><name><surname>Baraniuk</surname><given-names>R.G.</given-names></name><name><surname>Chellappa</surname><given-names>R.</given-names></name></person-group><article-title>Compressive sensing for background subtraction</article-title><source>Lect. Notes Comput. Sci.</source><year>2008</year><volume>5303</volume><fpage>155</fpage><lpage>168</lpage></citation></ref>
<ref id="b17-sensors-12-14397"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname><given-names>H.</given-names></name><name><surname>Deng</surname><given-names>W.</given-names></name><name><surname>Shen</surname><given-names>Z.</given-names></name></person-group><article-title>Surveillance video processing using compressive sensing</article-title><source>AIMS</source><year>2012</year><volume>6</volume><fpage>201</fpage><lpage>214</lpage></citation></ref>
<ref id="b18-sensors-12-14397"><label>18.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Vaswani</surname><given-names>N.</given-names></name></person-group><article-title>Kalman filtered compressed sensing</article-title><conf-name>Proceedings of 15th IEEE International Conference on Image Processing</conf-name><conf-loc>San Diego, CA, USA</conf-loc><conf-date>12–15 October 2008</conf-date><fpage>893</fpage><lpage>896</lpage></citation></ref>
<ref id="b19-sensors-12-14397"><label>19.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>E.</given-names></name><name><surname>Silva</surname><given-names>J.</given-names></name><name><surname>Carin</surname><given-names>L.</given-names></name></person-group><article-title>Compressive particle filtering for target tracking</article-title><conf-name>Proceedings of IEEE/SP 15th Workshop on Statistical Signal Processing</conf-name><conf-loc>Cardiff, Wales, UK</conf-loc><conf-date>31 August–3 September 2009</conf-date><fpage>233</fpage><lpage>236</lpage></citation></ref>
<ref id="b20-sensors-12-14397"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mei</surname><given-names>X.</given-names></name><name><surname>Ling</surname><given-names>H.</given-names></name></person-group><article-title>Robust visual tracking and vehicle classification via sparse representation</article-title><source>IEEE Trans. Pattern Anal. Mach. Int.</source><year>2011</year><volume>33</volume><fpage>2259</fpage><lpage>2272</lpage><pub-id pub-id-type="doi">10.1109/TPAMI.2011.66</pub-id></citation></ref>
<ref id="b21-sensors-12-14397"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rivenson</surname><given-names>Y.</given-names></name><name><surname>Stern</surname><given-names>A.</given-names></name></person-group><article-title>Compressed imaging with a separable sensing operator</article-title><source>IEEE Signal Process. Lett.</source><year>2009</year><volume>16</volume><fpage>449</fpage><lpage>452</lpage><pub-id pub-id-type="doi">10.1109/LSP.2009.2017817</pub-id></citation></ref>
<ref id="b22-sensors-12-14397"><label>22.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Seber</surname><given-names>F.</given-names></name><name><surname>Zou</surname><given-names>Y.M.</given-names></name><name><surname>Ying</surname><given-names>L.</given-names></name></person-group><article-title>Toeplitz block matrices in compressed sensing and their applications in imaging</article-title><conf-name>Proceedings of International Conference on Information Technology and Applications in Biomedicine</conf-name><conf-loc>Shenzhen, China</conf-loc><conf-date>30–31 May 2008</conf-date><fpage>47</fpage><lpage>50</lpage></citation></ref>
<ref id="b23-sensors-12-14397"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yin</surname><given-names>W.</given-names></name><name><surname>Morgan</surname><given-names>S.</given-names></name><name><surname>Yang</surname><given-names>J.</given-names></name><name><surname>Zhang</surname><given-names>Y.</given-names></name></person-group><article-title>Practical compressive sensing with toeplitz and circulant matrices</article-title><source>Proc. SPIE</source><year>2010</year><volume>7744</volume><pub-id pub-id-type="doi">10.1117/12.863527</pub-id></citation></ref>
<ref id="b24-sensors-12-14397"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Friedman</surname><given-names>N.</given-names></name><name><surname>Russell</surname><given-names>S.</given-names></name></person-group><article-title>Image segmentation in video sequences: A probabilistic approach</article-title><conf-name>Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence</conf-name><conf-loc>Providence, RI, USA</conf-loc><conf-date>1–3 August 1997</conf-date><fpage>175</fpage><lpage>181</lpage></citation></ref>
<ref id="b25-sensors-12-14397"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>G.S.</given-names></name><name><surname>Sapiro</surname><given-names>G.</given-names></name><name><surname>Mallat</surname><given-names>S.</given-names></name></person-group><article-title>Solving Inverse Problems with Piecewise Linear Structured Sparsity Estimators: From Gaussian Mixture Models to Structured Sparsity</article-title><source>IEEE Trans. Image Process.</source><year>2012</year><volume>21</volume><fpage>2481</fpage><lpage>2499</lpage><pub-id pub-id-type="doi">10.1109/TIP.2011.2176743</pub-id><pub-id pub-id-type="pmid">22180506</pub-id></citation></ref>
<ref id="b26-sensors-12-14397"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dempster</surname><given-names>A.</given-names></name><name><surname>Laird</surname><given-names>N.</given-names></name><name><surname>Rubin</surname><given-names>D.</given-names></name></person-group><article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title><source>J. Roy. Stat. Soc. Ser. B Met.</source><year>1977</year><volume>39</volume><fpage>1</fpage><lpage>38</lpage></citation></ref>
<ref id="b27-sensors-12-14397"><label>27.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Stauffer</surname><given-names>C.</given-names></name><name><surname>Grimson</surname><given-names>W.</given-names></name></person-group><article-title>Adaptive background mixture models for real-time tracking</article-title><conf-name>Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition</conf-name><conf-loc>Fort Collins, CO, USA</conf-loc><conf-date>23–25 June 1999</conf-date></citation></ref>
<ref id="b28-sensors-12-14397"><label>28.</label><citation citation-type="web"><article-title>TVAL3: TV minimization by Augmented Lagrangian and ALternating direction Algorithms</article-title><comment>Available online: <ext-link xlink:href="http://www.caam.rice.edu/~optimization/L1/TVAL3/" ext-link-type="uri">http://www.caam.rice.edu/~optimization/L1/TVAL3/</ext-link> (accessed on 17 October 2012)</comment></citation></ref>
<ref id="b29-sensors-12-14397"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tropp</surname><given-names>J.</given-names></name><name><surname>Gilbert</surname><given-names>A.</given-names></name></person-group><article-title>Signal recovery from random measurements via orthogonal matching pursuit</article-title><source>IEEE Trans. Inform. Theory</source><year>2007</year><volume>53</volume><fpage>4655</fpage><lpage>4666</lpage><pub-id pub-id-type="doi">10.1109/TIT.2007.909108</pub-id></citation></ref>
<ref id="b30-sensors-12-14397"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Figueiredo</surname><given-names>M.A.T.</given-names></name><name><surname>Nowak</surname><given-names>R.D.</given-names></name><name><surname>Wright</surname><given-names>S.J.</given-names></name></person-group><article-title>Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems</article-title><source>IEEE J. Sel. Top. Signal Process.</source><year>2007</year><volume>1</volume><fpage>586</fpage><lpage>598</lpage></citation></ref>
<ref id="b31-sensors-12-14397"><label>31.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Efron</surname><given-names>B.</given-names></name><name><surname>Hastie</surname><given-names>T.</given-names></name><name><surname>Johnstone</surname><given-names>I.</given-names></name><name><surname>Tibshirani</surname><given-names>R.</given-names></name></person-group><article-title>Least angle regression</article-title><source>Ann. Stat.</source><year>2004</year><volume>32</volume><fpage>407</fpage><lpage>499</lpage><pub-id pub-id-type="doi">10.1214/009053604000000067</pub-id></citation></ref>
<ref id="b32-sensors-12-14397"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romberg</surname><given-names>J.</given-names></name></person-group><article-title>Compressive sensing by random convolution</article-title><source>SIAM J. Imaging Sci.</source><year>2009</year><volume>2</volume><fpage>1098</fpage><lpage>1128</lpage><pub-id pub-id-type="doi">10.1137/08072975X</pub-id></citation></ref>
<ref id="b33-sensors-12-14397"><label>33.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berinde</surname><given-names>R.</given-names></name><name><surname>Indyk</surname><given-names>P.</given-names></name></person-group><article-title>Sparse recovery using sparse random matrices</article-title><source>Lect. Notes Comput. Sci.</source><year>2010</year><volume>6034</volume><fpage>157</fpage><lpage>167</lpage></citation></ref>
<ref id="b34-sensors-12-14397"><label>34.</label><citation citation-type="web"><article-title>CAVIAR: Context Aware Vision using Image-based Active Recognition</article-title><comment>Available online: <ext-link xlink:href="http://homepages.inf.ed.ac.uk/rbf/CAVIAR/" ext-link-type="uri">http://homepages.inf.ed.ac.uk/rbf/CAVIAR/</ext-link> (accessed on 19 October 2012)</comment></citation></ref>
<ref id="b35-sensors-12-14397"><label>35.</label><citation citation-type="web"><person-group person-group-type="author"><name><surname>Chandar</surname><given-names>V.</given-names></name></person-group><article-title>A Negative Result Concerning Explicit Matrices with the Restricted Isometry Property</article-title><comment>Available online: <ext-link xlink:href="http://www.projectedu.com/a-negative-result-concerning-explicit-matrices-with-the-restricted/" ext-link-type="uri">http://www.projectedu.com/a-negative-result-concerning-explicit-matrices-with-the-restricted/</ext-link> (accessed on 17 October 2012)</comment></citation></ref>
<ref id="b36-sensors-12-14397"><label>36.</label><citation citation-type="web"><article-title>Performance Evaluation of Surveillance Systems</article-title><comment>Available online: <ext-link xlink:href="http://www.research.ibm.com/peoplevision/performanceevaluation.html" ext-link-type="uri">http://www.research.ibm.com/peoplevision/performanceevaluation.html</ext-link> (accessed on 17 October 2012)</comment></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-sensors-12-14397" position="float">
<label>Figure 1.</label>
<caption>
<p>(<bold>a</bold>) Optical compressive imaging system. (<bold>b</bold>) A typical 4F optical system.</p></caption>
<graphic xlink:href="sensors-12-14397f1.gif"/></fig>
<fig id="f2-sensors-12-14397" position="float">
<label>Figure 2.</label>
<caption>
<p>Calculation of CS motion target.</p></caption>
<graphic xlink:href="sensors-12-14397f2.gif"/></fig>
<fig id="f3-sensors-12-14397" position="float">
<label>Figure 3.</label>
<caption>
<p>Detection and tracking framework using CS images.</p></caption>
<graphic xlink:href="sensors-12-14397f3.gif"/></fig>
<fig id="f4-sensors-12-14397" position="float">
<label>Figure 4.</label>
<caption>
<p>Different mask types.</p></caption>
<graphic xlink:href="sensors-12-14397f4.gif"/></fig>
<fig id="f5-sensors-12-14397" position="float">
<label>Figure 5.</label>
<caption>
<p>Original image and the corresponding compressive image generated on the MATLAB simulation platform. (<bold>a</bold>) Original image; (<bold>b</bold>) CS image obtained with a random Gaussian phase-coded mask.</p></caption>
<graphic xlink:href="sensors-12-14397f5.gif"/></fig>
<fig id="f6-sensors-12-14397" position="float">
<label>Figure 6.</label>
<caption>
<p>(<bold>a</bold>) Reconstruction of background images and test images at sampling rates from 50% to 5% with 800 iterations. (<bold>b</bold>) The foreground compressive image reconstructed at sampling rates from 50% to 5% with 800 iterations.</p></caption>
<graphic xlink:href="sensors-12-14397f6.gif"/></fig>
<fig id="f7-sensors-12-14397" position="float">
<label>Figure 7.</label>
<caption>
<p>Energy curves computed in a 64 × 64 CI block using different phase masks at sampling rates of 70%, 50%, and 10%, respectively.</p></caption>
<graphic xlink:href="sensors-12-14397f7.gif"/></fig>
<fig id="f8-sensors-12-14397" position="float">
<label>Figure 8.</label>
<caption>
<p>The tracking results with our CS tracker.</p></caption>
<graphic xlink:href="sensors-12-14397f8.gif"/></fig>
<fig id="f9-sensors-12-14397" position="float">
<label>Figure 9.</label>
<caption>
<p>The positions of moving targets computed using our method and the <italic>l</italic><sub>1</sub> tracker for the PETS sequences.</p></caption>
<graphic xlink:href="sensors-12-14397f9.gif"/></fig>
<table-wrap id="t1-sensors-12-14397" position="float">
<label>Table 1.</label>
<caption>
<p>Reconstruction performance with different phase-coded mask types.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top"><bold>SNR</bold></th>
<th align="center" valign="top"><bold>100%</bold></th>
<th align="center" valign="top"><bold>70%</bold></th>
<th align="center" valign="top"><bold>50%</bold></th>
<th align="center" valign="top"><bold>30%</bold></th>
<th align="center" valign="top"><bold>10%</bold></th>
<th align="center" valign="top"><bold>5%</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Binary</td>
<td align="center" valign="top">32</td>
<td align="center" valign="top">15.9</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top">10.3</td>
<td align="center" valign="top">7.2</td>
<td align="center" valign="top">5.7</td></tr>
<tr>
<td align="center" valign="top">Gaussian</td>
<td align="center" valign="top">32.1</td>
<td align="center" valign="top">26.6</td>
<td align="center" valign="top">20.3</td>
<td align="center" valign="top">14.4</td>
<td align="center" valign="top">9.2</td>
<td align="center" valign="top">7.4</td></tr>
<tr>
<td align="center" valign="top">Toeplitz</td>
<td align="center" valign="top">32</td>
<td align="center" valign="top">25.7</td>
<td align="center" valign="top">19.5</td>
<td align="center" valign="top">14.1</td>
<td align="center" valign="top">9.0</td>
<td align="center" valign="top">7.3</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-sensors-12-14397" position="float">
<label>Table 2.</label>
<caption>
<p>AUC for motion detection using different thresholds at sampling rates of 50% and 10%.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"><bold>AUC</bold></th>
<th align="center" valign="top"><bold>RB (50%)</bold></th>
<th align="center" valign="top"><bold>RG (50%)</bold></th>
<th align="center" valign="top"><bold>RT (50%)</bold></th>
<th align="center" valign="top"><bold>RB (10%)</bold></th>
<th align="center" valign="top"><bold>RG (10%)</bold></th>
<th align="center" valign="top"><bold>RT (10%)</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top"><italic>th</italic> = log(<italic>E<sub>bu</sub></italic> + 6<italic>σ</italic>)</td>
<td align="center" valign="top">0.975</td>
<td align="center" valign="top">0.8875</td>
<td align="center" valign="top">0.9375</td>
<td align="center" valign="top">0.9625</td>
<td align="center" valign="top">0.825</td>
<td align="center" valign="top">0.8</td></tr>
<tr>
<td align="center" valign="top"><italic>th</italic> = log(<italic>E<sub>bu</sub></italic> + 8<italic>σ</italic>)</td>
<td align="center" valign="top">0.975</td>
<td align="center" valign="top">0.9625</td>
<td align="center" valign="top">0.9625</td>
<td align="center" valign="top">0.95</td>
<td align="center" valign="top">0.9625</td>
<td align="center" valign="top">0.9625</td></tr>
<tr>
<td align="center" valign="top"><italic>th</italic> = log(<italic>E<sub>bu</sub></italic> + 15<italic>σ</italic>)</td>
<td align="center" valign="top">0.9375</td>
<td align="center" valign="top">0.95</td>
<td align="center" valign="top">0.95</td>
<td align="center" valign="top">0.925</td>
<td align="center" valign="top">0.95</td>
<td align="center" valign="top">0.95</td></tr></tbody></table></table-wrap>
<table-wrap id="t3-sensors-12-14397" position="float">
<label>Table 3.</label>
<caption>
<p>Running time of the <italic>l</italic><sub>1</sub> tracker and our CS tracker with 300 particles.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top"/>
<th align="center" valign="top"><bold><italic>l</italic><sub>1</sub> tracker</bold></th>
<th align="center" valign="top"><bold>Our 100%</bold></th>
<th align="center" valign="top"><bold>Our 83%</bold></th>
<th align="center" valign="top"><bold>Our 55%</bold></th>
<th align="center" valign="top"><bold>Our 22%</bold></th>
<th align="center" valign="top"><bold>Our 10%</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">IR image</td>
<td align="center" valign="top">4.6 s</td>
<td align="center" valign="top">1 s</td>
<td align="center" valign="top">0.77 s</td>
<td align="center" valign="top">0.56 s</td>
<td align="center" valign="top">0.50 s</td>
<td align="center" valign="top">0.45 s</td></tr>
<tr>
<td align="center" valign="top">CAVIAR</td>
<td align="center" valign="top">4.79 s</td>
<td align="center" valign="top">0.91 s</td>
<td align="center" valign="top">0.68 s</td>
<td align="center" valign="top">0.61 s</td>
<td align="center" valign="top">0.55 s</td>
<td align="center" valign="top">0.51 s</td></tr>
<tr>
<td align="center" valign="top">PETS</td>
<td align="center" valign="top">5.14 s</td>
<td align="center" valign="top">0.72 s</td>
<td align="center" valign="top">0.63 s</td>
<td align="center" valign="top">0.57 s</td>
<td align="center" valign="top">0.51 s</td>
<td align="center" valign="top">0.47 s</td></tr></tbody></table></table-wrap></sec></back></article>
