<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">REMSE</journal-id>
<journal-title>Remote Sensing</journal-title>
<issn pub-type="epub">2072-4292</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/rs4041090</article-id>
<article-id pub-id-type="publisher-id">remotesensing-04-01090</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>A Real-Time Method to Detect and Track Moving Objects (DATMO) from Unmanned Aerial Vehicles (UAVs) Using a Single Camera</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Rodríguez-Canosa</surname><given-names>Gonzalo R.</given-names></name><xref ref-type="aff" rid="af1-remotesensing-04-01090"><sup>1</sup></xref><xref ref-type="corresp" rid="c1-remotesensing-04-01090"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Thomas</surname><given-names>Stephen</given-names></name><xref ref-type="aff" rid="af2-remotesensing-04-01090"><sup>2</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>del Cerro</surname><given-names>Jaime</given-names></name><xref ref-type="aff" rid="af1-remotesensing-04-01090"><sup>1</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Barrientos</surname><given-names>Antonio</given-names></name><xref ref-type="aff" rid="af1-remotesensing-04-01090"><sup>1</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>MacDonald</surname><given-names>Bruce</given-names></name><xref ref-type="aff" rid="af2-remotesensing-04-01090"><sup>2</sup></xref></contrib></contrib-group>
<aff id="af1-remotesensing-04-01090">
<label>1</label> Centro de Automática &amp; Robótica, Universidad Politécnica de Madrid, C/Jose Gutierrez Abascal nº2, E-28006 Madrid, Spain; E-Mails: <email>j.cerro@upm.es</email> (J.C.); <email>antonio.barrientos@upm.es</email> (A.B.)</aff>
<aff id="af2-remotesensing-04-01090">
<label>2</label> Department of Electrical and Computer Engineering, University of Auckland, Private Bag 92019, Auckland, New Zealand; E-Mails: <email>s.thomas@auckland.ac.nz</email> (S.T.); <email>b.macdonald@auckland.ac.nz</email> (B.M.)</aff>
<author-notes>
<corresp id="c1-remotesensing-04-01090">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>gonzalo.rcanosa@upm.es</email>; Tel.: +34-91-336-3061.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>04</month>
<year>2012</year></pub-date>
<volume>4</volume>
<issue>4</issue>
<fpage>1090</fpage>
<lpage>1111</lpage>
<history>
<date date-type="received">
<day>29</day>
<month>02</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>11</day>
<month>04</month>
<year>2012</year></date>
<date date-type="accepted">
<day>13</day>
<month>04</month>
<year>2012</year></date></history>
<copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (<ext-link xlink:href="http://creativecommons.org/licenses/by/3.0/" ext-link-type="uri">http://creativecommons.org/licenses/by/3.0/</ext-link>).</p></license>
<abstract>
<p>We develop a real-time method to detect and track moving objects (DATMO) from unmanned aerial vehicles (UAVs) using a single camera. The challenging characteristics of these vehicles, such as continuous unrestricted pose variation and low-frequency vibrations, call for new approaches. The main concept proposed in this work is to create an artificial optical flow field by estimating the camera motion between two consecutive video frames. The core of the methodology consists of comparing this artificial flow with the real optical flow calculated directly from the video feed. The motion of the UAV between frames is estimated with available parallel tracking and mapping techniques that identify good static features in the images and follow them between frames. Comparing the two optical flows yields a list of dynamic pixels, which are then grouped into dynamic objects. Tracking these dynamic objects through time and space provides a filtering procedure that eliminates spurious events and misdetections. The algorithms have been tested on a quadrotor platform using a commercial camera.</p></abstract>
<kwd-group>
<kwd>UAV</kwd>
<kwd>DATMO</kwd>
<kwd>optical flow</kwd>
<kwd>surveillance</kwd>
<kwd>homography-based optical flow</kwd>
<kwd>Kalman Filter Tracking</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Detection and tracking of dynamic objects has become an important field for many multidisciplinary applications, such as traffic supervision [<xref ref-type="bibr" rid="b1-remotesensing-04-01090">1</xref>], autonomous robot navigation [<xref ref-type="bibr" rid="b2-remotesensing-04-01090">2</xref>,<xref ref-type="bibr" rid="b3-remotesensing-04-01090">3</xref>], and surveillance of large facilities [<xref ref-type="bibr" rid="b4-remotesensing-04-01090">4</xref>]. This article focuses primarily on the detection of moving objects from aerial vehicles for surveillance, although other applications could also benefit from the results.</p>
<p>Prior work on dynamic image analysis from moving vehicles can be divided into four main topics [<xref ref-type="bibr" rid="b5-remotesensing-04-01090">5</xref>]: background subtraction methods, sparse feature tracking methods, background modeling techniques and robot motion models.</p>
<p>Background subtraction methods, mostly used with stationary cameras, separate foreground moving objects from the background [<xref ref-type="bibr" rid="b6-remotesensing-04-01090">6</xref>,<xref ref-type="bibr" rid="b7-remotesensing-04-01090">7</xref>]. Other approaches use stereo disparity background models [<xref ref-type="bibr" rid="b8-remotesensing-04-01090">8</xref>] for people tracking. Kalafatic <italic>et al.</italic> [<xref ref-type="bibr" rid="b9-remotesensing-04-01090">9</xref>] propose a real-time system, based on computing sparse optical flow along contours, that detects and tracks quasi-rigid moving objects for pharmaceutical purposes. Zhang <italic>et al.</italic> [<xref ref-type="bibr" rid="b10-remotesensing-04-01090">10</xref>] use log-polar images to enhance the performance of optical flow estimation methods; in this case, the optical flow is only computed along the edges of the moving features. Because these two methods use static cameras, the moving contours are easily determined: static pixels do not change their position in the image.</p>
<p>These techniques are not sufficient when the camera is attached to a moving robot. Under these recording conditions, adaptive background models [<xref ref-type="bibr" rid="b11-remotesensing-04-01090">11</xref>] have been used because they can incorporate changes in the images produced by illumination variations in outdoor scenes or background changes due to small camera motions. However, these methods are not robust to rapid scene changes, under which they usually fail. To improve detection under such conditions, the camera motion model can be constrained. For example, Franke <italic>et al.</italic> [<xref ref-type="bibr" rid="b12-remotesensing-04-01090">12</xref>] developed an obstacle detection method for urban traffic situations that assumes forward camera motion and deals with rotation by means of rotation motion templates. Other methods include multiple degrees of freedom in the egomotion calculation, although most of this research focuses on cameras mounted on ground vehicles, whose movement is therefore constrained [<xref ref-type="bibr" rid="b13-remotesensing-04-01090">13</xref>]. Other sensors, such as LIDARs, have also been used to detect and track dynamic objects [<xref ref-type="bibr" rid="b14-remotesensing-04-01090">14</xref>].</p>
<p>Techniques for tracking point features have been used on ground-level moving platforms, with both monocular [<xref ref-type="bibr" rid="b15-remotesensing-04-01090">15</xref>] and stereo [<xref ref-type="bibr" rid="b16-remotesensing-04-01090">16</xref>] approaches, to determine the movement of the robot and to construct maps of the terrain [<xref ref-type="bibr" rid="b17-remotesensing-04-01090">17</xref>]. Jia <italic>et al.</italic> [<xref ref-type="bibr" rid="b18-remotesensing-04-01090">18</xref>] proposed an extended Kalman filter algorithm to estimate the state of a target, using optical flow vectors, color features and stereo pair disparities as visual features. Each of these approaches for ground vehicles imposes a different set of constraints on the determination of the optical flow. Aerial vehicles require quite different approaches because of their additional freedom of movement. Some of the most common methods are described below.</p>
<p>As shown by Miller <italic>et al.</italic> [<xref ref-type="bibr" rid="b19-remotesensing-04-01090">19</xref>], one possible approach is to use background subtraction methods with a combination of intensity thresholding (for IR imagery), motion compensation and pattern classification. Chung <italic>et al.</italic> [<xref ref-type="bibr" rid="b20-remotesensing-04-01090">20</xref>] applied accumulative frame differencing to detect pixels with motion and combined these pixels with homogeneous regions in the frame obtained by image segmentation. Other methods use optical flow as the main analysis technique. For example, Samija <italic>et al.</italic> [<xref ref-type="bibr" rid="b21-remotesensing-04-01090">21</xref>] segmented the optical flow of an omnidirectional camera; in this case the movement of the camera was known and the optical flow vectors were mapped onto a sphere. Also using optical flow, Suganuma <italic>et al.</italic> [<xref ref-type="bibr" rid="b22-remotesensing-04-01090">22</xref>] presented a stereo system that builds occupancy grids and determines the direction and speed of dynamic objects for safe driving.</p>
<p>Herein, we develop a new method that combines egomotion determination based on static point features with optical flow comparison to identify pixels that belong to dynamic objects. The frame differencing procedure proposed by Chung <italic>et al.</italic> [<xref ref-type="bibr" rid="b20-remotesensing-04-01090">20</xref>] would not work in our case because of the high-frequency vibrations in the movement of currently available commercial UAVs; in contrast, our method tracks only static features to determine the movement of the camera. Like Suganuma <italic>et al.</italic> [<xref ref-type="bibr" rid="b22-remotesensing-04-01090">22</xref>] we use optical flow techniques, but instead of a stereo vision system we need only a single camera to obtain a list of all possible dynamic objects in the environment. Samija <italic>et al.</italic> [<xref ref-type="bibr" rid="b21-remotesensing-04-01090">21</xref>] also employed a single camera, but the movement of the camera was known prior to the optical flow calculation. In our case the camera motion is estimated without any additional sensor. Another advantage of our method is that, owing to its mathematical simplicity, it can be executed in real time on the onboard UAV computer.</p>
<p>The paper is organized as follows. Section 2 presents a general overview of the algorithm and briefly discusses each part and the interconnections between them. Section 3 describes the new methods proposed to calculate the optical flows, together with the heuristic rules defined to compare the two flows. Section 4 discusses the object definition procedure and the filtering and matching techniques used to track the real dynamic objects. Section 5 presents the hardware setup used to test these algorithms and the results obtained; these tests were carried out with a commercial quadrotor recording videos of a landscape. Finally, Section 6 highlights the conclusions, advantages and shortcomings of the procedure, points out fields where this technology might be applied, and mentions future work aimed at adding functionality to the algorithm.</p>
<sec>
<label>2.</label>
<title>Methodology Overview</title>
<p>The main problem to solve when detecting moving objects from a flying UAV is to separate the changes in the image caused by the movement of the vehicle from those caused by dynamic objects. Although this problem is not limited to aerial vehicles, it is harder with UAVs because they have more degrees of freedom. In this case the input data take the form of a continuous flow of images produced by a single grayscale camera, from which we must obtain the position and velocity of the dynamic objects in the scene.</p>
<p>The main part of the proposed methodology consists of comparing an artificial optical flow, based on the movement of the camera, with the real optical flow, and tracking the discrepancies. The complete architecture of the system is shown in <xref ref-type="fig" rid="f1-remotesensing-04-01090">Figure 1</xref>. The core of this algorithm, highlighted in <xref ref-type="fig" rid="f1-remotesensing-04-01090">Figure 1</xref>, is the calculation of an artificial optical flow and its comparison with the real optical flow. We developed this method because it permits analysis of the whole image while using only a very small set of pixels in the actual comparison. Extrapolating the information obtained from this set is sufficient to detect and track moving objects in the whole image.</p>
<p>Besides this overall scheme, the method comprises the following elements, each performing a different function:
<list list-type="bullet">
<list-item>
<p><italic>Image Sequence:</italic> In general, our method should work with any type of image sequence, provided that the resolution is adequate. Both three-channel (color) and single-channel (grayscale) images can be handled.</p></list-item>
<list-item>
<p><italic>Motion Estimation:</italic> A method to estimate the movement of the camera is crucial for the performance of the algorithm. Klein <italic>et al.</italic> [<xref ref-type="bibr" rid="b23-remotesensing-04-01090">23</xref>] developed a Parallel Tracking and Mapping (PTAM) algorithm that estimates camera pose in an unknown scene with a single handheld camera for small augmented reality (AR) workspaces. It runs two parallel threads, one tracking the camera and one mapping the previously unknown scene. We have modified this method to adapt it to our working conditions. For our purposes, the most important thread is the tracking one, which estimates the camera position in the map that the second thread dynamically generates and updates. The map consists of 3D point features that are tracked through time across previously observed video frames. Collecting data in this way permits the use of batch optimization techniques, which are rather uncommon in real-time systems because they are computationally expensive. The system is designed to produce detailed maps with thousands of features in small restricted areas such as an office, and it tracks the features at frame rate with great accuracy and robustness. Further details can be found in [<xref ref-type="bibr" rid="b23-remotesensing-04-01090">23</xref>].</p></list-item>
<list-item>
<p><italic>Optical flows:</italic> For the image sequence, both a real and an artificial optical flow are obtained. This requires a careful selection of the pixels later used to calculate the optical flows. The selection is carried out with procedures adapted from the tree-based 9-point FAST feature detection process described by Rosten <italic>et al.</italic> [<xref ref-type="bibr" rid="b24-remotesensing-04-01090">24</xref>]. The initially selected group of features is generally too large to be processed in real time, so a reduced subset must be chosen; this reduction is achieved by non-maximal suppression of the FAST features. Two optical flows are used in order to enhance the differences produced by dynamic objects. As described in Section 3, the real optical flow is calculated with the iterative Lucas–Kanade method with pyramids [<xref ref-type="bibr" rid="b25-remotesensing-04-01090">25</xref>], while the artificial optical flow is calculated by means of a homography.</p></list-item>
<list-item>
<p><italic>Identification of dynamic pixels:</italic> This part of the work introduces a new approach for identifying moving objects from a moving camera. Pixels that might belong to a dynamic object are identified by comparing the real and the artificial optical flows, and the discrepancies between the two flows are evaluated with a two-stage filter. Since an optical flow is a vector field, the first step analyzes the direction of the two flow vectors at each pixel: large variations in direction (more than 20°) indicate a discrepancy, and the pixel is flagged as dynamic. The second step compares the magnitudes of the vectors, looking for differences greater than a predefined threshold (30% in our case); these pixels are also flagged as dynamic.</p></list-item>
<list-item>
<p><italic>Pixel association:</italic> The comparison of the flows yields a subset of real optical flow vectors that might belong to dynamic objects. These vectors may come from actual dynamic objects, from spurious events, or simply from errors in the optical flow calculation, caused for example by patterned textures or homogeneous surfaces. Different filtering techniques are used to discard these erroneous pixels and to associate the remaining ones into dynamic objects. To minimize the amount of information processed, each group of vectors is represented by a rectangular area described by a center coordinate and two dimensions. A list of these potential dynamic object representations is generated and stored for further filtering and temporal tracking.</p></list-item>
<list-item>
<p><italic>Filtering and temporal tracking of dynamic objects:</italic> The process described above relies on the comparison of two consecutive images of the selected sequence. To track dynamic objects efficiently and discard possible misidentifications, temporal constraints must be incorporated into the algorithm.</p></list-item></list></p>
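<p>The two-stage comparison and the pixel association step listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 20° and 30% thresholds come from the text, but measuring the magnitude difference relative to the artificial flow, wrapping angle differences into [0°, 180°], and grouping flagged pixels with a hypothetical proximity threshold <monospace>max_gap</monospace> before enclosing each group in a rectangle (center plus two dimensions) are our own choices.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from dataclasses import dataclass

ALPHA_T_DEG = 20.0     # angle threshold from the text
MAG_THRESHOLD = 0.30   # 30% magnitude threshold from the text


@dataclass
class Rect:
    """Candidate dynamic object: center coordinates and two dimensions."""
    cx: float
    cy: float
    w: float
    h: float


def is_dynamic(real, artificial):
    """Two-stage comparison of a real and an artificial flow vector (dx, dy)."""
    # Stage 1: direction. The difference is wrapped into [0, 180] degrees
    # (an assumption, so that 179 and -179 degrees count as 2 degrees apart).
    a_r = math.degrees(math.atan2(real[1], real[0]))
    a_a = math.degrees(math.atan2(artificial[1], artificial[0]))
    d_alpha = abs(a_r - a_a) % 360.0
    d_alpha = min(d_alpha, 360.0 - d_alpha)
    if d_alpha > ALPHA_T_DEG:
        return True
    # Stage 2: magnitude, measured relative to the artificial (camera-induced)
    # flow; the choice of reference and the zero guard are assumptions.
    m_r = math.hypot(*real)
    m_a = math.hypot(*artificial)
    return abs(m_r - m_a) / max(m_a, 1e-9) > MAG_THRESHOLD


def group_dynamic_pixels(points, max_gap=20.0):
    """Link flagged pixels closer than max_gap (hypothetical) and box each group."""
    unvisited = set(range(len(points)))
    rects = []
    while unvisited:
        stack = [unvisited.pop()]
        members = []
        while stack:  # depth-first search over the proximity graph
            i = stack.pop()
            members.append(points[i])
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
        xs = [p[0] for p in members]
        ys = [p[1] for p in members]
        rects.append(Rect((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2,
                          max(xs) - min(xs), max(ys) - min(ys)))
    return rects


def find_dynamic_objects(pixels, real_flow, artificial_flow, max_gap=20.0):
    """Flag dynamic pixels, then associate them into candidate objects."""
    flagged = [p for p, r, a in zip(pixels, real_flow, artificial_flow)
               if is_dynamic(r, a)]
    return group_dynamic_pixels(flagged, max_gap)


objects = find_dynamic_objects(
    pixels=[(10, 10), (100, 100), (105, 102)],
    real_flow=[(5.0, 0.0), (5.0, 0.0), (0.0, 5.0)],
    artificial_flow=[(5.0, 0.5), (3.0, 0.0), (5.0, 0.0)],
)
print(objects)  # [Rect(cx=102.5, cy=101.0, w=5, h=2)]
```
]]></preformat>
<p>Grouping by pixel distance is only one simple way to associate the flagged vectors; the text does not specify which filtering techniques are used, so <monospace>max_gap</monospace> would have to be tuned to the image resolution and flight altitude.</p>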
<p>In the following sections we describe the basic principles of this new approach in more detail. The first step is to show how the optical flows are calculated and which heuristic rules have been defined to compare them.</p></sec></sec>
<sec>
<label>3.</label>
<title>Motion through Optical Flow Difference</title>
<p>In this section we describe the process of identifying the pixels that might belong to a dynamic object, which consists of calculating the real optical flow and an artificial one and comparing them. This part is the core of the algorithm and is based on a new concept that, to our knowledge, has not been used previously in the literature [<xref ref-type="bibr" rid="b6-remotesensing-04-01090">6</xref>,<xref ref-type="bibr" rid="b7-remotesensing-04-01090">7</xref>,<xref ref-type="bibr" rid="b13-remotesensing-04-01090">13</xref>,<xref ref-type="bibr" rid="b20-remotesensing-04-01090">20</xref>,<xref ref-type="bibr" rid="b21-remotesensing-04-01090">21</xref>]. Unlike the methods in [<xref ref-type="bibr" rid="b6-remotesensing-04-01090">6</xref>] and [<xref ref-type="bibr" rid="b7-remotesensing-04-01090">7</xref>], our method uses a freely moving camera instead of a stationary one. In [<xref ref-type="bibr" rid="b13-remotesensing-04-01090">13</xref>] moving cameras were mounted on terrestrial vehicles, but their movements were constrained, so those experimental conditions do not apply to UAVs. Our method, intended for images taken from a UAV, must work in real time without any movement constraints. Given the complexity of these recording conditions, optical flow techniques provide a means to analyze the whole set of video images while focusing the calculations on a small number of selected pixels.</p>
<p>The PTAM algorithm runs onboard at a maximum frequency of 10 Hz [<xref ref-type="bibr" rid="b26-remotesensing-04-01090">26</xref>]. Since the input image sequence streams at 30 Hz, the motion estimator works with one third of the available images. Even that number of frames exceeds what our purposes require. To tackle this problem, we adapt the working frequency of our algorithm to the movement of the UAV. When the UAV is moving fast we work at the highest available frequency (10 Hz), but if the movement is slow we wait for the UAV to move a certain distance so that the image changes. We define a minimum working frequency of 5 Hz for the case in which the UAV stays over the same place (hovering).</p>
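<p>A minimal sketch of this adaptive working frequency, assuming that each PTAM pose arrives with a timestamp and a camera position. The 10 Hz and 5 Hz limits come from the text; the displacement threshold <monospace>MIN_DISPLACEMENT_M</monospace> is a hypothetical value, since the text only refers to "a certain distance".</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

MAX_RATE_HZ = 10.0         # PTAM pose rate onboard (from the text)
MIN_RATE_HZ = 5.0          # minimum working frequency while hovering (from the text)
MIN_DISPLACEMENT_M = 0.5   # hypothetical: motion needed before the image changes enough


class FrameScheduler:
    """Decides which PTAM-tracked frames are passed to the flow comparison."""

    def __init__(self):
        self.last_time = None
        self.last_position = None

    def should_process(self, t, position):
        """t: timestamp in seconds; position: camera position (x, y, z) from PTAM."""
        if self.last_time is None:
            accept = True
        else:
            dt = t - self.last_time
            if dt < 1.0 / MAX_RATE_HZ - 1e-9:
                return False  # no new pose can be available yet
            moved = math.dist(position, self.last_position)
            # Fast motion: process as soon as the UAV has moved far enough.
            # Hovering: still process once 1/MIN_RATE_HZ seconds have elapsed.
            accept = moved >= MIN_DISPLACEMENT_M or dt >= 1.0 / MIN_RATE_HZ - 1e-9
        if accept:
            self.last_time = t
            self.last_position = position
        return accept


sched = FrameScheduler()
hover = [sched.should_process(t, (0.0, 0.0, 10.0)) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]
print(hover)  # [True, False, True, False, True]
```
]]></preformat>
<p>While hovering, every second 10 Hz pose is used, which gives the 5 Hz floor; during fast flight every pose passes the displacement test, so the algorithm runs at the full 10 Hz.</p>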
<p>Below we describe how the two optical flows are calculated and the rules imposed for their straightforward comparison.</p>
<sec>
<label>3.1.</label>
<title>Real Optical Flow</title>
<p>To obtain the real optical flow, the first step is to select the group of features to be tracked in two consecutive frames. We used two different approaches to determine the feature set; <xref ref-type="fig" rid="f2-remotesensing-04-01090">Figure 2</xref> shows, as an example, the results of applying both to the same frame. As shown in <xref ref-type="fig" rid="f2-remotesensing-04-01090">Figure 2(a)</xref>, the first approach relies on the classical method of selecting pixels on a regular grid in the image. These pixels are selected merely by their position, without any consideration of their contrast or surrounding pixels. The main problem with this approach is that pixels with poor tracking properties (e.g., pixels from a homogeneous patterned area, a smooth surface with little or no contrast, <italic>etc.</italic>) are as likely to be selected as good features (e.g., pixels from rough surfaces, natural landscapes with vegetation, <italic>etc</italic>.). Although this method may be limited in complex real situations, we have included it here because it gave good results in simple cases.</p>
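<p>The grid-based selection fits in a few lines. In this sketch the grid spacing <monospace>step</monospace> and border <monospace>margin</monospace> are hypothetical parameters; the selection depends only on position, which is exactly why low-contrast pixels are as likely to be chosen as well-textured ones.</p>
<preformat preformat-type="code"><![CDATA[
```python
def grid_pixels(width, height, step, margin=0):
    """Select pixel coordinates (x, y) on a regular grid, ignoring image content."""
    return [(x, y)
            for y in range(margin, height - margin, step)
            for x in range(margin, width - margin, step)]


# A 640x480 frame sampled every 40 pixels, keeping 20 pixels away from the border.
pixels = grid_pixels(640, 480, step=40, margin=20)
print(len(pixels))  # 165
```
]]></preformat>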
<p>As shown in <xref ref-type="fig" rid="f2-remotesensing-04-01090">Figure 2(b)</xref>, the second approach applies the 9-point FAST feature detector developed by Rosten <italic>et al.</italic> [<xref ref-type="bibr" rid="b24-remotesensing-04-01090">24</xref>]. This algorithm looks for small points of interest with intensity variations in two dimensions. Such points often arise from geometric discontinuities, such as the corners of real-world objects, although they can also arise from small patches of texture. This second procedure also has limitations, particularly with images containing different areas, some with highly heterogeneous textures and others that are very homogeneous and have little contrast. In such a case, the homogeneous areas would be neglected and only pixels from the highly featured areas would be selected. Nevertheless, this method is quite appropriate for the present work because the examined terrain consists of fields and vegetated areas lacking large homogeneous zones. Under these conditions the pixels with the best tracking features are more evenly distributed across the whole image. Note that although the PTAM algorithm also carries out a feature tracking process, the features it tracks are static. Since we want to detect dynamic objects, a new group of features must be selected for each image. Alternative feature selection techniques, such as SIFT (<italic>Scale-Invariant Feature Transform</italic>) [<xref ref-type="bibr" rid="b27-remotesensing-04-01090">27</xref>], SURF (<italic>Speeded Up Robust Features</italic>) [<xref ref-type="bibr" rid="b28-remotesensing-04-01090">28</xref>] or KLT (<italic>Kanade–Lucas–Tomasi</italic>) [<xref ref-type="bibr" rid="b29-remotesensing-04-01090">29</xref>], could have been used.
A comparative study of these methods was recently performed by Bonin <italic>et al.</italic> [<xref ref-type="bibr" rid="b30-remotesensing-04-01090">30</xref>]; although they concluded that KLT was the most efficient method for their application, we found that FAST feature selection best suits our purposes. This is likely because FAST feature selection is more suitable for high frame rate video streams and rapid motions, as noted by those authors.</p>
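<p>The segment test behind FAST-9 and the non-maximal suppression mentioned in Section 2 can be sketched with NumPy as follows. This is a brute-force version of the test, not the learned decision tree of Rosten <italic>et al.</italic>: a pixel is accepted when at least 9 contiguous pixels on the 16-pixel Bresenham circle of radius 3 are all brighter than the center plus a threshold, or all darker than the center minus it. The threshold value (20) and the suppression score (summed contrast beyond the threshold over the brighter or darker circle pixels) are assumptions chosen for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np

# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, allowing wrap-around."""
    if all(flags):
        return len(flags)
    best = run = 0
    for f in flags + flags:      # scanning twice handles runs that wrap around
        run = run + 1 if f else 0
        best = max(best, run)
    return best


def fast9(img, threshold=20, n=9):
    """Return {(x, y): score} for pixels passing the FAST segment test."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    corners = {}
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            p = img[y, x]
            ring = [img[y + dy, x + dx] for dx, dy in CIRCLE]
            brighter = [v > p + threshold for v in ring]
            darker = [v < p - threshold for v in ring]
            if longest_circular_run(brighter) >= n or longest_circular_run(darker) >= n:
                # Score used only for suppression: summed contrast beyond the threshold.
                s_b = sum(int(v - p) - threshold for v, b in zip(ring, brighter) if b)
                s_d = sum(int(p - v) - threshold for v, k in zip(ring, darker) if k)
                corners[(x, y)] = max(s_b, s_d)
    return corners


def non_max_suppression(corners):
    """Keep corners whose score is not exceeded by any corner in their 3x3 neighborhood."""
    kept = []
    for (x, y), s in corners.items():
        neighbors = [corners.get((x + dx, y + dy), -1)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
        if all(s >= nb for nb in neighbors):
            kept.append((x, y))
    return kept


img = np.zeros((15, 15), dtype=np.uint8)
img[7:, 7:] = 255            # bright square whose top-left corner is at (7, 7)
print(non_max_suppression(fast9(img)))  # [(7, 7)]
```
]]></preformat>
<p>Without suppression, several adjacent pixels around the square's corner pass the test; suppression keeps only the strongest one, which is how the candidate set is thinned before tracking. In practice, OpenCV offers an optimized detector through <monospace>cv2.FastFeatureDetector_create</monospace>, which can also apply non-maximal suppression.</p>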
<p>Once the pixels have been selected, the optical flow is calculated with the iterative Lucas–Kanade method with pyramids [<xref ref-type="bibr" rid="b25-remotesensing-04-01090">25</xref>].</p></sec>
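<p>A compact NumPy sketch of iterative Lucas–Kanade tracking applied coarse to fine over an image pyramid. It illustrates the technique under simplifying assumptions and is not the implementation used in this work: pyramid levels are built by 2 × 2 block averaging rather than Gaussian smoothing, gradients come from central differences, and the window size, number of levels and stopping criterion are hypothetical values. At each level the displacement is refined by repeatedly solving G·δ = b, where G is the spatial gradient matrix of the window and b correlates the image mismatch with the gradients.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def bilinear(img, xs, ys):
    """Sample img at float coordinates (xs, ys) by bilinear interpolation."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])


def lk_level(I, J, x, y, d, half_win=7, iters=20, eps=1e-3):
    """Refine the displacement d of point (x, y) from I to J at one pyramid level."""
    gy, gx = np.gradient(I)                      # derivatives along rows (y) and columns (x)
    offs = np.arange(-half_win, half_win + 1, dtype=float)
    wx, wy = np.meshgrid(x + offs, y + offs)     # integration window around (x, y)
    Ix = bilinear(gx, wx, wy).ravel()
    Iy = bilinear(gy, wx, wy).ravel()
    Iw = bilinear(I, wx, wy).ravel()
    G = np.array([[Ix @ Ix, Ix @ Iy], [Ix @ Iy, Iy @ Iy]])  # spatial gradient matrix
    d = np.array(d, dtype=float)
    if np.linalg.det(G) < 1e-6:                  # textureless window: cannot track
        return d
    for _ in range(iters):
        diff = Iw - bilinear(J, wx + d[0], wy + d[1]).ravel()  # mismatch at current d
        step = np.linalg.solve(G, np.array([diff @ Ix, diff @ Iy]))
        d += step
        if np.hypot(step[0], step[1]) < eps:
            break
    return d


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    im = img[:h, :w]
    return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2])


def pyramidal_lk(I, J, x, y, levels=3, **kw):
    """Track (x, y) from I to J coarse to fine; returns the flow vector (dx, dy)."""
    pyr_I, pyr_J = [np.asarray(I, dtype=float)], [np.asarray(J, dtype=float)]
    for _ in range(levels - 1):
        pyr_I.append(downsample(pyr_I[-1]))
        pyr_J.append(downsample(pyr_J[-1]))
    d = np.zeros(2)
    for lvl in range(levels - 1, -1, -1):
        s = 2.0 ** lvl
        d = lk_level(pyr_I[lvl], pyr_J[lvl], x / s, y / s, d, **kw)
        if lvl > 0:
            d = 2.0 * d                          # carry the estimate to the finer level
    return d


def texture(x, y):
    """Smooth synthetic texture used only for the example."""
    return np.sin(0.3 * x) + np.cos(0.25 * y) + np.sin(0.2 * (x + y))


# Synthetic test: the image content moves by (+2.5, -1.5) pixels from I to J.
yy, xx = np.mgrid[0:80, 0:80].astype(float)
I = texture(xx, yy)
J = texture(xx - 2.5, yy + 1.5)
d = pyramidal_lk(I, J, 40.0, 40.0)
print(d)  # close to the true shift (2.5, -1.5)
```
]]></preformat>
<p>OpenCV's <monospace>cv2.calcOpticalFlowPyrLK</monospace> implements the same coarse-to-fine scheme for many points at once and is the more practical choice for real-time use.</p>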
<sec>
<label>3.2.</label>
<title>Artificial Optical Flow</title>
<p>For the pixel set selected to obtain the real optical flow, the artificial flow gives the position of each of these pixels projected mathematically into the next frame, taking into account the movement of the camera between the two images. This movement is represented by a rotation and a translation (<italic>R</italic>, <italic>t</italic>) obtained by the PTAM algorithm. The artificial flow therefore describes the change in position of these pixels caused exclusively by the camera movement, ignoring any dynamic objects in the image.</p>
<p>The homography matrix <italic>H</italic> is typically calculated with a variation of the RANSAC algorithm [<xref ref-type="bibr" rid="b31-remotesensing-04-01090">31</xref>,<xref ref-type="bibr" rid="b32-remotesensing-04-01090">32</xref>], which repeats the calculation over random selections of pixels to maximize the number of inliers. For our application we prefer to obtain <italic>H</italic> directly from the motion estimation, because this avoids the iterative process; such iteration does not seem necessary, as our method appears to cope effectively with outliers.</p>
<p>To obtain the artificial optical flow from the camera motion we use a conventional homography projection [<xref ref-type="bibr" rid="b33-remotesensing-04-01090">33</xref>]. A homography projects the position of a point lying on a plane from one camera coordinate frame into that of another camera. In our case, some assumptions have been made to simplify and speed up the projection process. The most important one is that the ground is not significantly inclined, so that its average slope can be considered close to zero. Small relief changes, on the order of 40–50 cm, do not affect the algorithm's ability to detect dynamic objects, as long as the long-distance average slope is close to zero.</p>
<p>The homography is applied as follows. For two consecutive frames identified as <italic>k</italic> and <italic>k</italic> + 1, the projection of a point <italic>p<sub>i</sub></italic> with coordinates (<italic>x<sub>k</sub></italic>, <italic>y<sub>k</sub></italic>) in the first frame is calculated as shown in <xref ref-type="disp-formula" rid="FD1">Equation (1)</xref>, where the projected vector is rescaled so that its third (homogeneous) component equals 1. In this equation <italic>R<sub>k,k</sub></italic><sub>+1</sub> and <italic>t<sub>k,k</sub></italic><sub>+1</sub> are the rotation and translation between the camera coordinate frames <italic>k</italic> and <italic>k</italic> + 1; <italic>n</italic> is the unit vector normal to the ground plane (<italic>n<sup>T</sup></italic> denotes its transpose); and <italic>d̄</italic> is the mean distance from the camera to the ground plane. An important assumption of this calculation is that the camera altitude does not vary much between frames <italic>k</italic> and <italic>k</italic> + 1 (note that the elapsed time between processed frames is at most 1/5 s).
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi></mml:mrow>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>,</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>where </mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>⋅</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>n</mml:mi></mml:mrow>
<mml:mi>T</mml:mi></mml:msup></mml:mrow>
<mml:mover accent="true">
<mml:mi>d</mml:mi>
<mml:mo>¯</mml:mo></mml:mover></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
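<p><xref ref-type="disp-formula" rid="FD1">Equation (1)</xref> can be turned into code as in the following minimal sketch. It assumes normalized image coordinates (pixel coordinates with the camera intrinsics removed); for raw pixel coordinates the homography would become <italic>K</italic> <italic>H</italic> <italic>K</italic><sup>−1</sup>, with <italic>K</italic> the intrinsic calibration matrix. The sign of the translation term follows Equation (1) as written, and the numeric example (a downward-looking camera with the ground normal along its optical axis) is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def plane_homography(R, t, n, d_mean):
    """H = R - t n^T / d_mean, taking Equation (1) literally."""
    t = np.asarray(t, dtype=float).reshape(3, 1)
    n = np.asarray(n, dtype=float).reshape(3, 1)
    return np.asarray(R, dtype=float) - (t @ n.T) / d_mean


def artificial_flow(points, H):
    """Project normalized image points (x, y) with H; return flow vectors (dx, dy)."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # rows [x_k, y_k, 1]
    proj = homog @ H.T                                # H applied to every row
    proj = proj[:, :2] / proj[:, 2:3]                 # rescale third component to 1
    return proj - pts


# Downward-looking camera 10 m above flat ground (normal along the optical axis),
# translating 0.5 m along x with no rotation.
H = plane_homography(np.eye(3), [0.5, 0.0, 0.0], [0.0, 0.0, 1.0], 10.0)
flow = artificial_flow([(0.1, 0.2), (-0.3, 0.0)], H)
print(flow)  # both points shift by -0.05 in x
```
]]></preformat>
<p>Subtracting the original coordinates turns the projected positions into the artificial flow vectors that are then compared with the real flow.</p>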
<p>An example of the artificial flow calculation is depicted in <xref ref-type="fig" rid="f3-remotesensing-04-01090">Figure 3</xref>, which shows how the position (<italic>x<sub>k</sub></italic>, <italic>y<sub>k</sub></italic>) of a given pixel in an image is projected with <xref ref-type="disp-formula" rid="FD1">Equation (1)</xref> into the new coordinate frame defined by the new camera position. In this example the camera has rotated by a certain angle and translated parallel to the ground. In the global coordinate frame the point has not changed its position, but in the image coordinate frames its position has changed from (<italic>x<sub>k</sub></italic>, <italic>y<sub>k</sub></italic>) to (<italic>x<sub>k</sub></italic><sub>+1</sub>, <italic>y<sub>k</sub></italic><sub>+1</sub>).</p></sec>
<sec>
<label>3.3.</label>
<title>Identification of Dynamic Pixels</title>
<p>The dynamic character of a pixel is inferred by comparing the two previously calculated optical flows. <xref ref-type="fig" rid="f4-remotesensing-04-01090">Figure 4</xref> shows, respectively, a real optical flow, an artificial optical flow and their superposition in the same image. For this comparison we assume that moving objects are present in the scene. Some vectors of the real optical flow clearly point in a different direction from the rest.</p>
<p>There is always some offset between the real and the artificial flows, caused by the intrinsic error in the estimation of the camera position. However, at some points the difference is clearly larger than the average; these pixels are highlighted in the superposition image. In principle, these points could be associated with a moving object. As shown in the following sections, however, not all of these points correspond to actual dynamic objects, so a suitable discrimination procedure is required.</p>
<p>To classify the pixels in the optical flows as moving or static, the following procedure, which takes into account both the angles and the magnitudes of the vectors, is applied to each pair of real and artificial flow vectors.
<list list-type="order">
<list-item>
<p>Calculation of the angle <italic>α</italic> formed by each vector with the camera coordinate frame. This yields the vector angles of both the real optical flow <italic>α<sub>r</sub></italic> and the artificial one <italic>α<sub>a</sub></italic>.</p></list-item>
<list-item>
<p>Comparison of the angle difference <italic>Δα</italic> = |<italic>α<sub>r</sub></italic> − <italic>α<sub>a</sub></italic>| with a predefined critical threshold <italic>α<sub>t</sub></italic> (we use 20° for this threshold, although this parameter should be adjusted depending on the expected average altitude of the UAV). A possible way of dynamically calculating this threshold is to obtain the statistical mode from all angle differences. If <italic>Δα</italic> &gt; <italic>α<sub>t</sub></italic> the pixel is flagged as dynamic.</p></list-item>
<list-item>
<p>Otherwise (<italic>Δα</italic> ≤ <italic>α<sub>t</sub></italic>), the magnitudes of the two vectors are compared. More elaborate statistical methods could be used to obtain a magnitude-difference threshold. For simplicity and speed, however, we use a fixed threshold of ±30% relative to the magnitude of the artificial optical flow vector. A pixel whose real flow magnitude falls outside this band is flagged as dynamic.</p></list-item></list></p>
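<p>The per-pixel test in steps 1 to 3 can be sketched as follows. The 20° angle threshold and the ±30% magnitude band are taken from the text. Two choices are our assumptions: angles are measured with <monospace>atan2</monospace>, and the angle difference is wrapped to [0°, 180°], because without wrapping, vectors pointing near ±180° would be wrongly flagged. The function names are hypothetical.</p>

```python
import math

ALPHA_T_DEG = 20.0    # critical angle threshold alpha_t (20 degrees in the text)
MAG_TOLERANCE = 0.30  # +/-30 % magnitude band relative to the artificial flow


def angle_deg(vx, vy):
    """Angle of a flow vector in the image frame, in degrees [-180, 180]."""
    return math.degrees(math.atan2(vy, vx))


def angle_difference(a, b):
    """Absolute difference between two angles, wrapped to [0, 180] degrees."""
    diff = abs(a - b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def is_dynamic(real, artificial, alpha_t=ALPHA_T_DEG, tol=MAG_TOLERANCE):
    """Classify one pixel from its real and artificial flow vectors (vx, vy)."""
    # Steps 1-2: compare the vector angles with the critical threshold.
    d_alpha = angle_difference(angle_deg(*real), angle_deg(*artificial))
    if d_alpha > alpha_t:
        return True
    # Step 3: compare magnitudes; outside the +/-tol band means dynamic.
    m_real = math.hypot(*real)
    m_art = math.hypot(*artificial)
    return abs(m_real - m_art) > tol * m_art
```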
<p>As a result of this filtering process, a subset of the real optical flow vectors is flagged as dynamic. Before further processing, these vectors must be grouped spatially in the image. Some of them may appear isolated; these are usually outliers caused by uncontrolled fluctuations or other errors and are discarded.</p></sec></sec>
<sec>
<label>4.</label>
<title>Object Definition, Filtering and Tracking</title>
<p>In this section we describe the procedure developed to convert a list of pixels flagged as dynamic into a list of the moving objects currently present in the image. This part of the algorithm also filters these objects and tracks them over time.</p>
<sec>
<label>4.1.</label>
<title>Object Definition through Pixel Association</title>
<p>Dynamic objects generally comprise a considerable number of pixels, which depends on the size of the object and on the altitude of the UAV in each frame. To handle this variability, this part of the algorithm defines several criteria for associating the pixels flagged as dynamic into candidate moving objects and for discarding those caused by errors and outliers. These errors and outliers are typically caused by matching failures when the Lucas–Kanade method is used to calculate the optical flow. An example is shown schematically in <xref ref-type="fig" rid="f5-remotesensing-04-01090">Figure 5</xref>. The associated vectors are grouped into two different elements, each reduced to a rectangle, as represented schematically in <xref ref-type="fig" rid="f5-remotesensing-04-01090">Figure 5(b)</xref>. Some isolated vectors, although initially identified as dynamic pixels, are discarded because they do not form a group of the critical size required to be classified as part of a moving object. They are also rejected because they point in markedly different directions.</p>
<p>The association process consists of the following operations:
<list list-type="order">
<list-item>
<p>Discard vectors with large magnitudes (<italic>i.e.</italic>, |<italic>v</italic>| &gt; 0.3 × <italic>ImageWidth</italic>).</p></list-item>
<list-item>
<p>Remove single isolated dynamic pixels.</p></list-item>
<list-item>
<p>Group vectors with similar angles and magnitudes positioned in a nearby area into a single moving object. This grouping process is carried out as follows:</p>
<list list-type="alpha-lower">
<list-item>
<p>K-means clustering [<xref ref-type="bibr" rid="b34-remotesensing-04-01090">34</xref>] of the vectors, using [<italic>x</italic>, <italic>y</italic>, <italic>v<sub>x</sub></italic>, <italic>v<sub>y</sub></italic>] as the multidimensional feature vector.</p></list-item>
<list-item>
<p>In each group, eliminate the points that deviate from the group mean by more than 30%.</p></list-item></list></list-item>
<list-item>
<p>Discard groups with fewer vectors than a minimum threshold (<italic>i.e.</italic>, groups with fewer than 5 points).</p></list-item></list></p>
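<p>The association steps can be sketched as follows. Several details are assumptions not specified in the text: the neighborhood radius used to detect isolated pixels, the number of clusters <italic>k</italic>, the deterministic farthest-point initialization of K-means, and the reading of the 30% rule as a deviation of each vector magnitude from the group mean magnitude. The 0.3 × image-width limit and the minimum of 5 points follow the list above. All names are hypothetical.</p>

```python
import math


def _sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's K-means on tuples; returns one cluster label per point."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [points[0]]
    while k > len(centers):
        centers.append(max(points, key=lambda p: min(_sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def group_dynamic_vectors(vectors, image_width, k, radius=20.0,
                          max_frac=0.3, dev_frac=0.3, min_points=5):
    """Group dynamic flow vectors (x, y, vx, vy) into candidate moving objects."""
    # 1. Discard vectors with large magnitudes (|v| > 0.3 * image width).
    vs = [v for v in vectors if max_frac * image_width >= math.hypot(v[2], v[3])]
    # 2. Remove isolated pixels: no other dynamic pixel within `radius`.
    vs = [v for i, v in enumerate(vs)
          if any(j != i and radius >= math.hypot(v[0] - w[0], v[1] - w[1])
                 for j, w in enumerate(vs))]
    if not vs:
        return []
    # 3a. K-means on [x, y, vx, vy]; k is chosen by the caller.
    labels = kmeans(vs, min(k, len(vs)))
    groups = []
    for c in set(labels):
        members = [v for v, lab in zip(vs, labels) if lab == c]
        # 3b. Drop members whose magnitude deviates over 30 % from the group mean.
        mags = [math.hypot(v[2], v[3]) for v in members]
        mean = sum(mags) / len(mags)
        kept = [v for v, m in zip(members, mags) if dev_frac * mean >= abs(m - mean)]
        # 4. Discard groups with fewer than `min_points` vectors.
        if len(kept) >= min_points:
            groups.append(kept)
    return groups
```

<p>Because the features [<italic>x</italic>, <italic>y</italic>, <italic>v<sub>x</sub></italic>, <italic>v<sub>y</sub></italic>] are not rescaled here, image position dominates the clustering when the flow vectors are small. Normalizing each feature would be one possible refinement.</p>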
<p>Once a set of dynamic pixels has been grouped into a moving object, its basic characteristics are extracted: the size (length and width) and the position of the center. At this stage, color and temperature could also be added using sensor fusion techniques and treated in a similar fashion. For further computation, the center and size of the object are used to create a virtual rectangle at the position of the object, as represented in <xref ref-type="fig" rid="f5-remotesensing-04-01090">Figure 5(b)</xref>. This reduction of information enables the fast processing required for real-time onboard computation.</p>
<p>At the end of this step, the algorithm has a list <italic>L<sub>a</sub></italic> of candidate moving objects detected in the current pair of frames. The algorithm then matches the objects in this list with those in the global dynamic object list <italic>L<sub>g</sub></italic>. <italic>L<sub>g</sub></italic> is initialized with the first <italic>L<sub>a</sub></italic>, obtained by comparing the first two video frames.</p></sec>
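<p>The reduction of a group to a rectangle can be sketched as follows. Three choices are assumptions: the box is axis-aligned, its center is taken as the object center, and length is measured along the image <italic>x</italic> axis. The <monospace>size</monospace> property is the larger of the length and width, as defined for the tracking state in Section 4.2. The names are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class ObjectBox:
    cx: float      # center x (pixels)
    cy: float      # center y (pixels)
    length: float  # extent along the image x axis (pixels)
    width: float   # extent along the image y axis (pixels)

    @property
    def size(self):
        # "size" in the tracking state is the larger of length and width.
        return max(self.length, self.width)


def box_from_group(group):
    """Reduce a group of (x, y, vx, vy) vectors to an axis-aligned rectangle."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return ObjectBox(cx=(x0 + x1) / 2, cy=(y0 + y1) / 2,
                     length=x1 - x0, width=y1 - y0)
```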
<sec>
<label>4.2.</label>
<title>Object Temporal Tracking and Spatial Filtering</title>
<p>First, the algorithm pairs the objects in <italic>L<sub>a</sub></italic> with those in <italic>L<sub>g</sub></italic> using a matching method based on an Extended Kalman Filter (EKF). The system state is formed by the states of all objects in <italic>L<sub>g</sub></italic>, where the state of each object is <italic>x<sub>i</sub></italic> = [<italic>x</italic>, <italic>y</italic>, <italic>size</italic>, <italic>R</italic>, <italic>G</italic>, <italic>B</italic>, <italic>T</italic>]. Here (<italic>x</italic>, <italic>y</italic>) are the coordinates of the object center; <italic>size</italic> is the larger of the length and width obtained by the algorithm; <italic>R</italic>, <italic>G</italic>, <italic>B</italic> are the object color obtained by sensor fusion techniques; and <italic>T</italic> is the mean temperature of the object (assuming this information is available). The EKF is used to predict the next state of the system, assuming a constant-velocity motion model for the position and constant values for the other variables. To pair the objects from <italic>L<sub>g</sub></italic> with those in <italic>L<sub>a</sub></italic>, we use the Mahalanobis distance shown in <xref ref-type="disp-formula" rid="FD2">Equation (2)</xref>. This distance measures the similarity between two multidimensional variables, in this case the predicted state for object <italic>i</italic> (
<inline-formula>
<mml:math id="mm2" display="inline">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>) and the measurements obtained for object j (
<inline-formula>
<mml:math id="mm3" display="inline">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>z</mml:mi></mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>), using the variance (
<inline-formula>
<mml:math id="mm4" display="inline">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>P</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>) for each variable (<italic>v</italic>) obtained during the EKF process.
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm5" display="block">
<mml:mrow>
<mml:mi mathvariant="italic">dist</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>z</mml:mi></mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>j</mml:mi></mml:msubsup></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mn>7</mml:mn></mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>^</mml:mo></mml:mover></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>z</mml:mi></mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>j</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>P</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>i</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula></p>
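<p>The pairing distance can be sketched as below, using the standard variance-normalized form. It takes the square root of the sum, over the seven state variables, of (<italic>x̂</italic><sub>k|k−1</sub><sup>i</sup>(<italic>v</italic>) − <italic>z</italic><sub>k</sub><sup>j</sup>(<italic>v</italic>))<sup>2</sup> / <italic>P</italic><sub>k|k−1</sub><sup>i</sup>(<italic>v</italic>), which is a Mahalanobis distance with a diagonal covariance. The sketch assumes that only the diagonal of the EKF covariance is used and that all seven variables are measured. The names are hypothetical.</p>

```python
import math

STATE_VARS = ("x", "y", "size", "R", "G", "B", "T")  # the 7 state variables


def normalized_distance(predicted, measured, variances):
    """Diagonal Mahalanobis distance between a prediction and a measurement.

    predicted: predicted state x_hat_{k|k-1}^i (one value per state variable)
    measured:  measurement z_k^j (same ordering)
    variances: diagonal entries P_{k|k-1}^i(v) from the EKF prediction step
    """
    if not (len(predicted) == len(measured) == len(variances)):
        raise ValueError("state, measurement and variance lengths differ")
    return math.sqrt(sum((p - z) ** 2 / s
                         for p, z, s in zip(predicted, measured, variances)))
```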
<p>The pairs with the smallest distances are then selected using the greedy pairing procedure of <xref ref-type="table" rid="t1-remotesensing-04-01090">Algorithm 1</xref>.</p>
<p>So far, the algorithm has dealt with pairs of consecutive image frames. The following steps consider a longer time scale. The previous analysis has added new information about the objects in <italic>L<sub>g</sub></italic> and may even have added new objects to this list, so this information must be filtered as recording proceeds. The global list contains two types of objects. The first set includes objects that have been tracked over long periods of time or long distances; information about these objects is transmitted to the central base of operations or to other robots in the network. The second set contains objects that were detected only recently and have therefore been in the list for a short time. Only after they have been tracked for a predefined time or distance (2 s or 2 m) are they considered real moving objects in the scene. They are then promoted to the first set and transmitted to the other components of the system (the central base or the other robots, depending on the case). This step prevents the transmitted list from growing unnecessarily with objects produced by short-lived misdetections.</p>
<table-wrap id="t1-remotesensing-04-01090" position="anchor">
<label>Algorithm 1</label>
<caption>
<p>Pairing of objects lists</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" valign="top"><bold>Require:</bold> Actual Object List (<italic>L<sub>a</sub></italic>) with <italic>n<sub>a</sub></italic> objects</td></tr>
<tr>
<td align="left" valign="top"><bold>Require:</bold> Global Object List (<italic>L<sub>g</sub></italic>) with <italic>n<sub>g</sub></italic> objects</td></tr>
<tr>
<td align="left" valign="top"><bold>Ensure:</bold> <italic>n<sub>a</sub></italic> &gt; 0 ∨ <italic>n<sub>g</sub></italic> &gt; 0</td></tr>
<tr>
<td align="left" valign="top">  Prepare <italic>n<sub>a</sub></italic> <italic>× n<sub>g</sub></italic> pairs: <italic>PossiblePairs</italic></td></tr>
<tr>
<td align="left" valign="top">  Calculate distance between objects in each pair according to <xref ref-type="disp-formula" rid="FD2">Equation (2)</xref></td></tr>
<tr>
<td align="left" valign="top">  <italic>SelectedPairs</italic>: Selected Pairs List</td></tr>
<tr>
<td align="left" valign="top">  <bold>while</bold> Size(<italic>SelectedPairs</italic>) <italic>&lt; min</italic>(<italic>n<sub>a</sub></italic><italic>, n<sub>g</sub></italic>) <bold>do</bold></td></tr>
<tr>
<td align="left" valign="top">    Select pair with minimum distance</td></tr>
<tr>
<td align="left" valign="top">    <bold>if</bold> Any of the objects is already in <italic>SelectedPairs</italic> <bold>then</bold></td></tr>
<tr>
<td align="left" valign="top">      Eliminate pair from <italic>PossiblePairs</italic></td></tr>
<tr>
<td align="left" valign="top">    <bold>else</bold></td></tr>
<tr>
<td align="left" valign="top">      Include pair in <italic>SelectedPairs</italic></td></tr>
<tr>
<td align="left" valign="top">    <bold>end if</bold></td></tr>
<tr>
<td align="left" valign="top">    <bold>end while</bold></td></tr></tbody></table></table-wrap>
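<p>Algorithm 1 can be sketched as a greedy assignment over a precomputed distance matrix. Visiting the candidate pairs once in increasing order of distance is equivalent to repeatedly selecting the minimum-distance pair and eliminating pairs whose objects are already matched. How unmatched objects are handled afterwards is not part of Algorithm 1. One plausible choice, consistent with the text above, is to add unmatched objects of <italic>L<sub>a</sub></italic> to <italic>L<sub>g</sub></italic> as new, recently detected objects. The function name is hypothetical.</p>

```python
def greedy_pairing(dist):
    """Greedy pairing of two object lists, following Algorithm 1.

    dist[i][j]: distance (e.g., Equation (2)) between object i of L_a and
    object j of L_g. Returns a list of (i, j) index pairs.
    """
    n_a = len(dist)
    n_g = len(dist[0]) if n_a else 0
    # All n_a x n_g candidate pairs, visited in increasing order of distance.
    possible = sorted((dist[i][j], i, j) for i in range(n_a) for j in range(n_g))
    selected, used_a, used_g = [], set(), set()
    for _, i, j in possible:
        if len(selected) == min(n_a, n_g):
            break
        if i in used_a or j in used_g:
            continue  # one of the objects is already paired: drop this pair
        selected.append((i, j))
        used_a.add(i)
        used_g.add(j)
    return selected
```

<p>Greedy matching does not guarantee the minimum total distance. An optimal assignment (e.g., the Hungarian method) would be an alternative, at a higher computational cost.</p>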
<p>During a real recording, several situations can lead to spurious moving objects being identified. A typical example occurs when the terrain contains elements with a high aspect ratio (e.g., trees, high fences, <italic>etc.</italic>). Because these elements rise above the ground plane assumed when computing the artificial flow, their image motion differs from the predicted one. To remove these faulty objects, the algorithm applies basic tracking criteria that compare the positions of the objects over time and eliminate those that are actually static. For an object in the second set to be promoted to the first set, it must satisfy at least the following two rules:
<list list-type="bullet">
<list-item>
<p>They must be detected in more than 10 consecutive processed frames.</p></list-item>
<list-item>
<p>They must move consistently, changing their position by more than a certain critical distance.</p></list-item></list></p>
<p>The first rule eliminates unlikely but possible groups of spurious vectors in the same area caused by patterned surfaces. The second rule targets objects that oscillate around a center without moving a significant distance; a clear example is a bush moving in the wind. Any candidate object that does not meet both criteria is eliminated from <italic>L<sub>g</sub></italic>.</p>
<p>In addition, when a UAV records images for a long time, <italic>L<sub>g</sub></italic> may become too large to handle. We therefore reduce the size of the list by eliminating objects that have not been detected for longer than a selected maximum time. This time can be adjusted for each application according to the available computing capacity, which acts as a bottleneck and limits the size of the list. However, the time period must be long enough to cope with common situations in the detection of moving objects, such as occlusions and crossings.</p>
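<p>The two promotion rules can be sketched as a check on the recent history of an object's center. Several elements are assumptions: the history format, the use of the net displacement over the run of consecutive detections as the measure of consistent movement, and the <monospace>critical_distance</monospace> parameter, whose value the text does not give. The threshold of more than 10 consecutive frames follows the text. The function name is hypothetical.</p>

```python
import math


def should_confirm(history, critical_distance, min_frames=10):
    """Return True if a tentative object satisfies both promotion rules.

    history: for each processed frame, the object center (x, y), or None if
    the object was not detected in that frame (most recent frame last).
    """
    # Rule 1: the trailing run of consecutive detections must exceed min_frames.
    run = []
    for pos in reversed(history):
        if pos is None:
            break
        run.append(pos)
    if min_frames >= len(run):
        return False
    # Rule 2 (read here as net displacement over the run): the object must
    # have moved farther than critical_distance, which rejects objects that
    # oscillate around a fixed center, such as a bush in the wind.
    (x_last, y_last), (x_first, y_first) = run[0], run[-1]
    return math.hypot(x_last - x_first, y_last - y_first) > critical_distance
```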
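<p>Removing stale objects from <italic>L<sub>g</sub></italic> amounts to comparing the time elapsed since each object's last detection with the chosen maximum time. The dictionary layout and names below are hypothetical. <monospace>max_age</monospace> should be long enough to bridge occlusions and crossings.</p>

```python
def prune_global_list(last_seen, now, max_age):
    """Keep only objects detected within the last max_age seconds.

    last_seen: dict mapping object id -> timestamp (s) of its last detection.
    Returns a new dict with the objects that are kept.
    """
    return {oid: t for oid, t in last_seen.items() if max_age >= now - t}
```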
<p>An alternative tracking approach would be to define regions of interest (ROIs) in the image and track keypoint features within them using local descriptors combined with robust motion estimation, as proposed by Garcia <italic>et al.</italic> [<xref ref-type="bibr" rid="b35-remotesensing-04-01090">35</xref>]. We chose our method for two reasons. First, it does not rely on feature tracking to follow the object trajectories. Second, the information it produces can easily be shared with other robots, which is necessary in a security and surveillance system.</p></sec></sec>
<sec sec-type="results">
<label>5.</label>
<title>Experimental Results</title>
<p>Section 5.1 describes the hardware setup used to perform the tests, and Section 5.2 presents some test examples carried out with a quadrotor and the results obtained with our algorithm.</p>
<sec>
<label>5.1.</label>
<title>Hardware Setup</title>
<p>The algorithm presented in this paper is intended to run in real time on an onboard processor mounted on an aerial vehicle. For this reason, and to minimize the computing cost, we chose monochrome images with a resolution of 640 × 480 pixels. The selected frame rate was 30 Hz. Photos of the hardware used in these experiments are shown in <xref ref-type="fig" rid="f6-remotesensing-04-01090">Figure 6</xref>.</p>
<p>The aerial vehicle used to record the image sequences is the AscTec Pelican® quadrotor, shown in <xref ref-type="fig" rid="f6-remotesensing-04-01090">Figure 6(a)</xref>. It is equipped with an Intel Atom processor board (1.6 GHz, 1 GB RAM, 90 g gross weight). Its main characteristics are listed below:
<list list-type="bullet">
<list-item>
<p>Maximum speed 50 km/h.</p></list-item>
<list-item>
<p>Maximum wind load 36 km/h.</p></list-item>
<list-item>
<p>Maximum flight time 20 min.</p></list-item>
<list-item>
<p>GPS aided flight.</p></list-item>
<list-item>
<p>MEMS Gyro Sensors.</p></list-item>
<list-item>
<p>2× ARM7 micro processor.</p></list-item>
<list-item>
<p>Modular tower design.</p></list-item>
<list-item>
<p>Suitable for ATOM processor board.</p></list-item>
<list-item>
<p>500 g Payload.</p></list-item></list></p>
<p>The camera used was a USB Firefly MV FMVU-03MTM from PointGrey®, shown in <xref ref-type="fig" rid="f6-remotesensing-04-01090">Figure 6(b)</xref>. Its main characteristics are:
<list list-type="bullet">
<list-item>
<p>Maximum resolution of 752 × 480.</p></list-item>
<list-item>
<p>Maximum frequency of 60 fps.</p></list-item>
<list-item>
<p>Shutter: 0.03 ms to 512 ms for 0.3 MP.</p></list-item>
<list-item>
<p>Operating voltage: 4.75–5.25 V, via Mini B USB 2.0 or GPIO connector.</p></list-item>
<list-item>
<p>Dimensions: 44 × 34 × 24.4 mm.</p></list-item>
<list-item>
<p>Weight: 37 g.</p></list-item></list></p></sec>
<sec>
<label>5.2.</label>
<title>Example Test</title>
<p>This section shows some examples of the performance of our algorithm in real situations. For clarity, the number of features displayed in the screen captures of the system output has been drastically reduced.</p>
<p>The tracking and mapping algorithm of Klein <italic>et al.</italic> [<xref ref-type="bibr" rid="b23-remotesensing-04-01090">23</xref>] requires an initialization step. Its stereo initialization procedure needs a baseline to work correctly, so the camera must be moved sideways between the first two frames. However, this initialization leaves the map scale unknown. To overcome this limitation, we modified the available libraries to include an initialization marker pattern. This pattern allows the map to be obtained in real dimensions (in our case, meters).</p>
<p><xref ref-type="fig" rid="f7-remotesensing-04-01090">Figure 7(a)</xref> shows the selected marker pattern. This type of two-dimensional marker pattern is advantageous because it is easily identified in the image irrespective of its orientation and of the tilt angles of the camera. Moreover, since it has multiple markers with many corner features, it provides many simultaneous measurements of the map scale in a single view. To initialize the algorithm correctly, the camera needs to pass over the marker pattern while tracking its corners, as shown by the gradually colored segments in <xref ref-type="fig" rid="f7-remotesensing-04-01090">Figure 7(b)</xref>.</p>
<p>Following the procedure of Klein <italic>et al.</italic> [<xref ref-type="bibr" rid="b23-remotesensing-04-01090">23</xref>], the camera motion estimation module tracks static features through the frames. <xref ref-type="fig" rid="f8-remotesensing-04-01090">Figure 8</xref> shows an example of the tracking of these static features between two frames. As mentioned before, at a frame rate of 30 Hz the differences between two consecutive frames are hard to appreciate, which makes the calculations difficult. To overcome this shortcoming, we drastically reduce the number of processed images, to a working frequency of between 5 and 10 Hz (see Section 3). The frames in <xref ref-type="fig" rid="f8-remotesensing-04-01090">Figure 8(a,b)</xref> are images separated by one second. For clarity, only ten relevant static points are shown in each frame; the complete set of recorded static points, numbering in the thousands, is used to estimate the position of the camera for each frame.</p>
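<p>The text does not detail how the marker pattern fixes the map scale. One simple possibility is sketched here purely as an assumption: compare the known metric distances between marker corners with the corresponding distances in the reconstructed map, and take the median ratio as the scale. The median exploits the many simultaneous measurements mentioned above while limiting the influence of poorly reconstructed corners. The function name and data layout are hypothetical.</p>

```python
import math
from statistics import median


def map_scale(corner_pairs_map, known_lengths_m):
    """Estimate meters per map unit from marker corner distances.

    corner_pairs_map: list of ((x, y, z), (x, y, z)) corner pairs in map units
    known_lengths_m:  the true distance in meters for each pair
    Raises statistics.StatisticsError if no valid pair is given.
    """
    ratios = [known / math.dist(a, b)
              for (a, b), known in zip(corner_pairs_map, known_lengths_m)
              if math.dist(a, b) > 0]
    return median(ratios)
```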
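<p>Reducing a 30 Hz stream to a working frequency of 5 to 10 Hz amounts to processing every sixth to every third frame. A minimal sketch of this decimation follows; it assumes a fixed camera rate, and its names are hypothetical.</p>

```python
def decimation_step(camera_hz=30.0, target_hz=5.0):
    """Number of camera frames between two processed frames (rounded)."""
    return max(1, round(camera_hz / target_hz))


def processed_frame_indices(n_frames, camera_hz=30.0, target_hz=5.0):
    """Indices of the frames that are actually processed."""
    step = decimation_step(camera_hz, target_hz)
    return list(range(0, n_frames, step))
```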
<p>Once the camera motion has been estimated, the next step is to obtain the optical flows, compare them according to the methodology described in Section 3 and group the resulting dynamic pixels into dynamic objects. The example in <xref ref-type="fig" rid="f9-remotesensing-04-01090">Figure 9</xref> shows these three steps of the analysis.</p>
<p>This example involves a UAV flying above a field in which a person is walking. The UAV and the person were moving in different directions. The calculation of the optical flows usually involves around one thousand pixels; to simplify the representation, <xref ref-type="fig" rid="f9-remotesensing-04-01090">Figure 9(a)</xref> shows the optical flows of only a selection of these pixels. <xref ref-type="fig" rid="f9-remotesensing-04-01090">Figure 9(b)</xref> shows only the pixels flagged as dynamic, in which two groups and some isolated pixels can clearly be seen. The groups are qualified as dynamic objects in <xref ref-type="fig" rid="f9-remotesensing-04-01090">Figure 9(c)</xref>, whereas the isolated pixels are discarded. Note that both the walking person and part of his or her shadow are taken as dynamic objects. Shadow pixels usually provide lower-quality features and are therefore not normally taken as dynamic objects, although this depends strongly on the weather conditions. The detection of shadows as part of a dynamic object could be regarded as a drawback of the algorithm, since it introduces an error in the estimated position of the dynamic objects. However, for security and surveillance applications, such small changes in the position estimate are irrelevant for detecting the object and tracking it correctly.</p>
<p><xref ref-type="fig" rid="f10-remotesensing-04-01090">Figure 10</xref> shows a sequence, recorded during a flight of the quadrotor, in which a single dynamic object is detected and tracked. In this sequence both the quadrotor and the imaged person were moving, and the algorithm was still able to detect and track the person. In <xref ref-type="fig" rid="f10-remotesensing-04-01090">Figure 10(a)</xref>, besides the dynamic object, there was a static shadow belonging to a non-moving person located outside the field of view; this shadow was not identified as a moving object. The size of the detected object changed along the sequence depending on how much of the shadow was identified as part of the object. Nevertheless, the detected center of the object always remained close to the real center of the person.</p>
<p>The algorithm was also able to detect and track multiple objects from different altitudes. <xref ref-type="fig" rid="f11-remotesensing-04-01090">Figure 11</xref> shows three different moving objects detected with the quadrotor flying at a higher altitude. They move in different directions, as indicated in the figure. When one of the objects stops moving (<xref ref-type="fig" rid="f11-remotesensing-04-01090">Figure 11(c)</xref>), it is no longer identified as a dynamic object and is therefore not highlighted in the image (<xref ref-type="fig" rid="f11-remotesensing-04-01090">Figure 11(d)</xref>). The detection and tracking process continued efficiently for the other two moving objects (<xref ref-type="fig" rid="f11-remotesensing-04-01090">Figures 11(c,d)</xref>). Note that, although both the position and the altitude of the quadrotor changed between <xref ref-type="fig" rid="f11-remotesensing-04-01090">Figures 11(c) and (d)</xref>, the algorithm continued to detect and track the two remaining dynamic objects properly.</p>
<p>The presented tests are examples of the performance of the algorithm during real flights of the quadrotor over sparsely vegetated terrain. These examples show that the algorithm can efficiently detect and track multiple dynamic objects. Since the main intended application of the algorithm is security and surveillance, situations with a large number of dynamic objects are not expected, and the presented tests are considered sufficient to validate the performance of the algorithm.</p></sec></sec>
<sec sec-type="conclusions">
<label>6.</label>
<title>Conclusions and Remarks</title>
<p>In this work we present a new approach to detect and track moving objects from a UAV. For this purpose we have developed a new optical flow technique that has proven effective in identifying objects moving in a real landscape.</p>
<p>Image sequences taken from aerial vehicles have no static elements due to the intrinsic movements and vibrations of the UAV. Through this new method we have been able to differentiate between the changes in the images due to the movements of the UAV and the changes actually produced by the dynamic objects moving in the scene.</p>
<p>The techniques and algorithms presented in this paper incorporate an innovative approach: the estimated camera motion is used to calculate an artificial optical flow, which is compared with the real optical flow. Although mathematically simple, this method has proven successful in detecting divergences in the real optical flow, which are later identified as dynamic objects. An additional advantage of our method is that it uses a single camera, without the need for other sensors, to track static features on the ground and to estimate the camera motion. Since any moving object will have an optical flow that differs from the flow predicted from the camera motion, this algorithm can detect any moving object in the camera stream regardless of its direction of movement.</p>
<p>The developed technique is expected to be useful for surveillance applications in outdoor critical facilities (e.g., nuclear plants, industrial storage facilities, solar energy plants, <italic>etc.</italic>). The algorithms could also be used for cattle tracking in agroindustrial environments or for wild animal surveillance in ecological control activities. Another large field of application is the defense industry, for example to track potential threats to critical facilities and infrastructures of national interest.</p>
<p>However, under certain circumstances the procedure might have some shortcomings. Although these limitations do not hamper the practical use of our method, they are worth mentioning in order to clarify its range of application and the possible solutions. Some limitations relate to the minimum velocity and size of the objects to be detected and to the maximum flight altitude of the UAV; these three parameters are closely linked. To minimize these restrictions, the algorithm parameters must be defined specifically for each practical case after a careful assessment of the real scenario in which the technology is to be applied. Another shortcoming is inherent to the use of a light UAV: these vehicles can easily be displaced by strong, unexpected wind gusts. If this occurs, the UAV will suddenly be displaced by a large distance and might lose track of its static references on the ground. To return to a normal work cycle, the UAV would have to be manually directed to a previously known area.</p>
<p>A feasible solution to these problems is to integrate the position estimate given by the PTAM algorithm of Klein <italic>et al.</italic> [<xref ref-type="bibr" rid="b23-remotesensing-04-01090">23</xref>] with position measurements from other sensors, such as an IMU or GPS. Our group is currently working along this line. We are also working on further improvements that apply computer vision techniques such as contour definition and pattern recognition.</p></sec></body>
<back>
<ack>
<p>The authors gratefully acknowledge the contribution of the Technical University of Madrid (UPM), the Automatic Robotic Center (CAR) and project ROTOS (DPI2010-17998) from the Ministerio de Ciencia e Innovación (MICIIN) for supporting this work. This work has also been performed within the framework of the Robocity 2030-II network of the Comunidad de Madrid. GRC acknowledges a grant from the UPM to carry out this work during a research stay at the University of Auckland (UoA), New Zealand. The Robotics Research Group (UoA) and the Robotics and Cybernetics Research Group (CAR-UPM) provided the means and inspired the realization of this work.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-remotesensing-04-01090"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname><given-names>T.</given-names></name><name><surname>Liu</surname><given-names>Z.</given-names></name><name><surname>Yue</surname><given-names>S.</given-names></name></person-group><article-title>Traffic video-based moving vehicle detection and tracking in the complex environment</article-title><source>Cyber. Syst</source><year>2009</year><volume>40</volume><fpage>569</fpage><lpage>588</lpage><pub-id pub-id-type="doi">10.1080/01969720903152544</pub-id></citation></ref>
<ref id="b2-remotesensing-04-01090"><label>2.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Watanabe</surname><given-names>Y.</given-names></name><name><surname>Fabiani</surname><given-names>P.</given-names></name><name><surname>Le Besnerais</surname><given-names>G.</given-names></name></person-group><article-title>Simultaneous Visual Target Tracking and Navigation in a GPS-Denied Environment</article-title><conf-name>Proceedings of IEEE 14th International Conference on Advanced Robotics</conf-name><conf-loc>Munich, Germany</conf-loc><conf-date>22–26 June 2009</conf-date><volume>1</volume><fpage>297</fpage><lpage>302</lpage></citation></ref>
<ref id="b3-remotesensing-04-01090"><label>3.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Hashimoto</surname><given-names>M.</given-names></name><name><surname>Takahashi</surname><given-names>K.</given-names></name><name><surname>Matsui</surname><given-names>Y.</given-names></name></person-group><article-title>Moving-Object Tracking with Multi-Laser Range Sensors for Mobile Robot Navigation</article-title><conf-name>Proceedings of IEEE International Conference on Robotics and Biomimetics</conf-name><conf-loc>Hainan, China</conf-loc><conf-date>15–18 December 2007</conf-date><volume>1</volume><fpage>399</fpage><lpage>404</lpage></citation></ref>
<ref id="b4-remotesensing-04-01090"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahn</surname><given-names>J.H.</given-names></name><name><surname>Choi</surname><given-names>C.</given-names></name><name><surname>Kwak</surname><given-names>S.</given-names></name><name><surname>Kim</surname><given-names>K.</given-names></name><name><surname>Byun</surname><given-names>H.</given-names></name></person-group><article-title>Human tracking and silhouette extraction for human-robot interaction systems</article-title><source>Pattern Anal. Appl</source><year>2009</year><volume>12</volume><fpage>167</fpage><lpage>177</lpage><pub-id pub-id-type="doi">10.1007/s10044-008-0112-3</pub-id></citation></ref>
<ref id="b5-remotesensing-04-01090"><label>5.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Talukder</surname><given-names>A.</given-names></name><name><surname>Matthies</surname><given-names>L.</given-names></name></person-group><article-title>Real-Time Detection of Moving Objects from Moving Vehicles Using Dense Stereo and Optical Flow</article-title><conf-name>Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004)</conf-name><conf-loc>Sendai, Japan</conf-loc><conf-date>28 September–2 October 2004</conf-date><volume>4</volume><fpage>3718</fpage><lpage>3725</lpage></citation></ref>
<ref id="b6-remotesensing-04-01090"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Collins</surname><given-names>R.</given-names></name><name><surname>Lipton</surname><given-names>A.</given-names></name><name><surname>Fujiyoshi</surname><given-names>H.</given-names></name><name><surname>Kanade</surname><given-names>T.</given-names></name></person-group><article-title>Algorithms for cooperative multisensor surveillance</article-title><source>Proc. IEEE</source><year>2001</year><volume>89</volume><fpage>1456</fpage><lpage>1477</lpage><pub-id pub-id-type="doi">10.1109/5.959341</pub-id></citation></ref>
<ref id="b7-remotesensing-04-01090"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheung</surname><given-names>S.C.S.</given-names></name><name><surname>Kamath</surname><given-names>C.</given-names></name></person-group><article-title>Robust background subtraction with foreground validation for urban traffic video</article-title><source>EURASIP J. Appl. Signal Process</source><year>2005</year><volume>2005</volume><fpage>2330</fpage><lpage>2340</lpage><pub-id pub-id-type="doi">10.1155/ASP.2005.2330</pub-id></citation></ref>
<ref id="b8-remotesensing-04-01090"><label>8.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Eveland</surname><given-names>C.</given-names></name><name><surname>Konolige</surname><given-names>K.</given-names></name><name><surname>Bolles</surname><given-names>R.</given-names></name></person-group><article-title>Background Modeling for Segmentation of Video-Rate Stereo Sequences</article-title><conf-name>Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition</conf-name><conf-loc>Santa Barbara, CA, USA</conf-loc><conf-date>23–25 June 1998</conf-date><fpage>266</fpage><lpage>271</lpage></citation></ref>
<ref id="b9-remotesensing-04-01090"><label>9.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kalafatic</surname><given-names>Z.</given-names></name><name><surname>Ribaric</surname><given-names>S.</given-names></name><name><surname>Stanisavljevic</surname><given-names>V.</given-names></name></person-group><article-title>Real-Time Object Tracking Based on Optical Flow and Active Rays</article-title><conf-name>Proceedings of 10th Mediterranean Electrotechnical Conference (MELECON 2000)</conf-name><conf-loc>Lemesos, Cyprus</conf-loc><conf-date>29–31 May 2000</conf-date><volume>2</volume><fpage>542</fpage><lpage>545</lpage></citation></ref>
<ref id="b10-remotesensing-04-01090"><label>10.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>H.Y.</given-names></name></person-group><article-title>Multiple Moving Objects Detection and Tracking Based on Optical Flow in Polar-Log Images</article-title><conf-name>Proceedings of 2010 International Conference on Machine Learning and Cybernetics (ICMLC)</conf-name><conf-loc>Qingdao, China</conf-loc><conf-date>11–14 July 2010</conf-date><volume>3</volume><fpage>1577</fpage><lpage>1582</lpage></citation></ref>
<ref id="b11-remotesensing-04-01090"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Hall</surname><given-names>D.</given-names></name><name><surname>Nascimento</surname><given-names>J.</given-names></name><name><surname>Ribeiro</surname><given-names>P.</given-names></name><name><surname>Andrade</surname><given-names>E.</given-names></name><name><surname>Moreno</surname><given-names>P.</given-names></name><name><surname>Pesnel</surname><given-names>S.</given-names></name><name><surname>List</surname><given-names>T.</given-names></name><name><surname>Emonet</surname><given-names>R.</given-names></name><name><surname>Fisher</surname><given-names>R.</given-names></name><name><surname>Victor</surname><given-names>J.</given-names></name><name><surname>Crowley</surname><given-names>J.</given-names></name></person-group><article-title>Comparison of Target Detection Algorithms Using Adaptive Background Models</article-title><conf-name>Proceedings of 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance</conf-name><conf-loc>Beijing, China</conf-loc><conf-date>15–16 October 2005</conf-date><fpage>113</fpage><lpage>120</lpage></citation></ref>
<ref id="b12-remotesensing-04-01090"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Franke</surname><given-names>U.</given-names></name><name><surname>Heinrich</surname><given-names>S.</given-names></name></person-group><article-title>Fast obstacle detection for urban traffic situations</article-title><source>IEEE Trans. Intell. Transp. Syst</source><year>2002</year><volume>3</volume><fpage>173</fpage><lpage>181</lpage><pub-id pub-id-type="doi">10.1109/TITS.2002.802934</pub-id></citation></ref>
<ref id="b13-remotesensing-04-01090"><label>13.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Braillon</surname><given-names>C.</given-names></name><name><surname>Pradalier</surname><given-names>C.</given-names></name><name><surname>Crowley</surname><given-names>J.</given-names></name><name><surname>Laugier</surname><given-names>C.</given-names></name></person-group><article-title>Real-Time Moving Obstacle Detection Using Optical Flow Models</article-title><conf-name>Proceedings of 2006 IEEE Intelligent Vehicles Symposium</conf-name><conf-loc>Tokyo, Japan</conf-loc><conf-date>13–15 June 2006</conf-date><fpage>466</fpage><lpage>471</lpage></citation></ref>
<ref id="b14-remotesensing-04-01090"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leslar</surname><given-names>M.</given-names></name><name><surname>Wang</surname><given-names>J.</given-names></name><name><surname>Hu</surname><given-names>B.</given-names></name></person-group><article-title>Comprehensive utilization of temporal and spatial domain outlier detection methods for mobile terrestrial lidar data</article-title><source>Remote Sens</source><year>2011</year><volume>3</volume><fpage>1724</fpage><lpage>1742</lpage><pub-id pub-id-type="doi">10.3390/rs3081724</pub-id></citation></ref>
<ref id="b15-remotesensing-04-01090"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tissainayagam</surname><given-names>P.</given-names></name><name><surname>Suter</surname><given-names>D.</given-names></name></person-group><article-title>Object tracking in image sequences using point features</article-title><source>Pattern Recogn</source><year>2005</year><volume>38</volume><fpage>105</fpage><lpage>113</lpage><pub-id pub-id-type="doi">10.1016/j.patcog.2004.05.011</pub-id></citation></ref>
<ref id="b16-remotesensing-04-01090"><label>16.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Milella</surname><given-names>A.</given-names></name><name><surname>Siegwart</surname><given-names>R.</given-names></name></person-group><article-title>Stereo-Based Ego-Motion Estimation Using Pixel Tracking and Iterative Closest Point</article-title><conf-name>Proceedings of 2006 IEEE International Conference on Computer Vision Systems (ICVS ’06)</conf-name><conf-loc>New York, NY, USA</conf-loc><conf-date>5–7 January 2006</conf-date><fpage>21</fpage><lpage>21</lpage></citation></ref>
<ref id="b17-remotesensing-04-01090"><label>17.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Konolige</surname><given-names>K.</given-names></name><name><surname>Agrawal</surname><given-names>M.</given-names></name><name><surname>Bolles</surname><given-names>R.</given-names></name><name><surname>Cowan</surname><given-names>C.</given-names></name><name><surname>Fischler</surname><given-names>M.</given-names></name><name><surname>Gerkey</surname><given-names>B.</given-names></name></person-group><article-title>Outdoor Mapping and Navigation Using Stereo Vision</article-title><conf-name>Proceedings of 10th International Symposium on Experimental Robotics 2006 (ISER ’06)</conf-name><conf-loc>Rio de Janeiro, Brazil</conf-loc><conf-date>6–10 July 2006</conf-date><fpage>179</fpage><lpage>190</lpage></citation></ref>
<ref id="b18-remotesensing-04-01090"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname><given-names>Z.</given-names></name><name><surname>Balasuriya</surname><given-names>A.</given-names></name><name><surname>Challa</surname><given-names>S.</given-names></name></person-group><article-title>Sensor fusion-based visual target tracking for autonomous vehicles with the out-of-sequence measurements solution</article-title><source>Robot. Auton. Syst</source><year>2008</year><volume>56</volume><fpage>157</fpage><lpage>176</lpage><pub-id pub-id-type="doi">10.1016/j.robot.2007.05.014</pub-id></citation></ref>
<ref id="b19-remotesensing-04-01090"><label>19.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Miller</surname><given-names>A.</given-names></name><name><surname>Babenko</surname><given-names>P.</given-names></name><name><surname>Hu</surname><given-names>M.</given-names></name><name><surname>Shah</surname><given-names>M.</given-names></name></person-group><article-title>Person Tracking in UAV Video</article-title><source>Multimodal Technologies for Perception of Humans</source><person-group person-group-type="editor"><name><surname>Stiefelhagen</surname><given-names>R.</given-names></name><name><surname>Bowers</surname><given-names>R.</given-names></name><name><surname>Fiscus</surname><given-names>J.</given-names></name></person-group><comment>Lecture Notes in Computer Science</comment><publisher-name>Springer</publisher-name><publisher-loc>Berlin/Heidelberg, Germany</publisher-loc><year>2008</year><volume>4625</volume><fpage>215</fpage><lpage>220</lpage></citation></ref>
<ref id="b20-remotesensing-04-01090"><label>20.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>C.H.</given-names></name><name><surname>Wu</surname><given-names>Y.T.</given-names></name><name><surname>Kao</surname><given-names>J.H.</given-names></name><name><surname>Shih</surname><given-names>M.Y.</given-names></name><name><surname>Chou</surname><given-names>C.C.</given-names></name></person-group><article-title>A Hybrid Moving Object Detection Method for Aerial Images</article-title><source>Advances in Multimedia Information Processing (PCM 2010)</source><person-group person-group-type="editor"><name><surname>Qiu</surname><given-names>G.</given-names></name><name><surname>Lam</surname><given-names>K.</given-names></name><name><surname>Kiya</surname><given-names>H.</given-names></name><name><surname>Xue</surname><given-names>X.Y.</given-names></name><name><surname>Kuo</surname><given-names>C.C.</given-names></name><name><surname>Lew</surname><given-names>M.</given-names></name></person-group><comment>Lecture Notes in Computer Science</comment><publisher-name>Springer</publisher-name><publisher-loc>Berlin/Heidelberg, Germany</publisher-loc><year>2010</year><volume>6297</volume><fpage>357</fpage><lpage>368</lpage></citation></ref>
<ref id="b21-remotesensing-04-01090"><label>21.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Samija</surname><given-names>H.</given-names></name><name><surname>Markovic</surname><given-names>I.</given-names></name><name><surname>Petrovic</surname><given-names>I.</given-names></name></person-group><article-title>Optical Flow Field Segmentation in an Omnidirectional Camera Image Based on Known Camera Motion</article-title><conf-name>Proceedings of the 34th International Convention MIPRO</conf-name><conf-loc>Opatija, Croatia</conf-loc><conf-date>23–27 May 2011</conf-date><fpage>805</fpage><lpage>809</lpage></citation></ref>
<ref id="b22-remotesensing-04-01090"><label>22.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Suganuma</surname><given-names>N.</given-names></name><name><surname>Kubo</surname><given-names>T.</given-names></name></person-group><article-title>Fast Dynamic Object Extraction Using Stereovision Based on Occupancy Grid Maps and Optical Flow</article-title><conf-name>Proceedings of 2011 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)</conf-name><conf-loc>Budapest, Hungary</conf-loc><conf-date>3–7 July 2011</conf-date><fpage>978</fpage><lpage>983</lpage></citation></ref>
<ref id="b23-remotesensing-04-01090"><label>23.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Klein</surname><given-names>G.</given-names></name><name><surname>Murray</surname><given-names>D.</given-names></name></person-group><article-title>Parallel Tracking and Mapping for Small AR Workspaces</article-title><conf-name>Proceedings of 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2007)</conf-name><conf-loc>Nara, Japan</conf-loc><conf-date>13–16 November 2007</conf-date><fpage>225</fpage><lpage>234</lpage></citation></ref>
<ref id="b24-remotesensing-04-01090"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Rosten</surname><given-names>E.</given-names></name><name><surname>Drummond</surname><given-names>T.</given-names></name></person-group><article-title>Machine Learning for High Speed Corner Detection</article-title><conf-name>Proceedings of 9th European Conference on Computer Vision</conf-name><conf-loc>Graz, Austria</conf-loc><conf-date>7–13 May 2006</conf-date></citation></ref>
<ref id="b25-remotesensing-04-01090"><label>25.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bouguet</surname><given-names>J.Y.</given-names></name></person-group><source>Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm</source><comment>Technical Report</comment><publisher-name>Intel Corporation Microprocessor Research Lab</publisher-name><publisher-loc>Santa Clara, CA, USA</publisher-loc><year>1999</year></citation></ref>
<ref id="b26-remotesensing-04-01090"><label>26.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Achtelik</surname><given-names>M.</given-names></name><name><surname>Achtelik</surname><given-names>M.</given-names></name><name><surname>Weiss</surname><given-names>S.</given-names></name><name><surname>Siegwart</surname><given-names>R.</given-names></name></person-group><article-title>Onboard IMU and Monocular Vision Based Control for MAVs in Unknown in- and Outdoor Environments</article-title><conf-name>Proceedings of 2011 IEEE International Conference on Robotics and Automation (ICRA)</conf-name><conf-loc>Shanghai, China</conf-loc><conf-date>9–13 May 2011</conf-date><fpage>3056</fpage><lpage>3063</lpage></citation></ref>
<ref id="b27-remotesensing-04-01090"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lowe</surname><given-names>D.G.</given-names></name></person-group><article-title>Distinctive image features from scale-invariant keypoints</article-title><source>Int. J. Comput. Vis</source><year>2004</year><volume>60</volume><fpage>91</fpage><lpage>110</lpage><pub-id pub-id-type="doi">10.1023/B:VISI.0000029664.99615.94</pub-id></citation></ref>
<ref id="b28-remotesensing-04-01090"><label>28.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bay</surname><given-names>H.</given-names></name><name><surname>Tuytelaars</surname><given-names>T.</given-names></name><name><surname>Van Gool</surname><given-names>L.</given-names></name></person-group><article-title>SURF: Speeded Up Robust Features</article-title><conf-name>Proceedings of 9th European Conference on Computer Vision (ECCV 2006)</conf-name><conf-loc>Graz, Austria</conf-loc><conf-date>7–13 May 2006</conf-date><volume>3951</volume><fpage>404</fpage><lpage>417</lpage></citation></ref>
<ref id="b29-remotesensing-04-01090"><label>29.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Shi</surname><given-names>J.</given-names></name><name><surname>Tomasi</surname><given-names>C.</given-names></name></person-group><article-title>Good Features to Track</article-title><conf-name>Proceedings of 1994 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’94)</conf-name><conf-loc>Seattle, WA, USA</conf-loc><conf-date>21–23 June 1994</conf-date><fpage>593</fpage><lpage>600</lpage></citation></ref>
<ref id="b30-remotesensing-04-01090"><label>30.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bonin-Font</surname><given-names>F.</given-names></name><name><surname>Ortiz</surname><given-names>A.</given-names></name><name><surname>Oliver</surname><given-names>G.</given-names></name></person-group><article-title>Experimental Assessment of Different Feature Tracking Strategies for an IPT-based Navigation Task</article-title><conf-name>Proceedings of 7th IFAC Symposium on Intelligent Autonomous Vehicles (IAV)</conf-name><conf-loc>Lecce, Italy</conf-loc><conf-date>6–8 September 2010</conf-date></citation></ref>
<ref id="b31-remotesensing-04-01090"><label>31.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Fei</surname><given-names>T.</given-names></name><name><surname>Xiao-hui</surname><given-names>L.</given-names></name><name><surname>Zhi-ying</surname><given-names>H.</given-names></name><name><surname>Guo-liang</surname><given-names>H.</given-names></name></person-group><article-title>A Registration Method Based on Nature Feature with KLT Tracking Algorithm for Wearable Computers</article-title><conf-name>Proceedings of 2008 International Conference on Cyberworlds</conf-name><conf-loc>Hangzhou, China</conf-loc><conf-date>22–24 September 2008</conf-date><fpage>416</fpage><lpage>421</lpage></citation></ref>
<ref id="b32-remotesensing-04-01090"><label>32.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Takada</surname><given-names>C.</given-names></name><name><surname>Sugaya</surname><given-names>Y.</given-names></name></person-group><article-title>Detecting Incorrect Feature Tracking by Affine Space Fitting</article-title><conf-name>Proceedings of the 3rd Pacific Rim Symposium on Advances in Image and Video Technology</conf-name><conf-loc>Tokyo, Japan</conf-loc><conf-date>13–16 January 2009</conf-date><volume>5414</volume><fpage>191</fpage><lpage>202</lpage></citation></ref>
<ref id="b33-remotesensing-04-01090"><label>33.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kanatani</surname><given-names>K.</given-names></name><name><surname>Ohta</surname><given-names>N.</given-names></name></person-group><article-title>Accuracy Bounds and Optimal Computation of Homography for Image Mosaicing Applications</article-title><conf-name>Proceedings of 7th IEEE International Conference on Computer Vision</conf-name><conf-loc>Corfu, Greece</conf-loc><conf-date>20–27 September 1999</conf-date><volume>1</volume><fpage>73</fpage><lpage>78</lpage></citation></ref>
<ref id="b34-remotesensing-04-01090"><label>34.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Inaba</surname><given-names>M.</given-names></name><name><surname>Katoh</surname><given-names>N.</given-names></name><name><surname>Imai</surname><given-names>H.</given-names></name></person-group><article-title>Applications of Weighted Voronoi Diagrams and Randomization to Variance-Based k-Clustering: (Extended abstract)</article-title><conf-name>Proceedings of the 10th Annual Symposium on Computational Geometry</conf-name><conf-loc>Stony Brook, NY, USA</conf-loc><conf-date>6–8 June 1994</conf-date><fpage>332</fpage><lpage>339</lpage></citation></ref>
<ref id="b35-remotesensing-04-01090"><label>35.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Garcia</surname><given-names>V.</given-names></name><name><surname>Debreuve</surname><given-names>E.</given-names></name><name><surname>Barlaud</surname><given-names>M.</given-names></name></person-group><article-title>Region-of-Interest Tracking Based on Keypoint Trajectories on a Group of Pictures</article-title><conf-name>Proceedings of 2007 International Workshop on Content-Based Multimedia Indexing (CBMI ’07)</conf-name><conf-loc>Bordeaux, France</conf-loc><conf-date>25–27 June 2007</conf-date><fpage>198</fpage><lpage>203</lpage></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures</title>
<fig id="f1-remotesensing-04-01090" position="float">
<label>Figure 1.</label>
<caption>
<p>Workflow of the dynamic object identification and tracking process. From an image sequence, the algorithm produces a list of dynamic objects, each with associated characteristics such as position, size, velocity and color. Additional sensors could provide further parameters, such as temperature.</p></caption>
<graphic xlink:href="remotesensing-04-01090f1.gif"/></fig>
<fig id="f2-remotesensing-04-01090" position="float">
<label>Figure 2.</label>
<caption>
<p>Selection of features for the optical flow calculation using two different methods: (a) a homogeneously spaced grid of pixels; (b) pixels selected by 9-point FAST feature detection.</p></caption>
<graphic xlink:href="remotesensing-04-01090f2.gif"/></fig>
<fig id="f3-remotesensing-04-01090" position="float">
<label>Figure 3.</label>
<caption>
<p>Two successive observation areas covering the same object, illustrating how the coordinates of a single point change when they are expressed in two different camera frames.</p></caption>
<graphic xlink:href="remotesensing-04-01090f3.gif"/></fig>
<fig id="f4-remotesensing-04-01090" position="float">
<label>Figure 4.</label>
<caption>
<p>Illustration of a real optical flow, an artificial optical flow and the superposition of the two. In the last image, pixels are highlighted where the discrepancy between the two flows exceeds the thresholds defined in the text.</p></caption>
<graphic xlink:href="remotesensing-04-01090f4.gif"/></fig>
<fig id="f5-remotesensing-04-01090" position="float">
<label>Figure 5.</label>
<caption>
<p>Association of pixels into objects. (a) Vectors with similar orientation and magnitude are grouped, and each group is converted into a dynamic object. (b) Isolated vectors pointing in different directions are discarded, even when they lie in nearby areas of the image.</p></caption>
<graphic xlink:href="remotesensing-04-01090f5.gif"/></fig>
<fig id="f6-remotesensing-04-01090" position="float">
<label>Figure 6.</label>
<caption>
<p>Hardware used in the experiments: (a) the AscTec Pelican quadrotor used to perform the experiments; (b) the PointGrey Firefly camera used to record the videos.</p></caption>
<graphic xlink:href="remotesensing-04-01090f6.gif"/></fig>
<fig id="f7-remotesensing-04-01090" position="float">
<label>Figure 7.</label>
<caption>
<p>Marker pattern used to initialize the mapping and tracking algorithm: (a) design of the marker pattern; (b) screen capture of the algorithm's output window while it tracks the marker pattern.</p></caption>
<graphic xlink:href="remotesensing-04-01090f7.gif"/></fig>
<fig id="f8-remotesensing-04-01090" position="float">
<label>Figure 8.</label>
<caption>
<p>Screen captures of the system output in two different frames, each highlighting a selection of ten significant static points. Frames (a) and (b) are not consecutive: they are one second apart, so that the differences between them are easier to see. Green points mark features that appear in both frames; red points mark the rest of the selection.</p></caption>
<graphic xlink:href="remotesensing-04-01090f8.gif"/></fig>
<fig id="f9-remotesensing-04-01090" position="float">
<label>Figure 9.</label>
<caption>
<p>Example of the detection and definition of a real dynamic object. (a) The real (green) and artificial (red) optical flows, shown for only a subset of features for clarity. (b) Only the optical flows of the pixels flagged as dynamic. (c) These pixels associated into dynamic objects.</p></caption>
<graphic xlink:href="remotesensing-04-01090f9.gif"/></fig>
<fig id="f10-remotesensing-04-01090" position="float">
<label>Figure 10.</label>
<caption>
<p>Example of the detection and tracking of a single dynamic object. In (a), (b), (c) and (d), the dynamic object’s position and direction are highlighted as both the UAV and the person move through the environment.</p></caption>
<graphic xlink:href="remotesensing-04-01090f10.gif"/></fig>
<fig id="f11-remotesensing-04-01090" position="float">
<label>Figure 11.</label>
<caption>
<p>Example of the detection and tracking of multiple dynamic objects. In (a), (b), (c) and (d), the dynamic objects’ positions and directions are highlighted. One of the objects stops moving during the test (d).</p></caption>
<graphic xlink:href="remotesensing-04-01090f11.gif"/></fig></sec></back></article>
