<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/100201093</article-id>
<article-id pub-id-type="publisher-id">sensors-10-01093</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Using Fuzzy Logic to Enhance Stereo Matching in Multiresolution Images</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Medeiros</surname><given-names>Marcos D.</given-names></name><xref ref-type="aff" rid="af1-sensors-10-01093"><sup>1</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Gonçalves</surname><given-names>Luiz Marcos G.</given-names></name><xref ref-type="aff" rid="af1-sensors-10-01093"><sup>1</sup></xref><xref ref-type="corresp" rid="c1-sensors-10-01093"><sup>⋆</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Frery</surname><given-names>Alejandro C.</given-names></name><xref ref-type="aff" rid="af2-sensors-10-01093"><sup>2</sup></xref></contrib></contrib-group>
<aff id="af1-sensors-10-01093">
<label>1</label> DCA-CT-UFRN, Campus Universitário, Lagoa Nova, Universidade Federal do Rio Grande do Norte, 59072-970 Natal RN, Brazil; E-Mail: <email>marcosdumay@dca.ufrn.br</email></aff>
<aff id="af2-sensors-10-01093">
<label>2</label> Instituto de Computação, LCCV &amp; CPMAT, Universidade Federal de Alagoas, BR 104 Norte km 97, 57072-970 Maceió AL, Brazil; E-Mail: <email>acfrery@pesquisador.cnpq.br</email></aff>
<author-notes>
<corresp id="c1-sensors-10-01093">
<label>⋆</label> Author to whom correspondence should be addressed; E-Mail: <email>lmarcos@dca.ufrn.br</email>; Tel.: +55-84-9928-0730 or +55-84-3215-3738; Fax: +55-84-3771-3738.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2010</year></pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>1</month>
<year>2010</year></pub-date>
<volume>10</volume>
<issue>2</issue>
<fpage>1093</fpage>
<lpage>1118</lpage>
<history>
<date date-type="received">
<day>10</day>
<month>12</month>
<year>2009</year></date>
<date date-type="rev-recd">
<day>14</day>
<month>1</month>
<year>2010</year></date>
<date date-type="accepted">
<day>18</day>
<month>1</month>
<year>2010</year></date></history>
<permissions>
<copyright-statement>© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.</copyright-statement>
<copyright-year>2010</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Stereo matching is an open problem in Computer Vision, for which local features are extracted to identify corresponding points in pairs of images. The results are heavily dependent on the initial steps. We apply image decomposition in multiresolution levels to reduce the search space, computational time, and errors. We propose a solution to the problem of how deep (coarse) the stereo measures should start, trading between error minimization and time consumption, by starting the stereo calculation at varying resolution levels, for each pixel, according to fuzzy decisions. Our heuristic improves the overall execution time since it only employs deeper resolution levels when strictly necessary. It also reduces errors because it measures similarity between windows with enough detail. We also compare our algorithm with a very fast multi-resolution approach, and with one based on fuzzy logic. Our algorithm performs faster and/or better than all those approaches, and is thus a good candidate for robotic vision applications. We also discuss the system architecture that efficiently implements our solution.</p></abstract>
<kwd-group>
<kwd>image analysis</kwd>
<kwd>fuzzy rules</kwd>
<kwd>multiresolution</kwd>
<kwd>sensor configuration</kwd>
<kwd>stereo matching</kwd>
<kwd>vision</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>The goal of stereo vision is to recover 3D information given incomplete and possibly noisy information of the scene [<xref ref-type="bibr" rid="b1-sensors-10-01093">1</xref>, <xref ref-type="bibr" rid="b2-sensors-10-01093">2</xref>]. Depth (or shape) is useful for terrain mapping [<xref ref-type="bibr" rid="b3-sensors-10-01093">3</xref>], robot control [<xref ref-type="bibr" rid="b4-sensors-10-01093">4</xref>–<xref ref-type="bibr" rid="b7-sensors-10-01093">7</xref>] and several other applications. Shape from shading, structured light and stereoscopy are among the many possible sources of information. In this work we propose enhancements to the determination of matching points in pairs of images, which is the bottleneck of the stereo vision process.</p>
<p>Our approach consists of performing an initial coarse matching between low resolution versions of the original images. The result is refined on small areas of increasingly higher resolution, until the matching is done between pixels at the resolution of the original images. This is usually termed “coarse to fine” or “cascade correlation”.</p>
<p>Multiresolution procedures can, in principle, be performed in any order, even in a backwards and forwards scheme, but our choice is based upon computational considerations aimed at reducing the required processing time. Multiresolution matching, in particular, is known to reduce the complexity of several classes of image processing applications, including the matching problem, leading to fast implementations. The general problem with multiresolution algorithms is that, more often than not, they start at the coarsest resolution for all pixels and thus spend more time than necessary. Our approach improves the search for the resolution level at which to start looking for corresponding points.</p>
<p>The main contribution of this work is proposing, implementing and assessing a multiresolution matching algorithm with starting points whose levels depend on local information. Such levels are computed using a new heuristic based on fuzzy decisions, yielding good quality and fast processing.</p>
<p>The paper unfolds as follows. Section 2 presents a review of image matching, focused on the use of multilevel and fuzzy techniques. Section 3 formulates the problem. Section 4 presents the main algorithms, and Section 5 discusses relevant implementation details. Section 6 presents results, and Section 7 closes with the main contributions, drawbacks and possible extensions of this work.</p></sec>
<sec>
<label>2.</label>
<title>State of the Art</title>
<p>Vision is so far the most powerful biological sensory system. Since computers appeared, several artificial vision systems have been proposed, inspired by their biological versions, aiming at providing vision to machines. However, the heterogeneity of techniques necessary for modeling complete vision algorithms makes the implementation of a real-time vision system a hard and complex task.</p>
<p>Stereo vision is used to recover the depth of scene objects, given two different images of them. This is a well-defined problem, with several textbooks and articles in the literature [<xref ref-type="bibr" rid="b1-sensors-10-01093">1</xref>, <xref ref-type="bibr" rid="b2-sensors-10-01093">2</xref>, <xref ref-type="bibr" rid="b8-sensors-10-01093">8</xref>–<xref ref-type="bibr" rid="b11-sensors-10-01093">11</xref>]. Disparity calculation is the main issue, making it a complex problem. Several algorithms have been proposed in order to enhance precision or to reduce the complexity of the problem [<xref ref-type="bibr" rid="b12-sensors-10-01093">12</xref>–<xref ref-type="bibr" rid="b16-sensors-10-01093">16</xref>]. Features such as depth (or a disparity map) are useful for terrain mapping [<xref ref-type="bibr" rid="b3-sensors-10-01093">3</xref>], robot control [<xref ref-type="bibr" rid="b6-sensors-10-01093">6</xref>, <xref ref-type="bibr" rid="b7-sensors-10-01093">7</xref>, <xref ref-type="bibr" rid="b17-sensors-10-01093">17</xref>] and several other applications.</p>
<p>Stereo matching is generally defined as the problem of discovering points or regions of one image that match points or regions of the other image in a stereo image pair. That is, the goal is finding pairs of points or regions in two images that have local image characteristics most similar to each other [<xref ref-type="bibr" rid="b1-sensors-10-01093">1</xref>, <xref ref-type="bibr" rid="b2-sensors-10-01093">2</xref>, <xref ref-type="bibr" rid="b8-sensors-10-01093">8</xref>–<xref ref-type="bibr" rid="b10-sensors-10-01093">10</xref>, <xref ref-type="bibr" rid="b18-sensors-10-01093">18</xref>–<xref ref-type="bibr" rid="b20-sensors-10-01093">20</xref>]. The result of the matching process is the displacement between the points in the images, or disparity, also called the 2.5D information. Depth reconstruction can be directly calculated from this information, generating a 3D model of the detected objects using triangulation or another mesh representation. Disparity can also be directly used for other purposes such as real-time navigation [<xref ref-type="bibr" rid="b21-sensors-10-01093">21</xref>].</p>
<p>There are several stereo matching algorithms, generally classified into two categories: area matching and/or feature (element) matching [<xref ref-type="bibr" rid="b1-sensors-10-01093">1</xref>]. Area matching algorithms are characterized by comparing features distributed over regions. Feature matching uses local features, edges and borders for instance, with which it is possible to perform the matching.</p>
<p>Area based algorithms are usually slower than feature based ones, but they generate full disparity maps and error estimates. Area based algorithms usually employ correlation estimates between image pairs for generating the match. Such estimates are obtained using discrete convolution operations between image templates. The algorithm performance is, thus, very dependent on the correlation and on the search window sizes. Small correlation windows usually generate maps that are more sensitive to noise, but less sensitive to occlusions, better defining the objects [<xref ref-type="bibr" rid="b22-sensors-10-01093">22</xref>].</p>
<p>In order to exploit the advantages of both small and big windows, algorithms based on variable window size were proposed [<xref ref-type="bibr" rid="b3-sensors-10-01093">3</xref>, <xref ref-type="bibr" rid="b22-sensors-10-01093">22</xref>, <xref ref-type="bibr" rid="b23-sensors-10-01093">23</xref>]. These algorithms trade better quality of matching for shorter execution time. In fact, the use of full resolution images considerably complicates the stereo matching process, especially if real-time operation is a requirement.</p>
<p>Several models have been proposed in the literature for image data reduction. Most of them treat visual data as a classical pyramidal structure. The scale space theory is formalized by Witkin [<xref ref-type="bibr" rid="b24-sensors-10-01093">24</xref>] and by Lindeberg [<xref ref-type="bibr" rid="b25-sensors-10-01093">25</xref>]. The Laplacian pyramid is formally introduced by Burt and Adelson [<xref ref-type="bibr" rid="b26-sensors-10-01093">26</xref>], but its first use in visual search tasks is by Uhr [<xref ref-type="bibr" rid="b27-sensors-10-01093">27</xref>]. Several works use it as input, mainly for techniques that employ visual attention [<xref ref-type="bibr" rid="b28-sensors-10-01093">28</xref>, <xref ref-type="bibr" rid="b29-sensors-10-01093">29</xref>].</p>
<p>Wavelets [<xref ref-type="bibr" rid="b30-sensors-10-01093">30</xref>] are also used for building multiresolution images [<xref ref-type="bibr" rid="b31-sensors-10-01093">31</xref>], with applications in stereo matching [<xref ref-type="bibr" rid="b32-sensors-10-01093">32</xref>–<xref ref-type="bibr" rid="b34-sensors-10-01093">34</xref>]. Other multiresolution algorithms have also been used for the development of real-time stereo vision systems, using small (reduced) versions of the images [<xref ref-type="bibr" rid="b35-sensors-10-01093">35</xref>, <xref ref-type="bibr" rid="b36-sensors-10-01093">36</xref>].</p>
<p>Multiresolution algorithms mix both area and feature matching for achieving fast execution [<xref ref-type="bibr" rid="b34-sensors-10-01093">34</xref>, <xref ref-type="bibr" rid="b37-sensors-10-01093">37</xref>]. Multiresolution matching can even reduce the asymptotic complexity of the matching problem, but at the expense of worse results.</p>
<p>Besides the existence of these <italic>direct</italic> algorithms, Udupa [<xref ref-type="bibr" rid="b38-sensors-10-01093">38</xref>] suggests that approaches based on fuzzy sets should be taken into consideration, given that images are inherently fuzzy. Such an approach should be able to realistically handle uncertainties and the heterogeneity of object properties.</p>
<p>Several works use fuzzy logic clustering algorithms in stereo matching in order to accelerate the correspondence process [<xref ref-type="bibr" rid="b39-sensors-10-01093">39</xref>–<xref ref-type="bibr" rid="b46-sensors-10-01093">46</xref>]; some of these techniques achieve real-time processing. The idea is to pre-process images, group features by some fuzzy criteria or guide the search so that the best match between features can be determined, or at least guided, using a small set of candidate features. Fuzzy logic is also used for object identification and feature recovery on stereo images and video [<xref ref-type="bibr" rid="b47-sensors-10-01093">47</xref>–<xref ref-type="bibr" rid="b50-sensors-10-01093">50</xref>].</p>
<p>Fuzzy theory is also applied to determine the best window size with which to process correlation measures in images [<xref ref-type="bibr" rid="b51-sensors-10-01093">51</xref>]. This is to a certain degree related to our work, since we determine the best resolution level at which to start stereo matching, which would amount to determining the window size if only one level of resolution were used. Fuzzy techniques have also been used in tracking and robot control with stereo images [<xref ref-type="bibr" rid="b52-sensors-10-01093">52</xref>–<xref ref-type="bibr" rid="b54-sensors-10-01093">54</xref>].</p>
<p>Our proposed approach is rather different from the above-listed works and integrates multiresolution procedures with fuzzy techniques. As stated above, the main problem with the multiresolution approach is how to determine the level with which to start correlation measures. A second problem is that, even if a good level is determined for a given pixel, this will not be the best for all the other image pixels, because this issue is heavily dependent on local image characteristics. So, we propose the use of fuzzy rules in order to determine the optimal level for each region in the image. This proposal leads to the precise determination of matching points in real time, since most of the image area is not considered in full resolution.</p>
<p>Our algorithm performs faster and better than plain correlation, and it presents improved results with respect to a very fast multi-resolution approach [<xref ref-type="bibr" rid="b17-sensors-10-01093">17</xref>], and one based on fuzzy logic [<xref ref-type="bibr" rid="b41-sensors-10-01093">41</xref>].</p>
<p>This paper extends results by Medeiros and Gonçalves [<xref ref-type="bibr" rid="b55-sensors-10-01093">55</xref>] by presenting an updated literature review, by a more detailed discussion and explanation about the proposed technique and by the presentation and discussion of further results.</p></sec>
<sec>
<label>3.</label>
<title>Stereo Matching Problem</title>
<p>In the stereo matching problem, we have a pair of pictures of the same scene taken from different positions, and possibly orientations, and the goal is to discover corresponding points, that is, pixels in both images that are projections of the same scene point. The most intuitive way of doing that is by comparing groups of pixels of the two images to obtain a similarity value. After similarities are computed, one may or may not include restrictions and calculate the matching that maximizes the global similarity. Our proposal assumes (i) continuity of disparity, and (ii) uniqueness of the correct matching.</p>
<p>In general, given a point in one image, the comparison is not made with all points of the other image. Using the epipolar restriction [<xref ref-type="bibr" rid="b2-sensors-10-01093">2</xref>, <xref ref-type="bibr" rid="b16-sensors-10-01093">16</xref>], only pixels on a certain line in one image are candidate matches for a pixel in the other one. The orientation of this line depends only on the relative orientation of the two cameras. The test images used in the current work have a horizontal epipolar line, thus pixels are searched only in that direction.</p>
<p>We measure similarity with the normalized sample cross correlation between images <italic>x</italic> = (<italic>x</italic>(<italic>i</italic>, <italic>j</italic>))<sub>1≤<italic>i</italic>≤<italic>m</italic>,1≤<italic>j</italic>≤<italic>n</italic></sub> and <italic>y</italic> = (<italic>y</italic>(<italic>i</italic>, <italic>j</italic>))<sub>1≤<italic>i</italic>≤<italic>m</italic>,1≤<italic>j</italic>≤<italic>n</italic></sub>, estimated by the linear Pearson correlation coefficient as
<disp-formula id="FD1">
<label>(1)</label>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>−</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt>
<mml:msqrt>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac>
<mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
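<p>As an illustration only, Equation (1) can be sketched in a few lines of plain Python. The function name and the list-of-rows image representation are our own choices for this sketch; we also assume that the factor multiplying the sums in Equation (1) is the number of pixel pairs being compared, and that a window with no intensity variation is assigned a correlation of zero (a convention adopted here, not one stated in the text).</p>
<preformat><![CDATA[
```python
import math


def pearson_correlation(x, y):
    """Normalized cross correlation of two equally sized windows (Equation (1)).

    x and y are lists of rows of pixel values. The count n is taken to be
    the number of pixel pairs, which is an assumption of this sketch.
    """
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    if len(xs) != len(ys):
        raise ValueError("windows must have the same number of pixels")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_xx = sum(a * a for a in xs)
    sum_yy = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x ** 2
    var_y = n * sum_yy - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        # Flat window: the coefficient is undefined; return 0.0 by convention.
        return 0.0
    return numerator / (math.sqrt(var_x) * math.sqrt(var_y))
```
]]></preformat>
<p>Because the coefficient is invariant to affine changes of intensity, a window and a brighter or higher-contrast copy of it correlate perfectly, which is the property that makes this measure attractive for comparing images taken by two different cameras.</p>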
<p>If the objects are known to lie within a distance range, the search for the best match can be restricted to a subset of the epipolar line. We will refer to this subset as the “search interval”, to avoid confusion with the refining interval that will be defined later.</p>
<p>Small search intervals, when they can be defined, improve the quality of the resulting matching and avoid false positives that are far from the desired match on the epipolar line. While this is convenient for many problems, for some, notably in robotic vision, near objects are the most important ones, thus requiring a full matching between the images.</p>
<sec>
<label>3.1.</label>
<title>Plain correlation algorithm</title>
<p>We compare here the plain correlation and multiresolution matching approaches. Both algorithms share one attribute, the window size. Although some authors recommend the use of a 7 × 7 window for plain correlation (see, for instance, the work of Hirschmüller [<xref ref-type="bibr" rid="b22-sensors-10-01093">22</xref>]), we opted for testing several window sizes in order to compare the relative performances of both approaches.</p>
<p>Traditional plain correlation calculates the normalized, linear cross correlation between all possible windows of both images. For each point in one image, the matching point is chosen in the other image so as to maximize the correlation coefficient.</p>
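<p>The following Python sketch illustrates this exhaustive search for a single image row, assuming a rectified pair with horizontal epipolar lines as in our test images. All names (<monospace>ncc</monospace>, <monospace>window</monospace>, <monospace>match_row</monospace>, <monospace>half</monospace>, <monospace>search</monospace>) are hypothetical and introduced only for this example; it is not the implementation evaluated in this paper.</p>
<preformat><![CDATA[
```python
import math


def ncc(a, b):
    """Pearson correlation of two flat, equally long lists of pixel values."""
    n = len(a)
    sa, sb = sum(a), sum(b)
    num = n * sum(p * q for p, q in zip(a, b)) - sa * sb
    va = n * sum(p * p for p in a) - sa * sa
    vb = n * sum(q * q for q in b) - sb * sb
    # Flat windows have no defined correlation; 0.0 is a convention of this sketch.
    return num / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def window(img, row, col, half):
    """Flatten the square window of side 2*half+1 centred at (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def match_row(left, right, row, half=1, search=None):
    """Map each valid column of `left` to the best-correlated column of `right`.

    Candidates lie on the same row (horizontal epipolar line). If `search`
    is given, only columns within that distance are tried, which plays the
    role of the search interval.
    """
    width = len(left[0])
    cols = range(half, width - half)
    result = {}
    for c in cols:
        ref = window(left, row, c, half)
        candidates = [k for k in cols if search is None or abs(k - c) <= search]
        result[c] = max(candidates,
                        key=lambda k: ncc(ref, window(right, row, k, half)))
    return result
```
]]></preformat>
<p>Calling <monospace>match_row</monospace> for every row of a square image with no search interval evaluates one correlation per pair of columns per row, which is where the cubic number of correlations discussed next comes from.</p>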
<p>When matching square images of side <italic>w</italic>, this algorithm calculates <italic>w</italic><sup>3</sup> correlations, but when a search interval <italic>w<sub>s</sub> &lt; w</italic> is available, the number of correlations drops down to <italic>w<sub>s</sub>w</italic><sup>2</sup>. Of course, in the worst case, we should assume that the plain correlation approach would have <italic>O</italic>(<italic>w</italic><sup>3</sup>) complexity.</p></sec>
<sec>
<label>3.2.</label>
<title>Multiresolution matching with fixed depth</title>
<p>Multi-resolution stereo matching uses several pairs of images of the same scene, sampled with different levels of detail, as a double pyramidal representation of the scene [<xref ref-type="bibr" rid="b17-sensors-10-01093">17</xref>]. As in any scale space, images at the base of the pyramid have higher resolution and, therefore, more detail of the scene than those at the top. The credit for using this idea in visual tasks can be given to Uhr [<xref ref-type="bibr" rid="b27-sensors-10-01093">27</xref>]. The scale space theory is formalized by Witkin [<xref ref-type="bibr" rid="b24-sensors-10-01093">24</xref>], and further by Lindeberg [<xref ref-type="bibr" rid="b25-sensors-10-01093">25</xref>]. A variation, the Laplacian pyramid, was introduced by Burt and Adelson [<xref ref-type="bibr" rid="b26-sensors-10-01093">26</xref>]. Tsotsos [<xref ref-type="bibr" rid="b56-sensors-10-01093">56</xref>, <xref ref-type="bibr" rid="b57-sensors-10-01093">57</xref>] integrated multi-resolution into visual attention, implemented as such by Burt [<xref ref-type="bibr" rid="b58-sensors-10-01093">58</xref>], and used in several visual models [<xref ref-type="bibr" rid="b28-sensors-10-01093">28</xref>, <xref ref-type="bibr" rid="b29-sensors-10-01093">29</xref>, <xref ref-type="bibr" rid="b59-sensors-10-01093">59</xref>, <xref ref-type="bibr" rid="b60-sensors-10-01093">60</xref>]. Based on multi-resolution, Lindeberg [<xref ref-type="bibr" rid="b61-sensors-10-01093">61</xref>] detected features using an automatic scale selection algorithm, while Lowe [<xref ref-type="bibr" rid="b62-sensors-10-01093">62</xref>] dealt with detection of scale-invariant features.</p>
<p>Multiresolution algorithms in stereo matching calculate the disparity of all pixels (or blocks of pixels) of a coarse level image and refine them, matching the pixels of finer level images with a small number of pixels around the coarser match. We refer to the interval that contains those pixels as the “refining interval”.</p>
<p>For example, a multiresolution algorithm with fixed depth that matches the points of two 256 × 256 pixel images, say <italic>x</italic><sub>0</sub> and <italic>y</italic><sub>0</sub>, may use three additional pairs of images (that is, depth 3), of sizes 128 × 128, 64 × 64 and 32 × 32; we denote these pairs of images (<italic>x<sub>ℓ</sub></italic>, <italic>y<sub>ℓ</sub></italic>), 1 ≤ <italic>ℓ</italic> ≤ 3, respectively. Note that usually <italic>x<sub>ℓ</sub></italic>(<italic>i</italic>, <italic>j</italic>) = (<italic>x</italic><sub><italic>ℓ−</italic>1</sub>(2<italic>i</italic>, 2<italic>j</italic>) + <italic>x</italic><sub><italic>ℓ−</italic>1</sub>(2<italic>i</italic> + 1, 2<italic>j</italic>) + <italic>x</italic><sub><italic>ℓ−</italic>1</sub>(2<italic>i</italic>, 2<italic>j</italic> + 1) + <italic>x</italic><sub><italic>ℓ−</italic>1</sub>(2<italic>i</italic> + 1, 2<italic>j</italic> + 1))<italic>/</italic>4, for every 1 ≤ <italic>ℓ</italic> ≤ 3, but other operators are also possible, as will be seen in Section 4. In this case the window size is <italic>w</italic> = 2. The same transformation is recursively applied to <italic>y</italic><sub>0</sub> in order to obtain <italic>y</italic><sub>1</sub>, <italic>y</italic><sub>2</sub> and <italic>y</italic><sub>3</sub>. We omit the dependence of the coordinates (<italic>i</italic>, <italic>j</italic>) on the level <italic>ℓ</italic> for the sake of simplicity.</p>
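<p>A minimal Python sketch of this averaging pyramid is given below. The function names are hypothetical, images are represented as lists of rows with zero-based indices, and the image sides are assumed to be even at every level that is reduced.</p>
<preformat><![CDATA[
```python
def downsample(img):
    """Halve the resolution by averaging each 2 x 2 block of pixels."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i + 1][2 * j]
              + img[2 * i][2 * j + 1] + img[2 * i + 1][2 * j + 1]) / 4
             for j in range(cols)]
            for i in range(rows)]


def build_pyramid(img, levels):
    """Return [x_0, x_1, ..., x_levels], where x_0 is the original image."""
    pyramid = [img]
    for _ in range(levels):
        # Each level is computed from the previous, finer one.
        pyramid.append(downsample(pyramid[-1]))
    return pyramid
```
]]></preformat>
<p>For a 256 × 256 image and three levels, the returned list holds images of sides 256, 128, 64 and 32, matching the example in the text.</p>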
<p>The classical approach would attempt to match all the 32 × 32 pixels of the pair (<italic>x</italic><sub>3</sub>, <italic>y</italic><sub>3</sub>) to, then, proceed to their refinement. The refinement of pixel <italic>x</italic><sub>3</sub>(<italic>i</italic>, <italic>j</italic>) consists of correlating the values <italic>x</italic><sub>2</sub>(2<italic>i</italic>, 2<italic>j</italic>), <italic>x</italic><sub>2</sub>(2<italic>i</italic> + 1, 2<italic>j</italic>), <italic>x</italic><sub>2</sub>(2<italic>i</italic>, 2<italic>j</italic> + 1) and <italic>x</italic><sub>2</sub>(2<italic>i</italic> + 1, 2<italic>j</italic> + 1) with the pixels within the refining interval around the matching point of <italic>y</italic><sub>2</sub>. This is repeated until the matching is done on the (<italic>x</italic><sub>0</sub>, <italic>y</italic><sub>0</sub>) pair, obtaining the final result.</p>
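<p>The candidate positions examined during one refinement step can be sketched as follows. We assume here, for illustration, that a match found at column <italic>m</italic> of the coarser image corresponds to columns 2<italic>m</italic> and 2<italic>m</italic> + 1 of the finer one, and that the refining interval extends a fixed number of pixels (the hypothetical parameter <monospace>radius</monospace>) to each side; the function name is likewise our own.</p>
<preformat><![CDATA[
```python
def refine_candidates(coarse_col, radius, fine_width):
    """Columns of the finer image examined when refining a coarse match.

    The coarse column maps to columns 2*coarse_col and 2*coarse_col + 1 at
    the finer level (assumption: decimation by two); the interval is widened
    by `radius` pixels on each side and clipped to the image.
    """
    lo = max(0, 2 * coarse_col - radius)
    hi = min(fine_width - 1, 2 * coarse_col + 1 + radius)
    return list(range(lo, hi + 1))
```
]]></preformat>
<p>The number of candidates per pixel is at most 2 + 2 × <monospace>radius</monospace>, independent of the image size, which is why each refinement level costs a number of correlations proportional to the number of pixels at that level.</p>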
<p>This approach is known to be faster than the brute force search on (<italic>x</italic><sub>0</sub>, <italic>y</italic><sub>0</sub>) (plain correlation). In fact, in the extreme case, where the images are squares and the smallest ones are single pixels, it requires <italic>w</italic><sup>2</sup> log(<italic>w</italic>) correlations, where <italic>w</italic> is the side of the images, thus its complexity is <italic>O</italic>(<italic>w</italic><sup>2</sup> log(<italic>w</italic>)). Of course, there is also the time used for building the pyramid. So, to determine the final algorithm complexity, one must add the complexity of building the pyramid, which is <italic>O</italic>(<italic>w</italic><sup>2</sup>) + <italic>O</italic>(<italic>w</italic><sup>2</sup><italic>/</italic>4) + ⋯ + <italic>O</italic>(<italic>w</italic><sup>2</sup><italic>/w</italic><sup>2</sup>), to the complexity of the matching, given above, which still results in <italic>O</italic>(<italic>w</italic><sup>2</sup> log(<italic>w</italic>)).</p>
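<p>The reason the pyramid construction does not change the overall order can be made explicit with a geometric series. The LaTeX fragment below is a sketch of that step, taking <italic>w</italic> as the side of the square images and assuming it is a power of two.</p>
<preformat><![CDATA[
```latex
\underbrace{w^{2}\sum_{k=0}^{\log_{2} w} 4^{-k}}_{\text{pyramid construction}}
  \;\le\; \tfrac{4}{3}\,w^{2} \;=\; O\!\left(w^{2}\right),
\qquad
O\!\left(w^{2}\right) + O\!\left(w^{2}\log w\right)
  \;=\; O\!\left(w^{2}\log w\right).
```
]]></preformat>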
<p>Reducing the search interval is not very efficient at improving this algorithm, since the gain in operations comes at the expense of more errors. Often, important characteristics are lost in the smaller images, reducing correlation precision. Those errors can sometimes be alleviated by a larger refining interval, which increases the execution time.</p>
<p>In practice, some authors report that the processing time spent building the multiresolution pyramid often offsets the time gained by reducing the number of correlations [<xref ref-type="bibr" rid="b22-sensors-10-01093">22</xref>]. This basic multiresolution matching is seldom used in current applications [<xref ref-type="bibr" rid="b21-sensors-10-01093">21</xref>].</p></sec></sec>
<sec>
<label>4.</label>
<title>Proposal: Multiresolution Matching with Variable Depth</title>
<p>As previously seen, plain correlation matching is very expensive and prone to generating errors such as ambiguity or lack of correspondence when there is not enough texture detail. On the other hand, multiresolution matching with fixed depth also tends to generate errors, but most of the pixels are still assigned nearly correctly. Also, the number of errors increases with the depth of the algorithm, since they are due to loss of information in the coarser images.</p>
<p>To get the best of both algorithms, one could assign for each pixel a different level: hard-to-compute positions should be treated at the highest resolution, while the others could be treated at an optimum, coarser level with just enough information. This adaptive approach, which is the proposed multiresolution matching with variable depth, will be shown to be able to reduce errors while still requiring less computational effort. The optimal level is computed on one of the images, and then each displacement is calculated in the same way as is done on the fixed depth algorithm.</p>
<p>A heuristic is, then, needed to calculate the desired depth. Also, we need to generate the low resolution images.</p>
<p>The proposed algorithm uses, for each image, a scale pyramid with several resolution versions of the original image, and one or more detail images. Scale images are obtained by a sub-band filter applied to the original images, while detail images are obtained by filtering the scale image of the same level. We assessed two distinct approaches for the pyramid creation, which differ mainly in the way the detail images are calculated: wavelets, and Gaussian and Laplacian operators. They are described in the following sections.</p>
<sec>
<label>4.1.</label>
<title>Building the pyramids with wavelets</title>
<p>We used a discrete wavelet transform to build the pyramids. With this approach, in a given level <italic>i</italic>, the scale image of the pyramid (<italic>I<sub>i</sub></italic>) is obtained by applying a low pass filter (<italic>L</italic>) to the scale image of level <italic>i</italic> − 1 followed by a decimation (↓). Detail images <italic>D<sub>i</sub></italic> (with vertical, horizontal and diagonal details) are calculated using high-pass filters applied to the scale image of level <italic>i</italic> − 1 followed by a decimation. <xref ref-type="fig" rid="f1-sensors-10-01093">Figure 1</xref> shows the schema for calculating a wavelet pyramid of level 2. We used the Daubechies and Haar bases [<xref ref-type="bibr" rid="b63-sensors-10-01093">63</xref>].</p></sec>
<sec>
<label>4.2.</label>
<title>Building the pyramids with Gaussian and Laplacian operators</title>
<p>We build two multiresolution pyramids by successively convolving the previous images with the low-pass Gaussian (ϒ<italic><sub>G</sub></italic>) and high-pass Laplacian (ϒ<italic><sub>L</sub></italic>) masks defined in <xref ref-type="disp-formula" rid="FD2">Equation (2)</xref>, and then decimating:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">ϒ</mml:mi></mml:mrow>
<mml:mi>G</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>16</mml:mn></mml:mfrac>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>2</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>2</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>4</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>2</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>2</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">ϒ</mml:mi></mml:mrow>
<mml:mi>L</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>4</mml:mn></mml:mtd>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>With this, we generate a pyramid of images and another of details. <xref ref-type="fig" rid="f2-sensors-10-01093">Figure 2</xref> illustrates the filtering process used to create a pyramid with three levels. Convolving the original image <italic>I</italic><sub>0</sub> with the high-pass filter (mask <italic>H</italic>, which corresponds to ϒ<italic><sub>L</sub></italic>) generates image <italic>D</italic><sub>0</sub>. <italic>I</italic><sub>0</sub> is then convolved with the low-pass filter (mask <italic>L</italic>, which corresponds to ϒ<italic><sub>G</sub></italic>) and decimated by a factor of 2, which generates <italic>I</italic><sub>1</sub>. This last image is then convolved again with the high-pass filter defined by the mask <italic>H</italic>, generating <italic>D</italic><sub>1</sub>. A second low-pass filter (<italic>L</italic>) followed by a decimation, applied to <italic>I</italic><sub>1</sub>, generates image <italic>I</italic><sub>2</sub>, which is finally filtered by <italic>H</italic>, generating <italic>D</italic><sub>2</sub>.</p>
<p>These two pyramids are able to retain enough information in order to allow an efficient search for matching points.</p>
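<p>The construction of the two pyramids can be sketched in C++ as follows. This is a minimal illustration and not the code of our library: the types <monospace>Img</monospace> and <monospace>Pyramids</monospace> and the functions <monospace>convolve3x3</monospace>, <monospace>decimate</monospace> and <monospace>buildPyramids</monospace> are hypothetical names, border replication is an assumption, and the high-pass mask is left as a parameter supplied by the caller. Only the Gaussian mask and its 1/16 factor are taken from Equation (2).</p>
<preformat><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal image: row-major grey levels stored as floats.
struct Img {
    std::size_t rows, cols;
    std::vector<float> px;
    Img(std::size_t r, std::size_t c, float v = 0.0f) : rows(r), cols(c), px(r * c, v) {}
    float& at(std::size_t r, std::size_t c) { return px[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return px[r * cols + c]; }
};

// Low-pass Gaussian mask of Equation (2); it is applied with the factor 1/16.
constexpr float kGauss[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};

// Applies a 3x3 mask to every pixel. Borders are replicated (an assumption).
// The mask is not flipped, which makes no difference for symmetric masks.
Img convolve3x3(const Img& in, const float (&mask)[3][3], float scale) {
    assert(in.rows > 0 && in.cols > 0);
    Img out(in.rows, in.cols);
    const long R = static_cast<long>(in.rows);
    const long C = static_cast<long>(in.cols);
    for (long r = 0; r < R; ++r) {
        for (long c = 0; c < C; ++c) {
            float acc = 0.0f;
            for (long dr = -1; dr <= 1; ++dr) {
                for (long dc = -1; dc <= 1; ++dc) {
                    long rr = r + dr;
                    long cc = c + dc;
                    rr = rr < 0 ? 0 : (rr >= R ? R - 1 : rr);
                    cc = cc < 0 ? 0 : (cc >= C ? C - 1 : cc);
                    acc += mask[dr + 1][dc + 1] *
                           in.at(static_cast<std::size_t>(rr), static_cast<std::size_t>(cc));
                }
            }
            out.at(static_cast<std::size_t>(r), static_cast<std::size_t>(c)) = scale * acc;
        }
    }
    return out;
}

// Decimation by a factor of 2: keeps every second row and column.
Img decimate(const Img& in) {
    Img out((in.rows + 1) / 2, (in.cols + 1) / 2);
    for (std::size_t r = 0; r < out.rows; ++r)
        for (std::size_t c = 0; c < out.cols; ++c)
            out.at(r, c) = in.at(2 * r, 2 * c);
    return out;
}

struct Pyramids {
    std::vector<Img> scale;   // I_0, I_1, ...
    std::vector<Img> detail;  // D_0, D_1, ...
};

// D_k is the high-pass filtered I_k; I_{k+1} is the low-pass filtered,
// decimated I_k.
Pyramids buildPyramids(const Img& original, const float (&highPass)[3][3], std::size_t levels) {
    assert(levels >= 1);
    Pyramids p;
    p.scale.push_back(original);
    for (std::size_t k = 0; k < levels; ++k) {
        p.detail.push_back(convolve3x3(p.scale[k], highPass, 1.0f));
        if (k + 1 < levels)
            p.scale.push_back(decimate(convolve3x3(p.scale[k], kGauss, 1.0f / 16.0f)));
    }
    return p;
}
```
]]></preformat>
<p>Calling <monospace>buildPyramids</monospace> with three levels yields the images <italic>I</italic><sub>0</sub> to <italic>I</italic><sub>2</sub> and <italic>D</italic><sub>0</sub> to <italic>D</italic><sub>2</sub> in the order described above.</p>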
<p>The use of sub-band filtering makes this algorithm much faster than the one proposed by Hoff and Ahuja [<xref ref-type="bibr" rid="b37-sensors-10-01093">37</xref>], because it removes the filtering bottleneck. This, together with a lower error rate, allows the use of a smaller refinement interval, which makes multiresolution matching with variable depth much faster than both the fixed-depth version and simple correlation on the original images.</p>
<p>Due to decimation, the construction of the scale images of the pyramid cannot be made shift-invariant. However, the detail images can be shift-invariant and this is a key difference between the two techniques. In the case of wavelets, the detail images are sensitive to shifts, but with 2D filtering they are invariant.</p>
<p>The wavelet transform is invertible. The transform based on 2D filtering is invertible only if both the high-pass and low-pass filters are ideal filters [<xref ref-type="bibr" rid="b64-sensors-10-01093">64</xref>], which amounts to using convolution masks of the size of the original image. To keep the computation economical, small masks are employed and, therefore, this transformation is not invertible.</p></sec>
<sec>
<label>4.3.</label>
<title>Desired level calculation</title>
<p>We use a propositional logic based on fuzzy evidence to derive a heuristic for calculating the desired level from which the matching will be performed. Such level is the coarsest one that can be labeled as “reliable”, in the sense that it provides enough information for the matching.</p>
<p>Fuzzy logic is composed of propositions <italic>P</italic> with continuous rather than binary truth values <italic>μ</italic>(<italic>P</italic>) ∈ [0, 1]. We used the following operators on those propositions: “¬”, where <italic>μ</italic>(¬<italic>P</italic>) = 1 − <italic>μ</italic>(<italic>P</italic>), “∧”, where <italic>μ</italic>(<italic>A</italic> ∧ <italic>B</italic>) = min(<italic>μ</italic>(<italic>A</italic>), <italic>μ</italic>(<italic>B</italic>)), “∨”, where <italic>μ</italic>(<italic>A</italic> ∨ <italic>B</italic>) = max(<italic>μ</italic>(<italic>A</italic>), <italic>μ</italic>(<italic>B</italic>)), “⇒”, where (<italic>A</italic> ⇒ <italic>B</italic>) ⇔ (<italic>μ</italic>(<italic>B</italic>) ≥ <italic>μ</italic>(<italic>A</italic>)) and “⇏”, where (<italic>A</italic> ⇏ <italic>B</italic>) ⇔ (<italic>μ</italic>(<italic>A</italic>) <italic>&gt; μ</italic>(<italic>B</italic>)).</p>
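<p>As an illustration, the operators above can be written as a few C++ functions. The type <monospace>FuzzyValue</monospace> and the function names are hypothetical and are used only for this sketch; they are not the interface of the class <monospace>Fuzzy</monospace> of our library.</p>
<preformat><![CDATA[
```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a fuzzy proposition P with truth value mu(P).
struct FuzzyValue {
    double mu;  // truth value in [0, 1]
    explicit FuzzyValue(double m) : mu(m) { assert(m >= 0.0 && m <= 1.0); }
};

// Negation: mu(not P) = 1 - mu(P).
FuzzyValue fuzzyNot(FuzzyValue p) { return FuzzyValue(1.0 - p.mu); }

// Conjunction: mu(A and B) = min(mu(A), mu(B)).
FuzzyValue fuzzyAnd(FuzzyValue a, FuzzyValue b) { return FuzzyValue(std::min(a.mu, b.mu)); }

// Disjunction: mu(A or B) = max(mu(A), mu(B)).
FuzzyValue fuzzyOr(FuzzyValue a, FuzzyValue b) { return FuzzyValue(std::max(a.mu, b.mu)); }

// Implication holds exactly when mu(B) >= mu(A).
bool implies(FuzzyValue a, FuzzyValue b) { return b.mu >= a.mu; }

// Non-implication holds exactly when mu(A) > mu(B).
bool notImplies(FuzzyValue a, FuzzyValue b) { return a.mu > b.mu; }
```
]]></preformat>
<p>Note that the implication and its negation return crisp (Boolean) results, whereas the other three operators return fuzzy values.</p>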
<p>We define a predicate <italic>σ<sub>ℓ</sub></italic>(<italic>i</italic>, <italic>j</italic>) meaning “the classification of the block at position (<italic>i</italic>, <italic>j</italic>) and level <italic>ℓ</italic> is not reliable”. This predicate must satisfy the following conditions:
<list list-type="bullet">
<list-item>
<p>If the detail at (<italic>i</italic>, <italic>j</italic>) is zero, the classification is reliable: <italic>D</italic>(<italic>i</italic>, <italic>j</italic>) ≠ 0 ⇒ <italic>σ<sub>ℓ</sub></italic>(<italic>i</italic>, <italic>j</italic>), where <italic>D</italic> is the amount of detail available.</p></list-item>
<list-item>
<p>The deeper the classification the less reliable it is: if <italic>K<sub>ℓ</sub></italic><sub>+1</sub>(<italic>i</italic>, <italic>j</italic>) is the set of pixels at level <italic>ℓ</italic> + 1 that collapse into pixel (<italic>i</italic>, <italic>j</italic>) at level <italic>ℓ</italic>, we have that ∨<sub><italic>v</italic>∈<italic>K</italic><sub><italic>ℓ</italic>+1</sub>(<italic>i,j</italic>)</sub> <italic>σ<sub>ℓ</sub></italic>(<italic>v</italic>) ⇒ <italic>σ</italic><sub><italic>ℓ</italic>+1</sub>(<italic>i</italic>, <italic>j</italic>).</p></list-item></list>Lack of texture details may cause accumulation of small errors, but this conflicts with always getting some minimum texture at the coarsest level, so we opted not to accumulate errors.</p>
<p>Because short execution time is our main objective, the heuristic has to be easy to compute by general purpose computers, leading to <xref ref-type="disp-formula" rid="FD3">Equation (3)</xref>:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi></mml:mrow>
<mml:mi>ℓ</mml:mi></mml:msub>
<mml:mo> </mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:munder>
<mml:mo>∨</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi>K</mml:mi></mml:mrow></mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>ℓ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>∨</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≠</mml:mo>
<mml:mn>0.</mml:mn></mml:mrow></mml:math></disp-formula></p>
<p>We define, for any <italic>a</italic> ∈ [−1, 1], <italic>μ</italic>(<italic>a</italic> ≠ 0) = |<italic>a</italic>|, completely specifying the heuristic. Defining a dependability threshold <italic>δ</italic> ∈ [0, 1], our desired level for each pixel is the maximum level <italic>ℓ</italic> for which <italic>δ</italic> ⇒ <italic>σ<sub>ℓ</sub></italic>.</p>
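<p>A minimal C++ sketch of the recursion in <xref ref-type="disp-formula" rid="FD3">Equation (3)</xref> is given below. It assumes that the detail values are already normalised to [−1, 1], that level 0 is the finest level, and that the set <italic>K</italic> of a pixel is the 2 × 2 block of the next finer level that decimates into it; the type <monospace>Level</monospace> and the function <monospace>unreliability</monospace> are hypothetical names used only for illustration.</p>
<preformat><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one pyramid level, stored in row-major order.
struct Level {
    std::size_t rows, cols;
    std::vector<double> v;
    Level(std::size_t r, std::size_t c, double x = 0.0) : rows(r), cols(c), v(r * c, x) {}
    double& at(std::size_t r, std::size_t c) { return v[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return v[r * cols + c]; }
};

// detail[l] holds D at level l, with values in [-1, 1]; level 0 is the finest.
// Returns mu(sigma_l(i, j)) for every level and position.
std::vector<Level> unreliability(const std::vector<Level>& detail) {
    std::vector<Level> sigma;
    for (std::size_t l = 0; l < detail.size(); ++l) {
        const Level& d = detail[l];
        Level s(d.rows, d.cols);
        for (std::size_t r = 0; r < d.rows; ++r) {
            for (std::size_t c = 0; c < d.cols; ++c) {
                // mu(D != 0) = |D|
                double m = std::fabs(d.at(r, c));
                assert(m <= 1.0);
                if (l > 0) {
                    // Fuzzy "or" (maximum) over the assumed 2x2 block of level l - 1.
                    const Level& finer = sigma[l - 1];
                    for (std::size_t dr = 0; dr < 2; ++dr) {
                        for (std::size_t dc = 0; dc < 2; ++dc) {
                            const std::size_t rr = 2 * r + dr;
                            const std::size_t cc = 2 * c + dc;
                            if (rr < finer.rows && cc < finer.cols)
                                m = std::max(m, finer.at(rr, cc));
                        }
                    }
                }
                s.at(r, c) = m;
            }
        }
        sigma.push_back(s);
    }
    return sigma;
}
```
]]></preformat>
<p>Each value <italic>μ</italic>(<italic>σ<sub>ℓ</sub></italic>(<italic>i</italic>, <italic>j</italic>)) obtained in this way can then be compared with the threshold <italic>μ</italic>(<italic>δ</italic>) to select the level for that pixel.</p>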
<p>The ideal values of <italic>δ</italic> depend on the amount of detail in the image and, in principle, different values of <italic>δ</italic> should be associated with each pixel. For example, an image with substantial detail (texture) would be better treated at the highest resolution, <italic>i.e.</italic>, it should have values of <italic>δ</italic> very close to zero. Flat images with little detail could be dealt with at very coarse resolution without losing information, <italic>i.e.</italic>, with <italic>δ</italic> close to 1. <xref ref-type="fig" rid="f3-sensors-10-01093">Figure 3</xref> illustrates this with a 5 × 5 image, where each pixel has a different <italic>δ</italic> associated with it; notice that the smallest values are associated with the border, where there is detail that would be lost if treated at a coarse resolution.</p>
<p>However, the amount of texture is not known a priori. So, in this work, an empirically chosen value is assigned to <italic>δ</italic> and kept constant for the whole image. In practice, we found that values greater than 0.2 cause the algorithm to perform poorly, as will be seen in the experiments.</p></sec>
<sec>
<label>4.4.</label>
<title>Execution time considerations</title>
<p>The fuzzy heuristic presented above is able to assign a proper level to every pixel of an image, identifying detailed and flat areas. A successful technique for our purposes should be able to detect the level of detail of each image region based on texture. Flat regions should be treated at coarser, <italic>i.e.</italic>, higher levels of the pyramid (at the pyramid top) since they carry less information than detailed regions, which should be treated at lower levels (at the pyramid base).</p>
<p>As will be shown, at the coarsest level, the variable depth multiresolution matching also makes fewer mistakes than the fixed depth approaches. Because of that, we were able to obtain good results even with a refining interval as small as four pixels wide, leading to very fast execution.</p>
<p>The implementation of our proposal requires complex memory management that allocates and frees amounts of memory equivalent to several pages on the most common processors. Most operating systems lose performance under such conditions. So, as a further contribution of this work, we implemented a secondary memory management strategy that uses a buffer allocated only once at the beginning of execution. This pre-allocated memory is then managed by our own procedure, which avoids repeated calls to the operating system. This approach reduces execution time further.</p></sec></sec>
<sec>
<label>5.</label>
<title>System Architecture</title>
<p>The proposed technique was implemented as a C++ library and a collection of test programs. This library generates disparity maps using the default correlation method and our approach (multiresolution with variable depth), with or without a search interval. Due to the complexity of this library, its implementation was divided into several modules, as shown in <xref ref-type="fig" rid="f4-sensors-10-01093">Figure 4</xref>.</p>
<p>The <italic>Basics</italic> module contains common classes used by other modules. <italic>Signal</italic> is composed of classes that store and operate on images. <italic>Memory</italic> comprises the classes responsible for memory management and for the implementation of the data structures used. <italic>FuzzyLogic</italic> implements the fuzzy decision given in <xref ref-type="disp-formula" rid="FD3">Equation (3)</xref>, and disparity calculation. <italic>Vision</italic> is composed of classes that implement the stereo vision algorithms and related functions. <italic>Utils</italic> packs auxiliary code used for the manipulation of the test images and extraction of results from data.</p>
<p>Each module is detailed in the following.</p>
<sec>
<label>5.1.</label>
<title>Module Basics</title>
<p>This module contains the library 
<monospace>ops.h</monospace>, that implements operations which are required in almost every stage. It also has the classes 
<monospace>Position</monospace>, that stores a position of type “(row, column)”, 
<monospace>Window</monospace>, that defines a rectangular area of interest, and 
<monospace>Interval</monospace>, that defines a connected subset of integer values. Classes 
<monospace>Window</monospace> and 
<monospace>Interval</monospace> also store some pre-calculated values used to accelerate the matching.</p></sec>
<sec>
<label>5.2.</label>
<title>Module Signal</title>
<p>This module contains the template 
<monospace>Image</monospace>, and classes that specialize 
<monospace>Pixel: ColorPixel, BWPixel, PositionPixel, BWLabel</monospace> and 
<monospace>ColorLabel</monospace>. 
<monospace>Image&lt;PixType&gt;</monospace> has an array of elements of type 
<monospace>PixType</monospace> that represents the pixels. This template implements operations for reading and writing images in PGM and PPM formats, and also provides access to pixel operations and the wavelet transform.</p>
<p>
<monospace>Pixel</monospace> provides arithmetic operators used in transformations and convolutions, besides methods for extracting data. Types 
<monospace>ColorPixel</monospace> and 
<monospace>BWPixel</monospace> implement pixels for color and monochromatic images. Types 
<monospace>ColorLabel</monospace> and 
<monospace>BWLabel</monospace> implement color and monochromatic pixels also, but with an integer identification code (id). 
<monospace>PositionPixel</monospace> implements a gray level pixel with integer value; it stores the final disparity map values and an integer id.</p>
<p>The data structures that store pyramids of images, regardless of the technique (wavelets or 2D filtering), are created by the classes 
<monospace>ImgPair</monospace> and 
<monospace>LowHigh</monospace>. The former returns the first pair of images in the pyramid, while the latter builds the remaining pairs. Classes 
<monospace>ImgSet</monospace> and 
<monospace>ImgListSet</monospace> implement the data structure that contains the four images generated by wavelets transform and the lists of the images generated in a sequence of transformations, respectively.</p>
<p>Class 
<monospace>DWT</monospace> has values and methods used by the Daubechies wavelet transform. An object of class 
<monospace>DWT</monospace> has filters of a transformation implemented in another class; this strategy is adopted to avoid the use of a virtual class. Classes 
<monospace>Haar</monospace> and 
<monospace>Daub4</monospace> implement the two types of wavelets used in this work, namely Haar and Daubechies, respectively.</p></sec>
<sec>
<label>5.3.</label>
<title>Memory Module</title>
<p>The result of the heuristic that calculates the desired depth for each pixel requires a complex data structure. We implemented linked lists that contain objects of class 
<monospace>Position</monospace>. These lists have different formats in each execution of the matching and thus require dynamic memory allocation. A problem is that a list may occupy a large region of memory that may, sometimes, grow to several megabytes. This is far beyond the size of a memory page on most modern computer architectures, which is typically 4 KB to 16 KB. As current operating systems usually lose performance when they repeatedly allocate and free such amounts of memory, we developed a memory management system for our library. To do that, we created the class 
<monospace>MemoryBuffer</monospace> containing a buffer, which is allocated at the initialization, and resources for managing it.</p>
<p>By using the class 
<monospace>MemoryBuffer</monospace>, tailored to the needs of our library, program execution is much faster than by using the memory management provided by the operating system. The directive 
<monospace>FAST_MEMORY</monospace>, available at compile time, makes memory management faster still by disabling buffer-limit checks. When the buffers are used through this library, all data stored in them are computed locally rather than received from other programs, so this strategy poses little risk to system security.</p>
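<p>The idea of a buffer that is allocated once and then managed by the program can be sketched as a simple bump allocator. The class <monospace>ArenaSketch</monospace> below is hypothetical and much simpler than <monospace>MemoryBuffer</monospace>; it only illustrates, under these assumptions, how a single pre-allocated buffer can serve many requests and how a compile-time macro such as <monospace>FAST_MEMORY</monospace> can disable the limit check.</p>
<preformat><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bump allocator; the name and interface are illustrative only.
class ArenaSketch {
public:
    // The whole buffer is obtained from the operating system once, here.
    explicit ArenaSketch(std::size_t bytes) : buffer_(bytes), used_(0) {}

    // Hands out `bytes` of storage whose offset in the buffer is a multiple
    // of `align` (a power of two). No call to the operating system is made.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
#ifndef FAST_MEMORY
        // Limit check; compiled out when FAST_MEMORY is defined.
        if (start + bytes > buffer_.size()) throw std::bad_alloc();
#endif
        used_ = start + bytes;
        return buffer_.data() + start;
    }

    // Releases everything at once, e.g., before matching the next image pair.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};
```
]]></preformat>
<p>In this sketch individual blocks are never freed; all storage is reclaimed together by <monospace>reset</monospace>, which is one simple way to avoid repeated allocation and release of large regions.</p>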
<p>The class 
<monospace>List</monospace> implements a low-level list that can deal with allocated memory, with or without the aid of an object of the type 
<monospace>MemoryBuffer</monospace>. The other classes of this module are 
<monospace>LinkedList</monospace> and 
<monospace>Stack</monospace>, that implement high-level data structures (linked list and stack, respectively), useful for other modules of the library.</p></sec>
<sec>
<label>5.4.</label>
<title>FuzzyLogic Module</title>
<p>Class 
<monospace>Fuzzy</monospace> represents the <italic>fuzzy</italic> hypotheses, with the following operators: ¬ (!), ⊕ (+), . (*), ∨ (|), ∧ (&amp;), ⇒ (&lt;), and ⇏ (&gt;).</p>
<p>Class 
<monospace>FuzzyImax</monospace> is also part of this module. It is responsible for calculating the desired depth for each pixel. The value it returns is of type 
<monospace>LinkedList&lt;LinkedList&lt;Position&gt;&gt;</monospace>, where 
<monospace>Position</monospace> stores a position in the image. The output is a list of depth levels. For each depth, there is a list of pixels where disparity calculations start from that depth.</p>
<p>Note that each image pixel can be represented at more than one depth. In such a case, matching must start at the shallowest (finest) depth at which the pixel is found. For example, if the sixth element of the returned list has position (1, 1), this means that for all pixels in the original image that lie in positions (<italic>x</italic>, <italic>y</italic>), <italic>x</italic>, <italic>y &lt;</italic> 2<sup>6</sup>, the greatest level that can be used is 5 (starting from zero). A pixel may appear twice in the list: for instance, if position (2, 3) also appears in the fourth list, then for all pixels (<italic>x</italic>, <italic>y</italic>) with 2 × 2<sup>4</sup> ≤ <italic>x &lt;</italic> 3 × 2<sup>4</sup> and 3 × 2<sup>4</sup> ≤ <italic>y &lt;</italic> 4 × 2<sup>4</sup>, which lie within the region <italic>x</italic>, <italic>y &lt;</italic> 2<sup>6</sup>, the depth must be at most 3, and no longer 5.</p>
<p>The easiest way of obtaining the depth for each pixel is, thus, to traverse this list starting from the least coarse level, marking the positions already visited. For that, pixels of the type 
<monospace>ColorLabel</monospace> and 
<monospace>BWLabel</monospace> are used.</p></sec>
<sec>
<label>5.5.</label>
<title>Using the library</title>
<p>The main classes for our application are 
<monospace>LeftImax</monospace> and 
<monospace>PlainCorr</monospace>, both derived from 
<monospace>Vision</monospace>. These classes implement the multiresolution matching with variable depth and the simple correlation methods, respectively. Objects of both classes are created using as parameters the left and right images, and the resulting image where the disparities will be stored. Images can be created by allocating a new memory area or by using an already allocated area. Image data are stored linewise as one-dimensional arrays.</p>
<p>Objects of classes 
<monospace>LeftImax</monospace> and 
<monospace>PlainCorr</monospace> can then be initialized with 
<monospace>setWindow</monospace>. For simple correlation, arguments are 
<monospace>setWindow (Window C, Interval B)</monospace>, where 
<monospace>C</monospace> is the comparison window and 
<monospace>B</monospace> is the search interval. In our multiresolution implementation, the arguments are 
<monospace>setWindow (Window C, Interval B, Interval R)</monospace>, where 
<monospace>C</monospace> and 
<monospace>B</monospace> are the same and 
<monospace>R</monospace> is the refining interval.</p>
<p>Classes 
<monospace>Window</monospace> and 
<monospace>Interval</monospace> define windows and intervals, respectively, as integer numbers. Windows can be created at any position, using 
<monospace>Window (int rmin, int rmax, int cmin, int cmax)</monospace>, where 
<monospace>rmin</monospace> and 
<monospace>rmax</monospace> are the extreme lines that the window contains, and 
<monospace>cmin</monospace> and 
<monospace>cmax</monospace> the extreme columns. Intervals can be created in arbitrary positions; 
<monospace>Interval (int min, int max)</monospace> creates the interval [
<monospace>min; max</monospace>].</p>
<p>After windows are initialized, the matching is performed using 
<monospace>match</monospace> of 
<monospace>LeftImax</monospace> or 
<monospace>PlainCorr</monospace>. For plain correlation, this method takes no arguments; for multiresolution matching with variable depth, it is called as 
<monospace>match</monospace> (
<monospace>Fuzzy</monospace> <italic>δ</italic>), where <italic>δ</italic> is the dependability threshold used with <xref ref-type="disp-formula" rid="FD3">Equation (3)</xref>. After matching is performed, the disparities can be read from the resulting image.</p>
<p>Memory allocation is always done in a transparent way to the programmer. All necessary memory is allocated at the creation of the objects of classes 
<monospace>LeftImax</monospace> and 
<monospace>PlainCorr</monospace>. Garbage collection, however, is not supported. This is not a problem in most applications, but might be an issue when dealing with images from several pairs of different cameras. The constructor of class 
<monospace>Fuzzy</monospace> receives only an argument of type double that represents, in this case, <italic>μ</italic>(<italic>δ</italic>).</p></sec></sec>
<sec sec-type="results">
<label>6.</label>
<title>Experimental Results</title>
<p>An example of pyramids is shown in <xref ref-type="fig" rid="f5-sensors-10-01093">Figure 5</xref>. The image to the left is the well known Lena data set, used as a benchmark in many applications because it presents both flat and detailed areas. Middle and right of <xref ref-type="fig" rid="f5-sensors-10-01093">Figure 5</xref> show the levels computed by the Daubechies wavelet decomposition (of size 4) and by our approach (computed using <italic>μ</italic>(<italic>δ</italic>) = 0.2), respectively; darker pixels are coarser and, thus, require more time to process.</p>
<p>We performed stereo measurements using both approaches, but the use of wavelets (both Daubechies and Haar) for computing the pyramid turned out to be less effective for the subsequent phases than our proposal. Unlike other works [<xref ref-type="bibr" rid="b31-sensors-10-01093">31</xref>, <xref ref-type="bibr" rid="b65-sensors-10-01093">65</xref>], our approach employs the detail coefficients and is, thus, more vulnerable to problems caused by the transformation not being shift invariant. We therefore adopt the approach that uses the high-pass and low-pass filtered pyramids, due to its better performance.</p>
<p>We contrasted plain correlation and multiresolution matching with variable depth by applying them to two well-known pairs of images, namely the Tsukuba and Corridor data sets, and comparing the results with the available ground truth. <xref ref-type="fig" rid="f6-sensors-10-01093">Figures 6</xref> and <xref ref-type="fig" rid="f7-sensors-10-01093">7</xref> show the pairs, along with the desired disparity maps (ground truths).</p>
<p>The matching results are compared with the desired ones in two ways, by visual analysis and by using an error metric. We use the mean error (<xref ref-type="disp-formula" rid="FD4">Equation (4)</xref>) and its standard deviation (<xref ref-type="disp-formula" rid="FD5">Equation (5)</xref>) as measures of precision:
<disp-formula id="FD4">
<label>(4)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>∑</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:msub>
<mml:mo> </mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>O</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>N</mml:mi></mml:mfrac>
<mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="FD5">
<label>(5)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi></mml:mfrac>
<mml:msqrt>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
</mml:mrow></mml:msqrt>
<mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>O</italic> and <italic>D</italic> denote, respectively, the observed and desired disparity maps, and <italic>N</italic> is the number of pixels compared.</p>
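<p>For concreteness, the two measures can be computed as in the following C++ sketch, which follows Equations (4) and (5) term by term as written. The names <monospace>ErrorStats</monospace> and <monospace>disparityError</monospace> are hypothetical, and the disparity maps are assumed to be flattened into vectors of equal length <italic>N</italic>.</p>
<preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: d of Equation (4) and s of Equation (5).
struct ErrorStats {
    double mean;    // d = sum(O - D) / N
    double spread;  // s = sqrt(sum((O - D)^2)) / N
};

// observed and desired are disparity maps flattened to the same length N.
ErrorStats disparityError(const std::vector<double>& observed,
                          const std::vector<double>& desired) {
    assert(!observed.empty() && observed.size() == desired.size());
    double sum = 0.0;
    double sumSq = 0.0;
    for (std::size_t k = 0; k < observed.size(); ++k) {
        const double diff = observed[k] - desired[k];
        sum += diff;
        sumSq += diff * diff;
    }
    const double n = static_cast<double>(observed.size());
    return ErrorStats{sum / n, std::sqrt(sumSq) / n};
}
```
]]></preformat>
<p>Note that <italic>d</italic> uses signed differences, so errors of opposite sign partly cancel, which is one reason for reporting <italic>s</italic> alongside it.</p>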
<p>These error measurements are insensitive to the shape of the objects but are not well suited to describing the quality of the results in regions close to borders and edges. For those regions we use visual inspection, which is good at such tasks at the expense of being subjective. We therefore use these two complementary methods.</p>
<p>We used square correlation windows of side 3, 5, 7, 9, and 11 pixels, in order to test our approach with more than one window size. This means that, for a certain resolution level, given a pixel in one image (say the left) to be matched to a pixel in the other image (say right), a template window of a specified size will be taken around the pixel in the left image. Correlation measures will be calculated for this window with several windows of the same size taken around pixels in the epipolar line in the right image, within a certain search interval. When using the plain correlation algorithm, if a search interval is defined, it is always 70 pixels wide (not the whole epipolar line). We remark that, even with this optimization, plain correlation is still a time consuming algorithm. On the multiresolution matching, the refining interval is always 4 pixels wide.</p>
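<p>The window comparison described above can be sketched in C++ as follows. This is an illustration under stated assumptions and not the code of our library: the sum of absolute differences is used as the matching measure purely for simplicity, the candidate for disparity <italic>d</italic> is assumed to lie at column <italic>c</italic> − <italic>d</italic> of the same row of the right image, and coordinates outside the image are clamped to the border. The names <monospace>Gray</monospace> and <monospace>bestDisparity</monospace> are hypothetical.</p>
<preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical grey-level image with border clamping on reads.
struct Gray {
    int rows, cols;
    std::vector<float> px;
    Gray(int r, int c)
        : rows(r), cols(c),
          px(static_cast<std::size_t>(r) * static_cast<std::size_t>(c), 0.0f) {}
    std::size_t index(int r, int c) const {
        return static_cast<std::size_t>(r) * static_cast<std::size_t>(cols) +
               static_cast<std::size_t>(c);
    }
    float get(int r, int c) const {
        r = r < 0 ? 0 : (r >= rows ? rows - 1 : r);
        c = c < 0 ? 0 : (c >= cols ? cols - 1 : c);
        return px[index(r, c)];
    }
    void set(int r, int c, float v) { px[index(r, c)] = v; }
};

// Returns the disparity in [dMin, dMax] whose window in the right image is
// most similar to the window of odd side `side` centred at (r, c) in the left
// image. Similarity is measured by the sum of absolute differences (assumed).
int bestDisparity(const Gray& left, const Gray& right, int r, int c,
                  int side, int dMin, int dMax) {
    assert(side > 0 && side % 2 == 1 && dMin <= dMax);
    const int half = side / 2;
    int best = dMin;
    float bestCost = std::numeric_limits<float>::max();
    for (int d = dMin; d <= dMax; ++d) {
        float cost = 0.0f;
        for (int dr = -half; dr <= half; ++dr)
            for (int dc = -half; dc <= half; ++dc)
                cost += std::fabs(left.get(r + dr, c + dc) -
                                  right.get(r + dr, c + dc - d));
        if (cost < bestCost) {
            bestCost = cost;
            best = d;
        }
    }
    return best;
}
```
]]></preformat>
<p>The cost of one call grows with the product of the window area and the width of the search interval, which is why narrowing the interval to a few pixels has such a strong effect on execution time.</p>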
<sec>
<label>6.1.</label>
<title>Comparing Multiresolution Algorithms</title>
<p>We performed tests with two versions of our multiresolution matching. The first uses only scale images in all levels based on correlation measures. The second uses the detail images in each level and the scale images at the coarsest level, since at this level there is less detail.</p>
<p>Disparity maps generated by both versions of our multiresolution algorithm are shown in <xref ref-type="fig" rid="f8-sensors-10-01093">Figure 8</xref>. These results were obtained with a correlation window of size 3 and a threshold <italic>δ</italic> = 0.3. Note that borders and edges obtained by the algorithm that uses detail coefficients are sharper and better defined than the ones produced by the other technique, which only uses scale images. Besides that, the overall aspect of the former disparity map is better than that of the latter. <xref ref-type="fig" rid="f9-sensors-10-01093">Figure 9</xref> shows average measures of the errors obtained with several thresholds for both versions, keeping the correlation window at size 3. The minimum in both lines near the origin indicates that the threshold <italic>δ</italic> = 0.3 produced fewer errors. The use of scale images at all levels produces results with fewer errors, which is represented by the bottom lines in both graphs.</p>
<p>With the new fuzzy heuristic, multiresolution matching is likely to start at the lowest level where there is a border adjacent to the pixel under assessment. The correlation of the images at the coarsest depth is, thus, highly prone to errors due to occlusions. Matching the details, instead of the raw images, should, in principle, lead to higher resistance to occlusions. That expectation was not borne out by the error measures in our experiments, as the results obtained by matching the scale images at each level were consistently better than those that employed detail information.</p></sec>
<sec>
<label>6.2.</label>
<title>Comparing Multiresolution and Plain Correlation</title>
<p>Here we contrast plain correlation with the multiresolution algorithm. Disparity maps obtained by both algorithms are shown in <xref ref-type="fig" rid="f10-sensors-10-01093">Figure 10</xref>.</p>
<p>We ran experiments with both approaches for window sizes of 3, 5, 7, 9 and 11. The standard deviation and mean distance of the measured errors for the multiresolution approach with variable depth are shown in <xref ref-type="fig" rid="f11-sensors-10-01093">Figure 11</xref>. The same error measures produced by the technique without search interval are shown in <xref ref-type="fig" rid="f12-sensors-10-01093">Figure 12</xref> for the same window sizes.</p>
<p>We observe that larger windows generate smaller errors in both approaches. Multiresolution incurred smaller errors than plain correlation in most cases, although it made mistakes as often as plain correlation. Plain correlation produces errors distributed over larger areas than our algorithm, although this is hard to see in the disparity figures. Overall, the results indicate that our approach performed better than plain correlation.</p>
<p><xref ref-type="fig" rid="f13-sensors-10-01093">Figure 13</xref> shows a comparison between the matching using the two algorithms (plain correlation and ours, with threshold <italic>δ</italic> = 0.1, 0.2) for the Tsukuba images, while <xref ref-type="fig" rid="f14-sensors-10-01093">Figure 14</xref> shows the same comparison applied to the Corridor images.</p>
<p><xref ref-type="fig" rid="f15-sensors-10-01093">Figure 15</xref> shows results of varying <italic>δ</italic>, with a search interval of 6 pixels wide.</p>
<p>We also tested both algorithms on the Corridor images, and the results are shown in <xref ref-type="fig" rid="f16-sensors-10-01093">Figure 16</xref>. In this case, we imposed a search interval of 10 pixels, refinement intervals of 4 and 6 pixels, and square search windows of side 5, 7, and 11 pixels. We tried several limits (<italic>δ</italic>). <xref ref-type="fig" rid="f17-sensors-10-01093">Figure 17</xref> shows the time necessary for running this experiment. The best matching result is achieved for <italic>δ</italic> = 0.05 and the best times start at <italic>δ</italic> = 0.1. So, one has to weigh precision against time. The result of the matching is still better than plain correlation for <italic>δ</italic> = 0.05, whose error and standard deviation are shown in <xref ref-type="fig" rid="f18-sensors-10-01093">Figure 18</xref>.</p>
<p>The time needed for the matching processes is shown in <xref ref-type="fig" rid="f19-sensors-10-01093">Figure 19</xref> as a function of the threshold (<italic>δ</italic>). Multiresolution matching was consistently faster than plain correlation. It should be remarked that the execution time of our algorithm is much shorter than that of plain correlation, on all thresholds, and it is even faster at small thresholds. Note that smaller correlation windows need less time. One has to weigh precision against available time when deciding the size to be used. Plain correlation errors usually increase a little from <italic>δ</italic> = 0, but they fall to nearly the same or smaller values near <italic>δ</italic> = 0.3, which seems to be an optimum threshold.</p></sec></sec>
<sec sec-type="discussion|conclusions">
<label>7.</label>
<title>Discussion and Conclusions</title>
<p>We have proposed a new multiresolution approach to stereo matching in which the starting level varies with the image content. In a smooth region without edges, for example, our algorithm starts at coarser (deeper) levels in order to improve precision; in regions with edges or rich texture, it starts at finer (lower) levels and thus achieves a shorter execution time. The starting level for each image region is decided by fuzzy logic. The experimental results indicate that this fuzzy decision process is well suited to this calculation.</p>
<p>The ideal value for <italic>δ</italic> depends on the image content and on lighting conditions. This value should, in principle, be tuned automatically or dynamically as a function of the amount of texture, both locally and globally. The amount of texture can be measured using the operators described in [<xref ref-type="bibr" rid="b66-sensors-10-01093">66</xref>, <xref ref-type="bibr" rid="b67-sensors-10-01093">67</xref>], or by calculating the image focus [<xref ref-type="bibr" rid="b68-sensors-10-01093">68</xref>, <xref ref-type="bibr" rid="b69-sensors-10-01093">69</xref>]. Our best results were obtained in the vicinity of <italic>δ</italic> = 0.1, and they are robust in the interval [0.05, 0.3).</p>
<p>The ideal window size also depends on the amount of texture in the original image pair. This parameter can also be estimated using a procedure similar to the one proposed for <italic>δ</italic> [<xref ref-type="bibr" rid="b70-sensors-10-01093">70</xref>].</p>
<p>Initial experiments that used wavelets to calculate the multiresolution pyramid did not give good enough results, owing to the use of the detail coefficients. We then decided to apply sub-band filtering based on a low-pass Gaussian mask and a high-pass Laplacian mask to generate the two multiresolution pyramids: one of images and the other of details. With this approach, stereo matching performed much better, that is, faster and with better precision in the stereo measurements.</p>
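<p>The construction of the two pyramids can be sketched as follows. This is a minimal illustration and not the implementation used in our experiments: the 3 × 3 binomial and Laplacian masks, the replicated borders, the subsampling by two, and the names convolve3x3 and build_pyramids are all assumptions made for the example.</p>
<preformat>
```python
# Sketch only: the 3x3 masks, the border handling and all names below are
# assumptions for illustration, not the masks or code used in the paper.

GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]        # low pass, weights sum to 16
LAPLACE_3X3 = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]  # high pass, weights sum to 0


def convolve3x3(image, mask, scale):
    """Apply a 3x3 mask with replicated borders and divide the result by scale."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to the image border
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += mask[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc / scale
    return out


def build_pyramids(image, levels):
    """Return (images, details); level 0 is the finest, each next level halves the size."""
    images, details = [], []
    current = [[float(v) for v in row] for row in image]
    for _ in range(levels):
        images.append(current)
        # Detail level: high-pass (Laplacian) response of the current image.
        details.append(convolve3x3(current, LAPLACE_3X3, 1.0))
        # Next image level: low-pass (Gaussian-like) smoothing, then subsampling.
        smooth = convolve3x3(current, GAUSS_3X3, 16.0)
        current = [row[::2] for row in smooth[::2]]  # keep every second row and column
    return images, details
```
</preformat>
<p>For an 8 × 8 input and three levels, build_pyramids returns images of 8 × 8, 4 × 4 and 2 × 2 pixels. A constant image yields detail levels that are zero everywhere, because the Laplacian weights sum to zero, whereas a step edge produces nonzero detail values on both sides of the edge.</p>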
<p>The main contribution of this work is the multiresolution approach, which differs from usual methods, as seen above, by using a new fuzzy logic heuristic for calculating the starting level.</p>
<p>Our algorithm was able to generate disparity maps faster than plain correlation, with smaller errors. We conjecture that the use of Gaussian and Laplacian masks reduced even further the errors that occur close to borders. That is, those filters have a smoothing effect in such regions, allowing the algorithm to better treat occlusions.</p>
<p>Recent research on stereo matching based on multi-resolution and fuzzy techniques has been conducted, as discussed in Section 2. However, for real-time stereo matching, as in robot vision, correlation-based algorithms are known to be the best [<xref ref-type="bibr" rid="b71-sensors-10-01093">71</xref>]. Nevertheless, in order to validate our approach against techniques other than plain correlation, we tested two procedures, namely a very fast multi-resolution approach [<xref ref-type="bibr" rid="b17-sensors-10-01093">17</xref>] and one based on fuzzy logic [<xref ref-type="bibr" rid="b41-sensors-10-01093">41</xref>].</p>
<p>In the fast multi-resolution approach [<xref ref-type="bibr" rid="b17-sensors-10-01093">17</xref>], we used 4 levels with images of sizes 96 × 72 and 64 × 48 pixels. Average errors of 30 and 35 pixels were observed, with standard deviations of 65 and 54, respectively. The disparity calculation took 5 and 12 milliseconds, so the technique is very efficient and runs in real time. Despite its efficiency, it has poor precision.</p>
<p>The fuzzy approach by Kumar and Chatterji [<xref ref-type="bibr" rid="b41-sensors-10-01093">41</xref>] also leads to larger errors and longer execution times than our approach. We tested it with a search interval 64 pixels wide and windows of sizes 3, 5, 7, 9 and 11, as reported in <xref ref-type="table" rid="t1-sensors-10-01093">Table 1</xref>. With a 3 × 3 window, this method produces a mean error of 14 pixels with a standard deviation of 19 and an execution time of 21 seconds. With an 11 × 11 window, the error decreases to 7 with a standard deviation of 12, but the execution time increases to 241 seconds. <xref ref-type="fig" rid="f20-sensors-10-01093">Figure 20</xref> shows the disparity maps obtained with this approach (from top to bottom, window sizes of 3, 5, 7, 9 and 11 are shown).</p>
<p>These two techniques are, therefore, outperformed by our proposal when both precision and performance are required.</p></sec></body>
<back>
<ack>
<p>The authors would like to thank the Brazilian funding agency CNPq for the grants awarded to Marcos Medeiros, Luiz Gonçalves and Alejandro C. Frery.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-sensors-10-01093"><label>1.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Horn</surname><given-names>B.K.P.</given-names></name></person-group><source>Robot Vision</source><publisher-name>The MIT Press</publisher-name><publisher-loc>Cambridge, MA, USA</publisher-loc><year>1986</year></citation></ref>
<ref id="b2-sensors-10-01093"><label>2.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Marr</surname><given-names>D.</given-names></name></person-group><source>Vision — A Computational Investigation into the Human Representation and Processing of Visual Information</source><publisher-name>The MIT Press</publisher-name><publisher-loc>Cambridge, MA, USA</publisher-loc><year>1982</year></citation></ref>
<ref id="b3-sensors-10-01093"><label>3.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lotti</surname><given-names>J.L.</given-names></name><name><surname>Giraudon</surname><given-names>G.</given-names></name></person-group><article-title>Correlation algorithm with adaptive window for aerial image in stereo vision</article-title><source>Image and Signal Processing for Remote Sensing</source><person-group person-group-type="editor"><name><surname>Desachy</surname><given-names>J.</given-names></name></person-group><publisher-name>SPIE</publisher-name><publisher-loc>Bellingham, WA, USA</publisher-loc><year>1994</year><volume>2315</volume><fpage>76</fpage><lpage>87</lpage></citation></ref>
<ref id="b4-sensors-10-01093"><label>4.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Hespanha</surname><given-names>J.</given-names></name><name><surname>Dods</surname><given-names>Z.</given-names></name><name><surname>Hagger</surname><given-names>G.</given-names></name><name><surname>Morse</surname><given-names>A.</given-names></name></person-group><article-title>Decidability of robot positioning tasks using stereo vision system</article-title><conf-name>Proceedings of the 37th IEEE Conference on Decision and Control</conf-name><conf-loc>Tampa, FL, USA</conf-loc><conf-date>December 16–18, 1998</conf-date><fpage>1</fpage><lpage>6</lpage></citation></ref>
<ref id="b5-sensors-10-01093"><label>5.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Hubber</surname><given-names>E.</given-names></name><name><surname>Kortenkamp</surname><given-names>D.</given-names></name></person-group><article-title>Using stereo vision to pursue moving agents with a mobile robot</article-title><conf-name>Proceedings of the 1995 IEEE International Conference on Robotics and Automation</conf-name><conf-loc>Nagoya, Japan</conf-loc><conf-date>May 21–27, 1995</conf-date><volume>3</volume><fpage>2340</fpage><lpage>234</lpage></citation></ref>
<ref id="b6-sensors-10-01093"><label>6.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Matsumoto</surname><given-names>Y.</given-names></name><name><surname>Shibata</surname><given-names>T.</given-names></name><name><surname>Sakai</surname><given-names>K.</given-names></name><name><surname>Inaba</surname><given-names>M.</given-names></name><name><surname>Inoue</surname><given-names>H.</given-names></name></person-group><article-title>Real-time color stereo vision system for a mobile robot based on field multiplexing</article-title><conf-name>Proceedings of the 1997 IEEE International Conference on Robotics and Automation</conf-name><conf-loc>Albuquerque, NM, USA</conf-loc><conf-date>April 20–25, 1997</conf-date><volume>3</volume><fpage>1934</fpage><lpage>1939</lpage></citation></ref>
<ref id="b7-sensors-10-01093"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murray</surname><given-names>D.</given-names></name><name><surname>Little</surname><given-names>J.</given-names></name></person-group><article-title>Using real-time stereo vision for mobile robot navigation</article-title><source>Auton. Rob</source><year>2000</year><volume>8</volume><fpage>161</fpage><lpage>171</lpage><pub-id pub-id-type="doi">10.1023/A:1008987612352</pub-id></citation></ref>
<ref id="b8-sensors-10-01093"><label>8.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ballard</surname><given-names>D.H.</given-names></name><name><surname>Brown</surname><given-names>C.M.</given-names></name></person-group><source>Computer Vision</source><publisher-name>Prentice-Hall</publisher-name><publisher-loc>Englewood Cliffs, NJ, USA</publisher-loc><year>1982</year></citation></ref>
<ref id="b9-sensors-10-01093"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marr</surname><given-names>D.</given-names></name><name><surname>Poggio</surname><given-names>T.</given-names></name></person-group><article-title>Cooperative computation of stereo disparity</article-title><source>Science</source><year>1976</year><volume>194</volume><fpage>209</fpage><lpage>236</lpage><pub-id pub-id-type="doi">10.1126/science.959851</pub-id><pub-id pub-id-type="pmid">959851</pub-id></citation></ref>
<ref id="b10-sensors-10-01093"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marr</surname><given-names>D.</given-names></name><name><surname>Poggio</surname><given-names>T.</given-names></name></person-group><article-title>A computational theory of human stereo vision</article-title><source>Proc. Royal Soc. London</source><year>1979</year><volume>204</volume><fpage>301</fpage><lpage>328</lpage><pub-id pub-id-type="doi">10.1098/rspb.1979.0029</pub-id></citation></ref>
<ref id="b11-sensors-10-01093"><label>11.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Trucco</surname><given-names>E.</given-names></name><name><surname>Verri</surname><given-names>A.</given-names></name></person-group><source>Introductory Techniques for 3D Computer Vision</source><publisher-name>Prentice Hall</publisher-name><publisher-loc>Englewood Cliffs, NJ, USA</publisher-loc><year>1998</year></citation></ref>
<ref id="b12-sensors-10-01093"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fleet</surname><given-names>D.J.</given-names></name><name><surname>Wagner</surname><given-names>H.</given-names></name><name><surname>Heeger</surname><given-names>D.J.</given-names></name></person-group><article-title>Neural encoding of binocular disparity: Energy models, position shifts and phase shifts</article-title><source>Vis. Res</source><year>1997</year><volume>36</volume><fpage>1839</fpage><lpage>1857</lpage></citation></ref>
<ref id="b13-sensors-10-01093"><label>13.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Gonçalves</surname><given-names>L.M.G.</given-names></name><name><surname>Oliveira</surname><given-names>A.A.F.</given-names></name></person-group><article-title>Pipeline stereo matching in binary images</article-title><conf-name>Proceedings of the XI International Conference on Computer Graphics and Image Processing (SIBGRAPI’98)</conf-name><conf-loc>Rio de Janeiro, Brazil</conf-loc><conf-date>October 20–23, 1998</conf-date><fpage>426</fpage><lpage>433</lpage></citation></ref>
<ref id="b14-sensors-10-01093"><label>14.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Oliveira</surname><given-names>A.A.F.</given-names></name><name><surname>Gonçalves</surname><given-names>L.M.</given-names></name><name><surname>Matias</surname><given-names>I.O.</given-names></name></person-group><article-title>Enhancing the Volumetric Approach to Stereo Matching</article-title><conf-name>Proceedings of the 14th Brazilian Symposium on Computer Graphics and Image Processing</conf-name><conf-loc>Florianopolis, Brazil</conf-loc><conf-date>October 15–18, 2001</conf-date><fpage>218</fpage><lpage>225</lpage></citation></ref>
<ref id="b15-sensors-10-01093"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Theimer</surname><given-names>W.M.</given-names></name><name><surname>Mallot</surname><given-names>H.A.</given-names></name></person-group><article-title>Phase-based binocular vergence control and depth reconstruction using active vision</article-title><source>Comput. Vis. Graph., Image Process.: Image Underst</source><year>1994</year><volume>60</volume><fpage>343</fpage><lpage>358</lpage><pub-id pub-id-type="doi">10.1006/cviu.1994.1067</pub-id></citation></ref>
<ref id="b16-sensors-10-01093"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zitnick</surname><given-names>C.L.</given-names></name><name><surname>Kanade</surname><given-names>T.</given-names></name></person-group><article-title>A cooperative algorithm for stereo matching and occlusion detection</article-title><source>Trans. Pattern Anal. Mach. Intell</source><year>2000</year><volume>22</volume><fpage>675</fpage><lpage>684</lpage><pub-id pub-id-type="doi">10.1109/34.865184</pub-id></citation></ref>
<ref id="b17-sensors-10-01093"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gonçalves</surname><given-names>L.M.G.</given-names></name><name><surname>Oliveira</surname><given-names>A.A.</given-names></name><name><surname>Grupen</surname><given-names>R.A.</given-names></name><name><surname>Wheeler</surname><given-names>D.</given-names></name><name><surname>Fagg</surname><given-names>A.</given-names></name></person-group><article-title>Tracing patterns and attention: humanoid robot cognition</article-title><source>IEEE Intell. Syst. Their Appl</source><year>2000</year><volume>15</volume><fpage>70</fpage><lpage>77</lpage><pub-id pub-id-type="doi">10.1109/5254.867915</pub-id></citation></ref>
<ref id="b18-sensors-10-01093"><label>18.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Nishihara</surname><given-names>K.</given-names></name></person-group><source>Practical Real-Time Stereo Matcher</source><comment>AI lab technical report;</comment><publisher-name>Massachusetts Institute of Technology</publisher-name><publisher-loc>Cambridge, MA, USA</publisher-loc><year>1984</year></citation></ref>
<ref id="b19-sensors-10-01093"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohta</surname><given-names>Y.</given-names></name><name><surname>Kanade</surname><given-names>T.</given-names></name></person-group><article-title>Stereo by Intra and inter-scanline searching using dynamic programming</article-title><source>Trans. Pattern Anal. Mach. Intell</source><year>1985</year><comment>PAMI-7,</comment><fpage>139</fpage><lpage>154</lpage></citation></ref>
<ref id="b20-sensors-10-01093"><label>20.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ullman</surname><given-names>S.</given-names></name></person-group><source>High-level Vision: Object Recognition and Visual Cognition</source><publisher-name>The MIT Press</publisher-name><publisher-loc>Cambridge, MA, USA</publisher-loc><year>1996</year></citation></ref>
<ref id="b21-sensors-10-01093"><label>21.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Segundo</surname><given-names>S.S.</given-names></name><name><surname>Bezerra</surname><given-names>J.P.</given-names></name><name><surname>Silveira</surname><given-names>R.W.</given-names></name><name><surname>Gonçalves</surname><given-names>L.M.G.</given-names></name></person-group><article-title>Development of a multiresolution stereo vision system in real time</article-title><conf-name>Proceedings of the IEEE Latin America Robotics Symposium (LARS2005)</conf-name><conf-loc>Sao Luis, Brazil</conf-loc><conf-date>September 21–23, 2005</conf-date><fpage>1</fpage><lpage>8</lpage></citation></ref>
<ref id="b22-sensors-10-01093"><label>22.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Hirschmüller</surname><given-names>H.</given-names></name></person-group><article-title>Improvements in real-time correlation-based stereo vision</article-title><conf-name>Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision</conf-name><conf-loc>Kauai, HI, USA</conf-loc><conf-date>December 9–10 2001</conf-date><fpage>141</fpage><lpage>148</lpage></citation></ref>
<ref id="b23-sensors-10-01093"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>C.</given-names></name></person-group><article-title>Fast stereo matching using rectangular subregioning and 3d maximum-surface techniques</article-title><source>Int. J. Comput. Vis</source><year>2002</year><volume>47</volume><fpage>99</fpage><lpage>117</lpage><pub-id pub-id-type="doi">10.1023/A:1014585622703</pub-id></citation></ref>
<ref id="b24-sensors-10-01093"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Witkin</surname><given-names>A.P.</given-names></name></person-group><article-title>Scale-space filtering</article-title><conf-name>Proceedings of the 8th International Joint Conference on Artificial Intelligence</conf-name><conf-loc>Karlsruhe, Germany</conf-loc><conf-date>August 7–12, 1983</conf-date><volume>1</volume><fpage>1019</fpage><lpage>1022</lpage></citation></ref>
<ref id="b25-sensors-10-01093"><label>25.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lindeberg</surname><given-names>T.</given-names></name></person-group><source>Scale-Space Theory in Computer Vision</source><publisher-name>Kluwer Academic Publishers</publisher-name><publisher-loc>Norwell, MA, USA</publisher-loc><year>1994</year></citation></ref>
<ref id="b26-sensors-10-01093"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burt</surname><given-names>P.</given-names></name><name><surname>Adelson</surname><given-names>T.</given-names></name></person-group><article-title>The Laplacian pyramid as a compact image code</article-title><source>IEEE Trans. Commun</source><year>1983</year><volume>9</volume><fpage>532</fpage><lpage>540</lpage></citation></ref>
<ref id="b27-sensors-10-01093"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uhr</surname><given-names>L.</given-names></name></person-group><article-title>Layered ‘recognition cone’ networks that preprocess, classify and describe</article-title><source>IEEE Trans. Comput</source><year>1972</year><comment>C-21,</comment><fpage>758</fpage><lpage>768</lpage></citation></ref>
<ref id="b28-sensors-10-01093"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itti</surname><given-names>L.</given-names></name><name><surname>Koch</surname><given-names>C.</given-names></name><name><surname>Niebur</surname><given-names>E.</given-names></name></person-group><article-title>A model of saliency-based visual attention for rapid scene analysis</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>1998</year><volume>20</volume><fpage>1254</fpage><lpage>1259</lpage><pub-id pub-id-type="doi">10.1109/34.730558</pub-id></citation></ref>
<ref id="b29-sensors-10-01093"><label>29.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Sandon</surname><given-names>P.A.</given-names></name></person-group><article-title>Logarithmic search in a winner-take-all network</article-title><conf-name>Proceedings of the IEEE International Joint Conference on Neural Networks</conf-name><conf-loc>Singapore</conf-loc><conf-date>November 18–21, 1991</conf-date><fpage>454</fpage><lpage>459</lpage></citation></ref>
<ref id="b30-sensors-10-01093"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daubechies</surname><given-names>I.</given-names></name></person-group><article-title>Orthonormal bases of compactly supported wavelets</article-title><source>Commun. Pure Appl. Math</source><year>1988</year><volume>41</volume><fpage>909</fpage><lpage>996</lpage><pub-id pub-id-type="doi">10.1002/cpa.3160410705</pub-id></citation></ref>
<ref id="b31-sensors-10-01093"><label>31.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Mallat</surname><given-names>S.</given-names></name></person-group><article-title>Wavelets for a vision</article-title><source>Proc. IEEE</source><year>1996</year><volume>84</volume><fpage>604</fpage><lpage>614</lpage></citation></ref>
<ref id="b32-sensors-10-01093"><label>32.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Iocchi</surname><given-names>L.</given-names></name><name><surname>Konolige</surname><given-names>K.</given-names></name></person-group><article-title>A multiresolution stereo vision for mobile robots</article-title><conf-name>Proceedings of AI*IA’98 Workshop on New Trends in Robotics</conf-name><conf-loc>Padua, Italy</conf-loc><conf-date>October 1998</conf-date></citation></ref>
<ref id="b33-sensors-10-01093"><label>33.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Magarey</surname><given-names>J.</given-names></name><name><surname>Dick</surname><given-names>A.</given-names></name></person-group><article-title>Multiresolution stereo image matching using complex wavelets</article-title><conf-name>Proceedings of International Conference on Pattern Recognition</conf-name><conf-loc>Brisbane, Australia</conf-loc><conf-date>August 17–20, 1998</conf-date><volume>1</volume><fpage>4</fpage><lpage>7</lpage></citation></ref>
<ref id="b34-sensors-10-01093"><label>34.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Pan</surname><given-names>H.P.</given-names></name></person-group><article-title>General stereo image matching using symmetric complex wavelets</article-title><source>Wavelets Applications in Signal and Image Processing</source><publisher-name>SPIE</publisher-name><publisher-loc>Bellingham, WA, USA</publisher-loc><year>1996</year><volume>2825</volume><fpage>697</fpage><lpage>720</lpage></citation></ref>
<ref id="b35-sensors-10-01093"><label>35.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Gonçalves</surname><given-names>L.M.G.</given-names></name><name><surname>Grupen</surname><given-names>R.A.</given-names></name></person-group><article-title>Towards a real-time framework for visual monitoring tasks</article-title><conf-name>Proceedings of the 3rd IEEE International Workshop on Visual Surveillance</conf-name><conf-loc>Dublin, Ireland</conf-loc><conf-date>July 1, 2000</conf-date><publisher-name>IEEE Computer Society Press</publisher-name><publisher-loc>Los Alamitos, CA, USA</publisher-loc><year>2000</year><fpage>47</fpage><lpage>55</lpage></citation></ref>
<ref id="b36-sensors-10-01093"><label>36.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Piater</surname><given-names>J.</given-names></name><name><surname>Ramamritham</surname><given-names>K.</given-names></name><name><surname>Grupen</surname><given-names>R.A.</given-names></name></person-group><article-title>Learning real-time stereo vergence control</article-title><conf-name>Proceedings of the IEEE International Symposium on Intelligent Control/Intelligent Systems and Semiotics</conf-name><conf-loc>Cambridge, MA, USA</conf-loc><conf-date>September 15–17, 1999</conf-date><fpage>272</fpage><lpage>277</lpage></citation></ref>
<ref id="b37-sensors-10-01093"><label>37.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hoff</surname><given-names>W.</given-names></name><name><surname>Ahuja</surname><given-names>N.</given-names></name></person-group><article-title>Surfaces from stereo: integrating feature matching, disparity estimation, and contour detection</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>1989</year><volume>11</volume><fpage>121</fpage><lpage>136</lpage><pub-id pub-id-type="doi">10.1109/34.16709</pub-id></citation></ref>
<ref id="b38-sensors-10-01093"><label>38.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Udupa</surname><given-names>J.K.</given-names></name><name><surname>Grevera</surname><given-names>G.J.</given-names></name></person-group><article-title>Go digital, go fuzzy</article-title><source>Pattern Recogn. Lett</source><year>2002</year><volume>23</volume><fpage>743</fpage><lpage>754</lpage><pub-id pub-id-type="doi">10.1016/S0167-8655(01)00149-0</pub-id></citation></ref>
<ref id="b39-sensors-10-01093"><label>39.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bigand</surname><given-names>A.</given-names></name><name><surname>Bouwmans</surname><given-names>T.</given-names></name><name><surname>Dubus</surname><given-names>J.P.</given-names></name></person-group><article-title>A new stereomatching algorithm based on linear features and the fuzzy integral</article-title><source>Pattern Recogn. Lett</source><year>2001</year><volume>22</volume><fpage>133</fpage><lpage>146</lpage><pub-id pub-id-type="doi">10.1016/S0167-8655(00)00105-7</pub-id></citation></ref>
<ref id="b40-sensors-10-01093"><label>40.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>D.H.</given-names></name><name><surname>Choi</surname><given-names>W.Y.</given-names></name><name><surname>Park</surname><given-names>R.H.</given-names></name></person-group><article-title>Stereo matching technique based on the theory of possibility</article-title><source>Pattern Recogn. Lett</source><year>1992</year><volume>13</volume><fpage>735</fpage><lpage>744</lpage><pub-id pub-id-type="doi">10.1016/0167-8655(92)90103-7</pub-id></citation></ref>
<ref id="b41-sensors-10-01093"><label>41.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname><given-names>S.S.</given-names></name><name><surname>Chatterji</surname><given-names>B.N.</given-names></name></person-group><article-title>Stereo matching algorithms based on fuzzy approach</article-title><source>Int. J. Pattern Recogn. Artif. Intell</source><year>2002</year><volume>16</volume><fpage>883</fpage><lpage>899</lpage><pub-id pub-id-type="doi">10.1142/S0218001402002040</pub-id></citation></ref>
<ref id="b42-sensors-10-01093"><label>42.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pajares</surname><given-names>G.</given-names></name><name><surname>de la Cruz</surname><given-names>J.M.</given-names></name></person-group><article-title>A new learning strategy for stereo matching derived from a fuzzy clustering method</article-title><source>Fuzzy Sets Syst</source><year>2000</year><volume>110</volume><fpage>413</fpage><lpage>427</lpage><pub-id pub-id-type="doi">10.1016/S0165-0114(97)00382-5</pub-id></citation></ref>
<ref id="b43-sensors-10-01093"><label>43.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pajares</surname><given-names>G.</given-names></name><name><surname>de la Cruz</surname><given-names>J.M.</given-names></name></person-group><article-title>Fuzzy cognitive maps for stereovision matching</article-title><source>Pattern Recogn</source><year>2006</year><volume>39</volume><fpage>2101</fpage><lpage>2114</lpage><pub-id pub-id-type="doi">10.1016/j.patcog.2006.04.003</pub-id></citation></ref>
<ref id="b44-sensors-10-01093"><label>44.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sainarayanan</surname><given-names>G.</given-names></name><name><surname>Nagarajan</surname><given-names>R.</given-names></name><name><surname>Yaacob</surname><given-names>S.</given-names></name></person-group><article-title>Fuzzy image processing scheme for autonomous navigation of human blind</article-title><source>Appl. Soft Comput</source><year>2007</year><volume>7</volume><fpage>257</fpage><lpage>264</lpage><pub-id pub-id-type="doi">10.1016/j.asoc.2005.06.005</pub-id></citation></ref>
<ref id="b45-sensors-10-01093"><label>45.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shamir</surname><given-names>L.</given-names></name></person-group><article-title>A proposed stereo matching algorithm for noisy sets of color images</article-title><source>Comput. Geosc</source><year>2007</year><volume>33</volume><fpage>1052</fpage><lpage>1063</lpage><pub-id pub-id-type="doi">10.1016/j.cageo.2006.11.013</pub-id></citation></ref>
<ref id="b46-sensors-10-01093"><label>46.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tolt</surname><given-names>G.</given-names></name><name><surname>Kalaykov</surname><given-names>I.</given-names></name></person-group><article-title>Measures based on fuzzy similarity for stereo matching of color images</article-title><source>Soft Comput</source><year>2006</year><volume>10</volume><fpage>1117</fpage><lpage>1126</lpage><pub-id pub-id-type="doi">10.1007/s00500-005-0034-6</pub-id></citation></ref>
<ref id="b47-sensors-10-01093"><label>47.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doulamis</surname><given-names>N.D.</given-names></name><name><surname>Doulamis</surname><given-names>A.D.</given-names></name><name><surname>Avrithis</surname><given-names>Y.S.</given-names></name><name><surname>Ntalianis</surname><given-names>K.S.</given-names></name><name><surname>Kollias</surname><given-names>S.D.</given-names></name></person-group><article-title>Efficient summarization of stereoscopic video sequences</article-title><source>IEEE Trans. Circ. Syst. Video Technol</source><year>2000</year><volume>10</volume><fpage>501</fpage><lpage>517</lpage><pub-id pub-id-type="doi">10.1109/76.844996</pub-id></citation></ref>
<ref id="b48-sensors-10-01093"><label>48.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCane</surname><given-names>B.</given-names></name><name><surname>Caelli</surname><given-names>T.</given-names></name><name><surname>DeVel</surname><given-names>O.</given-names></name></person-group><article-title>Learning to recognize 3D objects using sparse depth and intensity information</article-title><source>Int. J. Pattern Recogn. Artif. Intell</source><year>1997</year><volume>11</volume><fpage>909</fpage><lpage>931</lpage><pub-id pub-id-type="doi">10.1142/S021800149700041X</pub-id></citation></ref>
<ref id="b49-sensors-10-01093"><label>49.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagata</surname><given-names>T.</given-names></name><name><surname>Zha</surname><given-names>H.B.</given-names></name></person-group><article-title>Recognizing and locating a known object from multiple images</article-title><source>IEEE Trans. Rob. Autom</source><year>1991</year><volume>7</volume><fpage>434</fpage><lpage>448</lpage><pub-id pub-id-type="doi">10.1109/70.86075</pub-id></citation></ref>
<ref id="b50-sensors-10-01093"><label>50.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagarajan</surname><given-names>R.</given-names></name><name><surname>Sainarayanan</surname><given-names>G.</given-names></name><name><surname>Yaacob</surname><given-names>S.</given-names></name><name><surname>Porle</surname><given-names>R.R.</given-names></name></person-group><article-title>Fuzzy-rule-based object identification methodology for NAVI system</article-title><source>EURASIP J. Appl. Signal Process</source><year>2005</year><volume>2005</volume><fpage>2260</fpage><lpage>2267</lpage><pub-id pub-id-type="doi">10.1155/ASP.2005.2260</pub-id></citation></ref>
<ref id="b51-sensors-10-01093"><label>51.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chatterji</surname><given-names>G.J.</given-names></name><name><surname>Chatterji</surname><given-names>B.N.</given-names></name></person-group><article-title>Fuzzy compactness based adaptive window approach for image matching in stereo vision</article-title><source>Neural Inf. Process</source><year>2004</year><volume>3316</volume><fpage>935</fpage><lpage>940</lpage></citation></ref>
<ref id="b52-sensors-10-01093"><label>52.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aja-Fernandez</surname><given-names>S.</given-names></name><name><surname>Alberola-Lopez</surname><given-names>C.</given-names></name><name><surname>Ruiz-Alzola</surname><given-names>J.</given-names></name></person-group><article-title>A fuzzy-controlled Kalman filter applied to stereo-visual tracking schemes</article-title><source>Signal Process</source><year>2003</year><volume>83</volume><fpage>101</fpage><lpage>120</lpage><pub-id pub-id-type="doi">10.1016/S0165-1684(02)00381-X</pub-id></citation></ref>
<ref id="b53-sensors-10-01093"><label>53.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chow</surname><given-names>Y.H.</given-names></name><name><surname>Chung</surname><given-names>R.</given-names></name></person-group><article-title>VisionBug: A hexapod robot controlled by stereo cameras</article-title><source>Auton. Rob</source><year>2002</year><volume>13</volume><fpage>259</fpage><lpage>276</lpage><pub-id pub-id-type="doi">10.1023/A:1020520209488</pub-id></citation></ref>
<ref id="b54-sensors-10-01093"><label>54.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marichal</surname><given-names>G.N.</given-names></name><name><surname>Toledo</surname><given-names>J.</given-names></name><name><surname>Acosta</surname><given-names>L.</given-names></name><name><surname>Gonzalez</surname><given-names>E.J.</given-names></name><name><surname>Coll</surname><given-names>G.</given-names></name></person-group><article-title>A neuro-fuzzy method applied to the motors of a stereovision system</article-title><source>Eng. Appl. Artif. Intell</source><year>2007</year><volume>20</volume><fpage>951</fpage><lpage>958</lpage><pub-id pub-id-type="doi">10.1016/j.engappai.2006.12.010</pub-id></citation></ref>
<ref id="b55-sensors-10-01093"><label>55.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Medeiros</surname><given-names>M.D.</given-names></name><name><surname>Gonçalves</surname><given-names>L.M.</given-names></name></person-group><article-title>A fuzzy approach to stereo vision using pyramidal images with different starting level</article-title><conf-name>Proceedings of International Joint Conference on Neural Networks</conf-name><conf-loc>Orlando, FL, USA</conf-loc><conf-date>August 12–17, 2007</conf-date><fpage>2914</fpage><lpage>2919</lpage></citation></ref>
<ref id="b56-sensors-10-01093"><label>56.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Tsotsos</surname><given-names>J.K.</given-names></name></person-group><article-title>A complexity level analysis of vision</article-title><conf-name>Proceedings of International Conference on Computer Vision: Human and Machine Vision Workshop</conf-name><conf-loc>London, UK</conf-loc><conf-date>June 1987</conf-date><volume>1</volume></citation></ref>
<ref id="b57-sensors-10-01093"><label>57.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsotsos</surname><given-names>J.K.</given-names></name></person-group><article-title>Knowledge organization and its role in representation and interpretation for time-varying data: the ALVEN system</article-title><source>Comput. Intell</source><year>1985</year><volume>1</volume><fpage>16</fpage><lpage>32</lpage><pub-id pub-id-type="doi">10.1111/j.1467-8640.1985.tb00056.x</pub-id></citation></ref>
<ref id="b58-sensors-10-01093"><label>58.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burt</surname><given-names>P.</given-names></name></person-group><article-title>Smart sensing within a pyramid vision machine</article-title><source>Proc. IEEE</source><year>1988</year><volume>76</volume><fpage>1006</fpage><lpage>1015</lpage><pub-id pub-id-type="doi">10.1109/5.5971</pub-id></citation></ref>
<ref id="b59-sensors-10-01093"><label>59.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sandon</surname><given-names>P.</given-names></name></person-group><article-title>Simulating visual attention</article-title><source>J. Cogn. Neurosci</source><year>1990</year><volume>2</volume><fpage>213</fpage><lpage>231</lpage><pub-id pub-id-type="doi">10.1162/jocn.1990.2.3.213</pub-id></citation></ref>
<ref id="b60-sensors-10-01093"><label>60.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsotsos</surname><given-names>J.</given-names></name><name><surname>Culhane</surname><given-names>S.</given-names></name><name><surname>Wai</surname><given-names>W.</given-names></name><name><surname>Lai</surname><given-names>Y.</given-names></name><name><surname>Davis</surname><given-names>N.</given-names></name><name><surname>Nuflo</surname><given-names>F.</given-names></name></person-group><article-title>Modeling visual attention via selective tuning</article-title><source>Artif. Intell</source><year>1995</year><volume>78</volume><fpage>507</fpage><lpage>547</lpage><pub-id pub-id-type="doi">10.1016/0004-3702(95)00025-9</pub-id></citation></ref>
<ref id="b61-sensors-10-01093"><label>61.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lindeberg</surname><given-names>T.</given-names></name></person-group><article-title>Feature detection with automatic scale selection</article-title><source>Int. J. Comput. Vis</source><year>1998</year><volume>30</volume><fpage>77</fpage><lpage>116</lpage></citation></ref>
<ref id="b62-sensors-10-01093"><label>62.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lowe</surname><given-names>D.G.</given-names></name></person-group><article-title>Distinctive image features from scale-invariant keypoints</article-title><source>Int. J. Comput. Vis</source><year>2004</year><volume>60</volume><fpage>91</fpage><lpage>110</lpage><pub-id pub-id-type="doi">10.1023/B:VISI.0000029664.99615.94</pub-id></citation></ref>
<ref id="b63-sensors-10-01093"><label>63.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Mallat</surname><given-names>S.</given-names></name></person-group><source>A Wavelet Tour of Signal Processing</source><edition>2nd ed</edition><publisher-name>Academic</publisher-name><publisher-loc>San Diego, CA, USA</publisher-loc><year>1999</year></citation></ref>
<ref id="b64-sensors-10-01093"><label>64.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gonzalez</surname><given-names>R.C.</given-names></name><name><surname>Woods</surname><given-names>R.E.</given-names></name></person-group><source>Digital Image Processing</source><publisher-name>Addison-Wesley Publishing Company</publisher-name><publisher-loc>Reading, MA, USA</publisher-loc><year>1992</year></citation></ref>
<ref id="b65-sensors-10-01093"><label>65.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Moon</surname><given-names>P.</given-names></name><name><surname>de Jager</surname><given-names>G.</given-names></name></person-group><article-title>An investigation into the applicability of the wavelet transform to digital stereo matching</article-title><conf-name>Proceedings 1993 IEEE South African Symposium on Communications and Signal Processing</conf-name><conf-loc>Johannesburg, South Africa</conf-loc><conf-date>August 6, 1993</conf-date><fpage>75</fpage><lpage>79</lpage></citation></ref>
<ref id="b66-sensors-10-01093"><label>66.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karu</surname><given-names>K.</given-names></name><name><surname>Jain</surname><given-names>A.K.</given-names></name><name><surname>Bolle</surname><given-names>R.M.</given-names></name></person-group><article-title>Is there any texture in the image?</article-title><source>Pattern Recogn</source><year>1996</year><volume>29</volume><fpage>1437</fpage><lpage>1446</lpage><pub-id pub-id-type="doi">10.1016/0031-3203(96)00004-0</pub-id></citation></ref>
<ref id="b67-sensors-10-01093"><label>67.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malik</surname><given-names>J.</given-names></name><name><surname>Belongie</surname><given-names>S.</given-names></name><name><surname>Leung</surname><given-names>T.</given-names></name><name><surname>Shi</surname><given-names>J.</given-names></name></person-group><article-title>Contour and texture analysis for image segmentation</article-title><source>Int. J. Comput. Vis</source><year>2001</year><volume>43</volume><fpage>7</fpage><lpage>27</lpage><pub-id pub-id-type="doi">10.1023/A:1011174803800</pub-id></citation></ref>
<ref id="b68-sensors-10-01093"><label>68.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Ahmad</surname><given-names>M.B.</given-names></name></person-group><article-title>Focus measure operator using 3D gradient</article-title><conf-name>Proceedings International Conference on Machine Vision</conf-name><conf-loc>Islamabad, Pakistan</conf-loc><conf-date>December 28–29, 2007</conf-date><fpage>18</fpage><lpage>22</lpage></citation></ref>
<ref id="b69-sensors-10-01093"><label>69.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Favaro</surname><given-names>P.</given-names></name><name><surname>Soatto</surname><given-names>S.</given-names></name></person-group><article-title>A geometric approach to shape from defocus</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>2005</year><volume>27</volume><fpage>406</fpage><lpage>417</lpage><pub-id pub-id-type="doi">10.1109/TPAMI.2005.43</pub-id><pub-id pub-id-type="pmid">15747795</pub-id></citation></ref>
<ref id="b70-sensors-10-01093"><label>70.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Woo</surname><given-names>D.M.</given-names></name><name><surname>Schultz</surname><given-names>H.</given-names></name><name><surname>Riseman</surname><given-names>E.</given-names></name><name><surname>Hanson</surname><given-names>A.</given-names></name></person-group><article-title>Performance of correlation-based stereo algorithm with respect to the change of the window size</article-title><conf-name>Proceedings 5th Pacific Rim Conference on Multimedia</conf-name><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2005</year><volume>3332</volume><fpage>778</fpage><lpage>785</lpage></citation></ref>
<ref id="b71-sensors-10-01093"><label>71.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yoon</surname><given-names>S.</given-names></name><name><surname>Park</surname><given-names>S.K.</given-names></name><name><surname>Kang</surname><given-names>S.</given-names></name><name><surname>Kwak</surname><given-names>Y.K.</given-names></name></person-group><article-title>Fast correlation-based stereo matching with the reduction of systematic errors</article-title><source>Pattern Recogn. Lett</source><year>2005</year><volume>26</volume><fpage>2221</fpage><lpage>2231</lpage><pub-id pub-id-type="doi">10.1016/j.patrec.2005.03.037</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Table</title>
<fig id="f1-sensors-10-01093" position="float">
<label>Figure 1.</label>
<caption>
<p>Creation of a pyramid with wavelet transform.</p></caption>
<graphic xlink:href="sensors-10-01093f1.gif"/></fig>
<fig id="f2-sensors-10-01093" position="float">
<label>Figure 2.</label>
<caption>
<p>Illustration of the creation of a pyramid with three levels.</p></caption>
<graphic xlink:href="sensors-10-01093f2.gif"/></fig>
<fig id="f3-sensors-10-01093" position="float">
<label>Figure 3.</label>
<caption>
<p>Cartoon image and <italic>δ</italic> map.</p></caption>
<graphic xlink:href="sensors-10-01093f3.gif"/></fig>
<fig id="f4-sensors-10-01093" position="float">
<label>Figure 4.</label>
<caption>
<p>Scheme of the software architecture.</p></caption>
<graphic xlink:href="sensors-10-01093f4.gif"/></fig>
<fig id="f5-sensors-10-01093" position="float">
<label>Figure 5.</label>
<caption>
<p>Computed pyramids. Left to right: original image, Daubechies wavelet levels, and levels computed by our proposal.</p></caption>
<graphic xlink:href="sensors-10-01093f5.gif"/></fig>
<fig id="f6-sensors-10-01093" position="float">
<label>Figure 6.</label>
<caption>
<p>Tsukuba data set. From left to right: left image, right image, desired disparity map.</p></caption>
<graphic xlink:href="sensors-10-01093f6.gif"/></fig>
<fig id="f7-sensors-10-01093" position="float">
<label>Figure 7.</label>
<caption>
<p>Corridor data set. From left to right: left image, right image, desired disparity map.</p></caption>
<graphic xlink:href="sensors-10-01093f7.gif"/></fig>
<fig id="f8-sensors-10-01093" position="float">
<label>Figure 8.</label>
<caption>
<p>Disparity maps generated by multiresolution matching using the detail images at the coarsest level (left) and always using the scale images (right).</p></caption>
<graphic xlink:href="sensors-10-01093f8.gif"/></fig>
<fig id="f9-sensors-10-01093" position="float">
<label>Figure 9.</label>
<caption>
<p>Errors measured with both algorithms: mean distance <italic>d</italic> (left) and standard deviation <italic>s</italic> (right).</p></caption>
<graphic xlink:href="sensors-10-01093f9.gif"/></fig>
<fig id="f10-sensors-10-01093" position="float">
<label>Figure 10.</label>
<caption>
<p>Disparities obtained by plain correlation (right) and multiresolution (left) with correlation windows of size 3 (top) and 5 (bottom) pixels, using <italic>δ</italic> = 0.3.</p></caption>
<graphic xlink:href="sensors-10-01093f10.gif"/></fig>
<fig id="f11-sensors-10-01093" position="float">
<label>Figure 11.</label>
<caption>
<p>Measured errors for multiresolution with variable depth: Tsukuba pair.</p></caption>
<graphic xlink:href="sensors-10-01093f11.gif"/></fig>
<fig id="f12-sensors-10-01093" position="float">
<label>Figure 12.</label>
<caption>
<p>Measured errors for plain correlation with no search interval: Tsukuba pair.</p></caption>
<graphic xlink:href="sensors-10-01093f12.gif"/></fig>
<fig id="f13-sensors-10-01093" position="float">
<label>Figure 13.</label>
<caption>
<p>Visual comparison for the Tsukuba data set between disparity maps generated by correlation (right column) and multiresolution matching with <italic>δ</italic> ∈ {0.1, 0.2} (middle and left columns, respectively), using windows of size 3, 5, 9 (top, middle and bottom rows, respectively) and a 4-pixel search interval.</p></caption>
<graphic xlink:href="sensors-10-01093f13.gif"/></fig>
<fig id="f14-sensors-10-01093" position="float">
<label>Figure 14.</label>
<caption>
<p>Visual comparison for the Corridor images between disparity maps generated by correlation (right column) and multiresolution matching with <italic>δ</italic> ∈ {0.1, 0.2} (middle and left columns, respectively), using windows of size 5, 9, 13 (top, middle and bottom rows, respectively) and a 10-pixel search interval.</p></caption>
<graphic xlink:href="sensors-10-01093f14.gif"/></fig>
<fig id="f15-sensors-10-01093" position="float">
<label>Figure 15.</label>
<caption>
<p>Disparity maps generated by multiresolution matching with <italic>δ</italic> ∈ {0, 0.2, 0.3, 0.4} (columns from left to right) and windows of size 3, 5, 7 (rows from top to bottom), using a 6-pixel search interval.</p></caption>
<graphic xlink:href="sensors-10-01093f15.gif"/></fig>
<fig id="f16-sensors-10-01093" position="float">
<label>Figure 16.</label>
<caption>
<p>Disparity maps for the Corridor pair generated by correlation (right column) and multiresolution matching with <italic>δ</italic> ∈ {0, 0.1, 0.2} (columns from left to right), windows of size 5, 7, 11 (from top to bottom), refinement windows of 4 pixels.</p></caption>
<graphic xlink:href="sensors-10-01093f16.gif"/></fig>
<fig id="f17-sensors-10-01093" position="float">
<label>Figure 17.</label>
<caption>
<p>Time needed for computing the disparity by our approach in the Corridor pair.</p></caption>
<graphic xlink:href="sensors-10-01093f17.gif"/></fig>
<fig id="f18-sensors-10-01093" position="float">
<label>Figure 18.</label>
<caption>
<p>Error and standard deviation for the Corridor images.</p></caption>
<graphic xlink:href="sensors-10-01093f18.gif"/></fig>
<fig id="f19-sensors-10-01093" position="float">
<label>Figure 19.</label>
<caption>
<p>Required time.</p></caption>
<graphic xlink:href="sensors-10-01093f19.gif"/></fig>
<fig id="f20-sensors-10-01093" position="float">
<label>Figure 20.</label>
<caption>
<p>Disparity maps generated by the Kumar and Chatterji algorithm for windows of size 3, 5, 7, 9, and 11 (from top to bottom).</p></caption>
<graphic xlink:href="sensors-10-01093f20.gif"/></fig>
<table-wrap id="t1-sensors-10-01093" position="float">
<label>Table 1.</label>
<caption>
<p>Performance measures of Kumar and Chatterji’s algorithm as a function of the window size.</p></caption>
<table frame="hsides" rules="none">
<thead>
<tr>
<th align="right" valign="middle">Window Size</th>
<th align="right" valign="middle">Mean Error</th>
<th align="right" valign="middle">Standard Deviation</th>
<th align="right" valign="middle">Execution Time</th></tr>
<tr>
<th colspan="4" align="right" valign="middle">
<hr/></th></tr></thead>
<tbody>
<tr>
<td align="right" valign="middle">3</td>
<td align="right" valign="middle">14.12</td>
<td align="right" valign="middle">19.74</td>
<td align="right" valign="middle">21.00</td></tr>
<tr>
<td align="right" valign="middle">5</td>
<td align="right" valign="middle">10.66</td>
<td align="right" valign="middle">16.04</td>
<td align="right" valign="middle">53.31</td></tr>
<tr>
<td align="right" valign="middle">7</td>
<td align="right" valign="middle">8.92</td>
<td align="right" valign="middle">13.96</td>
<td align="right" valign="middle">98.48</td></tr>
<tr>
<td align="right" valign="middle">9</td>
<td align="right" valign="middle">8.09</td>
<td align="right" valign="middle">13.07</td>
<td align="right" valign="middle">161.06</td></tr>
<tr>
<td align="right" valign="middle">11</td>
<td align="right" valign="middle">7.61</td>
<td align="right" valign="middle">12.56</td>
<td align="right" valign="middle">241.15</td></tr></tbody></table></table-wrap></sec></back></article>
