<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/s120912661</article-id>
<article-id pub-id-type="publisher-id">sensors-12-12661</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Multispectral Image Feature Points</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Aguilera</surname><given-names>Cristhian</given-names></name><xref ref-type="aff" rid="af1-sensors-12-12661"><sup>1</sup></xref><xref ref-type="corresp" rid="c1-sensors-12-12661"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Barrera</surname><given-names>Fernando</given-names></name><xref ref-type="aff" rid="af2-sensors-12-12661"><sup>2</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Lumbreras</surname><given-names>Felipe</given-names></name><xref ref-type="aff" rid="af2-sensors-12-12661"><sup>2</sup></xref><xref ref-type="aff" rid="af3-sensors-12-12661"><sup>3</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Sappa</surname><given-names>Angel D.</given-names></name><xref ref-type="aff" rid="af2-sensors-12-12661"><sup>2</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Toledo</surname><given-names>Ricardo</given-names></name><xref ref-type="aff" rid="af2-sensors-12-12661"><sup>2</sup></xref><xref ref-type="aff" rid="af3-sensors-12-12661"><sup>3</sup></xref></contrib></contrib-group>
<aff id="af1-sensors-12-12661">
<label>1</label> Department of Electrical and Electronics Engineering, Collao 1202, University of Bío-Bío, 4051381 Concepción, Chile</aff>
<aff id="af2-sensors-12-12661">
<label>2</label> Computer Vision Center, Edifici O, Campus UAB, 08193 Bellaterra, Barcelona, Spain; E-Mails: <email>jfbarrera@cvc.uab.es</email> (F.B.); <email>felipe@cvc.uab.es</email> (F.L.); <email>angel.sappa@cvc.uab.es</email> (A.D.S.); <email>ricardo@cvc.uab.es</email> (R.T.)</aff>
<aff id="af3-sensors-12-12661">
<label>3</label> Computer Science Department, Edifici O, Campus UAB, 08193 Bellaterra, Barcelona, Spain</aff>
<author-notes>
<corresp id="c1-sensors-12-12661">
<label>*</label> Author to whom correspondence should be addressed; E-Mail: <email>cristhia@ubiobio.cl</email>; Tel.: +56-41-311-1287; Fax: +56-41-311-1013.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>09</month>
<year>2012</year></pub-date>
<volume>12</volume>
<issue>9</issue>
<fpage>12661</fpage>
<lpage>12672</lpage>
<history>
<date date-type="received">
<day>06</day>
<month>07</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>24</day>
<month>08</month>
<year>2012</year></date>
<date date-type="accepted">
<day>30</day>
<month>08</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>This paper presents a novel feature point descriptor for the multispectral image case: Far-Infrared and Visible Spectrum images. It allows matching interest points on images of the same scene but acquired in different spectral bands. Initially, points of interest are detected on both images through a SIFT-like scale-space representation. Then, these points are characterized using an Edge Oriented Histogram (EOH) descriptor. Finally, points of interest from multispectral images are matched by finding nearest couples using the information from the descriptor. The experimental results and comparisons with similar methods show both the validity of the proposed approach and the improvements it offers with respect to the current state-of-the-art.</p></abstract>
<kwd-group>
<kwd>multispectral image descriptor</kwd>
<kwd>color and infrared images</kwd>
<kwd>feature point descriptor</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>The analysis of multispectral or multiband imaging has recently attracted the attention of the research community for applications in the areas of image and video processing (e.g., [<xref ref-type="bibr" rid="b1-sensors-12-12661">1</xref>–<xref ref-type="bibr" rid="b4-sensors-12-12661">4</xref>]). The information provided by the images from different spectral bands, either directly or supplementary to the information provided by the visible spectrum, can help tackle different problems in an efficient way. However, the processing of images from spectral bands outside the visible spectrum requires the development of new tools, or the adaptation of current ones, opening new challenges for the image and video processing community. Multispectral analysis has been widely studied in the remote sensing field, where satellite images are commonly registered and fused. Recently, due to advances in the technology, multispectral imaging is being used in applications such as video surveillance or driver assistance (e.g., [<xref ref-type="bibr" rid="b5-sensors-12-12661">5</xref>–<xref ref-type="bibr" rid="b7-sensors-12-12661">7</xref>]), to mention just a few, where visible (VS: 0.4–0.7 μm) images are merged with Near-Infrared (NIR: 0.75–1.4 μm), Short-Wave Infrared (SWIR: 1.4–3 μm), Mid-Wave Infrared (MWIR: 3–8 μm) or Long-Wave Infrared (LWIR: 8–15 μm) ones.</p>
<p>Feature points are at the base of different computer vision problems and are generally studied at three levels: feature point detection, description and matching. One of the most successful and widely used approaches is the SIFT algorithm [<xref ref-type="bibr" rid="b8-sensors-12-12661">8</xref>]. Over the last decade several variations of the original algorithm, as well as some novel approaches, have been proposed focusing on the perceived weaknesses of SIFT (e.g., [<xref ref-type="bibr" rid="b9-sensors-12-12661">9</xref>,<xref ref-type="bibr" rid="b10-sensors-12-12661">10</xref>]). All the approaches mentioned before are intended for applications that involve images from the same spectral band, generally Visible Spectrum (VS) images. Recently, applications that combine feature points from different spectral band images are being developed. These works are mainly based on the use of the classical SIFT algorithm, or minor modifications to the classical approach. For instance, [<xref ref-type="bibr" rid="b11-sensors-12-12661">11</xref>] proposes a scale restriction criterion in order to reduce the number of incorrect matches of SIFT when it is adopted to tackle the multispectral case. Accurate matching results have been reported when the spectral bands of the pair of images are close to each other (VS-NIR); however, further improvements are needed for tackling those cases where the spectral bands are far away from each other (VS-LWIR). Actually, recent studies have shown that as the spectral bands move away from the visible spectrum, classical feature descriptors generally used for matching and registration of images in the visible spectrum (e.g., SIFT, SURF, <italic>etc.</italic>) become useless (e.g., [<xref ref-type="bibr" rid="b12-sensors-12-12661">12</xref>,<xref ref-type="bibr" rid="b13-sensors-12-12661">13</xref>]).</p>
<p>The difficulty in finding correspondences between feature points from VS-LWIR images results from the nonlinear relationship between pixel intensities. Variations in LWIR intensities are related to variations in the temperature of the objects, while variations in VS intensities come from object color and light reflections. Therefore, this nonlinear relationship results in a lack of correlation between their respective gradients. Furthermore, LWIR images appear smoother, with loss of detail and texture [<xref ref-type="bibr" rid="b14-sensors-12-12661">14</xref>], so that the detection of corners, as candidates for local descriptor points, is also poorly favored. In conclusion, most of the image processing tools that use descriptors based on pixel gradients need to be adapted, or otherwise they become useless. <xref ref-type="fig" rid="f1-sensors-12-12661">Figure 1</xref> shows a pair of images of the same scene, obtained with a camera working in the visible spectrum and a camera working in the infrared spectrum; their corresponding histograms are provided in the right column, showing the lack of contrast in the infrared image as well as how some details are missed. In addition to the lack of contrast and missed details, it can be observed that the intensities in the infrared image are transformed very differently from those in the visible spectrum image. This fact becomes critical when defining the method that best describes this kind of image, since the pixel intensities in the infrared spectrum are nonlinearly related to, or uncorrelated with, the corresponding visible ones. Additionally, it should be noticed that LWIR images contain other kinds of information that are not present in the visible ones. This is an important characteristic when there is not enough illumination in the given scene, for instance when driving a car at night (e.g., [<xref ref-type="bibr" rid="b6-sensors-12-12661">6</xref>,<xref ref-type="bibr" rid="b15-sensors-12-12661">15</xref>]).</p>
<p>Of the three levels mentioned above (feature point detection, description and matching), the feature point descriptor becomes <italic>the key element</italic> when images from the VS and LWIR spectra are considered. Even though the percentage of correct matches can be improved by introducing modifications at the matching stage, results remain very poor when SIFT, or modifications of it (e.g., [<xref ref-type="bibr" rid="b11-sensors-12-12661">11</xref>,<xref ref-type="bibr" rid="b16-sensors-12-12661">16</xref>]), are used as descriptors in the multispectral case (VS-LWIR), as will be presented in the Experimental Results section. This low correspondence rate is mainly due to the lack of descriptive capability of the gradient in LWIR images, which in general appear smoother, with loss of detail and texture. Actually, even in the cases where the detected feature points correspond to the same position in both images, the matching results remain quite poor due to the differences in their gradient orientation, which is used as a descriptor by SIFT. In other words, since the descriptors are by nature different, it is not correct to use them for finding similarities and matches.</p></sec>
<sec>
<label>2.</label>
<title>Proposed Approach</title>
<p>The proposed scheme builds on a scale-space pyramid like the one used by SIFT. Invariant features are also used, but the feature vector is modified so that it incorporates spatial information from the contours around each keypoint without using gradient information. This allows us to generate a correlated parameter space in both the VS and LWIR images. Our proposal uses a descriptor based on the edge histogram. This edge orientation histogram describes the shapes and contours from LWIR images while preserving their invariance in the scale-space. <xref ref-type="fig" rid="f2-sensors-12-12661">Figure 2</xref> presents a flow chart of the proposed method. It consists of three steps, namely detection, description and matching, as detailed next.</p>
<sec>
<label>2.1.</label>
<title>Feature Point Detection</title>
<p>Feature points, which will also be referred to as keypoints, are detected by using a scale-space pyramidal representation. They correspond to the maxima and minima of a difference of Gaussians applied over a series of smoothed and resampled images [<xref ref-type="bibr" rid="b8-sensors-12-12661">8</xref>]. The result from this first stage is a set of stable keypoints, similar to those resulting from classical SIFT. Note that this result is invariant to scale, position and orientation, which are needed for registration applications. In the current implementation potential keypoints are obtained by setting the SIFT-like detector with the following parameters: sigma = 1.2 and threshold = 40. The same parameter setting is used in both multispectral images, resulting in a set of <bold>P</bold><sub>VS</sub> and <bold>P</bold><sub>LWIR</sub> keypoints. Each keypoint is denoted by a vector (<italic>x<sub>i</sub>, y<sub>i</sub>, σ<sub>i</sub></italic>), where (<italic>x<sub>i</sub>, y<sub>i</sub></italic>) corresponds to the location and (<italic>σ<sub>i</sub></italic>) is the scale of the pyramidal representation at which the keypoint appears.</p></sec>
<sec>
<label>2.2.</label>
<title>Feature Point Description</title>
<p>Detected feature points are described through the use of an Edge-Oriented-Histogram, which is the main contribution of the current work and will be referred to henceforth as EOH. These EOHs incorporate spatial information from the contours in the neighbourhood of each feature point. They describe the shapes and contours from both VS and LWIR images. The idea behind the proposed descriptor is motivated by the nonlinear relationships between image intensities, so the descriptor should be mainly based on region information instead of pixel information. Hence, the proposed descriptor is based on the use of histograms of contour orientations in the neighbourhood of the given keypoints. Initially, both images (VS and LWIR) are represented by means of their edges, which are extracted using the Canny edge detector algorithm [<xref ref-type="bibr" rid="b17-sensors-12-12661">17</xref>]. In all cases Canny's thresholds have been automatically set relative to the highest value of the gradient magnitude of the image, with σ = 4. Once both images are represented by means of their edges, the feature points detected in Section 2.1 are described as illustrated in <xref ref-type="fig" rid="f3-sensors-12-12661">Figure 3</xref>. The different steps of the feature point descriptor are detailed below.</p>
<p>Firstly, a region of N × N pixels, centered at the given keypoint, is obtained. Then, this region is split up into 4 × 4 = 16 subregions. Finally, each one of these subregions is represented by a histogram of contours computed following the Edge Histogram Descriptor (EHD) [<xref ref-type="bibr" rid="b18-sensors-12-12661">18</xref>] of the MPEG-7 standard [<xref ref-type="bibr" rid="b19-sensors-12-12661">19</xref>]. This histogram represents the spatial distribution of four directional edges and one non-directional edge (five bins in total); these bins correspond to contour orientations of 0, 45, 90, and 135 degrees; additionally a bin with no orientation (n.o.) is considered, which corresponds to those areas that do not contain a contour. Every pixel of each subregion contributes to a bin of the histogram according to the five filters, of 3 × 3 pixels, shown in <xref ref-type="fig" rid="f3-sensors-12-12661">Figure 3</xref>. The filter with the largest value is used as a criterion to vote in the corresponding bin. After processing all the elements of the 16 subregions a vector with 80 elements is obtained (16 × 5 = 80). This vector is normalized (by dividing each one of its components by the Euclidean norm) and used as the vector of characteristics, which will be also referred to as the descriptor vector for every keypoint.</p>
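<p>The descriptor construction described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the input is assumed to be a binary edge map (for instance the output of a Canny detector) stored as a list of rows; the four 3 × 3 directional masks are hypothetical line masks standing in for the filters of Figure 3, whose coefficients are not listed in the text; and pixels that do not lie on a contour are assumed to vote for the n.o. bin. The names <italic>MASKS</italic>, <italic>vote</italic> and <italic>eoh_descriptor</italic> are introduced here for illustration only.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

# Hypothetical 3 x 3 line masks for 0, 45, 90 and 135 degrees (assumed
# stand-ins; the coefficients of the filters in Figure 3 are not given here).
MASKS = (
    ((-1, -1, -1), (2, 2, 2), (-1, -1, -1)),   # 0 degrees (horizontal)
    ((-1, -1, 2), (-1, 2, -1), (2, -1, -1)),   # 45 degrees
    ((-1, 2, -1), (-1, 2, -1), (-1, 2, -1)),   # 90 degrees (vertical)
    ((2, -1, -1), (-1, 2, -1), (-1, -1, 2)),   # 135 degrees
)
NO_ORIENTATION = 4  # index of the n.o. bin


def vote(edges, r, c):
    """Return the bin (0..4) that pixel (r, c) of a binary edge map votes for."""
    if not edges[r][c]:
        return NO_ORIENTATION  # assumption: non-contour pixels vote n.o.
    best_bin, best_resp = NO_ORIENTATION, 0
    for idx, mask in enumerate(MASKS):
        resp = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(edges) and 0 <= cc < len(edges[0]):
                    resp += mask[dr + 1][dc + 1] * edges[rr][cc]
        if resp > best_resp:  # the mask with the largest response wins
            best_bin, best_resp = idx, resp
    return best_bin


def eoh_descriptor(edges, row, col, n):
    """Normalized 80-element descriptor of the n x n window centred at (row, col)."""
    half, cell = n // 2, n / 4.0
    hist = [0.0] * 80
    for i in range(n):
        for j in range(n):
            r, c = row - half + i, col - half + j
            if 0 <= r < len(edges) and 0 <= c < len(edges[0]):
                sub = int(i // cell) * 4 + int(j // cell)  # subregion 0..15
                hist[sub * 5 + vote(edges, r, c)] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))  # Euclidean norm
    return [v / norm for v in hist] if norm > 0 else hist
```
]]></preformat>
<p>In this sketch the 80 elements are ordered subregion by subregion, with five bins (0, 45, 90, 135 degrees and n.o.) per subregion; window pixels falling outside the image are simply skipped, which is a simplification chosen for the example.</p>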
<p>The selection of the right window size (N) for every keypoint is an important factor of the proposed scheme. A window that is too small or too large will increase the number of wrong keypoint matches. <xref ref-type="table" rid="t1-sensors-12-12661">Table 1</xref> presents the performance of the proposed approach when windows of different sizes are used to compute the vector of characteristics mentioned above; these values represent the average obtained with the whole data set, which contains 100 pairs of multispectral images. It can be seen that the best performance is obtained when windows with sizes between 80 × 80 and 100 × 100 are used.</p></sec>
<sec>
<label>2.3.</label>
<title>Feature Point Matching</title>
<p>This stage finds, in the descriptor space, the nearest keypoints from the two spectral images, after filtering out feature vectors with poorly descriptive elements. This process is based on the Euclidean distance between the corresponding descriptor vectors. As in the SIFT algorithm, in order to increase the matching robustness, two keypoints are matched only if the ratio between the distances to the first and second best matches is smaller than a given threshold. Additionally, the matching robustness is increased by discarding those keypoints that have some of their subregions without information (<italic>i.e.</italic>, subregions only containing a few contours); finally, the scale restriction proposed in [<xref ref-type="bibr" rid="b11-sensors-12-12661">11</xref>], and used in [<xref ref-type="bibr" rid="b16-sensors-12-12661">16</xref>], is also considered in the current work to improve the performance of the proposed approach. This scale restriction process consists of discarding incorrect matches using the scale difference (<italic>SD</italic>) of the given pair of keypoints <italic>P<sub>VS</sub>(x<sub>VS</sub>,y<sub>VS</sub>,σ<sub>VS</sub>)</italic> and <italic>P<sub>LWIR</sub>(x<sub>LWIR</sub>,y<sub>LWIR</sub>,σ<sub>LWIR</sub>)</italic>:
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mi>S</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mtext mathvariant="italic">LWIR</mml:mtext></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mi>S</mml:mi></mml:mrow></mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mtext mathvariant="italic">LWIR</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>As proposed in [<xref ref-type="bibr" rid="b11-sensors-12-12661">11</xref>], the match is rejected if it does not satisfy the following scale restriction criterion:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi>b</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>where the values <italic>a</italic> and <italic>b</italic> are obtained by first computing a histogram of the <italic>SDs</italic> of all matches; then, the peak in that <italic>SDs</italic> histogram, which is denoted as 
<inline-formula>
<mml:math id="mm3" display="inline">
<mml:semantics id="sm3">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi></mml:mrow>
<mml:mo>¯</mml:mo></mml:mover></mml:mrow></mml:semantics></mml:math></inline-formula>, is extracted and used to define the sought values:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>=</mml:mo>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi></mml:mrow>
<mml:mo>¯</mml:mo></mml:mover>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="italic">0.9</mml:mn></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>b</mml:mi>
<mml:mo>=</mml:mo>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>D</mml:mi></mml:mrow>
<mml:mo>¯</mml:mo></mml:mover>
<mml:mo>+</mml:mo>
<mml:mn mathvariant="italic">0.9</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
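<p>The matching stage described above (nearest neighbour search with a ratio test, followed by the scale restriction of Equations (1)–(3)) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the ratio threshold of 0.8 and the histogram bin width of 0.5 are hypothetical values chosen for the example, since neither is specified in the text; only the margin of 0.9 comes from Equation (3). The function names are also introduced here for illustration only, and the filtering of keypoints with empty subregions is omitted.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ratio_matches(desc_vs, desc_lwir, ratio=0.8):
    """Nearest-neighbour matches that pass the first/second-best ratio test.

    ratio=0.8 is an assumed value; the text only mentions "a given threshold".
    """
    matches = []
    for i, d in enumerate(desc_vs):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_lwir))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


def scale_restriction(matches, sigma_vs, sigma_lwir, bin_width=0.5, margin=0.9):
    """Keep the matches whose scale difference SD satisfies a < SD < b."""
    if not matches:
        return []
    # Equation (1): SD = sigma_VS - sigma_LWIR for every candidate match.
    sds = [sigma_vs[i] - sigma_lwir[j] for i, j in matches]
    # Histogram of SDs; the centre of the fullest bin is taken as the peak.
    # bin_width is a hypothetical choice made for this sketch.
    counts = Counter(math.floor(sd / bin_width) for sd in sds)
    peak_bin = max(counts, key=lambda k: (counts[k], -k))
    peak = (peak_bin + 0.5) * bin_width
    # Equation (3): a = peak - 0.9 and b = peak + 0.9.
    a, b = peak - margin, peak + margin
    # Equation (2): reject every match outside the open interval (a, b).
    return [m for m, sd in zip(matches, sds) if a < sd < b]
```
]]></preformat>
<p>A typical use of this sketch would call <italic>ratio_matches</italic> on the two lists of descriptor vectors and then pass the resulting index pairs, together with the keypoint scales, to <italic>scale_restriction</italic>.</p>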
<p><xref ref-type="fig" rid="f4-sensors-12-12661">Figure 4</xref> shows the different steps of the proposed algorithm for a pair of keypoints from VS and LWIR images.</p></sec></sec>
<sec sec-type="results">
<label>3.</label>
<title>Experimental Results</title>
<p>The proposed approach has been evaluated with a data set containing 100 pairs of VS-LWIR images. These images have been obtained using the cameras detailed in <xref ref-type="table" rid="t2-sensors-12-12661">Table 2</xref>.</p>
<p>In order to make the comparisons easier, in all the cases the images were rectified and aligned so that matches should be found in horizontal lines. This rectification and alignment process is just applied to facilitate the evaluation of the performance of the proposed approach as well as for further comparisons with other algorithms. The data set contains outdoor images of different urban scenarios. <xref ref-type="fig" rid="f5-sensors-12-12661">Figure 5</xref> shows three pairs of multispectral images contained in the data set.</p>
<p>This data set is available through our website (<ext-link xlink:href="http://www.cvc.uab.es/adas/projects/simeve/" ext-link-type="uri">http://www.cvc.uab.es/adas/projects/simeve/</ext-link>) for evaluation and comparison with other multispectral feature point detectors and descriptors. The performance of the proposed approach is evaluated with a Precision and Recall scheme. Furthermore, results are compared with other implementations. <xref ref-type="table" rid="t3-sensors-12-12661">Table 3</xref> shows average results, computed over the whole data set, obtained with different descriptors; we can appreciate that the proposed approach has the best performance when compared with all the other methods.</p>
<p><xref ref-type="fig" rid="f6-sensors-12-12661">Figure 6</xref> just illustrates the results obtained with only two pairs of VS-LWIR images using both SIFT and the proposed EOH based descriptors (quantitative evaluation over the whole data set is presented in <xref ref-type="table" rid="t3-sensors-12-12661">Table 3</xref>). Note that since these pairs correspond to rectified images the matching should correspond to keypoints lying in the same row; in other words the segments that connect keypoints should be horizontal lines.</p>
<p>It can be appreciated that in the first case (top illustration) SIFT correctly matches only two keypoints out of a total of 34, which represents about 5% success. In contrast, when the proposed EOH based approach is considered, 36 keypoints, from a total of 51, are correctly matched (about 68% success). In the second case (bottom illustration), the classical SIFT algorithm correctly matches 32 keypoints from a total of 69 (46% success), while the proposed approach reaches 90% success (28 keypoints are correctly matched from a total of 31).</p>
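<p>Since the image pairs are rectified, the row criterion mentioned above gives a simple way to check matches. The following minimal sketch computes the fraction of matches whose two keypoints lie on nearly the same row. It is only an illustration under stated assumptions: keypoints are assumed to be stored as (x, y, σ) tuples, the tolerance of 2 pixels is a hypothetical value, and the function name is introduced here for illustration; it is not the evaluation code used to produce the reported results.</p>
<preformat preformat-type="code"><![CDATA[
```python
def row_consistent_fraction(matches, kp_vs, kp_lwir, tol=2.0):
    """Fraction of matches whose keypoints lie on (nearly) the same row.

    matches : list of (i, j) index pairs into kp_vs and kp_lwir
    kp_vs, kp_lwir : lists of (x, y, sigma) keypoints
    tol : assumed row tolerance in pixels (hypothetical value)
    """
    if not matches:
        return 0.0
    ok = sum(1 for i, j in matches if abs(kp_vs[i][1] - kp_lwir[j][1]) <= tol)
    return ok / len(matches)
```
]]></preformat>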
<p>Finally, although out of the scope of the current work, <xref ref-type="fig" rid="f7-sensors-12-12661">Figure 7</xref> shows how the proposed approach can also be used as a feature point descriptor when images from the same spectral band are considered. In these illustrations it can be seen that the proposed approach reaches 86.2% success when a pair of VS-VS images is considered and 84.6% success in the LWIR-LWIR case. Note that these two cases are just presented as illustrations; more rigorous evaluations and comparisons are needed, over a large set of image pairs, to obtain more solid conclusions.</p></sec>
<sec sec-type="conclusions">
<label>4.</label>
<title>Conclusions</title>
<p>A novel multispectral feature descriptor method is presented, which is useful to register VS-LWIR images. It is based on the use of a SIFT-like detector to extract feature points. Then, an Edge Oriented Histogram (EOH) based approach is proposed as a robust descriptor for characterizing multispectral keypoints. Finally, matches are obtained by finding nearest couples in the feature description space. The proposed approach has been evaluated with a large set of multispectral pairs of images and compared with the current state-of-the-art. It is also shown that the proposed approach could be used when pairs of images from the same spectral band are considered. Future work will focus on studying the use of more elaborate edge descriptor techniques in order to improve the matching percentages.</p></sec></body>
<back>
<ack>
<p>This work was supported in part by MECESUP2 Postdoctoral program and Regular Project 100710 3/R from the University of Bío-Bío, Chile; and by the Spanish Government under Research Program Consolider Ingenio 2010: MIPRCV (CSD2007-00018) and Projects TIN2011-29494-C03-02 and TIN2011-25606.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-sensors-12-12661"><label>1.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Mahmudul</surname><given-names>H.</given-names></name><name><surname>Xiuping</surname><given-names>J.</given-names></name><name><surname>Robles</surname><given-names>A.</given-names></name></person-group><article-title>Multi-Spectral Remote Sensing Image Registration via Spatial Relationship Analysis on SIFT Keypoints</article-title><conf-name>Proceedings of IEEE International Geoscience &amp; Remote Sensing Symposium</conf-name><conf-loc>Honolulu, HI, USA</conf-loc><conf-date>25–30 July 2010</conf-date><fpage>1011</fpage><lpage>1014</lpage></citation></ref>
<ref id="b2-sensors-12-12661"><label>2.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Brown</surname><given-names>M.</given-names></name><name><surname>Su</surname><given-names>S.</given-names></name></person-group><article-title>Multi-Spectral SIFT for Scene Category Recognition</article-title><conf-name>Proceedings of IEEE Conference on Computer Vision and Pattern Recognition</conf-name><conf-loc>Colorado Springs, CO, USA</conf-loc><conf-date>20–25 June 2011</conf-date><fpage>177</fpage><lpage>184</lpage></citation></ref>
<ref id="b3-sensors-12-12661"><label>3.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Abdelrahman</surname><given-names>M.</given-names></name><name><surname>Ali</surname><given-names>A.</given-names></name><name><surname>Aly</surname><given-names>A.</given-names></name><name><surname>Farag</surname><given-names>A.</given-names></name></person-group><article-title>Precise Change Detection In Multi-Spectral Remote Sensing Imagery Using SIFT-based Registration</article-title><conf-name>Proceedings of IEEE International Conference on Multimedia Technology</conf-name><conf-loc>Hangzhou, China</conf-loc><conf-date>26–28 July 2011</conf-date><fpage>6238</fpage><lpage>6242</lpage></citation></ref>
<ref id="b4-sensors-12-12661"><label>4.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Lalonde</surname><given-names>M.</given-names></name><name><surname>Byrns</surname><given-names>D.</given-names></name><name><surname>Gagnon</surname><given-names>L.</given-names></name><name><surname>Teasdale</surname><given-names>N.</given-names></name><name><surname>Laurendeau</surname><given-names>D.</given-names></name></person-group><article-title>Real-Time Eye Blink Detection with GPU-Based SIFT Tracking</article-title><conf-name>Proceedings of Fourth Canadian Conference on Computer and Robot Vision</conf-name><conf-loc>Montreal, QC, Canada</conf-loc><conf-date>28–30 May 2007</conf-date><fpage>481</fpage><lpage>487</lpage></citation></ref>
<ref id="b5-sensors-12-12661"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leykin</surname><given-names>A.</given-names></name><name><surname>Hammoud</surname><given-names>R.</given-names></name></person-group><article-title>Pedestrian tracking by fusion of thermal-visible surveillance videos</article-title><source>Mach. Vis. Appl.</source><year>2010</year><volume>21</volume><fpage>587</fpage><lpage>595</lpage><pub-id pub-id-type="doi">10.1007/s00138-008-0176-5</pub-id></citation></ref>
<ref id="b6-sensors-12-12661"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krotosky</surname><given-names>S.</given-names></name><name><surname>Trivedi</surname><given-names>M.</given-names></name></person-group><article-title>On color-, infrared-, and multimodal-stereo approaches to pedestrian detection</article-title><source>IEEE Trans. Intell. Transp. Syst.</source><year>2007</year><volume>8</volume><fpage>619</fpage><lpage>629</lpage><pub-id pub-id-type="doi">10.1109/TITS.2007.908722</pub-id></citation></ref>
<ref id="b7-sensors-12-12661"><label>7.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Jung</surname><given-names>S.</given-names></name><name><surname>Eledath</surname><given-names>J.</given-names></name><name><surname>Johansson</surname><given-names>S.</given-names></name><name><surname>Mathevon</surname><given-names>V.</given-names></name></person-group><article-title>Egomotion Estimation in Monocular Infra-Red Image Sequence for Night Vision Applications</article-title><conf-name>Proceedings of IEEE Workshop on Applications of Computer Vision</conf-name><conf-loc>Austin, TX, USA</conf-loc><conf-date>20–21 February 2007</conf-date><fpage>8</fpage></citation></ref>
<ref id="b8-sensors-12-12661"><label>8.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Lowe</surname><given-names>D.G.</given-names></name></person-group><article-title>Local Feature View Clustering for 3D Object Recognition</article-title><conf-name>Proceedings of IEEE Conference on Computer Vision and Pattern Recognition</conf-name><conf-loc>Kauai, HI, USA</conf-loc><conf-date>8–14 December 2001</conf-date><fpage>682</fpage><lpage>688</lpage></citation></ref>
<ref id="b9-sensors-12-12661"><label>9.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Lazebnik</surname><given-names>S.</given-names></name><name><surname>Schmid</surname><given-names>C.</given-names></name><name><surname>Ponce</surname><given-names>J.</given-names></name></person-group><article-title>Semi-Local Affine Parts for Object Recognition</article-title><conf-name>Proceedings of British Machine Vision Conference</conf-name><conf-loc>London, UK</conf-loc><conf-date>7–9 September 2004</conf-date><fpage>779</fpage><lpage>788</lpage></citation></ref>
<ref id="b10-sensors-12-12661"><label>10.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bay</surname><given-names>H.</given-names></name><name><surname>Tuytelaars</surname><given-names>T.</given-names></name><name><surname>Van Gool</surname><given-names>L.</given-names></name></person-group><article-title>SURF: Speeded Up Robust Features</article-title><conf-name>Proceedings of European Conference on Computer Vision</conf-name><conf-loc>Graz, Austria</conf-loc><conf-date>7–13 May 2006</conf-date><fpage>404</fpage><lpage>417</lpage></citation></ref>
<ref id="b11-sensors-12-12661"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yi</surname><given-names>Z.</given-names></name><name><surname>Zhiguo</surname><given-names>C.</given-names></name><name><surname>Yang</surname><given-names>X.</given-names></name></person-group><article-title>Multi-spectral remote image registration based on SIFT</article-title><source>Electron. Lett.</source><year>2008</year><volume>44</volume><fpage>107</fpage><lpage>108</lpage><pub-id pub-id-type="doi">10.1049/el:20082477</pub-id></citation></ref>
<ref id="b12-sensors-12-12661"><label>12.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Istenic</surname><given-names>R.</given-names></name><name><surname>Heric</surname><given-names>D.</given-names></name><name><surname>Ribaric</surname><given-names>S.</given-names></name><name><surname>Zazula</surname><given-names>D.</given-names></name></person-group><article-title>Thermal and Visual Image Registration in Hough Parameter Space</article-title><conf-name>Proceedings of the 14th International Workshop on Systems, Signals and Image Processing and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services</conf-name><conf-loc>Maribor, Slovenia</conf-loc><conf-date>27–30 June 2007</conf-date><fpage>106</fpage><lpage>109</lpage></citation></ref>
<ref id="b13-sensors-12-12661"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>J.</given-names></name><name><surname>Kim</surname><given-names>Y.</given-names></name><name><surname>Lee</surname><given-names>D.</given-names></name><name><surname>Kang</surname><given-names>D.</given-names></name><name><surname>Ra</surname><given-names>J.</given-names></name></person-group><article-title>Robust CCD and IR image registration using gradient-based statistical information</article-title><source>IEEE Signal Process. Lett.</source><year>2010</year><volume>17</volume><fpage>347</fpage><lpage>350</lpage><pub-id pub-id-type="doi">10.1109/LSP.2010.2040928</pub-id></citation></ref>
<ref id="b14-sensors-12-12661"><label>14.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>L.</given-names></name><name><surname>Wu</surname><given-names>B.</given-names></name><name><surname>Nevatia</surname><given-names>R.</given-names></name></person-group><article-title>Pedestrian Detection in Infrared Images Based on Local Shape Features</article-title><conf-name>Proceedings of IEEE Conference on Computer Vision and Pattern Recognition</conf-name><conf-loc>Minneapolis, MN, USA</conf-loc><conf-date>18–23 June 2007</conf-date><fpage>1</fpage><lpage>8</lpage></citation></ref>
<ref id="b15-sensors-12-12661"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trivedi</surname><given-names>M.</given-names></name><name><surname>Cheng</surname><given-names>S.</given-names></name><name><surname>Childers</surname><given-names>E.</given-names></name><name><surname>Krotosky</surname><given-names>S.</given-names></name></person-group><article-title>Occupant posture analysis with stereo and thermal infrared video: Algorithms and experimental evaluation</article-title><source>IEEE Trans. Veh. Technol.</source><year>2004</year><volume>53</volume><fpage>1698</fpage><lpage>1712</lpage><pub-id pub-id-type="doi">10.1109/TVT.2004.835526</pub-id></citation></ref>
<ref id="b16-sensors-12-12661"><label>16.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Vural</surname><given-names>M.</given-names></name><name><surname>Yardimci</surname><given-names>Y.</given-names></name><name><surname>Temizel</surname><given-names>A.</given-names></name></person-group><article-title>Registration of Multispectral Satellite Images with Orientation-Restricted SIFT</article-title><conf-name>Proceedings of IEEE International Symposium on Geoscience and Remote Sensing</conf-name><conf-loc>Cape Town, South Africa</conf-loc><conf-date>12–17 July 2009</conf-date><fpage>243</fpage><lpage>246</lpage></citation></ref>
<ref id="b17-sensors-12-12661"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Canny</surname><given-names>J.</given-names></name></person-group><article-title>A computational approach to edge detection</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell.</source><year>1986</year><volume>8</volume><fpage>679</fpage><lpage>698</lpage><pub-id pub-id-type="doi">10.1109/TPAMI.1986.4767851</pub-id><pub-id pub-id-type="pmid">21869365</pub-id></citation></ref>
<ref id="b18-sensors-12-12661"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manjunath</surname><given-names>B.</given-names></name><name><surname>Ohm</surname><given-names>J.</given-names></name><name><surname>Vasudevan</surname><given-names>V.</given-names></name><name><surname>Yamada</surname><given-names>A.</given-names></name></person-group><article-title>Color and texture descriptors</article-title><source>IEEE Trans. Circuits Syst. Video Technol.</source><year>2001</year><volume>11</volume><fpage>703</fpage><lpage>715</lpage><pub-id pub-id-type="doi">10.1109/76.927424</pub-id></citation></ref>
<ref id="b19-sensors-12-12661"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sikora</surname><given-names>T.</given-names></name></person-group><article-title>The MPEG-7 visual standard for content description—an overview</article-title><source>IEEE Trans. Circuits Syst. Video Technol.</source><year>2001</year><volume>11</volume><fpage>696</fpage><lpage>702</lpage><pub-id pub-id-type="doi">10.1109/76.927422</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-sensors-12-12661" position="float">
<label>Figure 1.</label>
<caption>
<p>(<bold>a</bold>) LWIR image; (<bold>b</bold>) Histogram of LWIR image; (<bold>c</bold>) VS image; (<bold>d</bold>) Histogram of VS image.</p></caption>
<graphic xlink:href="sensors-12-12661f1.gif"/></fig>
<fig id="f2-sensors-12-12661" position="float">
<label>Figure 2.</label>
<caption>
<p>Flowchart of the proposed approach.</p></caption>
<graphic xlink:href="sensors-12-12661f2.gif"/></fig>
<fig id="f3-sensors-12-12661" position="float">
<label>Figure 3.</label>
<caption>
<p>Proposed EOH based keypoint image descriptor for VS-LWIR images.</p></caption>
<graphic xlink:href="sensors-12-12661f3.gif"/></fig>
<fig id="f4-sensors-12-12661" position="float">
<label>Figure 4.</label>
<caption>
<p>Steps of the proposed VS-LWIR keypoint description method: (<bold>a</bold>) and (<bold>b</bold>) keypoints on VS and LWIR images together with their corresponding neighborhoods; (<bold>c</bold>) and (<bold>d</bold>) images of contours of the given keypoints; (<bold>e</bold>) and (<bold>f</bold>) histograms used as descriptor vectors.</p></caption>
<graphic xlink:href="sensors-12-12661f4a.gif"/>
<graphic xlink:href="sensors-12-12661f4b.gif"/></fig>
<fig id="f5-sensors-12-12661" position="float">
<label>Figure 5.</label>
<caption>
<p>Illustrations of some of the VS-LWIR image pairs from the evaluated data set, which contains 100 pairs.</p></caption>
<graphic xlink:href="sensors-12-12661f5a.gif"/>
<graphic xlink:href="sensors-12-12661f5b.gif"/></fig>
<fig id="f6-sensors-12-12661" position="float">
<label>Figure 6.</label>
<caption>
<p>Matched keypoints (green segments) in two image pairs (the whole data set contains 100 pairs of VS-LWIR images) using: (<bold>top</bold>) SIFT descriptor; (<bold>bottom</bold>) proposed EOH based descriptor.</p></caption>
<graphic xlink:href="sensors-12-12661f6.gif"/></fig>
<fig id="f7-sensors-12-12661" position="float">
<label>Figure 7.</label>
<caption>
<p>Illustrations of the results obtained with the proposed approach when images from the same spectral band are considered: (<bold>top</bold>) VS-VS; (<bold>bottom</bold>) LWIR-LWIR.</p></caption>
<graphic xlink:href="sensors-12-12661f7.gif"/></fig>
<table-wrap id="t1-sensors-12-12661" position="float">
<label>Table 1.</label>
<caption>
<p>Average percentage of correct matches over the whole data set for different window sizes; images of 408 × 506 pixels.</p></caption>
<table frame="box" rules="all">
<thead>
<tr>
<th align="left" valign="top"><bold>Window Size</bold></th>
<th align="center" valign="top"><bold>20 × 20</bold></th>
<th align="center" valign="top"><bold>40 × 40</bold></th>
<th align="center" valign="top"><bold>60 × 60</bold></th>
<th align="center" valign="top" content-type="background-color:#B6DDE8"><bold>80 × 80</bold></th>
<th align="center" valign="top" content-type="background-color:#B6DDE8"><bold>100 × 100</bold></th>
<th align="center" valign="top"><bold>120 × 120</bold></th>
<th align="center" valign="top"><bold>140 × 140</bold></th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">SIFT</td>
<td align="center" valign="top">6%</td>
<td align="center" valign="top">6%</td>
<td align="center" valign="top">6%</td>
<td align="center" valign="top" content-type="background-color:#B6DDE8">6%</td>
<td align="center" valign="top" content-type="background-color:#B6DDE8">6%</td>
<td align="center" valign="top">6%</td>
<td align="center" valign="top">6%</td></tr>
<tr>
<td align="left" valign="top">EOH-SIFT</td>
<td align="center" valign="top">2%</td>
<td align="center" valign="top">17%</td>
<td align="center" valign="top">27%</td>
<td align="center" valign="top" content-type="background-color:#B6DDE8">36%</td>
<td align="center" valign="top" content-type="background-color:#B6DDE8">37%</td>
<td align="center" valign="top">35%</td>
<td align="center" valign="top">32%</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-sensors-12-12661" position="float">
<label>Table 2.</label>
<caption>
<p>Camera specifications.</p></caption>
<table frame="box" rules="cols">
<thead>
<tr>
<th align="center" valign="top"><bold>Specifications</bold></th>
<th align="center" valign="top"><bold>VS</bold></th>
<th align="center" valign="top"><bold>LWIR</bold></th></tr>
<tr>
<th align="center" valign="top" colspan="3">
<hr/></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Image sensor type</td>
<td align="center" valign="top">CCD</td>
<td align="center" valign="top">Thermal</td></tr>
<tr>
<td align="center" valign="top">Resolution</td>
<td align="center" valign="top">640 × 480</td>
<td align="center" valign="top">320 × 240</td></tr>
<tr>
<td align="center" valign="top">Wavelength</td>
<td align="center" valign="top">0.4 to 0.7 μm</td>
<td align="center" valign="top">8 to 14 μm</td></tr>
<tr>
<td align="center" valign="top">Focal length</td>
<td align="center" valign="top">6 mm</td>
<td align="center" valign="top">19 mm</td></tr></tbody></table></table-wrap>
<table-wrap id="t3-sensors-12-12661" position="float">
<label>Table 3.</label>
<caption>
<p>Average Recall and (1-Precision) over the 100 VS-LWIR image pairs.</p></caption>
<table frame="box" rules="cols">
<thead>
<tr>
<th align="left" valign="top"><bold>VS\LWIR</bold></th>
<th align="center" valign="top"><bold><italic>Recall</italic></bold></th>
<th align="center" valign="top"><bold><italic>(1-Precision)</italic></bold></th></tr>
<tr>
<th align="center" valign="top" colspan="3">
<hr/></th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">SIFT [<xref ref-type="bibr" rid="b8-sensors-12-12661">8</xref>]</td>
<td align="center" valign="top">6%</td>
<td align="center" valign="top">93%</td></tr>
<tr>
<td align="left" valign="top">SURF [<xref ref-type="bibr" rid="b10-sensors-12-12661">10</xref>]</td>
<td align="center" valign="top">45%</td>
<td align="center" valign="top">96%</td></tr>
<tr>
<td align="left" valign="top">GOM-SIFT [<xref ref-type="bibr" rid="b11-sensors-12-12661">11</xref>]</td>
<td align="center" valign="top">14%</td>
<td align="center" valign="top">86%</td></tr>
<tr>
<td align="left" valign="top">OR-SIFT [<xref ref-type="bibr" rid="b16-sensors-12-12661">16</xref>]</td>
<td align="center" valign="top">5%</td>
<td align="center" valign="top">97%</td></tr>
<tr>
<td align="left" valign="top"><italic>Proposed Approach</italic></td>
<td align="center" valign="top"><bold>74%</bold></td>
<td align="center" valign="top"><bold>59%</bold></td></tr></tbody></table></table-wrap></sec></back></article>
