<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/s111009573</article-id>
<article-id pub-id-type="publisher-id">sensors-11-09573</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Facial Expression Recognition Based on Local Binary Patterns and Kernel Discriminant Isomap</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhao</surname><given-names>Xiaoming</given-names></name><xref ref-type="aff" rid="af1-sensors-11-09573"><sup>1</sup></xref><xref ref-type="corresp" rid="c1-sensors-11-09573"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname><given-names>Shiqing</given-names></name><xref ref-type="aff" rid="af2-sensors-11-09573"><sup>2</sup></xref></contrib></contrib-group>
<aff id="af1-sensors-11-09573">
<label>1</label> Department of Computer Science, Taizhou University, Taizhou 317000, China</aff>
<aff id="af2-sensors-11-09573">
<label>2</label> School of Physics and Electronic Engineering, Taizhou University, Taizhou 318000, China; E-Mail: <email>tzczsq@163.com</email></aff>
<author-notes>
<corresp id="c1-sensors-11-09573">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>tzxyzxm@163.com</email>; Tel.: +86-576-8513-7178; Fax: +86-576-8513-7178.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>11</day>
<month>10</month>
<year>2011</year></pub-date>
<volume>11</volume>
<issue>10</issue>
<fpage>9573</fpage>
<lpage>9588</lpage>
<history>
<date date-type="received">
<day>31</day>
<month>8</month>
<year>2011</year></date>
<date date-type="rev-recd">
<day>27</day>
<month>9</month>
<year>2011</year></date>
<date date-type="accepted">
<day>9</day>
<month>10</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Facial expression recognition is an interesting and challenging subject. Considering the nonlinear manifold structure of facial images, a new kernel-based manifold learning method, called kernel discriminant isometric mapping (KDIsomap), is proposed. KDIsomap aims to nonlinearly extract the discriminant information by maximizing the interclass scatter while minimizing the intraclass scatter in a reproducing kernel Hilbert space. KDIsomap is used to perform nonlinear dimensionality reduction on the extracted local binary patterns (LBP) facial features, and to produce low-dimensional discriminant embedded data representations with striking performance improvement on facial expression recognition tasks. The nearest neighbor classifier with the Euclidean metric is used for facial expression classification. Facial expression recognition experiments are performed on two popular facial expression databases, <italic>i.e</italic>., the JAFFE database and the Cohn-Kanade database. Experimental results indicate that KDIsomap obtains the best accuracy of 81.59% on the JAFFE database, and 94.88% on the Cohn-Kanade database. KDIsomap outperforms the other methods used, such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA) and kernel linear discriminant analysis (KLDA), as well as kernel isometric mapping (KIsomap).</p></abstract>
<kwd-group>
<kwd>kernel</kwd>
<kwd>isometric mapping</kwd>
<kwd>dimensionality reduction</kwd>
<kwd>local binary patterns</kwd>
<kwd>facial expression recognition</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Facial expressions are the facial changes that indicate a person’s internal affective states, intentions or social communications. Through the expressions it shows, the human face is the predominant means of expressing and interpreting the affective states of human beings. Automatic facial expression recognition has important applications in many areas such as natural human-computer interaction, image retrieval, talking heads and human emotion analysis [<xref ref-type="bibr" rid="b1-sensors-11-09573">1</xref>]. Over the last decade, automatic facial expression recognition has attracted increasing attention and has become an important issue in the scientific community, since facial expressions are one of the most powerful, natural and immediate means for human beings to communicate their emotions and intentions.</p>
<p>A basic automatic facial expression recognition system generally consists of three steps [<xref ref-type="bibr" rid="b2-sensors-11-09573">2</xref>]: face acquisition, facial feature extraction and representation, and facial expression classification. Face acquisition is a preprocessing stage to detect or locate the face region in input images or sequences. The real-time face detection algorithm developed by Viola and Jones [<xref ref-type="bibr" rid="b3-sensors-11-09573">3</xref>] is the most commonly employed face detector, in which a cascade of classifiers is used with Haar-like features. The detected face region is then usually aligned based on the eye positions detected within it. After detecting or locating the face, the next step is to extract facial features from the original face images to represent facial expressions. There are mainly two approaches to this task: geometric feature-based methods and appearance feature-based methods [<xref ref-type="bibr" rid="b2-sensors-11-09573">2</xref>]. Geometric features represent the shape and locations of facial components such as the mouth, nose, eyes, and brows. Nevertheless, geometric feature extraction requires accurate and reliable facial feature detection, which is difficult to achieve in real-time applications. More crucially, geometric features usually cannot encode changes in skin texture, such as wrinkles and furrows, that are critical for facial expression modeling. In contrast, appearance features represent appearance changes (skin texture) of the face, including wrinkles, bulges and furrows. 
Image filters, such as principal component analysis (PCA) [<xref ref-type="bibr" rid="b4-sensors-11-09573">4</xref>], linear discriminant analysis (LDA) [<xref ref-type="bibr" rid="b5-sensors-11-09573">5</xref>] and Gabor wavelet analysis [<xref ref-type="bibr" rid="b6-sensors-11-09573">6</xref>–<xref ref-type="bibr" rid="b9-sensors-11-09573">9</xref>], can be applied to either the whole face or specific face regions to extract facial appearance changes. However, it is computationally expensive to convolve face images with a set of Gabor filters to extract multi-scale and multi-orientation coefficients, and the high redundancy of Gabor wavelet features makes them inefficient in both time and memory [<xref ref-type="bibr" rid="b10-sensors-11-09573">10</xref>–<xref ref-type="bibr" rid="b13-sensors-11-09573">13</xref>]. In recent years, local binary patterns (LBP) [<xref ref-type="bibr" rid="b10-sensors-11-09573">10</xref>], a non-parametric method originally proposed for texture analysis that efficiently summarizes the local structures of an image, have received increasing interest for facial expression representation. The most important properties of LBP features are their tolerance against illumination changes and their computational simplicity. LBP has been successfully applied as a local feature extraction method in facial expression recognition [<xref ref-type="bibr" rid="b11-sensors-11-09573">11</xref>–<xref ref-type="bibr" rid="b13-sensors-11-09573">13</xref>]. The last step of an automatic facial expression recognition system is to classify different expressions based on the extracted facial features. 
A variety of classifiers, such as Neural Networks (NN) [<xref ref-type="bibr" rid="b14-sensors-11-09573">14</xref>], Support Vector Machines (SVM) [<xref ref-type="bibr" rid="b15-sensors-11-09573">15</xref>], k-Nearest Neighbor (kNN) [<xref ref-type="bibr" rid="b16-sensors-11-09573">16</xref>], rule-based classifiers [<xref ref-type="bibr" rid="b17-sensors-11-09573">17</xref>], and Hidden Markov Models (HMM) [<xref ref-type="bibr" rid="b18-sensors-11-09573">18</xref>], have been used for facial expression recognition.</p>
<p>In the second step of an automatic facial expression recognition system, the extracted facial features are represented by a set of high-dimensional data. It is therefore desirable to analyze facial expressions in a low-dimensional subspace rather than in the ambient space. Dimensionality reduction methods for this purpose fall into two main groups: linear and nonlinear. The well-known linear methods are PCA and LDA. However, they are not suitable for representing dynamically changing facial expressions, which can be represented as low-dimensional nonlinear manifolds embedded in a high-dimensional image space [<xref ref-type="bibr" rid="b19-sensors-11-09573">19</xref>,<xref ref-type="bibr" rid="b20-sensors-11-09573">20</xref>]. Considering the nonlinear manifold structure of facial images, manifold learning (also called nonlinear dimensionality reduction) methods, which aim to find a smooth low-dimensional manifold embedded in a high-dimensional data space, have recently been applied to facial images for automatic facial expression analysis [<xref ref-type="bibr" rid="b21-sensors-11-09573">21</xref>,<xref ref-type="bibr" rid="b22-sensors-11-09573">22</xref>]. The two representative manifold learning methods are locally linear embedding (LLE) [<xref ref-type="bibr" rid="b19-sensors-11-09573">19</xref>] and isometric feature mapping (Isomap) [<xref ref-type="bibr" rid="b20-sensors-11-09573">20</xref>]. By using manifold learning methods such as LLE and Isomap, facial expression images can be projected to a low-dimensional subspace in which facial data representation is optimal for classification. However, these methods generalize poorly to new data points, since they are defined only on the training data.</p>
<p>To overcome the above drawback of manifold learning methods, some kernel-based manifold learning methods, such as kernel isometric mapping (KIsomap) [<xref ref-type="bibr" rid="b23-sensors-11-09573">23</xref>], have recently been developed. KIsomap effectively combines the kernel idea with Isomap and can directly project new data points into a low-dimensional space by using a kernel trick, as in kernel principal component analysis (KPCA) [<xref ref-type="bibr" rid="b24-sensors-11-09573">24</xref>]. However, the KIsomap algorithm still has two shortcomings. First, KIsomap fails to extract discriminant embedded data representations, since it does not take into account the known class label information of the input data. Second, although it is a kernel-based method, KIsomap cannot exploit the key characteristic of kernel-based learning, <italic>i.e</italic>., a nonlinear kernel mapping, to explore higher-order information in the input data. This is because the kernel in KIsomap is simply the geodesic distance matrix, which a constant-shifting technique turns into a Mercer kernel matrix [<xref ref-type="bibr" rid="b25-sensors-11-09573">25</xref>].</p>
<p>To overcome the limitations of KIsomap mentioned above, this paper proposes a new kernel-based feature extraction method, called kernel discriminant Isomap (KDIsomap). On the one hand, KDIsomap considers both the intraclass scatter information and the interclass scatter information in a reproducing kernel Hilbert space (RKHS), and emphasizes the discriminant information. On the other hand, KDIsomap performs a nonlinear kernel mapping with a kernel function to extract the nonlinear features when mapping input data into a high-dimensional feature space. After extracting LBP features for facial representations, the proposed KDIsomap is used to produce the low-dimensional discriminant embedded data representations from the extracted LBP features with striking performance improvement on facial expression recognition tasks.</p>
<p>The remainder of this paper is organized as follows. The Local Binary Patterns (LBP) operator is described in Section 2. In Section 3, KIsomap is reviewed briefly and the proposed KDIsomap algorithm is presented in detail. In Section 4, two facial expression databases used for experiments are described. Section 5 shows the experiment results and analysis. Finally, the conclusions are given in Section 6.</p></sec>
<sec>
<label>2.</label>
<title>Local Binary Patterns (LBP)</title>
<p>The original local binary patterns (LBP) [<xref ref-type="bibr" rid="b10-sensors-11-09573">10</xref>] operator takes a local neighborhood around each pixel, thresholds the pixels of the neighborhood at the value of the central pixel and uses the resulting binary-valued image patch as a local image descriptor. It was originally defined for 3 × 3 neighborhoods, giving 8 bit codes based on the 8 pixels around the central one. The operator labels the pixels of an image by thresholding a 3 × 3 neighborhood of each pixel with the center value and considering the results as a binary number, and the 256-bin histogram of the LBP labels computed over a region is used as a texture descriptor. <xref ref-type="fig" rid="f1-sensors-11-09573">Figure 1</xref> gives an example of the basic LBP operator.</p>
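<p>As a rough illustration of the basic operator described above, the following Python sketch computes the 8-bit code of each interior pixel of a small gray-level patch. The comparison rule (a neighbor at least as bright as the center yields a 1), the clockwise bit order starting at the top-left neighbor, the function name <monospace>lbp_3x3</monospace> and the sample patch are assumptions made for this example; they are not taken from [<xref ref-type="bibr" rid="b10-sensors-11-09573">10</xref>] or from Figure 1.</p>
<preformat><![CDATA[
```python
def lbp_3x3(image):
    """Return basic 3x3 LBP codes for the interior pixels of a 2-D list of gray values."""
    # Clockwise neighbor offsets starting at the top-left pixel (an assumed convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for dr, dc in offsets:
                # A neighbor contributes a 1 bit when it is >= the center value (assumed rule).
                bit = 1 if image[r + dr][c + dc] >= center else 0
                code = (code << 1) | bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


# Hypothetical 3x3 patch; its only interior pixel is the center value 6.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
codes = lbp_3x3(patch)
```
]]></preformat>
<p>For this sample patch the sketch yields the single code 143 (binary 10001111). A 256-bin histogram of such codes over a region would then serve as the texture descriptor.</p>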
<p>The limitation of the basic LBP operator is that its small 3 × 3 neighborhood cannot capture dominant features with large-scale structures. To deal with texture at different scales, the operator was therefore later extended to use neighborhoods of different sizes. <xref ref-type="fig" rid="f2-sensors-11-09573">Figure 2</xref> gives an example of the extended LBP operator, where the notation (<italic>P</italic>, <italic>R</italic>) denotes a neighborhood of <italic>P</italic> equally spaced sampling points on a circle of radius <italic>R</italic> that form a circularly symmetric neighbor set. A second extension defines the so-called uniform patterns: an LBP is ‘uniform’ if it contains at most one 0-1 and one 1-0 transition when viewed as a circular bit string. For instance, 00000000, 00111000 and 11100001 are uniform patterns. It is observed that uniform patterns account for nearly 90% of all patterns in the (8, 1) neighborhood and for about 70% in the (16, 2) neighborhood in texture images. Accumulating the patterns which have more than 2 transitions into a single bin yields an LBP operator, 
<inline-formula>
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="italic">LBP</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, with less than 2<italic><sup>P</sup></italic> bins. Here, the superscript u2 in 
<inline-formula>
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="italic">LBP</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> indicates using only uniform patterns and labeling all remaining patterns with a single label.</p>
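<p>The uniform-pattern rule above can be checked with a short Python sketch that counts circular bit transitions and builds a label table in which all non-uniform codes share one label. The function names <monospace>transitions</monospace> and <monospace>u2_label_table</monospace>, and the choice to number the uniform codes in increasing order, are assumptions made for this illustration only.</p>
<preformat><![CDATA[
```python
def transitions(code, P):
    """Count 0-1 and 1-0 transitions in the P-bit circular bit string of `code`."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))


def u2_label_table(P):
    """Map each of the 2**P codes to a label; all non-uniform codes share the last label."""
    uniform = [c for c in range(2 ** P) if transitions(c, P) <= 2]
    table = {c: i for i, c in enumerate(uniform)}  # one label per uniform code
    shared = len(uniform)                          # single label for everything else
    for c in range(2 ** P):
        table.setdefault(c, shared)
    return table, shared + 1


table, n_bins = u2_label_table(8)
```
]]></preformat>
<p>For <italic>P</italic> = 8 this gives 58 uniform codes plus one shared label, i.e., 59 bins instead of 256.</p>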
<p>After labeling an image with the LBP operator, a histogram of the labeled image <italic>f<sub>l</sub></italic>(<italic>x</italic>, <italic>y</italic>) can be defined as:
<disp-formula id="FD1">
<label>(1)</label>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>l</mml:mi></mml:msub>
<mml:mo> </mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>   </mml:mtext>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>⋯</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:math></disp-formula>where <italic>n</italic> is the number of different labels produced by the LBP operator and:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>  </mml:mo>
<mml:mtext>A is true</mml:mtext></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>  </mml:mo>
<mml:mtext>A is false</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:math></disp-formula></p>
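<p>Equations (1) and (2) amount to counting how often each label occurs. The Python sketch below shows this, together with the region-wise concatenation described in the next paragraph. The function names, the block-based region layout and the toy label image are assumptions made for illustration; the region grid actually used for face images is not specified here.</p>
<preformat><![CDATA[
```python
def label_histogram(labels, n):
    """H_i = number of positions (x, y) whose label equals i, for i = 0..n-1."""
    hist = [0] * n
    for row in labels:
        for value in row:
            hist[value] += 1  # the indicator I(f_l(x, y) = i) adds 1 to bin i
    return hist


def regional_histograms(labels, block_rows, block_cols, n):
    """Concatenate the histograms of equally sized, non-overlapping blocks."""
    feature = []
    for r0 in range(0, len(labels), block_rows):
        for c0 in range(0, len(labels[0]), block_cols):
            block = [row[c0:c0 + block_cols] for row in labels[r0:r0 + block_rows]]
            feature.extend(label_histogram(block, n))
    return feature


# Hypothetical 2x3 label image with n = 3 distinct labels.
labels = [[0, 2, 2],
          [1, 2, 0]]
hist = label_histogram(labels, 3)
feature = regional_histograms(labels, 1, 3, 3)  # two 1x3 regions
```
]]></preformat>
<p>Here the whole-image histogram is [2, 1, 3], while the concatenated regional histogram [1, 0, 2, 1, 1, 1] also records in which region each label occurred.</p>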
<p>This LBP histogram contains information about the distribution of the local micro-patterns, such as edges, spots and flat areas, over the whole image, so it can be used to statistically describe image characteristics. For efficient face representation, face images are equally divided into <italic>m</italic> small regions <italic>R</italic><sub>1</sub>, <italic>R</italic><sub>2</sub>, ..., <italic>R<sub>m</sub></italic>. A histogram is then computed independently within each of the <italic>m</italic> regions. The resulting <italic>m</italic> histograms are concatenated into a single, spatially enhanced histogram which encodes both the appearance and the spatial relations of facial regions. This spatially enhanced histogram effectively describes the face image on three different levels of locality: the labels for the histogram contain information about the patterns on a pixel level, the labels are summed over a small region to produce information on a regional level, and the regional histograms are concatenated to build a global description of the face image.</p></sec>
<sec>
<label>3.</label>
<title>Our Method</title>
<p>In this section, we review the existing KIsomap algorithm in brief and explain the proposed KDIsomap algorithm in detail.</p>
<sec>
<label>3.1.</label>
<title>Review of KIsomap</title>
<p>The approximate geodesic distance matrix used in Isomap [<xref ref-type="bibr" rid="b20-sensors-11-09573">20</xref>] can be interpreted as a Mercer kernel matrix. However, the kernel matrix based on the doubly centered geodesic distance matrix is not always positive semi-definite. The method that incorporates a constant-shifting technique into Isomap is referred to as KIsomap [<xref ref-type="bibr" rid="b23-sensors-11-09573">23</xref>], since the geodesic distance matrix with constant shifting is guaranteed to be a Mercer kernel matrix. This Mercer KIsomap algorithm has a good generalization property, enabling us to project new data points onto an associated low-dimensional manifold.</p>
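<p>To make the doubly centered geodesic distance matrix concrete, the following Python sketch covers only Steps 1 to 3 of the summary below: a <italic>k</italic>-nearest-neighbor graph, shortest-path (geodesic) distances, and the double centering of Equation (3). It deliberately stops before the constant shift and the eigendecomposition of Steps 4 to 6, which need a numerical eigen-solver. The function names, the symmetric treatment of neighbor edges and the four collinear sample points are assumptions made for this example, not part of [<xref ref-type="bibr" rid="b23-sensors-11-09573">23</xref>].</p>
<preformat><![CDATA[
```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge lengths (Step 1)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # assumed: an edge is kept if either end selects it
    return graph


def geodesic_distances(graph):
    """All-pairs shortest path lengths, running Dijkstra from every node (Step 2)."""
    n = len(graph)
    D = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        D[src][src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > D[src][u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < D[src][v]:
                    D[src][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return D


def double_center(M):
    """Return -1/2 * H M H with H = I - (1/N) e e^T, as in Equation (3)."""
    n = len(M)
    row = [sum(r) / n for r in M]
    col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[-0.5 * (M[i][j] - row[i] - col[j] + total) for j in range(n)]
            for i in range(n)]


# Hypothetical data: four points on a line, so geodesic and Euclidean distances agree.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
D = geodesic_distances(knn_graph(points, k=2))
D2 = [[d * d for d in r] for r in D]
K = double_center(D2)
```
]]></preformat>
<p>For these collinear points the centered matrix coincides with the Gram matrix of the mean-centered coordinates, which is positive semi-definite. For curved data the geodesic distances need not be Euclidean, and this is the case the constant shift is meant to handle.</p>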
<p>The KIsomap [<xref ref-type="bibr" rid="b23-sensors-11-09573">23</xref>] operations can be summarized as follows:
<list list-type="simple">
<list-item>
<p>Step 1: Identify the <italic>k</italic> nearest neighbors of each input data point and construct a neighborhood graph where edge lengths between points in a neighborhood are set as their Euclidean distances.</p></list-item>
<list-item>
<p>Step 2: Compute the geodesic distances, <italic>d<sub>ij</sub></italic>, as the shortest-path lengths between all pairs of data points using Dijkstra’s algorithm, and define 
<inline-formula>
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:msubsup>
<mml:mi>d</mml:mi>
<mml:mi mathvariant="italic">ij</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item>
<list-item>
<p>Step 3: Construct a matrix <bold><italic>K</italic></bold>(<bold><italic>D</italic></bold><sup>2</sup>) based on the approximate geodesic distance matrix:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi mathvariant="bold-italic">K</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msup>
<mml:mi mathvariant="bold-italic">HD</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi mathvariant="bold-italic">H</mml:mi></mml:mrow></mml:math></disp-formula>where <bold><italic>H</italic></bold> = <bold><italic>I</italic></bold> – (1/<italic>N</italic>)<italic>ee<sup>T</sup></italic>, e = [1,...,1]<italic><sup>T</sup></italic> ∈ <italic>R<sup>N</sup></italic>.</p></list-item>
<list-item>
<p>Step 4: Compute the largest eigenvalue, <bold><italic>c</italic></bold>*, of the matrix 
<inline-formula>
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>K</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mi>I</mml:mi></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>4</mml:mn>
<mml:mi>K</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and construct a Mercer kernel matrix:
<disp-formula id="FD4">
<label>(4)</label>
<mml:math display="block">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">K</mml:mi>
<mml:mo>*</mml:mo></mml:msup>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="bold-italic">K</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>c</mml:mi>
<mml:mi mathvariant="bold-italic">K</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold-italic">D</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi mathvariant="bold-italic">H</mml:mi></mml:mrow></mml:math></disp-formula>where <bold><italic>K</italic></bold>* is guaranteed to be positive semi-definite for <italic>c</italic> ≥ <italic>c</italic><sup>*</sup>.</p></list-item>
<list-item>
<p>Step 5: Compute the top <italic>d</italic> eigenvectors of <bold><italic>K</italic></bold>*, which gives the eigenvector matrix <bold><italic>V</italic></bold> ∈ <italic>R<sup>N×d</sup></italic> and the diagonal eigenvalue matrix <bold><italic>Λ</italic></bold> ∈ <italic>R<sup>d×d</sup></italic>.</p></list-item>
<list-item>
<p>Step 6: The embedded coordinates of the input points in the <italic>d</italic>-dimensional Euclidean space are given by:
<disp-formula id="FD5">
<label>(5)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">Λ</mml:mi>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup>
<mml:msup>
<mml:mi mathvariant="bold-italic">V</mml:mi>
<mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:math></disp-formula></p></list-item></list></p></sec>
<sec>
<label>3.2.</label>
<title>The Proposed KDIsomap</title>
<p>Fisher’s criterion in LDA [<xref ref-type="bibr" rid="b5-sensors-11-09573">5</xref>], namely that the interclass scatter should be maximized while the intraclass scatter is simultaneously minimized, has become one of the most important selection criteria for projection techniques, since it endows the projected data vectors with good discriminating power. Motivated by Fisher’s criterion, when KIsomap is used to extract the low-dimensional embedded data representations, the interclass dissimilarity can be maximized while the intraclass dissimilarity is minimized, in order to obtain improved tightness among similar patterns and better separability for dissimilar patterns. This can be realized by modifying the Euclidean distance measure used in the first step of KIsomap. In this section, we develop an improved kernelized variant of KIsomap by designing a kernel discriminant distance in a reproducing kernel Hilbert space (RKHS), which gives rise to the KDIsomap algorithm.</p>
<p>To develop the KDIsomap algorithm, a kernel matrix is first constructed by performing a nonlinear kernel mapping with a kernel function. A kernel discriminant distance, in which the interclass scatter is maximized while the intraclass scatter is simultaneously minimized, is then designed to extract the discriminant information in an RKHS.</p>
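<p>The two ingredients just outlined, a kernel-induced distance and a label-aware discriminant distance, are given formally in Equations (7) and (8) of Step 2 below. The Python sketch that follows is only an illustration of those two formulas. The Gaussian kernel, its width <monospace>sigma</monospace>, the values chosen for β and α, the function names and the sample points are all assumptions made for this example.</p>
<preformat><![CDATA[
```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel; the kernel choice and sigma are illustrative assumptions."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(x, y, kernel):
    """Equation (7): feature-space distance expressed through kernel values only."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative round-off before the root


def discriminant_distance(x, y, same_class, kernel, beta, alpha):
    """Equation (8): one branch for equal labels, another for different labels."""
    d2 = kernel_distance(x, y, kernel) ** 2
    if same_class:
        return math.sqrt(1.0 - math.exp(-d2 / beta))   # intraclass: always below 1
    return math.sqrt(math.exp(d2 / beta)) - alpha      # interclass: at least 1 - alpha


# Hypothetical pair of points, evaluated once as same-class and once as different-class.
x, y = (0.0, 0.0), (1.0, 1.0)
intra = discriminant_distance(x, y, True, rbf_kernel, beta=1.0, alpha=0.0)
inter = discriminant_distance(x, y, False, rbf_kernel, beta=1.0, alpha=0.0)
```
]]></preformat>
<p>With α = 0 the same pair of points is assigned a distance below 1 when the labels agree and a distance of at least 1 when they differ, which is the separation effect the discriminant distance is designed to produce.</p>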
<p>Given the input data point (<italic>x<sub>i</sub></italic>, <italic>L<sub>i</sub></italic>), where <italic>x<sub>i</sub></italic> ∈ <italic>R<sup>D</sup></italic> and <italic>L<sub>i</sub></italic> is the class label of <italic>x<sub>i</sub></italic>, the output data point is <italic>y<sub>i</sub></italic> ∈ <italic>R<sup>d</sup></italic> (<italic>i</italic> = 1, 2, 3, ..., <italic>N</italic>). The detailed steps of KDIsomap are presented as follows:
<list list-type="simple">
<list-item>
<p>Step 1: Kernel mapping for each input data point <italic>x<sub>i</sub></italic>.</p>
<p>A nonlinear mapping function <italic>φ</italic> is defined as: <italic>φ: R<sup>D</sup></italic> → ℱ, <italic>x</italic> ↦ <italic>φ</italic>(<italic>x</italic>)</p>
<p>The input data point <italic>x<sub>i</sub></italic> ∈ <italic>R<sup>D</sup></italic> is mapped into an implicit high-dimensional feature space ℱ by the nonlinear mapping function <italic>φ</italic>. In an RKHS, a kernel function <italic>κ</italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>) can be defined as:
<disp-formula id="FD6">
<label>(6)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi mathvariant="bold-italic">κ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>〈</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>〉</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>φ</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mi>T</mml:mi></mml:msup>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>κ</italic> is known as a kernel.</p></list-item>
<list-item>
<p>Step 2: Find the nearest neighbors of each data point <italic>φ</italic>(<italic>x<sub>i</sub></italic>) and construct a neighborhood graph where edge lengths between points in a neighborhood are set as the following kernel discriminant distance.</p>
<p>The kernel Euclidean distance measure induced by a kernel <italic>κ</italic> can be defined as:
<disp-formula id="FD7">
<label>(7)</label>
<mml:math display="block">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>κ</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo>〈</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>〉</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd/>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>κ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>κ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>κ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula></p>
<p>To preserve the intraclass neighbouring geometry, while maximizing the interclass scatter, a kernel discriminant distance in a RKHS is given as follows:
<disp-formula id="FD8">
<label>(8)</label>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>κ</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>κ</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mi>β</mml:mi></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>κ</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mi>β</mml:mi></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:msqrt>
<mml:mo>−</mml:mo>
<mml:mi>α</mml:mi></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>≠</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>where <italic>d<sub>κ</sub></italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>) is the kernel Euclidean distance, computed without class label information, whereas <italic>D<sub>κ</sub></italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>) is the kernel discriminant distance, which integrates class label information. β is a smoothing parameter related to the data ‘density’, and it is usually feasible to set β to the average kernel Euclidean distance between all pairs of data points. α is a constant factor (0 ≤ α ≤ 1) that gives the intraclass dissimilarity a certain probability of exceeding the interclass dissimilarity. Two observations follow from <xref ref-type="disp-formula" rid="FD8">Equation (8)</xref>. First, each dissimilarity function in <italic>D<sub>κ</sub></italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>), <italic>i.e.</italic>, the interclass dissimilarity and the intraclass dissimilarity, increases monotonically with the kernel Euclidean distance. This ensures that the main geometric structure of the original data sets is well preserved when KDIsomap is used to produce low-dimensional embedded data representations. Second, the interclass dissimilarity in <italic>D<sub>κ</sub></italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>) is never smaller than 1 − α, whereas the intraclass dissimilarity is always smaller than 1, so the interclass dissimilarity is generally the larger of the two, which gives KDIsomap’s projected data vectors a high discriminating power. This is a good property for classification.</p></list-item>
<list-item>
<p>Step 3: Estimate the approximate geodesic distances.</p></list-item>
<list-item>
<p>Step 4: Construct a matrix <bold><italic>K</italic></bold>(<bold><italic>D</italic></bold><sup>2</sup>) based on <bold><italic>K</italic></bold>(<bold><italic>D</italic></bold><sup>2</sup>) = <bold><italic>−</italic>0.5<italic>HD</italic></bold><sup>2</sup><bold><italic>H</italic></bold>.</p></list-item>
<list-item>
<p>Step 5: Compute the eigenvectors of <bold><italic>K</italic></bold>(<bold><italic>D</italic></bold><sup>2</sup>) corresponding to the <italic>d</italic> largest eigenvalues, which give the embedded coordinates of the input points in the <italic>d</italic>-dimensional Euclidean space.</p></list-item></list></p></sec></sec>
<sec sec-type="methods">
<label>4.</label>
<title>Facial Expression Database</title>
<p>Two popular facial expression databases, <italic>i.e.</italic>, the JAFFE database [<xref ref-type="bibr" rid="b9-sensors-11-09573">9</xref>] and the Cohn-Kanade database [<xref ref-type="bibr" rid="b26-sensors-11-09573">26</xref>], are used for facial expression recognition. Each database contains seven emotions: anger, joy, sadness, neutral, surprise, disgust and fear.</p>
<p>The JAFFE database contains 213 images of female facial expressions. Each image has a resolution of 256 × 256 pixels. The head is almost in frontal pose. The number of images corresponding to each of the seven categories of expressions is roughly the same. A few of them are shown in <xref ref-type="fig" rid="f3-sensors-11-09573">Figure 3</xref>.</p>
<p>The Cohn-Kanade database contains image sequences of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of prototypic emotions. Image sequences from neutral to target display were digitized into 640 × 490 pixels with 8-bit precision for grayscale values. <xref ref-type="fig" rid="f4-sensors-11-09573">Figure 4</xref> shows some sample images from the Cohn-Kanade database. As in [<xref ref-type="bibr" rid="b11-sensors-11-09573">11</xref>,<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], we selected 320 image sequences from the Cohn-Kanade database. The selected sequences, each of which could be labeled as one of the six basic emotions, come from 96 subjects, with 1 to 6 emotions per subject. For each sequence, the neutral face and three peak frames were used for prototypic expression recognition, resulting in 1,409 images (96 anger, 298 joy, 165 sadness, 225 surprise, 141 fear, 135 disgust and 349 neutral).</p>
<p>Following the setting in [<xref ref-type="bibr" rid="b11-sensors-11-09573">11</xref>,<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], we normalized the eye distance of the face images to a fixed distance of 55 pixels once the centers of the two eyes were located. It is generally observed that the width of a face is roughly two times the eye distance, and the height is roughly three times the eye distance. Therefore, based on the normalized eye distance, a resized image of 110 × 150 pixels was cropped from the original image. To locate the centers of the two eyes, automatic face registration was performed using a robust real-time face detector based on a set of rectangular Haar-wavelet features [<xref ref-type="bibr" rid="b3-sensors-11-09573">3</xref>]. From the results of automatic face detection, including face location, face width and face height, two square bounding boxes for the left eye and the right eye were automatically constructed using the geometry of a typical upright face, which has been widely utilized to find a proper spatial arrangement of facial features [<xref ref-type="bibr" rid="b27-sensors-11-09573">27</xref>]. The approximate center locations of the two eyes were then taken as the centers of these two bounding boxes. <xref ref-type="fig" rid="f5-sensors-11-09573">Figure 5</xref> shows the detailed process of locating the two eyes and the final cropped image from the Cohn-Kanade database. No further alignment of facial features, such as alignment of the mouth, was performed. Additionally, no attempt was made to remove illumination changes, because LBP is invariant to monotonic gray-scale changes.</p>
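<p>Once the eye centers are known, the normalization described above reduces to simple arithmetic. The following Python sketch is illustrative only and is not the authors’ implementation. The function name crop_box and the parameter eye_row_fraction (the fraction of the crop height at which the eye line is placed) are assumptions, because the text does not state where the eyes sit inside the 110 × 150 crop.</p>
<preformat>
```python
import math

TARGET_EYE_DISTANCE = 55            # pixels, as stated in the text
CROP_WIDTH, CROP_HEIGHT = 110, 150  # pixels, as stated in the text


def crop_box(left_eye, right_eye, eye_row_fraction=0.3):
    """Return (scale, x0, y0, width, height) of the crop window.

    left_eye and right_eye are (x, y) eye centers in the original image.
    The window is expressed in original-image pixels; resizing it by
    `scale` yields a 110 x 150 image with an eye distance of 55 pixels.
    eye_row_fraction is a hypothetical parameter (not given in the text).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    eye_distance = math.hypot(dx, dy)
    if eye_distance == 0:
        raise ValueError("eye centers coincide")
    # Resize factor that brings the measured eye distance to 55 pixels.
    scale = TARGET_EYE_DISTANCE / eye_distance
    # Size of the crop window measured in the original (unscaled) image.
    width = CROP_WIDTH / scale
    height = CROP_HEIGHT / scale
    # Center the window horizontally on the midpoint between the eyes.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    mid_y = (left_eye[1] + right_eye[1]) / 2.0
    x0 = mid_x - width / 2.0
    y0 = mid_y - eye_row_fraction * height
    return scale, x0, y0, width, height
```
</preformat>
<p>For example, eye centers that are 110 pixels apart give a resize factor of 0.5, so the crop window covers 220 × 300 pixels of the original image before resizing.</p>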
<p>The cropped facial images of 110 × 150 pixels contain the main facial components, such as the mouth, eyes, brows and nose. The LBP operator is applied to the whole region of the cropped facial images. For better uniform-LBP feature extraction, two parameters, <italic>i.e</italic>., the LBP operator and the number of regions into which the image is divided, need to be optimized. Similar to the setting in [<xref ref-type="bibr" rid="b28-sensors-11-09573">28</xref>], we selected the 59-bin operator 
<inline-formula>
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="italic">LBP</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, and divided the 110 × 150 pixel face images into regions of 18 × 21 pixels, giving a good trade-off between recognition performance and feature vector length. Thus, each face image was divided into 42 (6 × 7) regions and represented by LBP histograms with a total length of 2,478 (59 × 42).</p></sec>
<sec sec-type="methods|results">
<label>5.</label>
<title>Experimental Results and Analysis</title>
<p>To evaluate the performance of KDIsomap, facial expression recognition experiments were performed separately on the JAFFE database and the Cohn-Kanade database. The performance of KDIsomap is compared with that of PCA [<xref ref-type="bibr" rid="b4-sensors-11-09573">4</xref>], LDA [<xref ref-type="bibr" rid="b5-sensors-11-09573">5</xref>], KPCA [<xref ref-type="bibr" rid="b24-sensors-11-09573">24</xref>], kernel linear discriminant analysis (KLDA) [<xref ref-type="bibr" rid="b29-sensors-11-09573">29</xref>], and KIsomap. The typical Gaussian kernel <italic>κ</italic>(<italic>x<sub>i</sub></italic>, <italic>x<sub>j</sub></italic>) = exp(−‖<italic>x<sub>i</sub></italic> − <italic>x<sub>j</sub></italic>‖<sup>2</sup>/2<italic>σ</italic><sup>2</sup>) is adopted for KPCA, KLDA, KIsomap and KDIsomap, and the parameter <italic>σ</italic> is empirically set to 1, which gave satisfactory results. For simplicity, the nearest neighbor classifier with the Euclidean metric is used for classification. Owing to computational complexity constraints, the embedded feature dimension is confined to the range [2,100] and sampled at an interval of 5. For each embedded dimension, the constant α (0 ≤ α ≤ 1) for KDIsomap is optimized by a simple exhaustive search over α = 0, 0.1, 0.2, ..., 1. Note that the embedded dimensions of LDA and KLDA are limited to the range [2,6] because these methods can find at most <italic>c</italic> − 1 meaningful embedded features, where <italic>c</italic> is the number of facial expression classes. Additionally, to explore in detail the performance of all methods in the low range [2,10], we present the recognition results for each embedded dimension in this range at an interval of 1.</p>
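<p>To make these settings concrete, the following Python sketch combines the Gaussian kernel with the kernel Euclidean distance and the kernel discriminant distance of <xref ref-type="disp-formula" rid="FD8">Equation (8)</xref>, and enumerates the grid of α values. It is a minimal illustration under stated assumptions, not the authors’ implementation: the function names are hypothetical, NumPy is assumed to be available, and β defaults to the average kernel Euclidean distance over all pairs of distinct points.</p>
<preformat>
```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """Gaussian kernel matrix, exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def kernel_distance(K):
    """Kernel Euclidean distance: sqrt(k(xi,xi) - 2 k(xi,xj) + k(xj,xj))."""
    diag = np.diag(K)
    sq = diag[:, None] - 2.0 * K + diag[None, :]
    return np.sqrt(np.maximum(sq, 0.0))


def discriminant_distance(d, labels, alpha, beta=None):
    """Kernel discriminant distance of Equation (8).

    d is the kernel Euclidean distance matrix, labels the class labels.
    """
    labels = np.asarray(labels)
    if beta is None:
        # Assumption: average distance over all pairs of distinct points.
        n = d.shape[0]
        beta = d[np.triu_indices(n, k=1)].mean()
    same_class = labels[:, None] == labels[None, :]
    intra = np.sqrt(1.0 - np.exp(-d ** 2 / beta))    # case L_i == L_j
    inter = np.sqrt(np.exp(d ** 2 / beta)) - alpha   # case L_i != L_j
    return np.where(same_class, intra, inter)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))           # toy data, not LBP features
    y = np.array([0, 0, 0, 1, 1, 1])
    d = kernel_distance(gaussian_kernel(X, sigma=1.0))
    for alpha in np.arange(0, 11) / 10.0:  # alpha = 0, 0.1, ..., 1
        D = discriminant_distance(d, y, alpha)
        print(alpha, D.shape)
```
</preformat>
<p>In the experiments, each value of α would be scored by the recognition accuracy obtained from the resulting embedding; that selection loop is omitted here.</p>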
<p>A 10-fold cross validation scheme is employed for the 7-class facial expression recognition experiments. In detail, the data sets are split randomly into ten groups with roughly equal numbers of subjects. Nine groups are used as the training data to train a classifier, while the remaining group is used as the testing data. This process is repeated ten times, so that each group in turn is held out from training. Finally, the average recognition results on the testing data are reported.</p>
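<p>The scheme above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors’ code: the function names are hypothetical, NumPy is assumed, subjects are assigned to folds in a simple round-robin fashion after shuffling, and the dimensionality reduction step, which would be fitted on the training folds only, is omitted.</p>
<preformat>
```python
import numpy as np


def subject_folds(subject_ids, n_folds=10, seed=0):
    """Randomly assign subjects (not images) to folds.

    Returns one fold index per sample, so that all images of a subject
    fall into the same fold.
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subjects)
    fold_of_subject = {s: i % n_folds for i, s in enumerate(shuffled)}
    return np.array([fold_of_subject[s] for s in subject_ids])


def nearest_neighbor_predict(train_X, train_y, test_X):
    """1-nearest-neighbor classification with the Euclidean metric."""
    diff = test_X[:, None, :] - train_X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    return train_y[np.argmin(dist, axis=1)]


def cross_validated_accuracy(X, y, subject_ids, n_folds=10, seed=0):
    """Average test accuracy over the folds."""
    folds = subject_folds(subject_ids, n_folds, seed)
    accuracies = []
    for k in range(n_folds):
        test = folds == k
        if not test.any() or test.all():
            continue  # skip empty or degenerate folds
        pred = nearest_neighbor_predict(X[~test], y[~test], X[test])
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))
```
</preformat>
<p>Splitting by subject rather than by image keeps all images of one person on the same side of the train/test boundary, which matches the description of groups with roughly equal numbers of subjects.</p>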
<sec>
<label>5.1.</label>
<title>System Structure</title>
<p>To clarify how dimensionality reduction techniques such as PCA, LDA, KPCA, KLDA, KIsomap and KDIsomap are employed in facial expression recognition tasks, <xref ref-type="fig" rid="f6-sensors-11-09573">Figure 6</xref> presents the basic structure of a facial expression recognition system based on dimensionality reduction techniques.</p>
<p>As <xref ref-type="fig" rid="f6-sensors-11-09573">Figure 6</xref> shows, this system consists of three main components: feature extraction, feature dimensionality reduction and facial expression classification. In the feature extraction stage, the original facial images from the facial expression databases are divided into two parts, training data and testing data, and the corresponding LBP features are extracted from each part. The result of this stage is the facial feature data represented by a set of high-dimensional LBP features. The second stage reduces the dimensionality of the LBP features and generates new low-dimensional embedded features with a dimensionality reduction technique. To map the testing data, the low-dimensional embedding must first be learnt from the training data; the testing data are then mapped using the out-of-sample extension of the dimensionality reduction method. For linear methods such as PCA and LDA, the out-of-sample extension is straightforward: the testing data are multiplied by the linear mapping matrix obtained from the training data. For nonlinear methods such as KPCA, KLDA, KIsomap and KDIsomap, the out-of-sample extension is realized by a kernel trick as in KPCA [<xref ref-type="bibr" rid="b24-sensors-11-09573">24</xref>]. The kernel trick implicitly maps the original input data to a higher-dimensional space with a kernel mapping function, and then performs linear operations in this new feature space. In the last stage, a pattern classifier trained in the low-dimensional embedded feature space, such as the nearest neighbor classifier, predicts the facial expression categories of the testing data, and the recognition results are reported.</p></sec>
<sec sec-type="methods|results">
<label>5.2.</label>
<title>Experimental Results on the JAFFE Database</title>
<p>The recognition results of different dimensionality reduction methods, <italic>i.e.</italic>, PCA, LDA, KPCA, KLDA, KIsomap and KDIsomap, are given in <xref ref-type="fig" rid="f7-sensors-11-09573">Figure 7</xref>. The best accuracy of each method, with the corresponding embedded dimension, is presented in <xref ref-type="table" rid="t1-sensors-11-09573">Table 1</xref>. From the results in <xref ref-type="fig" rid="f7-sensors-11-09573">Figure 7</xref> and <xref ref-type="table" rid="t1-sensors-11-09573">Table 1</xref>, we can make four observations. Firstly, KDIsomap achieves the highest accuracy of 81.59% with 20 embedded features, outperforming the other methods, <italic>i.e</italic>., PCA, LDA, KPCA, KLDA and KIsomap. This shows that KDIsomap is able to extract the most discriminative low-dimensional embedded data representations for facial expression recognition. This can be attributed to the good property of KDIsomap for classification, <italic>i.e.</italic>, the interclass scatter is maximized while the intraclass scatter is simultaneously minimized. Secondly, KIsomap performs worst and obtains the lowest accuracy of 69.52%. The main reason is that KIsomap does not consider the class label information of the data sets. In contrast, KDIsomap performs best. This reveals that KDIsomap makes an obvious improvement over KIsomap owing to its supervised learning ability. Thirdly, the two kernel methods, KPCA and KLDA, slightly outperform the corresponding non-kernel methods, <italic>i.e</italic>., PCA and LDA. This demonstrates the effectiveness of kernel methods, which can exploit kernel-based learning to capture higher-order information in the input data. Finally, there is no significant improvement in facial expression recognition performance when more embedded feature dimensions are used. This suggests that confining the embedded target dimension to the range [2,100] is acceptable in our experiments.</p>
<p>The recognition accuracy of 81.59% with basic LBP features and the nearest neighbor classifier is very encouraging compared with the previously reported work [<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>] on the JAFFE database. In [<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], using experimental settings similar to ours, the authors selected the most discriminative LBP histograms using AdaBoost for better facial representation. Based on these boosted-LBP features and SVM, they reported 7-class facial expression recognition accuracies of 79.8%, 79.8% and 81.0% for linear, polynomial and radial basis function (RBF) kernels, respectively. In this study, however, we did not use boosted-LBP features or SVM. To further compare the performance of KDIsomap with the work in [<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], it would be interesting to explore the performance of boosted-LBP features and SVM integrated with KDIsomap in our future work.</p>
<p>To further explore the recognition accuracy per expression when KDIsomap performs best, <xref ref-type="table" rid="t2-sensors-11-09573">Table 2</xref> gives the confusion matrix of the 7-class facial expression recognition results obtained by KDIsomap. From <xref ref-type="table" rid="t2-sensors-11-09573">Table 2</xref> we can see that two expressions, <italic>i.e.</italic>, anger and joy, are classified well with an accuracy of more than 90%, while the other five expressions are discriminated with relatively low accuracy (less than 90%). In particular, sadness is recognized with the lowest accuracy of 61.88%, since sadness is often confused with neutral and fear.</p></sec>
<sec sec-type="methods|results">
<label>5.3.</label>
<title>Experimental Results on the Cohn-Kanade Database</title>
<p><xref ref-type="fig" rid="f8-sensors-11-09573">Figure 8</xref> presents the recognition performance of different dimensionality reduction methods on the Cohn-Kanade database. <xref ref-type="table" rid="t3-sensors-11-09573">Table 3</xref> shows the best accuracy for different methods with corresponding embedded dimension. The results in <xref ref-type="fig" rid="f8-sensors-11-09573">Figure 8</xref> and <xref ref-type="table" rid="t3-sensors-11-09573">Table 3</xref> indicate that KDIsomap still obtains a recognition performance superior to that of other methods. In detail, among all used methods KDIsomap achieves the highest accuracy of 94.88% with 30 embedded features, whereas KIsomap gives the lowest accuracy of 75.81% with 40 embedded features. Again, this demonstrates the effectiveness of KDIsomap.</p>
<p><xref ref-type="table" rid="t4-sensors-11-09573">Table 4</xref> gives the confusion matrix of the 7-class expression recognition results when KDIsomap obtains the best performance. As shown in <xref ref-type="table" rid="t4-sensors-11-09573">Table 4</xref>, all seven facial expressions except sadness are identified very well, with an accuracy of over 90%.</p>
<p>Compared with the previously reported work [<xref ref-type="bibr" rid="b11-sensors-11-09573">11</xref>,<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], which used experimental settings similar to ours, the recognition performance of 94.88% is highly competitive. In [<xref ref-type="bibr" rid="b11-sensors-11-09573">11</xref>], the authors used LBP-based template matching on 7-class facial expression recognition tasks and obtained an accuracy of 79.1%. They also employed an LBP-based SVM classifier, which gave accuracies of 87.2%, 88.4% and 87.6% with linear, polynomial and RBF kernels, respectively. In [<xref ref-type="bibr" rid="b12-sensors-11-09573">12</xref>], based on boosted-LBP features and SVM, the authors reported accuracies of 91.1%, 91.1% and 91.4% on 7-class facial expression recognition tasks with linear, polynomial and RBF kernels, respectively.</p></sec>
<sec>
<label>5.4.</label>
<title>Computational and Memory Complexity Comparison</title>
<p>The computational and memory complexity of a dimensionality reduction method is mainly determined by the target embedded feature dimensionality <italic>d</italic> and the number of training data points <italic>n</italic> (<italic>d</italic> &lt; <italic>n</italic>). <xref ref-type="table" rid="t5-sensors-11-09573">Table 5</xref> presents a comparison of the computational and memory complexity of the different dimensionality reduction methods. The most computationally demanding part of PCA and LDA is the eigenanalysis of a <italic>d</italic> × <italic>d</italic> matrix, performed using a power method in <italic>O</italic>(<italic>d</italic><sup>3</sup>). The corresponding memory requirement of PCA and LDA is <italic>O</italic>(<italic>d</italic><sup>2</sup>). For the kernel methods, including KPCA, KLDA, KIsomap and KDIsomap, an eigenanalysis of an <italic>n</italic> × <italic>n</italic> matrix is performed using a power method in <italic>O</italic>(<italic>n</italic><sup>3</sup>), so their computational complexity is <italic>O</italic>(<italic>n</italic><sup>3</sup>). Since a full <italic>n</italic> × <italic>n</italic> kernel matrix is stored when performing KPCA, KLDA, KIsomap and KDIsomap, the memory complexity of these kernel methods is <italic>O</italic>(<italic>n</italic><sup>2</sup>). As shown in <xref ref-type="table" rid="t5-sensors-11-09573">Table 5</xref>, the proposed KDIsomap has the same computational and memory complexity as the other kernel methods, <italic>i.e.</italic>, KPCA, KLDA and KIsomap.</p></sec></sec>
<sec sec-type="conclusions">
<label>6.</label>
<title>Conclusions</title>
<p>In this paper, a new kernel-based manifold learning algorithm, called KDIsomap, is proposed for facial expression recognition. KDIsomap has two prominent characteristics. First, as a kernel-based feature extraction method, KDIsomap can extract the nonlinear feature information embedded in a data set, as KPCA and KLDA do. Second, KDIsomap is designed to offer a high discriminating power for its low-dimensional embedded data representations in an effort to improve the performance of facial expression recognition. It is worth pointing out that our work focuses on facial expression recognition using static images from two well-known facial expression databases and does not consider the temporal behavior of facial expressions, which can potentially lead to more robust and accurate classification results. Therefore, exploring the contribution of temporal information to facial expression recognition is an interesting task for our future work.</p></sec></body>
<back>
<ack>
<p>This work is supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1101048 and Grant No. Y1111058.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-sensors-11-09573"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cowie</surname><given-names>R</given-names></name><name><surname>Douglas-Cowie</surname><given-names>E</given-names></name><name><surname>Tsapatsoulis</surname><given-names>N</given-names></name><name><surname>Votsis</surname><given-names>G</given-names></name><name><surname>Kollias</surname><given-names>S</given-names></name><name><surname>Fellenz</surname><given-names>W</given-names></name><name><surname>Taylor</surname><given-names>JG</given-names></name></person-group><article-title>Emotion recognition in human-computer interaction</article-title><source>IEEE Signal Proc. Mag</source><year>2001</year><volume>18</volume><fpage>32</fpage><lpage>80</lpage><pub-id pub-id-type="doi">10.1109/79.911197</pub-id></citation></ref>
<ref id="b2-sensors-11-09573"><label>2.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>Y</given-names></name><name><surname>Kanade</surname><given-names>T</given-names></name><name><surname>Cohn</surname><given-names>J</given-names></name></person-group><article-title>Facial expression analysis</article-title><source>Handbook of Face Recognition</source><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>2005</year><fpage>247</fpage><lpage>275</lpage></citation></ref>
<ref id="b3-sensors-11-09573"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viola</surname><given-names>P</given-names></name><name><surname>Jones</surname><given-names>M</given-names></name></person-group><article-title>Robust real-time face detection</article-title><source>Int. J. Comput. Vis</source><year>2004</year><volume>57</volume><fpage>137</fpage><lpage>154</lpage><pub-id pub-id-type="doi">10.1023/B:VISI.0000013087.49260.fb</pub-id></citation></ref>
<ref id="b4-sensors-11-09573"><label>4.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Turk</surname><given-names>MA</given-names></name><name><surname>Pentland</surname><given-names>AP</given-names></name></person-group><article-title>Face recognition using eigenfaces</article-title><conf-name>Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name><conf-loc>Maui, HI, USA</conf-loc><conf-date>3–6 June 1991</conf-date><fpage>586</fpage><lpage>591</lpage></citation></ref>
<ref id="b5-sensors-11-09573"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belhumeur</surname><given-names>PN</given-names></name><name><surname>Hespanha</surname><given-names>JP</given-names></name><name><surname>Kriegman</surname><given-names>DJ</given-names></name></person-group><article-title>Eigenfaces <italic>vs</italic>. fisherfaces: Recognition using class specific linear projection</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>1997</year><volume>19</volume><fpage>711</fpage><lpage>720</lpage><pub-id pub-id-type="doi">10.1109/34.598228</pub-id></citation></ref>
<ref id="b6-sensors-11-09573"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daugman</surname><given-names>JG</given-names></name></person-group><article-title>Complete discrete 2-d gabor transforms by neural networks for image analysis and compression</article-title><source>IEEE Trans. Acoust. Speech Signal Process</source><year>1988</year><volume>36</volume><fpage>1169</fpage><lpage>1179</lpage><pub-id pub-id-type="doi">10.1109/29.1644</pub-id></citation></ref>
<ref id="b7-sensors-11-09573"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname><given-names>L</given-names></name><name><surname>Bai</surname><given-names>L</given-names></name></person-group><article-title>A review on gabor wavelets for face recognition</article-title><source>Pattern Anal. Appl</source><year>2006</year><volume>9</volume><fpage>273</fpage><lpage>292</lpage><pub-id pub-id-type="doi">10.1007/s10044-006-0033-y</pub-id></citation></ref>
<ref id="b8-sensors-11-09573"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname><given-names>L</given-names></name><name><surname>Bai</surname><given-names>L</given-names></name><name><surname>Fairhurst</surname><given-names>M</given-names></name></person-group><article-title>Gabor wavelets and general discriminant analysis for face identification and verification</article-title><source>Image Vis. Comput</source><year>2007</year><volume>25</volume><fpage>553</fpage><lpage>563</lpage><pub-id pub-id-type="doi">10.1016/j.imavis.2006.05.002</pub-id></citation></ref>
<ref id="b9-sensors-11-09573"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lyons</surname><given-names>MJ</given-names></name><name><surname>Budynek</surname><given-names>J</given-names></name><name><surname>Akamatsu</surname><given-names>S</given-names></name></person-group><article-title>Automatic classification of single facial images</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>1999</year><volume>21</volume><fpage>1357</fpage><lpage>1362</lpage><pub-id pub-id-type="doi">10.1109/34.817413</pub-id></citation></ref>
<ref id="b10-sensors-11-09573"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ojala</surname><given-names>T</given-names></name><name><surname>Pietikäinen</surname><given-names>M</given-names></name><name><surname>Mäenpää</surname><given-names>T</given-names></name></person-group><article-title>Multiresolution gray scale and rotation invariant texture analysis with local binary patterns</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>2002</year><volume>24</volume><fpage>971</fpage><lpage>987</lpage><pub-id pub-id-type="doi">10.1109/TPAMI.2002.1017623</pub-id></citation></ref>
<ref id="b11-sensors-11-09573"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Shan</surname><given-names>C</given-names></name><name><surname>Gong</surname><given-names>S</given-names></name><name><surname>McOwan</surname><given-names>P</given-names></name></person-group><article-title>Robust facial expression recognition using local binary patterns</article-title><conf-name>Proceedings of IEEE International Conference on Image Processing (ICIP)</conf-name><conf-loc>Genoa, Italy</conf-loc><conf-date>11–14 September 2005</conf-date><fpage>370</fpage><lpage>373</lpage></citation></ref>
<ref id="b12-sensors-11-09573"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shan</surname><given-names>C</given-names></name><name><surname>Gong</surname><given-names>S</given-names></name><name><surname>McOwan</surname><given-names>P</given-names></name></person-group><article-title>Facial expression recognition based on local binary patterns: A comprehensive study</article-title><source>Image Vis. Comput</source><year>2009</year><volume>27</volume><fpage>803</fpage><lpage>816</lpage><pub-id pub-id-type="doi">10.1016/j.imavis.2008.08.005</pub-id></citation></ref>
<ref id="b13-sensors-11-09573"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname><given-names>S</given-names></name><name><surname>Bowden</surname><given-names>R</given-names></name></person-group><article-title>Local binary patterns for multi-view facial expression recognition</article-title><source>Comput. Vis. Image. Und</source><year>2011</year><volume>115</volume><fpage>541</fpage><lpage>558</lpage><pub-id pub-id-type="doi">10.1016/j.cviu.2010.12.001</pub-id></citation></ref>
<ref id="b14-sensors-11-09573"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>Y</given-names></name><name><surname>Kanade</surname><given-names>T</given-names></name><name><surname>Cohn</surname><given-names>J</given-names></name></person-group><article-title>Recognizing action units for facial expression analysis</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>2001</year><volume>23</volume><fpage>97</fpage><lpage>115</lpage></citation></ref>
<ref id="b15-sensors-11-09573"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kotsia</surname><given-names>I</given-names></name><name><surname>Pitas</surname><given-names>I</given-names></name></person-group><article-title>Facial expression recognition in image sequences using geometric deformation features and support vector machines</article-title><source>IEEE Trans. Image Process</source><year>2007</year><volume>16</volume><fpage>172</fpage><lpage>187</lpage><pub-id pub-id-type="doi">10.1109/TIP.2006.884954</pub-id><pub-id pub-id-type="pmid">17283776</pub-id></citation></ref>
<ref id="b16-sensors-11-09573"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Donato</surname><given-names>G</given-names></name><name><surname>Bartlett</surname><given-names>M</given-names></name><name><surname>Hager</surname><given-names>J</given-names></name><name><surname>Ekman</surname><given-names>P</given-names></name><name><surname>Sejnowski</surname><given-names>T</given-names></name></person-group><article-title>Classifying facial actions</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>1999</year><volume>21</volume><fpage>974</fpage><lpage>989</lpage><pub-id pub-id-type="doi">10.1109/34.799905</pub-id><pub-id pub-id-type="pmid">21188284</pub-id></citation></ref>
<ref id="b17-sensors-11-09573"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pantic</surname><given-names>M</given-names></name><name><surname>Patras</surname><given-names>I</given-names></name></person-group><article-title>Dynamics of facial expression: Recognition of facial actions and their temporal segments from face profile image sequences</article-title><source>IEEE Trans. Syst. Man Cybern. Part B</source><year>2006</year><volume>36</volume><fpage>433</fpage><lpage>449</lpage><pub-id pub-id-type="doi">10.1109/TSMCB.2005.859075</pub-id></citation></ref>
<ref id="b18-sensors-11-09573"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname><given-names>I</given-names></name><name><surname>Sebe</surname><given-names>N</given-names></name><name><surname>Garg</surname><given-names>A</given-names></name><name><surname>Chen</surname><given-names>LS</given-names></name><name><surname>Huang</surname><given-names>TS</given-names></name></person-group><article-title>Facial expression recognition from video sequences: Temporal and static modeling</article-title><source>Comput. Vis. Image Und</source><year>2003</year><volume>91</volume><fpage>160</fpage><lpage>187</lpage><pub-id pub-id-type="doi">10.1016/S1077-3142(03)00081-X</pub-id></citation></ref>
<ref id="b19-sensors-11-09573"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roweis</surname><given-names>ST</given-names></name><name><surname>Saul</surname><given-names>LK</given-names></name></person-group><article-title>Nonlinear dimensionality reduction by locally linear embedding</article-title><source>Science</source><year>2000</year><volume>290</volume><fpage>2323</fpage><lpage>2326</lpage><pub-id pub-id-type="doi">10.1126/science.290.5500.2323</pub-id><pub-id pub-id-type="pmid">11125150</pub-id></citation></ref>
<ref id="b20-sensors-11-09573"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tenenbaum</surname><given-names>JB</given-names></name><name><surname>de Silva</surname><given-names>V</given-names></name><name><surname>Langford</surname><given-names>JC</given-names></name></person-group><article-title>A global geometric framework for nonlinear dimensionality reduction</article-title><source>Science</source><year>2000</year><volume>290</volume><fpage>2319</fpage><lpage>2323</lpage><pub-id pub-id-type="doi">10.1126/science.290.5500.2319</pub-id><pub-id pub-id-type="pmid">11125149</pub-id></citation></ref>
<ref id="b21-sensors-11-09573"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname><given-names>Y</given-names></name><name><surname>Hu</surname><given-names>C</given-names></name><name><surname>Feris</surname><given-names>R</given-names></name><name><surname>Turk</surname><given-names>M</given-names></name></person-group><article-title>Manifold based analysis of facial expression</article-title><source>Image Vis. Comput</source><year>2006</year><volume>24</volume><fpage>605</fpage><lpage>614</lpage><pub-id pub-id-type="doi">10.1016/j.imavis.2005.08.006</pub-id></citation></ref>
<ref id="b22-sensors-11-09573"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheon</surname><given-names>Y</given-names></name><name><surname>Kim</surname><given-names>D</given-names></name></person-group><article-title>Natural facial expression recognition using differential-AAM and manifold learning</article-title><source>Pattern Recogn</source><year>2009</year><volume>42</volume><fpage>1340</fpage><lpage>1350</lpage><pub-id pub-id-type="doi">10.1016/j.patcog.2008.10.010</pub-id></citation></ref>
<ref id="b23-sensors-11-09573"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname><given-names>H</given-names></name><name><surname>Choi</surname><given-names>S</given-names></name></person-group><article-title>Robust kernel Isomap</article-title><source>Pattern Recogn</source><year>2007</year><volume>40</volume><fpage>853</fpage><lpage>862</lpage><pub-id pub-id-type="doi">10.1016/j.patcog.2006.04.025</pub-id></citation></ref>
<ref id="b24-sensors-11-09573"><label>24.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schölkopf</surname><given-names>B</given-names></name><name><surname>Smola</surname><given-names>A</given-names></name><name><surname>Müller</surname><given-names>K</given-names></name></person-group><article-title>Nonlinear component analysis as a kernel eigenvalue problem</article-title><source>Neural Comput</source><year>1998</year><volume>10</volume><fpage>1299</fpage><lpage>1319</lpage><pub-id pub-id-type="doi">10.1162/089976698300017467</pub-id></citation></ref>
<ref id="b25-sensors-11-09573"><label>25.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Ham</surname><given-names>J</given-names></name><name><surname>Lee</surname><given-names>D</given-names></name><name><surname>Mika</surname><given-names>S</given-names></name><name><surname>Schölkopf</surname><given-names>B</given-names></name></person-group><article-title>A kernel view of the dimensionality reduction of manifolds</article-title><conf-name>Proceedings of the Twenty-First International Conference on Machine Learning</conf-name><conf-loc>Banff, AB, Canada</conf-loc><conf-date>4–8 July 2004</conf-date><fpage>369</fpage><lpage>376</lpage></citation></ref>
<ref id="b26-sensors-11-09573"><label>26.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kanade</surname><given-names>T</given-names></name><name><surname>Tian</surname><given-names>Y</given-names></name><name><surname>Cohn</surname><given-names>J</given-names></name></person-group><article-title>Comprehensive database for facial expression analysis</article-title><conf-name>Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition</conf-name><conf-loc>Grenoble, France</conf-loc><conf-date>26–30 March 2000</conf-date><fpage>46</fpage><lpage>53</lpage></citation></ref>
<ref id="b27-sensors-11-09573"><label>27.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Campadelli</surname><given-names>P</given-names></name><name><surname>Lanzarotti</surname><given-names>R</given-names></name><name><surname>Lipori</surname><given-names>G</given-names></name><name><surname>Salvi</surname><given-names>E</given-names></name></person-group><article-title>Face and facial feature localization</article-title><conf-name>Proceedings of the 13th International Conference on Image Analysis and Processing</conf-name><conf-loc>Cagliari, Italy</conf-loc><conf-date>6–8 September 2005</conf-date><fpage>1002</fpage><lpage>1009</lpage></citation></ref>
<ref id="b28-sensors-11-09573"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahonen</surname><given-names>T</given-names></name><name><surname>Hadid</surname><given-names>A</given-names></name><name><surname>Pietikäinen</surname><given-names>M</given-names></name></person-group><article-title>Face description with local binary patterns: Application to face recognition</article-title><source>IEEE Trans. Pattern Anal. Mach. Intell</source><year>2006</year><volume>28</volume><fpage>2037</fpage><lpage>2041</lpage><pub-id pub-id-type="doi">10.1109/TPAMI.2006.244</pub-id><pub-id pub-id-type="pmid">17108377</pub-id></citation></ref>
<ref id="b29-sensors-11-09573"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baudat</surname><given-names>G</given-names></name><name><surname>Anouar</surname><given-names>F</given-names></name></person-group><article-title>Generalized discriminant analysis using a kernel approach</article-title><source>Neural Comput</source><year>2000</year><volume>12</volume><fpage>2385</fpage><lpage>2404</lpage><pub-id pub-id-type="doi">10.1162/089976600300014980</pub-id><pub-id pub-id-type="pmid">11032039</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-sensors-11-09573" position="float">
<label>Figure 1.</label>
<caption>
<p>An example of the basic LBP operator.</p></caption>
<graphic xlink:href="sensors-11-09573f1.gif"/></fig>
<fig id="f2-sensors-11-09573" position="float">
<label>Figure 2.</label>
<caption>
<p>Examples of the extended LBP operator with different (<italic>P</italic>, <italic>R</italic>) values.</p></caption>
<graphic xlink:href="sensors-11-09573f2.gif"/></fig>
<fig id="f3-sensors-11-09573" position="float">
<label>Figure 3.</label>
<caption>
<p>Examples of facial expression images from the JAFFE database.</p></caption>
<graphic xlink:href="sensors-11-09573f3.gif"/></fig>
<fig id="f4-sensors-11-09573" position="float">
<label>Figure 4.</label>
<caption>
<p>Examples of facial expression images from the Cohn-Kanade database.</p></caption>
<graphic xlink:href="sensors-11-09573f4.gif"/></fig>
<fig id="f5-sensors-11-09573" position="float">
<label>Figure 5.</label>
<caption>
<p>(<bold>a</bold>) Location of the two eyes in an original image from the Cohn-Kanade database. (<bold>b</bold>) The final cropped image of 110 × 150 pixels.</p></caption>
<graphic xlink:href="sensors-11-09573f5.gif"/></fig>
<fig id="f6-sensors-11-09573" position="float">
<label>Figure 6.</label>
<caption>
<p>The basic system structure for facial expression recognition experiments using dimensionality reduction methods.</p></caption>
<graphic xlink:href="sensors-11-09573f6.gif"/></fig>
<fig id="f7-sensors-11-09573" position="float">
<label>Figure 7.</label>
<caption>
<p>Performance comparisons of different methods on the JAFFE database.</p></caption>
<graphic xlink:href="sensors-11-09573f7.gif"/></fig>
<fig id="f8-sensors-11-09573" position="float">
<label>Figure 8.</label>
<caption>
<p>Performance comparisons of different methods on the Cohn-Kanade database.</p></caption>
<graphic xlink:href="sensors-11-09573f8.gif"/></fig>
<table-wrap id="t1-sensors-11-09573" position="float">
<label>Table 1.</label>
<caption>
<p>The best accuracy (%) of the different methods on the JAFFE database, with the corresponding reduced dimension.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"><bold>Method</bold></th>
<th align="center" valign="middle"><bold>PCA</bold></th>
<th align="center" valign="middle"><bold>LDA</bold></th>
<th align="center" valign="middle"><bold>KPCA</bold></th>
<th align="center" valign="middle"><bold>KLDA</bold></th>
<th align="center" valign="middle"><bold>KIsomap</bold></th>
<th align="center" valign="middle"><bold>KDIsomap</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Dimension</td>
<td align="center" valign="top">20</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">40</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">70</td>
<td align="center" valign="top">20</td></tr>
<tr>
<td align="center" valign="top">Accuracy</td>
<td align="center" valign="top">78.09 ± 4.2</td>
<td align="center" valign="top">80.81 ± 3.6</td>
<td align="center" valign="top">78.47 ± 4.0</td>
<td align="center" valign="top">80.93 ± 3.9</td>
<td align="center" valign="top">69.52 ± 4.7</td>
<td align="center" valign="top">81.59 ± 3.5</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-sensors-11-09573" position="float">
<label>Table 2.</label>
<caption>
<p>Confusion matrix of 7-class facial expression recognition results obtained by KDIsomap on the JAFFE database.</p></caption>
<table frame="below" rules="groups">
<thead>
<tr>
<th align="center" valign="bottom"/>
<th colspan="7" align="center" valign="bottom">
<hr/></th></tr>
<tr>
<th align="left" valign="top"/>
<th align="center" valign="top"><bold>Anger (%)</bold></th>
<th align="center" valign="top"><bold>Joy (%)</bold></th>
<th align="center" valign="top"><bold>Sadness (%)</bold></th>
<th align="center" valign="top"><bold>Surprise (%)</bold></th>
<th align="center" valign="top"><bold>Disgust (%)</bold></th>
<th align="center" valign="top"><bold>Fear (%)</bold></th>
<th align="center" valign="top"><bold>Neutral (%)</bold></th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Anger</bold></td>
<td align="center" valign="top"><bold>90.10</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.58</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.32</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.00</td></tr>
<tr>
<td align="left" valign="top"><bold>Joy</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top"><bold>93.54</bold></td>
<td align="center" valign="top">3.12</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.34</td></tr>
<tr>
<td align="left" valign="top"><bold>Sadness</bold></td>
<td align="center" valign="top">6.45</td>
<td align="center" valign="top">3.21</td>
<td align="center" valign="top"><bold>61.88</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.29</td>
<td align="center" valign="top">9.68</td>
<td align="center" valign="top">15.49</td></tr>
<tr>
<td align="left" valign="top"><bold>Surprise</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.13</td>
<td align="center" valign="top">3.54</td>
<td align="center" valign="top"><bold>86.67</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">6.66</td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Disgust</bold></td>
<td align="center" valign="top">7.42</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">3.68</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top"><bold>81.48</bold></td>
<td align="center" valign="top">7.42</td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Fear</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">12.48</td>
<td align="center" valign="top">6.25</td>
<td align="center" valign="top">3.13</td>
<td align="center" valign="top"><bold>78.14</bold></td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Neutral</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">17.23</td>
<td align="center" valign="top">3.45</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top"><bold>79.32</bold></td></tr></tbody></table></table-wrap>
<table-wrap id="t3-sensors-11-09573" position="float">
<label>Table 3.</label>
<caption>
<p>The best accuracy (%) of the different methods on the Cohn-Kanade database, with the corresponding reduced dimension.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"><bold>Method</bold></th>
<th align="center" valign="middle"><bold>PCA</bold></th>
<th align="center" valign="middle"><bold>LDA</bold></th>
<th align="center" valign="middle"><bold>KPCA</bold></th>
<th align="center" valign="middle"><bold>KLDA</bold></th>
<th align="center" valign="middle"><bold>KIsomap</bold></th>
<th align="center" valign="middle"><bold>KDIsomap</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Dimension</td>
<td align="center" valign="top">55</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">60</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">40</td>
<td align="center" valign="top">30</td></tr>
<tr>
<td align="center" valign="top">Accuracy</td>
<td align="center" valign="top">92.43 ± 3.3</td>
<td align="center" valign="top">90.18 ± 3.0</td>
<td align="center" valign="top">92.59 ± 3.6</td>
<td align="center" valign="top">93.32 ± 3.0</td>
<td align="center" valign="top">75.81 ± 4.2</td>
<td align="center" valign="top">94.88 ± 3.1</td></tr></tbody></table></table-wrap>
<table-wrap id="t4-sensors-11-09573" position="float">
<label>Table 4.</label>
<caption>
<p>Confusion matrix of 7-class facial expression recognition results obtained by KDIsomap on the Cohn-Kanade database.</p></caption>
<table frame="below" rules="groups">
<thead>
<tr>
<th align="center" valign="bottom"/>
<th colspan="7" align="center" valign="bottom">
<hr/></th></tr>
<tr>
<th align="left" valign="top"/>
<th align="center" valign="top"><bold>Anger (%)</bold></th>
<th align="center" valign="top"><bold>Joy (%)</bold></th>
<th align="center" valign="top"><bold>Sadness (%)</bold></th>
<th align="center" valign="top"><bold>Surprise (%)</bold></th>
<th align="center" valign="top"><bold>Disgust (%)</bold></th>
<th align="center" valign="top"><bold>Fear (%)</bold></th>
<th align="center" valign="top"><bold>Neutral (%)</bold></th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>Anger</bold></td>
<td align="center" valign="top"><bold>97.60</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0.96</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1.44</td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Joy</bold></td>
<td align="center" valign="top">0.31</td>
<td align="center" valign="top"><bold>95.53</bold></td>
<td align="center" valign="top">0.28</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1.97</td>
<td align="center" valign="top">0.30</td>
<td align="center" valign="top">1.61</td></tr>
<tr>
<td align="left" valign="top"><bold>Sadness</bold></td>
<td align="center" valign="top">2.15</td>
<td align="center" valign="top">1.02</td>
<td align="center" valign="top"><bold>89.84</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">5.76</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1.23</td></tr>
<tr>
<td align="left" valign="top"><bold>Surprise</bold></td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">1.99</td>
<td align="center" valign="top"><bold>97.18</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0.35</td></tr>
<tr>
<td align="left" valign="top"><bold>Disgust</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1.16</td>
<td align="center" valign="top">1.28</td>
<td align="center" valign="top">3.00</td>
<td align="center" valign="top"><bold>94.21</bold></td>
<td align="center" valign="top">0.35</td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Fear</bold></td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0.38</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top"><bold>99.62</bold></td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"><bold>Neutral</bold></td>
<td align="center" valign="top">2.12</td>
<td align="center" valign="top">1.79</td>
<td align="center" valign="top">3.27</td>
<td align="center" valign="top">0.44</td>
<td align="center" valign="top">1.79</td>
<td align="center" valign="top">0.44</td>
<td align="center" valign="top"><bold>90.15</bold></td></tr></tbody></table></table-wrap>
<table-wrap id="t5-sensors-11-09573" position="float">
<label>Table 5.</label>
<caption>
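<p>Computational and memory complexity of the different dimensionality reduction methods, where <italic>n</italic> is the number of training samples and <italic>d</italic> is the dimensionality of the input features.</p></caption>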
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle"><bold>Method</bold></th>
<th align="center" valign="middle"><bold>PCA</bold></th>
<th align="center" valign="middle"><bold>LDA</bold></th>
<th align="center" valign="middle"><bold>KPCA</bold></th>
<th align="center" valign="middle"><bold>KLDA</bold></th>
<th align="center" valign="middle"><bold>KIsomap</bold></th>
<th align="center" valign="middle"><bold>KDIsomap</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Computational complexity</td>
<td align="center" valign="top"><italic>O</italic>(<italic>d</italic><sup>3</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>d</italic><sup>3</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>3</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>3</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>3</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>3</sup>)</td></tr>
<tr>
<td align="center" valign="top">Memory complexity</td>
<td align="center" valign="top"><italic>O</italic>(<italic>d</italic><sup>2</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>d</italic><sup>2</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>2</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>2</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>2</sup>)</td>
<td align="center" valign="top"><italic>O</italic>(<italic>n</italic><sup>2</sup>)</td></tr></tbody></table></table-wrap></sec></back></article>
