
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Facial expression recognition is an interesting and challenging subject. Considering the nonlinear manifold structure of facial images, a new kernel-based manifold learning method, called kernel discriminant isometric mapping (KDIsomap), is proposed. KDIsomap aims to nonlinearly extract the discriminant information by maximizing the interclass scatter while minimizing the intraclass scatter in a reproducing kernel Hilbert space. KDIsomap is used to perform nonlinear dimensionality reduction on the extracted local binary patterns (LBP) facial features, and produces low-dimensional discriminant embedded data representations with striking performance improvement on facial expression recognition tasks. The nearest neighbor classifier with the Euclidean metric is used for facial expression classification. Facial expression recognition experiments are performed on two popular facial expression databases, namely the JAFFE database and the Cohn-Kanade database.

Facial expressions are the facial changes that indicate a person’s internal affective states, intentions or social communications. The human face is thus the predominant mode of expressing and interpreting the affective states of human beings. Automatic facial expression recognition impacts important applications in many areas such as natural human-computer interaction, image retrieval, talking heads and human emotion analysis [

A basic automatic facial expression recognition system generally consists of three steps [

In the second step of an automatic facial expression recognition system, the extracted facial features are represented by a set of high-dimensional data. It is therefore desirable to analyze facial expressions in a low-dimensional subspace rather than in the ambient space. To this end, there are mainly two families of methods: linear and nonlinear. The best-known linear methods are PCA and LDA. However, they are not suitable for representing dynamically changing facial expressions, which can be represented as low-dimensional nonlinear manifolds embedded in a high-dimensional image space [

To overcome the above drawback of manifold learning methods, some kernel-based manifold learning methods, like kernel isometric mapping (KIsomap) [

To overcome the limitations of KIsomap mentioned above, in this paper a new kernel-based feature extraction method, called kernel discriminant Isomap (KDIsomap), is proposed. On the one hand, KDIsomap considers both the intraclass scatter information and the interclass scatter information in a reproducing kernel Hilbert space (RKHS), and thus emphasizes the discriminant information. On the other hand, KDIsomap performs a nonlinear kernel mapping with a kernel function to extract nonlinear features when mapping the input data into a high-dimensional feature space. After extracting LBP features for facial representation, the proposed KDIsomap is used to produce low-dimensional discriminant embedded data representations from the extracted LBP features, with striking performance improvement on facial expression recognition tasks.

The remainder of this paper is organized as follows. The Local Binary Patterns (LBP) operator is described in Section 2. In Section 3, KIsomap is reviewed briefly and the proposed KDIsomap algorithm is presented in detail. In Section 4, two facial expression databases used for experiments are described. Section 5 shows the experiment results and analysis. Finally, the conclusions are given in Section 6.

The original local binary patterns (LBP) [

The limitation of the basic LBP operator is that its small 3 × 3 neighborhood cannot capture dominant features with large-scale structures. To deal with texture at different scales, the operator was later extended to use neighborhoods of different sizes, denoted LBP(P, R), where P is the number of sampling points evenly spaced on a circle of radius R around the center pixel.

After labeling an image with the LBP operator, a histogram of the labeled image f_l(x, y) can be defined as

H_i = Σ_{x,y} I{f_l(x, y) = i}, i = 0, 1, ..., n − 1,

where n is the number of different labels produced by the LBP operator and I{A} equals 1 if A is true and 0 otherwise.

This LBP histogram contains information about the distribution of local micro-patterns, such as edges, spots and flat areas, over the whole image, so it can be used to statistically describe image characteristics. For efficient face representation, face images are equally divided into m small regions R_1, R_2, ..., R_m, and the LBP histograms H_1, H_2, ..., H_m extracted from each region are concatenated into a single, spatially enhanced feature histogram.
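As an illustration of the operator and the regional histogram described above, the following sketch computes basic 3 × 3 LBP labels and concatenates per-region histograms; the 3 × 3 grid and 256-bin (non-uniform) histograms are illustrative choices, not necessarily the configuration used in the experiments:

```python
import numpy as np

def basic_lbp(image):
    """Basic 3x3 LBP: threshold each pixel's 8 neighbors against the
    center pixel and pack the results into an 8-bit label."""
    h, w = image.shape
    labels = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Clockwise neighbor offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        labels |= (neighbor >= center).astype(np.int32) << bit
    return labels

def regional_lbp_histogram(image, grid=(3, 3)):
    """Divide the labeled image into a grid of regions and concatenate
    the normalized per-region 256-bin histograms into one vector."""
    labels = basic_lbp(image)
    feats = []
    for row in np.array_split(labels, grid[0], axis=0):
        for region in np.array_split(row, grid[1], axis=1):
            hist, _ = np.histogram(region, bins=256, range=(0, 256))
            feats.append(hist / max(region.size, 1))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(110, 150))   # stand-in for a cropped face
feat = regional_lbp_histogram(face)
print(feat.shape)  # (2304,) = 9 regions x 256 bins
```

A uniform-pattern LBP operator would shrink each regional histogram from 256 to 59 bins, which is why it is preferred for compact face representations.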

In this section, we review the existing KIsomap algorithm in brief and explain the proposed KDIsomap algorithm in detail.

The approximate geodesic distance matrix used in Isomap [ is not guaranteed to be positive semidefinite, and therefore does not always define a valid Mercer kernel. KIsomap addresses this problem with a constant-shifting method that converts the approximate geodesic distance matrix into a Mercer kernel matrix.

The KIsomap [ algorithm consists of the following six steps.

Step 1: Identify the k nearest neighbors of each input data point and build the corresponding neighborhood graph.

Step 2: Compute the geodesic distances, D_ij, as the lengths of the shortest paths between all pairs of points in the neighborhood graph.

Step 3: Construct a matrix K(D^2) based on the approximate geodesic distance matrix:

K(D^2) = −(1/2)H D^2 H,

where D^2 = [D_ij^2] is the matrix of squared geodesic distances and H = I − (1/N)ee^T is the centering matrix, with e = [1, 1, ..., 1]^T ∈ R^N.

Step 4: Compute the largest eigenvalue, c*, of the constant-shifting eigenproblem associated with K(D^2) and K(D), and form the shifted matrix K̃ = K(D^2) + 2cK(D) + (1/2)c^2 H, which is guaranteed to be positive semidefinite, and hence a Mercer kernel matrix, for any c ≥ c*.

Step 5: Compute the top d eigenvectors and eigenvalues of the resulting kernel matrix, yielding the eigenvector matrix V ∈ R^{N×d} and the diagonal eigenvalue matrix Λ ∈ R^{d×d}.

Step 6: The embedded coordinates of the input points in the d-dimensional space are given by the columns of Y = Λ^{1/2}V^T.
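The steps above can be sketched as follows; this is a plain Isomap-style embedding in which the constant-shifting correction of Step 4 (which guarantees a positive semidefinite kernel) is omitted for brevity, and a synthetic helix stands in for real facial features:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap_embed(X, n_neighbors=6, d=2):
    """Sketch of Steps 1-6: k-NN graph, geodesic distances, double
    centering, top-d eigendecomposition (constant shifting omitted)."""
    N = X.shape[0]
    dist = squareform(pdist(X))
    # Step 1: neighborhood graph (zero entries are read as "no edge").
    graph = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(dist[i])[1:n_neighbors + 1]
        graph[i, idx] = dist[i, idx]
    graph = np.maximum(graph, graph.T)          # symmetrize
    # Step 2: geodesic distances D_ij as shortest-path lengths.
    D = shortest_path(graph, method="D", directed=False)
    # Step 3: K(D^2) = -1/2 H D^2 H with centering matrix H.
    H = np.eye(N) - np.ones((N, N)) / N
    K = -0.5 * H @ (D ** 2) @ H
    # Steps 5-6: top-d eigenpairs give the embedded coordinates.
    w, V = np.linalg.eigh(K)                    # ascending eigenvalues
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))

# A 1-D helix in R^3 as a stand-in for a nonlinear image manifold.
t = np.linspace(0, 3 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
Y = isomap_embed(X, n_neighbors=6, d=2)
print(Y.shape)  # (60, 2)
```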

Fisher’s criterion in LDA [

To develop the KDIsomap algorithm, a kernel matrix is first constructed by performing a nonlinear kernel mapping with a kernel function. Then a kernel discriminant distance, in which the interclass scatter is maximized while the intraclass scatter is simultaneously minimized, is designed to extract the discriminant information in a RKHS.

Given the input data points (x_i, c_i), i = 1, 2, ..., N, where x_i ∈ R^D is a sample and c_i is its class label, KDIsomap aims to find the corresponding low-dimensional embedded data representations y_i ∈ R^d, with d ≪ D. The algorithm proceeds in the following steps.

Step 1: Kernel mapping for each input data point x_i.

A nonlinear mapping function φ maps the input space R^D into a high-dimensional, possibly infinite-dimensional, feature space H.

The input data point x_i ∈ R^D is mapped to φ(x_i), and inner products in the feature space are computed implicitly through a kernel function κ(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩.

Step 2: Find the nearest neighbors of each data point x_i in the feature space.

The kernel Euclidean distance measure induced by a kernel κ is d_κ(x_i, x_j) = sqrt(κ(x_i, x_i) − 2κ(x_i, x_j) + κ(x_j, x_j)).

To preserve the intraclass neighbouring geometry while maximizing the interclass scatter, a kernel discriminant distance in a RKHS is given as follows:

D(x_i, x_j) = sqrt(1 − exp(−d_κ^2(x_i, x_j)/β)), if c_i = c_j,
D(x_i, x_j) = sqrt(exp(d_κ^2(x_i, x_j)/β)) − α, if c_i ≠ c_j,

where d_κ(x_i, x_j) is the kernel Euclidean distance, β is a scaling parameter (e.g., the average of d_κ^2(x_i, x_j) over all pairs of training points), and α ≥ 0 is a constant controlling the separation between classes. Intraclass distances are thus bounded above by 1, shrinking the intraclass scatter, while interclass distances grow with d_κ(x_i, x_j), expanding the interclass scatter.
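A minimal sketch of the kernel mapping and the class-aware distance, assuming a Gaussian kernel and an S-Isomap-style shrink/expand form; the scale `beta` and margin `alpha` are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] - 2 * X @ X.T + sq[None, :], 0)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_discriminant_distance(X, labels, sigma=1.0, alpha=0.5):
    """Kernel-induced Euclidean distance, then a class-aware
    shrink/expand discriminant distance."""
    K = gaussian_kernel(X, sigma)
    diag = np.diag(K)
    # d_k^2(x_i, x_j) = k(x_i, x_i) - 2 k(x_i, x_j) + k(x_j, x_j)
    d2 = np.maximum(diag[:, None] - 2 * K + diag[None, :], 0)
    beta = d2.mean()                              # heuristic scale
    same = labels[:, None] == labels[None, :]
    D = np.where(same,
                 np.sqrt(1 - np.exp(-d2 / beta)),       # shrink intraclass
                 np.sqrt(np.exp(d2 / beta)) - alpha)    # expand interclass
    np.fill_diagonal(D, 0.0)
    return D

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.repeat(np.arange(2), 10)   # two toy classes
D = kernel_discriminant_distance(X, y)
print(D.shape)  # (20, 20)
```

On average, interclass entries of `D` come out larger than intraclass entries, which is exactly the scatter behavior the distance is designed to induce.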

Step 3: Estimate the approximate geodesic distances.

Step 4: Construct a matrix K(D^2) based on the approximate geodesic distance matrix, K(D^2) = −(1/2)H D^2 H, in the same way as in KIsomap.

Step 5: Compute the top d eigenvectors and eigenvalues of the kernel matrix and obtain the low-dimensional embedded data representations, as in Steps 5 and 6 of KIsomap.

Two popular facial expression databases, namely the JAFFE database and the Cohn-Kanade database, are used in our facial expression recognition experiments.

The JAFFE database contains 213 images of female facial expressions. Each image has a resolution of 256 × 256 pixels. The head is almost in frontal pose. The number of images corresponding to each of the seven categories of expressions is roughly the same. A few of them are shown in

The Cohn-Kanade database contains image sequences from 100 university students aged from 18 to 30 years; 65% were female, 15% African-American and 3% Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of prototypic emotions. Image sequences from neutral to target display were digitized into 640 × 490 pixel arrays with 8-bit precision for grayscale values.

Following the setting in [

The cropped facial images of 110 × 150 pixels contain the main facial components such as the mouth, eyes, brows and nose. The LBP operator is applied to the whole region of the cropped facial images. For better uniform-LBP feature extraction, two parameters, the number of sampling points P and the neighborhood radius R, need to be chosen appropriately.

To evaluate the performance of KDIsomap, facial expression recognition experiments were performed separately on the JAFFE database and the Cohn-Kanade database. The performance of KDIsomap is compared with that of PCA, LDA, KPCA, KLDA and KIsomap. The Gaussian kernel κ(x_i, x_j) = exp(−‖x_i − x_j‖^2/(2σ^2)) is adopted for KPCA, KLDA, KIsomap and KDIsomap, and the kernel parameter σ is determined empirically.

A 10-fold cross validation scheme is employed for 7-class facial expression recognition experiments, and the average recognition results are reported. In detail, the data sets are split randomly into ten groups of roughly equal numbers of subjects. Nine groups are used as the training data to train a classifier, while the remaining group is used as the testing data. The above process is repeated ten times for each group in turn to be omitted from the training process. Finally, the average recognition results on the testing data are reported.
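The evaluation protocol above can be sketched in NumPy as follows; random placeholder features stand in for the extracted LBP representations, and a plain random split is used where the paper groups folds by subject:

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbor classification with the Euclidean metric."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

def ten_fold_cv(X, y, n_folds=10, seed=0):
    """Shuffle, split into n_folds groups of roughly equal size, and
    hold each group out in turn as the testing data."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        pred = one_nn_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.mean(pred == y[test_idx])))
    return scores

rng = np.random.default_rng(1)
X = rng.normal(size=(210, 50))        # placeholder for extracted features
y = rng.integers(0, 7, size=210)      # 7 expression classes
scores = ten_fold_cv(X, y)
print(f"mean accuracy over 10 folds: {np.mean(scores):.2%}")
```

In the actual experiments the dimensionality reduction (e.g., KDIsomap) would be fitted on each fold's training data before the 1-NN step.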

In order to clarify the experiment scheme of how to employ dimensionality reduction techniques such as PCA, LDA, KPCA, KLDA, KIsomap and KDIsomap on facial expression recognition tasks,

It can be observed from

The recognition results of different dimensionality reduction methods,

The recognition accuracy of 81.59% with basic LBP features and the nearest neighbor classifier is very encouraging, compared with the previously reported work [

To further explore the recognition accuracy per expression when KDIsomap performs best,

Compared with the previously reported work [

The computational and memory complexity of a dimensionality reduction method is mainly determined by the number of training samples N and the original feature dimensionality D, rather than by the target embedded dimensionality d. PCA and LDA require an eigenanalysis of D × D scatter matrices, so their computational complexity is O(D^3) and the corresponding memory requirement is O(D^2). The kernel methods, including KPCA, KLDA, KIsomap and KDIsomap, require an eigenanalysis of an N × N kernel matrix, which costs O(N^3), so their computational complexity is O(N^3). Since a full N × N kernel matrix must be stored, their memory complexity is O(N^2). As shown in

In this paper, a new kernel-based manifold learning algorithm, called KDIsomap, is proposed for facial expression recognition. KDIsomap has two prominent characteristics. For one thing, as a kernel-based feature extraction method, KDIsomap can extract the nonlinear feature information embedded on a data set, as KPCA and KLDA do. For another, KDIsomap is designed to offer a high discriminating power for its low-dimensional embedded data representations in an effort to improve the performance on facial expression recognition. It’s worth pointing out that in our work we focus on facial expression recognition by using static images from two well-known facial expression databases, but we do not consider the temporal behaviors of facial expressions, which can potentially lead to more robust and accurate classification results. Therefore, it is also an interesting task to explore the performance of temporal information on facial expression recognition in our future work.

This work is supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1101048 and Grant No. Y1111058.

An example of basic LBP operator.

An example of the extended LBP operator with different (P, R) values.

Examples of facial expression images from the JAFFE database.

Examples of facial expression images from the Cohn-Kanade database.


The basic system structure for facial expression recognition experiments using dimensionality reduction methods.

Performance comparisons of different methods on the JAFFE database.

Performance comparisons of different methods on the Cohn-Kanade database.

The best accuracy (%) of different methods on the JAFFE database.

| Method | PCA | LDA | KPCA | KLDA | KIsomap | KDIsomap |
|---|---|---|---|---|---|---|
| Dimension | 20 | 6 | 40 | 6 | 70 | 20 |
| Accuracy | 78.09 ± 4.2 | 80.81 ± 3.6 | 78.47 ± 4.0 | 80.93 ± 3.9 | 69.52 ± 4.7 | 81.59 ± 3.5 |

Confusion matrix of 7-class facial expression recognition results obtained by KDIsomap on the JAFFE database.

| | | | | | | | |
|---|---|---|---|---|---|---|---|
| | 90.10 | 0 | 3.58 | 0 | 3.32 | 0 | 3.00 |
| | 0 | 93.54 | 3.12 | 0 | 0 | 0 | 3.34 |
| | 6.45 | 3.21 | 61.88 | 0 | 3.29 | 9.68 | 15.49 |
| | 0 | 3.13 | 3.54 | 86.67 | 0 | 6.66 | 0 |
| | 7.42 | 0 | 3.68 | 0 | 81.48 | 7.42 | 0 |
| | 0 | 0 | 12.48 | 6.25 | 3.13 | 78.14 | 0 |
| | 0 | 0 | 17.23 | 3.45 | 0 | 0 | 79.32 |

The best accuracy (%) of different methods on the Cohn-Kanade database.

| Method | PCA | LDA | KPCA | KLDA | KIsomap | KDIsomap |
|---|---|---|---|---|---|---|
| Dimension | 55 | 6 | 60 | 6 | 40 | 30 |
| Accuracy | 92.43 ± 3.3 | 90.18 ± 3.0 | 92.59 ± 3.6 | 93.32 ± 3.0 | 75.81 ± 4.2 | 94.88 ± 3.1 |

Confusion matrix of 7-class facial expression recognition results obtained by KDIsomap on the Cohn-Kanade database.

| | | | | | | | |
|---|---|---|---|---|---|---|---|
| | 97.60 | 0 | 0.96 | 0 | 0 | 1.44 | 0 |
| | 0.31 | 95.53 | 0.28 | 0 | 1.97 | 0.30 | 1.61 |
| | 2.15 | 1.02 | 89.84 | 0 | 5.76 | 0 | 1.23 |
| | 0.24 | 0.24 | 1.99 | 97.18 | 0 | 0 | 0.35 |
| | 0 | 1.16 | 1.28 | 3.00 | 94.21 | 0.35 | 0 |
| | 0 | 0 | 0 | 0.38 | 0 | 99.62 | 0 |
| | 2.12 | 1.79 | 3.27 | 0.44 | 1.79 | 0.44 | 90.15 |

Computational and memory complexity of different dimensionality reduction methods.

| Method | PCA | LDA | KPCA | KLDA | KIsomap | KDIsomap |
|---|---|---|---|---|---|---|
| Computational complexity | O(D^3) | O(D^3) | O(N^3) | O(N^3) | O(N^3) | O(N^3) |
| Memory complexity | O(D^2) | O(D^2) | O(N^2) | O(N^2) | O(N^2) | O(N^2) |