Facial Expression Recognition Based on Discrete Separable Shearlet Transform and Feature Selection

Abstract: In this paper, a novel approach to facial expression recognition based on the discrete separable shearlet transform (DSST) and normalized mutual information feature selection is proposed. The approach can be divided into five steps. First, all test and training images are preprocessed. Second, DSST is applied to the preprocessed facial expression images, and all the transformation coefficients are obtained as the original feature set. Third, an improved normalized mutual information feature selection is proposed to find the optimal feature subset of the original feature set, so that the key classification information of the original data is retained. Fourth, after feature extraction and selection, the feature space is reduced by employing linear discriminant analysis. Finally, a support vector machine is used to recognize the expressions. In this study, experimental verification was carried out on four open facial expression databases. The results show that this method can not only improve the recognition rate of facial expressions, but also significantly reduce the computational complexity and improve the system efficiency.


Introduction
Facial expression recognition (FER) is attracting a lot of research attention because of its prospective applications in human-computer interaction and intelligent transportation systems [1][2][3][4]. Researchers have found that facial expressions can be analyzed to identify emotions, behavioral information, and psychological activities. In the field of multimedia, if an effective FER system can recognize users' facial expressions in real time, it can feed the changes in those expressions back to the multimedia system, which can then provide different multimedia content to its users [5][6][7][8]. Therefore, facial expressions not only reflect the inner thoughts of human beings but are also an indispensable part of interpersonal communication [9].
FER usually involves three steps: preprocessing the image, extracting the facial expression features, and training and recognizing the expression feature model. Facial expression feature extraction is the most important part of an FER system, and effective feature extraction greatly improves the recognition performance. Many algorithms have been developed for this purpose. The study in [10] employed biorthogonal wavelet entropy to extract multiscale features and used a fuzzy multiclass support vector machine as the classifier; its experimental results demonstrate the effectiveness of that algorithm. The study in [11] proposed a facial emotion recognition method based on discrete wavelet transform, principal component analysis, and cat swarm optimization. The discrete wavelet transform was used to extract wavelet coefficients, principal component analysis was used to reduce the features, and a single-hidden-layer neural network was used as the classifier. Its experimental results demonstrate the feasibility of that algorithm.
The Gabor transform is widely used in pattern recognition fields such as image processing and feature extraction [12]. The Gabor wavelet transform is an important feature extraction tool because of its multiresolution analysis. After the original image is decomposed by Gabor wavelets at different decomposition scales, we can obtain the approximation and detail information of the target image at different levels. Unfortunately, for FER, the Gabor wavelet transform still has defects. Because the two-dimensional (2D) separable wavelet formed by the one-dimensional (1D) wavelet transform has limited directions, it cannot best represent high-dimensional features with line or surface singularities. In fact, features with line or surface singularities are very common in facial expression images, such as the outlines of the mouth and the eyes. These are remarkable and precise features of human facial expressions. To overcome this limitation, multiscale geometric analysis tools have been developed, such as the curvelet transform [13] and the shearlet transform [14]. Although curvelet basis functions are advantageous for approximating linear singularities, the edge of a facial expression image is usually a curve rather than a straight line, so the curvelet transform has limitations in expression recognition. The discrete separable shearlet transform (DSST) is a multiscale geometric framework for image analysis [15]. It can better detect edge and detail information because it is multidirectional, multiscale, localized, and anisotropic. However, the number of coefficients of an image after DSST is large, so it is not desirable to use all the transformation coefficients as the expression feature set. Therefore, it is necessary to find the optimal feature subset of the original feature set so as to retain the key classification information of the original data and improve the identification efficiency of the system. In this paper, a novel approach to facial expression recognition based on DSST and normalized mutual information feature selection is proposed.
The remainder of this paper is organized as follows: Section 2 describes the theoretical methods adopted in this study, Section 3 describes the methodology, Section 4 presents the experimental results and discussions, and finally, Section 5 presents the conclusions and future work.

Theoretical Analysis
DSST is a multiscale geometric framework for image analysis that is designed to represent information not only across several scales but also across several orientations, so that it can efficiently represent geometric features such as edges and other landmarks in images. Compared with the shearlet transform [14], the construction of DSST is simpler [15]. It is multidirectional, multiscale, localized, and anisotropic. These properties enable DSST to detect edges and other elongated geometric features very effectively. Such features occupy the dominant position in facial expression images.
The traditional shearlet transform (ST) was introduced in [14]. Compared with ST, an advantage of DSST is that the separable scaling function, $\phi \in L^2(\mathbb{R})$, and separable shearlet generators, $\psi^{(0)}, \psi^{(1)} \in L^2(\mathbb{R}^2)$, can be selected. First, we define the following: (1) We construct the separable shearlet generators, $\psi \in L^2(\mathbb{R}^2)$, and the related scaling function, $\phi \in L^2(\mathbb{R})$, on the horizontal cone. The vertical cone is computed similarly. We assume that $\phi \in L^2(\mathbb{R})$ is a compactly supported scaling function, and Additionally, we define the compactly supported wavelet function as: Thus, the shearlet generator function can be expressed as: Moreover, the scaling function can be expressed as: where $h_0$ and $g$ are conjugate mirror filters for a fixed $J > 0$. Thus, we can write: where $f_J(n) = f(2^{-J} n)$. From the above discussion, we conclude that the DSST $\langle f, \psi_{j,k,m} \rangle$ $(j = 0, 1, \cdots, J-1)$ can be calculated as: where $*_1$ is the 1D convolution, $\downarrow 2^{j}$ is down-sampling by $2^{j}$ along the horizontal axis, $S_k$ is a shearing sampling matrix, $\Phi_k(n)$ comprises the filter coefficients, and $w_{j_1,j_2}(c)$, $j_1, j_2 > 0$, $c \in \ell(\mathbb{Z}^2)$, is the separable wavelet transform.

Framework of the Algorithm
Figure 1 presents the flowchart of our proposed method. It is summarized as follows:
1. To reduce the computational complexity and satisfy the requirement of DSST for the input image size, the original image is first preprocessed.
2. After preprocessing, DSST is applied to the test and training images, and all DSST coefficients are extracted as the original expression feature set.
3. The improved normalized mutual information feature selection method proposed in this paper is used to find the optimal feature subset of the original feature set. The feature subset retains the key classification information of the original set.
4. After the feature extraction and selection, the feature space is reduced by employing linear discriminant analysis (LDA).
5. The support vector machine (SVM) is used to recognize the expressions.

Preprocessing
The original images in a dataset usually vary in size and contain too much redundant information. Moreover, the expression is mainly reflected by the eyes, nose, and mouth area, whereas the surrounding area contributes little. Therefore, it is unnecessary to extract features from the whole image, and processing redundant information only increases the workload of the system. Thus, image preprocessing is necessary. Normalization and equalization were performed on the original images. We followed the preprocessing method given by [16]. The faces were detected, and all images were normalized to gray-level images of size 64 × 64 pixels.
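To make the equalization step concrete, the following is a minimal sketch of histogram equalization on an 8-bit gray-level image. It is only an illustration: the exact procedure of [16] is not reproduced here, and the function name `equalize_histogram` and the list-of-rows image representation are assumptions made for this example.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a gray-level image given as a list of rows of ints.

    Illustrative sketch only; it is not claimed to match the method of [16].
    """
    pixels = [p for row in image for p in row]

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map the darkest occupied level to 0 and the brightest to levels - 1.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


small = [[50, 50], [100, 150]]
equalized = equalize_histogram(small)
print(equalized)
```

After equalization the occupied gray levels are spread over the full 0 to 255 range, which reduces the effect of differing illumination intensity before the images are resized to 64 × 64 pixels.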


Discrete Separable Shearlet Transform
Given an $N \times N$ image, the DSST procedure described in Section 2 for a fixed decomposition scale $j$ can be summarized as follows; the procedure is shown in Figure 2.

1. $f_J$ is up-sampled by $2^{j/2}$ to obtain $f_{J \uparrow 2^{j/2}}$.
2. Compute the 1D convolution of $f_{J \uparrow 2^{j/2}}$ and $h^{0}_{j/2}$, where $h^{0}_{j/2}$ is a 1D low-pass filter, to obtain $\tilde{f}_J$.
3. $\tilde{f}_J$ is resampled by the shear matrix $S_k$ to obtain $\tilde{f}_J(S_k(n))$.
5. Use the separable wavelet transform and proceed through all the scales $j$, $j = 0, 1, \cdots, J-1$.

DSST has no limitation on the number of directions. If we define the direction number as $L$, the feature vectors will still have $\left[(N/2^{j})^{2} + N^{2}/2^{j}\right] \times L$ elements, where $N$ is the size of the input image. For example, with $N = 64$, $j = 2$, and $L = 6$, the number of feature elements is $\left[(64/2^{2})^{2} + 64^{2}/2^{2}\right] \times 6 = 7680$, which is still a high dimension for subsequent processing and recognition. Therefore, in this paper, we propose an improved method for normalized mutual information feature selection to find the optimal feature subset of the original feature set. The feature subset not only retains the key classification information of the original set, but also reduces the amount of data.
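As a quick check on the feature count quoted above, the arithmetic can be reproduced in a few lines. This sketch assumes the terms group as $[(N/2^{j})^{2} + N^{2}/2^{j}] \times L$, which is the grouping that reproduces the quoted total of 7680; the function name `dsst_feature_count` is hypothetical.

```python
def dsst_feature_count(n, j, directions):
    """Number of feature elements for an n x n image at scale j with L directions.

    Assumes the grouping [(n / 2**j)**2 + n**2 / 2**j] * L.
    """
    coarse_term = (n // 2**j) ** 2   # (N / 2^j)^2
    detail_term = n**2 // 2**j       # N^2 / 2^j
    return (coarse_term + detail_term) * directions


print(dsst_feature_count(64, 2, 6))  # 7680
```

Because the count grows linearly with the number of directions, adding directions quickly inflates the feature vector, which is the motivation for the feature selection step that follows.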

Feature Selection
Battiti [17] defined the feature selection problem as the process of selecting the $k$ most relevant features from an initial set of features, and proposed a greedy selection method to solve it. Ideally, the problem can be solved by maximizing $MI(C; W)$, the joint mutual information between the class variable $C$ and the subset of selected features $W$. Pablo et al. [18] used an incremental search pattern to solve the problem of feature selection. They redefined a criterion function $G$ in terms of normalized mutual information, where $NMI(c_j; t_i)$ represents the normalized mutual information of $c_j$ and $t_i$, and $NMI(t_i; t_w)$ represents the normalized mutual information of $t_i$ and $t_w$. $NMI(t_i; t_w)$ can be calculated as:

$$NMI(t_i; t_w) = \frac{MI(t_i; t_w)}{\min[H(t_i), H(t_w)]} \qquad (12)$$

where $H(t_i)$ is the information entropy of $t_i$ and $H(t_w)$ is the information entropy of $t_w$. However, the above method has the drawback of unequal weights in normalization, since the weight $\min[H(t_i), H(t_w)]$ depends on the variables $t_i$ and $t_w$. To eliminate this drawback, an improved algorithm is proposed in this paper. Here, every feature $t_i$ is quantized with the same number of levels ($N$), which is chosen to achieve the expected quantization error. Algorithm 1 illustrates the quantization algorithm, where $t_i$ represents the original feature; Upper and Lower represent the maximum and minimum values of the original feature, respectively; Step stands for the quantization step; Partition is the segmentation vector, which represents the boundaries of the quantization ranges; $y_i$ represents the quantized value of the original feature; Quantiz is a quantization function defined in MATLAB; and Codebook represents the set of quantized values.
The number of quantization levels progressively increases until the quantization error is smaller than a predefined constant $\xi$, which is the expected quantization error. Here, we used $\xi = 0.05$.
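The quantization loop described above can be sketched as follows. This is a hedged illustration rather than Algorithm 1 itself: it assumes a uniform quantizer between Lower and Upper, cell midpoints as the codebook, and a mean absolute error normalized by the feature range as the quantization error (the text does not state which error measure is used). The function names are hypothetical.

```python
from bisect import bisect_left


def uniform_quantize(values, levels):
    """Quantize values into `levels` uniform cells between their min and max.

    Returns (indices, codebook). The index of a value is the number of
    partition boundaries strictly below it.
    """
    lower, upper = min(values), max(values)
    if upper == lower:
        return [0] * len(values), [lower]
    step = (upper - lower) / levels
    partition = [lower + step * k for k in range(1, levels)]        # cell boundaries
    codebook = [lower + step * (k + 0.5) for k in range(levels)]    # cell midpoints
    indices = [bisect_left(partition, v) for v in values]
    return indices, codebook


def quantize_to_tolerance(values, xi=0.05, max_levels=1024):
    """Increase the number of levels until the assumed error drops below xi."""
    span = max(values) - min(values)
    for levels in range(2, max_levels + 1):
        indices, codebook = uniform_quantize(values, levels)
        if span == 0:
            return levels, indices
        # Assumed error measure: mean absolute error relative to the range.
        error = sum(abs(v - codebook[i]) for v, i in zip(values, indices))
        error /= len(values) * span
        if error < xi:
            return levels, indices
    raise ValueError("tolerance not reached within max_levels")


feature = [i / 10 for i in range(11)]
levels, indices = quantize_to_tolerance(feature, xi=0.05)
print(levels, indices)
```

Since the text requires the same number of levels $N$ for every feature, one option (again an assumption) is to run this search per feature and use the largest level count found.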
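Clearly, $|\Omega_T| = N$, where $\Omega_T$ is the alphabet of the variable $T$, and the entropy function $H(T)$ of $T$ satisfies Jensen's inequality [19]:

$$H(T) = \sum_{i=1}^{N} p(t_i) \log_2 \frac{1}{p(t_i)} \le \log_2\left(\sum_{i=1}^{N} p(t_i) \frac{1}{p(t_i)}\right) = \log_2(N) \qquad (15)$$

Here, $p(t_i)$ is the probability distribution of $t_i$, so $H(T) \le \log_2(N)$. The joint mutual information of $T$ and $Y$ is computed as:

$$MI(T; Y) = H(T) - H(T \mid Y) = H(Y) - H(Y \mid T) \qquad (16)$$

Hence, it is clear from (15) and (16) that:

$$MI(T; Y) \le \min\{H(T), H(Y)\} \le \log_2(N) \qquad (17)$$

where $\log_2(N)$ is the upper bound of the mutual information $MI(T; Y)$ and depends on neither $T$ nor $Y$. To eliminate the drawback of unequal normalization weights, we propose to use (17) to normalize the mutual information instead of (12) in [18].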
Hence, the improved criterion function $G$ of the feature selection problem based on normalized mutual information, Equation (18), is obtained from the criterion of [18] by replacing the normalization weight $\min[H(t_i), H(t_w)]$ with the constant $\log_2(N)$. The improved feature selection algorithm based on normalized mutual information can be summarized as follows:
1. Initialization: take $T = \{t_i \mid i = 1, 2, \cdots, M\}$ as the original set of features, and initialize $W$ as an empty set.
2. Calculate the joint mutual information of each feature and the class: $MI(C; t_i)$.
3. Find the first selected feature: find the feature $t_i$ that maximizes $MI(C; t_i)$, delete $t_i$ from set $T$, and add $t_i$ to set $W$, i.e., $T = T \setminus \{t_i\}$, $W = W \cup \{t_i\}$.
4. Repeat the following procedure until $|W| = k$: (a) compute the feature-feature mutual information $MI(t_i; t_w)$ for each $t_i \in T$ and $t_w \in W$; (b) find the next selected feature, i.e., the $t_i \in T$ that maximizes the criterion function $G$ shown in (18); (c) delete $t_i$ from set $T$ and add $t_i$ to set $W$, i.e., $T = T \setminus \{t_i\}$, $W = W \cup \{t_i\}$.
5. Output the set $W$, which contains the $k$ selected features and is the most relevant feature subset of the initial set of $M$ features.
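The greedy procedure above can be sketched in a few lines. The exact form of the criterion in (18) is not reproduced in this text, so the sketch assumes $G = MI(C; t_i) - \frac{1}{|W|}\sum_{t_w \in W} MI(t_i; t_w)/\log_2(N)$, i.e., relevance minus average redundancy normalized by the constant $\log_2(N)$. The function names and the dictionary-based feature layout are hypothetical, and the features are assumed to be already quantized.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(features, labels, k, levels):
    """Greedy selection of k features.

    features: dict mapping a feature name to its list of quantized values.
    levels:   number of quantization levels N, used in the log2(N) normalizer.
    Assumed criterion: relevance minus mean redundancy divided by log2(N).
    """
    remaining = dict(features)
    relevance = {name: mutual_information(col, labels)
                 for name, col in remaining.items()}

    # Step 3: the first feature is the one most informative about the class.
    first = max(relevance, key=relevance.get)
    selected = [first]
    del remaining[first]

    norm = log2(levels)
    # Step 4: add the feature with the best criterion value until |W| = k.
    while len(selected) < k and remaining:
        def criterion(name):
            redundancy = sum(
                mutual_information(remaining[name], features[w])
                for w in selected
            ) / (norm * len(selected))
            return relevance[name] - redundancy

        best = max(remaining, key=criterion)
        selected.append(best)
        del remaining[best]
    return selected


labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "a": [c // 2 for c in labels],  # high bit of the class label
    "b": [c // 2 for c in labels],  # exact copy of "a" (fully redundant)
    "c": [c % 2 for c in labels],   # low bit, independent of "a"
}
print(select_features(features, labels, k=2, levels=2))  # ['a', 'c']
```

In this toy example all three features are equally relevant to the class, but the duplicate "b" is penalized for its redundancy with "a", so the complementary feature "c" is chosen second.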

Dimension Reduction Based on Linear Discriminant Analysis
Dimension reduction can not only reduce data dimensions, but also extract effective information and discard useless information. Two well-known methods used for dimension reduction of a feature space are principal component analysis (PCA) [20] and linear discriminant analysis (LDA) [21]. In this study, LDA is used for data dimension reduction.
The within-class scatter matrix $U_A$ and between-class scatter matrix $U_B$ are defined as follows:

$$U_A = \sum_{j=1}^{C} \sum_{m_k \in c_j} (m_k - m_j)(m_k - m_j)^{T}, \qquad U_B = \sum_{j=1}^{C} n_j (m_j - m)(m_j - m)^{T}$$

where $n_j$ is the number of vectors in the $j$th class $c_j$, $C$ is the number of classes (here, $C$ represents 6 facial expressions), $m$ is the mean of all the vectors, $m_j$ is the mean of the class $c_j$, and $m_k$ is a vector of a specific class. Therefore, the optimal discrimination projection matrix of LDA can be written as:

$$D_{opt} = \arg\max_{D} \frac{\left|D^{T} U_B D\right|}{\left|D^{T} U_A D\right|}$$

The size of $D_{opt}$ is $o \times r$, where $o \le C - 1$ and $r$ is the number of elements in a vector. When the within-class scatter matrix $U_A$ is nonsingular, according to the Lagrange multiplier method, each column vector $d$ of the optimal projection matrix $D_{opt}$ satisfies the characteristic equation

$$U_B d = \lambda U_A d \qquad (21)$$

and we get $D_{opt} = \arg\max|\lambda|$. Therefore, we only need to keep the eigenvectors that correspond to the $o$ eigenvalues with the largest absolute values, and the space spanned by these $o$ eigenvectors is the low-dimensional space we are looking for. Because each term $(m_j - m)(m_j - m)^{T}$ in $U_B$ has rank 1, the rank of $U_B$ is at most $C$. However, because the overall mean $m$ is a weighted combination of the class means, the last difference $m_C - m$ can be linearly represented by the first $C - 1$ differences $m_j - m$. Thus, the maximum rank of $U_B$ is $C - 1$, so there are at most $C - 1$ eigenvectors with nonzero eigenvalues, and $o \le C - 1$. In this way, LDA maximizes the between-class scattering of the data while minimizing the within-class scattering.
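The scatter matrices and the eigenvalue problem described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the function name `lda_projection` is hypothetical, the projection is returned with one column per discriminant direction, and a pseudo-inverse is used so that the example still runs if $U_A$ happens to be singular.

```python
import numpy as np


def lda_projection(X, y, n_components):
    """Return an (r, n_components) matrix whose columns are LDA directions.

    X: (num_samples, r) feature matrix, y: class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    r = X.shape[1]
    overall_mean = X.mean(axis=0)

    U_A = np.zeros((r, r))  # within-class scatter
    U_B = np.zeros((r, r))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        U_A += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        U_B += len(Xc) * (diff @ diff.T)

    # Solve U_B d = lambda U_A d through pinv(U_A) @ U_B (assumption: the
    # pseudo-inverse stands in for the inverse when U_A is ill-conditioned).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(U_A) @ U_B)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real


# Toy data: three well-separated classes in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 0, 0, 0], [0, 5, 0, 0]], dtype=float)
y = np.repeat(np.arange(3), 30)
X = centers[y] + 0.1 * rng.standard_normal((90, 4))

D = lda_projection(X, y, n_components=2)  # at most C - 1 = 2 useful directions
Z = X @ D
print(D.shape, Z.shape)
```

With six expression classes, at most five such directions carry discriminative information, which is why the reduced feature space can be much smaller than the selected feature subset.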

Facial Expression Recognition and Classification Using Support Vector Machine
In machine learning, SVM [22] uses a kernel function to map the data in an input space to a high-dimensional feature space in which the problem can be handled in linear form. The steps of facial expression recognition based on SVM are as follows:
1. Selection of the kernel function: in this method, we choose the radial basis function (RBF) as the kernel function, as shown in (22). The reasons for choosing RBF are that it can realize nonlinear mapping and solve nonlinearly separable problems, and that, compared with other kernel functions, it has only one parameter, σ, so its model complexity is lower.
2. Selection of the RBF parameter σ and penalty coefficient E: when using the RBF kernel function, the values of two parameters, σ and E, must be chosen. Because there is no theoretical guidance on how to select these values, we used Dr. Lin's tool grid.py in LibSVM to select them, which gave E = 128 and σ = 0.0078125.
3. Construction of the multiclass SVM classifier: we adopt the one-against-one voting strategy of SVM. In the training stage, we use 6 categories of samples to construct 6 × (6 − 1)/2 = 15 SVM binary classifiers. We save the results of each SVM binary classifier into an array of structural cells, and hence save all the information needed for multiclass SVM classification into the array of structural cells. In the multiclass SVM classification stage, the test samples are successively passed through the 15 SVM binary classifiers, and the category of each sample is determined by the one-against-one voting strategy.
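The one-against-one voting strategy in step 3 can be illustrated independently of any SVM library. In the sketch below, each trained binary SVM is replaced by a hypothetical stub that simply picks the nearer of two class centers on a 1D feature; only the pairing of classes and the vote counting reflect the strategy described in the text.

```python
from collections import Counter
from itertools import combinations

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def make_stub_classifier(a, b, centers):
    """Hypothetical stand-in for a trained binary SVM separating classes a and b."""
    def decide(x):
        return a if abs(x - centers[a]) <= abs(x - centers[b]) else b
    return decide


def one_against_one_predict(x, classifiers):
    """Pass x through every binary classifier and return the most-voted class."""
    votes = Counter(decide(x) for decide in classifiers.values())
    return votes.most_common(1)[0][0]


# Toy setup: class centers 0, 1, ..., 5 on a single feature axis.
centers = {name: float(i) for i, name in enumerate(EXPRESSIONS)}
classifiers = {
    (a, b): make_stub_classifier(a, b, centers)
    for a, b in combinations(EXPRESSIONS, 2)
}

print(len(classifiers))                           # 15 binary classifiers
print(one_against_one_predict(3.2, classifiers))  # happiness
```

In a real system each stub would be a binary RBF-kernel SVM trained on the samples of its two classes with the selected values of E and σ.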

Results and Discussion
To verify the effectiveness of the proposed algorithm, we performed experiments on the Japanese Female Facial Expression (JAFFE) [23], extended Cohn-Kanade (CK+) [24], MMI Facial Expression Database (MMI) [25], and Psychological Image Collection at Stirling (PICS) [26] facial expression datasets. These are the four most comprehensive datasets currently available for facial expression research. We applied a 10-fold cross-validation scheme, i.e., out of 10 subjects, data from a single subject were reserved as the validation data for testing the algorithm proposed in this paper, whereas the data for the remaining nine subjects were used as the training data.
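The subject-wise split described above, in which all images of one subject are held out in each fold, can be sketched as follows. The function name `leave_one_subject_out` and the subject identifiers are hypothetical; the sketch only shows how the folds are formed, not how the classifiers are trained.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the subject that image i belongs to, so every fold keeps
    all images of one subject out of the training set.
    """
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Toy example with three subjects and six images.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
for subject, train_idx, test_idx in leave_one_subject_out(subject_ids):
    print(subject, train_idx, test_idx)
```

Splitting by subject rather than by image prevents images of the same person from appearing in both the training and the test data of a fold.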

Experimental Database
JAFFE dataset: the JAFFE dataset contains 213 images of seven facial expressions (six basic facial expressions + one neutral) posed by 10 Japanese female models. Each image was rated on six emotion adjectives by 60 Japanese subjects. The database was planned and assembled by Michael Lyons, Miyuki Kamachi, and Jiro Gyoba. The photos were taken at the psychology department of Kyushu University. The images in the JAFFE dataset are all frontal faces, and the original images were adjusted and cropped so that the position of the eyes and the size of the faces are roughly the same. All images use frontal lighting, but the intensity of illumination differs. Because the expression database is open access and the expression labeling is very standard, it is used for training and testing in most current articles on expression recognition. All 213 images of the JAFFE database were used for six-class expression recognition in this study. For each subject, we randomly chose 14 images for training and used the rest for testing. Each time, data from a single subject were reserved as the validation data for testing the algorithm proposed in this paper, whereas the data for the remaining nine subjects were used as the training data.
CK+ dataset: the Cohn-Kanade AU-Coded Facial Expression Database is intended for research in automatic facial image analysis and synthesis and for perceptual studies. Cohn-Kanade is available in two versions, with a third in preparation; the CK+ dataset is the second version. It includes both posed and non-posed (spontaneous) expressions and additional types of metadata. Its subjects are 123 university students aged 18-30 years, of whom 65% are female, 15% are African-American, and 3% are Asian or Latino. This dataset is much larger than JAFFE and is available free of charge. The database contains 593 image sequences, of which 327 have emotion labels. It is a popular database in facial expression recognition, and many articles use it for training and testing. We selected 320 image sequences from the CK+ dataset. The only selection criterion was that a sequence could be labeled as one of the six basic emotions. The sequences come from 96 subjects, with one to six emotions per subject. Because we studied facial expression recognition based on static images in this paper, for each sequence, the neutral face and three peak expression frames were used for prototypic expression recognition, resulting in 960 images (i.e., 108 anger, 120 disgust, 99 fear, 282 joy, 126 sadness, and 225 surprise). The peak expression frames were chosen because they reflect the expression at its fullest, so we can extract the features that best represent it. More precisely, we distributed the images randomly into 10 groups with roughly equal numbers of subjects. Nine groups were used as the training data to train classifiers, and the remaining group was used as the test data.
MMI dataset: the MMI facial expression database is an ongoing project that aims to deliver large volumes of visual data of facial expressions to the facial expression analysis community. A major issue hindering new developments in the area of automatic human behavior analysis in general, and affect recognition in particular, is the lack of databases with displays of behavior and affect. To address this problem, the MMI facial expression dataset was conceived in 2002 as a resource for building and evaluating facial expression recognition algorithms. The database addresses a number of key omissions in other databases of facial expressions. In particular, it contains recordings of the full temporal pattern of facial expressions, ranging from neutral, through a series of onset, apex, and offset phases, and back again to a neutral face. The database consists of over 2900 videos and high-resolution still images of 75 subjects. It is fully annotated for the presence of action units (AUs) in videos and partially coded at the frame level, indicating for each frame whether an AU is in the neutral, onset, apex, or offset phase. A small part was annotated for audio-visual laughter. The database is freely available to the scientific community. We selected 96 image sequences from the MMI database; the sequences were obtained from 20 subjects with one to six emotions per subject. The neutral face and three peak frames of each sequence (384 images in total) were used for six-class expression recognition. We randomly chose all images from 15 people for training and used the rest for testing.
PICS dataset: the PICS was planned and assembled by Psychology, School of Natural Sciences, University of Stirling. It has many subset databases, such as the Stirling/ESCR 3D face dataset, 2D face sets, 3D face sets, and other image sets; this paper uses the pain expression subset. It contains 599 images of 13 females and 10 males, including the same seven expression categories as in JAFFE. The resolution of each image is 181 × 241. The database is also freely available to the scientific community. From the PICS dataset, we chose 180 images of 10 people, covering six expression categories with three images per expression for each person. In this study, we randomly chose nine subjects for training and used the rest for testing.

Recognition Rates of the Proposed Method
The recognition rates of the proposed method were evaluated for each dataset separately under the settings mentioned above. The 3D feature plots of the proposed method for the six expressions after applying LDA on the four datasets are shown in Figure 3a-d, and the recognition rates are shown in Table 1.
In the legend in Figure 3, H represents happiness, Su represents surprise, Sa represents sadness, D represents disgust, A represents anger, and F represents fear. It is observed that the algorithm clearly separates the features of the six kinds of expressions, which provides a powerful tool for subsequent classification and recognition. It is clear from Table 1 that the proposed FER system achieved a high recognition rate on the four datasets. Moreover, the recognition rate decreased significantly without the feature selection method. The experimental results show that the improved feature selection method proposed in this paper plays an important role in the high recognition rate of the FER system.
To further verify that this algorithm is independent of the dataset and is robust, we employed a 4-fold cross-dataset validation: out of the four datasets, one was used as the training data and the others were used as the testing data. This process was repeated four times. The weighted average recognition rates of the proposed method are shown in Table 2. As seen from Table 2, the proposed algorithm not only has a high recognition rate for a single database, but also has a high recognition rate when it is trained on one database and tested on the other three. This indicates that the algorithm is robust across datasets.

Contrast Experiment
Table 3 compares the proposed FER method with state-of-the-art methods, which were selected because they use frequency domain features, a similar testing strategy, and the same datasets. For a fair comparison, we implemented some of these methods, and for those that we did not implement, we quoted their published results. A 10-fold cross-validation scheme was used on each dataset. For the four datasets, the weighted average recognition rates of the state-of-the-art methods and the proposed method are shown in Table 3. Table 3 indicates that the proposed method outperforms the state-of-the-art methods on the four datasets. The weighted average recognition rate of the proposed method was 7.62%, 1.46%, 1.98%, 4.27%, 10.71%, and 6.25% higher than the rates of the methods proposed in [27][28][29][30][31][32], respectively. To analyze the computational cost of the proposed framework, we selected the most efficient method from the above experiments of Table 3, that is, [27]. The framework of [27] took 1493 ms, 1998 ms, 1215 ms, and 2031 ms to recognize an expression frame from the JAFFE, CK+, MMI, and PICS datasets, respectively. In contrast, our framework took 1185 ms, 1826 ms, 1002 ms, and 1725 ms to recognize an expression frame from the same datasets. Thus, our framework not only achieved a high recognition rate, but is also less expensive in terms of computational cost.
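The per-frame timings quoted above can be turned into relative savings with a short calculation. The sketch below only restates the numbers given in the text; the helper name `relative_reduction` is hypothetical.

```python
# Per-frame recognition times in milliseconds, as quoted in the text:
# (framework of [27], proposed framework).
times_ms = {
    "JAFFE": (1493, 1185),
    "CK+": (1998, 1826),
    "MMI": (1215, 1002),
    "PICS": (2031, 1725),
}


def relative_reduction(baseline, proposed):
    """Percentage of the baseline time saved by the proposed framework."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: relative_reduction(baseline, proposed)
    for name, (baseline, proposed) in times_ms.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% less time per frame")
```

On these figures the saving ranges from roughly 9% on CK+ to roughly 21% on JAFFE.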

Conclusions
At present, facial expression recognition algorithms are mainly composed of three modules: image preprocessing, feature extraction, and recognition. However, feature selection is also very important for facial expression recognition and is a topic worthy of further study. In this paper, a novel approach to facial expression recognition based on DSST and normalized mutual information feature selection is proposed. This method was tested on four different datasets. After preprocessing, DSST was applied to the test and training images, and all DSST coefficients were extracted as the original expression feature set. The improved feature selection method proposed in this paper was used to find the optimal feature subset of the original feature set, which retained the key classification information of the original set. For dimension reduction, we used LDA, and SVM was used as the recognizer. The experimental results show that the weighted average recognition rate of the proposed algorithm was 96.63%, which is higher than the recognition rates of the compared facial expression recognition methods. Unfortunately, our framework is not yet ready for use in real-time scenarios, because several factors in such scenarios might decrease its performance, such as complicated backgrounds, image rotation, and blur. Therefore, further study is needed to tackle these issues and maintain the same high recognition rate in a real-time scenario.

Figure 1. The framework of the proposed method.

Figure 2. (a) Two computation steps for discrete separable shearlet transform (DSST) coefficients: refinement along the horizontal axis (top) and resampling associated with shear matrix (middle), followed by the separable wavelet transform along the vertical axis (bottom); (b) refinement along the horizontal axis to obtain the coefficients when j = 4 and k = 1.

Figure 3. 3D feature plots of the proposed method for recognizing the expressions on the four datasets: (a) Japanese Female Facial Expression (JAFFE) dataset, (b) extended Cohn-Kanade (CK+) dataset, (c) MMI Facial Expression (MMI) dataset, and (d) Psychological Image Collection at Stirling (PICS) dataset.

Moreover, the quantization algorithm has a complexity of O(M), where M is the number of features. The experiments were performed in MATLAB R2014a on an Intel(R) Core(TM) processor (3.60 GHz) with a RAM capacity of 56 GB. The proposed framework has a complexity of O(TM), where T is the number of input expression images.

Table 1. Comparison of average accuracy rate of the proposed method with and without feature selection on the four data sets.

Table 2. Average accuracy rate of the proposed method trained on one dataset and tested on the others.

Table 3. Comparison of the proposed method with state-of-the-art methods in terms of average accuracy rate.