Face Recognition with Symmetrical Face Training Samples Based on Local Binary Patterns and the Gabor Filter

In practical face recognition applications, only a limited number of training images is usually available for each face. However, it is known that, in general, increasing the number of training images also increases the performance of face recognition systems. In this case, a new set of training samples can be generated from the original samples, using the symmetry property of the face. Although many face recognition methods have been proposed in the literature, building a robust face recognition system is still a challenging task. In this paper, recognition performance was improved by using the property of face symmetry. Moreover, the effects of illumination and pose variations were reduced. A new symmetry-based approach to face recognition, built on a Two-Dimensional Discrete Wavelet Transform and the Local Binary Pattern, is presented. The method has three main stages: preprocessing, feature extraction, and classification. A single-level Two-Dimensional Discrete Wavelet Transform and a Gaussian Low-Pass Filter were used, separately, for preprocessing. The Local Binary Pattern, Gray Level Co-Occurrence Matrix, and the Gabor filter were used for feature extraction, and the Euclidean Distance was used for classification. The proposed method was implemented and evaluated using the Olivetti Research Laboratory (ORL) and Yale datasets. This study also examined the importance of the preprocessing stage in a face recognition system. The experimental results showed that the proposed method had a recognition accuracy of 100% for both the ORL and Yale datasets, and these recognition rates were higher than those of the methods in the literature.


Introduction
Robust and accurate face recognition (FR) is one of the most important problems in computer vision applications. In the literature, there are several methods used for FR, including holistic, local, and hybrid methods [1,2]. However, recent research has revealed that using a symmetry-based dataset is a useful way to increase the performance of an FR system; thus, it is possible to realize FR using the property of face symmetry [3].
The property of face symmetry is useful for solving two main problems that are still prevalent in FR: the limited number of face training samples, and the variations in pose, facial expression, and lighting conditions. The proposed method uses the property of face symmetry to reduce the effect of these two problems.
In this study, the Local Binary Pattern (LBP) [4][5][6], the Gray Level Co-Occurrence Matrix (GLCM) [7], and the Gabor Filter [8] were used for feature extraction, since these methods perform well for the texture feature extraction that can be used for FR [9][10][11]. Moreover, any two methods from the list could be combined [8,12], such as LBP with GLCM, in order to make the feature extraction operation more robust. The images of the face were enhanced before extracting their features. This enhancement was accomplished by a preprocessing step using well-known techniques, namely the Gaussian low-pass filter (GLPF) [13], the Difference of Gaussians (DoG) [14], and the Discrete Wavelet Transform (DWT) [15]. The proposed method was analyzed using two benchmark facial datasets, namely the Olivetti Research Laboratory (ORL) [16] and Yale [17] datasets. These datasets are widely used to test the performance of FR methods [3,18,19]. The method had three main stages: Preprocessing, Feature Extraction, and Classification. The Two-Dimensional Discrete Wavelet Transform (2-D DWT), GLPF, and DoG were used for preprocessing. The LBP, GLCM, and Gabor filter were used for feature extraction. Finally, the Euclidean Distance was used for classification, as shown in Figure 1.

Literature Review
FR is among the most important and well-studied problems in computer vision [20]. However, illumination and pose variations are still open problems that need to be solved. Facial images are usually taken in uncontrolled environments, which introduce variations in viewpoint and illumination; therefore, these two factors play a vital role in the efficiency of recognition. Developing an algorithm that can handle variations in illumination, pose, facial expression, occlusion, etc., all together, still seems to be a very challenging task.
There are many studies related to FR. The authors of Reference [21] presented a robust method for FR using sparse representation-based classification (SRC). Although the results were good, the method had a high computational cost. Zhang et al. [22] proposed an SRC-based classification algorithm based on the Gabor feature, combining the features from SRC and Gabor. Furthermore, they succeeded in reducing the computational complexity and improving the FR rate. Mairal et al. [23] added a new step to SRC for signals, and successfully used their method to recognize handwritten digits and to classify textures. In Reference [24], the authors mapped the facial images to the so-called face subspace. Here, Locality Preserving Projections (LPP) were used to calculate a basis set, called Laplacian Faces. Linear Discriminant Analysis (LDA) was used in Reference [25] to construct a subspace in which the inter-person variance was optimally large, while the intra-person variance was efficiently small. The main disadvantage of this technique, the same as that of Principal Component Analysis (PCA) [26], was its assumption of a Euclidean data space, since the method fails when the data points lie in a nonlinear subspace, which is usually true for multimodally distributed facial images.
Although there exist many studies [14,27,28] on invariant representations for handling certain variations, a generic approach to model different variations at once has apparently not yet come to light. It has been known for a long time that feature-based methods, such as elastic bunch graph matching, are promisingly successful against many factors, including variations of illumination and viewpoint [29]. Nevertheless, their extreme sensitivity to feature extraction and to the measurement of the extracted features makes them unreliable [30]. Many authors have studied the effect of variations in illumination conditions on FR [14,28,30,31,32,33]. As a result, appearance-based methods have dominated the literature.
FR with LBP was proposed by Ahonen et al. [6]; their algorithm was not sensitive to light, and this robustness was considered the main strength of their study. The authors of Reference [34] used discriminative dictionary learning and SRC, along with the Gabor filter bank and the LBP, for feature extraction, and reduced the influence of illumination changes. One of the milestones for FR under variations is the Fisherfaces and Eigenfaces [25] technique, which is insensitive to illumination variations. A good improvement was recommended in Reference [35], in which local linear transformations were used instead of one global transformation. Although the technique suggests different mapping functions for different pose classes, it could not treat the case of critical variations. Facial images with different poses, facial expressions, and illumination conditions were studied, and the recognition performance was shown to be higher than that of Fisherfaces or Eigenfaces [36].
Pose variation has also been studied in Reference [37], by using view-based Eigenfaces. For each view, Eigenfaces were calculated and applied as separate transformations into a standard lower-dimensional subspace. The authors of Reference [38] introduced Eigen features, in which a feature-based scheme was incorporated. In fact, their performance highly depended on discretization, where the Eigen light-field technique was used to define the subspace of poses. Moreover, uncommon poses could be treated by this technique.
The authors of Reference [39] combined the generalized photometric stereo and the Eigen light-field concept to generate a generic method that was also insensitive to illumination changes. The authors of Reference [40] presented a method to handle variations of pose and illumination, including shadows and reflections; however, the computational cost of their method decreased the efficiency of the recognition system, since they generated 3D models from 2D images. Shashua et al. [31] proposed a method based on the illumination-invariant signature image, and showed that it was possible, even in bad conditions, to use a small dataset to generate more images with varying illumination. However, their method was not appropriate when the images included shadows. Then, Zhou et al. [32] reduced the effect of the shadow issue by imposing extra constraints on the albedo.
Georghiades et al. showed in Reference [30] that, when the pose was fixed, the set of images under all possible illumination conditions formed a convex cone in the image space. In addition, they used their method to reconstruct the shape and albedo of the face by training the system using only a few images, with different directions of light. The authors of Reference [41] proved that all possible illumination variations could be represented using only a nine-dimensional linear subspace, by using spherical harmonics. The authors of Reference [42] examined different illumination conditions and also theoretically analyzed the subspace for images of a convex Lambertian object.
The authors of Reference [43] proposed a nonlinear subspace approach using the tensor representation of faces in different cases, including facial expressions, illuminations, and poses, in which the n-mode tensor Singular Value Decomposition (SVD) was used to generate an image basis. Even though this technique gave good results, it still required several images under different variations for each training identity.
Another nonlinear subspace analysis was proposed in Reference [44], using the manifold assumption, in which a gallery manifold for each identity was stored in the database. To identify a test subject with several new poses, its probe manifold was first constructed, and its identity was then determined using a manifold-to-manifold distance. The method was fairly good, but the need for several images of the test person could be considered a disadvantage.
In Reference [45], the illumination invariance was analyzed using a ridge regression technique to overcome the matrix inversion that was required in the symmetric bilinear model. The authors of Reference [46] introduced a modified asymmetric model to overcome pose variations. However, the performance of their method was affected by the discretization resolution of the pose space.
One of the most important properties in nature, and particularly in human faces, is symmetry. Many authors have noticed its role [47,48]. It has been observed that the human face is almost symmetrical, so the use of this property for face detection (FD) and FR has been previously studied [49], where the authors developed a technique to automatically compute the bilateral symmetry axis and used it in their research.
Zhao and Chellappa [50] used the symmetry of the face to reduce the effects of illumination in FD. It has been shown that symmetry is also useful for extracting the facial profile in facial recognition techniques [51,52]. The authors of Reference [53] successfully applied the symmetry property to FD, and they concluded that the expressions of the face were also symmetrical. Thus, this property has been exploited for FR in our study.
FR algorithms suffer from two problems. First, in general, there is only a limited number of training images. Second, the existence of variations in illumination and pose, in addition to facial expressions, complicates the task. Although a number of methods have been proposed to overcome these problems using the symmetry property of the face, such problems are still considered open and are not yet solved. A recent method has been proposed by the authors of References [3,54], wherein they improved the FR accuracy by using the symmetry property of the face, through Symmetry for Collaborative Representation-Based Classification (SCRC).

Wavelet Transforms
Wavelet Transforms were selected for preprocessing, since they analyze images with time-frequency localization, which supports many wavelet-based image processing methods [55]. The image was decomposed into two parts, using a low-pass (LP) filter and a high-pass (HP) filter, and each of these parts was down-sampled by two [56], as illustrated in Figure 2.
In Figure 2, Lo_D is a low-pass filter, Hi_D is a high-pass filter, and ↓2 denotes down-sampling by a factor of two (keeping the even-indexed rows or columns).
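The filter-and-down-sample scheme described above can be sketched in a few lines. This is only an illustration: the text does not state which wavelet was used, so the Haar wavelet is assumed here, and the function name `haar_dwt2` is hypothetical.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2-D Haar DWT (Haar is an assumption, not the paper's stated wavelet).

    Returns four half-size sub-bands: the approximation (low-low) and
    three detail bands (low-high, high-low, high-high).
    """
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image height and width must be even")
    s = np.sqrt(2.0)
    # Filter along each row and down-sample the columns by two.
    lo = (x[:, 0::2] + x[:, 1::2]) / s   # low-pass branch  (Lo_D)
    hi = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass branch (Hi_D)
    # Filter along each column and down-sample the rows by two.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh
```

In a preprocessing role, the low-low band would typically be kept as a smoothed, half-resolution version of the face image; whether the authors used it in exactly this way is not stated in this section.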

Gaussian Low-Pass Filter (GLPF)
The Gaussian Low-Pass filter, or Gaussian smoothing, is a filter that smooths an image by using a Gaussian function. It is used to filter images and reduce image noise [57].
The GLPF is used in many image processing systems that require preprocessing of their inputs, since it reduces the image noise [58] and allows only the lower-frequency components of the image to pass [13]. The equation of a Gaussian function in two dimensions is given by the following formula:
$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{1}$$
where x and y are the distances from the origin along the horizontal and vertical axes, respectively, and σ represents the standard deviation of the Gaussian distribution. Figure 3 shows the Gaussian Low-Pass Filter for σ = 2.
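As a minimal sketch of how such a filter can be built and applied, the snippet below samples the two-dimensional Gaussian on a square grid and uses it to smooth an image. The three-sigma kernel radius, the edge-replication padding, and the function names `gaussian_kernel` and `filter2d` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sampled 2-D Gaussian, normalized so that its weights sum to one."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumed cut-off at three sigma
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return kernel / kernel.sum()

def filter2d(image, kernel):
    """Same-size filtering with edge replication (assumed border handling)."""
    img = np.asarray(image, dtype=float)
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img)
    # Accumulate the weighted, shifted copies of the image.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out
```

For example, `filter2d(face, gaussian_kernel(2.0))` would smooth an image `face` with σ = 2. Because the kernel is symmetric, this correlation-style loop gives the same result as a convolution.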
Symmetry 2018, 10, x FOR PEER REVIEW

Difference of Gaussians (DoG)
If two copies of the same image are filtered using two Gaussian filters with different variances $\sigma_1^2$ and $\sigma_2^2$ (where $\sigma_2 > \sigma_1$), the result of subtracting the two filtered images is the DoG [59]. The filtering process is the convolution of the image with the filter kernel. Gaussian filtering keeps only the low-frequency spatial information; therefore, subtracting one result from the other becomes a bandpass operation [60]. If $\sigma_1 = \sigma$ and $\sigma_2 = K\sigma$, then the DoG of image I, for the two-dimensional case, is the function:
$$\Gamma_{\sigma, K\sigma}(x, y) = I * \left( \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} - \frac{1}{2\pi K^{2}\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2K^{2}\sigma^{2}}} \right) \tag{2}$$
where Γ is the DoG function, I is the original image, and * denotes convolution.
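The subtraction of two Gaussian-filtered copies can be sketched as follows. The separable implementation, the three-sigma kernel radius, the edge padding, and the default ratio K = 1.6 are all assumptions chosen for illustration; the paper does not give these values in this section.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with edge replication (assumed border handling)."""
    radius = int(np.ceil(3 * sigma))  # assumed cut-off at three sigma
    ax = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-ax ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    # Smooth along the rows first, then along the columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 0, rows)

def difference_of_gaussians(image, sigma, k=1.6):
    """Narrow blur (sigma) minus wide blur (k * sigma); k = 1.6 is an assumed default."""
    return gaussian_blur(image, sigma) - gaussian_blur(image, k * sigma)
```

A constant image yields an all-zero response, which reflects the bandpass behaviour described above: the DoG suppresses both the flat background and the finest detail.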

Feature Extraction Using GLCM
The GLCM is one of the methods used for feature extraction. Its concept was introduced by Haralick et al. [61]. In GLCM, the extracted features depend on the direction (angle θ) and the distance (D) from the pixel of interest [7], as illustrated in Figure 4.
In this study, a number of values (D = 1, 2, and 3 and θ = 0°, 45°, 90°, and 135°) were examined to find the best scenario. The features used were the correlation, contrast, maximum probability, angular second moment, mean, homogeneity, entropy, and dissimilarity [61]. These features were calculated using the following formulae:
1. Correlation:
2. Contrast:
3. Maximum probability:
4. Angular Second Moment:
5. Mean:
6. Homogeneity:
7. Entropy:
8. Dissimilarity:
where µx is the mean of the column values in the image, µy is the mean of the row values in the image, p(x,y) denotes the elements of the Gray Level Co-Occurrence Matrix, and i and j are, respectively, the lengths of the row and column of the image [61].
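To make the eight features concrete, the sketch below builds a co-occurrence matrix for one (D, θ) pair and evaluates the features using their common textbook definitions. These definitions, the symmetric normalization of the matrix, the choice of the row index for the mean, and the function names `glcm` and `glcm_features` are assumptions; the authors' exact formulae may differ in detail.

```python
import numpy as np

def glcm(image, distance=1, angle_deg=0, levels=8):
    """Normalized, symmetric co-occurrence matrix p(x, y) of an integer-valued image."""
    img = np.asarray(image)
    # (row, column) step for each supported direction; rows grow downwards.
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    dr, dc = (distance * o for o in offsets[angle_deg])
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r, c], img[r2, c2]] += 1
    counts += counts.T  # count each pair in both directions (assumption)
    return counts / counts.sum()

def glcm_features(p):
    """Eight texture features under standard definitions (assumed, not the paper's)."""
    x, y = np.indices(p.shape)
    mu_x, mu_y = (x * p).sum(), (y * p).sum()
    sd_x = np.sqrt(((x - mu_x) ** 2 * p).sum())
    sd_y = np.sqrt(((y - mu_y) ** 2 * p).sum())
    nz = p[p > 0]  # skip empty cells so that log2 is defined
    return {
        # Correlation is undefined (division by zero) for a constant image.
        "correlation": ((x - mu_x) * (y - mu_y) * p).sum() / (sd_x * sd_y),
        "contrast": ((x - y) ** 2 * p).sum(),
        "max_probability": p.max(),
        "angular_second_moment": (p ** 2).sum(),
        "mean": mu_x,
        "homogeneity": (p / (1.0 + (x - y) ** 2)).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
        "dissimilarity": (np.abs(x - y) * p).sum(),
    }
```

A feature vector for one image could then be assembled by concatenating these values over the examined distances and angles; the image must first be quantized to integer gray levels below `levels`.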

Feature Extraction Using LBP
One of the most widely used methods to analyze and model texture is the LBP method [9]. It can be basically described as a 3 × 3 square operator. In each square, the eight neighboring pixels are compared with the one in the center. If the value of a neighbor is greater than or equal to the value of the center pixel, it is replaced by 1; if not, it is replaced by 0. Then, the new binary values of the neighbors are concatenated to produce one decimal value, which is taken as the new value of the center pixel. The window is then moved to the next pixel and the same operation is repeated. The histogram of these new decimal values represents the input texture. Equation (11) describes the LBP operation:
$$LBP_{N_P, R} = \sum_{p=0}^{N_P - 1} s(g_p - g_c)\, 2^{p}, \qquad s(z) = \begin{cases} 1, & z \geq 0 \\ 0, & z < 0 \end{cases} \tag{11}$$
where s is the sign function, $N_P$ is the number of neighborhood pixels, $g_p$ represents the gray level value of the neighboring pixel p, and $g_c$ represents the gray level value of the central pixel. The factor $2^p$ is required to produce decimal values. The traditional LBP [6] analyzes the texture of the image by thresholding a 3 × 3 square neighborhood against the center pixel value. It only uses the sign information to produce the LBP, as illustrated in Figure 5 [4].
In a newer implementation, the LBP operation has been upgraded to deal with any neighborhood size, by replacing the square with a circle [9]. This can be described by ($N_P$, R), where R is the radius of the circle used. Figure 6 illustrates an (8, 2) neighborhood. Additionally, there are a number of other modifications to the LBP [4].
The term $LBP^{u2}_{P,R}$ is used to describe the LBP operation, where the superscript u2 denotes the use of uniform patterns. The resulting histogram captures the information distributed in the image, such as edges, corners, uniform areas, etc. An effective representation must also retain the spatial information in the image. One strategy to accomplish this is to partition the image into a number of small areas $R_0, R_1, \ldots, R_{m-1}$ [6], where m is the number of areas. If the size of the histogram is B, then the length of the feature vector is mB. It is obvious from this relation that the number of areas m determines the length of the feature vector, which means that selecting small areas results in long feature vectors, leading to heavy memory use and slow classification. Selecting large areas causes a loss of spatial information. An example of a preprocessed face image partitioned into thirty-six windows and the resulting face feature histogram are illustrated in Figure 7 [62].
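The basic 3 × 3 operator and the block-wise histogram of length mB can be sketched as below. This covers only the traditional square neighborhood, not the circular or uniform-pattern variants; the bit ordering of the neighbors, the grid size, and the function names `lbp_3x3` and `lbp_feature_vector` are assumptions made for illustration.

```python
import numpy as np

def lbp_3x3(image):
    """Basic LBP codes for every interior pixel of a gray-level image."""
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbours, visited clockwise from the top-left corner (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for p, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # s(g_p - g_c) * 2^p: set bit p where the neighbour is >= the centre.
        codes += (neighbour >= center).astype(int) << p
    return codes

def lbp_feature_vector(image, grid=(2, 2), bins=256):
    """Concatenate one histogram per area; the result has m * B entries."""
    codes = lbp_3x3(image)
    hists = []
    for band in np.array_split(codes, grid[0], axis=0):
        for block in np.array_split(band, grid[1], axis=1):
            hists.append(np.bincount(block.ravel(), minlength=bins))
    return np.concatenate(hists)
```

With `grid=(6, 6)` the image is split into thirty-six areas, so the feature vector has 36 × 256 entries, which illustrates the trade-off between spatial detail and vector length discussed above.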

Feature Extraction Using the Gabor Filter
The Gabor filter is a very helpful tool in image processing, especially in FR [63]. In the spatial domain, the two-dimensional Gabor filter is a Gaussian kernel function modulated by a complex sinusoidal plane wave with a center frequency f and orientation θ [64], and is defined as:
$$\psi(x, y) = \frac{f^{2}}{\pi\gamma\eta}\, e^{-\left( \frac{f^{2}}{\gamma^{2}} x'^{2} + \frac{f^{2}}{\eta^{2}} y'^{2} \right)}\, e^{\left( j 2\pi f x' + \phi \right)}$$
$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta \tag{12}$$
where γ and η denote the ratio between the envelope of the Gaussian function with standard deviation σ and the center frequency, and φ defines the phase offset.
The frequency (or wavelength) governs the width of the stripes in the function; as the frequency increases, the stripes become thinner. The orientation governs the rotation of the Gabor envelope, and the aspect ratio controls the height of the function. For a very large aspect ratio, the envelope approaches a height of one pixel, and for a very small aspect ratio, the height stretches across the image. The bandwidth controls the overall size of the Gabor envelope, such that for a large bandwidth, the envelope increases, allowing more stripes [65]. The accompanying figures show the effect of changing some of the parameters of the Gabor function.
Gabor filters have many advantages, such as invariance to rotation, scale, and translation. Moreover, they are robust against disturbances in images, such as changes in illumination [66,67], and they have been found to be particularly appropriate for extracting many features from an image, using different frequencies and orientations [65].
They are especially useful in feature extraction for texture analysis and segmentation. Varying the orientation picks out texture that is oriented in a particular direction, while varying the standard deviation of the Gaussian envelope controls the size of the image region being analyzed [68].
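A Gabor kernel of this kind can be sampled on a pixel grid as sketched below. The parameterization by f, θ, γ, η, and φ follows the description above, but the default values, the kernel size, the treatment of φ as a phase inside the complex exponent, and the function name `gabor_kernel` are assumptions for illustration only.

```python
import numpy as np

def gabor_kernel(f, theta, gamma=np.sqrt(2), eta=np.sqrt(2), phi=0.0, size=31):
    """Complex 2-D Gabor kernel sampled on a size x size grid (size assumed odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinates by the orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope whose width scales with the centre frequency.
    envelope = np.exp(-((f ** 2 / gamma ** 2) * xr ** 2 + (f ** 2 / eta ** 2) * yr ** 2))
    # Complex sinusoidal carrier along the rotated x axis; phi is used as a phase here.
    carrier = np.exp(1j * (2.0 * np.pi * f * xr + phi))
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

# A small bank over four orientations at one frequency (values are illustrative).
bank = [gabor_kernel(0.1, t) for t in np.arange(4) * np.pi / 4]
```

Features are commonly taken from the magnitude of the image filtered with each kernel in such a bank; the number of frequencies and orientations used by the authors is not given in this section.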

Classification
Many classifiers can be used for classification, such as the Euclidean Distance, the Cosine Distance, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization, and Support Vector Machines [69]. The Minimum Euclidean Distance classifier is considered to be one of the most popular classifiers, since it is easily designed [70] and widely used [71,72]. In general, it is used to examine the similarities between objects. In this study, we used the k-nearest neighbor classifier (for k = 1) with the Euclidean distance as the distance metric.

Euclidean Distance
The Euclidean distance d between two points i and j, where $i = (i_1, i_2, \ldots, i_n)$ and $j = (j_1, j_2, \ldots, j_n)$, in Cartesian coordinates, is the length of the straight line segment between them. This distance is given by the formula:
$$d(i, j) = \sqrt{\sum_{k=1}^{n} (i_k - j_k)^{2}} \tag{13}$$
Therefore, if the two points are close to each other, then the value of d is small; otherwise, it is large. A Euclidean vector gives the location of a point in Euclidean n-space, and the length of this vector is measured by the Euclidean norm, given by:
$$\lVert i \rVert = \sqrt{i_1^{2} + i_2^{2} + \cdots + i_n^{2}} \tag{14}$$
This tool is used to test how similar one object (face) is to another, by testing the similarities between their respective feature vectors.
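The nearest-neighbor rule with k = 1 reduces to picking the training sample whose feature vector lies closest to the query. The following is a minimal sketch under that reading; the function names and the toy feature vectors in the usage note are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Length of the straight line segment between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum()))

def nearest_neighbor(train_features, train_labels, query):
    """Return the label of the training vector closest to the query (k = 1)."""
    diffs = np.asarray(train_features, dtype=float) - np.asarray(query, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)  # one Euclidean distance per training row
    return train_labels[int(np.argmin(dists))]
```

For instance, `nearest_neighbor([[0, 0], [10, 10]], ["s1", "s2"], [1, 2])` returns `"s1"`, because the query is closer to the first training vector.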

Dataset
The dataset in this study was taken from the ORL and Yale datasets.

The ORL Dataset
The ORL is a well-known face dataset that is used to test FR algorithms. It has 400 images of 40 distinct persons, 10 images for each person. The dataset is varied in many aspects. First, the images are taken at different times during the lives of the people. Second, the images include different variations and different facial expressions, such as closed or open eyes. Some of the people are smiling, others are not. In addition, there are a number of people wearing spectacles while others are not wearing spectacles. Furthermore, a number of the images include up to twenty degrees of tilting and rotation of the face [3].
A number of face images from the ORL dataset are illustrated in Figure 13.

The Yale Dataset
This dataset contains 165 images of 15 unique people, 11 images for each person under different conditions, such as normal, sad, and sleepy. The dataset includes many variations of pose, illumination, and expression [3]. A number of images from the Yale dataset are illustrated in Figure 14.

Experiments and Results
This section presents results obtained from simulations using MATLAB 2015b. The experiments were carried out on images from the ORL and Yale datasets, using the proposed method. The performance of the proposed method was compared with that of PCA [26], Collaborative Representation-Based Classification (CRC) [73], SRC [21], and SCRC [3,54].
The FR system consisted of three stages. The first stage was the Preprocessing Stage, in which the 2-D DWT, the GLPF, and the DoG were used separately. The second stage was the Feature Extraction Stage, where the LBP, the GLCM, and the Gabor Filter were examined; these algorithms were first tested separately, and then two of them were combined. In the final stage (the Classification Stage), the Euclidean distance was used as a classifier. The procedure was carried out and tested using the Original Training Samples (OTS) and the Original with Symmetrical Training Samples (OSTS) from the ORL and the Yale datasets.

Generating New Images
In order to increase the size of the training data, new training images were generated using the property of face symmetry, since those images reflect some part of the face that is not shown by the original images, as illustrated in Figure 15.
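The construction of the symmetrical samples can be sketched in Python as follows. This is only a minimal sketch, assuming that the face is split at the vertical midline of the image and that the image width is even (as it is for the 112 × 92 and 154 × 154 images used here); the function name and the tiny 2 × 4 "image" are hypothetical, and the original MATLAB implementation may locate the axis of symmetry differently.

```python
def symmetrical_faces(image):
    """Build two symmetrical faces from the left and right halves of an image.

    `image` is a list of rows of gray levels with an even number of columns.
    Returns (left_face, right_face), each with the same size as `image`.
    """
    width = len(image[0])
    if width % 2 != 0:
        raise ValueError("this sketch assumes an even image width")
    half = width // 2

    left_face, right_face = [], []
    for row in image:
        left, right = row[:half], row[half:]
        # Left half followed by its mirror image.
        left_face.append(left + left[::-1])
        # Mirror image of the right half followed by the right half.
        right_face.append(right[::-1] + right)
    return left_face, right_face


# Hypothetical 2 x 4 image used only to show the column order.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
left_face, right_face = symmetrical_faces(image)

print(left_face)   # [[1, 2, 2, 1], [5, 6, 6, 5]]
print(right_face)  # [[4, 3, 3, 4], [8, 7, 7, 8]]
```

Each original image therefore yields two additional training images, so the OSTS set is three times the size of the OTS set under this assumption.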

Experiments on the ORL Dataset
In the experiment, from one up to nine face images of each person from the ORL dataset, each of size 112 × 92, were used as the training samples, and the rest of the images were used as the testing samples. The features of the training and testing images were extracted using the LBP, the GLCM, and the Gabor Filter. Each image had one feature vector, $(f_1, f_2, \ldots, f_m)$, where m is the number of features of one image.
The feature vector of the test image was compared with the feature vectors of the training images, using the Euclidean distance classifier. The person whose training image feature vector had the minimum Euclidean distance to the test vector was taken as the recognition result. The experiments were run ten times, with random image selection in each run. The recognition rate was calculated as the average over these runs.
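The evaluation protocol described above (random selection of training images, minimum-distance classification, and averaging over ten runs) can be sketched in Python as below. The function names, the fixed random seed, and the toy two-dimensional feature vectors are assumptions made for illustration; they are not taken from the original MATLAB code.

```python
import math
import random


def nearest_identity(test_vector, training_set):
    """Return the identity whose training vector is closest to test_vector.

    training_set is a list of (identity, feature_vector) pairs.
    """
    closest = min(training_set, key=lambda item: math.dist(test_vector, item[1]))
    return closest[0]


def recognition_rate(samples, n_train, runs=10, seed=0):
    """Average recognition rate over several random train/test splits.

    samples maps each identity to a list of feature vectors; n_train vectors
    per identity are drawn at random for training, the rest are used for testing.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(runs):
        train, test = [], []
        for identity, vectors in samples.items():
            shuffled = list(vectors)
            rng.shuffle(shuffled)
            train += [(identity, v) for v in shuffled[:n_train]]
            test += [(identity, v) for v in shuffled[n_train:]]
        correct = sum(nearest_identity(v, train) == identity for identity, v in test)
        rates.append(correct / len(test))
    return sum(rates) / len(rates)


# Hypothetical, well-separated feature vectors for two persons.
samples = {
    "A": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    "B": [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
}
print(recognition_rate(samples, n_train=1))  # 1.0
```

With real face features the rate generally depends on which images are drawn for training, which is why the result is averaged over several random selections.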

Experiments on Symmetrical ORL Dataset
In this experiment, the original and symmetrical images were used for training. The experiment revealed that the use of symmetrical images, along with the original images, improved the accuracy of FR, as compared to using only the original images as training samples. Figure 16 shows the results of using the LBP for feature extraction with OTS and OSTS.

Using a Preprocessing Stage
In this experiment, three different preprocessing methods were examined separately with LBP. First, LBP was used without any preprocessing stage. Next, the GLPF was used for preprocessing, with a standard deviation of σ = 1 and a window size of 5 pixels. Then, the DoG with σ1 = 0.1, σ2 = 2.0, and a window size of 5 pixels was used. Finally, the 2-D DWT was used.
The results showed that the use of the GLPF or the 2-D DWT as a preprocessing stage improved the accuracy of FR, as compared to not using any preprocessing stage, as shown in Figure 17. The experiments were implemented using OSTS.
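To make the filter parameters concrete, the sketch below builds a 5 × 5 Gaussian low-pass kernel with σ = 1 and a 5 × 5 DoG kernel with σ1 = 0.1 and σ2 = 2.0 in plain Python, and applies a kernel to an image. It rests on several assumptions that the text does not state: the kernels are normalized to sum to one, the DoG is formed as the narrow Gaussian minus the wide one, and image borders are handled by replicating edge pixels. All function names are hypothetical.

```python
import math


def gaussian_kernel(size, sigma):
    """size x size Gaussian kernel, normalized so that its weights sum to 1."""
    c = size // 2
    weights = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
                for x in range(size)] for y in range(size)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def dog_kernel(size, sigma1, sigma2):
    """Difference of two Gaussian kernels (assumed: sigma1 kernel minus sigma2 kernel)."""
    g1 = gaussian_kernel(size, sigma1)
    g2 = gaussian_kernel(size, sigma2)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


def filter_image(image, kernel):
    """Correlate image with kernel, replicating border pixels (assumed border rule)."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + k][dx + k] * image[yy][xx]
            out[y][x] = acc
    return out


glpf = gaussian_kernel(5, 1.0)       # GLPF: sigma = 1, 5-pixel window
dog = dog_kernel(5, 0.1, 2.0)        # DoG: sigma1 = 0.1, sigma2 = 2.0

flat = [[7.0] * 6 for _ in range(6)]  # hypothetical uniform test image
smoothed = filter_image(flat, glpf)
edges = filter_image(flat, dog)
```

Because the GLPF weights sum to one, a uniform region is left unchanged, while the DoG weights sum to zero and therefore suppress uniform regions and respond mainly to local intensity changes.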

The GLCM Method
In this experiment, the GLCM method was used to extract the features. The parameters of the GLCM method were selected to be a distance of D = 1 and an angle of θ = 0°.
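For D = 1 and θ = 0°, each pixel is paired with its immediate right-hand neighbor. The Python sketch below counts these pairs for a small image with four gray levels. The function names are hypothetical, the matrix is left non-symmetric, and the contrast statistic is shown only as one commonly used GLCM feature; the text does not say which statistics were extracted in the experiments.

```python
def glcm(image, levels, distance=1):
    """Co-occurrence counts for horizontally adjacent pixels (theta = 0 degrees).

    Entry [a][b] counts how often gray level a has gray level b
    `distance` pixels to its right.
    """
    matrix = [[0] * levels for _ in range(levels)]
    for row in image:
        for x in range(len(row) - distance):
            matrix[row[x]][row[x + distance]] += 1
    return matrix


def contrast(matrix):
    """Contrast of the normalized co-occurrence matrix: sum of (a - b)^2 * p(a, b)."""
    total = sum(map(sum, matrix))
    return sum((a - b) ** 2 * count / total
               for a, row in enumerate(matrix)
               for b, count in enumerate(row))


# Hypothetical 4 x 4 image with gray levels 0..3.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

counts = glcm(image, levels=4, distance=1)
for row in counts:
    print(row)
# [2, 2, 1, 0]
# [0, 2, 0, 0]
# [0, 0, 3, 1]
# [0, 0, 0, 1]
```

The matrix has one row and one column per gray level, so its size depends on the number of gray levels rather than on the image size.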

Combining Feature Extraction Methods
In this experiment, two methods, the LBP and the GLCM, were first used separately for feature extraction. Then, the two feature vectors obtained from these two methods were normalized and concatenated to produce one longer feature vector, which was used for training and testing. The results showed that the combination of the two methods could help to improve the accuracy of FR, as compared to using one method for feature extraction, as shown in Figure 18. The experiments were implemented using OSTS.
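The combination step can be sketched in Python as follows. The text does not specify the type of normalization, so this sketch assumes that each vector is scaled to unit Euclidean length before concatenation; the function names and the short sample vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def combine_features(lbp_vector, glcm_vector):
    """Normalize each feature vector separately, then concatenate them."""
    return l2_normalize(lbp_vector) + l2_normalize(glcm_vector)


# Hypothetical short feature vectors standing in for LBP and GLCM features.
lbp_vector = [3.0, 4.0]
glcm_vector = [0.0, 5.0, 12.0]

combined = combine_features(lbp_vector, glcm_vector)
print(len(combined))  # 5
```

Normalizing before concatenation keeps one method from dominating the Euclidean distance simply because its features have a larger numeric range.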

The Gabor Filter Method
In this experiment, the Gabor Filter was examined for feature extraction. The parameters of the Gabor filter bank were set as follows. The number of scales was set to 5, the number of orientations was set to 8, and the number of rows and columns in a 2-D Gabor filter were each set to 39. Additionally, the down-sampling factors of the Gabor feature function were set as follows. The factor of down-sampling along the rows was set to 4 and the factor of down-sampling along the columns was set to 4. The experiment revealed that the best results were obtained using the Gabor Filter, as compared to the other methods. Figure 19 shows the recognition rates for different methods on the OSTS-ORL dataset. These methods were: the LBP without any preprocessing stage (LBP), the LBP with DWT as a preprocessing stage (DWT-LBP), the LBP with GLPF as a preprocessing stage (GLPF-LBP), the GLCM, the LBP combined with the GLCM, and the Gabor. For the sake of comparison, the performance of the PCA is also shown in the figure.
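A filter bank with these dimensions can be sketched in Python as below. Only the number of scales (5), the number of orientations (8), the kernel size (39 × 39), and the down-sampling factors (4 and 4) come from the text. The wavelength and the width of the Gaussian envelope at each scale are hypothetical values, only the real part of the filter is computed, and the feature-length calculation assumes that each filtered image keeps the size of the input image before down-sampling.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor kernel: a Gaussian envelope times a cosine carrier."""
    c = size // 2
    kernel = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - c, y - c
            # Rotate the coordinates so the carrier runs along direction theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            yr = -dx * math.sin(theta) + dy * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gabor_bank(scales=5, orientations=8, size=39):
    """Bank of scales x orientations kernels (scale parameters are hypothetical)."""
    bank = []
    for s in range(scales):
        wavelength = 4.0 * math.sqrt(2.0) ** s   # hypothetical scale spacing
        sigma = wavelength / 2.0                  # hypothetical envelope width
        for o in range(orientations):
            theta = o * math.pi / orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return bank


def feature_length(height, width, scales=5, orientations=8, step=4):
    """Feature count when every filtered image is down-sampled by `step` per axis."""
    rows = len(range(0, height, step))
    cols = len(range(0, width, step))
    return rows * cols * scales * orientations


bank = gabor_bank()
print(len(bank), len(bank[0]), len(bank[0][0]))  # 40 39 39
print(feature_length(112, 92))                   # 25760
```

Under these assumptions, a 112 × 92 ORL image gives 28 × 23 values per filter after down-sampling, and 40 filters give a feature vector of 25,760 values, which illustrates why down-sampling is applied.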

Other Experiments
In order to generalize the proposed method, various cases and situations were examined. For this purpose, different experiments were carried out, using different preprocessing techniques and different feature extraction methods. These experiments were implemented to compare the performance of the FR system when the original training samples (OTS) were used alone and when the original training samples were used along with the symmetrical training samples (OSTS). For the sake of completeness, the results were compared with the methods in the literature.
All obtained results are summarized in Table 1.

Experiments on the Yale Dataset
In this experiment, from one up to ten facial images of size 154 × 154 were chosen for each person from the Yale dataset and used as the training samples, and the rest of the images were used as the testing samples. These experiments followed the same procedure as those on the ORL dataset, where a variety of methods were tested for preprocessing and feature extraction. These methods were tested using the OTS and the OSTS. The results obtained for the different cases are summarized in Table 2 and Figure 20, along with the performance of the methods in the literature.

Conclusions
This paper presents an effective method to overcome the limited number of training samples using the property of face symmetry. The use of this property also reduced the effect of illumination and pose variations. First, a new set of face images was generated using the left and right halves of each face. Second, the original and generated samples were preprocessed using the 2-D DWT, GLPF, and DoG; then the features of these samples were extracted using the LBP, the GLCM, and the Gabor filter methods. Finally, the Euclidean distance classifier was used to obtain the recognition results. The use of the GLCM alone is not recommended, but it could support the performance of the LBP when the features from both methods are combined. Combining features from different methods provided better performance than using a single method. The Gabor filter was a very effective tool for FR, giving the best results among the examined methods. This paper also showed that the use of a preprocessing stage in the recognition system improved the accuracy of FR, as compared to not using any preprocessing stage. Although the method was especially effective when the set of training samples was small, it took more time to process the increased number of training samples.

GLCM, and Gabor filter were used for feature extraction. Finally, the Euclidean Distance was used for classification, as shown in Figure 1.

Figure 3. Gaussian Low-Pass Filter for σ = 2.

Difference of Gaussians (DoG)
If there are two copies of the same image and these two copies are filtered using two Gaussian filters with different variances σ1 and σ2, to produce two new images,

Figure 4. The representation of the Gray Level Co-Occurrence Matrix (GLCM) with different angles (θ) and different distances (D) from the pixel of interest.
$2^P$ is required to produce decimal values. The traditional LBP [6] analyzes the texture of the image by thresholding a 3 × 3 square neighborhood against the center pixel value. It uses only the sign information to produce the LBP, as illustrated in Figure 5 [4].

Figure 6. Circular (8, 2) neighborhood.

The term $LBP_{P,R}^{u2}$ is used to describe the LBP operation, where u2 denotes the use of uniform patterns. The resulting histogram describes how features such as edges, corners, and uniform areas are distributed over the image. An effective representation must also preserve the spatial information in the image. One strategy to accomplish this is to partition the image into a number of small areas $R_1, R_2, \ldots, R_m$ [6], where m is the number of areas. If the size of the histogram is B, then the length of the feature vector is mB. It follows from this relation that the number of areas m determines the length of the feature vector: selecting small areas results in long feature vectors, leading to heavy memory use and slow classification, whereas selecting large areas causes a loss of spatial information. An example of a preprocessed face image partitioned into thirty-six windows and the resulting face feature histogram are illustrated in Figure 7 [62].
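The relation between the number of areas m, the histogram size B, and the feature length mB can be illustrated with the Python sketch below. For brevity it uses the basic 3 × 3 LBP with B = 256 bins rather than the circular uniform-pattern operator, and the bit order of the neighbors, the function names, and the synthetic 8 × 8 image are assumptions made for the example.

```python
def lbp_codes(image):
    """Basic 3x3 LBP code for every interior pixel of a gray-level image."""
    # Neighbors visited clockwise from the top-left corner (assumed bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(image), len(image[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Only the sign of the difference to the center pixel is kept.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def regional_histograms(codes, grid_rows, grid_cols, bins=256):
    """Concatenate one histogram per area; the result has m * B entries."""
    h, w = len(codes), len(codes[0])
    feature = []
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            hist = [0] * bins
            for y in range(gr * h // grid_rows, (gr + 1) * h // grid_rows):
                for x in range(gc * w // grid_cols, (gc + 1) * w // grid_cols):
                    hist[codes[y][x]] += 1
            feature += hist
    return feature


# Single 3x3 neighborhood: neighbors 9, 7 and 8 are >= the center value 6.
print(lbp_codes([[5, 9, 1], [4, 6, 7], [2, 3, 8]]))  # [[26]]

# Hypothetical 8 x 8 image split into m = 2 x 2 = 4 areas, B = 256 bins.
image = [[(x * y) % 7 for x in range(8)] for y in range(8)]
feature = regional_histograms(lbp_codes(image), grid_rows=2, grid_cols=2)
print(len(feature))  # 1024
```

With thirty-six windows, as in Figure 7, the same construction would give a feature vector of 36B entries.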

Figure 7. Example of a preprocessed face image partitioned into thirty-six windows and its feature histogram using the Local Binary Pattern (LBP).

Figures 8-10 show the effect of changing some parameters of the Gabor function.

Figure 13. Sample images from the Olivetti Research Laboratory (ORL) Dataset.

Figure 14. Sample images from the Yale Face Dataset.

Figure 15. (a) Original image; (b) left side; (c) right side; (d) mirror of left side; (e) mirror of right side; (f) integrating left side with mirror; (g) integrating right side with mirror; and (h) Discrete Wavelet Transform (DWT) of the original image in the first level.

Figure 16. Recognition rates using LBP with original training samples (LBP-OTS) compared with LBP with original and symmetrical training samples (LBP-OSTS).

Figure 17. Recognition rates using LBP, LBP with Discrete Wavelet Transform (DWT-LBP), LBP with Gaussian Low-Pass filter (GLPF-LBP), and LBP with Difference of Gaussian (DoG-LBP) methods, versus size of the training set of the ORL dataset (OSTS).

Figure 18. Recognition rates using the LBP, the Gray Level Co-Occurrence Matrix (GLCM), and the combination of the LBP with the GLCM (LBP-GLCM) methods versus the size of the training set of the ORL dataset (OSTS).

Figure 19. Recognition rates using different methods: Principal Component Analysis (PCA), Local Binary Pattern (LBP), LBP with Discrete Wavelet Transform (DWT-LBP), LBP with Gaussian Low-Pass filter (GLPF-LBP), Gray Level Co-Occurrence Matrix (GLCM), combination of LBP with GLCM (LBP-GLCM), and the Gabor, versus size of the training set of the ORL dataset (OSTS).

Figure 20. Rates of recognition using different methods: Principal Component Analysis (PCA), Collaborative Representation-Based Classification (CRC), Sparse Representation-Based Classification (SRC), Collaborative Representation-Based Classification Using Symmetry (SCRC), and the Gabor Method Using Original and Symmetrical Training Samples (Gabor-OSTS), versus the size of the training set on the Yale dataset.

Table 1. The recognition rates of the different methods on the ORL dataset, using the OTS compared with the OSTS.

Table 2. The recognition rates of the different methods on the Yale dataset, using the OTS and the OSTS.