A Survey of 2D Face Recognition Techniques

Despite the existence of various biometric techniques, such as fingerprints, iris scans and hand geometry, the most efficient and most widely used one is face recognition, because it is inexpensive, non-intrusive and natural. Researchers have therefore developed dozens of face recognition techniques over the last few years. These techniques can generally be divided into three categories according to how the face data are processed: methods that use the entire face as input to the recognition system; methods that consider only some features or areas of the face rather than the whole face; and methods that use global and local face characteristics simultaneously. In this paper, we present an overview of some well-known methods in each of these categories. First, we set out the benefits of, and the challenges to, the use of face recognition as a biometric tool. Then, we present a detailed survey of the well-known methods, explaining the principle of each. After that, a comparison between the three categories of face recognition techniques is provided. Furthermore, the databases used in face recognition are mentioned, and some results of applying these methods to face recognition databases are presented. Finally, we highlight some promising research directions that have recently appeared.


Introduction
The advent of the computer and its capacity to store and visualize large amounts of information have led to the emergence of biometrics, such as face recognition, voice recognition, retinal scanning, fingerprint, etc.
Biometric technology is not only growing in importance but is also being studied by more and more researchers. It encompasses both the technologies used to measure and those used to analyze the unique characteristics of a person. Indeed, there are two types of biometrics: behavioral and physical. The former is generally used for verification, while the latter can be used for either identification or verification.
For instance, "facial recognition" is one of the biometrics used for identification. It has long been an area of great interest to researchers because it is non-intrusive, very popular and inexpensive. Over the last few decades, many techniques have been proposed to recognize a face in a 2D image, with applications that include video conferencing systems [1][2][3][4][5], facial reconstruction, security, etc.
As shown in Figure 1, a face recognition system can be divided into three stages, namely face detection, feature extraction and face recognition. A face recognition system starts by detecting the presence of a face in an image. Generally, a face detection system decides whether or not an image contains a face. If it does, the system's role is to locate the position of one or more faces in the image.
However, this step becomes difficult under variations in illumination, position, facial expression (smiling, surprise, etc.), orientation and morphological criteria (mustaches, glasses, etc.). All of these obstacles can prevent proper face detection and consequently decrease the detection rate.
After detecting a face in an image, we proceed to extract the features of the face [6][7][8]. This step is important for the recognition of facial expressions and also for their animation. It consists of extracting from the detected face a feature vector called the signature, which is then sufficient to represent the face. The signature must be unique to the face and must discriminate between two different individuals. It should be noted that this phase can be performed together with the face detection step.
Finally, face recognition involves authentication and identification. Authentication compares a face with another in order to approve the claimed identity. Identification, however, compares a face with several other given faces in order to find its identity among several possibilities.
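The difference between authentication (one-to-one) and identification (one-to-many) can be sketched as follows. This is a minimal illustration only: the Euclidean distance, the acceptance threshold, the gallery contents and the function names are assumptions made for the example and are not taken from any of the surveyed systems.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(probe, claimed_signature, threshold):
    """Authentication (1:1): accept the claimed identity if the signatures are close."""
    return euclidean(probe, claimed_signature) <= threshold

def identify(probe, gallery):
    """Identification (1:N): return the gallery identity whose signature is nearest."""
    return min(gallery, key=lambda name: euclidean(probe, gallery[name]))

# Hypothetical two-dimensional signatures; real signatures are much longer.
gallery = {"alice": [0.0, 1.0], "bob": [3.0, 4.0]}
probe = [0.2, 1.1]

print(verify(probe, gallery["alice"], threshold=0.5))
print(identify(probe, gallery))
```

In practice, the threshold used for authentication would be tuned on validation data to balance false acceptances against false rejections.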
In this paper, we present the state of the art of existing works in this area by focusing on approaches that revolutionized the world of face recognition, as well as recent approaches.
Although there are many recognition tools, the most commonly used one is fingerprints. Nevertheless, several studies have shown that the most reliable characteristic is iris texture, because it is stable throughout life. Both of these methods (fingerprints and iris texture) have the major drawback of being intrusive. They also impose constraints on users, which is why their application areas are considerably limited.
Conversely, facial image recognition systems impose no constraints on users. Indeed, face recognition has several advantages, among which we can mention:
• Short time: This is one of the fastest biometric modalities. One can talk about real-time application because the user has to go through the biometric system only once.
• High security: Let us take the example of a company that checks the identities of people at the entrance; such a biometric system not only allows the presence of employees to be checked at that time, but also allows any visitor to be added to the system. Therefore, this system does not grant access to individuals who are not included in it.
• Automatic system: This system works automatically without being controlled by a person.
• Easy adaptation: It can be easily used in a company. It only requires the installation of the capturing system (camera).
• High success rate: This system has achieved high recognition rates, especially with the emergence of three-dimensional technology, which makes it very difficult to cheat. This, in turn, gives the system's users confidence in it.
• Acceptance in public places: It allows gigantic databases to be collected and, thus, recognition performance to be improved.
Among the six biometric attributes (face, voice, eye, hand, signature, fingers) considered by [9], facial features are assigned a compatibility score in the MRTD ("Machine-Readable Travel Documents") system based on several evaluation factors, such as enrollment, renewal data, required materials and user perception [10]. This score is shown in Figure 2.
Like other biometrics, face recognition has its specificity and its application fields. It has become a viable technology in our modern life.
There are various areas of application of face recognition, in both the public sector (driving licenses, military applications, sporting events, airports, etc.) and the private sector (online services, commerce, banking, embedded applications, mobile device security, etc.). In applications such as passport control, the images are required to adhere to an existing database. However, in uncontrolled environments, such as non-intrusive monitoring, a subject may be facing up, down, left or right, causing an out-of-plane rotation. Indeed, local approaches, such as elastic bunch graph matching (EBGM) and local binary pattern (LBP), are more robust to pose variations than holistic approaches. However, their tolerance to pose changes is limited to small rotations.
• Presence of occlusions: The use of accessories (sunglasses, scarves, hats, etc.) that partially obstruct the face area, as well as the movement of the individual (such as a movement of the hand), can create an occlusion during which part of the information is lost or replaced. It is worth noting that methods based on local regions have been successfully used in the case of partial occlusion.
• Image falsification: Some facial recognition systems can be easily fooled by face images. For example, mobile device unlocking based on facial recognition can easily be defeated with a picture of the person's face, which may be available on the Internet or on social networks.
• Noise: This noise is introduced by the camera sensor during image capture. The nature of the cameras in use and the quality of their sensors make this noise inevitable, and it badly affects face recognition.
• Blur effect: Movement and atmospheric blur are the main sources of blur in face images. This blurring can be caused either by people's movement (as in surveillance) or by the relative motion between the camera and the captured subject, as is the case in the maritime environment.

2D Face Recognition Survey
Facial recognition has long been a very interesting area that has attracted the attention of many researchers. Indeed, several techniques have been proposed to recognize a face in a 2D image. To understand the principle of each technique, we classify these approaches into three categories according to the manner in which the face image is treated.
The first category includes the global (holistic) approaches, which use the entire face as the input data for the proposed recognition system. These data are then projected onto a subspace of small dimension.
The second category involves local recognition approaches. They do not consider the whole face, but only some features or areas of the face that are classified according to well-defined statistics.
Hybrid approaches and methods based on statistical models represent the third category. This class includes hybrid approaches that simultaneously use global and local characteristics in order to exploit the advantages of the two above-mentioned categories and improve the 2D face recognition rate. It also includes approaches based on statistical models, which formalize the relationships between variables in the form of mathematical equations that describe how one or more random variables are related to one or more other random variables. The model is considered statistical when the variables are not deterministically, but stochastically, related.

Global Approaches
In these approaches, also called appearance-based methods, face images are treated globally, i.e., there is no need to extract characteristic points or facial regions (mouth, eyes, etc.). Thus, a face image is represented by a matrix of pixels, and this matrix is often transformed into a pixel vector to facilitate its manipulation. Although these approaches are easy to implement, they are sensitive to variations (pose, lighting, facial expression and orientation). Indeed, any change in the face image results in a change of pixel values. As mentioned previously, in global methods, the face input data are subsequently projected onto a low-dimensional space. Indeed, patterns of the class "face" occupy a sub-space of the image space, which also contains other patterns (trees, houses, etc.).
Let us consider a 2D face image of 60 × 60 pixels. A few pixels may correspond to the face, while the remaining ones may belong to other shapes (background, car, etc.). Therefore, the original image can be greatly reduced by considering only the face part. Based on the technique used to model the sub-space onto which the face input data are projected, this category may itself be divided into linear and non-linear approaches.

Linear Techniques
These approaches use a linear projection of the input image data from a high-dimensional space onto a space of relatively smaller dimension (the face sub-space). However, such projection has two major drawbacks. First, the non-convex face variations, which allow us to distinguish different individuals, cannot be preserved. Thus, when comparing pixel vectors in a linear subspace, the Euclidean distances used are not very effective in classifying face/non-face forms and individuals. Therefore, the detection and recognition rates of these methods are generally unsatisfactory. Several techniques can be classified as linear techniques:
• Eigenface [11]: This is a very popular approach used for face recognition. It is based on principal component analysis (PCA), which allows the transformation of any training image into an "eigenface". Its principle is the following: given a set of sample face images, it essentially aims at finding the principal components of these faces. This amounts to determining the eigenvectors of the covariance matrix formed by the set of sample images. Each example is then described by a linear combination of these eigenvectors. Figure 3 shows the eigenfaces constructed from the ORL database. To construct the covariance matrix, each face image is transformed into a vector. Each element of the vector corresponds to a pixel intensity. This transformation of the pixel matrix destroys the geometric structure of the image.
• 2D PCA (two-dimensional PCA) [13]: To avoid losing information about the neighborhood during the transformation of the image into a vector, a two-dimensional PCA method (2D PCA) was proposed. This method takes images rather than vectors as input. The reconstructed images appear more clearly when the number of sub-images increases. PCA (eigenfaces) was also used to represent and reconstruct the same face image. It is not so efficient in reconstructing the image.
• Independent component analysis (ICA) [14]: This is a method conceived primarily for signal processing. It consists of expressing a set of $N$ random variables $x_1, \ldots, x_N$ as a linear combination of $N$ statistically independent random variables $s_j$, such that:

$$x_j = a_{j,1} s_1 + a_{j,2} s_2 + \cdots + a_{j,N} s_N \qquad (1)$$

or, in matrix form, $x = A s$, where $A = (a_{j,i})$ is the matrix of the coefficients in Equation (1).
• Multidimensional scaling (MDS) [15]: This is another well-known technique of linear dimension reduction. Instead of preserving the variance of the data during projection, it strives to preserve all distances $\mathrm{dist}(x_i, x_j)$ between each pair of examples by seeking a linear transformation that minimizes an energy function. This minimization problem can be solved by eigenvalue decomposition. Using the Euclidean distance between data, the outputs of MDS are the same as those of PCA. They are obtained by a rotation followed by a projection.
• Non-negative matrix factorization (NMF) [16]: Non-negative matrix factorization is another method that represents the face without using the notion of class. The NMF algorithm, like PCA, treats the face as a linear combination of the basis vectors of the reduced space. The difference is that NMF does not allow negative elements in the basis vectors or in the combination weights. In other words, certain vectors of the space reduced by PCA (eigenfaces) resemble distorted versions of the entire face, while those produced by NMF are localized objects that better reflect parts of the face.
• Linear discriminant analysis (LDA) [17]: There are other techniques that are also constructed from linear decomposition, such as linear discriminant analysis (LDA). While PCA builds a subspace to represent, in an optimal way, "only" the object "face", LDA constructs a discriminant subspace to distinguish, in an optimal way, the faces of different people. LDA, also called "Fisher linear discriminant" analysis, is one of the most widely used approaches for face recognition. It uses a reduction criterion based on the concept of the separability of data per class. LDA involves two stages: the reduction of the original space by PCA, and the computation of the vectors of the final projection space, called "Fisher faces". The latter are calculated on the basis of the class separability criterion, but in the reduced space. This need to reduce the input space is caused by the singularity of the total scatter matrix in the LDA approach. Comparative studies show that methods based on LDA usually give better results than those based on PCA.
• Improvements of PCA, LDA and ICA techniques: Many efforts have been made to improve the linear techniques of subspace analysis for face recognition. For example, the work done in [18] improved PCA to deal with pose variation. The probabilistic subspace was introduced to provide a more meaningful similarity measure within a probabilistic framework. Besides, the author of [19] presented a combination of D-LDA (direct LDA) and F-LDA (fractional LDA), a variant of LDA in which weighting functions are used to avoid the misclassification caused by classes that lie too close together. In addition, the author of [20] proposed an approach based on the multi-linear tensor decomposition of image sets to disentangle several confounding factors within the same face recognition system, such as lighting and pose.
• Independent high intensity Gabor wavelet [21]: To improve face recognition, high intensity feature vectors are extracted from the Gabor wavelet transform of frontal facial images and combined with ICA [14]. Gabor wavelet features have been recognized as one of the best representations for face recognition.
• Gabor features, LDA and ANN classifier [22]: In this work, a methodology was adopted to improve the robustness of the facial recognition system using two popular statistical modeling methods to represent a face image: PCA and LDA. These techniques allow the discriminative features of a face to be extracted. The face image was pre-processed using Gabor wavelets, which eliminate variations due to pose and lighting. PCA and LDA then extract discriminative, low-dimensional feature vectors. The latter were used in the classification phase, during which a back-propagation neural network (BPNN) was applied as a classifier. The proposed system was successfully tested on the ORL face database, with 400 frontal images of 40 different subjects under variable lighting and facial expressions.
Furthermore, a very large number of linear techniques have been used to calculate the feature vectors. Among these techniques, we can mention:
- Regularized discriminant analysis (RDA) [23].
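As an illustration of the eigenface principle described in the list above, the following sketch flattens each image into a pixel vector, centers the data and obtains the eigenvectors of the covariance matrix through a singular value decomposition. It is a minimal sketch under stated assumptions, not the implementation of [11]: the random 8 × 8 "images" stand in for real face data, and the function names `fit_eigenfaces` and `project` are hypothetical.

```python
import numpy as np

def fit_eigenfaces(images, n_components):
    """images: array of shape (n_samples, height, width).
    Returns the mean pixel vector and the leading eigenfaces (one per row)."""
    X = images.reshape(len(images), -1).astype(float)  # each image becomes a pixel vector
    mean = X.mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(images, mean, eigenfaces):
    """Describe each image by its weights on the eigenfaces."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ eigenfaces.T

# Stand-in data (assumption): 10 random 8 x 8 images instead of real faces.
rng = np.random.default_rng(0)
faces = rng.random((10, 8, 8))

mean, eigenfaces = fit_eigenfaces(faces, n_components=5)
weights = project(faces, mean, eigenfaces)
reconstruction = mean + weights @ eigenfaces  # linear combination of eigenfaces
print(eigenfaces.shape, weights.shape)
```

Recognition would then typically compare the weight vectors of a test image and of the reference images, for example with a nearest-neighbor rule; that choice is also an assumption of this sketch.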
Although these linear methods, based on global appearance, avoid the instability of the first geometric methods that were developed, they are not specific enough to describe the subtleties of the geometric manifolds present in the original image space. This is due to their limited ability to handle the non-linearity of facial recognition. In other words, the deformations of nonlinear manifolds can be smoothed out, and concavities may be filled in, with adverse consequences.

Non-Linear Techniques
When the input data structures are linear, linear approaches offer a faithful representation of sparse data. However, when the data are non-linear, the solution adopted by several researchers is to use a function, named the "kernel" function, to build a high-dimensional space in which the problem becomes linear.
Thus, linear techniques for dimensionality reduction can be applied even when the intrinsic structure of the data is not linear. These methods typically use the "kernel trick", which states that any algorithm formulated with a kernel function can be reformulated with another kernel function.
A common process consists of expressing the method in terms of scalar products computed through a kernel function. The kernel "trick" allows working in the transformed space without having to explicitly calculate the image of each datum. In this context, several non-linear approaches have been proposed:
• Kernel principal component analysis (KPCA) [34]: This is a non-linear reformulation of the classic linear PCA technique using kernel functions. KPCA calculates the principal eigenvectors of the kernel matrix rather than those of the covariance matrix. This reformulation of classical PCA can be seen as performing PCA in the high-dimensional space produced by the associated kernel function. KPCA thus allows the construction of nonlinear mappings. First, it calculates the kernel matrix K of the points x i , whose entries are defined by [35].
As the KPCA technique is based on "kernels", its performance greatly depends on the choice of the kernel function K. The kernels typically used are the linear kernel (which amounts to performing classical PCA), the polynomial kernel and the Gaussian kernel [35]. KPCA has been successfully applied to several problems, such as speech recognition [36] or the detection of new elements of a set [34], but its major weakness is that the size of the kernel matrix is the square of the number of samples in the training set, which can quickly become prohibitive.
• Support vector machine (SVM) [37]: This is a learning technique effectively used for "pattern" recognition, with high generalization performance and without the need to add more knowledge. Intuitively, given a set of points belonging to two classes, SVM finds the hyperplane that separates the largest possible fraction of points of the same class on the same side, while maximizing the distance from either class to the hyperplane, called the optimal separating hyperplane (OSH). It reduces the risk of misclassification not only for the examples of the learning set, but also for the unseen examples of the test set. SVM can also be considered as a way to train polynomial neural networks or "radial basis" function classifiers. The learning techniques used here are based on the principle of structural risk minimization (SRM), which states that the best generalization capabilities are achieved by minimizing a bound on the generalization error. The application of SVM to computer vision problems was proposed afterward.
Years later, the work presented in [38] used SVM with a binary tree recognition strategy to solve the problem of face recognition. The features are first extracted, and the discriminant functions between each pair of classes are then learned by SVM. After that, the disjoint test sets are passed to the recognition system. The work in [39] proposed constructing a binary tree structure to recognize the test samples. Other nonlinear techniques have also been used in the context of facial recognition:
- KICA (kernel independent component analysis) [40].
These methods of projecting the space of images on the feature space are nonlinear, allowing, to a certain extent, a better reduction of the image size.However, although these techniques often improve recognition rates on some given tests, they are too flexible to be robust to new data, unlike the linear methods.
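To make the KPCA description above more concrete, the following minimal sketch builds a Gaussian kernel matrix, centers it in feature space and keeps its leading eigenvectors. It is an illustration under stated assumptions rather than the formulation of [34] or [35]: the kernel width `sigma`, the random stand-in data and the function names are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Kernel matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

def kernel_pca(X, n_components, sigma=1.0):
    """Return the projections of the training samples on the leading
    kernel principal components."""
    n = len(X)
    K = gaussian_kernel_matrix(X, sigma)  # n x n: grows with the square of the sample count
    ones = np.full((n, n), 1.0 / n)
    # Center the data in the (implicit) transformed space.
    K_centered = K - ones @ K - K @ ones + ones @ K @ ones
    eigvals, eigvecs = np.linalg.eigh(K_centered)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training sample i on component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Stand-in data (assumption): 12 random 5-dimensional samples.
rng = np.random.default_rng(0)
samples = rng.random((12, 5))
embedding = kernel_pca(samples, n_components=2)
print(embedding.shape)
```

The kernel matrix has one row and one column per training sample, which illustrates why its size can quickly become prohibitive for large training sets.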

Local Approaches
Local approaches treat only some facial features that are later classified according to well-defined statistics.
Local methods, also called feature-based methods, can be classified into two categories:
• Interest-point-based face recognition methods: we first detect the points of interest. Then, we extract features localized on these points.
• Local appearance-based face recognition methods: the face is divided into small regions (or patches) from which local characteristics are directly extracted.

Interest-Point-Based Face Recognition Methods
In these methods, we begin by extracting specific geometric features, such as the width of the head, the distance between the eyes, etc. These data are then used as input to "classifiers" in order to recognize individuals.
These methods can be divided into two classes according to the point of interest. The first class focuses on the performance of the detectors of the characteristic points of the face, whereas the second class deals with more elaborate representations of the information carried by the characteristic points of the face, rather than just the geometric characteristics.
• Dynamic link architecture (DLA) [56]: This approach is based on the use of a deformable topological graph, instead of a fixed topological graph as in [57], in order to propose a facial representation model called DLA. This approach allows the graph to vary in scale and position according to the change in appearance of the considered face. Indeed, the graph is a rectangular grid localized on the image, where the nodes are labeled with the responses of Gabor filters in several directions and at several spatial frequencies, called "jets". The edges, in turn, are labeled by distances, where each edge connects two nodes of the graph. The comparison between two face graphs is performed by deforming the representative graph of the test image and mapping it onto each of the representative graphs of the reference images.
• Elastic bunch graph matching (EBGM) [58]: This is an extension of DLA in which the nodes of the graphs are located on a number of selected points of the face. EBGM was one of the most efficient algorithms in the FERET competition in 1996. Wiskott et al. [58] used Gabor wavelets to extract the characteristics of the detected points because Gabor filters are robust to illumination changes, distortions and scale variations.
• Geometric feature vector [59]: This technique uses a training set to detect the position of the eye in an image. It first calculates, for each point, the correlation coefficients between the test image and the images of the training set, and then it searches for the maximum values.
• Face statistical model [60]: This approach used many detectors of specific features for each part of the face, such as the eyes, nose, mouth, etc. The work presented in [61] proposed building statistical models of facial shapes. Despite all of these research works, there are no sufficiently reliable and accurate feature points.
• Feature extraction by Gabor filter [62]: This consists of detecting and representing facial features from Gabor wavelets. For each detected point, two types of information are stored: its position and its characteristics (the features are extracted using a Gabor filter at this point). To model the relationship between the characteristic points, a topological graph is built for each face.
Years later, the success of these methods was an incentive for some recent works.
• Robust visual similarity retrieval in single model face databases [64].
To conclude, many methods based on extracting feature points have been proposed. They can be effectively used for face recognition when only one reference picture is available. However, their performance depends on effective algorithms for locating facial feature points. In practice, the precise detection of characteristic points is not easy and has not been completely resolved, especially in cases where the shape or appearance of a facial image can vary widely [65].
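Several of the methods above label an interest point with a "jet", that is, the responses of Gabor filters at several orientations and spatial frequencies. The following sketch computes such a jet at a single pixel. It is only an illustration: the isotropic Gaussian envelope, the kernel size, the wavelengths, the number of orientations and the function names are assumptions made for the example, and the parameterizations used in [56], [58] and [62] may differ.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid
    oriented along the direction theta (in radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(1j * 2.0 * np.pi * x_rot / wavelength)

def jet(image, row, col, wavelengths=(4, 8), n_orientations=4, size=9):
    """Magnitudes of the Gabor responses at one interest point.
    Assumes the point lies at least size // 2 pixels away from the border."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    responses = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=wavelength / 2.0)
            responses.append(np.abs(np.sum(patch * kernel)))
    return np.array(responses)

# Stand-in image (assumption): random intensities instead of a real face.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
print(jet(image, 16, 16).shape)
```

Two interest points could then be compared through a similarity measure between their jets, such as a normalized dot product, which is again an assumption of this sketch.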

Local Appearance-Based Face Recognition Methods
Once the local regions are defined, the next step is to choose the best way to represent the information in each region. This step is critical to the performance of the recognition system. The commonly used characteristics are: Gabor coefficients [66], Haar wavelets [67], Fourier transforms, scale-invariant feature transform (SIFT) [68], the characteristics based on the local binary pattern method (LBP) [69], local phase quantization (LPQ) [70], Weber law descriptor (WLD) [71] and binarized statistical image features (BSIF) [72].
• LBP and its recent variant [73]: The original LBP method labels the image pixels with decimal numbers. LBP encodes the local structure around each pixel by comparing it with its eight neighbors in a (3 × 3) neighborhood: the value of the central pixel is subtracted from each neighbor. Strictly negative results are then encoded as zero and all others as one.
For each given pixel, a binary number is obtained by concatenating all of these binary values in a clockwise direction, starting from its top-left neighbor. The corresponding decimal value of the generated binary number is then used to label the given pixel; the resulting binary numbers are called LBP codes [74].
The LBP methodology has recently been extended with a great number of variations in order to improve performance in various applications. These variations focus on different aspects of the original LBP operator:
- Improvement of its discriminatory capacity [75].
- The selection of the neighborhoods [77].
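The basic LBP operator described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [73] or [74]: the image is a plain nested list, border pixels are not handled, and treating the first (top-left) neighbor as the most significant bit is an assumption, since the text only fixes the starting neighbor and the clockwise direction.

```python
def lbp_code(image, row, col):
    """LBP code of the pixel at (row, col), which must not lie on the border."""
    center = image[row][col]
    # Eight neighbors, clockwise, starting from the top-left one.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for d_row, d_col in offsets:
        difference = image[row + d_row][col + d_col] - center
        bit = 0 if difference < 0 else 1  # strictly negative -> 0, otherwise -> 1
        code = (code << 1) | bit          # concatenate the bits (first bit is the MSB)
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]

# Bits, clockwise from the top-left neighbor: 1 0 0 0 1 1 1 1
print(lbp_code(patch, 1, 1))  # prints 143
```

A region is then usually summarized by the distribution of the codes of its pixels; that summary step is outside the scope of this sketch.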
Compared to global approaches, local methods have certain advantages. First, they can provide additional information based on the local regions. In addition, for each type of local characteristic, we can choose the most appropriate classifier.
Despite these advantages, the integration of more general structure information is required in local approaches.
In general, there are two ways to achieve this goal. The first way is to integrate global information into the algorithms using data structures, such as a graph in which each node represents a local feature, while an edge between two nodes represents the spatial relationship between them.
Face recognition then becomes a problem of matching two graphs. The second way is to use score fusion techniques: separate classifiers are used on each local characteristic to calculate a similarity. Then, the similarities obtained are combined to provide a global score for the final decision.
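The score fusion strategy can be sketched as follows. The weighted-sum rule, the example similarity values and the function names are assumptions chosen for illustration; other fusion rules (product, maximum, voting) are equally possible, and the text above does not prescribe one.

```python
def fuse_scores(region_scores, weights=None):
    """Combine per-region similarity scores into one global score
    using a normalized weighted sum."""
    if weights is None:
        weights = [1.0] * len(region_scores)
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, region_scores)) / total_weight

def decide(candidates, weights=None):
    """candidates maps each identity to its list of per-region similarities.
    Returns the identity with the highest fused score."""
    return max(candidates, key=lambda name: fuse_scores(candidates[name], weights))

# Hypothetical similarities for three local regions (e.g., eyes, nose, mouth).
candidates = {"alice": [0.9, 0.4, 0.8], "bob": [0.6, 0.7, 0.5]}

print(decide(candidates))                     # equal weights
print(decide(candidates, weights=[0, 1, 0]))  # only the second region counts
```

The weights make it possible to give less influence to regions that are known to be unreliable, for instance regions that are often occluded.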

Hybrid Approaches and Methods Based on Statistical Models
This third category includes hybrid approaches that simultaneously use global and local characteristics in order to exploit the advantages of both local and global methods. It also includes the techniques based on statistical models. The latter formalize the relations between variables in the form of mathematical equations that describe how one or more random variables are related to one or more other random variables. A model is considered statistical when the variables are not deterministically, but stochastically, related.
• Hidden Markov model (HMM) [78]: Hidden Markov models began to be used in 1975 in different fields, especially in voice recognition. They were fully exploited in speech recognition from the 1980s. Then, they were applied to handwritten text recognition, image processing, music and bioinformatics (DNA sequencing, etc.), as well as to cardiology (segmentation of the ECG signal).
Hidden Markov models, also called Markov sources or "probabilistic functions of Markov chains", are powerful statistical tools for modeling stochastic signals. These models have proven to be efficient since their invention by Baum and his colleagues. They were mainly used in speech processing. They can be defined as a statistical model built on a Markov chain, which is itself a statistical model composed of "states" and "transitions". For face images, the significant facial regions (hair, forehead, eyebrows, eyes, nose, mouth and chin) appear in a natural order from top to bottom, even if the image is taken under small rotations.
For each of these regions, a state of a left-to-right model is assigned. The state structure of the face model and the non-zero transition probabilities are shown in Figure 5:
• Gabor wavelet transform based on the pseudo hidden Markov model (GWT-PHMM) [21]: This approach combines the multi-resolution capability of the Gabor wavelet transform (GWT) with the local interactions of facial structures expressed through the pseudo-hidden Markov model (PHMM). Unlike the traditional "zigzag scanning" method for feature extraction, a continuous scanning method, spiral scanning, is proposed for a better selection of features: it proceeds from the top left to the right, then from top to bottom, then from right to left, and so on, until the bottom right of the image. Furthermore, unlike the traditional HMM, the PHMM does not rely on the hypothesis of state-conditional independence of the visible observation sequence. This result is achieved thanks to the concept of local structures introduced by the PHMM, which is used to extract face bands and automatically select the most informative features of a facial image. Again, the use of the most informative pixels rather than the whole picture makes this face recognition method reasonably quick.
• Recognition system using PCA and discrete cosine transform (DCT) in HMM [79]: Without using DCT, PCA is directly used to reduce the dimension. First, the details of the face are taken in blocks, and the DCT is applied to these blocks. Then, without using the inverse DCT transform, the PCA method is applied directly to the reduced dimensions, which makes this system faster.
• HMM-LBP [80]: This is a hybrid approach, called HMM-LBP, that classifies a 2D face image by using the local binary pattern (LBP) tool for feature extraction. It consists of four steps. First, [80] decomposes the face image into blocks. Then, this approach extracts image features using LBP. After that, it calculates probabilities. Finally, it selects the maximum probability.
• Hybrid approach based on 2D wavelet decomposition and SVD singular values [81]: This approach presents an effective face recognition system using the eigenvalues of the wavelet transform as feature vectors and a radial basis function (RBF) neural network as a classifier. Using the 2D wavelet transform, face images are decomposed into two levels. Then, the average of the wavelet coefficients is calculated to find the characteristic centers.
• Multi-task learning-based discriminative Gaussian process latent variable model (DGPLVM) [82]: Rather than relying on a single data source, this approach exploits data from multiple sources/domains to improve performance in the target domain. The work uses asymmetric multi-task learning, as it focuses only on improving the performance of the target task. This constraint aims at maximizing the mutual information between the distribution of the target-domain data and that of the data from the multiple sources/domains. In addition, the Gaussian face model is a reformulation based on the Gaussian process (GP), a nonparametric Bayesian kernel method. Therefore, this model can also adapt its complexity to complex real-world data distributions without heuristics or manual parameter settings.
• Discriminant analysis on Riemannian manifold of Gaussian distributions (DARG) [83]: Its objective is to capture the underlying data distribution of each image set in order to facilitate classification and make it more robust. To this end, [83] represents each image set as a Gaussian mixture model (GMM) comprising a number of Gaussian components with prior probabilities. It then seeks to discriminate between the Gaussian components of different classes.
From the viewpoint of information geometry, Gaussian components lie on a specific Riemannian manifold. To correctly encode such a Riemannian manifold, DARG uses several distances between Gaussian components and derives a series of provably positive definite probabilistic kernels. With the latter, a weighted kernel discriminant analysis is finally developed that treats the Gaussian components of the GMMs as samples and their prior probabilities as sample weights.
• Affine local descriptors and probabilistic similarity [84]: This technique combines affine-invariant SIFT features with probabilistic similarity to cope with large changes of viewpoint. The affine SIFT, an extension of SIFT that detects local invariant descriptors, generates a series of different views using affine transformations. In this way, it tolerates a difference of viewpoint between the "gallery" face image and the "probe" face image. However, the human face is not flat, because it contains important 3D depth, so the affine model alone is not effective for large changes in pose. It is therefore combined with a probabilistic similarity, which measures the similarity between the "probe" and "gallery" faces based on the distribution of the sum of squared differences (SSD), obtained in an online learning process.
• PCA and Gabor wavelets [85]: This approach uses a two-step face recognition algorithm based on both global and local features. In the first, coarse recognition step, the algorithm applies principal component analysis (PCA) to identify a test image. Recognition ends at this stage if the confidence level of the result proves to be reliable. Otherwise, the algorithm uses this result to select the top candidate images with a high degree of similarity and passes them to the next recognition step, where Gabor filters are used. Since recognizing a face image with Gabor filters is a heavy computational task, the contribution of this work is a more flexible and faster hybrid face recognition algorithm carried out in two stages.
• Manual segmentation-Gabor filter-neural network [86]: This is another feature extraction technique that has given a high recognition rate. In this approach, facial topographical features are extracted using a manual segmentation of the facial regions of the eyes, nose and mouth. Thereafter, the maximum of the Gabor transform of these regions is extracted to compute their local representation. In the learning phase, this approach uses the nearest neighbor method to compute the distances between the three feature vectors of these regions and the corresponding stored vectors.
• HMM-SVM-SVD [87]: This is a combination of two classifiers: SVM and HMM. The former is used with PCA features, while the latter is a one-dimensional seven-state model whose features are based on the singular value decomposition (SVD). This approach uses combination rules to merge the outputs of the SVM and the HMM. It reached a 100% recognition rate on the ORL database.
• Merging of local and global features based on Gabor-contourlet and PCA [88]: This is a combination of two types of features: local features, extracted by the Gabor transform, and global ones, extracted via the contourlet transform. The recognition step is finally performed by the PCA classifier.
• SIFT-2D-PCA [89]: This hybrid approach combines SIFT, a local feature extraction method, and 2D-PCA, which is an improvement of PCA. Since SIFT extracts distinctive features that are invariant to changes in scale, orientation and lighting, it remains beneficial for recognition even if the global features are not available. 2D-PCA is used for the extraction of the global features, as well as for dimension reduction.
• Multilayer perceptron-PCA-LBP [90]: This approach applies a recent recognition method designed to cope with different variations (lighting, head position, facial expressions). To this end, it extracts global and local features using PCA and LBP, respectively. These global and local features are then fed to a multilayer perceptron (MLP). Finally, the classification is performed by the backpropagation multilayer perceptron (BPMLP) network.
• Local directional pattern [91]: This method uses the local directional pattern (LDP). In this approach, the LDP feature at each pixel position is obtained by calculating the response values of the image in eight different directions. Then, this LDP image is used as the input of 2D-PCA for feature extraction and representation. Finally, the nearest neighbor classifier is used for face recognition. Although this method has good recognition accuracy under various lighting conditions, it works only with frontal images.
• Wavelet transform and directional LBP [92]: This begins with pre-processing using the wavelet transform in order to obtain a series of sub-images at different resolutions, and with the wavelet decomposition to obtain components at different scales. Thereafter, a directional wavelet LBP (DW-LBP) histogram is calculated for the different weighted sub-regions of the face image. The chi-square statistic is used to match the histogram sequences. This method reduces the computational complexity and improves the recognition rate, but it cannot be applied to different poses.
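Several of the hybrid methods above ([80], [90], [92]) rely on block-wise LBP features, and [92] matches histograms with the chi-square statistic. The following is a minimal sketch of that common building block only, not the implementation of any cited work. It assumes the basic 8-neighbour LBP operator, non-overlapping square blocks and unweighted histograms; the function names (`lbp_code`, `block_histograms`, `chi_square`, `nearest_gallery`) and the block size are illustrative choices.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c): a neighbour >= centre sets one bit."""
    center = img[r][c]
    # Clockwise neighbour offsets, starting at the top-left pixel (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over the interior pixels."""
    hist = [0.0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def block_histograms(img, block):
    """Concatenate the LBP histograms of non-overlapping block x block tiles."""
    feats = []
    for r0 in range(0, len(img) - block + 1, block):
        for c0 in range(0, len(img[0]) - block + 1, block):
            tile = [row[c0:c0 + block] for row in img[r0:r0 + block]]
            feats.extend(lbp_histogram(tile))
    return feats


def chi_square(h1, h2):
    """Chi-square distance; bins that are empty in both histograms are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def nearest_gallery(probe, gallery, block=4):
    """Return the gallery identity whose block-LBP histogram is closest to the probe."""
    p = block_histograms(probe, block)
    return min(gallery, key=lambda name: chi_square(p, block_histograms(gallery[name], block)))
```

Images are plain lists of rows of gray levels here, so the sketch has no dependencies. In a real system the gallery histograms would be computed once and stored, and the per-region weights used in [92] would multiply each block's contribution to the distance.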

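The two-step PCA and Gabor scheme of [85] described in the list above is essentially a coarse-to-fine cascade. The sketch below illustrates only that control flow, under stated assumptions: the PCA and Gabor feature vectors are taken as already computed, both stages use the Euclidean distance, and the confidence test (ratio of the best to the second-best coarse distance) is a hypothetical stand-in, since the source does not specify how the confidence level is measured. The names `two_stage_identify`, `ratio_threshold` and `top_k` are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def two_stage_identify(probe_coarse, probe_fine, gallery, ratio_threshold=0.6, top_k=3):
    """Coarse-to-fine identification.

    gallery maps identity -> (coarse_vector, fine_vector), standing in for
    precomputed PCA and Gabor features. Returns (identity, stage), where
    stage is "coarse" or "fine".
    """
    # Stage 1: rank every gallery entry in the cheap, coarse feature space.
    ranked = sorted(gallery, key=lambda name: euclidean(probe_coarse, gallery[name][0]))
    if len(ranked) == 1:
        return ranked[0], "coarse"
    best = euclidean(probe_coarse, gallery[ranked[0]][0])
    second = euclidean(probe_coarse, gallery[ranked[1]][0])
    # Hypothetical confidence rule: stop early when the best match is
    # clearly closer than the runner-up.
    if second > 0 and best / second <= ratio_threshold:
        return ranked[0], "coarse"
    # Stage 2: re-rank only the top candidates with the costlier features.
    candidates = ranked[:top_k]
    winner = min(candidates, key=lambda name: euclidean(probe_fine, gallery[name][1]))
    return winner, "fine"
```

The expensive features are compared for at most `top_k` candidates, and only when the coarse stage is ambiguous, which is where the speed-up claimed for the two-stage design would come from.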
Comparison between Global, Local and Hybrid Approaches
In this section, we present a brief summary of the advantages and disadvantages of each category of face recognition approaches. In addition, within each category, we highlight the advantages and disadvantages that characterize each sub-class. This comparison is summarized in Table 1.

GLOBAL
Advantages:
- Quick to implement.
- Calculations of medium complexity.
Disadvantages:
- Very sensitive to variations in illumination, pose and facial expression.
- Requires a very large memory size.
- No preservation of the non-convex face variations that allow individuals to be differentiated.

GLOBAL, linear
Advantages:
- Reduction of the dimension of the images.
- Representation space faithful to the data when the data structure is linear.
Disadvantages:
- The Euclidean distances used are not very effective either for classification between facial and non-facial shapes or for classification between individuals.
- Face detection and recognition rates are generally unsatisfying.

GLOBAL, non-linear
Advantages:
- The use of non-linear methods to project the image space onto the feature space remarkably reduces the image size.
- Improved recognition rates on given tests.
Disadvantages:
- Too flexible to generalize well to new data, unlike the linear methods.

LOCAL
Advantages:
- May provide additional information based on local parts.
- For every type of local feature, the most suitable classifier can be chosen.
- Less sensitive to lighting variations.
Disadvantages:
- The integration of global information is often required.

LOCAL, interest-point-based face recognition methods
Advantages:
- These methods can be useful and effective for face recognition where one reference picture is available.
Disadvantages:
- Their performance depends greatly on the effectiveness of the feature point localization algorithms.
- The detection and the geometric feature extraction are not easy and have not been reliably resolved, especially when there are occlusions, or variations in pose and facial expressions, or when the shape of the face image can vary widely [56].
- Geometric characteristics alone are not sufficient to fully represent a face, and other useful information, such as the gray-level values of the image, is completely discarded.

LOCAL, local appearance-based face recognition methods
Advantages:
- Ability to choose the best way to represent information from each region.
Disadvantages:
- The step is critical to the system's performance.

HYBRID
Advantages:
- The combination of both global and local analysis of a face can improve the ability of the classifier.
- Allows one to exploit complementarities and provides more efficient systems and faster recognition.
Disadvantages:
- More difficult to implement than the other two approaches.

2D Face Databases
Many face databases (public or private) are available for research purposes. These databases differ from each other according to several criteria. The most interesting ones are the following:
• The number of images contained in each database is the most important criterion.
It is thus recommended to choose the appropriate database when testing an algorithm. Indeed, some have a well-defined protocol allowing direct comparison of the results. Moreover, the choice should depend on the problem to be tested: illumination, recognition over time, facial expressions, etc. The availability of many different images per person can be a decisive argument for the proper performance of an algorithm.
Table 2 below shows the main 2D face databases. These databases present many variations in terms of: RGB or gray images, size, number of people, number of images per person, variations of the image (illumination (i), pose (p), expression (e), occlusions (o), time delay (t)) and home page on the web.

Results
The emergence of face recognition in analyzing 2D face images and the enormous interest given to this research domain have led to a continuous improvement of the results obtained by testing the previously mentioned approaches on the different 2D face databases presented in the previous section. Table 3 below shows some results reported by the authors of these approaches. For clarity, these results are grouped according to the database used.

The Emergence of New Promising Research Directions
As shown in Table 3, 2D face recognition has reached a significant level of maturity and a high success rate. After over three decades of research, the face recognition state of the art continues to improve and to give more accurate results, driven by demand from different research fields, such as pattern recognition and image processing. It is unsurprising that it continues to be one of the most active research areas of computer vision. Over the last few years, new promising research directions have appeared.
• 3D face recognition: Despite the high success rate achieved in 2D face recognition, the latter still faces two major unsolved problems: illumination and pose variations. To overcome these two issues, 3D face recognition has emerged in order to provide more exact shape information of facial surfaces. For this reason, several recent techniques using 3D data have been proposed [143][144][145][146][147][148]. 3D face recognition is believed to have the potential to achieve better accuracy than its 2D counterpart by measuring the geometry of rigid features on the face.
• Multimodal face recognition: Some recent research works state that the fusion of 2D and 3D face recognition is more accurate and robust than either single modality [149]. Related works investigate the potential benefit of fusing 2D and 3D features [150,151].
• Deep learning techniques: Deep learning techniques [152] have established themselves as dominant in machine learning. Deep neural networks (DNNs) have been top performers on a wide variety of tasks, including image classification, speech recognition and face recognition. In particular, convolutional neural networks (CNNs) have recently achieved promising results in face recognition. These deep learning techniques often use the public database LFW (Labeled Faces in the Wild) to train and evaluate CNNs.
• Infrared imagery: Amongst the various approaches that have been proposed to overcome face recognition limitations, such as pose, facial expression, illumination changes, as well as facial disguises, which can significantly decrease recognition accuracy, infrared (IR) imaging has emerged as a novel promising research direction [153,154]. IR imagery is a modality that has attracted particular attention due to its invariance to illumination changes [155]. Indeed, data acquired using IR cameras have many advantages compared with common cameras, which operate in the visible spectrum. For instance, infrared images of faces can be obtained under any lighting condition, even in a completely dark environment, and there is some evidence that the infrared technique may achieve a higher degree of robustness to facial expression changes [156].
Finally, researchers have gone further by combining these new directions, as in [157], which benefits from multimodal face recognition and infrared imagery, and [158], which uses both multimodal face recognition and deep learning.

Conclusions
In this paper, we first introduced face recognition as a biometric technique. Subsequently, we presented the state of the art of face recognition approaches, classified into three categories. Next, we presented the face databases used by researchers in this field to test their approaches and a table summarizing the experimental findings. Finally, we highlighted some new promising research directions.

Figure 2. Compatibility score for various biometric technologies in the Machine-Readable Travel Documents (MRTD) system.

Figure 3. Eigenfaces (eigenvectors) corresponding to the 12 largest eigenvalues, computed from the ORL (AT&T) database [12].

Figure 4 shows five reconstructed images from an image of the ORL database, obtained by adding together the first d eigenvectors (d = 2, 4, 6, 8, 10) of the sub-images.

Figure 5. The recognition of a face from the right side to the left side using HMM.

Figure 6 summarizes the classification of face recognition approaches presented in this paper.

Figure 6. Classification of face recognition approaches.

• The number of images per individual class: since each individual is designated by a class c, the number of images of a class is the number of images representing that individual. Indeed, images are acquired under different conditions (orientation, facial expression, etc.).
• The size of images.
• Pose and orientations of faces.
• The change of illumination.
• Sex of the acquired persons.
• The presence of artifacts (glasses, beards, etc.).
• The presence of static images or videos.
• The presence of a uniform background.
• The period between shots.

Table 1. Comparative table of 2D face recognition approaches.

Table 3. Summary of the results of the different face recognition approaches.