The Effects of Facial Expressions on Face Biometric System's Reliability

Abstract: The human mood has a temporary effect on the face shape due to the movement of its muscles. Happiness, sadness, fear, anger, and other emotional conditions may affect the face biometric system's reliability. Most of the current studies on facial expressions are concerned with the accuracy of classifying subjects based on their expressions. This study investigated the effect of facial expressions on the reliability of a face biometric system to find out which facial expression puts the biometric system at greater risk. Moreover, it identified a set of facial features with the lowest facial deformation caused by facial expressions, to be generalized during the recognition process regardless of which facial expression is presented. To achieve the goal of this study, an analysis of 22 facial features between the normal face and the six universal facial expressions was obtained. The results show that face biometric systems are affected by facial expressions, where the disgust expression achieved the most dissimilar score, while the sad expression achieved the lowest dissimilar score. Additionally, the study identified the top five and top ten facial features with the lowest facial deformations on the face shape across all facial expressions. Besides that, the relativity score showed less variance across the sample when using the top facial features. The obtained results of this study minimize the false rejection rate in the face biometric system and subsequently provide the ability to raise the system's acceptance threshold to maximize the intrusion detection rate without affecting user convenience.


Introduction
Authentication is the main line of defense in verifying a user's identity and rejecting illegitimate users from accessing resources. Three types of authentication can distinguish any person among the population: the first approach concerns the user's knowledge, such as a password; the second concerns what the user has, such as a national ID card; and the third defines users by their humanistic traits, i.e., "biometrics". This last type of authentication is considered the most robust compared to the other approaches, as these features cannot be forgotten, shared, or stolen. Biometric authentication is the procedure of recognizing users through their physiological and behavioral traits, such as fingerprints, iris, gait, keystrokes, and face. Although facial biometrics (FB) is one of the most potent biometric technologies, it is a challenging process. Face recognition is more complicated than other biometrics, such as fingerprint and iris identification, since the human face can be viewed from various angles with different poses.
Different factors can affect system reliability, such as illumination, occlusion, aging, facial surgery, and facial expressions. Facial expressions (FE) are means of expressing human feelings and reactions, which include many interconnecting elements of facial muscle movements [1]. Those expressions result in facial feature shape changes [2]; if the user shows a different expression than the one stored in the database, such as a neutral face, this will lead to a different matching result.
In biometric systems, two samples from the same person may give different matching scores due to causes such as FE, lighting effects, and imaging conditions. These causes give the following errors [3-5]: a type I error, where the system prevents an authorized person from accessing the resources because they cannot be identified; and a type II error, where the system gives unauthorized access by identifying an unauthorized user as an authorized one. These error types are evaluated using the false rejection rate (FRR), the measure of the possibility that the system denies a genuine user, and the false acceptance rate (FAR), the measure of the possibility that the system accepts an illegitimate user [5].
We cannot ignore the FRR and the FAR when assessing the performance of an FB system's security. Both rates affect the system's security level and user convenience, and both have an impact on each other: as the FAR goes down, the FRR goes up, and vice versa. The FAR concerns system security, while the FRR concerns user convenience; if we raise system security, user convenience decreases. As a result, we have one of two options: a more secure system that is less user-friendly, or a more user-friendly system that is less secure. Most entities prioritize user convenience over security. There is no "magic bullet" or "one size fits all" solution; nevertheless, we can find something that balances these two issues.
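The FAR/FRR trade-off described above can be made concrete with a short sketch. This is illustrative code, not the paper's implementation; all score values are synthetic.

```python
# Sketch (not the paper's code) of computing FAR and FRR from matching
# scores, and of their trade-off as the acceptance threshold changes.

def rates(genuine, impostor, threshold):
    """Return (FAR, FRR) for a similarity-score system at a threshold.

    A transaction is accepted when its score >= threshold, so:
    FAR = accepted impostor attempts / all impostor attempts (type II error)
    FRR = rejected genuine attempts  / all genuine attempts  (type I error)
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

genuine = [0.95, 0.90, 0.84, 0.76, 0.64]   # same-person comparison scores
impostor = [0.18, 0.33, 0.47, 0.58, 0.72]  # different-person comparison scores

# Sweeping the acceptance threshold: FAR can only fall and FRR can only
# rise as the threshold increases -- the trade-off described above.
sweep = [(t,) + rates(genuine, impostor, t) for t in (0.4, 0.6, 0.8)]
```

Raising the threshold makes the system stricter (lower FAR, better security) but rejects more genuine users (higher FRR, worse convenience), which is exactly the balance this study targets.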
With the rapid day-by-day increase of cybersecurity crimes, FB will become vital for authentication in everyday life. Many studies have been done, but even after continuous research, a truly robust and worthwhile outcome has not yet been achieved. Therefore, this study aims to analyze the FB system's reliability under the influence of different FEs to identify a set of facial features with the lowest deformations caused by FEs. These features can then be used during the recognition process, regardless of what expression is presented, to maintain the biometric system's performance. The result of this analysis will help in minimizing the FRR in order to raise the acceptance threshold without affecting user convenience. This paper presents a brief overview of FE and FB performance evaluation in Section 2, while Section 3 reviews the latest studies in the related fields. Section 4 explains the work's methodology; then, Section 5 discusses the results. The findings of this study are listed in Section 6, and Section 7 concludes the work.

Background
In contrast to human recognition, automatic recognition is a challenging process. Moreover, face biometrics are more difficult than other biometrics (such as fingerprint and iris) due to the fact that the human face can be viewed from various angles with different expressions. Furthermore, different factors can affect the recognition process, which can be summarized as extrinsic, such as pose variation and illumination, or intrinsic, such as aging and facial expression [6,7]. Facial expressions (FE) play a non-verbal communication role between people and are involved in a wide range of applications, e.g., human behavior, customer relationship management, social robots, and expression recognition [8]. Facial expressions of emotion can be categorized into anger, disgust, fear, happiness, neutrality, sadness, and surprise. Expressions can change the face shape temporarily because of the deformation of the face's muscles [9]. A facial biometric system's reliability can be affected by the subject's facial expressions; happiness, sadness, and other facial emotions may lead to varying levels of facial identification accuracy and, as a consequence, affect the system's reliability [10]. Six basic emotions (BEs) have been identified [11]: happiness, sadness, surprise, anger, disgust, and fear.
The performance of a face biometric system can be evaluated by identifying FRR and FAR errors. To avoid the ambiguity caused by systems that allow multiple attempts or multiple templates versus single attempts or single templates, there are two types of performance evaluation: the decision error rate and the matching error rate [5]:

- False rejection rate (FRR): the measure of the possibility that the system will deny an authorized transaction.
- False acceptance rate (FAR): the measure of the possibility that the system is willing to accept an unauthorized transaction.

Related Works
To the best of the author's knowledge, no study has investigated the effect of facial expression on a face biometric system to find out which facial features have the most impact on the facial deformations in order to improve system performance. Most of the related papers on facial expression are concerned with recognition accuracy and the classification of samples based on their modes. However, some work has been done in different related areas.

Feature Extraction and Facial Landmarks in Face Biometrics
Facial landmarks (FL) and the extracted features are further aspects needed in the field of FB. Özseven and Düğenci [12] compared FB performance using distances and slopes between FL with statistical and classification methods. They used the BioID dataset, which consists of 1521 pictures of 23 subjects. Their results showed that the best accuracy was achieved by distances and slopes together, then by distances, then by slopes. They used the FGNet annotation, which has 20 points, as shown in Figure 1. They found that landmark points 2, 3, 4, 5, 6, and 7 were strongly influenced by FE, so they considered only the other 14 points in their analysis, as shown in Figure 2.

Amato et al. [13] compared 5-point features with 68-point features, as shown in Figures 3 and 4. They conducted their experiments on videos taken in a real scenario by surveillance cameras. They used the dlib library and its FL detectors to implement the approach presented in [14], which returns an array of 68 points as (x, y) coordinates. The results on the Wild dataset showed that the 68 points had a high mean average precision.

Banerjee [15] measured the distances between the FL shown in Figure 5: (a) the distance between the eyes, (b) the distance between the ears, (c) the distance between the nose and forehead, and (d) the width of the lip, in addition to the following angles, whose sum is 180°: (e) the angles between the eyes and nose and (f) the angles between the ears and mouth. The following metrics were used to measure the distances between the face objects: Euclidean distance (EU), the city block metric, Minkowski distance, Chebyshev distance, and cosine distance.

Sabri et al. [16] developed a set of algorithms in a 3D face module where the captured face was segmented to obtain the FLs: the nose tip, the mouth corners, and the left and right eye corners, as shown in Figure 6. The algorithm computed two triangles: the first between the eyes' center and the mouth center, and the second between the eyes' center and the nose tip. Additionally, they measured the distances between the left and right eye corners and the mouth corners, as shown in Figure 7.

Meanwhile, Napieralski et al. [17] used the Viola-Jones algorithm to detect three facial objects: the eyes, nose, and mouth, where the midpoint of each region was calculated, as shown in Figure 8. They used the EU to measure the distance between the eyes, the distance between the lips and nose, the nose width, and the lip height and width.

Benedict and Kumar [18] designed a geometric-shaped facial extraction for face recognition that identifies subjects by finding the center and the corners of the eye using eye detection and eye localization. Then, 11 fiducial points were derived from the given face: three points on the eye, the lateral extremes, the nose tip, the midpoint of the lip, and two points on the lateral extremes of the lips, as shown in Figure 9.
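The five landmark-distance metrics listed for Banerjee's approach [15] can be sketched as follows. The landmark coordinates below are hypothetical pixel values for illustration only.

```python
# Sketch of the five distance measures used between facial landmarks.
import math

def euclidean(a, b):
    # straight-line (L2) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Manhattan (L1) distance
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    # generalizes L1 (p=1) and L2 (p=2); p=3 chosen arbitrarily here
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def chebyshev(a, b):
    # maximum coordinate difference (L-infinity)
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

# hypothetical landmark coordinates (pixels)
left_eye, right_eye = (120.0, 85.0), (180.0, 85.0)
eye_distance = euclidean(left_eye, right_eye)
```

Each metric weighs the coordinate differences differently, which is why studies often compare several of them when matching landmark layouts.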
A study by Gurnani et al. [19] showed that the salient regions (the eyes, nose, and mouth) were the dominant features that help classify the facial soft biometrics: age, gender, and FE. Barroso et al. [20], in contrast, found that expression recognition performance using the whole face outperforms that obtained using only some regions.

Facial Expression Recognition Applications
Some studies tried to improve facial recognition procedures, such as Teng et al. [21], who proposed a 3D Convolutional Neural Network (CNN)-based architecture for FE recognition in videos. Mangala and Prajwala [22] used Eigenfaces and principal component analysis for the same purpose. Meanwhile, Ivanovsky et al. [23] used a CNN to detect smiles and FE. Sun et al. [24] proposed an FE recognition framework that discovers the region of interest to train an effective face-specific CNN. Yang et al. [25] utilized facial action units to recognize expressions. Liang ji et al. [26] proposed a deep-learning-enhanced gender conditional random forest for expressions in an uncontrolled environment to address the gender influence. Jeong et al. [27] proposed deep joint spatiotemporal features for facial expression recognition based on deep appearance and geometric neural networks. Mehta et al. [28] recognized emotions based on their intensities, while Jala and Tariq [8] aimed to go beyond the classification and recognition of known FE to cluster unknown facial behaviors.
In terms of FE recognition applications, the deficits of FE in Huntington's disease were studied by Yitzhak et al. [29] to improve FE recognition by predicting the severity of the patients' motor symptoms. Mattavelli [30] studied it in Parkinson's disease. Flynn et al. [31] assessed the effectiveness of automated emotion recognition in adults and children for the benefit of different applications, such as the identification of children's emotions before clinical investigations.
In other respects, FE can be used as a means of authentication. Delina et al. [32] tried to address the vulnerability of a single-biometric authentication model using the subject's physiological and behavioral facial traits; their approach identifies users by fusing the face shape and the FE to prove their legitimacy. Additionally, Ming et al. [33] used FE for liveness detection in addition to face verification.


The Effect of Facial Expression on Face Biometric Reliability
A few papers analyzed the performance of the biometric system under the effect of the subject's mood and expressions, such as Pavael and Lordanescu [34], who analyzed recognition performance by eyewitnesses, where their results indicated that happy and sad expressions significantly influenced the process of facial identification. Dalapicola et al. [35] considered the periocular trait and investigated the effect of FE on this region, where it was found that recognition using a CNN was sensitive to the region's deformation caused by FE. The experimental study was done on the extended Cohn-Kanade (CK+) dataset, which contains image sequences of 123 subjects, where each subject has between 1 and 11 samples and the number of frames varies from 4 to 71. Each sample represents an FE. Azimi [36] investigated whether emotional faces have a statistically significant effect on the FB's matching score. The experiment was done using python dlib face recognition and Verilook on the Jaffe dataset, which involved ten female users with seven different modes: neutrality, happiness, sadness, anger, disgust, fear, and surprise.
His results showed the following: (1) comparing the neutral faces and FE, the average genuine similarity was degraded; (2) the sadness and disgust expressions are the most dissimilar among the other expressions; (3) the best class to verify with was the normal face, as there is no facial deformation or muscle movement; (4) for users who enrolled with happy, angry, surprised, disgusted, sad, and fearful expressions, the best expressions to verify with are fearful, sad, fearful, sad, and fearful expressions, respectively; (5) the lowest matching scores were achieved when users who provided happy, angry, surprised, disgusted, sad, and fearful faces during enrollment identified themselves with disgusted, surprised, angry, happy, and angry faces, respectively, during verification. Another study on the effect of FE on FB systems was carried out by Márquez-Olivera et al. [37], who analyzed their FB system under the influence of FE. They concluded that failures occurred when the subjects expressed surprise, as it has the maximum facial deformations, while the sadness and anger expressions show high deformation in the eye regions. On the other hand, the system performed better when the subject expressed happiness. Moreover, they also tried to overcome the effect of FE in the FB system by recognizing people under their expressions; they proposed a hybrid model of Alpha-Beta associative memories with a correlation matrix and K-Nearest Neighbors. Although the best face recognition accuracy under the influence of FE was 90%, achieved on anger expressions, Khorsheed and Yurtkan [38] claimed that Local Binary Pattern features form a strong base for face recognition under the influence of FE.
Different aspects were studied by Azimi and Pacut [1] to investigate whether the effect of FE on the FB system is gender-dependent. Their results on the Stirling dataset showed that 13 female faces exhibited more intense FE than ten male faces, using python face recognition and Verilook neurotechnology. This means that the similarity score of neutral faces vs. all FE for male subjects was better than for female subjects; therefore, the influence of FE on the FB system is gender-dependent.
This paper aims to study the impact of FE on FB systems due to the lack of such studies in the field, as can be noticed from the previous works. For instance, [15,16] investigated the utilization of the selected features and landmarks for face recognition purposes only. Although the accuracy in [12] was highest when both slopes and distances were used, this study will use distances only, as it analyzes which muscles and facial features are affected by FE rather than performing recognition. The studies [21-33] evaluated the performance of FE classification. While utilizing the periocular region as a biometric trait [35] fails when the face presents posture changes, occlusions, closed eyes, and other changes, in FB the recognition process can use features other than the ones that expose failures. Meanwhile, the study in [36] used only ten female subjects, without males. This work aims to fill some of these gaps within the field, as illustrated in the next sections.

Methodology
Humans express different emotions during daily life, and a robust FB system's performance should not be affected by those expressions and moods. The objective is to analyze the FB system's performance under the influence of different FEs and to identify the facial features with the lowest deformations caused by FE, so that only those features are used during recognition, regardless of which expression is presented. This study achieves this goal by answering the following questions: (1) Is the effect of FE on the FB system significant? (2) Which FE gives the best results? (3) Which FE gives the worst results? (4) What is the impact of each FE on the similarity score? (5) Which facial features have the lowest facial deformation, such that they can be generalized during recognition and are not significantly affected by the expressed emotion? (6) What is the FRR performance under the influence of FE?
To answer those questions, we used the IMPA-FACES3D dataset [39] to obtain the distances and positions of 22 facial features. After that, we determined the relativity shift score (RSS) of the different facial features and the total similarity score (SS) between the neutral face and the six universal expressions for each subject. Based on the analysis of the gained data, we identified a set of facial features with the lowest facial deformations that score a higher SS. This section illustrates the methodology in detail.

Dataset Description
The IMPA-FACES3D dataset [39] includes acquisitions of 38 male and female subjects in 12 distinct poses. This study uses the neutral mode and the six universal expressions: joy, sadness, surprise, anger, disgust, and fear. Figure 10 shows an example of those expressions for subject #1. The set is composed of 22 males and 16 females aged between 20 and 50 years; we used only 36 subjects (22 males and 14 females), as two subjects are missing.
To achieve the objectives, we developed an in-house Python script built upon OpenCV and dlib that works as explained in the next sections.

Face Detection and Acquisition
After the two faces are uploaded, each is converted into a greyscale image with a single layer of 8-bit pixels (values between 0 and 255). The greyscale image is then fed into dlib to identify 68 facial landmark (FL) key points. FLs are key points on the detected face's shape that make up the facial features; facial features that can be compared with other facial features are constructed from the distances between FLs. This study uses the 68-point template for FLs, as shown in Figure 11 and explained in Table 1.
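The greyscale step above can be sketched in plain Python. The paper's script presumably relies on OpenCV (e.g., cv2.cvtColor) for this conversion, so the function below is only an illustration of the standard ITU-R BT.601 luma weights that map an RGB pixel to a single 8-bit intensity:

```python
# Minimal sketch of converting a color pixel to an 8-bit greyscale value.
# The weights 0.299/0.587/0.114 are the standard BT.601 luma coefficients;
# a real pipeline would apply this per pixel via OpenCV.

def to_grayscale(pixel_rgb):
    """Map an (R, G, B) tuple to a single 8-bit intensity in 0-255."""
    r, g, b = pixel_rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# Pure white stays at the top of the 8-bit range, pure black at the bottom.
print(to_grayscale((255, 255, 255)))  # 255
print(to_grayscale((0, 0, 0)))        # 0
```
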

Preprocessing
After that, we use static (silent) features to adjust the size of the uploaded faces to the standard size, using points 1 and 17 in Figure 11. Moreover, to align the faces, we kept the angle of the line joining the midpoints of the two eyes at zero degrees, as shown in Figure 12, where the blue line should be aligned with the red line.
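This preprocessing can be sketched as two small computations: a scale factor from the static jaw points 1 and 17, and the rotation angle of the inter-eye line (zero when the face is aligned). The coordinates and the standard width of 100 pixels below are hypothetical:

```python
import math

# Sketch of the preprocessing step: resize using the static points 1 and 17
# of the 68-point template, and measure the eye-line angle for alignment.
# All coordinates and the standard width are hypothetical example values.

def scale_factor(p1, p17, standard_width=100.0):
    """Ratio that resizes the face so the p1-p17 distance matches the standard."""
    return standard_width / math.dist(p1, p17)

def eye_line_angle(left_eye, right_eye):
    """Angle (degrees) of the line joining the two eye centres; 0 = aligned."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

print(scale_factor((0, 0), (50, 0)))        # 2.0 -> enlarge to standard size
print(eye_line_angle((10, 20), (60, 20)))   # 0.0 -> already aligned
```
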


Features Extraction and Verification
We identified 22 facial features to be analyzed, as shown in Table 2 and Figure 13. Table 2 explains the facial features and their corresponding points among the 68 landmark points, while Figure 13 illustrates the features on the subject's face.

After that, a comparison between the neutral mode and each other expression of the same subject is conducted to obtain the 22 facial features, the RSS, and the SS. The neutral mode and the other expressions were compared to explore the effect of FE on the genuine score. Assuming the template provided in the enrollment session is the neutral mode, as it is the most common mode in our daily routine, the comparisons conducted for each subject were: neutral mode vs. happy expression, neutral mode vs. sad expression, neutral mode vs. surprise expression, neutral mode vs. anger expression, neutral mode vs. disgust expression, and neutral mode vs. fear expression. In each comparison, the 68 FL points were obtained for each image (expression) to create a list of the face's organs as in Table 1, which helps in obtaining the facial features shown in Table 2.
In order to obtain the 22 facial features described in Table 2 and Figure 13, we used the Euclidean distance, Equation (3), to measure the straight-line distance between two FLs, where (x1, y1) are the coordinates of the first landmark and (x2, y2) are the coordinates of the second landmark:

EU = sqrt((x1 - x2)^2 + (y1 - y2)^2)    (3)

Up to this point, the values of each facial feature in the neutral mode and in the expression have been recorded for each comparison. After that, we considered the known measurement error rate [41], defined in Equation (4):

error rate = |experimental value - accepted value| / accepted value    (4)

Furthermore, we adapted it to our problem and called it the "relativity shift score" (RSS), where the EU of the expression's feature corresponds to the experimental value in Equation (4) and the EU of the neutral mode's feature corresponds to the accepted value:

RSS = |EU_expression - EU_neutral| / EU_neutral    (5)

The RSS measures how relative two faces are to each other in terms of a particular facial feature for each comparison, in the range (0, 1), where 0 means the feature is identical (the facial feature stayed unchanged and the expression did not change it), while 1 means the features are completely different; hence, a lower value means higher similarity.
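Equations (3) and (5) translate directly into two small Python helpers. The landmark coordinates and the mouth-width measurements below are hypothetical example values:

```python
import math

# Sketch of Equations (3) and (5): the Euclidean distance between two
# landmarks, and the relativity shift score (RSS) of a feature measured on
# an expression face relative to the same feature on the neutral face.

def euclidean(lm1, lm2):
    """Equation (3): straight-line distance between two landmark points."""
    return math.sqrt((lm1[0] - lm2[0]) ** 2 + (lm1[1] - lm2[1]) ** 2)

def rss(eu_expression, eu_neutral):
    """Equation (5): relative shift, 0 = unchanged, higher = more deformed."""
    return abs(eu_expression - eu_neutral) / eu_neutral

# Hypothetical mouth width: 60 px when neutral, 75 px when smiling.
print(euclidean((0, 0), (3, 4)))  # 5.0
print(rss(75.0, 60.0))            # 0.25
```
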
Next, we summed all the relativities, divided them by their number, and subtracted the result from 1 to introduce the "similarity score" (SS) measure in Equation (6). The result lies in the range (0, 1), where 1 is the best case, meaning the two faces are 100% "similar", i.e., identical in terms of all their facial features (all relativities are 0):

SS = 1 - (1/n) * sum_{i=1..n} RSS_i    (6)

After obtaining all the relativities and similarities for the 36 subjects in the 6 FEs, we conducted a statistical analysis to calculate the mean ± SD of the RSS and SS of the 36 subjects for each expression and for all expressions. Based on the results, we ranked the facial features that scored the best RSS and accordingly selected the best five features, the best ten features, and the worst ten features, and compared them in terms of their SS for each FE and for all expressions.
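Equation (6) is a one-line computation over the per-feature RSS values. The RSS values below are hypothetical:

```python
# Sketch of Equation (6): SS = 1 - mean(RSS) over the selected features,
# reported as a percentage, so SS = 100% means every feature stayed unchanged.
# The per-feature RSS values are hypothetical example data.

def similarity_score(rss_values):
    """Equation (6) as a percentage: 1 minus the mean RSS."""
    ss = 1.0 - sum(rss_values) / len(rss_values)
    return ss * 100.0

rss_per_feature = [0.02, 0.05, 0.10, 0.03]   # one RSS per compared feature
print(round(similarity_score(rss_per_feature), 2))  # 95.0
```
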

Results and Discussion
The following sections describe and discuss the results achieved between the neutral face and the six FEs.

Happy Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the happy expression compared with the neutral mode.
After comparing the neutral mode with the happy expression for 36 subjects, the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the happy facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 3 and Figure 14 show that the top five features in terms of RSS were: right eye position, chin position, mouth position, nose position, and left eye position. Of the best ten features, the next best five were: forehead position, forehead width, distance between eyes, chin width, and left eye width. Additionally, we identified the worst ten features as follows: mouth width, distance between left ear and mouth, distance between right ear and mouth, nose width, distance between left eye and mouth, distance between right eye and mouth, distance between right eye and eyebrow, distance between left eye and eyebrow, distance between nose and forehead, and distance between left eye and nose.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The overall SS for all facial features was 91.7539%, while the SS for the top five was 97.5262%, the SS for the top ten was 96.7665%, and the SS for the worst ten was 86.4146%.
The SS increased by 5.77% and 5.01% after we selected the top five and top ten features, respectively. Additionally, Table 3 and Figure 14 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be very close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.
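The ranking and top-k selection described above can be sketched as follows. The two features and their per-subject RSS values are hypothetical; with the full data there would be 22 features and 36 subjects per expression:

```python
from statistics import mean

# Sketch of the ranking step: average each feature's RSS over all subjects,
# rank ascending (a lower mean RSS means a more stable feature), and slice
# the top-five / top-ten / worst-ten sets. The data below are hypothetical.

rss_by_feature = {
    "right eye position": [0.01, 0.02, 0.01],   # stable under the expression
    "mouth width":        [0.15, 0.22, 0.30],   # strongly deformed
}

ranked = sorted(rss_by_feature, key=lambda f: mean(rss_by_feature[f]))
top_five, top_ten, worst_ten = ranked[:5], ranked[:10], ranked[-10:]

print(ranked[0])   # the most stable feature ranks first
```
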

Sad Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the sad expression compared with the neutral mode.
After comparing the neutral mode with the sad expression for 36 subjects, the values of the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the sad facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 4 and Figure 15 show that the top five features in terms of RSS were: right eye position, chin position, forehead width, forehead position, and mouth position. Of the best ten features, the next best five were: left eye position, nose position, distance between eyes, distance between left eye and mouth, and chin width. Additionally, we identified the worst ten features as follows: right eye width, distance between left eye and nose, mouth width, distance between right eye and nose, distance between nose and forehead, distance between left ear and mouth, distance between right ear and mouth, distance between left eye and eyebrow, distance between right eye and eyebrow, and nose width.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The SS for all facial features was 93.9482%, while the SS for the top five was 97.2315%, the SS for the top ten was 96.4777%, and the SS for the worst ten was 91.2179%.
The SS increased by 3.2833% and 2.5295% after we selected only the top five and top ten features, respectively. Additionally, Table 4 and Figure 15 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.

Surprise Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the surprise expression in comparison with the neutral mode.

After comparing the neutral mode with the surprise expression for 36 subjects, the values of the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the surprise facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 5 and Figure 16 show that the top five features in terms of RSS were: right eye position, left eye position, mouth position, forehead position, and nose position. Of the best ten features, the next best five were: chin position, forehead width, distance between eyes, chin width, and left eye width. Additionally, we identified the worst ten features as follows: distance between left eye and mouth, distance between right eye and nose, distance between right eye and mouth, mouth width, distance between nose and forehead, distance between right ear and mouth, distance between left ear and mouth, nose width, distance between right eye and eyebrow, and distance between left eye and eyebrow.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The SS for all facial features was 92.6649%, while the SS for the top five was 97.1492%, the SS for the top ten was 96.4605%, and the SS for the worst ten was 88.6843%.
The SS increased by 4.4843% and 3.7956% after we selected only the top five and top ten features, respectively. Additionally, Table 5 and Figure 16 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.

Anger Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the anger expression in comparison with the neutral mode.
After comparing the neutral mode with the anger expression for 36 subjects, the values of the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the anger facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 6 and Figure 17 show that the top five features in terms of RSS were: right eye position, forehead width, chin position, forehead position, and distance between eyes. Of the best ten features, the next best five were: mouth position, left eye position, nose position, chin width, and right eye width. Additionally, we identified the worst ten features as follows: distance between right eye and mouth, distance between left eye and nose, mouth width, distance between right eye and nose, distance between left ear and mouth, distance between nose and forehead, distance between right ear and mouth, nose width, distance between right eye and eyebrow, and distance between left eye and eyebrow.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The SS for all facial features was 92.5003%, while the SS for the top five was 96.7651%, the SS for the top ten was 96.1960%, and the SS for the worst ten was 88.4048%.
The SS increased by 4.2648% and 3.6957% after we selected only the top five and top ten features, respectively. Additionally, Table 6 and Figure 17 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.

Disgust Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the disgust expression in comparison with the neutral mode.
After comparing the neutral mode with the disgust expression for 36 subjects, the values of the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the disgust facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 7 and Figure 18 show that the top five features in terms of RSS were: chin position, right eye position, mouth position, forehead width, and left eye position. Of the best ten features, the next best five were: nose position, forehead position, distance between eyes, chin width, and left eye width. Additionally, we identified the worst ten features as follows: distance between left eye and nose, distance between right eye and mouth, distance between left eye and mouth, distance between nose and forehead, mouth width, distance between right ear and mouth, distance between left eye and eyebrow, distance between right eye and eyebrow, and nose width.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The SS for all facial features was 92.0128%, while the SS for the top five was 96.9166%, the SS for the top ten was 96.0945%, and the SS for the worst ten was 87.9381%.
The SS increased by 4.9037% and 4.0817% after we selected only the top five and top ten features, respectively. Additionally, Table 7 and Figure 18 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.

Fear Expression vs. Neutral Mode
The following analysis shows which set of measured facial features has the best and worst scores in terms of the relativity shift score for the fear expression in comparison with the neutral mode.
After comparing the neutral mode with the fear expression for 36 subjects, the values of the 22 facial features were obtained for both modes, and we applied Equation (5) to get the RSS of each facial feature. We then ranked the fear facial features by RSS, where the lowest value is the best in terms of similarity. The results in Table 8 and Figure 19 show that the top five features in terms of RSS were: right eye position, chin position, mouth position, nose position, and left eye position. Of the best ten features, the next best five were: forehead position, forehead width, distance between eyes, chin width, and distance between left eye and nose. Additionally, we identified the worst ten features as follows: distance between right eye and mouth, right eye width, mouth width, distance between right eye and nose, distance between right ear and mouth, distance between nose and forehead, distance between left ear and mouth, nose width, distance between right eye and eyebrow, and distance between left eye and eyebrow.
Next, we applied Equation (6) to obtain the SS for all 22 facial features, the top five, the top ten, and the worst ten features. The SS for all facial features was 93.3441%, while the SS for the top five was 96.9428%, the SS for the top ten was 96.1102%, and the SS for the worst ten was 90.5102%.
The SS increased by 3.5987% and 2.7661% after we selected only the top five and top ten features, respectively. Additionally, Table 8 and Figure 19 show that the top five and top ten features have the lowest standard deviations, which means their RSS values across the sample tend to be close to the mean, while the higher standard deviation of the worst ten features indicates that their RSS values deviate from the mean.
As a summary, Table 9 and Figure 20 show the SS for all six expressions in the following situations: SS for all 22 facial features, top ten, top five, and worst ten. It can be observed that the top five facial features achieved the best SS.
Figure 20. The similarity score (SS) for all six expressions using all features, top five, top ten, and worst ten.

All Expressions vs. Neutral Mode
After obtaining the RSS and SS between the neutral mode and each expression for 36 subjects, the results showed that the happy, sadness, surprise, anger, disgust, and fear FEs all scored a lower SS compared to the neutral mode. The most dissimilar FE was the disgust expression at 92.01%, which agrees with the results of Azimi [36] and Márquez-Olivera [37], while the least dissimilar was the sad expression at 93.94%, contrary to what was reported in [37], where it was the happy expression. The fear, surprise, anger, and happy expressions achieved 93.34%, 92.66%, 92.50%, and 92.40%, respectively, as shown in Table 10 and Figure 21.
Hence, we assumed that certain facial features cause the changes between the neutral SS and the FE SS and lead to a lower score. To find out which features, in order to improve the overall SS, we calculated the mean RSS of each facial feature with respect to all expressions for the 36 subjects. We then ranked the facial features by RSS, where the lowest value is the best in terms of similarity. The results showed that the top five features in terms of RSS were: right eye position, chin position, mouth position, left eye position, and forehead position. Of the best ten features, the next best five were: nose position, forehead width, distance between eyes, chin width, and left eye width.
Finally, the worst ten features were: distance between left eye and mouth, distance between right eye and mouth, distance between right eye and nose, distance between nose and forehead, distance between right ear and mouth, mouth width, distance between left ear and mouth, distance between right eye and eyebrow, distance between left eye and eyebrow, and nose width. This result is shown clearly in Figure 22 and Table 11. Additionally, Figure 22 shows the mean of every facial feature in all six expressions; it indicates a clear pattern in the lower RSS values, where all the expressions follow the same pattern. For example, all expressions have a low RSS for the right eye position, while the RSS of the mouth width feature is high and inconsistent.
Next, we applied Equation (6) to obtain the SS for the top five, top ten, and worst ten features with respect to all expressions, based on the rank in Table 9. The purpose is to find a set of facial features suitable for all expressions that achieves a high SS no matter which FE is presented. Table 12 and Figure 23 show the SS with respect to all expressions. Table 13 and Figures 24-26 show the SS for all 22 facial features, top five, top ten, and worst ten with respect to each expression, and the top five, top ten, and worst ten with respect to all expressions.
Table 13. SS for all 22 facial features, top ten, top five, and worst ten with respect to each expression and with respect to all expressions.

Face Biometric System Performance
To validate our methodology, we applied Equation (1) to determine the FRR at three acceptance thresholds (99%, 95%, and 90%) and compared the rates obtained using all facial features, the top five features, and the top ten features. Given 216 instances (36 subjects × 6 FE comparisons), the results, as shown in Table 14, are as follows. At the 99% acceptance threshold, all 216 instances were rejected using all facial features, versus 214 rejections using the top ten features and 190 rejections using the top five features. At the 95% threshold, 171 instances were rejected using all features, 41 using the top ten features, and 34 using the top five features. At the 90% threshold, there were 30 rejections using all features, one rejection using the top ten features, and none using the top five features. The number of rejections of genuine users therefore decreased when using the top five and top ten features: at the 99% threshold, the rejection rate dropped by 0.92% between all features and the top ten features, and by 12.03% between all features and the top five features; at the 95% threshold, it dropped by 60.18% and 63.42%, respectively; and at the 90% threshold, by 13.42% and 13.88%, respectively.
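This validation step can be sketched as follows, assuming Equation (1) is the usual FRR definition (falsely rejected genuine attempts divided by total genuine attempts, where an attempt is rejected when its similarity score falls below the acceptance threshold). The similarity scores here are randomly generated stand-ins, not the study's data.

```python
# Sketch of the FRR computation (assumed form of Equation (1)).
import random

def false_rejection_rate(scores, threshold):
    """Fraction of genuine similarity scores rejected at the given threshold."""
    rejected = sum(1 for s in scores if s < threshold)
    return rejected / len(scores)

# 216 synthetic genuine comparison instances (36 subjects x 6 expressions).
random.seed(0)
scores = [random.uniform(0.88, 1.0) for _ in range(216)]

for t in (0.99, 0.95, 0.90):
    print(f"threshold {t:.2f}: FRR = {false_rejection_rate(scores, t):.2%}")
```

Running the same loop over scores computed from the top-five feature subset rather than all 22 features reproduces the comparison reported in Table 14.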

Top Five vs. Top Ten Facial Features
It can be seen that the SS for the top five features is very close to the SS for the top ten features, as shown in Figure 27. For example, the SS of the anger expression was 96.192% with the top ten features and 96.631% with the top five. This observation allows either of the two sets to be used, depending on the face biometric system's needs and restrictions. However, since there is no meaningful difference in computational cost between five and ten features, using the top ten features is recommended from a security perspective.
By comparing the neutral mode with the six FEs, the average genuine SS is degraded, which means that FEs affect the FB system's reliability. Additionally, the SS results with respect to each expression are similar, or very close, to the SS results with respect to all expressions, meaning that the facial features driven by muscle movements during the expressions are largely the same across all expressions. Moreover, from all of the previous results and observations, it has been shown that the top five facial features are: right eye position, chin position, mouth position, left eye position, and forehead position. The top ten facial features (right eye position, chin position, mouth position, left eye position, forehead position, nose position, forehead width, distance between eyes, chin width, and left eye width) are suitable for all FEs in FB and can be generalized during the recognition process, since they provide a higher similarity score no matter which expression is presented.
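To illustrate how such positional features might be derived in practice, the sketch below computes per-feature centroids from dlib's standard 68-point facial landmarks. The mapping of the paper's feature names to landmark index groups is our assumption, not the paper's published mapping, and the forehead is omitted because the 68-point model does not cover it; synthetic coordinates stand in for a real detector's output.

```python
# Sketch: positional features as centroids of dlib 68-point landmark groups
# (index groups follow dlib's standard annotation; the mapping of the
# paper's features to these groups is a hypothetical illustration).
import numpy as np

LANDMARK_GROUPS = {
    "right_eye_position": range(36, 42),   # right-eye contour points
    "left_eye_position": range(42, 48),    # left-eye contour points
    "mouth_position": range(48, 68),       # outer + inner lip points
    "chin_position": [8],                  # tip of the chin
    "nose_position": range(27, 36),        # nose bridge and nostrils
}

def feature_positions(landmarks):
    """landmarks: (68, 2) array of (x, y) points; returns a centroid per feature."""
    return {name: np.asarray([landmarks[i] for i in idx]).mean(axis=0)
            for name, idx in LANDMARK_GROUPS.items()}

# Toy example with synthetic landmark coordinates in place of detector output.
pts = np.random.default_rng(0).uniform(0, 200, size=(68, 2))
centroids = feature_positions(pts)
print({k: v.round(1).tolist() for k, v in centroids.items()})
```

Pairwise distances between these centroids (e.g., distance between eyes) would then yield the width- and distance-type features analyzed above.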
Finally, evaluating the performance using the FRR showed that using the top five facial features leads to more genuine users being correctly accepted and fewer being falsely rejected by the face biometric-based authentication system.

Figure 27. Top ten vs. top five features.

Findings
The results of this study found that: (1) The following FEs impact the FB system's reliability: happy, sad, surprised, anger, disgust, and fear. (2) The sad expression achieved the best SS, 93.94%. (3) The disgust expression achieved the worst SS, 92.01%. (4) Out of the 22 facial features, the following top five features have the best RSS, as they have the lowest facial deformations: right eye position, chin position, mouth position, left eye position, and forehead position; the top ten features were: right eye position, chin position, mouth position, left eye position, forehead position, nose position, forehead width, distance between eyes, chin width, and left eye width; meanwhile, the worst ten features, with the highest facial deformations, were: distance between left eye and mouth, distance between right eye and mouth, distance between right eye and nose, distance between nose and forehead, distance between right ear and mouth, mouth width, distance between left ear and mouth, distance between right eye and eyebrow, distance between left eye and eyebrow, and nose width. (5) Furthermore, the mean of the RSS showed less variance across the sample when using the top facial features. (6) Additionally, the performance of the top five and the top ten features was very similar. (7) Finally, the top features can be generalized during the recognition process regardless of which expression is presented during verification.
Based on these findings, the FRR has been minimized, and the recognition acceptance threshold can be raised to the highest possible value without compromising user convenience. As a result, intrusion detection will be improved.

Conclusions
This paper investigated the effect of facial expressions on the face biometric system's reliability. The happy, sad, surprised, anger, disgust, and fear facial expressions impact the accuracy and may cause false rejections of genuine users. The statistical analysis of the facial features obtained between the neutral face and the six expressions identified a set of facial features with the lowest facial deformations. The top features identified in this study can be utilized in a part-based feature representation that removes some parts (regions) of the face and exploits the regions of interest so that they will not affect the recognition accuracy [42]. Through the findings of this study, the false rejection rate has been minimized, as the false rejection instances caused by facial expressions were reduced; consequently, the acceptance threshold can be raised to maximize the intrusion detection rate without affecting user convenience.