Facial Paralysis Detection on Images Using Key Point Analysis

Abstract: The inability to move the muscles of the face on one or both sides is known as facial paralysis, which may affect the ability of the patient to speak, blink, swallow saliva, eat, or communicate through natural facial expressions. The well-being of the patient could also be negatively affected. Computer-based systems as a means to detect facial paralysis are important in the development of standardized tools for medical assessment, treatment, and monitoring; additionally, they are expected to provide user-friendly tools for patient monitoring at home. In this work, a methodology to detect facial paralysis in a face photograph is proposed. A system consisting of three modules (facial landmark extraction, facial measure computation, and facial paralysis classification) was designed. Our facial measures aim to identify asymmetry levels within the face elements using facial landmarks, and a binary classifier based on a multi-layer perceptron approach provides an output label. The Weka suite was selected to design the classifier and implement the learning algorithm. Tests on publicly available databases reveal outstanding classification results on images, showing that our methodology for designing a binary classifier can be extended to other databases with great results, even if the participants do not execute similar facial expressions.


Introduction
The face plays an important role in visual communication. By looking at the face, a person can automatically extract many nonverbal messages, such as the identity, intent, and emotion of others [1]. Facial paralysis is known as the inability to move the muscles of the face on one or both sides. This inability can result from nerve damage due to congenital conditions; trauma; or disease, such as stroke, brain tumor, or Bell's palsy. The problem can affect one or both sides of the face, with the noticeable drooping of the features and problems with speaking, blinking, swallowing saliva, eating, or communicating through natural facial expressions. These physical signs of facial paralysis can provide information to the clinician concerning the state of the patient [2].
The process of detecting facial paralysis is important in assessing the severity of the facial nerve and muscle malfunction and in order to record physical improvements when treating and monitoring the patient. Computer-based automatic facial paralysis detection is important in developing standardized tools for medical assessment, treatment, and monitoring and to reduce healthcare costs through the inclusion of automatic processes [2]. Additionally, computer-based systems are expected to provide user-friendly tools, in the near future, for patient monitoring at home.
Particularly in the field of computer vision, the analysis of facial signs has motivated many studies on automated facial nerve function assessment from the biomedical visual capture of the face. The visual capture of the face extends from traditional images and video to infrared (thermal imaging) and also depth images (stereo photogrammetry). A few image-based methods perform feature extraction by detecting the face region in the image and then extracting key points (also known as facial landmarks) based on a specific model. There are some publicly available models, also known as shape predictors, that extract facial landmarks using Haar cascades. The 68-point predictor proposed by Matthews and Baker in [3] is widely known and employed in the field of facial analysis. In the literature, some authors create their own shape predictor to extract facial landmarks with better performance on this task [4]. In a common approach, the extraction of facial landmarks is a crucial step, because these key points are used to compute other measures, such as the distances, angles, and areas between the landmarks. In the facial analysis process, those measures are fed into a classifier to train it to detect facial nerve damage and, later, its severity.
To achieve the automatic classification of facial nerve damage, it is necessary for a clinical practitioner to evaluate a set of facial images to label them as healthy or unhealthy faces. With annotated data, any classifier could be trained using the extracted facial measures and the labels to detect facial nerve damage in a new image. Although a variety of automated facial nerve function assessment solutions have been proposed, their outcomes fall into two main categories: (1) non-semantic numerical values quantifying static, dynamic, and synkinetic facial features; (2) semantic grade of facial nerve function designed by the clinician [5]. Most existing solutions belong to the first category; however, most of these solutions remain at the method discussion phase, and only a few of them have been implemented as prototypes, for example, the Emotrics software [6]. Solutions in the second category aim to quantify the facial nerve function according to a specific facial nerve grading scale designed by clinicians. To achieve this target, machine learning techniques should be applied to build a predictive model that is trained on labelled data and is capable of making predictions on new data. Here, the data are facial images from either a healthy subject or a facial palsy patient, and the prediction is a binary decision (healthy or patient) or the grade of facial nerve function. Classifiers for these applications are based on support vector machines (SVM), artificial neural networks (ANN), k-nearest neighbors (KNN), or hybrid classifiers [5]. For a new subject, a typical system extracts computational features from the facial data and then calls a pre-trained classifier to map the features to the facial nerve malfunction.
Some works use facial landmarks (i.e., facial key point extraction) before performing facial analysis [6][7][8]. Other works specifically intend to detect facial paralysis as a binary classification problem. Kim et al. proposed a smartphone-based automatic diagnosis system that consists of three modules: a facial landmark detector, a feature extractor based on facial regions, and a classifier [9]. Three facial movements were analyzed: resting, smiling, and raising the eyebrows. The system was evaluated on a private database with 23 facial palsy patients and 13 healthy volunteers. The authors reported their highest classification accuracy at 88.9%. Hsu et al. proposed a deep learning solution for the detection of facial palsy using a regular camera [10]. They formulated the facial palsy identification as an object detection problem and considered the deformation regions caused by facial palsy, or simply the palsy regions, on a patient's face as the target objects. Their proposed solution is a hierarchical network composed of three components: face detection, facial landmark detection, and local palsy region detection. The authors reported a prediction accuracy of 93% on their private database. Barbosa et al. proposed a method to classify facial paralysis in two stages: the discrimination of healthy from unhealthy subjects, and then facial palsy classification among unhealthy subjects [11]. They used four facial expressions (at rest, raising the eyebrows, screwing up the nose, and smiling) to measure symmetry. They built a classification model by combining a rule-based approach and a machine learning method (hybrid classifier). The authors reported a sensitivity of up to 98.12% in the discrimination between healthy and unhealthy subjects on their private database.
In this work, we aim to detect facial paralysis in a set of face images, meaning that we aim to identify whether the subject is healthy or a patient (binary classification). In this methodology, the evaluation of the input image is performed independently of the facial movement executed by the patient, unlike other works, which use a set of facial gestures to compute symmetry variations between expressions before detecting facial paralysis. Our system measures extracted facial landmarks using simple mathematical operations that keep the implementation uncomplicated but are still effective. Our facial measures aim to detect the level of asymmetry between the two sides of the face and also characterize facial gestures so that the learning algorithm is able to relate each asymmetry level with the expression found in the image. The proposed measures extract information from the eyebrows, eyes, nose, and mouth; they are not divided by specific regions and do not have pre-marked zones, as other authors propose. Our classification approach is based on a multi-layer perceptron, which provides a label as an output.
The contributions of this research are: a set of facial measures, easily computed from facial landmarks, for the binary classification of facial paralysis independently of the facial movement performed by the subject; evaluation using two public image databases; and a classification model to detect facial paralysis. The remainder of the paper is organized as follows: Section 2 describes the proposed methodology, Section 3 introduces our findings and discussion, and finally Section 4 provides concluding remarks.

Methodology
The framework of the proposed facial paralysis detection system is shown in Figure 1. As mentioned before, a typical system starts with the extraction of facial landmarks. However, it has been established by other authors that the available facial models are not suited to analyze subjects suffering from facial impairments [9,11,12]. Therefore, we took advantage of the 68-key-point shape predictor developed by Guarin et al. and presented in [12]. Their model, called the MEE shape predictor, is publicly available for testing and comparison. In preliminary experiments, the MEE shape predictor performed better at predicting facial landmarks in patients' photographs, especially when compared with two of the available OpenCV implementations [13]: the LBF model [14] and the Kazemi model [15].

Facial Landmarks Extraction
The process to extract facial landmarks from a face within an input image, preferably a frontal face photograph, is based on the method and model proposed by Guarin et al. in [12]. The full facial landmark extraction process is as follows:

1. Convert the color input image to gray scale.
2. Resize the gray image using a scale factor $s_f = W / nW$, where $nW = 200$ and $nH = H / s_f$. W and H refer to the size of the input image (width and height, respectively), and nW and nH to the size of the resized image.
3. Detect the face on the resized image using the publicly available dlib libraries.
4. Re-scale the detected face area using $s_f$.
5. Predict the facial landmarks on the re-scaled face area.
6. Store the extracted data for future data processing.
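The arithmetic of steps 2 and 4 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, rounding to whole pixels is an assumption, and step 4 is interpreted here as mapping the detected face box back to the coordinates of the original image.

```python
def scale_factor(width, height, new_width=200):
    """Return (s_f, nW, nH) for resizing an image to a fixed width."""
    s_f = width / new_width            # s_f = W / nW
    new_height = round(height / s_f)   # nH = H / s_f, rounded to pixels (assumption)
    return s_f, new_width, new_height


def rescale_box(box, s_f):
    """Map a (left, top, right, bottom) box from the resized image to the original one."""
    return tuple(round(v * s_f) for v in box)


s_f, nW, nH = scale_factor(800, 600)
print(s_f, nW, nH)                           # 4.0 200 150
print(rescale_box((50, 40, 150, 140), s_f))  # (200, 160, 600, 560)
```

Detecting the face on the small image and predicting the landmarks on the full-resolution face area keeps the detection step fast without losing landmark precision.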
The MEE shape predictor is trained to detect 68 points, but only 51 of them are of interest in this work. As depicted in Figure 2, the 51 points are reorganized to facilitate the computation of measures. Since the accuracy of the facial symmetry quantification can be affected by the tilt angle of the head [16], tilt correction was performed using the previously extracted facial landmarks before computing any symmetry measure. In this work, the first and last jaw points, points 48 and 49 in Figure 2b, were used to correct the possible tilt of the head. Here, point 48 is set as the origin of the coordinate plane, and the angle between it and point 49 is the tilt angle to be corrected with a geometric rotation of the key points. The angle between two points was obtained according to Equation (1), where $x = Pa_x - Pb_x$ and $y = Pa_y - Pb_y$. The facial landmark rotation process is as follows:

1. Compute a rotation matrix M. The function getRotationMatrix2D from OpenCV [13] can be employed using P48 and the angle ∠(P48, P49) between the two jaw points.
2. Rotate the 51 key points using M.
3. Store the rotated data for future data processing.
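The tilt correction can be sketched in plain Python as follows. Two assumptions are made: the tilt angle is taken as atan2(y, x) in degrees, measured from the first jaw point to the last one, and the matrix is built by hand with the layout documented for OpenCV's getRotationMatrix2D (scale fixed to 1) so that the example has no external dependencies. The coordinates are hypothetical.

```python
import math


def tilt_angle(p_start, p_end):
    """Angle in degrees of the line from p_start to p_end (atan2 form is an assumption)."""
    dx = p_end[0] - p_start[0]
    dy = p_end[1] - p_start[1]
    return math.degrees(math.atan2(dy, dx))


def rotation_matrix(center, angle_deg):
    """2x3 affine matrix laid out like the result of cv2.getRotationMatrix2D (scale = 1)."""
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def rotate_points(points, m):
    """Apply the 2x3 affine matrix m to every (x, y) point."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]


# Hypothetical jaw end points standing in for P48 and P49 of a tilted face.
p48, p49 = (100.0, 200.0), (300.0, 240.0)
m = rotation_matrix(p48, tilt_angle(p48, p49))
r48, r49 = rotate_points([p48, p49], m)
```

After the rotation, P48 stays in place and P49 lands on the same horizontal line, so the remaining key points can be compared between the left and right sides without the bias introduced by head tilt.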

Facial Measures Computation
As stated before, in this methodology the evaluation of the input image aims to detect a level of asymmetry between the two sides of the face. The proposed measures extract information from the eyebrows, eyes, nose, and mouth. As depicted in Figure 3, 34 distances were calculated using the facial key points. Some of those distances (A to K) were inspired by the work of [17], which evaluated objective measures from face photographs with a target other than facial paralysis detection; nevertheless, they seem to be a good reference to characterize the healthy human face. Here, distances A to Q were proposed to detect any level of asymmetry between face sides (or regions), while measures R to Z were proposed to identify the facial movement executed by the patient; then, the asymmetry level was computed.

The proposed facial measures aim to identify any level of asymmetry between the left and right sides of the face, although locating the paralyzed side (or region) of the face is out of the scope of this work. Most measures were compared and ratios were computed to obtain a value in the range [0, 1], where 0 means fairly asymmetric and 1 is assumed to be closer to a healthy face. Before describing our proposed measures, it is worth noting that the angle between two points was computed according to Equation (1), the slope between points was computed following Equation (3), the Euclidean distance was employed according to Equation (4), and the perimeter of a closed segment was computed using Equation (5):

$$S(P_s, \ldots, P_l) = \sum_{x=s}^{l-1} d(P_x, P_{x+1}) + d(P_s, P_l) \qquad (5)$$

where S is a closed segment, $P_s$ is the start point, and $P_l$ is the last point, which represents the end point of the segment.

In this work, it is assumed that a healthy face is symmetric with respect to the position of the face elements (e.g., the eyelids, the mouth corners, the eyebrows) independently of the face gesture. In the case where those elements are not symmetric to each other, it is assumed that a level of paralysis will be diagnosed. Therefore, 29 symmetry measures are proposed and described in Table 1; please refer to Figures 2 and 3 if further detail is required. In f19, f20, and f21, W is the distance depicted in Figure 3c, and the perimeter values $W_l$ and $W_r$ are calculated as $W_l = S(P_{28}, P_{29}, P_{30}, P_{31}, P_{37}, P_{38}, P_{39})$ and $W_r = S(P_{31}, P_{32}, P_{33}, P_{34}, P_{35}, P_{36}, P_{37})$.
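The building blocks of these measures can be sketched as follows. The distance and perimeter functions follow the Euclidean distance and Equation (5); the ratio function is only a hypothetical example of how two mirrored measures could be mapped to [0, 1], since the actual definitions of f1 to f29 are those given in Table 1.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def perimeter(points):
    """Perimeter of a closed segment: consecutive distances plus the closing edge."""
    open_path = sum(distance(a, b) for a, b in zip(points, points[1:]))
    return open_path + distance(points[0], points[-1])


def symmetry_ratio(left, right):
    """Hypothetical ratio in [0, 1]: smaller value over larger value; 1 means equal."""
    larger = max(left, right)
    return 1.0 if larger == 0 else min(left, right) / larger


# Toy closed segments standing in for the left and right mouth halves.
w_l = perimeter([(0, 0), (1, 0), (1, 1), (0, 1)])   # unit square
w_r = perimeter([(0, 0), (2, 0), (2, 2), (0, 2)])   # square of side 2
print(w_l, w_r, symmetry_ratio(w_l, w_r))           # 4.0 8.0 0.5
```

The points are passed in the order in which they are listed, which matters for segments such as $W_l$ whose landmark indices are not consecutive.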

Multi-Layer Perceptron Classifier
Classification is a long-established problem in machine learning, and the use of artificial neural networks is fairly common. In this work, we employed the Weka (Waikato Environment for Knowledge Analysis) [18] machine learning suite to train a binary classifier based on the multi-layer perceptron approach. The Weka function known as the Multilayer Perceptron (MLP) requires at least five parameters to learn: learning rate (L), momentum (M), training time (N), hidden layers (H), and seed (S). Both the learning rate and the momentum are used to update the weights of the connections between nodes. The training time is the number of epochs to train through. Hidden layers refers to the number of layers within the MLP and the number of neurons in each. The seed is a number used to randomly initialize the weights of the connections between nodes. This classifier uses backpropagation to learn a multi-layer perceptron to classify instances. For more information regarding the Weka software, refer to [18].
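The role of the learning rate and the momentum can be illustrated with the textbook gradient-descent-with-momentum update for a single weight. This is a sketch of the general rule, not a transcription of Weka's source code; the gradient values are made up, and L and M are set to the values selected in Section 3.

```python
def momentum_step(weight, gradient, previous_delta, learning_rate, momentum):
    """One gradient-descent-with-momentum update; returns (new_weight, delta)."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


# Two consecutive updates of one weight with a constant, made-up gradient.
w, d = 0.5, 0.0
for grad in (1.0, 1.0):
    w, d = momentum_step(w, grad, d, learning_rate=0.2045, momentum=0.1909)
```

The learning rate scales the step taken along the current gradient, while the momentum carries over a fraction of the previous step, which smooths the trajectory of the weights across epochs.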

Experiments and Results
There are a number of methodologies aiming to detect facial paralysis in a photograph. However, collaboration among the research community has been difficult due to the unavailability of public datasets, mainly because of patient privacy, as many patients would prefer not to share their biometric data. This situation motivated us to test our classification system on two different databases that recently became publicly available. Some remarks on these databases are given here.
First, we used the Massachusetts Eye and Ear Infirmary (MEEI) database, which is an open source set of facial photographs and videos representing the entire spectrum of flaccid and nonflaccid facial palsy collected by Greene et al. and introduced in [19]. The MEEI database was released to serve as a resource for facial palsy education and research. Initially, to demonstrate the utility of the database the relationship between the level of facial function and the perceived emotion expression was successfully characterized using a machine learning-based algorithm [19]. Later, the MEEI database was employed to develop a novel machine learning algorithm for the fast and accurate localization of facial landmarks in photographs of facial palsy patients; the improved shape predictor represents the first step toward an automatic system for computer-aided assessment in facial palsy [12]. In order to determine an agreement between the facial function evaluation using high-quality photographs and using in-person evaluation, the MEEI database was employed. The authors demonstrated that facial symmetry in facial palsy patients can be monitored using standardized frontal photographs [20]. Recently, the MEEI database was also employed to compare a clinician evaluation against machine learning-derived automated assessments in frontal photographs. The authors of [21] concluded that automated scores predicted more asymmetry in normal patients and less asymmetry in patients with flaccid palsy and synkinesis compared to clinician grading. Automated assessments hold promise for the standardization of facial palsy outcome measures and may eliminate the observer bias seen in clinician-graded scales. 
The database is composed of 480 high-resolution images from 60 participants (10 healthy subjects and 50 patients, 25 suffering from flaccid and 25 from nonflaccid paralysis), each one performing 8 different facial movements: (1) at rest, (2) eyebrow elevation, (3) light effort eye closure, (4) full effort eye closure, (5) light effort smile, (6) full effort smile, (7) pucker, and (8) lip depression. This image database was used to design our classification system. To the best of our knowledge, there is no other publicly available database with these characteristics.
Second, the Toronto NeuroFace (TNF) is also a publicly available dataset, collected by Bandini et al. and introduced in [22], which aims to assess neurological disorders. Similarly to the MEEI dataset, the Toronto NeuroFace dataset was released to foster the development of novel and robust approaches for face alignment and oro-facial assessments that can be used to track and analyze facial movements in clinical populations suffering from amyotrophic lateral sclerosis and other neurological diseases. Authors in [22] analyzed the importance of using algorithms trained with data from the target population in order to improve the localization of facial landmarks and also the accuracy in face alignment. The TNF dataset consists of 261 videos, clinical scores per video, and more than 3300 annotated frames of faces from individuals performing oro-facial tasks typical of the clinical assessment.
To the best of our knowledge, neither the MEEI database nor the TNF dataset has been employed to specifically detect facial paralysis in frontal face photographs. It is worth noting that both databases intend to provide information for the development of clinical applications; however, they are not equivalent in terms of image quality, lighting, and pose conditions, and the tasks performed by the participants are not equivalent. In other words, the two databases are not directly comparable for our classification problem, but they were helpful in the design process.
As stated in Section 2.3, the Weka suite was employed at the design stage of our MLP classifier. The suite allowed us to analyze the performance of the MLP classifier parameters; in this case, the function named CVParameterSelection was employed [18]. As a result of the analysis, the configuration values were set as learning rate L = 0.2045, momentum M = 0.1909, training time N = 500, seed S = 0, and hidden layers H = 56. It is relevant to mention that the 29 features were selected from a set of symmetry measures after evaluating the worth of the features using the strategy implemented by the function named ClassifierAttributeEval [18].
Using the 10-fold cross-validation technique, the MLP classifier was trained with samples computed from the MEEI database. Cross-validation is a statistical method used to estimate the performance of machine learning models on new data; it is a widely known technique used to train and evaluate classifiers. It was observed that the MLP function did not perform sequential learning, which had a negative impact on the classifier performance because the available dataset was unbalanced: there were fewer healthy samples than unhealthy ones (80 vs. 400). To overcome this situation, the healthy set was tripled through a sample augmentation process. Similar to the process suggested in [9,16], the healthy images were rotated in two opposite directions, increasing the amount of available data and also verifying that our algorithm is invariant to rotation. In previous experiments, it was observed that tripling the number of healthy instances is enough to learn and discriminate this class without overfitting. In the end, the training of the MLP classifier was executed with 640 samples (240 healthy and 400 unhealthy instances).
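The sample counts and the fold sizes described above can be checked with a short sketch. The splitter below is a plain shuffled k-fold written for illustration; Weka applies its own shuffling and stratification, so the exact fold membership would differ, and the function name and seed are hypothetical.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_healthy = 80 * 3      # original healthy samples plus two rotated copies
n_unhealthy = 400
splits = list(kfold_indices(n_healthy + n_unhealthy))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 576 64
```

Each fold therefore tests on 64 samples and trains on the remaining 576, and every sample is used for testing exactly once.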
In Table 2, the performance of the model on the testing part of the data for each fold is presented. There, TN stands for true negative (i.e., correctly classified healthy samples), TP stands for true positive (i.e., correctly classified unhealthy samples), and false negative (FN) and false positive (FP) are incorrectly classified samples. Here, the worst performance is 84.37% for the 1st fold and the best is 100% for the 5th fold. The average performance of this methodology is 94.06%, as shown in Table 2. After evaluating our MLP classifier with the original 480 samples, there is a 99.79% correct classification of the MEEI database, yielding a sensitivity of 99.75% and a specificity of 100%. The confusion matrix of the test on the actual MEEI database is given in Table 3.

As already mentioned, the Toronto NeuroFace (TNF) is a database collected for clinical assessment, and to the best of our understanding and the information provided by the authors, the subset called Stroke can be used to evaluate our methodology. There are 8 subjects with facial asymmetry and 3 participants without it; in total, there are 817 asymmetrical samples and 219 symmetrical ones. It is important to note that the TNF images are frames extracted from the videos of subjects performing oro-facial tasks. Following the proposed methodology, a 10-fold cross-validation technique was employed to train another MLP classifier using samples computed from the TNF database. In Table 4, the performance of the model on the testing part of the data for each fold is presented. Here, the worst performance was 94.56% for the 5th fold and the best was 98.65% for the 3rd fold. The average performance of this methodology for the second dataset was 97.22%, as shown in Table 4. A 98.55% correct classification was found after evaluating the MLP classifier with the original 1036 samples. The confusion matrix of the test on the actual TNF database is described in Table 5, which yields a sensitivity of 98.29% and a specificity of 99.54%, indicating that our methodology to design a binary classifier to detect facial paralysis can be extended to other databases of face photographs with an outstanding performance.

Our methodology detects facial asymmetry levels within an image independently of the gesture performed by the subject, while most of the other methods compute facial asymmetry levels from a set of different facial gestures from the same subject. If we focus on the facial paralysis detection systems (i.e., binary classification) that compute facial landmarks at some point of their processing, we can put together a summary of three methodologies, as previously introduced in Section 1; this comparison is shown in Table 6. We observe that a direct comparison of the methods in this table is not feasible because of discrepancies in the goals and metrics used; nonetheless, we include the results of the works closest to ours. Those methods compute facial landmarks at some point of their process but train their shape predictor model on their own private dataset (e.g., [9][10][11]). The performance measures obtained from our tests and shown in Table 6 suggest that our method performs better than, or at least similarly to, other approaches.
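The relation between a confusion matrix and the reported accuracy, sensitivity, and specificity can be sketched as follows. The counts used here are not copied from Tables 3 and 5; they are back-calculated from the reported percentages and class sizes (400 vs. 80 samples for MEEI, 817 vs. 219 for TNF) and should be read as an assumption.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity and specificity as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)   # unhealthy samples correctly detected
    specificity = 100.0 * tn / (tn + fp)   # healthy samples correctly detected
    return accuracy, sensitivity, specificity


# Counts back-calculated from the reported percentages (assumption).
meei = binary_metrics(tp=399, tn=80, fp=0, fn=1)
tnf = binary_metrics(tp=803, tn=218, fp=1, fn=14)
print([round(v, 2) for v in meei])  # [99.79, 99.75, 100.0]
print([round(v, 2) for v in tnf])   # [98.55, 98.29, 99.54]
```

Under these assumed counts, the three figures reported for each database are mutually consistent.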

Conclusions and Future Work
A methodology to detect facial paralysis in an image, assuming that it can be interpreted as a problem of asymmetry levels among face elements, was proposed. The system consists of a set of 29 facial measures easily computed from predicted landmarks and a binary classifier based on a multi-layer perceptron approach, which provides a healthy or unhealthy label as a result. The performance of the methodology was evaluated in two image databases. In the first one, consisting of 480 images of 60 subjects, the approach attained a 94.06% correct classification; a performance of 97.22% was reached on the second database, showing that our methodology to design a binary classifier can be expanded to other databases with excellent results. Future work can be oriented to more specific classification tasks on image databases.