A Smartphone-Based Automatic Diagnosis System for Facial Nerve Palsy

Facial nerve palsy induces a weakness or loss of facial expression through damage to the facial nerve. A quantitative and reliable assessment system for facial nerve palsy is required for both patients and clinicians. In this study, we propose a rapid and portable smartphone-based automatic diagnosis system that discriminates patients with facial nerve palsy from normal subjects. Facial landmarks are localized and tracked by an incremental parallel cascade of linear regression. An asymmetry index is computed using the displacement ratio between the left and right sides of the forehead and mouth regions during three motions: resting, eyebrow raising, and smiling. To classify facial nerve palsy, we used Linear Discriminant Analysis (LDA) and a Support Vector Machine (SVM) with Leave-one-out Cross Validation (LOOCV) on 36 subjects. The classification accuracy rate was 88.9%.


Introduction
Facial nerve palsy is a nervous system disorder where there is loss of voluntary muscle movement in a patient's face caused by nerve damage. A clinical assessment system of facial nerve palsy is an important tool for diagnosing, monitoring and treating facial nerve palsy. The House-Brackmann (H-B) scale is a widely accepted system that grades the facial function from normal (grade 1) to total paralysis (grade 6) [1]. The disadvantage of the H-B scale is its subjective assessment characteristics, which are not reliable and vary among clinicians.
Recently, many studies have proposed methods to quantify facial nerve palsy using image-processing techniques. Current automatic grading systems can be divided into image-based and video-based approaches. The main advantages of image-based systems are their low computational costs and ease of use. However, such systems have several limitations because images contain less information than video. This drawback introduces errors in calculating the asymmetry index as well as low reproducibility. In response to these disadvantages, most recent studies have proposed video-based systems. Park et al. proposed measuring the asymmetry index in the mouth region using video captured with webcams [2]. They used a thresholding approach based on the HSV color space and point-tracking algorithms for lip segmentation and tracking, but these algorithms are easily affected by the recording environment. In addition, the asymmetry index covered only the mouth region. Wang et al. proposed an automatic recognition method for six facial actions using active shape models plus Local Binary Patterns (ASMLBP) [3]. They used images, not videos, and only recognized the patterns of facial movements required to evaluate the diagnosis of facial paralysis. McGrenary et al. proposed a system that uses an artificial neural network [4]. They trained the network with maximum displacement values and mean intensities calculated through regional analysis. This system is not automatic, and its performance can be affected by the conditions of the recording environment, such as lighting. He et al. demonstrated a system that uses Multiresolution Local Binary Patterns (MLBP) to extract features and the resistor-average distance to measure the asymmetry of facial movements [5]. A total of 197 videos were obtained from a video camera and validated using a Support Vector Machine (SVM). The MLBP-based method achieved 94% accuracy over the overall H-B scale.
Although their results estimated the facial palsy grade, they imposed constraints on the video recording environment. Reilly et al. used Active Appearance Models (AAMs) for facial feature localization and extracted the distance between the corners of the mouth and the mean smile as features [6]. However, they used a synthesized dataset rather than real-world data, and they assessed only the paralysis of the smiling function. All of these studies are based on separate systems that cannot be carried by clinicians to evaluate patients. Furthermore, previous studies have not focused on discriminating facial nerve palsy patients from normal subjects and have assumed that measurements are conducted in a constrained environment.
In this study, we propose a smartphone-based facial nerve palsy assessment system to diagnose facial nerve palsy quickly and simply in daily life, similar to using videophone functions. The automatic system is designed to help both patients and clinicians, who only need to perform three motions to diagnose facial nerve palsy and record the progression of the disease. To diagnose facial nerve palsy, we use incremental training of discriminative models to detect facial shape points and extract features that represent facial asymmetry from those shape points.

Experiment
In this section, we describe an incremental parallel cascade of linear regression for face landmark localization and tracking proposed by Asthana et al. [7]. Then, we describe the data acquisition and analysis.

Incremental Parallel Cascade of Linear Regression
There have been many successful facial landmark localization algorithms. In this study, we need a robust tracker because patients have asymmetric facial characteristics. When we attempted to localize facial landmarks using standard AAMs, the performance was good for healthy subjects but poor for patients. For landmark localization, we choose the algorithm referred to as incremental Parallel Cascade of Linear Regression (iPAR-CLR). This algorithm can be trained quickly and can add new training samples to the regression functions without re-training on previous information. In this study, we use a 2D shape model described as

x = s R (x̄ + Φ q) + t,

where x is the vector that represents the locations of the feature shapes, x̄ is the mean location of the shapes, Φ denotes the submatrix of the basis of variations, q holds the parameters of the non-rigid variations of the shape, R is rotation, s is scale, and t is translation. The parameters of the model are p = {s, R, q, t}. The details of iPAR-CLR can be found in [7].
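As a minimal sketch, the 2D shape model above can be applied in a few lines of NumPy; the function name and the array layouts (49 stacked (x, y) points, basis columns as variation modes) are our own illustrative choices, not part of iPAR-CLR itself:

```python
import numpy as np

def apply_shape_model(mean_shape, basis, q, s=1.0, theta=0.0, t=(0.0, 0.0)):
    """Apply the 2D point-distribution model x = s * R * (x_mean + Phi @ q) + t.

    mean_shape : (49, 2) mean landmark locations (x_mean)
    basis      : (98, k) matrix whose columns are non-rigid variation modes (Phi)
    q          : (k,) non-rigid shape parameters
    s, theta, t: similarity transform (scale, rotation angle, translation)
    """
    # Non-rigid deformation in the model frame.
    shape = mean_shape.reshape(-1) + basis @ q          # flattened (98,)
    shape = shape.reshape(-1, 2)                        # back to (49, 2)
    # Rigid similarity transform: rotate, scale, translate.
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, -si], [si, c]])
    return s * shape @ R.T + np.asarray(t)
```

With q = 0 and the identity transform, the model simply reproduces the mean shape.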
The facial feature tracker was trained using the iPAR-CLR method with 49 facial landmark points, as shown in Figure 1. For successful facial feature tracking, we used real-world databases not collected in a controlled environment. We trained the facial feature tracker using widely used databases: Labeled Face Parts in the Wild (LFPW) [8], Helen [9], Annotated Faces in the Wild (AFW) [10] and the Intelligent Behavior Understanding Group (iBUG) database [11,12]. LFPW consists of 1423 faces from images downloaded from Google, Flickr, and Yahoo. The Helen database consists of 2330 faces gathered from Flickr. The AFW database consists of 468 faces, and the iBUG database consists of 135 faces. Examples from the training databases are shown in Figure 2. The databases provide 68 annotated landmarks, but we used only 49 points, excluding the contour of the face.

Data Acquisition
All participants were asked to perform three facial movements: "rest," "smile," and "raise eyebrows." The videos were acquired using an iPhone 4S and an iPhone 6 at 30 frames per second with 1080 × 1920 resolution using the rear-facing camera. The clinicians acquired front-view videos of the participants by holding the smartphone in their hands with no constraints and under normal office fluorescent lighting conditions. The video recording duration for each person was 15-20 s.

Feature Extraction
The asymmetry index was calculated using the displacement of the shape point sets that correspond to the eyebrow and mouth regions while the participants performed the facial movements. To extract the asymmetry index, the forehead and mouth regions were used based on the H-B scale and a heuristic approach. We applied two approaches: a local points-based method and an axis-based method. Given the 49 landmarks from the facial landmark localization shown in Figure 1, Pi represents the ith landmark and d(x, y) represents the distance between points x and y.

Local Points-Based Feature Extraction
Asymmetry Index of Forehead Region
The asymmetry index of the forehead region was calculated using the displacement ratio between the left and right eyebrows. We performed the following steps: (1) Calculate the mean point of the left and right eyebrows (LEB and REB) by averaging the five points of each eyebrow (numbers one to five and six to ten), as shown in Figure 3. (2) Calculate the mean point of each eye (LEC and REC) by averaging the six points of each eye (numbers 20 to 25 and 26 to 31), as shown in Figure 3. (3) Calculate the distance between the mean point of the eyebrow and that of the eye on each side, as shown in Figure 3. (4) Calculate the displacement on each side by subtracting the mean distance of the resting state from the maximum distance during the eyebrow-raising movement. (5) Calculate the displacement ratio between the left and right sides of the forehead; after comparing the two displacement values, the larger becomes the denominator.
Asymmetry Index of Mouth Region
The asymmetry index of the mouth was calculated using the displacement ratio between the left and right mouth corners. We performed the following steps: (1) Calculate the mean distance between each mouth corner and the points of the middle of the mouth, as shown in Figure 3. (2) Calculate the displacement on each side by subtracting the mean distance of the resting state from the maximum distance during the smile movement. (3) Calculate the displacement ratio between the left and right sides of the mouth; after comparing the two displacement values, the larger becomes the denominator. In Figure 3, the forehead distances are those between LEB and LEC and between REB and REC, and the mouth distances are the mean distances between each mouth corner and the points of the middle of the mouth.
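The forehead computation above can be sketched as follows; the helper names, the 0-indexed slices mirroring the 1-indexed point numbers in Figure 3, and the per-frame distance arrays are illustrative assumptions:

```python
import numpy as np

def asymmetry_index(rest_left, rest_right, move_left, move_right):
    """Displacement ratio between the left and right sides.

    rest_* : per-frame distances during the resting state
    move_* : per-frame distances during the movement
    Displacement = max(movement) - mean(rest); the larger displacement
    becomes the denominator, so the index lies in (0, 1] and equals 1
    for perfectly symmetric movement.
    """
    disp_l = np.max(move_left) - np.mean(rest_left)
    disp_r = np.max(move_right) - np.mean(rest_right)
    lo, hi = sorted([disp_l, disp_r])
    return lo / hi

def forehead_distances(landmarks):
    """Distance between eyebrow mean point and eye mean point on each side.

    landmarks : (49, 2) array, numbered 1-49 as in Figure 3:
    points 1-5 / 6-10 are the eyebrows, 20-25 / 26-31 the eyes.
    """
    leb = landmarks[0:5].mean(axis=0)    # left eyebrow mean (points 1-5)
    reb = landmarks[5:10].mean(axis=0)   # right eyebrow mean (points 6-10)
    lec = landmarks[19:25].mean(axis=0)  # left eye mean (points 20-25)
    rec = landmarks[25:31].mean(axis=0)  # right eye mean (points 26-31)
    return np.linalg.norm(leb - lec), np.linalg.norm(reb - rec)
```

The same `asymmetry_index` helper applies unchanged to the mouth-region distances.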

Axis-Based Feature Extraction
The face is divided into the left and right regions based on the shape points of the eyes because they exhibit the minimum asymmetry among the corresponding regions. First, we determine the horizontal line using the points of the eye region, and then calculate the vertical line that is perpendicular to the horizontal line. The horizontal line is the extension of the line connecting the left and right medial canthus points, shown in Figure 4. The vertical line is perpendicular to the horizontal line and passes through the eye midpoint (EM), which is the mean point of the left and right medial canthi.

Asymmetry Index of Forehead Region
The asymmetry index of the forehead region was calculated using the displacement ratio between the left and right eyebrows. We performed the following steps: (1) Calculate the mean point of each eyebrow by averaging its five points (numbers one to five and six to ten), as shown in Equation (2). (2) Calculate the point of intersection between the horizontal line and the perpendicular dropped from each eyebrow mean point (Figure 4). (3) Calculate the distance between each eyebrow mean point and its point of intersection. (4) Calculate the displacement of each side by subtracting the mean distance of the resting state from the maximum distance during the eyebrow-raising movement. (5) Calculate the displacement ratio between the left and right sides of the forehead; after comparing the two displacement values, the larger becomes the denominator.
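A minimal sketch of the axis construction used here, assuming unit-vector representations for the two lines (the function names are ours):

```python
import numpy as np

def face_axes(left_canthus, right_canthus):
    """Horizontal line through the medial canthi and the perpendicular
    vertical line through their midpoint (EM)."""
    em = (np.asarray(left_canthus, float) + np.asarray(right_canthus, float)) / 2.0
    h = np.asarray(right_canthus, float) - np.asarray(left_canthus, float)
    h = h / np.linalg.norm(h)            # unit vector along the horizontal line
    v = np.array([-h[1], h[0]])          # unit vector along the vertical line
    return em, h, v

def distance_to_horizontal(point, em, v):
    """Perpendicular distance from a landmark to the horizontal line,
    i.e., the distance to its point of intersection on that line."""
    return abs(np.dot(np.asarray(point, float) - em, v))
```

Projecting onto the perpendicular unit vector gives the same distance as explicitly computing the intersection point, which keeps the code short.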

Asymmetry Index of Mouth Region
The asymmetry index of the mouth was calculated using the displacement ratio between the left and right mouth corners. We performed the same steps as in the local points-based approach, except that the distance on each side was measured between the mouth corner and its point of intersection on the horizontal line (Figure 4).

Subjects
A total of 36 volunteers participated in the study. Of these, 23 subjects suffered from facial nerve palsy and 13 were normal subjects without facial disorders. Before the experiments, each subject was informed of the experiment procedures and the purpose of the study; all consented.

Results
The images were resized to 540 × 960 to reduce processing time, and the asymmetry indices were extracted from the forehead and mouth regions. We compared the performance of combinations of the axis-based and local points-based approaches using Linear Discriminant Analysis (LDA) and an SVM with a linear kernel as classification methods. Leave-one-out Cross Validation (LOOCV) was used to evaluate the performance of the classification system. Data analysis was performed with MATLAB (MathWorks, Inc., Natick, MA, USA). Figures 5 and 6 show examples of the displacements of the eyebrows and mouth corners for a patient and a normal subject. The displacements of the normal subject are symmetric, as shown in Figures 5 and 6, whereas the displacements of the patient with facial nerve palsy show significant asymmetry. Figure 7 shows the asymmetry indices calculated with the best combination: the local points-based approach for the forehead region and the axis-based approach for the mouth region. The 2D plot of the asymmetry indices of the forehead and mouth regions shows that the mouth region provides a more discriminating index. When compared using the mean value and standard deviation of the asymmetry index, the mouth region shows more discrimination than the forehead region. Table 1 presents the classification accuracy, precision rate, and recall rate for LDA and the SVM with a linear kernel. The combination of the asymmetry index of the forehead region based on local points and that of the mouth region based on the axis scored the highest classification accuracy at 88.9%. The precision and recall rates were 92.3% and 90.0%, respectively. SVMs with RBF and polynomial kernels, not listed in the table, were also applied as classifiers, but their results were lower than those of LDA and the SVM with a linear kernel.
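The evaluation protocol can be reproduced in a few lines of scikit-learn; the feature values below are synthetic stand-ins for the study's asymmetry indices (13 normal, 23 palsy subjects), used only to show the LOOCV wiring:

```python
# LDA and a linear-kernel SVM evaluated with leave-one-out
# cross-validation, mirroring the protocol in the text.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Feature vectors: [forehead index, mouth index]; synthetic stand-ins
# where palsy subjects have lower (more asymmetric) indices.
X_normal = rng.uniform(0.8, 1.0, size=(13, 2))
X_palsy = rng.uniform(0.2, 0.7, size=(23, 2))
X = np.vstack([X_normal, X_palsy])
y = np.array([0] * 13 + [1] * 23)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("SVM (linear)", SVC(kernel="linear"))]:
    acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    print(f"{name}: LOOCV accuracy = {acc:.3f}")
```

With 36 subjects, LOOCV trains 36 models, each held out on a single subject, which suits small clinical datasets.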

Simulation of Asymmetry Index with Various Head Orientations
To simulate various head orientations, we used the 3D facial points of a symmetric face consisting of 49 points. The synthetic face was rotated from −30° to 30° at intervals of one degree about the x-, y-, and z-axes, corresponding to pitch, yaw, and roll movements, respectively. The rotated synthetic face was projected onto the x-y plane, which is identical to the image plane, as shown in Figure 8a. We trained the LDA classifier using the best features (forehead_region and mouth_axis) extracted from all participants and tested it on the synthetic 2D projected face data. According to the simulation results, our method is more sensitive to yaw movement than to pitch movement, as shown in Figure 8b. Roll movement did not affect the asymmetry indices.
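A sketch of the rotation-and-projection step, assuming pitch about the x-axis, yaw about the y-axis, and roll about the z-axis (the in-plane rotation), with orthographic projection onto the image plane:

```python
import numpy as np

def rotate_and_project(points3d, roll=0.0, pitch=0.0, yaw=0.0):
    """Rotate 3D facial points (angles in degrees) and project them
    orthographically onto the x-y image plane by dropping z."""
    ax, ay, az = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    rotated = points3d @ (Rz @ Ry @ Rx).T
    return rotated[:, :2]   # orthographic projection onto the image plane
```

Because roll is a rotation within the image plane, it preserves all 2D distances, which is consistent with roll leaving the asymmetry indices unchanged.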

Measurement Error
During the data analysis, we found that measurement errors can lead to wrong classification results. The most critical error occurred when a subject's head pose was rotated relative to the frontal view of the camera. If the left and right mouth corner points move the same distance while the head is rotated at some angle to the camera, the displacement in the image plane of the side rotating toward the camera becomes larger than the real displacement, whereas the displacement of the side rotating away from the camera becomes smaller than the real value. This difference results in an error in the ratio calculated from the displacement measurements. If we use a 3D deformable shape model, we can acquire head-pose parameters relative to the camera; we can then calibrate the facial landmark positions and reduce the errors caused by head rotation.
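The ratio error described above can be illustrated numerically with a pinhole-camera model; the yaw angle, camera distance, and focal length below are arbitrary illustrative values, and the function name is ours:

```python
import numpy as np

def apparent_displacement(x0, x1, yaw_deg, depth=50.0, f=50.0):
    """Apparent (image-plane) displacement of a point moving from x0 to x1
    along the x-axis, seen by a pinhole camera at distance `depth` after
    the head yaws by `yaw_deg` degrees (points at z = 0 before rotation)."""
    a = np.radians(yaw_deg)
    def project(x):
        xr = x * np.cos(a)      # rotated x coordinate
        zr = -x * np.sin(a)     # rotated z: one side moves toward the camera
        return f * xr / (depth - zr)
    return abs(project(x1) - project(x0))

# Mouth corners at x = -1 and x = +1, each moving outward by the same
# true distance of 0.2, viewed under a 20-degree yaw:
left = apparent_displacement(-1.0, -1.2, yaw_deg=20)
right = apparent_displacement(1.0, 1.2, yaw_deg=20)
```

At 0° yaw both sides project to the true displacement, but at 20° yaw the two apparent displacements differ even though the true displacements are equal, so the left/right ratio drops below one.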
The second measurement error is caused by inaccuracy in the facial landmark localization. In general, this error occurs during initialization of the shape points at the start of localization. We can overcome this error by presenting an initial shadow face outline and instructing the subjects to fit their face approximately within the shadow face outline.

Analysis of Eye Region
In this study, we did not analyze eye-closing motions for the diagnosis of facial nerve palsy. Calculating asymmetry for the eye region is difficult if the displacement of facial points is used. According to the H-B scale, if the eye has asymmetry, the mouth or forehead also has asymmetry. Our goal is to design a system that discriminates facial nerve palsy patients from normal individuals. Within this scope, it is adequate to analyze the forehead and mouth regions only. He et al. also showed that the forehead and mouth regions yield better results than eye-closing motions, although the detailed algorithm is different [5].

Combination of Asymmetry Indices
In this study, we used four combinations of asymmetry indices in the forehead and mouth regions using the axis-based and local points-based approaches. The asymmetry index of the local points-based approach in the mouth region exhibited poor accuracy. In the case of normal subjects, the midpoints of the mouth region used as reference points showed little movement during the smile movement. However, in the case of patients, the midpoints followed the movement of the unaffected mouth corner during the smile movement, and the distances from the reference points to the left and right mouth corners therefore had similar values. The asymmetry index of the axis-based approach in the forehead region also showed poor accuracy compared with the local points-based approach. We cannot explain the reason for this clearly, but we assume that selecting the eye center as the reference point, as in the local points-based approach, is more suitable than the horizontal line used in the axis-based approach. To calculate each eye center, we used six points, but to obtain the horizontal line, we used only two points; therefore, the eye center might be a more reliable reference point.

Performance Comparison with Conventional Methods
As public databases for facial nerve palsy are not available, we compared our results with those reported in other articles. Most previous studies described their methods without reporting performance; only a few presented the performance of their proposed systems. He et al. reported that approximately 94% accuracy was achieved when using the MLBP-based or optical flow-based method [5]. Reilly et al. reported 88% accuracy when using the AAMs-based method [6]. Previous studies reported only accuracy, not the precision or recall rates, which are important for evaluating the performance of a diagnosis system. The accuracy of the proposed method was lower than that of the MLBP- and optical flow-based methods, but similar to that of the AAMs-based method. However, the MLBP-based method used area information obtained from the front view of the face. Hence, the drawback of this method