Robust Eye Blink Detection Based on Eye Landmarks and Savitzky–Golay Filtering

Abstract: A new technique to detect eye blinks is proposed based on automatic tracking of facial landmarks to localise the eyes and eyelid contours. Automatic facial landmark detectors are trained on an in-the-wild dataset and show outstanding robustness to varying lighting conditions, facial expressions, and head orientation. The proposed technique estimates the facial landmark positions and extracts the vertical distance between eyelids for each video frame. Next, a Savitzky–Golay (SG) filter is employed to smooth the obtained signal while keeping the peak information to detect eye blinks. Finally, eye blinks are detected as sharp peaks and a finite state machine is used to check for false blink and true blink cases based on their duration. The proposed technique is shown to outperform the state-of-the-art methods on three standard datasets.


Introduction
Recently, blink detection technology has been applied in various fields such as the interaction between disabled people and computers [1], drowsiness detection [2], and cognitive load assessment [3]. Therefore, the analysis of the eye state in terms of blink period, blink count, and blink frequency is an important source of information about the state of a subject and helps to investigate the influence of external factors on changes in emotional state. An eye blink is defined as a rapid closing and reopening of the eyelids, and it typically lasts from 100 to 400 ms [4]. Previous methods for eye blink detection estimate the eye state as either open or closed [5] or track eye closure events [6]. Other methods use template matching, where templates with open and/or closed eyes are learned and a normalised cross-correlation coefficient is computed for the eye region of each image [7]. These methods, however, are sensitive to image resolution, illumination, and facial movement dynamics. Recently, robust real-time facial feature trackers that track a set of interest points on a human face have been proposed. These trackers have been validated in a battery of experiments that evaluate their precision and robustness to varying illumination, various facial expressions, and head rotation.
In this paper, a simple and efficient technique to detect eye blinks is proposed; it consists of four steps. In the first step, facial landmark positions are estimated. In the second step, the eye openness state is characterised by measuring the distance between the eyelids. This is followed by applying Savitzky–Golay (SG) filtering to smooth the obtained signal and reduce signal noise. Then, the rapid distance changes between the eyelids are detected as blinks. Finally, a finite state machine (FSM) is used to find true blink cases according to the blink duration.

Related Work
Most methods employ the Viola–Jones algorithm to detect the face and eyes [8]. However, this algorithm cannot track faces and eyes if the head moves or if the lighting conditions change. Region tracking is therefore frequently combined with Viola–Jones to achieve higher detection accuracy despite changes in facial pose [5]. Different techniques have been proposed for blink detection. They can be classified into several categories such as contour analysis on difference images, optical flow, and template matching [9,10]. In the template matching method, an open and/or closed eye template is learned using the correlation coefficient over time. Re-initialisation is triggered when the correlation coefficient falls below a defined threshold. A blink is detected if the correlation coefficient of two successive frames is lower than a predefined threshold. In [10], template matching using a histogram of local binary patterns (LBPs) was used to detect eye blinks. First, an open eye template is created from several initial images where the eye is open and not moving. For subsequent frames, the LBP histogram of the eye region is calculated and compared with the template using the Kullback–Leibler divergence measure. The output waveform is filtered using an SG filter and the top-hat operator. Peaks are then detected using Grubbs' test and considered as eye blinks. This method yielded a detection rate of 99% on the ZJU and Basler5 datasets, using different parameters for each dataset.
A weighted gradient descriptor (WGD) was introduced in [11]; in this work, a new localisation scheme was introduced to validate the eye region returned by cascade models. This approach is based on calculating the partial derivatives for each pixel within the localised eye region over time. Weighted vectors are obtained in two orientations (up and down), and an input waveform is obtained by finding the vertical difference between the y-coordinates of those vectors. Negative and positive waveform peaks represent the closing and opening of the eye. After noise filtering, eye blinks are represented by a local maximum and minimum. The authors in [11] report the best obtained results for the given datasets using different parameters. A new dataset of five people recorded using a 100 fps Basler camera was also introduced in [11], and the reported detection rate on the Basler5 and the ZJU datasets was around 90% and 98.8%, respectively.
In motion-based eye blink detection methods, rather than depending on appearance features, two or more consecutive frames are needed for frame differencing. A method for using optical flow to analyse the level of angular similarity in orientation between the motion vectors in the face and eye regions is described in [12]. This method has been tested on a set of images rather than video recordings, achieving an accuracy of 96.96%. The Lucas–Kanade tracker was also used by Drutarovsky and Fogelton [13] to track the eye region. Around 255 trackers were placed over an eye region divided into 3 × 3 cells. Next, motion vectors are computed for each cell to obtain the input waveforms for a state machine. If the eyelid moves down and then moves up within 150 ms, the state machine reports an eye blink. This paper introduced the Eyeblink8 dataset, which is characterised by vivid facial expressions of the recorded people. The reported recall is 73% on ZJU and 85% on Eyeblink8.
Other approaches include segmentation based on active shape models (ASMs). The authors of [14] used active shape models to obtain 98 facial landmarks. The eye shape is approximated using 8 landmarks for each eye. The ratio of the average height of the eyes to the distance between the eyes is used to estimate the degree of eye openness. An eye blink is detected if the eye openness degree changes from a value larger than a threshold (thl) of 0.12 to a value smaller than a threshold (ths) of 0.02. This method cannot deal with more challenging facial expressions found in videos in the wild and uses a fixed threshold for blink detection. Because ASMs must be pre-trained for each participant, they are not well suited for clinical applications or large numbers of participants, and training can take a long time.
These methods are sensitive to illumination changes, image resolution, and the rotation of the face, so robust real-time facial feature trackers that track a set of interest points on a human face have recently been proposed [15]. These trackers have been validated in a battery of experiments that evaluate their precision and robustness to varying illumination, various facial expressions, and head rotation [16]. In this paper, we propose a simple but efficient technique to detect eye blinks by employing a recent facial feature detector. The level of eye openness is derived as the vertical distance between the upper and lower eyelids. Given a per-frame sequence of eye openness estimates, eye blinks are detected by filtering the signal using an SG filter and by detecting the peaks that represent eye blinks. A finite state machine is then used to check for false and true blink cases according to the blink duration. This technique is evaluated on three standard blink datasets with ground-truth annotations. Moreover, blink properties such as frequency over time, amplitude, and duration are obtained. These characteristics are important in applications where it is required to determine the degree of drowsiness and cognitive load [3]. The results obtained for these three standard datasets show an improved performance when compared to existing methods.

The Proposed Method
Blinking is a natural eye motion defined as the rapid closing and opening of the eyelid of a human eye. The proposed technique is composed of four main steps, as shown in Figure 1. These steps are applied to each frame of an input video. This method uses ZFace [15] for automatic tracking of facial landmarks to localise the eyes and eyelid contours. The robustness of this method for 3D registration and reconstruction from 2D video has been validated in a series of experiments [15,17]. Using ZFace, no pre-training is required to perform 3D registration from 2D video. A combined 3D supervised descent method is employed to define the shape model by a 3D mesh. ZFace registers a dense parameterised shape model to an image such that its landmarks correspond to consistent locations on the face. ZFace is used to track 49 facial landmarks from videos, where eye features are detected for each video frame, and the eye-opening state is estimated using the vertical distance (d) between the eyelids, measured between the eye landmark points P1 and P2.
It is assumed that the signal obtained from the distance between the upper and lower eyelids is mostly constant when an eye is open and approaches zero when the eye is closing. This measure is relatively insensitive to body and head positions. The resulting signal is affected by interference caused primarily by saccadic eye movements and facial expressions. This interference is filtered out while the shape of the signal is maintained. Lastly, the filtered signal is analysed to detect eye blinks, which appear as peaks reflecting the change in distance between the eyelids. Figure 2 below shows the run time for the face tracker.
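As a minimal sketch of the per-frame openness measurement, the snippet below takes d as the absolute difference between the y-coordinates of an upper-eyelid and a lower-eyelid landmark. The exact expression for d, the `(x, y)` tuple layout, and the function names are assumptions made for illustration; none of them is taken from ZFace.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; assumed layout


def eyelid_distance(p1: Point, p2: Point) -> float:
    """Vertical distance d between an upper-eyelid point p1 and a lower-eyelid point p2."""
    return abs(p1[1] - p2[1])


def openness_signal(frames: Sequence[Tuple[Point, Point]]) -> List[float]:
    """Build the per-frame signal d from (upper, lower) eyelid landmark pairs."""
    return [eyelid_distance(upper, lower) for upper, lower in frames]


# Hypothetical landmark pairs for three frames: open, nearly closed, open again.
frames = [((100.0, 50.0), (100.0, 60.0)),
          ((100.0, 54.5), (100.0, 55.5)),
          ((100.0, 50.0), (100.0, 60.0))]
signal = openness_signal(frames)
```

The resulting sequence is large while the eye is open and drops towards zero during closure, which is the behaviour the text assumes for the raw signal.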

Pre-Processing of the Extracted Facial Landmarks
In the process of calculating the vertical distance between the eyelids, saccadic eye movements, head movements, and facial expressions bring unavoidable noise to the signal. In order to improve signal quality and reduce tracking errors, signal pre-treatment is necessary, and it must maintain the shape of the signal peaks that denote full eye closure. For this purpose, the signal is first smoothed by a median filter. Then, the SG filter [18] is utilised for the pre-treatment of the obtained signal, as shown in Figure 3 below. The SG filter aims to increase the signal-to-noise ratio without deforming the signal and requires two key parameters: the window size and the polynomial degree. These two parameters are very important for reducing the impact of random noise fluctuations and preserving important signal information. If the window is too long, some valid signal is lost, whereas, if the window is too short, the signal is not filtered well. Choosing a high polynomial degree may produce new unwanted noise, while too low a polynomial degree may lead to signal distortion as a result of over-smoothing. Therefore, it is important to select the window length and the polynomial degree appropriately to achieve a good trade-off between random noise reduction and valid signal preservation. The polynomial degree is selected in the range of one to three, and the window length is automatically adjusted, with the polynomial degree held constant, until an optimal result is obtained. The mathematical description of the smoothing process implemented by SG filtering is given by the following formula:

$$S_j^{*} = \frac{\sum_{i=-m}^{m} C_i \, S_{j+i}}{N}$$

where S is the original signal, S* is the processed signal, C_i is the coefficient for the i-th smoothing, and N is the number of data points in the smoothing window and is equal to 2m + 1, where m represents the half-width of the smoothing window. The index j represents the running index of the ordinate data in the original data table [19]. The core of SG filtering is selecting a polynomial in a sliding window to fit the original signal point by point using least-squares estimation. The polynomial can be modelled as

$$p(i) = \sum_{n=0}^{k} b_n \, i^{n}$$

Here, b_n are the coefficients of this polynomial, and k denotes the polynomial degree. The distinctive property of the SG filter is that it incorporates differentiation and smoothing into one algorithm, which is a reasonable approach as smoothing is always required with differentiation [20].
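The least-squares view of SG smoothing can be illustrated with a short NumPy sketch. It derives the window weights by fitting a degree-k polynomial over the 2m + 1 window positions and evaluating it at the window centre; the weights returned here already include the normalisation. The function names, the default m = 2 and k = 2, and the edge handling (repeating the end samples) are assumptions for illustration, not the parameters used in the paper.

```python
import numpy as np


def sg_coefficients(m: int, k: int) -> np.ndarray:
    """Smoothing weights for a window of N = 2m + 1 points and polynomial degree k.

    The weights are the first row of the least-squares pseudo-inverse, i.e. the
    value of the fitted polynomial at the window centre (i = 0).
    """
    i = np.arange(-m, m + 1, dtype=float)
    A = np.vander(i, k + 1, increasing=True)  # columns: i^0, i^1, ..., i^k
    return np.linalg.pinv(A)[0]


def sg_smooth(signal, m: int = 2, k: int = 2) -> np.ndarray:
    """Smooth a 1-D signal; edges are handled by repeating the end samples."""
    s = np.asarray(signal, dtype=float)
    padded = np.pad(s, m, mode="edge")
    weights = sg_coefficients(m, k)
    # Correlate (not convolve) so that weight i multiplies sample j + i.
    return np.correlate(padded, weights, mode="valid")


# A 5-point quadratic window gives the classic weights (-3, 12, 17, 12, -3) / 35.
weights = sg_coefficients(2, 2)
```

In practice, `scipy.signal.savgol_filter(x, window_length, polyorder)` provides the same kind of filter with several edge-handling modes; the hand-rolled version is shown only to make the polynomial fit explicit.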

Peak Detection
Because eye blinks appear as peaks in the signal, a median filter is first used for baseline correction to filter out noise. This noise often results from certain facial expressions, smiling, closing one's eyes for longer than the duration of a blink, and saccadic eye movements. The next step is the detection of local peaks in the signal, that is, samples whose value is larger than that of nearby samples. For 30 fps video, a scanning time window of 500 ms is used to find peaks. A sample is detected as a peak if it is the largest value in the scanning window. As shown in Figure 3, the raw signal is noisy, and detecting eye blinks directly on it results in many false positives. Applying an SG filter improves the signal appearance and blink detection.
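The scanning-window rule can be sketched in a few lines of Python. The sketch assumes the signal has already been baseline-corrected and oriented so that eye closures appear as positive peaks above a resting level near zero; the `min_height` guard and the tie-breaking rule for flat tops are illustrative additions, not part of the described method.

```python
from typing import List, Sequence


def find_peaks_in_window(signal: Sequence[float], fps: float = 30.0,
                         window_ms: float = 500.0,
                         min_height: float = 0.0) -> List[int]:
    """Return indices of samples that are the maximum of their scanning window."""
    # 500 ms at 30 fps is 15 frames, i.e. 7 frames on each side of the sample.
    half = max(1, int(round(window_ms / 1000.0 * fps)) // 2)
    peaks: List[int] = []
    for j, value in enumerate(signal):
        lo, hi = max(0, j - half), min(len(signal), j + half + 1)
        is_window_max = value == max(signal[lo:hi])
        # Require a strict maximum to the left so a flat top is reported once.
        strictly_above_left = all(value > v for v in signal[lo:j])
        if value > min_height and is_window_max and strictly_above_left:
            peaks.append(j)
    return peaks


# Synthetic signal with two blink-like bumps on a zero baseline.
signal = [0.0] * 60
signal[19:22] = [0.5, 1.0, 0.5]
signal[44:47] = [0.4, 0.8, 0.4]
peaks = find_peaks_in_window(signal)
```

Running the same function on an unsmoothed signal would typically flag many small fluctuations as peaks, which is why the SG filter is applied first.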

Finite State Machine
After the smoothed signal of the eye is obtained, a finite state machine (FSM) is used to check the detected signal peaks for false and true blink cases. A blink is expected to last 100-400 ms. Hence, if the eye is closed for more than 800-900 ms, it is considered that the subject is either drowsy, looking down, or blinking involuntarily. An FSM is a simple model for keeping track of events triggered by external inputs. It consists of a set of states that determine what happens when a particular input arrives and which event is triggered. Typically, an FSM comprises an initial state, input events, output events, and a transition function that takes the current state and input event to generate a new output and the next state. In this experiment, blink duration is assumed to last from 100 to 500 ms, taking into account noise from head movement and facial expressions. The width of the peak represents the duration in ms of each detected blink. The full width at half maximum (fwhm) is used to find the peak width, as shown in Figure 4. The calculated peak widths are used by the FSM to differentiate between blink and non-blink peaks. Each transition in the state machine is numbered. After a peak is detected, the following states occur, as shown in Figure 5:
1. The state machine increments from State 0 to State 1. In State 1, if the peak lasts less than 100 ms, it is considered an invalid blink and the state is reset to 0.
2. In State 2, if the peak lasts from 100 to 500 ms (16 frames for 30 fps video), it is considered a valid blink. The state resets to 0, and the blink is validated.
3. If the state reaches 3, the eye closing period is considered to be more than 500 ms, so an invalid blink is concluded.
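The width measurement and the duration check can be sketched as follows. This is a simplified reading of the three numbered states, assuming a baseline-corrected signal (so that half maximum is half the peak value) and whole-frame resolution for the width; the function names and the returned state trace are illustrative choices.

```python
from typing import List, Sequence, Tuple


def fwhm_ms(signal: Sequence[float], peak: int, fps: float = 30.0) -> float:
    """Peak width at half maximum, in milliseconds (whole-frame resolution)."""
    half_height = signal[peak] / 2.0
    left = peak
    while left > 0 and signal[left - 1] >= half_height:
        left -= 1
    right = peak
    while right < len(signal) - 1 and signal[right + 1] >= half_height:
        right += 1
    return (right - left + 1) * 1000.0 / fps


def classify_peak(width_ms: float, min_ms: float = 100.0,
                  max_ms: float = 500.0) -> Tuple[bool, List[int]]:
    """Walk the state machine for one detected peak.

    Returns (is_blink, visited_states). State 0 is idle; states 1 to 3 follow
    the numbered description in the text.
    """
    states = [0, 1]                # a detected peak moves 0 -> 1
    if width_ms < min_ms:          # too short: reset from state 1
        return False, states + [0]
    states.append(2)
    if width_ms <= max_ms:         # plausible blink: validate and reset
        return True, states + [0]
    states.append(3)               # eye closed for too long
    return False, states + [0]


# A three-frame-wide peak at 30 fps lasts 100 ms and is accepted as a blink.
sig = [0.0, 0.0, 0.2, 0.6, 1.0, 0.6, 0.2, 0.0, 0.0]
width = fwhm_ms(sig, peak=4)
is_blink, trace = classify_peak(width)
```

Interpolating the half-maximum crossings between frames would give sub-frame widths; the whole-frame version is kept here for brevity.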

Eyeblink Datasets
• ZJU: This database [12] consists of 80 videos of 20 individuals. Each individual has 4 clips: frontal view, upward view, with glasses, and without glasses. Each clip lasts a few seconds and is 30 fps with a resolution of 320 × 240. There is no facial expression and almost no head movement. Different numbers of ground truth eye blinks have been reported for this dataset, as shown in Table 1. A ground truth blink is defined by its beginning frame, peak frame, and ending frame.

Table 1. Different numbers of reported ground truth eye blinks in the ZJU dataset.
Reference    Reported blinks
[13]         264
[21]         261
[11]         258
[5]          255

• Eyeblink8: This dataset is more challenging as it contains facial expressions, head movements, and looking down at a keyboard. This dataset consists of 408 blinks on 70,992 video frames, as annotated by [21], with a video resolution of 640 × 480 captured at 30 fps and an average length of 5000 to 11,000 frames.

• Talking face: This dataset consists of one video recording of one subject talking in front of the camera and making different facial expressions. This video clip is captured at 25 fps with a resolution of 720 × 576 and contains 61 annotated blinks [13,21].

Eye Blink Detection
In this section, the performance of the proposed eye blink detection technique is evaluated by comparing detected blinks with ground-truth blinks using the three standard datasets described above.
The following equations are used:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

The performance of the proposed approach is evaluated following [11], which considers an eye blink as detected if the detected eye blink peak lies between the start and end frame of the ground truth annotation. TP is the true positive count, FP is the false positive count, and FN is the false negative count. On the mentioned datasets, the proposed method outperforms the methods used in [5,13,21]. The statistics are listed in Table 2 below. The false negatives counted on the ZJU dataset are due to the fact that the videos sometimes start with a blink and sometimes end while the subject is blinking. For the Eyeblink8 dataset, the false negatives result from facial expressions, moving hands, narrowed eyes, looking down without blinking, and closing eyes intentionally for a long time.
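The matching rule and the counts described above can be sketched as follows, assuming the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). Treating a second peak inside an already matched ground-truth blink as a false positive is an assumption of this sketch, as are the function names and the example frame numbers.

```python
from typing import Sequence, Tuple


def score_detections(peaks: Sequence[int],
                     truth: Sequence[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Count TP, FP, FN; a ground-truth blink is (start_frame, end_frame), inclusive."""
    matched = set()
    fp = 0
    for p in peaks:
        # First unmatched ground-truth blink whose interval contains the peak.
        hit = next((k for k, (start, end) in enumerate(truth)
                    if start <= p <= end and k not in matched), None)
        if hit is None:
            fp += 1
        else:
            matched.add(hit)
    tp = len(matched)
    return tp, fp, len(truth) - tp


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical annotations and detections: two hits, one spurious peak, one miss.
truth = [(10, 15), (40, 46), (80, 85)]
peaks = [12, 43, 60]
tp, fp, fn = score_detections(peaks, truth)
precision, recall = precision_recall(tp, fp, fn)
```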

Eye Blink Statistics
The main blink properties are the blink rate, the duration, and the amplitude, and all three were measured. The blink rate indicates the number of blinks per minute. According to [22], the mean eye blink rate during rest was 17 blinks/min. During conversation, it increases to 26, and it decreases to 4.5 while reading. The blink rate in the ZJU dataset is about 50 blinks per minute, because most of the blinks recorded for ZJU were voluntary, and the subjects were informed that they would be recorded for a blink detection study. Next, the blink duration was measured. This blink property can range from 60 ms up to 700 ms and may reflect the subject's mood or mental state. We measured eye blink duration and estimated the total time during which the eyes were in a closed state. Table 3 lists the blink rates and durations for the test datasets, and Figure 6 shows an example of a blink duration. Blink amplitude describes the degree of eye openness and differs between subjects. This property can be measured for each video; Figure 7 below shows the maximum amplitude during blinking.
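As a small worked example of the blink rate, the sketch below converts a blink count and a frame count into blinks per minute. It uses the Eyeblink8 totals quoted in the dataset description (408 annotated blinks on 70,992 frames at 30 fps), pooled over all videos; the function name is illustrative and the pooled figure is not a value taken from Table 3.

```python
def blink_rate_per_minute(n_blinks: int, n_frames: int, fps: float) -> float:
    """Blinks per minute for a recording of n_frames captured at fps."""
    minutes = n_frames / fps / 60.0
    return n_blinks / minutes


# Eyeblink8 totals from the dataset description: 408 blinks, 70,992 frames, 30 fps.
rate = blink_rate_per_minute(408, 70992, 30.0)
```

For these totals the pooled rate comes to roughly 10.3 blinks per minute (70,992 frames at 30 fps is about 39.4 minutes of video).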

Conclusions
There has been an increased interest in eye blink detection algorithms for different purposes, such as driver fatigue detection and cognitive load assessment. In this paper, an overview of important eye blink detection methods was given, and a novel automatic technique for eye blink detection based on a facial tracker was presented. This technique operates on video frames by finding the vertical distance between eyelids, which is then followed by signal filtering using SG, peak detection, fwhm calculation, and an FSM to reduce false negative eye blinks. The proposed method was thoroughly experimentally tested and qualitatively evaluated on three standard datasets. It achieved a higher precision and recall than [21] on the Eyeblink8 dataset. However, some false negatives remained due to unrecoverable noise in the obtained signal; such noise occurred because (1) some of the blinks were voluntary, and such blinks often have a relatively long duration, (2) the subject was looking down (i.e., looking at the keyboard), (3) blinks occurred amid opposite head movements, (4) subjects wore thick glasses which reflected too much light, and (5) subjects were too far from the camera. Other reasons may have contributed to this noise. Eye blink duration, frequency, and amplitude were obtained for analysis of eye movement dynamics. In our future research, we intend to test the proposed method in more varied environments, conduct more analyses using other digital filters or a combination of them to reduce signal noise to a minimum, and consider additional parameters that describe the eye state during blinking.

Figure 1. Overview of the proposed technique.

Figure 2. An example of the facial tracker using Eyeblink8 data.

Figure 3. Processing steps leading to the detection of eye blinks: (a) signal obtained from the facial landmark tracker in a video tracking session, (b) peak detection applied to the noisy signal, (c) SG filter applied to the signal and baseline corrected, and (d) peak detection method applied to the filtered signal.

Figure 4. Blink peak width calculation using the full width at half maximum (fwhm).

Figure 5. Finite state machine for blink duration estimation.

Table 2. Results of the proposed technique and of other existing methods.