Tracking a Driver’s Face against Extreme Head Poses and Inference of Drowsiness Using a Hidden Markov Model

This study presents a new method to track a driver's facial states, such as head pose and eye-blinking, on a real-time basis. Since a driver in natural driving conditions moves his head in diverse ways, and his face is often occluded by his hand or the wheel, this is a great challenge for standard face models. Among many, the Active Appearance Model (AAM) and the Active Shape Model (ASM) are two favored face models. We have extended the Discriminative Bayesian ASM by incorporating extreme pose cases, and call the result the Pose Extended Active Shape Model (PE-ASM). Two face databases (DBs) are used for comparison: one is the Boston University face DB and the other is our custom-made driving DB. Our evaluation indicates that PE-ASM outperforms ASM and AAM in terms of face fitting against extreme poses. Using this model, we can estimate the driver's head pose, as well as eye-blinking, by adding the respective processes. Two HMMs are trained to model the temporal behaviors of these two facial features, and the system can then infer, by enumerating these HMM states, whether the driver is drowsy or not. The results suggest that the method can be used as a driver drowsiness detector in commercial cars, where the visual conditions are very diverse and often difficult to deal with.


Introduction
According to the National Highway Traffic Safety Administration (NHTSA), drowsy driving causes more than 100,000 crashes a year [1]. Drowsiness, in general, refers to a driver being in a trance state while driving. The most common symptoms of drowsiness include excessive yawning and frequent eye-blinking due to the difficulty of keeping one's eyes open. Symptoms also include reduced steering wheel operation and frequent head-nodding. The European Transport Safety Council (ETSC) (Brussels, Belgium) indicates that an increase in the above-mentioned symptoms means that the driver is drowsy [2].
Automotive manufacturers are conducting diverse research on anti-drowsiness safety devices to prevent drowsy driving. Typical driver drowsiness detection methods include learning driving patterns (steering operation, speed acceleration/reduction, gear operation) or measuring brainwaves (EEG), heartbeat, and body temperature to recognize and prevent drowsy driving. Among the many approaches, non-contact sensors such as cameras are often favored for detecting yawning, or for tracking the head movement and eye-blinking of the driver. Indeed, the NHTSA and ETSC manuals suggest that detection and tracking of the driver's eye-blinking is the most reliable way to determine the driver's level of consciousness.
In the present study, we have used a Markov chain framework whereby the driver's eye-blinking and head nodding are separately modelled based upon their visual features, and the system then decides, by combining those behavioral states according to a certain criterion, whether the driver is drowsy or not.

Discriminative Bayesian-Active Shape Model (DB-ASM)
DB-ASM is a face model based on the Active Shape Model (ASM) [9]. One of its evolutions is the Constrained Local Model (CLM), in which a global face model is created by combining local feature detections [8,10]. The global model is composed of a Point Distribution Model (PDM), which is acquired by updating several parameters at each landmark drawn on the face image. In the present study, DB-ASM utilizes the Minimum Output Sum of Squared Error (MOSSE) filter [17] for local feature detection because it is known to be fast and reliable. The shape parameters are then updated with a Maximum A Posteriori (MAP) estimate.

Point Distribution Model
The Point Distribution Model (PDM) is used to learn the face shapes, built from hand-labelled landmark points, using the Principal Component Analysis (PCA) technique. Before this stage, the scale, rotation, and translation of the data were aligned using Procrustes Analysis. The learned mean shape, eigenvalues, and eigenvectors are statistically combined to complete a model that can transform the shape. Equation (1) represents a PDM:

s = S(s0 + Φ b_s, q)  (1)

where s0 = (x0_1, y0_1, ..., x0_v, y0_v)^T is the mean shape, Φ is the shape subspace matrix holding n eigenvectors, and b_s is a vector of shape parameters. S(., q) represents a similarity transformation with pose parameters q = (s, θ, t_x, t_y), where s, θ, t_x, and t_y are the scale, rotation, and translations with respect to the base mesh s0, respectively.
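As a hedged sketch (not the authors' code), the PDM of Equation (1) can be built with PCA over aligned landmark vectors; the function and variable names below are illustrative.

```python
import numpy as np

def build_pdm(shapes, n_components):
    """shapes: (N, 2v) array of aligned landmark vectors (x1, y1, ..., xv, yv)."""
    s0 = shapes.mean(axis=0)                  # mean shape s0
    X = shapes - s0                           # centre the data
    cov = X.T @ X / (len(shapes) - 1)         # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigen-decomposition
    order = np.argsort(eigvals)[::-1]         # largest variance first
    phi = eigvecs[:, order[:n_components]]    # shape subspace matrix Φ
    return s0, phi

def synthesize(s0, phi, b):
    """Model instance s0 + Φ b, before the similarity transform S(., q)."""
    return s0 + phi @ b
```

A shape can then be projected into the subspace with b = Φᵀ(s − s0) and reconstructed with `synthesize`.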

Feature Detector
As mentioned above, the local feature detector of the present DB-ASM is a MOSSE filter, which is known to be very fast and robust against rotation and partial occlusion. Equation (2) represents the MOSSE correlation:

G = F ⊙ H*  (2)

Here, the input image f, the filter h, and the 2D Gaussian map g are transformed into F, H, and G, respectively, by the 2D Fourier transform; F is acquired by transforming the input image with the 2D Fast Fourier Transform (FFT). ⊙ refers to element-wise multiplication, whereas * refers to the complex conjugate. The filter in Equation (2) can be derived by applying Equation (3) to the local feature image at each landmark location:

H* = (Σ_i G_i ⊙ F_i*) / (Σ_i F_i ⊙ F_i*)  (3)

In Equation (3), each training image and its Gaussian map are transformed with the FFT, multiplied element-wise, and accumulated; dividing the two accumulated terms yields the filter H*. The created filter H and an input image I are then combined element-wise, and Equation (4) is used to calculate a response map (or correlation map):

r = F⁻¹(I_F ⊙ H*)  (4)

where I_F is the FFT of I. From the response map calculated by Equation (4), the weighted peak is predicted with a likelihood, and an isotropic Gaussian is obtained through this likelihood, as given in Equation (5):

p_i(l_i) = N(l_i; y_i^WP, Σ_i^WP)  (5)

Here, y_i^WP refers to the location having the highest correlation in the response map, as shown in Figure 1, whereas Σ_i^WP refers to the Gaussian covariance matrix around y_i^WP in the response map. Figure 1 illustrates the visual appearance of the PDM, the MOSSE filters, and the response maps, respectively, at five sampled landmarks among the 56 on the given face.
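The MOSSE training and correlation steps above can be sketched as follows; this is an illustrative implementation of the standard MOSSE formulation (Bolme et al.), not the authors' exact code, and `sigma`/`eps` are assumed regularization choices.

```python
import numpy as np

def train_mosse(patches, sigma=2.0, eps=1e-5):
    """patches: list of equally-sized 2D training patches for one landmark."""
    h, w = patches[0].shape
    yy, xx = np.mgrid[0:h, 0:w]
    # desired output: 2D Gaussian map g peaked at the patch centre
    g = np.exp(-(((xx - w // 2) ** 2 + (yy - h // 2) ** 2) / (2 * sigma ** 2)))
    G = np.fft.fft2(g)
    num = np.zeros((h, w), dtype=complex)   # accumulates G ⊙ F*
    den = np.zeros((h, w), dtype=complex)   # accumulates F ⊙ F*
    for p in patches:                       # Equation (3): accumulate over patches
        F = np.fft.fft2(p)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)                # H* (eps avoids division by zero)

def response_map(H_conj, patch):
    """Equation (4): correlate a new patch with the learned filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
```

The peak of the returned response map gives the landmark location y_i^WP used in Equation (5).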

Shape Update

The peak locations y_i^WP and covariances Σ_i^WP obtained in the above section are used to update the current shape parameters and to perform the shape fitting. The prior of the shape parameters in the Bayesian formulation is given by Equation (6). The calculated prior, together with y_i^WP and Σ_i^WP, is used in the updating process; Equations (7)-(9) present the update, where A is the n-dimensional identity matrix and P_k is the covariance of the shape parameters. Using these equations, the shape parameters and pose parameters can be updated concurrently. The updates are repeated up to 10 times; when the difference between two successive shape parameter estimates falls below a threshold, the updating process stops.

Head Pose Estimation by the POSIT Algorithm
In this study, the POSIT algorithm is used to estimate the head pose of the driver [11]. Head rotation is calculated using an initialized standard 3D coordinate set and the current 2D shape coordinates. The algorithm calculates rotation and translation based on the correspondence between points in the 2D coordinate space and the corresponding points in the 3D coordinate space. Equation (10) is the formula for the POSIT algorithm.
Here, (X, Y, Z) represents a 3D coordinate and (u, v) its 2D projection. The camera matrix holds the intrinsic parameters: the image centre (c_x, c_y), the scale factor s, and the focal lengths in pixels (f_x, f_y). The rotation and translation matrices [R|t] are the extrinsic parameters, with entries r_ij and t_i, respectively. The POSIT algorithm estimates the rotation matrix and translation matrix in 3D space.
To use the POSIT algorithm for head pose estimation, a 3D face model is built using FaceGen Modeler 3.0, with an average-sized 3D face whose coordinate system is identical to that of the facial features. The coordinates of the generated 3D facial features are used as input to POSIT.
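The projection relation underlying Equation (10) can be illustrated as a pinhole projection with intrinsics K and extrinsics [R|t]; POSIT iteratively recovers R and t from such 2D-3D correspondences. The numeric values below (focal length, image centre for a 320 × 240 frame) are illustrative assumptions.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project (N,3) model points to (N,2) pixel coordinates (u, v)."""
    cam = points_3d @ R.T + t          # transform into camera coordinates
    uvw = cam @ K.T                    # apply intrinsic parameters
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

fx = fy = 500.0                        # assumed focal lengths in pixels
cx, cy = 160.0, 120.0                  # image centre of a 320x240 frame
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]])
```

Given a set of such correspondences, a pose solver (POSIT, or e.g. OpenCV's `solvePnP`) inverts this mapping to estimate R and t.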

Pose Extended-Active Shape Model
In general, the head pose of a driver changes continuously, since one drives a car while checking the side mirrors and rear-view mirror in turn. In this study, the range of head movement occurring while driving is utilized to determine whether the driver is drowsy or not. In particular, the change of head pose is larger when the driver is looking at the right side-mirror than at the left side-mirror. The change of the driver's head pose and its frequency become important factors in determining the drowsiness of the given driver.
Among many, we can consider two statistical face models favored in the computer vision community for the present purpose: the Active Shape Model (ASM) and the Active Appearance Model (AAM). Given that the head poses of a normal driver are very diverse, and some of them are clearly extreme, the face model certainly needs to deal with such extreme pose cases. However, it is well known that these face models are effective only when estimating head poses of less than about 30° around the average face shape. Therefore, they cannot deal well with cases where the head pose exceeds 30°, which we call the extreme head pose cases.
The reason is that some of the extreme shape vectors are lost in carrying out the Principal Component Analysis (PCA) process. Moreover, since the landmark locations of the average face shape differ significantly from those of an extreme shape, it is difficult to correct the fitting error. For instance, the DB-ASM model typically uses the average shape model even when the head pose has increased beyond a certain angle; in such cases, the shape fitting becomes difficult and the fitting error increases.
In this study, we have extended DB-ASM to include the extreme pose cases, as illustrated in Figure 2, and call the result the Pose Extended ASM (PE-ASM), in which six extreme head poses are embedded into the basic model. On the right of the same figure, the centre is the average shape model, and the other six shape models correspond to the extreme head pose cases, respectively. In other words, we have trained seven different DB-ASMs, including the frontal face model. We have used the threshold values indicated in Table 1 to categorize the head pose of the current face. As shown in Section 3.1, the POSIT algorithm is used to estimate the head pose of the given face. According to the estimated head pose value, the corresponding shape model is assigned among the seven head pose categories; our face model is therefore head pose-specific. Once the shape model is assigned, the face fitting starts with it and iterates until the error converges below a certain threshold. Here, the 'Front' case covers the head rotation range from −30° to +30° on three different axes, and the other six cases cover the extreme head rotation occasions given in Table 1.
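The model-selection step can be sketched as follows. The ±30° 'Front' range comes from the text above, but the exact mapping of the six extreme categories is an assumption here, since Table 1's threshold values are not reproduced; the category names are illustrative.

```python
def pose_category(pitch, yaw, roll):
    """Assign one of seven pose-specific shape models from estimated angles
    (degrees). The 'Front' range is from the paper; the rest is assumed."""
    if all(abs(a) <= 30 for a in (pitch, yaw, roll)):
        return "Front"
    if abs(yaw) > 30:                       # side-mirror glances dominate yaw
        return "YawRight" if yaw > 0 else "YawLeft"
    if abs(pitch) > 30:                     # nodding up/down
        return "PitchDown" if pitch < 0 else "PitchUp"
    return "RollRight" if roll > 0 else "RollLeft"
```

The fitting then starts from the shape model of the returned category rather than from the average shape.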

Detection of Eye-Blink
It is known that when humans get drowsy, blood moves toward the extremities of the hands and feet, and the eyes blink more often because tear production in the lachrymal glands is reduced. In addition, the blood supply to the brain is also reduced. As a result, the person goes into a trance state as brain activity naturally declines.
In the present study, the driver's eye-blinks were counted based on the shape points around the eyes, as shown in Figure 3. When the driver's eye blinks, the upper and lower points naturally come closer to each other. The number of blinks can be determined with Equation (11): when d in Equation (11) is less than 80% of the maximum height of the eyes, we consider it a blink state. To establish the ground truth of eye-blinking, three experts were employed: one for the judgment of blinking and two others for its verification.

Table 1. Categorization of head pose ranges of our dataset into six extreme cases.

d = √((Upper_x − Lower_x)² + (Upper_y − Lower_y)²)  (11)
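Equation (11) and the 80% threshold can be written directly as code; the landmark names and the closed-to-open counting convention below are illustrative assumptions.

```python
import numpy as np

def eyelid_distance(upper, lower):
    """Equation (11): Euclidean distance between upper and lower eyelid points."""
    return float(np.hypot(upper[0] - lower[0], upper[1] - lower[1]))

def is_blink(upper, lower, max_height):
    """A frame counts as a blink state when d < 80% of the open-eye height."""
    return eyelid_distance(upper, lower) < 0.8 * max_height

def count_blinks(distances, max_height):
    """Count open-to-closed transitions over a sequence of per-frame distances."""
    blinks, closed = 0, False
    for d in distances:
        now_closed = d < 0.8 * max_height
        if now_closed and not closed:
            blinks += 1
        closed = now_closed
    return blinks
```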

Hidden Markov Model for Drowsiness Detection
HMMs can be classified into the following types according to how the output probabilities are modelled: (1) an HMM with a discrete probability distribution predicts the observed output probabilities with the discrete characteristics of the state; (2) an HMM with a continuous probability distribution uses a probability density function, allowing a particular error vector to be used without a quantization process for a given output signal; and (3) an HMM with a semi-continuous distribution combines the advantages of the two aforementioned models: the probability values of the discrete HMM and the probability density function of the continuous HMM.
Figure 4 presents our HMM structure, which consists of three different mental states of the driver. The head pose estimation and eye-blinking detection were used to create the HMM observation symbols, and the model probabilities were estimated with the Baum-Welch algorithm, calculating the symbol probabilities for each state. Here, the transition probabilities and the initial symbol probabilities at each state are assumed to be equivalent when calculating the transition probability at each state. The estimated probability distributions, together with the Viterbi algorithm, were applied to a test sequence to obtain the best state sequence, selecting the best-matching state. For the present study, the GHMM (General Hidden Markov Model) public library, which is a standard HMM package, is used [18].

The driver's head pose and eye-blinking outputs are used to predict the state most suitable to the driver's condition with two models: one an eye-blinking HMM and the other a nodding HMM, as shown in Figure 5. For training the two HMMs, the videos are divided into many segments, each consisting of 15 frames. The total is about 23,000 frames; 70% of them are used for training and 30% for testing. The recognition rate was 98% during the training phase and 92% during the testing phase. Once training is carried out, the nodding and eye-blink states are enumerated to determine the drowsiness level of the given driver, as shown in Figure 4. The mental state of the driver is categorized into normal, warning, and alert, and the state transitions among the three states are determined by weights, as indicated in Figure 4. Figure 6 presents the criteria for determining the driver's final mental state upon a variation of state. Notably, when eye-blinking and head nodding occur together, the system considers the driver's state to be asleep.
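The decoding step described above can be sketched with a minimal Viterbi decoder, the same role the GHMM library plays in the paper: given transition and emission probabilities estimated with Baum-Welch, recover the most likely hidden state sequence for a test observation sequence. The two-state numbers in the usage example are illustrative, not the paper's trained parameters.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: observation symbol indices; pi: (S,) initial probabilities;
    A: (S,S) transition matrix; B: (S,O) emission matrix.
    Returns the most likely state sequence (list of state indices)."""
    delta = np.log(pi) + np.log(B[:, obs[0]])   # log-probabilities at t=0
    back = []                                   # backpointers per step
    for o in obs[1:]:
        scores = delta[:, None] + np.log(A)     # scores[from, to]
        back.append(np.argmax(scores, axis=0))  # best predecessor per state
        delta = scores.max(axis=0) + np.log(B[:, o])
    path = [int(np.argmax(delta))]              # best final state
    for bp in reversed(back):                   # trace back
        path.append(int(bp[path[-1]]))
    return path[::-1]
```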


Comparison between AAM and DB-ASM
The IMM Face DB was used to evaluate the performance of DB-ASM and AAM [19]. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to measure the fitting error. The former is defined in Equation (12), where n refers to the total number of frames, y_i indicates the ground-truth value for the pose, and f_i refers to the estimated pose:

MAE = (1/n) Σ_i |y_i − f_i|  (12)

The latter is defined in Equation (13), where s_gt is the ground-truth shape point and s the fitted shape point:

RMSE(s) = √((1/n) Σ_i (s_gt,i − s_i)²)  (13)

The results indicate that DB-ASM outperforms AAM in terms of MAE as well as RMSE, as shown in Table 2, partly because DB-ASM is a more recent face model. In any case, this is good news, since we adopt DB-ASM as our basic face model for the present study. In addition, it is well known that DB-ASM is robust against partial occlusion. Our demonstration confirms that it is very reliable even against several occlusion cases, such as 20% and 40% occlusion, respectively, as illustrated in Figure 7. We have seen that the overall fitting is severely damaged when the same occlusions are applied to AAM.
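Equations (12) and (13) translate directly into code; this is a generic implementation of MAE and RMSE over ground-truth and estimated values (pose angles or shape points).

```python
import numpy as np

def mae(y_true, y_pred):
    """Equation (12): mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Equation (13): root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```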


Setup for Collecting Our Custom-Made Driving Database
Since our goal is to determine the driver's drowsiness using two drowsiness-related parameters, i.e., head nodding and eye-blinking, we need a driving database collected in a moving-vehicle environment. However, it is not easy to collect such data, simply because it could be very dangerous to manage sleepy drivers in a moving car. One possible solution is to let the subject drivers simulate a drowsy driver in the vehicle according to certain scenarios that include a few drowsiness-related behaviors.
Two scenarios are prepared: normal driving and drowsy driving. For the former, the subject is supposed to look at the side and rear mirrors sporadically with normal, attentive forward viewing, and to drive with natural blinking. For the latter, the subject blinks often, like a drowsy driver.
During the whole recording session, each subject wears a gyro sensor module mounted in a white box attached to a black headband, as shown on the left of Figure 8. The sensor measures the 3D head movement of the driver, and the collected data are sent in real time to a receiver connected to a notebook PC inside the vehicle. This setup allows each subject to focus only on the given scenario, resulting in improved usability compared to previous studies. We have collected two video streams concurrently using two cameras: one for the color image and the other for the black-and-white image. The former is used as the basis, whereas the latter works with an IR (infra-red) LED panel to deal with night driving. Each video was collected in synchronization with the gyro sensor. In building the DB, all videos are recorded with a 320 × 240 image size, 200 frames, and two different lighting conditions. On the right of Figure 8, the whole setup is shown installed in front of the driver in the vehicle. Figure 9 depicts four images taken during the database-building process. Table 3 shows the head pose ranges for pitch, roll, and yaw, respectively. Notice that the yaw angle is greater than the other two head rotations, because drivers mainly look out the front window or at the two side mirrors while driving [20].
Our computing setup for the analysis consists of an i7 CPU with 4 GB of main memory, and the software package includes Visual Studio 2010, OpenCV, and OpenMP. The face tracker based on the present DB-ASM runs at 16-20 frames/second, on a real-time basis.


Performance Comparison between PE-ASM and Other Face Models
The Boston University (BU) head pose DB [21] is a standard head pose dataset providing the ground truth of each subject's head pose movements. It has been used in many previous studies evaluating the head pose estimation task. However, the pose variation in the BU DB is not large; it is normally less than 30°. This motivated building a new database in which the range of head poses is large, as is often seen in natural driving conditions.
We have tested four face models, namely AAM, ST-ASM, DB-ASM, and PE-ASM, using the above two datasets. We find that PE-ASM outperforms the other three face models in the yaw fitting, which is the most important of the three head pose components, for the BU DB case, as shown at the top of Figure 10. The performance difference between DB-ASM and PE-ASM across the three components is not significant, which is in fact expected, because the head pose ranges of the BU DB are mostly less than 30°. The bottom panel of Figure 10 shows the result for our custom-made DB, where PE-ASM and DB-ASM outperform the other two models in general. Notice that the performance for yaw is significantly improved with PE-ASM and DB-ASM. The overall performance of PE-ASM is better than that of DB-ASM, especially when the head pose is within the extreme head pose range.

In the time-series result, the states of the two facial features, i.e., eye-blinking and head nodding, are traced, and the bottom panel depicts the decision of drowsiness according to the HMM state transition diagram. Here the colors are coded as shown in Figure 6. Note that whenever the eye-blinking and the head nodding overlap, the red sign, i.e., asleep, is flashed.

Conclusions
Conventional driver drowsiness detection systems are vulnerable to extreme head rotation and occlusion. This study presents a new method to deal with such problems. First, it is shown that DB-ASM is more robust against occlusion than AAM. Secondly, given that extreme head poses occur frequently during natural driving, PE-ASM extends DB-ASM by incorporating six extreme head pose cases into it; evaluation results using the BU face DB and our custom face DB suggest that it is particularly reliable against extreme head poses. Thirdly, since our drowsiness detection system is based on the status of two facial features, head nodding and eye-blinking, it needs to make a judgment based on this information; HMMs are used to infer, over time, whether the subject is drowsy or not. The system runs on a real-time basis on a PC. Our results suggest that it has potential application in commercial vehicles, in which the visual conditions are diverse and difficult to deal with.

Supplementary Materials:
The following are available online at www.mdpi.com/2072-6651/6/5/137/s1, Video S1. This video demonstrates how both eye-blinking and nodding of the driver are estimated, after which the Markov model makes inference according to their states in real time.

Figure 1.
Figure 1. Appearance of the MOSSE (Minimum Output Sum of Squared Error) filter and response map on a PDM (Point Distribution Model) shape: the PDM shape on a given face having 56 landmarks (left); the MOSSE filters at five sampled locations (middle); and their corresponding response maps (right).

Figure 1
Figure 1 illustrates the visual appearance of the PDM, the MOSSE filter, and the response maps at five sampled landmarks among the 56 on the given face.
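The MOSSE filter behind these response maps can be sketched as follows. This is a generic single-channel MOSSE formulation, not the paper's exact training procedure: the Gaussian target width `sigma`, the patch size, and the regularizer `eps` are illustrative assumptions.

```python
import numpy as np

# Sketch of a MOSSE correlation filter: train the filter in the frequency
# domain so that correlating it with a landmark patch yields a response
# map peaked at the landmark location. Parameters are illustrative only.

def train_mosse(patches, sigma=2.0, eps=1e-5):
    """Return the conjugate filter H* from training patches (H x W arrays)."""
    h, w = patches[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Desired response: a Gaussian peak at the patch center.
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    G = np.fft.fft2(g)
    num = np.zeros((h, w), dtype=complex)
    den = np.zeros((h, w), dtype=complex)
    for p in patches:
        F = np.fft.fft2(p)
        num += G * np.conj(F)      # numerator:   sum_i G ⊙ conj(F_i)
        den += F * np.conj(F)      # denominator: sum_i F_i ⊙ conj(F_i)
    return num / (den + eps)       # H* in the frequency domain

def response_map(H_conj, patch):
    """Correlate a new patch with the filter; the peak marks the landmark."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
H = train_mosse([patch])
resp = response_map(H, patch)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # peak lies at the patch center, (16, 16)
```

Because the filter is trained so that the training patch maps onto the Gaussian target, correlating it with that same patch reproduces a response peaked at the center, matching the response maps shown in Figure 1.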

Figure 2.
Figure 2. Pose Extended Active Shape Model (PE-ASM) applied to the six extreme pose cases (left) and their corresponding shape models (right). Here, the 'Front' case covers the head rotation range from −30° to +30° in three different axes, and the other six cases cover the extreme head rotation occasions given in Table 1.

Figure 3.
Figure 3. The geometric model for the eye.

Figure 4.
Figure 4. Markov model state transition diagram, in which there are three possible states and state transitions occur according to the transition weights, as indicated.
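A three-state model such as the one in Figure 4 can be sketched as a weighted random walk. The state names and transition weights below are placeholders, since the figure's actual values are not reproduced in the text.

```python
import random

# Sketch of a three-state Markov chain in the spirit of Figure 4.
# States and row-stochastic transition weights are illustrative only.
STATES = ["open", "closing", "closed"]
TRANSITIONS = {
    "open":    {"open": 0.8, "closing": 0.2, "closed": 0.0},
    "closing": {"open": 0.3, "closing": 0.4, "closed": 0.3},
    "closed":  {"open": 0.1, "closing": 0.0, "closed": 0.9},
}

def step(state, rng=random):
    """Draw the next state according to the outgoing transition weights."""
    weights = TRANSITIONS[state]
    return rng.choices(list(weights), weights=list(weights.values()))[0]

rng = random.Random(42)          # fixed seed for a reproducible walk
state, trace = "open", []
for _ in range(10):
    state = step(state, rng)
    trace.append(state)
print(trace)  # a 10-step random walk over the three states
```

Each row of the transition table sums to one, so at every frame the next state is drawn from a proper distribution conditioned on the current state, which is exactly the behavior the transition diagram encodes.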

Figure 6.
Figure 6. The criteria for determining the driver's state upon changes in the nodding and blinking HMM (Hidden Markov Model) states.

Figure 7.
Figure 7. Robust fitting of DB-ASM against several occlusion cases.

Figure 8.
Figure 8. A wireless gyro-sensor module mounted in a white box and its receiver (left), and the cameras and other setup installed in the vehicle (right).

Figure 9.
Figure 9. Images of four subjects taken from our custom-made driving face database. The driver often nods without looking at the side and rear mirrors. In our custom-made DB, subjects were asked to move their heads to the extreme head poses while driving naturally. The yaw, pitch, and roll of the extreme poses are greater than 30° for the four subjects, and the image size is 640 × 480, with 500 frames for each case.

Figure 10.
Figure 10. Performance comparison between the four face models using BU DB (Boston University database) (top) and our custom-made database (bottom). The three head rotations are indicated in blue, red, and green, respectively.

Figure 11
Figure 11 illustrates the head pose estimations by AAM and PE-ASM, respectively. The top graphs (a,b) are obtained by applying the two face models to BU DB, whereas the bottom ones (c,d) are from our custom face DB. Given that the dotted lines in these graphs are the ground-truth values, PE-ASM estimates the subject's head pose reasonably well, whereas AAM cannot accommodate the extreme head pose cases. The blue arrows indicate when each subject made his head rotation. The top subject's image came from BU DB, whereas the bottom one is from our custom-made face DB.

Figure 12.
Figure 12. Detection of the driver's drowsiness using PE-ASM (Pose Extended Active Shape Model) and HMM: PE-ASM is applied to the images of a driver (top panel), with eye-blinking (second panel), head pose (third panel), HMM states (fourth panel), and the decision of the driver's drowsiness (bottom panel). Note that whenever the blinking and nodding HMM states overlap, the system detects drowsiness of the driver, indicated as red rectangles in the bottom panel.

Table 2.
Comparison of fitting performance between AAM (Active Appearance Model) and DB-ASM (Discriminative Bayesian Active Shape Model).

Table 3.
Head pose ranges of the four subjects in our custom-made face database.
