Article

Detecting Driver Drowsiness Using Hybrid Facial Features and Ensemble Learning

by Changbiao Xu, Wenhao Huang *, Jiao Liu and Lang Li
School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Author to whom correspondence should be addressed.
Information 2025, 16(4), 294; https://doi.org/10.3390/info16040294
Submission received: 21 February 2025 / Revised: 19 March 2025 / Accepted: 1 April 2025 / Published: 7 April 2025
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)

Abstract

Drowsiness while driving poses a significant risk in terms of road safety, making effective drowsiness detection systems essential for the prevention of accidents. Facial signal-based detection methods have proven to be an effective approach to drowsiness detection. However, they bring challenges arising from inter-individual differences among drivers. Variations in facial structure necessitate personalized feature extraction thresholds, yet existing methods apply a uniform threshold, leading to inaccurate feature extraction. Furthermore, many current methods focus on only one or two facial regions, overlooking the possibility that drowsiness may manifest differently across different facial areas among different drivers. To address these issues, we propose a drowsiness detection method that combines an ensemble model with hybrid facial features. This approach enables the accurate extraction of features from four key facial regions—the eye region, mouth contour, head pose, and gaze direction—through adaptive threshold correction to ensure comprehensive coverage. An ensemble model, combining Random Forest, XGBoost, and Multilayer Perceptron with a soft voting criterion, is then employed to classify the drivers’ drowsiness state. Additionally, we use the SHAP method to ensure model explainability and analyze the correlations between features from various facial regions. Trained and tested on the UTA-RLDD dataset, our method achieves a video accuracy (VA) of 86.52%, outperforming similar techniques introduced in recent years. The interpretability analysis demonstrates the value of our approach, offering a valuable reference for future research and contributing significantly to road safety.

1. Introduction

Survey data from various countries highlight the significant threat that drowsiness while driving poses to road safety. In the United States, nearly 100,000 traffic accidents per year are attributed to drowsiness while driving, leading to numerous casualties and substantial property damage [1]. In Australia, drowsiness while driving is a direct factor in 20–30% of serious traffic accidents [2]. Similarly, in the European Union, approximately 20% of traffic accidents are linked to drowsiness while driving, with related incidents accounting for 49% of traffic accident casualties—considerably higher than other types of accidents [3,4]. These statistics emphasize the urgent need for vehicles to be equipped with reliable drowsiness detection systems to prevent accidents caused by drowsiness while driving and enhance road safety. The research methods applied to develop drowsiness detection systems can be broadly categorized into three types based on the data utilized: vehicle and driving behavior-based methods, physiological signal-based methods, and facial signal-based methods [5]. Vehicle and driving behavior-based methods consider parameters such as the steering wheel angle, grip strength [6,7], lane deviation [8], and braking pressure [9]. Physiological signal-based methods rely on measurements such as electroencephalograms (EEGs) [10,11,12], electrocardiograms (ECGs) [13,14], electrooculograms (EOGs) [15], and electromyograms (EMGs) [16]. Facial signal-based methods focus on features such as the blink duration (BD) [17,18], blink frequency (BF) [19,20], and blink amplitude (BA) [21]. These methods are further classified into invasive and non-invasive approaches based on the level of intrusion involved in data collection [22]. Invasive methods require direct contact with the driver, such as wearing an electrode cap for an EEG [23], placing electrodes near the eyes for an EOG [24], or attaching them to the chest for an ECG [25]. Non-invasive methods, on the other hand, utilize non-contact sensors, including pressure sensors integrated into car seats [26,27] and seatbelts [28], or cameras to capture images of the steering wheel [29] and the driver’s face [30,31], to extract drowsiness-related features.
Drowsiness detection methods can be categorized by data type, and each has its own strengths and limitations in terms of their accuracy, cost, and practicality [32]. Vehicle and driving behavior-based methods rely on sensors to monitor parameters such as steering wheel movements, the accelerator pedal pressure, and lane deviations [33,34]. However, these methods only reflect drowsiness indirectly and are influenced by external factors such as traffic, driving habits, and the weather, leading to lower accuracy and unstable performance [35,36,37,38,39,40]. They also require multiple sensors, making them the most expensive option. While highly accurate and reliable [41,42], physiological signal-based methods are typically invasive, causing discomfort to drivers and requiring controlled environments for effective operation, limiting their widespread use [43,44,45]. Attempts to introduce non-contact alternatives, such as integrating ECG sensors into clothing or using Bluetooth devices for heart rate monitoring [46], often bring issues associated with sensor errors, motion artifacts, and the high costs of precision sensors [47,48,49]. Additionally, these methods can interfere with driving and cause drivers to alter their behavior, further restricting their practical application [50]. On the other hand, facial signal-based methods, which represent non-invasive approaches, rely on in-vehicle cameras to capture images and extract features such as the blink frequency, yawning, and nodding using facial key-point models or deep convolutional neural networks [51,52,53]. These methods are not only accurate and reliable but also cost-effective and comfortable in implementation, as they do not require direct contact with the driver [54]. As a result, facial signal-based methods have become the most popular and practical approaches to drowsiness detection [55].
Facial signal-based drowsiness detection methods typically assess a driver’s drowsiness level by extracting features from specific facial regions. For instance, Bakheet and Al-Hamadi [56] used the histogram of oriented gradients (HOG) to extract features from eye images, classifying them with a naive Bayes network (BN). Xu et al. [57] combined eye movement data with a convolutional neural network (CNN), incorporating an attention mechanism for classification. Similarly, Teyeb et al. [58] evaluated drowsiness based on head tilt angles, while Alioua et al. [59] relied solely on changes in the mouth contours to detect yawning as an indicator of drowsiness. However, inter-individual differences among drivers and the varied manifestations of drowsiness across different facial regions present challenges for methods focusing on a single region, limiting their ability to accurately capture a driver’s true drowsiness state [60]. For example, the percentage of eyelid closure over the pupil over time (PERCLOS), which is a widely used drowsiness detection metric specified by the Federal Highway Administration (FHWA) in the United States [61,62,63], has been found to be ineffective in detecting drowsiness in some highly stimulated drivers, potentially resulting in false alarms [64].
Inter-individual differences among drivers pose significant challenges to the generalization and effectiveness of drowsiness detection methods that rely on features from a single facial region. These differences not only reduce the generalization potential of such methods but also reduce the effectiveness of traditional feature extraction techniques. For instance, Albadawi et al. [65] set average extraction thresholds by statistically analyzing the eye aspect ratio and mouth aspect ratio distributions of 18 participants. However, Ingre et al. [66] highlighted that such average thresholds fail to capture the personalized data variations in some drivers in drowsy states. Esse et al. [67] attempted to address this by proposing a dynamic feature extraction method that adjusts the thresholds based on the minimum number of frames, but they still could not effectively account for inter-individual differences by setting driver-specific thresholds. As a result, traditional methods often exhibit poor feature extraction performance in certain drivers [68]. To mitigate traffic accidents caused by drowsiness while driving, it is crucial to consider inter-individual differences more comprehensively and develop feature extraction strategies that are adaptable to a broader range of drivers.
Seeking to overcome the challenges posed by inter-individual differences among drivers, the integration of features from multiple facial regions has become a common approach in drowsiness detection research. For instance, Chen et al. [69] and Guo and Markoni [70] combined eye and mouth images, extracted features using a convolutional neural network (CNN), and classified them using a long short-term memory network (LSTM). Similarly, Huynh et al. [71] utilized features such as the eye closure time, head nodding, and yawning, employing a 3D convolutional neural network (3D CNN) for drowsiness classification. Gao et al. [72] focused on gaze direction and head pose features and applied Transformer models for classification. Research indicates that increasing the variety and quantity of facial region features enables us to leverage their complementary advantages, effectively addressing the challenges posed by inter-individual differences in drowsiness detection. However, the existing methods often rely on features from only one or two facial regions, failing to provide comprehensive coverage. This limitation hinders their ability to accurately represent the diverse manifestations of drowsiness across different drivers, thereby restricting the potential for detection performance improvements. To address this, we design a more diverse set of facial region features to better capture the varied drowsiness characteristics of different driver groups and enhance the detection accuracy.
Although previous studies have combined facial region features with various machine learning and deep learning techniques to classify drowsiness levels, relying on a single model can lead to overfitting, as it may be misled by data in specific feature spaces. In contrast, integrating multiple models allows for the combination of their respective advantages, compensating for the limitations of any one model and thus improving the accuracy and robustness of the detection process [73,74].
To address the above challenges, this study proposes an enhanced drowsiness detection framework. First, based on facial key points extracted using the MediaPipe Face Mesh [75], we developed an adaptive threshold correction method to ensure accurate feature extraction across different drivers. Next, we designed 20 features from four facial regions—the eyes, mouth, head pose, and gaze direction—to capture a comprehensive set of facial characteristics. Building upon the standardization strategy proposed by Ghoddoosian et al. [76], we standardized each participant’s drowsiness data using their alert state distribution, minimizing inter-participant feature differences while preserving the value changes within the same participant across alert and drowsy states. Then, we trained and fine-tuned three machine learning models—random forest (RF), a multilayer perceptron (MLP), and extreme gradient boosting (XGBoost)—integrating their advantages through a soft voting strategy to mitigate the impact of the shortcomings of any individual model. Finally, we conducted an interpretability analysis using the Shapley Additive Explanations (SHAP) method to examine the influence of each feature on the model’s predictions. In summary, this study introduces a robust drowsiness detection approach that addresses inter-individual differences among drivers using adaptive threshold correction, comprehensive feature design, a standardization strategy, and an ensemble learning framework. Additionally, the SHAP-based interpretability analysis provides valuable insights into how each facial region feature contributes to the model’s decision-making, offering a reference for future research.
This study offers the following contributions.
(1)
To tackle the issue of inaccurate feature extraction due to inter-individual differences among drivers, we developed an adaptive threshold correction method that ensures precise feature extraction across various drivers.
(2)
To address the limitations of existing methods that consider a limited set of facial features and struggle to represent the diverse manifestations of drowsiness among different drivers, we designed 20 features spanning four facial regions—the eyes, mouth, head pose, and gaze direction—and implemented a personalized standardization strategy during preprocessing, achieving the comprehensive coverage of the facial region’s features.
(3)
To overcome the limitations of relying on a single model, we integrated multiple models, combining the strengths of various machine learning techniques to construct a robust and high-precision driver drowsiness detection system.
(4)
We employed the Shapley Additive Explanations (SHAP) method to analyze the model’s decision-making process, uncovering the importance of different features in the model’s predictions and providing valuable insights for future research in this field.
The remainder of this paper is organized as follows: Section 2 presents the methodology proposed in this study, Section 3 discusses the experimental results, and Section 4 concludes the paper and outlines potential directions for future research.

2. Materials and Methods

This section introduces the materials and methods used in this study. Figure 1 illustrates the overall block diagram and system flowchart of the proposed method. The following subsections provide a detailed explanation of each component of the method.

2.1. Dataset

Selecting an appropriate dataset helps ensure the reliability of our experiments. Several datasets have been proposed for driver drowsiness detection research. The NTHU-DDD dataset [77] consists of 9.5 h of video footage of 18 participants, categorized into drowsy and non-drowsy states. This dataset provides sufficient video length, but the recording environment is uniform. Although the dataset can be used for drowsiness detection, it does not fully replicate real driving conditions. The YawDD dataset [78] consists of 2.6 h of video footage of 119 participants, labeled as Normal, Talking, or Yawning. The videos are further categorized based on whether the driver wears glasses. This dataset is one of the largest in terms of participant count, but each driver contributes only 2–3 min of video and typically presents only one state, either alert or drowsy. The NITYMED dataset [79] consists of 4 h of video of 21 participants, labeled as Yawning or Microsleep. The videos were recorded in various real driving environments; however, the dataset does not provide videos of drivers in an alert state for comparison. To address inter-individual differences in drowsiness detection, the dataset must comprise a diverse set of participants, each providing long video segments across multiple drowsiness levels for thorough analysis. Considering these factors, we selected the UTA-RLDD dataset for this study. The UTA-RLDD dataset, collected by Ghoddoosian et al. [76], is the largest real-world drowsiness detection dataset to date. Compared to similar datasets [77,78,79], UTA-RLDD includes data from 60 volunteers of diverse ethnic backgrounds and ages, with a total duration of 30 h. Specifically, the dataset includes video recordings of each participant in three distinct drowsiness states in a simulated driving environment: (1) alert, (2) mildly drowsy, and (3) severely drowsy. The videos were captured by the participants using mobile phones or webcams and contain various angles, backgrounds, and lighting conditions, reflecting typical driving environments. Figure 2 provides examples from the UTA-RLDD dataset.
The videos in the UTA-RLDD dataset primarily focus on the transition of participants from an alert state to a severely drowsy state. In this study, we aimed to determine a driver’s drowsiness level through fine-grained time windows. To meet the research requirements, we divided each video into 30 s segments, with each segment representing an independent data sample that reflected a single drowsiness state. To minimize the risk of overfitting due to strong correlations between features in adjacent time periods, we ensured that no segments overlapped during the clipping process. The chosen time window length of 30 s aligned with the method used for the NTHU-DDD dataset [77], ensuring sufficient data for model training while maintaining the data’s diversity. Finally, we utilized the video labels from the original dataset and labeled each video segment according to the video observation method proposed by He et al. [80]. To minimize observer bias, we excluded the corresponding data when the original label did not align with the label assigned by the observer. After labeling, the resulting video segments formed our new dataset, which consisted of a total of 3376 video segments. These segments were categorized as follows: 1157 in the alert category, 1089 in the mildly drowsy category, and 1130 in the severely drowsy category. The label distribution across each category was balanced.
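The segmentation step can be summarized in a short sketch. The snippet below is a minimal illustration, assuming OpenCV is available; the file name and the way a label is attached to each segment are placeholders rather than the dataset's released tooling.

import cv2

def split_into_segments(video_path, segment_seconds=30):
    """Return (start_frame, end_frame) pairs for consecutive, non-overlapping segments."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()

    frames_per_segment = int(round(fps * segment_seconds))
    segments = []
    start = 0
    while start + frames_per_segment <= total_frames:  # drop the incomplete tail
        segments.append((start, start + frames_per_segment))
        start += frames_per_segment
    return segments

# Example: each 30 s segment inherits the drowsiness label of its source video.
# segments = split_into_segments("participant_01_alert.mp4")
# samples = [{"frames": seg, "label": "alert"} for seg in segments]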

2.2. Feature Extraction

Based on the labeled video segments, we designed a set of features for drowsiness detection. While traditional methods often rely on features from a single facial region and specific types of data, the method proposed in this study consists of a more comprehensive approach, considering features from four facial regions: the eye region, mouth contour, head pose, and gaze direction. Some of these features were directly calculated from the 2D coordinates of facial key points extracted with the MediaPipe Face Mesh, while others were indirectly derived by combining the original data with statistical information. Table 1 presents a detailed list of all 20 features used for driver drowsiness detection, where all features except YCI were collected from the sampling window of individual samples, each with a duration of 30 s.

2.2.1. Eye Region Features

Eye region features are among the most commonly used features in facial signal-based drowsiness detection methods. In this study, the eye aspect ratio at a given moment, $EAR_i$, was calculated using the 2D coordinates of the facial key points extracted with the MediaPipe Face Mesh, in conjunction with the eye aspect ratio calculation method proposed by Soukupová and Čech [81]. Equation (1) calculates the eye aspect ratio at a specific time by taking the ratio of the mean vertical eye landmark distance to the horizontal eye landmark distance:
$$EAR_i = \frac{\mathrm{Mean}\big(\mathrm{Dis}(E_2, E_6),\ \mathrm{Dis}(E_3, E_5)\big)}{\mathrm{Dis}(E_1, E_4)}$$
where $EAR_i$ indicates the degree to which the driver's eyes are open at time $i$. $\mathrm{Mean}(a, b)$ represents the mean of the input parameters, and $\mathrm{Dis}(a, b)$ represents the Euclidean distance between the input parameters. $E_n$ represents the 2D coordinates of the key points in the eye region, where the coordinate indices corresponding to the left eye are (362, 380, 373, 263, 387, 385) and those corresponding to the right eye are (33, 144, 153, 133, 158, 160), arranged in sequence. Figure 3 provides an illustration in which the key eye points are annotated.
When the eyes are open, $EAR_i$ tends to remain stable, while it decreases significantly when the eyes close. The threshold used to detect these changes was determined via the adaptive threshold correction method (Section 2.3). This threshold allows for the identification of the start and end points of a blink event. Whenever a blink occurs, the feature representing the driver's blink count, BC, increases accordingly. This feature helps to track the frequency of blinks, which is a key indicator of drowsiness. Combining $EAR_i$ with the eye region feature definitions of Ghoddoosian et al. [76], further eye region features can be extracted, including the blink duration BD, the eye closure duration MEC, the eye movement amplitude Amp, the eye opening velocity EOV, Perclos, and the average eye aspect ratio $EAR_{AVG}$.
Equation (2) defines the calculation method for the blink duration, BD:
$$BD = \frac{\sum_{i=1}^{n} (end_i - start_i + 1)}{n}$$
where $end_i$ represents the frame number at the end of the $i$-th blink process within the time window, $start_i$ represents the frame number at the start of the $i$-th blink process within the time window, and $n$ is the number of blinks in the window.
Equation (3) defines the calculation method for the eye movement amplitude, Amp:
$$Amp = \frac{EAR_{Start} + EAR_{End} - 2\,EAR_{Peak}}{2}$$
where $EAR_{Start}$ represents the EAR value at the start of the blink process, $EAR_{End}$ is the EAR value at the end of the blink process, and $EAR_{Peak}$ is the lowest EAR value during the entire blink process.
Equation (4) defines the calculation method for the eye opening velocity, EOV:
$$EOV = \frac{EAR_{End} - EAR_{Peak}}{Time_{Open}}$$
where $Time_{Open}$ represents the number of frames that elapse from the complete closure to the complete opening of the eyes during the blink process.
The percentage of eye closure, Perclos, is a crucial indicator for the assessment of drowsiness levels. Equation (5) defines the method used to calculate the proportion of time for which the eyes are closed within a given time window:
$$Perclos = \frac{MEC}{L}$$
where MEC represents the eye closure duration (defined as the total number of frames during which the eyes are closed within the sampling time window), and $L$ refers to the frame length of the sampling time window.
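To make the eye-region computations concrete, the following minimal Python sketch implements Equations (1), (2), and (5) for one eye. The landmark container, the fixed threshold argument (replaced by the adaptive threshold of Section 2.3 in the actual method), and the helper names are illustrative assumptions; Amp and EOV would be derived analogously from the per-blink EAR values.

import numpy as np

LEFT_EYE = [362, 380, 373, 263, 387, 385]   # E1..E6 for the left eye, as listed above

def ear(landmarks, idx=LEFT_EYE):
    """Eye aspect ratio, Eq. (1): mean vertical distance over horizontal distance.
    `landmarks` is assumed to map MediaPipe indices to 2D points."""
    e = [np.asarray(landmarks[i]) for i in idx]
    vert = (np.linalg.norm(e[1] - e[5]) + np.linalg.norm(e[2] - e[4])) / 2.0
    horiz = np.linalg.norm(e[0] - e[3])
    return vert / horiz

def blink_features(ear_series, threshold, window_len):
    """Per-window blink statistics from a sequence of EAR values (one per frame)."""
    closed = np.asarray(ear_series) < threshold
    blinks, start = [], None
    for i, c in enumerate(closed):
        if c and start is None:
            start = i
        elif not c and start is not None:
            blinks.append((start, i - 1))
            start = None
    bc = len(blinks)                              # blink count BC
    mec = int(closed.sum())                       # frames with eyes closed (MEC)
    perclos = mec / window_len                    # Eq. (5)
    bd = np.mean([e - s + 1 for s, e in blinks]) if blinks else 0.0  # Eq. (2)
    return {"BC": bc, "MEC": mec, "Perclos": perclos, "BD": bd}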

2.2.2. Mouth Contour Features

Similarly to $EAR_i$, the mouth aspect ratio $MAR_i$ reflects the degree to which the driver's mouth is open at time $i$, as shown in Equation (6):
$$MAR_i = \frac{\mathrm{Mean}\big(\mathrm{Dis}(M_2, M_8),\ \mathrm{Dis}(M_3, M_7),\ \mathrm{Dis}(M_4, M_6)\big)}{\mathrm{Dis}(M_1, M_5)}$$
Here, $M_n$ represents the 2D coordinates of the key points along the mouth contour, with corresponding coordinate indices (62, 180, 16, 404, 292, 271, 12, 41). When the driver's mouth is closed, the $MAR_i$ value remains stable, but it increases significantly during a yawn. To differentiate between normal speaking and yawning, we apply the threshold relative to $MAR_i$, determined via the adaptive threshold correction method, to identify the start and end points of a yawning event. Figure 4 provides an illustration in which the key mouth points are annotated.
When a yawning event occurs, the feature representing the driver's yawn count, YC, will increase by 1. The yawn count in the most recent sampling time window, YCI, will increase each time a yawning event is detected. Additionally, this count will gradually decrease at a rate of 0.25 every 30 s until it drops back to 0. This gradual decay helps to maintain the relevance of recent yawning events while reducing the impact of older events over time.
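A small sketch of the mouth-contour bookkeeping is given below. The MAR computation follows Equation (6), while the YawnCounter class, its threshold argument, and the call frequency of decay() are illustrative assumptions rather than the authors' implementation.

import numpy as np

MOUTH = [62, 180, 16, 404, 292, 271, 12, 41]   # M1..M8 in the order listed above

def mar(landmarks, idx=MOUTH):
    """Mouth aspect ratio, Eq. (6): mean of three vertical distances over Dis(M1, M5)."""
    m = [np.asarray(landmarks[i]) for i in idx]
    vert = (np.linalg.norm(m[1] - m[7]) +
            np.linalg.norm(m[2] - m[6]) +
            np.linalg.norm(m[3] - m[5])) / 3.0
    return vert / np.linalg.norm(m[0] - m[4])

class YawnCounter:
    def __init__(self, threshold):
        self.threshold = threshold   # personalized T_a,m from the adaptive correction
        self.yc = 0                  # total yawns in the sampling window (YC)
        self.yci = 0.0               # decaying recent-yawn indicator (YCI)
        self.mouth_open = False

    def update(self, mar_value):
        if mar_value > self.threshold and not self.mouth_open:   # a yawn starts
            self.mouth_open = True
            self.yc += 1
            self.yci += 1.0
        elif mar_value <= self.threshold:
            self.mouth_open = False

    def decay(self):
        """Call once every 30 s of video: YCI drops by 0.25 until it reaches 0."""
        self.yci = max(0.0, self.yci - 0.25)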

2.2.3. Head Pose Features

In this study, we employed the POSIT algorithm from reference [82,83] to estimate the driver’s head pose. By combining the real-time 2D coordinates of the head with pre-modeled 3D coordinates, the rotation and translation vectors of the head can be calculated, as illustrated in Equation (7):
$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_h & T_h \end{bmatrix} \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix}$$
where $(u, v, w)$ represents the pre-modeled 3D coordinates, and $(x, y)$ refers to the real-time 2D coordinates. The 2D coordinates are located at specific facial points: the left eye corner, right eye corner, forehead, tip of the nose, left side of the mouth, and right side of the mouth, with corresponding indices (1, 9, 57, 130, 287, 359) in sequence. Figure 5 provides an illustration in which the key head pose points are annotated. The focal lengths in pixels, $(f_x, f_y)$, and the image center point, $(c_x, c_y)$, form the camera intrinsic parameter matrix; $s$ is a scale factor representing the depth information. The POSIT algorithm uses these parameters to estimate the rotation matrix $R_h$ and translation vector $T_h$ of the head. These two vectors together constitute the expanded form of the affine transformation matrix, as shown in Equation (8):
$$\begin{bmatrix} R_h & T_h \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
To calculate the head pose, the rotation matrix $R_h$ needs to be converted into Euler angles, as shown in Equation (9):
$$\alpha_h = \arctan2(r_{21}, r_{11}), \quad \beta_h = \arctan2\!\left(\sqrt{r_{32}^2 + r_{33}^2},\ r_{31}\right), \quad \gamma_h = \arctan2(r_{32}, r_{33})$$
where $\alpha_h$, $\beta_h$, and $\gamma_h$ represent the roll, pitch, and yaw angles of the head pose, respectively. Using these data, we can estimate the current head pose and extract various head pose features. These features include the average value of the head pitch angle PA, the number of nods NS, the duration for which the head remains in a downward position HD, and the overall head activity HA.
Based on Equation (9), Equation (10) defines the calculation method for the average value of the head pitch angle, PA:
$$PA = \frac{\sum_{i=1}^{L} \beta_{h,i}}{L}$$
where PA represents the change in the driver's head-up angle in a mildly or severely drowsy state compared to the alert state, highlighting the differences in the head pose as the driver experiences varying levels of drowsiness. Although PA provides insight into the vertical variation trend of the driver's head posture, it cannot reliably estimate the driver's real-time head orientation. Traditional drowsiness detection methods rely on the absolute values of the Euler angles of the head pose to estimate the real-time head orientation. However, these values can be influenced by the camera's installation position, which may vary across different drivers. To address this issue, in this study, we use the forward head orientation as a reference point. By calculating the change in the Euler angles relative to this reference, the real-time head orientation can be estimated, allowing the method to adapt to varying driving environments. The specific calculation method is provided in Equation (11):
$$\bar{\alpha}_h = \mathrm{Mean}_{i=1}^{L}(\alpha_{h,i}), \quad \bar{\beta}_h = \mathrm{Mean}_{i=1}^{L}(\beta_{h,i}), \quad \bar{\gamma}_h = \mathrm{Mean}_{i=1}^{L}(\gamma_{h,i})$$
where $\bar{\alpha}_h$, $\bar{\beta}_h$, and $\bar{\gamma}_h$ represent the roll, pitch, and yaw angles of the head, respectively, in the normal orientation, calculated by sampling $\alpha_{h,i}$, $\beta_{h,i}$, and $\gamma_{h,i}$ within the working window of the self-correction module. During the research process, we estimated the actual head orientation as follows: when $\beta_{h,i} > (\bar{\beta}_h + 15°)$, the head pose orientation is upward; when $\beta_{h,i} < (\bar{\beta}_h - 15°)$, it is downward; when $\gamma_{h,i} > (\bar{\gamma}_h + 15°)$, it is to the left; when $\gamma_{h,i} < (\bar{\gamma}_h - 15°)$, it is to the right; under all other conditions, the head pose is taken as the forward orientation.
When the orientation of the head pose changes, the feature representing the driver's head activity, HA, will increase accordingly. If the head pose remains downward for a prolonged period, the feature representing the head-dropping duration, HD, will accumulate. Additionally, if the head maintains a non-downward orientation for a specified duration (set to 3 s in this study) and then suddenly changes to a downward position that lasts for more than the set time threshold (0.5 s), the feature representing the number of nods, NS, will be updated and recorded.
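The orientation rules and the derived HD and NS counters can be sketched as follows. The function names, the per-frame orientation list, and the exact nod bookkeeping are assumptions made for illustration; only the 15° margin, the 3 s non-downward duration, and the 0.5 s downward threshold come from the text.

def head_orientation(pitch, yaw, pitch_ref, yaw_ref, margin=15.0):
    """Classify the head pose relative to the calibrated forward-facing Euler angles (degrees)."""
    if pitch > pitch_ref + margin:
        return "up"
    if pitch < pitch_ref - margin:
        return "down"
    if yaw > yaw_ref + margin:
        return "left"
    if yaw < yaw_ref - margin:
        return "right"
    return "forward"

def head_down_and_nods(orientations, fps, min_non_down_s=3.0, nod_s=0.5):
    """Accumulate HD (frames spent head-down) and NS (nod count) from per-frame orientations."""
    hd = ns = 0
    non_down_run = 0      # consecutive non-downward frames before a drop
    down_run = 0          # consecutive downward frames in the current episode
    for o in orientations:
        if o == "down":
            hd += 1
            down_run += 1
            # a nod: a sustained non-down period followed by a down episode of at least nod_s
            if down_run == int(nod_s * fps) and non_down_run >= int(min_non_down_s * fps):
                ns += 1
        else:
            if down_run > 0:
                non_down_run = 0   # restart the non-down run after a down episode ends
            down_run = 0
            non_down_run += 1
    return hd, ns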

2.2.4. Gaze Direction Features

When a driver transitions from an alert state to a drowsy state, changes in the gaze direction are often observed. For example, the driver may find it difficult to maintain a constant focus on the center of the forward view, or the frequency with which they scan the surrounding environment of the vehicle may decrease. Most existing drowsiness detection methods based on the gaze direction rely on expensive eye-tracking devices to collect signals, which limits their applicability in real-world, non-laboratory scenarios. In this study, gaze estimation is achieved using the method proposed by Aloui [84], which provides a more cost-effective and practical alternative to traditional detection methods that depend on specialized devices. To estimate the gaze direction, it is first necessary to describe the geometric features of the eyes and calculate the local posture of the eyeballs. The calculation methods for the vector of the horizontal axis of the eyes, $g_x$, and the vector of the vertical axis of the eyes, $g_y$, are shown in Equation (12):
$$g_x = G_1 - G_2, \quad g_y = G_3 - G_4$$
where $G_n$ represents the 3D coordinates of the facial key points required to calculate the gaze direction, which are set by pre-modeling. These key points on the 2D plane correspond to the left side, right side, upper side, and lower side of the eyes and the center of the pupil. The specific coordinates for the left eye are indexed in sequence as (362, 263, 442, 450, 473), and those for the right eye are (130, 133, 223, 230, 468). Figure 6 provides an illustration in which the key gaze direction points are annotated. Based on these key points, the calculation methods for the 3D positions of the horizontal center and vertical center of the eyeballs, $h_c$ and $v_c$, are shown in Equation (13):
$$h_c = \frac{G_1 + G_2}{2}, \quad v_c = \frac{G_3 + G_4}{2}$$
Based on Equations (12) and (13), the Euler angles of the local posture of the eyeballs can be further obtained, as shown in Equation (14):
$$\gamma_g = \frac{2\,(G_5 - h_c) \cdot \hat{g}_x}{\|g_x\|}, \quad \beta_g = \frac{2\,(G_5 - v_c) \cdot \hat{g}_y}{\|g_y\|}$$
where $\gamma_g$ and $\beta_g$ refer to the yaw angle and pitch angle, respectively, in the Euler angles of the local posture of the eyeballs. $(\hat{g}_x, \hat{g}_y)$ refer to the unit vectors in the horizontal and vertical directions of the eyes, obtained by normalizing $(g_x, g_y)$. $(\|g_x\|, \|g_y\|)$ refer to the horizontal width and vertical height of the eyes, obtained by calculating the norms of $(g_x, g_y)$.
The actual gaze direction is not determined solely by the local posture of the eyeballs but is also influenced by the head pose. By superimposing the 3D rotations corresponding to the head pose established in Equation (9) and the local posture of the eyeballs established in Equation (14), we can calculate the global posture rotation vector of the eyeballs, which represents the gaze direction $R_{global}$, as shown in Equation (15):
$$R_{global} = (w_x, w_y, w_z)$$
where $w_x$, $w_y$, and $w_z$ refer to the components of $R_{global}$ in the $x$, $y$, and $z$ directions, respectively.
Individuals tend to rotate their eyeballs when observing to the left and right, while they typically turn their heads when observing in upward or downward directions. Based on this observation, the left and right offset degrees of the gaze direction can be measured using the $w_y$ component of $R_{global}$.
Similarly to Equation (11), the value of $w_y$ is influenced by the installation position of the camera. In this study, we take the gaze direction towards the center of the forward view as the reference and estimate the real-time gaze direction by calculating the change in the component relative to this reference gaze direction. The calculation method is shown in Equation (16):
$$\bar{w}_y = \mathrm{Mean}_{i=1}^{L}(w_{y,i})$$
where $\bar{w}_y$ represents the value of the component $w_y$ when the gaze direction is towards the center of the forward view; this is calculated by sampling within the working window of the self-correction module. During the research process, we inferred the actual gaze direction using the following method: if $w_{y,i} > (\bar{w}_y + 0.5)$, the gaze direction is determined to be left; if $w_{y,i} < (\bar{w}_y - 0.5)$, the gaze direction is determined to be right; in all other cases, the gaze direction is determined to be towards the center of the forward view.
When the driver's gaze direction changes, the gaze activity feature GA will increase accordingly. If the gaze direction shifts away from the center, the value of the feature CDC, indicating whether the driver is checking the side situation via the rearview mirror while also observing the road ahead, will decrease.
If the blink count BC, head activity HA, and gaze activity GA are all below their respective first quartiles in the alert state, and the CDC shows that the driver has not been monitoring the side situation for an extended period, it is determined that the driver is in a low-activity state. In this case, the Boolean feature IA will be set to true.
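A compact sketch of these rules is shown below. The 0.5 offset and the first-quartile comparison follow the text, whereas the function signatures, the alert_stats dictionary, and the cdc_floor parameter are illustrative assumptions.

def gaze_direction(w_y, w_y_ref, margin=0.5):
    """Classify the gaze relative to the calibrated forward-view component w_y_ref."""
    if w_y > w_y_ref + margin:
        return "left"
    if w_y < w_y_ref - margin:
        return "right"
    return "center"

def low_activity(bc, ha, ga, cdc, alert_stats, cdc_floor):
    """IA is true when blink count, head activity, and gaze activity all fall below their
    alert-state first quartiles and CDC indicates a prolonged lack of side checks."""
    return (bc < alert_stats["BC_q1"]
            and ha < alert_stats["HA_q1"]
            and ga < alert_stats["GA_q1"]
            and cdc < cdc_floor)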

2.3. Adaptive Threshold Correction Method

Inter-individual differences among drivers significantly impact the effectiveness of feature extraction methods, particularly for features related to the eye region and mouth contours. In current drowsiness detection research, events such as blinking and yawning are typically identified by setting thresholds, such as $T_e$ for the eye aspect ratio and $T_m$ for the mouth aspect ratio. For example, when $EAR_i < T_e$, we determine that a blinking event starts, and when $EAR_i > T_e$, we determine that the blinking event ends. However, due to the inter-individual differences in eye and mouth sizes, it becomes challenging to set these thresholds. For drivers with larger eyes, the average EAR is usually higher, and a larger $T_e$ is required to accurately detect blinking. In such cases, the value of $T_e$ may exceed the average EAR seen in typical drivers. Conversely, drivers with smaller eyes require a lower $T_e$, and a similar issue arises when setting $T_m$. The existing methods often establish a universal threshold by statistically analyzing all participants [65,66], but these methods fail to account for inter-individual differences, which leads to challenges in accurately extracting blinking and yawning events.
To address this issue, in this study, we propose an adaptive threshold correction method. This method introduces a 60 s adaptive threshold correction window before the formal operation of the drowsiness detection system, allowing for the calculation of a personalized eye aspect ratio threshold $T_{a,e}$ and mouth aspect ratio threshold $T_{a,m}$ for each driver. We take the adaptive eye aspect ratio threshold $T_{a,e}$ as an example.
First, we initialized the threshold. Then, we captured all eye aspect ratios $EAR_i$ within the correction window that were lower than this initial threshold to form a set $E_{close}$. At the end of the correction window, we applied the Interquartile Range (IQR) method to filter out possible outliers in the set $E_{close}$, yielding a new set $E_{filtered}$. If $E_{filtered}$ was not empty, we calculated the difference between the mean value $\bar{E}$ of all $EAR_i$ values within the correction window and the mean value $\bar{E}_{filtered}$ of $E_{filtered}$. If $E_{filtered}$ was empty, this indicated that the initial threshold could not effectively capture the blinking features of the driver; in this case, we replaced $\bar{E}_{filtered}$ with the mean value $\bar{E}_{smallest}$ of the $EAR_i$ values ranked from 10th to 20th smallest within the correction window, seeking to avoid the influence of extreme values. We considered this difference to be the change in $EAR_i$ for this driver between the open-eye and closed-eye states and then multiplied it by the adaptive adjustment factor $\alpha_e$. In this way, the adaptive eye aspect ratio threshold $T_{a,e}$ suitable for this driver can be obtained.
The calculation method for the adaptive mouth aspect ratio threshold $T_{a,m}$ follows the same procedure as that for the adaptive eye aspect ratio threshold $T_{a,e}$. The adaptive adjustment factor for the eye region, $\alpha_e = 0.85$, means that the discrimination criterion for judging the eyes to be closed is that they are at least 85% closed. Similarly, the adaptive adjustment factor for the mouth contour is $\alpha_m = 0.7$. The pseudo-code for the adaptive threshold correction method is provided in Algorithm 1.
Algorithm 1: Adaptive Blink Threshold
Input: eye aspect ratio at time i: E_i; time window L; adjustment factor α_e; initial threshold T; minimum blink frames M.
Output: eye status S_e; adaptive blink threshold T_a,e.
# Step 1: Collect E_i during eye closure within L as E_close.
1:  if (i ≤ L) and (E_i ≤ T) then
2:      E_close ← E_close ∪ {E_i}
# Step 2: Compute the mean of E_i in L, filter E_close, and derive T_a,e.
3:  else if (i == L) then
4:      Ē ← (Σ_{n=1..L} E_n) / L        # average value of E_i in the window L
5:      Q1 ← f_25(E_close)              # 25th percentile of E_close
6:      Q3 ← f_75(E_close)              # 75th percentile of E_close
7:      IQR ← Q3 − Q1                   # interquartile range (75th minus 25th percentile)
8:      E_filtered ← {e ∈ E_close | e ∈ [Q1 − 1.5·IQR, Q3 + 1.5·IQR]}
9:      if (E_filtered ≠ ∅) then
10:         Ē_close ← Ē_filtered
11:     else
12:         Ē_close ← Ē_smallest        # Ē_smallest: mean over the smallest E_i values in L
13:     T_a,e ← (Ē − Ē_close)·α_e + Ē_close
# Step 3: Determine the state S_e using T_a,e.
14: else if (i > L) and (E_i ≤ T_a,e) then
15:     j ← j + 1                       # j: closed-eye buffer counter
16:     k ← 0
17:     if (j ≥ M) then
18:         S_e ← Close
19:     else
20:         S_e ← S_e
21: else if (i > L) and (E_i > T_a,e) then
22:     k ← k + 1                       # k: open-eye buffer counter
23:     j ← 0
24:     if (k ≥ M) then
25:         S_e ← Open
26:     else
27:         S_e ← S_e
28: end if
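For reference, a plain Python rendering of the threshold derivation in Steps 1 and 2 of Algorithm 1 might look as follows. The function name and the fallback slice (10th to 20th smallest EAR values) follow the description in this section, but the snippet is a sketch rather than the authors' code and assumes the calibration window holds enough samples.

import numpy as np

def adaptive_ear_threshold(ear_window, initial_threshold, alpha_e=0.85):
    """Compute the personalized blink threshold T_a,e from a 60 s calibration window of EAR values."""
    ear_window = np.asarray(ear_window, dtype=float)
    e_mean = ear_window.mean()                               # mean EAR over the window
    e_close = ear_window[ear_window <= initial_threshold]    # candidate closed-eye samples

    # IQR filtering of the closed-eye candidates.
    if e_close.size > 0:
        q1, q3 = np.percentile(e_close, [25, 75])
        iqr = q3 - q1
        filtered = e_close[(e_close >= q1 - 1.5 * iqr) & (e_close <= q3 + 1.5 * iqr)]
    else:
        filtered = e_close

    if filtered.size > 0:
        e_close_mean = filtered.mean()
    else:
        # Fallback: mean of the 10th to 20th smallest EAR values, skipping extreme outliers.
        ordered = np.sort(ear_window)
        e_close_mean = ordered[9:20].mean()

    return (e_mean - e_close_mean) * alpha_e + e_close_mean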

2.4. Data Preprocessing

There are significant performance differences among drivers in a drowsy state; these individual variations present a major challenge for drowsiness detection research [65,66,67]. To minimize the impact of these differences in feature values across participants and preserve the value changes in the same driver in both the alert and drowsy states, we introduce an improvement to the eye feature standardization method proposed by Ghoddoosian et al. [76], extending its applicability to various facial region features. First, half of the samples from the non-Boolean features in the alert state are used to calculate the mean and standard deviation of these features. Then, the remaining non-Boolean features from the alert state, along with the non-Boolean features of the same participant in the other two states, are standardized using the calculated mean and standard deviation, as shown in Equation (17):
$$\bar{F}_{n,m} = \frac{F_{n,m} - \mu_{n,m}}{\sigma_{n,m}}$$
where $\bar{F}_{n,m}$ represents the standardized feature value, $F_{n,m}$ represents the original feature value, and $\mu_{n,m}$ and $\sigma_{n,m}$ represent the mean and standard deviation of the non-Boolean feature $n$ collected in the alert state of participant $m$, respectively.
In this study, to prevent data leakage, the data used to calculate the mean and standard deviation of the features were not included in the model training or testing phases. During actual deployment, adaptive threshold correction can be performed in the initial minutes after a new driver starts driving and remains in an alert state. The data collected during this period are used to calculate the means and standard deviations. Once the adaptive thresholds, means, and standard deviations have been determined in this correction window, the system proceeds with feature extraction and data preprocessing for subsequent analysis.
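A minimal sketch of this preprocessing step, assuming the features are held in a pandas DataFrame with participant and label columns, is given below; the column names, the random split of alert samples, and the zero-variance guard are illustrative assumptions.

import numpy as np
import pandas as pd

def standardize_per_participant(df, feature_cols, participant_col="participant",
                                label_col="label", alert_label="alert", seed=0):
    """Fit mu/sigma on half of each participant's alert samples (Eq. 17) and apply them
    to that participant's remaining samples; the fitting samples are dropped afterwards."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    fit_rows = []
    for pid, grp in df.groupby(participant_col):
        alert = grp[grp[label_col] == alert_label]
        fit_idx = rng.choice(alert.index, size=len(alert) // 2, replace=False)
        fit_rows.extend(fit_idx)
        mu = df.loc[fit_idx, feature_cols].mean()
        sigma = df.loc[fit_idx, feature_cols].std().replace(0, 1.0)   # guard against zero variance
        mask = out[participant_col] == pid
        out.loc[mask, feature_cols] = (out.loc[mask, feature_cols] - mu) / sigma
    # Samples used for fitting are excluded from training/testing to avoid data leakage.
    return out.drop(index=fit_rows)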

2.5. Classifiers

Currently, drowsiness detection research using physiological signals often relies on machine learning models for classification. However, research based on facial signals from drivers is relatively scarce, and there are no studies that have combined a large number of facial signal features with ensemble learning for classification. Ensemble models are typically divided into two types: hard voting and soft voting. Hard voting uses the predictions from each classifier and provides the final result based on the majority output, while soft voting weights and sums the predictions from each classifier to make the final decision.
In this study, the soft voting method was used to combine three machine learning models—random forest (RF), a multilayer perceptron (MLP), and XGBoost—to leverage the strengths of each model and compensate for the weaknesses of any single model. While many machine learning models show strong performance, we specifically chose these three after careful consideration. Random forest, a model based on decision trees [85], is effective in handling high-dimensional data, capturing feature relationships, evaluating the feature importance, and reducing overfitting through random sampling. A multilayer perceptron (MLP) is a feedforward neural network [86] that uses multiple layers of neurons and non-linear activation functions to map features to labels. Its simple and flexible structure makes it suitable for large-scale data, although it carries a risk of overfitting. XGBoost, based on gradient boosting [87], enhances the prediction performance through the construction and optimization of multiple decision trees and the employment of regularization and parallel algorithms to combat overfitting. It is particularly strong in modeling non-linear interactions.
Together, these three models complement each other in areas such as feature extraction, model interpretability, and generalization performance: the MLP learns feature interaction patterns that differ from those of tree models; random forest provides stability and reliability in noise processing and feature importance evaluation; and XGBoost enables the fitting of complex decision boundaries. Figure 7 illustrates the architecture of the ensemble model.
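A minimal scikit-learn sketch of this soft-voting ensemble is shown below; the hyperparameters and the scaling pipeline for the MLP are placeholders rather than the tuned settings used in the experiments.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=300, random_state=0)
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))
xgb = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    objective="multi:softprob", eval_metric="mlogloss", random_state=0)

# Soft voting averages the predicted class probabilities of the three base models.
ensemble = VotingClassifier(estimators=[("rf", rf), ("mlp", mlp), ("xgb", xgb)], voting="soft")
# ensemble.fit(X_train, y_train); y_pred = ensemble.predict(X_test)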

2.6. Shapley Additive Explanations (SHAP)

Model interpretability offers valuable insights for related research and can streamline the development process. Shapley Additive Explanations (SHAP), a model interpretation method based on Shapley values from game theory, works by retraining the model on different subsets of features and assigning importance to each feature. This allows for an assessment of their contributions to the final prediction. The Shapley value for each feature is defined as shown in Equation (18):
$$\varphi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \big(\nu(S \cup \{j\}) - \nu(S)\big)$$
where $S \subseteq N \setminus \{j\}$ represents all subsets that do not contain feature $j$. Calculating the difference $\nu(S \cup \{j\}) - \nu(S)$, we obtain the contribution of feature $j$ given the subset $S$. Then, we take the weighted average of this contribution over all possible subsets to obtain the overall contribution of feature $j$.
While SHAP can be used with various machine learning models, XGBoost offers distinct advantages in interpretability compared to the MLP, which has an opaque decision-making process, and random forest, which can exhibit fluctuations when aggregating interpretations. XGBoost’s node-splitting method and incremental training mechanism provide clearer insights into the model’s decision-making process. As a result, we selected XGBoost specifically for interpretation and analysis.
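In practice, this analysis can be reproduced with the shap library's tree explainer, as sketched below; xgb_model and X_test stand for the fitted XGBoost classifier and the test feature matrix and are assumptions of this example.

import shap

explainer = shap.TreeExplainer(xgb_model)          # xgb_model: a fitted XGBClassifier
shap_values = explainer.shap_values(X_test)        # per-class SHAP values for the multiclass model
shap.summary_plot(shap_values, X_test, plot_type="bar")   # mean |SHAP| per feature, as in Figure 11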

3. Results and Discussion

In this section, to validate the proposed method, we present the performance of different machine learning models and ensemble modeling techniques in driver drowsiness detection, using various combinations of facial features. The results are compared, interpreted, and analyzed. The evaluation metrics used in this study include the accuracy, precision, and recall rate. In the participant-independent validation, we also include the metric used for UTA-RLDD [76], namely the video accuracy (VA), which represents the percentage of correctly classified complete videos out of the total number of complete videos, with the video clips treated as independent samples.
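For clarity, the sketch below shows one way to aggregate clip-level predictions into the VA metric. The majority-vote aggregation and the variable names are assumptions for illustration, since the text only specifies that complete videos are scored from their clips.

from collections import Counter

def video_accuracy(clip_preds, clip_video_ids, video_labels):
    """clip_preds: predicted label per clip; clip_video_ids: source video id of each clip;
    video_labels: ground-truth label per video id."""
    per_video = {}
    for pred, vid in zip(clip_preds, clip_video_ids):
        per_video.setdefault(vid, []).append(pred)
    correct = sum(Counter(preds).most_common(1)[0][0] == video_labels[vid]
                  for vid, preds in per_video.items())
    return correct / len(per_video)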

3.1. Performance Comparison of Different Models

In this study, we employed both participant-dependent and participant-independent validation methods to compare the performance of various machine learning models trained on comprehensive facial feature data.
The participant-dependent validation methods included the holdout, K-fold, and stratified K-fold validation techniques. In the holdout validation, the dataset was split into training and test sets at ratios of 70:30 and 80:20. For the K-fold and stratified K-fold cross-validation methods, we divided the dataset into k subsets, using k−1 subsets for training and the remaining subset for validation. The stratified K-fold method additionally ensures an even distribution of labels across the training and test sets.
Table 2 presents the performance of the different machine learning models under the participant-dependent validation methods. Figure 8 presents a bar chart depicting the performance of the different machine learning models under the participant-dependent validation methods. The results demonstrate that the ensemble model outperformed the others, effectively leveraging the strengths of each individual model while overcoming the limitations of any single model. The accuracy, precision, and recall values for each model were closely aligned, with particularly consistent results obtained for the K-fold cross-validation and stratified K-fold validation methods. This indicates that the models performed consistently across different categories in the classification task, which can be attributed to the balanced distribution of samples across labels in the dataset.
The participant-independent validation method used in this study was leave-one-participant-out cross-validation (LOPO CV), which we used to generate 60 groups corresponding to the 60 drivers in the dataset. In each iteration, the data of one driver were used as the test set, while the data from the remaining 59 drivers served as the training set.
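This protocol corresponds to scikit-learn's LeaveOneGroupOut splitter, as sketched below; ensemble, X, y, and driver_ids denote the classifier and the feature matrix, labels, and per-sample driver identifiers, all assumed for illustration.

from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

logo = LeaveOneGroupOut()          # 60 folds, one per driver in UTA-RLDD
scores = cross_val_score(ensemble, X, y, groups=driver_ids, cv=logo, scoring="accuracy")
print(scores.mean(), scores.min(), scores.max())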
Table 3 presents the performance of various machine learning models under the participant-independent validation method. Figure 9 presents a bar chart depicting the performance of the different machine learning models under the participant-independent validation method. The results reveal that the ensemble model outperformed the others, with accuracy ranging from 69.38% to 88.32%, precision between 68.88% and 88.67%, and a recall rate ranging from 64.09% to 86.77%. XGBoost ranked second, with accuracy between 68.38% and 88.38%, precision from 67.37% to 88.65%, and a recall rate between 63.00% and 86.54%.
Compared to the participant-dependent validation method, the participant-independent validation method showed significant variations and more pronounced fluctuations in the accuracy, precision, and recall rates. This discrepancy is due to the fact that these evaluation metrics primarily reflect the classification ability of the model for individual samples. The participant-independent validation method, which simulates real-world deployment where the model encounters completely unfamiliar drivers, introduces more variability. In real-world scenarios, a driver’s drowsiness state is not consistently extreme at every sampling point, making it more challenging to classify individual instances accurately.
The VA, obtained by aggregating the results of video clips, helps to mitigate this issue and accounts for the cumulative effect of time; for this reason, this metric’s values were higher than the individual accuracy values. The ensemble model achieved a VA of 86.52%, while that of XGBoost reached 85.36%. This suggests that, while the ensemble model shows only slight advantages in the classification of individual samples, its more reliable performance over time, considering the cumulative nature of drowsiness, makes it a more effective model overall.
Figure 10 presents the confusion matrices for the classification of individual video clips using the different machine learning models under the participant-independent validation method. The results show that the proposed method achieved a high accuracy of 95.59% in distinguishing between the “alert state” and the “drowsy state”, with almost no misclassification of the “severely drowsy state” as the “alert state” and vice versa. This indicates that the method can reliably identify when a driver enters a drowsy state, effectively preventing accidents. However, distinguishing between “mild drowsiness” and “severe drowsiness” remains challenging, as the model tends to confuse these two states. This is due to the transitional nature and the lack of a clear boundary between mild and severe drowsiness, with many drivers showing similar behaviors in both states. To address this issue, additional feature design and further data collection are necessary.

3.2. Performance Comparison of Different Feature Combinations

To explore the contributions of different facial region features to the ensemble model’s output, we evaluated various combinations of facial features. Table 4 presents the performance of the ensemble model trained with different facial feature combinations. The results show that excluding any facial feature reduces the model’s performance, with the model using all facial features achieving the best results.
When using only a single facial region feature, the eye region features performed the best, with accuracy ranging from 59.32% to 81.70%, precision between 56.37% and 82.43%, recall from 52.97% to 78.51%, and VA of 74.16%. This highlights the dominant role of eye region features in driver drowsiness detection, aligning with established research in the field. However, some features are more effective when combined with others to form feature subsets. For instance, when used alone, the mouth contour or gaze direction features perform poorly. Meanwhile, when these are combined with other facial region features, the model shows a significant improvement. When using only mouth contour features, accuracy ranges from 29.27% to 52.93%, precision is between 67.11% and 84.20%, recall ranges from 12.55% to 40.19%, and VA is 38.20%. For gaze direction features, accuracy ranges from 32.05% to 54.63%, precision is between 31.95% and 57.21%, recall ranges from 26.42% to 49.84%, and VA is 43.82%. When combining both, accuracy improves to a range from 35.42% to 59.94%, precision is between 35.03% and 61.51%, recall ranges from 30.94% to 52.42%, and VA is 50.56%, indicating an overall improvement of more than 3% across all metrics, with VA increasing by nearly 7%. Combining eye region features with gaze direction features results in a VA of 78.09%, an improvement of 3.93% over using eye region features alone. Similarly, combining eye region features with mouth contour features achieves a VA of 79.78%, representing a 5.62% increase. When utilizing all facial region features, the model reaches a VA of 86.52%, demonstrating the best overall performance.
This can be attributed to the inter-individual differences among drivers, as drowsiness may manifest differently across different facial regions. For example, some drivers may exhibit a wandering gaze and yawning in the drowsy state, while others may not show such signs. As a result, models relying solely on mouth contour or gaze direction features may struggle to accurately classify the drowsiness states of these drivers. Therefore, utilizing a comprehensive set of facial features allows the model to better generalize across a diverse range of drivers, improving its overall performance. Table 4 shows the performance of the ensemble models trained with data from various combinations of facial region features and tested under participant-independent validation, where the reported numbers are the mean ± standard deviation, and the feature types are (1) eye region features, (2) mouth contour features, (3) head pose features, and (4) gaze direction features.

3.3. Comparison with Similar Techniques

In this study, we also compared the proposed method with five similar drowsiness detection methods developed in recent years. Ghoddoosian et al. [76] extracted four features from the eye region using facial key points and fed these features into a hierarchical multiscale LSTM network (HM-LSTM [88]) for drowsiness detection. Chengula et al. [60] classified driver drowsiness states by integrating three deep learning models. Liu et al. [89] used a multi-task cascaded convolutional neural network (MTCNN) and MobileNet [90] for feature extraction, followed by an LSTM network [91] for classification. Pandey and Muppalaneni [92] applied a convolutional neural network for facial region feature extraction and then used an LSTM network for classification. Mittal et al. [93] extracted mouth contour and eye region features through facial key points and applied various machine learning models, with logistic regression showing the best performance. Magán et al. [94] utilized deep learning to extract features from image frames and then input them into a fuzzy inference system for classification. Table 5 compares the VA of different methods on the UTA-RLDD dataset.
Ghoddoosian et al. [76] focused on a single facial region and neglected others, leading to limited performance. Liu et al. [89] and Magán et al. [94] used deep neural network models for feature extraction, enabling them to capture more detailed information than when using handcrafted features. However, the UTA-RLDD dataset contains many microexpressions that may not contribute significantly to drowsiness state classification, resulting in redundancy and reduced model performance. Chengula et al. [60] integrated multiple deep learning models, enhancing the generalization ability of the method; however, it may suffer from performance limitations due to redundant feature extraction. Pandey and Muppalaneni [92] and Mittal et al. [93] incorporated temporal features, improving their models’ performance. Among all methods compared, the approach proposed in this study achieved the best results. This success is attributed to the use of rich, diverse facial region features and the ensemble model, which effectively addresses inter-individual differences among drivers. Furthermore, our method relies solely on machine learning models, offering real-time performance advantages over deep learning or graph neural network methods, making it more suitable for practical deployment and promotion. Table 5 shows the comparison of the proposed ensemble model with other methods on the UTA-RLDD dataset in terms of the VA.

3.4. SHAP Analysis

In this study, we performed a SHAP analysis on the dataset using XGBoost, resulting in a feature importance graph (Figure 11). To calculate the feature importance, we computed the mean of the absolute Shapley values for each feature across all samples and sorted them in descending order. As shown in Figure 11, the most important features identified were the average eye aspect ratio $EAR_{AVG}$, the average value of the head pitch angle PA, and the eye closure duration MEC. Notably, the eye closure duration MEC is a commonly used feature in other studies, while the average eye aspect ratio $EAR_{AVG}$ and the average head pitch angle PA are novel contributions of this study. These findings further validate the effectiveness of the features designed in this work.
Among the facial region features, the eye region features play the most significant role in the model's decision-making process. The results suggest that incorporating the average blink duration BD within a statistical time window, along with the mean, minimum, and maximum values of the eye movement amplitude Amp and eye opening velocity EOV, as model inputs allows for the effective detection of driver drowsiness. This is primarily attributed to the adaptive threshold correction and standardization processes. These steps not only facilitate the accurate collection of various facial region features but also help to mitigate numerical differences, preserving individual changes in the driver's state.
The mouth contour features contribute significantly to distinguishing between the alert and severely drowsy states, suggesting that yawning-related features effectively separate drowsy from non-drowsy states; however, they are of limited value in distinguishing mild from severe drowsiness. Among the head pose features, the duration for which the head is in a downward position (HD) contributes relatively little, as not all drivers exhibit prolonged head-down behavior when drowsy. In contrast, features such as the average head pitch angle (PA) and the number of nods (NS) play a more significant role, as they generalize better across different driver groups. The head activity (HA) mainly helps to identify the alert state; because head movement varies little across the different drowsiness levels, its contribution is small. The inactive state feature (IA) contributes the least, as the conditions defining this state are overly restrictive and facial features still show noticeable changes even in a drowsy state.
The gaze direction features contribute consistently across all classes, with the center direction count (CDC), which describes how frequently the driver observes the surroundings, having the largest impact. This suggests that further design and collection of gaze direction features could compensate for shortcomings in the other facial region features and potentially improve the overall drowsiness detection accuracy.

4. Conclusions

Drowsiness detection based on facial signals offers a cost-effective, reliable, and scalable solution, although inter-individual differences among drivers create challenges in its application. To overcome this, we propose a driver drowsiness detection method that integrates diverse facial region features and classifies them with an ensemble model. The approach includes an adaptive threshold correction technique for accurate feature extraction across individuals and incorporates 20 features from four facial regions (the eye region, mouth contour, head pose, and gaze direction), ensuring comprehensive coverage of the facial characteristics. An improved participant-independent standardization method was applied for feature preprocessing, and the ensemble model was employed to classify the drivers' drowsiness states.
Extensive experiments on the UTA-RLDD dataset were performed to validate the proposed method, with participant-independent validation yielding an accuracy of 69.38–88.32%, a precision of 68.88–88.67%, a recall of 64.09–86.77%, and a VA of 86.52%, enabling the overall drowsiness level of a video clip to be determined effectively. The ensemble model, which combines the strengths of Random Forest, an MLP, and XGBoost, overcame the limitations of the individual models and achieved superior performance compared with similar approaches introduced in recent years.
Further analysis through an ablation experiment showed that the features from the eye region performed best when used alone, yielding accuracy of 59.32–81.70% and VA of 74.16%. The integration of multiple features demonstrated complementary advantages, enabling us to successfully address inter-individual differences in drivers. The SHAP analysis confirmed that specific features, such as the average eye aspect ratio and head pitch angle, significantly contributed to model classification, supporting the proposed method’s effectiveness. However, some features showed minor contributions, suggesting areas for future enhancement in feature design to further improve the performance.
Because the data used in this study were derived from a simulated driving scenario, the proposed method still has limitations. Future work will focus on collecting data from real driving scenarios and optimizing the feature extraction methods based on larger datasets to enhance the detection of drowsiness, ultimately helping to prevent accidents caused by drowsiness while driving.

Author Contributions

Conceptualization, W.H. and C.X.; methodology, W.H.; software, W.H.; validation, W.H.; formal analysis, W.H.; investigation, W.H.; resources, C.X.; data curation, W.H., J.L. and L.L.; writing—original draft preparation, W.H.; writing—review and editing, W.H. and C.X.; visualization, W.H.; supervision, C.X.; project administration, C.X.; funding acquisition, C.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in this study are openly available in the UTA-RLDD dataset at https://sites.google.com/view/utarldd/home (accessed on 12 February 2025).

Acknowledgments

The authors wish to acknowledge the School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, China, for providing the laboratory facilities and support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Perkins, E.; Sitaula, C.; Burke, M.; Marzbanrad, F. Challenges of driver drowsiness prediction: The remaining steps to implementation. IEEE Trans. Intell. Veh. 2023, 8, 1319–1338. [Google Scholar]
  2. Australian Transport Council. National Road Safety Strategy 2011–2020. 2011. Available online: https://www.roadsafety.gov.au/nrss/2011-20 (accessed on 12 February 2025).
  3. Sikander, G.; Anwar, S. Driver Fatigue Detection Systems: A Review. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2339–2352. [Google Scholar]
  4. Wang, Q.; Yang, J.; Ren, M.; Zheng, Y. Driver fatigue detection: A survey. In Proceedings of the 2006 6th World Congress on Intelligent Control and Automation, Dalian, China, 21–23 June 2006. [Google Scholar]
  5. Bekhouche, S.E.; Ruichek, Y.; Dornaika, F. Driver drowsiness detection in video sequences using hybrid selection of deep features. Knowl.-Based Syst. 2022, 252, 109436. [Google Scholar]
  6. Chai, M. Drowsiness monitoring based on steering wheel status. Transp. Res. Part D Transp. Environ. 2019, 66, 95–103. [Google Scholar] [CrossRef]
  7. Li, R.; Chen, Y.V.; Zhang, L. A method for fatigue detection based on driver’s steering wheel grip. Int. J. Ind. Ergon. 2021, 82, 103083. [Google Scholar]
  8. Hasanuddin, M.O.; Novianingrum, H.; Syamsuddin, E.Y. Design and implementation of drowsiness detection system based on standard deviation of lateral position. In Proceedings of the 2022 12th International Conference on System Engineering and Technology (ICSET), Bandung, Indonesia, 3–4 October 2022. [Google Scholar]
  9. Hu, J.; Xu, L.; He, X.; Meng, W. Abnormal driving detection based on normalized driving behavior. IEEE Trans. Veh. Technol. 2017, 66, 6645–6652. [Google Scholar]
  10. Latreche, I.; Slatnia, S.; Kazar, O.; Harous, S. An optimized deep hybrid learning for multi-channel EEG-based driver drowsiness detection. Biomed. Signal Process. Control 2025, 99, 106881. [Google Scholar]
  11. Lin, X.; Huang, Z.; Ma, W.; Tang, W. EEG-based driver drowsiness detection based on simulated driving environment. Neurocomputing 2025, 616, 128961. [Google Scholar]
  12. Lins, I.D.; Araújo, L.M.M.; Maior, C.B.S.; da Silva Ramos, P.M.; das Chagas Moura, M.J.; Ferreira-Martins, A.J.; Chaves, R.; Canabarro, A. Quantum machine learning for drowsiness detection with EEG signals. Process Saf. Environ. Prot. 2024, 186, 1197–1213. [Google Scholar] [CrossRef]
  13. Freitas, A.; Almeida, R.; Gonçalves, H.; Conceição, G.; Freitas, A. Monitoring fatigue and drowsiness in motor vehicle occupants using electrocardiogram and heart rate—A systematic review. Transp. Res. Part F Traffic Psychol. Behav. 2024, 103, 586–607. [Google Scholar]
  14. Fouad, I.A. A robust and efficient EEG-based drowsiness detection system using different machine learning algorithms. Ain Shams Eng. J. 2023, 14, 101895. [Google Scholar]
  15. Oliveira, L.; Cardoso, J.S.; Lourenço, A.; Ahlström, C. Driver drowsiness detection: A comparison between intrusive and non-intrusive signal acquisition methods. In Proceedings of the 2018 7th European Workshop on Visual Information Processing (EUVIP), Tampere, Finland, 26–28 November 2018. [Google Scholar]
  16. Zhang, Z.; Ning, H.; Zhou, F. A systematic survey of driving fatigue monitoring. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19999–20020. [Google Scholar]
  17. Cori, J.M.; Turner, S.; Westlake, J.; Naqvi, A.; Ftouni, S.; Wilkinson, V.; Vakulin, A.; O’Donoghue, F.J.; Howard, M.E. Eye blink parameters to indicate drowsiness during naturalistic driving in participants with obstructive sleep apnea: A pilot study. Sleep Health 2021, 7, 644–651. [Google Scholar] [PubMed]
  18. Hollander, J.; Huette, S. Extracting blinks from continuous eye-tracking data in a mind wandering paradigm. Conscious. Cogn. 2022, 100, 103303. [Google Scholar]
  19. Patel, P.P.; Pavesha, C.L.; Sabat, S.S.; More, S.S. Deep Learning based Driver Drowsiness Detection. In Proceedings of the 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 9–11 May 2022; pp. 380–386. [Google Scholar]
  20. Cori, J.M.; Wilkinson, V.E.; Jackson, M.; Westlake, J.; Stevens, B.; Barnes, M.; Swann, P.; Howard, M.E. The Impact of Alcohol Consumption on Commercial Eye Blink Drowsiness Detection Technology. Hum. Psychopharmacol. 2023, 38, e2870. [Google Scholar]
  21. Schmidt, J.; Laarousi, R.; Stolzmann, W.; Karrer-Gauß, K. Eye blink detection for different driver states in conditionally automated driving and manual driving using EOG and a driver camera. Behav. Res. 2018, 50, 1088–1101. [Google Scholar]
  22. Ramzan, M.; Khan, H.U.; Awan, S.M.; Ismail, A.; Ilyas, M.; Mahmood, A. A survey on state-of-the-art drowsiness detection techniques. IEEE Access 2019, 7, 61904–61919. [Google Scholar]
  23. Seeck, M.; Koessler, L.; Bast, T.; Leijten, F.; Michel, C.; Baumgartner, C.; He, B.; Beniczky, S. The Standardized EEG Electrode Array of the IFCN. Clin. Neurophysiol. 2017, 128, 2070–2077. [Google Scholar]
  24. Jiao, Y.; Deng, Y.; Luo, Y.; Lu, B.L. Driver Sleepiness Detection from EEG and EOG Signals Using GAN and LSTM Networks. Neurocomputing 2020, 408, 100–111. [Google Scholar]
  25. Yang, G.; Lin, Y.; Bhattacharya, P. A Driver Fatigue Recognition Model Based on Information Fusion and Dynamic Bayesian Network. Inf. Sci. 2010, 180, 1942–1954. [Google Scholar]
  26. Leicht, L.; Ruder, H.; Müller, C.; Suter, R.; Iff, I.; Moser, R.; Stettler, A. Capacitive ECG Monitoring in Cardiac Patients During Simulated Driving. IEEE Trans. Biomed. Eng. 2018, 66, 749–758. [Google Scholar] [CrossRef] [PubMed]
  27. Walter, M.; Huebner, W.; Kautz, S.; Behrens, M.; Henn, J.; Heuser, L.; Pfeiffer, U.; Hentschel, B.; Reiss, M. The Smart Car Seat: Personalized Monitoring of Vital Signs in Automotive Applications. Pers. Ubiquitous Comput. 2011, 15, 707–715. [Google Scholar] [CrossRef]
  28. Sun, Y.; Yu, X. An Innovative Nonintrusive Driver Assistance System for Vital Signal Monitoring. IEEE J. Biomed. Health Inform. 2014, 18, 1932–1939. [Google Scholar] [CrossRef] [PubMed]
  29. Arefnezhad, S.; Samiee, S.; Eichberger, A.; Nahvi, A. Driver Drowsiness Detection Based on Steering Wheel Data Applying Adaptive Neuro-Fuzzy Feature Selection. Sensors 2019, 19, 943. [Google Scholar] [CrossRef] [PubMed]
  30. Sun, Z.; Zhang, H.; Zhang, Y.; Zhang, W. Facial feature fusion convolutional neural network for driver fatigue detection. Eng. Appl. Artif. Intell. 2023, 126, 106981. [Google Scholar] [CrossRef]
  31. Rajamohana, S.P.; Radhika, E.G.; Priya, S.; Sangeetha, S. Driver drowsiness detection system using hybrid approach of convolutional neural network and bidirectional long short term memory (CNN_BILSTM). Mater. Today Proc. 2021, 45 Pt 2, 2897–2901. [Google Scholar] [CrossRef]
  32. El-Nabi, S.A.; El-Shafai, W.; El-Rabaie, E.S.; Ramadan, K.F.; El-Samie, F.E.A.; Mohsen, S. Machine learning and deep learning techniques for driver fatigue and drowsiness detection: A review. Multimed. Tools Appl. 2024, 83, 9441–9477. [Google Scholar] [CrossRef]
  33. Sahayadhas, A.; Sundaraj, K.; Murugappan, M. Detecting driver drowsiness based on sensors: A review. Sensors 2012, 12, 16937–16953. [Google Scholar] [CrossRef]
  34. Guo, W.; Zhang, B.; Xia, L.; Shi, S.; Zhang, X.; She, J. Driver drowsiness detection model identification with Bayesian network structure learning method. In Proceedings of the 2016 Chinese Control and Decision Conference (CCDC), Yinchuan, China, 28–30 May 2016; pp. 131–136. [Google Scholar]
  35. Wang, X.; Xu, C. Driver Drowsiness Detection Based on Non-Intrusive Metrics Considering Individual Specifics. Accid. Anal. Prev. 2016, 95, 350–357. [Google Scholar] [CrossRef]
  36. Hailin, W.; Hanhui, L.; Zhumei, S. Fatigue driving detection system design based on driving behavior. In Proceedings of the 2010 International Conference on Optoelectronics and Image Processing, Haikou, China, 11–12 November 2010; pp. 549–552. [Google Scholar]
  37. Takei, Y.; Furukawa, Y. Estimate of driver’s fatigue through steering motion. In Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, 12 October 2005; Volume 2, pp. 1765–1770. [Google Scholar]
  38. Dehzangi, O.; Masilamani, S. Unobtrusive driver drowsiness prediction using driving behavior from vehicular sensors. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3598–3603. [Google Scholar]
  39. McDonald, A.D.; Lee, J.D.; Schwarz, C.; Brown, T.L. A contextual and temporal algorithm for driver drowsiness detection. Accid. Anal. Prev. 2018, 113, 25–37. [Google Scholar] [CrossRef]
  40. Zhang, X.; Wang, X.; Yang, X.; Xu, C.; Zhu, X.; Wei, J. Driver drowsiness detection using mixed-effect ordered logit model considering time cumulative effect. Anal. Methods Accid. Res. 2020, 26, 100114. [Google Scholar]
  41. Lu, K.; Sjörs Dahlman, A.; Karlsson, J.; Candefjord, S. Detecting driver fatigue using heart rate variability: A systematic review. Accid. Anal. Prev. 2022, 178, 106830. [Google Scholar] [CrossRef]
  42. Sun, Y.; Wang, R.; Zhang, H.; Ding, N.; Ferreira, S.; Shi, X. Driving fingerprinting enhances drowsy driving detection: Tailoring to individual driver characteristics. Accid. Anal. Prev. 2024, 208, 107812. [Google Scholar] [PubMed]
  43. Owen, V.; Surantha, N. Computer vision-based drowsiness detection using handcrafted feature extraction for edge computing devices. Appl. Sci. 2025, 15, 638. [Google Scholar] [CrossRef]
  44. Makhmudov, F.; Turimov, D.; Xamidov, M.; Nazarov, F.; Cho, Y.-I. Real-time fatigue detection algorithms using machine learning for yawning and eye state. Sensors 2024, 24, 7810. [Google Scholar] [CrossRef]
  45. Liu, W.; Qian, J.; Yao, Z.; Jiao, X.; Pan, J. Convolutional two-stream network using multi-facial feature fusion for driver fatigue detection. Future Internet 2019, 11, 115. [Google Scholar] [CrossRef]
  46. Jung, H.-S.; Shin, A.; Chung, W.-Y. Driver fatigue and drowsiness monitoring system with embedded electrocardiogram sensor on steering wheel. IET Intell. Transp. Syst. 2014, 8, 43–50. [Google Scholar] [CrossRef]
  47. Sahayadhas, A.; Sundaraj, K.; Murugappan, M.; Palaniappan, R. A physiological measures-based method for detecting inattention in drivers using machine learning approach. Biocybern. Biomed. Eng. 2015, 35, 198–205. [Google Scholar]
  48. Lohani, M.; Payne, B.R.; Strayer, D.L. A review of psychophysiological measures to assess cognitive states in real-world driving. Front. Hum. Neurosci. 2019, 13, 57. [Google Scholar] [CrossRef]
  49. Cheng, B.; Zhang, W.; Lin, Y.; Feng, R.; Zhang, X. Driver drowsiness detection based on multisource information. Hum. Factors Ergon. Manuf. Serv. Ind. 2012, 22, 450–467. [Google Scholar] [CrossRef]
  50. Ren, Z.; Li, R.; Chen, B.; Zhang, H.; Ma, Y.; Wang, C.; Lin, Y.; Zhang, Y. EEG-based driving fatigue detection using a two-level learning hierarchy radial basis function. Front. Neurorobotics 2021, 15, 618408. [Google Scholar]
  51. Moujahid, A.; Dornaika, F.; Arganda-Carreras, I.; Reta, J. Efficient and compact face descriptor for driver drowsiness detection. Expert Syst. Appl. 2021, 168, 114334. [Google Scholar] [CrossRef]
  52. Ghourabi, A.; Ghazouani, H.; Barhoumi, W. Driver drowsiness detection based on joint monitoring of yawning, blinking, and nodding. In Proceedings of the 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 3–5 September 2020; pp. 407–414. [Google Scholar]
  53. Ahammed Dipu, M.T.; Hossain, S.S.; Arafat, Y.; Rafiq, F.B. Real-time driver drowsiness detection using deep learning. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 237213082. [Google Scholar]
  54. Quddus, A.; Zandi, A.S.; Prest, L.; Comeau, F.J.E. Using long short term memory and convolutional neural networks for driver drowsiness detection. Accid. Anal. Prev. 2021, 156, 106107. [Google Scholar]
  55. AL-Quraishi, M.S.; Ali, S.S.A.; AL-Qurishi, M.; Tang, T.B.; Elferik, S. Technologies for detecting and monitoring drivers’ states: A systematic review. Heliyon 2024, 10, e39592. [Google Scholar] [PubMed]
  56. Bakheet, S.; Al-Hamadi, A. A framework for instantaneous driver drowsiness detection based on improved HOG features and Naïve Bayesian classification. Brain Sci. 2021, 11, 240. [Google Scholar] [CrossRef]
  57. Xu, J.; Pan, S.; Sun, P.Z.H.; Park, S.H.; Guo, K. Human-Factors-in-Driving-Loop: Driver identification and verification via a deep learning approach using psychological behavioral data. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3383–3394. [Google Scholar]
  58. Teyeb, I.; Jemai, O.; Zaied, M.; Ben Amar, C. A drowsy driver detection system based on a new method of head posture estimation. In Intelligent Data Engineering and Automated Learning—IDEAL 2014; Corchado, E., Lozano, J.A., Quintián, H., Yin, H., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; Volume 8669, pp. 410–418. [Google Scholar]
  59. Alioua, N.; Amine, A.; Rziza, M. Driver’s fatigue detection based on yawning extraction. Int. J. Veh. Technol. 2014, 2014, 678786. [Google Scholar] [CrossRef]
  60. Chengula, T.J.; Mwakalonge, J.; Comert, G.; Siuhi, S. Improving road safety with ensemble learning: Detecting driver anomalies using vehicle inbuilt cameras. Mach. Learn. Appl. 2023, 14, 100510. [Google Scholar] [CrossRef]
  61. Bamidele, A.A.; Kamardin, K.; Syazarin, N.; Mohd, S.; Shafi, I.; Azizan, A.; Aini, N.; Mad, H. Non-intrusive driver drowsiness detection based on face and eye tracking. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 199521742. [Google Scholar]
  62. Han, W.; Yang, Y.; Huang, G.; Sourina, O.; Klanner, F.; Denk, C. Driver drowsiness detection based on novel eye openness recognition method and unsupervised feature learning. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; pp. 1470–1514. [Google Scholar]
  63. Dasgupta, A.; George, A.; Happy, S.L.; Routray, A. A vision-based system for monitoring the loss of attention in automotive drivers. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1825–1838. [Google Scholar]
  64. Chowdhury, A.; Shankaran, R.; Kavakli, M.; Haque, M.M. Sensor applications and physiological features in drivers’ drowsiness detection: A review. IEEE Sens. J. 2018, 18, 3055–3067. [Google Scholar]
  65. Albadawi, Y.; AlRedhaei, A.; Takruri, M. Real-time machine learning-based driver drowsiness detection using visual features. J. Imaging 2023, 9, 91. [Google Scholar] [CrossRef] [PubMed]
  66. Ingre, M.; Akerstedt, T.; Peters, B.; Anund, A.; Kecklund, G. Subjective sleepiness, simulated driving performance and blink duration: Examining individual differences. J. Sleep Res. 2006, 15, 47–53. [Google Scholar] [CrossRef] [PubMed]
  67. Essel, E.; Lacy, F.; Elmedany, W.; Albalooshi, F.; Ismail, Y. Driver drowsiness detection using fixed and dynamic thresholding. In Proceedings of the 2022 International Conference on Data Analytics for Business and Industry (ICDABI), Sakhir, Bahrain, 25–26 October 2022; pp. 552–557. [Google Scholar]
  68. Sun, Y.; Wu, C.; Zhang, H.; Chu, W.; Xiao, Y.; Zhang, Y. Effects of individual differences on measurements’ drowsiness-detection performance. Promet-Traffic Transp. 2021, 33, 565–578. [Google Scholar]
  69. Chen, S.; Wang, Z.; Chen, W. Driver Drowsiness Estimation Based on Factorized Bilinear Feature Fusion and a Long-Short-Term Recurrent Convolutional Network. Information 2021, 12, 3. [Google Scholar]
  70. Guo, J.M.; Markoni, H. Driver drowsiness detection using hybrid convolutional neural network and long short-term memory. Multimed. Tools Appl. 2019, 78, 29059–29087. [Google Scholar]
  71. Huynh, X.P.; Park, S.M.; Kim, Y.G. Detection of driver drowsiness using 3D deep neural network and semi-supervised gradient boosting machine. In Computer Vision—ACCV 2016 Workshops (Lecture Notes in Computer Science); Chen, C.S., Lu, J., Ma, K.K., Eds.; Springer: Cham, Switzerland, 2017; Volume 10118, pp. 101–113. [Google Scholar]
  72. Gao, H.; Hu, R.; Huang, Z. Gaze behavior patterns for early drowsiness detection. In Artificial Neural Networks and Machine Learning—ICANN 2023; Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2023; Volume 14254, pp. 223–234. [Google Scholar]
  73. Ramos, P.M.S.; Maior, C.B.S.; Moura, M.C.; Lins, I.D. Automatic drowsiness detection for safety-critical operations using ensemble models and EEG signals. Process Saf. Environ. Prot. 2022, 164, 566–581. [Google Scholar]
  74. Siddiqui, H.U.R.; Akmal, A.; Iqbal, M.; Saleem, A.A.; Raza, M.A.; Zafar, K.; Zaib, A.; Dudley, S.; Arambarri, J.; Castilla, Á.K.; et al. Ultra-Wide Band Radar Empowered Driver Drowsiness Detection with Convolutional Spatial Feature Engineering and Artificial Intelligence. Sensors 2024, 24, 3754. [Google Scholar] [CrossRef]
  75. Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.L.; Yong, M.G.; Lee, J.; et al. MediaPipe: A framework for building perception pipelines. arXiv 2019, arXiv:1906.08172. [Google Scholar]
  76. Ghoddoosian, R.; Galib, M.; Athitsos, V. A realistic dataset and baseline temporal model for early drowsiness detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16 June 2019; pp. 178–187. [Google Scholar]
  77. Weng, C.-H.; Lai, Y.-H.; Lai, S.-H. Driver drowsiness detection via a hierarchical temporal deep belief network. In Proceedings of the ACCV Workshops, Taipei, Taiwan, 20–24 November 2016. [Google Scholar]
  78. Abtahi, S.; Omidyeganeh, M.; Shirmohammadi, S.; Hariri, B. YawDD: A yawning detection dataset. In Proceedings of the 5th ACM Multimedia Systems Conference, Singapore, 19–21 March 2014; pp. 24–28. [Google Scholar]
  79. Petrellis, N.; Zogas, S.; Christakos, P.; Mousouliotis, P.; Keramidas, G.; Voros, N.; Antonopoulos, C. Software Acceleration of the Deformable Shape Tracking Application: How to eliminate the Eigen Library Overhead. In Proceedings of the 2021 2nd European Symposium on Software Engineering, Larissa, Greece, 19–21 November 2021; pp. 51–57. [Google Scholar]
  80. He, C.; Xu, P.; Pei, X.; Wang, Q.; Yue, Y.; Han, C. Fatigue at the wheel: A non-visual approach to truck driver fatigue detection by multi-feature fusion. Accid. Anal. Prev. 2024, 199, 107511. [Google Scholar] [PubMed]
  81. Soukupová, T.; Čech, J. Real-time eye blink detection using facial landmarks. In Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
  82. Choi, I.-H.; Jeong, C.-H.; Kim, Y.-G. Tracking a driver’s face against extreme head poses and inference of drowsiness using a hidden Markov model. Appl. Sci. 2016, 6, 137. [Google Scholar] [CrossRef]
  83. Dementhon, D.F.; Davis, L.S. Model-based object pose in 25 lines of code. Int. J. Comput. Vis. 1995, 15, 123–141. [Google Scholar] [CrossRef]
  84. Aloui, S. (ParisNeo). FaceAnalyzer [Software]. 2021. Available online: https://github.com/ParisNeo/FaceAnalyzer (accessed on 12 February 2025).
  85. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
  86. Rumelhart, D.; Hinton, G.; Williams, R. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  87. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  88. Chung, J.; Ahn, S.; Bengio, Y. Hierarchical Multiscale Recurrent Neural Networks. arXiv 2016, arXiv:abs/1609.01704. [Google Scholar]
  89. Liu, P.; Chi, H.-L.; Li, X.; Guo, J. Effects of dataset characteristics on the performance of fatigue detection for crane operators using hybrid deep neural networks. Autom. Constr. 2021, 132, 103901. [Google Scholar]
  90. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  91. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar]
  92. Pandey, N.N.; Muppalaneni, N.B. Dumodds: Dual modeling approach for drowsiness detection based on spatial and spatio-temporal features. Eng. Appl. Artif. Intell. 2023, 119, 105759. [Google Scholar] [CrossRef]
  93. Mittal, S.; Gupta, S.; Sagar, A.; Shamma, I.; Sahni, I.; Thakur, N. Driver drowsiness detection using machine learning and image processing. In Proceedings of the 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 3–4 September 2021; pp. 1–8. [Google Scholar]
  94. Magán, E.; Sesmero, M.P.; Alonso-Weber, J.M.; Sanchis, A. Driver drowsiness detection by applying deep learning techniques to sequences of images. Appl. Sci. 2022, 12, 1145. [Google Scholar] [CrossRef]
Figure 1. Complete description of the proposed system.
Figure 2. Sample frames from the UTA-RLDD dataset.
Figure 3. Key eye points annotated.
Figure 4. Key mouth points annotated.
Figure 5. Annotation of key head pose points.
Figure 6. Annotation of key gaze direction points.
Figure 7. The architecture of the ensemble model.
Figure 8. Bar chart depicting the accuracy of different machine learning algorithms under participant-dependent validation: (a) Holdout (80%); (b) Holdout (70%); (c) K-Fold; (d) Stratified K-Fold.
Figure 9. Bar chart depicting the performance of different machine learning algorithms under participant-independent validation: (a) Accuracy; (b) VA.
Figure 10. Confusion matrix under participant-independent validation: (a) RF; (b) MLP; (c) XGBoost; (d) Ensemble.
Figure 11. SHAP feature importance plot.
Table 1. Descriptive statistics of input features for driver drowsiness detection.

SI | Feature | Description
1 | BC | Blink count
2 | BD | Average blink duration
3 | MEC | Total eye closure duration
4 | AMP_AVG | Average blink amplitude
5 | AMP_MAX | Maximum blink amplitude
6 | AMP_MIN | Minimum blink amplitude
7 | EOV_AVG | Average eye opening velocity
8 | EOV_MAX | Maximum eye opening velocity
9 | EOV_MIN | Minimum eye opening velocity
10 | Perclos | Percentage of eye closure
11 | EAR_AVG | Average eye aspect ratio
12 | NS | Number of nods
13 | PA | Average head pitch angle
14 | HD | Duration for which the head is in a downward position
15 | HA | Head activity
16 | GA | Gaze activity
17 | CDC | Center direction count
18 | YC | Yawn count
19 | YCI | Yawning in the first or last 90 s
20 | IA | Inactive state
Table 2. Performance of different models with participant-dependent validation.

Validation | Classifier | Accuracy (%) | Precision (%) | Recall (%)
Holdout (80:20) | RF | 80.47 | 80.29 | 79.82
Holdout (80:20) | MLP | 81.07 | 80.86 | 80.72
Holdout (80:20) | XGBoost | 86.24 | 86.12 | 86.16
Holdout (80:20) | Ensemble | 88.76 | 88.68 | 88.71
Holdout (70:30) | RF | 78.97 | 78.88 | 78.62
Holdout (70:30) | MLP | 81.34 | 81.10 | 81.09
Holdout (70:30) | XGBoost | 85.98 | 85.85 | 85.80
Holdout (70:30) | Ensemble | 87.46 | 87.44 | 87.43
K-Fold (K = 10) | RF | 78.29 ± 1.82 | 78.03 ± 1.82 | 77.76 ± 1.68
K-Fold (K = 10) | MLP | 82.96 ± 2.32 | 82.91 ± 2.33 | 82.64 ± 2.31
K-Fold (K = 10) | XGBoost | 85.69 ± 1.48 | 85.68 ± 1.47 | 85.40 ± 1.58
K-Fold (K = 10) | Ensemble | 86.94 ± 1.09 | 86.85 ± 1.01 | 86.72 ± 1.11
Stratified K-Fold (K = 10) | RF | 78.05 ± 2.13 | 77.94 ± 2.23 | 77.61 ± 2.17
Stratified K-Fold (K = 10) | MLP | 81.39 ± 1.58 | 81.31 ± 1.52 | 81.13 ± 1.65
Stratified K-Fold (K = 10) | XGBoost | 85.58 ± 2.52 | 85.52 ± 2.61 | 85.35 ± 2.60
Stratified K-Fold (K = 10) | Ensemble | 86.93 ± 2.06 | 86.87 ± 2.09 | 86.74 ± 2.14
Table 3. Performance of different models with participant-independent validation.

Validation | Classifier | Accuracy (%) | Precision (%) | Recall (%) | VA (%)
LOPV | RF | 74.08 ± 11.80 | 75.04 ± 12.01 | 70.19 ± 13.60 | 80.34
LOPV | MLP | 73.64 ± 9.97 | 73.71 ± 10.88 | 69.37 ± 12.36 | 73.73
LOPV | XGBoost | 78.38 ± 10.00 | 78.01 ± 10.64 | 74.77 ± 11.77 | 85.36
LOPV | Ensemble | 78.85 ± 9.47 | 78.73 ± 9.84 | 75.43 ± 11.34 | 86.52
Table 4. Performance of ensemble models with various feature combinations.

Eye | Mouth | Head | Gaze | Accuracy (%) | Precision (%) | Recall (%) | VA (%)
 |  |  |  | 43.34 ± 11.29 | 44.74 ± 12.79 | 38.13 ± 11.71 | 43.82
 |  |  |  | 54.95 ± 11.66 | 54.13 ± 13.51 | 49.96 ± 13.87 | 57.30
 |  |  |  | 56.93 ± 12.00 | 54.30 ± 13.67 | 51.04 ± 13.21 | 60.67
 |  |  |  | 41.10 ± 11.83 | 75.65 ± 8.54 | 26.37 ± 13.82 | 38.20
 |  |  |  | 47.68 ± 12.98 | 48.27 ± 15.04 | 41.68 ± 14.95 | 50.56
 |  |  |  | 57.47 ± 12.17 | 56.04 ± 14.08 | 51.90 ± 14.96 | 62.36
 |  |  |  | 60.47 ± 12.53 | 57.57 ± 15.10 | 54.61 ± 14.82 | 63.48
 |  |  |  | 70.51 ± 11.19 | 69.40 ± 13.03 | 65.74 ± 12.77 | 74.16
 |  |  |  | 72.93 ± 10.56 | 71.85 ± 11.51 | 68.38 ± 11.71 | 78.09
 |  |  |  | 74.90 ± 10.02 | 75.17 ± 11.10 | 70.72 ± 12.27 | 79.21
 |  |  |  | 76.58 ± 10.12 | 76.49 ± 9.93 | 72.54 ± 12.22 | 82.58
 |  |  |  | 73.35 ± 10.24 | 72.52 ± 11.60 | 69.11 ± 11.68 | 79.78
 |  |  |  | 75.74 ± 10.06 | 74.85 ± 11.07 | 71.60 ± 11.75 | 82.58
 |  |  |  | 77.67 ± 9.44 | 77.62 ± 10.47 | 74.01 ± 11.57 | 85.39
 |  |  |  | 78.85 ± 9.47 | 78.73 ± 9.84 | 75.43 ± 11.34 | 86.52
Table 5. Performance of different methods on the UTA-RLDD dataset.

Method | VA (%)
Ghoddoosian et al. [76] | 65.2
Chengula et al. [60] | 81.00
Liu et al. [89] | 64.48
Pandey and Muppalaneni [92] | 86
Mittal et al. [93] | 75.68
Magán et al. [94] | 63
Ours | 86.52
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
