Article

Drowsiness Classification in Young Drivers Based on Facial Near-Infrared Images Using a Convolutional Neural Network: A Pilot Study

1 Department of Electrical Engineering and Electronics, Aoyama Gakuin University, Sagamihara 252-5258, Japan
2 Department of Information and Management Systems Engineering, Nagaoka University of Technology, Niigata 940-2188, Japan
* Author to whom correspondence should be addressed.
Sensors 2025, 25(21), 6755; https://doi.org/10.3390/s25216755
Submission received: 20 August 2025 / Revised: 25 October 2025 / Accepted: 26 October 2025 / Published: 4 November 2025

Abstract

Drowsy driving is a major cause of traffic accidents worldwide, and its early detection remains essential for road safety. Conventional driver monitoring systems (DMS) primarily rely on behavioral indicators such as eye closure, gaze, or head pose, which typically appear only after a significant decline in alertness. This study explores the potential of facial near-infrared (NIR) imaging as a hypothetical physiological indicator of drowsiness. Because NIR light penetrates more deeply into biological tissue than visible light, it may capture subtle variations in blood flow and oxygenation near superficial vessels. Based on this hypothesis, we conducted a pilot feasibility study involving young adult participants to investigate whether drowsiness levels could be estimated from single-frame NIR facial images acquired at 940 nm—a wavelength already used in commercial DMS and suitable for both physiological sensitivity and practical feasibility. A convolutional neural network (CNN) was trained to classify multiple levels of drowsiness, and Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to interpret the discriminative regions. The results showed that classification based on 940 nm NIR images is feasible, achieving the highest accuracy, approximately 90%, under the binary classification scheme (Pattern A). Grad-CAM revealed that regions around the nasal dorsum contributed strongly to the classification, consistent with known physiological signs of drowsiness. These findings support the feasibility of NIR-based drowsiness classification in young drivers and provide a foundation for future studies with larger and more diverse populations.

1. Introduction

In the 21st century, automotive safety technologies have made remarkable progress. The widespread adoption of advanced driver assistance systems (ADAS), such as automatic emergency braking (AEB) and lane departure warning (LDW), has significantly improved the ability of vehicles to avoid accidents. Nevertheless, the majority of traffic accidents are still caused by human error, with decreased driver attention, drowsiness, and cognitive failures remaining major challenges.
According to the World Health Organization (WHO) Global Status Report on Road Safety 2023, approximately 1.19 million people die each year in road traffic crashes worldwide. The report highlights that this burden is particularly pronounced in low- and middle-income countries, and that road traffic injuries are the leading cause of death globally among young people aged 5 to 29 years [1]. According to the National Highway Traffic Safety Administration (NHTSA), driver misjudgment and inattention are major contributing factors in road crashes, with drowsy driving alone estimated to be involved in approximately 91,000 police-reported crashes in the United States in 2017 [2]. In addition to fatigue and drowsiness, distractions from smartphone use, the influence of alcohol and drugs, and age-related factors all compromise driver concentration. As a result, safety risks have emerged that cannot be addressed by vehicle control technologies alone.
Against this backdrop, driver monitoring systems (DMS) have attracted increasing attention in recent years. DMS monitor the driver’s state, such as drowsiness and inattention, using in-vehicle cameras and provide timely warnings. In Europe, DMS installation has been mandatory in new vehicles since 2022 [3], and similar regulations are being accelerated in Japan [4] and other countries [5,6,7]. Importantly, DMS are not merely alerting systems but are also recognized as a key element of human-centered design in the era of automated driving. By accurately assessing driver states, they enable safe authority transfer between the driver and the automated system, thereby supporting the transition toward higher levels of automation (L3–L4) [8,9].
Currently, most commercial DMS rely on behavioral indicators such as blink rate, eyelid closure, gaze direction, head pose, and body posture [10,11,12,13,14,15,16,17,18,19]. Previous studies on driver monitoring have often been conducted with a small number of participants under controlled laboratory conditions to verify methodological feasibility [11,18,20]. These examples indicate that small-sample pilot studies are widely accepted in the early phase of driver state estimation research, providing valuable insights before large-scale validation. However, these signals tend to manifest only after a significant decline in alertness, limiting their usefulness for early drowsiness detection [21]. Moreover, behavioral features are only external manifestations and do not directly reflect the driver’s physiological state [22]. This gap underscores the need for indices that more directly capture physiological changes related to drowsiness.
Near-infrared (NIR) imaging is considered a promising approach for exploring physiological correlates of drowsiness. The NIR spectrum lies between visible and far-infrared light and contains a wavelength band with relatively high penetration into biological tissues, often referred to as the biological window. The primary absorbers in biological tissue are hemoglobin and water: light below 700 nm is strongly absorbed by hemoglobin, while absorption by water increases markedly above 1200 nm. Consequently, the 700–1200 nm band minimizes absorption by both and allows deeper light penetration than visible light, suggesting that NIR reflectance may contain information related to subsurface blood perfusion and other physiological changes [23,24]. Based on this optical characteristic, the present study investigates whether facial NIR images could serve as potential physiological indicators of drowsiness.
Previous studies have shown the feasibility of using NIR imaging to estimate physiological or psychological states. For example, blood pressure estimation from facial NIR images has been demonstrated [25,26]. In addition, facial NIR-based methods for noninvasive glucose estimation have also been reported [27,28]. These findings suggest that NIR imaging can provide information beyond behavioral features and may reflect underlying physiological processes. Based on this perspective, we hypothesize that facial NIR images can be interpreted as a physiological indicator, capturing skin tone variations associated with drowsiness.
In particular, this study focuses on the 940 nm wavelength, which is already widely used in commercial driver monitoring cameras [29,30,31,32,33]. By utilizing the same spectral band as current systems, the proposed approach aims to reduce barriers to real-world implementation while exploring the feasibility of early-stage drowsiness detection. This work is therefore positioned as a pilot study for NIR-based drowsiness monitoring. To the best of our knowledge, only very limited prior research has attempted to use facial NIR images as physiological indicators of drowsiness. For instance, our group has previously reported a preliminary approach based on sparse coding [20].
To investigate this hypothesis, we conducted a pilot experiment in which facial NIR images were acquired from multiple participants under controlled driving-simulation conditions. Images were captured at 940 nm using a commercially compatible camera system. Drowsiness levels were independently annotated based on external facial expression evaluation, serving as ground-truth labels. The constructed dataset was then used to train a convolutional neural network (CNN) [34,35] for multi-level drowsiness classification because CNNs have become the state-of-the-art approach in image recognition tasks. Furthermore, Gradient-weighted Class Activation Mapping (Grad-CAM) [36] was applied to visualize and interpret the regions of the face that contributed most to the CNN’s decisions. This combined approach allowed us not only to evaluate the feasibility of NIR-based early drowsiness detection but also to gain insights into the physiological relevance of the learned representations.
The contributions of this study are summarized as follows:
  • Novel interpretation of NIR facial images: This study proposes to treat 940 nm NIR images as physiological indicators rather than behavioral cues, highlighting their potential to capture drowsiness-related changes in facial skin tone.
  • Demonstration of CNN-based drowsiness classification: By applying a CNN to facial NIR images acquired at 940 nm, the feasibility of multi-level drowsiness classification is demonstrated.
  • Pilot study toward practical applications: Since the 940 nm wavelength is already employed in commercial driver monitoring systems, this approach offers a practical pathway toward real-world implementation.
The remainder of this paper is organized as follows. Section 2 reviews previous studies on driver monitoring systems and identifies key research gaps motivating this work. Section 3 describes the data acquisition protocol, ground-truth labeling procedure, dataset construction, CNN model, evaluation metrics, and Grad-CAM visualization method. Section 4 presents the classification results. Section 5 discusses the physiological interpretation of the CNN’s learned representations. Finally, Section 6 concludes the paper and outlines directions for future research.
This work was conducted as a pilot study, meaning a preliminary investigation with a small number of participants under controlled conditions to test the feasibility of a new methodology before large-scale validation. The primary goal was to examine whether near-infrared (NIR) facial images could serve as physiological indicators of drowsiness when analyzed with a CNN model. The experimental setup was designed to approximate in-vehicle monitoring environments, thereby providing both feasibility evidence and a basis for future extensions under real driving conditions. In this study, we specifically focused on young adult drivers, as this group represents the majority of active drivers and allows the examination of drowsiness-related physiological responses under relatively uniform conditions. By limiting the participant demographic to young adults, potential variability due to age-related physiological and behavioral differences—such as skin reflectance, vascular reactivity, and facial muscle tone—was minimized. This design enables a clearer evaluation of the feasibility of NIR-based drowsiness detection before extending the framework to older or more diverse populations in future studies.

2. Related Work

Driver monitoring systems (DMS) have been studied using a wide variety of behavioral and physiological indicators to estimate drowsiness or attention decline. Early systems relied primarily on behavioral cues such as eye blinks, eyelid closure (PERCLOS), gaze direction, head pose, and body posture. More recent approaches have introduced physiological sensors such as EEG, ECG/HRV, respiratory signals, or cerebral blood flow (fNIRS). Table 1 summarizes representative studies, their methodologies, and reported performance.
Most of the above studies rely on behavioral cues or sequential physiological signals acquired over several seconds. Consequently, they generally detect drowsiness only when the driver’s arousal level has already dropped markedly—for example, when eyelid closure, head droop, or autonomic fluctuations become prominent. In addition, many methods require wearable sensors (EEG, ECG, fNIRS) or are sensitive to environmental conditions such as illumination and driver posture.

2.1. Research Gap

While behavioral and physiological indicators have proven useful, they commonly depend on temporal feature accumulation, which limits responsiveness for real-time applications. Detection of mild or pre-drowsiness states remains particularly challenging.

2.2. Present Approach

To overcome these limitations, the present study focuses on spatial physiological information embedded in a single near-infrared (NIR) facial image. By employing a convolutional neural network (CNN) without time series input, the proposed method enables instantaneous, non-contact estimation of driver drowsiness levels. This non-sequential, single-frame approach mitigates sensor attachment issues and environmental sensitivity, and has the potential to detect subtle physiological changes preceding overt behavioral signs of drowsiness.

3. Materials and Methods

In this study, facial near-infrared images corresponding to all drowsiness levels were collected through experiments for drowsiness classification using CNNs. In addition, the ground-truth drowsiness levels were obtained through facial expression assessments conducted by evaluators other than the participants. Further details are provided in the following sections.

3.1. Facial NIR Imaging System and Experimental Setup

This subsection describes the imaging system and experimental setup used to acquire facial near-infrared (NIR) images under drowsiness-inducing conditions.

3.1.1. Experimental Systems

The hardware configuration of the NIR imaging system and driving simulator is summarized below. The experimental setup is shown in Figure 1.
Facial near-infrared (NIR) images were captured using a NIR camera (Genie Nano M1280 NIR, Teledyne DALSA, Waterloo, Canada). The camera provides a resolution of 1280 × 1024 pixels, with a spectral sensitivity of approximately 400–1000 nm, and the image acquisition rate was set to 1 fps. During image acquisition, the subject’s face was illuminated with LED light at a wavelength of 940 nm. A long-pass filter with a cut-on wavelength of 780 nm (RG780, SCHOTT, Mainz, Germany) was placed in front of the NIR camera to block visible light. In addition, a polarizer (SPFN-30C-26, Sigma Koki, Tokyo, Japan) was installed in front of both the LED and the NIR camera to measure scattered light from the skin.

3.1.2. Procedure and Conditions

The experimental procedure and pre-experimental conditions were designed to ensure consistency across participants and to effectively induce drowsiness.
In this experiment, a driving task using a driving simulator was assigned to the participants in order to record facial near-infrared (NIR) images under drowsiness-inducing conditions. The participants were 10 healthy adults (male and female) who held a standard driver’s license (mean age: 22.7 ± 0.9 years). Prior to the experiment, they were instructed to sleep at least 6 h before the experiment; to refrain from eating and drinking (except water), smoking, sleeping, and exercising for 2 h beforehand; and to participate without makeup, sunscreen, or other facial coverings. During the experiment, participants were permitted no visual aids other than eyeglasses or contact lenses. Two experimenters were present in the same room to monitor the physiological measurement conditions. To allow acclimatization to room temperature, the experiment was initiated at least 30 min after the subjects entered the room. Furthermore, to allow participants to adapt to the steering, accelerator, and brake operations of the driving simulator, a 15 min practice session was conducted. Each participant completed a single session composed of three consecutive periods, as illustrated in Table 2.
Thus, the total duration per participant was 46 min. Facial NIR images were recorded throughout the session according to this timeline. During the eyes-open resting period, a white cross was presented on a black background on the liquid crystal display in front of the participant, and the subjects were instructed via voice guidance to fixate on the cross. During the driving period, participants were instructed to maintain a constant driving speed of 80 km/h using the driving simulator. To enhance the drowsiness-inducing effect, the driving posture and steering style were left to the discretion of the subjects.
The driving simulator consisted of an accelerator and brake pedal set, a steering wheel controller (FANATEC ClubSport Wheel, Endor AG, Landshut, Germany), and three LCD monitors (40S21, Toshiba Visual Solutions Corporation, Tokyo, Japan). The simulation software used was Assetto Corsa (ver. 1.0, Kunos Simulazioni S.r.l., Rome, Italy). The vehicle model was based on the Suzuki Cappuccino (Suzuki Motor Corporation, Hamamatsu, Japan), a popular Japanese compact car, and the driving course was a flat, monotonous oval track with a total length of approximately 5500 m (Figure 2).
To reproduce the sensation of real driving, the vehicle dynamics and steering feedback were optimized, and the driving environment simulated a clear midnight condition. The seat of the driving simulator is shown in Figure 3.
To promote drowsiness, the seat was a relatively soft fabric seat originally used in an actual vehicle.

3.2. Drowsiness Annotation by Facial Expression Evaluation

To obtain ground-truth labels of drowsiness, facial expressions were evaluated by raters based on established criteria.
Because subjective drowsiness ratings have been shown to correlate strongly with facial expressions, drowsiness can be evaluated from facial expressions. In this study, following the criteria defined by Kitajima et al. [38] (Table 3), five evaluators other than the subjects assessed the drowsiness levels based on facial expression videos recorded during the experiment. These criteria [38] have been widely applied in subsequent driver monitoring research and incorporated into the national guidelines for driver monitoring systems issued by the Ministry of Land, Infrastructure, Transport and Tourism [4]. Furthermore, the validity of this facial expression-based framework has been empirically supported by recent research. Uchiyama et al. [39] demonstrated that the observer rating of drowsiness (ORD) based on the Kitajima et al. (1997) criteria [38] showed significant correlations with subjective (Karolinska Sleepiness Scale), physiological (EEG θ/α ratio), and behavioral (driving performance) indices, confirming its convergent validity. Therefore, the present labeling approach is consistent with established and validated methodologies for evaluating driver drowsiness.

3.3. Dataset

The dataset was constructed from the acquired NIR facial images and the corresponding drowsiness labels, followed by preprocessing for CNN input.

3.3.1. Input Images

This subsection describes how facial regions were extracted and standardized to generate the CNN input images. All near-infrared images acquired in the experiment were processed to extract the facial region using the method proposed in a previous study [40]. To further reduce the influence of inter-individual differences in facial orientation and morphology, spatial normalization [41] was applied. Finally, the images were resized to 259 × 259 pixels for use as input to the CNN. Figure 4 shows an example of input near-infrared images after spatial normalization.
These images were normalized immediately before being fed into the CNN, such that their maximum and minimum values were set to 1 and 0, respectively.
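For concreteness, the per-image preprocessing described above can be sketched as follows. This is an illustrative Python implementation rather than the authors' released code; the use of OpenCV for resizing and the function name are assumptions.

```python
import numpy as np
import cv2  # OpenCV, assumed here for resizing (hypothetical choice)

def preprocess_nir_frame(face_img: np.ndarray) -> np.ndarray:
    """Resize a cropped, spatially normalized face image to the CNN input
    size and min-max scale it so that min -> 0 and max -> 1, as described
    in the text. `face_img` is a single-channel NIR image."""
    resized = cv2.resize(face_img, (259, 259), interpolation=cv2.INTER_AREA)
    resized = resized.astype(np.float32)
    lo, hi = resized.min(), resized.max()
    scaled = (resized - lo) / (hi - lo + 1e-8)  # guard against flat images
    return scaled[..., np.newaxis]  # shape (259, 259, 1) for the CNN
```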

3.3.2. Labels

The labeling scheme for drowsiness levels and the redefinition of classes into three patterns (A, B, C) are summarized here.
Table 4 shows the number of data samples for each subject categorized according to the five-level drowsiness ratings.
Because the objective of this study is to capture the early stages of drowsiness, and to make use of data from subjects who did not exhibit Levels 4 or 5, the drowsiness levels were redefined. The redefined levels are presented in Table 5.
In Pattern A, Level 1 was redefined as Low, and Levels 2 to 5 as High. In Pattern B, Levels 1 and 2 were redefined as Low, and Levels 3 to 5 as High. In Pattern C, Level 1 was redefined as Low, Level 2 as Medium, and Levels 3 to 5 as High. Table 6 shows the number of samples for each subject after redefining the levels (patterns A, B, C).
Furthermore, downsampling was performed to equalize the number of samples across drowsiness levels and among subjects, preventing class imbalance from biasing the training. The number of samples after downsampling is shown in Table 7.
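The relabeling in Table 5 and the per-class downsampling in Table 7 can be illustrated with the sketch below. The paper specifies only that classes were equalized to the within-pattern minimum; random selection with a fixed seed is an assumption made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, mirroring the paper's reproducibility choice

# Redefinition of the five rated levels into the three patterns (Table 5).
PATTERNS = {
    "A": {1: "Low", 2: "High", 3: "High", 4: "High", 5: "High"},
    "B": {1: "Low", 2: "Low", 3: "High", 4: "High", 5: "High"},
    "C": {1: "Low", 2: "Medium", 3: "High", 4: "High", 5: "High"},
}

def relabel_and_downsample(levels, pattern, n_per_class):
    """Map 5-level ratings to a pattern's classes, then randomly keep
    `n_per_class` samples per class (e.g., 320 for Pattern A, Table 7).
    Assumes each class has at least `n_per_class` samples.
    Returns the kept sample indices and their new labels."""
    labels = np.array([PATTERNS[pattern][lv] for lv in levels])
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        keep.extend(rng.choice(idx, size=n_per_class, replace=False))
    keep = np.array(sorted(keep))
    return keep, labels[keep]
```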

3.4. CNN Modeling

The architecture and configuration of the convolutional neural network (CNN) used for drowsiness classification are detailed in this subsection.
The architecture of the CNN is summarized in Table 8.
Here, Conv, MaxPool, Dropout, and FC denote a convolutional layer, max-pooling layer, dropout layer, and fully connected layer, respectively. To ensure comparability with our previous study (currently under review), all hyperparameters—including the number of units per layer, network depth, learning rate, number of convolutional filters, kernel size, pooling window, batch size, and dropout rate—were kept identical to those used in that work. As a result, the CNN was configured with five convolutional layers, four max-pooling layers, two dropout layers, and two fully connected layers; Dropout1 and Dropout2 disabled 25% and 50% of the connections, respectively. The network was trained using the RMSprop optimizer with a learning rate of 0.001. The loss function was categorical cross-entropy, and training was performed for 20 epochs with a batch size of 30. Early stopping was not applied, and a fixed random seed of 1 was used to ensure reproducibility. No data augmentation was applied in this pilot study: because the pixel intensity of NIR images directly reflects physiological absorption characteristics, conventional geometric or photometric augmentation risks introducing non-biological artifacts.
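The following Keras sketch reproduces the layer shapes and parameter counts of Table 8 together with the reported training settings (RMSprop, learning rate 0.001, categorical cross-entropy, 20 epochs, batch size 30). The deep learning framework is not named in the paper, so TensorFlow/Keras and the padding choices (inferred from the reported output shapes) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_classes: int = 2) -> tf.keras.Model:
    """CNN consistent with Table 8; comments give the output shapes there."""
    model = models.Sequential([
        layers.Input(shape=(259, 259, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),  # (259, 259, 32)
        layers.Conv2D(64, 3, activation="relu"),                  # (257, 257, 64)
        layers.MaxPooling2D(2),                                   # (128, 128, 64)
        layers.Conv2D(64, 3, activation="relu"),                  # (126, 126, 64)
        layers.MaxPooling2D(2),                                   # (63, 63, 64)
        layers.Conv2D(64, 3, activation="relu"),                  # (61, 61, 64)
        layers.MaxPooling2D(2),                                   # (30, 30, 64)
        layers.Conv2D(64, 3, activation="relu"),                  # (28, 28, 64)
        layers.MaxPooling2D(2),                                   # (14, 14, 64)
        layers.Dropout(0.25),
        layers.Flatten(),                                         # (12,544)
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),            # 2 or 3 classes
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Training settings reported in the text: 20 epochs, batch size 30,
# no early stopping, e.g.:
# model.fit(x_train, y_train, epochs=20, batch_size=30)
```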

3.5. Model Evaluation Method

Model performance was evaluated using leave-one-subject-out cross-validation to assess generalizability across individuals.
To evaluate model performance, leave-one-subject-out cross-validation (LOSO CV) with 10 folds (one per participant) was carried out. For each fold, images from one subject were withheld for testing, while the remaining subjects provided the training data. The accuracies obtained from all folds were averaged to determine the final overall accuracy. This analysis was conducted independently for the three label assignment strategies: Pattern A, Pattern B, and Pattern C.
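A minimal sketch of this evaluation loop follows, assuming scikit-learn's LeaveOneGroupOut with subject IDs as groups, one-hot encoded labels, and the hypothetical build_cnn helper from the Section 3.4 sketch.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loso_accuracy(images, labels, subject_ids, n_classes):
    """LOSO CV: each of the 10 folds holds out all images of one subject
    for testing and trains on the remaining nine. `labels` are one-hot
    arrays matching the categorical cross-entropy loss."""
    logo = LeaveOneGroupOut()
    fold_acc = []
    for train_idx, test_idx in logo.split(images, labels, groups=subject_ids):
        model = build_cnn(n_classes)  # fresh model per fold
        model.fit(images[train_idx], labels[train_idx],
                  epochs=20, batch_size=30, verbose=0)
        _, acc = model.evaluate(images[test_idx], labels[test_idx], verbose=0)
        fold_acc.append(acc)
    return float(np.mean(fold_acc))  # overall accuracy averaged over folds
```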

3.6. Grad-CAM

Grad-CAM was employed to visualize the discriminative facial regions that contributed to the CNN’s classification decisions.
Grad-CAM [36] was employed to visualize the regions that contributed to CNN decisions. It was computed using the trained models and test images, and only correctly classified samples from each subject (i.e., those in which the predicted and annotated drowsiness levels matched) were analyzed. For each class and subject, the Grad-CAM maps ($L^{c}_{\text{Grad-CAM}}$) were averaged. Regions with higher average Grad-CAM values indicate the areas utilized by the CNN when determining the drowsiness level. In this study, Grad-CAM was applied exclusively to the classification pattern that achieved the highest accuracy. We present the averaged Grad-CAM maps superimposed on representative facial near-infrared images for each subject.
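For reference, a single-image Grad-CAM map can be computed as in the sketch below (TensorFlow assumed, continuing the Section 3.4 sketch; the last convolutional layer corresponds to Conv5 in Table 8). Averaging such maps over a subject's correctly classified samples yields the maps reported in Section 4.2.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index):
    """Grad-CAM: weight the final conv layer's feature maps by the
    global-average-pooled gradient of the class score, then apply ReLU.
    `image` has shape (259, 259, 1)."""
    # Locate the final convolutional layer (Conv5 in Table 8).
    conv_layer = next(l for l in reversed(model.layers)
                      if isinstance(l, tf.keras.layers.Conv2D))
    grad_model = tf.keras.Model(model.inputs,
                                [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)        # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # pooled gradients, (1, 64)
    cam = tf.einsum("bijk,bk->bij", conv_out, weights)  # weighted sum of maps
    cam = tf.nn.relu(cam)[0].numpy()
    return cam / (cam.max() + 1e-8)                     # normalized L^c_Grad-CAM
```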

4. Results

4.1. CNN Performance

Figure 5 presents the average overall classification accuracy of the CNN model for the three label assignment strategies (Patterns A, B, and C).
The results are shown as the mean accuracy across all ten subjects, with error bars indicating the standard error obtained from 10-fold leave-one-subject-out cross-validation (LOSO CV). As seen in the figure, the highest classification performance was achieved with Pattern A, yielding an average accuracy of approximately 90%. This pattern, in which Level 1 was defined as “Low” and Levels 2–5 as “High,” provided the clearest separation of drowsiness states and demonstrated the most robust generalization across individuals.
In contrast, Pattern B, which grouped Levels 1–2 as “Low” and Levels 3–5 as “High,” exhibited a slight reduction in performance, with an average accuracy of approximately 85%. This decrease suggests that merging Level 2 into the “Low” group increased the intra-class variability, thereby complicating the boundary between drowsiness and non-drowsiness states.
Pattern C, which divided the data into three categories (Low: Level 1, Medium: Level 2, High: Levels 3–5), showed the lowest performance, with an average accuracy of approximately 82%. The increased difficulty of distinguishing between three drowsiness levels, especially in separating the “Medium” category from its adjacent levels, is considered a primary factor in this decline. Furthermore, because Pattern C involved downsampling to balance three classes, the reduction in training data may have also contributed to the lower accuracy compared to binary classification strategies.
Overall, these results demonstrate that the binary classification framework, particularly Pattern A, is most effective for CNN-based drowsiness detection from facial near-infrared images. Therefore, subsequent Grad-CAM analysis in this study was conducted using Pattern A, which achieved the highest overall classification accuracy.
In addition to the overall trends shown in Figure 5, the subject-wise classification accuracies are summarized in Table 9.
These results reveal that Pattern A not only achieved the highest mean accuracy but also demonstrated stable performance across individuals, exceeding 90% in eight out of ten subjects. By contrast, Patterns B and C showed greater inter-subject variability, with certain cases exhibiting relatively low accuracies (e.g., SubB: 66.0% in Pattern B; SubA: 45.2% in Pattern C). This variability highlights the challenge of achieving reliable generalization across subjects, particularly when using more complex label assignment strategies. Nevertheless, the consistently strong performance of Pattern A across most participants reinforces its suitability as the most effective framework for NIR-based drowsiness classification.

4.2. Average Grad-CAM Activation Map

To elucidate the decision-making process of the CNN, Grad-CAM analysis was conducted using Pattern A, which achieved the highest classification accuracy. Figure 6 presents the averaged Grad-CAM heatmaps computed from correctly classified samples of each subject.
The regions displayed in warm colors indicate the areas that contributed most strongly to the CNN’s prediction of drowsiness levels.
Overall, the CNN did not distribute its attention uniformly across the entire face but instead concentrated on physiologically meaningful regions. A notable finding common to all participants was the consistent activation in the periocular area. In subjects with relatively higher classification accuracy (e.g., SubE, SubH, SubI, SubJ), additional activations were also observed along the nasal dorsum and in the perioral region. While periocular activations were most prominent, noticeable responses were likewise observed in the nose and forehead. These regions are associated with variations in skin blood flow and hue [26,42,43,44], suggesting that subtle physiological changes related to drowsiness were likely captured in the near-infrared images. Nevertheless, it should also be considered that activations in the periocular region may, at least in part, reflect behavioral cues such as blinking or eyelid closure.

5. Discussion

This study investigated the feasibility of drowsiness classification based on facial near-infrared (NIR) images using a convolutional neural network (CNN). The results demonstrated that Pattern A, which defined Level 1 as “Low” and Levels 2–5 as “High,” achieved the highest classification accuracy of approximately 90%. Grad-CAM analysis further confirmed that the CNN focused on physiologically meaningful regions, particularly around the eyes and the nasal dorsum, suggesting that the model captured facial features closely associated with drowsiness. Subject-wise accuracies also indicated consistently high performance across individuals, despite some inter-individual differences in the extent of activation regions. These findings collectively suggest that CNN-based analysis of NIR facial images can serve as a promising approach for detecting drowsiness.
Compared to conventional commercial driver-monitoring systems (DMS), which primarily rely on behavioral cues such as eyelid closure, blink rate, or head movement, the present approach offers several advantages. Behavioral features often require temporal accumulation of data and are typically detected only after drowsiness has already advanced. By contrast, our results demonstrate that NIR facial images can be treated as physiological indicators that reflect subtle changes associated with drowsiness, enabling earlier detection. In particular, the structure of Pattern A, which distinguished a completely alert state (Level 1) from all other states (Levels 2–5), highlights its potential applicability for identifying the early onset of drowsiness—a capability that conventional behavior-based methods struggle to achieve.
In addition to conventional behavior-based methods, drowsiness estimation from NIR facial images has previously been attempted using sparse coding [20]. While sparse coding requires handcrafted feature extraction and may be constrained in capturing complex spatial representations, the present CNN-based approach learns features in an end-to-end manner, offering greater flexibility and adaptability. Although the current pilot study is limited by sample size, this comparison suggests that CNNs may provide a more robust framework for leveraging NIR facial images in drowsiness detection.
Despite these encouraging results, several limitations must be acknowledged. First, the classification accuracy of 90% is still insufficient for practical deployment in safety-critical contexts such as driving. Even a small number of misclassifications could lead to false alarms or missed detections, which would directly affect driver trust and road safety. Second, the experimental conditions were strictly controlled: participants were required to sleep for more than six hours prior to the experiment, abstain from eating or drinking (other than water), smoking, sleeping, and exercising for two hours before the experiment, and participate without makeup, sunscreen, or glasses (except contact lenses). These conditions may not reflect realistic driving environments, where external factors such as lighting, fatigue history, and personal habits vary considerably. Although the present experiment was conducted under controlled conditions to minimize external noise, previous research has demonstrated that physiological responses related to driver drowsiness in simulator environments are comparable to those in real-world driving [45,46,47]. Furthermore, our group recently developed an artifact correction method that compensates for environmental temperature and time-of-day variations in facial thermography [48], demonstrating that facial physiological imaging can be adapted for use in uncontrolled environments. These findings collectively suggest that the proposed NIR-based approach has practical potential for real-world driver monitoring applications.
Third, the drowsiness labels in this study were assigned based on third-party evaluations of facial expressions. Although no physiological gold standard (e.g., EEG or ECG) or subjective ratings were used for annotation, inter-rater reliability analysis demonstrated acceptable consistency among the five evaluators. Specifically, the intraclass correlation coefficient indicated good agreement across raters (ICC(2,k) = 0.92), while Fleiss’ κ showed fair exact agreement (κ = 0.29), reflecting minor categorical variations that commonly occur in behavioral annotation.
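For reproducibility, these two agreement statistics can be computed as sketched below, assuming statsmodels for Fleiss' κ and pingouin for the ICC; the paper does not state which software was actually used.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def agreement_stats(ratings: np.ndarray):
    """ratings: (n_samples, 5) matrix of the five raters' levels (1-5).
    Returns (ICC(2,k), Fleiss' kappa)."""
    # Fleiss' kappa treats levels as unordered categories (exact agreement).
    table, _ = aggregate_raters(ratings)
    kappa = fleiss_kappa(table)
    # ICC(2,k) treats levels as numeric, average-rater agreement.
    n, k = ratings.shape
    long = pd.DataFrame({
        "sample": np.repeat(np.arange(n), k),
        "rater": np.tile(np.arange(k), n),
        "score": ratings.ravel(),
    })
    icc = pg.intraclass_corr(data=long, targets="sample",
                             raters="rater", ratings="score")
    icc2k = icc.loc[icc["Type"] == "ICC2k", "ICC"].iloc[0]
    return icc2k, kappa
```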
Fourth, the CNN architecture and hyperparameters were not optimized for this task, leaving considerable room for further improvement.
It should be noted that this pilot study did not perform direct temporal validation or comparison with behavior-based indicators. Our statement on “earlier detection” should therefore be regarded as a potential advantage of the proposed single-frame NIR approach, rather than as a result directly validated by comparison. Future work will include explicit temporal analyses and direct comparisons with conventional behavior-based systems to rigorously confirm this capability.
Future research should address these limitations by validating the approach under real driving conditions, incorporating a larger and more diverse subject pool that includes participants from different age groups and ethnic backgrounds, and introducing multimodal labels that combine subjective scales and physiological measurements with external ratings. In particular, future studies will include validation using physiological reference signals such as electroencephalography (EEG) and electrocardiography (ECG) to clarify the physiological relevance of NIR intensity and improve the robustness of the labeling process. In addition, further optimization of the CNN model, including architecture refinement, data augmentation, and parameter tuning, is expected to enhance classification performance. Integration with conventional behavioral indicators may also lead to a hybrid driver-monitoring system that combines the strengths of both physiological and behavioral monitoring. Finally, systematic evaluations of algorithm performance, vulnerability testing, and comparisons with existing state-of-the-art drowsiness detection methods will be necessary to demonstrate the practical advantages of the proposed framework.
In summary, this pilot study demonstrated that CNN-based analysis of facial NIR images can classify drowsiness with high accuracy and interpretability. While the current performance is not yet sufficient for practical application, the approach provides a foundation for developing advanced driver-monitoring technologies that enable earlier and more reliable detection of drowsiness than conventional systems.
From a practical standpoint, the results suggest that NIR-based CNN analysis could complement existing behavior-based DMS by enabling earlier and potentially more reliable drowsiness detection. In particular, Pattern A, which distinguishes an awake state (Level 1) from all other states (Levels 2–5), demonstrates potential applicability for early warning systems. This framework may be compatible with commercial in-cabin cameras that already employ NIR illumination at 940 nm, thereby enhancing real-time monitoring without requiring substantial hardware modifications. Ultimately, such capability could support safer transitions in semi-automated driving by allowing timely interventions before severe drowsiness occurs.

6. Conclusions

This pilot study explored the feasibility of classifying driver drowsiness using facial near-infrared (NIR) images and a convolutional neural network (CNN). By employing the 940 nm wavelength band already used in commercially available driver-monitoring cameras, we demonstrated that physiological features embedded in NIR facial images can serve as effective indicators of drowsiness. Among the three label assignment strategies, the binary classification scheme (Pattern A), which defined Level 1 as “alert” and Levels 2–5 as “drowsy,” achieved the highest average accuracy of approximately 90%. Although this level of accuracy is not yet sufficient for direct practical implementation, the results suggest that the CNN successfully captured physiological cues associated with drowsiness. Importantly, because Pattern A distinguishes between an awake state and all other levels, it shows potential for supporting early detection of drowsiness.
Furthermore, Grad-CAM analysis revealed that the CNN focused on physiologically meaningful regions such as the periocular, nasal dorsum, perioral, and forehead when making classification decisions. These regions correspond well with known drowsiness-related changes, supporting the reliability of the model. The analysis also showed inter-individual variations in highlighted areas, indicating that the CNN flexibly adapted to individual facial characteristics while consistently attending to relevant regions across subjects.
Nevertheless, several limitations should be acknowledged. The drowsiness labels were derived from third-party evaluations of facial expressions, without incorporating subjective self-reports or physiological references such as EEG or ECG. In addition, no extensive model optimization was conducted in this study, suggesting room for further improvement in classification accuracy and generalization. Future work should address these issues by integrating multimodal data, increasing subject diversity, and applying more advanced network architectures.
In summary, this study provides the first evidence that CNN-based analysis of NIR facial images can be leveraged as a physiological approach for driver drowsiness detection. While challenges remain for practical application, NIR imaging holds promise as a means for achieving earlier and more reliable detection of drowsiness compared to conventional behavior-based systems.

Author Contributions

Conceptualization, A.N. (Ayaka Nomura), K.N., K.O., and A.N. (Akio Nozawa); methodology, A.N. (Ayaka Nomura); software, A.N. (Ayaka Nomura); validation, A.N. (Ayaka Nomura); formal analysis, A.N. (Ayaka Nomura); investigation, A.N. (Ayaka Nomura); resources, A.N. (Ayaka Nomura); data curation, A.Y. and T.T.; writing—original draft preparation, A.N. (Ayaka Nomura); writing—review and editing, K.N., K.O., and A.N. (Akio Nozawa); visualization, A.N. (Ayaka Nomura); supervision, A.N. (Akio Nozawa); project administration, A.N. (Akio Nozawa). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the ethics review board.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023; Available online: https://www.who.int/teams/social-determinants-of-health/safety-and-mobility/global-status-report-on-road-safety-2023 (accessed on 25 October 2025).
  2. National Highway Traffic Safety Administration. Drowsy Driving: Asleep at the Wheel; U.S. Department of Transportation: Washington, DC, USA, 2017. Available online: https://www.nhtsa.gov/risky-driving/drowsy-driving (accessed on 25 October 2025).
  3. European Parliament and Council. Regulation (EU) 2019/2144 of 27 November 2019 on Type-Approval Requirements for Motor Vehicles and Their Trailers, and Systems, Components and Separate Technical Units Intended for Such Vehicles. Off. J. Eur. Union 2019, L 325, 1–40. [Google Scholar]
  4. Ministry of Land, Infrastructure, Transport and Tourism. Guidelines for Driver Monitoring Systems to Detect Drowsiness and Inattention; Ministry of Land, Infrastructure, Transport and Tourism: Tokyo, Japan, 2020; Available online: https://www.mlit.go.jp/report/press/content/001377623.pdf (accessed on 25 October 2025).
  5. GB/T 41797–2022; Road Vehicles—Driver Monitoring System—Performance Requirements and Test Methods. Standardization Administration of China: Beijing, China, 2022. Available online: https://www.chinesestandard.net/PDF.aspx/GBT41797-2022?English_GB/T%2041797-2022 (accessed on 25 October 2025).
  6. U.S. Senate. Stay Aware for Everyone (SAFE) Act of 2021; U.S. Government Publishing Office: Washington, DC, USA, 2021.
  7. Business Wire. Global and China Automotive DMS/OMS Research Report, 2023–2024. 8 April 2024. Available online: https://www.businesswire.com/news/home/20240408995808/en/ (accessed on 25 October 2025).
  8. Gerber, M.A.; Schroeter, R.; Ho, B. A human factors perspective on how to keep SAE Level 3 conditional automated driving safe. Transp. Res. Interdiscip. Perspect. 2023, 22, 100959. [Google Scholar] [CrossRef]
  9. Morales-Alvarez, W.; Sipele, O.; Léberon, R.; Tadjine, H.H.; Olaverri-Monreal, C. Automated Driving: A Literature Review of the Take over Request in Conditional Automation. Electronics 2020, 9, 2087. [Google Scholar] [CrossRef]
  10. Sigari, M.-H.; Pourshahabi, M.-R.; Soryani, M.; Fathy, M. A Review on Driver Face Monitoring Systems for Fatigue and Distraction Detection. Int. J. Adv. Sci. Technol. 2014, 64, 73–100. [Google Scholar] [CrossRef]
  11. Ji, Q.; Zhu, Z.; Lan, P. Real-time eye, gaze, and face pose tracking for monitoring driver vigilance. IEEE Trans. Veh. Technol. 2002, 8, 357–377. [Google Scholar] [CrossRef]
  12. Yae, J.-H.; Oh, Y.D.; Kim, M.S.; Park, S.H. A Study on the Development of Driver Behavior Simulation Dummy for the Performance Evaluation of Driver Monitoring System. Int. J. Precis. Eng. Manuf. 2024, 25, 641–652. [Google Scholar] [CrossRef]
  13. Kim, D.; Park, H.; Kim, T.; Kim, W.; Paik, J. Real-time Driver Monitoring System with Facial Landmark-based Behavior Recognition. Sci. Rep. 2023, 13, 18264. [Google Scholar]
  14. Halin, A.; Verly, J.G.; Van Droogenbroeck, M. Survey and Synthesis of State of the Art in Driver Monitoring. Sensors 2021, 21, 5558. [Google Scholar] [CrossRef]
  15. Sharma, P.K.; Chakraborty, P. A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding. Eng. Appl. Artif. Intell. 2024, 133, 108117. [Google Scholar] [CrossRef]
  16. Lachance-Tremblay, J.; Tkiouat, Z.; Léger, P.-M.; Cameron, A.-F.; Titah, R.; Coursaris, C.K.; Sénécal, S. A gaze-based driver distraction countermeasure: Comparing effects of multimodal alerts on driver’s behavior and visual attention. Int. J. Hum.-Comput. Stud. 2025, 193, 103366. [Google Scholar] [CrossRef]
  17. Meyer, J.E.; Llaneras, R.E. Behavioral Indicators of Drowsy Driving: Active Search Mirror Checks; Safe-D Project Report 05-084; Virginia Tech Transportation Institute (VTTI): Blacksburg, VA, USA, 2022. Available online: https://rosap.ntl.bts.gov/view/dot/63152 (accessed on 25 October 2025).
  18. Soleimanloo, S.S.; Wilkinson, V.E.; Cori, J.M.; Westlake, J.; Stevens, B.; Downey, L.A.; Shiferaw, B.A.; Rajaratnam, S.M.W.; Howard, M.E. Eye-blink parameters detect on-road track-driving impairment following severe sleep deprivation. J. Clin. Sleep Med. 2019, 15, 1271–1284. [Google Scholar] [CrossRef]
  19. Qu, F.; Dang, N.; Furht, B.; Nojoumian, M. Comprehensive study of driver behavior monitoring systems using computer vision and machine learning techniques. J. Big Data 2024, 11, 32. [Google Scholar] [CrossRef]
  20. Okajima, T.; Yoshida, A.; Oiwa, K.; Nagumo, K.; Nozawa, A. Drowsiness Estimation Based on Facial Near-Infrared Images Using Sparse Coding. IEEJ Trans. Electr. Electron. Eng. 2023, 18, 1961–1963. [Google Scholar] [CrossRef]
  21. Perkins, E.; Sitaula, C.; Burke, M.; Marzbanrad, F. Challenges of Driver Drowsiness Prediction: The Remaining Steps to Implementation. IEEE Trans. Intell. Veh. 2022, 8, 1319–1338. [Google Scholar] [CrossRef]
  22. Saleem, A.A.; Siddiqui, H.U.R.; Raza, M.A.; Rustam, F.; Dudley, S.; Ashraf, I. A systematic review of physiological signals based driver drowsiness detection systems. Cogn. Neurodynamics 2023, 17, 1229–1259. [Google Scholar] [CrossRef]
  23. Prahl, S. Optical Absorption of Hemoglobin; Oregon Medical Laser Center: Portland, OR, USA, 1999; Available online: https://omlc.org/spectra/hemoglobin/ (accessed on 25 October 2025).
  24. Jacques, S.L. Optical properties of biological tissues: A review. Phys. Med. Biol. 2013, 58, R37–R61. [Google Scholar] [CrossRef]
  25. Takano, M.; Nagumo, K.; Nanai, Y.; Oiwa, K.; Nozawa, A. Remote blood pressure measurement from near-infrared face images: A comparison of the accuracy by the use of first and second biological optical window. Biomed. Opt. Express 2024, 15, 1523–1537. [Google Scholar] [CrossRef]
  26. Oiwa, K.; Ozawa, Y.; Nagumo, K.; Nishimura, S.; Nanai, Y.; Nozawa, A. Remote Blood Pressure Sensing Using Near-Infrared Wideband LEDs. IEEE Sens. J. 2021, 21, 24327–24337. [Google Scholar] [CrossRef]
  27. Nakagawa, M.; Oiwa, K.; Nanai, Y.; Nagumo, K.; Nozawa, A. A comparative study of linear and nonlinear regression models for blood glucose estimation based on near-infrared facial images from 760 to 1650 nm wavelength. Artif. Life Robot. 2024, 29, 501–509. [Google Scholar] [CrossRef]
  28. Nakagawa, M.; Oiwa, K.; Nanai, Y.; Nagumo, K.; Nozawa, A. Feature Extraction for Estimating Acute Blood Glucose Level Variation From Multiwavelength Facial Images. IEEE Sens. J. 2024, 23, 20247–20257. [Google Scholar] [CrossRef]
  29. AMS OSRAM. In-Cabin Sensing Systems Today Operate at 940 nm Wavelength While Some Also Use 850 nm. Automotive & Mobility—In-Cabin Sensing. 2021. Available online: https://ams-osram.com/applications/automotive-mobility/in-cabin-sensing (accessed on 25 October 2025).
  30. Automotive Focus Shifts to Driver Monitoring Systems. Embedded (EE Times). 1 September 2021. Available online: https://www.embedded.com/automotive-focus-shifts-to-driver-monitoring-systems/ (accessed on 25 October 2025).
  31. Mouser Electronics White Paper: Driver Monitor System Yields Safety Gains. All About Circuits. 9 August 2022. Available online: https://www.allaboutcircuits.com/uploads/articles/Driver_Monitor_System_Yields_Safety_Gains.pdf (accessed on 25 October 2025).
  32. Lambert, D.K.; Itsede, F.; Tomura, K.; Nohara, A.; Chou, K.; Carty, D. Windshield with Enhanced Infrared Reflectivity Enables Packaging a Driver Monitor System in a Head-Up Display; SAE Technical Paper. No. 2021-01-0105; SAE International: Warrendale, PA, USA, 2021. [Google Scholar] [CrossRef]
  33. Smart Eye: AIS Driver Monitoring System; Product Page Including Field of View and Wavelength Specifications (940 nm); Macnica Corporation: Yokohama, Japan, 2023; Available online: https://www.macnica.co.jp/en/business/semiconductor/manufacturers/smarteye/products/144981/ (accessed on 25 October 2025).
  34. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  35. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  36. Arefnezhad, S.; Keshavarz, A.; Lin, C.; Lin, F.; Tavakoli, N.; Chou, C.A. Driver drowsiness estimation using EEG signals with a dynamical encoder–decoder modeling framework. Sci. Rep. 2022, 12, 16435. [Google Scholar] [CrossRef]
  37. Khan, M.J.; Hong, K.-S. Hybrid EEG–fNIRS-based eight-command decoding for BCI: Application to drowsiness detection. Front. Hum. Neurosci. 2015, 9, 610–621. [Google Scholar]
  38. Kitajima, H.; Numata, N.; Yamamoto, K.; Goi, Y. Prediction of Automobile Driver Sleepiness (1st Report, Rating of Sleepiness Based on Facial Expression and Examination of Effective Predictor Indexes of Sleepiness). Trans. Jpn. Soc. Mech. Eng. (C Ed.) 1997, 63, 93–100. (In Japanese) [Google Scholar]
  39. Uchiyama, Y.; Sawai, S.; Omi, T.; Yamauchi, K.; Tamura, K.; Sakata, T.; Nakajima, K.; Sakai, H. Convergent validity of video-based observer rating of drowsiness, against subjective, behavioral, and physiological measures. PLoS ONE 2023, 18, e0285557. [Google Scholar] [CrossRef]
  40. Nagumo, K.; Kobayashi, T.; Oiwa, K.; Nozawa, A. Face Alignment in Thermal Infrared Images Using Cascaded Shape Regression. Int. J. Environ. Res. Public Health 2021, 18, 1776. [Google Scholar] [CrossRef]
  41. Nagumo, K.; Oiwa, K.; Nozawa, A. Spatial normalization of facial thermal images using facial landmarks. Artif. Life Robot. 2021, 26, 481–487. [Google Scholar] [CrossRef]
  42. Sundelin, T.; Lekander, M.; Kecklund, G.; Van Someren, E.J.W.; Olsson, A.; Axelsson, J. Cues of fatigue: Effects of sleep deprivation on facial appearance. Sleep 2013, 36, 1355–1360. [Google Scholar] [CrossRef] [PubMed]
  43. Matsukawa, T.; Ozaki, M.; Nishiyama, T.; Imamura, M.; Kumazawa, T.; Iijima, T. Comparison of changes in facial and thigh skin blood flows during general anesthesia. Anesthesiology 1997, 86, 603–609. [Google Scholar]
  44. Aristizabal-Tique, A.; Farup, I.; Hovde, Ø.; Khan, R. Facial skin color and blood perfusion modulations induced by mental and physical workload: A multispectral imaging study. J. Biomed. Opt. 2023, 28, 035002. [Google Scholar]
  45. Ahlström, C.; Fors, C.; Anund, A. Comparing driver sleepiness in simulator and on-road studies using EEG and HRV. Accid. Anal. Prev. 2018, 111, 238–246. [Google Scholar]
  46. Caldwell, J.A.; Hall, S.; Erickson, B.S. Physiological correlates of drowsiness during simulator and real driving. Front. Hum. Neurosci. 2020, 14, 574890. [Google Scholar]
  47. Morales, E.; Navarro, J.; Diaz-Piedra, C. Physiological correlates of fatigue in simulated and real driving: Heart rate and skin temperature comparisons. Appl. Ergon. 2021, 93, 103375. [Google Scholar]
  48. Takano, M.; Nagumo, K.; Lamsal, R.; Nozawa, A. Correction of Artifacts from Environmental Temperature and Time-of-day Variations in Facial Thermography for Field Workers’ Health Monitoring. Adv. Biomed. Eng. 2025, 14, 54–61. [Google Scholar] [CrossRef]
Figure 1. The experimental setup.
Figure 2. The driving screen.
Figure 3. The seat of the driving simulator.
Figure 4. An example of input near-infrared images after spatial normalization.
Figure 5. Average of overall accuracy (%) (N = 10). Error bars represent the standard error during 10-fold cross-validation.
Figure 6. Average Grad-CAM activation maps (Pattern A). Higher activations were found primarily in the periocular region, with additional responses along the nasal dorsum, perioral area, and forehead.
Table 1. Representative studies on driver monitoring based on behavioral and physiological indicators.

Main Indicator/Input | Model/Method | Key Findings/Characteristics | Main Limitation
Blink kinematics [18] | Statistical/regression analysis | Blink parameters associated with degraded driving performance after sleep deprivation | Mainly detects advanced drowsiness; early sensitivity limited
Behavioral indicators (mirror checks, head/body posture) [17] | Statistical feature analysis | Behavioral indicators such as mirror-check frequency linked to drowsiness progression (no significant difference) | Weak changes in early drowsiness stage
Facial landmarks (68 points) [13] | Deep learning | Classified driver behaviors from facial landmark dynamics using infrared facial images | Sequence dependence; limited instantaneous response
Behavioral indicators [12] | Simulation dummy design/system evaluation | Developed a driver-monitoring evaluation dummy and validated DMS performance under fatigue scenarios | Hardware-oriented validation; not a direct learning model
Temporal gaze variation [16] | On-road experiment with multimodal alerts | Investigated gaze behavior and visual attention during distraction countermeasures | Not targeted at subtle or short-term drowsiness
EEG power ratios (θ/α, etc.) [36] | Deep learning | Demonstrated deep-learning-based vigilance estimation from EEG signals under sleep deprivation | Requires electrode attachment; noise-sensitive
Prefrontal hemoglobin changes [37] | LDA/SVM | Hybrid EEG–fNIRS framework; reported vigilance/drowsiness-related cerebral hemodynamic changes | Bulky device; long temporal response
Table 2. Experimental protocol and durations of each phase.

Phase | Duration | Eye State
Rest 1 | 1 min | Eyes open
Rest 1 | 1 min | Eyes closed
Rest 1 | 1 min | Eyes open
Driving | 40 min | Not controlled
Rest 2 | 1 min | Eyes open
Rest 2 | 1 min | Eyes closed
Rest 2 | 1 min | Eyes open
Table 3. Drowsiness rating criteria based on facial expression by the drowsiness assessment method [38].

Drowsiness Level | State | Example of Facial Expression and Motions
Level 1 | Awake | Eye movements are quick and frequent; blink cycles are stable at approximately two per two seconds; body motions are active
Level 2 | Slightly drowsy | Lips are parted; eye movements are slow
Level 3 | Drowsy | Blinks are slow and frequent; re-positions body on seat; touches hand to face
Level 4 | Very drowsy | Blinks assumed to occur consciously; shakes head; frequently yawns
Level 5 | Extremely drowsy | Eyelids closing; leans head back and forth
Table 4. Number of samples for each subject by drowsiness level (original 5-level ratings [38]).

Subject | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
SubA | 440 | 880 | 680 | 400 | 0
SubB | 480 | 280 | 560 | 1080 | 0
SubC | 340 | 1580 | 480 | 0 | 0
SubD | 640 | 620 | 1140 | 0 | 0
SubE | 320 | 280 | 880 | 920 | 0
SubF | 480 | 1040 | 460 | 360 | 60
SubG | 580 | 760 | 1060 | 0 | 0
SubH | 320 | 200 | 400 | 1160 | 320
SubI | 340 | 560 | 720 | 760 | 20
SubJ | 320 | 180 | 180 | 1480 | 240
Table 5. Redefinition of levels in each pattern.

Drowsiness Level | Pattern A | Pattern B | Pattern C
Level 1 | Low | Low | Low
Level 2 | High | Low | Medium
Level 3 | High | High | High
Level 4 | High | High | High
Level 5 | High | High | High
Table 6. Number of samples for each subject after redefinition of levels (Patterns A, B, C).

Subject | Pattern A: Low | Pattern A: High | Pattern B: Low | Pattern B: High | Pattern C: Low | Pattern C: Medium | Pattern C: High
SubA | 440 | 1960 | 1320 | 1080 | 440 | 880 | 1080
SubB | 480 | 1920 | 760 | 1640 | 480 | 280 | 1640
SubC | 340 | 2060 | 1920 | 480 | 340 | 1580 | 480
SubD | 640 | 1760 | 1260 | 1140 | 640 | 620 | 1140
SubE | 320 | 2080 | 600 | 1800 | 320 | 280 | 1800
SubF | 480 | 1920 | 1520 | 880 | 480 | 1040 | 880
SubG | 580 | 1820 | 1340 | 1060 | 580 | 760 | 1060
SubH | 320 | 2080 | 520 | 1880 | 320 | 200 | 1880
SubI | 340 | 2060 | 900 | 1500 | 340 | 560 | 1500
SubJ | 320 | 2080 | 500 | 1900 | 320 | 180 | 1900
Total | 4260 | 19,740 | 10,640 | 13,360 | 4260 | 6380 | 13,360
Table 7. Number of samples for each subject after downsampling to the within-pattern class minimum (Patterns A, B, C).

Subject | Pattern A: Low | Pattern A: High | Pattern B: Low | Pattern B: High | Pattern C: Low | Pattern C: Medium | Pattern C: High
SubA | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubB | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubC | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubD | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubE | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubF | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubG | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubH | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubI | 320 | 320 | 480 | 480 | 180 | 180 | 180
SubJ | 320 | 320 | 480 | 480 | 180 | 180 | 180
Total | 3200 | 3200 | 4800 | 4800 | 1800 | 1800 | 1800

Note. Within each pattern, all classes were downsampled to the minimum class count observed in that pattern: Pattern A → 320; Pattern B → 480; Pattern C → 180.
Table 8. Construction of the CNN. Conv, MaxPool, Dropout, and FC indicate a convolutional layer, max-pooling layer, dropout layer, and fully connected layer, respectively.

Layer | Output Shape | Activation | Parameters
Conv1 | (259, 259, 32) | ReLU | 320
Conv2 | (257, 257, 64) | ReLU | 18,496
MaxPool1 | (128, 128, 64) | - | 0
Conv3 | (126, 126, 64) | ReLU | 36,928
MaxPool2 | (63, 63, 64) | - | 0
Conv4 | (61, 61, 64) | ReLU | 36,928
MaxPool3 | (30, 30, 64) | - | 0
Conv5 | (28, 28, 64) | ReLU | 36,928
MaxPool4 | (14, 14, 64) | - | 0
Dropout1 | (14, 14, 64) | - | 0
Flatten | (12,544) | - | 0
FC1 | (128) | ReLU | 1,605,760
Dropout2 | (128) | - | 0
FC2 (Output) | (2 or 3) | Softmax | 258
Total | | | 1,735,618
Table 9. Subject-wise classification accuracy (%) across three label assignment patterns (10-fold LOSO CV). Mean ± SD and Median are computed across subjects.

Subject | Pattern A | Pattern B | Pattern C
SubA | 47.3 | 64.0 | 45.2
SubB | 76.1 | 66.0 | 55.7
SubC | 92.5 | 63.3 | 59.1
SubD | 93.4 | 89.1 | 80.4
SubE | 98.1 | 97.7 | 97.4
SubF | 99.8 | 93.2 | 89.8
SubG | 99.7 | 95.2 | 97.0
SubH | 99.8 | 99.0 | 99.1
SubI | 99.7 | 97.5 | 97.0
SubJ | 100.0 | 99.8 | 98.7
Mean ± SD | 90.7 ± 16.9 | 86.5 ± 15.5 | 81.9 ± 20.8
Median | 98.9 | 94.2 | 93.4