In this section, we present the results of the evaluation of drowsiness detection in the inter-subject mode using the proposed approach. This evaluation aims to verify the generalizability of the model by testing its performance on subjects outside the training set. Results are analyzed in terms of performance metrics, such as accuracy, sensitivity, specificity, precision, and F-score. These metrics will allow us to assess the robustness and effectiveness of the approach under inter-subject conditions, where physiological variability is a critical factor. These results will help to validate the approach’s feasibility in practical applications to detect drowsiness in a real-life environment.
4.2. Combined CNN-SVM Model
In order to improve the classification of our drowsiness detection system, we apply the SVM technique to the features extracted by the CNN. To determine the optimal SVM kernel, we evaluate multiple options, including linear, polynomial, RBF, and sigmoid kernels.
Table 5 summarizes the classification results for each of these SVM kernels.
The results indicate that the RBF kernel was the most effective of those tested, with an improvement of more than 2%. While the linear, polynomial, and sigmoid kernels also performed well, the RBF kernel achieved an accuracy of 98.33%, owing to its ability to model non-linear decision boundaries. This is particularly important for detecting drowsiness, where EEG signal relationships tend to be complex and non-linear. Furthermore, the RBF kernel generalizes better on test data, reducing the likelihood of overfitting, and its flexibility enables it to pick up subtle variations in the signal. The SVM-RBF hyperparameters, the penalty factor C = 1 and the kernel factor γ, were selected using the GridSearch optimization tool.
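The kernel comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, replaces the CNN feature vectors with synthetic data from `make_classification`, and uses hypothetical subject identifiers and a hypothetical grid of C and gamma values. Folds are split by subject so that each validation fold contains only unseen subjects, in line with the inter-subject setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for CNN feature vectors (assumption: 240 samples, 64 features).
X, y = make_classification(n_samples=240, n_features=64, n_informative=10,
                           random_state=0)
# Hypothetical subject labels: 12 subjects with 20 samples each.
groups = np.repeat(np.arange(12), 20)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "poly", "rbf", "sigmoid"],
    "svc__C": [0.1, 1, 10],               # illustrative values only
    "svc__gamma": ["scale", 0.01, 0.001],  # ignored by the linear kernel
}

# GroupKFold keeps all samples of a subject in the same fold.
search = GridSearchCV(pipeline, param_grid, cv=GroupKFold(n_splits=4),
                      scoring="accuracy")
search.fit(X, y, groups=groups)

best_kernel = search.best_params_["svc__kernel"]
print("best kernel:", best_kernel)
print("best cross-validated accuracy:", round(search.best_score_, 4))
```

On synthetic data the winning kernel carries no meaning; the sketch only shows how the four kernels and their hyperparameters can be compared under one subject-wise cross-validation scheme.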
Moreover, replacing the final softmax layer of the CNN3 model with an SVM classifier led to an increase in accuracy of 0.53%. This improvement can be explained by the intrinsic differences between the two classifiers: while the softmax layer functions as a linear classifier optimized for binary classification through cross-entropy loss, it may struggle with complex, high-dimensional, and non-linearly separable EEG feature spaces. In contrast, the SVM with an RBF kernel is designed to find an optimal separating hyperplane in a transformed feature space, maximizing the margin between the two classes and effectively handling non-linear relationships. This allows the SVM to better capture subtle non-linear EEG patterns associated with drowsiness, especially in the high-dimensional feature space extracted by CNN layers, thus enhancing classification performance. The accuracy and loss curves (during training and test) for the CNN-SVM (RBF) model are shown in Figure 8.
Figure 9 illustrates the confusion matrix for the classification operation.
The evolution of the loss and accuracy curves highlights the efficient convergence of the model. The loss curve shows a rapid decrease during the initial epochs, stabilizing near zero after approximately 10 iterations. The validation loss follows a similar trend, with minor fluctuations in later epochs, suggesting good generalization capabilities.
Regarding accuracy, a sharp increase is observed during the first few epochs, reaching a plateau close to 99% by the tenth iteration. The absence of a significant gap between the training and validation curves indicates that the model neither underfits nor overfits. These results suggest optimal convergence and a strong ability to generalize to unseen data.
The analysis of the confusion matrix reveals high classification performance. The model correctly identified 116 samples from class 0 and 120 samples from class 1. In this context, class 0 refers to the alert state, while class 1 refers to the drowsy state. However, four samples from class 0 were misclassified as class 1, introducing false positives. Notably, no false negatives were observed, indicating perfect sensitivity in detecting class 1.
These results demonstrate high precision and recall, confirming the robustness of the model. The absence of false negatives is particularly beneficial for critical applications where detecting class 1 is essential. Although the presence of a few false positives may lead to unnecessary alerts, their low occurrence suggests that the model maintains reliable and robust classification performance.
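As a worked check, the standard metrics can be recomputed from the confusion-matrix counts reported above (116 true negatives, 4 false positives, 0 false negatives, 120 true positives). The function name `binary_metrics` is introduced here for illustration only.

```python
def binary_metrics(tn, fp, fn, tp):
    """Return common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall for the positive (drowsy) class
    specificity = tn / (tn + fp)   # recall for the negative (alert) class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Counts from the confusion matrix: class 0 = alert, class 1 = drowsy.
metrics = binary_metrics(tn=116, fp=4, fn=0, tp=120)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The accuracy evaluates to 236/240 ≈ 98.33%, which matches the reported figure. Sensitivity is 1.0, specificity is 116/120 ≈ 96.67%, precision is 120/124 ≈ 96.77%, and the F-score is 240/244 ≈ 98.36%.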
Figure 10 presents two complementary evaluation curves for the model: the precision–recall curve (left) and the ROC curve (right). The precision–recall curve, with an area under the curve (PR AUC) of 0.9986, lies almost entirely in the upper-right region, reflecting very high precision and recall across all decision thresholds. This shape indicates that the model maintains excellent precision even when recall is maximized, which is particularly relevant for potentially imbalanced datasets. The ROC curve, with an area under the curve (ROC AUC) of 0.9986, shows a nearly vertical rise followed by a plateau close to 1, illustrating an almost perfect discriminative ability between positive and negative classes. These results confirm that the model effectively detects the positive class while minimizing false positives, thereby demonstrating outstanding overall performance.
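For readers unfamiliar with the two summary values, the following sketch shows one common way to compute them from classifier scores. It is written in plain Python with illustrative function names and a four-sample toy example; the estimator used for the reported PR AUC (for example trapezoidal versus step-wise integration) is not stated in the text, so the step-wise average precision below is an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank   # precision at each recalled positive
    return total / hits

# Toy example: two alert (0) and two drowsy (1) samples with decision scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(toy_labels, toy_scores)
ap_value = average_precision(toy_labels, toy_scores)
print(auc_value, ap_value)
```

In the toy example, three of the four positive-negative pairs are ranked correctly, giving a ROC AUC of 0.75, and the average precision is (1 + 2/3)/2 ≈ 0.83. Values close to 1, as reported for the model, mean that almost every drowsy sample receives a higher score than almost every alert sample.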
4.4. Comparison with Transfer Learning Models
In this section, we compare the proposed 2D CNN + SVM approach with transfer learning models, specifically, VGG16, ResNet50, DeepConvNet, EEGNet, and ShallowConvNet, to evaluate its effectiveness for drowsiness detection in an inter-subject context. The goal is to demonstrate the advantages of the proposed method, particularly the integration of SVM in the classification phase, which enhances the system’s performance and robustness.
Transfer learning leverages pre-trained models trained on large datasets, enabling them to capture general features that can be fine-tuned for specific tasks like drowsiness detection. Transfer learning models provide strong initial feature extraction, which can be adapted to EEG data characteristics, offering a robust starting point for domain-specific tasks. By contrasting the results with established transfer learning models, we aim to highlight the superior accuracy, sensitivity, and F1-score of the 2D CNN + SVM approach, as illustrated in Table 7. This comparison underscores the importance of tailored feature extraction and classification techniques in improving drowsiness detection systems.
For the comparison of our proposed approach with state-of-the-art models including 1D-CNN, VGG16, ResNet50, DeepConvNet, EEGNet, and ShallowConvNet, we employed commonly accepted standard hyperparameters as reported in the EEG signal processing literature. All models were trained for 50 epochs using the Adam optimizer with learning rates adapted to each architecture to ensure fair and effective convergence. The selection of hyperparameters was conducted through a grid search procedure to optimize model performance systematically.
Specifically, the 1D-CNN was configured with a learning rate of 0.001, a batch size of 32, a dropout rate of 0.5, and kernel sizes of 64 and 32 in its convolutional layers with ReLU activations. VGG16 and ResNet50, pretrained on ImageNet, were fine-tuned with a learning rate of 1 × 10⁻⁴, batch size of 32, and dropout of 0.5 applied on fully connected layers; these models use 3 × 3 kernels for VGG16 and a combination of 7 × 7 initial kernels followed by bottleneck blocks for ResNet50, all with ReLU activations. The EEG-specific architectures DeepConvNet, EEGNet, and ShallowConvNet were trained with a learning rate of 0.001, batch size of 64, and dropout rates ranging from 0.25 to 0.5. DeepConvNet and EEGNet utilize ELU activation functions, while ShallowConvNet applies square and logarithmic non-linearities suited for oscillatory EEG features. Kernel sizes varied per model according to their original designs: DeepConvNet uses temporal and spatial kernels of size [1 × 10], EEGNet employs depthwise and separable convolutions with kernels of sizes [1 × 64] and [1 × 16], and ShallowConvNet uses a kernel size of [1 × 13]. Batch normalization was applied after convolution layers in all models to stabilize training. This standardized configuration ensured consistent and fair evaluation of all models on the same datasets, allowing for a rigorous comparison of performance.
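The training settings listed above can be collected in a single configuration table, which makes the per-model differences easier to audit. The structure and key names below are hypothetical and chosen for illustration; only the numeric values are taken from the text.

```python
# Shared settings stated in the text: 50 epochs, Adam optimizer.
COMMON = {"epochs": 50, "optimizer": "adam"}

# Per-model settings; dropout for the EEG-specific models is a (min, max) range.
BASELINES = {
    "1D-CNN":         {"lr": 1e-3, "batch_size": 32, "dropout": 0.5, "activation": "relu"},
    "VGG16":          {"lr": 1e-4, "batch_size": 32, "dropout": 0.5, "activation": "relu"},
    "ResNet50":       {"lr": 1e-4, "batch_size": 32, "dropout": 0.5, "activation": "relu"},
    "DeepConvNet":    {"lr": 1e-3, "batch_size": 64, "dropout": (0.25, 0.5), "activation": "elu"},
    "EEGNet":         {"lr": 1e-3, "batch_size": 64, "dropout": (0.25, 0.5), "activation": "elu"},
    "ShallowConvNet": {"lr": 1e-3, "batch_size": 64, "dropout": (0.25, 0.5), "activation": "square/log"},
}

def training_config(model_name):
    """Merge the shared settings with the model-specific ones."""
    return {**COMMON, **BASELINES[model_name]}

print(training_config("EEGNet"))
```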
4.5. Discussion
The findings of this study highlight the potential of the proposed EEG-based method for drowsiness detection, which achieves notable accuracy even when applied across different subjects. However, while the CWT successfully generated scalogram images that captured detailed time-frequency features, the method’s real-world practicality remains to be established. The reliance on the CWT, although beneficial in revealing the evolution of frequency content over time, may introduce complexity and computational overhead. Furthermore, while this approach improves on traditional methods of representing non-stationary signals such as EEG, its scalability and performance in more varied and less controlled environments still need to be proven, leaving room for further refinement and validation.
Although the use of CNNs for automatic feature extraction has shown great promise for detecting patterns associated with drowsiness, it is essential to critically evaluate the wider implications of this approach, particularly its impact on detection accuracy and system performance. The SVM allows efficient differentiation between drowsy and alert states in a cross-subject framework, improving the robustness of the model to individual variability. However, it is worth asking whether this combination of CWT, CNN, and SVM actually offers a superior solution to existing methods. Although the preliminary results are encouraging, further validation with diverse populations and under real-world conditions is needed to confirm that it consistently outperforms previous approaches.
In Table 8, we compare our proposed method with existing literature on drowsiness detection in an inter-subject setting. In [23], the authors employed a simple 1D CNN for drowsiness detection using 10-s EEG segments as input for classification. The final dense softmax layer of the CNN was responsible for decision-making, achieving an accuracy of 73.22%. In [24], the same approach was followed, except for a reduction in EEG segment length to 4 s. This modification resulted in a significant accuracy improvement to 95%, highlighting the crucial impact of segment length on detection performance.
In another study [25], the authors utilized EEG spectrograms for drowsiness detection, employing an RCNN model that achieved an accuracy of 88.39%. Similarly, in [26], a CNN-LSTM architecture was used to predict drowsiness states based on EEG spectrograms, reaching an accuracy of 82.73%. In [27], EEG epochs of 13 s were used as input features for a CNN-based model (CpNet), which achieved an accuracy of 86.44%. Finally, in [28], 5-s EEG epochs were directly fed into a CNN, resulting in an accuracy of 93%. In [61], the authors propose a deep learning model called AMD-GCN, which utilizes the power spectral density (PSD) of EEG signals filtered with a band-pass filter ([1–50] Hz) from 17 electrodes for drowsiness detection. The model was validated on the SEED-VIG database and achieved an overall accuracy of 89.94%. While this approach demonstrates strong performance in the field of drowsiness detection, it remains limited in terms of implementation on embedded systems due to its algorithmic complexity. Furthermore, the authors did not address inter-subject variability, as the evaluation was performed only in the intra-subject mode with a relatively high number of required electrodes.
In [62], the authors proposed a drowsiness detection approach focusing on minimizing inter-subject variability. In this context, they employed a Random Forest (RF) classifier, achieving an accuracy of 86%. The method was validated using the SEED-VIG database, extracting power spectral density (PSD) features from 8-s EEG epochs. While this approach is optimized from an implementation perspective, its accuracy remains relatively low, particularly in the context of subject-independent detection. Furthermore, the method focuses exclusively on a single type of feature (PSD), which may limit its generalization capability.
Table 8.
Comparison of different EEG-based drowsiness detection methods.
Ref | Method | Segment Size | Classifier | Database | Accuracy (%) |
---|---|---|---|---|---|
[24] | 1D EEG + CNN | 4 s | Fully connected layer with Softmax activation | SEED-VIG | 95.00 |
[25] | Spectrogram + RCNN | – | Fully connected layer with Softmax activation | SEED-VIG | 88.39 |
[26] | Spectrogram + CNN-LSTM | – | Fully connected layer with Softmax activation | SEED-VIG | 82.73 |
[63] | AGL-Net | – | Fully connected layer with Softmax activation | SEED-VIG | 87.30 |
[64] | VIG-Net | 8 s | Fully connected layer with Softmax activation | SEED-VIG | 95.00 |
[61] | AMD-GCN model | 10 s | Fully connected layer with Softmax activation | SEED-VIG | 89.94 |
[62] | ML + PSD | 8 s | RF | SEED-VIG | 86 |
Proposed | 2D CNN + SVM | 30 s | SVM | SEED-VIG | 95.8 |
The literature analysis shows that the proposed method compares favorably with existing approaches, reaching an accuracy of 95.8% on SEED-VIG (Table 8) and 98.33% on DROZY. We attribute this ability to mitigate inter-subject variability to several key factors. One crucial factor is the length of EEG epochs, which plays a significant role in drowsiness detection. The use of 30-s epochs provides a more comprehensive representation of vigilance states compared to shorter segments of 1 s, 4 s, 5 s, or 13 s.
Another major contributor to the improved accuracy is the use of scalograms, which offer a time-frequency representation of EEG signals derived from the Continuous Wavelet Transform (CWT). Unlike raw EEG signals, which are challenging to interpret due to their non-stationary nature, or spectrograms computed with the short-time Fourier transform, whose fixed window length imposes a single time-frequency resolution, scalograms capture temporal and spectral information at multiple resolutions. Additionally, CNNs are highly effective in processing images, making scalogram-based feature extraction particularly relevant. CNNs can automatically learn meaningful features associated with drowsiness, further enhancing classification performance.
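To make the scalogram idea concrete, the sketch below computes the magnitude of a CWT with a complex Morlet wavelet using NumPy only. The wavelet family, the parameter `w0 = 6`, the 128 Hz sampling rate, and the 1–30 Hz frequency grid are all assumptions made for illustration; the text does not specify the settings actually used.

```python
import numpy as np

def morlet_scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| with shape (len(freqs), len(signal)) for a complex Morlet wavelet."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        sigma_t = w0 / (2.0 * np.pi * f)         # envelope width in seconds
        half = int(np.ceil(4.0 * sigma_t * fs))  # truncate the wavelet at +/- 4 sigma
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        # Convolution with this wavelet equals correlation with its conjugate.
        out[i] = np.abs(np.convolve(signal, wavelet, mode="same"))
    return out

fs = 128                              # assumed sampling rate (Hz)
t = np.arange(30 * fs) / fs           # one 30-s epoch
epoch = np.sin(2 * np.pi * 10 * t)    # synthetic 10 Hz (alpha-band) oscillation
freqs = np.arange(1, 31)              # 1 to 30 Hz in 1 Hz steps
scalogram = morlet_scalogram(epoch, fs, freqs)
peak_freq = freqs[int(np.argmax(scalogram.mean(axis=1)))]
print(scalogram.shape, peak_freq)
```

Because the envelope width scales with 1/f, low frequencies are analyzed with long windows and high frequencies with short ones, which is the multi-resolution property that distinguishes a scalogram from a fixed-window spectrogram. The resulting 2D array can be rendered as an image and passed to a CNN.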
Finally, the choice of a classification model that excels in binary classification, while effectively handling non-linearity and high-dimensional data, significantly improves detection accuracy compared to a simple dense softmax layer. Through the analysis of previous studies, we observe that one of the most widely used datasets for drowsiness detection research is the SEED-VIG database.
To assess the ability of our approach to minimize inter-subject variability and ensure robust generalization, we evaluated its performance on the SEED-VIG database. This dataset contains EEG recordings from 21 subjects, collected using 12 electrodes (CP1, CPz, CP2, P1, Pz, P2, PO3, POz, PO4, O1, Oz, and O2). Each participant underwent a two-hour driving simulation during two key periods of the day—the afternoon and the evening—when drowsiness levels are most likely to vary. From these recordings, 5040 scalograms were generated for analysis. The results show a slight performance degradation compared to the original DROZY dataset, with an accuracy of 95.8%, a sensitivity of 94.7%, and an F1-score of 95.4%, against 98.33% accuracy on DROZY. This performance drop can be primarily attributed to the difference in electrode configurations between the two datasets, particularly the absence of the C3 and C4 derivations in SEED-VIG. These central electrodes are known to capture cortical activity highly relevant to vigilance regulation and early drowsiness onset, and their absence reduces the discriminative information available to the classifier.
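The reported number of scalograms is consistent with one scalogram per non-overlapping 30-s epoch of a two-hour recording for each of the 21 subjects. This segmentation scheme is an assumption inferred from the figures in the text, not a stated procedure.

```python
subjects = 21
recording_seconds = 2 * 3600   # two-hour driving simulation
epoch_seconds = 30             # epoch length used by the proposed method

# Assumption: non-overlapping epochs, one scalogram per epoch and subject.
epochs_per_subject = recording_seconds // epoch_seconds
total_scalograms = subjects * epochs_per_subject
print(epochs_per_subject, total_scalograms)
```

This gives 240 epochs per subject and 5040 scalograms in total.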
To further investigate the robustness and generalizability of our approach across different experimental conditions and subject populations, an additional validation was conducted using an independent dataset collected at the Sahloul University Hospital (Monastir, Tunisia). This dataset comprises 45 hours of EEG recordings from eight healthy subjects aged between 21 and 25 years, all without a history of alcoholism or drug use. Data collection was performed at the Vigilance and Sleep Center of the Faculty of Medicine in Monastir, following an experimental protocol approved by the faculty’s Ethics Committee. All participants signed an informed consent form prior to the experiment, in compliance with ethical research standards. EEG signals were recorded from 19 channels (Fp1, Fp2, F2, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2).
For validation purposes, only the C3 and C4 electrodes were retained, ensuring consistency with our minimal-electrode configuration. On this dataset, the proposed method achieved an accuracy of 97.83%, which is only marginally lower than the performance obtained on the DROZY database. These results demonstrate that the approach is capable of maintaining high detection performance even when applied to EEG data collected from different populations, under varying recording setups, and in diverse environments. The ability to achieve comparable performance with minimal electrodes across heterogeneous datasets strongly supports our claim that the proposed method effectively mitigates inter-subject variability and remains suitable for practical drowsiness monitoring applications, where electrode configurations and acquisition protocols may differ from one scenario to another.
To assess whether the observed differences in performance across datasets were statistically significant, we conducted independent t-tests with corresponding p-value analysis. The comparison between the DROZY (98.33%) and SEED-VIG (95.8%) datasets revealed a highly significant difference (t = 23.72, p < 0.001), confirming that the absence of C3 and C4 electrodes in SEED-VIG had a measurable impact on performance. A smaller but significant difference was also found between DROZY (98.33%) and the Sahloul dataset (97.83%) (t = 5.56, p < 0.001). Finally, SEED-VIG and Sahloul also showed a statistically significant difference (t = −18.33, p < 0.001), highlighting the role of electrode configurations and acquisition protocols in shaping classification outcomes. Despite these statistical differences, all datasets consistently achieved accuracies above 95%, demonstrating that the proposed method remains robust and generalizable across heterogeneous populations and experimental conditions.
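The form of an independent two-sample t-test can be sketched as follows. The text does not state which samples were compared (for example per-fold or per-run accuracies), so the accuracy lists below are purely hypothetical placeholders and are not expected to reproduce the reported t values. The sketch uses the pooled-variance (Student) statistic; the function name `pooled_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

# Hypothetical per-run accuracies (%), for illustration only.
drozy_runs = [98.1, 98.4, 98.3, 98.5, 98.3]
seed_vig_runs = [95.6, 95.9, 95.8, 96.0, 95.7]

t_stat, dof = pooled_t(drozy_runs, seed_vig_runs)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

The p-value follows from the t distribution with the returned degrees of freedom; in practice a library routine such as `scipy.stats.ttest_ind` computes both the statistic and the p-value in one call.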