Driver Monitoring System Using Computer Vision for Real-Time Detection of Fatigue, Distraction and Emotion via Facial Landmarks and Deep Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The demographic and driving profiles of the 27 participants (e.g., age, gender, driving experience, cultural background) are not reported. This omission limits the generalizability of the findings regarding emotion recognition and behavior detection, as these factors can significantly influence facial expressiveness and behavior patterns. Please include a summary table of participant characteristics.
The emotion recognition model was trained on the RAF-DB dataset, which contains exaggerated expressions of fear. This does not align with the subtle, often suppressed expressions likely occurring in real driving scenarios, as evidenced by the 0% detection rate for fear. Direct application without domain adaptation or fine-tuning on in-vehicle expression data limits the model's practical utility. Consider using a driving-specific emotion dataset or applying transfer learning techniques.
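A minimal fine-tuning sketch along these lines is given below; it assumes a ResNet-18 backbone and a hypothetical in-vehicle expression dataset, since the paper's actual architecture and data pipeline may differ. The RAF-DB-pretrained features are frozen and only the classification head is adapted to driving-context expressions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7  # assumption: the basic RAF-DB emotion classes

# Assumed starting point: a ResNet-18 whose weights were trained on RAF-DB
# (hypothetical checkpoint name; the authors' actual backbone may differ).
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)
model.load_state_dict(torch.load("rafdb_emotion_backbone.pth"))

# Freeze the RAF-DB feature extractor; adapt only the classifier head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_on_in_vehicle_data(loader, epochs=5):
    """loader: a DataLoader over in-cabin face crops with emotion labels (hypothetical)."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```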
The use of fixed thresholds for EAR (0.23) and MAR does not account for individual anatomical differences (e.g., eye shape, facial structure). This leads to misclassifications, as noted for participants with smaller eyes. Implementing a brief personal calibration phase or adaptive thresholding mechanisms would likely improve robustness and reduce false positives.
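A brief calibration phase of this kind is cheap to add. The sketch below is one illustrative way to do it, assuming per-frame EAR values are already computed from six eye landmarks as in Soukupová and Čech; the scaling factor k is a tunable assumption, not a value from the paper.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (each a 2-D point)."""
    p1, p2, p3, p4, p5, p6 = [np.asarray(p, dtype=float) for p in eye]
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def calibrated_ear_threshold(open_eye_ears, k=0.7):
    """Personalized closure threshold derived from a few seconds of eyes-open
    EAR samples collected at session start, instead of a fixed 0.23.
    Drivers with naturally smaller eyes get a proportionally lower cutoff."""
    baseline = float(np.median(open_eye_ears))
    return k * baseline

# Usage (illustrative): collect ~5 s of eyes-open samples, then compare each new EAR.
# threshold = calibrated_ear_threshold(calibration_samples)
# eyes_closed = current_ear < threshold
```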
The distraction detection logic classifies head rotations as distractions without contextual awareness. In normal driving, looking at mirrors or checking traffic signs is essential and safe. Treating all rotations equally will generate false alarms. Integrating vehicle dynamic data (e.g., steering angle, turn signal status) could help distinguish necessary from unnecessary head movements.
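One way to implement such gating is sketched below; the vehicle-signal fields and the numeric limits are illustrative assumptions, since the paper does not describe access to CAN-bus data.

```python
from dataclasses import dataclass

@dataclass
class VehicleContext:
    turn_signal_on: bool        # hypothetical CAN-bus / OBD fields
    steering_angle_deg: float

def is_unsafe_head_turn(head_yaw_deg, off_road_gaze_s, ctx,
                        yaw_limit=30.0, steering_limit=15.0, duration_limit=2.0):
    """Flag a head rotation as a distraction only when no driving maneuver explains it."""
    if abs(head_yaw_deg) < yaw_limit:
        return False                         # brief mirror or shoulder checks are tolerated
    if ctx.turn_signal_on or abs(ctx.steering_angle_deg) > steering_limit:
        return False                         # lane change or turn in progress: looking aside is expected
    return off_road_gaze_s > duration_limit  # sustained off-road gaze without maneuver context
```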
The system evaluation employs two distinct datasets: RAF-DB for emotion and the Driver Inattention Detection Dataset for fatigue/distraction. These datasets differ in context, collection environment, and annotation standards. This inconsistency makes it difficult to assess the integrated system's performance cohesively. A unified evaluation framework on a consistent dataset is needed.
Conducting some tests outdoors with variable lighting conditions introduces an uncontrolled variable that negatively impacts the stability of facial landmark detection (a core component for EAR, MAR, and pose). The results in these conditions may not reflect the system's reliable performance. Lighting conditions should be controlled or systematically varied and reported.
The hardware specifications of the camera used for data capture (model, resolution, frame rate) are not provided. This lack of detail hinders the reproducibility of the experiments. Please specify the acquisition setup to allow for replication and comparison.
The validation was conducted in short-duration simulated or static settings. The system's performance under prolonged, real-world driving conditions—where fatigue accumulates and distraction patterns evolve—remains unverified. Long-term testing in actual driving scenarios is crucial to assess practical endurance and reliability.
While a 100% accuracy rate is reported for distraction detection, the tested scenarios appear limited to deliberate gaze deviations (left, right, up, down). It is unclear if common, more nuanced distracting activities (e.g., mobile phone use, eating, conversing with passengers) were included. The claim of perfect detection should be qualified, and the scope of tested distractions explicitly defined.
The study lacks a comparative analysis with state-of-the-art baseline methods or alternative architectures under the same experimental conditions (e.g., using the same subset of data or metrics). While related works are cited, a direct quantitative comparison is absent, making it difficult to gauge the relative advancement of the proposed multi-task system over single-task alternatives.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Authors are requested to review their proposal in cases where the eyes are small, as in some cases this led to errors in the EAR estimation.
Review the EAR, MAR, and pitch thresholds used in the proposal, as relying on constant values introduces limitations.
Authors should review the MAR metric, as it may lead to lower accuracy.
Why were photographs with low light used?
Authors should review sudden changes in brightness, as this affects the proposal and could generate false positives.
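A common mitigation, sketched below under the assumption that OpenCV is used for frame preprocessing, is to normalize local contrast before landmark detection and to smooth per-frame EAR values over a short window, so that a single brightness spike cannot by itself trigger an alert.

```python
import cv2
import numpy as np
from collections import deque

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
ear_history = deque(maxlen=15)   # roughly half a second at 30 fps

def normalize_illumination(frame_bgr):
    """Equalize local contrast on the luma channel to damp sudden brightness changes."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

def smoothed_ear(new_ear):
    """Median over a short window; one corrupted frame cannot flip the decision."""
    ear_history.append(new_ear)
    return float(np.median(ear_history))
```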
Authors should review the positions used for recognition to make the proposal more efficient.
Authors should review the frame capture with respect to focus, so that it remains consistent in the frame-by-frame capture and does not discard temporal values.
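One illustrative way to preserve that temporal information is a sliding-window vote over per-frame labels, so that a few out-of-focus or failed detections are bridged rather than discarded; the window length and label values below are assumptions.

```python
from collections import Counter, deque

state_window = deque(maxlen=30)  # about one second of per-frame labels at 30 fps

def stable_state(frame_label):
    """Majority vote over recent frames; frames where landmark detection failed
    (frame_label is None) are skipped instead of resetting the decision."""
    if frame_label is not None:
        state_window.append(frame_label)
    if not state_window:
        return "unknown"
    label, count = Counter(state_window).most_common(1)[0]
    return label if count > len(state_window) // 2 else "uncertain"
```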
Authors should review the analysis of micro-expressions.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I believe the authors have addressed my concerns.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors explain and develop each of the observations.
