Peer-Review Record

A Signal Normalization Approach for Robust Driving Stress Assessment Using Multi-Domain Physiological Data

by Damiano Fruet 1,*, Chiara Barà 2, Riccardo Pernice 2, Marta Iovino 2, Luca Faes 2,3 and Giandomenico Nollo 1
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 29 August 2025 / Revised: 9 October 2025 / Accepted: 14 October 2025 / Published: 28 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

“A Signal Normalization Approach for Robust Driving Stress Assessment Using Multi-Domain Physiological Data” is a research paper that presents interesting findings. However, there are areas that could benefit from improvement.

  1. Line 51. The statement about invasiveness should be corrected. There are many non-invasive devices for measuring stress (https://doi.org/10.1002/adma.202211595)
  2. Line 52. Continuous monitoring is not difficult to carry out. There are models for measuring stress levels (https://doi.org/10.1002/adma.202211595).
  3. Some back matter should be included. The main ones are “Author contribution” and “ethical statement”. For the latter, please provide information on data collection.
  4. Line 470. Please explain why sensitivity and accuracy have increased.
  5. The discussion should be improved through critical comparison with known approaches. This research is interesting and has potential for publication after revision.

Author Response

In the revised manuscript, we considered all the comments and the suggestions, and we addressed the reviews point by point. Please find our answers in red.

General Comment - “A Signal Normalization Approach for Robust Driving Stress Assessment Using Multi-Domain Physiological Data” is a research paper that presents interesting findings. However, there are areas that could benefit from improvement.

Response General Comment: Thank you for your feedback, which we greatly value. We appreciate your positive assessment of the proposed work.

Comments 1 - Line 51. The statement about invasiveness should be corrected. There are many non-invasive devices for measuring stress (https://doi.org/10.1002/adma.202211595).

Response 1: While advances in technology and materials have led to the miniaturization of devices and optimized control systems, stress assessment based on biomarkers still relies on invasive (though often minimally so) and non-reusable devices. We thank you for the suggestion and have incorporated this recent development into the manuscript.

Comments 2 - Line 52. Continuous monitoring is not difficult to carry out. There are models for measuring stress levels. (https://doi.org/10.1002/adma.202211595)

Response 2: We have revised the text and added the suggested reference. This addresses the point and completes the discussion begun in Comment 1.

Comments 3 - Some back matter should be included. The main ones are “Author contribution” and “ethical statement”. For the latter, please provide information on data collection.

Response 3: We have provided all the necessary details, including the Author Contribution statement. The information requested for the ethical statement (specifically regarding data collection) is included under the heading Data Availability in the Back Matter, where we clarify that the data were obtained from an open-source dataset.

Comments 4 - Line 470. Please explain why sensitivity and accuracy have increased.

Response 4: The increases in sensitivity and accuracy are a direct result of the inter-subject normalization procedure, which reduced natural physiological differences between subjects and aligned their features to a common domain. This consistency allows the model to better separate stress-related changes from inter-subject physiological variability, which was most prominent in the original (non-normalized) data, leading to more accurate identification of stress instances. We have added a statement to the manuscript to better explain this concept.

Comments 5 - The discussion should be improved through critical comparison with known approaches. This research is interesting and has potential for publication after revision.

Response 5: The Discussion section has been revised according to your feedback to include a more critical comparison with known approaches. The revised text now explicitly addresses the limitations of traditional normalization (standardization and scaling) by defining their domain and mechanism, which typically result in each feature being treated in isolation. We then introduce our novel approach as a direct contrast, focusing on its operation in the time domain and its ability to establish strong interconnections among features. Finally, we propose a hybrid model combining the strengths of inter-subject (time) normalization and amplitude (e.g., z-score) normalization.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear corresponding Author, thank you for submitting your work to Eng journal and congratulations on the research.

Brief Summary: The study proposes inter-subject normalization of ECG and respiratory signals for driving stress classification, comparing it with traditional techniques on a database of 10 drivers.

General Comments: The sample of 10 subjects appears too limited to robustly validate the proposed methodology. The lack of frequency-domain analysis limits the completeness of the approach, considering that frequency features are often crucial in stress assessment. The 5% improvement in accuracy, although statistically significant, appears modest.

Specific Comments:

  • Lines 78-84: The discussion on inter-subject variability needs greater bibliographic support specific to the cited physiological ranges.
  • Lines 141-143: The exclusion of 6 subjects (37.5% of the original sample) requires a detailed analysis of its impact on the generalizability of the results. The exclusion criteria are not clear to me.
  • Lines 217-218: The value of 70 bpm for cardiac standardization needs more thorough physiological justification. This does not seem like a correct evaluation; moreover, there are no data about these subjects: age, weight, sex, previous or current training status (if any). So, in my opinion, these are not useful numbers.
  • Line 256: Similarly, the choice of 14 breaths/minute requires specific literature support, because it is a value taken out of context without knowing the participants.
  • Tables 1-2: The resampling frequencies vary remarkably (209-393 Hz); this extreme range could introduce artifacts that are not discussed.
  • Figure 2: The confusion matrix shows significant misclassification between highway and city, suggesting method limitations not adequately addressed.
  • Line 495-504: The comparison with previous studies is inadequate, direct controls with alternative normalization methodologies are missing and everything appears confusing.

In conclusion, I believe that the work is extremely weak and very confusing. I think the authors should completely revise their work, because in its current form it does not appear reproducible or publishable.

Author Response

In the revised manuscript, we considered all the comments and the suggestions, and we addressed the reviews point by point. Please find our answers in red.

Comments 1 - The sample of 10 subjects appears too limited to robustly validate the proposed methodology. The lack of frequency-domain analysis limits the completeness of the approach, considering that frequency features are often crucial in stress assessment. The 5% improvement in accuracy, although statistically significant, appears modest.

Response 1: We sincerely appreciate your thorough review and constructive comments, which have provided valuable insights for improving our manuscript. We have addressed each point below.

  1. 10 subject samples: I agree that a sample of 10 subjects may appear to be a limitation. However, I first need to correct a misunderstanding: the number 10 refers to the number of acquisitions, not the number of subjects, a point I regret not specifying clearly in the manuscript. The total number of acquisitions was 16. Each acquisition lasted between 50 and 90 minutes, which resulted in a large amount of data. We chose this dataset because it is widely used for stress assessment, both by the research group that originally collected the data and by other research groups. As in other work using this same dataset for stress detection (e.g., doi: 10.3390/diagnostics13111897 and doi: 10.3390/s21072381), it was necessary to exclude some acquisitions. These were excluded because they did not meet the criteria of including all the considered signals and the necessary temporal references to the stress periods. Unfortunately, no more data were available for this specific analysis, and our approach and resulting data size are consistent with other studies performing stress assessment on this dataset.
  2. Frequency domain assessment: We completely agree that frequency-domain features possess significant potential for stress assessment and could further enhance our model's performance. Our decision to focus solely on time-domain features in this initial work was deliberate, driven by two key factors. First, our goal was to rigorously validate the efficacy of the novel raw-signal inter-subject normalization procedure by isolating its effect, using time-domain features to provide a clear proof of concept before introducing the complexity of frequency transformations. Second, limiting the feature set allowed us to use smaller, non-overlapping windows, which generated a greater number of samples for classification, thereby supporting a robust training process. This is a common and validated approach, as shown by other studies that obtain promising results using only time-domain features (e.g., doi: 10.1109/ICIRCA48905.2020.9183244). Nevertheless, as acknowledged in the Discussion and Conclusion, future work will focus on incorporating and rigorously evaluating the contribution of frequency-domain features, especially when integrated with our normalization approach.
  3. 5% improvement in accuracy: While a 5-percentage-point increase may seem modest, we consider this enhancement highly significant, reflecting the strong contribution of our methodology. This improvement raises the overall accuracy from a baseline of 68% to 73% in a challenging, multilevel physiological classification task. However, the primary significance lies in the qualitative demonstration that our novel inter-subject normalization procedure is an effective and robust technique for mitigating substantial basal physiological variability directly on the raw signal. Previous studies that did not address this variability at the raw-data level often lost critical information. Our preprocessing step ensures that all extracted features are normalized directly, a benefit that traditional feature-based methods (such as standardization and scaling) could not match, as shown by our statistical comparisons. Furthermore, the methodology has a low computational cost, acting as an efficient preprocessing step. In challenging scenarios such as the driving stress context examined here, this reduction in misclassification represents an important step forward for stress-monitoring systems.
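To make the size of this change concrete, the short calculation below uses only the 68% and 73% accuracies quoted in point 3 and separates the absolute gain (in percentage points) from the relative gain and from the relative reduction in the error rate (taken here as 100% minus accuracy). It is a worked example of the arithmetic, not an additional result; the function name is illustrative.

```python
def improvement_summary(baseline_acc, new_acc):
    """Compare two accuracies given in percent.

    Returns the absolute gain (percentage points), the relative gain (%)
    and the relative reduction of the error rate (%), where the error
    rate is assumed to be 100 - accuracy.
    """
    abs_gain_pp = new_acc - baseline_acc
    rel_gain_pct = 100.0 * abs_gain_pp / baseline_acc
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    rel_err_reduction_pct = 100.0 * (baseline_err - new_err) / baseline_err
    return abs_gain_pp, rel_gain_pct, rel_err_reduction_pct


if __name__ == "__main__":
    gain, rel, err_red = improvement_summary(68.0, 73.0)
    print(f"{gain:.1f} pp gain, {rel:.2f}% relative gain, "
          f"{err_red:.2f}% fewer misclassifications")
```

With these figures, the 5-point gain corresponds to roughly a 7.4% relative increase in accuracy and roughly a 16% relative reduction in misclassified windows.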

 

Comments 2 - Line 78-84: The discussion on inter-subject variability needs greater bibliographic support specific for the cited physiological ranges.

Response 2: We have strengthened the bibliographic support and added more context regarding the normal resting heart rate (lines 79-86).

Comments 3 - Lines 141-143: The exclusion of 6 subjects (37.5% of the original sample) requires a detailed analysis of its impact on the generalizability of the results. The exclusion criteria are not clear to me.

Response 3: As mentioned in Response 1, we chose this dataset because it is widely used for developing stress assessment algorithms. Unfortunately, some data in the dataset are inconsistent, meaning that at least one of the following applies:

  • No ECG data for that specific acquisition.
  • No respiratory data for that specific acquisition.
  • No synchronization between different physiological data.
  • No label associated with the driving phase, making it impossible to determine the subject's correct stress status.

Other studies also confirm that some acquisitions must be excluded for these reasons (e.g., doi: 10.3390/diagnostics13111897 and doi: 10.3390/s21072381). However, the total number of samples is sufficient for a machine learning analysis, as the acquisitions last from 60 to 90 minutes. Additionally, our use of 20-second windows further increases the number of samples considered.
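As a rough illustration of the sample counts involved, the sketch below counts 20-second windows per acquisition for the 60 and 90 minute durations quoted above. It assumes non-overlapping windows (as stated in the response to Comment 1) and, as an additional assumption, that an incomplete trailing window is discarded; the function name is illustrative.

```python
def count_windows(duration_s, window_s=20.0, step_s=None):
    """Number of complete windows of length window_s in a recording lasting
    duration_s seconds. step_s defaults to window_s, i.e. non-overlapping
    windows; an incomplete trailing window is discarded."""
    if step_s is None:
        step_s = window_s
    if duration_s < window_s:
        return 0
    return int((duration_s - window_s) // step_s) + 1


for minutes in (60, 90):
    print(minutes, "min ->", count_windows(minutes * 60), "windows")
```

Under these assumptions, a single acquisition yields between 180 (60 minutes) and 270 (90 minutes) windows; choosing a step shorter than the window (overlapping windows) would increase the count further.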

Comments 4 - Line 217-218: The value of 70 bpm for cardiac standardization needs more thorough physiological justification. This doesn't seem like a correct evaluation, moreover there is no data about these subjects: age, weight, sex, previous or current training status (if it exists). So these are not useful numbers in my opinion.

Response 4: The proposed novel inter-subject normalization procedure does not depend on the specific resting heart rate chosen, as it normalizes each subject individually in the same manner. We chose 70 bpm for this paper to align with the normal range reported in the Introduction, but this value has a mathematical role in the normalization process rather than representing a physiological assumption about any single subject. A higher or lower reference heart rate could equally be selected. This choice translates into a different resampling frequency for each subject, ultimately bringing all subjects into the same domain with a standardized resting heart rate. During stress conditions, each subject's normalized heart rate changes in proportion to its own resting value, now referred to the common 70 bpm baseline. The choice of 70 bpm may have significant effects on features derived from the frequency domain, an area that warrants in-depth investigation in future work. The current aim of this paper, however, is to focus specifically on the novel inter-subject normalization procedure and demonstrate its applicability to a real-world task.
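The proportionality described above can be written as HR_norm = HR × HR_target / HR_rest, where HR_rest is the subject's resting heart rate and HR_target is the common reference (70 bpm here). The minimal sketch below assumes exactly this linear mapping, which is what a pure time-axis rescaling would produce; the function name and the two subjects' values are hypothetical and are not taken from the dataset.

```python
def normalized_rate(rate, resting_rate, target_rate=70.0):
    """Map a rate (bpm or breaths/min) onto the common domain in which the
    subject's resting rate equals target_rate.

    Assumes a pure time-axis rescaling, so every rate of that subject is
    multiplied by the same factor target_rate / resting_rate.
    """
    if resting_rate <= 0:
        raise ValueError("resting_rate must be positive")
    return rate * target_rate / resting_rate


# Two hypothetical subjects with different resting heart rates, each showing
# a 30% increase under stress: both map to 70 bpm at rest and to the same
# normalized value under stress.
for rest, stressed in ((60.0, 78.0), (85.0, 110.5)):
    print(rest, "->", normalized_rate(rest, rest),
          "|", stressed, "->", normalized_rate(stressed, rest))
```

In this example both subjects reach 91 bpm under stress after normalization, although their raw stress heart rates differ by more than 30 bpm. The same mapping would apply to the breath rate with a 14 breaths/min target.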

Comments 5 - Line 256: Similarly, the choice of 14 breaths/minute requires specific literature support because it's data out of context without knowing the people.

Response 5: As explained in Response 4, the value of 14 breaths per minute has a mathematical rather than a purely physiological role in the inter-subject normalization procedure. We added a reference (line 88) reporting the normal breath rate for adults. The value of 14 was chosen because it falls within this normal range, ensuring that, after inter-subject normalization, all subjects exhibit the same breath rate at rest. This emphasizes any relative differences that occur during stress conditions.

Comments 6 - Table 1-2: The resampling frequencies vary remarkably (209-393 Hz), this extreme range could introduce undiscussed artifacts.

Response 6: The variable resampling frequency is not arbitrary; it is the direct output of our proposed inter-subject normalization methodology. The core purpose of our method is to align the fundamental physiological rhythms (heart rate and breath rate) of every subject to the same specified target frequency (70 bpm and 14 breaths per minute) within the normalized signal. Because the original HR and BR vary widely between individuals, the required resampling factor must also vary widely. A subject with a very fast intrinsic rhythm requires a much lower resampling rate (e.g., 209 Hz) to reach the common target frequency domain than a subject with a slower rhythm. The reviewer's concern is that the varying resampling rates could introduce differential artifacts that contaminate the features used for classification. We mitigate this risk because our machine learning approach relies on time-domain features designed to be scale-invariant relative to the newly established HR and BR target domain. The resampling procedure aligns the signals to a common domain, allowing the extracted features to capture relative differences between subjects independently of their initial domain. We acknowledge, however, that this normalization approach can affect purely frequency-domain features, and we agree that a deeper investigation into the effects of variable resampling on these spectral features is a necessary area for future research.
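The relationship between the target rate and the per-subject resampling frequency can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the normalization is a pure time-axis rescaling in which the original samples are reinterpreted at an equivalent rate fs_eq = fs × target / resting (consistent with the statement that a faster intrinsic rhythm yields a lower resampling rate), followed by linear interpolation back onto a uniform grid at fs. The function names, the 250 Hz sampling rate and the synthetic 90 bpm rhythm are hypothetical and are not meant to reproduce the values in Tables 1-2.

```python
import numpy as np


def equivalent_sampling_rate(fs, resting_rate, target_rate):
    """Rate at which the original samples are reinterpreted so that the
    subject's resting rhythm appears at target_rate (both rates in the same
    units). Faster resting rhythms give a lower equivalent rate."""
    return fs * target_rate / resting_rate


def rate_normalize(x, fs, resting_rate, target_rate):
    """Stretch (or compress) x along the time axis so that its resting
    rhythm becomes target_rate, then re-grid it at the original fs."""
    x = np.asarray(x, dtype=float)
    fs_eq = equivalent_sampling_rate(fs, resting_rate, target_rate)
    t_norm = np.arange(x.size) / fs_eq             # sample times on the normalized axis
    t_out = np.arange(0.0, t_norm[-1], 1.0 / fs)   # uniform output grid at fs
    return np.interp(t_out, t_norm, x)             # linear interpolation


if __name__ == "__main__":
    fs = 250.0                                     # hypothetical sampling rate (Hz)
    t = np.arange(0.0, 60.0, 1.0 / fs)
    x = np.sin(2 * np.pi * (90.0 / 60.0) * t)      # synthetic 90 bpm rhythm
    y = rate_normalize(x, fs, resting_rate=90.0, target_rate=70.0)
    freqs = np.fft.rfftfreq(y.size, d=1.0 / fs)
    peak_bpm = 60.0 * freqs[np.argmax(np.abs(np.fft.rfft(y)))]
    print(f"fs_eq = {equivalent_sampling_rate(fs, 90.0, 70.0):.1f} Hz, "
          f"dominant rhythm after normalization = {peak_bpm:.1f} bpm")
```

In this sketch the dominant rhythm of the output sits at the 70 bpm target, and a 60-second input becomes about 77 seconds long, because a faster-than-target subject is stretched in time. Linear interpolation is used only for simplicity; a band-limited resampler would be the more careful choice when spectral features are of interest.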

Comments 7 - Figure 2: The confusion matrix shows significant misclassification between highway and city, suggesting method limitations not adequately addressed.

Response 7: We acknowledge that the confusion matrix in Figure 2 shows that distinguishing between the high-stress (City) and medium-stress (Highway) conditions is more challenging than isolating the resting state. This difficulty is attributed to the subtle physiological overlap between the two driving states, compared with the distinct activity observed during rest. A discussion of this point has been added to the revised manuscript (Chapter 5, Discussion). We clarify that the main focus of the current work is not to optimize classification between these two stress states, but rather to present the contribution of the novel inter-subject normalization procedure compared with other methodologies. Since the features and classification methodology were kept identical across all conditions, the study focuses entirely on the impact of normalization, as detailed in Table 3.

Comments 8 - Line 495-504: The comparison with previous studies is inadequate, direct controls with alternative normalization methodologies are missing and everything appears confusing.

Response 8: We have updated the Discussion based on your feedback to include a more critical comparison with existing methods. To the best of our knowledge, however, no inter-subject normalization procedure currently acts directly on raw data along the time axis, which makes a direct comparison with similar methodologies impossible. Nonetheless, the revised section now clearly outlines the limitations of traditional normalization procedures (such as standardization and scaling), specifying their typical isolated, feature-by-feature treatment. Our novel approach is then presented as a direct contrast, highlighting its time-domain operation and its ability to create strong interconnections among features.

Reviewer 3 Report

Comments and Suggestions for Authors

I have the following concerns that must be addressed before the final verdict.

1) How does the proposed inter-subject normalization compare against more advanced machine learning approaches (e.g., deep learning, domain adaptation, transfer learning) that also address inter-subject variability?

2) Can the method be validated on a larger or more diverse dataset beyond the Healey driving dataset to assess generalizability?

3) Why was the normalization limited to ECG and respiration? Could it be extended to other signals such as EDA, EMG, or PPG, and what challenges might arise?

4) Since only time-domain features were used, how would frequency-domain or nonlinear features behave after applying the normalization procedure?

5) How does the choice of fixed reference values (70 bpm for heart rate, 14 bpm for respiration) affect classification results? Would adaptive or subject-specific reference values perform better?

6) Could the authors provide a deeper statistical comparison with existing works that achieved higher accuracy, and explain whether the proposed approach can be combined with those methods?

7) How would the method perform in real-time applications where computational efficiency and short data windows are critical?

8) What is the physiological interpretability of the normalized signals? Does resampling alter meaningful temporal dynamics that might be important for stress characterization?

Author Response

In the revised manuscript, we considered all the comments and the suggestions, and we addressed the reviews point by point. Please find our answers in red.

Comments 1 - How does the proposed inter-subject normalization compare against more advanced machine learning approaches (e.g., deep learning, domain adaptation, transfer learning) that also address inter-subject variability?

Response 1: Many thanks for your comment. We agree that exploring more advanced machine learning approaches, such as deep learning or domain adaptation, offers a promising direction for future research. The primary aim of the presented study was to rigorously validate the performance of a novel inter-subject normalization procedure for stress assessment. To achieve this, we employed a highly controlled framework, comparing our new methodology against well-known and widely used normalization procedures (standardization and scaling) within a fixed machine learning setup. This approach allowed us to isolate the effect and unique contribution of the novel inter-subject normalization procedure while keeping the feature extraction and classification methodology constant. We believe that investigating the synergy between our proposed normalization and the more advanced machine learning approaches you mentioned will be a critical step in subsequent studies. This future work could also explore the effectiveness of our methodology on different feature types (e.g., frequency-domain features) and with various classification methodologies.

Comments 2 - Can the method be validated on a larger or more diverse dataset beyond the Healey driving dataset to assess generalizability?

Response 2: The method can be further validated on different datasets and extended in future work to include other investigations (e.g., the use of frequency-domain features), thus generalizing our approach. We initially chose this dataset because it is widely used in stress assessment studies (e.g., doi: 10.3390/diagnostics13111897 and doi: 10.3390/s21072381) and contains synchronized, accurately labeled data corresponding to various stress states. We have added these references to the manuscript.

Comments 3 - Why was the normalization limited to ECG and respiration? Could it be extended to other signals such as EDA, EMG, or PPG, and what challenges might arise?

Response 3: Thanks for the comment; this is a very good observation. In this study, our primary goal is to demonstrate the effectiveness of the inter-subject normalization procedure on a stress assessment task. To this end, we incorporated more than one signal to show the generalizability of the proposed methodology. Specifically, we selected ECG and respiratory data due to their established relevance in stress research. We intentionally limited the technical details regarding the signals to maintain a focus on the normalization procedure itself. By showing the procedure's effectiveness, future work can readily expand the range of physiological data considered for analysis.

Comments 4 - Since only time-domain features were used, how would frequency-domain or nonlinear features behave after applying the normalization procedure?

Response 4: We completely agree that frequency-domain features possess significant potential for stress assessment and could further enhance our model's performance. Our decision to focus solely on time-domain features in this initial work was deliberate, driven by two key factors. First, our goal was to rigorously validate the efficacy of the novel raw-signal inter-subject normalization procedure by isolating its effect, using time-domain features to provide a clear proof of concept before introducing the complexity of frequency transformations. Second, limiting the feature set allowed us to use smaller, non-overlapping windows, which generated a greater number of samples for classification, thereby supporting a robust training process. This is a common and validated approach, as shown by other studies that obtain promising results using only time-domain features (e.g., doi: 10.1109/ICIRCA48905.2020.9183244). Nevertheless, as acknowledged in the Discussion and Conclusion, future work will focus on incorporating and rigorously evaluating the contribution of frequency-domain features, especially when integrated with our normalization approach.

Comments 5 - How does the choice of fixed reference values (70 bpm for heart rate, 14 bpm for respiration) affect classification results? Would adaptive or subject-specific reference values perform better?

Response 5: The proposed novel inter-subject normalization procedure does not depend on the specific resting heart rate chosen, as it normalizes each subject individually in the same manner. We chose 70 bpm for this paper to align with the normal range reported in the Introduction, but this value has a mathematical role in the normalization process rather than representing a physiological assumption about any single subject. A higher or lower reference heart rate could equally be selected. This choice translates into a different resampling frequency for each subject, ultimately bringing all subjects into the same domain with a standardized resting heart rate. During stress conditions, each subject's normalized heart rate changes in proportion to its own resting value, now referred to the common 70 bpm baseline. The choice of 70 bpm may have significant effects on features derived from the frequency domain, an area that warrants in-depth investigation in future work. The current aim of this paper, however, is to focus specifically on the novel inter-subject normalization procedure and demonstrate its applicability to a real-world task. Note that an adaptive reference would run counter to the proposed methodology. The normalization procedure proposed here brings all subjects to the same resting condition (in terms of heart rate and breath rate), thereby emphasizing any differences arising during stress conditions. An adaptive reference, by contrast, would tend to mitigate these differences.

Comments 6 - Could the authors provide a deeper statistical comparison with existing works that achieved higher accuracy, and explain whether the proposed approach can be combined with those methods?

Response 6: This is an excellent question that addresses a critical design choice in our study. Our primary goal was not to achieve the highest possible accuracy, but to provide a fair and rigorous comparison of inter-subject normalization techniques. Studies reporting higher accuracy typically use significantly more complex methods, such as extensive frequency-domain features, deep learning models, and longer data segments, whereas our work intentionally uses a simplified pipeline and short segments to isolate the effect of normalization. A direct statistical comparison would be misleading because of these fundamental methodological differences. We strongly believe that our inter-subject normalization procedure can be highly synergistic with existing state-of-the-art methods: it provides a more robust, subject-invariant input that can significantly enhance the generalization and performance of complex deep learning techniques. We have added a discussion of this point in Chapter 5 (Discussion).

Comments 7 - How would the method perform in real-time applications where computational efficiency and short data windows are critical?

Response 7: We have not yet implemented our methodology in a real-time application. However, we believe that the inter-subject normalization procedure does not introduce significant bottlenecks in the pipeline. Some data must first be collected during a resting condition to compute the subject's resting heart rate and breath rate, which are required to define that subject's resampling frequency. From that point onward, the acquired data only need to be rescaled by this fixed, subject-specific factor, with no other impact on the feature extraction procedure.
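If the normalization is assumed to be a pure rescaling of the time axis, every time interval of a subject is multiplied by one constant, so for interval-based time-domain features the real-time step reduces to a one-off calibration followed by a multiplication. The sketch below illustrates this with R-R intervals; the class name, the use of the mean resting R-R interval as the calibration statistic, and the example values are assumptions for illustration, not details of the published method.

```python
from statistics import mean


class RateNormalizer:
    """Hypothetical streaming helper: calibrate once on resting data, then
    rescale each incoming window of inter-beat intervals by a fixed factor."""

    def __init__(self, target_rate_bpm=70.0):
        self.target_rate_bpm = target_rate_bpm
        self.factor = None  # multiplier applied to intervals (seconds)

    def calibrate(self, resting_intervals_s):
        """Estimate the resting rate (bpm) from resting R-R intervals (s)."""
        resting_rate_bpm = 60.0 / mean(resting_intervals_s)
        # Intervals scale by resting/target, so rates scale by target/resting.
        self.factor = resting_rate_bpm / self.target_rate_bpm
        return resting_rate_bpm

    def transform(self, intervals_s):
        """Rescale a window of R-R intervals; cost is one multiply per beat."""
        if self.factor is None:
            raise RuntimeError("call calibrate() before transform()")
        return [rr * self.factor for rr in intervals_s]


if __name__ == "__main__":
    norm = RateNormalizer(target_rate_bpm=70.0)
    norm.calibrate([0.75] * 30)           # resting segment at 80 bpm
    window = norm.transform([0.6, 0.6])   # later segment at 100 bpm
    print([round(60.0 / rr, 1) for rr in window])
```

Here a subject resting at 80 bpm who later reaches 100 bpm is mapped to 87.5 bpm in the normalized domain. After calibration, the per-window cost is a single multiplication per interval, which is consistent with the expectation that the procedure adds little overhead.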

Comments 8 - What is the physiological interpretability of the normalized signals? Does resampling alter meaningful temporal dynamics that might be important for stress characterization?

Response 8: The inter-subject normalization procedure is a mathematical operation that stretches or compresses the signal along the time axis. This ensures that selected features (e.g., the heart rate during the resting condition) are equal among subjects in the resting state. In this sense, the procedure can cause a loss of physiological context in the data, as the significant differences in the resting state are largely removed, potentially leading to one subject's data being interpreted as another's. However, the sole goal of this procedure is to improve stress assessment by highlighting differences under stress conditions starting from the same reference point, and the results have shown promising outcomes. We thank you for the suggestion and have incorporated this point into the Discussion section of the manuscript.
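One way to see which temporal information survives the time-axis stretching is to compare interval-based features before and after multiplying all intervals by a constant factor. In the minimal sketch below, the R-R values and the stretch factor are invented for illustration. Absolute features such as the mean R-R interval and SDNN change by exactly that factor, whereas a ratio-based descriptor such as the coefficient of variation, and the ordering of the intervals, are unchanged.

```python
from statistics import mean, stdev


def rr_features(rr_s):
    """Simple time-domain descriptors of an R-R interval series (seconds)."""
    m = mean(rr_s)
    sdnn = stdev(rr_s)
    return {"mean_rr": m, "sdnn": sdnn, "cv": sdnn / m}


rr = [0.80, 0.78, 0.83, 0.76, 0.81]   # hypothetical R-R intervals (s)
k = 80.0 / 70.0                        # hypothetical stretch factor (resting/target)
rr_norm = [x * k for x in rr]          # time-axis stretch scales every interval

before, after = rr_features(rr), rr_features(rr_norm)
for name in before:
    print(f"{name}: {before[name]:.4f} -> {after[name]:.4f}")
```

This illustrates, under the stated assumptions, that the stretching alters absolute durations (and therefore any feature expressed in seconds or bpm) while preserving the relative temporal dynamics within each subject's series.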

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I have carefully read the revisions addressing the doubts and concerns I expressed in my first review, and I believe the authors have provided adequate responses. I consider the manuscript, in its current form, suitable for publication.

Reviewer 3 Report

Comments and Suggestions for Authors

I am willing to accept the paper in its current form. 
