1. Introduction
The contactless monitoring of a person’s vital signs is crucial, especially when it comes to evaluating the condition of the driver. This topic has gained attention from researchers in the field of road safety. Factors such as fatigue, stress, distraction, and other emotional states can have an effect on driving ability and increase the risk of road accidents.
To monitor and estimate the alertness of the driver at any given time, various indicators reflecting their physiological state can be considered. Changes in alertness levels correspond to changes in the driver’s psychophysiological state. Hence, technologies that detect drivers’ physiological signals have become increasingly important for improving road safety [1,2,3]. The physiological signals used for assessing the functional state of drivers include cerebral activity, cardiac rhythm, muscular tension, and respiratory patterns. This paper focuses specifically on heart and respiratory rates as indicators for evaluating the driver’s state.
Researchers have proposed various approaches based on traditional methods for measuring physiological signals, such as electrocardiography (ECG) [4,5] and spirometry [6]. However, these methods are invasive and require instrumentation in direct contact with the driver, which can be cumbersome and potentially dangerous. Non-invasive solutions have been proposed, such as the use of video cameras, photoplethysmography (PPG) [7,8], or pressure sensors on the seat [9]. Nevertheless, these non-invasive solutions can still be cumbersome and do not allow for accurate measurement of physiological signals.
A promising approach for measuring driver physiological signals involves the use of continuous-wave (CW) radar for heart rate and respiration measurement [10,11]. This contactless method is non-invasive, making it particularly convenient and safe. CW radars use electromagnetic waves to sense the motion of objects, which makes it possible to track the small displacements of the driver’s chest as it moves in rhythm with respiration and heartbeats. In addition, CW Doppler radars consume less power and have a simple hardware architecture. The relative displacement information obtained via the CW Doppler radar can be employed to estimate the heart rate in single-person scenarios [12].
CW radar acquires all of the small vibrations generated on the chest surface by cardiac and respiratory activity. It is therefore also susceptible to chest vibrations that are unrelated to heartbeats or breathing, such as body movements, as well as to interference from other individuals [12]. Reducing the radar’s susceptibility to the propagation channel and to driver body movement constitutes a primary challenge. The amplitude of the heartbeat signal (between 0.2 and 0.5 mm) is considerably smaller than the thorax displacement caused by respiration (between 4 and 10 mm). Thus, separating the heartbeat component from the much stronger respiration component, whose harmonics can fall within the heart rate band, emerges as the second challenge for researchers in this field [13,14].
To overcome this problem of harmonics, the authors in [15,16] propose signal processing algorithms based on simple filtering or on heart rate frequency estimation using spectral analysis to separate the heart rate and respiration signals from CW radar measurements. These algorithms analyze the temporal variations of the signals in successive time windows via the Fast Fourier Transform (FFT) [15] or the Wavelet Transform (WT) [16]. Various approaches using more complex signal processing methods are proposed in [14,17,18,19]. In [14], the author introduces an approach based on cyclostationarity techniques to extract heartbeat and respiration rates from vital signals obtained with a 2.4 GHz CW radar, independently of environmental noise and random body movements. This is achieved via the derivation of the first- and second-order cyclostationary moments and the second cyclic cumulant. Furthermore, in [17], the authors focused on extracting the harmonic signal of heartbeats from the vibrations of the chest surface gathered via a Continuous-Wave Doppler Radar System (CW-DRS) equipped with a band-pass filter. This method assumes that respiration does not occur within the heartbeat’s harmonic region. The method proposed in [18] uses a Doppler radar and Empirical Mode Decomposition (EMD) to filter out the noise and detect respiration signals. The resulting signal is filtered to isolate respiration frequencies before being analyzed using the Short-Time Fourier Transform to determine the breathing rate. However, all of the deterministic approaches mentioned above are complex, and their results depend on the application environment and on predefined conditions, which makes them less flexible and less precise.
The analysis and interpretation of raw radar data can be challenging due to several factors mentioned above, such as the environment and harmonics. Machine Learning (ML) techniques have been used as a non-deterministic approach to extract physiological signals from the radar data in order to overcome the limits of deterministic methods. Many state-of-the-art solutions use various types of filters to separate the heart and respiration rates. In [20], the authors propose using the gamma filter to model the time series heartbeat signal, accounting for respiration and respiration artifacts, and thereby isolate the heart rate from radar-measured signals in a non-invasive manner. In [21,22], the authors suggest the use of the Kalman filter to update the band-pass filter limits for parameter estimation, tracking the heart rate measurement and reducing noise in vital signs. However, given the Kalman filter assumptions, corrupted data caused by arbitrary user motions must be selectively filtered out in order to prevent subsequent vital sign estimates from being tainted [21].
Other approaches are based on unsupervised or supervised machine-learning algorithms to predict vital parameters from time series [23] or to extract pertinent information, such as arrhythmia detection [24], based on electrocardiography (ECG) signals. Furthermore, Deep Learning methodologies have been integrated into CW Doppler radar systems, for example to detect heartbeats [25,26]. The first results indicate promising advantages in terms of heartbeat detection latency and source separation capabilities (resistance of heartbeat detection to respiration or random body motions) compared to traditional methods. Deep neural networks can learn to detect physiological signals by analyzing raw radar data, extracting relevant features, and estimating heart and respiration rates. The research in [26] proposes the use of convolutional neural networks (CNN) to estimate the heart rate from signals measured using an ultra-wideband (UWB) radar. This approach focuses on person-specific identification, with the CNN being trained separately for each subject, primarily due to the lack of available training data. In [12], the authors propose an artificial neural network (ANN) as the main signal processing element, which is trained to detect heartbeats accurately in real time. However, most of the methods mentioned above extract either the heart rate or the respiratory rate, but not both at the same time. Recently, the research in [
27] proposes the use of a deep learning framework utilizing a convolutional neural network to estimate the heart rate (Fc) and respiration rate (Fr) in real time using a dataset measured during sleep via a UWB radar with a sampling frequency
,
and a window size of 15 s in order to detect artifacts. In addition to resolving this problem, the research uses a Continuous Wavelet Transform (CWT) as a pre-processing method to extract the characteristics of each signal.
This paper proposes two approaches, one for estimation and the other for classification, aimed at monitoring the driver’s vital signs and estimating their physiological state. The dataset used in this study was obtained from a CW radar with a sampling frequency of 100 Hz and a window size of 50 s, which represents a reasonable duration for estimating the physiological state of the driver. This is a difference between our study and [27], which uses a 15 s window and focuses exclusively on the sleep state rather than on detecting changes in the physiological state. The approaches presented in this paper employ different Deep Learning models: the 1D-CNN; the Recurrent Neural Network (RNN), in particular the Bi-LSTM network, which can remember long-term dependencies in sequential data; and the TCN, which is adept at handling long sequences with complex patterns. Additionally, our paper introduces the CRNN model, which combines the benefits of the CNN and the Bi-LSTM to achieve a robust performance in detecting and extracting the heart and breath rate values, while also classifying the different physiological states of the driver based on the temporal vital signs measured via the CW radar.
The rest of this paper is organized as follows. Section 2 introduces and discusses the architectures and characteristics of the four proposed Deep Learning models. Section 3 focuses on the CW radar function and data processing. Section 4 presents the results of our models. Finally, Section 5 concludes the study and provides suggestions for future research.
2. Proposed Models
This section presents the Deep Learning models used to extract the heart rate and the respiratory rate from the temporal vital signs and to classify the different states of the driver (fatigue or drowsiness, resting or normal state, and stress). We tested several Deep Learning models to evaluate each model’s performance and its contribution to the problem under study.
2.1. 1D-CNN
The 1D-CNN is a Deep Learning technique that involves applying a series of convolution filters to a one-dimensional sequence of data, such as vital signs in our case.
In our specific approach, we use a 1D-CNN comprising two convolution layers of 128 units with a filter size of 512 (layers A and C in Figure 1), each followed by a MaxPooling-1D layer (layers B and D) to reduce the dimensionality of the extracted features. These convolution layers allow us to extract relevant features from the input signal. We have used a network of dense (fully connected) layers for the output layers. In our study, we employ CNNs to solve regression and classification problems. The final layer of our regression model outputs two values, corresponding to the heart rate and the respiratory rate extracted from the time-series signals. The final layer of our classification model outputs three values, corresponding to three classes representing the physiological states of the driver, as shown in Figure 1.
Figure 1 illustrates the general architecture of the four models we have proposed. We aimed to maintain a similar structure for all four models while modifying the hidden layers section, as explained in Section 2. The input section is the same for all four models: each input (xi) corresponds to a vital sign, labeled by the heart and respiratory rates (Fc, Fr) for regression, and by one of three physiological states of the driver (fatigue or drowsiness, normal state, and stress) for classification.
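To give a feel for how two convolution and pooling stages shrink a signal window, the following minimal sketch tracks the sequence length through the feature extractor. The 5001-sample input and the filter size of 512 come from the text; the absence of padding, the stride of 1, and the pool size of 2 are assumptions made only for this illustration, as they are not stated here.

```python
def conv1d_output_length(n_in, kernel_size, stride=1):
    """Length of a 1D convolution output when no padding is applied."""
    return (n_in - kernel_size) // stride + 1


def pool1d_output_length(n_in, pool_size):
    """Length after non-overlapping max pooling (stride equal to pool_size)."""
    return n_in // pool_size


# From the text: 5001-sample input window, filter size 512.
# Assumed for illustration: no padding, stride 1, pool size 2.
n = 5001
shapes = [n]
for _ in range(2):                      # two Conv-1D + MaxPooling-1D stages
    n = conv1d_output_length(n, 512)
    shapes.append(n)
    n = pool1d_output_length(n, 2)
    shapes.append(n)

print(shapes)
```

Under these assumptions, the sequence length drops from 5001 to 867 before the dense layers; other padding or pooling choices would give different lengths.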
2.2. TCN
In the context of analyzing CW radar signals for the extraction of heart and respiratory frequencies, we have explored an approach using a Temporal Convolutional Network (TCN) [28]. This is an extension of our previous model based on the 1D-CNN, in which a series of convolution filters is applied to a one-dimensional sequence of vital signals. The proposed TCN model consists of a temporal convolution block with dilations ranging from 1 to 32 as hidden layers, as shown in Figure 2. This allows the network to learn dependencies at different time intervals. The convolution layers are complemented by residual connections and causal padding to ensure that the prediction at each instant is based only on past and current data. As for the output layers, the model has several fully connected (Dense) layers, each followed by a Dropout regularization layer to control overfitting. For the regression output, the last two units of the network are dedicated to predicting the heart and respiratory frequencies. For the classification, the TCN output corresponds to the three physiological states of the driver.
This TCN approach offers an improvement over our 1D-CNN-based model, bringing a greater ability to capture temporal dependencies in the signal data while requiring fewer parameters than the CNN. This improvement can lead to a more accurate extraction of heart and respiratory frequencies from raw CW signals.
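The role of dilation and causal padding can be illustrated with a small dependency-free sketch. It stacks one causal convolution per dilation and computes the resulting receptive field. The dilation range 1 to 32 comes from the text; the doubling schedule, the kernel size of 3, the filter weights, and the absence of residual connections are assumptions made for illustration only.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:                # causal padding: earlier samples count as zero
                acc += w * x[idx]
        y.append(acc)
    return y


def run_stack(x, weights, dilations):
    """Apply one causal dilated convolution per dilation value, in order."""
    out = list(x)
    for d in dilations:
        out = causal_dilated_conv(out, weights, d)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of the stack above."""
    return 1 + (kernel_size - 1) * sum(dilations)


DILATIONS = [1, 2, 4, 8, 16, 32]   # range from the text; doubling schedule assumed
WEIGHTS = [0.5, 0.3, 0.2]          # hypothetical kernel of size 3

signal = [float(i % 7) for i in range(200)]
output = run_stack(signal, WEIGHTS, DILATIONS)
print(receptive_field(len(WEIGHTS), DILATIONS))
```

With these assumed values, each output sample depends on the current sample and the 126 preceding ones, and never on future samples.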
2.3. Bi-LSTM
The Bi-LSTM is particularly well suited for detecting vital signals from the CW radar data, due to its ability to process temporal sequences and retain long-term information. Using Bi-LSTMs to process the CW radar data allows for the detection of heart and respiratory frequencies with high precision, showcasing an edge over traditional methods as well as the standalone CNN model discussed in the preceding section.
Figure 1 shows the architecture of the Bi-LSTM model proposed in this paper. It consists of a two-layer bidirectional LSTM with 128 units (layer A and layer C), a normalization layer (layer B), and a 1D global pooling layer (layer D) to reduce the dimensionality of the features extracted via the Bi-LSTM network. The 1D global pooling layer can help prevent overfitting and improve the generalization of the model. Additionally, it can save computational resources by reducing the number of parameters required for processing 1D sequential data. Our output stage comprises four units of fully connected layers, which correspond to the desired outputs for the regression model. In the case of the classification model, the outputs correspond to the three physiological states of the driver, as shown in Figure 1.
Combining the benefits of temporal feature extraction and long-term memory retention, the proposed Bi-LSTM model offers a powerful solution for analyzing the vital signs measured via the CW radar.
2.4. CRNN
In the previous sections, we have seen the individual strengths of the CNN and the Bi-LSTM in detecting heart and respiratory frequencies from signals measured using the CW radar. The CNN offers excellent local feature extraction capabilities, while the Bi-LSTM effectively handles long-term temporal dependencies within the vital sign sequences.
In this section, we present a CRNN architecture developed to detect heart rate and respiratory frequency from vital signs measured via the CW radar. This architecture combines the benefits of CNNs and Bi-LSTMs into a unified model, enabling precise and efficient extraction of spatial and temporal features.
Our CRNN architecture, illustrated in Figure 1, comprises a Conv-1D layer (layer A) of 128 units with a filter size of 512 for the initial extraction of features from CW radar signals. Following the convolution phase, a bidirectional LSTM layer (layer C) of 128 units is used to capture the long-term temporal dependencies of these vital signs. This ability to effectively handle past and future information renders our model particularly suited to the sequential nature of heart rate and respiratory frequency data. The Bi-LSTM phase is followed by a Global Average Pooling layer (layer D) to reduce computational complexity while retaining the essence of key features. Subsequently, a series of dense layers is used; the final layer produces estimates of the heart rate and respiratory frequency for the regression output and determines the physiological state of the driver for the classification output, as shown in Figure 1.
In conclusion, our CRNN architecture uses the local feature extraction capability of the CNN and the expertise over long-term temporal dependencies of the Bi-LSTM to provide a robust method for analyzing the vital signs measured via the CW radar.
3. Experiments
3.1. Data
This section introduces the operating principle of the CW radar for vital sign measurements, specifically heart and respiratory rates. We then present two distinct databases used for training the Deep Learning models. The first database contains simulation data generated by MATLAB, while the second one contains real data measured via a 24 GHz CW radar.
3.1.1. Simulation Data: Basic Principles of CW Radar Operation
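The CW radar generates sinusoidal electromagnetic waves using a local oscillator (LO) [12,14]. These waves are then amplified via a power amplifier (PA). Mathematically, the transmitted signal, $T(t)$, can be expressed as:

$$T(t) = A_T \cos\left(2\pi f t + \phi(t)\right)$$

where $f$ represents the frequency of the transmitted signal, $\phi(t)$ denotes the phase noise of the LO, and $A_T$ is the amplitude of the transmitted signal.

The transmitted waves reflect off of a moving object, such as a human body, and the reflected signal experiences a frequency shift due to the Doppler effect. The motion of the body comprises three components: respiration ($x_r(t)$), heartbeat ($x_c(t)$), and random body movements ($x_m(t)$). This gives rise to the received signal $R(t)$, which can be mathematically expressed as:

$$R(t) = A \cos\left(2\pi f t - \frac{4\pi d_0}{\lambda} - \frac{4\pi x(t)}{\lambda} + \phi\left(t - \frac{2 d_0}{c}\right)\right) + n(t)$$

Here, $A$ is the amplitude of the received signal, $\lambda$ is the wavelength of the signal, $c$ is the speed of light, $d_0$ represents the initial distance between the CW radar and the body, $x(t) = x_r(t) + x_c(t) + x_m(t)$ denotes the displacement of the human body surface, and $n(t)$ is the signal noise.

After reception, the reflected signal is mixed with the local oscillator’s signal, producing two foundational signals: the in-phase signal $I(t)$ and the quadrature-phase signal $Q(t)$. This process is depicted in Figure 3, which illustrates the CW radar system and its associated components. The quadrature signal is shifted by 90° relative to the carrier. These signals can be expressed as:

$$I(t) = A_I \cos\left(\frac{4\pi d_0}{\lambda} + \frac{4\pi x(t)}{\lambda} + \theta + \Delta\phi(t)\right) + n_I(t)$$

$$Q(t) = A_Q \sin\left(\frac{4\pi d_0}{\lambda} + \frac{4\pi x(t)}{\lambda} + \theta + \Delta\phi(t)\right) + n_Q(t)$$

In these equations, it is posited that the amplitudes $A_I$ and $A_Q$ coincide [14], and $n_I(t)$ and $n_Q(t)$ represent the noise components for the in-phase and quadrature-phase signals, respectively, while $\theta$ represents any additional phase shift in the signal and $\Delta\phi(t) = \phi(t) - \phi\left(t - 2 d_0 / c\right)$ is the residual phase noise.

From these, the baseband signal, $B(t)$, is derived as:

$$B(t) = I(t) + jQ(t)$$

where $j$ is the imaginary unit. For simplification, this signal is then expressed in exponential form using Euler’s formula:

$$B(t) = A_B\, e^{j\psi(t)} + n_B(t)$$

In this equation, $A_B = A_I = A_Q$, $\psi(t) = \frac{4\pi d_0}{\lambda} + \frac{4\pi x(t)}{\lambda} + \theta + \Delta\phi(t)$, and $n_B(t) = n_I(t) + j\,n_Q(t)$.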
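The signal model of this subsection can be sketched numerically as follows. This is a minimal illustration, not the MATLAB generator used in this work: it assumes purely sinusoidal respiration and heartbeat displacements, no random body movement, no phase noise, and no additive noise. The 24 GHz carrier, the 1 m distance, the 5001-sample length, and the amplitude ranges appear in the text; the specific amplitudes chosen, the 100 Hz sampling rate (inferred from 5001 samples over 50.01 s), and the function name `baseband_signal` are assumptions.

```python
import cmath
import math

C = 299_792_458.0            # speed of light (m/s)


def baseband_signal(fc, fr, ac=0.3e-3, ar=6e-3, d0=1.0, f_carrier=24e9,
                    fs=100.0, n_samples=5001, amplitude=1.0, theta=0.0):
    """Noise-free baseband samples A * exp(j * (4*pi*(d0 + x(t)) / lambda + theta))."""
    wavelength = C / f_carrier
    samples = []
    for n in range(n_samples):
        t = n / fs
        # Chest displacement in metres: respiration plus heartbeat, both sinusoidal.
        x = ar * math.sin(2 * math.pi * fr * t) + ac * math.sin(2 * math.pi * fc * t)
        phase = 4 * math.pi * (d0 + x) / wavelength + theta
        samples.append(amplitude * cmath.exp(1j * phase))
    return samples


# Example: heart rate 1.2 Hz (72 BPM), respiratory rate 0.25 Hz (15 breaths/min).
b = baseband_signal(fc=1.2, fr=0.25)
in_phase = [s.real for s in b]       # I(t)
quadrature = [s.imag for s in b]     # Q(t)
```

The real and imaginary parts of the complex samples play the role of the in-phase and quadrature channels, so the vital-sign information is carried entirely by the phase.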
3.1.2. Simulation Data: Generation Procedure
We use a dataset generated by MATLAB for the simulation part of our research, which serves to train and test our Deep Learning (DL) regression models. This dataset comprises 3000 baseband signals from the CW radar, labeled with the heart rate, Fc, and the respiration rate, Fr, representing 30 subjects in a normal state, with each signal in Equation (7) having 5001 samples and a varying Signal-to-Noise Ratio (SNR). To generate the baseband signals, we chose 30 heart and respiratory rate values within their normal ranges, Fc [0.83–2] Hz and Fr [0.16–0.33] Hz, and varied the SNR from −10 to 10 dB for each case, which emulates 100 different environments per subject. In this setup, the CW radar in Figure 3 and the individual are positioned 1 m apart, as shown in Table 1. In addition, to estimate the performance of the regression models, we generated an additional dataset using MATLAB, containing 15 new values for the heart and respiratory rates to represent 15 subjects in different physiological states (fatigue or drowsiness, normal state, and stress). Each value of Fc and Fr represents an individual, and for each case, the SNR is varied between −10 and 10 dB to produce 100 signals with identical Fc and Fr in 100 distinct environments. We selected five values in the normal ranges (Fc [0.83–2] Hz, Fr [0.16–0.33] Hz) to represent individuals in a normal state. It should be noted that any change in the physiological state results in modifications to the vital signs. For individuals displaying signs of fatigue or drowsiness, we chose five values below these normal ranges. Conversely, for those experiencing stress, we selected five values above the normal ranges. This distinction aims to capture the typical variations in heart and respiration rates associated with different physiological states. We generated two different databases in order to test the ability of the DL regression models proposed in this article to detect and extract the exact value of the heart rate and the respiratory rate for different physiological states of the driver, even though the models were trained only on a database that represents the normal state of the driver. This ability is valuable when validating our models with real measurement data, for which little data is available [26]. The results of this test are presented in Section 4.
In addition, we have created a second database with 5300 labeled signals for our classification models. This time, we selected 20 values for Fc and Fr to represent the driver’s normal state, 16 for fatigue and drowsiness, and 17 for stress. For each of these 53 values, we varied the SNR from −10 to 10 dB to obtain 100 signals.
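The SNR sweep used to emulate different environments can be sketched as follows. This is an illustration under stated assumptions rather than the generation code used in this work: the noise is taken to be complex white Gaussian noise, the 100 SNR values are taken to be evenly spaced over [−10, 10] dB (only the range is given in the text), and the helper names `add_awgn` and `snr_grid` are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise so that the result has the requested SNR (dB)."""
    power = sum(abs(s) ** 2 for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)      # standard deviation per real/imaginary part
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in signal]


def snr_grid(low_db=-10.0, high_db=10.0, count=100):
    """Evenly spaced SNR values (the spacing is an assumption)."""
    step = (high_db - low_db) / (count - 1)
    return [low_db + i * step for i in range(count)]


rng = random.Random(0)
# Placeholder unit-amplitude complex signal standing in for one clean baseband signal.
clean = [complex(math.cos(0.1 * n), math.sin(0.1 * n)) for n in range(5001)]
noisy_versions = [add_awgn(clean, snr, rng) for snr in snr_grid()]
```

Each clean signal thus yields 100 noisy versions that share the same Fc and Fr labels, which is how 53 label values lead to 5300 signals.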
The normal range for an adult’s respiratory rate is between 10 and 20 breaths per minute, or 0.16 Hz and 0.33 Hz [29]. The accepted heart rate for adults is between 60 and 100 beats per minute (BPM), equivalent to 1 Hz and 1.67 Hz [30,31]; the lower bound of 0.83 Hz used in our simulations corresponds to 50 BPM. If a heart rate falls below 50 BPM (usually during sleep), it is referred to as bradycardia, while a rate over 100 BPM is called tachycardia. In terms of amplitudes, the standard ranges are ac [0.2–0.5] mm for the cardiac frequency amplitude and ar [4–12] mm for the respiratory frequency amplitude [32].
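The unit conversions and the labeling convention described in this subsection can be written out as a short sketch. The normal ranges are those used for the simulated data; the rule for mixed cases (one rate below and the other above the normal range) is not specified in the text, so the sketch returns "undetermined" for them, and the function names are hypothetical.

```python
NORMAL_FC_HZ = (0.83, 2.0)     # heart rate range used for the simulated normal state
NORMAL_FR_HZ = (0.16, 0.33)    # respiratory rate range used for the normal state


def per_minute_to_hz(rate_per_minute):
    """Convert beats or breaths per minute to hertz."""
    return rate_per_minute / 60.0


def label_state(fc_hz, fr_hz):
    """Label a (heart rate, respiratory rate) pair following the simulation convention."""
    if fc_hz < NORMAL_FC_HZ[0] and fr_hz < NORMAL_FR_HZ[0]:
        return "fatigue or drowsiness"      # both rates below the normal ranges
    if fc_hz > NORMAL_FC_HZ[1] and fr_hz > NORMAL_FR_HZ[1]:
        return "stress"                     # both rates above the normal ranges
    fc_ok = NORMAL_FC_HZ[0] <= fc_hz <= NORMAL_FC_HZ[1]
    fr_ok = NORMAL_FR_HZ[0] <= fr_hz <= NORMAL_FR_HZ[1]
    return "normal" if fc_ok and fr_ok else "undetermined"


print(per_minute_to_hz(60), per_minute_to_hz(100))
print(label_state(1.2, 0.25))
```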
3.1.3. Real Data
In this study, to evaluate the performance of the proposed DL models and validate their accuracy on real data, we used the clinical dataset provided in [33]. This dataset consists of 30 healthy subjects of different ages and sexes measured via a CW radar system based on Six-Port technology operating at 24 GHz in the ISM band. As a reference, an electrocardiogram was measured simultaneously with the CW radar. The characteristics of the dataset are described in [33]. Although this database was not recorded on drivers, the signals it comprises nevertheless allow us to test our regression and classification models on real signals.
To construct our dataset, we started from the dataset proposed in [33], using the in-phase and quadrature radar signals (both stored in mV) to construct the baseband signal for each subject. Each resulting signal represents a resting scenario of the person and contains 1,215,200 samples at a sampling frequency of 2000 Hz. To keep the same principle as our simulation dataset, we re-sampled the data at 100 Hz and divided each signal into several signals of 5001 samples, corresponding to an acquisition time of 50.01 s. To obtain the heart rate and respiratory rate values corresponding to each signal, several algorithms were used as follows. A standard FFT [34] was applied to the baseband signals measured via the CW radar to extract the respiratory rate, and the results obtained were compared with the results of the cyclostationary algorithm [14]. For the heart rate, the R-peak algorithm [35] was applied to the ECG signals measured via the electrocardiogram.
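The spectral labeling step can be illustrated with a dependency-free sketch that searches for the strongest spectral component within the respiration band. It evaluates the discrete Fourier transform directly on a frequency grid instead of calling an FFT routine, and it is applied to a synthetic real-valued test signal; the band limits, the grid step, and the function name `dominant_frequency` are assumptions made for this illustration.

```python
import cmath
import math


def dominant_frequency(samples, fs, f_min, f_max, step=0.005):
    """Frequency (Hz) in [f_min, f_max] with the largest DFT magnitude."""
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]        # remove the DC component
    best_f, best_mag = f_min, -1.0
    n_steps = int(round((f_max - f_min) / step))
    for i in range(n_steps + 1):
        f = f_min + i * step
        w = cmath.exp(-2j * math.pi * f / fs)    # phase rotation per sample
        acc, rot = 0j, 1 + 0j
        for s in centred:
            acc += s * rot
            rot *= w
        if abs(acc) > best_mag:
            best_f, best_mag = f, abs(acc)
    return best_f


# Synthetic 50.01 s window at 100 Hz: strong 0.25 Hz respiration, weak 1.2 Hz heartbeat.
fs = 100.0
window = [math.sin(2 * math.pi * 0.25 * n / fs) + 0.05 * math.sin(2 * math.pi * 1.2 * n / fs)
          for n in range(5001)]
estimate = dominant_frequency(window, fs, f_min=0.1, f_max=0.5)
print(round(estimate, 3))
```

A 50 s window gives a frequency resolution of about 0.02 Hz, i.e., roughly 1.2 breaths per minute, which bounds the precision of labels obtained this way.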
For the regression approach, we finally constructed a dataset to train and test our models, with 280 rows and 5003 columns, where each row corresponds to a signal of 5001 samples followed by its two labels (heart rate and respiratory rate).
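The construction of the rows from one long recording can be sketched as follows. The recording length, the 2000 Hz rate, and the 5001-sample window come from the text; the 100 Hz target rate is inferred from 5001 samples spanning 50.01 s. Plain decimation without an anti-aliasing filter and non-overlapping windows are simplifying assumptions, and the label values are placeholders.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample (no anti-aliasing filter: a simplification)."""
    return samples[::factor]


def split_windows(samples, window_length):
    """Non-overlapping windows; a trailing remainder shorter than a window is dropped."""
    count = len(samples) // window_length
    return [samples[i * window_length:(i + 1) * window_length] for i in range(count)]


def make_row(window, heart_rate_hz, respiratory_rate_hz):
    """One dataset row: the window samples followed by the two labels."""
    return list(window) + [heart_rate_hz, respiratory_rate_hz]


recording = [0.0] * 1_215_200                    # placeholder for one 2000 Hz recording
resampled = decimate(recording, 2000 // 100)     # 2000 Hz -> 100 Hz
windows = split_windows(resampled, 5001)
rows = [make_row(w, 1.2, 0.25) for w in windows]  # placeholder labels
print(len(resampled), len(windows), len(rows[0]))
```

Under these assumptions, one recording yields 12 windows, and each row has 5001 samples plus 2 labels, i.e., 5003 columns.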
In addition, we have constructed a second dataset for validating our classification models. This second real dataset contains 612 labeled signals: 219 representing resting (rst), 185 representing apnea (apn), and 208 representing Valsalva (vals) [33]. Because no data measured via the CW radar are available that represent the different states of the driver (drowsiness or fatigue, normal or resting state, and stress), we evaluated our classification models on these three scenarios instead. Resting represents the normal state of the person, apnea refers to a temporary pause in breathing, and Valsalva is a breathing technique involving a forceful exhalation against a closed airway, which can affect heart rate and blood pressure. The results of the regression and classification models are presented in Section 4.
3.2. Training, Test, and Evaluation Networks
In this context, all regression models were trained for 60 epochs with a batch size of 64. The Adam optimizer with a learning rate of 0.001 was used to minimize the Root Mean Squared Error (RMSE), which measures the difference between the model predictions and the actual data. Furthermore, the RMSE is used as an evaluation metric for assessing the performance in the regression output. As for the classification models, they have been trained for 60 epochs with a batch size of 64 and an Adam optimizer with a learning rate of 0.0001. The loss function chosen was the categorical cross-entropy, and the model performance was evaluated using the accuracy metric.
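For reference, the categorical cross-entropy minimized by the classification models can be written out for a single three-class example. This is a generic sketch of the loss definition, not the training code used in this work; the logits and the class ordering are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probabilities, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probabilities))


# Hypothetical class order: [fatigue or drowsiness, normal, stress].
probs = softmax([0.2, 1.5, -0.3])
loss = categorical_cross_entropy([0, 1, 0], probs)
print(round(loss, 4))
```

The loss equals the negative log-probability assigned to the true class, so it approaches zero as the network becomes confident in the correct state.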
To evaluate the performance of our models, we divided our simulation dataset as follows: 64% for training, 16% for validation, and 20% for testing, with a random state of 4. We use the test dataset to estimate the accuracy and performance of each model (Bi-LSTM, 1D-CNN, CRNN, and TCN). In the case of the real dataset, we have the vital signs measured via the CW radar for 30 healthy subjects. We partitioned this dataset by subject into three segments: 17 subjects for training, five for validation, and eight for testing. This partitioning pertains to the regression dataset, which contains the signals representing the normal state of the person (the resting scenario). For the classification dataset, data representing the three scenarios of resting, Valsalva, and apnea are available for only 24 individuals. We divided our classification dataset as follows: 14 subjects for training, four for validation, and six for testing.
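A subject-wise partition of this kind can be sketched as follows. Splitting by subject rather than by window keeps all windows of one person in a single subset. The subset sizes come from the text; reusing the seed value 4 for the real data, the shuffling procedure, and the function name `split_subjects` are assumptions made for this illustration.

```python
import random


def split_subjects(subject_ids, n_train, n_val, n_test, seed=4):
    """Shuffle subject identifiers and cut them into three disjoint groups."""
    ids = list(subject_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("subset sizes must add up to the number of subjects")
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Regression dataset: 30 subjects split 17 / 5 / 8.
train_ids, val_ids, test_ids = split_subjects(range(1, 31), 17, 5, 8)
# Classification dataset: 24 subjects split 14 / 4 / 6.
cls_train, cls_val, cls_test = split_subjects(range(1, 25), 14, 4, 6)
```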
Several statistical indicators are used to evaluate the architectures of neural networks. In this article, three statistical indicators have been used for the regression models, namely the coefficient of determination $R^2$ (referred to as the R2score), the root mean square error (RMSE), and the Mean Absolute Error (MAE), to quantify the accuracy of continuous predictions. Four other statistical indicators have been used to evaluate the classification models: accuracy, precision, recall, and the $F_1$ score.

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where:
$\hat{y}_i$ is the value simulated by the model;
$y_i$ is the measured value;
$\bar{y}$ is the mean of the measured values, and $N$ is the number of samples.
$TP$ (True Positives): The number of observations that were correctly classified as positive by the model.
$TN$ (True Negatives): The number of observations that were correctly classified as negative by the model.
$FP$ (False Positives): The number of observations that were incorrectly classified as positive. The model predicted the observation was positive when it was actually negative.
$FN$ (False Negatives): The number of observations that were incorrectly classified as negative. The model predicted the observation was negative when it was actually positive.
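The standard forms of these indicators can be computed with a few lines of plain Python. This is a sketch of the textbook definitions, not the evaluation code used in this work; for the multi-class case, precision, recall, and the F1 score are computed here for one class treated as positive, and the example values and class labels are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination; undefined when all true values are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant true values")
    return 1 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical heart rates (Hz) and class labels, for illustration only.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(precision_recall_f1(["n", "n", "s", "s", "f"], ["n", "s", "s", "s", "f"], "s"))
```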
To evaluate the performance of each regression model, we used an adapted version of R2score, referred to as the R2score* indicator, which can be mathematically expressed as:
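The reason for using R2score* instead of R2score is that the evaluation uses different datasets, each containing 100 signals with a varying SNR in [−10, 10] dB and the same label y (Fc and Fr). With identical labels, the deviation of y from its mean, which forms the denominator of the standard R2score, is zero, so the standard indicator cannot be computed. The results obtained are presented in Section 4.3.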