1. Introduction
Almost 20 thousand people died in traffic accidents in 2021 in the European Union [
1], and the number of deaths per million inhabitants in Serbia is higher, reaching 30% [
2]. The European Union has set itself the goal of reducing the number of traffic fatalities by 50% by 2030, and it is expected that zero traffic fatalities (‘Vision Zero’) will be reached by 2050 [
1]. Drowsiness, the growth of licensed older drivers, and health problems have been identified as the main causes of traffic accidents. Artificial intelligence (AI) has revolutionized the development of autonomous cars, but mass production is in the distant future. Application in everyday conditions is particularly difficult due to several open questions, such as car-to-car communication, complete amendment of legal regulations, etc. A transitional solution is semi-autonomous cars with a system for monitoring the driver’s health.
From July 2024, all new-type registered motor vehicles sold in the EU must comply with the updated General Safety Regulations, which require a range of mandatory advanced safety services [
3]. The automotive industry as well as the scientific community are making efforts to develop innovative solutions that would increase the safety of drivers and children in cars, as well as comfort and services that facilitate driving itself [
4,
5,
6,
7].
The scientific community is focused on developing a health monitoring system that does not require medical assistance and ensures comfortable and undisturbed driving. To provide a comfortable environment and enable daily application, these systems must not require the placement of electrodes or cuffs after applying the gel directly to the subject’s body, in contrast to clinical practice. As a result, the signal quality is significantly lower compared to signals recorded in controlled clinical conditions, and their processing represents a current scientific challenge. Jointly, the scientific community and the automotive industry proposed multisensor systems built into a car seat for recording the ECG of the Mercedes S class (2009; explained in detail in [
8]), into the steering wheel (measurement of blood oxygen saturation level (SpO2) and body temperature) of the Mercedes S class (2010, explained in detail in [
9]), into of car seat of a Ford car (2011, explained in detail in [
10]) and, finally, a proposal from 2023 for a portable cushion with built-in capacitive electrodes [
11]. There is also a health monitoring system based on high-tech cameras [
12,
13,
14], but it is not a favorable choice from the aspect of violating the privacy of the driver and passengers and taking longer to process images.
Successful driver performance in preventing vehicle crashes determines drivers’ general health and concentration levels that may be impaired in case of high-stress exposure. The development of gastrointestinal and musculoskeletal disorders for professional drivers is also considered a consequence of chronic stress, and the development of posttraumatic stress disorder is also reported as a common chronic mental health issue in ambulance and bus drivers, respectively [
15,
16]. One of the open questions is how stress during driving influences driver health, especially in older drivers, and pacemaker-built-in drivers. In the state-of-the-art literature, there is evidence to support a pathophysiological basis for the increased incidence of arrhythmias during driving, but there is a lack of convincing causal evidence linking driving activity to arrhythmia generation [
17].
A detailed review of the literature (published from 1990 to 2007) related to the development of algorithms for the detection of stress levels was presented in [
18]. The presented algorithms are based on feature extraction of direct-to-skin contact measurement ECG, electrodermal activity, respiratory response, electrodermal activity, and blood pressure. In the literature published from 2007 to 2022, traditional machine learning models based on feature extraction and deep transfer learning techniques were proposed for stress level detection based on ECG by recorded direct skin contact. A traditional machine learning model based on 14 features extracted by characteristics point of ECG for detection of three stress levels (with an achieved accuracy of 88.24%) was proposed in [
19]. M. Amin et al. [
20] proposed the use of deep learning techniques with input as scalogram images (1D ECG signals were transformed into 2D time-frequency images (scalograms)) and achieved an accuracy of 98.11% for stress level detection. In [
21], the authors proposed the use of a multimodal fusion model based on convolutional neural networks (CNN) and long short-term memory (LSTM) to fuse the ECG, vehicle data and contextual data to jointly learn for stress level detection during driving (achieved average accuracy was 92.8%). M. Rastgoo et al. [
18] highlight the challenges that may arise in the application of algorithms in the real-world driving scenario with the use of unobtrusive acquisition systems such as data quality and time-consuming data analysis of high dimensionality. Also, J. Wang et al. (2020) emphasized that there are no scientific studies dealing with the detection of driver’s illness on non-contact recorded signals while driving [
22].
A new challenge facing the scientific community is the processing of inferior quality signals recorded in everyday conditions without the supervision of medical experts. Inferior quality is a consequence of limiting factors such as measurement without direct skin contact with the driver (recording over clothes made of different materials), the dominant presence of movements during driving, as well as the higher sensitivity of capacitive electrodes to noise and the capacitive electrodes being more affected by motion artifacts than the conductive one [
11]. An cECG time series recorded during driving has an extremely low amplitude, approximately 1 mV, which is up to several tens of times lower intensity than the signals recorded in controlled clinical conditions [
23], with the predominant presence of movement artifacts (on average, 30% of the time series recorded during driving is useless [
24]). Despite such low quality, it has been shown that it is possible to reliably detect the R peak (the local maximum of the ECG signal), based on which the heart rate (HR) is estimated [
10]. A higher cECG amplitude level was achieved by proposed different hardware realization proposed in [
11].
This paper aims to develop a machine learning model for the detection of stress levels based on capacitive electrocardiogram (c)ECG recorded in an unobtrusive acquisition system that could be applicable in daily driving. The basic idea of the proposed method is to select a feature list based only on characteristic points of cECG that are possibly reliably estimated regardless of the cECG quality level, recorded during driving. The feature list is also constrained by feature selections that are not computationally demanding to enable the application of this model in real-time conditions. We expect that the proposed model will achieve high performance regardless of the quality of datasets and hardware realization acquisition system. For the development and validation of model, we used three open database sets: (1) ECG recorded by direct skin contact during driving (Database 1, the highest quality), (2) cECG recorded non-direct skin contact through clothes by capacitive electrodes built into a portable multi-modal cushion during driving (Database 2, modest quality), and (3) cECG recorded through the clothes no-direct skin contact by capacitive electrodes built into a car seat during driving (Database 3, the lowest quality). In each experiment, drivers were exposed to different stress levels: city driving (higher stress level) and highway driving (lower stress level) in Database 1, driving with distraction (by talking, or obligating driver’s moving according to predefined protocol) as higher stress level and driving without distraction as lower stress level in Database 2, and finally city driving (higher stress level) and open roads driving (lower stress level) in Database 3.
2. Materials and Methods
2.1. Materials
The proposed method is validated on three open datasets recording during driving: ECG recorded by electrodes with direct contact with skin; cECG recorded without direct skin contact through clothes by capacitive electrodes built into a portable multi-modal cushion; and cECG recorded through the clothes without direct skin contact by capacitive electrodes built into a car seat. All drivers that participated in the experiment were healthy volunteers.
2.1.1. ECG Recorded by Direct Skin Contact during Driving (Database 1)
The acquisition system consisted of an electrocardiogram (ECG), electromyogram, skin conductivity (electrodermal activation, and galvanic skin response), and respiration sensors [
25]. The FlexComp analog-to-digital converter (sampling frequency is 496 Hz) was connected to an embedded computer.
ECG electrodes were directly placed on the volunteer’s skin. The skin was prepared by using alcohol as a cleanser and Electro-Trace pre-gelled before ECG electrodes were placed [
26]. Modified lead II configuration was used to minimize motion artifacts and maximize the amplitude of the R waves.
In the experiment, 24 healthy volunteers were involved. Driver protocol consisted of a driving period of rest, highway, and city driving in the Boston area [
25]. The total duration, including rest periods, was from 50 min to 1.5 h, depending on traffic conditions. The stress level metric was estimated based on a driver questionnaire and a score derived from observable events and actions coded from video recordings. Obtained results have shown that a high level of stress was achieved for driving in city environments, and a medium level in driving highway environments. An open data set provided data recorded on 16 out of 24 drivers with complete recorded data and metrics for stress levels. The mean length [samples] ± SD of cECG is 219,590 ± 40,860 samples.
The database with all experimental recordings is publicly available in [
27], and the experiment is described in detail in reference [
25]. All volunteers gave written consent. For a clearer presentation of the results in tables and figures, these data are marked with Database 1.
2.1.2. cECG Recorded by Non-Direct Skin Contact through Clothes by Capacitive Electrodes Built into a Portable Multi-Modal Cushion during Driving (Database 2)
The acquisition system consists of four sensor units that include a reflective photoplethysmography sensing unit, a magnetic induction measurement sensing unit, an accelerometer, and one electrode for cECG built into a portable multi-modal cushion. The sampling frequency for each modality is 128 Hz. The cECG electrode is realized using a shielded high-impedance operational amplifier, OPA140 (Texas Instruments, Dallas, TX, USA). The amplifier output by coaxial cable is led to the controller box, where the ECG leads were obtained. Detailed block schemes and photographs are given in [
11].
In the experiment, 20 healthy subjects (18 males and 2 females, aged 25.9 ± 8.53 years) were used driving simulators for around 25 min [
11]. The open-source simulator CARLA was used for simulating driving, with the selected environment “Town04” [
11]. The driving was simpler than the driving conditions in real-world conditions because the drivers were not obliged to follow the traffic rules.
The experiment consisted of four stages: stage 1 (ST1)-driving in a salient environment (without talking with the driver), stage 2 (ST2)-movement of the driver during driving according to protocol, stage 3 (ST3)-distracting the driver with a conversation that simulates the presence of passengers, and stage 4 (ST4) siting without driving and talking [
11]. The mean length of cECG, expressed in samples, is 722,332.
The database with all experimental recordings is publicly available in [
28], and the experiment is described in detail in reference [
11]. All volunteers gave written consent.
The study protocol was a by the ethics committee of RWTH Aachen University Hospital (EK 183/22). For a clearer presentation of the results in tables and figures, these data are marked with Database 2.
2.1.3. cECG Recorded through the Clothes No-Direct Skin Contact by Capacitive Electrodes Built into a Car Seat during Driving (Database 3)
The acquisition system was based on 6 capacitive electrodes built into the car seat. The electrodes were arranged on three levels per two capacitive electrodes in the upper part of the car seat. Three electrodes with the most reliable measurements were manually selected [
29]. The sampling frequency was 200 Hz. A detailed block scheme and photograph was given in [
10,
29].
In the experiment, 6 male volunteers (mean aged 39.8 ± 26.2 years) were involved. cECG time series were recorded during driving in the city (about 2 h of recording), on the highway (about 8.8 h of recording), and in the polygon in Belgium (about 2.5 h of recording) in real-world conditions [
29]. The total number of measurements was 31 measurements, consisted of three cECG time series (marked by cECG
1 = Electrode
1 − Electrode
2, cECG
2 = Electrode
2 − Electrode
3, and cECG
3 = Electrode
3 − Electrode
1), as well as the reference ECG signal. The mean length of cECG, expressed in samples, is 1556.93 per channel. The estimated level of signal-to-noise ratio SNR was −40.01 ± 33.64 dB, for cECG
1 and −30.99 ± 23.39 dB for cECG
2 during driving [
24]. The negative value of SNR was a consequence of very high amplitude values of the coarse artifacts if compared to the useful parts of the signal (medical experts labeled useful and useless parts of the cECG time series). The high standard deviation is a consequence of the different amounts of coarse artifacts in the cECG time series in Database 3.
The database with all experimental recordings is publicly available in [
28], and the experiment is described in detail in reference [
29]. All volunteers gave written consent. For a clearer presentation of the results in tables and figures, these data are marked with Database 3.
The mean absolute amplitude value of (c)ECG time series, number of participants, and sampling frequency ECG for Database 1 (direct skin contact), Database 2 (non-direct contacts by capacitive electrodes built-in portable cushion), and Database 3 (non-direct contacts by capacitive electrodes built-in car seat) is shown in
Table 1.
It is noticeable that the hardware solution for cECG acquisition (Database 2), proposed in [
11], achieved higher amplitude intensity during driving compared to Database 3. Although the amplitude intensity of Database 1 (recorded with direct skin contact) is comparable to the amplitude value achieved in Database 3, the signal quality is significantly better due to fewer motion artifacts as a consequence of direct skin contact.
Table 1 also shows that the sampling frequency is different for each dataset.
2.2. The Proposed Method
2.2.1. Framework of the Proposed Method
In the recent literature (e.g., [
19,
26,
30]), stress level detection was determined based on parameters extractions from the ECG signal, muscle activity or electromyogram (EMG), skin conductance or electrodermal activity (EDA), and respiration (RSP) recorded by direct contact with the skin (high-quality recordings).
A new challenge facing the science community is the development of methods for estimation of the stress level based on unobtrusive recorded signals, through clothing, so that these acquisition systems can be candidates for everyday use in the automotive industry. In the first place, these signals are characterized by an inferior quality compared to the gold standard recordings in controlled clinical conditions, especially due to the significant presence of motion artifacts and recordings without direct skin contact. Inferior quality and the dominant presence of motion artifacts may contribute to the unreliably detection of P, Q, R, S, and T points (PQRST complex) in unobtrusively recorded cECG time series. In [
31], it is emphasized that reliable identification of P and T waves in cECG is not possible, mostly due to the presence of movement artifacts.
To enable application in pervasive environments, in the study, we propose a novel machine learning (ML) model based on the extraction of only four features that request only R-peak detection (the local maximum of the ECG signal), which can be the most reliably detected characteristic point in inferior quality cECG time series [
10]. As shown in
Figure 1, this method consists of three main parts: preprocessing, feature extraction, and classification step.
The detailed steps of the proposed method are as follows:
cECG time series were processed by method design for the reduction of coarse motion artifacts in cECG recorded on moving subjects (explained in detail in [
24]);
Four feature extraction: heart rate (HR), the value of R peak, binarized approximate entropy of cECG time series (BinEn), and pNN (percentage of differences between adjacent normal cardiac intervals);
Different type of machine learning methods has been tested, but the most consistent classifier for stress detection levels for three datasets (Database 1, Database 2, and Database 3) has been shown.
2.2.2. Preprocessing cECG Time Series
We have recently proposed a method for artifact reduction in cECG time series, originally developed for cECG recorded on driving subjects (described in detail in [
24]). The preprocessing method was validated on cECG recorded during driving by capacitive electrodes built-in car seat (Database 3). Raw cECG time series were segmented to provide real-time application during driving. Artifact reduction is based on estimated fluctuation around linear trends into segmented time series, to identify the presence of coarse artifacts as a consequence of moving subjects during driving or slow-changing artifacts. All parameters used in the pre-processing step were estimated based on statistical parameters of observed
cECG.Briefly, the
cECG time series is marked with
x, and samples are denoted by
, where
represents the length of the time series. Vector of cumulative sums
Y(
i)
, i = 1, …,
N is formed following [
32]:
where the
kth centralized sample is formed by subtracting the mean value of the time series
from the
kth sample
.
The vector of cumulative sums Y(i), is divided into non-overlapping segments of length SL. represents segments, where j is used for the notation of particular segments and k denotes the samples within a particular segment, .
In the next step, from
segments are subtracted the approximated polynomial
of the
v-th order.
where
is the polynomial coefficients on a segment, and
v is thepolynomial order [
32].
We used linear polynomials (
v = 1) as recommended in [
32].
Detrended fluctuation analysis function of one segment,
, is calculated as the sum of the square value of the difference between the original value of the time series and the trend of a given segment divided with
[
32]:
In the first step, we check the presence of coarse artifacts in particular segment according to established criteria proposed in [
24]:
where max(
x) and min(
x) are maximum and minimum values of time series
x, respectively, and
M- the is the second moment of time series
x.
If the condition from Equation (5) is fulfilled, we then eliminate observed segments if
is greater than the threshold
[
24]:
where
is the median value of detrended fluctuation function of all segments in time series,
and
are the standard deviation of
and
x, respectively, max(
x), min(
x)-maximum and minimum value of time series
x, respectively,
M- the second moment, C is constant value from the range,
and
.
To check if coarse artifacts partially spill over the adjacent segments, adjacent segments are also checked [
24]. If the difference between
values and
or
of adjacent segments is larger than
, this segment is also eliminated. To eliminate small deviation from the linear trend, i.e., slow-changing artifacts, we compare the value of
with threshold
. To avoid the elimination of the useful part of cECG time series where the signal intensity is very small, such as in the case of Database 3, this step is applied only if the difference between the mean value of
and
is greater than
[
24].
2.2.3. Feature Extraction
The selection of features was based on the detection of the R peak, the local maximal ECG signal, as the point of the PQRST complex that we can detect with the highest reliability even if the cECG recording quality is not high, as is the case with Database 2. The list of features consists of only 4 features (R value, heart rate, percentage of differences between adjacent normal-to-normal (NN) intervals, and Binarized entropy of cECG time series), which are not computationally demanding and would enable the use of the proposed ML model in real-time. To the best of the author’s knowledge, R-value and Binarized entropy (
BinEn) are recognized as features for stress level detection during driving for the first time (a detailed overview of the features used to detect driving stress is presented in [
18]).
Feature Selection Based on R Peaks
In line with [
10], we used open-source ECG analysis (detailed described in [
33]) for R peaks detection in cECG. Three features out of four are selected based on R peak: R peak value, heart rate (
HR) based on the time interval between adjusted R peaks (marked as RRI), following:y
and percentage of differences between adjacent normal-to-normal (NN) intervals (
pNNx). NN intervals correspond to RRI intervals after artifact reduction, so preprocessing of cECG is an obligatory step in this case.
pNNx is estimated following [
34]:
where
is difference between adjusted
NN intervals, and
x is predefined value.
In this study, we selected a value of
x = 50 ms, according to [
34].
Binarized Entropy (BinEn)
The fourth feature presents binarized entropy of cECG, and also does not require detection of PQRST complex in cECG. Binarized entropy (
BinEn) [
35], as a derivative from approximative entropy (for a detailed explanation, see [
36]), is an adapted need of mobile crowdsensing systems.
Binarized entropy (
BinEn) is performed on differentially coded time series, which significantly speeds up the estimation of entropy value and increases the probability that signals fulfill stationarity conditions. The complexity of
BinEn is linear, and it is achieved by substituting the number of different real vectors with a number of different binary vectors [
35]. This method modification makes
BinEn applicable in real-time conditions.
A brief recapitulation of BinEn for a single data set is below.
In the first step, time series
is binary differentially encoded and split into
m-sized binary vectors [
35]:
where the delay
τ distances the elements of the vector from each other and
m is the size of the vectors. The recommended value for parameter
τ = 1, and
[
35]. In the
BinEn, the vectors are binary, so the number of different vectors is
and each vector can be assigned a decimal number
[
35]:
represents the number of occurrences of a certain vector series in the observed time series
[
35]:
—indicator function equal to one if the condition is met, and zero otherwise.
The estimation of the probability mass function of observed vectors in
is equal to:
Distance
, between each pair of vectors, is calculated according to the Hamming distance [
35]:
where
notes ex-or logic function, and
I the indicators function. The distance
between the vectors is a discrete variable that can have one of the
values, that is,
.
The matrix of the Hamming distance is denoted by
H(
m) [
35]. Elements of matrix H are the distance between the vector whose decimal represents
k and the vector whose decimal represents
n, (
). The probability that vector
occurs in
is estimated based on the value in matrix H, which gave information about which vectors are in distance less than
r from
, and Equation (11), which gave information about a number of vectors that are at the same distance:
In the next step, the value of summand
is calculated as the average of logarithm
[
35]:
The final value of
BinEn is estimated on model of approximate entropy (detail in [
36]) [
35]:
2.2.4. ML Algorithms
We used different classes of ML algorithms to investigate their potential in the classification of different stress levels in three datasets, recorded during driving with different levels of quality (c)ECG time series.
K nearest neighbors (KNN), as one of the simpler ML algorithms, makes a classification based on the K of the nearest neighbors [
37]. The distance between test and training objects in feature space was measured by Euclidean distances. Based on Euclidean distances, test data is classified into: higher stress level (city driving) and lower stress level (driving in highway conditions) for Database 1, higher stress level (distraction of driver by speaking and self-movement during driving) and lower stress level (driving in silent conditions) for Database 2, and higher stress level (city driving) and lower stress level (driving in highway and training ground) for Database 3. The number of K is selected by rules of thumb, k =
.
The Support Vector Machine (SVM) is another class of machine learning that we used to investigate the potential in classification for three datasets, independently. SVM is based on statistical learning theory. It maps the input data onto a higher-dimensional feature space, with the idea of finding a hyperplane that separates space perfectly into its two classes [
38].
The Artificial Neural Networks (ANN), as one of the most effective machine learning tools [
37], is also investigated. We used a two-hidden-layer feed-forward network, where the weight coefficients were estimated in one direction from the input to the output layer without looping back. We used mean-square error as the loss function, and hyperbolic tangent sigmoid function as the activation function. The number of neurons in hidden layers was 20, the learning rate was 0.5, and we used networks with backpropagation training algorithm.
2.3. Statistical Analysis
Considering that a small number of healthy volunteers participate in experiments (
Table 1), we divided the cECG time series recorded while driving into non-overlapped parts with a duration of 50s. This was made possible by long hours of recordings within each database. The time duration to complete the database is given in
Table 1. The total size of the data set was 1098 (Database 1: 254, Database 2: 189 and Database 3: 655) after excluding cECG time series for which we did not have all the labeled data, the cECG time series that after preprocessing did not have satisfactory length for useful part of a signal (special in case Database 3), and part of an experiment that was not the focus of this study (for example, recording was in the rest period of Database 1, or siting without driving and talking in Database 2). In this way, we have provided a database of suitable size for testing and validating ML algorithms. Datasets were normalized by subtracting the mean value of each feature and a division by the standard deviation.
The datasets are imbalanced (Database 1: 72.44% is higher stress class, and 27.56% lower stress class, Database 2: 33.33% is higher stress class, and 66.67% lower stress class, and Database 3: 25.19% is higher stress class, and 74.81% lower stress class), so we opted for balanced accuracy. We also estimated Matthew’s correlation coefficient (MCC) because both classes are equally important, and it summarizes the classifier performances in a single value (from −1 to 1).
Classification performance is calculated according to the following expressions:
The classification performance was tested in the context of quantifying the success of the detection of the stress level. We distinguish between two classes: higher stress level (city driving) and lower stress level (driving in highway conditions) for Database 1, higher stress level (distraction of driver by speaking and self-movement during driving) and lower stress level (driving in silent conditions) for Database 2, and higher stress level (city driving) and lower stress level (driving in highway and training ground) for Database 3.
In this context, TP denotes the number of cases correctly identified as higher stress level, FP denotes the number of cases incorrectly identified as higher stress level, TN denotes the number of cases correctly identified as lower stress level, FN denotes the number of cases incorrectly identified lower stress level for each database sets. Training and test datasets consisted of cECG segments of different groups of subjects. In both training and test datasets, both classes (high and low stress levels) are present in an imbalanced proportion that is characteristic of these databases. Cross-validation is used to evaluate and validate the performance of machine learning models.
Some of the illustrative results are presented as graphs showing mean ± standard deviation. Statistical significance between observed groups was checked by t-test for paired samples in MATLAB R2019a. We used a significance level p < 0.01 for all compared groups.
4. Conclusions
The main contribution of the study is the development of an ML-based method for detecting stress levels during driving, which has met the criteria for application in pervasive healthcare systems. The method consists of a pre-processing step, extraction of only four features, and classification by an artificial neural network. The proposed method is adapted to cECG signals recorded in unobtrusive acquisition systems, over clothes in a comfortable environment that enables everyday application. The feature list is formed based on two criteria: possible reliable detection of characteristic points regardless of quality recorded time series (recording conditions are harsher than clinical recordings, due to measurements over clothes and the movement of the driver while driving) and computationally non-demanding features to enable the real-time application. The proposed method is validated on three open datasets with different quality of recordings, due to the use of different hardware realization for the acquisition system of cECG and different conditions of recordings. High performances are achieved in terms of balanced accuracy and MCC for Database 1 (direct skin contact recordings), for Database 2 (non-direct skin contact, measurement through clothes by capacitive electrodes built-in portable cushion), and for Database 3 (non-direct skin contact, measurement through clothes by capacitive electrodes built-in car seat). The Matlab code of the proposed method is available per request.
In the future, the proposed model can be an integral part of an automatic diagnostic system based on unobtrusive recorded cECG, following the proposals from [
39,
40,
41], which were developed for ECG recorded in controlled clinical conditions.