1. Introduction
Out-of-hospital cardiac arrest (OHCA) is one of the leading causes of death worldwide, with an annual incidence of 67 to 170 per 100,000 inhabitants in Europe and survival rates at hospital discharge of around 8% (0% to 18%) [1]. OHCA survival depends on several crucial factors, including bystander cardiopulmonary resuscitation (CPR) with emphasis on chest compressions (CCs), early defibrillation, and the overall standard of medical care provided by the emergency medical services (EMS) [2].
Recognizing the patient’s cardiac rhythm throughout resuscitation is crucial for two key reasons: first, to guide therapy according to the treatment pathways defined by the international guidelines; and second, to retrospectively analyze the patient’s response to treatment. Regarding the first case, resuscitation guidelines emphasize the need for discriminating between shockable (Sh) rhythms, comprising ventricular fibrillation (VF) and pulseless ventricular tachycardia (VT), and non-shockable (NSh) rhythms, which include both organized (OR) rhythms and asystole (AS). The Sh/NSh discrimination is the most crucial decision during resuscitation, as defibrillation is the only treatment capable of restoring the normal function of the heart when a Sh rhythm is present [2,3]. A finer classification of the cardiac rhythm may also be needed, especially within the NSh rhythm group, to determine other decisive therapies. For instance, the recommended treatment for AS consists of high-quality CPR, early administration of adrenaline, and identification of the underlying cause of the arrest [4], whereas the presence of an OR could be indicative of the return of spontaneous circulation, in which case the patient should be transported to hospital for post-resuscitation care and recovery [2]. As for the retrospective debriefing of resuscitation episodes, knowledge of the patient’s rhythm throughout the episode may offer valuable insights into the interaction between therapy and physiological response [5,6,7]; this may help identify optimal treatment strategies or clinical interventions that improve OHCA survival. One of the limitations of such retrospective studies is the lack of OHCA databases that include cardiac rhythm annotations by expert clinicians, mostly due to the expensive and time-consuming manual labor required. Given all this, there is a clear need for the development of multiclass algorithms that automatically identify the patient’s cardiac rhythm, both in real time and retrospectively.
The state-of-the-art OHCA rhythm classification algorithms are mainly based on the analysis of the ECG, typically consisting of an ECG feature extraction stage followed by a machine learning (ML) classifier. ECG feature extraction has been approached in the time [8,9], frequency [10,11], combined time–frequency [12], and complexity [13] domains. The ML approaches explored for the classification stage include K-nearest neighbors [14,15], support vector machines [16,17], artificial neural networks [15], and ensembles of decision trees [18,19]. Recently, OHCA rhythm classification has shifted toward deep learning (DL) techniques, such as convolutional neural networks (CNNs) [20] or residual networks (ResNets) [21], which avoid the knowledge-based feature extraction process of traditional ML models. As determining the need for defibrillation is crucial in OHCA, the discrimination between Sh and NSh rhythms has been the classification problem most commonly addressed by these algorithms. However, a finer-grained cardiac rhythm classification is needed to determine other decisive therapies during CPR [22,23]. To address this, Rad et al. [15] introduced the first multiclass OHCA rhythm classifier, in which a set of features derived from the discrete wavelet analysis of the ECG was fed into different ML-based classifiers.
These ML- and DL-based binary and multiclass algorithms have primarily focused on rhythm classification during interruptions in CCs, as the mechanical activity during CPR introduces ECG artifacts that hinder accurate rhythm detection. As a result, current commercial defibrillators require rescuers to pause CCs every 2 min for rhythm analysis. However, these interruptions reduce blood flow to vital organs, decreasing the chances of survival [24]. Over the past few decades, efforts have been made to develop more accurate rhythm analysis algorithms that could be applied during CCs, thus helping minimize CC interruptions [10,25]. These algorithms typically follow a similar structure to those used during non-CC intervals, but they include a preliminary filtering step to remove CC-induced artifacts. Such methods have proven successful both for Sh/NSh and for multiclass rhythm classification during OHCA [26,27].
While earlier methods primarily focused on manually delivered CCs, the use of mechanical compression devices in OHCA assistance has notably increased. These devices provide CCs at a constant rate and depth, and although evidence supporting improved survival remains inconclusive [28,29,30], their growing adoption underscores their potential benefits. Mechanical devices help ensure the quality of CCs in line with current resuscitation guidelines, even in situations where manual CPR might be compromised, such as during transport [31,32], in confined spaces, or during prolonged resuscitation efforts when rescuer fatigue may impact performance [33]. In addition, these devices alleviate the physical burden on healthcare providers, allowing them to focus on other aspects of patient care. However, while some studies have addressed Sh/NSh rhythm classification in the context of mechanical CPR [34,35], there is as yet no solution for multiclass rhythm classification during mechanical CPR.
In this study, we introduce the first DL solutions for reliable multiclass cardiac rhythm classification during mechanical CPR, distinguishing Sh, AS, and OR rhythms. The DL framework employs CNNs and ResNets for the direct classification of the ECG. To assess whether DL techniques improve rhythm classification during mechanical CPR over the existing state of the art, a traditional ML-based classifier was used as a baseline. The ML framework consists of a feature extraction stage, in which more than 90 features drawn from over two decades of ECG rhythm classification research are computed, and a random forest (RF) classifier.
From a clinical perspective, a reliable multiclass rhythm classification during mechanical CPR would provide several benefits. It would enable more accurate identification of shockable rhythms, improving defibrillation timing and enhancing therapeutic decision making. Additionally, it would allow for more precise management of non-shockable rhythms, such as asystole, by ensuring timely and optimal treatment like adrenaline administration. This would not only improve resuscitation efficiency, but also enhance post-resuscitation care by reliably detecting the return of spontaneous circulation (OR), optimizing patient outcomes. Furthermore, a multiclass rhythm classifier would be useful for the retrospective annotation of rhythms, potentially offering valuable insights into the interaction between therapy and patient response. This could ultimately contribute to identifying optimal treatment strategies, refining resuscitation protocols, and improving survival chances in OHCA.
2. Materials
The data used in this study were collected from the Circulation Improving Resuscitation Care (CIRC) trial, which was designed to compare automated load-distributing band CPR (LDB-CPR) with high-quality manual CPR (M-CPR) in terms of survival [28,36]. Data were gathered between 5 March 2009 and 11 January 2011 in a randomized, unblinded, controlled group sequential trial of OHCA patients by three US (Fox Valley, Hillsborough, and Houston) and two European (Vienna and Nijmegen) EMS. After EMS providers initiated manual CCs, patients were randomized to receive either LDB-CPR or M-CPR. The LDB device (AutoPulse, ZOLL Medical, Chelmsford, MA, USA) delivered CCs in a fixed position, with a constant depth of 20% of the patient’s anterior-posterior chest diameter and at a constant rate of 80 min⁻¹ (1.33 Hz).
Anonymized data from Lifepak 12 and 15 monitor defibrillators were exported to MATLAB (MathWorks Inc., Natick, MA, USA) using Physio-Control’s CODE-STAT data review software, and they were then resampled to a sampling frequency of 250 Hz. The data included the ECG and thoracic impedance (TI) signals of each episode together with the CC instants detected by the CODE-STAT software.
Figure 1 corresponds to a 70 s interval from an OHCA episode, where the ECG (corrupted by CCs) and TI signals are shown in Panels (a) and (c), respectively. The blue circles on the TI signal indicate the CC instants detected by the CODE-STAT software. As can be seen, each fluctuation in the TI signal corresponds to a CC administered by the EMS. Furthermore, in Figure 1, two series of CCs can be clearly distinguished in the 0–15 s and 47–70 s time intervals, corresponding to M-CPR and LDB-CPR, respectively. The middle interval, from 15 s to 47 s, corresponds to a segment without CCs and is, therefore, free of CC artifacts. Note the clear difference in the TI pattern during LDB-CPR and M-CPR, with a much larger amplitude and a more regular pattern for LDB-CPR due to the constant depth and rate of the mechanical CCs. Panel (b) of Figure 1 shows the instantaneous CC rate derived from the CC instants marked in Panel (c). It can be observed that the CC rate for manual CPR was variable and fluctuated around 140 min⁻¹, but when the LDB device was applied, the CC frequency stabilized at 80 min⁻¹.
Episodes where LDB-CPR was administered were used to conduct this study, and the application of the LDB device was identified when the CC rate stabilized at the device’s fixed rate of 80 min⁻¹ for at least 20 s (notice the activation of the LDB device in Panel (b) of Figure 1). Then, 22 s signal segments were extracted, each corresponding to a single cardiac rhythm and comprising a 6 s CC-free interval and a 16 s interval corrupted by CC artifacts (refer to the highlighted segment in Figure 1). The intervals during CCs were used as inputs for the multiclass decision algorithms, while the artifact-free intervals were employed to annotate the real underlying rhythm of the patient. Rhythms were annotated as Sh, AS, or OR. The segments corresponding to the EMS of Hillsborough, Nijmegen, and Vienna were annotated by consensus of three biomedical engineers and subsequently audited by a clinician specialized in the resuscitation field. These segments were used to evaluate the performance of the multiclass decision algorithms. The segments corresponding to the remaining two EMS were not audited by a clinician and were, therefore, used to train the algorithms.
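To make the activation criterion concrete, the following is a minimal NumPy sketch of this detection rule. The function name and the tolerance around the 80 min⁻¹ target are illustrative assumptions, not values taken from the study:

```python
import numpy as np

def find_ldb_intervals(cc_times, rate=80.0, tol=5.0, min_dur=20.0):
    """Locate intervals where the instantaneous CC rate stabilizes at
    the LDB device's fixed rate (80 min^-1) for at least min_dur seconds.
    cc_times: compression instants in seconds (e.g., from CODE-STAT).
    Returns a list of (start, end) times in seconds."""
    cc_times = np.asarray(cc_times, dtype=float)
    inst_rate = 60.0 / np.diff(cc_times)          # rate in min^-1
    stable = np.abs(inst_rate - rate) < tol

    intervals, start = [], None
    for i, ok in enumerate(stable):
        if ok and start is None:
            start = cc_times[i]                   # run of stable CCs begins
        elif not ok and start is not None:
            if cc_times[i] - start >= min_dur:
                intervals.append((start, cc_times[i]))
            start = None
    if start is not None and cc_times[-1] - start >= min_dur:
        intervals.append((start, cc_times[-1]))
    return intervals
```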
The final database consisted of 15,479 segments extracted from 2058 patients, of which 9666 segments (1252 Sh, 3865 AS, and 4549 OR) from 1178 patients were used to train/develop the multiclass decision algorithms, and 5813 segments (1154 Sh, 1616 AS, and 3043 OR) from 880 patients were used to test performance.
All information regarding ethics approval, data collection procedures, patients’ inclusion/exclusion criteria, and related ethical considerations is thoroughly outlined in the original clinical trial papers from which the data in this study were derived [28,36]. In summary, the collection of defibrillator files was approved by the Institutional Review Board (IRB) or ethics committee of the lead EMS agency at each study site, and it was conducted under the Exception From Informed Consent (EFIC) for emergency research issued by the US Food and Drug Administration and the applicable laws of The Netherlands and Austria.
3. Methods
This study proposes and evaluates different algorithms for the three-class (Sh, AS, and OR) classification of ECG segments corrupted by LDB CC artifacts. All the algorithms were composed of two main stages: (1) an adaptive filter based on Recursive Least Squares (RLS) to eliminate CC artifacts from the ECG, and (2) a classification stage, for which RF, CNN, and ResNet models were optimized to classify the rhythm in the filtered ECG as Sh, AS, or OR. All the classification models were designed to analyze the filtered ECG in the interval from 2 s to 14 s (see the highlighted interval in Figure 2), which may help avoid the filtering transients of the RLS filter. In what follows, time is expressed as $t = nT_s$, where $T_s$ is the sampling period ($T_s = 1/250$ s $= 4$ ms) and $n$ is the sample index.
3.1. CPR Artifact Suppressing Filter
During CPR, the corrupted ECG signal, $s_{cor}(n)$, recorded by the defibrillator, can be expressed as
$$s_{cor}(n) = s_{ecg}(n) + s_{cc}(n),$$
where $s_{ecg}(n)$ is the patient’s uncorrupted ECG, reflecting the actual underlying heart rhythm, and $s_{cc}(n)$ represents the artifact introduced by CCs.
An adaptive RLS filter, tailored for removing periodic interferences [37,38], was used to obtain an estimate of the CC artifact, $\hat{s}_{cc}(n)$, which was then subtracted from $s_{cor}(n)$ to obtain the filtered ECG, $\hat{s}_{ecg}(n)$, i.e., an estimate of the true underlying heart rhythm. In this approach, the artifact is assumed to be quasi-periodic, and it is modeled as a truncated Fourier series of $N$ terms:
$$\hat{s}_{cc}(n) = \sum_{k=1}^{N} \left[ a_k(n) \cos(k\,\omega_0 n) + b_k(n) \sin(k\,\omega_0 n) \right],$$
where $\omega_0 = 2\pi f_0 T_s$ is the fundamental discrete frequency of CCs—which, for a LDB device, is constant, with $f_0 = 80$ min⁻¹ $\approx 1.33$ Hz—and $T_s$ is the sampling period. Based on this model, the estimated artifact can be expressed in vector format as
$$\hat{s}_{cc}(n) = \mathbf{w}^{T}(n)\,\mathbf{r}(n),$$
where vectors $\mathbf{w}(n)$ and $\mathbf{r}(n)$, respectively, define the time-varying coefficients and the in-phase and quadrature reference signals of the Fourier series:
$$\mathbf{w}(n) = \left[ a_1(n),\, b_1(n),\, \ldots,\, a_N(n),\, b_N(n) \right]^{T},$$
$$\mathbf{r}(n) = \left[ \cos(\omega_0 n),\, \sin(\omega_0 n),\, \ldots,\, \cos(N\omega_0 n),\, \sin(N\omega_0 n) \right]^{T}.$$
The RLS filter estimates the $a_k(n)$ and $b_k(n)$ coefficients adaptively over time so that the error between the corrupted ECG, $s_{cor}(n)$, and the estimated artifact, $\hat{s}_{cc}(n)$, is minimized at each iteration at the harmonics of the LDB device compression frequency, $f_0$. Note that, in this configuration (Figure 2), the error signal corresponds to the filtered ECG, $\hat{s}_{ecg}(n)$. The update equations of the filter are given by
$$\hat{s}_{ecg}(n) = s_{cor}(n) - \mathbf{w}^{T}(n-1)\,\mathbf{r}(n),$$
$$\mathbf{g}(n) = \frac{\mathbf{P}(n-1)\,\mathbf{r}(n)}{\lambda + \mathbf{r}^{T}(n)\,\mathbf{P}(n-1)\,\mathbf{r}(n)},$$
$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{g}(n)\,\hat{s}_{ecg}(n),$$
$$\mathbf{P}(n) = \lambda^{-1}\left[ \mathbf{P}(n-1) - \mathbf{g}(n)\,\mathbf{r}^{T}(n)\,\mathbf{P}(n-1) \right],$$
where the gain matrix, $\mathbf{P}(n)$, and the coefficient vector were initialized to $\mathbf{P}(0) = \delta\mathbf{I}$ and $\mathbf{w}(0) = \mathbf{0}$. As shown in the previous equations, there are two configurable parameters in the RLS filter: the number of harmonics, $N$, that model the artifact, and the forgetting factor, $\lambda$, which provides a trade-off between the filter’s adaptability and stability. These values were fixed to the optimal RLS configuration identified in [35], where the RLS filter was evaluated in terms of the performance of a Sh/NSh decision algorithm applied to the filtered signal. Panel (a) of Figure 2 shows the adaptive RLS filtering schema, while Panel (b) displays the input and output signals. From top to bottom, the following are shown: the corrupted ECG, $s_{cor}(n)$; the estimated CC artifact, $\hat{s}_{cc}(n)$; and the filtered ECG, $\hat{s}_{ecg}(n)$, revealing the underlying rhythm of the patient (which, in this case, corresponds to an OR rhythm).
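As an illustration of this filtering stage, here is a minimal NumPy sketch of the RLS artifact canceller described above. The values of N, lam (λ), and delta (δ) are placeholders, since the study fixes $N$ and $\lambda$ to the optimum reported in [35]:

```python
import numpy as np

def rls_cc_filter(s_cor, fs=250.0, f0=80 / 60, N=5, lam=0.999, delta=1.0):
    """Estimate and subtract a quasi-periodic CC artifact from the
    corrupted ECG s_cor, using in-phase/quadrature Fourier references
    at the harmonics of f0 (LDB rate, ~1.33 Hz). N, lam, delta are
    illustrative; the paper uses the optimal values from [35]."""
    n = np.arange(len(s_cor))
    w0 = 2 * np.pi * f0 / fs                 # fundamental discrete frequency
    k = np.arange(1, N + 1)[:, None]
    # Reference matrix r(n): 2N rows (N cosines followed by N sines).
    r = np.vstack([np.cos(k * w0 * n), np.sin(k * w0 * n)])

    w = np.zeros(2 * N)                      # Fourier coefficients a_k, b_k
    P = delta * np.eye(2 * N)                # gain (inverse correlation) matrix
    s_ecg = np.zeros(len(s_cor))

    for i in range(len(s_cor)):
        ri = r[:, i]
        e = s_cor[i] - w @ ri                # a priori error = filtered ECG
        g = P @ ri / (lam + ri @ P @ ri)     # gain vector
        w = w + g * e                        # coefficient update
        P = (P - np.outer(g, ri @ P)) / lam  # gain matrix update
        s_ecg[i] = e
    return s_ecg
```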
3.2. Optimization and Evaluation
The training set, composed of 9666 segments derived from 1178 patients, was used in a 10-fold cross validation (CV) approach for hyperparameter optimization and, in the case of RF models, for feature selection; details on the hyperparameters and feature selection are provided in the upcoming sections corresponding to each model. Data were divided patient-wise, and it was also ensured that every fold retained a prevalence of each rhythm comparable to that of the entire dataset.
The remaining 5813 segments, corresponding to 880 patients, were used to evaluate the performance of the classifiers. For each class $i \in \{\mathrm{Sh}, \mathrm{AS}, \mathrm{OR}\}$, the sensitivity ($SE_i$), positive predictive value ($PPV_i$), and $F_1$-score ($F_i$) were computed, and the unweighted mean of all sensitivities (UMS), total accuracy (ACC), and unweighted mean of all the $F_1$-scores (UMFS) were used as summarizing metrics:
$$SE_i = \frac{TP_i}{TP_i + FN_i}, \qquad PPV_i = \frac{TP_i}{TP_i + FP_i}, \qquad F_i = \frac{2\,SE_i\,PPV_i}{SE_i + PPV_i},$$
$$\mathrm{UMS} = \frac{1}{3}\sum_{i} SE_i, \qquad \mathrm{UMFS} = \frac{1}{3}\sum_{i} F_i, \qquad \mathrm{ACC} = \frac{\sum_{i} TP_i}{\sum_{i} \left( TP_i + FN_i \right)},$$
where $TP_i$, $TN_i$, $FP_i$, and $FN_i$ are the true positives, true negatives, false positives, and false negatives for class $i$, respectively.
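These metrics can be computed directly from the multiclass confusion matrix; a minimal sketch follows (the function name and class ordering are illustrative):

```python
import numpy as np

def summary_metrics(cm):
    """Per-class and summary metrics from a 3x3 confusion matrix cm,
    with cm[i, j] = number of class-i segments classified as class j
    (classes ordered, e.g., Sh, AS, OR)."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp

    se = tp / (tp + fn)                  # sensitivity per class
    ppv = tp / (tp + fp)                 # positive predictive value
    f1 = 2 * se * ppv / (se + ppv)       # F1-score per class

    return {
        "UMS": se.mean(),                # unweighted mean of sensitivities
        "UMFS": f1.mean(),               # unweighted mean of F1-scores
        "ACC": tp.sum() / cm.sum(),      # total accuracy
    }
```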
In order to estimate the statistical distributions of the performance metrics, the test set was split into 20 patient-wise replicas, each of which included 44 patients. Performance metrics were reported as the mean (standard deviation, SD), as they passed the Kolmogorov–Smirnov normality test. Finally, a paired t-test was performed to test for equal means of the performance metrics between the deep learning methods and the state-of-the-art method (RF), as well as between the ResNet and CNN models. A p-value < 0.05 was considered statistically significant.
3.3. Algorithm Based on CNNs
Figure 3a shows, in blue, the architecture of the multiclass OHCA rhythm classification algorithm based on CNNs. This architecture is based on the one proposed in [26] to discern Sh and NSh rhythms during manual CPR. The filtered ECG (a 1D signal of 12 s, i.e., 3000 samples at 250 Hz) was fed to a CNN composed of $B$ convolutional blocks (Figure 3 shows a three-block network as an example; $B$ is a hyperparameter adjusted during the training phase), which extract the high-level features of the signal, followed by two fully connected layers for the three-class classification. The $b$-th convolutional block is composed of a one-dimensional convolutional layer (Conv1D) with $M_b$ filters of width $I$, followed by a batch normalization (BN) layer, a rectified linear unit (ReLU), a max-pooling layer, and a dropout layer.
The input to the first convolutional block is defined as $x_1(n) = \hat{s}_{ecg}(n)$. The expression $x_b(n, m)$ will refer to the input of block $b$ or, equivalently, the output of block $b-1$, where $n$ and $m$ represent the time and filter index, respectively. The output of the Conv1D at the $b$-th convolutional block can be formulated as follows:
$$y_b(n, m) = c_b(m) + \sum_{j=1}^{M_{b-1}} \sum_{i=0}^{I-1} w_b(i, j, m)\, x_b(n - i, j),$$
where the filter weights, $w_b$, and the biases for channel shifting, $c_b$, are the learnable parameters adjusted during training, and $M_0 = 1$ for the single-channel input.
BN layers modify the output of the preceding layer to prevent complex weight interactions from altering the data distribution. This accelerates training by allowing for the use of larger learning rates, and it improves generalization while reducing overfitting [39]. For every training mini-batch, a BN layer calculates the channel-wise means, $\mu_m$, and variances, $\sigma_m^2$, and it then normalizes each channel via the following equation:
$$\hat{x}_b(n, m) = \frac{x_b(n, m) - \mu_m}{\sqrt{\sigma_m^2 + \epsilon}},$$
where $\epsilon$ is a small value included to ensure numerical stability. The normalized channels are then adjusted through scaling and shifting to optimize the final ReLU layer. As a result, the outputs, $y_b(n, m)$, can be expressed as
$$y_b(n, m) = \gamma_m\, \hat{x}_b(n, m) + \beta_m,$$
where $\gamma_m$ and $\beta_m$ are trainable parameters. On inference, moving averages of the mini-batch means, $\mu_m$, and variances, $\sigma_m^2$, observed during training are typically applied in the normalization above.
Max-pooling layers downsample input data by taking the maximum value from each block of $K$ elements along the time dimension, $n$, so that the output for block $b$ can be represented as follows:
$$y_b(n, m) = \max_{0 \le k < K}\, x_b(nK + k,\, m).$$
Finally, the ReLU layers add nonlinearity to the network through the activation function $\mathrm{ReLU}(x) = \max(0, x)$, enabling the model to learn intricate nonlinear mappings.
Zero-padding was applied before the convolution operations, so the only reduction in dimensionality was due to the max-pooling layers. The dropout layer at the end of each block serves as a regularization mechanism, operating exclusively during training to prevent overfitting. This layer temporarily disables a randomly chosen fraction of the network’s adjustable parameters. The output of the last convolutional block was flattened and fed to a dense network composed of two fully connected layers with 10 and 3 neurons, respectively. Finally, a softmax layer transformed the output of the final 3 neurons into values in the range $[0, 1]$, representing the probabilities that a given segment corresponds to a Sh ($p_{Sh}$), AS ($p_{AS}$), or OR ($p_{OR}$) rhythm.
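For concreteness, the following is a minimal Keras sketch of this architecture using the optimal configuration reported in Section 4 (six blocks with 8–256 filters of width 16). The pool size, dropout rate, and hidden-layer activation are assumptions, as they are not specified above:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(n_samples=3000, filters=(8, 16, 32, 64, 128, 256),
              width=16, pool=2, drop=0.3):
    """CNN of B = len(filters) blocks (Conv1D -> BN -> ReLU -> MaxPool
    -> Dropout), followed by two dense layers and a softmax output,
    mirroring Figure 3a. pool/drop are illustrative assumptions."""
    x_in = layers.Input(shape=(n_samples, 1))        # filtered ECG segment
    x = x_in
    for f in filters:
        x = layers.Conv1D(f, width, padding="same")(x)  # zero-padded conv
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling1D(pool)(x)
        x = layers.Dropout(drop)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(10, activation="relu")(x)       # hidden dense layer
    out = layers.Dense(3, activation="softmax")(x)   # p_Sh, p_AS, p_OR
    return tf.keras.Model(x_in, out)
```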
3.4. Algorithm Based on ResNets
The second DL architecture explored was a ResNet, which mitigates the issue of performance deterioration as layers are added, enabling deeper networks [40]. The main components of a ResNet are residual blocks, which comprise a main path—including convolutional, batch normalization, and other typical CNN layers—and a shortcut path, which directly connects the input to the main path output. Let $x$ be the input to a residual block, and let $H(x)$ be the desired data transformation; instead of learning this transformation directly, residual blocks focus on learning the difference between the input and the output, called the residual $F(x) = H(x) - x$. This is achieved by the simple addition of the main path and shortcut path outputs, and it makes it easier for the network to learn by focusing on refining the input rather than completely transforming it.
Figure 3b shows the layout of the ResNet architecture [40], which, similar to Jaureguibeitia et al. [21], was designed to replicate that of the CNN, thus deepening the network while maintaining a coherent structure. As in the CNN, the network was composed of $B$ = 3, 4, 5, or 6 blocks, each consisting of two residual blocks following the main path pre-activation configuration (conv-BN-ReLU-conv-BN) proposed by Han et al. [41]; the first block of the network was an exception to this rule and consisted of a single, much simpler conv-BN-ReLU configuration with no shortcut path. Pooling layers were replaced by strided convolutions, which skip every other step in the filtering process. When adjustments to length and depth are needed, the shortcut path of the first residual block includes a strided convolution to create a linear projection of the input. Finally, the hidden fully connected layer was replaced by a global average pooling layer, which outputs the mean value of each input channel [42].
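A minimal Keras sketch of one such residual block follows, assuming the conv-BN-ReLU-conv-BN main path and the strided-convolution shortcut projection described above (layer parameters are illustrative):

```python
from tensorflow.keras import layers

def residual_block(x, n_filters, width=32, stride=1):
    """Residual block with a conv-BN-ReLU-conv-BN main path (as in Han
    et al. [41]) and a strided 1x1 convolution on the shortcut path when
    the length/depth of the input must be adjusted."""
    y = layers.Conv1D(n_filters, width, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv1D(n_filters, width, padding="same")(y)
    y = layers.BatchNormalization()(y)

    if stride != 1 or x.shape[-1] != n_filters:
        # Linear projection of the input to a matching shape.
        x = layers.Conv1D(n_filters, 1, strides=stride, padding="same")(x)
    return layers.Add()([x, y])          # shortcut + main path
```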
3.5. CNN and ResNet Configurations
The CNN parameters adjusted during the training phase were the following: the number of convolutional blocks, $B$; the width of the filters, $I$ (the same filter width was used across the $B$ blocks); and the number of filters, which varied from block to block. Six filter configurations ($C_1$–$C_6$), with an increasing number of filters (from sparse to dense), were studied; the values in parentheses in Table 1 correspond to the number of filters per block. For architectures with fewer than six blocks, central values (with upward bias) were selected. Therefore, for 3, 4, and 5 blocks, a configuration would be, for instance, as follows: (8, 16, 32), (4, 8, 16, 32), and (4, 8, 16, 32, 64), respectively. Table 1 shows the 6 specific filter configurations for each block number, $B$, applied to the CNN model. The final values of $B$, $I$, and the filter configuration were all optimized during the training phase.
As in the CNN, each convolutional layer in the ResNet used an identical filter width, $I$, selected from the same set of candidate values. Similarly, the possible configurations of the number of filters were selected from $C_1$–$C_6$, with the number of filters per block applied to each convolutional layer within that block. Table 2 shows the 6 specific filter configurations for each block number, $B$, applied to the ResNet model.
The validation/selection of hyperparameters was performed in two phases. First, the CNN and ResNet models were evaluated for a fixed filter configuration and for all combinations of the number of blocks, $B$, and filter width, $I$. The optimal filter width was selected for each number of blocks as that of the models scoring the best performance. Then, the models were evaluated using the optimal filter width obtained for each number of blocks and all candidate filter configurations, with the best performing model being selected to analyze the test data. In both cases, the performance criterion was the average of UMS and UMFS.
In both the CNN and ResNet architectures, the weights and biases of every layer were optimized to minimize the categorical cross-entropy using stochastic gradient descent with a momentum of 0.8. The initial learning rate was fixed at 0.02, and it was reduced by a factor of 0.8 at every epoch. The training process was conducted for 20 epochs with a batch size of 256 samples [43].
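In Keras terms, this training setup corresponds to the following sketch, where `build_cnn` is the sketch above and `x_train`/`y_train` stand for the filtered ECG segments and their one-hot-encoded rhythm labels (both assumed to exist):

```python
import tensorflow as tf

# SGD with momentum 0.8; initial learning rate 0.02 reduced by a factor
# of 0.8 every epoch; 20 epochs; batch size 256 (as described above).
model = build_cnn()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.02, momentum=0.8),
    loss="categorical_crossentropy", metrics=["accuracy"])
schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 0.02 * 0.8 ** epoch)
model.fit(x_train, y_train, epochs=20, batch_size=256,
          callbacks=[schedule])
```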
3.6. Comparison with the State of the Art: Classical Machine Learning
The performance of the CNN and ResNet models was compared with that of a state-of-the-art classical ML solution designed for multiclass OHCA rhythm classification during manual CPR [27]. In essence, the algorithm integrates a multi-resolution ECG analysis approach, employing the Stationary Wavelet Transform (SWT) for feature extraction and a RF classifier for the subsequent classification. The SWT decomposes the 12 s window into 7 detail coefficient (d1–d7) sub-bands using a Daubechies 4 mother wavelet. A denoised version of the ECG was also reconstructed using the detail coefficients d3 to d7, corresponding to an analysis frequency band of 0.98–31.25 Hz.
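This analysis band follows directly from the dyadic sub-band structure of the SWT at the 250 Hz sampling rate used here, where detail coefficient $d_k$ spans approximately
$$d_k: \left[ \frac{f_s}{2^{k+1}},\ \frac{f_s}{2^{k}} \right]\ \mathrm{Hz}, \qquad \text{so} \qquad d_3: 15.63\text{–}31.25\ \mathrm{Hz}, \quad \ldots, \quad d_7: 0.98\text{–}1.95\ \mathrm{Hz},$$
and the union of d3–d7 is, therefore, 0.98–31.25 Hz.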
From the denoised ECG and the detail coefficients d3–d7, ninety-three features were extracted to characterize the OHCA rhythm subtypes, representing over 25 years of research in the field. These features were divided into five analysis domains. Time domain features included characteristics like the mean and the standard deviation of the heart rate [34]; spectral features included classical measures like VF leakage [44] or the proportion of power concentrated around the VF band [45]; complexity analysis covered entropy measures like sample or Shannon entropy [46]; statistical analysis measures involved amplitude distribution characteristics; and, finally, phase space features utilized time-delay embedding to extract the dynamics of the ECG. A detailed description of each of the 93 features can be found in [27].
The training set was divided using a 10-fold CV approach to select the best subset of $K$ features and to optimize the RF hyperparameters. First, the optimal set of features was selected for each of the 10 training folds that constitute the 10-fold CV. Feature selection was based on a recursive feature elimination (RFE) approach using out-of-bag permutation importance as a ranking criterion [47,48]. Permutation importance is an inherent characteristic of the RF classifier that evaluates the significance of each feature by randomly shuffling its values in the training data of each tree in the forest and then measuring the resulting change in the out-of-bag error. In each iteration of the RFE algorithm, features were ranked, and the least important 3% were removed. This process was repeated until the optimal sets of $K$ features were selected for classification. Once the best $K$-feature subsets were selected in each of the 10 CV training folds, the RF classifier was optimized. Only one parameter of the RF classifier was considered for optimization: the minimum number of observations per leaf, $L_{min}$, which controls the depth of the trees and was identified in [27] as essential for preventing overfitting. For every CV training fold and subset of $K$ features, RF models with different $L_{min}$ values were trained over a predefined range, and these were then evaluated in the corresponding testing fold. The number of trees was fixed, and the number of predictors per split was set to its default value (the square root of the number of features) for both feature selection and $L_{min}$ optimization. This number of trees was found to be sufficient to stabilize accuracy without causing overfitting [49], and the default number of predictors per split achieved by far the best performance in [27]. Finally, $L_{min}$ was fixed to its default value during the feature selection process.
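A minimal scikit-learn sketch of this RFE loop follows. Permutation importance on a held-out fold stands in for the out-of-bag variant (which scikit-learn does not expose), and the number of trees, repeats, and function names are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def rfe_rf(X, y, X_val, y_val, n_keep, frac_drop=0.03):
    """Recursive feature elimination with an RF ranker: rank features
    by permutation importance and drop the least important 3% per
    iteration until n_keep features remain. Returns the kept column
    indices. min_samples_leaf stays at its default during selection,
    as in the paper."""
    idx = np.arange(X.shape[1])
    while len(idx) > n_keep:
        rf = RandomForestClassifier(n_estimators=100).fit(X[:, idx], y)
        imp = permutation_importance(rf, X_val[:, idx], y_val,
                                     n_repeats=5).importances_mean
        n_drop = max(1, int(len(idx) * frac_drop))  # least important 3%
        idx = idx[np.argsort(imp)[n_drop:]]         # keep the rest
    return idx
```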
4. Results
To set a reference for the CNN/ResNet results, the performance of the state-of-the-art classical ML algorithm was analyzed first.
Figure 4 and Figure 5 show the results obtained by this algorithm on the training data. The left panel of Figure 4 shows the mean performance metrics (UMS, UMFS, and the average of UMS and UMFS) as a function of the number of features, $K$; these were calculated by averaging, across all $L_{min}$ values considered, the CV performance metrics obtained for the corresponding $(K, L_{min})$ RF models. The best compromise between model simplicity and performance was identified as the point beyond which the metrics barely increased for greater values of $K$. The right panel of Figure 4 shows the CV performance metrics as a function of $L_{min}$; RF models with the selected number of features were considered based on the previous results. In terms of the average of UMS and UMFS, the optimal range for $L_{min}$ started at 1, with a significant decline in performance observed for larger values of $L_{min}$. The optimal $L_{min}$ was chosen as the value producing the most balanced UMS and UMFS results.
Figure 5 shows the 30 ECG features with the highest probability of selection for the optimized CV RF models. These probabilities were estimated by counting the number of times each feature was selected in the 10 iterations of the feature selection algorithm in the 10-fold CV loop. The nine most important parameters, i.e., those selected in 100% of the iterations, were highly heterogeneous, as they were derived from all detail coefficients as well as from the denoised ECG, and they corresponded to the complexity (SampEn), time (bCP), statistical (IQR, StdAbs1, Hmb, and Hcmp), and phase space domains (SkewPSD). The acronyms used in Figure 5 are the same as those found in [27], where a detailed description of each parameter is provided. Given these results, a single optimal RF configuration was defined using the optimal $L_{min}$ and the most important features, as per Figure 5. In a 10-fold CV loop over the training data, this configuration obtained identical UMS and UMFS scores of 81.3%. As a single model trained on the complete training set and evaluated on the test set, it obtained a UMS and UMFS of 85.3% and 85.1%, respectively. The mean (SD) of the performance metrics obtained using the 20 replicas of the testing set can be found in Table 3.
Regarding the performance of the DL models, the impact of altering the main parameters of the CNN architecture is depicted in Figure 6. The top row shows the CV performance metrics (UMS, UMFS, and the average of UMS and UMFS) for a varying filter size, $I$, and a fixed filter configuration. The best performance in terms of the average of UMFS and UMS (third column of both rows) was obtained for a filter width that depended on the number of blocks, with $I = 16$ for $B = 6$. The bottom row contains the results of the study of the effect of changing the filter configuration when the filter size was fixed to these optimal values. Two configurations achieved the best and very similar performances, with the six-block one being slightly better. Therefore, the optimal model for the CNN architecture was composed of six convolutional blocks with 8, 16, 32, 64, 128, and 256 filters, all of them of width 16. In the training set, this model achieved a UMFS and UMS of 83.0% and 84.8%, respectively. In the entire testing set, the model obtained a UMFS and UMS of 86.1% and 87.5%, respectively. The mean (SD) of the performance metrics obtained using the 20 replicas of the testing set can be found in Table 3.
Figure 7 analyzes the effect of changing the parameters of the ResNet architecture. As with the CNN, the filter width was optimized first with a fixed filter configuration (first row), followed by the selection of the filter configuration for the fixed optimal filter sizes (second row). The best classification results were obtained for four blocks. Adding a fifth block increased the complexity of the network (number of trainable parameters) and slightly decreased performance, while using only three blocks resulted in a large decrease in performance, i.e., an overly simplistic model. As the third column of the first row shows, the best performance in terms of the average of UMFS and UMS was obtained for a filter width that depended on the number of blocks, with $I = 32$ for $B = 4$. With the filter width fixed, the second row of Figure 7 shows that the best performance was achieved for four blocks. Thus, the optimal configuration of the ResNet architecture consisted of four blocks containing 6, 12, 24, and 48 filters, respectively, all of them of width 32; that is, the input block used 6 filters, and the remaining three blocks used 12, 24, and 48 filters in each of their residual blocks. In the training set, this configuration obtained a UMFS and UMS of 86.9% and 86.6%, respectively. In the entire testing set, the ResNet model obtained a value of 88.3% for both the UMFS and the UMS. The mean (SD) of the performance metrics obtained using the 20 replicas of the testing set can be found in Table 3.
Table 3 shows the results of the OHCA rhythm classification algorithms based on the classical ML, CNN, and ResNet models when the aforementioned optimal configurations were applied to the 20 replicas of the testing set. As the summarizing metrics show, the DL-based algorithms performed better than the traditional ML one. The CNN model outperformed the RF model by 1.3 percentage points in UMFS, 2.3 in UMS, and 1.4 in ACC (p < 0.05). The ResNet was the best-performing model, outperforming the CNN model by 1.8 percentage points in UMFS, 0.6 in UMS, and 1.7 in ACC (p < 0.05). As such, the ResNet model outperformed the RF model by 3.1 percentage points in UMFS, 2.9 in UMS, and 3.1 in ACC (p < 0.05). These results demonstrate, for the first time, that deep learning models offer better performance than classical ML methods for multiclass cardiac rhythm classification during mechanical CCs.
Figure 8 shows the confusion matrices obtained for the optimal configuration of the three models using the entire testing set (5813 segments from 880 patients). The distinction between AS and OR proved to be the most challenging. For the ResNet, 15.1% of AS rhythms were incorrectly classified as OR, whereas 9.7% of OR rhythms were misclassified as AS. These findings are in line with those reported by Kwok et al., who demonstrated, on a limited set of patients, the first three-class rhythm classification algorithm during manual CPR [12]. In scenarios without CPR artifacts, the AS/OR discrimination is relatively simple and can be addressed using energy and heart rate measurements. During CCs, however, spiky filtering residuals may be mistaken for QRS complexes during AS (Panel (d) of Figure 9). Conversely, CPR artifact filtering may reduce R-peak amplitudes in OR rhythms, producing erroneous AS classifications (Panel (c) of Figure 9).
Given the importance of the Sh/NSh discrimination during resuscitation therapy, and as an additional experiment, the best performing method (ResNet) was adapted to the binary Sh/NSh classification task. For this experiment, AS and OR rhythms were grouped into the NSh category, using 9666 segments (1252 Sh and 8414 NSh) to train and 5813 segments (1154 Sh and 4659 NSh) to test the model. The optimal ResNet architecture for this problem was selected in a manner analogous to that of the three-class problem. Performance was evaluated in terms of sensitivity (SE, the proportion of correctly classified Sh rhythms) and specificity (SP, the proportion of correctly classified NSh rhythms), in line with the minimum performance requirements for the Sh/NSh discrimination recommended by the American Heart Association (AHA); the balanced accuracy (BAC, the mean of SE and SP) was chosen as a summarizing metric.
Figure 10 shows the performance metrics obtained across the 10-fold CV loop in the training set. As in previous figures, the top row contains analyses of the impact of altering the width of the filters, $I$, while the bottom one details the impact of altering the filter configuration for a fixed filter size. As shown in the third panel of the bottom row, the best configurations with three and four blocks achieved the maximum BAC, with the four-block one being slightly higher. This architecture obtained SE/SP/BAC scores of 85.5%/98.6%/92.0% and 90.6%/98.5%/94.6% in the training and testing sets, respectively.
5. Discussion
The adoption of mechanical CPR devices has significantly increased in recent years, primarily through two main technologies: LDB and piston-driven mechanical CC devices. Mechanical CPR guarantees high-quality CCs when manual CPR (i) is subject to rescuer fatigue, (ii) is practically challenging, or (iii) cannot be delivered safely. However, mechanical CPR also introduces large artifacts in the ECG, hindering the rhythm analysis that is critical for clinical decision making. Given the negative effect of CPR interruptions on resuscitation outcomes, there is high interest in algorithms capable of rhythm analysis during ongoing CPR. To the best of our knowledge, this is the first study to address multiclass OHCA rhythm classification during mechanical CPR, and it is also the first to apply deep learning techniques in this context. Two specific DL architectures, CNN and ResNet, were tested and compared with a state-of-the-art classical ML approach [27].
DL-based algorithms outperformed the classical ML algorithm by at least 2 percentage points in UMS and ACC. Considering that the classical ML algorithm relies on over 20 years of expert knowledge in ECG feature engineering for OHCA rhythm classification, these results highlight the power of DL algorithms to learn discriminative features by leveraging all the hidden information in the ECG. This simplifies the feature extraction process, saving time and, more importantly, improving the quality of the features extracted.
The algorithm based on the ResNet offered the best performance, achieving a UMFS, UMS, and ACC of 88.3%, 88.3%, and 88.2%, respectively, for the three-class classification task. This performance is similar to that obtained in [27], the only other study in the literature that analyzed multiclass OHCA rhythm classification during manual CPR. The characteristics of manual compressions are rescuer-dependent, and the variability of the resulting artifacts anticipates a more complex filtering challenge. However, manual artifacts show significantly smaller amplitudes and fewer harmonic components (smaller bandwidths) than LDB artifacts, and this balance resulted in similar levels of accuracy [27,34,35]. For the Sh/NSh problem, the BAC was 94.6%, with a SE of 90.6% and a SP of 98.5%. This is a very important problem since it addresses shock advice decisions during CPR. Shock advice algorithms for defibrillators are normally tested on artifact-free data; in such a scenario, the AHA requires a minimum SE of 90% and a minimum SP of 95% [50]. Our solution met those requirements.
The database used in this study was fully derived from OHCA cases. It is unclear whether the proposed algorithms would perform differently for in-hospital data; however, given that in-hospital resuscitation does not entail differences in LDB-CPR, it is tempting to argue that the analysis and results we have presented for OHCA patients will also be valid for in-hospital cardiac arrest and CPR.
Finally, some considerations about these results are worth noting. First, this study presents the first method for automatic and clinically safe multiclass OHCA rhythm analysis during LDB-CPR. The proposed solution, together with the solutions already available for piston-driven [34] and manual CCs [27], would cover rhythm analysis in every CPR scenario. This may open the possibility of a reliable multiclass OHCA rhythm analysis during CPR, contributing to guiding therapy while reducing no-flow intervals and thereby improving survival in OHCA. Second, while the proposed DL algorithms significantly outperformed the classical ML approach and also met the AHA requirements, they are no guarantee of the best achievable performance. Due to computational constraints, only a finite set of network configurations was considered. Moreover, this study was limited to CNN and ResNet architectures, following previous studies on manual CPR [19,26,51,52,53]. More recent architectures, such as Vision Transformers (ViTs) [54] and Capsule Networks (CapsNets) [55], could yield improved performance, especially in the presence of increased training data.
The results obtained in this study represent a meaningful improvement over the current state of the art for rhythm classification during mechanical compressions. Further research in DL techniques will be pursued in future work to explore additional avenues for optimizing performance.