A Computational Methodology Based on Maximum Overlap Discrete Wavelet Transform and Autoencoders for Early Prediction of Sudden Cardiac Death

Centeno-Bautista, Manuel A.; Perez-Sanchez, Andrea V.; Amezquita-Sanchez, Juan P.; Camarena-Martinez, David; Valtierra-Rodriguez, Martin

doi:10.3390/computation13060130


by Manuel A. Centeno-Bautista ¹, Andrea V. Perez-Sanchez ¹, Juan P. Amezquita-Sanchez ¹, David Camarena-Martinez ² and Martin Valtierra-Rodriguez ¹,*

¹ ENAP-RG, CA Sistemas Dinámicos y Control, Facultad de Ingeniería, Departamento de Electromecánica, Universidad Autónoma de Querétaro, Campus San Juan del Río, San Juan del Río 76807, Mexico

² ENAP-RG, División de Ingeniería, Universidad de Guanajuato (UG), Campus Irapuato-Salamanca, Carretera Salamanca-Valle de Santiago km 3.5 + 1.8 km, Comunidad de Palo Blanco, Salamanca 36885, Mexico

* Author to whom correspondence should be addressed.

Computation 2025, 13(6), 130; https://doi.org/10.3390/computation13060130

Submission received: 25 April 2025 / Revised: 15 May 2025 / Accepted: 22 May 2025 / Published: 1 June 2025

(This article belongs to the Special Issue Feature Papers in Computational Biology)


Abstract

Cardiovascular diseases are among the major global health problems. For example, sudden cardiac death (SCD) accounts for approximately 4 million deaths worldwide each year. In particular, an SCD event can subtly change the electrocardiogram (ECG) signal before onset, and these changes are generally imperceptible to the patient. Hence, timely detection of these changes in ECG signals could help develop a tool to anticipate an SCD event and respond appropriately in patient care. In this sense, this work proposes a novel computational methodology that combines the maximal overlap discrete wavelet packet transform (MODWPT) with stacked autoencoders (SAEs) to discover suitable features in ECG signals and associate them with SCD prediction. The proposed method predicts an SCD event with an accuracy of 98.94% up to 30 min before the onset, making it a reliable tool for early detection that leaves sufficient time for medical intervention and increases the chances of preventing fatal outcomes. These results demonstrate the potential of integrating signal processing and deep learning techniques within computational biology to address life-critical health problems.

Keywords: sudden cardiac death; electrocardiogram signal; maximum overlap discrete wavelet packet transform; stacked autoencoders; deep learning

1. Introduction

At present, approximately 25% of the 17 million deaths that occur annually are associated with sudden cardiac death (SCD) [1,2]. SCD is defined as an unexpected death caused by a cardiovascular event [3]. Pulseless electrical activity, asystole, ventricular fibrillation, and ventricular tachycardia are conditions associated with SCD [4], resulting from disruptions in the heart’s normal functioning. Consequently, this condition leads to death as the brain and other organs are deprived of blood supply [5]. Accurate and timely prediction and intervention in arrhythmias can significantly improve the chances of preventing SCD by minimizing the delay between disease onset and treatment [6].

In recent decades, several prediction approaches have been developed to anticipate or predict an SCD event using two main strategies: (1) machine learning (ML) and (2) deep learning (DL). In the use of ML, numerous examples demonstrate the application of these techniques in electrocardiogram (ECG) analysis. For example, Alfarhan et al. [7] proposed an ML scheme based on the integration of diverse time-domain features, such as the mean and standard deviation of the QRS complex, with a k-nearest neighbor classifier for predicting an SCD event using ECG signals. The authors reported an accuracy of 97% up to 10 min before the SCD event. They also demonstrated that normalizing the extracted features improves the method’s accuracy. Ebrahimzadeh et al. [8] employed the first derivative feature to process the data combined with a multilayer perceptron (MLP), achieving an accuracy of 83% for predicting an SCD event 12 min in advance. Vargas-Lopez et al. [9] used empirical mode decomposition (EMD) with nonlinear features estimated using Higuchi fractal and permutation entropy to train an MLP, predicting an SCD event 25 min before its onset with an accuracy of 94%. This work emphasizes that by using only two indices, the computational load is reduced compared to studies that use multiple indices but achieve lower accuracy. Similarly, Ebrahimzadeh et al. [10] integrated the standard deviation of the short-term R-to-R wave interval from heart rate variability with an MLP to predict an SCD event 13 min before its onset, achieving an accuracy of 90.18%. They noted the importance of carefully selecting the features used to train the classifier, emphasizing the use of p-value evaluation as a starting point. Khazaei et al. [11] presented a methodology combining recurrence quantification analysis to extract features such as laminarity and increment entropy with a decision tree classifier for automatically predicting an SCD event. 
The authors reported an accuracy of 95% up to 6 min before the SCD event. As in previous works, the authors highlighted the importance of reducing computational load in methodology development. In recent years, Amezquita-Sanchez et al. [12] used the wavelet packet transform (WPT) combined with the homogeneity index to extract features from ECG signals. These features were employed to train an enhanced probabilistic neural network classifier, predicting an SCD event with an accuracy of 95.8% 20 min prior to the event. Singhal and Agarwal [13] developed a method for predicting SCD events based on a Fourier processing-based decomposition technique and a support vector machine (SVM) classifier. This method is highly robust to noise and is described as simple to implement for real-time systems. Using this methodology, they achieved an accuracy of 100% 10 min before the SCD event. Finally, Centeno-Bautista et al. [14] introduced an ML scheme based on the integration of complete ensemble empirical mode decomposition (CEEMD) to separate the signal and extract time–frequency domain features. Using an SVM classifier, they achieved an accuracy of 97.28% up to 30 min before the SCD event. This work, like others, emphasizes the simplicity of its design, particularly in using statistical indices to highlight the characteristics of the ECG signal. Even though significant advances have been achieved using ML schemes in recent years, there has been a growing interest in DL models [15] due to their ability to handle the complexities of multidimensional data, manage a high volume of variables, and effectively address more intricate problems. This capability enables a broader range of insights compared to the traditional tabular data analysis employed in ML-based methods [16]. In this regard, several works have explored DL strategies for detecting cardiac pathologies. 
For instance, Eleyan and Alboghbaish [17] employed a combination of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to identify heartbeat irregularities that precede various heart diseases. Similarly, Seitanidis et al. [18] classified arrhythmias using a two-dimensional CNN fed with images generated from the ECG signal. This work highlights the model’s efficiency when implemented on purpose-built hardware. Acharya et al. [19] also used a CNN to detect myocardial infarctions. Similarly, Shilla and Wang [20] employed a CNN to identify cardiac arrhythmias that can lead to SCD events, distinguishing between the attributes of each pathology to ensure correct identification. On the other hand, Kwon et al. [21] applied a DL model with three hidden layers, 362 nodes, and batch normalization to identify cardiovascular diseases. They chose electrocardiographic recordings over other types of data due to their less invasive nature. In another study, Kwon et al. [22] proposed a method based on a three-layer recurrent neural network to improve traditional track-and-trigger systems used for predicting cardiac arrests in hospitals. This approach reduced false alarms by 82.2%. Although DL-based methods have been successfully applied to various cardiac pathologies, it is worth noting that very few studies have explored or implemented these methods for predicting SCD events. For example, Saragih and Isa [23] combined a WPT with a CNN to forecast an SCD event 30 min before its onset. They reported an accuracy of 95.89%, noting that comparing several mother wavelets in the WPT stage revealed that the Meyer mother wavelet was most effective at extracting features from ECG signals associated with SCD events. Kaspal et al. [24] combined a recurrence complex network with a CNN to predict an SCD event, achieving an accuracy of 90.60% 30 min prior to the event. 
The authors highlighted that eliminating gradient saturation during training and employing a new activation function (a modified rectified linear unit) improved the CNN model’s efficiency in identifying relevant features in the analyzed signal variations. Telangore et al. [25] employed a CNN with scalograms obtained by means of the Hilbert–Huang transform (HHT) and the wavelet transform (WT) in a multiclass analysis based on various time intervals before the SCD event. This approach yielded a prediction accuracy of 98.81% up to 30 min before the event. Finally, Centeno-Bautista et al. [26] proposed a method that uses complete ensemble empirical mode decomposition with a CNN to predict an SCD event up to 30 min before its onset, achieving an accuracy of 97.5%.

Based on the state-of-the-art review presented earlier, it is evident that while highly promising results have been achieved in the prediction of SCD events, there is still considerable room for further research in this area. Opportunities include improving both accuracy and prediction time, reducing computational time, minimizing complexity, enhancing robustness to noise, and increasing generalization to diverse datasets, among other aspects. In this context, exploring advanced methods in both signal processing and deep learning presents a valuable research opportunity. The maximum overlap discrete wavelet packet transform (MODWPT), for instance, is a robust signal decomposition technique that has demonstrated immunity to noise and effectiveness in extracting meaningful features from complex signals. It has been successfully applied in various domains, such as electrical signal analysis [27,28], and used to classify interictal and ictal electroencephalograms [29], detect myocardial infarction [30] or problems in the transmission of the heart’s electrical signal using ECG [31], and detect epileptic episodes using electromyographic signals [32]. Its ability to preserve signal energy across different frequency bands makes it particularly suitable for preprocessing tasks where precision and noise resistance are critical. Similarly, stacked autoencoders (SAEs), as part of deep learning techniques, are highly promising classification tools due to their ability to learn hierarchical and abstract representations from complex data. This capability not only enhances accuracy but also supports the generalization of models to diverse datasets. SAE-based methods have been successfully applied in various fields, including image identification [33], portfolio smart administration [34], predicting events such as drug interactions [35], distinguishing classes of electroencephalogram signals [36], and emotion recognition [37]. 
Their adaptability and capacity to manage large volumes of multidimensional data make them particularly well-suited for addressing the challenges of SCD prediction. Therefore, given the advantages of MODWPT and SAE, combining these techniques offers a promising avenue for the analysis and prediction of SCD events. The MODWPT’s noise resilience and feature extraction capabilities complement the SAE’s strengths in generalization and hierarchical learning, providing a synergistic framework for tackling the complexities of SCD prediction. Furthermore, these methods have not been previously tested for this specific purpose, either individually or in combination, underscoring the innovative nature of this research.

Motivated by the challenges and opportunities identified earlier, this work explores a novel computational methodology for predicting SCD events 30 min before their occurrence. The approach integrates the methods previously described: the MODWPT decomposes ECG signals into their time–frequency components, and SAEs act as classifiers that determine whether a signal originates from a subject at risk of an SCD event. This study goes beyond a simple combination of methods by conducting an in-depth analysis to enhance their efficiency and overall performance. To evaluate the proposed approach, signals from the MIT/BIH Normal Sinus Rhythm (NSR) and MIT/BIH Sudden Cardiac Death Holter (SCDH) databases [38] are utilized, consistent with many of the studies cited in the literature. The main contribution of the proposed method lies in leveraging the advantages of both the MODWPT and SAE techniques. In this sense, MODWPT enables the preservation of relevant patterns in the decomposed signals, unlike traditional WPT and digital filters, where low-amplitude patterns can be diminished or suppressed. SAE enables the hierarchical extraction of discriminative features by progressively compressing and reconstructing the input data, unlike traditional machine learning classifiers, which rely on manually engineered features that may overlook subtle yet critical patterns associated with the phenomenon. The results demonstrate the methodology’s effectiveness, achieving an accuracy of 98.94% in predicting SCD events 30 min before their onset.

2. Data and Methods

This section briefly describes the data and methods used in this work.

2.1. ECG Data

2.1.1. Data Used

As mentioned in the introduction, two datasets are used to evaluate the proposed methodology. Both datasets are publicly available and can be accessed on the Physionet website, which is owned by the MIT Laboratory for Computational Physiology. The first dataset, the NSR dataset, consists of 18 ECGs from individuals (13 females) aged between 20 and 50 years, who are identified as healthy patients. These records were cataloged without visible ECG pathology and recorded at a sampling frequency of 128 Hz. The second dataset, the SCDH dataset, consists of 23 Holter recordings obtained from patients (8 females) aged between 17 and 89 years. For this study, only the 20 records containing ventricular fibrillation are used; the rest feature a different pathology (e.g., ventricular tachycardia and hypertrophic cardiomyopathy) and are therefore not included in the study. These recordings capture the cardiac electrical activity during an SCD event. The sampling frequency of this dataset is 250 Hz. No other exclusion criteria were used.

2.1.2. Selection and Preparation of ECG Signals

Adjusting certain characteristics of the datasets is necessary to ensure proper utilization. First, the sampling frequency differs; the NSR dataset is sampled at 128 Hz, and the SCDH dataset is sampled at 250 Hz. For this reason, the SCDH sampling frequency must be adjusted to maintain consistency with the sampling rate of the NSR database. To achieve this, the SCDH dataset is resampled to 128 Hz by convolving the original ECG signal with a digital low-pass filter. Additionally, the duration of each record must be standardized. Only 30 min of each record are used to limit the amount of data and avoid excessive computational loads. This selected time interval also facilitates comparisons with other studies reported in the literature. For the NSR dataset, a random interval is extracted from each record. In the SCDH dataset, the interval corresponds to the 30 min preceding the SCD event. Furthermore, these intervals are divided into 1-minute segments for analysis. It is important to note that the length of these intervals, 1 min, was chosen because this time window enables the detection of reliable ECG signal characteristics that can be associated with different phenomena [12,39]. Figure 1 illustrates this process: (a) shows a representation of the 1-minute time window segmentation, while (b) demonstrates how this segmentation is performed in an SCD record. It can be observed that these cuts are made during the SCD event.

It is important to note that no additional preprocessing operations—such as handling of missing values, outlier removal, or normalization—were applied to the raw ECG data, as the proposed methodology focuses on frequency-based transformations and scale-invariant image representations.
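The numbers behind this preparation step can be checked with a short sketch. It is only an illustration under stated assumptions: the windows are taken as consecutive and non-overlapping, the helper `split_into_windows` is a hypothetical name, and the placeholder list stands in for a real 30-minute excerpt. The resampling itself is not performed here; a polyphase routine such as `scipy.signal.resample_poly(x, up, down)` is one common way to apply the low-pass filtering and rate change, although the text does not say which implementation was used.

```python
from fractions import Fraction

FS_NSR = 128      # Hz, NSR sampling frequency (from the text)
FS_SCDH = 250     # Hz, SCDH sampling frequency (from the text)
RECORD_MIN = 30   # minutes kept per record (from the text)
WINDOW_S = 60     # seconds per analysis window (from the text)

# Rational resampling factor: 128/250 reduces to 64/125, i.e. upsample by 64,
# low-pass filter, then keep every 125th sample.
ratio = Fraction(FS_NSR, FS_SCDH)
up, down = ratio.numerator, ratio.denominator

samples_per_window = FS_NSR * WINDOW_S                # samples in one 1-minute window
samples_per_record = samples_per_window * RECORD_MIN  # samples in a 30-minute excerpt


def split_into_windows(signal, window_len):
    """Cut a sequence into consecutive non-overlapping windows (assumption),
    dropping any incomplete remainder at the end."""
    n_windows = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


# Placeholder standing in for a 30-minute ECG excerpt already at 128 Hz.
excerpt = [0.0] * samples_per_record
windows = split_into_windows(excerpt, samples_per_window)
print(up, down, len(windows), len(windows[0]))  # 64 125 30 7680
```

Under these assumptions each record contributes 30 windows of 7680 samples.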

2.2. Maximal Overlap Discrete Wavelet Packet Transform

The MODWPT is a signal-processing technique for analyzing and decomposing data signals into diverse frequency bands (FBs) or nodes [40]. MODWPT is an extension and enhancement of the discrete wavelet packet transform (DWPT) [30]. In particular, the DWPT is a technique for decomposing a signal into different frequency components. It works as a filter bank, iteratively splitting the signal into low-frequency and high-frequency components, also known as approximations and details, respectively. Therefore, the original signal is decomposed into FBs that together cover the range from zero to half of the sampling frequency. However, the drawbacks of the DWPT include the generation of under-sampled modes, which can produce distortions at the edges of the calculated FBs, and down-sampling, which exacerbates these distortions. On the other hand, the MODWPT is an approach that ensures maximal overlap among FBs in the decomposition process. This means that each FB has a larger portion of overlap with its adjacent FB. Maintaining this maximum overlap is relevant for achieving smooth transitions between frequency components in ECG signals [29].

In MODWPT, the signal is divided into multiple FBs, similar to DWPT. However, MODWPT optimizes the filter bank design and the decomposition process to maximize the overlap among FBs, while maintaining the energy conservation property [30]. Furthermore, the calculated FBs are time invariant, a property indicating that the down-sampling does not occur as in DWPT. This results in a more densely sampled frequency representation, which is beneficial for feature extraction performed automatically by the classifier. Due to these advantageous characteristics, MODWPT is used in this work.

To obtain MODWPT coefficients at each level, the technique decomposes the input signal $s_{0}^{0}(k)$ in several coefficients at level $j$, by convolving the infinite impulse response low- ($g(n)$) and high-pass ($h(n)$) filters [27], for the discrete position $k$ and shift index $n$; sampling the original signal according to the following equations:

$$s_{j}^{2z}(k) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{+\infty} g(n)\, s_{j-1}^{z}(k-n) \tag{1}$$

$$s_{j}^{2z+1}(k) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{+\infty} h(n)\, s_{j-1}^{z}(k-n) \tag{2}$$

where $z = 2m$ is the node number where $m \in \mathbb{N}$ (natural numbers) and at scale $j$, $m \leq 2^{j-1} - 1$; the node zero component $s_{j}^{0}(k)$ represents the decomposition packet coefficients of the lowest frequency band at scale $j$, whereas at any other node, i.e., for $z \neq 0$, $s_{j}^{z}(k)$ represents the decomposition packet coefficients of the higher frequency bands at scale $j$.

In addition to the decomposition process, the signal can be entirely reconstructed from the two sequences $s_{j-1}^{2z}$ and $s_{j-1}^{2z+1}$ using two reverse reconstruction quadrature filters, $\bar{g}$ and $\bar{h}$, respectively, in the following equations:

$$a_{j-1}^{2z}(k) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{+\infty} \bar{g}(n)\, s_{j}^{2z}(n-k) \tag{3}$$

$$a_{j-1}^{2z+1}(k) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{+\infty} \bar{h}(n)\, s_{j}^{2z+1}(n-k) \tag{4}$$

An example diagram illustrating the operation of MODWPT is shown in Figure 2, where a three-level decomposition is performed, resulting in eight FBs at the final level. Since MODWPT works with FBs corresponding to portions of the original signal’s frequency spectrum, each FB generated at level 3 represents one-eighth of half the sampling frequency. For a level-four analysis, 16 FBs would be generated, while a level-five analysis would produce 32 bands, and so on.
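The tree structure described above can be illustrated with a minimal, pure-Python sketch of an undecimated wavelet packet decomposition. Several points are assumptions made for brevity rather than details taken from the text: the two-tap Haar filters replace the Daubechies wavelet, the convolution is circular, the filters are stretched by $2^{j-1}$ at level $j$ (the usual pyramid algorithm for undecimated transforms), and the function names `circular_filter` and `modwpt` are hypothetical. Nodes are returned in the natural order of Eqs. (1) and (2), which is not the same as increasing-frequency order.

```python
import math


def circular_filter(x, taps, dilation):
    """Circularly filter x with taps spaced `dilation` samples apart (no downsampling)."""
    n = len(x)
    return [
        sum(t * x[(k - dilation * i) % n] for i, t in enumerate(taps))
        for k in range(n)
    ]


def modwpt(x, levels, g, h):
    """Return the 2**levels node sequences of an undecimated wavelet packet tree.

    Node 2z comes from the low-pass filter g and node 2z + 1 from the high-pass
    filter h, as in Eqs. (1) and (2). The 1/sqrt(2) factor rescales orthonormal
    filters so that energy is preserved even though nothing is downsampled.
    """
    scale = 1.0 / math.sqrt(2.0)
    g_s = [scale * t for t in g]
    h_s = [scale * t for t in h]
    nodes = [list(x)]
    for j in range(1, levels + 1):
        dilation = 2 ** (j - 1)  # assumption: standard filter stretching per level
        children = []
        for parent in nodes:
            children.append(circular_filter(parent, g_s, dilation))
            children.append(circular_filter(parent, h_s, dilation))
        nodes = children
    return nodes


# Haar filters (illustrative stand-in for the higher-order Daubechies wavelet).
HAAR_G = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HAAR_H = [1 / math.sqrt(2), -1 / math.sqrt(2)]

# Toy signal: two sinusoids sampled at 64 points.
x = [math.sin(2 * math.pi * 3 * k / 64) + 0.5 * math.cos(2 * math.pi * 10 * k / 64)
     for k in range(64)]
nodes = modwpt(x, 3, HAAR_G, HAAR_H)

energy_in = sum(v * v for v in x)
energy_out = sum(v * v for node in nodes for v in node)
print(len(nodes), len(nodes[0]), abs(energy_in - energy_out) < 1e-9)
```

Each of the eight level-3 nodes keeps the full length of the input, and the summed node energy matches the input energy, which is the energy-conservation property mentioned in the text.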

Increasing the level of decomposition generates narrower bands, which enhances the distinguishable features in each FB during subsequent processing. For this reason, these narrower bands are particularly useful for classification with deep learning techniques, specifically SAEs [41,42]. By emphasizing the signal’s characteristics, the classifier can generate an appropriate response without requiring additional methods for prior feature extraction. When analyzing ECG signals, the Daubechies wavelet of order 44 has been documented as the most suitable mother wavelet for this task [43].

2.3. Classifier

2.3.1. Autoencoder

An autoencoder (AE) is a type of artificial neural network used for unsupervised learning that aims to learn efficient representations of data. It operates by compressing the input into a lower-dimensional space and then reconstructing the original input from this compressed representation. The primary goal of an autoencoder is to capture the most important features of the data, enabling applications such as dimensionality reduction, denoising, and anomaly detection [44]. The general configuration of an AE, as shown in Figure 3, consists of an input layer, a first hidden layer smaller than the input layer, a bottleneck hidden layer of even smaller size, another hidden layer matching the size of the first hidden layer, and finally, an output layer with the same size as the input layer [45]. From this configuration, an AE comprises two primary components: an encoder and a decoder. The encoder, which spans from the input layer to the bottleneck, reduces the dimensionality of the input data, generating a compact representation. Meanwhile, the decoder, extending from the bottleneck to the output layer, aims to reconstruct the input without disturbances.

2.3.2. Stacked Autoencoder

One problem with AEs is that they can struggle to recognize data representations abstract enough to be useful in machine learning tasks. SAEs, by learning representations across multiple layers, can capture more abstract features for data classification [46,47]. An SAE is a feed-forward network composed of multiple AEs, where each AE layer is trained on the output of the previous one. SAEs are constructed by sequentially stacking multiple AE layers [48]. Each layer refines the representation learned by the preceding layer, ultimately forming a deep and hierarchical structure [49]. The general steps for training an SAE are as follows:

1. Train the first AE with the original data. The general configuration of the AE in this and subsequent stages is similar to the previously described setup, with a shallow hidden layer for both the encoder and decoder components. However, while the AE is trained using all its parts (encoder and decoder), only the features generated by the encoder’s hidden layer are used in the next step.

2. Train subsequent AEs. This stage involves stacking the AEs. Each AE in this phase uses the features generated by the hidden layer of the previous encoder as input data. The current AE is trained similarly to the first, generating features in its encoder’s hidden layer for use in the next AE. Consequently, each subsequent AE has a smaller input layer size than the preceding AE, progressively reducing the dimensionality of the analyzed data.

3. Classifier layer: In the final stage, a SoftMax classification layer is used to identify the expected cases from the input signals. The SoftMax layer takes the features generated by the last AE as input data.
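The first two steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the configuration used in this work: the layer sizes (8, 4, 2), the sigmoid encoder with a linear decoder, the squared-error loss, the learning rate, and the names `train_autoencoder` and `encode` are all assumptions, and the SoftMax stage of step 3 is left out.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_autoencoder(data, n_hidden, epochs=50, lr=0.05, seed=0):
    """Train a one-hidden-layer AE with plain SGD on squared reconstruction error.

    Only the encoder parameters are returned; the decoder is discarded after
    training, since the next AE consumes the hidden-layer features.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_enc = [0.0] * n_hidden
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b_dec = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            hid = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                   for row, b in zip(w_enc, b_enc)]
            out = [sum(w * hj for w, hj in zip(row, hid)) + b
                   for row, b in zip(w_dec, b_dec)]
            err = [o - xi for o, xi in zip(out, x)]  # gradient of 0.5 * sum(err**2)
            # Back-propagate to the hidden layer before the decoder is updated.
            d_hid = [sum(err[i] * w_dec[i][j] for i in range(n_in)) * hid[j] * (1 - hid[j])
                     for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w_dec[i][j] -= lr * err[i] * hid[j]
                b_dec[i] -= lr * err[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    w_enc[j][i] -= lr * d_hid[j] * x[i]
                b_enc[j] -= lr * d_hid[j]
    return w_enc, b_enc


def encode(data, w_enc, b_enc):
    """Apply a trained encoder to every sample."""
    return [[sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w_enc, b_enc)] for x in data]


rng = random.Random(1)
toy = [[rng.random() for _ in range(8)] for _ in range(20)]  # 20 toy inputs of size 8

enc1 = train_autoencoder(toy, n_hidden=4)    # step 1: first AE on the original data
feat1 = encode(toy, *enc1)
enc2 = train_autoencoder(feat1, n_hidden=2)  # step 2: second AE on the first AE's features
feat2 = encode(feat1, *enc2)
print(len(toy[0]), len(feat1[0]), len(feat2[0]))  # 8 4 2
```

The printed sizes show the progressive reduction in dimensionality; in a complete pipeline the final features would feed the SoftMax classification layer of step 3.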

A representation of how an SAE is assembled is shown in Figure 4. The hidden layer of the first AE, containing the features detected by the AE, becomes the input layer of the second autoencoder (a). These hidden layers from each AE form the corresponding hidden layers of the SAE, maintaining the order of the AEs from which they are derived (b).

In this way, the general structure of the SAE relies on the reduction of features between the input layers of successive AEs [50], enabling the extraction of only the most relevant characteristics for identification. The encoding process at each layer can be described by the following equation:

$$a_{i}^{(k)} = f\left(u_{i}^{(k)} + \sum_{j=1}^{n_{k-1}} a_{j}^{(k-1)} w_{ji}^{(k-1)}\right), \quad i = 1, 2, \dots, n_{k} \tag{5}$$

where $a_{i}^{(k)}$ represents the activation of the $i$-th neuron in the $k$-th layer, $n_{k}$ is the number of neurons in layer $k$, $u_{i}^{(k)}$ is the bias term, $w_{ji}^{(k-1)}$ denotes the weight connecting neuron $j$ in layer $k-1$ to neuron $i$ in layer $k$, and $f$ is the activation function. This equation models how each layer successively reduces the dimensionality of the input by learning weighted combinations of features.
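As a small worked instance of Eq. (5), the following sketch evaluates one encoding layer and then a two-class SoftMax output. All numbers are hypothetical toy values, the logistic function is an assumed choice for $f$, and the helper names are illustrative only.

```python
import math


def layer_activation(a_prev, weights, biases, f):
    """Eq. (5): a_i = f(u_i + sum_j a_prev[j] * w[j][i]) for every neuron i."""
    return [
        f(biases[i] + sum(a_prev[j] * weights[j][i] for j in range(len(a_prev))))
        for i in range(len(biases))
    ]


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(scores):
    """Numerically stable SoftMax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 2 class scores.
a0 = [0.2, 0.5, 0.1]
w1 = [[0.4, -0.3], [0.1, 0.8], [-0.6, 0.2]]  # w1[j][i]: input j -> hidden neuron i
u1 = [0.0, 0.1]
a1 = layer_activation(a0, w1, u1, logistic)

w2 = [[1.0, -1.0], [-0.5, 0.5]]
u2 = [0.0, 0.0]
probs = softmax(layer_activation(a1, w2, u2, lambda v: v))
print(a1, probs)
```

For the first hidden neuron the weighted sum is 0.2·0.4 + 0.5·0.1 + 0.1·(−0.6) = 0.07, so its activation is the logistic function evaluated at 0.07; the two SoftMax outputs sum to one and can be read as class probabilities.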

This reduction is particularly effective when the goal is to condense a large dataset into a few key features. Such an approach is advantageous when applied as a prediction method, as proposed in this research. The SAE is chosen as the classifier for predicting SCD events because it transforms the data into a lower-dimensional representation, ultimately producing a single output that represents the two possible outcomes in SCD event prediction.

An important aspect to consider when implementing a stacked autoencoder is the tuning of its hyperparameters, which can significantly affect its performance in feature extraction and classification tasks. Key hyperparameters include the number of hidden neurons in each autoencoder layer, the number of stacked layers, the activation function, the learning rate, the batch size, and the number of training epochs. Adjusting these parameters allows control over the model’s capacity to learn relevant features, prevent overfitting, and balance computational cost. Typically, the number of neurons is reduced progressively across layers to facilitate dimensionality reduction, while the choice of activation function influences the non-linear transformation capabilities of the network. The learning rate and number of epochs determine the convergence behavior during training. Careful tuning of these hyperparameters is essential to achieve a balance between reconstruction error, classification accuracy, and computational efficiency. Since there is no universal rule that guarantees optimal settings across different applications, this tuning process often involves trial and error and relies heavily on the developer’s experience and intuition.

3. Methodology

The proposed methodology to predict an SCD event consists of four steps: (1) data windowing, (2) signal decomposition, (3) image representation, and (4) prediction. A diagram of the methodology is shown in Figure 5, where it can be observed that the output of the proposed method is the classification of the input signal as either healthy or an SCD event.

First, the ECG signals are segmented as described in Section 2.1.2. A 30-minute segment is extracted from the ECG record of each participant. Specifically, for the SCDH database, the 30-minute segment is taken from the period preceding the SCD event. For ECG signals with normal cardiac rhythm (healthy signals), the 30-minute segment is randomly selected from the original ECG recording. These segments are further divided into 1-minute windows, which serve as the primary units for processing in the subsequent steps. Figure 6 shows the point where an SCD event begins according to the database records; the 30 min leading up to this event are selected for the analysis.

Once the ECG signals from both databases are segmented and divided into 1-minute intervals, they are processed using the MODWPT technique to decompose them into various FBs. The MODWPT configuration takes into account that ECG signals have a frequency range of 0.5 to 30 Hz [51], with the most important components in the range of 1 to 10 Hz. However, the entire bandwidth of the original signal is used, as there may be other frequency contributions from the SCD event minutes before its occurrence. Additionally, previous studies [32] using biosignals have shown that working with 2 Hz FBs can provide valuable information and useful features for classification. With this in mind, a 5-level decomposition is chosen, which results in 32 signals for each 1-minute window. Each of the newly generated signals has a 2 Hz FB, where FB one corresponds to the 0 to 2 Hz range, FB two to the 2 to 4 Hz range, FB three to the 4 to 6 Hz range, and so on, up to FB 32, which covers the 62 to 64 Hz range. This frequency range improves representation and adds features in continuous steps [52]. MODWPT ensures maximum overlap, providing smooth transitions between frequency components in ECG signals and highlighting features in each FB for better classification with deep learning techniques. Additionally, the FBs do not introduce significant distortion in either the time or frequency domains [29].
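The band layout follows directly from the sampling frequency and the decomposition level, as the short calculation below shows. The variable names are illustrative; the values of 128 Hz and five levels come from the text.

```python
FS = 128     # Hz, sampling frequency after resampling (from the text)
LEVELS = 5   # decomposition level (from the text)

n_bands = 2 ** LEVELS             # number of FBs at the final level
band_width = (FS / 2) / n_bands   # width of each FB in Hz
bands = [(i * band_width, (i + 1) * band_width) for i in range(n_bands)]

print(n_bands, band_width, bands[0], bands[-1])  # 32 2.0 (0.0, 2.0) (62.0, 64.0)
```

The upper edge of the last band, 64 Hz, is half the sampling frequency.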

The next step, corresponding to the image representation, takes the signals generated by MODWPT in the previous step (Figure 7a) and transforms them into an image that the classifier can process. The 32 signals are displayed using a surface plot, with the X and Y axes representing time and FB numbers, respectively, as shown in Figure 7b. The Z axis corresponds to the amplitude of the signals. The color map used is called “turbo”, selected for its smooth transition between colors. To create the image, the top view is used, where the color differences represent the amplitude of the signals, and the width and height of the image represent time and FBs, respectively. This arrangement ensures that the data are visually distinguishable without interference, as shown in Figure 7c. Finally, the generated image is converted to grayscale to reduce computational load compared to an RGB format [53], as shown in Figure 7d. The generated images have a size of 652 by 475 pixels.
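A simplified stand-in for this image-generation step is sketched below. It is not the rendering procedure described above: instead of drawing a surface plot with the "turbo" color map and converting the result to grayscale, it maps amplitudes directly to 8-bit gray levels by min-max scaling each window. The function name `to_grayscale` and the toy matrix are hypothetical, and a direct amplitude-to-gray mapping will generally not give the same pixel values as a color map followed by grayscale conversion.

```python
def to_grayscale(matrix):
    """Min-max scale a 2-D list (rows = FBs, columns = time samples) to integers in 0..255."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


# Toy matrix standing in for the 32 FB signals of one window (2 bands x 4 samples here).
toy_bands = [
    [-1.0, -0.2, 0.4, 1.0],
    [0.1, 0.3, -0.3, 0.0],
]
image = to_grayscale(toy_bands)
print(image)
```

Because the scaling is done per window, the resulting picture depends on the amplitudes of the bands relative to one another rather than on the absolute amplitude of the record.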

The final step is data classification using the SAE. To configure the classifier, according to Section 2.3.1, two AEs are trained, each consisting of an input layer, a hidden layer, and an output layer. The proposed base structure of the SAE is created with the first AE having a hidden layer of 200 neurons and the second AE having a hidden layer of 50 neurons. This configuration is initially proposed to maintain the proportion of the SAE while also starting with a relatively low computational load by using only 200 neurons in the first hidden layer. This structure serves only as a preliminary starting point and will be further refined in subsequent stages to improve performance. At the end of the SAE, a SoftMax layer is added as the classifier with two outputs: healthy and SCD event.

All these steps are implemented in MATLAB R2022a on a computer with a 10-core 2.3 GHz processor, 32 GB of RAM, and an NVIDIA GeForce RTX 3060 GPU.

4. Experimentation and Results

4.1. Signal Decomposition

Once the 1-minute windows are generated, they are processed using MODWPT to create 32 new signals from each of these windows through a level-five decomposition, as previously described. The mother wavelet used is a Daubechies wavelet of order 44, which has proven to be highly effective in decomposing ECG signals [43]. Figure 8 shows a random segment of the 1-minute window and its decomposition into FBs that are each 2 Hz wide. The signal is decomposed only up to 64 Hz because, according to the Nyquist criterion, this is the highest frequency that can be represented at a sampling frequency of 128 Hz. Decomposing into FBs of this width allows for more precise localization of the signal characteristics in the frequency domain. This is useful for the classification step, as it helps to identify specific events within particular FBs.

For visual comparison, Figure 9 shows the first six frequency bands after decomposition using MODWPT, where (a) shows the healthy signal and (b) shows the signal 1 min before the SCD event. The difference in amplitude is quite noticeable in this example. However, this characteristic alone does not allow an easy and direct classification and, therefore, does not enable a direct prediction of an SCD event. This is because the amplitude of ECG signals varies with several factors, such as the exact placement of the electrodes used for measurement. As a result, the amplitude of a given record can differ significantly from that of the signals used to train the SAEs. In contrast, the variation in the amplitude of each frequency band relative to the others within each 1-minute window can provide relevant information for signal classification.
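The difference between absolute and relative amplitude can be made concrete with a toy example. The sketch below is only an illustration of the argument and is not a feature used in the proposed method, where the SAE learns directly from the images; the function name `relative_band_energy` and the sample values are hypothetical. It shows that the share of energy carried by each band does not change when the whole window is multiplied by a gain factor, as could happen with a different electrode placement.

```python
def relative_band_energy(bands):
    """Fraction of the window's total energy carried by each band."""
    energies = [sum(v * v for v in band) for band in bands]
    total = sum(energies) or 1.0  # guard against an all-zero window
    return [e / total for e in energies]


window = [[0.5, -0.4, 0.3], [0.1, -0.2, 0.1], [0.05, 0.0, -0.05]]  # toy bands
scaled = [[3.0 * v for v in band] for band in window]               # same window, 3x gain

ref = relative_band_energy(window)
alt = relative_band_energy(scaled)
```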

4.2. Data Classification

4.2.1. Testing Different Image Sizes

Once all the bands are obtained for each time window and condition, the images are generated as described in the methodology section, with a size of 652 by 475 pixels. These are the base images used in the next stage. For prediction, an SAE composed of two AEs is employed. Each AE has an input layer, a hidden layer, and an output layer. The hidden layer of the first AE has 200 neurons, and the hidden layer of the second AE has 50 neurons. These values are set as a reference for evaluating the different image sizes; later, an experimental investigation is conducted to determine more efficient values. First, the smallest image size that the SAE can adequately analyze is determined on the basis of accuracy. For this, the SAE is trained with five different image sizes: the original 652 × 475 pixels, and 512 × 512, 256 × 256, 128 × 128, and 64 × 64 pixels. Figure 10 illustrates the different image dimensions used, both for a healthy signal and for a signal prior to an SCD event.

Tests are conducted on each of the 1-minute windows of each type of image, both healthy and prior to the SCD event, and the accuracy values obtained for the individual minutes are averaged for each image size. The complete results are presented in Table 1. The 128 × 128 pixel image yields the highest average accuracy, 98.15%, which suggests that this size is the most efficient for accurate data classification. Furthermore, this image contains far fewer pixels than the original one (16,384 versus 309,700), leading to a decrease in computational load.
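The selection in Table 1 and the resulting reduction in input size can be reproduced with a short script. The accuracy values are copied from Table 1; the variable names are illustrative only.

```python
# Image size (width, height) -> average accuracy in %, copied from Table 1.
table1 = {
    (652, 475): 96.04,
    (512, 512): 95.17,
    (256, 256): 95.70,
    (128, 128): 98.15,
    (64, 64): 96.14,
}

best_size = max(table1, key=table1.get)
pixels = {size: size[0] * size[1] for size in table1}
input_ratio = pixels[best_size] / pixels[(652, 475)]

print(best_size, pixels[best_size], f"{input_ratio:.1%}")
```

The selected image has about 5.3% of the pixels of the original one, so the input layer of the SAE shrinks by the same proportion.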

4.2.2. Testing the Number of the Hidden Layer Neurons

Once the image size with the best results has been identified, the next step is to find a good combination of hidden-layer sizes for the AEs. As before, this test and the following one are conducted with each of the 1-minute windows, and the average of the results is calculated. For these tests, it is important to maintain the relationship between the sizes of the hidden layers required by the SAE, that is, the second hidden layer must be smaller than the first. The hidden layer of the first AE is evaluated first. As mentioned, the base value is 200 neurons, and sizes of 400, 300, 100, and 75 neurons are also tested, while the hidden layer of the second AE is kept at 50 neurons. Again, the evaluation is based on accuracy. The results are shown in Table 2. In this case, a hidden layer of 300 neurons for the first AE yields the best result, with an accuracy of 98.68%, which is an improvement over the 98.15% obtained with the base size of 200 neurons.

Finally, the size of the second hidden layer, initially set at 50 neurons, is evaluated. Sizes of 25, 75, 100, 125, 150, 175, and 200 neurons are tested. The results, shown in Table 3, indicate that for most sizes the average accuracy varies by around 0.3%; only the size of 75 neurons shows a more significant drop. The size that produces the best results is 175 neurons, with an accuracy of 98.94%.
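The two tests above amount to tuning one hidden layer at a time while the other is held fixed. A minimal sketch of that one-parameter-at-a-time search is given below; it replays the accuracies reported in Tables 2 and 3 instead of training a network, and the names `coordinate_search` and `ACCURACY` are hypothetical.

```python
# Average accuracies (%) keyed by (first AE neurons, second AE neurons).
ACCURACY = {(h1, 50): acc for h1, acc in
            {400: 97.89, 300: 98.68, 200: 98.15, 100: 94.91, 75: 91.40}.items()}  # Table 2
ACCURACY.update({(300, h2): acc for h2, acc in
                 {200: 98.85, 175: 98.94, 150: 98.85, 125: 98.68,
                  100: 98.68, 75: 96.68, 50: 98.68, 25: 98.50}.items()})          # Table 3


def coordinate_search(evaluate, start, grids):
    """Tune one parameter at a time, keeping the others at their current best."""
    best = dict(start)
    for name, candidates in grids.items():
        scores = {value: evaluate(dict(best, **{name: value})) for value in candidates}
        best[name] = max(scores, key=scores.get)
    return best


def lookup(cfg):
    """Stand-in for training and testing the SAE with configuration `cfg`."""
    return ACCURACY[(cfg["h1"], cfg["h2"])]


best = coordinate_search(
    lookup,
    start={"h1": 200, "h2": 50},
    grids={"h1": [400, 300, 200, 100, 75],
           "h2": [200, 175, 150, 125, 100, 75, 50, 25]},
)
```

A search of this kind evaluates 13 configurations instead of the 40 of a full grid, at the cost of not exploring combinations in which both sizes change at once.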

The minute-by-minute accuracy values of the proposed methodology are shown in Table 4. The standard deviation shown by these thirty values is 1.31% with an average value of 98.94%, indicating that the methodology is accurate and consistent over the evaluated time.
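The summary statistics quoted above can be recomputed from the entries of Table 4. The sketch below assumes that the reported figure is the sample standard deviation and uses the accuracy values as printed in the table, so small rounding differences in the second decimal are to be expected.

```python
from statistics import mean, stdev

# Minutes before the SCD event with 97.36% accuracy in Table 4; all others are 100%.
lower = {4, 6, 7, 10, 11, 12, 13, 18, 21, 22, 28, 30}
accuracy = [97.36 if minute in lower else 100.0 for minute in range(1, 31)]

avg = mean(accuracy)
sd = stdev(accuracy)  # sample standard deviation (n - 1 in the denominator)
print(f"mean = {avg:.2f} %, std = {sd:.2f} %")
```

Only two accuracy values occur in the table, so the spread is determined entirely by the 12 windows that fall below 100%.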

Additionally, the proposed method has been evaluated using different segments of the healthy signals to compare its performance. For these tests, 30-minute segments are still used, divided into 1-minute windows. The best result is the one previously mentioned, with an accuracy of 98.94% and a standard deviation of 1.31%; the lowest is 94.47%, with a standard deviation of 13.72%. Figure 11 shows a visual summary of the results obtained in these evaluations for each 1-minute window, where the bars correspond to the average accuracy values, and the vertical lines indicate the maximum and minimum values obtained.

Therefore, the final structure of the SAE consists of 16,384 neurons of the input layer, corresponding to the total pixels of the image size 128 × 128 pixels, 300 neurons in the first hidden layer, 175 neurons in the second hidden layer, and two neurons in the output layer, which corresponds to the classification between healthy signals and those before SCD events. A representation of the final structure of the SAE is shown in Figure 12, which shows that the basic structure of the autoencoder is maintained.
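To give a sense of scale, the number of trainable parameters implied by these layer sizes can be counted directly. The sketch below assumes fully connected layers with one bias per neuron and counts only the encoders and the SoftMax layer that remain in the final network; the decoder weights used while pre-training each AE are not included, and the helper name `dense_params` is hypothetical.

```python
def dense_params(layer_sizes):
    """Weights plus biases of a fully connected stack."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


final = [128 * 128, 300, 175, 2]           # final structure of the SAE
original_input = [652 * 475, 300, 175, 2]  # same hidden layers, original image size

n_final = dense_params(final)
n_original = dense_params(original_input)
print(n_final, n_original)
```

Almost all parameters sit between the input layer and the first hidden layer, which is why the image size dominates the computational load.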

The results obtained in this section highlight the importance of properly configuring the SAE to achieve better performance. Various tests must be conducted to find suitable sizes for the hidden layers while maintaining the base structure of the SAE. Notably, increasing the sizes of the hidden layers compared to the initial configuration improved the accuracy from 98.15% to 98.94%. Therefore, it is essential to evaluate different configurations to identify the one that delivers the best results with the proposed techniques.

Selecting the correct image size is also crucial for improving accuracy and reducing computational load. This study was able to use an image smaller than the original one generated from the MODWPT results, significantly reducing the computational load. In this regard, SAE training with the original image takes 7 h on average, while training with the 128 × 128 pixel image takes only around 20 min, that is, roughly 5% of the time required with the original image size. The analysis of each minute prior to the SCD event takes 40 s on average.

5. Discussion

As previously mentioned, ML methods for predicting SCD events are more widely used than DL-based methods. These methods exhibit differences in both accuracy and prediction time before an SCD event. Table 5 shows the values for these characteristics, along with qualitative information related to the methods discussed.

In Table 5, methods using ML provide an average prediction time of 13 min before an SCD event, which is shorter than that of DL-based methods, where the average prediction time is 30 min before the event. Although good results are obtained, accuracy alone may not be sufficient to assess the overall performance of an SCD prediction method. For instance, in [7], the reported accuracy is 97%, but the prediction time is only 10 min. Similarly, [13] achieves a perfect accuracy of 100%, but the prediction time remains low at just 10 min before the event. While these methods are effective, there is room for improvement in prediction time. Extending the prediction window leaves more time to apply treatment before the onset of SCD, significantly increasing the likelihood of preventing fatal outcomes [6]. Furthermore, [9] improves the prediction time to 24 min, but accuracy drops to 94%, which highlights an area for potential enhancement. While ML techniques show promising results, DL approaches generally deliver better overall performance. DL methods typically achieve longer prediction times, with predictions occurring 30 min before the SCD event. Additionally, most of these studies report accuracy levels above 95%, except [24], which shows an accuracy of 90.6%. A notable improvement among ML-based methods is found in [14], which achieves an accuracy of 97.28% with a prediction time of 30 min. However, an important factor to consider when comparing methodologies is computational load. The MODWPT processing used in this study is significantly less computationally intensive than other techniques, such as CEEMD, which requires 45 times more processing time than MODWPT.
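The average prediction times can be recomputed from the rows of Table 5. The grouping into ML and DL entries below follows the classifiers listed in the table, and the variable names are illustrative. With these entries, the ML average depends noticeably on whether the 30-min result of [14] is counted: it is about 15.6 min with it and about 13.6 min without it.

```python
from statistics import mean

# (reference number, accuracy in %, prediction time in min), copied from Table 5.
ml = [(7, 97.0, 10), (8, 83.0, 12), (9, 94.0, 24), (10, 90.18, 13),
      (11, 95.0, 6), (12, 95.8, 20), (13, 100.0, 10), (14, 97.28, 30)]
dl = [(23, 95.89, 30), (24, 90.6, 30), (25, 98.81, 30), (26, 97.5, 30)]

ml_time = mean(t for _, _, t in ml)
ml_time_without_14 = mean(t for ref, _, t in ml if ref != 14)
dl_time = mean(t for _, _, t in dl)

print(f"ML: {ml_time:.1f} min ({ml_time_without_14:.1f} min without [14]), DL: {dl_time} min")
```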

As summarized in Table 5 and discussed earlier, three key advantages of the proposed method can be highlighted: (1) Reduced computational load: The process is significantly faster than the method proposed in [26], which takes 45 times longer to process the data than the present approach using MODWPT. This makes real-time prediction more feasible; (2) Improved accuracy: The proposed method increases critical accuracy by at least 0.13 percentage points compared to the works [23,24,25,26]; (3) Reliable performance: This research provides a robust methodology for predicting an SCD event 30 min before its onset, achieving an accuracy of 98.94%.
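The accuracy margin quoted in point (2) follows directly from the DL rows of Table 5, as the short check below shows; the variable names are illustrative.

```python
proposed = 98.94
dl_accuracy = {23: 95.89, 24: 90.6, 25: 98.81, 26: 97.5}  # accuracy in %, from Table 5

# Difference in percentage points between the proposed work and each DL study.
margins = {ref: round(proposed - acc, 2) for ref, acc in dl_accuracy.items()}
smallest = min(margins.values())
closest_ref = min(margins, key=margins.get)
```

The smallest margin corresponds to [25], the only DL study in the table that comes within one percentage point of the proposed method.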

In summary, the main contributions of the proposed methodology are threefold: (1) it achieves a high predictive accuracy of 98.94% in identifying SCD events 30 min before their onset, outperforming or matching the accuracy reported in similar studies; (2) it significantly reduces computational load by employing MODWPT instead of more demanding methods like CEEMD, making it more suitable for real-time applications; and (3) it demonstrates consistent and reliable performance across different time windows, with a low standard deviation of 1.31%. These features position the proposed approach as a robust, efficient, and accurate alternative to existing models for early prediction of SCD events.

6. Conclusions

The timely prediction of an SCD event can help prevent one of the leading causes of death globally by enabling early intervention and providing timely care to individuals at risk. SCD is a condition that can be anticipated with the appropriate tools to analyze ECG signals. Developing such tools through accurate interpretation of cardiac signals could allow for the implementation of corrective actions to prevent SCD events. This study explored the use of MODWPT and SAE to create a computational method that could serve as a novel tool in the field, merging the versatility of deep learning with the simplicity and efficiency of low computational load processing. A critical accuracy of 98.94% was achieved with a prediction window of 30 min. Additionally, as highlighted in the previous section, the computational burden of data processing is significantly lower than that of other techniques. This efficiency is attributed to the optimized SAE structure, which includes processing images of size 128×128 and employing a two-layer hidden structure with 300 and 175 neurons, respectively. This approach not only reduces computational load but also demonstrates improved accuracy compared to other studies.

As mentioned, the computation time of the final methodology for analyzing each minute of ECG signals is significantly lower than in the initial results. This is promising for potential implementation in an embedded device. However, to better evaluate this improvement, it is necessary to compare it with results from other studies. Unfortunately, such data are not reported in the related literature. As part of our interest in implementing this approach, future work will aim to explore this aspect further to gain a broader perspective on emerging SCD event prediction methods and their practical applications on embedded systems.

While the methodology proposed in this work has shown promising results, they should be considered preliminary. Further evaluation is necessary in the following areas: (1) Validation on larger datasets: To reliably confirm the results and advance toward practical applications. (2) Incorporation of diverse data: Including other factors leading to SCD, such as pulseless electrical activity, asystole, ventricular fibrillation, and ventricular tachycardia. (3) Reduction of computational load: Exploring additional techniques to further optimize processing efficiency. (4) Incorporation of a filtering step as part of the preprocessing to ensure that the analyzed signals do not present artifacts or any other alterations in the signals that could affect the analysis. Despite these limitations, the findings represent a valuable starting point for advancing the use of deep learning and specifically SAE in predicting SCD events, while also motivating the exploration of hardware-based implementations to support real-time monitoring applications.

To further reduce computational costs during data classification, particularly in the hidden-layer neurons, the authors are investigating dimensionality reduction methods using optimization algorithms such as the genetic algorithm [54], the chaotic whale atom search optimization [55], and the monkey optimization algorithm [56], among others. Additionally, research is being conducted on alternative learning algorithms to simultaneously lower computational costs and improve accuracy as future advancements in this area.

Author Contributions

Conceptualization, M.A.C.-B., J.P.A.-S. and M.V.-R.; methodology, M.A.C.-B., A.V.P.-S. and M.V.-R.; software, M.A.C.-B. and A.V.P.-S.; formal analysis, resources, and data curation, M.A.C.-B. and A.V.P.-S.; writing, review, and editing, all authors; supervision, project administration, and funding acquisition, J.P.A.-S., D.C.-M. and M.V.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI)—México” that partially financed this research under the scholarships 830903 and 814956 given to M. A. Centeno-Bautista and A. V. Perez-Sanchez, respectively, and the scholarships 253652, 329800, and 296574, given to J. P. Amezquita-Sanchez, D. Camarena-Martinez, and M. Valtierra-Rodriguez, respectively, through the “Sistema Nacional de Investigadoras e Investigadores (SNII)–SECIHTI–México”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are not publicly available due to privacy issues.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AE	Autoencoder
CEEMD	Complete Ensemble Empirical Mode Decomposition
CNN	Convolutional Neural Networks
DL	Deep Learning
DWPT	Discrete Wavelet Packet Transform
ECG	Electrocardiogram
EMD	Empirical Mode Decomposition
FBs	Frequency Bands
HHT	Hilbert–Huang Transform
LSTM	Long Short-Term Memory Networks
ML	Machine Learning
MLP	Multilayer Perceptron
MODWPT	Maximal Overlap Discrete Wavelet Packet Transform
NSR	Normal Sinus Rhythm
SAE	Stacked Autoencoders
SCD	Sudden Cardiac Death
SCDH	Sudden Cardiac Death Holter
SVM	Support Vector Machine
WPT	Wavelet Packet Transform
WT	Wavelet Transform

References

1. Srinivasan, N.T.; Schilling, R.J. Sudden cardiac death and arrhythmias. Arrhythm. Electrophysiol. Rev. 2018, 7, 111–117.
2. Paratz, E.D.; Rowsell, L.; Zentner, D.; Parsons, S.; Morgan, N.; Thompson, T.; James, P.; Pflaumer, A.; Semsarian, C.; Smith, K.; et al. Cardiac arrest and sudden cardiac death registries: A systematic review of global coverage. Open Heart 2020, 7, e001195.
3. Wong, C.X.; Brown, A.; Lau, D.H.; Chugh, S.S.; Albert, C.M.; Kalman, J.M.; Sanders, P. Epidemiology of Sudden Cardiac Death: Global and Regional Perspectives. Heart Lung Circ. 2019, 28, 6–14.
4. Katritsis, D.G.; Gersh, B.J.; Camm, A.J. A clinical perspective on sudden cardiac death. Arrhythm. Electrophysiol. Rev. 2016, 5, 177–182.
5. Jazayeri, M.A.; Emert, M.P. Sudden Cardiac Death: Who Is at Risk? Med. Clin. N. Am. 2019, 103, 913–930.
6. Myat, A.; Song, K.J.; Rea, T. Out-of-hospital cardiac arrest: Current concepts. Lancet 2018, 391, 970–979.
7. Alfarhan, K.A.; Mashor, M.Y.; Zakaria, A.; Omar, M.I. Automated Electrocardiogram Signals Based Risk Marker for Early Sudden Cardiac Death Prediction. J. Med. Imaging Health Inform. 2018, 8, 1769–1775.
8. Ebrahimzadeh, E.; Manuchehri, M.S.; Amoozegar, S.; Araabi, B.N.; Soltanian-Zadeh, H. A time local subset feature selection for prediction of sudden cardiac death from ECG signal. Med. Biol. Eng. Comput. 2018, 56, 1253–1270.
9. Vargas-Lopez, O.; Amezquita-Sanchez, J.P.; De-Santiago-Perez, J.J.; Rivera-Guillen, J.R.; Valtierra-Rodriguez, M.; Toledano-Ayala, M.; Perez-Ramirez, C.A. A new methodology based on EMD and nonlinear measurements for sudden cardiac death detection. Sensors 2020, 20, 9.
10. Ebrahimzadeh, E.; Foroutan, A.; Shams, M.; Baradaran, R.; Rajabion, L.; Joulani, M.; Fayaz, F. An optimal strategy for prediction of sudden cardiac death through a pioneering feature-selection approach from HRV signal. Comput. Methods Programs Biomed. 2019, 169, 19–36.
11. Khazaei, M.; Raeisi, K.; Goshvarpour, A.; Ahmadzadeh, M. Early detection of sudden cardiac death using nonlinear analysis of heart rate variability. Biocybern. Biomed. Eng. 2018, 38, 931–940.
12. Amezquita-Sanchez, J.P.; Valtierra-Rodriguez, M.; Adeli, H.; Perez-Ramirez, C.A. A Novel Wavelet Transform-Homogeneity Model for Sudden Cardiac Death Prediction Using ECG Signals. J. Med. Syst. 2018, 42, 176.
13. Singhal, A.; Agarwal, M. An automatic risk assessment system for sudden cardiac death using look ahead pattern. Multimed. Tools Appl. 2024, 83, 27243–27258.
14. Centeno-Bautista, M.A.; Perez-Sanchez, A.V.; Amezquita-Sanchez, J.P.; Valtierra-Rodriguez, M. Sudden cardiac death prediction based on the complete ensemble empirical mode decomposition method and a machine learning strategy by using ECG signals. Measurement 2024, 236, 115052.
15. Trayanova, N.A.; Topol, E.J. Deep learning a person’s risk of sudden cardiac death. Lancet 2022, 399, 1933.
16. Barker, J.; Li, X.; Khavandi, S.; Koeckerling, D.; Mavilakandy, A.; Pepper, C.; Bountziouka, V.; Chen, L.; Kotb, A.; Antoun, I.; et al. Machine learning in sudden cardiac death risk prediction: A systematic review. Europace 2022, 24, 1777–1787.
17. Eleyan, A.; Alboghbaish, E. Electrocardiogram Signals Classification Using Deep-Learning-Based Incorporated Convolutional Neural Network and Long Short-Term Memory Framework. Computers 2024, 13, 55.
18. Seitanidis, P.; Gialelis, J.; Papaconstantinou, G.; Moschovas, A. Identification of Heart Arrhythmias by Utilizing a Deep Learning Approach of the ECG Signals on Edge Devices. Computers 2022, 11, 176.
19. Acharya, U.R.; Fujita, H.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adam, M. Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 2017, 415–416, 190–198.
20. Shilla, W.; Wang, X. Wavelet Transform and Convolutional Neural Network Based Techniques in Combating Sudden Cardiac Death. EMITTER Int. J. Eng. Technol. 2021, 9, 377–389.
21. Kwon, J.M.; Kim, K.H.; Jeon, K.H.; Park, J. Deep learning for predicting in-hospital mortality among heart disease patients based on echocardiography. Echocardiography 2019, 36, 213–218.
22. Kwon, J.M.; Lee, Y.; Lee, Y.; Lee, S.; Park, J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J. Am. Heart Assoc. 2018, 7, 13.
23. Saragih, Y.V.; Isa, S.M. CNN performance improvement using wavelet packet transform for sca prediction. J. Theor. Appl. Inf. Technol. 2022, 100, 5458–5468.
24. Kaspal, R.; Alsadoon, A.; Prasad, P.W.C.; Al-Saiyd, N.A.; Nguyen, T.Q.V.; Pham, D.T.H. A novel approach for early prediction of sudden cardiac death (SCD) using hybrid deep learning. Multimed. Tools Appl. 2021, 80, 8063–8090.
25. Telangore, H.; Azad, V.; Sharma, M.; Bhurane, A.; Tan, R.S.; Acharya, U.R. Early prediction of sudden cardiac death using multimodal fusion of ECG Features extracted from Hilbert–Huang and wavelet transforms with explainable vision transformer and CNN models. Comput. Methods Programs Biomed. 2024, 257, 108455.
26. Centeno-Bautista, M.A.; Rangel-Rodriguez, A.H.; Perez-Sanchez, A.V.; Amezquita-Sanchez, J.P.; Granados-Lieberman, D.; Valtierra-Rodriguez, M. Electrocardiogram Analysis by Means of Empirical Mode Decomposition-Based Methods and Convolutional Neural Networks for Sudden Cardiac Death Detection. Appl. Sci. 2023, 13, 3569.
27. Alves, D.K.; Costa, F.B.; Ribeiro, R.L.D.A.; Sousa Neto, C.M.D.; Rocha, T.D.O.A. Real-time power measurement using the maximal overlap discrete wavelet-packet transform. IEEE Trans. Ind. Electron. 2017, 64, 3177–3187.
28. Yang, D.M. The detection of motor bearing fault with maximal overlap discrete wavelet packet transform and teager energy adaptive spectral kurtosis. Sensors 2021, 21, 6895.
29. Zhang, T.; Chen, W.; Li, M. Classification of inter-ictal and ictal EEGs using multi-basis MODWPT, dimensionality reduction algorithms and LS-SVM: A comparative study. Biomed. Signal Process. Control 2019, 47, 240–251.
30. Han, C.; Shi, L. Automated interpretable detection of myocardial infarction fusing energy entropy and morphological features. Comput. Methods Programs Biomed. 2019, 175, 9–23.
31. Al-Naami, B.; Fraihat, H.; Owida, H.A.; Al-Hamad, K.; De Fazio, R.; Visconti, P. Automated Detection of Left Bundle Branch Block from ECG Signal Utilizing the Maximal Overlap Discrete Wavelet Transform with ANFIS. Computers 2022, 11, 93.
32. Perez-Sanchez, A.V.; Amezquita-Sanchez, J.P.; Valtierra-Rodriguez, M.; Adeli, H. A new epileptic seizure prediction model based on maximal overlap discrete wavelet packet transform, homogeneity index, and machine learning using ECG signals. Biomed. Signal Process. Control 2024, 88, 105659.
33. Balasubaramanian, S.; Cyriac, R.; Roshan, S.; Maruthamuthu Paramasivam, K.; Chellanthara Jose, B. An effective stacked autoencoder based depth separable convolutional neural network model for face mask detection. Array 2023, 19, 100294.
34. Soleymani, F.; Paquet, E. Financial portfolio optimization with online deep reinforcement learning and restricted stacked autoencoder–DeepBreath. Expert Syst. Appl. 2020, 156, 113456.
35. Wang, L.; You, Z.H.; Chen, X.; Xia, S.X.; Liu, F.; Yan, X.; Zhou, Y.; Song, K.J. A computational-based method for predicting drug-target interactions by using stacked autoencoder deep neural network. J. Comput. Biol. 2018, 25, 361–373.
36. Idowu, O.P.; Ilesanmi, A.E.; Li, X.; Samuel, O.W.; Fang, P.; Li, G. An integrated deep learning model for motor intention recognition of multi-class EEG Signals in upper limb amputees. Comput. Methods Programs Biomed. 2021, 206, 106121.
37. Yin, Z.; Zhao, M.; Wang, Y.; Yang, J.; Zhang, J. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model. Comput. Methods Programs Biomed. 2017, 140, 93–110.
38. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet. Circulation 2000, 101, E215–E220.
39. Chinara, S. Automatic classification methods for detecting drowsiness using wavelet packet transform extracted time-domain features from single-channel EEG signal. J. Neurosci. Methods 2021, 347, 108927.
40. Shrifan, N.H.M.M.; Akbar, M.F.; Mat Isa, N.A. Maximal overlap discrete wavelet-packet transform aided microwave non-destructive testing. NDT E Int. 2021, 119, 102414.
41. Dai, X.; Cheng, J.; Guo, S.; Wang, C.; Qu, G.; Liu, W.; Li, W.; Lu, H.; Wang, Y.; Zeng, B.; et al. Optimization Strategy of a Stacked Autoencoder and Deep Belief Network in a Hyperspectral Remote-Sensing Image Classification Model. Discret. Dyn. Nat. Soc. 2023, 2023, 9150482.
42. Bai, Y.; Sun, X.; Ji, Y.; Fu, W.; Zhang, J. Two-stage multi-dimensional convolutional stacked autoencoder network model for hyperspectral images classification. Multimed. Tools Appl. 2024, 83, 23489–23508.
43. Rafiee, J.; Rafiee, M.A.; Prause, N.; Schoen, M.P. Wavelet basis functions in biomedical signal processing. Expert Syst. Appl. 2011, 38, 6190–6201.
44. Dong, G.; Liao, G.; Liu, H.; Kuang, G. A Review of the Autoencoder and Its Variants: A Comparative Perspective from Target Recognition in Synthetic-Aperture Radar Images. IEEE Geosci. Remote Sens. Mag. 2018, 6, 44–68.
45. Yang, Z.; Xu, B.; Luo, W.; Chen, F. Autoencoder-based representation learning and its application in intelligent fault diagnosis: A review. Measurement 2022, 189, 110460.
46. Gadhiya, T.; Tangirala, S.; Roy, A.K. Stacked Autoencoder Based Feature Extraction and Superpixel Generation for Multifrequency PolSAR Image Classification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2019; pp. 331–339.
47. Liu, S.; Zhang, C.; Ma, J. Stacked auto-encoders for feature extraction with neural networks. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2016; pp. 377–384.
48. Li, D.; Fu, Z.; Xu, J. Stacked-autoencoder-based model for COVID-19 diagnosis on CT images. Appl. Intell. 2021, 51, 2805–2817.
49. Zhou, P.; Han, J.; Cheng, G.; Zhang, B. Learning Compact and Discriminative Stacked Autoencoder for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4823–4833.
50. Vareka, L.; Mautner, P. Stacked autoencoders for the P300 component detection. Front. Neurosci. 2017, 11, 302.
51. Willigenburg, N.W.; Daffertshofer, A.; Kingma, I.; van Dieën, J.H. Removing ECG contamination from EMG recordings: A comparison of ICA-based and other filtering procedures. J. Electromyogr. Kinesiol. 2012, 22, 485–493.
52. Walden, A.T.; Contreras Cristan, A. The Phase-Corrected Undecimated Discrete Wavelet Packet Transform and Its Application to Interpreting the Timing of Events. Available online: https://royalsocietypublishing.org/ (accessed on 24 April 2025).
53. Hagara, M.; Stojanović, R.; Bagala, T.; Kubinec, P.; Ondráček, O. Grayscale image formats for edge detection and for its FPGA implementation. Microprocess. Microsyst. 2020, 75, 103056.
54. Dhanya, L.; Chitra, R. A novel autoencoder based feature independent GA optimised XGBoost classifier for IoMT malware detection. Expert Syst. Appl. 2024, 237, 121618.
55. Singh, J.P.; Kumar, M. Chaotic whale-atom search optimization-based deep stacked auto encoder for crowd behaviour recognition. J. Exp. Theor. Artif. Intell. 2024, 36, 187–211.
56. Alwahedi, F.; Aldhaheri, A.; Ferrag, M.A.; Battah, A.; Tihanyi, N. Machine learning techniques for IoT security: Current research and future vision with generative AI and large language models. Internet Things Cyber-Phys. Syst. 2024, 4, 167–185.

Figure 1. One-minute windows of records: (a) 1-minute time windows extracted from the total 30-minute sample of ECG signals, and (b) 1-minute time windows taken during the 30 min preceding the SCD event in SCDH signals.

Figure 2. Schematic representation of MODWPT decomposition levels. At each level, the signals have equal bandwidth.

Figure 3. The basic scheme of an AE.

Figure 4. Creation of an SAE, where (a) shows how the AEs are trained separately to obtain the features and (b) shows how the features obtained are employed to form the final structure of the SAE.

Figure 5. Proposed methodology for predicting an SCD event.

Figure 6. The instant where the SCD event begins for one of the records in the SCDH database.

Figure 7. Steps for image representation. First, the 32 signals from the MODWPT decomposition are obtained for each 1-minute window (a), then a surface graph is made with all the signals, using red for the most positive amplitudes and blue for the most negative, with yellow as the base (b), then a top view of this surface graph is taken (c) and finally it is converted to grayscale (d) so that it can be processed by the classifier.

Figure 8. Frequency bands generated by MODWPT.

Figure 9. Comparison of the first six frequency bands for (a) a healthy signal and (b) a signal 1 min prior to the SCD event.

Figure 10. Different image sizes to be processed by the classifier. The original image is at the top. In (a), an example of an image of a healthy signal is shown. In (b), an example of an image taken prior to the SCD event is shown.

Figure 11. Accuracy results over time for the proposed SAE.

Figure 12. Representation of the final structure of the proposed SAE.

Table 1. The average accuracy of each image size.

Image Size (Pixels)	Average Accuracy (%)
652 × 475	96.04
512 × 512	95.17
256 × 256	95.70
128 × 128	98.15
64 × 64	96.14

Table 2. Average accuracy for different neuron sizes in the first AE hidden layer.

First AE Hidden Layer Neurons	Average Accuracy (%)
400	97.89
300	98.68
200	98.15
100	94.91
75	91.40

Table 3. Average accuracy for different neuron sizes in the second AE hidden layer.

Second AE Hidden Layer Neurons	Average Accuracy (%)
200	98.85
175	98.94
150	98.85
125	98.68
100	98.68
75	96.68
50	98.68
25	98.50

Table 4. Minute-by-minute accuracy.

Minutes Prior to SCD Event	Accuracy (%)	Minutes Prior to SCD Event	Accuracy (%)
1	100	16	100
2	100	17	100
3	100	18	97.36
4	97.36	19	100
5	100	20	100
6	97.36	21	97.36
7	97.36	22	97.36
8	100	23	100
9	100	24	100
10	97.36	25	100
11	97.36	26	100
12	97.36	27	100
13	97.36	28	97.36
14	100	29	100
15	100	30	97.36

Table 5. Quantitative and qualitative comparisons between the proposed work and similar studies.

Work	Data Processing	Classifier	Accuracy/Prediction Time
Alfarhan et al. [7]	Mean, standard deviation	K-nearest neighbor	97%/10 min
Ebrahimzadeh et al. [8]	First derivative feature	MLP	83%/12 min
Vargas-Lopez et al. [9]	EMD, Higuchi fractal	MLP	94%/24 min
Ebrahimzadeh et al. [10]	Standard deviation of heart rate variability	MLP	90.18%/13 min
Khazaei et al. [11]	Recurrence quantification analysis	Decision tree	95%/6 min
Amezquita-Sanchez et al. [12]	WPT, homogeneity index	Probabilistic neural network	95.8%/20 min
Singhal and Agarwal [13]	Fourier-based decomposition technique	min	100%/10 min
Centeno-Bautista et al. [14]	CEEMD	SVM	97.28%/30 min
Saraghi and Isa [23]	WPT	CNN	95.89%/30 min
Kaspal et al. [24]	Recurrence complex network	CNN	90.6%/30 min
Telangore et al. [25]	HHT, WT	CNN	98.81%/30 min
Centeno-Bautista et al. [26]	CEEMD	CNN	97.5%/30 min
Proposed work	MODWPT	Stacked autoencoders	98.94%/30 min

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Centeno-Bautista, M.A.; Perez-Sanchez, A.V.; Amezquita-Sanchez, J.P.; Camarena-Martinez, D.; Valtierra-Rodriguez, M. A Computational Methodology Based on Maximum Overlap Discrete Wavelet Transform and Autoencoders for Early Prediction of Sudden Cardiac Death. Computation 2025, 13, 130. https://doi.org/10.3390/computation13060130
