Efficient Extraction of Deep Image Features Using a Convolutional Neural Network (CNN) for Detecting Ventricular Fibrillation and Tachycardia

Mjahad, Azeddine; Saban, Mohamed; Azarmdel, Hossein; Rosado-Muñoz, Alfredo

doi:10.3390/jimaging9090190

GDDP, Department Electronic Engineering, School of Engineering, University of Valencia, 46100 Burjassot, Valencia, Spain

* Author to whom correspondence should be addressed.

J. Imaging 2023, 9(9), 190; https://doi.org/10.3390/jimaging9090190

Submission received: 30 June 2023 / Revised: 23 August 2023 / Accepted: 8 September 2023 / Published: 18 September 2023

(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Abstract

To safely select the proper therapy for ventricular fibrillation (VF), it is essential to distinguish it correctly from ventricular tachycardia (VT) and other rhythms. Because the required therapies differ, an erroneous detection might seriously injure the patient or even induce VF. The primary innovation of this study lies in employing a CNN to create new features. These features exhibit the capacity and precision to detect and classify cardiac arrhythmias, including VF and VT. The electrocardiographic (ECG) signals used for this assessment were sourced from the established MIT-BIH and AHA databases. The input data to be classified are time–frequency (tf) representation images, specifically Pseudo Wigner–Ville (PWV) representations. Before the PWV calculation, preprocessing for denoising, signal alignment, and segmentation is necessary. To check the validity of the method independently of the classifier, four different CNNs are used: InceptionV3, MobileNet, VGGNet, and AlexNet. The classification results are as follows: for VF detection, a sensitivity (Sens) of 98.16%, a specificity (Spe) of 99.07%, and an accuracy (Acc) of 98.91%; for VT, a sensitivity of 90.45%, a specificity of 99.73%, and an accuracy of 99.09%; for normal sinus rhythms, a sensitivity of 99.34%, a specificity of 98.35%, and an accuracy of 98.89%; finally, for other rhythms, a sensitivity of 96.98%, a specificity of 99.68%, and an accuracy of 99.11%. Furthermore, distinguishing between shockable (VF/VT) and non-shockable rhythms yielded a sensitivity of 99.23%, a specificity of 99.74%, and an accuracy of 99.61%. The results show that using tf representations as images, combined in this case with a CNN classifier, raises the classification performance above the results reported in previous works. Considering that these results were achieved without the preselection of ECG episodes, it can be concluded that these features may be successfully introduced in Automated External Defibrillation (AED) and Implantable Cardioverter Defibrillation (ICD) therapies, also opening the door to their use in other ECG rhythm detection applications.

Keywords: biomedical systems; electrocardiographic signals; ventricular fibrillation; ventricular tachycardia; time–frequency representation; non-stationary signals; image analysis; CNN

1. Introduction

Cardiac arrhythmia is prevalent in developed countries and represents a significant cause of mortality. Ventricular fibrillation (VF), even in its milder episodes, can lead to sudden cardiac death. As a result, the timely detection of ventricular arrhythmia is crucial to initiate appropriate therapeutic interventions and safeguard the patient’s life. While the causes of arrhythmia may vary, they all stem from disruptions in the heart’s cellular electrophysiology. Autopsy studies have consistently revealed that arrhythmogenic cardiac disorders are the primary underlying cause in cases of sudden cardiac death that show no evidence of pathological abnormalities in the heart. This underscores the fact that VF can trigger a rapid and irreversible degenerative process in the heart’s electrical system, leading to fatal consequences [1,2]. To restore normal cardiac rhythm during a VF episode, the standard procedure is electrical defibrillation of the heart using an Automatic External Defibrillator (AED) [3]. AEDs are now readily available in various public locations, including airports, shopping centers, and sports arenas. Defibrillation delivers a high-energy electrical shock externally, through the patient’s chest wall, with the aim of reestablishing a regular heart rhythm. Several studies [4,5,6] have demonstrated that the success of defibrillation is inversely correlated with the time elapsed between the onset of a VF episode and the application of the electrical discharge: the longer the interval, the lower the likelihood of a successful defibrillation. These findings underscore the critical importance of early intervention and prompt defibrillation in improving the chances of restoring a normal heart rhythm during VF episodes.

Detecting VF automatically poses significant challenges due to its intrinsic characteristics, such as a lack of organization and irregularity, especially considering the existence of similar pathologies such as ventricular tachycardia (VT), where the required therapy is not the same as in VF. Specifically, some types of VT can be treated with drugs, and others require low-energy synchronized electrical cardioversion. To successfully revert VF, an electrical shock must be administered, and the intensity of the shock (defibrillation level) depends on the stage of ventricular fibrillation. The early detection of VF enables the use of lower shock levels, facilitating faster restoration of the heart’s normal rhythm. However, it is of utmost importance to exercise caution, as administering an electrical shock to a patient not experiencing VF can lead to severe harm or even trigger VF. VT is one of the rhythms that can be particularly challenging to discern from VF, underscoring the significance of accurate differentiation for making appropriate treatment decisions. Various detection algorithms have been developed using diverse signal-processing techniques, including the Hilbert transform [7], Fourier transform [8], wavelet transform, and other signal processing methods [9,10], as well as time–frequency representations [11]. These techniques share a common characteristic: they integrate temporal and spectral information within the same representation. This fusion of information is particularly crucial when dealing with non-stationary processes like the electrocardiogram (ECG) signal, especially in the presence of irregular pathologies such as VF. By combining temporal and spectral information, these algorithms enable more effective detection and analysis of VF, enhancing the ability to diagnose and treat these irregular cardiac conditions.

The detection of VF or VT using ECG data has been explored through numerous statistical methods. However, these manual approaches often struggle to extract features that effectively capture the intricate characteristics of ventricular arrhythmia. Consequently, machine learning techniques have emerged as successful alternatives for cardiac arrhythmia recognition. For instance, in [12], the wavelet method was implemented to identify ECG arrhythmias, specifically discerning three types of episodes: Normal, VT, and VF. In [13], a Support Vector Machine (SVM) with a Gaussian kernel was employed to detect ventricular irregularities using morphological features. Furthermore, in [14], Random Forest (RF) decision trees were used in combination with Variational Mode Decomposition for the detection and classification of shockable arrhythmias (VF/VT). In [15], the real-time identification of shockable episodes (VF/VT) was realized using fixed thresholds. Beyond these strategies, other studies have applied a range of machine learning techniques to the identification and classification of ventricular arrhythmias. In [16], a C4.5 classifier was implemented. The authors of [17] employed a k-Nearest Neighbor (kNN) classifier, while those of [18] used Bayesian decision methods. Additionally, the authors of [19] employed Decision Trees in conjunction with independent component analysis (ICA). These machine learning approaches offer promising avenues to improve the accuracy and depth of ventricular arrhythmia detection. They enable the extraction of meaningful features and enhance the recognition of complex cardiac conditions, contributing to more effective diagnosis and treatment strategies for ventricular arrhythmia.

Applying traditional algorithms to leverage the information contained within the structure of electrocardiogram (ECG) data poses a significant challenge, primarily due to the non-stationary nature of biomedical signals. Consequently, these conventional algorithms often exhibit limited performance when representing the intricate characteristics present in such complex data. In contrast, convolutional neural networks (CNNs) have garnered substantial interest in the scientific communities focused on image and speech classification. This heightened attention stems from the fact that the topology of CNNs closely resembles that of biological systems. As a result, CNNs offer a more suitable framework for capturing and analyzing the complex patterns inherent in ECG signals, allowing for improved performance in detecting and classifying cardiac conditions.

1.1. Related Work

Convolutional neural networks (CNNs) have found extensive application in various domains, including traffic sign detection [20], indoor object detection [21,22], and numerous other fields [23,24]. Recognizing faces poses a significant challenge and has garnered interest across different disciplines such as psychology, model identification, computer vision, and computer graphics. Consequently, the literature on face recognition is vast and diverse. In [25], the authors presented a long-distance face recognition method that addresses the variation in recognition rates caused by distance variations. They employed a CNN for face recognition and measured similarity using the Euclidean distance. This approach achieved outstanding performance at various distances, surpassing traditional face recognition methods. A hybrid system for face recognition was introduced by the authors in [26], combining a Logistic Regression Classifier (LRC) with a CNN. The CNN was trained to localize and identify faces in images, while the LRC classified the features learned by the convolutional network. Experimental results on the Yale face dataset [27] demonstrated improved classification accuracy and reduced processing time. In [28], a CNN-based face identification system with nine layers was proposed. The network consisted of three convolution layers, two pooling layers, two fully connected layers, and one Softmax layer. The proposed CNN was evaluated on the ORL face [29] and AR face datasets [30], achieving higher recognition rates compared to traditional machine learning and handcrafted feature methods for face identification. The implementation of a deep learning algorithm for face recognition was detailed in [31]. The algorithm was based on the OpenFace project, utilizing the FaceNet neural network architecture [32]. The results highlighted the effectiveness of the incremental learning algorithm in improving performance. 
An Active Face Recognition system (AcFR) was proposed in [33], which employed a CNN and mimicked human behavior in common face recognition scenarios. A pre-trained VGG-Face CNN was used to extract facial image features, followed by nearest-neighbor identity recognition. Evaluation on the CMU PIE face dataset [34] demonstrated that the recognition stage of the AcFR system outperformed that of alternative systems. In [35], the authors introduced a novel face recognition system using a deep C2D–CNN model at the decision level.

1.2. Proposed Work

In this work, we propose a ventricular arrhythmia detection method that distinguishes the shockable rhythms VT and VF, based on feeding a CNN with raw time–frequency data. The underlying idea is that letting a CNN extract features from the matrix produced by the time–frequency analysis yields better results than detectors that rely on feature-selection strategies, while reducing the necessary signal preprocessing to a minimum. To demonstrate that the method is independent of the classifier, four CNN-based classifiers of different natures are used.

This paper is structured as follows. Section 2 introduces the CNN algorithm, Section 3 presents the time–frequency representation, and Section 4 describes the materials used and provides details on the processing applied to the ECG signal. Section 5, Section 6 and Section 8 present the results, discussions, and conclusions, respectively.

2. Deep Learning Algorithms

Deep learning models are neural networks that possess a deep structure inspired by the intricate workings of the human brain. By mimicking its processes, deep learning aims to address a wide range of learning problems. Particularly in the field of computer vision, deep learning techniques have achieved remarkable success. Currently, the main types of networks are the multilayer perceptron, the CNN, and the recurrent neural network (RNN) [36]. Other DL networks, such as fully convolutional networks (FCNs), are typically used in tasks related to semantic segmentation [37].

2.1. Fundamental Concepts of Convolutional Neural Networks

In this section, we will introduce the widely recognized convolutional neural network (CNN) architecture and discuss the specific model utilized in this study.

As discussed earlier, CNNs are popular due to their improved performance in image recognition and classification. Architecture-wise, CNNs are simply feedforward Artificial Neural Networks (ANNs) [38,39], as illustrated in Figure 1. CNNs are characterized by their layered structure and employ filters, kernels, or neurons with learnable weights and biases. Each filter receives input, performs convolution operations, and may apply non-linear transformations [40]. A typical CNN architecture comprises the following components:

The convolutional layer (CONV), which processes the received input data;
The pooling layer (POOL), which allows compressing the information by reducing the size of the intermediate image (often by subsampling);
The Fully Connected Layer (FCL), which is a perceptron-type layer;
The classification layer (Softmax), which predicts the class of the input image.

2.1.1. Convolutional Layer

The convolutional layer is a fundamental component of a convolutional network and plays a crucial role in the computational process. Its main function is to extract features from input data, particularly images. By applying convolution, the spatial correlation between pixels is preserved as the network learns image features using small squares of the input image. A set of learnable neurons convolves the input image, resulting in a feature map or activation map in the output image [36]. A kernel is first placed in the top-left corner of the image, where the element-wise products between the kernel weights and the underlying image patch are summed to give one output value. The kernel is then shifted, and the process is repeated until all possible locations in the image are filtered, as shown in Figure 2.
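The sliding-kernel operation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the study: the function name `conv2d_valid`, the toy image, and the kernel values are assumptions, and, as in most deep learning frameworks, the kernel is applied without flipping (cross-correlation) and without padding.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists) without padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            acc = 0.0
            # Sum of element-wise products between the kernel and the patch.
            for u in range(k_rows):
                for v in range(k_cols):
                    acc += image[top + u][left + v] * kernel[u][v]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4 x 4 image and 2 x 2 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d_valid(image, kernel)  # 3 x 3 feature map
```

The kernel starts at the top-left corner and visits every valid position, so a 4 × 4 input and a 2 × 2 kernel with stride 1 give a 3 × 3 feature map.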

2.1.2. Nonlinear Activation Function

The results of a linear operation, such as convolution, undergo further processing through a nonlinear activation function. While smooth nonlinear functions like sigmoid or hyperbolic tangent (tanh) were previously used due to their resemblance to the behavior of biological neurons, the rectified linear unit (ReLU) has become the most popular choice of nonlinear activation function. The ReLU function is defined as f(x) = max(0, x); see Figure 3 for a visual representation.

2.1.3. Pooling Layer

The pooling layer plays a crucial role in reducing the spatial size of the representation, thereby reducing the number of parameters and computational load in the network. Additionally, it helps to control overfitting. It is important to note that the pooling layer does not involve any learning process. Pooling units are generated using functions such as max-pooling, average pooling, or L2-norm pooling [36]. The process of the pooling operation is shown in Figure 4.
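As a minimal sketch of the max-pooling operation (the function name `max_pool2d` and the example values are hypothetical), each non-overlapping 2 × 2 block of the feature map is replaced by its maximum, halving both spatial dimensions without any learnable parameter:

```python
def max_pool2d(feature_map, pool=2, stride=2):
    """Replace each pool x pool block of `feature_map` by its maximum value."""
    out_rows = (len(feature_map) - pool) // stride + 1
    out_cols = (len(feature_map[0]) - pool) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + u][j * stride + v]
                for u in range(pool)
                for v in range(pool)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Hypothetical 4 x 4 feature map, chosen only for illustration.
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool2d(fmap)  # 2 x 2 result
```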

2.1.4. Fully Connected Layer

The FCL follows the final pooling layer and provides the extracted features to a classifier that uses the Softmax activation function [36]. The Softmax function ensures that the sum of the output probabilities from the Fully Connected Layer is 1. It achieves this by transforming a vector of arbitrary real-valued scores into a vector of values between zero and one that add up to one.
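The Softmax transformation can be illustrated with a short sketch. The input scores below are hypothetical, and subtracting the maximum score before exponentiation is a common numerical-stability measure rather than something stated in the text:

```python
import math


def softmax(scores):
    """Map arbitrary real-valued scores to probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical FCL outputs for four classes.
probs = softmax([2.0, 1.0, 0.1, -1.0])
```

The largest score receives the largest probability, and every output lies strictly between zero and one.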

2.1.5. Loss Function

A loss function, also known as a cost function, quantifies the agreement between the network’s output predictions obtained through forward propagation and the provided ground truth labels [41]. In multiclass classification tasks, the cross-entropy loss function is commonly used, while the mean squared error is typically employed for regression tasks involving continuous values. The selection of an appropriate loss function is a hyperparameter that depends on the specific task at hand and needs to be determined accordingly.
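For a single sample with a one-hot ground-truth label, the categorical cross-entropy reduces to the negative logarithm of the probability assigned to the true class. The following sketch assumes a hypothetical four-class output and a small clipping constant `eps` to avoid taking the logarithm of zero:

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical example: the true class is the third of four classes.
label = [0, 0, 1, 0]
loss_confident = categorical_cross_entropy(label, [0.01, 0.01, 0.97, 0.01])
loss_wrong = categorical_cross_entropy(label, [0.70, 0.10, 0.10, 0.10])
```

A confident correct prediction gives a loss close to zero, whereas assigning low probability to the true class is penalized heavily.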

2.2. Optimization of Hyperparameters

Hyperparameters are parameters in a convolutional neural network (CNN) that are not learned during the training process but need to be specified beforehand. These hyperparameters significantly influence the network’s performance and can be adjusted to optimize the model’s accuracy and training efficiency. Some important hyperparameters in CNNs include the following.

Number of layers [42]: A conventional CNN typically consists of multiple layers, including convolutional layers, activation layers (e.g., ReLU), pooling layers, and fully connected layers.
Filter size (Kernel Size) [43]: The size of the filters used in the convolutional layers is an important parameter. Common filter sizes are 3 × 3, 5 × 5, and 7 × 7.
Number of filters [44]: The number of filters in each convolutional layer determines the depth of the feature maps generated. More filters lead to more expressive power but also increase computation requirements.
Stride [45]: The stride determines the step size at which the filter is moved across the input image. Common values are 1 and 2, with larger strides reducing the size of the output feature maps.
Padding [45]: Padding can be used to preserve the spatial dimensions of the input when convolving with filters. Common padding values are 'same' and 'valid'.
Activation function [46]: Common activation functions include ReLU (rectified linear unit), leaky ReLU, and Sigmoid. ReLU is widely used due to its simplicity and effectiveness.
Pooling [47]: Pooling layers downsample the feature maps, reducing their spatial dimensions. Common pooling types are max pooling and average pooling, typically with a pool size of 2 × 2.
Fully connected layers [48]: The number of neurons in the fully connected layers can vary based on the complexity of the task. The output layer size depends on the number of classes in the classification task.
Dropout [49]: Dropout is a regularization technique that randomly sets a fraction of neurons to zero during training, preventing overfitting. Common dropout rates are between 0.2 and 0.5.
Batch size [50]: The number of samples used in each iteration during training. Smaller batch sizes require more iterations per epoch, which can make training slower, but they can lead to better convergence.
Number of epochs [51]: This is the number of times the entire training dataset is passed through the network during training.
Learning rate [52]: The learning rate controls the step size during optimization. A small learning rate leads to slow convergence, while a large learning rate can cause instability.
Optimizer: Common optimizers used in CNNs include Stochastic Gradient Descent (SGD) [53], Adam, and RMSprop.

The choice of these parameters depends on the specific problem, dataset, and available computing resources. Often, hyperparameter tuning and experimentation are required to find the best parameter settings for a given CNN architecture and task.

2.3. CNN Architectures

In this study, four different CNN architectures were employed: AlexNet, VGGNet, InceptionV3, and MobileNet.

2.3.1. AlexNet

AlexNet is a deep CNN architecture designed to classify images into 1000 different classes. It consists of five convolutional layers (CLs) with three pooling layers, two fully connected layers (FCLs), and a Softmax layer. AlexNet has a total of about 650 k neurons and 60 million parameters. The input image for AlexNet must have dimensions of 227 × 227 × 3. The first CL takes the input image and applies 96 kernels of size 11 × 11 × 3 with a stride of four pixels, producing the input for the second layer [54].
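The spatial size of a convolutional layer's output follows from the input size, kernel size, stride, and padding through the standard relation floor((n + 2p − k) / s) + 1. The helper below (its name is an assumption) applies this relation to the first AlexNet layer described above, assuming no padding:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# First AlexNet layer: 227-pixel input, 11-pixel kernel, stride 4, no padding.
alexnet_conv1 = conv_output_size(227, 11, stride=4)

# A 3-wide kernel with one pixel of padding preserves the input size.
same_size = conv_output_size(150, 3, stride=1, padding=1)
```

With 96 kernels, the first layer therefore produces a 55 × 55 × 96 volume.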

2.3.2. VGGNet

VGGNet, short for the visual geometry group network, is a deep neural network known for its multilayered architecture. It is based on the CNN model and has been widely applied to the ImageNet dataset. VGG-19, in particular, is known for its simplicity and its use of 3 × 3 convolutional layers, which contribute to its increased depth. Max pooling layers are used to reduce the volume size in VGG-19, and it includes two fully connected (FC) layers with 4096 neurons each [55].

2.3.3. Inception V3

Inception V3 is a deep learning model based on convolutional neural networks that is used in image analysis and object detection. It is an improved version of the basic Inception V1 model, which was introduced by Szegedy et al. in 2014 [56].

2.3.4. MobileNet

The MobileNet model is specifically designed for efficiency and optimized for running on embedded or mobile devices. Its key layer is the depthwise separable convolution, which helps reduce the number of parameters and computations. The original MobileNet was released in April 2017; MobileNet v2 subsequently introduced bottleneck layers and shortcut connections as updates to the previous version [57].
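The saving obtained with a depthwise separable convolution can be illustrated by counting weights. In this sketch the layer sizes (a 3 × 3 kernel, 32 input channels, 64 output channels) are hypothetical and biases are ignored: a standard convolution needs k·k·C_in·C_out weights, whereas the separable version needs k·k·C_in weights for the depthwise step plus C_in·C_out for the 1 × 1 pointwise step.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k step followed by a 1 x 1 pointwise step."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
reduction = std / sep
```

For these sizes the separable layer uses roughly eight times fewer weights than the standard one.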

3. Time–Frequency Representation

The Wigner–Ville Distribution (WV) is one of the most commonly used representations for time–frequency analysis. Two cases can be considered. In the first case, the WV is applied to the ECG time window directly, without applying the Hilbert transform before the time–frequency decomposition; Figure 5 shows the resulting symmetry of the diagram, which is due to the presence of both positive and negative frequencies. In the second case, the analytic signal is first calculated using the Hilbert transform, and the WV is then computed from the obtained analytic signal.

Compared to the WV, the Pseudo Wigner–Ville (PWV) variant reduces the artifacts and interference terms, allowing for clearer spectral visualization [58], so the PWV was finally used. This variant reduces these terms using a smoothing window h(t). The mathematical description of the PWV is given in Equation (1).

PWV_{x}(t, \nu) = \int_{-\infty}^{+\infty} h(\tau)\, S\left(t + \frac{\tau}{2}\right) S^{*}\left(t - \frac{\tau}{2}\right) e^{-j 2 \pi \nu \tau}\, d\tau

(1)

where S(t) is the analyzed signal, τ is the time lag, t is the time instant, ν is the frequency, and h is the frequency-smoothing window. To reduce interference, the PWV replaces the original signal with its analytic signal, thereby filtering out the negative frequencies. The analytic signal S(t) corresponding to the original signal x(t) is given by Equation (2).

S(t) = x(t) + j\, H[x(t)]

(2)

where H[x(t)] is the Hilbert transform of x(t), as shown in Equation (3).

H[x(t)] = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x(\tau)}{t - \tau}\, d\tau

(3)
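Equations (1)–(3) can be illustrated with a small discrete-time sketch in plain Python. It is not the implementation used in the study: the function names, the Hamming-shaped lag window, the window length, and the test tone are all assumptions. The analytic signal is obtained by zeroing the negative-frequency half of the DFT, and the PWV is evaluated sample by sample. Because the product S(t + m)·S*(t − m) spans a lag of 2m samples, frequency bin k corresponds to k·fs / (2·n_freq) Hz.

```python
import cmath
import math


def analytic_signal(x):
    """Return S = x + j*H[x] by zeroing the negative-frequency half of the DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(1, n):
        if k < n / 2:
            spectrum[k] *= 2  # keep and double the positive frequencies
        elif k > n / 2:
            spectrum[k] = 0   # discard the negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def pseudo_wigner_ville(s, half_window, n_freq):
    """Discrete PWV, returned as a matrix indexed [frequency bin][time sample]."""
    n = len(s)
    # Hamming-shaped lag window h(m): an illustrative choice, not from the paper.
    h = {
        m: 0.54 + 0.46 * math.cos(math.pi * m / half_window)
        for m in range(-half_window, half_window + 1)
    }
    tfr = [[0.0] * n for _ in range(n_freq)]
    for t in range(n):
        max_lag = min(half_window, t, n - 1 - t)  # keep t + m and t - m in range
        for k in range(n_freq):
            acc = 0j
            for m in range(-max_lag, max_lag + 1):
                kernel = s[t + m] * s[t - m].conjugate()
                acc += h[m] * kernel * cmath.exp(-2j * math.pi * k * m / n_freq)
            tfr[k][t] = acc.real  # the sum is real by conjugate symmetry in m
    return tfr


# Hypothetical test tone: 0.125 cycles/sample, i.e. 15.625 Hz at fs = 125 Hz.
fs = 125
x = [math.cos(2 * math.pi * 0.125 * t) for t in range(64)]
s = analytic_signal(x)
tfr = pseudo_wigner_ville(s, half_window=15, n_freq=32)
peak_bin = max(range(32), key=lambda k: tfr[k][32])
peak_hz = peak_bin * fs / (2 * 32)
```

For this pure tone the energy at the central time sample concentrates in the bin that corresponds to the tone frequency. The direct DFT used here has quadratic cost and is only meant for short windows; an FFT-based routine would normally be preferred.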

4. Material and Methods

Figure 6 shows the general scheme of the followed methodology, from the reading of the records of the database to the results obtained by the classifier.

The developed methodology is composed of four fundamental phases.

First phase: The dataset used is described.
Second phase: The ECG data undergoes filtering to reduce baseline interference. Once filtered, the Window Reference Mark (WRM) of the ECG signal is obtained. Each WRM indicates the start of a time window (tw) within the ECG signal.
Third phase: Information extraction is performed by applying the Hilbert transform (Ht) to each window tw obtained in the second phase. Subsequently, the TFR matrix is computed using the Pseudo Wigner–Ville method, resulting in the Time–Frequency Representation Image (TFRI).
Fourth phase: The TFRI matrices obtained in the previous step are used as input for a deep learning CNN (CNN1, CNN2, InceptionV3, MobileNet, VGGNet, and AlexNet), as detailed in Section 2.3 and Section 4.4.1. The success of ventricular fibrillation (VF) detection relies on signal processing techniques and the structure of the classifiers employed. To achieve optimal performance, it is necessary to adjust the CNN parameters to better adapt to the data.

4.1. Materials

The ECG records used in this study were sourced from the MIT-BIH Malignant Ventricular Fibrillation [59] and AHA (2000 series) [60] standard databases. The analysis was conducted without preselecting ECG episodes, in order to simulate the use of an AED. A total of 24 patients were included in the analysis, consisting of 22 records from the MIT-BIH database and two additional records from the AHA database. Each record contained a half-hour annotated recording of continuous ECG. The inclusion of AHA records was intended to increase the number of ventricular tachycardia (VT) episodes and improve the balance of recorded time between VT and ventricular fibrillation (VF) episodes. The study defined four groups (classes) of rhythms: normal sinus rhythm (Normal), ventricular tachycardia (VT), ventricular fibrillation including flutter episodes (VF), and other rhythms (non-ventricular arrhythmia, noise, etc.), labeled as Other.

4.2. Electrocardiographic Signal Preprocessing

4.2.1. Denoising

The purpose of this preprocessing stage is to eliminate various types of noise present in the ECG signal, such as baseline oscillation, power line interference, and electromyogram (EMG) activity. Baseline oscillations typically have a frequency range below 1 Hz, power line interference occurs at 50 or 60 Hz, and the EMG exhibits a wide bandwidth, with low amplitude when the patient is at rest and low energy below 45 Hz. To address these issues, the ECG signal is first resampled to 125 Hz. Then, an 8th-order IIR bandpass filter with a Butterworth response is applied, with a passband ranging from 1 Hz to 45 Hz. This effectively removes the baseline oscillation below 1 Hz, as well as the power line interference and EMG activity above 45 Hz [61,62], as illustrated in Figure 7.

4.2.2. Segmentation

The next step involves obtaining a Window Reference Mark (WRM) to indicate the beginning of each ECG time window, denoted as tw. According to [58], a normal heart rate is considered to lie between 50 and 120 beats per minute (bpm), which corresponds to beat intervals between 60/120 = 0.5 s and 60/50 = 1.2 s. Therefore, the minimum distance (WRM_min) and maximum distance (WRM_max) between two consecutive WRMs are set to 0.5 s and 1.2 s, respectively. The WRM reference marks were calculated using a pre-existing algorithm, where N_LMC represents the number of local maxima (LM) marks present in the signal. From each generated WRM reference mark, a time window tw_j of 1.2 s in length (150 samples at 125 Hz) was created, starting at the corresponding mark WRM_j, as shown in Equation (4).

tw_{j} = [WRM_{j},\ WRM_{j} + 1.2\ \text{s}]; \quad j = 1, \dots, N_{LMC}

(4)
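Equation (4) can be sketched as a simple slicing operation, assuming the WRM positions are already available as sample indices (the WRM detection algorithm itself is not reproduced here). The function name, the placeholder signal, and the mark positions are hypothetical, as is the choice to skip marks whose window would run past the end of the record.

```python
def extract_windows(signal, wrm_indices, fs=125, window_s=1.2):
    """Cut one fixed-length window tw_j starting at each mark WRM_j."""
    length = round(window_s * fs)  # 1.2 s at 125 Hz gives 150 samples
    windows = []
    for start in wrm_indices:
        # Assumption: marks too close to the end of the record are skipped.
        if start + length <= len(signal):
            windows.append(signal[start:start + length])
    return windows


# Hypothetical 8 s record (1000 samples) and mark positions, in samples.
ecg = [0.0] * 1000
marks = [0, 100, 230, 900]
windows = extract_windows(ecg, marks)
```

Consecutive windows may overlap, since marks can be as close as 0.5 s while each window lasts 1.2 s.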

4.3. Extraction of Image from TFR

Once the Time–Frequency Representation (TFR) data matrix has been obtained for each window tw, after applying the Hilbert transform (Ht), it is converted into an image, the TFRI, with a size of Lf × Lt pixels, where Lf = 45 and Lt = 150. This image is then input directly into the CNN. This approach ensures that all temporal and spectral information from the ECG signal is preserved in the data matrix and made available to the classifier. It is important to note that no feature extraction is performed on the TFRI, as it already contains the temporal and spectral information of the ECG signal.

Figure 8 illustrates examples of the time–frequency representations (TFR) obtained using the Pseudo Wigner–Ville (PWV) transform for signals belonging to the Normal, Other, VT, and VF classes. The intensity distributions clearly exhibit distinct patterns for each class. In the case of a Normal signal, the intensity is localized in time, primarily due to the QRS complex, and it covers a wide range of frequencies. On the other hand, VF signals exhibit irregular intensity distributions along both the time and frequency axes without a specific pattern.

4.4. Model Training and Evaluation

4.4.1. Model Architecture

The architectures of the proposed CNN models are summarized in Table 1.

In the CNN1 method, 2 fully connected layers utilize the output from the TFR and predict the class of the image based on the vector calculated in previous stages.
In the CNN2 method, the network consists of 6 layers, including 2 convolution layers, 2 max-pooling layers, and 2 fully connected layers. Each convolution layer (layers 1 and 2) applies convolution with its respective kernel size (layers 3 and 4). Following each convolution layer, a max-pooling operation is performed on the generated feature maps. The purpose of max-pooling is to reduce the dimensionality of the feature maps, aiding in the extraction of essential features.

4.4.2. Training the Convolutional Neural Network Model

Other research studies have used optimization techniques to select layers in complex CNN architectures and have employed different hyperparameters for training. We took a different approach: starting from a basic CNN structure, we conducted a series of systematic tests in which layers were progressively added and adjusted, while the training hyperparameters were kept constant. The impact of these layers on performance was evaluated using a validation dataset. This methodology enabled us to identify the specific layers that have a notable positive impact on the network’s performance for the task addressed here. The model was trained for 100 epochs using the Adam optimizer and the categorical cross-entropy loss function. The training and validation results are depicted in Figure 9 and Figure 10. The error is close to 0 and the accuracy is very high on both the training and validation sets, which indicates that 100 epochs are sufficient to obtain a well-trained model.

Cross-validation is essential for selecting optimal parameters in machine learning and deep learning. Various traditional cross-validation methods are available, such as leave-one-out cross-validation and k-fold cross-validation [63]. In this study, we followed a specific approach: for each class, 67% of the data were randomly chosen for training, leaving 33% for testing. The CNN model was trained on the training data, and its classification performance was evaluated on the test data using sensitivity, specificity, accuracy, and F-Score. This process was repeated five times with different random selections, and the results were averaged to assess the overall classifier performance.
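The repeated per-class random split can be sketched as follows. The function name, the use of the run index as the random seed, and the class counts are hypothetical and serve only to show the mechanics: indices are grouped by class, shuffled, and cut at 67% within each class.

```python
import random


def stratified_split(labels, train_fraction=0.67, seed=None):
    """Randomly assign `train_fraction` of each class to training, the rest to test."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Hypothetical class counts, not those of the study.
labels = ["Normal"] * 100 + ["Other"] * 60 + ["VT"] * 30 + ["VF"] * 40
# Five repetitions with different random selections.
splits = [stratified_split(labels, seed=run) for run in range(5)]
```

Each repetition would train a fresh model on the training indices and evaluate it on the test indices; the per-run metrics are then averaged.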

4.5. Performance Metrics for Classification

The performance of different networks on the testing dataset was evaluated after the completion of the training phase. The evaluation was based on four performance metrics: accuracy, sensitivity, specificity, and F-Score. The following equations were used for calculation [64,65]:

\begin{matrix} \mathrm{Accuracy}\ (\%) = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \end{matrix}

(5)

\begin{matrix} \mathrm{Sensitivity}\ (\%) = \frac{TP}{TP + FN} \times 100 \end{matrix}

(6)

\begin{matrix} \mathrm{Specificity}\ (\%) = \frac{TN}{TN + FP} \times 100 \end{matrix}

(7)

\begin{matrix} \text{F-Score}\ (\%) = \frac{2 \times TP}{2 \times TP + FP + FN} \times 100 \end{matrix}

(8)

In the classification of Normal, Other, VT, and VF patients, the terms true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) were used.
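As an illustration of Equations (5)–(8) in a four-class setting, the sketch below derives TP, FN, FP, and TN for one class from a confusion matrix by treating that class as positive and all others as negative (a one-vs-rest reading, which is an assumption about how the per-class values are obtained). The confusion-matrix counts are invented for the example and are not results from this study.

```python
def one_vs_rest_metrics(confusion, class_index):
    """Per-class metrics (in %) from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp
    fp = sum(confusion[row][class_index] for row in range(n)) - tp
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "f_score": 100.0 * 2 * tp / (2 * tp + fp + fn),
    }


if __name__ == "__main__":
    classes = ["Normal", "Other", "VT", "VF"]
    # Invented counts for illustration only.
    confusion = [
        [50, 0, 0, 0],
        [0, 40, 0, 10],
        [0, 0, 30, 10],
        [0, 10, 0, 50],
    ]
    for index, name in enumerate(classes):
        print(name, one_vs_rest_metrics(confusion, index))
```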

5. Results

As previously mentioned, the experiments in this study used signals extracted from the MIT-BIH and AHA standard databases, categorized into four distinct groups: VF, VT, Normal, and Other. The preprocessing stage involved denoising and reducing baseline variation by applying an eighth-order Butterworth IIR bandpass filter with a passband of 1 Hz to 45 Hz. Window reference marks (WRMs) were then calculated to mark the beginning and end of the 1.2 s time window (tw) of each signal.

We have proposed three different techniques to extract the image feeding the classifier: TFR_CNN1, Ht_TFR_CNN1, and Ht_TFR_CNN2.

In the TFR_CNN1 approach, each tw was first transformed into a time–frequency representation image (TFRI) using the Pseudo Wigner–Ville transform, without applying the Hilbert transform (Ht). The resulting image was then converted into a feature vector, which served as input for the fully connected layer (FCL) of the classifier.
In the Ht_TFR_CNN1 method, the Hilbert transform was applied to each tw obtained in the first phase, and the time–frequency representation (TFR) matrix was then computed using the Pseudo Wigner–Ville transform. The resulting TFR matrix was used to generate the TFRI, which was then used as input for the FCL.
In the Ht_TFR_CNN2 method, the features were extracted using CNN2 from the TFRI computed after applying the Hilbert transform (Ht). The extracted vectors were then used as input for the FCL.
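The Hilbert transform step is commonly used to build the analytic signal of a real-valued window before a Wigner–Ville-type distribution is computed. The sketch below shows one standard way to obtain it, by zeroing the negative-frequency half of the spectrum; a naive DFT is used only to keep the example dependency-free, and nothing here should be read as the implementation used in this study. The Pseudo Wigner–Ville step itself is not shown.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        total = 0j
        for t, value in enumerate(values):
            total += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        result.append(total / n if inverse else total)
    return result


def analytic_signal(samples):
    """Return x + j*H{x}, where H is the Hilbert transform of the real input x."""
    n = len(samples)
    spectrum = dft(samples)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and discard negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    return dft([w * s for w, s in zip(weights, spectrum)], inverse=True)


if __name__ == "__main__":
    n = 32
    window = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
    z = analytic_signal(window)
    # For a pure cosine, the imaginary part is the matching sine.
    print(round(z[1].imag, 6), round(math.sin(2 * math.pi * 3 / n), 6))
```

Because the analytic signal has no negative-frequency content, it reduces the cross-term interference that a Wigner–Ville-type distribution would otherwise show between positive and negative frequencies of a real signal.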

In the TFR_CNN1, Ht_TFR_CNN1, and Ht_TFR_CNN2 methods, after receiving a vector at the input, the FCL applies a linear combination and an activation function successively to classify the input image. The output of the FCL is a vector of a size corresponding to the number of classes, where each component represents the probability of the input image belonging to a specific class.
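The final classification step can be illustrated with a single dense layer followed by a softmax, which turns the linear combination of the input features into one probability per class. Using softmax as the output activation is an assumption (it is the usual companion of the categorical cross-entropy loss); the weights, biases, and two-element feature vector below are placeholders, not trained values.

```python
import math

CLASSES = ("Normal", "Other", "VT", "VF")


def dense(inputs, weights, biases):
    """Linear combination: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Return the most probable class and the full probability vector."""
    probabilities = softmax(dense(features, weights, biases))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best], probabilities


if __name__ == "__main__":
    # Placeholder parameters: only the VF unit responds to the input.
    weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
    biases = [0.0, 0.0, 0.0, 0.0]
    print(classify([1.0, 2.0], weights, biases))
```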

Figure 11, Figure 12, Figure 13 and Figure 14 illustrate the confusion matrix for one of the iterations. Table 2, Table 3, Table 4 and Table 5 present the averaged performance values obtained from the repeated random validation employed in this study. When the TFR_CNN1 algorithm was used (epochs = 50), the results for VF showed a sensitivity of 85.88%, an overall specificity of 99.30%, an overall accuracy of 96.82%, and an overall F-Score of 92.10%; for VT, they showed a sensitivity of 95.84%, an overall specificity of 97.19%, an overall accuracy of 97.09%, and an overall F-Score of 96.52%. Achieving high classification results with the TFR_CNN1 strategy is therefore challenging, primarily because of the significant similarity between VF and VT signals. This motivates the exploration of alternative approaches to the class discrimination problem, leading to the use of Ht with TFR. The results obtained using the Ht_TFR_CNN1 algorithm (epochs = 50) for VF detection showed a sensitivity of 98.04%, an overall specificity of 98.94%, an overall accuracy of 98.77%, and an overall F-Score of 98.48%, while for VT, a sensitivity of 89.70%, an overall specificity of 99.70%, an overall accuracy of 99.00%, and an overall F-Score of 94.43% were obtained. When employing the Ht_TFR_CNN1 algorithm (epochs = 100) for VF detection, a sensitivity of 96.44%, an overall specificity of 99.28%, an overall accuracy of 98.75%, and an overall F-Score of 97.83% were achieved. For VT, the results included a sensitivity of 92.70%, an overall specificity of 99.53%, an overall accuracy of 99.06%, and an overall F-Score of 95.99%.

In the analysis of VF and VT detection using the Ht_TFR_CNN1 (epochs = 50) and Ht_TFR_CNN1 (epochs = 100) methods, both sensitivity and overall specificity fall within the range of 89.70% to 99.70%. These results are superior to those obtained without Ht, so the Ht-based features were chosen for subsequent tests. With Ht_TFR_CNN1 (epochs = 100), the results are better overall than those obtained with Ht_TFR_CNN1 (epochs = 50), notably for VT sensitivity (92.70% versus 89.70%), indicating a better learning capability of the training dataset.

Analysis Based on Different CNN Algorithms

Figure 15, Figure 16, Figure 17 and Figure 18 present the confusion matrix derived from one of the five iterations of testing data. Additionally, we enhance the understanding of these findings by presenting Table 6, Table 7, Table 8 and Table 9, and Figure 19 and Figure 20, which summarize the results obtained from comparing the sensitivity, specificity, accuracy, and F-Score values achieved for the respective four classes.

When comparing the classifiers VGGNet and AlexNet with MobileNet and InceptionV3, it is evident that the former two yield better results, demonstrating a higher learning capability with the dataset. Analyzing the values in Table 8 and Table 9, when using the VGGNet classifier for VT, a sensitivity of 90.15%, overall specificity of 99.15%, overall accuracy of 98.77%, and overall F-Score of 94.43% were obtained. For VF, a sensitivity of 93.34%, overall specificity of 99.25%, overall accuracy of 98.14%, and overall F-Score of 96.20% were achieved. Similarly, using the AlexNet classifier for VT resulted in a sensitivity of 91.84%, overall specificity of 99.47%, overall accuracy of 98.94%, and overall F-Score of 95.50%. For VF, a sensitivity of 95.58%, an overall specificity of 99.34%, an overall accuracy of 98.64%, and an overall F-Score of 97.42% were obtained.

On the other hand, the classifiers Ht_TFR_CNN1 and Ht_TFR_CNN2 exhibit similar behavior for the Normal and Other classes.

For the Normal class, they showed a sensitivity of 99.29% and 99.34%; an overall specificity of 98.62% and 98.35%; an overall accuracy of 98.91% and 98.89%; and an overall F-Score of 98.95% and 98.84%, respectively. For the Other class, they displayed a sensitivity of 97.74% and 96.98%; an overall specificity of 99.62% and 99.68%; an overall accuracy of 99.22% and 99.11%; and an overall F-Score of 98.67% and 98.31%, respectively. However, the InceptionV3 classifier has a higher sensitivity for VT (98.15%) and a lower sensitivity for VF (77.28%) than the Ht_TFR_CNN2 classifier, which exhibits a sensitivity of 90.45% for VT and 98.16% for VF. Comparing the results provided by the different algorithms, the sensitivity for VF and the sensitivity for VT vary significantly, primarily due to the morphological similarities between the VT and VF classes.

6. Discussion

The identification of ventricular arrhythmias generally involves a procedure for extracting and selecting relevant features. In this study, we proposed using the Ht_TFR_CNNi method (i = 1, 2) to extract features that capture information about the shape of the ECG signal. This combined method of Ht and TFR with CNN aims to condense the relevant information about the data’s shape, enabling effective detection and discrimination of shockable VF and VT rhythms, even in the presence of noise and complex signals. The obtained results shown in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 demonstrate the use of the CNN classifier with input features obtained from two methods, namely Ht_TFR_CNN1 and TFR_CNN1. The results indicate that the Ht_TFR_CNN1 and Ht_TFR_CNN2 features yield better performance, which is why we compare the Ht_TFR_CNN2 method with other works in the literature. While we employed the CNN classifier to highlight the enhanced classification outcomes compared to prior studies, the investigation of alternative classifiers remains an ongoing avenue that could potentially yield further improvements.

As shown in Table 10, the proposed Ht_TFR_CNN2 method achieves an average accuracy of 98.91% for multi-class discrimination, effectively distinguishing the Normal and Other classes and the VT and VF types of ventricular arrhythmia. Additionally, Table 11 presents a two-class classification approach, demonstrating that the Ht_TFR_CNN2 method achieves an accuracy of 99.61% in discriminating shockable (VT or VF) and non-shockable rhythms. These results indicate that the Ht_TFR_CNN2 method delivers high classification performance. We also provide a comparison with other works in the literature, although such a comparison is challenging due to differences in the source signals used and the specific discrimination tasks performed. Among works focusing on VF discrimination, our previous work [58] achieved high classification performance by feeding the complete time–frequency image as input to different classifiers (e.g., Sens = 92.8% and Spe = 97.0% for VF, and Sens = 91.8% and Spe = 98.7% for VT, using an Artificial Neural Network Classifier, ANNC). Arafat et al. [66] achieved Sens = 80.97% and Spe = 98.51% for classifying VF episodes using an improved version of the Threshold Crossing Interval (TCI) algorithm. Roopaei et al. [67] obtained Acc = 88.60% using chaotic-based reconstructed phase space features. Alonso-Atienza et al. [68] attained Sens = 91.9% and Spe = 97.1% in detecting VF episodes employing SVM and specific feature-selection classifiers. Li and Rajagopalan [69] utilized a genetic algorithm and obtained Sens = 98.40%, Spe = 98.00%, and Acc = 96.30% in discriminating VF episodes. Ibtehaz et al. [70] achieved the highest results in this group, employing SVM and Empirical Mode Decomposition (EMD) classifiers (Sens = 99.99%, Spe = 98.40%, and Acc = 99.19%) for VF and non-VF classification. Acharya et al. [71] detected and classified ventricular arrhythmias employing a CNN, achieving Sens = 56.44%, Spe = 98.19%, and Acc = 97.88% for VF. Xia et al. [72] obtained high performance values (Sens = 98.15% and Spe = 96.01% for VF, and Sens = 96.01% and Spe = 98.15% for VT) using Lempel–Ziv and Empirical Mode Decomposition (EMD) with selected clean episodes of VT and VF. Mjahad et al. [73] achieved accuracy, sensitivity, and specificity values of 98.68%, 92.72%, and 99.53%, respectively, employing TDA. Kaur and Singh [74] used Empirical Mode Decomposition (EMD) and approximate entropy with selected VF and VT episodes from the MIT-BIH database, achieving moderate classification performance (Sens = 90.47%, Spe = 91.66%, and Acc = 91.17%). In [75], the authors proposed a fuzzy similarity-based approximate entropy approach and obtained high performance ratios (Sens = 97.98% and Spe = 97.03% for VF, and Sens = 97.03% and Spe = 97.98% for VT). However, a fair comparison must consider that Xie’s work involved the preselection of clean episodes of VF and VT. Despite the preselection of ECG episodes in some works, the results of the Ht_TFR_CNN method in this study outperform the other works in the literature aiming to discriminate between VF and VT rhythms.

Table 11 presents a comparison focusing on detecting VT/VF episodes, specifically shockable and non-shockable rhythms. This set of works primarily targets implementation in automated external defibrillators (AEDs) and implantable cardioverter defibrillators (ICDs), distinguishing between shockable and non-shockable rhythms (considering both VT and VF as shockable). Mjahad et al. [73] utilized TDA and obtained Sens = 99.03%, Spe = 99.67%, and Acc = 99.51% in discriminating VF episodes. Acharya et al. [76] proposed an eleven-layer convolutional neural network (CNN) for shockable and non-shockable arrhythmia classification, obtaining Sens = 95.32%, Spe = 91.04%, and Acc = 93.20%. Tripathy et al. [14] proposed Variational Mode Decomposition (VMD) and the Random Forest (RF) classifier, achieving Sens = 96.54%, Spe = 97.97%, and Acc = 97.23%. Buscema et al. [77] obtained Acc = 99.72% utilizing RNN. Kumar et al. [78] obtained Acc = 98.80%, Sens = 98.60%, and Spe = 98.90% employing CNN and IENN. Alonso-Atienza et al. [68] also obtained accuracy, sensitivity, and specificity values of 98.6%, 95.0%, and 99.0%, respectively, employing feature selection and an SVM classifier. Cheng and Dong [79] achieved an accuracy of 95.50% employing a personalized features SVM. Mohanty et al. [16] detected and classified ventricular arrhythmias employing a cubic Support Vector Machine (SVM) and C4.5 classifiers, achieving Sens = 90.97%, Spe = 97.86%, and Acc = 97.02%. Li et al. [69] attained Sens = 98.4%, Spe = 98.0%, and Acc = 98.1% employing a genetic algorithm (GA) for feature selection and an SVM classifier. Xu et al. [80] attained high performance values (Acc = 98.29%, Sens = 97.32%, and Spe = 98.95%) utilizing adaptive variational and boosted CART.

The results of the Ht_TFR_CNN2 proposal in this work outperform those of other works in this group as well, achieving an accuracy of 99.61%, a sensitivity of 99.74%, and a specificity of 99.61%. Therefore, the benefits of using the Ht_TFR_CNN2 method in the classification procedure are evident. Ht_TFR_CNN2 can be successfully employed in the detection and classification of ventricular arrhythmia, as well as in the classification of shockable episodes. This illustrates that the fusion of CNN and TFR yields a robust signal characterization, which suggests a promising use of these features in Automated External Defibrillation (AED) and Implantable Cardioverter Defibrillation (ICD) treatments.

7. Application in a Real Clinical Setting

In real clinical settings, Artificial Intelligence (AI), specifically convolutional neural networks (CNNs), offers significant potential for enhancing patient care by detecting ventricular fibrillation (VF) in individuals at risk of cardiac arrest [81]. This approach facilitates swift VF identification through the rapid analysis of electrocardiograms (ECGs) in emergency departments. AI models trained on diverse VF patterns can improve accuracy compared to manual interpretation by clinicians. AI-powered monitoring systems can continuously analyze ECG signals in critically ill patients and automatically alert healthcare providers when VF is detected, which is particularly valuable in intensive care units. Moreover, AI-assisted VF detection streamlines healthcare efficiency by helping prioritize patients based on urgency. Despite this promise, integrating AI-based VF detection requires overcoming challenges such as rigorous validation and regulatory approvals to ensure safety. Collaboration among clinicians, data scientists, and regulatory bodies is crucial for successful and safe AI implementation in healthcare. The aforementioned factors contribute to the efficacy of both Automated External Defibrillators (AEDs) and Implantable Cardioverter-Defibrillators (ICDs). In [82], a genetic programming (GP) model is employed to predict favorable defibrillation outcomes for patients with ventricular fibrillation (VF). In [82], the efficacy of a programmable automatic external cardioverter–defibrillator (AECD) is investigated within in-hospital cardiac arrest scenarios involving ventricular fibrillation (VF) and ventricular tachycardia (VT). Continuous research is necessary to refine AI algorithms, as demonstrated in this article, where the Pseudo Wigner–Ville (PWV) representation exhibited effective real-time classification without extensive computational time.

8. Conclusions

The accurate interpretation and differentiation of ventricular arrhythmias, such as VF and VT, are crucial for patient safety. In this paper, we introduced an innovative approach to feature extraction that integrates TFR and CNN techniques. For VF detection, we observed a sensitivity of 98.16%, a specificity of 99.07%, and an accuracy of 98.91%; for ventricular tachycardia (VT), the sensitivity was 90.45%, the specificity was 99.73%, and the accuracy was 99.09%; for normal sinus rhythms, the sensitivity was 99.34%, the specificity was 98.35%, and the accuracy was 98.89%; finally, for other rhythms, the sensitivity was 96.98%, the specificity was 99.68%, and the accuracy was 99.11%. Moreover, this study shows a high accuracy of 99.61%, with a sensitivity of 99.23% and a specificity of 99.74%, in discerning between shockable (VT/VF) and non-shockable rhythms.

The application of this innovative approach yields slightly or significantly improved results compared to previous comparable works using the Pseudo-Wigner–Ville t-f representation and a diverse range of CNNs. This indicates that the benefits of our methodology are independent of the classifier used. Additionally, our proposed methodology provides real-time detection of VF with low computational time, effectively differentiating it from other cardiac pathologies. This significantly enhances the accuracy of diagnosing patients experiencing these arrhythmias.

It is worth noting that these powerful results were achieved without the need for the preselection of episodes. Based on our findings, we conclude that this technique can be successfully applied to both the detection and classification of ventricular arrhythmia, including shockable rhythms. Moreover, it offers valuable features that facilitate the classification task. Despite the higher computational complexity during training, this technique has the potential to yield superior results not only in the field of ventricular arrhythmia detection but also in various bioengineering applications that currently involve a stage of feature selection and extraction prior to classification.

Author Contributions

Conceptualization, A.M., A.R.-M. and M.S.; methodology, A.M. and A.R.-M.; software, A.M.; validation, A.M., A.R.-M. and M.S.; formal analysis, A.M., A.R.-M. and M.S.; investigation, A.M. and A.R.-M.; resources, M.S. and H.A.; data curation, A.M. and A.R.-M.; writing—original draft preparation, A.M., M.S. and A.R.-M.; writing—review and editing, A.M., M.S., H.A. and A.R.-M.; visualization, H.A.; supervision, A.R.-M.; project administration, A.R.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the use of the standard, anonymized PhysioNet database.

Informed Consent Statement

Patient consent was waived due to the use of the standard, anonymized PhysioNet database.

Data Availability Statement

Publicly available data in PhysioNet: http://physionet.org/.

Conflicts of Interest

The authors declare no conflict of interest.

References

Doolan, A.; Semsarian, C.; Langlois, N. Causes of sudden cardiac death in young Australians. Med. J. Aust. 2004, 180, 110–112.
Beck, C.S.; Pritchard, W.H.; Feil, H.S. Ventricular Fibrillation of Long Duration Abolished by Electric Shock. J. Am. Med. Assoc. 1947, 135, 985–986.
Kerber, R.E.; Becker, L.B.; Bourland, J.D.; Cummins, R.O.; Hallstrom, A.P.; Michos, M.B.; Nichol, G.; Ornato, J.P.; Thies, W.H.; White, R.D.; et al. Automatic external defibrillators for public access defibrillation: Recommendations for specifying and reporting arrhythmia analysis algorithm performance, incorporating new waveforms, and enhancing safety: A statement for health professionals from the American Heart Association Task Force on Automatic External Defibrillation, Subcommittee on AED Safety and Efficacy. Circulation 1997, 95, 1677–1682.
Jin, D.; Dai, C.; Gong, Y.; Lu, Y.; Zhang, L.; Quan, W.; Li, Y. Does the choice of definition for defibrillation and CPR success impact the predictability of ventricular fibrillation waveform analysis? Resuscitation 2017, 111, 48–54.
Amann, A.; Tratnig, R.; Unterkofler, K. Reliability of old and new ventricular fibrillation detection algorithms for automated external defibrillators. Biomed. Eng. Online 2005, 4, 1–15.
Pourmand, A.; Galvis, J.; Yamane, D. The controversial role of dual sequential defibrillation in shockable cardiac arrest. Am. J. Emerg. Med. 2018, 36, 1674–1679.
Lee, S.H.; Chung, K.Y.; Lim, J.S. Detection of ventricular fibrillation using Hilbert transforms, phase-space reconstruction, and time-domain analysis. Pers. Ubiquitous Comput. 2014, 18, 1315–1324.
Othman, M.A.; Safri, N.M.; Ghani, I.A.; Harun, F.K.C. Characterization of ventricular tachycardia and fibrillation using semantic mining. Comput. Inf. Sci. 2012, 5, 35.
Shyu, L.Y.; Wu, Y.H.; Hu, W. Using wavelet transform and fuzzy neural network for VPC detection from the Holter ECG. IEEE Trans. Biomed. Eng. 2004, 51, 1269–1273.
Lim, J.S. Finding features for real-time premature ventricular contraction detection using a fuzzy neural network system. IEEE Trans. Neural Netw. 2009, 20, 522–527.
Rosado-Munoz, A.; Martínez-Martínez, J.M.; Escandell-Montero, P.; Soria-Olivas, E. Visual data mining with self-organising maps for ventricular fibrillation analysis. Comput. Methods Programs Biomed. 2013, 111, 269–279.
Orozco-Duque, A.; Rúa, S.; Zuluaga, S.; Redondo, A.; Restrepo, J.V.; Bustamante, J. Support Vector Machine and Artificial Neural Network Implementation in Embedded Systems for Real Time Arrhythmias Detection. In Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing, Barcelona, Spain, 11–14 February 2013; pp. 310–313.
Pooyan, M.; Akhoondi, F. Providing an efficient algorithm for finding R peaks in ECG signals and detecting ventricular abnormalities with morphological features. J. Med. Signals Sens. 2016, 6, 218.
Tripathy, R.; Sharma, L.; Dandapat, S. Detection of shockable ventricular arrhythmia using variational mode decomposition. J. Med. Syst. 2016, 40, 1–13.
Jekova, I.; Krasteva, V. Real time detection of ventricular fibrillation and tachycardia. Physiol. Meas. 2004, 25, 1167.
Mohanty, M.; Sahoo, S.; Biswal, P.; Sabut, S. Efficient classification of ventricular arrhythmias using feature selection and C4.5 classifier. Biomed. Signal Process. Control 2018, 44, 200–208.
Jothiramalingam, R.; Jude, A.; Patan, R.; Ramachandran, M.; Duraisamy, J.H.; Gandomi, A.H. Machine learning-based left ventricular hypertrophy detection using multi-lead ECG signal. Neural Comput. Appl. 2021, 33, 4445–4455.
Tang, J.; Li, J.; Liang, B.; Huang, X.; Li, Y.; Wang, K. Using Bayesian decision for ontology mapping. J. Web Semant. 2006, 4, 243–262.
Kuzilek, J.; Kremen, V.; Soucek, F.; Lhotska, L. Independent component analysis and decision trees for ECG holter recording de-noising. PLoS ONE 2014, 9, e98450.
Ayachi, R.; Said, Y.E.; Atri, M. To perform road signs recognition for autonomous vehicles using cascaded deep learning pipeline. Artif. Intell. Adv. 2019, 1, 1–10.
Afif, M.; Ayachi, R.; Said, Y.; Pissaloux, E.; Atri, M. Indoor Image Recognition and Classification via Deep Convolutional Neural Network; Springer: Berlin/Heidelberg, Germany, 2020; pp. 364–371.
Afif, M.; Ayachi, R.; Said, Y.; Pissaloux, E.; Atri, M. An evaluation of retinanet on indoor object detection for blind and visually impaired persons assistance navigation. Neural Process. Lett. 2020, 51, 2265–2279.
Virmani, D.; Girdhar, P.; Jain, P.; Bamdev, P. FDREnet: Face detection and recognition pipeline. Eng. Technol. Appl. Sci. Res. 2019, 9, 3933–3938.
Khan, U.; Khan, K.; Hassan, F.; Siddiqui, A.; Afaq, M. Towards achieving machine comprehension using deep learning on non-GPU machines. Eng. Technol. Appl. Sci. Res. 2019, 9, 4423–4427.
Moon, H.M.; Seo, C.H.; Pan, S.B. A face recognition system based on convolution neural network using multiple distance face. Soft Comput. 2017, 21, 4995–5002.
Khalajzadeh, H.; Mansouri, M.; Teshnehlab, M. Face recognition using convolutional neural network and simple logistic classifier. In Proceedings of the Soft Computing in Industrial Applications: Proceedings of the 17th Online World Conference on Soft Computing in Industrial Applications; Springer: Berlin/Heidelberg, Germany, 2014; pp. 197–207.
Yale Face Database. Available online: http://vision.ucsd.edu/content/yale-face-database (accessed on 14 June 2023).
Yan, K.; Huang, S.; Song, Y.; Liu, W.; Fan, N. Face recognition based on convolution neural network. In Proceedings of the IEEE 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 4077–4081.
AT&T Database of Faces: ORL Face Database. Available online: http://cam-orl.co.uk/facedatabase.html (accessed on 14 June 2023).
Martinez, A.; Benavente, R. The AR Face Database; Technical Report Series; Report #24; CVC Tech: Fontana, CA, USA, 1998.
Li, L.; Jun, Z.; Fei, J.; Li, S. An incremental face recognition system based on deep learning. In Proceedings of the IEEE 2017 15th IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 238–241.
Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
Nakada, M.; Wang, H.; Terzopoulos, D. AcFR: Active face recognition using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 35–40.
Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-pie. Image Vis. Comput. 2010, 28, 807–813.
Li, J.; Qiu, T.; Wen, C.; Xie, K.; Wen, F.Q. Robust face recognition using the deep C2D-CNN model based on decision-level fusion. Sensors 2018, 18, 2080.
Lu, J.; Tan, L.; Jiang, H. Review on convolutional neural network (CNN) applied to plant leaf disease classification. Agriculture 2021, 11, 707.
Guan, S.; Kamona, N.; Loew, M. Segmentation of thermal breast images using convolutional and deconvolutional neural networks. In Proceedings of the 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 9–11 October 2018; pp. 1–7.
Rahman, T.; Chowdhury, M.E.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl. Sci. 2020, 10, 3233.
Desai, M.; Shah, M. An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and Convolutional neural network (CNN). Clin. eHealth 2021, 4, 1–11.
Budak, Ü.; Cömert, Z.; Çıbuk, M.; Şengür, A. DCCMED-Net: Densely connected and concatenated multi Encoder-Decoder CNNs for retinal vessel extraction from fundus images. Med. Hypotheses 2020, 134, 109426.
Nguyen, T.P.; Choi, S.; Park, S.J.; Park, S.H.; Yoon, J. Inspecting method for defective casting products with convolutional neural network (CNN). Int. J. Precis. Eng. Manuf.-Green Technol. 2021, 8, 583–594.
Victoria, A.H.; Maragatham, G. Automatic tuning of hyperparameters using Bayesian optimization. Evol. Syst. 2021, 12, 217–223.
Young, S.R.; Rose, D.C.; Karnowski, T.P.; Lim, S.H.; Patton, R.M. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, Austin, TX, USA, 15 November 2015; pp. 1–5.
Cui, H.; Bai, J. A new hyperparameters optimization method for convolutional neural networks. Pattern Recognit. Lett. 2019, 125, 828–834.
Lee, W.Y.; Park, S.M.; Sim, K.B. Optimal hyperparameter tuning of convolutional neural networks based on the parameter-setting-free harmony search algorithm. Optik 2018, 172, 359–367.
Kiliçarslan, S.; Celik, M. RSigELU: A nonlinear activation function for deep neural networks. Expert Syst. Appl. 2021, 174, 114805.
Zou, X.; Wang, Z.; Li, Q.; Sheng, W. Integration of residual network and convolutional neural network along with various activation functions and global pooling for time series classification. Neurocomputing 2019, 367, 39–45.
Basha, S.S.; Vinakota, S.K.; Dubey, S.R.; Pulabaigari, V.; Mukherjee, S. Autofcl: Automatically tuning fully connected layers for handling small dataset. Neural Comput. Appl. 2021, 33, 8055–8065.
Dahl, G.E.; Sainath, T.N.; Hinton, G.E. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 8609–8613.
Radiuk, P.M. Impact of Training Set Batch Size on the Performance of Convolutional Neural Networks for Diverse Datasets; Information Technology and Management Science: Riga, Latvia, 2017.
Utama, A.B.P.; Wibawa, A.P.; Muladi, M.; Nafalski, A. PSO based Hyperparameter tuning of CNN Multivariate Time-Series Analysis. J. Online Inform. 2022, 7, 193–202.
Zeiler, M.D. Adadelta: An adaptive learning rate method. arXiv 2012, arXiv:1212.5701.
Gulcehre, C.; Moczulski, M.; Bengio, Y. Adasecant: Robust adaptive secant method for stochastic gradient. arXiv 2014, arXiv:1412.7419.
Dhar, P.; Dutta, S.; Mukherjee, V. Cross-wavelet assisted convolution neural network (AlexNet) approach for phonocardiogram signals classification. Biomed. Signal Process. Control 2021, 63, 102142.
Anand, R.; Sowmya, V.; Gopalakrishnan, E.; Soman, K. Modified Vgg deep learning architecture for Covid-19 classification using bio-medical images. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1084, 012001.
Vijayan, T.; Sangeetha, M.; Karthik, B. Efficient analysis of diabetic retinopathy on retinal fundus images using deep learning techniques with inception v3 architecture. J. Green Eng. 2020, 10, 9615–9625. [Google Scholar]
Gómez, J.C.V.; Incalla, A.P.Z.; Perca, J.C.C.; Padilla, D.I.M. Diferentes Configuraciones Para MobileNet en la Detección de Tumores Cerebrales: Different Configurations for MobileNet in the Detection of Brain Tumors. In Proceedings of the 2021 IEEE 1st International Conference on Advanced Learning Technologies on Education & Research, Lima, Peru, 16–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4. [Google Scholar]
Mjahad, A.; Rosado-Muñoz, A.; Bataller-Mompeán, M.; Francés-Víllora, J.; Guerrero-Martínez, J. Ventricular Fibrillation and Tachycardia detection from surface ECG using time–frequency representation images as input dataset for machine learning. Comput. Methods Programs Biomed. 2017, 141, 119–127. [Google Scholar] [CrossRef] [PubMed]
PhysioNet. Available online: http://physionet.org (accessed on 14 June 2023).
American Heart Association ECG Database. Available online: http://ecri.org (accessed on 14 June 2023).
Kaur, M.; Singh, B. Comparison of different approaches for removal of baseline wander from ECG signal. In Proceedings of the International Conference & Workshop on Emerging Trends in Technology, Mumbai, India, 25–26 February 2011; pp. 1290–1294. [Google Scholar]
Narwaria, R.P.; Verma, S.; Singhal, P. Removal of baseline wander and power line interference from ECG signal-a survey approach. Int. J. Electron. Eng. 2011, 3, 107–111. [Google Scholar]
Ramezan, C.A.; Warner, T.A.; Maxwell, A.E. Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification. Remote. Sens. 2019, 11, 185. [Google Scholar] [CrossRef]
Labatut, V.; Cherifi, H. Accuracy measures for the comparison of classifiers. arXiv 2012, arXiv:1207.3790. [Google Scholar]
Rebouças Filho, P.P.; Peixoto, S.A.; da Nóbrega, R.V.M.; Hemanth, D.J.; Medeiros, A.G.; Sangaiah, A.K.; de Albuquerque, V.H.C. Automatic histologically-closer classification of skin lesions. Comput. Med Imaging Graph. 2018, 68, 40–54. [Google Scholar] [CrossRef]
Arafat, M.A.; Chowdhury, A.W.; Hasan, M.K. A simple time domain algorithm for the detection of ventricular fibrillation in electrocardiogram. Signal Image Video Process. 2011, 5, 1–10. [Google Scholar] [CrossRef]
Roopaei, M.; Boostani, R.; Sarvestani, R.R.; Taghavi, M.A.; Azimifar, Z. Chaotic based reconstructed phase space features for detecting ventricular fibrillation. Biomed. Signal Process. Control 2010, 5, 318–327. [Google Scholar] [CrossRef]
Alonso-Atienza, F.; Morgado, E.; Fernandez-Martinez, L.; Garcia-Alberola, A.; Rojo-Alvarez, J.L. Detection of life-threatening arrhythmias using feature selection and support vector machines. IEEE Trans. Biomed. Eng. 2013, 61, 832–840. [Google Scholar] [CrossRef] [PubMed]
Li, Q.; Rajagopalan, C.; Clifford, G.D. Ventricular fibrillation and tachycardia classification using a machine learning approach. IEEE Trans. Biomed. Eng. 2013, 61, 1607–1613. [Google Scholar] [PubMed]
Ibtehaz, N.; Rahman, M.S.; Rahman, M.S. VFPred: A fusion of signal processing and machine learning techniques in detecting ventricular fibrillation from ECG signals. Biomed. Signal Process. Control 2019, 49, 349–359. [Google Scholar] [CrossRef]
Acharya, U.R.; Fujita, H.; Lih, O.S.; Hagiwara, Y.; Tan, J.H.; Adam, M. Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network. Inf. Sci. 2017, 405, 81–90. [Google Scholar] [CrossRef]
Xia, D.; Meng, Q.; Chen, Y.; Zhang, Z. Classification of ventricular tachycardia and fibrillation based on the lempel-ziv complexity and EMD. In Proceedings of the Intelligent Computing in Bioinformatics: 10th International Conference, ICIC 2014, Taiyuan, China, 3–6 August 2014; Springer: Berlin/Heidelberg, Germany, 2014. Proceedings 10. pp. 322–329. [Google Scholar]
Mjahad, A.; Frances-Villora, J.V.; Bataller-Mompean, M.; Rosado-Muñoz, A. Ventricular Fibrillation and Tachycardia Detection Using Features Derived from Topological Data Analysis. Appl. Sci. 2022, 12, 7248. [Google Scholar] [CrossRef]
Kaur, L.; Singh, V. Ventricular fibrillation detection using emprical mode decomposition and approximate entropy. Int. J. Emerg. Technol. Adv. Eng. 2013, 3, 260–268. [Google Scholar]
Xie, H.B.; Zhong-Mei, G.; Liu, H. Classification of ventricular tachycardia and fibrillation using fuzzy similarity-based approximate entropy. Expert Syst. Appl. 2011, 38, 3973–3981. [Google Scholar] [CrossRef]
Acharya, U.R.; Fujita, H.; Oh, S.L.; Raghavendra, U.; Tan, J.H.; Adam, M.; Gertych, A.; Hagiwara, Y. Automated identification of shockable and non-shockable life-threatening ventricular arrhythmias using convolutional neural network. Future Gener. Comput. Syst. 2018, 79, 952–959. [Google Scholar] [CrossRef]
Buscema, P.M.; Grossi, E.; Massini, G.; Breda, M.; Della Torre, F. Computer aided diagnosis for atrial fibrillation based on new artificial adaptive systems. Comput. Methods Programs Biomed. 2020, 191, 105401. [Google Scholar] [CrossRef] [PubMed]
Kumar, M.; Pachori, R.B.; Acharya, U.R. Automated diagnosis of atrial fibrillation ECG signals using entropy features extracted from flexible analytic wavelet transform. Biocybern. Biomed. Eng. 2018, 38, 564–573. [Google Scholar] [CrossRef]
Cheng, P.; Dong, X. Life-threatening ventricular arrhythmia detection with personalized features. IEEE Access 2017, 5, 14195–14203. [Google Scholar] [CrossRef]
Xu, Y.; Wang, D.; Zhang, W.; Ping, P.; Feng, L. Detection of ventricular tachycardia and fibrillation using adaptive variational mode decomposition and boosted-CART classifier. Biomed. Signal Process. Control 2018, 39, 219–229. [Google Scholar] [CrossRef]
Brown, G.; Conway, S.; Ahmad, M.; Adegbie, D.; Patel, N.; Myneni, V.; Alradhawi, M.; Kumar, N.; Obaid, D.R.; Pimenta, D.; et al. Role of artificial intelligence in defibrillators: A narrative review. Open Heart 2022, 9, e001976. [Google Scholar] [CrossRef]
Podbregar, M.; Kovačič, M.; Podbregar-Marš, A.; Brezocnik, M. Predicting defibrillation success by ‘genetic’ programming in patients with out-of-hospital cardiac arrest. Resuscitation 2003, 57, 153–159. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Artificial Neural Network (ANN).

Figure 2. The process of a convolution operation.

Figure 3. Activation functions commonly applied to neural networks: (a) rectified linear unit (ReLU), (b) Sigmoid, and (c) hyperbolic tangent (Tanh).

Figure 4. The process of pooling operation.

Figure 5. PWV distribution of the ECG Normal signal directly processed without the Hilbert transform; PWV distribution of the Normal analytic signal using the Hilbert transform.

Figure 6. Diagram of the processing steps applied in the detection of ventricular fibrillation.

Figure 7. IIR bandpass filter applied to a Normal-type ECG. The original temporal signal is plotted in blue, and the filtered output signal is shown in red. The frequency response of the filter is displayed below.

Figure 8. The rows, from top to bottom, show the original ECG time signal window, TFR (150 × 150), TFR + Ht (150 × 150), TFR + Ht (45 × 150), and TRFI (45 × 150). The columns, from left to right, correspond to the Normal, Other, VT, and VF classes.

Figure 9. Loss curves for the training of the CNN2 model; the training loss is 0.02, and the validation loss is 0.1.

Figure 10. Accuracy curves for the training of the Ht_TFR_CNN2 model; the training accuracy is 100%, and the validation accuracy is 98%.

Figure 11. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the TFR_CNN1 technique (Epochs = 50).

Figure 12. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the Ht_TFR_CNN1 technique (Epochs = 50).

Figure 13. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the Ht_TFR_CNN1 technique (Epochs = 100).

Figure 14. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the Ht_TFR_CNN2 method (Epochs = 100).

Figure 15. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the VGGNet method (Epochs = 6).

Figure 16. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the AlexNet method (Epochs = 6).

Figure 17. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the MobileNet method (Epochs = 6).

Figure 18. Confusion matrix for classifying Normal, Other, VT, and VF classes utilizing the InceptionV3 method (Epochs = 6).

Figure 19. Results achieved for the classification of the VT class during testing.

Figure 20. Results achieved for the classification of the VF class during testing.

Table 1. Details concerning the proposed CNN1 and CNN2 architecture.

Model	CNN1
Layer	Kernel Size	Filter Number	#Parameters
FC1	512	-	16589312
FC2	256	-	131328
Softmax	4	-	1285
Model	CNN2
Layer	Kernel Size	Filter Number	#Parameters
Conv1	3 × 3	32	320
Max Pooling1	4 × 4	-	0
Conv2	3 × 3	64	18496
Max Pooling2	4 × 4	-	0
FC1	128	-	991360
FC2	256	-	33024
Softmax	4	-	1028
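
The parameter counts listed for CNN2 in Table 1 can be reproduced with the usual formulas for convolutional and fully connected layers. The following is a minimal sketch, assuming a single-channel input image, one bias per filter or unit, and a flattened feature vector of 7744 values entering FC1. That last figure is not stated in the table; it is inferred here from the listed FC1 parameter count. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a 2D convolution: weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    """Trainable parameters of a fully connected layer: weights plus one bias per unit."""
    return in_features * units + units


# Assumption: 7744 flattened features reach FC1 (inferred from Table 1, not stated there).
FLATTENED_FEATURES = 7744

cnn2_params = {
    "Conv1": conv2d_params(3, 3, 1, 32),    # assumes a single-channel input image
    "Conv2": conv2d_params(3, 3, 32, 64),   # input channels = Conv1 filter number
    "FC1": dense_params(FLATTENED_FEATURES, 128),
    "FC2": dense_params(128, 256),
    "Softmax": dense_params(256, 4),        # four classes: Normal, Other, VT, VF
}

# Max pooling layers have no trainable parameters, so they are omitted.
for layer, count in cnn2_params.items():
    print(f"{layer}: {count}")
```

Under these assumptions the computed values agree with the #Parameters column for the CNN2 rows of Table 1.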

Table 2. Results achieved for the classification of the Normal class during testing.

Class	Normal
Algorithms	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Algorithms	Normal	Global	VF	VT	Other	Total	Total
Ht_TFR_CNN1 (Epochs = 50)	89.70	98.57	99.53	99.48	97.73	98.76	93.92
Ht_TFR_CNN1 (Epochs = 100)	99.29	98.62	98.88	99.33	98.03	98.91	98.95
Ht_TFR_CNN2 (Epochs = 100)	99.34	98.35	99.59	99.83	99.59	98.89	98.84
TFR_CNN1 (Epochs = 50)	98.70	98.59	99.46	98.73	97.73	98.65	98.64

Table 3. Results achieved for the classification of the Other class during testing.

Class	Other
Algorithms	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Algorithms	Other	Global	VT	Normal	VF	Total	Total
Ht_TFR_CNN1 (Epochs = 50)	97.24	99.41	99.82	99.29	99.65	98.95	98.31
Ht_TFR_CNN1 (Epochs = 100)	97.74	99.62	99.83	99.60	99.58	99.22	98.67
Ht_TFR_CNN2 (Epochs = 100)	96.98	99.68	99.96	99.61	99.79	99.11	98.31
TFR_CNN1 (Epochs = 50)	97.24	99.47	100	99.33	99.73	98.98	98.34

Table 4. Results achieved for the classification of the VT class during testing.

Class	VT
Algorithms	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Algorithms	VT	Global	VF	Other	Normal	Total	Total
Ht_TFR_CNN1 (Epochs = 50)	89.70	99.70	96.71	99.84	99.94	99.00	94.43
Ht_TFR_CNN1 (Epochs = 100)	92.70	99.53	97.78	99.94	99.92	99.06	95.99
Ht_TFR_CNN2 (Epochs = 100)	90.45	99.73	96.92	99.94	99.98	99.09	94.86
TFR_CNN1 (Epochs = 50)	95.84	97.19	98.55	99.84	99.84	97.90	96.51
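
The F Score column in Table 4 appears to be consistent with the harmonic mean of the class sensitivity and the global specificity, and the same relation seems to hold in the other per-class result tables. The sketch below checks this against the four rows of Table 4. The formula is inferred from the tabulated values rather than taken from the metric definitions of the article, and the function name is hypothetical.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two percentages."""
    return 2.0 * a * b / (a + b)


# Rows of Table 4: (technique, VT sensitivity %, global specificity %, reported F score %)
table4_rows = [
    ("Ht_TFR_CNN1 (Epochs = 50)", 89.70, 99.70, 94.43),
    ("Ht_TFR_CNN1 (Epochs = 100)", 92.70, 99.53, 95.99),
    ("Ht_TFR_CNN2 (Epochs = 100)", 90.45, 99.73, 94.86),
    ("TFR_CNN1 (Epochs = 50)", 95.84, 97.19, 96.51),
]

# Difference between the recomputed value and the value printed in the table.
differences = []
for name, sens, spec, reported in table4_rows:
    computed = harmonic_mean(sens, spec)
    differences.append(abs(computed - reported))
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Small differences in the last digit are expected, because the inputs are themselves rounded to two decimals.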

Table 5. Results achieved for the classification of the VF class during testing.

Class	VF
Algorithms	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Algorithms	VF	Global	VT	Other	Normal	Total	Total
Ht_TFR_CNN1 (Epochs = 50)	98.04	98.94	90.96	99.64	99.68	98.77	98.48
Ht_TFR_CNN1 (Epochs = 100)	96.44	99.28	94.01	99.74	99.76	98.75	97.83
Ht_TFR_CNN2 (Epochs = 100)	98.16	99.07	91.56	99.74	99.83	98.91	98.61
TFR_CNN1 (Epochs = 50)	85.88	99.30	96.58	99.64	99.52	96.82	92.10

Table 6. Results obtained for the classification of the Normal class in testing.

Class	Normal
Techniques	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Techniques	Normal	Global	VF	VT	Other	Total	Total
Ht_TFR_CNN1 (Epochs = 100)	99.29	98.62	98.88	99.33	98.03	98.91	98.95
Ht_TFR_CNN2 (Epochs = 100)	99.34	98.35	99.59	99.83	99.59	98.89	98.84
InceptionV3 (Epochs = 6)	77.99	99.65	99.92	39.30	99.32	87.17	87.49
MobileNet (Epochs = 6)	79.42	99.44	99.08	99.36	99.64	88.39	88.30
VGGNet (Epochs = 6)	96.61	98.32	97.97	100	98.59	97.39	97.45
AlexNet (Epochs = 6)	99.43	97.29	98.69	100	95.83	98.45	98.34

Table 7. Results obtained for the classification of the Other class in testing.

Class	Other
Techniques	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Techniques	Other	Global	VT	Normal	VF	Total	Total
Ht_TFR_CNN1 (Epochs = 100)	97.74	99.62	99.83	99.60	99.58	99.22	98.67
Ht_TFR_CNN2 (Epochs = 100)	96.98	99.68	99.96	99.61	99.79	99.11	98.31
InceptionV3 (Epochs = 6)	88.42	99.81	100	99.72	100	96.96	93.77
MobileNet (Epochs = 6)	99.64	85.08	98.41	79.60	97.68	88.21	91.78
VGGNet (Epochs = 6)	98.54	97.57	100	96.74	99.26	97.39	98.05
AlexNet (Epochs = 6)	95.78	99.57	100	99.58	99.40	98.77	97.63

Table 8. Results obtained for the classification of the VT class in testing.

Class	VT
Techniques	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Techniques	VT	Global	VF	Other	Normal	Total	Total
Ht_TFR_CNN1 (Epochs = 100)	92.70	99.53	97.78	99.94	99.92	99.06	95.99
Ht_TFR_CNN2 (Epochs = 100)	90.45	99.73	96.92	99.94	99.98	99.09	94.86
InceptionV3 (Epochs = 6)	98.15	83.55	99.11	99.04	80.18	84.59	90.26
MobileNet (Epochs = 6)	95.53	97.66	98.89	100	99.90	97.49	96.58
VGGNet (Epochs = 6)	90.15	99.15	97.07	99.94	100	98.77	94.43
AlexNet (Epochs = 6)	91.84	99.47	97.54	99.94	100	98.94	95.50

Table 9. Results obtained for the classification of the VF class in testing.

Class	VF
Techniques	Sensitivity (%)	Specificity (%)				Accuracy (%)	F Score (%)
Techniques	VF	Global	VT	Other	Normal	Total	Total
Ht_TFR_CNN1 (Epochs = 100)	96.44	99.28	94.01	99.74	99.76	98.75	97.83
Ht_TFR_CNN2 (Epochs = 100)	98.16	99.07	91.56	99.74	99.83	98.91	98.61
InceptionV3 (Epochs = 6)	77.28	94.90	98.15	89.72	96.86	91.28	85.18
MobileNet (Epochs = 6)	86.97	99.62	97.33	100	99.80	97.01	92.86
VGGNet (Epochs = 6)	93.34	99.25	92.28	100	99.85	98.14	96.20
AlexNet (Epochs = 6)	95.58	99.34	93.42	100	99.84	98.64	97.42

Table 10. Comparison of the proposed CNN architectures with other techniques for detecting the Normal, Other, VT, and VF classes.

Class	VF			VT			Other			Normal			Database
Techniques	Sens (%)	Spe (%)	Acc (%)	Sens (%)	Spe (%)	Acc (%)	Sens (%)	Spe (%)	Acc (%)	Sens (%)	Spe (%)	Acc (%)	Database
This work, Ht_TFR_CNN1 (Epochs = 50)	98.04	98.94	98.77	89.7	99.70	99	97.24	99.41	98.95	89.7	98.57	98.76	MITBIH, AHA
This work, Ht_TFR_CNN1 (Epochs = 100)	96.44	99.28	98.75	92.70	99.53	99.06	97.74	99.62	99.22	99.29	98.62	98.91	MITBIH, AHA
This work, Ht_TFR_CNN2 (Epochs = 100)	98.16	99.07	98.91	90.45	99.73	99.09	96.98	99.68	99.11	99.34	98.35	98.89	MITBIH, AHA
This work, InceptionV3 (Epochs = 6)	77.28	94.9	91.28	98.15	83.55	84.59	88.42	99.81	96.96	77.99	99.65	87.17	MITBIH, AHA
This work, MobileNet (Epochs = 6)	86.97	99.62	97.01	95.53	97.66	97.49	99.64	85.08	88.21	79.42	99.44	88.39	MITBIH, AHA
This work, VGGNet (Epochs = 6)	93.34	99.25	98.14	90.15	99.15	98.77	98.54	97.57	97.39	96.61	98.32	97.39	MITBIH, AHA
This work, AlexNet (Epochs = 6)	95.58	99.34	98.64	91.84	99.47	98.94	95.78	99.57	98.77	99.43	97.29	98.45	MITBIH, AHA
[58] SSVR, TFR	91	97		92.8	98.7		92.3	99.2		96.6	96.3		MITBIH, AHA
[58] BAGG, TFR	95.2	96.4		88.8	99.7		88.6	99.8		96.6	94.1		MITBIH, AHA
[58] I2-RLR and TFR	89.6	96.7		91	98.1		92.5	98.1		94.9	96.4		MITBIH, AHA
[58] ANNC and TFR	92.8	97		91.8	98.7		92.9	99		96.2	96.7		MITBIH, AHA
[66] TCSC algorithm	80.97	98.51	98.14										MITBIH, CUDB
[67] Chaotic based			88.6										MITBIH, CCU
[68] SVM and FS	91.9	97.1	96.8										MITBIH, CUDB
[69] SVM and Genetic algorithm	98.4	98	96.3										CUDB, AHA
[70] SVM and EMD	99.99	98.4	99.19										MITBIH, CUDB
[71] CNN neural network	56.44	98.19	97.88										MITBIH, CUDB
[72] EMD and Lempel-Ziv	98.15	96.01		96.01	98.15								MITBIH, CUDB
[73] TDA	97.07	99.25	98.68	92.72	99.53	99.05	97.43	99.54	99.09	99.05	98.45	98.76	MITBIH, AHA
[73] PDI	84.34	96.77	94.26	82.25	98.53	97.38	92.86	97.15	96.19	93.09	92.14	92.65	MITBIH, AHA
[74] App Entropy and EMD	90.47	91.66	91.17	90.62	91.11	90.8							MITBIH
[75] Approximated entropy	97.98	97.03		97.03	97.98								MITBIH, CUDB

Table 11. Comparison of the proposed CNN architectures with other techniques for detecting ventricular fibrillation and tachycardia.

Class	Shockable (VT+VF)			Database
Technique	Sensitivity (%)	Specificity (%)	Accuracy (%)	Database
This work, Ht_TFR_CNN1	98.53	99.69	99.39	MITBIH, AHA
This work, Ht_TFR_CNN2	99.23	99.74	99.61	MITBIH, AHA
[73] TDA	99.03	99.67	99.51	MITBIH, AHA
[73] PDI	89.63	96.96	95.12	MITBIH, AHA
[76] CNN	95.32	91.04	93.2	MITDB, CUDB, VFDB
[14] VMD with Random Forest	96.54	97.97	97.23	MITBIH, CUDB
[77] RNN			99.72	MITBIH
[78] CNN and IENN	98.6	98.9	98.8	MITBIH, AFDB
[68] FS and SVM	95	99	98.6	MITBIH, CUDB
[79] Personalized features SVM		95.6	95.5	MITBIH, CUDB, VFDB
[16] C4.5 classifier	90.97	97.86	97.02	MITBIH, CUDB
[69] SVM and bootstrap	98.4	98	98.1	MITBIH, AHA, CUDB
[80] Adaptive variational and boosted CART	97.32	98.95	98.29	MITBIH, CUDB
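
Table 11 groups VT and VF into a single shockable class. One way to obtain such figures from a four-class confusion matrix is to merge the VT and VF rows and columns before computing sensitivity, specificity, and accuracy. The sketch below illustrates this with hypothetical counts (not the results of this work). The class ordering, the helper name, and the choice to count VT/VF confusions as correct shockable decisions are illustrative assumptions, not details confirmed by the source.

```python
CLASSES = ["Normal", "Other", "VT", "VF"]

# Hypothetical confusion matrix: rows are true classes, columns are predicted classes.
confusion = [
    [950, 30, 5, 15],    # true Normal
    [20, 470, 2, 8],     # true Other
    [3, 1, 180, 16],     # true VT
    [10, 4, 12, 274],    # true VF
]


def merged_binary_metrics(cm, positive_indices):
    """Collapse a multi-class confusion matrix into positive vs. rest and score it."""
    n = len(cm)
    pos = set(positive_indices)
    tp = sum(cm[i][j] for i in range(n) for j in range(n) if i in pos and j in pos)
    fn = sum(cm[i][j] for i in range(n) for j in range(n) if i in pos and j not in pos)
    fp = sum(cm[i][j] for i in range(n) for j in range(n) if i not in pos and j in pos)
    tn = sum(cm[i][j] for i in range(n) for j in range(n) if i not in pos and j not in pos)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Shockable rhythms are VT and VF; Normal and Other form the non-shockable group.
shockable = [CLASSES.index("VT"), CLASSES.index("VF")]
metrics = merged_binary_metrics(confusion, shockable)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because a VT segment predicted as VF still counts as a shockable decision in this scheme, the merged sensitivity can be higher than the per-class VT or VF sensitivity.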

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
