Article

Comparison of Machine Learning Models in Nonlinear and Stochastic Signal Classification

by Elzbieta Olejarczyk 1,2,* and Carlo Massaroni 3,4
1 Nalecz Institute of Biocybernetics and Biomedical Engineering, Polish Academy of Sciences, 02-109 Warsaw, Poland
2 Faculty of Electrical Engineering, Automatics, Computer Science, and Biomedical Engineering, AGH University of Krakow, 30-059 Krakow, Poland
3 Departmental Faculty of Engineering, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
4 Fondazione Policlinico Universitario Campus Bio-Medico di Roma, 00128 Rome, Italy
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11226; https://doi.org/10.3390/app152011226
Submission received: 23 September 2025 / Revised: 14 October 2025 / Accepted: 17 October 2025 / Published: 20 October 2025
(This article belongs to the Special Issue New Advances in Electrocardiogram (ECG) Signal Processing)

Featured Application

The algorithm, based on an optimized ensemble RUSBoosted Trees classifier and a set of statistical and nonlinear measures, may be helpful in single-channel wearable ECG devices to detect artifacts occurring in real-time ECG recordings.

Abstract

This study aims to compare different classifiers in the context of distinguishing two classes of signals: nonlinear electrocardiography (ECG) signals and stochastic artifacts occurring in ECG signals. The ECG signals from a single-lead wearable Movesense device were analyzed with a set of eight features: variance (VAR), three fractal dimension measures (Higuchi fractal dimension (HFD), Katz fractal dimension (KFD), and Detrended Fluctuation Analysis (DFA)), and four entropy measures (approximate entropy (ApEn), sample entropy (SampEn), and multiscale entropy (MSE) for scales 1 and 2). The minimum-redundancy maximum-relevance algorithm was applied for evaluation of feature importance. A broad spectrum of machine learning models was considered for classification. The proposed approach allowed for a comparison of classifiers and features, as well as providing broader insight into the characteristics of the signals themselves. The most important features for classification were VAR, DFA, ApEn, and HFD. The best performance among 34 classifiers was obtained using an optimized RUSBoosted Trees ensemble classifier (sensitivity, specificity, positive predictive value, and negative predictive value were 99.8%, 73.7%, 99.8%, and 74.3%, respectively). The classification accuracy on the Movesense recordings was very high (99.6%). Moreover, the multifractality of ECG during sleep was observed in the relationship between SampEn (or ApEn) and MSE.

1. Introduction

Electrocardiography (ECG) is a fundamental tool in cardiac diagnostics, but the interpretation of ECG signals remains a challenging task due to frequent contamination by artifacts such as baseline wander, motion artifacts, and muscle noise [1].
Traditional time- and frequency-domain methods (e.g., bandpass filters, Wiener and Kalman filters, or wavelet decompositions) have been widely used to suppress noise [2,3,4,5,6,7]. Moreover, adaptive and hybrid techniques, such as empirical mode decomposition (EMD), ensemble EMD, and empirical wavelet transform (EWT), have been proposed [8,9,10,11,12]. However, these linear techniques often fail when the noise overlaps in frequency with the signal or when the ECG exhibits nonlinear dynamics [2].
Given these challenges, nonlinear analysis methods have emerged as powerful alternatives for both noise reduction and feature extraction. Measures such as Higuchi fractal dimension (HFD), Katz fractal dimension (KFD), approximate entropy (ApEn), sample entropy (SampEn), and Detrended Fluctuation Analysis (DFA) allow for the quantification of complexity, irregularity, and self-similarity in ECG signals. Recent studies have demonstrated that such nonlinear measures are not only effective in distinguishing between normal and diseased cardiac dynamics, but also in identifying and filtering out artifacts in real-time applications. For instance, Sharanya and Arjunan (2023) used a combination of fractal dimension techniques (HFD and KFD) and entropy-based features to effectively differentiate diabetic patients with cardiac autonomic neuropathy from healthy controls using only short ECG segments [13]. Similarly, the work by Chen et al. (2024) introduced a fast sample entropy algorithm optimized for wearable ECG devices, showing that SampEn can be implemented in real time with reduced computational cost and high discriminative power for atrial fibrillation detection [14]. Additionally, Olejarczyk et al. (2024) used nonlinear and statistical features to identify noisy ECG segments recorded with Movesense wearable devices [15].
In parallel, deep learning, especially convolutional neural networks (CNNs) and transformer-based architectures, has revolutionized automated ECG classification by learning complex spatiotemporal patterns from raw or preprocessed signals. Hybrid models like DeepECG-Net have achieved high accuracy (~98.2%) while maintaining some robustness to noise [16]. Additionally, wavelet-transformed inputs combined with Swin Transformers have enhanced arrhythmia classification under artifact-heavy conditions [14].
Despite their superior performance, deep learning (DL) methods face several limitations in real-time, resource-constrained environments: (1) high computational cost: DL models often require GPUs and are not feasible on ultra-low-power microcontrollers without significant model compression; (2) lack of interpretability: DL models are largely black-boxes and require post hoc explainability tools (e.g., SHAP and saliency maps), which may not yield physiologically meaningful insights; (3) dependence on large datasets: DL requires thousands of labeled examples, which may not be available for rare cardiac conditions or personal adaptations; (4) sensitivity to artifacts: unless explicitly trained with corrupted data, deep models may misclassify noisy signals; (5) DL architectures and hyper-parameters often vary widely between studies, making reproducibility and regulatory approval more challenging.
By contrast, nonlinear features offer several advantages: (1) they are lightweight in terms of computation and memory footprint [14,17] and interpretable in terms of physiological dynamics [18,19]; (2) they can be computed in real time on edge devices without cloud support [14,17]; (3) they do not require large labeled datasets for training and can generalize better in small-sample settings [20]; (4) some features (e.g., SampEn and DFA) are robust to mild artifacts and nonstationarity [21]; (5) many nonlinear measures are well-established and validated across decades of the biomedical signal processing literature, with standardized parameter settings and performance benchmarks [13,15,18,19,22].
Thus, despite the growing dominance of deep learning, nonlinear features remain highly relevant, particularly in wearable applications where power efficiency, real-time processing, and explainability are critical. Deep learning and nonlinear analysis can be seen as complementary approaches, i.e., nonlinear features can act as input to lightweight classifiers or as preprocessing steps that enhance signal quality prior to deep learning.
Multiple comparative studies confirm the value of nonlinear features in ECG quality assessment and disease classification. For example, a study by Abdelrazik et al. (2025) demonstrated that nonlinear features, including HFD and KFD, can serve as effective inputs to lightweight machine learning models (e.g., random forest and SVM) for wearable arrhythmia detection [22]. Similarly, Ribeiro et al. (2024) used discrete wavelet transforms and nonlinear features for multi-class cardiovascular disease classification from the PTB ECG database, achieving high accuracy [23]. Noitz et al. (2024) showed that machine learning models can still extract meaningful features in ECG data corrupted by synthetic artifacts [24].
During the last ten years, a tendency to consider a large number of features and classifiers to find the optimal set for assessing the quality of the ECG signal has been observed. The Web of Science database contains 294 papers published from January 2014 to the present related to the classification of ECG artifacts. However, only a few of these studies used nonlinear measures, even though the ECG signal, unlike artifacts, is nonlinear. In most studies, the authors analyze heart rate variability (HRV), derived from the ECG signal, not the signal itself, which significantly limits the information contained therein. Previous studies have compared the performance of several varieties of two standard classifiers—random forest and support vector machine (SVM)—using as many as 27 linear and nonlinear features, including approximate entropy, permutation entropy, and Lempel–Ziv complexity [25]. Considering that the nonlinear features significantly improved the classification performance, another study compared three classifiers, standard SVM, least-squares SVM, and long short-term memory (LSTM), using a combination of six features: approximate entropy, sample entropy, fuzzy measure entropy, Hurst exponent, kurtosis, and power spectral density [26]. The results indicated that the performance of LSTM is higher than the performance of the other two SVM classifiers. Recently, some investigators reported that applying a set of as many as seventy-seven time- and frequency-domain features of HRV, including nonlinear measures, allows for the classification of different patient conditions such as normal sinus rhythm, sudden cardiac death, coronary artery disease, congestive heart failure, and atrial fibrillation [27].
This study aims to comprehensively compare machine learning models in terms of performance, scalability, interpretability, and utility in biomedical signal analysis, using ECG signals and the artifacts occurring in these signals. We performed an extensive comparative study of thirty-four classifiers using a set of the eight most promising nonlinear features, including measures of fractal dimension that have not previously been considered as indices of ECG signal quality—Higuchi fractal dimension and Katz fractal dimension—as well as multiscale entropy.
Importantly, much of the literature focuses on heart rate variability (HRV), derived from ECG, rather than the ECG signal itself [27,28,29,30,31,32]. This can limit the richness of information, especially when assessing waveform morphology changes such as ST segment shifts. In this study, we compared 34 classifiers using a concise set of nonlinear features applied directly to ECG signals rather than HRV. The importance of matching specific features and classifiers to context-specific goals was emphasized.

2. Materials and Methods

2.1. ECG Registration and Preprocessing

A prospective cohort study was performed on six healthy volunteers (aged 46 ± 17 years; three women and three men; all without musculoskeletal, cardiovascular, or respiratory diseases; all European) at the Università Campus Bio-Medico di Roma (UCBM). The recruitment was accomplished in adherence to the Declaration of Helsinki and after Ethics Committee approval from the UCBM institution (Prot. PAR 04.22 OSS) [33]. All volunteers provided written informed consent.
About 8 h of ECG signals during sleep were recorded using a Movesense ECG device. The sampling frequency was 256 Hz. ECG signals were filtered using a notch filter and a band-pass Butterworth filter from 0.5 Hz to 70 Hz. The signal was then normalized in 2 s windows to guarantee maximum R peak amplitude in the artifact-free ECG segments throughout the whole recording. All artifacts were manually annotated in EDFBrowser and exported to ASCII files for further analysis.
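The preprocessing described above can be sketched with standard Signal Processing Toolbox calls, as shown below. This is a minimal illustration rather than the exact processing code: the filter orders, the width of the stop band used here in place of the notch filter, the per-window normalization rule, and all variable names are assumptions.
% Minimal preprocessing sketch (assumed filter orders and variable names).
fs = 256;                                            % sampling frequency [Hz]
ecg = randn(8*3600*fs, 1);                           % placeholder for about 8 h of raw ECG
[bN, aN] = butter(2, [48 52]/(fs/2), 'stop');        % band-stop around 50 Hz mains (assumed)
[bB, aB] = butter(4, [0.5 70]/(fs/2), 'bandpass');   % 0.5-70 Hz band-pass, as in the text
ecgF = filtfilt(bB, aB, filtfilt(bN, aN, ecg));      % zero-phase filtering
win  = 2*fs;                                         % 2 s normalization windows
nWin = floor(numel(ecgF)/win);
for w = 1:nWin                                       % scale each window to unit peak amplitude
    idx = (w-1)*win + (1:win);
    ecgF(idx) = ecgF(idx) / max(abs(ecgF(idx)));
end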

2.2. ECG Analysis

The ECG analysis was performed using eight measures, including variance and the following seven nonlinear measures: Higuchi fractal dimension, Katz fractal dimension, Detrended Fluctuation Analysis, sample entropy, approximate entropy, and multiscale entropy for scales 1 and 2 (MSE1 and MSE2). All measures were calculated in 4 s windows. Outlier detection was then performed: of the 217,246 points, 4 points whose values deviated from the mean by much more than one standard deviation were rejected from further analysis.
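For illustration, the windowed feature extraction can be organized as in the sketch below. The function names refer to the illustrative implementations given in the following subsections, the filtered signal ecgF comes from the preprocessing sketch above, and the outlier-screening threshold is an arbitrary placeholder; none of these reflect the authors' actual code.
% Sketch of the 4 s windowed feature extraction (illustrative names and threshold).
fs   = 256;
wlen = 4*fs;                                 % 1024 samples per 4 s window
nWin = floor(numel(ecgF)/wlen);
feat = zeros(nWin, 8);                       % VAR, HFD, KFD, DFA, ApEn, SampEn, MSE1, MSE2
for w = 1:nWin
    x   = ecgF((w-1)*wlen + (1:wlen));
    mse = multiscale_entropy(x, 2, 2, 0.2);  % scales 1 and 2
    feat(w, :) = [var(x), higuchi_fd(x, 8), katz_fd(x), dfa_alpha(x, 1:10), ...
                  approx_entropy(x, 2, 0.2), sample_entropy(x, 2, 0.2), mse(1), mse(2)];
end
z    = (feat - mean(feat)) ./ std(feat);     % standardized feature values
keep = all(abs(z) < 10, 2);                  % outlier screening (placeholder threshold)
feat = feat(keep, :);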

2.2.1. Higuchi Fractal Dimension (HFD)

Higuchi fractal dimension (HFD) is a measure of signal complexity [34]. The more complex the curve (from line to white noise), the greater the HFD value. HFD is 1 for lines and 2 for white noise. From a given time series, X(i), i = 1,…, N, where N is the total number of samples, k new time series are constructed:
X_m^k:\quad X(m),\, X(m+k),\, X(m+2k),\, \ldots,\, X\!\left(m + \operatorname{int}\!\left(\frac{N-m}{k}\right) k\right)
where k is the time interval; m is the initial time in the range from 1 to k; and int(r) is the integer part of a real number r.
The length of each of the k time series is calculated as a normalized sum of the absolute value of difference between a pair of samples distant k starting from sample m:
L_m(k) = \frac{1}{k}\left[\left(\sum_{i=1}^{\operatorname{int}\left(\frac{N-m}{k}\right)} \left| X(m+ik) - X\big(m+(i-1)k\big) \right|\right) \frac{N-1}{\operatorname{int}\!\left(\frac{N-m}{k}\right) k}\right]
The length of curve L(k) is calculated as the average Lm(k) for m = 1,…, k.
HFD is calculated as the coefficient of linear regression, which relates L(k) to the inverse of the k parameter as follows:
\ln L(k) \sim \mathrm{HFD} \cdot \ln\!\left(\frac{1}{k}\right)
The k parameter is the only parameter of the HFD algorithm. In this study, an optimal k value was set to 8.
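A compact MATLAB sketch of the HFD computation, following the definitions above, is given below. The function name and the vectorized indexing are illustrative; only kmax (set to 8 in this study) is a real parameter of the method.
function hfd = higuchi_fd(x, kmax)
% Higuchi fractal dimension (sketch following the equations above; kmax = 8 in this study).
x = x(:);
N = numel(x);
L = zeros(kmax, 1);
for k = 1:kmax
    Lm = zeros(k, 1);
    for m = 1:k
        idx  = m:k:N;                           % X(m), X(m+k), X(m+2k), ...
        nSeg = numel(idx) - 1;                  % int((N-m)/k)
        Lm(m) = sum(abs(diff(x(idx)))) * (N - 1) / (nSeg * k) / k;
    end
    L(k) = mean(Lm);                            % average over m = 1, ..., k
end
p = polyfit(log(1./(1:kmax)'), log(L), 1);      % ln L(k) ~ HFD * ln(1/k)
hfd = p(1);
end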

2.2.2. Katz Fractal Dimension (KFD)

Katz fractal dimension (KFD) [35] is defined as outlined below:
\mathrm{KFD} = \frac{\log(L)}{\log(d)}
where L is a sum of Euclidean distances between successive points; and d is the diameter estimated as the maximum distance between the first point and any other sequence point.
To avoid the dependence of KFD on the measurement units, an average distance between successive points a is used as a general unit to normalize the distances L and d:
\mathrm{KFD} = \frac{\log(L/a)}{\log(d/a)}
In contrast to HFD, KFD does not require any parameter.
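The KFD computation can be sketched as follows. Treating the waveform as a planar curve with unit spacing on the time axis is an assumption of this sketch, and the function name is illustrative.
function kfd = katz_fd(x)
% Katz fractal dimension (sketch following the normalized formula above).
x = x(:);
n = numel(x) - 1;                                    % number of steps along the curve
L = sum(sqrt(1 + diff(x).^2));                       % total curve length (unit time step)
a = L / n;                                           % average distance between successive points
d = max(sqrt((1:n)'.^2 + (x(2:end) - x(1)).^2));     % diameter: farthest point from the first one
kfd = log(L/a) / log(d/a);
end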

2.2.3. Detrended Fluctuation Analysis (DFA)

Detrended Fluctuation Analysis (DFA) is a modification of root-mean-square analysis of random walks applied to nonstationary signals [36].
First, a cumulative sum of a given time series X(i) is calculated as follows:
y(k) = \sum_{i=1}^{k} \left[ X(i) - X_{\mathrm{mean}} \right]
where Xmean is the average value of the entire time series.
Next, the integrated time series y(k) is divided into segments of equal length n, and the least-squares line yn(k) is fitted to the data in each segment.
The root-mean-square fluctuation in an integrated and detrended time series F(n) in the function of window size n is given by the following:
F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[ y(k) - y_n(k) \right]^2}
The scaling exponent α is defined as the slope of a least-squares regression line, which relates log(F(n)) to log(n) for n from 1 to 10.
\ln F(n) \sim \alpha \cdot \ln(n)
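A sketch of the DFA scaling exponent is given below. In this sketch, window sizes smaller than four samples are skipped, because a least-squares line fitted to one or two points leaves no fluctuation to measure; apart from this restriction the scale range follows the text, and the function name is illustrative.
function alpha = dfa_alpha(x, scales)
% Detrended Fluctuation Analysis scaling exponent (sketch following the equations above).
x = x(:);
y = cumsum(x - mean(x));                    % integrated, mean-removed series
scales = scales(scales >= 4);               % a linear detrend needs more than two points
F = zeros(numel(scales), 1);
for s = 1:numel(scales)
    n    = scales(s);
    nSeg = floor(numel(y)/n);
    res  = zeros(nSeg, 1);
    for j = 1:nSeg
        seg = y((j-1)*n + (1:n));
        t   = (1:n)';
        p   = polyfit(t, seg, 1);           % least-squares line y_n(k) in each segment
        res(j) = mean((seg - polyval(p, t)).^2);
    end
    F(s) = sqrt(mean(res));                 % root-mean-square fluctuation F(n)
end
p = polyfit(log(scales(:)), log(F), 1);     % ln F(n) ~ alpha * ln(n)
alpha = p(1);
end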

2.2.4. Approximate Entropy (ApEn)

Approximate entropy (ApEn) is a measure of the irregularity of the signal [37].
ApEn is defined as follows:
\mathrm{ApEn}(m, r, N) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r) \;-\; \frac{1}{N-m} \sum_{i=1}^{N-m} \ln C_i^{m+1}(r)
where
C_i^m(r) = \frac{1}{N-m+1} \sum_{j=1}^{N-m+1} \Theta\!\left( r - d\big(X(i), X(j)\big) \right)
is the correlation integral with the Heaviside step function Θ; d(X(i), X(j)) is the distance between the ith and jth points of the time series of length N; m is the embedding dimension, i.e., the length of the sub-sequences to be compared (set to 2); and r is the tolerance level, i.e., the threshold radius distance (set to 0.2).
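A sketch of ApEn with the settings reported above (m = 2, r = 0.2) is given below. Interpreting r as a fraction of the standard deviation of the analyzed window, and the use of the Chebyshev (maximum) distance, are common conventions assumed here; the function name is illustrative.
function ae = approx_entropy(x, m, r)
% Approximate entropy (sketch following the definition above; m = 2, r = 0.2 in this study).
x = x(:);
N = numel(x);
r = r * std(x);                                   % tolerance relative to the window's SD (assumption)
phi = zeros(1, 2);
for k = 0:1                                       % template lengths m and m + 1
    mm = m + k;
    M  = N - mm + 1;                              % number of template vectors
    tmpl = zeros(M, mm);
    for i = 1:mm
        tmpl(:, i) = x(i:i+M-1);                  % embedded sub-sequences of length mm
    end
    C = zeros(M, 1);
    for i = 1:M
        d = max(abs(tmpl - tmpl(i, :)), [], 2);   % Chebyshev distance to all templates
        C(i) = sum(d <= r) / M;                   % correlation integral (self-match included)
    end
    phi(k+1) = mean(log(C));
end
ae = phi(1) - phi(2);                             % ApEn = Phi^m - Phi^(m+1)
end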

2.2.5. Sample Entropy (SampEn)

Sample entropy (SampEn) is a modification of ApEn that excludes self-matches, i.e., comparisons with itself [38].
Thus, the correlation integral has the following form:
C_i^m(r) = \frac{1}{N-m+1} \sum_{\substack{j=1 \\ j \neq i}}^{N-m+1} \Theta\!\left( r - d\big(X(i), X(j)\big) \right)
SampEn is defined as
\mathrm{SampEn}(m, r, N) = \ln \frac{\Phi^m(r)}{\Phi^{m+1}(r)}
where
\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} C_i^m(r)
The advantage of SampEn over ApEn is its independence of data length. However, SampEn is only an approximate measure of information because it directly uses correlation integrals.
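The sketch below computes SampEn in the equivalent match-counting form of Richman and Moorman, i.e., the negative logarithm of the ratio of the numbers of template matches of length m + 1 and m, with self-matches excluded; the tolerance convention and the function name follow the ApEn sketch above and are assumptions.
function se = sample_entropy(x, m, r)
% Sample entropy (sketch; self-matches are excluded, no logarithm inside the average).
x = x(:);
N = numel(x);
r = r * std(x);
M = N - m;                                     % number of template vectors of length m + 1
tm  = zeros(M, m);                             % templates of length m
tm1 = zeros(M, m + 1);                         % templates of length m + 1
for i = 1:m+1
    if i <= m, tm(:, i) = x(i:i+M-1); end
    tm1(:, i) = x(i:i+M-1);
end
B = 0; A = 0;                                  % match counts for lengths m and m + 1
for i = 1:M-1
    dB = max(abs(tm(i+1:end, :)  - tm(i, :)),  [], 2);
    dA = max(abs(tm1(i+1:end, :) - tm1(i, :)), [], 2);
    B = B + sum(dB <= r);
    A = A + sum(dA <= r);
end
se = -log(A / B);                              % SampEn = -ln(A/B) = ln(B/A)
end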

2.2.6. Multiscale Entropy (MSE)

The multiscale entropy (MSE) was introduced by [39].
A given time series, X(i), i = 1,…, N, where N is the total number of samples, is divided into nonoverlapping windows of length τ. Then, the data inside each window are averaged. Next, the coarse-grained time series are constructed for every scale factor τ according to the following equation:
y^{(\tau)}(j) = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} X(i), \qquad 1 \le j \le N/\tau
The length of each coarse-grained time series is equal to the size of the original time series divided by the scale factor. SampEn is calculated for each scale factor. MSE is the relationship between SampEn and τ.
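The MSE computation reduces to coarse-graining followed by SampEn at each scale, as sketched below; the function reuses the sample_entropy sketch from the previous subsection, and its name and argument order are illustrative.
function mse = multiscale_entropy(x, maxScale, m, r)
% Multiscale entropy (sketch): SampEn of the coarse-grained series at each scale factor tau.
x = x(:);
mse = zeros(maxScale, 1);
for tau = 1:maxScale
    nSeg = floor(numel(x)/tau);
    y = mean(reshape(x(1:nSeg*tau), tau, nSeg), 1)';   % average within nonoverlapping windows of length tau
    mse(tau) = sample_entropy(y, m, r);
end
end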

2.3. Feature Selection and Classification

The feature selection and classification were performed using the Classification Learner App implemented in the Statistics and Machine Learning Toolbox in Matlab R2023a.

2.3.1. Feature Selection

The minimum-redundancy maximum-relevance (MRMR) algorithm was applied for feature selection [40]. The MRMR method, based on mutual information of pairs of features and mutual information of a feature and the response, allows for minimizing the redundancy of a feature set and maximizing its relevance to the response.
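The same ranking can be reproduced outside the Classification Learner App with fscmrmr, the MRMR implementation in the Statistics and Machine Learning Toolbox; in the sketch below, feat and classes denote the feature matrix and the expert labels assembled earlier, and the names are illustrative.
% MRMR feature ranking (sketch; feat and classes are illustrative variable names).
featureNames = {'VAR', 'HFD', 'KFD', 'DFA', 'ApEn', 'SampEn', 'MSE1', 'MSE2'};
[idx, scores] = fscmrmr(feat, classes);                 % indices sorted by importance, plus scores
disp(table(featureNames(idx)', scores(idx)', ...
    'VariableNames', {'Feature', 'MRMRScore'}));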

2.3.2. Feature Classification

The binary classification of ECG segments with and without artifacts was performed separately for 34 classifiers to choose the best classifier. Several groups of classifiers were considered, such as Tree, Discriminant, Logistic Regression, Support Vector Machine (SVM), Naïve Bayes, k-Nearest Neighbors (k-NN), Neural Network, and Ensemble Trees. A stratified 5-fold cross-validation was performed before training all the classification models to avoid overfitting.
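As an example of how one of these models can be trained outside the App, the sketch below fits a cross-validated RUSBoosted Trees ensemble with fitcensemble; the hyper-parameter values are those reported for the optimized model in Table 1, and the variable names are illustrative.
% Stratified 5-fold cross-validated RUSBoosted Trees ensemble (sketch).
cvp = cvpartition(classes, 'KFold', 5);          % stratified partition over the two classes
lrn = templateTree('MaxNumSplits', 139269);      % optimized value from Table 1
mdl = fitcensemble(feat, classes, ...
    'Method', 'RUSBoost', ...
    'NumLearningCycles', 325, ...
    'LearnRate', 0.50351, ...
    'Learners', lrn, ...
    'CVPartition', cvp);
pred = kfoldPredict(mdl);                        % out-of-fold class predictions
CM   = confusionmat(classes, pred);              % pooled confusion matrix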

2.3.3. Hyper-Parameter Optimization

Every group of classifiers was optimized to find the best set of hyper-parameters for a given model. Table 1 provides the ranges of hyper-parameters used for training individual models. The values obtained for the optimized classifiers are also reported.

2.3.4. Classification Performance

The classification performance was evaluated for all classifiers using five metrics: sensitivity, specificity, precision (PPV), negative predictive value (NPV), and detection accuracy estimated according to the following formulas:
\mathrm{sensitivity} = \frac{TP}{TP + FN}
\mathrm{specificity} = \frac{TN}{TN + FP}
\mathrm{precision} = \frac{TP}{TP + FP}
\mathrm{NPV} = \frac{TN}{TN + FN}
\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
where TP—artifacts marked manually and identified automatically; TN—no artifacts marked manually and not identified automatically; FP—no artifacts marked manually but identified automatically; FN—artifacts marked manually but not identified automatically.
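For completeness, the five metrics can be computed from the confusion-matrix counts as in the short helper below (the function name and the struct output are illustrative).
function metrics = performance_metrics(TP, FP, TN, FN)
% Five classification metrics from confusion-matrix counts, following the formulas above.
metrics.sensitivity = TP / (TP + FN);
metrics.specificity = TN / (TN + FP);
metrics.precision   = TP / (TP + FP);                 % positive predictive value (PPV)
metrics.NPV         = TN / (TN + FN);                 % negative predictive value
metrics.accuracy    = (TP + TN) / (TP + TN + FP + FN);
end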

3. Results

3.1. Distributions of Nonlinear ECG Measures in Healthy Persons

ECG segments with artifacts are characterized by higher values of variance and entropy (SampEn and ApEn) and lower values of KFD than those without artifacts. Interestingly, the HFD of segments with artifacts can take lower and higher values than segments without artifacts. In comparison, MSE1 and MSE2 do not allow for differentiation of the two classes. The ranges covering 99.5% of values typical for artifact-free ECG segments are provided in Table 2. Both non-standardized and standardized values are reported.

3.2. Feature Importance Scores

The most important feature is variance (score = 0.0267), followed by two measures with the same importance scores of 0.0210 (DFA and ApEn). Next, HFD contributes a slightly higher score than MSE1 (0.0166 vs. 0.0160). SampEn, KFD, and MSE2 have much lower scores than other features (0.0090, 0.0050, and 0.0016, respectively).

3.3. Choice of the Best Classifier

The training results, including accuracy, total cost, error rate, prediction speed, training time, and model size, are summarized in Table 3. The performance of thirty-four classifiers evaluated using five metrics is provided in Table 4. The average classification accuracy on the Movesense ECG recordings was 99.4%. Among all classifiers, an optimized ensemble RUSBoosted Trees classifier offers the best performance (c.f., Table 4) at a relatively low total cost (c.f., Table 3). However, the weighted k-NN classifier allows us to find the highest number of segments with artifacts (TP). The ROC curves for both classifiers are shown in Figure 1. The area under the curve (AUC), which measures the discriminatory power of a classifier, is larger for the ensemble RUSBoosted Trees classifier than for the weighted k-NN classifier. Both classifiers, optimized weighted k-NN and optimized ensemble RUSBoosted Trees, are characterized by a short training time (14 s and 130 s, respectively), which is much lower than the time needed for training of Naïve Bayes, SVM, or Neural Network classifiers (c.f., Table 3). The maximal time was over an hour for non-optimized Kernel Naïve Bayes and Cubic SVM. The optimization of the hyper-parameters allowed for a significant reduction in training time. However, training an optimized Kernel Naïve Bayes or Neural Network still requires much more time than other classifiers, which is reflected in up to 348 times (!) greater energy consumption (see Appendix A).
The detailed procedure for comparing performance metrics between each pair of optimized classifiers is provided in Appendix B.

3.4. Comparison of Classifiers: The Ensemble RUSBoosted Trees and the Weighted k-NN Classifier

In Figure 2, the scatter plots of variance in relation to the selected best features (DFA, ApEn, and HFD) are compared for two non-optimized classifiers: the ensemble RUSBoosted Trees and the weighted k-NN.
In the scatter plots of the weighted k-NN classifier, the space dominated by feature values for segments with artifacts is “contaminated” by blue points, i.e., values belonging to segments that the expert has classified as segments without artifacts. At the same time, the ensemble RUSBoosted Trees classifier can eliminate points not located in a space dominated by points belonging to the same class. However, the optimization of the ensemble RUSBoosted Trees classifier made the distributions of both classifiers similar.

3.5. Relationships Between Variance and Nonlinear Measures

The points in the scatter plots of variance against each of the nonlinear measures for ECG segments without artifacts occupy an area in the shape of an elongated blue disk (c.f., Figure 3), indicating that these feature pairs are correlated. Regardless of the value of the nonlinear feature, the variance remains at the same level (horizontal disks), except for KFD, for which we observe a slight increase in variance with the rise in KFD (c.f., Figure 3B). For this reason, applying a variance threshold above 0.02 allows for the elimination of most artifacts, which are observed mainly in the scatter plots of variance as a function of DFA (c.f., Figure 3C) and MSE, independently of the scale (c.f., MSE1 in Figure 3F and MSE2 in Figure 3G). An additional threshold for HFD greater than 1.26 is needed in the case of variance and HFD (Figure 3A). In the relationships between the variance and entropy measures, thresholds for SampEn, ApEn, MSE1, and MSE2 are required above the following values: 1.84, 2.10, 1.4, and 2.6, respectively.

3.6. Relationships Between Entropy and Fractal Dimension Measures

Surprisingly, we do not observe any correlation between ApEn and individual fractal dimension measures. Artifacts have higher entropy and lower DFA than a low-noise ECG signal (c.f., Figure 4C). Unlike DFA, the other two measures of fractal dimension (mainly HFD) can take both higher and lower values for artifacts than for the artifact-free ECG signal (c.f., Figure 4A,B).

3.7. Relationships Between Entropy Measures

As expected, strong correlations between SampEn and ApEn (c.f., Figure 5A) and between MSE1 and MSE2 can be observed (c.f., Figure 5B). Applying a threshold to higher entropy values, especially to SampEn (above 1.84) and ApEn (above 2.10), allows us to eliminate a large number of artifacts. We observe an interesting pattern in the scatter plots of SampEn (or ApEn) in relation to MSE (c.f., Figure 5C,D). In the artifact-free ECG-signal-value area, three subspaces are observed. This pattern is better visible in MSE2 than in MSE1 (c.f., Figure 5D).

4. Discussion

We performed an extensive comparative study of as many as thirty-four classifiers using a set of the eight most promising nonlinear features, including measures of fractal dimension that have not been considered previously as indices of ECG signal quality, i.e., Higuchi fractal dimension and Katz fractal dimension, as well as multiscale entropy.
The application of other nonlinear methods, such as Correlation Dimension (CD) or Lyapunov exponents (LEs), is limited because of their high computational load, sensitivity to noise, and dependence on long signals. In contrast, HFD and KFD calculate complexity directly from the time series without phase-space embedding, ApEn and SampEn use fixed template lengths and tolerance windows, and DFA uses a detrending and scaling procedure, which makes them much faster than CD or LE. Thus, the proposed methods (HFD, KFD, DFA, ApEn, and SampEn) are faster, more robust, and better suited to real-time or resource-constrained environments, like wearable ECG devices. In this study, ECG segments with and without artifacts were classified using a set of nonlinear features of the ECG signal itself, not HRV derived from the ECG, as used in most previous studies [27,28,29,30,31,32].
Moreover, unlike other studies, we do not indiscriminately use many features and classifiers [25,27]. Instead, we discuss the role of individual features and classifiers in a specific context and provide a physiological interpretation of those that yield the best results.
The MRMR method [40] was used to evaluate the importance of individual features for classification. Unlike other methods, this method does not depend on the classifier but only on the class characteristics. This allowed for more effective classification and the discovery of interesting relationships between features. The variance was selected by MRMR as the best feature to characterize the artifacts because of their high variability compared to the variability of artifact-free ECG signals. However, the classification performance improved when nonlinear measures were included. Measures of fractal dimension and entropy are beneficial in the analysis of the ECG signal due to its nonlinear nature. These measures also allowed for differentiation between an artifact-free ECG signal and stochastic noise. Entropy is a measure of the degree of disorder of a system; thus, the presence of artifacts in the ECG causes an increase in signal entropy. Meanwhile, the fractal dimension depends on the signal complexity. HFD takes values from one for deterministic curves (line and sinusoid) to two for white noise. Previous studies have shown that HFD allows for the differentiation of movement and muscle artifacts [15]: HFD is lower than 1.15 for movement artifacts, while it is greater than 1.26 for muscle artifacts. This is consistent with the low complexity of movement artifacts causing a decrease in HFD, whereas the high complexity of muscle artifacts leads to an increase in the HFD of the signal. HFD is also a much simpler way of eliminating muscle artifacts than the shifted rank-1 reconstruction proposed by other authors [41].
Although the sensitivity and precision (PPV) values for all classifiers were over 99%, specificity and NPV depended on the choice of the classifier. The best performance was obtained using an optimized ensemble RUSBoosted Trees classifier, with a specificity and NPV of 73.7% and 74.3%, respectively. The highest NPV value of 83.5%, corresponding to identifying the largest number of segments with artifacts (TP), was obtained using an optimized weighted k-NN classifier. However, this came at the expense of a lower specificity of 64.3%, which was related to the rejection of a larger number of segments without artifacts.
A characteristic pattern with three subspaces distinguishable in the area of the artifact-free ECG-signal values, observed in the scatter plots of SampEn (or ApEn) in relation to MSE, may indicate the multifractal nature of the ECG, which is related to the occurrence of different sleep stages [42].

5. Conclusions

The aim of this study was to provide a comprehensive comparison of a broad spectrum of machine learning models in the context of differentiating two classes of signals, nonlinear ECG signals and stochastic artifacts. In this specific case, an optimized ensemble RUSBoosted Trees classifier guaranteed the best classification performance results.
Both classifiers, optimized weighted k-NN and optimized ensemble RUSBoosted Trees, are characterized by a short training time (14 s and 130 s, respectively). However, they require more memory (k-NN: 24 MB, ensemble RUSBoosted Trees: 53 MB) than other classifiers, such as Decision Tree (13 kB), Discriminant (5 kB), SVM (314 kB), or Neural Network (31 kB). The Discriminant classifier has the lowest training time (2 s) and the smallest model size (5 kB), but at the cost of lower specificity (59.5% for Discriminant vs. 73.7% for ensemble RUSBoosted Trees). Thus, in wearable devices, where minimal memory usage is important, the Discriminant classifier would be preferable.
The importance of features for classification was assessed using the MRMR algorithm, which is much more computationally efficient and more general than standard feature selection methods because the set of selected features does not depend on the classifier but only on the class characteristics. The application of MRMR revealed the particular importance of the variance in classifying artifacts occurring in a nonlinear signal, such as ECG, regardless of the type of classifier. Among the nonlinear measures, the most important were DFA, ApEn, and HFD.
DFA and entropy measures are more effective than HFD in distinguishing between stochastic (random) and chaotic (nonlinear deterministic) signals. HFD quantifies the self-similarity or complexity of a signal across scales but is not sensitive to the source of complexity. Consequently, both stochastic and chaotic signals can sometimes have similar HFD values, whereas DFA can distinguish better between correlated (e.g., deterministic or fractional Brownian) and uncorrelated (white noise) signals. On the other hand, entropy measures evaluate the unpredictability or irregularity in a time series. Stochastic signals (especially white noise) have high entropy due to their lack of structure. Chaotic systems have lower entropy than stochastic systems because they follow deterministic rules, even though they appear complex. Thus, entropy allows for distinguishing noise from structured chaos.
Moreover, we noticed an interesting relationship between sample entropy (or approximate entropy) and multiscale entropy, revealing the possible multifractality of ECG during sleep.
Additionally, the algorithm based on an optimized ensemble RUSBoosted Trees classifier and a set of several statistical and nonlinear measures may be helpful in single-channel wearable ECG devices to detect artifacts occurring in real-time ECG recordings.
A limitation of this study is the use of 8 h ECG signals recorded during sleep from a small number of healthy volunteers. Further studies will be extended to a larger group of people, taking into account everyday conditions and other wearable devices.
Another limitation of this study is related to the subjective factor of the expert labeling process of artifact-free segments. To partially address this, labeling would need to be performed by multiple experts.

Author Contributions

Conceptualization, E.O.; methodology, E.O.; software, E.O.; validation, E.O.; formal analysis, E.O.; investigation, E.O.; resources, C.M.; data curation, C.M.; writing—original draft preparation, E.O.; writing—review and editing, C.M.; visualization, E.O.; project administration, E.O. and C.M.; funding acquisition, E.O. and C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Health and Digital Executive Agency (HaDEA), grant no. 101128983. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Fondazione Policlinico Universitario Campus Bio-Medico on 6 February 2022 (Prot. PAR 04.22 OSS).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The datasets of this manuscript are publicly available in the RepOD repository: https://doi.org/10.18150/O7QQNQ.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ApEnApproximate entropy
CDCorrelation dimension
CNNConvolutional Neural Network
DFADetrended Fluctuation Analysis
DLDeep learning
ECGElectrocardiography
EMDEmpirical mode decomposition
EWTEmpirical wavelet transform
FNFalse negative
FPFalse positive
HFDHiguchi fractal dimension
HRVHeart rate variability
ICAIndependent component analysis
KFDKatz fractal dimension
k-NNk-Nearest Neighbors
LELyapunov exponent
LSTMLong short-term memory
MRMRMinimum redundancy maximum relevance
MSEMultiscale entropy
NPVNegative predictive value
PPVPositive predictive value (precision)
SampEnSample entropy
SVMSupport vector machine
TPTrue positive
TNTrue negative
VARVariance

Appendix A. Evaluation of Energy Consumption Due to Model Training

The carbon footprint expressed in terms of carbon dioxide equivalent (CO2eq) was estimated using the Machine Learning Emissions Calculator (https://mlco2.github.io/impact/, accessed on 22 September 2025) to evaluate the energy consumption due to model training.
The model training was performed using an Intel® Core™ i7-13700K Processor. According to the company specifications (https://www.intel.com/content/www/us/en/products/sku/230500/intel-core-i713700k-processor-30m-cache-up-to-5-40-ghz/specifications.html, accessed on 22 September 2025), the processor’s base power is 125 W and maximum turbo power is 253 W.
Assuming the value of carbon efficiency equal to 0.432 kg/kWh and offset bought set to 0%, the carbon footprint of 100 min of model training corresponds to 0.09–0.18 kg eq. CO2.
For example, the training of an optimized weighted k-NN classifier lasting 14 s corresponds to the following:
125 W · 14 s = 0.125 kW · 0.0039 h = 0.00049 kWh; 0.00049 kWh · 0.432 kg eq. CO2/kWh = 0.00021 kg eq. CO2
The training of an optimized ensemble RUSBoosted Trees classifier lasting 130 s corresponds to the following:
125 W · 130 s = 0.125 kW · 0.036 h = 0.0045 kWh; 0.0045 kWh · 0.432 kg eq. CO2/kWh = 0.00195 kg eq. CO2
The training of an optimized SVM lasting 3629 s corresponds to the following:
125 W · 3629 s = 0.125 kW · 1.01 h = 0.126 kWh; 0.126 kWh · 0.432 kg eq. CO2/kWh = 0.055 kg eq. CO2
Whereas the training of an optimized Neural Network classifier lasting 4837 s corresponds to the following:
125 W · 4837 s = 0.125 kW · 1.34 h = 0.168 kWh; 0.168 kWh · 0.432 kg eq. CO2/kWh = 0.073 kg eq. CO2
Therefore, the carbon footprint for an optimized Neural Network is 348 times (!) higher than for an optimized weighted k-NN.
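The same estimate can be reproduced with a few lines of MATLAB, as sketched below using the base-power and carbon-efficiency values quoted above.
% Carbon-footprint estimate for model training (values from the text).
P_kW   = 0.125;                            % processor base power: 125 W
carbon = 0.432;                            % carbon efficiency [kg CO2eq per kWh]
t_s    = [14 130 3629 4837];               % training times [s]: weighted k-NN, RUSBoost, SVM, Neural Network
E_kWh  = P_kW * t_s / 3600;                % energy per training run
CO2_kg = carbon * E_kWh;                   % approx. 0.0002, 0.002, 0.055, 0.073 kg CO2eq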

Appendix B. Comparison of Model Performance

The performance of seven optimized models (Tree, Discriminant, Naïve Bayes, SVM, k-NN, ensemble RUSBoost Tree, and Neural Network) calculated for 5 folds is reported in Table A1.
To extract values from 5-fold classification, each model was trained using an appropriate fitc* function in MATLAB R2023a with the option “KFold”.
For example, for the optimized k-NN classifier, the following procedure was applied:
H.NumNeighbors = 14;
H.Distance = 'euclidean';
H.DistanceWeight = 'squaredinverse';
H.Standardize = 1;
k = 5; % number of folds
Mdl = fitcknn(features, classes, 'Distance', char(H.Distance), ...
    'DistanceWeight', char(H.DistanceWeight), ...
    'NumNeighbors', H.NumNeighbors, ...
    'Standardize', H.Standardize, 'KFold', k);
The confusion matrices (CM) for each fold were extracted as follows:
for i = 1:k
    Labels = classes(Mdl.Partition.test(i));
    Pred = predict(Mdl.Trained{i}, features(Mdl.Partition.test(i), :));
    CM{i} = confusionmat(Labels, Pred);
end
Then, five performance metrics (sensitivity, specificity, PPV, NPV, and accuracy) were calculated from the confusion matrices (CM) for each of the five folds.
Next, the paired-sample t-test was applied to compare each pair of classifiers. The results are presented in Table A2.
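For illustration, one such pairwise comparison can be written as below, where spec_knn and spec_rus are hypothetical 5-by-1 vectors holding the per-fold specificity of the optimized k-NN and RUSBoosted Trees models.
% Paired-sample t-test on a single metric between two classifiers (sketch).
[h, p] = ttest(spec_knn, spec_rus);        % paired test across the 5 folds
fprintf('Paired t-test on specificity: p = %.6f (h = %d)\n', p, h);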
Table A1. Performance of seven optimized classifiers evaluated using five metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and detection accuracy; TP—artifacts marked manually and identified automatically; TN—no artifacts marked manually and not identified automatically; FP—no artifacts marked manually but identified automatically; FN—artifacts marked manually but not identified automatically.
Optimized Model | TP | FP | TN | FN | Sensitivity | Specificity | PPV | NPV | Accuracy
Tree | 43,016 | 140 | 219 | 73 | 0.998 | 0.610 | 0.997 | 0.750 | 0.995
 | 43,028 | 161 | 199 | 61 | 0.999 | 0.553 | 0.996 | 0.765 | 0.995
 | 43,015 | 152 | 208 | 74 | 0.998 | 0.578 | 0.996 | 0.738 | 0.995
 | 43,011 | 133 | 227 | 77 | 0.998 | 0.631 | 0.997 | 0.747 | 0.995
 | 43,025 | 167 | 192 | 64 | 0.999 | 0.535 | 0.996 | 0.750 | 0.995
mean |  |  |  |  | 0.998 | 0.581 | 0.997 | 0.750 | 0.995
Discriminant | 42,989 | 160 | 199 | 100 | 0.998 | 0.554 | 0.996 | 0.666 | 0.994
 | 42,993 | 141 | 219 | 96 | 0.998 | 0.608 | 0.997 | 0.695 | 0.995
 | 42,998 | 150 | 210 | 91 | 0.998 | 0.583 | 0.997 | 0.698 | 0.994
 | 43,004 | 144 | 216 | 84 | 0.998 | 0.600 | 0.997 | 0.720 | 0.995
 | 43,008 | 129 | 230 | 81 | 0.998 | 0.641 | 0.997 | 0.740 | 0.995
mean |  |  |  |  | 0.998 | 0.597 | 0.997 | 0.704 | 0.995
Naïve Bayes | 42,672 | 67 | 292 | 417 | 0.990 | 0.813 | 0.998 | 0.412 | 0.989
 | 42,662 | 62 | 298 | 427 | 0.990 | 0.828 | 0.999 | 0.411 | 0.989
 | 42,640 | 68 | 292 | 449 | 0.990 | 0.811 | 0.998 | 0.394 | 0.988
 | 42,665 | 62 | 298 | 423 | 0.990 | 0.828 | 0.999 | 0.413 | 0.989
 | 42,656 | 57 | 302 | 433 | 0.990 | 0.841 | 0.999 | 0.411 | 0.989
mean |  |  |  |  | 0.990 | 0.824 | 0.999 | 0.408 | 0.989
SVM | 43,040 | 148 | 211 | 49 | 0.999 | 0.588 | 0.997 | 0.812 | 0.995
 | 43,031 | 157 | 203 | 58 | 0.999 | 0.564 | 0.996 | 0.778 | 0.995
 | 43,046 | 147 | 213 | 43 | 0.999 | 0.592 | 0.997 | 0.832 | 0.996
 | 43,024 | 148 | 212 | 64 | 0.999 | 0.589 | 0.997 | 0.768 | 0.995
 | 43,022 | 150 | 209 | 67 | 0.998 | 0.582 | 0.997 | 0.757 | 0.995
mean |  |  |  |  | 0.999 | 0.583 | 0.997 | 0.789 | 0.995
k-NN | 43,037 | 125 | 234 | 52 | 0.999 | 0.652 | 0.997 | 0.818 | 0.996
 | 43,048 | 135 | 225 | 41 | 0.999 | 0.625 | 0.997 | 0.846 | 0.996
 | 43,056 | 125 | 235 | 33 | 0.999 | 0.653 | 0.997 | 0.877 | 0.996
 | 43,038 | 134 | 226 | 50 | 0.999 | 0.628 | 0.997 | 0.819 | 0.996
 | 43,048 | 123 | 236 | 41 | 0.999 | 0.657 | 0.997 | 0.852 | 0.996
mean |  |  |  |  | 0.999 | 0.643 | 0.997 | 0.842 | 0.996
Ensemble | 42,617 | 47 | 312 | 472 | 0.989 | 0.869 | 0.999 | 0.398 | 0.988
 | 42,644 | 60 | 300 | 445 | 0.990 | 0.833 | 0.999 | 0.403 | 0.988
 | 42,579 | 47 | 313 | 510 | 0.988 | 0.869 | 0.999 | 0.380 | 0.987
 | 42,633 | 62 | 298 | 455 | 0.989 | 0.828 | 0.999 | 0.396 | 0.988
 | 42,643 | 43 | 316 | 446 | 0.990 | 0.880 | 0.999 | 0.415 | 0.989
mean |  |  |  |  | 0.989 | 0.856 | 0.999 | 0.398 | 0.988
Neural Network | 43,022 | 141 | 218 | 67 | 0.998 | 0.607 | 0.997 | 0.765 | 0.995
 | 43,023 | 148 | 212 | 66 | 0.998 | 0.589 | 0.997 | 0.763 | 0.995
 | 43,036 | 147 | 213 | 53 | 0.999 | 0.592 | 0.997 | 0.801 | 0.995
 | 43,024 | 147 | 213 | 64 | 0.999 | 0.592 | 0.997 | 0.769 | 0.995
 | 43,040 | 139 | 220 | 49 | 0.999 | 0.613 | 0.997 | 0.818 | 0.996
mean |  |  |  |  | 0.999 | 0.598 | 0.997 | 0.783 | 0.995
Table A2. Comparison of classification performance for each pair of models separately for each of the five metrics (sensitivity, specificity, PPV, NPV, and accuracy). The statistically significant differences between models for p-values less than 0.05 are marked in red.
Sensitivity | Discriminant | Naïve Bayes | SVM | k-NN | Ensemble | Neural Network
Tree | 0.012654 | 0.000001 | 0.097824 | 0.002349 | 0.000003 | 0.087904
Discriminant |  | 0.000001 | 0.009894 | 0.000472 | 0.000006 | 0.000494
Naïve Bayes |  |  | 0.000001 | 0.000002 | 0.020519 | 0.000002
SVM |  |  |  | 0.054314 | 0.000015 | 0.587531
k-NN |  |  |  |  | 0.000007 | 0.004663
Ensemble |  |  |  |  |  | 0.000007
Specificity | Discriminant | Naïve Bayes | SVM | k-NN | Ensemble | Neural Network
Tree | 0.609934 | 0.000317 | 0.918646 | 0.040432 | 0.000335 | 0.426127
Discriminant |  | 0.000018 | 0.438621 | 0.048048 | 0.000131 | 0.941055
Naïve Bayes |  |  | 0.000011 | 0.000041 | 0.061452 | 0.000003
SVM |  |  |  | 0.000524 | 0.000010 | 0.062232
k-NN |  |  |  |  | 0.000001 | 0.000620
Ensemble |  |  |  |  |  | 0.000005
PPV | Discriminant | Naïve Bayes | SVM | k-NN | Ensemble | Neural Network
Tree | 0.614120 | 0.000320 | 0.913403 | 0.039895 | 0.000338 | 0.424345
Discriminant |  | 0.000018 | 0.449108 | 0.046823 | 0.000132 | 0.926907
Naïve Bayes |  |  | 0.000011 | 0.000044 | 0.061923 | 0.000003
SVM |  |  |  | 0.000514 | 0.000010 | 0.063177
k-NN |  |  |  |  | 0.000001 | 0.000614
Ensemble |  |  |  |  |  | 0.000005
NPV | Discriminant | Naïve Bayes | SVM | k-NN | Ensemble | Neural Network
Tree | 0.027501 | 0.000000 | 0.078218 | 0.002126 | 0.000000 | 0.075628
Discriminant |  | 0.000020 | 0.025051 | 0.000682 | 0.000010 | 0.001404
Naïve Bayes |  |  | 0.000023 | 0.000007 | 0.056455 | 0.000009
SVM |  |  |  | 0.021545 | 0.000032 | 0.749565
k-NN |  |  |  |  | 0.000006 | 0.002727
Ensemble |  |  |  |  |  | 0.000005
Accuracy | Discriminant | Naïve Bayes | SVM | k-NN | Ensemble | Neural Network
Tree | 0.242112 | 0.000001 | 0.087122 | 0.004320 | 0.000018 | 0.115168
Discriminant |  | 0.000013 | 0.082095 | 0.001745 | 0.000009 | 0.009532
Naïve Bayes |  |  | 0.000010 | 0.000006 | 0.031457 | 0.000005
SVM |  |  |  | 0.003499 | 0.000043 | 0.795631
k-NN |  |  |  |  | 0.000014 | 0.000643
Ensemble |  |  |  |  |  | 0.000011

References

  1. Clifford, G.D.; Azuaje, F.; McSharry, P.E. Advanced Methods for ECG Analysis; Artech House: London, UK, 2006. [Google Scholar]
  2. Chatterjee, S.; Thakur, R.S.; Yadav, R.N.; Gupta, L.; Raghuvanshi, D.K. Review of noise removal techniques in ECG signals. IET Signal Proc. 2020, 14, 569–590. [Google Scholar] [CrossRef]
  3. Van der Bijl, K.; Elgendi, M.; Menon, C. Automatic ECG Quality Assessment Techniques: A Systematic Review. Diagnostics 2022, 12, 2578. [Google Scholar] [CrossRef]
  4. Siddiah, N.; Srikanth, T.; Kumar, Y.S. Nonlinear filtering in ECG Signal Enhancement. Int. J. Comput. Sci. Commun. Netw. 2012, 2, 123–128. [Google Scholar]
  5. Sarafan, S.; Vuong, H.; Jilani, D.; Malhotra, S.; Lau, M.P.H.; Vishwanath, M.; Ghirmai, T.; Cao, H. A Novel ECG Denoising Scheme Using the Ensemble Kalman Filter. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2022, 2022, 2005–2008. [Google Scholar]
  6. Khavas, Z.R.; Asl, B.M. Robust heartbeat detection using multimodal recordings and ECG quality assessment with signal amplitudes dispersion. Comput. Methods Programs Biomed. 2018, 163, 169–182. [Google Scholar] [CrossRef]
  7. Zhao, Z.; Zhang, Y. SQI Quality Evaluation Mechanism of Single-Lead ECG Signal Based on Simple Heuristic Fusion and Fuzzy Comprehensive Evaluation. Front. Physiol. 2018, 9, 727. [Google Scholar] [CrossRef]
  8. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  9. Wu, Z.H.; Huang, N.E. A study of the characteristics of white noise using the empirical mode decomposition method. Proc. R. Soc. A Math. Phys. Eng. Sci. 2004, 460, 1597–1611. [Google Scholar] [CrossRef]
  10. Chang, K.M. Arrhythmia ECG noise reduction by ensemble empirical mode decomposition. Sensors 2010, 10, 6063–6080. [Google Scholar] [CrossRef] [PubMed]
  11. Xu, X.; Liang, Y.; He, P.; Yang, J. Adaptive Motion Artifact Reduction Based on Empirical Wavelet Transform and Wavelet Thresholding for the Non-Contact ECG Monitoring Systems. Sensors 2019, 19, 2916. [Google Scholar] [CrossRef]
  12. Elouaham, S.; Dliou, A.; Jenkal, W.; Louzazni, M.; Zougagh, H.; Dlimi, S. Empirical Wavelet Transform Based ECG Signal Filtering Method. J. Electr. Comput. Eng. 2024, 2024, 9050909. [Google Scholar] [CrossRef]
  13. Sharanya, S.; Arjunan, P.D. Fractal Dimension Techniques for Analysis of Cardiac Autonomic Neuropathy. Biomed. Eng. Appl. Basis Commun. 2023, 35, 2350003. [Google Scholar] [CrossRef]
  14. Chen, C.; da Silva, B.; Ma, C.; Li, J.; Liu, C. Fast Sample Entropy Atrial Fibrillation Analysis Towards Wearable Device. In Proceedings of the 12th Asian-Pacific Conference on Medical and Biological Engineering. APCMBE 2023, Suzhou, China, 18–21 May 2023; Wang, G., Yao, D., Gu, Z., Peng, Y., Tong, S., Liu, C., Eds.; IFMBE Proceedings. Springer: Cham, Switzerland, 2024; Volume 103. [Google Scholar]
  15. Olejarczyk, E.; Raus-Jarzabek, E.; Massaroni, C. Automatic identification of movement and muscle artifacts in ECG based on statistical and nonlinear measures. In Proceedings of the 2024 IEEE International Workshop on Metrology for Industry 4.0 & IoT, Florence, Italy, 29–31 May 2024. [Google Scholar]
  16. Alghieth, M. DeepECG-Net: A hybrid transformer-based deep learning model for real-time ECG anomaly detection. Sci. Rep. 2025, 15, 20714. [Google Scholar] [CrossRef]
  17. Wang, Y.-H.; Chen, I.-Y.; Chiueh, H.; Liang, S.-F. A Low-Cost Implementation of Sample Entropy in Wearable Embedded Systems: An Example of Online Analysis for Sleep EEG. IEEE Trans. Instrum. Meas. 2021, 70, 4002412. [Google Scholar] [CrossRef]
  18. Gomolka, R.S.; Kampusch, S.; Kaniusas, E.; Thurk, F.; Szeles, J.C.; Klonowski, W. Higuchi Fractal Dimension of Heart Rate Variability During Percutaneous Auricular Vagus Nerve Stimulation in Healthy and Diabetic Subjects. Front. Physiol. 2018, 9, 1162. [Google Scholar] [CrossRef] [PubMed]
  19. Horie, T.; Burioka, N.; Amisaki, T.; Shimizu, E. Sample Entropy in Electrocardiogram During Atrial Fibrillation. Yonago Acta Med. 2018, 61, 49–57. [Google Scholar] [CrossRef]
  20. Zhao, L.; Liu, C.; Wei, S.; Shen, Q.; Zhou, F.; Li, J. A New Entropy-Based Atrial Fibrillation Detection Method for Scanning Wearable ECG Recordings. Entropy 2018, 20, 904. [Google Scholar] [CrossRef]
  21. Alcan, V. Sample Entropy Analysis of heart rate variability in RR interval detection. Muhendis. Bilim. Ve Tasarım Derg. 2020, 8, 783–790. [Google Scholar] [CrossRef]
  22. Abdelrazik, A.; Eldesouky, M.; Antoun, I.; Lau, E.Y.M.; Koya, A.; Vali, Z.; Suleman, S.A.; Donaldson, J.; Ng, G.A. Wearable Devices for Arrhythmia Detection: Advancements and Clinical Implications. Sensors 2025, 25, 2848. [Google Scholar] [CrossRef]
  23. Ribeiro, P.; Sa, J.; Paiva, D.; Rodrigues, P.M. Cardiovascular Diseases Diagnosis Using an ECG Multi-Band Non-Linear Machine Learning Framework Analysis. Bioengineering 2024, 11, 58. [Google Scholar] [CrossRef] [PubMed]
  24. Noitz, M.; Mortl, C.; Bock, C.; Mahringer, C.; Bodenhofer, U.; Dunser, M.W.; Meier, J. Detection of Subtle ECG Changes Despite Superimposed Artifacts by Different Machine Learning Algorithms. Algorithms 2024, 17, 360. [Google Scholar] [CrossRef]
  25. Zhang, Y.; Wei, S.; Zhang, L.; Liu, C. Comparing the Performance of Random Forest, SVM and Their Variants for ECG Quality Assessment Combined with Nonlinear Features. J. Med. Biol. Eng. 2019, 39, 381–392. [Google Scholar] [CrossRef]
  26. Fu, F.; Xiang, W.; An, Y.; Liu, B.; Chen, X.; Zhu, S.; Li, J. Comparison of Machine Learning Algorithms for the Quality Assessment of Wearable ECG Signals Via Lenovo H3 Devices. J. Biol. Eng. 2021, 41, 231–240. [Google Scholar] [CrossRef]
  27. Karimulla, S.; Patra, D. An Optimal Methodology for Early Prediction of Sudden Cardiac Death Using Advanced Heart Rate Variability Features of ECG Signal. Arab. J. Sci. Eng. 2024, 49, 6725–6741. [Google Scholar] [CrossRef]
  28. Rasmussen, J.H.; Rosenberger, K.; Langbein, J.; Easie, R.R. An open-source software for non-invasive heart rate variability assessment. Methods Ecol. Evol. 2020, 11, 773–782. [Google Scholar] [CrossRef]
  29. El-Yaagoubi, M.; Goya-Esteban, R.; Jabrane, Y.; Munoz-Romero, S.; Garcia-Alberola, A.; Rojo-Alvarez, J.L. On the Robustness of Multiscale Indices for Long-Term Monitoring in Cardiac Signals. Entropy 2019, 21, 594. [Google Scholar] [CrossRef]
  30. Stapelberg, N.J.C.; Neumann, D.L.; Shum, D.H.K.; McConnell, H.; Hamilton-Craig, I. The sensitivity of 38 heart rate variability measures to the addition of artifact in human and artificial 24-hr cardiac recordings. Ann. Noninvasive Electrocardiol. 2018, 23, e12483. [Google Scholar] [CrossRef]
  31. Giles, D.A.; Draper, N. Heart rate variability during exercise: A comparison of artefact correction methods. J. Strength Cond. Res. 2018, 32, 726–735. [Google Scholar] [CrossRef]
  32. Ernst, G. Hidden Signals-The History and Methods of Heart Rate Variability. Front. Public Health 2017, 5, 265. [Google Scholar] [CrossRef]
  33. Massaroni, C.; Olejarczyk, E.; Lo Presti, D.; Schena, E.; Nusca, A.; Ussia, G.P.; Silvestri, S. Indirect Respiratory Monitoring via Single-Lead Wearable ECG: Influence of Motion Artifacts and Devices on Respiratory Rate Estimations. In Proceedings of the 2024 IEEE International Workshop on Metrology for Industry 4.0 & IoT, Florence, Italy, 29–31 May 2024. [Google Scholar]
  34. Higuchi, T. Approach to an irregular time series on the basis of the fractal theory. Phys. D 1988, 31, 277–283. [Google Scholar] [CrossRef]
  35. Katz, M.J. Fractals and the analysis of waveforms. Comput. Biol. Med. 1988, 18, 145–156. [Google Scholar] [CrossRef]
  36. Peng, C.K.; Havlin, S.; Hausdorff, J.M.; Mietus, J.E.; Stanley, H.E.; Goldberger, A.L. Fractal mechanisms and heart rate dynamics: Long-range correlations and their breakdown with disease. J. Electrocardiol. 1996, 28 (Suppl. S1), 59–64. [Google Scholar] [CrossRef] [PubMed]
  37. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301. [Google Scholar] [CrossRef] [PubMed]
  38. Richman, J.S.; Moorman, R.J. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, 2039–2049. [Google Scholar] [CrossRef] [PubMed]
  39. Costa, M.; Goldberger, A.L.; Peng, C.K. Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 2002, 89, 068102. [Google Scholar] [CrossRef]
  40. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205. [Google Scholar] [CrossRef]
  41. Chen, X.; Zheng, S.; Peng, L.; Zhong, Q.; He, L. A novel method based on shifted rank-1 reconstruction for removing EMG artifacts in ECG signals. Biomed. Signal Process. Control 2023, 85, 104967. [Google Scholar] [CrossRef]
  42. Costa, M.; Goldberger, A.L.; Peng, C.K. Multiscale entropy analysis of biological signals. Phys. Rev. E 2005, 71, 021906. [Google Scholar] [CrossRef]
Figure 1. The ROC curves for two classifiers: (A) the ensemble RUSBoosted Trees classifier; (B) the weighted k-NN classifier.
Figure 2. Comparison of the scatter plots of variance in relation to the selected best features: (A,B) DFA, (C,D) ApEn, and (E,F) HFD, for two non-optimized classifiers: the ensemble RUSBoosted Trees classifier (A,C,E) and the weighted k-NN classifier (B,D,F).
Figure 3. The scatter plots of variance as a function of nonlinear measures: (A) HFD; (B) KFD; (C) DFA; (D) ApEn; (E) SampEn; (F) MSE for scale 1; (G) MSE for scale 2.
Figure 4. The scatter plots of ApEn as a function of fractal dimension measures: (A) HFD; (B) KFD; (C) DFA.
Figure 5. The scatter plots illustrating relationships between entropy measures: (A) SampEn vs. ApEn; (B) MSE2 vs. MSE1; (C) MSE2 vs. SampEn; (D) MSE2 vs. ApEn.
Table 1. The ranges of hyper-parameters used for training individual models.
Classifier Group | Ranges of Hyper-Parameters
Tree | split criterion: Gini's diversity index; surrogate decision splits: off; maximum number of splits: 100 (fine), 20 (medium), 7 (coarse)
 | optimized: maximum number of splits: 36
Discriminant | in both linear and quadratic discriminant, a full covariance structure is used
 | optimized: linear
Efficient Logistic Regression and Efficient Linear SVM | solver, regularization, and regularization strength (lambda) are automatic; relative coefficient tolerance (beta tolerance): 0.0001; multi-class coding: one-vs.-one
Naïve Bayes | standardize data; kernel, in contrast to Gaussian distribution for numeric predictors, uses unbounded support
 | optimized: kernel type: triangle
SVM | multi-class coding: one-vs.-one; standardize data; box constraint level: 1; linear, quadratic, and cubic kernel function use automatic scale; Gaussian SVM scale: 0.71 (fine), 2.8 (medium), and 11 (coarse)
 | optimized: kernel function: Gaussian; kernel scale: 1.7037
k-NN | standardize data; distance metric: Euclidean (fine, medium, coarse, and weighted k-NN), Cosine (cosine k-NN), Minkowski (cubic k-NN); number of neighbors: k = 10, except fine k-NN (k = 1) and coarse k-NN (k = 100); distance weight: equal, except the weighted k-NN (squared inverse distance)
 | optimized: distance metric: Euclidean; weighted k-NN; number of neighbors: k = 14
Ensemble Trees | number of learners: 30; learner type: Decision Tree for AdaBoost, Bag, and RUSBoost, while Discriminant or Nearest Neighbors for Subspace Ensemble with subspace dimension equal to 4; all predictors to sample are used by the Decision Tree learner; maximum number of splits: 20 for AdaBoost and RUSBoost with learning rate equal to 0.1, or 217,241 for Bag
 | optimized: for RUSBoost: number of learners: 325; maximum number of splits: 139,269; learning rate: 0.50351; for Bagged Tree: number of learners: 112; maximum number of splits: 2451
Neural Networks | standardize data; regularization strength (lambda): 0; activation: ReLU; iteration limit: 1000; number of layers: 1 (narrow, medium, and wide), 2 (bilayered), 3 (trilayered); layer size: 10, except medium (25) and wide (100) Neural Network
 | optimized: lambda: 4.7042 × 10^−6; activation: ReLU; number of layers: 1; layer size: 296
Kernel | SVM or Logistic Regression Kernel; regularization strength (lambda): automatic; multi-class coding: one-vs.-one; kernel scale: automatic; iteration limit: 1000; number of expansion dimensions: automatic
Table 2. Ranges of feature values covering 99.5% of values typical for artifact-free ECG segments: non-standardized and standardized values.
Non-Standardized Data
Measure | Variance | HFD | KFD | DFA | SampEn | ApEn | MSE1 | MSE2
min | 0.008 | 1.04 | 1.0000 | 1.31 | 0.78 | 1.17 | 0.02 | 0.04
max | 0.097 | 1.49 | 1.0003 | 2.95 | 2.22 | 2.32 | 2.25 | 3.38
Standardized Data
Measure | Variance | HFD | KFD | DFA | SampEn | ApEn | MSE1 | MSE2
min | −0.0042 | −7.61 | −0.11 | −10.17 | −3.74 | −4.57 | −1.56 | −1.33
max | −0.0040 | 15.86 | 0.20 | 2.53 | 5.55 | 4.49 | 17.95 | 15.13
Table 3. Training results in terms of accuracy, total cost, error rate, prediction speed, training time, and model size for individual models. The results are reported separately for non-optimized and for optimized classifiers. The prediction speed and training time were provided for a group of optimized classifiers and the best-optimized classifiers in each group.
ClassifierAccuracy [%]Total CostPrediction Speed [obs/s]Training Time [s]Model Size [kB]
Fine Tree99.511091,400,0001129
Medium Tree99.51097830,000108
Coarse Tree99.41294870,00095
Linear Discriminant99.51169610,00095
Quadratic Discriminant98.53193580,00088
Binary GLM Logistic Regression99.4not applicable730,0002139,000
Efficient Logistic Regression99.41354770,0001612
Efficient Linear SVM99.31436730,0001912
Gaussian Naïve Bayes98.43562610,000147
Kernel Naïve Bayes98.92477260472953,000
Linear SVM99.41229180,000911222
Quadratic SVM99.51109360,0003192202
Cubic SVM99.51048540,0005681188
Fine Gaussian SVM99.3151626,00026931000
Medium Gaussian SVM99.599378,000908232
Coarse Gaussian SVM99.5112460,000452202
Fine k-NN99.5109813,0005924,000
Medium k-NN99.51042520014524,000
Coarse k-NN99.51155240038224,000
Cosine k-NN99.41242140074818,000
Cubic k-NN99.51041290031724,000
Weighted k-NN99.6871560016024,000
Ensemble Boosted Trees99.5105069,000376273
Ensemble Bagged Trees99.688551,00010925000
Ensemble Subspace Discriminant99.4124636,00083120
Ensemble Subspace k-NN99.510455100349492,000
Ensemble RUSBoosted Trees98.1407698,000130273
Narrow Neural Network99.51077710,00013507
Medium Neural Network99.51023780,00020728
Wide Neural Network99.510461,100,000341914
Bilayered Neural Network99.51156710,00018018
Trilayered Neural Network99.51070850,000216510
SVM Kernel99.41334120,000113811
Logistic Regression Kernel99.41406110,00054711
Optimized Classifier GroupAccuracy [%]Total CostPrediction Speed [obs/s]Training Time [s]Model Size [kB]
Tree99.510833,600,0004413
Discriminant99.511692,200,000355
SVM99.6927130,00029,587314
Naïve Bayes98.9246627051,16653,000
k-NN99.687060,000414024,000
Ensemble Bagged Trees99.689933,00032,67718,000
Ensemble RUSBoosted Trees99.693119,000237253,000
Neural Network99.6957590,00039,44431
Optimized ClassifierAccuracy [%]Total CostPrediction Speed [obs/s]Training Time [s]Model Size [kB]
Tree99.510833,200,000213
Discriminant99.511692,100,00025
SVM99.6927210,000184314
Naïve Bayes98.92466250362953,000
k-NN99.687082,0001424,000
Ensemble Bagged Trees99.689951,00031618,000
Ensemble RUSBoosted Trees99.693320,00013053,000
Neural Network99.6953600,00048331
Table 4. Performance of thirty-four classifiers evaluated using five metrics: sensitivity, specificity, precision (PPV), negative predictive value (NPV), and detection accuracy; TP—artifacts marked manually and identified automatically; TN—no artifacts marked manually and not identified automatically; FP—no artifacts marked manually but identified automatically; FN—artifacts marked manually but not identified automatically.
Classifier | TP | FP | TN | FN | Sensitivity [%] | Specificity [%] | PPV [%] | NPV [%] | Accuracy [%]
1. Fine Tree | 215,058 | 723 | 1075 | 386 | 99.8 | 59.8 | 99.7 | 73.6 | 99.5
2. Medium Tree | 215,073 | 726 | 1072 | 371 | 99.8 | 59.6 | 99.7 | 74.3 | 99.5
3. Coarse Tree | 215,029 | 879 | 919 | 415 | 99.8 | 51.1 | 99.6 | 68.9 | 99.4
Optimized Tree | 215,090 | 729 | 1069 | 354 | 99.8 | 59.5 | 99.7 | 75.1 | 99.5
4. Linear Discriminant | 214,993 | 718 | 1080 | 451 | 99.8 | 60.1 | 99.7 | 70.5 | 99.5
5. Quadratic Discriminant | 212,575 | 324 | 1474 | 2869 | 99.7 | 82.0 | 98.8 | 33.9 | 99.5
6. Binary GLM Logistic Regression | 215,159 | 914 | 884 | 285 | 99.9 | 49.2 | 99.6 | 75.6 | 99.4
7. Efficient Logistic Regression | 215,163 | 1073 | 725 | 281 | 99.9 | 40.3 | 99.5 | 72.1 | 99.4
8. Efficient Linear SVM | 215,337 | 1329 | 469 | 107 | 100.0 | 26.1 | 99.4 | 81.4 | 99.3
9. Gaussian Naïve Bayes | 212,158 | 276 | 1522 | 3286 | 98.5 | 84.6 | 99.9 | 31.7 | 98.4
10. Kernel Naïve Bayes | 213,283 | 316 | 1482 | 2161 | 99.0 | 82.4 | 99.9 | 40.7 | 98.9
Optimized Naïve Bayes | 213,291 | 313 | 1485 | 2153 | 99.0 | 82.6 | 99.9 | 40.8 | 98.9
11. Linear SVM | 215,240 | 1025 | 773 | 204 | 99.9 | 43.0 | 99.5 | 79.1 | 99.4
12. Quadratic SVM | 215,215 | 880 | 918 | 229 | 99.9 | 51.1 | 99.6 | 80.0 | 99.5
13. Cubic SVM | 215,210 | 814 | 984 | 234 | 99.9 | 54.7 | 99.6 | 80.8 | 99.5
14. Fine Gaussian SVM | 215,435 | 1507 | 291 | 9 | 100.0 | 16.2 | 99.3 | 97.0 | 99.3
15. Medium Gaussian SVM | 215,152 | 701 | 1097 | 292 | 99.9 | 61.0 | 99.7 | 79.0 | 99.5
16. Coarse Gaussian SVM | 215,189 | 869 | 929 | 255 | 99.9 | 51.7 | 99.6 | 78.5 | 99.5
Optimized Gaussian SVM | 215,170 | 653 | 1145 | 274 | 99.9 | 63.7 | 99.7 | 80.7 | 99.6
17. Fine k-NN | 214,980 | 634 | 1164 | 464 | 99.8 | 64.7 | 99.7 | 71.5 | 99.5
18. Medium k-NN | 215,184 | 782 | 1016 | 260 | 99.9 | 56.5 | 99.6 | 79.6 | 99.5
19. Coarse k-NN | 215,202 | 913 | 885 | 242 | 99.9 | 49.2 | 99.6 | 78.5 | 99.5
20. Cosine k-NN | 215,132 | 930 | 868 | 312 | 99.9 | 48.3 | 99.6 | 73.6 | 99.4
21. Cubic k-NN | 215,181 | 778 | 1020 | 263 | 99.9 | 56.7 | 99.6 | 79.5 | 99.5
22. Weighted k-NN | 215,213 | 640 | 1158 | 231 | 99.9 | 64.4 | 99.7 | 83.4 | 99.6
Optimized Weighted k-NN | 215,216 | 642 | 1156 | 228 | 99.9 | 64.3 | 99.7 | 83.5 | 99.6
23. Ensemble Boosted Trees | 215,078 | 684 | 1114 | 366 | 99.8 | 62.0 | 99.7 | 75.3 | 99.5
24. Ensemble Bagged Trees | 215,151 | 592 | 1206 | 293 | 99.9 | 67.1 | 99.7 | 80.5 | 99.6
25. Ensemble Subspace Discriminant | 215,918 | 720 | 1078 | 526 | 99.8 | 60.0 | 99.7 | 67.2 | 99.9
26. Ensemble Subspace k-NN | 215,277 | 878 | 920 | 167 | 99.9 | 51.2 | 99.6 | 84.6 | 99.5
27. Ensemble RUSBoosted Trees | 211,553 | 185 | 1613 | 3891 | 98.2 | 89.7 | 99.9 | 29.3 | 98.1
Optimized Ensemble Bagged Trees | 215,144 | 599 | 1199 | 300 | 99.9 | 66.7 | 99.7 | 80.0 | 99.6
Optimized Ensemble RUSBoosted Trees | 214,985 | 472 | 1326 | 459 | 99.8 | 73.7 | 99.8 | 74.3 | 99.6
28. Narrow Neural Network | 215,076 | 709 | 1089 | 368 | 99.8 | 60.6 | 99.7 | 74.7 | 99.5
29. Medium Neural Network | 215,080 | 659 | 1139 | 364 | 99.8 | 63.3 | 99.7 | 75.8 | 99.5
30. Wide Neural Network | 215,020 | 622 | 1176 | 424 | 99.8 | 65.4 | 99.7 | 73.5 | 99.5
31. Bilayered Neural Network | 215,076 | 688 | 1110 | 368 | 99.8 | 61.7 | 99.7 | 75.1 | 99.5
32. Trilayered Neural Network | 215,068 | 694 | 1104 | 376 | 99.8 | 61.4 | 99.7 | 74.6 | 99.5
Optimized Neural Network | 215,098 | 611 | 1187 | 346 | 99.8 | 66.0 | 99.7 | 77.4 | 99.6
33. SVM Kernel | 215,183 | 1073 | 725 | 261 | 99.9 | 40.3 | 99.5 | 73.5 | 99.4
34. Logistic Regression Kernel | 215,127 | 1089 | 709 | 317 | 99.9 | 39.4 | 99.5 | 69.1 | 99.4
