Entropy Measures of Electroencephalograms towards the Diagnosis of Psychogenic Non-Epileptic Seizures

Psychogenic non-epileptic seizures (PNES) may resemble epileptic seizures but are not caused by epileptic activity. However, the analysis of electroencephalogram (EEG) signals with entropy algorithms could help identify patterns that differentiate PNES and epilepsy. Furthermore, the use of machine learning could reduce the current diagnosis costs by automating classification. The current study extracted the approximate, sample, spectral, singular value decomposition, and Renyi entropies from interictal EEGs and electrocardiograms (ECGs) of 48 PNES and 29 epilepsy subjects in the broad, delta, theta, alpha, beta, and gamma frequency bands. Each feature-band pair was classified by a support vector machine (SVM), k-nearest neighbour (kNN), random forest (RF), and gradient boosting machine (GBM). In most cases, the broad band returned higher accuracy, gamma returned the lowest, and combining the six bands together improved classifier performance. The Renyi entropy was the best feature and returned high accuracy in every band. The highest balanced accuracy, 95.03%, was obtained by the kNN with Renyi entropy and combining all bands except broad. This analysis showed that entropy measures can differentiate between interictal PNES and epilepsy with high accuracy, and improved performances indicate that combining bands is an effective improvement for diagnosing PNES from EEGs and ECGs.


Introduction
Psychogenic non-epileptic seizures (PNES) clinically resemble epileptic seizures but are not due to epileptic electrical brain activity [1]. Although the condition is almost as prevalent as multiple sclerosis [2,3], PNES is regularly misdiagnosed: people with PNES are not appropriately diagnosed for an average of seven years [4], and approximately 78% of patients were taking at least one anti-epileptic drug at the time of accurate diagnosis [5]. This has serious adverse effects for both patients and healthcare systems, through unnecessary visits to hospitals, medical tests, and treatments. In addition, since anti-epileptic drugs are not effective for PNES, these misdiagnosed patients will have endured the negative side effects of these expensive drugs without any significant benefit [3]. Furthermore, an estimated one in five referrals to epilepsy clinics actually have PNES [6], highlighting the difficulties in making an accurate diagnosis.
The current gold standard method of diagnosis is the recording of a seizure with video-electroencephalogram (EEG), from which a specialist assesses the semiology [...]. Single biomedical signal parameters have been shown to be insufficient as a differentiator for PNES and epilepsy [18]. Therefore, a potential tool to mitigate these problems is machine learning. Machine learning classifiers are mathematical algorithms that "learn" how to separate conditions by training on a set of data. The validity of this trained model is then tested using more data. When analysing biomedical signals, these data typically comprise one or more features extracted from the signal taken at different observations. This allows the classifiers to consider multiple factors with different types of information simultaneously. The model's ability to separate the conditions is assessed using performance metrics such as accuracy (ability to predict both conditions correctly) [19].
Machine learning has been previously used to classify entropy measures extracted from the EEGs of PNES patients. For instance, from 2014-2018, a series of six papers published by the same group of researchers [20][21][22][23][24][25] used spectral entropy as one of 55 EEG features analysed by machine learning.
Ahmadi et al. [26] used EEGs from 20 epilepsy and 20 PNES subjects and compared the Shannon entropy, spectral entropy, Renyi entropy, Higuchi fractal dimension, Katz fractal dimension, and the EEG frequency bands with an imperialist competitive algorithm. They found that spectral entropy and Renyi entropy were the most important EEG features as they were always among the five best feature subsets. Furthermore, the classification accuracy decreased significantly when either or both were excluded from a subset. They also found that SVMs with a linear or RBF kernel were the best classifiers.
The same group did another study [10], this time with five epilepsy and five PNES subjects. They extracted the same EEG features from each frequency band, this time including the energy of the signal. The researchers found that beta was the best band for all features and gamma was the worst. The highest performing features differ for each band, making an overarching conclusion difficult.
Cura et al. [27] used synchrosqueezing to represent the time-frequency maps of 16 epilepsy and six PNES subjects. From these maps, 17 features were extracted: three flux, flatness, and energy concentration measures; two Renyi entropy measures; six statistical features; and five TF sub-band energy measures. The researchers used decision tree, SVM, RF, and RUSBoost classifiers with all 17 features as inputs. For the three-class problem (inter-PNES (non-seizure), PNES seizure, and epileptic seizure EEGs), the highest accuracy and precision and the lowest false discovery rate were reported by RF, with 95.8%, 91.4%, and 8.6%, respectively. The highest sensitivity was reported by the RUSBoost classifier with 90.3%. All classifiers except the SVM reported accuracy ≥ 93%, sensitivity ≥ 82%, precision ≥ 86%, and false discovery rate ≤ 14%. The researchers also compared the inter-PNES and PNES EEGs for PNES seizure detection. All accuracies were ≥ 90% (excluding the SVM for one patient), and RF reported the highest of these.
This paper aims to assess the ability of six entropy metrics to differentially diagnose PNES and epilepsy by using these features individually as the inputs for four popular machine learning methods. This analysis will compare the diagnostic power of each feature and each EEG frequency band for a large database of PNES and epilepsy EEG and electrocardiogram (ECG) recordings.

Materials and Methods
The data used in this analysis were collected routinely at St George's Hospital, London and consisted of interictal and preictal surface EEG recordings from 48 PNES and 29 epilepsy patients. The PNES subjects have an age range of 17-59 (mean 34.76 ± 10.55) and a male/female ratio of 14/34. The epilepsy subjects have an age range of 19-79 (mean 38.95 ± 13.93) and a male/female ratio of 18/11. Suitable cases were retrospectively identified from the video-EEG database of those attending for inpatient video-EEG monitoring from 2016 to 2019. The diagnosis of functional seizures was made according to International League Against Epilepsy diagnostic criteria [28] by at least two clinicians experienced in the diagnosis of epilepsy and was documented through video-EEG in all cases. The diagnosis of epileptic seizures was based upon EEG-confirmed ictal epileptiform activity during the recorded epileptic event during video-EEG monitoring. Exclusion criteria for both groups included cases with a dual diagnosis of both epileptic and functional non-epileptic seizures. The recordings were taken with Natus Networks with an EEG32 headbox. The EEG electrodes were placed according to the 10-20 system montage with Cz-Pz as the reference electrode. The ECG comprised two electrodes, ECG+ and ECG-, placed on the right and left mid-clavicular line. The sampling frequencies were either 256, 512, or 1024 Hz, and bandpass filtering from 0.5 to 70 Hz was applied. The data were reviewed and clipped by experienced clinicians in the field, who selected awake time epochs when patients were still and at rest, without seizures or ictal/epileptiform manifestations, and with minimal noise. All clipped EEG data were de-identified and the video removed prior to the current analysis. Anonymised recordings were stored in EDF+ format.
The EEGs and ECGs were preprocessed using MNE-python [29]. The signals with a sampling rate of over 256 Hz were downsampled to this value and the common electrodes were selected: Fp1, F7, T3, T5, O1, F3, C3, P3, Fz, Cz, Fp2, F8, T4, T6, O2, F4, C4, P4, Fpz, Pz, ECG+, and ECG-. The EEGs were filtered using an FIR, Hamming window, bandpass filter with cutoff frequencies of 0.5 and 40 Hz. The ECGs were filtered using a Bessel IIR bandpass filter with cutoff frequencies of 0.25 and 40 Hz, the method for which was derived from [30,31]. Inspection of the time and frequency plots of the EEG showed no significant mains noise, so this was not specifically removed. The data were then segmented into ten-second non-overlapping epochs. To remove noise, epochs where the EEG amplitude did not exceed 1 µV were removed, and AutoReject [32] automatically removed epochs with noisy EEG. The remaining epochs were then visually inspected to exclude any epochs that contained flat EEG or ECG. The resulting 10,452 epochs were then baseline corrected using the average of each subject's EEG. These EEG samples were then filtered into the frequency bands: delta 0.5-4 Hz, theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz, and gamma 30-40 Hz. The ECG channel was found by subtracting the values of the ECG+ lead from the ECG- lead. Baseline wander was then removed using a filter with a 0.05 Hz cutoff [33]. Entropy features were extracted from every band and every channel (including ECG), including the original broad band (0.5-40 Hz). The ECG filtering, however, was the same for each EEG frequency band analysed (0.25-40 Hz).
The entropy measures used in this analysis were: approximate, sample, spectral, singular value decomposition (SVD), Renyi, and wavelet entropy. These features were extracted from each channel in each sample, giving 21 input parameters per band per feature.
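The epoching steps above can be sketched with plain NumPy. This is only an illustration, not the MNE-python and AutoReject pipeline used in the study: the function names are hypothetical, the data are assumed to be in volts with shape (channels, samples), and the 1 µV criterion is assumed to apply to the peak absolute value over all channels of an epoch.

```python
import numpy as np

FS = 256         # Hz, sampling rate after downsampling (from the text)
EPOCH_SEC = 10   # s, epoch length (from the text)

def ecg_channel(ecg_plus, ecg_minus):
    """Single ECG trace: the ECG+ lead subtracted from the ECG- lead."""
    return np.asarray(ecg_minus, dtype=float) - np.asarray(ecg_plus, dtype=float)

def segment_epochs(data, fs=FS, epoch_sec=EPOCH_SEC):
    """Cut (n_channels, n_samples) into non-overlapping epochs.

    Returns an array of shape (n_epochs, n_channels, fs * epoch_sec);
    trailing samples that do not fill a whole epoch are dropped.
    """
    data = np.asarray(data, dtype=float)
    n = fs * epoch_sec
    n_epochs = data.shape[1] // n
    trimmed = data[:, :n_epochs * n]
    return trimmed.reshape(data.shape[0], n_epochs, n).transpose(1, 0, 2)

def drop_low_amplitude(epochs, min_amp=1e-6):
    """Keep epochs whose largest absolute amplitude exceeds min_amp (volts).

    Assumption: the criterion is evaluated over all channels of an epoch.
    """
    peak = np.abs(epochs).max(axis=(1, 2))
    return epochs[peak > min_amp]
```

With a 256 Hz sampling rate, each ten-second epoch holds 2560 samples per channel, which is the value of N used for the entropy calculations.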
The approximate and sample entropies were computed using EntropyHub [34], the spectral and SVD entropies were calculated using MNE-features [35], and the Renyi entropy was estimated using DIT [36].
Approximate entropy (ApEn) was introduced by Pincus [37] to define irregularity in sequences and time series data [38]. Formally, given N data points from a time series {x(n)} = x(1), x(2), ..., x(N), the ApEn is calculated using two input parameters, a run length m and a tolerance window r, which must be fixed [38]. To define ApEn(m, r, N), form vector sequences $X(1), \ldots, X(N-m+1)$, defined by $X(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$. Then define the distance $d[X(i), X(j)]$ between vectors $X(i)$ and $X(j)$ as the maximum distance in their respective scalar components. For each $i \le N-m+1$, construct $C_i^m(r)$, defined as (the number of $X(j)$ such that $d[X(i), X(j)] \le r$)/$(N-m+1)$. Next, define $\Phi^m(r)$ as the average value of $\ln C_i^m(r)$. The ApEn is then defined in Equation (1) [38], where N is 2560 throughout this analysis.
$$\mathrm{ApEn}(m, r, N) = \Phi^{m}(r) - \Phi^{m+1}(r) \tag{1}$$
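The definition above translates fairly directly into NumPy. The following is a minimal sketch, not the EntropyHub implementation used in the study; the function name and the `r_sd` argument (the tolerance as a multiple of the signal's standard deviation) are illustrative choices. It builds the full pairwise distance matrix, so memory grows with N squared.

```python
import numpy as np

def approximate_entropy(x, m=2, r_sd=0.2):
    """ApEn(m, r, N) = Phi^m(r) - Phi^(m+1)(r), with r = r_sd * SD(x)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_sd * np.std(x)

    def phi(length):
        n_vec = N - length + 1
        # Chebyshev distance between all pairs of template vectors,
        # accumulated one scalar component at a time
        d = np.zeros((n_vec, n_vec))
        for k in range(length):
            seg = x[k:k + n_vec]
            d = np.maximum(d, np.abs(seg[:, None] - seg[None, :]))
        # C_i^m(r): fraction of vectors within r of vector i
        # (self-matches included, so C is never zero)
        C = np.sum(d <= r, axis=1) / n_vec
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```

A regular signal such as a sine wave should give a lower value than white noise of the same length, which is a quick sanity check for any implementation.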
Nevertheless, to avoid the occurrence of ln(0) in the calculation of ApEn, the algorithm includes self-matching, leading to a discussion of bias in this entropy metric [39]. Sample entropy (SampEn) was introduced by Richman and Moorman [39] as an improvement upon ApEn by reducing the dependency on record length and avoiding self-matching. To define SampEn(m, r, N) of a time series {x(n)} = x(1), x(2), ..., x(N), with a run length m and a tolerance window r, form vector sequences $X_m(1), \ldots, X_m(N-m+1)$, defined by $X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$, where $i = 1, \ldots, N-m+1$. The distance $d[X_m(i), X_m(j)]$ between vectors $X_m(i)$ and $X_m(j)$ is then defined as the maximum absolute distance between their respective scalar components. For each $i = 1, \ldots, N-m$, define $B_i^m(r)$ as $(N-m-1)^{-1}$ times the number of vectors $X_m(j)$ within r of $X_m(i)$, where $j = 1, \ldots, N-m$ and $j \ne i$ to exclude self-matches, and define $B^m(r)$ as the average value of $B_i^m(r)$. Then, increase the dimension to m + 1 and calculate $A_i^m(r)$ in the same way from the vectors $X_{m+1}(i)$, and $A^m(r)$ as the average value of $A_i^m(r)$. Therefore, $B^m(r)$ is the probability that two sequences will match for m points, whereas $A^m(r)$ is the probability that two sequences will match for m + 1 points. Sample entropy is then defined using Equation (2), which is estimated by the statistic in Equation (3), where N is 2560 throughout this analysis.
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{A^{m}(r)}{B^{m}(r)} \right\} \tag{2}$$
$$\mathrm{SampEn}(m, r, N) = -\ln \frac{A^{m}(r)}{B^{m}(r)} \tag{3}$$
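As with ApEn, the SampEn statistic can be sketched in a few lines of NumPy. This is an illustrative version under the Richman and Moorman definition, not the EntropyHub code used in the study; the function name and `r_sd` argument are hypothetical. Because the same N - m templates are used for both lengths, the normalising constants cancel and only the raw match counts are needed.

```python
import numpy as np

def sample_entropy(x, m=1, r_sd=0.15):
    """SampEn(m, r, N) = -ln(A / B), with r = r_sd * SD(x)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_sd * np.std(x)
    n_vec = N - m  # number of templates used for both lengths m and m + 1

    def count_matches(length):
        # Chebyshev distance between all template pairs of this length
        d = np.zeros((n_vec, n_vec))
        for k in range(length):
            seg = x[k:k + n_vec]
            d = np.maximum(d, np.abs(seg[:, None] - seg[None, :]))
        matches = d <= r
        np.fill_diagonal(matches, False)  # exclude self-matches
        return matches.sum()

    B = count_matches(m)      # pairs matching for m points
    A = count_matches(m + 1)  # pairs matching for m + 1 points
    if A == 0 or B == 0:
        return np.inf         # undefined when no matches are found
    return -np.log(A / B)
```

Since every pair that matches for m + 1 points also matches for m points, A never exceeds B and the result is non-negative.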
Since both ApEn and SampEn are highly dependent on the input parameters run length m and tolerance window r, these values require selection. For both entropies, the recommended values are m = 1 or 2 and r between 0.1 and 0.25 times the standard deviation (SD) of the input time series x(n) [39]. Therefore, parameter combinations were tested with a grid search over m = [1, 2] and a set of multipliers r_SD, where r = r_SD × SD and SD is the standard deviation of the input time series. To avoid overfitting the data, a subset of ten patients per class was selected for this analysis. ApEn and SampEn were extracted from this subset using each combination of m and r. These features were then input to a support vector machine (SVM) with a radial basis function (RBF) kernel and validated with 5-fold cross validation. The m and r combination that returned the highest average balanced accuracy from the classifiers was then selected as the input parameters to be used for the analysis with the full dataset. The specifics of the machine learning aspects of this process are described below.
Spectral entropy (SpecEn) finds the Shannon entropy [40] of the power spectrum and is calculated using Equation (4), where $p_i$ is the probability distribution of the power spectrum of the time series, i is one of the discrete states (assuming a bin width of one spectral unit), the sum of $p_i$ is 1, and Ω is the number of discrete states [41].
$$\mathrm{SpecEn} = -\sum_{i=1}^{\Omega} p_i \log p_i \tag{4}$$
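A minimal sketch of this calculation follows. It swaps in a plain FFT periodogram for the spectrum estimate, whereas the study used MNE-features; the function name and the choice of a base-2 logarithm are assumptions made for illustration.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy (in bits) of the normalised power spectrum of x."""
    x = np.asarray(x, dtype=float)
    # Periodogram estimate of the power spectrum (mean removed first)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    total = psd.sum()
    if total == 0:
        return 0.0           # constant signal: no spectral content
    p = psd / total          # normalise so the bins sum to 1
    p = p[p > 0]             # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))
```

A pure tone concentrates its power in one bin and gives an entropy near zero, while white noise spreads power across all bins and approaches the maximum of log2(Ω).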
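SVD entropy (SVDEn) was defined by Alter et al. [42]. SVD is a matrix orthogonalisation decomposition method, so for a time series {x(n)} = x(1), x(2), ..., x(N) the Hankel matrix $H_{m \times n}$ can be reconstructed as in Equation (5), where 1 < n < N and m = N − n + 1 [43].
$$H_{m \times n} = \begin{bmatrix} x(1) & x(2) & \cdots & x(n) \\ x(2) & x(3) & \cdots & x(n+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-n+1) & x(N-n+2) & \cdots & x(N) \end{bmatrix} \tag{5}$$
The SVD of $H_{m \times n}$ can be defined as in Equation (6), where the left singular vectors $U_{m \times m}$ and right singular vectors $V_{n \times n}$ are orthogonal matrices, and $\Sigma_{m \times n}$ is a diagonal matrix composed of singular values ($\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_L \ge 0$, $L = \min(m, n)$) [43].
$$H_{m \times n} = U_{m \times m} \, \Sigma_{m \times n} \, V_{n \times n}^{T} \tag{6}$$
In this space, matrix $H_{m \times n}$ satisfies $\langle k|\Sigma|l \rangle \equiv \sigma_l \delta_{kl} \ge 0$ for all $1 \le k, l \le L$ [42]. The normalised eigenvalues are defined in Equation (7), which indicates the relative significance of the lth eigenvalue and eigenvector in terms of the fraction of the overall expression that they capture [42].
$$p_l = \frac{\sigma_l^{2}}{\sum_{k=1}^{L} \sigma_k^{2}} \tag{7}$$
Then the SVD entropy of the dataset X is as shown in Equation (8) [42]:
$$\mathrm{SVDEn} = -\frac{1}{\log L} \sum_{k=1}^{L} p_k \log p_k \tag{8}$$
Renyi entropy (REn) estimates the spectral complexity of a signal and is calculated using Equation (9), where the order α ≥ 0 and α ≠ 1, $p_i$ is the probability distribution of the time series, i is one of the discrete states, and Ω is the number of discrete states [44].
$$\mathrm{REn} = \frac{1}{1-\alpha} \log \sum_{i=1}^{\Omega} p_i^{\alpha} \tag{9}$$
For this analysis, α = 2 to replicate [10] for ease of comparison with this study.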
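Both measures can be sketched briefly in NumPy. The SVD entropy below follows the normalised form of Alter et al., which may differ in detail from the MNE-features routine used in the study, and the number of Hankel columns `n_cols` is an arbitrary illustrative value. For the Renyi entropy, the study used DIT; here the probability distribution is assumed to come from a simple amplitude histogram, and `histogram_probs`, the bin count, and the base-2 logarithm are all assumptions.

```python
import numpy as np

def svd_entropy(x, n_cols=10):
    """Normalised SVD entropy of the Hankel matrix built from x."""
    x = np.asarray(x, dtype=float)
    m_rows = len(x) - n_cols + 1
    # Row i of the Hankel matrix is x(i), ..., x(i + n_cols - 1)
    H = np.array([x[i:i + n_cols] for i in range(m_rows)])
    s = np.linalg.svd(H, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)   # normalised eigenvalues
    p = p[p > 0]                  # treat 0 * log(0) as 0
    L = min(m_rows, n_cols)
    return -np.sum(p * np.log(p)) / np.log(L)

def histogram_probs(x, bins=16):
    """Hypothetical helper: amplitude histogram as a probability vector."""
    counts, _ = np.histogram(x, bins=bins)
    return counts / counts.sum()

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy (in bits) of order alpha for a probability vector p."""
    if alpha == 1:
        raise ValueError("alpha = 1 is excluded; use Shannon entropy instead")
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)
```

For example, a uniform distribution over four states has a Renyi entropy of 2 bits for any valid order, and a distribution with all mass on one state has an entropy of 0.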
Wavelet entropy (WaveEn) is a measure of the degree of disorder associated with the multi-frequency signal response. The wavelet coefficients $C_{i,j}$ were found using wavelet decomposition, where i is the time index and j is the index of the different resolution levels. The energy for each time i and level j can be found using Equation (10) [45].
The mean energy was then calculated using Equation (11), where the index k is the mean value in successive time windows, which will now give the time evolution; $k_0$ is the starting value of the time window ($k_0$ = 1, 1 + ∆t, 1 + 2∆t, ...); and n is the number of wavelet coefficients in the time window for each resolution level [45]. The probability distribution for each level can be defined using Equation (12) [45].
Following the definition of Shannon entropy [40], the time-varying wavelet entropy was found using Equation (13) [45]. More details can be found at [46].
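One reading of this procedure, for a single time window, is sketched below. It assumes the wavelet coefficients have already been computed and arranged as a (time, level) array, and that the energy is the squared magnitude of each coefficient; the function name and the natural logarithm are illustrative assumptions rather than the study's implementation.

```python
import numpy as np

def wavelet_entropy(coeffs):
    """Wavelet entropy of one time window.

    coeffs: array of shape (n_times, n_levels) holding the wavelet
    coefficients C[i, j] for time index i and resolution level j.
    """
    coeffs = np.asarray(coeffs)
    energy = np.abs(coeffs) ** 2          # energy per time and level
    mean_energy = energy.mean(axis=0)     # mean energy per level in the window
    p = mean_energy / mean_energy.sum()   # relative energy per level
    p = p[p > 0]                          # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))         # Shannon entropy across levels
```

Energy spread evenly over J levels gives the maximum value ln(J), and energy confined to one level gives 0; sliding the window along the time axis yields the time-varying entropy.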
For this analysis, Morlet wavelets were used since they are commonly used in EEG research [47]. Once these features had been extracted from every channel for every epoch in every band, they were used to train and test four machine learning classifiers: SVM, k-nearest neighbours (kNN), random forest (RF), and gradient boosting machine (GBM). These models were implemented using the scikit-learn python package [48].
SVMs were introduced in [49] and classify by searching for an optimal hyperplane that separates the classes. If the data are separable, the hyperplane maximises a margin around itself that does not contain any data, creating boundaries for the classes. Otherwise, the algorithm establishes a penalty on the length of the margin for every observation that is on the wrong side. The SVM classifiers used in this analysis used an RBF kernel, which allows a non-linear decision boundary. The RBF kernel between two patterns x and x′ is calculated using Equation (14).
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right) \tag{14}$$
In this case, γ was taken as 1/(number of features × variance of the data).
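The kernel and this choice of γ can be written out as a small NumPy sketch. The function names are hypothetical; the variance is taken over all entries of the feature matrix, which is an assumption about how "variance of the data" is computed.

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    """RBF kernel value exp(-gamma * ||x - x'||^2) for two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def scaled_gamma(X):
    """gamma = 1 / (number of features * variance of the data).

    X has shape (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())
```

Identical patterns give a kernel value of 1, and the value decays towards 0 as the patterns move apart, at a rate set by γ.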
The kNN algorithm is based on the idea that similar groups will cluster. The model is trained by 'plotting' observations based on their features, presumably with the classes clustering. The algorithm is tested by plotting an observation and classifying it based on the class of the nearest neighbours. The number of nearest neighbours, k, was individually selected by a grid search that tested 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, and 20 neighbours. This defined k as the value that returned the highest balanced accuracy with ten-fold cross validation.
RF was introduced by [50] and is based on randomised decision trees. Decision trees are flowchart-like structures that predict the value of a target variable by learning a series of simple decision rules based on the training data. RF uses an ensemble of trees, each with a different random subset of the features in a method called bootstrap aggregating, or bagging. This decreases the variance, compared to an individual decision tree, and reduces the risk of overfitting. The class was then taken as the average of the trees' probabilistic predictions, whereas the original publication [50] let each tree vote for a single class.
GBMs are ensembles of weak learners, typically decision trees, and were introduced by [51,52]. GBMs are similar to gradient descents in a functional space. The model is built by adding a new tree with every iteration. The new tree is fitted to minimise the sum of the losses of the (now previous) model. For binary classification, a prediction is made based on the probability that the sample belongs to the positive class. This is found by applying the sigmoid function to the tree ensemble.
To classify the feature set, ten-fold cross validation was used to define the training and testing datasets. Since the classes in this dataset are imbalanced with more PNES data, the epilepsy data in the training set were oversampled using the synthetic minority over-sampling technique (SMOTE). The feature space was then reduced using principal component analysis (PCA), retaining 95% of the variance.
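The two training-set steps can be illustrated with a simplified NumPy sketch. It is not the implementation used in the study: the function names, the neighbour count `k`, and the seed are assumptions, and the SMOTE variant shown needs at least two minority samples. The PCA is fitted on the training data only, and the same projection is then applied to the test fold.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # position along the joining segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def fit_pca(X_train, var_kept=0.95):
    """Return the training mean and the leading principal axes that
    together explain at least var_kept of the training variance."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_comp = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return mean, Vt[:n_comp]

def apply_pca(X, mean, components):
    """Project data onto the principal axes fitted on the training set."""
    return (np.asarray(X, dtype=float) - mean) @ components.T
```

Oversampling only the training fold keeps synthetic samples out of the test data, so the reported metrics reflect real recordings.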
Precision, recall, and balanced accuracy were used to evaluate the classifiers' predictions of test data. Since the dataset was imbalanced, these metrics were selected to avoid the inflated performance that plain accuracy can report on imbalanced datasets. Equations (15)-(17) show the calculations for these performance metrics.
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{15}$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{16}$$
$$\mathrm{balanced\ accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \tag{17}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. Here, PNES is the positive class and epilepsy is the negative class.
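As a worked illustration, the three metrics can be computed from predicted and true labels as follows. The function name and the label coding (1 for the positive class) are assumptions made for the example.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, balanced accuracy) for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    balanced_accuracy = 0.5 * (recall + specificity)
    return precision, recall, balanced_accuracy
```

For instance, with four positive and two negative samples, three true positives, one false negative, one false positive, and one true negative, the precision and recall are both 0.75 and the balanced accuracy is 0.625, whereas plain accuracy would be 0.667.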
Permutation feature importance was also used to compare the EEG frequency bands. This was done by adapting the algorithm [50] to include multiple features. A model m was fitted using training data, and then a reference score s was defined using the validation data D. Each feature (channel) of the set (band) to be assessed, $f_{n:o}$, was then permuted (randomly shuffled) in order to corrupt the validation samples of that band and give $D_{k,n:o}$. The score $s_{k,n:o}$ of model m on this corrupted validation dataset was then computed. This process of permuting and calculating the score $s_{k,n:o}$ was repeated K times, with iteration index k. The importance $i_{n:o}$ of the feature set (band) $f_{n:o}$ is then defined using Equation (18).
$$i_{n:o} = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,n:o} \tag{18}$$
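The band-level permutation procedure can be sketched as below. The function and argument names are hypothetical: `score_fn` stands for any callable that scores an already fitted model on validation data, and `cols` lists the column indices belonging to the band being assessed. Each column is shuffled independently, which is one reading of the description above.

```python
import numpy as np

def band_importance(score_fn, X_val, y_val, cols, n_repeats=10, seed=0):
    """Reference score minus the mean score after shuffling the columns
    in cols (the channels of one band) across the validation samples."""
    rng = np.random.default_rng(seed)
    X_val = np.asarray(X_val, dtype=float)
    reference = score_fn(X_val, y_val)
    corrupted_scores = []
    for _ in range(n_repeats):
        X_perm = X_val.copy()
        for c in cols:
            # Shuffle this channel's values to break its link with the labels
            X_perm[:, c] = rng.permutation(X_val[:, c])
        corrupted_scores.append(score_fn(X_perm, y_val))
    return reference - np.mean(corrupted_scores)
```

A band the model does not rely on yields an importance near zero, while shuffling an informative band lowers the score and yields a positive importance.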

Results
The grid search to establish the ideal values for m and r_SD found that the highest average accuracy across the bands was returned when m = 2 and r_SD = 0.2 for ApEn and when m = 1 and r_SD = 0.15 for SampEn. These parameters were then used to extract the ApEn and SampEn from the full dataset. The accuracies from these tests can be found in the Supplementary Materials.
Using the methods described, the balanced accuracies returned are reported in Table 1. Tables containing the precision and recall can be found in the Supplementary Materials. Table 1 shows a range of balanced accuracies with only two instances returning below chance (50%). The highest accuracy was 94.68%, with 96.12% precision and 95.19% recall, which was obtained by Renyi entropy with a kNN classifier in the 'all' band. Generally, the lowest performing entropy measure was wavelet entropy, and the best was Renyi entropy. Overall, the lowest accuracies were obtained in the gamma band, and the highest were returned when all the EEG bands were combined (the 'all' band).
When comparing the entropy measures and the frequency bands, it is possible to group the measures into three different trends: Renyi entropy; sample, approximate, SVD, and spectral entropy; and wavelet entropy. Wavelet entropy was the measure returning the lowest accuracies with a mean of 53.24 ± 3.18%. This measure returned higher accuracies in the 'all' and theta bands, and the lowest accuracies in the alpha, beta, and gamma bands.
Sample, approximate, SVD, and spectral entropy returned higher accuracies in the 'all' and broad bands. The combined 'all' band improved the SVM, kNN, and GBM classifiers. The RF, however, only showed a slight increase. The SVM accuracy was significantly improved (over 12% increase, excluding spectral entropy) by the 'all' band for all these measures, as well as the kNN (over 9% increase, excluding spectral entropy). The delta, theta, alpha, and beta bands returned medium accuracies, and the gamma band returned a further drop in classifier performance. These measures typically outperformed wavelet entropy by a large margin, with means of 67.71 ± 7.29%, 68.97 ± 8.13%, 65.23 ± 7.44%, and 65.16 ± 5.93%, respectively.
The Renyi entropy was overall the highest performing entropy measure, with a mean of 82.48 ± 4.20%. In the broad band, the accuracies of this measure were only somewhat higher than those of the sample, approximate, SVD, and spectral entropies. However, the accuracies for Renyi entropy increased in the theta, alpha, beta, and gamma bands. In comparison, the accuracy for the other measures remained stable or decreased in these bands, especially gamma. The combination of 'all' bands improved the accuracy, especially for the SVM, which increased by 10.86%. As a result, most of the classifiers in the 'all' band were able to achieve over 90%.
The best classifiers were kNN and RF and, generally, the higher the overall accuracy for a band and/or feature, the bigger the difference between these two and the other two classifiers. Overall, RF was the better classifier. However, the kNN returned the highest accuracy value since it, along with the SVM, was greatly improved by combining all the bands, whereas RF and GBM were less affected. Furthermore, Table 1 shows that GBM was often the lowest performing classifier.
Since the combination of the bands performed well, a further experiment was conducted to establish which specific bands were contributing to the high accuracy. Using the same process as described above, each band was excluded from the full set and the remaining bands were used for classification. The ECG signal was also used as an input for each band. This experiment used the highest performing classifier, kNN, and the highest performing entropy metric, Renyi entropy, and the outcomes are summarised in Table 2. The importance of the band reported is the average permutation band importance over ten-fold cross validation. Table 2 shows that removing a single band had only a minor effect on the precision and recall, and therefore on the balanced accuracy. Excluding broad and delta increased the accuracy to 95.03% and 94.93%, respectively, from 94.68% when all bands were used. However, excluding any of the others resulted in a loss of 0.60% or more. Therefore, the theta, alpha, beta, and gamma bands contain important information for Renyi entropy. The band importance from the permutation-based testing is congruent with these findings, with the broad and delta bands returning half the permutation importance of the other bands. These findings also agree with the trend shown in Table 1 for the Renyi entropy, where broad and delta slightly underperformed compared to the other four non-combination bands.

Discussion
Spectral and wavelet entropy were both found by calculating the Shannon entropy of the frequency spectrum, where spectral entropy estimated the spectrum using Welch's method and wavelet entropy used Morlet wavelets. Despite these similarities, the resultant accuracies were significantly different, with spectral entropy outperforming wavelet entropy in every band and with every classifier. This suggests that Welch's method is more suitable for extracting the uncertainty in the frequency domain for this specific task. Furthermore, the spectral and wavelet entropies returned the lowest accuracies, on average, of all the measures. Therefore, our results suggest that, for these data, measures of complexity in the time domain may be more effective than those in the frequency domain. The measure that returned the highest accuracy, Renyi entropy, is a variation of Shannon entropy applied directly to the time series. This lends further support to the effectiveness of temporal complexity, and further research should explore similar methods.
While the classifier performances for most entropy measures were improved by combining all frequency bands, generally the SVM and kNN improved more significantly than the decision tree-based algorithms, especially RF. Decision trees do not need to increase the parameters with more inputs, so it is possible that the extra information was lost for these model types. Furthermore, the nature of an ensemble of random subsamples of the feature set, as is the case with RF, may have hindered the classifier's ability to consider the extra information. This could be the cause of the limited improvement and occasional degradation of the RF when combining the bands, despite the high performance in the non-combination bands. Therefore, feature selection methods, such as feature ranking, should be used with this classifier to potentially improve accuracy with larger feature sets.
A 2021 meta-analysis on resting state EEGs for the diagnosis of epilepsy and PNES [53] found that comparing oscillations along the theta band may separate epilepsy and PNES. Reuber et al. [4] also found interictal slow rhythms in the theta band for nine out of 50 PNES patients. When considering only the delta, theta, alpha, beta, and gamma bands, the current analysis found that the theta band returned the highest balanced accuracy for 13 out of 24 (four classifiers for six entropy measures) instances, indicating that a difference in theta oscillations could be reflected in the entropy. However, the beta band returned the highest accuracy in 8 of these 24 instances, especially for the spectral entropy. Therefore, the beta band could also be of interest to future researchers.
Comparison to the literature is complex due to the difference in techniques used to analyse the EEGs of PNES patients. For instance, Pyrzowski et al. [11] extracted the entropy from pooled histograms of the zero crossing rate, and the six-paper series [20-25] and Cura et al. [27] only used one or two entropy measures as part of a larger feature set, obscuring the influence of the entropy. Furthermore, [11,20-25] included non-PNES subjects within their subject cohorts. The papers that included the ECG [15-17] all analysed the entropy of the heart rate data, a binary signal representing the R peaks, instead of the ECG signal itself. While these studies do represent the potential of entropy for this diagnostic task, the fundamental difference in method makes comparisons with them impossible.
Gasparini et al. [13] and Lo Giudice et al. [14] both statistically analysed the entropy of the EEG signal. The authors of [13] found no differences between the Shannon or permutation entropies of PNES patients and healthy controls, and [14] found no difference in interictal permutation entropy between PNES and epilepsy subjects. Therefore, statistical analysis alone may not be sufficient to differentiate between these groups.
The studies published by Ahmadi et al. [10,26] give details of the performance of similar entropy measures and classifiers in the frequency bands and use PNES-only and epilepsy-only groups. Thus, an in-depth comparison with the current study is possible, although neither study used an ECG channel, only EEGs, and both included only the interictal state. The 2018 study [26] used an imperialist competitive algorithm to rank the individual feature-band pairs and listed the top five combinations of inputs for each classifier. They found that RF and decision trees were the weaker classifiers, compared to SVM-Linear, SVM-RBF, and GBM. However, the current analysis found that RF was overall the best classifier, with GBM underperforming. Ahmadi et al. (2018) also found that spectral and Renyi entropies were the most important features, compared to Shannon entropy, Higuchi fractal dimension, and Katz fractal dimension. The current study did not extract Shannon entropy or any fractal dimensions, so a direct comparison cannot be made. However, this analysis did find that Renyi entropy was a very high-performing metric for all bands, and spectral entropy was better than chance (50% accuracy) for all tests. Ahmadi et al. (2018) do not directly compare the frequency bands, though gamma is not listed in the features for any of the top performing inputs. This is congruent with the current study, since gamma underperformed for most entropy measures, including spectral entropy. The outlier is Renyi entropy, which retained high accuracies in the gamma band in the current analysis. Furthermore, broad band Renyi entropy was listed by [26] for most of the top performing combinations. By comparison, the current study found that Renyi entropy returned the highest accuracies of any measure in the broad band, but that its broad band accuracies were lower than its accuracies in the other bands.
In addition, the delta band is not noted as important by [26] for either entropy measure; therefore, it was found to be less important for these features, which is in agreement with the findings of the current analysis.
The study by Ahmadi et al. (2020) [10] gave a clearer breakdown of the bands for the Shannon, spectral, and Renyi entropy, although only the precision and recall values were reported, not accuracy, and the broad band was not analysed. In addition, the values reported for the delta and theta bands are exactly the same, which is statistically unlikely and is not reflected in the ROC curves also given. Therefore, the values reported in the current version of that paper for one of these bands may be incorrect. The delta, theta, and gamma bands for all entropy measures and Shannon entropy in the alpha band all return low performance metrics of roughly chance accuracy. The beta band, and spectral and Renyi entropy in the alpha band, however, return mostly 70% precision and 60% recall. ROC analysis showed that the beta band outperformed the delta, theta, alpha, and gamma bands. The alpha band performed well, but much worse than the beta. The delta and theta bands were similar to random chance, and gamma distinctly underperformed for all measures. For the current analysis, the Renyi entropy does show that delta is one of the bands less likely to help differentiate PNES from epilepsy, but disagrees for the theta, alpha, beta, and gamma bands, which all return good and fairly similar accuracies. The trends reported by Ahmadi et al. (2020) were more similar to those for the sample, approximate, spectral, and SVD entropies, where gamma significantly underperformed. Spectral entropy also showed a slight increase in beta band accuracies, but it was the only measure to do so, and delta performed on par with the other bands.
A limitation of our study is that the two classes are not age- or sex-matched. The ages are similar enough that significant influence is unlikely. However, the PNES group has significantly more females than males, whereas the epilepsy group has more males than females. This is due to PNES being more commonly diagnosed in females than males by a ratio of 3:1 [54,55]. In previous studies [56–58], machine learning has been successfully used to separate EEG entropy measures of females and males; therefore, it is possible that the balanced accuracies were inflated by the disparity in sex between the two groups. To check whether this disparity had a significant impact, the model that returned the highest accuracy (Renyi entropy with a kNN classifier, with the delta, theta, alpha, beta, and gamma bands input as separate features) was trained and tested again with a subset of subjects that were age- and sex-matched. This matched dataset included 50 subjects with a ratio of 11 females to 14 males in both classes; the epilepsy group had a mean age of 39.16 ± 11.86 years and the PNES group 38.52 ± 10.96 years. The accuracy, precision, and recall of the matched dataset were 95.40%, 97.10%, and 93.33%. Thus, compared with the full dataset, the balanced accuracy and precision increased slightly while the recall decreased slightly. Considering this outcome and the similarities in the literature, it is still reasonable to conclude that the difference in sexes between the classes had a minor impact and that entropy measures are indeed powerful measures in differentially diagnosing PNES and epilepsy. Another limitation is that the data include both preictal (before seizure) and interictal (resting) recordings. Therefore, it is not possible to separate the impacts of these different types of data on the results. Finally, due to a small patient cohort, the current study used epoch-wise tenfold cross-validation to assess the classifiers. Therefore, samples from each subject were present in both the training and testing datasets. While this is a limitation, the results demonstrate that this method is viable and, if trained on a larger population, could be beneficial in clinical contexts.
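The cross-validation limitation can be made concrete with a short sketch comparing an epoch-wise fold assignment with a subject-wise one, in which all epochs from a subject are kept in the same fold. The cohort size of 77 subjects matches the abstract, but the 20 epochs per subject, the function names, and the round-robin assignment are assumptions for illustration only; this is not the splitting code used in the study, and it ignores class stratification.

```python
import random


def epoch_wise_folds(n_epochs, n_folds=10, seed=0):
    """Shuffle epochs and deal them into folds, ignoring which subject they came from."""
    order = list(range(n_epochs))
    random.Random(seed).shuffle(order)
    folds = [0] * n_epochs
    for position, epoch_index in enumerate(order):
        folds[epoch_index] = position % n_folds
    return folds


def subject_wise_folds(subject_ids, n_folds=10, seed=0):
    """Shuffle subjects and deal them into folds; every epoch inherits its subject's fold."""
    unique_subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(unique_subjects)
    fold_of_subject = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    return [fold_of_subject[s] for s in subject_ids]


def shared_subjects(subject_ids, folds, test_fold):
    """Subjects that contribute epochs to both the training and the test split."""
    test = {s for s, f in zip(subject_ids, folds) if f == test_fold}
    train = {s for s, f in zip(subject_ids, folds) if f != test_fold}
    return test & train


# Hypothetical layout: 77 subjects with 20 epochs each (epoch count is an assumption).
subject_ids = [subject for subject in range(77) for _ in range(20)]

leaky = epoch_wise_folds(len(subject_ids))
grouped = subject_wise_folds(subject_ids)

print("epoch-wise, fold 0 overlap:", len(shared_subjects(subject_ids, leaky, 0)))
print("subject-wise, fold 0 overlap:", len(shared_subjects(subject_ids, grouped, 0)))
```

With the subject-wise assignment the overlap is empty for every fold, so test accuracy reflects performance on unseen subjects. Libraries such as scikit-learn provide comparable splitters (for example `GroupKFold`), which would be the more practical choice with a larger cohort.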

Conclusions
This study shows that the analysis of different frequency bands in the EEG, plus the ECG, with different entropy algorithms returns useful information for the classification of PNES. Furthermore, the bands providing the highest accuracy vary between entropy measures. Therefore, combining bands for classification by machine learning algorithms can return higher accuracies. While this would increase the computation cost, entropy measures are quick and low-cost; therefore, the added computation is a small cost compared to the improved performance. The current analysis found that the highest balanced accuracy, 95.03%, was returned by the delta, theta, alpha, beta, and gamma bands combined for the Renyi entropy when a kNN was used in the classification. However, this high performance may have been affected by the use of epoch-wise tenfold cross-validation. The kNN and RF classifiers returned the overall highest accuracies, with the GBM repeatedly underperforming compared to the others, and SVM and kNN showed more improvement with the combination of the bands. Further analysis should explore the combination of further low-cost features to increase the performance and improve the robustness of the classifiers for different patients.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e24101348/s1, Table S1: Balanced accuracies of the approximate entropy of the small subset of data with the test values of m and r SD. Average reports the average accuracy across the band. Ordered from the highest average balanced accuracy to the lowest in the Average column; Table S2: Balanced accuracies of the sample entropy of the small subset of data with the test values of m and r SD. Average reports the average accuracy across the band. Ordered from the highest average balanced accuracy to the lowest in the Average column; Table S3: Precision of the entropy metrics for every classifier and EEG frequency band (ECG is included in every band). Bold values denote the highest precision amongst the classifiers for each EEG band and entropy measure; Table S4: Recalls of the entropy metrics for every classifier and EEG frequency band (ECG is included in every band). Bold values denote the highest recall amongst the classifiers for each EEG band and entropy measure.

Institutional Review Board Statement:
The study was approved by the Ethics Committee of Fulham, London as part of a larger study on biomarkers in functional seizures (IRAS 231863, REC 18/LO/0328, 18 July 2018).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data used in this study were provided by St George's Hospital and are not publicly available.