# Evaluation of Unsupervised Anomaly Detection Techniques in Labelling Epileptic Seizures on Human EEG

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Dataset

#### 2.2. Data Processing

#### 2.3. Feature Engineering

- 30 Hz raw—the vector of all 300 values of the WP in the 1–30 Hz band;
- 3 Hz raw—the sub-vector of 30 values of the WP in the 1–3 Hz band;
- 30 Hz mean—the mean value of the WP in the 1–30 Hz band;
- 3 Hz mean—the mean value of the WP in the 1–3 Hz band;
- 30 Hz PCA—PCA-based feature obtained after decomposing the WP in the 1–30 Hz band and considering four PCs that explain $90\%$ of the variance;
- 3 Hz PCA—PCA-based feature obtained after decomposing the WP in the 1–3 Hz band and considering four PCs that explain $98\%$ of the variance.
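The six feature types listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the wavelet power is taken to be a matrix `wp` with one row per time window and 300 columns covering 1–30 Hz, the first 30 columns are assumed to cover 1–3 Hz, and PCA is computed with a plain SVD rather than with whatever library the authors used. The function name `build_features` is hypothetical.

```python
import numpy as np


def build_features(wp, n_low=30, n_pcs=4):
    """Build the six feature types from a wavelet-power matrix.

    wp: array of shape (n_windows, n_freqs), WP in the 1-30 Hz band.
    n_low: number of leading columns assumed to cover 1-3 Hz (assumption).
    n_pcs: number of principal components to keep.
    """
    wp = np.asarray(wp, dtype=float)
    low = wp[:, :n_low]

    def pca_scores(x, k):
        # Centre the data; the rows of vt are the principal axes.
        centered = x - x.mean(axis=0)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        explained = (s ** 2) / np.sum(s ** 2)
        return centered @ vt[:k].T, float(explained[:k].sum())

    pca30, var30 = pca_scores(wp, n_pcs)
    pca3, var3 = pca_scores(low, n_pcs)

    features = {
        "30 Hz raw": wp,
        "3 Hz raw": low,
        "30 Hz mean": wp.mean(axis=1, keepdims=True),
        "3 Hz mean": low.mean(axis=1, keepdims=True),
        "30 Hz PCA": pca30,
        "3 Hz PCA": pca3,
    }
    explained_variance = {"30 Hz PCA": var30, "3 Hz PCA": var3}
    return features, explained_variance
```

The second return value gives the fraction of variance captured by the retained PCs, which can be compared against the 90% and 98% figures quoted in the list.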

- the overall distribution of the WP across frequencies reflected by the feature 30 Hz raw;
- the mean value of the WP in the spectrum reflected by the feature 30 Hz mean;
- the WP values at dominant frequencies reflected by the feature 30 Hz PCA.

#### 2.4. Machine Learning Methods

#### 2.4.1. One-Class Support Vector Machine

- **Kernel type** is the same as that used in standard SVM classifiers;
- Threshold parameter (**nu**) indicates the expected percentage of outliers in the data;
- Kernel coefficient (**gamma**) determines how tightly the decision boundary wraps around the data;
- Stopping criterion (**tol**) stops the algorithm when the difference between the old and new loss values becomes less than **tol**.
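The hyperparameter names above match those of scikit-learn's `OneClassSVM`, so a minimal sketch with that class may help. Using scikit-learn is an assumption here, and the data are synthetic stand-ins rather than EEG features; the values `nu=0.1`, `gamma="scale"`, `tol=0.1` and `kernel="rbf"` are the optimal values reported in Table 2.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in data: a dense cloud plus a few distant points.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))
distant = rng.normal(8.0, 1.0, size=(5, 4))
X = np.vstack([normal, distant])

# In scikit-learn, nu is an upper bound on the fraction of training
# errors and a lower bound on the fraction of support vectors.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale", tol=0.1)

# fit_predict returns +1 for inliers and -1 for outliers.
labels = model.fit_predict(X)
is_outlier = labels == -1
```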

#### 2.4.2. k-Nearest Neighbors

- **Algorithm** is the parameter responsible for the method used for distance calculation;
- **n_neighbors** defines the number of nearest neighbors;
- **Threshold** defines the decision boundary, i.e., data with a distance exceeding the threshold are labelled as outliers.
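A distance-based kNN outlier score can be sketched as follows. The sketch assumes Euclidean distance and reads "Threshold, %" as the share of points with the largest scores that are flagged; that reading is an assumption, not something the text confirms. The function names are hypothetical, and the brute-force pairwise distance matrix is only suitable for small datasets.

```python
import numpy as np


def knn_distance(X, k):
    """Distance from each point to its k-th nearest neighbour (Euclidean)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbour.
    return np.sort(dist, axis=1)[:, k]


def flag_outliers(scores, threshold_pct):
    """Flag points whose score exceeds the (100 - threshold_pct) percentile."""
    cutoff = np.percentile(scores, 100.0 - threshold_pct)
    return scores > cutoff
```

For example, `flag_outliers(knn_distance(X, k=3), threshold_pct=0.5)` uses the values reported for the raw features in Table 2.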

#### 2.4.3. Local Nearest Neighbors Distance

If **A** is a given point and **B** is its kth nearest neighbour, then the localized distance is the distance from **A** to **B**, divided by the distance from **B** to its kth nearest neighbour. Hyperparameters for LNND are the same as the ones for kNN.
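The localized distance defined above translates directly into code. The sketch below assumes Euclidean distance and a brute-force distance matrix; the function name `lnnd_scores` is hypothetical, and duplicate points (which would give a zero denominator) are not handled.

```python
import numpy as np


def lnnd_scores(X, k):
    """Localized nearest-neighbour distance for every row of X.

    score(A) = d(A, B) / d(B, k-th neighbour of B),
    where B is the k-th nearest neighbour of A.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Column 0 of the sorted order is the point itself, column k is B.
    order = np.argsort(dist, axis=1)
    b_idx = order[:, k]

    d_ab = dist[np.arange(len(X)), b_idx]  # distance from A to B
    d_b = d_ab[b_idx]                      # distance from B to its own k-th neighbour
    return d_ab / d_b
```

Points inside a dense cluster get scores near 1, while an isolated point gets a large score because its neighbour **B** sits in a much denser region.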

#### 2.4.4. Local Outlier Factor

- **Algorithm** defines a distance measure;
- **n_neighbors** is the number of neighbours;
- **Contamination** sets the percentage of outliers in the dataset.
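A minimal LOF sketch using scikit-learn's `LocalOutlierFactor` is shown below. Using scikit-learn is an assumption, as is mapping the distance measure described above onto the `metric` argument (in scikit-learn, `algorithm` selects the neighbour-search structure instead). The data are synthetic; `n_neighbors=8` and `contamination=0.005` are values reported in Table 2.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data: one cluster plus two isolated points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(400, 2)), [[20.0, 20.0], [-20.0, 15.0]]])

lof = LocalOutlierFactor(n_neighbors=8, metric="euclidean", contamination=0.005)

# fit_predict returns +1 for inliers and -1 for outliers.
labels = lof.fit_predict(X)

# negative_outlier_factor_ stores -LOF; values of LOF well above 1
# indicate points in sparser regions than their neighbours.
lof_scores = -lof.negative_outlier_factor_
```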

#### 2.4.5. Isolation Forest

**contamination** [39].
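The role of **contamination** can be illustrated with scikit-learn's `IsolationForest`. This is a sketch on synthetic data and assumes scikit-learn; `contamination=0.005` is one of the values reported in Table 2, and `random_state` is fixed only to make the example repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data: one cluster plus two isolated points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(400, 2)), [[20.0, 20.0], [-20.0, 15.0]]])

# contamination sets the share of samples that will be labelled as outliers.
forest = IsolationForest(contamination=0.005, random_state=0)

# fit_predict returns +1 for inliers and -1 for outliers.
labels = forest.fit_predict(X)

# Lower score_samples values mean shorter isolation paths (more anomalous).
anomaly_scores = forest.score_samples(X)
```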

#### 2.5. Evaluation and Hyperparameter Optimization

## 3. Results and Discussion

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
EEG | Electroencephalogram |
ML | Machine learning |
DL | Deep learning |
RNN | Recurrent neural networks |
LSTM | Long short-term memory |
DSS | Decision support system |
SVM | Support vector machine |
PC | Principal component |
ICA | Independent component analysis |
CWT | Continuous wavelet transform |
WP | Wavelet power |
PCA | Principal component analysis |
OCSVM | One-class support vector machine |
kNN | k-Nearest neighbours |
LNND | Local nearest neighbours distance |
LOF | Local outlier factor |
IF | Isolation forest |
TP | True positive |
FP | False positive |
FN | False negative |
rm ANOVA | Repeated measures analysis of variance |
sCSP | Sparse common spatial pattern |
FSST | Fourier transform-based synchrosqueezing transform |
CADS | Computer-aided diagnosis system |
TQWT | Tunable-Q wavelet transform |
CNN | Convolutional neural network |

## References

1. Beghi, E. The epidemiology of epilepsy. Neuroepidemiology **2020**, 54, 185–191.
2. Thijs, R.D.; Surges, R.; O’Brien, T.J.; Sander, J.W. Epilepsy in adults. Lancet **2019**, 393, 689–701.
3. Fisher, R.S.; Acevedo, C.; Arzimanoglou, A.; Bogacz, A.; Cross, J.H.; Elger, C.E.; Engel, J., Jr.; Forsgren, L.; French, J.A.; Glynn, M.; et al. ILAE official report: A practical clinical definition of epilepsy. Epilepsia **2014**, 55, 475–482.
4. Goldberg, E.M.; Coulter, D.A. Mechanisms of epileptogenesis: A convergence on neural circuit dysfunction. Nat. Rev. Neurosci. **2013**, 14, 337–349.
5. Motamedi, G.; Meador, K. Epilepsy and cognition. Epilepsy Behav. **2003**, 4, 25–38.
6. Elger, C.E.; Hoppe, C. Diagnostic challenges in epilepsy: Seizure under-reporting and seizure detection. Lancet Neurol. **2018**, 17, 279–288.
7. Friedman, D.E.; Hirsch, L.J. How long does it take to make an accurate diagnosis in an epilepsy monitoring unit? J. Clin. Neurophysiol. **2009**, 26, 213–217.
8. Tatum, W.O. Handbook of EEG Interpretation; Springer Publishing Company: Berlin/Heidelberg, Germany, 2021.
9. Amiri, M.; Aghaeinia, H.; Amindavar, H.R. Automatic epileptic seizure detection in EEG signals using sparse common spatial pattern and adaptive short-time Fourier transform-based synchrosqueezing transform. Biomed. Signal Process. Control **2023**, 79, 104022.
10. Malekzadeh, A.; Zare, A.; Yaghoobi, M.; Kobravi, H.R.; Alizadehsani, R. Epileptic seizures detection in EEG signals using fusion handcrafted and deep learning features. Sensors **2021**, 21, 7710.
11. Jiwani, N.; Gupta, K.; Sharif, M.H.U.; Adhikari, N.; Afreen, N. A LSTM-CNN Model for Epileptic Seizures Detection using EEG Signal. In Proceedings of the 2022 2nd International Conference on Emerging Smart Technologies and Applications (eSmarTA), Ibb, Yemen, 25–26 October 2022; pp. 1–5.
12. Khan, I.M.; Khan, M.M.; Farooq, O. Epileptic Seizure Detection using EEG Signals. In Proceedings of the 2022 5th International Conference on Computing and Informatics (ICCI), New Cairo, Egypt, 9–10 March 2022; pp. 111–117.
13. Siddiqui, M.K.; Morales-Menendez, R.; Huang, X.; Hussain, N. A review of epileptic seizure detection using machine learning classifiers. Brain Inform. **2020**, 7, 5.
14. Mohri, M.; Rostamizadeh, A.; Talwalkar, A. Foundations of Machine Learning; MIT Press: Cambridge, MA, USA, 2018.
15. Tzimourta, K.D.; Tzallas, A.T.; Giannakeas, N.; Astrakas, L.G.; Tsalikakis, D.G.; Angelidis, P.; Tsipouras, M.G. A robust methodology for classification of epileptic seizures in EEG signals. Health Technol. **2019**, 9, 135–142.
16. Ullah, I.; Hussain, M.; Qazi, E.-u.-H.; Aboalsamh, H. An automated system for epilepsy detection using EEG brain signals based on deep learning approach. Expert Syst. Appl. **2018**, 107, 61–71.
17. Pominova, M.; Artemov, A.; Sharaev, M.; Kondrateva, E.; Bernstein, A.; Burnaev, E. Voxelwise 3D convolutional and recurrent neural networks for epilepsy and depression diagnostics from structural and functional MRI data. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 299–307.
18. Abdelhameed, A.M.; Daoud, H.G.; Bayoumi, M. Deep convolutional bidirectional LSTM recurrent neural network for epileptic seizure detection. In Proceedings of the 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS), Montreal, QC, Canada, 24–27 June 2018; pp. 139–143.
19. Si, Y. Machine learning applications for electroencephalograph signals in epilepsy: A quick review. Acta Epileptol. **2020**, 2, 5.
20. Birjandtalab, J.; Pouyan, M.B.; Nourani, M. Unsupervised eeg analysis for automated epileptic seizure detection. In Proceedings of the First International Workshop on Pattern Recognition, Tokyo, Japan, 11–13 May 2016; International Society for Optics and Photonics: Bellingham, WA, USA, 2016; Volume 10011, p. 100110M.
21. Wickramasinghe, C.S.; Amarasinghe, K.; Marino, D.L.; Rieger, C.; Manic, M. Explainable unsupervised machine learning for cyber-physical systems. IEEE Access **2021**, 9, 131824–131843.
22. Chen, Z.; Lu, G.; Xie, Z.; Shang, W. A unified framework and method for EEG-based early epileptic seizure detection and epilepsy diagnosis. IEEE Access **2020**, 8, 20080–20092.
23. Nandan, M.; Talathi, S.S.; Myers, S.; Ditto, W.L.; Khargonekar, P.P.; Carney, P.R. Support vector machines for seizure detection in an animal model of chronic epilepsy. J. Neural Eng. **2010**, 7, 036001.
24. Karpov, O.E.; Grubov, V.V.; Maksimenko, V.A.; Utaschev, N.; Semerikov, V.E.; Andrikov, D.A.; Hramov, A.E. Noise amplification precedes extreme epileptic events on human EEG. Phys. Rev. E **2021**, 103, 022310.
25. Karpov, O.E.; Grubov, V.V.; Maksimenko, V.A.; Kurkin, S.A.; Smirnov, N.M.; Utyashev, N.P.; Andrikov, D.A.; Shusharina, N.N.; Hramov, A.E. Extreme value theory inspires explainable machine learning approach for seizure detection. Sci. Rep. **2022**, 12, 11474.
26. Karpov, O.E.; Afinogenov, S.; Grubov, V.V.; Maksimenko, V.; Korchagin, S.; Utyashev, N.; Hramov, A.E. Detecting epileptic seizures using machine learning and interpretable features of human EEG. Eur. Phys. J. Spec. Top. **2022**, 1–10.
27. White, D.M.; Van Cott, C.A. EEG artifacts in the intensive care unit setting. Am. J. Electroneurodiagn. Technol. **2010**, 50, 8–25.
28. Ebersole, J.S.; Pedley, T.A. Current Practice of Clinical Electroencephalography; Lippincott Williams & Wilkins: Pennsylvania Furnace, PA, USA, 2003.
29. Aldroubi, A.; Unser, M. Wavelets in Medicine and Biology; Routledge: Oxfordshire, UK, 2017.
30. Hramov, A.E.; Koronovskii, A.A.; Makarov, V.A.; Maximenko, V.A.; Pavlov, A.N.; Sitnikova, E. Wavelets in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2021.
31. Adeli, H.; Zhou, Z.; Dadmehr, N. Analysis of EEG records in an epileptic patient using wavelet transform. J. Neurosci. Methods **2003**, 123, 69–87.
32. Bro, R.; Smilde, A.K. Principal component analysis. Anal. Methods **2014**, 6, 2812–2831.
33. Frolov, N.S.; Grubov, V.V.; Maksimenko, V.A.; Lüttjohann, A.; Makarov, V.V.; Pavlov, A.N.; Sitnikova, E.; Pisarchik, A.N.; Kurths, J.; Hramov, A.E. Statistical properties and predictability of extreme epileptic events. Sci. Rep. **2019**, 9, 7243.
34. Lenz, O.U.; Peralta, D.; Cornelis, C. Average Localised Proximity: A new data descriptor with good default one-class classification performance. Pattern Recognit. **2021**, 118, 107991.
35. Burnaev, E.; Smolyakov, D. One-class SVM with privileged information and its application to malware detection. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 273–280.
36. Zhang, Z. Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. **2016**, 4, 218.
37. Zheng, W.; Zhao, L.; Zou, C. Locally nearest neighbor classifiers for pattern classification. Pattern Recognit. **2004**, 37, 1307–1309.
38. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 93–104.
39. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
40. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Proceedings of the Advances in Information Retrieval: 27th European Conference on IR Research (ECIR 2005), Santiago de Compostela, Spain, 21–23 March 2005; pp. 345–359.
41. Conover, W.J. Practical Nonparametric Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1999; Volume 350.
42. Maksimenko, V.A.; Van Heukelum, S.; Makarov, V.V.; Kelderhuis, J.; Lüttjohann, A.; Koronovskii, A.A.; Hramov, A.E.; Van Luijtelaar, G. Absence seizure control by a brain computer interface. Sci. Rep. **2017**, 7, 2487.

**Figure 1.** The dependence of the minimum explained variance on the number of PCs for the 3 Hz (green line) and 30 Hz (blue line) frequency bands.

**Figure 3.** Precision–recall curves for the different outlier detection algorithms are shown in different colours. Each panel reflects the type of input data as stated in the panel caption: (**A**) 3 Hz mean; (**B**) 30 Hz mean; (**C**) 3 Hz raw; (**D**) 30 Hz raw; (**E**) 3 Hz PCA; (**F**) 30 Hz PCA.

**Figure 4.** Distance between the seizures and normal states (group mean and $95\%$ CI) in the feature space depending on the frequency band and feature. Sub-figures correspond to the different distance-based ML algorithms: LNND (**A**); LOF (**B**); kNN (**C**).

Algorithm | Hyperparameter | Range of Values |
---|---|---|
OCSVM | Nu | $10^{i}$, $i \in [-6, -1]$ |
OCSVM | Gamma | $10^{i}$, $i \in [-6, -1]$, and ‘scale’ |
OCSVM | Tol | $10^{i}$, $i \in [-6, -1]$ |
OCSVM | Kernel type | ‘rbf’, ‘poly’, ‘sigmoid’ |
kNN, LNND, LOF | N_neighbors | $i \in [1, 20]$ |
kNN, LNND, LOF | Algorithm | ‘Euclidean’, ‘manhattan’, ‘cosine’ |
kNN, LNND | Threshold, % | $j \times 10^{i}$, $i \in [-4, 1]$, $j \in \{1, 5\}$ |
LOF | Contamination | $j \times 10^{i}$, $i \in [-6, -1]$, $j \in \{1, 5\}$ |
IF | Contamination | $j \times 10^{i}$, $i \in [-6, -1]$, $j \in \{1, 5\}$ |
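The search ranges in the table above can be enumerated with the standard library alone. The sketch below assumes that the exponents $i$ are integers and that every combination is tried (a full grid search); neither point is stated explicitly in the table, and the variable names are hypothetical.

```python
from itertools import product


def scaled_values(i_min, i_max, multipliers=(1, 5)):
    """Return j * 10**i for integer i in [i_min, i_max] and j in multipliers."""
    return [j * 10.0 ** i for i in range(i_min, i_max + 1) for j in multipliers]


# 10**i for integer i in [-6, -1]
powers = [10.0 ** i for i in range(-6, 0)]

ocsvm_grid = [
    {"nu": nu, "gamma": gamma, "tol": tol, "kernel": kernel}
    for nu, gamma, tol, kernel in product(
        powers, powers + ["scale"], powers, ["rbf", "poly", "sigmoid"]
    )
]

n_neighbors_values = list(range(1, 21))
metrics = ["euclidean", "manhattan", "cosine"]
thresholds_pct = scaled_values(-4, 1)    # for kNN and LNND, in percent
contaminations = scaled_values(-6, -1)   # for LOF and IF

knn_grid = [
    {"n_neighbors": k, "metric": m, "threshold_pct": t}
    for k, m, t in product(n_neighbors_values, metrics, thresholds_pct)
]
```

Under these assumptions the OCSVM grid has 756 combinations and the kNN or LNND grid has 720.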

**Table 2.** The optimal hyperparameters that provide the highest F1-score for the different algorithms and types of input data.

Algorithm | Hyperparameter | 30 Hz Raw | 3 Hz Raw | 30 Hz Mean | 3 Hz Mean | 30 Hz PCA | 3 Hz PCA |
---|---|---|---|---|---|---|---|
OCSVM | Nu | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
OCSVM | Gamma | scale | scale | scale | scale | scale | scale |
OCSVM | Tol | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
OCSVM | Kernel type | rbf | rbf | rbf | rbf | rbf | rbf |
kNN | N_neighbors | 3 | 3 | 4 | 4 | 4 | 3 |
kNN | Threshold, % | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 |
kNN | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
LNND | N_neighbors | 5 | 8 | 9 | 9 | 9 | 7 |
LNND | Threshold, % | 0.5 | 0.5 | 0.5 | 0.5 | 0.1 | 0.1 |
LNND | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
LOF | N_neighbors | 7 | 8 | 8 | 8 | 8 | 3 |
LOF | Contamination | 0.005 | 0.005 | 0.001 | 0.005 | 0.001 | 0.0001 |
LOF | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
IF | Contamination | 0.001 | 0.005 | 0.001 | 0.001 | 0.001 | 0.005 |

**Table 3.** The maximum F1-score for all models trained on the different types of input data using the optimal parameters from Table 2. Results are shown as a group mean and 95% confidence intervals (bold text indicates the best results of the models).

Feature | OCSVM | kNN | LNND | LOF | IF |
---|---|---|---|---|---|
3 Hz Raw | 0.305 ± 0.057 | 0.316 ± 0.055 | 0.281 ± 0.055 | 0.297 ± 0.055 | 0.304 ± 0.057 |
30 Hz Raw | 0.307 ± 0.060 | 0.312 ± 0.060 | 0.331 ± 0.056 | 0.330 ± 0.054 | 0.271 ± 0.073 |
3 Hz mean | 0.255 ± 0.071 | 0.282 ± 0.074 | 0.116 ± 0.040 | 0.278 ± 0.073 | 0.282 ± 0.074 |
30 Hz mean | 0.245 ± 0.071 | 0.270 ± 0.073 | 0.135 ± 0.046 | 0.254 ± 0.053 | 0.270 ± 0.073 |
3 Hz PCA | 0.273 ± 0.072 | 0.300 ± 0.075 | 0.250 ± 0.071 | 0.292 ± 0.075 | 0.276 ± 0.073 |
30 Hz PCA | 0.300 ± 0.058 | 0.313 ± 0.077 | 0.338 ± 0.077 | 0.331 ± 0.080 | 0.304 ± 0.061 |
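For reference, the F1-score reported in Table 3 is the harmonic mean of precision and recall, computed from the TP, FP and FN counts. The helper below is a generic sketch of that standard definition; the function name and the example counts are hypothetical and not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 seizures detected, 7 false alarms, 2 seizures missed.
example = f1_score(tp=3, fp=7, fn=2)
```

With these made-up counts, precision is 0.3 and recall is 0.6, so the score is 0.4. This shows how a large number of false alarms keeps F1 low even when most seizures are found.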

Algorithm i | Algorithm j | T-Stat | df | $W_i$ | $W_j$ | $p$ | $p_{\mathrm{bonf}}$ | $p_{\mathrm{holm}}$ |
---|---|---|---|---|---|---|---|---|
OCSVM | kNN | 0.039 | 316 | 264.000 | 263.500 | 0.969 | 1.000 | 1.000 |
OCSVM | LNND | 7.883 | 316 | 264.000 | 163.500 | <0.001 | <0.001 | <0.001 |
OCSVM | LOF | 1.294 | 316 | 264.000 | 247.500 | 0.197 | 1.000 | 1.000 |
OCSVM | IF | 0.196 | 316 | 264.000 | 261.500 | 0.845 | 1.000 | 1.000 |
kNN | LNND | 7.844 | 316 | 263.500 | 163.500 | <0.001 | <0.001 | <0.001 |
kNN | LOF | 1.255 | 316 | 263.500 | 247.500 | 0.210 | 1.000 | 1.000 |
kNN | IF | 0.157 | 316 | 263.500 | 261.500 | 0.875 | 1.000 | 1.000 |
LNND | LOF | 6.589 | 316 | 163.500 | 247.500 | <0.001 | <0.001 | <0.001 |
LNND | IF | 7.687 | 316 | 163.500 | 261.500 | <0.001 | <0.001 | <0.001 |
LOF | IF | 1.098 | 316 | 247.500 | 261.500 | 0.273 | 1.000 | 1.000 |

Algorithm i | Algorithm j | T-Stat | df | $W_i$ | $W_j$ | $p$ | $p_{\mathrm{bonf}}$ | $p_{\mathrm{holm}}$ |
---|---|---|---|---|---|---|---|---|
OCSVM | kNN | 1.013 | 316 | 244.000 | 254.000 | 0.312 | 1.000 | 1.000 |
OCSVM | LNND | 5.115 | 316 | 244.000 | 193.500 | <0.001 | <0.001 | <0.001 |
OCSVM | LOF | 0.861 | 316 | 244.000 | 252.500 | 0.390 | 1.000 | 1.000 |
OCSVM | IF | 1.215 | 316 | 244.000 | 256.000 | 0.225 | 1.000 | 1.000 |
kNN | LNND | 6.128 | 316 | 254.000 | 193.500 | <0.001 | <0.001 | <0.001 |
kNN | LOF | 0.152 | 316 | 254.000 | 252.500 | 0.879 | 1.000 | 1.000 |
kNN | IF | 0.203 | 316 | 254.000 | 256.000 | 0.840 | 1.000 | 1.000 |
LNND | LOF | 5.976 | 316 | 193.500 | 252.500 | <0.001 | <0.001 | <0.001 |
LNND | IF | 6.331 | 316 | 193.500 | 256.000 | <0.001 | <0.001 | <0.001 |
LOF | IF | 0.355 | 316 | 252.500 | 256.000 | 0.723 | 1.000 | 1.000 |
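The Bonferroni and Holm columns in the pairwise comparison tables follow the standard correction procedures for multiple testing, which can be sketched with the standard library. The function names are hypothetical. In the example, entries reported only as "<0.001" are replaced by the placeholder 0.001, so the corrected values for those entries are illustrative upper bounds and not the published numbers.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down correction, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvals[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Ten pairwise comparisons; 0.001 is a placeholder for entries given as "<0.001".
p_raw = [0.969, 0.001, 0.197, 0.845, 0.001, 0.210, 0.875, 0.001, 0.001, 0.273]
p_bonf = bonferroni(p_raw)
p_holm = holm(p_raw)
```

With ten comparisons, an uncorrected value such as 0.197 becomes 1.0 under both corrections, which agrees with the corresponding rows of the table.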

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Karpov, O.E.; Khoymov, M.S.; Maksimenko, V.A.; Grubov, V.V.; Utyashev, N.; Andrikov, D.A.; Kurkin, S.A.; Hramov, A.E.
Evaluation of Unsupervised Anomaly Detection Techniques in Labelling Epileptic Seizures on Human EEG. *Appl. Sci.* **2023**, *13*, 5655.
https://doi.org/10.3390/app13095655
