# Evaluation of Unsupervised Anomaly Detection Techniques in Labelling Epileptic Seizures on Human EEG

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Dataset

#### 2.2. Data Processing

#### 2.3. Feature Engineering

- 30 Hz raw: the vector of all 300 WP values in the 1–30 Hz band;
- 3 Hz raw: the sub-vector of 30 WP values in the 1–3 Hz band;
- 30 Hz mean: the mean WP value in the 1–30 Hz band;
- 3 Hz mean: the mean WP value in the 1–3 Hz band;
- 30 Hz PCA: the PCA-based feature obtained by decomposing the WP in the 1–30 Hz band and retaining the four PCs that explain $90\%$ of the variance;
- 3 Hz PCA: the PCA-based feature obtained by decomposing the WP in the 1–3 Hz band and retaining the four PCs that explain $98\%$ of the variance.

- the overall distribution of the WP across frequencies reflected by the feature 30 Hz raw;
- the mean value of the WP in the spectrum reflected by the feature 30 Hz mean;
- the WP values at dominant frequencies reflected by the feature 30 Hz PCA.
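A minimal sketch of how the six feature sets could be assembled. It assumes the WP has already been computed as a NumPy array of shape (n_epochs, 300), that the 1–3 Hz sub-vector occupies the first 30 columns (the exact frequency grid is not given here), and that PCA is fitted across all epochs with scikit-learn. The helper names `build_features` and `n_pcs_for_variance` are hypothetical; the latter returns the smallest number of PCs whose cumulative explained variance reaches a target, the quantity plotted in Figure 1.

```python
import numpy as np
from sklearn.decomposition import PCA


def n_pcs_for_variance(X: np.ndarray, target: float) -> int:
    """Smallest number of PCs whose cumulative explained variance reaches `target`."""
    ratios = PCA().fit(X).explained_variance_ratio_
    # searchsorted returns the first index where the cumulative sum is >= target
    return int(np.searchsorted(np.cumsum(ratios), target) + 1)


def build_features(wp: np.ndarray, n_low: int = 30, n_pcs: int = 4) -> dict:
    """Build the six feature sets from a wavelet-power matrix.

    wp: array of shape (n_epochs, 300) with WP values in the 1-30 Hz band.
    Assumption (hypothetical layout): the first `n_low` columns are the 1-3 Hz sub-vector.
    """
    wp_low = wp[:, :n_low]
    return {
        "30 Hz raw": wp,
        "3 Hz raw": wp_low,
        # one scalar per epoch, kept 2-D so it can be passed to scikit-learn models
        "30 Hz mean": wp.mean(axis=1, keepdims=True),
        "3 Hz mean": wp_low.mean(axis=1, keepdims=True),
        "30 Hz PCA": PCA(n_components=n_pcs).fit_transform(wp),
        "3 Hz PCA": PCA(n_components=n_pcs).fit_transform(wp_low),
    }


rng = np.random.default_rng(0)
wp = rng.random((100, 300))  # placeholder values, not real EEG
features = build_features(wp)
```

Keeping the mean features as single-column 2-D arrays lets every feature set be fed to the same detectors without reshaping.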

#### 2.4. Machine Learning Methods

#### 2.4.1. One-Class Support Vector Machine

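The main OCSVM hyperparameters are:

- **Kernel type**: the kernel function, as in standard SVM classifiers (e.g., 'rbf', 'poly', 'sigmoid');
- Threshold parameter (**nu**): the expected fraction of outliers in the data; formally, an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors;
- Kernel coefficient (**gamma**): controls how tightly the decision boundary wraps around the training vectors, with larger values giving a tighter boundary;
- Stopping criterion (**tol**): the algorithm stops when the difference between the old and new loss values becomes less than **tol**.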

#### 2.4.2. k-Nearest Neighbors

The kNN hyperparameters are:

- **Algorithm**: the distance measure used to compare points (e.g., Euclidean, Manhattan, or cosine);
- **n_neighbors**: the number of nearest neighbours considered;
- **Threshold**: the decision boundary; points whose distance exceeds the threshold are labelled as outliers.
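A minimal sketch using scikit-learn's `NearestNeighbors`, scoring each point by the distance to its k-th nearest neighbour (one common kNN outlier score; other variants use the mean distance to the k neighbours). Because Table 1 gives the threshold in %, the sketch assumes it denotes the percentage of points with the largest scores that are flagged. This interpretation, the function names `knn_scores` and `flag_top_percent`, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_scores(X, n_neighbors=3, metric="euclidean"):
    """Outlier score = distance from each point to its k-th nearest neighbour."""
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1, metric=metric).fit(X)
    dist, _ = nn.kneighbors(X)
    # column 0 is each point's zero distance to itself, so column k is the k-th neighbour
    return dist[:, n_neighbors]


def flag_top_percent(scores, threshold_pct):
    """Assumed threshold semantics: flag points above the (100 - threshold_pct) percentile."""
    cutoff = np.percentile(scores, 100.0 - threshold_pct)
    return (scores > cutoff).astype(int)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])  # last row: artificial outlier
labels = flag_top_percent(knn_scores(X, n_neighbors=3), threshold_pct=0.5)
```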

#### 2.4.3. Local Nearest Neighbors Distance

If **A** is a given point and **B** is its *k*th nearest neighbour, then the localized distance of **A** is the distance from **A** to **B** divided by the distance from **B** to its own *k*th nearest neighbour. The hyperparameters of LNND are the same as those of kNN.

#### 2.4.4. Local Outlier Factor

The LOF hyperparameters are:

- **Algorithm**: the distance measure;
- **n_neighbors**: the number of neighbours;
- **Contamination**: the expected proportion of outliers in the dataset.
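A minimal sketch assuming scikit-learn's `LocalOutlierFactor`. In scikit-learn the distance measure is set through `metric` (its `algorithm` argument selects the neighbour-search structure instead), so the "Algorithm" hyperparameter described here is assumed to map to `metric`. The settings follow the 3 Hz raw column of Table 2, and the data are synthetic.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])  # last row: artificial outlier

# "Algorithm" (distance measure) is assumed to correspond to scikit-learn's `metric`
lof = LocalOutlierFactor(n_neighbors=8, metric="euclidean", contamination=0.005)
labels = (lof.fit_predict(X) == -1).astype(int)  # 1 = candidate seizure
scores = -lof.negative_outlier_factor_           # larger = more anomalous
```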

#### 2.4.5. Isolation Forest

The only hyperparameter optimized for IF is **contamination** [39].
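A corresponding sketch assuming scikit-learn's `IsolationForest`, with `contamination` taken from the 3 Hz raw column of Table 2. `random_state` is added only to make this illustration reproducible, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])  # last row: artificial outlier

iforest = IsolationForest(contamination=0.005, random_state=0).fit(X)
labels = (iforest.predict(X) == -1).astype(int)  # 1 = candidate seizure
scores = -iforest.score_samples(X)               # larger = more anomalous (shorter paths)
```

Points that are isolated after few random splits receive short average path lengths and therefore high anomaly scores; `contamination` only sets the score cutoff used by `predict`.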

#### 2.5. Evaluation and Hyperparameter Optimization

## 3. Results and Discussion

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
EEG | Electroencephalogram |
ML | Machine learning |
DL | Deep learning |
RNN | Recurrent neural networks |
LSTM | Long short-term memory |
DSS | Decision support system |
SVM | Support vector machine |
PC | Principal component |
ICA | Independent component analysis |
CWT | Continuous wavelet transform |
WP | Wavelet power |
PCA | Principal component analysis |
OCSVM | One-class support vector machine |
kNN | k-nearest neighbours |
LNND | Local nearest neighbours distance |
LOF | Local outlier factor |
IF | Isolation forest |
TP | True positive |
FP | False positive |
FN | False negative |
rm ANOVA | Repeated-measures analysis of variance |
sCSP | Sparse common spatial pattern |
FSST | Fourier transform-based synchrosqueezing transform |
CADS | Computer-aided diagnosis system |
TQWT | Tunable-Q wavelet transform |
CNN | Convolutional neural network |

**Figure 1.** Dependence of the minimum explained variance on the number of PCs for the 3 Hz (green line) and 30 Hz (blue line) frequency bands.

**Figure 3.** Precision–recall curves for the different outlier detection algorithms (shown in different colours). Each panel corresponds to one type of input data: (**A**) 3 Hz mean; (**B**) 30 Hz mean; (**C**) 3 Hz raw; (**D**) 30 Hz raw; (**E**) 3 Hz PCA; (**F**) 30 Hz PCA.

**Figure 4.** Distance between seizure and normal states in the feature space (group mean and 95% CI), depending on the frequency band and feature. Panels correspond to the distance-based ML algorithms: (**A**) LNND; (**B**) LOF; (**C**) kNN.

**Table 1.** Ranges of hyperparameter values explored for each algorithm.

Algorithm | Hyperparameter | Range of Values |
---|---|---|
OCSVM | Nu | $10^{i}$, $i \in [-6, -1]$ |
OCSVM | Gamma | $10^{i}$, $i \in [-6, -1]$, and 'scale' |
OCSVM | Tol | $10^{i}$, $i \in [-6, -1]$ |
OCSVM | Kernel type | 'rbf', 'poly', 'sigmoid' |
kNN, LNND, LOF | N_neighbors | integers in [1, 20] |
kNN, LNND, LOF | Algorithm | 'euclidean', 'manhattan', 'cosine' |
kNN, LNND | Threshold, % | $j \times 10^{i}$, $i \in [-4, 1]$, $j \in \{1, 5\}$ |
LOF | Contamination | $j \times 10^{i}$, $i \in [-6, -1]$, $j \in \{1, 5\}$ |
IF | Contamination | $j \times 10^{i}$, $i \in [-6, -1]$, $j \in \{1, 5\}$ |
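The grids in Table 1 are log-spaced. A minimal sketch of how such a grid could be generated and searched, assuming that reference seizure labels are available for computing the F1-score (the criterion used to select the values in Table 2) and using IF as the example. `log_grid` and `best_contamination` are hypothetical helpers built on scikit-learn; the other algorithms would be searched analogously over their own grids.

```python
import itertools

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score


def log_grid(i_min, i_max, multipliers=(1, 5)):
    """Values j * 10**i for integer i in [i_min, i_max] and j in `multipliers`."""
    return sorted(j * 10.0 ** i for i, j in itertools.product(range(i_min, i_max + 1), multipliers))


def best_contamination(X, y_true, grid):
    """Return (contamination, F1) maximizing the F1-score of IF labels against y_true."""
    best = (None, -1.0)
    for c in grid:
        pred = (IsolationForest(contamination=c, random_state=0).fit_predict(X) == -1).astype(int)
        score = f1_score(y_true, pred, zero_division=0)
        if score > best[1]:
            best = (c, score)
    return best


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0], [-50.0, 50.0]]])
y_true = np.r_[np.zeros(200, dtype=int), np.ones(2, dtype=int)]  # synthetic labels
grid = log_grid(-6, -1)  # contamination grid from Table 1
best_c, best_f1 = best_contamination(X, y_true, grid)
```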

**Table 2.** Optimal hyperparameters providing the highest F1-score for each algorithm and type of input data.

Algorithm | Hyperparameter | 30 Hz raw | 3 Hz raw | 30 Hz mean | 3 Hz mean | 30 Hz PCA | 3 Hz PCA |
---|---|---|---|---|---|---|---|
OCSVM | Nu | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
OCSVM | Gamma | scale | scale | scale | scale | scale | scale |
OCSVM | Tol | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
OCSVM | Kernel type | rbf | rbf | rbf | rbf | rbf | rbf |
kNN | N_neighbors | 3 | 3 | 4 | 4 | 4 | 3 |
kNN | Threshold, % | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 |
kNN | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
LNND | N_neighbors | 5 | 8 | 9 | 9 | 9 | 7 |
LNND | Threshold, % | 0.5 | 0.5 | 0.5 | 0.5 | 0.1 | 0.1 |
LNND | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
LOF | N_neighbors | 7 | 8 | 8 | 8 | 8 | 3 |
LOF | Contamination | 0.005 | 0.005 | 0.001 | 0.005 | 0.001 | 0.0001 |
LOF | Algorithm | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean | Euclidean |
IF | Contamination | 0.001 | 0.005 | 0.001 | 0.001 | 0.001 | 0.005 |

**Table 3.** Maximum F1-score of all models trained on the different types of input data with the optimal hyperparameters from Table 2. Results are shown as group mean ± 95% confidence interval (bold text indicates the best results of the models).

Feature | OCSVM | kNN | LNND | LOF | IF |
---|---|---|---|---|---|
3 Hz raw | 0.305 ± 0.057 | 0.316 ± 0.055 | 0.281 ± 0.055 | 0.297 ± 0.055 | 0.304 ± 0.057 |
30 Hz raw | 0.307 ± 0.060 | 0.312 ± 0.060 | 0.331 ± 0.056 | 0.330 ± 0.054 | 0.271 ± 0.073 |
3 Hz mean | 0.255 ± 0.071 | 0.282 ± 0.074 | 0.116 ± 0.040 | 0.278 ± 0.073 | 0.282 ± 0.074 |
30 Hz mean | 0.245 ± 0.071 | 0.270 ± 0.073 | 0.135 ± 0.046 | 0.254 ± 0.053 | 0.270 ± 0.073 |
3 Hz PCA | 0.273 ± 0.072 | 0.300 ± 0.075 | 0.250 ± 0.071 | 0.292 ± 0.075 | 0.276 ± 0.073 |
30 Hz PCA | 0.300 ± 0.058 | 0.313 ± 0.077 | 0.338 ± 0.077 | 0.331 ± 0.080 | 0.304 ± 0.061 |
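Table 3 reports each F1-score as a group mean ± 95% CI. The exact CI procedure is not stated here; a common choice, sketched under that assumption, is a t-based interval over per-subject (or per-recording) scores with half-width $t_{0.975,\,n-1} \cdot s / \sqrt{n}$, where $s$ is the sample standard deviation. The function name `mean_ci95` and the input values are illustrative only.

```python
import numpy as np
from scipy import stats


def mean_ci95(values):
    """Group mean and half-width of a t-based 95% confidence interval."""
    x = np.asarray(values, dtype=float)
    # stats.sem uses ddof=1 (sample standard deviation) by default
    half_width = stats.t.ppf(0.975, df=x.size - 1) * stats.sem(x)
    return x.mean(), half_width


mean, half = mean_ci95([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values, not study data
```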

Algorithm *i* | Algorithm *j* | T-Stat | df | $W_i$ | $W_j$ | p | $p_{bonf}$ | $p_{holm}$ |
---|---|---|---|---|---|---|---|---|
OCSVM | kNN | 0.039 | 316 | 264.000 | 263.500 | 0.969 | 1.000 | 1.000 |
OCSVM | LNND | 7.883 | 316 | 264.000 | 163.500 | <0.001 | <0.001 | <0.001 |
OCSVM | LOF | 1.294 | 316 | 264.000 | 247.500 | 0.197 | 1.000 | 1.000 |
OCSVM | IF | 0.196 | 316 | 264.000 | 261.500 | 0.845 | 1.000 | 1.000 |
kNN | LNND | 7.844 | 316 | 263.500 | 163.500 | <0.001 | <0.001 | <0.001 |
kNN | LOF | 1.255 | 316 | 263.500 | 247.500 | 0.210 | 1.000 | 1.000 |
kNN | IF | 0.157 | 316 | 263.500 | 261.500 | 0.875 | 1.000 | 1.000 |
LNND | LOF | 6.589 | 316 | 163.500 | 247.500 | <0.001 | <0.001 | <0.001 |
LNND | IF | 7.687 | 316 | 163.500 | 261.500 | <0.001 | <0.001 | <0.001 |
LOF | IF | 1.098 | 316 | 247.500 | 261.500 | 0.273 | 1.000 | 1.000 |

Algorithm *i* | Algorithm *j* | T-Stat | df | $W_i$ | $W_j$ | p | $p_{bonf}$ | $p_{holm}$ |
---|---|---|---|---|---|---|---|---|
OCSVM | kNN | 1.013 | 316 | 244.000 | 254.000 | 0.312 | 1.000 | 1.000 |
OCSVM | LNND | 5.115 | 316 | 244.000 | 193.500 | <0.001 | <0.001 | <0.001 |
OCSVM | LOF | 0.861 | 316 | 244.000 | 252.500 | 0.390 | 1.000 | 1.000 |
OCSVM | IF | 1.215 | 316 | 244.000 | 256.000 | 0.225 | 1.000 | 1.000 |
kNN | LNND | 6.128 | 316 | 254.000 | 193.500 | <0.001 | <0.001 | <0.001 |
kNN | LOF | 0.152 | 316 | 254.000 | 252.500 | 0.879 | 1.000 | 1.000 |
kNN | IF | 0.203 | 316 | 254.000 | 256.000 | 0.840 | 1.000 | 1.000 |
LNND | LOF | 5.976 | 316 | 193.500 | 252.500 | <0.001 | <0.001 | <0.001 |
LNND | IF | 6.331 | 316 | 193.500 | 256.000 | <0.001 | <0.001 | <0.001 |
LOF | IF | 0.355 | 316 | 252.500 | 256.000 | 0.723 | 1.000 | 1.000 |
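Both pairwise-comparison tables cover 10 comparisons among the five algorithms and report Bonferroni- and Holm-adjusted p-values. Bonferroni multiplies each p-value by the number of comparisons and caps it at 1 (so p = 0.197 becomes 1.000 with 10 comparisons), while Holm's step-down procedure multiplies the i-th smallest p-value by (m − i + 1) and enforces monotonicity. A minimal sketch with hypothetical helper names and illustrative p-values; `statsmodels.stats.multitest.multipletests` with `method="bonferroni"` or `method="holm"` offers the same corrections.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        # the smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # illustrative values, not taken from the tables
p_bonf, p_holm = bonferroni(raw), holm(raw)
```

Holm controls the same family-wise error rate as Bonferroni but is never more conservative, which is why the two columns can differ when several comparisons are significant.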

