Training Datasets for Epilepsy Analysis: Preprocessing and Feature Extraction from Electroencephalography Time Series

Christian Riccio; Angelo Martone; Gaetano Zazzaro; Luigi Pavone

doi:10.3390/data9050061

¹ Department of Civil Engineering, University of Naples Federico II, 80125 Napoli, Italy
² Laboratory of IT Plant Maintenance, CIRA (Italian Aerospace Research Centre), 81043 Capua, Italy
³ Laboratory of IT Test Management and Data Acquisition, CIRA (Italian Aerospace Research Centre), 81043 Capua, Italy
⁴ IRCCS Neuromed, 86077 Pozzilli, Italy

Data 2024, 9(5), 61; https://doi.org/10.3390/data9050061

Abstract

We describe 20 datasets derived through signal filtering and feature extraction steps applied to the raw time series EEG data of 20 epileptic patients, as well as the methods we used to derive them. Background: Epilepsy is a complex neurological disorder which has seizures as its hallmark. Electroencephalography plays a crucial role in epilepsy assessment, offering insights into the brain’s electrical activity and advancing our understanding of seizures. The availability of tagged training sets covering all seizure phases (inter-ictal, pre-ictal, ictal, and post-ictal) is crucial for data-driven epilepsy analyses. Methods: Using the sliding window technique with a two-second window length and a one-second sliding step, we extract multiple features from the preprocessed EEG time series of 20 patients from the Freiburg Seizure Prediction Database. In addition, we assign a class label to each instance to specify its corresponding seizure phase. All these operations are performed by a software application we developed, named Training Builder. Results: The 20 tagged training datasets each contain 1080 univariate and bivariate features, and are openly and publicly available. Conclusions: The datasets support the training of data-driven models for seizure detection, prediction, and clustering, based on feature engineering.

Dataset: Data are available at https://doi.org/10.5281/zenodo.10808054.

Dataset License: CC-BY 4.0

Keywords:

data science; epilepsy; feature extraction; seizure data; signal preprocessing; training datasets

1. Summary

1.1. Problem Statement

Epilepsy is a neurological disorder that affects millions globally. It is characterized by recurrent seizures and presents substantial challenges in medical diagnosis and clinical management. Central to these challenges is the analysis of electroencephalography (EEG) time series data. An EEG captures the brain’s electrical activity and is critical for identifying and understanding epileptic seizures. Despite the richness of information contained within EEG signals, the raw time series data, as recorded by sensors, present considerable difficulties for direct analysis due to their complexity and the high variability of signal characteristics among patients. This complexity is compounded by a lack of sufficient techniques for directly analyzing raw series, necessitating advanced data processing methodologies for effective interpretation.

Our work underscores the importance of employing data processing techniques for EEG signal analysis in epilepsy. Our datasets, which are derived from the Freiburg EEG Database [1] through advanced preparation analyses, encompass diverse seizure phases and form a comprehensive foundation for the development of advanced diagnostic tools. By processing raw EEG signals and extracting a large number of features from the filtered signals, our approach may support further research in this domain.

The main objective of this paper is to provide a foundational corpus for analyzing seizures and training analytical models for seizure detection, prediction, and clustering. We describe the derivation of 20 datasets from EEG data of patients with focal epilepsy through signal filtering and feature extraction and make these datasets freely available.

Our datasets provide invaluable insights into the dynamics of epileptic seizures, encompassing recordings across various brain states critical for comprehensive seizure analysis. As summarized in Table 1, these states include pre-ictal, ictal, post-ictal, and inter-ictal phases, each one offering a unique perspective on the seizure cycle. This segmentation underlines the database’s utility in exploring the mechanisms of seizure onset, progression, and recovery, enhancing our understanding and prediction of epileptic events. Many studies [2,3] highlight that the duration of epilepsy phases can be quite variable and patient-specific, influenced by factors such as the type of epilepsy, the nature of individual seizures, and the physiological state of the patient at the time.

Table 1. Distinct states of epileptic seizures.

This work not only contributes to the epilepsy research community by providing access to meticulously annotated EEG data, but also fosters innovation in Data Science (DS) methodologies for analyzing large amounts of data generated at high frequencies. Through rigorous data processing, we aim to advance epilepsy monitoring and improve patient outcomes, highlighting the critical role of interdisciplinary research in medical diagnostics and underscoring the necessity of novel approaches to raw EEG data processing. For the sake of clarity, the objectives of this manuscript are specifically tailored to a specialized audience, including computer engineers, data scientists, and related professionals focused on developing SW for seizure analysis. It is not intended for engineers and physicists at epilepsy centers who require ready-to-use seizure detection SW.

1.2. Related Works

The datasets discussed in this study, in whole or in part, as well as others generated using the same methods but with varying temporal parameters (refer to Section 3 and Section 6), served as the training data for developing models aimed at epilepsy analysis. These models were obtained utilizing DS techniques and Machine Learning (ML) algorithms. Table 2 lists the related research efforts, which primarily focused on the detection of epileptic seizures (for further details, see Section 4). Specifically, the work cited in [4] provides a comprehensive account of the seizure detection analyses, efforts to reduce false alarms, and the portability of models, conducted using the training datasets we make openly available. Table 2 also presents the performance metrics of models trained with ML algorithms (k-NN, MLP, SVM, BayesNet) using the datasets described in this paper, highlighting the effectiveness of our methodology in creating them.

Table 2. Related Works.

1.3. About This Paper

The rest of this paper is organized as follows: in Section 2, we provide a detailed description of the training datasets that we make available, introducing the Freiburg Seizure Prediction EEG database and outlining its significance in epilepsy research. In Section 3, we present our methods, including the software tool developed for signal processing and feature extraction from EEG time series, which is called the Training Builder (TrB) tool. In Section 4, we explore various DS techniques for epilepsy analysis, focusing on prediction, forecasting, and detection. Section 5 concludes the paper with reflections on the study’s implications and future research directions. Finally, Section 6 provides information about requesting additional training datasets.

2. Data Description

Being able to access large quantities of neurological data from individuals with epilepsy is crucial for analysis when using DS methodologies and techniques. In this section, we describe the datasets generated by our data preprocessing and feature extraction SW tool (TrB tool), which analyzes EEG signals from patients in the Freiburg Seizure Prediction Database.

Training Datasets

The training datasets we provide consist of 20 csv files obtained through the TrB tool [8], one for each epileptic patient of the Freiburg Seizure Prediction Database.

The Freiburg EEG Database stands out as a fundamental resource in epilepsy research, as it has been carefully curated to support advancements in detection and prediction and enhance our understanding of the underlying mechanisms of seizures. It comprises intracranial EEG recordings from a selected cohort of 21 patients (although data from only 20 patients are available to us because patient number 12 is missing), each dealing with drug-resistant focal epilepsy. These patients underwent comprehensive pre-surgical evaluation at the University Hospital of Freiburg, Germany, making the dataset particularly relevant for those investigating the potential of surgical interventions in epilepsy treatment. The EEG recordings comprise up to 128 channels per patient. They encompass a wide array of brain states, including pre-ictal, ictal, post-ictal, and inter-ictal phases (Table 1). These phases provide an overall view, which is necessary for developing algorithms that can accurately distinguish between normal and abnormal brain activity. The temporal resolution of the database is very high, with recordings sampled at 256 Hz. This ensures that the fast dynamics of epileptic activity are captured in detail, which is crucial for analyzing the rapid changes associated with seizure onset and progression. Each patient is monitored for 24 h on average. In addition, the DB includes an extensive array of patient metadata (Table 3) [9].

Table 3. Metadata exploration and data insights of the Freiburg EEG database.

Each of the 20 file names we provide uniquely identifies the patient using a number as suffix, so the file Pat001.csv refers to the training data of patient number 001.

Each file has a column header with three metadata fields, as follows:

Registration: Freiburg EEG database has several registrations for each patient. This column specifies the registration number from which the training data are extracted.
Actual Timestamp: this column specifies the time interval, in terms of the initial sample and the final sample, from which all the features are extracted. This interval is also related to the length of the selected window (the L parameter), e.g., the value 1_512 indicates that a 2 s window was used because it contains 512 samples (i.e., $2 \times 256$, where 256 Hz is the sampling frequency of the Freiburg DB recordings).
Actual TAG: this column is the class value identified in the actual timestamp interval. It summarizes which portion of the signal the record (data vector) refers to and can therefore take on the values in the set {PRE, IKTAL, POST, INTER} (see Table 1).
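
As an illustration of how the Actual Timestamp field relates to the window length L, the following minimal Python sketch converts a value such as 1_512 into a duration in seconds. The function names are hypothetical and not part of the TrB tool; the sketch assumes that both the initial and the final sample are included in the window.

```python
SAMPLING_RATE_HZ = 256  # sampling frequency of the Freiburg recordings, as stated above


def parse_timestamp(stamp: str) -> tuple[int, int]:
    """Split a value such as '1_512' into (initial sample, final sample)."""
    first, last = stamp.split("_")
    return int(first), int(last)


def window_seconds(stamp: str, rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Window length in seconds, counting both boundary samples (assumption)."""
    first, last = parse_timestamp(stamp)
    return (last - first + 1) / rate_hz


print(window_seconds("1_512"))  # 2.0
```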

The remaining fields of the csv file represent the features, which are calculated by applying the sliding window technique (see Section 3). Each feature name is in the form EiBjFCMk, where:

Ei is the i-th electrode number, with $i = 1, 2, \dots, 6$ . Electrodes 1, 2, and 3 are in focus because they are located in the epileptic brain areas, while electrodes 4, 5, and 6 are out of focus, as they are situated in the healthy regions of the brain.
Bj is the j-th band number, with $j = 8, 13, 21, 30, 40, 70$ . Each of these band numbers corresponds, respectively, to: $α$ (8–13 Hz), $β_{1}$ (13–21 Hz), $β_{2}$ (21–30 Hz), low $γ$ (30–40 Hz), medium $γ$ (40–70 Hz) and high $γ$ (70–120 Hz) bands [4].
FC is the code of the extracted feature. Table 4 lists all implemented features and specifies, among other things, whether each feature is univariate or bivariate (UB column); for further details on feature descriptions and formulas, see [5,8].
Mk is the calculation method used to compute features. For univariate features, the value is always MU. In contrast, for bivariate features, the value can be MA if the reference signal is sourced from the preceding L window, or MB if the reference signal is the zero constant signal.

Table 4. Extracted Features.
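
The EiBjFCMk naming scheme can be decoded mechanically. The sketch below is a minimal Python example under the assumption that the four parts are simply concatenated (for example, E4B13CEMA); the exact spelling used in the csv headers should be checked against the files. The function and constant names are hypothetical.

```python
import re

# Band codes and their frequency ranges in Hz, as listed in the text.
BANDS = {8: (8, 13), 13: (13, 21), 21: (21, 30), 30: (30, 40), 40: (40, 70), 70: (70, 120)}

# Assumed layout: 'E' + electrode, 'B' + band code, two-letter feature code, method code.
FEATURE_NAME = re.compile(r"^E([1-6])B(8|13|21|30|40|70)([A-Z]{2})(MU|MA|MB)$")


def parse_feature_name(name: str) -> dict:
    """Decode a feature name into electrode, band, feature code, and method."""
    match = FEATURE_NAME.match(name)
    if match is None:
        raise ValueError(f"not an EiBjFCMk feature name: {name!r}")
    electrode, band, code, method = match.groups()
    return {
        "electrode": int(electrode),
        "in_focus": int(electrode) <= 3,  # electrodes 1-3 in focus, 4-6 out of focus
        "band_hz": BANDS[int(band)],
        "feature_code": code,
        "method": method,
    }


print(parse_feature_name("E4B13CEMA"))
```

Under these assumptions, E4B13CEMA decodes to conditional entropy (CE) on out-of-focus electrode 4 in the 13–21 Hz band, computed against the preceding window (MA).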

Each feature is extracted from a window of length L of the EEGs, registered by the 6 electrodes Ei, previously filtered in the 6 Bj bands. Thus, we have 1080 features, since:

$$6 \text{ bands} \times 6 \text{ electrodes} \times (14 \text{ univariate features} + 8 \text{ bivariate features} \times 2 \text{ calculation methods}) = 1080 \qquad (1)$$
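
The count of 1080 can be checked by enumerating every combination of electrode, band, feature code, and calculation method. The following Python sketch does so using the codes from Table 4; the concatenated name format and the ordering of the generated names are assumptions and need not match the column order in the csv files.

```python
from itertools import product

ELECTRODES = range(1, 7)
BANDS = (8, 13, 21, 30, 40, 70)
UNIVARIATE = ("SD", "KU", "HM", "SH", "LE", "KC", "LU", "LL", "PD", "PP", "AP", "SG", "SP", "IP")
BIVARIATE = ("CE", "JE", "MI", "CC", "ED", "LD", "DT", "LC")


def feature_names():
    """Enumerate all feature names (format and order are illustrative assumptions)."""
    names = []
    for electrode, band in product(ELECTRODES, BANDS):
        prefix = f"E{electrode}B{band}"
        # Univariate features have a single calculation method, MU.
        names += [f"{prefix}{code}MU" for code in UNIVARIATE]
        # Bivariate features are computed with both reference signals, MA and MB.
        names += [f"{prefix}{code}{method}" for code in BIVARIATE for method in ("MA", "MB")]
    return names


names = feature_names()
print(len(names))  # 1080
```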

All the datasets that we make available are obtained using length $L = 2$ and sliding $S = 1$ (both in seconds) as temporal parameters of the sliding window. This choice follows the methodology described in [4]. A selection of windowing time parameters L and S is reported in [6]. Moreover, it is straightforward to obtain the window parameters $L = 2$ and $S = 2$ from the training dataset by excluding the odd rows in the provided training sets, i.e., by keeping every other row.
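
A minimal Python sketch of this row subsampling is given below. The timestamp values are hypothetical examples for a 256 Hz recording, and the helper name is not part of the TrB tool. One caveat that we assume rather than know: if a file concatenates several registrations, the subsampling may need to restart at each registration boundary.

```python
def keep_every_other(rows, offset=0):
    """Return every second row, starting at position `offset` (0 or 1)."""
    return rows[offset::2]


# Hypothetical 'Actual Timestamp' values for L = 2 s and S = 1 s at 256 Hz.
stamps = ["1_512", "257_768", "513_1024", "769_1280", "1025_1536"]

# The retained windows no longer overlap, i.e., they correspond to L = 2 s, S = 2 s.
print(keep_every_other(stamps))  # ['1_512', '513_1024', '1025_1536']
```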

In conclusion, for each windowed signal of length L, we have 1083 fields (3 metadata + 1080 features).
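
As a rough illustration of this layout, the Python sketch below separates the three metadata columns from the feature columns of a header row. The exact column spellings and the comma delimiter are assumptions to be verified against the published files, and the in-memory sample is made up for the example.

```python
import csv
import io

# Metadata column names as described in the text (exact spelling is an assumption).
METADATA_COLUMNS = ("Registration", "Actual Timestamp", "Actual TAG")


def read_header(handle):
    """Read only the first row of a csv file-like object."""
    return next(csv.reader(handle))


def split_header(header):
    """Separate the metadata columns from the feature columns."""
    metadata = [name for name in header if name in METADATA_COLUMNS]
    features = [name for name in header if name not in METADATA_COLUMNS]
    return metadata, features


# Tiny in-memory stand-in for a file such as Pat001.csv (values are invented).
sample = io.StringIO("Registration,Actual Timestamp,Actual TAG,E1B8SDMU\n1,1_512,INTER,0.5\n")
metadata, features = split_header(read_header(sample))
print(metadata, features)
```

On a real file, one would expect `len(metadata) == 3` and `len(features) == 1080` if the layout matches the description above.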

3. Methods

In this section, we describe the methods used to create the 20 final training datasets that we are making available. These datasets result from EEG preprocessing, whose signal filtering and feature extraction steps are performed using the TrB SW tool that we developed for time series analysis.

3.1. Training Builder Tool

The TrB tool, a modular and extensible SW application, filters large quantities of time series (using low-pass, band-pass or high-pass filters) and extracts from them all the features listed in Table 4 using the sliding window technique. The final outputs of the tool are the training sets, which can be used as input for training models with data-driven learning techniques. Each dataset therefore varies depending on:

Time series (or the recorded part of them).
Filters parameters: type (low-pass, band-pass, etc.) and cut-off frequencies.
Windowing temporal parameters: L and S.
Univariate features.
Bivariate features.

The time series of the signals are analysed by the TrB using the sliding window technique. Signal windowing is controlled by two user-selectable temporal parameters (or window parameters):

L: it represents the length of the signal to be analysed, expressed in seconds.
S: it represents the slippage of the signal to be analysed (i.e., how often the algorithm is applied), expressed in seconds.

If the sliding step size S is smaller than the window size L, the windows overlap, while if $S = L$, we obtain a tumbling window.
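
The windowing described above can be sketched in a few lines of Python. This is not the TrB implementation; it is a minimal illustration that assumes 1-based, inclusive sample indices (matching timestamps such as 1_512) and discards any incomplete window at the end of the signal.

```python
def sliding_windows(n_samples, length_s, step_s, rate_hz=256):
    """Yield (first, last) 1-based inclusive sample indices of each full window."""
    length = int(length_s * rate_hz)
    step = int(step_s * rate_hz)
    first = 1
    while first + length - 1 <= n_samples:
        yield first, first + length - 1
        first += step


# 5 s of signal at 256 Hz with L = 2 s and S = 1 s: consecutive windows overlap by 1 s.
print(list(sliding_windows(1280, 2, 1)))
# [(1, 512), (257, 768), (513, 1024), (769, 1280)]

# With S = L = 2 s the windows are disjoint (tumbling window).
print(list(sliding_windows(1280, 2, 2)))
# [(1, 512), (513, 1024)]
```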

Through TrB’s GUI, the user can select which and how many univariate and bivariate features to compute. When bivariate features are selected, the user has to choose which reference signal to use to calculate them. Two types of reference signal are available:

The previous L: i.e., the same signal taken at a previous L interval.
The zero signal: i.e., the zero constant signal.
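
To make the two reference signals concrete, the Python sketch below computes one bivariate feature, the Euclidean distance (ED in Table 4), against each of them. It is an illustration rather than the TrB code: the helper names are hypothetical, and "the previous L" is read here as the L-length segment immediately preceding the current window, which is an assumption.

```python
import math


def euclidean_distance(signal, reference):
    """Euclidean distance between two equally long sample sequences."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, reference)))


def reference_signal(samples, first, length, method):
    """Reference for the window samples[first:first + length] (0-based indices).

    'MA': the `length` samples immediately before the window (assumed reading
          of "the previous L"); 'MB': the zero constant signal.
    """
    if method == "MB":
        return [0.0] * length
    if method == "MA":
        if first < length:
            raise ValueError("no complete previous window available")
        return samples[first - length:first]
    raise ValueError(f"unknown method: {method}")


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
window = samples[3:6]  # [4.0, 5.0, 6.0]
print(euclidean_distance(window, reference_signal(samples, 3, 3, "MA")))  # sqrt(27), about 5.196
print(euclidean_distance(window, reference_signal(samples, 3, 3, "MB")))  # sqrt(77), about 8.775
```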

Each final dataset consists of a comma-separated values (csv) file, where features are recorded as vectors.

3.2. Software Architecture

The TrB SW architecture has been designed following the Client/Server architectural model. The Server part comprises the algorithms for massively extracting features, the windowing and filtering functions, and other support utilities, while the Client part is a web-oriented Graphical User Interface application, which enables output result visualization and shows a form for user input selection and validation. Figure 1 shows the high-level diagram of the designed and implemented SW architecture, including the input data sources and the outputs delivered. Two possible time series data sources are supported:

Recorded in text format (txt or csv).
Stored in a time series DB (TSDB).

Figure 1. High-level application logic scheme.

Using a TSDB, instead of formatted files, allows optimization of the management of time series, with regard to their storage and recovery, while ensuring high reliability and availability.

The output, in contrast, consists of the features computed on these time series, provided in csv format. The csv file can be saved by the client and stored on a local file system.

TrB has been developed to be as extensible as possible, with the aim of being able to run feature calculation algorithms developed with different programming languages; currently, Java, C/C++, Matlab and Python are natively supported, but compatibility with other languages can be easily configured.

4. Epilepsy Analysis

Data Mining, ML, Deep Learning, AI, and other DS techniques revolutionize epilepsy analysis by interpreting complex neurological data to enhance seizure detection, predict seizures, identify epileptogenic zones, personalize treatments, and plan surgeries, ultimately improving patient outcomes. Analyzing EEG data, patient information, and feature-based training datasets deepens our understanding of epilepsy. This section introduces potential applications for the 20 provided training datasets, expanding on the studies listed in Table 2.

4.1. Prediction, Forecasting, and Detection

Predicting the pre-ictal state is highly valuable for managing epilepsy. Research on epileptic seizure prediction has been underway for several decades [10], leveraging both ML and Deep Learning algorithms [11,12]. These approaches aim to enhance the accuracy and timeliness of predictions, thereby offering significant improvements in patient care and the quality of life of people with epilepsy.

The definition of seizure prediction involves utilizing an alert system when an algorithm detects the pre-ictal period, indicating that a seizure will occur within a well-defined period known as the Seizure Occurrence Period, after a certain time horizon that allows for intervention, which is referred to as the Seizure Prediction Horizon [13]. The results of this approach are not always satisfactory, given the high complexity of the neurological phenomenon under analysis [14,15].

On the other hand, seizure forecasting, which is a new development in EEG analysis, takes a probabilistic approach, in which the patient is not alerted to an imminent seizure but instead is provided a constant analysis of seizure likelihood. This method identifies states of low, moderate, and high risk, continuously conveying this information to the user [16,17].

The main difference between seizure prediction and forecasting lies in the type of output: prediction issues a deterministic alert for a specific event, whereas forecasting evaluates the likelihood of an event over time, offering a continuous risk assessment without generating specific alerts.

Seizure prediction has been studied recently in many works, but most of the existing works that rely on EEG data analysis concern seizure detection [18,19]. The main goal of a seizure detection model is to accurately identify the occurrence of seizures in real-time or from recorded data. This involves distinguishing seizure activities (ictal phase) from non-seizure activities (pre-ictal or inter-ictal phases) within the brain’s electrical signals. The detection model aims to enable timely intervention, improve patient management, and reduce the risk associated with unattended seizures.

4.2. Other Investigations

Other types of analyses can be conducted on datasets obtained from the EEG signals of epileptic patients. An unsupervised clustering-based approach could be useful, allowing us to group patients with respect to some of their characteristics [20,21]. In more detail, meta-characteristics, such as statistical features or complexity measures, can be calculated for each patient from the datasets with the extracted features. This new dataset, which may be referred to as a meta-set, contains 20 records, with 1 record per patient, and is the basis for new analyses such as a cluster analysis, with the aim of determining similar patients considering epileptic characteristics. This newly extracted knowledge is useful for directing new pharmacological therapies to groups of patients and not just to a single patient.
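
One possible way to build such a meta-set is sketched below in Python. The choice of mean and standard deviation as meta-characteristics, the record layout, and the function name are illustrative assumptions, not a description of an analysis we performed.

```python
from statistics import mean, pstdev


def meta_record(patient_id, feature_columns):
    """Collapse one patient's feature columns into a single summary record.

    `feature_columns` maps a feature name to the list of its values over all windows.
    """
    record = {"patient": patient_id}
    for name, values in feature_columns.items():
        record[f"{name}_mean"] = mean(values)
        record[f"{name}_std"] = pstdev(values)  # population standard deviation
    return record


# Made-up values for a single feature of a single patient.
record = meta_record("Pat001", {"E1B8SDMU": [1.0, 2.0, 3.0]})
print(record)
```

Applying such a function to each of the 20 files would yield the 20-record meta-set, which could then be passed to a clustering algorithm.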

Although obtaining a single detection, prediction, or forecasting model for all patients is complex, it may be simpler and more efficient to develop one for a group of similar patients.

5. Conclusions

In this study, our objectives are to describe and provide sets of data obtained from the processing and analysis of neurological data related to epileptic patients, leveraging advanced EEG signal preprocessing and feature extraction techniques.

The training datasets, generated through the TrB tool via two successive steps of signal filtering and feature extraction, are useful for subsequent investigations and modeling, for instance, using DS methodologies.

Future steps may include integrating additional algorithms for Univariate and Bivariate feature calculations into the TrB tool. We also plan to enable the provision of the algorithm’s code at runtime, which is particularly feasible when using runtime-interpreted programming languages like Python or Matlab.

A further direction is data visualization, which would allow the graphical exploration of both raw and processed time series.

6. User Notes

The datasets discussed in this study, in whole or in part, served as the training sets for developing the detection models described in [4]. We would like to highlight the adaptability of our proposed TrB tool, which allows for the extraction of more detailed data by adjusting its parameters, like the windowing temporal parameters L and S. Should users need more specialized data, we are open to providing them upon request, underlining our commitment to supporting collaborative research in the field of Data Science for epilepsy analysis.

Author Contributions

Conceptualization, G.Z. and L.P.; methodology, G.Z. and A.M.; software, A.M.; validation, C.R.; resources, L.P. and A.M.; data curation, G.Z., A.M. and C.R.; writing—original draft preparation, C.R. and G.Z.; writing—review and editing, C.R. and L.P.; supervision, G.Z. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The training datasets derive from the EEG Database provided by Epilepsy Center Freiburg as well as the Freiburg Center for Data Analysis and Modelling. Authors obtained prior written consent from the Epilepsy Center Freiburg to publish these data or to use them for publication (agreement was signed by one of the authors, L.P., on 16 March 2012).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available via Zenodo at https://doi.org/10.5281/zenodo.10808054 (accessed on 12 March 2024).

Acknowledgments

The authors wish to thank the Epilepsy Center Freiburg as well as the Freiburg Center for Data Analysis and Modelling. They would also like to mention the Big Data Facility project, funded by the Italian PRORA, for which our tool was designed and developed and data analysis was performed. G.Z. and A.M. would like to thank Roberto V. Montaquila of CIRA for his support in the development of signal pre-processing algorithms.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence.
DB	Database.
DS	Data Science.
EEG	Electroencephalography.
INTER	Inter-ictal.
PRE	Pre-ictal.
ML	Machine Learning.
POST	Post-ictal.
SW	Software.
TrB	Training Builder.
TSDB	Time Series DB.

References

1. FSPEEG Website, Seizure Prediction Project Freiburg, University of Freiburg. Available online: http://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database (accessed on 1 March 2024).
2. Quercia, A.; Frick, T.; Egli, F.E.; Pullen, N.; Dupanloup, I.; Tang, J.; Asif, U.; Harrer, S.; Brunschwiler, T. Preictal Onset Detection through Unsupervised Clustering for Epileptic Seizure Prediction. In Proceedings of the 2021 IEEE International Conference on Digital Health (ICDH), Chicago, IL, USA, 5–11 September 2021; pp. 142–147.
3. Vroomen, P. Postictal Paresis in Focal Epilepsies—Incidence, Duration, and Causes. Neurology 2005, 64, 580.
4. Zazzaro, G.; Pavone, L. Machine Learning Characterization of Ictal and Interictal States in EEG Aimed at Automated Seizure Detection. Biomedicines 2022, 10, 1491.
5. Zazzaro, G.; Cuomo, S.; Martone, A.; Montaquila, R.V.; Toraldo, G.; Pavone, L. EEG Signal Analysis for Epileptic Seizures Detection by Applying Data Mining Techniques. Internet Things 2021, 14, 100048.
6. Pafferi, F.; Zazzaro, G.; Martone, A.; Bifulco, P.; Pavone, L. Temporal Analysis for Epileptic Seizure Detection by Using Data Mining Approach. In Proceedings of the 2020 IEEE 22nd International Conference on High Performance Computing and Communications, Yanuca Island, Cuvu, Fiji, 14–16 December 2020; pp. 1356–1363.
7. Zazzaro, G.; Martone, A.; Montaquila, R.V.; Pavone, L. From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks. Int. J. Med. Med. Sci. 2019, 12, 8.
8. Martone, A.; Zazzaro, G.; Pavone, L. A Feature Extraction Framework for Time Series Analysis. An Application for EEG Signal Processing for Epileptic Seizures Detection. In Proceedings of the ALLDATA 2019, the 5th International Conference on Big Data, Small Data, Linked Data and Open Data, Valencia, Spain, 24–28 March 2019.
9. Yuan, S.; Mu, J.; Zhou, W.; Dai, L.Y.; Liu, J.X.; Wang, J.; Liu, X. Automatic Epileptic Seizure Detection Using Graph-Regularized Non-Negative Matrix Factorization and Kernel-Based Robust Probabilistic Collaborative Representation. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 2641–2650.
10. Yang, X.; Zhao, J.; Sun, Q.; Lu, J.; Ma, X. An Effective Dual Self-Attention Residual Network for Seizure Prediction. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 1604–1613.
11. Rasheed, K.; Qayyum, A.; Qadir, J.; Sivathamboo, S.; Kwan, P.; Kuhlmann, L.; O’Brien, T.; Razi, A. Machine Learning for Predicting Epileptic Seizures Using EEG Signals: A Review. IEEE Rev. Biomed. Eng. 2020, 14, 139–155.
12. Neto, A.J.V.; Silva, L.; Moioli, R.; Brasil, F.; Rodrigues, J. Predicting Epileptic Seizures: Case Studies Harnessing Machine Learning. In Proceedings of the ICC 2020-2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
13. Maiwald, T.; Winterhalder, M.; Aschenbrenner-Scheibe, R.; Voss, H.U.; Schulze-Bonhage, A.; Timmer, J. Comparison of Three Nonlinear Seizure Prediction Methods by Means of the Seizure Prediction Characteristic. Phys. D Nonlinear Phenom. 2004, 194, 357–368.
14. Assi, E.; Nguyen, D.; Rihana, S.; Sawan, M. Towards Accurate Prediction of Epileptic Seizures: A Review. Biomed. Signal Process. Control 2017, 34, 144–157.
15. Korshunova, I.; Kindermans, P.; Degrave, J.; Verhoeven, T.; Brinkmann, B.; Dambre, J. Towards Improved Design and Evaluation of Epileptic Seizure Predictors. IEEE Trans. Biomed. Eng. 2018, 65, 502–510.
16. Stirling, R.; Cook, M.; Grayden, D.; Karoly, P.J. Seizure forecasting and cyclic control of seizures. Epilepsia 2020, 62, S14–S23.
17. Budde, B.; Maksimenko, V.; Sarink, K.; Seidenbecher, T.; van Luijtelaar, G.; Hahn, T.; Pape, H.; Lüttjohann, A. Seizure Prediction in Genetic Rat Models of Absence Epilepsy: Improved Performance through Multiple-Site Cortico-Thalamic Recordings Combined with Machine Learning. eNeuro 2021, 9, ENEURO.0160-21.2021.
18. Shoeibi, A.; Khodatars, M.; Ghassemi, N.; Jafari, M.; Moridian, P.; Alizadehsani, R.; Panahiazar, M.; Khozeimeh, F.; Zare, A.; Hosseini-Nejad, H.; et al. Epileptic Seizures Detection Using Deep Learning Techniques: A Review. Int. J. Environ. Res. Public Health 2021, 18, 5780.
19. Farooq, M.S.; Zulfiqar, A.; Riaz, S. Epileptic Seizure Detection Using Machine Learning: Taxonomy, Opportunities, and Challenges. Diagnostics 2023, 13, 1058.
20. Prince, P.; Premalatha, J.; Marshiana, D.; Raghavan, K.; Kumar, S. Application of Clustering Techniques on Statistical Features of EEG Signals for Seizure Detection. Indian J. Public Health Res. Dev. 2019, 10, 1384.
21. Bhattacharya, S.; Bennett, A.; Alba, C.; Kriukova, K.; Duncan, D. Unsupervised Seizure Detection in EEG Using Long Short Term Memory Network and Clustering. In Proceedings of the 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), Rome, Italy, 17–20 September 2023; pp. 1–6.

Table 1. Distinct states of epileptic seizures.

State	Description	Abbreviation
Pre-ictal	This state occurs before the onset of a seizure, without a standard duration due to the unclear starting point.	PRE
Ictal	This state starts with the onset of the seizure and concludes with the end of the seizure.	IKTAL
Post-ictal	This state begins immediately after the ictal phase.	POST
Inter-ictal	This state occurs after the post-ictal phase and concludes before the onset of the pre-ictal state of a subsequent seizure.	INTER

Table 2. Related Works.

Title	Goals	Patients	Window Parameters *	Modeling Method	Performance Metrics **
ML Characterization of Ictal and Interictal States in EEG Aimed at Automated Seizure Detection [4]	Seizure Detection, False Alarms Reduction, Model Portability	3, 4, 11, 13, 17 ***, 19, 21	$L = 2$ , $S = 1$	k-Nearest Neighbors (k-NN)	CA ⁺ = 99.89%, TPR ⁺⁺ = 93.68%, TNR ⁺⁺⁺ = 99.89% on Interictal
EEG Signal Analysis for Epileptic Seizures Detection by Applying Data Mining Techniques [5]	Seizure Detection	16	$L = 5$ , $S = 1$	Support Vector Machine (SVM)	CA = 99.63%, TPR = 99.6%
Temporal Analysis for Epileptic Seizure Detection Using Data Mining Approach [6]	Temporal Analysis, Seizure Detection	9	$L = 1, 2, 4, 10$ , $S = 1, 2, 4, 5, 10$	Bayesian Networks (BayesNet)	TPR = 100%, TNR = 100%
From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks [7]	Seizure Detection	3	$L = 5$ , $S = 1$	Multi Layer Perceptron (MLP)	CA = 99.99%, TPR = 99.5%
A Feature Extraction Framework for Time Series Analysis [8]	Training Builder Description, Seizure Detection	16	$L = 5$ , $S = 1$	Multi-Layer Perceptron (MLP)	CA = 99.27%, TPR = 95%

* See Section 3 for further details. ** The definitions of the performance metrics can be found in [4]. *** Training Patient, ⁺ Classification Accuracy, ⁺⁺ True Positive Rate, ⁺⁺⁺ True Negative Rate.
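
For readers unfamiliar with the metrics in Table 2, the Python sketch below computes them from confusion-matrix counts using the standard textbook definitions; the precise definitions used in the cited studies are those given in [4]. The counts in the example are invented for illustration.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard definitions of CA, TPR, and TNR from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "CA": (tp + tn) / total,   # Classification Accuracy
        "TPR": tp / (tp + fn),     # True Positive Rate (sensitivity)
        "TNR": tn / (tn + fp),     # True Negative Rate (specificity)
    }


# Made-up counts: 100 ictal windows and 900 non-ictal windows.
print(detection_metrics(tp=90, fp=20, tn=880, fn=10))
```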

Table 3. Metadata exploration and data insights of the Freiburg EEG database.

Category	Description
Patient Demographics	Age, gender, history, epilepsy type, seizure focus
Clinical Information	Seizure frequency, semiology, medications, imaging
Implant Details	Electrode type, location, brain coverage
Seizure Annotations	Labeled seizure events with onset and duration
Interictal Annotations	Annotations of seizure-free periods
Dataset Organization	Segregated into ictal and inter-ictal for focused research

Table 4. Extracted Features.

Id	Name	Code	UB
1	Standard Deviation	SD	U
2	Kurtosis	KU	U
3	Hjorth Mobility	HM	U
4	Shannon Entropy	SH	U
5	Log-Energy Entropy	LE	U
6	Kolmogorov Complexity	KC	U
7	Upper Limit Lempel–Ziv Complexity	LU	U
8	Lower Limit Lempel–Ziv Complexity	LL	U
9	Peak Displacement	PD	U
10	Predominant Period	PP	U
11	Averaged Period	AP	U
12	Squared Grade	SG	U
13	Squared Time to Peak	SP	U
14	Inverted Time to Peak	IP	U
15	Conditional Entropy	CE	B
16	Joint Entropy	JE	B
17	Mutual Information	MI	B
18	Cross Correlation Index	CC	B
19	Euclidean Distance	ED	B
20	Levenshtein Distance	LD	B
21	Dynamic Time Warping	DT	B
22	Longest Common Sub-Sequence	LC	B

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
