Neuromorphic Architecture Accelerated Automated Seizure Detection in Multi-Channel Scalp EEG

Epileptic focal seizures can be localized in the brain using tracer injections during or immediately after a seizure. A real-time automated seizure detection system with minimal latency can help time the injection properly so that the seizure origin is found accurately. Reliable real-time seizure detection systems have not yet been reported clinically. We developed an anomaly detection-based automated seizure detection system that uses scalp electroencephalogram (EEG) data and can be trained using a few seizure sessions, and we implemented it on commercially available hardware with a parallel, neuromorphic architecture, the NeuroStack. We extracted nonlinear, statistical, and discrete wavelet decomposition features, and we developed a graphical user interface and applied traditional feature selection methods to select the most discriminative features. We investigated Reduced Coulomb Energy (RCE) networks and k-Nearest Neighbors (k-NN) for their advantages, such as fast learning and the absence of a local minima problem. We obtained a maximum sensitivity of 91.14% ± 1.77% and a specificity of 98.77% ± 0.57% with a 5 s epoch duration. The system's latency was 12 s, which is within most seizure event windows, which last 60 s on average. Our results showed that the correlation dimension (CD) feature consumes large computational resources; excluding it can reduce the latency to 3.6 s, but at the cost of lower performance (80% sensitivity and 97% specificity). We demonstrated that the proposed methodology achieves a high specificity and an acceptable sensitivity within a short delay. Our results also indicated that individual-based RCE networks are superior to population-based RCE networks. The proposed RCE networks were compared with SVM and ANN baselines, as these are the most common machine learning methods for seizure detection. The SVM- and ANN-based systems were trained on the same data as RCE and k-NN, with features optimized specifically for them. In these comparisons, RCE networks were superior to SVM and ANN.
The proposed model also achieves comparable performance to the state-of-the-art deep learning techniques while not requiring a sizeable database, which is often expensive to build. These numbers indicate that the system is viable as a trigger mechanism for tracer injection.


Introduction
Epilepsy is one of the most common neurological disorders, affecting up to one percent of the population worldwide and almost two million people in the United States alone [1]. Up to 30% of epilepsy patients experience medically refractory recurrent seizures [2] that do not respond to anti-seizure medication. In patients presenting with medically intractable seizures, complete surgical resection of the epileptic zone may be curative, offering the best long-term prognosis, with either complete absence of seizures or partial response to surgery with decreased seizure frequency and/or decreased use of anti-epileptic medication.
Presurgical evaluation entails extensive workup, including clinical workup, interictal (between seizures) scalp EEG, ictal (during seizures) video EEG monitoring, and neuropsychological testing; in addition, patients undergo morphologic (MRI, CT) and functional (interictal PET and ictal single photon emission computed tomography (ictal-SPECT)) multimodality imaging [3]. Usually, patients are offered neurosurgical options if the clinical presentation, ictal-interictal EEG, and imaging features are concordant for localization of the seizure focus. Often, despite extensive presurgical workup and imaging, the data are either discordant or inconclusive; in this large subset of patients, ictal-SPECT, which demonstrates areas of acute ictal hyperperfusion (enhanced perfusion during seizures), is often helpful for localizing seizures and their phases [3]. Ictal-SPECT imaging is instrumental in identifying non-lesional intractable seizures and in evaluating pediatric patients.
Seizures are known to propagate rapidly to the ipsilateral and contralateral cortex, especially in extratemporal foci compared to temporal foci. This propagation is very rapid and often diffuse. Since blood flow follows electrical activity [4], it is imperative to inject the perfusion tracers as soon as the onset of seizures is observed on EEG and/or video monitoring. Hence, to obtain an accurate ictal-SPECT scan, the elapsed time from seizure onset to tracer injection is critical and must be as short as possible [5]. The reliability of tracer injection for seizure localization improves significantly as the elapsed period from seizure onset to tracer injection shortens; early radiotracer injection has been considered the most critical factor for seizure localization. Pastor et al. [6] and Setoain et al. [7] reported improved seizure localization using automated tracer injection (average of 33 s; range: 19-63 s; p < 0.05) compared to manual injections (average of 41 s; range: 14-103 s; p = 0.14), with successful localization of the seizure focus in 21 of the 27 patients (78%) by the automated technique as opposed to 19 of the 29 patients (65%) by the manual technique. Ho et al. [8] have documented the different cerebral perfusion patterns in temporal lobe seizures during ictal and periictal phases. Delayed injections lead to diffuse or multiple foci of hyperperfusion on ictal-SPECT, thus invalidating the procedure.
Automated seizure detection on ictal EEG has been attempted for more than four decades. After preprocessing the EEG signal for noise and artifact removal, different techniques have been used for the detection task, including rule-based wavelet and spectral analysis, artificial neural networks (ANN), and support vector machines (SVM) [9][10][11] (Table 1). Research in neurostimulation and automated drug delivery systems has further grown this field, and ANN and SVM are emerging as the front-runner classifiers in automated systems [12][13][14]. Though the reported detection accuracies of various techniques have been impressive, reaching 90% or more, these results are based on well-defined and cleansed samples and are often obtained off-field in the laboratory [9]. When deployed in a real-world clinical setting, the accuracies can drop significantly. Currently, neural networks are software emulations and are computationally intensive, so a finite time elapses while the input data streams are processed; this temporal delay is well known to increase exponentially with the volume and complexity of the incoming data streams. It has also been noted in several studies [15][16][17][18][19] that individual-based systems perform better than a generalized system because of the significant inter-individual variance of epileptic signals and their generally random nature. When deployed in real-world settings, these systems generally have far less patient-specific ictal (seizure) EEG data than interictal (normal) data.
Traditional methods for seizure detection, such as ANN, need large amounts of training data for acceptable performance. It has also been shown that ANN requires four-fold more computational power than SVM. Wang et al. [20] proposed a random forest with grid search optimization. In addition, most studies reporting classification results with these machine learning (ML) models use a large database, such as the CHB-MIT scalp EEG Database, for training and reporting the model metrics [10,21]. Typically, if we take a few EEG sessions for training and aim to perform the SPECT injection in the subsequent few sessions, we have a substantial amount of normal data but very little seizure data. In our clinical recordings, a session contained 4 h of normal data and 71 s of seizure data on average. Many adaptive pattern classifiers have been developed to provide high-performance, real-time responses with real-world data. Much recent emphasis has been placed on deep learning, but numerous other classifiers have been developed. These include decision trees, Boltzmann machines, RCE networks, feature-map, LVQ, high-order networks, radial basis function classifiers, and modified nearest neighbor approaches, to name a few. These classifiers provide trade-offs in memory and computation requirements, training complexity, and ease of implementation and adaptation. k-NN methods allow reduced error rates. For instance, several studies have demonstrated that k-NN classifiers, which train rapidly but require large amounts of memory and computation, sometimes perform as well as back-propagation classifiers, which are more complex to train but require less memory. Decision trees, which have small memory and computation requirements, often perform as well as more complex back-propagation classifiers but are more prone to over-fitting. Radial basis function classifiers require intermediate amounts of memory and training time.
RCE networks require less memory than k-nearest neighbor classifiers and adapt their structure over time using simple adaptation rules that recruit new nodes to match the complexity of the classifier to that of the training data. It was reported in [22] that RCE networks adapt faster and require fewer exemplar nodes than nearest neighbor classifiers: more nodes are recruited only if needed to generate more complex decision regions, and the size of the hyperspheres formed by existing nodes is modified during adaptation. It has been demonstrated, both theoretically and experimentally, that RCE networks form complex decision regions rapidly. They can be trained to solve many problems more than an order of magnitude faster than back-propagation classifiers. RCE networks are currently being applied to many real-world problems for real-time execution, due to their fast learning and the absence of local minima.
Recent advances in machine learning science and deep learning techniques have shown their superiority for learning very robust seizure representation features. For example, artificial neural networks (ANNs) were used to detect seizures after using traditional feature extraction techniques. Some researchers have used semi-supervised deep learning strategies for epileptic EEG classification. The most widely used method involves training a neural network in an unsupervised way using unlabeled data and then training it again in a supervised way using labeled data.
Several deep learning-based systems have been proposed to address the limitations of the classification schemes mentioned above [23][24][25][26]. For instance, Abdelhameed et al. [23] proposed a 2D supervised deep convolutional autoencoder (SDCAE) to automatically detect epileptic seizures in multichannel EEG recordings. They showed that deep learning could achieve 98% detection accuracy with high sensitivity. The computational training and testing times of these models were not reported. Although deep learning approaches seem attractive, they require a sizeable database, which is not always available. Furthermore, deep learning requires specific hardware for faster training, and building large comprehensive datasets is tedious and expensive. Additionally, the large volumes of continuous EEG recordings required for deep learning algorithms are scarce, which remains a significant limitation. Finally, substantial labor may be required to find the optimal network structure for a deep neural network. To the best of the authors' knowledge, few to no studies have examined the use of machine learning for automatic seizure detection with experimental implementation on hardware. We chose a hardware implementation over a software implementation because dedicated hardware provides real-time and faster processing compared with general software [27].
We identified k-Nearest Neighbors (k-NN) and Reduced Coulomb Energy (RCE) networks for this task [28]. Wang et al. [11,29] reported high accuracies using k-NN and SVM, respectively. Shoka et al. [30] developed an automatic seizure diagnosis method based on channel selection and tested several machine learning techniques, such as SVM, ensemble decision trees, k-NN, LDA, logistic regression, decision trees, and naive Bayes. These algorithms showed 80% accuracy on unfiltered data, and filtering the data improved the detection by 1% to 2%. Rivero et al. [31] also reported high accuracy using k-NN. The choice of these algorithms was also motivated by the commercial availability of specialized hardware tailored for implementing them. Based on a neuromorphic architecture [32], this hardware has been engineered to improve the accuracy of pattern recognition and, more importantly, to decrease the elapsed time between signal input and the output of results, and it has been used recently by many researchers [33]. We used the NeuroStack [34] board from General Vision (Petaluma, CA, USA) for our application; it has multiple neuromorphic chips and allows multiple boards to be daisy-chained, significantly increasing its capacity for pattern learning. The NeuroStack has an onboard FPGA for digital signal processing operations. The FPGA has a parallel architecture with multiple processing elements, which can be used to implement a high-throughput map-reduce framework to speed up the preprocessing operations on multiple EEG channels.
The major contributions of this study are summarized as follows:
• We developed a clinical dataset that consists of 205 EEG recordings collected from 45 patients, with an average of 7 h and 35 min of normal brain activity and 5 min 11 s of seizure activity per patient;
• We demonstrated that traditional k-NN and RCE could achieve high seizure identification accuracy with acceptable sensitivity (91.14%) and high specificity (98.77%), achieving comparable performance to support vector machine, ANN, and deep learning. We did not directly compare the proposed technique to deep learning on the same datasets because the hardware used in this study does not support deep learning, but we could obtain results from recent studies and surveys. The results show that machine learning can be used when data and computing resources are limited, which is often the case. Another advantage of traditional machine learning over deep learning is that it eliminates lengthy labeling tasks;
• We investigated several types of features, such as nonlinear features (sample entropy and correlation dimension) and first- and second-order statistical features. We also explored several feature selection methods: mutual information-based feature selection, chi-squared score-based feature selection, ANOVA F-value, and Recursive Feature Elimination. We showed that well-engineered features could help machine learning achieve high accuracy while supporting real-time seizure detection. We showed that a latency as small as 3.6 s can be achieved;
• In comparing the proposed method to other state-of-the-art machine learning methods, we showed that the proposed methodology is superior to SVM and ANN, which are the most widely used algorithms in seizure detection. Because of the limited training dataset, we only employed a 4-layer neural network. Increasing the depth of the network requires more training data, which we did not have;
• We developed a graphical user interface that can help epileptologists apply their expertise and spend less time on labeling.
The remaining sections of this paper are outlined as follows: Section 2 describes data collection, feature extraction, and feature selection, as well as the ML methods used for classification. Section 3 describes the experimental setup, experiments, training, and evaluation metrics. It also presents examples of results and their analysis. Section 4 discusses the results in the context of related and state-of-the-art techniques. Section 5 summarizes the main findings of this study and concludes the paper.

Data Collection
We used archived scalp-EEG data collected over two years at the King Fahad Medical City (KFMC), Riyadh, Saudi Arabia. This study utilized archived clinical data that was approved as an exempt study by the institutional review board (IRB) of KFMC.
Inclusion criteria: all adult patients with suspected focal intractable seizures admitted to the video-EEG monitoring suite for seizure localization.
Exclusion criteria: all pediatric patients with suspected focal intractable seizures.
Recordings: all EEG recordings were made in the video-EEG monitoring suite. Each recording had 21 channels, captured using the 10-20 electrode system [42] at a sampling rate of 500 Hz. The dataset comprised 205 EEG recordings from 45 unique individuals. The data was annotated by the epileptologists at KFMC. Figure 1 gives a graphical overview of the seizure data distribution among all the individuals. Over the 205 recordings, the average seizure duration was 1 min 11 s, and the average duration of normal activity (up to the seizure onset) was 1 h 35 min. In some recordings, the time from the start of the EEG recording to seizure onset was short, since some patients had a history of persistent seizures and presented with seizure activity immediately after the recording started. Over the 45 subjects who make up the 205 recordings, the average seizure duration was 5 min 11 s, and the median duration was 2 min. The average interictal duration was 7 h and 35 min, and the median duration was 1 h 33 min.

Preprocessing
EEG signals are often contaminated with artifacts. For instance, eye blinks and eyeball movements generate electrical signals, which are collectively known as ocular artifacts (OA) [43]. Other artifacts include muscle artifacts, cardiac artifacts, and extrinsic artifacts [44]. A variety of techniques have been proposed in the literature to remove these artifacts, and they can be broadly classified into two categories. The first category estimates the artifactual signals using a reference channel, whereas the second decomposes EEG signals into other domains. Techniques of the latter category include regression, blind source separation (BSS), empirical-mode decomposition (EMD), discrete wavelet decomposition (DWD) [43,45,46,47], and hybrid methods. A complete review of these methods can be found in [44]. In this paper, EEG signals were preprocessed before feature extraction. We focused on wavelet decomposition, which has several advantages over the alternatives: it supports automatic processing, can be performed on a single channel, and is versatile in attenuating artifacts [44]. After decomposing the EEG data using the wavelet transform, thresholding was applied to discard the signal components that contained artifacts. Windowing was applied in this preprocessing, with a window equal to the epoch length (Figure 2); that is, the EEG signals were windowed into fixed-length epochs, with epoch lengths varying from 1 s to 7 s. The preprocessing removed low-frequency artifacts. The mathematical model of the DWD is described in detail in the following subsections.

Feature Extraction
We used traditional machine learning methods for this real-time application instead of deep learning methods, since the traditional methods have simple hardware and computational requirements. Hence, there was a need for low-level feature extraction. The raw data was windowed into fixed-length epochs, and features were extracted from each epoch. Since the raw data had 21 channels, each epoch had 21 channels as well. We extracted nonlinear dynamical features and statistical features from the EEG data.
Nonlinear features: We extracted nonlinear features based on chaos theory, which have been proven to represent brain activity well [1,21,48,49]. These nonlinear features were sample entropy (SampEn) and correlation dimension (CD).
The sample entropy of a time series is the negative natural logarithm of the conditional probability that two sequences that are similar for m points remain similar at the next point, where self-matches are not included in calculating the probability. SampEn [50] can be computed as:
$$\mathrm{SampEn}(m, r, n) = -\ln\left[\frac{A^{m}(r)}{B^{m}(r)}\right]$$
where m is the embedding dimension (length of vectors to compare), r is the tolerance value, and n is the original data length. We used an embedding dimension of 2, which has been shown to be appropriate for small datasets [51]. Here, $A^{m}(r)$ denotes the probability that two sequences match for m + 1 points, and $B^{m}(r)$ denotes the probability that two sequences match for m points [50]. Entropy is a concept handling predictability and randomness, with higher values of entropy related to less system order and more randomness [52]. In the event of a seizure, the EEG signal on certain channels shows more randomness, suggesting a higher value of SampEn for seizure epochs than for epochs with normal brain activity. SampEn is calculated using the algorithm proposed by Richman et al. [51].
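As an illustration of the definition above, the following is a minimal sketch of a direct sample entropy computation. It is not the optimized routine used in this work: the Chebyshev distance, the default tolerance of 0.2 times the standard deviation, and the function name `sample_entropy` are assumptions chosen for the example.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Direct SampEn estimate: -ln(A / B), self-matches excluded."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * x.std()  # assumed tolerance; a common convention

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to every later template (no self-match).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # pairs that match for m points
    a = count_matches(m + 1)  # pairs that still match for m + 1 points
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return float(-np.log(a / b))

# A regular signal should score lower than white noise.
t = np.linspace(0, 8 * np.pi, 300)
regular = sample_entropy(np.sin(t))
noisy = sample_entropy(np.random.default_rng(0).normal(size=300))
```

The quadratic cost of the pairwise comparison is acceptable for a single short epoch but grows quickly with epoch length.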
The CD of a set of points measures the space dimensionality occupied by these points. It determines if a seemingly random time-series signal was truly random or generated by a nonlinear dynamical deterministic system. A truly random signal cannot be embedded in a smaller dimension than the embedding dimension, while a signal generated by a nonlinear dynamical system can be embedded within a smaller dimension space. It has been observed that seizure data has smaller CD compared to normal data. As a result, one can conclude that any discernible randomness in normal data is likely due to random noise, whereas the randomness in seizure data is due to seizure generating mechanisms in the brain. The CD is calculated using the Grassberger-Procaccia algorithm [53].
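A rough sketch of a Grassberger-Procaccia style estimate is given below for illustration. The time-delay embedding parameters (`dim`, `delay`), the normalization over distinct pairs, and the slope fit over a hand-picked range of radii are all assumptions of this example rather than settings reported in this work.

```python
import numpy as np

def correlation_sum(x, r, dim=3, delay=1):
    """Fraction of distinct embedded point pairs that are closer than r."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (dim - 1) * delay
    # Time-delay embedding: row k is (x[k], x[k + delay], ...).
    emb = np.column_stack([x[i * delay:i * delay + n_vec] for i in range(dim)])
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(n_vec, k=1)  # each pair once, no self-pairs
    return float(np.mean(dist[upper] < r))

def correlation_dimension(x, radii, dim=3, delay=1):
    """Slope of log C(r) versus log r over the supplied radii."""
    radii = np.asarray(radii, dtype=float)
    c = np.array([correlation_sum(x, r, dim, delay) for r in radii])
    keep = c > 0  # log is undefined where no pairs fall within r
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return float(slope)

# Points on a straight line occupy one dimension, whatever the embedding.
line = np.linspace(0.0, 1.0, 400)
cd_line = correlation_dimension(line, np.logspace(-2, -1, 8))
```

The all-pairs distance matrix makes the cost quadratic in the epoch length, which is consistent with the observation that CD dominates the computation time.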
For a time series given by $ts = (x_1, x_2, \ldots, x_n)$, the CD can be computed as [54]
$$CD = \lim_{r \to 0} \frac{\log C_r}{\log r}$$
where $C_r$ is given by [54]
$$C_r = \lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \Theta\left(r - |x_i - x_j|\right)$$
and where $\Theta$ denotes the Heaviside function.
Statistical features: Statistical features were extracted from the sub-signals obtained from discrete wavelet decomposition (DWD) of each channel of the EEG signal. DWD has been successfully used for extracting features in multiple studies with EEG data [10,55,56]. DWD provides a high-resolution signal at each analysis scale while not compromising the temporal resolution.
DWD is based on the discrete wavelet transform. A wavelet is an oscillating function that diminishes rapidly. In continuous wavelet analysis, the signal is split into scaled and translated versions $\psi_{a,b}(t)$ of a single function $\psi$, termed the mother wavelet [57]:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\left(\frac{t - b}{a}\right)$$
where a and b are the scale and translation parameters, respectively. DWD is obtained by discretizing the scale and translation parameters. Several mother wavelet functions can be used [45][46][47]. Each epoch was decomposed into seven sub-signals using a six-level DWD.
Six-level DWD has been shown to be appropriate for feature representation in the time and frequency domains [58,59]. In addition, inspired by the work of Liu et al. [39], we conducted wavelet decomposition of the EEG with five scales and selected a few of the resulting frequency bands for subsequent processing. Table 2 gives the frequency composition of each sub-signal represented by the corresponding detail or approximation coefficients.
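The nominal frequency band of each sub-signal follows from the dyadic structure of the decomposition. The short sketch below assumes ideal half-band splitting at every level and the 500 Hz sampling rate of the recordings; the function name `dwd_bands` and the labels D1 to D6 and A6 are illustrative.

```python
def dwd_bands(fs, levels):
    """Return (name, low_hz, high_hz) for each sub-signal of a dyadic DWD."""
    nyquist = fs / 2.0
    bands = []
    for level in range(1, levels + 1):
        # Detail coefficients at this level keep the upper half of the
        # band that was passed down from the previous level.
        high = nyquist / 2 ** (level - 1)
        low = nyquist / 2 ** level
        bands.append((f"D{level}", low, high))
    # The final approximation keeps everything below the last split.
    bands.append((f"A{levels}", 0.0, nyquist / 2 ** levels))
    return bands

bands = dwd_bands(fs=500, levels=6)
for name, low, high in bands:
    print(f"{name}: {low:g} to {high:g} Hz")
```

With these assumptions, a six-level decomposition yields seven sub-signals, and the lowest one spans 0 to 3.90625 Hz.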

Feature Selection
We developed a simple graphical user interface (GUI) using Python programming language, which helped us visualize the seizure, non-seizure data, and the corresponding features. The initial features were selected such that there was a visible distinction in the features corresponding to normal and seizure data. The feature selection experiments were performed with epoch durations of 3 s, 5 s, and 7 s, which are commonly used sub-sample durations [56,60].
Features with low discriminative power (RMS, MA, and AM corresponding to the [0, 3.90625] Hz band) could be easily identified using the tool. This lack of visual distinction can be attributed to the DC content in this frequency band, which does not appear to change much between seizure and normal activity. We started with 23 features per channel and arrived at 20 using the visualization tool depicted in Figure 3. The following filter and wrapper feature selection methods were applied to this reduced set, each method resulting in a distinct optimal feature set.

• Mutual information-based feature selection: The features are ranked based on the mutual information between them and the target output. The mutual information is calculated using entropy estimation from k-Nearest Neighbor distances [61,62];
• Chi-squared score-based feature selection: The features are ranked based on the chi-squared value [63] between them and the target output. Since the chi-squared statistic measures the dependence between stochastic variables, it helps to remove the non-discriminative features, which are most likely independent of the target class;
• ANOVA F-Value: This method selects the features with the highest one-way analysis of variance (ANOVA) [64,65] score between the target output and each feature;
• Recursive Feature Elimination (RFE): This is a wrapper method where features are recursively eliminated until the performance of a classifier stops improving [66].
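The four selection methods listed above can be sketched with scikit-learn, which is used elsewhere in this work. The data below is synthetic, and the choices of three neighbors for the mutual information estimate, a linear SVC as the RFE estimator, and a fixed target of two features are assumptions made for the example; note that scikit-learn's `RFE` stops at a fixed feature count rather than when performance stops improving.

```python
import numpy as np
from sklearn.feature_selection import RFE, chi2, f_classif, mutual_info_classif
from sklearn.svm import SVC

# Synthetic stand-in for per-epoch features scaled to 0..255.
rng = np.random.default_rng(0)
n_epochs, n_features = 200, 6
y = np.repeat([0, 1], n_epochs // 2)  # 0 = normal, 1 = seizure
X = rng.uniform(0, 255, size=(n_epochs, n_features))
# Make feature 0 informative; the remaining features are pure noise.
X[:, 0] = np.clip(60 + 120 * y + rng.normal(0, 10, n_epochs), 0, 255)

# Filter methods: one score per feature, higher means more relevant.
mi_scores = mutual_info_classif(X, y, n_neighbors=3, random_state=0)
chi2_scores, _ = chi2(X, y)  # requires non-negative features
f_scores, _ = f_classif(X, y)

# Wrapper method: recursively drop the weakest feature of a linear SVM.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X / 255.0, y)

rankings = {
    "mutual_information": np.argsort(mi_scores)[::-1],
    "chi_squared": np.argsort(chi2_scores)[::-1],
    "anova_f": np.argsort(f_scores)[::-1],
}
```

Each entry of `rankings` lists feature indices from most to least important, and `rfe.support_` marks the features retained by the wrapper method.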

Machine Learning
Since our goal was to develop a real-time seizure detection system, the selected ML algorithms had to be simple in terms of hardware and power requirements [22]. Owing to the preponderance of normal data and the disproportionately small amount of seizure data, we used ML algorithms from the anomaly detection class. The term anomaly detection is used here because a seizure is short compared with the session recording. For this specific medical application, i.e., seizure focus localization, the true positive rate (rate of correct identification of seizure activity) should be as high as possible with the lowest possible false positive rate (rate of misclassification of normal activity as seizure activity). We used k-Nearest Neighbors (k-NN) and Reduced Coulomb Energy (RCE) networks for this task. Similar template-based classifiers were used by Qu et al. [16]. We compared the results of classifiers based on these algorithms with the baseline results obtained from classifiers based on the traditional ML algorithms SVM and ANN.
k-NN: k-NN is a non-parametric, memory-based classification algorithm. It commits all the training examples to memory and uses them as templates for classification. When a test example is presented, the algorithm computes the L2 distance to each of the saved templates. The k closest examples are then selected, and the test example is assigned to the class represented by the majority of the k examples. The algorithm has one hyperparameter: the number of nearest neighbors, k. As seen in Figure 4, a binary k-NN divides the feature space into two distinct regions corresponding to the two classes.
RCE: RCE improves the accuracy of k-NN by adding a distance threshold, which addresses one of the main deficiencies of k-NN: as can be seen in Figure 4, an input example always has neighbors and is therefore attributed to a class even when it is very distant from every saved example. Adding a distance threshold (Figure 5), called the influence field (IF), ensures that a class is assigned only to input examples that lie close to saved examples. Another advantage of RCE is that not all examples are committed to memory. The shape of the IF around a vector is defined by the distance metric used for the network. If a new training vector falls in the IF of a vector belonging to the other class, the IF of the existing vector is shrunk so that it no longer includes the new vector, and the new vector is assigned an IF that does not include the existing vector. A binary RCE network, in contrast to a k-NN, can output three classes: class 1, class 2, or Unknown. When deployed for classification, if the input vector falls within the influence field of either or both of the classes, it is classified into the closest class in terms of the network distance metric; if it falls outside every influence field, it is classified as Unknown. The latter can happen frequently with seizures, owing to their random nature: seizure patterns can vary significantly from one session to the other for the same individual [67].
SVM: SVM is a two-class classifier, which builds a nonlinear boundary between the two classes of interest in a multidimensional feature space [68]. We tested several kernel functions (radial basis function, polynomial, and linear kernels) and selected the linear kernel with automatic class weighting for this task. SVM was implemented using scikit-learn [69].
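To make the RCE training and classification rules described above concrete, the following toy software sketch mimics them with an L2 distance. It is an illustration only and does not reproduce the NeuroStack firmware: the class name `RCENetwork`, the `max_radius` parameter, and the tie handling are assumptions.

```python
import numpy as np

class RCENetwork:
    """Toy RCE classifier with an L2 influence field (IF) per prototype."""

    def __init__(self, max_radius):
        self.max_radius = float(max_radius)
        self.prototypes, self.labels, self.radii = [], [], []

    def _distances(self, x):
        return np.linalg.norm(np.asarray(self.prototypes) - x, axis=1)

    def train_one(self, x, label):
        x = np.asarray(x, dtype=float)
        radius, covered = self.max_radius, False
        if self.prototypes:
            for i, d in enumerate(self._distances(x)):
                d = float(d)
                if self.labels[i] == label:
                    covered = covered or d < self.radii[i]
                else:
                    # Shrink an opposing IF so that it excludes x, and cap
                    # the new IF so that it excludes the opposing prototype.
                    self.radii[i] = min(self.radii[i], d)
                    radius = min(radius, d)
        if not covered:
            # Only vectors not already covered by their class are stored.
            self.prototypes.append(x)
            self.labels.append(label)
            self.radii.append(radius)

    def fit(self, X, y, max_iter=20):
        # Repeat passes until the stored network stops changing.
        for _ in range(max_iter):
            before = (len(self.prototypes), tuple(self.radii))
            for x, label in zip(X, y):
                self.train_one(x, label)
            if before == (len(self.prototypes), tuple(self.radii)):
                break
        return self

    def predict_one(self, x):
        if not self.prototypes:
            return "unknown"
        d = self._distances(np.asarray(x, dtype=float))
        inside = np.where(d < np.asarray(self.radii))[0]
        if inside.size == 0:
            return "unknown"  # outside every influence field
        return self.labels[inside[np.argmin(d[inside])]]

X = [[0, 0], [1, 0], [10, 0], [11, 0]]
y = ["normal", "normal", "seizure", "seizure"]
net = RCENetwork(max_radius=6).fit(X, y)
wide = RCENetwork(max_radius=20).fit(X, y)  # overlapping IFs get shrunk
```

In this example only two of the four training vectors are stored, and with the larger initial radius the influence fields are shrunk during training until they no longer contain vectors of the other class.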
ANN: ANN, also known as a multi-layer perceptron, produces a mapping from the input space to the target space by optimizing the parameters of the network, i.e., the weights and biases connecting nodes in successive layers [70,71]. The ANN architecture was guided by the input size and the number of output classes. The LBFGS solver [72], which is suited for small ANNs trained on small datasets, was used for optimizing the log-loss function by adjusting the weights and biases. The Rectified Linear Unit (ReLU) activation function was used for all the nodes. The input layer has 210 nodes, followed by two hidden layers with 70 and 4 nodes, respectively, which are followed by the output layer with 2 nodes. ANN was also implemented using the scikit-learn Python library [69].
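The two baselines described above might be configured in scikit-learn roughly as follows. This is a sketch under assumptions: `class_weight="balanced"` is taken as the meaning of automatic class weighting, `max_iter` and `random_state` are arbitrary, and the training data is synthetic. scikit-learn's `MLPClassifier` also represents a binary output with a single logistic unit rather than two output nodes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Linear-kernel SVM; "balanced" reweights classes by inverse frequency.
svm = SVC(kernel="linear", class_weight="balanced")

# 210 inputs -> hidden layers of 70 and 4 ReLU nodes -> output layer.
ann = MLPClassifier(hidden_layer_sizes=(70, 4), activation="relu",
                    solver="lbfgs", max_iter=500, random_state=0)

# Synthetic stand-in: 30 seizure and 90 normal vectors of 210 features.
rng = np.random.default_rng(0)
y = np.array([1] * 30 + [0] * 90)
X = rng.normal(size=(120, 210)) + 0.5 * y[:, None]

svm.fit(X, y)
ann.fit(X, y)
svm_pred = svm.predict(X)
ann_pred = ann.predict(X)
```

The input size of 210 is inferred from the data at fit time, so it does not appear in the constructor.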

Experimental Setup
As depicted in Figure 2, the raw EEG data was converted into a series of feature vectors through preprocessing and feature extraction. In the preprocessing stage, the raw EEG data was windowed into 5 s long epochs with no overlap between successive epochs. In the feature extraction stage, CD and SampEn were calculated on the preprocessed data per epoch for each of the 21 channels. The remaining features were computed from the DWD coefficients. For each session, we had a set of seizure vectors and a set of normal vectors. Since there was an overwhelming amount of normal data, we randomly chose a subset of normal vectors, three times the size of the seizure set, for our experiments. Since we had hundreds of hours of data and 21 channels per epoch, we implemented a map-reduce framework using Python's multiprocessing module to make use of multiple processors, which sped up the feature extraction by a factor of 10. This framework can be translated into hardware for real-time implementation using an FPGA. For our work, the feature vectors were stored in HDF5 format [73], which optimized memory utilization and execution speed.
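The epoching and normal-data subsampling steps described above can be sketched as follows. The function names, the array layout of (channels, samples), and the fixed random seed are assumptions made for illustration.

```python
import numpy as np

def to_epochs(eeg, fs=500, epoch_s=5):
    """Split (channels, samples) EEG into non-overlapping epochs.

    Returns an array of shape (n_epochs, channels, fs * epoch_s);
    trailing samples that do not fill a whole epoch are dropped.
    """
    step = fs * epoch_s
    n_epochs = eeg.shape[1] // step
    trimmed = eeg[:, :n_epochs * step]
    return trimmed.reshape(eeg.shape[0], n_epochs, step).transpose(1, 0, 2)

def subsample_normal(normal_vecs, n_seizure, ratio=3, seed=0):
    """Randomly keep at most ratio * n_seizure normal feature vectors."""
    rng = np.random.default_rng(seed)
    k = min(len(normal_vecs), ratio * n_seizure)
    idx = rng.choice(len(normal_vecs), size=k, replace=False)
    return normal_vecs[np.sort(idx)]  # keep chronological order

# 21 channels, a little over five 5 s epochs at 500 Hz.
eeg = np.arange(21 * 12600, dtype=float).reshape(21, 12600)
epochs = to_epochs(eeg)
normal = np.arange(100).reshape(50, 2)
kept = subsample_normal(normal, n_seizure=4)
```

Here `epochs` has shape (5, 21, 2500), and `kept` holds 12 of the 50 normal vectors.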
We used the NeuroStack hardware for implementing the k-NN and RCE networks. The hardware has the following constraint: each board can commit a maximum of 4096 examples to memory. Hence, a k-NN network on this hardware can store only the first 4096 training vectors. The RCE network stores the first 4096 vectors that do not fall into each other's influence field. The training order for k-NN was seizure vectors followed by normal vectors. This was done to make sure the k-NN saved sufficient seizure examples in memory. In case of RCE, since the decision space changes with the order of the training vectors, we performed iterative training until two successive iterations resulted in the same decision space.
As mentioned, NeuroStack uses a parallel neuromorphic architecture. The basic operation, which is computing the distance between an input example and all the saved examples, takes a constant amount of time, irrespective of the number of saved examples. This is possible because each example is saved in a separate, uniquely addressable memory location. This results in a small training time and a quick response while testing. A consequence of this memory setting is that there is a maximum limit on the size of an example. In NeuroStack, each example can be at most 256 bytes long. To conform to the 256-byte memory limit, every feature was normalized to the range 0 to 255 so that each feature takes up at most 1 byte of memory. This allowed for a maximum of 12 features per channel, which with 21 channels is equivalent to a maximum of 252 bytes per epoch.
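The byte budget described above can be checked with a short sketch. The min-max scaling, the per-feature bounds `lo` and `hi` (which would come from the training data), and the function name are assumptions of this example; the exact normalization used on the hardware is not specified here.

```python
import numpy as np

def quantize_features(features, lo, hi):
    """Scale each feature to 0..255 and store it as a single byte."""
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = np.clip((features - lo) / span, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)

n_channels, feats_per_channel = 21, 12
rng = np.random.default_rng(0)
raw = rng.normal(size=n_channels * feats_per_channel)
lo = np.full(raw.shape, raw.min())
hi = np.full(raw.shape, raw.max())

vector = quantize_features(raw, lo, hi)
print(vector.nbytes)  # 21 channels x 12 features x 1 byte
```

Thirteen features per channel would need 273 bytes, which exceeds the 256-byte limit, so 12 is the largest whole number of features per channel that fits.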

Experiments
The following experiments were performed: For population-based classifiers, the training set included session data from all the individuals but one, and all the sessions belonging to the one individual were used for testing. This experiment was repeated for all individuals.
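The population-based protocol described above amounts to a leave-one-individual-out split over sessions. A minimal sketch follows; the mapping from session identifiers to individuals and all identifiers in it are hypothetical.

```python
def leave_one_individual_out(session_owner):
    """Yield (held_out, train_sessions, test_sessions) per individual.

    session_owner maps a session identifier to the individual it
    belongs to; every session of the held-out individual is tested.
    """
    for held_out in sorted(set(session_owner.values())):
        train = [s for s, p in session_owner.items() if p != held_out]
        test = [s for s, p in session_owner.items() if p == held_out]
        yield held_out, train, test

# Hypothetical example with three individuals and five sessions.
owners = {"s1": "A", "s2": "A", "s3": "B", "s4": "C", "s5": "C"}
splits = list(leave_one_individual_out(owners))
```

Splitting by individual rather than by session keeps all data from the test individual out of the training set.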
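In all the classification experiments, seizure and normal data were respectively designated as the positive and negative classes. The following metrics were used for evaluating and comparing the performance of the classifiers, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
Sensitivity: Also known as the true positive rate, it is the fraction of seizure examples classified as seizure.
\[ \text{sensitivity} = \frac{TP}{TP + FN} \]
Specificity: Also known as the true negative rate, it is the fraction of normal examples classified as normal.
\[ \text{specificity} = \frac{TN}{TN + FP} \]
F1-score: This combines precision and recall (recall being identical to sensitivity) into a single metric, making it easy to compare the performance of classifiers with different operating points.
\[ F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
where the precision is given by:
\[ \text{precision} = \frac{TP}{TP + FP} \]
and the recall is given by:
\[ \text{recall} = \frac{TP}{TP + FN} \]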
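As a worked illustration of these metrics, the sketch below computes them from lists of true and predicted labels using the standard definitions. The function name and the "seizure"/"normal" label strings are illustrative choices.

```python
def seizure_metrics(y_true, y_pred, positive="seizure"):
    """Compute sensitivity, specificity, precision, and F1 from label lists."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

For example, with 4 seizure epochs of which 3 are detected, and 6 normal epochs of which 1 is flagged as seizure, the sensitivity is 3/4 and the specificity is 5/6.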

Feature Selection
Each filter-based feature selection system outputs a feature importance list for each individual. It was observed that the optimal feature set differs among individuals, as can be seen in Figure 6. We averaged the importance scores over all individuals for each method and compared the performance of the classifiers with these feature sets.
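The averaging step can be sketched as below. The data layout, the function names, and the example feature name `dwd_feature` are hypothetical. Ties are broken alphabetically so that the ranking is deterministic.

```python
def average_importance(scores_by_individual):
    """scores_by_individual: {individual: {feature: score}} -> {feature: mean score}."""
    totals = {}
    counts = {}
    for scores in scores_by_individual.values():
        for feature, score in scores.items():
            totals[feature] = totals.get(feature, 0.0) + score
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: totals[feature] / counts[feature] for feature in totals}

def top_k_features(mean_scores, k):
    """Return the k features with the highest mean importance."""
    ranked = sorted(mean_scores, key=lambda f: (-mean_scores[f], f))
    return ranked[:k]
```

One such averaged ranking would be produced per selection method (chi-squared, mutual information, ANOVA F-value) and then compared through classifier performance.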

[Figure 6 labels: Patients, Average; panels: Chi-squared, Mutual information, ANOVA F-Value; color scale: Percentage score]
Figure 6. Feature importance calculated using chi-squared, mutual information, and ANOVA F-Value from left to right. The columns represent the relative importance of features for each patient. The importance is represented on a scale from green to red, green being the most important and red the least important.
It can be seen that the top 10 features selected by the ANOVA F-value and mutual information-based methods are the same, while the chi-squared-based method yields a different set of features. Recursive feature elimination (RFE) was performed for SVM, k-NN, and RCE. It was not performed for ANN, since the architecture of the ANN would have to change for every different input size. For RCE, the examples classified as Unknown were assigned to the normal class. For both RCE and k-NN, one nearest neighbor was used. Table 3 lists the feature sets obtained from the feature selection experiments.

Resolution Strategy
As mentioned in the machine learning section, the examples classified as Unknown by RCE can be resolved via various strategies. The results of using three different strategies with individual-based RCE and the RCE RFE feature sets are shown in Table 4. Note that the results are presented for the entire dataset. Each individual-based RCE yields a pair of sensitivity and specificity values (one pair per patient). The variance reported in the table is the statistical variance of these values across all patients. We observed that using a population-based k-NN on examples classified as Unknown by the individual-based RCE results in the best classifier system. One nearest neighbor was used in this case. Table 5 shows the performance of individual-based k-NN and RCE networks with different numbers of nearest neighbors.
We extended the strategy of applying a general classifier to Unknown examples from RCE to SVM and ANN. For this task, we set a threshold on the predicted probability of the class output by the SVM/ANN. We observed that most of the non-seizure examples that were misclassified as seizure had a predicted probability below 0.8. Since the tracer injection is most effective when administered at the actual onset of a seizure, a high specificity is desirable. To rectify such misclassifications, we passed all examples with a predicted probability below 0.8 to a subsequent (population-based) classifier; this resulted in a lower sensitivity but a higher specificity. The improvement observed using this strategy is shown in Table 6, and all the experiments with SVM and ANN presented here use this two-stage approach.
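The two-stage decision rule described above can be sketched as follows. The classifiers are passed in as plain callables, which is an assumption of this sketch: `primary` returns a `(label, probability)` pair and `secondary` returns a label. An RCE primary stage can be adapted by reporting `("Unknown", 0.0)` when no single class fires. Only the 0.8 threshold comes from the text.

```python
def two_stage_predict(x, primary, secondary, threshold=0.8):
    """Use the individual-based primary classifier when it is confident;
    otherwise defer to the population-based secondary classifier."""
    label, probability = primary(x)
    if label == "Unknown" or probability < threshold:
        return secondary(x)
    return label
```

Deferring low-confidence seizure calls to a second model trades some sensitivity for specificity, which matches the stated priority of avoiding false triggers for the tracer injection.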

Number of Nearest Neighbors
Once the architectures of all the classifiers were set, we compared the performance of the classifiers with different feature sets for the individual-based scenario, as presented in Table 7. Following this experiment, the optimal feature set for each classifier was selected. The rest of the experiments were performed in the individual-based scenario using these optimal features, unless otherwise mentioned.
As illustrated in Figure 7, we had optimal performance for RCE with 10 features obtained through RFE.

Number of EEG Sessions Used for Training
The effect of a small amount of training data on the classifier performance is shown in Table 8. For training with multiple sessions, say m out of a total of N sessions available per individual, we performed $\binom{N}{m}$ experiments, one for every combination of m sessions used for training with the rest used for testing, and averaged the results.
The results for epoch length selection can be found in Table 9. The specificity remains almost the same as the epoch duration increases from 1 s to 7 s. The sensitivity increased as the epoch duration increased from 1 s to 5 s, reached its maximum at 5 s, and decreased for epoch durations of 6 s and 7 s. In short, we obtained the best performance with a 5 s epoch duration (91.14% sensitivity and 98.77% specificity).
Table 10 summarizes the final experimental results in terms of sensitivity and specificity. As can be seen from the table, the RCE network trained on each individual has the best performance, with a sensitivity of 80.16% and a specificity of 97.17%, followed by SVM. SVM has the best performance among population-based methods, but it is far from optimal. RCE still has the highest specificity (86.70%), but its sensitivity drops to 42.89%. ANN has the highest specificity among the population-based methods, at 81.42%.
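The session-combination protocol used for the training-size experiment can be sketched with the standard library. The function names and the `evaluate` callback, which stands in for training and scoring a classifier on one split, are hypothetical.

```python
from itertools import combinations

def session_splits(sessions, m):
    """Yield (train, test) for every combination of m training sessions."""
    for train in combinations(sessions, m):
        test = [s for s in sessions if s not in train]
        yield list(train), test

def averaged_score(sessions, m, evaluate):
    """Average evaluate(train, test) over all C(N, m) splits."""
    scores = [evaluate(train, test) for train, test in session_splits(sessions, m)]
    return sum(scores) / len(scores)
```

With N = 4 sessions and m = 2, this produces C(4, 2) = 6 train/test splits, and the reported figure would be the mean over those six runs.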

Latency for NeuroStack
For an epoch duration of 5 s, the feature extraction process, when parallelized with four threads, took 11.6 ± 0.1 s. The majority of the time (∼8 s) was spent calculating the correlation dimension (CD). If CD was not calculated, the latency for a 5 s epoch duration was 3.6 ± 0.2 s. Exclusion of CD from the feature set resulted in a 3% drop in sensitivity and a 1% drop in specificity. The classification process took 1.2 µs on average. The latency depends on the epoch duration and can be reduced by using shorter epochs, at the cost of a small drop in sensitivity. For example, using a 2 s epoch duration yields a latency of ∼5 s (including computation of CD).

Discussion
The experiments conducted in this work provide relevant information for deciding the architecture of the classifier and of the overall real-time seizure detection system. RCE performs better than SVM, ANN, and k-NN when trained with data from a single session, and this performance is comparable to that obtained using all sessions for training. In the case of SVM and ANN, the performance gradually improves with data size, and they need numerous EEG sessions for training to reach the performance obtained by RCE with a training set composed of a single EEG session. This performance disparity can be attributed to the ability of RCE networks to identify anomalies with confidence, even with a small amount of seizure data.
Having a secondary population-based classifier to classify the Unknown examples (or examples with a predicted probability below 0.8 in the case of ANN and SVM) also improved the performance of the classification systems. All examples that could not be classified by the primary model with a high probability were passed to a secondary classifier for a second opinion and classified accordingly. It can be seen from Table 10 that individual-based seizure detection systems work better than population-based systems. This can be attributed to the high variability in seizure patterns from person to person, which cannot be captured by a single general model.
In this study, the seizure and normal data were labeled, and the start and end of each type of data were known. Hence, non-overlapping epochs were used without any performance degradation. When deployed in a clinical workflow, the system will work with continuous streams of EEG data. Overlapping epochs can be used in such a scenario to further reduce the chance of missing a seizure, since there will be more seizure epochs.
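The overlapping-epoch idea for continuous streams can be sketched as a sliding window. The study itself used non-overlapping 5 s epochs; the 1 s step, the `fs` sampling-rate parameter, and the function name are assumptions made for illustration.

```python
def sliding_epochs(samples, fs, epoch_s=5.0, step_s=1.0):
    """Return overlapping epochs of epoch_s seconds, advancing by step_s seconds.
    Setting step_s equal to epoch_s reproduces non-overlapping epochs."""
    n = int(fs * epoch_s)
    step = int(fs * step_s)
    if n <= 0 or step <= 0:
        raise ValueError("epoch and step must span at least one sample")
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, step)]
```

A shorter step yields more epochs that contain part of a seizure, at the cost of running feature extraction more often per second of EEG.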
The present system can be further improved by pre-processing the data to remove artifacts using combined blind source separation and independent component analysis [74], which has proven useful for separating linearly mixed independent sources in EEG signals, including artifacts [75,76]. Such pre-processing may improve the presented results and can be investigated in future studies.
This improved system can be used to trigger tracer injection for ictal-SPECT. In the future, our system can also be used to trigger deep brain stimulation (electro-stimulation) for suppressing ictal discharges and their propagation, and to inject drugs intracranially for more effective seizure control. Table 11 summarizes the main results of this study and compares the proposed techniques with state-of-the-art techniques. We did not implement these techniques on the hardware, as some of them are not applicable to our study; instead, we cite the results reported in several recently published papers for the sake of comparison. It can be seen from Table 11 that the proposed method achieves over 90% sensitivity, specificity, and accuracy while keeping the seizure detection delay within 12 s.
Table 11. Performance comparison between the state-of-the-art techniques and the proposed methods.

Conclusions
This study presents an approach for automatic seizure detection based on RCE networks. These networks are data efficient. This study is the first of its kind in terms of hardware implementation and validation of the theoretical approach. The proposed methods are comparable to recent deep learning techniques that achieve state-of-the-art detection accuracy. The proposed technique has the advantage of being trained on fewer training samples, rather than on the large databases required by deep learning, which entail tedious labeling work. A 5 s epoch duration resulted in the highest sensitivity and specificity. The latency of the RCE-based system is highly dependent on the CD feature: including CD improves the system accuracy, but at the cost of increased latency. Individual-based RCE has better performance than population-based RCE. Increasing the number of EEG sessions resulted in better performance for all the studied algorithms. Increasing the number of neighbors increases the specificity of k-NN but not its sensitivity.