Nanopower Integrated Gaussian Mixture Model Classifier for Epileptic Seizure Prediction

This paper presents a new analog front-end classification system that serves as a wake-up engine for digital back-ends, targeting embedded devices for epileptic seizure prediction. Predicting epileptic seizures is of major importance for the patient’s quality of life, as seizures can lead to paralysis or even prove fatal. Existing solutions rely on power-hungry embedded digital inference engines that typically consume several μW or even mW. To increase the embedded device’s autonomy, a new approach is presented that combines an analog feature extractor with an analog Gaussian mixture model-based binary classifier. The proposed classification system provides an initial, power-efficient prediction with high sensitivity, which switches on the digital engine for accurate evaluation. The classifier’s circuit is chip-area efficient and operates with minimal power consumption (180 nW) at a low supply voltage (0.6 V), allowing long-term continuous operation. Based on a real-world dataset, the proposed system achieves 100% sensitivity, guaranteeing that all seizures are predicted, and good specificity (69%), resulting in a significant power reduction for the digital engine and therefore for the total system. The proposed classifier was designed and simulated in a TSMC 90 nm CMOS process, using the Cadence IC suite.


Introduction
The continuing progress in integrated circuit (IC) technologies has resulted in complex and power-efficient systems that address the challenges of various Internet of Things (IoT) and machine learning (ML) applications [1,2]. A particular example is wearable systems that monitor the user's health condition, such as electroencephalogram (EEG) monitors [3]. In this case, the subject's brain activity is monitored through the use of electrodes attached to the scalp in order to track, classify, and diagnose epileptic seizures. By continuously monitoring EEG signals throughout the everyday life of the subject, accurate conclusions about their condition can be drawn and intractable epileptic seizures [4,5], which are not amenable to medication, can be forecasted [6,7].
Wearable devices that track the EEG signals can be employed in an everyday fashion. However, the need to operate unobstructed on a lithium battery or using energy harvesters [8] poses constraints on the acquisition procedure: all-digital signal processing and ML-powered inference can be power-hungry and limit the system's autonomy. A trend to alleviate this limitation is the employment of cascaded classifiers, where the first stages consume relatively low power and are always on, activating more complex units only when needed [9]. Although weight quantization and pruning [10] have notably reduced the power dissipation per inference for digital ML models [11], the digital processing blocks that supply the models' input features still consume considerable power [12,13].
This renders the aforementioned cascaded classification scheme sub-optimal. To address this, recent work has proposed moving the feature extraction procedure into the analog part of the processing chain [12–15]. Front-end signal processing blocks, such as switched-capacitor filter banks, operate on the signals prior to the analog-to-digital converter (ADC) to lower the overall system's power. The digitized features are then input to the ML model in the digital back-end.
To increase the autonomy of wearable EEG monitoring devices, overall power consumption must be reduced below the µW range. Because typical ML models implemented in digital circuitry consume power in the µW range [12,16], an alternative approach is preferable. To this end, in this work we propose an ultra-low power classification system that takes advantage of analog features and uses an analog classifier as a switching device for the power-hungry digital back-end [17]. The proposed architecture, along with the mainstream approaches discussed previously, is illustrated conceptually in Figure 1. The classifier is a Gaussian mixture model (GMM), and its analog implementation consumes 180 nW of power when operating on a 0.6 V supply voltage, in the sub-threshold region. Its predictions are used to switch on and off a subsequent digital classifier stage, which provides high accuracy for the whole processing chain. For evaluation, the proposed classifier is designed and verified using a real-world intractable epileptic seizure dataset [4,5].
The remainder of this paper is organized as follows. The background regarding epileptic seizure prediction and analog classifiers is provided in Section 2. Section 3 explains the mathematical foundations of GMMs. The proposed architecture and its building blocks are discussed in Section 4. Section 5 presents the experimental results of the proposed approach on a real-world EEG dataset. A comparison study and discussion are provided in Section 6. Concluding remarks are given in Section 7.

Motivation and Background
In this section, we provide the necessary background on epileptic seizure prediction and a summary of existing approaches. To introduce the reader to the state of the art in analog implementations of ML systems, a summary of existing analog classifiers is also given.

Epileptic Seizure Prediction
An epileptic seizure is a sudden excessive neural activity or electrical disturbance in the brain [18,19].
An individual suffering from epilepsy demonstrates symptoms that vary from unnoticeable to paralyzing or even lethal. In practice, patients' quality of life is severely affected by the unpredictability and the frequency of the seizures. A remedy to this can be prediction and warning about upcoming epileptic episodes. An accurate prediction of an upcoming seizure could allow them to prepare accordingly and avoid potentially dangerous activities, like, for instance, driving. Epileptic seizure prediction stems from examining the patients' health using bio-signal acquisition methods.
There are four different states regarding epileptic seizures: (a) pre-ictal, (b) ictal, (c) post-ictal, and (d) inter-ictal [18,19]. States (a)-(c) refer to the periods shortly before, during, and shortly after a seizure, respectively, whereas (d) refers to the period between two seizures, when the patient is considered to be in a normal state. Based on the analysis presented in [20], the duration of the pre-ictal and post-ictal periods varies from 30 min to 2 h. Accurate, real-time identification of the pre-ictal state is crucial, as it is equivalent to predicting an upcoming seizure.
Multiple architectures employ analog design methodologies to address epileptic seizure prediction through monitoring EEG signals. The work in [13,15] employs analog feature extraction to greatly reduce the system's power consumption. A different approach employs analog pre-processing circuits directly on the acquisition device in remote computing applications [33–35]. In particular, the analog circuit reduces the overall power consumption of the communication device by reducing the data that need to be transferred to the remote server for prediction. A brief summary of seizure prediction systems in terms of employed algorithms, operating device, and power consumption for all the aforementioned implementations is provided in Table 1.

Analog Classifiers
Analog integrated circuits (ICs), powered by their capability to operate in the sub-threshold domain [36], are gaining popularity as a means to reduce power consumption in comparison to their digital counterparts. Applications that employ real-time ML techniques are typically power-hungry and could greatly benefit from analog circuitry. Nonetheless, analog circuits struggle with high-dimensional classification problems, as they typically require multiple cascaded multipliers. In practice, analog multipliers are usually unreliable and their operating voltage range is limited. Two main approaches regarding this issue are to either tailor multipliers for specific applications [37,38] or utilize architectures and/or circuits that avoid multipliers [39–43]. Following the former approach, translinear-based, current-mode multipliers [44] are the most popular choice. Regarding the latter, Gaussian function circuits are a commonly used solution [45].
Translinear-based Gaussian function circuits [45] that consist of squaring and exponentiator circuits are utilized in [39,40]. In this case, by leveraging the properties of the exponential function, the multiplication is replaced by the summation of the exponents, which is a trivial task. Alternatively, the work proposed in [41–43] uses more compact building blocks, e.g., bump circuits [41], that implement multivariate Gaussian functions without the use of multipliers. A performance summary of the aforementioned work is presented in Table 2. According to Table 2, the implementation with the lowest power consumption is [41] (365 nW). This is due to the combination of a compact and simple ML model with ultra-low-power building blocks that operate in the sub-threshold domain with a low supply voltage. Based on our previous work in [41], here we build a low-power analog classifier and improve upon the accuracy by employing a GMM instead of a simple Gaussian model.

Gaussian Mixture Model
In this section, the mathematical foundations of GMMs, which comprise the core of the proposed classifier, are given. In addition, the use of the GMMs within the scope of classification is also described.
Consider an $N$-dimensional random variable $X = [x_1, \ldots, x_N]$ and its probability density function (PDF) $p$, with $X \sim p$. The GMM is a probabilistic model that consists of a weighted sum of Gaussian distributions and can be used to approximate unknown PDFs from data [46]. In the case of $X$, the GMM's Gaussian distributions, also referred to as components, are also $N$-dimensional. GMMs belong to the general class of mixture models (MMs) and are widely used in the literature, as they combine both the approximation capabilities of MMs and the properties of Gaussian distributions.
The approximate PDF of $X$, as modeled by a GMM $\lambda$, is given by

$$\hat{p}(X \mid \lambda) = \sum_{i=1}^{K} w_i \, \mathcal{N}(X \mid M_i, \Sigma_i). \qquad (1)$$

Here, the component count is $K \geq 1$ and for the weights it holds that

$$\sum_{i=1}^{K} w_i = 1.$$

In the special case of diagonal covariance matrices, each Gaussian distribution is given by

$$\mathcal{N}(X \mid M_i, \Sigma_i) = \prod_{n=1}^{N} \mathcal{N}\left(x_n \mid \mu_n^i, (\sigma_n^i)^2\right), \qquad (2)$$

where superscript 'i' denotes the Gaussian component and subscript 'n' the dimension, i.e., $\mu_n^i$ is the $n$th component of vector $M_i$ and $(\sigma_n^i)^2$ is the $n$th component of the diagonal of matrix $\Sigma_i$. Hence, each component is derived by the product of $N$ univariate Gaussian distributions given by

$$\mathcal{N}\left(x_n \mid \mu_n^i, (\sigma_n^i)^2\right) = \frac{1}{\sqrt{2\pi (\sigma_n^i)^2}} \exp\left(-\frac{(x_n - \mu_n^i)^2}{2 (\sigma_n^i)^2}\right). \qquad (3)$$

GMMs are adapted to data by using the expectation-maximization (EM) algorithm [46]. Although their unsupervised nature renders them suitable for clustering problems, they can also be used within the scope of supervised classification models. Considering a dataset $D$ with $N$-dimensional input vectors and $C$ classes, one can fit $C$ separate GMMs $[\lambda^c]_{c=1}^{C}$, one to each subset of $D$ associated with a class. Therefore, the PDF of the input vectors that belong to each class is approximated by a GMM.
Using the above setting, one can infer the class $y$ of a new input vector $X$ of an unknown class as the one whose approximate PDF provides the highest likelihood, i.e.,

$$y = \underset{c \in \{1, \ldots, C\}}{\arg\max} \; \hat{p}(X \mid \lambda^c). \qquad (4)$$

In this case, superscript 'c' denotes the class. It is important to note that in Equation (4) all GMMs share the same number of components $K$. In the supervised setting, the $K$ components are referred to as clusters, and this is the naming this paper follows for the rest of the sections. The number of clusters is a hyperparameter of the overall classifier and it is chosen based on the complexity of the data.
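To make the classification rule concrete, the following is a minimal software sketch of a diagonal-covariance GMM classifier that picks the class with the highest likelihood. It is an illustration only and not the paper's training or inference code: the function names, the parameter layout (weights, means, standard deviations per class), and all numerical values are assumptions chosen for the example. Likelihoods are evaluated in the log domain, which is a common numerical choice and does not change the argmax.

```python
import math

def log_gaussian(x, mu, sigma):
    """Log of a univariate Gaussian PDF with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def gmm_log_likelihood(x, weights, means, sigmas):
    """Log-likelihood of vector x under a GMM with diagonal covariances."""
    # For each component: log(w_i) plus the sum of per-dimension log-densities,
    # i.e. the log of w_i times a product of univariate Gaussians.
    terms = [
        math.log(w) + sum(log_gaussian(xn, m, s) for xn, m, s in zip(x, mu, sig))
        for w, mu, sig in zip(weights, means, sigmas)
    ]
    # Log-sum-exp over components for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(x, class_models):
    """Index of the class whose GMM assigns x the highest likelihood."""
    scores = [gmm_log_likelihood(x, *model) for model in class_models]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters: C = 2 classes, K = 2 clusters per class, N = 4 dimensions.
# Each model is (weights, means, standard deviations).
class_0 = ([0.5, 0.5], [[0.0] * 4, [1.0] * 4], [[0.5] * 4, [0.5] * 4])
class_1 = ([0.3, 0.7], [[3.0] * 4, [4.0] * 4], [[0.5] * 4, [0.5] * 4])
models = [class_0, class_1]

print(classify([0.2, 0.1, 0.0, 0.3], models))  # input close to the class-0 clusters
print(classify([3.8, 4.1, 3.9, 4.0], models))  # input close to the class-1 clusters
```

In practice the parameters would come from fitting one GMM per class with the EM algorithm rather than being set by hand.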

Proposed Architecture
In this section, the architecture of the proposed analog classifier and the operation of its building blocks are analyzed. To reduce the overall power consumption, in the following building blocks, all transistors operate in the sub-threshold region, and the power supply rails are set to $V_{DD} = -V_{SS} = 0.3$ V for the entire classifier.
Based on Section 3, a GMM-based classifier requires two basic building blocks: one that generates a Gaussian PDF, as in (1), and another that implements the argmax operator, as in (4). In the case of analog hardware, bump circuits have been proposed for the hardware implementation of a univariate Gaussian PDF [47]. Recently, a modified version of the bump circuit was proposed to generate multivariate PDFs as well [48]. Concerning the analog implementation of the argmax operator, winner-take-all (WTA) circuits have been employed in the literature [49]. In this work, we modify a typical bump circuit and use it in the proposed classifier in order to increase its accuracy.
The modified bump circuit is a combination of two sub-circuits: a symmetric current correlator [48] and a differential block [50]. The aim of this modification is to increase the quality of the Gaussian curve and reduce the distortion in the case of the multivariate bump circuits. In particular, the symmetric current correlator improves the symmetry of the Gaussian curve around the mean value [48]. The simple differential block offers good control of the Gaussian curve's parameters with a minimal area [50]. Cascode mirrors are used instead of standard ones to offer robust mirroring even for small bias currents, which is necessary for multivariate bump circuits. This bump circuit, shown in Figure 2, provides a more accurate Gaussian curve, shown in Figure 3, than either of [48,50]. Transistors' dimensions are summarized in Table 3. The mean value, the variance, and the height of the Gaussian curve are controlled via the voltage parameters $V_r$ and $V_c$ and the bias current $I_{bias}$, respectively [48,50].
The multivariate Gaussian PDF is realized by multiplying the outputs of two or more bump circuits, as described in Equation (2). Consider a sequence of two bump circuits. Biasing the second one with the output current of the first one results in an overall output current that is equivalent to the multiplication of their respective Gaussian curves [48]. An implementation of a 4D Gaussian PDF (four cascaded bump circuits) is shown in Figure 4. Only the first bump is biased with a preset current ($I_{bias}$), representing the weight $w_i$ of the corresponding cluster $i$. The topology in Figure 4 constitutes a cluster of the proposed GMM-based classifier.
The second block of the proposed architecture is a Lazzaro WTA circuit [49]. Its flexibility and simplicity make it the most popular choice for the implementation of the argmax operator. This WTA circuit is composed of sub-blocks denoted as neuron cells. For a $C$-class classification problem, the number of neuron cells must also be $C$, each one responsible for a single class. In particular, each neuron cell receives the likelihood from a specific GMM and outputs a current in binary format; if the GMM corresponds to the class with the highest likelihood, this current is logical one (which is close to the WTA's bias current), otherwise it is logical zero (less than 100 pA). For demonstration purposes, a transistor-level implementation of a WTA circuit with two neurons is shown in Figure 5. All transistors' dimensions are equal to W/L = 0.4 µm/1.6 µm.
Utilizing the aforementioned building blocks and based on Equation (4), the proposed GMM-based classifier with two classes, two clusters per class, and 4D inputs is shown in Figure 6. Each class GMM comprises two 4D bump circuits, which correspond to the two clusters, and two current mirrors that are used to add the output currents of each cluster. The overall output current $[I_{ci}]_{i=1}^{2}$ of each class is proportional to the class' likelihood. The WTA circuit compares these likelihoods and the predicted class is determined via the WTA's output currents.
It should be noted that it is impractical to provide the classifier's 34 controlling parameters ($[V_{rj}]_{j=1}^{16}$, $[V_{cj}]_{j=1}^{16}$, and $[I_{biasi}]_{i=1}^{2}$) externally. Therefore, an alternative option that involves integrating analog memories adjacent to the classifier is preferable. In particular, as the classifier will typically be trained only once prior to its deployment, non-volatile analog memories are a promising choice [51,52]. However, for a general-purpose classifier that may require altering this configuration multiple times, dynamic memories can be a more suitable solution [53,54].
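The signal flow of the classifier described in this section (cascaded bump cells forming a cluster, cluster currents summed per class, and a WTA selecting the winner) can be mimicked with a purely behavioral model. The sketch below is an idealization under stated assumptions and not a circuit simulation: each bump cell is treated as a perfect Gaussian, the `width` argument is a hypothetical stand-in for the spread controlled by the cell's variance-setting voltage (the actual mapping is circuit dependent and not modeled), the WTA is ideal, and every voltage and current value is invented for illustration.

```python
import math

def bump_current(v_in, v_r, width, i_bias):
    """Idealized bump cell: Gaussian-shaped output current centred at v_r.

    'width' is a hypothetical spread parameter, not a real circuit quantity.
    """
    return i_bias * math.exp(-((v_in - v_r) ** 2) / (2.0 * width ** 2))

def cluster_current(v_inputs, v_refs, widths, i_bias):
    """Cascade of bump cells: each cell's output current biases the next one."""
    current = i_bias  # only the first cell sees the preset bias (cluster weight)
    for v_in, v_r, width in zip(v_inputs, v_refs, widths):
        current = bump_current(v_in, v_r, width, current)
    return current

def class_current(v_inputs, clusters):
    """Sum of the cluster currents of one class (the role of the current mirrors)."""
    return sum(cluster_current(v_inputs, *c) for c in clusters)

def wta(currents, i_wta_bias=1e-9, i_low=0.0):
    """Idealized winner-take-all: the largest input receives the full bias current."""
    winner = max(range(len(currents)), key=currents.__getitem__)
    return [i_wta_bias if k == winner else i_low for k in range(len(currents))]

# Hypothetical settings (volts, amperes): two clusters per class, 4-D inputs.
# Each cluster is (reference voltages, widths, bias current).
inter_ictal = [([-0.10] * 4, [0.05] * 4, 2e-9), ([-0.05] * 4, [0.05] * 4, 2e-9)]
pre_ictal = [([0.05] * 4, [0.05] * 4, 2e-9), ([0.10] * 4, [0.05] * 4, 2e-9)]

v_in = [0.08, 0.06, 0.09, 0.07]
currents = [class_current(v_in, inter_ictal), class_current(v_in, pre_ictal)]
outputs = wta(currents)
print(currents, outputs)
```

Because each stage scales the incoming current by a Gaussian factor, the cascade output equals the bias current times the product of the per-dimension Gaussians, which is why a chain of bump cells can stand in for a diagonal-covariance component without explicit multipliers.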

Epileptic Seizure Prediction Application
In this section, the proposed classifier is tested on a real-world epileptic seizure prediction problem [4,5] to confirm its proper operation. The classifier has been designed using the Cadence IC suite in a TSMC 90 nm CMOS process. All simulation results are obtained from the layout (post-layout simulations), which is shown in Figure 7. The data are acquired from the CHB-MIT Scalp EEG database [4,5] and contain EEG signals from children with intractable epilepsy. The ictal periods are labeled by expert physicians. Here, pre-ictal and post-ictal periods are considered to span an hour before and an hour after the seizure, respectively. The data samples that do not belong to ictal, pre-ictal, or post-ictal periods are labeled as inter-ictal.
There are four features for the classification: the signal's peak-to-peak voltage and energy percentages in the alpha and the first and second half of the gamma frequency bands [55]. These features can be efficiently derived from the raw EEG signals using analog feature extraction techniques [13,56]. The system's necessary parameters are derived by software-based training, prior to the circuit's deployment.
The aim of the classifier is to successfully distinguish the pre-ictal from the inter-ictal periods. In order to operate as a minimal-power front-end wake-up circuit, it must predict all possible seizures and maintain a low number of false positive alarms. The first requirement is equivalent to having high classification sensitivity [57], which is measured by

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad (5)$$

where $TP$ and $FN$ denote the numbers of true positives and false negatives, respectively. Achieving a high sensitivity score is crucial for the patient's health, as it ensures that all upcoming seizures will be predicted. The second requirement, in turn, is equivalent to minimizing the rate at which the high-power digital back-end is turned on. This leads to a significant power consumption reduction for the whole system, shown in Figure 1c. An appropriate measure to quantify this reduction is the specificity [57] of the analog classifier, given by

$$\text{Specificity} = \frac{TN}{TN + FP}, \qquad (6)$$

where $TN$ and $FP$ denote the numbers of true negatives and false positives, respectively. In practice, this metric is the ratio of the time that the digital back-end is idle to the duration of all the inter-ictal periods (no risk of seizure).
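As a small illustration of the two metrics in their standard form, TP/(TP + FN) for sensitivity and TN/(TN + FP) for specificity, the sketch below computes both from label sequences. The label encoding (1 for pre-ictal, 0 for inter-ictal), the helper names, and the example labels are assumptions made for this example; they are not taken from the dataset or from the reported results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP with 1 = pre-ictal (positive) and 0 = inter-ictal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    """Fraction of positive (pre-ictal) samples that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of negative (inter-ictal) samples that raise no alarm."""
    return tn / (tn + fp)

# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp))
```

In a wake-up setting, a false negative means a missed seizure warning, whereas a false positive only costs an unnecessary activation of the digital back-end, which is why sensitivity is prioritized over specificity.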
To test the proposed classifier in terms of both classification specificity and the circuit's behavior under process, voltage, and temperature (PVT) variations, two separate tests are conducted. The first one is a comparison between the proposed implementation and a software-based one. In particular, 20 separate software-based training iterations are conducted to account for random effects. The resulting specificity scores are summarized in Table 4. The proposed architecture's mean specificity is only 2% lower than that of a software-based implementation. For demonstration purposes, the state of four patients along with the predictions of the analog classifier are presented in Figure 8. The classifier successfully predicts all 17 seizures (100% sensitivity) of the test set. The second test is a Monte-Carlo analysis with $N = 100$ points, for one of the previous 20 candidates. The Monte-Carlo analysis histogram is shown in Figure 9. Its mean value is $\mu_M = 69.93\%$ with a standard deviation of $\sigma_M = 0.41\%$. This confirms the proper performance and operation of the proposed architecture.

Discussion and Comparison
A comparison between this work and other studies that employ analog design methodologies to address epileptic seizure prediction through monitoring EEG signals is provided in Table 5. Here, it is seen that this work achieves very low power consumption per channel (180 nW per channel), outperforming all the implementations except that from [13], which achieves 96 nW per channel. Nonetheless, as the proposed implementation requires only a single channel, its total power consumption is significantly smaller. In particular, the proposed architecture consumes power in the range of nW, which is not the case for the rest of the implementations in Table 5. This power dissipation is achieved using a supply voltage of only 0.6 V, which is also the lowest one in Table 5. The specificity of the proposed classifier is 69%, which, along with [13] (86%) and [15] (84.4%), constitutes the three highest specificity scores.
Another important metric for measuring efficiency in analog computing, which is invariant to the application and is therefore a relatively fair metric for comparing architectures designed for different applications, is the energy consumed per operation. The proposed classifier consumes 180 nW and can achieve a computational speed of 166 k classifications per second, which results in 1.1 pJ per classification. Each classification, for a GMM-based classifier composed of two classes, two clusters per class, and 4D inputs, requires 131 operations. This results in the classifier's consumption being 8.2 fJ per operation. Unfortunately, these metrics are not provided in the literature for comparison purposes.
As shown in Table 5, most epileptic seizure prediction systems employ multiple channels, i.e., electrodes, in order to increase their accuracy. Nonetheless, acquisition devices with multiple electrodes are usually uncomfortable for the patient and impractical for constant monitoring. To this end, this work focuses on extracting data from a single electrode. By doing so, the resulting device is less bulky and more convenient to use. In addition, to further increase the device's portability, the classifier is proposed to operate in an embedded device. In this way, the patient can be monitored constantly with no requirements for wireless communication with other devices, as proposed in [33–35].
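The energy figures quoted earlier in this section follow from dividing the power by the classification rate, and then by the number of operations per classification. The short check below reproduces that arithmetic from the three stated values (180 nW, 166 k classifications per second, 131 operations); the variable names are illustrative. It yields roughly 1.08 pJ per classification and about 8.3 fJ per operation, in line with the quoted 1.1 pJ and 8.2 fJ up to rounding.

```python
# Values stated in the text.
power_w = 180e-9                  # classifier power consumption, in watts
rate_hz = 166e3                   # classifications per second
ops_per_classification = 131      # operations per classification

# Energy per classification = power / rate; energy per operation divides further.
energy_per_classification_j = power_w / rate_hz
energy_per_op_j = energy_per_classification_j / ops_per_classification

print(f"{energy_per_classification_j * 1e12:.2f} pJ per classification")
print(f"{energy_per_op_j * 1e15:.2f} fJ per operation")
```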
In real-world scenarios, EEG signal acquisition is affected by uncontrolled parameters and environmental factors. In the case of a single electrode in particular, motion artifacts, electrode misplacement, and external electromagnetic interference can drastically reduce the quality of the signal and potentially lead to diagnostic errors. Having multiple electrodes for signal acquisition may seem more robust, as the contaminated recordings could be only a fraction of the total inputs to the prediction system, but this comes at the cost of the system's portability and with no theoretical guarantee. Efforts to determine the quality of the acquired signals by employing ML classification techniques have been proposed in the literature [58]. We argue that this quality assessment can implicitly take place within the GMM classifier of our system provided that (a) real-world, noise-contaminated EEG signals are used for training and (b) the classifier is expanded to provide confidence bounds on its predictions. By doing so, additional systems for quality assessment, as in [58], can be avoided, and thereby the area and power consumption of the device remain unchanged.
Another important design consideration is the trade-off between the wake-up circuit's power consumption and its specificity. As the specificity of the wake-up circuit increases, the overall power consumption of the digital circuit, which is typically greater than that of the analog one, decreases. However, to achieve high specificity values, it is essential to increase the complexity of the analog circuit. In particular, to improve the classifier's performance, higher-performance acquisition devices, more analog feature extraction circuits, and larger analog memories storing the classifier's parameters are required. All the aforementioned modifications result in increased power consumption. In practice, increasing the power consumption of the analog front-end must be done cautiously; a classification system with a power-hungry analog classifier that switches a digital one on and off may consume more power than an all-digital one.
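This trade-off can be illustrated with a simple first-order model, which is an assumption of this sketch rather than an analysis from the paper: during inter-ictal periods the analog stage is always on, while the digital back-end is awake for a fraction (1 − specificity) of the time. Under that model the cascade saves power only while the analog power stays below specificity × digital power. The digital power value and the second analog power value below are hypothetical.

```python
def cascade_power(p_analog, p_digital, spec):
    """Average inter-ictal power: always-on analog stage plus the digital
    stage, which is awake for a fraction (1 - spec) of the time."""
    return p_analog + (1.0 - spec) * p_digital

def cascade_saves_power(p_analog, p_digital, spec):
    """True if the cascade consumes less than an always-on digital classifier."""
    return cascade_power(p_analog, p_digital, spec) < p_digital

P_DIGITAL = 10e-6  # hypothetical digital back-end power, in watts

# 180 nW analog stage with 69% specificity (values from the text).
print(cascade_power(180e-9, P_DIGITAL, 0.69))
print(cascade_saves_power(180e-9, P_DIGITAL, 0.69))

# A hypothetical, much more power-hungry analog stage at the same specificity.
print(cascade_saves_power(8e-6, P_DIGITAL, 0.69))
```

The model ignores wake-up overhead and pre-ictal periods, so it should be read as a rough break-even estimate only.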

Conclusions
A fully analog processing unit was presented as an alternative to the conventional front-end architectures for inference systems targeting EEG signals. The proposed system includes a 180 nW (8.2 fJ per operation) analog integrated GMM-based classifier, which activates the high-performance digital inference back-end only when needed. Its main building blocks are Gaussian function circuits and the Lazzaro WTA circuit. The classifier was trained on a real-world seizure prediction dataset and designed in a TSMC 90 nm technology. Post-layout simulation results suggest that the proposed circuit achieves 100% sensitivity, as all 17 seizures of the test set are predicted, and 69.07% specificity.

Conflicts of Interest:
The authors declare no conflict of interest.