1. Introduction
Epileptic seizure (ES) is a clinical manifestation of abnormal, excessive, and hyper-synchronous discharges in cortical neurons [1]. It can present with various symptoms such as uncontrollable convulsions, tremors, loss of consciousness, and blank stares [2,3], profoundly affecting patients' daily lives. The most serious risk associated with epilepsy is its potential fatality, with mortality rates in adults reaching up to 3.6%. While anti-epileptic drugs effectively control seizures in about 70% of patients, the remaining 30% exhibit drug resistance [4], highlighting the urgent need for more accurate monitoring and timely intervention.
Electroencephalogram (EEG) records brain activity using electrophysiological indicators and is closely associated with the abnormal brain discharges that cause ES. ES detection relies on continuous EEG monitoring of patients and rapid identification of seizure events [5]. Reviewing these recordings is a huge workload for neurologists, given the large number of patients whose seizures need to be detected [6]. The development of automated ES detection methods therefore plays a crucial role in supporting neurologists, helping to reduce missed or misdiagnosed cases [7]. In addition, automated ES detection methods help to analyze the frequency, duration, and pattern of seizures, providing valuable insights for developing a precise medication regimen or surgical treatment plan [8].
For ES detection methods, the quality of the EEG signal is crucial [9]. EEG signals are divided into intracranial EEG (iEEG) and scalp EEG (sEEG). iEEG is an invasive method in which electrodes are implanted directly into the brain, offering a significantly higher signal-to-noise ratio (SNR) and fewer artifacts. This results in superior accuracy for ES detection and brain activity mapping; however, it involves surgical procedures such as craniotomy, which carry risks such as long-term inflammation [10]. In contrast, sEEG is a non-invasive method that records EEG signals using electrodes placed on the scalp, making it safer and more accessible. However, sEEG signals have a lower SNR and are more prone to artifacts due to attenuation by the skull and surrounding tissues [11]. In clinical practice, EEG signals are analyzed in two main periods: ictal (during a seizure) and inter-ictal (the interval between seizures) [12]. ES detection aims to alert the patient when the onset point is detected; the time difference between the point detected by the algorithm and the actual seizure onset is the detection delay. An example of ES detection applied to an iEEG signal is shown in Figure 1.
ES detection methods can be divided into two categories: deep learning and traditional machine learning. Deep learning-based ES detection methods have shown great potential, largely due to their ability to model complex spatio-temporal features of EEG signals [14]. Convolutional neural networks (CNNs) are commonly used to extract multi-scale spatial features from EEG signals [15,16,17,18]. On this basis, Xu et al. [19] proposed a multi-scale CNN model to shorten the latency of ES detection; their method achieved a 4.7 s detection latency and a 0.08/h false detection rate on the SWEC-ETHZ iEEG short-term dataset. Long Short-Term Memory and Transformer networks excel at capturing the inherent time dependence of ES activity, providing directions for automatic ES detection [20,21]. Although deep learning methods have performed well in ES detection, their reliance on large-scale parameter storage and high computational cost makes them difficult to deploy on wearable and low-power devices in real applications. Meanwhile, traditional machine learning methods, such as Support Vector Machines, Random Forests, and K-Nearest Neighbors [22], have been used for ES detection. Compared to deep learning methods, these approaches usually require fewer parameters, less computation, and less data, making them more suitable for real-time ES detection in resource-limited environments [5].
As a traditional machine learning method for probabilistic inference, Bayesian theory [23] has been applied to ES detection since as early as 2009. Tzallas et al. [24] pioneered the use of the Naive Bayes classifier (NBC) for ES detection, laying the foundation for automated detection methods based on probabilistic reasoning. In recent years, several studies have also adopted the NBC for ES detection [25,26,27,28,29], further validating its feasibility across different datasets. However, these methods often require larger amounts of training data, which limits their generalizability in more complex scenarios. To address these problems, hyperdimensional (HD) computing approaches have also garnered attention. Burrello et al. [30] introduced one-shot learning for iEEG ES detection using end-to-end binary operations, encoding iEEG signals with local binary pattern (LBP) features and mapping them into high-dimensional vectors; detection was performed by comparing the Hamming distances of these vectors between inter-ictal and ictal periods. Later, they determined the epileptogenic zone by comparing the high-dimensional vector differences between electrodes during ictal and inter-ictal periods [13]. In 2020, they combined three features (LBP, line length, and amplitude) and classified the resulting high-dimensional vectors by Hamming distance, achieving a detection latency of 8.81 s on the SWEC-ETHZ iEEG short-term dataset, the lowest value currently achieved by traditional machine learning methods [31]. However, the problems of low sensitivity, high average detection latency, and large high-dimensional vectors that are difficult to implement in hardware remain. Furthermore, while iEEG provides high-quality recordings, most existing algorithms are unsuitable for real-time or embedded implementation due to their high computational cost. In contrast, our approach is designed to meet the constraints of low-power, memory-constrained systems and can therefore be deployed in bedside monitors, implantable neurostimulators, and other resource-constrained environments.
In summary, current methods for detecting ES through EEG signals still exhibit certain limitations. On the one hand, contemporary HD computing approaches typically suffer from significant detection delays and rely on large HD vectors that are difficult to implement efficiently in hardware. On the other hand, while iEEG can provide high-quality recordings, the substantial computational and memory costs involved render many existing algorithms unsuitable for real-time or embedded applications. We therefore aim to develop and evaluate a lightweight ES detection framework with low detection delay and high detection performance.
2. Method
The overall architecture of the proposed ES detection framework is shown in Figure 2. This framework pioneers the integration of HD computing with a binary Naive Bayes classifier (BNBC) for detecting ES in iEEG signals. Specifically, our objective is to achieve reliable one-shot and few-shot learning performance while maintaining a lightweight model architecture.
Our proposed ES detection method consists of four main phases. Firstly, the feature extraction phase generates 6-bit LBP codes from 7 sampling points for each electrode, which effectively captures the local patterns of the EEG signals. Secondly, these LBP codes and their corresponding electrodes are converted into high-dimensional vectors using the HD computing method, representing the signal features within time windows of either 0.25 s or 0.5 s. Thirdly, the BNBC [32] is applied to classify these high-dimensional vectors: during the training phase, the BNBC learns classification rules to distinguish between ictal and inter-ictal periods based on the high-dimensional feature representations, and in the testing phase, it classifies the test signals as either inter-ictal or ictal. Finally, the detection decision is further refined through a patient-dependent voting mechanism, which considers the predicted labels from the last 5 s. The detailed procedure of the method is illustrated in Figure 3.
2.1. Data Preparation
This study focuses on the SWEC-ETHZ short-term iEEG dataset [13], which consists of 100 anonymized iEEG recordings from 16 patients with drug-resistant epilepsy, sampled at 512 Hz. Each recording consists of a 3 min pre-ictal period, an ictal period, and a 3 min post-ictal period, with the number of electrodes varying from 36 to 100. For pre-processing, the iEEG signal is first converted to a binary sequence using LBP coding. Each sample consists of seven sampling points, which are encoded by LBP into a 6-bit binary value: the voltage amplitudes of each pair of neighboring sampling points are compared, and the corresponding bit is set to 1 if the voltage at the latter point exceeds that at the former, and to 0 otherwise. These 6-bit binary values are computed and stored as decimal numbers, which are then mapped into high-dimensional vectors.
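This encoding can be sketched as follows. The helper name `lbp6` and the bit ordering (first comparison taken as the most significant bit) are illustrative assumptions, not details fixed by the text:

```python
import numpy as np

def lbp6(signal):
    """Encode a 1-D signal into 6-bit LBP codes (decimals 0-63).

    Each code spans 7 consecutive sampling points: a bit is 1 when the
    latter of two neighboring samples exceeds the former.
    """
    x = np.asarray(signal)
    bits = (x[1:] > x[:-1]).astype(np.uint8)              # pairwise comparisons
    win = np.lib.stride_tricks.sliding_window_view(bits, 6)
    weights = 2 ** np.arange(5, -1, -1)                   # first comparison = MSB
    return win @ weights

print(lbp6([1, 3, 2, 5, 4, 6, 0]))  # comparisons 101010 -> prints [42]
```

Sliding the 6-bit window one sample at a time yields one code per sampling step, so a monotonically rising segment maps to the all-ones code 63 and a falling one to 0.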
2.2. Hyperdimensional Computing
To perform HD computing, we construct two mapping matrices whose elements are independently sampled from the standard normal distribution $\mathcal{N}(0,1)$. One matrix is used to map the coded LBP signals, whereas the other maps the corresponding electrodes into the same $D$-dimensional space.
Since the elements are independent and identically distributed, the resulting high-dimensional vectors are approximately orthogonal: for two independently sampled vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{D}$, the expected inner product is $\mathbb{E}[\mathbf{u}^{\top}\mathbf{v}] = 0$ and the variance of the normalized inner product decreases as $1/D$. Thus, when $D$ is large, the cosine similarity between two random vectors concentrates tightly around zero with a standard deviation of approximately $1/\sqrt{D}$, and the probability that two vectors exhibit a large correlation is negligibly small. In other words, different codes and electrodes are represented by quasi-orthogonal vectors in the HD space.
In our implementation, the LBP operator produces a 6-bit code for each signal sample. Each code therefore takes an integer value between 0 and 63, i.e., there are $2^{6} = 64$ possible patterns. Consequently, we allocate 64 distinct high-dimensional vectors to represent the coded signals, each corresponding to one decimal value in the range 0–63. The number of high-dimensional vectors used for electrode mapping equals the number of electrodes available for a given patient: if a patient has $M$ intracranial electrodes, we instantiate $M$ electrode vectors and use only these rows of the electrode mapping matrix. This design naturally accommodates patients with different channel counts, while keeping all HD vectors at the same dimensionality $D$.
The LBP codes and their corresponding electrodes are then bound by applying a bit-wise XOR operation between the code vector and the electrode vector. The resulting bound vectors from different electrodes are bundled by element-wise summation across electrodes and processed by a majority threshold. Specifically, after summation, each dimension is compared to half of the number of bound vectors being aggregated: if an element exceeds this threshold, it is set to 1; otherwise, it is set to 0. The resulting binary vector represents the HD encoding of a single time step. These binary vectors are further accumulated over time to represent either 0.5 s or 0.25 s signal windows, followed by the same majority-thresholding operation. This hierarchical bundling process converts the raw iEEG signal into multiple binary high-dimensional vectors, each encoding a 0.5 s or 0.25 s interval, which are then used as input samples for subsequent classification.
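The binding and bundling steps above can be sketched as follows. This is a simplified illustration with a hypothetical electrode count, and the Gaussian item vectors are replaced directly by random binary vectors, which preserves the quasi-orthogonality argument:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1000                       # HD dimensionality (1000 or 10,000 in the text)
N_CODES, N_ELEC = 64, 8        # 64 LBP codes; 8 electrodes chosen for illustration

# Binary item memories standing in for the (binarized) mapping matrices.
code_vecs = rng.integers(0, 2, size=(N_CODES, D), dtype=np.uint8)
elec_vecs = rng.integers(0, 2, size=(N_ELEC, D), dtype=np.uint8)

def majority(vectors):
    """Bundle binary vectors: element-wise sum, then majority threshold."""
    v = np.asarray(vectors)
    return (v.sum(axis=0) > v.shape[0] / 2).astype(np.uint8)

def encode_timestep(codes):
    """Bind each electrode's LBP code to its electrode vector (XOR), then bundle."""
    return majority(code_vecs[codes] ^ elec_vecs)

def encode_window(steps):
    """Bundle per-timestep vectors over one 0.25 s / 0.5 s window."""
    return majority([encode_timestep(c) for c in steps])

window = [rng.integers(0, 64, N_ELEC) for _ in range(16)]  # 16 timesteps
hv = encode_window(window)     # binary HD vector representing one window
```

The same `majority` operation is applied at both levels, matching the hierarchical bundling described above; the output is one binary $D$-dimensional vector per window, ready for classification.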
There are several reasons for selecting 0.25 s and 0.5 s as the segmentation window sizes. Firstly, the onset stage of ES typically occurs within low-to-mid-frequency rhythms, and the 0.25 s and 0.5 s windows cover 1–2 cycles of these rhythms; this segmentation ensures real-time processing while preserving the essential temporal features. Secondly, throughout the framework, the bound vectors require summation and thresholding operations: excessively short windows may cause encoding fluctuations, while overly long windows significantly increase detection latency. Furthermore, for hardware deployment, shorter windows reduce the sample size and cache requirements per decision, and relying on integer LUT and addition operations effectively lowers computational and storage demands.
2.3. One-Shot and Few-Shot Learning
Different training strategies were employed for different patients, classified by the number of training seizures into one-shot and few-shot learning. In the one-shot learning approach, a single seizure along with its corresponding inter-ictal period was used for training, while the remaining seizures were reserved for testing. Conversely, the few-shot learning strategy involved training with multiple seizures and their corresponding inter-ictal periods, with the remaining seizures set aside for testing. For a fair comparison with previous works [13,31], we used exactly the same learning strategies for these patients.
2.4. Binary Naive Bayes Classifier
Since the input samples are binary vectors, we employ the probabilistic method known as the BNBC [32], which is derived from Bayes' theorem. In the BNBC, the class-conditional statistics of the feature vectors in the training set are aggregated to construct a posterior probability model. During testing, the posterior probability that a given binary feature vector belongs to each category is computed. A key distinction of the BNBC compared to traditional Bayesian classifiers is its ability to handle binary vectors directly, which significantly reduces computational complexity and makes it particularly suitable for large-scale or real-time applications.
In our method, given a binary feature vector $\mathbf{x} = (x_1, \ldots, x_D) \in \{0,1\}^{D}$, the probability $P(\mathbf{x} \mid c)$ can be readily estimated under the assumption that each component $x_k$ follows a Bernoulli distribution with success probability $p_{k,c} = P(x_k = 1 \mid c)$. Specifically,
$$P(x_k \mid c) = p_{k,c}^{\,x_k}\,(1 - p_{k,c})^{\,1 - x_k}, \tag{1}$$
and the conditional probability is then given by
$$P(\mathbf{x} \mid c) = \prod_{k=1}^{D} p_{k,c}^{\,x_k}\,(1 - p_{k,c})^{\,1 - x_k}. \tag{2}$$
2.4.1. Training
During the BNBC training phase, the prior probability $P(c)$ for each class $c$ is computed as follows:
$$P(c) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(y_i = c), \tag{3}$$
where $N$ is the number of training samples, $y_i$ is the label of the $i$-th training vector, and $\mathbb{1}(\cdot)$ is the indicator function:
$$\mathbb{1}(y_i = c) = \begin{cases} 1, & y_i = c, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$
For high-dimensional binary vectors, the probability $p_{k,c} = P(x_k = 1 \mid c)$ can be computed as
$$p_{k,c} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = c)\, x_{i,k}}{\sum_{i=1}^{N} \mathbb{1}(y_i = c)}. \tag{5}$$
2.4.2. Testing
After training, the BNBC classifies each test sample by assigning it to the class $c$ that maximizes the posterior probability; each segment of the input signal is thereby labeled as either ictal or inter-ictal. Combining Equations (1) and (2) with Bayes' theorem, the posterior probability of $\mathbf{x}$ belonging to class $c$ can be written as
$$P(c \mid \mathbf{x}) \propto P(c) \prod_{k=1}^{D} p_{k,c}^{\,x_k}\,(1 - p_{k,c})^{\,1 - x_k}. \tag{6}$$
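The training and testing rules above amount to a Bernoulli Naive Bayes over binary vectors. A compact sketch follows; the Laplace smoothing (+1/+2) is an addition not stated in the text, used here only to keep the estimated probabilities away from 0 and 1:

```python
import numpy as np

class BNBC:
    """Bernoulli Naive Bayes for binary feature vectors (smoothed sketch)."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes = np.unique(y)
        self.log_prior = np.array([np.log((y == c).mean()) for c in self.classes])
        # p[c, k] estimates P(x_k = 1 | c); Laplace smoothing avoids log(0).
        self.p = np.array([(X[y == c].sum(axis=0) + 1.0) / ((y == c).sum() + 2.0)
                           for c in self.classes])
        return self

    def predict(self, X):
        X = np.asarray(X)
        # log P(x | c) = sum_k [x_k log p + (1 - x_k) log(1 - p)]
        log_lik = X @ np.log(self.p).T + (1 - X) @ np.log(1 - self.p).T
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]

clf = BNBC().fit([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]], [0, 0, 1, 1])
print(clf.predict([[1, 1, 0], [0, 1, 1]]))  # prints [0 1]
```

Because all features are binary, training reduces to counting ones per dimension and class, which is what makes the later LUT formulation possible.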
2.4.3. Lookup Table
In the BNBC, a lookup table (LUT) mechanism is designed for both the training and testing phases in order to replace costly multiplications with integer additions. During training, the class-conditional probabilities $p_{k,c}$ are first estimated for each dimension $k$ and class $c$. The same probability values are then transformed, stored in the LUT, and reused during testing. Starting from the Naive Bayes decision rule, the predicted label is obtained by
$$\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{k=1}^{D} \bigl( x_k \log p_{k,c} + (1 - x_k) \log(1 - p_{k,c}) \bigr) \right], \tag{7}$$
where $x_k$ denotes the $k$-th component of the binary HD vector and $p_{k,c}$ is the probability that $x_k = 1$ given class $c$. Equation (7) shows that classification only requires the evaluation of logarithms of probabilities and the accumulation of their (weighted) sums.
Because each $p_{k,c}$ lies in the range $(0,1)$, we discretize these probabilities and store their negative logarithms as integers in an LUT. To this end, a scaled logarithmic transform is applied: the logarithm is multiplied by a negative scaling constant for two reasons. First, the negative sign converts a maximization of the log-posterior into a minimization of a non-negative cost, which is convenient for integer accumulation. Second, the scaling factor provides a good compromise between numerical precision and hardware complexity: it yields sufficiently fine quantization of the log-probabilities while keeping the resulting integer values within a range that can be represented using a small fixed number of bits and implemented efficiently with integer adders and bit shifts.
Directly computing $\log(0)$ is undefined, and very small probabilities may lead to numerical instability. To avoid this, we introduce a small offset and define the LUT function as
$$\mathrm{LUT}(p) = \operatorname{round}\bigl( -\gamma \log(p + 0.005) \bigr), \tag{8}$$
where $p$ denotes a discretized probability value and $\gamma$ is the scaling constant of the transform. The constant $0.005$ corresponds to the midpoint of each probability bin of width $0.01$; adding this offset avoids the singularity at $p = 0$ and stabilizes the transform for very small probabilities, while still providing a good approximation to the continuous log-probability. For probabilities close to 1, the corresponding negative log-probability becomes very small; we therefore clamp these values to 1 to avoid storing excessively small integers that would have negligible impact on the decision.
To maintain a simple and hardware-friendly implementation, we discretize the probability range into 100 bins, which results in 100 distinct LUT entries. A finer discretization could in principle provide higher numerical precision, but would also increase the memory footprint of the LUT and the bit-width required to store each entry; the chosen resolution therefore reflects a trade-off between approximation accuracy and storage requirements. At the end of the training phase, all class-conditional probabilities are estimated, quantized to the nearest bin, and their corresponding integer values are stored in a probability array. In addition, we pre-compute and store the scaled negative logarithm of each class prior $P(c)$ to capture the prior distribution of class labels.
During testing, the HD input vector is processed by summing the appropriate LUT entries instead of evaluating products and logarithms. For each class $c$, we accumulate the integer costs associated with the active dimensions of the HD vector and add the stored class prior term. The class with the smallest accumulated cost is then selected as the predicted label, indicating whether the corresponding segment is ictal or inter-ictal. In this way, all expensive floating-point multiplications and logarithms are replaced by table lookups and integer additions, which are well suited for low-power embedded implementations. An example of the resulting LUT is given in Table 1.
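The LUT construction and the integer-cost decision rule can be sketched as follows. The scaling constant `GAMMA`, the approximate bin index used for $1-p$, and the helper names are illustrative assumptions rather than values fixed by the text:

```python
import numpy as np

GAMMA = 10          # illustrative scaling constant for the log transform
OFFSET = 0.005      # midpoint of each 0.01-wide probability bin

# 100-entry LUT: one integer cost per discretized probability bin,
# clamped below at 1 as described in the text.
bins = np.arange(100) / 100.0                     # p in {0.00, 0.01, ..., 0.99}
lut = np.maximum(1, np.round(-GAMMA * np.log(bins + OFFSET))).astype(int)

def classify(x, p_idx, prior_cost):
    """x: binary HD vector; p_idx[c, k]: bin index of P(x_k = 1 | c);
    prior_cost[c]: stored scaled negative log-prior. Lowest cost wins."""
    costs = []
    for c in range(p_idx.shape[0]):
        on = lut[p_idx[c]]             # cost when x_k = 1
        off = lut[99 - p_idx[c]]       # approximate bin of 1 - p when x_k = 0
        costs.append(prior_cost[c] + np.where(x == 1, on, off).sum())
    return int(np.argmin(costs))

p_idx = np.array([[90, 10], [10, 90]])            # two classes, two dimensions
print(classify(np.array([1, 0]), p_idx, np.array([0, 0])))  # prints 0
```

Only integer additions and table reads occur at decision time, mirroring the hardware-friendly evaluation of Equation (7) described above.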
2.5. Sliding Window Voting Detection
During the post-processing phase, predicted labels are aggregated into 5 s windows to generate the final ES detection results. A patient-dependent threshold is defined such that, if the cumulative number of ictal labels within a window exceeds this threshold, the window is classified as indicating a seizure. This classification is then used to calculate the detection delay relative to the actual seizure onset: for each patient, the latency is computed as the mean detection delay between the seizure onset point and the ES detection point.
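One way to realize this voting step is sketched below; the window length and threshold values in the usage example are per-patient assumptions:

```python
import numpy as np

def detect_onset(labels, window_len, threshold):
    """labels: per-segment 0/1 predictions (1 = ictal); window_len: number of
    segments spanning 5 s (e.g. 20 for 0.25 s segments); threshold: minimum
    count of ictal labels that the trailing window must exceed.
    Returns the index of the first segment whose window fires, or None."""
    labels = np.asarray(labels)
    for i in range(window_len - 1, len(labels)):
        if labels[i - window_len + 1 : i + 1].sum() > threshold:
            return i
    return None

labels = [0] * 10 + [1] * 10          # seizure labels begin at index 10
print(detect_onset(labels, window_len=5, threshold=3))  # prints 13
```

The gap between the true onset index and the returned index, multiplied by the segment length, gives the per-seizure detection delay averaged into the reported latency.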
3. Results
The proposed method is evaluated on the SWEC-ETHZ iEEG short-term dataset. Prior to conducting the experiments, the method is optimized for practical application scenarios. Initially, a BNBC is employed to classify the high-dimensional vectors, with a focus on achieving the higher sensitivity required in practice. We segment the signal into 0.5 s windows and feed the resulting high-dimensional vectors into the BNBC, with the dimensionality of the HD space set to 10,000. This approach, referred to as 'BNBC + 0.5 s + 10,000HD', enhances both sensitivity and specificity; however, it also results in increased latency.
To address the latency issue, the 0.5 s window is replaced with a 0.25 s window, and the dimensionality of the HD space is reduced from 10,000 to 1000, resulting in the method termed 'BNBC + 0.25 s + 1000HD'. Furthermore, an LUT is added to both variants, leading to the methods 'BNBC + 0.5 s + 10,000HD + LUT' and 'BNBC + 0.25 s + 1000HD + LUT'. The number of seizures used for training and validation is summarized in Table 2, and overall and per-patient detection latencies are reported in Table 3. Across both one-shot and few-shot learning, our BNBC-based methods yield substantially lower average latency than Hamming-based HD baselines.
The results of the experiments are presented in Table 2. Four metrics are used to evaluate the method: latency, sensitivity, specificity, and accuracy. For each patient, latency is calculated as the mean detection delay between the seizure onset point and the ES detection point. Sensitivity measures the method's ability to correctly identify ictal samples; higher sensitivity indicates improved ES detection and a lower rate of missed detections. Specificity assesses the method's ability to correctly identify inter-ictal samples; higher specificity corresponds to a lower false alarm rate and improved detection of non-seizure periods. Accuracy represents the overall proportion of correctly classified ictal and inter-ictal samples. Compared to the results of [13,31], our method demonstrates improvements across all evaluation metrics; a detailed comparison of the different methods is provided in Table 2.
Notably, we observed that for some patients, the model trained on a single seizure sample (one-shot learning) outperformed the few-shot learning model trained on multiple seizure samples. This phenomenon is consistent with the findings of Burrello et al. [31], who noted that few-shot learning may in some cases introduce more variability in seizure patterns or propagation paths, thereby increasing intra-class differences and blurring classification boundaries.
Specifically, for patients with a clear, widespread, and stable seizure pattern, the one-shot learning model was able to learn highly representative features from a single sample and construct a concise, discriminative prototype vector. In contrast, a few-shot learning model may incorporate more inconsistent seizure types or local variations during training, which can lead to 'feature dilution' or 'interference superposition' in the prototype representation in the high-dimensional space, ultimately degrading classification performance. Therefore, in practical applications, the adoption of the few-shot learning strategy should depend on individual patient characteristics; model performance does not always improve as the number of training samples increases.
Compared to Hamming-distance-based HD classifiers [13,31], the proposed BNBC + LUT framework exhibits superior performance and lower storage requirements under the same data and evaluation settings. On the one hand, our method improves sensitivity and specificity while stabilizing the average latency to within 4–5 s through a shorter window. On the other hand, the LUT replaces multiplications and logarithms with table lookups and integer additions, reducing inference complexity and decreasing model storage from hundreds of KB to approximately 25–35 KB. This significantly outperforms previous Hamming-distance classification methods.
3.1. One-Shot Learning
Compared to the Hamming-distance methods [13,31], our methods achieved sensitivities approaching 100%. The average latency was lowest for the BNBC + 0.5 s + 10,000HD method, at 2.92 s. The BNBC + 0.25 s + 1000HD method exhibited an average latency of 3.15 s, with the lowest latencies observed for patients P4 and P5, while still maintaining stable detection performance. This indicates that our method performs well even with a reduced dimensionality of the HD space. Furthermore, although adding the LUT resulted in a slight increase in average latency for the BNBC + 0.5 s + 10,000HD + LUT method, it remained low at 3.05 s, demonstrating that the integration of the LUT does not significantly affect latency.
3.2. Few-Shot Learning
As shown in Table 2, the BNBC + 0.5 s + 10,000HD method demonstrates the highest accuracy among the examined approaches, with strong performance on P9 and P10 (specificities of 92.31% and 97.68%, respectively). Overall, sensitivity exceeds 90% for all patients except P12. However, the average latency in few-shot learning is 6.31 s; consequently, this method is most suitable for applications where classification stability is prioritized over low latency. By shortening the window to 0.25 s and reducing the dimensionality of the HD space, the BNBC + 0.25 s + 1000HD method further refines performance for specific patients: its mean sensitivity is 97.56% and the overall mean latency is reduced to 4.31 s, with P12 having the shortest latency of 0 s. After adding the LUT, the BNBC + 0.5 s + 10,000HD + LUT method shows a greater latency of 5.46 s on P15; compared to the method without the LUT, it achieves higher sensitivity, lower specificity, and a 0.05 s reduction in average latency. The BNBC + 0.25 s + 1000HD + LUT method performs well on P10 (specificity of 97.54%) and achieves approximately 100% sensitivity in some cases (e.g., P10 and P14), resulting in the highest mean sensitivity in few-shot learning (97.76%), with a slight increase in average latency to 5.65 s. Overall, the BNBC + 0.5 s + 10,000HD method offers balanced performance and adaptability, with a sensitivity of 97.99% and a specificity of 95.9%; however, its latency of 4.61 s is higher than that of BNBC + 0.25 s + 1000HD and BNBC + 0.25 s + 1000HD + LUT. The BNBC + 0.25 s + 1000HD method has the lowest latency, at 4.31 s. While the LUT reduces the specificity of the method and elevates its latency, it improves the overall sensitivity to 98.88%.
3.3. Storage Requirement
The storage requirements for the proposed method are evaluated using the first seizure of P1 as an example. P1 was implanted with 100 electrodes, so we can calculate the maximum storage requirement. The calculation below provides a detailed assessment of the storage requirement for the proposed BNBC + 0.25 s + 1000HD + LUT method. Given that the system employs 16-bit analog-to-digital conversion, each sampling point requires 16 bits. Therefore, the storage for the LBP input can be computed as follows:
$$7 \times 16 \times 100 \ \text{bits} = 1400 \ \text{bytes}.$$
Additionally, storing the 6-bit LBP values necessitates:
$$6 \times 100 \ \text{bits} = 75 \ \text{bytes},$$
leading to a subtotal of 1475 bytes. Further storage is required for the LBP mapping matrix and the electrode mapping matrix, which covers 100 electrodes. Storing the LBP mapping matrix ($C$) necessitates:
$$64 \times 1000 \ \text{bits} = 8000 \ \text{bytes},$$
and storing the electrode mapping matrix necessitates:
$$100 \times 1000 \ \text{bits} = 12{,}500 \ \text{bytes},$$
while the summation of the XOR results, with a 7-bit counter per dimension, consumes:
$$1000 \times 7 \ \text{bits} = 875 \ \text{bytes}.$$
For a 0.25 s duration, each sampling point requires at least 5 bits, contributing:
$$1000 \times 5 \ \text{bits} = 625 \ \text{bytes}.$$
Then, owing to the nature of the BNBC, we only need to store the histogram for one category; the histograms accumulated during BNBC training and the 100-entry LUT together account for the remaining 1750 bytes.
Approximately 10.5 additional bytes are required to store intermediate results and the post-processing threshold. Summing all these components yields a total storage requirement of 25,235.5 bytes (approximately 25.24 KB). It is important to note that the methods proposed in [13,31] do not fully account for hardware requirements; we therefore calculated the storage needs of these methods using the same approach, with comparative results summarized in Table 2.
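The component tally can be checked with simple arithmetic. In the sketch below, 1-bit storage of the mapping-matrix elements is assumed, and the histogram-plus-LUT figure is taken as the remainder implied by the stated 25,235.5-byte total:

```python
# Storage budget for BNBC + 0.25 s + 1000HD + LUT, patient P1
# (100 electrodes, D = 1000), all quantities in bytes.
samples   = 7 * 16 * 100 / 8      # seven 16-bit points per electrode -> 1400
lbp_codes = 6 * 100 / 8           # one 6-bit LBP code per electrode  ->   75
lbp_map   = 64 * 1000 / 8         # 64 x 1000 binary mapping matrix   -> 8000
elec_map  = 100 * 1000 / 8        # 100 x 1000 binary mapping matrix  -> 12500
xor_sum   = 1000 * 7 / 8          # 7-bit counter per HD dimension    ->  875
window    = 1000 * 5 / 8          # 5-bit accumulator per dimension   ->  625
hist_lut  = 1750.0                # one-class histogram + 100-entry LUT
misc      = 10.5                  # intermediate results and threshold

total = (samples + lbp_codes + lbp_map + elec_map
         + xor_sum + window + hist_lut + misc)
print(total)  # prints 25235.5, i.e. about 25.24 KB
```

The first two terms reproduce the 1475-byte subtotal stated above, and the grand total matches the reported 25,235.5 bytes.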
4. Conclusions
A new BNBC-based method for ES detection in iEEG signals is proposed, achieving an average specificity of 93.09% and sensitivity of 98.88%, with the lowest average latency recorded at 4.31 s using one-shot and few-shot training strategies. This demonstrates the method's effectiveness in detecting ES while minimizing missed detections and false alarms. For the first time, we combine the BNBC with HD computing for ES detection, enabling a reduction in the dimensionality of the high-dimensional vectors while improving overall sensitivity and latency. By segmenting the signals, features can be extracted from smaller segments, thereby reducing the average latency of ES detection. The robustness of our approach is validated through one-shot and few-shot learning experiments, highlighting its potential to provide timely and reliable ES detection for real-time monitoring and management of epilepsy, particularly in clinical and resource-limited settings.
Our approach combines an HD framework with the BNBC, thereby reducing the effective dimension of the encoded vectors, while its overall performance surpasses that of Hamming-distance-based HD classifiers in previous works. Furthermore, by segmenting the signal into short windows and extracting their LBP codes, we have reduced the average latency of ES detection. Within the classification component of the framework, we introduce an LUT: by replacing floating-point multiplication and logarithmic operations with integer additions and table lookups, the entire framework becomes suitable for low-power, memory-constrained hardware platforms.
Despite these improvements, our work has several limitations. Firstly, the experiments were conducted on a small sample of 16 patients from a short-term iEEG dataset. Secondly, we employed leave-one-seizure-out cross-validation without conducting cross-patient or cross-dataset validation. Furthermore, we have yet to realize a dedicated hardware prototype for power-consumption and latency measurements.
Future work should focus on addressing these limitations. We plan to validate our approach on larger, more comprehensive datasets. This will include cross-patient and cross-dataset evaluations to better characterize robustness and generalization capabilities. Moreover, we aim to investigate adaptive learning and transfer learning strategies to reduce the volume of patient-specific data required for reliable performance. Finally, we shall explore concrete implementation approaches on neuromorphic or low-power embedded platforms, integrating LUT-based BNBC into real-time ES detection systems for clinical and resource-constrained environments.