Deep Learning for Generalized EEG Seizure Detection after Hypoxia–Ischemia—Preclinical Validation

Brain maturity and many clinical treatments such as therapeutic hypothermia (TH) can significantly influence the morphology of neonatal EEG seizures after hypoxia–ischemia (HI), so there is a need for generalized automatic seizure identification. This study validates the efficacy of advanced deep-learning pattern classifiers based on a convolutional neural network (CNN) for seizure detection after HI in fetal sheep and determines the effects of maturation and brain cooling on their accuracy. The cohorts included HI–normothermia term (n = 7), HI–hypothermia term (n = 14), sham–normothermia term (n = 5), and HI–normothermia preterm (n = 14) groups, with a total of >17,300 h of recordings. Algorithms were trained and tested using leave-one-out cross-validation and k-fold cross-validation approaches. The accuracy of the term-trained seizure detectors was consistently excellent for HI–normothermia preterm data (accuracy = 99.5%, area under curve (AUC) = 99.2%). Conversely, when the HI–normothermia preterm data were used in training, the performance on HI–normothermia term and HI–hypothermia term data fell (accuracy = 98.6%, AUC = 96.5% and accuracy = 96.9%, AUC = 89.6%, respectively). These findings suggest that HI–normothermia preterm seizures do not contain all the spectral features seen at term. Nevertheless, an average 5-fold cross-validated accuracy of 99.7% (AUC = 99.4%) was achieved across all seizure detectors. This significant advancement highlights the reliability of the proposed deep-learning algorithms in identifying clinically translatable post-HI stereotypic seizures in 256 Hz recordings, regardless of maturity and with minimal impact from hypothermia.


Introduction
Perinatal hypoxic-ischemic encephalopathy (HIE) is a life-threatening complication arising from a lack of oxygen and blood flow to the brain [1]. In both experimental and clinical studies in term and preterm neonates, moderate to severe hypoxia-ischemia (HI) is typically followed by electroencephalographic (EEG) seizures, which are associated with white and grey matter injury [2][3][4][5]. While studies indicate a strong association between neonatal seizures and outcomes in term infants with HIE [6], there is evidence that EEG seizures do not necessarily have clinical correlates in all neonates [7]. Indeed, the great majority of EEG seizures at all ages are not associated with clinical signs. Real-time manual identification and interpretation of EEG seizures in newborns are, however, challenging and require expert clinical knowledge [8,9]. We need to significantly improve our ability to reliably detect seizures in order to more effectively target treatment to infants who might benefit [8,10].
Preclinical experiments can help to improve our understanding of the diagnostic and prognostic value of EEG, as the timing and severity of the insult can be controlled and continuous measurements can be made from immediately after HI [11][12][13]. Neonatal models of clinically relevant HI insults in large animals at term are challenging, are largely carried out under anesthesia [14,15], and are not readily feasible at preterm-equivalent ages. A useful alternative is the fetal sheep, which permits continuous, comprehensive physiological measurements at all ages, including continuous EEG recordings before, during, and after HI, without the confounding effects of anesthesia or other medications. Fetal studies have shown that the phases of HI injury are similar to those seen in newborns, with reperfusion, latent, secondary, and tertiary phases of evolving injury.
We have previously demonstrated that EEG waveforms, specifically micro-scale sharp waves and gamma spike transients, are observed in the latent phase of recovery in preterm fetal sheep and are correlated with HI-related brain injury [16,17]. We have recently developed and validated data-driven deep-learning-based pattern classifiers that can accurately identify and quantify these transients [16], as well as delta-band rolling waveforms in the form of stereotypic evolving micro-scale seizures (SEMSs), in the EEG of preterm fetal sheep. The techniques utilized included 1D convolutional neural networks (1D-CNNs) [16,18], wavelet Fourier CNNs (WF-CNNs) [16], and a highly accurate state-of-the-art wavelet scalogram CNN (WS-CNN) approach that infuses spectrally rich feature maps of EEG sections into a deep CNN classifier for pattern recognition [16].
After initial recovery of oxidative metabolism in a "latent" phase after HI, moderate to severe injury is associated with secondary deterioration, with loss of cerebral oxidative metabolism, high-amplitude stereotypic evolving seizures (HASs) in most fetuses, and a fully stochastic EEG background regardless of age (Figure 1A-C) [11,19]. Experimentally, seizures typically start around 7-8 h post-HI and peak by approximately 24-48 h. In the preterm brain, these seizures are most often discrete events [11], but they can develop into status epilepticus in term fetuses [19]. HASs are defined by their repetitive, stereotypic nature, lasting for at least 10 s with an amplitude of >20 µV in at least one EEG channel [20,21], matching the clinical definition [22][23][24]. Examples of HASs from the HI-normothermia preterm, HI-normothermia term, and HI-hypothermia term groups are shown in Figure 1D-L. We have previously shown that the 1D-CNN and WS-CNN classifiers can be used to identify HASs after HI in preterm fetuses [18,25]. Others have shown that data-driven deep-learning-based classifiers can help to identify seizures in term and preterm infants [26,27] and adults [28,29].
While there is growing evidence that automated classifiers for EEG background assessment and grading are feasible [30][31][32][33], and other studies have applied deep-learning-based algorithms to neonatal EEG seizure identification [34][35][36][37][38][39], there is limited evidence for how automated deep-learning-based seizure detection algorithms perform across different stages of maturation [26]. This is important to determine, as EEG characteristics change with age, consistent with maturation of neural connectivity and changing neurotransmitter function, and there are data to show that seizure morphology may also differ with age [11,26,40–42]. Recent studies emphasize that automated neonatal seizure detection algorithms can be beneficial, although their predictions still require review by a human expert [5]. Further, there are few studies on the impact of clinical treatments, particularly therapeutic hypothermia (TH), which is the only currently approved neuroprotective treatment for term and near-term newborns with moderate to severe HIE [1]. Preclinical studies demonstrated that TH is most effective when started as early as possible within the first 6 h after HI (i.e., during the latent phase) and continued for 3 days. These data informed clinical trials and current practice [1]. Animal and clinical studies show that TH can partially suppress HASs [2,5,43–46]. In the present study, we sought to validate the accuracy of the WS-CNN, WF-CNN, and 1D-CNN deep-learning algorithms for seizure detection in a cohort of 40 near-term and preterm fetal sheep after severe HI, using continuous EEG recordings. In the near-term fetuses, we also examined whether TH affected their accuracy. We report that the proposed deep-net algorithms can accurately detect seizures regardless of morphological variation due to age or hypothermia, with performance consistently decreasing when tested on data from HI-normothermia preterm, HI-normothermia term, and HI-hypothermia term fetuses, in that order. We further demonstrate that adding data from the sham-normothermia term group to the training sets can help to improve accuracy. The proposed spectral-based seizure detectors may also help to interpret the spectral-component coherency of seizures. The non-denoised 256 Hz recordings used in this study provided a framework to assess the performance of the proposed deep-CNN-based seizure detectors in preclinical conditions, where seizures had to be distinguished from non-HAS events (e.g., movement artifacts), making the validation more representative of clinical conditions. These results are a significant step towards robust identification of post-HI seizures regardless of brain maturity and treatment and should be further assessed in clinical studies. Finally, we show how these results illustrate an approach to developing a generalized seizure detector that can robustly identify seizures across all groups regardless of maturation or use of TH.

Methods
The known maturational differences between term and preterm EEG activity [47,48] imply that automatic identification of seizures across these groups can be challenging when the algorithm is trained only using data from one group (e.g., trained on terms and then tested on preterms). These developmental effects have been emphasized by studies showing that adult epileptic seizure detectors are not suitable for EEG seizure detection in term infants [49,50]. Clinically, preterm seizure morphology and characteristics are more complex, with slightly different frequency components than those seen in term seizures [40,41]. In the current study, we investigate solutions to these challenges by addressing the following questions:
(a) Can our previous micro-scale EEG pattern classifiers be re-designed for accurate seizure identification in data from fetal sheep models with different gestational ages and/or under the influence of treatment with therapeutic hypothermia?
To answer this question, we refined the previously developed EEG pattern classifiers, the WS-CNN, the WF-CNN, and the 1D-CNN, for seizure detection in preclinical data from a large cohort of fetal sheep. Figure 2 is a schematic showing where each algorithm is situated at a system level. We have described the basic algorithms previously [16,25]. Details of the re-designed structures, including brief tables of the new CNN architectures, are described in Sections 2.1-2.3.
(b) Can seizure detection algorithms trained and validated on data from some groups identify seizures in the EEG data of other, unseen groups?
The paper explores how the choice of data partition (train/validation/test) influenced the results for each algorithm:
1. Study #1: A leave-one-out cross-validation (LOOCV) approach in which data from the term sham-normothermia group were included in the training sets of three different training/test schemes.
2. Study #2: An LOOCV approach in which data from the sham-normothermia group were excluded from the training sets of three different training/test schemes. This was used to study the possible impact of removing data from the sham-normothermia group.
3. Study #3: A k-fold cross-validation (k = 5) approach in which data from all groups were randomly combined and included in five different training/test schemes.
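The three partitioning schemes can be sketched in Python. This is a minimal illustration, not the study's Matlab code: it assumes that each LOOCV scheme holds out one HI group for testing and trains on the remaining HI groups (plus the sham group in Study #1), and it uses a seeded NumPy generator in place of Matlab's rng('default'). The group labels and the functions leave_one_group_out and k_fold_indices are hypothetical names introduced here; the actual per-scheme splits are those listed in Table 2.

```python
import numpy as np

# Hypothetical group labels used only for this illustration.
HI_GROUPS = ("HI-normothermia preterm", "HI-normothermia term", "HI-hypothermia term")
SHAM = "sham-normothermia term"


def leave_one_group_out(groups, include_sham=True):
    """Yield (held_out, train_idx, test_idx), holding out one HI group per scheme.

    include_sham=True mimics Study #1 (sham data added to training);
    include_sham=False mimics Study #2.
    """
    groups = np.asarray(groups)
    for held_out in HI_GROUPS:
        test_idx = np.flatnonzero(groups == held_out)
        train_mask = np.isin(groups, [g for g in HI_GROUPS if g != held_out])
        if include_sham:
            train_mask |= groups == SHAM
        yield held_out, np.flatnonzero(train_mask), test_idx


def k_fold_indices(n_segments, k=5, seed=0):
    """Yield (train_idx, test_idx) for k folds over all pooled segments (Study #3).

    A fixed seed makes every detector see identical folds; with k = 5 each fold
    holds out ~20% of the data (a 4:1 training-to-testing ratio).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_segments), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]


# Example: the 31,015 annotated segments split into five folds.
fold_sizes = [len(test) for _, test in k_fold_indices(31015)]
```

Here fold_sizes is [6203, 6203, 6203, 6203, 6203], because 31,015 divides evenly by five; rerunning with the same seed reproduces the same folds.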

We explored how the performance of each seizure detector changed across and within each study category above. The three proposed seizure detectors were trained/validated/tested over 1727 h of recordings in each separate study (more than 17,300 h of recordings overall). Study #3 showed that combining datasets recorded from both hemispheres of all four fetal sheep cohorts (n = 40) provided the most accurate generalized seizure detector. We then demonstrated that our state-of-the-art WS-CNN seizure classifier outperformed the other proposed algorithms for real-time identification of HASs and was able to distinguish them from high-amplitude EEG noise and/or other background activity in the conventional 256 Hz post-HI recordings, across all studies.

WS-CNN Seizure Detector
WS feature map images were infused into a deep CNN classifier (WS-CNN) for pattern recognition [16]. Despite its larger architecture and computationally intensive nature, this algorithm consistently outperforms our other CNN-based classifiers [16]. Table 1 details the architecture of the proposed 17-layer deep WS-CNN classifier, including convolutional layers (with batch normalization and ReLU units), max-pooling, fully connected layers, and a softmax and a classification layer at the end (see full details in [16]). The root mean square propagation (RMSProp) optimizer was used to update the weights and bias parameters of the classifier. Given the satisfactory overall performance of RMSProp, we did not investigate substituting it with the Adam or SGDM optimizers. Training/validation was performed over 60 epochs using the default values of 1.00 × 10⁻³ and 0.9 for the learning rate and SquaredGradientDecayFactor, respectively.
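The RMSProp update with the default values quoted above can be written in a few lines. The sketch below follows the update rule given in Matlab's trainingOptions documentation for 'rmsprop', but the function name rmsprop_step is hypothetical and epsilon = 1e-8 is assumed to match Matlab's default Epsilon; it illustrates the optimizer only, not the WS-CNN training loop.

```python
import numpy as np


def rmsprop_step(theta, grad, sq_avg, learning_rate=1e-3, decay=0.9, epsilon=1e-8):
    """Return updated (theta, sq_avg) after one RMSProp step.

    decay corresponds to Matlab's SquaredGradientDecayFactor.
    """
    sq_avg = decay * sq_avg + (1.0 - decay) * grad ** 2                 # moving average of squared gradients
    theta = theta - learning_rate * grad / (np.sqrt(sq_avg) + epsilon)  # per-parameter normalized step
    return theta, sq_avg


# Toy problem: minimize f(theta) = theta**2 (gradient 2 * theta) starting from theta = 1.
theta, sq_avg = np.array([1.0]), np.zeros(1)
for _ in range(3000):
    theta, sq_avg = rmsprop_step(theta, 2.0 * theta, sq_avg)
```

Because each step is divided by the root of a running mean of squared gradients, the effective step size stays close to the learning rate, which makes it largely insensitive to the scale of the gradients.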

WF-CNN Seizure Detector
This approach is a simplified version of the WS-CNN in which only the spectrally dominant features of the raw EEG segment, in the form of wavelet and Fourier spectra (WF), were extracted and used instead of the full-range spectral features (scalogram images) [16,25]. We combined three time-series to form 3D input matrix sets of 51,302 × 3 × 1 in size:
- The CWT coefficients of each EEG segment, computed using the Morlet wavelet (morl) at an arbitrary scale of 80 (equal to a pseudo-frequency of 2.56 Hz). This scale was chosen to target the embedded spectra near the mean frequency of the delta band (0.5-4 Hz).
- The inverse Fourier transform time-series of the EEG segment (IFFT; spectral components within 0.2-4.5 Hz were preserved). This was chosen to cover delta-band spectra, as studies have shown that neonatal seizures are more likely to contain rhythmic delta sharp waves/discharges [41,51].
- The original raw EEG segment.
The 3D input matrix sets of spectrally dominant features described above were fed into a deep 2D-CNN classifier (WF-CNN). We designed a 14-layer deep CNN to perform classification on the 3D input matrices of features. This approach is computationally more efficient because of its much simpler input features [16,25]. Table A1 of Appendix A details the architecture of the proposed 14-layer deep 2D-CNN used in the WF-CNN seizure detector. The inner convolutional layers of the network were designed to avoid large reductions in feature-map size. An RMSProp optimizer was used to train the WF-CNN over 60 epochs with default parameter settings.
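The three WF input channels can be assembled as in the NumPy sketch below. It rests on assumptions: the single-scale CWT is computed by direct convolution with the real Morlet wavelet exp(-t²/2)·cos(5t), which is the function Matlab calls 'morl', but its normalization and edge handling are not guaranteed to match Matlab's cwt; the band-limited IFFT channel is obtained by zeroing all Fourier components outside 0.2-4.5 Hz. All function names are hypothetical.

```python
import numpy as np


def morlet_cwt_single_scale(x, scale):
    """CWT coefficients of x at one scale with the real Morlet wavelet psi(t) = exp(-t**2/2) * cos(5t).

    The wavelet is truncated to its effective support [-8, 8] (16 * scale + 1 samples);
    x must be longer than that kernel for mode="same" to return len(x) samples.
    """
    t = np.arange(-8 * scale, 8 * scale + 1) / scale
    psi = np.exp(-t ** 2 / 2) * np.cos(5 * t) / np.sqrt(scale)
    return np.convolve(x, psi[::-1], mode="same")  # psi is symmetric, so this equals a correlation


def bandlimited_ifft(x, fs, f_lo=0.2, f_hi=4.5):
    """Keep only Fourier components in [f_lo, f_hi] Hz and return the real time-series."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def wf_features(segment, fs=256, scale=80):
    """Stack [CWT at one scale, 0.2-4.5 Hz IFFT signal, raw EEG] into a (len, 3, 1) array."""
    segment = np.asarray(segment, dtype=float)
    channels = [morlet_cwt_single_scale(segment, scale), bandlimited_ifft(segment, fs), segment]
    return np.stack(channels, axis=1)[:, :, np.newaxis]
```

For a 51,302-sample segment, wf_features returns an array of shape (51302, 3, 1), matching the input size given above; a 30 Hz component in the segment is essentially absent from both the CWT and IFFT channels.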

1D-CNN Seizure Detector
We recently demonstrated that a 1D-CNN classifier is capable of identifying HASs in a limited dataset of preterm fetal sheep [18]. This approach is the computationally simplest of the three seizure classifiers: the feature extraction blocks of the WF-CNN and WS-CNN approaches are skipped, and the CNN produces internal feature maps directly from its input 1D EEG time-series. In the current study, EEG segments of 51,302 × 1 samples were fed directly into a 14-layer deep 1D-CNN classifier. The designed architecture was trained using an RMSProp optimizer over 60 epochs with default parameter settings (Table A2 in Appendix A).
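To illustrate what "internal feature maps" means here, the following NumPy sketch implements a single convolution → ReLU → max-pooling stage applied to a raw 1D EEG segment. It is a generic example with hypothetical, hand-picked filter weights, not the 14-layer architecture of Table A2 (whose filter counts and widths are given there); in a trained network the kernels would be learned by the optimizer rather than fixed by hand.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_relu_maxpool(x, kernels, bias, pool=2):
    """One CNN stage on a single-channel 1D signal.

    x: (length,) EEG samples; kernels: (n_filters, width); bias: (n_filters,).
    Returns feature maps of shape (n_filters, (length - width + 1) // pool).
    """
    width = kernels.shape[1]
    windows = sliding_window_view(x, width)   # (length - width + 1, width), stride 1
    maps = windows @ kernels.T + bias         # cross-correlation, as computed by CNN conv layers
    maps = np.maximum(maps, 0.0).T            # ReLU; now (n_filters, positions)
    n_out = maps.shape[1] // pool
    pooled = maps[:, : n_out * pool].reshape(len(kernels), n_out, pool)
    return pooled.max(axis=2)                 # non-overlapping max-pooling


# Two hand-picked 2-tap filters: a falling-edge detector and a "pass the next sample" filter.
features = conv1d_relu_maxpool(np.arange(10.0), np.array([[1.0, -1.0], [0.0, 1.0]]), np.zeros(2))
```

Here features is [[0, 0, 0, 0], [2, 4, 6, 8]]: the first filter responds only to falling samples, so it is silent on a rising ramp, while the second passes the ramp through before pooling halves the resolution. Stacking several such stages before fully connected and softmax layers gives a classifier of the general kind described above.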

Performance Metrics
The performances of the proposed seizure detectors were evaluated by measuring sensitivity, selectivity, precision, accuracy, and area under the curve (AUC). These metrics were calculated from the full confusion matrix, i.e., the total numbers of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) hits for each scheme in each study category, separately (see Table 2). Precision is the percentage of TPs among all positive predictions (TP + FP). Accuracy is defined as the percentage of true hits (TP + TN) among all possible outcomes (TP + FN + TN + FP). Training-to-testing ratios in each scheme of studies #1 and #2 were dictated by the number of EEG patterns in each animal group (Table 2). A fixed training-to-testing ratio of 4.0 was used for all five cross-validation folds in study #3 (80% of data for training/validation and 20% for testing). Data partitioning across all folds of study #3 was made reproducible by resetting Matlab's random number generator with "rng('default')" [52], so that all seizure detectors were assessed on identical partitions.
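The metric definitions above translate directly into code. In this sketch "selectivity" is assumed to mean specificity, TN/(TN + FP), and AUC is computed with the rank (Mann-Whitney) formulation, i.e., the probability that a randomly chosen HAS segment receives a higher classifier score than a randomly chosen non-HAS segment. The function names are hypothetical.

```python
import numpy as np


def confusion_metrics(tp, tn, fp, fn):
    """Percentages derived from confusion-matrix counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),               # share of HAS segments detected
        "selectivity": 100.0 * tn / (tn + fp),               # assumed to mean specificity
        "precision": 100.0 * tp / (tp + fp),                 # share of HAS predictions that are correct
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons (ties count one half).

    Memory grows with n_pos * n_neg, which is acceptable for a sketch.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Hypothetical counts, not taken from the study.
metrics = confusion_metrics(tp=90, tn=880, fp=20, fn=10)
```

With these counts, accuracy is 97.0% and sensitivity 90.0%, but precision is only about 81.8%, which shows why precision (TP / (TP + FP)) and sensitivity (TP / (TP + FN)) must not be conflated when non-HAS segments greatly outnumber HAS segments.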

Computing Infrastructure
The algorithms were developed, trained, and tested in Matlab® using the New Zealand eScience Infrastructure (NeSI) (Auckland, New Zealand) high-performance computing facilities' Cray CS400 cluster [53]. Training was executed on NVIDIA Tesla A100 PCIe GPUs with 40 GB of HBM2 stacked memory and a memory bandwidth of 1555 GB/s [54]. Intel Xeon Broadwell CPUs (E5-2695v4, 2.1 GHz) were used on the cluster to handle the GPU jobs.
In brief, 26 near-term and 14 preterm time-mated Romney/Suffolk fetal sheep at 124 and 98 ± 1 days of gestation, respectively (term is 145 days of gestation), were used in this study. Fetuses were surgically instrumented with catheters and electrodes under general anesthesia using aseptic techniques, as previously described [11]. Prior to the surgical procedure, food was withheld from the animals for 18 h; however, access to water was maintained. Long-acting oxytetracycline (20 mg/kg, Phoenix Pharm, Auckland, New Zealand) was administered intramuscularly to the ewes 30 min before the start of surgery. Anesthesia was induced by intravenous injection of propofol (5 mg/kg, AstraZeneca Limited, Auckland, New Zealand) and maintained with 2-3% isoflurane in oxygen. Throughout the procedure, the depth of anesthesia, maternal heart rate, and respiration were continuously monitored by experienced anesthetic personnel. The ewes received a continuous infusion of isotonic saline (at an approximate rate of 250 mL/h) to maintain fluid balance. Following a maternal midline abdominal incision, the fetus was exposed. For near-term fetuses, the vertebral-occipital anastomoses were ligated and inflatable carotid occluder cuffs were placed around both carotid arteries [55]. For preterm fetuses, an inflatable silicone occluder (OC16HD, 16 mm, In Vivo Metric, Healdsburg, CA, USA) was loosely positioned around the umbilical cord to enable postsurgical occlusion of the umbilical cord to induce fetal HI [11]. Using 7-stranded stainless steel wire (AS633-7SSF; Cooner Wire Co., Chatsworth, CA, USA), two pairs of EEG electrodes were constructed and placed on the dura mater over the parasagittal parietal cortex (10 mm and 20 mm anterior to bregma and 10 mm lateral for near-term fetuses, and 5 mm and 15 mm anterior to bregma and 5 mm lateral for preterm fetuses) and secured with cyanoacrylate glue. A reference electrode was sewn over the occiput. For near-term fetuses, a thermistor (Replacement Parts Industries, Inc., Chatsworth, CA, USA) was placed over the parasagittal dura, 30 mm anterior to bregma, to measure extradural temperature, and a second thermistor was inserted into the esophagus to measure core temperature. A cooling cap made from silicone tubing (3 × 6 mm, Degania Silicone, Degania Bet, Israel) was affixed to the head of all fetuses. The uterus was then closed and antibiotics (80 mg gentamicin, Pharmacia and Upjohn, Rydalmere, New South Wales, Australia) were administered into the amniotic sac. The maternal laparotomy skin incision was sutured and infiltrated with 10 mL of 0.5% bupivacaine plus adrenaline (AstraZeneca Ltd., Auckland, New Zealand) for local anesthesia. All fetal catheters and leads were externalized through an incision in the maternal flank. The maternal long saphenous vein was catheterized to provide access for postoperative maternal care and euthanasia.
Postoperative care: Following surgery, sheep were housed in individual metabolic cages with unrestricted access to food and water. The room was maintained at a temperature of 16 ± 1 °C with 50 ± 10% humidity and a 12 h light/dark cycle (lights on at 06:00 h). The ewe received daily intravenous antibiotics for four days (600 mg benzylpenicillin sodium, Novartis Ltd., Auckland, New Zealand, and 80 mg gentamicin).
The patency of fetal catheters was sustained by continuous infusion of heparinized saline (20 U/mL at 0.2 mL/h), and the maternal catheter was maintained by daily flushing.

Experimental Protocols
Near-term fetuses were randomized to HI-normothermia term (HI, no hypothermia) (n = 7), HI-hypothermia term (HI, hypothermia) (n = 14), and sham-normothermia term (no HI, no hypothermia) (n = 5) groups. Separately, an HI-normothermia preterm group was prepared (HI, no hypothermia, n = 14). For near-term fetuses, at 128 ± 1 d gestation, cerebral ischemia was induced by temporary inflation of the carotid occluder cuffs with sterile saline for 30 min. Successful occlusion was confirmed by the rapid onset of an isoelectric EEG signal, typically within 30 s after inflation. In the sham-normothermia experiments, the carotid occluder cuffs were not inflated. HI-normothermia preterm fetuses at 103 ± 1 d gestation received complete umbilical cord occlusion induced by inflation of the umbilical cord occluder with sterile saline. This occlusion was maintained for 25 min or until blood pressure dropped below 8 mmHg or asystole occurred.
In the HI-hypothermia term group, hypothermia was started 3 h after the end of the HI insult and continued for 72 h. Hypothermia was performed by attaching the exteriorized ends of the silicone scalp coil to a pump (TX150 heating circulator, Grant Instruments Ltd., Cambridge, UK) in a cooled water bath and circulating cold water through the cooling coil. In line with previous studies in near-term fetal sheep, the initial target extradural temperature in the HI-hypothermia term group was 31-33 °C [55]. In the normothermia groups, the water was not circulated, and the cooling coil was maintained in thermal equilibrium with the fetal temperature. As described previously, once the cooling period concluded, the water pump was switched off and the fetuses were allowed to re-warm naturally over approximately 60 min. Ewes and fetuses were euthanized at the end of the study, 7 days after ischemia for near-term fetuses or 21 days after ischemia for preterm fetuses, by an intravenous overdose of sodium pentobarbitone (9 g to the ewe; Pentobarb 300; Chemstock, Christchurch, New Zealand), for subsequent immunohistochemistry analysis. In all groups, 7 days of EEG recordings after HI were considered for seizure analysis. For the HI-hypothermia term group, this included seizures during the entire cooling phase and those that occurred after the end of cooling.

Data Acquisition
After the end of surgery, fetal EEG and other physiological parameters, including mean arterial blood pressure and heart rate, were recorded continuously from 24 h before the insult and evaluated up to 168 h after the end of HI or sham HI.
Data were continuously acquired and stored on disk for subsequent offline analysis using custom data acquisition software (LabVIEW 2020 for Windows, National Instruments, Austin, TX, USA). The raw EEG signal was first passed through a 6th-order anti-aliasing Butterworth low-pass filter with a 500 Hz cut-off frequency. It was then amplified with a gain of ×10,000 and high-pass filtered with a first-order filter with a 1.6 Hz cut-off frequency. The recordings were then digitized at 4096 Hz, and a 10th-order low-pass inverse Chebyshev filter at 128 Hz (implemented in software) was applied before the sample rate was reduced to 256 Hz and the data were saved. The data from this final stage were subsequently transferred to Matlab for analysis.
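The software stage of this chain (after digitization at 4096 Hz) can be approximated with SciPy as below. This is a sketch under stated assumptions: the text does not give the stopband attenuation of the inverse Chebyshev (Chebyshev type II) filter, so 60 dB is an assumed value, and 128 Hz is used as the stopband edge, which is how scipy.signal.cheby2 interprets its critical frequency. The analog Butterworth and first-order high-pass stages are not modeled, and software_stage is a hypothetical name.

```python
import numpy as np
from scipy import signal


def software_stage(eeg_4096, fs_in=4096, fs_out=256, edge_hz=128.0, order=10, stop_atten_db=60.0):
    """Inverse Chebyshev low-pass at fs_in, then keep every (fs_in // fs_out)-th sample."""
    # For cheby2, edge_hz is where the stopband (>= stop_atten_db attenuation) begins.
    sos = signal.cheby2(order, stop_atten_db, edge_hz, btype="low", fs=fs_in, output="sos")
    filtered = signal.sosfilt(sos, eeg_4096)   # causal filtering, as in online acquisition
    return filtered[:: fs_in // fs_out]        # 4096 Hz -> 256 Hz


# Two seconds of a 10 Hz "EEG" rhythm contaminated by 300 Hz interference.
fs = 4096
t = np.arange(2 * fs) / fs
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 300 * t)
eeg_256 = software_stage(raw)
```

Without the low-pass step, the 300 Hz component would alias to 44 Hz (300 − 256) in the 256 Hz data; with it, that alias is attenuated by at least the assumed stopband attenuation while the 10 Hz rhythm passes essentially unchanged.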
EEG intensity (power) was also derived as the sum of the power spectrum between 1 and 20 Hz over 1 min recording bins of the raw EEG signal and was log-transformed (in decibels (dB), as 20 × log(intensity)), as this transformation gives a better approximation to a normal distribution.
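A per-minute intensity trend of this kind can be sketched with NumPy. Assumptions: the power spectrum of each 1 min bin is taken as a plain periodogram (|FFT|²/N), whose scaling the text does not specify, so absolute dB values are only illustrative; the 20 × log10 convention follows the description above. eeg_power_db is a hypothetical name.

```python
import numpy as np


def eeg_power_db(eeg, fs=256, bin_s=60, f_lo=1.0, f_hi=20.0):
    """Return one dB value per complete bin: 20*log10 of the summed 1-20 Hz power."""
    eeg = np.asarray(eeg, dtype=float)
    n = fs * bin_s
    n_bins = len(eeg) // n                                   # an incomplete final bin is dropped
    epochs = eeg[: n_bins * n].reshape(n_bins, n)
    power = np.abs(np.fft.rfft(epochs, axis=1)) ** 2 / n     # periodogram of each 1 min bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    intensity = power[:, band].sum(axis=1)
    return 20.0 * np.log10(intensity)


# 2.5 min of a 5 Hz rhythm whose amplitude rises tenfold in the second minute.
fs = 256
minute = np.arange(60 * fs) / fs
eeg = np.concatenate([np.sin(2 * np.pi * 5 * minute),
                      10 * np.sin(2 * np.pi * 5 * minute),
                      np.sin(2 * np.pi * 5 * minute[: 30 * fs])])
trend = eeg_power_db(eeg, fs=fs)
```

trend has two values (the trailing 30 s is discarded), and they differ by 40 dB: a tenfold amplitude increase multiplies power by 100, and 20 × log10(100) = 40.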
A total of ~10,080 min of EEG recordings were collected from each fetal sheep (in total, more than 8600 h per EEG channel from 40 subjects). We manually identified the location of all HAS events on both the left and right EEG channels by plotting the EEG power (log µV²) for each channel separately, and the locations of the seizures were marked accordingly. The exact seizure section was then identified in the raw EEG recording, and the seizure was centered in a 51,302 × 1 segment (~3.34 min long). The procedure for centering the seizure is explained in the Preprocessing section below. Choosing ~3 min long segments ensured that the full lengths of the longer seizures were included in the dataset. A total of 31,015 EEG segments, including 3955 HASs and 27,060 non-HASs, were manually annotated to create the database (Table 3). The larger number of non-HAS events was intentionally chosen to reflect the much higher proportion of normal EEG activity and to capture the natural emergence of seizures against the EEG background. This strengthens the algorithm's robustness to non-HAS events. Table 3 provides full details of the number of EEG segments in each animal group, and Table 2 shows how data were partitioned into different training schemes for each study category.

Preprocessing
The EEG power signal mainly helps to monitor the long-term trend of an experiment and is calculated as the sum of the power spectrum of the raw EEG signal between 1 and 20 Hz over each 1 min of recording. Therefore, a seizure point manually identified/labeled in the EEG power signal represents 1 min of data whose center/minimum/maximum datapoints are not necessarily the center of the seizure. We therefore developed a strategy to locate each seizure at the center of its 3 min long EEG segment, as follows. We initially selected 1 min of raw EEG recording either side of the marked seizure location in the EEG power signal (sig_1). The max/min values of sig_1 are not necessarily a good choice for the center of a seizure, so we instead took the center of weight of sig_1 (the datapoint with the highest weight) as the center of the seizure epoch. To find the center of weight, sig_1 was first zero-meaned and raised to the arbitrary power of 10 to intensify signal components with higher energy levels. The output signal (sig_2) was then passed through a moving median absolute deviation (movmad) function with an arbitrary sliding window length of 8000 points and scaled up to sig_1 (sig_3). This strategy identified the sections of sig_3 that held the highest signal weight within the arbitrarily assigned 8000-point moving window. The maximum value of sig_3 was then taken as the center of weight for the identified seizure, and 100 s of data either side of this datapoint (a total of 3.34 min) were automatically selected to form the seizure segment (HAS). This step was performed using automatic algorithms that also generated plots, similar to that shown in Figure 4, for all manually identified seizure locations.
Bioengineering 2024, 11, x FOR PEER REVIEW 13 of 27
The plotted graphics were then visually checked by two experts to ensure that each final EEG segment represented a seizure centered in the 3 min long EEG segment. This preprocessing step can substantially improve the results of the proposed seizure classifiers, because the centering strategy presents every seizure at a consistent position within its segment, which is a better format for a classification task.
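The centering procedure can be sketched in NumPy as follows. It is an illustrative re-implementation, not the study's Matlab code: moving_mad edge-pads the signal whereas Matlab's movmad shrinks the window at the ends; this naive version materializes every window, so for the full 8000-point window over 2 min at 256 Hz it would need gigabytes of memory and a chunked or running-median implementation would be preferable; and it returns 2 × half_len_s × fs + 1 samples, which need not match the 51,302-sample segments exactly. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_mad(x, window):
    """Centered moving median absolute deviation, one value per sample of x.

    Like Matlab's movmad, an even window spans window // 2 samples before the
    current one and window // 2 - 1 after it; unlike movmad, the ends are
    edge-padded instead of using shrunken windows.
    """
    left = window // 2
    padded = np.pad(x, (left, window - left - 1), mode="edge")
    views = sliding_window_view(padded, window)            # (len(x), window), no copy
    med = np.median(views, axis=1, keepdims=True)
    return np.median(np.abs(views - med), axis=1)


def center_seizure(eeg, marked_idx, fs=256, search_s=60, window=8000, power=10, half_len_s=100):
    """Return (center_idx, segment) for a seizure marked at sample marked_idx."""
    lo = max(marked_idx - search_s * fs, 0)
    hi = min(marked_idx + search_s * fs, len(eeg))
    sig1 = np.asarray(eeg[lo:hi], dtype=float)               # 1 min either side of the mark
    sig2 = (sig1 - sig1.mean()) ** power                      # even power boosts high-energy samples
    mad = moving_mad(sig2, window)
    peak = mad.max()
    sig3 = mad * (np.abs(sig1).max() / peak) if peak > 0 else mad  # rescaling leaves argmax unchanged
    center = lo + int(np.argmax(sig3))                        # "center of weight" of the seizure
    half = half_len_s * fs
    if center - half < 0 or center + half + 1 > len(eeg):
        raise ValueError("segment would extend beyond the recording")
    return center, eeg[center - half: center + half + 1]
```

In use, center_seizure(eeg, marked_idx) would be called once per manually marked HAS location, and the returned segment plotted for the visual check described above.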
The non-HAS EEG segments were randomly chosen outside the HAS intervals (or overlapping tails of seizures in their segments).Therefore, the non-HAS events could include any electrophysiological activity including normal increasing EEG activity, background EEG, movement artifact, electronic noise, etc.
To improve generalization and strengthen the validity of the performance assessment, we did not de-noise the original EEG segments. This allows the proposed seizure detectors to be assessed on data similar to those obtained clinically. No data augmentation (such as horizontal/vertical flipping of the EEG segments) was performed, so that the performance assessments were based purely on the natural electroencephalographic morphology of the patterns.

Results of the WS-CNN Seizure Detector
Results of the WS-CNN approach for the three study categories are shown in Table 4. The deep WS-CNN classifier was able to identify seizures in an unseen fetal sheep group regardless of age or treatment, with a high average accuracy of 98.52%, whether or not EEG data from the sham-normothermia group were used in the training set. The overall accuracy of the WS-CNN classifier was 98.5 ± 1.0% (range 97.2-99.7%) for study #1, 98.6 ± 0.9% (range 97.4-99.7%) for study #2, and 99.8 ± 0.04% (range 99.3-99.6%) for study #3.
The results confirm the reliability of the 17-layer deep WS-CNN pattern classifier for detecting post-HI HASs in EEG sampled at 256 Hz. Average area under the curve (AUC) values of 0.95, 0.95, and 0.99 were obtained for studies #1 to #3, respectively. Figure 5A-C shows the ROC curves and the corresponding AUC values of the WS-CNN seizure detector for the three study categories.
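AUC values like those reported above can be computed without tracing the full ROC curve, using the rank-sum (Mann-Whitney) identity. AUC equals the probability that a randomly chosen HAS segment receives a higher classifier score than a randomly chosen non-HAS segment, with ties counted as one half. The sketch assumes each classifier outputs a continuous HAS score per segment. It is not necessarily how the authors computed their AUCs, and the helper names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    return ranks


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (HAS = 1, non-HAS = 0)."""
    y = np.asarray(y_true, dtype=bool)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both HAS and non-HAS examples")
    r = average_ranks(scores)
    return (r[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

In practice, `sklearn.metrics.roc_auc_score` returns the same value for binary labels. The explicit form above simply makes the probabilistic meaning of the AUC visible.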

Results of the WF-CNN Seizure Detector
The results showed that the 14-layer WF-CNN pattern classifier can also accurately identify post-HI HASs. Average AUC values of 0.95, 0.95, and 0.99 were obtained for studies #1 to #3, respectively. Figure 5D-F shows the ROC curves and the corresponding AUC values of the WF-CNN seizure detector for the three study categories.

Results of the 1D-CNN Seizure Detector
Despite a slight drop in accuracy, the 1D-CNN approach could also reliably identify HASs. Average AUC values of 0.95, 0.94, and 0.99 were obtained by the 1D-CNN approach for studies #1 to #3, respectively. Figure 5G-I shows the ROC curves and the corresponding AUC values of the 1D-CNN seizure detector for the three study categories.

Discussion
The present study demonstrates that deep-learning pattern classifiers that we originally developed to identify EEG transients can accurately identify post-HI seizures. They reached an average cross-validated accuracy of at least 99.7% across preterm and near-term gestations and during TH in near-term animals. In these studies, we examined whether data from different gestational ages could be used to train deep net classifiers to distinguish seizures from non-HAS EEG segments in unseen data from other groups. We explored three main study categories to test and compare the algorithms:
(1) leave-one-out cross-validation with training sets that included data from the sham-normothermia group;
(2) leave-one-out cross-validation with training sets that excluded data from the sham-normothermia group;
(3) k-fold cross-validation (k = 5), in which data from all groups were randomly combined and included in the training set.
We found some variations in classification accuracy across groups that likely reflect embedded morphological differences in seizures between groups.
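The three study designs can be expressed as index splits over EEG segments labeled by animal group. In this minimal sketch (hypothetical function names), studies #1 and #2 hold out one group at a time, and study #2 also removes the sham-normothermia group from every training set. Study #3 pools all segments and shuffles them into k = 5 folds. Two assumptions are labeled in the code. First, the held-out unit is a whole group. Second, an excluded group is not itself used as a test group. This section specifies neither detail beyond testing on unseen data from other groups.

```python
import numpy as np


def leave_one_group_out(groups, exclude_from_training=()):
    """Yield (held_out_group, train_idx, test_idx), holding out one group at a time.

    groups: one group label per EEG segment.
    exclude_from_training: labels never trained on (e.g. the sham group in study #2).
    Assumption: excluded groups are also skipped as test groups.
    """
    labels = [str(g) for g in groups]
    excluded = set(exclude_from_training)
    for held_out in sorted(set(labels)):
        if held_out in excluded:
            continue
        test_idx = np.array([i for i, g in enumerate(labels) if g == held_out])
        train_idx = np.array([i for i, g in enumerate(labels)
                              if g != held_out and g not in excluded])
        yield held_out, train_idx, test_idx


def k_fold(n_segments, k=5, seed=0):
    """Yield (train_idx, test_idx) for k folds over shuffled, pooled segments (study #3)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_segments), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield np.sort(train), np.sort(folds[i])
```

Holding out whole groups, rather than individual segments, is what makes studies #1 and #2 a test of generalization to an unseen gestational age or treatment. Study #3 instead mixes every group into both training and test folds.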
Study #1 (sham-normothermia group included in the training sets): The seizure detectors in this study consistently achieved their best performance when tested on data from HI-normothermia preterm fetuses (Figure 5A,D,G and Table 7), with an average accuracy of 99.5 ± 0.3%. This suggests that the spectral morphological features embedded in preterm seizures are also present in the seizures of the HI-normothermia term and HI-hypothermia term groups used for training.
Interestingly, including the preterm datasets in the training sets reduced the average overall performance to 98.7 ± 0.2% and 96.9 ± 0.3% when the seizure detectors were tested on data from the HI-normothermia term and HI-hypothermia term groups, respectively. This indicates that the term datasets contain more complex spectral features than the preterm data. Thus, term-trained algorithms accurately identify HI-normothermia preterm seizures, but preterm-trained algorithms are less accurate on term seizures. This could be due to the slower delta activity in preterm fetuses, which covers only a subset of the full delta-band spectral range. By contrast, the evolving EEG components of the more mature term brain span a broader delta range that includes the components already present in preterm fetuses. This is consistent with the increased number of wrongly detected seizures (false positives) in study #1 (Tables 4-6) for the HI-sham-term-tested and HI-hypothermia-term-tested classifiers compared with the HI-normothermia-preterm-tested classifiers.
Our data are consistent with the well-known relative immaturity and lower connectivity of the preterm brain compared with the term brain. Our HI-normothermia preterm fetal group is at a brain age equivalent to 28-30 weeks of gestation in humans [11]. This age is characterized by rapid production of white matter, at a time when cortical neurons are not yet myelinated [56][57][58][59]. Preterm EEG is discontinuous and generally dominated by slower delta activity [60,61]. It progressively acquires faster spectral components towards term age, with the shift to continuous EEG activity and sleep state cycling [60][62][63][64][65].
Study #2 (sham-normothermia term group excluded from the training sets): This study showed a profile similar to that of study #1, with performance decreasing from HI-normothermia preterm to HI-normothermia term and HI-hypothermia term fetuses. The absence of EEG data from the sham-normothermia term group in the training sets appears to have lowered the overall performance of the proposed seizure detectors in this study category. The 1D-CNN classifier was affected more than the WF-CNN and WS-CNN classifiers, suggesting that the latter two strategies are more robust. Comparison with study #1 suggests that adding data from the sham group to the training sets also improves performance by reducing the number of false negatives (FNs), on average.
Study #3: The cross-validation results in Tables 4-6 and the ROC curves in Figure 5C,F,I indicate that the three proposed seizure detectors generalized well when trained on combined datasets from all groups. They accurately distinguished seizures from non-seizure and background EEG activity. The seizure detectors performed equally well across all validation folds, with average overall performances of:
- 99.78 ± 0.04% for the WS-CNN classifier;
- 99.73 ± 0.08% for the WF-CNN classifier;
- 99.70 ± 0.14% for the 1D-CNN classifier.
These performance measures were cross-validated over 31,015 EEG patterns. They indicate that the proposed classifiers can learn the morphological variations of seizures (and their possible spectral feature differences) across all fetal sheep groups and properly identify a novel HAS event in an unseen dataset.
These promising results indicate an effective approach to training generalized seizure detectors that robustly identify seizures regardless of gestational age and/or the influence of treatment or drugs on the EEG. The results also suggest that the simpler 1D-CNN and WF-CNN architectures could replace the computationally heavier WS-CNN for much faster, potentially real-time analysis. Nevertheless, the larger standard deviations of the 1D-CNN (0.14) and WF-CNN (0.08), compared with 0.04 for the WS-CNN, illustrate a potential limitation of the faster classifiers.
Algorithm comparisons: Overall, the classifier performances in Tables 4-6 show that feeding the deep 2D-CNN classifiers with spectrally rich feature maps of the EEG segments gives better seizure detection accuracy than feeding the raw EEG segments directly into a 1D-CNN pattern classifier. The wavelet scalograms in the WS-CNN and the matrices of spectrally dominant features in the WF-CNN provided robust, spectrally detailed inputs, which allowed the deep 2D-CNN pattern classifiers to distinguish post-HI HASs from non-HAS events. This near-perfect discrimination is important because the non-HAS events in this study comprise any non-HAS electrophysiological activity, including normal increases in EEG activity, movement artifacts, and electronic noise.

Our data show that all three CNN-based seizure detectors identified HASs competitively across all study schemes, with only a negligible performance drop for the WF-CNN and 1D-CNN compared with the WS-CNN. The data further suggest that two simpler inputs carry sufficient information for the designed CNN classifiers to distinguish HASs from non-HAS events: the raw EEG time series fed into the 1D-CNN, and the spectrally dominant features in the WF-CNN input matrices. This is particularly important because the 1D-CNN and WF-CNN seizure detectors are computationally more efficient, running faster and requiring less memory than the WS-CNN. This gain in speed and lower hardware requirements comes at the cost of only a negligible drop in accuracy compared with the WS-CNN strategy. The data also suggest that the wavelet scalograms covered a broader range of spectral features, which helped the WS-CNN outperform the WF-CNN and 1D-CNN approaches. These scalograms were generated using a Morlet ('morl') mother wavelet with scales 1-500. The use of NVIDIA A100 GPUs in this study enabled fast analysis despite the computationally intensive nature of the processing.
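The Morlet scalograms used as WS-CNN inputs can be sketched with a direct continuous wavelet transform in NumPy. The sketch assumes the real Morlet wavelet psi(t) = exp(-t^2/2) cos(5t), which is the form PyWavelets calls 'morl'. Three details are choices made here rather than taken from the source: the 1/sqrt(s) normalization, the truncation of the wavelet to ±4 scale units, and the function names. Any image rendering or resizing applied before the 2D-CNN is not covered.

```python
import numpy as np


def morlet(t):
    """Real Morlet wavelet exp(-t^2/2) * cos(5t) (the 'morl' form)."""
    return np.exp(-t ** 2 / 2) * np.cos(5 * t)


def scalogram(x, scales):
    """|CWT| of x at the given scales via direct convolution; shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet to +/-4 scale units, and never longer than the signal,
        # so that mode="same" returns exactly n samples.
        half = min(int(np.ceil(4 * s)), (n - 1) // 2)
        t = np.arange(-half, half + 1) / s
        psi = morlet(t) / np.sqrt(s)  # energy-style 1/sqrt(s) normalization (a choice)
        # psi is even, so convolution here equals correlation with the wavelet.
        out[i] = np.abs(np.convolve(x, psi, mode="same"))
    return out
```

For this wavelet, scale s corresponds roughly to a pseudo-frequency of 0.8·fs/s. At fs = 256 Hz, scales 1-500 therefore span from above the Nyquist frequency down to about 0.4 Hz, reaching into the delta band discussed above. Direct convolution becomes slow at large scales on 3 min segments. An FFT-based convolution such as `scipy.signal.fftconvolve(x, psi, mode="same")` is a drop-in speed-up.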
Limitations: In interpreting this study, the reader should consider that dural surface-recorded EEGs from fetal sheep have lower background noise than typical neonatal scalp recordings, which makes HASs easier to identify [66][67][68]. The HASs in fetal sheep dural EEG are directly comparable to typical neonatal seizures in clinical scalp recordings. However, dural recordings inherently provide a higher signal-to-noise ratio than conventional clinical scalp measurements. This enhanced clarity benefits seizure detection in this experimental setting but might not capture all the challenges of clinical scalp EEG, where noise levels are typically higher owing to factors such as infant movement. Further, movement may differ between term and preterm subjects. Thus, validation using clinical recordings will be an important future direction.
Nevertheless, these results support the conclusion that the proposed algorithms can reliably identify post-HI seizures in conventional 256 Hz recordings across groups with different gestational ages and/or under the influence of therapeutic hypothermia.

Conclusions
This study demonstrated the effectiveness of deep CNN pattern classifiers for generalized seizure detection in over 17,300 h of EEG recordings following acute HI in a cohort of 40 fetal sheep. The cohort covered a range of conditions: normothermia and hypothermia, term and preterm gestations, and sham controls. The CNN seizure classifiers showed exceptional accuracy, with an average 5-fold cross-validated performance of at least 99.7%, affirming the reliability of the proposed deep-learning algorithms. Gestational age and hypothermic treatment had minimal influence on seizure detector performance. This supports the detectors' generalizability across gestational ages when they are trained on a robust dataset with samples from all groups. The three study categories, which varied the training datasets, provided valuable insights into the robustness and generalization of the proposed algorithms. The classifiers performed consistently across validation folds, confirming their ability to identify seizures regardless of gestational age or therapeutic intervention. Notably, algorithms trained on term data identified preterm seizures more accurately than preterm-trained algorithms identified term seizures. This emphasizes the importance of considering the developmental stage of the fetal brain when training seizure detection algorithms. Variations in classification accuracy across study categories highlighted morphological differences in seizures between groups, reflecting differences in maturation and connectivity between preterm and term brains.
The study highlighted the potential for real-time analysis using simpler architectures such as the 1D-CNN and WF-CNN classifiers. These offer faster processing while maintaining competitive accuracy, supporting the feasibility of deploying the algorithms in near-clinical scenarios. The comparison of CNN architectures showed the advantage of using spectrally rich feature maps in 2D-CNN classifiers for improved seizure detection accuracy. The use of non-denoised 256 Hz recordings added clinical relevance, demonstrating that the algorithms reliably identify post-HI seizures under realistic recording conditions. A critical future direction will be validating the algorithms with clinical data to establish their applicability in real-world healthcare settings. Overall, this research represents a significant advance in automated seizure detection after HI, offering a promising avenue for improving clinical practice and patient outcomes in neonatal care.

Figure 1. Examples of EEG power activity before, during, and up to 60 h after an HI insult. (A) An HI-normothermia preterm fetus, with HI induced by acute umbilical cord occlusion. (B) HI-normothermia term fetuses with no treatment, with global cerebral ischemia induced by bilateral carotid artery occlusion. (C) HI-hypothermia term fetal sheep with 3 days of therapeutic hypothermia. Panels (A-C) show high-amplitude stereotypic evolving seizures (HASs) during the secondary phase of recovery. Examples of individual HASs from each fetal sheep in these groups are shown in (D-L), respectively. These HAS patterns are examples of the EEG epochs used in the training and testing of the proposed seizure detectors.
(c) Can specific training strategies help to improve the generalization and robustness of pattern classifiers to perform equally well across all groups and identify seizures regardless of what hemisphere the EEG has been recorded from?

Figure 3. (A-D): Examples of post-HI high-amplitude stereotypic seizures (HASs) in all fetal sheep groups, identical to those in Figure 1D,E,K,L. (I-L): Examples of post-HI non-HAS EEG segments. (E-H) and (M-P): The corresponding scalogram images of the HAS and non-HAS EEG segments, respectively, used to train the seizure detectors. The scalograms were generated using a Morlet mother wavelet with scales 1 to 500.

Figure 4. (A): sig_1: An example of an uncentered seizure (HAS) from a preterm fetal sheep (10 h after the HI insult). sig_3: The moving median absolute deviation (movmad) of sig_1, evaluated with an arbitrary sliding window of 8000 points and scaled to sig_1 for visualization. (B): Example of the final centered seizure (HAS) in a 3.34 min long EEG segment used in the training and testing of the deep net classifiers.

Figure 5. ROC curves and the corresponding AUC values from testing the WS-CNN (A-C), WF-CNN (D-F), and 1D-CNN (G-I) seizure detectors in studies #1, #2, and #3, respectively. The data for each proposed classifier are presented as mean ± SD. They show improved accuracy and much lower variability in the cross-validation results of study #3, where data from all fetal sheep groups were used.

Table 1. The architecture of the proposed 17-layer deep WS-CNN seizure classifier.

2.6. Experiments, Data Acquisition, and Preparation
Ethics
All procedures were approved by the Animal Ethics Committee of the University of Auckland (R1942) under the New Zealand Animal Welfare Act and carried out in accordance with the Code of Animal Ethical Conduct established by the Ministry of Primary Industries of the New Zealand Government.

Table 2. Study design including three main categories. Evaluations were performed separately for the WS-CNN, WF-CNN, and 1D-CNN seizure detectors.

Table 3. Number of fetal sheep in each group and the total number of manually identified seizure and non-seizure patterns from the left/right EEG channels.

Table 4. Results of the WS-CNN pattern classifier for the identification of post-HI seizures (HASs) across all study categories (#1-#3).

Table 5. Results of the WF-CNN pattern classifier for the identification of post-HI seizures (HASs) across all study categories (#1-#3).

Table 6. Results of the 1D-CNN pattern classifier for the identification of post-HI seizures (HASs) across all study categories (#1-#3).

Table 7. A comparison of the average overall performance of the proposed seizure detectors in each scheme of each study category.