Unsupervised Multitaper Spectral Method for Identifying REM Sleep in Intracranial EEG Recordings Lacking EOG/EMG Data

A large number of human intracranial EEG (iEEG) recordings have been collected for clinical purposes, in institutions all over the world, but the vast majority of these are unaccompanied by EOG and EMG recordings which are required to separate Wake episodes from REM sleep using accepted methods. In order to make full use of this extremely valuable data, an accurate method of classifying sleep from iEEG recordings alone is required. Existing methods of sleep scoring using only iEEG recordings accurately classify all stages of sleep, with the exception that wake (W) and rapid-eye movement (REM) sleep are not well distinguished. A novel multitaper (Wake vs. REM) alpha-rhythm classifier is developed by generalizing K-means clustering for use with multitaper spectral eigencoefficients. The performance of this unsupervised method is assessed on eight subjects exhibiting normal sleep architecture in a hold-out analysis and is compared against a classical power detector. The proposed multitaper classifier correctly identifies 36±6 min of REM in one night of recorded sleep, while incorrectly labeling less than 10% of all labeled 30 s epochs for all but one subject (human rater reliability is estimated to be near 80%), and outperforms the equivalent statistical-power classical test. Hold-out analysis indicates that when using one night’s worth of data, an accurate generalization of the method on new data is likely. For the purpose of studying sleep, the introduced multitaper alpha-rhythm classifier further paves the way to making available a large quantity of otherwise unusable IEEG data.


Introduction
Intracranial electroencephalography (iEEG) recordings are routinely collected during the surgical treatment of epilepsy, while simultaneous electromyography (EMG) recordings and electrooculography (EOG) recordings are not [1,2].Since the typical hospital stay of an epileptic patient undergoing invasive monitoring is 1 to 2 weeks, nearly every epilepsy unit in the US has iEEG sleep data that they have collected, and that they continue to collect, which is unaccompanied by electromyography (EMG) and electrooculography (EOG).These latter two recording modalities measure muscle tone and eye movements, respectively, and are required by expert human sleep scorers to separate wake from rapid eye movement (REM) sleep [3].Reliable identification of REM in this data would rescue a large quantity of rare data collected during invasive surgery from living human brains.This data could be used to, for example, (i) assess the clinical use of REM dynamics to accurately identify the epileptic zone [4][5][6]; such identification is required to successfully treat focal epilepsy seizures surgically [7][8][9], and (ii) to further our understanding of brain dynamics during REM sleep (e.g., [10]).
Scientists have characterized sleep into four stages, namely: non-REM stage 1 (NREM1), non-REM stage 2 (NREM2), non-REM stage 3 (NREM3), and REM sleep.These stages are based, in part, upon electrophysiological features observed in recordings such as EEG and iEEG.Of all the sleep stages, REM sleep exhibits electrophysiological features that are most similar to those exhibited in recordings collected from awake subjects.As a consequence, the separation of REM from the awake state (hereafter labeled "Wake"), is the most difficult classification task when accurately labeling electrophysiological recordings (e.g., see [11,12]).This labeling process is sometimes referred to as "sleep scoring" by sleep scientists, and individuals sleep scoring data are colloquially known as "sleep scorers".Since REM sleep is accompanied by low muscle tone and rapid eye movements, expert sleep scorers rely upon accompanying recordings of muscle activity (electromyography, EMG) and upon recordings that indicate eye movement (electrooculography, EOG) to separate Wake from REM sleep [13][14][15][16][17].As previously stated, these recordings are absent from the majority of iEEG data, and a method capable of identifying REM based upon iEEG data alone will be of significant value.
Sleep scoring is a laborious and time-consuming process.As a consequence, many methods of automatically sleep scoring electrophysiological data exist; examples include refs.[18][19][20][21].However, they have not been wholly successful, particularly when separating REM from Wake [11,12].This limitation has not been alleviated with the advent of modern machine learning and artificial intelligence (AI) methods.In a review of 37 published deep learning sleep classifiers spanning the ten years from 2010 to 2020, it is recommended that EOG and EMG be analyzed in conjunction with EEG to achieve robust classification results [22].More recently, classical machine learning methods have been compared against deep learning methods such as those reported in [23][24][25] for the purpose of scoring sleep and are found to perform similarly [26,27].It is worth noting, however, that the performance reported in [24] on EEG is excellent, possibly due to the use of time-frequency distributions, possibly due to the use of a convolutional neural network, or possibly due to both.
Regardless, it is important to note that iEEG is not EEG, and it exhibits key differences.First, iEEG is significantly more invasive than EEG, with multi-contact electrodes penetrating through the cortex of the brain to clinically-indicated deep structures such as the hippocampus and the amygdala.With EEG, the electrodes lie upon the scalp with consistent, and more evenly spaced, coverage from individual to individual.Second, iEEG typically lacks occipital recordings.These two differences are substantial, and any sleepscoring method that classifies sleep well using EEG recordings may not classify sleep well when applied to iEEG sleep data.For intracranial EEG, there is one published method of scoring sleep that does not use commensurate EMG and EOG recordings [28].This method of sleep scoring also performs least well when identifying REM and N1 sleep stages (see Figure 3, [28]).
Our method uses alpha oscillations recorded intracranially and unaccompanied by EOG or EMG recordings to identify REM sleep episodes.Alpha (8-12 Hz) oscillations occur during eyes-closed wakefulness in humans but dissipate once the subject falls asleep [29][30][31][32].This pattern of alpha activity has been used for the purpose of sleep scoring EEG data by human sleep scorers, especially to distinguish awake periods from periods of rapid eye movement (REM) sleep [4,33].The alpha oscillations described above, as detected using standard EEG, are most prominent over the occipital cortex.However, intracranial patients rarely have electrodes in the occipital regions; instead, intracranial electrodes are commonly located in the temporal and frontal regions of the brain.Figure 1 displays an example of the difference in alpha oscillations between Wake and REM sleep exhibited in iEEG.
Unlike the other previously mentioned iEEG sleep scoring method (i.e., [28]), the proposed Wake vs. REM classifier is unsupervised (i.e., there is no training), its goal is to identify REM, it is restricted to a single 4 Hz frequency interval (i.e., alpha rhythm), and it uses K-means clustering modified to cluster multiple-channel, multitaper eigencoefficients.Similar to the sleep scoring method presented in ref. [28], the classifier proposed here requires only one night's worth of data.Applying the method introduced in ref. [28] to our data yields hypnograms that agree less with those identified by human sleep scorers than those depicted in Figure 4 in ref. [28].This is the case despite trying all the software options provided in ref. [35] and selecting the analyzed channels using three different methods.Specifically, (i) selecting all available channels (this is the default "BEA" option in ref. [35]), (ii) selecting only the subset of iEEG channels used to compute the results in this paper, or (iii) selecting those channels that exhibit fewer than one ictal discharge per minute using the recommended ictal event detector [36].These last two methods use the "EEA" option specified in ref. [35].Since we have not attained the performance reported in ref. [28], we leave further work directly comparing our method with those introduced in ref. [28] for a future study.
Numerous automatic EEG sleep scoring methods are reported in the literature [37,38].A typical sleep scoring classifier involves feature selection, often through the use of clustering, followed by sleep stage classifier training.K-means clustering of EEG frequency domain features is reported in ref. [39][40][41][42].In refs.[39,40], K-means clustering of reduced EEG spectral features is used to obtain feature weights to use with a K-nearest neighbor classifier and a decision tree classifier for the purpose of scoring sleep data.In ref. [42], K-means clustering is applied to the spectrogram of the Cz EEG channel for semi-automated sleepscoring.In ref. [41], feature vectors containing EEG power spectra evaluated between the frequencies of 0.5 and 20 Hz are clustered to demonstrate subject-specific differences.Our method differs from these methods in that (i) its purpose is to identify REM from iEEG recordings, (ii) it is focused upon a single interval of frequencies, (iii) it is completely unsupervised and does not train a classifier, nor (iv) does it reduce the number of features in a feature-reduction step, (v) it uses aspects of the multitaper method of spectral time-series analysis to construct multi-channel alpha rhythm feature vectors, and (vi) the presented K-means clustering algorithm is updated from the original K-means algorithm to make use of the large-sample distribution of the resulting multitaper, multi-channel, alpha rhythm features.

Data Collection
Data were collected from nine patients (6 females and 3 males) between the age of 19 and 55 years (mean 33.11 ± 13.99(SD) years) undergoing invasive monitoring using intracranial depth electrodes for intractable epilepsy (see Table 1).The study from which data was collected was approved by the Carilion Clinic IRB, IRB-20-1179, 8 February 2021, and was conducted in accordance with the Declaration of Helsinki.Patients signed informed consent prior to participating in the study.Two to sixteen depth electrodes, placed according to clinical criteria (see Table 2), each with 6 to 14 regularly-spaced macro contacts, were implanted in each subject, and the electric potential difference between each contact and the reference contact (a subdermal electrode contact placed near Fz), were sampled at 2 kHz.For each subject, recording started between 8 PM and midnight, depending on an estimated bedtime.EEG data from the raw Neuralynx files was converted to EDF files in blocks of 4 h periods.For each subject, between 6 and 12 h of sleep/wake data were recorded and considered for analysis.Electrophysiological readings from Ad-Tech electrodes are collected at 2 kHz with a 256-channel ATLAS digital acquisition system, (Neuralynx, Bozeman, MT), which was connected to a PC running Neuralynx, Pegasus Acquisition Software, version 2.1.1.

Electrode Localization
The locations of depth electrode contacts are determined from preoperative highresolution magnetic resonance imaging (MRI) scans along with postoperative CT scans.For each subject, the Freesurfer program [43,44] is used to generate 3D RAS coordinates and Desikan-Killiany anatomical labels [45] for the electrode contacts, and then the postoperative CT scan is coregistered with these coordinates using the IELU pipeline and a MATLAB script [46,47] to obtain anatomical locations.

Manual Sleep Scoring
For the purposes of comparison, manual sleep scoring is conducted independently by two experienced sleep scorers using criteria specified by the American Association of Sleep Medicine for sleep scoring [4] using synchronized EEG (EEG is collected from a centrally located subdermal strip), iEEG, EMG, and EOG signals.All recordings are collected using the ATLAS recording system.Periods identified as REM sleep are further verified by two additional sleep scorers.An epoch is labeled REM if low-amplitude mixedfrequency activity is observed along with rapid eye movements and a low chin EMG tone.Of the subjects examined, only subject 7 displays unusual electrophysiological phenomena.Though epochs for subject 7 labeled as NREM or REM exhibit many features of their respective sleep stages, all sleep stages also display unusual activity.For instance, during periods of low EMG tone accompanied by conjugate eye movements that would otherwise be labeled REM, high voltage, low frequency activity (but not low enough frequency to be N3) is often observed on some channels.This behavior is not typical for any of the other subjects.Subject 7 is included for completeness and also as an example of how the proposed REM/Wake classifier performs on unusual activity.

Data Selection
Analysis is restricted to at least one seizure-free night of sleep where the subject entered both NREM and REM sleep and to surface contacts on frontal and temporal depth electrodes.These contacts are more likely to be (i) seizure-free, (ii) collected by other iEEG data centers, (iii) may generalize to electrocorticography (ECoG) data, which records activity at the surface level of the brain, and (iv) are most likely to exhibit the neural REM/Wake alpha-rhythm dynamics observed in standard EEG recordings.
Electrode contacts are further removed from analysis if identified by epileptologists to lie in the epileptic focus or to exhibit interictal activity (see Tables 1 and 2).Table 2. Number and placement of implanted electrodes.Each electrode had multiple macro contacts, which are the actual recording surfaces located along the electrode shaft.Therefore, some macro contacts were located closer to the surface of the brain.Note that the macro contacts (6-10 per electrode) were evenly spaced along the electrode shaft.

Subject Number
No

Filtering and Sectioning
To suppress power-line interference, selected data is filtered to remove 60 Hz and harmonic frequencies using a zero-phase FIR notch filter.Filtered data is partitioned into 30 s, non-overlapping sections.Each section is assigned a stage label by the manual sleep scoring procedure (see Section 2.3).

Multitaper Eigencoefficients
Each 30 s section of data is further subdivided into 5 non-overlapping sections, each of which is 6 s in duration.Let c ∈ N equal the electrode-contact index, s ∈ N indicate the subject, e indicate the 30 s episode (or section), m ∈ {1, 2, 3, 4, 5} the subsection, and let t equal the subsection time-index.Further define, d (c,s,e) m,t to be equal to the t th centered sample of the m th subsection of the e th 30 s window for contact c and subject s.Note that the explicit dependence of c, e upon s is suppressed in the notation.Let ν (k) t be the t th sample of the k th -order discrete-prolate spheroidal sequence (DPSS); a DPSS is parameterized by order k, as well as by its length N and bandwidth W. The zero-th order DPSS possesses, among all length N sequences, the maximum signal energy within the frequency interval (−W, W).The k + 1 st order DPSS is, of all sequences orthogonal to ν (0) , maximally energy concentrated within (−W, W) [48].Here we focus on alpha-rhythm, which is classically considered bandlimited to the frequency interval (8 Hz, 12 Hz).For the remainder of this work f are equal to 10 Hz and W is equal to 2 Hz.For sample period ∆, the corresponding dimensionless DPSS time-bandwidth parameter is equal to ∆NW.It is equal to 12 for the chosen ∆N and W. Let F N , equal to 1/2∆, be the Nyquist frequency.The corresponding k th order eigencoefficient, Y e,m,k , evaluated at the frequency f ∈ − f N , f N , is equal to [49], Here, i 2 is equal to −1 and Y (c,s,e) m,k ( f ) ∈ C is a complex-valued number.A simple multitaper spectrum estimate, Š(c,s,e) , evaluated at f , is equal to, ( The parameter K is chosen to be the number of DPSSs that possess signal-energy within the frequency interval (−W, W) exceeding 0.98 (K = 22).The signal-energy of a sequence is equal to its squared 2 norm.The 2 norm of any DPSS is equal to 1.With this specification, out-of-band bias due to spectral leakage is limited by the optimal in-band energy concentration properties of the DPSSs [34,[49][50][51].The resulting K is equal to 22 (K = 2NW − 2).Let N c equal the number of electrode contacts for subject s.Form the matrix of eigencoefficients Y (s,e) ∈ C N c ×110 characterizing alpha rhythm for the e th 30-s recording for subject s.The p th row and q th column of Y (s,e) is equal to, such that q = 22 m q − 1 + k q and subject to the restriction m q ∈ 1, . . ., 5, k q ∈ 1, . . ., 22.

Multitaper Eigencoefficient Clustering
Cluster labels are assigned in a K-means clustering algorithm modified to cluster the multitaper eigencoefficients characterizing each 30 s recording episode.Specifically, the distance introduced in [52,53] is replaced with the cluster-conditioned probability density, P(Y | C = j), for cluster j.Furthermore, the probability density of observing the eigencoefficient feature matrix Y, conditioned upon cluster j is, Here, N r is equal to 110.It is equal to the number of subsections (i.e., 5) multiplied by the number of DPSS tapers, K (i.e., 22).The real-valued matrices Y r (Y i ) are the real (imaginary) components of the eigencoefficient feature matrix Y, μ(j) ∈ C N c is the j th cluster-conditional sample average, and Ř(j) ∈ C N c ×N c is the j th cluster-conditional sample covariance matrix.Let the j th element of the set of labels L j ∈ {1, . . . ,N cluster } N episode indicate the cluster to which the eigencoeffiecient feature-vector Y (s,e) is assigned.Here, N cluster is the number of clusters and N episode is the number of 30 s epochs.For subject s.Let Q j be the number of episodes assigned to cluster j.Furthermore, for cluster j, the cluster-specific mean-vector estimate μ(j) is equal to, Y (s,e)   .,jN r Q j . ( Let the convenience variable Ỹ(s,e,j) equal Ỹ(s,e,j) = Y (s,e) − μ(j) 1 T Nr .
Here, 1 Nr is a column-vector of Nr elements, each of which equals 1.The real part, Ř(j) r , of the cluster specific covariance matrix estimate Ř(j) , is equal to, Similarly, the imaginary part, Ř(j) i , is equal to, Ř(j The cluster-conditional probability density function, Equation ( 4), is motivated by the asymptotic distribution of the tapered discrete-time Fourier transform of a multivariate, weak-sense stationary, random process (see, for example, Theorem 4.4.2,[54]), as well as the approximate independence of the multitaper eigencoefficients due to the orthogonality of the DPSSs [49].The steps in the modified K-means algorithm are as follows: 1.
Specify the number of clusters.2.
The initial observation labels are randomly guessed (as recommended in ref. [55]), and the cluster-specific parameters µ and R are estimated for each cluster using Equations ( 5), ( 7) and ( 8), to obtain μ, Ř for each cluster.
Steps 3 and 4 are repeated until an iteration occurs without a resulting change in the feature-matrix label.Once this occurs, subsequent iterations result in no change, and the algorithm has converged.
As previously stated, this iterative procedure is a variation on standard K-means clustering and is related to the K-means clustering algorithms discussed in [52,53,55].For our data, by iteration 25, no changes to the eigencoefficient feature-matrix labels occur between one iteration and the next.The procedure consisting of steps 1 through to 5 above produces a single clustering.It remains to determine the number of clusters to use.Upon convergence, an approximate Akaike information criterion A IC is computed.It is equal to, with, as before, N c equal to the number of electrode contacts.As with the Akaike information criterion, for the purpose of model selection, models, or labelings, which minimize A IC promote models that result in probable labeling and penalize models with many clusters.Smaller values of AIC imply a lower Kullback-Liebler divergence between the unknown data-generating probability density function and the modeled generating probability density function [56,57].In this work, steps 1 through 5 above are computed, starting with N cluster equal to 2 through 14.The chosen clustering, Ñcontact , minimizes the A IC specified in Equation (9).That is, Ñcontact = arg min n ∈{2,...,14} A IC(n ) .(10)

REM Cluster
The goal of this work is to reliably identify a subset of REM episodes.This is accomplished once cluster labels have been assigned to the episode feature matrices, Y (s,e) , e = 1, . . ., N episode , by identifying, in a sense to be made precise below, the cluster exhibiting the lowest alpha spectral power.Let C j be the set of episodes assigned to cluster j.
A cluster-conditional multitaper estimate of alpha-power for contact c and subject s is equal to, Š(c,s) Next, for each contact, c, use Š(c,s) j , j = 1, . . ., N cluster , to rank the clusters from smallest Š(c,s) j to largest.Specifically, let the rank r The rank, r Finally, let the REM cluster index j (s) * for subject s equal, The episodes belonging to C j * are identified as REM sleep.If j (s) * is integer.In the situation where j * is not integer-valued the analysis is deemed unreliable.For all data analyzed in this work, j (s) * is integer.

Measures of Label Confidence
Two proxy measures of REM label identification specificity are computed.Let j 1,c be equal to, and let j 2,c equal, Then, the minimum, per-contact, alpha-power difference, dα c is equal to, The first figure of merit, FOM 1 , is equal to, Smaller FOM 1 indicates that, for at least one electrode contact, the difference in alpha power between the two clusters exhibiting lowest alpha power is small.It is listed in row 4 of Table 3 and is least for the two worst-performing subjects.A second figure of merit, FOM 2 , is equal to the fraction of the contacts, c, in cluster j * (i.e., the REM cluster), for which Š(c,s) is not minimum (see row 5, Table 3).

Hold-Out Analysis
The specificity of the proposed classifier depends on the sample size.Figure 2 shows the specificity of the proposed classifier (see Section 2.8) on random data splits.Data is randomly divided into two equally sized sets of data (manually identified REM and Wake episodes are each randomly split apart from each other and then combined), and the classifier is applied to each set.This random splitting and classification is repeated 50 times.The resulting REM identification specificity is plotted in Figure 2.  3).
Dependence upon sample size is further quantified by repeating the above 50 random data splits, but in this case, changing the split to data divisions of varying size.The resulting boxplots of REM identification specificity are plotted in Figure 3.  3), half of a night's worth of REM and Wake episodes are required.

Comparison with the Alpha Power Detector
The proposed unsupervised classifier introduced in Section 2.8 is an atypical classifier.It is insightful to compare this classifier with an alpha-power binary hypothesis test.To do this, the statistical power of the proposed classifier with the alpha-power test is matched by setting the latter test's threshold such that the number of REM episodes identified by the two different methods is equal.The number of false positives resulting from classification using the two methods is compared in Figure 4. Of the subjects exhibiting normal sleep, the proposed classifier out-performs the equivalent statistical-powered hypothesis test based upon alpha power for five of eight subjects, and performs comparably on the remaining two subjects.For one subject, the specificity of the proposed classifier is 20% larger than the specificity demonstrated by the competing binary-hypothesis test.A matched-power comparison between the proposed classifier and a binary hypothesis test which compares alpha-power to reject a null hypothesis which is associated with REM sleep.The alpha-power threshold of the latter is set such that an equal number of REM episodes are identified by both classifiers.Of the subjects exhibiting normal sleep, the proposed classifier out-performs the equivalent statistical-powered hypothesis test based upon alpha power for six of eight subjects.

Clustering Pseudo-Code
The following pseudo-code illustrates the classification of the REM and Wake sleep stages.Given a night of 30 s, multi-channel, data section,

1.
Divide the data into 30 s, non-overlapping sections of data.

2.
Compute the 110N c eiegencoefficients characterizing each 30 s section of data.

3.
Specify the number of clusters to be equal to 2.

4.
Randomly assign a cluster label to each of the 30 s sections of data.

5.
Compute each of the cluster statistics conditioned upon the cluster labels.6.
For each 30 s section of data, and for each cluster, compute the log-probability of the observed section data conditioned upon each of the cluster labels.7.
Assign to each 30 s section of data the cluster label that maximizes the log-probability of the observed 30 s section eigencoefficients.8.
Repeat steps 4 through 7 until the cluster labels do not change from one repetition to the next.9.
Compute the approximate AIC, Ã IC, for this number of clusters, using all of the log-probabilities.10.Repeat steps 3 through 9 but increment the number of clusters on each repetition until 14 clusters are used, or until an empty cluster results.
Steps 1 through 10 above are used to estimate the cluster labels, which are taken to be the labels associated with the smallest Ã IC.

REM Identification Pseudo-Code
To identify the REM cluster, first perform the clustering estimation described in Section 2.12.Then: 1.
Average the cluster-specific alpha spectral power across 30 s episodes.

2.
For each electrode contact, order the average cluster alpha spectral power across clusters.3.
For each cluster, compute the median of the ranks across the contacts.This is the spectral rank.4.
The REM cluster is the cluster with the smallest spectral rank.

Software Overview
The Pegasus Acquisition Software, Neuralynx, version 2.1.1 is used for data acquisition.Except for the spectrogram depicted in Figure 1, custom MATLAB scripts are used for all spectral analysis, clustering, signal processing, and statistical analysis.These scripts are available upon request.Multitaper spectral analysis dates back to the seminal paper published in 1982 [49].Software available to perform multitaper analysis is available in the following languages: Fortran, R, C, Python, Julia, and MATLAB [34,[58][59][60][61][62].

Results
Table 3 shows the results of applying the proposed classifier introduced in Section 2.8 to recorded iEEG data.The unsupervised, multitaper, alpha rhythm classifier introduced in Section 2.8 outperforms an equivalently powered alpha-power hypothesis test (see Figure 4 in Section 2.11) and attains specificity comparable to that obtained by experienced human sleep-scorers.It requires one-half of a night's worth of REM and Wake episodes of healthy sleep to obtain results comparable to those obtained with a full night of data (see Figure 3 in Section 2.10).Since the time at which these episodes are recorded spans an entire night, for practical purposes, an entire night's worth of data is required.The provided figure of merit, FOM 1 , is lowest for the two worst-performing subjects (see row 3, Table 3), while FOM 2 does not correlate with performance (compare rows 2 and 4 of Table 3).

Discussion
The proposed multitaper, multi-channel iEEG REM detector achieves a specificity greater than or equal to 94% on 6 out of 8 subjects exhibiting normal sleep.The figure-ofmerit metric, FOM 1 , reported in Table 3, and labeled "Minimum alpha power difference between the closest competing cluster with the REM identified cluster" differs by over an order of magnitude between the typical subject and the two worst performing subjects.This figure-of-merit responds to the separation in terms of alpha power between the closest cluster and the REM cluster itself.Subject selection based upon this criterion raises performance to a specificity of greater than or equal to 94% on 6 out of 7 subjects when identifying REM with the proposed method.Finally, it is worth noting that while the precision of the proposed detector is not high, it is sufficient to identify a median (minimum) minutes of REM equal to 28.75 (14.5) min per subject.This is identified from a single night of data when the typical hospital stay of a focal epilepsy patient is 1 to 2 weeks.
The proposed method is motivated by six main statistical signal processing considerations.The first (i) is to use sufficient information in the alpha frequency interval for the purpose of optimizing a binary detector (Wake vs. REM).Naively, this suggests the use of time-frequency distributions such as those employed in ref. [24] to classify sleep.It is important to note that there exists a bias/variance trade when temporally resolving frequency-domain structures.Resolution in time implies less temporal averaging.In many cases, this results in an increase in estimator variance, which in this case increases feature-domain variance, which can obscure clusters.That bias will move cluster centers; however, unless it moves them in an overlapping direction, this does not impact clustering performance for the purpose of detection.Finally, the potential information lost by not using a time-frequency distribution is the specific time within a 30 s episode at which a reproducible time-frequency event occurs.This information, if it exists, may further divide the REM cluster and necessitate an otherwise unnecessary cluster-merging step.Similarly, one might consider incorporating phase-phase and phase-amplitude spectral features (e.g., [51]).We seek to capture known average dynamics that occur in REM and, furthermore, the across-brain region timing of alpha oscillations.This suggests the use of multichannel multitaper spectral statistics [49,64], because (a) a multitaper spectrum estimator minimizes out-of-band bias while controlling the increase in variance associated with data tapering; (b) the multitaper eigencoefficients are concentrated within the frequency interval ( f − W, f + W), are pairwise orthogonal on this interval, and form an approximate basis for this interval (and by symmetry for (− f − W, − f + W), for real-valued time-series).The second statistical signal processing consideration (ii) is to use the likelihood-ratio.It is known by the Neyman-Pearson lemma to provide the optimal binary detector (see, for example, ref. [65]).The third, fourth, and fifth considerations (iii,iv,v) are to allow for subject specific idiosyncracy (exhibited by ref. [41] and enhanced in iEEG due to variable epileptic focii), temporal changes of brain state, and corrupted data, such as that which results from recording during patient motion.These considerations suggest the use of unsupervised learning applied to the 30 s stretches of recording to be sleep-scored.Among the large number of unsupervised learning methods, K-means clustering is classical and is covered in books such as [66].Finally (vi), there is a limited quantity of sleep-scored iEEG data from which to train a classifier.To generalize well, our detector must be simple and motivated by existing scientific and signal processing knowledge.These considerations lead us to select the minimum alpha power cluster as defined in Section 2.8 since Wake, motion artifacts, and other outliers all tend to increase the signal power observed in the alpha frequency interval.
The performance of the proposed method compared with that exhibited by a more simple alpha-power detector (see Figure 4, Section 2.11), suggests the use of the proposed detector.Specifically, the specificity is superior for 4 of 8 subjects, is the same for 1, slightly worse for 2, and exhibits a much greater specificity for one subject.Focusing on the impact of the performance improvement on this latter subject, should this performance improvement generalize, the proposed detector will avoid the expected 20% of incorrectly labeled REM episodes that would occur when using the simple alpha power detector on 1/8 th of all iEEG data.
The bootstrap analysis (see Figure 2) indicates comparable detector performance across random data assignments for all but Subjects 4,8, and 7 (i.e., the worst-performing subjects).For subjectss exhibiting normal sleep, the detector is applied in a hold-out analysis.For these worse-performing subjects, performance on half a night of data is comparable to that attained from a night of data.See Figure 3 and Table 1.The bootstrap and hold-out analyses provide evidence that the specificity exhibited by the proposed multitaper detector will generalize.
This work can be extended by including the proposed REM detector as part of a more general sleep stage classifier.In this context, the sleep-stage transition probabilities may be used to potentially increase performance.The sample size of nine subjects is not large.It will be important to verify the performance of the proposed multitaper, multi-channel REM detector with larger, future iEEG data.Should the performance generalize, it provides a basis upon which REM can be identified in iEEG recordings unaccompanied by EOG and EMG.Identified REM activity can be correlated against, for example, the location of the epileptogenic zone for the purpose of improving the outcome of focal epilepsy surgery.

Conclusions
The proposed unsupervised multitaper classifier correctly identifies 36 ± 6 min of REM in one night of recorded sleep (row 1, Table 3) while incorrectly labeling less than 10% of all labeled REM episodes for all but one subject; human rater reliability is estimated at near 80% [63].This classifier outperforms the equivalent statistical-power classical test.Hold-out analysis indicates that when using one night's worth of iEEG data, an accurate generalization of the method on new data is likely.The introduced multitaper multi-channel alpha-rhythm classifier further paves the way to making a large quantity of otherwise unusable iEEG data available for the purpose of studying sleep.

Figure 1 .
Figure 1.Electrode localization and spectrogram.(A ) A sagittal MRI image showing the location of the outermost cortical contact of a frontal electrode in the left hemisphere for subject 6; the contact is located in the left superior frontal gyrus.( B ) Spectrogram showing the alpha activity pattern detected from this electrode contact during Wake and REM sleep.The black line indicates the beginning of the REM sleep episode.Note that not shown is a large lapse in time-the REM episode occurred much later in the night than the wake period.The frequency (vertical axis) is in units of Hz.The multitaper spectrogram is computed using the Chronux MATLAB software, version 2.10 [34] using a 3 second duration sliding window with 50% overlap, a time-bandwidth parameter of 3 and 5 DPSS tapers.

j
of cluster j, for subject s, to be the median of r (s) j,c across electrode contacts: ρ (s) j = median c = {1,...,N c } r (s) j,c .

Figure 2 .
Figure 2. Performance of the proposed classifier is excellent with the exception of Subjects 4, 8 and 7. Subject 7 exhibits abnormal sleep and can be excluded along with Subject 4 based upon FOM 1 (see Table3).

Fraction 8 Figure 3 .
Figure 3.For subjects 4 and 8, to obtain specificity approaching full-data performance (see row 1 of Table3), half of a night's worth of REM and Wake episodes are required.

FractionFigure 4 .
Figure 4.A matched-power comparison between the proposed classifier and a binary hypothesis test which compares alpha-power to reject a null hypothesis which is associated with REM sleep.The alpha-power threshold of the latter is set such that an equal number of REM episodes are identified by both classifiers.Of the subjects exhibiting normal sleep, the proposed classifier out-performs the equivalent statistical-powered hypothesis test based upon alpha power for six of eight subjects.

Table 1 .
Patient Information.The row for each patient provides demographic information, diagnostic information about their epilepsy, and brain imaging results.

Table 3 .
[63]proposed method identifies REM in 6 out of 8 subjects exhibiting normal sleep with a specificity greater than or equal to 94 %.For the other two subjects, the specificity is 86% and 73%; respectively.It exhibits performance comparable to the inter-rater reliability exhibited by human sleep scorers[63].