Quantitative Evaluation of EEG-Biomarkers for Prediction of Sleep Stages

Electroencephalography (EEG) responds immediately and sensitively to the neurological changes that accompany sleep stages and is considered a computational tool for understanding the association between neurological outcomes and sleep stages. Compared with polysomnography, which relies on multimodal physiological signals, EEG is expected to be an efficient approach for sleep stage prediction outside a highly equipped clinical setting. This study aims to quantify neurological EEG biomarkers and predict five-class sleep stages using sleep EEG data. We investigated the three-channel EEG sleep recordings of 154 individuals (mean age of 53.8 ± 15.4 years) from the Haaglanden Medisch Centrum (HMC, The Hague, The Netherlands) open-access public dataset on PhysioNet. As sleep becomes deeper, the power of the fast-wave alpha, beta, and gamma rhythms decreases and the power of the slow-wave delta and theta oscillations gradually increases. Delta wave power ratios (DAR, DTR, and DTABR) may be considered biomarkers because they attenuate in NREM sleep and subsequently increase in REM sleep. The overall accuracies of the C5.0, Neural Network, and CHAID machine-learning models are 91%, 89%, and 84%, respectively, for multi-class classification of the sleep stages. The EEG-based sleep stage prediction approach is expected to be utilized in a wearable sleep monitoring system.


Introduction
Sleep is a biological activity that occurs spontaneously in humans and influences task performance, physical and mental health, and overall quality of life. Sleep accounts for about one-third of an individual's lifetime. Sleep deprivation is the root cause of insomnia, anxiety, schizophrenia, and other mental illnesses. Moreover, drowsiness, an outcome of sleep deprivation, is a reason for around one-fifth of vehicle accidents and injuries. Sleep is a dynamic phenomenon comprising several phases: wake (W), non-rapid eye movement (NREM) sleep, and rapid eye movement (REM) sleep. NREM sleep is further classified into NREM-1 (N1), NREM-2 (N2), and NREM-3 (N3) [1]. A healthy sleeper goes through multiple NREM and REM cycles throughout the night. The N1 stage occurs when the individual feels sleepy and marks the shift from the awake state. In stage N2, the dynamics of vital signals, such as ocular movements, heart rate, body temperature, and brain activity, start to attenuate. The N3 stage is considered deep sleep or slow-wave sleep.
• EEG biomarkers, consisting of frequency spectral measures for sleep stages, have been identified using statistical analysis.
• Machine-learning models have been developed to classify the neurological states in different sleep stages.
We organized the remainder of this article into four sections. The datasets and the methodology for EEG pre-processing, feature extraction, and statistical and machine-learning analysis are described in Section 2. The results are reported in Section 3, followed by the discussion. Lastly, we state our conclusions in Section 5.

Materials and Methods
To identify the physiological biomarkers of sleep stages and develop a machine-learning-based prediction model to classify the sleep stages, we performed EEG data pre-processing, feature extraction, feature selection, statistical analysis of features, and machine-learning classification (Figure 1). Details about the EEG data processing, statistical analysis, and machine-learning classification methods are presented in the following subsections.

Dataset
We utilized the sleep recordings of the Haaglanden Medisch Centrum (HMC, The Hague, The Netherlands), available as an open-access public dataset on PhysioNet [21,22]. The dataset was collected in 2018 and published on 1 July 2021. It includes whole-night PSG sleep recordings of 154 people (88 male, 66 female) with a mean age of 53.8 ± 15.4 years. Patient recordings were chosen at random and represent a diverse group of people who were referred for PSG examinations in the context of various sleep disorders. All signals were captured at 256 Hz using Ag/AgCl electrodes on SOMNOscreen PSG, PSG+, and EEG 10-20 recorders (SOMNOmedics, Randersacker, Germany). Each recording consists of four-channel EEG (F4/M1, C4/M1, O2/M1, and C3/M2), two-channel EOG (E1/M2 and E2/M2), one-channel bipolar chin EMG, and one-channel ECG. The recordings also contain sleep scoring, with each 30 s epoch labeled W, N1, N2, N3, or R. Sleep stages were scored manually by well-trained sleep technicians following the AASM guidelines [1]. In this study, we used three EEG channels (F4, C4, and O2), named according to the international 10-20 EEG system.

Pre-Processing
The EEG signal was filtered to remove 60 Hz AC noise from the nearby electrical grid. Eye-blink and muscle artifacts were identified using the EOG and EMG recordings, and independent component analysis (ICA) with the FastICA method [23] was then used to eliminate these ocular and muscle artifacts from the EEG signal. Low-frequency motion artifacts are produced by head movement and by sensor movement close to the skin. A signal-to-noise ratio (SNR) was obtained for each signal by calculating the power ratio of the movement-affected EEG signal and the undisturbed measurement [24]. A band-pass filter was used to restrict the EEG waveform to the frequency range of 0.5-44 Hz. The pre-processing and feature extraction of EEG data were carried out using the AcqKnowledge version 5.0 program (Biopac Systems Inc., Goleta, CA, USA).
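The filtering steps described above were performed in AcqKnowledge; for readers working outside that program, a minimal sketch in Python with SciPy might look as follows. The function name `preprocess_eeg`, the notch quality factor, and the Butterworth filter order are assumptions chosen for illustration and are not taken from the study. Artifact removal with ICA is not shown.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 256.0  # sampling rate in Hz, as stated for the HMC recordings


def preprocess_eeg(x, fs=FS, notch_hz=60.0, band=(0.5, 44.0)):
    """Suppress mains interference, then band-pass the signal to 0.5-44 Hz.

    Hypothetical helper: Q = 30 and the 4th-order Butterworth design
    are assumed settings, not values reported in the paper.
    """
    # Narrow notch centred on the mains frequency.
    b, a = iirnotch(notch_hz, Q=30.0, fs=fs)
    x = filtfilt(b, a, x)
    # Zero-phase band-pass in second-order sections for numerical stability.
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

Zero-phase filtering (`filtfilt`, `sosfiltfilt`) is used here so that the filtered waveform is not shifted in time relative to the 30 s scoring epochs.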

Feature Extraction
EEG can be described in terms of frequency and power within different frequency bands. The delta (δ) band spans 0.5-4.0 Hz, the theta (θ) band 4.0-8.0 Hz, the alpha (α) band 8.0-13.0 Hz, the beta (β) band 13.0-30.0 Hz, and the gamma (γ) band 30.0-44.0 Hz [25,26]. Frequency analysis was performed on the artifact-free EEG signals using the fast Fourier transform (FFT): the power spectral density (PSD) of each time epoch was estimated with the Welch periodogram technique [27] with 10% Hamming windowing, and the absolute power was extracted in each of the five bands. The epoch width was 30 s. For each epoch, the mean power, median frequency, mean frequency, spectral edge, and peak frequency were extracted from the PSD. The extracted EEG features are summarized in Table 1; the EEG dataset contains a total of 89 sets of EEG features.
The mean power was defined as the average power of the power spectrum within the epoch. The median frequency was defined as the frequency at which half of the total power in the epoch is attained. The mean frequency was defined as the frequency at which the epoch's average power is obtained. The spectral edge was defined as the frequency below which 90% of the total power inside the epoch is attained. The peak frequency was defined as the frequency at which the maximum power occurs within the epoch. To normalize the amplitudes of distinct EEG bands, relative power (RP) was computed as the ratio of each band's power to the total power of all bands. All band power features were calculated for every 30 s epoch.
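The feature definitions above can be illustrated with a short Python sketch based on `scipy.signal.welch`. This is not the AcqKnowledge implementation used in the study: the 4 s segment length, the 50% overlap, the power-weighted definition of the mean frequency, and the names `BANDS` and `epoch_features` are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz, as listed in the text.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 44.0)}


def epoch_features(epoch, fs=256.0):
    """Spectral features of one 30 s epoch from a Welch PSD estimate."""
    # Assumed settings: 4 s Hamming segments with 50% overlap.
    nper = int(4 * fs)
    f, psd = welch(epoch, fs=fs, window="hamming",
                   nperseg=nper, noverlap=nper // 2)
    keep = (f >= 0.5) & (f <= 44.0)
    f, psd = f[keep], psd[keep]
    df = f[1] - f[0]

    feats = {}
    # Absolute band power: PSD summed over the band times the bin width.
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f < hi)
        feats[f"{name}_abs"] = psd[mask].sum() * df
    # Relative power: each band's power over the total of all bands.
    total = sum(feats[f"{name}_abs"] for name in BANDS)
    for name in BANDS:
        feats[f"{name}_rel"] = feats[f"{name}_abs"] / total

    cum = np.cumsum(psd) / psd.sum()
    feats["mean_power"] = psd.mean()
    feats["median_freq"] = f[np.searchsorted(cum, 0.5)]       # 50% of power
    feats["spectral_edge_90"] = f[np.searchsorted(cum, 0.9)]  # 90% of power
    feats["mean_freq"] = (f * psd).sum() / psd.sum()  # assumed definition
    feats["peak_freq"] = f[np.argmax(psd)]
    return feats
```

Applied to each 30 s epoch of each channel, a function of this kind yields one feature row per epoch, which can then be paired with the technician's stage label.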
The spectral power density of an EEG time-series signal x(t) at frequency j is defined in Equation (1), where x̂_t(j) is the Fourier transform of x(t) at frequency j (in Hz), computed using the Welch periodogram. The relative power of each EEG band is defined in Equation (2).
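As a sketch consistent with the definitions in this subsection, and not necessarily in the authors' exact notation, Equations (1) and (2) can be written as below. The epoch duration T, the band index b, and the band power symbol P_b are assumptions introduced here for illustration.

```latex
% Assumed form: periodogram-type PSD estimate over an epoch of duration T
P(j) = \frac{1}{T}\,\bigl|\hat{x}_t(j)\bigr|^{2} \tag{1}

% Relative power of band b, with b' running over delta, theta, alpha, beta, gamma
P_b = \sum_{j \in b} P(j), \qquad
RP_b = \frac{P_b}{\sum_{b'} P_{b'}} \tag{2}
```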

DAR, DTR, and DTABR
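The delta-alpha ratio (DAR), defined as the ratio of delta band power to alpha band power, was calculated according to Equation (3). The delta-theta ratio (DTR) was defined as the ratio of delta band power to theta band power and computed according to Equation (4). Equation (5) defines the (Delta + Theta)/(Alpha + Beta) ratio (DTABR), the ratio of the summed slow-wave (delta and theta rhythm) power to the summed fast-wave (alpha and beta rhythm) power [29]. Writing P_δ, P_θ, P_α, and P_β for the power in the delta, theta, alpha, and beta bands, respectively:
$$\mathrm{DAR} = \frac{P_\delta}{P_\alpha} \tag{3}$$
$$\mathrm{DTR} = \frac{P_\delta}{P_\theta} \tag{4}$$
$$\mathrm{DTABR} = \frac{P_\delta + P_\theta}{P_\alpha + P_\beta} \tag{5}$$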
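For completeness, the three ratios can be computed directly from band powers as in the following small Python sketch. The function name `delta_ratios` is hypothetical, and the inputs may be scalars or NumPy arrays holding one value per epoch.

```python
import numpy as np


def delta_ratios(delta, theta, alpha, beta):
    """Return DAR, DTR, and DTABR from band powers (scalars or arrays)."""
    delta, theta, alpha, beta = map(np.asarray, (delta, theta, alpha, beta))
    return {
        "DAR": delta / alpha,                        # delta / alpha
        "DTR": delta / theta,                        # delta / theta
        "DTABR": (delta + theta) / (alpha + beta),   # slow / fast power
    }
```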

Features Selection
Feature selection greatly reduces the time and memory required for data processing, enabling machine-learning algorithms to focus on only the most important features. F-statistics [30] were used to determine the relevance of each feature on a scale ranging from zero to one. In the first step, we eliminated any features that had constant or missing values. We then performed a one-way ANOVA F-test for each continuous predictor and used the resulting p-value (probability) to identify the most contributing features. The significance of each feature was measured by its effectiveness in independently predicting the target class. In this study, features with a feature importance (1 − p) of more than 95% were selected, where p is the p-value of the F-test.
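The selection rule can be sketched in Python with scikit-learn's `f_classif`, which runs a one-way ANOVA F-test per feature. The study itself used IBM SPSS Modeler, so this is only an approximation of that workflow, and the function name `select_by_f_test` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def select_by_f_test(X, y, threshold=0.95):
    """Keep features whose importance (1 - p) exceeds the threshold.

    Returns the selected column indices and their importance values.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: discard features with missing or constant values.
    usable = ~np.isnan(X).any(axis=0) & (np.nanstd(X, axis=0) > 0)
    idx = np.flatnonzero(usable)
    # Step 2: one-way ANOVA F-test of each remaining feature against the stage.
    _, p = f_classif(X[:, idx], y)
    importance = 1.0 - p
    keep = importance > threshold
    return idx[keep], importance[keep]
```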

Classification Algorithms
Machine-learning algorithms were used to classify neurological features during wakefulness and stages N1, N2, N3, and R. Of the data for the selected EEG features, 80% were used for training and 20% were used for testing the classification algorithms. The Neural Network, CHAID, and C5.0 models were used to distinguish the neurological features of the sleep stages. As the N1 stage dataset is smaller than the other sleep stage datasets, we implemented the "class weighting" technique [31], heavily weighting the N1 stage and under-weighting the majority classes to deal with the imbalanced dataset.
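A minimal sketch of the 80/20 partition and of class weighting, using scikit-learn, is given below. The stratified split and the "balanced" weighting formula (weights inversely proportional to class frequency) are assumptions; the paper does not report the exact weights it used, and `split_and_weight` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight


def split_and_weight(X, y, seed=0):
    """80/20 split plus per-class weights for the training labels."""
    # Stratification (an assumption) keeps stage proportions equal in both parts.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    classes = np.unique(y_tr)
    # "balanced": n_samples / (n_classes * class_count), so rare stages
    # such as N1 receive the largest weight.
    weights = compute_class_weight(class_weight="balanced",
                                   classes=classes, y=y_tr)
    return X_tr, X_te, y_tr, y_te, dict(zip(classes, weights))
```

The resulting dictionary can be passed to classifiers that accept a `class_weight` argument, or converted to per-sample weights for those that do not.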

The Neural Network Model
The neural network is a data analysis technique that makes predictions using a complex multi-layered network of interconnected nodes [32]. In this research, we employed a multilayer perceptron (MLP) neural network. This model is capable of approximating a broad variety of analytical models with minimal requirements on the model structure and assumptions. It comprises an input layer with multiple input nodes, one or more hidden layers, and an output layer.
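An MLP of this general shape can be sketched with scikit-learn as follows. The single hidden layer of 32 units, the feature standardization step, and the iteration limit are assumptions made for the example; the architecture configured in IBM SPSS Modeler for the study is not reported in this section.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_mlp(hidden=(32,), seed=0):
    """Input layer -> hidden layer(s) -> softmax output over the sleep stages."""
    return make_pipeline(
        StandardScaler(),  # MLPs train more reliably on standardized features
        MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                      random_state=seed),
    )
```

Note that `MLPClassifier` has no `class_weight` parameter, so class imbalance would have to be handled separately, for example by resampling the training set.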

Chi-Squared Automatic Interaction Detector (CHAID) Model
The chi-squared automatic interaction detector (CHAID) method creates a decision tree by incrementally splitting a subset into two or more child nodes, starting with the entire dataset [33]. At each node, pairs of predictor categories whose target distributions do not differ significantly according to a chi-squared test are merged, and merging continues until all remaining categories differ significantly; the predictor that yields the most significant partition is then used for the split. As a decision tree model, the output of the CHAID model is visually appealing and simple to read in a clinical decision support system. This technique is commonly used in applications involving biological data analysis.
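The category-merging idea can be illustrated with a simplified Python sketch built on `scipy.stats.chi2_contingency`. It treats the predictor as nominal, performs a single merge step, and omits the Bonferroni adjustment and split-selection stages of a full CHAID implementation. The names `pair_p_value` and `merge_step` and the threshold `alpha_merge` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency


def pair_p_value(cat, y, a, b):
    """Chi-squared p-value comparing the target distribution in categories a and b."""
    mask = np.isin(cat, [a, b])
    classes = np.unique(y[mask])  # only classes present, so no empty columns
    table = np.array([[np.sum((cat == c) & (y == k)) for k in classes]
                      for c in (a, b)])
    return chi2_contingency(table)[1]


def merge_step(cat, y, alpha_merge=0.05):
    """Merge the least-different pair of categories if it is not significant."""
    cats = list(np.unique(cat))
    pairs = [(pair_p_value(cat, y, a, b), a, b)
             for i, a in enumerate(cats) for b in cats[i + 1:]]
    p, a, b = max(pairs)  # the pair with the largest p-value is the most similar
    if p > alpha_merge:
        cat = np.where(cat == b, a, cat)  # relabel b as a
    return cat, p
```

Repeating `merge_step` until the returned p-value falls below the threshold leaves only categories that differ significantly with respect to the sleep stage.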

C5.0 Model
The C5.0 model is a supervised data analysis method that attempts to construct decision trees or rule sets [34]. This model partitions the data according to the field with the greatest gain ratio. The model constructs a decision tree which is then pruned to reduce the tree's estimated error rate. This model requires little training time and is resilient to missing data and a large number of input variables.
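The gain ratio criterion mentioned above can be made concrete with a small Python sketch. C5.0 itself is a proprietary algorithm, so the code below shows the textbook gain ratio (information gain divided by split information) for a binary threshold split, not the vendor's implementation. The function names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gain_ratio(feature, labels, threshold):
    """Gain ratio of splitting `feature` at `threshold` into two branches."""
    left = feature <= threshold
    n, n_left = len(labels), int(left.sum())
    if n_left == 0 or n_left == n:
        return 0.0  # degenerate split: all samples fall on one side
    w = np.array([n_left, n - n_left]) / n
    # Information gain: parent entropy minus weighted child entropy.
    gain = entropy(labels) - (w[0] * entropy(labels[left])
                              + w[1] * entropy(labels[~left]))
    # Split information penalizes very uneven partitions.
    split_info = float(-(w * np.log2(w)).sum())
    return gain / split_info
```

A tree builder would evaluate this quantity for every candidate feature and threshold and split on the one with the largest value.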

Data Analysis
This study employed descriptive statistics to compare the participants' demographic data. The characteristics of the EEG spectral features are shown in bar charts with error bars, which represent the mean value of the data along with the respective 95% confidence interval (CI). Methods of statistical analysis consisted of descriptive statistics and hypothesis tests. The independent-samples t-test was used to compare EEG features among sleep stages. A p-value of less than 0.05 was considered statistically significant. Statistical analyses were performed using SPSS 26 software (IBM, Armonk, NY, USA). For the classification of sleep stages, we utilized state-of-the-art machine-learning methods. The EEG feature datasets were partitioned into training and testing datasets. We trained the machine-learning algorithms on the training dataset to build the classification models, which were then used for prediction on the EEG testing dataset. To reduce overfitting, we used non-exhaustive k-fold (k = 10) cross-validation on the training dataset. For the machine-learning evaluations, we utilized IBM SPSS Modeler 18 software (IBM, Armonk, NY, USA).
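The 10-fold cross-validation step can be sketched in Python with scikit-learn as follows. Because scikit-learn has no C5.0 or CHAID implementation, an entropy-based `DecisionTreeClassifier` is swapped in here as a stand-in; the stratified, shuffled folds and the function name `cv_accuracy` are likewise assumptions.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def cv_accuracy(X, y, k=10, seed=0):
    """Per-fold accuracy of a decision tree under k-fold cross-validation."""
    # Stand-in classifier (not C5.0): entropy splits with balanced class weights.
    clf = DecisionTreeClassifier(criterion="entropy", class_weight="balanced",
                                 random_state=seed)
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
```

The spread of the ten fold scores gives a rough indication of how stable the model is before it is evaluated once on the held-out testing dataset.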

EEG Biomarkers for Sleep Stages
The EEG waveform varied during sleep with the change in sleep stages. Figure 2 shows bar charts, with error bars indicating the 95% confidence interval (CI), of the EEG features of the frequency bands during sleep stages W, N1, N2, N3, and R. The global data indicate the average of the features over the frontal, central, and occipital lobes. The horizontal bars (brown color) are the outcomes of the hypothesis tests and indicate significant differences (p < 0.05) in EEG features among the sleep stages.
Alpha was highest in the wake stage and lowest in the N3 or deep sleep stage in all cortical positions. Alpha gradually weakens as sleep becomes deeper. In the REM sleep stage, the alpha wave again gains strength. Beta was also dominant in the wake stage and lowest in the N3 or deep sleep stage in all cortical positions. Beta gradually becomes dormant as sleep propagates from light sleep to a deep sleep state. In the REM sleep stage, the beta wave again increases.
Theta was highest in the REM stage and lowest in the N3 or deep sleep stage in all cortical positions. Theta increases in light sleep. In the REM sleep stage, the theta wave again gains strength. Delta was highest in deep sleep or N3 stage and lowest in the wake stage in the frontal and occipital cortical positions. An exception is observed only in the central lobe. In the central lobe, delta was highest in the wake stage and sharply went down in the N1 and N2 stages. Delta again gradually increased as sleep became deeper and was highest in REM sleep in the central cortex. Gamma was highest in the wake stage.
Delta power ratios, such as DAR and DTR, were explored during sleep stages W, N1, N2, N3, and R (Table 3). Figure 3 shows bar charts, with error bars indicating the 95% confidence interval, of DAR, DTR, and DTABR in the sleep stages. The global delta ratio parameters (DAR, DTR, and DTABR) were dominant in the wake and N1 stages and decreased sharply in the N2 and N3 stages. In the REM sleep stage, DAR, DTR, and DTABR increased again compared with the deep sleep N3 stage.
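The stage-wise comparisons reported in this section (means with 95% confidence intervals, and independent-samples t-tests at p < 0.05) were produced in SPSS. A rough Python equivalent with SciPy is sketched below; the equal-variance form of the t-test is an assumption, and `mean_ci95` and `compare_stages` are hypothetical helper names.

```python
import numpy as np
from scipy import stats


def mean_ci95(x):
    """Mean and 95% confidence interval based on the t distribution."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    half = stats.t.ppf(0.975, x.size - 1) * stats.sem(x)
    return m, m - half, m + half


def compare_stages(a, b, alpha=0.05):
    """Independent-samples t-test of one feature between two sleep stages."""
    res = stats.ttest_ind(a, b)  # equal variances assumed by default
    return res.statistic, res.pvalue, bool(res.pvalue < alpha)
```

Here `a` and `b` would hold one feature, such as relative alpha power, for all epochs of two different stages.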

Machine Learning Analysis
Machine-learning algorithms were utilized to predict the physiological states of the various sleep stages. The machine-learning analysis comprises three steps: feature selection, model training, and model testing (or validation). During the feature selection process, F-statistics were used to assess the relevance of the sleep EEG features. EEG features with a feature importance (1 − p) larger than 0.95 were selected for further classification investigation. The confusion matrix, also known as the error matrix, summarizes the prediction outcomes for all target classes. Other performance parameters, including accuracy, sensitivity, and precision, are computed from the confusion matrix. Accuracy was defined as the ratio of correct predictions to total observations and was regarded as the most intuitive performance metric for identifying the optimal model. The following standard formulas are used to estimate the performance evaluation metrics:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where TP stands for the true positive, TN means the true negative, FP stands for the false positive, and FN means the false negative.
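For a five-class problem, these quantities are usually obtained per stage in a one-vs-rest fashion from the multi-class confusion matrix. The Python sketch below shows one way to do this, assuming rows hold the true stage and columns the predicted stage; the function name `per_class_metrics` is hypothetical, and the negative predictive value is included because it is reported alongside the other metrics.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest metrics per class from a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true class missed
    fp = cm.sum(axis=0) - tp          # other classes predicted as this one
    tn = cm.sum() - tp - fn - fp
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),        # negative predictive value
        "overall_accuracy": tp.sum() / cm.sum(),
    }
```

Note that the per-class accuracy counts true negatives, so it can be high for a rare stage even when that stage's sensitivity is low.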

Multi-Class Classification of Sleep Stages
We utilized the machine-learning algorithms for the multi-class classification of the sleep stages W, N1, N2, N3, and R. The confusion matrices of the three machine-learning models (C5.0, Neural Network, and CHAID) are shown in Tables 4-6 as the outcomes of prediction performance for the sleep stages. The performance of the three models in classifying the sleep stages using the training and testing datasets of EEG features is shown in Figure 4.
For the C5.0 model, the wake stage was classified with the highest precision for training (86%) and testing (77%). Furthermore, the negative predictive value of the C5.0 model was highest in the wake stage for training (98%) and testing (96%).

The Neural Network model showed 89% accuracy on both the training and the testing datasets for multi-class classification of sleep stages (Table 8). The N3, REM, and W stages were the most accurately classified, with accuracies of 91%, 91%, and 92%, respectively, for both training and testing. The wake stage was classified with the highest sensitivity for training (86%) and testing (86%). The sensitivity of the Neural Network model was lowest for the N1 stage. Moreover, the wake stage was classified with the highest precision for training (75%) and testing (76%). Furthermore, the negative predictive value of the Neural Network model was highest in the wake stage for training (97%) and testing (97%).
The CHAID model showed 84% accuracy on both the training and the testing datasets for multi-class classification of sleep stages (Table 9). The W stage was the most accurately classified, with an accuracy of 90% for both training and testing. The wake stage was classified with the highest sensitivity for training (72%) and testing (71%). The sensitivity of the CHAID model was lowest for the N1 stage. Moreover, the wake stage was classified with the highest precision for training (73%) and testing (73%). Furthermore, the negative predictive value of the CHAID model was highest in the wake stage for training (94%) and testing (94%).

Discussion
In our study, we characterized the neurological changes across sleep stages and classified the sleep stages using three EEG channels located over the frontal (F4), central (C4), and occipital (O2) lobes of a diverse group of adults. The extent of neurological change depends on the individual's sleep pattern, the dynamics of sleep stage transitions, and the individual's overall lifestyle. We evaluated the neurological biomarkers through EEG in every sleep stage. Patient recordings were randomly chosen and reflected a broad sample of individuals referred for PSG exams for a variety of sleep disorders. Sleep is classified as REM or NREM sleep, and stages N1, N2, and N3 correspond to NREM sleep. Different sleep phases must be characterized and classified to identify sleep-related diseases. For instance, detecting REM sleep is essential for diagnosing REM sleep behavior disorder, and classification of wake and sleep states is required for sleep monitoring. This study addresses these demands by classifying the W, N1, N2, N3, and REM stages.
The alpha rhythm, one of the basic features of human EEG, is prominent in the relaxed eyes-closed awake state, N1, and REM sleep [35]. Alpha attenuates during high arousal states. In our study, alpha oscillation is higher in the resting awake state and decreases in the light sleep stage. Alpha activity also increases during REM sleep due to short bursts of alpha rhythm [36]. A similar pattern was observed for beta activity across the sleep stages. Theta rhythm increases in the light sleep (N1 and N2) stages relative to the wake stage and attenuates in the slow-wave sleep (N3) stage. A rise in delta activity was observed in the slow-wave deep-sleep (N3) stage compared with the light sleep stages. The delta wave is considered an indicator of slow-wave deep sleep [37].
The classification rates for the N1 and N2 sleep stages are lower than those for the other stages, and distinguishing these two stages is one of the most challenging tasks in sleep staging. The N2 sleep stage is usually the transition between the light sleep and deep sleep stages [38]. As both N1 and N2 stages are light sleep states, the N2 stage is often mislabeled as N1. Therefore, the automated sleep staging algorithm misclassified it as N1 or N2 [39]. Moreover, gamma rhythms are identical in the light sleep stages (N1 and N2) and REM sleep. This may also lead to the misclassification of the N1, N2, and REM sleep stages. Furthermore, human sleep is a combination of distinct sleep phases with an unequal distribution of sleep epochs.
Table 10 presents a comparison of methodologies and results between the current work and previous machine-learning-based sleep studies. It shows that our proposed approach achieves a notable improvement in prediction performance compared with existing state-of-the-art works on five-class sleep stage classification. Our classification performance is much higher than that of other multi-class classification studies.
We analyzed only three-channel EEG data to understand the neurological changes in EEG due to sleep stages, focusing on a single channel from each of the frontal, central, and occipital lobes. Although a standard sleep study consists of multimodal biosignals, we did not study all EEG channels, in order to keep the automatic sleep stage prediction simple enough for a wearable sleep monitoring system. In the future, we plan to extend this study with multimodal signals to enhance the accuracy of the prediction models.

Conclusions
Prediction of sleep stages is considered an assistive technology in machine-learning-enabled wearable sleep monitoring systems. The neurological biomarkers of sleep stages have been quantified through the EEG signal of polysomnography. In NREM sleep, attenuation of the alpha, beta, and gamma rhythms was observed, as well as a rise of the theta and delta rhythms relative to the awake state, followed by an increase in the alpha and beta rhythms in REM sleep. Delta wave power ratios (DAR, DTR, and DTABR) are expected to be considered biomarkers because they decrease in NREM sleep and subsequently increase in REM sleep. The overall accuracies of the C5.0, Neural Network, and CHAID models are 91%, 89%, and 84%, respectively, in the multi-class classification of the sleep stages. This EEG-based sleep stage prediction technique is a promising candidate for further neuroscience research in a wearable sleep monitoring system.