Sleep State Classification Using Power Spectral Density and Residual Neural Network with Multichannel EEG Signals

Abstract: This paper proposes a classification framework for automatic sleep stage detection in both male and female human subjects by analyzing the electroencephalogram (EEG) data of polysomnography (PSG) recorded for three regions of the human brain, i.e., the pre-frontal, central, and occipital lobes. Without considering any artifact removal approach, the residual neural network (ResNet) architecture is used to automatically learn the distinctive features of different sleep stages from the power spectral density (PSD) of the raw EEG data. The residual block of the ResNet learns the intrinsic features of different sleep stages from the EEG data while avoiding the vanishing gradient problem. The proposed approach is validated using the sleep dataset of the DREAMS database, which comprises EEG signals for 20 healthy human subjects, 16 female and 4 male. Our experimental results demonstrate the effectiveness of the ResNet-based approach in identifying different sleep stages in both female and male subjects compared to state-of-the-art methods, with classification accuracies of 87.8% and 83.7%, respectively.


Introduction
Sleep stage classification can be helpful in identifying sleep disorders such as snoring, insomnia, sleep apnea, sleep deprivation, narcolepsy, sleep hypoventilation, and teeth grinding [1,2]. Sleep disorders are important healthcare concerns, as they are significant contributors to fatigue and drowsiness, especially among drivers, and cause around 10-15% of the total vehicle accidents involving fatalities each year [3,4]. Therefore, in order to improve road safety and reduce the risk to millions of human lives, it is important that we understand the causes of sleep disorders, which involves, above all, the understanding and identification of different sleep stages [3,5]. Human sleep is of two fundamental types: rapid eye movement (REM) sleep and non-rapid eye movement (NREM) sleep. NREM sleep is believed to occur in three stages, i.e., N1, N2, and N3, where each stage progressively turns into deeper sleep. Among these NREM stages, most of our sleep time is spent in the N2 stage [6], whereas REM sleep first starts about 90 minutes after we fall asleep and is mostly associated with dreaming. During a full night's sleep we go through multiple cycles of REM and NREM sleep [6].
Polysomnography is the study of sleep and is usually undertaken to study sleep disorders. It involves the collection of data from different sources using techniques such as electrooculogram (EOG), electrocardiogram (ECG), electroencephalogram (EEG), and electromyogram (EMG). The main contributions of this work are as follows:
1. An automatic sleep stage detection framework is developed for both genders, owing to inherent biological differences between the two that might affect the electrical activities in their brains and, consequently, the recorded EEG data.
2. The proposed framework takes as input multi-channel EEG data recorded at different brain lobes to accurately detect different sleep stages.
3. A ResNet architecture, with eight identity shortcut connections, is used with the PSD of time-domain EEG signals as input to identify different sleep stages. The performance of the proposed framework is compared with eight different approaches and is found to be better.
The rest of this paper is organized as follows: related work is summarized in Section 2, whereas Section 3 describes the details of the dataset and the proposed methodology. Results and discussion are presented in Section 4, whereas Section 5 concludes the paper.

Related Work
Safri et al. [39] highlighted the necessity of sex-based sleep state identification and analyzed the recorded EEG signals of 29 students using partial directed coherence (PDC) and power spectrum estimation (PSE) to investigate gender differences in problem-solving skills. Among other things, their work indicated differences between the two genders in the functional connectivity between brain regions, i.e., in PDC, and in the power distribution of EEG waveforms. Similarly, Chellapa et al. [40] established that gender differences in light sensitivity affect intensity perception, attention, and sleep in humans. Silva et al. [16] showed the existence of specific gender differences in sleep patterns by examining the recorded polysomnographic findings of participants in a sleep laboratory, thereby establishing the need for a gender-specific classification approach in sleep stage detection.
Several studies analyzed the attributes of the non-stationary EEG signals from different brain lobes. Wu et al. [17] proposed a signal processing based approach that combined empirical intrinsic geometry (EIG) and the synchrosqueezing transform (SST) to compute the dynamic attributes of respiratory and EEG signals of central and occipital scalp sources (C3A2, C4A1, O1A2, and O2A1). Gunes et al. [18] proposed a Welch power spectral density based feature analysis with k-means clustering by analyzing EEG from a central (C4-A1) scalp source. Friawan et al. [19] proposed a time-frequency domain-based feature analysis using random forests by analyzing the EEG data from a central scalp source (C3-A1). Chen et al. [20] established a decision support algorithm with symbolic fusion for sleep state classification by analyzing EEG data from central scalp sources (C3-A2 and C4-A1). Abdulla et al. used correlation graphs coupled with an ensemble model to analyze EEG data from the central scalp source (C3-A2) [26], whereas Phan et al. [27] proposed an attention-based recurrent neural network that used the prefrontal channel (Fpz-Cz) EEG data for sleep stage classification. Most of these methods, however, use EEG data from a single channel, which may not prove very robust in sleep stage detection.
Several studies highlighted the importance of considering multichannel EEG analysis to improve classification performance. Jiang et al. [21] demonstrated the relationship between the spatial evidence embedded in multi-channel EEG signals and various sleep stages by utilizing the minimum Riemannian distance (RD) to covariance centers, which improved the classification performance. Similarly, Krauss et al. [11] demonstrated that spatially spread cortical activity, exhibited by EEG amplitudes across different recording channels (prefrontal F4-M1, central C4-M1, and occipital O2-M1), contains all the relevant evidence for separating the sleep stages. Besides, Saper et al. [2] demonstrated that the higher executive functions (e.g., emotional regulation, reasoning, and problem solving) occur frequently in the pre-frontal lobe (Fp2-A1 channel). In contrast, the visual processing functions are seen mostly in the occipital lobe (O2-A1 channel) of the brain [2]. In this study, a classification framework based on multichannel EEG signals (Fp2-A1, Cz2-A1, and O2-A1) is considered.

Methodology
Appl. Sci. 2020, 10, x FOR PEER REVIEW

Figure 1 shows the block diagram of the proposed sleep stage classification framework. First, the power spectral density is calculated for individual channels of the EEG data for both genders. After that, ResNet is trained using the PSD of individual channels as input, to identify different sleep stages based upon input from individual channels that is then combined in subsequent stages. The ResNet is individually trained to identify sleep stages in each gender, and its accuracy is also calculated individually for both genders.


Dataset Description
The proposed approach is validated using a publicly available dataset, i.e., the DREAMS database from the University of MONS-TCTS Laboratory (Stéphanie Devuyst, Thierry Dutoit) and the Université Libre de Bruxelles – CHU de Charleroi Sleep Laboratory (Myriam Kerkhofs) [41]. Details of this dataset are given in Table 1. It contains whole-night polysomnography (PSG) recordings, including EEG signals, of 16 female and 4 male human subjects. The data was recorded using a 32-channel polygraph from subjects aged between 20 and 65 years. The EEG signals were sampled at a frequency of 200 Hz. The sleep stages were annotated by experts through the analysis of microevents in 30-second epochs, using the criteria defined by the AASM. As mentioned earlier, data from 3 EEG channels (i.e., Fp2-A1, Cz2-A1, and O2-A1) is used in this study. Moreover, the dataset appears to be somewhat imbalanced in terms of gender. However, this bias in the data is removed by grouping the subjects based upon their gender, i.e., the proposed methodology considers sleep stage detection individually for both genders; hence, the gender imbalance in the dataset does not affect the classifier's performance. In addition, the same number of samples is selected from the different sleep stages to create a well-balanced dataset. To the best of the authors' knowledge, very few studies that use data-driven techniques have utilized this dataset. Therefore, to establish the robustness of the proposed approach, it is compared with eight different techniques while using the same dataset and experimental setup, i.e., (1) raw EEG + ResNet, (2) PSD + CNN (5 layers [42]), (3) PSD + CNN (10 layers), (4) PSD + CNN (14 layers), (5) PSD + CNN (18 layers), (6) FE + random forest (RF) [43], (7) fast Fourier transform (FFT) + multilayered perceptron (MPC) [44], and (8) PSD + MPC.

Domain Knowledge Extraction by Power Spectral Density
Sleep signals are continuous time series data whose amplitude and frequency content fluctuate over time for every sleep stage. The sheer volume of the highly fluctuating raw EEG data makes it challenging for machine learning algorithms to find a meaningful pattern that can help identify different sleep stages. Hence, we calculate the Welch PSD of the raw EEG data to improve the chances of finding such a pattern [45-48]. The Welch PSD estimates the distribution of energy across the different frequencies of the EEG signal. In Figure 2, the computed Welch PSD is plotted for the five different sleep stages of a female subject to show the differences among the PSDs of different sleep stages. For calculating the PSD, a 30 s epoch is considered as input. Then, the DFT of 50% overlapping segments of nfft points (sampling frequency = 200 Hz, nfft = 100) is used to calculate the spectrum by averaging the segment DFTs. The change in amplitude is observed by converting the amplitude response from dBW/Hz to dBW/bin. For each dataset, the computed PSD response of each sleep state is different from the others, which motivates the use of three-channel EEG data for sleep state identification.
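As a rough illustration of the Welch estimate described above (30 s epochs at 200 Hz, nfft = 100, 50% overlap), the following sketch averages Hann-windowed periodograms in plain NumPy; the input here is a synthetic tone, not actual EEG data, and the function name is ours:

```python
import numpy as np

def welch_psd(x, fs=200, nperseg=100, noverlap=50):
    """Minimal Welch PSD: Hann-windowed, 50%-overlapping segments,
    with the segment periodograms averaged (one-sided spectrum)."""
    window = np.hanning(nperseg)
    scale = 1.0 / (fs * (window ** 2).sum())
    step = nperseg - noverlap
    n_segments = (len(x) - noverlap) // step
    psd = np.zeros(nperseg // 2 + 1)
    for i in range(n_segments):
        seg = x[i * step : i * step + nperseg] * window
        psd += scale * np.abs(np.fft.rfft(seg)) ** 2
    psd /= n_segments
    psd[1:-1] *= 2  # fold negative frequencies into the one-sided estimate
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

# One 30 s "epoch" at 200 Hz, as in the paper's setup
fs = 200
t = np.arange(fs * 30) / fs
eeg_like = np.sin(2 * np.pi * 10 * t)  # synthetic 10 Hz (alpha-band) tone
freqs, psd = welch_psd(eeg_like, fs=fs, nperseg=100, noverlap=50)
```

With nperseg = 100 at 200 Hz, the frequency resolution is 2 Hz, so the estimate for each epoch is a compact 51-bin vector, which is the kind of input pattern handed to the classifier.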


Residual Neural Network
ResNet has emerged as a revolutionary idea in deep learning [28,29]. The main difference between a typical and a residual neural network is the residual connection, i.e., identity or shortcut connection, which makes a significant difference in architecture and performance [49]. Unlike traditional neural networks, where each layer feeds into the next layer (Figure 3a), in a residual neural network, each layer feeds not just into the next layer but also into the layers which are a few hops away [30] (Figure 3b). Thus, it helps to propagate larger gradients to the initial layers through backpropagation, thereby solving the vanishing gradient problem and enabling the training of deeper networks [30,50].

In this work, the calculated PSD is considered as the input to the ResNet. The ResNet architecture in the proposed framework is based on ResNet-18 [30,51-53].
Like the standard CNN [49], this ResNet architecture contains convolution layers, fully connected layers, activation functions, batch normalization (BN) [54], and a feed-forward architecture [31,49]. The weights and biases in the convolutional layers are optimized through the backpropagation of errors during training, whereas non-linear transformation is achieved by using the rectified linear unit (ReLU) activation function [55], given by Equation (2). For minimizing the loss function, the Adamax optimizer [56] is used. The considered ResNet architecture is shown in Figure 4.
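Equation (2), referenced above, did not survive extraction; the standard ReLU definition it presumably refers to is:

```latex
\mathrm{ReLU}(x) = \max(0, x) \qquad (2)
```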
Figure 3. The connection mechanism of the feed-forward layers for neural architectures, i.e., (a) traditional neural networks, and (b) the residual neural network (ResNet) architecture.
In Figure 3b, p is the input of a neural network block, where the network wants to learn the true distribution of the input, denoted as D(p). If the difference (residual) between the input and the input distribution is F(p), then F(p) = D(p) - p, so the block output becomes D(p) = F(p) + p. In other words, the stacked layers learn the residual mapping F(p) rather than the full mapping D(p), and the identity shortcut adds the input p back to the learned residual.
There are two main differences between the original ResNet-18 architecture and the one used in this work: (a) BN is adopted to deal with the internal covariate shift problem, which arises mostly from the non-stationary nature of the PSD input data, and (b) a total of eight identity connections are used throughout the network architecture instead of projection skip connections [53]. These eight identity connections allow the gradients to flow directly through the network architecture without passing through the non-linear activation functions. This suggests that the proposed deep network model should not produce a training error higher than its shallower counterparts. Unfortunately, there is no rule of thumb for selecting the best number of identity connections or the exact number of layers in a deeper network; it always depends on the complexity of the experimental dataset. The dataset utilized in this study is difficult to handle using traditional deep neural networks because of the nature of the PSD patterns obtained from the EEG signals. Therefore, a deeper network architecture, ResNet, is considered [51-53].
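The identity-shortcut mechanism can be sketched as follows. This is a minimal NumPy illustration of a generic residual block, not the paper's actual ResNet-18 layers; the dense weight matrices and their dimensions are placeholders:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(p, w1, w2):
    """Identity-shortcut block: the two weight layers learn the residual
    F(p) = D(p) - p, and the shortcut adds p back before the final ReLU."""
    f = relu(p @ w1) @ w2   # F(p): the learned residual mapping
    return relu(f + p)      # identity shortcut: output = F(p) + p

rng = np.random.default_rng(0)
p = rng.standard_normal(8)
# If both weight layers are zero, F(p) = 0 and the block reduces to relu(p):
w_zero = np.zeros((8, 8))
out = residual_block(p, w_zero, w_zero)
```

The zero-weight case shows why residual blocks help: a block can trivially fall back to (near) identity, so stacking more blocks should not make the training error worse, and the shortcut path lets gradients bypass the non-linearities during backpropagation.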

Data Preparation for Classification and Parameters for Performance Measurement
In the dataset used for this study, each sample from an individual sleep stage spans a 30 s epoch. However, there is a significant imbalance in the data in terms of the number of samples for the different sleep stages of individual subjects. To balance the dataset prior to its use by the ResNet, 70 data samples from each sleep stage are randomly selected. Thus, a total of 350 samples are considered for each healthy subject (each class comprises 70 data samples, and in total 5 signal classes were considered in this study). To compute the classification performance of the ResNet, K-fold cross validation (CV) [57] is used. It should be noted that the choice of K in K-fold CV is usually arbitrary; to lower the variance of the CV results, it is recommended to repeat the CV procedure with several new random splits. Therefore, in this experiment, K = 4 is used. For executing the 4-fold CV, all the samples from the individual datasets are divided in the ratio 75:25 for training and testing, respectively. For example, dataset 1 consists of 5600 samples (350 samples from each of the 16 female subjects), of which 4200 samples are used for training, whereas the remaining 1400 are used for testing. The details of the datasets used for training and testing the ResNet are given in Table 2. The performance of the proposed framework is evaluated using several parameters, i.e., (a) the F1 score (F1) [58], (b) the accuracy score (AS), and (c) the confusion matrix [59]. The F1 score and AS can be calculated using Equations (3) and (4), respectively.
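Equations (3) and (4) are missing from the extracted text; the standard definitions of the F1 score and accuracy score, consistent with the TP/TN/FP/FN terms defined in the following paragraph, are presumably:

```latex
F1 = \frac{2\,TP}{2\,TP + FP + FN} \qquad (3)

AS = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)
```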

In Equations (3) and (4), the terms TP, TN, FN, and FP represent the number of true positives, true negatives, false negatives, and false positives, respectively. For measuring the class-wise performance, the F1 score is considered. The classification accuracy for each gender is determined by calculating the mean AS. Similarly, the overall accuracy can be determined by calculating the mean accuracy across both genders. During training, the loss function of the proposed deep learning framework is monitored to avoid both overfitting and underfitting [60]. The details of the performance analysis are given in Table 3.

Performance Analysis of Residual Neural Network
The results in Table 3 show that the data for female subjects yields better performance, 4.1% higher, compared to the data for male subjects, which is due to the fact that the amount of data available for the former is four times larger than that for the latter. The amount of data is critical for building good machine learning models, particularly deep learning models, which have significantly more parameters to tune. The confusion matrices for this experiment are shown in Figure 5. For the purpose of this experiment, the ResNet is trained for 1000 epochs with a 75/25 train/validation split to tackle the overfitting-underfitting problem. Moreover, during training, two different optimization techniques, i.e., stochastic gradient descent (SGD) and Adamax, are considered for loss function optimization. Adamax shows better convergence for all the datasets compared to SGD, as shown in Figure 6.
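For reference, the Adamax update mentioned above follows Kingma and Ba's formulation, in which Adam's second-moment estimate is replaced by an exponentially weighted infinity norm. The following is a minimal NumPy sketch on a toy quadratic objective, not the paper's actual EEG training loop; the function name, hyperparameter defaults, and the small eps guard are our assumptions:

```python
import numpy as np

def adamax_step(theta, grad, state, lr=0.002, b1=0.9, b2=0.999, eps=1e-8):
    """One Adamax update: momentum-style first moment, infinity-norm
    second moment, and bias correction on the first moment only."""
    m, u, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # biased first-moment estimate
    u = np.maximum(b2 * u, np.abs(grad))  # exponentially weighted inf-norm
    theta = theta - (lr / (1 - b1 ** t)) * m / (u + eps)
    return theta, (m, u, t)

# Minimize f(theta) = ||theta||^2, whose gradient is 2 * theta
theta = np.array([3.0, -2.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(3000):
    theta, state = adamax_step(theta, 2 * theta, state)
```

Because the per-parameter step size is bounded by roughly lr regardless of gradient magnitude, Adamax tends to be less sensitive to the large spread of gradient scales that spectral inputs can produce, which is one plausible reason for the smoother convergence observed against SGD.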

Comparisons and Discussion
The proposed approach is compared with the following eight techniques using the same datasets: (1) Raw EEG + ResNet: this approach uses the raw multi-channel EEG signals as input to the ResNet architecture used in the proposed framework. Comparison with this technique highlights the benefit of using the PSD of the EEG signals over the raw EEG signals as input.
(2) PSD + CNN (5 layers): this approach uses the same input data, i.e., PSD of raw EEG signals, but uses a 5-layer CNN instead of a ResNet. Comparison with this technique highlights the benefit of using ResNet compared to CNN.

(3) PSD + CNN (10 layers): this approach uses the same input data, i.e., PSD of raw EEG signals, but uses a 10-layer CNN instead of a ResNet. Comparison with this technique highlights the benefit of using ResNet compared to a CNN with 10 layers.
(4) PSD + CNN (14 layers): this approach uses the same input data, i.e., PSD of raw EEG signals, but uses a 14-layer CNN instead of a ResNet. Comparison with this technique and the two techniques prior to it highlights the benefit of using ResNet compared to CNNs with an increasing number of layers.
(5) PSD + CNN (18 layers): this approach uses the same input data, i.e., PSD of raw EEG signals, but uses an 18-layer CNN instead of a ResNet. The number of layers in the CNN is the same as the number of layers in the ResNet in the proposed framework. Nevertheless, the performance of ResNet is better in comparison.
(6) FE + RF: this approach uses time-frequency based feature extraction (FE) and a random forest (RF) classifier [43]. Comparison with this approach highlights the significance of both the input data and the ResNet architecture used in the proposed framework.
(7) FFT + MPC: this approach uses fast Fourier transform (FFT) based feature analysis and a multilayered perceptron (MPC) based classifier [44] to highlight the importance of the deeper residual network architecture.
(8) PSD + MPC: this approach uses the PSD of EEG signals as input to a MPC based classifier to highlight the importance of the deeper residual network architecture.
A comparison of all these approaches with the proposed framework is given in Table 4. The results in Table 4 clearly show that the proposed approach is significantly more effective than all the other approaches in correctly identifying sleep stages in both genders. The "Improvement" column in Table 4 shows the percentage improvement in absolute terms that the proposed method has over the other techniques. The comparison results show that the proposed framework (PSD + ResNet) yields an average performance improvement of 3.8-40.6% and 1.3-37.9% for female and male subjects, respectively.
The results in Table 4 show that the performance of raw EEG + ResNet is inferior to the proposed PSD + ResNet framework, which indicates that the ResNet architecture is more efficient in learning the subtle changes of the subjective sleep states from the PSD than from the raw EEG. Achieving the same performance with the raw EEG as input would require a deeper ResNet architecture with more identity shortcut connections. However, as the input length of raw EEG is far greater than that of the PSD, it would increase the training time of the ResNet. Moreover, approaches based on deep learning do not require artifact removal from the raw data, since these approaches are very good at automatically learning small changes in the data distribution [30,31,49].

Conclusions
In this study, an automatic sleep state detection framework is presented for both male and female human subjects. The proposed framework uses the PSD of EEG signals collected from three important regions of the brain, i.e., the pre-frontal, central, and occipital lobes, as input to a ResNet. The ResNet architecture automatically extracts the intrinsic information for each sleep stage to classify it correctly. The proposed framework was tested on the publicly available DREAMS dataset. It achieved accuracies of 87.8% and 83.7% for female and male subjects, respectively. The proposed framework was compared with several state-of-the-art methods to justify the choice of a ResNet-based classifier and PSD-based input data, and it yielded significant performance improvements, i.e., 3.8-40.6% and 1.3-37.9% for female and male subjects, respectively. The dataset used for validation of the proposed framework is imbalanced in terms of gender, which inevitably affected the experimental results. The difference in performance is understandable, as in machine learning as well as deep learning, more data often translates into a better model. Nevertheless, in the future, a more extensive and balanced dataset will be considered to validate the findings of this study. Furthermore, in addition to PSD, other time-frequency analysis techniques will be explored to capture all the variations in the non-stationary EEG signals.
Author Contributions: All the authors contributed equally to the conception of the idea, implementing, and analyzing the experimental results, and writing the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.