Machine-Learning-Based Approaches for Sleep Stage Classification Utilising a Combination of Physiological Signals: A Systematic Review

Increasingly prevalent sleep disorders worldwide significantly affect the well-being of individuals. Sleep disorders can be detected by dividing sleep into different stages. Hence, the accurate classification of sleep stages is crucial for detecting sleep disorders. The use of machine learning techniques on physiological signals has shown promising results in the automatic classification of sleep stages. The integration of information from multichannel physiological signals has been shown to further enhance the accuracy of such classification. Some existing literature reviews focus on studies utilising a single channel of EEG signals for sleep stage classification, while other reviews cover models that utilise either a single channel of physiological signals or a combination of various physiological signals. This review focuses on the classification of sleep stages through the integration of combined multichannel physiological signals and machine learning methods. We conducted a comprehensive review spanning from the year 2000 to 2023, aiming to provide a thorough and up-to-date resource for researchers in the field. We analysed 38 papers investigating sleep stage classification employing various machine learning techniques integrated with combined signals. In this study, we describe the models proposed in the existing literature for sleep stage classification, discuss their limitations, and identify potential areas for future research.


Introduction
Sleep is a fundamental human function that involves a series of changes in the heart, brain, muscles, eyes, and respiratory activities. Besides aiding in mental and physical health recovery, sleep contributes to healthy brain functionality during the day [1,2]. However, drowsiness and sleep disorders, such as sleep apnoea [3] and periodic leg movement [4], can adversely affect daily activities [5]. Worldwide, more than 400 million adults have sleep apnoea [6]. A study conducted in the United States found that up to 24% of adults are affected by sleep issues [7]. In Australia, 33% of the population is affected by insomnia [8]. In addition, the Sleep Heart Health Study found that people who have trouble falling asleep may be affected by health issues that include neurocognitive deficits, cardiovascular problems, diabetes, recurrent heart attacks, and stroke [9,10]. Therefore, to protect human health, it is essential to monitor sleep.
Sleep specialists, who are experts trained in sleep medicine, follow the guidelines of the American Academy of Sleep Medicine (AASM) [11] to classify sleep into three primary stages: wake (W), non-rapid eye movement (NREM) sleep encompassing three substages (N1, N2, and N3), and rapid eye movement (REM) sleep. During the NREM stage, parasympathetic activity rises, while heart rate (HR), sympathetic activity, blood pressure, and metabolic rate fall. Neuronal activity is higher during the REM stage than during the NREM stage [12,13]. In a typical sleep cycle, 50-60% of sleep is spent in N1 and N2, which are considered light sleep stages; 15-20% in N3, which is considered a deep sleep stage; 20-25% in the REM sleep stage; and 5% or less in the W stage [14]. The absence of certain sleep stages in a typical sleep cycle may suggest the presence of sleep disorders.
The activities mentioned above for sleep stages are recorded by a traditional method called polysomnography (PSG) [15], which is considered the gold standard for classifying sleep stages following the guidelines established by the AASM. The physiological signals recorded during PSG include respiratory effort [16], electroencephalogram (EEG) [17], electrocardiogram (ECG) [18], electrooculogram (EOG) [19], and electromyogram (EMG) [20]. Each physiological signal recording is divided by sleep specialists into 30-second segments. These segments are then classified into W, NREM (including N1, N2, and N3), or REM sleep stages [21]. This classification is based on visual analysis, which is not only time-consuming but also error-prone. Furthermore, patients need to be connected to sensors and equipment for extended periods during sleep studies, which can have an impact on the quality of recorded data [22].
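The 30-second segmentation described above can be illustrated with a minimal sketch. The function name `split_into_epochs`, the 100 Hz sampling rate, and the choice to drop an incomplete trailing segment are assumptions made for illustration; they are not taken from any reviewed study.

```python
def split_into_epochs(signal, sampling_rate_hz, epoch_seconds=30):
    """Split a 1-D sequence of samples into consecutive, non-overlapping epochs."""
    samples_per_epoch = int(sampling_rate_hz * epoch_seconds)
    # Assumption: an incomplete trailing epoch is discarded rather than padded.
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Hypothetical recording: 95 seconds of a signal sampled at 100 Hz.
recording = [0.0] * (95 * 100)
epochs = split_into_epochs(recording, sampling_rate_hz=100)
print(len(epochs), len(epochs[0]))  # 3 3000
```

Each resulting epoch would then receive one stage label, whether from a sleep specialist or from a classifier.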
Artificial intelligence (AI) has recently been employed in a wide range of clinical medical applications, including surgeries [23], classifications of types of seizure [24], and stage classifications [25]. AI research aims to build intelligent tools that can help medical specialists make clinical decisions in the medical field [26]. Machine learning (ML) is a subfield of AI that employs algorithms and approaches to create models that learn from data and make predictions or decisions based on that learning. The key drawback of conventional ML techniques is the need for feature engineering methods to extract features from input data. The model's performance may be restricted by the time-consuming process of designing and selecting relevant features from the input data. However, recently introduced deep learning (DL) has overcome the limitations of conventional machine learning algorithms by employing multilayered neural networks to extract and learn key features from raw data [27].
The utilisation of machine learning (ML) techniques for sleep stage classification has been extensively examined in literature reviews [25,28-33]. The systematic reviews in [25,29,31] focus on models of sleep stage classification based on single-channel EEG signals. The advantages of using single-channel EEG signals include convenience and ease of use, and they can be adapted for use in the patient's home using wearable sensors. Other reviews [28,30,32,33] have focused on models developed using single-channel EEG signals and a combination of physiological signals to classify sleep stages. Using a combination of physiological signals with ML models can increase accuracy because the model has more information and can extract more discriminative features.
This systematic review focuses on machine learning investigations involving multiple channels of physiological signals, including EEG, ECG, EMG, EOG, and respiratory data, for sleep stage classification. The physiological signals were employed, either as multiple channels of a single signal or as a combination of multiple signals, to develop a model for the classification of sleep stages. We selected research studies that encompassed multiple physiological signals from reviews [30,32,33]. Furthermore, we conducted a thorough literature search to identify additional publications aligned with our criteria. Our emphasis on signal combinations sets this review apart and offers a valuable resource for researchers exploring this specialised domain.
The remainder of the article is organised as follows: Section 2 provides a detailed description of the research article selection methodology. Section 3 describes a conceptual framework for the classification of sleep stages. Section 4 reviews different existing models of sleep stage classification. Section 5 highlights the limitations of existing approaches. Section 6 discusses and analyses the existing research used in the classification of sleep stages. Finally, Section 7 summarises our conclusions and identifies directions for future research.

Methodology for Paper Selection
In this review article, we followed a systematic review methodology proposed by Dixon et al. [34].

Data Sources
We systematically searched literature databases, including Scopus, Google Scholar, and PubMed, covering approximately the year 2000 up to the present, restricted to English-language publications. In this systematic review, we focused on studies that employed multiple channels of EEG, ECG, EMG, EOG, and respiratory signals or a combination of these signals to propose models for the classification of sleep stages. We conducted a comprehensive search of the literature to identify additional studies that met our inclusion criteria. We screened the titles and abstracts of the retrieved articles and included studies that employed the specified physiological signals for sleep stage classification.

Data Extraction
A data extraction protocol was defined and evaluated by all authors. The inclusion criteria for this study encompassed studies with keywords related to ("Classification of sleep stage") AND ("Combined physiological signals" OR "Combined EEG ECG EMG EOG respiratory") AND ("Machine learning" OR "Deep learning" OR "Artificial intelligence" OR "Big data"). The included document types were indexed journal papers, conference papers, book chapters, and books. Exclusion criteria were applied to filter out studies that did not fall under the subareas of interest, and those that were not in English or did not meet the predefined criteria.

Data Analyses
This article is a systematic review, rather than a meta-analysis, of the classification of sleep stages using intelligent data analysis techniques in the medical field. It does not delve extensively into the specific details and results obtained from individual case studies. Therefore, the utilization of data analysis techniques within this specific context is not the main focus.

Results
In our analysis, we incorporated a total of 38 papers that fulfilled the predefined inclusion criteria. Figure 1 represents the comprehensive search and selection process, outlining the reasons for excluding certain studies.

Conceptual Framework for the Classification of Sleep Stages
Figure 2 shows the conceptual framework for sleep stage classification. Researchers in this area use various datasets, such as Sleep-edf [35], Sleep-edfx [36], MASS [37], MIT-BIH [38], ISRUC-Sleep [39], SHHS [40], UCD [41], and PhysioNet Challenge [42]. In the preprocessing step, datasets are cleaned to exclude missing values and eliminate noise, artefacts, and other distortions. The most common preprocessing methods are filtering [43], normalization [44], and signal conditioning [45]. After that, the feature selection step is used to select the most important features that have a significant impact on the model's performance. The two common methods for feature selection are feature engineering and statistical methods. Most of the existing work on sleep stage classification employs feature engineering methods, such as fast Fourier transform (FFT) [46], wavelet transform (WT) [47], short-time Fourier transform (STFT) [48], Pan-Tompkins algorithm [49], and temporal decomposition (TD) [50]. Common statistical methods include dynamic warping, dispersion entropy, max, mean, skew, and variance features [51]. However, some recent studies have proposed using raw data as input, without feature engineering or statistical methods, to reduce the complexity of the proposed model. This approach allows deep learning algorithms to learn directly from raw data [52]. The next step is the data splitting strategy, which includes cross-validation and random splitting approaches used to split datasets into training and testing sets. The cross-validation approach divides the dataset into 3, 5, or 10 parts to evaluate the classification models [53]. The random splitting approach holds out a small fraction of the data as a testing set to evaluate the model (e.g., 70% training set, 15% validation set, and 15% testing set). In addition, some studies use the above splitting strategies with either a subject-wise or non-subject-wise approach. In a subject-wise approach, the training and testing sets do not share any patient's recording samples. Therefore, the patients whose recordings are used in the training set are excluded from the testing set. In contrast, a non-subject-wise approach uses the same patients' recording samples in the training and testing sets.
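To make the subject-wise strategy concrete, the following minimal sketch assigns whole subjects, rather than individual segments, to cross-validation folds. The function `subject_wise_folds`, the round-robin assignment, and the synthetic subject identifiers are illustrative assumptions and do not reproduce the procedure of any specific reviewed study.

```python
import random


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Assign whole subjects (not individual segments) to folds in round-robin order.
    fold_of = {s: k % n_folds for k, s in enumerate(subjects)}
    for fold in range(n_folds):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


# Hypothetical data: 10 subjects with 4 segments each.
subject_ids = [f"S{n:02d}" for n in range(10) for _ in range(4)]
folds = list(subject_wise_folds(subject_ids, n_folds=5))
for train_idx, test_idx in folds:
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    assert train_subjects.isdisjoint(test_subjects)
```

A non-subject-wise split would instead shuffle the segment indices directly, so segments from one patient could fall on both sides of the split.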
The final step involves classification using machine learning or deep learning models. In the sleep stage classification models, three types of categorization for the sleep stages have been used. The first type is a binary classification, distinguishing between the wake stage (W) and the sleep stage (combining REM and NREM). The second type categorises sleep into three stages: wake (W), non-rapid eye movement (NREM), and rapid eye movement (REM). The third type involves categorization into five stages: W, N1, N2, N3, and REM. However, it is important to note that binary and three-sleep-stage classifications do not completely align with traditional AASM guidelines, while a five-sleep-stage classification is consistent with the AASM guidelines.
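The relationship between the three categorization schemes can be sketched as a simple relabelling of five-stage annotations. The dictionary names and the label string "Sleep" for the merged binary class are assumptions chosen for illustration.

```python
# Five-stage labels collapse to three stages by merging N1, N2, and N3 into NREM.
FIVE_TO_THREE = {"W": "W", "N1": "NREM", "N2": "NREM", "N3": "NREM", "REM": "REM"}
# Three stages collapse to a binary scheme by merging NREM and REM.
# The label "Sleep" is a hypothetical name for the merged class.
THREE_TO_TWO = {"W": "W", "NREM": "Sleep", "REM": "Sleep"}


def to_three_stage(labels):
    """Map five-stage labels to W / NREM / REM."""
    return [FIVE_TO_THREE[label] for label in labels]


def to_two_stage(labels):
    """Map five-stage labels to W / Sleep."""
    return [THREE_TO_TWO[FIVE_TO_THREE[label]] for label in labels]


hypnogram = ["W", "N1", "N2", "N3", "N2", "REM", "W"]
print(to_three_stage(hypnogram))  # ['W', 'NREM', 'NREM', 'NREM', 'NREM', 'REM', 'W']
print(to_two_stage(hypnogram))    # ['W', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'W']
```

Because the coarser schemes merge classes, accuracies reported for two-, three-, and five-stage classification are not directly comparable.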

Literature Review
For this literature review, the selected papers were categorised into five subsections based on the utilised input signals. According to a preliminary survey of the literature, the signals most frequently used in the classification of sleep stages were found to be EEG, EMG, EOG, ECG, and respiratory effort. Each subsection includes a definition of the signals and describes all associated models and their performances.

Electroencephalogram (EEG)
EEG techniques capture the brain's electrical activity. Electrical signals in the brain can be measured by observing changes in the electrical activity between two electrodes over time. The standard method for measuring EEG signals is commonly known as the 10-20 system, which employs a minimum of 21 electrodes [54]. The EEG signal displays the diverse properties of brain activity. These activities help to classify sleep stages. Stage W is characterised by alpha activity in the occipital area [55]. The N1 stage is the transition between the W and N2 stages, and theta activity is one of the characteristics of EEG activity of the N1 stage [55]. Stage N2 is characterised by distinctive features known as spindles and K-complexes. Spindles are brief bursts of high-frequency brain activity, while K-complexes are characterised by sharp, high-amplitude brain activity with a unique appearance in EEG signals [56]. Delta activity is indicative of N3, which is a deep sleep stage [57]. The REM stage is characterised by rapid, low-voltage theta waves [58]. The distinct frequency ranges of EEG signals corresponding to each sleep stage are shown in Table 1. Figure 3 shows a sample of time series EEG data for the five sleep stages.
The study in [59] proposed a deep learning model to select important features from EEG signals for sleep stage classification, aiming to reduce reliance on sleep experts. They used two channels of EEG signals (Fpz-Cz and Pz-Oz) as input to the 1D-CNN model. The architecture included seven 1D-CNN layers and a max-pooling layer, followed by a fully connected layer to classify five sleep stages. Their model achieved an accuracy of 92.60%. Similarly, Satapathy et al. [60] proposed a deep learning model, which used two channels of EEG signals (C3-A2 and C4-A1) as input to the 1D-CNN model. The architecture of the 1D-CNN contained seven blocks that included 1D-CNN, batch normalization, and ReLU layers. The last layer was fully connected with softmax to classify the segments into five sleep stages. Their model was tested on subgroup 1 and subgroup 2 of the ISRUC-Sleep dataset, and the accuracies achieved for the classification of five sleep stages were 97.22% and 95.06%, respectively.
Another study by Delimayanti et al. [61] extracted features from two channels of EEG signals (Pz-Oz, Fpz-Cz) by using an FFT method to improve the accuracy of the classification. The extracted features were passed to an SVM to classify three sleep stages and five sleep stages. Their model achieved an accuracy of 94.14% and 91.73% for three- and five-sleep-stage classification, respectively. Dequidt et al. [62] conducted a study to explore the utilization of time-frequency representations, such as spectrograms, as input for a fine-tuned VGG-16 network. Their research focused on comparing various spectrograms encoding multiple EEG channels to facilitate the recognition of visual patterns in images. The study reported an achieved accuracy of 82.96% with five-sleep-stage classification.
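As an illustration of frequency-domain feature extraction from an EEG segment, the sketch below computes relative band powers with a plain discrete Fourier transform. It is not the pipeline of any reviewed study: the band limits in `BANDS_HZ` are commonly quoted approximate ranges assumed here for illustration (they are not taken from Table 1), the signal is synthetic, and the direct O(n²) transform stands in for an FFT library call to keep the example self-contained.

```python
import cmath
import math

# Assumed, approximate EEG band limits in Hz (illustrative only).
BANDS_HZ = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def power_spectrum(signal, sampling_rate_hz):
    """Return (frequencies, powers) for the non-negative frequency bins of a real signal."""
    n = len(signal)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; an FFT would give the same value faster.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * sampling_rate_hz / n)
        powers.append(abs(coeff) ** 2 / n)
    return freqs, powers


def relative_band_powers(signal, sampling_rate_hz):
    """Fraction of the total in-band power that falls in each assumed EEG band."""
    freqs, powers = power_spectrum(signal, sampling_rate_hz)
    band_power = {
        name: sum(p for f, p in zip(freqs, powers) if low <= f < high)
        for name, (low, high) in BANDS_HZ.items()
    }
    total = sum(band_power.values()) or 1.0
    return {name: value / total for name, value in band_power.items()}


# Synthetic 2-second segment sampled at 64 Hz, dominated by a 10 Hz oscillation.
fs = 64
segment = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
features = relative_band_powers(segment, fs)
print(max(features, key=features.get))  # alpha
```

A feature vector of this kind, computed per 30-second segment and per channel, is the sort of input that a conventional classifier such as an SVM would receive.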

Electromyogram (EMG)
The EMG technique records muscles' electrical activity during contraction and relaxation during sleep, which makes EMG a significant signal for classifying sleep stages [63]. Figure 4 presents a sample of time series EMG data for the five sleep stages, illustrating that EMG activity reaches its peak during the W stage. As we progress from stage N1 to stage N3, EMG activity gradually decreases as the muscles begin to relax. In the REM stage, the EMG activity is at its lowest point as the muscles are inactive and relaxed.

Electrooculogram (EOG)
In sleep research, the measurement of eye movements is crucial for evaluating sleep quality and identifying sleep stages [68]. EOG recording is used to capture eye movements: surface electrodes are positioned around the eye to measure the potential difference between its anterior and posterior poles [69]. During the W stage, EOG signals are used to detect rapid eye movements, providing an indication of this stage's characteristics. In the NREM stage, EOG signals record slow eye movements. In contrast, during the REM stage, EOG signals can capture bursts of rapid eye movements, which are a distinctive feature of this stage [70]. Figure 5 presents a sample of time series EOG data for the five sleep stages.
A study by Sokolovsky et al. [73] proposed a deep learning model that contained a deep network to improve the classification accuracy. Their model takes two channels of EEG signals (Fpz-Cz and Pz-Cz) and one channel of EOG signal as input. Their architecture consisted of six layers of 1D-CNN, followed by batch-normalization and max-pooling layers. After that, they added three layers of 1D-CNN, followed by a batch-normalization and max-pooling layer. This structure was repeated three times. In the end, they added two max-pooling layers, followed by a 1D-CNN layer, a max-pooling layer, a 1D-CNN layer, and two fully connected layers. Their model achieved an accuracy of 81% for the classification of five sleep stages. Phan et al. [74] used two different datasets (Sleep-edf and MASS) for evaluating their proposed model. They selected a combination of one channel of EEG signal (Fpz-Cz) and one channel of EOG (horizontal) signal from the Sleep-edf dataset. Similarly, they used a combination of one channel of EEG (C4-A1) and one channel of EOG signal (ROC-LOC) from the MASS dataset. They used a short-time Fourier transform method and a 2D-CNN model to classify five sleep stages. The architecture of their model consisted of one layer of 2D-CNN, a max-pooling layer, and a multitask softmax layer. Their model achieved an accuracy of 82.30% on the Sleep-edf dataset and 82.50% on the MASS dataset.
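The short-time Fourier transform turns a one-dimensional signal into a time-frequency image that a 2D-CNN can take as input. The following minimal sketch shows the idea only; the function `stft_magnitude`, the Hann window, the frame length, and the hop size are assumptions for illustration and are not the settings used in [74].

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Return a list of magnitude spectra, one per overlapping, Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, non-negative frequency bins only.
        spectrum = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len) for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Synthetic 4-second signal sampled at 32 Hz containing a 4 Hz tone.
fs = 32
signal = [math.sin(2 * math.pi * 4 * t / fs) for t in range(4 * fs)]
spectrogram = stft_magnitude(signal, frame_len=32, hop=16)
print(len(spectrogram), len(spectrogram[0]))  # 7 17
```

With a 32-sample frame at 32 Hz, each frequency bin is 1 Hz wide, so the 4 Hz tone peaks in bin 4 of every frame; the rows (time) and columns (frequency) of `spectrogram` form the two-dimensional input.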
Almutairi et al. [67] proposed an SSNet model using a combination of signals of two channels of EEG (Fpz-Cz and Pz-Cz) and one channel of EOG as input. Their model classified the segments into the three sleep stages with an accuracy of 95.65%. Sekkal et al. [75] used two channels of EEG signals (Fpz-Cz and Pz-Cz) and one channel of EOG signal as input. They extracted statistical features from raw data to pass to different machine learning classifiers, such as SVM, RF, and KNN. Their model with an SVM classifier achieved the highest accuracy of 89.1%. Toma et al. [76] proposed an end-to-end CRNN (convolutional recurrent neural network) model for five-sleep-stage classification. The model takes three channels as input: two EEG channels (Pz-Oz and Fpz-Cz) and one EOG channel. It consists of two branches of 1D-CNN to extract spatial features, followed by several RNN layers. Dropout layers are inserted between the RNN layers to prevent overfitting. Two types of dropout layers, regular dropout and spatial dropout, are used in the model. The study reported an accuracy of 90.30% for five-sleep-stage classification.

Electrocardiogram (ECG) and Respiratory
ECG and respiratory signals can assist with sleep stage classification because the heart rate and respiratory effort change throughout the sleep stages [77]. An ECG is a recording of the heart's electrical activity over a timespan. The ECG signal consists of several beats comprising the P wave, QRS complex, and T wave, depending on the individual's heart condition [78,79]. The properties of the ECG signal can change during both NREM and REM sleep stages. The heart rate decreases during the NREM sleep stage, while heart rate variability can either increase or decrease depending on the NREM sleep stage. Conversely, both the heart rate and heart rate variability increase during the REM sleep stage, as reported in [80].
Respiratory inductance plethysmography (RIP) is a noninvasive technique for measuring airflow and respiratory effort. Changes in respiratory patterns can help classify sleep stages. For example, during the REM sleep stage, the respiratory system can become more irregular, and the upper airway muscles may become more relaxed, leading to more frequent disruptions in breathing and potential sleep apnoea [81].
We have categorised studies into two subgroups below based on the type of signal combination used as input. The first group includes studies utilising a combination of ECG and respiratory signals as input. The second group encompasses studies employing a combination of EEG and ECG signals or a combination of EEG and respiratory signals as input.
As mentioned earlier, the first group includes studies that utilised a combination of ECG and respiratory signals as input. Long et al. [82] used dynamic warping to extract statistical features from ECG and respiratory signals and achieved a 95% accuracy in the binary classification of sleep stages using the LDA classifier. Fonseca et al. [83] applied the Pan-Tompkins algorithm to extract R-R intervals from ECG signals and used the mean and variance of respiratory signals to classify three sleep stages with an accuracy of 80% by using a BLD classifier. Casal et al. [84] utilised a combination of signals from ECG and respiratory effort for the binary classification of sleep stages using a two-layered gated recurrent unit (GRU) neural network. They reported achieving an accuracy of 90.13%.
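The kind of cardiorespiratory features mentioned above can be sketched as follows. The example assumes that R-peak times have already been detected (for instance by a Pan-Tompkins-style detector, which is not implemented here); the function names, feature names, and input values are hypothetical and do not reproduce the feature set of [83].

```python
from statistics import mean, pvariance


def rr_intervals(r_peak_times_s):
    """R-R intervals in seconds from a sorted list of R-peak times in seconds."""
    return [later - earlier for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]


def cardiorespiratory_features(r_peak_times_s, respiratory_samples):
    """Simple per-segment features from detected R peaks and a respiratory effort trace."""
    rr = rr_intervals(r_peak_times_s)
    return {
        "mean_rr_s": mean(rr),
        "mean_hr_bpm": 60.0 / mean(rr),
        "rr_variance": pvariance(rr),
        "resp_mean": mean(respiratory_samples),
        "resp_variance": pvariance(respiratory_samples),
    }


# Hypothetical inputs: one beat per second and a short respiratory trace.
r_peaks = [0.0, 1.0, 2.0, 3.0, 4.0]
resp = [0.0, 1.0, 0.0, -1.0]
feats = cardiorespiratory_features(r_peaks, resp)
print(feats["mean_hr_bpm"], feats["resp_variance"])  # 60.0 0.5
```

In practice, such features would be computed for every 30-second segment and passed to a classifier such as LDA or BLD.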
The second group includes studies that utilised a combination of EEG and ECG signals or a combination of EEG and respiratory signals as input. For example, Tripathy et al. [85] used R-R intervals from the ECG signal and the dispersion entropy method for extracting statistical features from the EEG signal, achieving a 73.70% accuracy in classifying five sleep stages using multiple fully connected layers. Yu et al. [86] used a fast Fourier transform method to extract features from one channel of EEG and ECG signals, achieving an accuracy of 99% in classifying five sleep stages using SVM. Tautan et al. [64] proposed a model that takes one EEG (F3-M2) and one respiratory signal channel as input. The model uses an FFT method to extract features from the EEG signal and extracts statistical features such as mean, skew, and variance from the respiratory signal. The model achieved accuracies of 93.72% and 52.27% using RF and MLP classifiers, respectively. Moreover, they tested their model by combining one channel of EEG and ECG signals, using an FFT method to extract features from the EEG signal and R-R intervals from the ECG signal. Their proposed model achieved accuracies of 72.52% and 60.28% using RF and MLP classifiers, respectively. Zhao et al. [87] used a combination of two channels of EEG and ECG as input. They passed these signals separately to a 1D-CNN model, which contains five layers of 1D-CNN. The model they developed achieved an accuracy of 98.84% in a binary sleep stage classification.

Combination of Signals
The combination of more than two types of physiological signals provides complementary information that can improve sleep stage classification. This approach can be beneficial because certain features of a sleep stage might be missed by one signal but detected by another [30].
We have categorised studies into two further subcategories below based on the type of signal combination used as input. The first category includes studies that utilise a combination of four types of signals as input: EEG, EMG, ECG, and respiratory. The second category encompasses studies that employ a combination of three types of signals as input: EEG, EMG, and ECG, or a combination of EEG, EMG, and EOG.
As mentioned earlier, the first category includes studies that utilise a combination of four types of signals: EEG, EMG, ECG, and respiratory as input to classify sleep stages. Only two studies were found in this category. Willemen et al. [88] proposed a model that extracted statistical features and utilised an SVM classifier, achieving an accuracy of 69% in classifying five sleep stages. Furthermore, Helland et al. [89] extracted mean and variance features from raw data and employed a BLD classifier, resulting in an 80% accuracy for the classification of five sleep stages.
The second category encompasses studies that employ a combination of three types of signals: EEG, EMG, and ECG/EOG as inputs to classify five sleep stages. Takatani et al. [90] extracted R-R features from ECG signals and applied fast Fourier transform (FFT) to extract frequency domain features from EEG and EMG signals. The selected features were then evaluated using a linear discriminant analysis (LDA) classifier, resulting in an accuracy of 80%. Biswal et al. [91], on the other hand, used a short-time Fourier transform method to extract frequency domain features. These features were evaluated by passing them through a model that included a combination of 1D-CNN layers and a bidirectional LSTM (Bi-LSTM) layer. Their model achieved a classification accuracy of 87.5%.
In a study by Choi et al. [92], the researchers investigated combinations of five signals, namely, ECG, EEG, EMG, left-eye EOG, and right-eye EOG. They explored all possible combinations of these signals and determined that the combination of EEG, EMG, and ECG exhibited the most promising outcomes. Statistical features were extracted from these signals, taking into account different window sizes and signal lengths, and an XGBoost classifier was employed to evaluate the performance. The proposed model achieved an accuracy of 85%.
Cui et al. [93] used two channels of EEG signals (C3-A2 and C4-A1), two channels of EOG signals (O1-A2 and LOC-A2), and one channel of EMG signals (X1). They applied fine-grained segmentation and a 2D-CNN model to classify five sleep stages. The architecture of the 2D-CNN model included two 2D-CNN layers, max-pooling layers, and a fully connected layer. The classification of five sleep stages by their model resulted in an accuracy of 90.12%. Zhang et al. [94] proposed a method that combined short-time Fourier transform features with raw data to classify five sleep stages using a 2D-CNN model. The architecture of their model consisted of two 2D-CNN layers, followed by a max-pooling layer, an LSTM layer, and a fully connected layer. Their approach achieved an accuracy of 86%.
Chambon et al. [95] proposed a 2D-CNN model to extract features from a combination of six channels of EEG signals and two channels of EOG signals. The 2D-CNN architecture consisted of three layers of 2D-CNN and max-pooling layers. Additionally, they utilised a separate 2D-CNN model to extract features from three channels of EMG signals. All the extracted features were then combined and passed to a fully connected layer for the classification of five sleep stages. The proposed model achieved an accuracy of 79%. Phan et al. [74] proposed a model that combined a short-time Fourier transform method with multitask CNN layers for the classification of five sleep stages. The multitask CNN architecture comprised one layer of 2D-CNN, a max-pooling layer, and a multitask softmax layer. The model used a combination of one EEG signal channel (C4-A1), one EOG signal channel (ROC-LOC), and two EMG signal channels (CHIN1-CHIN2) as input, and the model achieved an accuracy of 81.2%.
Xu et al. [96] utilised a combination of EEG signals (Fpz-Cz and Pz-Oz), an EOG signal (horizontal), and an EMG signal as input. To extract features from raw data, they employed a 1D-CNN model consisting of a convolution block (four 1D-CNN layers and max-pooling layers) and a reduction block (input passed to two max-pooling layers and three layers of 1D-CNN). These blocks were repeated four times, followed by two 1D-CNN layers and a fully connected layer. The model's performance was evaluated on two datasets, achieving an accuracy of 85.40% on the Sleep-edf dataset and 81.60% on the Sleep-edfx dataset for classifying five sleep stages. Similarly, Sharma et al. [97] utilised two channels of EEG signals (C3-A2 and C4-A1), one channel of an EMG signal, and two channels of EOG signals (EOG-L and EOG-R) as input. They applied a wavelet decomposition method to extract frequency domain features and evaluated the performance of these features by using a bagging tree classifier for the classification of three and five sleep stages. The model achieved an accuracy of 95.44% and 95.20% for the classification of three and five sleep stages, respectively.
Yan et al. [98] utilised a combination of EEG, EMG, and EOG signals to feed into a 1D-CNN model with four layers of 1D-CNN, followed by max-pooling layers, achieving an accuracy of 73% for the classification of five sleep stages. Then, they applied an STFT method to extract frequency domain features from the raw data. These features were passed to the 1D-CNN model, which resulted in an improved accuracy of 74.24%. Almutairi et al. [67] utilised two datasets, Sleep-edfx and ISRUC-Sleep, to evaluate their SSNet model's performance for the classification of three and five sleep stages. From the Sleep-edfx dataset, they chose two EEG channels (Fpz-Cz and Pz-Oz), one EMG channel, and one EOG channel as input. Meanwhile, from the ISRUC-Sleep dataset, they selected two EEG channels (C3-A2 and C4-A1), one EMG channel (X1), and two EOG channels (O1-A2 and LOC-A2) as input. Their SSNet model achieved accuracies of 94.64% and 91.22% for three- and five-sleep-stage classification on the Sleep-edfx dataset, respectively. On the ISRUC-Sleep dataset, the SSNet model reported accuracies of 94.34% and 90.98% for three and five sleep stages, respectively. Satapathy et al. [99] utilised EEG (C3-A2), EMG (X1), and EOG (ROC-A2) signals as input, and passed them to a 1D-CNN model consisting of nine layers. Their model demonstrated high accuracy for the classification of three and five sleep stages of subgroup 1 of the ISRUC-Sleep dataset, with reported accuracies of 98.61% and 98.46%, respectively. Furthermore, the model was tested on subgroup 2 of the ISRUC-Sleep dataset, yielding accuracies of 98.78% and 98.46% for the classification of three and five sleep stages, respectively. Another study by Satapathy et al. [100] employed the same signals as in their previous study [99] and extracted statistical features to pass to an RF classifier. Their model classified five sleep stages on subgroups 1 and 2 of the ISRUC-Sleep dataset, achieving accuracies of 98.52% and 98.46%, respectively. Toma et al. [101] proposed a model for sleep stage classification, which aims to classify five sleep stages using features extracted from four distinct channel signals, namely, EEG (Fpz-Cz, Pz-Oz), EOG, and EMG signals obtained from PSG recording. The model architecture consists of two key building blocks: the "Conv Block" and the "Bi-LSTM Block". The Conv Block includes two consecutive 1D convolutional layers, a max-pooling layer, and a dropout layer for extracting spatial features from the input signals. On the other hand, the Bi-LSTM Block comprises a Bi-LSTM layer, a max-pooling layer, and a dropout layer to capture and learn temporal correlations in the data. By concatenating the outputs of these dual-channel convolutional Bi-LSTM network modules, the model classifies the five sleep stages; the study reported an accuracy of 91.44%.
Later, Pei et al. [102] proposed a hybrid model that combined multiple signals, including EEG (C4-A1), EOG (EOGL and EOGR), and EMG signals. They fed the signals to a model architecture that consisted of seven layers of 1D-CNN and GRU. They tested their model on the SHHS dataset, utilising 717,883 segments, and their model achieved an accuracy of 83.15%. Huang et al. [103] proposed a DeConvolution- and Self-Attention-based Model (DCSAM) as a novel approach for the classification of five sleep stages. DCSAM has the capability to reverse the feature map of a hidden layer, mapping it back to the input space. The DCSAM model comprises five layers of 1D-CNN, followed by max-pooling layers. The final two layers consist of an attention layer and a fully connected layer. Their model achieved an accuracy rate of 90.26%.

Gaps in Literature
This section aims to provide a valuable resource for scholars who are seeking to gain a comprehensive understanding of the limitations within the current state of the literature. The classification of sleep stages is crucial as it helps in detecting sleep disorders, which can have significant implications for life-threatening conditions. The literature review indicates that existing frameworks for sleep stage classification encounter one or more of the following limitations.

Testing of Multiple Datasets
Confidence in the reliability and generalisability of a proposed model can be increased by evaluating it on multiple datasets collected with diverse recording equipment and laboratory practices. Such evaluation provides evidence of the model's robustness and effectiveness in handling various scenarios [106]. A considerable amount of the existing literature assesses models using a single dataset. However, a growing number of studies [60,67,96,97,99,100] have tested their models across multiple datasets with variations in data collection environments. Results from these studies highlight the critical role that diversity in datasets and data collection methods plays in improving a model's robustness.

Splitting Strategy
The splitting strategy used to divide a dataset into training, validation, and testing sets can impact a model's performance in the classification of sleep stages [107]. Many studies have employed random allocation, where data in the dataset are randomly divided into fixed percentages of training, validation, and testing, such as 70% training, 15% validation, and 15% testing, or 90% training and 10% testing. Alternatively, a cross-validation approach has been utilised to calculate the average accuracy over the entire dataset. Another factor that can significantly influence model performance is the choice between a subject-wise and a non-subject-wise strategy. In a subject-wise approach, the training and testing sets do not include the same subjects, so the reported performance reflects how well the model generalises to unseen individuals. Conversely, in a non-subject-wise approach, data from the same subject can appear in both the training and testing sets; the model then recognises similar patterns in both, and its reported performance may not reflect its ability to generalise [108]. These considerations highlight the importance of carefully selecting the splitting strategy, including the choice between subject-wise and non-subject-wise approaches, to ensure an accurate, reliable classification of sleep stages.
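A subject-wise split as described above can be sketched as follows. This is an illustrative example under stated assumptions, not a procedure taken from any reviewed study: the function `subject_wise_split`, the subject identifiers, the 20% test fraction, and the fixed seed are all hypothetical.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.2, seed=0):
    """Split epoch indices so that no subject appears in both sets."""
    # Shuffle the unique subjects, not the individual epochs.
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Hold out whole subjects for testing (at least one).
    n_test = max(1, round(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical recording set: 5 subjects with 4 epochs each.
subject_ids = [s for s in ["S1", "S2", "S3", "S4", "S5"] for _ in range(4)]
train_idx, test_idx = subject_wise_split(subject_ids, test_fraction=0.2)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
```

By contrast, shuffling the epoch indices directly would produce a non-subject-wise split in which epochs from one subject can land on both sides.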

Computational Complexity
Computational complexity is mainly associated with training and deploying deep learning models. Deep neural networks often have a large number of parameters, leading to increased computational demands and longer training times [109]. Researchers have tackled this problem by prioritising the development of low-parameter deep learning models. Nevertheless, there remains a need for new, efficient models that are less computationally complex while maintaining high performance standards.
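To make the link between architecture and parameter count concrete, the sketch below counts the trainable parameters of a small stack of 1D convolutional layers followed by a dense classifier. The layer sizes are hypothetical and do not correspond to any reviewed model; the formulas are the standard ones for convolutional and fully connected layers with bias terms.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights of a 1D convolution: one kernel per (input, output) channel pair."""
    return out_channels * (in_channels * kernel_size + (1 if bias else 0))


def dense_params(in_features, out_features, bias=True):
    """Weights of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))


# Hypothetical network: 3 input channels (e.g. EEG, EOG, EMG), 5 sleep stages.
conv_layers = [
    (3, 16, 7),   # (in_channels, out_channels, kernel_size)
    (16, 32, 5),
    (32, 64, 3),
]

total = sum(conv1d_params(i, o, k) for i, o, k in conv_layers)
total += dense_params(64, 5)  # classifier applied after global pooling (assumed)
```

Doubling the channel width of every layer roughly quadruples the convolutional parameter count, which is one reason low-parameter designs focus on narrow layers and small kernels.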

Imbalanced Dataset
The classification of sleep stages is hindered by imbalanced datasets. Sleep stage classification involves training machine learning models to accurately identify sleep stages based on physiological signals. However, imbalanced datasets arise due to the uneven distribution of samples across sleep stages. Sleep time is predominantly spent in the N2 stage, while other stages, such as N1, N3, and REM, are comparatively less frequent. This inherent class imbalance leads to a bias in the model's performance, with a tendency to favour dominant classes. Consequently, minority classes such as N1, N3, and REM sleep stages may be poorly classified [110]. The insufficient representation of these under-represented classes makes it challenging for models to learn their distinctive characteristics.
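One common way to counteract this bias, shown here only as an illustrative sketch, is to weight each class inversely to its frequency when computing the training loss. The label distribution below is hypothetical, and the weighting formula (number of samples divided by the number of classes times the class count) is a widely used "balanced" heuristic rather than a method attributed to any reviewed study.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


# Hypothetical distribution of 100 epochs, dominated by N2.
labels = ["W"] * 15 + ["N1"] * 5 + ["N2"] * 50 + ["N3"] * 10 + ["REM"] * 20
weights = inverse_frequency_weights(labels)
```

With these assumed counts, an N1 epoch contributes ten times as much to the loss as an N2 epoch, so errors on the rare stage are no longer drowned out by the dominant one.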

Scarcity of Studies Using a Combination of Signals for Sleep Stage Classification
In this study, we identified only 38 out of 1427 studies that utilised a combination of signals and machine learning models. This finding underscores the prevalent focus of researchers on utilising single-channel EEG signals with machine learning. The utilisation of multiple signals available from PSG is reported in the identified studies to provide additional features that aid in accurately classifying sleep stages. Hence, we suggest that future studies explore the use of combined physiological signals to enhance the accuracy of sleep stage classification.

Discussion
This systematic review investigated the effectiveness of using machine learning on a combination of physiological signals in sleep stage classification. The combination of signals from multiple physiological sources has gained attention, as it has been found to be a promising approach for enhancing the accuracy and reliability of sleep stage classification. By leveraging complementary information captured by signals, researchers aim to improve the overall performance of sleep stage classification models. The studies included in the literature review were characterised based on the type and number of physiological signals used, the classification models employed, and the accuracy achieved. Figure 6 presents the distribution of the total number of studies that utilised either multiple channels of a single type of physiological signal or a combination of signal types for the classification of sleep stages. The selection of a signal's channels is critical for the performance of the sleep stage classification model. Typically, multiple signal channels are used instead of a single channel. However, utilising additional channels can increase the costs associated with the recording configuration and impose a greater computational complexity on machine learning models. Thus, selecting channels that balance accuracy and efficiency is an important research area. For example, Cui et al. [93] observed that increasing the number of channels correlated with enhanced model performance. Chambon et al. [95] demonstrated that their research using a set of six EEG channels produced results comparable to those obtained with a larger set of 20 EEG channels. Sharma et al. [97] conducted a comprehensive investigation that explored 15 signal configurations. Notably, among this array of combinations, the one that incorporated the specific set of five channels, as proposed in their research, consistently demonstrated superior performance in the classification of sleep stages. Furthermore, Sekkal et al. [75] comprehensively compared combinations of signals with a single-channel EEG. The study found that when specific classifiers were used, there was only a small decrease in accuracy, even when using a single EEG channel or different signal combinations. This suggests that the choice of classifiers plays a significant role in maintaining the accuracy of sleep stage classification. Investigations by Almutairi et al. [67] and Dequidt et al. [62] exploring the number of channels in signal combinations revealed the potential for significant improvements in model results by utilising all available channels from the dataset used in their proposed models.
It is also observed from the literature that the majority of studies on the classification of sleep stages used feature extraction methods as a pivotal step in their data processing pipelines. Figure 7 presents the distribution of studies utilising different feature extraction methods and raw signals as inputs to machine learning models. A total of 27 studies have chosen to employ feature extraction techniques to extract crucial information or features from raw data. These methods are designed to condense and represent underlying patterns in a more informative manner [111]. In parallel, an alternative approach has been embraced by 14 studies, wherein they directly utilise raw, unprocessed data as input for their classification models. This distinction highlights the variety of methodologies used within this research domain, where some researchers prioritise feature engineering, while others take advantage of deep learning and sophisticated machine learning techniques with raw data input [111].
The most frequently used open-source datasets, namely ISRUC-Sleep, Sleep-edf, Sleep-edfx, and MASS (Figure 8), stand out due to their availability and diversity. In contrast, the usage of the PhysioNet Challenge 2018 and the MIT-BIH, UCD, and SHHS datasets is comparatively low due to restrictive access and small size. However, when comparing the performances of machine learning models across studies, a significant challenge arises due to variations in datasets and sample sizes. These dataset differences, including data source, diversity, and size, introduce confounding factors that complicate direct model comparisons. A model trained on a small, specialised dataset may excel within that context but might not generalise to another dataset with distinct characteristics [112]. Therefore, it is essential to recognise that identifying the 'best' model is highly context dependent, and meaningful comparisons necessitate careful consideration of the data and sample sizes underlying each model's evaluation.
In our review, we found that 14 studies proposed models based on 1D-CNN, and 6 studies proposed models based on 2D-CNN. These 1D-CNN models have the advantage of being computationally less complex than 2D-CNN models. In addition, 2D-CNN-based models require input signals to be converted from 1D to 2D. This conversion process must be carefully handled to prevent the potential loss of important information [113]. Therefore, 1D-CNN models are well suited for real-time applications, such as home-based sleep stage classification.

Conclusions
This systematic review targets studies that employ machine learning techniques for sleep stage classification using combined multichannel physiological signals. These studies utilise signal combinations to enhance classification accuracy, with EEG, EMG, and EOG signals being the most frequently used inputs for machine learning models. Most reviewed studies proposed a variety of machine learning models for both three- and five-sleep-stage classifications. Additionally, a prevailing preference was observed for feature engineering methods over raw data utilisation. Furthermore, the review highlights an emerging trend that underscores the potential benefits of leveraging combined signals and deep learning algorithms to achieve improved sleep stage classification. This trend represents a promising direction for future research and application in the field of sleep medicine.
To further advance sleep stage classification, future studies are recommended to consider additional metrics such as specificity, sensitivity, F1 score, and kappa for evaluating ML models. These metrics are especially beneficial when dealing with imbalanced datasets. Moreover, researchers are encouraged to evaluate model performance using both subject-wise and non-subject-wise evaluation approaches. This comparative analysis will yield valuable insights into the generalisability and effectiveness of the models across diverse data distributions. In the context of addressing imbalanced datasets, future research should also consider implementing data augmentation techniques to improve models' performance. Class imbalance difficulties can be overcome by creating synthetic samples from the minority class through data augmentation, thereby enhancing classification accuracy. Future research in the domain of sleep stage classification must prioritise the investigation of the most effective combination of physiological channels required for accurate and efficient classification.
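The recommended metrics can all be derived from a confusion matrix. The sketch below is a minimal illustration with a hypothetical three-class matrix (wake, NREM, REM); the function names and the counts are assumptions, and the formulas are the standard definitions of sensitivity, specificity, F1 score, and Cohen's kappa.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of epochs with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # missed epochs of class k
        fp = sum(cm[i][k] for i in range(n)) - tp     # false alarms for class k
        tn = total - tp - fn - fp
        results.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        })
    return results


def cohen_kappa(cm):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[k][k] for k in range(n)) / total
    p_expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical confusion matrix; rows are true classes (wake, NREM, REM).
cm = [
    [8, 2, 0],
    [1, 70, 4],
    [1, 4, 10],
]
accuracy = sum(cm[k][k] for k in range(3)) / sum(sum(r) for r in cm)
metrics = per_class_metrics(cm)
kappa = cohen_kappa(cm)
```

In this assumed example the overall accuracy is 0.88, yet only two thirds of the REM epochs are detected, which is the kind of discrepancy that per-class metrics and kappa expose on imbalanced data.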

Figure 1. Comprehensive search and selection process for systematic review.

Figure 2. Conceptual framework for the classification of sleep stages.

Figure 3. Samples of EEG patterns in five sleep stages from the Sleep-edfx dataset [36].
Few studies have explored combining EEG and EMG signals to enhance the accuracy of sleep stage classification. Tautan et al. [64] used a combination of one channel of EEG (F3-M2) and one channel of EMG signals as input. They extracted statistical and FFT features from the raw data to pass to RF and MLP classifiers to classify five sleep stages, achieving accuracies of 88.65% and 66.70%, respectively. Akin et al. [65] proposed a machine learning model that used one channel of EMG and one channel of EEG signals (C3-A2) as input, and applied a wavelet transform (WT) to extract features from the signals. The deep neural network (DNN) model they developed achieved a 98% accuracy in classifying five sleep stages. Kim et al. [66] used a temporal decomposition method to extract features from one channel of EEG signal (Fpz-Cz) and one channel of EMG signal, and used SVM as a classifier to classify five sleep stages, achieving an accuracy of 93.8%. Almutairi et al. [67] selected multichannel EEG signals (Fpz-Cz and Pz-Cz) and one channel of EMG signal as input, and passed them through a deep learning model named SSNet, which contains two deep learning architectures. The first architecture contained five 1D-CNN layers, and the second architecture contained two LSTM layers. The features extracted from the two architectures were combined and passed to a fully connected layer to classify three sleep stages, achieving an accuracy of 95.46%.

Figure 4. Samples of EMG patterns in five sleep stages from the Sleep-edfx dataset [36]. Muscle activity exhibits a gradual reduction from the wake (W) stage to the REM (rapid eye movement) stage.

Figure 5. Samples of EOG patterns in five sleep stages from the Sleep-edfx dataset [36]. Wake (W) shows frequent eye movements, the NREM stages display sporadic eye movements and unique patterns, and the REM stage exhibits rapid distinct eye movements.
The characteristics of EOG signals have been utilised in proposing models for classifying sleep stages. Estrada et al. [71] used a feature engineering method with fuzzy rules for the classification. Two channels of EOG and EMG signals were combined and passed to an FFT method to extract features. Fuzzy rules were used to predict the final results for the classification of five sleep stages. The study did not report the performance of their proposed model. Yildirim et al. [72] proposed a deep learning model to extract features from a combination of EEG and EOG signals. They selected one channel of EEG signal (Fpz-Cz) and one channel of EOG signal (horizontal). These signals were passed to a 1D-CNN model. Their architecture consisted of two layers of 1D-CNN and max-pooling layers, and the order of these layers was repeated five times. The final layers were two fully connected layers to classify the segments into three and five sleep stages. They tested their model on two datasets (Sleep-edf and Sleep-edfx). The model achieved an accuracy of 94.64% for three sleep stages and 91.22% for five sleep stages on the Sleep-edf dataset. Similarly, the model achieved an accuracy of 94.34% for three sleep stages and 90.98% for five stages on the Sleep-edfx dataset. A study by Sokolovsky et al. [73] proposed a deep learning model that contained a deep network to improve the classification accuracy. Their model takes two channels of EEG signals (Fpz-Cz and Pz-Cz) and one channel of EOG signal as input. Their architecture consisted of six layers of 1D-CNN, followed by batch normalisation and max-pooling layers. After that, they added three layers of 1D-CNN, followed by a batch normalisation and max-pooling layer. This structure was repeated three times. In the end, they added two max-pooling layers, followed by a 1D-CNN layer, a max-pooling layer, a 1D-CNN layer, and two fully connected layers. Their model achieved an accuracy of 81% for the classification of five sleep stages. Phan et al. [74] used two different datasets (Sleep-edf and MASS) for evaluating their proposed model. They selected a combination of one channel of

Figure 6. Distribution of studies using multiple channels of a single type of physiological signal or a combination of different types of physiological signals for the classification of sleep stages.
Most of the reviewed studies utilised a combination of EEG + EOG + EMG signals for sleep stage classification, as these signals provided more accurate discrimination between sleep stages. They capture both brain activity and eye movement patterns that characterise each sleep stage. Additionally, incorporating EMG signals provides valuable information about muscle activity and helps differentiate sleep stages with varying muscle tone [67].

Figure 7. Distribution of studies that used feature extraction methods or raw data as input to machine learning models.

Figure 8 illustrates the distribution of sleep datasets used in studies on sleep stage classification with ML techniques. The most frequently used open-source datasets in sleep research are ISRUC-Sleep, Sleep-edf, Sleep-edfx, and MASS.

Figure 8. Distribution of the utilisation of each sleep dataset in studies employing ML techniques for sleep stage classification.

Figure 9 illustrates the ML models proposed in the literature to classify sleep stages. CNN-based architectures are the most popular models for classifying sleep stages.

Figure 9. Distribution of different machine learning models for the classification of sleep stages.

Table 1. Distinct frequency ranges of EEG signals corresponding to each sleep stage.

Table 2 lists studies that used two or more physiological signals for sleep stage classification. Few studies of sleep stage classification models have been developed utilising multiple channels of EEG signals as inputs. Blanco et al.

Table 2. Selected studies that used two or more physiological signals for sleep stage classification.