Distributional Representation of Cyclic Alternating Patterns for A-Phase Classification in Sleep EEG

Abstract: This article describes a detailed methodology for the A-phase classification of the cyclic alternating patterns (CAPs) present in sleep electroencephalography (EEG). CAPs are a valuable EEG marker of sleep instability and represent an important pattern with which to analyze additional characteristics of sleep processes, and A-phase manifestations have been linked to specific conditions. CAP phase detection and classification are not commonly carried out routinely due to the time and attention the problem requires (and, if present, CAP labels are user-dependent, visually evaluated, and hand-made); thus, an automatic tool to solve the CAP classification problem is presented. The classification experiments were carried out using a distributional representation of the EEG data obtained from the CAP Sleep Database. For this purpose, data symbolization was performed using the one-dimensional symbolic aggregate approximation (1d-SAX), followed by the vectorization of the symbolic data with a trained Doc2Vec model and a final classification with ten classic machine learning models under two separate validation strategies. The best results were obtained using a support vector classifier with a radial basis kernel. For hold-out validation, the best F1 score was 0.7651; for stratified 10-fold cross-validation, the best F1 score was 0.7611 ± 0.0133. This illustrates that the proposed methodology is suitable for CAP classification.


Introduction
Sleep is a reversible behavioral state characterized by unresponsiveness and perceptual disengagement from the environment [1]. Sleep is studied systematically through polysomnography (PSG), in which electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiography (ECG), and respiratory signals (such as pulse oximetry, airflow, and respiratory effort) are recorded to help experts understand the physiological sleep processes and evaluate the underlying causes of diverse sleep disturbances.
The EEG signals extracted from PSG, also named sleep EEG, contain valuable information about the structure of sleep that is organized and analyzed through its macro and micro-structure.
The so-called sleep macro-structure is defined as the basic structural organization of normal sleep, which is presented in cycles and is classified into two main types.

• Rapid eye-movement (REM) sleep: This is also named active sleep after the presence of rapid eye movements and the possibility of having dreams.
• Non-rapid eye-movement (NREM) sleep: This is also named inactive sleep and is subdivided into the stages N1, N2, and N3, each meaning a more profound (i.e., less responsive to the environment) sleep state.
On the other hand, sleep micro-structure is defined by the quantification of the graphic elements present in the different sleep stages. The list of these graphic elements includes sleep spindles, slow-wave activities, sharp waves, arousals, and cyclic alternating patterns [2]. It is important to state that the mentioned graphic elements are usually short-length, and, if present, their labels are user-dependent, visually evaluated, and hand-made by sleep experts, which may lead to human mistakes.
Besides being inherently time- and effort-consuming, sleep descriptors are typically evaluated using vastly different signal processing (or visual) techniques. This makes it more challenging to combine the findings into a consistent description of the sleep processes at both the macro and micro levels.

Cyclic Alternating Pattern
The object of this study is the cyclic alternating pattern (CAP), defined by an alternating sequence of two characteristic EEG patterns, each lasting between 2 and 60 seconds [3]. These patterns are the A-phase, which is composed of lumps of sleep phasic events, and the B-phase, which is simply the return to the background EEG.
The sequence of A-B phases is also called a CAP cycle. Yet, there is another sequence composed of the same EEG patterns. This is the CAP sequence, which contains at least two complete CAP cycles in succession. The minimum content of a CAP sequence is, therefore, A-B-A-B. Examples of CAP cycles and sequences are shown in Figure 1. Furthermore, A-phase activities are classified according to the proportion of EEG synchrony (high-voltage slow waves) and EEG desynchrony (low-voltage fast rhythms) presented throughout the A-phase. The three A-phase types are:

• A1: EEG activity is dominated by EEG synchrony and, if present, EEG desynchrony occupies less than 20% of the complete A-phase duration;
• A2: EEG activity is a combination of slow and fast rhythms, with EEG desynchrony occupying between 20% and 50% of the entire A-phase;
• A3: Fast, low-amplitude rhythms dominate EEG activity, and strictly more than 50% of the A-phase is occupied by EEG desynchrony.

Terzano et al. (2001) [4] described the detailed rules to score the CAP since it is considered a valuable EEG marker of unstable sleep. The CAP rate is an interesting CAP-related parameter that can be computed by following these rules.
In the same direction, manifestations of CAP phases have been linked to some specific phenomena. For example, the A-phase represents a favorable condition for the onset of nocturnal motor seizures in generalized and local (frontotemporal) epilepsy [9], bruxism [12], and periodic leg movements [13].
On the other hand, the B-phase appears to be chronologically related to inhibitory events in nocturnal myoclonus in epileptic patients [5,9].
Additionally, CAP is a valuable parameter in the research of sleep disorders across all life stages since it can be detected in both child and adult sleep [14]. Thus, the detection and classification of CAPs are meaningful to understanding some physiological and pathological sleep processes.
However, CAP detection and classification are commonly not carried out routinely due to the time and attention this problem requires.An automatic tool to solve this task would be helpful to massively analyze CAPs and possibly unravel new interesting relations between CAPs and different health conditions.

Hypothesis
Even though there are numerous investigations on the automatic detection and classification of a CAP and its sub-types (as discussed in Section 2), this research focuses on the implementation of natural language processing (NLP) and pattern recognition techniques on the same problem.
The main reason for exploring these techniques is to (i) reduce the complexity of the problem by representing the time-series data as shorter sequences of symbols and (ii) try to obtain interpretable results by applying NLP techniques such as word embeddings.
Word embeddings are a vectorial representation of words for text analysis, as proposed in [15]. This representation is based on mapping words (included in a collection of phrases or documents) to real-number vectors and relies on the distributional hypothesis. This hypothesis was popularized by Joseph R. Firth in 1957, who stated that "words that occur in similar contexts tend to have similar meanings" [16].
By assuming that signals can be translated into symbols and then organized as "words", they can also, theoretically, be represented as word embeddings. Then, the embeddings can be analyzed with machine learning models, as carried out in NLP.
This being the case, the hypothesis for this research is that using a distributional approach (based on the distributional hypothesis) for EEG signal representation will allow for the implementation of NLP techniques to solve the CAP detection and classification problem.This will be achieved using a context-based vectorial representation of signal segments as if they were words or phrases in a text.

Literature Review
This section reviews some of the most relevant investigations for this research and is divided into two subsections. Section 2.1 describes some recent research works on time-series data mining and sleep macro-structure analysis. Section 2.2 reviews some research works on micro-structure analysis, specifically on CAP detection and classification.

Time Series Data Mining
Data mining (DM) is a set of methodologies that analyze large datasets, aiming to identify patterns and relationships that can help solve different types of problems. This is achieved by four main processes: data gathering, data preparation, data mining/processing, and a final analysis and interpretation of the results [17]. Time-series data mining (TSDM) applies these processes to sequential data, i.e., data recorded over time [18].
However, when analyzing some specific time series (such as biomedical signals), TSDM focuses on particular areas of interest, known as events, instead of evaluating the entire time series [19]. Hence, this subsection describes some of the research works on TSDM that are interesting for the present proposal. Li et al. (2012) developed a methodology to visualize variable-length time-series motifs by implementing the symbolic aggregate approximation (SAX) [20]. A grammar-based compression algorithm (greedy and heuristic) was implemented for motif detection. This methodology was performed on ECG signals, and the results demonstrated that recurrent patterns can be effectively identified with grammar induction in time series, even without prior knowledge of their lengths.
Wave2Vec is a tool for vectorizing EEG signals to predict a brain disease (alcoholic vs. non-alcoholic patients) proposed by Kim et al. (2018) [21]. This prediction was achieved by quantizing fixed-length EEG segments to one of the hexadecimal symbols of a fixed "bag-of-symbols". Subsequently, vectorization was performed with a model similar to Word2Vec. Finally, three classification models based on deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) were compared.
Grammar induction for detecting anomalies in time series was implemented by Gao et al. in 2020 [22]. This research was carried out on ECG signals transformed into symbols with SAX. The resulting symbolic sequences were analyzed through another NLP technique named grammar induction, in which a set of rules that best describes the analyzed phenomena is found. Numerosity reduction was performed to simplify the rules found by the algorithm and to finally implement a "Rule Density Function" for anomaly detection.
It is essential to mention that the previous research had objectives substantially different from those of the current research. Li et al. (2012) [20] and Gao et al. (2020) [22] used ECG signals, which are significantly more periodic than EEG. Although Kim et al. (2018) [21] implemented their tool on EEG signals, they were searching for another type of information within them (to predict alcoholism). None of the previous examples use sleep EEG or search for CAPs.
In recent years, sleep analysis, especially the sleep staging task, has also been explored [23,24].Nevertheless, the methodologies proposed to solve this task are only indirectly related to the current research, which inquires into the symbolic transformation and/or vectorization of the signals.
Sleep stage classification labels 30-second-long PSG segments, known as sleep epochs.The labels are usually visually scored and divided into deep-sleep, light-sleep, and awake or, more precisely, S1 or N1, S2 or N2, S3 or N3, and R or REM, which are used to analyze sleep macro-structure.
In this direction, Joe et al. (2022) [23] analyzed time- and frequency-domain images of EEG and EOG signals using a CNN on the Sleep-EDFx dataset [25]. They achieved results of 94% in both accuracy and F1 score. Alternatively, Zhang et al. (2023) [24] used a single channel (ECG) to analyze the sleep structure of three different datasets, obtaining accuracy values of 0.849, 0.827, and 0.868.

CAP Detection and Classification
For sleep micro-structure analysis, Rosa et al. (1999) pioneered the automatic detection of CAP sequences during sleep. They implemented feature extraction and detection with maximum likelihood and a variable-length template-matched filter [26]. The preliminary results of classifying CAP vs. non-CAP segments were achieved by using a rule-based state machine decision system on a group of four middle-aged adults.
A tool to analyze the micro and macro-structure of sleep EEG was implemented by Malinowska et al. (2006) [27].They detected deep-sleep stages and arousals, the continuous description of slow wave sleep, and the measure of spindling activity by using adaptive time-frequency approximations with the matching pursuit (MP) algorithm.
Although quantifiable results were not reported in this research, the tool is relevant because it reflects the interest in automating sleep EEG analysis as early as 2006.
More recently, Hartmann et al. (2019) [28] implemented a long short-term memory (LSTM) model to detect A-phases and classify them into three subtypes: A1, A2, and A3. From the CAP Sleep Database, they removed epochs marked as "Awake" and "REM Sleep" and worked with two patient subsets: 16 healthy patients and 30 diagnosed with nocturnal frontal-lobe epilepsy (NFLE). They achieved averaged F1 scores ranging from 57.37% to 67.66% in the different modalities of classification (A-phase vs. non-A-phase and A1 vs. A2 vs. A3 vs. B), as shown in Table 1.
A two-dimensional convolutional neural network (2D-CNN) was implemented by Arce Santana et al. (2020) [29] to detect A-phases and classify them into the A1, A2, and A3 subtypes using a different approach. They segmented nine EEG recordings into 4-second-long epochs and computed their spectrograms, which are visual representations of the frequency content of a time series. They fed the resulting images to a deep 2D-CNN and obtained mean accuracy scores of 88.09% in A-phase detection and 77.31% in A-phase classification (A1 vs. A2 vs. A3). Unfortunately, no other metric is reported, limiting the performance analysis on unbalanced data, which is the case for the A-phase subtypes.
The automated detection of CAPs and the classification of sleep stages using a deep neural network was proposed by Loh et al. (2021) [30]. Recordings from six healthy patients of the CAP Sleep Database were used to segment, standardize, and classify the sleep stages and CAP patterns with a one-dimensional convolutional neural network model (1D-CNN). Loh et al. (2021) reported F1 scores of 75.34% and 33.04% for CAP detection in a balanced and an unbalanced dataset, respectively. Overall, the summarized metrics shown in Table 1 reveal that the model performs better on the balanced dataset than on the unbalanced dataset. This is directly caused by the number of B-phase examples (87.4% of the unbalanced dataset), which increased the difficulty of identifying the A-phase examples.
A novel approach was explored by Tramonti Fantozzi et al. (2021) to automate A-phase detection (A-phase vs. non-A-phase) through local and multi-trace analysis [31]. They found that channel F4-C4 performed better than all the other analyzed channels, achieving F1 scores from 61.38% to 63.88% via local analysis on this channel. In comparison, their multi-trace approach resulted in F1 scores ranging from 64.34% to 66.78% on the different patient subsets. A total of 41 recordings from the CAP Sleep Database were analyzed in this research. Further details of the data and the methodology are shown in Table 1.
Finally, the GTransU-CAP model was designed by You et al. (2022) [32] and trained on the same data subset as [28]. This model represents an automatic labeling tool for CAPs in sleep EEG, using a gated transformer-based U-Net framework with a curriculum-learning strategy. In A-phase detection, F1 scores of 67.78% in healthy patients and 72.16% in patients with nocturnal frontal-lobe epilepsy (NFLE) were achieved. For the A-phase subtype classification, the non-weighted average F1 scores are 59.45% and 59.55% for healthy and epileptic patients, respectively.
A summary of the previous literature review is presented in Table 1, where the different methodologies and classification approaches can be identified. Additionally, a list of the metrics reported for each approach is included in the last column. The diversity of the data subsets, classification approaches, and performance measurements reported by the researchers hinders a direct comparison between them.
As described in Table 1, a diverse range of A-phase classification methodologies exists, and a key point stands out: only three investigations included the identification of A-phase subtypes (A1, A2, and A3) under two different classification strategies (one including only A-phase subtypes and the other also including B-phases). Thus, there is still a window of opportunity to solve the A-phase classification problem.
The literature review's extended results are presented in Table A1, which reveals that classifying the naturally unbalanced CAP data is not trivial. The number of B-phases compared to A-phases hinders A-phase detection [30]. Moreover, the nature of the A2 subtype (basically, a mixture of A1 and A3) hampers its correct classification [28,32].
Finally, since none of the previous works reviewed a symbolic or distributional approach to sleep micro-structure analysis, there is a chance to explore these approaches and implement new tools based on them.

Materials and Methods
In this section, the proposed methodology is described in four subsections.First, the selected database is described.Second, the process of symbolization of the EEG signals is detailed.Third, the vectorization tool is defined.Finally, the proposed classification models and the evaluation metrics are detailed.
Figure 2 shows the flowchart corresponding to the methodology implemented for (1) dataset selection, (2) symbolic transformation, and (3) the vectorization process, including the unsupervised Doc2Vec training. The final step of the first part of the methodology is (4) saving the concatenated vectorized data. The classification steps are described in Figure 3, where the proposed architectures for classification under two different validation strategies are detailed. This methodology starts with (5) loading the concatenated vectorized data, followed by (6) data splitting and (7) model training and testing under the corresponding validation strategy. Finally, (8) the evaluation metrics are computed for (9) model comparison or performance averaging.
These processes are shown in parallel for simplicity, although hold-out validation was implemented before the K-fold cross-validation in order to be able to exclusively select and analyze the best-performing model.
In contrast to the methodology described to train the Doc2Vec models (Figure 2), where unsupervised learning is implemented, supervised learning is used to train the classification models (Figure 3). For the first approach to the problem, the second-largest group of patients was selected, i.e., the 22 recordings of patients with REM sleep behavior disorder (RBD). Additionally, channels Fp2-F4, F4-C4, Fp1-F3, and F3-C3 were selected due to the presence of CAPs in the frontocentral regions of the brain [34]. Figure 4 schematizes the electrode placements for the selected channels (red and blue), following the 10-20 system [35]. An analysis of the selected data showed that the blue channels (Fp2-F4 and F4-C4) were present in 21 out of the 22 recordings, whereas the red and blue channels (Fp2-F4, F4-C4, Fp1-F3, and F3-C3) were present in only 16 out of the 22 recordings. This analysis determined the data for the experiments described in this document: 21 RBD recordings for channels Fp2-F4 and F4-C4 (Figure 2, step 1).

Data Symbolization
The first step for data symbolization was segmentation, which split the total duration of each recording into N one-second segments (each recording has a different duration in hours and, hence, a different N value). Once the 21 recordings were split into one-second segments, they were processed through one-dimensional symbolic aggregate approximation (1d-SAX); see Figure 2, step 2.
In order to better understand the 1d-SAX symbolic transformation, it is necessary to understand symbolic aggregate approximation (SAX) and piecewise aggregate approximation (PAA):

• PAA is a time-series downsampling in which the mean value of each fixed-sized segment is retained [36];
• SAX is based on PAA but performs an additional quantization of the mean value. Under the assumption that the time series follows a standard normal distribution, the quantization boundaries are computed to ensure that the symbols are assigned to each quantized mean value with equal probability [37];
• 1d-SAX is a symbolic representation of a time series based on SAX. This representation contains information about each segment's slope as well as a symbol associated with the segment's mean value. In other words, each segment is represented by an affine function with two quantized parameters: the slope and the mean value [38].
In this particular case, the mean value is represented with 1 of the 26 letters of the lowercase English alphabet (a, b, c, ..., y, z), whereas the slope is represented with an integer between 1 and 10. Hence, the number of 1d-SAX symbols created by combining a letter and an integer is 260. Examples of the final symbols are a1, b8, c4, d3, e7, f2, and so on, each encoding the specific features (mean value and slope) of its segment. The 1d-SAX symbols will be referred to as "words" from now on.
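As an illustration, the letter-plus-integer symbolization described above can be sketched in a few lines of Python. This is a simplified sketch, not the implementation used in the experiments: it assumes z-normalized segments and, unlike the original 1d-SAX formulation [38], quantizes the slope against a standard normal distribution rather than a tuned slope variance.

```python
from statistics import NormalDist
from string import ascii_lowercase

def gaussian_breakpoints(n_bins):
    """Quantile boundaries that split a standard normal into n_bins equiprobable bins."""
    nd = NormalDist()
    return [nd.inv_cdf(k / n_bins) for k in range(1, n_bins)]

MEAN_BP = gaussian_breakpoints(26)   # 26 letters for the quantized mean value
SLOPE_BP = gaussian_breakpoints(10)  # 10 integers for the quantized slope

def quantize(value, breakpoints):
    """0-based index of the bin that `value` falls into."""
    return sum(value > b for b in breakpoints)

def one_d_sax_word(segment):
    """Map one fixed-length (z-normalized) segment to a '<letter><integer>' word."""
    n = len(segment)
    mean = sum(segment) / n
    t_mean = (n - 1) / 2
    # least-squares slope of the segment against the time indices 0..n-1
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(segment)) \
            / sum((t - t_mean) ** 2 for t in range(n))
    return f"{ascii_lowercase[quantize(mean, MEAN_BP)]}{quantize(slope, SLOPE_BP) + 1}"

print(one_d_sax_word([0.0] * 8))       # flat segment around zero -> "m5"
print(one_d_sax_word(list(range(8))))  # steeply rising segment   -> "z9"
```

In practice, a maintained library implementation such as tslearn's `OneD_SymbolicAggregateApproximation` may be preferable to a hand-rolled version.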
Thus, "phrases" will refer to the sequences of "words" corresponding to the symbolic one-second segments obtained from each channel (each channel was analyzed separately). These unlabeled symbolic "phrases" were used to train the model described in Section 3.3 using unsupervised learning.
On the other hand, a structured database was created with longer "phrases" corresponding to the CAP labeled segments; for reference, the CAP A-phase duration is between 2 and 60 s.The classification experiments (see Section 3.4) were carried out using this structured database with supervised learning.

Data Vectorization
A well-known type of word representation is word embeddings, which can capture words' contextual features in low-dimensional vectors. Word embeddings became especially popular when Mikolov et al. (2013) [15] introduced Word2Vec, a group of models designed to learn and infer these vectors in a computationally efficient way.
One year later, an extension of Word2Vec was introduced by Le and Mikolov (2014) [39], named Paragraph Vector, better known as Doc2Vec.These models aim to create vector representations of sequences of words (sentences or documents) instead of individual words.
With the symbolic EEG unlabeled data, two Doc2Vec models were trained (one per selected channel) so as to learn an adequate representation of the "phrases", i.e., the sequences of "words" or sequences of 1d-SAX symbols.
Finally, the instances of the structured CAP symbolic data were processed with the trained Doc2Vec models, and the inferred vectors (two in this case, one per channel) were concatenated to build the input of the classification models. The data vectorization processes are summarized in Figure 2, step 3, and finalized with the concatenation and saving of the inferred embeddings (Figure 2, step 4).

Classification
In Figure 3, steps 5 to 7, classification was performed using 10 classic machine learning models with their default parameters (unless specified otherwise), including the following:

1. Support vector classifier with a linear kernel;
3. Support vector classifier with a radial basis kernel;
4. Support vector classifier with a radial basis kernel and a balanced approach that automatically weights classes in inverse proportion to the number of occurrences of each class in the training data;
5. Decision tree classifier with a maximum depth set to 5;
6. Random forest classifier with a maximum depth set to 5 and the number of estimators fixed at 10;
7. Multi-layer perceptron classifier with a learning rate of 0.1 and the maximum number of iterations set to 1000;
8. Gaussian naive Bayes classifier;
10. XGBoost classifier, with a maximum depth set to 5, the number of estimators fixed at 10, and a learning rate of 0.1.
This list of classifiers was validated using hold-out validation, with 80% of the data used for training and the remaining 20% used for testing, to obtain a general scope of how the different classification models performed.
Considering that hold-out validation represents an adequate opportunity to evaluate a group of different models if the training and testing processes are performed under the same initial conditions (i.e., if the training and testing splits do not change), the 10 classifiers were trained on the same training set and then evaluated with the same testing set.
Based on the hold-out validation results, the best-performing model was selected and then evaluated under a stratified K-fold cross-validation strategy. The K value for the stratified K-fold cross-validation was fixed at 10. This means the fitting procedure was performed ten times, each with a training set of 90% of the data and a testing set of the remaining 10%. Additionally, this validation strategy ensures that the testing instances are not repeated between folds and that the training and testing sets preserve the same label proportion as the original input data.
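The two validation strategies can be sketched with scikit-learn. The data below is synthetic (a make_classification stand-in for the concatenated embeddings), and the class proportions and dataset size are only rough assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 100-dimensional concatenated Doc2Vec embeddings,
# with three imbalanced classes playing the role of the A1/A2/A3 subtypes.
X, y = make_classification(n_samples=600, n_features=100, n_informative=20,
                           n_classes=3, n_clusters_per_class=1,
                           weights=[0.55, 0.27, 0.18], random_state=0)

# Balanced RBF-kernel SVC: class weights inversely proportional to class frequency.
clf = SVC(kernel="rbf", class_weight="balanced")

# Hold-out validation: a single stratified 80/20 split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
clf.fit(X_tr, y_tr)

# Stratified 10-fold cross-validation: 90/10 splits with preserved label proportions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
print(f"{scores.mean():.4f} +/- {scores.std():.4f}")
```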

Evaluation
The metrics implemented to evaluate the performance of the described classification task (Figure 3, step 8) include Accuracy, Precision, Recall, and F1 Score, all derived from the confusion matrix shown in Table 2 [40].
• Precision refers to the degree of dispersion of the results; the less dispersed, the greater the precision. From Table 2, the precision is

Precision = TP / (TP + FP);

• Recall calculates the proportion of actual positives correctly labeled. Recall is also known as the true positive rate (TPR) or sensitivity. From Table 2, this measure is calculated by

Recall = TP / (TP + FN);

• The F1 Score is defined as the harmonic mean of precision and recall and is commonly used in the information retrieval field. Based on Table 2, the F1 Score is

F1 Score = 2 × (Precision × Recall) / (Precision + Recall).

Finally, the models' performances were compared or averaged, depending on the followed validation strategy (Figure 3, step 9).
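A minimal worked example of these formulas, using hypothetical confusion counts (not taken from the paper's results):

```python
def precision(tp, fp):
    """Proportion of predicted positives that are actual positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of actual positives that are correctly labeled (TPR)."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical one-vs-rest counts for a single A-phase subtype
tp, fp, fn = 80, 20, 30
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 4), round(r, 4), round(f1_score(p, r), 4))  # 0.8 0.7273 0.7619
```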

Results
As stated before, the classification process was carried out using hold-out validation and stratified 10-fold cross-validation. Therefore, the results are presented and analyzed in the same order.
However, the first relevant result of this research was found by exploring the selected data, which comprised channels Fp2-F4 and F4-C4 from the 21 RBD recordings.
Figure 5 shows a one-second segment of this data, whereas Table 3 shows the number of instances in the selected data per class, where the data is notoriously imbalanced. The most commonly used metric for describing the extent of imbalance of a dataset (i.e., how imbalanced a dataset is) is the imbalance ratio (IR) [40]. The IR is defined by IR = N_maj / N_min, where N_maj is the number of examples in the majority class and N_min is the number of instances in the minority class. Hence, the IR for the selected data is 2.6834.
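The IR computation itself is a one-liner; the class counts below are hypothetical (the actual counts are in Table 3), chosen only so that the ratio matches the reported 2.6834:

```python
# Hypothetical per-class instance counts (the real counts are in Table 3).
counts = {"A1": 26834, "A2": 14500, "A3": 10000}

n_maj = max(counts.values())  # majority class size
n_min = min(counts.values())  # minority class size
ir = n_maj / n_min
print(ir)  # 2.6834
```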
Although several strategies to handle imbalanced data were considered, we decided to first evaluate how the proposed algorithms performed on the raw data. Thus, the data went on to the next steps without further preprocessing.

Classification Using Hold-Out Validation
For the first group of experiments, the 10 classifiers with their default parameters were implemented under a hold-out validation strategy. These experiments aimed to find the most suitable classifier for the task. In addition, different embedding sizes, i.e., Doc2Vec (from now on, D2V) output dimensions, were analyzed.
The results corresponding to D2V output dimensions fixed at 50, 100, 300, and 500 with the 10 classifiers are presented in Tables 4-7. These tables highlight the best results for each case and each computed metric, although the F1 score is the most relevant for unbalanced tasks [40].

Classification Using Stratified K-Fold Cross-Validation
The best classifier of the previous group of experiments (the balanced RBF SVC) was implemented under a stratified 10-fold cross-validation strategy for the second group of experiments. These complementary experiments aimed to analyze how the classifier responds to reduced input vectors, with the implementation of principal component analysis (PCA) for dimensionality reduction. Table 9 shows the results obtained for the D2V output dimension fixed at 25, 50, 100, 300, and 500 without PCA. Additionally, the results corresponding to a D2V output dimension fixed at 25, 50, 100, 300, and 500 with different numbers of features kept after PCA (10, 50, and 100) are shown in Tables 10-12. As in the first group of results, Tables 9-12 highlight the best results for each case and each computed metric, although the F1 score is the most relevant for unbalanced tasks like this one.
A summary of the best classification results using the stratified 10-fold cross-validation strategy is shown in Table 13. When the F1 score is used as the criterion for selecting the best result (in this case, 0.7611 ± 0.0133), the conclusion is that a D2V output dimension of 50 and a classifier input dimension of 100 without PCA is the most suitable configuration for this experiment. However, it is interesting that the result obtained with the same D2V output dimension (50) using PCA and keeping 100 features performs almost equally to the aforementioned best result (F1 score of 0.7609 ± 0.0137). Note that, in this case, PCA keeps the same number of features (100) as the configuration without PCA; yet, it does not surpass the best overall result obtained with the same parameters without PCA.
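A sketch of the PCA-reduced variant with scikit-learn follows; placing PCA inside a pipeline ensures it is refit on each training fold, avoiding information leakage into the test folds. The data is synthetic and the dimensions are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the 100-dimensional concatenated embeddings, three classes.
X, y = make_classification(n_samples=500, n_features=100, n_informative=15,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Keep 10 principal components, then classify with the balanced RBF SVC.
pipe = make_pipeline(PCA(n_components=10),
                     SVC(kernel="rbf", class_weight="balanced"))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro")
print(f"{scores.mean():.4f} +/- {scores.std():.4f}")
```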

Discussion
As stated in Section 2, there are numerous pieces of research on automatic CAP detection and classification; some are oriented toward implementing different feature extraction techniques in combination with classification models, while others are oriented toward implementing deep learning models without manually extracted features. It is noteworthy that the latter have gained popularity in recent years.
The diversity of methodologies and classification approaches indicated in Table 1 has two opposite effects: it allows for exploring new methodologies to solve the CAP detection and classification problem while hindering their direct comparison.
Nevertheless, the proposed methodology for a distributional representation of CAP A-phases is a valuable approach since it reduces the complexity of the problem by transforming the sleep EEG data into shorter symbolic strings.Moreover, the proposed methodology allows for the use of powerful NLP tools that are well-known for their versatility and outstanding results.
This proposal has certain particularities that make it unique. It translates the EEG signals into sequences of symbols differently from the work of Li et al. (2012) [20] and Gao et al. (2020) [22]: instead of plain SAX, the 1d-SAX transformation was implemented. The reason for this decision was to reduce the problem's complexity by representing the time-series data as shorter sequences of symbols.
Then, a method for vectorization was implemented, similar to Kim et al. (2018) [21], yet this process was carried out with a Doc2Vec model instead of using Wave2Vec or Word2Vec, which have comparable methodologies.
Finally, the distributional representation of the data allows for classification through less complex algorithms than the ones used by Loh et al. (2021) [30] and You et al. (2022) [32].
A summary of the results obtained using the SVC classifier with an RBF kernel under the two validation strategies is included in Table A1.The extended results of the literature review are also included for comparison.It is important to note that only the results highlighted in gray focus on the A-phase subtype classification (A1, A2, and A3).
Table A1 shows that the accuracy scores of approaches such as [32] are high; however, the high accuracy may be due to the consideration of the B-phase in those approaches, as reported by the authors themselves. This class vastly exceeds the A-phase segments in the number of instances (and, therefore, in the number of correctly labeled instances).
In the case of Arce Santana et al. (2020) [29], where only the A-phase segments were considered, the performance analysis is limited, since only the averaged accuracy was reported. It would be interesting to analyze the F1 Score, for example, which better measures classification performance on unbalanced data, as is the case for the A-phase subtypes.
Nevertheless, when considering the F1 Score (the harmonic mean of precision and recall) as the metric of interest in unbalanced data tasks, the results obtained with the D2V output dimension set at 50 outperform all the other approaches.
Finally, these results are encouraging since this proposal relies on an entirely new paradigm to analyze EEG data, raising the motivation to research the distributional representation of time series more profoundly.

Conclusions
A novel methodology for CAP analysis based on data symbolization, vectorization, and classification is presented.
First, a subset of 21 patients from the annotated CAP Sleep Database [4] was selected for this research, and one-second segmentation was applied to this data subset. Then, the symbolization process was performed using 1d-SAX, and the experimentation showed that it is a suitable way of transforming EEG signals into sequences of symbols.
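The one-second segmentation step amounts to slicing each channel into non-overlapping windows of one sampling-frequency's worth of samples. A minimal sketch follows; the 128 Hz sampling frequency is illustrative (the database's recordings range from 50 to 512 Hz).

```python
import numpy as np

def segment_one_second(signal, fs):
    """Split a 1-D EEG channel into non-overlapping one-second windows,
    dropping any trailing samples that do not fill a whole second."""
    n_full = len(signal) // fs
    return signal[: n_full * fs].reshape(n_full, fs)

# Example: 10.5 s of data at an assumed 128 Hz -> 10 one-second segments
fs = 128
signal = np.arange(int(10.5 * fs), dtype=float)
segments = segment_one_second(signal, fs)
print(segments.shape)  # (10, 128)
```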
Further vectorization was carried out using two Doc2Vec models (one per selected channel) with the PV-DM algorithm. The trained Doc2Vec models were then used to infer the vectors corresponding to the EEG segments of interest: those labeled as CAP1, CAP2, and CAP3 (according to the CAP A-phase type).
The classification metrics of the 10 implemented classic ML models have shown that increasing the Doc2Vec dimensions does not necessarily improve the classification results. Additionally, PCA dimensionality reduction improves the classification results in comparison to same-length vectors taken from the original embedding concatenation; however, it does not yield the best overall result.
The best results when using hold-out validation (F1 Score = 0.7651) were achieved with the support vector classifier with a radial basis kernel and class-wise balanced regularization and a Doc2Vec vector dimension of 300. The best results for stratified 10-fold cross-validation (F1 Score = 0.7611 ± 0.0133) were achieved by using a Doc2Vec vector dimension of 50, without PCA.
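The winning configuration can be sketched with scikit-learn: an RBF-kernel SVC with class-wise balanced weighting, evaluated under stratified 10-fold cross-validation. The synthetic features below stand in for the inferred Doc2Vec vectors, and all hyperparameters besides the kernel and class weighting are library defaults, not the study's tuned values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the inferred Doc2Vec vectors: three
# unbalanced classes (mimicking CAP1/CAP2/CAP3), 50 dimensions each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(n, 50))
               for c, n in zip([0.0, 1.0, 2.0], [300, 80, 120])])
y = np.repeat([0, 1, 2], [300, 80, 120])

# RBF SVC with class-wise balanced weighting, as in the text.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", class_weight="balanced"))

# Stratified 10-fold CV preserves the label distribution in each split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
print(scores.mean(), scores.std())
```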
From the analysis of the confusion matrix (Figure 6), it can be concluded that the CAP1 and CAP3 classes are more easily differentiated from each other, whereas CAP2 is easily misclassified as CAP1. This is also supported by the number of instances present in each class (Table 3).
Finally, based on the vector representation's distribution, it can be stated that this problem is not fully solved, and more experimentation and research in this field are needed.

Future Work
Based on the results obtained from this research, the authors propose, firstly, that an automatic detector of micro-events should be implemented: for example, a windowing system that automatically searches for CAP events throughout the signal, or a grammar-induction-based algorithm that finds repeated patterns, as suggested in [20].
By accurately finding the A-phase start and end points, we might also address the CAP identification problem. Consequently, the complementing stages of CAP identification (A-phase vs. Non-A-phase) and A-phase classification (A1 vs. A2 vs. A3) would be a solid basis with which to develop a numerical tool that helps sleep experts consistently identify CAP patterns and automatically measure other CAP-related parameters, including the CAP rate.
Furthermore, by adding other types of patients to the study subset, such as healthy and NFLE patients, the models would be trained on a larger dataset, which may improve their performance. Moreover, this would make the research directly comparable to other works that have used different subsets of patients.
Finally, experimenting with Word2Vec instead of Doc2Vec for EEG vectorization, combined with a sequential classification model (such as a long short-term memory network), is a compelling proposal. This approach would also make the vectorization process more interpretable, since Word2Vec is less complex than Doc2Vec, and it would continue the proposed line of research on distributional representations of CAP.

Figure 1. Cyclic alternating pattern, with its A-phases (red, blue, and green) and B-phases (gray). Note the difference between a CAP cycle (phases A-B) and a CAP sequence (consisting of a minimum of phases A-B-A-B).

Figure 2. Proposed methodology for the data selection, symbolization, and vectorization processes. Note that step 3 implements unsupervised learning to train the Doc2Vec models.

Figure 3. Proposed classification methodology, implementing supervised learning under two different validation strategies: first, hold-out validation and, second, stratified 10-fold cross-validation.

3.1. CAP Sleep Database
The CAP research team led by Terzano et al. (2001) [4] released the CAP Sleep Database on PhysioNet in 2012 [33]. It consists of 108 PSG recordings registered by the Sleep Disorders Center of the Maggiore Hospital in Parma, Italy. Of the 108 PSGs, 16 correspond to recordings of healthy patients, and 92 correspond to pathological recordings of patients diagnosed with:
• Nocturnal Frontal Lobe Epilepsy (NFLE), 40 recordings;
• REM Behaviour Disorder (RBD), 22 recordings;
• Periodic Leg Movement (PLM), 10 recordings.
The recordings contain at least three EEG signals with complementary EOG, chin and tibial EMG, airflow, respiratory effort, pulse oximetry, and ECG signals. In addition, they include sleep-stage annotations and CAP labels (CAP1, CAP2, and CAP3, corresponding to the three A-phase types). Each recording has an average duration of 9 hours (a complete night of sleep) with different sampling frequencies (from 50 to 512 Hz). The annotations of interest in this research, i.e., CAP1, CAP2, and CAP3, have an average total duration of 16 minutes, accounting for approximately only 3% of each complete recording.

• To verify that the selected model's results were independent of the training set used during hold-out validation;
• To keep the original label distribution in the training and testing sets in the K splits;
• To thoroughly evaluate the impact of different input vector dimensions (the inferred output of the Doc2Vec models);
• To quantitatively analyze the impact of dimensionality-reduction techniques applied to the input vectors, for example, principal component analysis (PCA), with different numbers of computed features.
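The last bullet's dimensionality-reduction step can be sketched as follows: the per-channel inferred vectors are concatenated and then reduced with PCA to a chosen number of computed features. The vector counts and dimensions below are illustrative stand-ins, not the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical inferred vectors for the two selected channels
# (F4-C4 and Fp2-F4): 200 segments x 50 dimensions each.
ch1 = rng.standard_normal((200, 50))
ch2 = rng.standard_normal((200, 50))

X = np.hstack([ch1, ch2])      # concatenated embedding: (200, 100)
pca = PCA(n_components=50)     # number of computed features is a choice
X_red = pca.fit_transform(X)   # reduced input for the classifiers
print(X.shape, X_red.shape)    # (200, 100) (200, 50)
```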

Figure 5. One-second sleep EEG example registered on the two selected channels, F4-C4 and Fp2-F4.

Figure 6. Confusion matrix corresponding to the best result obtained using hold-out validation.

Table 1. Literature review regarding CAP detection and classification.
• Accuracy is the proportion of correctly labeled instances (True Positives and True Negatives) among the total number of instances. When considering Table 1, accuracy is calculated as follows: Accuracy = (TP + TN) / (TP + FP + FN + TN).
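Both accuracy and the F1 Score used throughout the paper are readily available in scikit-learn; a toy multi-class check (labels invented, not study data) is shown below.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy three-class predictions, mimicking CAP1/CAP2/CAP3 labels (0/1/2)
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 0, 2, 2, 2, 1, 2]

acc = accuracy_score(y_true, y_pred)  # (TP + TN) / total instances
f1m = f1_score(y_true, y_pred, average="macro")  # unweighted class mean
print(acc)  # 0.7
print(round(f1m, 4))
```

Macro-averaged F1 weighs each class equally, which is why it is the preferred metric for the unbalanced A-phase subtypes.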

Table 3. Number of instances present in the selected data per class.

Table 13. Best classification results using hold-out validation, obtained with RBF SVC-balanced.