A Hybrid Feature Selection and Extraction Methods for Sleep Apnea Detection Using Bio-Signals

People with sleep apnea (SA) are at increased risk of having stroke and cardiovascular diseases. Polysomnography (PSG) is used to detect SA. This paper conducts feature selection from PSG signals and uses a support vector machine (SVM) to detect SA. To analyze SA, the Physionet Apnea Database was used to obtain various features. Electrocardiography (ECG), oxygen saturation (SaO2), airflow, abdominal, and thoracic signals were used to provide various frequency-, time-domain and non-linear features (n = 87). To analyse the significance of these features, firstly, two evaluation measures, the rank-sum method and the analysis of variance (ANOVA) were used to evaluate the significance of the features. These features were then classified according to their significance. Finally, different class feature sets were presented as inputs for an SVM classifier to detect the onset of SA. The hill-climbing feature selection algorithm and the k-fold cross-validation method were applied to evaluate each classification performance. Through the experiments, we discovered that the best feature set (including the top-five significant features) obtained the best classification performance. Furthermore, we plotted receiver operating characteristic (ROC) curves to examine the performance of the SVM, and the results showed the SVM with Linear kernel (regularization parameter = 1) outperformed other classifiers (area under curve = 95.23%, sensitivity = 94.29%, specificity = 96.17%). The results confirm that feature subsets based on multiple bio-signals have the potential to identify patients with SA. The use of a smaller subset avoids dimensionality problems and reduces the computational load.


Introduction
During sleep, three basic respiratory disturbances are found. In these types, the most common disorder is sleep apnea (SA). The closure of the upper airway is repeated and temporary, and it is defined as SA in adults. SA can cause a complete breathing cessation, and the cessation lasts more than 10 s [1]. As a consequence, the sleep of people with SA is fragmented, and SA reduces the refreshing effects of sleep [2]. SA is associated with hypertension and cardiovascular diseases. It can also cause daytime sleepiness that has secondary risks such as vehicular accidents. SA should be objectively assessed to treat it. The apnea/hypopnea index (AHI) can evaluate SA. During sleep, it is considered to be the number of apnea and hypopnea events per hour. If it is greater than five or the minimum SaO 2 value is less than 85% in adults, an AHI is defined as abnormal [3]. In general, in healthy people the AHI is less than 5. Five to fifteen is considered mild SA, and 15-30 is considered moderate SA, and greater than 30 is considered severe SA. a hidden Markov model and applied it in the combination with an SVM to detect SA based on ECG signals.
EEG can provide useful sleep respiratory disturbance information. It has become one of the most significant signals for the diagnosis of sleep respiratory disturbance [16]. Trend features in Reference [17] were obtained by the Hilbert-Huang transformation and the Band-pass filtering. These features and the duration of SA were used to confirm SA. In Reference [18], to diagnosis SA, an adaptive neuro-fuzzy inference system was utilized. Features were obtained by the discrete wavelet transformation, and the system classified the apnea and the normal events in EEG signals. The Bispectral analysis was utilized to obtain bispectral characteristics called the quadratic phase coupling, and an ANN was used as a classifier to detect SA using EEG features [19]. The discrete wavelet transformation was employed to decompose the raw EEG signals. The GreyART network is one of the non-supervised machine learning systems, and the features from the wavelet transformation were used as the input for GreyART to identify SA [20]. The results of Reference [21] showed that features extracted by the autoregressive and the classification-by-least-squares SVM were able to detect changes in the EEG.
Many other SA diagnostic methods based on multiple bio-signals have been proposed. Al-Angari et al. [22] showed that the feature extraction from abdominal and thoracic respiratory effort signals, SaO 2 , and ECG signals could be applied with an SVM to detect apnea events. Two effort signals were used to compute the phase-locking value. Features were computed, including the standard deviation and the mean of the time between beats (RR intervals), the normalized and absolute powers in the high-frequency band, the low-frequency and the very-low-frequency bands, and the power ratio of the low-frequency band to the high-frequency band. Standard deviations and means of SaO 2 were also computed. For classification, SVM classifiers were used with these features. A benchmark method for automatic detection SA was established based on ECG, airflow and SaO 2 in Reference [23]. Each signal was decomposed by the wavelet transformation and the depth was 14. At every detail level, the mean, variance, and energy were computed. An ANN and other classifiers were compared and features from different signals were the inputs fed into classifiers. The best model was an ANN applied airflow signals. In addition, in SaO 2 and ECG signals, features were used to train a linear discriminant model in Reference [24].
In automatic detection, feature extraction and selection play significant roles. Feature extraction is applied to obtain significant features that reflect bio-physiological criteria. Features with bio-physiological criteria can provide good classification performance. Feature selection can also reduce computational load. In the present study, the main contribution is to take the next step in the SA feature selection. We explore the importance of each feature extracted from PSG signals. In the evaluation of the importance of these features, the analysis of variance (ANOVA) and the rank-sum method are applied. Then, an SVM classifier is designed using the previously extracted features as the inputs. Feature extraction and selection pinpoint the best features for detecting SA since the selected features are with the best discrimination. In this study, the aim is to select features with the best discrimination, and they can be extracted from PSG signals and used to detect SA. The next section is about the feature extraction from different signals.

Sleep Apnea Dataset
The Physionet apnea dataset (https://physionet.org/content/apnea-ecg/1.0.0/) provided overnight PSG signals. In this study, the aim is to select multi-domain features with good discrimination from different kinds of bio-signals. In the Physionet apnea dataset, only 8 recordings (a01 through a04, b01, and c01 through c03) include five kinds of bio-signals (ECG signal, abdominal and thoracic signals, airflow signals, and oxygen saturation signals). Thus, these 8 recordings were used in this study. The dataset contained eight multi-signals with lengths of 455-529 min (mean ± std: 495 ± 21), and these recordings were sampled at 100 Hz. Each minute for SA in each signal was manually annotated by an expert. The subjects were 7 male and 1 female, and their ages were between 31 and 64 years (mean ± std: 43 ± 8). The heights of the subjects were 168-184 cm (mean ± std: 177 ± 6) and their weights were 63-121 kg (mean ± std: 88.5 ± 22). All information was recorded simultaneously, including ECG, thoracic, abdominal, airflow, and SaO 2 signals. One minute of a PSG signal is shown as an example in Figure 1. Inductance plethysmography was used to obtain thoracic and abdominal signals, and airflow and SaO 2 signals were measured using nasal thermistors and pulse oximetry, respectively. In each signal, one for each minute included the annotation, indicating apnea or normal event at that time. "A" indicates that apnea occurred during the following one-minute interval, while "N" indicates there was no apnea in the annotation. A total of 87 features were obtained from the abdominal and thoracic effort, airflow and SaO 2 , and ECG signals.

Feature Extraction
We extracted 87 features from ECG, SaO 2 , airflow, abdominal, and thoracic signals, obtained by time-, frequency-domain and non-linear methods in the work.

Feature Extraction Using ECG Signal
In the part, we obtained time-domain, frequency-domain, and non-linear features. Table 1 shows ECG features 1 to 62. Preprocessing: We utilized a band-pass filter with 0.05-40 Hz to filter noise and artefacts and conduct the base-line correction. The filter was a 3rd-order infinite impulse response Butterworth. After de-noising, in the filtered ECG signal, the R-peaks were identified using the modified Pan-Tompkin algorithm. Here, a symmetric window of 120 ms was used to obtain all QRS (Q wave, R wave, and S wave) around the R-peaks. The heart rate, known as the RR interval, was considered as the time difference of consecutive R peaks. Due to the poor signal quality, a pre-processing step was done to obtain a set of RR intervals with more physiologically reasonable, taken as the HRV. In this paper, we employed the correction method from [25]. Besides the HRV and QRS complex, ECG-derived respiration (EDR) signals were also derived using the EDR method of Physionet. These signals reflect respiratory activities between the electrodes and the heart since the electrodes were placed on the patient's thorax, and the electrical impedance changed during breathing. The motion of the thoracic cavity also influences the ECG signals [26]. Previous studies have stated that the heart rates of SA patients are lower than the heart rates of healthy people [27].
Time-domain features: Time-based features included (1) the root mean square consecutive difference using a set of R-peak amplitude (RMSSD_R_amp), (2) the number of pairs of adjacent HRV that the latter HRV parameter exceeded the former HRV parameter by greater than 50 ms (NN50_RR), (3) the standard deviation of the differences of consecutive HRV signals (SDSD_RR), (4) the standard deviation between the standard deviation of the HRV signal at the first 30 s and the standard deviation of the HRV signal at the second 30 s (tSD_RR), (5) the standard deviation of HRV signals (std_RR), (6) the mean, variance, kurtosis of ECG signals (mean, var, and kurtosis), (7) the mean of the RR interval (mean_RR), (8) the coefficient of the variance (the ratio of the standard deviation to the mean in EDR signals, CV_EDR), and (9) the mean of the R-peak amplitude (mean_R_amp).
Frequency-domain features: Features of the work were obtained by the wavelet transformation and the power spectral density in frequency-domain. The wavelet transformation has more advantages than classical Fourier transformation methods. The wavelet transformation uses a multi-scale basis. A varying window size is used to analyst non-stationary signals, and the size is narrow at high frequencies and broad at low frequencies. The functions of scaling and wavelet are applied by the wavelet transformation to decompose signals at approximate and detailed levels, and there is the correlation between high-pass and low-pass filters and these functions. The distribution of frequency power components is described by the power spectral density (PSD). The PSD method includes parametric methods and non-parametric methods. The advantage of parametric methods is higher accuracy, while the advantage of non-parametric is less computational complexity.
First, a level-9 wavelet transformation with Daubechies wavelets 4 were used to decompose EDR signals and obtained variances of the 9th and 2nd detail levels (var_EDR_D9 and var_EDR_D2). Second, a 3rd-order Symlet wavelet was the mother wavelet with a level number of 7 to decompose. The Shannon's entropy (entropy_D1 to entropy_A7), the mean (mean_D1 to mean_A7) and the variance (var_D1 to var_A7) were extracted from one approximation and 7 detail coefficient levels, and the wavelet spectral density (WSD) was used to analyze RR intervals and the R amplitude (WSD_RR and WSD_R_amp). On the other hand, the PSD was applied to the RR intervals, and the dominant frequency was found in the 0.03-0.5 Hz frequency band (max_PSD 0.03/0.5 ). We extracted the mean PSD in the 10-20 Hz and 80-100 Hz (mean_PSD 10/20 and mean_PSD 80/100 ). The PSD estimate was performed for the EDR signals and RR intervals and resulted in variances of 0.03-0.4 Hz (var_EDR 0.03/0.4 and var_RR 0.03/0.4 ). We utilized spectral flatness, centroid, spread, decrease, and slope.
Non-linear features: Recurrence is a fundamental characteristic of dynamic systems, and recurrence quantification analysis describes the structures of recurrence. In the study, we extracted three parameters, (1) V_MAX (the longest diagonal length), (2) DET (the percentage of recurrent points to use the minimal length recurrent point and to form diagonal lines), (3) LAM (in vertical lines of minimum length, the percentage of recurrent points). Similarly, in RR intervals, we extracted five serial correlation coefficients (SCrC_1_RR to SCrC_5_RR). Here, the QRS signal was processed by the principal component analysis (PCA) and the kernel principal component analysis (kPCA) [28]. These features correspond to the standard deviation of PCA and kPCA (std_PCA and std_kPCA), the maximum of the diagonal matrix of PCA and kPCA (max_dia_PCA and max_dia_kPCA), and the relative power in the second principal component (RP_2_PC).

Feature Extraction Using Oxygen Saturation Signal
From the oxygen saturation signal, we obtained four features from the time-and frequency-domain. Features were those of Nos. 63-66 in Table 1. Preprocessing: There were many quantitative indices used to diagnose SA. The most widely used index in SaO 2 includes the cumulative time less than a value, a 90% decline from baseline or the number of SaO 2 less than a value, and a 3% decline from baseline. When the sample-to-sample differences were >8%, they were considered as non-physiological artefacts. To reduce non-physiological artefacts, the median value was calculated in the initial 10 s, and these artefacts were replaced by the median value. Here, data were down-sampled to 1 sample/s to be considered as low-amplitude artefacts.
Multi-domain features: For SaO 2 recordings, the time-and frequency-domain features were obtained in this study. The feature set included the median of 60-second (med), and rapid restoration events identified as an increase of more than 4% in 10 s (RES4). Apnea events usually lasted 20-60 s. Therefore, the PSD obtained in desaturation events was correlated with the number and intensity of desaturation events. The PSD was calculated by a 5th-order Yule-Walker autoregressive estimate, and the mean in the 0.016-0.05 Hz frequency range was added to the feature set (mean_PSD 0.016/0.05 ). Finally, a variable called the Poincare SD 1 was computed, which indicates the short-term variability in the SaO 2 signal [29].  (Table 1) were extracted from the airflow signal. Preprocessing: An apnea event in the airflow signal is also defined as the at least 10% decrease from its basal value, and this decrease lasts at least 10 s [7]. Apnea events are recorded when there are at least two missed breaths of length [30]. The airflow signal swings up during exhalation and down during inhalation, which is a direct time-domain change. We started the baseline correction using the median level of a window of 10 s. For noise removal, a 3rd-order Butterworth low-pass filter was used, and its cut-off frequency was 3 Hz. We down-sampled the airflow signal from 200 Hz to 1 Hz.
Multi-domain features: Apean indicates airflow decreases recurrently throughout the night, and this behaviour supported this study. Hence, the mean, median (med), and standard deviation (std) were extracted as time-domain features. On the other hand, the Welch method was applied to obtain PSDs since the airflow signal were non-stationary signals. A segment length of 5 samples with 2.5 overlapped samples was used. The means were calculated within 0-0.1 Hz and 0.4-0.5 Hz (mean_PSD 0/0.1 and mean_PSD 0.4/0.5 ) since there were at least two missed breaths of length which reduced the respiratory frequency, and then there was fast respiratory overexertion which increased the respiratory frequency. The wavelet transformation decomposed each airflow signal and extracted spectral differences between apnea and normal groups. The level-3 wavelet transformation with Daubechies wavelets = 3 was performed. Using the wavelet, a given signal was decomposed into one approximate coefficient and three detailed coefficients (mean_D1 to mean_A3).

Feature Extraction Using Abdominal and Thoracic Signals
We used time-and frequency-domain methods to process abdominal and thoracic signals. Feature Nos. 76-81 were extracted from abdominal signals while Nos. 82-87 were extracted from thoracic events (Table 1). Preprocessing: A total closure of the upper airway is scored as the apnea. During sleep apnea events, the abdomen and thorax of patients move. For the work, we estimated the performance of abdominal and thoracic signals. First of all, we started the baseline correction using the median-level window of 10 s. We used a 3rd-order band-pass Butterworth filter in 0.05-40 Hz to remove noise and artefacts.
Multi-domain features: Firstly, abdominal signals were used to provide feature sets. The sum and standard deviation of the absolute value of each 60-s (sum_abs, std_abs), and the mean of every 60-s (mean) were put into the time-domain feature set. The Yule-Walker method was used to estimate the PSD, and the segment length was 40 samples. The PSD provided the mean in the 80-100 Hz (mean_PSD 80/100 ). A wavelet of depth 2 was designed with Daubechies 2, and the mean of the 1st and 2nd detail levels were computed (mean_D1 and mean_D2). Then, the thoracic signal provided other features, which included the sum, standard deviation, median, variance, and mean of each 60-s (sum, std, med, std, mean, and var). The Yule-Walker method was also used and the mean in the 80-100 Hz frequency band was obtained (mean_PSD 80/100 ).

Feature Selection
A total of features increase the training time of classifier, as time is wasted in processing redundant or potentially detrimental features, which limits real-time detection applications. Also, redundant features influence the high accuracy of the classification algorithm. Thus, before the classification stage, feature selection plays an important role. Feature selection can prevent overfitting of training and reducing computational load. A small subset of data is obtained by feature selection, and the subset is with high-discriminatory power, which maintains performance and shortens the training process time since feature selection leaves the most compact feature sets. For feature selection, which provides the most discriminating information between two classes, we employed a two-stage procedure: the statistical analysis and the SVM selection. In the first stage, according to the statistical performance of different features, redundant features are removed from the feature set, and other features are put into different classes [31][32][33][34]. Then, we evaluated the performance of different feature classes with a trained SVM model. In this case, the hill-climbing method and the k-fold cross-validation were used. The feature set was reduced, and the best feature class with good performance was selected via comparisons of sensitivity, specificity, and accuracy.

Statistical Analysis
To select the features with good discrimination, we used two statistical tests: ANOVA and the rank-sum test. ANOVA [14,32] can analyse differences between group means and associated labels. The error (residual) sum of squares (SSE) was computed while the total sum of squares (SST) was calculated. We obtained the group sum of squares (SSG), group means, and the sum of squares errors by dividing by their respective degrees of freedom, d f group and d f error . Resulting means were defined as the group mean squared and mean squared errors. Finally, we obtained the F-ratio. This ratio follows an F v 1 ,v 2 distribution, where v 1 = d f group and v 2 = d f error , which is used to compute the p-value.
In the rank-sum test [34,35], the aim was to confirm that the two groups were independent. Two values were randomly selected from the first group and the second group, respectively, and these two values were compared. Let n 1 and n 2 be the numbers of the two samples, respectively. We ranked the combined n 1 + n 2 observations in ascending order and substitute a range of 1, 2... to the n 1 + n 2 observations. We assigned conflicting observations with their mean ranks. The w i (i = 1, 2) indicates the sum of the ranks corresponding to n i observations. Let u = min(u 1 ,u 2 ) and u will be compared with the desired critical value. The rank-sum test also obtains the p-value.
We used a simple threshold (ANOVA p-value < 0.05) and a simple number (p-value of the rank-sum test = 1) to classify the positive and negative features. The results of the ANOVA and the rank-sum test are presented in Table 1. In total, there were 87 features. Feature Nos. 1-62 were from each ECG minute, Nos. 63-66 were from SaO 2 signals, Nos. 67-75 were from the airflow signal, Nos. 76-81 were from the abdominal signal, and Nos. 82-87 were from the thoracic signal. In Section 2.1, we mentioned eight PSG signals were used in this study. We found that all events of three PSG signals (c01 through c03) were labeled by normal, and there were no apnea events. Mentioned in the description of statistical analysis, both ANOVA and the rank-sum test used apnea and normal events in one patient to calculate p-value of each feature. Thus, in this statistical analysis stage, only five patients with two kinds of events were used, and eight patients' PSG signals were used in the next SVM stage.
In this study, for each derived feature, statistical results from both apnea and normal events from the first patient were analyzed and a pair of p-values was obtained. This process was repeated until five patients' PSG signals were used to provide five pairs of p-values for this feature. Then the next feature was taken by ANOVA and the rank-sum test. Each pair included one ANOVA p-value and another one of the rank-sum test. After deriving the p-values of the full feature set, features had the good discrimination between two groups and selected into the feature subset. We set λ feature as the number of positive pairs for each feature. In this study, the maximum of λ feature was equal to five since it was set to be the same as the number of patients. In Table 1, λ feature of Feature No. 10 was 2, which means that two pairs of the Feature No. 10 were positive. Similarly, λ feature of Feature No. 2 = 5, which means that this feature has been identified as a positive feature for all 5 patients. If the λ feature of a feature was ≥ 1, as shown in Table 1, then this feature was put into the selected feature set and highlighted with an asterisk (*) in Table 1. Finally, 66 features were selected based on λ feature being ≥ 1. The aim of this stage was to discard noisy and irrelevant features and reduce the computational load of the algorithm.

SVM Selection
An SVM classifier was used in the second stage to find features with good discrimination. The SVM was introduced by Vapnik [36] and it is a classification method and regression analysis. It is a binary classifier and it constructs an optimal separating hyperplane (OSH) in an N-dimensional space. The input vector is transformed via nonlinear mapping into a K-dimensional space from the N-dimensional space.
A training data D = {(x i ,y i )} L i constitutes of every training vector x i ∈R n and the matched label y i ∈ (+1, −1). Every training vector x i is transformed via a nonlinear mapping function z = ∅(x) into a higher-dimension space F. Every vector w ∈ R n is the normal vector to the hyperplane, and the points z lie on the hyperplane, and margin bias b is in R, and these parameters satisfy w · z + b = 0, while the input data satisfy the equation y i (w · z + b) − 1 ≥ 1 ∀i. The bias is computed by 2/ w . When the OSH maximizes the bias to minimize w 2 /2, the SVM constructs the optimal one. The quadratic programming with Lagrange multipliers α i can solve a convex optimization problem and a dual problem.
Selecting a kernel function K(x i ,x j ) affects the performance of the SVM in the construction stage. In this study, we tested three kernels and selected the best one. Three kernels include Linear, Polynomial, and Radial basis function (RBF). To select the best performance of three kernel functions, we used three indexes, that is, accuracy, sensitivity, and specificity.

Results and Discussion
We classified all features into five groups (Classes A, B, C, D, and E) via the relationship between λ feature and the number of PSG signal (ν PSG = 5) from Tables 1 and 2. In Table 1, the λ feature of Feature No. 64 was 5 and was obtained from the SaO 2 signals. The λ feature of Feature No. 64 was equal to ν PSG and was put into Class A. Similarly, the λ feature of Feature No. 79 was 2 (ν PSG -3) and was from the abdominal signal. It was put into Class D. Three kernel functions, Linear, Polynomial and RBF affected the performance of the SVM. Considering all class features as inputs of the SVM, the performance of the kernel functions were evaluated using parameters. The parameters considered were Linear (R), Polynomial (d, and R) and RBF (σ and R). In these parameters, R is used in Linear, Polynomial and RBF. R is the regularization parameter, and d in Polynomial is the degree, and σ in RBF is the width. It was found that eight PSG signals from the Physionet database were imbalanced since the number of normal events was much larger than that of apnea events, which affected sensitivity, specificity, and accuracy. The under-sampling balance method was used in this study. This method reduced the number of normal events and balanced the number of apnea events and the number of normal events in these eight PSG signals. The parameters, including R, d, σ, carried different values, that is, σ (1,5,25), d (2,3,4) and R (0.2,1,10). In order to detect the best classifier, the performance is presented by accuracy, sensitivity, and specificity via the k-fold cross-validation. We followed the hill-climbing algorithm mentioned, in order to detect the feature subset with the most discriminating information. Initially, Class A features were selected to feed into the SVM, and different kernels with different parameters were set. Then, the next class features were added, and this algorithm was repeated until all classes were added. The final feature subset was selected by comparing the performance. According to the results of the k-fold method and the hill-climbing algorithm, the means were computed and Table 3 shows the means (%) for accuracy, sensitivity, and specificity.
In Table 3, most of sets had good performance (more than 90% accuracy). However, the accuracy (≤90%) is highlighted in bold because these sets had worse performance than other sets. Compared the performance of different feature subsets, the results presented showed that Class A was better than the other classes. Table 3. Means (00.00%) of sensitivity specificity and accuracy of the k-fold method based on the feature classes using SVM models with different kernel functions and parameters (the worse accuracy (≤90%) is highlighted in bold). To confirm the best feature subset, two statistical indexes, the standard deviation and the mean, were used to evaluate the results shown in Table 4. In this study, a lower feature class standard deviation indicates less fluctuation, and this feature class was more stable than other classes. A higher feature class mean indicates that the performance of this subset was better than those of other classes. From Table 4, it can be seen that the standard deviations of Class A were 4.27%, 1.82% and 2.79%, respectively. On the other hand, for mean values, Class A exhibited better performance. Its means of sensitivity, specificity and accuracy were 93.22%, 94.60%, and 93.93%, respectively. Comparing the classification performance results in Tables 3 and 4, we find that the best feature subset was Class A (Nos. 2, 47, 64, 66, 77), which had better effectiveness and robustness.

R Class A Class AB Class ABC Class A-D Class A-E Sen Spe Acc Sen Spe Acc Sen Spe Acc Sen Spe Acc Sen Spe Acc
Class A (No. 2, 47, 64, 66, 77) was the feature subset with the highest discriminating power. Features No. 2 and 47 were extracted from ECG signals, and Nos. 64 and 66 were extracted from SaO 2 signals, and No. 77 was extracted from abdominal signals. During apnea events, patients with sleep apnea showed distinct patterns in the cardiac RR interval variation. Compared with the normal events, sleep apnea events increase the RR interval variation at lower heart rates and the decrease variation at faster rates. Feature No. 2 indicates the number of pairs of adjacent HRV samples that the latter HRV sample exceeded the former HRV sample by greater than 50 ms (NN50_RR). This feature indicates the different RR intervals of apnea and normal groups. On the other hand, lower heart rates lead to a change in the frequency domain. Feature No. 47 shows the spectral spread, and it is around the centroid and it reflects the standard deviation of the frequency range. The spectral spread shows that heart rates are lower in apneic parts, and the frequency values are lower than those in normal sub-bands. In apnea events, sudden downturns were followed by relatively fast recoveries in SaO 2 values. Feature No. 64 describes rapid restoration events, an increase of more than 4% in 10 s (RES4), and shows rapid downturns and recoveries in oximetry values. Feature No. 66 is the Poincare SD 1 , which was obtained by comparing each SaO 2 value against the previous one, and reflects the short-term variability. Sudden downturns and fast recoveries in SaO 2 values affect this feature. The closure of the upper airway leads to SA. The patient's abdomen is active during these closures. Feature No. 77 is the standard deviation of the absolute value of each minute of abdominal signals (std_abs). Their values are higher during apnea events than at the normality. We used absolute values since there was a fluctuation in abdominal signals that affected the standard deviation. Overall, selected features with the best discriminative power were associated with the bio-physiological criteria of SA. The receiver operating characteristic (ROC) curve is related to the sensitivity and specificity of the classifiers. Ideally, the sensitivity value is 1, and the specificity value is 0. In this work, after determining that Class A was the best feature subset, the ROC area (AUC) was used to evaluate the different SVM models, and the k-fold method was also used. A comparison of AUCs is presented in Table 5. From Table 5, it found that the AUC of Linear (R = 1) was 95.23%, which held a better performance. From Table 3, it found that every accuracy in the Linear (R = 1) kernel was more than 90%, which means Linear (R = 1) was more stable than other kernels with parameters. Considering Class A features as the SVM input, the classification performance of the Linear (R = 1) kernel was 95.22%, while the sensitivity was 94.29%, and the specificity was 96.17%.
Five kinds of bio-signals (ECG, SaO 2 , airflow, abdominal, and thoracic signals) provided time-domain, frequency-domain, and non-linear features (show in Table 1), since these five signals are easy to collect, and they also show bio-patterns during apnea events are much different from patterns during normal events. Based on the λ feature obtained by ANOVA and the rank-sum test, 87 kinds of features (shown in Table 2) were put into Classes A-E. The results of different classes were conducted by the statistical analysis shown in Table 3. This meant the features in Class A had the good discriminatory capability. The time-and frequency-domain features in Class A from ECG, SaO 2 , and abdominal signals were correlated with the bio-physiological criteria of apnea, such as the changed RR interval variation, sudden downturns and recoveries of SaO 2 values, and the active abdomen muscles. SVM models were used to confirm the potential of these selected features. The SVM model with the Linear (R = 1) provided the high performance (the accuracy of 95.22%) and we were able to conclude that time-and frequency-domain features possessed the good discriminatory capability to classify normal and apneic events. Some limitations can be observed in the study. Firstly, the SVM models were the part of the two-stage feature selection but not an automatic detection classifier, although they provided a better performance compared other studies mentioned in Section 1, such as in References [22,23]. In further work, various classification models will be conducted with the selected features in order to confirm which classifier emerges as the best. Secondly, the quality of bio-signals affects the performance of our method. Although removing artifacts and the baseline correction were used in this paper, it is necessary to upgrade the signal acquisition process and the sensor, such as wearable sensors that can be attached directly on human skin [37].

Conclusions
In conclusion, this paper aims to detect the feature subset with the highest discriminating power. Eighty-seven kinds of features were extracted using time-, frequency-domain, and non-linear algorithms from ECG, SaO 2 , airflow, abdominal, and thoracic signals. The p-values were computed by ANOVA and the rank-sum test, and the 87 extracted features were then classified into five classes (A, B, C, D, and E). The SVM was used to confirm the final feature subset, and k-fold cross-validation algorithm and the hill-climbing algorithm were both implemented. Class A (using 5 of the 87 features) illustrated the best performance, from ECG, SaO 2 , and abdominal signals. Also, the results showed that the Linear kernel with R = 1 outperformed the other classifiers. The proposed approach proves that a feature subset with high discriminative power can be useful in distinguishing apnea and normal sleep. Moreover, using the smaller subset reduces the complexity and the computation time.