Diagnosis of Obstructive Sleep Apnea Using Feature Selection, Classification Methods, and Data Grouping Based Age, Sex, and Race

Obstructive sleep apnea (OSA) is a prevalent sleep disorder that affects approximately 3–7% of males and 2–5% of females. In the United States alone, 50–70 million adults suffer from various sleep disorders. OSA is characterized by recurrent episodes of breathing cessation during sleep, which lead to adverse effects such as daytime sleepiness, cognitive impairment, and reduced concentration. It also contributes to an increased risk of cardiovascular conditions and adversely affects patients' overall quality of life. As a result, numerous researchers have focused on developing automated detection models to identify OSA effectively and accurately. This study explored the potential benefits of utilizing machine learning methods based on demographic information for diagnosing the OSA syndrome. We gathered a comprehensive dataset from the Torr Sleep Center in Corpus Christi, Texas, USA. The dataset comprises 31 features, including demographic characteristics such as race, age, sex, BMI, Epworth score, M. Friedman tongue position, snoring, and more. We devised a novel process encompassing pre-processing, data grouping, feature selection, and machine learning classification methods to achieve the research objectives. The classification methods employed in this study encompass decision tree (DT), naive Bayes (NB), k-nearest neighbor (kNN), support vector machine (SVM), linear discriminant analysis (LDA), logistic regression (LR), and subspace discriminant (Ensemble) classifiers. The experimental results indicated the superior performance of the optimized kNN and SVM classifiers for accurately classifying sleep apnea. Moreover, significant enhancements in model accuracy were observed when utilizing the selected demographic variables and employing data grouping techniques. For instance, the accuracy improved by approximately 4.5%, 5%, and 10% with the feature selection approach when applied to the grouped data of Caucasians, females, and individuals aged 50 or below, respectively. Furthermore, a comparison with prior studies confirmed that effective data grouping and proper feature selection yielded superior performance in OSA detection when combined with an appropriate classification method. Overall, the findings of this research highlight the importance of leveraging demographic information, employing proper feature selection techniques, and utilizing optimized classification models for accurate and efficient OSA diagnosis.


Introduction
Obstructive sleep apnea (OSA) is a severe respiratory disorder that was first described in 1837 by Charles Dickens [1]. The most common symptoms of OSA are loud snoring, dry mouth upon awakening, morning headaches, and concentration difficulties [2,3]. More than 100 million people suffer from sleep apnea, which can affect both adults and children [4][5][6]. Moreover, it is estimated that nearly 22 million Americans suffer from apnea that ranges from moderate to severe [7]. Typically, the apnea-hypopnea index (AHI) is used to measure the severity of the apnea. For example, with nearly 326 million people living in the USA, it is reported that 10% of the US population have mild OSA with AHI scores larger than 5, 3.5% have moderate OSA with AHI scores larger than 15, and 4% have severe OSA syndrome (i.e., apnea/hypopnea) [7].
The publication titled "Hidden health crisis costing America billions" by the American Academy of Sleep Medicine (AASM) presents a new analysis that sheds light on the considerable economic consequences of undiagnosed OSA [8]. Neglecting sleep apnea significantly raises the likelihood of expensive health complications such as hypertension, heart disease, diabetes, and depression [9]. By examining 506 patients diagnosed with OSA, the study showcases the potential improvements in their quality of life following treatment, including enhanced sleep quality, increased productivity, and a notable 40% reduction in workplace absences. A substantial 78% of patients regarded their treatment as a significant investment. Frost & Sullivan, a leading market research firm, has estimated the annual economic burden of undiagnosed sleep apnea among adults in the United States to be approximately $149.6 billion. This staggering amount encompasses $86.9 billion in lost productivity, $26.2 billion in motor vehicle accidents, and $6.5 billion in workplace accidents.
Sleep apnea can be categorized into three distinct types:
• Obstructive sleep apnea (OSA): The most common type of apnea is known as obstructive sleep apnea (OSA), which is identified by two primary characteristics. The first is a continuous reduction in airflow of at least 30% for a duration of 10 seconds, which is accompanied by a minimum oxygen desaturation of 4%. The second is a decrease in airflow of at least 50% for 10 seconds, coupled with a 3% reduction in oxygen saturation [10].
• Central sleep apnea (CSA): CSA occurs when the brain fails to send appropriate signals to the muscles responsible for breathing. Unlike OSA, which stems from mechanical issues, CSA arises due to impaired communication between the brain and muscles [11,12].
• Mixed sleep apnea (MSA): MSA, also known as complex sleep apnea, represents a combination of obstructive and central sleep apnea disorders, thus presenting a more complex pattern of symptoms and characteristics.
Detecting OSA using an electrocardiogram (ECG) is an expensive process that is inaccessible to a large part of the world's population. The attributes of the ECG signal differ between awake and sleep intervals [13]. Hence, using a combined signal of awake and sleep stages reduces the overall reliability of the detection process. Several researchers recommend examining the ECG signal on a per-minute basis [14]. In general, to detect OSA, the signal should be at least 10 seconds long. The diagnosis of OSA from ECG signals using various machine learning methods is a commonly used approach in the literature. For example, artificial neural networks (ANN) and convolutional neural networks (CNN) were introduced to detect and classify OSA. Wang et al. [15] used a CNN model to detect OSA based on ECG signals. The authors extracted a set of features from each signal and then trained a three-layered CNN model. The obtained results showed an acceptable performance and the ability to apply the proposed method on wearable devices.
Erdenebayar et al. [16] provided an automated detection method for OSA using a single-lead ECG and a CNN. The CNN model proposed in their study featured six optimized convolution layers, which incorporated activation functions, pooling operations, and dropout layers. The research findings demonstrated that the proposed CNN model detected OSA accurately by analyzing only a single-lead ECG signal. Faust et al. [17] introduced the use of a long short-term memory (LSTM) neural network to detect sleep apnea based on the RR interval signal. Their results showed the ability of the LSTM network to detect sleep apnea with an accuracy of 99.80%. Schwartz et al. [18] employed several machine learning methods to detect four types of abbreviated digital sleep questionnaires (DSQs). The authors showed the ability of machine learning to detect sleep disturbances with high accuracy. Lakhan et al. [19] proposed a deep learning approach to detect sleep apnea-hypopnea syndrome (SAHS) at multiple severity levels. Two types of classification were employed in their paper: binary classification with three cutoff indices (i.e., AHI = 5, 15, and 30 events/hour) and multiclass classification (i.e., no SAHS, mild SAHS, moderate SAHS, and severe SAHS). The obtained results for the binary classification showed that the AHI cutoff of 30 events/hour outperformed the other cutoffs with an accuracy of 92.69%. For the multiclass classification problem, the obtained accuracy was 63.70%. Banluesombatkul et al. [20] employed a novel deep learning method to detect OSA (i.e., normal and severe patients). The proposed method used three different deep learning methods: (i) one-dimensional CNN (1-D CNN) for feature extraction; (ii) deep recurrent neural networks (DRNNs) with an LSTM network for temporal information extraction; and (iii) fully connected neural networks (DNNs) for feature encoding. The proposed method showed acceptable results compared to the literature.
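The two labeling schemes mentioned above (binary labels at a chosen AHI cutoff and a four-level severity class) can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of values that fall exactly on a cutoff (treated here as belonging to the higher class) is an assumption, since the text does not state it.

```python
# Cutoffs in events/hour, as listed in the text: 5, 15, and 30.
CUTOFFS = (5, 15, 30)

def binary_label(ahi, cutoff):
    """Return 1 if the AHI reaches the cutoff, else 0.
    Assumption: a value equal to the cutoff counts as positive."""
    return int(ahi >= cutoff)

def severity_class(ahi):
    """Map an AHI value to one of the four multiclass labels.
    Assumption: boundaries are inclusive on the higher class."""
    if ahi < CUTOFFS[0]:
        return "no SAHS"
    if ahi < CUTOFFS[1]:
        return "mild SAHS"
    if ahi < CUTOFFS[2]:
        return "moderate SAHS"
    return "severe SAHS"

labels = [severity_class(a) for a in (2.0, 9.5, 20.0, 41.0)]
```

The same AHI value therefore yields a different binary label depending on the cutoff, which is why the cited work reports one accuracy per cutoff.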
There have been several efforts in the literature to identify the relation between snoring sounds and OSA. In general, loud snoring is one of the indicators of OSA, and it is commonly thought that the frequency and amplitude of snoring are associated with the severity of the OSA [21]. Alshaer et al. [22] employed an acoustic analysis of breath sounds to detect OSA. Previous research suggests that OSA can be detected using snoring attributes. However, clinicians should pay attention to the possibility of missing an OSA diagnosis for patients with minimal snoring. Kang et al. [23] applied linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features to detect OSA based on the amplitude of the snoring signal. The proposed method was able to classify three different events, namely, snoring, apnea, and silence, from sleep recordings with accuracies of 90.65%, 90.99%, and 90.30%, respectively.
Feature extraction and feature selection are the most commonly used techniques for data dimensionality reduction. Several papers have been published that highlight the importance of feature selection in OSA detection. Various features are extracted from the ECG signals; then, feature selection is used to reduce the number of extracted features and to determine the most valuable features related to OSA. In the stage of feature extraction, a set of features is extracted from the time series data, which aims to reveal the hidden information within the ECG signal. However, a feature set may contain redundant and irrelevant information, and feature selection is adopted to resolve this issue. A feature selection algorithm can help find the nearly optimal combination of features. Although feature selection is an expensive method, it can produce better classification performance, and high accuracy is significantly important in OSA detection. There are different classification methods that are used to select important features, such as support vector machine (SVM) networks, k-nearest neighbor (kNN) algorithms, artificial neural networks (ANN), linear discriminant analysis (LDA), and logistic regression (LR).
Many researchers have used demographic data to identify OSA. Sheta et al. [24,25] applied LR and ANN models to detect OSA based on demographic data. A real dataset was used that consists of several demographic features (i.e., weight, height, hip, waist, BMI, neck size, age, snoring, the modified Friedman (MF) score, the Epworth sleepiness scale, sex, and daytime sleepiness). The obtained results suggested that the proposed method could detect OSA with an acceptable accuracy. Surani et al. [26] applied the AdaBoost method as a machine learning classifier to detect OSA based on demographic data. The obtained results were promising. Surani et al. [27] applied a wrapper feature selection method based on binary particle swarm optimization (BPSO) with an ANN to detect OSA. The obtained results illustrated that the use of BPSO with an ANN can detect OSA with high accuracy. Haberfeld et al. [28] proposed a mobile application called Sleep Apnea Screener (SAS) to detect OSA based on demographic data. The authors used nine demographic features (i.e., height, weight, waist, hip, BMI, age, neck, M. Friedman, Epworth, snoring, gender, and daytime sleepiness). The application had two machine learning methods: LR and SVM. Moreover, the authors studied the performance of each classifier based on gender. The reported results showed that the proposed application can help patients detect OSA easily compared to an overnight test for OSA diagnosis.
There are many screening approaches for OSA, including tools such as the Berlin Questionnaire, the STOP-BANG Questionnaire, Epworth Sleepiness Scale (ESS), clinical assessment, and population-specific screening tools [29]. These approaches aim to identify individuals at a higher risk of OSA based on symptoms, risk factors, and questionnaire responses. Positive screening results prompt further evaluation using diagnostic tests such as polysomnography (PSG) or home sleep apnea testing (HSAT). Screening helps prioritize resources and directs individuals toward comprehensive sleep assessments. Subramanian et al. [30] introduced a novel screening approach known as the NAMES, which employs statistical methods to identify OSA. The NAMES assessment combines various factors, including neck circumference, airway classification, comorbidities, the Epworth scale, and snoring, to create a comprehensive evaluation that incorporates medical records, current symptoms, and physical examination findings. Experimental findings demonstrated the efficacy of the NAMES assessment in detecting OSA. Furthermore, the inclusion of BMI and gender in the assessment improved its screening capabilities.
This work proposes an efficient classification framework for the early detection of OSA. Specifically, it extends the NAMES work with machine learning classification methods and a metaheuristic-based feature selection scheme. The main contributions are summarized as follows:
1. The OSA data were grouped based on age, sex, and race variables for performance improvement. This type of grouping is novel and has never been presented in this area of research before.
2. Various types of the most well-known machine learning algorithms were assessed to determine the best-performing one for the OSA problem. These methods included twelve predefined (fixed) parameter classifiers and two optimized classifiers (using hyperparameter optimization).
3. A wrapper feature selection approach using particle swarm optimization (PSO) was employed to determine the most valuable features related to OSA.
4. Experimental results from the actual data (collected from Torr Sleep Center, Texas, USA) confirmed that the proposed method improved the overall performance of the OSA prediction.
The rest of this paper is organized as follows: Section 2 presents the proposed method used in this work. Section 3 gives a brief description of the dataset used in the experiment. Section 5 discusses the experimental results and simulations. Finally, the conclusion and future work are presented in Section 8.

Proposed Diagnosis Process
The proposed OSA diagnosis process is illustrated in Figure 1. We suggest collecting data from patients who have undergone demographic and anthropometric measurements and polysomnographic studies at a community-based sleep laboratory. An expert from the Torr Sleep Center (Corpus Christi, TX, USA) supervised the collection process for the polysomnography (PSG) evaluation of suspected OSA between 5 February 2007 and 21 April 2008. We preprocessed the data to make them more suitable for analysis. All missing data were handled, and a normalization technique was employed to transform the data into a standard scale. The next step was classification on the grouped data, where the classification model was implemented using two types of learning methods (fixed parameter setting and adaptive parameter setting) during the training process. The benefit of using two kinds of learning methods is to learn more about the dataset and find the optimal parameter settings. After that, we applied wrapper feature selection using the best-performing classifier to identify the most valuable features related to OSA. This step can reveal useful information that helps physicians understand the demographic characteristics of OSA patients. Finally, we used a set of evaluation criteria (i.e., accuracy, TPR, TNR, AUC, precision, F-score, and G-mean) to evaluate the performance of each classifier.
Figure 1. The proposed methodology. The figure illustrates the step-by-step process of the proposed methodology, which involved six steps: data collection, data preprocessing, data grouping, classification, feature selection, and evaluation.

Sleep Apnea Dataset
The initial dataset employed in this study encompasses 620 patients, comprising 366 males and 254 females. The age range for males spans from 19 to 88 years, while for females, it ranges from 20 to 96 years. Notably, the prevalence of snoring was 92.6% among males and 91.7% among females. Each patient underwent comprehensive full-night monitoring as part of the study. The dataset comprises 31 input features and a binary output (0 or 1) that indicates whether obstructive sleep apnea (OSA) is present (see Table 1 for a detailed presentation of these features). The study also recorded each individual's Friedman tongue position (FTP), which encompasses four distinct positions, as depicted in Figure 2. In addition, the Epworth scale, which is used to assess sleepiness, was collected. The scale details are presented in Table 2. Notably, the dataset is imbalanced, with 357 patients identified as positive cases with OSA and 263 individuals identified without OSA. Table 3 provides a comprehensive overview of the dataset's characteristics.

Data Preprocessing
In machine-learning-based classification, data preprocessing encodes the data into a form that the learning algorithm can analyze readily, so that the features of the data can be interpreted easily by the algorithm. Data preprocessing is a vital step in any machine learning process [32]. This process aims to reduce unexpected behavior during learning, thereby enhancing the machine learning algorithm's performance [33,34]. A set of operations such as data cleaning, data transformation, and data reduction are usually involved in data preprocessing. Specifically, the main preprocessing steps used in this research are the following: filling in missing values, data grouping, normalization, and feature selection.

Missing Data
Datasets very commonly have missing elements in their rows or columns. Values can go missing because of a failure to collect accurate data during the data collection process or because of a particular data validation rule. There are several methods to handle missing data. They include the following:
• If more than 50% of the values in a row or column are missing, the whole row or column has to be removed, except where it is feasible to fill in the missing values.
• If only a reasonable percentage of values is missing, simple interpolation methods can be adopted to fill in those values. Interpolation methods include filling missing values with the mean, median, or mode value of the respective feature.
In this work, we applied a statistical imputation approach [35,36]. All missing values for each attribute were replaced with a statistical measure that was calculated from the remaining values for that attribute. The statistics used were the mean and mode for the numeric and nominal features, respectively. These methods were chosen because they are fast, easy to implement, prevent information loss, and work well with a small dataset. Figure 3 depicts an example of missing value imputation.
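The mean and mode imputation described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name, the use of `None` to mark a missing value, and the sample values are all assumptions made for the example and are not taken from the dataset.

```python
from statistics import mean, mode

def impute_column(values, kind):
    """Fill missing entries (None) in one feature column.
    kind="numeric" uses the mean of the observed values;
    kind="nominal" uses their mode."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if kind == "numeric" else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical columns, for illustration only.
bmi = [31.0, None, 27.0, 35.0]
snoring = ["yes", "yes", None, "no"]

bmi_filled = impute_column(bmi, "numeric")
snoring_filled = impute_column(snoring, "nominal")
```

The fill value is computed from the observed entries only, so the imputed column keeps the same mean (numeric) or most frequent category (nominal) as before.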

Data Normalization
Data normalization is the process of standardizing numerical attributes into a common scale [37,38]. This operation is strongly recommended for the machine learning process to avoid any bias towards dominant features. Min-max normalization was applied in this research to rescale every numerical feature value into a number within [0,1]. For every feature, every value $x$ is transformed into $x_n$ using the formula given in Equation (1).

$$x_n = \frac{x - \min}{\max - \min} \tag{1}$$

where $x_n$ is the normalized value of $x$, and min and max represent the minimum and maximum value of the feature, respectively.
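As a small worked illustration of min-max normalization to [0,1], the sketch below assumes the usual form (x - min) / (max - min). The function name and the treatment of a constant feature (mapped to 0.0 to avoid division by zero) are assumptions, not details given in the text.

```python
def min_max_normalize(values):
    """Rescale one numeric feature into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical ages, for illustration only.
ages = [19, 50, 88]
normalized = min_max_normalize(ages)
```

The smallest value always maps to 0 and the largest to 1, so features measured in very different units end up on a comparable scale.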

Role of Grouping in OSA Diagnosis
Recently, there have been many research efforts toward understanding the relationship between sex, age, and ethnicity in the diagnosis of sleep apnea.
Several research articles have explored the concept of data grouping. For example, in a study conducted by Mohsenin et al. [39], the authors examined the relationship between gender and the prevalence of hypertension in individuals with obstructive sleep apnea (OSA). The study, based on a large cohort of patients assessed at the Yale Center for Sleep Medicine, investigated how gender influences the likelihood of hypertension in OSA patients. The results revealed that hypertension rates increased with age and the severity of OSA, with obese men in the clinic-based population being at approximately twice the risk of hypertension compared to women. Similarly, another study by Freitas et al. [40] investigated the impact of gender on the diagnosis and treatment of OSA.
The study conducted by Ralls et al. [41] delved into the roles of gender, age, race/ethnicity, and residential socioeconomics in OSA syndromes. The research reviewed the existing literature and shed light on several intriguing findings. OSA was found to predominantly affect males, while women exhibited lower apnea-hypopnea index (AHI) values than men during specific sleep stages. Interestingly, women required lower levels of continuous positive airway pressure (CPAP) for treating OSAs of similar severities. The study also highlighted the impact of environmental factors, such as obesity, craniofacial structure, lower socioeconomic status, and residing in disadvantaged neighborhoods, on the prevalence and severity of OSA among different ethnic and racial groups.
In a research paper by Slaats et al. [42], an investigation was conducted to explore the relationship between ethnicity and OSA, which specifically focused on upper airway (UA) morphology, including Down syndrome. The findings of the study revealed that black African (bA) children exhibited a distinct upper airway morphology and were more prone to experiencing severe and persistent OSA compared to Caucasian children. This suggests that ethnicity plays a role in the susceptibility to OSA and highlights the importance of considering ethnic differences in diagnosing and managing the condition.
The Victoria Sleep Cohort study, as discussed in Irene et al. [43], investigated the gender-related impact of OSA on cardiovascular diseases. The study found consistent evidence linking OSA with cardiovascular risk, with a particular emphasis on men with OSA. The authors highlighted that the relationship between OSA and cardiovascular risk is influenced by gender, thereby indicating the need for tailored OSA treatment approaches for men and women. Additionally, Mohsenin et al. [44] conducted a study examining the effect of obesity on pharyngeal size separately for men and women, thus providing insights into the influence of obesity on the upper airway in OSA patients of different genders.
One of the main objectives of this study is to investigate the detection of OSA before and after grouping data based on demographic variables such as age, gender, and race. Accordingly, the original data was grouped by ethnicity (Caucasian and Hispanic), gender (males and females), and age (age ≤ 50 or age > 50). Consequently, six datasets were investigated: Caucasian, Hispanic, females, males, age ≤ 50, and age > 50. Table 3 shows the data distribution after grouping. Figure 4 shows the distribution of apnea and no apnea with respect to the age, gender, and race attributes for all datasets.
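The grouping step can be sketched as a set of filters over patient records. The record layout (dictionaries with `race`, `sex`, and `age` keys), the category spellings, and the sample records are hypothetical and chosen only to make the example self-contained.

```python
def group_patients(records):
    """Split patient records into the six subsets described in the text.
    The groups overlap across attributes: every patient appears once per
    attribute (race, sex, age) for which they match a group."""
    return {
        "Caucasian": [r for r in records if r["race"] == "Caucasian"],
        "Hispanic": [r for r in records if r["race"] == "Hispanic"],
        "females": [r for r in records if r["sex"] == "F"],
        "males": [r for r in records if r["sex"] == "M"],
        "age<=50": [r for r in records if r["age"] <= 50],
        "age>50": [r for r in records if r["age"] > 50],
    }

# Made-up records, for illustration only.
patients = [
    {"race": "Caucasian", "sex": "M", "age": 62},
    {"race": "Hispanic", "sex": "F", "age": 45},
    {"race": "Caucasian", "sex": "F", "age": 50},
]
groups = group_patients(patients)
group_sizes = {name: len(rows) for name, rows in groups.items()}
```

Each subset is then treated as its own dataset, so a separate classifier can be trained and evaluated per group.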

Wrapper Feature Selection
Feature selection (FS) plays a crucial role in data mining, wherein it serves as a preprocessing phase to identify and retain informative patterns/features while excluding irrelevant ones. This NP-hard optimization problem has significant implications in data classification, as selecting valuable features can enhance the classification accuracy and reduce computational costs [45,46]. FS methods can be categorized into two families based on the criteria used to evaluate the selected feature subset: these include filters and wrappers [46,47]. Filter FS techniques employ scoring matrices to assign weights to features, such as mutual information or chi-square tests. Features with weights below a threshold are then eliminated from the feature set. On the other hand, wrapper FS methods utilize classification algorithms such as SVM or linear discriminant analysis to assess the quality of the feature subsets generated by a search method [48,49].
Generally, wrapper FS approaches tend to yield higher classification accuracy by leveraging dependencies among features within a subset. In contrast, filter FS methods may overlook such dependencies. However, wrapper FS comes with a higher computational cost compared to filter FS [50].
Feature subset generation involves the search for a highly informative subset of features from a set of patterns. Various search strategies, such as heuristic, complete, and random, are employed for this purpose [51][52][53]. The complete search involves generating and examining all possible feature subsets in the search space to identify the most informative one. However, this approach becomes computationally infeasible for large datasets due to the exponential growth of subsets. For instance, if a dataset has 31 features, the complete search would generate $2^{31}$ subsets for evaluation. Random search, as the name implies, randomly explores the feature space to find subsequent feature subsets [54]. Although random search can, in some cases, generate all possible feature subsets similar to a complete search [45,55], it lacks a systematic search pattern.
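The exponential growth of the complete search can be made concrete with a short sketch. The feature names below are hypothetical; the enumeration is only practical for a handful of features, which is the point of the example.

```python
from itertools import combinations

def all_subsets(features):
    """Yield every subset of the given features, including the empty one."""
    for k in range(len(features) + 1):
        yield from combinations(features, k)

# Three hypothetical features give 2**3 = 8 subsets.
small = ["age", "bmi", "neck"]
subsets = list(all_subsets(small))

# With 31 features, the same enumeration would visit 2**31 subsets.
full_space = 2 ** 31  # 2147483648
```

Since a wrapper method trains and evaluates a classifier for every candidate subset, evaluating roughly 2.1 billion subsets is impractical, which motivates the heuristic search discussed next.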
In contrast, heuristic search is a different approach to feature subset generation. It is characterized by iteratively improving the quality of the solution (i.e., a feature subset) based on a given heuristic function, thereby aiming to optimize a specific problem [56]. While heuristic search does not guarantee finding the best solution, it can often find good solutions within reasonable memory and time constraints. Several metaheuristic algorithms, such as particle swarm optimization (PSO) [57], ant colony optimization (ACO) [58], the firefly algorithm (FA) [59], ant lion optimization (ALO) [60], the whale optimization algorithm (WOA) [61], and the grey wolf optimizer (GWO) [62], have demonstrated their effectiveness in addressing feature subset selection problems. Examples of FS approaches can be found in [63][64][65][66][67][68][69].
This paper presents a wrapper feature selection approach based on particle swarm optimization (PSO) [70]. The main concept behind PSO is to simulate the collective behavior of bird flocking. The algorithm initializes a group of particles (solutions) that explore the search space in order to find the optimal solution for a given optimization problem. Each particle in the population adjusts its velocity and position based on the best solution found so far within the swarm. By considering the best particle, each individual particle updates its velocity and position according to specific rules, as outlined in Equations (2) and (3).

$$v_i^j(m+1) = \omega_1 v_i^j(m) + c_1 r_1 \left(pbest_i^j - x_i^j(m)\right) + c_2 r_2 \left(gbest^j - x_i^j(m)\right) \tag{2}$$

$$x_i^j(m+1) = x_i^j(m) + v_i^j(m+1) \tag{3}$$

where $m$ denotes the current generation, and $\omega_1$ is a parameter, named the inertia weight, that is used for controlling the global search and local search tendencies. $v_i^j(m)$ denotes the current velocity at generation $m$ for the $j$-th dimension of the $i$-th particle, and $x_i^j(m)$ denotes the current position of the $i$-th particle for the $j$-th dimension. $r_1$ and $r_2$ are two uniformly distributed random numbers in (0,1), and $c_1$ and $c_2$ are known as acceleration coefficients. $pbest$ is the best solution that particle $i$ has found so far, and $gbest$ refers to the best solution found within the population so far.
The PSO-based wrapper feature selection approach described in the paper utilizes this algorithm to search for an effective feature subset that improves the performance of the chosen optimization problem. For further details, please refer to [70,71].
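A single velocity and position update for one particle can be sketched as follows. The sketch assumes the standard PSO update form (an inertia term plus attraction toward pbest and gbest); the function name and the default values of `w`, `c1`, and `c2` are illustrative assumptions, not settings reported in this paper.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, rng=random):
    """Return the updated (position, velocity) of one particle.
    x, v, pbest, gbest are lists with one entry per dimension j."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # fresh random numbers per dimension
        vj = (w * v[j]
              + c1 * r1 * (pbest[j] - x[j])   # pull toward the particle's own best
              + c2 * r2 * (gbest[j] - x[j]))  # pull toward the swarm's best
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v

# A particle already sitting on both bests keeps only its inertia term.
pos, vel = pso_step([2.0], [1.0], pbest=[2.0], gbest=[2.0], w=0.5,
                    rng=random.Random(0))
```

In this toy call the attraction terms vanish, so the new velocity is just `w * v`, which shows how the inertia weight damps the previous motion.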
To adapt the original PSO algorithm for discrete or binary search spaces, a modified binary version was introduced in [57]. The primary step in this transformation is the utilization of a sigmoid (transfer) function, as shown in Equation (4), to convert the real-valued velocities into probability values ranging from 0 to 1. The objective is to adjust the particle's position based on the probability defined by its velocity. This allows for the representation of binary or discrete variables within the PSO framework.

$$T\left(v_i^j(m)\right) = \frac{1}{1 + e^{-v_i^j(m)}} \tag{4}$$

where $v_i^j(m)$ refers to the velocity of particle $i$ at iteration $m$ in the $j$-th dimension. The updating process for the S-shape group is presented in Equation (5) for the next iteration $m + 1$. After that, the position vectors can be updated based on the probability values of their velocities as follows:

$$x_i^j(m+1) = \begin{cases} 1, & \text{if } r < T\left(v_i^j(m+1)\right) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $r$ is a uniformly distributed random number in [0,1].
The basic version of the BPSO suffers from some drawbacks, such as trapping in local minima. Mirjalili and Lewis [71] proposed a modified version of the BPSO in which transfer functions for mapping the continuous search space into a binary one were employed. The aim of introducing these functions is to avoid the problem of local optima and to improve the convergence speed. In this work, we employed the S-shaped transfer functions proposed in [71] for converting the PSO into binary. We examined these functions with the PSO algorithm to choose the most appropriate one. Table 4 presents the utilized transfer functions, and Figure 5 shows the shapes of these transfer functions.
Table 4. S-shaped transfer functions. The table provides the names and formulas of four S-shaped functions (S1, S2, S3, and S4). These functions exhibit the characteristic sigmoidal shape, which is commonly observed in S-shaped curves.
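The role of an S-shaped transfer function in the binary update can be sketched as below. The four formulas are the ones commonly given for the S-shaped family in the binary PSO literature; since the table entries are not reproduced in this text, they should be read as assumptions. The rule that sets a bit to 1 when the random number falls below the transfer value, the clamping of the exponent, and all names are likewise illustrative choices.

```python
import math
import random

def _sigmoid(z):
    """Logistic function with the argument clamped to avoid overflow."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

# Assumed S-shaped family: steeper (S1) to flatter (S4).
S_SHAPED = {
    "S1": lambda v: _sigmoid(2.0 * v),
    "S2": lambda v: _sigmoid(v),        # the plain sigmoid
    "S3": lambda v: _sigmoid(v / 2.0),
    "S4": lambda v: _sigmoid(v / 3.0),
}

def binary_position(velocity, transfer, rng=random):
    """Sample a binary position: bit j is 1 with probability transfer(v_j).
    Assumption: a bit is set when the random draw is below the transfer value."""
    return [1 if rng.random() < transfer(vj) else 0 for vj in velocity]

probs_at_one = [S_SHAPED[name](1.0) for name in ("S1", "S2", "S3", "S4")]
bits = binary_position([100.0, 100.0], S_SHAPED["S2"], rng=random.Random(0))
```

A steeper function such as S1 pushes probabilities toward 0 or 1 for moderate velocities, while a flatter one such as S4 keeps more randomness in the bits, which is one way to trade exploitation against exploration.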

Formulation of Feature Selection Problem
FS is typically treated as a binary optimization problem, where candidate solutions are represented as binary vectors. To address this, a binary optimizer such as binary particle swarm optimization (BPSO) can be utilized. This work proposes a wrapper FS method that combines the BPSO as the search strategy and a classifier (e.g., kNN) to evaluate the quality of the feature subsets generated by the BPSO. In the FS problem, a solution is encoded as a binary vector with a length equal to the total number of features in the dataset. Each element in the vector represents a feature, where a value of zero indicates the exclusion of the corresponding feature, and a value of one indicates its inclusion or selection.
The paper introduces four FS approaches based on different binary variants of PSO, with each utilizing a specific S-shaped transfer function to convert continuous values into binary ones. The FS is considered to be a multi-objective optimization problem that aims to achieve both high classification accuracy and a low number of features. These two contradictory objectives are combined into a single fitness function to be minimized, as given in Equation (6) [46,64].

$$\text{Fitness} = \alpha \cdot er + \beta \cdot \frac{F}{N} \tag{6}$$

where $er$ indicates the error rate of the utilized classification algorithm (e.g., kNN) over a subset of features produced by the BPSO optimizer. $F$ is the number of selected features, and $N$ denotes the number of all the features. $\alpha = 0.99$ and $\beta = 0.01$ [72,73] are two controlling parameters to balance the importance of both objectives.
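The trade-off between error rate and subset size can be illustrated with a short sketch. It assumes the common weighted-sum form, alpha * er + beta * (F / N), to be minimized; the function name, the infinite penalty for an empty subset, and the example numbers are assumptions made for illustration.

```python
def fitness(error_rate, mask, alpha=0.99, beta=0.01):
    """Score a candidate feature subset (lower is better).
    mask is the binary solution vector: 1 keeps a feature, 0 drops it.
    error_rate is the classifier's error on the selected features."""
    n_total = len(mask)
    n_selected = sum(mask)
    if n_selected == 0:
        # Assumption: an empty subset cannot be classified, so penalize it.
        return float("inf")
    return alpha * error_rate + beta * (n_selected / n_total)

# Two hypothetical candidates with the same error but different sizes.
compact = fitness(0.25, [1, 0, 1, 0])   # 2 of 4 features
full = fitness(0.25, [1, 1, 1, 1])      # all 4 features
```

With alpha = 0.99 and beta = 0.01, the error rate dominates the score, and the subset size acts mainly as a tie-breaker between subsets of similar accuracy.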

Experimental Setup
It is well known that no single machine learning algorithm performs best for all problems, as suggested by the No Free Lunch (NFL) theorem [74]. This motivated our attempts to examine various fixed and adaptive classification algorithms to identify the most applicable one for OSA. In the experiment, various classification methods were tested. However, only those classifiers with better performances are reported. Correspondingly, we adopted the decision tree (DT), naive Bayes (NB), k-nearest neighbor (kNN), support vector machine (SVM), fine decision tree (FDT), coarse decision tree (CDT), linear discriminant analysis (LDA), logistic regression (LR), Gaussian naive Bayes (GNB), kernel naive Bayes (KNB), linear support vector machine (LSVM), medium Gaussian support vector machine (MGSVM), coarse Gaussian support vector machine (CGSVM), cosine k-nearest neighbor (CKNN), weighted k-nearest neighbor (WKNN), and subspace discriminant (Ensemble) classifiers for performance validation. The detailed parameter settings for these classification methods are presented in Table 5. Moreover, the kNN and SVM with hyperparameter optimization settings (see Tables 6 and 7) were also employed in this work.
In this study, K-fold cross-validation with K = 10 was employed for performance evaluation instead of hold-out validation. K-fold cross-validation estimates the generalization error from different combinations of training and testing sets, so every sample is used for testing exactly once. To assess the performance of the machine learning models, multiple metrics were utilized, including the accuracy, true positive rate (TPR), true negative rate (TNR), area under the curve (AUC), precision, F-score, and G-mean.
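As an illustration of the evaluation protocol, the sketch below generates 10-fold train/test index splits and computes the threshold-based metrics from confusion-matrix counts. It assumes the standard definitions (TPR = TP/(TP+FN), TNR = TN/(TN+FP), F-score as the harmonic mean of precision and TPR, G-mean as the square root of TPR·TNR); the helper names `kfold_indices` and `binary_metrics` are hypothetical, and the split is a plain shuffled partition rather than whatever procedure the original toolchain used.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold partition."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, TPR, TNR, precision, F-score, and G-mean."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumption: undefined ratios -> 0

    tpr = ratio(tp, tp + fn)
    tnr = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_score": ratio(2 * precision * tpr, precision + tpr),
        "g_mean": math.sqrt(tpr * tnr),
    }


# Example: fold sizes for 620 samples, and metrics for one toy confusion matrix.
print([len(test) for _, test in kfold_indices(620)])
print(binary_metrics(tp=8, fn=2, tn=6, fp=4))
```

The AUC is omitted here because it requires classifier scores rather than confusion-matrix counts.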

Experimental Results
The following sections present the evaluation results obtained using the complete dataset and the datasets grouped by race, gender, and age.

Results with All Data
The experiment was conducted in eight phases. In the first phase, we analyzed the performance of different classification algorithms on the complete dataset. The DT, LDA, LR, NB, SVM, kNN, Ensemble, optimized kNN, and optimized SVM algorithms provided the best results in this analysis. Thus, only the results of these classifiers are reported, as shown in Table 8. As illustrated, each classification algorithm produced a different result. Compared with the other classifiers, the optimized classifiers (SVM* and kNN*) attained the highest accuracies of 0.7226 and 0.7409, respectively. Our findings suggest that the optimized classifiers achieved the best performance in the sleep apnea classification. The kNN* offered the best result, with an accuracy of 0.7409, a TPR of 0.8322, an AUC of 0.7321, a precision of 0.7294, an F-score of 0.7774, and a G-mean of 0.7252. Moreover, the kNN* achieved the best mean rank of 1.14, thus suggesting that the kNN* was the best classifier when the complete dataset was used.
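The mean rank used to compare classifiers can be sketched as follows, under the assumption that each classifier is ranked separately on every metric (rank 1 for the highest value) and the ranks are then averaged. The tie-handling rule (tied classifiers share the average rank) and the toy values in the usage example are assumptions for illustration only, not results from the study.

```python
def rank_descending(scores):
    """Rank values so the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def mean_ranks(table):
    """table maps classifier name -> list of metric values (same order)."""
    names = list(table)
    n_metrics = len(table[names[0]])
    totals = {name: 0.0 for name in names}
    for m in range(n_metrics):
        column = [table[name][m] for name in names]
        for name, r in zip(names, rank_descending(column)):
            totals[name] += r
    return {name: totals[name] / n_metrics for name in names}


# Toy example with three hypothetical classifiers and three metrics.
toy = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.9, 0.6], "C": [0.7, 0.7, 0.5]}
print(mean_ranks(toy))
```

Under this reading, a mean rank of 1.14 over seven metrics would be consistent with a classifier ranking first on six metrics and second on one (8/7 ≈ 1.14).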

Data Grouping with Race
In the second phase, we inspected the performance of different classification algorithms on the data grouped by race. Tables 9 and 10 show the performance of the classification algorithms on the Caucasian and Hispanic data, respectively. From Table 9, the highest accuracy of 0.7483 was obtained by the CKNN and kNN*. In terms of the AUC value, the kNN* attained the best AUC of 0.7114, which indicates better performance in discriminating between the classes. Moreover, the kNN* yielded the best mean rank of 2.29. The results in Table 10 show that the kNN* scored the highest accuracy, TPR, TNR, AUC, precision, F-score, and G-mean. The kNN* thus proved to be the best algorithm in this analysis, and the mean ranks in both Tables 9 and 10 support this conclusion.

Data Grouping with Gender
The behavior of the different classification methods on the data grouped by gender is studied in this subsection. Tables 11 and 12 outline the evaluation results of the classification algorithms. According to Table 11, the best accuracy of 0.7458 was achieved by the WKNN and Ensemble classifiers. However, the Ensemble classifier offered the best mean rank of 2.43, which indicates strong results for the grouped data of females. The results in Table 12 show that the kNN* performed best. The kNN* ranked first (mean rank = 1.14) and offered the highest accuracy of 0.6987, the highest TNR of 0.7500, the best AUC of 0.6875, a precision of 0.6349, an F-score of 0.6299, and a G-mean of 0.6847.

Data Grouping with Age
In the fourth phase, we investigated the performance of the different classification algorithms on the data grouped by age (age ≤ 50 or age > 50). Note that the age was normally distributed around 50. Table 13 shows the evaluation results for age ≤ 50. As can be seen, the SVM* obtained the highest accuracy of 0.7523, followed by the kNN* (0.7431). Correspondingly, the SVM* achieved the best TNR, AUC, precision, and G-mean. On the other hand, the evaluation results for age > 50 are tabulated in Table 14. As shown, the kNN* achieved the best accuracy of 0.7333. In addition, the kNN* ranked first, with the highest AUC, precision, F-score, and G-mean. Our findings indicate that the algorithms with hyperparameter optimization (SVM* and kNN*) achieved the best performance in the sleep apnea classification.
Table 15 summarizes the overall ranking results for all classifiers, and the bar chart of the overall ranking is shown in Figure 6. As illustrated in the results, the SVM* and kNN* offered the best ranking in most cases. Among the classifiers, the SVM* and kNN* attained the best average ranks of 3.09 and 1.80, respectively. The experimental results reveal the advantage of the optimized algorithms for the classification of sleep apnea. The observed improvement in the kNN* and SVM* is attributed to the hyperparameter optimization performed during training, which enabled the models to capture the target concepts better.

Summary Performance with Data Grouping
In the fifth part of the experiment, we studied the impact of grouping (race, gender, and age) on the performance of different classifiers. Table 16 depicts the performance evaluation before and after grouping. One can see that the performances of the classifiers were substantially improved when the grouping was implemented, especially for the data grouped by race (Caucasian and Hispanic).
From the analysis, it can be inferred that the grouping step is beneficial for performance improvement. As observed in the results, the Caucasian group yielded the most accurate sleep apnea classification, with the best mean rank of 1.33. Based on these findings, we conclude that grouping the data by race (Caucasian) maximizes the separability of the features between classes. Furthermore, Table 17 reports the running time (in seconds). Across all the datasets, the fastest algorithm was the DT (rank of 1.29), followed by the LDA (rank of 2.14).

Feature Selection
In the sixth phase, we investigated the impact of feature selection techniques for all cases. Generally speaking, data dimensionality has a large impact on the machine learning development process. High-dimensional data not only contain irrelevant and redundant features that can negatively affect the accuracy, but also require substantial time and computational resources [75]. Hence, feature selection can be an effective way to resolve this issue while improving the performance of the learning model. In this research, we adopted a widely used feature selection method, binary particle swarm optimization (BPSO), to identify the significant features in the high-dimensional feature space. It is worth noting that the kNN* was employed as the learning algorithm, since it obtained the best performance in the previous analysis.

Evaluation of BPSO Using Different TFs
Initially, the BPSO with different S-shaped transfer functions (TFs) was studied. TFs play an essential role in converting a continuous solution into binary form; in other words, they enable the particles to search the binary feature space. However, different TFs may yield different results [71]. Thus, we evaluated the BPSO with four different TFs to find the best one. Table 18 shows the average accuracy of the BPSO variants. Based on the results obtained, the BPSO1 achieved the highest accuracy for all five cases except the complete dataset. According to Table 19, the BPSO1 also yielded the smallest number of selected features in most cases. Our results imply that the BPSO1 was highly capable of finding a near-optimal feature subset, thereby enhancing the learning model's performance for sleep apnea classification. The mean ranks reported in Tables 18 and 19 support this conclusion.
Table 20 tabulates the running time (in seconds) of the BPSO variants. As can be observed, the BPSO1 often found the near-optimal solution faster, while the BPSO2 was the slowest. Overall, the BPSO1 was shown to be the best variant, and it was employed in the rest of the experiment.
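The role of an S-shaped transfer function can be sketched as follows. The four sigmoid variants below are the S1 to S4 family commonly used in the binary PSO literature; they are shown as an assumption, since this section does not state the exact definitions used by BPSO1 to BPSO4 or which function each variant uses. The velocity clamp `v_max` and the helper name `binarize` are likewise illustrative.

```python
import math
import random

# Commonly used S-shaped transfer functions (assumed, not taken from the paper).
S_SHAPED = {
    "S1": lambda v: 1.0 / (1.0 + math.exp(-2.0 * v)),
    "S2": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "S3": lambda v: 1.0 / (1.0 + math.exp(-v / 2.0)),
    "S4": lambda v: 1.0 / (1.0 + math.exp(-v / 3.0)),
}


def binarize(velocity, transfer, rng, v_max=6.0):
    """Map a continuous velocity vector to a binary feature mask.

    Each velocity is clamped to [-v_max, v_max] (assumed bound), passed
    through the transfer function to obtain a probability, and the bit is
    set to 1 (feature selected) when a uniform random draw falls below it.
    """
    mask = []
    for v in velocity:
        v = max(-v_max, min(v_max, v))
        mask.append(1 if rng.random() < transfer(v) else 0)
    return mask


rng = random.Random(42)
velocity = [2.5, -1.0, 0.0, 4.0, -3.5]
for name, tf in S_SHAPED.items():
    print(name, binarize(velocity, tf, rng))
```

A steeper function such as S1 pushes the selection probabilities toward 0 or 1 more quickly as the velocity grows, whereas a flatter one such as S4 keeps the search more exploratory, which is one reason different TFs can lead to different results.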

Comparison of BPSO with Well-Known Algorithms
In this subsection, the performance of the BPSO1 was further compared with seven other state-of-the-art methods. The comparison algorithms are the binary Harris hawk optimization (BHHO) [73], the binary gravitational search algorithm (BGSA) [76], the binary whale optimization algorithm (BWOA) [77], the binary grey wolf optimization (BGWO) [78], the binary bat algorithm (BBA) [79], the binary ant lion optimizer (BALO) [78], and the binary moth-flame optimization (BMFO) [48]. Table 21 presents the average accuracy results obtained by the eight algorithms. From Table 21, it is seen that the BPSO1 outperformed the other methods in tackling the feature selection problem. The results show that the BPSO1 attained the best mean rank of 1.57, followed by the BHHO (2.29).
Among the groups (race, gender, and age), the data grouped by age benefited most from the BPSO1: with feature selection, the performance on the age-grouped data improved substantially. Moreover, Table 22 presents the results of the Wilcoxon signed-rank test, which indicate that the BPSO1 outperformed the other methods in this work. Regarding the number of selected features (Table 24), our results indicate that the best algorithm in feature reduction was the BBA, while the BPSO1 ranked second. In terms of computational complexity, one can see from Table 25 that the BPSO1 again scored the best mean rank of 1.86 across all datasets. The BPSO1 thus offers not only the highest accuracy and a near-minimal number of features, but also the fastest computational speed. Figure 7 illustrates the convergence behavior of the compared algorithms. We can observe that the BPSO1 converged faster and to better solutions in all seven cases, showing an excellent convergence rate against its competitors. This can be attributed to the strong search ability of the BPSO1 algorithm. In contrast, the BBA and BGSA were found to have the lowest performance. They suffered from early stagnation and premature convergence, which reduced the classification performance.

Relevant Features Selected by BPSO
In the seventh phase, we inspected the relevant features selected by the BPSO1 algorithm. Table 26 outlines the best accuracy results of the classifiers with and without the BPSO algorithm. As shown in Table 26, the classification accuracy increased when the BPSO was deployed. The result affirms the importance and effectiveness of the feature selection method in sleep apnea classification. Taking the Caucasian dataset as an example, an increment of roughly 6% accuracy was achieved by the BPSO1 algorithm, with a feature reduction of 56.67%. In the dataset with age ≤ 50, the proposed approach improved the accuracy by at least 11% while eliminating more than half of the irrelevant and redundant features in the dataset. Moreover, the reduction in the feature size contributed to the overall decrease in classifier complexity. Table 27 presents the details of the features selected by the BPSO algorithm. Instead of using all 31 features, the results show that the number of features chosen was 18 for the complete dataset, 13 for the Caucasian dataset, 14 for the Hispanic dataset, 13 for the female dataset, 11 for the male dataset, 15 for the age ≤ 50 dataset, and 17 for the age > 50 dataset. The findings suggest that fewer than 20 features are sufficient for accurate sleep apnea classification. Figure 8 shows the importance of the features in terms of the number of times each feature was chosen by the BPSO. Across all the datasets, the most frequently selected features were f22 and f11, followed by f14 and f8. Correspondingly, these features had high discriminative power and described the OSA better than the others.
In the final part of the experiments, we compared the performance of the BPSO-kNN to the kNN* and other well-known models, including the convolutional neural network (CNN) and multilayer perceptron neural network (MLP). Note that the maximum number of epochs for both the CNN and MLP was set to 150.
Table 28 presents the accuracy and computational time of the BPSO-kNN, CNN, MLP, and kNN* methods. Upon inspecting the results, the BPSO-kNN achieved the highest accuracy for all the datasets. Although the computational cost of the BPSO-kNN was much higher than that of the CNN, MLP, and kNN*, it usually yields a more accurate classification. All in all, our findings affirm the superiority of the BPSO-kNN for the sleep apnea classification.
The previous analysis showed that the performance of the OSA diagnosis can be enhanced by applying the feature selection method. According to Figure 9, the accuracy showed an increment of at least 3% in most datasets. As can be observed, an increment of roughly 10% could be achieved with the feature selection approach for the dataset with age ≤ 50. Irrelevant and redundant features carry little useful information; they degrade the performance of the model and increase the dimensionality of the dataset. By utilizing the BPSO-kNN, most of the unwanted features can be removed while keeping the most informative ones, which supports a better diagnosis of the OSA. In addition, the BPSO-kNN selects the useful features from the dataset automatically, which means it can be applied without prior knowledge or experience. In short, feature selection is an essential and efficient tool for sleep apnea classification.

Comparison Study
To verify the performance of the proposed approach, we compared the obtained results with those reported in preceding work on the same dataset. For this purpose, the proposed BPSO-kNN was compared with the screening tool (NAMES assessment) offered by Subramanian et al. [30]. Table 29 presents the AUC scores of the NAMES assessment using different combinations of features versus the BPSO-kNN. According to the findings, the developed BPSO-kNN outperformed the other methods, with the best AUC of 0.8320. Comparing our proposed model to [27,28], it is clear that the proposed model outperformed the SVM, LR, and ANN models. The results again validate the benefit of the feature selection process. These observations confirm that data grouping and the proper selection of features with an effective classification method can yield better performance for OSA detection.

Conclusions and Future Works
This study proposed an alternative approach to detect obstructive sleep apnea (OSA), which utilized demographic data instead of traditional ECG analysis. Expert physicians and sleep specialists collected a dataset of 31 features from 620 patients at the Torr Sleep Center in Texas, USA. The research focused on evaluating the performance of various machine learning classifiers using fixed and adaptive learning methods, thereby aiming to identify the most suitable classifier for the collected data. The results demonstrated that the kNN classifier achieved the highest accuracy among the tested classifiers. Additionally, a wrapper feature selection method based on the BPSO (binary particle swarm optimization) was employed with the kNN classifier to determine the most relevant features associated with OSA. The experimental outcomes indicate that the proposed method enhanced the overall prediction performance for OSA. As part of future work, the investigation will expand to include several wrapper feature selection methods, such as binary genetic algorithms (BGA) and binary ant colony optimization (BACO), thus aiming to assess the performance of the kNN classifier with different feature selection techniques.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.