Screening for Obstructive Sleep Apnea Risk by Using Machine Learning Approaches and Anthropometric Features

Obstructive sleep apnea (OSA) is a global health concern and is typically diagnosed using in-laboratory polysomnography (PSG). However, PSG is highly time-consuming and labor-intensive. We, therefore, developed machine learning models based on easily accessed anthropometric features to screen for the risk of moderate to severe and severe OSA. We enrolled 3503 patients from Taiwan and determined their PSG parameters and anthropometric features. Subsequently, we compared the mean values among patients with different OSA severity and considered correlations among all participants. We developed models based on the following machine learning approaches: logistic regression, k-nearest neighbors, naïve Bayes, random forest (RF), support vector machine, and XGBoost. Collected data were first independently split into two data sets (training and validation: 80%; testing: 20%). Thereafter, we adopted the model with the highest accuracy in the training and validation stage to predict the testing set. We explored the importance of each feature in the OSA risk screening by calculating the Shapley values of each input variable. The RF model achieved the highest accuracy for moderate to severe (84.74%) and severe (72.61%) OSA. The level of visceral fat was found to be a predominant feature in the risk screening models of OSA with the aforementioned levels of severity. Our machine learning models can be employed to screen for OSA risk in the populations in Taiwan and in those with similar craniofacial structures.


Introduction
Obstructive sleep apnea (OSA) refers to sleep-disordered breathing caused by partial or complete airway obstruction [1]. This disease has become a global health concern, with approximately one billion people aged 30-65 years being affected by mild-to-severe OSA, and 425 million having moderate to severe OSA [2]. In the United States, the prevalence rate of OSA increased by approximately 30% between 1990 and 2010 [3]. OSA is regarded as a risk factor for various comorbidities, including 2-3-fold increased risks of cardiovascular and metabolic diseases [4], and decreased hippocampal volume, which is associated with neurocognitive deficits [5]. Therefore, early diagnosis of and suitable treatment for OSA are essential.
In-laboratory polysomnography (PSG) is the standard measurement to diagnose OSA and differentiate severity. Specifically, the apnea-hypopnea index (AHI), which records the total number of apnea and hypopnea events during sleep time, is determined using PSG data; the index is used to differentiate between four OSA severity categories: normal (AHI < 5), mild (5 ≤ AHI < 15 events/h), moderate (15 ≤ AHI < 30 events/h), and severe (AHI ≥ 30 events/h) [6]. Curative interventions are generally recommended for patients with moderate or severe OSA (AHI ≥ 15 events/h). Despite its usefulness, PSG has some clinical shortcomings. For example, PSG requires a lengthy monitoring time and the involvement of licensed technicians; thus, the average PSG waiting time in developed countries ranges from months to 2 years [7]. The time-consuming and labor-intensive nature of PSG may limit its efficiency and effectiveness. Alternative methods have been proposed to improve measurement accessibility, including the OSA questionnaires and home sleep tests (HSTs) through oximetry; however, none of them is fully reliable as a surrogate for PSG. A review study indicated that inconsistent results from studies using different OSA questionnaires (Berlin; apnea score; sleep apnea scale of the sleep disorders questionnaire; snoring, tiredness, observed-apnea, and high blood pressure (STOP); STOP including body mass index (BMI), age, neck circumference, and sex) can be attributed to heterogeneity in study design and enrolled populations [8]. Regarding HSTs, despite offering convenient diagnosis, this approach may be insufficiently accurate to rule out OSA when the respiratory events of patients are mainly associated with arousals [9]. Moreover, because of the reduced number of physiological channels in HSTs, this approach may not be suitable for patients with complicated comorbidities [10]. 
Given the aforementioned deficiencies in current methods, novel models to rapidly screen for OSA risk and thereby increase the efficiency of the therapeutic decision-making process are required.
To develop clinically applicable models, exploring the relationships between OSA severity and anthropometric features may be worthwhile. Sex, age, and BMI, for instance, have been suggested as useful indicators in OSA risk screening [11]. In one study, the prevalence of moderate to severe OSA in middle-aged men (aged 30-49 years) and in older men (aged 50-77 years) was 3.3-fold and 1.9-fold higher than the values of female cohorts with the same age ranges, respectively [12]. Another study indicated that those with obesity (BMI: 30-39.9 kg/m²) had higher mean AHI and oxygen desaturation index (ODI) values than those with a healthy weight (AHI: 28.5 ± 1.22 events/h vs. 14.3 ± 1.40 events/h; ODI: 32.1 ± 1.20 events/h vs. 15.8 ± 1.40 events/h, all p < 0.01) [13]. Neck and waist size have also been adopted as proxies for BMI when screening for OSA risk, with a neck size of >43 cm in men or >38 cm in women and a waist size of >102 cm for both sexes indicating increased risk [14]. In another study, neck size (ρ: 0.54), waist size (ρ: 0.75), and body water (ρ: 0.69) were all significantly and positively correlated with AHI [15]. Moreover, body fat level was significantly correlated with AHI (r = 0.65), and abdominal visceral fat level calculated through cross-sectional computed tomography exhibited adequate sensitivity and specificity (p < 0.01) in differentiating between those with OSA and healthy individuals [16]. Hence, these easily acquired anthropometric features may be useful in OSA risk screening models because their associations with AHI have been demonstrated.
In this retrospective study, we sought to develop risk screening models for OSA using machine learning approaches and easily acquired parameters, such as anthropometric features. We hypothesized that anthropometric features (e.g., body profile and body composition parameters), which are associated with OSA severity, would be beneficial in models for screening the risk of moderate to severe OSA (AHI ≥ 15) and severe OSA (AHI ≥ 30). We developed OSA risk screening models using various machine learning approaches that incorporated easily accessed anthropometric features. Subsequently, we compared the means of the obtained anthropometric features in groups with different OSA severity. We also examined the correlations between anthropometric features and sleep quality indices. The aim of these analyses was to elucidate the relationships between these variables.

Ethics
The Ethics Committee of the Taipei Medical University-Joint Institutional Review Board reviewed and approved the protocol of this retrospective study (TMU-JIRB No: N201911007). All relevant procedures for data collection, analysis, and preservation were conducted per the approved protocol.

Study Population
We retrospectively collected the data of patients who underwent PSG for OSA severity assessment at the Sleep Center of Taipei Medical University-Shuang Ho Hospital (New Taipei City, Taiwan) between May 2019 and December 2021. The inclusion criteria for data use were as follows: age between 18 and 90 years, overall PSG recording time of >6 h and sleep efficiency of >60%, no history of invasive surgery for OSA, and no regular use of hypnotic or psychotropic medications. Using the medical registration number list of the eligible individuals, we acquired their physical profiles, which included information on age, sex, body mass index (BMI), and neck and waist circumferences, from their responses to a baseline survey questionnaire recorded in a sleep center database. Next, we obtained data regarding the participants' medication and surgical history from their clinical records. Because of known correlations between OSA severity and craniofacial features, we collected data from only Han individuals to limit the effect of craniofacial feature disparities [17].

Body Composition
Body composition data were collected from the aforementioned sleep center database. The procedures used for determining body composition are described below. Before the patients underwent PSG, we measured their body compositions (through bioelectrical impedance) using the Tanita MC-780 system (Tanita, Tokyo, Japan). Before the measurement, the patients fasted for 3 h and emptied their bladder. During data reading, the patients were instructed to stand still and hold the detection handles with both arms straight down while ensuring their inner thighs did not touch. Fat mass and fat-free mass (comprising bone and muscle mass) in various body regions (whole body, only limbs, and only trunk) were assessed, and the percentages of fat and muscle in the aforementioned regions were subsequently derived. Visceral fat level (as an index for evaluating fat encompassing the vital organs in the abdominal cavity; range: 1-55), basal metabolic rate (as the minimum energy required by the body at rest), and physique rating (body fat mass divided by muscle mass) were also determined. To evaluate water distribution in the body, the volume of total body water (TBW), including the volumes of extracellular water (ECW) and intracellular water (ICW), percentage of body water, ratio between ECW and ICW, and ratio between trunk fat and whole-body fat were determined. All the derived parameters were used in further analyses.

Sleep Parameter
Sleep parameters were selected from the PSG database. The procedures used for PSG are described below. In-laboratory PSG was conducted using the ResMed Embla N7000 (ResMed, San Diego, CA, USA) and Embla MPR (Natus Medical, Pleasanton, CA, USA) systems. The PSG recorded various physiological signals, namely electroencephalography, electrooculography, electromyography (chin and leg), electrocardiography, nasal and oral airflow, snoring patterns, thoracic and abdominal impedance, sleeping position, and oxygen saturation. A licensed PSG technologist scored recordings using RemLogic software (version 3.41, Embla, Thornton, CO, USA), following the American Academy of Sleep Medicine Scoring Manual Version 2.4 [18]. All the scored results were reviewed by another technologist, and inconsistent scorings were identified and discussed further to achieve a consensus. We determined the distribution of each sleep stage, namely wake, rapid eye movement (REM), and non-REM (NREM), and we subsequently calculated the wake after sleep onset (WASO) accumulation time. OSA severity was classified by AHI into four levels: normal (AHI < 5 events/h), mild (5 ≤ AHI < 15 events/h), moderate (15 ≤ AHI < 30 events/h), and severe (AHI ≥ 30 events/h) [6]. For patients with AHI ≥ 15 events/h, OSA intervention was recommended [19]. We thus developed two types of risk screening models, one for the risk of moderate to severe OSA (AHI ≥ 15 vs. AHI < 15) and the other for the risk of severe OSA (AHI ≥ 30 vs. AHI < 30).
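The AHI thresholds above translate directly into labels. The following is a minimal sketch of that mapping, not the authors' code; the function names `osa_severity` and `screening_labels` and the example AHI values are assumptions introduced for illustration.

```python
def osa_severity(ahi: float) -> str:
    """Map an AHI value (events/h) to the four severity levels in the text."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


def screening_labels(ahi: float) -> dict:
    """Binary targets for the two screening models (1 = at risk)."""
    return {
        "moderate_to_severe": int(ahi >= 15),  # AHI >= 15 vs. AHI < 15
        "severe": int(ahi >= 30),              # AHI >= 30 vs. AHI < 30
    }


# Made-up AHI values chosen to sit on either side of each cutoff.
for ahi in [3.2, 14.9, 15.0, 29.9, 30.0]:
    print(ahi, osa_severity(ahi), screening_labels(ahi))
```

Each patient therefore carries two binary targets, one per screening model, derived from the same AHI value.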

Statistical Analysis
We employed Python (version 3.9.7) and an open-source module, scikit-learn (version 0.21.2), to perform the statistical analyses. Patients were split into three groups according to OSA severity: normal-to-mild, moderate, and severe OSA groups. For continuous variables, the Shapiro-Wilk test was first used to examine the normality of their distribution. We employed nonparametric statistical approaches because the grouped data were nonnormally distributed. Subsequently, we used Levene's test to examine the homogeneity of variance, followed by the Kruskal-Wallis test (for homoscedastic data) or Welch's analysis of variance test (for heteroscedastic data). Regarding nominal variables, we used the chi-square test to compare intergroup differences. In addition, Pearson's correlation was applied to determine the correlations between anthropometric features and sleep quality indices, namely AHI, ODI, snoring index, and arousal index. The level of statistical significance was set at p < 0.05.
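As an illustration of the three-group comparison, the sketch below computes the tie-corrected Kruskal-Wallis H statistic using only the standard library. It covers only the Kruskal-Wallis branch of the procedure and is not the authors' code; the helper names and the group values are made up. With three groups the statistic has 2 degrees of freedom, for which the chi-square tail probability has the closed form exp(−H/2).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, degrees of freedom) with the standard tie correction."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are identical")
    return h / (1 - tie_term / (n ** 3 - n)), len(groups) - 1


# Hypothetical visceral fat levels for three severity groups.
normal_to_mild = [6, 7, 8, 9, 7.5]
moderate = [9.5, 11, 12, 10, 13]
severe = [14, 15, 12.5, 16, 17]

h_stat, dof = kruskal_wallis([normal_to_mild, moderate, severe])
p_value = math.exp(-h_stat / 2)  # exact chi-square tail for 2 degrees of freedom
print(round(h_stat, 3), dof, p_value < 0.05)
```

The closed-form p-value holds only for exactly three groups; other group counts would need a general chi-square survival function.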

Machine Learning Approaches
Six supervised machine learning models, namely, logistic regression (LR), k-nearest neighbors (kNN), naïve Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were employed to develop the two types of OSA risk screening models. Figure 1 illustrates the flowchart for the model establishment. Initially, the data were independently separated into two data sets (a training and validation set and a testing set) at a ratio of 80% to 20%. We then applied grid search with 10-fold cross-validation during the training and validation stage to determine the optimal classifier for each machine learning approach [20]. Specifically, we compared the accuracy by tuning (a) the inverse values of regularization (C, from 10⁻⁵ to 10⁵) for the LR models; (b) the k value (ranging from 2 to 5) and weight type (uniform or distance) for the kNN models; (c) the portion of the largest variance of all features for the NB models (var_smoothing, from 10⁻⁹ to 10⁹); (d) various kernel types (linear, polynomial, and radial basis function) and regularization values (C, between 10⁻³ and 10³) for the SVM models; (e) the criterion (Gini index or entropy) and the number of classification and regression trees (set as 250, 500, and 750) for the RF models with the bootstrap technique; and (f) the criterion (mean squared error (MSE), Friedman MSE, or squared error) and the number of estimators (set as 250, 500, and 750) for the XGBoost models. The performance matrix and area under the receiver operating characteristic curve (AUC) of each model were then determined. Thereafter, the machine learning approach with the highest accuracy was employed in the testing stage for further evaluation, and the Shapley values of the input variables for the employed models were calculated and visualized in a scatterplot to evaluate the contribution of each feature within the OSA risk screening models [21].
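The tuning procedure can be illustrated with a dependency-free sketch of grid search with 10-fold cross-validation. It uses the kNN grid from item (b) (k from 2 to 5; uniform or distance weights), followed by a single evaluation on the held-out 20%. This is an assumption-laden illustration rather than the authors' scikit-learn pipeline: the data are synthetic, and `knn_predict`, `cv_accuracy`, and `grid_search` are hypothetical helper names.

```python
import itertools
import random


def knn_predict(train_x, train_y, x, k, weights):
    """Vote among the k nearest training points (binary labels 0/1)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5, label)
        for row, label in zip(train_x, train_y)
    )[:k]
    votes = {0: 0.0, 1: 0.0}
    for dist, label in nearest:
        votes[label] += 1.0 if weights == "uniform" else 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)


def cv_accuracy(xs, ys, k, weights, n_folds=10):
    """Mean validation accuracy; fold f holds out samples f, f+n_folds, ..."""
    fold_scores = []
    for fold in range(n_folds):
        held_out = set(range(fold, len(xs), n_folds))
        fit_x = [x for i, x in enumerate(xs) if i not in held_out]
        fit_y = [y for i, y in enumerate(ys) if i not in held_out]
        hits = sum(
            knn_predict(fit_x, fit_y, xs[i], k, weights) == ys[i]
            for i in held_out
        )
        fold_scores.append(hits / len(held_out))
    return sum(fold_scores) / n_folds


def grid_search(xs, ys, grid):
    """Try every parameter combination; keep the best mean CV accuracy."""
    best_params, best_score = None, -1.0
    names = list(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_accuracy(xs, ys, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic two-feature data (made up; not the study's data).
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
features = [[rng.gauss(2.0 * y, 1.0), rng.gauss(1.0 * y, 1.0)] for y in labels]

# 80% training and validation, 20% testing.
split = int(0.8 * len(labels))
train_x, train_y = features[:split], labels[:split]
test_x, test_y = features[split:], labels[split:]

grid = {"k": [2, 3, 4, 5], "weights": ["uniform", "distance"]}
best_params, best_score = grid_search(train_x, train_y, grid)

# The testing set is touched once, only after the parameters are fixed.
test_hits = sum(
    knn_predict(train_x, train_y, x, **best_params) == y
    for x, y in zip(test_x, test_y)
)
test_accuracy = test_hits / len(test_y)
print(best_params, round(best_score, 3), round(test_accuracy, 3))
```

Keeping the testing set out of the search loop is the point of the 80/20 split: the cross-validated accuracy guides the choice of parameters, and the test accuracy estimates performance on unseen patients.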

Figure 1.
Training process with grid search cross-validation. Various machine learning models were trained using grid search cross-validation (k-fold: 10). The model demonstrating the highest accuracy in the validation stage was employed to predict the testing data, and the feature importance was investigated. Abbreviations: LR, logistic regression; C, regularization values; kNN, k-nearest neighbor; NB, naïve Bayes; var_smoothing, portion of the largest variance of all features; SVM, support vector machine; RF, random forest; n_trees, number of classification and regression trees; XGBoost, extreme gradient boosting; n_estimators, number of gradient boosted trees; AHI, apnea-hypopnea index.

Characterization of Enrolled Participants
We recruited a cohort of 3503 individuals for this retrospective study. Table 1 presents their anthropometric features grouped by OSA severity: normal-to-mild (AHI < 15), moderate (15 ≤ AHI < 30), and severe OSA groups (AHI ≥ 30). Regarding body profiles, the severe group demonstrated the highest mean values for BMI and neck and waist size, and the proportion of men was higher in the severe OSA group (men/women: 1299/284, 82.06%) than in the normal-to-mild (348/603, 36.59%) and moderate OSA groups (677/292, 69.87%). For body composition parameters, the severe OSA group exhibited the highest mean values for fat mass and fat percentage (in the whole body, only limbs, and only trunk), visceral fat level, and basal metabolic rate, but the lowest mean values for physique rating and muscle mass as a percentage of whole-body mass (all p < 0.05). Similarly, for body water distribution, the severe OSA group had the highest mean values for TBW (40.18 kg ± 6.37 kg), ECW (16.5 kg ± 1.89 kg), and ICW (23.68 kg ± 4.63 kg), but the lowest mean values for body water percentage (48.59% ± 5.76%).

Sleep Parameters
The details of the sleep quality indices by sleep stage are presented in Table 2. The severe group exhibited the lowest mean values for sleep efficiency (72.5% ± 16.99%), total sleep time (264.96 min ± 62.5 min), and the mean (93.46% ± 2.58%) and minimum (77.14% ± 8.64%) values of oxygen saturation measured through pulse oximetry (SpO2). Regarding sleep stage parameters, the patients with severe OSA demonstrated the highest percentage for the wake stage (22.06% ± 16.18%) and highest mean WASO time (73.98 min ± 53.48 min). Conversely, the severe OSA group had the lowest percentage for both the REM and NREM stages (REM: 10.1% ± 6.3%; NREM: 67.83% ± 13.84%). Regarding the sleep quality indices, the severe group had the highest mean values for AHI, ODI (≥3%), snoring index, and arousal index (all p < 0.05). By contrast, the normal-to-mild OSA group had the lowest mean value for all of these indices.

Sleep Quality Index and Anthropometric Features
The correlations between the sleep quality index and anthropometric features are illustrated in Table 3. Regarding body profiles, BMI (ρ: 0.57), neck size (ρ: 0.59), and waist size (ρ: 0.61) had significant moderate correlations with AHI (all p < 0.05). For body composition parameters, fat mass, fat-free mass, and muscle mass (ρ: 0.4 to 0.48) in various body regions (i.e., whole body, only limbs, and only trunk) exhibited significant moderate correlations with AHI (all p < 0.05). Moreover, visceral fat level (ρ: 0.64) had a significant moderate to strong correlation with AHI (p < 0.05). In terms of body water distribution, AHI was positively correlated with TBW, ECW, and ICW (ρ: 0.43 to 0.58, p < 0.05), whereas AHI was negatively correlated with body water percentage (ρ: −0.24, p < 0.05). Moreover, the correlations of anthropometric features with other sleep quality indices, namely, ODI, snoring index, and arousal index, were similar to the correlations with AHI.
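For reference, a correlation coefficient of the kind reported here can be computed as in the following minimal sketch. It follows the Statistical Analysis section in using Pearson's correlation; the function name `pearson_r` and the waist and AHI values are made up for illustration and are not taken from Table 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical waist sizes (cm) and AHI values (events/h).
waist = [78, 85, 92, 96, 104, 110]
ahi = [4.0, 9.5, 12.0, 31.0, 27.5, 48.0]
print(round(pearson_r(waist, ahi), 2))
```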

Accuracy Performance and Feature Importance
The model performance summary for the testing data set from the RF models is presented in Table 5. For the moderate-to-severe OSA RF model, the prediction accuracy was 84.74% and the AUC was 89.58%. For the severe OSA RF model, the prediction accuracy was 72.61% and the AUC was 80.07%. The feature importance in the RF models for the two risk types is presented in Figure 2. In the figure, the variables are arranged from top to bottom according to their Shapley values, with high and low values represented by red and blue dots, respectively. In both risk screening models (for moderate to severe and severe OSA), visceral fat level demonstrated the highest Shapley values, indicating the highest feature importance. Moreover, high visceral fat level (red dot) contributed to high OSA risk (high Shapley value). The ECW, neck and waist size, and BMI were alternately ranked from second to fifth highest in feature importance in the risk screening models for both moderate-to-severe OSA and severe OSA.
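To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a toy additive risk score. This is a brute-force illustration of the definition, not the procedure used in the study, and its cost grows exponentially with the number of features. The weights, patient values, and baseline values are all made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi


# Toy additive risk score (all numbers are hypothetical).
weights = {"visceral_fat": 0.04, "neck": 0.02, "bmi": 0.01}
patient = {"visceral_fat": 15, "neck": 42, "bmi": 31}
baseline = {"visceral_fat": 10, "neck": 38, "bmi": 26}


def risk_with(known):
    """Score when features in `known` take the patient's values, others the baseline."""
    return sum(
        w * (patient[f] if f in known else baseline[f])
        for f, w in weights.items()
    )


phi = shapley_values(risk_with, list(weights))
for name, value in sorted(phi.items(), key=lambda item: -item[1]):
    print(name, round(value, 3))
```

For an additive score such as this one, each feature's Shapley value reduces to its weight times its deviation from the baseline, and the values sum to the difference between the patient's score and the baseline score.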

Supplementary
We further developed OSA screening models for multiclass classification, with the classes defined as AHI < 15 (normal to mild OSA), 15 ≤ AHI < 30 (moderate), and AHI ≥ 30 (severe). The outcomes are presented in Supplementary Information. As shown in Table S1, the RF models exhibited the highest prediction accuracy (66.71%) and AUC (79.24%) in the training and validation stage. Subsequently, the RF models were used to predict the testing data sets because this approach outperformed other approaches; the outcomes are summarized in Table S2. The accuracy of multiclass prediction was 62.91%, and the AUC for this classification performance was 77.47%. Figure S1 illustrates feature importance in the RF models used for predicting the severity of OSA. Similar to the findings obtained using the moderate-to-severe and severe OSA models, the level of visceral fat exhibited the highest Shapley value in the RF models for multiclass prediction, which indicated its highest feature importance. Waist size, ECW, neck size, and BMI were sequentially ranked from second to fifth in terms of feature importance.

Discussion
To develop robust models based on easily accessed parameters for OSA risk screening, we investigated the relationships of anthropometric features with PSG parameters by using a large sample from Taiwan (N = 3503). We conducted comparisons of the anthropometric features and PSG parameters of patients with different OSA severity. We also examined the correlations between sleep quality indices and anthropometric features, namely, body profiles and body composition parameters. Subsequently, various machine learning models based on anthropometric features were developed for screening the risk of moderate to severe OSA (AHI ≥ 15) and severe OSA (AHI ≥ 30). The models with the highest accuracy in the training and validation stage were then applied to the testing data set, where the models for both types of OSA severity exhibited high classification accuracy. Moreover, we examined the feature importance of the adopted models in OSA severity screening. First, concerning model performance, the RF models using the bootstrap technique with optimal parameters derived from grid search cross-validation demonstrated the highest accuracy and AUC in both types of OSA risk screening models. Similar to the results presented in Supplementary Information, the accuracy and AUC of the newly developed RF models were superior to the classification performance of other approaches. Although the literature does not provide evidence that RF outperforms other machine learning approaches, several plausible explanations may account for the present results. RF models, constructed per the theory of ensemble learning, may promote the accurate convergence of classification results (due in part to favorable antinoise ability) because this model architecture is more sensitive to relevant features and adept at disregarding the effects of irrelevant ones in comparison with other model architectures [22].
Moreover, the bootstrapping procedure and the number of decision trees in RF can be easily fine-tuned to avoid overfitting and maintain model stability [23]. Hence, the RF approach has been broadly employed for aiding medical diagnosis [24,25]. Compared with current machine learning approaches for screening OSA risk, the current models have some advantages. Specifically, our models are more suitable for application in clinical scenarios and demonstrate adequate accuracy. In related research, nocturnal oxygen saturation was adopted as a surrogate in OSA risk evaluation, and models integrating pulse oximetry data were developed [26]. However, variability in wearing situations, including incorrect probe placement, contact problems with the probe, and individuals' body movements, may contribute to severe artefacts in the measurements [27]. Another study proposed machine learning models that integrated electrocardiogram data from wearable devices to screen for OSA risk; these models exhibited acceptable accuracy [28]. However, this type of screening method may not be suitable for patients with cardiopulmonary diseases because of the irregular and complex electrocardiogram signals of such patients. Moreover, some researchers have proposed machine learning models based on craniofacial feature images to predict OSA severity. Although craniofacial factors are significantly associated with OSA risk, the accuracy of those models was only 67% for classifying moderate to severe OSA [29]. This result may be attributed to poor precision caused by variations in captured craniofacial images and the fact that OSA pathology may not be entirely attributable to craniofacial factors. Prior studies have also proposed machine learning approaches for screening OSA risk by using different types of anthropometric features [30,31]; these methods exhibited relatively stable performance, and the interpretation of feature importance was straightforward. 
The literature thus suggests that machine learning models based on easily accessed anthropometric parameters may be practical for rapid screening of OSA severity in clinical scenarios.
Regarding the importance of features used in the developed models, visceral fat level had the highest Shapley value, suggesting that it is a predominant factor for screening OSA risk. In terms of feature importance, BMI, neck size, and waist size followed visceral fat level. These outcomes were consistent with our statistical findings that these anthropometric parameters were correlated with AHI and ODI. These results can be partially attributed to body fat deposition, which can be estimated using visceral fat (internal organs), waist size (abdomen), neck size (upper airway), or BMI (whole body), being associated with obesity level and thus affecting AHI [32]. Several studies exploring fat accumulation in various body regions have suggested that body fat volume is associated with OSA risk [33,34]. Studies have also indicated that BMI and waist size are significantly associated with OSA [35] and suggested the feasibility of using BMI and neck size to predict OSA risk for men and women, respectively [36]. In addition, for visceral fat, a related study indicated that one of the clinical manifestations of OSA, nocturnal hypoxemia, was associated with increased inflammatory responses in adipose tissue and decreased insulin sensitivity [37]. These interplays can interfere with glucose uptake and stimulate hepatic gluconeogenesis, thereby increasing visceral fat accumulation. Biomechanically, the presence of visceral fat is associated with reduced thoracic capacity and lung volume [38], potentially increasing the workload of respiratory muscles and even resulting in more severe OSA. One study investigated the biomechanism of visceral adipocytes during oxygen starvation and observed that intermittent hypoxia may cause elevated oxidative stress and insulin resistance, aggravate the inflammatory effect, and trigger initial dysmetabolism [39].
Regarding the effect of body water distribution on OSA risk prediction, ECW was the second and fifth most important factor in the risk screening models for moderate to severe and severe OSA, respectively. These results are likely attributable to the associations between ECW and sleep apnea. Researchers have indicated that nocturnal rostral fluid redistribution from the lower limbs was independent of body weight, and that it may increase the likelihood of upper airway narrowing, thereby contributing to OSA pathogenesis [40]. In another study, ECW was also associated with residual kidney function; the resulting fluid overload can elevate the mucosal water content in the upper airway and thereby aggravate OSA severity [41]. Studies comparing those with and without OSA have observed a higher percentage of ECW [42] as well as higher mean values in the percentages of TBW and ECW [43] in those with OSA. Collectively, body water distribution, especially ECW, may serve as a practical indicator when screening the risk of OSA.
The present study has some strengths. First, using the models developed in this study, we can identify patients with moderate to severe and severe risk of OSA on the basis of easily acquired anthropometric parameters. The adequate classification performance of the models may help optimize the use of PSG and increase the availability of medical resources by prioritizing high-risk patients for PSG and treatment. Second, although machine learning approaches have been described as black-box methods, meaning that how input variables are combined to make predictions cannot be readily determined, we evaluated the feature importance by calculating Shapley values; this may improve our understanding of correlations between OSA risk and various input parameters. Finally, the feature importance distribution may inform strategies for preventing OSA or reducing its severity. For instance, reducing the level of visceral fat should be considered first for reducing OSA risk. Similarly, because ECW was identified to be a key factor leading to OSA, performing exercises to reduce body fluid levels and, by extension, body water retention before sleep time may help reduce the severity of OSA.
The current study has some limitations that should be considered and addressed in future work. First, our models were population specific because we collected data exclusively from the populations in Taiwan. However, not only anthropometric features but also craniofacial factors have been reported to affect AHI and OSA severity [44]. Our models, therefore, may only be applied to specific ethnicities or populations with craniofacial features similar to the population in Taiwan. Second, our findings were based on PSG results; because PSG involves manual scoring, interscorer variability between PSG technologists may have affected our data quality. Although technicians from the same sleep center undergo regular scoring training, some degree of human variability is unavoidable [45]. Third, environmental factors, such as the first-night effect, may also have affected the quality of our data set [46]. More precisely, sleeping in a new environment may alter an individual's sleep cycle and physiology, thereby causing inaccurate PSG outcomes. Although we excluded patients with low sleep efficiency, future work may consider using data from repeated PSG to prevent or reduce such bias [47]. Fourth, this retrospective study lacked information regarding lifestyle habits (tobacco and alcohol use) [48] or personal health status (menopausal status and comorbidities) [49,50]. Nevertheless, the association between OSA and these baseline details has been documented. Future research can consider obtaining these data by using questionnaires or by retrospectively retrieving patients' disease-related parameters from their medical histories. Such additional data may be helpful for training more comprehensive models and increasing the accuracy of OSA risk screening.

Conclusions
To address the limitations of current screening tools for OSA severity, we developed novel models using easily accessed parameters. On the basis of anthropometric features obtained from 3503 patients in Taiwan, we developed various machine learning models to predict the risk of moderate to severe OSA and severe OSA. In the training and validation stage, the RF-based prediction models demonstrated the highest accuracy and AUC in both OSA severity risk categories among all the machine learning approaches. We, therefore, applied the RF models for testing data set prediction; the accuracy was 84.74% for the moderate to severe model and 72.61% for the severe model. Regarding feature importance, visceral fat level was the most critical feature in the OSA risk screening. Similarly, our statistical outcomes suggested that AHI and ODI significantly correlated with the anthropometric features related to obesity (i.e., BMI; neck size; waist size; visceral fat level; and the mass and percentage of fat in the whole body, only limbs, and only trunk). Our machine learning models may be employed to screen for OSA risk in populations with similar craniofacial features.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22228630/s1, Figure S1: Density scatterplots showing the Shapley values of the input parameters used in the RF models for assessing OSA severity using the testing data set; Table S1: Classification of the results of the random forest model used to assess the severity of OSA using the testing data set; Table S2: Classification of the results of the random forest model used to assess the risk of OSA using the testing data set.

Funding:
This study was funded by Taiwan's Ministry of Science and Technology (Grant number: MOST 110-2634-F-002-049). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement:
This is a retrospective study that employed a database from a medical center. The Ethics Committee of the Taipei Medical University-Joint Institutional Review Board reviewed and approved the protocol of this retrospective study (TMU-JIRB No: N201911007). All relevant procedures for data collection, analysis, and preservation were conducted per the approved protocol.

Informed Consent Statement:
Patient consent was waived because this retrospective study obtained its data from the hospital database. All data were deidentified and analyzed in accordance with the rule of off-label use.

Data Availability Statement:
All the data of this study were collected at the Sleep Center of Taipei Medical University-Shuang Ho Hospital (New Taipei City, Taiwan) between May 2019 and December 2021. Because our data set contains personal information, it is not available in the Supplementary File. For access to the data set or relevant documents, please contact the corresponding author.