Development and Internal Validation of a Prediction Model for Surgical Success of Maxillomandibular Advancement for the Treatment of Moderate to Severe Obstructive Sleep Apnea

Background: Maxillomandibular advancement (MMA) has been shown to be the most effective surgical therapy for obstructive sleep apnea (OSA). Despite high success rates, there are patients who are considered as non-responders to MMA. In order to triage and inform these patients on their expected prognosis of MMA before the surgery, this study aimed to develop, internally validate, and calibrate a prediction model for the presence of surgical success for MMA in patients with OSA. Methods: A retrospective cohort study was conducted that included patients who had undergone MMA for moderate to severe OSA. Baseline clinical, polysomnographic, cephalometric, and drug-induced sleep endoscopy findings were recorded as potential predictors. Presence or absence of surgical success was recorded as outcome. Binary logistic regression analyses were conducted to develop the model. Performance and clinical values of the model were analyzed. Results: One hundred patients were included, of which sixty-seven (67%) patients reached surgical success. Anterior lower face height (ALFH) (OR: 0.93 [0.87–1.00], p = 0.05), superior posterior airway space (SPAS) (OR: 0.76 [0.62–0.92], p < 0.05), age (OR: 0.96 [0.91–1.01], p = 0.13), and a central apnea index (CAI) ≥5 events/hour sleep (OR: 0.16 [0.03–0.91], p < 0.05) were significant independent predictors in the model (significance level set at p = 0.20). The model showed acceptable discrimination with a shrunken area under the curve of 0.74, and acceptable calibration. The added predictive values for ruling in and out of surgical success were 0.21 and 0.32, respectively. Conclusions: Lower age at surgery, CAI < 5 events/hour, lower ALFH, and smaller SPAS were significant predictors for the surgical success of MMA. The discrimination, calibration, and clinical added values of the model were acceptable.


Introduction
Obstructive sleep apnea (OSA) is a breathing disorder which occurs during sleep and is characterized by recurrent obstruction (partial or complete) of the upper airway, resulting in hypopnea and/or apnea [1]. OSA results in hypoxemia, hypercapnia, and arousals from sleep. It is associated with cardiovascular and cognitive morbidity, a reduced quality of life, and premature death [2-6]. The prevalence of OSA in the general population is estimated at 9% to 38%, and it is increasing owing to rising rates of obesity and an aging population [7,8]. Polysomnography (PSG) is the gold standard test for the diagnosis of OSA. The diagnosis and severity of OSA are largely quantified by the apnea-hypopnea index (AHI), i.e., the number of obstructive, central, and mixed apneas and hypopneas per hour of sleep. Severity is traditionally graded in three levels, with 5-14, 15-29, and ≥30 events per hour defining mild, moderate, and severe OSA, respectively, as suggested by the American Academy of Sleep Medicine (AASM) [9].
Continuous positive airway pressure (CPAP) is considered the first treatment choice in patients with moderate to severe OSA [9]. However, a substantial proportion of patients experience problems tolerating CPAP, resulting in reduced compliance with the therapy [10]. Alternatives for these patients usually consist of a mandibular advancement device (MAD) or surgical treatment, e.g., maxillomandibular advancement osteotomy (MMA) [11]. MMA has been shown to be the most effective surgical therapy for OSA, excluding a tracheostomy, with a reported success rate of 85% [12]. However, despite the high success rates, there is a group of patients who are considered as non-responders to MMA [12]. It is thought that the presence of complete anteroposterior collapse at the level of the epiglottis and a minimal retrovelar space might contribute to MMA failure [13,14]. However, only a few studies have assessed predictors for failure in MMA; therefore, firm conclusions cannot yet be drawn.
To use scarce medical resources efficiently, it is important to triage patients based on their expected prognosis of MMA before surgery, and prediction models for surgical success can support this. To date, no prediction models for the surgical success of MMA have been developed, which complicates preoperative patient counseling and the selection of suitable candidates: a prediction model helps to inform patients about their likely prognosis after surgery and aids clinicians during preoperative decision-making. Therefore, prediction models for the surgical success of MMA are warranted. As care increasingly aims for tailor-made treatment (personalized medicine) for each individual patient, it is important that preoperative predictors for surgical success are identified. These predictors should lead to the development, validation, and implementation of a prediction model for the surgical success of MMA as a treatment of OSA in the future. Improving MMA candidate selection will not only contribute to the delivery of appropriate care, but may also reduce morbidity and increase the therapeutic success of MMA. A broader goal is to make better use of healthcare resources by optimizing the cost-effectiveness of MMA as a treatment for OSA. Therefore, the aim of this study was to identify potential predictors for the surgical success of MMA (as defined by Sher's criteria [15]) in patients with OSA, and develop and internally validate a model for the prediction of surgical success.

Materials and Methods
The Medical Ethics Committee of the Amsterdam University Medical Centers (Amsterdam UMC, location Amsterdam Medical Center (AMC)) concluded that this study was exempted from the Medical Research Human Subjects Act (Reference number W22_061#22.093). The present study was carried out based on the Strengthening The Reporting of Observational studies in Epidemiology (STROBE) [16] statement and the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [17].

Study Design and Participants Enrolment
The study was designed as a retrospective cohort study. The inclusion criteria were (1) patients with moderate to severe OSA, diagnosed by means of PSG (AHI ≥ 15/h); (2) age > 18 years; (3) patients who underwent MMA as a treatment for OSA in the Amsterdam UMC location AMC, from September 2011 to September 2020; (4) an overnight level I or level II PSG was performed to measure the parameters relevant to OSA prior to surgery and at a minimum of 3 months postoperatively; (5) a standardized lateral cephalogram was performed prior to surgery and at a minimum of one week postoperatively; and (6) patients who were followed up for at least 12 months at the outpatient clinic after MMA.
The non-inclusion criteria were as follows: (1) patients who did not undergo isolated MMA nor simultaneous upper airway surgery (e.g., uvulopalatopharyngoplasty, lateral pharyngoplasty, expansion sphincter pharyngoplasty, barbed reposition pharyngoplasty, tongue volume reduction surgery and/or hyoid bone suspension surgery); (2) patients who underwent a previous MMA osteotomy as a treatment for OSA; (3) patients with unstable endocrine dysfunction prior to surgery (hypothyroidism, acromegaly and pituitary adenoma) and/or patients with craniofacial syndromes; and (4) patients who did not give permission for their data to be used for research purposes.

Treatment Protocol
All MMA osteotomies were performed by two experienced oral and maxillofacial surgeons dedicated to the treatment of OSA. MMA osteotomy consisted of a Le Fort I osteotomy of the maxilla with a Hunsuck-Dal Pont modification of the bilateral sagittal split osteotomy (BSSO) of the mandible, as described by Obwegeser [18,19]. Subsequently, advancement of the maxillomandibular complex followed, and in a subgroup of patients additional counterclockwise rotation was performed [20]. After applying temporary maxillomandibular fixation by steel-wire ligatures or power chains and intraoperative splints, rigid internal fixation was applied [21,22]. Before the availability of three-dimensional planning, the surgery was planned two-dimensionally with manually fabricated intraoperative splints. In patients who had undergone more recent surgery, the surgery was virtually planned and involved three-dimensionally fabricated intraoperative splints [11].

Predictors
The potential predictors were extracted from the electronic patient files and included patient-related variables, respiratory parameters assessed by PSG, drug-induced sleep endoscopy (DISE) findings, and cephalometric measurements. All the predictors were measured at baseline before the MMA. All the potential predictors in the present study were chosen based on the previous literature [11,23,24] and the authors' clinical experience and knowledge.

Patient-Related Variables
The patient-related variables included gender, age, body mass index (BMI) at time of surgery, pre-existent physiological status by means of the ASA (American Society of Anesthesiologists) classification score (ASA I, normal health; ASA II, mild systemic disease; ASA III, severe systemic disease; ASA IV, severe systemic disease that is a constant threat to life; ASA V, not expected to survive without operation) [25], history of upper airway surgery, excluding previous MMA, as a treatment for OSA (Yes or No), and the presence or absence of teeth (dentulous versus edentulous). Patients with 1-27 teeth (excluding the third molars) were classified as partially dentulous.

Respiratory Parameters
All patients underwent an overnight level I or level II PSG prior to surgery and a minimum of 3 months postoperatively. For scoring respiratory events, we adhered to the criteria of the American Academy of Sleep Medicine (AASM), with the use of the recommended rules for the scoring of hypopneas, i.e., (1) peak signal excursions drop by ≥30% of pre-event baseline using nasal pressure (diagnostic study); (2) the duration of the ≥30% drop in signal excursion is ≥10 s; and (3) ≥3% oxygen desaturation from pre-event baseline and/or the event is associated with an arousal [26]. The following data were obtained from PSG prior to surgery (baseline): AHI, central apnea index (CAI; presence of central apnea events was defined as a CAI ≥ 5 per hour sleep [27]), and presence of positional OSA (positional OSA was defined as an AHI in supine position at least two times higher than in non-supine position [28]).

Cephalometric Variables
The lateral cephalograms were taken with the patient's head in a natural position with the mandibular condyle positioned in centric relation to the glenoid fossa. All cephalograms were analyzed by a single observer using Viewbox software (Viewbox 4, dHAL Software, Kifissia, Greece) [29]. For intra-observer reliability analyses, the observer repeated the measurements one month later in twenty cases that were randomly selected. In the present study, the following cephalometric data at baseline were obtained as the potential predictors: anterior lower face height, anterior total face height, presence of maxillomandibular deficiency (maxillomandibular deficiency was defined as sella-nasion-A-point (SNA) angle ≤ 80.5° and/or sella-nasion-B-point (SNB) angle ≤ 78.5°) [30], and superior posterior airway space (SPAS). An overview of the cephalometric variables and definitions is given in Table 1. An overview of the landmarks, reference lines, and variables on cephalometry is illustrated in Figure 1.

Drug-induced Sleep Endoscopy
In patients with previous unsuccessful CPAP and/or MAD therapy, DISE was performed prior to MMA osteotomy to assess the precise anatomic level(s) and pattern(s) of upper airway collapse. These patients underwent a standardized DISE procedure, of which the method is described in a previous study [27]. In order to quantify the observers' findings during DISE, the VOTE scoring system was used [28]. In the present study, we included data on presence/absence of concentric collapse at the velum and presence/absence of complete anteroposterior epiglottis collapse, both in supine position, as the potential predictors.

Outcomes
Changes in AHI at 3 to 12 months follow-up compared with the preoperative AHI were regarded as the primary outcome for surgical success. The outcome for surgical success was binary. The surgical success of MMA was considered 'present' if a patient's AHI was reduced by ≥50% compared to the preoperative AHI, combined with a postoperative AHI < 20 events/h, as proposed by Sher et al. [15].
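The two-part success definition can be written as a small check. The following is a minimal sketch in Python; the function name `surgical_success` and the example AHI values are illustrative assumptions and are not taken from the study data.

```python
def surgical_success(pre_ahi: float, post_ahi: float) -> bool:
    """Return True when both parts of the success definition are met:
    an AHI reduction of at least 50% and a postoperative AHI below 20 events/h."""
    if pre_ahi <= 0:
        raise ValueError("preoperative AHI must be positive")
    reduction = (pre_ahi - post_ahi) / pre_ahi  # relative AHI reduction
    return reduction >= 0.5 and post_ahi < 20


# Hypothetical patients (values invented for illustration)
print(surgical_success(52.0, 12.0))  # both conditions met
print(surgical_success(52.0, 24.0))  # reduction above 50%, but postoperative AHI >= 20
```

Both conditions must hold at the same time, so a large relative reduction alone is not sufficient when the postoperative AHI remains at or above 20 events/h.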

Missing Data
The multiple imputation technique was used for the missing values. We created m = 35 imputed datasets with 10 iterations and used predictive mean matching (PMM) for imputing the missing values. All the potential predictors and the outcome variable were included in the imputation model.

Development of the Model

Screening of Potential Predictors and Modelling
The potential predictors for surgical success were determined based on clinical experience and previous literature by the research team. Multicollinearity of the potential predictors was assessed using the variance inflation factor (VIF). When the VIF value of a predictor was higher than 10 [31], collinearity was considered present and the predictor was excluded from the subsequent analysis.
To pre-screen the potential predictors, univariate binary logistic regression analysis was used to assess the association between each potential predictor and the outcome. The predictors with a p-value of ≤0.20 were selected for the subsequent multivariate analyses. Multivariate binary logistic regression analysis with backward selection (predictors with p-value of >0.20 were removed) was performed to further screen the potential predictors and develop the prediction model.

Shrinkage Factor
A global shrinkage factor was produced based on the bootstrapping procedure with 100 bootstrap samples. The shrinkage factor was used to shrink the regression coefficients of the predictors in order to prevent the overfitting of the prediction model [32,33].

Performance of the Prediction Model
The performance of the prediction model was assessed in aspects of calibration and discrimination. Calibration is defined as the agreement between predicted and observed outcomes [34]. The calibration of the model was assessed with the calibration plot by plotting the predicted individual outcomes against the observed actual outcomes. The patients were grouped into deciles based on their predicted probabilities of the outcomes. The prevalence of the outcome events in each decile is considered the observed probability. The mean of the individual predicted probabilities in each decile is considered the predicted probability. In the calibration plot, the agreement between predicted probabilities and observed probabilities across the range of the predicted risks was estimated. The overall calibration of the model was assessed with the overall observed-expected ratio (O:E ratio) [34].
The O:E ratio was defined as the ratio between the prevalence of the outcomes (observed) and the mean individual predicted probabilities of the outcomes (expected) within the cohort [35]. An O:E ratio between 0.8 and 1.2 indicates an acceptable overall calibration [36]. The calibration of the model was also assessed with the Hosmer-Lemeshow goodness-of-fit statistic test (HL test). A p-value of >0.10 of the HL test indicates that the model fits the observed data [37].
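The calibration quantities described above can be illustrated with a short sketch. This is not the authors' SPSS/R code: the function names and the ten example patients are hypothetical, and the example uses five groups instead of deciles only because the toy data set is small.

```python
def oe_ratio(outcomes, predicted):
    """Observed prevalence divided by the mean predicted probability."""
    observed = sum(outcomes) / len(outcomes)
    expected = sum(predicted) / len(predicted)
    return observed / expected


def calibration_groups(outcomes, predicted, n_groups=10):
    """Sort patients by predicted probability, split them into groups, and
    return (mean predicted probability, observed proportion) per group."""
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than patients
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, obs))
    return points


# Hypothetical predicted probabilities and observed outcomes (1 = success)
predicted = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 0.7]
outcomes = [0, 0, 1, 0, 1, 1, 1, 1, 1, 0]

ratio = oe_ratio(outcomes, predicted)
print(ratio, 0.8 <= ratio <= 1.2)  # acceptable overall calibration if True
print(calibration_groups(outcomes, predicted, n_groups=5))
```

Plotting the returned pairs against the diagonal gives the calibration plot; points close to the diagonal indicate agreement between predicted and observed probabilities.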
Discrimination is defined as the ability of the model to differentiate between those with and without the outcome events [34]. The discrimination of the model was assessed with the area under the receiver-operating characteristic curve (AUC). An AUC of 0.70 to 0.80 indicates an acceptable discrimination of the model, while an AUC of ≥0.80 indicates an excellent to outstanding discrimination of the model [38].
The optimal cutoff for the predicted probability of the model was defined as the predicted probability with the maximum sum of sensitivity and specificity in the receiver-operating characteristic curve (ROC).
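As an illustration of the discrimination measures, the sketch below computes the AUC as a pairwise comparison and selects the cutoff with the maximum sum of sensitivity and specificity. The function names and the eight example patients are hypothetical; the tie-breaking rule (the lowest cutoff wins when several cutoffs share the maximum) is an assumption of this sketch, not something stated in the study.

```python
def auc(outcomes, predicted):
    """AUC as the probability that a random patient with the outcome has a
    higher predicted probability than one without (ties count one half)."""
    pos = [p for p, y in zip(predicted, outcomes) if y == 1]
    neg = [p for p, y in zip(predicted, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def optimal_cutoff(outcomes, predicted):
    """Cutoff with the maximum sum of sensitivity and specificity.
    A patient is predicted positive when the probability is >= cutoff."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = None
    for c in sorted(set(predicted)):
        tp = sum(1 for p, y in zip(predicted, outcomes) if p >= c and y == 1)
        tn = sum(1 for p, y in zip(predicted, outcomes) if p < c and y == 0)
        total = tp / n_pos + tn / n_neg  # sensitivity + specificity
        if best is None or total > best[1]:
            best = (c, total)
    return best[0]


# Hypothetical predicted probabilities and observed outcomes (1 = success)
predicted = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

print(auc(outcomes, predicted))
print(optimal_cutoff(outcomes, predicted))
```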

Clinical (Added) Values
The clinical values of the model at the optimal cutoff for predicted probability were assessed using prevalence (prior probability) and posterior probabilities of the outcome events. The posterior probability was defined as positive predictive value (PPV) and negative predictive value (NPV). PPV was defined as the proportion of patients with the actual outcome events among the patients who were predicted to have the outcome events. NPV was defined as the proportion of patients without actual outcome events among the patients who were predicted to have no outcome events. The added predictive value of the model for ruling in an increased probability of the outcome events was defined as the PPV minus the prevalence, while that for ruling out an increased probability of the outcome events was defined as the NPV minus the complement of the prevalence.
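The added predictive values follow directly from a two-by-two table of predicted versus observed outcomes. The sketch below shows the arithmetic; the function name and the counts are hypothetical (they are not taken from the study's tables and were chosen only so that the prevalence equals 67 out of 100).

```python
def added_values(tp, fp, fn, tn):
    """Prevalence, PPV, NPV, and the added predictive values for ruling in
    (PPV - prevalence) and ruling out (NPV - (1 - prevalence))."""
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "prevalence": prevalence,
        "ppv": ppv,
        "npv": npv,
        "added_rule_in": ppv - prevalence,
        "added_rule_out": npv - (1 - prevalence),
    }


# Hypothetical counts, for illustration only (not the study's data)
result = added_values(tp=53, fp=7, fn=14, tn=26)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The complement of the PPV and the complement of the NPV give the proportions of false positives among predicted successes and false negatives among predicted failures, respectively.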

Score Chart and Line Chart
A clinical prediction rule for the outcome events was developed to provide an estimate for individual patients of their absolute probability of the outcome events. For the final multivariate binary logistic regression model, the individual probability (P) of the outcome events was predicted with the following formula:
$$P = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k\right)\right)}$$
where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_k$ are the shrunken regression coefficients of the predictors $X_1, \dots, X_k$ in the model.
To facilitate the calculation of the predicted probability of the outcome events in individual patients, the multivariate logistic regression model was converted to a score chart. In the score chart, the score of each included predictor was produced by the shrunken regression coefficients being multiplied by −100 and subsequently rounded. A line chart was then developed to help determine the predicted probability of the outcome events.
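The conversion from regression coefficients to score weights can be sketched as follows. This is an approximate reconstruction under stated assumptions, not the authors' computation: the coefficients are recovered as the natural logarithm of the odds ratios reported in the abstract (which are rounded to two decimals), the global shrinkage factor of 0.80 is the value reported in the Results, and the odds ratio of 0.16 is taken as belonging to CAI ≥ 5 events/hour, as the direction of the score chart implies.

```python
import math

SHRINKAGE = 0.80  # global shrinkage factor reported in the Results

# Odds ratios as reported in the abstract (rounded there to two decimals)
odds_ratios = {
    "anterior lower face height": 0.93,
    "SPAS": 0.76,
    "age": 0.96,
    "CAI >= 5 events/hour": 0.16,
}


def score_weight(odds_ratio, shrinkage=SHRINKAGE):
    """Shrink the log-odds coefficient, multiply by -100, and round."""
    beta = math.log(odds_ratio)        # unshrunken regression coefficient
    shrunken_beta = shrinkage * beta   # apply the global shrinkage factor
    return round(-100 * shrunken_beta)


weights = {name: score_weight(value) for name, value in odds_ratios.items()}
for name, weight in weights.items():
    print(f"{name}: {weight}")
```

Because the inputs are rounded odds ratios, a weight recovered this way can differ by a point from the published score chart (SPAS comes out at 22 here rather than 23); the weights for anterior lower face height, age, and CAI match the published values.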
All the statistical procedures mentioned above were performed via SPSS 27.0 (IBM, New York, NY, USA) and R software 4.0.4 (R Development Core Team, Vienna, Austria).

Results
In the period of September 2011 to September 2020, 111 patients underwent MMA osteotomy for OSA. A total of 100 patients were eligible for analysis, of whom 82 (82%) were male. Eleven patients were excluded because they did not approve the use of their data for research purposes (n = 3), had mild OSA (n = 3), had no postoperative PSG (n = 4), or had a craniofacial syndrome (n = 1). Among the 100 eligible patients, mean age was 50.5 (±9.9) years and mean BMI was 29.8 (±4.2) kg/m². The majority of patients were ASA II (56%), followed by ASA I (23%) and ASA III (21%). In ninety-eight (98%) patients, CPAP was an unsuccessful therapy and/or intolerance was noted. Two (2%) patients declined CPAP as first-choice therapy. Mean AHI prior to surgery was 52. Table 2 (Appendix A contains Table A1, which presents baseline characteristics without multiple imputation).
The VIF values of all the predictors were lower than 10, which indicated that the multicollinearity between the predictors was negligible. Therefore, all the predictors were included for further analysis. In the univariate binary logistic regression analyses, anterior total face height, anterior lower face height, SPAS, age, and presence of CAI ≥ 5 events/hour had p-values of ≤0.20 and were included in the subsequent multivariate binary logistic regression analysis (Table 3). In the multivariate analysis, anterior lower face height, SPAS, age, and presence of CAI ≥ 5 events/hour remained in the final model with p-values of ≤0.20 (Table 3). The shrinkage factor of the model was 0.80. The original AUC of the model was 0.78 (95% confidence interval [95%CI]: 0.66 to 0.87) and the shrunken AUC of the model was 0.74. This indicated that the discrimination of the model was acceptable. The calibration plot (Figure 2) showed that most plotted dots were lying close to the diagonal line. Therefore, there was a good agreement between the predicted probabilities and actual probabilities of the outcomes.
The O:E ratio was 1.01 (95%CI: 0.81 to 1.24), which indicated that the overall calibration of the model was excellent. The p-value of the HL test was 0.42, which showed that the model had good fit. The optimal cutoff for the predicted probability of the model was 0.62. Table 4 presents the prevalence, sensitivity, specificity, PPV, and NPV of the model. The clinical added value of the model for ruling in the probability of surgical success was 0.21 (95%CI: 0.09 to 0.34) in addition to the prevalence, while that for ruling out the probability of surgical success was 0.32 (95%CI: 0.15 to 0.49) in addition to the complement of the prevalence. To enhance the clinical usefulness of the model, a score chart (Table 5) and a line chart (Figure 3) were produced. A clinician can easily calculate the sum score of a patient using the score chart and determine the corresponding predicted probability of surgical success based on a line chart using the sum score. The predicted probability of surgical success is lower when the sum score is higher.
The cutoff of the sum score for the prediction of surgical success was 1111. The algorithm for the calculation of a patient's sum score for surgical success is presented below:
Sum score = 6 × anterior lower face height + 23 × SPAS + 3 × age + 147 × (CAI ≥ 5 events/hour)
where the last term equals 1 if CAI ≥ 5 events/hour is present and 0 otherwise.
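For illustration, the sum score and its cutoff can be applied as in the sketch below. The weights and the cutoff of 1111 are those reported above; the function names, the example patients, and the units (millimetres for anterior lower face height and SPAS, years for age) are assumptions of this sketch. A sum score below the cutoff is read as a prediction of surgical success, in line with the statement that the predicted probability decreases as the sum score increases.

```python
CUTOFF = 1111  # sum-score cutoff reported in the Results


def sum_score(alfh, spas, age, cai_ge_5):
    """Sum score from the published weights.
    alfh and spas are assumed to be in mm, age in years;
    cai_ge_5 is True when CAI >= 5 events/hour."""
    return 6 * alfh + 23 * spas + 3 * age + 147 * int(cai_ge_5)


def predicted_success(score, cutoff=CUTOFF):
    """A sum score below the cutoff is read as predicted surgical success."""
    return score < cutoff


# Hypothetical patients (values invented for illustration)
for patient in [(70, 10, 50, False), (80, 15, 60, True)]:
    score = sum_score(*patient)
    print(patient, score, predicted_success(score))
```

The corresponding predicted probability would be read from the line chart (Figure 3), since the model intercept is not given in the text.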

Discussion
In the present study, patients with a lower age at surgery, CAI < 5 events per hour, a lower anterior lower face height (ALFH), and a smaller superior posterior airway space (SPAS) may have a higher probability of obtaining surgical success. The prediction model for the surgical success of MMA was derived based on the predictors above, and the performance of the model may be acceptable. To the authors' best knowledge, this is the first study to develop a prediction model for the surgical success of MMA for the treatment of OSA with pre-operative patient data that can be utilized during daily clinical practice.
Clinicians frequently encounter the presence of central and/or mixed events on PSG in patients with OSA, which makes the treatment decision-making process more difficult [39]. The results presented in this study on the CAI and its role with respect to the surgical success of MMA are in line with a study by Markovey et al. [13], illustrating that a lower pre-operative CAI was a statistically significant predictor of surgical success (CAI preoperatively in the success group was 0.6 versus 5.7 in the failure group, p-value = 0.005). Xie et al. studied the difference between patients with pure OSA (100% of the apneas are obstructive) and predominant OSA (presence of both central and obstructive apneas, with the obstructive apneas accounting for >50% of the total number of apneas), and they reported lower breathing control stability in patients with predominant OSA [40]. Therefore, it is thought that in patients with a higher preoperative CAI, the lower breathing control stability might give rise to obstructive events, leading to lower surgical success rates.
The present study also found that ALFH was significantly associated with surgical success. In a meta-analysis on craniofacial morphology in patients with OSA, the authors found a strong tendency towards an increased ALFH in adult patients with OSA [41]. A possible explanation for this altered craniofacial anatomy might be upper airway obstruction occurring as early as childhood [42]. However, to date, little is known regarding the exact mechanism underlying cephalometric measurements as predictors for surgical success. Although the predictors included in the prediction model were significantly associated with surgical success, the causality between predictor and outcome was not assessed, and conclusions on causality cannot be drawn. Therefore, the included predictors might not have a causal relation with surgical success, whilst still being strong predictors in the prediction model.
The original AUC of the model was 0.78, and the shrunken AUC of the model was 0.74, which indicates that the discrimination of the model was acceptable. The calibration plot (Figure 2) illustrates that there was a good agreement between the predicted probabilities and the actual probabilities of the outcomes. The added predictive value for ruling in surgical success was 0.21, whereas the added predictive value for ruling out surgical success was 0.32. These results denote that if the model predicts a patient to reach surgical success, the posterior probability of that patient reaching surgical success is 0.21 higher than the prevalence of surgical success in the patient group. If the model predicts a patient to have an absence of surgical success, that patient's posterior probability of an absence of surgical success is 0.32 higher than the complement of the prevalence of surgical success in the patient group. Both these results denote that the clinical added values of the model were adequate for ruling in and ruling out surgical success.
In order to optimize the utilization of the model during daily clinical practice, calculation of the optimal cut-off value for predicted probability is needed for probability stratification. The optimal cut-off value is the predicted probability at which the sum of sensitivity and specificity is at its maximum, so that false negative and false positive outcomes are jointly kept as low as possible. The optimal cutoff for the predicted probability of surgical success was 0.62. Thus, individuals with a sum score lower than 1111 were predicted to reach surgical success.
Of note is the fact that a prediction model might entail false positive and false negative outcomes. In the event of a false negative outcome, a patient and clinician might falsely waive MMA as the therapy of choice, which might worsen the patient's OSA and prognosis. On the other hand, a false positive outcome might lead to an incorrect indication for surgery, which entails comorbidity and the risks associated with surgery, such as bleeding, infection, and wound healing problems. Both false negative and false positive outcomes might result in an increase in costs and unfavorable health outcomes. The model presented in this study has a 35% and 12% risk of a false negative and false positive outcome, respectively. The percentage of false negative outcome can be regarded as moderately high. This indicates that when a patient is predicted to have failure of the surgery, clinicians need to be very cautious about the predicted results and should make the final decision based on their experience and other clinical examinations. This may avoid the false negatives to a large extent. In addition, as previously discussed, a false-negative outcome might entail incorrectly waiving MMA as the therapy of choice. However, the disadvantages of a false-positive outcome resulting in the incorrect indication for MMA may be more severe when compared to the incorrect waiving of MMA.
In order to increase surgical success rates, a prediction tool is warranted that aids surgeons in identifying responders and non-responders pre-operatively during patient counseling. If a patient is predicted to have a high probability of surgical success, this endorses the consideration for MMA as the therapy of choice. In addition, if a patient is predicted to have a low probability of surgical success, this will aid clinician and patient to be more cautious in choosing MMA as the therapy of choice and possibly search for other therapeutic options. When a patient with a low probability of surgical success is still determined to undergo MMA since he/she has no other therapeutic options left, the prediction might still help to inform the patient on the prognosis of their OSA, thereby shaping their expectations of MMA. The prediction model allows patients to be informed on their individual chances of surgical success rather than average group success rates.
For the presented study population, 67% of the included patients attained surgical success after MMA. These results are lower when compared with a recent review reporting surgical success rates of up to 85% [12]. We believe this is due to the fact that the patients included in this study had more multi-therapy resistant (complex) types of OSA, since these patients were referred to our academic hospital after the failure of one or more earlier therapies. This study included patients with moderate to severe OSA. This is because patients with mild OSA generally experience milder symptoms and therefore a lower burden of disease and a lower risk of untreated hypoxic burden compared to patients with moderate or severe OSA. Therefore, an invasive therapy such as MMA is not considered the therapy of choice in patients with mild OSA, and non-invasive therapies (i.e., CPAP or MAD therapy) resolve symptoms and obtain success of therapy in most cases [9]. The prediction model presented in this study can therefore solely be utilized for patients with moderate to severe OSA.
This study has some limitations. First, the retrospective design of the study entails higher proportions of missing data. The missing data were considered missing at random, and therefore multiple imputation was used for the missing values. Ideally, a prospective study is preferred because of better control of the data. However, since imputation of missing values is considered superior to complete case analysis in the event of missing data, the potential bias in the results caused by the missing values was minimized [43]. Second, in a multivariate logistic regression analysis, an events per variable (EPV) value of 10 is widely advocated to obtain a reliable outcome [44,45]. The present study, however, did not meet this criterion because of the small sample size, which is a limitation. To reduce the number of predictors included in the multivariate analysis, we performed univariate analyses to pre-screen the predictors. In addition, we used a less stringent threshold of p = 0.20 in modeling for the selection of potential predictors, to avoid incorrectly excluding important predictors because of the small sample size. In this way, the negative consequences of the small sample size could be reduced to a large extent. Third, the cephalograms assessed in this study were all obtained while the patients were awake and in a standard upright position. The soft tissue measurements might therefore not accurately reflect the soft tissue dimensions during sleep in the supine position. Nevertheless, cephalometry is widely performed as a routine examination prior to OSA surgery, and given its low cost and convenience, determining pharyngeal and skeletal anatomy from a cephalogram obtained in the standard upright position is of added value. Because we did not have a different population, external validation of the model was not possible in our study, which is a limitation. Therefore, we recommend external validation of the model in future research. Fourth, the postoperative PSG was performed at a minimum of 3 months and a maximum of 12 months after surgery. This difference in the timing of the follow-up PSG might influence the observed success rates, thus causing a bias in the results. However, several studies have illustrated that the decrease in AHI, and therefore surgical success, after MMA is stable over time [23,46], and it is therefore not likely that the postoperative PSG timing biased the final results in a major way. Last, the proportion of missing DISE variables was 36%, which is relatively large. The main reason is that the DISE variables were not routinely collected in clinical practice; they were more likely to be collected when alternatives to CPAP or MAD were indicated, when surgical options were indicated, or when the AHI was very high and initial therapy did not work. Therefore, we think the DISE variables are likely to be missing not at random, because the factors that may affect their absence were not adjusted for in the imputation model. This may, to some extent, bias our results, which is another limitation.
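The EPV criterion discussed among the limitations can be illustrated with a short calculation. The sketch below is illustrative only and not part of the study's analysis: it uses the outcome counts reported for this cohort (100 patients, 67 with surgical success) and the four predictors retained in the final model. Taking the less frequent outcome class as the limiting count is an assumption of this sketch, and the function name is hypothetical. The number of candidate predictors screened is not restated here, so only the final-model case is shown; counting all candidate predictors would give a lower value still.

```python
def events_per_variable(n_events: int, n_non_events: int, n_predictors: int) -> float:
    """Return EPV, using the less frequent outcome class as the limiting count.

    Assumption of this sketch: the limiting count is min(events, non-events).
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be a positive integer")
    return min(n_events, n_non_events) / n_predictors


# Outcome counts reported for the cohort.
n_patients = 100
n_success = 67
n_no_success = n_patients - n_success  # 33 patients without surgical success

# Four predictors were retained in the final model (age, CAI, ALFH, SPAS).
epv_final_model = events_per_variable(n_success, n_no_success, n_predictors=4)
print(epv_final_model)  # 8.25, below the commonly advocated value of 10
```

Under these assumptions the EPV is 8.25 even when only the final predictors are counted, which is consistent with the statement that the criterion of 10 was not met.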

Conclusions
A prediction model was developed for the surgical success of MMA as a surgical treatment for patients with moderate to severe OSA. A lower age at surgery, CAI < 5 events per hour, a lower anterior lower face height, and a smaller superior posterior airway space were significant predictors for the surgical success of MMA. The performance of the model in terms of discrimination and calibration was acceptable. The clinical added values of the model were adequate for ruling in and ruling out surgical success of treatment. The model presented in this study may aid surgeons in identifying responders to MMA preoperatively. In addition, it may improve preoperative patient counseling on the chances of reaching surgical success. However, prior to the implementation of the model in daily clinical practice, external validation is warranted.