Non-Laboratory-Based Risk Prediction Tools for Undiagnosed Pre-Diabetes: A Systematic Review

Early detection of pre-diabetes (pre-DM) can prevent DM and related complications. This review examined studies on non-laboratory-based pre-DM risk prediction tools to identify important predictors and evaluate their performance. PubMed, Embase, MEDLINE, and CINAHL were searched in February 2023. Studies that developed tools with: (1) pre-DM as a prediction outcome, (2) fasting/post-prandial blood glucose/HbA1c as outcome measures, and (3) non-laboratory predictors only were included. The studies’ quality was assessed using the CASP Clinical Prediction Rule Checklist. Data on pre-DM definitions, predictors, validation methods, and performances of the tools were extracted for narrative synthesis. A total of 6398 titles were identified and screened. Twenty-four studies of satisfactory quality were included. Eight studies (33.3%) developed pre-DM risk tools and sixteen studies (66.7%) focused on pre-DM and DM risks. Age, family history of DM, diagnosed hypertension and obesity measured by BMI and/or WC were the most common non-laboratory predictors. Existing tools showed satisfactory internal discrimination (AUROC: 0.68–0.82), sensitivity (0.60–0.89), and specificity (0.50–0.74). Only twelve studies (50.0%) had validated their tools externally, with wide variation in external discrimination (AUROC: 0.31–0.79) and sensitivity (0.31–0.92). Most non-laboratory-based risk tools for pre-DM detection showed satisfactory performance in their study populations. The generalisability of these tools was unclear since most lacked external validation.


Introduction
In 2021, type 2 diabetes mellitus (T2DM) accounted for up to 6.7 million deaths, while impacting the lives of 537 million individuals globally [1]. T2DM is often preceded by a stage of sub-DM hyperglycaemia, known as pre-diabetes (pre-DM), which lasts for several years and can be reversible [2]. Indeed, with timely intervention, the blood glucose levels of pre-DM individuals can return to within the normal range [3]. Therefore, cost-effective methods that use the clinical and/or anthropometric characteristics of individuals to predict their pre-DM risks have gained a lot of attention among researchers and clinicians. Such methods can include risk prediction tools, models or algorithms.
To our knowledge, only one review, published in 2014, has evaluated pre-DM risk tools and included studies up to 2013 [4]. The review found that existing pre-DM tools offered similar internal predictive performances despite varying development methods and different numbers of predictors included [4]. However, it is important to note that the majority of studies included in that review used laboratory biomarkers (e.g., blood triglyceride levels) as predictors in the models [4], which limits their applicability for case finding in general and primary care populations. Pre-DM risk prediction tools are intended to be simple, low-cost and non-laboratory-based in order to save unnecessary blood tests.
The inclusion of laboratory biomarkers cannot be cost-effective as the amount of time and cost incurred for an individual to obtain the required laboratory variable would be similar to performing a pre-DM and DM diagnostic blood test directly. Notably, recent studies have reported the association between DM risks and other less common, modifiable lifestyle factors, e.g., the level of alcohol consumption [5] and the quantity of sleep [6]. As a result, there has been increasing attention on using modifiable predictors to develop risk prediction tools. For instance, despite being developed by different methods, both of the non-laboratory-based risk prediction tools developed by Dong et al. in 2022 included sleeping hours as one of the predictors [7], which could indicate the clinical and statistical significance of such predictors in predicting pre-DM risks. Having said that, the effects of such lifestyle predictors on the prediction accuracy and performance of non-laboratory-based pre-DM risk prediction tools have not been reviewed. Furthermore, due to recent technological advancements, a number of recent studies that used novel methods, such as artificial intelligence and machine learning (ML), to develop prediction tools have been published since Barber et al.'s 2014 review [4].
The current study therefore aimed to systematically review existing non-laboratory-based pre-DM tools published in the literature, focusing on identifying important non-laboratory predictors and evaluating the performance of these tools to provide an update on the current evidence.

Search Strategy
Separate searches were conducted on three medical databases (PubMed, Embase, MEDLINE), and on one nursing-related database (CINAHL), from 1946 until February 2023 to identify available studies. Embase and MEDLINE were searched via Ovid, while CINAHL was searched via EBSCOhost. In order to avoid missing potential studies, citation searching on reference lists of selected studies, and internet manual searching on Google Scholar were conducted. The detailed search strategy is listed in Table S1 of the Supplementary Material.

Screening and Selection of Studies
Studies were included if they met all of the following criteria:

1. Included pre-DM as the only, or one of the, main outcome(s) of the risk prediction tool;
2. Used fasting/post-prandial blood glucose and/or HbA1c as outcome measures;
3. Provided a detailed methodology for the development of their tool;
4. Only utilised non-laboratory predictors as their prediction variables;
5. Developed tools that were for adults (≥18 years old) in the general population;
6. Published in the English language with full-text available.

Conversely, studies were excluded if they met any of the following:

1. Included gestational DM or Type 1 DM as the outcome(s) of risk prediction;
2. Only investigated associations between predictors and outcomes;
3. Only aimed to develop or test theoretical algorithms without the intention of implementation in clinical practice;
4. Utilised any laboratory or genetic predictors as their prediction variables;
5. Developed the tool for a specific population, e.g., pregnant women, children, patients of a specific disease group, or older people;
6. Were commentaries, editorials, conference abstracts, or systematic reviews.

EndNote X9 and EndNote 20 were used to store and manage identified studies. Following the removal of duplicates, two reviewers (W.C. and Y.M.) independently screened the titles and abstracts to select eligible studies based on the inclusion and exclusion criteria. Full texts of selected studies were then retrieved and independently reviewed. Disagreements or discrepancies were resolved through discussion to reach an agreement between the two reviewers.

Data Extraction and Quality Assessment
Data from the selected studies were extracted and tabulated into a Google Spreadsheet for the narrative synthesis, according to the following list: (1) study region, (2) study sample size, (3) data source for the study sample, (4) prediction outcome and its measurements, (5) methods used for tool development, (6) methods used for predictor selection, (7) predictors included in the final tool, and (8) performance evaluation measures in internal and/or external validation, including area under the Receiver-Operating Characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with their respective 95% confidence intervals if reported. For ease of interpretation, extracted predictors were categorised into three groups: (i) socio-demographic factors, (ii) clinical factors, and (iii) lifestyle factors. Furthermore, predictors of a similar nature were combined under one broad umbrella term. For instance, (i) the predictor variable "hypertension" in this review included use of antihypertensive medications, history of hypertension, and duration of hypertension, but excluded "systolic/diastolic blood pressure levels", as "blood pressure" was counted as a standalone predictor; (ii) "history of hyperglycaemia" referred to past episodes of hyperglycaemia confirmed by a blood test in a medical check-up, during an illness, or during pregnancy; (iii) "dyslipidaemia" included dyslipidaemia and history of hyperlipidaemia; and (iv) "family history of DM" summarised any predictors related to the number of parents and/or siblings with DM. For comparison and consistency, all blood glucose levels were converted to mmol/L and all HbA1c values to percentages.
We applied the Critical Appraisal Skills Programme (CASP) Clinical Prediction Rule Checklist [8] to assess the quality and risk of bias of the selected studies. This review was reported in compliance with the PRISMA Checklist and PRISMA flowchart [9].
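The unit harmonisation described above can be sketched in a few lines. This is a minimal illustration, not the review's actual procedure: the function names are hypothetical, and the conversion factors are the commonly used ones (glucose in mg/dL divided by about 18.016, and the NGSP/IFCC master equation for HbA1c), which the review itself does not state.

```python
# Assumed conversion factors; the review does not report which ones were used.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016  # molar mass of glucose (~180.16 g/mol) divided by 10


def glucose_mgdl_to_mmol(mg_dl: float) -> float:
    """Convert a blood glucose value from mg/dL to mmol/L."""
    return mg_dl / GLUCOSE_MG_DL_PER_MMOL_L


def hba1c_ifcc_to_percent(mmol_mol: float) -> float:
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return 0.09148 * mmol_mol + 2.152


if __name__ == "__main__":
    for mg_dl in (100, 110, 126):
        print(f"{mg_dl} mg/dL = {glucose_mgdl_to_mmol(mg_dl):.1f} mmol/L")
    for ifcc in (39, 42, 48):
        print(f"{ifcc} mmol/mol = {hba1c_ifcc_to_percent(ifcc):.1f} %")
```

Expressing every threshold in one unit makes the case definitions of different studies directly comparable.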
Details of the protocol for this systematic review were registered on PROSPERO and can be accessed at www.crd.york.ac.uk/prospero/display_record.php?RecordID=345706 (accessed on 23 August 2022).

Results
A total of 6398 titles were identified from the database searches. Following removal of duplicates, 4686 articles were screened based on titles and abstracts, and 77 full texts were then retrieved. A total of 19 studies were eligible to be included in the review. Seven additional studies were identified through citation and internet manual searches, with five of them meeting the inclusion criteria. Finally, 24 studies were included in our review (Figure 1). From the 24 studies, there were a total of 28 risk prediction tools developed. Table 1 provides a summary on the study subject characteristics and prediction tools of the included studies.

Quality of Included Studies
The CASP Clinical Prediction Rule Checklist was applied to assess the quality of the included studies (Table S2 of the Supplementary Material). The validity of the results reported in several studies is uncertain due to the lack of external validation [7,13,16,18–21,23,28,30–32]. As a result, the applicability of the findings is compromised. Furthermore, one study used a small external sample of 83 individuals to validate their tool [14], which could lead to potentially biased results. Overall, it was found that the methods used to construct different tools were adequately reported in nearly all of the studies. However, in a study that developed the prediction tool by ML [22], there could be selection bias due to limited explanations regarding how the factors were selected and weighted in the prediction algorithms. The majority of studies reported the performance of their prediction tools by AUROC, sensitivity, specificity, PPV, and NPV. However, two did not report the AUROC [27,28], four did not report the PPV and NPV of the tools [12,14,17,20], and two reported AUROC without referring to any other performance measurements [10,29]. Ten studies (41.7%) sought to improve the precision of the predictive performances by refining their tools with the addition and/or elimination of predictors following the initial validation [7,12,14,20,25–29,32].

Predictors for Risk Prediction Tools
Predictors among the pre-DM risk prediction tools, and their frequencies of being included in a tool, are summarised in Figure 2. In general, non-laboratory-based pre-DM risk prediction tools included a median of six predictors (range: two to twelve). A total of 23 different predictors were identified among the 28 tools, including 15 clinical, 4 socio-demographic, and 4 lifestyle factors. The risk prediction tools tended to include more clinical factors than socio-demographic and lifestyle factors. Age was the most common predictor of pre-DM, as it was included in all but one of the tools (96.4%) [14]. Other commonly included factors were obesity, measured by body mass index (BMI) or waist circumference (26 tools), family history of DM (19 tools), hypertension (18 tools), and sex (14 tools). The most commonly included lifestyle factor among the tools was exercise. Less common predictors included waist-to-height ratio [13], sleep duration [7], and macrosomia (applicable to a woman who had given birth to a child with an excessive birth weight) [30]. Notably, age, family history of DM, hypertension and obesity (represented by BMI and/or WC) were predictors included among all the tools that had been externally validated [10–12,14,15,17,22,24–27,29], indicating their robustness and reliability.

Methods for Tool Development
Logistic regression (LR) was used to develop the prediction tool in all but one of the studies [27]. Four studies (16.7%) applied a development method in addition to LR [7,16,22,31], and all four used machine learning (ML). No studies reported a significant difference in predictive performance between tools developed by different methods. For instance, Dong et al. (2022) [7] developed two pre-DM and DM risk prediction models using LR and ML methods, and found similar performance results (AUROC: 0.81 and 0.82, respectively). Another study reported a slightly inferior performance of the classification tree model (AUROC: 0.69) when compared with the LR model (AUROC: 0.72) [16].
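To make concrete how an LR-based tool turns non-laboratory predictors into a risk estimate, the sketch below applies the logistic function to a weighted sum of predictors. The intercept and coefficients are hypothetical values chosen purely for illustration; they are not taken from any of the reviewed studies.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_years": 0.04,
    "bmi": 0.08,
    "family_history_dm": 0.6,  # 1 = yes, 0 = no
    "hypertension": 0.5,       # 1 = yes, 0 = no
}


def predicted_risk(person: dict) -> float:
    """Return the predicted probability from a logistic regression model."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * person[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    person = {"age_years": 50, "bmi": 27, "family_history_dm": 1, "hypertension": 0}
    print(f"Predicted risk: {predicted_risk(person):.2f}")
```

Because each coefficient is visible, the contribution of every predictor can be read directly from the model, which is the interpretability advantage usually attributed to LR tools.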

Performance of Risk Prediction Tools
Performances of the risk prediction tools, when validated internally or externally, are summarised in Table 2. Half of the studies validated their risk prediction tools using an external dataset [10–12,14,15,17,22,24–27,29], with three such studies having validated the tools with two or more external datasets [15,17,24]. Eight studies (33.3%) had validated their tools internally [7,16,18,20,21,23,30,31]. Among them, five studies used a split sample derived from the same source as the development dataset [7,16,18,30,31], two studies validated using bootstrapping methods [20,23] and one study randomly removed fifty participants from their development dataset to serve as the validation sample [21]. The sample size of the validation dataset ranged from 50 to 66,108, with a median of 1987. Four studies (16.7%) did not perform any validation and only reported the tool performances that were generated during development [13,19,28,32].
The most frequently reported prediction performance measure was AUROC, but two studies did not report this for pre-DM prediction [14,27]. Existing tools showed mostly fair performances in internal validation; two studies that performed internal validation using the split-sample method yielded AUROCs above 0.8 [7,18]. On the other hand, the performance of the tools in external validation, when available, was more variable, with AUROCs ranging from 0.31 to 0.79, and mostly between 0.6 and 0.8. The prediction models developed by Wang et al. (2015) [15] performed poorly (AUROC: 0.31 and 0.50) when they were validated in an external dataset that was demographically different from the development dataset. The 95% confidence interval for AUROC was not reported in nine of the included studies (37.5%) [10,13,14,16,24,25,27,29,31], with two of the nine studies not presenting any information on AUROC [14,27].
Finally, sensitivities and specificities of the tools, acquired either during development or as a result of validation, together with their corresponding risk thresholds or cut-offs, were reported among all studies. Only around half of the studies assessed the goodness-of-fit of their prediction tools, that is, the agreement between the predicted risk and the observed risk [7,11,12,15,17–19,21–25,30,32], using calibration plots or the Hosmer-Lemeshow test [33]. Furthermore, only one study [17] evaluated their prediction tools using more recent performance measures, such as decision-curve analysis [34].
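As an illustration of how an AUROC and the 95% confidence interval that several studies omitted could be obtained, the sketch below computes the AUROC as the proportion of case/non-case pairs that the tool ranks correctly, and derives a percentile bootstrap interval. The function names, the choice of a percentile bootstrap, and the toy data are assumptions for demonstration; the reviewed studies may have used other methods.

```python
import random


def auroc(labels, scores):
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auroc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacks cases or non-cases; draw again
            continue
        estimates.append(auroc(ys, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(round((alpha / 2) * (n_boot - 1)))]
    upper = estimates[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return point, lower, upper


if __name__ == "__main__":
    # Toy data: 1 = pre-DM, 0 = normoglycaemia; scores are hypothetical risk scores.
    labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    scores = [2, 5, 4, 9, 3, 7, 6, 8, 1, 5]
    print(bootstrap_auroc_ci(labels, scores))
```

With small validation samples the interval is typically wide, which is one reason a point estimate alone can be hard to interpret.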
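The threshold-dependent measures mentioned above can be sketched as follows: a risk score is dichotomised at a cut-off, and sensitivity, specificity, PPV and NPV are computed from the resulting two-by-two table. The function name, the convention that scores at or above the cut-off are flagged, and the toy data are assumptions for illustration.

```python
def screening_metrics(labels, scores, cutoff):
    """Compute sensitivity, specificity, PPV and NPV when scores >= cutoff are flagged as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= cutoff
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (zero denominator) are returned as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy data: 1 = pre-DM, 0 = normoglycaemia; scores are hypothetical risk scores.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [7, 5, 3, 6, 4, 2, 1, 0]
    for cutoff in (3, 5, 7):
        print(cutoff, screening_metrics(labels, scores, cutoff))
```

Lowering the cut-off generally raises sensitivity at the expense of specificity, so these measures are only comparable across tools when the cut-off is reported alongside them.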

Discussion
This review identified 28 risk prediction tools that used only non-laboratory predictors to detect individuals with pre-DM from 24 published studies. The published prediction tools included similar predictors such as age, family history of DM, hypertension and obesity (represented by BMI and/or WC), despite the potential cultural and lifestyle differences among subjects from different study locations, supporting their robustness and reliability. The majority of existing non-laboratory-based tools (n = 26) had fair to good discrimination in case finding of pre-DM in the population that they were developed for. It was found that existing logistic regression (LR) and machine learning (ML) risk tools offered similar performance. However, pre-DM was inconsistently defined and the external validity of most tools was unclear.
It should be noted that these common predictors were also predictors of T2DM risk [35]. A family history of DM is a well-established risk factor for developing pre-DM [36], while associations between age and DM risk have also been widely reported [37], but unfortunately these are not modifiable. Modifiable clinical factors (e.g., BMI, waist-to-hip ratio (WHR)) and lifestyle factors (e.g., number of hours of sleep and duration of physical activity) are particularly important because they offer potential for intervention to prevent pre-DM and T2DM. Emphasising the reversibility of pre-DM through healthy lifestyles can also help patient activation. Interestingly, sleep hours was an important predictor in prediction tools published in recent years [7], but it was not considered in most studies, probably because the data were not available. Inadequate sleep duration has been associated with increased T2DM risks, which is likely due to the influence that sleep has on regulating endogenous hormones, such as testosterone and cortisol [6,38]. Indeed, by incorporating sleep hours as one of the predictors but without the inclusion of family history of DM, Dong et al. were able to obtain AUROCs over 0.8 for their tools in internal validation [7]. It is important to note that without a head-to-head comparison between existing tools, it is difficult to determine whether tools that include particular predictors offer statistically better predictions.
Although theoretically ML can develop more accurate prediction models by the inclusion of more complex parameter interactions, our review indicated that prediction tools developed by traditional LR and novel ML methods offered similar predictive performances in detecting pre-DM individuals. Sadek et al. further showed that ML models that were developed by different ML techniques offered no statistically significant difference in performance in an external dataset [22]. Consistent with the literature [39], the ML pre-DM prediction tools found in our review did not provide the weights that govern the interactions between predictors [22], while traditional LR models offered a comparatively higher interpretability on the interactions between predictors. Therefore, LR models could be more suitable for pre-DM risks prediction in real-world clinical practice, whilst ML approaches might be better suited for exploring and identifying novel predictors [40,41].
The majority of existing non-laboratory-based risk prediction tools for case finding of pre-DM showed satisfactory internal discrimination. As only half of the studies had validated their tools in an external dataset, the external performance of most existing tools could not be established. In general, a lower discriminatory ability was found in external validation. Notably, the Southern Chinese pre-DM risk prediction tool developed by Wang et al. showed good internal and external discrimination among datasets that shared similar socioeconomic characteristics with the development dataset, but not when their tools were applied to an external dataset that comprised Chinese people from Western China [15]. Such results suggest that the performance of a risk prediction tool can be compromised when it is applied to a population living in a different environment to that of the development population. To offer an insightful evaluation of a tool's performance, it is important to consider its validity and reliability in the population in which it is intended to be used, as well as the appropriateness and representativeness of the dataset used for external validation [42].
Some reporting inconsistencies were found among the studies on non-laboratory-based pre-DM prediction tools. First, and most importantly, the case definition of pre-DM was inconsistent among the studies, possibly due to a change in pre-DM definition by the American Diabetes Association in 2009 [43]. As a result, several studies used a fasting plasma glucose level of ≥6.1 mmol/L to define pre-DM/DM [44], while other studies used ≥5.6 mmol/L [45]. The case definition can have a significant effect on the discriminatory ability, sensitivity and specificity of the prediction tool. Second, the indicators of prediction performance varied widely among the studies. Although most of the studies reported the discriminatory ability by AUROC, many did not provide information on calibration. To focus only on discrimination could produce misleading predictions and potentially be detrimental during clinical decision-making processes [46]. In addition, a lack of 95% confidence intervals on the performance measures (e.g., AUROC, sensitivity, specificity, PPV, NPV) of existing tools was also noted. Overall, these reporting inconsistencies could hinder the generalisability and applicability of existing tools.
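The effect of the case definition can be illustrated with a short sketch that labels the same fasting plasma glucose (FPG) values under the two thresholds mentioned above (≥5.6 and ≥6.1 mmol/L for pre-DM/DM). The function name and the FPG values are hypothetical and serve only to show how the set of "cases" changes with the definition.

```python
def label_cases(fpg_mmol_l, threshold):
    """Label each fasting plasma glucose value as a pre-DM/DM case (1) or non-case (0)."""
    return [1 if value >= threshold else 0 for value in fpg_mmol_l]


if __name__ == "__main__":
    # Hypothetical FPG values in mmol/L, for illustration only.
    fpg = [5.2, 5.7, 5.9, 6.3, 7.1]
    lower_definition = label_cases(fpg, 5.6)
    higher_definition = label_cases(fpg, 6.1)
    reclassified = sum(a != b for a, b in zip(lower_definition, higher_definition))
    print("Cases at >=5.6 mmol/L:", sum(lower_definition))
    print("Cases at >=6.1 mmol/L:", sum(higher_definition))
    print("Individuals whose label changes:", reclassified)
```

Since sensitivity, specificity and AUROC are all computed against these labels, tools evaluated under different thresholds are not directly comparable.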
The strength of this review is that it included an up-to-date synthesis of the results from studies that used traditional and novel strategies for prediction tool development. Our findings also provide evidence to support the feasibility and efficacy of using only readily available non-laboratory predictors to facilitate case finding of pre-DM. Furthermore, the findings on the importance of sleep hours and duration of exercise can inform the development of interventions for the prevention and treatment of pre-DM. However, there are several limitations regarding our review that must be acknowledged. A meta-analysis was not performed on the included studies due to the large heterogeneity among the outcome measures and the reported performance indices. In addition, the inclusion of only studies published in the English language could have introduced bias and resulted in some pre-DM risk prediction tools being missed.

Conclusions
This systematic review of 24 studies identified 28 non-laboratory-based pre-DM prediction tools. The most common predictors were age, family history of DM, hypertension and obesity measured by BMI and/or WC. Sleep hours and exercise duration were found to be important lifestyle predictors of pre-DM in more recent studies. Despite the difference in development methods, existing non-laboratory-based tools were mostly effective in the population that they were developed for. The generalisability of these tools was unclear as most of them had not been validated externally. External validation using datasets obtained from the intended target population should always be performed before application to practice for case-finding of pre-DM individuals.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: No additional dataset was generated. All data extracted and synthesised were summarised and reported in Tables 1 and 2.