Agreement between Type 2 Diabetes Risk Scales in a Caucasian Population: A Systematic Review and Report

Early detection of people with undiagnosed type 2 diabetes (T2D) is an important public health concern. Several predictive equations for T2D have been proposed, but most of them have not been externally validated and their performance could be compromised when clinical data are used. Clinical practice guidelines increasingly incorporate T2D risk prediction models as they support clinical decision making. The aims of this study were to systematically review prediction scores for T2D and to analyze the agreement between these risk scores in a large cross-sectional study of white western European workers. A systematic review of the PubMed, CINAHL, and EMBASE databases and a cross-sectional study in 59,042 Spanish workers were performed. Agreement between scores classifying participants as high risk was evaluated using the kappa statistic. The systematic review of 26 predictive models highlights great heterogeneity in the risk predictors; the level of reporting is poor, and most models have not been externally validated. Regarding the agreement between risk scores, the DETECT-2 risk score classified 14.1% of subjects as high risk, the FINDRISC score 20.8%, the Cambridge score 19.8%, the AUSDRISK score 26.4%, the EGAT study 30.3%, the Hisayama study 30.9%, the ARIC score 6.3%, and the ITD score 3.1%. The lowest agreement was observed between the ITD and the NUDS study derived score (κ = 0.067). Differences in diabetes incidence, prevalence, and weight of risk factors seem to account for the agreement differences between scores. Better agreement was observed between the multi-ethnic derived score (DETECT-2) and the European derived scores. Risk models should be designed using health data that are more easily identifiable and reproducible in clinical practice.


Introduction
Type 2 diabetes (T2D) is a common disease associated with reduced life expectancy and considerable morbidity [1]. Furthermore, T2D is commonly an asymptomatic condition [2] associated with other non-communicable diseases, and causes a high number of hospitalizations and a significant economic impact [3,4]. According to the International Diabetes Federation (Diabetes Atlas 2019), the prevalence of diabetes is increasing worldwide, and it is estimated to increase from 9.3% to 10.9% by 2045, affecting 700 million adults [5]. In addition, about 50% of all people with diabetes worldwide are not diagnosed [1]. Diabetes prevalence varies among regions, ranging from 5% to 20%, with a higher incidence in Oceania, the Caribbean, South Asia, the Middle East, Latin America, and Central Asia [6].
Early detection of people with undiagnosed T2D is an important public health concern, as up to half of people with newly diagnosed T2D present one or more complications when it is diagnosed [7,8]. Several large clinical trials have shown that diabetes could be prevented by recommending lifestyle changes such as advocating physical activity, a healthy low-fat diet and weight reduction [3,9,10]. These lifestyle changes have been shown to decrease the risk of diabetes by nearly 60% [9,11-14]. Thus, policies focusing on diabetes prevention and early identification of T2D in populations at high risk may be worth seriously considering [15]. Furthermore, some studies show how some diabetes risk scores, such as the German Diabetes Risk Score (GDRS) or the Atherosclerosis Risk in Communities model (ARIC 2009), allow the identification of high-risk target groups for cost-effective lifestyle interventions to prevent T2D [16].
The worldwide number of published risk prediction tools for identifying individuals at risk of T2D has greatly increased in the last few years [15,17,18]. However, there is no consensus on which is the best risk score and only a few of them end up being used in clinical practice. Prognostic risk-score models for T2D vary from those including only clinical variables to those with a genetic score and biochemical markers. These risk scores have been derived over a wide range of different populations.
T2D risk scores should be externally validated using data from different settings, populations, and ethnic groups, because generalization outside of the context in which they were designed could affect their performance and, therefore, their external usefulness [18]. A review and external validation of commonly used prediction tools including only clinical and conventional biomarkers demonstrated that risk prediction tools work properly in the validation cohorts [19]. However, most of the models overestimate the number of people at high risk. These fitting differences could be explained by differences in population baseline characteristics and in the methodology used [20-23]. Furthermore, ethnicity or diabetes incidence could play an important role in the differences between risk scores [24].
The aims of this study were to systematically review predictive T2D equations and to analyze the agreement between these prediction risk scores in a cross-sectional study of Spanish Caucasian workers. Risk scores derived from populations with very different characteristics, such as distinct ethnicities and a wide range of diabetes incidence, were considered.

Search Strategy
Articles that presented new risk prediction models for detecting T2D were identified. A systematic review was performed in Medline, PubMed, CINAHL, and EMBASE databases following PRISMA guidelines [25], looking for articles published until July 2018 that reported models or predictive equations for incident diagnosis of T2D. The following search string was used: (("diabetes mellitus" OR "type 2 diabetes" OR "diabetes") AND ("predictive model" OR "predictive equation" OR "prediction model" OR "prediction rule" OR "risk assessment" OR "risk score") NOT ("review" OR "bibliography")). Articles were restricted to English, Portuguese, and Spanish language literature. Reference lists were also checked for relevant citations. The search strategy was performed in cooperation with a research librarian. Unpublished literature was identified through the Information System on Gray Literature in Europe (Open Gray), the Conference Proceedings of the Web of Science, and ProQuest Dissertations and Theses Global.
References of the studies identified by the literature search strategy were imported into EndNote X9 (Clarivate Analytics, Philadelphia, PA, USA) literature management software, and duplicates were removed. One researcher (A.A.) screened the titles and abstracts of all articles identified by the search string to exclude articles that did not report risk prediction models. After reviewing the retrieved titles, to ensure the quality of the process, two additional authors (J.A.A.-V. and M.B.-V.) independently reviewed the abstracts to select the relevant papers. Each article was randomly assigned to reviewers. Discrepancies between reviewers were resolved by a third reviewer (A.L.). To reduce the risk of bias, a pilot exercise was carried out to apply the inclusion criteria in a sample of 10 references.
Study characteristics and study data were managed using Microsoft Excel 2013 (Microsoft Corp, Redmond, WA, USA, www.microsoft.com) and Review Manager software (RevMan version 5.3, Copenhagen, Denmark: The Nordic Cochrane Centre, the Cochrane Collaboration 2014), respectively. A standardized form was used for data extraction, including general information (author(s), journal, location, year, country, and conflict of interest); population characteristics (age, sex, ethnicity, and inclusion and exclusion criteria); and study details (study design, sample size, statistical analysis, and bias). Articles where the model/equation included genetic testing or non-common biomarkers were excluded.

Study Design
A cross-sectional study with Caucasian western European adult workers (aged 20-65 years) was performed. All subjects were from a Spanish Mediterranean area and belonged to different productive sectors (public administration, health department, and the post office). Study design, procedures and reporting followed guidance from the STROBE statement on observational studies [26]. To evaluate the correlation of the retrieved prediction models, the original published prediction models were used (scores or original regression equations). Then, participants' risk of developing T2D was determined using the following models: diabetes and impaired glucose tolerance score (DETECT-2) [27]; Danish Diabetes Risk Score (DDRS) [28]; data from the epidemiological study on the insulin resistance syndrome score (DESIR) [29]; Cambridge risk score [30]; QDScore [31]; FINDRISC score [32]; EGAT score [33]; Australian T2D risk score (AUSDRISK) [34]; instrument for T2D score (ITD) [35]; atherosclerosis risk in communities score (ARIC) [36]; San Antonio prediction model risk score [37]; Framingham offspring score [38]; diabetes population risk tool score (DPoRT) [39]; scores from Oman [40], India (IDRS) [41], Taiwan [42], and Kyushu island of Japan [43]; and scores from military officers of China [44], and Mauritian Indians [45]. Six predictive models selected in the systematic review were not included in the correlation analysis. The German Diabetes Risk Score (GDRS) [46] and the modified GDRS [47] could not be calculated because diet was not ascertained in our population. The Tromsø Study [48], the Mauritian Indians risk score [45], the Tehran lipid and glucose studies [49], and the AusDiab [50] general scores were not determined because the original papers did not report enough data to allow proper score calculations. Agreement between results was analyzed as indicated in the statistical analysis section.

Participants and Recruitment
Participants in the study were recruited during their periodic health examination in the workplace between January 2008 and December 2010. Every day each worker was assigned a number, and half of the examined workers were randomly selected using a random number table. Thus, from a total population of 130,487 workers, 65,200 were invited to participate in the study. 14,946 (22.9%) refused to participate, leaving the final number of participants standing at 59,042 (77.1%), with 25,510 women (43.2%), and 33,532 men (56.8%). The mean age of the participants in the study was 39.7 years (SD 10.2). All participants were informed of the purpose of this study before they provided written informed consent to participate. Following the current legislation, members of the Health and Safety Committees were informed as well. The study protocol was in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Mallorca Health Management (GESMA). After acceptance, a self-reported complete medical history, including family and personal history, was recorded. The following inclusion criteria were considered: age between 18 and 65 years (working age population), being gainfully employed, and without a previous diagnosis of diabetes.
Subjects who failed to meet one or more of the inclusion criteria and those who refused to participate were excluded from the study.

Samples and Measurements
The methodology used was similar to the one previously reported [51]. Anthropometric measurements were made in the morning, at the same time of day, and according to the recommendations of the International Standards for Anthropometric Assessment (ISAK) [52]. Furthermore, all measurements were performed by well-trained technicians or researchers to minimize coefficients of variation. Body weight (Seca 700 electronic scale, Seca GmbH, Hamburg, Germany), height (Seca 220 CM telescopic height rod for column scales, Seca GmbH, Hamburg, Germany), and waist circumference (Lufkin Executive® Thinline W606PM tape, precision 1 mm, Cooper Industries, Lexington, SC, USA) were determined according to the aforementioned recommended techniques. Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. Waist circumference was measured halfway between the lower costal border and the iliac crest. The measurement was made at the end of a normal expiration while the subject stood upright, with feet together and arms hanging freely at their sides.
Blood samples were taken following a 12 h overnight fast. Venous blood was drawn from the antecubital vein after participants had been seated at rest for at least 15 min, using suitable vacutainers without anticoagulant to obtain serum. Concentrations of glucose, cholesterol and triglycerides were measured in serum by standard clinical biochemistry laboratory procedures using an automated hematology analyzer (SYNCHRON CX®9 PRO, Beckman Coulter, Brea, CA, USA).
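As a minimal illustration of the BMI definition given above (weight in kilograms divided by the square of height in meters), the calculation can be sketched in Python. The function name and the example participant are hypothetical and are not taken from the study data.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant (illustrative values, not study data).
bmi = body_mass_index(weight_kg=72.0, height_m=1.75)
print(round(bmi, 1))  # 23.5
```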

Statistical Analysis
Descriptive analysis was used to report the frequency and distribution of categorical variables, whereas means and standard deviations (SDs) were reported for quantitative variables. Spearman correlation coefficients were used to analyze the correlation between prediction scores of T2D. Participants were classified as being at high risk of developing diabetes only for those scores whose cut-off points were reported in the original publication. The following cut-off points were considered for classifying people at high risk: 31 points out of 60 in the Danish Diabetes Risk Score (DDRS); 7 points out of 32 in the Diabetes and Impaired Glucose Tolerance score (DETECT-2); 0.37 out of 1 point in the Cambridge score; a score of over 6 out of 20 in the FINDRISC; over 12 out of 43 in the Australian T2D Risk (AUSDRISK); over 60 out of 100 in the Indian Diabetes Risk Score (IDRS); over 6 out of 17 in the Electric Generating Authority of Thailand Study (EGATS); over 14 out of 49 in the Hisayama study; over 21 out of 40 in the National Urban Diabetes Survey (NUDS); and 55 out of 100 in the instrument for T2D (ITD). Agreement between each pair of scores in classifying participants as high risk was assessed from 2 × 2 tables using the kappa statistic. Statistical analyses were performed using IBM SPSS Statistics version 24 (SPSS/IBM, Chicago, IL, USA).
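The agreement analysis described above can be sketched as follows. This is an illustrative Python outline, not the SPSS procedure used in the study: participants are dichotomized at each score's cut-off, and Cohen's kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance. The function names and the eight example scores are hypothetical, and treating the DETECT-2 cut-off of 7 points as inclusive is an assumption.

```python
from typing import Sequence


def classify_high_risk(scores: Sequence[float], cutoff: float,
                       inclusive: bool = False) -> list[bool]:
    """Flag each score as high risk (True) relative to a cut-off point."""
    if inclusive:
        return [s >= cutoff for s in scores]
    return [s > cutoff for s in scores]


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same people."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: both high risk or both non-high risk.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from the marginal proportions.
    p_a, p_b = sum(a) / n, sum(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores for eight participants (not study data).
findrisc_scores = [3, 8, 10, 5, 7, 2, 12, 6]
detect2_scores = [4, 9, 6, 3, 8, 1, 11, 7]

high_findrisc = classify_high_risk(findrisc_scores, cutoff=6)  # "over 6"
# Treating "7 points" as inclusive is an assumption made for this sketch.
high_detect2 = classify_high_risk(detect2_scores, cutoff=7, inclusive=True)

kappa = cohen_kappa(high_findrisc, high_detect2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.50
```

In this toy example the two scores agree on six of eight participants (p_o = 0.75), while chance agreement is 0.50, giving a kappa of 0.50.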

Results
The literature search strategy retrieved 820 original articles, 42 of which met the inclusion criteria considered (Figure 1). The summarized characteristics of the studies included in the review are shown in Table 1. We selected prediction tools developed from adult or middle adult populations that include predictors generally available in health records (demographics, history of parental diabetes or gestational diabetes, obesity, diet, lifestyle factors, antihypertensive medication, use of corticoids) as well as conventional biomarkers such as glucose, HDL-cholesterol, LDL-cholesterol, and triglycerides.

Table 4 shows the Spearman correlation coefficients of the models. The correlation values span a wide range, from low to high. For example, correlations involving the Cambridge score range from 0.560 to 0.898, and correlations involving the Framingham offspring score range from 0.481 to 0.760. Table 5 shows the agreement (kappa) between the risk classifications (high and non-high risk), and Table S1 (Supplementary Material) shows the distribution of the population into high risk and non-high risk groups. Agreements found between the classifications using the different scores ranged from 0.412 (between the ARIC and the DESIR score) to 0.916 (between AUSDRISK and the Hisayama study).
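For readers who wish to reproduce coefficients of the kind reported in Table 4, a Spearman rank correlation between two scores can be sketched in plain Python as the Pearson correlation of the ranks, with tied values given their average rank. The function names and the example values are hypothetical; this is an illustrative outline, not the SPSS routine used in the study.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores from two risk models for eight participants.
score_a = [3, 8, 10, 5, 7, 2, 12, 6]
score_b = [4, 9, 6, 3, 8, 1, 11, 7]
rho = spearman_rho(score_a, score_b)
print(round(rho, 3))  # 0.833
```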

Discussion
There is great variability between risk prediction models for developing T2D. These prediction models include a wide range of clinical variables and conventional biomarkers, from the most simple models including only age, waist circumference, parental history of diabetes, and physical exercise practice [41], to the most complex including also dietary characteristics [16,46,47], social deprivation measures [31], educational level [48], and ethnicity [31,34,37,39].
The most commonly used risk predictors were age, BMI or obesity, family history of diabetes, and hypertension. There were differences in the weight of the risk predictors included in the equations: the adjusted odds ratio of obesity for undiagnosed T2D varied when comparing countries from Northern Europe such as Denmark (≥30 kg/m² vs. <25 kg/m²) 4.4 (2.6-7.3) [28] and Finland 2.99 (1.31-6.81) [32] with those from Asia such as Thailand (≥27.5 kg/m² vs. 23 kg/m²) 1.74 (1.17-2.60) [33], and China (≥28 kg/m² vs. 24 kg/m²) 1.56 (1.03-2.38) [44]. As can be observed, there were differences in the BMI cut-off point for obesity, which was lower in Asian populations. The prevalence of a risk factor such as obesity also differed along with the cut-off points, ranging from 16.3% (≥30 kg/m²) in Denmark [28] to 6.3% (≥27.5 kg/m²) in the derivation cohort from the EGAT study in Thailand [33].
In the present cross-sectional study, the scores of the retrieved prediction equations as well as the Spearman coefficients for the correlations between them were calculated. In the Caucasian population considered, scores derived from Caucasian populations did not correlate more highly with each other than with scores derived from other ethnic groups, and correlations were not higher for scores with a similar prevalence of diabetes, similar estimated cut-off points, or greater country proximity. However, higher correlations were found between the scores that included only clinical variables than between those that included clinical and conventional biomarkers. Furthermore, higher correlations were found between models that included only hypertension as a risk predictor of T2D than between those that included several cardiovascular risk factors (systolic and diastolic blood pressure, cholesterol, triglyceride levels, etc.). There was also poor agreement when models derived from "special populations" such as volunteers [29], Chinese living in Taiwan [42], or military officers [44] were considered.
The percentage of people classified as high risk ranged from 3.1% [35] to 47.1% [54]. These differences could be due to the fact that the cut-off point to classify people at high risk of developing T2D is clearly related to the incidence of diabetes which, in the studies considered, ranged from 1.3% in an adult population in the UK [30] to 26.6% in a Chinese population of military officers in Beijing [44].
Our study suggests that when highly diverse diabetes incidence and ethnicity derived risk predictor models were applied in a Caucasian worker population, poor agreements were achieved. Furthermore, differences in people classified as at high risk were also observed. Agreement did not improve by prevalence of diabetes or country proximity. However, in a validation cohort in a worldwide population (Africa, Asia, Oceania, North America, and Europe) [20], the area under the curve of white Caucasian population behaved similarly, and showed a better prediction between geographically closer countries, showing lower specificity when European developed risk prediction models were applied to African or Asian populations.
Agreement between models did not improve when ethnicity was considered in the models. In this regard, Tanamas [74], in a multi-ethnic cohort validation of predictive models developed in highly diverse ethnic groups, showed a modest influence of the ethnicity of the development cohort on prediction, but there was no evidence that models performed better when the ethnicity of the development cohort was similar to that of the validation cohort. Similarly, Rosella [24], in a multi-ethnic cohort, showed that adding ethnicity did not improve the discrimination or the accuracy of predictive models. The causes of the ethnic differences in T2D incidence are not well known. Specifically, the relative contributions of genetic and environmental factors to such differences are largely unknown. Only a few studies in isolated populations have shown evidence on how differences in frequencies of known T2D susceptibility genetic alleles account for ethnic differences [75]. However, research on genetic susceptibility has not been uniform among the world's ethnic groups. In fact, ethnicity is associated with many other risk factors for T2D that may account for the race/ethnic differences in risk of T2D. These factors include, among others, obesity or overweight, prediabetes condition, diet characteristics, socioeconomic status, area of residence, and environmental contaminants [75,76]. An improved understanding of the impact of these factors on T2D risk should lead to more effective preventive strategies. Better designed research is needed to understand the ethnicity-related risk for T2D. Belonging to similar ethnicities or showing similar T2D risk may not be the best way of ascertaining whether a model will perform properly in another population.
Although ethnicity information could be important from the individual risk perspective, detailed ethnic information has not been shown to improve the discrimination and accuracy of models that predict new cases of diabetes at the population level, or to identify a significantly higher number of people with diabetes in the population. Therefore, it could be more important to develop models using highly reproducible measurements that are readily available in clinical practice.

Limitations
Our systematic review was limited to English, Portuguese, and Spanish language articles; therefore, we may have missed some useful studies. We would like to emphasize that the purpose of the study was to highlight the heterogeneity of the risk of developing diabetes in this population when using different risk prediction models. It was not possible to validate this onset of disease in participants in the present study since they were not followed up.

Conclusions
Numerous T2D prediction models based on readily available health data exist, and they provide an adequate but not perfect estimate of the chance of developing T2D in the future. The systematic review of 26 predictive models highlights great heterogeneity in the risk predictors included and in the cut-off points of some risk predictors. Poor reporting of the development procedure of the risk prediction models, in terms of describing the data and providing sufficient detail on all steps taken in building the model, has been observed. Furthermore, most of the models have not been externally validated. Ethnicity intrinsically includes important genetic and environmental factors related to diabetes onset; however, the evidence is still controversial as regards the influence of ethnicity as an independent risk predictor for T2D onset. Risk prediction models should be derived from the general population, and further research is required to improve prediction of T2D.
Differences in diabetes incidence, prevalence, and weight of risk factors seem to account for the agreement differences between scores. In the Caucasian population of workers considered in the present study, there is better agreement between the multi-ethnic derived score (DETECT-2) and the European derived scores. Risk model development should move towards the use of more readily available and reproducible risk predictors.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-0383/9/5/1546/s1, Table S1: Distribution of people at high risk using DETECT-2, DDRS, FINDRISC, EGATS, NUDS, Hisayama, ITD and ARIC score.

Funding: This project was funded by the Carlos III Health Institute (Ministry of Economy and Competitiveness, Spain) through the Network for Prevention and Health Promotion in Primary Care (redIAPP, RD16/0007/008), and by European Union ERDF funds.

Acknowledgments:
The authors would like to thank all participants in the study and the colleagues who assisted in the data collection.

Conflicts of Interest:
The authors declare no conflict of interest.