Screening Model for Estimating Undiagnosed Diabetes among People with a Family History of Diabetes Mellitus: A KNHANES-Based Study

A screening model for estimating undiagnosed diabetes mellitus (UDM) is important for early medical care. Research on screening models for people with a family history of diabetes (FHD) is scarce, and models that incorporate gender characteristics are especially lacking. Therefore, the primary objective of our study was to develop and validate a screening model for estimating UDM among people with FHD. We used data from the Korean National Health and Nutrition Examination Survey (KNHANES). KNHANES 2010–2016 served as the development cohort (n = 5939), and the models were evaluated in a validation cohort from KNHANES 2017 (n = 1047). We developed screening models for UDM in males (SMM), females (SMF), and males and females combined (SMP) with FHD using backward stepwise logistic regression analysis. In the validation cohort, the SMM and SMF performed better (area under the curve (AUC) = 76.2% and 77.9%, respectively) than the SMP (AUC = 72.9%). Consequently, simple screening models for estimating UDM among people with FHD were developed and validated, and they are expected to reduce the burden on the national health care system.


Introduction
Diabetes mellitus affected 422 million people worldwide in 2014, and this number continues to increase [1]. Globally, the proportion of patients with undiagnosed diabetes mellitus (UDM) varies from 30 to 80% [2], with most cases being asymptomatic [3,4]. Early diagnosis allows for optimized treatment of patients with type 2 diabetes mellitus, which helps to achieve good outcomes among individuals with a long asymptomatic disease phase [5,6]. Although the oral glucose tolerance test, fasting plasma glucose level, and hemoglobin A1C level are established biochemical indicators in people with diabetes mellitus [7], these tests are insufficient for stratifying large populations in developing countries [8]. Population-wide screening models for UDM allow for the estimation of individuals at a high risk of developing diabetes without requiring invasive laboratory tests [7]. Against this background, a considerable number of screening models have been proposed for various countries and ethnic groups [8][9][10][11][12][13][14][15][16][17]. Glumer et al. developed a simple self-administered questionnaire for identifying individuals with undiagnosed diabetes [8]. Lee et al. developed and validated a self-assessment score for undiagnosed diabetes in the Korean population, based on the Korea National Health and Nutrition Examination Survey (KNHANES) [11]. Katulanda et al. developed a non-invasive screening tool that can classify 80% of undiagnosed diabetes mellitus by selecting 40% of Sri Lankan adults [12]. Heikes et al. developed a simple tool for identifying undiagnosed diabetes mellitus and pre-diabetes mellitus [13]. Zhou et al. developed a screening model for self-assessment of diabetes in a rural Chinese population [8]. Aekplakorn et al. proposed an estimation model for people at a high risk for diabetes mellitus in Thailand [14]. Nanri et al. proposed a prediction model for three-year incidence of type 2 diabetes mellitus in the Japanese population [15]. Gao et al. developed and validated a screening tool for diabetes in Chinese adults [16]. Baan et al. proposed a prediction model to identify individuals who had an increased risk of undiagnosed diabetes [17]. These previous screening models generally included age, sex, BMI, and family history of diabetes (FHD) as important variables, with age and FHD appearing as common variables.
Yang et al. and the authors of this manuscript independently showed that a family history of diabetes mellitus (FHD) is significantly correlated with UDM when compared with no family history (non-FHD) [18]. Therefore, we surmised that a practical and meaningful screening model for UDM should be introduced into public health care to reduce UDM in potential high-risk populations. Even though a considerable number of screening models have been developed and are in use, there is a lack of screening models for people with FHD. There has also been minimal research on UDM screening models that incorporate gender characteristics. Consequently, the objective of our study was to develop a practical screening model for estimating a high risk of UDM in people with FHD.

Materials and Methods
The development of the UDM screening models for the combined male and female population (SMP), the male population (SMM), and the female population (SMF) in the FHD group consisted of three steps, as shown in Figure 1. In the first step, KNHANES datasets collected from 2010 to 2017 were combined consistently. A variable was included in the present study only if its scale of measurement did not change during the study period. The combined dataset was pre-processed to obtain a reliable experimental dataset. In the second step, basic characteristics were compared between the FHD and non-FHD groups and between the male and female populations. Non-invasive variables (NIVs) were selected based on bivariate analysis for each of the male, female, and combined male and female populations. These NIVs were used to develop the SMP, SMM, and SMF. The models were evaluated with various measures on validation datasets in order to generate efficient screening models for people with FHD. Finally, the models were transformed into a simple linear risk score that yields a simplified estimate of UDM in people with FHD.

KNHANES was performed without institutional review board (IRB) approval because the survey directly serves the public welfare under the Bioethics law of the Republic of Korea. Additionally, personally identifiable information such as names was masked and replaced with unique personal identification numbers. KNHANES data are openly published on the website (https://knhanes.cdc.go.kr) [19]. KNHANES is a national surveillance system that assesses the health and nutrition status of Koreans, as stipulated in the National Health Promotion Act. The KNHANES database includes information on individuals' health-related behaviors, quality of life, healthcare utilization, anthropometric measures, and biochemical and clinical profiles [15]. Without duplications, there were 64,759 subjects in the KNHANES 2010-2017 dataset. Those with an age <20 years (n = 14,812), a previous diagnosis of diabetes or drug treatment (insulin or hypoglycemic agent) (n = 4144), null or unknown responses (n = 8613), or non-FHD (n = 30,204) were excluded. Study subjects were divided into a development dataset (n = 5939, KNHANES 2010-2016) and a validation dataset (n = 1047, KNHANES 2017). The development and validation datasets were each separated into male (n = 2270 and 402, respectively) and female (n = 3669 and 645, respectively) subjects. These processes are shown in Figure 2.
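The cohort flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the variable names are illustrative and not part of the original analysis.

```python
# Cohort flow check using only the counts reported in the text.
total_subjects = 64_759  # KNHANES 2010-2017, without duplications

exclusions = {
    "age < 20 years": 14_812,
    "previous diagnosis or drug treatment": 4_144,
    "null or unknown response": 8_613,
    "non-FHD": 30_204,
}

# Subjects remaining after all exclusions.
study_subjects = total_subjects - sum(exclusions.values())

development = {"male": 2_270, "female": 3_669}  # KNHANES 2010-2016
validation = {"male": 402, "female": 645}       # KNHANES 2017

n_development = sum(development.values())
n_validation = sum(validation.values())

print(study_subjects, n_development, n_validation)
```

The remaining subjects equal the sum of the development and validation datasets, so the reported counts are internally consistent.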

Definition of Terminology
UDM was defined as a fasting plasma glucose (FPG) level ≥ 126 mg/dL (6.993 mmol/L) in people who had received no previous diagnosis of diabetes mellitus from healthcare professionals and were receiving no insulin or oral anti-diabetic agents [11]. Impaired fasting glucose (IFG) was defined as an FPG of 100-125 mg/dL (5.55-6.9375 mmol/L) with the same conditions satisfied. Individuals were categorized as having FHD if their parents or siblings had been diagnosed with diabetes mellitus [2]. Body mass index (BMI) was categorized into underweight (BMI < 18.5 kg/m²), normal weight (18.5-24.9 kg/m²), overweight (25-29.9 kg/m²), and obese (BMI ≥ 30 kg/m²) [20]. Abdominal obesity was classified as abnormal among males whose waist circumference (WC) was more than 90 cm and among females whose WC was more than 85 cm [20]. Drinking was defined based on the number of units consumed per week (more than four shots on 5 days per week). Walking indicated a walking workout on more than 6 days per week. Weight training was defined as participating in weight training on more than one day per week. Stress and depression were incorporated as factors through questionnaires, in which participants reported the level of stress they felt in their daily lives and whether they had felt sadness or desperation that interfered with their daily lives for more than two weeks in a row, respectively [19].
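The definitions above can be expressed as small classification rules. The sketch below is illustrative only: the function names and return labels are hypothetical, and FPG values between 125 and 126 mg/dL, which the definitions do not cover, are assigned to IFG as an assumption.

```python
def glycemic_status(fpg_mg_dl, previously_diagnosed, on_diabetes_medication):
    """Classify a person by fasting plasma glucose (mg/dL)."""
    # People with a previous diagnosis or treatment cannot be "undiagnosed".
    if previously_diagnosed or on_diabetes_medication:
        return "diagnosed or treated"
    if fpg_mg_dl >= 126:
        return "UDM"
    if fpg_mg_dl >= 100:
        return "IFG"  # assumption: values in (125, 126) also land here
    return "normal"


def bmi_category(bmi):
    """BMI categories in kg/m^2, following the cut-offs stated in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"


def abdominal_obesity(waist_cm, sex):
    """True if waist circumference exceeds 90 cm (male) or 85 cm (female)."""
    thresholds = {"male": 90, "female": 85}
    if sex not in thresholds:
        raise ValueError("sex must be 'male' or 'female'")
    return waist_cm > thresholds[sex]


print(glycemic_status(130, False, False), bmi_category(30), abdominal_obesity(92, "male"))
```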

Statistical Analyses
Descriptive statistics were used to report participant characteristics. Between-group comparisons were performed with t-tests and Chi-square tests; continuous variables were expressed as mean ± standard deviation (SD) and categorical variables as percentages (%). In developing the SMP, SMM, and SMF, we discretized continuous variables into nominal variables in order to generate scores in the individual populations. In the combined male and female population, age (Q1: 37, Q2: 45, Q3: 55) and WC (Q1: 74.2, Q2: 80.9, Q3: 87.8) were discretized by quartiles, and BMI was discretized by WHO criteria. In the separate male and female populations, the ages of male (Q1: 37, Q2: 44, Q3: 55) and female (Q1: 37, Q2: 46, Q3: 56) participants were discretized by quartiles based on each distribution, and BMI and WC were discretized by WHO criteria. Univariate logistic regression was used to identify candidate predictors (p < 0.05), which were then entered into a multivariate logistic regression model built by backward stepwise selection. We defined scores by rounding down odds ratios (ORs) to the first decimal place in the final screening model for estimation of UDM. For example, OR 1.43 was rounded to 1.4 and OR 2.18 was rounded to 2.1. The models were evaluated for sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PIR), negative likelihood ratio (NIR), Youden index (Youden), and area under the curve (AUC) [12]. These analyses were performed using R software version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org/).
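The evaluation measures listed above all follow from a 2×2 table of screening result against true UDM status. The sketch below uses the standard textbook definitions; the function name is hypothetical and the counts are made up for illustration, not taken from the study (which used R).

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2x2 confusion table.

    tp/fn: people with UDM who screen positive/negative.
    fp/tn: people without UDM who screen positive/negative.
    Assumes no denominator is zero (e.g. specificity < 1).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
        "youden": sensitivity + specificity - 1,
    }


# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp=8, fp=10, fn=2, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Youden index summarizes sensitivity and specificity in one number, which is why it is commonly used to choose a score cut-off.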

Comparison of Basic Characteristics between Non-FHD Group and FHD Group
The FHD group showed higher body mass index, weight, height, waist circumference, DBP, CHOL, TG, LDL, and fasting glucose levels, a higher proportion of females, and higher levels of stress in the high-, mid-high-, mid-low-, and low-level categories than the non-FHD group. In contrast, the FHD group was younger, was less likely to have had a previous smoking habit, walked less, had a lower incidence of depression and hypertension, and had lower SBP, BUN, and creatinine levels. Nevertheless, the proportions of IFG and undiagnosed diabetes were higher in the FHD group (Table 1). To adjust for bias in these variables, we used propensity score matching with age, sex, waist circumference, and body mass index as conditioning variables. After matching, the FHD group still differed significantly from the non-FHD group (Table A1). We also applied a false discovery rate (FDR) correction, and the results remained statistically significant (Table A2). After FDR correction, the same variables as in Table 1 had p-values less than 0.05.
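The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg procedure; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg: sort the p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank

    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```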

Comparison of Basic Characteristics between Male and Female Participants in the FHD Group
The males in the FHD group had higher BMI, weight, height, waist circumference, hospital visit rate, drinking, current smoking, previous smoking, walking, weight training, SBP, DBP, CHOL, TG, HB, HCT, BUN, creatinine, fasting glucose level, and hypertension than the females. In contrast, the males in the FHD group were younger than the females and had lower levels of depression, HDL, and LDL.
Furthermore, the proportions of IFG and undiagnosed diabetes were higher in males in the FHD group (Table A3).

Screening Model Development and Performance Evaluation
Univariate analysis revealed several potential risk factors for undiagnosed diabetes in each population. In the combined male and female population with FHD, the significant variables were age in years, male sex, BMI, WC, hypertension, drinking, current smoking, and previous smoking. In the male population, they were age, BMI, WC, hypertension, and drinking. Age, BMI, WC, hypertension, and drinking were also significant variables in the female population (Table 2). In multivariate analysis, age, male sex, BMI, WC, drinking, and hypertension correlated with UDM in the combined male and female population. Age, BMI, WC, drinking, and hypertension were significantly associated with UDM in the male population. Age, BMI, WC, and drinking were significant in the female population (Table 3). The simplified screening model was created from the relevant non-invasive variables based on their ORs. Continuous variables were divided into four intervals. Values within the first quartile were assigned 1 point and were used as a reference; values in the other quartiles received higher scores, depending on the associated OR. For example, the OR of age in males is 1.43, so people < 37 years of age were assigned 1 point, those aged 37 to < 45 years were assigned 14 points, those aged 45 to < 55 years were assigned 28 points, and those aged ≥ 55 years were assigned 42 points. In the case of categorical variables, the risk score was set to 1 point if the value was no; otherwise, the score increased based on the OR. For instance, hypertension in males had an OR of 2.02, so males with hypertension were assigned 20 points and the others were assigned 1 point. Furthermore, we defined a nomogram based on the logistic function to estimate the probability of UDM automatically from the sum of the scores read from the score tables in Figure A1. For example, male A is 37 years of age and has a BMI of 30, abdominal obesity by waist circumference, a drinking habit, and no hypertension, corresponding to risk scores of 14, 45, 15, 21, and 1 point, respectively. The cumulative risk score for people like A is 96 points. People like A have a risk probability of 17.9% for UDM, and the model recommends that people in this category consult an internal medicine doctor.
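The scoring rule and the worked example for male A can be reproduced with a short sketch. It reflects our reading of the text, namely that the points per level equal ten times the OR rounded down to one decimal place; the function names are hypothetical. Only the ORs for age (1.43) and hypertension (2.02) in males are stated in the text, so the points for BMI, waist circumference, and drinking are copied directly from the example rather than derived.

```python
import math


def or_to_unit_points(odds_ratio):
    """Round the OR down to one decimal place and scale by ten (1.43 -> 14)."""
    # round() guards against floating-point noise before taking the floor.
    return math.floor(round(odds_ratio * 10, 6))


def male_age_points(age, odds_ratio=1.43):
    """Age points for males, using the intervals given in the example."""
    unit = or_to_unit_points(odds_ratio)
    if age < 37:
        return 1          # reference interval
    if age < 45:
        return unit       # 14 points
    if age < 55:
        return 2 * unit   # 28 points
    return 3 * unit       # 42 points


def binary_points(present, odds_ratio):
    """Points for a yes/no variable: 1 if absent, OR-based points if present."""
    return or_to_unit_points(odds_ratio) if present else 1


# Male A from the text.
male_a_points = {
    "age 37": male_age_points(37),
    "BMI 30": 45,              # copied from the example in the text
    "abdominal obesity": 15,   # copied from the example in the text
    "drinking": 21,            # copied from the example in the text
    "no hypertension": binary_points(False, 2.02),
}
total_score = sum(male_a_points.values())
print(male_a_points, total_score)
```

Converting the total score into a probability, such as the 17.9% quoted for male A, requires the nomogram in Figure A1 and is not reproduced here.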
The screening model was verified using the validation dataset (KNHANES 2017) of 1047 people. These data were collected at a different time point from that of the development dataset (KNHANES 2010-2016). We evaluated the discrimination of the UDM models on the validation dataset using various measures such as sensitivity, specificity, PPV, NPV, PIR, NIR, Youden, and AUC (Table 4).
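The AUC used for discrimination can be read as the probability that a randomly chosen person with UDM receives a higher risk score than a randomly chosen person without UDM, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the example scores are hypothetical, and this is not the R routine used in the study.

```python
def auc_from_scores(scores, labels):
    """AUC by pairwise comparison of positive and negative cases.

    scores: risk scores, higher meaning higher estimated risk.
    labels: 1 for UDM, 0 for no UDM.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores, for illustration only.
print(auc_from_scores([10, 30, 20, 40], [0, 0, 1, 1]))
```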

Discussion
The estimation of people with UDM is important in ensuring positive patient outcomes and preventing complications. Previous studies from various countries and with diverse populations have shown adequate goodness of fit and satisfactory validity of the proposed models for identifying individuals with UDM [8,10-17]. These studies centered on the estimation of UDM in the general population. In contrast, our models concentrated on specific smaller populations, namely males and females with FHD. In our experimental results, people with FHD displayed a variety of adverse indicators, including higher BMI, weight, waist circumference, CHOL, DBP, TG, LDL, and fasting glucose level, a higher frequency of stress, and a lower frequency of walking compared to people without FHD. Although people with FHD tended to be younger and had a lower frequency of depression, previous smoking, and hypertension, as well as lower BUN and creatinine levels, this group showed significantly higher rates of IFG and UDM than the non-FHD population. In particular, we used propensity score matching to adjust for differences in basic clinical characteristics and for the unbalanced number of people in the FHD and non-FHD groups. The conditioning variables were age, sex, waist circumference, and body mass index. After adjustment for basic characteristics, the FHD group still differed significantly from the non-FHD group (Table A1). Additionally, the multivariate analysis results for non-FHD differed significantly from the FHD results (Table A4). Furthermore, the male population with FHD showed higher BMI, waist circumference, drinking, current smoking, previous smoking, SBP, DBP, CHOL, HDL, TG, BUN, creatinine, fasting glucose, and hypertension compared to the female population. This group in particular showed significantly higher frequencies of IFG and UDM. Against this background, we focused on establishing screening models dedicated to male and female populations with FHD.
In order to reflect the characteristics of the male and female populations with FHD, we generated the SMM and SMF, respectively. The common estimation factors in both models were age, BMI, waist circumference, and drinking. Age was a major risk factor in the models, as age is strongly associated with an increased risk of diabetes mellitus [2,21,22]. Our models included an estimation factor for age, which was divided into quartiles according to the respective gender distributions. Waist circumference is considered a reliable indicator of future diabetes mellitus risk [23][24][25]. In our experimental models, abnormality in waist circumference was defined differently for male and female populations, according to WHO criteria [20]. Drinking is associated with disruption of glucose metabolism, including effects on the muscle, liver, and adipose tissue, as observed under baseline conditions and following stimulation [26][27][28][29]. Although drinking was significantly more frequent among the male population than the female population, this factor was included in both estimation models. BMI is an important variable for estimating UDM in these models [30][31][32][33]; we categorized BMI according to the WHO criteria [20]. Hypertension is an important comorbidity among patients with diabetes [34,35] and has a significantly higher incidence among diabetics than in the general population [36][37][38]. Hypertension occurrence is higher among males than females, regardless of race and ethnicity [39][40][41]. In our experimental models, hypertension was the only estimation factor that differed between the SMM and SMF. In our models, hypertension was therefore associated with UDM only among the male population with FHD.
Compared to previous studies, the differentiation between male and female populations with FHD makes our screening model unique. Significant differences were observed not only in the basic characteristics of the male and female populations but also in the components of the SMM and SMF. Crucially, the SMM and SMF performed better than the SMP. We therefore recommend subdividing the population by sex when generating models for the estimation of UDM. To support reliability, the models were evaluated on a validation dataset (KNHANES 2017) assembled at a different time point from the development dataset (KNHANES 2010-2016).
Our study has some limitations. First, the FHD and non-FHD groups had an unbalanced distribution, which may bias the results because the models ignore people without FHD. Second, this study was carried out on a specific dataset from a single country. Third, machine learning algorithms have recently been applied to develop screening models in various clinical areas, whereas our study focused on a statistical model based on logistic regression. To overcome these limitations, a randomized, prospective, large-scale clinical trial and alternative approaches such as machine learning are required in future work.

Conclusions
The proposed screening models include only non-invasive variables and can therefore be used in large populations. This simple and practical tool can be used by people with limited access to clinical examinations, including blood tests. The screening models showed good predictive performance and were transformed into a simple linear screening score that can be easily calculated. Future studies should examine larger populations and analyze the impact of comorbidity and chronic disease. Moreover, research on relevant predictive models based on multinational standards is needed.

Conflicts of Interest:
The authors declare no conflict of interest.

Data Sharing Statement:
The datasets used during the current study are available from the corresponding authors upon reasonable request.

Table A1. Propensity score-matched baseline characteristics of people with (FHD) and without (non-FHD) family history of diabetes.

Variable    Non-FHD (n = 6986)    FHD (n = 6986)    p-Value

Data are expressed as percentage or mean ± SD. FHD, family history of diabetes; Drinking was calculated based on the number of units consumed per week: more than 4 shots on 5 days per week; Walking, walking workout more than 6 days per week; Weight training, participating in weight training more than a day per week; Stress, stress recognition by self; Depression, felt sad or desperate enough to interfere with your daily life for more than two weeks in a row, during the past year.