Incidence and Risk Factors for Progression to Diabetes Mellitus: A Retrospective Cohort Study

(1) Objective: This study examined the incidence of, and risk factors for, progression to diabetes mellitus (DM) over a seven-year follow-up of non-diabetic National Health Examinees. (2) Methods: For this retrospective observational cohort study, we used two nationally representative databases: the National Health Screening (HEALS) database 2009 and the National Health Insurance Service (NHIS) database 2009–2015. Eligible subjects without DM and with fasting blood glucose levels of <126 mg/dL were selected from the HEALS database, and their subsequent follow-up and clinical outcomes were evaluated using the NHIS database. Cox proportional hazards regression was applied to examine the effects of the covariates on progression to DM. (3) Results: Of those who took part in the national health screening in 2009, 4,205,006 subjects met the eligibility criteria. Of these, 587,015 were diagnosed with DM during follow-up through 2015. The cumulative incidence of progression from non-diabetes to DM was 14.0%, and that from impaired fasting glucose (IFG) to DM was 21.9%. Compared to the normal group, the newly diagnosed DM group was more likely to comprise older, female, currently smoking, and high-risk drinking participants and participants with IFG, hypertension, dyslipidemia, and metabolic syndrome. (4) Conclusions: This epidemiological study in the Republic of Korea identified risk factors similar to those reported in other studies, but the incidence of progression to DM was 22.8 per 1000 person-years, which is higher than previously reported. Hence, greater efforts are needed to prevent DM.


Introduction
Diabetes mellitus (DM) is one of the world's leading public health concerns; its incidence has increased steadily over the past few decades, and it causes complications in multiple organ systems [1]. The number of people with DM rose from 285 million in 2009 to 463 million in 2019 and is expected to reach 578 million by 2030 [1].
Prediabetes refers to glucose levels that do not meet the criteria for DM but are too high to be considered normal; it includes impaired fasting glucose (IFG) and impaired glucose tolerance (IGT) [2]. IFG is diagnosed when fasting blood glucose levels are between 100 and 125 mg/dL (5.6 to 6.9 mmol/L), and it is strongly associated with the development of DM [2]. IFG increases the risk of progression to DM, with an annual conversion rate of 6-9% [3][4][5]. The prevalence of IFG ranges from 1% to 30%, depending on the country and the diagnostic criteria used [6][7][8]. According to the American Diabetes Association (ADA), the risk factors for DM include age, gender, obesity, family history, physical inactivity, hypertension, and prediabetes, including IFG [3]. Moreover, the incidence of and risk factors for chronic diseases such as DM can differ by region, ethnicity, and race [9]. Indeed, some studies reported that Asian people have both a higher risk and a higher incidence of diabetes than other ethnicities [10,11]. A Japanese epidemiological study reported that IFG increased the risk of diabetes 8.8-fold [12]. However, few epidemiological studies of Koreans have been based on large-scale representative databases linked so that the strengths of each dataset offset the weaknesses of the other.
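To make the glucose cut-offs concrete, the short Python sketch below classifies a single fasting plasma glucose (FPG) value and converts mg/dL to mmol/L. It is illustrative only: it assumes a conversion factor of about 18 mg/dL per mmol/L (glucose molar mass of roughly 180 g/mol), looks at FPG alone (IGT and HbA1c are ignored), and the function names are hypothetical.

```python
# Hypothetical helpers illustrating the FPG cut-offs quoted in the text.
# Assumption: 1 mmol/L of glucose is about 18 mg/dL (molar mass ~180 g/mol).
MGDL_PER_MMOLL = 18.0


def mgdl_to_mmoll(glucose_mgdl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return glucose_mgdl / MGDL_PER_MMOLL


def classify_fpg(fpg_mgdl: float) -> str:
    """Classify one fasting plasma glucose value (FPG only; no IGT or HbA1c)."""
    if fpg_mgdl < 100:
        return "normal"
    if fpg_mgdl < 126:   # 100-125 mg/dL
        return "IFG"
    return "DM range"    # >= 126 mg/dL; a diagnosis still needs confirmation


for value in (95, 100, 125, 126):
    print(value, "mg/dL =", round(mgdl_to_mmoll(value), 1), "mmol/L ->", classify_fpg(value))
```

With this factor, the IFG bounds of 100 and 125 mg/dL map to about 5.6 and 6.9 mmol/L, and the DM threshold of 126 mg/dL to 7.0 mmol/L.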
Knowledge of the exact incidence and prevalence of prediabetes and diabetes is essential for effectively managing diabetes care and health insurance finances. In particular, the incidence of, and risk factors for, progression from non-diabetes (including IFG) to DM need to be determined.
In the present study, subjects with normal fasting glucose or IFG at the 2009 National Health Screening (HEALS) were followed up for seven years using the National Health Insurance Service (NHIS) database to investigate the incidence and risk factors of DM.

Study Design and Data Source
This was a retrospective observational cohort study using two nationally representative databases, HEALS and NHIS, both held by the NHIS, the single insurer in the Republic of Korea (ROK) (Figure 1) [13,14]. The NHIS data cover approximately 96.6% of the ROK population and include the participants' demographic and medical treatment information [13]. The HEALS data, in contrast, represent a cohort of people who participated in the national health screening programs provided by the NHIS and contain information on the examinees' health problems and risk factors obtained through these programs [14]. The HEALS and NHIS databases are linked through personal identification numbers, but the data are anonymized and provided only on a designated secure computer in a security room. Together, these data can be used to identify new-onset DM among individuals who were non-diabetic at a health screening and to discover the risk factors for new-onset DM in the ROK.

Study Subjects and Setting
Eligible subjects were selected from the HEALS database and followed up through 2015, the last year for which data access was granted for analysis. Clinical outcomes were ascertained using the NHIS database (Figure 1).
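The selection rules stated in this paper (no DM and fasting blood glucose <126 mg/dL at the 2009 screening, plus, as noted under the limitations, complete smoking, drinking, and exercise information) can be sketched as a simple filter. The record layout and field names below are hypothetical; the actual HEALS variable names and any further exclusion rules are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningRecord:
    # Hypothetical fields; the real HEALS variable names are not given in the text.
    person_id: str
    fpg_mgdl: float
    has_dm_diagnosis: bool             # DM already diagnosed at baseline
    current_smoker: Optional[bool]     # None = missing
    high_risk_drinker: Optional[bool]
    proper_exercise: Optional[bool]


def is_eligible(r: ScreeningRecord) -> bool:
    """Apply the eligibility rules described in the text to one record."""
    if r.has_dm_diagnosis or r.fpg_mgdl >= 126:
        return False
    # Subjects missing any health-behavior variable were excluded.
    behaviors = (r.current_smoker, r.high_risk_drinker, r.proper_exercise)
    return all(b is not None for b in behaviors)


def select_cohort(records: List[ScreeningRecord]) -> List[ScreeningRecord]:
    return [r for r in records if is_eligible(r)]


records = [
    ScreeningRecord("A", 92, False, False, False, True),
    ScreeningRecord("B", 131, False, True, False, False),  # FPG in DM range
    ScreeningRecord("C", 110, True, False, False, True),   # known DM
    ScreeningRecord("D", 104, False, None, False, True),   # missing smoking data
    ScreeningRecord("E", 118, False, True, True, False),
]
print([r.person_id for r in select_cohort(records)])  # ['A', 'E']
```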

Outcome Variables
The primary clinical endpoint was progression from non-diabetes to DM; new-onset DM was defined as a DM diagnosis (ICD-10-CM codes E10, E11, E13, and E14) in the medical records of the NHIS database. In the ROK, DM is diagnosed by a fasting plasma glucose of ≥126 mg/dL after at least eight hours of fasting, a two-hour plasma glucose of ≥200 mg/dL during a 75 g oral glucose tolerance test, or a glycosylated hemoglobin (A1C) level of ≥6.5% [15]. In the absence of obvious symptoms of hyperglycemia (polyuria, polydipsia, and unexplained weight loss), an abnormal result must be confirmed by repeat testing on another day, but two or more abnormal results from the same sample provide immediate confirmation [15].
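The outcome definition combines claims codes with the clinical criteria behind them. The sketch below assumes diagnosis codes are available as strings such as "E11.9" or "E119" (a hypothetical input format), flags records carrying any of the listed codes, and encodes a simplified reading of the laboratory and confirmation rules. It is not the NHIS's or the authors' actual ascertainment algorithm, and the function names are hypothetical.

```python
# Codes listed in the text; sub-codes such as "E11.9" or "E119" are matched by prefix.
DM_CODE_PREFIXES = ("E10", "E11", "E13", "E14")


def has_dm_code(diagnosis_codes):
    """Return True if any claims diagnosis code falls under E10, E11, E13, or E14."""
    return any(code.strip().upper().startswith(DM_CODE_PREFIXES) for code in diagnosis_codes)


def abnormal_results(fpg=None, ogtt_2h=None, a1c=None):
    """List the laboratory criteria met by one sample (None = test not done)."""
    hits = []
    if fpg is not None and fpg >= 126:          # mg/dL, after >= 8 h of fasting
        hits.append("FPG")
    if ogtt_2h is not None and ogtt_2h >= 200:  # mg/dL, 75 g OGTT
        hits.append("OGTT")
    if a1c is not None and a1c >= 6.5:          # %
        hits.append("A1C")
    return hits


def confirmed_without_repeat(hits, classic_symptoms):
    """Simplified confirmation rule: no repeat test on another day is needed if
    hyperglycemia symptoms are present or two or more criteria are met in the
    same sample."""
    if not hits:
        return False
    return classic_symptoms or len(hits) >= 2


print(has_dm_code(["I10", "E11.9"]))                               # True
print(abnormal_results(fpg=131, a1c=6.8))                          # ['FPG', 'A1C']
print(confirmed_without_repeat(["FPG"], classic_symptoms=False))   # False
```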

Household Income
Household income was derived from the income level of the insured person used to calculate health insurance premiums and was classified into quintiles, with higher quintiles indicating higher income.
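One way to form income quintiles is to cut the premium-based income measure at its 20th, 40th, 60th, and 80th percentiles. The sketch below does this with Python's standard `statistics.quantiles`; the percentile method, the tie handling, and the example values are assumptions, since the text does not say how the NHIS quintiles were constructed.

```python
import bisect
import statistics


def income_quintiles(incomes):
    """Assign each value a quintile from 1 (lowest) to 5 (highest income)."""
    cut_points = statistics.quantiles(incomes, n=5)  # four cut points
    return [bisect.bisect_right(cut_points, value) + 1 for value in incomes]


# Hypothetical premium-based income values for ten households.
incomes = [120, 340, 90, 560, 410, 780, 230, 650, 300, 990]
print(income_quintiles(incomes))
```

With ten distinct values, each quintile receives exactly two households; in a real claims database, many identical premium values would make the tie-handling rule matter.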

Metabolic Syndrome
Metabolic syndrome was defined as meeting three or more of the following criteria: (1) abdominal obesity, with a waist circumference of ≥90 cm in men or ≥85 cm in women; (2) hypertriglyceridemia, with triglycerides (TG) of ≥150 mg/dL or medication use; (3) low high-density lipoprotein (HDL) cholesterol, with HDL cholesterol of <40 mg/dL in men or <50 mg/dL in women; (4) elevated blood pressure (BP), with systolic BP of ≥130 mmHg and/or diastolic BP of ≥85 mmHg; and (5) hyperglycemia, with fasting plasma glucose (FPG) of >100 mg/dL or medication use [16].
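The "three or more of five" rule lends itself to a direct count. The function below is a minimal sketch of the criteria as listed above, using the sex-specific waist cut-offs and, for glucose, the strict ">100 mg/dL" exactly as written in the text; the function and argument names are hypothetical.

```python
def metabolic_syndrome(sex, waist_cm, tg_mgdl, hdl_mgdl, sbp_mmhg, dbp_mmhg,
                       fpg_mgdl, tg_medication=False, glucose_medication=False):
    """Return (number of criteria met, metabolic syndrome yes/no).

    sex is "M" or "F"; thresholds follow the definition in the text.
    """
    is_male = sex == "M"
    criteria = [
        waist_cm >= (90 if is_male else 85),      # (1) abdominal obesity
        tg_mgdl >= 150 or tg_medication,          # (2) hypertriglyceridemia
        hdl_mgdl < (40 if is_male else 50),       # (3) low HDL cholesterol
        sbp_mmhg >= 130 or dbp_mmhg >= 85,        # (4) elevated blood pressure
        fpg_mgdl > 100 or glucose_medication,     # (5) hyperglycemia, ">" as written
    ]
    count = sum(criteria)
    return count, count >= 3


print(metabolic_syndrome("M", 92, 160, 45, 120, 80, 98))   # (2, False)
print(metabolic_syndrome("F", 86, 100, 48, 132, 80, 105))  # (4, True)
```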

Current Smoking, High-Risk Drinking, and Proper Exercise
Current smoking was defined as having smoked more than 100 cigarettes in one's lifetime and currently smoking [17].

High-risk drinking was defined as drinking more than 300 mL of alcoholic beverages per day on average. For traditional Korean drinks, one standard drink unit corresponds to one bowl (approximately 300 mL) of Korean rice beer (Makgeolli) or a quarter bottle (approximately 90 mL) of 20% Korean liquor (Soju) [17,18].
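Because the two reference servings are each described as one standard drink, they should contain similar amounts of ethanol. The sketch below makes that arithmetic explicit, assuming an ethanol density of 0.789 g/mL. The strength of the rice beer is not stated in the text, so it is back-calculated from the Soju serving rather than assumed, and the full-bottle size of about 360 mL is inferred from "a quarter bottle (approximately 90 mL)". The helper names are hypothetical, and the code does not encode the high-risk threshold itself.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol


def ethanol_grams(volume_ml, abv):
    """Grams of pure ethanol in a drink of a given volume and alcohol-by-volume fraction."""
    return volume_ml * abv * ETHANOL_DENSITY_G_PER_ML


# Reference serving from the text: a quarter bottle (~90 mL) of 20% Soju.
STANDARD_DRINK_G = ethanol_grams(90, 0.20)


def standard_drinks(volume_ml, abv):
    """Express a drink as multiples of the Soju-based standard drink."""
    return ethanol_grams(volume_ml, abv) / STANDARD_DRINK_G


# ABV at which one ~300 mL bowl of rice beer holds the same ethanol (derived, not stated).
implied_makgeolli_abv = 90 * 0.20 / 300

print(round(STANDARD_DRINK_G, 1))            # 14.2
print(round(implied_makgeolli_abv, 3))       # 0.06
print(round(standard_drinks(360, 0.20), 2))  # 4.0 (a full ~360 mL bottle)
```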
Proper exercise was defined as follows: (1) intensive exercise lasting more than 20 min per session and more than three times per week or (2) moderate exercise lasting more than 30 min per session and more than five times per week [19,20].
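Read literally, the exercise definition is a pair of duration-and-frequency thresholds, either of which suffices. The sketch below encodes it with strict inequalities because the text says "more than"; if the underlying questionnaire meant "at least", the comparisons would become `>=`. The argument names are hypothetical.

```python
def proper_exercise(intensive_min, intensive_per_week, moderate_min, moderate_per_week):
    """Return True if either exercise pattern in the definition is met."""
    # (1) intensive exercise: > 20 min per session and > 3 sessions per week
    intensive_ok = intensive_min > 20 and intensive_per_week > 3
    # (2) moderate exercise: > 30 min per session and > 5 sessions per week
    moderate_ok = moderate_min > 30 and moderate_per_week > 5
    return intensive_ok or moderate_ok


print(proper_exercise(30, 4, 0, 0))   # True: intensive pattern met
print(proper_exercise(20, 3, 40, 5))  # False: neither pattern exceeds its thresholds
```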

Statistical Analyses
Continuous variables are presented as means and standard deviations (SDs) and were compared using Student's t-test. Categorical variables are presented as proportions and were compared using the chi-square test. The relationships between the dependent variable (progression to DM) and the risk factors (independent variables) were examined with the Cox proportional hazards model, which accounts for the timing of the event. Multicollinearity among the variables was assessed with the variance inflation factor (VIF); variables with VIF > 5 were considered to show severe multicollinearity, and no variable in the model exceeded this threshold. The onset of DM was defined as the event, and observations were censored at the end of follow-up or at death. The Cox proportional hazards regression thus estimated the prognostic influence of baseline glycemic status on conversion to DM while simultaneously controlling for the confounding effects of covariates. The resulting hazard ratio (HR) reflects the instantaneous relative risk of conversion to DM, averaged over the entire follow-up period. The proportional hazards assumption was tested using a goodness-of-fit test comparing observed and expected risk probabilities. Adjusted HRs and 95% confidence intervals (CIs) are reported. A subgroup analysis was conducted in participants aged 40 years or older because the prevalence of DM in Koreans in 2016 exceeded 10% among men in their 40s and women in their 50s. The data were analyzed using SAS statistical software, version 9.4, for Windows (SAS, Cary, NC, USA); two-sided p values less than 0.05 were considered significant.

Baseline Demographics
Of the 4,205,006 participants analyzed, the mean age was 40.1 ± 12.2 years, and 71.8% were male. This male predominance reflects the higher participation of males in the national health screening [21]. The proportion of participants with IFG (fasting blood glucose of 100-125 mg/dL) was 24.4%, of whom 80.6% were male. Metabolic syndrome was detected in 60.9% of all participants, of whom 64.0% were male and 36.0% female. The proportions of all variables were significantly higher in males than in females (p < 0.001) (Table 1).

Incidence and Characteristics of Progression from Non-Diabetes to Diabetes
The cumulative incidence of DM over the seven-year follow-up was 14%, and the conversion rate from non-diabetes to DM was 22.8 per 1000 person-years (Table 2). The mean age of the 587,015 new-onset DM cases was 48.0 ± 12.7 years, higher than the 38.8 ± 11.6 years of the participants not diagnosed with DM. DM was diagnosed in 21.9% of IFG subjects compared to 11.4% of subjects with normal blood glucose levels, and in 33.7% of subjects with hypertension compared to 12.4% of those without hypertension (Table 3).
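The headline figures can be cross-checked with simple arithmetic. The sketch below recomputes the cumulative incidence from the reported counts, back-calculates the person-years implied by the rate of 22.8 per 1000 person-years (approximate, because the rate is rounded and the person-time denominator is not reported in this section), and checks that the group-specific proportions for IFG (21.9%) and normal glucose (11.4%), weighted by the 24.4% IFG share reported in the baseline demographics, reproduce the overall figure.

```python
# Figures reported in the text.
new_cases = 587_015
cohort_size = 4_205_006
rate_per_1000_py = 22.8

# Cumulative incidence over the follow-up.
cumulative_incidence = new_cases / cohort_size

# Person-years are not given here; back-calculate them from the rounded rate.
implied_person_years = new_cases / rate_per_1000_py * 1000
implied_mean_follow_up = implied_person_years / cohort_size

# Overall incidence as a weighted mix of the IFG and normal-glucose groups.
share_ifg = 0.244
weighted_incidence = share_ifg * 0.219 + (1 - share_ifg) * 0.114

print(f"cumulative incidence: {cumulative_incidence:.1%}")                # 14.0%
print(f"implied person-years: {implied_person_years / 1e6:.1f} million")  # 25.7 million
print(f"implied mean follow-up: {implied_mean_follow_up:.1f} years")      # 6.1 years
print(f"IFG/normal mix: {weighted_incidence:.1%}")                        # 14.0%
```

An implied mean follow-up of about six years, somewhat shorter than the seven-year window, is consistent with person-time ending at diagnosis, death, or the end of follow-up.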

Risk Factors of Progression from Non-Diabetes to Diabetes
The newly diagnosed DM group was characterized by IFG (Table 4). The results were similar in the analysis of participants aged 40 years or older (Table 5).

Discussion
The incidence of, and risk factors for, progression to DM in non-diabetic national health examinees were examined by linking two nationally representative databases in the ROK. Approximately 14% of people not diagnosed with DM at the 2009 National Health Screening were diagnosed with DM over the seven-year follow-up. Furthermore, 24.4% of all subjects had IFG at the 2009 screening, and 21.9% of those had converted to DM by 2015. The overall conversion rate from non-diabetes to DM was 22.8 per 1000 person-years.
A study of people aged 40-69 years without DM at the baseline examination in 2001-2002, using the Korean Genome and Epidemiology Study (KoGES), reported an overall DM incidence of 22.1 per 1000 person-years over a 12-year follow-up [22]. Its follow-up period was longer than that of the present study, and its subjects were limited to residents of certain regions aged 40 years or older. Nevertheless, its incidence of 22.1 per 1000 person-years was slightly lower than that in the present study, possibly because its data were collected from 2001, much earlier than the 2009 baseline of the present study. According to the NHIS database, the number of DM patients increased steadily from 2.7 million in 2016 to 3.3 million in 2020, and 95% of patients with DM in 2020 were in their 40s or older [23].
In a previous study of the 6.4 million people in Hong Kong who used Hospital Authority services from 2006 to 2014, the incidence of DM was 5.9% (n = 377,565), and the conversion rate to DM was 9.46 per 1000 person-years in 2014, much lower than in the present study [24]. In older data from Japan, a systematic review and meta-analysis of studies conducted between 1980 and 2003 yielded a pooled DM incidence rate of 8.8 (95% confidence interval, 7.4-10.4) per 1000 person-years [25]. In contrast, the Chennai Urban Rural Epidemiology Study (CURES) cohort (n = 1376), which followed Asian Indians for 9.1 years until 2013, reported a 30% incidence of DM [10]. The CURES investigators also reported DM conversion rates of 33.1 per 1000 person-years in non-diabetic subjects, including those with prediabetes, and 61.0 per 1000 person-years in IFG subjects [10]. In that study, female subjects showed 1.01 times higher conversion from non-diabetes to DM and 1.10 times higher conversion from IFG/IGT to DM than male subjects, but these differences were not statistically significant [10]. Similarly, the conversion to DM was 1.521 times higher in IFG subjects than in normal-glucose subjects, and the conversion to DM in non-diabetics was 1.198 times higher in females than in males [10]. However, both the incidence and the conversion rate of DM in CURES were considerably higher than in the present study. At baseline, the proportion of IFG subjects in CURES was 4.9% (21.7% for all types of prediabetes), lower than the 24.4% with IFG in the present study [10]. These differences may be related to the metabolic effects of a Western-style diet or to tissue resistance to insulin. One study reported that Asian Indian people have the highest incidence of DM among Asian populations [26].
The Southall and Brent Revisited (SABRE) study (n = 1007), which followed South Asian men aged 40-69 years living in North and West London for 19 years until 2011, reported a 35% incidence of DM [11], higher than the 14.1% incidence of DM in men in the present study. This suggests that the incidence of DM among Asian ethnic groups may differ depending on their living environment. Accordingly, more epidemiological studies under a range of conditions are needed to better understand the status and trends of prediabetes and DM, suppress their increasing incidence, and set healthcare policies for prevention and treatment that are commensurate with their burden.
In both the present study and previous research, IFG was a risk factor for progression to DM. A systematic review by the U.S. Preventive Services Task Force (USPSTF) reported that treatment of IFG is associated with delayed progression to DM [27]. Therefore, progression to DM in subjects with IFG should be prevented through appropriate lifestyle and medical interventions [2]. In particular, subjects with IFG are at higher risk of developing cardiovascular disease and therefore require intensive cardiovascular risk management [28]. The National Institute for Health and Care Excellence (NICE) has proposed a national strategy and policy to prevent DM that addresses diet, physical activity, and obesity. In addition, adult DM patients need to manage their blood pressure, lipids and cardiovascular risk, blood glucose, and complications [29]. Consistent with these recommendations, hypertension, dyslipidemia, higher BMI, and metabolic syndrome were also risk factors for DM in the present study. However, one report found that IGT was not a risk factor for conversion to diabetes, which may partly explain why some subjects without metabolic syndrome converted to diabetes in the present study [30].
Health behaviors are also linked to the development of DM. Smoking increases the risk of DM by promoting visceral abdominal fat accumulation, insulin resistance, and pancreatic β-cell dysfunction [31,32]. Various epidemiological studies have demonstrated that smoking is linked to the risk of DM and its complications [33][34][35][36]. Alcohol consumption also increases the risk of DM and its complications by affecting plasma glucose, gluconeogenesis, and insulin sensitivity [37]. In contrast, physical activity decreases the risk of DM and its complications [38][39][40]. In the present study, proper exercise was not a significant factor in the incidence of DM, which may reflect the limitation that the exercise variable did not accurately measure the amount and quality of exercise. Unlike smoking and drinking, which can be addictive, exercise may vary more over time in terms of continuity. Therefore, repeated measurements using accurate tools will be needed. For example, a Korean study that measured physical activity with the self-reported International Physical Activity Questionnaire reported a trend toward a lower incidence of DM [38]. Unlike the incidence and conversion rates of DM, the risk factors did not appear to differ noticeably by race or region.
Greater care is needed to prevent DM, and further efforts are needed to reduce both the risk of DM and the number of patients with diabetes. In addition, more epidemiological studies of DM are needed to identify and alleviate the burden of DM and its complications.
This study had some limitations. First, the HEALS and NHIS databases are secondary databases that were not planned and collected for specific research objectives, which limits the validity of the variable definitions and research results. Because these data were collected for insurance claims, they lack the detailed clinical information needed to study a specific disease. For example, the German Diabetes Association and the German Clinical Chemistry Association, in their 2019 guidelines, recommend caution in using HbA1c to diagnose diabetes in the elderly [41]; such issues could not be examined with the data used. Presumably, Korean physicians diagnosed DM and entered the diagnosis codes according to the diagnostic criteria recommended by the latest diabetes guidelines. On the other hand, secondary databases can include large-scale groups of research participants, which makes it easier to generalize the results and reduces selective reporting. Second, this study excluded subjects without information on smoking, drinking, and exercise during subject selection. This exclusion was unavoidable because health behavior information is important for identifying the risk factors for conversion to DM. Third, changes in health behaviors during the follow-up period could not be analyzed owing to data limitations. Despite these limitations, these findings provide comprehensive epidemiological information on diagnosed IFG and DM in the ROK based on a large national population-based sample.

Conclusions
This epidemiological study in the ROK, a seven-year follow-up of non-diabetic National Health Examinees, showed that the risk factors for DM are similar to those in other regions and races. The incidence of progression to DM was 22.8 per 1000 person-years, which is higher than that found in previous studies in the ROK but much lower than that reported in an Asian Indian population.

Informed Consent Statement:
Patient consent was waived because all research data used in this study were fully anonymized. The authors analyzed these data using a secure computer in the security room of the NHIS.

Data Availability Statement:
The dataset may not be taken out of the NHIS according to the policy of the National Health Insurance Service of Korea. The data can be accessed on the National Health Insurance Data Sharing Service homepage of the NHIS (https://nhiss.nhis.or.kr/bd/ab/bdaba000 eng.do (accessed on 27 October 2021)). Applications to use the NHIS-HEALS data will be reviewed by the inquiry committee of research support. Once approved, raw data will be provided to the applicant for a fee. Although the datasets are coded in English and numbers, not in Korean (Hangul), the use of individual data is allowed only for Korean researchers. Nevertheless, it would be possible for researchers outside the country to gain access to the data by conducting a joint study with Korean researchers.

Conflicts of Interest:
The authors have no conflict of interest to declare with respect to the authorship and/or publication of this article.