The Identification of Diabetes Mellitus Subtypes Applying Cluster Analysis Techniques: A Systematic Review

Diabetes Mellitus is a chronic, lifelong disease that imposes a heavy burden on healthcare systems, and its prevalence is rising worldwide. Diabetes is more complex than the classification into Type 1 and Type 2 may suggest. The purpose of this systematic review was to identify research studies that sought new subgroups of diabetes patients using unsupervised learning methods. The search was conducted on the PubMed and Medline databases by two independent researchers. Publications on cluster analysis of diabetes patients from any period were selected and analysed. Among the fourteen studies included in the final review, five found five identical clusters: Severe Autoimmune Diabetes; Severe Insulin-Deficient Diabetes; Severe Insulin-Resistant Diabetes; Mild Obesity-Related Diabetes; and Mild Age-Related Diabetes. In addition, two studies found the same clusters except for the Severe Autoimmune Diabetes cluster. The results of the other studies differed from one another and were less consistent. Cluster analysis made it possible to find non-classic heterogeneity in diabetes, but its capabilities still need to be explored and validated in more diverse and wider populations.


Introduction
Diabetes Mellitus (DM) is a chronic and lifelong metabolic disorder characterized by elevated levels of glucose circulating in the blood. It occurs when the pancreas does not secrete enough insulin, due to destruction of pancreatic β-cells; when the body's cells do not respond to insulin effectively; or through a combination of both mechanisms. The prevalence of DM has increased across the globe, and the number of people affected is expected to rise to 592 million by 2035, incurring tremendous human, economic and social costs [1].
DM imposes a considerable burden on society in the form of low productivity, poor quality of life, increased healthcare expenditures, and premature mortality. The global cost of DM is overwhelming: US $1.31 trillion or 1.8% of global GDP. Notably, indirect costs accounted for 34.7% of the total burden [2].
DM significantly increases the risk of mortality: 1 in 12 all-cause deaths may be attributable to DM [3][4][5]. Despite the existence of effective treatments, DM outcomes are poor: DM patients show a high frequency of serious and life-threatening micro- and macrovascular complications (strokes, acute coronary events, blindness, amputations, renal disease, heart failure) and premature mortality exceeding that of the general population [6]. DM management is challenging because of the heterogeneity in individual patient responses, which vary due to factors such as illness severity, sociodemographic characteristics, and specific clinical factors (e.g., glycated hemoglobin (HbA1c), insulin sensitivity, body composition, and duration of disease) [7]. DM is much more complex than the classification into Type 1 and Type 2 suggests. Recently, Ahlqvist and colleagues, using K-means cluster analysis (CA), proposed a novel classification of adult-onset DM into five subgroups: Severe Autoimmune Diabetes (SAID); Severe Insulin-Deficient Diabetes (SIDD); Severe Insulin-Resistant Diabetes (SIRD); Mild Obesity-Related Diabetes (MOD); and Mild Age-Related Diabetes (MARD) [7]. This classification is based on six measures that are commonly collected in clinical practice: body mass index (BMI); age at DM diagnosis; HbA1c; β-cell function; insulin resistance; and the presence of DM-related autoantibodies. The five subgroups differ in their patterns of progression and risk of complications. Currently, there is rising interest in identifying more homogeneous groups of DM patients so that therapeutic plans can be applied in a more targeted manner. New analytic techniques, namely unsupervised learning methods such as CA, have been used in a variety of settings, with various sources of information and different types of variables, to propose subtypes of DM patients.
The objective of this work is to systematically review the scientific literature to identify publications that have applied CA to generate homogeneous groups of DM patients, describe the main features of the analytic techniques that have been applied, as well as the variables included to propose DM subgroups.

Search Strategy and Selection Criteria
We systematically searched Medline Complete (from 1978 until August 2020) and PubMed (1974 until August 2020) databases on 7 August 2020 following PRISMA guidelines. Additionally, the reference lists of the selected articles from the above-mentioned databases were hand-searched.
We searched the databases for studies published in the area of unsupervised CA of DM patients. A search strategy applying Medical Subject Headings (MeSH) was used in the Medline Complete database with the following keywords: "Diabetes Mellitus" or "Diabetes Mellitus, Type 2" or "Diabetes Mellitus, Type 1" or "Diabetes" AND "Cluster analysis" or "Cluster". In the PubMed database, papers were searched using the "Diabetes" and "Cluster" keywords. The results were limited to articles in the English language with humans as the research subject. All database-specific technical variations were taken into account during the search.

Methods of the Review
The selection process was performed by two independent researchers. Search results from the two databases were combined and duplicates removed, after which all unique results were screened based on title and abstract. In the next stage, the full texts of potentially suitable articles were obtained and assessed against the eligibility criteria: (1) the study population consisted of diabetic patients (Type 1 and/or Type 2 DM); (2) clusters were identified through one of the unsupervised clustering algorithms; (3) clustering was based on the patients' clinical data. Studies with specific aims were excluded to provide comparability within clusters.

Data Extraction
Two authors extracted information from the selected articles into tables prepared a priori, with the following columns: study design, source of the data taken for exploration, size and characteristics of the targeted population, diagnostic criteria of DM, variables chosen for cluster analysis, the number of clusters and their characteristics, data standardization, chosen clustering algorithm, methods for determining the number of clusters, and validation of clusters on an independent sample (see Appendices A and B).

Results
The search identified 6319 publications from two databases. After removing duplicates and screening the papers, 75 full-text articles were reviewed and 65 were excluded for the following reasons: 6 were review articles, 9 papers focused on exploring clusters of diabetic patients with specific comorbidities at baseline, 32 studies pursued other aims than finding subgroups of DM, 7 studies used other methodologies than unsupervised learning techniques, 9 studies conducted a similar analysis but with other specific aims (clustering of genetic data etc.), and 2 studies were conducted on mice. As a result, 14 papers were found to be eligible: 10 articles were included in the review [7][8][9][10][11][12][13][14][15][16] and an additional 4 eligible papers were found after hand-searching of the reference lists of selected articles [17][18][19][20]. The selection process is presented in Figure 1.
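As a quick consistency check on the selection counts reported above, the arithmetic can be sketched as follows. The dictionary keys are shortened paraphrases of the exclusion reasons, and the variable names are illustrative only.

```python
# Exclusion reasons and counts as reported for the 75 full-text articles.
exclusions = {
    "review articles": 6,
    "specific comorbidities at baseline": 9,
    "other aims than finding DM subgroups": 32,
    "methodologies other than unsupervised learning": 7,
    "similar analysis with other specific aims": 9,
    "conducted on mice": 2,
}

full_text_reviewed = 75
excluded = sum(exclusions.values())              # total exclusions
from_databases = full_text_reviewed - excluded   # eligible after full-text review
from_reference_lists = 4                         # found by hand-searching
total_included = from_databases + from_reference_lists

print(excluded, from_databases, total_included)
```

The totals agree with the text: 65 exclusions, 10 articles from the database search, and 14 included papers overall.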

Sample Characteristics
The sample size ranged between 33 and 85,783 participants across studies, constituting a total of 130,353 diabetic patients: 33 type 1 diabetes (T1DM) patients, 238 latent autoimmune diabetes (LADA) patients and 130,082 type 2 DM (T2DM) patients. The largest sample was in the study of Karpati et al. from Israel, comprising 85,783 patients, of whom 60,423 were considered eligible for cluster analysis [15]. The second largest was the study of Kahkoska et al., with 20,274 DM patients [14], followed by 8980 individuals from the ANDIS cohort in the study of Ahlqvist et al. [7,20]. The study with the smallest sample, 33 T1DM patients from several university hospitals, was conducted in the UK [11].
The variability in population size could be explained by the source of the data, as data were taken from electronic medical records, healthcare databases, previously conducted longitudinal observational studies and surveys. Disease duration among the target populations of the reviewed publications, which also included newly diagnosed diabetic patients, ranged from 40 days after diagnosis to 12 years or longer [14,20]. The age of the participants varied depending on the type of DM: T1DM patients were 5-16 years old, LADA patients were 35 years and older, and T2DM patients were between 18 and 96 years old. Different criteria were used for the diagnosis of DM in the studies: American Diabetes Association criteria [9], 1999 World Health Organization criteria [17], and International Diabetes Federation diagnostic guidelines [12]. When data were extracted from health records or healthcare databases, diagnosis was based on specific ICD-10 codes for DM or on antidiabetic medications [7,8,14,15,16,19,20]. Some studies used other diagnostic methods based on biochemical indicators (fasting plasma glucose/HbA1c levels/blood test for autoimmune responses) with or without restrictions on the duration of treatment [10,11,18], while one study used self-reported DM cases [13].
Hammer and colleagues clustered participants with DM according to self-reported symptoms, including upper GI/dysmotility, diarrhea, constipation, and nausea/vomiting [13].
Karpati and colleagues focused on clustering based on HbA1c levels: changes in HbA1c levels during the 3-year period, the mean of the absolute first differences in HbA1c, and the ratio of the maximum absolute second difference to the mean absolute first difference of HbA1c were measured and included in the CA [15].
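The trajectory features described for the HbA1c-based clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the example values, and the use of "last minus first measurement" as the overall change are assumptions made here for illustration.

```python
from statistics import mean

def hba1c_trajectory_features(values):
    """Summarise a time-ordered series of HbA1c measurements (in %)."""
    if len(values) < 3:
        raise ValueError("need at least three measurements for second differences")
    # First differences: change between consecutive measurements.
    first = [b - a for a, b in zip(values, values[1:])]
    # Second differences: change between consecutive first differences.
    second = [b - a for a, b in zip(first, first[1:])]
    mean_abs_first = mean(abs(d) for d in first)
    max_abs_second = max(abs(d) for d in second)
    # Guard against a perfectly flat series (all first differences zero).
    ratio = max_abs_second / mean_abs_first if mean_abs_first > 0 else 0.0
    return {
        "overall_change": values[-1] - values[0],  # assumption: last minus first
        "mean_abs_first_diff": mean_abs_first,
        "max_abs_second_to_mean_abs_first": ratio,
    }

# Hypothetical patient with four measurements over the observation period.
features = hba1c_trajectory_features([7.0, 7.5, 7.2, 8.0])
print(features)
```

The mean absolute first difference captures how much HbA1c moves between visits, while the ratio flags trajectories with one abrupt change relative to their usual variability.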
Li and colleagues included the largest number of variables for CA among the reviewed studies (73 variables) [19].
Methods for determining the number of clusters varied from one study to another. Seven papers used the direct silhouette width method [7,8,9,14,17,18,20], one paper used a fixed number of clusters [10], and one study determined the number of clusters based on hierarchical clustering with Ward's method [11].
In addition, two publications determined the number of clusters based on principal component analysis (PCA) [12,13]; one publication used the "NbClust" algorithm, which selected an optimal method for determining the number of clusters [15]; one study plotted the within-cluster sum of squares against the number of clusters [16]; and one study relied on a cosine distance metric [19].
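Two of the criteria named above, the silhouette width and the within-cluster sum of squares, can be sketched in a few lines. This is an illustrative toy, not the procedure of any reviewed study: the points, the candidate partitions, and the function names are assumptions, and Euclidean distance is assumed throughout.

```python
from math import dist

def _group(points, labels):
    """Collect points by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters

def mean_silhouette(points, labels):
    """Average silhouette width of a partition (closer to 1 is better)."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)

def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total

# Toy data: two visibly separated groups, partitioned into 2 or 3 clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
scores = {k: mean_silhouette(points, labels) for k, labels in candidates.items()}
wss = {k: within_cluster_ss(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, wss, best_k)
```

The within-cluster sum of squares keeps falling as clusters are added, which is why it is read off a plot for an "elbow", whereas the mean silhouette width peaks at the best-separated partition and can be maximised directly.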

Cluster Validation on an Independent Sample
Only five studies validated the results of CA: four on an independent sample [7,18,19,20], while Karpati et al. split the database into training and test datasets to replicate their findings [15].
The five main clusters identified across studies shared similar phenotypic characteristics. All of the patients in the first cluster, SAID, were GADA-positive, were younger than the members of other clusters, and had low BMI and insulin deficiency characterized by low HOMA2-β and higher HbA1c levels. The patients with DM in the SIDD cluster had the same characteristics but were GADA-negative. Participants in the SIRD cluster, in contrast, had high BMI and whole-body and/or adipose-tissue insulin resistance characterized by high HOMA-IR, and were at a relatively younger age. Individuals in the MOD cluster were slightly younger and had obesity and moderate insulin resistance compared to the SIRD cluster. The MARD cluster comprised the oldest diabetic patients, with moderate metabolic dysregulation.
Authors of the reviewed papers identified several complications associated with each cluster, which were also observed in the replication studies. The major conditions were diabetic or chronic kidney diseases (DKD, CKD), liver diseases (non-alcoholic fatty liver disease (NAFLD) or hepatic fibrosis), retinopathy, polyneuropathies, and cardiovascular diseases (CVDs). In the studies of Zaharia et al. and Ahlqvist et al. the SIRD cluster, and in the study of Tanabe et al. both the SIRD and SAID clusters, were associated with a higher risk of CKD and DKD [7,8,9,20]. The cluster with the presence of metabolic syndrome in the study conducted by Safai et al. showed the same association with nephropathies [16]. However, Dennis et al. did not find an increased risk of CKD complications among clusters after adjustment for baseline estimated glomerular filtration rate (eGFR) [18]. SAID and SIDD in the study of Tanabe et al., but only the SIDD cluster in the studies of Ahlqvist et al., were associated with an increased risk of retinopathy [7,8,20]. In addition, the non-autoimmune β-cell failure cluster in the study of Safai et al., which is similar to SIDD, demonstrated the same association with retinopathy [16]. Liver diseases such as NAFLD and hepatic fibrosis were found to be associated with the SIRD cluster in the study of Zaharia et al. and in both studies of Ahlqvist et al. [7,9,20]. At the same time, neuropathies, which were identified among SIDD individuals in the Zaharia et al. study, were not associated with any cluster after adjustment for disease duration or age at onset in the study of Safai et al. [9,16]. In the study of Kahkoska et al., unadjusted analysis showed that CVDs were associated with the SIDD cluster, which is characterized by low BMI and insulin deficiency [14]. However, CVDs did not differ among clusters after adjustment for known modifiable and non-modifiable risk factors in the studies of Safai et al. and Tanabe et al. [8,16].
Amato et al. phenotyped diabetic patients based on fasting incretin levels into two independent clusters: cluster 1 (65.6%) with lower incretin levels and cluster 2 (34.4%) with higher incretin levels [10]. Cluster 1 had lower glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic polypeptide (GIP) levels and, consequently, higher levels of HbA1c and fasting plasma glucose (FPG) than cluster 2, which was explained by possibly increased α-cell activity and its effect on the reduction in β-cell function. However, there were no differences in the clinical-anthropometric characteristics between clusters.
Based on data from electronic medical records, Li et al. clustered T2DM patients by applying TBA and arrived at three different subtypes with inherent clinical characteristics and comorbidities [19]. Individuals in subtype 1 had higher weight and serum glucose levels and were associated with diabetic nephropathy and retinopathies; patients in subtype 2 had lower weight and were associated with cancer malignancy and CVDs; and subtype 3 was characterized by neurological diseases, allergies, HIV and CVDs.
Karpati et al. found ascending (14.4%, mean HbA1c 8.7% (1.9)), descending (10.0%, mean HbA1c 7.8% (1.8)) and stable (75.6%, mean HbA1c 7.1% (1.2)) subtypes of T2DM patients with a disease duration of 3-7 years, based on the trajectories of their HbA1c levels and their five-year risk of complications [15]. Diabetic patients in the ascending cluster were the youngest compared to the members of other clusters and were taking mostly non-insulin medications, while insulin medications were often prescribed to patients in the descending cluster. However, micro- and macrovascular complications were prevalent in both the ascending and descending clusters. The mortality rate was higher in the descending cluster.
Hammer et al., based on the gastro-intestinal symptoms of T2DM patients, found four clusters: Upper GI/Dysmotility (44.8% of the total variance), Diarrhea (10.4% of the total variance), Constipation (7.8% of the total variance), and Nausea/Vomiting (6.3% of the total variance) [13]. Analysis in this study showed that oral medications taken by diabetic patients were associated with the Nausea/Vomiting cluster. After adjustment for the type of treatment (insulin or oral medication), gender, and age, members of the Upper GI/Dysmotility cluster were strongly linked with the use of insulin in conjunction with hypoglycemic medication, while Nausea/Vomiting cluster members had a strong relationship with the intake of insulin, oral hypoglycemic medication, and the combination of both. The Diarrhea and Constipation clusters did not show any significant linkages.
Arif et al. found two clusters of T1DM patients by assessing different autoimmunity parameters of CD4 T-cell and B-lymphocyte responses [11]. Thus, T1DM patients in the later stages are differentiated with (AAb++ and IFN-γ, IL-10) and (AAb± and IFN-γ, IL-10), as well as other non-diabetic individuals with high AAbs who had an increased risk for T1DM development. Overall, cluster 1 was dominated by IL-10 responses to GAD, insulin, and proinsulin compared to cluster 2.
Pes et al. found four different clusters of LADA patients. Each cluster had a special set of important characteristics extracted based on the PCA. One of the main findings related to disease progression was the association of β-cell function with the four clusters (PCs) [12]. The fastest β-cell failure was observed among members of PC 2, which was characterized by genetic profile, while milder and slower loss of β-cell activity was seen in PC 1; gender and triglycerides (TGs) predominated in PC 3 and cholesterol in PC 4.

Discussion
The main finding of this systematic review is that data-driven algorithms reveal greater heterogeneity in DM subtypes than the classical division into T1DM and T2DM, or a division based solely on glycemic or HbA1c levels, may reflect. Another finding is that a significant number of studies, with data from diverse patient origins, obtained the same five clusters of DM patients, which shared similar physiological and clinical characteristics across studies and were associated for the most part with analogous comorbidities, although the clusters differed in prevalence and in the frequency of the variables included in each of them. However, there were also six papers that provided clusters of DM patients based on different types of variables, which were also shown to be appropriate in terms of statistical significance and clinical meaning. Another relevant finding is that there is significant variability in the specific analytic techniques used to generate those clusters of DM patients.
Overall, those findings confirm that the process of using clustering techniques, although not exempt from certain limitations, may be applied for monitoring the progression and control of patients with DM, but there is still uncertainty on the variables that should be used for generating subtypes of patients, as well as for what is the most appropriate clustering method.
As for the studies that proposed the same five clusters, the proportions of individuals in each cluster varied from one study to another. Several factors may explain those differing distributions. First, the source of the data used for CA varied with availability, which may explain some of the variation in sample size as well as in the type of diabetic patients included in the analysis. For instance, the extreme proportion of SAID patients in the study of Zaharia et al. could be explained by active recruitment of T1DM patients, while studies that utilized data from other cohorts showed consistent results [9]. Second, some cohorts used for CA focused specifically on DM patients at onset [7,9,10,11,17,20], while others recruited patients with a longer [8,13,14,15,16,18] or not defined [12,19] [7,14,17]. This is an important aspect, since the clustering results of the Japanese population showed that Asian diabetic patients, due to their inherently lower β-cell activity and insulin secretion, showed a higher proportion of the SIRD cluster with a comparatively lower BMI than in the studies of western cohorts, meaning there is a potentially earlier onset of DM in their population [7,8,14,21].
Overall, the five main clusters were reproducible in studies that used databases from cross-sectional, longitudinal observational and trial studies. All the aforementioned papers that revealed meaningful cluster-specific complications used data from longitudinal observational studies. The cross-sectional study of Zou et al. and that of Kahkoska et al., which selected patients with a high baseline risk of CVDs and long-lasting DM, were not able to estimate risks of complications [14,17], while Dennis et al., whose study had protocol-driven follow-up, were able to identify several complications while adjusting for different treatments [18].
The range of associated comorbidities is not limited to the aforementioned conditions. There might be other complications of diabetic patients that would eventually need to be considered in further clustering studies. Li et al. observed a wider range of associated comorbidities by applying TBA [19]. Adjustment for known modifiable and non-modifiable risk factors is also suggested to determine their true effect, as some studies showed no association with CVDs after adjustment, indicating the importance of maintaining a healthy lifestyle to reduce the risk of complications [8,16].
Another relevant issue still in need of further investigation is the optimal number of variables that balances the validity and economic efficiency of clustering diabetic patients: Kahkoska et al., using only three variables (age, BMI, and HbA1c), obtained four clusters with characteristics very similar to the original clusters proposed by Ahlqvist et al. with six variables.
Other studies which found clusters of diabetic patients with different GI symptoms [13], fasting incretin tone [10], trajectories of HbA1c levels [15], clusters among T1DM [11] and LADA [12] patients, as well as clusters identified through novel TBA [19], were unique and not replicated and therefore should be considered as a call for future research initiatives.
However, the study of Karpati et al., with a sufficient sample size of 60,423 patients, reported interesting findings from clustering based on HbA1c levels: the ascending cluster had complications only at extremely high levels, which could suggest other risk factors in this group, while the highest risk of complications among DM patients was found in the stable cluster with HbA1c < 6.0%, which contradicts the guideline recommendations and is consistent with a J-shaped risk curve [15,22].
Moreover, the only study clustering T1DM patients identified patients with different immunological responses, which could have implications for clinical practice through the tailoring of immune-based therapies, and raises the question of whether the different phenotypes observed reflect different immunological pathways of the disease.
Overall, results of all studies indicated the need to pay attention to symptoms and clinical characteristics of the diabetic patients, which previously were underestimated and may have an impact on their disease progression, as well as on the need to incorporate the wealth of information of unstructured data from the free text of patient records [23]. Genetic information is another critical domain that will be necessary to explore in order to identify subgroups of DM patients [24].
The reviewed studies applied different methodological approaches to CA. Each step before and during CA may affect the clustering output in different ways. First, the reproducibility of unsupervised learning techniques is critical; therefore, validation in different datasets is required to ensure the robustness of the results. Second, the type of data (observational/longitudinal) is also critical in cluster analysis, because it determines whether temporal patterns of disease progression can be observed, as cluster analysis does not explain the aetiology of the disease. Third, the number of clusters depends on the specific methodology applied, and the proportions of the population in each cluster can vary with the chosen sample size and the presence or absence of scaling of the dataset (preprocessing) [25].
Among the issues related to methods for determining the number of clusters, one study has chosen to limit the number of clusters to two [10]. Manually limiting the number of clusters could lead to error as there might be more clusters within the data.
Regarding the methods for clustering, seven out of fourteen studies performed k-means clustering. Several studies relied solely on k-means, while others performed it only to confirm the results of hierarchical clustering or to cluster only GADA-negative individuals. In k-means, the presence of outliers can distort the clustering results [26]. Among the seven studies, only two reported excluding outliers prior to clustering [18,20]. K-means is sensitive to its starting points and can end in a local optimum, so the clustering needs to be run multiple times to obtain optimal results. A local optimum is characterized by poorer-quality clusters, which might affect the number of clusters [27]. None of the studies reported steps taken to reduce the risk of local optima. The next most widespread method after k-means was hierarchical clustering. The choices of distance metric and linkage criterion varied among the six studies that performed hierarchical clustering. Those choices could affect the clustering result as, currently, there is no strong theoretical justification for such decisions. Another issue with hierarchical clustering is the treatment of missing values, since most software cannot run the analysis when values are missing. Four studies did not report the presence or absence of missing data [7,9,10,14]. The third most widespread method was PCA [12,16]. Pes and colleagues did not report standardizing the data prior to PCA, which is essential for PCA to find the optimal principal components [12]. The last method to discuss is TBA. Li and colleagues performed TBA, which is quite new in machine learning and has a strong theoretical basis [19].
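The local-optimum issue with k-means is commonly handled by restarting the algorithm from several random initialisations and keeping the run with the lowest within-cluster sum of squares. The sketch below illustrates that idea with a plain Lloyd-style k-means; the function names, the toy data, and the defaults (`n_init=10`, `seed=0`) are assumptions for illustration and do not reproduce any reviewed study.

```python
import random
from math import dist

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (labels, centroids, inertia)."""
    centroids = rng.sample(points, k)  # k distinct observations as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

def kmeans_best_of(points, k, n_init=10, seed=0):
    """Repeat k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])

# Toy data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, inertia = kmeans_best_of(points, k=2)
print(labels, inertia)
```

Reporting the number of restarts and the random seed alongside the chosen solution is one way a study can document that local optima were addressed.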
The next aspect to discuss is the validation of clustering results. Nine studies did not report validating their clustering results [8,9,10,11,12,13,14,16,17]. Validation of the results, either by external validation on an independent sample or by cross-validation within a dataset, is vital for obtaining information on the quality of the CA performed [28].
The data standardization process is also an important step to enable comparison of variables that could have units at different scales. Without standardization, variables with different scales would unequally contribute to the results of analysis [29]. Only seven studies out of fourteen have reported standardizing the data prior to CA.
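The effect of standardization can be sketched with column-wise z-scores. This is a minimal illustration under stated assumptions: the three patient rows (age in years, BMI in kg/m², HbA1c in mmol/mol) are hypothetical values, the function name is illustrative, and the population standard deviation is used, whereas a given study may have used the sample standard deviation or another scaling.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Column-wise z-scores: subtract the column mean, divide by the column SD."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population SD
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical patients: (age in years, BMI in kg/m2, HbA1c in mmol/mol).
patients = [(45, 24.0, 90.0), (70, 31.0, 50.0), (58, 27.5, 62.0)]
scaled = standardize(patients)
for row in scaled:
    print(tuple(round(x, 2) for x in row))
```

After scaling, every variable has mean 0 and standard deviation 1, so a distance-based method such as k-means no longer weights a variable more heavily simply because it is measured on a larger numeric scale.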
Some common limitations among the included studies were: the lack of some variables in their data that would affect the clustering results [7,12,16,17,18,20]; small or relatively small sample sizes for clustering [8,10,11,19]; issues that may affect the generalizability of the results [8,9,14]; and a relatively short follow-up of participants [11][12][13][14][15]. Last but not least, Hammer et al. reported grouping all oral medications into one group, while some drugs, such as metformin, could have significantly different effects on controlling high blood sugar than other drugs [13]. This might have affected the clustering results.

Conclusions
This systematic review has explored the research publications that utilized clustering algorithms to identify non-classic heterogeneity in DM. DM is a complex condition, and cluster analysis appears to be an effective method for finding clinically meaningful subgroups. Identifying homogeneous subgroups of patients by their potential disease progression at onset, based on routinely collected measurements, could help target therapeutic and preventive measures at the patients who will benefit the most. There is a significant number of effective therapeutic alternatives for treating DM, including insulin and oral medications, the latter having quite diverse mechanisms of action. It will be necessary to identify which subgroups of patients with DM benefit most from the available therapies and to advance towards more targeted treatments. Nevertheless, some methodological aspects must still be clarified, as must the metabolic pathways that may be affected in each subgroup of patients. There is also a need for studies that explore and validate the capabilities of CA in more diverse and wider populations, combining variables that have already shown statistical and clinical relevance to generate homogeneous groups of DM patients.

3. Zaharia et al.
Cluster 1 SAID (N = 247): GADA positive, were more likely to be of a younger age, had relatively low BMI, poor glycemic control and overt insulin deficiency; 158 (67.0%) received insulin on diagnosis.
Cluster 2 SIDD (N = 28): showed similarities with patients with SAID, but GADA negative; had the highest prevalence of confirmed diabetic sensorimotor polyneuropathy and cardiac autonomic neuropathy; 12 (44.0%) were treated with insulin on diagnosis.
Cluster 3 SIRD (N = 121): had high BMI and whole-body adipose-tissue insulin resistance, had the highest sensitivity for C-reactive protein, high hepatocellular lipid content and fatty liver index, low eGFR levels.
Cluster 4 MOD (N = 323): had obesity and substantial adipose tissue insulin resistance, high sensitivity for C-reactive protein, but they had moderate whole-body insulin resistance.
Cluster 5 MARD (N = 386): older than those in other clusters and showed only minor metabolic abnormalities.

Health records about known type 2 diabetes for <6 months and in stable treatment for the last 3 months with metformin.
Cluster 1 (n = 63): significantly lower levels of GLP-1, GIP and ghrelin compared to cluster 2 (n = 33), and higher levels of HbA1c and fasting plasma glucose.
Regarding the clinical and anamnestic characteristics of the patients, there were not any significant differences between the two clusters, except for a greater prevalence of patients practicing physical activity in Cluster 2.

Arif et al. (2014) [11]. UK. Cross-sectional study.
Several university and regional hospitals in the UK took part in the research. N = 33. Children with newly diagnosed type 1 diabetes (5-16 years), unaffected siblings of patients with type 1 diabetes (6-16 years).
Test of blood autoimmune response phenotypes by combinatorial, multiparameter analysis of autoantibodies and autoreactive T-cell responses.
For Autoimmune Inflammatory Phenotypes in Children With Newly Diagnosed Type 1 Diabetes group: 1. interferon-γ; 2. interleukin 10 (IL-10); 3. antigen-specific autoantibodies (AAbs); 4. proinsulin; 5. insulin; 6. islet antigen antibodies (IA-2Ab); 7. GAD65 antibody; 8. zinc transporter 8 antibody.
Cluster 1 (n = 15): a combination of islet AAbs and IFN-γ responses to all antigens. Have a significantly higher frequency of IL-10 response to GAD, insulin, proinsulin. There are also differences in the frequency of islet AAbs between clusters. AAbs against IA-2 and ZnT8 are significantly less frequent in the IL-10-dominated cluster 1. Two children had no islet AAbs present at diagnosis, five had only a single AAb, and eight had two or more AAbs.
Cluster 2 (n = 18): The frequency of multiple AAbs was significantly higher, all 18 children had two or more IL-10 responses to all antigens.

1. Autoimmune β-cell failure cluster (n = 65), characterized by patients with a positive GAD65 autoantibody titer. They also had the lowest TG level.
2. Insulin resistance with short disease duration cluster (n = 490), characterized by patients being diagnosed with type 2 diabetes relatively recently and having the highest HOMA2-β.
3. Non-autoimmune β-cell failure cluster (n = 510), patients in sub-group 3 were the youngest at diabetes diagnosis but otherwise resembled sub-group 1 apart from the lack of positive GAD65 autoantibody titer. Increased risk for retinopathy.
4. Insulin resistance with long disease duration cluster (n = 727). Cluster 4 and 2 were very alike with a high age at diagnosis, similar BMI, better glycemic regulation, a relatively preserved β-cell function and at the same time a relatively high HOMA2-IR. The most important variable separating these two subgroups was the duration of diabetes.
5. Presence of metabolic syndrome cluster (n = 498), characterized by having the highest BMI compared to the other groups. It also consisted of those with the highest fasting glucose, HbA1c, C-peptide, HOMA2-IR and TG level.

only cluster 4 (MOD) in RECORD trial. After adjustment for baseline UACR, time to albuminuria was shorter for cluster 3 (SIRD) vs. cluster 2 (SIDD) in ADOPT, but not RECORD.

13. Li et al.
Patients in subtype 1 (762) were the youngest (59.76 ± 0.45 years) and were notable for features classically associated with T2DM, such as the highest BMI (33.07 ± 0.29 kg/m²) and highest serum glucose concentrations at point-of-care testing (POCT) (193.69 ± 11.45 mM). Although these patients had better kidney function compared to those in the other two subtypes, they were characterized by T2DM complications such as diabetic nephropathy and diabetic retinopathy, and by the ACE gene.
Patients in subtype 2 (617) had the lowest weight (85.17 ± 1.14 kg) compared with those in the other subtypes. Subtype 2 was enriched for cancer malignancy and cardiovascular diseases.
Patients in subtype 3 (1096) had the highest SBP (135.7 ± 0.7 mmHg), serum chloride levels (102.03 ± 0.11 mEq/liter), and troponin I levels (0.36 ± 0.09 mg/liter) and were more often prescribed ARB/ACEI (62.96%) for the treatment of hypertension and statins (56.0%) for cholesterol reduction. They were associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections, and with the FHIT gene.

Cluster 1 (SAID, 6.4%): was characterized by early onset, relatively low BMI, poor metabolic control, insulin deficiency, and presence of GADA, frequent ketoacidosis (30.5%).
Cluster 2 (SIDD, 17.5%): was GADA negative but otherwise similar to SAID, frequent ketoacidosis (25.1%) and early signs of diabetic retinopathy.
Cluster 3 (SIRD, 15.3%): was characterized by insulin resistance (high HOMA2-IR) and high BMI, had the highest prevalence of non-alcoholic fatty liver disease and high risk for CKDs.
Cluster 4 (MOD, 21.6%): was characterized by obesity but not by insulin resistance.
Cluster 5 (MARD, 39.1%): were older, but showed, as cluster 4, only modest metabolic derangements.