Deep Resequencing of 9 Candidate Genes Identifies a Role for ARAP1 and IGF2BP2 in Modulating Insulin Secretion Adjusted for Insulin Resistance in Obese Southern Europeans

Type 2 diabetes is characterized by impairment in insulin secretion, with an established genetic contribution. We aimed to evaluate common and low-frequency (1–5%) variants in nine genes strongly associated with insulin secretion by targeted sequencing in subjects selected from the extremes of insulin release measured by the disposition index. Collapsing data by gene and/or function, the association between disposition index and nonsense variants were significant, also after adjustment for confounding factors (OR = 0.25, 95% CI = 0.11–0.59, p = 0.001). Evaluating variants individually, three novel variants in ARAP1, IGF2BP2 and GCK, out of eight reaching significance singularly, remained associated after adjustment. Constructing a genetic risk model combining the effects of the three variants, only carriers of the ARAP1 and IGF2BP2 variants were significantly associated with a reduced probability to be in the lower, worst, extreme of insulin secretion (OR = 0.223, 95% CI = 0.105–0.473, p < 0.001). Observing a high number of normal glucose tolerance between carriers, a regression posthoc analysis was performed. Carriers of genetic risk model variants had higher probability to be normoglycemic, also after adjustment (OR = 2.411, 95% CI = 1.136–5.116, p = 0.022). Thus, in our southern European cohort, nonsense variants in all nine candidate genes showed association with better insulin secretion adjusted for insulin resistance, and we established the role of ARAP1 and IGF2BP2 in modulating insulin secretion.


Introduction
Type 2 diabetes (T2D) is a complex disease affecting the world's population at epidemic rates and whose pathophysiology remains elusive. T2D is projected to affect up to 700 million people worldwide by 2045, with a 51% global increment [1]. The burden of diabetes will affect mostly developing countries, as diabetes is strongly associated with urbanization and increased wealth. T2D carries an increased risk of developing a wide range of macrovascular (cardiovascular, cerebrovascular and peripheral artery diseases) and microvascular complications (retinopathy, neuropathy and nephropathy) [2], with large differences in prevalence, severity and comorbidities across global populations. T2D is characterized by an inadequate β-cell response to the progressive insulin resistance that accompanies advancing age, inactivity and weight gain [3]. Impaired insulin secretion is considered one of the first defects leading to impaired glucose metabolism and the development of T2D. A genetic contribution is well recognized in diverse forms of both early-onset and adult-onset diabetes [4][5][6][7]. Aside from genetic determinants, several other factors are involved in the pathogenesis of diabetes, obesity and diabesity (co-occurrence of obesity and diabetes). For example, studies pointing to the involvement of oxidative stress show that it is significantly higher in obese vs. nonobese subjects. It is positively correlates with worse clinical measurements (e.g., BMI, waist, fasting plasma glucose, total cholesterol, etc.). Furthermore, reactive oxygen species (ROS) levels are higher in obese diabetic vs. nondiabetic obese subjects. On the other hand, the measurement of antioxidant levels negatively correlates with BMI and total cholesterol [8]. Additionally, there are well-established prenatal and risk factors that influence metabolic impairment onset and development, such as those reported for gestational diabetes mellitus [9]. Glucose metabolism and T2D are well-established multifactorial processes. To deconstruct the heterogeneity of T2D, cluster analyses using serum biomarkers and clinical data of individuals have been performed to identify T2D subgroups [10,11]. Cluster analyses have been used to change the current paradigm of classifying patients with diabetes mellitus. This analysis comprises an unbiased cluster allocation using common variables such as autoimmunity, age at diagnosis, BMI, glycemic control and homeostasis model estimates of β-cell function and insulin resistance. The final scope is a refined classification that could provide a powerful tool to individualize treatments and identify individuals with increased risk of complications at diagnosis [10,11]. These studies suggest newer routes for future research, but there are also limitations given the nature of the variables included in the analyses, which are bound to change over time. Contrariwise to serum biomarkers, germline genetic variants associated with T2D remain constant, regardless of age, disease state or treatment, and are more likely to identify T2D causal mechanisms. In the last decade, large-scale genomic studies, including genome-wide association studies (GWAS), have identified over 400 common variants in more than 100 loci that confer disease susceptibility [12,13]. Despite the great number of loci linked, there is an extensive gap between the discovery of many T2D-associated single-nucleotide polymorphisms (SNPs) and the understanding of their physiological impact on T2D pathogenesis or their clinical use as risk factors. Furthermore, a consistent discrepancy between observed heritability and recognized genetic background in complex disease is well established, often reported as "missing heritability" [14,15]. Identification of genetic factors and genes that underlie T2D could shed light on diabetes molecular background and inform clinical management strategies, including patient stratification, personalized medicine or optimization of study design of randomized controlled trials.
The genes identified so far are mainly associated with pancreatic β-cells maturation or function [16], but all the mechanisms affecting β-cell function have not yet been fully understood.
Common variants associated with complex disease by GWAS cause modest increases in disease risk, with odds ratios generally <1.2 [17][18][19][20][21]. They may indicate a gene or a locus strongly involved in that disease, as most GWA studies use technologies that allow investigating only known or common mutations. As suggested by Rivas et al., targeted resequencing may help to discover other new variants, especially rare or low frequencies, harbored in genes that may exert extra influence on the trait or disease [22,23]. Furthermore, it has been reported several times that multiple rare variants can have a stronger effect on complex traits than individual common variants [24,25].
It is therefore clear that there is an increasing need to analyze in-depth candidate genes by direct sequencing [25][26][27]. This strategic approach is now made possible thanks to next-generation sequencing (NGS) technologies. Moreover, a new powerful strategy was proposed to increase the chance to highlight causal variants involved in modulating a phenotype in association studies. It consists of sequencing samples from the extremes of a trait [28]. Intuitively, such samples should be enriched for the burden of alleles influencing a trait, thus improving statistical power to discover risk/protective variants and association to the trait [24,29,30]. Additionally, it allows for more homogeneity into cohorts, determining a greater power to detect association and helping to identify markers with higher ORs [31].
Given the fact that β-cell function and glucose-stimulated insulin secretion appear to be the traits most strongly associated with T2D pathogenesis, and that insulin resistance is accompanied by early compensatory upregulation of insulin secretion, the best method for measuring β-cell function is evaluating insulin secretion adjusted for insulin sensitivity. This ratio, called the disposition index (DI), assumes that at the same glucose tolerance this index remains constant. DI is calculated by insulin secretion/insulin resistance (∆I/∆G ÷ IR), with several formulas being proposed depending on available measurements [32,33]. The loss of function of β-cells, which reduces their capacity to raise insulin secretion to compensate for insulin resistance, results in a lower DI [34][35][36]. Thus, DI can be considered a measure of the functionality of the pancreas and can predict the normal β-cell response adequate for any degree of insulin resistance. In diabetes, β-cells are unable to respond adequately to insulin resistance, thus determining the appearance of impaired glucose regulation and altering the disposition index. DI avoids using gold-standard techniques (i.e., euglycemic clamp), which is difficult to apply on a wide scale [34][35][36].
Thus, we aimed to evaluate common and low-frequency (1-5%) variants in genes associated with insulin secretion by targeted resequencing in subjects selected from the extremes of insulin release measured by DI. We selected the nine genes most associated in the literature to β-cell's insulin secretion who reach genome-wide significant association with T2D from published GWAS (p ≤ 5 × 10 −8 ). Genes selected are ADAMTS9, ADCY5, CDAKL1, IGF2BP2, JAZF1, GCK, NAT2, KCNQ1 and ARAP1. These genes are all involved in the process of insulin secretion, including synthesis, trafficking, rate, localization and vesicles formation. Published odds ratios (ORs) along with 95% confidence intervals (CIs) and references for selected genes are shown in Supplementary Table S1 [37,38].
To our knowledge, no studies involving NGS of such candidate genes have been performed so far in obesity and diabetes associated with obesity, and no deep resequencing has been implemented on these selected genes.

Study Samples and Quintiles of Disposition Index
The clinical features of all 757 participants presented as 20% versus 80% (1st vs. 5th quintile) of the disposition index are shown in Table 1. Table 1 also shows p-values calculated between the extremes. Sex distribution was similar between the two quintiles: in the 20% DI (n = 377), 18 (31.3%) were males, and 259 (68.7%) were females; in the 80% DI (n = 380), 94 (24.8%) were males, and 285 (75.2%) were females.
Our cohort shows, as expected, a high prevalence of obesity, as the median BMI (Kg/m 2 ) in the two extremes is 39 and 40, respectively. Additionally, the lowest and highest quintiles of DI differ significantly for most of the lipid measurements, with the lower insulin secretion (lower DI) showing the worst metabolic profile (e.g., higher LDL, lower HDL, higher TGs).

Number and Type of Variants Observed in Study Subjects
Quality assessment for sequencing data resulted in a QScore ≥ 30 for 72.7% of bases, QScore ≥ 20 for 81.7% of bases, and QScore ≥ 12 reads for 100% of bases. The fraction of the captured targeted regions (29 685 bp) covered by at least one read was 98.2%. After the sequencing runs, acquisition of data and variant calling, we observed 5636 variants in the raw dataset. Filter setup is explained in detail in the Materials and Methods section (Section 4). The first filtering retained 2751 variants in the whole sample, 1876 of which were differentially distributed in one of the extremes. After the final filtering, 1221 variants were retained. Among these, 879 affected protein function by in silico analysis. The gene distribution and function of these 1221 variants are reported in Table 2. LoF: loss of function, meaning missense, nonsense, splicing and frameshift variants; LoF/TOT: ratio between observed LoF and total variants; rs describes known variants. Total column is not the raw sum of single-gene data, as several variants were found in adjacent, intronic noncoding regions or in the antisense transcript.

Association Analyses
After this discovery phase, binary logistic regression was performed to evaluate the association of variants, both alone and collapsed by gene or function, with DI extremes. When analyzing all variants that passed the filter (n = 1221), collapsed by gene and/or function, only the association between DI and nonsense variants were significant, also after adjustment for established risk factors such as age, gender and BMI. Carriers of these stop variants had overall 75% less probability to be in the lower, more pathologic, extreme of DI (OR = 0.25 95% CI = 0.11-0.59 p = 0.001). Analysis of goodness of fit showed that this model explained around 25% of the variable's variance (R 2 Nagelkerke = 0.25; Hosmer-Lemeshow test = 0.063). Thus, in our cohort, nonsense variants seem to be protective regarding insulin secretion status. We then evaluated the variants individually, and eight of them, within four genes (ARAP1, GCK, KCNQ1 and IGF2BP2), were significantly associated with one of the extremes of DI. After adjustment for confounding factors (age, gender and BMI), only three novel variants, in the ARAP1, IGF2BP2 and GCK genes, remained significantly associated (see Table 3). The strongest significant effect was observed for carriers of the IGF2BP2 variant, showing an 85% (CI = 50% to 95%) reduced probability to be in the pathological extreme of insulin secretion relative to insulin resistance. Carriers of SNPs in ARAP1 and GCK showed similar results in significance and effect, with a mean probability of 70% and 62%, respectively. Other genes did not reach significance in our samples. Of note, 22 out of 29 (75%) of the carriers of the missense variant in ARAP1 were diagnosed as normal glucose tolerant (NGT), assessed by OGTT. Additionally, between the 18 carriers of the intronic variant in IGF2BP2, 13 (72%) were identified as NGT.

Genetic Risk Model of Significant Variants
Due to the low frequency observed for the variants in the ARAP1, IGF2BP2 and GCK genes, without subjects carrying more than one variant, we constructed a genetic risk model. We initially used all three associated variants. Unexpectedly, when all three SNPs in the genetic risk model were analyzed by binomial regression, GCK did not reach significance. We therefore used only the ARAP1 and IGF2BP2 variants to build a genetic model. Association analysis of this genetic risk model adjusted for age, gender and BMI, strengthened both significance and effect compared to the individual association (see Table 4). Carriers of at least one of the two allele variants showed to be, on average, 78% less likely to be in the lower, pathological, extreme of DI, with a reduction of between 90% and more than 50%. Interestingly, this model explained more than 25% of DI variance (R 2 Nagelkerke = 0.254, p = 0.015). Observing a large number of NGT between carriers of each SNP, a post hoc regression analysis was performed to assess whether glucose tolerance status is associated with being carriers of at least one of the two selected variants (see Supplementary Table S2). Indeed, carriers of any of the two variants in the genetic risk model had a 2.4-fold higher probability to be NGT after adjustment for age, gender and BMI than noncarriers (OR = 2.41 95% CI = 1.14-5.12 p = 0.02).

Discussion
In this study, we employed a candidate gene approach with NGS technology to confirm and validate results derived from genome-wide association analyses in a southern European real-life cohort with a high prevalence of overweight/obesity. As previously discussed, GWAS results should be confirmed and validated with further studies, including molecular and cellular studies or targeted sequencing [22,23]. Because of its own technique, GWA studies focus on common variants. Empirical observations showed that heritability explained by common variants emerging from GWAS is limited, especially for multifactorial diseases such as diabetes or obesity. Possibly, rare and low-frequency variants harbor more effect, and for a selective pressure in fitness, they are less frequent. Additionally, GWASderived variants may not directly affect the trait but might be in linkage with a "real" causal variant. Furthermore, SNPs in noncoding regions could affect protein regulation through several ways pre/post-transcriptionally and translationally, such as modulation of chromatin, RNA transcription, translation and stability. For all this evidence, we performed deep resequencing of nine genes selected from the most strongly associated with insulin secretion from GWAS (see Supplementary Table S1), to confirm the genes' roles and to extend our knowledge on the T2D molecular mechanism.
Considering all variants together, collapsed by function or position in the nine genes, the presence of nonsense variants is associated significantly with a reduced chance to be in the lower quintile of the disposition index (i.e., the quintile with the most reduced insulin secretion adjusted for insulin resistance). Carriers of nonsense variants in this dataset showed on average a 75% reduced probability to be in the worst quintile of insulin secretion, ranging from 45% to 90%. To explain this association of the nonsense variants with a protective role, we could hypothesize that those nonfunctional forms of these proteins could interfere with molecular feedback pathways responsible for the processing of insulin secretion. For example, reducing the expected increase in insulin secretion was secondary to the rise in insulin resistance. This may result in the lowering of DI and possibly slowing down diabetes progression. However, we cannot exclude additional mechanisms, such as enhancing or blocking secondary effectors of insulin secretion or signaling or positive/negative feedback pathways.
Analyzing data individually, three variants in the GCK, ARAP1 and IGF2BP2 genes were significantly associated with DI. Evaluating these three variants in a single predictive model, the variant in GCK lost significance. Thus, its contribution to the association was dropped. The remaining associated variants were chr3:185363420 A > G in IGF2BP2 and chr11:72422158 A > C, in ARAP1. Both were novel variants not previously reported.

IGF2BP2 Variant
chr3:185363420 A > G in IGF2BP2 (NM_006548) is a variant in intron 15 (c.1708-9 A > G; g.184410 G > C), next to the start of exon 16/16. It is located two nucleotides apart from a rare known variant (rs1199891239 c.1708-7 G > C) laying in a splice site. IGF2BP2 binds and modulates insulin growth factor 2 (IGF2) 5'-UTR mRNA, affecting its localization, translation and stability. Moreover, it modulates the rate and site of translation of target transcripts and protects them from endonucleases or microRNA-mediated degradation [39]. IGF2BP2 is highly expressed in pancreatic islets, but its contribution to diabetes is unclear. Animal models show that it is implicated in regulating growth and metabolism. The null model results in dwarf mice resistant to obesity and fatty liver if subjected to a high-fat diet (HFD). Results in tissue-specific knockout models are divergent. IGF2BP2 knockout in β-cells showed reduced fasting insulin, c-peptide levels and lower glucose-stimulated insulin secretion in animals fed an HFD. KO IGF2BP2 hepatocytes mice showed high resistance to fatty liver in HFD. They also show reduced fat mass and lipid oxidation. In a mouse model where IGF2BP2 was instead upregulated in the liver by the transgene, lipid storage was enhanced, which led to fibrosis and NAFLD [40][41][42]. Of note, the insulin (INS) gene is located near the IGF2 gene, and the INS-IGF2 readthrough transcript (INSIGF) has been observed. It aligns the INS gene in the 5' region to the IGF2 gene in the 3' region. INSIGF expression was found in the placenta, liver, pancreas, fat, ovary and endothelial tissues, the tissues most strongly implicated in regulating metabolism. It has been speculated that IGF2BP2 could bind the INSIGF transcript with the same domain utilized with IGF2, also fine modulating insulin expression and consequently insulin receptor activity, thus ameliorating DI [43].
Given this scenario, we can speculate that this could be a splicing variant likely impairing IGF2BP2 protein function. Thus, our results are in line with previous studies, which demonstrated that IGF2BP2 deletion in mice improves glucose tolerance and insulin sensitivity and protects mice from diet-induced obesity and fatty liver [44]. Finally, IGF2BP2 also contributes to obesity and T2D through its regulation of IGF2, which participates in the pathogenesis of these diseases [45].

ARAP1 Variant
The variant in ARAP1 (NM_001040118), chr11:72422158 A > C is a missense, Val 374 Gly (V374G). Several prediction tools (such as SIFT, PolyPhen2, MutPred, FATHMM and PROVEAN) suggest that V374G is deleterious or probably damaging for protein function. For example, PolyPhen-2 analysis of V374G results as probably damaging with a score of 0.971 (sensitivity: 0.77; specificity: 0.96). ARAP1 phosphorylates a large family of GTPases, which modulate actin and cytoskeleton through ARF and RHO proteins. It is wildly expressed and involved in the Golgi apparatus, molecular trafficking and cellular membrane function. ARAP1 is activated by (3,4,5) trisphosphate (PIP3) and 3.4 PI (PIP2) with less efficiency. PIP3 is a secondary signaling lipid generated by insulin signaling. It has been reported in the drosophila cell model that deleting a negative regulator for PIP3 (phosphatidylinositol 5 phosphate 4-kinase (PIP4K)) causes an increase in PIP3 levels, with enhanced insulin sensitivity [46]. Additionally, overexpression of ARAP1 mRNA in the human pancreatic cell, due to a common functional variant (rs11603334; MAF in non-Finnish EU: 16%) in the promotor, was associated with decreased production of proinsulin and an increase in T2D risk [47]. Contrariwise, a proinsulin-raising variant was associated with lower fasting glucose (0.019 mg/dL per allele; p = 1.7 × 10 −4 ), lower A1C (0.023%; p = 0.02), improved β-cell function (p = 1.1 × 10 −5 ) and a lower risk of T2D (OR = 0.88; p = 7.8 × 10 −6 ) [48]. Furthermore, ARAP1 regulates the ARF family of GTPases, which control several key cellular processes, including membrane trafficking such as secretion or endocytosis, ciliogenesis, lipid metabolism, energy balance, motility, division, apoptosis and transcriptional regulation [49]. Among the ARF family, ARAP1 strongly interacts with ARF6, which is a known modulator of insulin secretion [50][51][52][53]. Thus, as ARAP1 is involved in the Golgi apparatus, molecular trafficking and cellular membrane, its contribution to insulin secretion can be postulated by its ability to affect insulin storage in vesicles, their movement, membrane binding and release. A nonfunctional ARAP1 protein may lead to a decrease in glucose-stimulated insulin secretion, possibly via ARF6. Overall, our data are in line with Kulzer and Strawbridge's results, where higher levels of ARAP1 mRNA are associated with an increased risk of T2D and decreased proinsulin release, while the reduction in ARAP1 levels or function ameliorates β-cell function and T2D risk.

Post hoc Analysis
In our results, unexpectedly, the ARP1 and IGF2BP2 variants protectively influence DI. Analysis showed an increased probability for carriers to be in the highest quintile of insulin secretion adjusted for insulin sensitivity. This condition is also confirmed by the high number of carriers presenting with normal glucose tolerance. Our post hoc analysis displayed a significant association between normal glucose metabolism and the variants in a genetic risk model (see Supplementary Table S2). Specifically, carriers of any variant present in the genetic risk model are significantly associated (p = 0.02) with a 2.4-fold (95% CI = 1.14-5.12) higher probability to be normoglycemic (NGT).

Novelties and Limits
As limits, our study samples allow us to detect only ORs of relative medium magnitude (1.7-2.6 depending on MAF). Thus, we could have missed some variants that exert weaker effects or that become significant only in very large datasets. However, as suggested by Nejentsev et al., in complex diseases, such as T2D, there may be no low-frequency variants, or very few, with very strong effects (e.g., allele OR >3). Even if such variants have large impacts on a certain molecule's function, it is possible that in complex multifactorial diseases, such a molecule and its biological pathway are just one of many contributing to the pathogenesis. Nevertheless, the discovery of such rare variants using high-throughput sequencing will help to identify disease genes in the loci found by GWAS in various complex diseases. [54].
Results from published GWAS and metadata studies need to be assessed in specific and real-life populations to generalize findings in gene function and ethnicity. The novelties of this study are the targeted resequencing searching for associated variants, especially rare or low frequencies, harbored in genes identified by GWAS [22,23] in a Central Italy cohort, and the study design, exploring the extremes of a trait (the disposition index of insulin secretion), which allows more homogeneity in the study sample and enhances statistical power. Our study shows that nonsynonymous variants in nine candidate genes are all associated with better insulin secretion adjusted for insulin resistance. Additionally, from the single analysis, the novel low-frequency variants chr11:72422158 A > C in ARAP1 and chr3:185363420 A > G in IGF2BP2 showed significant association with healthier insulin secretion, relative to insulin sensitivity, measured by DI.

Study Subjects
From a cohort of 2232 white Italian patients enrolled at Policlinico Umberto I Hospital of University of Rome Sapienza, Italy, attending the outpatient clinics of Endocrinology and Diabetology during the years 2001-2018, we selected 757 subjects from the first and last quintiles of DI distribution for the sequencing study. Ethnicity was self-reported. Anthropometric and clinical measurements comprising a minimum 3-point OGTT were recorded in an anonymized database. Body mass index (BMI) was calculated as body weight (kg) divided by height squared (m 2 ) and was used as a marker of obesity. Glucose tolerance status (NGT, IFG: impaired fasting glucose, IGT: impaired glucose tolerance, DM: diabetes mellitus) was diagnosed according to ADA 2021 [2]. Plasma/serum biochemistry (glucose, insulin, full lipid profile, transaminases, etc.) was measured in the same laboratory with standard techniques. The disposition index (DI), as well other clinical derivative estimates of insulin release and sensitivity, was calculated from OGTT. The insulinogenic index (IGI30), as a measure of glucose-stimulated insulin secretion and also of β-cell function, was calculated as (Ins30-Ins0)/(Glc30-Glc0) [55]. Insulin sensitivity (ISI) was estimated as [56]: 10,000/(Glc0 × Ins0 × GlcMean × InsMean)1/2. DI was calculated as the product of IGI30 and ISI for estimating insulin secretory capacity adjusted for insulin resistance [57].
Samples to be sequenced were selected from two extremes of insulin secretion calculated by DI (1st and 5th quintiles or <20% and >80% of DI distribution) in the whole cohort comprising more than 2000 patients. To compare genetic variants between subjects with a better and worse DI index of insulin secretion, they were analyzed as a case-control cohort.
Study protocols and informed consent procedures were approved by the local Institutional Ethics Committees

DNA Extraction and Sequencing
DNA extraction was carried out from white-cell peripheral blood by the standard procedure of the salting-out method. DNA was then evaluated for quality (NanoDrop 2000 Spectrophotometers, Thermo Fisher, Waltham, MA, USA) and quantified (Qubit Fluorometric Quantification, Thermo Fisher, Waltham, MA, USA) before processing. The design of the gene panel, which included 9 genes (i.e., ADAMTS9 NM_182920, ADYC5 NM_183357, ARAP1 NM_001040118, CDKAL1 NM_017774, GCK NM_000162, IGF2BP2 NM_006548, JAZF1 NM_175061, NAT2 NM_000015 and KCNQ1 NM_000218), was made with DesignStudio (Illumina) online software. GC content, specificity, probe interaction and presence of SNPs were considered in the probes' selection. The probe's panel was optimized through in silico simulation. Then, it was tested and validated in Illumina WetLab. Each probe comprehends the sequences designed for capturing the regions of interest, including one specific sequence to be utilized in successive genetic amplification. NGS was made through Illumina TrusSeq Custom Amplicon (Illumina, San Diego, CA, USA) technology in the Illumina MiSeq (cartridge V2 300c) sequencer. Probes were designed to amplify, by 219 amplicon intermediates, the coding region, along with 100 bp flanking on both sides of the exons, for a total of 29,685 bp. The study design provides a mean coverage of 100× for each sample, allowing more accurate base calling, especially for low-frequency variants. Briefly, the sequencing process started with amplification of all exons and flanking 100 bp from selected genes via amplicon generation. They represent the genetic libraries. Obtained libraries from all samples were purified singularly through the magnetic beads approach. Then, they were normalized and pooled together to perform high-throughput parallel sequencing by cluster generation and successive sequencing by reading fluorescence. Following the runs of the libraries on a MiSeq system, data were automatically processed using built-in and on-cloud software, such as Illumina software BaseSpace (https://basespace.illumina.com, accessed on 20 February 2021). The output variant call format (VCF) file was then annotated through BaseSpace, VariantStudio (Illumina), wANNOVAR (http://wannovar.wglab.org accessed on 20 September 2021) and Cravat (https://www.cravat.us/CRAVAT/ accessed on 20 September 2021) software. Collected data were analyzed both with dedicated software and plug-ins made by Illumina and free bioinformatics and biostatistics tools (SAMtools, BCFtools, VCFtools) [58]. The variants were annotated as nonsense, missense, splicing, synonymous and UTR following published guidelines [59]. Functional affection of the variants was investigated using the major prediction programs available: SIFT, PolyphenII, SNP&go, Provean, Mutation T@ster, Mutation Assessor, FATMHH and CADD were used for exonic, while ESEfinder, GeneSplicer, and NetGene2 for intronic variants were used. This methodological approach allowed us to assess, in transcribed regions of genes associated with insulin secretion, the distribution of genetic variants within our southern European cohort.

Variants Filtering
SNPs and insertions/deletions were identified across the targeted subset of the reference genome (hg19). We filtered all variants observed for quality and quantity of reads, as well as information on annotated variants. Several filters were subsequently applied. First were selected data with more than 30 reads (DP > 30) and genotype quality equal to or more than 30 (GQX ≥ 30). Additionally, default sample and record levels filters were applied from Illumina VCF metrics. Then, variants with genotype quality less than 99 (GQX < 99) were filtered out to avoid most false-positive results from NGS. All passing-filter variants were retained.

Statistical Analysis
Categorical variables were compared with the Chi-square test or Fisher's exact test. Differences between continuous variables were evaluated by two-tailed Student's t-test and ANOVA. For nonparametric measures, the Mann-Whitney U test was used. To control for the effects of other confounding factors, multivariate linear and logistic regression analyses were performed. The adequacy of the final model was assessed using the Hosmer-Lemeshow goodness-of-fit test. Furthermore, the Nagelkerke R 2 was calculated to indicate how useful the explanatory variables in the model were in predicting DI association [60]. Variants in carriers were considered both individually and collapsed together, evaluating the possible combined effect. In general, p < 0.05 was taken as statistically significant, except where Bonferroni correction was applied. All statistical analyses were performed with SPSS (IBM Corp.

Power Calculation
According to Guey et al. [29], we estimated that an ascertainment sample of 757 individuals in the top and bottom quintiles of a quantitative trait (DI in our study), assuming a disease prevalence of 0.15, gave a power >80% with a significance level of 0.05, to detect low-frequency (1-5%) variants, with ORs ranging between 1.7 and 2.6, depending on allele frequency (from 1 to 5%), from a population of more than 2000 subjects. Power analysis was performed by the Genetic Association Study (GAS) Power Calculator (© 2017 Jennifer Li Johnson, University of Michigan; available online: https://csg.sph.umich.edu/abecasis/gas_power_calculator/index.html accessed on 20 November 2021).

Conclusions
In conclusion, nonsense variants in all nine candidate genes showed association with better insulin secretion adjusted for insulin resistance. Furthermore, in the ARAP1 and IGF2BP2 genes, we found two low-frequency novel variants (MAF = 1.2% and 1.9% for IGF2BP2 and ARAP1, respectively) showing independent association with insulin secretion adjusted for insulin sensitivity. Importantly, here, we demonstrated the association and measured the effect of each of the newly discovered low-frequency variants, both separately and analyzed together. Thus, in our southern European real-life cohort, we confirmed the role of the ARAP1 and IGF2BP2 genes in modulating insulin secretion assessed with DI. Further and deeper genetic studies are warranted to assess the presence and effects of low-frequency variants involved in insulin secretion.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on reasonable request. The data are not publicly available due to privacy restrictions, lacking specific patients' consent.

Conflicts of Interest:
The authors declare no conflict of interest.