Computational Analysis of Single Nucleotide Polymorphisms Associated with Altered Drug Responsiveness in Type 2 Diabetes

Type 2 diabetes (T2D) is one of the most frequent mortality causes in western countries, with rapidly increasing prevalence. Anti-diabetic drugs are the first therapeutic approach, although many patients develop drug resistance. Most drug responsiveness variability can be explained by genetic causes. Inter-individual variability is principally due to single nucleotide polymorphisms, and differential drug responsiveness has been correlated to alteration in genes involved in drug metabolism (CYP2C9) or insulin signaling (IRS1, ABCC8, KCNJ11 and PPARG). However, most genome-wide association studies did not provide clues about the contribution of DNA variations to impaired drug responsiveness. Thus, characterizing T2D drug responsiveness variants is needed to guide clinicians toward tailored therapeutic approaches. Here, we extensively investigated polymorphisms associated with altered drug response in T2D, predicting their effects in silico. Combining different computational approaches, we focused on the expression pattern of genes correlated to drug resistance and inferred evolutionary conservation of polymorphic residues, computationally predicting the biochemical properties of polymorphic proteins. Using RNA-Sequencing followed by targeted validation, we identified and experimentally confirmed that two nucleotide variations in the CAPN10 gene—currently annotated as intronic—fall within two new transcripts in this locus. Additionally, we found that a Single Nucleotide Polymorphism (SNP), currently reported as intergenic, maps to the intron of a new transcript, harboring CAPN10 and GPR35 genes, which undergoes non-sense mediated decay. Finally, we analyzed variants that fall into non-coding regulatory regions of yet underestimated functional significance, predicting that some of them can potentially affect gene expression and/or post-transcriptional regulation of mRNAs affecting the splicing.


Introduction
Diabetes is one of the leading causes of mortality in contemporary society [1]. A recent report by the International Diabetes Federation indicated an onset rate of about 8.4% in adults and a total number of 382 million cases of diabetes worldwide. This number is estimated to critically increase to

Results
Several reports have so far described the association of SNPs and altered drug responsiveness in T2D [13]. These SNPs fall within genes with a clear relation to drug transport (SLC22A1 and SLC22A2), metabolism (CYP2C9), activity (PPARG and ABCC8), genes that have a direct role in diabetes onset or progression (KCNJ11, IRS1 and TCF7L2) or that have been frequently associated by GWAS to T2D and drug response (CAPN10), despite the fact that the exact mechanisms are still unclear.
Here, a literature-based selection of SNPs in the most commonly associated genes has been performed, and the related SNPs have been analyzed using a combination of in silico and experimental approaches. Protein alignments and evolutionary conservation of polymorphic sites, analysis of protein tertiary structure (if available) and Sorting Intolerant From Tolerant (SIFT)/PolyPhen scores were used to predict the impact of protein-coding SNPs (cSNPs) on protein functionality. The most significant results are shown in the manuscript, and other data are collected in the Figures S1-S9. As many SNPs fall outside the coding regions of genes, we systematically searched CpG islands, transcription factors' binding sites and relevant epimarks that can potentially be altered by SNPs. Using RNA-Seq on HEK293 cell lines, and taking advantage of a large collection of RNA-Seq datasets from different tissues and cell lines available in our laboratory, we also correlated SNPs to gene expression. Selected SNPs, their genomic localization, the possible alleles at that position, their IDs (according to the database of Single Nucleotide Polymorphisms, dbSNP), as well as the amino acid change (if any), are reported in Table 1.

Computational Predictions for Coding SNPs
For the SLC22A1 gene, which encodes the Oct1 protein, results from the analysis of two cSNPs (rs12208357 and rs34059508) are discussed here. The analysis of rs34130495 is reported in the Figure S1. The rs12208357-a C/T substitution in the first exon of SLC22A1 ( Figure 1A)-causes R61C amino acid substitution into an extracellular topological domain of Oct1 protein (organic cation transporter 1; UniProt ID O15245) involved in the binding of the substrate. It leads to a decreased binding affinity to substrates [14]. Notably, the mutated residue is not charged, is smaller and more hydrophobic than the arginine, and it is predicted to alter inter-residual interactions and the correct folding of the ion channel. Protein alignment and analysis of evolutionary conservation revealed that arginine is highly conserved at this position in many species ( Figure 1A). SIFT and PolyPhen scores (Table 2) indicate this SNP as damaging to protein functionality. The rs34059508 in exon 9 causes G465R amino acid change. Interestingly, 3D structure analysis of Oct1 protein revealed that G465 is close to the inner surface of the channel ( Figure 1B), and the presence of a charged residue (arginine) in that position is likely to affect ion channel functionality. Accordingly, to support this observation, SIFT and PolyPhen scores were 0 and 0.999, respectively, indicating this SNP as "strongly damaging". to support this observation, SIFT and PolyPhen scores were 0 and 0.999, respectively, indicating this SNP as "strongly damaging".  Figure. The boxes indicate the exons and numbering is indicated above. The blue arrows indicate the genomic localization of the polymorphisms. In the lower part of panels (A), (C) and (D) are shown the 3D representations of amino acid changes corresponding to these SNPs. The canonical amino acid is reported in green and the mutated residue is shown in red. The evolutionary conservation score for each altered amino acid residue-according to Consurf software-is also shown. The specific amino acid is highlighted by a blue box; (B) on the left part, a schematic representation of the Oct1 protein in the cell membrane is shown. The red asterisk indicates the approximate position of the amino acid 465G, whose alteration is associated with reduced metformin responsiveness. On the right part of panel (B), the 3D protein structure of the entire Oct1 protein is shown. The inner portion of the channel is indicated in blue. Amino acid changes under evaluation are indicated in red (red arrow indicates G465R substitution); (D) on the right, the 3D protein structure of Cyp2c9 is shown, with the canonical R144 residue (in green) and three amino acids that are predicted to interact with it (in orange). Dashed white lines indicate the H-bonds among these residues.
However, we found at least three protein isoforms of Oct1 in primates that carry this amino acid substitution ( Figure 1A), indicating that the polymorphic protein may be evolutionarily conserved. In addition, taking advantage of public human transcriptome data (available at Gene Expression Atlas; [15]) we found that SLC22A1 has high tissue-specific expression, mainly restricted to the liver ( Figure S10).
For the IRS1 gene, we focused on a cSNP (rs1801278) in the exon 1 that causes the G971R change. Notably, it is often referred to as G972R, even though-according to dbSNP and UniProt databases-the exact nomenclature is G971R. Thus, we will refer to it as "G971R". Protein alignment revealed that several species (reptiles and birds) carry this amino acid substitution ( Figure 1C). Despite this substitution being conserved along the evolution, the biochemical properties of the residues are very different, as they differ in size and charge. The steric effect and the presence of a positive charge in the side chain are predicted to affect protein folding locally and/or the binding to other proteins.  Figure. The boxes indicate the exons and numbering is indicated above. The blue arrows indicate the genomic localization of the polymorphisms. In the lower part of panels (A), (C) and (D) are shown the 3D representations of amino acid changes corresponding to these SNPs. The canonical amino acid is reported in green and the mutated residue is shown in red. The evolutionary conservation score for each altered amino acid residue-according to Consurf software-is also shown. The specific amino acid is highlighted by a blue box; (B) on the left part, a schematic representation of the Oct1 protein in the cell membrane is shown. The red asterisk indicates the approximate position of the amino acid 465G, whose alteration is associated with reduced metformin responsiveness. On the right part of panel (B), the 3D protein structure of the entire Oct1 protein is shown. The inner portion of the channel is indicated in blue. Amino acid changes under evaluation are indicated in red (red arrow indicates G465R substitution); (D) on the right, the 3D protein structure of Cyp2c9 is shown, with the canonical R144 residue (in green) and three amino acids that are predicted to interact with it (in orange). Dashed white lines indicate the H-bonds among these residues. However, we found at least three protein isoforms of Oct1 in primates that carry this amino acid substitution ( Figure 1A), indicating that the polymorphic protein may be evolutionarily conserved. In addition, taking advantage of public human transcriptome data (available at Gene Expression Atlas; [15]) we found that SLC22A1 has high tissue-specific expression, mainly restricted to the liver ( Figure S10).
For the IRS1 gene, we focused on a cSNP (rs1801278) in the exon 1 that causes the G971R change. Notably, it is often referred to as G972R, even though-according to dbSNP and UniProt databases-the exact nomenclature is G971R. Thus, we will refer to it as "G971R". Protein alignment revealed that several species (reptiles and birds) carry this amino acid substitution ( Figure 1C). Despite this substitution being conserved along the evolution, the biochemical properties of the residues are very different, as they differ in size and charge. The steric effect and the presence of a positive charge in the side chain are predicted to affect protein folding locally and/or the binding to other proteins.  In addition, PolyPhen marked this cSNP as damaging (Table 2). However, the absence of crystallographic data and of homologous proteins as templates did not permit us to determine a reliable 3D model to analyze such predicted damaging changes. Furthermore, analysis of public databases revealed IRS1 to be almost ubiquitously expressed ( Figure S10). This is in line with the pleiotropic role of Irs1 protein in the activation of the PI3K/AKT1/GSK3 signaling pathway, and, consequently, in the stimulation of glucose transport and of glycogen synthesis [16].
Finally, we analyzed cSNPs (rs1799853 and rs1057910) in the CYP2C9 gene associated with sulphonylureas resistance. The gene encodes for cytochrome P450-2C9, a hepatic enzyme involved in the metabolism of most of sulphonylureas. The rs1799853 is a C/T variation in the exon 3 (R144C; Figure 1C) and is associated with resistance to sulphamethaxazole in T2D patients [17,18]. Although such a change is conserved not only in primates through evolution, the novel residue differs from the canonical for several characteristics. Indeed, the positive charge and the increased steric effect of the wild-type amino acid (arginine) with a neutral and smaller residue (cysteine) are predicted to have structural effects in the core of the protein. Computational prediction performed using the HOPE (Have (y)Our Protein Explained) prediction tool revealed that R144 forms hydrogen bonds with R139, S180 and Q261 ( Figure 1C). Conversely, the mutated residue (i.e., C144) does not. Accordingly, both PolyPhen and SIFT scores indicate this cSNP as damaging ( Table 2). CYP2C9 gene has a peculiar expression pattern, mostly restricted to the liver, small intestine and duodenum ( Figure S10). This is in line with the notion that this enzyme is responsible for sulphonylureas' metabolism [18].
Similar conservation-and modeling-based approaches were also used to analyze SNPs in the coding regions of other genes commonly associated with altered drug response in T2D, such as SLC22A2, KCNJ11, ABCC8 and PPARG genes. We could not in silico find any significant alteration in protein functionality according to the amino acid changes in these genes, or these nucleotide variations are already well characterized. A summary of the results is reported in Figures S1-S9.

SNPs Map to Regulatory Regions and Motifs of Transcription Factors
To address how the above-described SNPs may contribute to drug resistance, we extensively analyzed whether all variants reported in Table 1 map to regulatory sites, including transcription factors' binding sites (TFBS), CpG islands, ESE/ESS (Exonic Splicing Enhancer/Exonic Splicing Silencer) and ISE/ISS (Intronic Splicing Enhancer/Intronic Splicing Silencer), and other regulatory regions ( Figure 2; Table S1). Taking advantage of public ChIP-Seq data of the ENCODE Consortium for several transcription factors (TFs) in different cell lines (Methods), we first assessed whether SNPs can disrupt confirmed TF binding sites. Interestingly, as schematized in Figure 2, rs34130495 (in the SLC22A1 gene) disrupts the consensus sequence recognized by Tcf12, Tfap2a, Tfap2c and Max, and the rs35167514 in the same gene alters a binding site for Max, a master transcriptional regulator. The SNP rs1801278 (IRS1 gene) falls within a Pol2a consensus sequence, while the rs3842570 in CAPN10 gene falls within the binding site of Znf263, a negative transcriptional regulator.
Using a similar approach, we also analyzed whether drug resistance-associated SNPs map to genomic regions hypersensitive to DNAseI digestion, i.e., nucleosome-free transcriptional regulatory regions usually accessible to transcription factors. In line with the previous analysis, the rs34130495 and rs35167514 SNPs in SLC22A1 falls within a strong DNAseI hypersensitive site (DHS), supported by experimental evidence in 80 cell lines. Similarly, the rs1801278 in the IRS1 gene falls in DHS found in more than 90 cell lines. Weaker evidence has been also found for rs5030952 in CAPN10 and rs757110 in ABCC8 genes (DHS reported in 23 and 14 cell lines, respectively). The rs12208357 in SLC22A1 gene falls in a DHS were reported only in hepatocytes and medulloblastoma, while, for the SNP rs3842570, in chorion cells and pancreatic islets.
Similar analysis was also carried out on computationally predicted consensus for TFBS (Jaspar and TRANSFAC databases). The rs5219 SNP-which causes the K23E substitution in KCNJ11 gene ( Figure S3)-maps to the consensus of the nuclear receptor PPARγ, and it is predicted to alter the binding affinity of this TF, a crucial player in glucose and lipid homeostasis and that is associated with metabolic disorders [19]. Notably, the same SNP also falls in a CpG island. Schematic summary of such information for other SNPs are reported in Table S1.
Finally, we investigated if drug resistance-associated SNPs may possibly affect the splicing extensively searching-in exons and introns-the regulatory splicing sites (ISE/ISS and ESE/ESS). This analysis revealed that only the rs3842570 and the rs12208357 in CAPN10 and SLC22A1 genes are Taking advantage of public ChIP-Seq data of the ENCODE Consortium for several transcription factors (TFs) in different cell lines (Methods), we first assessed whether SNPs can disrupt confirmed TF binding sites. Interestingly, as schematized in Figure 2, rs34130495 (in the SLC22A1 gene) disrupts the consensus sequence recognized by Tcf12, Tfap2a, Tfap2c and Max, and the rs35167514 in the same gene alters a binding site for Max, a master transcriptional regulator. The SNP rs1801278 (IRS1 gene) falls within a Pol2a consensus sequence, while the rs3842570 in CAPN10 gene falls within the binding site of Znf263, a negative transcriptional regulator.
Using a similar approach, we also analyzed whether drug resistance-associated SNPs map to genomic regions hypersensitive to DNAseI digestion, i.e., nucleosome-free transcriptional regulatory regions usually accessible to transcription factors. In line with the previous analysis, the rs34130495 and rs35167514 SNPs in SLC22A1 falls within a strong DNAseI hypersensitive site (DHS), supported by experimental evidence in 80 cell lines. Similarly, the rs1801278 in the IRS1 gene falls in DHS found in more than 90 cell lines. Weaker evidence has been also found for rs5030952 in CAPN10 and rs757110 in ABCC8 genes (DHS reported in 23 and 14 cell lines, respectively). The rs12208357 in SLC22A1 gene falls in a DHS were reported only in hepatocytes and medulloblastoma, while, for the SNP rs3842570, in chorion cells and pancreatic islets.
Similar analysis was also carried out on computationally predicted consensus for TFBS (Jaspar and TRANSFAC databases). The rs5219 SNP-which causes the K23E substitution in KCNJ11 gene ( Figure S3)-maps to the consensus of the nuclear receptor PPARγ, and it is predicted to alter the binding affinity of this TF, a crucial player in glucose and lipid homeostasis and that is associated with metabolic disorders [19]. Notably, the same SNP also falls in a CpG island. Schematic summary of such information for other SNPs are reported in Table S1.
Finally, we investigated if drug resistance-associated SNPs may possibly affect the splicing extensively searching-in exons and introns-the regulatory splicing sites (ISE/ISS and ESE/ESS). This analysis revealed that only the rs3842570 and the rs12208357 in CAPN10 and SLC22A1 genes are located in an ISE. In particular, these indels were predicted to abrogate two binding sites for the same splicing factor, the Mbnl1 protein (AGCTGGG).

RNA-Seq Identifies New Transcripts of the CAPN10 Gene
As gene annotations are rapidly being updated thanks to high-throughput sequencing, we also searched if these SNPs overlap newly annotated transcripts (both coding and not) and/or Expressed Sequence Tags (ESTs). Details for all analyzed SNPs are provided in the Table S1 and are graphically summarized in Figures 3 and 4. located in an ISE. In particular, these indels were predicted to abrogate two binding sites for the same splicing factor, the Mbnl1 protein (AGCTGGG).

RNA-Seq Identifies New Transcripts of the CAPN10 Gene
As gene annotations are rapidly being updated thanks to high-throughput sequencing, we also searched if these SNPs overlap newly annotated transcripts (both coding and not) and/or Expressed Sequence Tags (ESTs). Details for all analyzed SNPs are provided in the Table S1 and are graphically summarized in Figures 3 and 4.  ( Figure 3).
In addition, RNA-Seq data revealed that the rs5030952, annotated as "intergenic" and associated with CAPN10 gene, maps in the intron of two new transcripts spanning CAPN10 and GPR35 genes. Notably, one of these transcripts (details in Figure 4) is annotated in University of California Santa Cruz Genome Browser (UCSC Genome Browser), Ensembl and GENCODE databases as "non-sense mediated decay" (NMD). RT-PCR, upon treatment of HEK293 cells with CHX, confirmed the presence of this new transcript generated by trans-splicing between CAPN10 and GPR35 genes, revealing that it undergoes NMD (Figure 4). We concluded that rs5030952 should be considered an "intronic" SNP of this new transcript encompassing CAPN10 and GPR35 loci.
Similarly, we found that the intronic SNPs associated with TCF7L2 (rs12255372 and rs7903146) map to a transcribed region (data not shown) and overlap two unspliced ESTs (details in the Table S1).
Finally, taking advantage of our recently published-and still unpublished-RNA-Seq datasets, we investigated whether the presence of these SNPs may correlate to gene expression We used ab initio transcript reconstruction starting from RNA-Sequencing data, generated from HEK293 cell line sequencing. We found that the rs3842570 in CAPN10-described as intronic-maps within an actively transcribed region (Figure 3). The new transcript reconstructed from RNA-Seq overlaps an AceView [20] gene model (CAPN10 and GPR35.mAug10-unspliced) and EST BF528146. Similarly, the SNP rs3792267-located in the third intron of the same gene-overlaps a new transcribed region-unspliced ESTs (BQ899318, AL703891 and DA094595) from sciatic nerve and cerebellum libraries and a transcript annotated as "retained intron" in Ensembl and GENCODE databases (ENST00000494738). Taking advantage of RNA-Seq transcript reconstruction combined with a RT-PCR assay in RT + (Reverse Transcriptase) and RT-cDNA from HEK293 cell line (Methods), we could confirm that both of the SNPs map to actively transcribed regions, despite being currently annotated as "intronic" SNPs of the CAPN10 gene. Direct Sanger sequencing validated our findings (Figure 3).
In addition, RNA-Seq data revealed that the rs5030952, annotated as "intergenic" and associated with CAPN10 gene, maps in the intron of two new transcripts spanning CAPN10 and GPR35 genes. Notably, one of these transcripts (details in Figure 4) is annotated in University of California Santa Cruz Genome Browser (UCSC Genome Browser), Ensembl and GENCODE databases as "non-sense mediated decay" (NMD).
RT-PCR, upon treatment of HEK293 cells with CHX, confirmed the presence of this new transcript generated by trans-splicing between CAPN10 and GPR35 genes, revealing that it undergoes NMD (Figure 4). We concluded that rs5030952 should be considered an "intronic" SNP of this new transcript encompassing CAPN10 and GPR35 loci.
Similarly, we found that the intronic SNPs associated with TCF7L2 (rs12255372 and rs7903146) map to a transcribed region (data not shown) and overlap two unspliced ESTs (details in the Table S1).
Finally, taking advantage of our recently published-and still unpublished-RNA-Seq datasets, we investigated whether the presence of these SNPs may correlate to gene expression levels. RNA-Seq datasets of two HEK293 replicates, 22 thyroid [21], 2 MCF7 [22], eight heart biopsies (manuscript in preparation) were analyzed in order to find samples homozygous for the wild-type, heterozygous or homozygous for the alternative alleles. However, we could find a significant number of samples for each category only for the KCNJ11, CAPN10 and ABCC8 genes. No significant correlation could be found between SNPs in these genes and the mRNA levels of related genes ( Figure S11).

Discussion
A comprehensive analysis of the most relevant single nucleotide variants associated with T2D predisposition and/or with altered drug responsiveness is crucial to understanding their potential contribution to the phenotype. Previous studies have highlighted the association between polymorphic variants and the onset of drug responsiveness in T2D patients. In this regard, our work aimed to systematically infer computational predictions about the effects of such common variations on protein functionality or about their potential impact on genomic regulatory sites. The combination of different computational approaches used herein has confirmed that the organic cation transporter family plays a pivotal role in drug responsiveness. Oct1 and Oct2 proteins are involved in the uptake of organic cations (including drugs) from the bloodstream into hepatocytes [23]. SNPs in the Oct1 encoding gene (SLC22A1) can influence drug response because of their ability to alter hepatic drug clearance as well as the uptake and elimination of drugs. This mechanism has been demonstrated for the 1-methy-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) [24]. Public RNA-Seq data confirmed the liver-specific expression of this gene, and our in silico prediction revealed that two SLC22A1 cSNPs ( Figure 1A) can potentially be damaging to protein functionality. These predictions strengthen GWAS data about the relationship between SLC22A1 and metformin resistance in T2D. In addition, our 3D model of the polymorphic Oct1 protein has indicated that G465R amino acid change may influence metformin uptake since it is located in close proximity of the inner surface of Oct1 channel. Regarding the CYP2C9 gene, it has been established that it plays an important role in the pharmacokinetics of sulphonylureas [25]. Arg144Cys substitution in the CYP2C9 gene has been shown to markedly reduce the rates of substrate metabolism [26,27]. Specifically, carriers of polymorphic CYP2C9 variants may have both higher therapeutic response and a higher incidence of hypoglycemic adverse events [28]. We have predicted that the lack of arginine in position 144 affects protein stability as this residue is critical in creating H-bonds that contact serine in position 180 and glutamine in position 261 ( Figure 1C). It suggests a potential mechanism for the reduced activity of the mutated enzyme.
Whereas it is arguable that protein-coding SNPs can affect protein functionality-and then that they can associate to clinical manifestation of the disease-it is quite difficult to explain the contribution of intronic or intergenic SNPs. Indeed, most of analyses reported in literature for T2D have been performed on cSNPs. Here, we have considered nucleotide variations that potentially affect regulatory genomic elements since most of SNPs associated by GWAS to complex traits/diseases fall outside protein-coding regions. Therefore, a large fraction of these may determine the loss (or gain) of regulatory functions. Interestingly, we found that all analyzed SNPs fall within at least one TF binding site or genomic region overlapping a DHS and/or CpG island or a splicing regulatory region, according to public ChIP-Seq data and computational predictions. In particular, since few studies investigated the role of polymorphism rs5219 in the KCNJ11 gene in therapeutic response to sulphonylureas [9] or hypoglycemia risk [29], we inferred that this SNP falls in a PPARγ binding site and within a CpG island located in the gene body, possibly linking such variant to altered KCNJ11 expression. However, although we could not find any statistically significant association between this SNP and mRNA levels in our RNA-Seq datasets, we cannot definitely exclude such a mechanism. More interestingly, we found that SNPs, until now annotated as "intronic" or "intergenic", are misannotated. Indeed, through ab initio transcriptome reconstruction and experimental validations we have shown they map inside mature mRNAs.
Two intronic CAPN10 SNPs (rs3842570 and rs3792267) belong to actively transcribed (and spliced) CAPN10 transcripts. Our findings are in line with the gene predictions of AceView database and with ESTs. Notably, the former SNP (a 1 bp deletion) is located in an intron splicing enhancer region, and its presence is predicted to affect the binding of two different splicing factors. We cannot exclude that the presence of this SNP may account for splicing variations that give rise to the new mRNA isoform. Similarly, our data indicate that the "intergenic" rs5030952 falls in the intron of a new transcript, generated by trans-splicing between CAPN10 and GPR35 genes.
Despite being aware that computational predictions alone cannot prove any causal relation between SNPs and the clinical outcome in patients, we believe that combining different in silico approaches to experimental validations can provide insights to improving the knowledge underlying the SNPs' effect on complex phenotypes. In particular, growing evidence indicates that determining specific genotypes is crucial to establishing targeted therapeutic strategies. Thus, it would be desirable to have a deeper understanding of the mechanisms by which environmental and epigenetic factors interact with patient genetic milieu [30], and in contrast, how SNPs are likely to modify genomic regions that are involved in the epigenetic regulation of gene expression or directly modify the primary sequence of non-coding RNAs, and potentially their function. In this scenario, our study highlights that even SNPs that do not affect the protein coding ability of a gene can potentially have a functional impact on splicing and/or gene expression, possibly through epigenetic mechanisms, or they may modify the consensus sequence of transcription factors. Functional targeted studies are needed to experimentally demonstrate the impact of such variations mapping in non-coding regions of the genome.

Computational Predictions of Amino Acid Changes
Amino acidic sequences and functional protein domains of the annotated proteins were retrieved by the UniProt database [34]. The functional effects of nucleotide variants were evaluated only for cSNPs, using SIFT and PolyPhen scores, through the VEP tool (Variant Effect Predictor) in the Ensembl database [35].
The prediction of biochemical and structural effects of amino acidic substitutions was carried out using Project HOPE [36]. For some cSNPs, HOPE was unable to predict structural effects of the substitution, due to the lack of structural information in public databases. In these cases, the prediction of tridimensional structures of the polymorphic proteins was performed applying a homology modeling approach through the web server I-TASSER [37]. Obtained models were finally rendered using the PyMol system [38].
Evolutionary analysis of the residue changes induced by cSNPs was inferred using human and other vertebrates species. In particular, the conservation of polymorphic sites was investigated among vertebrates through the alignment of the longest sequence for the human annotated protein (UniProt) and orthologous protein sequences. These alignments were then submitted to the web server ConSurf [39]. Consurf-based analysis was launched using Swiss-Prot as target database (except for ABCC8 for which we selected UniRef90) and running at least three iterations with the PSI-BLAST algorithm. For the remaining parameters, we used the default settings.

In Silico Analysis of Epimarks and Regulatory Motifs
The genomic coordinates and the allelic frequencies of selected SNPs have been downloaded from dbSNP (v138) database [40]. The table browser option of UCSC Genome Browser [41] was used to retrieve custom tracks of interest (ENCODE Transcription Factor ChIP-seq Uniform Peaks, ENCODE DNaseI Hypersensitive Site Master List). To investigate if SNPs may alter splicing sites, or related regulatory regions such as exonic and/or intronic splicing enhancer/silencer (ESE/ESS and ISE/ISS) we used the web tool RegRNA 2.0 [42], as previously described in Scarpato et al. [43]. Briefly, FASTA files of the nucleotide sequences surrounding the polymorphic site (10 bp upstream and 10 bp downstream the SNP) were retrieved by the UCSC Genome Browser and submitted to RegRNA web server. The analysis of correlation between gene expression and SNPs has been carried out using our published [21,22] and unpublished RNA-Seq data. Specifically, such datasets were inspected using the Integrative Genomic Viewer (IGV) [44]. Expression profiles of the genes associated with reduced drug responsiveness has been retrieved from the RNA-Seq samples of 122 human individuals from 32 different tissues (E-MTAB-2836) available on the Gene Expression Atlas website (available at: http://www.ebi.ac.uk/) [45].

RNA-Sequencing: Library Preparation and Data Analysis
Briefly, RNA has been isolated and quantified from HEK293 cell lines as previously described in [46]. Paired-end libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA, USA) and sequenced on the Illumina HiSeq2000 platform according to the manufacturer's instructions. Reads mapping was performed using TopHat v.2.0.7 [47], and ab initio transcriptome reconstruction was carried out with Cufflinks [42], setting GENCODE v19 as reference transcriptome. Coverage files were generated using the genomecoverageBed function of BEDtools [48]. Visual inspection on the UCSC Genome Browser of coverage files revealed the presence of new transcripts overlapping the genomic coordinates of analyzed SNPs. PCR duplicates were removed using Picard tools v1.117 (Broad Institute, Cambridge, MA, USA) (available at: http://broadinstitute.github.io/picard/) and SNP calling was carried out using GATK v3.3 workflow (Broad Institute, Cambridge, MA, USA) optimized for RNA-Seq reads, as described in [49]. Normalization, filtering, data inspection and general data analysis statistics were performed using RNASeqGUI [50].

In Vitro Validations
The experimental validation for the new transcripts of CAPN10 and for the CAPN10/GPR35 transcript have been performed on the RNA isolated from HEK293 cell line. HEK293 cells were cultured in Dulbecco's modified Eagle's medium (DMEM, Life Technologies, Carlsbad, CA, USA) supplemented with 10% fetal bovine serum (FBS), glutamine (2 mmol/L), penicillin (100 units/mL) and streptomycin (100 units/mL) at 37˝C in a humidified atmosphere of 95% air and 5% CO 2 . Cells were treated with cycloheximide (CHX) at increasing doses (10, 20, 40, 120 µM) or with the vehicle control (DMSO) for 6 h as described in [51]. Total RNA was isolated from treated cells using TRIzol solution (Invitrogen, Waltham, MA, USA) according to the manufacturer's instructions. RNA obtained by each samples was reverse transcribed using "high-capacity cDNA reverse-transcription kit" (Applied Biosystems, Foster City, CA, USA) and the cDNA were used as template for RT-PCR assays. PCR assays have been performed with MyTaq DNA Polimerase (Bioline Reagents Ltd., London, UK) using these condition: 95˝C for 3 min, followed by 40 cycles at 95˝C for 30 s, 58˝C for 30 s, 72˝C for 70 s, and 72˝C for 7 min. Sanger sequencing to confirm the presence of rs3842570 and rs3792267 in CAPN10 has been carried on PCR amplicons carried out with 2.5U AmpliTaq-Gold (Life Technologies, Carlsbad, CA, USA) according to the manufacturer's protocol. PCR amplicons were verified through electrophoresis on agarose gel (1.5%).