Epigenetic Regulation of Alternative mRNA Splicing in Dilated Cardiomyopathy

In recent years, the genetic architecture of dilated cardiomyopathy (DCM) has been more thoroughly elucidated. However, there is still insufficient knowledge on the modifiers and regulatory principles that lead to the failure of myocardial function. The current study investigates the association of epigenome-wide DNA methylation and alternative splicing, both of which are important regulatory principles in DCM. We analyzed screening and replication cohorts of cases and controls and identified distinct transcriptomic patterns in the myocardium that differ significantly, and we identified a strong association of intronic DNA methylation and flanking exons usage (p < 2 × 10−16). By combining differential exon usage (DEU) and differential methylation regions (DMR), we found a significant change of regulation in important sarcomeric and other DCM-associated pathways. Interestingly, inverse regulation of Titin antisense non-coding RNA transcript splicing and DNA methylation of a locus reciprocal to TTN substantiate these findings and indicate an additional role for non-protein-coding transcripts. In summary, this study highlights for the first time the close interrelationship between genetic imprinting by DNA methylation and the transport of this epigenetic information towards the dynamic mRNA splicing landscape. This expands our knowledge of the genome–environment interaction in DCM besides simple gene expression regulation.


Introduction
Dilated cardiomyopathy (DCM) is the predominant heart muscle disease, characterized by a dilatated left ventricle (LV) and reduced contractility. It is associated with high hospitalization rates, sudden cardiac death risk, and substantial demand for heart failure therapies. The prevalence of all forms of DCM is estimated to be as high as 1:250 [1]. In the patients with DCM, approximately 30%-50% are assumed to have familial predisposition of the disease [2], and 40% of the familial DCM patients possess currently identifiable genetic variations [3], with most of them having an autosomal-dominant

Patient Enrollment and Phenotyping
In the present study, we used data generated in the Care4DCM multi-omics project [8,14]. The diagnosis of DCM was established by impaired heart systolic function, after ruling out coronary artery disease through coronary angiography. Notably, aside from DCM patients with advanced heart failure, patients in early stages of the disease, i.e., left ventricular ejection function > 45% and < 55%, were also included in the present study. Patients with concomitant valvular heart disease, myocarditis, and inflammatory DCM were excluded from the study after being screened with echocardiography and cardiac magnetic resonance imaging (CMR), and after the myocardial biopsy of the patients were examined histopathologically. Other exclusion criteria included previous cardiotoxic chemotherapy, alcoholism, illicit substance abuse, and untreated arterial hypertension. For control subjects, after written informed consent, we collected 31 specimens of left ventricle biopsy from asymptomatic patients after heart transplantation at least 6 months ago. All patient recruitment was authorized by the ethics committee. An independent replication cohort comprised 18 explanted DCM hearts and 8 healthy hearts from traffic accident victims. The following library preparation and analytic procedures were identical in both the screening and replication cohorts. Further information regarding patient enrollment and phenotyping can be found in our previously published study [8].

Library Preparation and Next Generation Sequencing
In the Care4DCM cohort, the biopsies were retrieved from the LV apex during heart catheterization [8]. The specimen of each patient was immediately washed and preserved in liquid nitrogen according to standardized protocols. After the histopathological work-up, DNA and RNA were extracted from the leftover cardiac tissues using Allprep Kits (Qiagen, Düsseldorf, Germany). In the replication cohort, the DCM hearts were immediately transported and stored after explantation, and the hearts of healthy traffic accident victims were flash-frozen less than three hours after death. Before starting sequencing, a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) and a eukaryote total RNA pico assay (Agilent Technologies, Santa Clara, CA, USA) were utilized to examine the purity and concentration of RNA. The sequencing of DNA methylation profiling was carried out using the Infinium HumanMethylation 450K BeadChip kit (Illumina, San Diego, CA, USA). The RNA sequencing was performed with the TrueSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA) to paired-end reads, unstrandedly in the screening cohort and strandedly in the replication cohort.

Quality Control of Sequencing Data
All obtained RNA sequences and DNA methylation profiling went through diverse quality checks. The number of reads with assigned feature was calculated with samtools in the Unix environment [15]. Samples with less than 1,000,000 assigned reads were not included in the further downstream analysis. Finally, 55 samples from the screening cohort passed the RNA quality control, of them, 34 were DCM patients and 21 were controls; 16 samples from the replication cohort passed the RNA quality filter, of them, 11 were DCM patients and 5 were healthy controls. The overall sequencing depth of the samples in the final cohorts is visualized in the histograms ( Figure S5A,B). In addition, the probed methylation sites in that Infinium HumanMethylation 450 K BeadChip kit, known to be possibly influenced by genetic variants, were removed from the following downstream analysis, because the influence of genetic variation on methylation profiling was not in the lens of the present study. In addition, probed methylation sites on X and Y chromosomes and those known to cross-hybridize with non-targeted DNA were dropped. Subsequently, 394,247 qualified probed loci with methylation measurements were included in further analysis.

Data Normalization and Batch Effect Correction
Principal component analysis (PCA) was done using package factoextra in R programming [16] When analyzing the data characteristics of RNA sequences from both cohorts together, a batch effect could be delineated in the principal component analysis derived from the normalized count matrix. In the PCA plot, samples from both cohorts clustered independently from each other, because the first principal component, responsible for up to 40% of the data variances ( Figure S2A), significantly represented the distance between the two cohorts, as could be visualized in the PCA plot ( Figure S2B). However, since the relative distribution of the DCM samples and controls samples were consistent in both screening and replication cohorts, and no statistical test of samples across different cohorts were planned to be made, we did not use a cross-cohort batch correction. When performing the statistical tests for differential exon use (DEU) between DCM patients and controls, a normalization of the read count for each gene and each exonic part defined by hg19/GrCh37 was performed. Likewise, all methylation measurements were normalized. The batch effects from different sequencing dates and flow cells were removed.

Bioinformatic Computation and Analysis
The RNA sequences were mapped to hg19/GrCh37 using HISAT2 in the Unix environment [17], and the annotated bam files were generated. The genome-wide statistical tests for differential gene expression (DGE) between DCM patients and controls were performed using the DESeq2 package in R programming [18]. The genome-wide statistical tests for differential exon use (DEU) between DCM patients and controls were performed using the DEXSeq package in R programming [19]. Features (exonic bins) with a total read count of all samples less than six were filtered out, in order to reduce false positives. The PSI score for each annotated exonic region defined by reference genome GRCh37/hg19 was computed [20]. Moreover, the methylation measurement of each probed site was also mapped to the reference genome hg19/GrCh37 through the GenomicRanges package in R programming [21]. Statistical tests of differential methylated regions of the 394,247 qualified probed methylation sites across the whole genome were performed with the limma package in R programming [22]. The M-values of DNA methylation were used. With limma, linear models of methylation values were defined with the following parameters: condition (DCM or control), sex, age, use of tacrolimus, use of mycophenolate, use of steroid, use of everolimus, use of ciclosporin, principal component 1, and principal component 2. Parameters sex and age were defined as categorical variables; parameters tacrolimus, mycophenolat, steroid, everolimus, and ciclosporin were binary variables, meaning intake of the specific immunosuppresive drug. Parameters principal component 1 and principal component 2 were continuous variables, which were the top two principal components of the methylation data, representing the potential substratefications of the DNA methylation data. A moderated t-statistic was applied for each probe, p values and adjusted p values in the setting of multiple testing were calculated using the Benjamini-Hochberg method. The results were mapped to the reference genome GRCh37/hg19 using the GenomicRanges package in R programming.

Genome-Wide Analysis of Neighboring Intron and Exon
The inclusion of a specific exon and the methylation status of its flanking intron were analyzed across the whole genome in order to investigate the potential local effects of intronic DNA methylation on the inclusion of alternative exons. We determined the bordering exon and intron pairs using the GenomicRanges package in R programming, and implemented a logistic regression analysis in quasi-Poisson distribution, with the independent variable as the intronic DNA methylation levels and the dependent variable as the exon usage. The logistic regression model was carried out in both DCM patients and controls, respectively, and both in intron-exon and exon-intron pairs, respectively. The regression model was adjusted by the distance between intron and exon in the pair. Additionally, we identified the regions with concomitant DEU and DMR (p value <0.05, respectively). The distinguished regions underwent gene ontology analysis using GeneTrail2 (version 1.6) website [23,24].

Methylome-Transcriptome Correlation
To further investigate the methylome-transcriptome correlation in DCM, we accomplished an epigenome-wide association analysis, modified from a genome-wide association study (GWAS), between the DNA methylation measurement (beta value) and PSI score. The correlational analyses were done both in DCM patients and in control subjects. The advantage of this approach was that not only local but also remote regulation of intronic DNA methylation on the exon inclusion within the same gene could be thoroughly explored. The single locus analysis was used. Logistic regression was applied as it permits adjustment with additional parameters and provides odds ratios as effect sizes [25]. An odds ratio and a p value were calculated for each generated regression model. The first two principal components were integrated in the regression model as covariates in order to account for possible population stratification and to minimize the inflation. The robustness of this genome-wide association approach was evaluated and subsequently optimized following the criteria suggested commonly [26]. In addition to the above-mentioned procedure, we carried out genome-wide statistical tests of significance for the difference between correlation coefficients in DCM patients and in controls. The aim of this approach was to identify genomic regions with significantly divergent methylome-transcriptome relationship between DCM patients and controls. The statistical tests were performed using Fischer's z test with the help of the cocor package in R programming [27].
The significance threshold was determined to be FDR < 0.05 by Benjamini-Hochberg in the screening cohort and a raw p value < 0.05 in the replication cohort.

DCM-Related Reconnection of mRNA Expression
We first normalized the read counts of RNA-Seq of each gene in each sample. In the sample-sample distance plot of the screening cohort, clustering of the majority of DCM samples could be seen ( Figure 1B), while only a few DCM samples were close to the cluster of control samples. This was an expected picture confirming the known heterogenous transcriptomic profile of DCM patients having disease states ranging from mild to severe. In the corresponding principal component analysis, a similar distribution pattern was detected ( Figure 1C). In the sample-sample distance plot of the replication cohort, only one sample in each group were outliers ( Figure S1A), and the same effect can be seen in the PCA plot ( Figure S1B). Overall, the existence of outliers in the data might render further analysis more challenging; however, since our attempt was to pinpoint only the most robust associations, we set to work with all samples. In the analysis for differential gene expression, 1330 genes were found to be significantly differentially expressed between DCM and control samples with a significance threshold of FDR < 0.05 (Supplemental File 1). Of these, 259 genes were upregulated and 1071 genes were downregulated (Supplemental File 2). These findings indicate that, on the transcriptomic level, orchestrated changes of gene expression govern the disease state ( Figure 1A). The FPKM (fragments per kilobase per million) scatter presents the relative expression between the DCM and control ( Figure 2B). The MA plot visualized the relationship between the mean of normalized count and the log fold change in the In the analysis for differential gene expression, 1330 genes were found to be significantly differentially expressed between DCM and control samples with a significance threshold of FDR < 0.05 (Supplemental File 1). Of these, 259 genes were upregulated and 1071 genes were downregulated (Supplemental File 2). These findings indicate that, on the transcriptomic level, orchestrated changes of gene expression govern the disease state ( Figure 1A). The FPKM (fragments per kilobase per million) scatter presents the relative expression between the DCM and control ( Figure 2B). The MA plot visualized the relationship between the mean of normalized count and the log fold change in the analyzed samples ( Figure 2A). As examples of the upregulated genes in DCM, we demonstrated the gene browser tracks of genes NPPA and NPPB ( Figure 2C). NPPA and NPPB genes encode natriuretic peptides, ANP and BNP, respectively, which are commonly used as biomarkers in diagnostics and monitoring of DCM since they are strongly associated with the disease. J. Clin. Med. 2020, 9, x FOR PEER REVIEW 7 of 18 peptides, ANP and BNP, respectively, which are commonly used as biomarkers in diagnostics and monitoring of DCM since they are strongly associated with the disease. In the gene ontology analysis for cellular components (Supplemental File 3), the upregulated genes were found to be enriched for several neural system components, extracellular matrix components, ion channel complexes, as well as contractile fibers and sarcolemma (FDR < 0.05). The down-regulated genes were enriched for several immunological complexes, ribosomal subunits, and numerous cellular membranous components, including reticulum membrane (FDR < 0.05). These are typical findings of DCM pathogenesis. In summary, the data on whole-transcriptomes from patients and controls accentuates the distinct expression landscape of mRNA transcripts. This raised the question of whether the individual transcripts are also differentially composed, e.g., by alternative splicing.  In the gene ontology analysis for cellular components (Supplemental File 3), the upregulated genes were found to be enriched for several neural system components, extracellular matrix components, ion channel complexes, as well as contractile fibers and sarcolemma (FDR < 0.05). The down-regulated genes were enriched for several immunological complexes, ribosomal subunits, and numerous cellular membranous components, including reticulum membrane (FDR < 0.05). These are typical findings of DCM pathogenesis. In summary, the data on whole-transcriptomes from patients and controls accentuates the distinct expression landscape of mRNA transcripts. This raised the question of whether the individual transcripts are also differentially composed, e.g., by alternative splicing.

Epigenome-Wide Linkage of DNA Methylation and Inclusion of Exon
In the data exploration of 394,247 qualified methylation probes in the screening cohort, two principal components could well separate the samples of DCM patients from samples of control subjects, and the two clusters had an overlapping area in the middle ( Figure S3A). In the replication cohort, the clustering of DCM patients and control subjects was more delineated in the PCA plot ( Figure S3B), as only one outlier of the DCM sample stood out in the top right quadrant.
Of all probed sites of the Infinium HumanMethylation 450 K BeadChip, 88,699 probes were found to locate in intronic regions, comprising approximately 20% of all probes. Further, we identified around 33% of all exons to have accessible methylation measurements in their neighboring introns (either upstream, downstream, or both-sides). Subsequently, these identified exonic regions and their flanking introns with available methylation measurements were analyzed as pairs together in order to inspect the possible regional effects of intronic methylation on the inclusion of alternative exons. The analyzable pairs included 41,158 intron-exon-pairs and 41,253 exon-intron pairs across the whole genome ( Figure 3). The association between intronic DNA methylation and the calculated exon usage was modeled with logistic regression, which consequently showed a robust positive correlation between intronic DNA methylation and exon usage (up-and downstream, p value < 2 × 10 −16 and p value < 2 × 10 −16 , respectively), even after adjustment for intron-exon distance (Table 1).

Epigenome-Wide Linkage of DNA Methylation and Inclusion of Exon
In the data exploration of 394,247 qualified methylation probes in the screening cohort, two principal components could well separate the samples of DCM patients from samples of control subjects, and the two clusters had an overlapping area in the middle ( Figure S3A). In the replication cohort, the clustering of DCM patients and control subjects was more delineated in the PCA plot ( Figure S3B), as only one outlier of the DCM sample stood out in the top right quadrant.
Of all probed sites of the Infinium HumanMethylation 450 K BeadChip, 88,699 probes were found to locate in intronic regions, comprising approximately 20% of all probes. Further, we identified around 33% of all exons to have accessible methylation measurements in their neighboring introns (either upstream, downstream, or both-sides). Subsequently, these identified exonic regions and their flanking introns with available methylation measurements were analyzed as pairs together in order to inspect the possible regional effects of intronic methylation on the inclusion of alternative exons. The analyzable pairs included 41,158 intron-exon-pairs and 41,253 exon-intron pairs across the whole genome (Figure 3). The association between intronic DNA methylation and the calculated exon usage was modeled with logistic regression, which consequently showed a robust positive correlation between intronic DNA methylation and exon usage (up-and downstream, p value < 2 × 10 −16 and p value < 2 × 10 −16 , respectively), even after adjustment for intron-exon distance (Table 1).

Co-Occurrence of Differential Exon Usage and Differential DNA Methylation between DCM and Control
We attempted to define regions with differential DNA methylation levels and associated differential exon usage between DCM patients and controls. This approach was set to provide important mechanistic insights into repatterning of epigenetic regulation during cardiac disease. We carried out statistical tests for differential exon use (DEU) of all exons in all gene transcripts, as well as differential methylated regions (DMR) of all probes across the whole epigenome between DCM patients and healthy controls. As mentioned in the Methods section, the DEXSeq package was used to perform statistical tests for differential exon use. During the process, we estimated the variability of RNA sequences data in each exonic part of each gene in each sample ( Figure S4A) to effectively distinguish between actual effects across different conditions (DCM vs. control) and noises caused by biological or technical variations. Further, dispersion per exon was evaluated (black dots) and a mean relative to it was determined (rot dots) based on the estimated dispersion. Finally, the dispersion could be shrunk (blue dots) and utilized as an effective reference to examine differential exon usage. Next, statistical testing was carried out for all annotated exonic bins to determine if the fraction of the reads aligned to specific exons was different between DCM and control samples. We were able to identify 22,871 out of in total 644,354 (4%) exonic regions to be differentially used with a p value less than 5%. The MA plot ( Figure S4B) visualized the differential exon usage based on the number of the reads mapped to each exonic region. These exonic regions were located in 8631 of the total 60,153 coding and non-coding genes (14%) annotated in hg19/GrCh37.
As mentioned in the Methods, we used the limma package to implement epigenome-wide statistical tests of differential methylated regions (DMR) between DCM patients and controls. As a result, we detected 13,223 of the total 394,247 (3%) probes to be significantly differentially methylated (p value <0.05). When overlaying the hits of DEU and DMR on the reference genome, we found 706 intron-exon pairs from 630 genes as well as 650 exon-intron pairs from 564 genes with concomitant DEU and DMR ( Figure 4A). As an example of these identified candidate regions, we generated gene browser tracks for LDB3 ( Figure 5). In the gene ontology analysis, these detected genes were enriched for critical cellular components in the sarcomere, such as myofibril, contractile fibers, and actomyosin (FDR < 0.05), which are highly relevant in DCM. The relevant results of gene ontology analysis are depicted in Figure 4B, and detailed information can be found in Supplemental File 4.

Epigenome-Wide Association of Intronic DNA Methylation and Splicing in DCM vs. Control
We first performed a genome-wide analysis of variance (ANOVA) and discovered that, on average, 2.71% (95%CI: 2.70-2.72, n = 2,684,933 tests) of the variance of PSI scores could be explained by the intronic DNA methylation on the same gene, which is sizable at the genomic scale. Thereafter, we carried out an epigenome-wide association study between methylation measurements and PSI scores of exons within the very same gene. Correlation coefficient, odds ratio, and p value of each association test were determined. Moreover, we compared the correlation coefficients between DCM and control samples, that is, we carried out statistical tests of significance for the difference between correlation coefficients in DCM patients and those in controls. As shown in the Manhattan plot for genome-wide statistical tests ( Figure 6A), several loci with significantly different (FDR <0.05) correlation coefficients between DCM patients and controls were detected in the screening cohort, signifying a disease-dependent differential impact of DNA methylation on alternative splicing.
In the replication cohort, five exonic regions in TTN-AS1 as well as one exonic region in DCTN1 were validated with statistical significance (p < 0.05). However, only the five verified exonic regions in TTN-AS1 could also be "directionally replicated" in the replication cohort ( Figure 6B), with a positive correlation between PSI scores and methylation values in DCM patients, as well as a negative correlation between PSI scores and methylation values in healthy controls, as shown in Figure 7. Tables 2 and 3 list the odds ratios per 0.01 increments of DNA methylation (beta value) and p values derived from logistic regression models of both cohorts. The values in the parentheses indicate the 95% confidence interval. In all identified candidates, the odds ratios in DCM were less than one, indicating a negative association, that is, if there is an increase of DNA methylation in DCM patients, a decrease of PSI value is expected, and vice versa. On the other hand, in all identified candidates, the odds ratios in control samples were greater than one, suggesting a positive correlation. In terms of significance level, the positive association between DNA methylation and PSI value in control samples reached statistical significance both in screening and replication cohorts. However, the positive correlation between DNA methylation and PSI values in the DCM samples reached statistical significance only in the replication cohort, which could be due to less noise in the replication cohort owing to the stranded RNA-Seq, providing more of an advantage for antisense analysis.

Epigenome-Wide Association of Intronic DNA Methylation and Splicing in DCM vs. Control
We first performed a genome-wide analysis of variance (ANOVA) and discovered that, on average, 2.71% (95%CI: 2.70-2.72, n = 2,684,933 tests) of the variance of PSI scores could be explained by the intronic DNA methylation on the same gene, which is sizable at the genomic scale. Thereafter, we carried out an epigenome-wide association study between methylation measurements and PSI scores of exons within the very same gene. Correlation coefficient, odds ratio, and p value of each association test were determined. Moreover, we compared the correlation coefficients between DCM and control samples, that is, we carried out statistical tests of significance for the difference between correlation coefficients in DCM patients and those in controls. As shown in the Manhattan plot for genome-wide statistical tests ( Figure 6A), several loci with significantly different (FDR <0.05) correlation coefficients between DCM patients and controls were detected in the screening cohort, signifying a disease-dependent differential impact of DNA methylation on alternative splicing.
In the replication cohort, five exonic regions in TTN-AS1 as well as one exonic region in DCTN1 were validated with statistical significance (p < 0.05). However, only the five verified exonic regions in TTN-AS1 could also be "directionally replicated" in the replication cohort ( Figure 6B), with a positive correlation between PSI scores and methylation values in DCM patients, as well as a negative correlation between PSI scores and methylation values in healthy controls, as shown in Figure 7. Tables 2 and 3 list the odds ratios per 0.01 increments of DNA methylation (beta value) and p values derived from logistic regression models of both cohorts. The values in the parentheses indicate the 95% confidence interval. In all identified candidates, the odds ratios in DCM were less than one, indicating a negative association, that is, if there is an increase of DNA methylation in DCM patients, a decrease of PSI value is expected, and vice versa. On the other hand, in all identified candidates, the odds ratios in control samples were greater than one, suggesting a positive correlation. In terms of significance level, the positive association between DNA methylation and PSI value in control samples reached statistical significance both in screening and replication cohorts. However, the positive correlation between DNA methylation and PSI values in the DCM samples reached statistical significance only in the replication cohort, which could be due to less noise in the replication cohort owing to the stranded RNA-Seq, providing more of an advantage for antisense analysis. The PSI scores of the validated exonic regions (green) in TTN-AS1 were significantly associated with the methylation level of the highlighted locus (red). The first track is a reference transcript of TTN, the following three tracks are transcripts of TTN-AS1. The last four tracks were added to visualize the log-scaled RNA-Seq coverage in DCM and control, in both screening and replication cohorts. It should be noted that the RNA-Seq of the replication cohort was stranded, while the RNA-Seq of the screening cohort was unstranded. Hence, coverage in the screening cohort is noisier than in the replication cohort. Nevertheless, the candidates in TTN-AS1 could be replicated in the replication cohort with statistical significance. The PSI scores of the validated exonic regions (green) in TTN-AS1 were significantly associated with the methylation level of the highlighted locus (red). The first track is a reference transcript of TTN, the following three tracks are transcripts of TTN-AS1. The last four tracks were added to visualize the log-scaled RNA-Seq coverage in DCM and control, in both screening and replication cohorts. It should be noted that the RNA-Seq of the replication cohort was stranded, while the RNA-Seq of the screening cohort was unstranded. Hence, coverage in the screening cohort is noisier than in the replication cohort. Nevertheless, the candidates in TTN-AS1 could be replicated in the replication cohort with statistical significance.

Discussion
The present study utilized an epigenome-wide association approach to examine the interaction between DNA methylome and splicing of the transcriptome in the heart, as both biological processes were only recently shown to play an essential regulatory role in DCM. A significant positive correlation between intronic DNA methylation and usage of adjacent exons was detected. Moreover, we pinpointed and stringently validated several regions in TTN-AS1 with a disease-dependent differential regulation of DNA methylation on alternative splicing. This is the first study to investigate in the full epigenome the complex yet highly ordered orchestration of methylome and transcriptome in the healthy human heart as well as in DCM.
In the past few years, GWAS have helped to identify several novel genomic regions associated with cardiac phenotypes. As a result, there has been a rapid progress in functional genetics to assist in the exploration of biological meaning of disease-associated genomic regions. Although genome studies massively advanced our knowledge of DCM, plenty of biological mechanisms of disease still need to be deciphered, and investigations on epigenetic-genetic and epigenetic-transcriptomic levels have been proposed to provide yet another crucial piece in disease etiology [28]. As an example, Wang et al. implemented the GWAS approach on an epigenomic dataset to identify signatures related to clinical parameters, such as from electrocardiograms (ECG). Eventually, they were able to experimentally validate the findings in iPSC cardiomyocytes [29]. The present study relied on human cardiac tissue as disease-relevant processes are often tissue-and species-specific [30]. While other studies attempted to investigate DNA methylation and alternative splicing in cell cultures, our approach is the first to inspect the relationship between DNA methylation and alternative splicing in heart muscle disorders using human cardiac tissues [31].
We discovered a positive correlation between DNA methylation of the flanking introns and the inclusion of the bordering exon across the whole genome. This relationship exists both in DCM and control subjects. There are three potential mechanisms underlying these findings. First, as early as in 1988, it was identified that splicing occurs during transcription [32]. Hence, it is possible that the detected association is mediated by some specific DNA-binding proteins, such as CTCF and MeCP2. These proteins can change their binding affinity to DNA by apprehending methylation signatures of DNA. When they are bound to DNA, they can impact the elongation rate of RNA polymerase II, influence the time for the splicing machinery to recognize weak splice-sites, and subsequently affect the inclusion of alternative exons [33][34][35]. Aside from the known proteins, other DNA-binding proteins with similar functions could still exist and be undiscovered so far. Second, it is also possible that DNA methylation-dependent recruitment of splicing factors takes place. For example, it has been reported that the adaptor protein HP1 can recruit splicing factors if bound to methylated DNA [36]. Interestingly, it has been suggested that histone modifications could facilitate the splicing factors to bind to pre-mRNA [37], while there is literature reporting the strong link between histone modifications and DNA methylation [38][39][40], which is in line with our theory. Third, based on the evidence demonstrating DNA methylation's correlation with nucleosome occupancy as well as the regulatory role of DNA methylation on the modification of histones [41][42][43][44][45][46], it is reasonable to speculate that DNA methylation influences splicing through regulating the orchestration of chromatin remodeling and nucleosome positioning, especially the nucleosome positioning relative to the splice sites of interest [47], while more in-depth understanding of the interaction between DNA methylation and nucleosome occupancy is still needed.
In the present study, we identified numerous regions on TTN-AS1 with a DCM-dependent differential regulation of DNA methylation on alternative splicing, while TTN-AS1 is practically an inverse counterpart to TTN. TTN-AS1 encodes Titin antisense 1, which is a long noncoding RNA (lncRNA) that produces an estimate of 80 different transcripts. In the literature, lncRNAs were reported to play a role in cardiac development and regeneration, in the pathogenesis of cardiovascular diseases, as well as in the doxorubicin-induced cardiac toxicity, which predisposes people to DCM [48][49][50]. Furthermore, disrupted splicing of lncRNAs were found to cause dysregulation of important cardiac proteins in mice, such as potassium voltage-gated channel proteins encoded by Kcnq1 and Kcna2 [51][52][53]. In addition, a cluster of antisense lncRNAs in the MYH7 locus was noticed to be essential in early development of cardiomyopathy under pressure-overload [54]. Interestingly, in recent studies, Titin antisense 1 was shown to act as a competing endogenous RNA (ceRNA) to sequester diverse microRNAs (miRNAs) and transcription factors, thus, consequently facilitating tumor aggressiveness in several cancers, including lung cancer [55], cervical cancer [56], esophageal cancer [57], gastric cancer [58], papillary thyroid cancer [59], colorectal cancer [60], and prostate cancer [61]. However, although TTN-AS1 is highly expressed in the heart, there is minimal understanding of its cardiac role in the literature, and more investigations are required. The validated regions were found to locate at the counterpart locus of the genomic region encoding the A-band of titin. This finding is intriguing, because genetic variations in titin A-band are the leading cause of DCM [62,63]. Hence, based on our finding, it is not unreasonable to speculate that dysregulated splicing of Titin antisense 1 might be able to induce deleterious exon-skipping in the A-band of titin, which may mimic the pathologies seen in TTNtv, and that modification of this region might even be a therapeutic principle [64,65]. Hence, the current study provides new understanding of the regulation of this important gene locus.
A potential limitation of the here conducted epigenome-wide approach is the sparse preexisting data on false-positive and false-negative discovery rates and adequate power calculations of epigenome-wide association analysis and in particular the here conducted multi-omics type of analysis [29]. Our approach, although explorative in the screening stage, used an independent validation step. Since cardiac tissue is highly limited for research studies, the sizes of both cohorts were still relatively small compared to traditional GWAS. However, the combination of information from a priori connected biological processes and coordinated molecular layers is able to reduce false-positive discoveries and add statistical power [8].
In conclusion, this study emphasizes the intricate interplay between the DNA methylation landscape and the mRNA splicing machinery. With a state-of-the-art epigenome-wide approach and utilization of human cardiac tissue as study material, a new understanding of the genome-epigenome relationship in DCM was presented. We showed that dysregulated methylation of the gene encoding Titin antisense 1 is associated with its splicing, which could induce pathological exon-skipping during the transcription of TTN.
Supplementary Materials: The supplementary materials are available online at http://www.mdpi.com/2077-0383/ 9/5/1499/s1. Figure S1: RNA-Seq in the replication cohort. Figure S2: PCA of RNA-Seq in both cohorts. Figure S3: PCA of DNA methylation measurement in both cohorts. Figure S4: Dispersion plot and MA plot of differential exon usage analysis in the screening cohort. Figure S5: Histograms showing the sequencing depth of both cohorts. File S1: Results of differential gene expression analysis. File Table S2: List of up-and down-regulated genes. File Table S3: Gene ontology analysis of up-and down-regulated genes. File S4: Gene ontology analysis of regions with cocurrent DEU and DMR.