1. Introduction
Duchenne muscular dystrophy (DMD) is a debilitating and lethal neuromuscular disorder. Diagnosis is made between the age of two and five, but transcriptomes from pre-symptomatic DMD patients reveal typical dystrophic abnormalities [
1]. In fact, studies of human fetuses [
2,
3,
4] and in various animal models [
5,
6,
7] revealed that the pathology starts during prenatal development and continues into adulthood. The first DMD defects are detectable in developing cells even before their differentiation into muscle [
8]. Given that muscle regeneration replicates processes occurring in muscle development, and that some developmental mechanisms are reactivated in tumors, it is intriguing that changes in
DMD gene expression are increasingly being described in various malignancies [
9]. Moreover,
DMD downregulation affects cell proliferation [
10,
11,
12], adhesion, migration, and invasion [
12]—traits that are commonly associated with tumor development. Gaining some understanding of whether these alterations, in a range of diverse cells, have a common origin and lead to a related outcome would expand our knowledge of the role of the
DMD gene and potentially open up new treatment avenues.
Such analysis must take into consideration the complexity of DMD, the largest human gene known, with 79 exons and eight independent tissue-specific promoters driving the expression of distinct dystrophin isoforms. Three promoters control the expression of 14-kb full-length transcripts encoding 427 kDa isoforms (Dp427). The Dp427m is expressed in myofibers and muscle stem (satellite) cells. In myofibers, it interacts with the dystrophin-associated protein complex (DAPC) [
13] with structural and scaffolding roles and an involvement in the regulation of various signaling pathways [
14]. Yet, in satellite cells, Dp427m has a different interactome and it is essential for asymmetric cell divisions [
11,
15]; in myoblasts the loss of its expression results in abnormalities of cell proliferation and migration [
10,
16,
17]. Dp427c and Dp427p full-length dystrophins are expressed in various neurons [
18,
19], where their loss during development has been linked to the neuropsychological impairment in DMD.
Moreover, intragenic promoters give rise to transcripts encoding truncated isoforms: Dp260 in the retina, Dp140 in the CNS, Dp116 in Schwann glia [
20], and Dp71, which is the most ubiquitous
DMD product. Alternative splicing adds further structural and functional diversity [
21,
22].
DMD transcripts are summarized in
Figure S1 of Supplementary File S1.
Importantly, loss of Dp427 expression, which is both necessary and sufficient for the occurrence of Duchenne MD, was also associated with increased metastasis of tumors with myogenic programs [
17] and soft tissue sarcomas [
23]. In contrast to this potential tumor suppressor role of the full-length dystrophin, Dp71 expression was essential for myogenic tumor cell growth [
17,
23]. Interestingly, a putative role for the
DMD gene was also suggested in various non-myogenic cancers, including carcinomas [
24,
25,
26], melanoma [
27], leukemia [
28], lymphoma [
24], and CNS tumors [
29,
30,
31].
Yet, many of the tumors showing phenotypic changes associated with decreased
DMD gene expression originate from healthy tissues that are not generally believed to express the full-length dystrophin protein. Seemingly paradoxical, this observation is not limited to tumors: myoblasts, lymphocytes, and endotheliocytes express 14-kb dystrophin transcripts but are not known to synthesize detectable levels of dystrophin protein. Nevertheless,
DMD downregulation leads to significant functional abnormalities in these cells [
16,
32,
33].
We hypothesized that malignancy can be used as a model to investigate changes in DMD gene expression across normal tissues and corresponding tumors, and aid our understanding of the overall role of this gene, which clearly extends beyond Duchenne MD. Therefore, we studied DMD mRNA and protein expression across various normal tissues and matching tumors, and explored transcriptomic alterations in primary tumors and corresponding tumor cell lines with altered DMD expression to identify putative downstream molecular pathways that could be associated with DMD dysregulation across human tissues. We also examined the association of DMD gene expression with the onset and survival endpoints in tumor patients.
3. Results
3.1. Significant Expression of Dystrophin Transcript and Protein in a Range of Healthy Tissues
We examined GTEx expression data for skeletal muscle and 17 other healthy tissues: adrenal glands, bladder, breast, cervix, colon, esophagus, kidney, liver, lung, ovary, pancreas, prostate, skin, stomach, thyroid, uterus, and whole blood.
DMD expression levels across this spectrum of healthy tissues were compared with the expression levels of two housekeeping genes (HKGs),
PKG1 and
HMBS, which were identified as HKGs across 32 tissues in the GTEx database [
36].
DMD expression averaged 79.3% of
PKG1 (range between 30.8 and 94.6%) and 114.7% of
HMBS (43.7 to 137.4%) expression levels with the lowest expression found in whole blood (
Supplementary Table S3). In skeletal muscle,
DMD expression was 94.6% of
PKG1 and 138.2% of
HMBS.
DMD expression relative to PKG1 and HMBS was compared between skeletal muscle tissue and the previously mentioned 17 healthy tissues using a Kruskal-Wallis test followed by Dunn’s multiple comparisons test. The expression of DMD relative to PKG1 in skeletal muscle was comparable to that in the pancreas, uterus, and bladder (adjusted p-Value > 0.9999), and DMD expression relative to HMBS was comparable to that in the uterus, ovary, and bladder (adjusted p-Value > 0.9999).
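A Kruskal-Wallis comparison of this kind ranks all values across the tissue groups and asks whether the mean ranks differ. The following is a minimal standard-library sketch of the H statistic with tie correction. The function name and the toy values are hypothetical and are not GTEx data. A p-value would follow from a chi-square distribution with (number of groups − 1) degrees of freedom, and Dunn's post hoc test is not shown.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of numeric samples."""
    # Pool all values, remembering which group each came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j shared by tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical relative-expression values for three tissues (illustration only).
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_example = kruskal_wallis_h(toy_groups)
```

With fully separated groups such as these, H is large relative to its degrees of freedom; overlapping groups give values near zero.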
Given this widespread presence of considerable
DMD transcript levels, we investigated whether it is accompanied by protein expression. We interrogated mass spectrometry (MS) protein expression datasets available at Proteomics DB. MS identified the full-length dystrophin protein in a variety of normal tissues (
Supplementary Figure S2). Moreover, quantitative profiling of the proteome of 201 samples from 32 tissues in the GTEx database identified dystrophin as a housekeeping protein, as it was present in all of the 32 tissues analyzed [
36]. Dystrophin protein expression was positively and significantly correlated with
DMD gene expression in those samples (Spearman correlation = 0.67, BH-adjusted
p-Value < 0.1).
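For readers less familiar with rank correlation, the sketch below shows how a Spearman coefficient between transcript and protein levels can be computed: both variables are converted to ranks (ties receive the average rank) and a Pearson correlation is taken on the ranks. The function names and the paired values are hypothetical, and multiple-testing adjustment is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements for five samples (illustration only).
mrna_levels = [1.0, 2.5, 3.0, 4.2, 5.1]
protein_levels = [0.8, 1.9, 1.7, 3.5, 4.0]
rho = spearman(mrna_levels, protein_levels)
```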
Finally, in the Protein Abundance Database (PAXdb), dystrophin expression was ranked in the top 25% of MS-quantified proteins in a range of tissues, such as fallopian tubes, esophagus, uterus, bladder, colon, prostate, and rectum, in addition to the heart. Thus, data from three databases demonstrated significant expression of dystrophin protein in a range of healthy tissues. Therefore, we investigated whether alterations in DMD expression might occur in tumors that originate from tissues not commonly associated with dystrophin protein expression or function.
3.2. Downregulation of DMD Gene Expression across Malignancies
We investigated RNA-seq expression data for 25 different types of primary tumors from the TCGA database (carcinomas, melanoma, lymphoma, and leukemia) and their corresponding NATs from the TCGA or healthy GTEx tissues. The analyzed primary tumors were acute myeloid leukemia (LAML), adrenocortical carcinoma (ACC), bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical and endocervical cancer (CESC), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), diffuse large B-cell lymphoma (DLBC), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), kidney chromophobe cell carcinoma (KICH), kidney clear cell carcinoma (KIRC), kidney papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), rectal adenocarcinoma (READ), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), uterine carcinosarcoma (UCS), and uterine corpus endometrioid carcinoma (UCEC).
Global
DMD gene expression was reduced in 20 out of 25 primary tumors in comparison to their corresponding control tissues (
Figure 1;
Supplementary Table S1). The largest difference in expression was found in primary breast invasive carcinoma (LogFC = −3.7), and the smallest in primary kidney papillary cell carcinoma (LogFC = −1).
DMD gene expression was increased in primary tumors in two out of 25 comparisons, namely in primary thyroid carcinoma and diffuse large B-cell lymphoma, with LogFC values of 0.9 and 4.8, respectively. No significant expression changes were found in three comparisons (acute myeloid leukemia, kidney clear cell, and chromophobe cell carcinomas).
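As a reading aid for the LogFC values, the sketch below computes a log fold change as the difference between mean log-transformed expression in tumor and control samples. It assumes base-2 logarithms and a pseudocount of 1, neither of which is stated in this section, and it is not the differential expression pipeline used in the study. Function names and input values are hypothetical.

```python
import math


def log2_fold_change(tumor, control, pseudocount=1.0):
    """Mean log2(x + pseudocount) in tumor minus the same quantity in control."""
    def mean_log(values):
        return sum(math.log2(v + pseudocount) for v in values) / len(values)
    return mean_log(tumor) - mean_log(control)


def direction(logfc, significant):
    """Label a comparison as decreased, increased, or unchanged."""
    if not significant:
        return "unchanged"
    return "decreased" if logfc < 0 else "increased"


# Hypothetical expression values (illustration only).
tumor_values = [1.0, 3.0]      # log2(x + 1) -> 1, 2
control_values = [15.0, 31.0]  # log2(x + 1) -> 4, 5
example_logfc = log2_fold_change(tumor_values, control_values)
example_label = direction(example_logfc, significant=True)
```

Under the base-2 assumption, a LogFC of −3.7 corresponds to roughly a 13-fold lower expression level, and a LogFC of −1 to a 2-fold lower level.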
Because TCGA and GTEx samples are processed differently, comparisons across the two resources may be affected by batch effects [45]. To examine their impact, for the 13 comparisons in which DMD expression was compared between TCGA tumors and their corresponding NATs, we also compared DMD expression between these tumors and the corresponding healthy tissues from the GTEx database. The results were consistent in 11 of the 13 comparisons, namely BRCA, BLCA, UCEC, COAD, STAD, PRAD, LUSC, LIHC, LUAD, KICH, and THCA (
Supplementary Table S4). In KIRP,
DMD expression was reduced compared to kidney NAT in TCGA datasets (adjusted p-Value = 9.30 × 10⁻²²); however, no statistically significant difference was found between
DMD expression in KIRP and GTEx healthy kidney tissue samples (adjusted
p-Value = 1). Moreover, there was no statistically significant difference in
DMD expression between KIRC and paired kidney NAT samples (adjusted
p-Value = 1), while
DMD expression was found upregulated in KIRC samples compared to GTEx healthy kidney tissue (adjusted
p-Value = 2.77 × 10⁻⁸). These inconsistencies might result from small sample sizes, which reduce statistical power.
Given the well-known multiplicity of transcripts originating from the DMD gene and the variability in their expression patterns, a hierarchical clustering analysis was performed to identify changes in DMD expression profiles at the transcript level across various control tissues and corresponding tumors.
3.3. Hierarchical Clustering Analysis of the Relative Expression of DMD Transcripts Distinguishes Tumor Tissues
Of the top ten highly expressed transcripts (
Supplementary Table S5), nine were predicted to be protein-coding (
Supplementary Figure S3). Three of these mRNAs were not canonical
DMD gene transcripts. Of those, ENST00000358062.6 encoding H0Y304 protein is a poorly annotated transcript (Ensembl). The start of its CDS is unknown due to a 5′ truncation of the available sequence. The 5′ sequence upstream of the first exon (equivalent to exon 48 of the full-length transcript) is composed of the last 50 base pairs of the intron located between exons 47 and 48. Therefore, it is not clear whether this is a pre-mRNA sequence undergoing co-transcriptional splicing or a mature mRNA that retains this part of the intron. If the latter, the predicted protein encoded by such a transcript would have an N-terminus longer than that of Dp140, with valine as the putative initiating residue (UniProt). The transcript encoding H0Y864 appears to be a partial four-exon transcript whose predicted protein sequence does not contain any functional domains (
Supplementary Figure S3). ENST00000475732.1 is a two-exon sequence not predicted to encode a protein.
Alternative splicing is a discernible feature of
DMD transcripts found in the analyzed samples, with the splice variant Dp71b lacking exon 78 having the highest mean abundance level of all
DMD transcripts across the analyzed tumor and control tissues, followed by Dp427m. The mRNA encoding Dp140c differs from the canonical Dp140 transcript, as it lacks exons 28 to 31 (equivalent to exons 71 to 74 of the full-length transcript) (
Supplementary Figure S3).
Hierarchical clustering analysis of these ten highly expressed transcripts in tumor and control tissues yielded six clusters.
Supplementary Table S2 reports other dendrogram cut-offs. We focused mainly on three of these clusters (
Figure 2) that contained the majority of tumor and control tissues. The first cluster was composed of nine tumors and healthy whole blood and pancreas tissue. The second cluster was composed of 16 control tissues (including six healthy GTEx tissues and 10 NATs from the TCGA) as well as SKCM and PRAD tumor samples. Finally, the third cluster contained 12 tumors in addition to kidney and thyroid NATs from the TCGA.
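To illustrate the idea behind clustering tissues by relative transcript expression, the sketch below scales each tissue's transcript levels to proportions and then merges the closest groups until a chosen number of clusters remains. Average linkage and Euclidean distance are assumptions made for the example, since the linkage and distance used in the analysis are not stated in this section. All names and numbers are hypothetical.

```python
import math


def relative_abundance(levels):
    """Scale transcript levels so that they sum to 1 within one tissue."""
    total = sum(levels)
    return [v / total for v in levels]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical levels of three transcripts per tissue (illustration only).
toy_profiles = {
    "control_1": relative_abundance([60, 30, 10]),
    "control_2": relative_abundance([55, 35, 10]),
    "tumor_1": relative_abundance([10, 30, 60]),
    "tumor_2": relative_abundance([15, 25, 60]),
}
toy_clusters = average_linkage_clusters(toy_profiles, n_clusters=2)
```

Changing `n_clusters` plays the role of choosing a different dendrogram cut-off.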
The relative expression of the transcript encoding Dp427m was significantly lower in the first and third clusters, which were composed mainly of tumors, compared to the second cluster with a majority of control tissues, while the relative expression of Dp71 variants was higher in the first and third clusters (
Supplementary Figure S4). Specific transcriptomic alterations associated with decreased
DMD gene expression are described in
Section 3.7. Transcripts ENST00000493412.1 and ENST00000475732.1 did not show significantly different levels of expression between any of the clusters (
Supplementary Figure S4), and the differential expression of ENST00000358062.6 was not followed due to its uncertain annotation.
3.4. Changes in the Expression of Dp427m and Dp71 Transcripts across Malignancies
Given the importance of the full-length dystrophin, whose loss is responsible for Duchenne muscular dystrophy, and the predominance of the transcript encoding Dp427m (
Supplementary Table S5), its expression patterns were analyzed in more detail. As Dp71 is the isoform most widely expressed across the body and its splice variants were among the top highly expressed
DMD transcripts in tumor and control tissues (
Supplementary Table S5), we also compared expression patterns of this transcript (
Supplementary Table S6). Dp427m expression was decreased in primary tumors compared to control tissues in 17 out of 25 comparisons with LogFC values ranging from −7.2 to −2.7 in primary uterine carcinosarcoma and primary thyroid carcinoma, respectively (
Figure 3). The expression of transcripts encoding Dp71 changed significantly in 10 out of 25 comparisons and that of Dp71ab in 13 out of 25, while Dp71b was altered in 20 out of 25 comparisons (a decrease in 17 and an increase in three) (
Supplementary Table S6).
Further analysis of these
DMD transcripts showed that changes in overall
DMD expression levels in two tumor categories were confounded by the opposing dysregulation of Dp427m and Dp71 variants. Specifically, while
DMD expression was higher in primary thyroid carcinoma compared to thyroid NAT, Dp427m expression there was lower compared to thyroid NAT, and the increase in overall
DMD gene expression resulted from the elevated expression of Dp71b (
Supplementary Figure S5A). In contrast, in primary pancreatic adenocarcinoma, total
DMD expression was lower compared to the healthy pancreas tissue, but this decrease resulted from the lower expression of transcripts encoding Dp71 variants, while Dp427m expression was higher in this tumor type (LogFC = 3.7 compared to healthy pancreas tissue) (
Supplementary Figure S5B).
3.5. DMD Expression Downregulation Occurs Irrespective of Somatic Mutations within the DMD Locus
Next, using datasets for samples from 23 of the previously mentioned TCGA tumors (LAML and SKCM samples were excluded as they did not have mutation and SCNA details available) we investigated the association between DMD expression and mutations in coding (CRs) and non-coding regions (NCRs) as well as SCNAs in the DMD gene.
The majority of tumor samples had no identified CR mutations in the
DMD gene locus (6201 out of 6751). A univariate GLM analysis was carried out to assess the effect of CR mutations and gender (to account for the X-chromosome localization of the
DMD gene) as well as their interaction on
DMD expression. The GLM indicated a significant effect for CR mutations on
DMD expression (
p = 0.007). There was no effect for gender (
p = 0.703) or the interaction between CR mutations and gender (
p = 0.648). Samples with missense and multiple mutations had significantly lower levels of
DMD expression compared to samples with no CR mutations in the
DMD locus (
p < 0.001 and
p = 0.043, respectively) (
Figure 4A). A chi-square test revealed an overrepresentation of DMD mutations in tumor samples with low DMD expression (χ²(1) = 45.44,
p < 0.0001). Specifically, 11.7% of samples in the low
DMD group (237 out of 2025) had
DMD CR mutations compared to 5.73% (116 out of 2025) in the high
DMD group. However, in the low
DMD group, 88.3% of samples (1788 out of 2025) did not have any detectable mutations in the CRs of the
DMD locus.
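The reported statistic can be reproduced from the counts given above with a Pearson chi-square test on a 2 × 2 table. The sketch below is a standard-library illustration. The absence of a continuity correction is an inference from the fact that the uncorrected value matches the reported one, not something stated in the text, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts from the text: samples with / without CR mutations in each group.
low_mutated, low_total = 237, 2025
high_mutated, high_total = 116, 2025
chi2_cr, p_cr = chi_square_2x2(
    low_mutated, low_total - low_mutated,
    high_mutated, high_total - high_mutated,
)
```

Here `chi2_cr` rounds to 45.44 with `p_cr` far below 0.0001, in agreement with the values reported above.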
As for NCR mutations, 98.2% of tumor samples (6632 out of 6751) had no mutations in the non-coding regions of the
DMD gene. NCR mutations were found to have a significant effect on
DMD expression (
p = 0.031), but no effect was found for gender (
p = 0.104) or the interaction between gender and NCR mutations (
p = 0.698). Samples with intronic mutations had significantly lower
DMD expression compared to samples without any
DMD NCR mutations (
p = 0.02) (
Figure 4B). A chi-square test revealed an overrepresentation of DMD NCR mutations in tumor samples with low DMD expression (χ²(1) = 18.71,
p < 0.0001). In the low
DMD group, 3.01% of tumor samples (61 out of 2025) had
DMD NCR mutations compared to 1.09% of samples (22 out of 2025) in the high
DMD group. However, 96.99% of samples in the low
DMD group (1964 out of 2025) did not have any NCR mutations in the
DMD locus.
Regarding SCNAs, 66.7% of samples had a normal copy number of the
DMD locus (4503 out of 6751). Both SCNAs (
p < 0.001) and gender (
p = 0.006) had a significant effect on
DMD expression. However, the interaction between the two factors did not have a significant effect. Samples with a normal
DMD copy number had higher expression compared to samples with deep and shallow deletions and those with gains (
p < 0.001). Additionally, samples with deep deletions had lower expression compared to those with amplifications (
p = 0.002) (
Figure 4C).
In samples from female patients, there was an overrepresentation of DMD SCNAs in the low DMD group (χ²(1) = 36.40, p < 0.0001): 39.08% of samples (399 out of 1021) had SCNAs in the DMD locus compared to 26.54% in the high DMD group (271 out of 1021). In the low DMD group, 60.92% of samples (622 out of 1021) did not have any SCNAs.
In samples from male patients, there was also an overrepresentation of DMD SCNAs in the low DMD group (χ²(1) = 69.82, p < 0.0001): 44.18% of samples (444 out of 1005) had SCNAs in the DMD locus compared to 26.37% in the high DMD group (265 out of 1005). In the low DMD group, 55.82% of samples had no SCNAs.
We further investigated the association between DMD expression and mutations using data from 921 tumor cell lines (cBioPortal). The majority of tumor cell lines had no identified mutations within the DMD gene region (n = 773), while the remainder had missense (n = 120), truncating (n = 11), splice (n = 4), or multiple mutations (n = 13). There was an overrepresentation of DMD mutations in tumor cell lines with low levels of DMD expression (χ²(1) = 9.727, p = 0.0018). However, only 23% of tumor cell lines with low DMD gene expression (63 out of 276) had DMD mutations; the majority (213 out of 276) had low levels of DMD expression without any detectable mutations.
Therefore, downregulation of DMD expression in tumors cannot be attributed simply to somatic mutations or copy number alterations within the DMD locus, and points instead to a regulatory mechanism.
3.6. Association between DMD Expression and Cancer Stage and Patient’s Age
We investigated the association between DMD expression and stage in 18 different types of primary cancers with available data for stage, gender, and patient’s age at the initial diagnosis. These tumors included ACC, BLCA, BRCA, CHOL, COAD, ESCA, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, READ, SKCM, STAD, and THCA. The ages of patients ranged from 14 to 90 years old.
A GLM analysis was performed to assess the effect of cancer stage on DMD expression while accounting for gender and age differences. Cancer stage was found to have an effect on DMD expression (p = 0.035). While no effect was identified for gender differences (p = 0.735), the interaction between cancer stage and gender was found to have an impact on DMD expression (p = 0.010). Age of patients was also found to have an impact on DMD gene expression (p < 0.001) as well as the interaction between age and cancer stage (p = 0.003).
Samples from patients with stage I cancer had significantly higher levels of
DMD gene expression compared to those with stage II (LogFC = 0.72,
p < 0.001), stage III (LogFC = 0.50,
p < 0.001), and stage IV cancer (LogFC = 0.69,
p < 0.001) (
Figure 5A). Moreover, samples from younger patients had significantly higher
DMD expression compared to samples from older patients (
Figure 5B).
Twelve of the tumor tissues used in the previous analysis have corresponding healthy tissues in the GTEx database: adrenal glands, bladder, breast, colon, esophagus, kidney, liver, lung, pancreas, skin, stomach, and thyroid. We examined whether there is an association between DMD gene expression and age in these healthy tissue samples (n = 2863). The ages of donors ranged from 20 to 79 years old, and samples were grouped into six age groups: 20–29, 30–39, 40–49, 50–59, 60–69, and 70–79 years. There was no statistically significant effect for age (p = 0.484), gender (p = 0.647), or their interaction (p = 0.591) on DMD expression in these healthy samples.
3.7. Decreased DMD Gene Expression in Primary Tumors Is Associated with Specific Transcriptomic Alterations
Expression of the
DMD gene is significantly altered in tumors, with the majority having lower expression levels compared to their respective control tissues. This downregulation predominantly affects the transcript encoding the full-length dystrophin due to a regulatory alteration. To identify downstream molecular pathways that could be associated with such
DMD downregulation (analogous to the impact of full-length dystrophin loss in muscle), we compared transcriptomes of primary tumor samples with low vs. high
DMD gene expression from 15 different tumor types. Fourteen of these (BRCA, BLCA, UCEC, CESC, OV, COAD, STAD, PRAD, LUSC, ESCA, HNSC, LUAD, KIRP, and THCA) were tumors in which Dp427m was found to be downregulated compared to control tissues. The fifteenth, sarcoma (SARC), was included because sarcomas originate from tissues (muscle and bone) known to express the
DMD gene. RNA-seq data for sarcoma control tissues were not available in the TCGA TARGET GTEx cohort, and therefore
DMD gene expression in sarcomas vs. control tissues could not be compared. In each tumor type, the DEGs between samples at the bottom 33.3% and top 33.3% of
DMD expression were identified and used to perform a pathway enrichment analysis.
Figure 6 shows the combined score values (−Log (
p-Value) × odds ratio) for KEGG pathways that were found to be enriched in DEGs in more than 50% of the analyzed primary tumors (adjusted
p-Value < 0.05).
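The low vs. high grouping described above can be sketched as a split of samples ranked by DMD expression into equal-sized quantile groups, keeping only the bottom and top groups. This is a minimal illustration. The function name, sample identifiers, and values are hypothetical, and the handling of ties or of sample counts not divisible by three is an assumption.

```python
def split_low_high(expression_by_sample, n_quantiles=3):
    """Return (low, high): sample IDs in the bottom and top 1/n_quantiles by expression."""
    ordered = sorted(expression_by_sample, key=expression_by_sample.get)
    k = len(ordered) // n_quantiles  # group size, rounded down
    if k == 0:
        return [], []
    return ordered[:k], ordered[-k:]


# Hypothetical DMD expression values for nine samples (illustration only).
toy_expression = {
    "s1": 0.2, "s2": 5.1, "s3": 1.4,
    "s4": 3.3, "s5": 0.9, "s6": 4.7,
    "s7": 2.0, "s8": 6.2, "s9": 2.8,
}
low_group, high_group = split_low_high(toy_expression)
```

Setting `n_quantiles=4` would instead select the bottom and top 25% of samples.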
The differentially expressed transcripts in tumors with low DMD expression showed enrichment in the ECM-receptor interaction pathway in all 15 primary tumors analyzed. Calcium signaling and protein digestion and absorption were enriched in 13 tumors, cell adhesion molecules in 12, focal adhesion in 11, PI3K-Akt and cAMP signaling in 10, Wnt signaling in 9, and cGMP-PKG signaling and axon guidance in 8.
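The "more than 50% of tumors" criterion amounts to counting, for each pathway, the number of tumor types in which it is enriched and keeping those above half of the total. A minimal sketch follows, with hypothetical tumor labels and a hypothetical function name:

```python
from collections import Counter


def recurrent_terms(enriched_by_tumor, min_fraction=0.5):
    """Return {term: count} for terms enriched in more than min_fraction of tumor types."""
    counts = Counter(
        term for terms in enriched_by_tumor.values() for term in set(terms)
    )
    cutoff = min_fraction * len(enriched_by_tumor)
    return {term: n for term, n in counts.items() if n > cutoff}


# Hypothetical enrichment results for three tumor types (illustration only).
toy_enrichment = {
    "tumor_A": ["ECM-receptor interaction", "Calcium signaling"],
    "tumor_B": ["ECM-receptor interaction"],
    "tumor_C": ["ECM-receptor interaction", "Wnt signaling", "Calcium signaling"],
}
toy_recurrent = recurrent_terms(toy_enrichment)
```

With 15 tumor types the cutoff is 7.5, so a pathway must be enriched in at least 8 of them to be retained.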
The specific GO Biological Process terms enriched in DEGs in primary tumors with low vs. high
DMD expression were also identified (
Figure 7). The GO terms enriched in more than 50% of tumors were: extracellular matrix organization (14 out of 15 tumors), axonogenesis (12), regulation of cell migration, synapse organization and nervous system development (11), skeletal system development, regulation of ERK1 and ERK2 cascade (10), and calcium ion transmembrane import into cytosol (8). Lists of DEGs in each comparison and results of the pathway and GO term analysis can be found in
Supplementary File S2.
In order to confirm that these transcriptomic changes between primary tumor samples with low vs. high
DMD expression are associated with
DMD downregulation and not a result of other factors, we repeated the analysis for each tumor type using three groups of tumor samples identical in size to the groups used in the previous analysis, but that were chosen at random.
DMD gene expression did not differ significantly between the three random groups in any of the 15 tumors, and no statistically significant differences were found between the transcriptomes of these groups, supporting the association of the specific gene expression alterations with
DMD downregulation (
Supplementary File S2).
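The random-group control can be sketched as drawing groups of a fixed size from the same sample pool without regard to DMD expression. In the illustration below the groups are drawn without overlap and with a fixed seed for reproducibility. Both choices are assumptions, and the function name and sample identifiers are hypothetical.

```python
import random


def random_groups(sample_ids, group_size, n_groups=3, seed=0):
    """Draw n_groups non-overlapping random groups of group_size samples each."""
    needed = group_size * n_groups
    if needed > len(sample_ids):
        raise ValueError("not enough samples for the requested groups")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    drawn = rng.sample(list(sample_ids), needed)
    return [drawn[i * group_size:(i + 1) * group_size] for i in range(n_groups)]


# Hypothetical sample identifiers (illustration only).
toy_samples = [f"sample_{i}" for i in range(20)]
toy_random_groups = random_groups(toy_samples, group_size=4)
```

Running the same differential expression comparison on such groups gives a baseline for what arises when grouping is unrelated to DMD expression.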
3.8. Transcriptomic Alterations Associated with Decreased DMD Gene Expression in Tumor Cell Lines
To further confirm that these transcriptomic changes are evoked by
DMD downregulation specifically in cancer cells rather than originating from
DMD expression in the stromal or infiltrating immune cells present in tumor tissue samples, we conducted a two-class comparison analysis using the DepMap portal to identify DEGs between cell lines of the same tumor origin (carcinoma and sarcoma) but with low vs. high level of
DMD gene expression. While the
DMD gene was not found to be essential for tumor cell line survival [
46,
47], its altered expression may have an important common effect across malignancies.
One hundred and forty tumor cell lines were grouped into four groups based on their level of
DMD expression (low or high) and their origin (carcinoma or sarcoma) (
Supplementary File S2). The ranges for
DMD expression for these cell lines were as follows (Unit: log
2 (TPM + 1)): carcinoma cell lines with high (3.3–7.4) and low (0–0.01)
DMD expression, sarcoma cell lines with high (4.07–7.6) and low (0–0.06)
DMD expression. The carcinoma cell lines used in this analysis originated from tissues where primary tumors were found to have a lower level of Dp427m transcript compared to control tissues. The two-class comparison analysis between cell lines with low vs. high
DMD expression identified 998 DEGs for carcinoma, and 543 DEGs for sarcoma cell lines. Interestingly, in carcinoma cell lines with low
DMD expression, the majority of DEGs were downregulated (n = 976 out of 998).
Figure 6 shows the combined score values (−Log (
p-Value) × odds ratio) for KEGG pathways that were found to be enriched in DEGs in these two comparisons (adjusted
p-Value < 0.05).
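The combined score defined above multiplies the negative logarithm of the enrichment p-value by the odds ratio, so that a pathway scores highly when it is both statistically significant and strongly overrepresented. A minimal sketch follows. The base of the logarithm is not stated in the text, so the natural logarithm is used here as an assumption and exposed as a parameter, and the pathway names and numbers are hypothetical.

```python
import math


def combined_score(p_value, odds_ratio, log_base=math.e):
    """Combined score = -log(p_value) * odds_ratio (log base is an assumption)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log(p_value, log_base) * odds_ratio


# Hypothetical enrichment results: pathway -> (p-value, odds ratio).
toy_results = {
    "pathway_x": (math.exp(-2.0), 3.0),
    "pathway_y": (math.exp(-1.0), 2.0),
}
toy_scores = {name: combined_score(p, oratio) for name, (p, oratio) in toy_results.items()}
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```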
Pathway enrichment for the DEGs in carcinoma cell lines with low DMD expression suggested that DMD downregulation may affect the following KEGG pathways: ECM-receptor interaction (p = 0.037), protein digestion and absorption (p = 0.015), focal adhesion (p = 0.029), PI3K-Akt signaling (p = 0.021), cAMP signaling (p = 0.044), cGMP-PKG signaling (p = 0.024), and axon guidance (p = 0.002). Notably, these pathways were also enriched in the DEGs in more than 50% of primary tumors with low DMD expression.
Similar to carcinoma cell lines, the following KEGG pathways were enriched in the DEGs in sarcoma cell lines with low DMD expression: ECM-receptor interaction (p = 0.048), PI3K-Akt signaling (p = 0.041), and cAMP signaling (p = 0.034). The calcium signaling pathway, which was enriched in the DEGs in more than 50% of primary tumors with low DMD expression, was also enriched in the DEGs in the comparison of sarcoma cell lines (p = 7.24 × 10⁻⁴).
Moreover, GO Biological Process term analysis for DEGs in carcinoma cell lines with low vs. high
DMD expression (
Figure 7 and
Supplementary File S2) identified the following terms: extracellular matrix organization (
p = 5.22 × 10⁻⁴), cell junction assembly (p = 0.012), and positive regulation of epithelial to mesenchymal transition (p = 0.038). Terms related to the development of the CNS and synaptic transmission were also found: nervous system development (p = 8.39 × 10⁻⁵), neuron migration (p = 8.39 × 10⁻⁵), and regulation of neuron projection development (p = 1.04 × 10⁻⁵).
The GO Biological Process term regulation of cell migration was enriched (
p = 0.02) in DEGs in sarcoma cell lines with low vs. high
DMD expression (
Figure 7 and
Supplementary File S2). Lists of the DEGs in each comparison and results of the pathway and GO term analysis can be found in
Supplementary File S2.
3.9. Transcriptomic Alterations in Duchenne Skeletal Muscle Compared to Healthy Muscle Samples
Next, we investigated whether the transcriptomic changes resulting from
DMD downregulation in primary tumors and tumor cell lines are similar to those observed in skeletal muscles of DMD patients. We compared gene expression data from 12 DMD skeletal muscle and 11 healthy muscle samples. The GEO2R tool identified 1160 genes to be differentially expressed between DMD and healthy skeletal muscle.
Figure 6 shows the combined score values (−Log (
p-Value) × odds ratio) for the KEGG pathways that were found to be enriched in the top 1000 DEGs with the highest |LogFC| values in this comparison (adjusted
p-Value < 0.05).
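Selecting the top DEGs by absolute log fold change, as described above, treats strongly upregulated and strongly downregulated genes alike. A minimal sketch with hypothetical gene labels and values:

```python
def top_by_abs_logfc(logfc_by_gene, n=1000):
    """Return the n gene IDs with the largest absolute log fold change."""
    ranked = sorted(logfc_by_gene, key=lambda g: abs(logfc_by_gene[g]), reverse=True)
    return ranked[:n]


# Hypothetical log fold changes (illustration only).
toy_logfc = {"GENE_A": -3.2, "GENE_B": 0.4, "GENE_C": 2.1, "GENE_D": -0.9}
toy_top = top_by_abs_logfc(toy_logfc, n=2)
```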
Pathway enrichment analysis indicated DMD downregulation to be associated with changes in the following KEGG pathways: ECM-receptor interaction (p = 7.82 × 10⁻¹¹), calcium signaling pathway (p = 8.80 × 10⁻⁸), protein digestion and absorption (p = 3.67 × 10⁻⁸), cell adhesion molecules (p = 1.55 × 10⁻⁴), focal adhesion (p = 2.87 × 10⁻¹²), PI3K-Akt signaling (p = 7.82 × 10⁻¹¹), cAMP signaling (p = 0.020), cGMP-PKG signaling (p = 0.007), and axon guidance pathways (p = 0.030). These same pathways were enriched in the DEGs in more than 50% of primary tumors with low DMD expression, and subsets of them were also enriched in DEGs in carcinoma and sarcoma cell lines with low vs. high DMD expression.
GO Biological Process term analysis for DEGs in DMD skeletal muscle compared to healthy muscle (
Figure 7) identified the following terms: extracellular matrix organization (
p = 6.25 × 10⁻¹⁷), axonogenesis (p = 9.35 × 10⁻⁵), regulation of cell migration (p = 3.39 × 10⁻¹²), synapse organization (p = 0.009), nervous system development (p = 1.87 × 10⁻⁴), skeletal system development (p = 2.22 × 10⁻⁴), regulation of ERK1 and ERK2 cascade (p = 5.15 × 10⁻⁸), and calcium ion transmembrane import into cytosol (p = 0.001). These GO terms were enriched in DEGs in more than 50% of primary tumors with low DMD expression. GO terms that were found to be in common with carcinoma cell lines with low DMD expression were: extracellular matrix organization, nervous system development, cell junction assembly (p = 0.004), regulation of neuron projection development (p = 0.005), neuron migration (p = 0.004), and positive regulation of epithelial to mesenchymal transition (p = 2.52 × 10⁻⁴). Finally, the GO term, regulation of cell migration, was found to be in common with sarcoma cell lines with low
DMD expression.
Furthermore, functional enrichment tests of the dystrophin protein-protein interaction network constructed using high confidence PPI information demonstrated that the enriched functional terms were consistent with the pathways found to be significantly enriched in the aforementioned comparisons of primary tumors, tumor cell lines, and DMD skeletal muscle: extracellular matrix organization (
p = 1.10 × 10⁻¹⁸), axon guidance (p = 9.45 × 10⁻¹⁰), focal adhesion (p = 3.27 × 10⁻¹⁴), and PI3K-Akt signaling pathways (p = 9.27 × 10⁻⁹) (
Supplementary Figure S6).
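Enrichment p-values of the kind reported above are commonly obtained from an over-representation test. The following minimal Python sketch computes a one-sided hypergeometric p-value for a single pathway. It is an illustration only: the enrichment tool and any multiple-testing correction used in this study are not restated in this section, and the function name, its parameters, and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, pathway_size: int,
                           deg_count: int, overlap: int) -> float:
    """One-sided over-representation p-value.

    Probability of observing at least `overlap` pathway genes when
    `deg_count` genes are drawn without replacement from a background of
    `universe` genes, of which `pathway_size` belong to the pathway.
    """
    max_overlap = min(pathway_size, deg_count)
    total = comb(universe, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 10 background genes, 4 in the pathway,
# 3 DEGs, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(universe=10, pathway_size=4,
                                 deg_count=3, overlap=2)
print(f"p = {p_value:.4f}")
```

In practice the background size and the pathway gene sets would come from the annotation database in use, and the resulting p-values would normally be adjusted for the number of pathways tested.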
3.10. Low DMD Gene Expression Is Associated with Poor Survival in Patients with 15 Different Primary Tumor Types
Given these similarities between pathways altered in tumors and in the lethal muscle disease, we examined the association between DMD gene expression and patients’ survival in the following tumor types: BRCA, BLCA, UCEC, CESC, OV, COAD, STAD, PRAD, LUSC, ESCA, HNSC, LUAD, KIRP, THCA, and SARC. We compared overall survival (OS), progression-free interval (PFI), disease-specific survival (DSS), and disease-free interval (DFI) endpoints between patients in the bottom 25% of DMD expression and those in the top 25% across all the aforementioned tumor types. OS was lower in the low DMD group (HR 1.33; 95% CI 1.17, 1.51; p < 0.0001), at 2417 days compared to 3253 days for the high DMD group. PFI was also decreased in the low DMD group (HR 1.28; 95% CI 1.14, 1.45; p < 0.0001), at 2472 days compared to 3669 days. Finally, the low DMD group had lower DSS (HR 1.46; 95% CI 1.24, 1.72; p < 0.0001) and DFI (HR 1.30; 95% CI 1.06, 1.59; p = 0.012) compared to the high DMD group (Figure 8A).
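To illustrate the grouping described above, the sketch below selects the bottom and top quartiles of expression and estimates a Kaplan-Meier median survival time for one group. It assumes right-censored data supplied as parallel lists of times and event flags, and it does not reproduce the hazard ratio estimation. All function names and toy values are hypothetical and are not taken from the study.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def quartile_groups(expression: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Indices of samples in the bottom 25% and top 25% of expression."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    k = len(order) // 4
    return order[:k], order[len(order) - k:]


def km_median(times: Sequence[float], events: Sequence[bool]) -> Optional[float]:
    """First time at which the Kaplan-Meier survival estimate is <= 0.5.

    `events[i]` is True if the endpoint was observed at `times[i]` and
    False if the patient was censored. Returns None if the estimate never
    reaches 0.5. Fractions keep the product exact.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients sharing the same time point together.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None


# Toy data (hypothetical): expression values and follow-up in days.
expression = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0]
low_idx, high_idx = quartile_groups(expression)
print("low group:", sorted(low_idx), "high group:", sorted(high_idx))
print("median:", km_median([5, 8, 12, 20, 30], [True, False, True, True, False]))
```

The same median function would be applied separately to the low and high groups for each endpoint (OS, PFI, DSS, DFI).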
In order to confirm that these changes in survival outcomes between patients with low vs. high DMD expression in their tumors are associated with DMD downregulation and not a result of other factors, we repeated the survival analysis using two groups of patients chosen at random, and no statistically significant differences in survival endpoints were found between the two.
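One way such a specificity control could be set up is sketched below: two disjoint groups of equal size are drawn at random from the pooled patients and then passed through the same survival comparison as the low and high expression groups. The group size, the seed, and the function name are assumptions made for illustration; the section does not state how the random groups were drawn.

```python
import random
from typing import List, Tuple


def random_two_groups(n_patients: int, group_size: int,
                      seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw two disjoint random groups of patient indices.

    A fixed seed makes the draw reproducible (the seed value is arbitrary).
    """
    if 2 * group_size > n_patients:
        raise ValueError("not enough patients for two disjoint groups")
    rng = random.Random(seed)
    picked = rng.sample(range(n_patients), 2 * group_size)
    return picked[:group_size], picked[group_size:]


# Toy example (hypothetical sizes): 100 patients, two groups of 25,
# mirroring the size of a bottom-25% and a top-25% group.
group_a, group_b = random_two_groups(n_patients=100, group_size=25, seed=1)
print(len(group_a), len(group_b), set(group_a).isdisjoint(group_b))
```

Repeating the draw with several seeds would give a sense of how often a difference of the observed size arises by chance alone.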
3.11. Transcriptomic Alterations in Hematological Malignancies with Low vs. High Dp71 Expression
The hierarchical clustering analysis revealed that the blood malignancies acute myeloid leukemia (LAML) and diffuse large B-cell lymphoma (DLBC) had a unique pattern of DMD transcripts, to the point that these two malignancies were classified as a separate cluster (Figure 2). While no changes in Dp427m expression were observed in either TCGA dataset (LAML or DLBC) compared to healthy blood, levels of Dp71, including its splice variants Dp71b and Dp71ab, were higher in these tumors (Figure 3, Supplementary Table S6). Levels of Dp71 and its splice variants were also higher in two TARGET datasets (acute myeloid leukemia and acute lymphoblastic leukemia) compared to GTEx healthy whole blood (Supplementary Table S7). However, we note that age differences between TARGET and GTEx donors might be a confounding factor when interpreting these results.
Next, we compared gene expression between samples from the previously mentioned TCGA and TARGET datasets at the bottom 33.3% and top 33.3% of Dp71 expression across all tumor types. The pathways enriched in the top 1000 DEGs with the largest |LogFC| were: protein digestion and absorption (p = 8.59 × 10⁻⁵), ECM-receptor interaction (p = 3.87 × 10⁻⁴), focal adhesion (p = 0.003), and PI3K-Akt signaling pathway (p = 0.02). The identified GO Biological Process terms for these genes were extracellular matrix organization (p = 1.57 × 10⁻¹⁰), regulation of angiogenesis (p = 7.94 × 10⁻⁴), skeletal muscle development (p = 8.38 × 10⁻⁴), positive regulation of MAPK cascade (p = 0.005), regulation of ERK1 and ERK2 cascade (p = 0.006), regulation of cell migration (p = 0.01), and positive regulation of calcium ion import (p = 0.02) (Supplementary File S2). Thus, these pathways and processes were identical to those found in comparisons of low vs. high expression of Dp427m in completely unrelated malignancies. A comparison between random groups of samples from these hematological tumors did not reveal any statistically significant changes in gene expression.
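The selection of the top 1000 DEGs by |LogFC| described above can be sketched as a filter followed by a ranking. The example below assumes that a differential expression analysis has already produced a log fold change and an adjusted p-value per gene; the 0.05 significance cutoff, the record type, and the gene labels are assumptions for illustration and are not taken from the study.

```python
from typing import List, NamedTuple


class DegResult(NamedTuple):
    gene: str       # gene label (hypothetical placeholders below)
    log_fc: float   # log fold change, low vs. high expression group
    adj_p: float    # multiple-testing-adjusted p-value


def top_degs(results: List[DegResult], n: int = 1000,
             max_adj_p: float = 0.05) -> List[str]:
    """Genes passing the significance cutoff, ranked by largest |log_fc|.

    The 0.05 default is an assumed cutoff, not one stated in the text.
    """
    significant = [r for r in results if r.adj_p < max_adj_p]
    ranked = sorted(significant, key=lambda r: abs(r.log_fc), reverse=True)
    return [r.gene for r in ranked[:n]]


# Toy input (hypothetical values).
results = [
    DegResult("g1", -3.2, 0.001),
    DegResult("g2", 0.4, 0.01),
    DegResult("g3", 2.5, 0.2),   # large change but not significant
    DegResult("g4", 1.8, 0.03),
]
print(top_degs(results, n=2))
```

Ranking on the absolute value keeps strongly downregulated and strongly upregulated genes in the same list, which is what a |LogFC| criterion implies.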
3.12. Low Expression of Dp71 Is Associated with Poor Survival in Patients with Hematological Malignancies
We compared OS between patients with hematological malignancies with low and high expression of Dp71 and its splice variants. OS was lower in the low Dp71 group (HR 2.39; 95% CI 1.44, 3.99; p = 0.0003), at 792 days compared to 1992 days for the high Dp71 group (Figure 8B). For specificity testing, the survival analysis was conducted using two random groups of patients with these hematological malignancies, and no statistically significant difference in overall survival was found.
Thus, while specific dystrophins are differentially regulated across various tumors, the low expression of both full-length dystrophin and the Dp71 variants in the analyzed tumors is associated with analogous molecular alterations and significantly decreased patient survival.
4. Discussion
We found a significant downregulation of
DMD gene expression across diverse primary tumors. Both full-length and truncated dystrophin variants were differentially expressed, and hierarchical clustering of the top highly expressed transcripts distinguished tumors from corresponding control tissue samples. A similar trend for
DMD downregulation across carcinomas has been described previously [
24], and our analysis discriminating specific
DMD transcripts showed that levels of Dp427m mRNA were statistically significantly decreased in the clusters composed mainly of tumor tissues (
Supplementary Figure S4), suggesting a specific impact of the loss of the full-length transcript. Indeed, pancreatic adenocarcinoma was the only primary tumor with higher levels of Dp427m transcripts compared to healthy pancreas tissue. In contrast, the relative expression of Dp71 splice variants was increased in tumor clusters.
In contrast to the majority of primary tumor samples, in hematological malignancies, total DMD expression and the expression of the full-length and Dp71 transcripts were higher or unchanged when compared to normal blood, which also showed the lowest DMD expression relative to housekeeping genes of all the healthy tissues analyzed (Supplementary Table S3). However, human and mouse hematopoietic stem cells were found to express Dp71, and its expression decreased with cell differentiation (manuscript submitted), which agrees with whole blood showing the lowest DMD expression. Interestingly, while Dp71 expression was upregulated in hematological malignancies, its low levels were associated with the very same dystrophic molecular alterations in cancer cells.
Although using NATs as control samples in cancer studies reduces individual and anatomical site-specific confounding factors and eliminates technical interlaboratory differences, these tissues have been found to be distinct from both healthy and tumor tissues and to represent a unique intermediate state between them [48]. We showed that the results of comparing DMD expression between TCGA tumors and their corresponding NATs were consistent with those of comparing DMD expression between TCGA tumors and healthy GTEx tissues in 11 out of 13 comparisons. Nevertheless, the unique transcriptomic profile of NATs might explain why thyroid NATs and kidney NATs from three different kidney TCGA tumors clustered with tumor tissues in the third cluster. The opposite was observed in the second cluster, where PRAD and SKCM samples clustered with control tissues. This might be the result of a high proportion of non-tumor cells in these tumor samples.
Given some evidence of a causative link between
DMD downregulation and phenotypic changes in tumor cells [
17,
23] and that alterations in Duchenne, such as increased cell proliferation, abnormalities in adhesion, migration, and invasion [
11,
12] are commonly associated with malignancy, transcriptomes of primary tumor samples as well as tumor cell lines with low vs. high levels of
DMD gene expression were compared.
While no causation can be confirmed at this stage, it is important to note that
DMD dysregulation was associated with specific transcriptomic changes across 15 primary tumors and 140 various tumor cell lines (summarized in
Figure 9).
Functional enrichment analysis showed that the pathways and GO Biological Process terms significantly enriched in DEGs in primary tumors and cell lines with low vs. high DMD expression were consistent with the pathways and GO terms enriched in DEGs in DMD skeletal muscle as well as the functional dystrophin PPI network (Supplementary Figure S6). Key pathways altered, including cell adhesion, ECM interactions, and PI3K-Akt signaling, correspond to alterations found in Duchenne patients’ cells [49,50]. Enrichment of the calcium signaling pathway in DEGs in tumor samples with low DMD expression from 13 out of the 15 analyzed primary tumors, as well as in sarcoma cell lines with low DMD expression, agrees with the dysregulation of calcium signaling across a whole spectrum of dystrophic cells (reviewed in [51]), as do GO terms related to the regulation of developmental mechanisms [8]. Thus, the DMD gene may play similar roles in cancer and development, two processes showing biological and molecular similarities [52].
The presence of protein digestion and absorption among the top pathways identified is somewhat surprising, but this pathway is also enriched in DMD muscles and most likely reflects the overlap between DEGs. For example, 17 genes in this pathway are shared with the ECM-receptor interaction, focal adhesion, and PI3K-Akt pathways, and six genes are shared with the cAMP and cGMP-PKG signaling pathways.
Dystrophin expression in tumor cell lines did not correlate strongly with the presence of its established DAPC partners, suggesting that its role(s) there may differ from those in muscle cells (Supplementary Table S8). This is unsurprising, given that DAPs are known to differ between tissues, with muscle and brain being the most notable examples. Even within muscle, the dystrophin interactome changes with differentiation, and functionally distinct DAPs exist in satellite cells and myofibers.
The importance of the DMD gene in tumorigenesis is supported by the finding that low DMD expression was associated with poor survival outcomes in patients with 15 different types of tumors (14 carcinomas and sarcoma). The overall survival of cancer patients with decreased DMD expression in tumors was about 27 months lower than that of patients with high DMD expression. However, since the low and high DMD groups used in this analysis were composed of 15 tumor types, the number of samples for each of those types is highly variable between the two groups, and this could be a confounding factor when interpreting the results. In other studies, mutations in the DMD gene were associated with poor overall survival of patients with two out of the 11 analyzed tumor types, namely uterine corpus endometrioid carcinoma and breast invasive carcinoma [24]. Dystrophin protein was also identified as a survival biomarker in upper gastrointestinal cancer patients, as poorer survival was observed in patients with low compared to high levels of dystrophin protein [53].
In our analyses, the overall survival of patients suffering from hematological malignancies with decreased Dp71 expression was about 39 months lower than that of patients with high Dp71 expression. However, in low-grade glioma [
54] and B-cell chronic lymphocytic leukemia [
28], high Dp71 expression was previously associated with poor patient survival.
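The survival differences quoted in months follow from the day counts reported in Sections 3.10 and 3.12. A short arithmetic check is sketched below; the average month length of 365.25/12 days is an assumption, as the conversion factor used is not stated.

```python
# Average month length in days (assumed conversion factor).
DAYS_PER_MONTH = 365.25 / 12


def survival_gap_months(high_days: float, low_days: float) -> float:
    """Difference between two survival times, converted from days to months."""
    return (high_days - low_days) / DAYS_PER_MONTH


# Day counts as reported in the Results.
dmd_gap = survival_gap_months(high_days=3253, low_days=2417)   # OS, DMD groups
dp71_gap = survival_gap_months(high_days=1992, low_days=792)   # OS, Dp71 groups
print(f"DMD low vs. high: {dmd_gap:.1f} months")
print(f"Dp71 low vs. high: {dp71_gap:.1f} months")
```

Both values round to the figures given in the text (27 and about 39 months).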
We found DMD expression to be associated with the tumor stage. Samples from patients with stage I tumors had significantly higher levels of DMD expression than samples from higher stages after controlling for age and gender differences. DMD expression was also found to decrease with the age of onset, as samples from younger patients had higher DMD expression compared to samples from older patients. This association between DMD expression and age was unique to tumor tissues, as no such association was found by us in the corresponding healthy tissues from the GTEx database, nor in a meta-analysis that identified genes with age-associated expression in human peripheral blood samples [55].
It is worth noting that recent studies identified increased frequency of rhabdomyosarcomas in DMD patients [
56,
57] which agrees with previous data on spontaneous rhabdomyosarcomas in dystrophic mice [
58].
Crucially, we demonstrated downregulation of DMD expression in tumors whose incidence increases with age. Therefore, as improved therapies extend the lifespan of DMD patients, the risk of malignancy in this population should be considered. As for hematological malignancies, which frequently affect children, poor survival was associated with Dp71 downregulation. Expression of this dystrophin is not affected in the vast majority of DMD patients.
A further indication of the functional significance of the DMD gene in tumors is that DMD downregulation across various malignancies involves regulatory changes and does not simply result from somatic gene mutations. Although, as expected, the presence of some types of somatic mutations and SCNAs was associated with lower levels of
DMD expression, in about 88% of tumor samples and 77% of cell lines,
DMD downregulation could not be linked to mutations in the coding regions of the
DMD gene. Moreover,
DMD downregulation was not a result of SCNAs in about 61% of samples from female patients and about 56% of samples from male patients. While deletions were described as a causative factor for the downregulation of
DMD gene expression in some tumors [
17,
23,
30], significantly reduced expression was also found in the absence of deletions or nonsense mutations [
27], in agreement with our comprehensive analysis across different malignancies. Moreover, in primary pancreatic adenocarcinoma, Dp427m transcript was increased while Dp71 expression was reduced (
Supplementary Figure S5B). Given the
DMD gene structure, such an expression pattern can only be explained by differential regulation. Targeted degradation of dystrophin transcripts was suggested before [
59], and recently, an epigenetic mechanism responsible for reduced
DMD transcript levels has been described [
60]. These data indicate that, rather than simply being an effect of random mutations, quite likely to occur in this large gene,
DMD alterations in tumor cells may have a complex regulatory nature involving mechanisms such as transcriptional regulation, chromatin remodeling, or transcript degradation. Given that miRNAs might be responsible for the differential regulation of
DMD expression in tumor samples, we investigated but did not find any differentially expressed miRNAs in tumor samples with low vs. high
DMD expression that were common to all of the 15 primary tumor types analyzed (
Supplementary Table S9).
Thus, specific alterations in
DMD gene expression are a common feature across a spectrum of malignancies, including those originating from tissues previously not associated with the expression of the full-length dystrophin. Yet, we found
DMD transcript levels in these tissues to be comparable to the levels of housekeeping genes. Moreover, interrogation of proteomics datasets demonstrated a much wider distribution profile for the full-length dystrophin protein (
Supplementary Figure S2). Interestingly, according to the Human Protein Atlas [
38], the antibody HPA002725, directed against amino acids 186–333 of the full-length dystrophin, detected moderate cytoplasmic and/or membrane staining in a range of normal tissues in addition to the expected staining in skeletal and cardiac muscle and the CNS. However, the antibody HPA023885, raised against amino acids 2843–2992 and therefore expected to detect all dystrophin isoforms upstream of Dp71, showed staining in skeletal and cardiac muscle, while other tissues were negative. This staining pattern disagrees with the established expression of Dp260, Dp140, and Dp116, which is broader than that of Dp427 isoforms, and is therefore unlikely to be accurate. Given the high sensitivity and specificity of mass spectrometry [
61,
62], identification of the full-length dystrophin in a wide spectrum of normal tissues using this latter method is likely to represent the true expression status. Thus, dystrophin may be present in many tissues at low levels and/or in a tightly controlled spatiotemporal manner, which might be missed using standard detection methods.
The molecular signature associated with decreased
DMD expression in tumors and corresponding tumor cell lines is concordant with that found in Duchenne muscular dystrophy. The
DMD gene encodes a spectrum of dystrophin isoforms, but significant expression of the majority of these appeared to be restricted to specific tissues. While loss of the full-length dystrophin results in Duchenne muscular dystrophy, mutations additionally disrupting other isoforms result in exacerbated phenotypes [
36,
63]. In skeletal myofibers, cardiomyocytes, and neurons, which express the highest levels of the full-length dystrophin, this protein has been described to serve as a structural scaffold for proteins engaged in ECM and cell-cell interactions, and in intracellular signaling. However, more recent data demonstrate that the loss of
DMD expression impacts a broader spectrum of cells than those affected in DMD. In myoblasts [
10,
16,
64,
65], lymphocytes [
33], endotheliocytes [
32,
66,
67], mesodermal [
8], and myogenic cells [
11,
15], loss of
DMD expression leads to significant abnormalities. Moreover, the same abnormalities can occur in very distinct cells, e.g., calcium dys-homeostasis was found across multiple cells [
51], and the damaging purinergic phenotype affects myoblasts and lymphocytes [
33,
65]. Some old findings, such as platelet abnormalities [
68], have recently been vindicated [
69]. Yet, these defects cannot be clearly attributed to the loss of interaction between dystrophin and the known dystrophin-associated proteins. Indeed, these cell-autonomous defects appear to affect dystrophic cells, which, when healthy, express the 14-kb
DMD transcript, but were not shown to produce detectable levels of full-length dystrophins. This phenomenon, where expression of the 14-kb
DMD transcript in cells such as myoblasts and lymphocytes does not correlate with detectable dystrophin protein, has been known for decades but was disregarded as “illegitimate transcription” [
70].
Our data suggest that more attention should be given to the subtler DMD gene functions, beyond those causing the main symptoms of DMD. Such studies have the potential to identify new therapeutic targets for the treatment of this debilitating and still incurable disease. Moreover, given the poor survival of patients whose tumors downregulate dystrophin, the DMD gene may be important in oncology.