Pan-Cancer Analysis of the Genomic Alterations and Mutations of the Matrisome

The extracellular matrix (ECM) is a master regulator of all cellular functions and a major component of the tumor microenvironment. We previously defined the “matrisome” as the ensemble of genes encoding ECM proteins and proteins modulating ECM structure or function. While compositional and biomechanical changes in the ECM regulate cancer progression, no study has investigated the genomic alterations of matrisome genes in cancers and their consequences. Here, mining The Cancer Genome Atlas (TCGA) data, we found that copy number alterations and mutations are frequent in matrisome genes, even more so than in the rest of the genome. We also found that these alterations are predicted to significantly impact gene expression and protein function. Moreover, we identified matrisome genes whose mutational burden is an independent predictor of survival. We propose that studying genomic alterations of matrisome genes will further our understanding of the roles of this compartment in cancer progression and will lead to the development of innovative therapeutic strategies targeting the ECM.


Introduction
The advent of next-generation sequencing (NGS) techniques and the wealth of "big data" they have generated have revolutionized biomedical research and propelled the discovery of mechanisms underlying diseases [1], leading to the development of novel strategies to diagnose and care for patients. In recent years, The Cancer Genome Atlas (TCGA) has provided researchers with an unmatched set of genomics, epigenomics, transcriptomics, and clinical data [2], enabling the discovery of driver mutations and oncogenic signaling pathways [3], the probing of the immune landscape of tumors [4], and the correlation of genomic alterations with response to anti-cancer therapies [5].
While cancer research has mostly focused on the study of tumor-cell-intrinsic processes, the past few decades have seen an increased focus on the study of the tumor microenvironment and of the tumor extracellular matrix (ECM) [6][7][8]. The ECM is the complex and dynamic assembly of hundreds of proteins that regulates cellular metabolism and phenotypes and governs tissue formation and homeostasis [9]. The ECM is a major structural and functional component of the tumor microenvironment [10]. Desmoplasia, or ECM accumulation, is a characteristic feature of tumors, and a higher ECM content is often associated with poorer prognosis in a broad range of cancer types [11]. Moreover, all 10 hallmarks of cancers proposed by Weinberg and Hanahan [12] are under ... In addition, proteomics of the ECM of tumor xenografts has also shown that, while stromal cells and in particular cancer-associated fibroblasts are the main contributors to the production of the ECM of tumor microenvironments [26], tumor cells also produce and secrete ECM proteins [17,21,22]. Used to annotate transcriptomic data, it has helped shed light on the ECM contribution to specific cancer types, including high-grade serous ovarian cancer [25,27,28] or acute myeloid leukemia [29], or across cancers [30]. Importantly, in a recent study, we evaluated the level of expression of matrisome genes in 10,487 patients across 32 tumor types using TCGA data and demonstrated that matrisome gene expression can segregate different tumor types [31]. Last, "omic" technologies have uncovered ECM genes and proteins whose levels are predictive of cancer patient outcome [23,25,32-34].
However, while mutations in ECM genes have been linked to a plethora of diseases and syndromes [35], no study has focused on determining the presence and extent of genomic alterations and mutations in ECM genes in cancers, a crucial piece of information to further understanding of the tumor microenvironment (TME) [36].
Here, we sought to profile the genomic and the mutational landscapes of the matrisome using TCGA data. We focused our analysis on a panel of 14 of the most frequent solid cancer types occurring in diverse organs and projected to account for more than 1 million new cancer cases and more than 350,000 cancer deaths in the US in 2020 (Table 1) [37]. For our analysis, we retrieved data on 1014 of the 1027 human matrisome genes for 6740 patients and surveyed the nature and potential consequences of 4433 copy number alterations (CNAs) and 4497 mutations affecting matrisome genes (Table 2). We determined the impact of these genomic alterations (copy number alterations and mutations) on gene expression levels, predicted protein functions, and overall patient survival. Our results demonstrate that matrisome genes are subject to more copy number and mutational alterations than the rest of the genome and that mutations of matrisome genes are statistically more likely to have a functional impact. We further identified common core matrisome and matrisome-associated genes altered across multiple cancer types, and within these genes, we identified sequences encoding protein domains that accumulate more frequent mutations, which hints at the potential functional consequences these mutations could have on the multi-faceted roles of the ECM in cancer initiation and progression. Last, we report the identification of matrisome genes whose mutational burden correlates with overall survival, demonstrating the potential prognostic value of analyzing genomic features of the matrisome to predict cancer patient outcome.
Note: Cancer types are color-coded using the color of their respective awareness ribbon.

Copy Number Alterations in Matrisome Genes Are More Frequent than in the Rest of the Genome
We first sought to measure the extent and impact of copy number alterations (CNAs) on matrisome genes. To address this, we evaluated the frequency with which matrisome genes were subject to CNAs and showed that, overall, they tended to be more frequently and/or more extensively altered than the rest of the genome (Graphical Abstract and Figure 1).
We further determined the type of CNA that affected matrisome genes and stratified CNAs as low- or high-level copy number amplifications (Figure S1A,B and Table S1) and homozygous or single-copy deletions (Figure S1C,D and Table S1). To simplify interpretation, we binned data into groups (Q0-Q4) based on the percentage of samples showing CNAs in a given tumor: Q0 indicates that no CNAs were detected, Q1 (the first quartile) that CNAs are found in more than 0% and up to 25% of the samples, Q2 (the second quartile) that CNAs are found in 26-50% of the samples, etc. Results show that, in general, matrisome CNAs in a tumor follow the same quantitative trends as non-matrisome ones, though at times they are more abundant overall (e.g., Figure 1, BRCA, LUAD, and PAAD) and more quantitatively affected (e.g., gain in Q2 in Figure 1, LUSC). On the other hand, in some tumors, matrisome CNAs quantitatively decreased (e.g., Figure 1, ESCA, OV, and UCEC). These data suggest that matrisome genes tend towards a more dynamic copy number tolerance than the rest of the genome, the consequences of which have not been evaluated until now.
Cancers 2020, 12, 2046
Further breakdown of the data per matrisome gene category (Figure S2) shows the same general outlook: matrisome genes seem particularly tolerant of copy number alterations, with minor tumor-specific differences in the amount and type of CNAs within the matrisome categories. For example, we observed that the core matrisome (collagens, glycoproteins, and proteoglycans) tends to accumulate more CNAs than matrisome-associated genes in breast and lung neoplasms, while tending to accumulate fewer in colorectal, ovarian, and uterine tumors, possibly hinting at different contexts, and perhaps different selective pressure on the matrisome components, between these tumor types.
Figure 1. Bar charts represent the frequency of CNAs in matrisome genes (purple bars) and non-matrisome genes (rest of the genome, grey bars) across 14 different cancer types. Chi-square test p-values (calculated on the frequency per binned category) are indicated for each cancer type (* p < 0.05; ** p < 0.01; *** p < 0.001). Binned groups are represented by different shades of purple and grey to represent genes in which CNAs are found in x% of the samples: Q0 = 0% (lighter shade), 0% < Q1 ≤ 25%, 25% < Q2 ≤ 50% (darker shade). See also Figures S1 and S2, and Table S1.
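The binning and the chi-square comparison described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names and toy percentages are hypothetical, the bin boundaries follow the Figure 1 legend (Q3 and Q4 are extrapolated from "etc."), and only the chi-square statistic and degrees of freedom are computed, not the p-value.

```python
from collections import Counter

BINS = ["Q0", "Q1", "Q2", "Q3", "Q4"]


def cna_bin(percent_altered):
    """Map the percentage of samples carrying a CNA in a gene to a bin label."""
    if not 0 <= percent_altered <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_altered == 0:
        return "Q0"
    if percent_altered <= 25:
        return "Q1"
    if percent_altered <= 50:
        return "Q2"
    if percent_altered <= 75:
        return "Q3"  # assumed boundary, extrapolated from the legend
    return "Q4"      # assumed boundary, extrapolated from the legend


def bin_counts(percentages):
    """Count how many genes fall in each bin, in the order Q0..Q4."""
    counts = Counter(cna_bin(p) for p in percentages)
    return [counts.get(label, 0) for label in BINS]


def chi_square_statistic(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    # Columns that are empty in both rows carry no information and are dropped.
    cols = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one observation")
    total = total_a + total_b
    stat = 0.0
    for a, b in cols:
        col_total = a + b
        expected_a = total_a * col_total / total
        expected_b = total_b * col_total / total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat, len(cols) - 1


# Toy input: per-gene percentage of samples with a CNA (hypothetical values).
matrisome_pct = [0, 10, 30, 30, 60]
rest_pct = [0, 0, 5, 10, 20, 30]
matrisome_counts = bin_counts(matrisome_pct)
rest_counts = bin_counts(rest_pct)
statistic, dof = chi_square_statistic(matrisome_counts, rest_counts)
```

Converting the statistic to a p-value requires the chi-square distribution with the returned degrees of freedom, which a statistics library would normally provide.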

Consequences of CNAs on Matrisome Gene Expression Levels
We next sought to determine the potential consequences of CNAs on matrisome gene expression levels. While the majority of alterations were predicted to have no impact on expression levels (Figure 2A), we identified, for each cancer type, subsets of core matrisome and matrisome-associated genes whose CNAs have either a significantly positive impact on gene expression, i.e., genes showing a 0.5-fold (or 50%) higher expression in patients with CNAs versus patients with no CNAs (Figure 2B,C), or a significantly negative impact on gene expression, i.e., genes showing a 0.5-fold (or 50%) lower expression in patients with CNAs versus patients with no CNAs (Figure 2B,D). Among these, the majority of genes whose CNAs significantly impacted expression were matrisome-associated genes (Figure 2C,D), suggesting that functional and signaling elements within the matrisome (ECM remodeling enzymes, cytokines and chemokines, growth factors, etc.) are more frequently targeted by CNA-dependent expression regulation than structural genes (collagens, glycoproteins, and proteoglycans), in line with the pan-cancer observations by Shao et al. [38]. Additionally, these findings might also be explained by the high number of paralogs of core matrisome genes, which might act as a buffer to preserve the functionality of this compartment.
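To make the 50% threshold concrete, the following Python sketch classifies the effect of CNAs on the expression of a single gene. It rests on assumptions not stated in this section: expression values are taken on a linear (not log-transformed) scale, the comparison uses group means, and the significance test is omitted. The function name is hypothetical.

```python
from statistics import mean


def cna_expression_effect(expr_with_cna, expr_without_cna, threshold=0.5):
    """Return (relative change, label) for one gene.

    The relative change is (mean with CNA - mean without CNA) / mean without CNA,
    so +0.5 means 50% higher expression in patients carrying a CNA.
    """
    if not expr_with_cna or not expr_without_cna:
        raise ValueError("both patient groups need at least one value")
    baseline = mean(expr_without_cna)
    if baseline <= 0:
        raise ValueError("baseline expression must be positive")
    change = (mean(expr_with_cna) - baseline) / baseline
    if change >= threshold:
        label = "positive"
    elif change <= -threshold:
        label = "negative"
    else:
        label = "none"
    return change, label


# Toy expression values for one gene (hypothetical, linear scale).
up = cna_expression_effect([15, 15], [10, 10])
down = cna_expression_effect([4, 6], [10, 10])
flat = cna_expression_effect([11], [10])
```

In a full analysis, the label would only be retained for genes that also pass a statistical test comparing the two patient groups.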

Matrisome Genes Are Significantly More Susceptible to Be Mutated
Matrisome genes, and in particular core matrisome genes encoding structural components of the ECM such as collagens, are significantly longer than other genes (Figure 3A) and are thus expected to accumulate more mutations than the rest of the genome. This observation prompted us to compute the number of mutations normalized by gene length (see Section 3), which revealed that matrisome genes accumulate significantly more mutations than the rest of the genome, both per unit of gene length and in terms of the overall number of genes involved (Graphical Abstract, Figure 3B, Figure S3, and Table S2A-D). This suggests that matrisome sequences might be subject, at the genomic level, to a lower selective pressure on these mutations (exerted, for example, by immune cells), to a higher fitness as local mutators that act as a buffer preserving the global genomic information, or to both [38][39][40]. In line with this, we found a higher overall mutational burden in the matrisome compartment in comparison to the rest of the genome (Figure S3), but a lower recurrence of mutations, with most mutations in matrisome genes found in only one patient (Table S2C,D). Of note, for three cancer types, cutaneous melanoma, stomach adenocarcinoma, and endometrial carcinoma, we found a subset of mutations in matrisome genes present in more than five patients (Table S2C). In light of our findings on CNAs, we can speculate that the selective pressure on these mutations might be counteracted and dispersed by the high number of matrisome gene paralogs, which might further point to a role for matrisome gene mutations as local mutators or interactors rather than cancer drivers.
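The reason for normalizing by gene length can be shown with a short Python sketch. The exact normalization is described in Section 3 and is not reproduced here; the unit used below (mutations per kilobase), the function names, and the gene names and numbers are assumptions for illustration only.

```python
def mutations_per_kb(mutation_count, gene_length_bp):
    """Mutation count divided by gene length expressed in kilobases."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return mutation_count / (gene_length_bp / 1000)


def length_normalized_burden(mutation_counts, gene_lengths_bp):
    """Return {gene: mutations per kb}; genes without recorded mutations get 0."""
    return {
        gene: mutations_per_kb(mutation_counts.get(gene, 0), length)
        for gene, length in gene_lengths_bp.items()
    }


# Hypothetical genes: GENE_A is long and carries many mutations in absolute terms,
# GENE_B is short and carries few, yet GENE_B is the more densely mutated one.
counts = {"GENE_A": 30, "GENE_B": 6}
lengths = {"GENE_A": 60000, "GENE_B": 3000, "GENE_C": 2000}
burden = length_normalized_burden(counts, lengths)
```

Ranking genes by raw counts would place GENE_A first, while ranking by the normalized value places GENE_B first, which is why long genes such as collagens need this correction before being compared with the rest of the genome.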
While most of the mutations identified were specific to only one patient or to a few patients within a single tumor type (Table S2), we could nonetheless identify potential "hotspot" mutations, defined as occurring in at least five patients per tumor type and in at least two different tumor types, for six genes (Figure S4). Among these are PXDN and FBN2, encoding the core ECM glycoproteins peroxidasin and fibrillin 2, whose roles in different cancers have already been discussed [41,42], though no evidence of an impact of mutations on these proteins has been presented.
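The hotspot definition given above (at least five patients per tumor type, in at least two tumor types) translates directly into a filter. The Python sketch below is illustrative only: the record layout, the function name, and the mutation identifiers are hypothetical.

```python
from collections import defaultdict


def find_hotspots(records, min_patients=5, min_tumor_types=2):
    """Return {mutation_id: sorted tumor types} for mutations meeting both criteria.

    records is an iterable of (tumor_type, patient_id, mutation_id) tuples.
    """
    # Collect the distinct patients carrying each mutation in each tumor type.
    patients = defaultdict(set)
    for tumor_type, patient_id, mutation_id in records:
        patients[(mutation_id, tumor_type)].add(patient_id)

    # Keep, per mutation, the tumor types with enough affected patients.
    qualifying = defaultdict(set)
    for (mutation_id, tumor_type), carriers in patients.items():
        if len(carriers) >= min_patients:
            qualifying[mutation_id].add(tumor_type)

    return {
        mutation_id: sorted(tumor_types)
        for mutation_id, tumor_types in qualifying.items()
        if len(tumor_types) >= min_tumor_types
    }


# Toy records: mut1 reaches five patients in two tumor types, mut2 in only one.
toy_records = []
for i in range(5):
    toy_records.append(("LUAD", f"luad_{i}", "mut1"))
    toy_records.append(("BRCA", f"brca_{i}", "mut1"))
    toy_records.append(("LUAD", f"luad_{i}", "mut2"))
for i in range(4):
    toy_records.append(("BRCA", f"brca_{i}", "mut2"))
hotspots = find_hotspots(toy_records)
```

Using sets of patient identifiers means that a mutation reported twice for the same patient is counted once.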
We further interrogated the molecular nature of the mutations and found that, for all matrisome gene categories and for most cancer types, the majority (>~70%) were transitions, i.e., the interchange of one purine for another (A/G) or of one pyrimidine for another (C/T) (Figure S5). Two noticeable deviations are lung adenocarcinomas and skin cutaneous melanomas. Interestingly, for the former, the frequency of transversions (the replacement of a purine by a pyrimidine or vice versa, a hallmark of the carcinogenic effects of smoking on genes [43]) and the frequency of transitions were similar, and this was consistent across all matrisome gene categories (Figure S5). For melanoma, the carcinogenic effects of ultraviolet-A and -B wavelengths suffice to explain the increased number of transitions. Together, these observations suggest that local genomic variance in the matrisome probably arises later in carcinogenic evolution than the primary effects of driver events.
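The distinction between transitions and transversions can be expressed in a few lines of Python. This is a generic sketch of the standard definition, not the pipeline used in this study; the function names and the toy substitutions are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-nucleotide substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so there is no substitution")
    # Same chemical class (purine to purine, pyrimidine to pyrimidine): transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def transition_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transitions."""
    labels = [classify_substitution(ref, alt) for ref, alt in substitutions]
    if not labels:
        raise ValueError("at least one substitution is required")
    return labels.count("transition") / len(labels)


# Toy substitutions: three transitions and one transversion.
toy_substitutions = [("A", "G"), ("C", "T"), ("C", "A"), ("G", "A")]
toy_fraction = transition_fraction(toy_substitutions)
```

Applied per cancer type and matrisome category, such a fraction is the quantity compared against the roughly 70% figure reported above.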
When looking at the type of mutations affecting matrisome genes, we found that the majority of mutations across all cancer types were missense mutations (~50%), followed by silent mutations (~25%) (Figure 3C). We also observed a high percentage of frameshift deletions in breast cancer (BRCA), while cervical squamous cell carcinomas and endocervical adenocarcinomas (CESC) and esophageal carcinomas (ESCA) accumulated a large number of mutations in the 3′ UTR of matrisome genes. In addition, when looking specifically at the frequency of mutation types per matrisome gene category and cancer type, we observed that matrisome genes, and particularly secreted factors, presented frequent mutations affecting splicing sites in cervical squamous cell carcinomas and endocervical adenocarcinomas (CESC), esophageal carcinomas (ESCA), and uterine carcinosarcoma (UCS) (Figure S6). This is of particular relevance, since there exist multiple examples of alternative splicing of ECM genes (e.g., fibronectin, tenascin) or growth factors resulting in the production of isoforms only reported to be expressed in pathological conditions such as wound healing and cancers [44][45][46], and these splice variants have been proposed to serve as biomarkers or anchors to selectively target drugs or biological agents to tumors [47][48][49].

Mutated Protein Domains and Potential Consequences on ECM Protein Functions
ECM proteins present a characteristic domain-based organization that supports their scaffolding properties via ECM protein-protein interactions and their signaling properties via ECM/growth factor interactions and ECM/ECM-receptor interactions [18,50,51]. These protein domains initially served as the basis for the in-silico prediction of the matrisome component via sequence analysis [17]. We thus sought to determine whether mutations in matrisome genes occurred preferentially in certain protein domains and/or preferential sites and whether we could infer the possible impact of such mutations on protein folding, protein complex assembly, or signaling functions. We focused our survey on the top 20 most mutated domains in each of the 14 cancer types studied and identified 46 unique protein domains with the highest mutation frequency (Figure 4 and Table S3). Of these, 19 are core-matrisome-defining protein domains and 15 are matrisome-associated-protein-defining domains [17]. While some of these domains are not exclusively found in ECM proteins, others, such as the laminin G and laminin N-terminal domains, the zona pellucida domain, or the NIDO domain, are specific to matrisome components.
Figure 4. The color code indicates domains originally used to predict core matrisome proteins (purple) and matrisome-associated proteins (coral) (see [17] for details on matrisome-defining domains). The diameter of the bubbles is proportional to the mutation frequency of each domain in each cancer type. See also Table S3.

Functional Consequences of Matrisome Gene Mutations
Using the "Polymorphism Phenotyping v2" (PolyPhen-2) algorithm predictions as previously reported [52], we next evaluated the impact of mutations at the protein level. As compared to mutations affecting non-matrisome genes, we found that mutations of matrisome genes are statistically slightly more likely to have a functional impact (Figure 5A). Further investigations into predicted effects for the different matrisome categories show major differences, with ECM proteins (collagens, proteoglycans, and ECM-affiliated molecules) having a proportionally much greater burden of mutations with unclear/unknown effects as compared to, for example, ECM-associated proteins such as metalloproteinases or growth factors (Figure 5B).
While further investigations will be required to explain this observation, we can hypothesize that the highly modular structure of core matrisome genes and proteins, in particular collagens, can potentially absorb more mutations without suffering functional damage than genes with a much simpler organization, such as ECM regulators and secreted factors. Moreover, how these mutations affect post-translational modifications and three-dimensional protein conformation remains to be addressed. This general trend holds true when subsetting results by matrisome gene category and cancer type (Figure S7), suggesting no specialization in the functional type of mutations occurring within the matrisome of different tumor types.

Identification of the Top 10 Most Mutated Matrisome Genes Across 14 Cancer Types
Our analysis reveals that several matrisome genes are frequently mutated across tumors, though with different specific mutations (Figure 6A and Table S4). Notably, the ten most mutated matrisome genes per tumor type overlap regardless of tissue- or cell-of-origin patterns (Figure 6A and Table S4). This is, for example, the case of mucin 16 (MUC16) or filaggrin (FLG), which are mutated in all 14 tumors analyzed, or of hemicentin 1 (HMCN1), mucin 5B (MUC5B), or reelin (RELN), mutated in 12/14 tumors. These genes, however, are also the largest of all matrisome genes, and it is thus unsurprising to find them topping the Pan-Cancer matrisome mutational burden chart. Interestingly, we again observed a differential distribution in the accumulation of mutations between core matrisome and matrisome-associated genes, with the latter (which include the mucins) being more frequently represented at the top position across all cancers considered.
Among the core matrisome, the most frequently mutated gene across cancer types is HMCN1, which encodes the glycoprotein hemicentin-1, also known as fibulin-6 (FBN6), a member of the fibulin protein family and a component of basement membranes [53] (Figure 6B). While the importance of HMCN1 in cancer progression remains unclear, our data suggest a wider contribution to oncological processes than previously reported [54,55]. Of note, our survey did not find any correlation between the mutational burden in HMCN1 and cancer patient survival.
Similarly, especially considering the impact on patient survival (see Figure 7), our data suggest that mutations within the mucin genes, especially MUC16 and MUC5B, are worth further assessment for their potential prognostic use, again expanding on observations from previous reports [56].

Consequences of Matrisome Gene Mutations on Patient Survival
Finally, we evaluated the consequences of mutational burden in matrisome genes (at the whole-gene level as well as at the domain level) on patient survival, focusing on genes with a concordant effect per se (univariate analysis) and after correcting for age, sex, and ethnicity in multivariate analyses. Figure 7 depicts the prognostic value of two core matrisome genes, COL6A1 and LAMB3, and of two matrisome-associated genes, MUC5B and MUC16, whose mutational burden significantly correlated either negatively (Tables 3A and 4A) or positively (Tables 3B and 4B) with overall survival in at least two cancer types: colorectal cancer and melanoma for COL6A1, lung adenocarcinoma and stomach adenocarcinoma for LAMB3, melanoma and uterine corpus endometrial carcinoma for MUC16, and lung adenocarcinoma and uterine corpus endometrial carcinoma for MUC5B. More globally, our results show that, independent of the matrisome category to which these genes belong (core matrisome, Table 3, Figure 7, and Table S5A; or matrisome-associated, Table 4, Figure 7, and Table S4A), the prognostic value of their mutational burden depends on the gene itself, hinting at the consequences of mutations on the functions of the respective protein. The same holds true for matrisome protein domains (Table S5B). However, in both the gene-centric and the domain-centric analyses, we observed that mutations in the tumor matrisome are much more likely to associate with increased overall survival (approximately 61% of genes and 81% of domains reported in Table S5), supporting the idea that mutations in the tumor matrisome disrupt the organization of the tumor microenvironment and disadvantage neoplastic cells by taking away structural cues they require for extensive growth, spreading, and metastasis.
This observation may provide another explanation for the low recurrence of matrisome mutations found across tumors, though the lack of time coordinates within the TCGA data and their bulk rather than single cell structure prevented us from testing this further (see Section 4).

Cross-Validation Using Independent Cancer Patient Cohorts
While cross-cohort comparisons can be hindered by the composition of the cohorts (mixed population background, age, etc.) and by the differing end-points used, and tend to mask rarer mutations (such as the ones reported here), we further sought to validate our observations in other, comparably large cohorts of cancer patients. We focused on the four genes, COL6A1, LAMB3, MUC5B, and MUC16, for which we showed that mutational burden had a prognostic value for patient survival (Figure 7) and interrogated a large collection of samples from patients and cells from 178 cohorts available via the cBioPortal (see Section 3). We observed a wide variation in the number of cases harboring CNAs or mutations in these genes (Figure S8A and Table S6A-D). We also observed an overall low occurrence and recurrence of CNAs and mutations for each of these genes (see the peak of the density plots around the 0 value in Figure S8A) and a higher number of studies with cases affected by CNAs or mutations for MUC5B and MUC16 than for COL6A1 and LAMB3 (Table S6A-D). Importantly, both observations are in line with our findings on matrisome mutational frequencies in TCGA.
We further sought to cross-validate the prognostic value of these genes in a combined cohort of adult patients from the TCGA and Genotype-Tissue Expression (GTEx) project cohorts and pediatric patients from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) cohort and KidsFirst initiatives (Figure S8B). Interestingly, we observed significant associations with survival for both MUC5B and MUC16 and a borderline association for COL6A1, which is similar to what we observed in the pan-cancer TCGA cohort (Figure 7), with MUC5B mutations associating with better survival and COL6A1 and MUC16 mutations associating with poorer survival (Figure S8B).

Source Data
All data except the matrisome gene list and Pan-Cancer tumor purity values were sourced from the harmonized TCGA Pan-Cancer Atlas resource and downloaded from the University of California, Santa Cruz (UCSC) Xena Browser hub (http://xenabrowser.net/). Online analyses for cross-validation purposes were performed through cBioPortal (https://www.cbioportal.org/) and the Xena Browser. The following files were downloaded for further analysis.

Matrisome Gene List
The human matrisome data were downloaded from the Matrisome Project data browser (http://matrisome.org/) [19]. In order to assist with the identification and classification of genes encoding proteins found within the ECM, we previously defined the matrisome as the collection of genes encoding structural elements of the ECM ("core matrisome") and genes encoding proteins either structurally or functionally associated with the ECM ("matrisome-associated") [17,18,57]. We further divided these divisions of the matrisome into categories: the core matrisome is composed of collagens, proteoglycans, and other ECM glycoproteins, while the matrisome-associated division is composed of proteins structurally or functionally affiliated with ECM proteins, ECM-remodeling enzymes and their regulators ("ECM regulators"), and secreted factors [17,18,57].

Copy Number Alterations (CNAs)
Gene-level copy number (gistic2_thresholded), Xena identifier: TCGA.PANCAN.sampleMap/Gistic2_CopyNumber_Gistic2_all_thresholded.by_genes. TCGA pan-cancer gene-level copy number alterations (CNAs) were estimated using the Genomic Identification of Significant Targets in Cancer 2 (GISTIC2) threshold method, compiled using data from all TCGA cohorts. Copy number was measured experimentally using whole-genome microarrays at a TCGA genome characterization center. Subsequently, the GISTIC2 method was applied using the TCGA FIREHOSE pipeline to produce gene-level copy number estimates. GISTIC2 further thresholded the estimated values to -2, -1, 0, 1, and 2, representing homozygous deletion, single copy deletion, diploid normal copy, low-level copy number amplification, and high-level copy number amplification, respectively. Genes were mapped onto the human genome coordinates using UCSC cgData HUGO probeMap [58].
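As an illustration of how the thresholded GISTIC2 values described above can be read, the following minimal Python sketch tallies CNA classes for one gene. It is not the code used in this study (which was written in R); the function names and the input format (a plain list of per-sample thresholded calls) are assumptions made for the example.

```python
# Labels for the five GISTIC2 thresholded values, as listed in the text above.
GISTIC2_LABELS = {
    -2: "homozygous deletion",
    -1: "single copy deletion",
    0: "diploid normal copy",
    1: "low-level copy number amplification",
    2: "high-level copy number amplification",
}


def summarize_cna_calls(calls):
    """Count samples per CNA class for one gene.

    `calls` is a hypothetical list of thresholded values (-2..2),
    one value per sample.
    """
    counts = {label: 0 for label in GISTIC2_LABELS.values()}
    for value in calls:
        counts[GISTIC2_LABELS[int(value)]] += 1
    return counts


def altered_fraction(calls):
    """Fraction of samples whose call differs from the diploid state (0)."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(1 for value in calls if int(value) != 0) / len(calls)
```

For a gene with calls `[-2, 0, 0, 1, 2, 2]`, `summarize_cna_calls` reports two high-level amplifications and `altered_fraction` returns the share of non-diploid samples.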

Clinical Data
Curated clinical data, Xena identifier: Survival_SupplementalTable_S1_20171025_xena_sp. These data were derived from the integration of the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) [60].

Pan-Cancer Purity Data
TCGA Pan-Cancer tumor purity data (consensus measurement of purity estimations, CPE) were obtained from Aran et al. [61].

Cross-Validation Data
The prevalence of mutations in COL6A1, LAMB3, MUC5B, and MUC16 across 178 studies was evaluated via cBioPortal (http://www.cbioportal.org/) [62,63]. Further assessments on the effect of the mutational burden of these genes on patient survival were conducted in the integrated "TCGA TARGET GTEx KidsFirst" cohort, available via the Xena Browser.

Statistical Analysis
All analyses were performed in The R Project for Statistical Computing (R) and were restricted to the following tumor types . Quantitative categorical differences were tested using a two-sided Chi-square test, while quantitative numerical differences were tested using a two-sided Mann-Whitney U test.
To calculate the effect of CNAs on transcription, we selected only those CNAs for which the expression level of the affected gene was at least 50% higher or lower in carriers than in non-carriers.
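The selection rule above can be sketched in Python as follows. This is an illustrative reading, not the study's R code: it assumes that expression is compared on a linear scale and that group means are used as the summary, neither of which is specified in the text. The function name and the `min_change` parameter are hypothetical.

```python
from statistics import mean


def cna_affects_expression(carrier_expr, noncarrier_expr, min_change=0.5):
    """Return True if expression in CNA carriers differs from non-carriers
    by at least `min_change` (0.5 = 50%), in either direction.

    Assumptions (not stated in the source): linear-scale expression values
    and the arithmetic mean as the per-group summary.
    """
    if not carrier_expr or not noncarrier_expr:
        return False
    baseline = mean(noncarrier_expr)
    if baseline <= 0:
        return False  # a relative change is undefined without a positive baseline
    relative_change = (mean(carrier_expr) - baseline) / baseline
    return abs(relative_change) >= min_change
```

With these assumptions, carriers averaging 15 units against non-carriers averaging 10 units pass the filter (a 50% increase), whereas carriers averaging 12 units do not.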
Differences in the number of mutations normalized by gene length in the matrisome vs. non-matrisome were tested by the Mann-Whitney U test and by further randomization tests. These included (1) 1000 tests against a random ~33% of all non-matrisome human genes, (2) 1000 tests against random non-matrisome human gene sets, each the same size as the number of mutated matrisome genes, and (3) 1000 tests against random non-matrisome human gene sets, each composed of genes longer than the average length of matrisome genes. Gene lengths were pulled from the "goseq" library in R. For matching mutations and protein domains, we first pulled protein domain coordinates for matrisome genes using the Ensembl database and the R libraries "ensembldb" and "EnsDb.Hsapiens.v86" and then mapped each mutation onto the domains using a "between" SQL query implemented in the R library "sqldf".
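The mutation-to-domain matching described above relies on a range join. A minimal sketch of such a "between" query, using Python's built-in sqlite3 module instead of the R library "sqldf", is shown below. The table layout, column names, and example identifiers are assumptions for illustration and do not reproduce the study's actual schema.

```python
import sqlite3


def map_mutations_to_domains(mutations, domains):
    """Assign each mutation to the protein domain(s) that contain it.

    mutations: list of (gene, protein_position) tuples (hypothetical layout)
    domains:   list of (gene, domain_name, start, end) tuples (hypothetical layout)
    Returns a list of (gene, protein_position, domain_name) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mutations (gene TEXT, pos INTEGER)")
    con.execute(
        "CREATE TABLE domains "
        "(gene TEXT, domain TEXT, dom_start INTEGER, dom_end INTEGER)"
    )
    con.executemany("INSERT INTO mutations VALUES (?, ?)", mutations)
    con.executemany("INSERT INTO domains VALUES (?, ?, ?, ?)", domains)
    # BETWEEN is inclusive on both ends, so mutations at a domain
    # boundary are counted as falling inside the domain.
    rows = con.execute(
        """
        SELECT m.gene, m.pos, d.domain
        FROM mutations AS m
        JOIN domains AS d
          ON m.gene = d.gene
         AND m.pos BETWEEN d.dom_start AND d.dom_end
        ORDER BY m.gene, m.pos, d.domain
        """
    ).fetchall()
    con.close()
    return rows
```

Mutations that fall outside every annotated domain, or in genes without domain annotations, are simply absent from the result, because the query uses an inner join.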
Hotspot mutations were defined as those occurring at least five times per tumor type in at least two different tumor types.
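The hotspot definition above translates into a simple counting rule, sketched here in Python. The sketch assumes that a mutation is identified by its gene and protein change and that the input holds one record per mutated sample; both are illustrative choices, as the text does not specify them.

```python
from collections import Counter


def find_hotspots(records, min_per_tumor=5, min_tumor_types=2):
    """Identify hotspot mutations.

    records: iterable of (tumor_type, gene, protein_change) tuples,
             one per mutated sample (hypothetical input layout).
    A mutation is a hotspot if it occurs at least `min_per_tumor` times
    within a tumor type, in at least `min_tumor_types` tumor types.
    Returns {(gene, protein_change): sorted list of qualifying tumor types}.
    """
    per_tumor = Counter(records)
    qualifying = {}
    for (tumor, gene, change), n in per_tumor.items():
        if n >= min_per_tumor:
            qualifying.setdefault((gene, change), set()).add(tumor)
    return {
        mutation: sorted(tumors)
        for mutation, tumors in qualifying.items()
        if len(tumors) >= min_tumor_types
    }
```

A mutation seen nine times in one tumor type but only four times in a second one is not reported, since it reaches the per-tumor threshold in a single tumor type only.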
Mutation effects on overall survival (at the whole-gene or domain level) were modelled in univariate (Kaplan-Meier) and multivariate (Cox proportional hazard) analyses, the latter also including age at diagnosis, gender, and ethnicity as covariates. Mutations were filtered before analysis to remove entries with fewer than 10 patients in at least one tumor, and analyses were performed for overall survival (OS). Further data can be provided on request to the Authors or by running the relevant code section (see Sections 3.9 and 3.10). Only genes/domains with a significant concordant effect in both analyses are reported.
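For readers unfamiliar with the univariate step, the Kaplan-Meier product-limit estimator can be sketched in a few lines of Python. This is a didactic illustration only, not the study's R implementation; the function name and input format are assumptions, and the sketch covers neither the log-rank comparison between carriers and non-carriers nor the Cox model.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) pairs, one per distinct
    time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= removed
    return curve
```

In practice, the curve would be computed separately for carriers and non-carriers of mutations in a given gene or domain, and the two curves then compared.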
To assess the potential effect of tumor purity on CNAs and mutations, we retrieved consensus measurement of purity estimations (CPEs, available for 11 of the 14 tumor types studied here) and estimated effects using generalized linear models (GLMs).
In all analyses, a p value < 0.05 was chosen as the threshold for reporting significant results.

Data Availability
All starting data are freely available and downloadable from the sources noted above (see Section 3.1).
The same data are enclosed in a freely accessible Zenodo repository (10.5281/zenodo.3941354). All results tables can be obtained from the authors upon request.

Conclusions
This first survey of the genomic and mutational landscape of the cancer matrisome has uncovered the interesting, and yet perhaps unexpected, extent and consequences of copy number and mutational alterations of matrisome genes in a panel of 14 solid tumor types.
Of note, TCGA data were collected from bulk tumor samples and, more specifically, mostly from tumor cells, as previously shown [61] and further validated here (Figure S9 and Table S7). We can thus confidently map our findings to tumor cells rather than to other cells of the tumor microenvironment (TME). In this respect, we observed that the presence of possible impurities (i.e., other cell types of the TME) is a marginal confounder for CNAs and a negligible one for mutations. Further acknowledging that the number of CNAs and mutations in the cancer genome and the sample composition in terms of tumor/TME fractions are neither linearly nor directly associated [61,64], the estimates we report for their interactions probably exceed their true extent.
While we have previously shown that tumor cells do secrete ECM proteins, cancer-associated fibroblasts are the main ECM producers and remodelers. In addition, there is now an increased recognition of the impact of cellular and microenvironmental heterogeneity on tumor progression, metastasis formation, and response to treatment. Future studies should thus focus on elucidating the presence and roles of matrisome CNAs and mutations in the different cell populations found in the tumor microenvironment, the timeline of their occurrence, and, importantly, the loco-regional distribution of mutated ECM proteins within the tumor microenvironment.
Future studies are also necessary to decipher the functional consequences of the mutations identified here. One possibility is that mutations in matrisome genes, and more specifically located in sequences encoding protein domains, can affect protein/protein (e.g., ECM protein/ECM protein, ECM/growth factor, ECM/enzyme, ECM protein/ECM receptor) interactions and subsequently alter biochemical and mechanical signaling, leading to dysregulation of cellular phenotypes and eventually to cancer progression. Additionally, with recent reports highlighting the impact of the ECM on immune cells within the tumor microenvironment [30], mutations in matrisome genes could also result in the generation of neo-antigens and thus rewire the immune response.
Last, our analysis also had the power to identify matrisome genes whose mutational burden was an independent predictor of overall survival. It would thus be interesting to expand our survey to the study of specific genes that could predict disease-specific or metastasis-free survival and compute whether mutational burden in certain matrisome genes correlates with the variation of the progression-free interval.
We believe that our results are a starting point for the more extensive mapping of clinically relevant matrisome gene alterations and can be used to prioritize further investigations that may lead to significant translational applications to improve cancer patient care.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/8/2046/s1, Figure S1: Copy number alterations of matrisome genes across 14 different cancer types broken down by CNA type (related to Figure 1), Figure S2: Copy number alterations of matrisome genes across 14 different cancer types broken down by CNA type and matrisome gene category (related to Figure 1), Figure S3: Number of mutations per matrisome gene length and matrisome gene category (related to Figure 3), Figure S4: Identification of potential mutational hot spots in matrisome genes, Figure S5: Type of mutations per matrisome gene category and cancer type, Figure S6: Location and type of mutations per matrisome gene category and cancer type, Figure S7: Prediction of mutational effects per matrisome gene category and cancer type (related to Figure 5), Figure S8: Cross-validation using independent cancer patient cohorts, Figure S9: Purity of samples assessed across the TCGA Pan-Cancer cohort, Table S1. Consequences of CNAs on matrisome gene expression levels, Table S2. Frequency and recurrence of mutations of matrisome genes, Table S3. Top 20 most frequently mutated domains in ECM proteins, Table S4. Top 10 most mutated matrisome genes, Table S5. Effect of mutations on univariate and multivariate survival at the ECM gene level (A) and ECM protein-domain level (B), Table S6