Multi-Omic Approaches in Colorectal Cancer beyond Genomic Data

Colorectal cancer (CRC) is one of the most frequent tumours and one of the major causes of morbidity and mortality globally. Its incidence has increased in recent years and could be linked to unhealthy dietary habits combined with environmental and hereditary factors, which can lead to genetic and epigenetic changes and induce tumour development. The model of CRC progression has always been based on a genomic, parametric, static and complex approach involving oncogenes and tumour suppressor genes. Recent advances in omics sciences have sought a paradigm shift to a multiparametric, immunological-stromal, and dynamic approach for a better understanding of carcinogenesis and tumour heterogeneity. In the present paper, we review the most important preclinical and clinical data and present recent discoveries in the field of transcriptomics, proteomics, metagenomics and radiomics in CRC disease.


Introduction
Colorectal cancer (CRC) is one of the most frequent tumours and one of the major causes of morbidity and mortality globally. Its incidence has increased in recent years, and it is now the third most common cancer worldwide, with approximately one million new cases per year, particularly in developed countries [1]. This increase could be linked to risk factors such as unhealthy dietary habits, stress, smoking and a sedentary lifestyle that, combined with environmental and hereditary factors [2], can lead to genetic and epigenetic changes in normal epithelial cells and induce tumour development. The polyp-to-cancer progression sequence described by Fearon and Vogelstein is considered a parametric, static and complex model involving oncogenes (e.g., Kirsten rat sarcoma viral oncogene (KRAS), neuroblastoma RAS viral oncogene homolog (NRAS), V-raf murine sarcoma viral oncogene homolog B1 (BRAF), and phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA)), tumour suppressor genes (e.g., tumour protein P53 (TP53), adenomatous polyposis coli (APC), phosphatase and tensin homolog (PTEN)) and pathognomonic signalling pathways that modulate cell differentiation, proliferation and apoptosis in CRC: Wnt/β-catenin, epidermal growth factor receptor (EGFR), mitogen-activated protein kinase (MAPK), transforming growth factor beta (TGF-β) and phosphoinositide 3-kinase (PI3K) [3]. Despite multiple efforts to better understand tumorigenesis, the lack of new biomarkers and tumour heterogeneity still pose many unresolved challenges. Since the Human Genome Project, omics sciences have revolutionised the study of CRC.
Transcriptomics, proteomics, metagenomics, metabolomics and radiomics contribute to a paradigm shift towards a multiparametric, dynamic, immunological and stromal model that allows for a better understanding of CRC development as well as its classification into different molecular subtypes for patient stratification and the development of new biomarkers and targeted therapies. This review highlights the contributions of transcriptomics, proteomics, metagenomics and radiomics over the last few years in building a multi-omics model for a better understanding of tumour development and heterogeneity to ensure optimal treatment of CRC.
population should be tested. Molecularly driven therapy with the selective TRK inhibitor larotrectinib and multikinase inhibitors with activity against these fusion proteins has been tested in selected populations with NTRK fusions. Finally, RET fusions have been found in a small fraction of patients with CRC (<1%), predominantly in right-sided, RAS and BRAF wild-type tumours, and carry a worse prognosis compared to patients without RET fusions [19][20][21].
Although several pathways are involved in CRC carcinogenesis, TCGA data showed that 93% of tumours carried alterations in the WNT/β-catenin pathway through inactivation of the APC gene or activating mutations of the CTNNB1 gene, as well as alterations of negative regulatory genes such as ARID1A and FAM123B. Genetic alterations in the RAS-MAPK and PI3K pathways were reported as the second most common, not only due to overexpression of the insulin-like growth factor 2 (IGF2) gene, but also due to mutations in PIK3CA and in the RAS gene family. Another pathway deregulated in CRC is the TGF-β pathway, where genomic alterations were found in the TGFBR1, SMAD4-SMAD3-SMAD2 and ACVR2A genes [6,22-24].
Defects in DNA repair genes can occur due to germline mutations, somatic mutations or gene silencing leading to biallelic inactivation of these genes. The DNA mismatch repair machinery consists of four proteins (MSH2, MSH6, MLH1 and PMS2); when any of them loses its function, a cascade of events favouring the development of CRC is triggered [25,26]. Immunohistochemistry (IHC) and genomic sequencing tools can be used for diagnosis; IHC detects the loss or absence of any of these proteins, while PCR can analyse the corresponding microsatellite loci (BAT-25, BAT-26, D2S123, D5S346 and D17S250) [27]. Microsatellite instability occurs in 5% of metastatic colon tumours and 15% of primary tumours and may have predictive and prognostic value, respectively. The combination and sum of these alterations lead to a genetic classification of CRC carcinogenesis, and two phenotypes can be identified: (1) the CIN phenotype (65-70%), which is related to defects in chromosomal segregation, telomeric stability and mutations in APC, KRAS and TP53; and (2) the MSI phenotype (15%), which comprises tumours that are hypermutated as a result of a defective DNA mismatch repair (MMR) system. This categorisation provides a linear model of carcinogenesis that has raised several unresolved questions about tumour heterogeneity.
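The two diagnostic routes described above (IHC for the four MMR proteins, PCR for the five microsatellite loci) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the PCR cut-off (two or more unstable loci for MSI-high, one for MSI-low, none for microsatellite stable) is an assumption taken from the convention usually applied to five-marker panels; it is not stated in the text above. The sketch is illustrative only and is not a clinical tool.

```python
from typing import Dict

# The four mismatch repair proteins and five loci named in the text.
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")
MSI_LOCI = ("BAT-25", "BAT-26", "D2S123", "D5S346", "D17S250")


def mmr_status_from_ihc(expressed: Dict[str, bool]) -> str:
    """Return 'dMMR' if any MMR protein is lost on IHC, otherwise 'pMMR'."""
    missing = [p for p in MMR_PROTEINS if p not in expressed]
    if missing:
        raise ValueError(f"no IHC result for: {missing}")
    lost = [p for p in MMR_PROTEINS if not expressed[p]]
    return "dMMR" if lost else "pMMR"


def msi_status_from_pcr(unstable: Dict[str, bool], high_cutoff: int = 2) -> str:
    """Classify MSI status from per-locus PCR calls.

    ASSUMPTION: high_cutoff=2 (>= 2 unstable loci -> MSI-H, fewer but
    non-zero -> MSI-L, zero -> MSS) is a conventional threshold for
    five-marker panels and does not come from the review itself.
    """
    missing = [locus for locus in MSI_LOCI if locus not in unstable]
    if missing:
        raise ValueError(f"no PCR result for: {missing}")
    n_unstable = sum(1 for locus in MSI_LOCI if unstable[locus])
    if n_unstable >= high_cutoff:
        return "MSI-H"
    if n_unstable > 0:
        return "MSI-L"
    return "MSS"
```

For example, a tumour with isolated PMS2 loss on IHC would be reported as dMMR by the first function, and a tumour with instability at BAT-25 and BAT-26 only would be reported as MSI-H by the second under the assumed cut-off.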

Transcriptomics
The information present in the DNA, together with its genetic and epigenetic changes, is expressed through transcription, which gives the most precise picture of the activity of a cell at a given moment and of its close relationship with the tumoral phenotype and its subsequent clinical behaviour. From the linear genetic model of carcinogenesis pathways to the advancement of technologies to study the transcriptome, multiple attempts have been made to molecularly characterise CRC. In the absence of a gold standard for molecular analysis of CRC, in 2015 a group of experts exhaustively and methodically evaluated all the molecular classifications of CRC obtained through different approaches. Their aim was to assemble an integrative set of samples that could resolve the inconsistencies of the pre-existing classifications and unify the existing data, not only on gene expression but also on mutational burden, copy number, microRNAs, methylation and proteins. The result was four consensus molecular subtypes (CMSs), which remain the clearest and most consistent classification in use to date [28]. The main characteristics of each subtype are summarised as follows: at the genomic aberration level, CMS1 (~15%) is characterised by a hypermutated and generalised hypermethylated status, with low Somatic Copy Number Alteration (SCNA), in concordance with TCGA data, as well as overexpression of proteins involved in DNA repair. On the other hand, CMS2 (~40%) and CMS4 (~25%) are characterised by high chromosomal instability (CIN) and high SCNA. Meanwhile, CMS3 (~13%) has a mixed pattern with few SCNAs but also a state of moderate hypermutation. Despite attempts to identify specific mutations in each subgroup, only a higher presence of BRAF mutations in CMS1 and of KRAS mutations in CMS3 could be identified, without achieving a distinctive pattern.
Finally, at the level of gene expression, CMS1 was characterised by the presence of genes associated with immune infiltration and with activation of immune evasion pathways, while CMS2 was closely associated with the Wnt/Myc pathways. By contrast, CMS3 showed an enhancement of multiple metabolic signatures and CMS4 a pronounced upregulation of genes associated with epithelial to mesenchymal transition and signatures linked to the activation of TGF-β signalling. Moreover, multiple subsequent investigations have tried to relate these subgroups to different events (clinical-pathological relationship, immune-microenvironment interaction, and prognosis-treatment association) to achieve a possible refinement of the classification (Figure 1). The clinical-histopathological variables across the CMSs are associated with different histological types of the precursor adenoma: trabecular-mucinous, complex tubular structure, papillary and desmoplastic reaction for CMS1, CMS2, CMS3 and CMS4, respectively [28,29]. In addition, a relationship with tumour location has also been observed; CMS1 is predominantly found on the right side and CMS2 on the left side, which may be correlated with their respective mutational characteristics [30]. Another clinical feature observed has been the relationship between CMSs and the different stages of CRC, with CMS4 being the subtype most frequently found in advanced stages. However, converting these relationships into a gold-standard pattern is complicated by the intratumoral heterogeneity of CMSs within the same sample, as observed in different studies. This pronounced heterogeneity is probably explained by gene expression variations between the different regions of the tumour and by their association with the components of the tumour microenvironment (immune-stromal content). The different immuno-stromal phenotypes of CRC can be directly related to genomic events and the production of immunogenic peptides as well as the density of CTL infiltration [31].
This leads to two major immuno-stromal phenotypes: (1) highly immunogenic: hypermutated tumours with DNA repair defects, high infiltration of cytolytic T lymphocytes (CTLs) and T helper 1 (Th1) lymphocytes, and high expression of CTL-associated antigen 4 (CTLA4), programmed cell death protein 1 (PD1), PD1 ligand 1 (PDL1) and indoleamine 2,3-dioxygenase 1 (IDO1); and (2) inflamed: high infiltration of regulatory T cells (T-reg) and myeloid-derived suppressor cells (MDSCs), intimately related to TGF-β and genes encoding the cytokines IL-23 and IL-17 [32,33]. More recently, Thorsson et al. [34] presented a possible global immune classification of solid tumours based on transcriptomic profiles, in which six groups were detailed (Table 1). Applying these phenotypes across the CMSs, taking into account their genomic and transcriptomic framework, suggests that CMS1 would be immune active, while CMS4 could be related to the inflammatory or immunosuppressed pattern; in turn, CMS2 is considered immune desert and CMS3 immune mixed. The prognostic value of CMS subgroups has been extensively studied over the last few years, but Galon and colleagues [35][36][37] were among the first to highlight the relevance of immunologic phenotypes in the prognosis of early-stage CRC, describing that high lymphocyte infiltration, especially of Th1 cells and CTLs, together with interferon gamma (IFNγ), correlates with better overall survival (OS) and disease-free survival (DFS), whereas higher levels of interleukin 17 (IL-17) and Th17 cells are associated with worse outcomes, which may be linked to their ability to develop pro-metastatic immune evasive mechanisms. On the other hand, the predictive value in early stages was demonstrated only by a retrospective analysis of the MOSAIC trial, where CMS2 patients benefited from the use of oxaliplatin as adjuvant treatment.
In the metastatic setting, retrospective analyses of clinical trials such as CALGB 80405 and FIRE-3 showed that CMS2 was associated with better OS, while CMS4 and CMS1 were associated with poor and intermediate OS, respectively [38,39]. The CALGB 80405 study demonstrated that the use of bevacizumab in CMS1 was associated with better OS compared to the use of cetuximab, which could be explained by the close relationship of CMS1 with BRAF mutations and an MSI state, and that CMS2 showed a prolonged OS under treatment with cetuximab. In addition, immunogenic subtypes, especially CMS1, have a greater tendency to respond to immunotherapy than immunosuppressed tumours. Despite all these developments, further efforts are needed to refine the CMS classification, to assess its predictive role and to develop optimal therapies.
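The features attributed to each CMS in this section can be collected in a small lookup structure, sketched below in Python. This is only a summary table of the statements made above (approximate share, genomic pattern, dominant expression signature and proposed immune phenotype); it is not the CMS classifier itself, which is a trained model applied to gene expression data. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CMSProfile:
    approx_share: str      # approximate proportion of CRC cases
    genomic: str           # genomic aberration pattern
    expression: str        # dominant gene expression signature
    immune_phenotype: str  # proposed immuno-stromal phenotype


# Values transcribed from the text of this section.
CMS_PROFILES: Dict[str, CMSProfile] = {
    "CMS1": CMSProfile("~15%", "hypermutated, hypermethylated, low SCNA",
                       "immune infiltration and immune evasion",
                       "immune active"),
    "CMS2": CMSProfile("~40%", "high CIN, high SCNA",
                       "Wnt/Myc activation", "immune desert"),
    "CMS3": CMSProfile("~13%", "few SCNAs, moderate hypermutation",
                       "metabolic signatures", "immune mixed"),
    "CMS4": CMSProfile("~25%", "high CIN, high SCNA",
                       "EMT and TGF-beta signalling",
                       "inflamed or immunosuppressed"),
}


def describe(subtype: str) -> str:
    """Return a one-line summary for a subtype label such as 'CMS1'."""
    key = subtype.upper()
    p = CMS_PROFILES[key]  # raises KeyError for unknown labels
    return (f"{key} ({p.approx_share}): {p.genomic}; "
            f"{p.expression}; {p.immune_phenotype}")
```

Calling `describe("cms4")` returns a single line combining the four fields for CMS4, which can be convenient when annotating samples that have already been assigned a subtype by an external classifier.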

Proteomics
Proteins are key actors in several biological processes and their expression can be altered by the presence of gene mutations. The proteome is the functional translation of the genome, as well as a useful source of potential biomarkers. Protein biomarkers are notably up- or downregulated in the cancer proteome as compared to the normal proteome; for this reason, in recent years, proteomics research has focused on identifying differential expression characteristics between normal and cancer cells, detecting proteins involved in cancer formation and progression, and observing the effects of protein perturbation or modification, in order to provide new classification tools such as possible diagnostic, prognostic and predictive biomarkers in CRC.
Several preclinical studies have characterised the proteome of CRC cell lines and murine models to highlight the biological changes that underlie CRC. The secretome of CRC cell lines has been studied as part of a large analysis of different solid tumour cell lines in which 4584 non-redundant proteins were identified, 30% of which were found ubiquitously across the different tumour types. On the other hand, 109 proteins were found only in CRC cells, thus demonstrating specificity for CRC and potentially qualifying as biomarkers of disease [40]. Many other in vitro studies have reported different proteomic biomarkers, some in correlation with treatment, such as the proteomic analysis by Boisvert et al., in which the authors found that DNA damage could change the subcellular localisation of proteins [41]. Proteomics studies have also involved engineered murine models. Interestingly, Zhu and colleagues utilised APC−/+ mouse models and identified 27 up-regulated proteins in tumour tissue compared to normal tissue. Another group found biomarkers such as MCM4, S100A9 and CHI3L1 in the proteome of CRC proximal fluids of conditional APC knockout mice, compared to healthy mice with normal mucosa [42]. Several proteomics studies have been performed in a clinical setting using biological samples such as blood, stools and tissue, with the same aim of identifying putative biomarkers of CRC. However, even though a lot of effort has been made over the last 20 years in the field of proteomics, predictive proteomic biomarkers of response to treatment have not yet been defined. In fact, all the analyses have been conducted in small cohorts and have not provided the expected results: a plethora of biomarkers have been identified, but none of them were consistent across the different studies.
Moreover, none of the abovementioned biomarkers found in a "discovery phase" have reached a "validation phase", thus precluding their use in a clinical setting [43][44][45][46][47] (Table 2). Integration of proteomics with transcriptomic and genomic data and the implementation of technologies such as mass spectrometry assays could overcome the heterogeneity of proteomics biomarkers.
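The notion of proteins being up- or downregulated in tumour versus normal tissue is usually expressed as a log2 fold change. The following Python sketch illustrates the arithmetic under stated assumptions: the protein labels and abundance values are invented placeholders rather than measurements from any cited study, the pseudocount and the threshold of one log2 unit (a two-fold change) are arbitrary illustrative choices, and a real analysis would also require replicates and statistical testing.

```python
import math
from typing import Dict


def log2_fold_changes(tumour: Dict[str, float],
                      normal: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(tumour / normal) for proteins quantified in both samples.

    ASSUMPTION: a pseudocount is added to both values to avoid division
    by zero; its value (1.0) is an illustrative choice.
    """
    shared = tumour.keys() & normal.keys()
    return {
        p: math.log2((tumour[p] + pseudocount) / (normal[p] + pseudocount))
        for p in shared
    }


def call_regulation(lfc: Dict[str, float],
                    threshold: float = 1.0) -> Dict[str, str]:
    """Label each protein 'up', 'down' or 'unchanged'.

    ASSUMPTION: |log2FC| >= 1 (two-fold) is a hypothetical cut-off.
    """
    calls = {}
    for protein, value in lfc.items():
        if value >= threshold:
            calls[protein] = "up"
        elif value <= -threshold:
            calls[protein] = "down"
        else:
            calls[protein] = "unchanged"
    return calls


# Invented abundances for three placeholder proteins (not real data).
tumour = {"PROTEIN_A": 799.0, "PROTEIN_B": 24.0, "PROTEIN_C": 100.0}
normal = {"PROTEIN_A": 99.0, "PROTEIN_B": 99.0, "PROTEIN_C": 100.0}
calls = call_regulation(log2_fold_changes(tumour, normal))
```

With these placeholder values, PROTEIN_A is called "up", PROTEIN_B "down" and PROTEIN_C "unchanged". Candidate biomarkers from a discovery phase would be the proteins that keep the same call across independent cohorts, which is precisely the consistency that the studies discussed above have so far lacked.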

Metagenomics
Metagenomics is the study of the genetic material of a microbial community. The microbiota is the set of microorganisms (bacteria, viruses, fungi, protozoa, worms and archaea) that inhabit the human body. This science allows the discovery of microbial communities in their complex natural environment and of their relationship with the host, using techniques based on sequence divergences of the small subunit ribosomal RNA (16S rRNA) such as NGS, denaturing gradient gel electrophoresis (DGGE), fluorescence in situ hybridisation (FISH), terminal restriction fragment length polymorphism (T-RFLP) and DNA microarrays [48]. The Human Microbiome Project (HMP) emerged in the 2000s with the aim of characterising the human microbiome with more precision in order to determine its intrinsic relationship with diseases and to provide a standardised data source. Both metagenomics and the HMP allowed the profiling of the gut microbiota, which represents 29% of the human microbiota and is mostly composed of prokaryotic microorganisms that maintain a dynamic and homeostatic symbiotic relationship with the host, supporting a robust immune and nutritional system [49,50]. The disruption of this homeostatic process leads to the development of multiple diseases such as inflammatory bowel disease (IBD) and CRC. Initial evidence for host-microbiota interactions in CRC emerged in 1969 with the publication of Vivienne Aries et al. and in 1975, when it was shown that the carcinogen dimethylhydrazine triggered significantly less colonic tumorigenesis in germ-free rats than in those with gut microbiota [51]. Over time, multiple studies have also demonstrated that pro-carcinogenic microorganisms can influence cell proliferation, genomic instability and the tumour microenvironment of CRC [50,52-54].
Enterotoxigenic Bacteroides fragilis (ETBF) secretes B. fragilis toxin (BFT), which binds to E-cadherin, allowing the translocation of β-catenin and the subsequent activation of the proto-oncogene c-Myc and therefore the proliferation of the colonic epithelium; by a similar mechanism, Fusobacterium nucleatum binds to E-cadherin through FadA, activating the Wnt/β-catenin pathway, while pks+ Escherichia coli (EC) releases the genotoxin colibactin, which causes senescent cells to secrete growth factors. Furthermore, in vitro studies have shown that colibactin from pks+ strains and Enterococcus faecalis alkylate DNA, producing double-strand breaks, aneuploidy and microsatellite instability [55]; by contrast, ETBF can induce DNA damage by stimulating inflammation and a pro-oxidant microenvironment through the expression of the enzyme spermine oxidase (SMO). Moreover, the relationship between the microbiota and the immune system allows endogenous pathogens to interfere with the tumour microenvironment by activating pro-tumorigenic immune responses. For example, in mouse models ETBF increases the Th17 response, which is generally associated with a worse prognosis in CRC, and Fusobacterium nucleatum uses its Fap2 adhesin to bind the T cell immunoreceptor with immunoglobulin and immunoreceptor tyrosine-based inhibitory motif domains (TIGIT) and silence the tumour-killing capabilities of cytotoxic immune cells, among other mechanisms described [56,57]. Therapeutic modulation of the gut microbiota has been the focus of recent studies [58,59]; antibiotics, microbiota transplantation, vaccines and immunotherapy are some of the therapies proposed, but they have not yet been enough to stop the development of CRC.
The need for large, international studies with prospective and longitudinal sampling and a more focused study of colorectal cancer microbiota and emerging targeted therapies has led to the creation of two recent projects: the OPTIMISTICC project (Opportunity to investigate the microbiome's impact on science and treatment in colorectal cancer) and MICROCOSM (Microbiome of colorectal cancer: a longitudinal study of mechanism), the results of which are awaited.

Radiomics
The progress towards a multiparametric approach to CRC development is also closely linked to technological advances in medical imaging. Magnetic resonance imaging (MRI), computed tomography (CT) and fluorodeoxyglucose positron emission tomography (FDG-PET/CT) have diagnostic and prognostic value in all stages of CRC. Tumour location, volume, size and texture as well as FDG uptake are important qualitative parameters, but they cannot reflect tumour heterogeneity. Since genomic profiling is essential for therapy in CRC, there have been several attempts to explore the potential role of radiomics in this context by developing radiogenomic models able to predict genomic alterations such as KRAS and BRAF mutations and MSI status, in order to enhance decision-making and patient outcomes. Lee et al. in 2016 attempted to predict KRAS status depending on C-reactive protein (CRP) levels using FDG-PET/CT; 179 patients of all stages were studied, of whom 75% had normal CRP values and 25% had increased values. A relationship with the maximum standardised uptake value (SUVmax) could only be demonstrated in KRAS-mutated (KRASmt) patients with normal CRP values [60]. Years later, Arslan et al. also used FDG-PET/CT to investigate the association of SUVmax with KRAS mutations in 83 patients with CRC; they found that SUVmax was higher in KRASmt patients than in wild-type patients (24.0 ± 9.0 vs. 17.7 ± 8.2) [61]. Chen et al. were also able to show, in a study of 74 patients, an association between radiomics and KRAS mutations using SUVmax, 6 histogram indices and 40 textural indices [62]. Other works, such as those of Oh et al. and Xu et al., were performed with MRI specifically in patients with rectal cancer. The first group was able to demonstrate that three radiomics features were significantly associated with KRAS mutation status, while Xu et al. observed that differences were higher in the KRASmt group [63,64].
Gonzalez Castro et al., in a study of 147 patients, noted that CT-based radiomics using grey-level pixel and spectral texture features can predict KRAS mutations [65]; these findings were supported by the group of Taguchi et al. [66]. On the other hand, two studies (Orner et al. and Krikelis et al.) that attempted to predict KRAS status by FDG-PET/CT failed to demonstrate a statistically significant relationship between SUVmax and KRAS mutation status [67]. Similarly, Hong et al. failed to find a significant relationship between MRI features and KRAS mutation. By contrast, there is limited literature on radiomics for predicting BRAF mutations and MSI status in CRC [68]. Kawada et al. in 2012, in a retrospective study of 51 patients, showed that KRAS/BRAF mutation correlated with higher SUVmax [69]. Similar results were found by Lei Yang et al. in 2018, who demonstrated that three CT radiomics feature signatures were significantly associated with KRAS/NRAS/BRAF mutations (p < 0.001) [70]; Negreros-Osuna et al. in 2020 showed that BRAF mutation could be predicted by radiomics features [71]. Concerning the prediction of MSI status, two studies published in 2021 support the previous findings of Pernicka et al. [72]. Both were prospective, multicentre trials that attempted to predict MSI status by CT using three models (a clinical model, a radiomics model and an integrated model); in both cases the integrated clinical-radiomic model was the best predictor of MSI status [73,74]. (Table 3 summarises the characteristics of radiomics studies.) The potential role of radiomics in the identification of new prognostic and predictive CRC biomarkers has been translated into the use of machine-learning algorithms to provide clinical information through innovative artificial intelligence (AI) models. These computational analyses have identified patterns that perform better as a diagnostic tool than conventional radiomics models [75].
Recently, at the ESMO 2021 Congress, an AI model to automatically detect MSI status in early CRC was presented. AI-integrated imaging has led to the identification of CRC on unstained tissue samples and subsequently to a dichotomous differential diagnosis between MSS and MSI status, albeit with low specificity. Therefore, although AI imaging could represent an innovative approach to determine MSI status, larger studies are required to confirm these data [76].
Despite the high potential of radiomics, the small population numbers and lack of reproducibility are two major limitations. Future efforts with a better-defined population, combined (clinical-radiomic) models, and the definition of the most appropriate imaging method could better clarify the landscape for prospective studies and change clinical practice.
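Several of the studies above rely on SUVmax together with histogram-based indices extracted from a region of interest. The following Python sketch shows how such first-order features could be computed from a flat list of voxel values. It is a simplified illustration under stated assumptions and not the pipeline of any cited study: the function name, the choice of features and the fixed number of histogram bins are hypothetical, and real radiomics workflows also involve segmentation, resampling and standardised feature definitions.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def first_order_features(voxels: Sequence[float],
                         n_bins: int = 8) -> Dict[str, float]:
    """Compute simple first-order features from ROI voxel values.

    ASSUMPTION: entropy uses a fixed-bin-count histogram over the
    observed range; n_bins=8 is an arbitrary illustrative choice.
    """
    if not voxels:
        raise ValueError("ROI contains no voxels")
    n = len(voxels)
    lo, hi = min(voxels), max(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n

    # Histogram: equal-width bins; a flat ROI collapses into one bin.
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in voxels)

    # Shannon entropy (bits) of the binned intensity distribution:
    # higher values indicate a more heterogeneous ROI.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "suv_max": hi,
        "suv_mean": mean,
        "variance": variance,
        "entropy": entropy,
    }
```

A perfectly homogeneous region yields zero variance and zero entropy, whereas a region with two equally frequent intensity levels yields an entropy of one bit when two bins are used. In a combined clinical-radiomic model of the kind described above, features such as these would be concatenated with clinical variables before model fitting.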

Conclusions
The development of omics sciences and their technologies has helped to understand the onset, progression and treatment of CRC in a more integrated manner. Genomic and transcriptomic profiling has the main role of establishing molecular subtypes with the corresponding stratification of patients to pave the way for personalised medicine. Both DNA and RNA are vectors of genetic information that encode proteins. Their study, through proteomics, allows the true functional interpretation of what happens at the cellular level in a given situation as well as its most crucial contribution to CRC in the identification of potential new biomarkers and targets for novel targeted therapies. The study of the microbiota provides a non-traditional tool with a future role in better understanding tumour biology, as does radiomics, which serves as a bridge between medical imaging and precision medicine, providing objective and accurate information that in the future will help to better understand intratumoral and intertumoral heterogeneity through the use of a non-invasive method. This multiparametric and holistic approach has provided short-term benefits through biomarkers and potential targets, but there is still a long way to see long-term benefits through early diagnosis and increased overall survival in CRC.

Conflicts of Interest:
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: E.M. has served as advisor and speaker for Astra Zeneca, Amgen, Bayer, Merck-Serono, Roche, Sanofi, Servier and Pierre Fabre. T.T. has served as advisor and speaker for Roche, Merck-Serono, Sanofi, Servier, Novartis and Bayer. F.C. has served as advisor and speaker for Roche, Amgen, Merck-Serono, Pfizer, Sanofi, Bayer, Servier, BMS, Cellgene and Lilly and received institutional research grants from Bayer, Roche, Merck-Serono, Amgen, AstraZeneca and Takeda. E.S., D.C., S.N., C.M.D.C., A.R., G.A. and G.M. declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.