1. Introduction
Colorectal cancer (CRC) accounted for 10% of all new cancer cases (males only, all ages) in 2022 and is the third-most prevalent cancer in the world (
https://gco.iarc.who.int/media/globocan/factsheets/populations/900-world-fact-sheet.pdf, accessed on 15 July 2024). As per data from Globocan 2022, there were ~43,360 new CRC cases (males only, all ages) in 2022 in India, making it the fourth most prevalent cancer in the country (
https://gco.iarc.who.int/media/globocan/factsheets/populations/356-india-fact-sheet.pdf, accessed on 15 July 2024). Projections taking into account aging, population growth, and human development estimate that by 2040, the incidence of CRC in India will increase by more than 60% (both sexes, all ages) (Global Cancer Observatory, Cancer Tomorrow,
https://gco.iarc.fr/tomorrow/en, accessed on 15 July 2024). The number of young people (0–49 years) expected to be diagnosed with CRC has been estimated to increase by more than 13% between 2022 and 2030 and by more than 20% between 2022 and 2040 (both sexes) (Global Cancer Observatory, Cancer Tomorrow;
https://gco.iarc.fr/tomorrow/en, accessed on 15 July 2024).
Age represents the primary risk factor for CRC [
1], although the cumulative risk for early-onset CRC (0–49 years old) in India has increased by 116% in males and 200% in females in the time span of 26 years (1986 to 2012) (Global Cancer Observatory, Cancer Over Time;
https://gco.iarc.fr/overtime/, accessed on 15 July 2024). Analysis of Surveillance, Epidemiology, and End Results (SEER) data from 2000 to 2019 revealed that CRC incidence rates rose by an average of 2.4% per year in adults younger than 50 years, whereas they changed by an average of −3.4% in adults older than 65 years. Thus, although the overall incidence of CRC is generally decreasing, the incidence of the disease in young adults (<50 years) has increased worldwide. This has prompted the American Cancer Society (ACS) to lower the standard age for CRC risk screening from 50 to 45 years [
2].
Traditionally, CRC has been classified into three molecular subgroups based on the mechanism of carcinogenesis: chromosomal instability (APC (adenomatous polyposis coli) inactivation, Wnt (wingless-related integration site) signaling activation, activating KRAS (Kirsten rat sarcoma viral oncogene homolog) mutations), defects in DNA mismatch repair (microsatellite instability, MSI), and aberrant CpG island hypermethylation and gene silencing (CIMP, BRAF (v-RAF murine sarcoma viral oncogene homolog B1) mutations) [
3,
4]. However, consensus exists that Early-Onset CRC (EOCRC) is pathologically, anatomically, metabolically, and biologically different from Late-Onset CRC (LOCRC) and hence should be investigated and managed differently [
5,
6,
7]. EOCRC tumors were found to be mostly located in the distal colon (80%), particularly the sigmoid colon and the rectum, with a higher prevalence of adverse histological factors such as signet ring cell differentiation, venous invasion, and perineural invasion [
8]. The tumors lacked frequent activating BRAF or KRAS mutations, suggesting that the molecular events in tumor development differed from those in the late-onset group [
8]. Additionally, EOCRC was not frequently associated with precursor adenomatous lesions [
8], suggesting that the classical adenoma-to-carcinoma pathway of molecular events [
6,
9] does not occur in this patient subset.
Recently, gene expression analyses have identified putative targets that indicate altered pathways in EOCRC [
10,
11,
12,
13,
14]. The MAPK (mitogen-activated protein kinase) pathway appeared to be deregulated in the early-onset sporadic group as compared to PI3K-Akt (phosphatidylinositol 3-kinase/protein kinase B) in the late-onset group [
10]. Bioinformatics analysis on microarray data sets to identify EOCRC-linked differentially expressed genes (DEGs) highlighted 108 upregulated genes and 23 downregulated genes [
11]. Functional enrichment of the EOCRC-associated upregulated DEGs indicated strong implication of molecular mechanisms involved in vascular smooth muscle contraction signaling pathway [
11]. PPI network analysis identified 7 hub genes, ACTA2 (smooth muscle cell alpha-2 actin), ACTG2 (actin gamma-2 smooth muscle), MYH11 (myosin-11), CALD1 (caldesmon), MYL9 (myosin regulatory light polypeptide 9), TPM2 (β-tropomyosin), and LMOD1 (leiomodin 1), associated with the vascular smooth muscle contraction signaling pathway [
11]. Early-onset sporadic tumors lacking canonical genetic aberrations like MSI and Wnt/β-catenin activation were found to be enriched in Ca²⁺/NFAT pathways [
12,
13]. High-throughput RNA sequencing of EOCRC tumors followed by validation by RT-qPCR identified significant upregulation of the genes TNS1 (tensin 1) and MET (MET proto-oncogene, receptor tyrosine kinase/hepatocyte growth factor receptor) [
14].
MicroRNAs (miRNAs) are important regulatory molecules that may act as either tumor suppressors or oncogenes depending on the cellular environment in which they are expressed [
15]. Analysis of miRNA expression profiles and their predicted target genes can indicate the aberrant physiology of a system and may be targeted in therapy or used as biomarkers for diagnostic purposes [
16,
17]. miRNAs play important roles in the development of CRC, as their deregulation affects signaling pathways like Wnt/β-catenin, epidermal growth factor receptor, p53, mismatch repair/DNA repair, transforming growth factor beta, PI3K/Akt, and Ras-Raf-MAPK [
18,
19]. Numerous studies assessing miRNA levels in the blood and tissues of CRC patients have detected their altered expression [
18,
19,
20,
21,
22,
23]. Experimental modulation of wild-type p53 in CRC cell lines was found to upregulate tumor-suppressing miR-34a, miR-192, miR-194, and miR-215 [
24,
25].
Despite the large number of studies investigating miRNAs in CRC, very few have compared miRNAs in EOCRC with those in LOCRC. In 2009, Yantiss et al. studied the clinical, pathological, and molecular features of young-onset colorectal carcinoma in patients < 40 years old. They observed significant overexpression of miR-21, miR-20a, miR-145, miR-181b, and miR-203 in the tumors of young patients [
26]. Investigation of Turkish EOCRC tumors revealed upregulation of miR-106a and downregulation of miR-143 and miR-125b [
27]. Elevated expression of miR-106a and downregulation of miR-125b correlated with lymph node metastasis in patients [
27]. In that study, EOCRC tumors were compared with normal tissues, and no direct comparison with a LOCRC subset was included. Recent work by Nakamura et al. identified a four-miRNA liquid biopsy panel for EOCRC diagnosis that robustly identified patients with EOCRC even in early-stage disease, indicating its clinical effectiveness [
28]. RNA-seq of the sporadic EOCRC-associated miRNAome and transcriptome, followed by bioinformatic validation and RT-qPCR in additional cohorts, identified the miR-31-5p-DMD axis as a novel biomarker of sporadic EOCRC [
29].
DMD (dystrophin) was found to be downregulated and miR-31-5p upregulated in sporadic EOCRC, pointing to a possible role of this axis in the occurrence of EOCRC [
29]. The same study also identified miRNAs significantly altered in LOCRC, reporting elevated levels of miR-31-3p and reduced levels of miR-10b-5p specifically in tumors of late-onset CRC patients as compared to adjacent pericarcinomatous tissue [
29]. However, miRNAs deregulated in Indian EOCRC patients have not yet been explored.
The goal of our study was to highlight EOCRC-specific miRNA alterations in Indian cohorts, which could be used to discriminate between EOCRC and LOCRC and potentially identify deregulated molecular pathways in early-onset disease. Highlighting the EOCRC-specificity of the dysregulated miRNAs was important for an insight into the mechanism contributing to the rise in EOCRC cases. To achieve this goal, we performed genome-wide small-RNA sequencing of sporadic colorectal tumors in young patients (<50 years old) and old patients (>50 years old) negative for canonical CRC markers like MSI, nuclear β-catenin, and APC mutation. Differentially expressed EOCRC miRNAs (DEMs) were validated by analysis of expression in TCGA-COAD and TCGA-READ datasets followed by quantitative real-time PCR (RT-qPCR) in additional EOCRC and LOCRC patient cohorts. Subsequent bioinformatic analysis of the validated miRNAs identified deregulated pathways in EOCRC. To the best of our knowledge, this study is the first to compare miRNA expression between EOCRC and LOCRC patients in India and additionally identify EOCRC tumor miRNA alterations that are specific to early-onset disease.
3. Discussion
The last few decades have seen an increase in the incidence of colorectal cancer among individuals less than 50 years old, also referred to as EOCRC. Not much is known about EOCRC or the reason for its increase in the younger population. We performed an RNA-Seq analysis of sporadic colorectal tumors in young patients (EOCRC < 50 years old) and aged patients (LOCRC > 50 years old) negative for canonical CRC markers like MSI, nuclear β-catenin, and APC mutation. Twenty-three miRNAs were differentially expressed specifically in young patients, 11 specifically in aged patients, and 5 in both. The 5 miRNAs (hsa-miR-129-5p, hsa-miR-9-5p, hsa-miR-1-3p, hsa-miR-145-5p, and hsa-miR-133a-3p) common to both EOCRC and LOCRC have been previously reported as downregulated in CRC [
41,
42,
43]. To validate the identified EOCRC DEMs, we divided the TCGA-COAD and TCGA-READ datasets into young (<50 years) and old (>50 years) groups and compared the expression of the top 10 EOCRC miRNAs in normal and tumor samples of the age-specific TCGA cohorts. Hsa-miR-1247-3p, hsa-miR-27a-5p, hsa-miR-96-5p, hsa-miR-148a-3p, hsa-miR-326, hsa-miR-378a-5p, hsa-miR-135b-5p, hsa-miR-378c, and hsa-miR-378d showed differential expression in young tumors as compared to corresponding normal tissue. Interestingly, hsa-miR-326 and hsa-miR-378a-5p were significantly downregulated in tumors of the overall TCGA-COAD and TCGA-READ cohorts (
Figure 3F,G); however, age-specific TCGA analysis revealed upregulation in young (<50 years) tumor datasets (
Figure 4F,G). EOCRC-specific miRNA expression thus seems to differ from the overall miRNA profile, in which the larger number of LOCRC cases skews the data to resemble that of late-onset CRC.
The direction of change observed in the age-specific TCGA analysis confirmed our RNA-Seq analysis for all miRNAs except hsa-miR-326, hsa-miR-378a-5p, hsa-miR-378c, and hsa-miR-378d. These miRNAs were upregulated in young tumors of the TCGA cohort, whereas our RNA-Seq analysis showed downregulation in young patients. To resolve these discrepancies between the TCGA data and our RNA-seq results, we needed to validate our observations in additional cohorts. For downstream validation by RT-qPCR, we left out hsa-miR-378e, as TCGA analysis revealed no expression in young datasets. Selected miRNAs were further validated in additional EOCRC and LOCRC cohorts of 16 young patients (<50 years old) and 11 old patients (>50 years old). Significantly upregulated DEMs included hsa-miR-1247-3p and hsa-miR-148a-3p. Hsa-miR-326 was significantly downregulated in both the EOCRC discovery and validation cohorts, in contrast to its upregulation in TCGA-COAD and TCGA-READ young tumor samples. Additionally, the slight change in hsa-miR-326 expression between young and old normal tissues observed in the TCGA age-specific analysis was not replicated in the RT-qPCR validation cohort. A possible reason is that the TCGA cohorts consist of Caucasians and African Americans, with no representation of the Indian population [
44]. Hsa-miR-326 may therefore show ethnicity-specific changes that account for the difference in expression between the TCGA datasets and our Indian validation cohort. Since our study concentrates on an East and North Indian population, racial differences may contribute to this variation in hsa-miR-326 expression.
Experimentally validated targets of the selected miRNAs were compared with differentially expressed (DE) genes of the TCGA-COAD and TCGA-READ cohorts to identify predicted DE-miRNA (DEM) targets altered in colorectal tumor tissue in a direction reciprocal to that of the miRNAs. Most of the DE target genes of upregulated miRNA hsa-miR-1247-3p (a total of 20 in number) were enriched in biological processes of anatomical structure morphogenesis (10/20 genes), receptor-mediated endocytosis (4/20 genes), and in molecular function of phosphatidylinositol-3,5-bisphosphate binding (2/20 genes). DE target genes of upregulated miRNA hsa-miR-148a-3p (total 26 in number) were enriched in biological processes of negative regulation of metabolic processes (11/26 genes), regulation of phosphorus metabolic processes (8/26 genes), tissue morphogenesis (7/26 genes), cellular response to nutrient levels (5/26 genes) and negative regulation of anoikis (3/26 genes). Cellular component analysis of miR-148a-3p DE-target genes revealed enrichment of genes located in the ruffle membrane (3/26 genes), which is an indicator of tumor cell motility and metastatic ability [
45]. The overall picture indicates a dysregulation of metabolic pathways and deregulated tissue morphogenesis contributing to epithelial-mesenchymal plasticity. Tumor cells are able to meet the demands of enhanced growth and proliferation by a plethora of metabolic reprogramming and also by competing with other surrounding cells and consuming essential nutrients from the microenvironment [
46,
47,
48]. Deregulated tissue morphogenesis has important physiological significance, as the downregulation of epithelial gene expression signature and the dissolution of epithelial intercellular junctions are key events in epithelial-mesenchymal transition (EMT) [
49]. Negative regulation of anoikis molecular pathways promotes anchorage-independent growth and EMT, leading to cancer progression and tumor metastasis [
50].
DE target genes of the downregulated miRNA hsa-miR-326 (32 in total) were enriched mostly in the biological processes of vasculature development/blood vessel development (7/32 genes) and gland development (6/32 genes). These predicted upregulated targets include various molecules whose expression is known to correlate with parameters of disease aggressiveness like tumor invasion, angiogenesis, liver metastasis, disease recurrence, and poor prognosis [
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
64]. Cellular component analysis of upregulated DE-targets indicated enrichment in the basolateral plasma membrane (3/32 genes) or the basal part of the cell. Intestinal epithelial cells are known to exhibit epithelial cell polarity with distinct apical and basolateral plasma membrane domains [
65]. The basolateral plasma membrane is rich in phosphatidylinositol-3,4,5-trisphosphate and contains junctional complexes that regulate intercellular adherence and adherence with the basement membrane [
65]. Alterations of basolateral membrane proteins have been found to correlate with loss of epithelial architecture and onset of cancer [
66]. Taken together, our results potentially indicate metabolic reprogramming, deregulation of anoikis-regulating pathways, and alterations in proteins present in the basal part of intestinal epithelial cells.
Despite the many previous studies on CRC gene expression and miRNAs [
13,
14,
27,
29,
67,
68,
69,
70], there are significant gaps in our understanding of EOCRC and how it differs from LOCRC. Most of the published reports with genome-wide RNA sequencing [
14,
67] concentrate on cohorts composed exclusively of EOCRC patients. The inclusion of sporadic LOCRC is essential in the initial discovery cohort, as it is not possible to identify markers specifically deregulated in early-onset disease (significantly upregulated/downregulated with respect to LOCRC) without comparison with LOCRC tissues. The diagnosis of CRC before the age of 50 always raises the suspicion of a genetic cancer predisposition syndrome (Lynch syndrome and familial adenomatous polyposis). It is therefore important to screen the EOCRC/LOCRC cohort for known canonical markers, as their presence creates a hypermutable and pro-oncogenic phenotype. Liu et al. performed genome-wide miRNA and transcriptome profiling, but the tumors included in their study cohorts were not assessed for canonical CRC markers like MSI, activation of the Wnt pathway, or
APC mutations [
29]. Another bottleneck of big-data transcriptomic studies is that they very rarely include paired adjacent colonic mucosa as normal samples in their analysis. Ideally, tumor samples should be paired with corresponding normal samples to avoid biological differences between individuals.
Our study is the first attempt to identify differentially expressed miRNAs specific to EOCRC in the Indian population. We have attempted to address the limitations of previous studies by including a LOCRC cohort in both the discovery and validation cohorts, with paired adjacent colonic mucosa corresponding to each tumor analyzed in this study. One limitation of our study is the small sample size of the discovery cohort. To compensate for that, we validated our findings in the TCGA-COAD and TCGA-READ datasets and finally in an additional cohort of 16 young patients and 11 aged patients. Other than hsa-miR-326, all selected miRNAs showed similar expression between age-specific TCGA datasets and our study cohorts. Additionally, we screened our cohorts for MSI, nuclear β-catenin, and APC mutations to collect tumors negative for these known canonical CRC markers. To the best of our knowledge, our study is the first to incorporate TCGA-COAD and TCGA-READ-based age-specific validation along with RT-qPCR to identify miRNAs deregulated in early-onset CRC. Since a miRNA can target many mRNAs, we screened the targets of our EOCRC-validated DEMs to identify those target genes (46 downregulated and 32 upregulated) known to be differentially expressed in colorectal adenocarcinoma (TCGA-COAD and TCGA-READ) datasets. In the future, these target genes need to be explored in EOCRC and LOCRC cohorts to identify potential pathways responsible for the early onset of colorectal cancer.
4. Materials and Methods
4.1. Patient Recruitment
Thirty-four colorectal tumor samples (7 patients < 50 years, 27 patients > 50 years) and respective adjacent normal colonic mucosa from histologically proven CRC patients were collected in collaboration with doctors and pathologists from the Surgical Oncology Department of Netaji Subhas Chandra Bose Cancer Hospital, Kolkata (discovery cohort). This cohort was screened for MSI, nuclear β-catenin, and APC mutations. From this cohort, 5 colorectal tumors (3 patients < 50 years, 2 patients > 50 years) and adjacent normal colonic mucosa (10 tissue samples in total) were selected for small-RNA seq. Fifty-six histologically proven colorectal tumors (24 patients < 50 years, 32 patients > 50 years) and respective adjacent normal colonic mucosa were obtained from the Departments of Surgical, Medical and Radiation Oncology, Surgical Gastroenterology and General Surgery, All India Institute of Medical Sciences (AIIMS), Rishikesh (validation cohort). From this cohort, 27 tumors (16 patients < 50 years, 11 patients > 50 years) and adjacent normal colonic mucosa were selected for validating our RNA-seq results. All patients with histopathologically proven colorectal tumors undergoing treatment from May 2020 to May 2022 who fulfilled the inclusion and exclusion criteria were included. All necessary IEC permissions were obtained prior to sample collection. Clinicopathological information, including age, sex, site, stage, and differentiation of tumor, familial history of CRC, and presence of any other inflammatory bowel disease, was also collected. Inclusion criteria: patients admitted for surgical resection with biopsy-proven colorectal adenocarcinoma, age up to 80 years, and willing to provide written informed consent. Exclusion criteria: patients with familial colorectal carcinoma, patients unable or unwilling to give consent, patients with cancers other than CRC, and patients receiving neoadjuvant chemotherapy and/or radiotherapy.
4.2. Biospecimen Collection
Tumor and normal tissue samples were collected in RNAlater (RNAlater™ Stabilization Solution, Invitrogen, catalog# AM7020, Carlsbad, CA, USA) and 10% neutral buffered formalin. Samples collected in RNAlater (Invitrogen) for nucleic acid extraction were stored at −80 °C for later processing. Samples stored in neutral buffered formalin were processed into FFPE blocks, cut into 5 µm sections, and mounted on positively charged slides. Histological sections were stained with hematoxylin and eosin. All specimens with histopathological features suggestive of an inflammatory colorectal disease were excluded from this study. Reporting was performed by trained histopathologists. Grossing and reporting of colectomy specimens suspicious of colorectal carcinoma were conducted according to College of American Pathologists (CAP) guidelines.
4.3. Immunohistochemistry
IHC for MMR proteins (MLH1, MSH2, MSH6, and PMS2) and nuclear β-catenin was performed as per standard protocol. Briefly, about 5–10 μm paraffin sections of tissue samples were deparaffinized and rehydrated in a series of graded alcohols. Heat-induced antigen retrieval was conducted in Tris-EDTA buffer (pH 9) for MMR proteins and in 10 mM sodium citrate buffer (pH 6) for nuclear β-catenin in the microwave, followed by peroxidase quenching and antibody blocking with 3% BSA. Slides were then subjected to overnight incubation at 4 °C with the respective primary antibodies at standardized dilutions (given in
Table 5). The slides were developed using 3,3′-diaminobenzidine (DAB) as the chromogen and counterstained with hematoxylin. The PolyExcel HRP/DAB detection system (TWO STEP Universal kit for Mouse and Rabbit primary antibodies; PathnSitu, catalog# PEH002, Pleasanton, CA, USA) was used for the qualitative identification of the nuclear antigens. Slides were analyzed by a trained histopathologist for the detection of MSI (as per CAP guidelines) and nuclear β-catenin. Normal colorectal tissue was taken as an internal control. A known case of MSI CRC was used as a positive control for MSI detection. The external control for nuclear β-catenin was a histologically diagnosed section of fibromatosis (desmoid tumor). No-antibody controls served as negative controls. The scoring of Wnt positive nuclear β-catenin expression (Wnt+) was performed according to Raman et al. [
12]. A sample was scored as Wnt positive (Wnt+) if the β-catenin nuclear stain was observed in more than 35% of tumor epithelial cells and Wnt negative (Wnt−) if a nuclear stain was detected in less than 25% of cells. IHC images were captured using the Olympus BX53F2 (Olympus, model# BX53F2, Tokyo, Japan) biological microscope.
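As an illustration only, the scoring rule stated above can be written as a small Python function. The function name and the "indeterminate" label are assumptions introduced here; the text does not state how tumors with 25–35% nuclear staining were handled.

```python
def score_wnt(percent_nuclear_positive: float) -> str:
    """Classify a tumor from the percentage of tumor epithelial cells
    showing nuclear beta-catenin staining (thresholds as stated in the text)."""
    if not 0 <= percent_nuclear_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_nuclear_positive > 35:
        return "Wnt+"
    if percent_nuclear_positive < 25:
        return "Wnt-"
    # Assumption: values from 25% to 35% are not covered by either rule.
    return "indeterminate"
```

For example, `score_wnt(40)` falls in the Wnt+ class and `score_wnt(10)` in the Wnt− class.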
4.4. RNA Isolation
RNA isolation was performed from fresh frozen colorectal (tumor and normal) tissues stored in RNALater (Invitrogen) at −80 °C. Diethyl Pyrocarbonate (DEPC) treatment of glassware and forceps was performed prior to RNA isolation. Isolation was performed using the Qiagen AllPrep® DNA/RNA/miRNA Universal kit (Qiagen, catalog# 80224, Hilden, Germany) as per the manufacturer’s protocol. Elution was performed in nuclease-free water. For isolation of RNA from tissues in the validation cohorts, 50–100 mg of paired tumor and adjacent normal colonic tissues were chopped into small pieces using a sterile surgical scalpel. The tissues were then homogenized in 1 mL of TRIzol reagent (Invitrogen, catalog# 15596026) and incubated at 4 °C overnight for efficient homogenization and lysis. Downstream processing for RNA isolation from TRIzol reagent (Invitrogen) was conducted as per standard protocol.
4.5. DNA Isolation from Tissue
DNA isolation from tissue was performed by standard protocol. Chopped 50 mg tissue samples were incubated in digestion buffer (60 mM Tris pH 8.0, 100 mM EDTA, 0.5% SDS) and proteinase K (500 ng/mL) at 56 °C overnight. An equal volume of phenol-chloroform solution was added and mixed well by inverting repeatedly. Tubes were centrifuged at 12,000× g for 15 min at room temperature for phase separation. An equal volume of chloroform was then added and mixed well before centrifugation at 12,000× g for 15 min at room temperature for phase separation. One-tenth volume of 3 M sodium acetate pH 6.0 and 2.5 volumes of absolute ethanol were added to the aqueous phase, which was kept at −20 °C overnight for nucleic acid precipitation. The precipitate was pelleted by centrifugation at 12,500× g for 15 min at 4 °C and washed with 70% ethanol at 12,500× g for 5 min at 4 °C. The DNA pellet was air-dried and resuspended in 100 µL of TE pH 8.0. RNase treatment (20 µg/mL) was conducted for 30 min at room temperature to eliminate RNA. The phenol-chloroform and chloroform phase separation steps were repeated to remove the RNase, and DNA was then purified from the aqueous phase using the Promega kit protocol (Promega Wizard SV Gel and PCR Clean-Up System, part# 9FB072, Madison, WI, USA). Isolated DNA was quantitated by spectrometry and visualized by electrophoresis in 0.8% agarose gel using the BioRad GelDoc imaging system (Bio-Rad Laboratories, Hercules, CA, USA).
4.6. PCR for APC Gene
Normal and tumor DNA from each patient was subjected to PCR amplification. Primer sequences were designed to amplify the mutation cluster region (exon 15, codons 1260–1596) of the APC gene in overlapping PCR segments. Reaction conditions were as follows: initial denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing (annealing temperature, Ta) at 60 °C/57 °C/60 °C for 1 min, and extension at 68 °C for 1 min; and a final extension at 68 °C for 5 min.
PCR amplification was performed in a 25 µL reaction with 1 unit of NEB Taq DNA polymerase (NEB, catalog# M0273S, Ipswich, MA, USA), 1X Standard Taq Buffer (NEB), 200 µM dNTPs (NEB), 1 µM each of forward and reverse primer (Table 6), and 100 ng of template DNA. The PCR product was visualized by electrophoresis in a 2% agarose gel in 1X TBE (0.13 M Tris (pH 7.6), 45 mM boric acid, 2.5 mM EDTA) buffer.
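The reaction composition above can be converted into pipetting volumes with the dilution relation C1V1 = C2V2. The Python sketch below is illustrative only: the stock concentrations (10X buffer, 10 mM dNTP mix, 10 µM primers, 5 U/µL Taq) and the template concentration passed in are assumptions, since the text gives final concentrations but not stock concentrations.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock needed to reach final_conc in reaction_ul (C1*V1 = C2*V2)."""
    return final_conc * reaction_ul / stock_conc


def pcr_mix(template_ng_per_ul, reaction_ul=25.0):
    """Return component volumes (uL) for one reaction.
    All stock concentrations below are assumptions, not values from the text."""
    mix = {
        "10X Standard Taq Buffer": stock_volume_ul(1, 10, reaction_ul),   # 1X final
        "10 mM dNTPs": stock_volume_ul(200, 10_000, reaction_ul),         # 200 uM final
        "10 uM forward primer": stock_volume_ul(1, 10, reaction_ul),      # 1 uM final
        "10 uM reverse primer": stock_volume_ul(1, 10, reaction_ul),      # 1 uM final
        "Taq polymerase (5 U/uL)": 1 / 5,                                 # 1 unit
        "template DNA": 100 / template_ng_per_ul,                         # 100 ng
    }
    water = reaction_ul - sum(mix.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    mix["nuclease-free water"] = water
    return mix
```

Calling `pcr_mix(50.0)` for a hypothetical 50 ng/µL template returns the per-component volumes, with water making up the remainder of the 25 µL reaction.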
Information on PCR primers is provided below (
Table 6):
4.7. Direct DNA Sequencing
The PCR products were purified using the Promega Wizard SV Gel and PCR Clean-Up System (Promega) according to the manufacturer’s instructions. Direct sequencing was performed by Eurofins Genomics India Pvt. Ltd. (Whitefield, Bangalore, India).
4.8. APC Gene Mutation Analysis
The nucleotide and deduced amino acid sequences were compared with reference sequences of the APC gene available at the NCBI (National Center for Biotechnology Information) GenBank database using the BLASTx (Basic Local Alignment Search Tool) program.
4.9. miRNA Seq Analysis
RNA samples (2 μg) isolated from 10 tissue samples (5 tumor, 5 normal) were sent for RNA sequencing. RNA sequencing and data analysis were outsourced to Bencos Research Solutions (Kolkata, India). RNA with RIN > 8.0 proceeded to library preparation using the NEBNext® Small-RNA Library Prep Set for Illumina® (Illumina Inc., San Diego, CA, USA) and was sequenced at the National Genomics Core, CDFD, Hyderabad, using 50-bp single-end reads on an Illumina NextSeq 500 Sequencer (Illumina Inc.). The data was generated by using the paired-end approach of the Illumina technique. A total of twenty paired-end fastq files were used for the small-RNA-seq analysis via a pipeline of FastQC-Fastp-SortMeRNA-miRDeep2-edgeR. Fastq files were subjected to FastQC (v.0.11.9) for the sequence quality check (FastQC; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 14 January 2022); most quality features passed, although some were flagged with warnings or failures. After sequencing, adapters and low-quality sequences were removed from the raw reads by the Fastp tool (v.0.23.2) [
71], and clean data was counted using the FastQC program. All the reads of every sample were subjected to SortMeRNA (v.4.3.2) for removal of ribosomal RNA sequences [
72]. Four databases, silva-euk-28s-id98, silva-euk-18s-id95, rfam-5.8s-database-id98, and rfam-5s-database-id98 were utilized in the rRNAs removal analysis (
https://github.com/biocore/sortmerna/archive/2.1.tar.gz, accessed on 20 January 2022). Ten independent read mappings of small-RNA-seq data were performed by miRDeep2 (v.2.0.1.2) using the genome reference sequence Homo_sapiens.GRCh38.dna.primary_assembly.fa [
73] along with mature and hairpin miRNA sequences specific to humans (hsa as a species) retrieved from the miRBase (v22.1) database (
https://www.mirbase.org, accessed on 22 January 2022). The reference sequence fasta file was downloaded from the Ensembl genome base (
http://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/dna/, accessed on 22 January 2022).
4.10. Identification of Known and Novel miRNAs
Processed reads were collapsed using the mapper.pl module of the miRDeep2 package, with the minimum read length parameter set to 18. To predict known and novel miRNAs, the collapsed reads were passed to the miRDeep2.pl module of the package. In this analysis, the reference genome sequences and the miRBase mature and hairpin sequences specific to humans (hsa as a species) were utilized. The count matrix was generated from the final results after removing duplicates that shared the same genomic coordinates.
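To make the collapsing step concrete, the following Python sketch shows what read collapsing with a minimum length of 18 amounts to: short reads are discarded and identical sequences are merged into one record that carries its copy number. It is not the miRDeep2 implementation; the function name and the `seq_<index>_x<count>` identifier layout are assumptions chosen to resemble collapsed-read identifiers.

```python
from collections import Counter


def collapse_reads(reads, min_length=18, prefix="seq"):
    """Drop reads shorter than min_length and merge identical sequences.

    Returns (identifier, sequence) pairs ordered by decreasing copy number;
    the identifier encodes a running index and the copy number.
    """
    counts = Counter(read for read in reads if len(read) >= min_length)
    return [
        (f"{prefix}_{index}_x{copies}", sequence)
        for index, (sequence, copies) in enumerate(counts.most_common())
    ]
```

Each returned pair can be written out as one FASTA record, so that downstream mapping handles every distinct sequence once while retaining its abundance.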
4.11. Differential miRNA Expression Analysis
Read count was performed for all the samples using miRDeep2. For differential miRNA expression analysis, edgeR (v.3.36.0) R-package was utilized [
74]. The normalization factor was calculated from the raw read counts, after which the count data were normalized by the counts per million (CPM) method. The exact test available in the edgeR package was used for the differential expression analysis (DEA) of the single-sample comparisons, for which a divergence value of 0.2 was used. For multiple-sample comparisons, the voom transformation was applied and variance weights were calculated, after which the lmFit function (a limma function that fits a linear model for each gene by weighted least squares) was used to fit the linear model to the data. The empirical Bayes (eBayes) function, also from limma, was then employed to smooth the standard errors and call differentially expressed miRNAs. Differentially expressed miRNAs were obtained by filtering the DEA results using an adjusted p-value or FDR ≤ 0.05 and a log₂ fold change ≥ 2 (upregulation) or ≤ −2 (downregulation).
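The normalization and filtering criteria described above can be sketched in a few lines of Python. This is a simplified illustration, not the edgeR workflow: the function names and the input layout (a mapping from miRNA to log₂ fold change and FDR) are assumptions, and the normalization factor is taken as given rather than estimated.

```python
def counts_per_million(counts, norm_factor=1.0):
    """Scale the raw counts of one sample to CPM using an effective
    library size (total counts multiplied by the normalization factor)."""
    effective_size = sum(counts.values()) * norm_factor
    if effective_size <= 0:
        raise ValueError("library size must be positive")
    return {mirna: count * 1e6 / effective_size for mirna, count in counts.items()}


def filter_dems(results, fdr_cutoff=0.05, lfc_cutoff=2.0):
    """Split DEA results into up- and downregulated miRNAs.

    results maps miRNA name -> (log2 fold change, FDR).
    """
    up = sorted(m for m, (lfc, fdr) in results.items()
                if fdr <= fdr_cutoff and lfc >= lfc_cutoff)
    down = sorted(m for m, (lfc, fdr) in results.items()
                  if fdr <= fdr_cutoff and lfc <= -lfc_cutoff)
    return up, down
```

A miRNA with a large fold change but an FDR above 0.05, or a significant FDR but a fold change inside the ±2 window, is excluded from both lists.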
4.12. Volcano Plots
Global expression values obtained from the pairwise comparison analyses were also plotted as volcano plots in R. In the volcano plots, miRNA rows were ordered from the final result of the edgeR analysis according to the adjusted p-value or FDR in decreasing order. A volcano plot was generated for each of the five single-sample comparisons.
4.13. Heatmap
The pheatmap package in R (https://cran.r-project.org/web/packages/pheatmap/index.html, accessed on 23 May 2023) was used to plot the heatmaps, with the rlog-normalized read count matrix for all samples, derived using the DESeq2 R package [75], as the input data. The function rlog returns a SummarizedExperiment object that contains the rlog-transformed values in its assay slot. Corresponding Z-scores were computed from the rlog-normalized read count matrix, and pheatmap drew the heatmap accordingly. The top 20 most variable miRNAs were extracted from the matrix to be plotted on the heatmap. Expression values higher than the mean expression of a miRNA across samples received a positive Z-score, shown in green; values below the mean received a negative Z-score, shown in red.
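The selection of the most variable miRNAs and the row-wise Z-score transformation can be illustrated with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, the input is a plain mapping from miRNA to its normalized values across samples, and the sample standard deviation (n − 1 denominator) is assumed for scaling.

```python
from statistics import mean, stdev, variance


def top_variable_rows(matrix, n=20):
    """Return the names of the n rows with the largest variance across samples."""
    ranked = sorted(matrix, key=lambda name: variance(matrix[name]), reverse=True)
    return ranked[:n]


def row_zscores(values):
    """Center a row on its mean and scale by its sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant row: no deviation from the mean
    return [(value - mu) / sd for value in values]
```

Values above the row mean yield positive Z-scores and values below it yield negative ones, which is what the heatmap colors encode.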
4.14. Data Acquisition and Processing from TCGA-COAD and TCGA-READ Database
To obtain miRNA expression values for normal and tumor samples from COAD and READ patients of different ages, we downloaded and processed the bulk RNA-seq data and clinical files of the TCGA-COAD and TCGA-READ projects using the “
TCGAbiolinks” R package (version 2.19.2) [
76]. With “
TCGAbiolinks”, the RNA-seq raw count matrix was downloaded from the GDC server. Using the R packages tidyr, dplyr, and tibble [
77,
78,
79], we extracted the miRNAs of interest and their expression values for four groups:
(a) normal samples from patients aged > 50 years, grouped as N_O (nCOAD = 165; nREAD = 45); (b) normal samples from patients aged < 50 years, grouped as N_Y (nCOAD = 32; nREAD = 15); (c) tumor samples from patients aged > 50 years, grouped as T_O (nCOAD = 183; nREAD = 83); and (d) tumor samples from patients aged < 50 years, grouped as T_Y (nCOAD = 22; nREAD = 5). Here, ‘nCOAD’ and ‘nREAD’ denote the sample sizes for the COAD and READ datasets, respectively. miRNA expression values for all four groups (Normal_Old, Normal_Young, Tumor_Old, Tumor_Young) in the COAD and READ datasets are given in
Supplementary Excel Files S1–S8.
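The four-group assignment described above can be sketched in Python as follows. This is an illustration only and does not reproduce the R code (TCGAbiolinks, tidyr, dplyr, tibble) used in this study. The record layout, function names, and sample identifiers are hypothetical. Because the text defines the groups only for ages above and below 50 years, the sketch leaves samples aged exactly 50 years unassigned; that choice is an assumption.

```python
from collections import defaultdict


def assign_group(tissue, age_years):
    """Map a sample to N_O, N_Y, T_O, or T_Y.

    Returns None when the age is missing or exactly 50, because the text
    only defines groups for ages above and below 50 years (assumption).
    """
    if age_years is None or age_years == 50:
        return None
    prefix = {"normal": "N", "tumor": "T"}[tissue]
    suffix = "O" if age_years > 50 else "Y"
    return f"{prefix}_{suffix}"


def group_samples(samples):
    """Collect sample IDs per group from (sample_id, tissue, age_years) records."""
    groups = defaultdict(list)
    for sample_id, tissue, age_years in samples:
        label = assign_group(tissue, age_years)
        if label is not None:
            groups[label].append(sample_id)
    return dict(groups)


# Hypothetical records for illustration; these are not TCGA barcodes.
toy_samples = [
    ("S1", "normal", 67),
    ("S2", "normal", 41),
    ("S3", "tumor", 72),
    ("S4", "tumor", 38),
    ("S5", "tumor", 50),
]

groups = group_samples(toy_samples)
```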
4.15. Target Identification and Selection
Experimentally validated targets for the selected miRNAs were identified using miRNet [
37]. Additionally, genes differentially expressed (DE) in TCGA colon adenocarcinoma (COAD) and TCGA rectal adenocarcinoma (READ) samples with |Log
2FC| cutoff of 1.00 and q-value cutoff of 0.01 were derived using GEPIA [
38]. Target genes were compared with this DE-gene list to identify DE-targets altered in colorectal tumor tissue in a direction reciprocal to that of miRNAs. Identities of DE genes and DE targets for each miRNA are given in
Supplementary Excel File S9 and Supplementary Table S3.
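The reciprocal-direction comparison can be sketched in Python as follows. This is a simplified illustration, not the procedure as run with miRNet and GEPIA. It assumes that the DE-gene table already contains only genes passing the q-value cutoff, and that "reciprocal" means downregulated targets for an upregulated miRNA and vice versa. The function name and gene identifiers are hypothetical.

```python
def reciprocal_targets(mirna_direction, targets, de_gene_log2fc, lfc_cutoff=1.0):
    """Return validated targets that change in the direction opposite to the miRNA.

    mirna_direction: 'up' or 'down' (direction of the miRNA in tumor tissue).
    targets: iterable of validated target gene names for that miRNA.
    de_gene_log2fc: gene -> log2 fold change, assumed pre-filtered by q-value.
    """
    if mirna_direction not in ("up", "down"):
        raise ValueError("mirna_direction must be 'up' or 'down'")
    selected = []
    for gene in targets:
        log2fc = de_gene_log2fc.get(gene)
        if log2fc is None or abs(log2fc) < lfc_cutoff:
            continue  # not differentially expressed at the chosen cutoff
        if mirna_direction == "up" and log2fc < 0:
            selected.append(gene)
        elif mirna_direction == "down" and log2fc > 0:
            selected.append(gene)
    return sorted(selected)


# Hypothetical gene names and fold changes, not results from this study.
toy_targets = ["GENE1", "GENE2", "GENE3", "GENE4"]
toy_de = {"GENE1": -1.8, "GENE2": 2.3, "GENE3": -0.4}

targets_of_up_mirna = reciprocal_targets("up", toy_targets, toy_de)
targets_of_down_mirna = reciprocal_targets("down", toy_targets, toy_de)
```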
4.16. Gene Ontology and Pathway Enrichment Analysis
For each comparison group, gene ontology (GO) and pathway enrichment analyses were performed separately for the upregulated and downregulated sets of the differentially expressed miRNAs’ target genes, with human as the selected species. A tool, ShinyGO (v.0.81) [
40], was used for retrieving functional annotations based on the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). FDR was calculated based on the nominal
p-value from the hypergeometric test. Identities of miRNA targets differentially expressed in TCGA-COAD and TCGA-READ and enriched in GO Enrichment pathways are given in
Supplementary Table S4.
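The enrichment statistic can be illustrated with a short Python sketch of a one-sided hypergeometric test followed by a false discovery rate adjustment. It is not ShinyGO's implementation. The use of the Benjamini-Hochberg procedure for the FDR step is an assumption, as are the function names and the toy numbers.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes in a list of n genes,
    when K of the N background genes belong to the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 5 of 40 target genes fall in a 100-gene pathway,
# against a background of 20,000 genes.
p_pathway = hypergeom_pvalue(k=5, n=40, K=100, N=20000)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
```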
4.17. miRNA Validation by RT-qPCR
Real-time analyses by two-step RT-qPCR were performed for quantification of miRNA levels. The stem-loop RT-qPCR method was used for miRNA screening and quantification [
35,
36]. Reverse transcription (RT) was performed with the Verso cDNA Synthesis Kit (Thermo Scientific, catalog #AB-1453/A, Van Allen Way, Carlsbad, CA, USA) as per the manufacturer’s instructions, using 100 ng of total cellular RNA. The 10 µL RT reaction mixture contained 1 µL of RT primer (1 µM), 500 µM of each dNTP, 2 µL of 5X cDNA synthesis buffer, 0.5 µL of RT enhancer, and 0.5 µL of Verso Enzyme Mix (Thermo Scientific). All miRNA RT-qPCRs were performed on the Bio-Rad CFX96 Real-Time System (Bio-Rad Laboratories, Hercules, CA, USA). One-tenth of the reverse transcription mix was subjected to PCR amplification with Bio-Rad SsoAdvanced Universal SYBR
® Green Supermix (Bio-Rad, catalog #1725270). The 20 µL RT-qPCR reaction mixture contained 2 µL of forward and reverse primers (1 µM each) and 10 µL of 2X SsoAdvanced Universal SYBR
® Green Supermix (Bio-Rad). The RT reaction conditions were: 25 °C, 10 min; 42 °C, 60 min; 95 °C, 5 min; 4 °C, ∞. The RT-qPCR conditions were: 95 °C, 3 min; followed by 40 cycles of 95 °C, 30 s and 60 °C, 1 min. All samples were analyzed in triplicate. The levels of intracellular miRNAs were calculated from their normalized Ct values, with U6 snRNA used for normalization. Relative quantitation (RQ) of gene expression was performed by the ΔΔCt method using the equation $2^{-\Delta\Delta Ct}$, as per the ‘Guide to Performing Relative Quantitation of Gene Expression Using Real-Time Quantitative PCR’ by Applied Biosystems (
https://assets.thermofisher.com/TFS-Assets/LSG/manuals/cms_042380.pdf, accessed on 30 December 2024) [
80]. Briefly, $\Delta Ct = Ct_{\text{miRNA}} - Ct_{\text{U6}}$ and $\Delta\Delta Ct = \Delta Ct_{\text{TUMOR/NORMAL}} - \Delta Ct_{\text{NORMAL}}$; $2^{-\Delta\Delta Ct}$ represents the relative quantification compared with the respective normal.
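The relative quantification arithmetic can be followed in a minimal Python sketch. The Ct values below are hypothetical and are not measurements from this study. Averaging the triplicate Ct values before taking differences is an assumption about how replicates were combined.

```python
from statistics import mean


def delta_ct(ct_mirna, ct_u6):
    """Delta Ct: miRNA Ct normalized to the U6 snRNA reference Ct."""
    return ct_mirna - ct_u6


def relative_quantity(sample_mirna, sample_u6, normal_mirna, normal_u6):
    """Return 2^(-ddCt) for a sample relative to its matched normal.

    Each argument is a list of replicate Ct values (e.g., triplicates),
    averaged before the differences are taken (assumption).
    """
    dct_sample = delta_ct(mean(sample_mirna), mean(sample_u6))
    dct_normal = delta_ct(mean(normal_mirna), mean(normal_u6))
    ddct = dct_sample - dct_normal
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values.
rq_tumor = relative_quantity(
    sample_mirna=[23.9, 24.0, 24.1],
    sample_u6=[18.0, 18.0, 18.0],
    normal_mirna=[25.9, 26.0, 26.1],
    normal_u6=[18.0, 18.0, 18.0],
)
# The normal compared with itself gives a ddCt of 0 and therefore an RQ of 1.
rq_normal = relative_quantity([26.0], [18.0], [26.0], [18.0])
```

In this toy case the tumor ΔCt is 6 and the normal ΔCt is 8, so ΔΔCt is −2 and the miRNA is about four-fold higher in the tumor than in the matched normal.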
The miRNA reverse transcription (RT) stem-loop primers (SLP) and the miRNA real-time PCR forward and reverse primers are listed in
Table 7 and
Table 8, respectively.
The reverse primer sequence is complementary to a portion of the RT USLP and is hence common for all miRNAs.
4.18. Statistical Analysis
All graphs were plotted and analyzed using GraphPad Prism 8.00 (GraphPad, San Diego, CA, USA). For statistical analysis, a non-parametric, two-tailed, paired or unpaired Student’s t-test was performed. Error bars indicate the mean with standard deviation.