1. Introduction
Colon adenocarcinoma (COAD) is a significant public health concern, ranking as the fourth most common cancer worldwide and exhibiting a worrisome increase in incidence rates [1, 2]. Despite the proven efficacy of surgical intervention and adjuvant chemotherapy in treating COAD [3], the identification of novel therapeutic targets and robust prognostic markers remains a critical research priority. Although carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA199) are widely used in clinical settings as biomarkers for COAD diagnosis and prognosis prediction, their suboptimal performance in this regard has been well documented [4].
Epithelial–mesenchymal transition (EMT) is a complex biological process through which epithelial cells acquire mesenchymal features, and it is a critical step in tumor progression, invasion, metastasis, and drug resistance [5, 6, 7]. Given the importance of EMT in cancer biology, identifying the key molecules that regulate it is of paramount importance. The methyl-CpG-binding domain (MBD) family of proteins, which bind methylated CpG dinucleotides, has been implicated in the pathogenesis of various diseases, including cancer [8]. MBD3, a member of the MBD family, has been shown to play a role in the development of several digestive tract tumors. In liver cancer, MBD3 promotes tumor cell growth, angiogenesis, and metastasis by inhibiting the tumor suppressor tissue factor pathway inhibitor 2 (TFPI2) [9]. In pancreatic cancer, MBD3 has been shown to inhibit EMT through the TGF-β/Smad signaling pathway [10] and stemness through the Hippo pathway [11]. However, the precise role of MBD3 in EMT in COAD remains unclear.
In this study, we used a multi-pronged approach, combining cell experiments, animal experiments, patient pathological sections, and various bioinformatics methods, to investigate the molecular mechanisms by which MBD3 may influence the development and clinical prognosis of COAD. Specifically, we analyzed single-cell sequencing data and comprehensively analyzed MBD3 expression profiles, survival status, and potential molecular pathways in the TCGA and GEO databases. We further validated the differential expression of MBD3 in patient pathological sections and COAD cell lines, and explored the effect of MBD3 on EMT in COAD cells through animal and cell experiments.
2. Materials and Methods
2.1. Data Download and Gene Expression Analysis
We downloaded STAR-processed RNA-seq data for 33 tumor projects from the TCGA database (https://portal.gdc.cancer.gov, accessed on 12 December 2022) and extracted the data in TPM format. The corresponding data for normal tissues and cells were downloaded from the Genotype-Tissue Expression (GTEx) database. Transcripts per million (TPM) values were used to normalize the HTSeq FPKM Level 3 data. Statistical analysis was performed in R software v4.2.1, and the ggplot2 package was used for visualization. The Wilcoxon rank-sum test was used to compare two groups, and p < 0.05 was considered statistically significant (ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001).
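The normalization and two-group comparison described above can be illustrated with a short Python sketch (the study itself used R). It assumes the usual within-sample definition TPM_i = FPKM_i / ΣFPKM × 10⁶, and a two-sided Wilcoxon rank-sum test using the normal approximation without tie or continuity correction. R's wilcox.test reports an exact p value for small tie-free samples, so its numbers can differ from this sketch. The expression values in the usage lines are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's FPKM values so that they sum to 1e6 (TPM)."""
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def significance_label(p: float) -> str:
    """Map a p value to the ns / * / ** / *** convention used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Hypothetical log-scale MBD3 values for tumor and normal samples.
tumor = [5.1, 6.0, 5.8, 6.3, 5.9, 6.1]
normal = [3.2, 3.9, 4.1, 3.5, 3.8, 4.0]
p = rank_sum_p(tumor, normal)
print(f"p = {p:.4f} ({significance_label(p)})")
```

The normal approximation keeps the sketch dependency-free; for small groups an exact test is preferable.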
2.2. The MSI Analysis and Gene Mutation Landscape of MBD3
We downloaded the harmonized pan-cancer dataset TCGA Pan-Cancer (PANCAN, N = 10535, G = 60499) from the UCSC Xena database (https://xenabrowser.net/, accessed on 12 December 2022), extracted the expression data of ENSG00000071655 (MBD3) in each sample, and retained only samples whose source was Primary Blood Derived Cancer-Peripheral Blood or Primary Tumor. We integrated the MSI data from a previous study (Landscape of Microsatellite Instability Across 39 Cancer Types, DOI:10.1200/PO.17.00073) [12] with the gene expression data of the samples and applied a log2(x + 0.001) transformation to each expression value. In addition, we downloaded the level 4 Simple Nucleotide Variation dataset of all TCGA samples processed with MuTect2 from the GDC (https://portal.gdc.cancer.gov/, accessed on 12 December 2022) and calculated the mutant-allele tumor heterogeneity (MATH) score for each tumor using the inferHeterogeneity function of the R package maftools (version 2.2.10). We then integrated the TMB and gene expression data of the samples, applied a log2(x + 0.001) transformation to each expression value, and excluded cancers with fewer than 3 samples, resulting in expression data for 37 cancer types. For the variation dataset, we integrated the mutation data of the samples and obtained protein domain information from the R package maftools (version 2.2.10). The full names and abbreviations of the cancers are shown in Table S1.
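As a rough illustration of the two transformations named above, the Python sketch below applies the log2(x + 0.001) transformation and computes a MATH score from one tumor's variant allele fractions (VAFs). It assumes the commonly used definition MATH = 100 × MAD / median of the VAFs, with the MAD scaled by 1.4826 as in R's mad(). maftools' inferHeterogeneity also clusters VAFs, which this sketch does not attempt, and the VAF values shown are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def log2_pseudocount(values: Sequence[float], pseudocount: float = 0.001) -> List[float]:
    """log2(x + pseudocount), so that zero expression stays finite."""
    return [math.log2(v + pseudocount) for v in values]


def math_score(vafs: Sequence[float]) -> float:
    """Mutant-allele tumor heterogeneity: 100 * scaled MAD / median of VAFs."""
    med = statistics.median(vafs)
    deviations = [abs(v - med) for v in vafs]
    mad = 1.4826 * statistics.median(deviations)  # same scaling constant as R's mad()
    return 100 * mad / med


# Hypothetical VAFs for one tumor's somatic mutations.
vafs = [0.2, 0.3, 0.4, 0.5, 0.6]
print(round(math_score(vafs), 2))
print(log2_pseudocount([0.0, 7.999]))
```

A higher MATH score indicates a wider spread of allele fractions, which is read as greater intratumor heterogeneity.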
2.3. Single Cell Sequencing
CancerSEA (Yuan et al., 2019) is a specialized single-cell sequencing database that characterizes the functional states of cancer cells at single-cell resolution. We analyzed the correlation between MBD3 expression and different tumor functional states based on single-cell sequencing data. The t-SNE plot shows the MBD3 expression profile of single cells in TCGA samples.
2.4. Survival Prognosis Analysis
Kaplan–Meier plots were used to assess the relationship between MBD3 expression and overall survival (OS) across cancers. Proportional hazards assumption testing and survival regression fitting were performed with the survival package (version 3.3.1), and the results were visualized with the survminer and ggplot2 (version 3.3.6) packages. The log-rank test was used for hypothesis testing, and p < 0.05 was considered statistically significant.
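The survival comparison described above rests on the Kaplan–Meier estimator and the log-rank test. The stand-alone Python sketch below implements both; the two-group log-rank statistic is comparable to what survdiff() in R's survival package computes with its default rho = 0. It assumes event = 1 for death and 0 for censoring, uses hypothetical follow-up times, and omits confidence intervals and the Cox modelling used in the study.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Survival probability after each distinct event time (1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases leave the risk set
    return curve


def log_rank_p(times_a, events_a, times_b, events_b) -> float:
    """Two-sided log-rank p value for two groups (chi-square, 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for ti, _, _ in pooled if ti >= t)
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)
        d_a = sum(1 for ti, e, g in pooled if ti == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square(1)


# Hypothetical follow-up (months) for MBD3-high vs MBD3-low patients.
high_t, high_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_t, low_e = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(log_rank_p(high_t, high_e, low_t, low_e))
```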
2.5. Clinical Significance of MBD3 in COAD
Risk score plots, calibration curves, a nomogram, and forest plots were used to further analyze the clinical significance of MBD3 in COAD. Risk score plots were visualized with the ggplot2 package (version 3.3.6). The survival package (version 3.3.1) was used for proportional hazards assumption testing and Cox regression analysis, and the rms package (version 6.3-0) was used for calibration analysis and visualization as well as to construct and visualize the nomogram. Forest plots were visualized using ggplot2 (version 3.3.6).
2.6. Co-Expression Gene Analysis of MBD3 and Function Enrichment in COAD
We extracted the relevant expression data from the TCGA public database and divided the samples into high- and low-expression groups according to MBD3 expression. The raw count matrix of the selected data was analyzed using the DESeq2 package (version 1.36.0) following standard procedures. Using Pearson's correlation coefficient, we displayed the correlation between MBD3 expression and the expression of the top 5 positively correlated and top 5 negatively correlated genes in heat maps and lollipop plots. Functional enrichment of the genes co-expressed with MBD3 in colon cancer was predicted by KEGG, GO, and GSEA analyses. The above data were visualized using ggplot2 (version 3.3.6).
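To make the co-expression ranking concrete, the Python sketch below ranks genes by their Pearson correlation with MBD3 and returns the top positively and negatively correlated genes. It is a plain illustration with made-up gene names (GENE_A, GENE_B, GENE_C) and values, not the DESeq2-based workflow used in the study; in practice the correlation would be computed on normalized expression across all COAD samples.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def top_correlated(
    target: Sequence[float], others: Dict[str, Sequence[float]], k: int = 5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) lists of (gene, r)."""
    scored = [(gene, pearson_r(target, expr)) for gene, expr in others.items()]
    positive = sorted(scored, key=lambda item: item[1], reverse=True)[:k]
    negative = sorted(scored, key=lambda item: item[1])[:k]  # most negative first
    return positive, negative


# Hypothetical expression of MBD3 and three other genes across five samples.
mbd3 = [1.0, 2.0, 3.0, 4.0, 5.0]
others = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],
}
pos, neg = top_correlated(mbd3, others, k=1)
print(pos, neg)
```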
2.7. Cell Line and Cell Culture
Colon cancer cell lines SW620, SW480, CaCo2, and HCT116 were provided and maintained by the Central Laboratory of the Affiliated Hospital of Jiangsu University and the Institute of Basic Medicine, Jiangsu University School of Medicine. The cells were cultured in DMEM (Hyclone, Beijing, China) supplemented with 10% fetal bovine serum (Gibco, Carlsbad, CA, USA) and 100 mg/L penicillin in a humidified incubator at 37 °C with 5% CO2.
2.8. RNA Extraction and Real-Time PCR
Total RNA was extracted using Trizol (Invitrogen, Carlsbad, CA, USA). For nude mouse tissues, tissue blocks were placed directly in a mortar with a small amount of liquid nitrogen and ground rapidly; once the tissue had softened, a small amount of liquid nitrogen was added and grinding was repeated, three times in total. Trizol was added at 1 mL per 50–100 mg of tissue, the sample was transferred to a centrifuge tube, and it was homogenized thoroughly for about 1–2 min using an electric homogenizer. Reverse transcription was performed using the RevertAid First Strand cDNA Synthesis Kit (Thermo, Waltham, MA, USA) according to the manufacturer's instructions. Quantitative real-time PCR was performed in 10 μL reactions using iQ SYBR Premix Ex Taq Perfect Real Time (Bio-Rad Laboratories), with SYBR Green as the DNA-specific fluorescent dye. Human U6 was used as the housekeeping gene. The primer pairs used to amplify the human UCA1 gene and human U6 are listed in Table S1, and the primer sequences of EMT-related molecules are listed in Table S2. Cycling conditions were 95 °C for 3 min, followed by 40 cycles of 95 °C for 20 s, 56 °C for 20 s, and 72 °C for 30 s. Relative gene expression was calculated by the comparative CT (ΔΔCT) method, and the fold enrichment was determined as 2^−[ΔCT(sample) − ΔCT(calibrator)].
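The comparative CT calculation above can be written out as a small Python function. This sketch assumes near-100% amplification efficiency for both the target and the reference gene (the standard assumption of the 2^−ΔΔCT method), averages technical replicate CT values, and uses U6 as the reference as in the text; the CT values in the example are invented for illustration.

```python
from statistics import mean
from typing import Sequence


def fold_change(target_sample: Sequence[float], ref_sample: Sequence[float],
                target_calibrator: Sequence[float], ref_calibrator: Sequence[float]) -> float:
    """Relative expression by 2^-[dCT(sample) - dCT(calibrator)]."""
    d_ct_sample = mean(target_sample) - mean(ref_sample)  # normalize to U6
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** -dd_ct


# Hypothetical triplicate CT values: target gene vs. U6, sample vs. calibrator.
fc = fold_change([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                 [26.0, 26.0, 26.0], [18.0, 18.1, 17.9])
print(fc)
```

Here ΔCT(sample) = 6 and ΔCT(calibrator) = 8, so ΔΔCT = −2 and the fold change is 2² = 4.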
2.9. Cell Total Protein Extraction and Western-Blot
Cultured cells were rinsed with cold PBS, lysed in RIPA buffer at 4 °C for 10 min, heated at 100 °C for 10 min, and centrifuged at 14,000× g and 4 °C for 10 min; the supernatant was collected, and the protein concentration was determined by the BCA assay. Each lane was loaded with about 20 μg of protein, separated by 10% SDS-PAGE, and transferred to PVDF membranes. Membranes were blocked with 5% skim milk for 1 h at room temperature, incubated with primary antibodies overnight at 4 °C, and then incubated with secondary antibodies for 1 h at room temperature. An appropriate amount of ECL substrate was applied evenly to cover the membrane surface, and images were captured and analyzed with chemiluminescence imaging analysis software.
2.10. Plasmid Construction, Transfection and Infection
The full-length MBD3 sequence was amplified by RT-PCR from a cDNA library of PANC1 cells using primers MBD3-all-F (5′-CGGAATTCCGATGGAGCGGAAGAGCCCGAGCG-3′) and MBD3-all-R (5′-GGGGTACCCCCTAGACGTGCTCCATCTCCGGGT-3′) and then inserted into the expression vector p3xFLAG-Myc-CMV-24 (Sigma, St. Louis, MO, USA). The sh-EGFP and sh-MBD3 plasmids (Sigma) were previously constructed in our laboratory and are kept at the School of Basic Medical Sciences, Jiangsu University. Plasmids were transfected into colon cancer cells using Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions. The specific sequences are shown in Table S2. Retroviruses encoding reprogramming factors were generated and used to infect NPCs following the method of a previous paper (Pontes et al., 2014).
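As a quick sanity check on the cloning primers listed above, the Python sketch below scans each primer for restriction-enzyme recognition sequences and checks for the start and stop codons. The original protocol does not name the enzymes used; the observation that the 5′ tails carry GAATTC (the EcoRI site) in MBD3-all-F and GGTACC (the KpnI site) in MBD3-all-R is an inference from the sequences, and the small enzyme table in the code is an assumption chosen for illustration.

```python
from typing import Dict

# Assumed enzyme table for illustration only; extend as needed.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "KpnI": "GGTACC", "BamHI": "GGATCC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(primer: str) -> Dict[str, int]:
    """0-based position of the first match of each recognition site in the primer."""
    primer = primer.upper()
    return {name: primer.find(site) for name, site in RECOGNITION_SITES.items() if site in primer}


forward = "CGGAATTCCGATGGAGCGGAAGAGCCCGAGCG"   # MBD3-all-F
reverse = "GGGGTACCCCCTAGACGTGCTCCATCTCCGGGT"  # MBD3-all-R

print(find_sites(forward))                   # site in the 5' tail of the forward primer
print(find_sites(reverse))                   # site in the 5' tail of the reverse primer
print(forward.find("ATG"))                   # position of the start codon
print("TAG" in reverse_complement(reverse))  # stop codon on the coding strand
```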
2.11. Transwell Migration and Invasion Assay
Transwell assays were performed using transwell inserts with 8 μm pores (Corning, Corning, NY, USA) according to the manufacturer's protocol. Transfected SW620, SW480, CaCo2, and HCT116 cells were harvested, resuspended in serum-free medium, and seeded into the upper chambers (100,000 cells per well). The lower chambers contained culture medium with 10% FBS, and the cells were incubated for 24 h before detection. Cells on the upper surface were scraped off, and cells that had migrated to the lower surface were fixed and stained with 0.05% crystal violet for 30 min. Five independent fields per transwell were counted, and the average number of cells per field is shown in the figure. To assess invasion, 100,000 cells in serum-free medium were seeded into Matrigel-coated transwell inserts (BD Bioscience, Corning, NY, USA) and then processed as in the migration assay.
2.12. Cell Proliferation Assays
Cell proliferation was measured with the cell counting kit-8 (CCK-8; Beyotime Institute of Biotechnology, Shanghai, China). For the CCK-8 assay, 2 × 10⁴ cells were seeded in 96-well plates, cultured for 24 h, and transfected with Vector, Flag-MBD3, sh-EGFP, or sh-MBD3. At 0, 1, 2, 3, 4, 5, and 6 days after transfection, 10 μL of CCK-8 solution was added to each well, the plates were incubated at 37 °C for 2 h, and the absorbance at 450 nm was measured at each time point with a microplate reader. All experiments were performed with at least three biological replicates.
2.13. Colony Formation Assay
Stable cell lines were collected, re-suspended in medium, transferred to 6-well plates (500 cells per well), and cultured for 10 to 14 days until large colonies appeared. Cells were fixed in 4% paraformaldehyde for 15 min and then stained with 0.05% crystal violet for 30 min to count the number of colonies.
2.14. Xenograft Mouse Model
The protocol was approved by the Institutional Animal Care and Use Committee of Jiangsu University, Zhenjiang, China. SW620 cells (2.0 × 10⁶ cells/site) stably transfected with shEGFP or shMBD3, and CaCo2 cells stably transfected with Flag-Vector or Flag-MBD3, were subcutaneously injected into 5-week-old BALB/c nude mice (Shanghai SLAC Laboratory Animal Co., Ltd., Shanghai, China) to generate xenografts. Each group comprised four female mice. Tumor volume was measured weekly after injection and calculated using the formula length × width × height × π/6.
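The volume formula above treats each tumor as an ellipsoid whose three axes equal the measured diameters. A minimal Python version is sketched below, using hypothetical caliper measurements in millimetres and computing the group mean for one time point.

```python
import math
from statistics import mean


def tumor_volume(length: float, width: float, height: float) -> float:
    """Ellipsoid volume from three orthogonal diameters: L * W * H * pi / 6."""
    return length * width * height * math.pi / 6


# Hypothetical measurements (mm) for one group of four mice at a single time point.
group = [(10.0, 8.0, 6.0), (9.0, 7.5, 6.5), (11.0, 8.5, 7.0), (10.5, 8.0, 6.0)]
volumes = [tumor_volume(*dims) for dims in group]
print([round(v, 1) for v in volumes], round(mean(volumes), 1))
```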
2.15. Pathological Sample Collection
Three colon cancer tissue samples and their matched paracancerous tissues were collected at the Affiliated Hospital of Jiangsu University between March and April 2023. This study was approved by the medical ethics committee of the Affiliated Hospital of Jiangsu University and was conducted in accordance with the Declaration of Helsinki.
2.16. Immunohistochemistry
Immunohistochemistry (IHC) staining was carried out as previously described (DeRycke et al., 2009). Tumor tissues and paracancerous tissues were fixed in 10% formalin, embedded in paraffin, cut into 4–6 μm sections, and mounted on slides. After deparaffinization, rehydration, and microwave antigen retrieval, the slides were incubated with an MBD3 antibody (Proteintech Cat No. 14258-1-AP) at 1:800 dilution at 4 °C overnight. The slides were then incubated with secondary antibody at room temperature for 30 min, stained with DAB substrate, and counterstained with hematoxylin.
2.17. Statistical Analyses
All data are presented as mean ± standard deviation, from at least three independent experiments. The t test was used for comparison between two groups, and one-way analysis of variance was used for comparison between multiple groups. Kaplan–Meier survival analysis was performed using the log-rank test. A p value of less than 0.05 was considered statistically significant.
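For reference, the summary statistic and the multi-group comparison described above can be sketched in Python: mean ± sample standard deviation, and the one-way ANOVA F statistic with its degrees of freedom. The p value would then come from the F distribution with those degrees of freedom (for example, scipy.stats.f_oneway in Python or aov() in R); the replicate values used here are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate readouts for three conditions.
control = [1.0, 2.0, 3.0]
vector = [4.0, 5.0, 6.0]
mbd3 = [7.0, 8.0, 9.0]
print(mean_sd(control))
print(one_way_anova_f(control, vector, mbd3))
```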
4. Discussion
The methyl-CpG-binding domain (MBD) protein family has been implicated in a variety of biological processes, including tumorigenesis [8, 13, 14]. Specifically, MBD2 has been shown to promote the progression and poor prognosis of renal cell carcinoma [15], while MBD4 polymorphisms have been associated with cervical cancer [16]. MBD3 has been shown to promote the metastasis and growth of various digestive tract tumors; in liver cancer, it enhances the growth, angiogenesis, and metastasis of tumor cells by inhibiting tissue factor pathway inhibitor 2 (TFPI2) [9]. In pancreatic cancer, MBD3 can inhibit the epithelial–mesenchymal transition (EMT) process through TGF-β/Smad signaling [10] and suppress the stemness of pancreatic cancer cells through the Hippo pathway [11]. However, the role of MBD3 in colon cancer remains to be fully elucidated.
Colon cancer (COAD) is the fourth most common cancer worldwide, and its incidence is increasing [1]. Early diagnosis of COAD currently relies on invasive techniques such as colonoscopy [17], and although molecular markers such as CEA and CA199 are widely used, more sensitive molecular markers are urgently needed [18, 19]. Although surgery remains the primary treatment for rectal cancer [20], newer approaches such as immunotherapy combined with anti-angiogenic drugs [21] and adjuvant chemotherapy [22] have improved patient prognosis. Identifying novel therapeutic targets for COAD therefore remains of the utmost importance.
Based on these findings, we hypothesized that MBD3 facilitates tumor metastasis and growth through EMT and may therefore represent a promising biomarker and therapeutic target. To explore the clinical significance of MBD3, we analyzed its expression differences across various tumors, with particular focus on colon cancer, using the TCGA and GTEx databases. We investigated its potential clinical relevance through a range of analyses, including Kaplan–Meier (KM) curves, hazard ratio forest plots, nomograms, calibration curves, and univariate and multivariate Cox regression. The gene mutation landscape, tumor microsatellite instability, and t-SNE maps of single-cell sequencing data provided further insight into the role of MBD3 in tumor biology. Our results suggest that MBD3 could serve as a novel biomarker. Functional enrichment analysis of MBD3 co-expressed genes in colon cancer indicated that MBD3 is involved in multiple pathways of colon cancer biology and is associated with EMT.
Colon cancer is a significant cause of mortality worldwide [23], and tumor metastasis is a leading cause of death [24]. Metastasis is the end product of a multistep cell-biological process, the invasion–metastasis cascade [25], and remains the principal cause of cancer death [26]; in colon cancer, metastases such as liver metastases lead to poor prognosis [27]. Molecules such as E-cadherin and Snail have been shown to be associated with tumor metastasis and are key molecules in the EMT process [28, 29]. EMT is involved in fundamental processes of embryonic development and tissue repair and has been identified as a major factor promoting cancer cell metastasis [30, 31]; its role has been confirmed in various tumors, including breast and gastric cancer [32, 33]. Many other key molecules have been shown to affect the molecular mechanisms of EMT. For example, TGF-β1 can induce the matrix protein POSTN to promote the migration and invasion of ovarian cancer [34], while the loss of PRC2 can affect the progression of prostate cancer [35]. In ovarian cancer, the Wnt/β-catenin axis has also been shown to affect the EMT process [2]. However, the role of MBD3 as a key molecule affecting EMT and metastasis in colon cancer has yet to be fully explored.
To address this gap in knowledge, we analyzed MBD3 using the online TCGA and GTEx databases and verified its differential expression in patient tissues and cell lines by immunohistochemistry of patient pathological sections and qRT-PCR of total RNA from cell lines. We then verified the effect of MBD3 on tumor growth in nude mouse xenograft experiments and confirmed its impact on colon cancer cell migration, invasion, and proliferation through transwell and CCK-8 assays in colon cancer cell lines. However, the specific molecular mechanism by which MBD3 affects EMT in colon cancer requires further investigation.
Despite these contributions, this study has some limitations. First, the analysis relied on the online TCGA and GTEx databases, and only a limited number of pathological sections were obtained. Second, further investigation using techniques such as western blotting is required to clarify the specific molecular mechanism by which MBD3 affects EMT. Additionally, suitable animal models will need to be constructed to further elucidate the mechanism of MBD3 in colon cancer.
In summary, this study identified a potential role of MBD3 in colon cancer through bioinformatics and evaluated its significance through a range of analytical techniques. The findings suggest that MBD3 plays an important role in multiple aspects of colon cancer, especially the EMT process, which promotes metastasis and leads to poor prognosis, and that MBD3 has potential value as a novel therapeutic target.