Article

Cell-Free DNA Bisulfite Sequencing Reveals Epithelial–Mesenchymal Transition Signatures for Breast Cancer

1
Center for Epigenetics and Disease Prevention, Institute of Biosciences & Technology, Texas A&M Health Science Center, Houston, TX 77030, USA
2
Rigor and Reproducibility Core, Institute of Biosciences & Technology, Texas A&M Health Science Center, Houston, TX 77030, USA
3
Department of Nutrition, Texas A&M University, College Station, TX 77843, USA
4
National Cancer Institute, Division of Cancer Prevention, Rockville, MD 20850, USA
*
Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(17), 8723; https://doi.org/10.3390/ijms26178723 (registering DOI)
Submission received: 5 August 2025 / Revised: 29 August 2025 / Accepted: 5 September 2025 / Published: 7 September 2025
(This article belongs to the Special Issue Integrative Multi-Omics Analysis for Cancer Biomarkers)

Abstract

Cell-free DNA (cfDNA), shed by malignant tumor cells into extracellular fluid, provides valuable epigenetic information indicative of cancer status. Nipple aspirate fluid (NAF), a noninvasive liquid biopsy from at-risk women, contains nucleic acid and protein biomarkers from adjacent cancer cells, showing promise for breast cancer (BrC) detection. However, despite its potential, the application of cfDNA in NAF for BrC screening is still underexplored. Here, we report a proof-of-concept study for using cfDNA bisulfite sequencing (cfBS) to assess tumor DNA methylation signatures from NAF samples. For four healthy individuals and three BrC patients, cfBS achieved greater than 20× sequencing depth with an average coverage of 26.5× on the genome. A total of 7471 differentially methylated regions were identified, with significant hypermethylation in BrC samples compared to healthy controls. Gene set enrichment analysis indicated that the differentially methylated genes (DMGs) were significantly associated with epithelial–mesenchymal transition (EMT). By developing a novel EMT scoring metric, we found that BrC samples had more of a mesenchymal phenotype than samples from healthy individuals. CDH1, WNT2, and TRIM29 were hypermethylated near the promoter region, while COL5A2 was hypermethylated in the coding region. The DNA methylation and EMT changes were validated through The Cancer Genome Atlas Breast Invasive Carcinoma study, which confirmed that DMGs were associated with gene expression change and that our methylation-based EMT score reliably distinguished tumors from healthy controls. Our findings support the utilization of the NAF cfDNA cfBS methylation profile for noninvasive BrC screening and pave the way for enhanced early detection of this disease.

1. Introduction

While mammography is the primary screening method for breast cancer (BrC), suspicious mammographic findings are often found on biopsy to be falsely positive [1,2,3,4]. The invasive procedures to exclude cancer are costly and can cause sequelae, some serious. Cell-free DNA (cfDNA) has been studied for its potential as a liquid biopsy for cancer screening and detection [5,6]. In particular, aberrant hypermethylation in tumor DNA has been found in the promoter regions of tumor suppressor genes, leading to their transcriptional silencing [7]. Detecting aberrant DNA methylation in plasma cfDNA has been explored in detecting early-stage BrC, particularly in individuals with dense breasts [8,9,10,11]. Genome-wide profiling demonstrated that 70% of tumor suppressor gene promoters, located within CpG islands, are hypermethylated [12,13]. Specifically, aberrant methylation was identified in the promoters of BrC-associated genes, including APC, BRCA1, and RASSF1A [14,15,16]. Nevertheless, plasma cfDNA has shown less encouraging outcomes in BrC detection compared to other cancers [17], probably due to the low yield of cfDNA extracted from plasma [18].
Compared to plasma-based approaches, nipple aspirate fluid (NAF) provides a more localized and potentially enriched source of BrC-specific biomarkers [19,20]. Nipple aspiration is a safe, non-invasive procedure for collecting breast epithelial cells and extracellular fluid produced by the breast epithelium, which is predominantly where BrCs originate. NAF is a valuable source for liquid biopsy, particularly for the analysis of large biomolecules such as secreted proteins and DNA [21,22], though its relatively low cellularity makes it less ideal for cytologic evaluation [23,24]. cfDNA can be extracted from NAF, containing extracellular DNA fragments originating directly from the breast ductal system and the diseased tissue of origin. Therefore, cfDNA methylation in NAF has the potential to serve as a favorable and sensitive approach for BrC detection and screening.
Because NAF cfDNA is usually available in low quantities, a highly sensitive method is needed to detect tumor-derived signals. Whole genome bisulfite sequencing (WGBS) represents a state-of-the-art technology for DNA methylation analysis [25], providing genome-wide profiling at single-base resolution. Based on WGBS, we developed a low-input cfDNA bisulfite sequencing (cfBS) protocol optimized for NAF samples. Here, we present a proof-of-concept study to determine whether DNA methylation signatures derived from NAF cfBS could support accurate and reliable non-invasive BrC screening. Our approach successfully uncovered cfDNA methylation signatures in NAF from BrC patients, underscoring the promise of cfDNA methylome profiling as a novel tool for early detection.

2. Results

2.1. Characteristics of NAF cfBS

NAF samples were collected from seven individuals, three with BrC and four without. Sample volumes ranged from 2 to 20 µL. After cfDNA extraction and bisulfite conversion, a DNA library was constructed with roughly 20 µg of DNA per sample. The cfDNA libraries exhibited an average fragment size of 357 bp, and no significant adapter dimer peaks were observed. Each sample generated an average of 136 million reads per library. These reads were aligned to the hg38 reference genome, resulting in a mapping rate of over 97% for each sample (Table 1). The sequencing depth for each of the seven samples ranged between 20× and 35×, covering from 38.4% to 58.8% of the genome. These high mapping rates and high genome coverage demonstrated that high-quality cfBS data can be reliably obtained from a relatively small volume of NAF samples, affirming the feasibility of our approach for this proof-of-concept study.
We assessed the percentage of cfDNA sequenced reads mapped to different genomic regions in each sample to understand their genomic distribution. This analysis demonstrated a consistent pattern across the samples, with an average of 66.5% of the reads located within the gene body regions, as shown in Supplemental File S1: Figure S1a. A similar trend was observed for the distribution of CpG sites, where 72.1% were found in the gene body (Supplemental File S1: Figure S1b). The distribution pattern of cfDNA fragments suggests that they are not randomly distributed but are instead preferentially derived from gene regions. This pattern supports the reliability of NAF cfDNA as a source for quantifying DNA methylations that regulate gene expression.

2.2. Differentially Methylated Regions Are Enriched in Promoter Regions

In order to identify differentially methylated regions (DMRs), we developed an unsupervised learning approach based on the mean shift algorithm to uncover CpG islands (methylated regions). Our algorithm identified DNA methylation regions with an average size of 246.61 bp and 10.14 CpG sites. This region size aligns closely with the standard definition of a CpG island (length ≥ 200 bp) as described by [26]. As a result, we identified a total of 3,493,780 methylation regions (Figure 1a).
Using Fisher’s exact test, we identified 7471 DMRs that exhibited greater than 10% methylation differences between cancer and normal samples and p-values less than 10⁻⁸ (Figure 1b). A volcano plot illustrates these methylation differences against their corresponding p-values, indicating a trend where statistical significance increases with the methylation difference (Figure 1b). The distribution of p-values relative to the transcription start site (TSS) exhibited a symmetric pattern, with increasing significance closer to the TSS, as demonstrated in Figure 1c. Our results also highlighted a predominance of hypermethylated DMRs over hypomethylated ones, with 62.1% hypermethylated DMRs and 37.9% hypomethylated DMRs (Figure 1d). A substantial proportion of DMRs (64.2%) were located within gene regions, which typically comprise only 1–2% of the human genome. This indicates a significant enrichment of DNA methylation changes associated with genes, particularly in the promoter region (12.1%). In addition, distinct methylation patterns around the TSS were observed by contrasting normal and cancer samples, as shown in Figure 1e.

2.3. Differentially Methylated Genes Are Associated with Differentially Expressed Genes in BrC

Next, we investigated whether these identified DMRs contribute to gene expression alterations in BrC patients. We annotated differentially methylated genes (DMGs) by associating DMRs with their corresponding genes. This led to the identification of 5700 DMRs connected to the promoter and gene body regions of 4462 distinct genes. The top 50 DMRs and their associated DMGs are listed in Supplemental File S2: Table S1. To pinpoint specific genes implicated in BrC, we compared these DMGs with differentially expressed genes (DEGs) assessed from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) dataset. There were 6636 DEGs identified by comparing primary tumor tissues and normal solid tissues in the TCGA-BRCA dataset. Notably, there was an overlap of 980 genes between the DEGs and DMGs, as illustrated in Figure 1f. This overlap was found to be statistically significant (p < 1 × 10⁻¹²), as the observed number of overlapping genes exceeded the number expected from random overlap. A total of 59.5% of these overlapping genes had an inverse correlation between DNA methylation and gene expression patterns. This result suggests that changes in DNA methylation could be a key molecular mechanism influencing gene expression alterations during BrC development.

2.4. Epithelial–Mesenchymal Transition Is Activated in BrC cfBS Data

To enhance our understanding of the relationships among the samples, principal component analysis (PCA) was conducted to display the positions of individual samples within a reduced dimensionality space. Notably, the first six principal components (PCs) represented almost equal variance (ranging from 274.3 to 317.0) and together accounted for more than 99.99% of the total variance in the data. By assessing each PC with respect to the two major groups, PC4 emerged as significant and informative, effectively differentiating between cancer and normal samples (Figure 2a). We selected the top 2.5% of CpG regions (PC4 weights > 0.0027). The threshold of 2.5% is based on the cutoff of a two-tailed statistical test at an alpha level of 5%. This selection resulted in 11,281 CpG regions. A heatmap of their methylation levels demonstrated that these genes are differentially methylated between the two groups, normal and cancer (Figure 2b). Gene set analysis of these PC4-associated genes highlighted the epithelial–mesenchymal transition (EMT)-related pathways, such as adherens junction, cell adhesion molecules, focal adhesion, PI3K-Akt signaling pathway, and Ras signaling pathway, to be significantly enriched (adjusted p-value < 0.01), suggesting EMT may play an important role in BrC development (Figure 2c).
In order to study the EMT status of each sample, we developed EMT scoring metrics, which allowed us to quantify the extent of EMT-associated methylation changes. The methylation regions found in PC4 were linked to an EMT signature of 77 genes. The accumulated probability of methylation levels for either epithelial genes or mesenchymal genes was plotted in Figure 3a. The area under the curve (AUC) represents the distribution of these methylation levels. While the AUC for epithelial genes was similar between the two groups, the AUC for mesenchymal genes was higher in the cancer group compared to the normal group, indicating hypomethylation of mesenchymal genes in BrC. This pattern suggests that EMT is active in BrC, with cancer cells likely shifting toward a mesenchymal state (Figure 3a). EMT scores for BrC samples were significantly higher (more mesenchymal) when compared to the normal samples (Figure 3b; p-value = 0.0122). The methylation levels of these 77 EMT genes were demonstrated in the heatmap (Figure 3c). In BrC patients, 23.5% of genes were hypermethylated, while 76.5% were hypomethylated.

2.5. Validation of EMT Scoring Metric in TCGA Data

Applying the EMT scoring metric for the normalized methylation levels in TCGA, we found that the EMT scores for tumor samples were significantly higher than those of normal samples (Figure 4a; p-value < 0.001), indicating a predominantly mesenchymal state in cancer cells. We further evaluated the discriminative power of the EMT score by constructing a receiver operating characteristic (ROC) curve. The EMT score yielded an AUC of 0.995, demonstrating an almost perfect ability to distinguish tumor samples from normal controls (Figure 4b). The TCGA primary tumor samples were classified as stages I through IV, where lower stages signify that the cancer is more localized and has not spread extensively, reflecting a potentially earlier phase of the disease. We calculated the percentage of mesenchymal samples (EMT score > 0) for each tumor stage and observed an increase in the proportion of mesenchymal samples as the tumor stage advanced (Figure 4c).

2.6. EMT Gene Expression and Methylation Correlations in TCGA Data

To evaluate the expression of the 77 EMT genes, we plotted the heatmap in the matched subset of TCGA tumor and normal samples (Figure 5a). Among the 77 genes, there were 64 genes (83.1%) found differentially expressed in the TCGA study, and 68.7% of genes were upregulated while 31.3% of genes were downregulated in the cancer group.
We further assessed the relationship between gene expression and DNA methylation. Several EMT genes, including S100A14, AKAP12, and TMPRSS4, displayed inverse correlations between methylation and expression (Figure 5b). These findings demonstrate widespread EMT-associated transcriptional changes in BrC tumors and suggest that DNA methylation may serve as a potential regulatory mechanism of EMT.

2.7. NAF cfDNA Methylation Reflects Tumor-Specific EMT Gene Alterations

To validate our cfDNA findings, we examined EMT-associated genes that showed matched methylation and expression changes in TCGA and assessed whether similar methylation changes were observed in NAF cfDNA. RABGAP1L displayed predominant hypomethylation and was upregulated in primary tumor samples, with cfDNA methylation analysis confirming hypomethylation within intron 17 (748 bp). In contrast, AKAP12 exhibited hypermethylation and was downregulated in breast tumors, with cfDNA analysis confirming hypermethylation within intron 2 (329 bp) in NAF BrC samples. PGK1 was hypomethylated and upregulated in BrC samples, with cfDNA analysis also revealing hypomethylation in intron 1 (716 bp). Additionally, S100A14 and TMPRSS4 were both hypomethylated and upregulated in breast tumors. cfDNA analysis revealed hypomethylation in the promoter region of S100A14 (528 bp) and in intron 1 of TMPRSS4 (736 bp).

2.8. Differential Methylation of EMT Genes

We confirmed the DNA methylation variations in select genes that were associated with BrC and EMT at the CpG level in NAF samples. Specifically, CDH1, WNT2, and TRIM29 were hypermethylated near the promoter region, while COL5A2 was hypermethylated in the coding region (Figure 6), with methylation levels of 25%, 100%, 17.5%, and 100%, respectively. Beyond protein-coding genes, our methylation analysis also highlighted CpG sites with the most significant p-values mapped to noncoding regions, particularly in the long noncoding RNA (lncRNA) CASC15 and the microRNA miR-129-2. Interestingly, hypermethylation was observed upstream of the TSS of both genes (within 5000 bp), with methylation levels of 30% for both (Figure 6), suggesting potential regulatory implications.

3. Discussion

Early and accurate detection of BrC is essential to improve patient survival. Nevertheless, current screening strategies like mammography have known limitations, including reduced sensitivity and a higher risk of overdiagnosis. This study provides proof-of-concept evidence that NAF can serve as a noninvasive liquid biopsy for BrC detection. NAF offers a promising source of cfDNA for comprehensive genome-wide DNA methylation analysis. This is not the first study to evaluate DNA methylation in NAF, with prior studies by ourselves [27,28] and others [29] using methylation-specific PCR (MSP). In this report, we discuss our findings using cfBS, generally considered the “gold standard” method of DNA methylation assessment [25], for it offers a comprehensive, quantitative analysis of methylation across a region of DNA, whereas MSP is a qualitative method which focuses on identifying methylation at specific CpG sites. Using cfBS, we demonstrate the feasibility of obtaining high-quality, informative methylation data from minimal sample volumes (2–20 µL), the typical amounts of NAF obtained in clinical settings. In this study, we successfully extracted 0.6–3.4 ng of cfDNA from less than 20 µL of NAF. Through cfBS, we generated high-quality libraries, confirming the technical feasibility and reliability of using NAF-derived cfDNA for methylation-based analysis, supporting its potential as a minimally invasive platform for BrC screening and molecular characterization.
For early cancer diagnosis, high-efficiency library construction and sensitive cfDNA detection are necessary yet challenging due to the limited yield and highly fragmented nature of cfDNA [30]. While DNA methylation biomarkers derived from cfDNA have shown great promise for cancer diagnosis, prognosis, and molecular characterization [31,32], most studies have relied on cfDNA extracted from plasma, where tumor-derived signals are often diluted by DNA from other tissues. In contrast, cfDNA from NAF offers a more localized and potentially enriched source of tumor-specific DNA. However, the use of NAF cfDNA in methylation studies remains underexplored. In this study, we demonstrated that high-quality sequencing libraries can be generated from nanogram-scale NAF cfDNA, achieving mapping rates exceeding 97% and a sequencing depth between 20× and 35×. This depth exceeds the 5–15× range recommended for reliable DMR detection in WGBS studies [33], supporting that our average coverage of 26.5× was sufficient for robust methylation calling.
To profile methylation across the genome, we employed cfBS, which offers several advantages over other high-throughput methylation technologies. Compared to MeDIP-seq [34,35], which uses antibody-based enrichment for methylated regions, cfBS provides base-pair resolution without bias toward heavily methylated sequences. Array-based methods such as the Illumina 450K and EPIC BeadChips offer high reproducibility and cost-efficiency, but they are limited to pre-selected CpG sites (~450,000 to 850,000 probes) and do not capture the full complexity of the methylome [36,37]. WGBS is the most comprehensive approach, capable of measuring the methylation status of every cytosine in the genome [38]. However, WGBS requires substantial DNA input and is cost-prohibitive for routine clinical use, particularly in liquid biopsy applications where DNA is limited [39]. In contrast, cfDNA is often enriched in coding and regulatory regions, possibly due to protection from nuclease degradation [13,40], making cfBS especially efficient for capturing functionally relevant methylation changes.
Our findings revealed 7471 DMRs between BrC and healthy samples, with a predominance of hypermethylation in cancer-associated gene regions, including several well-known tumor suppressors. Importantly, these methylation changes were enriched in gene promoters, suggesting a functional role in gene silencing and cancer progression, corroborating previous studies [41,42]. DNA methylation over the gene body is known to correlate positively with the level of gene transcription in the human genome [43,44]. To validate this, we used TCGA-BRCA data to confirm that a significant proportion of these DMGs were also differentially expressed, many with an inverse methylation-expression relationship, reinforcing the biological relevance of the observed methylation patterns.
A major finding of our study was the role of EMT during cancer progression. Using a novel EMT scoring method based on methylation profiles of curated EMT gene sets, we found that BrC cfDNA samples consistently exhibited a more mesenchymal phenotype than healthy controls. EMT-related pathways such as cell adhesion, PI3K-Akt, and Ras signaling were significantly enriched among the DMGs. Furthermore, the EMT scores correlated with tumor stage, supporting their potential prognostic value.
We also identified hypermethylation in several EMT-related genes with known roles in BrC. Promoter hypermethylation of CDH1, a hallmark of stable EMT [45], was observed through our findings and is consistent with its positive association with EMT in BrC cell lines [46]. This supports the notion that epigenetic silencing of CDH1 contributes to reduced E-cadherin expression [47]. Similarly, TRIM29 is often silenced in breast tumors due to aberrant gene hypermethylation and acts as a tumor suppressor through its ability to suppress EMT [48]. WNT2, a key ligand in the Wnt signaling pathway, also displayed promoter hypermethylation and has been suggested to play an important role in BrC tumorigenesis [49]. In addition, COL5A2, a gene related to extracellular matrix remodeling, exhibited hypermethylation within the coding region. Aberrant expression of COL5A2 has been reported in BrC and is suggested to play a role in facilitating the invasiveness of BrC cells [50]. Furthermore, lncRNA-CASC15 and miR-129-5p emerged as potential epigenetic regulators. CASC15, an oncogenic factor in tumorigenesis of various cancers including BrC [51], is known to promote EMT by increasing N-cadherin and vimentin protein levels while decreasing that of E-cadherin via TWIST1 [52]. In our study, CASC15 exhibited promoter hypermethylation, which may represent an epigenetic mechanism contributing to its oncogenic activity. It has been reported that miR-129 is consistently downregulated in BrC samples, thereby regulating BrC cell proliferation and apoptosis [53]. Our observation of promoter hypermethylation of miR-129-5p provides a plausible mechanism for its downregulation, in line with reports demonstrating that downregulation of miR-129-5p through the Twist1-Snail feedback loop stimulates EMT [54]. These findings suggest that cfDNA methylation signatures in NAF not only reflect the presence of cancer but also capture molecular phenotypes indicative of disease aggressiveness.
Given the well-established inverse correlation between DNA methylation and gene expression [55], we were most interested in EMT-associated genes that demonstrate a negative correlation between these parameters. There was a significant overlap between DEGs identified in TCGA and DMGs in our cfBS data, with a substantial proportion of overlapping genes demonstrating concordant expression and methylation changes. Furthermore, consistent with TCGA data, our cfBS data highlighted differential methylation in key EMT-associated genes, supporting mechanisms that regulate the expression of genes previously linked to EMT. These findings suggest that cfDNA methylation patterns can be reliable indicators of tumor-specific epigenetic alterations. Functionally, RABGAP1L was shown to promote the invasive migration of BrC cells by facilitating the recycling of active β1 integrins [56]. As a known tumor suppressor, AKAP12 inhibits the growth and metastasis of cancer cells [57]. Downregulation of PGK1, a key enzyme in aerobic glycolysis, was shown to suppress the invasion of BrC cells and reverse the EMT process [58]. S100A14 has been identified as a modulator of HER2 signaling, with its overexpression significantly enhancing migration, invasion, and metastasis of BrC cells [59,60]. Similarly, overexpression of TMPRSS4, a serine protease expressed on the cell surface that contributes to the degradation of the extracellular matrix, has been suggested to promote tumor proliferation and aggressiveness in BrC [61]. These findings highlight the critical role of cfDNA differential methylation in regulating EMT-associated gene expression and driving BrC progression.
The major limitation of our study is the relatively small sample size, which limits the statistical power for identifying methylation differences and makes cross-validation unfeasible. To address this, we expanded our analysis by incorporating external validation using the TCGA-BRCA dataset, which includes both DNA methylation and gene expression data derived from a large cohort of primary BrC and normal tissue samples. This allowed us to validate key findings, such as DMGs and EMT signatures, in an independent dataset and to establish consistent correlations between methylation alterations and gene expression changes. Furthermore, we are currently planning a larger-scale study involving NAF cfBS profiling in a large patient cohort. This follow-up study will include a broader range of BrC subtypes and clinical stages, enabling more detailed biomarker discovery, subtype stratification, and assessment of diagnostic and prognostic performance.
Collectively, this study provides proof of concept that NAF cfDNA methylation profiling via WGBS is both feasible and informative, offering a powerful, noninvasive approach for BrC screening and molecular characterization. The ability to detect EMT activation from NAF cfDNA further adds functional insight that may guide risk stratification or therapeutic decisions. Moving forward, larger cohort studies are needed to validate these markers and assess their performance in early detection, particularly in high-risk or mammographically challenging populations. The integration of NAF-based liquid biopsy into clinical workflows could complement current screening tools and advance personalized, minimally invasive diagnostics for BrC.

4. Methods

4.1. Sample Collection and cfDNA Extraction

De-identified NAF samples from 7 individuals (3 with breast cancer and 4 healthy controls) were obtained from an established biobank. No new samples were collected for this study. The Texas A&M University Institutional Review Board (IRB) reviewed the project and determined that it does not constitute research involving human subjects. The NAF samples were gently drawn using a non-invasive breast pump, as described in [62], and subsequently stored in capillary tubes at a temperature of −80 °C. QIAamp Circulating Nucleic Acid Kit (Qiagen GmbH, Hilden, Germany) was used for the isolation of cfDNA. Briefly, the NAF samples were diluted in phosphate-buffered saline to reach the minimal required volume of 1 mL prior to extraction. The extraction procedure, comprising 4 steps (lyse, bind, wash, and elute), was carried out using QIAamp Mini columns (Qiagen GmbH, Hilden, Germany) on a vacuum manifold. The cfDNA was collected in a final elution volume of 20 µL. The concentration of the extracted cfDNA was quantified using the Qubit 1X dsDNA High Sensitivity Assay (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA).

4.2. cfDNA Bisulfite Sequencing

cfBS libraries were constructed using the Pico Methyl-Seq Library Kit (Zymo Research, Irvine, CA, USA). The process began with the bisulfite treatment of the input cfDNA, which simultaneously led to its random fragmentation. After bisulfite conversion, the DNA underwent an initial amplification using random primers. This step was followed by the ligation of adaptors and a final amplification stage using Illumina TruSeq indices (Illumina, San Diego, CA, USA). The amplified library size was assessed by Bioanalyzer High Sensitivity DNA Analysis (Agilent Technologies, Santa Clara, CA, USA) to validate the library quality. The library concentrations were measured using the Qubit 1X dsDNA High Sensitivity assay (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA). The samples were pooled at 8 nM and sequenced on Illumina HiSeq 2000 (paired-end 150, Illumina, San Diego, CA, USA).

4.3. Bioinformatics and Statistics

The sequenced reads were evaluated through FastQC (version 0.11.8). The mapping quality and sequencing coverage were assessed through samtools flagstat (version 1.9) and NGSEP CoverageStats (version 3.3.2), respectively. All qualified cfDNA sequenced reads were aligned to the Human Reference Genome Build GRCh38 (hg38) via bwa-meth (version 0.2.2). MethylDackel (version 0.5.2) was used to extract methylation levels from each sample.
The methylation level at each CpG site was calculated by the proportion of methylated reads relative to the total read count. The CpG islands were determined by clustering individual CpG sites using MethylC, an in-house mean shift-based machine learning program. This method iteratively shifts single CpG sites towards the highest density window with an initial size of 200 bp, and it locates the local maxima of mean methylation levels.
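The two steps in this paragraph can be illustrated with a minimal sketch. It is not the in-house MethylC program, whose implementation is not described here in detail. The function names `methylation_level` and `mean_shift_regions` are hypothetical, the kernel is assumed to be flat, and the clustering uses CpG coordinates only, whereas the method described above also locates local maxima of mean methylation levels.

```python
from statistics import mean


def methylation_level(methylated_reads, total_reads):
    """Proportion of methylated reads relative to the total read count at one CpG site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def mean_shift_regions(positions, bandwidth=200, tol=0.5, max_iter=100):
    """Group CpG coordinates into regions with a flat-kernel 1D mean shift.

    Assumption (not from the paper): each site is shifted to the mean coordinate
    of all sites within bandwidth/2 until it moves less than `tol` bp; sites whose
    converged modes lie within bandwidth/2 of each other form one region.
    """
    half = bandwidth / 2
    points = sorted(positions)
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            window = [q for q in points if abs(q - x) <= half]
            shifted = mean(window)
            moved = abs(shifted - x)
            x = shifted
            if moved < tol:
                break
        modes.append(x)

    # Merge neighbouring sites that converged to (approximately) the same mode.
    regions = []
    for p, m in zip(points, modes):
        if regions and abs(m - regions[-1]["mode"]) <= half:
            regions[-1]["sites"].append(p)
        else:
            regions.append({"mode": m, "sites": [p]})
    return [r["sites"] for r in regions]
```

For example, `mean_shift_regions([100, 120, 150, 180, 1000, 1030, 1060])` separates the two dense groups of coordinates into two regions.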
Fisher’s exact test was performed for differential methylation analysis. The mean difference in percentage methylation (%) was computed as the mean methylation ratio of cancer samples minus the mean methylation ratio of normal samples. The DMRs between cancerous and normal samples were identified using stringent criteria: Fisher’s exact test p-value < 10⁻⁸, a total read count per CpG site exceeding twice the sample size (2n), and an absolute methylation difference ≥ 10%. The DMRs were annotated to genes according to their genomic locations. DMRs located within the region between the TSS and transcription end site (TES) were classified as being associated with the gene body. A promoter region was defined as being 5000 base pairs upstream of the TSS. Those sites outside of promoter regions or gene bodies were classified as intergenic.
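The DMR criteria can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the 2 × 2 table is assumed to be built from pooled methylated and unmethylated read counts per group, the methylation difference is computed from those pooled counts instead of from per-sample means, and the names `fisher_exact_two_sided` and `is_dmr` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def is_dmr(meth_cancer, total_cancer, meth_normal, total_normal,
           n_samples, p_cutoff=1e-8, min_diff_pct=10.0):
    """Apply the three criteria: read count > 2n, |difference| >= 10%, p < 1e-8."""
    if total_cancer == 0 or total_normal == 0:
        return False
    # Assumption: the read-count filter is applied to the pooled count.
    if total_cancer + total_normal <= 2 * n_samples:
        return False
    diff_pct = 100.0 * (meth_cancer / total_cancer - meth_normal / total_normal)
    p_value = fisher_exact_two_sided(
        meth_cancer, total_cancer - meth_cancer,
        meth_normal, total_normal - meth_normal,
    )
    return p_value < p_cutoff and abs(diff_pct) >= min_diff_pct
```

The sign of `diff_pct` (cancer minus normal) distinguishes hypermethylated from hypomethylated regions, matching the definition of the mean difference above.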
A statistical test based on the hypergeometric distribution was employed to determine whether the number of overlapping genes exceeded what was expected by chance. Single gene analysis was conducted to investigate the DMGs by assessing methylation levels at individual CpG sites within the corresponding DMR.
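A gene-overlap test of this kind is commonly computed as an upper-tail hypergeometric probability. The sketch below shows that calculation under assumptions: the size of the background gene universe is not stated in this section, so `n_background` is a hypothetical parameter that the user must supply, and the function names are illustrative.

```python
from math import comb


def expected_overlap(n_background, n_set_a, n_set_b):
    """Overlap expected by chance between two gene sets drawn from the same universe."""
    return n_set_a * n_set_b / n_background


def overlap_pvalue(n_background, n_set_a, n_set_b, n_overlap):
    """P(X >= n_overlap) when set B is drawn at random from a universe containing set A.

    Exact integer arithmetic is used for the tail sum, so very small p-values
    are not lost to rounding before the final division.
    """
    denom = comb(n_background, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom
```

Because the result depends strongly on the chosen background, the universe should be restricted to genes that could have been detected by both assays.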

4.4. Gene Sets Analysis

PCA was conducted for dimensionality reduction and to identify the important PCs that effectively distinguished between the groups. For pathway analysis, we utilized ‘quickpath’ (version 0.0.0.9000), an R package developed by our lab [63,64]. This analysis uncovered biological processes significantly represented in DMGs, based on the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. A hypergeometric test, equivalent to a one-tailed version of Fisher’s exact test, was used to measure the significance of GO terms and biological pathways. The p-values were adjusted using the Benjamini–Hochberg method to control the false discovery rate (FDR) below 0.05, establishing our threshold for significance.
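The Benjamini–Hochberg adjustment mentioned above can be written in a few lines. This sketch is independent of the ‘quickpath’ package and only illustrates the step-up procedure; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(terms, pvalues, fdr=0.05):
    """Keep the terms whose adjusted p-value falls below the FDR threshold."""
    adjusted = benjamini_hochberg(pvalues)
    return [t for t, q in zip(terms, adjusted) if q < fdr]
```

Each raw p-value would come from the one-tailed hypergeometric test applied to a GO term or KEGG pathway.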

4.5. Computation of EMT Score

We developed a method called EMT-Met for quantifying the EMT score in each sample based on the Kolmogorov–Smirnov method [65]. First, the methylation level for each sample was normalized to ensure uniformity in scoring. Second, we referenced the EMT genes from [66], which were pre-categorized into two groups: Epithelial (Epi; E) and Mesenchymal (Mes; M). There were 171 E genes and 52 M genes. The selected EMT genes are listed in Supplemental File S3: Table S2. Third, for these biomarkers, we calculated the empirical cumulative distribution functions (ECDFs) based on their DNA methylation levels (β values). In Figure 7, the blue and red curves represent Epi and Mes, respectively. Lastly, the AUC was calculated for both the Epi and Mes curves. The EMT score (Figure 7, shaded area) was calculated by subtracting the AUC for Epi from the AUC for Mes, as follows:
$$\mathrm{EMT\ score} = \mathrm{AUC}_{\mathrm{ecdf},M} - \mathrm{AUC}_{\mathrm{ecdf},E} \tag{1}$$
The score in Equation (1) ranges from −1 to 1; a positive score indicates an M phenotype, whereas a negative score indicates an E phenotype. The EMT scoring was performed in MATLAB R2021b (MathWorks, Natick, MA, USA).
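To make the scoring in Equation (1) concrete, the Python sketch below integrates the ECDF of a set of β values over the interval [0, 1] and takes the difference between the Mes and Epi areas. It is a hedged illustration rather than the EMT-Met code: the integration interval, the clipping of values to [0, 1], and the function names are assumptions, and the β values are invented.

```python
def ecdf_auc(values, lo=0.0, hi=1.0):
    """Area under the empirical CDF of `values`, integrated over [lo, hi].

    Assumption: values are clipped to [lo, hi] before integration.
    """
    xs = sorted(min(max(v, lo), hi) for v in values)
    n = len(xs)
    area, prev = 0.0, lo
    for i, x in enumerate(xs):
        # The ECDF equals i / n on the interval [prev, x).
        area += (i / n) * (x - prev)
        prev = x
    area += hi - prev  # the ECDF equals 1 beyond the largest value
    return area


def emt_score(mes_betas, epi_betas):
    """AUC of the Mes ECDF minus AUC of the Epi ECDF, as in Equation (1)."""
    return ecdf_auc(mes_betas) - ecdf_auc(epi_betas)


if __name__ == "__main__":
    mes = [0.15, 0.20, 0.35]        # hypothetical beta values, Mes genes
    epi = [0.60, 0.75, 0.80, 0.90]  # hypothetical beta values, Epi genes
    print(emt_score(mes, epi))
```

Under these assumptions, the area under an ECDF on [0, 1] equals one minus the mean of the values, so the score reduces to the mean Epi methylation minus the mean Mes methylation; a positive score therefore corresponds to Epi genes being more methylated than Mes genes.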

4.6. The Cancer Genome Atlas Breast Invasive Carcinoma Data Analysis

To validate the cfDNA findings, we analyzed the TCGA-BRCA dataset derived from tissue samples, focusing on EMT-associated genes identified in our study. RNA-sequencing data (raw read counts) and DNA methylation microarray data (methylation levels in β values) from the TCGA-BRCA dataset were obtained using TCGAbiolinks (version 2.34.0) in the R programming environment (version 4.4.1). Differential expression analysis was performed using the R limma package (version 3.62.1) by comparing 1111 tumor tissues with 114 adjacent normal tissues. DEGs were determined by the following criteria: absolute log2 fold change ≥ 1 and a Bonferroni-adjusted p-value < 0.05. An EMT score was calculated for each TCGA-BRCA sample using the EMT gene signatures previously identified from cfDNA methylation profiling. The β values, which represent the ratio of methylated probe intensity to the total probe intensity (both methylated and unmethylated), were used to calculate the EMT scores. To evaluate the discriminatory ability of the EMT score, we performed ROC curve analysis comparing tumor and normal samples using the pROC R package (version 1.18.5). For integrative analyses of gene expression and DNA methylation, 112 primary tumor and 84 normal solid tissue samples with both RNA-sequencing and methylation data were used. Pearson correlation analysis was performed between normalized gene expression values and methylation β values. The detailed clinical characteristics of the TCGA-BRCA cohort are provided in Supplemental File S4: Table S3.
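The quantities used in this validation can be sketched in Python as shown below. This is not the R workflow used in the study (TCGAbiolinks, limma, and pROC); it is a minimal, self-contained illustration in which the function names and the example values are assumptions. The β value follows the ratio stated above, the DEG rule applies the two stated thresholds, and the ROC AUC is computed by pairwise comparison of tumor and normal scores.

```python
from math import sqrt


def beta_value(methylated_intensity, unmethylated_intensity):
    """Ratio of methylated probe intensity to total probe intensity."""
    return methylated_intensity / (methylated_intensity + unmethylated_intensity)


def is_deg(log2_fold_change, raw_p, n_tests):
    """|log2 fold change| >= 1 and Bonferroni-adjusted p-value < 0.05."""
    bonferroni_p = min(1.0, raw_p * n_tests)
    return abs(log2_fold_change) >= 1.0 and bonferroni_p < 0.05


def roc_auc(tumor_scores, normal_scores):
    """Probability that a tumor score exceeds a normal score (ties count 0.5)."""
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(beta_value(300.0, 100.0))
    print(is_deg(1.5, 1e-6, 20000))
    print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
    # Hypothetical methylation (beta) and expression values for one gene.
    print(pearson_r([0.1, 0.4, 0.7, 0.9], [8.2, 6.9, 5.1, 4.0]))
```

A negative Pearson coefficient in the last call corresponds to the expected pattern in which higher promoter methylation accompanies lower expression.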

5. Conclusions

This proof-of-concept study demonstrates the feasibility and potential of using cfDNA from NAF to detect BrC-associated DNA methylation signatures. We identified 7471 DMRs, the majority of which were enriched in gene regulatory regions, particularly promoters. This enrichment established a strong association between these epigenetic alterations and the gene expression changes observed in the TCGA-BRCA dataset. Our findings highlight the activation of the EMT program in BrC samples, supported by both methylation and transcriptomic data. By introducing a novel EMT scoring method based on cfDNA methylation patterns, we demonstrated that BrC samples exhibit a more mesenchymal phenotype, which correlates with increasing tumor stage. Together, these results underscore the clinical utility of NAF cfDNA methylation profiling as a noninvasive, localized, and informative approach for early BrC detection and molecular characterization. Future studies with larger cohorts are warranted to validate these findings and advance NAF-based liquid biopsy into routine clinical screening.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26178723/s1.

Author Contributions

M.S.J. performed the data analysis and drafted the manuscript. Z.D. helped prepare libraries for sequencing. C.P. provided computational support. J.L. developed the EMT scoring algorithm. K.K.Z., E.S. and L.X. conceived the idea and contributed to manuscript writing. All authors have read and agreed to the published version of the manuscript.

Funding

The study is supported by the Texas A&M University Presidential Clinical Research Partnership program.

Institutional Review Board Statement

De-identified NAF samples from 7 individuals (3 with breast cancer and 4 healthy controls) were obtained from an established biobank. No new samples were collected for this study. The Texas A&M University Institutional Review Board (IRB) reviewed the project and determined that it does not constitute research involving human subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are available through the Gene Expression Omnibus (GEO) database with GEO accession number GSE238014. The R package ‘quickpath’ is available on GitHub: https://github.com/jiangyuan2li/quickpath (accessed on 6 September 2025). The code for MethylC is available on GitHub: https://github.com/minsunsjeon/MethylC (accessed on 6 September 2025). The code for EMT-Met is available on GitHub: https://github.com/minsunsjeon/EMT-Met (accessed on 6 September 2025).

Acknowledgments

We gratefully acknowledge the Rigor and Reproducibility Core at the Texas A&M Institute of Biosciences and Technology for their support with data analysis.

Conflicts of Interest

The authors declare no conflict of interest.

Declarations

Opinions expressed by the authors are their own, and this material should not be interpreted as representing the official viewpoint of the US Department of Health and Human Services, the National Institutes of Health, the National Cancer Institute, or the Division of Cancer Prevention.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| AUC | Area Under the Curve |
| BrC | Breast Cancer |
| cfBS | cfDNA Bisulfite Sequencing |
| cfDNA | Cell-free DNA |
| DEG | Differentially Expressed Gene |
| DMG | Differentially Methylated Gene |
| DMR | Differentially Methylated Region |
| ECDF | Empirical Cumulative Distribution Function |
| EMT | Epithelial–Mesenchymal Transition |
| Epi | Epithelial |
| FDR | False Discovery Rate |
| GO | Gene Ontology |
| KEGG | Kyoto Encyclopedia of Genes and Genomes |
| Mes | Mesenchymal |
| MSP | Methylation-Specific PCR |
| NAF | Nipple Aspirate Fluid |
| PC | Principal Component |
| PCA | Principal Component Analysis |
| ROC | Receiver Operating Characteristic |
| TCGA-BRCA | The Cancer Genome Atlas Breast Invasive Carcinoma |
| TES | Transcription End Site |
| TSS | Transcription Start Site |
| WGBS | Whole-Genome Bisulfite Sequencing |

References

  1. Grimm, L.J.; Avery, C.S.; Hendrick, E.; Baker, J.A. Benefits and Risks of Mammography Screening in Women Ages 40 to 49 Years. J. Prim. Care Community Health 2022, 13, 21501327211058322.
  2. Dahabreh, I.J.; Wieland, L.S.; Adam, G.P.; Halladay, C.; Lau, J.; Trikalinos, T.A. Core Needle and Open Surgical Biopsy for Diagnosis of Breast Lesions: An Update to the 2009 Report; Agency for Healthcare Research and Quality: Rockville, MD, USA, 2014.
  3. Fahy, B.N.; Bold, R.J.; Schneider, P.D.; Khatri, V.; Goodnight, J.E., Jr. Cost-benefit analysis of biopsy methods for suspicious mammographic lesions. Arch. Surg. 2001, 136, 990–994; discussion 994–995.
  4. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 2022, 72, 7–33.
  5. Diehl, F.; Schmidt, K.; Choti, M.A.; Romans, K.; Goodman, S.; Li, M.; Thornton, K.; Agrawal, N.; Sokoll, L.; Szabo, S.A.; et al. Circulating mutant DNA to assess tumor dynamics. Nat. Med. 2008, 14, 985–990.
  6. Luo, H.; Wei, W.; Ye, Z.; Zheng, J.; Xu, R.H. Liquid Biopsy of Methylation Biomarkers in Cell-Free DNA. Trends Mol. Med. 2021, 27, 482–500.
  7. Poduval, D.B.; Ognedal, E.; Sichmanova, Z.; Valen, E.; Iversen, G.T.; Minsaas, L.; Lonning, P.E.; Knappskog, S. Assessment of tumor suppressor promoter methylation in healthy individuals. Clin. Epigenetics 2020, 12, 131.
  8. Zhang, X.; Zhao, D.; Yin, Y.; Yang, T.; You, Z.; Li, D.; Chen, Y.; Jiang, Y.; Xu, S.; Geng, J.; et al. Circulating cell-free DNA-based methylation patterns for breast cancer diagnosis. NPJ Breast Cancer 2021, 7, 106.
  9. Rodriguez-Casanova, A.; Costa-Fraga, N.; Castro-Carballeira, C.; Gonzalez-Conde, M.; Abuin, C.; Bao-Caamano, A.; Garcia-Caballero, T.; Brozos-Vazquez, E.; Rodriguez-Lopez, C.; Cebey, V.; et al. A genome-wide cell-free DNA methylation analysis identifies an episignature associated with metastatic luminal B breast cancer. Front. Cell Dev. Biol. 2022, 10, 1016955.
  10. Manoochehri, M.; Borhani, N.; Gerhauser, C.; Assenov, Y.; Schonung, M.; Hielscher, T.; Christensen, B.C.; Lee, M.K.; Grone, H.J.; Lipka, D.B.; et al. DNA methylation biomarkers for noninvasive detection of triple-negative breast cancer using liquid biopsy. Int. J. Cancer 2023, 152, 1025–1035.
  11. Salimi, M.; Rastegarpouyani, S. E74-like Factor 5 Promoter Methylation in Circulating Tumor DNA as a Potential Prognostic Marker in Breast Cancer Patients. Asian Pac. J. Cancer Prev. 2023, 24, 4035–4041.
  12. Ruiz-De La Cruz, M.; Martinez-Gregorio, H.; Estela Diaz-Velasquez, C.; Ambriz-Barrera, F.; Resendiz-Flores, N.G.; Gitler-Weingarten, R.; Rojo-Castillo, M.P.; Pradda, D.; Oliver, J.; Perdomo, S.; et al. Methylation marks in blood DNA reveal breast cancer risk in patients fulfilling hereditary disease criteria. NPJ Precis. Oncol. 2024, 8, 136.
  13. Liu, J.; Zhao, H.; Huang, Y.; Xu, S.; Zhou, Y.; Zhang, W.; Li, J.; Ming, Y.; Wang, X.; Zhao, S.; et al. Genome-wide cell-free DNA methylation analyses improve accuracy of non-invasive diagnostic imaging for early-stage breast cancer. Mol. Cancer 2021, 20, 36.
  14. Salta, S.; Nunes, S.P.; Fontes-Sousa, M.; Lopes, P.; Freitas, M.; Caldas, M.; Antunes, L.; Castro, F.; Antunes, P.; Palma de Sousa, S.; et al. A DNA Methylation-Based Test for Breast Cancer Detection in Circulating Cell-Free DNA. J. Clin. Med. 2018, 7, 420.
  15. Nunes, S.P.; Moreira-Barbosa, C.; Salta, S.; Palma de Sousa, S.; Pousa, I.; Oliveira, J.; Soares, M.; Rego, L.; Dias, T.; Rodrigues, J.; et al. Cell-Free DNA Methylation of Selected Genes Allows for Early Detection of the Major Cancers in Women. Cancers 2018, 10, 357.
  16. Hagrass, H.A.; Pasha, H.F.; Shaheen, M.A.; Abdel Bary, E.H.; Kassem, R. Methylation status and protein expression of RASSF1A in breast cancer patients. Mol. Biol. Rep. 2014, 41, 57–65.
  17. Liu, M.C.; Oxnard, G.R.; Klein, E.A.; Swanton, C.; Seiden, M.V.; CCGA Consortium. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncol. 2020, 31, 745–759.
  18. Lau, B.T.; Almeda, A.; Schauer, M.; McNamara, M.; Bai, X.; Meng, Q.; Partha, M.; Grimes, S.M.; Lee, H.; Heestand, G.M.; et al. Single-molecule methylation profiles of cell-free DNA in cancer with nanopore sequencing. Genome Med. 2023, 15, 33.
  19. Sauter, E.R. Analysis of nipple aspirate fluid for diagnosis of breast cancer: An alternative to invasive biopsy. Expert Rev. Mol. Diagn. 2005, 5, 873–881.
  20. Sauter, E.R.; Wagner-Mann, C.; Ehya, H.; Klein-Szanto, A. Biologic markers of breast cancer in nipple aspirate fluid and nipple discharge are associated with clinical findings. Cancer Detect. Prev. 2007, 31, 50–58.
  21. Qin, W.; Gui, G.; Zhang, K.; Twelves, D.; Kliethermes, B.; Sauter, E.R. Proteins and carbohydrates in nipple aspirate fluid predict the presence of atypia and cancer in women requiring diagnostic breast biopsy. BMC Cancer 2012, 12, 52.
  22. Qin, W.; Zhang, K.; Clarke, K.; Weiland, T.; Sauter, E.R. Methylation and miRNA effects of resveratrol on mammary tumors vs. normal tissue. Nutr. Cancer 2014, 66, 270–277.
  23. Sauter, E.R.; Ross, E.; Daly, M.; Klein-Szanto, A.; Engstrom, P.F.; Sorling, A.; Malick, J.; Ehya, H. Nipple aspirate fluid: A promising non-invasive method to identify cellular markers of breast cancer risk. Br. J. Cancer 1997, 76, 494–501.
  24. Mannello, F.; Tonti, G.A.; Qin, W.; Zhu, W.; Sauter, E.R. Do nipple aspirate fluid epithelial cells and their morphology predict breast cancer development? Breast Cancer Res. Treat. 2007, 102, 125–127.
  25. Kurdyukov, S.; Bullock, M. DNA Methylation Analysis: Choosing the Right Method. Biology 2016, 5, 3.
  26. Gardiner-Garden, M.; Frommer, M. CpG islands in vertebrate genomes. J. Mol. Biol. 1987, 196, 261–282.
  27. Krassenstein, R.; Sauter, E.; Dulaimi, E.; Battagli, C.; Ehya, H.; Klein-Szanto, A.; Cairns, P. Detection of breast cancer in nipple aspirate fluid by CpG island hypermethylation. Clin. Cancer Res. 2004, 10 Pt 1, 28–32.
  28. Qin, W.; Zhu, W.; Sauter, E. Detection of gene methylation in nipple aspirate fluid of breast cancer patients by methylation-specific PCR. Cancer Res. 2004, 64 (Suppl. S7), 302.
  29. de Groot, J.S.; Moelans, C.B.; Elias, S.G.; Jo Fackler, M.; van Domselaar, R.; Suijkerbuijk, K.P.; Witkamp, A.J.; Sukumar, S.; van Diest, P.J.; van der Wall, E. DNA promoter hypermethylation in nipple fluid: A potential tool for early breast cancer detection. Oncotarget 2016, 7, 24778–24791.
  30. El Messaoudi, S.; Rolet, F.; Mouliere, F.; Thierry, A.R. Circulating cell free DNA: Preanalytical considerations. Clin. Chim. Acta 2013, 424, 222–230.
  31. Page, K.; Martinson, L.J.; Fernandez-Garcia, D.; Hills, A.; Gleason, K.L.T.; Gray, M.C.; Rushton, A.J.; Nteliopoulos, G.; Hastings, R.K.; Goddard, K.; et al. Circulating Tumor DNA Profiling From Breast Cancer Screening Through to Metastatic Disease. JCO Precis. Oncol. 2021, 5, 522.
  32. Moss, J.; Zick, A.; Grinshpun, A.; Carmon, E.; Maoz, M.; Ochana, B.L.; Abraham, O.; Arieli, O.; Germansky, L.; Meir, K.; et al. Circulating breast-derived DNA allows universal detection and monitoring of localized breast cancer. Ann. Oncol. 2020, 31, 395–403.
  33. Ziller, M.J.; Hansen, K.D.; Meissner, A.; Aryee, M.J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat. Methods 2015, 12, 230–232.
  34. Zhu, Y.; Cao, Z.; Lu, C. Microfluidic MeDIP-seq for low-input methylomic analysis of mammary tumorigenesis in mice. Analyst 2019, 144, 1904–1915.
  35. Taiwo, O.; Wilson, G.A.; Morris, T.; Seisenberger, S.; Reik, W.; Pearce, D.; Beck, S.; Butcher, L.M. Methylome analysis using MeDIP-seq with low DNA concentrations. Nat. Protoc. 2012, 7, 617–636.
  36. Lussier, A.A.; Schuurmans, I.K.; Grossbach, A.; MacIsaac, J.; Dever, K.; Koen, N.; Zar, H.J.; Stein, D.J.; Kobor, M.S.; Dunn, E.C. Technical variability across the 450K, EPICv1, and EPICv2 DNA methylation arrays: Lessons learned for clinical and longitudinal studies. Clin. Epigenetics 2024, 16, 166.
  37. Xie, L.; Weichel, B.; Ohm, J.E.; Zhang, K. An integrative analysis of DNA methylation and RNA-Seq data for human heart, kidney and liver. BMC Syst. Biol. 2011, 5 (Suppl. S3), S4.
  38. Beck, S.; Rakyan, V.K. The methylome: Approaches for global DNA methylation profiling. Trends Genet. 2008, 24, 231–237.
  39. Gao, Y.; Zhao, H.; An, K.; Liu, Z.; Hai, L.; Li, R.; Zhou, Y.; Zhao, W.; Jia, Y.; Wu, N.; et al. Whole-genome bisulfite sequencing analysis of circulating tumour DNA for the detection and molecular classification of cancer. Clin. Transl. Med. 2022, 12, e1014.
  40. Qi, T.; Zhou, Y.; Sheng, Y.; Li, Z.; Yang, Y.; Liu, Q.; Ge, Q. Prediction of Transcription Factor Binding Sites on Cell-Free DNA Based on Deep Learning. J. Chem. Inf. Model. 2024, 64, 4002–4008.
  41. Carmona, F.J.; Davalos, V.; Vidal, E.; Gomez, A.; Heyn, H.; Hashimoto, Y.; Vizoso, M.; Martinez-Cardus, A.; Sayols, S.; Ferreira, H.J.; et al. A comprehensive DNA methylation profile of epithelial-to-mesenchymal transition. Cancer Res. 2014, 74, 5608–5619.
  42. Titus, A.J.; Way, G.P.; Johnson, K.C.; Christensen, B.C. Deconvolution of DNA methylation identifies differentially methylated gene regions on 1p36 across breast cancer subtypes. Sci. Rep. 2017, 7, 11594.
  43. Ball, M.P.; Li, J.B.; Gao, Y.; Lee, J.H.; LeProust, E.M.; Park, I.H.; Xie, B.; Daley, G.Q.; Church, G.M. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat. Biotechnol. 2009, 27, 361–368, Erratum in Nat. Biotechnol. 2009, 27, 485.
  44. Suzuki, M.M.; Bird, A. DNA methylation landscapes: Provocative insights from epigenomics. Nat. Rev. Genet. 2008, 9, 465–476.
  45. Pistore, C.; Giannoni, E.; Colangelo, T.; Rizzo, F.; Magnani, E.; Muccillo, L.; Giurato, G.; Mancini, M.; Rizzo, S.; Riccardi, M.; et al. DNA methylation variations are required for epithelial-to-mesenchymal transition induced by cancer-associated fibroblasts in prostate cancer cells. Oncogene 2017, 36, 5551–5566.
  46. Lombaerts, M.; van Wezel, T.; Philippo, K.; Dierssen, J.W.; Zimmerman, R.M.; Oosting, J.; van Eijk, R.; Eilers, P.H.; van de Water, B.; Cornelisse, C.J.; et al. E-cadherin transcriptional downregulation by promoter methylation but not mutation is related to epithelial-to-mesenchymal transition in breast cancer cell lines. Br. J. Cancer 2006, 94, 661–671.
  47. Caldeira, J.R.; Prando, E.C.; Quevedo, F.C.; Neto, F.A.; Rainho, C.A.; Rogatto, S.R. CDH1 promoter hypermethylation and E-cadherin protein expression in infiltrating breast cancer. BMC Cancer 2006, 6, 48.
  48. Ai, L.; Kim, W.J.; Alpay, M.; Tang, M.; Pardo, C.E.; Hatakeyama, S.; May, W.S.; Kladde, M.P.; Heldermon, C.D.; Siegel, E.M.; et al. TRIM29 suppresses TWIST1 and invasive breast cancer behavior. Cancer Res. 2014, 74, 4875–4887.
  49. Zougros, A.; Michelli, M.; Chatziandreou, I.; Nonni, A.; Gakiopoulou, H.; Michalopoulos, N.V.; Lazaris, A.C.; Saetta, A.A. mRNA coexpression patterns of Wnt pathway components and their clinicopathological associations in breast and colorectal cancer. Pathol. Res. Pract. 2021, 227, 153649.
  50. Vargas, A.C.; McCart Reed, A.E.; Waddell, N.; Lane, A.; Reid, L.E.; Smart, C.E.; Cocciardi, S.; da Silva, L.; Song, S.; Chenevix-Trench, G.; et al. Gene expression profiling of tumour epithelial and stromal compartments during breast cancer progression. Breast Cancer Res. Treat. 2012, 135, 153–165.
  51. Sheng, L.; Wei, R. Long Non-Coding RNA-CASC15 Promotes Cell Proliferation, Migration, and Invasion by Activating Wnt/β-Catenin Signaling Pathway in Melanoma. Pathobiology 2020, 87, 20–29.
  52. Li, Y.; Chen, G.; Yan, Y.; Fan, Q. CASC15 promotes epithelial to mesenchymal transition and facilitates malignancy of hepatocellular carcinoma cells by increasing TWIST1 gene expression via miR-33a-5p sponging. Eur. J. Pharmacol. 2019, 860, 172589.
  53. Tang, X.; Tang, J.; Liu, X.; Zeng, L.; Cheng, C.; Luo, Y.; Li, L.; Qin, S.L.; Sang, Y.; Deng, L.M.; et al. Downregulation of miR-129-2 by promoter hypermethylation regulates breast cancer cell proliferation and apoptosis. Oncol. Rep. 2016, 35, 2963–2969.
  54. Yu, Y.; Zhao, Y.; Sun, X.H.; Ge, J.; Zhang, B.; Wang, X.; Cao, X.C. Down-regulation of miR-129-5p via the Twist1-Snail feedback loop stimulates the epithelial-mesenchymal transition and is associated with poor prognosis in breast cancer. Oncotarget 2015, 6, 34423–34436.
  55. Wajed, S.A.; Laird, P.W.; DeMeester, T.R. DNA methylation: An alternative pathway to cancer. Ann. Surg. 2001, 234, 10–20.
  56. Samarelli, A.V.; Ziegler, T.; Meves, A.; Fassler, R.; Bottcher, R.T. Rabgap1 promotes recycling of active beta1 integrins to support effective cell migration. J. Cell Sci. 2020, 133, 243683.
  57. Wu, X.; Wu, T.; Li, K.; Li, Y.; Hu, T.T.; Wang, W.F.; Qiang, S.J.; Xue, S.B.; Liu, W.W. The Mechanism and Influence of AKAP12 in Different Cancers. Biomed. Environ. Sci. 2018, 31, 927–932.
  58. Zhang, K.; Sun, L.; Kang, Y. Regulation of phosphoglycerate kinase 1 and its critical role in cancer. Cell Commun. Signal. 2023, 21, 240.
  59. Xu, C.; Chen, H.; Wang, X.; Gao, J.; Che, Y.; Li, Y.; Ding, F.; Luo, A.; Zhang, S.; Liu, Z. S100A14, a member of the EF-hand calcium-binding proteins, is overexpressed in breast cancer and acts as a modulator of HER2 signaling. J. Biol. Chem. 2014, 289, 827–837.
  60. Li, X.; Wang, M.; Gong, T.; Lei, X.; Hu, T.; Tian, M.; Ding, F.; Ma, F.; Chen, H.; Liu, Z. A S100A14-CCL2/CXCL5 signaling axis drives breast cancer metastasis. Theranostics 2020, 10, 5687–5703.
  61. Li, X.M.; Liu, W.L.; Chen, X.; Wang, Y.W.; Shi, D.B.; Zhang, H.; Ma, R.R.; Liu, H.T.; Guo, X.Y.; Hou, F.; et al. Overexpression of TMPRSS4 promotes tumor proliferation and aggressiveness in breast cancer. Int. J. Mol. Med. 2017, 39, 927–935.
  62. Sade-Feldman, M.; Yizhak, K.; Bjorgaard, S.L.; Ray, J.P.; de Boer, C.G.; Jenkins, R.W.; Lieb, D.J.; Chen, J.H.; Frederick, D.T.; Barzily-Rokni, M.; et al. Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma. Cell 2019, 176, 404.
  63. Zhang, K.; Wang, H.; Bathke, A.C.; Harrar, S.W.; Piepho, H.P.; Deng, Y. Gene set analysis for longitudinal gene expression data. BMC Bioinform. 2011, 12, 273.
  64. Garrett, S.H.; Clarke, K.; Sens, D.A.; Deng, Y.; Somji, S.; Zhang, K.K. Short and long term gene expression variation and networking in human proximal tubule cells when exposed to cadmium. BMC Med. Genom. 2013, 6 (Suppl. S1), S2.
  65. Tan, T.Z.; Miow, Q.H.; Miki, Y.; Noda, T.; Mori, S.; Huang, R.Y.; Thiery, J.P. Epithelial-mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol. Med. 2014, 6, 1279–1293.
  66. Chakraborty, P.; George, J.T.; Tripathi, S.; Levine, H.; Jolly, M.K. Comparative Study of Transcriptomics-Based Scoring Metrics for the Epithelial-Hybrid-Mesenchymal Spectrum. Front. Bioeng. Biotechnol. 2020, 8, 220.
Figure 1. Characterization of DNA methylation patterns in breast cancer (BrC). (a) Manhattan plot showing the significance of differential methylation for CpG regions. The black dotted line indicates the threshold for significance (p-value < 0.05). (b) Volcano plot of the mean difference between normal and cancer samples versus the p-value. Dashed lines indicate the cut-offs for differentially methylated regions (DMRs): absolute mean difference ≥ 10% and p-value < 10⁻⁸; blue = hypomethylation, red = hypermethylation. (c) The relative distance of CpG regions to the transcription start site (TSS) versus the p-value. (d) Frequency of hypo- and hyper-DMRs; distribution of DMRs in the genome. (e) Methylation profile heatmap around the TSS of top significant genes. (f) Venn diagram showing the overlap between differentially expressed genes (DEGs) in BrC and differentially methylated genes (DMGs).
Figure 2. Principal component analysis reveals epithelial–mesenchymal transition (EMT). (a) Hierarchical clustering of DNA methylation profiles. (b) Heatmap of dominant genes in PC4. (c) Top 15 cancer-related pathways (q-value < 0.01), showing the percentage and p-value of each pathway.
Figure 3. EMT status of BrC. (a) Epithelial (left) and mesenchymal (right) area under the curve (AUC). (b) EMT score between cancer and normal groups (* p-value < 0.05). (c) Heatmap of EMT gene signatures.
Figure 4. Validation of EMT scoring method in The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) dataset. (a) EMT score between primary tumor and normal solid tissue (*** p-value < 0.001). (b) Receiver operating characteristic (ROC) curve for EMT score discriminating tumor versus normal samples. (c) Percentage of mesenchymal samples by tumor stage.
Figure 5. EMT gene expression and methylation correlations in TCGA-BRCA. (a) Expression heatmap of EMT gene signatures in TCGA-BRCA samples. (b) Representative scatter plots showing correlations between DNA methylation levels and gene expression levels for EMT genes.
Figure 6. Methylation status of CpG sites of genes associated with BrC and EMT. Hypermethylation was observed for the cancer group (red) in the promoter and gene body regions.
Figure 7. EMT scoring method. The blue curve indicates the empirical cumulative distribution function (ECDF) of epithelial gene signatures, while the red curve indicates the ECDF of mesenchymal signatures. The shaded area (EMT score) is derived by subtracting the AUC for epithelial genes from the AUC of mesenchymal genes.
Table 1. Sample information and the genome-mapping statistics of cfDNA bisulfite sequenced reads.
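| Sample | Cancer Status | Race | Age | No. of Total Reads | No. of Total Mapped Reads | % Mapping | Sequence Depth | Percent Coverage |
|---|---|---|---|---|---|---|---|---|
| 1 | Benign | White | 46 | 129,617,861 | 127,235,741 | 98.2% | 22.3× | 53.4% |
| 2 | Benign | White | 37 | 116,237,181 | 112,740,144 | 97.0% | 27.4× | 38.4% |
| 3 | Benign | White | 54 | 157,025,790 | 154,506,008 | 98.4% | 33.9× | 42.5% |
| 4 | Benign | White | 38 | 104,337,726 | 102,763,750 | 98.5% | 21.5× | 44.6% |
| 5 | Cancer | White | 51 | 136,874,162 | 135,539,287 | 99.0% | 23.6× | 53.8% |
| 6 | Cancer | White | 50 | 203,458,690 | 200,828,216 | 98.7% | 31.9× | 58.8% |
| 7 | Cancer | White | 40 | 107,615,356 | 106,564,363 | 99.0% | 24.7× | 40.3% |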
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jeon, M.S.; Ding, Z.; Pei, C.; Li, J.; Xie, L.; Sauter, E.; Zhang, K.K. Cell-Free DNA Bisulfite Sequencing Reveals Epithelial–Mesenchymal Transition Signatures for Breast Cancer. Int. J. Mol. Sci. 2025, 26, 8723. https://doi.org/10.3390/ijms26178723
