Article

Unveiling Epigenetic Regulatory Elements Associated with Breast Cancer Development

by Marta Jardanowska-Kotuniak 1,2,†, Michał Dramiński 1,†, Michal Wlasnowolski 3, Marcin Łapiński 1, Kaustav Sengupta 3, Abhishek Agarwal 4, Adam Filip 1, Nimisha Ghosh 5, Vera Pancaldi 6, Marcin Grynberg 2, Indrajit Saha 7, Dariusz Plewczynski 3,4,* and Michał J. Dąbrowski 1,*

1 Computational Biology Group, Institute of Computer Science of the Polish Academy of Sciences, 01-248 Warsaw, Poland
2 Institute of Biochemistry and Biophysics of the Polish Academy of Sciences, 02-106 Warsaw, Poland
3 Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
4 Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
5 Department of Computer Science and Engineering, Shiv Nadar University, Chennai 201314, India
6 Cancer Research Center Toulouse, National Centre for Scientific Research (CNRS), Inserm, Université de Toulouse, 31037 Toulouse, France
7 Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training and Research, Kolkata 700106, India
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Int. J. Mol. Sci. 2025, 26(14), 6558; https://doi.org/10.3390/ijms26146558
Submission received: 5 May 2025 / Revised: 27 June 2025 / Accepted: 30 June 2025 / Published: 8 July 2025
(This article belongs to the Section Molecular Oncology)

Abstract

Breast cancer affects over 2 million women annually and results in 650,000 deaths. This study aimed to identify epigenetic mechanisms impacting breast cancer-related gene expression, discover potential biomarkers, and present a novel approach integrating feature selection, Natural Language Processing, and 3D chromatin structure analysis. We used The Cancer Genome Atlas database with over 800 samples and multi-omics datasets (mRNA, miRNA, DNA methylation) to select 2701 features statistically significant in cancer versus control samples, from an initial 417,486, using the Monte Carlo Feature Selection and Interdependency Discovery algorithm. Classification of cancer vs. control samples on the selected features returned very high accuracy, depending on feature-type and classifier. The cancer samples generally showed lower expression of differentially expressed genes (DEGs) and increased β-values of differentially methylated sites (DMSs). We identified mRNAs whose expression is explained by miRNA expression and β-values of DMSs. We recognized DMSs affecting NRF1 and MXI1 transcription factors binding, causing a disturbance in NKAPL and PITX1 expression, respectively. Our 3D models showed more loosely packed chromatin in cancer. This study highlights numerous possible regulatory dependencies, and the presented bioinformatic approach provides a robust framework for data dimensionality reduction, enabling the identification of key features for further experimental validation.

1. Introduction

In 2021, the WHO announced that, for the first time in 20 years, the most commonly diagnosed cancer in the world was not lung cancer but breast cancer. According to the GLOBOCAN report, which published cancer statistics for 2020 based on data from 185 countries and 36 cancer types, 2.3 million people were affected by breast cancer and 684,996 people died from it [1]. This means that almost one in four female cancer patients is currently diagnosed with breast cancer. Early-stage cancer detection is crucial for applying the most effective therapy available [2]. Due to the heterogeneous nature of breast cancer and the large amount of information to be considered, choosing the appropriate treatment is extremely difficult [3].
The effectiveness of cancer therapies is closely related to diagnostic accuracy. Inclusion of molecular features in the classification of cancers [4,5] has allowed for the development of targeted treatment. Further molecular studies aimed at detecting cancer markers and potential drug targets are essential to further improve available therapies. Large molecular databases, while extremely rich in information, are also burdened with a certain level of information noise, which is a significant analytical challenge, especially when the number of samples available is limited and data dimension is high. In such a case, the risk of obtaining false positives and false negatives may be higher because, from the statistical perspective, the problem is ill-defined [6]. To overcome this challenge, in this research, we applied the Monte Carlo Feature Selection and Interdependency Discovery (MCFS-ID) algorithm [7] to reveal significant signals related to breast cancer in various molecular datasets [8]. The MCFS-ID has been successfully used in a broad range of scientific disciplines, including oncology, virology, and cardiology [9,10,11,12,13].
Breast cancer is known to be associated with multiple DNA mutations and genome rearrangements that affect cell physiology, resulting in gene expression changes [14,15]. Yet, it has been shown that DNA alterations alone cannot fully explain breast cancer development, and in recent years research into the epigenetic regulation of cancer-related gene expression has accelerated [16,17,18,19,20].
One of the most important and well-studied epigenetic modifications is DNA methylation. Methylation of gene promoters and regulatory regions plays an essential role in regulating gene expression and shows high variation across cell types. Its deregulation is associated with tumorigenesis [21,22,23] and was demonstrated to have a role in predicting patient survival [17]. The DNA binding affinity of multiple transcription factors (TFs) relies on DNA methylation patterns [24,25]. Interestingly, DNA hypo-methylation is present in the regulatory regions of oncogenes promoting tumorigenesis [26,27,28], while hyper-methylation is frequently connected with the silencing of tumor suppressor genes [29,30,31], but other research shows that these patterns may be more complex and depend on genomic location of the methylation alterations [32,33]. That is why a further large-scale analysis of locus-specific DNA methylation patterns in relation to TF affinity and the level of gene expression may bring novel knowledge about cancer biomarkers and deregulated biological pathways that promote tumorigenesis.
Similarly to DNA methylation, miRNA can also act as an epigenetic regulator [34,35] that post-transcriptionally modulates breast cancer-related mRNAs, affecting both the development of breast cancer and its drug resistance [36,37]. Another epigenetic mechanism known to contribute to cancerogenesis is the alteration of 3D chromatin structure, which can be affected by several molecular elements, including point mutations [38] or DNA methylation levels [39]. Such alterations were shown to disrupt gene expression in many cancers [40,41]. Therefore, defining the distances between regulatory elements and their target promoters in 3D chromatin structure may provide insights into the underlying mechanisms of genomic regulation also in breast cancer.
In the present study, we applied the MCFS-ID algorithm to extract the significant transcriptomic and DNA methylation features from The Cancer Genome Atlas (TCGA) dataset that could distinguish between healthy and cancerous tissues. Subsequently, we conducted analyses of mRNA expression, DNA methylation, detection of TF motifs, miRNA potential targeting by drugs, and modeling of 3D chromatin structure. This integrative approach helped reveal the biological importance of the selected features as well as the direct and indirect connections between them and their impact on the initiation and development of breast cancer.

2. Results

2.1. Detection of Potential Breast Cancer Biomarkers Using the MCFS-ID Algorithm

Our study aimed to verify whether there are significant molecular features, and interactions between them, that may be important for breast cancer prediction and could possibly be used as biomarkers. The Monte Carlo Feature Selection and Interdependency Discovery (MCFS-ID) algorithm was used to select the top significant features that distinguish cancerous from normal tissue samples in TCGA data. The final feature set was derived by merging MCFS-ID outcomes from two independent steps (see Section 4.2). First, MCFS-ID was performed on the joined set consisting of mRNA and miRNA expression and DNA methylation. As a result, the feature ranking was dominated by the methylation features, followed by mRNA expression (Table 1, Table S1). Only six miRNA expression features were found to be relevant in breast cancer prediction. In the next step, three more MCFS-ID experiments were conducted on datasets consisting of single-feature types to verify whether each of those was informative in distinguishing cancer from normal samples (Table S1). With this approach, it was possible to expand the number of significant features, especially for miRNA data, and confirm the statistical significance of each individual set of attributes in sample classification. Finally, out of 417,486 multi-omics input features, 2701 (2006 + 590 + 105) (Table 1) were selected by the algorithm as features potentially involved in cell physiological changes resulting in breast cancer development. The last two columns in Table 1 show the high weighted predictive accuracy (wAcc) of support vector machine (SVM) and random forest (RF) models for the classification of samples into cancer vs. normal, where the test samples were not used in the feature selection phase (see Section 4.1).
It is worth underlining that after the MCFS-ID run, each feature is evaluated by the RI (relative importance) value so that for each data type, instead of an unordered set of features, a ranking of the most informative features is produced (Table S1).
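The two quantities used in this section can be illustrated with a short sketch. It assumes that wAcc denotes the mean of the per-class accuracies (often called balanced accuracy), which is not defined in this section, and it uses made-up labels and RI values. The function names `weighted_accuracy` and `rank_by_ri` are hypothetical helpers, not part of the MCFS-ID software.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Mean of per-class accuracies; the assumed meaning of wAcc."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def rank_by_ri(ri_by_feature):
    """Order features by decreasing relative importance (RI)."""
    return sorted(ri_by_feature.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up labels: 8 tumor and 2 normal samples (imbalanced classes).
y_true = ["tumor"] * 8 + ["normal"] * 2
y_pred = ["tumor"] * 8 + ["tumor", "normal"]

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
wacc = weighted_accuracy(y_true, y_pred)

# Hypothetical RI values for three placeholder features.
ranking = rank_by_ri({"feat_a": 0.12, "feat_b": 0.40, "feat_c": 0.05})
```

In this toy case the plain accuracy is 0.9 while the weighted accuracy is 0.75, which shows why a class-weighted measure is the more cautious choice when tumor samples greatly outnumber normal ones.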

2.2. Descriptive Analysis of mRNAs Having a Significant Predictive Value

The machine learning feature selection process selects features based on their statistical significance. However, it does not consider their biological meaning, which must be examined afterwards. This section provides a detailed biological interpretation and literature-based validation of the identified significant mRNA features. First, the top 10 mRNA genes from the MCFS-ID ranking (ADAMTS5, COL10A1, TMEM220, ARHGAP20, MMP11, CAVIN2, PLPP3, MICU3, MME, CD300LG) were screened, and all of them were confirmed to be associated with cancer prediction and development. The top five were reported as effective cancerous tissue markers [42]. For each of the top 10 mRNA genes, published studies document its significance and association with cancerogenesis in general: ADAMTS5 [43], TMEM220 [44], ARHGAP20 [45], MICU3 [46]; or with breast cancer specifically: COL10A1 [47], MMP11 [48], CAVIN2 (formerly known as SDPR) [49], PLPP3 [50], MME [51], and CD300LG [52]. These findings support the usefulness of the implemented approach and the potential utility of the identified mRNA genes as biomarkers for breast cancer detection and progression monitoring, warranting further clinical validation.
Subsequent analysis included all significant mRNA genes to unveil their biological role in cancerogenesis and to discover new significant bio-functional relationships. Among the 590 mRNA genes returned by MCFS-ID, 576 revealed differential expression when filtering by the required log2FC and adjusted p-value (Figure 1A). Interestingly, these differentially expressed genes (DEGs) seem to be strong, independent predictors of breast cancer. At the level of feature selection performed with MCFS-ID, the returned decision trees (see Section 4.2) were very shallow, and classical statistical tests confirmed that these genes significantly differed in expression between cancer and normal samples. Moreover, the majority of DEGs demonstrated a lowered expression in cancer (n = 447), whereas only 129 showed increased expression. The down-expressed genes were enriched in 16 pathways from the Reactome database [53], which showed great functional heterogeneity (Figure 1B). Over-expressed genes were enriched in 91 pathways (Figure 1C and Table S2), and most of them were related to the cell cycle and mitosis. Down-expressed DEGs were enriched in pathways related to lipid regulation and transport, neurotransmission, and retinoic acid synthesis (Figure 1B). These findings were confirmed by a Natural Language Processing (NLP) approach. The pathways in which DEGs were enriched correspond very well to the unique keywords that describe clusters built on the gene function descriptions using NLP methods and hierarchical clustering (see Section 4.3). The two most numerous clusters (mostly down-expressed) were related to the following terms: ‘regulation and metabolic processes’ and ‘ion transmembrane transport’ (Table 2). Finally, we confirmed that the set of 590 mRNA genes was significantly enriched in genes related to immunological processes (n = 79; chi-squared test, p < 0.05). This confirms the well-known engagement of immune-related genes in cancerogenesis.
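The DEG filter mentioned above (a log2FC requirement combined with an adjusted p-value) can be sketched as follows. The thresholds used here (|log2FC| ≥ 1, adjusted p ≤ 0.05), the Benjamini-Hochberg correction, the gene names, and all numbers are assumptions for illustration only; the actual cut-offs and correction method are not stated in this section.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, mean_tumor, mean_normal, pvalues,
              min_abs_log2fc=1.0, max_adj_p=0.05):
    """Label genes as 'up' or 'down' in tumor; thresholds are assumed."""
    adj = benjamini_hochberg(pvalues)
    calls = {}
    for gene, tumor, normal, q in zip(genes, mean_tumor, mean_normal, adj):
        log2fc = math.log2(tumor / normal)
        if abs(log2fc) >= min_abs_log2fc and q <= max_adj_p:
            calls[gene] = "up" if log2fc > 0 else "down"
    return calls


# Made-up mean expression values and raw p-values for placeholder genes.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
mean_tumor = [2.0, 40.0, 10.0, 9.0]
mean_normal = [16.0, 10.0, 9.0, 3.0]
pvalues = [0.001, 0.004, 0.03, 0.2]

degs = call_degs(genes, mean_tumor, mean_normal, pvalues)
```

Here `gene_c` is dropped because its fold change is too small and `gene_d` because its adjusted p-value is too high, so only `gene_a` (down) and `gene_b` (up) remain.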

2.3. Genomic Context of DNA Methylations with Predictive Value

Feature selection revealed 2006 significant sites that differed in methylation levels between cancer and normal samples (Table 1), hereafter called differentially methylated sites (DMSs). Out of all DMSs, only one locus (cg02025583) was located within the promoter of one of the top 10 mRNA genes returned in the main MCFS-ID ranking. This promoter precedes the TMEM220 gene, and the cytosine cg02025583 is overlapped by a motif of the E2F2 transcription factor (TF), which is a good example of an altered epigenetic regulation of gene expression.
To assess if the distribution of significant DMSs was random, we compared it to the background distribution of all CpG probes on the Illumina 450K array. This revealed a statistically significant, non-random distribution of DMSs across genomic regions, with enrichment in CpG Islands (CpGI) and open seas and depletion in shores (chi-squared test, corrected p-value ≤ 0.05, Figure 2A). The indicated sites are candidates for modulating activity of regulatory regions; therefore, we focused on their methylation levels in normal vs. cancer samples to unveil their putative regulatory role in cancer development. We found that DMSs’ methylation β-values showed a significant shift towards higher values within CpGI and presented significantly different distribution of β-values within shores and open seas between cancer vs. normal samples (Wilcoxon test p-value ≤ 0.05, Figure 2B). There were over two times more hyper-methylated DMSs (n = 479) than hypo-methylated DMSs (n = 225) discovered in tumors (Figure 2C). Interestingly, the vast majority of hyper-methylated DMSs were located within CpGI, which are well-known gene transcription regulators (Figure 2D). Medium-methylated DMSs showed very high frequency not only in CpGI but also in shores, shelves, and open seas (Figure 2D).
To investigate a potential regulatory association of gene expression mediated by DNA methylation, the Spearman correlation was measured between mRNA levels of 590 significant genes and β-values of each DMS within 1 Mbp upstream and downstream from TSS of these genes. The correlation cut-off value was set to |rho| ≥ 0.6, and there were 59 pairs meeting this condition (Table S3). The majority of the obtained correlations were negative (n = 44), with only a few positive (n = 15). A more frequent negative correlation was expected if DNA methylation located in promoters or enhancers inhibited gene expression. Among these pairs, there were 34 unique genes and 55 unique DMS loci. Almost all genes were down-regulated in tumor samples, but the HN1L and KIFC1 were up-regulated. Out of 55 DMSs, five were located within one gene promoter. Interestingly, all of them were hyper-methylated and were close to each other, within one CpGI in a range of 25 bp and within the NKAPL gene promoter (Figure S1). Two of them (cg18694169 and cg10253847) were overlapped by a motif of the NRF1 TF. NRF1 normally activates gene expression, but here, due to hypermethylation, its binding to DNA can be inhibited by decreased affinity or blocked if MBD protein binds at this site, which may depend on the CpG density. Accordingly, NKAPL was down-regulated in cancer samples, which could be explained by hyper-methylation of the five DMSs [54]. The other 50 DMSs were not assigned to any promoter and also had a confirmed chromatin state, indicating possible activity (Table S3); therefore, they were defined as potential distal regulatory factors.
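The pairing step described above (a ±1 Mbp window around the TSS, followed by a |rho| ≥ 0.6 Spearman cut-off) can be sketched with the standard library alone. The data structures, identifiers, coordinates, and values below are hypothetical; only the window size and the cut-off come from the text.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(genes, sites, window=1_000_000, min_abs_rho=0.6):
    """genes: {name: (chrom, tss, expression)}; sites: {id: (chrom, pos, betas)}."""
    pairs = []
    for gene, (g_chrom, tss, expression) in genes.items():
        for site, (s_chrom, pos, betas) in sites.items():
            if s_chrom != g_chrom or abs(pos - tss) > window:
                continue  # outside the 1 Mbp window around the TSS
            rho = spearman(expression, betas)
            if abs(rho) >= min_abs_rho:
                pairs.append((gene, site, rho))
    return pairs


# Made-up example: one gene, three methylation sites, five samples.
genes = {"gene_x": ("chr1", 5_000_000, [5.0, 4.0, 3.0, 2.0, 1.0])}
sites = {
    "site_near": ("chr1", 5_400_000, [0.1, 0.2, 0.3, 0.4, 0.5]),
    "site_far": ("chr1", 7_000_000, [0.1, 0.2, 0.3, 0.4, 0.5]),
    "site_weak": ("chr1", 4_900_000, [0.3, 0.1, 0.5, 0.2, 0.4]),
}
pairs = correlated_pairs(genes, sites)
```

Only `site_near` passes both filters in this toy input: `site_far` lies 2 Mbp from the TSS, and `site_weak` has a rho of about -0.3.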
In order to have a better understanding of the potential role of the DMSs in gene expression regulation, all DMSs were intersected with the chromatin states of MCF-7 breast cancer cell line (Figure 3A); simultaneously, all sites from Illumina 450K were intersected with the chromatin states of MCF-7 as well (Figure 3B). The observed distribution of DMSs across chromatin states was similar to the distribution of all Illumina 450K sites across the chromatin states (chi-squared test p-value > 0.2). Comparison of the distribution of hypo- and hyper-methylated DMSs revealed a visible enrichment of hyper-methylated DMSs in transcriptionally inactive chromatin states, such as heterochromatin. At the same time, hyper-methylated DMSs were depleted within states associated with gene transcription activation, such as enhancer, promoter, and transcribed regions (Figure 3C). Conversely, for the hypo-methylated DMSs, the opposite pattern was observed (Figure 3C).
Next, the impact of DMSs on patient survival was evaluated. Out of 2006 DMSs, 691 (Table S4) were found to be significantly correlated with patients’ survival (p < 0.05). In 100 random picks of 2006 methylation probes, the number of loci associated with survival ranged between 375 and 450, which confirms that MCFS-ID returned an enriched list (Figure S2). However, after correction for multiple testing, none of the sites reached statistical significance. Therefore, the prediction of patients’ survival by the DMSs should be treated with caution.
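The random-pick comparison can be sketched as a simple resampling loop. The probe universe, the set of survival-associated probes, and the observed count below are toy values chosen for illustration; `random_pick_counts` is a hypothetical helper, and the fixed seed is only there to make the sketch repeatable.

```python
import random


def random_pick_counts(all_probes, associated, pick_size, n_picks=100, seed=0):
    """Count survival-associated probes in repeated random picks."""
    rng = random.Random(seed)
    probes = list(all_probes)
    counts = []
    for _ in range(n_picks):
        pick = rng.sample(probes, pick_size)  # sampling without replacement
        counts.append(sum(probe in associated for probe in pick))
    return counts


# Toy universe: 10,000 probes, 20% of them flagged as survival-associated.
all_probes = [f"probe_{i}" for i in range(10_000)]
associated = set(all_probes[:2_000])

null_counts = random_pick_counts(all_probes, associated, pick_size=500)
observed = 170  # made-up count for a hypothetical selected list of 500 probes

# The selected list is called enriched if it beats every random pick.
is_enriched = observed > max(null_counts)
null_range = (min(null_counts), max(null_counts))
```

With the real data, `pick_size` would be 2006 and the observed count 691, to be compared with the range of counts obtained from the random picks.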

2.4. Biological Role of Significant miRNA Genes

Firstly, for the top 10 miRNAs identified by MCFS-ID, it was confirmed that they are associated with breast cancer biological processes, namely miR-139 [55], miR-10b [56], miR-21 [57], miR-183 [58], miR-145 [59], miR-99a [60], miR-182 [61], miR-96 [62], miR-486 [63], and miR-141 [64].
Next, to characterize the regulatory functions of all 105 miRNAs identified in the MCFS-ID experiment, their associations with mRNA from the miR + Pathway database were verified, resulting in the detection of 822 unique mRNAs linked to significant miRNAs. Out of these mRNAs, 43 were shown to be significant in predicting breast cancer in the MCFS-ID analysis (Table 1). The intersection with the mRNA clusters returned by NLP analysis showed that out of 43 mRNAs, 32 belong to cluster 1, one mRNA to cluster 2, two to cluster 3, seven to cluster 5, and one to cluster 6 (Table 2). Most of the mRNAs assigned to cluster 1 had a decreased expression in the cancer samples, but five had an increased expression. The opposite situation could be observed in cluster 5, where six mRNAs were upregulated in the tumor and one was down-regulated.
For the 58 miRNA genes down-expressed in cancer, selected out of the 105 significant miRNAs (Figure S3B), target mRNA genes were assigned, and the putative associations between 46 miRNAs and 126 mRNAs were confirmed with Spearman correlation (rho ≤ −0.2, Table S5). KEGG pathway analysis of these 126 mRNAs returned insignificant results (adj. p-value > 0.05). The mRNAs contributing to significant correlations formed a protein–protein interaction network consisting of 2265 proteins. For the 50 proteins with the highest number of interactions in this network, the miRNAs targeting their mRNAs were assigned (Table S6), resulting in 22 unique miRNAs, all initially obtained as significant in the main MCFS-ID run. KEGG pathway analysis of the genes encoding those 50 proteins showed significant annotations to breast cancer (p-value = 0.000454) as well as to many other cancers, e.g., melanoma, renal cell carcinoma, acute myeloid leukemia, and colorectal cancer (Figure S4A). These proteins were also found to be associated with cancerogenesis-related terms, e.g., chemical carcinogenesis, viral carcinogenesis, or proteoglycans in cancer, as well as with pathways known to be crucial for cancer development, e.g., the p53 signaling pathway (Figure S4A). The terms returned from GO BP analysis were almost all related directly or indirectly to the cell cycle, a crucial process for cancer development and progression (Figure S4B). Moreover, out of the 50 proteins with the highest number of interactions in the miRNA-regulated protein–protein network, 16 are known drug targets in breast cancer treatment. The largest number of them was targeted by palbociclib, ribociclib [65], and abemaciclib [66], among 19 other drugs (Table S7).
It is worth mentioning that both scores shown in Table S7, i.e., the Drug Score (DScore), which measures the suitability of the drug according to the genomic profile, and the Gene Score (GScore), which reflects the biological relevance of genes in the tumoral process, had high values for the majority of the aforementioned drugs, indicating their significant effect (Table S7). The resulting miRNA–protein–drug network is visualized in Figure S3C. There are two mRNA genes specified in this network (CDC25A and BIRC5) whose expression significantly correlated with upregulated miRNAs, namely hsa-mir-100 and hsa-mir-218-2, and which are also known to be breast cancer drug targets (Figure S4A,B, Table S7). These two miRNAs were selected by the main MCFS-ID run.

2.5. Detection of miRNA and DNA Methylation Loci Significant in the Context of Predicting mRNA Expression Levels

The results of 590 MCFS-ID experiments run on the 590 significant mRNA features (Table 1), each separately used as the target variable, with miRNA expression or DNA methylation set as predictors, showed that miRNA features are better predictors than methylation. This section further explores the biological relevance of these selected miRNA and DNA methylation loci, especially those with high predictive values across multiple target mRNAs. Out of 590 mRNA expression features, only 73 could be correctly predicted by miRNA features, 66 by DNA methylation features, and 39 by both (where Pearson correlation level ≥ 0.8). For each target variable (out of 590), a different significant set of features was returned. These sets differed in size and contained different features in the top ranking; therefore, it was possible to analyze the distribution of the significant set size (based on the MCFS-ID cutoff) and of the Pearson correlation calculated between each mRNA expression and its prediction based on the significant feature set. To find out whether different mRNAs shared common predictive miRNAs or DNA methylation sites, the frequency of each significant feature across all significant feature sets was calculated as well (separately for miRNA and DNA methylation). Histograms show that the number of selected significant DNA methylation features in the rankings was much greater than for miRNA (Figure 4A,B), which may correspond to the size of the input datasets and to the fact that statistical modeling of mRNA expression is much more complex in the case of DNA methylation data. However, the quality of the prediction of mRNA expression based on miRNA features (Figure 4C,D) is comparable to that achieved with DNA methylation data.
To obtain a better overview of all the selected top features, all MCFS-ID top rankings (separated by data category) were combined, and, for each predictor feature, the sums of RI, mean RI, and frequency (how many times a single feature was found as significant) were calculated. The resulting two rankings—separate for miRNA expression and DNA methylation—are shown in Table 3 (the top 15 features) and the Supplementary Material (Tables S8 and S9). For 73 mRNA target features (that could be predicted with Pearson correlation level ≥ 0.8), 97 miRNA predictors were found by MCFS-ID as significant (Freq column), which means that the same miRNAs took part in a successful prediction of the 73 protein coding genes’ expression. The predictive impact of particular miRNAs depended on mRNA, which was observed by a different position in a single MCFS-ID ranking. Moreover, 81 out of 97 miRNA genes were also significant in cancer prediction and selected by the main MCFS-ID experiment (see Section 2.1 and Table 1). The columns ‘Sum RI’ and ‘Mean RI’ in Table 3 accumulate the RI (Relative Importance) values from all MCFS-ID experiments where correlation level was ≥0.8. The last column refers to the ranking of the main MCFS-ID experiment (on a given data type). Additionally, all the top-15 miRNA genes in the table are confirmed as cancer-specific in the literature according to www.mirbase.org (accessed on 1 March 2023).
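The aggregation behind the ‘Sum RI’, ‘Mean RI’, and ‘Freq’ columns can be sketched as below. The input layout (one feature-to-RI mapping per retained MCFS-ID experiment), the placeholder feature names, the RI values, and the choice to sort by the RI sum are all assumptions for illustration.

```python
from collections import defaultdict


def aggregate_rankings(rankings):
    """Combine per-experiment rankings into (feature, sum RI, mean RI, freq) rows.

    rankings: list of {feature: RI} dicts, one per experiment whose
    prediction passed the correlation threshold.
    """
    sums = defaultdict(float)
    freq = defaultdict(int)
    for ranking in rankings:
        for feature, ri in ranking.items():
            sums[feature] += ri
            freq[feature] += 1
    table = [(f, sums[f], sums[f] / freq[f], freq[f]) for f in sums]
    # Assumed ordering: highest accumulated RI first.
    table.sort(key=lambda row: row[1], reverse=True)
    return table


# Made-up significant feature sets from three hypothetical experiments.
rankings = [
    {"mir_a": 0.5, "mir_b": 0.25},
    {"mir_a": 0.25},
    {"mir_b": 0.125, "mir_c": 0.0625},
]
table = aggregate_rankings(rankings)
```

In this toy input, `mir_a` ranks first with a summed RI of 0.75 over two experiments, followed by `mir_b` and `mir_c`.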

2.6. Tracking Associations Between DMSs and Detected TF Motifs

Depending on the cytosine location in the genome or changes in its DNA methylation level, the cytosine loci overlapping transcription factor binding sites (TFBS) may significantly affect binding affinity of a TF to the DNA. To investigate this issue, for each DMS, a DNA sequence covering 41 bp was obtained (site +/− 20 bp), and the TF motif search was applied. First, DMSs were divided according to their location in genomic regions: promoters, gene bodies, and intergenic. There were 616 DMSs located within promoters, 1037 in gene bodies, and 353 in intergenic regions, and the motif search returned 48, 54, and 21 TF motifs in these three genomic regions, respectively. The numbers of returned TF motifs proportional to DMSs reflect no significant enrichment in the mentioned specific genomic regions (chi-squared test, p-value = 0.138). Moreover, it was confirmed that none of the protein families were overrepresented among the confirmed motifs within the three genomic regions (p-value after FDR correction > 0.05). There were 45 common motifs between promoter and gene body regions, of which 21 were also shared with the intergenic. Specifically, we detected three TFs (E2F1, ESR2, and NRF1) only in the promoters and nine (ELK3, HIF1A, ITF2, LYL1, NFIA, NR2C2, SNAI1, ZN341, and ZN589) only within the gene bodies (Table S10).
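The enrichment check on motif counts per genomic region can be sketched with a plain chi-squared test. The exact layout of the test is not given in this section; the sketch below assumes a 2 × 3 contingency table with the motif counts (48, 54, 21) in one row and the DMS counts (616, 1037, 353) in the other, a layout that gives a p-value close to the reported 0.138. The function names are hypothetical, and the closed-form p-value is valid only for two degrees of freedom.

```python
import math


def chi2_two_rows(row_a, row_b):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    total_a, total_b = sum(row_a), sum(row_b)
    grand_total = total_a + total_b
    stat = 0.0
    for observed_row, row_total in ((row_a, total_a), (row_b, total_b)):
        for observed, col_total in zip(observed_row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


def chi2_pvalue_dof2(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Counts taken from the text: promoters, gene bodies, intergenic regions.
motif_counts = [48, 54, 21]
dms_counts = [616, 1037, 353]

stat, dof = chi2_two_rows(motif_counts, dms_counts)
p_value = chi2_pvalue_dof2(stat)
```

For tables with more columns, a general chi-squared survival function (for example from a statistics library) would be needed in place of `chi2_pvalue_dof2`.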
When verifying TF motifs overlapping hyper-methylated (n = 479) cytosines, we confirmed 48 motifs; for hypo-methylated loci (n = 225), there were 5 motifs detected. Only one motif, EPAS1.0.B, was specific to the hypo-methylated set of DMS; the remaining four were detected for both cytosine methylation levels (Figure 5A,B).
To verify the similarities of the PWMs of the detected TF motifs (Figure 5A,B), hierarchical clustering was performed; it showed that the majority of motifs overlapping hyper-methylated cytosines (Figure 5A) constituted two homogenous clusters. The third cluster contained motifs overlapping both hypo- and hyper-methylated cytosines; therefore, it seems that TF motifs have some common characteristics independent of the cytosine methylation level. Next, using the KEGG database [67] (Figure 5D), it was verified that the majority of biological pathways related to genes encoding TF motifs overlapping hyper-methylated DMSs were connected either directly with cancer (e.g., misregulation in cancer, breast cancer, lung cancer, etc.) or with pathways that are well known to be involved in tumorigenesis (e.g., TGF-β signaling pathway [68], human cytomegalovirus infection [69]). Of note is the group of genes SP1, MYC, HEY2, E2F1, E2F2, and HES1, which are known to be associated with the KEGG breast cancer functional pathway (Figure 5D).

2.7. Models of Regulatory Networks

To discover interdependencies between DMSs and genes encoding TF motifs overlapping these DMSs in the context of breast cancer prediction, another MCFS-ID experiment was conducted (Table S12). Only the genes encoding TFs that were overlapping significant DMSs, as well as the levels of DNA methylations of these DMSs, were considered. The MCFS-ID algorithm returned 281 significant features (Table S13), out of which the vast majority were DMSs with only seven genes encoding TFs, namely MXI1, EPAS1, PLAGL1, E2F1, NR0B1, BHLHE41, and ARNT2. These 281 features were annotated to their closest genes resulting in identification of 279 mRNA genes named hereafter target genes. Nine of them (PITX1, TGFBR3, PAFAH1B3, TAL1, TDRD10, SHE, LEP, TMEM220, and NKAPL) were previously reported in the main MCFS-ID ranking (Table S1). Only one target gene, PITX1, contained a motif of one of the seven TFs encoded by the significant genes identified in the second MCFS-ID experiment, namely MXI1, which was overlapping hyper-methylated DMS cg00396667 cytosine; the remaining eight target genes were linked only to the DMS but not to TFs. For breast cancer, it was confirmed that the down-regulation of PITX1 improves prognosis, and this gene is associated with DNA methylation levels [51,70].
To review possible regulatory dependencies, the mRNA expression values of these nine target genes were used as response variables in a set of linear models. As a result, four linear models, for the target genes TMEM220, NKAPL, TGFBR3, and SHE, reached statistical significance (p ≤ 0.05, R² > 0.5), and these genes were located at the 3rd, 44th, 245th, and 311th positions in the main MCFS-ID ranking, respectively (Table S1). All four linear models were highly impacted by the tissue type value; however, after removing this feature from the set of explanatory variables, the predictive quality of the models remained relatively high (the Pearson correlation calculated between target gene expression and predicted value decreased from 0.7–0.8 to 0.6–0.7 after removing tissue type; see Table S14). This observation suggests that the relation between target gene expression and linked TF features is noticeably strong and specific in the context of tissue type. The four target genes were down-regulated in the tumor samples, suggesting that they were tumor suppressors regulated by hyper-methylated DMSs that reduced TFs’ binding affinity. Moreover, these DMSs were in heterochromatin regions, which was in line with gene silencing. To illustrate the hypothesis that DMSs located within a TF motif would disrupt TF binding, we visualized the features of the linear models so that the symbols of genes encoding TFs were first connected with DMSs and then with their target gene (Figure 6A). Additionally, based on the Spearman correlation results between the up-regulated miRNAs and their top target mRNAs (Table S5), one additional association of hsa-miR-211 with SHE was detected and added to the visualization (Figure 6A).
Additionally, to obtain better insight into tumor-suppressive paths involving target genes, the gene–gene interactions were established using Pathway Commons and aggregated into one network (Figure 6B). At first, the direct interactions with a group of new genes were established, and then the most frequent of these new genes were used as an input to Pathway Commons to obtain levels of indirect interactions. This approach helped build a denser network and select the genes with the largest number of putative interactions (MYC, NOG, TGFB1, VIRMA, SRC, and AR), pointing at their significance in this network. Genes NOG, TGFB1, VIRMA, and SRC were over-expressed in cancer. Unexpectedly, MYC was down-expressed, and AR showed no significant difference in the Wilcoxon statistical test (Figure S5A). Genes MYC and AR are known to be over-expressed in specific breast cancer subtypes [71,72,73], but this pattern was not confirmed for the entire, much more heterogeneous, cohort of TCGA breast cancer samples.
Biological pathway analysis of the aforementioned six genes and target genes used in the linear models (TMEM220, NKAPL, TGFBR3, and SHE) showed enrichment of biological pathways related to various cancer types (e.g., bladder cancer, chronic myeloid leukemia, proteoglycans in cancer) and to signaling pathways whose alterations are often associated with carcinogenesis, e.g., TGF-β-signaling [68], erbB-signaling [74], and relaxin-signaling pathways [75] (Figure 6C).
Finally, the Interdependency Discovery function of the MCFS-ID algorithm was used to find statistically significant, nonlinear interactions between features that amplified each other in the classification task. Figure S6 provides four ID-graphs created for the top 50 features and the top 50 strongest links between the features. These figures were created as an additional result of the four main MCFS-ID experiments (described in Section 4.2). Figure 6D shows a part of the ID-graph created for the NKAPL gene and its neighbors, connected by edges that represent the strength of interaction between them. Recently, this gene has been shown to be a significant driver of cancer development [76], a prognostic marker [77], and an important factor associated with resistance to pharmacotherapy [78]. The visible directions of interactions show that the NKAPL gene expression plays a major role in the classification of normal and tumor tissues, and the remaining DNA methylation and miRNA features boost its predictive power. Notably, hsa_mir-7_3 is known as a significant factor contributing to cancerogenesis [79]. Figure S5B provides additional ID-graphs created for genes that were used as target genes in the linear models, namely the TMEM220, TGFBR3, and SHE genes.

2.8. Epigenomic Regulatory Spatial Model

The expression of a gene is directly associated with the distances between its body and regulatory regions, and these distances differ in three-dimensional space as compared to the linear space [80]. Therefore, measurable and quantitative variations in spatial distance are subsequently responsible for changes in gene expression. Epigenetic processes, such as DNA methylation, impact the transcription machinery to influence gene expression [81]. Therefore, the spatial proximity of genes to some of the features identified as significant in the main MCFS-ID analysis, as discerning between breast cancer and normal samples, was verified at the level of 3D chromatin structure.
In the mass MCFS-ID analysis, several genes whose expression was well-predicted by DNA methylation levels were found. One of them was FXYD1, which was selected as an example to visualize the putative functional association between regulatory features and mRNA expression in 3D (Figure 7A(i)). To achieve this, an ensemble of 100 models was built using the 3D-GNOME approach [82,83]. From this ensemble, the most representative spatial conformation was selected for visualization purposes in UCSF Chimera [84]. The gene promoter region and the cg23866403 DNA methylation locus were observed to be in close proximity to the enhancer of the FXYD1 gene in the normal sample, as compared to the longer distance between the regulatory elements in breast cancer, where the FXYD1 gene is down-regulated and the cg23866403 locus is hyper-methylated, as shown in the box plots (Figure 7A(ii)). The spatial distance distributions between the enhancer and the promoter and between the methylation site and the promoter were also calculated to see how these distances vary within the ensemble of 100 models (Figure 7A(iii)).
To thoroughly investigate the contrast between the 3D structure of cancer and normal cells, a model was constructed around the NKAPL gene, which contributes considerably to the classification of normal and tumor tissues [54,77] (also shown in Section 2.7). To do this, two specific datasets (ChIA-PET and PCHi-C) were considered. The ChIA-PET dataset identifies 3D contacts (in this particular experiment, mediated by the RAD21 protein), which provides a chromosome-wide 3D view of the target gene and its spatial connectivity. The PCHi-C experiments provide a promoter-centric view of a target gene, with interactions between each gene promoter region and other distal DNA segments, including regulatory elements.
Cohesin-mediated chromatin loops were explored by applying chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) data downloaded from ENCODE [85] for two cell lines, MCF-7 (cancer) and hTERT-HME1 (normal), aligned to the hg38 reference genome. It was confirmed that the identified cohesin-mediated loops (significant interactions) connecting two distant genomic fragments (anchors) surrounding the NKAPL gene were stronger and more numerous in hTERT-HME1 (normal) than in MCF-7 (cancer). Moreover, visual examination of the loops in the genome browser illustrates that loops anchored by distal enhancers were confirmed only for hTERT-HME1 (Figure 7B(i)). These enhancer-specific interactions in the hTERT-HME1 cell line work together with the promoter to control the expression of the NKAPL gene, and they are absent in the MCF-7 cell line (Figure 7B(i)). Additionally, 3D models of the NKAPL gene region were constructed from the ChIA-PET data for both cell lines using the 3D-GNOME algorithm to examine and annotate the resulting structures (Figure 7B(ii)). These models suggest that chromatin in this region is more condensed in the normal cell line than in the cancer cell line, in which it is loosely packed in three-dimensional space (Figure 7B(ii)).
Next, the spatial distances in Euclidean space between (1) the NKAPL gene body and DNA methylation sites or (2) the NKAPL gene body and the enhancer were calculated for two cell lines, MCF-7 and hTERT-HME1 (Figure 7B(iii)). These distances were obtained by mapping the beads from the 3D model representing the gene body (NKAPL), the enhancer (NKAPL), and DMSs (Table S15, Figure S7), and by calculating them for respective pairs of beads. The hypothesis that the chromatin around this gene is more loosely packed in the cancer cell line than in the non-cancer one was confirmed by the box plots of the distance distributions (Figure 7B(iii) and Figure S7) for MCF-7 and hTERT-HME1 Cohesin ChIA-PET interaction sets.
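The distance calculation described above (mapping genomic elements to beads and measuring the Euclidean distance between the respective pairs of beads in each model) can be sketched as follows. This is an illustrative sketch only: the ensembles are toy random walks rather than 3D-GNOME output, and the bead indices `GENE_BEAD` and `ENHANCER_BEAD` are hypothetical placeholders for the bead mapping used in the study.

```python
import math
import random
from statistics import median

def distance_distribution(ensemble, bead_a, bead_b):
    """Euclidean distance between two beads in every model of an ensemble."""
    return [math.dist(model[bead_a], model[bead_b]) for model in ensemble]

def random_chain(n_beads, step, rng):
    """Toy beads-on-a-string model: a 3D random walk with a given step scale."""
    chain = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        x, y, z = chain[-1]
        chain.append((x + rng.gauss(0, step),
                      y + rng.gauss(0, step),
                      z + rng.gauss(0, step)))
    return chain

rng = random.Random(1)
GENE_BEAD, ENHANCER_BEAD = 10, 25  # hypothetical bead indices

# Two toy ensembles of 100 models each; the larger step mimics looser packing
compact = [random_chain(40, 1.0, rng) for _ in range(100)]
loose = [random_chain(40, 2.0, rng) for _ in range(100)]

d_compact = distance_distribution(compact, GENE_BEAD, ENHANCER_BEAD)
d_loose = distance_distribution(loose, GENE_BEAD, ENHANCER_BEAD)
print("median distance (compact):", median(d_compact))
print("median distance (loose):", median(d_loose))
```

The two resulting lists are what a box plot of distance distributions would summarize, one box per cell line.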
To further investigate the differences between the 3D structure of this locus in cancer and normal cells, we examined promoter interactions with distal regulatory elements using PCHi-C datasets obtained from Javierre et al. [86] and Beesley et al. [87,88]. In PCHi-C, 23 interactions around the NKAPL gene in MCF-7 and 91 interactions in MCF-10A (normal tissue) were detected (Figure 7C(i)). In this case, 3D models were constructed using the Spring Model, which uses a beads-on-string representation and is built on the OpenMM molecular dynamics simulation engine [89] (Figure 7C(ii)). Additionally, both 3D models show the loci with differential DNA methylation around the NKAPL gene, reflecting higher DNA methylation in cancer (Table S15, Figure S7). Again, the distributions of the Euclidean distance (calculated as previously for the ChIA-PET data) confirmed the hypothesis that chromatin in this locus is more loosely packed in cancer than in normal cell lines (Figure 7C(iii)). The high concordance between the resulting box plot for PCHi-C and that of the ChIA-PET dataset confirms the reliability of the previous result.

3. Discussion

In this study, the MCFS-ID algorithm was applied to return a ranking of statistically significant molecular features distinguishing cancerous and normal tissue samples deposited in The Cancer Genome Atlas (TCGA) [https://www.cancer.gov/tcga] (accessed on 1 January 2020). Using this algorithm, we could select a small number of multi-omics features significantly different between cancer and normal samples, reducing the dimensionality of these datasets from 417k initial features to only 2.7k of the ranked features. Our further effort focused on verifying whether these significant features also had a substantive meaning in the context of cancerogenesis. It was shown that almost all (n = 590) mRNA significant genes returned by MCFS-ID reflected differential expression (Figure 1A) between cancer and normal samples. Nevertheless, MCFS-ID also explores the interactions; therefore, the statistical significance of individual features for the entire group of samples is not obvious. Meanwhile, a large part of the DMSs did not show differential methylation levels (Figure 2C). This suggests that mRNA-significant features may be standalone breast cancer predictors, while DNA methylation loci features must be considered in interaction with others to obtain highly predictive features (Table 1). This finding seems logical in the context of the regulatory functions of DNA methylation. The top n = 10 of 590 significant mRNA genes from MCFS-ID ranking were verified and confirmed (based on the literature) to have a meaningful impact on cancerogenesis, which implies that the choice of the implemented feature selection approach was meaningful. In the METABRIC cohort [90], the authors reported significant enrichment of immunologically related genes, which was also confirmed in our study, showing 79 of 590 mRNA genes being related to immunological processes. 
Over 75% of 590 mRNAs were down-regulated in cancer samples, which suggests that DNA hypomethylation in cancer leads to de-repression of gene expression, although such a high participation of down-expressed genes in cancer is quite surprising compared to the previously described patterns [91]. The majority of mRNAs that exhibited up-regulation in cancer samples contributed to cell cycle progression and mitosis. This finding is consistent with the characteristics of cancer cells, namely accelerated cell division and growth. However, such a generalization may lead to erroneous conclusions. We found that the group of up-regulated genes, namely CCNB2, CCNB1, PLK1, and CDK1, was associated with the Reactome pathway “Activation of NIMA Kinases NEK9, NEK6, NEK7” (Table S2). Up-regulation of PLK1 in some of the breast cancer subtypes may inhibit tumor development by interfering with cytokinesis and mitosis [92,93] as well as one of the NIMA kinases NEK9, which is associated with tumor growth prevention, when upregulated [94]. Additionally, a group of genes from cluster 5, identified with the Natural Language Processing (NLP) clustering analysis (Table 2), showed a link with tumor suppressors, e.g., the APC gene that, through association with other proteins, prevents the uncontrolled growth of cells [95]. Concluding, depending on the biological context, up-regulated genes may result in activation as well as repression of processes contributing to cancer development.
As mentioned before, the majority of significant mRNAs (n = 590) were down-regulated in cancer cells. Pathway analysis showed that they can perturb the regulation of a wide range of Reactome metabolic pathways, including ethanol oxidation [96] and lipid metabolism [97,98], which are linked to breast cancer development, and also retinoic acid or neurotransmission, which are specifically connected with breast cancer treatment methods [99,100,101,102]. In addition to the results from the Reactome database, the NLP clustering approach unveiled a very interesting group of genes down-regulated in cancer (cluster 3, Table 2) associated with G protein-coupled receptors (GPCRs), which are cell surface receptors that detect ligands outside of the cell and initiate cellular response [103]. Moreover, in pathological states, GPCRs are over-expressed and activated in an aberrant way. This may contribute to certain aspects of cancer, including growth, invasion, migration, angiogenesis, and metastasis [104,105]. In breast cancer, multiple specific GPCRs were confirmed to participate in a plethora of autocrine and paracrine physiological effects or to modulate cellular functions through activation of various ligands, which was associated with mRNA gene over-expression (reviewed in [106,107]). Interestingly, the genes from cluster 3 related to GPCRs were down-regulated, which is in contrast to previous studies; this inconsistency should be tested in vivo. Another group of genes (cluster 2, Table 2) was described by keywords related to the ATP-binding cassette transporters (ABC transporters), some of which are involved in ion transport crucial for muscle contraction and cardiac processes. The majority of genes in this group were down-regulated in our cancer sample data. The literature provides evidence linking ABC transporters to single terms returned also by our NLP analysis such as ‘ions’, ‘transport’, ‘muscle’, ‘contraction’, and ‘cardiac’ with cancer biology [108].
For the ABC transporter family, both decreased and increased expression levels may be unfavorable for cancer patients, and some genes within this family are already molecular targets for anticancer drugs [109,110]. In this study, their levels were decreased in cancer samples. This reduction has been reported to contribute to more aggressive cancer forms, for example, as shown in TMPRSS2-ERG-negative prostate cancer [111]. Similarly, reduced ABCA9 expression in epithelial ovarian cancer has been linked to significantly shorter time to progression [112]. Interestingly, in patients with breast cancer, reduced ABCA8 expression lowers the 5-year patient survival rate and is present in older patients (>60 years of age) as well as in three breast cancer subtypes: ER-negative, PR-negative, and HER-positive [113]. Among the keywords describing the remaining clusters of genes (Table 2), both general and very specific phrases were present. It seems that NLP-based clustering analysis allows for efficient linking of even small gene groups with their related processes, which we consider a major advantage. The obtained clustering results allow for a precise separation of genes having a general role from ones having specific functions. Further development of this NLP-based gene ontology analysis seems promising, especially as NLP is already widely and successfully used in other fields, e.g., to support disease diagnosis [114].
In breast cancer samples, DNA hyper-methylation in the regulatory regions of tumor suppressor genes and hypo-methylation of oncogenes have been shown [115]. In this study, MCFS-ID returned a ranking of 2006 DMSs that were significant predictors of breast cancer. These DMSs were located noticeably more often in CpG Islands (CpGI) and open seas, and less often in shores, than would be expected from the cytosine distribution in the entire Illumina panel. This result suggests that the genomic location of a cytosine influences its likelihood of being identified as a significant site, differentially methylated in breast cancer within our analytical framework. Overall, a small fraction of the returned significant DMSs was hypo-methylated in cancer samples (Figure 2C), while the majority was hyper-methylated. DMSs located in CpGIs were the most differentiated between normal and tumor samples in terms of methylation level (Figure 2B). The observed enrichment of DMSs within CpGIs and their inverse correlation with mRNA expression, exemplified by the hypermethylation in the NKAPL promoter leading to its downregulation, highlight a precise regulatory mechanism. The above corresponds very well to the pattern of generally dominant DNA hyper-methylation in breast cancer and only local disorders of DNA hypo-methylation (reviewed in [116]). Furthermore, the distinct distribution of hyper- and hypo-methylated DMSs across various chromatin states (with hypermethylation prevailing in transcriptionally inactive regions and hypomethylation in active ones) provides a compelling epigenetic signature. This suggests that the interplay between specific methylation patterns and chromatin accessibility significantly contributes to the altered gene expression landscape characteristic of breast cancer, underscoring the potential of these specific methylation events as key drivers or indicators of tumorigenesis.
Moreover, we discovered putative functional associations between 59 DMS–mRNA pairs, out of which over 90% of these DMSs were located in distal genomic regulatory regions. DNA methylation changes in these DMSs may potentially affect the activity of enhancer-like regions, influencing the target-gene expression. Out of 34 genes under such enhancer-like regulatory effect, only two (KIFC1 and HN1L) were over-expressed in cancer, both with significant negative correlation with DNA methylation. Up-regulation of KIFC1 expression is well known for breast cancer [117], and the protein has been suggested as a chemotherapy target [118], while over-expression of HN1L is related to tumor invasion in breast cancer [119]. Next, based on the discovered association among the DNA methylation sites within the promoter of NKAPL gene and TFBS of NRF1 TF shown in this study, we are confident about the presence of putative functional dependency of these molecular elements. The proposed regulatory model suggests that hypermethylation at five cytosine loci within the NKAPL promoter may obstruct NRF1 binding, either by diminishing its affinity [120] or through MBD protein binding, resulting in its down-expression in tumor samples. MBD proteins can bind to specifically methylated cytosines, preventing other factors from binding and simultaneously may co-form complexes (for example, MBD2 and MBD3 in the NuRD complex), which may lead to gene silencing [121]. Moreover, based on the results from the extensive use of MCFS-ID (Table S9), it was possible to select 23 cytosines significantly associated with NKAPL expression (r = 0.81). Among all of them, only one cytosine (cg18675097) was located within the NKAPL promoter, and all the others were located (up/down-stream) at least 1,369,587 bp from NKAPL TSS, suggesting their presence in distal regulatory regions of NKAPL. 
By using this approach, we additionally selected 66 mRNA genes whose expressions were well-predicted by DNA methylation, and 39 of them were well-predicted by miRNA expressions as well (see Section 2.5).
To review the significant set of n = 105 miRNAs returned by MCFS-ID, an analysis of associations between miRNAs and mRNAs was conducted, resulting in a list of proteins targeted by drugs used for breast cancer treatment. For example, palbociclib (DB09073), a drug for treating metastatic breast cancer that targets proteins encoded by genes such as CCNA2, CCND1, CDC25A, CDK1, CHEK1, ESR1, KRAS, and PLK1, has a DScore of 0.99, while the GScore of the genes is 0.85. At the same time, other drugs with high DScores used in breast cancer treatment, e.g., ribociclib [65], abemaciclib [66], and tamoxifen [122], were found to be linked to miRNAs reported as significant in this study (Figure S3C).
Methylated/unmethylated nucleotides within the TFBS may disturb TF binding to DNA sequence. This may change TF binding affinity, and shift the factor binding site, resulting in alternative protein complex formation, binding prevention or other TFs binding to such locus [123,124]. At the same time, it is not clear what the order of these events is, what initiates the process, and what the results are [24]. In this study, we identified many more TF motifs overlapping hyper-methylated DMSs than hypo-methylated, which reflects a much higher frequency of hyper-methylated cytosines among the 2006 DMSs. Moreover, almost all motifs, except EPAS1, containing hypo-methylated cytosines were identical to the TF motifs containing hyper-methylated cytosines. There are reports indicating that EPAS1 may support proliferation and migration and increase tumor cell invasiveness [125,126]. Genes encoding TFs that bind to motifs identified in sequences containing hyper-methylated cytosines (Figure 5D) belonged to, among others, cell cycle-signaling pathways, transcription misregulation in cancer, the TGF-β pathway, cellular senescence, and several cancer types, including breast cancer q-value = 0.0000234 (Figure 5D and Table S10).
In the second MCFS-ID experiment, the association between DMSs and the expression of genes encoding TFs, whose motifs overlapped these DMSs, was exposed. There were seven genes encoding TFs: MXI1, EPAS1, PLAGL1, E2F1, NR0B1, BHLHE41, and ARNT2. PLAGL1 has been reported as a possible epigenetically regulated tumor suppressor gene [127], and NR0B1 (also known as DAX-1) has been repeatedly indicated as a potential target for anticancer therapy in patients with breast cancer [128,129]. Likewise, there is evidence that low expression of BHLHE41, which was also observed in the results of this study, promotes breast cancer tumor invasion [130]. The other four (EPAS1, MXI1, ARNT2, and E2F1) relate to the signaling pathways involved in the processes of tumor formation and development [131].
Using the epigenetic variables that functionally interact with each other, here DMSs and TFs, together with the MCFS-ID and linear regression models, allowed for the identification of mRNA target genes under probable epigenetic regulation (see Figure 6A, Table S12). Among nine target genes, four were confirmed to have linear models with high goodness of fit: TMEM220, NKAPL, SHE, and TGFBR3. These genes were down-regulated in the tumor samples and seemed to be tumor suppressors whose activity could be regulated by DNA methylation located within TFBS of specific TFs.
Hyper-methylated DMSs may reduce these four TFs’ binding affinity and change gene expression. Moreover, these DMSs are located in heterochromatin regions that are known to contribute to gene silencing. NKAPL (NKAP-like) is a cell-specific transcriptional suppressor in Notch signaling [132], and its reduced expression in cancer has been indicated by several articles, as has the relationship between DMS hyper-methylation and the demonstrated change in NKAPL expression [77,133]. Transforming growth factor beta receptor III, encoded by TGFBR3, binds inhibin and can mediate functional antagonism of activin signaling [134]. Decreased expression of TGFBR3 (formerly ETDL1) causes decreased TβRIII expression in tumor tissues, resulting in tumor progression due to increased invasiveness, angiogenesis, and a chance of metastasis [135,136]. Next, the SH2 domain-containing adapter protein E (SHE) possesses the Src homology 2 (SH2) domain identified in the oncoproteins Src and Fps. It functions as a regulatory module of intracellular signaling cascades by interacting with phosphotyrosine-containing target peptides [137]. The transmembrane protein 220 (TMEM220) is involved in the FOXO and PI3K-Akt pathways [138] and promotes regeneration [139]. The down-expression of the TMEM220 and SHE genes (also in connection with hyper-methylation) has been repeatedly indicated as a significant factor in the formation and development of cancer but not necessarily breast cancer (Refs. [44,138,140] and Supplement Table S1 in [141]). Our results are, therefore, the first to indicate the impact of these two genes in breast cancer development.
To extend the analysis of functional importance of the four genes, the gene–gene interactions were used to build the network of 17 new genes connected to the initial four (Figure 6B). The functional investigation of genes with the highest number of connections (MYC, NOG, TGFB1, VIRMA, SRC, and AR) in the network showed an overrepresentation of signaling pathways related to cancer processes, associated mainly with cell cycle disorders, namely proliferation, growth, differentiation, migration or apoptosis, as well as patients survival. The returned KEGG pathways were consistent with the previously discussed gene’s biological functions. For example, the hyperactivation of the Mitogen Activated Protein Kinase (MAPK) pathway is frequently observed in many cancers, including breast cancer. It is an oncogenic pathway, and at the same time, it is crucial for the signal transduction of the ErbB protein family [142]. Some proteins from the ErbB family are oncogenes associated with proliferation and apoptosis. They are also related to cancer treatment resistance in some breast cancer subtypes [143]. At the same time, the MAPK is associated with PI3K-AKT-mTOR, i.e., the pathway that is directly related to TGFB1 and MYC and indirectly with TMEM220 and AR. PI3K-AKT-mTOR is associated with the processes of oncogenesis and breast cancer development, and many inhibitors of this pathway are currently in clinical trials [144,145]. Another overrepresented pathway was Hippo, which is linked with proliferation, migration and apoptosis [146], and metastasis changes [147]. The Hippo pathway is well-known for having an impact on the transforming growth factor beta (TGF-β)-signaling pathway through which they may control tumor development [148,149]. 
Additionally, components of the TGF-β pathway play a significant role in the proliferation, cell growth, and differentiation of cells, but also affect the immune system, enabling the repair or development of ongoing processes that were shown to negatively affect the patient’s condition [148,150,151]. Members of the TGF-β protein family play an essential role in apoptosis and migration, the regulation of which can have a vast impact on breast tumor development, especially at its later stage [148,151,152]. Additionally, one of the miRNA genes (hsa-miR-211), which was pointed out as significant in the presented regulatory network (Figure 6A, Table S5), is known to participate in the TGF-β pathway [153]. The hsa-miR-211 is involved in the regulation of proliferation, migration, invasion, apoptosis, and drug resistance [154], and we discovered its association with the SHE gene. Alterations in the expression level of hsa-miR-211 have been repeatedly reported in the context of various cancer types, but the direction of its expression level changes depending on the type of cancer. SHE was confirmed as an oncogene and/or tumor suppressor, depending on cancer type [155]. In breast cancer, change in SHE expression, no matter if decreased or increased, results in metastasis and poor prognosis [154]. We are convinced that the analysis of hsa-miR-211 with the SHE, whose role is little known, should become the subject of detailed studies. Additionally, one of the terms related to the obtained gene–gene network was “chemical carcinogenesis”, which is connected to many environmental and chemical factors, having a strong impact on the oncogenic processes including DNA methylation. Therefore, there are multiple indirect confirmations that the created network demonstrates an interplay among the detected epigenetic disorders, which, in turn, leads to subsequent changes affecting target gene expression and disease development.
From the massive MCFS-ID computational approach, we discovered an association between the cg23866403 locus and FXYD1; to verify this putative functional association, we built a chromatin spatial model using the 3D-GNOME approach. Based on the obtained results (Figure 7), we hypothesize that, in the normal tissue, the lower level of cg23866403 methylation in the FXYD1 gene promoter results in a shortened spatial distance to the gene enhancer and, as a result, increased FXYD1 expression compared to the cancer tissue. The significant impact of DNA methylation on the chromatin structure in cancer is well-studied [156], showing the appearance of changes, for example, due to CTCF binding disruption [157]. Moreover, the length of the DNA loops may change depending on the presence of cohesin in the gene expression regulatory machinery as well as the presence of CTCF in one or both anchors of a loop [158]. The relationship between distance change and gene expression change is very poorly understood; the only example we found was in Drosophila [159]. Based on our results, one could suggest that the change in DNA methylation affected protein binding and, consequently, changed the length of the loop. Through the results presented in this study, the NKAPL gene was found to appear in multiple contexts, making it an interesting target, especially with respect to changes in its transcription related to altered DNA methylation. Therefore, we built spatial models for it using two different experimental chromosome conformation capture protocols, namely ChIA-PET and PCHi-C (Figure 7). The first model allowed us to observe the change in loop length and discover that cohesin-mediated loops surrounding the NKAPL gene were longer in the normal cell line (hTERT-HME1) than in the cancerous one (MCF-7) and also that loops anchored by enhancers are present only in the normal cell line.
Thanks to the second model, it was noticed that several times more interactions can be identified around the NKAPL gene in the normal cell line (MCF-10A) than in the cancer cell line (MCF-7), with a similarly higher level of DNA methylation in cancer. Based on the spatial models obtained by both methods, we hypothesize that the reduced level of methylation in normal cells results in the formation of a much larger number of stable interactions, which translates into a more condensed chromatin region. However, this increased level of chromatin condensation in the normal cell line cannot be interpreted as closed chromatin, preventing transcription and expression regulation processes. It is tighter because of the higher number of established connections. In contrast, in the cancer cell line, fewer connections result in a looser chromatin structure that is putatively unstable and, accordingly, exposed to unexpected transcriptional changes that may result in the development of potentially oncogenic changes. Chromatin organization has a significant impact on organism functioning [160], and its disruption may support pathogenic processes, e.g., through chromosomal instability, which intensifies deregulation of gene expression [161]. Changes in DNA looping can cause errors in gene regulation during both cancer initiation and development [40,162]. Indeed, alterations such as the absence of specific loops or a shift in chromatin interaction frequency have been directly linked to carcinogenesis in various cell lines [85]. Moreover, a more loosely packed chromatin state may reduce nuclear stiffness and increase chromatin mobility [163,164]. This enhanced mobility can lead to chromosomal translocations and changes in the transcriptional landscape, which may contribute to oncogenic events [161,165]. This mechanism may explain why we observed more loosely packed chromatin in the cancer cell line specifically related to the NKAPL gene promoter in our study.
Beyond these structural changes, cohesin-mediated chromatin structures are known as regulators of Epithelial–Mesenchymal Transition (EMT)-related genes, whose altered expression influences cancer progression, including breast cancer [166]. EMT itself may also be influenced by disturbances in the TGF-β-signaling pathway, which may suggest the multilayered nature of cohesin-mediated chromatin structures disorder [166,167]. Finally, it must be noticed that we used TCGA tissue samples, and they represent various cell heterogeneity. This may influence the study, because some of the observed values might be averaged and not confirmed as significant. At the same time, we are certain that the altered transcriptomic and epigenetic signals presented in this work are of great value; when studied further, at the level of specific cell populations, they will bring detailed insight into gene expression regulation during breast cancer development.

4. Materials and Methods

4.1. Data Collection

This study is based on breast cancer data obtained from The Cancer Genome Atlas (TCGA), including mRNA expression, miRNA expression, and DNA methylation levels [https://www.cancer.gov/tcga] (accessed on 1 January 2020). The data were filtered as follows: (1) all attributes having zero variance across samples were removed; (2) only female samples were included in the study. The final dataset consisted of 1191 samples taken from 1068 female patients (123 patients donated both normal and cancerous tissues). Out of the 1191 samples, only 381 were complete across the mRNA, DNA methylation, and miRNA data. They consisted of 328 cancerous and 53 normal samples (Table 4 and Figure 8). The remaining samples (incomplete across the datasets) were used as testing sets in separate classification experiments described in Section 4.2 and Section 2.1.
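The two preparation steps described above, removing zero-variance attributes and separating samples that are complete across all three data layers from the incomplete ones, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature names, sample IDs, and the helper functions `drop_zero_variance` and `split_by_completeness` are hypothetical.

```python
def drop_zero_variance(table):
    """Keep only features whose values differ across samples.

    `table` maps a feature name to its list of per-sample values.
    """
    return {name: values for name, values in table.items()
            if len(set(values)) > 1}

def split_by_completeness(samples_by_layer):
    """Return (complete, incomplete) sample IDs.

    `samples_by_layer` maps a data layer name to the IDs measured in it.
    A sample is complete when it is present in every layer.
    """
    layers = [set(ids) for ids in samples_by_layer.values()]
    complete = set.intersection(*layers)
    incomplete = set.union(*layers) - complete
    return complete, incomplete

# Toy input (assumption: invented names and values, for illustration only)
mrna = {"gene_1": [2.1, 3.4, 0.7], "gene_2": [0.0, 0.0, 0.0]}
filtered = drop_zero_variance(mrna)

samples = {
    "mRNA": ["s1", "s2", "s3"],
    "methylation": ["s1", "s2", "s4"],
    "miRNA": ["s1", "s2", "s3", "s4"],
}
complete, incomplete = split_by_completeness(samples)
print(sorted(filtered), sorted(complete), sorted(incomplete))
```

In this toy input, the complete samples would form the main multi-omics dataset and the incomplete ones would be held out for the separate classification experiments.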

4.2. Detection of Significant Features Using MCFS-ID Algorithm

Our analysis utilized the Monte Carlo Feature Selection and Interdependencies Discovery (MCFS-ID) algorithm, a supervised feature selection method introduced in [168]. MCFS-ID generates a ranking of features based on their potential to distinguish records between classes, e.g., cancerous vs. normal. It also enables the prediction of continuous values and allows the user to discover possible interdependencies between features.
The algorithm builds thousands of decision (or regression) trees on randomly selected subsets of data samples and attributes. The relative importance (RI) score for each feature is calculated from all trees and nodes built on that feature, taking into account the number of samples split by the node, the information gain of the node, and the predictive quality of the tree. The RI score is used to build the ranking of all input features, which indicates the attributes best suited to classification or regression tasks. Additionally, the algorithm provides an RI cutoff: attributes that exceed it are better for prediction than attributes with random values, which might distinguish classes by pure chance. The features ranked above the RI cutoff constitute the set of significant features.
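As an illustration of how the three ingredients named above (samples split by a node, information gain of the node, and predictive quality of the tree) could be combined into one score, the following minimal sketch accumulates an RI-style value per feature and applies a cutoff. The input layout, the function names, and the exponents `u` and `v` are assumptions made for this example; it is not the rmcfs implementation.

```python
from collections import defaultdict


def relative_importance(trees, u=1.0, v=1.0):
    """Accumulate an RI-style score per feature over a collection of trees.

    Each tree is a dict (hypothetical layout) with:
      "wacc"   - predictive quality of the tree,
      "n_root" - number of samples reaching the root,
      "nodes"  - list of (feature, info_gain, n_samples) split nodes.
    u and v are assumed weighting exponents.
    """
    scores = defaultdict(float)
    for tree in trees:
        quality = tree["wacc"] ** u
        for feature, gain, n_samples in tree["nodes"]:
            # Weight the node's gain by the fraction of samples it splits.
            scores[feature] += quality * gain * (n_samples / tree["n_root"]) ** v
    return dict(scores)


def rank_features(scores, cutoff):
    """Return (feature, score) pairs above the cutoff, best first."""
    kept = [(f, s) for f, s in scores.items() if s > cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy input: two trees built on random subsets of samples and attributes.
trees = [
    {"wacc": 0.9, "n_root": 100, "nodes": [("geneA", 0.5, 100), ("cpg1", 0.2, 40)]},
    {"wacc": 0.8, "n_root": 100, "nodes": [("geneA", 0.4, 100), ("mirX", 0.1, 50)]},
]
scores = relative_importance(trees)
significant = rank_features(scores, cutoff=0.05)
```

In this toy input, `mirX` falls below the illustrative cutoff and is dropped, while `geneA` and `cpg1` are kept. In MCFS-ID itself the cutoff is derived by the algorithm rather than chosen by hand.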
The Interdependency Discovery function of the MCFS-ID algorithm allows the user to find links between features that amplify each other in the classification task. Decision or regression trees used for calculation of RI scores are also used to generate feature interdependency scores. In this case, the score for the pair of features (parent/child nodes in the tree) is generated using information gain of the child node multiplied by its associated number of samples expressed as a fraction of samples in the parent node.
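The pairwise score described above can be sketched as follows. The tuple layout and the choice to sum contributions over all parent/child occurrences are assumptions for illustration only.

```python
from collections import defaultdict


def interdependency_scores(edges):
    """Score directed feature pairs from parent/child splits.

    edges: iterable of (parent_feature, child_feature, child_gain,
    n_child, n_parent) tuples, one per parent/child node pair found
    in the trees (hypothetical layout).
    """
    scores = defaultdict(float)
    for parent, child, gain, n_child, n_parent in edges:
        # Information gain of the child node, scaled by the fraction
        # of the parent's samples that reach the child.
        scores[(parent, child)] += gain * n_child / n_parent
    return dict(scores)


edges = [
    ("geneA", "cpg1", 0.2, 40, 100),
    ("geneA", "cpg1", 0.3, 50, 100),
    ("geneA", "mirX", 0.1, 50, 100),
]
id_scores = interdependency_scores(edges)
```

Each key of `id_scores` corresponds to one directed edge of an interdependency graph, and the value to its weight.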
The resulting scores indicate which features amplify the prediction powers of other features. The result of this algorithm has the form of a directed graph where features are visualized as nodes and the thickness of edges symbolizes the strength of this amplification. For better clarity and to avoid false positives, MCFS-ID allows us to cut off edges that are weaker than those that might be caused by random patterns occurring in the data. Interdependency graphs were generated using the ‘build.idgraph’ function from the ‘rmcfs’ package. For more details of the MCFS-ID, see [7].
To establish a final list of significant features for each data type, four runs of the MCFS-ID algorithm were performed (Figure 9).
First, a selection of features was performed on the full dataset consisting of mRNA expression, DNA methylation, and miRNA expression in 381 samples. This approach allowed for capturing the relationships between attributes from different types of datasets. Next, a feature selection was performed on the same dataset but separately for each data type (Figure 9), where the sample size differed depending on the data type (Table 4, Figure 8). In all four runs, the features were selected by their ability to differentiate between normal and cancerous samples. Running MCFS-ID on the data of all types allowed us to find features significant in conjunction with features of other types. Attributes derived from this run are referred to as significant across all categories. Separate runs for each data type capture the less (but still) important features that would fall under the relevance threshold of the MCFS-ID algorithm if they were part of a larger set of features. The whole procedure (Figure 9) will be referred to as the main MCFS-ID experiment later in the paper, and the features selected by the procedure as the significant set of mRNA/miRNA genes or DNA methylation features. The analysis was conducted using version 1.3.1 of the ‘rmcfs’ package from the CRAN repository in R 3.6.3. The default parameters were used except for splitSetSize = 200; mode = 2; cutoffPermutations = 20.
Additionally, to validate the quality of the selected important features, two different classification models (SVM (support vector machine) and RF (random forest)) were trained on the 381 complete samples and tested later on the test samples that were not used in the feature selection phase. For each type of data, there are over 400 unique test-ready samples that do not contain values for all categories; these were used to obtain weighted accuracy of classification (wAcc) for each category separately.
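Because the normal and cancerous classes are strongly imbalanced, a class-weighted accuracy is more informative than plain accuracy. The sketch below assumes that wAcc is the mean of per-class accuracies (recalls); this definition is an assumption of the example and should be checked against the rmcfs documentation.

```python
def weighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of wAcc)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# A classifier that always predicts "cancer" on an imbalanced test set.
y_true = ["cancer"] * 8 + ["normal"] * 2
y_pred = ["cancer"] * 10
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
wacc = weighted_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.8 although the minority class is never recognized, whereas the class-weighted value is 0.5.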

4.3. Descriptive Analysis of Significant mRNA Genes

To annotate genes as down- or over-expressed in the cancer samples, the log2 fold change was calculated for each of the significant genes (returned by MCFS-ID). The normality of the gene expression distribution was checked with the Shapiro–Wilk test and, based on the obtained results, the Wilcoxon test was applied. Next, functional annotation of significant genes was performed using the Enrichr web server [169], followed by the Benjamini–Hochberg correction for multiple testing. Initially, the q-value threshold was set to 0.05; however, because of the very large number of significant terms returned in one analysis (over-expressed mRNA), the threshold for that analysis was set to 0.001. Moreover, based on the descriptions of the significant genes, gathered from various molecular databases provided by BioMart (such as NCBI, Gene Ontology, KEGG, Reactome, WikiPathways, Biocarta), Natural Language Processing (NLP) methods were applied to group these genes into functionally/descriptively similar clusters and to retrieve sets of keywords describing each of the returned clusters. To perform this operation, each gene was represented as a text document (combined from all available descriptions), which was then used to calculate TF-IDF (term frequency–inverse document frequency) weights over a bag of words/terms. Once each gene is represented by a TF-IDF vector, cosine similarity between the documents can be calculated and a hierarchical clustering model can be built on these similarities. Finally, discovering a set of unique keywords that characterize each cluster provides a good functional biological overview of the input genes and of the groups that should be studied more closely.
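The TF-IDF and cosine-similarity step can be illustrated with a small, dependency-free sketch. The tokenized gene descriptions and the particular TF-IDF variant (relative term frequency times the natural log of the inverse document frequency) are assumptions; the study may have used different weighting and preprocessing.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict mapping gene -> non-empty list of tokens."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for gene, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[gene] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already tokenized descriptions of three genes.
docs = {
    "G1": ["cell", "cycle", "kinase"],
    "G2": ["cell", "cycle", "checkpoint"],
    "G3": ["ion", "transport", "membrane"],
}
vectors = tfidf_vectors(docs)
sim_12 = cosine(vectors["G1"], vectors["G2"])
sim_13 = cosine(vectors["G1"], vectors["G3"])
```

The resulting pairwise similarities (or the distances 1 − similarity) are the kind of input a hierarchical clustering routine would take.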

4.4. Descriptive Analysis of Significant DNA Methylation Sites

DNA methylation sites selected by the MCFS-ID (hereafter differentially methylated sites, DMSs) were mapped to the specific genomic regions. Original DMS positions were converted from hg19 to hg38 using LiftOver [170]. Intersections of genomic regions, i.e., CpG islands (CpGIs) or promoters, with DMSs were performed using bedtools (v2.29.2) [171]. The locations of CpGIs were taken from the UCSC database in the hg38 genome assembly (https://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cpgIslandExt.txt.gz, accessed on 1 November 2019). The shores were defined as the regions flanking the CpGI by 2000 bp up- and down-stream, the shelves as the regions flanking the shores by +/−2000 bp, and the open seas as the regions between the shelves. Next, to prepare the background distribution of cytosines across these four region types, all cytosines included in the Human Methylation 450K BeadChip (Illumina 450K) were mapped to CpGIs, shores, shelves, and open-sea regions. The sites covered by the Illumina 450K represented the background distribution of cytosines across the named genomic regions and allowed us to verify (chi-squared test) whether DMSs show any specific distribution. Afterwards, the distribution of β-values of cancer vs. normal samples was compared separately for CpGIs, shores, shelves, and open-sea regions using the Wilcoxon test. β-values represent the ratio of the intensity of the methylated bead type to the combined intensity of a locus and take values between 0 and 1 [172]. Finally, we tested whether the distributions of hypo-, medium-, and hyper-methylated cytosines in cancer samples differed across the four region types (chi-squared test). Bonferroni correction for multiple testing was applied in the aforementioned analyses.
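The island/shore/shelf/open-sea definitions above can be expressed as a simple position classifier. This is a sketch under stated assumptions: islands are given as half-open (start, end) intervals on a single chromosome, and when a site lies near several islands the closest category wins. The study itself used bedtools for the intersections.

```python
def classify_cpg_context(pos, islands, flank=2000):
    """Label a position as island, shore, shelf, or open_sea.

    islands: list of (start, end) half-open intervals on one chromosome.
    flank: width of the shore and of the shelf, in bp.
    """
    priority = {"island": 0, "shore": 1, "shelf": 2, "open_sea": 3}
    best = "open_sea"
    for start, end in islands:
        if start <= pos < end:
            label = "island"
        elif start - flank <= pos < end + flank:
            label = "shore"
        elif start - 2 * flank <= pos < end + 2 * flank:
            label = "shelf"
        else:
            label = "open_sea"
        # Keep the category closest to an island.
        if priority[label] < priority[best]:
            best = label
    return best


islands = [(10000, 11000)]  # hypothetical CpG island
labels = {p: classify_cpg_context(p, islands) for p in (10500, 9000, 7000, 5000)}
```

Counting the labels over all DMSs and over all array probes gives the two distributions compared by the chi-squared test.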
To annotate DMSs to promoters or gene bodies, the gene positions from Ensembl (https://www.ensembl.org/Homo_sapiens/Info/Index, accessed on 1 February 2020) were taken, and their promoters were set to +/−2000 bp around the TSS. DMSs assigned to the promoter or the gene body established a pair: a DMS and its target gene. DMSs within intergenic regions were paired with their target genes using the bedtools closest function. Assignment of hyper-, hypo-, and medium-methylated β-values was performed on the basis of the log2 fold change (log2FC) between sample types (cancer vs. normal). Sites with log2FC ≥ 1 were labeled as hyper-methylated in cancer and those with log2FC ≤ −1 as hypo-methylated in cancer; the remaining sites were labeled as medium-methylated. To evaluate the significance of the observed log2FC, the Shapiro–Wilk test was first used to verify whether the data were normally distributed, and, based on the test results, a nonparametric Wilcoxon test with FDR correction was chosen to verify the null hypothesis that there was no difference in the distribution of β-values for DMSs between cancer and normal samples.
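A three-way labelling by log2FC can be sketched as below. The example assumes that log2FC is computed as log2(mean β in cancer / mean β in normal), so that positive values mean higher methylation in cancer, and that both means are positive; the function names and neutral label strings are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(cancer_values, normal_values):
    """log2 of the ratio of group means (assumed definition)."""
    return math.log2(mean(cancer_values) / mean(normal_values))


def label_site(log2fc, threshold=1.0):
    """Three-way label using a symmetric threshold on log2FC."""
    if log2fc >= threshold:
        return "higher_in_cancer"
    if log2fc <= -threshold:
        return "lower_in_cancer"
    return "medium"


# Hypothetical beta-values for three sites.
site_up = log2_fold_change([0.8, 0.8], [0.2, 0.2])
site_down = log2_fold_change([0.1, 0.1], [0.4, 0.4])
site_flat = log2_fold_change([0.5, 0.6], [0.5, 0.5])
labels = [label_site(x) for x in (site_up, site_down, site_flat)]
```

A fourfold increase corresponds to log2FC = 2 and a fourfold decrease to log2FC = −2, so both pass a threshold of 1 in absolute value.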
To detect putative regulatory regions, the association between the expression of a gene and the β-values of DMSs located within 1 Mb upstream or downstream of that gene’s TSS was assessed by calculating the Spearman correlation with FDR correction; correlations were considered significant when Spearman’s |rho| ≥ 0.6 and FDR ≤ 0.05. Moreover, to assign DMSs to specific chromatin states, we used chromatin state annotations for the MCF-7 breast cancer cell line (GSE57498). The data were converted from the original annotation Human GRCh37/hg19 to GRCh38/hg38 annotation using LiftOver [170]. Positions indicating acetylation of the lysine 27 of the histone H3 protein (H3K27ac) for the MCF-7 cell line were taken from the ENCODE database (https://www.encodeproject.org/files/ENCFF621API/, accessed on 1 October 2020). The sites included in the Illumina 450K panel were intersected with the ranges of chromatin states to obtain the general distribution of cytosines across chromatin states. Next, from the 2006 significant DMSs, hyper- and hypo-methylated DMSs were selected and independently intersected with chromatin states to unveil their distribution across chromatin states. To verify whether the obtained distributions were specific, 2006 random loci were drawn 1000 times from the Illumina 450K panel. Each time, the set of drawn loci was intersected with the chromatin state ranges to compute the percentage of loci assigned to each chromatin state. Next, the logarithm of the fold change between the percentage of hyper-methylated DMSs and the mean percentage of randomly drawn loci was computed for each chromatin state. The same procedure was applied to hypo-methylated sites, yielding empirical p-values for the significance of these overlaps.
Next, to evaluate the impact of DMSs on the patients’ survival, a multivariate log rank test (‘lifelines.statistics.multivariate_logrank_test’ function in Python 3.8) was used. The samples were split into high- and low-methylation groups at the median methylation level. Additionally, to confirm that the number of DMSs found to have a significant impact on survival differs significantly from the number of such sites among randomly selected sites, a bootstrapping technique (sampling 100 times) was applied. After each sampling of 2006 random sites from all sites present in the Illumina 450K panel, their impact on survival was tested, and the number of sites with a p-value below 0.05 was recorded. Next, we counted how many times the number of significant sites (having an impact on survival) found in a random set was greater than the number of significant sites obtained for the 2006 DMSs.
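The random-draw comparison for chromatin states can be sketched as follows. The data layout (a dict from locus to chromatin state), the base-2 logarithm, and the add-one form of the empirical p-value are assumptions of this example, as are all function and variable names.

```python
import math
import random


def state_fraction(loci, state_of, state):
    """Fraction of loci annotated with the given chromatin state."""
    return sum(1 for locus in loci if state_of.get(locus) == state) / len(loci)


def enrichment(dms, panel, state_of, state, n_draws=1000, seed=0):
    """Log2 fold change vs. random draws and a one-sided empirical p-value."""
    rng = random.Random(seed)
    observed = state_fraction(dms, state_of, state)
    random_fracs = [
        state_fraction(rng.sample(panel, len(dms)), state_of, state)
        for _ in range(n_draws)
    ]
    mean_random = sum(random_fracs) / n_draws
    if observed > 0 and mean_random > 0:
        log2fc = math.log2(observed / mean_random)
    else:
        log2fc = float("nan")
    # Add-one correction keeps the estimate away from exactly zero.
    hits = sum(1 for frac in random_fracs if frac >= observed)
    p_empirical = (1 + hits) / (n_draws + 1)
    return log2fc, p_empirical


# Toy panel: 1000 loci, the first 100 annotated as "enhancer".
panel = list(range(1000))
state_of = {i: ("enhancer" if i < 100 else "other") for i in panel}
dms = list(range(40)) + list(range(500, 510))  # 40 of 50 sites in enhancers
log2fc, p_emp = enrichment(dms, panel, state_of, "enhancer")
```

In the toy data about 10% of the panel is annotated as enhancer while 80% of the selected sites are, so the fold change is roughly eightfold and none of the random draws is expected to reach the observed fraction.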

4.5. Descriptive Analysis of Significant miRNA Genes

To characterize the set of significant miRNAs returned by the MCFS-ID, we verified their presence in the miR+Pathway database [173], which contains information about mRNA–miRNA connections and 150 KEGG pathways linked with mRNAs. The mRNAs linked with the searched miRNAs were intersected with the 590 mRNAs returned in the MCFS-ID experiment. Biological functions of the resulting mRNAs were checked against the NLP cluster keywords (described in Section 4.3).
Next, to study the suppressive impact of breast cancer on the expression of significant miRNAs (those returned in the main MCFS-ID experiment) and the regulatory role of miRNAs in mRNA gene expression, miRNA genes down-expressed in cancer were selected based on log2FC (Figure S3A). Genes were defined as down-expressed in cancer if log2FC ≤ −0.5 (Figure S3B). For the selected miRNAs, mRNA target genes were assigned using the MicroRNA Target Prediction Database [174], and the Spearman correlation was calculated for the obtained miRNA–mRNA pairs. Correlations with rho ≤ −0.2 and adjusted p-values ≤ 0.05 were considered in further analysis. Next, protein–protein interactions were identified using the STRING database [175] for the mRNA genes that significantly correlated with miRNAs, and the 50 proteins with the highest number of interactions were selected. Enrichment analysis of those top 50 proteins was then performed using KEGG pathways and gene ontology biological processes (GO BP). Additionally, to discover putative drugs associated with down-expressed miRNAs, the Enrichr database was searched with the same set of proteins.
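The two filtering steps in this paragraph (keeping negatively correlated miRNA–mRNA pairs, then ranking proteins by their number of interactions) can be sketched as below. The input tuples and identifiers are made up for illustration, and ties in the degree ranking are broken arbitrarily.

```python
from collections import Counter


def filter_pairs(pairs, rho_max=-0.2, padj_max=0.05):
    """Keep (mirna, mrna) pairs with rho <= rho_max and adjusted p <= padj_max.

    pairs: list of (mirna, mrna, rho, adjusted_p) tuples.
    """
    return [(m, g) for m, g, rho, padj in pairs if rho <= rho_max and padj <= padj_max]


def top_by_degree(interactions, top_n=50):
    """Return the top_n proteins with the most distinct interaction partners."""
    unique_edges = {frozenset((a, b)) for a, b in interactions if a != b}
    degree = Counter()
    for edge in unique_edges:
        for protein in edge:
            degree[protein] += 1
    return [protein for protein, _ in degree.most_common(top_n)]


pairs = [
    ("miR-1", "G1", -0.35, 0.01),
    ("miR-1", "G2", -0.10, 0.001),  # correlation too weak
    ("miR-2", "G3", -0.50, 0.20),   # not significant after adjustment
]
kept = filter_pairs(pairs)

interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
hubs = top_by_degree(interactions, top_n=1)
```

Duplicate and reversed edges such as ("B", "A") are collapsed before counting, so each interaction contributes once to the degree of each partner.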

4.6. Detection of Significant miRNA and Methylations in the Context of Predicting mRNA Expression Levels

To discover more complex interactions between the top 590 most significant mRNA genes (obtained from the main MCFS-ID experiment described in Section 4.2) and two other types of molecular data (DNA methylation and miRNA expression), two additional sets of MCFS-ID experiments were performed (Figure 10). Both sets of experiments were based on the same idea of running the MCFS-ID algorithm on all top significant 590 mRNA features obtained from the main MCFS-ID runs. Each of those mRNA features was used as a target variable, and miRNAs or DNA methylation were used as predictor features. Finally, two different feature rankings were obtained. In the case of DNA methylation, for each run on a different target mRNA gene, methylation loci within the chromosome of the gene were explored. This limitation heavily reduced the calculation time—the number of all DNA methylation sites is almost 400k and it is biologically justified to focus on such relations within one chromosome [176]. In both sets of experiments, the final cross validation of the result was based on a regression tree modeling and the calculation of Pearson correlation between the predicted value and the observed mRNA gene expression level.
For each significant mRNA (selected by the main experiment described in Section 4.2) treated as a target variable, a separate MCFS-ID experiment was performed. The same procedure was used for DNA methylation as predictors instead of miRNA expression levels.

4.7. Descriptive Analysis of Associations Between DMS and TFs

To predict transcription factors (TFs) whose binding affinity to DNA sequence may be changed due to differential DNA methylation, the sequences surrounding each DMS (+/−20 bp) were generated using the bedtools getfasta function (v2.29.2) [171]. To identify TF motifs, the HOCOMOCO database of position weight matrices (PWMs) [177] was used with the PWMEnrich tool [178] and the following settings: (i) a sequence background built from randomly selected fasta sequences and (ii) a motif significance cutoff of p-value ≤ 0.001. Next, to detect the exact Transcription Factor Binding Site (TFBS) positions of the motifs that passed the threshold, the online FIMO tool [179] from the MEME Suite 5.0.5 was applied with a significance threshold of p ≤ 0.0001. The returned TFBSs were intersected with DMSs to keep only those TFBSs within which a DMS was confirmed, to ensure that differential DNA methylation may affect binding affinity. To group the TF motifs returned by PWMEnrich based on the heterogeneity of their PWMs, the STAMP tool [180] was used with the Pearson Correlation Coefficient as a measure of distance. Sequence alignment was performed using an ungapped Smith–Waterman algorithm with the iterative refinement multiple alignment strategy. Visualization of the clustering results was performed with UPGMA [180]. Furthermore, functional annotation of genes encoding TFs was performed using the Enrichr web server [169] with the Benjamini–Hochberg correction for multiple testing and a significance threshold of q-value ≤ 0.05.
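Since bedtools getfasta reads BED intervals, which are 0-based and half-open, the +/−20 bp windows have to be expressed in those coordinates. The sketch below assumes the DMS positions are 1-based single-base coordinates; the function names are hypothetical.

```python
def dms_window_bed(chrom, pos_1based, flank=20):
    """Return (chrom, start, end) for a window of +/- flank bp around a site.

    The site is given as a 1-based coordinate (assumption); the result
    uses BED conventions (0-based start, exclusive end).
    """
    start = max(0, pos_1based - 1 - flank)  # clip at the chromosome start
    end = pos_1based + flank
    return chrom, start, end


def to_bed_line(interval):
    """Format an interval as one tab-separated BED line."""
    return "\t".join(str(field) for field in interval)


window = dms_window_bed("chr6", 1000)
bed_line = to_bed_line(window)
```

For a site away from the chromosome start the window spans 41 bases: the site itself plus 20 bp on each side.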

4.8. Building Models of Regulatory Networks

To investigate whether the putative associations between TFs and DMSs within the binding sites of these TFs had similar patterns in cancer vs. normal samples, an additional analysis using the MCFS-ID algorithm was performed. The input decision table consisted of 484 patients for whom both mRNA and DNA methylation datasets were available. Features, namely TFs’ coding genes and DMSs within DNA sequences of motifs detected for those TFs, were analyzed. The returned significant features, together with cancer/normal tissue type, were used as explanatory variables to train a set of linear models to predict mRNA expression of a target gene. Each time, a different mRNA gene (target gene) whose promoter overlapped with a set of explanatory variables (TFs and DMSs) was used as the dependent variable to build a single linear model. Finally, for the best-fitted models (adjusted p ≤ 0.05, R² > 0.5), the feasible biological relationships between DMSs, TFs, and their target genes were visualized. These target genes were also used to discover direct and the closest indirect associations between them, as well as with other genes, through a systematic literature review and interaction graphs obtained with the Pathway Commons online tool [181]. Based on the associations found, a final graph of connections between identified target genes and other genes was created and visualized.

4.9. The Visualization of Chromatin 3D Structure of Selected Loci

In order to visualize the putative functional association between genes, DNA methylations significant in breast cancer, and enhancer regions, the chromatin structures of the FXYD1 and NKAPL loci were generated using 3D-GNOME [182] and Spring Model (SM) [89] polymer simulation methods.
3D-GNOME is a chromatin 3D structure modeling method that uses a multiscale bead-on-a-string approach and a Monte Carlo simulated annealing algorithm [182]. It models chromatin interactions mediated by specific proteins based on high-frequency PET (multiple paired end tags mapped on two genomic loci) cluster interactions and singletons (single paired end tag). The algorithm uses a tree structure to manage the relationships between different levels of genomic organization (chromosomes, segments containing a topological domain, and chromatin interaction anchors), simulating their spatial positions independently by minimizing an energy function based on high-frequency chromatin PET cluster interactions and energy terms. Next, sub-anchor beads are added between neighboring anchor beads to model chromatin loops, and their positions are again simulated by minimizing energy. Finally, the algorithm refines the loop shape using a singleton interaction heatmap and motif orientation. The 3D-GNOME models were generated based on cohesin mediated Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) data for the hTERT-HME1 (normal) and MCF-7 (cancer) cell lines (ENCODE Accession ID: ENCSR991JXX—hTERT-HME1, ENCSR255XYX—MCF-7) [85].
The Spring Model represents polymers as a collection of points in three-dimensional space using the beads-on-chain approach. At the resolution chosen by the user, each bead represents a segment of DNA of the same length. In this study, chromatin models with a resolution of 1 kbp were constructed, where each bead represented 1000 base pairs. If there was a spatial interaction between beads in the polymer model, harmonic bonds were used to connect pairs of interacting beads by springs. The spring-based pairwise forces were subjected to energy minimization in an SM polymer simulation. To establish the final 3D structure of the polymer fiber with the set of experimentally identified contacts, the SM simulation performed a global energy minimization given the data-driven forces represented by the springs and the polymer chain parameters (such as stiffness). The initial conformation of the polymer was given as a circular 3D structure of the polymer fiber. Three-dimensional models generated by the Spring Model approach were built using Promoter Capture Hi-C (PCHi-C) [87] data for the MCF-10A (healthy) and MCF-7 (cancer) breast cell lines to achieve a promoter-centric view.
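The idea of connecting interacting beads with harmonic springs can be illustrated by the energy function that such a simulation would minimize. This is a simplified sketch: the spring constants, the rest length `r0`, and the omission of a stiffness term are assumptions of the example and do not reproduce the parameters of the Spring Model implementation.

```python
import math


def harmonic_energy(coords, contacts, k_chain=1.0, k_contact=1.0, r0=1.0):
    """Total harmonic energy of a bead chain with extra contact springs.

    coords: list of (x, y, z) bead positions, ordered along the chain.
    contacts: list of (i, j) index pairs for interacting beads.
    """
    energy = 0.0
    # Springs between consecutive beads keep the chain connected.
    for i in range(len(coords) - 1):
        d = math.dist(coords[i], coords[i + 1])
        energy += 0.5 * k_chain * (d - r0) ** 2
    # Springs between interacting beads pull contact partners together.
    for i, j in contacts:
        d = math.dist(coords[i], coords[j])
        energy += 0.5 * k_contact * (d - r0) ** 2
    return energy


contacts = [(0, 3)]  # hypothetical contact between the first and last bead
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
folded = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
e_straight = harmonic_energy(straight, contacts)
e_folded = harmonic_energy(folded, contacts)
```

The folded conformation satisfies the contact and therefore has the lower energy; a minimizer started from an extended or circular conformation would move the beads toward such a state.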

5. Conclusions

This study successfully extracted a very rich set of multi-omic features, including mRNA expression, miRNA expression, and DNA methylation, enabling accurate classification of breast cancer vs. normal samples. The top-ranked features included many established breast cancer-related markers, validating our analytical approach. Simultaneously, we identified novel features, and we are certain that some of them represent potential novel targets for functional research in the context of breast cancer development and drug studies. Among the previously undescribed features, there were mRNAs (NKAPL, PITX1, and TMEM220) for which we proposed potential regulatory mechanisms based on epigenetic interactions. Furthermore, 3D chromatin structure models revealed a less condensed chromatin structure in cancer, suggesting significant regulatory dependencies. The presented bioinformatic approach, which effectively reduces data dimensionality, provides a robust framework for selecting key features for subsequent experimental validation.

6. Limitations of the Study

Despite identifying numerous statistically significant molecular features and their complex interdependencies related to breast cancer, our study has certain limitations that warrant discussion. Firstly, while the TCGA dataset is extensive and widely utilized, it is primarily based on bulk tumor tissue analysis. This approach inherently averages signals from heterogeneous cell populations, including tumor cells, stromal cells, and immune infiltrates. Consequently, our findings may not fully capture the nuanced, cell-type-specific molecular alterations that drive cancer, nor do they reflect the single-cell resolution required for a complete understanding of tumor heterogeneity and evolution. Secondly, although MCFS-ID efficiently identifies features with high predictive power, the biological relevance and causal relationships for many of them remain putative. While we provide extensive literature support and biological pathway analyses, direct experimental validation (e.g., in vitro or in vivo functional assays) for gene expression, methylation, or miRNA modulation was beyond the scope of this computational study. Such validation would be crucial to confirm their functional roles as biomarkers or therapeutic targets. Furthermore, the cross-sectional nature of the TCGA data limits our ability to infer temporal causality or track dynamic changes in multi-omics profiles over the course of disease progression or treatment. Lastly, while we explored 3D chromatin structures for specific examples like NKAPL, these analyses relied on established cell line data, which may not perfectly recapitulate the complex in vivo environment of primary human tumors. This limits the direct translatability of certain 3D chromatin findings to heterogeneous patient populations. 
Future studies integrating single-cell multi-omics data, experimental functional validation, and longitudinal patient cohorts will be essential to overcome these limitations and further advance the understanding and clinical application of these findings.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms26146558/s1.

Author Contributions

Conceptualization, M.J.-K., M.D., I.S., D.P. and M.J.D.; methodology, M.J.-K., M.D., M.W., V.P., I.S., D.P. and M.J.D.; software, M.J.-K., M.D., M.W., M.Ł., A.F., N.G., I.S. and M.J.D.; validation, M.J.-K., M.D., M.W., M.Ł., K.S., A.A. and M.J.D.; formal analysis, M.J.-K., M.D., M.W., M.Ł., K.S., A.A., A.F., N.G., I.S. and M.J.D.; investigation, M.J.-K., M.D., M.W., M.Ł., K.S., A.A., A.F., N.G., I.S. and M.J.D.; resources, D.P. and M.J.D.; data curation, M.J.-K., M.D., M.W., M.Ł., K.S., A.A., A.F., N.G., I.S. and M.J.D.; writing—original draft preparation, M.J.-K., M.D. and M.J.D.; writing—review and editing, M.J.-K., M.D., M.W., K.S., A.A., N.G., V.P., M.G., I.S., D.P. and M.J.D.; visualization, M.J.-K., M.D., M.W., K.S., A.A., N.G., I.S. and M.J.D.; supervision, M.D., V.P., M.G., I.S., D.P. and M.J.D.; project administration, D.P. and M.J.D.; funding acquisition, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

The research of M.W., A.A., K.S., and D.P. was funded by Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) program and co-supported by the Polish National Science Centre (2020/37/B/NZ2/03757). Computations were performed thanks to the Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, using the Artificial Intelligence HPC platform financed by the Polish Ministry of Science and Higher Education (decision no. 7054/IA/SP/2020 of 28 August 2020). The work was co-supported by the National Institutes of Health (USA) 4DNucleome grant 1U54DK107967-01 “Nucleome Positioning System for Spatiotemporal Genome Organization and Regulation”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article, the Supplementary Materials, and the publicly available TCGA database.

Acknowledgments

A special thanks to Jacek Koronacki for his comprehensive review of the document and invaluable support of the authors. We also would like to thank Małgorzata Perycz for the revision of the manuscript draft.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag Sequencing
CpG: Cytosine Phosphate Guanine
CpGI: CpG Island
DEG: Differentially Expressed Gene
DMS: Differentially Methylated Sites
DScore: Drug Score
GScore: Gene Score
GO BP: Gene Ontology Biological Processes
H3K27ac: Acetylation of the Lysine 27 of the Histone H3 Protein
HOCOMOCO: Homo Sapiens Comprehensive Model Collection
KEGG: Kyoto Encyclopedia of Genes and Genomes
log2FC: Log2 Fold Change
MCFS-ID: Monte Carlo Feature Selection and Interdependencies Discovery
NLP: Natural Language Processing Methods
NCBI: National Center for Biotechnology Information
PCHi-C: Promoter Capture Hi-C
PWM: Position Weight Matrix
RF: Random Forest
RI: Relative Importance
SM: Spring Model
SVM: Support Vector Machine
TCGA: The Cancer Genome Atlas
TF: Transcription Factor
TF-IDF: Term Frequency-Inverse Document Frequency
TFBS: Transcription Factor Binding Site
TSS: Transcription Start Site
wAcc: Weighted Accuracy
WHO: World Health Organization

References

  1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 Countries. CA A Cancer J. Clin. 2021, 71, 209–249. [Google Scholar] [CrossRef] [PubMed]
  2. Dumbrava, E.I.; Meric-Bernstam, F. Personalized cancer therapy—Leveraging a knowledge base for clinical decision-making. Mol. Case Stud. 2018, 4, a001578. [Google Scholar] [CrossRef] [PubMed]
  3. Chopra, S.; Khosla, M.; Vidya, R. Innovations and challenges in breast cancer care: A review. Medicina 2023, 59, 957. [Google Scholar] [CrossRef]
  4. Bean, G.R.; Lin, C.Y. Breast neuroendocrine neoplasms: Practical applications and continuing challenges in the era of the 5th edition of the WHO classification of breast tumours. Diagn. Histopathol. 2021, 27, 139–147. [Google Scholar] [CrossRef]
  5. Cree, I.A.; White, V.A.; Indave, B.I.; Lokuhetty, D. Revising the WHO classification: Female genital tract tumours. Histopathology 2019, 76, 151–156. [Google Scholar] [CrossRef]
  6. Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M.K. Learnability and the Vapnik-Chervonenkis dimension. J. ACM 1989, 36, 929–965. [Google Scholar] [CrossRef]
  7. Dramiński, M.; Koronacki, J. rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery. J. Stat. Softw. 2018, 85, 1–28. [Google Scholar] [CrossRef]
  8. Dramiński, M.; Da̧browski, M.J.; Diamanti, K.; Koronacki, J.; Komorowski, J. Discovering networks of interdependent features in high-dimensional problems. In Big Data Analysis: New Algorithms for a New Society. Studies in Big Data; Japkowicz, N., Stefanowski, J., Eds.; Springer International Publishing: Cham, Switzerland, 2015; Volume 16, pp. 285–304. [Google Scholar]
  9. Chen, L.; Zhou, X.; Zeng, T.; Pan, X.; Zhang, Y.H.; Huang, T.; Fang, Z.; Cai, Y.D. Recognizing pattern and rule of mutation signatures corresponding to cancer types. Front. Cell Dev. Biol. 2021, 9, 712931. [Google Scholar] [CrossRef]
  10. Li, J.; Xu, Q.; Wu, M.; Huang, T.; Wang, Y. Pan-cancer classification based on self-normalizing neural networks and feature selection. Front. Bioeng. Biotechnol. 2020, 8, 766. [Google Scholar] [CrossRef]
  11. Li, Z.; Mei, Z.; Ding, S.; Chen, L.; Li, H.; Feng, K.; Huang, T.; Cai, Y.D. Identifying methylation signatures and rules for COVID-19 with machine learning methods. Front. Mol. Biosci. 2022, 9, 908080. [Google Scholar] [CrossRef]
  12. Chen, L.; Zhang, S.; Pan, X.; Hu, X.; Zhang, Y.H.; Yuan, F.; Huang, T.; Cai, Y.D. HIV infection alters the human epigenetic landscape. Gene Ther. 2018, 26, 29–39. [Google Scholar] [CrossRef] [PubMed]
  13. Li, D.; Lin, H.; Li, L. Multiple feature selection strategies identified novel cardiac gene expression signature for heart failure. Front. Physiol. 2020, 11, 604241. [Google Scholar] [CrossRef] [PubMed]
  14. Paratala, B.S.; Dolfi, S.C.; Khiabanian, H.; Rodríguez-Rodríguez, L.; Ganesan, S.; Hirshfield, K.M. Emerging role of genomic rearrangements in breast cancer: Applying knowledge from other cancers. Biomark. Cancer 2016, 8, 1–14. [Google Scholar] [CrossRef] [PubMed]
  15. Banerji, S.; Cibulskis, K.; Rangel-Escareno, C.; Brown, K.K.; Carter, S.L.; Frederick, A.M.; Lawrence, M.S.; Sivachenko, A.Y.; Sougnez, C.; Zou, L.; et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 2012, 486, 405–409. [Google Scholar] [CrossRef]
  16. Kagohara, L.T.; Stein-O’Brien, G.L.; Kelley, D.; Flam, E.; Wick, H.C.; Danilova, L.V.; Easwaran, H.; Favorov, A.V.; Qian, J.; Gaykalova, D.A.; et al. Epigenetic regulation of gene expression in cancer: Techniques; resources and analysis. Brief. Funct. Genom. 2017, 17, 49–63. [Google Scholar] [CrossRef]
  17. Dąbrowski, M.J.; Dramiński, M.; Diamanti, K.; Stępniak, K.; Mozolewska, M.A.; Teisseyre, P.; Koronacki, J.; Komorowski, J.; Kaminska, B.; Wojtas, B. Unveiling new interdependencies between significant DNA methylation sites; gene expression profiles and glioma patients survival. Sci. Rep. 2018, 8, 4390. [Google Scholar] [CrossRef]
  18. Cao, J.; Yan, Q. Cancer epigenetics; tumor immunity; and immunotherapy. Trends Cancer 2020, 6, 580–592. [Google Scholar] [CrossRef]
  19. Zhong, Q.; Fan, J.; Chu, H.; Pang, M.; Li, J.; Fan, Y.; Liu, P.; Wu, C.; Qiao, J.; Li, R.; et al. Integrative analysis of genomic and epigenetic regulation of endometrial cancer. Aging 2020, 12, 9260–9274. [Google Scholar] [CrossRef]
  20. Vezzani, B.; Carinci, M.; Previati, M.; Giacovazzi, S.; Della Sala, M.; Gafà, R.; Lanza, G.; Wieckowski, M.R.; Pinton, P.; Giorgi, C. Epigenetic regulation: A link between inflammation and carcinogenesis. Cancers 2022, 14, 1221. [Google Scholar] [CrossRef]
  21. Portela, A.; Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 2010, 28, 1057–1068. [Google Scholar] [CrossRef]
  22. Skvortsova, K.; Stirzaker, C.; Taberlay, P. The DNA methylation landscape in cancer. Essays Biochem. 2019, 63, 797–811. [Google Scholar] [CrossRef] [PubMed]
  23. Yan, W.; Herman, J.G.; Guo, M. Epigenome-based personalized medicine in human cancer. Epigenomics 2016, 8, 119–133. [Google Scholar] [CrossRef] [PubMed]
  24. Héberlé, É.; Bardet, A. Sensitivity of transcription factors to DNA methylation. Essays Biochem. 2019, 63, 727–741. [Google Scholar] [CrossRef]
  25. Blattler, A.; Farnham, P.J. Cross-talk between site-specific transcription factors and DNA methylation states. J. Biol. Chem. 2013, 288, 34287–34294. [Google Scholar] [CrossRef]
  26. Costa, F.F.; Paixão, V.A.; Cavalher, F.P.; Ribeiro, K.B.; Cunha, I.W.; Rinck, J.A., Jr.; O’Hare, M.; Mackay, A.; Soares, F.A.; Brentani, R.R.; et al. SATR-1 hypomethylation is a common and early event in breast cancer. Cancer Genet. Cytogenet. 2006, 165, 135–143. [Google Scholar] [CrossRef]
  27. Yi, J.; Gao, R.; Chen, Y.; Yang, Z.; Han, P.; Zhang, H.; Dou, Y.; Liu, W.; Wang, W.; Du, G.; et al. Overexpression of NSUN2 by DNA hypomethylation is associated with metastatic progression in human breast cancer. Oncotarget 2016, 8, 20751–20765. [Google Scholar] [CrossRef]
  28. Choi, J.Y.; James, S.R.; Link, P.A.; McCann, S.E.; Hong, C.C.; Davis, W.; Nesline, M.K.; Ambrosone, C.B.; Karpf, A.R. Association between global DNA hypomethylation in leukocytes and risk of breast cancer. Carcinogenesis 2009, 30, 1889–1897. [Google Scholar] [CrossRef]
  29. Martin, T.A.; Goyal, A.; Watkins, G.; Jiang, W.G. Expression of the transcription factors Snail, Slug, and Twist and their clinical significance in human breast cancer. Ann. Surg. Oncol. 2005, 12, 488–496. [Google Scholar] [CrossRef]
  30. Dulaimi, E.; Hillinck, J.; Ibanez de Caceres, I.; Al-Saleem, T.; Cairns, P. Tumor suppressor gene promoter hypermethylation in serum of breast cancer patients. Clin. Cancer Res. 2004, 10, 6189–6193. [Google Scholar] [CrossRef]
  31. Alvarez, C.; Tapia, T.; Cornejo, V.; Fernandez, W.; Muñoz, A.; Camus, M.; Alvarez, M.; Devoto, L.; Carvallo, P. Silencing of tumor suppressor genes RASSF1A, SLIT2, and WIF1 by promoter hypermethylation in hereditary breast cancer. Mol. Carcinog. 2012, 52, 475–487. [Google Scholar] [CrossRef]
  32. Su, J.; Huang, Y.H.; Cui, X.; Wang, X.; Zhang, X.; Lei, Y.; Xu, J.; Lin, X.; Chen, K.; Lv, J.; et al. Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol. 2018, 19, 108. [Google Scholar] [CrossRef] [PubMed]
  33. Spainhour, J.C.; Lim, H.S.; Yi, S.V.; Qiu, P. Correlation patterns between DNA methylation and gene expression in the cancer genome atlas. Cancer Inform. 2019, 18, 1176935119828776. [Google Scholar] [CrossRef] [PubMed]
  34. Yao, Q.; Chen, Y.; Zhou, X. The roles of microRNAs in epigenetic regulation. Curr. Opin. Chem. Biol. 2019, 51, 11–17. [Google Scholar] [CrossRef]
  35. Ying, S.Y.; Chang, D.C.; Lin, S.L. The microRNA (miRNA): Overview of the RNA genes that modulate gene function. Mol. Biotechnol. 2008, 38, 257–268. [Google Scholar] [CrossRef]
  36. Pavlíková, L.; Šereš, M.; Breier, A.; Sulová, Z. The roles of microRNAs in cancer multidrug resistance. Cancers 2022, 14, 1090. [Google Scholar] [CrossRef]
  37. Muñoz, J.P.; Pérez-Moreno, P.; Pérez, Y.; Calaf, G.M. The role of microRNAs in breast cancer and the challenges of their clinical application. Diagnostics 2023, 13, 3072. [Google Scholar] [CrossRef]
  38. Umer, H.M.; Cavalli, M.; Dabrowski, M.; Diamanti, K.; Kruczyk, M.; Pan, G.; Komorowski, J.; Wadelius, C. A significant regulatory mutation burden at a high-affinity position of the CTCF motif in gastrointestinal cancers. Hum. Mutat. 2016, 37, 904–913. [Google Scholar] [CrossRef]
  39. Monteagudo-Sánchez, A.; Noordermeer, D.; Greenberg, M.V.C. The impact of DNA methylation on CTCF-mediated 3D genome organization. Nat. Struct. Mol. Biol. 2024, 31, 404–412. [Google Scholar] [CrossRef]
  40. Grabowicz, I.E.; Wilczyński, B.; Kamińska, B.; Roura, A.J.; Wojtaś, B.; Dąbrowski, M.J. The role of epigenetic modifications, long-range contacts, enhancers and topologically associating domains in the regulation of glioma grade-specific genes. Sci. Rep. 2021, 11, 15668. [Google Scholar] [CrossRef]
  41. Poulos, R.C.; Sloane, M.A.; Hesson, L.B.; Wong, J.W.H. The search for cis-regulatory driver mutations in cancer genomes. Oncotarget 2015, 6, 32509–32525. [Google Scholar] [CrossRef]
  42. Cestarelli, V.; Fiscon, G.; Felici, G.; Bertolazzi, P.; Weitschek, E. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules. Bioinformatics 2015, 32, 697–704. [Google Scholar] [CrossRef] [PubMed]
  43. Huang, J.; Sun, Y.; Chen, H.; Liao, Y.; Li, S.; Chen, C.; Yang, Z. ADAMTS5 acts as a tumor suppressor by inhibiting migration, invasion and angiogenesis in human gastric cancer. Gastric Cancer 2018, 22, 287–301. [Google Scholar] [CrossRef]
  44. Choi, B.; Han, T.S.; Min, J.; Hur, K.; Lee, S.M.; Lee, H.J.; Kim, Y.J.; Yang, H.K. MAL and TMEM220 are novel DNA methylation markers in human gastric cancer. Biomarkers 2016, 22, 35–44. [Google Scholar] [CrossRef] [PubMed]
  45. Liu, G.; Li, J.; Zhang, C.Y.; Huang, D.Y.; Xu, J.W. ARHGAP20 expression inhibited HCC progression by regulating the PI3K-AKT signaling pathway. J. Hepatocell. Carcinoma 2021, 8, 271–284. [Google Scholar] [CrossRef]
  46. Bai, Y.; Wei, C.; Zhong, Y.; Zhang, Y.; Long, J.; Huang, S.; Xie, F.; Tian, Y.; Wang, X.; Zhao, H. Development and validation of a prognostic nomogram for gastric cancer based on DNA methylation-driven differentially expressed genes. Int. J. Biol. Sci. 2020, 16, 1153–1165. [Google Scholar] [CrossRef]
  47. Giussani, M.; Landoni, E.; Merlino, G.; Turdo, F.; Veneroni, S.; Paolini, B.; Cappelletti, V.; Miceli, R.; Orlandi, R.; Triulzi, T.; et al. Extracellular matrix proteins as diagnostic markers of breast carcinoma. J. Cell. Physiol. 2018, 233, 6280–6290. [Google Scholar] [CrossRef]
  48. Zhuang, Y.; Li, X.; Zhan, P.; Pi, G.; Wen, G. MMP11 promotes the proliferation and progression of breast cancer through stabilizing Smad2 protein. Oncol. Rep. 2021, 45, 16. [Google Scholar] [CrossRef]
  49. Ozturk, S.; Papageorgis, P.; Wong, C.K.; Lambert, A.W.; Abdolmaleky, H.M.; Thiagalingam, A.; Cohen, H.T.; Thiagalingam, S. SDPR functions as a metastasis suppressor in breast cancer by promoting apoptosis. Proc. Natl. Acad. Sci. USA 2016, 113, 638–643. [Google Scholar] [CrossRef]
  50. Nema, R.; Kumar, A. Sphingosine-1-phosphate catabolizing enzymes predict better prognosis in triple-negative breast cancer patients and correlates with tumor-infiltrating immune cells. Front. Mol. Biosci. 2021, 8, 697922. [Google Scholar] [CrossRef]
  51. Zhang, W.; Wang, H.; Qi, Y.; Li, S.; Geng, C. Epigenetic study of early breast cancer (EBC) based on DNA methylation and gene integration analysis. Sci. Rep. 2022, 12, 1989. [Google Scholar] [CrossRef]
  52. Bao, Y.; Wang, L.; Shi, L.; Yun, F.; Liu, X.; Chen, Y.; Chen, C.; Ren, Y.; Jia, Y. Transcriptome profiling revealed multiple genes and ECM-receptor interaction pathways that may be associated with breast cancer. Cell. Mol. Biol. Lett. 2019, 24, 38. [Google Scholar] [CrossRef]
  53. Gillespie, M.; Jassal, B.; Stephan, R.; Milacic, M.; Rothfels, K.; Senff-Ribeiro, A.; Griss, J.; Sevilla, C.; Matthews, L.; Gong, C.; et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2021, 50, D687–D692. [Google Scholar] [CrossRef] [PubMed]
  54. Ng, P.K.S.; Lau, C.P.Y.; Lam, E.K.Y.; Li, S.S.K.; Lui, V.W.Y.; Yeo, W.; Ng, Y.K.; Lai, P.B.S.; Tsui, S.K.W. Hypermethylation of NF-κB-Activating Protein-Like (NKAPL) promoter in hepatocellular carcinoma suppresses its expression and predicts a poor prognosis. Dig. Dis. Sci. 2018, 63, 676–686. [Google Scholar] [CrossRef] [PubMed]
  55. Dai, H.; Gallagher, D.; Schmitt, S.; Pessetto, Z.Y.; Fan, F.; Godwin, A.K.; Tawfik, O. Role of miR-139 as a surrogate marker for tumor aggression in breast cancer. Hum. Pathol. 2016, 61, 68–77. [Google Scholar] [CrossRef]
  56. Halim, A.; Al-Qadi, N.; Kenyon, E.; Conner, K.N.; Mondal, S.K.; Medarova, Z.; Moore, A. Inhibition of miR-10b treats metastatic breast cancer by targeting stem cell-like properties. Oncotarget 2024, 15, 591–606. [Google Scholar] [CrossRef]
  57. Wang, H.; Tan, Z.; Hu, H.; Liu, H.; Wu, T.; Zheng, C.; Wang, X.; Luo, Z.; Wang, J.; Liu, S.; et al. microRNA-21 promotes breast cancer proliferation and metastasis by targeting LZTFL1. BMC Cancer 2019, 19, 738. [Google Scholar] [CrossRef]
  58. Mohammaddoust, S.; Sadeghizadeh, M. Mir-183 functions as an oncogene via decreasing PTEN in breast cancer cells. Sci. Rep. 2023, 13, 8086. [Google Scholar] [CrossRef]
  59. Hajibabaei, S.; Sotoodehnejadnematalahi, F.; Nafissi, N.; Zeinali, S.; Azizi, M. Aberrant promoter hypermethylation of miR-335 and miR-145 is involved in breast cancer PD-L1 overexpression. Sci. Rep. 2023, 13, 1003. [Google Scholar] [CrossRef]
  60. Long, X.; Shi, Y.; Ye, P.; Guo, J.; Zhou, Q.; Tang, Y. MicroRNA-99a Suppresses Breast Cancer Progression by Targeting FGFR3. Front. Oncol. 2020, 9, 1473. [Google Scholar] [CrossRef]
  61. Sameti, P.; Tohidast, M.; Amini, M.; Bahojb Mahdavi, S.Z.; Najafi, S.; Mokhtarzadeh, A. The emerging role of MicroRNA-182 in tumorigenesis; a promising therapeutic target. Cancer Cell Int. 2023, 23, 134. [Google Scholar] [CrossRef]
  62. Hong, Y.; Liang, H.; Urrehman, U.; Wang, Y.; Zhang, W.; Zhou, Y.; Chen, S.; Yu, M.; Cui, S.; Liu, M.; et al. miR-96 promotes cell proliferation, migration and invasion by targeting PTPN9 in breast cancer. Sci. Rep. 2016, 6, 37421. [Google Scholar] [CrossRef] [PubMed]
  63. Gharehdaghchi, Z.; Baradaran, B.; Salehzadeh, A.; Kazemi, T. miR-486-5p regulates cell proliferation and migration in breast cancer. Meta Gene 2019, 23, 100643. [Google Scholar] [CrossRef]
  64. Li, P.; Xu, T.; Zhou, X.; Liao, L.; Pang, G.; Luo, W.; Han, L.; Zhang, J.; Luo, X.; Xie, X.; et al. Downregulation of miRNA-141 in breast cancer cells is associated with cell migration and invasion: Involvement of ANP32E targeting. Cancer Med. 2017, 6, 662–672. [Google Scholar] [CrossRef] [PubMed]
  65. Hortobagyi, G.N.; Stemmer, S.M.; Burris, H.A.; Yap, Y.S.; Sonke, G.S.; Hart, L.; Campone, M.; Petrakova, K.; Winer, E.P.; Janni, W.; et al. Overall survival with ribociclib plus letrozole in advanced breast cancer. N. Engl. J. Med. 2022, 386, 942–950. [Google Scholar] [CrossRef]
  66. Tolaney, S.M.; Beeram, M.; Beck, J.T.; Conlin, A.; Dees, E.C.; Puhalla, S.L.; Rexer, B.N.; Burris, H.A.; Jhaveri, K.; Helsten, T.; et al. Abemaciclib in combination with endocrine therapy for patients with hormone receptor-positive, HER2-negative metastatic breast cancer: A phase 1b study. Front. Oncol. 2022, 11, 810023. [Google Scholar] [CrossRef]
  67. Kanehisa, M.; Furumichi, M.; Sato, Y.; Ishiguro-Watanabe, M.; Tanabe, M. KEGG: Integrating viruses and cellular organisms. Nucleic Acids Res. 2020, 49, D545–D551. [Google Scholar] [CrossRef]
  68. Colak, S.; ten Dijke, P. Targeting TGF-β Signaling in Cancer. Trends Cancer 2017, 3, 56–71. [Google Scholar] [CrossRef]
  69. Michaelis, M.; Doerr, H.W.; Cinatl, J. The story of human cytomegalovirus and cancer: Increasing evidence and open questions. Neoplasia 2009, 11, 1–9. [Google Scholar] [CrossRef]
  70. Zhao, J.; Xu, Y. PITX1 plays essential functions in cancer. Front. Oncol. 2023, 13, 1253238. [Google Scholar] [CrossRef]
  71. Vidula, N.; Yau, C.; Wolf, D.; Rugo, H.S. Androgen receptor gene expression in primary breast cancer. npj Breast Cancer 2019, 5, 47. [Google Scholar] [CrossRef]
  72. Fallah, Y.; Brundage, J.; Allegakoen, P.; Shajahan-Haq, A.N. MYC-driven pathways in breast cancer subtypes. Biomolecules 2017, 7, 53. [Google Scholar] [CrossRef] [PubMed]
  73. Pietri, E.; Conteduca, V.; Andreis, D.; Massa, I.; Melegari, E.; Sarti, S.; Cecconetto, L.; Schirone, A.; Bravaccini, S.; Serra, P.; et al. Androgen receptor signaling pathways as a target for breast cancer treatment. Endocr.-Relat. Cancer 2016, 23, R485–R498. [Google Scholar] [CrossRef] [PubMed]
  74. Hardy, K.M.; Booth, B.W.; Hendrix, M.J.C.; Salomon, D.S.; Strizzi, L. ErbB/EGF signaling and EMT in mammary development and breast cancer. J. Mammary Gland Biol. Neoplasia 2010, 15, 191–199. [Google Scholar] [CrossRef] [PubMed]
  75. Thanasupawat, T.; Glogowska, A.; Nivedita-Krishnan, S.; Wilson, B.; Klonisch, T.; Hombach-Klonisch, S. Emerging roles for the relaxin/RXFP1 system in cancer therapy. Mol. Cell. Endocrinol. 2019, 487, 85–93. [Google Scholar] [CrossRef]
  76. Yang, S.; Chen, K.; Cao, K.; Xu, S.; Ma, C.; Cai, Y.; Hu, Y.; Zhou, Y. miR-182-5p inhibits NKAPL expression and promotes the proliferation of osteosarcoma. Biotechnol. Bioprocess Eng. 2021, 26, 758–766. [Google Scholar] [CrossRef]
  77. Zhang, X.; Kang, X.; Jin, L.; Bai, J.; Zhang, H.; Liu, W.; Wang, Z. ABCC9, NKAPL, and TMEM132C are potential diagnostic and prognostic markers in triple-negative breast cancer. Cell Biol. Int. 2020, 44, 2002–2010. [Google Scholar] [CrossRef]
  78. Silva, R.; Glennon, K.; Metoudi, M.; Moran, B.; Salta, S.; Slattery, K.; Treacy, A.; Martin, T.; Shaw, J.; Doran, P.; et al. Unveiling the epigenomic mechanisms of acquired platinum-resistance in high-grade serous ovarian cancer. Int. J. Cancer 2023, 153, 120–132. [Google Scholar] [CrossRef]
  79. Morales-Martínez, M.; Vega, M.I. Role of MicroRNA-7 (MiR-7) in Cancer Physiopathology. Int. J. Mol. Sci. 2022, 23, 9091. [Google Scholar] [CrossRef]
  80. Schneider, E.; Pliushch, G.; El Hajj, N.; Galetzka, D.; Puhl, A.; Schorsch, M.; Frauenknecht, K.; Riepert, T.; Tresch, A.; Müller, A.M.; et al. Spatial, temporal and interindividual epigenetic variation of functionally important DNA methylation patterns. Nucleic Acids Res. 2010, 38, 3880–3890. [Google Scholar] [CrossRef]
  81. Cedar, H. DNA methylation and gene activity. Cell 1988, 53, 3–4. [Google Scholar] [CrossRef]
  82. Szalaj, P.; Michalski, P.J.; Wróblewski, P.; Tang, Z.; Kadlof, M.; Mazzocco, G.; Ruan, Y.; Plewczynski, D. 3D-GNOME: An integrated web service for structural modeling of the 3D genome. Nucleic Acids Res. 2016, 44, W288–W293. [Google Scholar] [CrossRef] [PubMed]
  83. Wlasnowolski, M.; Sadowski, M.; Czarnota, T.; Jodkowska, K.; Szalaj, P.; Tang, Z.; Ruan, Y.; Plewczynski, D. 3D-GNOME 2.0: A three-dimensional genome modeling engine for predicting structural variation-driven alterations of chromatin spatial structure in the human genome. Nucleic Acids Res. 2020, 48, W170–W176. [Google Scholar] [CrossRef] [PubMed]
  84. Pettersen, E.F.; Goddard, T.D.; Huang, C.C.; Couch, G.S.; Greenblatt, D.M.; Meng, E.C.; Ferrin, T.E. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605–1612. [Google Scholar] [CrossRef] [PubMed]
  85. Grubert, F.; Srivas, R.; Spacek, D.; Kasowski, M.; Ruiz-Velasco, M.; Sinnott-Armstrong, N.; Greenside, P.; Narasimha, A.; Liu, Q.; Geller, B.; et al. Landscape of cohesin-mediated chromatin loops in the human genome. Nature 2020, 583, 737–743. [Google Scholar] [CrossRef]
  86. Javierre, B.M.; Burren, O.S.; Wilder, S.P.; Kreuzhuber, R.; Hill, S.M.; Sewitz, S.; Cairns, J.; Wingett, S.W.; Várnai, C.; Thiecke, M.J.; et al. Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters. Cell 2016, 167, 1369–1384.e19. [Google Scholar] [CrossRef]
  87. Beesley, J.; Sivakumaran, H.; Moradi Marjaneh, M.; Lima, L.G.; Hillman, K.M.; Kaufmann, S.; Tuano, N.; Hussein, N.; Ham, S.; Mukhopadhyay, P.; et al. Chromatin interactome mapping at 139 independent breast cancer risk signals. Genome Biol. 2020, 21, 8. [Google Scholar] [CrossRef]
  88. Achinger-Kawecka, J.; Stirzaker, C.; Portman, N.; Campbell, E.; Chia, K.M.; Du, Q.; Laven-Law, G.; Nair, S.S.; Yong, A.; Wilkinson, A.; et al. The potential of epigenetic therapy to target the 3D epigenome in endocrine-resistant breast cancer. Nat. Struct. Mol. Biol. 2024, 31, 498–512. [Google Scholar] [CrossRef]
  89. Kadlof, M.; Rozycka, J.; Plewczynski, D. Spring Model—Chromatin modeling tool based on OpenMM. Methods 2020, 181–182, 62–69. [Google Scholar] [CrossRef]
  90. Mei, J.; Zhao, J.; Fu, Y. Molecular classification of breast cancer using the mRNA expression profiles of immune-related genes. Sci. Rep. 2020, 10, 4800. [Google Scholar] [CrossRef]
  91. Deng, J.L.; Xu, Y.; Wang, G. Identification of potential crucial genes and key pathways in breast cancer using bioinformatic analysis. Front. Genet. 2019, 10, 695. [Google Scholar] [CrossRef]
  92. de Cárcer, G.; Venkateswaran, S.V.; Salgueiro, L.; El Bakkali, A.; Somogyi, K.; Rowald, K.; Montañés, P.; Sanclemente, M.; Escobar, B.; de Martino, A.; et al. Plk1 overexpression induces chromosomal instability and suppresses tumor development. Nat. Commun. 2018, 9, 3012. [Google Scholar] [CrossRef] [PubMed]
  93. Lashen, A.; Toss, M.S.; Wootton, L.L.; Green, A.; Mongan, N.P.; Madhusudan, S.; Rakha, E. Characteristics and prognostic significance of polo-like kinase-1 (PLK1) expression in breast cancer. Histopathology 2023, 83, 414–425. [Google Scholar] [CrossRef] [PubMed]
  94. Xu, Z.; Shen, W.; Pan, A.; Sun, F.; Zhang, J.; Gao, P.; Li, L. Decreased Nek9 expression correlates with aggressive behaviour and predicts unfavourable prognosis in breast cancer. Pathology 2020, 52, 329–335. [Google Scholar] [CrossRef]
  95. Lesko, A.C.; Goss, K.H.; Yang, F.F.; Schwertner, A.; Hulur, I.; Onel, K.; Prosperi, J.R. The APC tumor suppressor is required for epithelial cell polarization and three-dimensional morphogenesis. Biochim. Biophys. Acta (BBA)—Mol. Cell Res. 2015, 1853, 711–723. [Google Scholar] [CrossRef]
  96. Seitz, H.K.; Stickel, F. Acetaldehyde as an underestimated risk factor for cancer development: Role of genetics in ethanol metabolism. Genes Nutr. 2010, 5, 121–128. [Google Scholar] [CrossRef]
  97. Fu, Y.; Zou, T.; Shen, X.; Nelson, P.J.; Li, J.; Wu, C.; Yang, J.; Zheng, Y.; Bruns, C.; Zhao, Y.; et al. Lipid metabolism in cancer progression and therapeutic strategies. MedComm. 2020, 2, 27–59. [Google Scholar] [CrossRef]
  98. Long, J.; Zhang, C.J.; Zhu, N.; Du, K.; Yin, Y.F.; Tan, X.; Liao, D.F.; Qin, L. Lipid metabolism and carcinogenesis, cancer development. Am. J. Cancer Res. 2018, 8, 778–791. [Google Scholar]
  99. Chen, M.C.; Hsu, S.L.; Lin, H.; Yang, T.Y. Retinoic acid and cancer treatment. BioMedicine 2014, 4, 22. [Google Scholar] [CrossRef]
  100. Jan, N.; Sofi, S.; Qayoom, H.; Haq, B.U.; Shabir, A.M.; Mir, M.A. Targeting breast cancer stem cells through retinoids: A new hope for treatment. Crit. Rev. Oncol./Hematol. 2023, 192, 104156. [Google Scholar] [CrossRef]
  101. Costantini, L.; Molinari, R.; Farinon, B.; Merendino, N. Retinoic acids in the treatment of most lethal solid cancers. J. Clin. Med. 2020, 9, 360. [Google Scholar] [CrossRef]
  102. Li, R.Q.; Zhao, X.H.; Zhu, Q.; Liu, T.; Hondermarck, H.; Thorne, R.F.; Zhang, X.D.; Gao, J.N. Exploring neurotransmitters and their receptors for breast cancer prevention and treatment. Theranostics 2023, 13, 1109–1129. [Google Scholar] [CrossRef] [PubMed]
  103. Rosenbaum, D.M.; Rasmussen, S.G.F.; Kobilka, B.K. The structure and function of G-protein-coupled receptors. Nature 2009, 459, 356–363. [Google Scholar] [CrossRef] [PubMed]
  104. Dorsam, R.T.; Gutkind, J.S. G-protein-coupled receptors and cancer. Nat. Rev. Cancer 2007, 7, 79–94. [Google Scholar] [CrossRef]
  105. De Francesco, E.; Sotgia, F.; Clarke, R.; Lisanti, M.; Maggiolini, M. G protein-coupled receptors at the crossroad between physiologic and pathologic angiogenesis: Old paradigms and emerging concepts. Int. J. Mol. Sci. 2017, 18, 2713. [Google Scholar] [CrossRef]
  106. Singh, A.; Nunes, J.J.; Ateeq, B. Role and therapeutic potential of G-protein coupled receptors in breast cancer progression and metastases. Eur. J. Pharmacol. 2015, 763, 178–183. [Google Scholar] [CrossRef]
  107. Lappano, R.; Jacquot, Y.; Maggiolini, M. GPCR modulation in breast cancer. Int. J. Mol. Sci. 2018, 19, 3840. [Google Scholar] [CrossRef]
  108. He, J.; Fortunati, E.; Liu, D.X.; Yan, L. Pleiotropic roles of ABC transporters in breast cancer. Int. J. Mol. Sci. 2021, 22, 3199. [Google Scholar] [CrossRef]
  109. Chen, Y.; Gera, L.; Zhang, S.; Li, X.; Yang, Y.; Mamouni, K.; Wu, A.Y.; Liu, H.; Kucuk, O.; Wu, D. Small molecule BKM1972 inhibits human prostate cancer growth and overcomes docetaxel resistance in intraosseous models. Cancer Lett. 2019, 446, 62–72. [Google Scholar] [CrossRef]
  110. Muriithi, W.; Wanjiku Macharia, L.W.; Pilotto Heming, C.; Lima Echevarria, J.; Nyachieo, A.; Niemeyer Filho, P.; Moura Neto, V. ABC transporters and the hallmarks of cancer: Roles in cancer aggressiveness beyond multidrug resistance. Cancer Biol. Med. 2020, 17, 253–269. [Google Scholar] [CrossRef]
  111. Demidenko, R.; Razanauskas, D.; Daniunaite, K.; Lazutka, J.R.; Jankevicius, F.; Jarmalaite, S. Frequent down-regulation of ABC transporter genes in prostate cancer. BMC Cancer 2015, 15, 683. [Google Scholar] [CrossRef]
  112. Elsnerova, K.; Mohelnikova-Duchonova, B.; Cerovska, E.; Ehrlichova, M.; Gut, I.; Rob, L.; Skapa, P.; Hruda, M.; Bartakova, A.; Bouda, J.; et al. Gene expression of membrane transporters: Importance for prognosis and progression of ovarian carcinoma. Oncol. Rep. 2016, 35, 2159–2170. [Google Scholar] [CrossRef] [PubMed]
  113. Lv, C.; Yang, H.; Yu, J.; Dai, X. ABCA8 inhibits breast cancer cell proliferation by regulating the AMP activated protein kinase/mammalian target of rapamycin signaling pathway. Environ. Toxicol. 2022, 37, 1423–1431. [Google Scholar] [CrossRef] [PubMed]
  114. Agius, R.; Parviz, M.; Niemann, C.U. Artificial intelligence models in chronic lymphocytic leukemia—Recommendations toward state-of-the-art. Leuk. Lymphoma 2021, 63, 265–278. [Google Scholar] [CrossRef]
  115. Wilting, R.H.; Dannenberg, J.H. Epigenetic mechanisms in tumorigenesis, tumor cell heterogeneity and drug resistance. Drug Resist. Updates 2012, 15, 21–38. [Google Scholar] [CrossRef] [PubMed]
  116. Ma, L.; Li, C.; Yin, H.; Huang, J.; Yu, S.; Zhao, J.; Tang, Y.; Yu, M.; Lin, J.; Ding, L.; et al. The mechanism of DNA methylation and miRNA in breast cancer. Int. J. Mol. Sci. 2023, 24, 9360. [Google Scholar] [CrossRef]
  117. Li, Y.; Lu, W.; Chen, D.; Boohaker, R.J.; Zhai, L.; Padmalayam, I.; Wennerberg, K.; Xu, B.; Zhang, W. KIFC1 is a novel potential therapeutic target for breast cancer. Cancer Biol. Ther. 2015, 16, 1316–1322. [Google Scholar] [CrossRef]
  118. Xiao, Y.X.; Yang, W.X. KIFC1: A promising chemotherapy target for cancer treatment? Oncotarget 2016, 7, 48656–48670. [Google Scholar] [CrossRef]
  119. Jiao, D.; Zhang, J.; Chen, P.; Guo, X.; Qiao, J.; Zhu, J.; Wang, L.; Lu, Z.; Liu, Z. HN1L promotes migration and invasion of breast cancer by up-regulating the expression of HMGB1. J. Cell. Mol. Med. 2020, 25, 397–410. [Google Scholar] [CrossRef]
  120. Domcke, S.; Bardet, A.F.; Adrian Ginno, P.; Hartl, D.; Burger, L.; Schübeler, D. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 2015, 528, 575–579. [Google Scholar] [CrossRef]
  121. Schmolka, N.; Karemaker, I.D.; Cardoso, R.; Recchia, D.C.; Spegg, V.; Bhaskaran, J.; Teske, M.; de Wagenaar, N.P.; Altmeyer, M.; Baubec, T. Dissecting the roles of MBD2 isoforms and domains in regulating NuRD complex function during cellular differentiation. Nat. Commun. 2023, 14, 3848. [Google Scholar] [CrossRef]
  122. Osborne, C.K. Tamoxifen in the treatment of breast cancer. N. Engl. J. Med. 1998, 339, 1609–1618. [Google Scholar] [CrossRef] [PubMed]
  123. Hu, S.; Wan, J.; Su, Y.; Song, Q.; Zeng, Y.; Nguyen, H.N.; Shin, J.; Cox, E.; Rho, H.S.; Woodard, C.; et al. DNA methylation presents distinct binding sites for human transcription factors. eLife 2013, 2, e00726. [Google Scholar] [CrossRef] [PubMed]
  124. Zhu, H.; Wang, G.; Qian, J. Transcription factors as readers and effectors of DNA methylation. Nat. Rev. Genet. 2016, 17, 551–565. [Google Scholar] [CrossRef] [PubMed]
  125. Zhao, J.; Bai, Z.; Feng, F.; Song, E.; Du, F.; Zhao, J.; Shen, G.; Ji, F.; Li, G.; Ma, X.; et al. Cross-talk between EPAS-1/HIF-2α and PXR signaling pathway regulates multi-drug resistance of stomach cancer cell. Int. J. Biochem. Cell Biol. 2016, 72, 73–88. [Google Scholar] [CrossRef]
  126. Lu, X.; Zhang, W.; Zhang, J.; Ren, D.; Zhao, P.; Ying, Y. EPAS1, a hypoxia- and ferroptosis-related gene, promotes malignant behaviour of cervical cancer by ceRNA and super-enhancer. J. Cell. Mol. Med. 2024, 28, e18361. [Google Scholar] [CrossRef]
  127. Abdollahi, A.; Pisarcik, D.A.; Roberts, D.D.; Weinstein, J.K.; Cairns, P.; Hamilton, T.A. LOT1 (PLAGL1/ZAC1), the candidate tumor suppressor gene at chromosome 6q24–25, is epigenetically regulated in cancer. J. Biol. Chem. 2003, 278, 6041–6049. [Google Scholar] [CrossRef]
  128. Lanzino, M.; Maris, P.; Sirianni, R.; Barone, I.; Casaburi, I.; Chimento, A.; Giordano, C.; Morelli, C.; Sisci, D.; Rizza, P.; et al. DAX-1, as an androgen-target gene, inhibits aromatase expression: A novel mechanism blocking estrogen-dependent breast cancer cell proliferation. Cell Death Dis. 2013, 4, e724. [Google Scholar] [CrossRef]
  129. Chae, B.J.; Lee, A.; Bae, J.S.; Song, B.J.; Jung, S.S. Expression of nuclear receptor DAX-1 and androgen receptor in human breast cancer. J. Surg. Oncol. 2011, 103, 768–772. [Google Scholar] [CrossRef]
  130. Zhang, D.; Zheng, Q.; Wang, C.; Zhao, N.; Liu, Y.; Wang, E. BHLHE41 suppresses MCF-7 cell invasion via MAPK/JNK pathway. J. Cell. Mol. Med. 2020, 24, 4001–4010. [Google Scholar] [CrossRef]
  131. Keith, B.; Johnson, R.S.; Simon, M.C. HIF1α and HIF2α: Sibling rivalry in hypoxic tumor growth and progression. Nat. Rev. Cancer 2011, 12, 9–22. [Google Scholar] [CrossRef]
  132. Okuda, H.; Kiuchi, H.; Takao, T.; Miyagawa, Y.; Tsujimura, A.; Nonomura, N.; Miyata, H.; Okabe, M.; Ikawa, M.; Kawakami, Y.; et al. A novel transcriptional factor Nkapl is a germ cell-specific suppressor of notch signaling and is indispensable for spermatogenesis. PLoS ONE 2015, 10, e0124293. [Google Scholar] [CrossRef] [PubMed]
  133. Li, M.; Sun, Q.; Wang, X. Transcriptional landscape of human cancers. Oncotarget 2017, 8, 34534–34551. [Google Scholar] [CrossRef] [PubMed]
  134. Lewis, K.A.; Gray, P.C.; Blount, A.L.; MacConell, L.A.; Wiater, E.; Bilezikjian, L.M.; Vale, W. Betaglycan binds inhibin and can mediate functional antagonism of activin signalling. Nature 2000, 404, 411–414. [Google Scholar] [CrossRef]
  135. Bao, S.; He, G. Identification of key genes and key pathways in breast cancer based on machine learning. Med. Sci. Monit. 2022, 28, e935515. [Google Scholar] [CrossRef]
  136. Dong, M.; How, T.; Kirkbride, K.C.; Gordon, K.J.; Lee, J.T.; Hempel, N.; Kelly, P.; Moeller, B.J.; Marks, J.R.; Blobe, G.C. The type III TGF-β receptor suppresses breast cancer progression. J. Clin. Investig. 2007, 117, 206–217. [Google Scholar] [CrossRef]
  137. Mayer, B.J.; Baltimore, D. Signalling through SH2 and SH3 domains. Trends Cell Biol. 1993, 3, 8–13. [Google Scholar] [CrossRef]
  138. Li, T.; Guan, L.; Tang, G.; He, B.; Huang, L.; Wang, J.; Li, M.; Bai, Y.; Li, X.; Zhang, H. Downregulation of TMEM220 promotes tumor progression in Hepatocellular Carcinoma. Cancer Gene Ther. 2021, 29, 835–844. [Google Scholar] [CrossRef]
  139. Jung, S.Y.; Kim, D.Y.; Yune, T.Y.; Shin, D.H.; Baek, S.B.; Kim, C.J. Treadmill exercise reduces spinal cord injury-induced apoptosis by activating the PI3K/Akt pathway in rats. Exp. Ther. Med. 2013, 7, 587–593. [Google Scholar] [CrossRef]
  140. Su, P.H.; Hsu, Y.C.; Huang, R.; Weng, Y.C.; Wang, H.C.; Chen, Y.; Tsai, Y.J.; Yuan, C.C.; Lai, H.C. Methylomics of nitroxidative stress on precancerous cells reveals DNA methylation alteration at the transition from in situ to invasive cervical cancer. Oncotarget 2017, 8, 65281–65291. [Google Scholar] [CrossRef]
  141. Zhang, X.; Zhang, H.; Fan, C.; Hildesjö, C.; Shen, B.; Sun, X.F. Loss of CHGA protein as a potential biomarker for colon cancer diagnosis: A study on biomarker discovery by machine learning and confirmation by immunohistochemistry in colorectal cancer tissue microarrays. Cancers 2022, 14, 2664. [Google Scholar] [CrossRef]
  142. Kirouac, D.C.; Du, J.; Lahdenranta, J.; Onsum, M.D.; Nielsen, U.B.; Schoeberl, B.; McDonagh, C.F. HER2+ cancer cell dependence on PI3K vs. MAPK signaling axes is determined by expression of EGFR, ERBB3 and CDKN1B. PLoS Comput. Biol. 2016, 12, e1004827. [Google Scholar] [CrossRef] [PubMed]
  143. Arteaga, C.L.; Engelman, J.A. ERBB receptors: From oncogene discovery to basic science to mechanism-based cancer therapeutics. Cancer Cell 2014, 25, 282–303. [Google Scholar] [CrossRef]
  144. Paplomata, E.; O’Regan, R. The PI3K/AKT/mTOR pathway in breast cancer: Targets, trials and biomarkers. Ther. Adv. Med. Oncol. 2014, 6, 154–166. [Google Scholar] [CrossRef] [PubMed]
  145. Li, H.; Prever, L.; Hirsch, E.; Gulluni, F. Targeting PI3K/AKT/mTOR signaling pathway in breast cancer. Cancers 2021, 13, 3517. [Google Scholar] [CrossRef]
  146. Shi, P.; Feng, J.; Chen, C. Hippo pathway in mammary gland development and breast cancer. Acta Biochim. Biophys. Sin. 2014, 47, 53–59. [Google Scholar] [CrossRef]
  147. Han, Y. Analysis of the role of the Hippo pathway in cancer. J. Transl. Med. 2019, 17, 116. [Google Scholar] [CrossRef]
  148. Zhao, M.; Mishra, L.; Deng, C.X. The role of TGF-β/SMAD4 signaling in cancer. Int. J. Biol. Sci. 2018, 14, 111–123. [Google Scholar] [CrossRef]
  149. Labibi, B.; Bashkurov, M.; Wrana, J.L.; Attisano, L. Modeling the control of TGF-β/Smad nuclear accumulation by the Hippo pathway effectors, Taz/Yap. iScience 2020, 23, 101416. [Google Scholar] [CrossRef]
  150. Band, A.M.; Laiho, M. Crosstalk of TGF-β and estrogen receptor signaling in breast cancer. J. Mammary Gland Biol. Neoplasia 2011, 16, 109–115. [Google Scholar] [CrossRef]
  151. Moses, H.; Barcellos-Hoff, M.H. TGF-β biology in mammary development and breast cancer. Cold Spring Harb. Perspect. Biol. 2010, 3, a003277. [Google Scholar] [CrossRef]
  152. Zhang, Y.; Alexander, P.B.; Wang, X.F. TGF-β family signaling in the control of cell proliferation and survival. Cold Spring Harb. Perspect. Biol. 2016, 9, a022145. [Google Scholar] [CrossRef] [PubMed]
  153. Yang, Z.; Liu, Z. The emerging role of microRNAs in breast cancer. J. Oncol. 2020, 2020, 9160905. [Google Scholar] [CrossRef] [PubMed]
  154. Ye, L.; Wang, F.; Wang, J.; Wu, H.; Yang, H.; Yang, Z.; Huang, H. Role and mechanism of miR-211 in human cancer. J. Cancer 2022, 13, 2933–2944. [Google Scholar] [CrossRef] [PubMed]
  155. Ray, A.; Kunhiraman, H.; Perera, R.J. The Paradoxical Behavior of microRNA-211 in Melanomas and Other Human Cancers. Front. Oncol. 2021, 10, 628367. [Google Scholar] [CrossRef]
  156. Dąbrowski, M.J.; Wojtaś, B. Global DNA methylation patterns in human gliomas and their interplay with other epigenetic modifications. Int. J. Mol. Sci. 2019, 20, 3478. [Google Scholar] [CrossRef]
  157. Liu, H.; Tsai, H.W.; Yang, M.; Li, G.; Bian, Q.; Ding, G.; Wu, D.; Dai, J. Three-dimensional genome structure and function. MedComm. 2023, 4, e326. [Google Scholar] [CrossRef]
  158. Calderon, L.; Weiss, F.D.; Beagan, J.A.; Oliveira, M.S.; Georgieva, R.; Wang, Y.F.; Carroll, T.S.; Dharmalingam, G.; Gong, W.; Tossell, K.; et al. Cohesin-dependence of neuronal gene expression relates to chromatin loop length. eLife 2022, 11, e76539. [Google Scholar] [CrossRef]
  159. Bateman, J.R.; Johnson, J.E. Altering enhancer–promoter linear distance impacts promoter competition in cis and in trans. Genetics 2022, 222, iyac098. [Google Scholar] [CrossRef]
  160. Zheng, H.; Xie, W. The role of 3D genome organization in development and cell differentiation. Nat. Rev. Mol. Cell Biol. 2019, 20, 535–550. [Google Scholar] [CrossRef]
  161. Sehgal, P.; Chaturvedi, P. Chromatin and cancer: Implications of disrupted chromatin organization in tumorigenesis and its diversification. Cancers 2023, 15, 466. [Google Scholar] [CrossRef]
  162. Bompadre, O.; Andrey, G. Chromatin topology in development and disease. Curr. Opin. Genet. Dev. 2019, 55, 32–38. [Google Scholar] [CrossRef] [PubMed]
  163. Stephens, A.D. Chromatin rigidity provides mechanical and genome protection. Mutat. Res. 2020, 821, 111712. [Google Scholar] [CrossRef] [PubMed]
  164. Fischer, T.; Hayn, A.; Mierke, C.T. Effect of Nuclear Stiffness on Cell Mechanics and Migration of Human Breast Cancer Cells. Front. Cell Dev. Biol. 2020, 8, 393. [Google Scholar] [CrossRef]
  165. Meghani, K.; Folgosa Cooley, L.; Piunti, A.; Meeks, J.J. Role of chromatin modifying complexes and therapeutic opportunities in bladder cancer. Bladder Cancer 2022, 8, 101–112. [Google Scholar] [CrossRef]
  166. Yun, J.W.; Song, S.H.; Kim, H.P.; Han, S.W.; Yi, E.C.; Kim, T.Y. Dynamic cohesin-mediated chromatin architecture controls epithelial–mesenchymal plasticity in cancer. EMBO Rep. 2016, 17, 1343–1359. [Google Scholar] [CrossRef]
  167. Hao, Y.; Baker, D.; ten Dijke, P. TGF-β-mediated epithelial-mesenchymal transition and cancer metastasis. Int. J. Mol. Sci. 2019, 20, 2767. [Google Scholar] [CrossRef]
  168. Dramiński, M.; Rada-Iglesias, A.; Enroth, S.; Wadelius, C.; Koronacki, J.; Komorowski, J. Monte Carlo feature selection for supervised classification. Bioinformatics 2007, 24, 110–117. [Google Scholar] [CrossRef]
  169. Kuleshov, M.V.; Jones, M.R.; Rouillard, A.D.; Fernandez, N.F.; Duan, Q.; Wang, Z.; Koplev, S.; Jenkins, S.L.; Jagodnik, K.M.; Lachmann, A.; et al. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016, 44, W90–W97. [Google Scholar] [CrossRef]
  170. Hinrichs, A.S.; Karolchik, D.; Baertsch, R.; Barber, G.P.; Bejerano, G.; Clawson, H.; Diekhans, M.; Furey, T.S.; Harte, R.A.; Hsu, F.; et al. The UCSC Genome Browser Database: Update 2006. Nucleic Acids Res. 2006, 34, D590–D598. [Google Scholar] [CrossRef]
  171. Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842. [Google Scholar] [CrossRef]
  172. Weisenberger, D.J.; Van Den Berg, D.; Pan, F.; Berman, B.P.; Laird, P.W. Comprehensive DNA Methylation Analysis on the Illumina ® Infinium ® Assay Platform. 2008. Available online: https://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/appnote_dna_methylation_analysis_infinium.pdf (accessed on 21 September 2024).
  173. Pian, C.; Zhang, G.; Gao, L.; Fan, X.; Li, F. miR+Pathway: The integration and visualization of miRNA and KEGG pathways. Brief. Bioinform. 2019, 21, 699–708. [Google Scholar] [CrossRef] [PubMed]
  174. Chen, Y.; Wang, X. miRDB: An online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020, 48, D127–D131. [Google Scholar] [CrossRef] [PubMed]
  175. Szklarczyk, D.; Kirsch, R.; Koutrouli, M.; Nastou, K.; Mehryary, F.; Hachilif, R.; Gable, A.L.; Fang, T.; Doncheva, N.T.; Pyysalo, S.; et al. The STRING database in 2023: Protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2022, 51, D638–D646. [Google Scholar] [CrossRef]
  176. Pongubala, J.M.R.; Murre, C. Spatial organization of chromatin: Transcriptional control of adaptive immune cell development. Front. Immunol. 2021, 12, 633825. [Google Scholar] [CrossRef]
  177. Kulakovskiy, I.V.; Vorontsov, I.E.; Yevshin, I.S.; Sharipov, R.N.; Fedorova, A.D.; Rumynskiy, E.I.; Medvedeva, Y.A.; Magana-Mora, A.; Bajic, V.B.; Papatsenko, D.A.; et al. HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 2017, 46, D252–D259. [Google Scholar] [CrossRef]
  178. Stojnic, R.; Diez, D. PWMEnrich: PWM Enrichment Analysis. Bioconductor 2024. Available online: https://bioconductor.org/packages/release/bioc/html/PWMEnrich.html (accessed on 9 August 2024).
  179. Grant, C.E.; Bailey, T.L.; Noble, W.S. FIMO: Scanning for occurrences of a given motif. Bioinformatics 2011, 27, 1017–1018. [Google Scholar] [CrossRef]
  180. Mahony, S.; Benos, P.V. STAMP: A web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 2007, 35, W253–W258. [Google Scholar] [CrossRef]
  181. Rodchenkov, I.; Babur, O.; Luna, A.; Aksoy, B.A.; Wong, J.V.; Fong, D.; Franz, M.; Siper, M.C.; Cheung, M.; Wrana, M.; et al. Pathway Commons 2019 Update: Integration; analysis and exploration of pathway data. Nucleic Acids Res. 2019, 48, D489–D497. [Google Scholar] [CrossRef]
  182. Szałaj, P.; Tang, Z.; Michalski, P.; Pietal, M.J.; Luo, O.J.; Sadowski, M.; Li, X.; Radew, K.; Ruan, Y.; Plewczynski, D. An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization. Genome Res. 2016, 26, 1697–1709. [Google Scholar] [CrossRef]
Figure 1. Overview of mRNAs indicated as significant in distinguishing cancer from normal tissue samples. (A) The volcano plot shows differences in the expression levels of the 590 mRNAs considered significant in the cancer/normal prediction in the feature selection set (adjusted raw p-value < 0.05 and log2FC = ±1). (B) Enriched pathways from the Reactome pathway database for down-regulated genes. (C) Enriched pathways from the Reactome pathway database for over-expressed genes. For readability, less enriched pathways were omitted from the graph using a cutoff of q-value = 0.001 (all terms for the cutoff q-value = 0.05 are available in Table S2).
Figure 2. Characteristics of 2006 significant DNA methylation sites (DMSs). (A) Distribution of loci included in the Illumina 450K array in comparison to the distribution of DMSs. (B) DMSs’ β-values distribution in the tumor and normal samples with respect to the specific genomic regions. (C) DMSs assigned to hyper/medium/hypo-methylated with respect to log2 Fold Change in β-values. (D) Number of loci with hyper/medium/hypo-methylated DNA in particular types of genomic regions. Significance coding used throughout the panels (A) and (B): ns p > 0.05 (not significant); * p ≤ 0.05; ** p ≤ 0.01; **** p ≤ 0.0001.
Figure 3. Distribution of cytosines across chromatin states obtained for the MCF-7 breast cancer cell line. (A) Number of DMS in individual chromatin states. (B) Illumina 450K sites assigned to individual chromatin states, representing the background distribution. (C) Differential distribution of hypo- and hyper-methylated DMS in specific chromatin states.
Figure 4. Summary of mass MCFS-ID experiments on 590 mRNA genes. (A) Distribution of the number of significant miRNA features returned across 590 experiments. (B) Distribution of the number of significant DNA methylation loci returned across 590 experiments. (C) Distribution of the Pearson correlations obtained for linear models built on significant miRNA features returned across 590 experiments. (D) Distribution of the Pearson correlations obtained for linear models built on significant DNA methylation loci returned across 590 experiments.
Figure 5. TF motifs overlapping differentially methylated cytosines. (A) TF motifs overlapping hyper-methylated DMS. (B) TF motifs overlapping hypo-methylated DMS. In (A,B), the red horizontal line indicates the p-value cut-off point. (C) Hierarchical clustering of TF motifs based on their PWMs. (D) Functional analysis of genes encoding TFs whose motifs overlapped DMSs hyper-methylated in cancer (KEGG database). The list of genes related to specific terms is shown in Table S11.
Figure 6. Target gene interactions, biological functions and graphical representation of their putative regulatory elements. (A) Visualization of interactions derived from linear models. (B) The network of gene–gene interactions created for the identified target genes visualized in (A). (C) KEGG pathway analysis for 10 genes highlighted in (B). (D) ID-graph for the NKAPL gene.
Figure 7. Spatial Regulatory Model of chromatin. (A) (i) The most representative chromatin 3D computational model from the ensemble of 100 spatial models generated by the 3D-GNOME method for the FXYD1 gene, with labeled promoter (blue), gene body (yellow), cg23866403 methylation locus (purple) and potential enhancer region (orange). (ii) The box plots show FXYD1 expression (left) and cg23866403 methylation levels (right) in cancer and healthy samples. (iii) The spatial distance distribution between the FXYD1 gene promoter and its enhancer region (left) and the cg23866403 methylation locus (right). (B) (i) Cohesin-mediated chromatin interactions around the NKAPL gene in the Integrative Genomics Viewer for hTERT-HME1 (healthy) and MCF-7 (cancer) cell lines. Green annotates enhancer–promoter loops; blue annotates promoter–promoter loops. (ii) The representative chromatin 3D model based on cohesin ChIA-PET data for the NKAPL gene. (iii) The spatial distances between promoter and methylation site (left) and promoter and enhancer (right) for cancer and healthy cell lines. (C) (i) PCHi-C interactions around the NKAPL gene in the Integrative Genomics Viewer for MCF-10A (healthy) and MCF-7 (cancer) samples. (ii) Chromatin 3D model of the NKAPL gene in MCF-10A (left) and MCF-7 (right) cell lines. (iii) The spatial Euclidean distances between the NKAPL gene body and the DMS (left) and between the NKAPL gene body and the enhancer (right), for both MCF-7 and MCF-10A.
Figure 8. Number of samples with the complete data for a given dataset type. The overlap of 381 samples that contain the complete data for all three dataset types was used in the feature selection procedure.
Figure 9. Selection of features identified as significant in cancer prediction (the main MCFS-ID experiment).
Figure 10. Selection of miRNA genes significant in prediction of mRNA gene expression levels.
Table 1. Number of significant features returned by the MCFS-ID rankings together with RF and SVM classification results.
Data Type | Joined-Set | Individual-Set | Intersection | Sum | RF wAcc | SVM wAcc
DNA methylation | 1504 | 1987 | 1485 | 2006 | 0.9068 | 0.9398
mRNA | 432 | 588 | 430 | 590 | 0.9347 | 0.9347
miRNA | 6 | 105 | 6 | 105 | 0.9848 | 0.9370
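The Sum column of Table 1 appears to equal the size of the union of the joined-set and individual-set rankings, i.e., Joined-Set + Individual-Set − Intersection. This reading is inferred from the reported counts and is not stated in the caption. A minimal Python sketch that checks it against the table values follows; the dictionary layout and the helper name `union_size` are illustrative assumptions.

```python
# Counts copied from Table 1. Interpreting "Sum" as the union size is an
# assumption inferred from the numbers, not a statement from the article.
table1 = {
    "DNA methylation": {"joined": 1504, "individual": 1987, "intersection": 1485, "sum": 2006},
    "mRNA": {"joined": 432, "individual": 588, "intersection": 430, "sum": 590},
    "miRNA": {"joined": 6, "individual": 105, "intersection": 6, "sum": 105},
}


def union_size(joined: int, individual: int, intersection: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return joined + individual - intersection


# For each data type, compare the computed union size with the reported Sum.
checks = {
    name: union_size(row["joined"], row["individual"], row["intersection"]) == row["sum"]
    for name, row in table1.items()
}

for name, ok in checks.items():
    print(f"{name}: reported Sum consistent with union size -> {ok}")
```

Under this reading, the 2006 DMSs and 590 mRNAs quoted in the figure captions correspond to features selected in either the joined or the individual experiment.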
Table 2. The result of NLP clustering of the significant mRNA genes.
Cluster ID | Cluster Size | Top Words Associated with Genes in Cluster | Keywords Interpretation | Mean Log Fold Change for Verification Set | Direction of Change in Expression
1 | 393 | regulation, process, metabolism, negative, negative_regulation, response, positive, positive regulation, gene, metabolic | regulation and metabolic processes | 1.5894 | over-expressed: 78; down-expressed: 313
2 | 38 | transport, ion, transmembrane, transmembrane_transport, calcium, abc, muscle, cardiac, ion_transmembrane, contraction | ion transmembrane transport | 2.5636 | over-expressed: 4; down-expressed: 34
3 | 18 | receptor, g, coupled, g_protein, protein_coupled, gpcr, coupled_receptor, receptors, protein, ligand | receptor proteins | 2.8793 | over-expressed: 0; down-expressed: 18
4 | 24 | transcription, polymerase, rna_polymerase, rna, polymerase_ii, ii, regulation_transcription, transcription_rna, differentiation, development | transcription process | 1.5496 | over-expressed: 4; down-expressed: 20
5 | 31 | mitotic, cell_cycle, cycle, g, cell, apc, transition, apc_c, c, g_transition | cell cycle regulation | −2.5362 | over-expressed: 28; down-expressed: 3
6 | 16 | golgi, transport, er, golgi_er, retrograde, vesicle, vesicle_mediated, mediated_transport, mediated, traffic | golgi apparatus related | −0.9197 | over-expressed: 11; down-expressed: 5
7 | 4 | biological_process, biological, process | biological processes | 2.7369 | over-expressed: 0; down-expressed: 4
Table 3. Top 15 miRNA genes and DNA methylation loci from the mass MCFS-ID experiments.
miRNA Gene | Freq | Sum RI | Mean RI | MCFS-ID Rank | DNA Methylation | Freq | Sum RI | Mean RI | MCFS-ID Rank
hsa_mir_139 | 73 | 65.628 | 0.899 | 1 | cg07267550 | 7 | 3.044 | 0.435 | 1234
hsa_mir_141 | 73 | 38.519 | 0.528 | 10 | cg00914963 | 7 | 3.036 | 0.434 | 2048
hsa_mir_10b | 73 | 36.690 | 0.503 | 2 | cg19533977 | 7 | 3.024 | 0.432 | 25
hsa_mir_183 | 73 | 33.020 | 0.452 | 4 | cg08113562 | 7 | 2.714 | 0.388 | 12,061
hsa_mir_140 | 73 | 32.825 | 0.450 | 11 | cg17901038 | 7 | 2.620 | 0.374 | 186
hsa_mir_200a | 73 | 31.182 | 0.427 | 15 | cg18253799 | 7 | 2.584 | 0.369 | 2217
hsa_mir_96 | 73 | 28.417 | 0.389 | 8 | cg20417953 | 7 | 2.504 | 0.358 | 6490
hsa_mir_429 | 73 | 25.452 | 0.349 | 20 | cg20524128 | 7 | 2.488 | 0.355 | 2066
hsa_mir_204 | 73 | 23.546 | 0.323 | 12 | cg10520594 | 7 | 2.393 | 0.342 | 4407
hsa_mir_99a | 73 | 23.041 | 0.316 | 6 | cg20701457 | 7 | 2.159 | 0.308 | 1487
hsa_mir_592 | 73 | 22.352 | 0.306 | 16 | cg16009970 | 7 | 2.090 | 0.299 | 13,163
hsa_mir_378 | 73 | 22.320 | 0.306 | 38 | cg06976025 | 7 | 2.012 | 0.287 | 3878
hsa_mir_145 | 73 | 22.218 | 0.304 | 5 | cg15601264 | 7 | 1.993 | 0.285 | 188
hsa_mir_21 | 73 | 21.700 | 0.297 | 3 | cg22608492 | 7 | 1.932 | 0.276 | 151
hsa_let_7c | 73 | 21.686 | 0.297 | 13 | cg11441693 | 7 | 1.913 | 0.273 | 12,369
Table 4. Input data description.
Data Type | Unit of Measurement | Number of Total Samples | Number of Normal Samples | Number of Features
mRNA expression | reads per kilobase million | 867 | 99 | 20,524
DNA methylation | beta-value | 870 | 97 | 396,065
miRNA expression | reads per million miRNA mapped | 832 | 86 | 897
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.