Chromatin Remodeling Enzyme Cluster Predicts Prognosis and Clinical Benefit of Therapeutic Strategy in Breast Cancer

The treatment provided for breast cancer depends on the expression of hormone receptors, human epidermal growth factor receptor-2 (HER2), and cancer staging. Surgical intervention, along with chemotherapy or radiation therapy, is the mainstay of treatment. Currently, precision medicine has led to personalized treatment using reliable biomarkers for the heterogeneity of breast cancer. Recent studies have shown that epigenetic modifications contribute to tumorigenesis through alterations in the expression of tumor suppressor genes. Our aim was to investigate the role of epigenetic modifications in genes involved in breast cancer. A total of 486 patients from The Cancer Genome Atlas Pan-cancer BRCA project were enrolled in our study. Hierarchical agglomerative clustering analysis further divided the 31 candidate genes into 2 clusters according to the optimal number. Kaplan–Meier plots showed worse progression-free survival (PFS) in the high-risk group of gene cluster 1 (GC1). In addition, the high-risk group showed worse PFS in GC1 with lymph node invasion, which also presented a trend of better PFS when chemotherapy was combined with radiotherapy than when chemotherapy was administered alone. In conclusion, we developed a novel panel using hierarchical clustering that high-risk groups of GC1 may be promising predictive biomarkers in the clinical treatment of patients with breast cancer.


Introduction
Breast cancer is the most prevalent cancer globally, with 2.26 million new cases diagnosed in 2020 [1]. Breast cancer treatment includes surgery, chemotherapy, radiotherapy, and targeted therapy, which are comprehensively administered according to tumor size, lymph node status, and the expression of hormone receptors and human epidermal growth factor receptor-2 (HER2). For non-metastatic breast cancer, which at the time of diagnosis accounts for approximately 90% of patients, the treatment goals are tumor elimination and recurrence prevention [2]. transcriptional regulators that control gene expression by modifying histones and other proteins [24]. Aberrant expression of these genes has been linked to the development and progression of breast cancer [19]. Elevated levels of HDAC and DNMT have been associated with poor prognosis and resistance to chemotherapy and radiotherapy in breast cancer patients [25]. On the other hand, decreased expression of PRDM and PRMT genes has been linked to poor prognosis and resistance to therapy [24].
It is thought that changes in the expression levels of these genes alter the expression of critical genes involved in cell cycle regulation, DNA damage repair, and apoptosis. These alterations may affect the sensitivity of cancer cells to chemotherapy and radiotherapy and contribute to the development of resistance. However, the mechanism behind the association of these genes with breast cancer patient outcomes and response to therapy is complex and not fully understood. In the past decade, the direction of cancer treatment is from a "one-size-fits-all" approach turn into focusing on the precision medicine based on genomic variants [26]. Therefore, we focused on common, well-known 5 proteins of DNMT family, 3 proteins of HDAC family, 15 proteins of PRDM family, and 8 proteins of PRMT family to discover an effective prognostic or predictive factor focusing on epigenetic processing of breast cancer in chemotherapy and radiotherapy settings using hierarchical agglomerative clustering analysis.

Baseline Characteristics
A total of 486 patients were included in this study. Table 1 shows the baseline characteristics of the study groups. The mean age of the patients was 55.9 years old and 68% were aged >50 years. Among all the patients, 106 (22%) had TNBC and 380 (78%) had non-TNBC. A total of 261 patients (54%) presented with lymph node invasion. According to the eighth edition of the AJCC TNM system, 300 patients (62%) had pathological stage II disease, 87 (18%) had stage I disease, and 99 (20%) had stage III disease. Of the patients, 305 (63%), 386 (79%), and 253 (52%) received radiotherapy, chemotherapy, and a combination of the two, respectively. In total, 29 (6.0%) of the patients died, and disease progression was observed in 45 (9.3%) patients.

Hierarchical Clustering Analysis
The mRNA expression of the 31 candidate genes in the study population is shown in Table 2 and normalized using a heatmap ( Figure 1). The median and range of each gene were calculated and compared with those of normal tissue. High and low expression of each gene are represented with a red-to-green color bar among the total 486 patients in the TNBC and non-TNBC groups. Figure 2 shows a heatmap of the correlations among the 31 candidate genes in all study populations. A high-intensity positive correlation (blue) was observed between DNMT3B/DNMT3A, HDAC2/DNMT3A, PRDM2/PRDM10, PRDM4/PRDM10, PRDM8/PRDM1, PRMT10/PRDM10, PRMT10/PRDM2, and PRMT7/ PRMT1. A strong negative correlation (red) was observed between PRMT1/PRDM10, PRMT1/PRDM2, PRMT1/PRDM5, PRMT7/PRDM10, and PRMT10/PRMT1. Figure 3 shows the results of the hierarchical clustering analysis. According to the silhouette index, the optimal number of clusters was two ( Figure 3A). A dendrogram of the target genes is shown in Figure 3B. All 31 target genes were assigned to 2 gene clusters according to the optimal number of clusters, and are illustrated in a cluster plot in Figure 3C.

High-and Low-Risk Groups Clinical Manifestation
The distributions of the baseline characteristics according to hierarchical clusters are summarized in Table S2. The distribution of TNBC was significantly different between the low-and high-risk groups for both GCs (p < 0.001). There were no significant differences in lymph node invasion, stage, or radiotherapy between the low-and high-risk groups within the two GCs. More than 90% of high-risk patients in GC1 received chemotherapy, with a significant difference (p < 0.001). Progressive disease was more frequent in the high-risk group of GC1 compared with the low-risk group (16 [16.5%] vs. 29 [7.5%], p = 0.006).

Progression-Free Survival Is Lower in the High-Risk Group of GC1
The Kaplan-Meier plot illustrating the 10-year PFS is shown in Figure 4. The patients were further divided into low-and high-risk subgroups in both gene clusters according to the risk estimation analysis. The results showed significant differences between the lowand high-risk groups in GC1 (p = 0.009), with a lower PFS in the high-risk group. However, no significant differences were observed in GC2 (p = 0.543). In addition, the high-risk GC1 group with lymph node invasion showed worse PFS than the low-risk group (p = 0.003), as illustrated in Figure 5. The estimated risk subgroup of GC1 revealed a worse PFS at high risk of chemotherapy alone (p = 0.018). Interestingly, there was a trend toward better PFS in the high-risk subgroup of GC1 in the chemotherapy combined with radiotherapy setting, but the difference was not significant (p = 0.241) ( Figure 6). Additionally, chemotherapy with radiotherapy seemed to have a better PFS in the high-risk GC1 group than in the low-risk group, although the difference was not significant (p = 0.214) ( Figure S1).

Higher mRNA Expression of Target Genes in High-Risk Groups
The high-risk subgroups in GC1 and GC2 revealed higher mRNA expression of almost all target genes, as shown in Tables 3 and S1, with significant differences (p < 0.05),

mRNA Expression and Progression-Free Survival Analysis of GC1 High-Risk Group Target Genes According to TNBC and Non-TNBC Subgroups
The boxplot of the 20 target genes of the high-risk group in GC1 demonstrated significant differences in mRNA expression in the TNBC and non-TNBC groups, except for HDAC3, PRDM14, PRDM9, and PRMT6 ( Figure S2). PFS was inferior in the high-risk group of GC1 in the TNBC subgroup analysis, although the difference was not significant (p = 0.061) ( Figure S3). Regarding lymph node status, there was a trend for worse PFS in the high-risk group for GC1 in TNBC with lymph node invasion ( Figure S4). Combined chemotherapy with radiotherapy seemed to result in better PFS in the TNBC subgroup with high-risk GC1 genes, as seen in the lymph node invasion analysis ( Figures S5-S8).

Discussion
Breast cancer is a complex and multifactorial disease and the most common cancer in women worldwide [27]. Currently, its diagnosis relies on analysis of clinical manifestations, imaging scans, and immunohistopathological examinations. Among all breast cancer subtypes, TNBC has a poor prognosis and high metastatic ability owing to its heterogeneity [28]. Conventional systemic chemotherapy remains the mainstay of treatment. Radiotherapy is reserved for breast conservative surgery or post-mastectomy high-risk patients as a local, regional control. Recently, some targeted therapies and the immune checkpoint inhibitors were developed for advanced TNBC treatment [29,30]. However, not all TNBC patients respond to these treatments [31]. Therefore, it still needs to find more precise treatments for individuals.
In the present study, a serial comprehensive analysis was conducted based on hierarchical clustering results. Thirty-one genes were divided into two clusters, thereby fulfilling the optimal number. We found that the high-risk group in GC1 had a higher proportion of TNBCs than the low-risk group. Moreover, patients in the high-risk subgroup of GC1 displayed a significantly lower PFS than those in the low-risk subgroup (p = 0.009). Another interesting finding was that combining chemotherapy with radiotherapy might result in a better survival rate in the high-risk subgroup of GC1. Hence, the panel of high-risk gene expression in GC1 may play a role as a promising predictive index and may also be effective in predicting treatment response to chemotherapy combined with radiotherapy in patients with TNBC. The risk estimation effects of GC1 maintained a consistent threshold in the TNBC and TNBC-LN-positive subgroups. Although no significant differences were found, the high-risk subgroup was more likely to achieve a poor PFS than the low-risk subgroup. The marginal effects of statistical significance may be because of the restricted sample size.
Epigenetic processes play important roles in carcinogenesis. The most well-known and comprehensive study of epigenetic mechanisms involves DNA methylation. Alterations in DNA methylation can lead to the development of cancer. Previous study indicated that DNMT1, DNMT3A, and DNMT3B are upregulated in breast tumor tissues compared with normal breast tissues [32]. In addition, higher expression of DNMT1 and DNMT3A was found in TNBC than in other breast cancer subtypes. Consistent with these findings, we found that the target genes were associated with epigenetic processing.
One of the key mechanisms involved in the regulation of histone modification is histone acetylation and deacetylation by HAT and HDAC, respectively. Several studies have identified deregulation of HDAC, particularly in breast cancer. Previous study found that HDAC1 and HDAC3 were upregulated in E2F1 and E2F4, which are responsible for the downregulation of ARH1, a maternally imprinted tumor suppressor gene [33]. Another comparative study demonstrated that global hypoacetylation of H4 was observed in the progression from normal epithelium to ductal carcinoma in situ to invasive ductal carcinoma [34]. However, the abovementioned studies only focused on analyzing genes or proteins from one of the chromatin-modifying enzymes individually, which were different to this study.
Protein arginine methyltransferase (PRMT) is a histone methyltransferase (HMT) that methylates glycine-and arginine-rich motifs (GAR motifs). PRMTs are classified into three groups according to their differences in catalytic activity: type I (PRMT1, 2, 3, 6, and 8, CARM1 [PRMT4]), type II (PRMT5 and 9), and type III (PRMT7). Next, we focused on discussing the relationship among PRMT2, PRMT8, PRMT7, and previous findings. The mRNA expression analysis of PRMT2 showed a negative median range in patients with breast cancer, and the high-risk group of GC1 reflected disputable consequences in this study. Previous studies also indicated that the role of PRMT2 in breast cancer progression remains controversial. A study suggested that spliced variants of PRMT2 are capable of binding to ERα, which further promotes EHMTRα-mediated EMT-related genes E-cadherin and Snail [35]. However, other studies indicated that the loss of PRMT2 may be essential for increasing the invasiveness of breast cancer [36,37]. Therefore, it is necessary to further study the function of PRMT2 in breast cancer. PRMT8 is an enzyme whose exact mechanism of action in breast cancer is poorly understood. Previous study indicated that PRMT8 was associated with a poor prognosis via z-score analysis, whereas the Kaplan-Meier plot showed a protective trend for breast cancer [38]. Another study observed that high PRMT8 expression correlated with worse survival in gastric cancer; however, increased survival was observed in patients with breast and ovarian cancer. These contrasting results may be attributed to variant-specific expression of PRMT8, including the novel PRMT8 variant 2 [39]. This may explain why PRMT8 in GC1 was negatively expressed in both the highand low-risk subgroups. E-cadherin, an essential biomarker of EMT, is downregulated by PRMT7 at its promoter, with elevated H4R3me3 and decreased H3K4me3, H3, and H4Ac. PRMT7 interferes with E-cadherin expression through the PRMT7-YY1-HDAC3 complex, thus inducing cell migration and invasion in the MDA-MB-231 TNBC cell line [40]. However, the results of our study showed higher expression of PRMT7 in the low-risk subgroup of GC1, which may indicate an insidious pathway or variations in PRMT7 enzyme activity.
To date, the PRDM families were found to contain 19 proteins that presented with a PR domain in the N-terminal region and variable numbers of zinc finger repeats [24]. Hierarchical clustering analysis selected 6 PRDMs (PRDM7, 9,12,13,14,15) in GC1, and half of them showed negative expression (PRDM7, 9,14), whereas the rest showed positive expression (PRDM12, 13, 15) in the high-risk group. However, studies of PRDM7 and carcinogenesis are limited. Sorrentino et al. observed that PRDM7 was overexpressed in hepatocarcinoma specimens through pan-cancer reanalysis of TCGA datasets. PRDM9 has been reported to interfere with homologous recombination by binding DNA using its zinc fingers, and H3K4/36me3 occurs in surrounding nucleosomes [41]. It involves overlapping mechanisms of DSB during meiosis and mitosis [42,43]. Mutations in PRDM9 have yet to be correlated with some solid tumors, such as head and neck squamous cell carcinoma and bladder cancer [44,45]. Previous studies suggested that PRDM12 may act as a tumor suppressor in patients with CML [46]. However, several studies indicated that PRDM12 is upregulated in prostate and colon cancers compared to normal tissues [47]. PRDM13 acts as an anti-proliferation regulator by upregulating INCA1, a CDK inhibitor, and ADAMTS12, thereby inhibiting cell invasion. Overexpression of PRDM13 in U87 glioma cells significantly inhibits cell migration and invasion by interacting with deleted liver cancer 1 (DLC1) and ARHGAP30 (Rho GTPase-activating protein 30) genes [48]. However, pan-cancer reanalysis has shown high PRDM13 overexpression in many cancers. These results are consistent with our study, in which PRDM13 expression was higher in the high-risk subgroup of GC1. PRDM15 could be a key regulator in diffuse large B-cell lymphoma (DLBCL), which sustained the activity of the PI3K/AKT/mTOR pathway and glycolysis [49]. In addition, PRDM15 could also stabilize the naïve pluripotency of embryonic development by modulating the transcription of upstream regulators of WNT and MAPK-ERK signaling [50]. Previous studies found that there is elevated expression of PRDM14 in breast cancer, whereas low or no expression was found in non-tumorous cells. Moreover, its amplification has been correlated with more aggressive cases of breast cancer, including high-grade and distant metastases. Knockdown of PRDM14 suppressed cell growth by inducing apoptosis and increasing sensitivity to chemotherapy agents [51][52][53]. However, PRDM14 mRNA expression was negative for both high-and low-risk GC1 in our study. Evidence has shown that significant PRDM14 methylation occurs in highgrade non-muscle invasive bladder, colon, and lung cancers. PRDM14 s role as a tumor suppressor-and whether its variant isoform exists-remains unknown [54][55][56].
Based on the provided information, our study found that the expression of a combination of genes, including DNMT1, DNMT2, DNMT3A, DNMT3B, DNMT3L, HDAC1, HDAC2, HDAC3, PRDM7, PRDM9, PRDM12, PRDM13, PRDM14, PRDM15, PRMT1, PRMT2, PRMT5, PRMT6, PRMT7, and PRMT8, could be associated with an increased risk of poor PFS in patients with lymph node invasion. Additionally, the study suggested that the combination of chemotherapy and radiotherapy may be more effective in improving PFS compared to chemotherapy alone. We also tried to determine crosstalk between pathways with four gene families, including DNMT, HDAC, PRMT, and PRDM, which were illustrated using Ingenuity pathway analysis (IPA) systems ( Figure S9). The possible mechanisms underlying these findings are not clear from the information provided, but some of the genes identified in the study are known to be involved in epigenetic regulation and protein modification. It is possible that the combination of these genes could be involved in the regulation of key pathways involved in cancer progression, such as cell proliferation, apoptosis, and DNA repair. Further studies would be needed to confirm these findings and to investigate the underlying mechanisms in more detail.
The limitations of our study are the relatively small sample size and the lack of clinical practice. Nonetheless, the study gives us an indication that epigenetic modifications may play a crucial role in breast cancer tumorigenicity and could be a therapeutic target in the future.

Data Source
All data were downloaded from the TCGA Pan-Cancer BRCA project via the cBioPortal platform [57,58]. This study included patients with non-metastatic breast cancer diagnosed as invasive ductal carcinoma by histological confirmation. Baseline characteristics included age at diagnosis, subtype, stage of tumor size, lymph node invasion status, pathological stage, radiotherapy, and chemotherapy. Progression-free survival (PFS) was considered as the primary endpoint of the study population. All patients were tracked from the date of initial diagnosis to the date of disease progression, metastasis, or the end of the study.

Messenger RNA (mRNA) Expression
The differential expression of candidate genes was estimated using raw digital gene expression counts, which were normalized using the variation of the reads/Kb/Million (RPKM). mRNA expression indicates the RPKM-normalized log2-transformed differential expression profiles (comparing the expression z-scores of tumor samples and adjacent normal samples in the study cohort) for the candidate genes. The mRNA expression of the candidate genes in all patients was visualized using a heatmap and the correlation between candidate genes were illustrated using a heatmap and tested using Pearson's correlation test.

Hierarchical Clustering
The mRNA expression levels of candidate genes were normalized into a range of 0 to 1. The average silhouette width in k clusters was calculated to determine the optimal number of clusters for the candidate genes. Hierarchical clustering was conducted to generate dendrograms for candidate genes according to the gene similarity in mRNA expression. The greatest silhouette width was considered the optimal number of clusters to ensure the greatest dissimilarity between the identified n gene clusters. The cluster plot of candidate genes generated according to the assigned n gene clusters showed the similarity of the clustered genes in a two-dimensional distribution. Next, mRNA expression in the n gene cluster was used to dichotomize the study population into two risk estimation subgroups according to PFS status. The cutoff criteria for the two risk estimation subgroups were determined based on the distance metrics computed using a hierarchical algorithm based on the similarity of correspond mRNA expression. The computed distances were then used to establish cutoff values that separated the study population into two risk estimation subgroups, the subgroup with a higher proportion of disease progression was classified as the high-risk group, while the subgroup with a lower proportion was classified as the low-risk group.

Statistical Analysis
The baseline characteristics of all patients and the risk estimation subgroup determined by each gene cluster were summarized as frequency and percentage, and age at diagnosis was summarized by mean and standard deviation. The difference in baseline characteristic distribution between the risk estimation subgroups was estimated using an independent two-sampled t-test or chi-squared test. The mRNA expression of all candidate genes in all patients and the risk estimation subgroup were summarized using medians and ranges. The difference in mRNA expression between the risk-estimation subgroups was estimated using the Wilcoxon rank-sum test. The PFS between risk estimation subgroups in all patients, different lymph node invasion status, and treatment groups was estimated using the Kaplan-Meier method, and the survival difference between subgroups was tested using the log-rank test. All p-values were two-sided, and statistical significance was set at p < 0.05. All analyses were performed using the R 4.0.5 software (R Core Team, Vienna, Austria, 2021).

Conclusions
In this study, we aimed to identify an optimal predictive biomarker for TNBC. First, we searched the TCGA database to select 486 patients, and 31 candidate genes were identified. The hierarchical clustering analysis represented two clusters according to the optimal number. GC1 was significantly associated with patients with breast cancer. Second, Kaplan-Meier analysis showed shorter PFS in the high-risk group of GC1 and lymph node invasion subgroups. The trend of improved PFS in the chemotherapy combined