Identification of Hub Genes and Biological Mechanisms Associated with Non-Alcoholic Fatty Liver Disease and Triple-Negative Breast Cancer

The relationship between non-alcoholic fatty liver disease (NAFLD) and triple-negative breast cancer (TNBC) has been widely recognized, but the underlying mechanisms are still unknown. The objective of this study was to identify the hub genes associated with NAFLD and TNBC, and to explore the potential co-pathogenesis and prognostic linkage of these two diseases. We used GEO, TCGA, STRING, ssGSEA, and Rstudio to investigate the common differentially expressed genes (DEGs), conduct functional and signaling pathway enrichment analyses, and determine prognostic value between TNBC and NAFLD. GO and KEGG enrichment analyses of the common DEGs showed that they were enriched in leukocyte aggregation, migration and adhesion, apoptosis regulation, and the PPAR signaling pathway. Fourteen candidate hub genes most likely to mediate NAFLD and TNBC occurrence were identified and validation results in a new cohort showed that ITGB2, RAC2, ITGAM, and CYBA were upregulated in both diseases. A univariate Cox analysis suggested that high expression levels of ITGB2, RAC2, ITGAM, and CXCL10 were associated with a good prognosis in TNBC. Immune infiltration analysis of TNBC samples showed that NCF2, ICAM1, and CXCL10 were significantly associated with activated CD8 T cells and activated CD4 T cells. NCF2, CXCL10, and CYBB were correlated with regulatory T cells and myeloid-derived suppressor cells. This study demonstrated that the redox reactions regulated by the NADPH oxidase (NOX) subunit genes and the transport and activation of immune cells regulated by integrins may play a central role in the co-occurrence trend of NAFLD and TNBC. Additionally, ITGB2, RAC2, and ITGAM were upregulated in both diseases and were prognostic protective factors of TNBC; they may be potential therapeutic targets for treatment of TNBC patients with NAFLD, but further experimental studies are still needed.


Introduction
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among women [1]. Triple-negative breast cancer (TNBC), defined by estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and human epidermal growth factor receptor-2 (HER2)-negative histological presentation, accounts for approximately 20% of all breast cancer cases [2,3]. In contrast to other breast cancer types, TNBC has a more aggressive expression profile (high p53 and Ki67 and low Bcl-2 expression), large tumor size, and high histological grade, and is associated with an increased risk of early relapse and poor prognosis [4]. Many potential risk factors for TNBC have been reported, including non-modifiable factors such as age, sex, race, genetic mutations, breast tissue

Differentially Expressed Gene (DEG) Selection
DEGs were extracted and analyzed separately using the R package "limma". The fold changes (FCs) were calculated for individual gene expression levels. Genes meeting specific cut-off criteria of p-value < 0.05 and |logFC| > with [mean|logFC|) + 2 × sd(|logFC|)] were defined as DEGs. The overlapping DEGs between NAFLD and TNBC were delineated using the R package "ggVennDiagram". These common DEGs with consistent upregulation or downregulation trends were retained for subsequent analysis.

Functional Classification and Pathway Enrichment of DEGs
The above overlapping DEGs were submitted to Gene Ontology (GO) functional enrichment analysis, which consisted of biological process (BP), cellular component (CC), and molecular function (MF) analyses, and to Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway enrichment analysis using the R package "cluster Profiler". The enriched GO terms and KEGG pathways with an adjusted p-value < 0.1 were selected.

Protein-Protein Interaction (PPI) Establishment and Hub Gene Identification
To further explore the interactions among the common genes obtained as described above, the Search Tool for the Retrieval of Interacting Genes (STRING) (http://string-db.org/, accessed on 12 November 2022) was used for PPI network construction. Subsequently, Cytoscape software was used to visualize the PPI network. The Cytoscape plug-in Minimal Common Oncology Data Elements (MCODE, http://apps.cytoscape.org/apps/mcode, accessed on 12 November 2022) was used to screen out key protein expression molecules and multiple topological analysis algorithms in the cytoHubba plug-in (http://hub.iis. sinica.edu.tw/cytohubba/, accessed on 12 November 2022), such as MCC, MNC, Degree, and EPC, were used to screen the hub genes in the PPI network.

Hub Gene Expression Validation and Prognostic Analysis
The expression levels of the identified hub genes were validated in 132 TNBC samples and 113 controls from the TCGA cohort, and 18 NASH samples and 14 controls from the GEO database. The Wilcoxon test was used to compare the data between the two groups, and a two-sided p-value < 0.05 was considered significant. According to the median expression level for each gene, the TNBC samples were divided into high-or lowexpression groups. Survival analysis was performed using univariate and multivariate Cox regression hazard analysis, providing hazard ratios (HRs) and 95% confidence intervals (CIs), and survival curves were derived using Kaplan-Meier (KM) survival analysis with log-rank tests for comparison.

Immune Infiltration Analysis
The ssGSEA (single-sample gene set enrichment analysis) algorithm is a rank-based method that defines a score representing the degree of absolute enrichment of a particular gene set in each sample [28,29]. The ssGSEA score utilized immune-cell-markerassociated gene sets (http://cis.hku.hk/TISIDB/data/download/CellReports.txt, accessed on 20 November 2022) to quantify the infiltration of immune cells in TNBC tissue and determine the level of immune infiltration in each sample. Pearson's correlation analysis was used to reveal the relationships between hub genes and immune cells.

DEG Identification in NASH and TNBC
In the NASH and control groups in the GSE63067 dataset, there were 498 up-DEGs and 163 down-DEGs screened with a logFC threshold of 0.472 ( Figure 1A). In the TNBC and control groups in the GSE38959 dataset, there were 453 up-DEGs and 422 down-DEGs screened with a logFC threshold of 1.834 ( Figure 1B). A Venn diagram was used to determine the intersection and 42 common DEGs were identified ( Figure 1C). After excluding genes with opposite expression trends, 27 DEGs with the same expression trends were found, including 19 common upregulated genes and 8 common downregulated genes (Table S1).
Cox regression hazard analysis, providing hazard ratios (HRs) and 95% confidence intervals (CIs), and survival curves were derived using Kaplan-Meier (KM) survival analysis with log-rank tests for comparison.

Immune Infiltration Analysis
The ssGSEA (single-sample gene set enrichment analysis) algorithm is a rank-based method that defines a score representing the degree of absolute enrichment of a particular gene set in each sample [28,29]. The ssGSEA score utilized immune-cell-marker-associated gene sets (http://cis.hku.hk/TISIDB/data/download/CellReports.txt, accessed on 20 November 2022) to quantify the infiltration of immune cells in TNBC tissue and determine the level of immune infiltration in each sample. Pearson's correlation analysis was used to reveal the relationships between hub genes and immune cells.

DEG Identification in NASH and TNBC
In the NASH and control groups in the GSE63067 dataset, there were 498 up-DEGs and 163 down-DEGs screened with a logFC threshold of 0.472 ( Figure 1A). In the TNBC and control groups in the GSE38959 dataset, there were 453 up-DEGs and 422 down-DEGs screened with a logFC threshold of 1.834 ( Figure 1B). A Venn diagram was used to determine the intersection and 42 common DEGs were identified ( Figure 1C). After excluding genes with opposite expression trends, 27 DEGs with the same expression trends were found, including 19 common upregulated genes and 8 common downregulated genes (Table S1).

GO and KEGG Enrichment Pathway Analysis of DEGs
To better understand the biological functions of the identified DEGs, GO and KEGG pathway enrichment analyses were performed. After screening with the threshold of adjusted p < 0.1, significantly enriched GO terms and KEGG terms were selected.
As shown in Figure 2, in the BP category, DEGs were mainly enriched in maintenance of location, leukocyte cell-cell adhesion, leukocyte aggregation, leukocyte migration involved in inflammatory response, protein nitrosylation, peptidyl-cysteine S-nitrosylation, and regulation of apoptotic signaling pathway. In the CC category, DEGs were principally associated with secretory granule lumen, cytoplasmic vesicle lumen, vesicle lumen, immunological synapse, collagen-containing extracellular matrix, and nuclear inner membrane. The analysis of the MF category indicated that DEGs were enriched in toll-like receptor binding, fatty acid derivative binding, fatty acid binding, monocarboxylic acid binding, microtubule binding, integrin binding, and tubulin binding. Furthermore, two KEGG pathways with significant enrichment were the peroxisome proliferator-activated receptor (PPAR) signaling pathway and the IL-17 signaling pathway. and normal samples in GSE38959; (C) Venn diagram of the common DEGS between the two upregulation and two downregulation modules in NAFLD and TNBC. DEGs, differentially expressed genes; NASH, non-alcoholic steatohepatitis; TNBC, triple-negative breast cancer.

GO and KEGG Enrichment Pathway Analysis of DEGs
To better understand the biological functions of the identified DEGs, GO and KEGG pathway enrichment analyses were performed. After screening with the threshold of adjusted p < 0.1, significantly enriched GO terms and KEGG terms were selected.
As shown in Figure 2, in the BP category, DEGs were mainly enriched in maintenance of location, leukocyte cell-cell adhesion, leukocyte aggregation, leukocyte migration involved in inflammatory response, protein nitrosylation, peptidyl-cysteine S-nitrosylation, and regulation of apoptotic signaling pathway. In the CC category, DEGs were principally associated with secretory granule lumen, cytoplasmic vesicle lumen, vesicle lumen, immunological synapse, collagen-containing extracellular matrix, and nuclear inner membrane. The analysis of the MF category indicated that DEGs were enriched in toll-like receptor binding, fatty acid derivative binding, fatty acid binding, monocarboxylic acid binding, microtubule binding, integrin binding, and tubulin binding. Furthermore, two KEGG pathways with significant enrichment were the peroxisome proliferator-activated receptor (PPAR) signaling pathway and the IL-17 signaling pathway.

PPI Network Construction and Hub Gene Identification
To determine the interactions among the DEGs and identify hub genes, the PPI network of the DEGs was generated using STRING. With the aim of preventing important hub genes being missed, we modified the PPI settings to have a minimum required interaction score of medium confidence (0.400) and a maximum number of interactions of no

PPI Network Construction and Hub Gene Identification
To determine the interactions among the DEGs and identify hub genes, the PPI network of the DEGs was generated using STRING. With the aim of preventing important hub genes being missed, we modified the PPI settings to have a minimum required interaction score of medium confidence (0.400) and a maximum number of interactions of no more than 20 interactors to increase the maximum number of interactions and the number of proteins directly related to the input proteins. Then, a PPI with 47 nodes and 122 edges, with a PPI enrichment p value < 1.0 × 10 −16 , was obtained and imported into Cytoscape software v3.9.1 for visualization ( Figure 3A, Table S2). more than 20 interactors to increase the maximum number of interactions and the number of proteins directly related to the input proteins. Then, a PPI with 47 nodes and 122 edges, with a PPI enrichment p value < 1.0 × 10 −16 , was obtained and imported into Cytoscape software v3.9.1 for visualization ( Figure 3A, Table S2). The MCODE plug-in was used to conduct module analysis to detect crucial clustering modules. Three modules were retrieved from the PPI network. The criteria were set as follows: Degree Cutoff = 2, Node Score Cutoff = 0.2, K-Core = 2, and Max. Depth = 100. Module 1 included 10 nodes and 84 edges with a cluster score (density times the number of members) of 9.333. Module 2 and module 3 had 6 nodes and 20 edges and 3 nodes and 6 edges, respectively, and the scores were 4.000 and 3.000, respectively ( Figure 3B-D).
Combined with the logFC values of the hub genes in the GSE38959 dataset, the GO and KEGG enrichment pathways were analyzed. The top 10 GO terms and KEGG pathways are shown in Figure 4. In the BP category, nine hub genes including CYBB, NCF1, NCF2, ITGB2, RAC2, ITGAM, CYBA, ICAM1, and TLR4 were enriched in reactive oxygen species metabolic process. Furthermore, S100A8, S100A9, ITGB2, RAC2, ITGAM, ICAM1, CXCL10, and CXCR3 were enriched in leukocyte migration. Enriched CC and MF were related to redox reactions like NADPH oxidase complex, superoxide-generating NADPH oxidase activity, and oxidoreductase activity. KEGG enrichment analyses showed that leukocyte transendothelial migration was highly correlated with these genes. The MCODE plug-in was used to conduct module analysis to detect crucial clustering modules. Three modules were retrieved from the PPI network. The criteria were set as follows: Degree Cutoff = 2, Node Score Cutoff = 0.2, K-Core = 2, and Max. Depth = 100. Module 1 included 10 nodes and 84 edges with a cluster score (density times the number of members) of 9.333. Module 2 and module 3 had 6 nodes and 20 edges and 3 nodes and 6 edges, respectively, and the scores were 4.000 and 3.000, respectively ( Figure 3B-D).
Combined with the logFC values of the hub genes in the GSE38959 dataset, the GO and KEGG enrichment pathways were analyzed. The top 10 GO terms and KEGG pathways are shown in Figure 4. In the BP category, nine hub genes including CYBB, NCF1, NCF2, ITGB2, RAC2, ITGAM, CYBA, ICAM1, and TLR4 were enriched in reactive oxygen species metabolic process. Furthermore, S100A8, S100A9, ITGB2, RAC2, ITGAM, ICAM1, CXCL10, and CXCR3 were enriched in leukocyte migration. Enriched CC and MF were related to redox reactions like NADPH oxidase complex, superoxide-generating NADPH oxidase activity, and oxidoreductase activity. KEGG enrichment analyses showed that leukocyte transendothelial migration was highly correlated with these genes.

Hub Gene Expression Validation and Prognostic Analysis
Validation was performed in the TCGA-BRCA cohort for TNBC and the GSE48452 dataset for NASH. For TNBC, the differences in the expression levels of all hub genes between normal tissues and TNBC samples were statistically significant ( Figure 5A). Compared with normal tissues, 13 hub genes were upregulated, including CYBB, NCF1, NCF2, S100A8, S100A9, ITGB2, RAC2, ITGAM, CYBA, ICAM1, CXCL10, CXCR3, and ITGAL, and TLR4 was downregulated. For NASH, the expressions of hub genes ITGB2, RAC2, ITGAM, and CYBA were upregulated, and the changes in other genes' expressions were not statistically significant ( Figure 5B).

Hub Gene Expression Validation and Prognostic Analysis
Validation was performed in the TCGA-BRCA cohort for TNBC and the GSE48452 dataset for NASH. For TNBC, the differences in the expression levels of all hub genes between normal tissues and TNBC samples were statistically significant ( Figure 5A). Compared with normal tissues, 13 hub genes were upregulated, including CYBB, NCF1, NCF2, S100A8, S100A9, ITGB2, RAC2, ITGAM, CYBA, ICAM1, CXCL10, CXCR3, and ITGAL, and TLR4 was downregulated. For NASH, the expressions of hub genes ITGB2, RAC2, ITGAM, and CYBA were upregulated, and the changes in other genes' expressions were not statistically significant ( Figure 5B). To evaluate the clinical relevance of hub gene expression, the TNBC samples were divided into a high-expression group and a low-expression group according to the median expression level of each gene for prognostic analysis. Univariate Cox analysis suggested that high expression of ITGB2, RAC2, ITGAM, and CXCL10 was associated with better overall survival (OS) ( Table 2). The corresponding KM survival curves are shown To evaluate the clinical relevance of hub gene expression, the TNBC samples were divided into a high-expression group and a low-expression group according to the median expression level of each gene for prognostic analysis. Univariate Cox analysis suggested that high expression of ITGB2, RAC2, ITGAM, and CXCL10 was associated with better overall survival (OS) ( Table 2). The corresponding KM survival curves are shown in Figure 6. In addition, two prognostic factors associated with worse OS were identified in terms of clinicopathological features, namely black or African American and Asian ethnicity, and N stage. All variables significant upon univariate Cox regression analysis (p ≤ 0.05) were subjected to multivariate Cox regression analysis, and it was found that the N stage was an independent risk factor for overall survival and the remaining factors were not significant.

Discussion
In recent years, the relationship between NAFLD and breast cancer has become a research hotspot, and an increasing number of studies have confirmed the correlation. Some studies have reported that breast cancer is a common extrahepatic complication of NAFLD [16,30]. Simultaneously, it also has been suggested that patients with breast cancer, especially those receiving endocrine therapy, present an increased risk of NAFLD [31,32]. Based on these combined results, NAFLD may be related to the occurrence and progression of breast cancer. In addition, it has been proposed that liver metastasis in the diagnosis of fatty liver patients with breast cancer is significantly lower than that of patients with normal liver histology, further revealing the correlation between the two events in clinical practice [33]. However, these studies have been mostly observational, and the mechanism connecting NAFLD and TNBC remains unclear to date. Therefore, exploring the molecular mechanisms to enable early identification and intervention is undoubtedly of great clinical significance.
In this study, we explored common DEGs of NASH and TNBC datasets in public databases through bioinformatics analysis and observed the biological processes and signaling pathways in which they jointly participate. GO and KEGG enrichment analyses of the common DEGs showed enrichment in leukocyte aggregation, migration and adhesion, apoptosis regulation, and the PPAR signaling pathway, suggesting that TNBC in NAFLD patients was likely due to enhanced leukocyte recruitment in the inflammatory response and abnormal apoptosis. Interestingly, the PPAR signaling pathway not only controls the expression of genes encoding proteins of lipid metabolism, but is also involved in anti-cancer responses [34]. One of the mechanisms by which PPARs act to control cancer progression is to affect the NF-κB signaling pathway, or its upstream pathways, such as the Toll-like receptor 4 (TLR4) signaling pathway [35,36]. PPAR γ agonists have been found to induce apoptosis in TNBC cells and inhibit melanoma progression in mice [37,38].
CYBB, CYBA, NCF1, NCF2, and RAC2 are NADPH oxidase (NOX) subunit genes and are associated with inflammation and fibrosis in multiple organs, such as the liver [39,40], lungs [41], and kidneys [42], as well as with various types of cancer [43]. NOX can produce reactive oxygen species (ROS) that cause changes in cellular redox status, leading to chronic liver injury and fibrosis, which is critical for alcoholic steatohepatitis and NASH [44,45]. The analysis of the TCGA cohort showed that NOX-related genes were expressed more highly in tumor cells than in normal tissues of the same tissue origin, which suggested that the abnormal expression and regulation of NOX may be related to tumorigenesis and the increase of ROS in tumor cells [46], which probably contributes to the increased susceptibility of TNBC patients to NAFLD compared to the healthy population. In addition, RAC2 was strongly associated with OS in patients. RAC2 is a 21 kDa RAS superfamily of GTPases that stabilize the cytoskeleton structure of actin [47,48]. Chen et al. [49] found the high expression of RAC2 can inhibit the proliferation of breast cancer cells.
ITGAM, ITGB2, and ITGAL are involved in the most common integrins expressed on leukocytes, including Mac-1 (αMβ2 or CD11b/CD18) and leukocyte function-related antigen 1 (LFA-1 or αLβ2) [50,51]. Activated integrins play a crucial role in trafficking immune cells into tissues, activating and promoting the proliferation of effector cells, and inducing the formation of immune synapses between cells [52,53]. Clinically, Mac-1 expression is increased in patients with metabolic syndrome [54]. It has also been confirmed that Mac-1 is required for pro-inflammatory gene expression by macrophages in adipose tissue inflammation and is related to recruiting monocytes from bone marrow and inducing them to transform into M1-like macrophages (pro-inflammatory and usually anti-tumor) to express cytotoxic factors to engulf and destroy tumor cells [55][56][57]. Rojas et al. [51] demonstrated that an integrin marker composed of ITGA4, ITGB2, ITGAX, ITGB7, ITGAM, ITGAL, and ITGA8 had the potential to recognize basal-like breast cancers with immune- infiltrating and favorable prognosis. Our results are similar to this finding, with ITGAM and ITGB2 highly expressed in both diseases and associated with a better prognosis in TNBC. Further analysis of immune infiltration showed a positive correlation between integrin genes and activated B cells and immature B cells.
CXCL10 and its homologous receptor CXCR3 are critical in the development of specific features of the NAFLD phenotype, wherein they are mainly involved in the induction of inflammation, regulation of adipogenesis and oxidative stress, and other related processes [58,59]. In the process of tumor progression, studies have shown that CXCL10 has a dual role. It can not only promote tumor progression by increasing cell proliferation and metastasis, but also exert an anti-malignancy function by inhibiting angiogenesis and influencing the tumor microenvironment [60][61][62]. Sun et al. [63] found that CXCL10 expression was significantly upregulated in mice with melanoma and that CXCL10 promoted the proliferation of monocyte-like MDSCs, leading to an immunosuppressive microenvironment. On the other hand, it has also been demonstrated that tumor-cell-derived CXCL9/CXCL10 regulates the recruitment of T cells in various tumors [64][65][66]. Our study suggested that CXCL10 is positively correlated with MDCSs and activated T cells, and TNBC patients with high CXCL10 expression obtained a better prognosis. Therefore, in the context of NAFLD, CXCL10 may play an anti-tumor role in TNBC, but more in-depth experimental research is still needed.
Although many studies have linked metabolic syndrome to the development of cancer and poor prognosis, it may be a symptom of a general metabolic disorder. Our study explored the relationship between NAFLD and TNBC at the genetic level for the first time, and found that the hub genes ITGB2, RAC2, and ITGAM were upregulated in both diseases and were prognostic protective factors in TNBC. This is inconsistent with our understanding of risk factors such as obesity, a high-fat diet, and NAFLD that promote the occurrence and progression of breast cancer. Therefore, further experimental studies will be of great significance and are expected to find new targets for diagnosis, prognostic assessment, and treatment of TNBC.
The study had several limitations. First, although the role of these genes has been elucidated in multiple studies, the key pathways and hub genes identified have not been validated in experiments. Second, due to the lack of a dataset, the validation of the hub gene was performed in patients with only NAFLD or TNBC, but not in patients with NAFLD combined with TNBC. Third, the relationship between the hub genes and the prognosis of TNBC patients needs to be confirmed by prospective clinical studies.

Conclusions
In conclusion, this study explored the hub genes of NAFLD and TNBC and illustrated the possible mechanisms for the co-occurrence trend of these two diseases. Redox reactions regulated by the NOX subunit genes and the transport and activation of immune cells regulated by integrins may play a central role in the development of NAFLD and TNBC. Additionally, the expressions of ITGB2, RAC2, ITGAM, and CXCL10 were significantly correlated with a good prognosis in TNBC and may be potential therapeutic targets for the development of gene therapies for TNBC patients with NAFLD.