Potential Therapeutic and Prognostic Values of LSM Family Genes in Breast Cancer

Simple Summary
The roles of “like-Smith” (LSM) proteins in breast cancer development and their clinical relevance remain unclear. In this study, multiple analyses of mRNA expression data from 3593 patients with breast cancer were used to investigate the clinical relevance of LSM family genes, including cancer aggressiveness, immune cell infiltration, prognostic outcomes, and related signaling pathways. We found that LSM4 was expressed at higher levels in breast tumors and breast cancer subtypes than in normal samples, and that high expression was associated with poor survival outcomes. Interestingly, infiltration levels of most immune cell types, including cluster of differentiation 4-positive (CD4+) T cells, CD8+ T cells, T follicular helper cells, and myeloid-derived suppressor cells, were positively correlated with LSM4 expression in several subclasses of breast cancer (basal, human epidermal growth factor receptor 2 (HER2), luminal A, and luminal B).

Abstract
In recent decades, breast cancer (BRCA) has become one of the most common diseases worldwide. Understanding crucial genes and their signaling pathways remains an enormous challenge in evaluating prognosis and possible therapeutics. The “Like-Smith” (LSM) family consists of protein-coding genes whose members play pivotal roles in the progression of several malignancies, although their roles in BRCA are less clear. To discover biological processes associated with LSM family genes in BRCA development, high-throughput techniques were applied to clarify expression levels of LSMs in The Cancer Genome Atlas (TCGA)-BRCA dataset, which was integrated with the cBioPortal database. Furthermore, we investigated the prognostic values of LSM family genes in BRCA patients using the Kaplan–Meier plotter database. Among genes of this family, high LSM4 expression was strongly associated with poor prognostic outcomes, with a hazard ratio of 1.35 (95% confidence interval 1.21–1.51, p for trend = 3.4 × 10⁻⁷). MetaCore and ClueGO analyses were also conducted to examine transcript expression signatures of LSM family members and their coexpressed genes, together with their associated signaling pathways, such as “Cell cycle role of APC in cell cycle regulation” and “Immune response IL-15 signaling via MAPK and PI3K cascade” in BRCA. Results showed that LSM family members, specifically LSM4, were significantly correlated with oncogenesis in BRCA patients. In summary, our results suggest that LSM4 could be a prospective prognosticator of BRCA.


Introduction
According to Global Cancer Statistics, an estimated 2.3 million new cases of breast cancer (BRCA) in 2020 made it the most commonly diagnosed cancer worldwide [1]. Among women, BRCA accounts for one in four cancer cases and one in six cancer deaths, making it the most prevalent cancer among women worldwide. In developing countries, the rise in BRCA is further aggravated by economic conditions and leads to several social complications [2][3][4][5][6]. Therefore, it is imperative to build a supportive infrastructure to promote cancer prevention and treatments [7][8][9][10][11]. Although great efforts have been made to discover new therapeutics, the survival rates of patients with BRCA remain relatively low [12][13][14][15]. Past research has found that cancer can be detected at an early stage by screening differentially expressed genes (DEGs) as prognostic biomarkers associated with poor survival in cancer patients, which can also guide the design of targeted drugs [16][17][18]. Fortunately, with the rapid development of both computing capabilities and big data resources, large publicly available datasets have been constructed for academic research and even for commercialization [19].
Smith-like (LSM) proteins are known as a family of RNA-binding proteins that appear in essentially all cellular organisms [20]. First discovered in a patient with systemic lupus erythematosus, these so-called Sm proteins are antigens targeted by anti-Sm antibodies [21].
The LSM family has 13 members (LSM1 to LSM14B), which have been strongly associated with tumorigenesis and metastasis in several tumor types. In particular, the LSM1 protein plays roles in cellular transformation and progression of BRCA, mesotheliomas, and lung cancer [22][23][24], and in the metabolism of RNA [25]. LSM3 was reported to be significantly associated with Alzheimer's disease [26], while LSM8 has a strong relationship with the development of Hashimoto's thyroiditis [27].
However, the biological functions of each member of the LSM family are not fully understood, especially with regard to the tumor microenvironment (TME) [28]. The growing volume of microarray and sequencing data has made it possible for robust computational algorithms to analyze biomedical data rapidly [29][30][31]. In spite of this, challenges remain in imitating the human TME in vivo and in vitro. Combining gene expression data with appropriate algorithms is believed to help us understand the respective immune functions. In this study, we integrated several high-throughput datasets and platforms to reveal insights into the molecular mechanisms of LSM family members and to clarify potential therapeutic targets for BRCA. Furthermore, we investigated the survival rate of BRCA patients based on LSM messenger (m)RNA transcription levels, using a Kaplan-Meier (KM) plot. Currently, the roles of LSM members in disease development remain unclear [32,33]; therefore, we attempted to predict their molecular functions and signaling pathway networks by applying MetaCore, a high-quality biological platform for multi-omics data.
Comprehensive methods can help reveal the roles of LSM family genes in BRCA growth, improve prognoses, and personalize effective treatments. We hypothesized that the LSM family, especially LSM4, might possess a novel role in tumor growth, and infiltration of immune cells may provide better predictions of survival rates in BRCA patients.

UALCAN
UALCAN (http://ualcan.path.uab.edu, accessed on 1 May 2021) is a comprehensive public database built on The Cancer Genome Atlas (TCGA) "Level 3" RNA-sequencing (RNA-seq) and clinical data from more than 30 cancer types. This database includes expression values computed by the RSEM algorithm for 20,502 genes [34]. Transcripts per million (TPM) values were used to evaluate whether differences in gene expression levels between groups were statistically significant. This platform retrieved data from TCGA, including 114 normal samples and 1097 primary BRCA tumors. Our report analyzed the mRNA levels of 13 LSM family genes in breast cancer and their correlations with clinicopathological parameters and tumor stages. Additionally, we performed a comprehensive analysis of promoter DNA methylation levels in the control group (n = 97) and the tumor group (n = 793) based on the TCGA dataset. The beta value indicates the degree of DNA methylation, ranging from 0 (unmethylated) to 1 (fully methylated). Student's t-test was used, and p < 0.05 was considered statistically significant.

DNA Methylation
We used MethSurv (https://biit.cs.ut.ee/methsurv/, accessed on 1 May 2021) to create a heatmap of the DNA methylated regions in order to evaluate the methylation level of a target gene [35]. Beta values, which range from 0 to 1, were used to reflect DNA methylation levels. The beta value for each CpG site is calculated as M/(M + U + 100), where M and U represent the methylated and unmethylated intensities, respectively.
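As a minimal sketch of the beta-value formula stated above, the following Python function computes M/(M + U + 100) for a single CpG site. The function name and the example intensities are illustrative assumptions and are not taken from MethSurv.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return M / (M + U + offset) for one CpG site.

    `methylated` (M) and `unmethylated` (U) are probe intensities; the
    offset of 100 is the constant given in the text.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities, for illustration only.
print(beta_value(900.0, 0.0))    # mostly methylated site
print(beta_value(0.0, 500.0))    # unmethylated site
```

The constant offset keeps the ratio defined when both intensities are close to zero, which is also why the value never quite reaches 1.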

Functional Enrichment Analysis
We obtained data from the METABRIC (n = 2509) and TCGA (n = 1084) datasets in the cBioPortal (https://www.cbioportal.org, accessed on 1 May 2021) database [36][37][38]. The aim of this analysis was to determine biological processes (BPs), disease biomarker networks, and breast neoplasm cell-cell signaling pathways using the MetaCore analysis (https://portal.genego.com, accessed on 1 May 2021). Furthermore, a gene ontology (GO) analysis was implemented to describe genes and gene products in three categories, cellular components (CCs), molecular functions (MFs), and BPs, using the DOSE package in R (vers. 4.0). Of note, we also performed a gene set enrichment analysis (GSEA, http://software.broadinstitute.org/gsea, accessed on 1 May 2021) to identify gene product activities in BRCA, with a dataset obtained from the METABRIC database. A false discovery rate (FDR) q-value and a normalized enrichment score (NES) were calculated. A q-value of <0.25 was set as the boundary criterion as previously described, and an NES of >1.5 and a nominal p value of <0.05 were set as the thresholds [36][39][40][41][42][43][44][45][46].
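The three GSEA thresholds described above can be applied as a simple filter. The sketch below assumes that the results have already been exported as one record per gene set; the record layout, the function name, and the numeric values are hypothetical and serve only to illustrate the criteria (NES > 1.5, nominal p < 0.05, FDR q < 0.25).

```python
from typing import NamedTuple


class GseaResult(NamedTuple):
    gene_set: str
    nes: float        # normalized enrichment score
    nominal_p: float  # nominal p value
    fdr_q: float      # FDR q-value


def is_enriched(result: GseaResult, nes_min: float = 1.5,
                p_max: float = 0.05, q_max: float = 0.25) -> bool:
    """Apply the three thresholds stated in the text to one gene set."""
    return (result.nes > nes_min
            and result.nominal_p < p_max
            and result.fdr_q < q_max)


# Hypothetical values, not results from this study.
results = [
    GseaResult("SET_A", 1.92, 0.004, 0.03),
    GseaResult("SET_B", 1.61, 0.060, 0.12),   # fails the nominal p threshold
    GseaResult("SET_C", 1.20, 0.010, 0.05),   # fails the NES threshold
]
enriched = [r.gene_set for r in results if is_enriched(r)]
print(enriched)
```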

Survival Analysis
We determined the correlations between LSMs mRNA expression levels and the survival of BRCA patients using the KM-plot database (https://kmplot.com, accessed on 1 May 2021). This online public database server is a robust platform for visualizing patients with several cancer types included in TCGA and METABRIC databases. Additionally, gene expression and survival data were taken from the Gene Expression Omnibus (GEO) and TCGA (HG-U133A 2.0, Affymetrix HG-U133A, and HG-U133 Plus 2.0 microarrays). Of note, this platform contains 22,277 genes on BRCA prognoses with microarray data from 1809 patients [47]. It was developed to assess the influence of target genes on the prognosis of patients with BRCA. Recurrence-free survival (RFS) for the LSM gene family was set as the default in the KM-plot database [47], including a survival curve, p log-rank value, and hazard ratios (HRs) with 95% confidence intervals (CIs), all of which were maintained in the plot. The horizontal axis (x-axis) showed the survival time in months, and the vertical axis (y-axis) showed the probability of survival.
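For readers unfamiliar with how such survival curves are built, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is not the KM-plot implementation; the function name and the toy follow-up data are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient (e.g., months)
    events : 1 if the event (e.g., recurrence) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored patients both leave the risk set
    return curve


# Toy data: the patient followed for 2 months is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients do not lower the curve at their own time point, but they shrink the risk set used at every later event time.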

Analysis of Protein Expression in Clinical Specimens
LSM family protein expression was further examined using the publicly available Human Protein Atlas (HPA) web database, which contains more than 10 million immunohistochemistry (IHC) images and 82,000 high-resolution immunofluorescence (IF) images of tissue microarrays. These microarrays, which contain sections from 46 normal human tissues and more than 20 types of human cancers, were labeled with antibodies against more than 11,000 human proteins [48]. We obtained 1075 patient samples from the BRCA data resources, and evaluated protein expression on IHC images in clinical samples. The staining is reported in terms of intensity, subcellular localization, and single-cell variability (SCV) for each cell line and antibody. Based on the laser power and detector gain parameters utilized for image capture in combination with the visual appearance of the image, the staining intensity is categorized as negative, weak, moderate, or strong. The protein expression score is determined by manually scoring immunohistochemical data for staining intensity (negative, weak, moderate, or strong) and proportion of stained cells (<25 percent, 25-75 percent, or >75 percent). Each combination of intensity and fraction is then translated automatically into a protein expression level score according to the following rules: negative: not detected; weak with <25%: not detected; weak with either 25-75% or >75%: low; moderate with <25%: low; moderate with either 25-75% or >75%: medium; strong with <25%: medium; strong with either 25-75% or >75%: high.
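The scoring rules above amount to a small lookup table. The sketch below encodes them in Python; the function name and the string labels used for the fraction categories ("<25%", "25-75%", ">75%") are our own assumptions for illustration, not identifiers from the HPA.

```python
# (intensity, fraction of stained cells) -> protein expression level
_SCORE = {
    ("weak", "<25%"): "not detected",
    ("weak", "25-75%"): "low",
    ("weak", ">75%"): "low",
    ("moderate", "<25%"): "low",
    ("moderate", "25-75%"): "medium",
    ("moderate", ">75%"): "medium",
    ("strong", "<25%"): "medium",
    ("strong", "25-75%"): "high",
    ("strong", ">75%"): "high",
}


def expression_score(intensity: str, fraction: str) -> str:
    """Translate staining intensity and stained-cell fraction into a score."""
    if intensity == "negative":
        return "not detected"  # negative staining is never detected, whatever the fraction
    try:
        return _SCORE[(intensity, fraction)]
    except KeyError:
        raise ValueError(f"unknown combination: {intensity!r}, {fraction!r}") from None


print(expression_score("moderate", ">75%"))
print(expression_score("strong", "<25%"))
```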

TIMER Analysis
Using TIMER 2.0 (http://timer.comp-genomics.org/, accessed on 1 May 2021), we explored the infiltration levels of immune cells in 31 cancer types, with approximately 10,000 samples obtained from the TCGA dataset. All TCGA tumor data were retrieved from GDAC (http://firebrowse.org/, accessed on 1 May 2021), and included somatic mutation calls, somatic copy number variations, transcriptome profiles, and clinical outcomes. Gene expression levels were expressed as log2-transformed RSEM (RNA-Seq by Expectation-Maximization) values. To obtain expression levels in normal and cancer tissues, LSM genes were queried in the "DiffExp" module with default settings. The "DiffExp" module allows for the investigation of differences in gene expression between normal and malignant tissues for any gene across all TCGA tumors. The "correlation" module presents expression scatterplots between two user-defined genes in a certain cancer type, along with Spearman correlation coefficients and statistical significance, and can be adjusted by tumor purity [49]. Among the default immune cell types, such as B cells, CD8+ T cells, macrophages, CD4+ T cells, neutrophils, and dendritic cells, we investigated relationships between highly expressed LSMs and the infiltration of inflammatory cells in several BRCA subtypes.
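To make the correlation step concrete, the following is a minimal pure-Python sketch of Spearman's rank correlation and of a first-order partial correlation, which is one common way to adjust a correlation for a third variable such as tumor purity. Whether TIMER uses exactly this formula is an assumption here; the function names and the toy vectors are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, assigning the average rank to ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after adjusting for z (e.g., purity)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy vectors: gene expression, immune infiltration estimate, tumor purity.
expression = [1, 2, 3, 4, 5]
infiltration = [2, 1, 4, 3, 5]
purity = [5, 3, 4, 1, 2]
print(round(spearman(expression, infiltration), 3))
print(round(partial_spearman(expression, infiltration, purity), 3))
```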

Differentially Expressed Genes (DEGs) Analysis
We attempted to validate the results using the Genomic Data Commons (GDC) database (https://portal.gdc.cancer.gov, accessed on 1 May 2021), which contains more than 50,000 raw sequencing data inputs and various other data types. We first queried the genome data using the TCGAbiolinks package in R (vers. 4.1.0), and then created a violin plot to compare gene expression levels between healthy tissues and tumor tissues. The Mann-Whitney test was applied to compare the two groups. A heatmap of expression levels of 1222 BRCA patients was also generated using the "DESeq2" package to illustrate differences between the two phenotypes: "normal" and "tumor". We selected the top 10% of genes, ranked by log2 fold change (FC) in expression, with |logFC| > 1.5 set as the threshold [50][51][52][53][54].
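The gene selection rule in the last sentence can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: genes are ranked by the absolute value of log2 FC, and the top-fraction cut is applied before the |logFC| threshold. Gene names and values are hypothetical.

```python
def select_degs(log2fc, threshold=1.5, top_fraction=0.10):
    """Keep the top fraction of genes by |log2 FC| that also exceed the threshold.

    log2fc : dict mapping gene name -> log2 fold change (tumor vs. normal)
    """
    ranked = sorted(log2fc.items(), key=lambda item: abs(item[1]), reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return [(gene, fc) for gene, fc in ranked[:n_keep] if abs(fc) > threshold]


# Hypothetical fold changes, for illustration only.
fold_changes = {
    "G1": 3.2, "G2": -2.8, "G3": 1.6, "G4": 0.4, "G5": -0.2,
    "G6": 0.9, "G7": -1.1, "G8": 0.1, "G9": 2.0, "G10": -0.5,
}
print(select_degs(fold_changes))                    # top 10% of 10 genes
print(select_degs(fold_changes, top_fraction=0.5))  # a looser cut, for comparison
```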

Statistical Analysis
We utilized TCGA Pan-Cancer Atlas, a dataset from cBioPortal (https://www.cbioportal.org, accessed on 1 May 2021), to obtain patient data and query the effects of the expression of different LSM family members on overall survival (OS). For the survival analysis, the KM plotter was applied with all default settings; recurrence-free survival (RFS) was selected as the endpoint, with the auto-best cutoff option and the JetSet best probe set. All possible cutoff values between the lower and upper quartiles were evaluated, and the best-performing threshold was subsequently used as the cutoff. A log-rank p-value of <0.05 was considered statistically significant.
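The cutoff search described above can be sketched as a scan over candidate thresholds, scoring each split with a two-group log-rank statistic. The code below is a simplified illustration and not the KM plotter implementation: the quartiles are taken by simple indexing, "best" is assumed to mean the largest log-rank chi-square (equivalent to the smallest p-value at one degree of freedom), and all names and data are hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutoff(expression, times, events):
    """Scan cutoffs between the lower and upper quartiles of expression."""
    ordered = sorted(expression)
    n = len(ordered)
    lower, upper = ordered[n // 4], ordered[(3 * n) // 4]  # crude quartiles
    best_value, best_stat = None, 0.0
    for cutoff in sorted({v for v in ordered if lower <= v <= upper}):
        high = [i for i, v in enumerate(expression) if v > cutoff]
        low = [i for i, v in enumerate(expression) if v <= cutoff]
        if not high or not low:
            continue
        stat = logrank_chi2(
            [times[i] for i in high], [events[i] for i in high],
            [times[i] for i in low], [events[i] for i in low],
        )
        if stat > best_stat:
            best_value, best_stat = cutoff, stat
    return best_value, best_stat


# Toy cohort: higher expression goes with shorter follow-up before the event.
expr = [1, 2, 3, 4, 5, 6, 7, 8]
months = [30, 28, 26, 24, 6, 5, 4, 3]
status = [1, 1, 1, 1, 1, 1, 1, 1]
print(best_cutoff(expr, months, status))
```

Because many cutoffs are tested on the same data, the p-value at the selected cutoff is optimistic; a correction for multiple testing is normally needed before interpreting it.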

Analysis of Expression Profiles of LSM Family Members
The capacity to store big data, especially transcriptomic data, has grown remarkably in recent years. Since there are relatively few reports on links between LSM family genes and BRCA, we first identified expression levels of LSMs in BRCA patients using Oncomine. LSM1 mRNA expression was upregulated in BRCA patients. Similarly, eight datasets indicated that LSM4 was overexpressed in BRCA patients. Characteristics of these datasets are displayed in Supplementary Table S1. To make use of the clinical data in TCGA, we used the DiffExp module of the TIMER server to explore LSM family gene expression in several cancer types and healthy controls across TCGA datasets. We observed that expression of LSM1, LSM2, LSM3, LSM4, LSM7, LSM10, LSM12, LSM14A, and LSM14B was higher in tumor samples than in normal samples. For instance, cohorts of adenoid cystic carcinoma, esophageal carcinoma, BRCA, colon cancer, Lynch syndrome, and lung adenocarcinomas had the most significantly (p < 0.001) elevated expression of LSM1, LSM2, LSM3, LSM4, and LSM5, while LSM7, LSM10, LSM11, LSM12, and LSM14A were significantly overexpressed in stomach adenocarcinoma and uterine corpus endometrial carcinomas compared to healthy samples (Supplementary Figure S1).
We then determined the expression levels of all 13 members of the LSM family in the UALCAN database (Figure 1). The mRNA levels of LSM1, LSM2, LSM3, LSM4, LSM5, LSM7, LSM8, LSM10, and LSM12 were significantly elevated in BRCA tissues. In contrast, the transcript levels of LSM6, LSM11, and LSM14A were down-regulated compared to healthy controls. All p-values were <0.01.

Relationships between LSM Family Members and Stages of BRCA
After defining the mRNA expression levels of each LSM member in BRCA, associations between their transcript levels and patients' tumor stages were analyzed using the UALCAN database. We found that expression of LSM1, LSM2, LSM3, LSM4, LSM5, LSM7, LSM10, and LSM14B increased significantly with advancing tumor stage in BRCA patients (Supplementary Figure S2). Relationships between the remaining genes of the LSM family and tumor stages were less clear and statistically non-significant. The p values are shown in Supplementary Table S2.

Prognostic Value of DNA Methylation of LSM Family Genes
It is widely known that DNA methylation plays an important role in cancer development. Elevated expression of DNA methyltransferases has been shown in numerous cancers to contribute to tumor growth through methylation-mediated gene silencing [55,56]. We also compared DNA methylation levels of each LSM family gene between normal samples (n = 97) and primary tumor samples (n = 793) in TCGA cohort using the UALCAN database (Supplementary Figure S3). Methylation levels of LSM2, LSM4, and LSM10 were upregulated in primary breast tumors, while LSM6 methylation levels were downregulated. Methylation levels of the remaining LSM family genes did not differ significantly (p ≥ 0.05).

LSM Gene Mutations and Co-Expression Analysis
We investigated mutations of LSM genes in BRCA patients from TCGA Pan-Cancer Atlas (n = 1082 patients) in the cBioPortal platform, and found surprisingly high rates of alterations in LSM1 (25%), LSM2 (11%), LSM4 (8%), and LSM14B (23%). For each gene in the LSM family, we retrieved the list of co-expressed genes from the METABRIC dataset, kept the top 25% (5000 genes), and then intersected these lists. The GO analysis covered CCs, BPs, and MFs. Results were then input into Cytoscape to build a network of GO terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, which is described in detail in Figure 2C. We also evaluated correlations among LSMs based on their mRNA expression, and reported Pearson's correlation coefficients (Figure 2B).
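One reading of the co-expression step above is sketched below: for each LSM gene, keep the top fraction of genes ranked by their correlation with it, then intersect the resulting sets across LSM members. The function names, the ranking by signed correlation, and the toy correlation values are assumptions made for illustration only.

```python
def top_fraction(correlations, fraction=0.25):
    """Return the genes in the top `fraction` when ranked by correlation."""
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    return set(ranked[: int(len(ranked) * fraction)])


def shared_coexpressed(per_gene, fraction=0.25):
    """Intersect the top co-expressed gene sets of every query gene."""
    gene_sets = [top_fraction(c, fraction) for c in per_gene.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical correlation coefficients between each LSM gene and four other genes.
per_gene = {
    "LSM1": {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.0},
    "LSM4": {"A": 0.7, "B": 0.2, "C": 0.6, "D": 0.1},
}
print(shared_coexpressed(per_gene, fraction=0.5))
```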

Survival Analysis of LSM Family Genes
We evaluated relationships between LSM mRNA expression levels and survival rates of BRCA patients by performing a KM analysis to uncover the prognostic value of each LSM. The results indicated that 8 of 13 members of the LSM family were significantly associated with poor prognostic outcomes of BRCA patients with respect to RFS: LSM1 (HR = 1.45, 95% CI: 1.3-1.6, p for trend = 2.1 × 10⁻¹²), LSM2 (HR = 1.25, 95% CI: 1.12-1.39, p for trend = 4 × 10⁻⁵), LSM3 (HR = 1.49, 95% CI: 1.34-1.66, p for trend = 3.4 × 10⁻¹³), LSM4 (HR = 1.35, 95% CI: 1.21-1.51, p for trend = 3.4 × 10⁻⁷), LSM6 (HR = 1.17, 95% CI: 1.05-1.29, p for trend = 0.0037), LSM7 (HR = 1.54, 95% CI: 1.38-1.72, p for trend = 3.2 × 10⁻¹⁵), LSM14A (HR = 1.3, 95% CI: 1.17-1.44, p for trend = 5.7 × 10⁻⁷), and LSM14B (HR = 1.45, 95% CI: 1.23-1.7, p for trend = 6 × 10⁻⁶). In contrast, low expression levels of LSM5, LSM8, and LSM11 indicated longer recurrence-free survival (HR = 0.85, 95% CI: 0.76-0.95, p for trend = 0.0053; HR = 0.85, 95% CI: 0.56-0.76, p for trend = 5.6 × 10⁻⁸; and HR = 0.64, 95% CI: 0.56-0.75, p for trend = 2 × 10⁻⁸, respectively). In addition, LSM10 and LSM12 showed non-significant prognostic values. The results are shown in Figure 3. Following the differential expression and survival analyses, we selected LSM1, LSM2, LSM3, LSM4, LSM7, and LSM14B for further exploration, for the following reasons. First, these genes showed high expression levels in breast tumors compared to normal samples. Second, their mRNA levels increased significantly with advancing tumor stage. Third, higher expression levels of these genes were associated with poorer survival outcomes.
Collectively, after comparing results from the survival analysis and relationships between expression levels of LSM family genes and normal tissues in terms of individual stages, we observed that LSM1, LSM2, LSM3, LSM4, LSM7, and LSM14B all fit the above criteria.

Figure 2. (A) … bars are missense mutations. This plot shows high alteration rates of LSM1 (25%), LSM2 (11%), LSM4 (8%), and LSM14B (23%). (B) Correlations among LSM family members in breast cancer using TCGA PanCancer Atlas dataset (n = 1082). The symmetric correlation matrix was created using the "corrplot" R package. The colors represent the degree of pairwise correlation based on Spearman's rank correlation coefficient (rho). Darker blue color and larger dot size indicate stronger positive correlation, while darker red indicates stronger negative correlation. Cross symbols represent non-significant correlation coefficients (p-value > 0.01). (C) A network of related genes/pathways was constructed. The top 25% of co-expressed genes from the METABRIC database (5000 genes) were collected for each LSM and then intersected with a list of 25 shared genes, and finally input into ClueGO in Cytoscape. Only pathways with p < 0.05 are shown, using a two-sided hypergeometric test for enrichment and Bonferroni correction for p-values.

Figure 3. Prognostic value of LSM family genes in breast cancer patients. A recurrence-free survival (RFS) dataset was used for analysis (n = 4929 patients). An auto-cutoff strategy was set in this analysis to divide patients into two groups based on the value of LSM mRNAs. The best JetSet probes, which describe LSMs, were used to map Affymetrix probe sets by choosing the best probe set for this analysis. Higher expression levels are shown in red, whereas lower expression values are in black. Results showed correlations between expression of LSMs and survival outcomes of breast cancer patients.
By splitting patients by the median, only the best probe Jetset and auto-cutoff were queried. The results indicated that LSM1/2/3/4/5/6/7/14A/14B were significantly associated with poor prognostic outcome of breast cancer patients.

Protein Expressions of LSM Family Members
We investigated the protein expression of LSMs in the Human Protein Atlas (HPA) database (Figure 4). Immunohistochemistry images of LSM2, LSM3, LSM4, LSM7, and LSM14B in normal and tumor samples from breast cancer patients, together with clinicopathological parameters such as patient ID, gender, and age, are presented in Supplementary Figure S4. In this analysis, results for LSM1 protein expression were not found due to the absence of antibodies. LSM2 and LSM4 proteins were overexpressed in tumor tissues, with similar patterns across BRCA patient samples in the HPA dataset. Meanwhile, we found that LSM7 and LSM14B proteins were not significantly differentially expressed between normal and cancer tissues. Of note, these results showed the same trend as the mRNA expression profiles. The protein expression score is determined by manually scoring immunohistochemical data for staining intensity (negative, weak, moderate, or strong) and proportion of stained cells (<25 percent, 25-75 percent, or >75 percent). Each combination of intensity and fraction is then translated automatically into a protein expression level score according to the following rules: negative: not detected; weak with <25%: not detected; weak with either 25-75% or >75%: low; moderate with <25%: low; moderate with either 25-75% or >75%: medium; strong with <25%: medium; strong with either 25-75% or >75%: high.

Gene Ontology Enrichment Analysis
For a comprehensive exploration, we extracted data from the METABRIC and TCGA Pan-Cancer datasets to retrieve GO enrichment results, including CCs, BPs, MFs, and KEGG pathways. For BPs, we observed that LSM4 was correlated with non-coding (nc)RNA metabolic processes and ribonucleoprotein complex biogenesis. The CC analysis showed localization in mitochondria, such as the mitochondrial inner membrane, mitochondrial matrix, and mitochondrial complex. Finally, among the MF results, "catalytic activity, acting on RNA" and "ATPase activity" were strongly associated with high LSM4 expression in BRCA tumors, while the KEGG analysis indicated a role for pathways of neurodegeneration-multiple diseases, as well as other disease-related pathways, such as amyotrophic lateral sclerosis, Alzheimer's disease, Huntington disease, and Parkinson's disease (Figure 7).

High Expression of LSM4 Is Related to Epithelial-Mesenchymal Transition (EMT) and Pro-Cancerous Related Gene Sets in Breast Cancer
We next sought to identify the biological processes underlying gene sets co-expressed with LSM4. A GSEA was utilized to investigate the enrichment of MSigDB Hallmark gene sets in BRCA samples with high expression levels of LSM4. We found that EMT, coagulation, tumor necrosis factor (TNF)-α signaling via nuclear factor (NF)-κB, interleukin (IL)-2/signal transducer and activator of transcription 5 (STAT5) signaling, apical junction, and androgen response, which are known as inflammation- and immune-related gene sets, were enriched with high LSM4 expression in breast tumors. Moreover, high LSM4 expression was also significantly associated with pro-cancerous gene sets, such as KRAS signaling upregulation, angiogenesis, and transforming growth factor (TGF)-β signaling (Figure 8). The detailed enrichment results are shown in Supplementary Data.

Identification of Differentially Expressed Genes (DEGs) in BRCA Patients
We then validated our results in independent datasets to evaluate their consistency. In particular, using the GDC dataset, LSM4 expression levels in normal tissues and tumor tissues were compared in a violin plot, with a t-test p-value of <2.2 × 10⁻¹⁶. Consistent with our results above, LSM4 expression was significantly higher in BRCA patients (Supplementary Figure S5). Moreover, a heatmap was generated to describe the relationship of LSM4 expression levels between two different phenotypes: normal and tumor tissues. The analysis was based on log2 fold change values, showing a list of 60 genes from the entire genome and their expression profiles. We found that LSM4 was also overexpressed in tumor samples and underexpressed in normal controls (Supplementary Figure S6). Interestingly, we observed that LSM4 was closely ranked with other breast cancer biomarkers, such as DPP3 [51], CDK5 [61], and TRIB3 [62]. These biomarkers showed consistent results of high expression levels in tumors and low expression levels in normal phenotypes. Using TCGA Pan-Cancer dataset, patients were split into two groups with low or high LSM4 mRNA expression; a ranked gene list was then obtained and input to GSEA. In the GSEA software, statistical significance was defined as an FDR value < 0.25, a normalized enrichment score (NES) > 1.5, and a nominal p-value < 0.05, as recommended by the GSEA database. A positive NES indicates that a gene set is enriched at the top of the ranked gene list.
Additionally, it is widely known that DNA methylation plays a crucial role in cancer development. In many malignancies, elevated expression of DNA methyltransferases has been shown to contribute to tumor growth through methylation-mediated gene inactivation. We generated a heatmap of the DNA methylated sites of LSM4 and performed DNA methylation-based survival analyses using the TCGA dataset (Supplementary Figure S7). A total of 18 methylated CpG sites of LSM4 were found using the MethSurv database, with nearly half of the CpG sites having predictive relevance in breast cancer patients. Among them, cg26961332 showed the highest level of DNA methylation. This result suggests a potential mechanism by which LSM4 may serve as an oncogene in breast cancer.
The comprehensive values of CpGs in LSM4 are shown in Tables S3 and S4, Supplementary.
Accumulation of changes in tumor-suppressor genes and oncogenes was highly correlated with the occurrence and development of tumors [63]. In this study, we utilized a DEG method using R packages to validate our results of the roles of LSM4 expression in GO and KEGG pathway analyses. Consistently, in terms of BPs, the results showed that high levels of DEGs were highly enriched in ncRNA metabolic processes, ribonucleoprotein complex biogenesis, and ncRNA processing. Similarly, regarding MFs, upregulated DEGs were involved in catalytic activity by acting on RNA; while in terms of CCs, the mitochondrial inner membrane and mitochondrial matrix were highly related to upregulated DEGs. In addition, the KEGG analysis illustrated the role of overexpressed LSM4 in the pathway of neurodegeneration-multiple disease, Alzheimer's disease, etc. Results are shown in Supplementary Materials, Figure S8.

LSM4 Not Only Plays Roles in BRCA Development but Is Also Involved in Various Cancer Types
MetaCore is commonly employed to construct pathway networks from an input gene list and to explore the associated BPs. After setting the gene list from the intersection of TCGA and METABRIC datasets as input for the MetaCore analysis, we identified interesting results related to LSM4. In particular, it was closely related to many types of cancer signaling pathways, such as "Beta catenin-dependent transcription regulation in colorectal cancer", "Stem cells aberrant Wnt signaling in medulloblastoma stem cells", "Mechanism of resistance to EGFR inhibitors in lung cancer", and "Mechanism of drug resistance in multiple myeloma". Meanwhile, LSM4 was also correlated with "Immune response IL-15 signaling via MAPK and PI3K cascade", "Cell cycle spindle assembly and chromosome separation", and "Mitogenic action of ErbB2 in breast cancer", which are immune- and cell cycle-related pathways that play roles in BRCA growth. The pathway list and networks are shown in Figure 9 and Supplementary Table S5.

Discussion
The incidence of BRCA has been significantly increasing every year, making it one of the most common cancers in women worldwide [64][65][66]. For decades, although there have been astonishing efforts to improve the efficacy of BRCA treatments and prognoses, its biology has not yet been fully elucidated. Currently, most patients are diagnosed based on widespread mammogram screening programs. However, in nearly one-third of patients, cancerous growth has already spread to regional lymph nodes at the time of diagnosis [67]. Therefore, it is necessary to investigate novel prognostic techniques for early-stage detection, which employ new biomarkers in a vital role [68][69][70][71][72][73]. Furthermore, characterizing the immune system, profiling the TME, and constructing BRCA immunotherapies are also known as critical keys in cancer research [74,75].
In this study, several analyses of LSM family genes were conducted. LSMs were previously known as U6 small nuclear RNA and mRNA degradation-associated protein-coding genes, and are expressed in multiple cell lines and human organs [76][77][78]. Our goal was to thoroughly analyze the BPs of LSMs in BRCA by performing a comprehensive analysis based on public databases. We commenced by using high-throughput techniques to analyze the role of LSM family genes, by comparing normal to cancer cells, and their relevance to signaling pathways in BRCA development. We were able to identify interesting findings for each individual gene, and further evaluated targeted therapeutic approaches. To our knowledge, this is the first study to apply multiple bioinformatics strategies to explore associations between expressions of LSM family genes and comparative clinicopathological parameters in BRCA patients. We suggest that LSMs could serve as novel biomarkers in BRCA.
Oncogenes are the key genes that contribute to the transformation of normal cells into malignant cells, whereas tumor-suppressor genes prevent the development of cancer. Tumor formation and progression are defined by individual processes that collaborate, and a greater knowledge of each individual process may give a better foundation for future anticancer research [79]. The emergence of large publicly available datasets around the world requires robust high-throughput analyses to interpret them. In this study, we examined the gene expression of the LSM family in over 20 types of cancer by screening the gene names and applying thresholds in Oncomine. LSM1, LSM2, LSM3, LSM4, LSM5, LSM7, LSM12, and LSM14B were overexpressed in breast cancer samples compared to normal tissues, LSM6 and LSM11 were underexpressed, whereas LSM8 and LSM14A did not show significant upregulation. It was previously determined that LSM1, an oncogene when working in combination with BAG4 and C8orf4, can influence growth factors and affect phenotypes in human mammary epithelial cells [80,81]. Additionally, higher expression of LSM1 was previously reported to result in a higher abundance of hepatic metastatic lesions [82]. Consistently, LSM1 has been identified as an oncogene activated by gene amplification, and it could play a crucial role in breast cancer development and progression [23]. Pan et al. studied the role of LSM2 in lung cancer development [83]. LSM3 was found to be downregulated in cervical cancer and associated with poor progression-free survival [84,85]. Those reports showed a trend opposite to that of our survival analysis in breast cancer patients. This inconsistency could be explained by the variability of datasets, and it highlights the need for further in vitro or in vivo investigations. In colorectal cancer, LSM3 was also significantly associated with lymphatic metastasis [86]; however, similar reports in breast cancer are lacking.
LSM4, a member of the LSM family of RNA-binding proteins, plays an important role in pre-mRNA splicing by mediating U4/U6 snRNP formation and is involved in pancreatic cancer [87,88]. Indeed, Long et al. demonstrated the effect of the RNA-binding protein LSM4 on the growth and locomotion of esophageal cancer cells [89]. A study by Ho et al. showed a correlation between LSM4 expression and ovarian cancer [90]. Interestingly, Yin et al. reported that LSM4 was significantly overexpressed in triple-negative breast cancer (TNBC) patients compared to other breast cancer subtypes, which strongly supports our conclusion in this study [91]. In contrast to the upregulated genes, Wang et al. found that LSM6 was downregulated in the basal-like breast cancer subtype, a trend consistent with our results [92]. For the other genes in the LSM family, there is a lack of evidence regarding their roles in cancer phenotypes. From our point of view, the different trends of LSM expression in specific cancer types could be explained by genetic compensation [93,94]. In particular, post-translational modifications (PTMs) of proteins are reported to crosstalk with one another, resulting in complex phenotypic outcomes. Of interest, our DNA methylation analysis using the UALCAN database also supported this trend, showing that LSM2, LSM4, and LSM10 were upregulated in primary breast tumors, whereas LSM6 methylation levels were downregulated.
For a more in-depth analysis, we used the expression profile of each gene to find correlations with different stages of BRCA. We found that LSM1 expression was significantly correlated with tumor stages and related to a worse distant-metastasis survival rate. Interestingly, its mutation rate was 25%, suggesting that the frequency of new mutations in BRCA is high and might not be rate-limiting for tumor development [95]. LSM1 was associated with the "cytoskeleton remodeling regulation of actin cytoskeleton organization by the kinase effectors of Rho GTPases" pathway. Several studies spanning decades revealed that Rho GTPases play essential roles in assorted cellular events such as cell growth control, membrane trafficking, and transcriptional regulation [96]. Moreover, LSM1 was highly correlated with "Chemoresistance pathways mediated by constitutive activation of PI3K pathway and BCL-2 in small cell lung cancer" and "IGF-1 receptor/EGFR cooperation in lung cancer", which suggests that it may also play an important role in lung cancer tumor growth.
In this study, we identified a significant correlation between the overexpression of LSM2 and increasing tumor stages. Furthermore, when extracting the co-expression gene list from the TCGA and METABRIC datasets, we observed that LSM2 was closely co-expressed with various breast cancer biomarkers such as PDCD5 [97] and NUDT5 [98]. The survival analysis also indicated that LSM2 was related to a poor RFS prognosis. An association of LSM2 with the "cell cycle-role of APC in cell cycle regulation" pathway was identified in this study. The anaphase-promoting complex (APC) is a ubiquitin ligase that is generally required for progression through and exit from mitosis, acting by inducing proteolysis of diverse cell cycle regulators. VanGenderen et al. described the key role of the APC in BRCA tumor development and progression [99], reporting that finding an inhibitor of cancer growth activities is a useful approach to treating cancer. Consistently, another report showed that LSM2, an RNA splicing gene, was upregulated in basal-like primary tumors, including BRCA [100].
When conducting a survival analysis, we observed that high LSM3 expression levels were significantly correlated with shorter survival times in patients with BRCA. Of interest, in a TIMER analysis, LSM3 showed a strong relationship with CD4+ T cells, which facilitate anticancer immunity mostly by providing help for CD8+ T-cell and antibody responses, and by releasing effector cytokines including interferon (IFN)-γ and tumor necrosis factor (TNF)-α, which are known to play vital roles in antitumor immunity [101]. In addition, MetaCore pathway maps revealed that LSM3 was mainly correlated with ubiquinone metabolism. A previous study by Chan et al. demonstrated a correlation between ubiquinone and metabolic disorders in patients with oral cancer [102]. For the first time, our study showed that ubiquinone metabolism could have a potential role in BRCA development.
Similarly, LSM7 was also found to be upregulated in BRCA patients at different stages. Moreover, the KM plot results showed that BRCA patients with high LSM7 expression had a significantly poorer survival rate compared to a control group. A previous study revealed that triple-negative breast cancer cells react to T cells at the splicing layer, where LSM7 is located [103]. This finding is consistent with our study, which showed that LSM7 had significant relevance to CD8+ T-cell markers, macrophages, and neutrophil markers in a TME analysis. In cancer, neutrophils play major roles in inflammatory functions and in innate and adaptive immunity, and are dynamically involved in progression and metastasis, hence serving as emerging targets for multiple cancer types [104,105]. Furthermore, LSM7 was associated with "regulation of actin cytoskeleton nucleation and polymerization by Rho GTPase", a pathway that plays an important role in the movement of cancer cells in living tumor tissues [106]. In vivo, groups of cytoskeletal proteins are often overexpressed in cancer cells, while in vitro they play roles in cancer cell migration and invasion, as reported in a recent study [107].
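To make the logic of such a survival comparison concrete, the following Python sketch illustrates how a Kaplan–Meier estimate could be computed for patients split into high- and low-expression groups. It is a minimal illustration under stated assumptions, not the pipeline used in this study, which relied on a public Kaplan–Meier database. The function names `kaplan_meier` and `split_by_median`, the median cutoff, and any input data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (e.g., death or relapse) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        n_at_risk -= n_leaving
    return curve


def split_by_median(expression: Sequence[float]) -> List[bool]:
    """Flag each patient as high expression (True) if above the cohort median.

    The median cutoff is an assumption made for illustration only.
    """
    ordered = sorted(expression)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [value > median for value in expression]
```

In practice, the flags returned by `split_by_median` would be used to partition follow-up times and event indicators into two groups, `kaplan_meier` would be applied to each group, and the resulting curves would be compared with a log-rank test, which is not implemented here.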
There have been no studies determining the precise role of LSM14B in any disease. In this report, we observed evidence of LSM14B overexpression that affected outcomes of the survival analysis and increasing expression levels in more progressive tumor stages.
LSM14B showed a relationship with the BP of "Inhibition of oligodendrocyte precursor cell differentiation by Wnt signaling in multiple sclerosis", a pathway involved in chronic demyelinating diseases. Therefore, additional work is required to determine the precise role of LSM14B in BRCA.
Our analysis revealed a correlation between LSM4 expression and BRCA tumors, and showed that LSM4 overexpression was associated with progressive stages of BRCA. Furthermore, the HPA showed that LSM4 had a moderate to strong IHC intensity in BRCA samples compared to normal breast tissues. In the TME analysis, CD8+ T cells, macrophages, neutrophils, and DCs were strongly associated with LSM4, pointing to a potential role of LSM4 in immunotherapy. Consistently, previous studies demonstrated that macrophages are widely present in solid tumors [108,109] and are a fundamental driver of cancer growth and metastasis [110]. However, only a small percentage of individuals have had a positive clinical response to such therapies. Understanding the cellular proportions, heterogeneity, and geographic distributions of the tumor immune milieu is thought to aid in better stratifying patients who would benefit from immunotherapeutics [111]. While BRCA was previously regarded as relatively non-immunogenic, it is now suggested that BRCA is in fact rich in immune infiltrates, with various functions and prognostic values [112]. Tumor-infiltrating lymphocytes (TILs) are chiefly represented by T cells (CD3+) and consist of CD4+, CD8+, and T-regulatory (Treg) cells [113]. Alongside DCs and CD4+ and CD8+ T cells, a minor component of TILs is represented by B cells and plasma cells. Our results in Figure 8 show that high LSM4 expression was positively correlated with CD4+ T cells, including type 1 T-helper (Th1) cells, in most BRCA subtypes. According to previous studies, Th1 cells increase the antitumor activity of NK cells and macrophages [114]. In addition, Treg cells, which are involved in cancer growth by inhibiting anticancer immunity [115], were also highly correlated with LSM4, especially in the luminal A subclass.
A previous report also demonstrated that Treg lymphocyte infiltration plays a role in metastatic BRCA [116], consistent with our findings. Furthermore, the HER2 and luminal A subtypes were also significantly and positively correlated with NK cells and cytotoxic lymphocytes, which participate in innate immunity and are capable of detecting and killing tumor cells, similar to findings by Verma et al. [117]. Interestingly, we found that in the basal, luminal A, and luminal B subtypes, high LSM4 expression was associated with myeloid-derived suppressor cells (MDSCs), which were suggested to constitute a tumor-favoring microenvironment [118,119]. In a previous study, Chen et al. revealed that MDSCs promote the basal-like transition and metastasis of BRCA, which is similar to our report [120]. Among several signaling pathways, LSM4 was significantly correlated with "Beta catenin-dependent transcription regulation in colorectal cancer", consistent with a previous study [121]. LSM4 was also found to be related to signaling pathways of diverse cancer types, such as "Stem cells aberrant Wnt signaling in medulloblastoma stem cells", "Mechanism of resistance to EGFR inhibitors in lung cancer", and "Mechanism of drug resistance in multiple myelomas", suggesting an important role in cancer development. Therefore, LSM4 is not only associated with BRCA, but may also be involved in various other types of cancer.

Conclusions
In summary, while many observations have indicated that LSM2, LSM3, LSM7, and LSM14B play important roles in BRCA development, further assessments to verify these findings in BRCA tumors are warranted. Using a meta-analysis combined with a bioinformatics approach, our study suggests that, among LSM family genes, LSM4 has prospective value and may serve as a new prognosticator and therapeutic target for BRCA treatment. The main drawback of the present study is the retrospective nature of the transcriptomic analysis; a larger prospective study is required to confirm whether LSM4 expression can be recognized as a useful biomarker in clinical practice.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13194902/s1, Figure S1: LSM family genes expression in several cancer types, Figure S2: LSMs transcription levels in different stages of breast cancer (UALCAN database), Figure S3: LSMs DNA methylation level in patients with breast cancer versus healthy controls, Figure S4: Representative immunohistochemistry images of LSM2, LSM3, LSM4, LSM7, LSM14b in breast cancer patients, Figure S5: LSM4 expression between normal and tumor groups (n = 1222 patients from GDC database), Figure S6: Expression profiles of LSM4 in DEGs methods, Figure S7: DNA methylation clustered expression of LSM4 (TCGA dataset), Figure S8: Functional enrichment analysis with gene ontology (GO) terms and KEGG of differentially expressed genes (DEGs); Table S1: The basic characteristic of LSM4 gene on Oncomine database, Table S2: Statistical values of LSM family genes expression based on individual cancer stage, Table S3: Prognosis value of CpGs in LSM4 (MethSurv database), Table S4: Gene sets enriched in high expression LSM4, Table S5: Pathway analysis of LSM4-coexpressed genes from public breast cancer databases using the MetaCore database.